POST /api/chat/free
Free-tier chat endpoint. OpenAI-shape request, OpenAI-shape response. Routes through the foundation-seeded Fireworks credential by default; if the caller has an active Fireworks BYOK credential, that credential is used instead. Used by the Pulse in guest mode.
A kernel CKO envelope is computed before the upstream call and injected into the messages, so the model knows the live framework state (operators it can cite, R(t), phase, and the proof_digest of recent computes). On upstream errors the endpoint self-heals: it returns a graceful message instead of leaking the provider's error text.
Auth
None required. CORS-open, rate-limited per IP. The Pulse on any page can call it directly.
Cost
0 ZEQ to the caller. The framework absorbs the cost via the foundation-seeded credential. Free-tier limit: ~10 chats/day per visitor IP, applied at the rate-limiter layer.
Request
curl -X POST https://YOUR-FRAMEWORK/api/chat/free \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What does operator KO42 do?" }
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 4096,
    "model": null
  }'
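The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming the placeholder host from the curl example; build_free_chat_request is a hypothetical helper name, and the block only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

# Placeholder host copied from the curl example; replace with your framework.
BASE_URL = "https://YOUR-FRAMEWORK"


def build_free_chat_request(messages, stream=False, temperature=0.7,
                            max_tokens=4096, model=None):
    """Build (but do not send) a POST /api/chat/free request.

    Hypothetical helper; the body fields match the curl example above.
    """
    body = {
        "messages": messages,
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "model": model,  # None serializes to JSON null, as in the curl example
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/chat/free",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_free_chat_request(
    [{"role": "user", "content": "What does operator KO42 do?"}]
)
# Sending it requires network access and a real host:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```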
Body
| Field | Type | Required | Notes |
|---|---|---|---|
| messages | OpenAI-shape array | yes | [{role, content}, ...]. Only the last 20 messages are kept. |
| stream | bool | no | Default true (SSE). Set false for a single-shot JSON response. |
| temperature | float (0–2) | no | Default 0.7. |
| max_tokens | int | no | Default 4096 (Fireworks's non-streaming cap). |
| model | string | no | Overrides the default model, accounts/fireworks/models/llama-v3p3-70b-instruct. |
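To predict how a partial body resolves against the table above, here is a hypothetical client-side helper, apply_documented_defaults. It mirrors the documented defaults only and is not the endpoint's code; treating an explicit null like an omitted field (as "model": null in the curl example suggests) and raising on an out-of-range temperature are assumptions.

```python
DEFAULT_MODEL = "accounts/fireworks/models/llama-v3p3-70b-instruct"
MAX_KEPT_MESSAGES = 20  # "Last 20 are kept."


def apply_documented_defaults(body):
    """Return a copy of `body` with the documented defaults filled in.

    Hypothetical helper: it mirrors the Body table, not the endpoint's code.
    """
    messages = body.get("messages")
    if not isinstance(messages, list):
        raise ValueError("messages is required and must be a list")

    def pick(name, default):
        # Treat an omitted field and an explicit null the same way (assumption).
        value = body.get(name)
        return default if value is None else value

    temperature = pick("temperature", 0.7)
    if not 0 <= temperature <= 2:
        # Raising here is an assumption; the table only states the 0-2 range.
        raise ValueError("temperature must be within 0-2")

    return {
        "messages": messages[-MAX_KEPT_MESSAGES:],  # keep only the last 20
        "stream": pick("stream", True),             # SSE unless stream=false
        "temperature": temperature,
        "max_tokens": pick("max_tokens", 4096),
        "model": pick("model", DEFAULT_MODEL),
    }
```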
Response · 200 OK (stream=false)
Body in OpenAI chat-completions shape (the response also carries extra X-Zeq-* headers, listed below):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "KO42 is the metric tensioner — it enforces..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 1240, "completion_tokens": 420, "total_tokens": 1660 }
}
Headers:
X-Zeq-Provider: fireworks
X-Zeq-Model: accounts/fireworks/models/llama-v3p3-70b-instruct
X-Zeq-Tier: free
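A minimal sketch for reading a stream=false response, assuming the body is already in hand as bytes or str and the headers as any name-to-value mapping with an items() method (for example, resp.headers from urllib). parse_free_chat_response is a hypothetical helper name.

```python
import json


def parse_free_chat_response(raw_body, headers):
    """Return (assistant_text, usage, zeq_headers) from a stream=false reply.

    `raw_body` is the response body (bytes or str); `headers` is any mapping
    with an items() method, e.g. urllib's resp.headers.
    """
    data = json.loads(raw_body)
    text = data["choices"][0]["message"]["content"]
    usage = data.get("usage", {})
    # HTTP header names are case-insensitive, so match the prefix lowercased.
    zeq_headers = {
        name: value
        for name, value in headers.items()
        if name.lower().startswith("x-zeq-")
    }
    return text, usage, zeq_headers
```

Because the self-heal path also returns a normal assistant message, this parser handles it the same way as a successful reply.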
Response · 200 OK (stream=true)
Standard SSE chunks:
data: {"id":"...","choices":[{"delta":{"content":"KO42"}}]}
data: {"id":"...","choices":[{"delta":{"content":" is the"}}]}
data: [DONE]
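The stream can be reassembled by concatenating each chunk's delta.content until the data: [DONE] sentinel. This is a sketch assuming one choice per stream, as in the chunks above; collect_sse_text is a hypothetical name, and it ignores SSE lines other than data: ones.

```python
import json


def collect_sse_text(lines):
    """Concatenate delta content from OpenAI-style SSE lines until [DONE].

    `lines` is any iterable of decoded text lines, e.g.
    (raw.decode("utf-8") for raw in resp) over a urllib response.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            # Assumes a single choice per stream, as in the example chunks.
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)
```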
Self-heal on upstream error
If Fireworks (or your BYOK provider) returns a non-2xx status, the framework doesn't leak the upstream error. Instead it returns a graceful self-heal message, surfaced as a normal assistant message:
The free-tier provider returned a hiccup (HTTP 400). The HulyaPulse
continues at 1.287 Hz. Try again in a moment, or add your own
Fireworks key in Settings → Models for unlimited use.
This keeps the Pulse's UX graceful and points the user toward the BYOK upgrade path. The call is also refunded to the caller's per-IP quota, so a failed call doesn't count against the free-tier limit.
Rate limits
- ~10 chats/day per visitor IP (sliding window).
- Per-window quota refunds on upstream failure (self-heal path).
- 429 returned with a Retry-After header when the quota is exhausted.
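The limiter behavior above can be sketched as a per-IP sliding window that records call timestamps, refunds a call on the self-heal path, and derives Retry-After from when the oldest counted call leaves the window. This is an illustrative sketch under assumptions (a limit of 10, a 24-hour window, in-memory storage, and a refund that removes the most recent counted call), not the framework's rate-limiter.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-key sliding-window quota with refunds (illustrative sketch)."""

    def __init__(self, limit=10, window_s=86_400.0, clock=time.monotonic):
        self.limit = limit          # assumed: ~10 chats per window
        self.window_s = window_s    # assumed: 24-hour window
        self.clock = clock          # injectable for testing
        self.hits = defaultdict(deque)  # key (e.g. IP) -> counted call times

    def _prune(self, key, now):
        q = self.hits[key]
        while q and q[0] <= now - self.window_s:
            q.popleft()  # drop calls that have slid out of the window
        return q

    def try_acquire(self, key):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        q = self._prune(key, now)
        if len(q) < self.limit:
            q.append(now)
            return True, 0
        # A slot frees up once the oldest counted call leaves the window.
        return False, math.ceil(q[0] + self.window_s - now)

    def refund(self, key):
        """Un-count the most recent call, e.g. after an upstream failure.

        Simplification: a real limiter might track the exact failed call.
        """
        q = self.hits[key]
        if q:
            q.pop()
```

A request handler would map (False, n) to a 429 response with a Retry-After: n header, and call refund(ip) when the upstream provider returns a non-2xx status.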
CORS
Access-Control-Allow-Origin: *. The Pulse on any page (framework,
hosted /s/<slug>/, or third-party embed via /embed/orb.js) can call
this endpoint directly.
Related
- Pulse: what calls this
- BYOK: bypass the free-tier limit by adding your own key
- /api/zeq/agent/page-chat: the machine-bound, paid alternative