`POST /api/chat/free`

Free-tier chat endpoint. OpenAI-shape request, OpenAI-shape response. Routes through the foundation-seeded Fireworks credential by default; if the caller has an active Fireworks BYOK credential, the call falls through to that credential instead. Used by the Pulse in guest mode.

Before each call, a kernel CKO envelope is computed and injected into the messages so the model knows the live framework state: the operators it can cite, R(t), the phase, and the `proof_digest` of recent computes. On upstream errors the endpoint self-heals with a graceful message rather than leaking the provider's error text.

Auth

None required. CORS-open, rate-limited per IP. The Pulse on any page can call it directly.

Cost

0 ZEQ to the caller. The framework absorbs the cost via the foundation-seeded credential. Free-tier limit: ~10 chats/day per visitor IP, applied at the rate-limiter layer.

Request

```shell
curl -X POST https://YOUR-FRAMEWORK/api/chat/free \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What does operator KO42 do?" }
    ],
    "stream":      false,
    "temperature": 0.7,
    "max_tokens":  4096,
    "model":       null
  }'
```
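When the request is scripted rather than typed by hand, the prompt has to be escaped before it goes into the JSON body. The sketch below is a hypothetical helper (`zeq_free_body` is not part of the framework) that builds a single-turn body for the request above, assuming a POSIX shell with `sed`. It escapes backslashes and double quotes only; prompts containing newlines or other control characters would need a real JSON encoder.

```shell
# zeq_free_body PROMPT [STREAM]
# Hypothetical helper (not part of the framework): prints a minimal
# /api/chat/free request body for one user turn.
# STREAM defaults to false here (single-shot JSON); note the endpoint itself
# defaults to true (SSE) when the field is omitted.
zeq_free_body() {
  # Escape backslashes first, then double quotes, so the prompt stays a valid
  # JSON string. Newlines and other control characters are NOT escaped.
  _prompt=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  _stream=${2:-false}
  printf '{"messages":[{"role":"user","content":"%s"}],"stream":%s}\n' \
    "$_prompt" "$_stream"
}

# Usage (YOUR-FRAMEWORK is the placeholder host from the request example):
#   zeq_free_body 'What does operator KO42 do?' |
#     curl -s -X POST https://YOUR-FRAMEWORK/api/chat/free \
#       -H "Content-Type: application/json" -d @-
#
# Streaming: pass "true" and add -N so curl prints SSE chunks as they arrive.
#   zeq_free_body 'What does operator KO42 do?' true |
#     curl -N -s -X POST https://YOUR-FRAMEWORK/api/chat/free \
#       -H "Content-Type: application/json" -d @-
```

`-d @-` makes curl read the body from stdin, which keeps the prompt out of the shell's own quoting rules.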

Body

| Field | Type | Required | Notes |
|---|---|---|---|
| `messages` | OpenAI-shape array | yes | `[{role, content}, ...]`. Only the last 20 messages are kept. |
| `stream` | bool | no | Default `true` (SSE). Set `false` for single-shot JSON. |
| `temperature` | float (0–2) | no | Default `0.7`. |
| `max_tokens` | int | no | Default `4096` (Fireworks's non-streaming cap). |
| `model` | string | no | Overrides the default model, `accounts/fireworks/models/llama-v3p3-70b-instruct`. |

Response · 200 OK (stream=false)

OpenAI chat-completions shape, with extra X-Zeq-* headers:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "KO42 is the metric tensioner — it enforces..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 1240, "completion_tokens": 420, "total_tokens": 1660 }
}
```

Headers:

```text
X-Zeq-Provider: fireworks
X-Zeq-Model:    accounts/fireworks/models/llama-v3p3-70b-instruct
X-Zeq-Tier:     free
```
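With curl, `-D -` writes the response headers to stdout, which makes the `X-Zeq-*` headers easy to inspect from the command line. The filter below is a hypothetical sketch (`zeq_headers` is not part of the framework) that keeps only those lines. Matching is case-insensitive because HTTP/2 responses carry header names in lowercase.

```shell
# zeq_headers: hypothetical filter (not part of the framework).
# Reads a raw HTTP header dump on stdin and prints only the X-Zeq-* lines.
zeq_headers() {
  # HTTP header lines end in CRLF; drop the CR so the output compares cleanly.
  tr -d '\r' | grep -i '^x-zeq-'
}

# Usage: -D - dumps headers to stdout, -o /dev/null discards the body.
#   curl -s -D - -o /dev/null -X POST https://YOUR-FRAMEWORK/api/chat/free \
#     -H "Content-Type: application/json" \
#     -d '{"messages":[{"role":"user","content":"hi"}],"stream":false}' |
#     zeq_headers
```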

Response · 200 OK (stream=true)

Standard SSE chunks:

```text
data: {"id":"...","choices":[{"delta":{"content":"KO42"}}]}

data: {"id":"...","choices":[{"delta":{"content":" is the"}}]}

data: [DONE]
```
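Because the chunks follow the OpenAI delta format, a client only has to concatenate `choices[0].delta.content` until the `data: [DONE]` sentinel. The sketch below does that in POSIX `awk` so it has no `jq` dependency. `sse_to_text` is a hypothetical name, and the sketch assumes one JSON object per `data:` line with at most one `"content"` key per chunk. It unescapes `\n`, `\t`, `\"` and `\\` but leaves `\uXXXX` sequences undecoded.

```shell
# sse_to_text: hypothetical helper (not part of the framework).
# Reads an OpenAI-style SSE stream on stdin and prints the concatenated
# choices[0].delta.content text, stopping at the "data: [DONE]" sentinel.
sse_to_text() {
  awk '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    /^data:/ {
      payload = substr($0, 6)                # drop the "data:" prefix
      sub(/^ /, "", payload)                 # and its optional single space
      if (payload == "[DONE]") exit
      # Chunks without a string "content" (e.g. the role-only first delta) are skipped.
      if (!match(payload, /"content"[ \t]*:[ \t]*"/)) next
      rest = substr(payload, RSTART + RLENGTH)
      out = ""
      for (i = 1; i <= length(rest); i++) {
        c = substr(rest, i, 1)
        if (c == "\\") {                     # start of a JSON escape sequence
          i++
          e = substr(rest, i, 1)
          if (e == "n") out = out "\n"
          else if (e == "t") out = out "\t"
          else if (e == "u") out = out "\\u" # keep \uXXXX undecoded
          else out = out e                   # covers \" \\ \/
        } else if (c == "\"") {
          break                              # closing quote of the content string
        } else {
          out = out c
        }
      }
      printf "%s", out
    }
  '
}

# Usage:
#   curl -N -s -X POST https://YOUR-FRAMEWORK/api/chat/free \
#     -H "Content-Type: application/json" \
#     -d '{"messages":[{"role":"user","content":"hi"}],"stream":true}' |
#     sse_to_text
```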

Self-heal on upstream error

If Fireworks (or your BYOK provider) returns a non-2xx status, the framework doesn't leak the upstream error. Instead it returns a graceful self-heal message, surfaced as a normal assistant message:

```text
The free-tier provider returned a hiccup (HTTP 400). The HulyaPulse
continues at 1.287 Hz. Try again in a moment, or add your own
Fireworks key in Settings → Models for unlimited use.
```

This keeps the Pulse's UX graceful and points the user to the BYOK upgrade path. The failed call is also refunded to the caller's per-IP quota, so it doesn't count against the free-tier limit.

Rate limits

- ~10 chats/day per visitor IP (sliding window).
- Calls that fail upstream are refunded to the window's quota (self-heal path).
- When the quota is exhausted, the endpoint returns `429` with a `Retry-After` header.
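A client that hits the limit can wait as long as the `Retry-After` header says before retrying. `Retry-After` may carry either a number of seconds or an HTTP-date; the hypothetical helper below (`retry_after_seconds`, not part of the framework) handles only the seconds form and falls back to a default for anything else. The 60-second default is an arbitrary assumption, not a documented value.

```shell
# retry_after_seconds [DEFAULT]: hypothetical helper (not part of the framework).
# Reads a raw HTTP header dump on stdin and prints the Retry-After delay in
# seconds. HTTP-date values and a missing header yield DEFAULT (60 if unset).
retry_after_seconds() {
  _default=${1:-60}
  _secs=$(tr -d '\r' | awk -F: '
    tolower($1) == "retry-after" {
      v = $2
      sub(/^[ \t]+/, "", v)   # trim whitespace around the value
      sub(/[ \t]+$/, "", v)
      print v
      exit
    }')
  case $_secs in
    ''|*[!0-9]*) echo "$_default" ;;   # missing, HTTP-date, or malformed
    *) echo "$_secs" ;;
  esac
}

# Usage: retry once after a 429, waiting as long as the server asks.
#   status=$(curl -s -D headers.txt -o reply.json -w '%{http_code}' \
#     -X POST https://YOUR-FRAMEWORK/api/chat/free \
#     -H "Content-Type: application/json" -d @body.json)
#   if [ "$status" = 429 ]; then
#     sleep "$(retry_after_seconds < headers.txt)"
#     # then re-send the same request
#   fi
```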

CORS

`Access-Control-Allow-Origin: *`. The Pulse on any page (framework, hosted `/s/<slug>/`, or third-party embed via `/embed/orb.js`) can call this endpoint directly.

Related

- Pulse: what calls this
- BYOK: bypass the free-tier limit by adding your own key
- `/api/zeq/agent/page-chat`: the machine-bound, paid alternative
