POST /api/chat/free

Free-tier chat endpoint. OpenAI-shape request, OpenAI-shape response. Routes through the foundation-seeded Fireworks credential by default; falls through to the caller's active Fireworks BYOK credential if one exists. Used by the Pulse in guest mode.

A kernel CKO envelope is computed pre-call and injected into the messages so the model knows the live framework state (operators it can cite, R(t), phase, proof_digest of recent computes). Self-heals on upstream errors with a graceful message rather than leaking the provider's error text.

Auth

None required. CORS-open, rate-limited per IP. The Pulse on any page can call it directly.

Cost

0 ZEQ to the caller. The framework absorbs the cost via the foundation-seeded credential. Free-tier limit: ~10 chats/day per visitor IP, applied at the rate-limiter layer.

Request

curl -X POST https://YOUR-FRAMEWORK/api/chat/free \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What does operator KO42 do?" }
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 4096,
    "model": null
  }'
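
For callers who are not using curl, the same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: `build_free_chat_request` and `send` are hypothetical helper names, and `https://YOUR-FRAMEWORK` is the same placeholder as in the curl command.

```python
import json
import urllib.request

BASE_URL = "https://YOUR-FRAMEWORK"  # placeholder host, as in the curl example


def build_free_chat_request(messages, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/chat/free."""
    # stream=False asks for a single JSON response instead of SSE.
    body = {"messages": messages, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/chat/free",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call happens.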

Body

| Field | Type | Required | Notes |
|---|---|---|---|
| messages | OpenAI-shape array | yes | [{role, content}, ...]. Only the last 20 messages are kept. |
| stream | bool | no | Default true (SSE). Set false for single-shot JSON. |
| temperature | float (0–2) | no | Default 0.7. |
| max_tokens | int | no | Default 4096 (Fireworks's non-streaming cap). |
| model | string | no | Overrides the default model, accounts/fireworks/models/llama-v3p3-70b-instruct. |
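
The body fields above can be mirrored in a small client-side builder. The sketch below is illustrative only: `build_payload` is a hypothetical name, the defaults and the 0 to 2 temperature range come from the table, and trimming to 20 messages on the client is an assumption that simply anticipates what the endpoint is described as doing. Rejecting an empty `messages` list and omitting `model` when it is unset are also assumptions.

```python
MAX_MESSAGES = 20  # the endpoint is documented as keeping only the last 20


def build_payload(messages, stream=True, temperature=0.7,
                  max_tokens=4096, model=None):
    """Return a request body for /api/chat/free using the documented defaults."""
    if not messages:
        raise ValueError("messages is required")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload = {
        "messages": list(messages[-MAX_MESSAGES:]),  # drop older history
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Assumption: leaving out "model" selects the default model.
    if model is not None:
        payload["model"] = model
    return payload
```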

Response · 200 OK (stream=false)

OpenAI chat-completions shape, with extra X-Zeq-* headers:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "KO42 is the metric tensioner — it enforces..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 1240, "completion_tokens": 420, "total_tokens": 1660 }
}

Headers:

X-Zeq-Provider: fireworks
X-Zeq-Model: accounts/fireworks/models/llama-v3p3-70b-instruct
X-Zeq-Tier: free
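
A caller usually wants the assistant text plus the X-Zeq-* metadata from a non-streaming response. The following is a minimal sketch, assuming the body has already been decoded from JSON and the headers are available as a plain mapping; `extract_reply` is a hypothetical helper, not part of the framework.

```python
def extract_reply(body, headers):
    """Pull the assistant text, token usage, and X-Zeq-* headers from a response."""
    choice = body["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": body.get("usage", {}).get("total_tokens"),
        # Header names are case-insensitive, so normalize them to lowercase.
        "zeq": {
            name.lower(): value
            for name, value in headers.items()
            if name.lower().startswith("x-zeq-")
        },
    }
```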

Response · 200 OK (stream=true)

Standard SSE chunks:

data: {"id":"...","choices":[{"delta":{"content":"KO42"}}]}

data: {"id":"...","choices":[{"delta":{"content":" is the"}}]}

data: [DONE]
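
The chunks above can be reassembled into text by reading each `data:` line, stopping at `[DONE]`, and concatenating the `delta.content` pieces. This is a sketch under the assumption that the stream has already been decoded into text lines; `iter_sse_text` is a hypothetical helper name.

```python
import json


def iter_sse_text(lines):
    """Yield content fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece
```

Joining the yielded fragments with `"".join(...)` gives the full assistant message once the stream ends.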

Self-heal on upstream error

If Fireworks (or your BYOK provider) returns a non-2xx status, the framework doesn't leak the upstream error. It returns a graceful self-heal message instead, surfaced as a normal assistant message:

The free-tier provider returned a hiccup (HTTP 400). The HulyaPulse
continues at 1.287 Hz. Try again in a moment, or add your own
Fireworks key in Settings → Models for unlimited use.

This keeps the Pulse's UX graceful and points the user at the BYOK upgrade path. A per-IP refund is applied so the failed call doesn't count against the free-tier quota.
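
Because the self-heal text arrives as an ordinary assistant message, a client that wants to tell it apart from a real answer has to inspect the content. The sketch below is a heuristic only: it assumes the wording shown in the example message stays stable, which this page does not promise, and `self_heal_status` is a hypothetical helper name.

```python
import re

# Assumption: the self-heal message keeps the "hiccup (HTTP nnn)" wording.
_SELF_HEAL = re.compile(r"returned a hiccup \(HTTP (\d{3})\)")


def self_heal_status(text):
    """Return the upstream HTTP status if text looks like a self-heal message, else None."""
    match = _SELF_HEAL.search(text)
    return int(match.group(1)) if match else None
```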

Rate limits

  • ~10 chats/day per visitor IP (sliding window).
  • Per-window quota refunds on upstream failure (self-heal path).
  • 429 returned with Retry-After header when exhausted.
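
A client can honor the 429 by waiting for the period given in Retry-After before trying again. This page does not say whether the header carries seconds or an HTTP date, so the sketch below accepts both forms allowed by the HTTP specification; `retry_after_seconds` is a hypothetical helper name.

```python
import time
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into seconds to wait, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    now = time.time() if now is None else now
    # Never return a negative wait if the date is already in the past.
    return max(0.0, when.timestamp() - now)
```

A caller would typically sleep for the returned number of seconds, falling back to a fixed delay of its own choosing when the result is `None`.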

CORS

Access-Control-Allow-Origin: *. The Pulse on any page (framework, hosted /s/<slug>/, or third-party embed via /embed/orb.js) can call this endpoint directly.