
ZSC Bootstrap & Master-Key Operations

This is the operator runbook for the ZSC (Zeq Secure Context) bootstrap layer. Read it once before flipping ZEQ_ZSC_BOOTSTRAP_MODE away from the default. Re-read the rotation procedure (§3) before every master-key rotation.

Source: shared/api-core/src/lib/zsc/bootstrapKms.ts + bootstrapKmsAws.ts + bootstrapKmsGcp.ts. Background: ZSC Secure Context.


0. What "the master key" is

ZSC encrypts every secret in zsc_secrets with AES-256-GCM. The 256-bit key for that cipher is derived from a 32-byte master key through PBKDF2-SHA256 (200,000 iterations, salt = HULYAS physical constants — see shared/api-core/src/lib/zeqField.ts).

That master key is the root of trust for the entire vault. Lose it → no plaintext recoverable. Leak it → every secret is potentially exposed. Treat it like the keys to the kingdom, because it literally is.
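As a rough illustration of the derivation described above, the sketch below derives a 256-bit key from a 32-byte master with PBKDF2-SHA256 at 200,000 iterations and uses it for an AES-256-GCM round trip. The salt, the function names (`deriveCipherKey`, `seal`, `unseal`) and the `SealedSecret` shape are assumptions made for this example; the real salt and storage layout live in zeqField.ts and are not reproduced here.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

// Placeholder salt (assumption): the real salt is defined in zeqField.ts.
const EXAMPLE_SALT = Buffer.from("example-salt-not-the-real-one", "utf8");

interface SealedSecret {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// Derive the 256-bit cipher key from a 32-byte master (PBKDF2-SHA256, 200,000 iterations).
function deriveCipherKey(master: Buffer): Buffer {
  if (master.length !== 32) {
    throw new Error(`master key must be 32 bytes, got ${master.length}`);
  }
  return pbkdf2Sync(master, EXAMPLE_SALT, 200_000, 32, "sha256");
}

// Encrypt one secret with AES-256-GCM under a fresh random 96-bit IV.
function seal(key: Buffer, plaintext: string): SealedSecret {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypt; throws if the key is wrong or the data was modified (GCM tag mismatch).
function unseal(key: Buffer, sealed: SealedSecret): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}
```

Decrypting with a key derived from any other master fails the GCM authentication check, which is why a lost master leaves the stored rows unrecoverable.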

The bootstrap layer is "where does that 32-byte master come from at server boot." Three modes ship in v1:

| Mode | Source | Audience |
| --- | --- | --- |
| env (default) | process.env.ZEQ_FIELD_KEY or ZEQ_ZSC_MASTER_KEY | local-dev, single-node deployments |
| aws-kms | AWS KMS Decrypt API | AWS-hosted production |
| gcp-kms | Google Cloud KMS Decrypt API | GCP-hosted production |

Selection: ZEQ_ZSC_BOOTSTRAP_MODE=env|aws-kms|gcp-kms.
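A minimal sketch of how the mode variable could be parsed, assuming an unset or empty value means the documented default (`env`) and any other unrecognized value is rejected rather than silently mapped to `env`. The function name `parseBootstrapMode` is hypothetical, and the actual dispatcher in bootstrapKms.ts may handle unknown values differently.

```typescript
type BootstrapMode = "env" | "aws-kms" | "gcp-kms";

const MODES: readonly BootstrapMode[] = ["env", "aws-kms", "gcp-kms"];

// Map the raw ZEQ_ZSC_BOOTSTRAP_MODE value to a known mode.
function parseBootstrapMode(raw: string | undefined): BootstrapMode {
  // Assumption: unset or empty selects the documented default.
  if (raw === undefined || raw === "") return "env";
  const found = MODES.find((m) => m === raw);
  if (found === undefined) {
    // Assumption: a typo should stop the boot instead of falling back to env.
    throw new Error(`ZEQ_ZSC_BOOTSTRAP_MODE must be one of ${MODES.join("|")}, got "${raw}"`);
  }
  return found;
}
```

Rejecting unknown values keeps a misspelled `aws-kms` from quietly booting a production node in env mode.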

The bootstrap layer is non-invasive in v1: zeqField.ts continues to read ZEQ_FIELD_KEY directly today. Wiring KMS mode into the cipher is documented in §4 — an opt-in operator step, not a framework default.


1. Files in this layer

shared/api-core/src/lib/zsc/
  bootstrapKms.ts     ← interface + dispatcher + env adapter + cache
  bootstrapKmsAws.ts  ← AWS KMS adapter (SDK lazy-loaded)
  bootstrapKmsGcp.ts  ← GCP KMS adapter (SDK lazy-loaded)

Public exports from bootstrapKms.ts:

  • interface BootstrapAdapter { mode; load(): Promise<Buffer> }
  • type BootstrapMode = "env" | "aws-kms" | "gcp-kms"
  • getBootstrapAdapter(): Promise<BootstrapAdapter> — dispatcher
  • loadBootstrapKey(): Promise<Buffer> — load + cache the master
  • getCachedBootstrapKey(): Buffer | null — sync accessor for cached bytes
  • getBootstrapMode(): BootstrapMode | null — sync accessor for active mode
  • announceBootstrap(): void — boot-time log line
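To make the adapter shape concrete, here is a hypothetical env adapter with a load-once cache. It is a sketch, not the code in bootstrapKms.ts: it assumes the env value is 64 hex characters, that ZEQ_FIELD_KEY takes precedence over ZEQ_ZSC_MASTER_KEY, and the helper names (`decodeHexMaster`, `makeEnvAdapter`, `loadOnce`, `peekCached`) are invented for the example.

```typescript
type BootstrapMode = "env" | "aws-kms" | "gcp-kms";

interface BootstrapAdapter {
  mode: BootstrapMode;
  load(): Promise<Buffer>;
}

// Assumption: the master is stored as 64 hex characters (32 bytes).
function decodeHexMaster(hex: string | undefined): Buffer {
  if (hex === undefined || !/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("master key must be 64 hex characters (32 bytes)");
  }
  return Buffer.from(hex, "hex");
}

// Hypothetical env adapter; the precedence between the two variables is assumed.
function makeEnvAdapter(env: Record<string, string | undefined>): BootstrapAdapter {
  return {
    mode: "env",
    load: async () => decodeHexMaster(env["ZEQ_FIELD_KEY"] ?? env["ZEQ_ZSC_MASTER_KEY"]),
  };
}

let cached: Buffer | null = null;

// Load through the adapter once, then serve the cached bytes.
async function loadOnce(adapter: BootstrapAdapter): Promise<Buffer> {
  if (cached === null) cached = await adapter.load();
  return cached;
}

// Sync accessor: null until loadOnce() has completed successfully.
function peekCached(): Buffer | null {
  return cached;
}
```

The split between an async loader and a sync accessor mirrors the exports listed above: callers that run after boot can read the cached bytes without awaiting.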

2. Mode-flip checklist (env → aws-kms or gcp-kms)

Do these in order. Skipping a step costs you a maintenance window.

2.1 AWS KMS

[ ] 1. Generate fresh 32-byte master locally (don't reuse the dev key):
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
→ save as PLAINTEXT (you will delete this after step 5).

[ ] 2. Create or pick the KMS CMK:
aws kms create-key --description "Zeq ZSC master key v1"
→ note the KeyId / Arn.

[ ] 3. Encrypt the master under the CMK:
echo -n "<64-hex-master>" | xxd -r -p > master.bin
aws kms encrypt --key-id arn:aws:kms:REGION:ACCT:key/UUID \
--plaintext fileb://master.bin \
--query CiphertextBlob --output text > master.ct.b64

[ ] 4. Install the SDK in api-core:
npm i --workspace shared/api-core @aws-sdk/client-kms

[ ] 5. Set env on every api-core node (then secure-erase master.bin):
ZEQ_ZSC_BOOTSTRAP_MODE=aws-kms
ZEQ_AWS_KMS_KEY_ID=arn:aws:kms:REGION:ACCT:key/UUID
ZEQ_AWS_KMS_CIPHERTEXT_B64=<contents of master.ct.b64>
AWS_REGION=us-east-1 (or your region)
# AWS credentials via IAM role (preferred) or static keys.

[ ] 6. Wire bootstrap into zeqField.ts — see §4.

[ ] 7. Restart api-core. Confirm boot log:
[zsc/bootstrapKms] master key loaded mode=aws-kms keyBytes=32
[zsc/bootstrapKms] adapter announce — mode=aws-kms status=loaded

[ ] 8. Verify decryption works end-to-end:
/api/zsc/list → status 200 with all rows
/api/health → 200 with X-Zeq-Origin header

[ ] 9. After 24h of clean operation, confirm that no local copy of
master.bin remains (it should have been secure-erased in step 5)
and revoke any temporary IAM keys used for KMS encrypt.

2.2 GCP KMS

[ ] 1. Generate fresh 32-byte master locally (write to master.bin, not env):
node -e "process.stdout.write(require('crypto').randomBytes(32))" > master.bin

[ ] 2. Create or pick the GCP KMS key:
gcloud kms keys create zsc-master \
--keyring=zeq --location=global --purpose=encryption
→ note KEY_NAME = projects/.../cryptoKeys/zsc-master

[ ] 3. Encrypt the master under the KMS key:
gcloud kms encrypt \
--key=zsc-master --keyring=zeq --location=global \
--plaintext-file=master.bin \
--ciphertext-file=master.ct.bin
base64 < master.ct.bin | tr -d '\n' > master.ct.b64
(strips the line wrapping GNU base64 adds, so the value fits in one env var)

[ ] 4. Install the SDK in api-core:
npm i --workspace shared/api-core @google-cloud/kms

[ ] 5. Set env on every api-core node:
ZEQ_ZSC_BOOTSTRAP_MODE=gcp-kms
ZEQ_GCP_KMS_KEY_NAME=projects/PROJ/locations/global/keyRings/zeq/cryptoKeys/zsc-master
ZEQ_GCP_KMS_CIPHERTEXT_B64=<contents of master.ct.b64>
GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa.json
(or use Workload Identity if running on GKE)

[ ] 6. Wire bootstrap into zeqField.ts — see §4.

[ ] 7. Restart api-core. Confirm boot log:
[zsc/bootstrapKms] master key loaded mode=gcp-kms keyBytes=32

[ ] 8. Verify decryption end-to-end (same checks as AWS step 8).

[ ] 9. Secure-delete master.bin + master.ct.bin from your laptop after
24h of clean operation.
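Before restarting api-core in either KMS mode, a small pre-flight check can catch a missing variable or a line-wrapped ciphertext. The sketch below is hypothetical (the `preflight` helper is not part of the framework); the variable names come from step 5 of the two checklists, and GOOGLE_APPLICATION_CREDENTIALS is left out because Workload Identity makes it optional.

```typescript
type KmsMode = "aws-kms" | "gcp-kms";

// Variables listed in step 5 of each checklist.
const REQUIRED: Record<KmsMode, readonly string[]> = {
  "aws-kms": ["ZEQ_AWS_KMS_KEY_ID", "ZEQ_AWS_KMS_CIPHERTEXT_B64", "AWS_REGION"],
  "gcp-kms": ["ZEQ_GCP_KMS_KEY_NAME", "ZEQ_GCP_KMS_CIPHERTEXT_B64"],
};

// Return a list of human-readable problems; an empty list means the env looks plausible.
function preflight(mode: KmsMode, env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const varName of REQUIRED[mode]) {
    const value = env[varName];
    if (value === undefined || value.trim() === "") {
      problems.push(`${varName} is not set`);
    }
  }
  const ctName = mode === "aws-kms" ? "ZEQ_AWS_KMS_CIPHERTEXT_B64" : "ZEQ_GCP_KMS_CIPHERTEXT_B64";
  const ct = env[ctName];
  // Only a shape check: it cannot tell whether KMS will accept the ciphertext.
  if (ct !== undefined && ct.trim() !== "" && !/^[A-Za-z0-9+/]+={0,2}$/.test(ct)) {
    problems.push(`${ctName} is not single-line base64`);
  }
  return problems;
}
```

Calling `preflight("aws-kms", process.env)` on a node and refusing to restart when the list is non-empty is one way to use it; the check validates shape only and does not contact KMS.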

3. Master-key rotation (any mode → same mode, new key)

Goal: swap the 32-byte master from generation N to N+1 with zero downtime, no audit-chain break, every existing secret re-encrypted.

The plan exploits ZEQ_FIELD_KEY_PREV: zeqField.ts already supports decrypting with a previous-generation key during rotation. So the procedure is:

[ ] 1. Generate the new master (M2):
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"

[ ] 2. On every api-core node, set BOTH old + new keys:
ZEQ_FIELD_KEY_PREV=<old M1> (was ZEQ_FIELD_KEY)
ZEQ_FIELD_KEY=<new M2>
Rolling-restart api-core. Each node now decrypts with M1 OR M2,
encrypts with M2.

[ ] 3. Re-encrypt every row in zsc_secrets under M2:
Run the rotation daemon's full-sweep mode (Phase Ω5):
curl -X POST -H "Cookie: zeq_admin=..." \
-H "Content-Type: application/json" \
-d '{"force_all": true}' \
http://YOUR-FRAMEWORK/api/zsc/rotate

Or wait for the daemon's natural 100-Z (~78 s) tick to chew
through everything past its expires_zeqond.

[ ] 4. After every secret is on M2 (audit log shows secret_rotated
rows for each name, IV changed), remove the old key:
Unset ZEQ_FIELD_KEY_PREV on every node.
Rolling-restart api-core.

[ ] 5. Decommission M1: shred local copies, archive ciphertext if
compliance requires it.
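The "decrypt with M1 or M2, encrypt with M2" behaviour that the steps above rely on can be sketched as follows. This is an illustration under assumptions, not the zeqField.ts implementation: the keys are taken to be already-derived 32-byte AES keys, the current key is tried before the previous one, and all names (`Sealed`, `encryptWith`, `tryDecrypt`, `decryptDuringRotation`, `reencrypt`) are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// Encrypt under one key with a fresh random IV.
function encryptWith(key: Buffer, plaintext: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Return the plaintext, or null when the GCM tag does not verify under this key.
function tryDecrypt(key: Buffer, s: Sealed): string | null {
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, s.iv);
    decipher.setAuthTag(s.tag);
    return Buffer.concat([decipher.update(s.ciphertext), decipher.final()]).toString("utf8");
  } catch {
    return null;
  }
}

// Try the current generation first, then the previous one (order is an assumption).
function decryptDuringRotation(
  current: Buffer,
  previous: Buffer | null,
  s: Sealed,
): { plaintext: string; usedPrevious: boolean } {
  const withCurrent = tryDecrypt(current, s);
  if (withCurrent !== null) return { plaintext: withCurrent, usedPrevious: false };
  if (previous !== null) {
    const withPrevious = tryDecrypt(previous, s);
    if (withPrevious !== null) return { plaintext: withPrevious, usedPrevious: true };
  }
  throw new Error("secret does not decrypt under either key generation");
}

// Move one row to the current key; the new row gets a new IV.
function reencrypt(current: Buffer, previous: Buffer | null, s: Sealed): Sealed {
  return encryptWith(current, decryptDuringRotation(current, previous, s).plaintext);
}
```

Once every row has passed through something like `reencrypt`, nothing needs the previous key any more, which is the condition step 4 waits for before unsetting ZEQ_FIELD_KEY_PREV.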

KMS-mode variant: for aws-kms / gcp-kms, replace step 2 — encrypt M2 under your CMK, push the new ciphertext to ZEQ_AWS_KMS_CIPHERTEXT_B64 (or ZEQ_GCP_KMS_CIPHERTEXT_B64) and the old ciphertext to a parallel env var your zeqField wrapper checks during the transition window. The simplest path: rotate env mode first (steps 1–5), then mode-flip to KMS (§2) once on the new master.

Why the rotation daemon helps: because expires_zeqond is bumped on every successful rotation, step 3 does not have to be a thundering-herd force_all. You can instead set every secret's expires_zeqond to currentZeqondNumber() via SQL and let the daemon's 100-Z tick work through the backlog ROTATION_BATCH=64 secrets at a time, at the cadence operators have already tuned. The CLI exposes this as pulse > context rotate --all.
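For planning the transition window, a back-of-envelope estimate of how long the daemon needs is sketched below. It assumes the daemon rotates exactly one full batch of 64 per 100-Zeqond tick, that every secret is already expired, and that one Zeqond is about 0.777 s (inferred from the "100 Zeqonds (~77.7 s)" figure in §7); `sweepEstimate` is a hypothetical helper.

```typescript
const ROTATION_BATCH = 64; // secrets per tick (§7)
const ROTATION_CHECK_EVERY_Z = 100; // Zeqonds between ticks (§7)
const SECONDS_PER_ZEQOND = 0.777; // assumption, inferred from 100 Z ≈ 77.7 s

// Estimate how many ticks and wall-clock seconds a full sweep would take.
function sweepEstimate(secretCount: number): { ticks: number; seconds: number } {
  const ticks = Math.ceil(secretCount / ROTATION_BATCH);
  return { ticks, seconds: ticks * ROTATION_CHECK_EVERY_Z * SECONDS_PER_ZEQOND };
}
```

Under these assumptions a vault of 1,000 secrets needs 16 ticks, roughly 1,243 s (about 21 minutes), so ZEQ_FIELD_KEY_PREV should stay set for at least that long.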


4. Wiring the bootstrap into zeqField.ts (operator-controlled)

By default the bootstrap module loads the master but zeqField.ts doesn't consult it — the cipher still reads process.env.ZEQ_FIELD_KEY directly. This is intentional: every existing local-dev deployment keeps working untouched.

To activate KMS-sourced keys, apply this small patch at the top of deriveKey() in shared/api-core/src/lib/zeqField.ts:

function deriveKey(): Buffer {
  // ── Ω8 bootstrap hook (operator-enabled) ──────────────────────────
  const bootstrapKey = (() => {
    try {
      // require-on-demand keeps zeqField.ts importable in test envs
      // that don't have the zsc module wired.
      // eslint-disable-next-line @typescript-eslint/no-require-imports
      const m = require("./zsc/bootstrapKms.js");
      return (m.getCachedBootstrapKey?.() ?? null) as Buffer | null;
    } catch {
      return null;
    }
  })();
  if (bootstrapKey && bootstrapKey.length === 32) {
    return pbkdf2Sync(bootstrapKey, ZEQFIELD_SALT, 200_000, 32, "sha256");
  }
  // ── existing env-fallback path (unchanged) ────────────────────────
  let baseSecret: Buffer;
  ...
}

And in shared/api-core/src/start.ts (or your equivalent boot file), before any encryption-touching route is mounted:

import { loadBootstrapKey, announceBootstrap } from "./lib/zsc/bootstrapKms.js";

// ...inside the async boot sequence, before mountRoutes(...) ...
await loadBootstrapKey();
announceBootstrap();

loadBootstrapKey() throws on failure — let it propagate so the server fails to start with a clear, operator-visible error. Do not catch and continue — encrypting under a fallback key when KMS is unreachable is worse than failing closed.


5. Recovery (master key lost or compromised)

There is no "decryption rescue" if the master is gone. The cipher is AES-256-GCM keyed through a 200,000-iteration PBKDF2 derivation, and no practical attack recovers plaintext without the master. Recovery procedures exist for two scenarios:

5.1 Master lost, vault intact (you still have the DB rows)

If ZEQ_FIELD_KEY is gone and you have no backup, the secrets in zsc_secrets are permanently undecryptable. The recovery plan is to re-bootstrap with a fresh master and re-source every secret from upstream (cloud provider portals, developers' password managers, etc.).

[ ] 1. Generate fresh master per §2.1 / §2.2 step 1.
[ ] 2. TRUNCATE TABLE zsc_secrets — the encrypted blobs are dead data.
(DO NOT truncate audit_log — the secret_read/set/rotated rows
remain a tamper-evident record of what was there.)
[ ] 3. Re-set every secret via /api/zsc/set:
pulse > context set STRIPE_SECRET_KEY sk_live_xxx
pulse > context set OPENAI_API_KEY sk-xxx
...
[ ] 4. Bump audit log with a "secret_set" row for each — gives you
forensic visibility of when the vault was reconstructed.
[ ] 5. Investigate root cause (env wipe, KMS revoke, …) before
considering the recovery complete.

5.2 Master compromised (a copy leaked)

Assume every secret encrypted under the leaked master is also leaked. Rotate every upstream first, vault second:

[ ] 1. At every upstream (Stripe, OpenAI, AWS, etc.), revoke the
leaked credential and issue a new one. THIS IS THE FIRST STEP.
The vault is downstream of those rotations.
[ ] 2. Generate fresh master per §2.1 / §2.2 step 1.
[ ] 3. Set ZEQ_FIELD_KEY_PREV=<leaked old> + ZEQ_FIELD_KEY=<new>.
[ ] 4. For each secret: /api/zsc/set with the NEW upstream credential.
The previous (compromised) plaintext is now irrelevant — the
upstream rotated, so the leaked value is dead.
[ ] 5. After all secrets re-set, unset ZEQ_FIELD_KEY_PREV. Restart.
[ ] 6. File an incident report citing audit_log rows for the affected
time window — the chain proves what was read, when, by whom.

5.3 Recovery-password mode (Phase AY design note)

Phase AY proposed an operator-set recovery password that PBKDF2-derives a parallel decryption path (so a single operator with the password can recover from §5.1 without truncating). It is not implemented in v1.287.5 — the design ships in _ZSC-RECOVERY-PASSWORD-DESIGN.md (Phase AZ). Until that ships, treat the master as the only key in town and back it up to your offline credential store the moment you generate it.


6. Verification matrix

After any of §2 / §3 / §5, run these and record output in the operator log:

| Check | Command | Expected |
| --- | --- | --- |
| Boot log | tail -n 50 logs/api-core.log | master key loaded mode=… |
| Adapter announce | tail -n 50 logs/api-core.log | adapter announce — mode=… status=loaded |
| Vault list | curl -s -H "Cookie: zeq_admin=…" $API/api/zsc/list | 200 + array of secrets |
| Decrypt round-trip | pulse > context info STRIPE_SECRET_KEY | shows expected metadata, no decrypt error |
| Audit entangled state head | curl -s -H "Cookie: zeq_admin=…" $API/api/zsc/audit/STRIPE_SECRET_KEY?limit=1 | latest row with valid proofDigest linkage |
| Origin marker | curl -sI $API/api/health | X-Zeq-Origin: present |

If any row fails, stop the deployment and walk back through §2 or §3 to find the deviation. The framework's ≤0.1% precision contract applies to the bootstrap layer too — silent fall-through to env mode after a KMS failure is a bug, not graceful degradation.
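The two log checks in the matrix can be automated with a small scan of the log tail. The sketch below assumes the boot lines look exactly like the ones quoted in §2.1 step 7 and that env mode logs in the same format; `checkBootLog` is a hypothetical helper, not a framework export.

```typescript
type ExpectedMode = "env" | "aws-kms" | "gcp-kms";

interface BootLogCheck {
  keyLoaded: boolean;
  announced: boolean;
}

// Look for the two bootstrap lines and require that they name the expected mode.
function checkBootLog(log: string, expectedMode: ExpectedMode): BootLogCheck {
  const lines = log.split(/\r?\n/).filter((l) => l.includes("[zsc/bootstrapKms]"));
  const keyLoaded = lines.some((l) =>
    l.includes(`master key loaded mode=${expectedMode} keyBytes=32`),
  );
  const announced = lines.some(
    (l) => l.includes("adapter announce") && l.includes(`mode=${expectedMode} status=loaded`),
  );
  return { keyLoaded, announced };
}
```

Because the expected mode is part of the match, a node that booted in env mode while the operator intended aws-kms reports both checks as false instead of passing.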


7. Constants for this layer

| Constant | Source | Value |
| --- | --- | --- |
| Master key length | bootstrapKms.ts | exactly 32 bytes (256-bit AES) |
| PBKDF2 iterations | zeqField.ts | 200,000 |
| Rotation period (default) | rotationDaemon.ts ROTATION_PERIOD_ZEQONDS | 86,400 Zeqonds (~18.6 h) |
| Rotation check cadence | rotationDaemon.ts ROTATION_CHECK_EVERY_Z | 100 Zeqonds (~77.7 s) |
| Rotation batch size | rotationDaemon.ts ROTATION_BATCH | 64 secrets per tick |
| Denial rate-limit window | zeqContext.ts DENIAL_WINDOW_ZEQONDS | 60 Zeqonds (~46.6 s) |
| Denial rate-limit threshold | zeqContext.ts DENIAL_THRESHOLD | 5 denials → rate_limited |

Adjust at deployment time via the matching ZSC_* env vars (see source comments). Every adjustment is recorded in audit_log as a bootstrap_config_changed row when the operator restarts.


8. What this runbook does not cover

  • Per-tenant master keys (multi-tenant SaaS isolation): roadmap, not v1.
  • Hardware Security Module (HSM) adapters (CloudHSM, YubiHSM, Thales): identical adapter shape to KMS — implement BootstrapAdapter, drop into bootstrapKms.ts dispatcher.
  • Cross-region key replication: handled by your KMS provider's native multi-region keys (AWS MRK, GCP key replicas). The adapter is region-agnostic.
  • Recovery-password mode: Phase AZ, not shipped in v1.287.5.

For any of the above, open a _ZSC-EXTENSION-… design doc before changing the bootstrap layer.