Genomics Analyzer
Pairwise and multiple sequence alignment, variant detection, phylogenetic trees — information theory as the underlying operator set, implemented to textbook precision.
- Live app — zeq.dev/apps/genomics-analyzer/
- Source — app/artifacts/api-server/public/apps/genomics-analyzer/ (1,665 lines)
- Operators — KO42 · CS43 · CS47 · optional QM12
- Error budget — ≤ 0.1% on Smith-Waterman optimal score and Jukes-Cantor distance
What it solves
A genomics workbench with three modes:
- Alignment — Smith-Waterman (local) and Needleman-Wunsch (global) for pairwise; ClustalW-style progressive for multiple
- Variant detection — SNV and small indel calling from read pileups via Bayesian likelihood
- Phylogeny — Neighbor-Joining, UPGMA, and Maximum-Likelihood trees with bootstrap support
QM12 (Dirac) activates only for ab initio electronic-structure calculations on modified bases; for standard ACGT work, the information-theoretic operators alone suffice.
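The variant-detection mode is described only as "Bayesian likelihood" over read pileups; the page does not give the model. As a rough illustration of the idea, here is a minimal sketch that assumes a diploid site with one reference and one alternate allele, a single per-base error rate, and a small fixed prior on non-reference genotypes. The function name `genotype_posteriors` and the parameters `error_rate` and `het_prior` are hypothetical and are not taken from the app.

```python
import math

def genotype_posteriors(pileup, ref, alt, error_rate=0.01, het_prior=0.001):
    """Posterior over diploid genotypes ref/ref, ref/alt, alt/alt at one site.

    pileup: string of observed bases at the site, one per read.
    error_rate and het_prior are illustrative assumptions, not app values.
    """
    # Probability that a single read shows the alt allele, per genotype.
    # A sequencing error is assumed to flip ref <-> alt.
    p_alt = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }
    prior = {
        "ref/ref": 1.0 - 1.5 * het_prior,
        "ref/alt": het_prior,
        "alt/alt": het_prior / 2.0,
    }
    log_post = {}
    for genotype, p in p_alt.items():
        ll = math.log(prior[genotype])
        for base in pileup:
            if base == alt:
                ll += math.log(p)
            elif base == ref:
                ll += math.log(1.0 - p)
            # Bases matching neither allele are ignored in this sketch.
        log_post[genotype] = ll
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / z for g, v in log_post.items()}

if __name__ == "__main__":
    print(genotype_posteriors("AAAAAGGGGG", ref="A", alt="G"))
```

A pileup with roughly half alternate reads favours `ref/alt`, while a pileup of only reference bases favours `ref/ref`. A production caller would also use per-base quality scores and mapping quality, which this sketch leaves out.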
The math
CS43 T(n) = O(n log n) (suffix-array construction, tree ops)
CS47 H = −∑_i p_i log p_i (sequence Shannon entropy)
SW score H(i,j) = max{0, H(i−1,j−1)+s(a,b), H(i−1,j)−g, H(i,j−1)−g}
Jukes-Cantor d = −(3/4) ln(1 − (4/3) p_diff)
NJ criterion Q(i,j) = (n−2) d_{ij} − ∑_k d_{ik} − ∑_k d_{jk}
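The entropy, Jukes-Cantor, and Neighbor-Joining formulas above translate almost line for line into code. The sketch below is illustrative only: the function names are hypothetical, the entropy is computed over symbol frequencies within a single sequence, and the Jukes-Cantor input is assumed to be two already-aligned, gap-free DNA sequences of equal length.

```python
import math
from collections import Counter

def shannon_entropy(seq, base=2):
    """H = -sum_i p_i log p_i over the symbol frequencies in seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def jukes_cantor(seq_a, seq_b):
    """d = -(3/4) ln(1 - (4/3) p_diff) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    p_diff = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p_diff >= 0.75:
        # The logarithm's argument reaches zero: the distance is undefined.
        raise ValueError("p_diff >= 3/4: sequences are saturated")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_diff)

def nj_q_matrix(d):
    """Q(i,j) = (n-2) d_ij - sum_k d_ik - sum_k d_jk for a symmetric matrix d."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    return [
        [(n - 2) * d[i][j] - row_sums[i] - row_sums[j] if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]

if __name__ == "__main__":
    print(shannon_entropy("ACGT"))                      # uniform over 4 bases
    print(jukes_cantor("ACGTACGTAC", "ACGTACGTAT"))     # p_diff = 0.1
```

`shannon_entropy("ACGT")` gives 2 bits, the maximum for a four-letter alphabet. In Neighbor-Joining, the pair (i, j) with the smallest Q value is the one joined at each step.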
Operator picks
| Step | Decision |
|---|---|
| 1. Prime | KO42 on |
| 2. Limit | KO42 + CS43 + CS47 = 3 operators |
| 3. Scale | 10²–10⁹ base pairs |
| 4. Precision | ≤ 0.1% on SW score, JC distance |
| 5. Compile | C_KO42 + C_CS43 + C_CS47 |
| 6. Execute | Z encodes substitution matrix, gap penalties, sequence length |
| 7. Verify | SW match to 3 sig figs on the BLOSUM62 test pair |
Runnable worked example — Smith-Waterman local alignment
Sequences PLEASANTLY and MEANLY, BLOSUM62, gap open −11, gap extend −1. Published SW local optimum score = 16 (match EANLY).
```bash
curl -s -X POST https://api.zeq.dev/api/playground/compute \
  -H "Content-Type: application/json" \
  -H "x-demo-key: $DEMO_KEY" \
  -d '{
    "operators": ["KO42","CS43","CS47"],
    "params": {
      "problem": "smith_waterman",
      "seq_a": "PLEASANTLY",
      "seq_b": "MEANLY",
      "matrix": "BLOSUM62",
      "gap_open": -11,
      "gap_extend": -1
    }
  }' | jq
```
Expected:
```json
{
  "result": {
    "optimal_score": 16,
    "aligned_a": "EANLY",
    "aligned_b": "EANLY",
    "identity_pct": 100.0,
    "operators_used": ["KO42","CS43","CS47"]
  }
}
```
Exact integer-score match. The 0.1% budget is reserved for approximate methods (entropy-based tree lengths, ML phylogeny).
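For readers who want to see the dynamic programme itself rather than the API call, here is a minimal local-alignment sketch. It follows the linear-gap recurrence given under "The math", not the affine open/extend scheme used in the request above, and it uses a toy match/mismatch scorer instead of BLOSUM62, so its scores are not comparable to the figure quoted for the API. The names `smith_waterman` and `simple_score` are hypothetical.

```python
def smith_waterman(a, b, score, gap):
    """Local alignment with a linear gap penalty.

    H(i,j) = max(0, H(i-1,j-1) + s(a_i, b_j), H(i-1,j) - gap, H(i,j-1) - gap)
    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                H[i - 1][j] - gap,
                H[i][j - 1] - gap,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

def simple_score(x, y):
    """Toy scorer (an assumption for illustration): +2 match, -1 mismatch."""
    return 2 if x == y else -1

if __name__ == "__main__":
    print(smith_waterman("PLEASANTLY", "MEANLY", simple_score, gap=2))
```

Passing the scorer as a function keeps the recurrence independent of the substitution matrix: a BLOSUM62 lookup could be supplied in place of `simple_score` without touching the fill or traceback loops. Supporting separate gap-open and gap-extend penalties would need the three-matrix (Gotoh) variant, which this sketch does not implement.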
Extend it
- Seed-and-extend (BLAST-like) — k-mer index + SW extension; benchmark against NCBI BLAST on a small database
- Variant-aware alignment — graph genome reference; allow the alignment to traverse known SNVs without penalty
- Phylogenetic uncertainty — 1000-bootstrap tree; CS47 entropy on clade frequencies
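As a starting point for the seed-and-extend idea in the list above, the sketch below builds an exact-match k-mer index over a target sequence and looks up seed hits for a query. It covers only the seeding half; the extension step would hand each seed region to a local aligner. The function names `build_kmer_index` and `find_seeds` are hypothetical, and real BLAST-style tools add neighbourhood words and hit filtering that are omitted here.

```python
from collections import defaultdict

def build_kmer_index(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def find_seeds(query, index, k):
    """Return (query_pos, target_pos) pairs for every shared k-mer."""
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for t in index.get(query[q:q + k], []):
            seeds.append((q, t))
    return seeds

if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGT", 3)
    print(find_seeds("TACG", idx, 3))
```

Seeds that share the same diagonal (target position minus query position) belong to the same ungapped match, so grouping by diagonal before extension avoids aligning the same region more than once.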
Seeds
- Ancient DNA — damaged-sequence models with QM12 electronic-damage operators
- Single-cell phylogenetics — tree over 10⁵ cells; CS43 governs feasibility
- Genome-scale biosecurity scanning — scan for known-threat motifs at 1.287 Hz live-stream cadence
Papers
- Zeq Paper — doi:10.5281/zenodo.18158152