Every answer page can show an ✦ INTERESTING badge, and the archive ranks questions by the score behind it. Here's how that score is computed. No human votes feed into it; it's entirely derived from what the substrate did when it tried to answer.
The idea, in one sentence
The most editorially interesting answers are the ones where the substrate visibly wrestled: it found the question's framing alien, pushed back from inside its tradition, and grounded the pushback in cited passages. Bland direct answers and flat refusals score lower.
The formula
Each answer starts at 50 and collects adjustments. The final score is clamped to 0–100. Discrete components set the answer's broad shape; continuous substrate signals (top1, spread10, prose length) differentiate answers within the same shape, so the sort never collapses to ties. Anything ≥ 80 gets the ✦ badge.
| signal | weight |
| --- | --- |
| baseline (every answer starts here) | +50 |
| mode = objection (substrate reframed) | +12 |
| verdict = grounded (citations resolved) | +8 |
| verdict = refused (honest silence) | −22 |
| verdict = ungrounded (cited outside the workspace) | −8 |
| alien + grounded combo (disagreed but had reasons) | +2 |
| signal.top1 (substrate confidence in its #1 passage) | 0 to +14 |
| signal.spread10 (gap between top-1 and top-10 retrievals) | 0 to +18 |
| prose length (sweet spot ~300–600 chars) | 0 to +12 |
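To make the table concrete, here is a minimal sketch of how the adjustments could combine. It is not the code in webapp/lib/interestingness.ts: the type and function names (`AnswerShape`, `ContinuousPoints`, `interestingness`, `hasBadge`) are hypothetical, `alien` is assumed to be a boolean, and the three continuous signals are passed in as already-scaled points because this page does not spell out every scaling curve.

```typescript
// Illustrative sketch only; names and field shapes are assumptions.
type Verdict = "grounded" | "refused" | "ungrounded";

interface AnswerShape {
  mode: string;     // "objection" when the substrate reframed the question
  verdict: Verdict;
  alien: boolean;   // assumed boolean form of signal.alien
}

// Continuous contributions, already converted to points.
interface ContinuousPoints {
  top1: number;     // 0 to +14
  spread10: number; // 0 to +18
  length: number;   // 0 to +12
}

const BADGE_THRESHOLD = 80;

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

function interestingness(a: AnswerShape, c: ContinuousPoints): number {
  let score = 50; // baseline: every answer starts here

  // Discrete components set the broad shape.
  if (a.mode === "objection") score += 12;
  if (a.verdict === "grounded") score += 8;
  if (a.verdict === "refused") score -= 22;
  if (a.verdict === "ungrounded") score -= 8;
  if (a.alien && a.verdict === "grounded") score += 2;

  // Continuous signals differentiate answers within the same shape.
  score += clamp(c.top1, 0, 14);
  score += clamp(c.spread10, 0, 18);
  score += clamp(c.length, 0, 12);

  return clamp(score, 0, 100); // final score is clamped to 0–100
}

function hasBadge(score: number): boolean {
  return score >= BADGE_THRESHOLD;
}

// Worked example: 50 + 12 + 8 + 2 + 7 + 9 + 6 = 94, which clears the badge threshold.
const example = interestingness(
  { mode: "objection", verdict: "grounded", alien: true },
  { top1: 7, spread10: 9, length: 6 },
);
```

Note that a grounded objection with the alien combo reaches only 72 on its discrete components, so in this sketch the badge always depends on at least 8 points from the continuous signals.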
Why those weights
Objection mode (+12) is the biggest discrete bonus because it captures the editorial distinction the site is built on: when a mind reframes, it's doing something a chatbot wouldn't.
Grounded (+8) rewards answers whose citations resolve back to the retrieved workspace — no hallucinated verses.
Refused (−22) is the largest penalty because a silent answer is honest but not editorially rich. Refusals still appear in the archive; they just sort lower.
Signal spread is the gap between the substrate's top-1 retrieval and its top-10. A wide spread means the substrate has opinions: it cared about specific passages, not just “eh, these all kinda fit.” It is scaled continuously across the observed 0.006–0.045 range so two answers with the same discrete shape can still differ.
Top1 confidence is the substrate's similarity to its #1 chosen passage. High top1 = the substrate found a clear match rather than a fuzzy cluster.
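As a rough illustration of the signal spread scaling, the sketch below maps a raw spread10 value onto the 0 to +18 point range. The 0.006–0.045 bounds and the 18-point ceiling come from this page; the linear interpolation, the clamping of values outside the observed range, and the name `spreadPoints` are assumptions, not a description of the actual implementation.

```typescript
// Assumed linear scaling of signal.spread10 onto 0..18 points.
const SPREAD_MIN = 0.006;      // low end of the observed range
const SPREAD_MAX = 0.045;      // high end of the observed range
const SPREAD_MAX_POINTS = 18;  // maximum contribution from the weights table

function spreadPoints(spread10: number): number {
  // Position of the raw value within the observed range, as a 0..1 fraction.
  const t = (spread10 - SPREAD_MIN) / (SPREAD_MAX - SPREAD_MIN);
  // Values outside the observed range are pinned to the ends (an assumption).
  const fraction = Math.min(1, Math.max(0, t));
  return fraction * SPREAD_MAX_POINTS;
}
```

Under these assumptions, a spread at the bottom of the range contributes nothing, one at the top contributes the full 18, and everything in between lands proportionally, which is what keeps two answers with the same discrete shape from tying.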
What the score does NOT measure
Visitor likes, votes, or comments. (Not yet wired in.)
Whether the answer is “correct.” Each mind speaks only from its own corpus; correctness across worldviews is the visitor's judgment, not the system's.
How spicy / controversial the question is. Wholly canonical questions and modern dilemmas use the same formula.
Recency. A great answer that landed three months ago scores the same as one that landed yesterday.
Where the code lives
The scoring function is pure and lives in a single file: webapp/lib/interestingness.ts. Adjust the weights, rebuild, and the badge thresholds and archive rankings update everywhere.
The substrate-side signals it consumes — mode, verdict, signal.spread10, signal.alien — are written into each answer JSON at generation time, in boundary/answer_questions.py.