ergo · myzona

The contradiction guard

The four-stage write-time check that catches stale beliefs before they are stored.

Every guarded write (remember, learn, supersede) runs a four-stage pipeline against the active claims in scope before anything is stored. ingest skips it entirely — reference facts upsert freely.

incoming statement ─▶ 1. NORMALIZE ─▶ 2. STRUCTURAL ─▶ 3. NLI (gated) ─▶ 4. JUDGE ─▶ tier

The tier decides the outcome: clean (store), warn (store + surface a warning), or block (refuse with HTTP 409 and the conflicting claim).
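As a rough illustration of the three outcomes, the sketch below maps a tier to a status code and a response body. Only the 409 for block comes from the text above; the function name respond, the 200 status for clean and warn, and the body keys (stored, warnings, conflicts) are assumptions made for the example.

```python
from http import HTTPStatus
from typing import Optional


def respond(tier: str, conflicts: Optional[list] = None) -> tuple:
    """Map a guard tier to a (status_code, body) pair.

    Hypothetical helper: only the 409-on-block behaviour is documented;
    the success status and the body keys are illustrative.
    """
    conflicts = conflicts or []
    if tier == "clean":
        # Store the claim, nothing to report.
        return int(HTTPStatus.OK), {"stored": True}
    if tier == "warn":
        # Store the claim, but surface the softer conflicts.
        return int(HTTPStatus.OK), {"stored": True, "warnings": conflicts}
    if tier == "block":
        # Refuse the write and return the conflicting claim(s).
        return int(HTTPStatus.CONFLICT), {"stored": False, "conflicts": conflicts}
    raise ValueError(f"unknown tier: {tier!r}")
```

The point of the shape is that warn and block carry the same evidence; they differ only in whether the write went through.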

Stage 1 — Normalize

Turns prose into a structured claim so the comparison isn't at the mercy of wording.

{
  "modality":     "should",        // must | should | may | must_not | should_not | may_not
  "subject":      "deploy",        // canonical entity / action
  "object":       null,            // canonical state verb (enabled/used/removed…), enum-constrained
  "value":        null,            // atomic value selector (colour/port/version…), free text
  "scope":        { "env": null, "team": null },
  "valid_from":   null,
  "valid_until":  null,
  "subject_kind": "PRESENT"        // PRESENT | FUZZY | MISSING
}
  • subject is the critical piece. If subject_kind is MISSING, there is no subject to match on, so the claim is treated as incomparable and NLI never runs. If it is FUZZY, comparison defers to NLI.
  • value keeps the distinguisher out of subject (so subjects still match) and out of object (so the polarity enum stays clean). It's what lets "must use a blue canary" and "must use a red canary" be seen as a conflict instead of collapsing to the same norm.
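To make the role of value concrete, here is a small sketch of the two canary statements as normalized claims. The canonical subject "canary" and object "used" are guesses at what a normalizer might emit, and norm_key is a hypothetical helper, not part of the guard.

```python
def norm_key(norm: dict) -> tuple:
    """Everything except the value: the stance and what it is about."""
    return (norm["modality"], norm["subject"], norm["object"])


# "must use a blue canary" and "must use a red canary", normalized.
# The subject/object strings are assumed for illustration.
blue = {"modality": "must", "subject": "canary", "object": "used", "value": "blue"}
red = {"modality": "must", "subject": "canary", "object": "used", "value": "red"}

# Without the value field the two claims are indistinguishable ...
same_norm_ignoring_value = norm_key(blue) == norm_key(red)
# ... and the value field is the only thing that tells them apart.
values_differ = blue["value"] != red["value"]
```

If the colour were folded into subject instead, the subjects would no longer match and the pair would never be compared at all.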

Normalizers are pluggable (rule-based/deterministic, or an LLM backend). If the LLM returns something that isn't JSON — a refusal, chatter — extraction fails open to an unknown norm and the claim stores un-gated rather than erroring. A normalizer hiccup can never 500 a write.
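The fail-open behaviour could look roughly like the sketch below. The names parse_norm and UNKNOWN_NORM are hypothetical, and representing the "unknown norm" with subject_kind MISSING is an assumption, chosen because a MISSING subject is what keeps a claim out of comparison.

```python
import copy
import json
from typing import Any

# Assumed shape of the "unknown norm"; field names mirror the claim above.
UNKNOWN_NORM: dict = {
    "modality": None,
    "subject": None,
    "object": None,
    "value": None,
    "scope": {"env": None, "team": None},
    "valid_from": None,
    "valid_until": None,
    "subject_kind": "MISSING",
}


def parse_norm(raw_llm_output: Any) -> dict:
    """Parse normalizer output, never raising on bad input."""
    try:
        parsed = json.loads(raw_llm_output)
    except (json.JSONDecodeError, TypeError):
        # Refusal, chatter, or a non-string: fail open.
        return copy.deepcopy(UNKNOWN_NORM)
    if not isinstance(parsed, dict):
        # Valid JSON but not an object (e.g. a list or a bare string).
        return copy.deepcopy(UNKNOWN_NORM)
    # Fill any fields the backend omitted with the unknown defaults.
    return {**copy.deepcopy(UNKNOWN_NORM), **parsed}
```

Returning a value instead of raising is what keeps a backend failure from turning into a failed write.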

Stage 2 — Structural compare

Pure, deterministic Python. Given the incoming norm and a candidate from the active set, it returns a verdict with a confidence tier (HIGH / MED / LOW):

if incoming.subject_kind == "MISSING":
    return "incomparable"                       # NLI never runs

if same_subject and overlapping_scope and overlapping_time:
    if opposing_modality:                       # must vs must_not, approved vs rejected …
        return "contradiction"                  # HIGH
    if same_direction_stance and value_a != value_b and both_atomic:
        return "contradiction (value)"          # HIGH — can't require BOTH
    if same_modality:
        return "consistent"                     # HIGH
    return "uncertain"                          # MED — defer

if incoming.subject_kind == "FUZZY":  return "defer to NLI"    # LOW
if disjoint_scope or disjoint_time:  return "coexist" # HIGH — principled, not an escape hatch
return "unknown"                                       # LOW — defer to NLI
  • Scope = environment + team + tenant. A prod claim and a dev claim aren't a contradiction even if the statements look opposing.
  • Time = valid_from / valid_until. Non-overlapping date windows → coexist.
  • Modality opposition is a fixed lexicon (must vs must_not, approved vs rejected, grant vs deny, …).
  • Value conflicts require both stances determined and equal, both values atomic (≤ 2 tokens), and the differing tokens grounded in the raw text — so a hallucinated value can't manufacture a contradiction, and config-enumeration docs don't false-flag.
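The scope, time, and value rules above can be sketched as three small predicates. This is an illustration under stated assumptions, not the guard's actual code: None is taken to mean "unscoped" or "open-ended", date bounds are treated as inclusive, tokens are split on word characters, and all function names are hypothetical. The caller is assumed to have already checked that both stances are determined and equal.

```python
import re
from datetime import date
from typing import Optional


def scopes_overlap(a: dict, b: dict) -> bool:
    """Disjoint only if some dimension is set on both sides and differs."""
    for key in ("env", "team", "tenant"):
        left, right = a.get(key), b.get(key)
        if left is not None and right is not None and left != right:
            return False
    return True


def windows_overlap(a_from: Optional[date], a_until: Optional[date],
                    b_from: Optional[date], b_until: Optional[date]) -> bool:
    """Windows are disjoint only if one ends before the other starts."""
    if a_until is not None and b_from is not None and a_until < b_from:
        return False
    if b_until is not None and a_from is not None and b_until < a_from:
        return False
    return True


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def is_atomic(value: Optional[str]) -> bool:
    """Atomic means one or two tokens."""
    return value is not None and 1 <= len(value.split()) <= 2


def is_value_conflict(value_a: Optional[str], value_b: Optional[str],
                      raw_a: str, raw_b: str) -> bool:
    """Both values atomic, different, and the differing tokens present in the raw text."""
    if not (is_atomic(value_a) and is_atomic(value_b)):
        return False
    tok_a, tok_b = _tokens(value_a), _tokens(value_b)
    if tok_a == tok_b:
        return False
    # Grounding: a token the normalizer invented cannot create a conflict.
    return (tok_a - tok_b) <= _tokens(raw_a) and (tok_b - tok_a) <= _tokens(raw_b)
```

With these definitions, is_value_conflict("blue", "red", "must use a blue canary", "must use a red canary") holds, while the same call with "blue" missing from the first raw statement does not.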

Stage 3 — NLI (gated)

Natural-language inference runs only when the structural stage returns LOW confidence, meaning it couldn't decide. Even then, the result must pass subject, polarity, and overlap checks before it is allowed to block, because raw NLI over-fires (scores above 0.9) on unrelated cross-domain pairs. The gate is what turns a noisy classifier into a trustworthy blocker.
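A minimal sketch of such a gate follows. The NliResult type, the function name, the label strings, and the 0.9 threshold are all assumptions for illustration; the only thing taken from the text is that a high NLI score alone is not enough, and the subject, polarity, and overlap checks must also pass.

```python
from dataclasses import dataclass


@dataclass
class NliResult:
    label: str    # assumed labels: "contradiction" | "entailment" | "neutral"
    score: float  # classifier confidence in [0, 1]


def gated_nli_contradiction(nli: NliResult, *, same_subject: bool,
                            opposing_polarity: bool, overlapping: bool,
                            threshold: float = 0.9) -> bool:
    """Return True only if the NLI contradiction survives every gate check."""
    if nli.label != "contradiction" or nli.score < threshold:
        return False
    # A confident score is necessary but not sufficient: unrelated pairs
    # can score high too, so the structural checks get the final say.
    return same_subject and opposing_polarity and overlapping
```

For example, a 0.97 contradiction between claims about different subjects is discarded, while the same score on a same-subject, opposing, overlapping pair goes through to the judge.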

Stage 4 — Judge → tier

The judge combines the structural verdict and the (optional) gated NLI residual into a final tier:

tier    outcome
clean   store the claim, no conflict
warn    store, but return a warning (softer conflict / uncertain)
block   refuse the write: HTTP 409 with the conflicting claim(s) and their reasons
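One way the judge might fold the two signals into a tier is sketched below. The exact mapping is not spelled out in this section, so every branch here is an assumption: a HIGH-confidence structural contradiction blocks, a LOW-confidence verdict blocks only if the gated NLI residual says so, and the MED "uncertain" verdict becomes a warning.

```python
def judge(verdict: str, confidence: str, nli_contradiction: bool = False) -> str:
    """Combine a structural verdict with the gated NLI residual into a tier.

    Hypothetical mapping for illustration only.
    """
    if verdict.startswith("contradiction") and confidence == "HIGH":
        return "block"
    if confidence == "LOW":
        # Structural couldn't decide; the gated NLI result is the tiebreaker.
        return "block" if nli_contradiction else "clean"
    if verdict == "uncertain":
        # Same subject and scope, but no clear opposition or agreement.
        return "warn"
    # consistent, coexist, incomparable: nothing to flag.
    return "clean"
```

Note that nli_contradiction is ignored unless structural confidence is LOW, which mirrors the rule that NLI only runs when the structural stage could not decide.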

A blocked write is not an error to route around — it's the guard doing its job. Resolve it by cancelling, superseding with a new reason, or forcing an exception on the record.

The bet, validated

NLI alone on real prose barely beats a lexical baseline. Normalize → structural → gated-NLI → judge beats every piece alone — see the numbers.