Methodology - Software Defined Corporation

MFid Methodology v2.2.0 · 2026-06-18

One canonical formula. Two reporting forms. Published rubrics for every dimension.

For a worked example of the formula applied end-to-end on a published vendor SLA, see the MFid walkthrough.

Why this page exists

A firm whose entire product is auditing other people’s claims cannot ship a metric in two formulas, two evidence-tier names, and an undocumented Intentionality rubric. This page is the canonical reference. Every other page on the site is required to match it. If a page contradicts this one, this one wins and the other page is a finding against ourselves.

This is the firewall recommended in the 2026-05-26 thesis review (§4.1, §4.2, §4.5). It exists because the review was right.

The canonical formula

MFid_aggregate = (D × E × O × I)^(1/4)

D, E, O, I ∈ [0,1]. Geometric mean of four normalized dimension scores.

Why a geometric mean and not an arithmetic mean. A geometric mean is pulled toward its smallest factor. A single weak dimension caps the composite: a score of 0.5 on any one dimension limits the aggregate to roughly 0.84 even if the other three are perfect, and a score of 0 on any one dimension zeroes it. That is the policy we want. An arithmetic mean would let a perfect Observability score paper over a broken Determinism score. We treat fidelity gaps as non-substitutable: you cannot fix a lie about latency by being more transparent about it.
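The 0.84 figure can be checked directly. The following is a minimal sketch, not the reference implementation; the function names `mfid_aggregate` and `arithmetic_mean` are illustrative and do not come from the MFid repository.

```python
import math

def mfid_aggregate(d: float, e: float, o: float, i: float) -> float:
    """Geometric mean of the four dimension scores under standard scoring."""
    scores = (d, e, o, i)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("standard scoring expects every dimension in [0, 1]")
    return math.prod(scores) ** 0.25

def arithmetic_mean(d: float, e: float, o: float, i: float) -> float:
    """Shown only for comparison; MFid does not use it."""
    return (d + e + o + i) / 4

# One weak dimension, three perfect ones.
weak = (0.5, 1.0, 1.0, 1.0)
print(round(mfid_aggregate(*weak), 4))   # 0.8409
print(round(arithmetic_mean(*weak), 4))  # 0.875
```

The gap between the two results is the penalty the geometric mean applies for the weak dimension.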

Why these four dimensions and not three or five. D, E, O, I are the smallest set that survives every domain we have scored. Determinism, Efficiency, and Observability are the engineering core. Intentionality is the dimension that distinguishes a fast wrong answer from a fast right one. It is the one most measurement frameworks omit and the one most needed in the autonomous-systems era.

Two reporting forms — and how they relate

Past versions of this site published two formulas without saying so. That was a finding. The reconciliation:

Form 1 — Aggregate MFid (the canonical number)

MFid_aggregate = (D × E × O × I)^(1/4). One number per system, vendor, process, or stack. This is the number that goes on the board deck. It is always a geometric mean of four [0,1] dimension scores. Weights are fixed at 1/4 each. There are no per-engagement weight choices.

Form 2 — Domain-projected fidelity (MFid_app, MFid_net, …)

When a system exposes domain-specific service-level indicators (latency, throughput, reliability for an app; bandwidth, jitter, loss for a network), we compute a domain projection:

MFid_app = w_L·L + w_T·T + w_R·R

where L, T, R ∈ [0,1] are fidelity scores per SLI (claimed/observed clipped to 1.0) and weights w sum to 1. Weights are not chosen per case. They are derived from the system’s own published SLO portfolio — if the operator weights latency at 50% of their SLO budget, we use 0.5. If no SLO portfolio exists, we use the default Tier-1 weighting (0.5 / 0.3 / 0.2 in order of business impact: response time, throughput, reliability) and publish the choice in the finding.
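A domain projection can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the names `sli_fidelity` and `mfid_app` are hypothetical, and the ratio is flipped to observed/claimed for higher-is-better indicators such as throughput, which is our assumption (the text above states only claimed/observed).

```python
# Default weighting from the text: response time, throughput, reliability.
DEFAULT_WEIGHTS = {"L": 0.5, "T": 0.3, "R": 0.2}

def sli_fidelity(claimed: float, observed: float, lower_is_better: bool) -> float:
    """Fidelity of one SLI in [0, 1]; 1.0 means the claim was met or beaten."""
    if claimed <= 0 or observed <= 0:
        raise ValueError("claimed and observed values must be positive")
    # Assumption: for higher-is-better SLIs the ratio is observed/claimed.
    ratio = claimed / observed if lower_is_better else observed / claimed
    return min(1.0, ratio)

def mfid_app(scores: dict, weights: dict = None) -> float:
    """Weighted sum of per-SLI fidelity scores; weights must sum to 1."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same SLIs")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * scores[k] for k in weights)

scores = {
    "L": sli_fidelity(claimed=100, observed=125, lower_is_better=True),     # latency in ms
    "T": sli_fidelity(claimed=1000, observed=900, lower_is_better=False),   # requests per second
    "R": sli_fidelity(claimed=99.9, observed=99.95, lower_is_better=False), # percent availability
}
print(f"{mfid_app(scores):.2f}")  # 0.87
```

Passing an operator's own SLO-derived weights as the second argument replaces the default weighting.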

A domain projection is not the aggregate MFid. It is one input into one of the four dimensions. Roll-up:

Determinism (D) ← tail dispersion of each L/T/R signal (p99/p50 ratio inverted and clipped).
Efficiency (E) ← resource-bounded form of L and T (claimed cost-per-unit ÷ observed cost-per-unit).
Observability (O) ← coverage of L/T/R signals: fraction of customer journeys for which a measurement exists.
Intentionality (I) ← scored separately; see rubric below.

Every published MFid_app in our case studies is annotated with both forms going forward: the projection that motivated the finding, and the aggregate it rolled into.

Operational definitions of D, E, O, I

Each dimension is a normalized composite of underlying measurements. Each has a published formula. None of them are subjective. None of them are scored by vibes.

D — Determinism

Definition: Same input, under stated conditions, produces the same observable output within a stated tolerance.

Measurement: D = 1 − min(1, σ / (μ × τ)) where σ is the standard deviation of the observable, μ is its mean, and τ is the published tolerance band (e.g. 10%). Computed per SLI; aggregated by minimum across SLIs (a system is only as deterministic as its worst-behaved indicator).

Worst-case binding: If a single tail event in the measurement window exceeded 2σ on a critical SLI, D is capped at 0.9 regardless of the formula above. We refuse to let a calm hour hide a panic minute.
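The measurement and the worst-case cap can be sketched together. This is a minimal sketch with two labeled assumptions: σ is taken as the population standard deviation, and "exceeded 2σ" is read as any sample lying more than 2σ from the mean. The function names are illustrative.

```python
import statistics

def determinism_sli(samples, tolerance: float, critical: bool = False) -> float:
    """D for one SLI: 1 - min(1, sigma / (mu * tau)), with the 0.9 cap."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # assumption: population std dev
    if mu <= 0 or tolerance <= 0:
        raise ValueError("mean and tolerance must be positive")
    d = 1.0 - min(1.0, sigma / (mu * tolerance))
    # Assumed reading of the cap: a tail event is any sample > 2 sigma from the mean.
    tail_event = sigma > 0 and any(abs(x - mu) > 2 * sigma for x in samples)
    if critical and tail_event:
        d = min(d, 0.9)
    return d

def determinism(slis) -> float:
    """Aggregate by minimum across SLIs; each item is (samples, tolerance, critical)."""
    return min(determinism_sli(s, tol, crit) for s, tol, crit in slis)

steady = [100, 102, 98, 101, 99]   # mild, symmetric jitter
spiky = [100] * 99 + [101]         # calm window with one outlier

print(f"{determinism_sli(steady, 0.10):.4f}")               # 0.8586
print(determinism_sli(spiky, 0.10, critical=True))          # 0.9
print(f"{determinism_sli(spiky, 0.10, critical=False):.2f}")  # 0.99
```

The `spiky` series shows the cap at work: the formula alone yields about 0.99, but the single outlier on a critical SLI holds the score at 0.9.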

E — Efficiency

Definition: Resource cost per unit of useful output, relative to the published or contracted cost.

Measurement: E = min(1, claimed_cost_per_unit / observed_cost_per_unit), computed in the natural unit of the system (cycles/token, watts/inference, dollars/transaction, joules/request, bytes/query). One unit per system, declared up front. E is a one-sided ratio: under-cost (better than claim) is clipped at 1.0 — we report it but it does not inflate the score under standard scoring. The exception is documented, multi-dimensional under-reporting; see When MFid exceeds 1.0 below.
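A short sketch of the one-sided ratio, keeping the unclipped value so it can be reported alongside the score. The type and function names are illustrative, not taken from the MFid repository.

```python
from typing import NamedTuple

class EfficiencyScore(NamedTuple):
    score: float      # clipped value used under standard scoring
    raw_ratio: float  # unclipped claimed/observed ratio, reported alongside

def efficiency(claimed_cost_per_unit: float, observed_cost_per_unit: float) -> EfficiencyScore:
    """E = min(1, claimed / observed), both in the system's declared unit."""
    if claimed_cost_per_unit <= 0 or observed_cost_per_unit <= 0:
        raise ValueError("costs must be positive")
    raw = claimed_cost_per_unit / observed_cost_per_unit
    return EfficiencyScore(score=min(1.0, raw), raw_ratio=raw)

# Over cost: claimed 4 joules/request, observed 5.
print(efficiency(4, 5))  # EfficiencyScore(score=0.8, raw_ratio=0.8)
# Under cost: claimed 5 joules/request, observed 4. The score is clipped.
print(efficiency(5, 4))  # EfficiencyScore(score=1.0, raw_ratio=1.25)
```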

O — Observability

Definition: Fraction of the customer-relevant behavior surface for which a current, queryable, retained measurement exists.

Measurement: O = (covered_SLIs / required_SLIs) × retention_factor × freshness_factor. Required SLIs are enumerated up front for each system from its specification (not from what is currently instrumented — that would let absence become a credit). Retention factor is 1.0 if telemetry is retained ≥ 30 days, scaled down otherwise. Freshness factor is 1.0 if the dashboard is queryable in < 60 seconds, scaled down otherwise. A score you have to mine from log files is not observability; it is archaeology.
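The product form can be sketched as below. The text says the retention and freshness factors are "scaled down" past their thresholds without giving the curve, so the linear retention scaling (days / 30) and the inverse freshness scaling (60 / seconds) used here are assumptions for illustration only, as is the function name.

```python
def observability(covered_slis, required_slis, retention_days: float, query_seconds: float) -> float:
    """O = coverage x retention_factor x freshness_factor."""
    required = set(required_slis)
    if not required:
        raise ValueError("required SLIs must be enumerated from the specification")
    # Only instrumentation of required SLIs counts; extra signals earn no credit.
    coverage = len(set(covered_slis) & required) / len(required)
    # Assumption: linear scale-down below 30 days of retention.
    retention_factor = min(1.0, retention_days / 30.0)
    # Assumption: inverse scale-down once a query takes 60 seconds or longer.
    freshness_factor = 1.0 if query_seconds < 60 else 60.0 / query_seconds
    return coverage * retention_factor * freshness_factor

required = {"latency", "throughput", "error_rate", "availability"}
covered = {"latency", "throughput", "error_rate", "cpu_temp"}  # cpu_temp is not required

print(observability(covered, required, retention_days=15, query_seconds=30))  # 0.375
```

Three of four required SLIs are covered (0.75), retention is half the threshold (0.5), and the dashboard is fresh (1.0), giving 0.375.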

I — Intentionality

This is the dimension the 2026-05-26 review flagged as undefined. The review was correct. The rubric below is the answer.

Definition: The fraction of system activity that demonstrably serves the stated purpose under audit, with the remainder classified as out-of-spec drift (not necessarily harmful — but not what was contracted).

Important framing. Intentionality is not a property of the artifact in isolation; it is a property of the artifact relative to its specification. We measure the specification as carefully as we measure the system. An unclear spec produces a low Intentionality ceiling, not a low Intentionality score — we publish the ceiling and recommend the spec be tightened.

Measurement (three-part, each scored [0,1], aggregated by geometric mean):

I_spec — Specification clarity. Does a written, dated, signed specification exist that enumerates required behaviors and forbidden behaviors? Scored by document analysis on a 7-point checklist (existence, dating, scope, behavior list, forbidden-list, change log, sign-off). Reproducible across reviewers with κ ≥ 0.7 on a 50-spec calibration set; calibration set published on request.
I_trace — Operational coverage. Of the operations the system performed during the measurement window, what fraction can be traced to a specified behavior? Computed as (traced_operations / total_operations) from logs, traces, or transaction records. Operations with no trace match are not assumed malicious — they are assumed unscored, and counted against I_trace.
I_drift — Forbidden-behavior detection. Of operations classifiable as “outside spec” (output schemas, data destinations, decision boundaries the spec excludes), what fraction were caught by automated guardrails before having effect? Computed as (blocked_violations / detected_violations). A system with no violations and no detection capability scores 0.5 (we cannot tell whether it is well-behaved or unmonitored).

I = (I_spec × I_trace × I_drift)^(1/3).
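The three sub-scores and their aggregation can be sketched as follows. Two points are assumptions rather than part of the rubric: the 7-point checklist is scored with equal weight per item, and a system with no violations but a working detection capability is scored 1.0 on I_drift (the rubric specifies only the no-detection case, at 0.5). Function names are illustrative.

```python
def i_spec(checklist_items_met: int, checklist_size: int = 7) -> float:
    """Specification clarity; assumption: each checklist item weighs the same."""
    if not 0 <= checklist_items_met <= checklist_size:
        raise ValueError("items met must be between 0 and the checklist size")
    return checklist_items_met / checklist_size

def i_trace(traced_operations: int, total_operations: int) -> float:
    """Operational coverage; untraced operations count against the score."""
    if total_operations <= 0:
        raise ValueError("the measurement window must contain operations")
    return traced_operations / total_operations

def i_drift(blocked_violations: int, detected_violations: int, has_detection: bool) -> float:
    """Forbidden-behavior detection: blocked / detected."""
    if detected_violations == 0:
        # Rubric: no violations and no detection capability scores 0.5.
        # Assumption: no violations with detection in place scores 1.0.
        return 1.0 if has_detection else 0.5
    return blocked_violations / detected_violations

def intentionality(spec: float, trace: float, drift: float) -> float:
    """I = (I_spec x I_trace x I_drift)^(1/3)."""
    return (spec * trace * drift) ** (1 / 3)

score = intentionality(
    i_spec(6),                            # 6 of 7 checklist items present
    i_trace(9_500, 10_000),               # 95% of operations traced to the spec
    i_drift(18, 20, has_detection=True),  # 18 of 20 violations blocked
)
print(f"{score:.2f}")  # 0.90
```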

For ML and autonomous systems specifically: I_spec is scored against the model card and policy document. I_trace is computed from prompt/response logs against the policy classifier. I_drift measures jailbreak/red-team catch rate. A worked example on a public LLM endpoint is published in the MFid open-source repository as examples/intentionality_llm_worked.md.

What I is not. I is not a measure of whether the system is good. A perfectly-malicious system with a clear spec authorizing maliciousness scores I = 1.0. I measures fidelity to the spec, not the wisdom of the spec. That is by design. We score the gap between claim and reality; we do not score the claim itself. That is the customer’s job.

When MFid exceeds 1.0 — the honest underreporter

The standard MFid ceiling is 1.0. Every dimension score is bounded to [0,1], either by an explicit min(1, …) clip or by construction, and a geometric mean of values bounded to [0,1] cannot exceed 1.0. That ceiling is a design choice, not a mathematical constraint. Its purpose is to prevent a vendor that out-performs in one dimension from masking a deficit in another.

There is a narrow category of entity for which the ceiling is wrong: the verified, systematic underreporter — an organization whose published specifications are demonstrably and consistently conservative across multiple independent dimensions, not by accident, not once, but as a documented operating posture. When observed reality outperforms every stated claim, the min(1, …) clip silently discards information. In those cases, clipping produces a less accurate number, not a more rigorous one. We lift the clip.

The canonical case: Porsche

Porsche publishes 0–60 times, quarter-mile figures, lap times, and power outputs that independent reviewers and owners consistently beat — not by a rounding error, but by a margin that is too regular to be noise. A 911 GT3 RS quoted at a Nürburgring time the factory already knows it can beat. A Taycan Turbo S rated at a 0–60 the launch-control software was tuned to exceed. The understatement is not accidental; it is policy. Porsche does not want a customer to discover their car underperforms the sticker. The sticker is a floor, not a target.

Applied to MFid: an audit of Porsche's published performance specifications against independently measured reality would find that E (efficiency — delivered performance per published rating), D (determinism — variance of the gap between claim and observation), and I (intentionality — systematic consistency of the understatement) all produce raw ratios above 1.0 before the clip is applied. Under standard scoring, all three are rounded down to 1.0 and the composite scores 1.0 — indistinguishable from a vendor who just barely met their word. That conflation is the problem.

The scoring rule for verified underreporters

The min(1, …) clip is lifted on any dimension where the following conditions are simultaneously met:

The raw ratio exceeds 1.0 on its own evidence. Observed performance is better than claimed, not merely equal.
The pattern is multi-instance. The gap is documented across at least three independent measurements or product lines — not a single favorable test.
The understatement is structural, not accidental. There is evidence the vendor knowingly set conservative specifications — internal communications, design posture, or a consistent historical record — rather than simply underestimating their own system.
No other dimension is below 0.9. A super-unity score on one axis cannot be used to redeem a deficiency on another. This rule preserves the core principle of the geometric mean: a chain is only as strong as its weakest link.

When all four conditions hold, each qualifying dimension's raw ratio is admitted without the clip. The aggregate geometric mean is then computed normally. A resulting score above 1.0 is published as-is, annotated with the designation Verified Underreporter and the specific dimensions that cleared the threshold.
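The four-condition gate can be sketched as below. This is an illustration, not the scoring engine: the `DimensionEvidence` type, its field names, and the function name are hypothetical, and condition three (structural understatement) is represented as a boolean that a human reviewer would set from the evidence.

```python
import math
from dataclasses import dataclass

@dataclass
class DimensionEvidence:
    raw: float                     # unclipped score or ratio for this dimension
    independent_measurements: int  # count of independent measurements or product lines
    structural_evidence: bool      # reviewer judgment: understatement is deliberate

def score_with_underreporter_rule(dims: dict):
    """Return (aggregate, qualifying dimension names)."""
    clipped = {name: min(1.0, ev.raw) for name, ev in dims.items()}
    # Condition 4: no dimension may sit below 0.9.
    no_weak_link = all(score >= 0.9 for score in clipped.values())
    qualifying = [
        name for name, ev in dims.items()
        if no_weak_link
        and ev.raw > 1.0                      # condition 1
        and ev.independent_measurements >= 3  # condition 2
        and ev.structural_evidence            # condition 3
    ]
    admitted = {n: (dims[n].raw if n in qualifying else clipped[n]) for n in dims}
    aggregate = math.prod(admitted.values()) ** (1 / len(admitted))
    return aggregate, qualifying

dims = {
    "D": DimensionEvidence(1.00, 0, False),
    "E": DimensionEvidence(1.20, 4, True),   # beats the claim across four measurements
    "O": DimensionEvidence(0.95, 0, False),
    "I": DimensionEvidence(1.00, 0, False),
}
aggregate, qualifying = score_with_underreporter_rule(dims)
print(qualifying, f"{aggregate:.2f}")  # ['E'] 1.03
```

If O in this example dropped to 0.8, condition four would fail, E would be clipped back to 1.0, and the aggregate would fall below 1.0.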

What a score above 1.0 means — and does not mean

An MFid above 1.0 does not mean the system is perfect. It means the vendor's claims are structurally conservative relative to what they actually deliver. The organization understates its own performance, so the error runs in the customer's favor. In a landscape where most published specifications are optimistic, aspirational, or outright wrong, a vendor that systematically underreports is a meaningful outlier. The score reflects that outlier status numerically rather than relegating it to a footnote.

The designation is rare by design. Verified Underreporter status is not awarded for beating a spec once, or for modest outperformance within normal manufacturing variance. It requires the kind of documented, intentional conservatism that an organization chooses as a brand posture and executes consistently. Most vendors will never qualify. The ones that do earn a score that cannot be confused with "barely passing."

The upper bound in practice is roughly 1.15 to 1.25 — constrained not by the formula but by how consistently any real-world organization can outperform its own stated claims across all four dimensions simultaneously. A published MFid above 1.25 would require extraordinary evidence and would be treated with the same scrutiny we apply to a vendor claiming 0.99 on a Tier-2 basis.

Evidence tiers — one name, one definition

Every published MFid number carries an explicit evidence tier and a coverage percentage. The tiers, canonically:

Tier 1 — Measured Reality. What we observe in the client environment, under client load, on the client’s worst day. The verdict.
Tier 1E — Engineering Estimation. Where direct measurement of a subsystem is incomplete, the uncovered portion is scored by named inference method and labeled 1E. The inference method must be explicitly stated in the published finding. A Tier-1E sub-score is not a Tier-1 score; it is reported separately in the coverage label.
Tier 2 — Published Specification. What the vendor put in writing. The claim being tested. Example: NVMe latency vs. datasheet; ISP bandwidth vs. contract; cloud uptime vs. SLA.
Tier 3 — Scientific Calculation. What physics, mathematics, or architectural law allow. The ceiling no claim can exceed. Example: thermal throttling derived from TDP; channel capacity bounded by Shannon’s theorem.

v2.1 renumbering (2026-05-28). This is a renumbering, not a methodology change. The three categories and their definitions are unchanged. No published scores moved as a result. Tier 1 is now Measured Reality because that is where reader intuition puts the strongest evidence — the prior numbering (T1 = Scientific Calculation) inverted that intuition. The sub-label for incomplete measurement moves with the tier and is now 1E.

The naming history. Earlier versions of the site used “Measured Reality” on the homepage and “Engineering Estimation” on the manifesto for the same tier. That was a finding against ourselves. The reconciliation is above: one tier (Tier 1 — Measured Reality) with an explicit sub-label (1E — Engineering Estimation) only when the measurement is incomplete.

The coverage rule. Every published MFid number carries the form:

MFid 0.81 (Tier-1 over 72% of subsystems; Tier-2 published spec for the remaining 28%)

A score with no tier label and no coverage percentage is not a published MFid. It is a draft.
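A small sketch of a formatter that refuses to emit a number without its coverage label. It handles only the two-tier case shown in the example above; the function name and the refusal behavior are illustrative assumptions.

```python
def coverage_label(score: float, tier1_pct: int) -> str:
    """Render a published MFid in the required form, or refuse."""
    if tier1_pct is None:
        raise ValueError("a score without a coverage percentage is a draft")
    if not 0 <= tier1_pct <= 100:
        raise ValueError("coverage must be a percentage between 0 and 100")
    label = f"MFid {score:.2f} (Tier-1 over {tier1_pct}% of subsystems"
    if tier1_pct < 100:
        # Remainder is attributed to the published specification, as in the example.
        label += f"; Tier-2 published spec for the remaining {100 - tier1_pct}%"
    return label + ")"

print(coverage_label(0.81, 72))
# MFid 0.81 (Tier-1 over 72% of subsystems; Tier-2 published spec for the remaining 28%)
```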

Limitations we will not hide

The four dimensions are not commensurable in a strict measurement-theoretic sense. The geometric-mean aggregation is a policy choice — it penalizes weakness — and not a derivation. We have argued the choice above; we have not proved it is unique.
Intentionality is a meta-level measurement. It depends on a specification existing, being current, and being readable. The rubric handles this by scoring spec clarity (I_spec) as a sub-factor, but the dependence is real.
MFid does not score what has no spec. Vendor lock-in, switching cost, strategic dependency, and category-creation risk are real business exposures with no published claim to test against. MFid is silent on them by design. A high MFid on a vendor you cannot leave is still a problem; that problem belongs in procurement and architecture review, not in this instrument.
MFid does not detect compromise that stays within spec. A breach that does not violate any published SLI — a low-and-slow data exfiltration under the throughput ceiling, an authorized credential used by the wrong human — will register as I = 1.0 because behavior matches specification. Security incident detection is downstream of MFid, not a substitute for it.
MFid does not measure novelty. A system that does something genuinely new — a category that did not exist when its spec was written — will score against an obsolete spec. The instrument will say “faithful to claim” while the claim has been overtaken by what the system is actually being used for. Spec drift is real and Intentionality’s I_drift sub-score catches some of it; the deeper problem of category creation is not solved here.
The aggregate MFid is computed by SDCorp. The framework, the rubrics, the rollups, and the publication cadence are ours. The audit-of-the-audit problem is unresolved by this page alone. Standards engagement, academic co-authorship, and third-party attestation of a sample engagement are all on the roadmap; none is currently engaged or scheduled. We will not list them as “in progress” until they are. Their status is published, honestly, on the live status page.

Constraint on public-facing targets

An MFid target’s claim must be publicly verifiable to qualify for a public-facing teardown or walkthrough. Privately contracted SLAs, NDA-bound specifications, and access-restricted documentation are out of scope for SDCorp’s public artifacts because the audit source is not falsifiable by the reader.

Private targets remain valid for client-internal engagements; the constraint applies only to the public-facing artifact (blog post, evergreen teardown page, walkthrough). This constraint exists because MFid’s value to the reader depends on the reader being able to verify the underlying claim independently.

Versioning and change log

This methodology is versioned. The current version is MFid Methodology v2.2.0 (2026-06-18).

v2.2.0 (2026-06-18). Added “When MFid exceeds 1.0 — the honest underreporter” section. Formalizes the conditions under which the min(1, …) clip is lifted on individual dimensions and the aggregate composite is permitted to exceed 1.0. Introduces the “Verified Underreporter” designation, the four-condition gate, and Porsche as the canonical reference case. Practical upper bound of ~1.15–1.25 documented. E — Efficiency section updated to cross-reference the new section rather than footnote the concept inline. No existing tier definitions or published scores changed.
v2.1.1 (2026-05-29). Added “Constraint on public-facing targets” section formalizing that public teardowns and walkthroughs require publicly verifiable target claims; private contracted SLAs are out of scope for public artifacts. Surfaced during the Cloudflare walkthrough fact-check incident (see walkthrough v1.1 revision note). No tier definitions or scores changed.
v2.1 (2026-05-28). Evidence-tier renumbering: Tier 1 is now Measured Reality (was Tier 3), Tier 2 unchanged (Published Specification), Tier 3 is now Scientific Calculation (was Tier 1). Sub-label for incomplete measurement moves with the tier and is now 1E (was 3E). Three categories and their definitions are unchanged. No published scores moved as a result. Reason: in ordinary English “Tier 1” reads as strongest evidence; the prior numbering inverted reader intuition. Limitations section expanded to name three scope gaps the instrument does not cover (no-spec exposures, in-spec compromise, novelty/category-creation).
v2.0 (2026-05-27). First reconciled methodology page. v1.x had published two formulas without reconciling them and used two names for the third evidence tier; v2.0 fixed both, published the Intentionality rubric, and added the coverage-label requirement.

Change log is maintained in the MFid open-source repository under CHANGELOG.md.
