The four primitives

In "Three failures, one missing layer," I argued that healthcare's three persistent AI failures share one shape: each requires a governance protocol layer that doesn't yet exist anywhere. Not in applications, not in regulations, not in platforms, not even in the standards layer that handles data shape. This post specifies what such a layer must provide.

Four primitives carry the load.

Content-addressable Health Assets. Programmable Consent. Hash-chained Provenance. Quality-weighted Contribution. Each one addresses a specific failure mode, and three of them map directly onto the failures named in the previous post. Each one earns its place against an alternative that doesn't work.

The claim isn't that these four are provably the smallest possible set. Design spaces resist that kind of proof. The claim is that each one earns its place, that they cluster naturally as a working set, and that any honest governance protocol has to answer all four questions they answer.

Specifying, instead of waving at it

"Consent and audit" is what every patient-data pitch already says. The phrase is correct and inert. Anything strict enough to actually rule out the failure modes post 2 named has to be specified at the level of data structures and algorithms, not slogans.

A primitive, to count here, has to do three things:

Address a specific failure mode that doesn't dissolve if you refuse to define the primitive. (If "consent" can be replaced by "the patient signed a form," the form was the primitive, not the word.)
Rule out alternatives that look similar but don't carry the same guarantee. (Signed consent records aren't the same as hash-chained consent records, even though both involve signing.)
Compose with the other primitives without circular dependency. (Provenance can't be the thing that verifies its own integrity.)

The four primitives below each pass this bar. Each section names the failure it addresses, the obvious alternative that doesn't work, and what breaks if you remove it.

Health Assets

Failure addressed: fragmentation.

Post 2 named the fragmentation: EHRs hold parts, apps hold other parts, research datasets are fixed snapshots. For a governance protocol to mean anything across all of these, it needs a way to refer to a specific piece of clinical data that everyone agrees is the same piece — verifiably, across systems that don't trust each other.

That's what a Health Asset is. From the HAVEN whitepaper, §6.1¹:

HealthAsset := {
    asset_id        : ContentHash      // Derived from content
    data_ref        : SecureReference  // Pointer to clinical data
    substrate       : Identifier       // Data format (FHIR, OMOP, etc.)
    consent_ref     : ConsentID        // Active consent policy
    quality_class   : {A, B, C, D}     // Data quality grade
    provenance_ref  : ProvenanceID     // Audit chain reference
    patient_ref     : PatientID        // Owner of this data
    created_at      : Timestamp
}

The asset_id is a SHA-256 hash of the content. Change one byte of the underlying data and the hash changes, so the asset_id no longer matches. The pointer carries its own integrity check, the same trick Git uses for commits².
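A minimal sketch of the asset_id idea, assuming the identifier is the hex-encoded SHA-256 digest of the record's serialized bytes. The record and the function names are hypothetical, not taken from the whitepaper.

```python
import hashlib


def asset_id_for(content: bytes) -> str:
    """Derive a content address: the hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_asset(asset_id: str, content: bytes) -> bool:
    """True only if the content still hashes to the identifier it was issued under."""
    return asset_id_for(content) == asset_id


# Hypothetical serialized clinical record.
record = b'{"resourceType":"Observation","status":"final","value":5.4}'
aid = asset_id_for(record)

print(verify_asset(aid, record))              # True
tampered = record.replace(b"5.4", b"5.5")     # change a single byte of the value
print(verify_asset(aid, tampered))            # False
```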

The obvious alternative: just give every record a UUID.

UUIDs work fine inside one system. They fail at the boundary. Two custodians can issue the same UUID for different records, or different UUIDs for what should be the same record. Reconciliation becomes a coordination problem that has to be solved custodian by custodian. Content addressing dissolves it: same content, same hash, anywhere. No registry needed. No reconciliation needed³.
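Content addressing only agrees across systems when both sides hash the same bytes, so a serialization has to be fixed first. This sketch assumes a simplified canonical JSON (sorted keys, compact separators) as a stand-in; a production scheme would need something stricter, such as RFC 8785's JSON Canonicalization Scheme. The record fields are hypothetical.

```python
import hashlib
import json
import uuid


def canonical_bytes(record: dict) -> bytes:
    # Simplified canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def content_id(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# The same hypothetical lab result as two custodians happen to store it (different key order).
at_custodian_a = {"patient": "p-001", "code": "HbA1c", "value": 6.1, "unit": "%"}
at_custodian_b = {"unit": "%", "value": 6.1, "code": "HbA1c", "patient": "p-001"}

# Each custodian mints its own UUID locally: the two identifiers share nothing.
uuid_a, uuid_b = uuid.uuid4(), uuid.uuid4()
print(uuid_a == uuid_b)  # False

# Both custodians derive the same content address with no registry or reconciliation step.
print(content_id(at_custodian_a) == content_id(at_custodian_b))  # True
```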

What breaks if you remove this primitive: the protocol loses any basis for saying two systems are referring to the same record. Every audit becomes "trust me, this is the same record." Every consent becomes ambiguous about what it covers. The fragmentation failure named in post 2 stays unfixed.

Content addressing isn't new. Git has used it since 2005. IPFS implements it for general data. RFC 6920 specifies it for URIs⁴. The choice in HAVEN is to apply it to healthcare records specifically, in a substrate-neutral way — the same Health Asset can wrap a FHIR resource, an OMOP measurement, or a raw document reference.
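RFC 6920 gives a standard spelling for such identifiers. A small sketch, assuming HAVEN asset_ids could be rendered as RFC 6920 ni URIs (the post doesn't say they are); the same function wraps a FHIR payload and an OMOP-style row, and the payloads and the wrap helper are hypothetical.

```python
import base64
import hashlib


def ni_uri(content: bytes) -> str:
    """Express a SHA-256 content address as an RFC 6920 ni URI with no authority part."""
    digest = hashlib.sha256(content).digest()
    # RFC 6920 encodes the hash value as base64url with the trailing '=' padding removed.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + value


def wrap(content: bytes, substrate: str) -> dict:
    # The identifier depends only on the bytes; the substrate is a label stored beside it.
    return {"asset_id": ni_uri(content), "substrate": substrate}


fhir = wrap(b'{"resourceType":"Observation","status":"final"}', "FHIR")
omop = wrap(b"measurement_id=1,measurement_concept_id=123", "OMOP")
print(fhir["asset_id"])
print(omop["substrate"], omop["asset_id"].startswith("ni:///sha-256;"))  # OMOP True
```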

Consent

Failure addressed: no role for the patient.

Three of the four primitives map directly to one of the failures named in post 2. This one doesn't. Consent is the precondition for the other three to mean anything. It's what turns the patient from a data source into an actor in the coordination protocol. Without it, governance has nothing to bite on.

A patient's data is spread across dozens of records held by different custodians. Each custodian decides what's shared, with whom, for what purpose. The patient signs a form, often under duress, and the form is then interpreted application by application. Revoking is a phone call to records. Auditing is a FOIA request. "Consent" in this regime is a paper artifact, not a machine-verifiable proposition.

HAVEN's Consent Protocol turns it into one. From the whitepaper, §6.2¹:

ConsentAttestation := {
    consent_id      : UUID
    grantor         : PatientIdentity   // Who grants
    grantee         : AccessorIdentity  // Who receives
    scope           : DataScope         // What data
    purpose         : PurposeType       // Why
    conditions      : Conditions[]      // Under what rules
    ...
    status          : {active, revoked, expired}
    signature       : CryptoSignature
}

Three properties make this primitive different from existing consent practice.

Closed-world semantics. If you didn't explicitly grant access to a resource type, the answer is no. Silence is denial. Existing consent regimes default to permission for anything not explicitly forbidden; HAVEN inverts that.

Deterministic verification. Same inputs, same answer, every time. No randomness, no "it depends." That's what makes the consent machine-verifiable rather than interpretable.

Immediate revocation. The next verify() call after a revoke() returns denied. Not "after the next sync." Not "within 24 hours." Immediately.
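The three properties can be sketched in a few lines. This is a hypothetical in-memory registry, not HAVEN's Consent Protocol: the signature field and the elided attestation fields are omitted, and a single expiry time stands in for the conditions list.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    REVOKED = "revoked"
    EXPIRED = "expired"


@dataclass
class Consent:
    consent_id: str
    grantor: str            # the patient granting access
    grantee: str            # the party receiving access
    scope: frozenset        # resource types explicitly granted
    purpose: str
    expires_at: int         # Unix seconds; stands in for the richer conditions list
    status: Status = Status.ACTIVE


class ConsentRegistry:
    """Hypothetical in-memory store; signatures and most conditions are omitted."""

    def __init__(self) -> None:
        self._consents: dict[str, Consent] = {}

    def grant(self, consent: Consent) -> None:
        self._consents[consent.consent_id] = consent

    def revoke(self, consent_id: str) -> None:
        # verify() reads this same status field, so the very next call sees the revocation.
        self._consents[consent_id].status = Status.REVOKED

    def verify(self, grantee: str, resource_type: str, purpose: str, now: int) -> bool:
        # Deterministic: the time is an argument, never read from a clock, so the same
        # inputs against the same stored consents always give the same answer.
        # Closed world: only an explicit match returns True; silence is denial.
        for c in self._consents.values():
            if (c.status is Status.ACTIVE and now < c.expires_at
                    and c.grantee == grantee and c.purpose == purpose
                    and resource_type in c.scope):
                return True
        return False


registry = ConsentRegistry()
registry.grant(Consent("c1", "patient:alice", "org:study-x",
                       frozenset({"Observation"}), "research", expires_at=2_000_000_000))
t = 1_900_000_000
print(registry.verify("org:study-x", "Observation", "research", t))        # True
print(registry.verify("org:study-x", "MedicationRequest", "research", t))  # False: never granted
registry.revoke("c1")
print(registry.verify("org:study-x", "Observation", "research", t))        # False: revoked
```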

The ethical foundation isn't new. The Nuremberg Code (1947)⁵ established that voluntary consent is the floor for medical research. The Belmont Report (1979)⁶ codified the principle for modern practice. What's new is making the principle executable — turning a 40-page form into a function.

The obvious alternative: signed consent forms (digital or paper).

A signed form attests that consent happened. It doesn't attest to what the consent permits, doesn't compose with audit trails, and doesn't carry revocation state. Two systems sharing the same signed form will interpret its scope differently. The form is evidence; the primitive needs to be a function.

A note on Identity. Consent grants reference two parties — grantor and grantee. Both have to be verifiable identities for the consent to mean anything. HAVEN deliberately doesn't define how identity is established: "How you verify patients are who they say they are is up to you"⁷. The protocol consumes identity from established systems (OIDC, DIDs, EHR identity proofing⁸) and operates over those. Identity-proofing is its own deep field; reinventing it inside the governance protocol would be a bad bet.

What breaks if you remove this primitive: data flows without governance. The patient stays a data source rather than an actor, and that failure stays unfixed regardless of how clean the data layer is.

Provenance

Failure addressed: missing audit.

An audit log inside the system being audited is auditable by the system's custodian. Nobody else. If MyChart logs your record access, you have to ask MyChart for the log. If the log is wrong, you have to ask MyChart to prove it isn't. That's not audit. That's a custodian's self-attestation, served on a printout.

The Provenance Record fixes this by making the log structurally tamper-evident. From the whitepaper, §6.3¹:

ProvenanceEntry := {
    entry_id        : UUID
    timestamp       : Timestamp
    event_type      : EventType
    actor           : Identity
    subject         : AssetRef | ConsentRef
    details         : EventData
    previous_hash   : Hash          // Chain linkage
    signature       : CryptoSignature
}

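Each entry includes the hash of the previous one. Tampering with history breaks the chain: the altered entry's hash no longer matches the previous_hash stored in the entry after it, and repairing that link means rewriting every later entry as well. Each entry is signed with Ed25519 or ECDSA, binding it to a specific actor. Verification is O(log n) via Merkle proofs⁹, which depends on also committing the entries to a Merkle tree; the previous_hash links alone only support a linear walk. With the tree, you don't need to replay the whole chain to check a single entry.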
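A minimal sketch of the hash chain, assuming entries are serialized as sorted-key JSON and hashed with SHA-256. Field names are simplified from ProvenanceEntry (a sequence number replaces entry_id and timestamp), and the Ed25519/ECDSA signature is left out because Python's standard library has no Ed25519. Checking the chain this way walks every link, so it is linear in the length of the log.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical previous_hash for the very first entry


def entry_hash(entry: dict) -> str:
    # Hash every field, including previous_hash, so each entry commits to all before it.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def append(log: list, event_type: str, actor: str, subject: str) -> None:
    previous = entry_hash(log[-1]) if log else GENESIS
    log.append({"seq": len(log), "event_type": event_type, "actor": actor,
                "subject": subject, "previous_hash": previous})


def first_broken_link(log: list):
    """Return the index of the first entry whose link doesn't check out, or None."""
    for i, entry in enumerate(log):
        expected = entry_hash(log[i - 1]) if i else GENESIS
        if entry["previous_hash"] != expected:
            return i
    return None


log = []
append(log, "consent.granted", "patient:alice", "consent:c1")
append(log, "asset.read", "org:study-x", "asset:3f9a")
append(log, "consent.revoked", "patient:alice", "consent:c1")
print(first_broken_link(log))  # None

log[1]["actor"] = "org:someone-else"  # quietly rewrite one historical entry
print(first_broken_link(log))  # 2: the entry after the edit no longer links
```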

This is essentially the construction Certificate Transparency uses for the public web's certificate logs, with Merkle trees carrying the inclusion proofs¹⁰. The hash-linking goes back further, to Haber and Stornetta in 1991¹¹, seventeen years before Bitcoin. The technique is well understood. The novelty is applying it to clinical data access.
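The O(log n) bound comes from a Merkle tree over the log entries rather than from the previous_hash links alone. A sketch of the inclusion check, borrowing the Merkle Tree Hash definition from RFC 9162 (0x00 and 0x01 domain-separation prefixes, split at the largest power of two below n). Whether HAVEN lays out its tree this way is an assumption, and the function names are hypothetical.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # RFC 9162 leaf prefix


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # RFC 9162 interior-node prefix


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root(leaves: list) -> bytes:
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split_point(len(leaves))
    return node_hash(root(leaves[:k]), root(leaves[k:]))


def inclusion_path(m: int, leaves: list) -> list:
    """Sibling subtree hashes for leaf m, ordered from the leaf up to the root."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if m < k:
        return inclusion_path(m, leaves[:k]) + [root(leaves[k:])]
    return inclusion_path(m - k, leaves[k:]) + [root(leaves[:k])]


def root_from_path(m: int, n: int, leaf: bytes, path: list) -> bytes:
    """Recompute the root from one leaf: the leaf hash plus one hash per path element."""
    if n == 1:
        if path:
            raise ValueError("inclusion path is too long")
        return leaf_hash(leaf)
    if not path:
        raise ValueError("inclusion path is too short")
    k = split_point(n)
    *rest, sibling = path  # the last element is the sibling at the top of the tree
    if m < k:
        return node_hash(root_from_path(m, k, leaf, rest), sibling)
    return node_hash(sibling, root_from_path(m - k, n - k, leaf, rest))


entries = [f"entry-{i}".encode() for i in range(7)]
r = root(entries)
proof = inclusion_path(5, entries)
print(len(proof))                                               # 3
print(root_from_path(5, len(entries), entries[5], proof) == r)  # True
print(root_from_path(5, len(entries), b"forged", proof) == r)   # False
```

Building the proof here recomputes subtrees, which is fine for a sketch. The verifier side, root_from_path, needs only the path, so checking one entry in a log of a million entries takes about 20 hashes, provided the root itself comes from a trusted, signed source.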

The obvious alternative: signed but mutable audit logs.

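Signatures alone aren't enough. The custodian who owns the log can re-sign a modified version with the same key, and the substitution is undetectable to anyone who doesn't have the original. Chaining is what makes substitution detectable: each entry commits to everything before it, so anyone holding a single earlier head hash can detect a rewrite of any prior entry without keeping a copy of the log. Without it, "audit" remains a courtesy.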
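A chain only helps someone who kept an earlier hash to compare against. This sketch shows a custodian rewriting an early entry and rebuilding every later link, so the forged chain is internally consistent (and could be re-signed with the same key), while a patient or auditor who saved only the earlier head hash still catches it. How heads reach such witnesses is an assumption here; Certificate Transparency handles it by publishing signed tree heads. The helper names are hypothetical.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def build_chain(events: list) -> list:
    """Link each event to the hash of the entry before it."""
    chain, previous = [], "0" * 64
    for event in events:
        entry = {"event": event, "previous_hash": previous}
        chain.append(entry)
        previous = entry_hash(entry)
    return chain


def head(chain: list) -> str:
    return entry_hash(chain[-1])


events = ["asset.read by org:study-x", "consent.revoked by patient:alice"]
original = build_chain(events)
witness_head = head(original)  # a patient or auditor keeps just this one hash

# The custodian rewrites the first event and rebuilds every later link.
forged = build_chain(["asset.read by org:nobody", events[1]])

print(head(forged) == witness_head)  # False
```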

What breaks if you remove this primitive: the patient has no basis for verifying any claim about what happened to their record. Consent becomes unenforceable in the wild, because revocation can't be verified after the fact. The missing-audit failure stays unfixed.

Contribution

Failure addressed: misaligned incentives.

Patients contribute data; researchers use it; outcomes flow to neither directly. To realign, the protocol needs an accounting primitive — something that turns "Alice contributed records to study X" into a value-weighted quantity that can be tracked, attributed, and eventually paid.

From the whitepaper, §6.4¹:

Contribution := {
    patient_id      : PatientIdentity
    asset_refs      : AssetRef[]
    quality_score   : Float[0, 1]
    tier            : ContributionTier
    context         : UsageContext
    timestamp       : Timestamp
}

The score follows a transparent formula: Value = TierWeight × QualityScore × VolumeNorm. Tiers run from PROFILE (demographics) through STRUCTURED (labs, meds, conditions) and LONGITUDINAL (multi-year records) to COMPLEX (notes, imaging, genomics). Quality is determined by a three-gate protocol — provenance valid, structure complete, concepts mapped — producing a score from 0 to 1 and a class from A to D.
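A sketch of the formula's shape. The post names the tiers, their order, and the three gates, but not the tier weights, how the gates combine into a score, how volume is normalized, or the A-to-D cut-offs, so every number below is a hypothetical placeholder.

```python
from dataclasses import dataclass

# Hypothetical tier weights: the post names the tiers and their order, not their values.
TIER_WEIGHT = {"PROFILE": 0.25, "STRUCTURED": 0.5, "LONGITUDINAL": 0.75, "COMPLEX": 1.0}


@dataclass
class QualityGates:
    provenance_valid: bool     # gate 1: the provenance chain verifies
    structure_complete: float  # gate 2: share of required fields present, 0..1
    concepts_mapped: float     # gate 3: share of codes mapped to standard concepts, 0..1


def quality_score(g: QualityGates) -> float:
    # Assumed combination: failing provenance zeroes the score; otherwise average gates 2 and 3.
    if not g.provenance_valid:
        return 0.0
    return (g.structure_complete + g.concepts_mapped) / 2


def quality_class(score: float) -> str:
    # Assumed cut-offs for the A-to-D grades.
    for cutoff, grade in ((0.9, "A"), (0.7, "B"), (0.5, "C")):
        if score >= cutoff:
            return grade
    return "D"


def contribution_value(tier: str, gates: QualityGates, n_assets: int, cohort_max: int) -> float:
    # Value = TierWeight x QualityScore x VolumeNorm, with VolumeNorm capped at 1 (assumed).
    volume_norm = min(n_assets / cohort_max, 1.0)
    return TIER_WEIGHT[tier] * quality_score(gates) * volume_norm


well_mapped = QualityGates(True, 0.9, 0.95)
poorly_mapped = QualityGates(True, 0.9, 0.30)
print(quality_class(quality_score(well_mapped)), quality_class(quality_score(poorly_mapped)))  # A C
print(contribution_value("LONGITUDINAL", well_mapped, 40, 50)
      > contribution_value("LONGITUDINAL", poorly_mapped, 40, 50))  # True
```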

The score isn't dollars. It's a relative weight. If Alice scores 0.83 and Bob scores 0.41, Alice contributed roughly twice as much to that study. What that translates to in money is between the implementing system, the patients, and the business model. HAVEN provides the accounting, not the payment rails.
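To make the arithmetic concrete: 0.83 / 0.41 is about 2.02. If an implementing system chose to split a pool proportionally (one option among many, and an assumption here, since the post leaves settlement open), the weights would normalize into shares like this. The helper name is hypothetical.

```python
def shares(values: dict) -> dict:
    """Turn relative contribution weights into fractions of a whole (accounting only)."""
    total = sum(values.values())
    return {name: v / total for name, v in values.items()}


study_x = {"alice": 0.83, "bob": 0.41}
split = shares(study_x)
print(round(study_x["alice"] / study_x["bob"], 2))   # 2.02
print({k: round(v, 3) for k, v in split.items()})    # {'alice': 0.669, 'bob': 0.331}
```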

The obvious alternative #1: equal-share data dividends.

The Datacoup / Datawallet / LunaDNA model — every contributor gets the same share. This collapses on contact with reality. A patient contributing a single demographic record is treated identically to one contributing ten years of multi-system labs. Researchers won't trust the cohort because it can't be quality-weighted. Patients who contribute heavily get the same as those who contribute thinly. The system fails on both ends¹².

The obvious alternative #2: pure clinical-weight, no quality gating.

Skip the quality gates and weight by clinical content alone. It works in theory and breaks in practice, because the quality of clinical content varies wildly. A LONGITUDINAL record with 95% concept-mapping coverage is different research material from a LONGITUDINAL record with 30%. Without quality gating, "value" becomes garbage in, garbage out.

The three-gate quality protocol exists because each previous attempt at patient data marketplaces collapsed in one of these two ways. The historical evidence is on the page already.

What breaks if you remove this primitive: value pools at the custodian, not the patient. The misaligned-incentives failure stays unfixed. The protocol becomes another consent-and-audit layer with no honest accounting of where research value goes.

Data Shapley and related attribution methods¹³ suggest more refined math is possible. The three-factor quality-weighted formula is HAVEN's deliberate floor: easy enough to compute, hard enough to defend, and intentionally open to richer attribution schemes layered on top.
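For a sense of what a richer scheme looks like: Data Shapley values each contributor by their average marginal contribution to a utility, taken over every order in which contributors could join. This sketch enumerates all orders exactly for three hypothetical contributors and a made-up utility; Data Shapley proper uses model performance as the utility and Monte Carlo sampling, since exact enumeration grows as n!.

```python
from itertools import permutations
from math import factorial


def shapley(players: list, utility) -> dict:
    """Exact Shapley values: average marginal contribution over every join order."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            totals[p] += utility(coalition | {p}) - utility(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical utility: a study needs labs (alice and bob hold overlapping labs),
# and carol's notes only add value once labs are present.
def study_utility(coalition: frozenset) -> float:
    has_labs = "alice" in coalition or "bob" in coalition
    has_notes = "carol" in coalition
    return (0.6 if has_labs else 0.0) + (0.4 if has_labs and has_notes else 0.0)


values = shapley(["alice", "bob", "carol"], study_utility)
print({p: round(v, 3) for p, v in values.items()})  # {'alice': 0.367, 'bob': 0.367, 'carol': 0.267}
```

Alice and Bob split credit for redundant labs, and Carol's notes earn credit only through their interaction with labs. A per-contributor formula can't see either effect, which is the kind of refinement that could sit on top of HAVEN's floor.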

Why these four cluster

Each primitive answers a different question that any governance protocol has to answer:

Primitive	Question it answers
Health Asset	What is the record?
Consent	Who may use it, and for what?
Provenance	What happened to it?
Contribution	What was it worth?

Remove any one and the protocol stops being a protocol. Remove Health Assets and Consent has no stable thing to authorize. Remove Consent and Provenance has nothing to audit against. Remove Provenance and the whole system runs on trust. Remove Contribution and the system has no honest reason for patients to participate.
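The dependencies in that paragraph can be made concrete with a hypothetical toy that wires all four primitives together in one process. Class and method names are invented, signatures and consent conditions are omitted, and a single-process dictionary is no substitute for cross-custodian operation; the point is only which line answers which question.

```python
import hashlib
import json


class MiniGovernance:
    """Hypothetical single-process toy; names and structure are illustrative, not HAVEN's API."""

    def __init__(self) -> None:
        self.assets = {}       # Health Asset: asset_id -> (patient, content bytes)
        self.grants = set()    # Consent: explicit (grantee, asset_id, purpose) grants
        self.log = []          # Provenance: hash-linked event entries
        self.credit = {}       # Contribution: patient -> accumulated weight

    def _record(self, event: dict) -> None:
        previous = (hashlib.sha256(json.dumps(self.log[-1], sort_keys=True).encode()).hexdigest()
                    if self.log else "0" * 64)
        self.log.append({**event, "previous_hash": previous})

    def register(self, patient: str, content: bytes) -> str:
        asset_id = hashlib.sha256(content).hexdigest()
        self.assets[asset_id] = (patient, content)
        self._record({"type": "asset.registered", "asset": asset_id})
        return asset_id

    def grant(self, patient: str, grantee: str, asset_id: str, purpose: str) -> None:
        # Consent attaches to a content address, so it covers exactly these bytes.
        if self.assets[asset_id][0] != patient:
            raise PermissionError("only the asset's patient can grant access")
        self.grants.add((grantee, asset_id, purpose))
        self._record({"type": "consent.granted", "asset": asset_id, "grantee": grantee})

    def access(self, grantee: str, asset_id: str, purpose: str, weight: float):
        patient, content = self.assets[asset_id]
        intact = hashlib.sha256(content).hexdigest() == asset_id           # what is the record?
        allowed = intact and (grantee, asset_id, purpose) in self.grants   # who may use it?
        self._record({"type": "asset.access", "asset": asset_id,           # what happened?
                      "grantee": grantee, "allowed": allowed})
        if not allowed:
            return None
        self.credit[patient] = self.credit.get(patient, 0.0) + weight      # what was it worth?
        return content


gov = MiniGovernance()
aid = gov.register("patient:alice", b"HbA1c 6.1 %")
print(gov.access("org:study-x", aid, "research", 0.8))  # None: no consent yet
gov.grant("patient:alice", "org:study-x", aid, "research")
print(gov.access("org:study-x", aid, "research", 0.8))  # b'HbA1c 6.1 %'
print(gov.credit)                                       # {'patient:alice': 0.8}
print(len(gov.log))                                     # 4
```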

The four cluster naturally because each answers a category of question that the others can't. They aren't variations on a theme. They aren't aspects of a single underlying concept. They're four distinct functions a governance protocol has to provide if it's going to be the missing layer post 2 named.

That's the working set. A reader who can show that one of them is reducible to another, or that a fifth answers a question I haven't named, should write back. The series is better for the pressure.

What this argument can't show

Three limits worth naming.

Identity sits outside the protocol's scope. HAVEN's position is that identity-proofing happens outside the protocol, in established systems. That's a deliberate boundary. It also means the protocol inherits whatever weakness exists in the identity layer it rides on. A weak identity binding produces weak consents; HAVEN doesn't fix that upstream problem. It just declines to make it worse.

Settlement is downstream. Turning attribution scores into actual payments — to patients, to research funds, to whatever model the implementing system chooses — is an application concern, not a protocol primitive. HAVEN gives you the accounting. What's done with the accounting is yours to design.

"Cluster naturally" is a judgment call. The argument that each of these four primitives is necessary doesn't rule out the possibility that some other set of four (or five) could do the same work via different decompositions. Design spaces resist that kind of proof, as post 2 already conceded. The defensible claim is necessity against the failures we named. Universal minimality is a separate question.

What comes next

Specifying four primitives didn't dissolve every problem post 2 named. Two of those problems surfaced as separate work during specification, not because the primitives are wrong, but because each runs into a verification regime that the cryptographic primitives don't cover. One of the two lives in a different epistemology entirely.

The next post takes up the first of those two gaps.

1. HAVEN whitepaper v2.0 (February 2026). DOI: 10.5281/zenodo.18701303. Source: github.com/Chesterguan/HAVEN.
2. Git uses SHA-1 today, with migration to SHA-256 in progress. The construction is identical to HAVEN's: hash the content, use the hash as the identifier. Original design: Linus Torvalds, 2005.
3. IPFS (InterPlanetary File System) implements the same model for general data storage. Content Identifiers (CIDs) are the operational form. The pattern long predates blockchain.
4. Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keränen, A., and Hallam-Baker, P. RFC 6920: Naming Things with Hashes. April 2013. Defines the ni: URI scheme for content-addressable resources, including hash-algorithm parameterization.
5. "The Nuremberg Code." Trials of War Criminals before the Nuremberg Military Tribunals. U.S. Government Printing Office, 1949 (originally issued 1947).
6. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. U.S. Department of Health, Education, and Welfare, 1979.
7. HAVEN whitepaper §9, "What We're Not Trying to Do."
8. NIST Special Publication 800-63-4 (2024): Digital Identity Guidelines, covering identity-proofing assurance levels. W3C Decentralized Identifiers (DIDs) v1.0, W3C Recommendation, July 2022. OpenID Connect Core 1.0 (federated identity). eIDAS Regulation EU 910/2014 and eIDAS 2.0 (2024) for the EU's electronic identification framework. HAVEN is compatible with any of these as the underlying identity substrate.
9. Merkle, R.C. "A digital signature based on a conventional encryption function." Advances in Cryptology: CRYPTO '87. The hash-tree construction enabling O(log n) inclusion proofs.
10. Laurie, B., Messeri, E., and Stradling, R. RFC 9162: Certificate Transparency Version 2.0. December 2021. The revised (Experimental) specification for the public web's certificate transparency logs; it obsoletes RFC 6962 (2013).
11. Haber, S., and Stornetta, W.S. "How to time-stamp a digital document." Journal of Cryptology 3.2 (1991): 99-111. First presented at CRYPTO '90. The original hash-linked timestamp construction.
12. Datacoup (founded 2012, NYC; shut down November 2019; later acquired by ODE July 2021), Datawallet (founded 2014; pivoted to crypto with a $40M DXT token sale in February 2018; functionally dormant by 2026), LunaDNA (founded December 2017 by Bob Kain et al.; closed January 31, 2024 citing capital shortage). Each attempted a patient-side data marketplace with various dividend models; each failed to attract either the patient volume or the research-buyer trust necessary to sustain a market.
13. Ghorbani, A., and Zou, J. "Data Shapley: Equitable Valuation of Data for Machine Learning." Proceedings of the 36th International Conference on Machine Learning (ICML), 2019. Shapley-value-based attribution for individual data points in machine-learning training sets.