Companion application

Outcome Primitives: A Framework for Measuring AGI Value in the World

March 2026 · Live PDF · In-vivo application

The companion paper applies the public VCF projection to a persistent AI agent platform and reports the first large-scale distribution of outcome types, structural magnitude, and claim-strength evidence. It reframes the AGI question from passing tests to doing the work.

Publication snapshot

Participants: 1,305

Value episodes: 17,921

Externally verifiable: 19.7%

Key claims

All ten outcome primitives appear in the wild, including initiative-scale interpersonal support that benchmark suites do not even represent.

41.4% of classified episodes exceed the equivalent of eight hours of human effort, showing substantial structural value beyond micro-task usage.

The evidence distribution stays honest: 36.9% remain unverified, 43.3% show behavioral evidence, and 19.7% leave externally verifiable traces.

Study design

The Public VCF Projection

The application paper uses the minimum visible projection of VCF: Outcome Primitive, Outcome Magnitude, and claim-strength tier. It keeps the reporting surface broad and comparable while leaving richer canonical detail latent.
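The three-field projection can be sketched as a minimal record type. This is an illustrative assumption, not the paper's canonical schema: the field names, magnitude band, and tier labels here are hypothetical stand-ins for the Outcome Primitive, Outcome Magnitude, and claim-strength tier described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PublicProjection:
    """Hypothetical sketch of one episode under the public VCF projection."""
    outcome_primitive: str   # which of the ten outcome primitives (OP)
    outcome_magnitude: str   # structural magnitude band (OM), e.g. "8h+"
    claim_strength: str      # evidence tier: "unverified", "behavioral",
                             # or "externally_verifiable"

# One episode expressed in the minimum visible projection; richer
# canonical detail stays latent outside this record.
episode = PublicProjection(
    outcome_primitive="interpersonal_support",
    outcome_magnitude="8h+",
    claim_strength="behavioral",
)
```

Keeping the record this narrow is what makes the reporting surface broad and comparable across platforms: any episode, however rich, projects down to three fields.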

What the study found

A Capability Landscape in the Wild

Across a 21-day window, the study maps thousands of real-world value episodes into the OP × OM landscape and shows that AI systems are already producing meaningful outcome diversity beyond the benchmark narrative.

How it knows

The Agent as Measurement Instrument

The platform classifies outcomes through structured memory files, workspace artifacts, and behavioral metadata instead of raw chat review. That lets the research team observe outcome traces while minimizing dependence on conversation transcripts.
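The tiering logic implied by that pipeline can be sketched as a small decision rule. This is an assumption for illustration, not the platform's actual classifier: the function name and boolean inputs are hypothetical, standing in for whatever trace types an episode left behind.

```python
def claim_tier(has_external_artifact: bool, has_behavioral_trace: bool) -> str:
    """Illustrative sketch: map available outcome traces to a claim-strength tier."""
    if has_external_artifact:
        # e.g. a workspace artifact a third party could inspect
        return "externally_verifiable"
    if has_behavioral_trace:
        # e.g. structured memory files or behavioral metadata
        return "behavioral"
    # no trace beyond the claim itself
    return "unverified"

tier = claim_tier(has_external_artifact=False, has_behavioral_trace=True)
```

A rule of this shape explains why no conversation transcripts are needed: the tier depends only on which traces exist, not on what was said.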

Open limits

Evidence Gaps Stay Visible

The paper explicitly states where the framework undercounts internal outcomes, where daily files blur attempt boundaries, and why no E3 experimental support yet exists for in-vivo AI evaluation.