Companion application
Outcome Primitives: A Framework for Measuring AGI Value in the World
March 2026 · Live PDF · In-vivo application
The companion paper applies the public VCF projection to a persistent AI agent platform and reports the first large-scale distribution of outcome types, structural magnitude, and claim-strength evidence. It reframes the AGI question from passing tests to doing the work.
Publication snapshot
Participants: 1,305
Value episodes: 17,921
Externally verifiable: 19.7%
Key claims
All ten outcome primitives appear in the wild, including initiative-scale interpersonal support that benchmark suites do not even represent.
41.4% of classified episodes exceed the equivalent of eight hours of human effort, indicating substantial structural value beyond micro-task usage.
The evidence distribution stays honest: 36.9% remain unverified, 43.3% show behavioral evidence, and 19.7% leave externally verifiable traces.
Study design
The Public VCF Projection
The application paper uses the minimum visible projection of VCF: Outcome Primitive, Outcome Magnitude, and claim-strength tier. It keeps the reporting surface broad and comparable while leaving richer canonical detail latent.
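To make the three-field projection concrete, here is a minimal sketch of what a public VCF record could look like in Python. The class, field, and tier names are illustrative assumptions for this page, not the paper's canonical vocabulary; only the three-part shape (primitive, magnitude, claim strength) comes from the study design.

```python
from dataclasses import dataclass
from enum import Enum

class ClaimStrength(Enum):
    # Hypothetical tier names; the paper defines the canonical set.
    UNVERIFIED = "unverified"            # self-reported, no corroborating trace
    BEHAVIORAL = "behavioral"            # supported by platform behavioral metadata
    EXTERNAL = "externally_verifiable"   # leaves a trace checkable outside the platform
    EXPERIMENTAL = "E3_experimental"     # not yet attained for in-vivo AI evaluation

@dataclass(frozen=True)
class PublicProjection:
    """Minimum visible projection of a VCF value episode: OP, OM, claim strength."""
    outcome_primitive: str      # one of the ten outcome primitives
    outcome_magnitude: int      # ordinal magnitude band, e.g. an hours-of-effort tier
    claim_strength: ClaimStrength
```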
What the study found
A Capability Landscape in the Wild
Across a 21-day window, the study maps thousands of real-world value episodes into the OP × OM landscape and shows that AI systems are already producing meaningful outcome diversity beyond the benchmark narrative.
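The OP × OM landscape is, in effect, a contingency table over the public projection. A minimal sketch of that aggregation, reusing the hypothetical PublicProjection record from the block above:

```python
from collections import Counter

def op_om_landscape(episodes):
    """Tally value episodes into the OP x OM grid (a contingency table)."""
    return Counter((e.outcome_primitive, e.outcome_magnitude) for e in episodes)

# Illustrative usage; the OP name and magnitude band are invented for the example.
episodes = [PublicProjection("interpersonal_support", 3, ClaimStrength.BEHAVIORAL)]
print(op_om_landscape(episodes))  # Counter({('interpersonal_support', 3): 1})
```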
How it knows
The Agent as Measurement Instrument
The platform classifies outcomes through structured memory files, workspace artifacts, and behavioral metadata instead of raw chat review. That lets the research team observe outcome traces while minimizing dependence on conversation transcripts.
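One way such a pipeline could assign claim-strength tiers from structured traces alone is a simple precedence rule over evidence sources, strongest trace wins. This is a hedged sketch with hypothetical record fields, not the platform's actual classifier; it only illustrates how tiering can proceed without reading transcripts.

```python
def assign_claim_strength(record: dict) -> ClaimStrength:
    """Assign a tier from structured traces only; never inspects raw chat.
    Field names ("external_artifact", "behavioral_metadata") are assumptions."""
    if record.get("external_artifact"):    # e.g. a workspace file or sent message
        return ClaimStrength.EXTERNAL
    if record.get("behavioral_metadata"):  # e.g. tool calls, memory-file edits
        return ClaimStrength.BEHAVIORAL
    return ClaimStrength.UNVERIFIED
```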
Open limits
Evidence Gaps Stay Visible
The paper explicitly states where the framework undercounts internal outcomes, where daily files blur attempt boundaries, and why no E3 experimental support yet exists for in-vivo AI evaluation.