Sand Tables for Supervision
How toy worlds expose brittle reward models, hidden teacher assumptions, and the cost of cheap evaluations.
Read note
Independent AI research from the high desert
Long-form inquiry into machine intelligence, alignment pressure, synthetic culture, tool cognition, and the strange weather produced when models begin to write back.
DEL5 treats AI as a civilization-scale instrument: something to test, annotate, distrust, tune, and live beside. The work is slow, citation-heavy, and public by default.
Expect field notes, replication logs, annotated bibliographies, wagers, negative results, and occasional speculative essays where the evidence runs out and judgment has to begin.
Archive
How toy worlds expose brittle reward models, hidden teacher assumptions, and the cost of cheap evaluations.
Read noteWhy agent systems become legible only after every failed plan, retry, and tool call is treated as first-class data.
Read noteA taxonomy of synthetic media failure modes, from harmless filler to corrosive feedback loops.
Read noteNotes on mechanistic explanations that change a researcher's mind instead of merely decorating a paper.
Read noteWhat happens when a model can keep a promise for an afternoon, a week, or a whole research season.
Read noteA defense of awkward uncertainty, visible scratch work, and models that reveal what they do not know.
Read noteSequences
Field methods for using frontier models without surrendering taste, rigor, or authorship.
Failure analysis, eval design, and the social pressure that distorts technical safety work.
Notes on agents, memory, autonomy, and the next layer of human-computer collaboration.
Lab
The lab is a public workbench for small experiments: prompt audits, model comparisons, eval harnesses, strange transcripts, literature maps, and raw observations worth returning to.
annotated papers
replication logs
open wagers
Latency is a cognitive tax. Measure it before arguing about agency.