See through the mist.

Eval-driven infrastructure for AI systems.

go get github.com/greynewell/mist-go

Use Cases

Reward hacking, reproducibility crises, and the debugging abyss. Why RL training fails and what's missing from the toolchain.

Bad data, silent failures, and catastrophic forgetting. The real problems behind fine-tuning that tutorials skip.

Cascading errors, context rot, and agents that ignore instructions. What the benchmarks don't show about production agents.