See through the mist.

Eval-driven infrastructure for AI systems.

go get github.com/greynewell/mist-go

Use Cases

RL Environments

Reward hacking, reproducibility crises, and the debugging abyss. Why RL training fails and what's missing from the toolchain.

Model Harnesses

Bad data, silent failures, and catastrophic forgetting. The real problems behind fine-tuning that tutorials skip.

AI Agents

Cascading errors, context rot, and agents that ignore instructions. What the benchmarks don't show about production agents.