The repo is useful only if it stays grounded in real work without becoming a dump of private prompts or random benchmark puzzles.

The contribution model

There are two primary contribution types:

  1. Run results produced against the canonical suite.
  2. Case proposals derived from real coding-agent failure modes.

Both start as issues. Changes to the suite itself are accepted only after curation.

Run results

Run results should be packaged with:

python tools/package_submission.py runs/local/<run-dir>

The zip includes:

  • run.json — model/profile/runtime metadata;
  • score.json — scored summary;
  • raw.jsonl — per-case outputs and scores;
  • raw_responses/ — raw OpenAI-compatible responses;
  • environment.txt — Python, platform, and server properties, where available;
  • launch_command.txt — launch command if supplied;
  • notes.md — human-readable run notes.
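
The exact packaging logic lives in tools/package_submission.py; purely as an illustration, a standalone check along these lines (a hypothetical script, not part of the repo) can confirm a freshly built zip contains the entries listed above before it is attached to an issue:

# check_submission.py — hypothetical helper, not part of the repo.
# Confirms a packaged submission zip contains the expected entries.
import sys
import zipfile

EXPECTED = [
    "run.json",         # model/profile/runtime metadata
    "score.json",       # scored summary
    "raw.jsonl",        # per-case outputs and scores
    "environment.txt",  # Python/platform/server properties
    "notes.md",         # human-readable run notes
]

def check(path: str) -> int:
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    missing = [f for f in EXPECTED if not any(n.endswith(f) for n in names)]
    # raw_responses/ is a directory of raw responses; check for the prefix.
    if not any("raw_responses" in n for n in names):
        missing.append("raw_responses/")
    for f in missing:
        print(f"missing: {f}")
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(check(sys.argv[1]))

launch_command.txt is deliberately left out of the required list, since it is only included when a launch command was supplied.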

Attach that zip to a "Submit run result" issue instead of committing local run output directly.

Case proposals

A useful case proposal answers:

  • What real workflow failure inspired this?
  • What is the minimum safe context needed to reproduce it?
  • What should the sidecar model return?
  • How is it scored?
  • Why does this matter when a local model is used alongside Codex, Claude Code, Cursor, Aider, or another frontier coding agent?
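
The repo's actual case format is not defined here; as a hypothetical sketch only, a proposal that answers those questions might look like this (all field names and values are illustrative):

# Hypothetical sketch of a case proposal — the field names are illustrative,
# not the repo's actual schema.
proposal = {
    # What real workflow failure inspired this?
    "origin": "Sidecar review waved through an over-engineered caching layer for a one-off script.",
    # Minimum safe context needed to reproduce it (already sanitized).
    "context": ["task_summary.md", "diff_excerpt.patch"],
    # What the sidecar model should return.
    "expected": "Flag the caching layer as over-build and suggest the direct approach.",
    # How it is scored.
    "scoring": "Rubric check for an explicit over-build call-out.",
    # Why it matters alongside a frontier coding agent.
    "rationale": "Over-build detection is a core companion-agent duty.",
}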

Accepted case rules

Accepted cases should test companion-agent usefulness:

  • challenge quality;
  • over-build detection;
  • directive fidelity;
  • frontend/UI judgment;
  • artifact triage;
  • code-review precision;
  • long-context attention.

They should not be generic trivia or abstract puzzle prompts.

Privacy rule

Describe the failure mode, not your business. Sanitize aggressively.
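
What counts as aggressive enough depends on the material, but as a rough, hypothetical starting point, a pass like the following strips obvious identifiers from any prompt or diff excerpt before it goes into a proposal (the patterns are illustrative, not exhaustive):

# sanitize.py — hypothetical helper; adapt the patterns to your own material.
import re
import sys

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),              # email addresses
    (re.compile(r"https?://\S+"), "<url>"),                            # URLs and internal hosts
    (re.compile(r"\b(?:sk|ghp|xox)[-_][A-Za-z0-9_-]{10,}"), "<token>"), # API-key-like strings
]

def sanitize(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    sys.stdout.write(sanitize(sys.stdin.read()))

Automated redaction is a floor, not a ceiling: read every excerpt yourself before submitting it.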