Introduction
Evalify.sh is a community-driven registry of expert-authored evaluation criteria for AI agent skills.
What are evals?
Evaluation criteria — evals — are structured tests that define what a skill should do and how well it should do it. They're used by tools like skill-creator to generate accuracy reports and improve skill design.
A single eval looks like this:
{
  "prompt": "Summarize this pull request diff in under 100 words",
  "expectations": [
    "Response is under 100 words",
    "Response mentions the core change",
    "Response does not include file paths or line numbers"
  ]
}
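One way to model this structure in code (a hypothetical sketch: the `prompt` and `expectations` field names come from the example above; the class and method names are illustrative, not Evalify's actual API):

```python
from dataclasses import dataclass, field


@dataclass
class Eval:
    """A single evaluation: a prompt plus natural-language expectations."""
    prompt: str
    expectations: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "Eval":
        # Pull out the two fields shown in the example format; a missing
        # key raises KeyError, which acts as minimal validation.
        return cls(prompt=data["prompt"], expectations=list(data["expectations"]))


example = Eval.from_dict({
    "prompt": "Summarize this pull request diff in under 100 words",
    "expectations": [
        "Response is under 100 words",
        "Response mentions the core change",
        "Response does not include file paths or line numbers",
    ],
})
print(len(example.expectations))  # → 3
```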
Better evals → better-designed skills → more reliable AI agents.
Who is this for?
Skill consumers — you're building or using an AI agent and want battle-tested evals to validate that it behaves correctly. Browse the registry, pull a pack, and point skill-creator at it.
Skill authors — you've designed a skill and want to share your evaluation criteria with the community. Write a pack, publish it, and let others improve on your work.
Supported formats
Evalify supports two eval formats:
- Anthropic skill-creator v2 — the format used by skills built with skill-creator. Includes `skill_name`, `id`, `expected_output`, `files`, and `expectations`.
- Evalify (native) — a minimal format: just `prompt` and `expectations`.
Both formats normalize to the same internal model, so packs are usable regardless of which format they were authored in. See Import & Export for the full format reference.
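Normalization can be pictured as projecting both formats onto the minimal `prompt`/`expectations` model. A sketch under stated assumptions: the field names come from the format list above, but the detection logic, the assumption that v2 records also carry a `prompt` field, and the function itself are illustrative, not Evalify's actual conversion code:

```python
def normalize(record: dict) -> dict:
    """Reduce a record in either supported format to {prompt, expectations}."""
    if "skill_name" in record or "expected_output" in record:
        # Looks like an Anthropic skill-creator v2 record: a richer shape
        # (skill_name, id, expected_output, files, expectations). Keep only
        # what the internal model needs. (Field handling here is assumed.)
        return {
            "prompt": record["prompt"],
            "expectations": list(record.get("expectations", [])),
        }
    # Evalify native record: already minimal.
    return {"prompt": record["prompt"], "expectations": list(record["expectations"])}


# Hypothetical records in each format, reduced to the same internal model.
v2 = {
    "skill_name": "pr-summarizer",
    "id": "pr-summary-001",
    "prompt": "Summarize this pull request diff in under 100 words",
    "expected_output": "A short plain-English summary",
    "files": [],
    "expectations": ["Response is under 100 words"],
}
native = {
    "prompt": "Summarize this pull request diff in under 100 words",
    "expectations": ["Response is under 100 words"],
}
assert normalize(v2) == normalize(native)
```

The point of the sketch is the one-way direction of the mapping: the richer v2 fields are dropped during normalization, so a pack authored in either format can be consumed identically.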
Guides
- Getting Started — Browse, install, and use eval packs
- Publishing — Write and publish your own eval pack
- Import & Export — Supported formats and how conversion works
- CLI Reference — Full command reference