Import & Export

Evalify reads and writes eval files in multiple formats. The same pack can be consumed by different tools, and Evalify handles the conversion between them.

Internal model

All formats are normalized to a single internal representation before processing:

interface EvalItem {
  prompt: string;        // The task or query to send to the AI
  expectations: string[]; // Verifiable pass/fail criteria
}

This is the model stored in the registry and returned by the CLI. Import and export are transformations to and from this model.

Anthropic Skill Creator v2

ID: anthropic/skillcreator/v2

Import (read)

Evalify detects this format when the file has a skill_name wrapper, or when items contain id + expected_output + files alongside prompt + expectations. Both the wrapped and bare-array forms are accepted.

{
  "skill_name": "my-skill",
  "evals": [
    {
      "id": 1,
      "prompt": "Summarize this pull request diff in under 100 words",
      "expected_output": "A concise summary that mentions the core change",
      "files": [],
      "expectations": [
        "Response is under 100 words",
        "Response mentions the core change",
        "Response does not include file paths or line numbers"
      ]
    }
  ]
}

On import, only prompt and expectations are extracted. expected_output, id, and files are not part of the internal model, so their original values are discarded and are not restored on export.
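The extraction step can be sketched as follows. This is a minimal illustration, not Evalify's actual code: the SkillCreatorItem type and the importSkillCreatorV2 function are hypothetical names, and the field types are inferred from the example above.

```typescript
// Internal model, as defined earlier in this document.
interface EvalItem {
  prompt: string;
  expectations: string[];
}

// Hypothetical shape of one Skill Creator v2 item, inferred from the example above.
interface SkillCreatorItem {
  id?: number;
  prompt: string;
  expected_output?: string;
  files?: string[];
  expectations: string[];
}

// Hypothetical helper: accepts the wrapped form ({ skill_name, evals })
// or a bare array of items, and keeps only prompt and expectations.
function importSkillCreatorV2(
  input: { skill_name?: string; evals: SkillCreatorItem[] } | SkillCreatorItem[]
): EvalItem[] {
  const items = Array.isArray(input) ? input : input.evals;
  return items.map((item) => ({
    prompt: item.prompt,
    expectations: [...item.expectations], // copy so the input is not shared
  }));
}
```

Passing the example file above through this function would yield one EvalItem with the original prompt and its three expectations.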

Export (write)

When exporting back to Skill Creator v2, Evalify produces the full wrapper format. id is re-assigned sequentially from 1. expected_output is left empty (it's a human-authored prose field). files is set to [].

{
  "skill_name": "my-skill",
  "evals": [
    {
      "id": 1,
      "prompt": "...",
      "expected_output": "",
      "files": [],
      "expectations": ["..."]
    }
  ]
}
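The export rules above can be sketched like this. The function name exportSkillCreatorV2 and the SkillCreatorFile type are hypothetical, and passing skill_name as a parameter is an assumption, since this section does not say where Evalify takes it from.

```typescript
// Internal model, as defined earlier in this document.
interface EvalItem {
  prompt: string;
  expectations: string[];
}

// Hypothetical type for the full wrapper format shown above.
interface SkillCreatorFile {
  skill_name: string;
  evals: {
    id: number;
    prompt: string;
    expected_output: string;
    files: string[];
    expectations: string[];
  }[];
}

// Hypothetical helper. skillName is supplied by the caller (an assumption).
function exportSkillCreatorV2(skillName: string, items: EvalItem[]): SkillCreatorFile {
  return {
    skill_name: skillName,
    evals: items.map((item, index) => ({
      id: index + 1,        // re-assigned sequentially from 1
      prompt: item.prompt,
      expected_output: "",  // left empty: human-authored prose field
      files: [],            // always an empty array
      expectations: [...item.expectations],
    })),
  };
}
```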

Evalify (native)

ID: evalify

Import (read)

Detected when items have prompt + expectations without Skill Creator-specific fields. Accepts both a bare array and an { evals: [...] } wrapper.

[
  {
    "prompt": "Summarize this pull request diff in under 100 words",
    "expectations": [
      "Response is under 100 words",
      "Response mentions the core change"
    ]
  }
]

Export (write)

Exports as a flat array of { prompt, expectations } objects, with no wrapper and no extra fields.
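A minimal sketch of the native export, assuming a hypothetical exportEvalify function. The two-space JSON indentation is an illustrative choice, not something this section specifies.

```typescript
// Internal model, as defined earlier in this document.
interface EvalItem {
  prompt: string;
  expectations: string[];
}

// Hypothetical helper: serializes items as a flat JSON array.
// Each object is rebuilt from the two known fields, so any extra
// properties present at runtime are not written out.
function exportEvalify(items: EvalItem[]): string {
  const flat = items.map(({ prompt, expectations }) => ({ prompt, expectations }));
  return JSON.stringify(flat, null, 2);
}
```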

Format detection order

Detection runs most-specific first:

1. Skill Creator v2: skill_name wrapper, or items with id + expected_output + files
2. Evalify: prompt + expectations fallback
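The detection order can be sketched as below. All function names are hypothetical. The sketch also makes two assumptions the text leaves open: a single item carrying the Skill Creator fields is enough to select that format, and the Evalify fallback requires every item to have prompt and expectations.

```typescript
type FormatId = "anthropic/skillcreator/v2" | "evalify";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the item list from a bare array or an { evals: [...] } wrapper.
function extractItems(data: unknown): unknown[] {
  if (Array.isArray(data)) return data;
  if (isRecord(data)) {
    const evals = data["evals"];
    if (Array.isArray(evals)) return evals;
  }
  return [];
}

function hasKeys(item: unknown, keys: string[]): boolean {
  if (!isRecord(item)) return false;
  const record = item;
  return keys.every((key) => key in record);
}

// Hypothetical detector: most-specific format first, null if nothing matches.
function detectFormat(data: unknown): FormatId | null {
  const items = extractItems(data);

  // 1. Skill Creator v2: skill_name wrapper...
  if (isRecord(data) && "skill_name" in data) return "anthropic/skillcreator/v2";
  // ...or items with id + expected_output + files alongside prompt + expectations.
  const skillCreatorKeys = ["id", "expected_output", "files", "prompt", "expectations"];
  if (items.some((item) => hasKeys(item, skillCreatorKeys))) {
    return "anthropic/skillcreator/v2";
  }

  // 2. Evalify: prompt + expectations fallback.
  if (items.length > 0 && items.every((item) => hasKeys(item, ["prompt", "expectations"]))) {
    return "evalify";
  }
  return null;
}
```

Checking the more specific format first matters because every Skill Creator v2 item also satisfies the Evalify test (it has prompt and expectations), so the reverse order would never select Skill Creator v2 for bare arrays.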