Aiqrion v0.6 is in alpha — model output is not production-grade and may be incorrect. Aiqrion does not claim to beat Claude / ChatGPT / Gemini / Grok yet.

Aiqrion eval dashboard

Read-only view of Aiqrion eval reports. Sandbox runs use the mock backend, which returns deterministic answers; v0.7 will wire in real model-backed scores once GPUs are attached.

Recent runs

Run                              Backend          Model                   Cases  Pass  Fail  Status
v0.5/baseline-real-model.json    mock             aiqrion-core            8      8     0     pass
v0.5/adapter-eval.json           mock             aiqrion-core (adapter)  8      8     0     pass
v0.5/codex-real-model-run.json   mock             aiqrion-codex           1      1     0     pass
v0.5/rag-qdrant-run.json         qdrant-fallback  aiqrion-core            3      3     0     pass
v0.5/agentos-basic-flow.json     agentos          aiqrion-runtime         3      3     0     pass

Before / after comparison

v0.5/baseline-real-model.json vs v0.5/adapter-eval.json

  • regressions: 0
  • improvements: 0
  • unchanged: 8

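Sandbox runs hit the deterministic mock backend, so both reports contain identical answers and the comparison correctly reports zero deltas. This shows that the comparison pipeline runs end to end; it is not a measurement of model quality.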
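
The regressions / improvements / unchanged counts above can be derived from per-case pass/fail results. The following is a minimal sketch, not Aiqrion's actual implementation: the function name `compare_runs`, the case ids, and the assumption that each report reduces to a mapping from case id to a pass/fail boolean are all hypothetical.

```python
def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, int]:
    """Count per-case status changes between two eval runs.

    `before` and `after` map a case id to True (pass) or False (fail).
    This mapping is an assumed shape, not the real report schema.
    Only cases present in both runs are compared.
    """
    counts = {"regressions": 0, "improvements": 0, "unchanged": 0}
    for case_id in before.keys() & after.keys():
        if before[case_id] and not after[case_id]:
            counts["regressions"] += 1    # passed before, fails now
        elif not before[case_id] and after[case_id]:
            counts["improvements"] += 1   # failed before, passes now
        else:
            counts["unchanged"] += 1      # same status in both runs
    return counts


# Hypothetical data: eight cases that pass in both runs, mirroring
# the situation where both reports come from a deterministic backend.
baseline = {f"case-{i}": True for i in range(8)}
adapter = dict(baseline)

summary = compare_runs(baseline, adapter)
print(summary)
```

With identical inputs every case lands in the unchanged bucket, which is why a deterministic backend yields zero regressions and zero improvements regardless of which model is nominally under test.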