Aiqrion eval dashboard
Read-only view of Aiqrion eval reports. Sandbox runs use the mock backend, which returns deterministic answers; real model-backed scores are planned for v0.7, once GPUs are attached.
Recent runs
| Run | Backend | Model | Cases | Pass | Fail | Status |
|---|---|---|---|---|---|---|
| v0.5/baseline-real-model.json | mock | aiqrion-core | 8 | 8 | 0 | pass |
| v0.5/adapter-eval.json | mock | aiqrion-core (adapter) | 8 | 8 | 0 | pass |
| v0.5/codex-real-model-run.json | mock | aiqrion-codex | 1 | 1 | 0 | pass |
| v0.5/rag-qdrant-run.json | qdrant-fallback | aiqrion-core | 3 | 3 | 0 | pass |
| v0.5/agentos-basic-flow.json | agentos | aiqrion-runtime | 3 | 3 | 0 | pass |
Before / after comparison
v0.5/baseline-real-model.json vs v0.5/adapter-eval.json
- regressions: 0
- improvements: 0
- unchanged: 8
Both runs hit the deterministic mock backend, so zero deltas is the expected result. This verifies that the comparison pipeline works end to end; it is not evidence about model quality.
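The comparison counts above can be reproduced with a small per-case diff. The sketch below is illustrative only: the function name `compare_runs`, the case IDs, and the `{case_id: passed}` shape are assumptions, since the actual Aiqrion report schema is not shown here.

```python
from collections import Counter


def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, int]:
    """Classify each case present in both runs by how its pass/fail result changed.

    Hypothetical helper: the {case_id: passed} input shape is an assumption,
    not the real report format.
    """
    counts = Counter(regressions=0, improvements=0, unchanged=0)
    for case_id in before.keys() & after.keys():
        if before[case_id] and not after[case_id]:
            counts["regressions"] += 1   # passed before, fails now
        elif not before[case_id] and after[case_id]:
            counts["improvements"] += 1  # failed before, passes now
        else:
            counts["unchanged"] += 1     # same result in both runs
    return dict(counts)


# Eight cases that pass in both runs, mirroring the mock-backend comparison.
baseline = {f"case-{i}": True for i in range(8)}
adapter = dict(baseline)

summary = compare_runs(baseline, adapter)
print(summary)  # {'regressions': 0, 'improvements': 0, 'unchanged': 8}
```

Cases that appear in only one of the two runs are ignored in this sketch; a real comparison would probably want to report them separately.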