commit 58f9f221b100aca67ddf13e0d384c99c56a72f63 Author: Carlos Ouyang Date: Wed Apr 15 11:06:05 2026 +0800 Initial standalone memabra release diff --git a/README.md b/README.md new file mode 100644 index 0000000..fffe149 --- /dev/null +++ b/README.md @@ -0,0 +1,84 @@ +# memabra + +An intuition-driven control plane for agent memory and action selection. + +## What is memabra? + +memabra is a local-first, observable, trainable, and replayable agent memory and action orchestration system. + +Instead of being a simple memory database, memabra acts as a meta-cognitive controller for agents: given a task, it quickly decides whether to answer directly, recall memory, load a skill, or invoke a tool — and continuously improves this judgment based on task outcomes. + +## Install + +```bash +git clone https://github.com/TacitLab/memabra.git +cd memabra +python -m venv venv +source venv/bin/activate +pip install -e ".[dev]" +``` + +## Quick start + +### 1. See the available commands + +```bash +memabra --help +``` + +### 2. Run a dry-run evaluation + +A safe way to see the full workflow without actually promoting a new router version: + +```bash +memabra run --dry-run --format text +``` + +### 3. Check system status + +```bash +memabra status --format text +``` + +### 4. List saved router versions + +```bash +memabra version list --format text +``` + +### 5. Roll back to a previous version + +```bash +memabra version rollback --format text +``` + +## CLI subcommands + +| Command | Description | +|---------|-------------| +| `memabra run` | Run the online learning workflow | +| `memabra status` | Show current system state | +| `memabra version list` | List all saved router versions | +| `memabra version rollback ` | Roll back to a specific version | + +## Text output format + +By default, memabra prints JSON. For operator-friendly summaries, add `--format text`: + +- **Status** — current version, trajectory/report counts, latest report timing and promotion outcome. 
+- **Version list** — total count, current active version highlighted. +- **Workflow** — grouped into Summary, Baseline, Challenger, Deltas, and Decision sections with normalized `yes/no` flags and fixed-precision metrics. + +## Running tests + +```bash +pytest tests/ -q +``` + +## Project status + +See [docs/PROGRESS.md](docs/PROGRESS.md) for a detailed capability roadmap and [docs/DEMO.md](docs/DEMO.md) for walkthrough examples. + +## License + +MIT diff --git a/docs/ALPHA_ITERATION_1_PLAN.md b/docs/ALPHA_ITERATION_1_PLAN.md new file mode 100644 index 0000000..382423f --- /dev/null +++ b/docs/ALPHA_ITERATION_1_PLAN.md @@ -0,0 +1,252 @@ +# memabra Alpha Iteration 1 Plan + +> For Hermes: continue this plan autonomously in small TDD-driven increments. Each run should complete one or more concrete tasks, update this file's progress section, run targeted tests first, then run the full memabra test suite. + +Goal: turn memabra from a showable prototype into a safe self-improving alpha by adding an online learning loop with automatic training, evaluation, gated promotion, and rollback-safe router deployment. + +Architecture: +- Keep the current layered design. +- Do not replace existing routers; add an orchestration layer around them. +- Promotion must be benchmark-gated: no automatic router switch without passing evaluation thresholds. +- Persist every training/promotion attempt as an auditable artifact. + +Tech stack: +- Existing memabra Python package under `src/memabra/` +- Existing pytest suite under `tests/memabra/` +- Existing persistence via JSON artifacts; keep it simple for alpha + +--- + +## Acceptance criteria + +Alpha Iteration 1 is complete when memabra can: +1. detect newly accumulated trajectories +2. build a training dataset from eligible trajectories +3. train a challenger router automatically +4. run challenger vs baseline on a fixed benchmark set +5. promote challenger only if thresholds are met +6. save a versioned promoted router +7. 
keep an auditable training/promotion report +8. leave the currently active router unchanged when challenger loses + +--- + +## Implementation phases + +### Phase A — Benchmark-gated online learning loop + +#### Task A1: Add a promotion policy object +Objective: define explicit acceptance rules for promoting a challenger router. + +Files: +- Create: `src/memabra/promotion.py` +- Create: `tests/memabra/test_promotion.py` + +Required behavior: +- Define a `PromotionPolicy` dataclass +- Inputs should include at least: + - `min_reward_delta` + - `max_error_rate_increase` + - `max_latency_increase_ms` + - `required_task_count` +- Provide `evaluate(baseline, challenger) -> PromotionDecision` +- `PromotionDecision` should include: + - `accepted: bool` + - `reasons: list[str]` + - `metrics: dict` + +TDD steps: +1. Write failing tests for accepted and rejected cases. +2. Run targeted tests and verify failure. +3. Implement minimal policy logic. +4. Re-run targeted tests. +5. Re-run full memabra suite. + +#### Task A2: Add benchmark suite persistence +Objective: store and load a fixed benchmark task set for repeatable evaluations. + +Files: +- Create: `src/memabra/benchmarks.py` +- Create: `tests/memabra/test_benchmarks.py` + +Required behavior: +- Define a serializable benchmark suite format +- Load/save benchmark tasks from JSON +- Provide a default benchmark seed for memory/tool/skill/composite coverage + +TDD steps: +1. Write failing benchmark round-trip tests. +2. Verify RED. +3. Implement load/save helpers. +4. Verify GREEN. +5. Run full suite. + +#### Task A3: Add online training coordinator +Objective: orchestrate dataset selection, training, evaluation, and promotion. 
+ +Files: +- Create: `src/memabra/online_learning.py` +- Create: `tests/memabra/test_online_learning.py` + +Required behavior: +- Define `OnlineLearningCoordinator` +- It should: + - query trajectories from `ArtifactIndex` + - enforce minimum new trajectory count + - train a challenger with `DatasetBuilder` + - evaluate challenger with `Evaluator` + - apply `PromotionPolicy` + - save promoted routers via `RouterVersionStore` + - emit a structured report whether accepted or rejected + +TDD steps: +1. Write failing tests for: + - skip when too few new trajectories + - reject when policy fails + - accept and save version when policy passes +2. Verify failure. +3. Implement minimal coordinator. +4. Verify targeted tests. +5. Run full suite. + +### Phase B — Auditability and safe deployment + +#### Task B1: Add training run reports +Objective: persist every online-learning attempt, not just successful promotions. + +Files: +- Extend: `src/memabra/persistence.py` or create `src/memabra/training_reports.py` +- Create: `tests/memabra/test_training_reports.py` + +Required behavior: +- Save a JSON report per training run +- Include: + - timestamp + - source trajectory ids + - sample count + - baseline metrics + - challenger metrics + - promotion decision + - promoted version id if any + +#### Task B2: Add active router metadata tracking +Objective: make it obvious which router is active and why. + +Files: +- Extend: `src/memabra/router_versioning.py` +- Extend: `tests/memabra/test_router_versioning.py` + +Required behavior: +- Track metadata for current active router +- Record promotion source, benchmark result summary, and prior version +- Make rollback preserve audit trail + +### Phase C — Product surface and automation + +#### Task C1: Add app-level online learning entrypoint +Objective: expose one-call retrain/evaluate/promote behavior from `MemabraApp`. 
+ +Files: +- Extend: `src/memabra/app.py` +- Extend: `tests/memabra/test_app.py` + +Required behavior: +- Add a method like `run_online_learning_cycle(...)` +- Return a structured result dict/report + +#### Task C2: Add CLI entrypoint for the alpha loop +Objective: make the safe online-learning loop runnable from the command line. + +Files: +- Extend: `src/memabra/cli.py` +- Extend: `tests/memabra/test_cli_workflow.py` +- Update: `docs/projects/memabra/DEMO.md` + +Required behavior: +- Add a callable workflow that: + - seeds or uses existing artifacts + - runs one online-learning cycle + - prints the report JSON + +#### Task C3: Update docs and wrap-up materials +Objective: document the alpha loop clearly. + +Files: +- Update: `docs/projects/memabra/PROGRESS.md` +- Update: `docs/projects/memabra/ROADMAP.md` +- Update: `docs/projects/memabra/DEMO.md` +- Optional: create `docs/projects/memabra/ONLINE_LEARNING.md` + +Required behavior: +- Explain promotion gates +- Explain how to run one cycle manually +- Explain where reports and versions are stored + +--- + +## Suggested run order for autonomous 20-minute cycles + +Cycle group 1: +- A1 promotion policy +- A2 benchmark suite persistence + +Cycle group 2: +- A3 online training coordinator + +Cycle group 3: +- B1 training run reports +- B2 active router metadata tracking + +Cycle group 4: +- C1 app-level entrypoint +- C2 CLI workflow +- C3 docs cleanup + +--- + +## Estimated autonomous runs + +Recommended initial budget: 18 runs, one every 20 minutes. + +Reasoning: +- 3 to 4 runs for Phase A +- 3 to 4 runs for Phase B +- 2 to 3 runs for Phase C +- remaining runs as slack for regression fixes, docs cleanup, and one or two extra quality passes + +At 20 minutes per run, 18 runs give about 6 hours of autonomous iteration, which is a reasonable overnight alpha push. 
+ +--- + +## Progress tracker + +- [x] Task A1 — promotion policy +- [x] Task A2 — benchmark suite persistence +- [x] Task A3 — online training coordinator +- [x] Task B1 — training run reports +- [x] Task B2 — active router metadata tracking +- [x] Task C1 — app-level online learning entrypoint +- [x] Task C2 — CLI online learning workflow +- [x] Task C3 — docs cleanup and operator guidance +- [x] Task D1 — baseline version selection for online learning +- [x] Task E1 — task case index for episodic retrieval + +## Run log + +- 2026-04-14: Plan created. Ready for autonomous overnight execution. +- 2026-04-14 22:52 UTC: Completed Tasks A1–A3. Promotion policy, benchmark persistence, and online training coordinator implemented with tests. Full suite: 71 passed. +- 2026-04-14 23:22 UTC: Completed Tasks B1–C3. Training reports, active router metadata tracking, app/CLI entrypoints, and docs implemented with tests. Full suite: 78 passed. +- 2026-04-14 23:24 UTC: Quality pass — CLI main() now defaults to online-learning workflow, fixed schema test resource warning, added missing alpha module exports to package __init__.py. Full suite: 82 passed. +- 2026-04-14 23:50 UTC: Docs and repo hygiene pass — updated DEMO.md and ONLINE_LEARNING.md to reflect that `python -m src.memabra.cli` runs the online-learning workflow; added `docs/projects/memabra/demo-artifacts/` to `.gitignore`; verified CLI end-to-end (promoted=true, version saved, report emitted). Full suite: 82 passed. +- 2026-04-15 00:49 UTC: Safety and usability pass — added exception handling in `OnlineLearningCoordinator` so training/evaluation failures emit error reports instead of crashing; added CLI argument parsing (`--base-dir`, `--min-new-trajectories`); fixed `python -m src.memabra.cli` RuntimeWarning via lazy `cli` import; added `TrainingReportStore.get_report()` for by-id lookup; exported `BenchmarkTask` from package `__init__.py`; updated DEMO.md and ONLINE_LEARNING.md. Full suite: 88 passed. 
+- 2026-04-15 01:15 UTC: Repo hygiene and commit pass — verified end-to-end CLI workflow produced a promoted router, version, and report; updated `.gitignore` to exclude runtime artifact directories (`router-versions/`, `training-reports/`); committed entire memabra alpha codebase (67 files, 6,818 insertions). Full suite: 88 passed. +- 2026-04-15 02:00 UTC: Persistence pass — `OnlineLearningCoordinator` now supports `seen_trajectory_store` to persist seen trajectory IDs across restarts, preventing duplicate retraining in cron jobs. Added `test_coordinator_persists_seen_trajectory_ids_across_restarts`. Fixed evaluation leakage by refreshing the artifact index after benchmarking and marking post-evaluation trajectories as seen. Wired `seen_trajectory_store` through `app.py` and `cli.py`; CLI now defaults to `/seen-trajectories.json`. Added corresponding tests. Full suite: 91 passed. +- 2026-04-15 02:27 UTC: Dry-run pass — committed pending persistence-pass changes, then added `--dry-run` CLI flag and `dry_run` parameter through the full stack (`OnlineLearningCoordinator`, `app.py`, `cli.py`). In dry-run mode training and evaluation execute but promotion and version saving are skipped; an audit report is still emitted with `dry_run: true`. Added `test_coordinator_dry_run_does_not_promote_or_save_version` and `test_main_entrypoint_passes_dry_run_flag`. Updated `ONLINE_LEARNING.md`. Full suite: 93 passed. +- 2026-04-15 02:51 UTC: Baseline-version pass — added `baseline_version_id` parameter to `OnlineLearningCoordinator.run_cycle()`, `MemabraApp.run_online_learning_cycle()`, and CLI `--baseline-version` flag. This lets operators evaluate a challenger against a specific saved router version rather than the currently active one. Added tests for coordinator, app, and CLI. Updated `ONLINE_LEARNING.md`. Full suite: 96 passed. +- 2026-04-15 03:18 UTC: Verification pass — confirmed all tasks A1–D1 are complete and stable. 
Ran full memabra suite (96 passed) and end-to-end CLI workflow (promoted=true, version saved, report emitted). No code changes required; repo is clean and ready for operator review. +- 2026-04-15 04:02 UTC: Started Phase E — added `CaseIndex` (`src/memabra/case_index.py`) for task-level episodic retrieval. Maps normalized task inputs to the highest-reward trajectory ID, with JSON save/load. Added `tests/memabra/test_case_index.py` (4 tests). Full suite: 100 passed. +- 2026-04-15 04:27 UTC: Integrated `CaseIndex` into `MemabraApp` and `MemabraRunner` for episodic retrieval. Added app-level methods (`build_case_index`, `save_case_index`, `load_case_index`, `best_trajectory_for`). Runner now injects an episodic memory candidate when a case index hit occurs. Added CLI flags `--case-index` and `--rebuild-case-index`. Updated docs. Full suite: 107 passed. +- 2026-04-15 04:54 UTC: Added `case_index_path` support to `OnlineLearningCoordinator` so the case index is automatically rebuilt after each online-learning cycle (including benchmark-generated trajectories). Wired parameter through `app.py` and `cli.py`. Added tests for coordinator, app, and CLI. Full suite: 110 passed. +- 2026-04-15 05:18 UTC: Added `TrajectorySummarizer` (`src/memabra/trajectory_summary.py`) for generating human-readable trajectory summaries. Integrated summarizer into `MemabraRunner` so episodic memory candidates contain rich summaries when a `persistence_store` is available. Added `tests/memabra/test_trajectory_summary.py` (4 tests) and updated runner test. Full suite: 114 passed. +- 2026-04-15 05:42 UTC: Added CLI `--status` flag (`src/memabra/cli.py`) to print current system state (active router version, version count, trajectory count, report count, latest report summary) without running a learning cycle. Added `tests/memabra/test_cli_workflow.py::test_main_status_flag_prints_status_and_skips_workflow`. Full suite: 115 passed. 
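+- 2026-04-15 06:05 UTC: Added CLI `--rollback` and `--list-versions` flags for operator-safe router version management. Added error handling for missing rollback targets (exits 1 with clean message). Added corresponding tests. Full suite: 118 passed. Updated `ONLINE_LEARNING.md` and `DEMO.md` documentation. diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 0000000..550fb35 --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,219 @@ +# Architecture + +## 1. Problem definition + +The problem we want to solve is not "how to make the model remember more", but: +when an agent encounters a task, how can it quickly decide, under limited context, a limited tool budget, and limited time, whether to invoke memory, a skill, or a tool, and how can that decision process be trained and corrected. + +## 2. System overview + +The system uses a four-layer architecture. + +### 2.1 Retrieval Layer (candidate recall) +Inputs: +- current user task +- short conversation summary +- current environment state +- failure history / recent corrections + +Outputs: +- top-k memory candidates +- top-k skill candidates +- top-k tool candidates + +Responsibilities: +- recall candidates from different sources +- normalize them into a standard candidate format +- make no final decision; only narrow the search space + +### 2.2 Policy Layer (intuition / routing) +Inputs: +- current task representation +- candidate set +- historical selection features +- cost and risk signals + +Outputs: +- answer directly +- read a specific memory +- load a specific skill +- call a specific tool +- composite action (e.g. skill first, then tool) +- ask for clarification + +Responsibilities: +- simulate "intuition" +- make a fast action selection first +- can later be upgraded step by step from rules to a classifier, reranker, bandit, or RL policy + +### 2.3 Execution Layer +Responsibilities: +- inject memory into the context +- load skill instructions +- call real tools +- record execution steps, latency, errors, and outputs + +### 2.4 Evaluation Layer (feedback / attribution) +Responsibilities: +- judge whether the task succeeded +- analyze step count, retry count, error rate, and number of user corrections +- decompose the reward +- produce trainable trajectories + +Without this layer there is no real "learning", only guesswork-driven parameter tuning. + +## 3. Unified object model + +Although memory, skill, and tool differ in nature, they can be unified as candidate objects during recall and routing: + +```json +{ + "id": "string", + "type": "memory|skill|tool", + "title": "string", + "summary": "string", + "triggers": ["string"], + "cost": 0.0, + "confidence": 0.0, + "success_rate": 0.0, + "freshness": 0.0, + "risk": 0.0, + "embedding": "vector-ref", + "tags": ["string"], + "source": "user|system|generated|external" +} +``` + +Note: what is unified is the candidate interface, not the semantic ontology. +The three object types must keep their boundaries: +- memory stores facts +- skill stores procedures +- tool stores action capabilities + +## 4. 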
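Memory system layers + +### 4.1 Semantic Memory (factual memory) +Examples: +- user preferences +- machine environment +- project conventions +- API limits + +### 4.2 Procedural Memory +That is, skills: +- the handling workflow for a class of tasks +- lessons learned from past pitfalls +- verification steps + +### 4.3 Episodic Memory +- the concrete trajectory of a specific task +- which resources were used at the time +- why it succeeded or failed + +### 4.4 Working Memory +- temporary state of the current task +- intermediate products of the current reasoning round +- should not be written directly into long-term memory + +## 5. Training strategy: external policy first, end-to-end later + +### 5.1 Phase A: leave the base model weights unchanged +First train a small policy model that decides: +- whether to look up memory +- which kind of memory to look up +- whether a skill is needed +- which tool to use first + +Possible implementations: +- rules + score fusion +- lightweight classifier +- reranker +- contextual bandit + +### 5.2 Phase B: learn reranking / routing from trajectories +Training inputs: +- task context +- candidate set +- action actually taken +- resulting reward + +Training objectives: +- maximize task completion rate +- minimize useless calls +- reduce how often users must repeat information +- reduce unnecessary context bloat + +### 5.3 Phase C: end-to-end experiments +Only worth considering when all of the following hold: +- high-quality trajectory data already exists +- credit assignment is feasible +- a stable offline evaluation environment exists +- catastrophic forgetting can be controlled + +## 6. Feedback & Reward design + +Reward cannot look only at whether the task succeeded. It must be split into several terms: +- task_success: whether the task was ultimately completed +- efficiency: how many steps were used +- retrieval_hit: whether the key memory/skill/tool was hit +- user_correction_penalty: whether the user had to correct the agent +- tool_error_penalty: whether an invalid tool call was triggered +- context_cost_penalty: whether the context grew excessively +- latency_penalty: whether the run was too slow + +These can be combined as: + +```text +R = a*task_success + b*retrieval_hit - c*tool_error - d*user_correction - e*latency - f*context_cost +``` + +## 7. Key difficulties + +### 7.1 Credit Assignment +When a task succeeds, which component deserves the credit? +The candidate set, the final choice, and the unchosen alternatives must all be recorded to enable counterfactual analysis. + +### 7.2 False Reinforcement +A wrong memory that keeps getting hit reinforces itself. +This requires: +- confidence scores +- revocability +- last-verified time +- source tracking + +### 7.3 Exploitation vs Exploration +Always choosing the safest option makes the system conservative, and it never learns new patterns. +A safe exploration mechanism is needed. + +### 7.4 Type Boundary Collapse +If memory, skill, and tool are mixed into one big vector pool, the system becomes increasingly blurry. + +## 8. Recommended MVP + +### MVP-1: observable system +- define the object schema +- define the event schema +- record trajectories uniformly +- implement basic retrieval +- route with rules + +### MVP-2: lightweight learned routing +- add a candidate scorer +- train an action selector from high-quality trajectories +- run offline replay evaluation + +### MVP-3: online adaptation +- use bandit / preference updates +- fine-tune the routing policy based on task outcomes + +### MVP-4: end-to-end testbed +- small-scale experimental training +- compare against the layered approach +- verify whether there is a real gain + +## 9. Core principles + +1. Observable first, learnable second +2. Learn routing first, then the brain +3. Layered attribution first, end-to-end optimization second +4. 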
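Optimize "when to rely on what", rather than blindly optimizing for "the model looks smarter" \ No newline at end of file diff --git a/docs/DECISIONS.md b/docs/DECISIONS.md new file mode 100644 index 0000000..7e01ce6 --- /dev/null +++ b/docs/DECISIONS.md @@ -0,0 +1,94 @@ +# Design Decisions + +## D-001: End-to-end training is not the first-phase goal + +Decision: +The first phase adopts a layered architecture and does not directly train a black-box large model that maps tasks to actions. + +Reasons: +- sparse feedback +- credit assignment is hard +- with too little data the model easily learns the wrong thing +- interpretability is too poor, making debugging hard + +Impact: +The project first builds the observability, logging, router, and reward layers. + +## D-002: Unify memory, skill, and tool behind one candidate interface without conflating their types + +Decision: +During recall and ranking, the three share a unified candidate schema; during storage, execution, and evaluation, strong type boundaries are kept. + +Reasons: +- unified recall simplifies routing decisions +- keeping type boundaries avoids semantic collapse + +Impact: +Later schema design must support both unified features and type-specific fields. + +## D-003: Memory is split into four layers: facts / procedures / episodes / working + +Decision: +The long-term system distinguishes at least: +- facts +- procedures +- episodes +- working memory + +Reason: +"Memory" is not one blob of text; effective human intuition comes from several memory systems working together. + +Impact: +Every write must first decide which layer it belongs to, rather than being dumped straight into a single vector store. + +## D-004: Optimize the routing policy first, then consider learning the base model's internal weights + +Decision: +The learning target is first the external policy, not the foundation model's parameters. + +Reasons: +- small models are cheaper +- training is more stable +- experiment results are easier to compare +- better suited to local deployment + +Impact: +Router features, training samples, and an offline evaluation framework need dedicated design. + +## D-005: Reward must be decomposed; a single task success/failure signal is not used + +Decision: +Reward is decomposed into factors such as success, efficiency, retrieval_hit, user_correction, tool_error, latency, and context_cost. + +Reason: +Looking only at task success hides many quality problems in intermediate behavior. + +Impact: +Event-level logging is required; storing only the final answer is not enough. + +## D-006: All learning is built on replayable trajectories + +Decision: +Every policy update must be traceable to a complete trajectory. + +Reason: +Without replay, policy regressions cannot be diagnosed; without replay, human audit is also impossible. + +Impact: +The trajectory schema and replay tooling become infrastructure rather than optional extras. + +## D-007: The project is officially named memabra + +Decision: +The official project name is `memabra`. + +Subtitle: +An intuition-driven control plane for agent memory and action selection. + +Reasons: +- a short name that can be branded and spread is needed +- the subtitle supplies the technical substance +- it avoids the old name misleading people into seeing the project as "just a memory management tool" + +Impact: +All subsequent prototype code, docs, schema identifiers, and demo materials use memabra consistently. \ No newline at end of file diff --git a/docs/DEMO.md b/docs/DEMO.md new file mode 100644 index 0000000..913805e --- /dev/null +++ b/docs/DEMO.md @@ -0,0 +1,148 @@ +# Demo + +memabra now has a polished wrap-up workflow in addition to the lower-level demo app. 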
+ +## Quick run + +If you installed the repo in editable mode, prefer the dedicated CLI command: + +```bash +source venv/bin/activate +memabra +``` + +The legacy developer entrypoint still works too: + +```bash +source venv/bin/activate +python -m src.memabra.cli +``` + +This runs the online-learning loop: it seeds demo tasks, trains a challenger router, evaluates it against a benchmark suite, promotes it if thresholds are met, and prints a JSON report. + +You can override the default artifact directory and minimum trajectory threshold: + +```bash +source venv/bin/activate +memabra run --base-dir /custom/artifacts --min-new-trajectories 5 +``` + +You can also enable episodic retrieval by rebuilding the case index from saved trajectories: + +```bash +source venv/bin/activate +memabra run --rebuild-case-index +``` + +You can check system status, list versions, or roll back without running a learning cycle: + +```bash +source venv/bin/activate +memabra status +memabra version list +memabra version rollback 20260414-123456 +``` + +If you want operator-friendly output instead of raw JSON, use `--format text`: + +```bash +source venv/bin/activate +memabra status --format text +memabra version list --format text +memabra version rollback 20260414-123456 --format text +memabra run --dry-run --format text +``` + +The text formatter is aimed at operators: status output includes the latest report timing/outcome, version listings highlight the currently active router version, and workflow output is grouped into summary/baseline/challenger/deltas/decision sections with normalized yes/no and fixed-precision metrics. + +You can also call it programmatically: + +```bash +source venv/bin/activate +python - <<'PY' +from src.memabra.cli import run_online_learning_workflow +result = run_online_learning_workflow() +print(result) +PY +``` + +The online-learning workflow will: +1. build a demo app +2. seed example tasks (if no trajectories exist yet) +3. run one online-learning cycle +4. 
train a challenger router +5. evaluate it against the baseline on a fixed benchmark suite +6. promote it only if the promotion policy accepts +7. persist a training report under `training-reports/` +8. print a JSON report + +## Python API + +```python +from src.memabra.cli import run_wrapup_workflow, run_online_learning_workflow + +# Legacy wrap-up demo +result = run_wrapup_workflow() +print(result) + +# Safe online-learning loop with benchmark-gated promotion +result = run_online_learning_workflow() +print(result) +``` + +## Lower-level demo app + +You can still drive the app manually: + +```bash +source venv/bin/activate +python - <<'PY' +from src.memabra.app import build_demo_app +app = build_demo_app() + +for prompt in [ + 'Use my telegram preference for this answer.', + 'Check the current system status.', + 'Deploy this service with the usual workflow.', +]: + trajectory = app.run_task(prompt, channel='telegram', user_id='oza') + print(prompt) + print(trajectory['decisions'][0]['decision_type'], trajectory['outcome']['status'], trajectory['reward']['total']) + print([event['event_type'] for event in trajectory['events']]) + print('---') + +print(app.replay_summary()) +PY +``` + +## Output locations + +By default the workflows write to: +- `docs/projects/memabra/demo-artifacts/trajectories/` +- `docs/projects/memabra/demo-artifacts/memories/` +- `docs/projects/memabra/demo-artifacts/router-versions/` +- `docs/projects/memabra/demo-artifacts/training-reports/` + +## What this proves + +The alpha is able to demonstrate the whole loop: +- retrieval +- routing +- execution +- persistence +- replay +- training +- evaluation +- router versioning +- benchmark-gated promotion +- auditable training reports + +## Limits + +This is still an alpha: +- learning is lightweight, not a deep model +- storage is JSON-file based +- promotion policy thresholds are manually configured +- tool/skill integration is still narrower than a production agent platform + +But it is now a 
safe, self-improving alpha, not just a pile of modules. diff --git a/docs/EXECUTION_AND_PERSISTENCE.md b/docs/EXECUTION_AND_PERSISTENCE.md new file mode 100644 index 0000000..17bee56 --- /dev/null +++ b/docs/EXECUTION_AND_PERSISTENCE.md @@ -0,0 +1,77 @@ +# Execution and Persistence + +## Goal + +Give memabra the two pieces of backbone that make the system actually land: +- execution: carry routing decisions into an executable action layer +- persistence: write trajectories and memory records to disk + +## Current implementation + +### execution.py +Provides: +- `ActionResult` +- `MemoryExecutor` +- `SkillExecutor` +- `ToolExecutor` (formerly MockToolExecutor, now upgraded to connect to real backends) +- `ExecutionEngine` +- `ToolBackend` protocol (supports passing `params`) +- `LocalFunctionToolAdapter` — maps a tool to a local Python function +- `SubprocessToolAdapter` — maps a tool to a shell command +- `ToolRegistry` — registers, looks up, and executes tools by `tool_id` + +Current behavior: +- `inject_memory` emits a `memory_injected` event and, when a memory store is present, marks `last_used_at` +- `load_skill` emits a `skill_loaded` event +- `call_tool` invokes a real backend through the `ToolBackend` protocol and emits `tool_called` and `tool_result` events +- `RouteDecision` now carries `selected_payloads`, so candidate parameters can be passed to the backend via `ToolExecutor` +- other decision_type values are a noop for now + +The significance of this step: +memabra has an execution stage for the first time, not just a policy stage. +The tool layer can now plug into real local-function or subprocess backends and is no longer a pure mock. + +### persistence.py +Provides: +- `PersistenceStore` + +Current capabilities: +- save a trajectory to `artifacts/trajectories/` +- read a trajectory +- list trajectory files +- save a memory record to `artifacts/memories/` +- read a memory record +- list memory files + +This means prototype artifacts are no longer just in-memory ephemera. + +### runner writeback integration +The runner now supports: +- attaching an execution engine +- attaching a persistence store +- attaching a memory store +- appending execution events after execution +- optionally writing the trajectory to disk +- basic writeback / mark_used for memory-inject decisions + +## Current closed loop + +The minimal system flow is now: +task -> retrieval -> router -> execution -> trajectory -> validation -> persistence -> replay + +This is starting to feel like a real agent runtime. + +## Current limitations + +- ~~tool execution is still mocked~~ upgraded to pluggable real backends +- skill execution exists only at the event level and does not actually load a skill +- writeback logic is still crude +- persistence is currently JSON files with no index layer + +## Suggested next steps + +1. ~~Build a real `ToolExecutor` / `SkillExecutor` adapter protocol~~ tool adapter is done +2. 
Build a real `SkillExecutor` adapter (load the skill payload from the file system) +3. Connect persistence as the default data source for replay +4. Add real outcome / reward update logic to the runner +5. Add richer telemetry and failure-event attribution diff --git a/docs/NAMING.md b/docs/NAMING.md new file mode 100644 index 0000000..c8745df --- /dev/null +++ b/docs/NAMING.md @@ -0,0 +1,48 @@ +# Naming + +The final name is: + +# memabra + +Subtitle: +An intuition-driven control plane for agent memory and action selection. + +## Rationale + +The name works because it satisfies two things at once: + +1. As a brand name, it is short, memorable, and distinctive. +2. As a system name, combined with the subtitle, it accurately conveys that the project is not a "memory store" but an action selection and control system over memory, skills, and tools. + +## Naming strategy + +- Brand name: `memabra` +- Technical description: `An intuition-driven control plane for agent memory and action selection.` + +With this split: +- `memabra` makes it memorable +- the subtitle makes it understandable + +## Why not a purely functional name + +A name that describes the function directly, such as `Agent Memory Manager`, is too narrow: +- it sounds too much like a storage tool +- it does not convey routing / policy / evaluation / learning +- it does not convey that this is the agent's metacognitive controller + +## Suggested internal wording + +In technical docs, memabra can be described as: +- local-first metacognitive router +- agent memory and action orchestration system +- intuition-driven control plane + +These three phrasings suit, respectively: +- research contexts +- engineering contexts +- external introductions + +## Conclusion + +The name no longer emphasizes "memory manager" but "intuition-driven control". +This is closer to the project's real skeleton. \ No newline at end of file diff --git a/docs/ONLINE_LEARNING.md b/docs/ONLINE_LEARNING.md new file mode 100644 index 0000000..bd0ef43 --- /dev/null +++ b/docs/ONLINE_LEARNING.md @@ -0,0 +1,171 @@ +# Online Learning Operator Guide + +## What it does + +memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met. 
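The control flow of one such cycle can be sketched roughly as follows. This is a simplified illustration, not the `OnlineLearningCoordinator` implementation: `run_cycle_sketch`, its callback parameters (`train`, `evaluate`, `accept`), and the report keys are hypothetical names chosen for the example.

```python
def run_cycle_sketch(trajectory_ids, seen_ids, min_new, train, evaluate, accept):
    """Run one retrain/evaluate/promote pass and return a report dict.

    All parameter names and report keys here are illustrative assumptions.
    """
    # Only trajectories that have not been processed before count as new.
    new_ids = [t for t in trajectory_ids if t not in seen_ids]
    if len(new_ids) < min_new:
        # Too little fresh data: skip without training anything.
        return {"skipped": True, "new_count": len(new_ids), "promoted": False}

    challenger = train(new_ids)
    baseline_metrics, challenger_metrics = evaluate(challenger)
    promoted = bool(accept(baseline_metrics, challenger_metrics))

    # Remember what was processed so the next run does not retrain on it.
    seen_ids.update(new_ids)

    return {
        "skipped": False,
        "new_count": len(new_ids),
        "baseline": baseline_metrics,
        "challenger": challenger_metrics,
        "promoted": promoted,
    }
```

The report is returned whether or not the challenger is promoted, which mirrors the idea that every attempt should leave an auditable record.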
+ +## How to run one cycle + +### From Python + +```python +from src.memabra.cli import run_online_learning_workflow + +result = run_online_learning_workflow() +print(result) +``` + +### From the shell + +```bash +source venv/bin/activate +python -m src.memabra.cli +``` + +Or with custom options: + +```bash +source venv/bin/activate +python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5 +``` + +By default the CLI persists seen trajectory IDs to `/seen-trajectories.json` so repeated runs skip already-processed data. You can override the path: + +```bash +source venv/bin/activate +python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json +``` + +### Dry-run mode + +To train and evaluate a challenger without actually promoting it or saving a new router version: + +```bash +source venv/bin/activate +python -m src.memabra.cli --dry-run +``` + +This still produces a training report (with `dry_run: true`) so you can inspect what would have happened before allowing a real promotion. + +### Evaluate against a specific baseline version + +By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead: + +```bash +source venv/bin/activate +python -m src.memabra.cli --baseline-version 20260414-123456 +``` + +This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record `baseline_version_id` for audit. + +### Episodic retrieval with case index + +You can load or rebuild a case index for episodic retrieval during task execution: + +```bash +source venv/bin/activate +python -m src.memabra.cli --rebuild-case-index +``` + +This builds a `CaseIndex` from all saved trajectories and saves it to the default path (`/case-index.json`). 
On subsequent runs, load it without rebuilding: + +```bash +source venv/bin/activate +python -m src.memabra.cli --case-index /custom/artifacts/case-index.json +``` + +When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval. + +When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router. + +Or inline: + +```bash +source venv/bin/activate +python - <<'PY' +from src.memabra.cli import run_online_learning_workflow +print(run_online_learning_workflow()) +PY +``` + +## Promotion gates + +A challenger is promoted only when **all** of the following are true: + +- `reward_delta >= min_reward_delta` — the challenger must improve average reward by at least this amount +- `error_rate_delta <= max_error_rate_increase` — the challenger must not increase errors beyond this limit +- `latency_delta_ms <= max_latency_increase_ms` — the challenger must not become slower beyond this limit +- `task_count >= required_task_count` — the benchmark must include at least this many tasks + +Default policy in the CLI workflow is lenient for alpha exploration. In production you should tighten these thresholds. 
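The four gates above can be sketched as a small policy object. This is a minimal illustration, not the code in `src/memabra/promotion.py`: the threshold field names follow the gates listed above, while the metric keys (`avg_reward`, `error_rate`, `avg_latency_ms`, `task_count`) and the default threshold values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PromotionDecision:
    accepted: bool
    reasons: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)


@dataclass
class PromotionPolicy:
    # Default values are illustrative only, not memabra's real defaults.
    min_reward_delta: float = 0.01
    max_error_rate_increase: float = 0.0
    max_latency_increase_ms: float = 50.0
    required_task_count: int = 10

    def evaluate(self, baseline: dict, challenger: dict) -> PromotionDecision:
        # Metric keys are assumed names for aggregated benchmark results.
        metrics = {
            "reward_delta": challenger["avg_reward"] - baseline["avg_reward"],
            "error_rate_delta": challenger["error_rate"] - baseline["error_rate"],
            "latency_delta_ms": challenger["avg_latency_ms"] - baseline["avg_latency_ms"],
            "task_count": challenger["task_count"],
        }
        # Collect one reason per failed gate; an empty list means acceptance.
        reasons = []
        if metrics["reward_delta"] < self.min_reward_delta:
            reasons.append("reward_delta below min_reward_delta")
        if metrics["error_rate_delta"] > self.max_error_rate_increase:
            reasons.append("error_rate_delta above max_error_rate_increase")
        if metrics["latency_delta_ms"] > self.max_latency_increase_ms:
            reasons.append("latency_delta_ms above max_latency_increase_ms")
        if metrics["task_count"] < self.required_task_count:
            reasons.append("task_count below required_task_count")
        return PromotionDecision(accepted=not reasons, reasons=reasons, metrics=metrics)
```

Collecting every failed gate, rather than stopping at the first, keeps the rejection reasons complete for the training report.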
+ +## Where reports and versions are stored + +By default everything lands under: + +- `docs/projects/memabra/demo-artifacts/trajectories/` — raw task trajectories +- `docs/projects/memabra/demo-artifacts/router-versions/versions/` — versioned router weights +- `docs/projects/memabra/demo-artifacts/router-versions/current.json` — active router metadata (includes promotion source, benchmark summary, prior version, rollback history) +- `docs/projects/memabra/demo-artifacts/training-reports/` — one JSON report per training run + +## What happens when the challenger loses + +- The active router in the app **remains unchanged** +- A training report is still saved with the rejection reasons +- No new version is registered as current + +## Rolling back + +You can roll back to any previous version from Python: + +```python +from src.memabra.router_versioning import RouterVersionStore + +store = RouterVersionStore() +store.rollback("20260414-123456") +current = store.get_current() +print(current) +``` + +Or from the CLI: + +```bash +source venv/bin/activate +python -m src.memabra.cli --rollback 20260414-123456 +``` + +To see all available versions before rolling back: + +```bash +source venv/bin/activate +python -m src.memabra.cli --list-versions +``` + +Rollback preserves an audit trail in `current.json` (`rollback_from`, `rolled_back_at`). 
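One way a rollback can keep that audit trail is sketched below. `TinyVersionStore` is a hypothetical, file-based stand-in and not memabra's `RouterVersionStore`; only the `rollback_from` and `rolled_back_at` field names and the `versions/` plus `current.json` layout come from the text above, and the `current_version_id` key is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class TinyVersionStore:
    """Hypothetical stand-in for a router version store (not memabra's API)."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)
        self.versions_dir = self.base_dir / "versions"
        self.current_path = self.base_dir / "current.json"
        self.versions_dir.mkdir(parents=True, exist_ok=True)

    def save(self, version_id, weights):
        # Persist the weights and mark this version as the active one.
        (self.versions_dir / f"{version_id}.json").write_text(json.dumps(weights))
        self.current_path.write_text(json.dumps({"current_version_id": version_id}))

    def get_current(self):
        if not self.current_path.exists():
            return None
        return json.loads(self.current_path.read_text())

    def rollback(self, version_id):
        # Refuse to point at a version that was never saved.
        if not (self.versions_dir / f"{version_id}.json").exists():
            raise ValueError(f"unknown version: {version_id}")
        previous = self.get_current() or {}
        current = {
            "current_version_id": version_id,
            # Audit fields: which version was active before, and when we left it.
            "rollback_from": previous.get("current_version_id"),
            "rolled_back_at": datetime.now(timezone.utc).isoformat(),
        }
        self.current_path.write_text(json.dumps(current))
        return current
```

Raising on an unknown version matches the operator-facing behavior described for the CLI, where a missing rollback target is reported as an error instead of silently changing the active router.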
+ +## Status check + +To quickly inspect the current system state without running a learning cycle: + +```bash +source venv/bin/activate +python -m src.memabra.cli --status +``` + +## Architecture summary + +``` +Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger) + | + v +BenchmarkSuite -> Evaluator -> baseline vs challenger + | + v + PromotionPolicy.evaluate() + | + +-------------------+-------------------+ + | accepted | rejected + v v + RouterVersionStore.save() training report saved + app.set_router(challenger) active router unchanged +``` diff --git a/docs/PROGRESS.md b/docs/PROGRESS.md new file mode 100644 index 0000000..ad3e26b --- /dev/null +++ b/docs/PROGRESS.md @@ -0,0 +1,162 @@ +# memabra Progress + +## Current status + +Project status: safe self-improving alpha, benchmark-gated online learning loop complete +Date: 2026-04-15 +Project: memabra +Subtitle: An intuition-driven control plane for agent memory and action selection. + +## What exists now + +memabra now has a complete safe self-improving alpha control-plane loop: +- candidate retrieval +- routing decisions +- memory / skill / tool execution +- telemetry events +- trajectory construction +- runtime validation +- artifact persistence +- replay and analytics +- artifact indexing and dataset slicing +- lightweight learning router training +- A/B evaluation +- router weight versioning and rollback +- benchmark-gated promotion with explicit policy thresholds +- auditable training reports +- exception-safe online learning coordinator +- configurable CLI entrypoint +- persisted seen-trajectory tracking across restarts (safe for cron jobs) +- dry-run mode for training/evaluation without promotion risk +- baseline version selection for challenger evaluation +- task case index (`CaseIndex`) for episodic retrieval: maps normalized inputs to the best past trajectory ID +- `CaseIndex` integration into `MemabraApp` (build, save, load, lookup) and `MemabraRunner` (injects 
episodic candidate on matching inputs) +- CLI flags `--case-index` and `--rebuild-case-index` for operator-managed episodic retrieval +- `OnlineLearningCoordinator` auto-rebuilds case index after each cycle when `case_index_path` is provided, ensuring benchmark-generated trajectories are indexed +- `TrajectorySummarizer` generates human-readable trajectory summaries from task input, decisions, outcome, and reward +- `MemabraRunner` enriches episodic memory candidate summaries using `TrajectorySummarizer` when `persistence_store` is available +- CLI `--status` flag prints current system state (active router version, counts, latest report) without triggering a learning cycle +- CLI is now subcommand-driven (`run`, `status`, `version list`, `version rollback`) with a dedicated packaged `memabra` entrypoint +- CLI `--format text` mode provides operator-friendly summaries for status checks, version listings, rollbacks, and workflow runs, including latest report details, current-version highlighting, sectioned workflow summaries, normalized yes/no flags, and fixed-precision benchmark/promotion metrics + +## Major completed capabilities + +### Foundations +- project naming, architecture, roadmap, decisions, reward spec +- candidate / event / trajectory / memory schemas +- prototype package structure under `src/memabra/` + +### Runtime path +- `retrieval.py`: typed candidate retrieval +- `router.py`: heuristic router, feature-scoring router, learning router +- `execution.py`: memory, skill, tool executors and adapters +- `runner.py`: end-to-end task -> trajectory orchestration +- `persistence.py`: trajectory and memory artifact storage +- `replay.py`: replay summaries over examples and persisted runs +- `memory_store.py`: typed memory records with verify/revoke support + +### Adapters and evaluation +- real tool adapters: + - `LocalFunctionToolAdapter` + - `SubprocessToolAdapter` + - `ToolRegistry` +- real skill loading: + - `FileSystemSkillBackend` +- richer evaluation 
path: + - `OutcomeEngine` + - `RewardEngine` + - `ArtifactIndex` + - `DatasetBuilder` + - `Evaluator` + - `RouterVersionStore` +- Alpha Iteration 1 — online learning loop: + - `PromotionPolicy` with benchmark-gated promotion rules + - `BenchmarkSuite` persistence (JSON load/save + default seed) + - `OnlineLearningCoordinator` for retrain/evaluate/promote cycles + - exception-safe coordinator: training/evaluation failures emit auditable error reports instead of crashing + - `TrainingReportStore.get_report()` for by-id report lookup + +### Product/demo surface +- `app.py`: `MemabraApp`, demo builders, artifact index access, training hooks, `run_online_learning_cycle` +- `cli.py`: wrap-up workflow and `run_online_learning_workflow` with benchmark-gated promotion +- `cli.py`: argument parsing (`--base-dir`, `--min-new-trajectories`) and clean `python -m src.memabra.cli` execution +- `DEMO.md`: runnable walkthrough with CLI options + +## Current test status + +Command: +`source venv/bin/activate && python -m pytest tests/memabra -q` + +Latest result: +`118 passed` + +All alpha iteration 1 source, tests, and documentation have been committed to the repository (commit `34cf507c`). 
+ +## Most important current files + +### Core package +- `src/memabra/app.py` +- `src/memabra/cli.py` +- `src/memabra/router.py` +- `src/memabra/runner.py` +- `src/memabra/execution.py` +- `src/memabra/evaluator.py` +- `src/memabra/router_versioning.py` +- `src/memabra/promotion.py` +- `src/memabra/online_learning.py` +- `src/memabra/training_reports.py` +- `src/memabra/benchmarks.py` +- `src/memabra/case_index.py` + +### Tests +- `tests/memabra/test_app.py` +- `tests/memabra/test_cli_workflow.py` +- `tests/memabra/test_package_exports.py` +- `tests/memabra/test_promotion.py` +- `tests/memabra/test_online_learning.py` +- `tests/memabra/test_training_reports.py` +- `tests/memabra/test_benchmarks.py` +- `tests/memabra/test_router_versioning.py` +- `tests/memabra/test_evaluator.py` +- `tests/memabra/test_router_protocol.py` +- `tests/memabra/test_execution_persistence.py` + +## Wrap-up status + +The project is now in a safe self-improving alpha state. +It can: +- run realistic demo tasks +- persist trajectories +- replay and inspect results +- train a lightweight router from saved artifacts +- compare baseline vs challenger routers +- apply a promotion policy with explicit thresholds +- save and reload router versions with metadata +- emit auditable training reports +- run an online-learning cycle from the CLI +- leave the active router unchanged when challenger fails +- survive training/evaluation failures gracefully and emit error reports +- accept CLI overrides for artifact directory and trajectory thresholds +- persist seen-trajectory state across restarts so cron jobs don't retrain on the same data +- default CLI `main()` persists seen trajectories to `/seen-trajectories.json` +- run in dry-run mode to evaluate a challenger without promoting it +- run in baseline-version mode to compare a challenger against a specific saved version instead of the currently active router +- index successful task cases by normalized input for episodic retrieval (`CaseIndex`) +- 
build/save/load a case index from `MemabraApp` +- inject episodic memory candidates during runner retrieval when a similar past task exists +- use `--case-index` and `--rebuild-case-index` CLI flags to manage episodic retrieval +- online-learning cycles automatically refresh the case index after training/evaluation when a case-index path is configured +- episodic memory candidates now include rich human-readable summaries when the past trajectory is available via `persistence_store` +- CLI `--status` flag provides a quick read-only snapshot of the active router, versions, trajectories, and reports +- CLI `--rollback` and `--list-versions` flags enable operator-safe router version management without touching code + +## Next sensible frontier + +1. tighter integration with real Hermes trajectories +2. multi-turn conversation state and working-memory updates +3. richer real-world tool ecosystem integration (MCP, web, git, files) +4. stronger storage/index backend beyond plain JSON files + +## One-line summary + +memabra is now a runnable, test-covered safe self-improving alpha for agent memory/action routing, with online learning, benchmark-gated promotion, and auditable reports. 
diff --git a/docs/PROTOTYPE_LAYOUT.md b/docs/PROTOTYPE_LAYOUT.md new file mode 100644 index 0000000..5b8b855 --- /dev/null +++ b/docs/PROTOTYPE_LAYOUT.md @@ -0,0 +1,90 @@ +# Prototype Layout + +## Goal + +Establish a minimal runnable prototype directory structure for memabra, so that the upcoming rule-based router, replay harness, sample trajectories, and training-sample generation each have a clear home. + +## Directory structure + +```text +src/memabra/ +├── __init__.py +├── candidate_types.py # unified candidate objects and decision types +├── router.py # Rule-based router baseline +├── telemetry.py # runtime structures for events, rewards, and trajectories +├── reward.py # reward aggregation logic +├── retrieval.py # later: candidate retrieval interface +├── memory_store.py # later: long-term memory storage and access +├── replay.py # later: trajectory replay and evaluation +└── schemas.py # later: schema loading/validation + +tests/memabra/ +└── test_router_smoke.py # baseline smoke test +``` + +## Already in place + +Created: +- `src/memabra/__init__.py` +- `src/memabra/candidate_types.py` +- `src/memabra/router.py` +- `src/memabra/telemetry.py` +- `src/memabra/reward.py` +- `tests/memabra/test_router_smoke.py` + +## Module boundaries + +### candidate_types.py +Responsible for: +- `CandidateObject` +- `DecisionType` +- later: type-specific adapters for memory/skill/tool + +### router.py +Responsible for: +- `TaskContext` +- `RouteDecision` +- `RuleBasedRouter` + +Only the baseline heuristics are implemented for now; planned upgrades: +- feature scorer +- reranker +- learned policy + +### telemetry.py +Responsible for: +- atomic event structures +- reward breakdown +- later: trajectory runtime objects + +### reward.py +Responsible for: +- reward composition and computation +- later: weight versioning + +## Design principles + +1. Get a runnable baseline first, then abstract complex interfaces +2. Keep data structures simple at first, but keep field names consistent with the Phase 0 schema +3. Ensure replayability first, then consider performance +4. 
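Do not couple to a database or vector store prematurely + +## Next landing points + +- `retrieval.py`: define the candidate retrieval interface +- `replay.py`: implement trajectory loading, replay, and metric computation +- `schemas.py`: turn the JSON schemas into a runtime validation entry point +- `sample_data/`: hold example candidates and trajectories + +## Suggested verification + +Run from the project root: + +```bash +source venv/bin/activate +python -m pytest tests/memabra/test_router_smoke.py -q +``` + +Expected: +- the baseline router smoke test passes +- this shows the minimal prototype skeleton can be imported and called diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..64c3a7a --- /dev/null +++ b/docs/README.md @@ -0,0 +1,87 @@ +# memabra + +An intuition-driven control plane for agent memory and action selection. + +## Quick start + +If you are working from this repository, activate the virtualenv and install the project in editable mode so the dedicated `memabra` command is available: + +```bash +source venv/bin/activate +uv pip install -e ".[dev]" +memabra --help +memabra run --base-dir /tmp/memabra-demo --format text --dry-run +``` + +The dedicated CLI is the fastest way to experience the alpha. It supports subcommands for different operations: + +- `memabra run` — run the online-learning loop +- `memabra status` — show system status +- `memabra version list` — list saved router versions +- `memabra version rollback ` — roll back to a version + +memabra's goal is not to build a memory store that merely "saves things", but a meta-cognitive controller for local agents: +when facing a task, it should, like human intuition, quickly judge whether to answer directly, consult memory, load a skill, or call a tool, and keep refining that judgment based on task outcomes. + +One-sentence definition: +a local-first, observable, trainable, and replayable agent memory and action orchestration system. + +## Why build this + +Common problems with traditional agents: +- context keeps bloating, with everything stuffed into the prompt +- memory, skills, and tools are three disconnected systems +- after a success or failure, it is hard to tell which step actually mattered +- when you want the agent to "learn", there is no attributable trajectory data + +The core question memabra addresses is: +when to rely on what. + +## Core idea + +Do not start with unified end-to-end neural network training. +First build a four-layer structure: +1. Retrieval layer: recall candidate memory / skill / tool +2. Routing layer: decide what to invoke, and in what order +3. Execution layer: actually inject memory, load skills, and call tools +4. 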
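Evaluation layer: record outcomes, assign credit, and produce training samples + +If these four layers are not clearly visible, training end to end will most likely learn the wrong habit of "calling fewer tools and letting the model guess". + +## Project outputs + +For now this directory consists mainly of plans and design documents: +- `ARCHITECTURE.md`: system architecture +- `ROADMAP.md`: phased roadmap +- `DECISIONS.md`: key design decisions +- `PROGRESS.md`: current progress and next steps +- `schemas/`: unified Phase 0 schemas +- `reward_spec.md`: reward design draft + +To be added later: +- `experiments/`: training and evaluation experiments +- `src/`: prototype code +- `tests/`: verification and regression tests + +## Target capabilities + +The eventual aim is to: +- manage three kinds of long-term information (facts / procedures / episodes) in a unified way +- build a unified candidate retrieval mechanism for memory, skill, and tool +- let an "intuition policy" make fast action choices +- infer policy quality from task outcomes +- move gradually from a rule system to a learnable policy +- keep evolving sustainably in a local environment + +## Current state + +The project has been initialized and has entered the Phase 0 foundational definition stage: +- direction clarified +- layered approach established +- naming done +- project directory created +- first versions of the architecture, roadmap, decisions, and progress documents written +- schema and reward specifications about to be filled in + +Suggested next step: go straight into Phase 0: +define the unified object model, trajectory log structure, and reward breakdown scheme. \ No newline at end of file diff --git a/docs/REPLAY_AND_RETRIEVAL.md b/docs/REPLAY_AND_RETRIEVAL.md new file mode 100644 index 0000000..2c8265c --- /dev/null +++ b/docs/REPLAY_AND_RETRIEVAL.md @@ -0,0 +1,60 @@ +# Replay and Retrieval + +## Goal + +Connect memabra's minimal closed loop: +- retrieval recalls memory / skill / tool candidates +- replay reads trajectories and summarizes behavioral outcomes + +Once the two are connected, the system is no longer just static documents and a standalone router; it has: +- candidate input +- decision output +- trajectory replay +- basic statistics + +## Current implementation + +### retrieval.py +Provides: +- `CandidateProvider` protocol +- `InMemoryCandidateProvider` +- `CandidateRetriever` +- `RetrievalResult` + +Current strategy: +- simple lexical matching of triggers/tags against the task text +- baseline ranking that combines confidence / success_rate / freshness / cost / risk +- per-type aggregation and deduplication of outputs from different providers + +### replay.py +Provides: +- `TrajectoryReplay` +- `ReplaySummary` + +Current capabilities: +- load a single trajectory JSON +- load multiple trajectories from a directory +- summarize outcome counts +- summarize reward, latency, steps, user corrections +- count occurrences of each decision_type + +## Why this step matters + +Without retrieval, the router can only go through the motions on empty candidates. +Without replay, rewards and trajectories are just JSON specimens lying on disk. + +After this step, memabra has a minimal closed loop for the first time: +task -> candidates -> decision -> trajectory -> replay statistics + +## Current limitations + +- retrieval is still lexical matching, not embedding or learned ranking +- replay only summarizes; it does no schema validation or counterfactual comparison +- the router and retriever are not yet wired into an end-to-end runner + +## Next steps + +1. Add `schemas.py` for runtime validation +2. 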
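Build `memory_store.py` and the provider interface +3. Build `runner.py` to wire retrieval + router + telemetry together +4. Add baseline comparison and reward breakdown analysis to replay diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md new file mode 100644 index 0000000..5d846c2 --- /dev/null +++ b/docs/ROADMAP.md @@ -0,0 +1,136 @@ +# Roadmap + +## Overall goal + +Build a memory management and meta-cognitive control system for local agents, so that an agent can make learnable action choices among memory, skill, and tool, and gradually improve its policy through task feedback. + +## Phase 0 — Foundations + +Goal: define the "objects" and "trajectories" clearly first. + +Deliverables: +- unified candidate object schema +- type boundary definitions for memory / skill / tool +- event log schema +- trajectory schema +- reward breakdown draft +- evaluation metrics draft +- prototype directory layout draft +- baseline router design document +- example trajectories + +Success criteria: +- for any task, the full record is available: what was seen, what was chosen, and what the result was +- the docs are clear enough that later implementation does not rely on guesswork +- a first batch of success / failure trajectory samples exists for replay + +Status: complete + +## Phase 1 — Observable MVP + +Goal: build a version that does not learn but runs and records end to end. + +Deliverables: +- candidate retrieval module +- unified candidate interface for memory/skill/tool +- rule-based or heuristic router +- execution adapter layer +- trajectory logs persisted to disk +- basic visualization / replay capability + +Success criteria: +- given a task, the system can choose an action +- every action can be reviewed afterwards +- simple metrics can be computed: hit rate, tool call rate, task completion rate + +Status: complete + +## Phase 2 — Learned Router + +Goal: make the "intuition" trainable. + +Deliverables: +- candidate feature engineering +- training sample construction pipeline +- lightweight classifier / reranker / bandit +- offline evaluation baseline +- A/B comparison of routing policies + +Success criteria: +- the learned router beats the rule-based router in offline replay +- clearly wasteful calls are reduced +- high-value memory / skill / tool scenarios are recognized + +Status: complete (SimpleLearningRouter, DatasetBuilder, Evaluator, A/B comparison, RouterVersionStore) + +## Phase 3 — Rewarded Adaptation + +Goal: use task outcomes to keep updating the policy. + +Deliverables: +- reward aggregator +- user correction signal integration +- online / batch update mechanism +- safe exploration strategy +- memory confidence update mechanism +- benchmark-gated promotion policy +- training run reports +- active router metadata tracking + +Success criteria: +- the policy improves over consecutive tasks +- a small amount of bad feedback does not make it collapse quickly +- incorrect memories can be identified and down-weighted +- promotion must pass benchmark validation + +Status: complete (online learning coordinator, promotion policy, training reports, version metadata, benchmark-gated promotion, active router tracking, and app/CLI entrypoints are implemented) + +## Phase 4 — Episodic Learning + +Goal: turn past task trajectories into genuinely useful episodic memory. + +Deliverables: +- task case index (done) +- episode retrieval (done — via CaseIndex and runner injection) +- similar-task reuse (done — runner injects 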
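episodic candidate) +- trajectory summarization (done — `TrajectorySummarizer` generates human-readable summaries) + +Success criteria: +- for repetitive tasks, the system can reuse historically successful paths +- episodes do not pollute factual memory or the skill library + +Status: in progress (core functionality complete) + +## Phase 5 — End-to-End Experiments + +Goal: verify whether it is worth internalizing routing further into neural model weights. + +Deliverables: +- training dataset definition +- SFT / preference / RL experiment plan +- comparative evaluation against the layered system +- risk analysis: forgetting, overfitting, behavior drift + +Success criteria: +- beats the layered baseline on at least one clearly defined task set +- does not significantly reduce interpretability or stability + +Status: not started + +## Bottom lines to hold in every phase + +- must be replayable +- must be attributable +- must keep memory, skill, and tool distinct +- must include failure samples, not only successes +- must be able to revoke incorrect memories and incorrect policies + +## Current priorities + +1. real adapters +2. richer reward/outcome updates +3. persistence-backed replay +4. router scoring v2 +5. only then revisit the learned router + +Without these five steps firmly in place, any later training is a castle in the air. \ No newline at end of file diff --git a/docs/ROUTER_BASELINE.md b/docs/ROUTER_BASELINE.md new file mode 100644 index 0000000..dacad3e --- /dev/null +++ b/docs/ROUTER_BASELINE.md @@ -0,0 +1,213 @@ +# Rule-Based Router Baseline + +## Goal + +Define the first routing policy memabra uses in Phase 1. This version does not learn; it selects actions purely from explicit rules and candidate object attributes. + +Its value lies not in being clever, but in being: +- observable +- explainable +- replayable +- usable as a baseline for the learned router + +## Action space + +Actions the router currently allows: + +1. `direct_answer` +2. `inject_memory` +3. `load_skill` +4. `call_tool` +5. `clarify` +6. 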
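`composite_action` + +### direct_answer +Applies when: +- the task is pure analysis, naming, structural design, or explanation +- it does not depend on real-time state +- there is no clear need to call external resources + +### inject_memory +Applies when the task involves: +- user preferences +- project conventions +- environment facts +- stable facts known from history + +### load_skill +Applies when: +- the task resembles a reusable procedure +- a known workflow exists +- reuse has proven valuable on similar past tasks + +### call_tool +Applies when: +- current state must be fetched +- real-time information such as files, system, web pages, processes, or time must be accessed +- an action must be executed rather than reasoned about + +### clarify +Applies when: +- risk is high and candidate signals are weak +- missing information would significantly change the action choice +- all candidates have low confidence + +### composite_action +Applies when the task needs: +- memory first, then tool +- skill first, then tool +- memory first, then skill + +The current baseline focuses on single actions; composite actions are kept as a reserved action type for now. + +## Candidate scoring approach + +Every candidate object has these common fields: +- `confidence` +- `success_rate` +- `cost` +- `freshness` +- `risk` + +The baseline does no complex learning; it uses only a linear, intuition-based score. + +### memory score + +```text +memory_score = confidence + freshness + success_rate - cost - risk +``` + +### skill score + +```text +skill_score = confidence + success_rate - cost - risk +``` + +### tool score + +```text +tool_score = confidence + success_rate - cost - risk +``` + +Note: +- memory weighs freshness more +- tool weighs risk more +- skill weighs success_rate more + +## First-version rules + +### Rule 1: reasoning-first tasks prefer direct_answer +If the user input clearly contains any of these signals: +- why +- think +- design +- name + +and no strong tool trigger word is present, prefer `direct_answer`. + +### Rule 2: prefer tool when real-time state is needed +If the input contains: +- check +- run +- open +- current +- list +- time + +then prefer a high-confidence `tool` candidate. + +Additional thresholds: +- `confidence >= 0.6` +- `risk <= 0.7` + +### Rule 3: stable user/project facts prefer memory +If the input contains: +- prefer +- remember +- usually +- my +- our + +then prefer a high-confidence, reasonably fresh `memory` candidate. + +Additional thresholds: +- `confidence >= 0.65` +- `freshness >= 0.3` + +### Rule 4: reusable workflows prefer skill +If the input contains: +- fix +- deploy +- review +- setup +- workflow + +then prefer a `skill` candidate with a high success_rate. + +Additional thresholds: +- `confidence >= 0.55` +- `success_rate >= 0.4` + +### Rule 5: clarify when unsure +If no candidate of any type meets its thresholds, return `clarify`. + +This rule is ugly but necessary. +Better to ask one question than to blindly fire off a pile of tools and blow the roof off. + +## Conflict resolution order + +When multiple actions are triggered at once, the baseline uses the following priority: + +```text +tool > memory > skill > direct_answer > clarify +``` + +Rationale: +- real-time information needs are usually the hardest constraint +- factual constraints come next +- skill acts more like an enhancer +- a plain answer applies when there is clearly no external need + +Later versions could change this to: +- first task intent classification +- then per-type ranking +- finally 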
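global arbitration + +## Known limitations + +1. keyword triggers are too brittle +2. no long-range context +3. no real composite action planning +4. no counterfactual choice comparison +5. easily misled by surface vocabulary + +## What the baseline is really for + +The point is not high intelligence but to provide: +- a first runnable system +- a first batch of recordable trajectories +- a first batch of failure samples +- a comparison target for the learned router + +## Next steps + +From this baseline there are three ways to grow: +1. introduce explicit feature engineering +2. introduce a candidate reranker +3. introduce bandit / lightweight policy learning + +Until then, do not rush to dress the heuristics up as "pseudo-intelligence". Build replay and metrics first. + +--- + +## Implementation progress: FeatureScoringRouter (v2) + +`FeatureScoringRouter` is implemented in `src/memabra/router.py` as an upgrade to `RuleBasedRouter`: + +- explicit feature scoring: memory / skill / tool each use a different weighted combination of `confidence`, `success_rate`, `freshness`, `cost`, `risk` +- failure penalty: when a candidate `id` appears in `TaskContext.recent_failures`, 0.5 points are deducted automatically +- composite action preconditions: `CandidateObject` gains a `preconditions` field that can declare prerequisite types such as `["memory"]` +- composite action execution: `ExecutionEngine` supports the `composite_action` decision type and recursively executes sub-steps in `composite_steps` order +- scoring transparency: `RouteDecision.score_breakdown` records each candidate's final score for traceability and evaluation + +`FeatureScoringRouter` stays explainable while providing structured feature output for later learned policies. \ No newline at end of file diff --git a/docs/RUNNER_AND_STORE.md b/docs/RUNNER_AND_STORE.md new file mode 100644 index 0000000..80aea99 --- /dev/null +++ b/docs/RUNNER_AND_STORE.md @@ -0,0 +1,83 @@ +# Runner, Schemas, and Memory Store + +## Goal + +Move memabra from "able to retrieve, route, and replay separately" to "able to produce a valid draft trajectory, validate data, and manage typed memory records". + +## Current implementation + +### runner.py +Provides: +- `MemabraRunner` + +Capabilities: +- accepts a `TaskContext` +- calls the retriever to get candidates +- calls the router to produce an action decision +- automatically generates a draft trajectory +- emits a minimal event stream: + - `task_received` + - `candidates_recalled` + - `action_selected` + +Significance: +This gives memabra its first real task-to-trajectory entry point. + +### schemas.py +Provides: +- `SchemaRegistry` +- `SchemaValidationError` + +Current strategy: +- lightweight runtime validation first +- no dependency on external libraries +- validate the key required keys first + +This is not yet a full JSON Schema engine, but it is enough to hold the floor and keep sample structures from drifting. + +### memory_store.py +Provides: +- `MemoryRecord` +- `MemorySource` +- `VerificationState` +- `InMemoryMemoryStore` + +Current capabilities: +- upsert +- get +- list_by_type +- mark_used +- verify +- revoke + +Significance: +memabra now no longer merely "talks about memory"; it has a typed memory record runtime. + +## 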
Current loop + +What exists now: +- retrieval +- router +- runner +- replay +- memory store +- schema validation + +That is: +task -> candidate retrieval -> routing decision -> trajectory draft -> replay statistics +and the memory records themselves can also be validated and change state. + +## What is still missing + +- execution adapter (real tool/skill/memory injection) +- full JSON Schema validation +- trajectory persistence layer +- richer reward aggregation +- counterfactual replay + +## Suggested next steps + +1. Build `execution.py` +2. Build `persistence.py` +3. Connect the runner to the memory store and telemetry writeback +4. Build richer router scoring v2 diff --git a/docs/demo-artifacts/router-versions/current.json b/docs/demo-artifacts/router-versions/current.json new file mode 100644 index 0000000..865ef95 --- /dev/null +++ b/docs/demo-artifacts/router-versions/current.json @@ -0,0 +1,13 @@ +{ + "current_version_id": "20260414-165018", + "promotion_source": null, + "benchmark_summary": { + "reward_delta": -0.446, + "error_rate_delta": 0.0, + "latency_delta_ms": -21.0, + "baseline_avg_reward": 0.886, + "challenger_avg_reward": 0.44 + }, + "prior_version_id": "20260414-155224", + "saved_at": "2026-04-14T16:50:18.865976+00:00" +} \ No newline at end of file diff --git a/docs/demo-artifacts/router-versions/versions/20260414-143742.json b/docs/demo-artifacts/router-versions/versions/20260414-143742.json new file mode 100644 index 0000000..b743272 --- /dev/null +++ b/docs/demo-artifacts/router-versions/versions/20260414-143742.json @@ -0,0 +1,50 @@ +{ + "version_id": "20260414-143742", + "weights": { + "inject_memory": { + "input_length": 43.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.95, + "top_skill_success_rate": 0.9, + "top_tool_confidence": 0.95, + "top_tool_risk": 0.0 + }, + "load_skill": { + "input_length": 44.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.95, + "top_skill_success_rate": 0.9, + "top_tool_confidence": 0.95, + "top_tool_risk": 0.0 + }, + "call_tool": { + "input_length": 32.0, + "memory_count": 1.0, + "skill_count": 1.0, + 
"tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.9000000000000001, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "avg_reward": 1.04, + "task_count": 3, + "source": "wrapup_workflow" + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/router-versions/versions/20260414-152738.json b/docs/demo-artifacts/router-versions/versions/20260414-152738.json new file mode 100644 index 0000000..cee247f --- /dev/null +++ b/docs/demo-artifacts/router-versions/versions/20260414-152738.json @@ -0,0 +1,50 @@ +{ + "version_id": "20260414-152738", + "weights": { + "load_skill": { + "input_length": 42.15803814713897, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9499999999999997, + "top_skill_success_rate": 0.9, + "top_tool_confidence": 0.9499999999999997, + "top_tool_risk": 0.0 + }, + "call_tool": { + "input_length": 32.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.95, + "top_skill_success_rate": 0.9000000000000001, + "top_tool_confidence": 0.95, + "top_tool_risk": 0.0 + }, + "inject_memory": { + "input_length": 42.99999999999999, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.95, + "top_skill_success_rate": 0.8999999999999999, + "top_tool_confidence": 0.95, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "avg_reward": 1.04, + "task_count": 3, + "source": "wrapup_workflow" + } +} \ No newline at end of file diff --git 
a/docs/demo-artifacts/router-versions/versions/20260414-155224.json b/docs/demo-artifacts/router-versions/versions/20260414-155224.json new file mode 100644 index 0000000..63b585f --- /dev/null +++ b/docs/demo-artifacts/router-versions/versions/20260414-155224.json @@ -0,0 +1,55 @@ +{ + "version_id": "20260414-155224", + "weights": { + "load_skill": { + "input_length": 42.38663484486874, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9499999999999997, + "top_skill_success_rate": 0.9, + "top_tool_confidence": 0.9499999999999997, + "top_tool_risk": 0.0 + }, + "call_tool": { + "input_length": 32.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.9000000000000001, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + }, + "inject_memory": { + "input_length": 41.75894988066825, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9499999999999997, + "top_skill_success_rate": 0.8999999999999999, + "top_tool_confidence": 0.9499999999999997, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.154, + "error_rate_delta": 0.0, + "latency_delta_ms": -21.0, + "baseline_avg_reward": 0.886, + "challenger_avg_reward": 1.04 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/router-versions/versions/20260414-165018.json b/docs/demo-artifacts/router-versions/versions/20260414-165018.json new file mode 100644 index 0000000..41456d6 --- /dev/null +++ b/docs/demo-artifacts/router-versions/versions/20260414-165018.json @@ -0,0 +1,65 @@ +{ + "version_id": "20260414-165018", + "weights": { + "load_skill": { + 
"input_length": 41.594896331738454, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9499999999999998, + "top_skill_success_rate": 0.9000000000000001, + "top_tool_confidence": 0.9499999999999998, + "top_tool_risk": 0.0 + }, + "call_tool": { + "input_length": 32.85406896551724, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.95, + "top_skill_success_rate": 0.9, + "top_tool_confidence": 0.95, + "top_tool_risk": 0.0 + }, + "clarify": { + "input_length": 51.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.95, + "top_skill_success_rate": 0.9, + "top_tool_confidence": 0.95, + "top_tool_risk": 0.0 + }, + "inject_memory": { + "input_length": 41.45435244161358, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9499999999999996, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9499999999999996, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": -0.446, + "error_rate_delta": 0.0, + "latency_delta_ms": -21.0, + "baseline_avg_reward": 0.886, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/training-reports/report-886de309-18d0-4be6-b626-0f7d2edc8b72.json b/docs/demo-artifacts/training-reports/report-886de309-18d0-4be6-b626-0f7d2edc8b72.json new file mode 100644 index 0000000..04702f2 --- /dev/null +++ b/docs/demo-artifacts/training-reports/report-886de309-18d0-4be6-b626-0f7d2edc8b72.json @@ -0,0 +1,52 @@ +{ + "report_id": "report-886de309-18d0-4be6-b626-0f7d2edc8b72", + "timestamp": "2026-04-14T15:52:24.610516+00:00", + "source_trajectory_ids": [ + 
"traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "traj-dd361c81-40a1-4892-9914-2140870fff95" + ], + "sample_count": 21, + "baseline_metrics": { + "task_count": 4, + "avg_reward": 0.886, + "error_rate": 0.0, + "avg_latency_ms": 21.0 + }, + "challenger_metrics": { + "task_count": 4, + "avg_reward": 1.04, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.154, + "error_rate_delta": 0.0, + "latency_delta_ms": -21.0, + "baseline_avg_reward": 0.886, + "challenger_avg_reward": 1.04 + } + }, + "promoted_version_id": "20260414-155224" +} \ No newline at end of file diff --git a/docs/demo-artifacts/training-reports/report-e7050e1f-fa3c-42e4-9178-e57f69b2dc1d.json b/docs/demo-artifacts/training-reports/report-e7050e1f-fa3c-42e4-9178-e57f69b2dc1d.json new file mode 100644 index 0000000..59a8d56 --- /dev/null +++ b/docs/demo-artifacts/training-reports/report-e7050e1f-fa3c-42e4-9178-e57f69b2dc1d.json @@ -0,0 +1,60 @@ +{ + "report_id": "report-e7050e1f-fa3c-42e4-9178-e57f69b2dc1d", + "timestamp": 
"2026-04-14T16:50:18.866221+00:00", + "source_trajectory_ids": [ + "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "traj-217ccafa-716c-4534-813b-a489ed7d6079", + "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "traj-9edc5088-09cc-42d6-a160-cede5357f535", + "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "traj-b786c15f-388d-4228-9da4-c9e82b61570a", + "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "traj-dd361c81-40a1-4892-9914-2140870fff95", + "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb", + "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17", + "traj-f1d895a0-5442-448f-8936-4ee8b07822e6", + "traj-ffb40d01-7956-4d7b-a41c-9618487fe619" + ], + "sample_count": 29, + "baseline_metrics": { + "task_count": 4, + "avg_reward": 0.886, + "error_rate": 0.0, + "avg_latency_ms": 21.0 + }, + "challenger_metrics": { + "task_count": 4, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.446, + "error_rate_delta": 0.0, + "latency_delta_ms": -21.0, + "baseline_avg_reward": 0.886, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165018" +} \ No newline at end of file diff 
--git a/docs/demo-artifacts/trajectories/traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd.json b/docs/demo-artifacts/trajectories/traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd.json new file mode 100644 index 0000000..5dfae67 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "task": { + "task_id": "task-5977495f-189b-4a87-8924-4834bded854c", + "input": "Check the current system status.", + "channel": "local", + "created_at": "2026-04-14T14:37:42.381631+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + 
"selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1413.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-501ed3a1-622f-4e8a-b90b-2fb0384d89bd", + "trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "timestamp": "2026-04-14T14:37:42.381702+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-4b6839de-ac61-414f-8939-3ba335a93cfa", + "trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "timestamp": "2026-04-14T14:37:42.381707+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-501ed3a1-622f-4e8a-b90b-2fb0384d89bd" + }, + { + "event_id": "evt-1b229a15-af51-4924-932d-4d0318f0ba26", + "trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "timestamp": "2026-04-14T14:37:42.381711+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1413.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-4b6839de-ac61-414f-8939-3ba335a93cfa" + }, + { + "event_id": "evt-skill-traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd-skill-deploy", + "trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd", + "timestamp": "2026-04-14T14:37:42.381718+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Check the current system status.", + "instructions": "Demo skill payload loaded successfully." 
+ }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-120aec7e-a74d-42d6-8846-c472680cc2f3.json b/docs/demo-artifacts/trajectories/traj-120aec7e-a74d-42d6-8846-c472680cc2f3.json new file mode 100644 index 0000000..d36d651 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-120aec7e-a74d-42d6-8846-c472680cc2f3.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "task": { + "task_id": "task-78a318e6-c8b4-4d05-bfd8-2ebe4b19710f", + "input": "Check the current system status.", + "channel": "local", + "created_at": "2026-04-14T15:27:38.518486+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } 
+ ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-be0db4ba-93b9-4cf7-bd76-51c1af70c6d4", + "trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "timestamp": "2026-04-14T15:27:38.518550+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-fb7734b7-bdab-4e24-8dec-a9debf02529d", + "trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "timestamp": "2026-04-14T15:27:38.518556+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-be0db4ba-93b9-4cf7-bd76-51c1af70c6d4" + }, + { + "event_id": "evt-8ed4e73b-2b45-44a6-9ab6-cc6184202dc0", + "trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "timestamp": "2026-04-14T15:27:38.518561+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + 
"estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-fb7734b7-bdab-4e24-8dec-a9debf02529d" + }, + { + "event_id": "evt-tool-traj-120aec7e-a74d-42d6-8846-c472680cc2f3-tool-terminal", + "trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "timestamp": "2026-04-14T15:27:38.518572+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-120aec7e-a74d-42d6-8846-c472680cc2f3-tool-terminal", + "trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3", + "timestamp": "2026-04-14T15:27:38.518575+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-179d0c19-3f0f-4429-a85b-3e01802290d3.json b/docs/demo-artifacts/trajectories/traj-179d0c19-3f0f-4429-a85b-3e01802290d3.json new file mode 100644 index 0000000..c0988ec --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-179d0c19-3f0f-4429-a85b-3e01802290d3.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "task": { + "task_id": "task-c0d9120f-4b28-4815-bcbc-1ea1cb523129", + "input": "Check the current system status.", + "channel": "telegram", + "created_at": "2026-04-14T15:27:38.512676+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 
0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-2e159144-a5dc-4bab-bb15-026b156788a7", + "trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "timestamp": "2026-04-14T15:27:38.512756+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-84681604-ee59-4618-8b1b-bdc521e58e7d", + "trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "timestamp": "2026-04-14T15:27:38.512762+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-2e159144-a5dc-4bab-bb15-026b156788a7" + }, + { + "event_id": "evt-6404a35f-8775-4fc1-9648-62a27f4a1b23", + "trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "timestamp": "2026-04-14T15:27:38.512767+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-84681604-ee59-4618-8b1b-bdc521e58e7d" + }, + { + "event_id": "evt-tool-traj-179d0c19-3f0f-4429-a85b-3e01802290d3-tool-terminal", + "trajectory_id": 
"traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "timestamp": "2026-04-14T15:27:38.512781+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-179d0c19-3f0f-4429-a85b-3e01802290d3-tool-terminal", + "trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3", + "timestamp": "2026-04-14T15:27:38.512785+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303.json b/docs/demo-artifacts/trajectories/traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303.json new file mode 100644 index 0000000..215d24b --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "task": { + "task_id": "task-f3701d8c-4931-4e43-8488-5fc670e5b2b1", + "input": "Deploy this service with the usual workflow.", + "channel": "local", + "created_at": "2026-04-14T14:37:42.380802+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { 
+ "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-480e859f-7e5f-42f0-bfcc-f3cb954f75d5", + "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "timestamp": "2026-04-14T14:37:42.380861+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-398d16c2-3d12-44a7-8af2-aa306e20195c", + "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "timestamp": "2026-04-14T14:37:42.380867+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-480e859f-7e5f-42f0-bfcc-f3cb954f75d5" + }, + { + "event_id": "evt-b63063ea-1ac7-4b85-a6c7-76a03791bc85", + "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "timestamp": "2026-04-14T14:37:42.380871+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-398d16c2-3d12-44a7-8af2-aa306e20195c" + }, + { + "event_id": "evt-skill-traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303-skill-deploy", + "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303", + "timestamp": "2026-04-14T14:37:42.380877+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Deploy this service with the usual workflow.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0.json b/docs/demo-artifacts/trajectories/traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0.json new file mode 100644 index 0000000..5849c0e --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0.json @@ -0,0 +1,170 @@ +{ + "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0", + "task": { + "task_id": "task-bb730dc5-88ed-4455-9dbb-6cbba55ad0ce", + "input": "Check current system status with a tool.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.864549+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 
0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=2045.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-f491ed7a-0017-463f-a346-2b13aac2ef27", + "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0", + "timestamp": "2026-04-14T16:50:18.864653+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check current system status with a tool." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-9b88da4b-fe41-4522-ba53-e88adf3df3b4", + "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0", + "timestamp": "2026-04-14T16:50:18.864663+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-f491ed7a-0017-463f-a346-2b13aac2ef27" + }, + { + "event_id": "evt-2fc97f2c-8219-44d3-98c7-5a86ad88326d", + "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0", + "timestamp": "2026-04-14T16:50:18.864669+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=2045.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-9b88da4b-fe41-4522-ba53-e88adf3df3b4" + } + ], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 0.44, + "components": { + "task_success": 0.4, + "retrieval_hit": 0.1, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-1ea60d6e-0b83-4cdf-a601-159373c780ee.json b/docs/demo-artifacts/trajectories/traj-1ea60d6e-0b83-4cdf-a601-159373c780ee.json new file mode 100644 index 0000000..77de653 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-1ea60d6e-0b83-4cdf-a601-159373c780ee.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee", + "task": { + "task_id": "task-c5221ec3-e5b9-4a2f-9774-fbb75018fe08", + "input": "Check current system status with a tool.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.862393+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 
0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-93525bc5-5e71-481c-a7d4-0282ef59e0a3", + "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee", + "timestamp": "2026-04-14T16:50:18.862483+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check current system status with a tool." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-a01d1dff-a6dc-4c25-a5a5-14efd6f182b2", + "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee", + "timestamp": "2026-04-14T16:50:18.862492+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-93525bc5-5e71-481c-a7d4-0282ef59e0a3" + }, + { + "event_id": "evt-28946864-c699-42fd-9802-dbfe6cb09043", + "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee", + "timestamp": "2026-04-14T16:50:18.862498+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-a01d1dff-a6dc-4c25-a5a5-14efd6f182b2" + }, + { + "event_id": "evt-tool-traj-1ea60d6e-0b83-4cdf-a601-159373c780ee-tool-terminal", + "trajectory_id": 
"traj-1ea60d6e-0b83-4cdf-a601-159373c780ee", + "timestamp": "2026-04-14T16:50:18.862511+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check current system status with a tool." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-1ea60d6e-0b83-4cdf-a601-159373c780ee-tool-terminal", + "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee", + "timestamp": "2026-04-14T16:50:18.862515+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-217ccafa-716c-4534-813b-a489ed7d6079.json b/docs/demo-artifacts/trajectories/traj-217ccafa-716c-4534-813b-a489ed7d6079.json new file mode 100644 index 0000000..2feda3b --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-217ccafa-716c-4534-813b-a489ed7d6079.json @@ -0,0 +1,170 @@ +{ + "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079", + "task": { + "task_id": "task-5f14e5ed-0635-44a0-82e8-419187b040f3", + "input": "Use multiple capabilities: memory, skill, and tool.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.605025+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + 
"memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "No high-confidence route found from the current heuristic baseline.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-13ccd07e-9bfd-4ff8-8080-47c400f0be6f", + "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079", + "timestamp": "2026-04-14T15:52:24.605116+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use multiple capabilities: memory, skill, and tool." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-7ecaa289-b7bb-4ac6-ad62-9afb4a49d4a8", + "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079", + "timestamp": "2026-04-14T15:52:24.605126+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-13ccd07e-9bfd-4ff8-8080-47c400f0be6f" + }, + { + "event_id": "evt-ad398931-c79d-411a-93f8-8c5834f5446d", + "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079", + "timestamp": "2026-04-14T15:52:24.605138+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "No high-confidence route found from the current heuristic baseline.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-7ecaa289-b7bb-4ac6-ad62-9afb4a49d4a8" + } + ], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 0.44, + "components": { + "task_success": 0.4, + "retrieval_hit": 0.1, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15.json b/docs/demo-artifacts/trajectories/traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15.json new file mode 100644 index 0000000..7c188b8 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15.json @@ -0,0 +1,185 @@ +{ + "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "task": { + "task_id": "task-aeed227c-2e87-45d8-8d98-e270656556b6", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T06:53:08.731336+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-d71b1fdf-5343-4ac1-89a0-75488c1ce30b", + "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "timestamp": "2026-04-14T06:53:08.731418+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-1f750475-1127-41e5-9f94-c87e4b019ee2", + "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "timestamp": "2026-04-14T06:53:08.731427+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-d71b1fdf-5343-4ac1-89a0-75488c1ce30b" + }, + { + "event_id": "evt-741967a5-41b9-4917-9b95-4047f89e6e19", + "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "timestamp": "2026-04-14T06:53:08.731432+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-1f750475-1127-41e5-9f94-c87e4b019ee2" + }, + { + "event_id": "evt-memory-traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15-mem-telegram-pref", + "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15", + "timestamp": 
"2026-04-14T06:53:08.731437+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.1, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.0, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-439e4552-f248-43cb-b4eb-25db14da1ebc.json b/docs/demo-artifacts/trajectories/traj-439e4552-f248-43cb-b4eb-25db14da1ebc.json new file mode 100644 index 0000000..78cb92b --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-439e4552-f248-43cb-b4eb-25db14da1ebc.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "task": { + "task_id": "task-cde62e1c-0106-4803-9c7d-a0c2f58206d6", + "input": "Check the current system status.", + "channel": "local", + "created_at": "2026-04-14T14:37:42.380386+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + 
"deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-9252427a-3ceb-476a-b72d-a7e4f812194c", + "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "timestamp": "2026-04-14T14:37:42.380442+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-333fbd7f-75b1-495f-acfa-6a66348ef16e", + "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "timestamp": "2026-04-14T14:37:42.380447+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-9252427a-3ceb-476a-b72d-a7e4f812194c" + }, + { + "event_id": "evt-7f4eddba-f609-4d72-bf7c-cd6a938233a7", + "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "timestamp": "2026-04-14T14:37:42.380452+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-333fbd7f-75b1-495f-acfa-6a66348ef16e" + }, + { + "event_id": "evt-tool-traj-439e4552-f248-43cb-b4eb-25db14da1ebc-tool-terminal", + "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "timestamp": "2026-04-14T14:37:42.380461+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check the current system status." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-439e4552-f248-43cb-b4eb-25db14da1ebc-tool-terminal", + "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc", + "timestamp": "2026-04-14T14:37:42.380464+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5.json b/docs/demo-artifacts/trajectories/traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5.json new file mode 100644 index 0000000..fe9613c --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "task": { + "task_id": "task-0c82e670-45ab-45f9-af74-c5920f5eb9b3", + "input": "Deploy this service with the usual workflow.", + "channel": "telegram", + "created_at": "2026-04-14T14:37:42.378256+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + 
"success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-757f035e-551f-4b55-a506-2aac41134885", + "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "timestamp": "2026-04-14T14:37:42.378322+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-dfcdd452-1902-4a6c-97fc-fd6a993c2045", + "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "timestamp": "2026-04-14T14:37:42.378327+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-757f035e-551f-4b55-a506-2aac41134885" + }, + { + "event_id": "evt-c680ed8f-a6b0-48d1-bcd4-7423089aa916", + "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "timestamp": "2026-04-14T14:37:42.378332+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-dfcdd452-1902-4a6c-97fc-fd6a993c2045" + }, + { + "event_id": "evt-skill-traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5-skill-deploy", + "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5", + "timestamp": "2026-04-14T14:37:42.378339+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Deploy this service with the usual workflow.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-6a5aaff5-9336-4a1d-b102-80f1196427ae.json b/docs/demo-artifacts/trajectories/traj-6a5aaff5-9336-4a1d-b102-80f1196427ae.json new file mode 100644 index 0000000..da8a717 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-6a5aaff5-9336-4a1d-b102-80f1196427ae.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "task": { + "task_id": "task-549e2de3-bb55-4797-a862-e59f8d69a7e5", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T15:27:38.519692+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1854.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-369333af-5ca9-4c11-b163-6144d925ba91", + "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "timestamp": "2026-04-14T15:27:38.519774+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-51d31531-c49b-4af7-86f8-9fc3b5aff7a0", + "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "timestamp": "2026-04-14T15:27:38.519780+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-369333af-5ca9-4c11-b163-6144d925ba91" + }, + { + "event_id": "evt-3a842acf-5111-4b77-98a2-2a18c5a4a61d", + "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "timestamp": "2026-04-14T15:27:38.519784+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1854.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-51d31531-c49b-4af7-86f8-9fc3b5aff7a0" + }, + { + "event_id": "evt-memory-traj-6a5aaff5-9336-4a1d-b102-80f1196427ae-mem-telegram-pref", + "trajectory_id": 
"traj-6a5aaff5-9336-4a1d-b102-80f1196427ae", + "timestamp": "2026-04-14T15:27:38.519790+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-707b1dec-1d9a-4a71-a07a-54841155103c.json b/docs/demo-artifacts/trajectories/traj-707b1dec-1d9a-4a71-a07a-54841155103c.json new file mode 100644 index 0000000..d71b6df --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-707b1dec-1d9a-4a71-a07a-54841155103c.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "task": { + "task_id": "task-23d5816f-12f3-4247-8c4f-9c01d13b1fd8", + "input": "Check the current system status.", + "channel": "telegram", + "created_at": "2026-04-14T14:37:42.377746+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + 
"summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-15616207-b055-41b3-98e7-fca3fdd89ce9", + "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "timestamp": "2026-04-14T14:37:42.377821+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-431bb458-0488-4712-93d5-d7a689048022", + "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "timestamp": "2026-04-14T14:37:42.377827+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-15616207-b055-41b3-98e7-fca3fdd89ce9" + }, + { + "event_id": "evt-8bb2db02-56ae-4fad-a0bc-e30cd7fed98e", + "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "timestamp": "2026-04-14T14:37:42.377831+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-431bb458-0488-4712-93d5-d7a689048022" + }, + { + "event_id": "evt-tool-traj-707b1dec-1d9a-4a71-a07a-54841155103c-tool-terminal", + "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "timestamp": "2026-04-14T14:37:42.377843+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check the current system status." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-707b1dec-1d9a-4a71-a07a-54841155103c-tool-terminal", + "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c", + "timestamp": "2026-04-14T14:37:42.377846+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1.json b/docs/demo-artifacts/trajectories/traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1.json new file mode 100644 index 0000000..b6121b9 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "task": { + "task_id": "task-e0c612c6-d846-4dc0-9c30-4a66d0a78d2a", + "input": "Check current system status with a tool.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.604470+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 
0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-7befe34c-6cf6-422b-9615-11fd64b50899", + "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "timestamp": "2026-04-14T15:52:24.604556+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check current system status with a tool." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-8533f7c9-696d-413d-8484-d434ffccdd02", + "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "timestamp": "2026-04-14T15:52:24.604565+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-7befe34c-6cf6-422b-9615-11fd64b50899" + }, + { + "event_id": "evt-2f878de3-e77d-42f6-8252-b692a11a69ac", + "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "timestamp": "2026-04-14T15:52:24.604571+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-8533f7c9-696d-413d-8484-d434ffccdd02" + }, + { + "event_id": "evt-tool-traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1-tool-terminal", + "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "timestamp": "2026-04-14T15:52:24.604584+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check current system status with a tool." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1-tool-terminal", + "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1", + "timestamp": "2026-04-14T15:52:24.604588+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-77ab4624-013b-4f56-b600-b3e0cbef7a06.json b/docs/demo-artifacts/trajectories/traj-77ab4624-013b-4f56-b600-b3e0cbef7a06.json new file mode 100644 index 0000000..8092207 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-77ab4624-013b-4f56-b600-b3e0cbef7a06.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06", + "task": { + "task_id": "task-ad6649f7-dcca-4dd3-9521-3409c5f4e746", + "input": "Recall my saved preference from memory.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.861213+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 
0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-61f191c1-68c7-4f0b-ab9b-f22b131e2637", + "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06", + "timestamp": "2026-04-14T16:50:18.861293+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Recall my saved preference from memory." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-f0fbf671-18c5-4db2-86e5-68950b030992", + "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06", + "timestamp": "2026-04-14T16:50:18.861299+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-61f191c1-68c7-4f0b-ab9b-f22b131e2637" + }, + { + "event_id": "evt-168a76e7-3c64-4f65-8a74-0969942d6d94", + "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06", + "timestamp": "2026-04-14T16:50:18.861304+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-f0fbf671-18c5-4db2-86e5-68950b030992" + }, + { + "event_id": "evt-memory-traj-77ab4624-013b-4f56-b600-b3e0cbef7a06-mem-telegram-pref", + "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06", + "timestamp": "2026-04-14T16:50:18.861310+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Recall my saved preference from memory." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-80784ce5-fc14-4fee-9f5f-90dcec26179b.json b/docs/demo-artifacts/trajectories/traj-80784ce5-fc14-4fee-9f5f-90dcec26179b.json new file mode 100644 index 0000000..09f34e3 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-80784ce5-fc14-4fee-9f5f-90dcec26179b.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + "task": { + "task_id": "task-37fe7921-66da-4390-a9bf-31209ae8a890", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T14:37:42.381229+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1897.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-01cb59f2-27b0-4be7-b0f9-c878634363ba", + "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + "timestamp": "2026-04-14T14:37:42.381299+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-4281fe16-c753-4024-a0ff-e82f518e16dc", + "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + "timestamp": "2026-04-14T14:37:42.381305+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-01cb59f2-27b0-4be7-b0f9-c878634363ba" + }, + { + "event_id": "evt-ccc6afd4-82c2-4774-ba2a-732ffa9296a4", + "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + "timestamp": "2026-04-14T14:37:42.381309+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1897.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-4281fe16-c753-4024-a0ff-e82f518e16dc" + }, + { + "event_id": "evt-skill-traj-80784ce5-fc14-4fee-9f5f-90dcec26179b-skill-deploy", + "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b", + 
"timestamp": "2026-04-14T14:37:42.381314+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Use my telegram preference for this answer.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-819443a2-79ea-48b7-a543-8bb7356dba36.json b/docs/demo-artifacts/trajectories/traj-819443a2-79ea-48b7-a543-8bb7356dba36.json new file mode 100644 index 0000000..467973e --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-819443a2-79ea-48b7-a543-8bb7356dba36.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "task": { + "task_id": "task-8e991184-4d09-47bd-9a70-2f3d591d875c", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T14:37:42.377206+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy 
workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-79db8272-c394-40e1-b0d3-c905c305ea26", + "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "timestamp": "2026-04-14T14:37:42.377281+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-22367f19-007b-49cd-9ac4-30bbcc77e8a2", + "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "timestamp": "2026-04-14T14:37:42.377287+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-79db8272-c394-40e1-b0d3-c905c305ea26" + }, + { + "event_id": "evt-84fe05fe-8ccc-4782-8cd2-28d56a659658", + "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "timestamp": "2026-04-14T14:37:42.377292+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-22367f19-007b-49cd-9ac4-30bbcc77e8a2" + }, + { + "event_id": "evt-memory-traj-819443a2-79ea-48b7-a543-8bb7356dba36-mem-telegram-pref", + "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36", + "timestamp": "2026-04-14T14:37:42.377297+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-9144cbc3-1ccf-4660-aad9-8db5797461eb.json b/docs/demo-artifacts/trajectories/traj-9144cbc3-1ccf-4660-aad9-8db5797461eb.json new file mode 100644 index 0000000..d398499 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-9144cbc3-1ccf-4660-aad9-8db5797461eb.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "task": { + "task_id": "task-57677ff6-710a-478e-9a5d-e1367db05212", + "input": "Deploy this service with the usual workflow.", + "channel": "telegram", + "created_at": "2026-04-14T15:27:38.514525+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-fce4e540-2400-45f8-8050-50f7631422e4", + "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "timestamp": "2026-04-14T15:27:38.514602+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-eef1d203-79d4-4037-ae0b-6dff74e035f5", + "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "timestamp": "2026-04-14T15:27:38.514609+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-fce4e540-2400-45f8-8050-50f7631422e4" + }, + { + "event_id": "evt-da150fe5-beff-45b0-a67d-9860205a9690", + "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "timestamp": "2026-04-14T15:27:38.514615+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-eef1d203-79d4-4037-ae0b-6dff74e035f5" + }, + { + "event_id": "evt-skill-traj-9144cbc3-1ccf-4660-aad9-8db5797461eb-skill-deploy", + "trajectory_id": 
"traj-9144cbc3-1ccf-4660-aad9-8db5797461eb", + "timestamp": "2026-04-14T15:27:38.514623+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Deploy this service with the usual workflow.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-9190707c-5486-4266-a6c8-32f34c6c63ec.json b/docs/demo-artifacts/trajectories/traj-9190707c-5486-4266-a6c8-32f34c6c63ec.json new file mode 100644 index 0000000..112e195 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-9190707c-5486-4266-a6c8-32f34c6c63ec.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "task": { + "task_id": "task-9f58c7ff-0bfb-4a46-bfbc-94b72b454f44", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T14:37:42.379938+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": 
"skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-1e3b099e-dc40-45e4-9710-3d7f96dc459c", + "trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "timestamp": "2026-04-14T14:37:42.379999+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-95d17ab2-a0af-44e6-97db-55600c5d0517", + "trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "timestamp": "2026-04-14T14:37:42.380024+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-1e3b099e-dc40-45e4-9710-3d7f96dc459c" + }, + { + "event_id": "evt-cef79d76-9bcf-41c7-a430-13e18d46e95f", + "trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "timestamp": "2026-04-14T14:37:42.380029+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-95d17ab2-a0af-44e6-97db-55600c5d0517" + }, + { + "event_id": "evt-memory-traj-9190707c-5486-4266-a6c8-32f34c6c63ec-mem-telegram-pref", + "trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec", + "timestamp": "2026-04-14T14:37:42.380034+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-9edc5088-09cc-42d6-a160-cede5357f535.json b/docs/demo-artifacts/trajectories/traj-9edc5088-09cc-42d6-a160-cede5357f535.json new file mode 100644 index 0000000..04aa383 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-9edc5088-09cc-42d6-a160-cede5357f535.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535", + "task": { + "task_id": "task-18b8251b-4a68-45e1-93ba-645fe21a279f", + "input": "Run the deploy workflow skill.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.603850+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + 
"success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-33bd0017-cf1b-44ac-892b-c2004bc44c1a", + "trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535", + "timestamp": "2026-04-14T15:52:24.603951+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Run the deploy workflow skill." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-de288e29-228d-46d4-a657-34edae35fea4", + "trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535", + "timestamp": "2026-04-14T15:52:24.603961+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-33bd0017-cf1b-44ac-892b-c2004bc44c1a" + }, + { + "event_id": "evt-256cd272-bcee-48e2-b36b-a4048b6aef3e", + "trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535", + "timestamp": "2026-04-14T15:52:24.603968+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-de288e29-228d-46d4-a657-34edae35fea4" + }, + { + "event_id": "evt-tool-traj-9edc5088-09cc-42d6-a160-cede5357f535-tool-terminal", + "trajectory_id": 
"traj-9edc5088-09cc-42d6-a160-cede5357f535", + "timestamp": "2026-04-14T15:52:24.603984+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Run the deploy workflow skill." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-9edc5088-09cc-42d6-a160-cede5357f535-tool-terminal", + "trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535", + "timestamp": "2026-04-14T15:52:24.603990+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-adb05c91-4c0c-493a-af84-517efea3f406.json b/docs/demo-artifacts/trajectories/traj-adb05c91-4c0c-493a-af84-517efea3f406.json new file mode 100644 index 0000000..ef0dc41 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-adb05c91-4c0c-493a-af84-517efea3f406.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "task": { + "task_id": "task-66d9a459-4bad-40a5-beda-a9cb30f2e790", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T15:27:38.517870+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { 
+ "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-57882c3b-f081-4cb6-b622-98594bfd7b82", + "trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "timestamp": "2026-04-14T15:27:38.517938+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-6eafb4ae-7960-4f17-a928-77834f432cbb", + "trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "timestamp": "2026-04-14T15:27:38.517945+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-57882c3b-f081-4cb6-b622-98594bfd7b82" + }, + { + "event_id": "evt-b90de7a7-83a3-4bed-b63d-bf07ba3fc06a", + "trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "timestamp": "2026-04-14T15:27:38.517950+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-6eafb4ae-7960-4f17-a928-77834f432cbb" + }, + { + "event_id": "evt-memory-traj-adb05c91-4c0c-493a-af84-517efea3f406-mem-telegram-pref", + "trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406", + "timestamp": "2026-04-14T15:27:38.517955+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc.json b/docs/demo-artifacts/trajectories/traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc.json new file mode 100644 index 0000000..4f4b335 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc.json @@ -0,0 +1,186 @@ +{ + "trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "task": { + "task_id": "task-c88d23cc-88f6-4352-a506-e37187a0e28a", + "input": "Deploy this service with the usual workflow.", + "channel": "telegram", + "created_at": "2026-04-14T06:53:08.732451+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-56b47bb2-7cd9-4d1a-9364-b2b6c2b82759", + "trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "timestamp": "2026-04-14T06:53:08.732515+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-62bc72e7-4b3f-4a72-a98e-1ad5bf86aaa4", + "trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "timestamp": "2026-04-14T06:53:08.732521+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-56b47bb2-7cd9-4d1a-9364-b2b6c2b82759" + }, + { + "event_id": "evt-23968c32-845c-4fb2-86bb-723d70dfec80", + "trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "timestamp": "2026-04-14T06:53:08.732525+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-62bc72e7-4b3f-4a72-a98e-1ad5bf86aaa4" + }, + { + "event_id": "evt-skill-traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc-skill-deploy", + "trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc", + "timestamp": 
"2026-04-14T06:53:08.732531+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Deploy this service with the usual workflow.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.1, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.0, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-b786c15f-388d-4228-9da4-c9e82b61570a.json b/docs/demo-artifacts/trajectories/traj-b786c15f-388d-4228-9da4-c9e82b61570a.json new file mode 100644 index 0000000..24c14f5 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-b786c15f-388d-4228-9da4-c9e82b61570a.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a", + "task": { + "task_id": "task-920b26df-8e03-47b3-af48-99454d142e90", + "input": "Recall my saved preference from memory.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.603298+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + 
"summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-795ad519-4e78-4fdd-b1a9-3e1e2b2cdea0", + "trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a", + "timestamp": "2026-04-14T15:52:24.603384+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Recall my saved preference from memory." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-1fbe3cfc-ed78-40f6-b0d9-25ccd14a0110", + "trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a", + "timestamp": "2026-04-14T15:52:24.603390+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-795ad519-4e78-4fdd-b1a9-3e1e2b2cdea0" + }, + { + "event_id": "evt-a57f0922-dbfe-424a-a704-2a382ffa219b", + "trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a", + "timestamp": "2026-04-14T15:52:24.603396+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-1fbe3cfc-ed78-40f6-b0d9-25ccd14a0110" + }, + { + "event_id": "evt-memory-traj-b786c15f-388d-4228-9da4-c9e82b61570a-mem-telegram-pref", + "trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a", + "timestamp": "2026-04-14T15:52:24.603401+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Recall my saved preference from memory." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e.json b/docs/demo-artifacts/trajectories/traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e.json new file mode 100644 index 0000000..5f2bca2 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + "task": { + "task_id": "task-35b31642-86af-4e2c-a255-cdbe19659101", + "input": "Deploy this service with the usual workflow.", + "channel": "local", + "created_at": "2026-04-14T14:37:42.382074+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1941.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-8f072b70-4161-46fc-bede-cceb930d4cc2", + "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + "timestamp": "2026-04-14T14:37:42.382140+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-fc15daf4-f738-455e-8b12-39143b3c3d6c", + "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + "timestamp": "2026-04-14T14:37:42.382146+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-8f072b70-4161-46fc-bede-cceb930d4cc2" + }, + { + "event_id": "evt-d899c751-6157-4548-893e-b766eeafeb3d", + "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + "timestamp": "2026-04-14T14:37:42.382150+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1941.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-fc15daf4-f738-455e-8b12-39143b3c3d6c" + }, + { + "event_id": "evt-skill-traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e-skill-deploy", + "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e", + 
"timestamp": "2026-04-14T14:37:42.382155+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Deploy this service with the usual workflow.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43.json b/docs/demo-artifacts/trajectories/traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43.json new file mode 100644 index 0000000..5609905 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43.json @@ -0,0 +1,207 @@ +{ + "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43", + "task": { + "task_id": "task-1a24d0bb-b2e6-44f0-8095-2ed74368dc9d", + "input": "Run the deploy workflow skill.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.861760+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + 
"summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-b4437076-cc94-4903-a2c7-3dd7c644dcc5", + "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43", + "timestamp": "2026-04-14T16:50:18.861861+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Run the deploy workflow skill." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-dd3dad15-7ace-47f7-9dd0-cf4955aa16ec", + "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43", + "timestamp": "2026-04-14T16:50:18.861871+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-b4437076-cc94-4903-a2c7-3dd7c644dcc5" + }, + { + "event_id": "evt-c3b04a4a-2506-47db-8d08-c8939c0eba08", + "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43", + "timestamp": "2026-04-14T16:50:18.861878+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-dd3dad15-7ace-47f7-9dd0-cf4955aa16ec" + }, + { + "event_id": "evt-tool-traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43-tool-terminal", + "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43", + "timestamp": "2026-04-14T16:50:18.861901+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Run the deploy workflow skill." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43-tool-terminal", + "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43", + "timestamp": "2026-04-14T16:50:18.861906+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.032, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.008, + "context_cost": 0.06, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-c5907bfb-61d2-47f9-a6c5-2300701bb551.json b/docs/demo-artifacts/trajectories/traj-c5907bfb-61d2-47f9-a6c5-2300701bb551.json new file mode 100644 index 0000000..f87dcef --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-c5907bfb-61d2-47f9-a6c5-2300701bb551.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "task": { + "task_id": "task-c1f58e80-f0eb-47e9-92ab-9b1a84351dff", + "input": "Use my telegram preference for this answer.", + "channel": "telegram", + "created_at": "2026-04-14T15:27:38.512116+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + 
"success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-212f6d74-bafd-483b-b8ec-cf4a33bf67da", + "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "timestamp": "2026-04-14T15:27:38.512204+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use my telegram preference for this answer." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-34b409a4-9ba9-4921-b3a6-e4c41bf7660c", + "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "timestamp": "2026-04-14T15:27:38.512211+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-212f6d74-bafd-483b-b8ec-cf4a33bf67da" + }, + { + "event_id": "evt-d117772a-0e77-4068-8ca5-0adacfcee184", + "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "timestamp": "2026-04-14T15:27:38.512216+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task likely depends on stable user/project facts.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-34b409a4-9ba9-4921-b3a6-e4c41bf7660c" + }, + { + "event_id": "evt-memory-traj-c5907bfb-61d2-47f9-a6c5-2300701bb551-mem-telegram-pref", + "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551", + "timestamp": "2026-04-14T15:27:38.512223+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Use my telegram preference for this answer." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-c9c11bdc-852b-4aef-851c-f2968806e535.json b/docs/demo-artifacts/trajectories/traj-c9c11bdc-852b-4aef-851c-f2968806e535.json new file mode 100644 index 0000000..a15aa41 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-c9c11bdc-852b-4aef-851c-f2968806e535.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "task": { + "task_id": "task-c08fbd42-a324-4430-8277-94c666661238", + "input": "Check the current system status.", + "channel": "local", + "created_at": "2026-04-14T15:27:38.520185+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + 
"success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1381.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-2e0920c4-6830-4c86-a4a3-139028e46176", + "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "timestamp": "2026-04-14T15:27:38.520262+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-f81b2a77-a012-4c62-9700-93f1b31daeb2", + "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "timestamp": "2026-04-14T15:27:38.520268+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-2e0920c4-6830-4c86-a4a3-139028e46176" + }, + { + "event_id": "evt-2b1fe09d-30b3-46c9-a706-373d5c8da08e", + "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "timestamp": "2026-04-14T15:27:38.520273+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1381.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-f81b2a77-a012-4c62-9700-93f1b31daeb2" + }, + { + "event_id": "evt-memory-traj-c9c11bdc-852b-4aef-851c-f2968806e535-mem-telegram-pref", + "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535", + "timestamp": 
"2026-04-14T15:27:38.520280+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Check the current system status." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-d2d3a115-36d8-466f-9d14-bf741316f698.json b/docs/demo-artifacts/trajectories/traj-d2d3a115-36d8-466f-9d14-bf741316f698.json new file mode 100644 index 0000000..94906f7 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-d2d3a115-36d8-466f-9d14-bf741316f698.json @@ -0,0 +1,201 @@ +{ + "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "task": { + "task_id": "task-00ccd7d0-72d9-458f-87fa-be0ee5571e44", + "input": "Check the current system status.", + "channel": "telegram", + "created_at": "2026-04-14T06:53:08.731950+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + 
"workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-63d64eb8-16b1-4dc7-ae03-7c094bc6e64f", + "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "timestamp": "2026-04-14T06:53:08.732042+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check the current system status." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-04ef718b-6973-465d-920e-bc501a6e02ad", + "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "timestamp": "2026-04-14T06:53:08.732049+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-63d64eb8-16b1-4dc7-ae03-7c094bc6e64f" + }, + { + "event_id": "evt-50f19e1e-8771-42c1-8846-95b5e4a6f491", + "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "timestamp": "2026-04-14T06:53:08.732053+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "call_tool", + "selected_ids": [ + "tool-terminal" + ], + "rejected_ids": [], + "rationale": "Task asks for current state or external action; tool use is justified.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-04ef718b-6973-465d-920e-bc501a6e02ad" + }, + { + "event_id": "evt-tool-traj-d2d3a115-36d8-466f-9d14-bf741316f698-tool-terminal", + "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "timestamp": "2026-04-14T06:53:08.732064+00:00", + "stage": "execution", + "event_type": "tool_called", + "payload": { + "tool_id": "tool-terminal", + "input": "Check the current system status." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-tool-result-traj-d2d3a115-36d8-466f-9d14-bf741316f698-tool-terminal", + "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698", + "timestamp": "2026-04-14T06:53:08.732068+00:00", + "stage": "execution", + "event_type": "tool_result", + "payload": { + "tool_id": "tool-terminal", + "status": "success", + "output": "demo-result-for:tool-terminal", + "error": null, + "latency_ms": 42 + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 42, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.058, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.25, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.042, + "context_cost": 0.0, + "useful_reuse": 0.05 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-d3575889-7458-44b9-b3f1-f04cd766ca76.json b/docs/demo-artifacts/trajectories/traj-d3575889-7458-44b9-b3f1-f04cd766ca76.json new file mode 100644 index 0000000..be5c37a --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-d3575889-7458-44b9-b3f1-f04cd766ca76.json @@ -0,0 +1,191 @@ +{ + "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "task": { + "task_id": "task-9db54b7d-a508-49ac-bd3c-bd5af3eabc61", + "input": "Deploy this service with the usual workflow.", + "channel": "local", + "created_at": "2026-04-14T15:27:38.520867+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + 
"success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1897.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-3e658630-fea8-44c3-afd2-fc936a2eed37", + "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "timestamp": "2026-04-14T15:27:38.520945+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-03990424-3433-4147-a963-353863758b31", + "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "timestamp": "2026-04-14T15:27:38.520951+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-3e658630-fea8-44c3-afd2-fc936a2eed37" + }, + { + "event_id": "evt-10dfab37-ded7-473e-9de9-2f922c5bf7c8", + "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "timestamp": "2026-04-14T15:27:38.520956+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": [ + "mem-telegram-pref" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1897.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-03990424-3433-4147-a963-353863758b31" + }, + { + "event_id": "evt-memory-traj-d3575889-7458-44b9-b3f1-f04cd766ca76-mem-telegram-pref", + "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76", + "timestamp": "2026-04-14T15:27:38.520961+00:00", + "stage": "execution", + "event_type": "memory_injected", + "payload": { + "record_id": "mem-telegram-pref", + "input": "Deploy this service with the usual workflow." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-d99b5307-1749-4e80-867a-877e087f226f.json b/docs/demo-artifacts/trajectories/traj-d99b5307-1749-4e80-867a-877e087f226f.json new file mode 100644 index 0000000..80ca2a8 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-d99b5307-1749-4e80-867a-877e087f226f.json @@ -0,0 +1,170 @@ +{ + "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f", + "task": { + "task_id": "task-9cda8e38-dcdf-4877-bc19-48444df0531e", + "input": "Use multiple capabilities: memory, skill, and tool.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.865109+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=2606.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-88a21058-c409-4836-a1b8-ef6cc63ac51e", + "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f", + "timestamp": "2026-04-14T16:50:18.865214+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use multiple capabilities: memory, skill, and tool." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-44d46564-2d71-4bed-8a3f-d3fc96fce9ef", + "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f", + "timestamp": "2026-04-14T16:50:18.865225+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-88a21058-c409-4836-a1b8-ef6cc63ac51e" + }, + { + "event_id": "evt-e21e8afe-d676-4839-b9d0-fd60441b983a", + "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f", + "timestamp": "2026-04-14T16:50:18.865231+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=2606.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-44d46564-2d71-4bed-8a3f-d3fc96fce9ef" + } + ], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 0.44, + "components": { + "task_success": 0.4, + "retrieval_hit": 0.1, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-dd361c81-40a1-4892-9914-2140870fff95.json b/docs/demo-artifacts/trajectories/traj-dd361c81-40a1-4892-9914-2140870fff95.json new file mode 100644 index 0000000..379c22f --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-dd361c81-40a1-4892-9914-2140870fff95.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95", + "task": { + "task_id": "task-789e89f1-828b-405e-ab11-43dd00107f5f", + "input": "Deploy this service with the usual workflow.", + "channel": "local", + "created_at": "2026-04-14T15:27:38.519101+00:00", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-ec9bd980-c648-43fc-8428-83a6ce0cf375", + "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95", + "timestamp": "2026-04-14T15:27:38.519171+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Deploy this service with the usual workflow." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-e0f1f4e9-2a70-424d-bff6-34a156134b0f", + "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95", + "timestamp": "2026-04-14T15:27:38.519177+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-ec9bd980-c648-43fc-8428-83a6ce0cf375" + }, + { + "event_id": "evt-9b1ea6f8-ac54-4aa4-ae0f-44aa3a0128dd", + "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95", + "timestamp": "2026-04-14T15:27:38.519181+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Task resembles a reusable procedure; load a skill before action.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-e0f1f4e9-2a70-424d-bff6-34a156134b0f" + }, + { + "event_id": "evt-skill-traj-dd361c81-40a1-4892-9914-2140870fff95-skill-deploy", + "trajectory_id": 
"traj-dd361c81-40a1-4892-9914-2140870fff95", + "timestamp": "2026-04-14T15:27:38.519188+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Deploy this service with the usual workflow.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb.json b/docs/demo-artifacts/trajectories/traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb.json new file mode 100644 index 0000000..f64ddc8 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb", + "task": { + "task_id": "task-144d7465-796c-4dd0-a4e2-c2be42872c4a", + "input": "Run the deploy workflow skill.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.606059+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + 
"type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1277.214).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-184ab2f3-c1c6-4af1-8241-d55b4731e606", + "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb", + "timestamp": "2026-04-14T15:52:24.606169+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Run the deploy workflow skill." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-9dd959ce-a5ce-42fd-b975-a03dd713adf6", + "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb", + "timestamp": "2026-04-14T15:52:24.606180+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-184ab2f3-c1c6-4af1-8241-d55b4731e606" + }, + { + "event_id": "evt-537a8488-f6eb-4f15-94ac-3e1f195c584a", + "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb", + "timestamp": "2026-04-14T15:52:24.606193+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1277.214).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-9dd959ce-a5ce-42fd-b975-a03dd713adf6" + }, + { + "event_id": "evt-skill-traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb-skill-deploy", + "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb", + "timestamp": "2026-04-14T15:52:24.606202+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Run the deploy workflow skill.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-e9c37170-8764-4d70-ba0d-90213b275229.json b/docs/demo-artifacts/trajectories/traj-e9c37170-8764-4d70-ba0d-90213b275229.json new file mode 100644 index 0000000..a7dd3df --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-e9c37170-8764-4d70-ba0d-90213b275229.json @@ -0,0 +1,170 @@ +{ + "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229", + "task": { + "task_id": "task-f61f5344-3be7-4a7a-9dfa-b8d2a9c30a42", + "input": "Recall my saved preference from memory.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.863539+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 
0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1994.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-21762ef9-6490-4e3f-8f3c-2ba17e20c050", + "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229", + "timestamp": "2026-04-14T16:50:18.863643+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Recall my saved preference from memory." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-40f5b045-1e94-4c07-8cf5-5a245a946b9d", + "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229", + "timestamp": "2026-04-14T16:50:18.863652+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-21762ef9-6490-4e3f-8f3c-2ba17e20c050" + }, + { + "event_id": "evt-5ed49c2e-d2b3-46ec-859e-ec00f8c001c2", + "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229", + "timestamp": "2026-04-14T16:50:18.863659+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1994.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-40f5b045-1e94-4c07-8cf5-5a245a946b9d" + } + ], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 0.44, + "components": { + "task_success": 0.4, + "retrieval_hit": 0.1, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5.json b/docs/demo-artifacts/trajectories/traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5.json new file mode 100644 index 0000000..3a4f27f --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5.json @@ -0,0 +1,170 @@ +{ + "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5", + "task": { + "task_id": "task-d7578bf3-95da-43f2-9b31-2c80ccb4fe33", + "input": "Run the deploy workflow skill.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.864056+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + 
"success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1535.615).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-4e1aa172-112d-4000-8708-f2184e114ee5", + "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5", + "timestamp": "2026-04-14T16:50:18.864163+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Run the deploy workflow skill." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-b9e7f5b9-2f27-4f4d-8f76-8ba6b39620eb", + "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5", + "timestamp": "2026-04-14T16:50:18.864173+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-4e1aa172-112d-4000-8708-f2184e114ee5" + }, + { + "event_id": "evt-07dcc07d-d9c4-4698-881d-925294dadadf", + "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5", + "timestamp": "2026-04-14T16:50:18.864179+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1535.615).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-b9e7f5b9-2f27-4f4d-8f76-8ba6b39620eb" + } + ], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 0.44, + "components": { + "task_success": 0.4, + "retrieval_hit": 0.1, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17.json b/docs/demo-artifacts/trajectories/traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17.json new file mode 100644 index 0000000..8257a07 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17", + "task": { + "task_id": "task-d9131553-8868-4dac-8f06-69be44c43f4e", + "input": "Use multiple capabilities: memory, skill, and tool.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.607062+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=2167.3334).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-a8bc2d4a-1557-4029-899f-7fa93b764b11", + "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17", + "timestamp": "2026-04-14T15:52:24.607165+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use multiple capabilities: memory, skill, and tool." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-832ac7e6-d619-4e24-ad74-bcca1042806e", + "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17", + "timestamp": "2026-04-14T15:52:24.607175+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-a8bc2d4a-1557-4029-899f-7fa93b764b11" + }, + { + "event_id": "evt-0aaff11d-de9f-4e28-bc92-6def76857a20", + "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17", + "timestamp": "2026-04-14T15:52:24.607182+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=2167.3334).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-832ac7e6-d619-4e24-ad74-bcca1042806e" + }, + { + "event_id": "evt-skill-traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17-skill-deploy", + "trajectory_id": 
"traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17", + "timestamp": "2026-04-14T15:52:24.607192+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Use multiple capabilities: memory, skill, and tool.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-f1d895a0-5442-448f-8936-4ee8b07822e6.json b/docs/demo-artifacts/trajectories/traj-f1d895a0-5442-448f-8936-4ee8b07822e6.json new file mode 100644 index 0000000..9d1d797 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-f1d895a0-5442-448f-8936-4ee8b07822e6.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6", + "task": { + "task_id": "task-053282d0-1f43-409f-a230-343d3faa02df", + "input": "Check current system status with a tool.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.606551+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": 
"skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1701.0804).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-5900439a-2a97-41fe-a82e-96181c99fee1", + "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6", + "timestamp": "2026-04-14T15:52:24.606656+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Check current system status with a tool." 
+ }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-e0965597-ddee-4ccd-ae72-b51105101428", + "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6", + "timestamp": "2026-04-14T15:52:24.606666+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-5900439a-2a97-41fe-a82e-96181c99fee1" + }, + { + "event_id": "evt-047dc545-d6c2-4a67-b0db-26b79e994e63", + "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6", + "timestamp": "2026-04-14T15:52:24.606672+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1701.0804).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-e0965597-ddee-4ccd-ae72-b51105101428" + }, + { + "event_id": "evt-skill-traj-f1d895a0-5442-448f-8936-4ee8b07822e6-skill-deploy", + "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6", + "timestamp": "2026-04-14T15:52:24.606681+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Check current system status with a tool.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." 
+ }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3.json b/docs/demo-artifacts/trajectories/traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3.json new file mode 100644 index 0000000..5108176 --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3.json @@ -0,0 +1,170 @@ +{ + "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3", + "task": { + "task_id": "task-c3c52f6d-4793-4687-9838-d98fd99a6074", + "input": "Use multiple capabilities: memory, skill, and tool.", + "channel": "local", + "created_at": "2026-04-14T16:50:18.863031+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" + ], + "cost": 0.0, + 
"confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "No high-confidence route found from the current heuristic baseline.", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-1cfd8f39-f961-43da-9fb4-9e37dd7072f0", + "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3", + "timestamp": "2026-04-14T16:50:18.863119+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Use multiple capabilities: memory, skill, and tool." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-a7f6a38f-76c5-4342-a592-4acbd15efe9f", + "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3", + "timestamp": "2026-04-14T16:50:18.863129+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-1cfd8f39-f961-43da-9fb4-9e37dd7072f0" + }, + { + "event_id": "evt-79e3d820-34bf-4c20-9286-2e20dd3e068c", + "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3", + "timestamp": "2026-04-14T16:50:18.863136+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "clarify", + "selected_ids": [], + "selected_payloads": [], + "rejected_ids": [], + "rationale": "No high-confidence route found from the current heuristic baseline.", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-a7f6a38f-76c5-4342-a592-4acbd15efe9f" + } + ], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated 
by MemabraRunner with execution hooks." + }, + "reward": { + "total": 0.44, + "components": { + "task_success": 0.4, + "retrieval_hit": 0.1, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/demo-artifacts/trajectories/traj-ffb40d01-7956-4d7b-a41c-9618487fe619.json b/docs/demo-artifacts/trajectories/traj-ffb40d01-7956-4d7b-a41c-9618487fe619.json new file mode 100644 index 0000000..adfa60d --- /dev/null +++ b/docs/demo-artifacts/trajectories/traj-ffb40d01-7956-4d7b-a41c-9618487fe619.json @@ -0,0 +1,192 @@ +{ + "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619", + "task": { + "task_id": "task-f0aed2e6-8d9b-42f8-a20c-5eb8af052d3b", + "input": "Recall my saved preference from memory.", + "channel": "local", + "created_at": "2026-04-14T15:52:24.605509+00:00", + "user_id": null + }, + "context_snapshot": { + "conversation_summary": "", + "environment_summary": "", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-telegram-pref", + "type": "memory", + "title": "Telegram preference", + "summary": "Prefer plain text on Telegram.", + "triggers": [ + "telegram", + "preference", + "answer" + ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 0.9, + "risk": 0.0, + "tags": [ + "output" + ], + "source": "user", + "type_payload": {} + } + ], + "skill": [ + { + "id": "skill-deploy", + "type": "skill", + "title": "Deploy workflow", + "summary": "Reusable deployment workflow.", + "triggers": [ + "deploy", + "workflow", + "service" + ], + "cost": 0.0, + "confidence": 0.8, + "success_rate": 0.9, + "freshness": 0.8, + "risk": 0.0, + "tags": [ + "ops" + ], + "source": "system", + "type_payload": {} + } + ], + "tool": [ + { + "id": "tool-terminal", + "type": "tool", + "title": "terminal", + "summary": "Run terminal-style inspection commands.", + "triggers": [ + "check", + "current", + "status", + "system" 
+ ], + "cost": 0.0, + "confidence": 0.95, + "success_rate": 0.9, + "freshness": 1.0, + "risk": 0.0, + "tags": [ + "inspection" + ], + "source": "system", + "type_payload": {} + } + ] + }, + "decisions": [ + { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1658.6938).", + "estimated_cost": 0.0 + } + ], + "events": [ + { + "event_id": "evt-44233637-eb1a-47de-972c-942ee409dd78", + "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619", + "timestamp": "2026-04-14T15:52:24.605614+00:00", + "stage": "retrieval", + "event_type": "task_received", + "payload": { + "input": "Recall my saved preference from memory." + }, + "metrics": {}, + "parent_event_id": null + }, + { + "event_id": "evt-01123ad4-7d52-4c82-bca1-1a3b5014196f", + "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619", + "timestamp": "2026-04-14T15:52:24.605625+00:00", + "stage": "retrieval", + "event_type": "candidates_recalled", + "payload": { + "memory_ids": [ + "mem-telegram-pref" + ], + "skill_ids": [ + "skill-deploy" + ], + "tool_ids": [ + "tool-terminal" + ] + }, + "metrics": {}, + "parent_event_id": "evt-44233637-eb1a-47de-972c-942ee409dd78" + }, + { + "event_id": "evt-a9a657a1-1e3e-49f3-8ea0-9528c12c633f", + "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619", + "timestamp": "2026-04-14T15:52:24.605632+00:00", + "stage": "policy", + "event_type": "action_selected", + "payload": { + "step": 1, + "decision_type": "load_skill", + "selected_ids": [ + "skill-deploy" + ], + "selected_payloads": [ + {} + ], + "rejected_ids": [], + "rationale": "Predicted by learning router (score=1658.6938).", + "estimated_cost": 0.0 + }, + "metrics": {}, + "parent_event_id": "evt-01123ad4-7d52-4c82-bca1-1a3b5014196f" + }, + { + "event_id": "evt-skill-traj-ffb40d01-7956-4d7b-a41c-9618487fe619-skill-deploy", + "trajectory_id": 
"traj-ffb40d01-7956-4d7b-a41c-9618487fe619", + "timestamp": "2026-04-14T15:52:24.605642+00:00", + "stage": "execution", + "event_type": "skill_loaded", + "payload": { + "skill_id": "skill-deploy", + "input": "Recall my saved preference from memory.", + "instructions": "Demo skill payload loaded successfully." + }, + "metrics": {}, + "parent_event_id": null + } + ], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 0, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Draft trajectory generated by MemabraRunner with execution hooks." + }, + "reward": { + "total": 1.04, + "components": { + "task_success": 0.8, + "retrieval_hit": 0.2, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.0, + "context_cost": 0.06, + "useful_reuse": 0.1 + } + } +} \ No newline at end of file diff --git a/docs/examples/trajectory_failure_missed_memory.json b/docs/examples/trajectory_failure_missed_memory.json new file mode 100644 index 0000000..c72c1e1 --- /dev/null +++ b/docs/examples/trajectory_failure_missed_memory.json @@ -0,0 +1,66 @@ +{ + "trajectory_id": "traj-failure-missed-memory-001", + "task": { + "task_id": "task-004", + "input": "Use my usual formatting preferences for this write-up.", + "channel": "telegram", + "created_at": "2026-04-14T13:05:00Z", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "User has repeated stable formatting preferences in earlier sessions.", + "environment_summary": "No tool call required.", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-format-1", + "type": "memory", + "title": "Telegram formatting preference", + "summary": "Prefer plain text over markdown for Telegram delivery.", + "triggers": ["format", "telegram", "write-up"], + "cost": 0.05, + "confidence": 0.9, + "success_rate": 0.95, + "freshness": 0.95, + "risk": 0.05, + "tags": ["preference", "output"], + "source": "system" + } + ], + "skill": [], + "tool": [] + }, + "decisions": [ + { + "step": 1, 
+ "decision_type": "direct_answer", + "selected_ids": [], + "rejected_ids": ["mem-format-1"], + "rationale": "Router failed to recognize a preference-triggered task and skipped memory injection.", + "estimated_cost": 0.0 + } + ], + "events": [], + "outcome": { + "status": "partial_success", + "steps": 1, + "latency_ms": 300, + "user_corrections": 1, + "tool_errors": 0, + "notes": "Answer was serviceable but ignored known formatting preference." + }, + "reward": { + "total": 0.18, + "components": { + "task_success": 0.5, + "retrieval_hit": -0.1, + "tool_error": 0.0, + "user_correction": 0.2, + "latency": 0.02, + "context_cost": 0.0, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/examples/trajectory_failure_overtool.json b/docs/examples/trajectory_failure_overtool.json new file mode 100644 index 0000000..3879356 --- /dev/null +++ b/docs/examples/trajectory_failure_overtool.json @@ -0,0 +1,67 @@ +{ + "trajectory_id": "traj-failure-overtool-001", + "task": { + "task_id": "task-003", + "input": "Name this project.", + "channel": "telegram", + "created_at": "2026-04-14T13:04:00Z", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "User asks for naming help for an agent memory project.", + "environment_summary": "No real-time state lookup required.", + "recent_failures": ["The agent previously overused tools for pure reasoning tasks."] + }, + "candidate_sets": { + "memory": [], + "skill": [], + "tool": [ + { + "id": "tool-web-1", + "type": "tool", + "title": "web_search", + "summary": "Search the web for information.", + "triggers": ["name", "idea"], + "cost": 0.4, + "confidence": 0.62, + "success_rate": 0.55, + "freshness": 1.0, + "risk": 0.3, + "tags": ["research"], + "source": "system" + } + ], + "skill": [] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": ["tool-web-1"], + "rejected_ids": [], + "rationale": "Incorrectly treated naming as a research task rather than a 
reasoning task.", + "estimated_cost": 0.4 + } + ], + "events": [], + "outcome": { + "status": "failure", + "steps": 2, + "latency_ms": 2400, + "user_corrections": 1, + "tool_errors": 1, + "notes": "Over-tooled a pure reasoning task and forced unnecessary latency." + }, + "reward": { + "total": -0.82, + "components": { + "task_success": -0.3, + "retrieval_hit": 0.0, + "tool_error": 0.35, + "user_correction": 0.25, + "latency": 0.12, + "context_cost": 0.1, + "useful_reuse": 0.0 + } + } +} \ No newline at end of file diff --git a/docs/examples/trajectory_success_memory.json b/docs/examples/trajectory_success_memory.json new file mode 100644 index 0000000..292df6d --- /dev/null +++ b/docs/examples/trajectory_success_memory.json @@ -0,0 +1,66 @@ +{ + "trajectory_id": "traj-success-memory-001", + "task": { + "task_id": "task-001", + "input": "Remember my preferred deployment region and use it next time.", + "channel": "telegram", + "created_at": "2026-04-14T13:02:00Z", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "User is defining a local agent memory project and references recurring preferences.", + "environment_summary": "No live tool call required.", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [ + { + "id": "mem-region-1", + "type": "memory", + "title": "Preferred deployment region", + "summary": "User prefers us-west-2 for deployments.", + "triggers": ["deployment", "region", "preference"], + "cost": 0.1, + "confidence": 0.93, + "success_rate": 0.88, + "freshness": 0.9, + "risk": 0.1, + "tags": ["preference", "deployment"], + "source": "user" + } + ], + "skill": [], + "tool": [] + }, + "decisions": [ + { + "step": 1, + "decision_type": "inject_memory", + "selected_ids": ["mem-region-1"], + "rejected_ids": [], + "rationale": "User request depends on a stable preference, so memory injection is the lowest-cost correct route.", + "estimated_cost": 0.1 + } + ], + "events": [], + "outcome": { + "status": "success", + "steps": 
1, + "latency_ms": 350, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Correctly identified preference storage request without unnecessary tools." + }, + "reward": { + "total": 1.72, + "components": { + "task_success": 1.0, + "retrieval_hit": 0.45, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.03, + "context_cost": 0.05, + "useful_reuse": 0.35 + } + } +} \ No newline at end of file diff --git a/docs/examples/trajectory_success_tool.json b/docs/examples/trajectory_success_tool.json new file mode 100644 index 0000000..7a8692e --- /dev/null +++ b/docs/examples/trajectory_success_tool.json @@ -0,0 +1,67 @@ +{ + "trajectory_id": "traj-success-tool-001", + "task": { + "task_id": "task-002", + "input": "Check the current test status for the prototype.", + "channel": "telegram", + "created_at": "2026-04-14T13:03:00Z", + "user_id": "oza" + }, + "context_snapshot": { + "conversation_summary": "User wants concrete progress on the memabra prototype.", + "environment_summary": "Pytest is available in the local repo environment.", + "recent_failures": [] + }, + "candidate_sets": { + "memory": [], + "skill": [], + "tool": [ + { + "id": "tool-terminal-1", + "type": "tool", + "title": "terminal", + "summary": "Run shell commands in the local environment.", + "triggers": ["check", "current", "test"], + "cost": 0.2, + "confidence": 0.95, + "success_rate": 0.92, + "freshness": 1.0, + "risk": 0.2, + "tags": ["system", "tests"], + "source": "system" + } + ], + "skill": [] + }, + "decisions": [ + { + "step": 1, + "decision_type": "call_tool", + "selected_ids": ["tool-terminal-1"], + "rejected_ids": [], + "rationale": "Current test status is a live system fact and must be observed with a tool.", + "estimated_cost": 0.2 + } + ], + "events": [], + "outcome": { + "status": "success", + "steps": 1, + "latency_ms": 700, + "user_corrections": 0, + "tool_errors": 0, + "notes": "Terminal used appropriately to inspect live test state." 
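+ }, + "reward": { + "total": 1.6, + "components": { + "task_success": 1.0, + "retrieval_hit": 0.4, + "tool_error": 0.0, + "user_correction": 0.0, + "latency": 0.08, + "context_cost": 0.02, + "useful_reuse": 0.3 + } + } +} \ No newline at end of file diff --git a/docs/reward_spec.md b/docs/reward_spec.md new file mode 100644 index 0000000..9a540ce --- /dev/null +++ b/docs/reward_spec.md @@ -0,0 +1,191 @@ +# Reward Specification + +## Goal + +The memabra reward does not simply judge whether the task got done; it evaluates: +- whether the right memory / skill / tool was selected +- whether the run was efficient +- whether the behavior was stable +- whether it reduced repeated user input and corrections +- whether tool cost and context cost were kept under control + +The reward is not there to make scores look good; it gives the routing policy an attributable, optimizable training signal. + +## Reward components + +The total reward is written as: + +```text +R = ws*S + wr*H - we*E - wc*C - wl*L - wx*X + wu*U +``` + +where: +- `S` = task success +- `H` = retrieval hit quality +- `E` = execution/tool error penalty +- `C` = user correction penalty +- `L` = latency penalty +- `X` = context cost penalty +- `U` = useful reuse bonus + +## 1. Task Success (`S`) + +Definition: whether the task was ultimately completed, and how well. + +Suggested values: +- `1.0`: goal fully achieved +- `0.5`: partially achieved +- `0.0`: not completed +- `-0.5`: clearly misleading, or headed in the wrong direction + +Data sources: +- automated task acceptance checker +- explicit user feedback +- replay comparison rules + +## 2. Retrieval Hit Quality (`H`) + +Definition: whether the memory / skill / tool that actually helps the task was hit. + +Suggested breakdown: +- `Hm`: memory hit +- `Hs`: skill hit +- `Ht`: tool hit + +Scoring guidance: +- hit a high-value candidate that helped reduce steps: positive reward +- recalled many candidates but none were useful: low reward or 0 +- missed a key candidate: negative reward + +## 3. Execution / Tool Error Penalty (`E`) + +Definition: whether invalid calls, wrong calls, or clearly redundant calls occurred. + +Examples: +- called a tool that should not have been called +- tool arguments were clearly wrong +- repeated the same ineffective action +- took a long chain when a direct answer would have worked + +Suggested values: +- each minor error: `0.1` to `0.3` +- severe error: `0.5` to `1.0` + +## 4. User Correction Penalty (`C`) + +Definition: whether the user had to supply information that should already have been known, or correct a wrong action. + +Examples: +- user restates a preference +- user points out that the wrong tool was called +- user asks to retract an incorrect memory + +Explanation: +This term is critical for a long-lived system, because it directly reflects whether the system has actually learned. + +## 5. Latency Penalty (`L`) + +Definition: whether the time and steps the system spent completing the task were excessive. + +Suggested inputs: +- wall-clock latency +- action count +- retry count + +Guidance: +- a small amount of extra reasoning is acceptable +- large amounts of useless detours must be penalized + +## 6. Context Cost Penalty (`X`) + +Definition: whether the context was inflated excessively. + +Includes: +- injecting too much irrelevant memory +- loading unnecessary skills +- emitting oversized intermediate output + +Rationale: +An agent easily slips into "stuffing in a bit more just to be safe", which ends up dragging the context down. +This cost must be an explicit part of the reward. + +## 7.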
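Useful Reuse Bonus (`U`) + +Definition: whether the correct long-term information was reused and actually improved efficiency or quality. + +Examples: +- successfully reused a user preference, avoiding another confirmation +- reused a validated skill, reducing trial and error +- reused a similar episode, speeding up task completion + +## Suggested initial weights + +Start with a naive version: + +```text +ws = 1.0 +wr = 0.35 +we = 0.30 +wc = 0.40 +wl = 0.15 +wx = 0.20 +wu = 0.25 +``` + +Explanation: +- success carries the highest weight +- user correction is penalized heavily, because it directly exposes that the system has not learned +- retrieval hit has clear value, but must not outweigh the outcome +- latency/context matter, but should not be weighted too heavily early on + +## Signal sources + +The reward can come from three kinds of sources: + +### A. Explicit signals +- user says "right" or "wrong" +- user corrects the agent +- user asks for a redo + +### B. Implicit signals +- whether steps were reduced +- whether errors were triggered +- whether the same question was asked again +- whether the task timed out + +### C. Programmatic acceptance +- whether tests pass +- whether the target file was generated +- whether specified fields match +- whether tool execution succeeded + +## Counterfactual logging requirements + +For later training, the following must be recorded: +- which candidates were in the candidate set +- which one was finally selected +- which high-scoring candidates were not selected +- the local outcome of each action + +Otherwise the reward can only be assigned to "the whole process", and no specific routing policy can be learned. + +## Early-phase strategy + +In Phase 0 / Phase 1, using the reward directly for large-model weight updates is not recommended. +Use it first for: +- routing rule evaluation +- sample labeling +- candidate ranking optimization +- bandit / reranker training + +## Risks + +- looking only at success rewards lucky guesses +- looking only at efficiency makes the system afraid to explore +- looking only at user feedback exposes the signal to noise in how users express themselves +- without counterfactual logging, training is largely blind + +## Current conclusion + +In memabra the reward is not an accessory; it is core infrastructure for the learning loop. +If the reward design is unclear, every later "update weights based on outcomes" step turns into pseudo-learning. \ No newline at end of file diff --git a/docs/router-versions/current.json b/docs/router-versions/current.json new file mode 100644 index 0000000..99f5dff --- /dev/null +++ b/docs/router-versions/current.json @@ -0,0 +1,13 @@ +{ + "current_version_id": "20260415-023347", + "promotion_source": null, + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + }, + "prior_version_id": "20260415-023347", + "saved_at": "2026-04-15T02:33:47.916903+00:00" +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-150123.json b/docs/router-versions/versions/20260414-150123.json new file mode 100644 index 0000000..981eef7 --- /dev/null +++ b/docs/router-versions/versions/20260414-150123.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-150123", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence":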
0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-150127.json b/docs/router-versions/versions/20260414-150127.json new file mode 100644 index 0000000..a2ff8e9 --- /dev/null +++ b/docs/router-versions/versions/20260414-150127.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-150127", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-150228.json b/docs/router-versions/versions/20260414-150228.json new file mode 100644 index 0000000..87fef3e --- /dev/null +++ b/docs/router-versions/versions/20260414-150228.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-150228", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + 
"tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-150426.json b/docs/router-versions/versions/20260414-150426.json new file mode 100644 index 0000000..2c35bc1 --- /dev/null +++ b/docs/router-versions/versions/20260414-150426.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-150426", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-152505.json b/docs/router-versions/versions/20260414-152505.json new file mode 100644 index 0000000..89177f8 --- /dev/null +++ b/docs/router-versions/versions/20260414-152505.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-152505", + "weights": { + "clarify": { + "input_length": 6.0, + 
"memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-152530.json b/docs/router-versions/versions/20260414-152530.json new file mode 100644 index 0000000..c0e6e3f --- /dev/null +++ b/docs/router-versions/versions/20260414-152530.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-152530", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-152625.json b/docs/router-versions/versions/20260414-152625.json new file mode 100644 index 0000000..e999388 --- /dev/null +++ b/docs/router-versions/versions/20260414-152625.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-152625", + "weights": 
{ + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-152935.json b/docs/router-versions/versions/20260414-152935.json new file mode 100644 index 0000000..264b6e5 --- /dev/null +++ b/docs/router-versions/versions/20260414-152935.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-152935", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-152941.json b/docs/router-versions/versions/20260414-152941.json new file mode 100644 index 0000000..839774c --- /dev/null +++ b/docs/router-versions/versions/20260414-152941.json @@ -0,0 +1,35 @@ +{ + 
"version_id": "20260414-152941", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-155036.json b/docs/router-versions/versions/20260414-155036.json new file mode 100644 index 0000000..7dc14d0 --- /dev/null +++ b/docs/router-versions/versions/20260414-155036.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-155036", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-155251.json b/docs/router-versions/versions/20260414-155251.json new file mode 100644 index 0000000..f7adf54 --- /dev/null +++ 
b/docs/router-versions/versions/20260414-155251.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-155251", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-155350.json b/docs/router-versions/versions/20260414-155350.json new file mode 100644 index 0000000..21756af --- /dev/null +++ b/docs/router-versions/versions/20260414-155350.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-155350", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-164944.json b/docs/router-versions/versions/20260414-164944.json new file mode 
100644 index 0000000..bbb009f --- /dev/null +++ b/docs/router-versions/versions/20260414-164944.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-164944", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-165138.json b/docs/router-versions/versions/20260414-165138.json new file mode 100644 index 0000000..eced56e --- /dev/null +++ b/docs/router-versions/versions/20260414-165138.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-165138", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-165207.json 
b/docs/router-versions/versions/20260414-165207.json new file mode 100644 index 0000000..f5e1e9d --- /dev/null +++ b/docs/router-versions/versions/20260414-165207.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-165207", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-165241.json b/docs/router-versions/versions/20260414-165241.json new file mode 100644 index 0000000..534781e --- /dev/null +++ b/docs/router-versions/versions/20260414-165241.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-165241", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git 
a/docs/router-versions/versions/20260414-165316.json b/docs/router-versions/versions/20260414-165316.json new file mode 100644 index 0000000..5710405 --- /dev/null +++ b/docs/router-versions/versions/20260414-165316.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-165316", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-165359.json b/docs/router-versions/versions/20260414-165359.json new file mode 100644 index 0000000..f5c2a67 --- /dev/null +++ b/docs/router-versions/versions/20260414-165359.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-165359", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } 
+} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-165450.json b/docs/router-versions/versions/20260414-165450.json new file mode 100644 index 0000000..31153a6 --- /dev/null +++ b/docs/router-versions/versions/20260414-165450.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-165450", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-171516.json b/docs/router-versions/versions/20260414-171516.json new file mode 100644 index 0000000..3e158f7 --- /dev/null +++ b/docs/router-versions/versions/20260414-171516.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-171516", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 
0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-171623.json b/docs/router-versions/versions/20260414-171623.json new file mode 100644 index 0000000..a132719 --- /dev/null +++ b/docs/router-versions/versions/20260414-171623.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-171623", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-171651.json b/docs/router-versions/versions/20260414-171651.json new file mode 100644 index 0000000..c34f0c5 --- /dev/null +++ b/docs/router-versions/versions/20260414-171651.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-171651", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + 
"latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-171757.json b/docs/router-versions/versions/20260414-171757.json new file mode 100644 index 0000000..65dd12d --- /dev/null +++ b/docs/router-versions/versions/20260414-171757.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-171757", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-173832.json b/docs/router-versions/versions/20260414-173832.json new file mode 100644 index 0000000..725bfc5 --- /dev/null +++ b/docs/router-versions/versions/20260414-173832.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-173832", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + 
"reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180027.json b/docs/router-versions/versions/20260414-180027.json new file mode 100644 index 0000000..b58c8ab --- /dev/null +++ b/docs/router-versions/versions/20260414-180027.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180027", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180106.json b/docs/router-versions/versions/20260414-180106.json new file mode 100644 index 0000000..fd7ddda --- /dev/null +++ b/docs/router-versions/versions/20260414-180106.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180106", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": 
"online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180343.json b/docs/router-versions/versions/20260414-180343.json new file mode 100644 index 0000000..3348994 --- /dev/null +++ b/docs/router-versions/versions/20260414-180343.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180343", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180515.json b/docs/router-versions/versions/20260414-180515.json new file mode 100644 index 0000000..67ade92 --- /dev/null +++ b/docs/router-versions/versions/20260414-180515.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180515", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + 
"top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180553.json b/docs/router-versions/versions/20260414-180553.json new file mode 100644 index 0000000..dbe91d3 --- /dev/null +++ b/docs/router-versions/versions/20260414-180553.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180553", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180625.json b/docs/router-versions/versions/20260414-180625.json new file mode 100644 index 0000000..98ff42d --- /dev/null +++ b/docs/router-versions/versions/20260414-180625.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180625", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + 
"top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-180658.json b/docs/router-versions/versions/20260414-180658.json new file mode 100644 index 0000000..accb2b3 --- /dev/null +++ b/docs/router-versions/versions/20260414-180658.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-180658", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-182721.json b/docs/router-versions/versions/20260414-182721.json new file mode 100644 index 0000000..e418761 --- /dev/null +++ b/docs/router-versions/versions/20260414-182721.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-182721", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + 
"tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-182806.json b/docs/router-versions/versions/20260414-182806.json new file mode 100644 index 0000000..ca80cf9 --- /dev/null +++ b/docs/router-versions/versions/20260414-182806.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-182806", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-183024.json b/docs/router-versions/versions/20260414-183024.json new file mode 100644 index 0000000..b502e71 --- /dev/null +++ b/docs/router-versions/versions/20260414-183024.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-183024", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + 
"input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-183107.json b/docs/router-versions/versions/20260414-183107.json new file mode 100644 index 0000000..37ba03e --- /dev/null +++ b/docs/router-versions/versions/20260414-183107.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-183107", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-185133.json b/docs/router-versions/versions/20260414-185133.json new file mode 100644 index 0000000..df79a5d --- /dev/null +++ b/docs/router-versions/versions/20260414-185133.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-185133", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + 
"top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-185710.json b/docs/router-versions/versions/20260414-185710.json new file mode 100644 index 0000000..11eeb76 --- /dev/null +++ b/docs/router-versions/versions/20260414-185710.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-185710", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-185816.json b/docs/router-versions/versions/20260414-185816.json new file mode 100644 index 0000000..8b98a8a --- /dev/null +++ b/docs/router-versions/versions/20260414-185816.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-185816", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + 
"top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-185837.json b/docs/router-versions/versions/20260414-185837.json new file mode 100644 index 0000000..dd8c930 --- /dev/null +++ b/docs/router-versions/versions/20260414-185837.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-185837", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-191901.json b/docs/router-versions/versions/20260414-191901.json new file mode 100644 index 0000000..4739452 --- /dev/null +++ b/docs/router-versions/versions/20260414-191901.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-191901", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + 
"top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-192109.json b/docs/router-versions/versions/20260414-192109.json new file mode 100644 index 0000000..bff3115 --- /dev/null +++ b/docs/router-versions/versions/20260414-192109.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-192109", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-194133.json b/docs/router-versions/versions/20260414-194133.json new file mode 100644 index 0000000..b333bab --- /dev/null +++ b/docs/router-versions/versions/20260414-194133.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-194133", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + 
"top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-194158.json b/docs/router-versions/versions/20260414-194158.json new file mode 100644 index 0000000..8d624b2 --- /dev/null +++ b/docs/router-versions/versions/20260414-194158.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-194158", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-200220.json b/docs/router-versions/versions/20260414-200220.json new file mode 100644 index 0000000..d82c9b1 --- /dev/null +++ b/docs/router-versions/versions/20260414-200220.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-200220", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + 
"skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-200302.json b/docs/router-versions/versions/20260414-200302.json new file mode 100644 index 0000000..b4e7d22 --- /dev/null +++ b/docs/router-versions/versions/20260414-200302.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-200302", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-200458.json b/docs/router-versions/versions/20260414-200458.json new file mode 100644 index 0000000..28f3df2 --- /dev/null +++ b/docs/router-versions/versions/20260414-200458.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-200458", + "weights": { + "clarify": { + 
"input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-200616.json b/docs/router-versions/versions/20260414-200616.json new file mode 100644 index 0000000..6364746 --- /dev/null +++ b/docs/router-versions/versions/20260414-200616.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-200616", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-200738.json b/docs/router-versions/versions/20260414-200738.json new file mode 100644 index 0000000..c71f1af --- /dev/null +++ b/docs/router-versions/versions/20260414-200738.json @@ -0,0 +1,35 @@ +{ + "version_id": 
"20260414-200738", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-202805.json b/docs/router-versions/versions/20260414-202805.json new file mode 100644 index 0000000..1f11f79 --- /dev/null +++ b/docs/router-versions/versions/20260414-202805.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-202805", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-203008.json b/docs/router-versions/versions/20260414-203008.json new file mode 100644 index 0000000..4d19849 --- /dev/null +++ 
b/docs/router-versions/versions/20260414-203008.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-203008", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-203111.json b/docs/router-versions/versions/20260414-203111.json new file mode 100644 index 0000000..b1107cb --- /dev/null +++ b/docs/router-versions/versions/20260414-203111.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-203111", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-203237.json b/docs/router-versions/versions/20260414-203237.json new file mode 
100644 index 0000000..77d5df4 --- /dev/null +++ b/docs/router-versions/versions/20260414-203237.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-203237", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-203328.json b/docs/router-versions/versions/20260414-203328.json new file mode 100644 index 0000000..7aae422 --- /dev/null +++ b/docs/router-versions/versions/20260414-203328.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-203328", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-203401.json 
b/docs/router-versions/versions/20260414-203401.json new file mode 100644 index 0000000..cf9911d --- /dev/null +++ b/docs/router-versions/versions/20260414-203401.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-203401", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-205435.json b/docs/router-versions/versions/20260414-205435.json new file mode 100644 index 0000000..6c6b636 --- /dev/null +++ b/docs/router-versions/versions/20260414-205435.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-205435", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git 
a/docs/router-versions/versions/20260414-205607.json b/docs/router-versions/versions/20260414-205607.json new file mode 100644 index 0000000..15dbbda --- /dev/null +++ b/docs/router-versions/versions/20260414-205607.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-205607", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-205611.json b/docs/router-versions/versions/20260414-205611.json new file mode 100644 index 0000000..066f12a --- /dev/null +++ b/docs/router-versions/versions/20260414-205611.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-205611", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + 
} +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-205703.json b/docs/router-versions/versions/20260414-205703.json new file mode 100644 index 0000000..06983c9 --- /dev/null +++ b/docs/router-versions/versions/20260414-205703.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-205703", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-205728.json b/docs/router-versions/versions/20260414-205728.json new file mode 100644 index 0000000..8477afd --- /dev/null +++ b/docs/router-versions/versions/20260414-205728.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-205728", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 
0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-205805.json b/docs/router-versions/versions/20260414-205805.json new file mode 100644 index 0000000..71603c3 --- /dev/null +++ b/docs/router-versions/versions/20260414-205805.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-205805", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-211836.json b/docs/router-versions/versions/20260414-211836.json new file mode 100644 index 0000000..dc012fc --- /dev/null +++ b/docs/router-versions/versions/20260414-211836.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-211836", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + 
"latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-212115.json b/docs/router-versions/versions/20260414-212115.json new file mode 100644 index 0000000..85d6659 --- /dev/null +++ b/docs/router-versions/versions/20260414-212115.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-212115", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-212202.json b/docs/router-versions/versions/20260414-212202.json new file mode 100644 index 0000000..96564a3 --- /dev/null +++ b/docs/router-versions/versions/20260414-212202.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-212202", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + 
"reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-212214.json b/docs/router-versions/versions/20260414-212214.json new file mode 100644 index 0000000..6cacf77 --- /dev/null +++ b/docs/router-versions/versions/20260414-212214.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-212214", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-214245.json b/docs/router-versions/versions/20260414-214245.json new file mode 100644 index 0000000..013cf37 --- /dev/null +++ b/docs/router-versions/versions/20260414-214245.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-214245", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": 
"online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-214448.json b/docs/router-versions/versions/20260414-214448.json new file mode 100644 index 0000000..5e2da46 --- /dev/null +++ b/docs/router-versions/versions/20260414-214448.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-214448", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220543.json b/docs/router-versions/versions/20260414-220543.json new file mode 100644 index 0000000..bf4b3a8 --- /dev/null +++ b/docs/router-versions/versions/20260414-220543.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220543", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + 
"top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220544.json b/docs/router-versions/versions/20260414-220544.json new file mode 100644 index 0000000..3192191 --- /dev/null +++ b/docs/router-versions/versions/20260414-220544.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220544", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220559.json b/docs/router-versions/versions/20260414-220559.json new file mode 100644 index 0000000..0d6d8ec --- /dev/null +++ b/docs/router-versions/versions/20260414-220559.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220559", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + 
"top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220819.json b/docs/router-versions/versions/20260414-220819.json new file mode 100644 index 0000000..dcf6ec2 --- /dev/null +++ b/docs/router-versions/versions/20260414-220819.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220819", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220857.json b/docs/router-versions/versions/20260414-220857.json new file mode 100644 index 0000000..8837b83 --- /dev/null +++ b/docs/router-versions/versions/20260414-220857.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220857", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", 
+ "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220938.json b/docs/router-versions/versions/20260414-220938.json new file mode 100644 index 0000000..a830b0f --- /dev/null +++ b/docs/router-versions/versions/20260414-220938.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220938", + "weights": { + "clarify": { + "input_length": 6.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-220939.json b/docs/router-versions/versions/20260414-220939.json new file mode 100644 index 0000000..ba0f02f --- /dev/null +++ b/docs/router-versions/versions/20260414-220939.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-220939", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + 
"input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-221015.json b/docs/router-versions/versions/20260414-221015.json new file mode 100644 index 0000000..5696f93 --- /dev/null +++ b/docs/router-versions/versions/20260414-221015.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-221015", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260414-221023.json b/docs/router-versions/versions/20260414-221023.json new file mode 100644 index 0000000..d586cf7 --- /dev/null +++ b/docs/router-versions/versions/20260414-221023.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260414-221023", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + 
"top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-012153.json b/docs/router-versions/versions/20260415-012153.json new file mode 100644 index 0000000..e6d47c1 --- /dev/null +++ b/docs/router-versions/versions/20260415-012153.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-012153", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-012533.json b/docs/router-versions/versions/20260415-012533.json new file mode 100644 index 0000000..83e02d6 --- /dev/null +++ b/docs/router-versions/versions/20260415-012533.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-012533", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + 
"top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-012918.json b/docs/router-versions/versions/20260415-012918.json new file mode 100644 index 0000000..abe5867 --- /dev/null +++ b/docs/router-versions/versions/20260415-012918.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-012918", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-013334.json b/docs/router-versions/versions/20260415-013334.json new file mode 100644 index 0000000..14a4658 --- /dev/null +++ b/docs/router-versions/versions/20260415-013334.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-013334", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + 
"top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-013636.json b/docs/router-versions/versions/20260415-013636.json new file mode 100644 index 0000000..2e75c9c --- /dev/null +++ b/docs/router-versions/versions/20260415-013636.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-013636", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-014152.json b/docs/router-versions/versions/20260415-014152.json new file mode 100644 index 0000000..bf7c6f4 --- /dev/null +++ b/docs/router-versions/versions/20260415-014152.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-014152", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + 
"top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-015732.json b/docs/router-versions/versions/20260415-015732.json new file mode 100644 index 0000000..5005a12 --- /dev/null +++ b/docs/router-versions/versions/20260415-015732.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-015732", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, + "skill_count": 1.0, + "tool_count": 1.0, + "top_memory_confidence": 0.9500000000000001, + "top_skill_success_rate": 0.8999999999999998, + "top_tool_confidence": 0.9500000000000001, + "top_tool_risk": 0.0 + } + }, + "feature_keys": [ + "input_length", + "memory_count", + "skill_count", + "tool_count", + "top_memory_confidence", + "top_skill_success_rate", + "top_tool_confidence", + "top_tool_risk" + ], + "metadata": { + "source": "online_learning", + "benchmark_summary": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + } +} \ No newline at end of file diff --git a/docs/router-versions/versions/20260415-023117.json b/docs/router-versions/versions/20260415-023117.json new file mode 100644 index 0000000..fceb022 --- /dev/null +++ b/docs/router-versions/versions/20260415-023117.json @@ -0,0 +1,35 @@ +{ + "version_id": "20260415-023117", + "weights": { + "clarify": { + "input_length": 11.0, + "memory_count": 1.0, 
+      "skill_count": 1.0,
+      "tool_count": 1.0,
+      "top_memory_confidence": 0.9500000000000001,
+      "top_skill_success_rate": 0.8999999999999998,
+      "top_tool_confidence": 0.9500000000000001,
+      "top_tool_risk": 0.0
+    }
+  },
+  "feature_keys": [
+    "input_length",
+    "memory_count",
+    "skill_count",
+    "tool_count",
+    "top_memory_confidence",
+    "top_skill_success_rate",
+    "top_tool_confidence",
+    "top_tool_risk"
+  ],
+  "metadata": {
+    "source": "online_learning",
+    "benchmark_summary": {
+      "reward_delta": 0.0,
+      "error_rate_delta": 0.0,
+      "latency_delta_ms": 0.0,
+      "baseline_avg_reward": 0.44,
+      "challenger_avg_reward": 0.44
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/router-versions/versions/20260415-023347.json b/docs/router-versions/versions/20260415-023347.json
new file mode 100644
index 0000000..cf75f78
--- /dev/null
+++ b/docs/router-versions/versions/20260415-023347.json
@@ -0,0 +1,35 @@
+{
+  "version_id": "20260415-023347",
+  "weights": {
+    "clarify": {
+      "input_length": 11.0,
+      "memory_count": 1.0,
+      "skill_count": 1.0,
+      "tool_count": 1.0,
+      "top_memory_confidence": 0.9500000000000001,
+      "top_skill_success_rate": 0.8999999999999998,
+      "top_tool_confidence": 0.9500000000000001,
+      "top_tool_risk": 0.0
+    }
+  },
+  "feature_keys": [
+    "input_length",
+    "memory_count",
+    "skill_count",
+    "tool_count",
+    "top_memory_confidence",
+    "top_skill_success_rate",
+    "top_tool_confidence",
+    "top_tool_risk"
+  ],
+  "metadata": {
+    "source": "online_learning",
+    "benchmark_summary": {
+      "reward_delta": 0.0,
+      "error_rate_delta": 0.0,
+      "latency_delta_ms": 0.0,
+      "baseline_avg_reward": 0.44,
+      "challenger_avg_reward": 0.44
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/schemas/candidate_object.schema.json b/docs/schemas/candidate_object.schema.json
new file mode 100644
index 0000000..908d35f
--- /dev/null
+++ b/docs/schemas/candidate_object.schema.json
@@ -0,0 +1,86 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://memabra.local/schemas/candidate_object.schema.json",
+  "title": "CandidateObject",
+  "description": "Unified retrieval/routing candidate for memory, skill, or tool objects in memabra.",
+  "type": "object",
+  "additionalProperties": false,
+  "required": [
+    "id",
+    "type",
+    "title",
+    "summary",
+    "triggers",
+    "cost",
+    "confidence",
+    "success_rate",
+    "freshness",
+    "risk",
+    "tags",
+    "source"
+  ],
+  "properties": {
+    "id": {
+      "type": "string",
+      "minLength": 1
+    },
+    "type": {
+      "type": "string",
+      "enum": ["memory", "skill", "tool"]
+    },
+    "title": {
+      "type": "string",
+      "minLength": 1
+    },
+    "summary": {
+      "type": "string",
+      "minLength": 1
+    },
+    "triggers": {
+      "type": "array",
+      "items": {"type": "string"},
+      "default": []
+    },
+    "cost": {
+      "type": "number",
+      "minimum": 0
+    },
+    "confidence": {
+      "type": "number",
+      "minimum": 0,
+      "maximum": 1
+    },
+    "success_rate": {
+      "type": "number",
+      "minimum": 0,
+      "maximum": 1
+    },
+    "freshness": {
+      "type": "number",
+      "minimum": 0,
+      "maximum": 1
+    },
+    "risk": {
+      "type": "number",
+      "minimum": 0,
+      "maximum": 1
+    },
+    "embedding_ref": {
+      "type": ["string", "null"]
+    },
+    "tags": {
+      "type": "array",
+      "items": {"type": "string"},
+      "default": []
+    },
+    "source": {
+      "type": "string",
+      "enum": ["user", "system", "generated", "external"]
+    },
+    "type_payload": {
+      "type": "object",
+      "description": "Type-specific metadata retained without collapsing semantic boundaries.",
+      "default": {}
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/schemas/event.schema.json b/docs/schemas/event.schema.json
new file mode 100644
index 0000000..8450951
--- /dev/null
+++ b/docs/schemas/event.schema.json
@@ -0,0 +1,57 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://memabra.local/schemas/event.schema.json",
+  "title": "Event",
+  "description": "Atomic event emitted during retrieval, routing, execution, and evaluation in memabra.",
+  "type": "object",
+  "additionalProperties": false,
+  "required": [
+    "event_id",
+    "trajectory_id",
+    "timestamp",
+    "stage",
+    "event_type",
+    "payload"
+  ],
+  "properties": {
+    "event_id": {"type": "string", "minLength": 1},
+    "trajectory_id": {"type": "string", "minLength": 1},
+    "timestamp": {"type": "string", "format": "date-time"},
+    "stage": {
+      "type": "string",
+      "enum": ["retrieval", "policy", "execution", "evaluation", "memory_writeback"]
+    },
+    "event_type": {
+      "type": "string",
+      "enum": [
+        "task_received",
+        "context_summarized",
+        "candidates_recalled",
+        "candidate_scored",
+        "action_selected",
+        "tool_called",
+        "tool_result",
+        "skill_loaded",
+        "memory_injected",
+        "user_clarified",
+        "user_corrected",
+        "reward_computed",
+        "memory_written",
+        "memory_revoked",
+        "task_completed",
+        "task_failed"
+      ]
+    },
+    "payload": {
+      "type": "object",
+      "description": "Event-specific structured body"
+    },
+    "metrics": {
+      "type": "object",
+      "default": {}
+    },
+    "parent_event_id": {
+      "type": ["string", "null"]
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/schemas/memory_record.schema.json b/docs/schemas/memory_record.schema.json
new file mode 100644
index 0000000..ec402c7
--- /dev/null
+++ b/docs/schemas/memory_record.schema.json
@@ -0,0 +1,75 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://memabra.local/schemas/memory_record.schema.json",
+  "title": "MemoryRecord",
+  "description": "Long-term memory record stored by memabra with explicit layer typing and verification metadata.",
+  "type": "object",
+  "additionalProperties": false,
+  "required": [
+    "id",
+    "memory_type",
+    "fact_status",
+    "content",
+    "summary",
+    "source",
+    "confidence",
+    "created_at",
+    "updated_at",
+    "verification"
+  ],
+  "properties": {
+    "id": {"type": "string", "minLength": 1},
+    "memory_type": {
+      "type": "string",
+      "enum": ["semantic", "procedural", "episodic", "working"]
+    },
+    "fact_status": {
+      "type": "string",
+      "enum": ["draft", "assumed", "verified", "deprecated", "revoked"]
+    },
+    "content": {"type": "string", "minLength": 1},
+    "summary": {"type": "string", "minLength": 1},
+    "source": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": ["kind", "ref"],
+      "properties": {
+        "kind": {"type": "string", "enum": ["user", "session", "tool", "import", "system"]},
+        "ref": {"type": "string", "minLength": 1}
+      }
+    },
+    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
+    "tags": {
+      "type": "array",
+      "items": {"type": "string"},
+      "default": []
+    },
+    "related_entities": {
+      "type": "array",
+      "items": {"type": "string"},
+      "default": []
+    },
+    "created_at": {"type": "string", "format": "date-time"},
+    "updated_at": {"type": "string", "format": "date-time"},
+    "last_used_at": {"type": ["string", "null"], "format": "date-time"},
+    "expires_at": {"type": ["string", "null"], "format": "date-time"},
+    "verification": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": ["status", "last_checked_at", "check_method"],
+      "properties": {
+        "status": {"type": "string", "enum": ["unknown", "pending", "confirmed", "disputed", "failed"]},
+        "last_checked_at": {"type": ["string", "null"], "format": "date-time"},
+        "check_method": {"type": ["string", "null"]}
+      }
+    },
+    "revocation": {
+      "type": ["object", "null"],
+      "additionalProperties": false,
+      "properties": {
+        "reason": {"type": "string"},
+        "revoked_at": {"type": "string", "format": "date-time"}
+      }
+    }
+  }
+}
\ No newline at end of file
diff --git a/docs/schemas/trajectory.schema.json b/docs/schemas/trajectory.schema.json
new file mode 100644
index 0000000..61d2218
--- /dev/null
+++ b/docs/schemas/trajectory.schema.json
@@ -0,0 +1,121 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://memabra.local/schemas/trajectory.schema.json",
+  "title": "Trajectory",
+  "description": "Replayable task-level trace for routing and learning in memabra.",
+  "type": "object",
+  "additionalProperties":
false, + "required": [ + "trajectory_id", + "task", + "context_snapshot", + "candidate_sets", + "decisions", + "events", + "outcome", + "reward" + ], + "properties": { + "trajectory_id": {"type": "string", "minLength": 1}, + "task": { + "type": "object", + "additionalProperties": false, + "required": ["task_id", "input", "channel", "created_at"], + "properties": { + "task_id": {"type": "string", "minLength": 1}, + "input": {"type": "string", "minLength": 1}, + "channel": {"type": "string", "minLength": 1}, + "created_at": {"type": "string", "format": "date-time"}, + "user_id": {"type": ["string", "null"]} + } + }, + "context_snapshot": { + "type": "object", + "additionalProperties": false, + "required": ["conversation_summary", "environment_summary"], + "properties": { + "conversation_summary": {"type": "string"}, + "environment_summary": {"type": "string"}, + "recent_failures": { + "type": "array", + "items": {"type": "string"}, + "default": [] + } + } + }, + "candidate_sets": { + "type": "object", + "additionalProperties": false, + "required": ["memory", "skill", "tool"], + "properties": { + "memory": {"type": "array", "items": {"$ref": "candidate_object.schema.json"}}, + "skill": {"type": "array", "items": {"$ref": "candidate_object.schema.json"}}, + "tool": {"type": "array", "items": {"$ref": "candidate_object.schema.json"}} + } + }, + "decisions": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": false, + "required": ["step", "decision_type", "selected_ids", "rationale"], + "properties": { + "step": {"type": "integer", "minimum": 1}, + "decision_type": { + "type": "string", + "enum": ["direct_answer", "inject_memory", "load_skill", "call_tool", "clarify", "composite_action"] + }, + "selected_ids": { + "type": "array", + "items": {"type": "string"} + }, + "rejected_ids": { + "type": "array", + "items": {"type": "string"}, + "default": [] + }, + "rationale": {"type": "string"}, + "estimated_cost": {"type": ["number", "null" ]} + } + 
} + }, + "events": { + "type": "array", + "items": {"$ref": "event.schema.json"} + }, + "outcome": { + "type": "object", + "additionalProperties": false, + "required": ["status", "steps", "latency_ms", "user_corrections"], + "properties": { + "status": {"type": "string", "enum": ["success", "partial_success", "failure"]}, + "steps": {"type": "integer", "minimum": 0}, + "latency_ms": {"type": "integer", "minimum": 0}, + "user_corrections": {"type": "integer", "minimum": 0}, + "tool_errors": {"type": "integer", "minimum": 0}, + "notes": {"type": ["string", "null"]} + } + }, + "reward": { + "type": "object", + "additionalProperties": false, + "required": ["total", "components"], + "properties": { + "total": {"type": "number"}, + "components": { + "type": "object", + "required": ["task_success", "retrieval_hit", "tool_error", "user_correction", "latency", "context_cost", "useful_reuse"], + "properties": { + "task_success": {"type": "number"}, + "retrieval_hit": {"type": "number"}, + "tool_error": {"type": "number"}, + "user_correction": {"type": "number"}, + "latency": {"type": "number"}, + "context_cost": {"type": "number"}, + "useful_reuse": {"type": "number"} + } + } + } + } + } +} \ No newline at end of file diff --git a/docs/training-reports/report-0036bcfb-88dc-4636-897e-89fc909a810e.json b/docs/training-reports/report-0036bcfb-88dc-4636-897e-89fc909a810e.json new file mode 100644 index 0000000..b2865d7 --- /dev/null +++ b/docs/training-reports/report-0036bcfb-88dc-4636-897e-89fc909a810e.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-0036bcfb-88dc-4636-897e-89fc909a810e", + "timestamp": "2026-04-14T16:51:38.314846+00:00", + "source_trajectory_ids": [ + "traj-22d17281-9e5c-435d-852e-fa646d15afc4", + "traj-29a77a54-36ed-4885-b77f-ffc131425d2c", + "traj-40bce4b3-20ba-47ab-ac8d-4f3c494bffd1", + "traj-6ce2c5e5-6d58-439a-82ec-21f77f6de860", + "traj-76480a70-fbe1-4481-848b-a7e8d37643f5", + "traj-9a588dc5-9ef2-4290-8712-0b31946536a2", + 
"traj-b43b4a4e-4dfb-4ba9-8c56-29ea09e00e17", + "traj-ba03c72c-b782-400f-a9b1-4a4f6c0d7769", + "traj-be3bf833-bc49-4852-9ea2-ca04aeea8f31", + "traj-ebafcf74-923e-4af1-b64d-45c7cdbb4b04" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-009a9d41-ba23-4e38-85ad-cd6af5971d8b.json b/docs/training-reports/report-009a9d41-ba23-4e38-85ad-cd6af5971d8b.json new file mode 100644 index 0000000..aab2739 --- /dev/null +++ b/docs/training-reports/report-009a9d41-ba23-4e38-85ad-cd6af5971d8b.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-009a9d41-ba23-4e38-85ad-cd6af5971d8b", + "timestamp": "2026-04-14T19:41:33.462482+00:00", + "source_trajectory_ids": [ + "traj-0e089eaf-e132-405d-992f-a912f6baaaea", + "traj-2881966c-ad32-44c8-9c05-a50b0a2b784c", + "traj-5b351fac-7019-4807-a18a-c66b1c95c3e0", + "traj-6ef4ff84-d199-4864-8c99-6cd9efded1c6", + "traj-a62f2760-76ec-41f9-a20a-3ba8912c7c55", + "traj-b50b6662-ff12-4ec6-a112-c56e989bd768", + "traj-c1683dc5-e3d0-4421-aad7-fa42581096b2", + "traj-c3c9bd98-8c59-4cc4-8ad9-6d7b3c0be987", + "traj-d9a4fcc7-e929-48e6-8153-5d5c9c04f798", + "traj-e357c149-301f-4826-8812-6a1dab9087bd" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": 
false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-025f0317-eb57-4357-a944-57c83e768e2b.json b/docs/training-reports/report-025f0317-eb57-4357-a944-57c83e768e2b.json new file mode 100644 index 0000000..d970059 --- /dev/null +++ b/docs/training-reports/report-025f0317-eb57-4357-a944-57c83e768e2b.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-025f0317-eb57-4357-a944-57c83e768e2b", + "timestamp": "2026-04-14T20:54:35.912785+00:00", + "source_trajectory_ids": [ + "traj-0a386589-4f3d-4427-8bb6-984395bc391e", + "traj-0f04b540-f8bb-46d7-aeb4-ea65a723b82e", + "traj-14e30ab1-29e9-4356-ae7e-a8cea48c0b60", + "traj-4143c1db-ac63-4bc9-b427-a8f4d64c63f8", + "traj-43de5dee-3e20-42cf-91c1-2371b2f31329", + "traj-55766bd5-37dc-4216-9e29-3aea0a8a5095", + "traj-745e3299-1fd3-4af8-b6e2-4ebc4a47d389", + "traj-7c938b98-8346-48f8-a676-adb2e72e7259", + "traj-7dd2e59b-f65f-4870-b09e-69b95438b57b", + "traj-ab956b24-6aaa-49a2-8841-544cf9555959" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-0335fde2-290a-4346-91b0-d1224cb1253f.json 
b/docs/training-reports/report-0335fde2-290a-4346-91b0-d1224cb1253f.json new file mode 100644 index 0000000..1cb03da --- /dev/null +++ b/docs/training-reports/report-0335fde2-290a-4346-91b0-d1224cb1253f.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-0335fde2-290a-4346-91b0-d1224cb1253f", + "timestamp": "2026-04-14T15:26:25.556320+00:00", + "source_trajectory_ids": [ + "traj-0ccf1900-1e3b-4465-8f02-c51d07d7934c", + "traj-22a75db4-1794-4b10-ba4f-61539ae28352", + "traj-2ec475f3-4500-4c56-b317-ddb692e6eae5", + "traj-35007253-de45-43f6-a64c-121230ae0e1f", + "traj-3d3548d3-1981-46ad-be73-33b0420e58f4", + "traj-4d5bb70e-9529-4c2c-bb5b-da7f7d09f1f4", + "traj-5a663b45-d37f-489f-a403-6dd73d7b2b52", + "traj-bdbf6fab-cccd-4381-b3dc-ee7533b5be0e", + "traj-c4ea76f2-4403-430f-8821-91f14822e41f", + "traj-f2bf1402-39da-4ec2-97ed-b9349ca87581" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-152625" +} \ No newline at end of file diff --git a/docs/training-reports/report-04b8cf41-45f2-4870-ba8b-b509f7d3da48.json b/docs/training-reports/report-04b8cf41-45f2-4870-ba8b-b509f7d3da48.json new file mode 100644 index 0000000..326dc24 --- /dev/null +++ b/docs/training-reports/report-04b8cf41-45f2-4870-ba8b-b509f7d3da48.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-04b8cf41-45f2-4870-ba8b-b509f7d3da48", + "timestamp": "2026-04-14T18:58:37.161636+00:00", + "source_trajectory_ids": [ + "traj-10fd1aac-8da8-4f5d-be73-feea5fb4e60d", + "traj-4b7226f5-e3ed-47de-b0bb-febcad399f82", + "traj-6364e000-05f1-4de2-b018-090d2dd922bf", + 
"traj-6ea75734-5be4-4d8c-b5c2-88d971a12763", + "traj-7e17a2ac-0aaf-49a8-aed5-552ce80dcfc8", + "traj-92c21045-ff0f-4ad0-855f-307a9f509ef7", + "traj-b8cefeb5-17ae-4be4-a756-a4c9c453d3c2", + "traj-c6a2dba6-dd4f-4c9f-9455-4fb3db4d44b1", + "traj-dcac6477-8278-43ab-8efe-226cc8acdeaf", + "traj-eec22ce4-682b-4694-bdb8-657a84c4a76c" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-07a477c9-2b2f-4505-a392-5dce58b67829.json b/docs/training-reports/report-07a477c9-2b2f-4505-a392-5dce58b67829.json new file mode 100644 index 0000000..dc34954 --- /dev/null +++ b/docs/training-reports/report-07a477c9-2b2f-4505-a392-5dce58b67829.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-07a477c9-2b2f-4505-a392-5dce58b67829", + "timestamp": "2026-04-14T18:01:06.160145+00:00", + "source_trajectory_ids": [ + "traj-04f74afc-d341-4f63-b5ab-32f6d0fb33fb", + "traj-1fce1f44-c31b-4143-a0af-05b14783299c", + "traj-5fd71ba8-a8ed-4c52-bd5c-3dc0196b954a", + "traj-66dbaed9-42ee-4736-bed9-2a7d8260b81e", + "traj-89ceb3cf-0bfa-477b-b80c-76392bc7e9db", + "traj-8a7a589b-422d-4c51-b209-1f8a28bbe624", + "traj-b885ff21-6df2-4ea2-a39e-bb47a5aca56e", + "traj-df5f18eb-825e-4c80-b047-f9798bbeb654", + "traj-f47da721-1886-4466-b389-32ef359b58e6", + "traj-f673ef5f-700a-4dc8-b5ce-0ae2d3ebeeab" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + 
"error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180106" +} \ No newline at end of file diff --git a/docs/training-reports/report-0856c8c4-bc0a-402d-8e4c-2e946029226b.json b/docs/training-reports/report-0856c8c4-bc0a-402d-8e4c-2e946029226b.json new file mode 100644 index 0000000..ee955bc --- /dev/null +++ b/docs/training-reports/report-0856c8c4-bc0a-402d-8e4c-2e946029226b.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-0856c8c4-bc0a-402d-8e4c-2e946029226b", + "timestamp": "2026-04-14T21:22:02.802049+00:00", + "source_trajectory_ids": [ + "traj-29cd218f-e9b2-487d-ab34-620450a27cf7", + "traj-41331e52-1bb0-48c7-a65e-2749f3341018", + "traj-42b96e93-b37e-4518-84ce-90b243a4a9e2", + "traj-42c90394-8ac2-4a7d-8c12-4b4a78ab7a87", + "traj-47a0fae8-60f3-4a1a-90a1-0f643e2d9920", + "traj-7deb603d-b31e-4625-abaf-344ec12efe44", + "traj-dc35bca8-1bca-442a-93a1-4d77e360aba0", + "traj-e135ebd2-c850-4f9b-a6df-24b1d7eff190", + "traj-e87835bb-ba03-453e-8a50-49ddeeb7268d", + "traj-f1557075-9c9a-4f2a-bc48-6a919a379ae0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212202", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file 
diff --git a/docs/training-reports/report-08ef866e-b477-4e72-a32c-30003d2b91e9.json b/docs/training-reports/report-08ef866e-b477-4e72-a32c-30003d2b91e9.json new file mode 100644 index 0000000..965e996 --- /dev/null +++ b/docs/training-reports/report-08ef866e-b477-4e72-a32c-30003d2b91e9.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-08ef866e-b477-4e72-a32c-30003d2b91e9", + "timestamp": "2026-04-14T20:54:35.812564+00:00", + "source_trajectory_ids": [ + "traj-0c9390f7-31ef-48fa-896b-093f9cd4c0ce", + "traj-1822f88a-0a09-4536-8022-24a7a73ba6df", + "traj-28ba821c-6a0c-4d40-a008-14497585c3d7", + "traj-483b03b2-41ee-4228-9d26-bb4e45eb241c", + "traj-73718bd7-97ad-424e-a049-e1ecc05ad770", + "traj-b4089209-fcf2-4139-9ed1-e9db5caaff69", + "traj-c097a470-093d-4dd1-a6c0-d21110fea346", + "traj-fc0408dc-5429-4af8-87d0-8c6212ae2623", + "traj-fd8f8a32-d9cb-4bda-a70b-bd95e604e037", + "traj-fe7cd38e-3e5d-4ae3-9ee0-6a2c0caafb2b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205435", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-09798c98-3bcb-4298-a546-2f531f875853.json b/docs/training-reports/report-09798c98-3bcb-4298-a546-2f531f875853.json new file mode 100644 index 0000000..121baa6 --- /dev/null +++ b/docs/training-reports/report-09798c98-3bcb-4298-a546-2f531f875853.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-09798c98-3bcb-4298-a546-2f531f875853", + "timestamp": "2026-04-14T22:10:23.222959+00:00", + "source_trajectory_ids": [ + 
"traj-221f0c59-ad6b-4526-ae14-b5bb558b01ca", + "traj-24cbe596-6ee7-444a-b600-32d9b55422db", + "traj-2c5a6c34-df10-411d-9709-2a2e07cfca5e", + "traj-58e6dcd6-c688-4a66-a1a8-2fc64b06452a", + "traj-6a713624-ac97-42db-9946-9919da454d47", + "traj-813e4f86-4aab-420b-86fc-8a8694670c84", + "traj-a4c8fee4-428b-471c-8634-05d09c430b32", + "traj-caa40a45-47ec-4369-9519-ebdb038d5d6a", + "traj-d0d72631-39d8-47d7-83a5-76f424553eca", + "traj-d5dde094-d5e7-465d-8d7d-0f55356ae159" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221023", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-097c5767-cf9b-42f5-9f8b-dee8a6224a67.json b/docs/training-reports/report-097c5767-cf9b-42f5-9f8b-dee8a6224a67.json new file mode 100644 index 0000000..dce1d39 --- /dev/null +++ b/docs/training-reports/report-097c5767-cf9b-42f5-9f8b-dee8a6224a67.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-097c5767-cf9b-42f5-9f8b-dee8a6224a67", + "timestamp": "2026-04-14T21:42:45.782035+00:00", + "source_trajectory_ids": [ + "traj-1212bad2-f0fe-4d95-afd8-2a711775ccfb", + "traj-4d53da5b-5a10-4689-8dd9-0a1b2fa74083", + "traj-50942176-422b-4653-9477-48e1d16c0d34", + "traj-6d886402-bc74-4c9e-998e-ad9e4177b08d", + "traj-7d91fc48-dd1d-4c40-9785-5ebef05378a4", + "traj-8c012adb-959e-4eab-ac6c-3c5c4854720c", + "traj-8df099f2-5180-4ac1-8519-3204b9cffe07", + "traj-9c21877b-f093-44cc-af53-9b2961a4dd46", + "traj-ff0df310-0704-4619-9346-b27f1df0f237", + 
"traj-ffb39e69-6327-4985-8858-730a3c00a806" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214245", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-09ac51b7-988c-4b9c-ba38-3511d728c61d.json b/docs/training-reports/report-09ac51b7-988c-4b9c-ba38-3511d728c61d.json new file mode 100644 index 0000000..f4050d3 --- /dev/null +++ b/docs/training-reports/report-09ac51b7-988c-4b9c-ba38-3511d728c61d.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-09ac51b7-988c-4b9c-ba38-3511d728c61d", + "timestamp": "2026-04-14T22:10:23.371730+00:00", + "source_trajectory_ids": [ + "traj-00565435-0e10-46c5-82bd-3ba97f356fb2", + "traj-0249861e-b2f3-4e73-8e38-0599d0b7e8f0", + "traj-283bbc3b-5f88-48de-87b8-192418f70445", + "traj-4c674844-5148-4428-8607-22ae4ad7361d", + "traj-5689e1e1-0a03-45e8-b006-c692325fcc45", + "traj-a809b624-6317-4ce7-b809-4ab3479566ee", + "traj-b1e3a397-8678-436c-a9ed-17ea168c203a", + "traj-b87b739b-6daa-49a8-a7de-8d4509659328", + "traj-bc0076e9-0167-44ac-89ae-634b02890cb5", + "traj-d42ef1c3-cd8f-4474-a5d6-f0a42ff0a2f3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, 
+ "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-0a3e40be-b389-4041-bab7-cd99e4c8eac0.json b/docs/training-reports/report-0a3e40be-b389-4041-bab7-cd99e4c8eac0.json new file mode 100644 index 0000000..e27d86a --- /dev/null +++ b/docs/training-reports/report-0a3e40be-b389-4041-bab7-cd99e4c8eac0.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-0a3e40be-b389-4041-bab7-cd99e4c8eac0", + "timestamp": "2026-04-14T20:07:38.841838+00:00", + "source_trajectory_ids": [ + "traj-04eb60db-62bd-46c2-afe3-ecba6eac900a", + "traj-26387046-6b12-4841-9129-735599f13261", + "traj-34b22e88-a95e-4f12-84e5-da9af52a7381", + "traj-4f4eb7ad-1d11-4852-adde-eab50619c2bc", + "traj-51ecc36d-be08-4bcf-b645-7553e9b03992", + "traj-6b9f4f38-dc89-4abd-870e-c48c92d2b40e", + "traj-82fb2e11-fb35-4960-90d2-b2e53a1ea2ed", + "traj-ba5fa9da-693f-4a36-ab0f-c2efbe798ece", + "traj-c9c3403c-2ff3-4d84-85dd-731620583118", + "traj-ede4f925-e445-4cae-a3ba-0d30973294ae" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-0a675757-4870-4b12-98fb-ab093889eff3.json b/docs/training-reports/report-0a675757-4870-4b12-98fb-ab093889eff3.json new file mode 100644 index 
0000000..e9f69f4 --- /dev/null +++ b/docs/training-reports/report-0a675757-4870-4b12-98fb-ab093889eff3.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-0a675757-4870-4b12-98fb-ab093889eff3", + "timestamp": "2026-04-15T01:57:32.814873+00:00", + "source_trajectory_ids": [ + "traj-1119ae3b-8cb6-4391-b283-bfeefdd12afe", + "traj-11afd403-d5d6-4af6-85b8-b015ed5bb1d3", + "traj-5c68a94c-d276-4e0f-9356-b99b6163e4e5", + "traj-5cdcaf3b-851f-45da-9fe1-253060428059", + "traj-5fd13613-7463-4969-ab53-c2e7e8555df3", + "traj-617478e4-408a-4ed6-a06d-b84da7be94b1", + "traj-a3a5c39f-592b-4b4b-94e9-59f63039e53e", + "traj-b5ffa504-3f59-4a75-8c10-1f4f5b5aa8c8", + "traj-c4e73e5c-5b4d-4f4c-a209-3f6147263622", + "traj-cccb6c09-84cd-4247-aa40-9ec6e0a9f1bd" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-0b64fe15-dd10-4f78-916b-200ec6483fcd.json b/docs/training-reports/report-0b64fe15-dd10-4f78-916b-200ec6483fcd.json new file mode 100644 index 0000000..d0baa1b --- /dev/null +++ b/docs/training-reports/report-0b64fe15-dd10-4f78-916b-200ec6483fcd.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-0b64fe15-dd10-4f78-916b-200ec6483fcd", + "timestamp": "2026-04-14T14:59:48.944796+00:00", + "source_trajectory_ids": [ + "traj-05a12459-be9e-484e-9a58-83b465e24092", + "traj-0924a001-1055-4126-b241-bcdd2c078494", + "traj-17ae339d-c886-414e-94b3-5e570093c8e4", + 
"traj-18e10e43-4694-43e1-be9a-f16fdf123e35", + "traj-5460af6d-b2c1-4a71-aaff-0060c05a4421", + "traj-5517734f-c4fe-497e-a402-aa5228395d34", + "traj-59b1a050-9be6-4aca-bbfb-0e1da246da2d", + "traj-7094a080-3592-4e33-9ba3-f32fdbe02e76", + "traj-8f62ae27-0a61-43bf-a8e6-7b73e9a1c888", + "traj-967b670c-429a-4be2-a8c2-ec341ff3106e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-0c59209d-fc75-4b15-bcdb-239138c12b79.json b/docs/training-reports/report-0c59209d-fc75-4b15-bcdb-239138c12b79.json new file mode 100644 index 0000000..411c092 --- /dev/null +++ b/docs/training-reports/report-0c59209d-fc75-4b15-bcdb-239138c12b79.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-0c59209d-fc75-4b15-bcdb-239138c12b79", + "timestamp": "2026-04-15T01:41:52.505512+00:00", + "source_trajectory_ids": [ + "traj-0ddf19dd-e828-4035-bd6e-29e627769d2e", + "traj-1c185db1-61db-4f0c-b069-8e8e4ced92e7", + "traj-23b5c08b-2fca-4f55-84a6-7068af698780", + "traj-3f243fab-5841-41e6-acf6-f8c9f40cf515", + "traj-713b43da-4b0f-4fbe-b190-9b508d1244f0", + "traj-9764ffe9-c580-4b3f-88a3-beead04a1df3", + "traj-a6e4d148-6744-4fda-a1bf-26603166117c", + "traj-cc7aa6c3-de72-4bfd-83fc-761eaa8cc8a7", + "traj-d12e9387-0fd2-4a4f-b387-3cd58cbf12f4", + "traj-fdb7911a-4cc0-4906-b5bc-658be058653e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-014152", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-0cb5da05-94e7-4f55-b759-1338cebaf5fd.json b/docs/training-reports/report-0cb5da05-94e7-4f55-b759-1338cebaf5fd.json new file mode 100644 index 0000000..1edd189 --- /dev/null +++ b/docs/training-reports/report-0cb5da05-94e7-4f55-b759-1338cebaf5fd.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-0cb5da05-94e7-4f55-b759-1338cebaf5fd", + "timestamp": "2026-04-14T21:22:02.988940+00:00", + "source_trajectory_ids": [ + "traj-2704df3d-419c-4206-af3c-afd5466b305c", + "traj-49134729-36f2-467c-83ac-da261acc561b", + "traj-7adf97fd-549f-4123-9340-5b49f024f6d7", + "traj-852b116d-998b-48fd-aa71-293f8b31c6e4", + "traj-92a1293d-5d98-4efb-88ba-125ca308d246", + "traj-ab7f2171-8a7c-4315-a80f-fb88168b794e", + "traj-c3788641-46cf-43fe-bb92-4b3871f1b20e", + "traj-cf3e5d99-8b30-4877-a93d-481968801eaf", + "traj-e051963a-bada-41f6-9fc5-4ba429d136c9", + "traj-f5d94b37-3215-4a5f-9528-8b13dd9f4ceb" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212202", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of 
file diff --git a/docs/training-reports/report-0e0b0f65-2073-445d-8b24-753642e15b88.json b/docs/training-reports/report-0e0b0f65-2073-445d-8b24-753642e15b88.json new file mode 100644 index 0000000..2b74bb3 --- /dev/null +++ b/docs/training-reports/report-0e0b0f65-2073-445d-8b24-753642e15b88.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-0e0b0f65-2073-445d-8b24-753642e15b88", + "timestamp": "2026-04-15T02:31:17.525516+00:00", + "source_trajectory_ids": [ + "traj-22f50846-2268-4d3f-94ab-cef4813aa471", + "traj-34c1633d-c680-49ae-81de-f9c4942f3d1f", + "traj-3fb919b1-e148-4764-9197-aea2c313ec6a", + "traj-4dd3e06b-8c8d-41c2-a06a-006f024b868a", + "traj-6b505d4e-b7da-4f0d-81a5-1c37f89ca93e", + "traj-84925606-3ef8-47bc-8f58-97a22083b6ad", + "traj-d532be68-7050-4ec1-bf21-06085d8894f9", + "traj-e6eec076-e362-4dc1-8444-af9f6bb659b2", + "traj-f273ccb2-a6d6-40dd-836f-e6835a7aa55c", + "traj-ff6a2137-d4ab-406a-b43a-d2741f6dd91b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023117", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-10a2f403-eca3-4ec5-ac0e-8e907322679d.json b/docs/training-reports/report-10a2f403-eca3-4ec5-ac0e-8e907322679d.json new file mode 100644 index 0000000..04bf5bf --- /dev/null +++ b/docs/training-reports/report-10a2f403-eca3-4ec5-ac0e-8e907322679d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-10a2f403-eca3-4ec5-ac0e-8e907322679d", + "timestamp": "2026-04-14T18:58:37.085590+00:00", + "source_trajectory_ids": [ + 
"traj-6aa4c009-836a-4013-887a-07ed5b767a2f", + "traj-6c9e26f0-ed6d-4bcb-aa25-e79789688ccb", + "traj-8898f73e-1ce5-4770-9966-359bef9958ae", + "traj-9055a563-4353-4975-a970-5ef46a472d45", + "traj-91d1f2db-f516-4fc9-8a74-8bc6ebe0be47", + "traj-b5b5fd0f-a745-48aa-a6b9-8c50451a7b07", + "traj-d3f8003f-39fd-4eb7-a68a-41d137da964a", + "traj-dfd75ad1-1e73-45d5-8dd0-aad7e860fdcc", + "traj-e178b3b0-8872-4f89-9d25-4e55d3a7aaf2", + "traj-e7bafd52-4193-4bdd-9fc1-d46658003751" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185837", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-13b9b11b-b1cd-4d70-9fcc-a972cbd54805.json b/docs/training-reports/report-13b9b11b-b1cd-4d70-9fcc-a972cbd54805.json new file mode 100644 index 0000000..a0f6a6d --- /dev/null +++ b/docs/training-reports/report-13b9b11b-b1cd-4d70-9fcc-a972cbd54805.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-13b9b11b-b1cd-4d70-9fcc-a972cbd54805", + "timestamp": "2026-04-15T01:21:53.876489+00:00", + "source_trajectory_ids": [ + "traj-4095188b-6d84-4ee1-a3fc-a4a147b2e983", + "traj-4f2d3ac1-211c-417a-9836-87126bd0aa35", + "traj-5aa5c2f9-8ac1-4031-a12c-e6c4e8d0ece0", + "traj-60f0860e-9a83-4826-966d-40cf15d4fcb9", + "traj-74fc1f64-33b4-443d-9c1d-a4a0e60f7ff0", + "traj-8bc691d7-e5bc-4b7b-8621-8401f36a5f4d", + "traj-8d7af32a-19d7-4ff9-8f6d-a2337280cc4c", + "traj-b22e4a02-c74d-4e10-91f6-8ac4b7a5c2ea", + "traj-f833d392-3250-4caa-8acb-90fc49d3b3c1", + 
"traj-f9702b75-4b65-4f83-8ef1-c2594c87db8a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012153", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-13f1b744-a87c-48f0-b024-d48396ae1c25.json b/docs/training-reports/report-13f1b744-a87c-48f0-b024-d48396ae1c25.json new file mode 100644 index 0000000..564db83 --- /dev/null +++ b/docs/training-reports/report-13f1b744-a87c-48f0-b024-d48396ae1c25.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-13f1b744-a87c-48f0-b024-d48396ae1c25", + "timestamp": "2026-04-14T20:06:16.345373+00:00", + "source_trajectory_ids": [ + "traj-06da564b-232b-4496-ae9d-81306a08cc7b", + "traj-6b557f04-4b89-4628-8d4f-acb8d5b060df", + "traj-7e9f9a58-1594-44aa-9712-215e130a7dd6", + "traj-8765d4ea-b4c4-45dd-8830-df92cc3f3aba", + "traj-ac14447b-6afb-4d39-bd92-8172d4f50c8e", + "traj-b70ae420-ea46-47d8-8640-f0b21e659a81", + "traj-bb4b6108-ade8-421c-96f1-f35c36677029", + "traj-c81f1a4e-7182-4560-91c7-86bdc4ccfa03", + "traj-d6c3266e-8d3e-4e42-bf5b-8522ca351241", + "traj-eca10f07-c2e4-42bd-9ec7-21c3fba82752" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, 
+ "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200616", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-15c39b58-4792-486c-88c2-8fa95f34f0e7.json b/docs/training-reports/report-15c39b58-4792-486c-88c2-8fa95f34f0e7.json new file mode 100644 index 0000000..f78b045 --- /dev/null +++ b/docs/training-reports/report-15c39b58-4792-486c-88c2-8fa95f34f0e7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-15c39b58-4792-486c-88c2-8fa95f34f0e7", + "timestamp": "2026-04-14T21:18:36.157196+00:00", + "source_trajectory_ids": [ + "traj-0d2380fb-a3a9-4c3b-bb01-4390198f0e60", + "traj-1147c420-d6bd-45a6-9071-a38f96205f7b", + "traj-436265cb-1612-4ef2-94e5-311619f97900", + "traj-4672cbe1-bc67-4378-b517-e4f0c23395c7", + "traj-4cd057fa-bca1-4172-805b-cf0aac1191ce", + "traj-7fae7ddb-e46d-411c-ae1c-b17ed44159e6", + "traj-beac32c5-da6d-48e2-aab1-140041c46a80", + "traj-c08cf9ff-2fee-4071-bb39-91955125de74", + "traj-c32944a0-6f3d-4fc4-9cb7-f8a5b581445b", + "traj-d59a5c84-eea1-44b3-b182-9f0b4119b448" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-211836", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-16240412-414a-48fb-a5de-244647601b99.json b/docs/training-reports/report-16240412-414a-48fb-a5de-244647601b99.json new file mode 100644 index 0000000..4111fcd --- /dev/null +++ 
b/docs/training-reports/report-16240412-414a-48fb-a5de-244647601b99.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-16240412-414a-48fb-a5de-244647601b99", + "timestamp": "2026-04-14T21:44:48.199092+00:00", + "source_trajectory_ids": [ + "traj-3d852c8f-73dd-454f-a35b-4d22d5dd187e", + "traj-5d8dd2a4-854f-41dd-9f14-6a08657fc60e", + "traj-64125a25-a99f-42c4-9d76-1b68c45809a4", + "traj-6fe3e14c-b6cd-4a04-9fe3-4cfd011f880b", + "traj-94ddd7b7-7a70-45d6-986c-9a22512fb6b8", + "traj-afbbdc73-f4a9-4d42-a8d5-9f41464b0e20", + "traj-bca371df-ce7e-427c-acd3-83c712cb11db", + "traj-c4849366-5ee8-4442-80cb-fb6207f59d48", + "traj-c6a1ed31-c1d7-4147-a035-fef03423d0b6", + "traj-e9a5cbf2-4716-462c-8f00-5e75da4636a4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214448", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-197593f2-1928-428f-a143-d59574a1070f.json b/docs/training-reports/report-197593f2-1928-428f-a143-d59574a1070f.json new file mode 100644 index 0000000..b3002f7 --- /dev/null +++ b/docs/training-reports/report-197593f2-1928-428f-a143-d59574a1070f.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-197593f2-1928-428f-a143-d59574a1070f", + "timestamp": "2026-04-15T01:41:52.446867+00:00", + "source_trajectory_ids": [ + "traj-032f9293-f5f9-4f2d-8724-4df72b6e2def", + "traj-1c8cbbb2-d4b0-427b-8009-57141775873f", + "traj-28f049b4-aac9-40d6-8e65-4f8d612fe1cb", + "traj-29337650-4610-4468-adf5-39cd2a095750", + 
"traj-568702f9-d5ce-440d-8056-faf70ca7492a", + "traj-76b8ddce-0f28-4fc1-8191-82a313b854e7", + "traj-d1ede161-80e7-4d3d-b09e-e65972bbbc61", + "traj-d3528eb9-0934-476e-9291-a0b616686308", + "traj-e4743774-7609-4749-bdc7-bdfe31107cd3", + "traj-eac8b576-7a44-472f-9553-b68773ac4bda" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-1a6693a9-d9aa-4fb7-9c37-8eca70db8ff2.json b/docs/training-reports/report-1a6693a9-d9aa-4fb7-9c37-8eca70db8ff2.json new file mode 100644 index 0000000..722be19 --- /dev/null +++ b/docs/training-reports/report-1a6693a9-d9aa-4fb7-9c37-8eca70db8ff2.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-1a6693a9-d9aa-4fb7-9c37-8eca70db8ff2", + "timestamp": "2026-04-14T20:56:07.953209+00:00", + "source_trajectory_ids": [ + "traj-1d87ed37-a6de-437b-b1f7-655e1465ae99", + "traj-59bfcf1f-5462-419c-8571-56960b954a7a", + "traj-71a16960-4bab-4e3a-8187-aff9a04774f4", + "traj-89b30f5d-96f7-4748-b44a-f03efa183c0c", + "traj-a67da73d-9e4b-4082-9f02-09c1b04a30c7", + "traj-a82d0c43-6d5d-4c2c-b4b1-9ba06a1f8433", + "traj-adb2c5dd-78a0-4a5b-98a7-1d78f8e7e680", + "traj-b5f04bf0-caf1-4523-9b30-5a094185428c", + "traj-d67e548f-de22-4a83-ba4a-9b737dbeefa0", + "traj-f840a2e0-d314-4dcc-9c02-262a92a093e8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205607", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-1b339815-f279-409b-ab77-c5c5c31744f7.json b/docs/training-reports/report-1b339815-f279-409b-ab77-c5c5c31744f7.json new file mode 100644 index 0000000..9138079 --- /dev/null +++ b/docs/training-reports/report-1b339815-f279-409b-ab77-c5c5c31744f7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-1b339815-f279-409b-ab77-c5c5c31744f7", + "timestamp": "2026-04-14T20:56:11.542333+00:00", + "source_trajectory_ids": [ + "traj-02aee60f-4b45-4ee1-9341-c60be647ff1b", + "traj-0a7673fb-d561-44f6-9cb8-aa87122018a3", + "traj-441ae78a-e0c9-408e-87e4-421c9a96fc5e", + "traj-4d2d110d-4309-4728-b072-b1785e6df45a", + "traj-629ca24d-3b2f-4c8f-8be4-0b1f8bc21df7", + "traj-64f19728-5662-43f3-84da-9476b25db403", + "traj-963f8b19-2650-428c-9135-1500fd1d7ded", + "traj-c3aed0f9-1919-49a3-ab90-84d89f423d33", + "traj-e9567bfd-bec6-4836-8f23-341447fd7a9c", + "traj-ea6e5ff5-936f-4aa1-839b-2199ec7f925b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205611", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of 
file diff --git a/docs/training-reports/report-1d45c6bd-7847-46d4-a3aa-953bcdce24ec.json b/docs/training-reports/report-1d45c6bd-7847-46d4-a3aa-953bcdce24ec.json new file mode 100644 index 0000000..8d0d923 --- /dev/null +++ b/docs/training-reports/report-1d45c6bd-7847-46d4-a3aa-953bcdce24ec.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-1d45c6bd-7847-46d4-a3aa-953bcdce24ec", + "timestamp": "2026-04-14T21:18:36.139079+00:00", + "source_trajectory_ids": [ + "traj-200f901f-aecf-43e4-a9cd-f3ebfed82ed0", + "traj-233795c8-8d47-4a3b-86f7-9d2af40b89cb", + "traj-42523abe-2654-46d6-8cce-4154ab093cf2", + "traj-49763bb6-634e-4fa0-a23b-0f537e97262d", + "traj-593e6516-7ec2-4226-83fb-65a4a5274616", + "traj-617282fd-7c85-4583-a59f-54315bbf9e40", + "traj-7f6074d8-ea0d-4d48-94f3-4a67d0ee92a5", + "traj-8a3543c1-3007-4271-9862-06ae3202f039", + "traj-9121275f-b96c-4cc5-a45d-1c50532c6409", + "traj-c73e41e8-483d-435a-bd6e-e868a445bd30" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-211836", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-1e0bf809-418e-4720-ab59-b8d7401ce94c.json b/docs/training-reports/report-1e0bf809-418e-4720-ab59-b8d7401ce94c.json new file mode 100644 index 0000000..b22d99c --- /dev/null +++ b/docs/training-reports/report-1e0bf809-418e-4720-ab59-b8d7401ce94c.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-1e0bf809-418e-4720-ab59-b8d7401ce94c", + "timestamp": "2026-04-14T20:30:08.746811+00:00", + 
"source_trajectory_ids": [ + "traj-09d6c0c8-8c5b-4264-aaf4-78b6ec7689b2", + "traj-11b5e179-af57-4df4-a07e-6263f6e82ddd", + "traj-12d9e29a-03cb-4242-91ee-de30aacb0e50", + "traj-41580331-af54-47ed-9aab-2fab2fc8c3a0", + "traj-707b126a-6164-475f-81a3-4a34fe624639", + "traj-903e62d9-1478-44de-8348-4e08531a9178", + "traj-9ebcf874-21b8-453f-817f-f7038907608c", + "traj-9fc3b26a-7ba9-4d9a-a732-17db84494c48", + "traj-a16724af-e0aa-4aa3-9615-c8c3b14173a7", + "traj-b1235cd4-6b9f-4b86-bca3-39f48ee4c1ea" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-1e1d679b-3b0d-4cb9-b474-8302992df5ba.json b/docs/training-reports/report-1e1d679b-3b0d-4cb9-b474-8302992df5ba.json new file mode 100644 index 0000000..a6696da --- /dev/null +++ b/docs/training-reports/report-1e1d679b-3b0d-4cb9-b474-8302992df5ba.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-1e1d679b-3b0d-4cb9-b474-8302992df5ba", + "timestamp": "2026-04-15T01:25:33.772160+00:00", + "source_trajectory_ids": [ + "traj-0091cdde-9035-4995-9b30-ba3e52a4e74b", + "traj-21d17e25-de55-46e8-b31f-0e6d6a045351", + "traj-52d9d18b-f37f-4a42-bcc8-ce7e28277942", + "traj-5f2a03a7-33d9-48f9-b1cb-bb5ac6e1f21c", + "traj-86cb57eb-74c0-4ea5-bce7-3aa1690b9599", + "traj-87e89d6e-3ef0-4ee4-ab62-c0002b4d1b22", + "traj-a7dd2fb7-6756-430e-9964-dfca0f3a6981", + "traj-aa624e2b-6012-406a-8542-2ffed00096bc", + 
"traj-ef61014c-791a-4535-a78f-7ab715a7c3bb", + "traj-fbab542f-9c5f-4b90-889e-f2d253862441" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012533", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-20611e96-6b00-4a0d-9be5-5d5e968b3371.json b/docs/training-reports/report-20611e96-6b00-4a0d-9be5-5d5e968b3371.json new file mode 100644 index 0000000..8d92afa --- /dev/null +++ b/docs/training-reports/report-20611e96-6b00-4a0d-9be5-5d5e968b3371.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-20611e96-6b00-4a0d-9be5-5d5e968b3371", + "timestamp": "2026-04-14T21:22:14.567603+00:00", + "source_trajectory_ids": [ + "traj-25fd7e3b-5c72-4b15-a847-70a35bc85f1f", + "traj-2c2d6197-5e0e-4ca8-a762-b5aca1b4d486", + "traj-36a27b22-b36b-4912-8dbe-72dd0138c06f", + "traj-590f4b56-7a29-4080-80ba-23bfc984a935", + "traj-7298713f-34a4-48bc-8e41-f0b0d6de8778", + "traj-82acfa6b-76d4-4a01-a9e5-3f989a3f2684", + "traj-8ab1b3d9-a6fa-4bf3-8d5b-5298ac85afb2", + "traj-98cfa826-215f-40a9-9fb9-7d8b41640295", + "traj-cd585938-d8b0-4773-8fe7-655cba23cbec", + "traj-ec26bcd5-3463-4a25-9ebc-b046e66020cf" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + 
"error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212214", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-20ea6367-8e42-4f6a-b90d-60eb441aa9f8.json b/docs/training-reports/report-20ea6367-8e42-4f6a-b90d-60eb441aa9f8.json new file mode 100644 index 0000000..7893291 --- /dev/null +++ b/docs/training-reports/report-20ea6367-8e42-4f6a-b90d-60eb441aa9f8.json @@ -0,0 +1,42 @@ +{ + "report_id": "report-20ea6367-8e42-4f6a-b90d-60eb441aa9f8", + "timestamp": "2026-04-14T18:31:07.921735+00:00", + "source_trajectory_ids": [ + "traj-253bd144-3ad8-4dcc-951e-535f7fa444c6", + "traj-2ba9feaf-ffd1-41f8-b492-365351133a96", + "traj-35c97cf2-e5d0-4d9c-be42-d46b14f8afa7", + "traj-47b3674d-73db-41dd-b8db-472f91f864a0", + "traj-4d744d56-15bd-4e54-ba65-29f6135dda22", + "traj-4e3fcf21-0020-4585-b277-9c8a03081c06", + "traj-655aea4f-bc1d-47c3-85ac-73c8bb63d7e4", + "traj-66d29b59-21be-474b-a6a8-9d7a0e36b8bb", + "traj-8c7ed834-ada2-44f2-9a67-e7066d0bafe6", + "traj-bd4fdcd8-1577-41a3-a081-a738b17bb9c1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-183107", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-21a16654-d936-4393-88d7-9e0e00d98fec.json b/docs/training-reports/report-21a16654-d936-4393-88d7-9e0e00d98fec.json new file mode 100644 index 0000000..d528483 --- /dev/null +++ 
b/docs/training-reports/report-21a16654-d936-4393-88d7-9e0e00d98fec.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-21a16654-d936-4393-88d7-9e0e00d98fec", + "timestamp": "2026-04-14T16:49:44.845808+00:00", + "source_trajectory_ids": [ + "traj-2999497a-dba8-4215-8f44-f7371fb4c18d", + "traj-4603b0e1-ef1e-4f44-a5bf-7994eeb97fd2", + "traj-49f42054-4065-41c6-8d70-e89801df29dc", + "traj-53b375aa-ba3f-4518-973c-6c8c1b704fd1", + "traj-68124be2-5cc4-4c52-b891-fc5cb253b3ea", + "traj-828c6c7b-72ed-44ee-8628-f1bee3080ce1", + "traj-a4516393-6015-4029-910a-15955a283aec", + "traj-ba4984ba-48a3-43ac-8726-d73db56f5a5e", + "traj-ccb9f3ca-26e2-4efb-9b1a-b55a913b55cd", + "traj-f9aa9adc-232e-4f92-9161-23165cb9dca4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-164944" +} \ No newline at end of file diff --git a/docs/training-reports/report-21e68879-c123-49e0-8af9-9f8e9dc76ecf.json b/docs/training-reports/report-21e68879-c123-49e0-8af9-9f8e9dc76ecf.json new file mode 100644 index 0000000..7ee5219 --- /dev/null +++ b/docs/training-reports/report-21e68879-c123-49e0-8af9-9f8e9dc76ecf.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-21e68879-c123-49e0-8af9-9f8e9dc76ecf", + "timestamp": "2026-04-14T18:58:37.102905+00:00", + "source_trajectory_ids": [ + "traj-1f23a1c3-4ba3-412b-9df5-61bda0396bc8", + "traj-24c1c82b-1fac-4365-a643-faa65082b8d8", + "traj-5f848d9c-294e-4586-9951-30a03588cc26", + "traj-8bdac473-266e-454b-a2d6-267f6189850e", + "traj-b05d7433-97be-4604-bcee-a18cb1102a80", + "traj-c0a59283-e7e0-41fe-925d-79b4e076b9f6", 
+ "traj-cd1ef62e-83f9-4b1b-99b7-ff1e1586afb9", + "traj-e2f37ecf-a075-4b1b-ad12-9c3e7be77fc7", + "traj-e3bb7c13-fe7c-462c-9f88-d7f40092669c", + "traj-e8bc452a-b3c7-4448-a994-51cc81b30730" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185837", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-22b88101-5979-4011-b85d-c3bb3e1f84ae.json b/docs/training-reports/report-22b88101-5979-4011-b85d-c3bb3e1f84ae.json new file mode 100644 index 0000000..cb6f0bf --- /dev/null +++ b/docs/training-reports/report-22b88101-5979-4011-b85d-c3bb3e1f84ae.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-22b88101-5979-4011-b85d-c3bb3e1f84ae", + "timestamp": "2026-04-14T15:25:06.028393+00:00", + "source_trajectory_ids": [ + "traj-263efcd1-4b24-4302-a9cf-5c5778297ac2", + "traj-29855290-bd18-44cb-b1d2-da2bd3eff5b3", + "traj-5d04fe16-cac2-4f5d-b469-eb4ff0c3e66a", + "traj-67d1127e-70c1-4f86-93e2-8d0e7e6df433", + "traj-8e66108b-fa8c-426e-99e9-f7df432b7436", + "traj-a0df3bda-d35b-4c76-b68e-2c9dbb47f6f2", + "traj-bc36ba31-9c1f-4902-9f79-acf7653d0e86", + "traj-c62fe68a-4ad5-403a-b747-2648ae56392b", + "traj-c74f4755-6cf3-4f1e-950e-672a946a7b4e", + "traj-ff0f3558-e0cd-49e0-974b-1b5bd9cb5af1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-2493e5d6-4be2-49a6-8e84-6f3fda442ff5.json b/docs/training-reports/report-2493e5d6-4be2-49a6-8e84-6f3fda442ff5.json new file mode 100644 index 0000000..b99f3b7 --- /dev/null +++ b/docs/training-reports/report-2493e5d6-4be2-49a6-8e84-6f3fda442ff5.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-2493e5d6-4be2-49a6-8e84-6f3fda442ff5", + "timestamp": "2026-04-14T16:54:50.564245+00:00", + "source_trajectory_ids": [ + "traj-0ef5e19a-7c30-45a4-979f-67d46413ee95", + "traj-11483ef1-e410-47c5-a265-53bff1968182", + "traj-59cbb530-0ae7-41e6-a033-250431a20bb8", + "traj-671f19f7-cd71-4539-83ea-5441807561c9", + "traj-6c693d47-22b0-42e1-a76f-9ba625d79a70", + "traj-97c4d19b-9aff-4f15-b1e3-c82c8da598e0", + "traj-cd8780de-7bc7-4735-acdd-e66ed407619f", + "traj-e4b2af56-8718-41c4-bd7f-6c479b1fb7f3", + "traj-e63b675d-9bba-41bf-a472-a068cf2437fd", + "traj-f10cfe93-7fe1-4c84-bfba-2cb3c3892a9e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-24f69dfe-4955-4c6a-8421-5c1bdd0bdfda.json 
b/docs/training-reports/report-24f69dfe-4955-4c6a-8421-5c1bdd0bdfda.json new file mode 100644 index 0000000..7eb5e32 --- /dev/null +++ b/docs/training-reports/report-24f69dfe-4955-4c6a-8421-5c1bdd0bdfda.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-24f69dfe-4955-4c6a-8421-5c1bdd0bdfda", + "timestamp": "2026-04-14T21:42:45.897635+00:00", + "source_trajectory_ids": [ + "traj-11726363-9ef7-47e4-9e77-4e2b1fbefbd3", + "traj-28ecbc8b-0dd9-4e4e-a4d0-cfe262c4a812", + "traj-3abe1edd-f76a-4040-a471-0ad8535ff553", + "traj-3ed7731b-def2-459e-948d-a45cd595c4de", + "traj-4620d7ac-94ad-4310-8dfe-3f1c9124ceb9", + "traj-5a041387-76c4-483c-8df9-a6ee410a3264", + "traj-6f340ce0-d606-4890-b647-ad57360f8566", + "traj-82ebfe0c-08de-4c00-9a8e-8f293191c97a", + "traj-a5442e4e-57fb-4af6-941f-90c25a3862dc", + "traj-c2c5e763-e809-4264-af82-610fbe7c5fd1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-2522e075-f011-4379-989c-f413d768a957.json b/docs/training-reports/report-2522e075-f011-4379-989c-f413d768a957.json new file mode 100644 index 0000000..db4db2b --- /dev/null +++ b/docs/training-reports/report-2522e075-f011-4379-989c-f413d768a957.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-2522e075-f011-4379-989c-f413d768a957", + "timestamp": "2026-04-14T15:52:51.189860+00:00", + "source_trajectory_ids": [ + "traj-0c96d155-112e-4658-bf90-3b35da7e7c2f", + 
"traj-22a09d6a-b9ee-4ff4-b324-a14cb8f33a91", + "traj-36fcb806-8f3f-44cc-99a8-1b4f6170a18e", + "traj-42da2c6e-15c2-4c40-92f3-14b5eef5e681", + "traj-6b083428-c47a-4d83-80dc-4db2c66887d7", + "traj-a9c06cd6-3332-41b9-bb21-3afc06e6f701", + "traj-b45b09b4-c348-4215-9cba-4adbe8a76410", + "traj-c7b2b172-8223-423c-abb4-79bfeb1cbe94", + "traj-cd806afc-6846-4361-9692-dff3469717a8", + "traj-d0622a9d-4e59-44be-8231-2b27a25d47ac" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-263a0d30-9096-4e1c-a406-43927ac46d80.json b/docs/training-reports/report-263a0d30-9096-4e1c-a406-43927ac46d80.json new file mode 100644 index 0000000..c69ebd3 --- /dev/null +++ b/docs/training-reports/report-263a0d30-9096-4e1c-a406-43927ac46d80.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-263a0d30-9096-4e1c-a406-43927ac46d80", + "timestamp": "2026-04-14T20:28:05.472990+00:00", + "source_trajectory_ids": [ + "traj-02ffe167-5287-4a72-8e9e-623baef314d8", + "traj-58518476-0941-437f-86d7-80f000a35ae7", + "traj-73a97752-0bb5-4f24-92ec-8f3cec52ed4f", + "traj-77750db3-022e-4dfc-a19b-a45e2eb41923", + "traj-a7d6998f-0355-408e-acde-7f84033a7712", + "traj-b73c6762-b738-4156-bf01-38661089bd01", + "traj-d90086be-591d-4c4a-a220-6e35a125cc62", + "traj-dcada7b1-e74e-42ee-b117-aea8d121247b", + "traj-e5b873ad-8e62-4b91-a714-16eca70dbae3", + "traj-f63e8157-40e6-480a-a2d2-bd7a257636dc" + ], + "sample_count": 10, + "baseline_metrics": { + 
"task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-202805", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-26782aa8-a2a0-45f9-8ac3-861fc1364431.json b/docs/training-reports/report-26782aa8-a2a0-45f9-8ac3-861fc1364431.json new file mode 100644 index 0000000..13a980f --- /dev/null +++ b/docs/training-reports/report-26782aa8-a2a0-45f9-8ac3-861fc1364431.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-26782aa8-a2a0-45f9-8ac3-861fc1364431", + "timestamp": "2026-04-15T01:57:32.697736+00:00", + "source_trajectory_ids": [ + "traj-0269da6e-275b-4eae-8a43-1b89a74a87c2", + "traj-28052977-cc23-4c06-9343-dce8b4ca5ee3", + "traj-3f7b8039-6d47-4ce4-9c3b-6d80d4548825", + "traj-4e37cd2f-c830-4eba-84d9-87cffe9bcec3", + "traj-74071894-0764-410c-8460-cebb98b80fa4", + "traj-8c547ed6-61ba-4c6e-a8e2-990264cf77b9", + "traj-a65f8feb-1368-43c0-a3b9-97f6a3420741", + "traj-b80a3d53-4ffe-4147-b8d6-b619fd951f58", + "traj-bb4444ec-7bd6-42e3-b9e1-4a680d368d50", + "traj-c5cea58f-f981-4f73-8f3a-e35d7f3befb7" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": 
"20260415-015732", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-279b3b5c-bf69-4d8a-9be7-372086c295c9.json b/docs/training-reports/report-279b3b5c-bf69-4d8a-9be7-372086c295c9.json new file mode 100644 index 0000000..8122792 --- /dev/null +++ b/docs/training-reports/report-279b3b5c-bf69-4d8a-9be7-372086c295c9.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-279b3b5c-bf69-4d8a-9be7-372086c295c9", + "timestamp": "2026-04-14T20:06:16.411538+00:00", + "source_trajectory_ids": [ + "traj-2382eb44-3957-434e-a171-04e6ecd5a0ce", + "traj-28e66c25-3314-43ce-8a7f-911d8943ea11", + "traj-3b62b30e-9c8f-433b-bf40-16820db431aa", + "traj-8b3b0d30-f9bb-4323-9403-23970be3a4e6", + "traj-8b84c7fe-b827-4b35-8f2e-6538fbc684fb", + "traj-9cc1eef1-d47d-481f-851e-5913609c8740", + "traj-a960cca8-bcc9-4332-9523-3170ef7c5355", + "traj-b34c9f41-87f6-40f8-8d3b-0878f4d61911", + "traj-b4ba32fb-2bee-41b6-a46e-0a094079f40c", + "traj-f5dd1000-03ed-4b5a-b00f-e6437ba56426" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-2957a4db-25be-4a31-be96-bb53b60b0574.json b/docs/training-reports/report-2957a4db-25be-4a31-be96-bb53b60b0574.json new file mode 100644 index 0000000..2d74de6 --- /dev/null +++ b/docs/training-reports/report-2957a4db-25be-4a31-be96-bb53b60b0574.json @@ -0,0 +1,43 @@ +{ + "report_id": 
"report-2957a4db-25be-4a31-be96-bb53b60b0574", + "timestamp": "2026-04-15T01:29:18.124253+00:00", + "source_trajectory_ids": [ + "traj-24875022-ace1-4d16-b802-9e19c0345039", + "traj-2adcad42-6922-4c27-b879-75aac21c94ba", + "traj-35512ca0-78e3-4d99-8efa-056013aefbbd", + "traj-60c4586a-c6ba-421c-939c-04a3c497ee3b", + "traj-64a60af0-4d68-4370-a4a0-6f5ebdc4b7ad", + "traj-8ef62528-7f24-495a-8c51-7936b30c02ec", + "traj-c52966b8-939c-493d-b528-716ca6e0c4e5", + "traj-d31381b8-70b3-44fa-9daa-053e9d517b8f", + "traj-e7852591-231f-4f0f-8032-fb872fa5e220", + "traj-eeb9b89c-6edf-4354-aafb-ce7fe0212dab" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012918", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-2b3e115b-fae8-4813-b53d-1c8501010bf6.json b/docs/training-reports/report-2b3e115b-fae8-4813-b53d-1c8501010bf6.json new file mode 100644 index 0000000..075f40a --- /dev/null +++ b/docs/training-reports/report-2b3e115b-fae8-4813-b53d-1c8501010bf6.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-2b3e115b-fae8-4813-b53d-1c8501010bf6", + "timestamp": "2026-04-14T15:53:50.791993+00:00", + "source_trajectory_ids": [ + "traj-03447345-9e58-46c4-9ba0-8db3c0e720ee", + "traj-1b3c8854-3738-444e-b1ee-2ca9d728b580", + "traj-23d09f97-9f5e-4d4f-9e69-f70e66e665ce", + "traj-606fe284-63f2-40b0-88bd-fcb9a3b27738", + "traj-68c91a21-306c-462f-a9e6-f5b9b149de8e", + "traj-6ee3651f-9ccd-4d20-afc6-ac40d1a8dd9f", + "traj-7a5f901f-d5b4-41bb-a790-a9a82c521bee", + 
"traj-b725122c-6c00-40ae-acb0-c2ad58eaf075", + "traj-c160f961-cd03-404c-9ffa-037d1e196e9f", + "traj-ed0113b3-3c2a-4d16-b31f-8c1fcde61291" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-2bbfec49-07f0-43c8-9c0d-f9dbc33b8b53.json b/docs/training-reports/report-2bbfec49-07f0-43c8-9c0d-f9dbc33b8b53.json new file mode 100644 index 0000000..0965831 --- /dev/null +++ b/docs/training-reports/report-2bbfec49-07f0-43c8-9c0d-f9dbc33b8b53.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-2bbfec49-07f0-43c8-9c0d-f9dbc33b8b53", + "timestamp": "2026-04-14T22:10:15.279833+00:00", + "source_trajectory_ids": [ + "traj-0fd602e1-438a-42d6-b684-09db49c96b27", + "traj-354a350a-d325-4233-afe7-387d05eba246", + "traj-4737d84e-ffc5-4187-b37b-c1581c9197c5", + "traj-54880be9-eac7-4a8a-82c2-aee098b966a1", + "traj-5d14290d-8afc-4a98-9df0-44db13f9bc33", + "traj-659b7f39-a95a-485a-aab7-65018cc206ed", + "traj-77720324-df56-4f6f-a732-824e97d9c7fe", + "traj-ed1a3760-846c-4534-9ea7-e6f54a2e1414", + "traj-f6fef155-e568-49f9-a286-c56c6b729c0d", + "traj-f8488008-391b-4fae-9d98-b9a379eca15e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + 
"reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221015", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-2ce71481-bf67-4581-b35e-65b83189c959.json b/docs/training-reports/report-2ce71481-bf67-4581-b35e-65b83189c959.json new file mode 100644 index 0000000..e966380 --- /dev/null +++ b/docs/training-reports/report-2ce71481-bf67-4581-b35e-65b83189c959.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-2ce71481-bf67-4581-b35e-65b83189c959", + "timestamp": "2026-04-14T21:22:14.626005+00:00", + "source_trajectory_ids": [ + "traj-0a77ca1d-28c9-4148-b099-2990f38d701f", + "traj-51afda16-46e8-4fc6-aba5-1a02600624de", + "traj-5586372a-1b39-4028-95ae-31095cf3136d", + "traj-a1333bdd-5b03-45c7-853f-e49c1767031a", + "traj-a6a93535-fce3-4374-9cb6-0321a5a4769f", + "traj-b40704cc-e206-4da6-930c-1d0f1b7234a5", + "traj-b7c19bf6-3da3-451d-ba41-4019ef8e92d5", + "traj-c6f07968-deb4-48eb-a4ef-401339afb5fd", + "traj-d0fd2e96-e15a-4f2e-b51d-34f482772ea1", + "traj-e74aefaa-647c-4b9f-8a6c-e7cb2bbe0780" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212214", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-2e6d5c48-47b9-4bd6-9a9b-117e9d646ccc.json b/docs/training-reports/report-2e6d5c48-47b9-4bd6-9a9b-117e9d646ccc.json new file mode 100644 
index 0000000..87496e3 --- /dev/null +++ b/docs/training-reports/report-2e6d5c48-47b9-4bd6-9a9b-117e9d646ccc.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-2e6d5c48-47b9-4bd6-9a9b-117e9d646ccc", + "timestamp": "2026-04-14T21:18:36.299274+00:00", + "source_trajectory_ids": [ + "traj-06944244-d2e8-4e6f-a49f-da5c792befce", + "traj-2a694d25-5898-4d2d-9bac-bcdc01d2d442", + "traj-4c7f18bb-78aa-4170-99e7-12e5ace54340", + "traj-5d7ed2c2-17b0-4daf-8661-51ec1da1fd60", + "traj-649669bb-b814-4114-9205-a328137d5bf7", + "traj-768ecbc8-30bf-4c05-82e6-c346736eea24", + "traj-7bb5cc17-a8e6-4c0a-b525-e2cd671187a2", + "traj-d23853f5-359e-4c3b-97e7-d239b5d7a152", + "traj-ec11358e-da16-426e-b74a-42f2e95db560", + "traj-f486a57e-e7b2-4a34-a6ed-6bfa836045be" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-211836", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-2f0f6640-8049-4de2-bbb9-71a76fd8be67.json b/docs/training-reports/report-2f0f6640-8049-4de2-bbb9-71a76fd8be67.json new file mode 100644 index 0000000..aa51a58 --- /dev/null +++ b/docs/training-reports/report-2f0f6640-8049-4de2-bbb9-71a76fd8be67.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-2f0f6640-8049-4de2-bbb9-71a76fd8be67", + "timestamp": "2026-04-14T15:29:35.959531+00:00", + "source_trajectory_ids": [ + "traj-13fe5578-4d8a-4781-9338-1a612a1e5a06", + "traj-1ed5b4dc-8872-45f8-81a9-91b46545097b", + "traj-4e0d0d23-c5d7-4211-a6ef-31fb93bd62aa", + "traj-6728130e-69dd-4b4e-beb4-d7c0a898d962", + 
"traj-74c20720-f0c5-45dc-8ee1-b78a38d9b967", + "traj-c3d1a98e-86fc-491c-8ab1-acb7f890f2c9", + "traj-c7a69537-49c5-42b2-8e87-1a05699bbb15", + "traj-dddd03c8-8ec2-4571-8646-2c3dfd9eefe1", + "traj-df54d24d-cfbc-4475-8d79-c77f4c11407b", + "traj-f1e396ed-ac25-4bb1-bb8e-55fc4b405819" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-152935" +} \ No newline at end of file diff --git a/docs/training-reports/report-31835629-98b1-4a08-8a42-e80702dd3ff7.json b/docs/training-reports/report-31835629-98b1-4a08-8a42-e80702dd3ff7.json new file mode 100644 index 0000000..c8189d3 --- /dev/null +++ b/docs/training-reports/report-31835629-98b1-4a08-8a42-e80702dd3ff7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-31835629-98b1-4a08-8a42-e80702dd3ff7", + "timestamp": "2026-04-14T22:08:19.690919+00:00", + "source_trajectory_ids": [ + "traj-062ae1e2-aedc-46a1-a8e7-585f1bfd6968", + "traj-0aaadbdb-15cb-466b-9271-ecec84e0b21f", + "traj-87b00206-b80a-4572-a12b-d441e45f4374", + "traj-93bdfd26-c5b0-43d1-926d-303f3ea7d176", + "traj-95de8601-6f2b-4313-ad1f-dc248c4e6d78", + "traj-a72b7a28-3293-4d78-8a65-411a0ca6aefb", + "traj-af818f60-198b-4b46-9a36-5012b467d867", + "traj-bc6fedae-308b-491e-bbea-419817542e18", + "traj-dba716d0-5bdc-425e-8b6a-043f16e1d9b1", + "traj-e573c252-18ea-4f15-8630-568320f4d3c3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + 
"avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220819", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-319b9b10-5c39-46eb-a905-638876b20b78.json b/docs/training-reports/report-319b9b10-5c39-46eb-a905-638876b20b78.json new file mode 100644 index 0000000..c5a7a60 --- /dev/null +++ b/docs/training-reports/report-319b9b10-5c39-46eb-a905-638876b20b78.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-319b9b10-5c39-46eb-a905-638876b20b78", + "timestamp": "2026-04-14T20:56:11.631566+00:00", + "source_trajectory_ids": [ + "traj-046f3b45-5b6d-4883-801a-4b674ac9a0f6", + "traj-1fc6191a-e9e5-4b9c-9294-ab68a7992506", + "traj-4370c224-46cc-44bf-aa52-c1ae9b9884be", + "traj-520f51ad-e1ea-42c9-b402-edb618a95020", + "traj-579262b0-c64c-41fa-802f-b3800e44d890", + "traj-5e8a36bb-1586-4843-9c4b-8e981c02342a", + "traj-b1cf2655-7694-4656-94f1-0cd6a2f4e195", + "traj-b92d6fd6-6421-42e3-9a1e-d0d5ea7a3ce5", + "traj-cb26f2fe-8169-4616-b124-47594ea88495", + "traj-fd456c78-f35a-47e3-9491-d462f28ea5dd" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-31c3b5b5-84dc-4a81-91c6-d663f0856347.json b/docs/training-reports/report-31c3b5b5-84dc-4a81-91c6-d663f0856347.json new file mode 100644 index 0000000..b11169d --- /dev/null +++ b/docs/training-reports/report-31c3b5b5-84dc-4a81-91c6-d663f0856347.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-31c3b5b5-84dc-4a81-91c6-d663f0856347", + "timestamp": "2026-04-14T20:57:03.309395+00:00", + "source_trajectory_ids": [ + "traj-0f36d4fa-6aa7-4647-8434-939727c2c38b", + "traj-2abf9d11-6e8f-41c7-9554-ff424498b905", + "traj-43b8c849-cff6-4a05-b166-19252a8b4758", + "traj-78074052-0bf6-466c-bdfe-6cffd970494c", + "traj-7bda4651-8a18-49f1-957e-9163f264035b", + "traj-8c8c5944-1d6f-44e1-a6e8-119d8fd904d7", + "traj-91f84675-8772-44f6-8916-61f9277a9af7", + "traj-cf026710-8f03-45da-8327-9f5e5a671c24", + "traj-e11c2598-a12f-43cf-b547-98a44c373c30", + "traj-fa2448c9-d2da-43d2-802b-bb2585e5c8d4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205703", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-32ee1906-6846-46b2-99b6-0df8aa632f18.json b/docs/training-reports/report-32ee1906-6846-46b2-99b6-0df8aa632f18.json new file mode 100644 index 0000000..674098b --- /dev/null +++ b/docs/training-reports/report-32ee1906-6846-46b2-99b6-0df8aa632f18.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-32ee1906-6846-46b2-99b6-0df8aa632f18", + "timestamp": "2026-04-14T21:42:45.766701+00:00", + "source_trajectory_ids": [ + 
"traj-1e14ac8f-dac5-4cab-a3c6-5b4fa97fb779", + "traj-58dec6f6-4712-4747-b189-9428512d8069", + "traj-59664fbf-ce9a-4b4f-a039-2625767e85ae", + "traj-ae3d3da7-773c-48f7-8c7d-35c9f3d6cccf", + "traj-c399734a-4bbc-49f0-ae6f-c8fa23f6a482", + "traj-c3dff94e-43fd-4d04-820c-47c9b7f01dfb", + "traj-ca22da3d-09d6-493a-b338-444343e0b252", + "traj-eacacd75-cd33-40e3-874b-8032ddd42175", + "traj-f61b2282-3674-4a29-867b-0dbf664bd116", + "traj-ffd6b64c-3b24-4b83-b72b-01b083f6e4b8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214245", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-33a0b960-112f-4976-a8db-1b15177f7e8e.json b/docs/training-reports/report-33a0b960-112f-4976-a8db-1b15177f7e8e.json new file mode 100644 index 0000000..0f97b22 --- /dev/null +++ b/docs/training-reports/report-33a0b960-112f-4976-a8db-1b15177f7e8e.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-33a0b960-112f-4976-a8db-1b15177f7e8e", + "timestamp": "2026-04-14T20:32:37.730415+00:00", + "source_trajectory_ids": [ + "traj-23a871e5-2833-48ee-b3c1-5ed94138fcaa", + "traj-3cd1b98f-d6c3-4452-bfbf-71a3cf56415c", + "traj-4e67d82a-e669-4074-8dd9-450c8dd5102a", + "traj-58dd91bb-8f28-45f6-ab96-c61aca61c671", + "traj-627cad0d-6b5d-4631-b0fa-ffd16e1435c9", + "traj-867989fb-734f-441c-8f54-7177f83bb7b9", + "traj-bdc6723e-392d-4fe3-94fb-950af73ecfbe", + "traj-d2a41298-1da9-4fc5-8ba1-4ff15425c2f4", + "traj-d7d6e4c5-b66f-4456-8105-fcdedf467877", + 
"traj-fcd41273-8d85-4d4d-a0aa-5313347fe699" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-33cb91ff-6fc2-4cff-9e7c-7c80d67e9beb.json b/docs/training-reports/report-33cb91ff-6fc2-4cff-9e7c-7c80d67e9beb.json new file mode 100644 index 0000000..5e1a83e --- /dev/null +++ b/docs/training-reports/report-33cb91ff-6fc2-4cff-9e7c-7c80d67e9beb.json @@ -0,0 +1,44 @@ +{ + "report_id": "report-33cb91ff-6fc2-4cff-9e7c-7c80d67e9beb", + "timestamp": "2026-04-14T18:31:07.979291+00:00", + "source_trajectory_ids": [ + "traj-1724a751-6423-4fff-b9c6-ef92845b7297", + "traj-1f6f99c3-21de-4a14-840e-180021048a34", + "traj-2733b029-d10a-4902-922a-644d329a17c1", + "traj-294aadaf-5f60-409d-80f4-c640d8d82abd", + "traj-3282276b-ea3a-40d8-babb-62fcad0fa27d", + "traj-45b4d7c4-367f-4cbc-825f-a112137275ef", + "traj-5035c348-520b-468b-8e12-5de07b0ca885", + "traj-c86af3f9-c441-4267-bd37-b8a0a2d182da", + "traj-ccbb4aaf-7cff-4711-b987-c4e61ec1a4a8", + "traj-cd5f83d9-4ef4-42e4-af07-7c3345c26fe8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + 
"reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-349d27df-93ce-49a1-ad9d-9c2ecdb5f9c1.json b/docs/training-reports/report-349d27df-93ce-49a1-ad9d-9c2ecdb5f9c1.json new file mode 100644 index 0000000..3cf8aa0 --- /dev/null +++ b/docs/training-reports/report-349d27df-93ce-49a1-ad9d-9c2ecdb5f9c1.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-349d27df-93ce-49a1-ad9d-9c2ecdb5f9c1", + "timestamp": "2026-04-14T20:03:02.621111+00:00", + "source_trajectory_ids": [ + "traj-40ac86e5-85ee-4468-92ca-7be25e3e7442", + "traj-4c1bed8a-7db6-46f3-9ae9-1844cbbba837", + "traj-4f8e5d10-ce5d-4249-83d0-63e528af3bcd", + "traj-519e05a4-eec3-44ae-b07e-0c78565e6065", + "traj-5d84864a-c5e9-4674-a0ec-47ccedf609a9", + "traj-8fb37e36-f171-4285-8a37-905f6f7a34d5", + "traj-a0ab9293-99af-4b7b-894c-f0d6dec9fd40", + "traj-beead0f0-adcb-496a-8b73-ffe1bef4ee42", + "traj-d909fed4-7e99-46e0-a770-7d9e629cff7c", + "traj-e2d3db6a-5db2-41e8-8ecc-909f5d433324" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-36954641-b7fb-4fdd-895c-590d3ec6e0b8.json b/docs/training-reports/report-36954641-b7fb-4fdd-895c-590d3ec6e0b8.json new file mode 100644 index 0000000..f12a020 --- 
/dev/null +++ b/docs/training-reports/report-36954641-b7fb-4fdd-895c-590d3ec6e0b8.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-36954641-b7fb-4fdd-895c-590d3ec6e0b8", + "timestamp": "2026-04-14T16:53:59.545841+00:00", + "source_trajectory_ids": [ + "traj-6ba2785a-40a7-4248-9959-81d676a53741", + "traj-6cc684a4-a0f5-4068-bb18-70049e38ad2f", + "traj-750702b7-0344-4ef3-bdb6-8f62486d8788", + "traj-978cf9b1-4066-4c0b-8612-f7432a602153", + "traj-a85c7d2a-aab9-4508-bee1-4da6e793166f", + "traj-ab6fd543-950f-4b00-8a2c-6fbe49ca70d0", + "traj-b40f890b-49cc-4c2e-b16b-35cf68e30ee1", + "traj-de829d20-8677-46d6-a1c5-bd9501eab3ce", + "traj-eb695119-3ef5-4bc3-8c01-1855bab0cd0b", + "traj-f13331d8-e087-4d5a-88a6-bc9f754791a0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165359" +} \ No newline at end of file diff --git a/docs/training-reports/report-36b3533a-655c-4a02-b65e-850d90a1c320.json b/docs/training-reports/report-36b3533a-655c-4a02-b65e-850d90a1c320.json new file mode 100644 index 0000000..9f16e8d --- /dev/null +++ b/docs/training-reports/report-36b3533a-655c-4a02-b65e-850d90a1c320.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-36b3533a-655c-4a02-b65e-850d90a1c320", + "timestamp": "2026-04-14T20:56:11.681988+00:00", + "source_trajectory_ids": [ + "traj-22545cf9-c2c8-4ef9-abcd-271a216d7b39", + "traj-487aba11-f414-4f54-ab08-635e4436ce00", + "traj-652cb0d0-0132-44b7-b88b-15cc073fa6b6", + "traj-6e8fb3a1-d05c-4f21-acac-751c59695c26", + "traj-726e0d27-aadc-487a-a2c2-2245705d78bb", + 
"traj-996ecef9-5116-4cc9-aca0-ed85c27666bb", + "traj-a93a36a4-0c2b-441d-8b9a-8aeb61685092", + "traj-d1beb072-dee1-4ed3-bbfb-3c9462a713ca", + "traj-d58e0089-340c-4910-8c06-bfba7862d075", + "traj-fd45b279-677a-4653-bebf-784271ce95a1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205611", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-37a28243-ed47-4bc7-b260-9bb38b5c0f99.json b/docs/training-reports/report-37a28243-ed47-4bc7-b260-9bb38b5c0f99.json new file mode 100644 index 0000000..2bd3be7 --- /dev/null +++ b/docs/training-reports/report-37a28243-ed47-4bc7-b260-9bb38b5c0f99.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-37a28243-ed47-4bc7-b260-9bb38b5c0f99", + "timestamp": "2026-04-14T18:27:21.104548+00:00", + "source_trajectory_ids": [ + "traj-21a6856f-bdd7-4851-b9e0-219458e43cdb", + "traj-31dffa4d-dd6e-4d32-831d-41e83119f7fc", + "traj-372f8001-3343-4e51-b415-ad3231658ffe", + "traj-37658dc1-0bd6-4245-88f2-5a12b901d82f", + "traj-62dfe993-4c43-4122-9dde-39cfe6ea5fc1", + "traj-6cdaf4fa-5553-46c4-b550-3a01fd2b6371", + "traj-cafc62d9-5964-4a94-ac55-26e194aed032", + "traj-cc0843b4-d2fd-4a69-8b82-b55ad4dc8c79", + "traj-e5a7d153-7a31-4618-b117-ec13d8c192db", + "traj-f2aeb199-2ef4-4d29-a523-740d1d10aeb7" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + 
"avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-182721" +} \ No newline at end of file diff --git a/docs/training-reports/report-38b4ed01-cbaa-4ec2-9b31-af634d9786b1.json b/docs/training-reports/report-38b4ed01-cbaa-4ec2-9b31-af634d9786b1.json new file mode 100644 index 0000000..17caffe --- /dev/null +++ b/docs/training-reports/report-38b4ed01-cbaa-4ec2-9b31-af634d9786b1.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-38b4ed01-cbaa-4ec2-9b31-af634d9786b1", + "timestamp": "2026-04-14T15:04:26.185363+00:00", + "source_trajectory_ids": [ + "traj-5e3f3a25-8ba2-4047-9cc5-ec1072ef1eec", + "traj-68138afe-056b-4e15-b67a-cbcaf8c17ff5", + "traj-6c8c007c-7781-4b15-a369-572a00f40457", + "traj-76fbd22e-1f59-4855-afe9-ebf90f8d59e6", + "traj-7b21dec9-e7b7-4f68-a5a0-c50221ee37aa", + "traj-a967b477-d265-414f-a33e-2582a8f0e086", + "traj-b0d09d4f-d041-41ce-94aa-1b7221e182ca", + "traj-bb05f70d-64f5-49ad-bbb2-9a63bf255d5b", + "traj-e4bd70d9-0680-4385-9397-dfb3299f9be0", + "traj-f4fed21e-bf33-479b-a05e-379b6997ca42" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-150426" +} \ No newline at end of file diff --git a/docs/training-reports/report-3c6cc2dd-862a-480a-913d-5554b4058d11.json b/docs/training-reports/report-3c6cc2dd-862a-480a-913d-5554b4058d11.json new file mode 100644 index 
0000000..973b886 --- /dev/null +++ b/docs/training-reports/report-3c6cc2dd-862a-480a-913d-5554b4058d11.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-3c6cc2dd-862a-480a-913d-5554b4058d11", + "timestamp": "2026-04-14T18:01:06.225115+00:00", + "source_trajectory_ids": [ + "traj-0f44d7ff-d89a-4c71-8c9a-1302bf13c23b", + "traj-34cf1d79-5fff-4bf2-a4a0-cfb8336821a4", + "traj-43bc3505-9d70-4a13-9212-da1cfe4e09d1", + "traj-566c33fe-5f4e-4697-9166-0420d63623fb", + "traj-56ac21d8-1dcf-4525-95ea-93d3ca52647e", + "traj-787abb45-cb75-40b7-b734-71d7cf60a180", + "traj-a3da8105-84a6-459d-a28f-6892b09afcdc", + "traj-d4cbefc9-9112-490a-a27c-f07f6e3d9662", + "traj-dfda911e-a036-443f-a5dd-28c8f1d84bca", + "traj-f75507d0-c01d-44d5-aaf3-9ff5f7f6c682" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-3d4ecf12-c252-493e-8ba5-2bfe51399190.json b/docs/training-reports/report-3d4ecf12-c252-493e-8ba5-2bfe51399190.json new file mode 100644 index 0000000..8cb061a --- /dev/null +++ b/docs/training-reports/report-3d4ecf12-c252-493e-8ba5-2bfe51399190.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-3d4ecf12-c252-493e-8ba5-2bfe51399190", + "timestamp": "2026-04-14T16:52:07.890026+00:00", + "source_trajectory_ids": [ + "traj-2d72ff63-f4d4-43e4-b7d6-a159d321ef0e", + "traj-2ebae5a7-9c72-4178-b019-3380e17dea1d", + "traj-7123a23c-15ce-400a-9aff-1e4e2251695f", + "traj-7a9be049-d77e-4bf8-9776-045cad6a88b0", + 
"traj-a0786cfa-9353-4740-a8f1-473809ca7cd5", + "traj-a1a1766b-f5c3-4070-ad40-590985c65ff7", + "traj-ab013496-4365-4881-b625-e466728a83e3", + "traj-d8dc62b0-35bb-4e47-b50e-44f98de2eaf4", + "traj-e0529bff-6c1c-4afd-9ff3-195690e941dc", + "traj-eb197e1a-479a-4bea-8fdc-f48d2a053b7c" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165207" +} \ No newline at end of file diff --git a/docs/training-reports/report-3e0d3ae0-e20c-40bc-981c-3f582cfbb7b2.json b/docs/training-reports/report-3e0d3ae0-e20c-40bc-981c-3f582cfbb7b2.json new file mode 100644 index 0000000..7c448bf --- /dev/null +++ b/docs/training-reports/report-3e0d3ae0-e20c-40bc-981c-3f582cfbb7b2.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-3e0d3ae0-e20c-40bc-981c-3f582cfbb7b2", + "timestamp": "2026-04-14T18:06:25.522532+00:00", + "source_trajectory_ids": [ + "traj-13bc1070-0b04-4664-b4e9-13400d3a9362", + "traj-4151835b-6ce4-46d5-9fdc-ee914931954a", + "traj-5a342582-6740-452b-a17c-36cb459966e8", + "traj-946fe752-e504-4791-aa5f-cb0af58f66f9", + "traj-99c0c477-db3e-4c2c-b95c-40c3489923d8", + "traj-afed1dbc-f4b5-4826-8124-a448a1ec0b86", + "traj-b0e8b9fe-261b-4673-9a17-8e3f4f64ae2e", + "traj-b2579e6a-2ee4-4572-affd-33537db667df", + "traj-cf639c55-a6a6-48e3-941d-9bc0b9bb1b88", + "traj-e3a07906-0351-4b52-a1a0-f5e1391c93e3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + 
"avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-401afaba-f468-45bf-8442-2bf14c8316a2.json b/docs/training-reports/report-401afaba-f468-45bf-8442-2bf14c8316a2.json new file mode 100644 index 0000000..c78f28b --- /dev/null +++ b/docs/training-reports/report-401afaba-f468-45bf-8442-2bf14c8316a2.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-401afaba-f468-45bf-8442-2bf14c8316a2", + "timestamp": "2026-04-15T01:25:33.810966+00:00", + "source_trajectory_ids": [ + "traj-13292acc-308a-43f7-9716-87c8b938eb0f", + "traj-223e79f2-f626-4ae1-a0ad-2a6d053af25c", + "traj-519063dd-7bf3-461a-875b-bb7a4ecc2893", + "traj-66938ede-2373-46f8-a459-b5f291f3bc2b", + "traj-868a893d-8fba-45ba-875d-6a6e1f2dce8e", + "traj-916f484c-13a0-415b-aefa-832c07dfcf03", + "traj-9e1cead1-15bb-4098-bf5c-8b9b810988d9", + "traj-b3dc9704-72c5-4b75-a238-ab66b22dd766", + "traj-c6ec0501-d96d-4ad8-bf88-072640e22e4d", + "traj-cee07144-c2df-407d-9c43-29a76de1a48a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012533", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4105b4e5-179f-431f-9f11-b110077fb2bc.json 
b/docs/training-reports/report-4105b4e5-179f-431f-9f11-b110077fb2bc.json new file mode 100644 index 0000000..69b7c0e --- /dev/null +++ b/docs/training-reports/report-4105b4e5-179f-431f-9f11-b110077fb2bc.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-4105b4e5-179f-431f-9f11-b110077fb2bc", + "timestamp": "2026-04-14T21:18:36.114147+00:00", + "source_trajectory_ids": [ + "traj-2fce941b-cf65-4efc-a9bc-c83672376d6e", + "traj-62d39291-7f83-4655-a5d0-c992d9ecdf04", + "traj-69975215-b8ff-4faa-b5f2-88ad26805b28", + "traj-81882b7c-ba4c-4ffc-89f4-7803c3aeee01", + "traj-8816a644-f2ef-44c1-8185-ea2cb83afb06", + "traj-9480f7c2-8853-4971-b36f-19c9f1592285", + "traj-956fdf6d-ef5a-4a7b-b32f-029f38e72533", + "traj-a1e6c947-e6e2-4f90-9a6e-22d99a656a2b", + "traj-b2d31d5b-12b7-4694-a26f-e91e1e3be8a7", + "traj-d03bb05c-acd4-452d-af83-77571b30009a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-211836", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4109400b-cfbb-4d5a-a26a-e0f4fc054541.json b/docs/training-reports/report-4109400b-cfbb-4d5a-a26a-e0f4fc054541.json new file mode 100644 index 0000000..2e4a2c3 --- /dev/null +++ b/docs/training-reports/report-4109400b-cfbb-4d5a-a26a-e0f4fc054541.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-4109400b-cfbb-4d5a-a26a-e0f4fc054541", + "timestamp": "2026-04-14T15:50:36.233055+00:00", + "source_trajectory_ids": [ + "traj-096259f5-d76c-4c42-b42e-fa16c5d2935e", + "traj-19070f8a-b395-4564-a3f0-4cb00e20ea3c", + 
"traj-226fd70d-da48-4f47-9f63-b6df7efc0175", + "traj-2614cf9a-bebc-44ca-99f6-508a266e2f42", + "traj-4a954778-c76c-4c98-9923-9f97365efc12", + "traj-5b0281de-bb94-4864-85a9-3a6b10d38121", + "traj-5bd45032-16b1-438f-b3a4-ba400ac52884", + "traj-82903c13-0579-4ff4-9d16-631ae7174d9e", + "traj-9ba2db0d-281d-44dc-a50e-09df89bee6f9", + "traj-a6c6d8c3-8073-457e-b55c-48b78eead9ba" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-155036" +} \ No newline at end of file diff --git a/docs/training-reports/report-413d07cb-cf68-42a8-87fd-19fef8e752d1.json b/docs/training-reports/report-413d07cb-cf68-42a8-87fd-19fef8e752d1.json new file mode 100644 index 0000000..4cfd397 --- /dev/null +++ b/docs/training-reports/report-413d07cb-cf68-42a8-87fd-19fef8e752d1.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-413d07cb-cf68-42a8-87fd-19fef8e752d1", + "timestamp": "2026-04-14T22:05:59.221755+00:00", + "source_trajectory_ids": [ + "traj-00f15b5e-a641-42bd-b3d0-58fe2d9ab635", + "traj-0fc4a255-480c-49a2-b281-420b14c89d71", + "traj-1e601826-f496-492b-9001-033c1f4bf38f", + "traj-2055c676-3aa8-47c9-838b-a464e2599090", + "traj-35633916-ceff-4ef2-b270-260ef43f068e", + "traj-37f59692-f0db-42c7-9fb8-12a24fe65336", + "traj-5f43857c-36d8-464f-97e4-17d3532babfb", + "traj-7dc6d914-fc99-4995-a701-4c544444e421", + "traj-99143129-4966-49d8-8c79-0bae868f22e6", + "traj-f28f6c36-e9f9-4745-98a3-082eccb8e7a2" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220559", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-41510fac-07d4-48d7-b0d3-435308de8a9b.json b/docs/training-reports/report-41510fac-07d4-48d7-b0d3-435308de8a9b.json new file mode 100644 index 0000000..6e0bcd9 --- /dev/null +++ b/docs/training-reports/report-41510fac-07d4-48d7-b0d3-435308de8a9b.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-41510fac-07d4-48d7-b0d3-435308de8a9b", + "timestamp": "2026-04-14T22:09:39.015784+00:00", + "source_trajectory_ids": [ + "traj-0679d7fb-16f2-4815-b422-28bfad02aa05", + "traj-4cacef32-7bd5-478d-9811-3cf929fdf4cb", + "traj-72eec91d-72f8-49ad-b56f-d40b65f4cd76", + "traj-7784f333-f35b-4cc6-9258-71cb52ed5d62", + "traj-ad9737bf-684f-42df-927b-a8c239a9e63e", + "traj-d1536852-2f04-44a2-b18a-6b25acacff35", + "traj-ef00ca26-5cfc-4a06-9dcd-e6f678c85c8f", + "traj-f33cfd42-2102-4873-a3f0-5a2be2be4d69", + "traj-f74fe8d9-81b7-4d72-8514-e5bce6b23716", + "traj-f772c53c-6d53-4fa3-9147-b6642fa4a1e8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220939", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of 
file diff --git a/docs/training-reports/report-43d4c045-9a5c-40d5-8b1d-1d7c6f77adb6.json b/docs/training-reports/report-43d4c045-9a5c-40d5-8b1d-1d7c6f77adb6.json new file mode 100644 index 0000000..81df3ae --- /dev/null +++ b/docs/training-reports/report-43d4c045-9a5c-40d5-8b1d-1d7c6f77adb6.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-43d4c045-9a5c-40d5-8b1d-1d7c6f77adb6", + "timestamp": "2026-04-14T18:57:10.528189+00:00", + "source_trajectory_ids": [ + "traj-1c8080c5-67cb-43d0-80ac-b5e2ec996a3f", + "traj-3f7ac604-e2ed-445e-a8d4-dbd94524609b", + "traj-46540622-c2d7-4783-a309-3833fa1a3f70", + "traj-4e5d9253-feaa-4641-bb84-680f81e38c57", + "traj-7bca135b-7fe2-444a-b112-10570666956d", + "traj-9aec1caa-9415-45a9-ad1d-e5dddda3985b", + "traj-badf1655-c1c5-45ca-b7d5-3f3ab5593848", + "traj-bc375f1b-d8ad-459c-9f08-880f95e1b8d9", + "traj-da9c3b03-5ff3-4f35-bbe1-be521909b86e", + "traj-ec6156e1-6013-45a6-8e3b-66af0214feb7" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185710", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-43d6e714-88cc-4a65-a101-0d295c3dd389.json b/docs/training-reports/report-43d6e714-88cc-4a65-a101-0d295c3dd389.json new file mode 100644 index 0000000..e5b1c31 --- /dev/null +++ b/docs/training-reports/report-43d6e714-88cc-4a65-a101-0d295c3dd389.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-43d6e714-88cc-4a65-a101-0d295c3dd389", + "timestamp": "2026-04-14T20:58:05.550865+00:00", + 
"source_trajectory_ids": [ + "traj-4605e012-4600-41ea-87ef-de75d36cb859", + "traj-59306aa1-bcd1-42d8-9bfd-bd8f8d26120a", + "traj-73d23896-0973-442f-87f8-0a80e64d51cb", + "traj-85d6d2cd-b787-42d1-9ed4-623353e3c13f", + "traj-afd7b74c-dd4d-4407-9223-73b0d6fdb58c", + "traj-d2c33fae-90f5-42ae-a928-75cc1f2f4475", + "traj-deb4aa39-0602-4dd7-b06c-57f7fdda2053", + "traj-ee7a8746-7cb2-416b-9f11-afefd425ed0a", + "traj-f369cb32-35a0-4ea2-92a1-f5980c79fd06", + "traj-f5d33f02-0607-4730-894e-4b06be60d7d8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205805", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-43e5bf98-8055-4b9f-8640-f2414224d4bf.json b/docs/training-reports/report-43e5bf98-8055-4b9f-8640-f2414224d4bf.json new file mode 100644 index 0000000..64edcf1 --- /dev/null +++ b/docs/training-reports/report-43e5bf98-8055-4b9f-8640-f2414224d4bf.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-43e5bf98-8055-4b9f-8640-f2414224d4bf", + "timestamp": "2026-04-14T22:05:44.115652+00:00", + "source_trajectory_ids": [ + "traj-04405ec0-7ca9-4465-9518-7dab5d020c30", + "traj-0c0ddc68-0063-41c0-ab91-8e966b62f705", + "traj-0f652d30-d7e1-47fa-b124-cd288a12a14c", + "traj-1ce33c0d-597a-49e6-93dc-88f069e4c302", + "traj-24e41e0a-09f1-4d0c-8899-d4bb69ac9ddf", + "traj-7ab55256-fcd2-453f-abbf-4790df4f6ca8", + "traj-82348fa8-183a-4a40-a366-b6ac6a87309a", + "traj-b2eeb3af-3de0-44a4-b183-060b3a6d81d7", + "traj-dcf0d891-e3b8-49e5-ac91-569f3ceb3a90", + 
"traj-eabcb8e0-3bc8-47f5-a8dc-4221362a102f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220544", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-45467fbb-5fdb-45fe-b4a5-293b6560c08c.json b/docs/training-reports/report-45467fbb-5fdb-45fe-b4a5-293b6560c08c.json new file mode 100644 index 0000000..94e78d9 --- /dev/null +++ b/docs/training-reports/report-45467fbb-5fdb-45fe-b4a5-293b6560c08c.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-45467fbb-5fdb-45fe-b4a5-293b6560c08c", + "timestamp": "2026-04-15T01:33:34.775057+00:00", + "source_trajectory_ids": [ + "traj-0bd98b63-fe62-407b-a82a-22240f040427", + "traj-0d032b19-011f-48c7-9bee-508b57b44d26", + "traj-15f2455d-56db-46e2-bd41-6418ba23a463", + "traj-4781ddd9-6ee8-43a1-9047-40503b2e5fae", + "traj-9e942dcc-7508-4779-91e2-a03c1b82c597", + "traj-b09495ff-1be2-4c94-bde8-dd0ea7cfbcad", + "traj-cda9f732-c8a1-4d7f-81bc-21318c922894", + "traj-e2626858-fc8f-476a-913f-fa9b1944c6a4", + "traj-ea4fed23-c0b9-4f76-9489-967dd42ed5ec", + "traj-fd5d13e3-ee3b-468b-99ea-8e85c7e0393b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + 
"baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013334", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-45f61edc-3f4c-4c6c-822f-d4e97628aa77.json b/docs/training-reports/report-45f61edc-3f4c-4c6c-822f-d4e97628aa77.json new file mode 100644 index 0000000..7c41aca --- /dev/null +++ b/docs/training-reports/report-45f61edc-3f4c-4c6c-822f-d4e97628aa77.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-45f61edc-3f4c-4c6c-822f-d4e97628aa77", + "timestamp": "2026-04-14T22:08:19.673227+00:00", + "source_trajectory_ids": [ + "traj-02f093fc-763e-4d3e-bafc-bb8aeaea75f1", + "traj-0a2744e8-e06b-4ecd-b0df-fa0a1c886b23", + "traj-28d82cd2-2f61-4a7d-903c-76c29caa93a2", + "traj-36f2201d-9728-480b-8c84-b40a7062e4e6", + "traj-7bba52c5-33d5-40e6-83a8-0444853df15d", + "traj-8a29f02e-99b7-4169-bff5-061f9aaa82a3", + "traj-8f82347d-7a20-4f92-9491-6ea282ed4c9e", + "traj-d76fd602-b881-4484-afd1-ef3c2f26f839", + "traj-f000568b-7ac3-4a67-a8ce-63d1d9beb690", + "traj-ffa74e58-35c4-42dd-a83d-e454392c2914" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220819", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-461a5c98-57e6-40c8-b2ce-5a70f64072d2.json b/docs/training-reports/report-461a5c98-57e6-40c8-b2ce-5a70f64072d2.json new file mode 100644 index 0000000..c57dd40 --- /dev/null +++ 
b/docs/training-reports/report-461a5c98-57e6-40c8-b2ce-5a70f64072d2.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-461a5c98-57e6-40c8-b2ce-5a70f64072d2", + "timestamp": "2026-04-15T01:41:52.334143+00:00", + "source_trajectory_ids": [ + "traj-3493a4b4-50a6-4a28-8a28-b6811f7dd289", + "traj-363519ec-b624-4a15-857c-7dd124594ef9", + "traj-68a740ca-fdb0-4fe4-9a32-a05ea194f9eb", + "traj-73adf39a-8d2a-465b-815b-d6b59ce127f9", + "traj-7a2de94c-df1d-4b74-83c9-1170b533d845", + "traj-7dd9573a-f665-43a7-b819-cb5ed45dc137", + "traj-afaf90c3-3df5-4f20-ad12-ed790fe8d8aa", + "traj-b1e72746-d5a7-44b8-b666-aa8c48168305", + "traj-e1bc8e6c-e77a-4cb1-8175-8099e93b48de", + "traj-f98d3663-69cd-46b4-ae0a-c20bc80e86f2" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-014152", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-48ee7d3f-d744-43de-93ad-88b94206d59a.json b/docs/training-reports/report-48ee7d3f-d744-43de-93ad-88b94206d59a.json new file mode 100644 index 0000000..0ed466a --- /dev/null +++ b/docs/training-reports/report-48ee7d3f-d744-43de-93ad-88b94206d59a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-48ee7d3f-d744-43de-93ad-88b94206d59a", + "timestamp": "2026-04-15T02:33:47.729485+00:00", + "source_trajectory_ids": [ + "traj-03f64fa4-12e6-43d3-bdd4-850386d7941f", + "traj-231b0863-7611-4fb2-9638-476e7663da5d", + "traj-552c1c6d-4bde-446e-bbf8-c762fed02e81", + "traj-55c1df2f-4e21-463f-8be8-d412eb14da62", + 
"traj-686ff2d2-9a3c-43e3-8b40-f96962f5647a", + "traj-6e82a8d9-15a2-4891-97c8-ecb3ec3f7192", + "traj-91c9e740-03c7-41b0-883d-b5eeef1a3cc0", + "traj-9d012535-4e80-4aed-b515-97c6a01f4d53", + "traj-ad68a77b-9601-42f2-9c42-0ca21bb7c73f", + "traj-b428fffd-fd29-488a-9d12-95db39eac38e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023347", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4a5187ed-5a28-4ee0-8b40-2e821323693d.json b/docs/training-reports/report-4a5187ed-5a28-4ee0-8b40-2e821323693d.json new file mode 100644 index 0000000..00994b3 --- /dev/null +++ b/docs/training-reports/report-4a5187ed-5a28-4ee0-8b40-2e821323693d.json @@ -0,0 +1,44 @@ +{ + "report_id": "report-4a5187ed-5a28-4ee0-8b40-2e821323693d", + "timestamp": "2026-04-14T18:51:33.158431+00:00", + "source_trajectory_ids": [ + "traj-137b6353-1315-4d48-9b83-ad28569d0c96", + "traj-5bd3ab53-8268-4fe0-94a3-490a58200f6f", + "traj-60d28c42-0680-46f8-8de2-f61ec935db8f", + "traj-7886fd07-f658-4bc2-8e44-232a6aad7480", + "traj-868a1eeb-dfad-449d-a725-227ea2c01931", + "traj-950f2bf9-4a9a-4498-a7fc-79abfbc47937", + "traj-a411bae6-de2f-4f81-bc94-6e7075c24b4b", + "traj-a46a7e48-9852-4132-bc5b-44c0dcd5744b", + "traj-eb000b22-fc6f-4a47-9287-7b960627725b", + "traj-f45c3555-2524-44df-9644-979f68d3ed71" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, 
+ "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4b6ecd0d-81b5-4e2c-b52a-d9907a011b20.json b/docs/training-reports/report-4b6ecd0d-81b5-4e2c-b52a-d9907a011b20.json new file mode 100644 index 0000000..d93ccea --- /dev/null +++ b/docs/training-reports/report-4b6ecd0d-81b5-4e2c-b52a-d9907a011b20.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-4b6ecd0d-81b5-4e2c-b52a-d9907a011b20", + "timestamp": "2026-04-14T18:58:16.264614+00:00", + "source_trajectory_ids": [ + "traj-59f457a5-cb29-40d8-bc7d-8730e06986df", + "traj-7102877e-30e6-448b-bb99-9b0ac908a73e", + "traj-79299b5a-8e49-4ccb-8881-948903c39580", + "traj-968451f5-70ec-4d23-b040-9e638a324b78", + "traj-c28c9b70-91b0-412e-af2d-f4bd5551cee4", + "traj-c2e9306c-f40a-45e9-8954-3fc538faaf6d", + "traj-d09a7893-3785-4c50-8a70-69f3bc43173b", + "traj-d205135d-bff0-4f1b-8626-912fee42f566", + "traj-e2741ebc-dc3f-490d-9a8a-aba7d6ac5f0d", + "traj-e408ab2c-8fdb-4a61-a71d-30f9abb9ff3d" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185816", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-4d50ee54-0bfa-4e34-926b-45900bfd3f8d.json b/docs/training-reports/report-4d50ee54-0bfa-4e34-926b-45900bfd3f8d.json new file mode 100644 index 0000000..eb4f1d7 --- /dev/null +++ b/docs/training-reports/report-4d50ee54-0bfa-4e34-926b-45900bfd3f8d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-4d50ee54-0bfa-4e34-926b-45900bfd3f8d", + "timestamp": "2026-04-14T20:58:05.422298+00:00", + "source_trajectory_ids": [ + "traj-067e1a4c-17cd-4374-b0c6-bab2ee05aeab", + "traj-241638b5-a874-45da-9be1-cd0541337e08", + "traj-38541187-375c-438a-be39-837e2330e11a", + "traj-5be14b5b-4c9c-459f-afc9-22a5935a2cc8", + "traj-6c2e9b3a-0327-43ca-81e5-56db7ae936b7", + "traj-7025a020-2103-4fe2-baa5-a0da4c7b9cbb", + "traj-79299683-0b2d-4e4e-a124-1fa48c59ef36", + "traj-96c030f3-e898-42c0-9911-338093d1dc60", + "traj-a573418c-2c39-41cf-9440-477dea42c020", + "traj-b1f69fcc-66d9-4f33-8b73-312398f8217e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205805", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4dc7057d-d4f5-4250-b949-bc20d6f7521a.json b/docs/training-reports/report-4dc7057d-d4f5-4250-b949-bc20d6f7521a.json new file mode 100644 index 0000000..0b59172 --- /dev/null +++ b/docs/training-reports/report-4dc7057d-d4f5-4250-b949-bc20d6f7521a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-4dc7057d-d4f5-4250-b949-bc20d6f7521a", + "timestamp": "2026-04-15T02:33:47.693933+00:00", + "source_trajectory_ids": [ + 
"traj-0087d4b2-4bf7-4f26-aa54-1cb943e94dd0", + "traj-21bb050c-500e-41e1-87b9-d43e9488d12c", + "traj-282b3e0e-3c53-4580-bdcb-b25d66b9f3dd", + "traj-42d8a589-60b0-4001-8165-25c36e4e6d09", + "traj-508a9891-7a54-4b4c-8dd7-943e5539100e", + "traj-510f9b87-c96a-4b30-897a-96b86c6d15c4", + "traj-6df384c0-0ce4-4049-8983-87ce2880ac89", + "traj-7b69d26f-cbd3-4b5e-ac40-44133fe58a39", + "traj-cf692ef7-f7ff-467f-85a4-dff08182e3e8", + "traj-f44fae13-f32f-41da-a739-cec74835bd37" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023347", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4e7caa32-18a9-456f-a773-1c67fe43ce31.json b/docs/training-reports/report-4e7caa32-18a9-456f-a773-1c67fe43ce31.json new file mode 100644 index 0000000..411a33c --- /dev/null +++ b/docs/training-reports/report-4e7caa32-18a9-456f-a773-1c67fe43ce31.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-4e7caa32-18a9-456f-a773-1c67fe43ce31", + "timestamp": "2026-04-14T20:32:37.607581+00:00", + "source_trajectory_ids": [ + "traj-20efc089-2454-4f47-aad0-a67f25a73279", + "traj-31875fd0-e57c-42f6-b659-3140e3f06f5b", + "traj-440f0075-1325-461f-b67c-e68605eea57e", + "traj-462b7514-5775-4ee2-a228-8923fc083277", + "traj-5eaada3f-e728-41be-a722-eaacc2343b27", + "traj-60259dde-b711-40e5-b639-b935ef71307f", + "traj-951afa2a-5840-4fc2-a770-8e5624551899", + "traj-ae15a5aa-85e9-4879-b3f5-ce1fc5882cd1", + "traj-d76ddb92-8283-4600-9081-ca6a81af9cd4", + 
"traj-fb2ecbaa-cbfc-4c35-8c7e-08290bb2763f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203237", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-4ecb8a73-0bf3-4c31-ba10-1be9db3146f9.json b/docs/training-reports/report-4ecb8a73-0bf3-4c31-ba10-1be9db3146f9.json new file mode 100644 index 0000000..422bcb7 --- /dev/null +++ b/docs/training-reports/report-4ecb8a73-0bf3-4c31-ba10-1be9db3146f9.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-4ecb8a73-0bf3-4c31-ba10-1be9db3146f9", + "timestamp": "2026-04-14T20:31:11.222850+00:00", + "source_trajectory_ids": [ + "traj-0681f8e7-a761-4503-99eb-1e4bc006ec12", + "traj-0ce9f156-20c5-49a3-8b0a-1fc9e028996c", + "traj-13c2f422-b99b-4c93-b92b-35180e88519e", + "traj-2dfba0de-8e4b-4e07-b6b3-0752f278c8e9", + "traj-5c09b804-4076-41bf-ae6f-17fbc024ee42", + "traj-77c07fd7-4479-494b-bdff-0a353664e330", + "traj-a26447e5-654d-4ecd-9af9-270d8ef26e6b", + "traj-bd4660e9-a9a8-4aad-8215-cfc5708b4e04", + "traj-c059fabf-2e18-481d-a645-3718b74b9963", + "traj-dbf9d2d7-1476-4b77-8f64-9b9e0c16535f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + 
"baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203111", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5012fdef-1b08-4522-a946-072c23b71714.json b/docs/training-reports/report-5012fdef-1b08-4522-a946-072c23b71714.json new file mode 100644 index 0000000..c4abe09 --- /dev/null +++ b/docs/training-reports/report-5012fdef-1b08-4522-a946-072c23b71714.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-5012fdef-1b08-4522-a946-072c23b71714", + "timestamp": "2026-04-14T18:00:27.756619+00:00", + "source_trajectory_ids": [ + "traj-13febbec-cc01-4c68-8895-953e3a458337", + "traj-19937555-e13c-4714-9414-1c3332af7ac8", + "traj-28634bdc-f068-476a-a40a-8318e8786aa9", + "traj-43926528-d16f-41e7-b0d5-ef6e49c1bc05", + "traj-896a8b18-30c4-4d70-98e0-e283f8fc5517", + "traj-91eb1710-444e-4c6d-90e1-aae28c98f4cf", + "traj-96e9bf84-ff75-4b6e-aa70-c328fc951b36", + "traj-a626ce77-02ac-4915-9889-4c2364adeeef", + "traj-b40efd95-f438-4215-ad9a-e46af8638c07", + "traj-fcff5db2-b6bc-46dc-b07d-d0029f88706e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180027" +} \ No newline at end of file diff --git a/docs/training-reports/report-508eb935-fe06-4e84-92b6-a1fe22bc6159.json b/docs/training-reports/report-508eb935-fe06-4e84-92b6-a1fe22bc6159.json new file mode 100644 index 0000000..342dd4f --- /dev/null +++ b/docs/training-reports/report-508eb935-fe06-4e84-92b6-a1fe22bc6159.json @@ -0,0 +1,45 @@ +{ + 
"report_id": "report-508eb935-fe06-4e84-92b6-a1fe22bc6159", + "timestamp": "2026-04-14T20:58:05.539426+00:00", + "source_trajectory_ids": [ + "traj-0253e6ba-f504-4379-ae3d-83016e88c09d", + "traj-2144ea5b-3969-4e06-a893-330db7c757bc", + "traj-22880d82-81b8-4803-a7d8-72f7acb5712e", + "traj-4595b5b9-6e23-42f5-8a61-5fde96159580", + "traj-9c89d043-5997-4480-9de6-9262bdc02a31", + "traj-ad14e644-0b4f-4ef7-a62f-79bdea3c6222", + "traj-cf28e6e2-68f8-4351-9f0f-eb6e9cdc3cd7", + "traj-d1b53585-ba7d-4492-8164-95dd51f0fbf8", + "traj-de3972a7-1bed-44e6-adee-e4b2c7c56456", + "traj-f8bc3f68-896b-4b6f-b0f3-0a6381f70ac2" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-50e8ccdd-d716-40cf-b02c-5c349565a955.json b/docs/training-reports/report-50e8ccdd-d716-40cf-b02c-5c349565a955.json new file mode 100644 index 0000000..dbfe9f3 --- /dev/null +++ b/docs/training-reports/report-50e8ccdd-d716-40cf-b02c-5c349565a955.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-50e8ccdd-d716-40cf-b02c-5c349565a955", + "timestamp": "2026-04-14T20:02:20.807645+00:00", + "source_trajectory_ids": [ + "traj-1304d305-b55f-4849-a6d0-df7f3e0c1fb9", + "traj-3156c560-d459-42b3-a70d-df0a6e9af578", + "traj-3cb2d518-f74a-45bc-b232-a216d3f28abc", + "traj-61a1791a-5a2c-4c7d-a4fc-e4b82952c51c", + "traj-621b385d-d516-4506-8e27-5468be804754", + "traj-abc09bf9-adce-4fcd-8c1a-a151bc749971", + 
"traj-b08d47a2-57d9-417f-a0fd-87661137c0e4", + "traj-b2b54604-eece-4f93-96b7-df5cbe2f11b8", + "traj-e31d9867-6180-4867-9946-c32e219ed44e", + "traj-ee895acf-27e2-4efb-868b-793b5b0f5f59" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200220", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-51b272a3-aa2a-42a9-b406-2af6bb6c13d5.json b/docs/training-reports/report-51b272a3-aa2a-42a9-b406-2af6bb6c13d5.json new file mode 100644 index 0000000..9541729 --- /dev/null +++ b/docs/training-reports/report-51b272a3-aa2a-42a9-b406-2af6bb6c13d5.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-51b272a3-aa2a-42a9-b406-2af6bb6c13d5", + "timestamp": "2026-04-15T01:33:34.743842+00:00", + "source_trajectory_ids": [ + "traj-019b1e3d-a4d5-414c-98f2-14566d4e1a73", + "traj-638e4b82-6bb8-4f95-9112-39f1d153140f", + "traj-7c90795a-8999-4733-9e20-aed2c20d11ce", + "traj-890c85dd-37ab-4949-b72a-6dbd86b22610", + "traj-8ef247e7-fc36-47b4-9c35-7791149a4bc6", + "traj-ca8173f4-ba99-4421-a150-3c04e00c1504", + "traj-e0823914-aeec-402a-b02c-27ddf95ab381", + "traj-f7357e60-cc1d-4ca7-a7b6-b00f30d5263a", + "traj-f8c00ccd-24d1-4f0a-a4d4-808d17529466", + "traj-f9e58e88-b618-4dc2-9723-09b4cf5db05b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013334", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-547cf777-50f1-4848-b5c6-e79110f6b4fa.json b/docs/training-reports/report-547cf777-50f1-4848-b5c6-e79110f6b4fa.json new file mode 100644 index 0000000..8e45a77 --- /dev/null +++ b/docs/training-reports/report-547cf777-50f1-4848-b5c6-e79110f6b4fa.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-547cf777-50f1-4848-b5c6-e79110f6b4fa", + "timestamp": "2026-04-14T21:18:36.371846+00:00", + "source_trajectory_ids": [ + "traj-008e3a6c-d014-441e-a368-3a298a94b570", + "traj-11992679-5342-4506-8f5e-88df8f2e5b37", + "traj-1bd76b8d-7968-4870-adbc-7d6f4d968d4d", + "traj-3c86846f-d086-4198-984b-b34a3d9dc32f", + "traj-559b9c31-c008-4957-8774-539f3225d67b", + "traj-680bba7c-58b7-4b87-be69-3aaeeceb8de6", + "traj-99e17c4d-a8bf-4c3a-8b1e-4e16c640c8aa", + "traj-cd676023-60fa-49d1-bc8d-8b58a8152df0", + "traj-dd628bf0-b6e0-4312-a006-9fcb36cdacd3", + "traj-e10d1f31-6253-4508-a4e9-984894d56caf" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-54f51896-f163-490b-a35f-5f7921b9fcae.json b/docs/training-reports/report-54f51896-f163-490b-a35f-5f7921b9fcae.json new file mode 100644 index 0000000..b6e117d --- /dev/null +++ b/docs/training-reports/report-54f51896-f163-490b-a35f-5f7921b9fcae.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-54f51896-f163-490b-a35f-5f7921b9fcae", + "timestamp": "2026-04-14T15:52:51.142603+00:00", + "source_trajectory_ids": [ + "traj-2263147a-44b3-4e96-94d2-dea2dc31cc58", + "traj-2e08b4bd-1a4d-4ae2-a8cf-355f3812ac0b", + "traj-39cf7b85-237a-4676-8d6a-716ba807446a", + "traj-4f9ac486-f0ae-49d4-b79e-e5c9370c3def", + "traj-6fbd4429-9ab3-4077-b8d0-5ff6a64965ba", + "traj-9ad2532a-4a42-4b8f-90bd-0f1d03fc7d34", + "traj-ba2fa5f5-95e5-48c8-96c0-b8f00d6a7d36", + "traj-c77a8e7a-d1fe-4881-9288-3bb4707e42d0", + "traj-dabcee54-f3d9-412d-b392-32badcbce80c", + "traj-db70effe-7b6c-491f-93c9-4892e67507c7" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-155251" +} \ No newline at end of file diff --git a/docs/training-reports/report-58eb6ade-acce-4120-87bc-17252a66f5c1.json b/docs/training-reports/report-58eb6ade-acce-4120-87bc-17252a66f5c1.json new file mode 100644 index 0000000..c5138ed --- /dev/null +++ b/docs/training-reports/report-58eb6ade-acce-4120-87bc-17252a66f5c1.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-58eb6ade-acce-4120-87bc-17252a66f5c1", + "timestamp": "2026-04-14T22:05:44.095159+00:00", + "source_trajectory_ids": [ + "traj-04176304-8400-459c-8345-1804b8a1857b", + 
"traj-2ee8e3e5-1274-456d-b60a-4455e6f1d688", + "traj-3c740a85-d4bc-45e4-8aa0-6ad8de199cd5", + "traj-4a69b684-5185-4792-ba8c-82ecd5ea831c", + "traj-5af6d0c4-0d32-4c1b-8003-cbc3b16af816", + "traj-7a6aa489-6c95-4c72-a1da-63acc995c51e", + "traj-9876e667-7cc5-4f93-a3f9-53ba44b6b343", + "traj-bb97b314-3287-4d08-a5be-4cfe60a70959", + "traj-be45651c-16e4-46a8-80f6-b53319a18f4c", + "traj-e1b4102d-ce4a-448d-9060-1648dae074b6" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5996831f-a897-4eec-8f40-1786539febde.json b/docs/training-reports/report-5996831f-a897-4eec-8f40-1786539febde.json new file mode 100644 index 0000000..e916ca3 --- /dev/null +++ b/docs/training-reports/report-5996831f-a897-4eec-8f40-1786539febde.json @@ -0,0 +1,42 @@ +{ + "report_id": "report-5996831f-a897-4eec-8f40-1786539febde", + "timestamp": "2026-04-14T18:30:24.385886+00:00", + "source_trajectory_ids": [ + "traj-0b57378b-d41c-4cd5-9c8b-cbb6d0cb6c6e", + "traj-1b519f5c-8226-45d9-97a9-7057df5389bd", + "traj-5062f87c-5bf2-4b62-b72e-6964492f4517", + "traj-62dc0190-fa0c-48cd-bd64-7f7d8caaffcb", + "traj-9bb3feb1-c576-4e24-8127-6ca03d2a0c08", + "traj-ca8b06b6-af43-418a-bff1-fe3323e43a01", + "traj-e042d848-7805-42f2-ad1b-8500c6e93136", + "traj-e3de23bb-643d-4833-a5c2-fe98182292ce", + "traj-ea84f194-39a1-40bf-8a70-592f80ba40e0", + "traj-ec288161-f7b1-46df-b7c4-26aa79f5e5c5" + ], + 
"sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-183024", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5bbea27a-8808-4142-a300-a5ea9d1b8abb.json b/docs/training-reports/report-5bbea27a-8808-4142-a300-a5ea9d1b8abb.json new file mode 100644 index 0000000..9ced682 --- /dev/null +++ b/docs/training-reports/report-5bbea27a-8808-4142-a300-a5ea9d1b8abb.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-5bbea27a-8808-4142-a300-a5ea9d1b8abb", + "timestamp": "2026-04-14T20:07:38.784383+00:00", + "source_trajectory_ids": [ + "traj-24636cff-0df9-464a-aadc-49e7d5bb4b19", + "traj-38b267f5-2839-41fb-8600-e3eff1ebc850", + "traj-49244af8-1ff7-4381-9bb7-68279aab96a3", + "traj-59765e1c-4ebc-42fd-af01-e36389ff3e39", + "traj-7c3f75af-6df0-485a-af83-714eb5595dbc", + "traj-97a16c7c-4b65-486f-920f-b488ee581a57", + "traj-b386fe62-333b-4cc1-be14-473188c861ed", + "traj-e07c427f-bdfb-43d8-953d-e3a7e91113c2", + "traj-e6458a19-1a8e-4d61-8444-57dfb89649f1", + "traj-eea30f77-6daf-4f21-9e90-8be64b29e654" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + 
"promoted_version_id": "20260414-200738", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5bbfa4bf-2555-42d8-bc71-2dd94bf9a1aa.json b/docs/training-reports/report-5bbfa4bf-2555-42d8-bc71-2dd94bf9a1aa.json new file mode 100644 index 0000000..0109c2a --- /dev/null +++ b/docs/training-reports/report-5bbfa4bf-2555-42d8-bc71-2dd94bf9a1aa.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-5bbfa4bf-2555-42d8-bc71-2dd94bf9a1aa", + "timestamp": "2026-04-15T01:57:32.884356+00:00", + "source_trajectory_ids": [ + "traj-1d68febb-363d-405a-8456-10d070a0df6f", + "traj-5314e5cf-7c86-4ace-9c2c-34b169b9e455", + "traj-5f54288b-b434-4a88-be89-e4115b32c589", + "traj-67e64768-970a-4fde-8b7a-1e5c4821d0e9", + "traj-76a7c854-ff21-4f1e-9a45-f37d288b69bb", + "traj-788a70d9-4154-4526-af64-d3572ee9f3c5", + "traj-9472520c-14be-4841-b662-3506fc829022", + "traj-bfe0c5b0-1109-471d-b5e8-383f4a6f6219", + "traj-c314335d-9b65-4ccd-b6e0-37fff704101a", + "traj-cd70da62-bdba-4e86-ab7e-346b2f439fbb" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-015732", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5c909aba-8438-4c52-9ac5-aa82c8552a5c.json b/docs/training-reports/report-5c909aba-8438-4c52-9ac5-aa82c8552a5c.json new file mode 100644 index 0000000..52ee59b --- /dev/null +++ b/docs/training-reports/report-5c909aba-8438-4c52-9ac5-aa82c8552a5c.json @@ -0,0 +1,45 @@ +{ + "report_id": 
"report-5c909aba-8438-4c52-9ac5-aa82c8552a5c", + "timestamp": "2026-04-14T21:21:15.173091+00:00", + "source_trajectory_ids": [ + "traj-0ae4d970-1f54-4fca-acf9-49d9786ed602", + "traj-2be44f9a-80d1-4985-9976-a84cdc65ec77", + "traj-54917937-5e1d-4736-9fec-011478a063d3", + "traj-6c11024c-1199-4136-9cad-ebdb70a93b1d", + "traj-74c8f20b-b1fe-4880-8311-32bf22fb2d97", + "traj-921d7a96-8875-4c2f-a1be-e4d59c99b5a2", + "traj-96bfd297-d95c-4190-bac0-01c314cedd02", + "traj-9f46c7fe-20d4-48bc-a430-4dac0a33b36d", + "traj-a9f90b8e-b3e3-4c38-b379-5bae85c0017e", + "traj-f4214286-6b97-416a-98be-2af5379bc7a1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5ce6674e-f6e1-4622-aba2-79ed9fb07bc9.json b/docs/training-reports/report-5ce6674e-f6e1-4622-aba2-79ed9fb07bc9.json new file mode 100644 index 0000000..a7cb30a --- /dev/null +++ b/docs/training-reports/report-5ce6674e-f6e1-4622-aba2-79ed9fb07bc9.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-5ce6674e-f6e1-4622-aba2-79ed9fb07bc9", + "timestamp": "2026-04-15T01:21:53.799961+00:00", + "source_trajectory_ids": [ + "traj-1222b185-244e-46f7-a60d-f7f024819f22", + "traj-1d47bcaa-3b89-47a0-9e11-bd5b0cada785", + "traj-250b5772-1194-44f7-9c27-1eab15bf6428", + "traj-56d2f9d5-9608-4392-886e-0e7693c2ae7b", + "traj-befbea50-2481-4558-96f3-a46a4bcae55f", + "traj-d834d9a9-bb52-418a-8ac9-7ef9aac0d506", + 
"traj-e551dfaf-20cc-4f16-bc94-693d8ba2355b", + "traj-e5763320-3dbd-4f49-80be-3a0126e26e09", + "traj-eabda2c0-ccee-4fe5-b023-e7dc80cae38a", + "traj-f37c8ecb-eef8-41a2-93cb-7860a8341ae0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5d3ab2d4-362f-48ea-8e43-98794a729853.json b/docs/training-reports/report-5d3ab2d4-362f-48ea-8e43-98794a729853.json new file mode 100644 index 0000000..41f7846 --- /dev/null +++ b/docs/training-reports/report-5d3ab2d4-362f-48ea-8e43-98794a729853.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-5d3ab2d4-362f-48ea-8e43-98794a729853", + "timestamp": "2026-04-14T20:31:11.240432+00:00", + "source_trajectory_ids": [ + "traj-052c2084-bf8f-45b3-a48b-eaa9b95f5cdc", + "traj-063c87a1-bde2-43e3-ace5-e7c29a9ad296", + "traj-08ac8afd-1f85-4f80-928a-c014756ad1ad", + "traj-254ee33c-82e6-4078-9431-5f421a88eac7", + "traj-3bb69895-8479-48dc-8685-971a00d2dd42", + "traj-5271851b-e3b0-4e5d-981b-2d20551680b1", + "traj-9909611d-1af8-4657-91d2-5bb784c28df7", + "traj-d7ec1de8-c98b-4489-9802-6118696a128d", + "traj-d8d708d8-2e49-49a9-a701-6c08ec33ce8c", + "traj-e3629c94-90cd-4ade-92b6-6b12d088150e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + 
"avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203111", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5f3cc25f-f580-4610-8939-f275b48348aa.json b/docs/training-reports/report-5f3cc25f-f580-4610-8939-f275b48348aa.json new file mode 100644 index 0000000..0139b50 --- /dev/null +++ b/docs/training-reports/report-5f3cc25f-f580-4610-8939-f275b48348aa.json @@ -0,0 +1,42 @@ +{ + "report_id": "report-5f3cc25f-f580-4610-8939-f275b48348aa", + "timestamp": "2026-04-14T18:51:33.102552+00:00", + "source_trajectory_ids": [ + "traj-12af88a3-cd2a-47eb-9f64-cfb6e22e54ce", + "traj-567e562b-b673-4216-a97c-e0342f30a392", + "traj-68ba5f57-66c1-442a-876b-358a1c3ec8a7", + "traj-88df2859-a5f8-41a0-a6b2-b69c7b7534e6", + "traj-a9dd0423-655e-4de1-b438-f129b88eacec", + "traj-ba9449c4-6593-416c-bb47-3aa2e47c8ec7", + "traj-bf6a9a13-af26-4ff1-8ec5-b087bbfde8d1", + "traj-c3f4305f-ca99-44e7-a009-35a3e996acd2", + "traj-e45c104e-f29b-435d-8eb0-cf7c3da2fb2d", + "traj-f71ab09f-bd40-4637-a9ba-999c1e049c47" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185133", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-5f961f5f-f3de-463d-8586-b770aa2e951f.json 
b/docs/training-reports/report-5f961f5f-f3de-463d-8586-b770aa2e951f.json new file mode 100644 index 0000000..cbc9f34 --- /dev/null +++ b/docs/training-reports/report-5f961f5f-f3de-463d-8586-b770aa2e951f.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-5f961f5f-f3de-463d-8586-b770aa2e951f", + "timestamp": "2026-04-15T02:31:17.458912+00:00", + "source_trajectory_ids": [ + "traj-157f8c21-719f-4ba8-8243-f22d95e30091", + "traj-1daf4a22-f63c-4119-8f98-426a42756b47", + "traj-3abf53ed-a0db-4b62-a9ee-b24cb3e74359", + "traj-4d732a6b-1a09-4547-91b0-da7c3b72f9c0", + "traj-5efe2891-e10a-44f1-8882-fa9f35ef0ff5", + "traj-6ea2a8f6-86ce-406d-ad52-54de0aabfe6a", + "traj-7653da9a-96a4-4a34-938d-8251fcf02ea4", + "traj-8b17c700-e430-4c91-928f-25db8f0d73f2", + "traj-c4efa221-68bc-4d01-8bf8-eed2268f0c7e", + "traj-ff6a9712-5857-48ce-96f4-b744d655bd18" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-6579e004-98ff-4fd0-b710-2116b47bbe9e.json b/docs/training-reports/report-6579e004-98ff-4fd0-b710-2116b47bbe9e.json new file mode 100644 index 0000000..84c7a57 --- /dev/null +++ b/docs/training-reports/report-6579e004-98ff-4fd0-b710-2116b47bbe9e.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-6579e004-98ff-4fd0-b710-2116b47bbe9e", + "timestamp": "2026-04-15T01:25:33.889628+00:00", + "source_trajectory_ids": [ + "traj-1a1d2d0d-9f79-4edf-805b-d8b4bbd920f4", + 
"traj-28a1f4a0-7446-4c16-886a-99013507cf20", + "traj-3501ab39-8337-47f7-9ae7-7700c41d160f", + "traj-96eff406-cf30-49aa-9ee8-e278a1a1789c", + "traj-a63dbc77-e3e2-4db7-935e-22a088879ff7", + "traj-aefe0c29-8f97-46d5-8e33-23bc714c7151", + "traj-b10077c3-22ca-46ba-9a3d-70c3474f2449", + "traj-c7ee4596-ca05-40b3-8e24-c0b6f95fb0b8", + "traj-d0429a8b-3e38-4316-9f61-09b5847908e9", + "traj-d128fdeb-ec2a-44db-b1fd-f49f63c33536" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-657d2b18-44a2-42a7-821f-5697611b403f.json b/docs/training-reports/report-657d2b18-44a2-42a7-821f-5697611b403f.json new file mode 100644 index 0000000..5cd7820 --- /dev/null +++ b/docs/training-reports/report-657d2b18-44a2-42a7-821f-5697611b403f.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-657d2b18-44a2-42a7-821f-5697611b403f", + "timestamp": "2026-04-14T20:54:35.829616+00:00", + "source_trajectory_ids": [ + "traj-29e7080a-f38d-4eff-8885-01625f706940", + "traj-4406b1af-13c0-4edb-937e-70a4ad05e9c4", + "traj-74fc5cb4-bcf2-4400-adb4-10f29ad17c13", + "traj-86c672fc-af0b-47c3-9e29-b0ebadab6c3c", + "traj-aef7759d-0c2e-44aa-af65-f40bd4a217ce", + "traj-d087f258-1c76-47e0-b532-7effcece43eb", + "traj-d8f2b0a7-2a9c-4354-b4f1-4fd3de378881", + "traj-e392e827-019b-4679-88aa-b0ca9b9e1a37", + "traj-eddec9a7-94c9-4787-8826-bbaf84c0be35", + "traj-effafeb3-8762-41b6-8594-c1267172b81a" + ], + 
"sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205435", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-66aa10dc-0527-4459-bb2e-b5f4da1272c1.json b/docs/training-reports/report-66aa10dc-0527-4459-bb2e-b5f4da1272c1.json new file mode 100644 index 0000000..df424cd --- /dev/null +++ b/docs/training-reports/report-66aa10dc-0527-4459-bb2e-b5f4da1272c1.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-66aa10dc-0527-4459-bb2e-b5f4da1272c1", + "timestamp": "2026-04-14T19:21:09.831626+00:00", + "source_trajectory_ids": [ + "traj-0be01236-de1b-4a17-8d26-ada21cca007a", + "traj-13ae15bb-9bcf-46d7-a8f7-318880c697b2", + "traj-488630e3-898b-4668-9f93-8e3f388bb3c0", + "traj-4e3fc1d9-87f7-4acd-8d9e-8f8caa259c16", + "traj-594da9b5-d065-43c6-8581-e1c325917c5c", + "traj-5cd271b2-b721-4cb3-9d73-35f493d076e7", + "traj-61257688-a738-4164-aeae-6b8ca5b4ccfc", + "traj-7a425e1b-ad77-42eb-bc99-e83d37f0d13c", + "traj-94d587fa-4217-4d91-8119-15c33648ed5b", + "traj-e9bbac88-b876-432a-8208-b45250af8e42" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + 
"challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-192109", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-66beff81-2ce9-4bf5-9d18-6518f371347c.json b/docs/training-reports/report-66beff81-2ce9-4bf5-9d18-6518f371347c.json new file mode 100644 index 0000000..87504b5 --- /dev/null +++ b/docs/training-reports/report-66beff81-2ce9-4bf5-9d18-6518f371347c.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-66beff81-2ce9-4bf5-9d18-6518f371347c", + "timestamp": "2026-04-14T16:54:50.517635+00:00", + "source_trajectory_ids": [ + "traj-20e81f4f-5c98-4f41-b892-81be69efa36c", + "traj-21e3d99b-04f5-4402-9733-f8beaaf8c044", + "traj-54727683-a607-494e-9944-a51bc38d5d22", + "traj-62e7f1b3-ca1f-439e-90ba-081637093125", + "traj-9a7b407c-622b-4adf-9143-513bd58d480e", + "traj-a9ebbf52-9235-4227-8d40-f40cd19b5231", + "traj-b3b68970-229e-40e4-a98c-11a909ba9f7d", + "traj-b786daaf-4ee7-4d79-b09e-0aec9a2b16a3", + "traj-b9b0d092-8782-4f66-beef-4d0995a32fa0", + "traj-c153d7b1-4379-439d-b808-9378fe9f054e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165450" +} \ No newline at end of file diff --git a/docs/training-reports/report-679f7a61-bd73-423f-81c8-9836584ce96f.json b/docs/training-reports/report-679f7a61-bd73-423f-81c8-9836584ce96f.json new file mode 100644 index 0000000..fc85e23 --- /dev/null +++ b/docs/training-reports/report-679f7a61-bd73-423f-81c8-9836584ce96f.json @@ -0,0 +1,43 @@ +{ + "report_id": 
"report-679f7a61-bd73-423f-81c8-9836584ce96f", + "timestamp": "2026-04-14T15:01:27.746539+00:00", + "source_trajectory_ids": [ + "traj-008e354d-cf3f-4838-885d-887add59b833", + "traj-0408386b-4086-4e8b-aab0-34092888ab20", + "traj-41487090-aab2-4c39-bb32-3f08b34253c3", + "traj-49d76619-87bd-4968-ab72-fe2cc4fd687d", + "traj-4ba6da5f-0faa-4792-ab0d-6c44ec286fc3", + "traj-4f8edddf-0ee2-49ac-8d3a-292447189a95", + "traj-6155b78f-ccdd-4141-b0ce-d2d9f3553550", + "traj-6ac43687-13a2-4987-94cf-c7032bc6a8d2", + "traj-7b19cd3d-55e0-4da2-a517-96051a53e63c", + "traj-f2ad8bea-5289-443c-b99d-6e81a5b3f33a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-68d38c2f-d8dc-4b33-8514-8fa9410b0961.json b/docs/training-reports/report-68d38c2f-d8dc-4b33-8514-8fa9410b0961.json new file mode 100644 index 0000000..2fbfde1 --- /dev/null +++ b/docs/training-reports/report-68d38c2f-d8dc-4b33-8514-8fa9410b0961.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-68d38c2f-d8dc-4b33-8514-8fa9410b0961", + "timestamp": "2026-04-14T20:57:28.152811+00:00", + "source_trajectory_ids": [ + "traj-1c05d834-9c1d-4a43-ad8b-8c6c8087256a", + "traj-386bdc31-2972-4831-9c43-01a2aae8d9f4", + "traj-730465f5-2014-4bea-b491-4bb8fe7c64a4", + "traj-79375497-a0ac-45dd-ae4d-b09bff4be7ed", + "traj-871a56b9-b8bc-40bf-9c2c-9b6358ab53cc", + "traj-ac621263-b868-4692-ab37-7a86bff5a35d", + "traj-b077ff51-61cb-4947-a8b8-0f5f7c5ebea7", + 
"traj-b342a862-14e1-4635-b4b5-d8e36d10d29b", + "traj-ce5c7f40-792e-43f1-b727-31fd4fdb05e8", + "traj-f9deff53-f6e6-4bdb-a58e-16c927c90d80" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-690aa1c7-7761-49c2-b0a3-9bddfb75dc47.json b/docs/training-reports/report-690aa1c7-7761-49c2-b0a3-9bddfb75dc47.json new file mode 100644 index 0000000..1d4731c --- /dev/null +++ b/docs/training-reports/report-690aa1c7-7761-49c2-b0a3-9bddfb75dc47.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-690aa1c7-7761-49c2-b0a3-9bddfb75dc47", + "timestamp": "2026-04-14T21:22:14.790781+00:00", + "source_trajectory_ids": [ + "traj-0558d6b6-9b4f-4580-a6fa-534dac75f3c4", + "traj-1da973fc-88c3-4ea1-9574-3b7a1558d315", + "traj-2adae2da-c0c1-4cff-ba3d-c16500853ca6", + "traj-41302519-e96f-4a36-bd95-982f8690569e", + "traj-5c77ffb7-aee1-4b79-8e88-7d9116c70629", + "traj-77ad9e4d-6265-41e9-8af3-9cef86345cf9", + "traj-aa6c3083-3d8d-4b8f-93eb-674346ef5005", + "traj-aec8383e-152a-46c8-83ec-0fc181a93516", + "traj-c4130706-d699-4990-805a-a8a90d5b7b8d", + "traj-f0a1f252-5a21-47c6-bfdd-3ff5026b1fec" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + 
"accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-6bf4158b-c9d8-4eab-ba70-6eb707d0bca1.json b/docs/training-reports/report-6bf4158b-c9d8-4eab-ba70-6eb707d0bca1.json new file mode 100644 index 0000000..178e936 --- /dev/null +++ b/docs/training-reports/report-6bf4158b-c9d8-4eab-ba70-6eb707d0bca1.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-6bf4158b-c9d8-4eab-ba70-6eb707d0bca1", + "timestamp": "2026-04-14T15:29:36.013045+00:00", + "source_trajectory_ids": [ + "traj-1d69c6a6-e7d9-446a-911e-96fa227a3c16", + "traj-1dcf8869-c1d1-4fb0-93f0-6fc42870ef9d", + "traj-3f9baf44-61f6-4428-9e9f-8c3f007e522d", + "traj-5018de53-b5b9-4e3e-b160-00c07286de44", + "traj-5dbdcc78-ec34-4272-99b1-bd91c940a8b7", + "traj-a2a2f17f-a173-47b5-859e-bc7774081f6a", + "traj-adf886a2-770f-49bf-b3e2-c6f43d7f138e", + "traj-b2621d11-6b94-45b9-93fb-a777e0d900dc", + "traj-e5f11c2b-56d6-4b1d-b827-e2542af15171", + "traj-f152c0cc-5cc8-4ba1-836f-57bd6d8afe05" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-6ea1be0d-d5b7-40f4-8b56-2519612367c7.json 
b/docs/training-reports/report-6ea1be0d-d5b7-40f4-8b56-2519612367c7.json new file mode 100644 index 0000000..3130e1c --- /dev/null +++ b/docs/training-reports/report-6ea1be0d-d5b7-40f4-8b56-2519612367c7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-6ea1be0d-d5b7-40f4-8b56-2519612367c7", + "timestamp": "2026-04-14T15:29:42.041754+00:00", + "source_trajectory_ids": [ + "traj-0ab04ba8-0292-4c2c-bfa4-8a13a848da6c", + "traj-2db80c63-593c-4d37-b90e-582192c5ba19", + "traj-2f821626-4bb5-4af3-a24d-58c8e188e23e", + "traj-68f4d4af-0e40-48d4-b031-0260bc21effe", + "traj-6b2eb327-63dc-49f6-9e97-9393bce968ab", + "traj-b0ad6eaa-82c7-4abe-b854-b48732b81eaa", + "traj-b544cece-67a9-4920-8c5f-cacf4088ac44", + "traj-c035a938-9de2-4346-8407-6b47d0f38597", + "traj-d2fb2aca-ae01-4c1b-923f-5f1fe9d8e92b", + "traj-f7dbd739-5c42-4ba5-bca2-b4f390c0c277" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-6ecc1806-5caf-4def-b8e7-64f9b44be9fc.json b/docs/training-reports/report-6ecc1806-5caf-4def-b8e7-64f9b44be9fc.json new file mode 100644 index 0000000..7106dea --- /dev/null +++ b/docs/training-reports/report-6ecc1806-5caf-4def-b8e7-64f9b44be9fc.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-6ecc1806-5caf-4def-b8e7-64f9b44be9fc", + "timestamp": "2026-04-14T20:58:05.406666+00:00", + "source_trajectory_ids": [ + "traj-14f93591-742c-468d-b37b-93e19bfcc79f", + "traj-4a9b5ebf-5c1f-442a-a0e4-8f14a4e9f678", + 
"traj-4df1b668-4114-499a-ac9e-fddd57e0b1c8", + "traj-63d5932c-183a-4fb0-95d8-18523c4c486d", + "traj-64f1feba-60b7-40e3-aea8-0d1a1c0defbb", + "traj-687aee87-9150-442b-aed7-45b98399cd02", + "traj-69e4c95d-6d60-4466-a582-d0741d22f04d", + "traj-88339f4c-3c81-4137-b233-eea51c1a6b8e", + "traj-8b107137-35d2-42f3-a5b8-14e0eaa7aa30", + "traj-92c65d68-f839-49ac-8fb7-7a29f9af19f0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205805", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-6ecec7b8-074a-46ef-8b29-3b683738074e.json b/docs/training-reports/report-6ecec7b8-074a-46ef-8b29-3b683738074e.json new file mode 100644 index 0000000..8877268 --- /dev/null +++ b/docs/training-reports/report-6ecec7b8-074a-46ef-8b29-3b683738074e.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-6ecec7b8-074a-46ef-8b29-3b683738074e", + "timestamp": "2026-04-14T17:15:16.862902+00:00", + "source_trajectory_ids": [ + "traj-082da722-82fa-460c-b98f-22fad1f364d8", + "traj-09e7688c-92a9-4910-a0f7-92a0c35eb252", + "traj-1118a441-97ff-4431-8427-c0d5535751a0", + "traj-1ed861c0-1fa5-4052-9dc6-d26c9d738de6", + "traj-65d5dc7d-e9d3-48bf-a3ae-34e55c870195", + "traj-8f0f2c25-2565-4144-889c-7a34adb30d45", + "traj-a04080e7-f1fa-4047-9f58-0cd5231e5e36", + "traj-b69e9885-37d2-4c22-9ac1-b0f7b60b06a9", + "traj-bb6f76bf-a5d5-4cfa-8be9-1d54c21e8591", + "traj-dae57c63-fbd6-4a87-bd9e-4ac58a662543" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, 
+ "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-6f6c29a6-873b-478d-8847-24480f44f8a6.json b/docs/training-reports/report-6f6c29a6-873b-478d-8847-24480f44f8a6.json new file mode 100644 index 0000000..11c4ced --- /dev/null +++ b/docs/training-reports/report-6f6c29a6-873b-478d-8847-24480f44f8a6.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-6f6c29a6-873b-478d-8847-24480f44f8a6", + "timestamp": "2026-04-14T19:19:01.987820+00:00", + "source_trajectory_ids": [ + "traj-0bd48b9e-1baf-4ced-ad03-1a98ce4dbc80", + "traj-1edfc2ce-d806-41cf-87b1-53c49a1c8f44", + "traj-3de2ffc5-5a19-40a5-a1ab-3b9c0d01823f", + "traj-4d76447d-0c25-424d-9863-0d47c54d7736", + "traj-7b87b3fb-9469-4cfc-896d-615c860a1e87", + "traj-888a4065-b13c-4214-8259-40e12210bc35", + "traj-9703ae83-7f2e-4f15-bcc9-03979412bcd6", + "traj-9b5fa2de-1d78-4f95-9e08-f00a367871ec", + "traj-9f10367b-dc6a-4da0-90e7-f2b5e69665ed", + "traj-fda90a71-144b-4109-919f-68ed99b1dcfe" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-191901", + "baseline_version_id": null, + "dry_run": 
false +} \ No newline at end of file diff --git a/docs/training-reports/report-6ff5d88c-42b7-4513-af9b-da0d1cfb5d0e.json b/docs/training-reports/report-6ff5d88c-42b7-4513-af9b-da0d1cfb5d0e.json new file mode 100644 index 0000000..18a6436 --- /dev/null +++ b/docs/training-reports/report-6ff5d88c-42b7-4513-af9b-da0d1cfb5d0e.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-6ff5d88c-42b7-4513-af9b-da0d1cfb5d0e", + "timestamp": "2026-04-14T22:08:19.799797+00:00", + "source_trajectory_ids": [ + "traj-00579592-c3af-41d1-9f1f-7ef863150303", + "traj-486e0adb-b3a7-424b-9803-4adef3754d11", + "traj-4e60a033-8d1d-4d40-97ff-285f1aadb44d", + "traj-56e0b16c-a542-4212-ab69-d60ff62a8579", + "traj-68706353-1cf2-451f-909a-ed43ee8d922a", + "traj-789d830c-1db1-4671-a4c1-060c056b9da7", + "traj-7c4c4cb3-aae7-48c7-ae5e-8bd6263f8e30", + "traj-a822f692-f0a2-433a-ab51-96870b509a22", + "traj-e298e14d-2a48-405e-a530-90b77be9905a", + "traj-feb5e08d-5b97-436e-87e4-ce2d2de84414" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220819", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-7078b6d8-f026-4c36-89d9-12d02f651dd3.json b/docs/training-reports/report-7078b6d8-f026-4c36-89d9-12d02f651dd3.json new file mode 100644 index 0000000..6ac60f8 --- /dev/null +++ b/docs/training-reports/report-7078b6d8-f026-4c36-89d9-12d02f651dd3.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-7078b6d8-f026-4c36-89d9-12d02f651dd3", + "timestamp": "2026-04-14T22:09:38.785450+00:00", + 
"source_trajectory_ids": [ + "traj-031720a7-fea3-4e49-b135-b3b5591307d1", + "traj-427426c8-07fb-49b7-b19a-e59471bb5b03", + "traj-64093076-b764-4a6a-9eea-157054480539", + "traj-8cfbfc9c-4d1c-42f1-9361-38a8db2af9ec", + "traj-9baadeac-6b04-46b5-b132-e963061732b6", + "traj-acdc31dd-34e4-4016-906e-77b711e81c66", + "traj-b09d085d-c4c4-4df3-89a1-3980140aa96e", + "traj-c31478e4-761a-43e0-b895-6a33b7eebb40", + "traj-d1bccd73-8fb2-4ced-a684-773fd49ee78e", + "traj-fb63c834-a6f7-495a-b9ca-a232678a981f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220938", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-716d3658-e56e-475c-b87d-1904b15184e8.json b/docs/training-reports/report-716d3658-e56e-475c-b87d-1904b15184e8.json new file mode 100644 index 0000000..ac103a6 --- /dev/null +++ b/docs/training-reports/report-716d3658-e56e-475c-b87d-1904b15184e8.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-716d3658-e56e-475c-b87d-1904b15184e8", + "timestamp": "2026-04-14T16:53:16.879667+00:00", + "source_trajectory_ids": [ + "traj-3fb70daf-c014-498c-8066-490d02db130b", + "traj-6954f688-1c63-4cc2-ae3c-1082dfc4c74a", + "traj-6cb082e1-f2af-495d-b463-43e725544f22", + "traj-73af4833-54c5-43b8-8c16-0e3bc4598b6f", + "traj-89b55a45-581f-4bbf-854a-9eff339e8b7f", + "traj-8e73d10b-a90a-41fc-be34-f348129d0c9a", + "traj-d3aba8b1-b712-44db-8f32-bc002a83e482", + "traj-e537400c-4cab-4ac1-9dc2-887f70ba9e19", + "traj-edc9669c-9425-4802-9d91-93ec63388e0a", + 
"traj-f006c107-3802-4eb1-9824-3b19415b0cc9" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165316" +} \ No newline at end of file diff --git a/docs/training-reports/report-71d18e5e-93f1-4973-b205-5e21ae4cf132.json b/docs/training-reports/report-71d18e5e-93f1-4973-b205-5e21ae4cf132.json new file mode 100644 index 0000000..313da04 --- /dev/null +++ b/docs/training-reports/report-71d18e5e-93f1-4973-b205-5e21ae4cf132.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-71d18e5e-93f1-4973-b205-5e21ae4cf132", + "timestamp": "2026-04-14T21:44:48.185492+00:00", + "source_trajectory_ids": [ + "traj-07cc6d5b-8d85-4ac6-9d4b-4bfc129ab4e5", + "traj-35e4dc69-fbcb-4378-88e1-02f93fe78a31", + "traj-422c85ac-d2e1-4587-b3d1-c87881007757", + "traj-6f5c6331-735e-49ee-b402-062b3e711058", + "traj-8555bf2a-03c1-4d66-bd82-6edd583564f7", + "traj-98976a8d-1e1e-4e15-887c-567487a5a8bb", + "traj-b5331542-8962-4f0e-8be7-bda2dbe60fb0", + "traj-d63b7563-2553-4440-b637-72a225eef4e0", + "traj-d8334f2e-5b4c-47ab-bba9-49a3fa2f4b27", + "traj-ddc27f2e-73ed-4fbe-ab56-5c612dcebfe3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + 
"challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214448", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-71df18e0-e98e-4663-9279-46bd619196be.json b/docs/training-reports/report-71df18e0-e98e-4663-9279-46bd619196be.json new file mode 100644 index 0000000..03927f9 --- /dev/null +++ b/docs/training-reports/report-71df18e0-e98e-4663-9279-46bd619196be.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-71df18e0-e98e-4663-9279-46bd619196be", + "timestamp": "2026-04-15T01:36:36.536819+00:00", + "source_trajectory_ids": [ + "traj-155f162e-c6eb-4b33-82f8-ddab0dd2d63f", + "traj-15d916ea-8e00-4301-afd8-69c8371ca19d", + "traj-45d60fba-bb81-4fe8-b1a2-fc39ee4e8175", + "traj-45e653cb-f3f1-4e4a-bf5a-879a1d45c0f6", + "traj-ac56a61d-3b28-4fc5-89fe-1585d7a095df", + "traj-b95675d2-2eef-4563-b00f-0ab16366164d", + "traj-e23436a3-9fbf-450a-9c68-d5b9ce50b6ec", + "traj-f180b809-b918-4227-9b34-1c4f6c9448e2", + "traj-f1c411ee-fd4a-4e25-8fc9-ec36dae50b69", + "traj-f8fb5253-f07a-44de-a8c4-87123e70f689" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013636", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-72b168b7-0904-4df5-8c0e-40e83d404554.json b/docs/training-reports/report-72b168b7-0904-4df5-8c0e-40e83d404554.json new file mode 100644 index 0000000..bf4d90b --- /dev/null +++ 
b/docs/training-reports/report-72b168b7-0904-4df5-8c0e-40e83d404554.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-72b168b7-0904-4df5-8c0e-40e83d404554", + "timestamp": "2026-04-15T01:57:32.711488+00:00", + "source_trajectory_ids": [ + "traj-10e25201-4b4b-4019-90b1-83d1680be21e", + "traj-32e4d84c-a667-4a47-9298-0125e963c079", + "traj-52faa2ef-b9ee-4ecb-81bd-ade6c2469dd3", + "traj-57abbb97-de7e-454e-b71b-5903aafc818d", + "traj-7f3fe1da-8dbf-456d-9d56-e6a4685a4072", + "traj-8bf39705-498f-4531-80d3-484748a4679f", + "traj-ade910a9-6578-4a87-89ea-2fff0db2f89c", + "traj-cc534e68-0d81-4ebd-8f2c-31740d8015a6", + "traj-d05c409b-7536-4bf7-b012-7677cfed5906", + "traj-f40e3dd4-496a-47bc-8671-befc06b8f071" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-015732", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-73683712-e416-4693-9f7c-85f78bc5c34b.json b/docs/training-reports/report-73683712-e416-4693-9f7c-85f78bc5c34b.json new file mode 100644 index 0000000..f281ac6 --- /dev/null +++ b/docs/training-reports/report-73683712-e416-4693-9f7c-85f78bc5c34b.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-73683712-e416-4693-9f7c-85f78bc5c34b", + "timestamp": "2026-04-14T22:08:57.531386+00:00", + "source_trajectory_ids": [ + "traj-0bea8430-0d7a-458c-8389-103a79e81a6f", + "traj-240d14c8-0d5e-4e90-9752-eab26315ea26", + "traj-40dc0cbc-37c3-48d5-94fa-85153d444a20", + "traj-5acb9c23-702b-4be4-ac29-a6052782260b", + 
"traj-5b5167db-b303-48bb-b78f-14f8c140633c", + "traj-6e2a8741-7e18-48e1-aa1d-d60d7fcc143f", + "traj-aa1c75b4-20aa-4d03-a45c-30a0850eabe2", + "traj-b1b01b2a-dd5f-4203-bf99-0aecbbced7c5", + "traj-dff09def-3106-44ad-a2fa-41cb9a5e9778", + "traj-e21ef421-3e49-4f23-843b-7c2cf17ae89d" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220857", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-7576a115-0b27-46f0-b9a8-51b9a18a0fe6.json b/docs/training-reports/report-7576a115-0b27-46f0-b9a8-51b9a18a0fe6.json new file mode 100644 index 0000000..742021f --- /dev/null +++ b/docs/training-reports/report-7576a115-0b27-46f0-b9a8-51b9a18a0fe6.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-7576a115-0b27-46f0-b9a8-51b9a18a0fe6", + "timestamp": "2026-04-14T18:06:25.465766+00:00", + "source_trajectory_ids": [ + "traj-08a6168f-f0b7-42bb-aef8-ac2215c45cad", + "traj-0df55922-a115-41b3-a721-e861afe1fb5b", + "traj-3209ba26-ea03-4788-8bd7-9b5fb6aaf830", + "traj-3845105a-b0fb-4a28-8322-002af1fdc5f3", + "traj-5b1ca6c3-3cbc-4c40-91e4-918988937861", + "traj-8f00713c-623b-430f-a3f3-bb0929291c72", + "traj-b0d7dae3-3d72-43ce-9c86-f9407b257b75", + "traj-b470b811-25ca-4c7c-b217-bd92d415c196", + "traj-dafe6056-749b-4aae-9984-e640634ac83a", + "traj-fda175cf-e74b-4e4b-ac9a-99a3189ff6cf" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, 
+ "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180625" +} \ No newline at end of file diff --git a/docs/training-reports/report-75df8940-51be-4562-afe7-0f1f374219b7.json b/docs/training-reports/report-75df8940-51be-4562-afe7-0f1f374219b7.json new file mode 100644 index 0000000..baba9a6 --- /dev/null +++ b/docs/training-reports/report-75df8940-51be-4562-afe7-0f1f374219b7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-75df8940-51be-4562-afe7-0f1f374219b7", + "timestamp": "2026-04-14T20:57:28.014592+00:00", + "source_trajectory_ids": [ + "traj-27837c4d-d551-459c-a927-b0e32bec2891", + "traj-66a29122-5fbd-4e9b-a175-53382743ae4f", + "traj-7803135e-0e45-4ee8-a903-3920eda92711", + "traj-782f247f-39c1-464f-a5d6-56e7f9b444a7", + "traj-8a851754-1403-488d-a3e2-b3a2765ded34", + "traj-8cba8586-5cad-4c07-bbbb-f6c8f93a8895", + "traj-a6632a98-e33a-477d-b148-af244c6386bf", + "traj-cc07e16e-8ff7-4c82-b234-77d8335242d8", + "traj-e7327928-4089-49bb-9060-3828e65f83fe", + "traj-f49e6b3e-6ad8-459e-ac2f-9bc3a3297517" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205728", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-76faf179-ae5b-489f-ba05-ba89f22aa9ca.json 
b/docs/training-reports/report-76faf179-ae5b-489f-ba05-ba89f22aa9ca.json new file mode 100644 index 0000000..5ef012f --- /dev/null +++ b/docs/training-reports/report-76faf179-ae5b-489f-ba05-ba89f22aa9ca.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-76faf179-ae5b-489f-ba05-ba89f22aa9ca", + "timestamp": "2026-04-14T19:19:02.003460+00:00", + "source_trajectory_ids": [ + "traj-06cce6e6-05e6-4a4a-8254-8b7d21a43e58", + "traj-38840dd8-8d74-4d1e-95ab-74b45736838d", + "traj-432599a4-b898-4b86-9025-ecd601192efa", + "traj-a7dfcce9-8068-4b98-972d-7cb2975d6224", + "traj-b9b42927-828b-40d9-806b-8cd1b211ae9d", + "traj-bf1b9759-81c5-403a-8548-e1d674ed10de", + "traj-d16ffb2b-0768-49fa-8f19-9d7b57f5af08", + "traj-d9afd41e-580e-4efc-82ee-e2f525f39b59", + "traj-e57518db-770e-4dcc-9507-04bf043c240f", + "traj-fd97119b-4e71-457d-8fad-d00d7c291454" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-191902", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-774e00aa-847e-4329-b4e4-6745a7510deb.json b/docs/training-reports/report-774e00aa-847e-4329-b4e4-6745a7510deb.json new file mode 100644 index 0000000..4d38320 --- /dev/null +++ b/docs/training-reports/report-774e00aa-847e-4329-b4e4-6745a7510deb.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-774e00aa-847e-4329-b4e4-6745a7510deb", + "timestamp": "2026-04-14T18:28:06.116516+00:00", + "source_trajectory_ids": [ + "traj-0680f16c-a3e2-43d9-92bc-413943c87fb2", + 
"traj-11dd5532-5b59-4562-aba9-ec1e3fb314c9", + "traj-45cddb43-6197-4bfc-88b5-94e0b896054f", + "traj-4de4cc12-7e2d-45cb-a523-1141dbeb15e2", + "traj-72bc5ccc-1dfd-4d8f-a5be-4198aacad1dc", + "traj-95c5c1ae-ef96-40b5-8f0b-a9e156d3657a", + "traj-976dcb41-a7c5-473a-b907-753fe7c6721a", + "traj-c8100fdb-052f-401e-b04f-b99a62d84289", + "traj-c9ba219d-f53d-4292-a2e5-f91378600a09", + "traj-e7644f00-be04-4d15-9489-7b3668004f36" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-77d96a9c-b260-4ae8-b3e8-444604bf75f8.json b/docs/training-reports/report-77d96a9c-b260-4ae8-b3e8-444604bf75f8.json new file mode 100644 index 0000000..35131da --- /dev/null +++ b/docs/training-reports/report-77d96a9c-b260-4ae8-b3e8-444604bf75f8.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-77d96a9c-b260-4ae8-b3e8-444604bf75f8", + "timestamp": "2026-04-14T16:53:16.938055+00:00", + "source_trajectory_ids": [ + "traj-0f512c57-990c-4129-bfdd-e2cf71cee0cd", + "traj-3d5e314b-b7bd-42ec-9f3f-546c0f1556fc", + "traj-6846c005-3a1d-4dd5-86d0-4fd770d0befb", + "traj-68ce854e-0101-4ddd-9a3f-7ebf3e2a328a", + "traj-6d63a4e3-9db1-423b-b014-724509fb2efd", + "traj-8058f54c-1268-4908-9234-ff9fa6b6f886", + "traj-ccab552c-e82c-459a-95c2-f777bcb821be", + "traj-ce59c4db-28ec-4aa3-ac23-00ecc5046e11", + "traj-e93a74a3-f222-436f-bd4c-31001be3add3", + "traj-fad54386-752b-4be8-a770-72c86b3a0301" + ], + "sample_count": 10, + "baseline_metrics": { + 
"task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-78c72165-f016-49da-bd17-edcaf3d0fe93.json b/docs/training-reports/report-78c72165-f016-49da-bd17-edcaf3d0fe93.json new file mode 100644 index 0000000..6c62ea8 --- /dev/null +++ b/docs/training-reports/report-78c72165-f016-49da-bd17-edcaf3d0fe93.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-78c72165-f016-49da-bd17-edcaf3d0fe93", + "timestamp": "2026-04-14T22:09:38.807229+00:00", + "source_trajectory_ids": [ + "traj-1bebede9-ff62-4edf-88f7-4b7c8c01db7e", + "traj-22856fc1-cd85-43a0-8df6-ce9493104a32", + "traj-369510fa-53f7-42ba-afec-ddacb87240cd", + "traj-4e5fa0e0-da1a-49a1-bcc2-553f2414285c", + "traj-503891d2-34f4-4ae2-b7b9-c72426249514", + "traj-561ee822-6e49-4800-b88a-122054d153b8", + "traj-87b9f160-8f3e-4fd8-b096-420d56a6701a", + "traj-9d6f08f3-7494-4235-a955-1b692ed56477", + "traj-d082d1a8-aebc-402a-b8cc-b5d4ea291ccf", + "traj-f3e5cf58-8a67-47a9-a3c3-166628b75865" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": 
"20260414-220938", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-78cd7466-8da2-476d-a066-ad24d259993b.json b/docs/training-reports/report-78cd7466-8da2-476d-a066-ad24d259993b.json new file mode 100644 index 0000000..b4cf881 --- /dev/null +++ b/docs/training-reports/report-78cd7466-8da2-476d-a066-ad24d259993b.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-78cd7466-8da2-476d-a066-ad24d259993b", + "timestamp": "2026-04-14T20:02:20.878572+00:00", + "source_trajectory_ids": [ + "traj-06e3f022-7237-4f78-a932-446c23355912", + "traj-14d729fc-5e4e-4ddf-be4d-1cdf0925404e", + "traj-50af15e1-95d7-4712-b1ea-e4451b1f4f0e", + "traj-6265a728-1cda-4ce1-a43e-d4dbe5c11617", + "traj-7a4a0e3d-cbef-4eb1-a5af-d18a7f25fbcb", + "traj-8ffc2bc7-5b72-4bc7-b1b1-f534e08d2175", + "traj-b11030bf-9f24-4d96-8f73-f3b2ff2ad84a", + "traj-c830bc23-3c04-4440-b45f-ef27207f95d8", + "traj-cc66bddc-541e-46b4-b8c6-6139e4a399ca", + "traj-fe47844e-144c-4b1b-9e0f-79377dff71f0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-79ed3a93-77a0-43f0-8292-2bcf92efece9.json b/docs/training-reports/report-79ed3a93-77a0-43f0-8292-2bcf92efece9.json new file mode 100644 index 0000000..d68565f --- /dev/null +++ b/docs/training-reports/report-79ed3a93-77a0-43f0-8292-2bcf92efece9.json @@ -0,0 +1,45 @@ +{ + 
"report_id": "report-79ed3a93-77a0-43f0-8292-2bcf92efece9", + "timestamp": "2026-04-14T22:08:57.523349+00:00", + "source_trajectory_ids": [ + "traj-0ea806a1-936e-4175-a018-12f69a729e23", + "traj-0eaa5a43-8655-49ec-9c6d-622f479c6835", + "traj-148638fa-2385-407e-9916-e7ddeddc1c27", + "traj-1c46e94e-5b91-467b-8b4e-1d57480cc745", + "traj-20925b8c-f507-4085-baa2-ab417bc7d9c2", + "traj-59d8245c-88c5-4b47-9d4d-0f9560fd18e4", + "traj-90e42a4c-df1f-4586-9bbb-ce430321fe6f", + "traj-a7604864-dc57-4681-b654-73f9253ad476", + "traj-af7b7001-38e6-4067-aa03-46f0466d9a9a", + "traj-b049ab69-573a-462f-bc7b-0c5867825251" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-7b49086a-603a-475c-b68b-f1f91dadd5f7.json b/docs/training-reports/report-7b49086a-603a-475c-b68b-f1f91dadd5f7.json new file mode 100644 index 0000000..5af77dd --- /dev/null +++ b/docs/training-reports/report-7b49086a-603a-475c-b68b-f1f91dadd5f7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-7b49086a-603a-475c-b68b-f1f91dadd5f7", + "timestamp": "2026-04-14T20:04:58.773686+00:00", + "source_trajectory_ids": [ + "traj-02be2bfc-7a5c-4a78-a3bd-46c947889687", + "traj-40f61062-4dfc-485d-93dd-a0f5cff744f1", + "traj-479b19c2-ceef-45ff-8765-4b88448470d6", + "traj-70b19941-f21d-4e73-85c6-b007cc4a01d9", + "traj-7de17462-21f7-4ed2-9e1a-98978596cb95", + "traj-9c681ffd-1c59-4eb9-b297-766896c09b44", + 
"traj-ad285c8c-036c-49cf-8c18-47595e1beb1d", + "traj-b433a2d0-39b3-4743-a619-7d26d6ba7170", + "traj-d0689fdc-4a0a-45a6-bb87-84b4907f09d8", + "traj-d5a5a060-9280-429e-adf0-25a31aff7503" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200458", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-7caea702-2511-4a95-a864-3689aa2ad0d9.json b/docs/training-reports/report-7caea702-2511-4a95-a864-3689aa2ad0d9.json new file mode 100644 index 0000000..8bf9088 --- /dev/null +++ b/docs/training-reports/report-7caea702-2511-4a95-a864-3689aa2ad0d9.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-7caea702-2511-4a95-a864-3689aa2ad0d9", + "timestamp": "2026-04-14T15:53:50.734704+00:00", + "source_trajectory_ids": [ + "traj-39cac35e-7c25-4060-a318-abc3b5348d4e", + "traj-51e0c7e0-92c9-48c7-a729-68ba3938f6b4", + "traj-56d1aa4a-1405-42bb-8f92-2a40256550c5", + "traj-58f3999c-11a3-4ec7-8045-ecf9c8a975e4", + "traj-5b061c74-cf63-4e25-be8e-57d5d37ba1fa", + "traj-6f128a84-d017-4a7d-9fd8-542a4bea0dce", + "traj-782f6872-920c-44ea-bb75-bdd6f8183a3a", + "traj-c01ae2e8-be7f-45d9-a96f-347cdac3df72", + "traj-d65b30df-971a-44e7-a639-e27b43547483", + "traj-edaf3d1f-5dac-4be8-a83a-741a5b5a573a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-155350" +} \ No newline at end of file diff --git a/docs/training-reports/report-7d902557-f70d-4fef-9d62-6aa342e8e377.json b/docs/training-reports/report-7d902557-f70d-4fef-9d62-6aa342e8e377.json new file mode 100644 index 0000000..b4625b7 --- /dev/null +++ b/docs/training-reports/report-7d902557-f70d-4fef-9d62-6aa342e8e377.json @@ -0,0 +1,44 @@ +{ + "report_id": "report-7d902557-f70d-4fef-9d62-6aa342e8e377", + "timestamp": "2026-04-14T18:30:24.441520+00:00", + "source_trajectory_ids": [ + "traj-1223e20d-cc08-4cde-a091-d4d93076754f", + "traj-1c5231fe-2568-46fe-bede-5df699d68252", + "traj-2def2337-04e5-460e-b304-c56cd94cfa0c", + "traj-866ba769-fd58-4198-8cbf-b58536a6c8ad", + "traj-a4ac7fac-fb17-48aa-8ad3-29ab4ff2d82b", + "traj-c855a1b9-2b55-4ff8-8860-bdffa0007d9b", + "traj-cc900807-0dd5-4678-93e9-b7a5fcbb115b", + "traj-d7a83b94-ab03-47ae-bd34-43e3569b7661", + "traj-dcf39e28-ba1d-4f59-96ce-8f2f37392e0f", + "traj-f2106a1a-e7d9-4eed-adba-3b4efb403870" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-7e1496f0-7f0c-4a63-b692-43cfbc20fa08.json b/docs/training-reports/report-7e1496f0-7f0c-4a63-b692-43cfbc20fa08.json new 
file mode 100644 index 0000000..9f72d6b --- /dev/null +++ b/docs/training-reports/report-7e1496f0-7f0c-4a63-b692-43cfbc20fa08.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-7e1496f0-7f0c-4a63-b692-43cfbc20fa08", + "timestamp": "2026-04-14T18:06:58.447400+00:00", + "source_trajectory_ids": [ + "traj-24b4113e-b5b7-4e95-8d60-cf8bffd93a31", + "traj-2b78cd34-d97f-47c5-a969-76261eb42302", + "traj-3a298882-04a8-4499-abc9-2cb86807d866", + "traj-45aeb4e1-7356-40b7-a3bb-a6a04c45053e", + "traj-4e4cb938-5ad3-4404-b4bd-470fe315478a", + "traj-52a8714e-18d3-4e6a-a86a-537d7fd019e0", + "traj-58810fbe-1d12-4865-873f-f305864cd277", + "traj-7ab1758b-3dc7-4fe4-ae8a-925b996b5b27", + "traj-d1e97442-2516-4929-8ca3-9d9c94fa38dc", + "traj-f4e5e924-7f05-4633-87a8-f14aeab64e4e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-7e50ca6a-54a5-4bbd-81af-df48c6e8914f.json b/docs/training-reports/report-7e50ca6a-54a5-4bbd-81af-df48c6e8914f.json new file mode 100644 index 0000000..22a0da4 --- /dev/null +++ b/docs/training-reports/report-7e50ca6a-54a5-4bbd-81af-df48c6e8914f.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-7e50ca6a-54a5-4bbd-81af-df48c6e8914f", + "timestamp": "2026-04-14T20:57:28.147337+00:00", + "source_trajectory_ids": [ + "traj-0d7435b2-984e-40a8-8edb-adf22890d7b8", + "traj-0f38e5b4-f39c-4d9f-afcc-717e81dbf51d", + "traj-16289330-a579-4aa4-a189-c9b31aeae31a", + "traj-3f346979-7403-4fe7-806e-991fc05eac08", + 
"traj-804e7f40-b6a6-41c0-bfbd-cee5faca7126", + "traj-8ec9f4da-e7f5-45d4-99ce-d17b78277424", + "traj-964f433b-b7d5-44d3-9ed9-0acf65540800", + "traj-bc2474b1-5269-4ddf-8142-a2d93ac72616", + "traj-e8a0b6c5-48bf-47c8-b33a-41a88c735c42", + "traj-fb49b641-4ab6-4415-a853-1cb76506cb8e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205728", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-7ed3bf36-e982-41d3-959a-1156ffd1999a.json b/docs/training-reports/report-7ed3bf36-e982-41d3-959a-1156ffd1999a.json new file mode 100644 index 0000000..d6f1e64 --- /dev/null +++ b/docs/training-reports/report-7ed3bf36-e982-41d3-959a-1156ffd1999a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-7ed3bf36-e982-41d3-959a-1156ffd1999a", + "timestamp": "2026-04-15T01:36:36.549118+00:00", + "source_trajectory_ids": [ + "traj-1cfffe68-99f3-4b79-acaf-492f98bfb30e", + "traj-1d01aaea-2f82-4aab-808e-22b582cf3804", + "traj-46735e8f-ebaf-4e64-b7ff-fadbc7661eda", + "traj-4ed1dd60-007c-4b67-9bdb-5bc25301e439", + "traj-73c8466c-8313-4ac0-af99-97792bee79a7", + "traj-7df1403a-6441-486a-9486-5e8a09430c42", + "traj-905c6036-4972-407f-bec0-d0ceeeb0168f", + "traj-95a86969-c5c1-48a5-81f1-2f767b79ae75", + "traj-b11258b0-0969-48b0-96a3-d1dc48d6d4ba", + "traj-b6c9f538-3585-4653-a7dd-64ee41deb323" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, 
+ "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013636", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-80957f87-ab7d-48b9-a539-135f6a2264c3.json b/docs/training-reports/report-80957f87-ab7d-48b9-a539-135f6a2264c3.json new file mode 100644 index 0000000..93d61cb --- /dev/null +++ b/docs/training-reports/report-80957f87-ab7d-48b9-a539-135f6a2264c3.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-80957f87-ab7d-48b9-a539-135f6a2264c3", + "timestamp": "2026-04-14T20:34:01.478949+00:00", + "source_trajectory_ids": [ + "traj-03c82893-5c8d-4a00-8d7d-fab40579de5c", + "traj-3556fa38-6b54-4852-a431-de7c4c8bf65c", + "traj-46299d00-2b46-4ad4-a54e-62da45757215", + "traj-489c5c00-15d0-41c9-876d-1cf33499efb6", + "traj-5b197c9d-6186-4c47-8b04-ffd9febc4c5e", + "traj-71b1132e-4113-45ce-a58f-4a433ea7b378", + "traj-8c57805b-3f69-4b28-ad9c-b40a0ac62980", + "traj-ca0abd8a-3477-4155-b959-b96fdfeafabd", + "traj-d4f2ec7a-e604-4fa1-a271-aa3c3578787b", + "traj-d78066c1-6ad5-4fe1-8f21-882a2eb1fb50" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203401", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-80e0b670-ea2a-47b3-ab54-6481afcb7c23.json b/docs/training-reports/report-80e0b670-ea2a-47b3-ab54-6481afcb7c23.json new file mode 100644 index 0000000..79ab97a --- /dev/null +++ b/docs/training-reports/report-80e0b670-ea2a-47b3-ab54-6481afcb7c23.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-80e0b670-ea2a-47b3-ab54-6481afcb7c23", + "timestamp": "2026-04-14T22:10:23.390552+00:00", + "source_trajectory_ids": [ + "traj-3a34ed0e-6006-4fcf-a401-d583ea95266e", + "traj-3ee70787-9427-42c4-9993-e6a7406401c7", + "traj-3f0b5592-b637-4d45-9a00-722be627c706", + "traj-495ace0b-e88e-4a49-99ea-1e9982bfdb9f", + "traj-6dafc03e-d667-439d-b691-237a437a0cd2", + "traj-74950b70-551f-4ef0-9d50-d36552403354", + "traj-837ddc95-4152-461c-9bd7-fd789734a9d4", + "traj-9ce5a24d-35f4-49fb-932d-8053c8ccfea3", + "traj-c1f1da9d-c325-4207-b54c-f0a79575ab02", + "traj-cfcbeec2-de79-4382-83be-ed7382811289" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221023", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-813a3b23-c617-45bd-bcd6-e941054922cb.json b/docs/training-reports/report-813a3b23-c617-45bd-bcd6-e941054922cb.json new file mode 100644 index 0000000..169301a --- /dev/null +++ b/docs/training-reports/report-813a3b23-c617-45bd-bcd6-e941054922cb.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-813a3b23-c617-45bd-bcd6-e941054922cb", + "timestamp": "2026-04-14T18:57:10.511393+00:00", + "source_trajectory_ids": [ + 
"traj-11604d98-9fda-4d93-b95a-b8890c081387", + "traj-41d5bde2-6c0c-4d64-a950-caa5dab1ec28", + "traj-5c7b8795-4ce4-45f7-8672-dc60e183adb5", + "traj-7e40294c-b7b4-4556-ae27-f505f784b349", + "traj-96987742-db16-4bbe-8a86-c3b8b8a1b200", + "traj-aa145147-f81e-47b7-9e7a-1e5d854d7709", + "traj-c1e9f16a-65d1-4e40-9dac-bf9395931317", + "traj-e0256034-a98c-4f72-b038-2c7cefb311d3", + "traj-ea461c8f-b1f9-4eee-86c3-dbf6f899f5b2", + "traj-eba44f0c-d723-4561-b9fc-85a3343a0144" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185710", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-81576e60-8de8-4f78-bfbc-d9868a297c74.json b/docs/training-reports/report-81576e60-8de8-4f78-bfbc-d9868a297c74.json new file mode 100644 index 0000000..c81faed --- /dev/null +++ b/docs/training-reports/report-81576e60-8de8-4f78-bfbc-d9868a297c74.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-81576e60-8de8-4f78-bfbc-d9868a297c74", + "timestamp": "2026-04-14T19:41:33.388720+00:00", + "source_trajectory_ids": [ + "traj-0ba4e422-2a4a-437b-b206-7c75cd7b1fd2", + "traj-130fb6f1-d212-45bd-a37d-c0d4679f3824", + "traj-5deb5b82-ffc9-4609-a1fe-201d34ab4f84", + "traj-776a9217-14a1-47ee-8931-fc68ecd29df4", + "traj-8b25fac4-7465-4cfd-9e04-3867a7564d89", + "traj-9e9bcd8b-4283-405c-b7da-f05ba2078152", + "traj-adf8cfbc-bc31-46cf-8371-8329f7781625", + "traj-c1872f64-599f-4e00-94af-ab3082573351", + "traj-c1a75b20-8145-4744-bfb7-5cc4f38814d2", + 
"traj-e71f334a-047d-4976-b2b9-17156cefa676" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-194133", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-83931755-d4b6-4ffc-a3ba-bb42a57c2703.json b/docs/training-reports/report-83931755-d4b6-4ffc-a3ba-bb42a57c2703.json new file mode 100644 index 0000000..98f5b14 --- /dev/null +++ b/docs/training-reports/report-83931755-d4b6-4ffc-a3ba-bb42a57c2703.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-83931755-d4b6-4ffc-a3ba-bb42a57c2703", + "timestamp": "2026-04-14T20:32:37.622875+00:00", + "source_trajectory_ids": [ + "traj-1b10bd44-58ca-4529-ac5a-450101520ead", + "traj-1c07e980-c3b3-4d31-8a92-6917518f6d0f", + "traj-26404c84-9032-48e9-a252-5e5c759d3ed9", + "traj-5032358e-b033-477c-aa80-6d443eeec3e5", + "traj-525efb64-bd21-4e59-b0b4-7e95d5ee8bc3", + "traj-5c574fba-966a-44f9-907c-71203298ce82", + "traj-7a0c61a4-6e75-4269-82c7-ee6cf735bb8a", + "traj-98490501-2251-4637-a29e-3dd00705bbc3", + "traj-bf6cb52a-66f2-4455-a212-0fa93adabafa", + "traj-fcc0b20d-a2e0-4493-9960-a284d8df8c75" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, 
+ "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203237", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-846503cc-7539-411b-bec8-797423739b33.json b/docs/training-reports/report-846503cc-7539-411b-bec8-797423739b33.json new file mode 100644 index 0000000..b4528d0 --- /dev/null +++ b/docs/training-reports/report-846503cc-7539-411b-bec8-797423739b33.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-846503cc-7539-411b-bec8-797423739b33", + "timestamp": "2026-04-14T21:21:15.060958+00:00", + "source_trajectory_ids": [ + "traj-037ac519-ba97-4b7a-97d5-f97f9b85a151", + "traj-3d24b7c0-5651-4fee-a4e8-fa304770b9cd", + "traj-472a907c-e919-46db-8f7d-20f3158fb8b9", + "traj-613362df-8139-4c00-a325-75fcb3d43d0f", + "traj-7fc820e9-f38d-4e3a-832b-6e1789e754bc", + "traj-834a1362-b9c3-425f-a814-3e37c0d9b3b4", + "traj-94fdc828-4d7b-4174-813d-64937f461ff8", + "traj-c6477509-0608-44ed-8044-6178c3840771", + "traj-ce1d2299-fd21-4b8f-9c52-63ace4b36ad9", + "traj-d0a029ec-7272-4ba1-9c5d-ff6339a88cd5" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212115", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-84999c85-add6-4217-a619-d7e9b885ebd4.json b/docs/training-reports/report-84999c85-add6-4217-a619-d7e9b885ebd4.json new file mode 100644 index 0000000..e5be4a3 --- /dev/null +++ 
b/docs/training-reports/report-84999c85-add6-4217-a619-d7e9b885ebd4.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-84999c85-add6-4217-a619-d7e9b885ebd4", + "timestamp": "2026-04-14T21:22:14.902538+00:00", + "source_trajectory_ids": [ + "traj-25d53d2d-ad4c-47ef-b69f-7775319a0425", + "traj-42e48398-78b7-4d4c-9857-2e508d277631", + "traj-84cb72ab-2972-4243-a4c7-92aa08e25061", + "traj-99c0ca36-dc99-4a93-8dcb-15eb9d4d42b4", + "traj-b3ab1fd1-e41f-48a2-9a96-366685a6ee6b", + "traj-b99760af-283f-4abc-b9d6-08e7ae18be5a", + "traj-cc1b2fea-8ec3-4258-a76d-f78ea128e199", + "traj-d252955f-ad96-4cd3-b56e-08907e50aefa", + "traj-f029e126-7286-45ef-8877-6a6361f13949", + "traj-fff66343-2d72-43ff-b5a4-0c5bf6f6945f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212214", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-861ec0ac-12df-4364-8353-833b01551326.json b/docs/training-reports/report-861ec0ac-12df-4364-8353-833b01551326.json new file mode 100644 index 0000000..7b4c599 --- /dev/null +++ b/docs/training-reports/report-861ec0ac-12df-4364-8353-833b01551326.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-861ec0ac-12df-4364-8353-833b01551326", + "timestamp": "2026-04-14T20:28:05.488890+00:00", + "source_trajectory_ids": [ + "traj-076d450f-b0f8-4056-b3e8-6a6bd4358853", + "traj-109d9788-f729-4c43-b1a8-fd9760e58823", + "traj-10cd4d15-2952-4d92-9d69-9bae45de014b", + "traj-33a631a9-9133-4de1-8076-09b7a492a98b", + 
"traj-3c98074c-1094-4dd9-bc2b-ce86893a8020", + "traj-6dd93046-7964-4d07-8a68-0f3e21ffb971", + "traj-b632f2e8-de53-48d0-b80f-ce84f693585f", + "traj-c0bbb5e5-14c1-4b13-b104-a383a388da99", + "traj-c3a0367b-b9b6-4fd7-bf96-9e300c213a8d", + "traj-df9f86c0-b697-48e7-b4ee-284708dd389f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-202805", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-8632d57e-b8cd-478e-9a0f-67e80fdb083a.json b/docs/training-reports/report-8632d57e-b8cd-478e-9a0f-67e80fdb083a.json new file mode 100644 index 0000000..87dd2d9 --- /dev/null +++ b/docs/training-reports/report-8632d57e-b8cd-478e-9a0f-67e80fdb083a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-8632d57e-b8cd-478e-9a0f-67e80fdb083a", + "timestamp": "2026-04-15T01:41:52.320732+00:00", + "source_trajectory_ids": [ + "traj-0a0b4023-fc35-42f9-b6ba-3702f51d1ee5", + "traj-0e1a3b4d-bd5e-4b77-abd3-8d919381c39b", + "traj-3d790042-c656-4156-bde8-c09f26ca5e29", + "traj-3ec762af-785f-45d3-8fc4-97e5552d87a3", + "traj-46bdb0d8-e0f4-4f40-a827-bb5af885ac63", + "traj-5263feac-df7e-46eb-9d43-c7a2854dadce", + "traj-9c78fed2-f613-4fa3-9cf6-149392bf85db", + "traj-a8aa0a98-39cf-4967-92cf-3989f9160491", + "traj-ac9f41a7-bd74-487b-a0ee-099f16376dde", + "traj-b8b25237-11ef-4d11-b103-19336da2a915" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + 
"task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-014152", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-8648638d-4733-4f01-b253-ed8bdd622f31.json b/docs/training-reports/report-8648638d-4733-4f01-b253-ed8bdd622f31.json new file mode 100644 index 0000000..1a702de --- /dev/null +++ b/docs/training-reports/report-8648638d-4733-4f01-b253-ed8bdd622f31.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-8648638d-4733-4f01-b253-ed8bdd622f31", + "timestamp": "2026-04-14T21:42:45.793182+00:00", + "source_trajectory_ids": [ + "traj-191d8496-6a70-498f-a9b8-0ed466282bef", + "traj-1ce69adc-9039-4855-a1c8-0f09bc98e480", + "traj-5ba7c32d-3f3f-48e3-a21e-ca00432b6336", + "traj-6cdc3520-b9bf-4755-bdbd-0d557bfd6861", + "traj-8d5974c0-1ff8-4d40-833a-c34092a41190", + "traj-a6673469-e3bc-450b-bba3-705533546f28", + "traj-c3f46d8b-05a2-4f61-9bfe-611178e853ac", + "traj-dc2894ca-7ce8-4832-b202-4e881ec6c321", + "traj-ed936ed8-a13c-484c-a81b-9660ac914a77", + "traj-f24b1f5a-c880-4e57-aa0b-aa925ff7b3b8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214245", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-87e8c52b-b9d4-4313-9427-6ff15435f288.json b/docs/training-reports/report-87e8c52b-b9d4-4313-9427-6ff15435f288.json new file mode 100644 index 0000000..f494641 --- /dev/null +++ b/docs/training-reports/report-87e8c52b-b9d4-4313-9427-6ff15435f288.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-87e8c52b-b9d4-4313-9427-6ff15435f288", + "timestamp": "2026-04-14T20:57:28.031518+00:00", + "source_trajectory_ids": [ + "traj-24e288c1-5b5f-466b-8d57-8947efe30c3d", + "traj-419f0bc4-2099-4702-8cae-e09559835562", + "traj-4b96d746-8f00-43d8-964d-d89daa892714", + "traj-5daee68f-b5d3-49d4-b443-df61bb4fd1dd", + "traj-70cba1f5-10a1-4b5e-b3a7-b2e58376e304", + "traj-79c24421-c9d5-4cd1-bdcf-f7fa5bca13bb", + "traj-8800b4d2-7581-49c0-9f0f-b7b2a8c80e51", + "traj-8f4500be-5f43-4bab-8727-209a40cbdd80", + "traj-cc04b521-7699-4d1d-ad39-0d17f8ec039c", + "traj-d313b94a-5d22-44aa-bd16-cad1abd710ee" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205728", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-883d758b-a1da-4510-88c5-cebf8ce0ba31.json b/docs/training-reports/report-883d758b-a1da-4510-88c5-cebf8ce0ba31.json new file mode 100644 index 0000000..ee0c5c7 --- /dev/null +++ b/docs/training-reports/report-883d758b-a1da-4510-88c5-cebf8ce0ba31.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-883d758b-a1da-4510-88c5-cebf8ce0ba31", + "timestamp": "2026-04-14T17:16:51.844103+00:00", + "source_trajectory_ids": [ + 
"traj-10750d7e-aec2-49e4-8c6a-49bf8d2bf620", + "traj-10fcbc94-01ac-42e2-804f-1d54ef7f5a87", + "traj-472748b4-dc44-408f-9ee5-a7991053cb4b", + "traj-52cf389a-e84e-4490-aa75-a0c116f8f82b", + "traj-66a4e8e5-42d0-4046-a9be-4c02c50325a5", + "traj-6d41eb83-b2cb-4987-a44f-16eb13d32d26", + "traj-9552c648-2d0b-4ca7-84ea-d738964bb54c", + "traj-aa5b30f3-a9f4-46c8-951e-4d3360ca1c15", + "traj-f4a7999c-0982-4ae0-ab0e-770de4c4ab8c", + "traj-f9a2ae8d-8fff-4e7b-951a-2e536bd8c96e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-88e3948c-3625-405b-af33-c51e9135aff2.json b/docs/training-reports/report-88e3948c-3625-405b-af33-c51e9135aff2.json new file mode 100644 index 0000000..6897b60 --- /dev/null +++ b/docs/training-reports/report-88e3948c-3625-405b-af33-c51e9135aff2.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-88e3948c-3625-405b-af33-c51e9135aff2", + "timestamp": "2026-04-14T16:52:41.146294+00:00", + "source_trajectory_ids": [ + "traj-13e828a9-1ba3-4a80-9047-cbda9d2d5b6e", + "traj-50cd11c9-13b9-4265-ad3a-9ec07cd54504", + "traj-5837b135-1320-4be2-927e-3d9196e2b9da", + "traj-5dab0e5f-e8d1-40af-b2ea-fa691b9e9615", + "traj-740b3a62-9784-461c-a817-8f0615946d63", + "traj-9f118b9b-63fe-400f-ab25-959518aa658b", + "traj-cb67afcf-8c40-42bb-a469-b838e74c046a", + "traj-d0c75c4c-6241-4a19-9e12-a0c052d51108", + "traj-e4db8044-df9c-4c4c-ab98-97c839d8945e", + "traj-fe162343-b729-4d58-a395-a85ea4dd1921" + ], + 
"sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165241" +} \ No newline at end of file diff --git a/docs/training-reports/report-8c7eae98-78c8-4392-b3bb-e08f1b55b477.json b/docs/training-reports/report-8c7eae98-78c8-4392-b3bb-e08f1b55b477.json new file mode 100644 index 0000000..4241dbc --- /dev/null +++ b/docs/training-reports/report-8c7eae98-78c8-4392-b3bb-e08f1b55b477.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-8c7eae98-78c8-4392-b3bb-e08f1b55b477", + "timestamp": "2026-04-14T15:02:28.783291+00:00", + "source_trajectory_ids": [ + "traj-3fc37a97-5031-4cf8-8d66-42bc06486572", + "traj-49691566-2f6c-4774-bcf2-7e92cf7177f9", + "traj-678c08d1-4c36-43e0-bca4-962a057e50ab", + "traj-889ee048-f42d-493b-83c5-46e4d0bf1af7", + "traj-8b29c640-6379-4155-909f-1e0aad020cfe", + "traj-98bac095-cb81-4b20-82b4-60241df64bc5", + "traj-9c775265-5f9a-4552-836d-a6cd2c50067c", + "traj-abf09758-c6a2-493f-94ab-3939964b6ece", + "traj-c7bf056e-fe8a-42b4-a0e6-8bb111616e4c", + "traj-d923f51a-8d41-4cf4-b2e1-5f6aa3cfbc40" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": 
"20260414-150228" +} \ No newline at end of file diff --git a/docs/training-reports/report-8d24c2b9-cd2a-4637-b590-6c4bb256b30e.json b/docs/training-reports/report-8d24c2b9-cd2a-4637-b590-6c4bb256b30e.json new file mode 100644 index 0000000..bc1455e --- /dev/null +++ b/docs/training-reports/report-8d24c2b9-cd2a-4637-b590-6c4bb256b30e.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-8d24c2b9-cd2a-4637-b590-6c4bb256b30e", + "timestamp": "2026-04-14T16:52:07.930194+00:00", + "source_trajectory_ids": [ + "traj-4a66d419-43d3-4300-b341-7bee0dbad001", + "traj-67c2f7d0-bd50-4fe0-b7f5-9cf6280a8b15", + "traj-6aee8a88-7456-477d-803e-84b29665e300", + "traj-6ba96b06-a520-4750-be96-f6e7d47bb05c", + "traj-74f20e0d-f7a3-4919-acac-b9d17bb668bf", + "traj-7f78f0f4-db8b-4f69-9820-f76ee0b3bb51", + "traj-af8a8b2a-a62d-4d47-b8c9-3072d23cd291", + "traj-b06ec696-af9c-48ee-949e-59fcc53c27ad", + "traj-d3b29b21-6816-4602-a43e-d9e7999e370d", + "traj-f6ab046b-4bb3-4f69-85d3-411b3c53c871" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-8d4c7747-fa7c-4644-be2f-970c83f00bf9.json b/docs/training-reports/report-8d4c7747-fa7c-4644-be2f-970c83f00bf9.json new file mode 100644 index 0000000..0cc1dda --- /dev/null +++ b/docs/training-reports/report-8d4c7747-fa7c-4644-be2f-970c83f00bf9.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-8d4c7747-fa7c-4644-be2f-970c83f00bf9", + "timestamp": "2026-04-14T20:33:28.701057+00:00", + 
"source_trajectory_ids": [ + "traj-052faf89-7da2-47b6-8d10-c6e2a9ebcbd4", + "traj-086a884a-9385-446c-a84d-a2b913fc999d", + "traj-0f01bc07-6889-437c-a17d-62282f3268b9", + "traj-13b10580-5268-42e1-b694-30b2d6b6a805", + "traj-57b3a72c-9b3e-4bf3-85ce-0ecaf911ca7e", + "traj-7d9dfe23-94ff-4780-93f7-95d52c7514f1", + "traj-bf08149d-17e5-4852-b729-22b0c0dd6722", + "traj-d1e4a1f4-94ea-48c0-b14f-e84c22bda1ed", + "traj-ea5e5e8a-b34a-4dbe-a904-077a747c61eb", + "traj-f69f3469-1f86-41e7-9de6-33f517bbcd56" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-929ce385-3224-4343-b6a3-623d743d00d0.json b/docs/training-reports/report-929ce385-3224-4343-b6a3-623d743d00d0.json new file mode 100644 index 0000000..53ecdda --- /dev/null +++ b/docs/training-reports/report-929ce385-3224-4343-b6a3-623d743d00d0.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-929ce385-3224-4343-b6a3-623d743d00d0", + "timestamp": "2026-04-14T17:15:16.809604+00:00", + "source_trajectory_ids": [ + "traj-09e794a0-c453-468e-abf4-671f75d09b27", + "traj-17352f71-f442-4a14-a968-3ac95478d226", + "traj-45c6096f-f884-454d-8ee0-86796ed28e3c", + "traj-b6c5ae89-313e-4c08-a3cf-44019a6b54f9", + "traj-bea9c4d9-616a-4982-b984-04fb347f3833", + "traj-c19e693c-5497-43f5-8279-97e33e6ae6bd", + "traj-d71c1ee7-c50b-4c24-9c9c-99a25c682734", + "traj-dc39ead7-258a-4b55-aa7a-5d93bddd628b", + 
"traj-dd1a3c6d-497d-4926-8073-755e7d6a83f0", + "traj-fd07ca81-b6de-4afe-afa3-54855388b6d5" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-171516" +} \ No newline at end of file diff --git a/docs/training-reports/report-9411737d-5da5-4423-afb8-960331cdc84e.json b/docs/training-reports/report-9411737d-5da5-4423-afb8-960331cdc84e.json new file mode 100644 index 0000000..7a5a666 --- /dev/null +++ b/docs/training-reports/report-9411737d-5da5-4423-afb8-960331cdc84e.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-9411737d-5da5-4423-afb8-960331cdc84e", + "timestamp": "2026-04-14T22:08:19.704176+00:00", + "source_trajectory_ids": [ + "traj-0e884ed2-e5a4-4958-8ccd-949789f2f075", + "traj-1ecb2bbb-062f-414f-a766-168911840e45", + "traj-315f5e26-3ada-4e15-9cd2-cc3bbe56961c", + "traj-535bbe8b-024a-4f48-a2d5-7b47d96eeff1", + "traj-59d74691-087b-4fb8-be2e-8494044a5c5e", + "traj-73365bf3-01a6-4b89-aebc-868662c93d53", + "traj-a2fc2a8a-0625-49ee-9dce-55f898108099", + "traj-b87b6d30-5cd5-47ea-906d-574c4d159708", + "traj-d84b9e7c-c4aa-431f-805d-11a3e9bb0e5f", + "traj-e3db39fc-ae68-4eb3-b0d7-9ba77f6605f6" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + 
"baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220819", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-94760e0f-e122-41d8-af83-333a8c37a193.json b/docs/training-reports/report-94760e0f-e122-41d8-af83-333a8c37a193.json new file mode 100644 index 0000000..2e2c8ea --- /dev/null +++ b/docs/training-reports/report-94760e0f-e122-41d8-af83-333a8c37a193.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-94760e0f-e122-41d8-af83-333a8c37a193", + "timestamp": "2026-04-14T18:27:21.162078+00:00", + "source_trajectory_ids": [ + "traj-01e96207-75ce-48c2-8091-10f0bb7e37c4", + "traj-06eb55c2-4f97-4fc8-a784-ef6c4d817073", + "traj-20570b08-b2a6-454e-80ce-caa0fca0383d", + "traj-2131dbb2-21f4-4eb0-abe4-5ed360d9edc1", + "traj-5079dd02-a733-49f6-9e78-725036d4ea60", + "traj-6c16d814-aa3f-447c-b7b6-5fdf70746110", + "traj-a426c334-3e74-4b85-bab6-78d3cffbe889", + "traj-c0a40c81-335f-48c1-b2f4-62c5d2395083", + "traj-e65f84b2-506d-43c1-8348-6c6d31a92cb9", + "traj-f4e4a5d7-614d-4ddb-964c-0ff97b4117ba" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-95ce94d7-1f5a-4b01-abfc-33221db5edb8.json b/docs/training-reports/report-95ce94d7-1f5a-4b01-abfc-33221db5edb8.json new file mode 100644 index 0000000..1d364bb --- /dev/null +++ 
b/docs/training-reports/report-95ce94d7-1f5a-4b01-abfc-33221db5edb8.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-95ce94d7-1f5a-4b01-abfc-33221db5edb8", + "timestamp": "2026-04-14T22:05:59.116543+00:00", + "source_trajectory_ids": [ + "traj-0bfe93a3-bae9-4362-8f20-9c72c4b1f327", + "traj-0f156cb4-a83d-4e61-a20b-7f79ad707683", + "traj-4895bea3-bbc0-41a5-90e4-cb1d3dda6a67", + "traj-51649ce3-4499-4149-90db-62efd52e79f4", + "traj-584e3954-521b-4418-90e5-2b61caa50403", + "traj-9cadb24c-e9a3-4d40-824c-597569d7c2cf", + "traj-9f822bbd-502d-485b-b54a-deb4470e555a", + "traj-c7b2b31b-fd5d-478b-add5-9ed96b171036", + "traj-df615790-893c-4421-92b0-e0bdfdd94844", + "traj-e553ba9f-3ab9-4463-91bb-b651bd213dda" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220559", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-968dc267-95ba-4796-a51e-76f67e534009.json b/docs/training-reports/report-968dc267-95ba-4796-a51e-76f67e534009.json new file mode 100644 index 0000000..f006aff --- /dev/null +++ b/docs/training-reports/report-968dc267-95ba-4796-a51e-76f67e534009.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-968dc267-95ba-4796-a51e-76f67e534009", + "timestamp": "2026-04-14T15:01:27.701445+00:00", + "source_trajectory_ids": [ + "traj-083bd3bc-cae0-4982-b7ce-fdeb1bc18028", + "traj-19771c35-b6ec-4863-8543-c08370ca0822", + "traj-1c6864b5-22a1-4b81-8516-50ebe8eef5ae", + "traj-3af19065-9017-465b-a183-f3b860a1d228", + 
"traj-43adff43-d81c-4f0e-86da-b27c07f8a6ca", + "traj-5167e5bc-e287-47e7-8d5c-b8a1171f7978", + "traj-543b5227-9ea6-4007-affd-de515c82aa82", + "traj-d75e3381-3303-4e9b-ba2c-a0937bd825c0", + "traj-e0df49ba-e771-4e0f-aa5d-92f5677b6f0b", + "traj-f8efbc7a-8f37-4edd-85b1-d7b8ae8021eb" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-150127" +} \ No newline at end of file diff --git a/docs/training-reports/report-9693116a-6a49-479d-bd5c-d7e2e31d762e.json b/docs/training-reports/report-9693116a-6a49-479d-bd5c-d7e2e31d762e.json new file mode 100644 index 0000000..df1fac5 --- /dev/null +++ b/docs/training-reports/report-9693116a-6a49-479d-bd5c-d7e2e31d762e.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-9693116a-6a49-479d-bd5c-d7e2e31d762e", + "timestamp": "2026-04-14T22:08:19.819939+00:00", + "source_trajectory_ids": [ + "traj-13a9f6c9-7e91-4697-8fbe-4566db4e6b7c", + "traj-2d8bd089-7a55-4b37-9779-2d6eff8b5c75", + "traj-430c8552-caa3-48d4-a04a-50d33b2f4651", + "traj-4dbf435e-2948-49b9-8128-e9ad6aecb83b", + "traj-7efca5b5-d777-4d02-bd1c-a1bcc894946e", + "traj-81d83ce2-17e2-48e3-a0f8-6daf658bfb98", + "traj-81ef2ecc-2b92-4311-af28-b603b6e1db14", + "traj-b9fa615c-fced-4c42-b354-0b653b56dbb5", + "traj-bef1d857-2276-47c8-b453-cff7d0aba2b2", + "traj-fccde1c2-f299-456f-a020-23e38a9677a2" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + 
"avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-97d7333f-8fff-433c-b976-1bf7c91084ef.json b/docs/training-reports/report-97d7333f-8fff-433c-b976-1bf7c91084ef.json new file mode 100644 index 0000000..3aab3dc --- /dev/null +++ b/docs/training-reports/report-97d7333f-8fff-433c-b976-1bf7c91084ef.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-97d7333f-8fff-433c-b976-1bf7c91084ef", + "timestamp": "2026-04-15T02:31:17.347142+00:00", + "source_trajectory_ids": [ + "traj-02d262d8-1363-4c07-88b3-a31809c1e92c", + "traj-042ab936-62ca-4a6f-bd33-0bee39011ebc", + "traj-28f6ddb1-92a8-4869-9213-72d44636c31c", + "traj-2fe5a024-3ac8-42cc-a54b-1ef4c2c18ccc", + "traj-4a3a1415-6b20-493c-a507-cec184b19efc", + "traj-9dcb498e-8acc-47d8-9783-e6e21fd7911d", + "traj-d43c9fb3-11e2-4df1-97cb-1bef6aeb5cb1", + "traj-e8776a75-5865-4d50-95ac-f7c02a96754a", + "traj-f3f945da-3c7d-48d9-ae0f-779552c8e39f", + "traj-fa89579f-35d2-405e-8d65-bdc096d7d3a3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023117", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-98271ceb-8f12-4af9-8107-75c1d4417a18.json b/docs/training-reports/report-98271ceb-8f12-4af9-8107-75c1d4417a18.json new file mode 100644 index 0000000..f5cc2aa --- /dev/null +++ b/docs/training-reports/report-98271ceb-8f12-4af9-8107-75c1d4417a18.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-98271ceb-8f12-4af9-8107-75c1d4417a18", + "timestamp": "2026-04-15T01:21:53.704331+00:00", + "source_trajectory_ids": [ + "traj-1cb29d73-0930-4e93-9405-f3eba3ec5645", + "traj-246371c8-1d8f-415a-a1e7-e3a71e34013f", + "traj-4111d916-a401-40f3-808f-0a7fe984a576", + "traj-4af2386b-ddba-44e3-a9de-69167dacff25", + "traj-4dd46793-9506-460b-abbe-cab2513e68bb", + "traj-678326cb-a48e-4d57-9370-4f836d87d237", + "traj-6f8de3f1-e141-4b45-af97-54bb1bd89215", + "traj-8765e3b5-169e-44e7-b2e3-a89b8accbc48", + "traj-89094a9c-093b-4350-85df-1e45b82394cb", + "traj-fe60dc09-9b7c-422a-ac77-609981c40f2f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012153", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-983c0662-bcf9-4a16-9973-ddaddad78011.json b/docs/training-reports/report-983c0662-bcf9-4a16-9973-ddaddad78011.json new file mode 100644 index 0000000..a7e667f --- /dev/null +++ b/docs/training-reports/report-983c0662-bcf9-4a16-9973-ddaddad78011.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-983c0662-bcf9-4a16-9973-ddaddad78011", + "timestamp": "2026-04-14T20:31:11.334248+00:00", + "source_trajectory_ids": [ + 
"traj-06c09b49-ff21-4254-98ee-db1561f48e77", + "traj-146bd07d-3d43-4282-a0c8-9225ba05bc69", + "traj-1fa2cd91-42e3-421a-a1d3-1737dabb196f", + "traj-25b3b238-59dd-47dd-ad44-2663febeebfa", + "traj-3c10bf10-20ed-4e2a-b405-d72c9e00fec8", + "traj-3cf8ae87-53e2-41f7-b16e-4dace2d06754", + "traj-9c7bffcf-cd71-4509-96f3-1f62b79857fd", + "traj-a8b04220-2c5f-42da-9727-6ad8ce8c256e", + "traj-b67d4dbb-a598-4536-973a-eb8f5d5c099c", + "traj-feae3677-adb6-431b-bd8f-6f32bbcfff23" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-986b8fb7-7bfb-49b1-86c5-361d0ef3faec.json b/docs/training-reports/report-986b8fb7-7bfb-49b1-86c5-361d0ef3faec.json new file mode 100644 index 0000000..6dcf352 --- /dev/null +++ b/docs/training-reports/report-986b8fb7-7bfb-49b1-86c5-361d0ef3faec.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-986b8fb7-7bfb-49b1-86c5-361d0ef3faec", + "timestamp": "2026-04-14T19:19:02.064004+00:00", + "source_trajectory_ids": [ + "traj-01798b8c-8f40-477c-a29c-20f9b63b2c39", + "traj-183830ec-e6b5-4fa6-9c21-ab37be03948c", + "traj-2e1c9767-6883-4102-a0c1-a37fd58a33dc", + "traj-2f68cc6e-84f9-468c-8651-6b798da1bad8", + "traj-34ebba6f-8f77-4bf9-b196-1c789e12d7b1", + "traj-6f7b3641-34a1-4635-b993-c456c7a68c56", + "traj-8ac3d02f-1ba7-45d6-9d51-32f7f7a698ca", + "traj-b7458383-db3a-46ac-992e-28cb0da0f5c8", + "traj-dfc99b79-5474-428d-959b-7e3187494d41", + 
"traj-f7e2bb59-2b0a-43d5-bacd-ae1b7ecde76b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-98f18532-e2a1-49c1-8b57-b6e3dd45052b.json b/docs/training-reports/report-98f18532-e2a1-49c1-8b57-b6e3dd45052b.json new file mode 100644 index 0000000..a76bb8d --- /dev/null +++ b/docs/training-reports/report-98f18532-e2a1-49c1-8b57-b6e3dd45052b.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-98f18532-e2a1-49c1-8b57-b6e3dd45052b", + "timestamp": "2026-04-14T17:16:51.793811+00:00", + "source_trajectory_ids": [ + "traj-0d1add2e-cc1f-4d95-93e7-83632005460c", + "traj-19552c13-b01d-4f63-95c3-8050ee73732e", + "traj-29336be2-22cb-483e-8481-59c99da47b6e", + "traj-363e23a6-ad90-4821-985b-d041c97275d5", + "traj-51055cfe-31df-473d-90c6-d6ef3d76b794", + "traj-51b8e02e-f36e-40f7-b9cf-c076c60fac7e", + "traj-88467392-2831-4a14-88f8-f21c7dd17487", + "traj-91b019f6-8e7e-4de8-84b6-7cb00976d1c6", + "traj-bc32252b-6a48-48ad-baa5-332a75fee261", + "traj-bd5b5c36-2f01-469d-b162-75d64359da0c" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + 
"latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-171651" +} \ No newline at end of file diff --git a/docs/training-reports/report-995ea2f1-7710-45f1-ada9-f5e91cc6563a.json b/docs/training-reports/report-995ea2f1-7710-45f1-ada9-f5e91cc6563a.json new file mode 100644 index 0000000..b8494d3 --- /dev/null +++ b/docs/training-reports/report-995ea2f1-7710-45f1-ada9-f5e91cc6563a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-995ea2f1-7710-45f1-ada9-f5e91cc6563a", + "timestamp": "2026-04-14T18:05:15.833596+00:00", + "source_trajectory_ids": [ + "traj-0c8a0005-00ef-4c60-8fc3-c989d42758ea", + "traj-29cd85b7-bd1e-4f34-bba9-b49a61dd2fe5", + "traj-2c0e79d9-e25c-4303-9b5d-b562050bf5c7", + "traj-6af102dd-1d08-41d4-8165-ea4c8bc5d578", + "traj-6c109019-7ef3-41df-be70-9b33e9b0db92", + "traj-990cf702-35f5-45fd-be4e-8b437cc0860a", + "traj-a3383362-b355-4e3d-9e2b-e63cc2e5a536", + "traj-aaa92abe-ca48-4f6d-9fca-6977b0ba2763", + "traj-b8be0603-142d-4575-a0bc-1a4a324c3f8e", + "traj-bb66e0f8-0f96-46b0-9b80-565f35306d1a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-99e7039d-ad7d-40f3-8df1-921a6b4a755a.json b/docs/training-reports/report-99e7039d-ad7d-40f3-8df1-921a6b4a755a.json new file mode 100644 index 0000000..253995a --- /dev/null +++ b/docs/training-reports/report-99e7039d-ad7d-40f3-8df1-921a6b4a755a.json @@ -0,0 +1,43 @@ +{ 
+ "report_id": "report-99e7039d-ad7d-40f3-8df1-921a6b4a755a", + "timestamp": "2026-04-15T02:31:17.330200+00:00", + "source_trajectory_ids": [ + "traj-01934281-c2cd-4655-9713-dcf9a6423612", + "traj-078d9a1d-75bb-43ce-a0c6-363ace32baae", + "traj-2bbef0ab-e642-474b-a4cb-c0c988f313bb", + "traj-2cb7f855-20ad-4a8b-b11c-30665c47d857", + "traj-85a46d03-4ccd-4002-88cc-1b3e30406b77", + "traj-8b61fd64-ea3c-4246-821d-84210bc8cc14", + "traj-ce33a0f6-414b-4d11-95a2-9146c8bb9125", + "traj-d4fe5501-4fcc-46ad-b75e-fc5ef9a3934e", + "traj-fa0fbbfb-e0b0-4452-8172-0c382518cf1d", + "traj-fdc82c0a-b867-4053-b8c4-44bdc0561d1f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023117", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-9d2e3473-442e-4fcb-b2dc-57b7e80ef623.json b/docs/training-reports/report-9d2e3473-442e-4fcb-b2dc-57b7e80ef623.json new file mode 100644 index 0000000..bf90f3e --- /dev/null +++ b/docs/training-reports/report-9d2e3473-442e-4fcb-b2dc-57b7e80ef623.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-9d2e3473-442e-4fcb-b2dc-57b7e80ef623", + "timestamp": "2026-04-14T22:05:59.105612+00:00", + "source_trajectory_ids": [ + "traj-1ca21b0f-fdc4-409e-a6a7-ee09a5f888b2", + "traj-2a2b3fc7-a3ec-421a-be23-470b44771087", + "traj-51813a04-e1db-4f5a-a1b4-3df19be7e08c", + "traj-637bc4f2-c93f-4ed9-b4d1-794b4cf086f4", + "traj-803fcf08-6779-49cd-9324-6b26b24f40fb", + "traj-9f3acfbb-ee22-41fc-9c81-ab6b6bbf8a0a", + 
"traj-a3928831-c8be-4ef5-b7fb-c993635f5d04", + "traj-b11b9446-05a2-405b-92ab-e759fd3e821a", + "traj-c36ad7bd-7533-4772-9ae0-39b06970a6ec", + "traj-f3fdf61f-eee6-46df-ae61-6aa2e1e2b167" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220559", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-9fd6cd04-3f46-4363-bc6c-65a68c7669e7.json b/docs/training-reports/report-9fd6cd04-3f46-4363-bc6c-65a68c7669e7.json new file mode 100644 index 0000000..8dd1c1b --- /dev/null +++ b/docs/training-reports/report-9fd6cd04-3f46-4363-bc6c-65a68c7669e7.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-9fd6cd04-3f46-4363-bc6c-65a68c7669e7", + "timestamp": "2026-04-14T21:22:02.820380+00:00", + "source_trajectory_ids": [ + "traj-14515db2-6257-46d3-8111-ee5298cb8707", + "traj-25357165-32ee-4c36-af02-2770f53f6207", + "traj-7f2fa927-eb98-4ed7-916f-fdbc2b1e8f03", + "traj-87eda674-8b88-4843-852b-c64e472ac374", + "traj-a4833d9a-3ee5-4cb4-bd1f-fa8b0a619b87", + "traj-afa6a33f-ff16-4fc4-a9ec-79ee42ca88d3", + "traj-b5789cbe-1ef7-4e48-a8eb-a09e0beb91c9", + "traj-ba7db9ce-e019-413c-b45e-50d2b03c7fb1", + "traj-e063b56f-1b26-4955-908a-e87e175b3e01", + "traj-f344b472-15ad-4e29-b084-de2185896a81" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212202", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a08199f7-c94c-4829-bace-e904a7f4ac74.json b/docs/training-reports/report-a08199f7-c94c-4829-bace-e904a7f4ac74.json new file mode 100644 index 0000000..cb9bc08 --- /dev/null +++ b/docs/training-reports/report-a08199f7-c94c-4829-bace-e904a7f4ac74.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a08199f7-c94c-4829-bace-e904a7f4ac74", + "timestamp": "2026-04-14T21:21:15.044727+00:00", + "source_trajectory_ids": [ + "traj-0e13473a-9ee2-4567-a265-c6e2b9fd5100", + "traj-1e705a1c-5e10-4a9b-a00b-3653cffaa31c", + "traj-27c9411b-8b91-4382-9276-dca0733b2e36", + "traj-2a4f54be-13d9-49c8-9621-3de69fc023c0", + "traj-47cfb39a-67a1-488a-a060-484958e90abc", + "traj-5514a1b9-11ac-4039-85e3-7d8a509dbf3d", + "traj-5a38919f-4707-4604-94ca-8d383a89eaf7", + "traj-69386c22-782d-4c78-bd4d-b4c4c3e4d87c", + "traj-6c67ad21-4da3-4e2f-9a74-c5d457d3fccd", + "traj-c37ea4a2-83b9-4968-8a3c-8cc5c55c8899" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212115", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a0a7ee60-aa2d-4f53-8686-e6a72ed1d192.json 
b/docs/training-reports/report-a0a7ee60-aa2d-4f53-8686-e6a72ed1d192.json new file mode 100644 index 0000000..6a5cac4 --- /dev/null +++ b/docs/training-reports/report-a0a7ee60-aa2d-4f53-8686-e6a72ed1d192.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-a0a7ee60-aa2d-4f53-8686-e6a72ed1d192", + "timestamp": "2026-04-14T18:03:43.972514+00:00", + "source_trajectory_ids": [ + "traj-016b4424-cdd6-453a-98a8-f4fa6da5f166", + "traj-0aacdc90-9e9e-48d2-876b-bc68e5af08e6", + "traj-141b40bc-85df-4d3d-a3e1-a1ee36a8cf48", + "traj-233b089d-0899-497d-a3a4-a9af408cfc58", + "traj-5fbeaa00-d1a7-4b4e-a151-6d71347e2f94", + "traj-7dbfdbc7-3113-48cd-bcf8-4066d6f8b98a", + "traj-7e3d6383-adb2-4f62-97fb-259f9d1778a3", + "traj-ea3648dd-a2e5-4d04-bc49-8c330f279b13", + "traj-ed863566-5d50-41a7-b23d-30f9229286fc", + "traj-f9c2b526-29bc-48c7-a150-2782b4a524cb" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180343" +} \ No newline at end of file diff --git a/docs/training-reports/report-a155a209-f98a-4672-984a-5822384f5c1a.json b/docs/training-reports/report-a155a209-f98a-4672-984a-5822384f5c1a.json new file mode 100644 index 0000000..7a70a9a --- /dev/null +++ b/docs/training-reports/report-a155a209-f98a-4672-984a-5822384f5c1a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a155a209-f98a-4672-984a-5822384f5c1a", + "timestamp": "2026-04-15T02:33:47.917171+00:00", + "source_trajectory_ids": [ + "traj-130fc82e-79ff-4b26-8521-cab929ec178d", + "traj-3f07d696-a342-4d07-8134-5c7dc5226ddf", + "traj-83b19d1e-ce8f-4dd3-a940-0f991a477eff", + 
"traj-87a2d328-8aa6-46e2-b84a-07ec14128b7f", + "traj-a5c5a69d-46ea-4cf2-8dc1-9258fff743c7", + "traj-acad6133-2bac-4e71-a546-adc0b110b975", + "traj-c94b9c62-c5f3-421b-916c-ae1028e3db57", + "traj-d78205e9-74e9-4de8-87f2-8871c81fa9ec", + "traj-d910f0c5-5ea6-472a-a94b-cfcd98e0b42b", + "traj-f9c6c857-b7cc-4b49-8991-85b43c869373" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023347", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a15627c6-080a-406e-a549-9e6fa828c1bf.json b/docs/training-reports/report-a15627c6-080a-406e-a549-9e6fa828c1bf.json new file mode 100644 index 0000000..0f42b0e --- /dev/null +++ b/docs/training-reports/report-a15627c6-080a-406e-a549-9e6fa828c1bf.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a15627c6-080a-406e-a549-9e6fa828c1bf", + "timestamp": "2026-04-14T20:30:08.656393+00:00", + "source_trajectory_ids": [ + "traj-516c4b52-83f2-490d-8c8f-3321b1733b26", + "traj-5564bdb2-35f1-46b9-95b8-567e85e606bb", + "traj-66d9de3c-86db-4f36-ac6b-f210d98f6839", + "traj-87819fe2-3abf-4f0d-a924-8386a5c0683b", + "traj-9a2eb8ef-0d4c-45cf-9773-a86d86a7c8d5", + "traj-a29b6097-ceec-4d7a-aea8-f0efb4436486", + "traj-af5323f0-3699-4d16-9274-363a474477b9", + "traj-c9f5678e-5fa8-43db-bb96-45d8f67fdb2d", + "traj-cdce8964-60be-4a15-b9ae-37ec6300470d", + "traj-e168fb14-1e5e-4b33-9819-02e33d435812" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + 
}, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203008", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a41767b9-4256-4d1d-afcc-3b6f9dc85ddf.json b/docs/training-reports/report-a41767b9-4256-4d1d-afcc-3b6f9dc85ddf.json new file mode 100644 index 0000000..cf03e9a --- /dev/null +++ b/docs/training-reports/report-a41767b9-4256-4d1d-afcc-3b6f9dc85ddf.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a41767b9-4256-4d1d-afcc-3b6f9dc85ddf", + "timestamp": "2026-04-14T19:41:58.743590+00:00", + "source_trajectory_ids": [ + "traj-44afabc6-efd0-4aaa-b815-361bc2c996a0", + "traj-5e29c8b5-bc67-48d4-8ce4-e44f87bb0f2e", + "traj-69fcd481-11f6-42e8-b89d-77cf0a841404", + "traj-6d52b9eb-0b50-42e8-b336-b44387d147c6", + "traj-a4508037-1c11-4e3a-b8a6-cfd453fb44ff", + "traj-c9e7e3f1-1971-47c7-8f9d-ed1f6faa4bd6", + "traj-cd23f8c5-adcb-456f-9682-d97629ea389e", + "traj-ce025b94-b21b-496f-895f-a8b4f241fd60", + "traj-da027372-9d50-4d5a-875d-732a504e5ff3", + "traj-da44d36d-600e-45bf-b66e-bce65a28e40b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-194158", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end 
of file diff --git a/docs/training-reports/report-a4439f74-2df0-4bf0-a0e1-9a6f3b6bf4e4.json b/docs/training-reports/report-a4439f74-2df0-4bf0-a0e1-9a6f3b6bf4e4.json new file mode 100644 index 0000000..a0cf123 --- /dev/null +++ b/docs/training-reports/report-a4439f74-2df0-4bf0-a0e1-9a6f3b6bf4e4.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a4439f74-2df0-4bf0-a0e1-9a6f3b6bf4e4", + "timestamp": "2026-04-14T22:10:15.298250+00:00", + "source_trajectory_ids": [ + "traj-0a4226f6-e916-4e96-9454-798fec39ff67", + "traj-0f7ea76d-a5d2-409e-9b21-7dee91b4854a", + "traj-17f515e4-5577-4107-8ced-644b541bf429", + "traj-280e5a6b-8d2a-4b79-8c3a-5d3f1b066c6a", + "traj-7414d872-e032-471c-9796-24dffd19426b", + "traj-846c0282-d6c3-490a-8cac-85576955c43d", + "traj-8a9d55c1-f594-48aa-8fc2-8bfc66ca5dab", + "traj-a657c7b6-7967-409e-8466-a9a65aa796d5", + "traj-ca1fc6dd-3168-4035-9a16-8f0653ca6306", + "traj-faf255fb-9f40-498a-be06-e77f3c04a763" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221015", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a699cdfc-432b-4561-8991-7e0129ec50f3.json b/docs/training-reports/report-a699cdfc-432b-4561-8991-7e0129ec50f3.json new file mode 100644 index 0000000..6e8a816 --- /dev/null +++ b/docs/training-reports/report-a699cdfc-432b-4561-8991-7e0129ec50f3.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a699cdfc-432b-4561-8991-7e0129ec50f3", + "timestamp": "2026-04-14T17:17:57.987874+00:00", + "source_trajectory_ids": [ + 
"traj-1420cb03-d185-4412-992c-54f7db4e96aa", + "traj-3557f94b-279c-4c8c-bb33-77ff5d84937c", + "traj-4606f2f9-64a9-40df-b76e-580c15d39441", + "traj-5f8d3fe1-d60e-4c33-a569-ff507724503c", + "traj-5ffe92b7-295f-4e8e-9b95-2c108126e5a6", + "traj-6d7f1a66-5b58-4791-af4c-fd7a8a2ad21e", + "traj-8f4ceb49-eca2-4b8b-bbed-c55f1e37cca7", + "traj-a4f28432-26e3-4cb7-9f8d-a753dd265ca5", + "traj-a85fb06c-1c5a-4bda-bb76-71fa64108ea9", + "traj-e9894efb-bc73-4246-9ff5-346bf361c645" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-a7c13fda-3bc4-40fe-8005-1ea30070e87d.json b/docs/training-reports/report-a7c13fda-3bc4-40fe-8005-1ea30070e87d.json new file mode 100644 index 0000000..57857de --- /dev/null +++ b/docs/training-reports/report-a7c13fda-3bc4-40fe-8005-1ea30070e87d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a7c13fda-3bc4-40fe-8005-1ea30070e87d", + "timestamp": "2026-04-14T15:02:28.835899+00:00", + "source_trajectory_ids": [ + "traj-1bbd6764-060e-4e1b-a736-85d1cdeac96c", + "traj-24593f48-53db-4ffb-9686-71c03a8d76bb", + "traj-390f0671-c917-43ce-af6c-e560adf1e853", + "traj-4c5c72fd-1211-441d-ad74-58209bbe0e07", + "traj-6810ea4f-454a-4cbe-95f8-477b310d6d28", + "traj-712830a1-eae2-4988-9e58-3916202f2602", + "traj-807823b8-f5c6-4c6f-9c7b-4450bf382446", + "traj-c3e7e656-bcba-482b-af68-8d7074a26a88", + "traj-c8ad6290-bf84-47a4-a0d1-10a56818483c", + "traj-fe08dd3e-9fbe-463d-8c17-8f0b168c963a" + ], + 
"sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-a8a3ceca-3d16-4dcd-a615-92350dcca0d0.json b/docs/training-reports/report-a8a3ceca-3d16-4dcd-a615-92350dcca0d0.json new file mode 100644 index 0000000..d7032ff --- /dev/null +++ b/docs/training-reports/report-a8a3ceca-3d16-4dcd-a615-92350dcca0d0.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-a8a3ceca-3d16-4dcd-a615-92350dcca0d0", + "timestamp": "2026-04-14T20:28:05.555096+00:00", + "source_trajectory_ids": [ + "traj-03d4d829-ffeb-4969-83b9-dae08e858a8e", + "traj-061c8e42-7cd6-4cca-b9b8-95041eec844d", + "traj-0b075101-cf3f-4793-89cf-c10a5c2d8292", + "traj-176a444f-0601-4c9a-905d-677c3eb2da22", + "traj-18aa001a-3d41-405c-8c0c-188611c8e2c6", + "traj-48364e4f-449b-4ad7-acab-0fe6fbbef59b", + "traj-727c5e76-1001-47d7-8d62-fc8764e6abf4", + "traj-83effcd6-7c25-462c-8754-290a782ddf5c", + "traj-cc60ce08-0bab-435c-8f77-34f82773c046", + "traj-f31a284d-ea48-4b74-92eb-e8002e85f14b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, 
+ "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a8dc1af4-2dff-48ed-acca-064453de957b.json b/docs/training-reports/report-a8dc1af4-2dff-48ed-acca-064453de957b.json new file mode 100644 index 0000000..d32de26 --- /dev/null +++ b/docs/training-reports/report-a8dc1af4-2dff-48ed-acca-064453de957b.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-a8dc1af4-2dff-48ed-acca-064453de957b", + "timestamp": "2026-04-14T17:16:23.799704+00:00", + "source_trajectory_ids": [ + "traj-17ce2ec3-0e16-4bd4-ac67-019e51caadb8", + "traj-2ae009a2-a5e8-4ff2-9f43-d792771aae64", + "traj-3ef172c6-6ea9-4fd6-83db-177b84191c1e", + "traj-6f9c4819-85bf-4f05-b9d3-79d4e1bc48f6", + "traj-929223b3-aadb-4f68-84d0-46e31c1fd289", + "traj-a9f4c049-d2d8-4c92-bcb3-c0faf0ea367c", + "traj-aa1aa2de-f72e-4672-b435-c20b90e6170c", + "traj-ab8d1feb-96cb-44f1-9712-13ede898eb56", + "traj-b032db6f-0262-4078-8110-ce02963d6f4a", + "traj-c98ae591-93dd-4d1c-acc8-77f17792725d" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-171623" +} \ No newline at end of file diff --git a/docs/training-reports/report-a9207b6a-4ce4-4a2e-8c68-0f8bc824d25a.json b/docs/training-reports/report-a9207b6a-4ce4-4a2e-8c68-0f8bc824d25a.json new file mode 100644 index 0000000..08357bd --- /dev/null +++ b/docs/training-reports/report-a9207b6a-4ce4-4a2e-8c68-0f8bc824d25a.json @@ -0,0 +1,43 @@ +{ + "report_id": 
"report-a9207b6a-4ce4-4a2e-8c68-0f8bc824d25a", + "timestamp": "2026-04-14T20:03:02.565465+00:00", + "source_trajectory_ids": [ + "traj-02511035-45f1-445a-aba9-628c6cd28531", + "traj-3275a8f1-b2bd-433e-b303-e03b4ebbb9a0", + "traj-43ea388a-0a38-4650-a607-019c02e994c2", + "traj-781042f3-d3f1-491a-8140-2f0b29aaafb2", + "traj-897a2ffd-6b31-4b78-8573-fbb9298c1308", + "traj-c7d1ccc9-5436-4380-8d91-54129f3ca332", + "traj-c7fa45db-f91c-485c-aee5-4d8ae31f7c27", + "traj-cff8e2f2-27b2-4064-89ae-bfe9ac4f5a4b", + "traj-f3e9f993-55ba-41da-8c9f-ec7b2a5a1d85", + "traj-fee1b187-7073-4490-b82a-1a66ae1f061e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200302", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-a94d4e71-5183-4bc7-9093-13afd6d4af21.json b/docs/training-reports/report-a94d4e71-5183-4bc7-9093-13afd6d4af21.json new file mode 100644 index 0000000..a6868aa --- /dev/null +++ b/docs/training-reports/report-a94d4e71-5183-4bc7-9093-13afd6d4af21.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-a94d4e71-5183-4bc7-9093-13afd6d4af21", + "timestamp": "2026-04-14T18:03:44.027491+00:00", + "source_trajectory_ids": [ + "traj-10901c12-d7d2-4575-bab0-481bdf4cad64", + "traj-11df76ec-bd7c-45cc-bdb9-c596ab840488", + "traj-2c7bd714-3320-4f81-b173-714c94b15d32", + "traj-3657359a-8270-4599-89bb-a8d1f746f151", + "traj-4782717b-8feb-40db-97fa-d40f2f3ed3f7", + "traj-5a3851a6-3a65-4cfe-a680-6e93ff270b2f", + 
"traj-6a428f29-648e-492c-951e-f66313f5a223", + "traj-bc4eeb6a-53ef-42be-9358-97f14f5faa51", + "traj-def08080-e44b-4962-9600-f9c93bf1735f", + "traj-e1819f76-b5fe-4a3d-a0eb-0e63baa71367" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-abcad13b-e47e-422d-899b-bd3506c50f90.json b/docs/training-reports/report-abcad13b-e47e-422d-899b-bd3506c50f90.json new file mode 100644 index 0000000..afc091d --- /dev/null +++ b/docs/training-reports/report-abcad13b-e47e-422d-899b-bd3506c50f90.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-abcad13b-e47e-422d-899b-bd3506c50f90", + "timestamp": "2026-04-14T14:58:22.633922+00:00", + "source_trajectory_ids": [ + "traj-03ec4e07-01bd-4a3e-92b8-7d47b9f4f38d", + "traj-162d3154-0c71-4219-b995-a25767b32ce3", + "traj-2b57f91d-15e9-4758-a2c9-233bb766bd6a", + "traj-40731eb6-ff31-4441-bd55-68f96738601d", + "traj-45ab0672-f94c-4823-b102-ef12fccf1058", + "traj-51596c64-2922-4446-b61d-d312ab01bc60", + "traj-614127bd-e303-48b1-a863-3d8eecbaae13", + "traj-b1e405c0-f500-4b8b-b651-37be305ab040", + "traj-d21d0b04-b11b-4052-81bc-b3691d043dab", + "traj-e8dafb23-71db-4d33-993e-3af2cbfbd39d" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": 
false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-ac39add9-b527-4a69-b204-88ddb1520840.json b/docs/training-reports/report-ac39add9-b527-4a69-b204-88ddb1520840.json new file mode 100644 index 0000000..1d3c022 --- /dev/null +++ b/docs/training-reports/report-ac39add9-b527-4a69-b204-88ddb1520840.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ac39add9-b527-4a69-b204-88ddb1520840", + "timestamp": "2026-04-14T15:26:25.610444+00:00", + "source_trajectory_ids": [ + "traj-1ceccbf7-6330-47ed-8847-4b91daf7cc4b", + "traj-2b893213-85dc-4790-9bb2-0629483e4d3c", + "traj-36b3fa2f-33a1-49ff-898d-762620569a41", + "traj-5195dada-dc07-4e09-9975-6a89db4235a6", + "traj-5dc5d158-1159-4e4f-93ff-c41e767488c4", + "traj-7790964a-282d-4dcc-b485-747c6c7184d4", + "traj-7bacbf96-41a3-4086-bdec-27eab7b218f5", + "traj-7e67c0b7-9b4f-4abe-b47d-b19ab291cd87", + "traj-d9d8a7de-3f14-4831-98f4-f16fed122d00", + "traj-f6480177-ca71-4aeb-8ffa-2123c92f25b8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-ad614434-8387-4c0a-9170-9821f5b6e5dc.json b/docs/training-reports/report-ad614434-8387-4c0a-9170-9821f5b6e5dc.json new file mode 100644 index 
0000000..4981773 --- /dev/null +++ b/docs/training-reports/report-ad614434-8387-4c0a-9170-9821f5b6e5dc.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-ad614434-8387-4c0a-9170-9821f5b6e5dc", + "timestamp": "2026-04-14T16:51:38.267196+00:00", + "source_trajectory_ids": [ + "traj-009e84d5-54ae-45fe-be70-318309d44e88", + "traj-03204054-d990-4ff9-8a78-5ce4ad733771", + "traj-092679d0-2802-4a05-8a78-08d0018eff91", + "traj-1482c6be-621f-48a8-8c86-5f76b6228610", + "traj-9647873e-9662-4522-9686-340d41b70a7e", + "traj-af69ebb9-eb2e-42ae-b6ea-34f0e01492a8", + "traj-c01cf256-50d1-4987-8ff2-c4296d49f786", + "traj-dbb31292-2d6c-4f7b-98cb-3c25282b6332", + "traj-e49d6087-1a59-4b20-b94e-08dd152d4538", + "traj-fdc62729-ef73-449e-9939-7a1e7346c9b6" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-165138" +} \ No newline at end of file diff --git a/docs/training-reports/report-afb90250-beba-4250-823f-151413488736.json b/docs/training-reports/report-afb90250-beba-4250-823f-151413488736.json new file mode 100644 index 0000000..7f4699d --- /dev/null +++ b/docs/training-reports/report-afb90250-beba-4250-823f-151413488736.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-afb90250-beba-4250-823f-151413488736", + "timestamp": "2026-04-14T20:33:28.562913+00:00", + "source_trajectory_ids": [ + "traj-0471e097-5d37-4815-b0eb-e94932d89df6", + "traj-136d1e6b-1172-40a9-a711-9c1f31c05269", + "traj-38f6850d-18fd-40a9-b84c-882662737e0a", + "traj-40d5edf5-8ce9-41bb-8de5-fe9fa43c6691", + "traj-4a46b326-a971-4fda-9cb0-50f32fc5e360", + 
"traj-50d662c4-240b-4d1e-aa13-87d60a81753c", + "traj-8af47f18-6a0e-429e-827c-f4f7c861bc43", + "traj-9fb29645-8d07-4924-9b8b-6e29c6e27b91", + "traj-c7e2dcfd-096e-4c3b-8aa3-4f3f5e9216e0", + "traj-d60f2070-0f4e-4365-9e1a-94217c7f85b2" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203328", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b08b0c58-bf8a-4778-8be1-14297ef8da75.json b/docs/training-reports/report-b08b0c58-bf8a-4778-8be1-14297ef8da75.json new file mode 100644 index 0000000..6e7f027 --- /dev/null +++ b/docs/training-reports/report-b08b0c58-bf8a-4778-8be1-14297ef8da75.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b08b0c58-bf8a-4778-8be1-14297ef8da75", + "timestamp": "2026-04-14T16:52:41.195817+00:00", + "source_trajectory_ids": [ + "traj-09039ff7-415b-4f7d-aa5b-9631409dfaeb", + "traj-11f1581f-23fb-464a-96da-70bee88c77b3", + "traj-191e9df4-0fad-43c5-84ca-e975604888ee", + "traj-1a6d2e5d-0b27-4385-9fd3-eede85e9905b", + "traj-1c98843e-f8ae-4591-802f-7becc8ced646", + "traj-8910116d-dd38-4b77-bed2-d68167e051aa", + "traj-916583ca-2b06-4872-9982-a0bf53812925", + "traj-a420b171-d66c-41fc-a0ab-587552e66a08", + "traj-ac38e061-4d94-46e5-841f-51ef534af6fa", + "traj-fbae6722-a3a4-4718-b051-ad1e63d1d501" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + 
"error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-b15b1ed1-f644-49a7-953f-15462a81088a.json b/docs/training-reports/report-b15b1ed1-f644-49a7-953f-15462a81088a.json new file mode 100644 index 0000000..fb150b4 --- /dev/null +++ b/docs/training-reports/report-b15b1ed1-f644-49a7-953f-15462a81088a.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-b15b1ed1-f644-49a7-953f-15462a81088a", + "timestamp": "2026-04-14T22:10:15.388367+00:00", + "source_trajectory_ids": [ + "traj-037d286c-f051-41dd-b52b-6c0bf50ee9b1", + "traj-144db647-0c11-40a7-9025-1309dcf66d3b", + "traj-428ffedd-fa2b-48f9-9ad3-18ede627b50a", + "traj-649cc27f-7580-4951-b29c-fa6c7dbe33aa", + "traj-68e97826-cf15-4ea0-8ff7-b8acfa708ef5", + "traj-6debbbb7-fcdb-440a-8507-e57788eb909b", + "traj-858a7b11-0c27-4dc2-9a01-e48cbcad6557", + "traj-8cf01efb-3af0-48a4-a6c3-2d01c5a6180d", + "traj-9c0bb987-beb6-454b-bd62-4d7eb69ecdfc", + "traj-dbf587c3-637c-4d42-a0ea-c5c15ec1c1af" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-b3889917-dee5-44e9-93e2-d02bac4f591a.json b/docs/training-reports/report-b3889917-dee5-44e9-93e2-d02bac4f591a.json new file mode 100644 index 0000000..8c84a18 --- /dev/null +++ b/docs/training-reports/report-b3889917-dee5-44e9-93e2-d02bac4f591a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b3889917-dee5-44e9-93e2-d02bac4f591a", + "timestamp": "2026-04-14T20:04:58.790133+00:00", + "source_trajectory_ids": [ + "traj-065ad0b9-8718-4195-9f4a-a2e6193e3919", + "traj-09fc27f8-ac42-4ba0-a222-2393aa78f02e", + "traj-3e24ada7-d0a1-4096-9fec-cfc046b00f29", + "traj-6d80b27c-d907-4907-ad07-02f4a43a5bd4", + "traj-98ae6e53-996e-4227-8b13-d960e85f64dc", + "traj-a0fa2cd8-d8a6-4fd2-ae4c-f82d6231ed6d", + "traj-bf2d8b66-a2b2-4a9b-a500-ce8c9106b3f8", + "traj-c4703e13-6da0-48a1-8539-0389b4326a2b", + "traj-d6bc19c6-4ea5-4f3e-8aae-e44bb574aa99", + "traj-dae8fad0-a9fd-4189-bebf-5255c217d9ca" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200458", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b5ca812a-32d1-4db3-b051-fc048d104bfa.json b/docs/training-reports/report-b5ca812a-32d1-4db3-b051-fc048d104bfa.json new file mode 100644 index 0000000..cc4e33c --- /dev/null +++ b/docs/training-reports/report-b5ca812a-32d1-4db3-b051-fc048d104bfa.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b5ca812a-32d1-4db3-b051-fc048d104bfa", + "timestamp": "2026-04-14T18:57:05.769250+00:00", + "source_trajectory_ids": [ + 
"traj-05888383-9224-4b2d-8242-91c084b1c134", + "traj-2c77022b-6f74-4421-bda0-18898681bf4a", + "traj-3593a811-c2b7-43c5-87d9-e5fb423653b2", + "traj-3ce3f763-87a2-4873-a9ec-eafd7c8456ad", + "traj-5b32d46d-3026-443b-822e-b1b4feca14c0", + "traj-6d8730a1-be49-443d-bb69-b804f56cf9a0", + "traj-783b4a60-78ec-45f6-bf55-68a62a29e288", + "traj-be504263-ca1d-429a-8526-16de4be65b3b", + "traj-c19440a5-c45b-4541-85fb-42bcfd8d61c5", + "traj-c291fee4-83f2-4f3b-8d3f-6598bb3cf2de" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-185705", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b5cb4ed3-3658-4aa7-b82a-2bb69051bf66.json b/docs/training-reports/report-b5cb4ed3-3658-4aa7-b82a-2bb69051bf66.json new file mode 100644 index 0000000..4fa134c --- /dev/null +++ b/docs/training-reports/report-b5cb4ed3-3658-4aa7-b82a-2bb69051bf66.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b5cb4ed3-3658-4aa7-b82a-2bb69051bf66", + "timestamp": "2026-04-14T21:22:14.602041+00:00", + "source_trajectory_ids": [ + "traj-0228a534-97db-4d3f-b9e2-1ae0747bb805", + "traj-0fc13a52-b42b-4bbe-aab7-6c6e947e7307", + "traj-200269e8-5edc-4498-b8f0-fa5697514404", + "traj-6a820e5e-46ab-496f-a3e3-29a3f0d57db0", + "traj-6c9870b9-6588-43cc-b567-1797b6c4c718", + "traj-7817c80c-c9d3-4aa2-90f5-56d349c3b7f1", + "traj-b9a8ad9a-f2ed-4359-a6aa-665179284484", + "traj-d2a809e9-4a0a-47c2-a10f-d8d7d8ddfe13", + "traj-d6482725-07df-443e-899d-fe4c4d48a428", + 
"traj-e100291e-f2b3-413e-a4ec-af5b3266d115" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212214", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b624fb18-a44b-4165-a263-67e79606bab7.json b/docs/training-reports/report-b624fb18-a44b-4165-a263-67e79606bab7.json new file mode 100644 index 0000000..ae6d916 --- /dev/null +++ b/docs/training-reports/report-b624fb18-a44b-4165-a263-67e79606bab7.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-b624fb18-a44b-4165-a263-67e79606bab7", + "timestamp": "2026-04-14T17:17:57.927250+00:00", + "source_trajectory_ids": [ + "traj-02e88bb2-fa0a-4a26-88c5-b64123fe93f7", + "traj-03ca6759-f0a6-4ce6-8ec3-909c0511e7cc", + "traj-0b150651-df2f-4892-beaa-cbf3b3f5cf49", + "traj-832c8656-ecbf-43f9-8a27-788c3c909a28", + "traj-9607751f-10b9-4182-a9ac-8d8435bf223a", + "traj-a4867557-b4a4-4d55-a129-ff075331c72c", + "traj-a77e1df5-c366-4922-a43f-fbfef5ac2caa", + "traj-c4b15c6b-5945-4442-a1d2-ed9bae9fd580", + "traj-d0df5946-d4d0-4578-9119-8c3ea6919ef6", + "traj-e60df802-1d11-48a7-b6cb-a6c96a330ec4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + 
"latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-171757" +} \ No newline at end of file diff --git a/docs/training-reports/report-b6a8a135-b797-4239-a193-ed50dac34580.json b/docs/training-reports/report-b6a8a135-b797-4239-a193-ed50dac34580.json new file mode 100644 index 0000000..85b74e6 --- /dev/null +++ b/docs/training-reports/report-b6a8a135-b797-4239-a193-ed50dac34580.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b6a8a135-b797-4239-a193-ed50dac34580", + "timestamp": "2026-04-14T22:10:23.259626+00:00", + "source_trajectory_ids": [ + "traj-21d4470f-f925-416d-baef-066f54f03eb9", + "traj-2986a37a-ef01-44a2-b284-321dde5911b6", + "traj-359e0bb9-9d3d-4bff-8732-75627ff4d637", + "traj-76524185-f3a4-482a-a200-873003aadf08", + "traj-8881428d-0f16-47b9-bfa6-f33a9c277f36", + "traj-a9ac163e-f0fb-4422-999d-b88049f587b6", + "traj-af08ed4c-f920-4bff-be83-46b595ecb048", + "traj-bcd03d93-f9ac-47d6-9763-d924052351fb", + "traj-bfb72fd6-34d4-4d08-bc51-a9cf99367220", + "traj-ec36d500-0fee-428c-a2eb-b01c432cea15" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221023", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b7383132-01b9-4367-a7f2-1aba245fbadd.json b/docs/training-reports/report-b7383132-01b9-4367-a7f2-1aba245fbadd.json new file mode 100644 index 0000000..cf730e0 --- /dev/null +++ b/docs/training-reports/report-b7383132-01b9-4367-a7f2-1aba245fbadd.json @@ 
-0,0 +1,43 @@ +{ + "report_id": "report-b7383132-01b9-4367-a7f2-1aba245fbadd", + "timestamp": "2026-04-14T18:00:27.811321+00:00", + "source_trajectory_ids": [ + "traj-0689f2e4-311d-487b-89ff-ef90068fadb7", + "traj-0aa5a60c-8c01-4861-9183-8d6074eef288", + "traj-30e01a6f-7e1c-447b-a542-399735b194a9", + "traj-4998be70-3b0a-4d6e-b22b-58aee0911c38", + "traj-52a6a35d-2ddf-48e8-a490-8894d96391a7", + "traj-7a5a71d3-458c-4738-991d-cc130067e868", + "traj-9a52134d-6090-41da-9b9c-3fe33edf484b", + "traj-9d05beab-6c51-4737-ad99-b3f00e950c24", + "traj-9dfbb478-577a-46e0-9550-0262d307b96a", + "traj-b791de9c-eee5-4166-abe7-7080e25fa9e7" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-b7ac968d-1404-47b1-81bd-0d78f65dbeca.json b/docs/training-reports/report-b7ac968d-1404-47b1-81bd-0d78f65dbeca.json new file mode 100644 index 0000000..519612c --- /dev/null +++ b/docs/training-reports/report-b7ac968d-1404-47b1-81bd-0d78f65dbeca.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b7ac968d-1404-47b1-81bd-0d78f65dbeca", + "timestamp": "2026-04-14T19:41:33.403940+00:00", + "source_trajectory_ids": [ + "traj-0fedb768-b218-4e2a-ab56-d0655d6c947a", + "traj-15327bb3-7a2f-420d-97c6-ea80afc54b59", + "traj-21e16eee-5ac2-4b74-a05c-c0dc7020a25e", + "traj-62984473-f827-4a12-b084-651eb6ea553d", + "traj-66bb33bb-6b33-4ec5-9f37-6bf2f6d41bdb", + "traj-6886aa3a-62b8-4cfd-bb14-8cd430e0cba7", + 
"traj-8b3ed693-3a22-4e20-8f2a-8030fb132dcc", + "traj-8fb431e8-7ec8-4f41-8d74-8d6103513ac4", + "traj-af35eb2e-9d6c-4305-8833-4513c5d23e7f", + "traj-b0dc17ea-3956-404a-a52c-87fbf4b5c546" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-194133", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b96a7d12-192c-4a4b-9bb4-6d420db36e62.json b/docs/training-reports/report-b96a7d12-192c-4a4b-9bb4-6d420db36e62.json new file mode 100644 index 0000000..67cc5ae --- /dev/null +++ b/docs/training-reports/report-b96a7d12-192c-4a4b-9bb4-6d420db36e62.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b96a7d12-192c-4a4b-9bb4-6d420db36e62", + "timestamp": "2026-04-14T22:05:43.990256+00:00", + "source_trajectory_ids": [ + "traj-00ce0e5c-3e46-47b0-b69f-131cfd13e311", + "traj-071fd37a-7fe1-4299-8f3b-64013316eb20", + "traj-1a05680f-94fd-4fae-92a9-2cbb55041263", + "traj-4711d2da-0d1e-4d33-863b-d1b1769c7780", + "traj-56fb49ad-8eaf-4c31-82d8-4f99688d0865", + "traj-78ed00a1-be06-4efd-959f-76d172d02081", + "traj-a0e1ce5d-1089-44c2-900f-7c3b298c0234", + "traj-a44ca7d1-78db-46de-b144-42d70a1d0bfc", + "traj-bd696e6c-e004-49f7-9967-b991bbe5369f", + "traj-fdbb3b87-c254-4d81-a06d-ea0ceb7e3093" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220543", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-b9b63402-56c4-4082-ab3b-ce9d6bcb7300.json b/docs/training-reports/report-b9b63402-56c4-4082-ab3b-ce9d6bcb7300.json new file mode 100644 index 0000000..589ac84 --- /dev/null +++ b/docs/training-reports/report-b9b63402-56c4-4082-ab3b-ce9d6bcb7300.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-b9b63402-56c4-4082-ab3b-ce9d6bcb7300", + "timestamp": "2026-04-14T22:09:38.829716+00:00", + "source_trajectory_ids": [ + "traj-26d89301-9410-4f3a-b5e3-abe70b34449b", + "traj-36e6c163-f3c1-4455-a2e3-2086b49ad2ff", + "traj-48e59c9e-bd24-43b3-9f45-40df2a51b0e9", + "traj-69ed9ad6-3c96-46ab-9e03-4293ab89f00e", + "traj-77b4ee5c-3632-4f28-afb3-2d36355a709a", + "traj-8cde9570-81f2-4bf3-b052-5a7e570fe584", + "traj-b1522698-1324-4157-a938-e4eaca620616", + "traj-db8ac9c4-cf34-4202-8392-a84d57eba527", + "traj-f6027573-cd03-4538-9f9f-71702bf9dcf1", + "traj-fa4760e8-f762-4d68-93a9-3a222336aff2" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220938", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ba1732f5-2baf-4c43-933f-988b5154d0f8.json 
b/docs/training-reports/report-ba1732f5-2baf-4c43-933f-988b5154d0f8.json new file mode 100644 index 0000000..d3e71f1 --- /dev/null +++ b/docs/training-reports/report-ba1732f5-2baf-4c43-933f-988b5154d0f8.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ba1732f5-2baf-4c43-933f-988b5154d0f8", + "timestamp": "2026-04-14T22:10:15.421472+00:00", + "source_trajectory_ids": [ + "traj-0011f4e3-1e88-4436-8c14-c7d141d79d46", + "traj-1ecd6c57-826c-49db-ab52-e1591940b7ba", + "traj-22b8c4d4-87b2-4c18-a6e3-4bd606bd668c", + "traj-2c1fc276-d7df-49c0-a650-ac3dc36ce843", + "traj-398e90e6-7085-49f5-b2ad-3d3e5af28be4", + "traj-9a591a6c-8109-49c7-9ea5-a13a1be0072e", + "traj-a1a7f2aa-c82a-4c37-a683-fd3133e1cb41", + "traj-c37d86df-2122-4727-a07b-3a385cec8928", + "traj-cba4cb94-1e32-4157-9d37-81eacb4bb837", + "traj-e850792b-93b4-4902-8882-00b7458dcf0a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221015", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ba50f102-0064-4a66-94b0-f70255912adf.json b/docs/training-reports/report-ba50f102-0064-4a66-94b0-f70255912adf.json new file mode 100644 index 0000000..3de4e76 --- /dev/null +++ b/docs/training-reports/report-ba50f102-0064-4a66-94b0-f70255912adf.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ba50f102-0064-4a66-94b0-f70255912adf", + "timestamp": "2026-04-14T18:05:53.342527+00:00", + "source_trajectory_ids": [ + "traj-23a6ed04-1fca-4471-9829-d6380b446e4d", + "traj-27e30234-7703-41b6-a67c-ea578304d23c", + 
"traj-387b8923-472c-472f-b164-8f8e4e10c109", + "traj-50440795-3a97-4965-baa4-c2e4e879f04e", + "traj-5251de53-5592-463c-9ff4-c4a355dd79ac", + "traj-576d8a39-b1cc-4b0f-b579-9db70c57dad8", + "traj-61d38340-1cfe-478a-8a3d-fa70a0e659db", + "traj-7cde25d9-6c7c-4f50-ae43-cd69f41fdce3", + "traj-9435c839-1ae3-46e7-a1e6-0b2a2558dbc6", + "traj-d2d5388f-f2a9-487c-a491-ecae4e887756" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-bac33bbb-f742-4452-ad1e-3f38f1b1da41.json b/docs/training-reports/report-bac33bbb-f742-4452-ad1e-3f38f1b1da41.json new file mode 100644 index 0000000..ff99991 --- /dev/null +++ b/docs/training-reports/report-bac33bbb-f742-4452-ad1e-3f38f1b1da41.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-bac33bbb-f742-4452-ad1e-3f38f1b1da41", + "timestamp": "2026-04-15T01:41:52.348551+00:00", + "source_trajectory_ids": [ + "traj-1acb8315-4183-408e-baac-6c5d10a76ad0", + "traj-6f54e908-779a-46ea-9b64-0eaa71eb0e79", + "traj-75460b31-a146-4e33-bafd-11431cde6ab5", + "traj-85ab9708-1e7b-430a-9bdd-58f7edf87a91", + "traj-8da986f4-6c63-4cbf-a7b1-7863f137d09f", + "traj-9b2f3762-dc26-4325-9576-1b6ea291af0d", + "traj-a13a088f-68c6-47cf-ac0c-62918b634ecd", + "traj-bd3aabad-dea4-40fe-8eae-1863e68bae66", + "traj-c181d665-ffb1-48a3-9351-70d90f300877", + "traj-e96d6fe3-30e8-429a-b6de-07d9d65b8614" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 
0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-014152", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-bdbfaea5-f10d-4d52-aab6-0c51903a34c6.json b/docs/training-reports/report-bdbfaea5-f10d-4d52-aab6-0c51903a34c6.json new file mode 100644 index 0000000..af9719e --- /dev/null +++ b/docs/training-reports/report-bdbfaea5-f10d-4d52-aab6-0c51903a34c6.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-bdbfaea5-f10d-4d52-aab6-0c51903a34c6", + "timestamp": "2026-04-14T22:05:43.973354+00:00", + "source_trajectory_ids": [ + "traj-013904bf-451f-4fc8-886e-8bd677c396fa", + "traj-12db6565-4add-40a6-bd7a-bdcba778740c", + "traj-23de983d-e612-4b47-a896-5882bad7f55c", + "traj-4410b217-3e73-4d24-93fc-3ecc16709ffe", + "traj-4be8e3ad-07ac-4f8b-adc5-6e9246a666fa", + "traj-b6bcacb9-35e8-4058-8dbd-d8d2efcf82be", + "traj-c48aef4e-be06-4295-8934-0efa2fa62e64", + "traj-c657e1e0-7974-4bdd-8189-386644098129", + "traj-e6af881a-4dcb-456d-9418-874d1fb53f0b", + "traj-eaac1f04-0cb0-4a2a-aa5d-75cc8b312645" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220543", + "baseline_version_id": null, + "dry_run": 
false +} \ No newline at end of file diff --git a/docs/training-reports/report-bf874f6f-6b22-40fb-881b-3a657de8c947.json b/docs/training-reports/report-bf874f6f-6b22-40fb-881b-3a657de8c947.json new file mode 100644 index 0000000..430eed4 --- /dev/null +++ b/docs/training-reports/report-bf874f6f-6b22-40fb-881b-3a657de8c947.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-bf874f6f-6b22-40fb-881b-3a657de8c947", + "timestamp": "2026-04-14T15:04:26.240344+00:00", + "source_trajectory_ids": [ + "traj-1682ca68-24d4-400f-819b-159269099ef9", + "traj-1a0710b3-1ce2-4f92-a905-f6bb334c2e03", + "traj-31847413-2d52-477a-8cda-f5c6aaa96d4e", + "traj-61cbf705-2d7d-4c98-9342-ca21d2acc5a5", + "traj-8a23fecf-d6ca-428e-bbdb-97f932c9d837", + "traj-b880afc6-d668-4cd7-a3c7-1c97e835ba47", + "traj-c95e2379-7f34-4c82-8afb-54934a62dabb", + "traj-d7e0cc79-197b-4e09-9902-be1168fce911", + "traj-e4303b1e-47a3-4a03-be69-523bc53db3dc", + "traj-e7fc1eb5-c4e9-4b1e-8abf-34cd01612431" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-c01cc9e5-1955-4dd9-9c76-c47a81862cbb.json b/docs/training-reports/report-c01cc9e5-1955-4dd9-9c76-c47a81862cbb.json new file mode 100644 index 0000000..e877722 --- /dev/null +++ b/docs/training-reports/report-c01cc9e5-1955-4dd9-9c76-c47a81862cbb.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c01cc9e5-1955-4dd9-9c76-c47a81862cbb", + "timestamp": "2026-04-14T17:38:32.809545+00:00", + 
"source_trajectory_ids": [ + "traj-3972e30c-c93d-4a1e-b53b-f2dee9ec910e", + "traj-51f1987a-30d2-44bd-b0c0-a958bc8cdc04", + "traj-564404fe-f2c1-4962-81a1-6d1c44cfa858", + "traj-652c319b-368f-43a5-a00d-5f5eed161e12", + "traj-6578adb1-ca9c-46ba-b544-d8e3bdd2ae22", + "traj-6b30fbd2-751c-4f50-9bd0-a689840b0687", + "traj-76e06af8-ed94-4866-93ab-1f874dce5926", + "traj-809634f9-e464-4b97-9db7-f574c9160993", + "traj-b99be6b0-71ae-4b3c-9d6a-d3a0159fdf92", + "traj-c24e00a1-8ce2-4178-80b0-ace21f9b3c8b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-c06df51a-9e80-452e-b1cc-97ff0464287a.json b/docs/training-reports/report-c06df51a-9e80-452e-b1cc-97ff0464287a.json new file mode 100644 index 0000000..3e809a2 --- /dev/null +++ b/docs/training-reports/report-c06df51a-9e80-452e-b1cc-97ff0464287a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c06df51a-9e80-452e-b1cc-97ff0464287a", + "timestamp": "2026-04-14T19:41:58.758717+00:00", + "source_trajectory_ids": [ + "traj-04c786ef-d1e2-49fa-9b09-a257924e8a8a", + "traj-141c99cd-6de0-4b94-b1fd-080bdd9682c7", + "traj-31f846ae-2ed7-4b40-8b12-d50dd072068a", + "traj-3f63215d-575c-4cf9-b203-573eca4be6eb", + "traj-5e54b1a3-07be-45b7-8398-341aa6dbfadf", + "traj-86f55c72-139c-43e1-82e5-faf2346bdef8", + "traj-9969138c-855c-4bcd-b3f6-c41c716e4afe", + "traj-9a3b3648-0d8c-4329-9c6b-e05a130b4fe8", + "traj-d0ea1484-2964-4b98-90fe-74c59b0b7e14", + 
"traj-dbccd463-80bf-4cbc-890b-154c9fd631ca" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-194158", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c072450b-837b-4c9d-b890-906af6470902.json b/docs/training-reports/report-c072450b-837b-4c9d-b890-906af6470902.json new file mode 100644 index 0000000..77137df --- /dev/null +++ b/docs/training-reports/report-c072450b-837b-4c9d-b890-906af6470902.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-c072450b-837b-4c9d-b890-906af6470902", + "timestamp": "2026-04-14T21:44:48.318468+00:00", + "source_trajectory_ids": [ + "traj-173485d8-32c3-4e6f-bd00-c3fbab201bd9", + "traj-2ce59a34-0c0f-4109-b7c8-e1b2e881ed9c", + "traj-388456f0-a36a-4ad5-a647-763b00b46a54", + "traj-41c56bb8-ddf9-43fb-9609-3a7b3ca2722d", + "traj-7a4e855e-ce55-4b30-9c94-da0cb7b9337a", + "traj-7fe376fc-2c1d-4d89-8f79-eb74d80f0609", + "traj-8c1525e4-67ce-428d-bb68-5721fc48af03", + "traj-a51b2451-eeab-4b01-ae03-0cfd8a0c751a", + "traj-acf5c8eb-c562-445f-9509-36fb199a5278", + "traj-ff4c62b3-f124-4d4e-8375-393213f36ec0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, 
+ "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c0c51d04-9150-4d99-a072-cb74bd449d57.json b/docs/training-reports/report-c0c51d04-9150-4d99-a072-cb74bd449d57.json new file mode 100644 index 0000000..54fdb51 --- /dev/null +++ b/docs/training-reports/report-c0c51d04-9150-4d99-a072-cb74bd449d57.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c0c51d04-9150-4d99-a072-cb74bd449d57", + "timestamp": "2026-04-15T01:29:18.109113+00:00", + "source_trajectory_ids": [ + "traj-0ae4c52d-e2b9-4c1e-9e82-9e4643eac994", + "traj-0e675169-7841-47a6-b686-fbc5b709e2b5", + "traj-1ed2bcc2-f67a-49be-b23b-36c6576e0f38", + "traj-4a6a7c91-8852-4d20-9e68-b44512d859b3", + "traj-706288b8-dba8-49eb-b4fc-c370c2e70da8", + "traj-7c985ba5-58cd-4890-b237-62476f9b1a64", + "traj-a924efc9-ec0d-4cd3-aaff-35140f3137b7", + "traj-d4e28964-e670-44de-9038-9fe75aec5439", + "traj-ddad6c58-d793-48c8-bd7c-8ec8255eb34a", + "traj-f773c13d-5a33-4fae-bf4e-735aba862ccb" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012918", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c0ea3db2-6e3c-4bb6-974e-4b151528897a.json b/docs/training-reports/report-c0ea3db2-6e3c-4bb6-974e-4b151528897a.json new file mode 100644 index 0000000..94fa481 --- 
/dev/null +++ b/docs/training-reports/report-c0ea3db2-6e3c-4bb6-974e-4b151528897a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c0ea3db2-6e3c-4bb6-974e-4b151528897a", + "timestamp": "2026-04-15T01:25:33.790818+00:00", + "source_trajectory_ids": [ + "traj-0e621cdc-27e6-4aea-8237-1913e07e3a04", + "traj-1f607627-554f-470c-b755-1a345abdf2fe", + "traj-3f268ad9-c65d-4977-a300-92085c2eceb6", + "traj-7e08836a-280c-445b-9bf1-79da285d17cf", + "traj-930c7c4e-914a-45f4-8c8c-4abf43ddf22e", + "traj-9fbe5b46-1c80-444a-b6fa-3f666bbdfb84", + "traj-addf8af3-81ad-4577-b0ed-3d2603965271", + "traj-c1f3e575-d22b-4d20-a06d-9d5c736ce9ad", + "traj-e192a163-fc63-487d-8dc7-e7480cb1b3bd", + "traj-ee92b488-6994-489a-88cb-03d2cd57c852" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012533", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c17a47a0-db7c-4a1b-93ad-9b9b3cb91be6.json b/docs/training-reports/report-c17a47a0-db7c-4a1b-93ad-9b9b3cb91be6.json new file mode 100644 index 0000000..cb3d8f5 --- /dev/null +++ b/docs/training-reports/report-c17a47a0-db7c-4a1b-93ad-9b9b3cb91be6.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c17a47a0-db7c-4a1b-93ad-9b9b3cb91be6", + "timestamp": "2026-04-14T20:02:20.820321+00:00", + "source_trajectory_ids": [ + "traj-15d398ff-cf8a-4740-98b9-6f767eb2701c", + "traj-20a09faf-8c47-47a7-96b3-b5540f4f9741", + "traj-23b2ea87-fb1e-47a6-94c6-f87c6dbd6a6f", + "traj-340baf51-ca18-4930-807e-92da95cab7d1", + 
"traj-3ae727c5-8028-45a1-b721-8b8cd450c9a6", + "traj-89b5b1a5-4794-402c-94ac-ae889bfd0330", + "traj-8fb8a6da-0eeb-4ed9-a2d9-b4327dcc58f6", + "traj-99d911b8-6058-4968-95ad-f65b90564dc9", + "traj-b78b77c4-3968-4a6e-b690-89b24989fc51", + "traj-c294451a-3fcf-4d6d-be30-3dd35d97d46e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200220", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c19bb0a6-2d9a-4989-9457-8af5e912ec9b.json b/docs/training-reports/report-c19bb0a6-2d9a-4989-9457-8af5e912ec9b.json new file mode 100644 index 0000000..be6c979 --- /dev/null +++ b/docs/training-reports/report-c19bb0a6-2d9a-4989-9457-8af5e912ec9b.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c19bb0a6-2d9a-4989-9457-8af5e912ec9b", + "timestamp": "2026-04-14T20:58:05.443004+00:00", + "source_trajectory_ids": [ + "traj-2ea69728-7b80-4764-b555-c3a0d6505837", + "traj-3461130a-cd1d-4690-a394-8c05edcbfde2", + "traj-35e46fbd-f374-489d-97b9-37a5fd6be082", + "traj-3f7b645c-ff20-41ee-8bd5-1a1d8e89710f", + "traj-6f6b27ec-fe35-49c7-b32b-2376e7319c5b", + "traj-81bf4283-a56f-4746-8169-fa772d827a58", + "traj-c4717ea8-6eb9-4376-99d6-9de31e014efd", + "traj-c4ca1daa-bf73-4eaa-8822-88bb915f3956", + "traj-d4c19cb0-13fe-4e5f-af0b-e539c1097a3b", + "traj-f13f5053-3a00-448a-b6db-6be812ed26e4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + 
"task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205805", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c1a8c365-3589-492b-a5ec-bb642ee5a49d.json b/docs/training-reports/report-c1a8c365-3589-492b-a5ec-bb642ee5a49d.json new file mode 100644 index 0000000..fc2e2d7 --- /dev/null +++ b/docs/training-reports/report-c1a8c365-3589-492b-a5ec-bb642ee5a49d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c1a8c365-3589-492b-a5ec-bb642ee5a49d", + "timestamp": "2026-04-15T01:36:36.519137+00:00", + "source_trajectory_ids": [ + "traj-11094fa0-83dd-4230-b7c8-61d9e49012e0", + "traj-21156bd2-738c-4c8d-ba5a-1c43c57e9c8a", + "traj-8323ac23-e4be-45f9-80a8-342f454ebb9b", + "traj-89986fa7-3cbb-4cbd-9c3a-b3ef8bf379bf", + "traj-93ae8e66-ab43-41fb-8d4a-a4897238859a", + "traj-97448ce5-33e4-4730-af43-6eeddbaecf41", + "traj-a345d9ec-af0a-4111-838a-fc42f11a9722", + "traj-adb59401-f166-4167-9800-49782b5cf8ae", + "traj-b37dd814-79b3-4987-a87e-bcdf016d3fd0", + "traj-d3f4a3cf-d3e8-4bfb-9bc3-b339d5587645" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013636", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-c3173371-285f-494d-9b27-9bfe27bd4e32.json b/docs/training-reports/report-c3173371-285f-494d-9b27-9bfe27bd4e32.json new file mode 100644 index 0000000..6a72630 --- /dev/null +++ b/docs/training-reports/report-c3173371-285f-494d-9b27-9bfe27bd4e32.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c3173371-285f-494d-9b27-9bfe27bd4e32", + "timestamp": "2026-04-15T01:57:32.724964+00:00", + "source_trajectory_ids": [ + "traj-45a98f88-a40c-474a-9096-9eccbd472cff", + "traj-51e9d1a3-e056-4dac-8f8f-2682112787c2", + "traj-54b8b23e-bce6-4cce-a33f-9bf27b44d0ac", + "traj-7283e59e-f4ba-46b7-a6d9-4195d74aab77", + "traj-8505e1bd-f4ee-41ed-8301-20dc0b16d263", + "traj-a9787b27-0d54-40c0-810b-3f689dcf3a29", + "traj-c7bdd3d8-358a-44cc-85f7-b0f0c5503df0", + "traj-da3a21e8-ffc5-4424-9fcc-558e6ac6dab0", + "traj-e1938779-6d55-47bc-a36d-e7314f5c8444", + "traj-eed42b8e-78bb-49f3-8976-7982dd41fe55" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-015732", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c3e96ae8-32ea-4bb4-a431-b24e68aee196.json b/docs/training-reports/report-c3e96ae8-32ea-4bb4-a431-b24e68aee196.json new file mode 100644 index 0000000..ce151e2 --- /dev/null +++ b/docs/training-reports/report-c3e96ae8-32ea-4bb4-a431-b24e68aee196.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-c3e96ae8-32ea-4bb4-a431-b24e68aee196", + "timestamp": "2026-04-14T15:25:30.504606+00:00", + "source_trajectory_ids": [ + 
"traj-26936c0b-44c0-436d-b733-3a58d091df06", + "traj-2b2f336b-f53d-446a-b8ba-92dd960b587a", + "traj-42034694-89c7-468c-990e-10fd7fe171ed", + "traj-84bb3272-3a6a-490c-8a77-b54fec9ef499", + "traj-8c507785-8bba-4070-b93b-1c08dfebb435", + "traj-93c7668d-cd94-45a0-8c7c-3e77814d8ff7", + "traj-9750eb36-1d03-4ebd-9b5c-8fae31366bdf", + "traj-aeb8e090-5253-4a3a-a70c-ef7d4184c24f", + "traj-c7421208-ae56-4be1-95dc-508009d01a72", + "traj-fdb9a0ee-a2c1-48ee-8734-033f9604cd16" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-152530" +} \ No newline at end of file diff --git a/docs/training-reports/report-c5aa0414-d36d-4ebe-86ab-ca6b800fa588.json b/docs/training-reports/report-c5aa0414-d36d-4ebe-86ab-ca6b800fa588.json new file mode 100644 index 0000000..9b23fd6 --- /dev/null +++ b/docs/training-reports/report-c5aa0414-d36d-4ebe-86ab-ca6b800fa588.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c5aa0414-d36d-4ebe-86ab-ca6b800fa588", + "timestamp": "2026-04-14T16:49:44.884587+00:00", + "source_trajectory_ids": [ + "traj-2d8901ee-5d76-4dd9-a49d-1c883a02bdea", + "traj-35fe497d-2203-4d17-bc82-3dddfa76d541", + "traj-4c2fcb31-652d-4e72-84e9-cc2ddc0c4601", + "traj-5bdda2b8-0593-4c8e-aa6a-95ac8fa0d108", + "traj-6571e341-23a4-4e58-b377-7929e3e0c3fa", + "traj-90e0e4bc-c9ac-422b-a689-4208b9759f1e", + "traj-b96791b9-6ed9-40b6-919c-58e2af3b7561", + "traj-c25c91bf-dc9e-42f4-ac1a-caae371bf3c3", + "traj-da88d8b5-cf0a-43e8-bd28-600e518d00fb", + "traj-e422e41c-901e-431f-9753-bd0b33c9de36" + ], + "sample_count": 10, + "baseline_metrics": 
{ + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-c852d6d2-752e-4047-8ea1-974f09b93f9a.json b/docs/training-reports/report-c852d6d2-752e-4047-8ea1-974f09b93f9a.json new file mode 100644 index 0000000..e608710 --- /dev/null +++ b/docs/training-reports/report-c852d6d2-752e-4047-8ea1-974f09b93f9a.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-c852d6d2-752e-4047-8ea1-974f09b93f9a", + "timestamp": "2026-04-14T20:34:01.592740+00:00", + "source_trajectory_ids": [ + "traj-05c1d7fa-b0b4-4604-bdcb-d9e7177bdd42", + "traj-0a4153d5-3c8e-4bef-af61-50dc3d96e882", + "traj-148c678f-f8a0-4e20-b0eb-11a16efffdcb", + "traj-6039897d-4781-4c47-88f5-66ddb64671a0", + "traj-91b6d544-77db-4ae3-923b-98d1c29a5eea", + "traj-a02972fc-6eab-4bfe-88a5-8d0e8e76ecb7", + "traj-a36478d9-a059-4b41-8c83-5c6274ccf24d", + "traj-b1830d6e-8a4d-483c-9a0e-c12b36538204", + "traj-dcd714ed-f057-4072-b956-920d641aa8da", + "traj-e4a26d08-16fd-442d-a3ad-f59498229e69" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, 
+ "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-c9311f0d-4049-477f-afe6-7076642f0ec2.json b/docs/training-reports/report-c9311f0d-4049-477f-afe6-7076642f0ec2.json new file mode 100644 index 0000000..c13b492 --- /dev/null +++ b/docs/training-reports/report-c9311f0d-4049-477f-afe6-7076642f0ec2.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-c9311f0d-4049-477f-afe6-7076642f0ec2", + "timestamp": "2026-04-14T20:57:28.048313+00:00", + "source_trajectory_ids": [ + "traj-0c670378-8585-43d8-9842-6271b7f464fe", + "traj-2440d4a0-4f59-4085-a2da-9b8a9106421c", + "traj-70adbbbe-86b6-438e-80be-392553e727cf", + "traj-73e465b5-794e-494b-b2c0-d7c6eb6d62ac", + "traj-9d7ada37-f233-4e88-9a2d-6fc499b29163", + "traj-a2a2f85a-f7d0-43da-8207-a55c60fa7853", + "traj-c1877544-d735-460a-a663-89a62e8e0d90", + "traj-c6ffcb61-0919-40be-baef-14fbf8b1a2d8", + "traj-c9914af3-c974-4e7f-adfb-22576edb65b4", + "traj-da28d36d-fe4d-4c68-9995-10d5193fb01c" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205728", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ca9f7219-01ad-42e2-b6e5-47d0654c30d9.json b/docs/training-reports/report-ca9f7219-01ad-42e2-b6e5-47d0654c30d9.json new file mode 100644 index 0000000..6814e3f --- /dev/null +++ b/docs/training-reports/report-ca9f7219-01ad-42e2-b6e5-47d0654c30d9.json @@ -0,0 +1,43 @@ +{ + "report_id": 
"report-ca9f7219-01ad-42e2-b6e5-47d0654c30d9", + "timestamp": "2026-04-14T22:05:44.007084+00:00", + "source_trajectory_ids": [ + "traj-08efb59a-fa69-4b7f-8be6-b53e3521474e", + "traj-2758fe69-867a-44fc-b2f9-fdaac8e3dcc9", + "traj-455ff9bb-cd0c-46ab-ae65-6f040ed26b3b", + "traj-5172d14f-0bd7-4ef9-9e60-44badea48055", + "traj-5ec85b29-41c4-4ae0-b293-7d0e9158ff6e", + "traj-611d0c0a-96db-486a-905e-5df1aa2a8d98", + "traj-634297e6-68ad-4f79-8816-11a66d5ea299", + "traj-7accfe4e-961d-49c6-bc04-31e8f94453da", + "traj-9065b828-578a-4b33-8ee9-86ec033a78e4", + "traj-a14c8e2a-f256-47fc-96db-ccbb86340e25" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220544", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-cd34b1ad-03d6-4caa-a013-dc3344fbadfc.json b/docs/training-reports/report-cd34b1ad-03d6-4caa-a013-dc3344fbadfc.json new file mode 100644 index 0000000..3719ed3 --- /dev/null +++ b/docs/training-reports/report-cd34b1ad-03d6-4caa-a013-dc3344fbadfc.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-cd34b1ad-03d6-4caa-a013-dc3344fbadfc", + "timestamp": "2026-04-14T20:56:11.558946+00:00", + "source_trajectory_ids": [ + "traj-189f9afd-a820-4b6c-9113-90b783b9221a", + "traj-1a836db9-5cc0-49d7-97ba-652559847588", + "traj-262c5f9b-f588-4f52-93c5-aa91ab26b4b5", + "traj-3e5d9867-cdba-4a2f-a0ee-45927ba29a6b", + "traj-7b20761c-cfed-41a9-9ce2-3e8a51b8a44d", + "traj-9155d91f-1e61-4e48-b7be-c08ce26ac19a", + "traj-ab6306ba-730b-4489-bdf4-fc672b34b542", + 
"traj-b77cc94e-8069-44bc-9c74-1377ab559833", + "traj-c753ac1b-5934-48b0-8592-b00926b43b21", + "traj-e2616df0-0f8a-4747-b3e5-2f8a8a7e51e6" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-205611", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-cf57461d-9cd9-4091-954c-2116899fa819.json b/docs/training-reports/report-cf57461d-9cd9-4091-954c-2116899fa819.json new file mode 100644 index 0000000..33ee044 --- /dev/null +++ b/docs/training-reports/report-cf57461d-9cd9-4091-954c-2116899fa819.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-cf57461d-9cd9-4091-954c-2116899fa819", + "timestamp": "2026-04-14T22:09:38.908061+00:00", + "source_trajectory_ids": [ + "traj-00dc33e8-b08d-455e-bfc0-ea7b2a61d37b", + "traj-2d49f999-6433-4c08-aff0-c397a3510e19", + "traj-5d53bd03-4d74-46d4-8a2e-6a1ea0058b2d", + "traj-6febe1f4-9d67-4c49-a2ec-7982b0f02178", + "traj-743f7d85-4529-4e29-8b5f-4664740fca18", + "traj-8d0bdac7-9384-45dd-99d1-924962681054", + "traj-a3a28ab1-dbf8-4b5f-95eb-5676d2e1f4a3", + "traj-aaf34328-4ac9-4562-95d2-7dd7e38aae1d", + "traj-c5b9b9c4-4a62-4177-b262-62a37b11497c", + "traj-f10615e9-a2c9-46f0-950b-3a6d935bd4ea" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + 
"reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-cfd806b9-d65f-42de-aa48-83d459f3f19c.json b/docs/training-reports/report-cfd806b9-d65f-42de-aa48-83d459f3f19c.json new file mode 100644 index 0000000..22c90b3 --- /dev/null +++ b/docs/training-reports/report-cfd806b9-d65f-42de-aa48-83d459f3f19c.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-cfd806b9-d65f-42de-aa48-83d459f3f19c", + "timestamp": "2026-04-15T01:36:36.645965+00:00", + "source_trajectory_ids": [ + "traj-2b1c7197-9cf6-4832-8373-8d9623ffdbe9", + "traj-4be5befc-b1d7-44a9-829b-b62f9b618e79", + "traj-51263133-7a86-4619-9a3b-c77c72cf6dcb", + "traj-61e9ab6e-ad84-4b9b-bf1f-a854003ea0ef", + "traj-65bc098c-dedc-4c3a-89c9-7bb69de25bf4", + "traj-786afb59-6853-4862-8746-a8e627b4e946", + "traj-b301d62d-79d1-4b7e-9cc6-1ec377d5e674", + "traj-bf2d6c27-aacf-4596-9fae-96882bdebb22", + "traj-c22c3912-924b-4d21-a713-215416810099", + "traj-c64acab7-5425-4ffd-a28a-4ed346193aad" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d02ccedd-fdf8-4dd7-af8f-d132765084f9.json 
b/docs/training-reports/report-d02ccedd-fdf8-4dd7-af8f-d132765084f9.json new file mode 100644 index 0000000..870ecc8 --- /dev/null +++ b/docs/training-reports/report-d02ccedd-fdf8-4dd7-af8f-d132765084f9.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d02ccedd-fdf8-4dd7-af8f-d132765084f9", + "timestamp": "2026-04-14T20:34:01.494134+00:00", + "source_trajectory_ids": [ + "traj-05f6993d-cdf9-45b4-9c4e-90eb1cbbf143", + "traj-46cdd34c-3c0e-49a0-80c5-f25305b57abb", + "traj-70bd5eaf-962d-4f2f-829a-0ec1c92fe092", + "traj-78071eb9-5083-4961-a87c-f6e4569ad858", + "traj-88c5e46a-d37d-415a-adcf-7b74faa641e5", + "traj-8a7671bc-4e2e-45f6-9dff-4ff281592f1d", + "traj-b1ff2f33-7b98-4eed-940b-1c9ecd2acc43", + "traj-b76252cb-642c-4996-a8ba-b65e868b6efc", + "traj-d321e753-9f95-43dc-99bf-379509a3911a", + "traj-ef722e2c-76ec-4a6a-a984-69da5b33ff1f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203401", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d0cd90a5-ef4a-4167-b3db-67c417669aea.json b/docs/training-reports/report-d0cd90a5-ef4a-4167-b3db-67c417669aea.json new file mode 100644 index 0000000..484ff4c --- /dev/null +++ b/docs/training-reports/report-d0cd90a5-ef4a-4167-b3db-67c417669aea.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d0cd90a5-ef4a-4167-b3db-67c417669aea", + "timestamp": "2026-04-14T20:03:02.548083+00:00", + "source_trajectory_ids": [ + "traj-4399ca04-9011-45c9-91b0-663c823ad37f", + 
"traj-45de2e2d-cfb7-44cc-ac79-9dd18f865fd4", + "traj-942fd417-caa3-4e68-bb36-23fb2bfc7fd5", + "traj-95cad70c-56c6-4222-9323-6a561bf5d7dc", + "traj-9a185707-02dd-49d7-8c34-b0ba058b3f22", + "traj-a19dd27e-6a49-4dd5-b2d6-98334ed6c971", + "traj-a9dd1c05-92b9-49fa-9b3a-a58932d5dcbe", + "traj-abc842b8-ad40-41bb-882b-2209b409f32a", + "traj-b0b00a31-288d-42bd-9feb-66a9a7c4f816", + "traj-cb670c02-9592-4d5a-bb11-8ea6d004b3e8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200302", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d34a861c-41f3-430e-af58-0fe460911260.json b/docs/training-reports/report-d34a861c-41f3-430e-af58-0fe460911260.json new file mode 100644 index 0000000..490fa3e --- /dev/null +++ b/docs/training-reports/report-d34a861c-41f3-430e-af58-0fe460911260.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d34a861c-41f3-430e-af58-0fe460911260", + "timestamp": "2026-04-14T21:44:48.308452+00:00", + "source_trajectory_ids": [ + "traj-0268003b-2c0a-417e-95fb-7ad9eac382c6", + "traj-11815e71-e969-46fd-b92a-b9d30c27f818", + "traj-1b21bd87-6895-49a4-aa1a-0e295420a772", + "traj-29625285-f1bd-4114-ad94-099624a58846", + "traj-2e1e2c62-3c9e-4593-bc32-d6e8a8b40821", + "traj-76d68926-c753-4936-b577-d5c30fb06968", + "traj-8138144a-221f-4177-82b6-0eb5fa4660b3", + "traj-b5ae8e3a-3b60-458d-9bcb-a2b47be458b9", + "traj-cf563c03-cd68-44ab-af02-8ddb54dab197", + "traj-efa123d6-2c74-4c0e-beae-761934a045f2" + ], + "sample_count": 10, + 
"baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214448", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d448c7ff-8162-4ae7-b489-68c67675a752.json b/docs/training-reports/report-d448c7ff-8162-4ae7-b489-68c67675a752.json new file mode 100644 index 0000000..cc6804c --- /dev/null +++ b/docs/training-reports/report-d448c7ff-8162-4ae7-b489-68c67675a752.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d448c7ff-8162-4ae7-b489-68c67675a752", + "timestamp": "2026-04-14T21:21:15.238780+00:00", + "source_trajectory_ids": [ + "traj-263df466-1bd9-490d-8225-69fa64b10c8b", + "traj-3a6196d2-f94e-4149-bc8c-4637d1939a79", + "traj-4c2cf6fb-dd03-477f-8570-932f2254af01", + "traj-81cc8735-94cd-4a94-99df-19ce67d282a1", + "traj-91245c94-2620-4ee2-ab0a-20542cc83bb5", + "traj-91fcb473-9f0e-4187-a3a1-84454daea52e", + "traj-ad751016-d9a6-4464-9286-694dc8f9272e", + "traj-b62924a5-5ccb-40e6-87d7-bc42bad5ad4f", + "traj-b9223b5f-8dca-4965-a4cd-08e060a91542", + "traj-d453b4a0-666c-4ea9-a930-faaef9bf8f4b" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + 
"promoted_version_id": "20260414-212115", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d46767dc-7142-47c7-9456-44db4df475a7.json b/docs/training-reports/report-d46767dc-7142-47c7-9456-44db4df475a7.json new file mode 100644 index 0000000..6008c45 --- /dev/null +++ b/docs/training-reports/report-d46767dc-7142-47c7-9456-44db4df475a7.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-d46767dc-7142-47c7-9456-44db4df475a7", + "timestamp": "2026-04-14T18:05:53.292364+00:00", + "source_trajectory_ids": [ + "traj-055b1e88-d24e-4b66-8638-4ac1afbb7193", + "traj-069542e2-fbf2-478a-845e-43094feb4ce1", + "traj-306f78fb-0151-4f58-9ae5-4959fc2ea2a6", + "traj-3b806f93-2ed5-4961-b742-5d3d21a7acf4", + "traj-4e391bc5-c6bf-461b-ba26-aadbb0de214a", + "traj-5c096700-bfca-4f6a-9abc-af1d895cd62f", + "traj-6d99cef7-f649-4ed5-9915-76717baba1a1", + "traj-8a0d9f86-076e-44fe-b0d8-9b879b8db845", + "traj-bdd8ffd2-5bb1-4c4a-ba73-7f5b1f2703be", + "traj-d87d887a-96f2-403c-a5a9-fdfcf7cc846c" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180553" +} \ No newline at end of file diff --git a/docs/training-reports/report-d5d4708a-a9c1-4e32-a1eb-80a156055374.json b/docs/training-reports/report-d5d4708a-a9c1-4e32-a1eb-80a156055374.json new file mode 100644 index 0000000..b2c75e2 --- /dev/null +++ b/docs/training-reports/report-d5d4708a-a9c1-4e32-a1eb-80a156055374.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-d5d4708a-a9c1-4e32-a1eb-80a156055374", + "timestamp": 
"2026-04-15T01:29:18.250752+00:00", + "source_trajectory_ids": [ + "traj-239200f5-a6ad-4189-bad1-cc34d0ec57b5", + "traj-8a3f24bf-578a-472e-bbb1-3e2c701d1bfc", + "traj-9fb35b1f-b9db-4bed-9fcc-6d812b8eefaa", + "traj-a6180f6a-42b5-4cda-886b-2c50c9056f31", + "traj-a80dad6a-e710-4a45-ab31-151ffa342603", + "traj-b47c1410-ad27-449b-b446-905bd47ce262", + "traj-b640d93e-1258-4c59-9157-ab6c6d1d95d8", + "traj-cb46a313-1a0c-4be9-b757-9d122db50752", + "traj-ce579216-ac1c-40a3-9fe7-500d6cad014f", + "traj-eb4c2c46-7e44-44a6-9aa5-1b909cfc7dd4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d607641c-8460-44ea-b9f0-1fbb9ca0dad5.json b/docs/training-reports/report-d607641c-8460-44ea-b9f0-1fbb9ca0dad5.json new file mode 100644 index 0000000..23fbb9b --- /dev/null +++ b/docs/training-reports/report-d607641c-8460-44ea-b9f0-1fbb9ca0dad5.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d607641c-8460-44ea-b9f0-1fbb9ca0dad5", + "timestamp": "2026-04-14T21:44:48.170031+00:00", + "source_trajectory_ids": [ + "traj-132ee788-dba2-4fe7-ace2-f86e4e58e012", + "traj-32a84d4f-8c0b-4f35-aaac-3f90b3e3aa3e", + "traj-3868936e-a6bb-4b79-bdcb-2818757709a8", + "traj-391c8367-54ab-464b-84a4-b87c18a398cf", + "traj-4c2774f5-414b-47fb-942e-bcd42b3dfd7f", + "traj-6d5eb242-82dd-4c6c-8b42-b7491b15c83d", + "traj-94718bcc-1b5e-4651-ad96-6e00a9be07bd", + 
"traj-c344578f-70c3-490a-b813-4e4ce9dbf99c", + "traj-e88dda13-55bf-4428-abe7-de91a2b40d20", + "traj-fd02e009-81c4-4280-a3e5-9405570c36c1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214448", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d629a53f-1527-4984-ad23-a344dd0ee965.json b/docs/training-reports/report-d629a53f-1527-4984-ad23-a344dd0ee965.json new file mode 100644 index 0000000..9ca01ed --- /dev/null +++ b/docs/training-reports/report-d629a53f-1527-4984-ad23-a344dd0ee965.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d629a53f-1527-4984-ad23-a344dd0ee965", + "timestamp": "2026-04-14T22:10:23.241693+00:00", + "source_trajectory_ids": [ + "traj-26a8e8b2-d28f-482d-bb80-153f522871bd", + "traj-2905e431-4574-4e8f-a8f8-34aa87bcff50", + "traj-41318a60-642e-4c1b-bbaf-51f083dd0063", + "traj-62037c5e-06d5-4319-803b-5769f511f89a", + "traj-71ae1c04-b498-4a33-94af-ec86e9e198b0", + "traj-b578d761-010a-4851-bff5-258ed555a9e8", + "traj-c3a05916-8867-4733-99a1-f3f83ff6aae6", + "traj-ec177665-1bad-42b6-b316-ee9b28205f4a", + "traj-f97e531f-e0a2-4bef-b6bc-bdc777029898", + "traj-fdb67019-e369-4a27-a150-c17ee293a493" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + 
"metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221023", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d71aa4f7-2e1e-4bae-b564-3f9fc6c6cd3d.json b/docs/training-reports/report-d71aa4f7-2e1e-4bae-b564-3f9fc6c6cd3d.json new file mode 100644 index 0000000..7ff9a0a --- /dev/null +++ b/docs/training-reports/report-d71aa4f7-2e1e-4bae-b564-3f9fc6c6cd3d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d71aa4f7-2e1e-4bae-b564-3f9fc6c6cd3d", + "timestamp": "2026-04-15T01:33:34.954131+00:00", + "source_trajectory_ids": [ + "traj-0d72cd8c-e2fa-4096-a8ec-9e2bb8c17761", + "traj-3e3680b5-5eca-4f19-b92a-d2b6819fb0d1", + "traj-7429f563-4a87-49aa-b8c3-d4a37535dcfc", + "traj-77212f26-a4d8-41b3-9a49-9dc59e01d192", + "traj-8de21233-aef7-46f8-a320-b0c7892cdb0d", + "traj-8e287f1a-b36e-4394-b333-a391248606ac", + "traj-9f1cdd1a-64dd-4e8f-aaf6-0c891dc2fbd3", + "traj-b4ef0eca-88c9-4bda-8599-5846267e0536", + "traj-c41750ac-d969-47de-8375-694911136b5b", + "traj-df7662d9-c080-43b8-b86b-a2452b120f9c" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013334", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-d7b956e1-c2c8-4105-9765-2c11a59903c2.json b/docs/training-reports/report-d7b956e1-c2c8-4105-9765-2c11a59903c2.json new 
file mode 100644 index 0000000..c842607 --- /dev/null +++ b/docs/training-reports/report-d7b956e1-c2c8-4105-9765-2c11a59903c2.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d7b956e1-c2c8-4105-9765-2c11a59903c2", + "timestamp": "2026-04-14T17:16:23.861811+00:00", + "source_trajectory_ids": [ + "traj-5771e541-b118-49dc-9a61-cd498528ea45", + "traj-67b4ae15-c4a8-439b-b5af-e137fc243746", + "traj-7210de2c-aaaa-4bcb-a2e4-51585e1e2867", + "traj-72d6eb97-1251-4017-92bd-6e8030293d32", + "traj-84fada4a-85b7-4e67-a54f-1481c33d20c6", + "traj-972862cf-1d1e-46b9-8510-5314d5c42da5", + "traj-aedce72f-34fb-4d38-9c5d-2b4d8b0e8961", + "traj-d5ecb440-0907-419e-9d6d-024e703168da", + "traj-db015c45-fd5a-445a-b5fb-07a14e0d764e", + "traj-e923acdf-8fbc-4164-bd3b-50b633f273a6" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-d8a735d5-1511-4867-ae34-d5450578b4eb.json b/docs/training-reports/report-d8a735d5-1511-4867-ae34-d5450578b4eb.json new file mode 100644 index 0000000..17da59f --- /dev/null +++ b/docs/training-reports/report-d8a735d5-1511-4867-ae34-d5450578b4eb.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-d8a735d5-1511-4867-ae34-d5450578b4eb", + "timestamp": "2026-04-14T20:33:28.544173+00:00", + "source_trajectory_ids": [ + "traj-05634505-966b-4ca5-bee5-2acc0f375da9", + "traj-356893fa-ddfb-4cee-bc10-2c80601a964a", + "traj-5de517e3-4302-45d5-ad1a-8995423cb0d8", + "traj-6095abea-d8e2-4f2c-ad89-1a8ffc170286", + 
"traj-86b7818f-5f1c-4c2c-ab23-5facae4b6861", + "traj-b37c76d7-bbd0-47b0-88f8-054667e29d70", + "traj-c0483a4c-2f25-4c22-9f8d-72846331c765", + "traj-dda694aa-dc3f-4b5e-9f0e-9721b1d04905", + "traj-de3c0563-fcd6-4499-a349-1679468d6ab8", + "traj-fc6e6767-f20e-44c6-a2a1-3807426c1e61" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203328", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-da9f5483-f9e7-45b1-a5a6-8293c3fa1997.json b/docs/training-reports/report-da9f5483-f9e7-45b1-a5a6-8293c3fa1997.json new file mode 100644 index 0000000..7381412 --- /dev/null +++ b/docs/training-reports/report-da9f5483-f9e7-45b1-a5a6-8293c3fa1997.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-da9f5483-f9e7-45b1-a5a6-8293c3fa1997", + "timestamp": "2026-04-14T20:04:58.852352+00:00", + "source_trajectory_ids": [ + "traj-3e4bd63d-9d95-4c40-add0-d4a2bfe7b522", + "traj-4d0ff1e6-774b-4f7f-992e-3b8e0ab255ad", + "traj-5ff66524-3a06-4d48-8252-7fac0cd6d9af", + "traj-9c02aa8c-185e-4bb7-b7bc-4d4bdb5889ef", + "traj-b7459c9c-73d3-430f-aeed-2371437f31d4", + "traj-bfab0455-55af-4c50-b7fd-0fda053de88e", + "traj-c4b55069-fa9d-48f0-b949-7481d22a16fb", + "traj-e137f479-cd23-4b44-94aa-4fb4119fdac9", + "traj-e5a044db-fd31-4e93-b533-12ce84634bd9", + "traj-ff43d8b5-55ff-4491-81b3-7429fa9c3455" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, 
+ "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-daf13edb-5a29-462a-b09c-f5f3ca73e757.json b/docs/training-reports/report-daf13edb-5a29-462a-b09c-f5f3ca73e757.json new file mode 100644 index 0000000..b357dd4 --- /dev/null +++ b/docs/training-reports/report-daf13edb-5a29-462a-b09c-f5f3ca73e757.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-daf13edb-5a29-462a-b09c-f5f3ca73e757", + "timestamp": "2026-04-15T02:33:47.842875+00:00", + "source_trajectory_ids": [ + "traj-1020f220-92a2-4654-baa6-72e52ec71053", + "traj-47295e54-e0dc-4ef5-82f4-96954496198d", + "traj-4db1eccf-61c3-4333-8345-eddacc8e27cc", + "traj-5eadd414-270d-455b-8df9-a827f8f5c585", + "traj-75ce6d90-cf75-4852-afb1-f42bf31120a6", + "traj-7ebb66f4-5ade-4815-b147-33179c8d0cfe", + "traj-823c8d29-ee1f-4701-a18f-bd7a847452c7", + "traj-8df5ba18-78e0-4546-913b-01b887bf1015", + "traj-e06e46d9-485c-43ec-a2d8-7c166781dff6", + "traj-f857f2ad-8f5a-4770-ae6f-1d07d91e2be3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ 
No newline at end of file diff --git a/docs/training-reports/report-db8988e0-a578-4fe7-9cdb-4676e7df9533.json b/docs/training-reports/report-db8988e0-a578-4fe7-9cdb-4676e7df9533.json new file mode 100644 index 0000000..1be9d3b --- /dev/null +++ b/docs/training-reports/report-db8988e0-a578-4fe7-9cdb-4676e7df9533.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-db8988e0-a578-4fe7-9cdb-4676e7df9533", + "timestamp": "2026-04-14T22:10:15.259647+00:00", + "source_trajectory_ids": [ + "traj-05daae2e-2ddf-451c-bbed-0a541389b238", + "traj-3cc8cddd-56b4-4e19-aa8d-607a780d51f9", + "traj-65595747-c587-473b-af09-536b4a1ed09f", + "traj-87ffdde5-e669-4c1b-96f1-0750f43c2e72", + "traj-9e62f3ea-6852-425e-9495-1a4f4f434834", + "traj-a7b4ed18-5bfa-4c66-bbd5-ba68b1faf8d9", + "traj-b2d30603-6c44-417b-9ffe-fc10ca98cd85", + "traj-bba3a9f1-fe90-49c1-b589-bba3492b7a6e", + "traj-eee3a4a3-74be-4b5a-940c-3b28df374b3a", + "traj-f08030e2-1ddc-4447-9546-ea1ade988688" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-221015", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-deb97363-3e48-4492-b1fe-81c5b2f0cbe9.json b/docs/training-reports/report-deb97363-3e48-4492-b1fe-81c5b2f0cbe9.json new file mode 100644 index 0000000..8a76f1d --- /dev/null +++ b/docs/training-reports/report-deb97363-3e48-4492-b1fe-81c5b2f0cbe9.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-deb97363-3e48-4492-b1fe-81c5b2f0cbe9", + "timestamp": "2026-04-14T20:07:38.768519+00:00", + 
"source_trajectory_ids": [ + "traj-0b76be8b-2770-49d3-b2f6-d7aaf678fdda", + "traj-63111548-ac20-481f-908e-fb426bfa3000", + "traj-73abe76c-84e9-42b2-baa4-6eb180186c38", + "traj-73b8a06f-e93f-4282-a10b-f1bd4f83d317", + "traj-770a367b-9cb8-4b13-aef8-78ccd6992052", + "traj-ad4a2a80-851a-4bac-92e7-26854f165be9", + "traj-cfc3d5e2-00d6-4394-a7f6-40128541dd79", + "traj-ec0c07bb-a8cb-4816-98ec-e749a98971e3", + "traj-ee41c065-c055-4797-8d3f-730e63732f0f", + "traj-ef5a45f8-be53-4450-812d-0bbd1fdb13d8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200738", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-df7c7b8d-f5a1-45e0-ac6d-8812a3cfdd82.json b/docs/training-reports/report-df7c7b8d-f5a1-45e0-ac6d-8812a3cfdd82.json new file mode 100644 index 0000000..85c53e5 --- /dev/null +++ b/docs/training-reports/report-df7c7b8d-f5a1-45e0-ac6d-8812a3cfdd82.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-df7c7b8d-f5a1-45e0-ac6d-8812a3cfdd82", + "timestamp": "2026-04-15T01:21:53.687956+00:00", + "source_trajectory_ids": [ + "traj-080955e6-3524-4392-aac5-36c7edabbf77", + "traj-16b8d58c-1d3d-4b2a-9600-c4107f1ba2cd", + "traj-4383bc8f-cdf0-46c7-b5ad-6eb4317b7761", + "traj-6de7e63e-ad6d-45f3-823f-07a37bbc1cee", + "traj-a9f6837a-bfdb-4438-aa70-f1bef125ff76", + "traj-e087b8ea-3d7d-41d7-91c7-371b55220151", + "traj-e45207bc-8291-471a-8b05-2aae927274e7", + "traj-f4cc6c86-e6ee-4a4b-8352-ecd52ee6bda4", + "traj-f6e18ddd-c91e-4da0-840c-cc78e953b8e2", + 
"traj-f9ea3c5f-7b2e-4c8d-bff7-64e56391c5e4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012153", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-df953479-f64a-4d9b-b437-0820f8ade9cb.json b/docs/training-reports/report-df953479-f64a-4d9b-b437-0820f8ade9cb.json new file mode 100644 index 0000000..ac70f52 --- /dev/null +++ b/docs/training-reports/report-df953479-f64a-4d9b-b437-0820f8ade9cb.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-df953479-f64a-4d9b-b437-0820f8ade9cb", + "timestamp": "2026-04-14T18:05:15.780624+00:00", + "source_trajectory_ids": [ + "traj-224d7e4b-dee5-459e-94dc-41a7c5d1cafe", + "traj-3be09bf7-ccd8-4e5a-9d16-43e0dd11cec4", + "traj-44741aba-78ff-429e-87ae-13a3f4178304", + "traj-46518a93-9201-4745-92c6-799205b18a1d", + "traj-4f5ca0b0-e9fc-44c7-a997-1c690cc69095", + "traj-50b3c733-51a6-47d6-89b9-cdaec97cce2b", + "traj-791b50b2-c319-425c-91cd-76fa51a09df1", + "traj-888930f8-507f-408d-98e7-a331705ae548", + "traj-9de333f2-5263-4c42-b3ac-870836328a9c", + "traj-d76fb56e-ebc2-4a33-a298-5000ad94778f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + 
"baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180515" +} \ No newline at end of file diff --git a/docs/training-reports/report-e0290bbc-50f9-4284-b44e-953f7b686a87.json b/docs/training-reports/report-e0290bbc-50f9-4284-b44e-953f7b686a87.json new file mode 100644 index 0000000..889bb4e --- /dev/null +++ b/docs/training-reports/report-e0290bbc-50f9-4284-b44e-953f7b686a87.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e0290bbc-50f9-4284-b44e-953f7b686a87", + "timestamp": "2026-04-15T01:25:33.972369+00:00", + "source_trajectory_ids": [ + "traj-190b4a5c-bdac-4aff-94cd-0e28366ce081", + "traj-3beb0139-7c6c-493e-a6d9-a02264781d7c", + "traj-4e922a44-282e-47ae-8e45-741189d712d2", + "traj-5dc65f8c-ea10-4291-a7d2-9b4a784c3a48", + "traj-6ee5e2fa-2d80-49bf-8216-d5582320bcc0", + "traj-887c6376-9df1-47d6-a9f0-84df3de7aaab", + "traj-8a3b3a1c-73bd-449e-a090-a688246361e9", + "traj-af0dcd29-6552-4092-b9a6-3daf4580d06b", + "traj-c46cff1d-05f9-4233-a7eb-6c89e019c827", + "traj-fc51fe5c-b4d1-4245-88c8-0fa34c5b4820" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012533", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e0d5b2d1-9c97-4e39-938e-fa26725a38ea.json b/docs/training-reports/report-e0d5b2d1-9c97-4e39-938e-fa26725a38ea.json new file mode 100644 index 0000000..97b1e1c --- /dev/null +++ b/docs/training-reports/report-e0d5b2d1-9c97-4e39-938e-fa26725a38ea.json @@ -0,0 +1,41 @@ +{ + 
"report_id": "report-e0d5b2d1-9c97-4e39-938e-fa26725a38ea", + "timestamp": "2026-04-14T17:38:32.766277+00:00", + "source_trajectory_ids": [ + "traj-066155cb-7006-450d-be8c-38f5c0551da5", + "traj-0fa878bc-b6a3-4642-83ff-41148eb173ad", + "traj-2f30ecb8-f2fe-4d2b-997c-8852b6fd33b0", + "traj-47893dab-ad1d-4778-a5b1-c51835d34a60", + "traj-5ba0e654-2be3-407c-85a8-358be8b81d09", + "traj-681cab8d-832c-487d-9dff-ba2d7016a72a", + "traj-753ac1b1-416c-429a-a6d3-01d8eac1d125", + "traj-beafa3ca-5c96-4449-839a-ce18c14c641e", + "traj-cd97d02c-56b4-479f-83cd-0a1541cbcbec", + "traj-e2a7ff20-3e59-4ec1-a4b3-a7d8781d81a8" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-173832" +} \ No newline at end of file diff --git a/docs/training-reports/report-e16add80-5ba4-44b5-9432-56da24eb0ffa.json b/docs/training-reports/report-e16add80-5ba4-44b5-9432-56da24eb0ffa.json new file mode 100644 index 0000000..870ec62 --- /dev/null +++ b/docs/training-reports/report-e16add80-5ba4-44b5-9432-56da24eb0ffa.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e16add80-5ba4-44b5-9432-56da24eb0ffa", + "timestamp": "2026-04-14T22:08:57.425891+00:00", + "source_trajectory_ids": [ + "traj-137adbba-3299-4e2a-80bc-dea7feddd5f7", + "traj-1f6bee3f-e284-4ab5-9137-7f3edab1ef3c", + "traj-2ea814f1-5344-465e-bb79-be735b66dd8d", + "traj-66cffda6-b6ee-498b-b8bb-cdbb00e85004", + "traj-916df637-b8bf-4330-b800-d1085ebf8671", + "traj-ad7652bf-b165-4da4-b365-28a848fd95b5", + "traj-d1a1e96b-9858-4b7c-a1a6-136eb3db8c3f", + "traj-de3bc12f-5083-4544-b3b1-d3f3285e37af", 
+ "traj-f24ae176-1313-4812-86c9-e6ff010967fa", + "traj-fd65a143-4f58-4cd4-95be-bc832d7991f9" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220857", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e1e80ed1-a1c1-4853-a7c6-2a64a7298bfe.json b/docs/training-reports/report-e1e80ed1-a1c1-4853-a7c6-2a64a7298bfe.json new file mode 100644 index 0000000..058d943 --- /dev/null +++ b/docs/training-reports/report-e1e80ed1-a1c1-4853-a7c6-2a64a7298bfe.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-e1e80ed1-a1c1-4853-a7c6-2a64a7298bfe", + "timestamp": "2026-04-15T01:33:34.877747+00:00", + "source_trajectory_ids": [ + "traj-09fe4d7c-5710-4124-8e32-4f2a5f2acc33", + "traj-0ebd77c5-f37c-4dff-a674-5681833cef6d", + "traj-2a5f1f91-44b1-4f46-892f-17c17dbb3e4a", + "traj-5ee5bcc4-4b7f-4d87-8f45-00216a4988b5", + "traj-7a0eda78-1571-4600-97eb-f2038850125f", + "traj-8587d4ad-d609-41b1-8c1f-22b1db7b9f4e", + "traj-b62704ca-b248-4d6d-aa6e-95b5cb202311", + "traj-b64969d6-2863-4bf1-acd7-14b1b445d276", + "traj-bf0fd2a9-03a6-4e7c-920a-20d38586ae1a", + "traj-fff422bb-122b-4c00-996b-be99e5ee72de" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + 
"metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e417488b-5ae1-453c-a778-048212c656be.json b/docs/training-reports/report-e417488b-5ae1-453c-a778-048212c656be.json new file mode 100644 index 0000000..6e29e12 --- /dev/null +++ b/docs/training-reports/report-e417488b-5ae1-453c-a778-048212c656be.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-e417488b-5ae1-453c-a778-048212c656be", + "timestamp": "2026-04-14T22:05:59.262693+00:00", + "source_trajectory_ids": [ + "traj-00ea1a3d-6882-4346-9644-003aeb653785", + "traj-340a9b1a-f6a1-4096-9e39-880299c5c17b", + "traj-5c44a81a-2c38-4de5-91b9-cadcab8bf75e", + "traj-7d181b97-19e6-4f2d-954f-e0d2bf05d3f5", + "traj-7e6381ae-9f4a-410b-9389-0bbcb0f5006b", + "traj-81b00183-53f7-44d6-aa30-6cc5bf18498e", + "traj-a060dbb8-3c32-4bbc-b22a-be280ca4abec", + "traj-a66d985e-a7b7-4c12-8c0c-16cad34dc1f8", + "traj-c3b2bdfe-fa31-4bf4-8356-80ce021888e4", + "traj-e2b08721-e6b2-4cdb-93b6-4332eaa24f7e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e5d1ccc5-92e7-4b4e-970d-3c295320874d.json b/docs/training-reports/report-e5d1ccc5-92e7-4b4e-970d-3c295320874d.json 
new file mode 100644 index 0000000..05dd838 --- /dev/null +++ b/docs/training-reports/report-e5d1ccc5-92e7-4b4e-970d-3c295320874d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e5d1ccc5-92e7-4b4e-970d-3c295320874d", + "timestamp": "2026-04-14T20:06:16.329964+00:00", + "source_trajectory_ids": [ + "traj-00cd4485-1631-4d40-8808-9c58ff380031", + "traj-01cbcaf2-a53a-4d43-9147-fc68e054ef42", + "traj-4dbf5b3b-687e-4376-8ec5-5c2a74dedd1d", + "traj-6aeabfa2-c245-4687-8d69-45bcffc27c9a", + "traj-a2f6af6e-ec0b-4117-9bc6-a77d6982b811", + "traj-a47370d1-89ec-4f55-841c-cdcd4a2a5807", + "traj-a69dc48e-b005-490e-ad65-812a0bc4bdf6", + "traj-ca5a2a5d-5b1e-4ed5-9239-5b528d2fa1ed", + "traj-cb7942a4-a3c7-40ff-ad16-4a32b71bd14d", + "traj-ffd0712c-7a9b-414c-96f2-a611100cb668" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-200616", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e691ec63-98f9-4239-820c-5bf6d57de243.json b/docs/training-reports/report-e691ec63-98f9-4239-820c-5bf6d57de243.json new file mode 100644 index 0000000..31ca5c0 --- /dev/null +++ b/docs/training-reports/report-e691ec63-98f9-4239-820c-5bf6d57de243.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e691ec63-98f9-4239-820c-5bf6d57de243", + "timestamp": "2026-04-14T15:25:30.542190+00:00", + "source_trajectory_ids": [ + "traj-040359a5-00c1-4a34-a9fb-ba92c413b7b3", + "traj-0d41a9da-0c99-4eb2-9d3a-708b9b7b82db", + "traj-3df2f77a-e3ec-4417-8791-26ce0676f84a", + 
"traj-8e50fb5a-5cc0-4365-adbe-198bef7e4420", + "traj-8f0b868a-d9dd-4dbe-a02e-ce9b8c04c2ea", + "traj-c5783fab-637b-46e1-958c-5e92ed130ee9", + "traj-ce5196c6-665c-4279-8985-13bc6537cf00", + "traj-db9d61be-2302-48d0-a646-c0f2308789b1", + "traj-e72c549b-6f44-48cf-b2df-0c236e44cfa6", + "traj-ea4ec480-c25d-477e-9f46-79783b2bf7c4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-e6964e86-8d88-44b5-b71f-646d511c1760.json b/docs/training-reports/report-e6964e86-8d88-44b5-b71f-646d511c1760.json new file mode 100644 index 0000000..510ae48 --- /dev/null +++ b/docs/training-reports/report-e6964e86-8d88-44b5-b71f-646d511c1760.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e6964e86-8d88-44b5-b71f-646d511c1760", + "timestamp": "2026-04-14T22:05:59.127680+00:00", + "source_trajectory_ids": [ + "traj-2809c098-f8eb-4033-b427-a1877fae79ce", + "traj-2a4ac1d6-e8b4-4470-b6df-5cf001962453", + "traj-79d8dcbd-2ffd-4052-9851-4606f29a7924", + "traj-89c138ae-808c-4e52-a721-dcdb1d5ad6cb", + "traj-a24b86ef-0afd-48df-be9c-fb4a0c2c32fc", + "traj-abf933d5-9e37-478c-ad3a-ec5292d3f091", + "traj-acb50486-6301-49af-9dc1-38ea94a3671a", + "traj-b3caf571-6c26-4fd9-83fc-7e6ad2142f90", + "traj-c9179c98-1c94-4df9-b829-46bea8866bc6", + "traj-e85ff765-bb03-4ddd-8907-5f3a460e5ee1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220559", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e7c8c800-0285-451c-a381-9d973b09a28d.json b/docs/training-reports/report-e7c8c800-0285-451c-a381-9d973b09a28d.json new file mode 100644 index 0000000..dd7654b --- /dev/null +++ b/docs/training-reports/report-e7c8c800-0285-451c-a381-9d973b09a28d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e7c8c800-0285-451c-a381-9d973b09a28d", + "timestamp": "2026-04-14T16:53:59.603802+00:00", + "source_trajectory_ids": [ + "traj-021fe9c7-2219-4720-a611-00be43fd45a5", + "traj-3057337f-dfa6-4cad-9a75-0fd7578ad26e", + "traj-4be526b9-7ece-4472-969f-958300b7de46", + "traj-5d648d76-9044-4bfb-8698-5d82edaaae41", + "traj-8d9f783d-4a01-4336-b021-34e0619afa39", + "traj-bf8c7b99-aed9-4c4f-8fb8-779b87ddc8f7", + "traj-dfff1285-8e2a-4cac-8f2b-a75ebb7caae6", + "traj-ec2344f9-6741-4806-8b0c-1de561cf4b3d", + "traj-f0f0723d-da65-4db5-a8ac-8cc96fafe09d", + "traj-ff4b3b70-e611-4736-8494-fc580cacb7cc" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git 
a/docs/training-reports/report-e816962d-950c-44d5-9c56-84fbac38ad03.json b/docs/training-reports/report-e816962d-950c-44d5-9c56-84fbac38ad03.json new file mode 100644 index 0000000..66b6cf8 --- /dev/null +++ b/docs/training-reports/report-e816962d-950c-44d5-9c56-84fbac38ad03.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-e816962d-950c-44d5-9c56-84fbac38ad03", + "timestamp": "2026-04-14T15:29:41.997128+00:00", + "source_trajectory_ids": [ + "traj-0f91f7d5-8baa-41af-acf4-89fd1912dadd", + "traj-363ee6c2-07ec-4694-bcc1-34f1f5be50fe", + "traj-3b44d3de-1632-4c52-9703-839eb64cd8c0", + "traj-497af52f-3274-4304-b628-f877c7ea9b05", + "traj-54d7eb22-d6ef-4e27-a62c-aa29ac1cad80", + "traj-7a4077ab-a811-41d3-88a5-d88ca6af6697", + "traj-c3f0b796-d860-4cb0-96d7-9908b4c6765a", + "traj-d0b1e945-a422-4d68-979d-92a86c6dc849", + "traj-d0f36666-a9b4-419c-8bb1-5e3a675331d5", + "traj-e0bc1956-9463-4d5a-bd67-da6f6b375604" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-152941" +} \ No newline at end of file diff --git a/docs/training-reports/report-e8c7784a-ae81-43d4-b452-69c1f3cd4b57.json b/docs/training-reports/report-e8c7784a-ae81-43d4-b452-69c1f3cd4b57.json new file mode 100644 index 0000000..ca66f8c --- /dev/null +++ b/docs/training-reports/report-e8c7784a-ae81-43d4-b452-69c1f3cd4b57.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e8c7784a-ae81-43d4-b452-69c1f3cd4b57", + "timestamp": "2026-04-14T22:08:57.406399+00:00", + "source_trajectory_ids": [ + "traj-13684ccc-9d94-4432-9e30-bbc6c63867c3", + 
"traj-1572c18d-e850-40e5-a174-64077d127dee", + "traj-21d8d6cc-82cf-4dc9-9bf3-2ed4bee40057", + "traj-3ff1b908-b792-4706-b8b4-8c34dbeb478e", + "traj-91502573-f6b7-4242-8794-c4540083911a", + "traj-9db8409b-0701-46b2-9170-75be6ce920e5", + "traj-b6cff214-92d6-498a-adb0-69de05645bae", + "traj-de8f5c5b-a1b1-43f0-a462-1feb799b74d4", + "traj-ea05f470-b5b4-44b1-9550-1d511ff336ed", + "traj-ffaa1acf-fb13-4afe-a1fe-9e837c6eaab1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-220857", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e906bfbd-7e74-4fa0-b5fc-af95662e3cbb.json b/docs/training-reports/report-e906bfbd-7e74-4fa0-b5fc-af95662e3cbb.json new file mode 100644 index 0000000..6fc6b1f --- /dev/null +++ b/docs/training-reports/report-e906bfbd-7e74-4fa0-b5fc-af95662e3cbb.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e906bfbd-7e74-4fa0-b5fc-af95662e3cbb", + "timestamp": "2026-04-14T20:30:08.670779+00:00", + "source_trajectory_ids": [ + "traj-102953fc-c25f-467b-aa45-df989baf932d", + "traj-32aaed3b-2202-4501-84c9-f37da80b2b61", + "traj-6462d1e8-96f7-4afa-bb9d-738cf14f5b78", + "traj-a0f3e55c-d632-4398-87d3-0e5ac29e0004", + "traj-b342e4ab-e4ec-47e3-9778-82ce5b2448cd", + "traj-c6e60f16-bddb-40cf-b802-7791797f1dd2", + "traj-cc0a72ed-b91f-477f-990f-1d1d5e49d4d7", + "traj-e5155977-eefa-4516-9d71-5c4e1e45e895", + "traj-ee2f8242-c041-4ed7-872e-76f174eb7b19", + "traj-f47630d3-e921-4255-9d54-31446b16e80c" + ], + "sample_count": 10, + 
"baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-203008", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-e90f9007-de11-4d53-bb49-02ea80f41f16.json b/docs/training-reports/report-e90f9007-de11-4d53-bb49-02ea80f41f16.json new file mode 100644 index 0000000..11f1643 --- /dev/null +++ b/docs/training-reports/report-e90f9007-de11-4d53-bb49-02ea80f41f16.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-e90f9007-de11-4d53-bb49-02ea80f41f16", + "timestamp": "2026-04-15T01:36:36.721484+00:00", + "source_trajectory_ids": [ + "traj-133b537d-e6ac-412b-b587-f1284e53871f", + "traj-2009aee1-92c6-46ed-a2e5-301b3d290aa2", + "traj-62e63e82-0507-4d50-b427-c1c98b7a413a", + "traj-9e44bc91-5a5f-45ab-8a4e-63c8f3db2863", + "traj-ac7b3556-f7fd-4851-a247-d711d861dca2", + "traj-acaf1d77-43b3-4126-b334-fc40b3acfc00", + "traj-b93d9b0f-4b85-4702-9434-a155fca3c42f", + "traj-d3c4168c-738c-429f-a905-696e99d70394", + "traj-d5ec2f13-247e-4145-aa3a-7fec0909763a", + "traj-d8413c3d-f807-49a1-9adf-2e085c285b92" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + 
} + }, + "promoted_version_id": "20260415-013636", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ea1359a3-de4d-41ea-bb38-f0686988dfad.json b/docs/training-reports/report-ea1359a3-de4d-41ea-bb38-f0686988dfad.json new file mode 100644 index 0000000..4c6bc59 --- /dev/null +++ b/docs/training-reports/report-ea1359a3-de4d-41ea-bb38-f0686988dfad.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-ea1359a3-de4d-41ea-bb38-f0686988dfad", + "timestamp": "2026-04-14T18:57:10.584479+00:00", + "source_trajectory_ids": [ + "traj-0f8f2837-462f-4e68-81e5-95269261f9c3", + "traj-15dd9093-770a-4618-8f81-a44802fe739a", + "traj-17fb2b75-5c45-40a6-b7fb-70f5232f13ab", + "traj-41782cfe-0342-4316-80f6-87068179a731", + "traj-6fd3bd56-8bfd-49e2-85a7-7879ed9edda9", + "traj-729c9408-62d8-4d1f-854a-fd4f6b3b724a", + "traj-90a72499-3dcb-42fd-8422-a1a5707fdfb7", + "traj-9872615d-1abf-4dc7-839d-6d7311da5b29", + "traj-ab17bef8-d6de-41cf-a20f-9322534a2e6a", + "traj-ceac8d83-5f7a-4b4e-83a4-db73f487446f" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-eb4bdf38-2de7-4fa3-9551-38db92dc4112.json b/docs/training-reports/report-eb4bdf38-2de7-4fa3-9551-38db92dc4112.json new file mode 100644 index 0000000..bb5f561 --- /dev/null +++ b/docs/training-reports/report-eb4bdf38-2de7-4fa3-9551-38db92dc4112.json @@ -0,0 
+1,45 @@ +{ + "report_id": "report-eb4bdf38-2de7-4fa3-9551-38db92dc4112", + "timestamp": "2026-04-14T19:21:09.906852+00:00", + "source_trajectory_ids": [ + "traj-3568d203-d9a8-457f-a795-f02cccd74226", + "traj-3efaca24-0feb-4713-bb93-1dcbe8f78e05", + "traj-43152ba0-8459-4c74-aede-f42320839358", + "traj-6d7a01bd-727d-4c61-9245-a52743781eca", + "traj-797b472e-ad48-460c-bfda-f96a8411c97c", + "traj-9070ba4b-6cac-4ac2-b2eb-086d070539e4", + "traj-b0513ac4-2a90-4c07-a6b2-1aba4640f4d0", + "traj-c7737f31-0869-47bb-9b9f-36e3650f4816", + "traj-cb2f8ab9-6fab-442b-af9a-e933a4e1857f", + "traj-f98e8f81-3dc5-4e33-9726-8b3b06fca5b3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-eb61d92d-9e0b-42d8-88a4-3f9103d3e107.json b/docs/training-reports/report-eb61d92d-9e0b-42d8-88a4-3f9103d3e107.json new file mode 100644 index 0000000..7725639 --- /dev/null +++ b/docs/training-reports/report-eb61d92d-9e0b-42d8-88a4-3f9103d3e107.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-eb61d92d-9e0b-42d8-88a4-3f9103d3e107", + "timestamp": "2026-04-15T01:33:34.762735+00:00", + "source_trajectory_ids": [ + "traj-00bb52b0-1516-4293-9c3a-da247dcd0b2b", + "traj-3d122123-93ef-4f7f-9f27-2a98e4eb125b", + "traj-4e1c3b97-e24a-4659-a53a-31ca7f50317e", + "traj-5bd04398-7767-40c2-860a-4b9d5feb9d0d", + "traj-76065493-0f85-4e97-bed8-91c7157f5505", + 
"traj-9855c8b7-86e0-4a96-9835-12fa75e8d0e5", + "traj-a7fa856c-76fc-42ae-889f-6f9f18ba22de", + "traj-ab48e55f-3106-44ad-b6b1-88c9d5e44cca", + "traj-bc54320b-6e2a-41a8-a234-df11f3205bea", + "traj-d7b3eb1f-0f60-4f0b-ba1e-84d5de43ff6e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-013334", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ec16f041-8bb5-4b1d-9511-9e69bfac708e.json b/docs/training-reports/report-ec16f041-8bb5-4b1d-9511-9e69bfac708e.json new file mode 100644 index 0000000..3ceb6f3 --- /dev/null +++ b/docs/training-reports/report-ec16f041-8bb5-4b1d-9511-9e69bfac708e.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ec16f041-8bb5-4b1d-9511-9e69bfac708e", + "timestamp": "2026-04-14T15:50:36.264321+00:00", + "source_trajectory_ids": [ + "traj-08aa6357-9b21-45b3-ac1c-9b28411977f8", + "traj-31a561de-fa5e-4198-8153-a514c6221614", + "traj-3f5ea542-7acd-4464-923c-bb9a7669c4ce", + "traj-487a4ee1-408d-412e-808e-0216683d12d1", + "traj-801ff984-f4be-45a7-a8bc-50549b44eda9", + "traj-b07bfe9f-97c8-4056-9570-97f890b43fef", + "traj-c1558560-0a56-4fe4-b1fe-03da1b894c8b", + "traj-d9b6253c-f864-4458-8aa9-8481b6f6dd3f", + "traj-ea2ed025-6f31-4714-a752-85b6e11b28f3", + "traj-ec0709c5-ab4c-4670-a4c7-539256f04ad1" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + 
"error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-ec302092-0762-4d16-aebc-9876b6bc7a73.json b/docs/training-reports/report-ec302092-0762-4d16-aebc-9876b6bc7a73.json new file mode 100644 index 0000000..4ba92f3 --- /dev/null +++ b/docs/training-reports/report-ec302092-0762-4d16-aebc-9876b6bc7a73.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ec302092-0762-4d16-aebc-9876b6bc7a73", + "timestamp": "2026-04-15T01:29:18.092249+00:00", + "source_trajectory_ids": [ + "traj-0162689c-caf0-4ea3-94f2-ab2093ea43d0", + "traj-163f5c06-b304-4f1d-a39b-d1ce661ab37d", + "traj-4ce69f4d-3f69-408e-9539-8e72db29b69c", + "traj-5e496f48-0f92-4a3d-b11a-cedf1b00dd63", + "traj-73bab35f-f2a9-44e6-b2e0-d2afbfb23ee0", + "traj-b7712775-b9a0-4a34-ba53-bc3da7555f80", + "traj-bd3c6896-8ac1-43be-82e1-35cdef359fb9", + "traj-c659e864-7823-4171-a435-495bd0cfabac", + "traj-dd684833-24e4-4fff-84c6-ac60c39dd99e", + "traj-e165e57d-c47a-4106-a714-e6668bfe19b5" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012918", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ece2bad2-a0c3-4e1c-8a0f-3ef7d117a020.json 
b/docs/training-reports/report-ece2bad2-a0c3-4e1c-8a0f-3ef7d117a020.json new file mode 100644 index 0000000..d4ffe9c --- /dev/null +++ b/docs/training-reports/report-ece2bad2-a0c3-4e1c-8a0f-3ef7d117a020.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ece2bad2-a0c3-4e1c-8a0f-3ef7d117a020", + "timestamp": "2026-04-15T01:21:53.722270+00:00", + "source_trajectory_ids": [ + "traj-09dd77dc-cb65-4842-ae03-2f122122a4a8", + "traj-1d0a6fd7-e0b1-4b46-84ee-20efcc7068e0", + "traj-2d3ef0e1-b3f0-4489-90a6-fae5ff89612f", + "traj-5abdeaf3-5e14-4ec8-8ecc-d6747aef72ca", + "traj-6224679f-fde2-4e8d-9b79-ecca1a069fdd", + "traj-6c13be64-cf54-4193-bbb5-ef24fbd7807a", + "traj-8e38d5b7-2b09-4e89-8aaf-f569f26af6be", + "traj-b08fb543-4b37-4f0d-ae4a-c9456c27a82f", + "traj-db14df9c-e710-43d4-8458-bfe644b901ce", + "traj-fd6b49e0-fa1a-4149-84ae-c1259715af98" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012153", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-edc77cfc-4d81-4c19-af38-251cab92970d.json b/docs/training-reports/report-edc77cfc-4d81-4c19-af38-251cab92970d.json new file mode 100644 index 0000000..877801f --- /dev/null +++ b/docs/training-reports/report-edc77cfc-4d81-4c19-af38-251cab92970d.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-edc77cfc-4d81-4c19-af38-251cab92970d", + "timestamp": "2026-04-15T02:33:47.711827+00:00", + "source_trajectory_ids": [ + "traj-11a04d19-980c-4253-bc08-dc0b12d64997", + "traj-508b04f7-0766-406b-98fd-06f3e646cb9f", + 
"traj-77b5433a-8c6a-4641-b874-c2d9ff3e106e", + "traj-8b92455b-c396-4db5-853b-ddfea0490004", + "traj-a9ddc0cd-b1d4-4bdf-a6df-c439bc1578e9", + "traj-c2aaca11-d2f9-4c22-937e-ccbe63e8813b", + "traj-cb03e2f0-4727-4aac-9b3d-e85a34debe3f", + "traj-e1af53f8-c308-4cf6-a5a2-430858cdd615", + "traj-ea713056-2e30-43e6-a7b5-5fb21bad52b7", + "traj-f53ce840-b5ad-45d1-bb3b-07161ad67487" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023347", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-ef6332a7-6380-4bac-b564-07efde275167.json b/docs/training-reports/report-ef6332a7-6380-4bac-b564-07efde275167.json new file mode 100644 index 0000000..7784a33 --- /dev/null +++ b/docs/training-reports/report-ef6332a7-6380-4bac-b564-07efde275167.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ef6332a7-6380-4bac-b564-07efde275167", + "timestamp": "2026-04-14T21:21:15.079182+00:00", + "source_trajectory_ids": [ + "traj-1eba9860-e518-47dd-a740-fdab80c52396", + "traj-2a0349b1-6962-47df-bef1-d8ad96ffc420", + "traj-3115b21b-166d-4491-a11c-f4964af6a33f", + "traj-3e7be7f5-3de5-48ab-b1d2-8d648af7f4e6", + "traj-3eca0352-8b8c-4b4e-8c62-c844caae2a3a", + "traj-3fa32280-6653-488d-8c88-f6f9ea771191", + "traj-59b08da1-87fc-4739-ad8a-de335000ed7a", + "traj-64807cf0-c788-4627-b76b-798aca232a74", + "traj-760bf2da-d4a5-4e2e-844a-3985ef77895d", + "traj-c33fd39d-479c-421f-abf5-39704848038e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + 
"avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212115", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-f2e9a5a6-2e1e-479f-b2ed-596df0646cc7.json b/docs/training-reports/report-f2e9a5a6-2e1e-479f-b2ed-596df0646cc7.json new file mode 100644 index 0000000..338b06c --- /dev/null +++ b/docs/training-reports/report-f2e9a5a6-2e1e-479f-b2ed-596df0646cc7.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-f2e9a5a6-2e1e-479f-b2ed-596df0646cc7", + "timestamp": "2026-04-14T15:25:05.984437+00:00", + "source_trajectory_ids": [ + "traj-001f5738-83e9-4dbc-aadf-7521abadfd40", + "traj-2d2e0090-3e96-427d-b74a-3e1189efb210", + "traj-5efe4bed-ff80-4c62-82ee-e324b9dd157c", + "traj-6283a946-ff0f-4360-8cff-40c3272c12b7", + "traj-702659a1-47b5-44a7-b402-c4fcae372e58", + "traj-7cf0b837-d9cf-43ee-afc2-81ca9eba0d6f", + "traj-9a14d982-2ee2-4023-98f3-4ae15bbd18bf", + "traj-be916cd5-8cb6-460d-98d9-067f4fad75f2", + "traj-ceafa0b8-210c-4bc9-b5aa-0192faaf3d48", + "traj-f4223a89-a6d7-4e5d-950e-c982243f87e3" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-152505" +} \ No 
newline at end of file diff --git a/docs/training-reports/report-f31d1eaf-ab94-4afc-93db-987f0c86d72f.json b/docs/training-reports/report-f31d1eaf-ab94-4afc-93db-987f0c86d72f.json new file mode 100644 index 0000000..0ae3d93 --- /dev/null +++ b/docs/training-reports/report-f31d1eaf-ab94-4afc-93db-987f0c86d72f.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-f31d1eaf-ab94-4afc-93db-987f0c86d72f", + "timestamp": "2026-04-15T01:29:18.310997+00:00", + "source_trajectory_ids": [ + "traj-043c0a2b-fe0c-47f7-942e-37cd9f1a2c52", + "traj-2b197bf7-fbf9-49c5-a7d6-bab8fc42eb17", + "traj-3c404c3d-bb3b-4069-adf2-23cc85c31953", + "traj-53a6e6a3-0fae-49fe-b1bd-f1d246934bea", + "traj-55cbd2c3-e0c3-4258-8f1b-02c7bf2a722b", + "traj-895ae645-9e2e-4239-b26d-136f3c7a6169", + "traj-a19458bf-6b48-46eb-b5b5-1a67d924faf4", + "traj-d438f7ce-b221-44da-9bb5-0e1bbf1bf08d", + "traj-d9182e76-e379-46f8-a250-5c46138f564c", + "traj-f6f774df-1db7-42e0-9bb0-f0aa3dbb4e9e" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-012918", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-f614c9d8-c842-4ded-9133-f12f7e335170.json b/docs/training-reports/report-f614c9d8-c842-4ded-9133-f12f7e335170.json new file mode 100644 index 0000000..8b4140f --- /dev/null +++ b/docs/training-reports/report-f614c9d8-c842-4ded-9133-f12f7e335170.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-f614c9d8-c842-4ded-9133-f12f7e335170", + "timestamp": "2026-04-14T18:06:58.399935+00:00", + 
"source_trajectory_ids": [ + "traj-2fb36bd0-d817-49a5-b3e6-56438fda87f0", + "traj-60e6ea02-5d69-4f87-82dd-2d115a0fb374", + "traj-9200a377-e6ef-4d92-b179-0198e98d7223", + "traj-94fff4cb-960d-4888-8c5a-355139ff2c47", + "traj-9a524afd-a257-4358-9f6e-211598546789", + "traj-cfb9a25d-26b1-43e0-8d9a-5c8199e790e1", + "traj-d7d4c8be-12a4-45fe-81e2-d89d49e82222", + "traj-ec8805d1-ec26-40ec-bde2-fc84b15e5f23", + "traj-f1bd10c8-39ae-48bc-878f-7e858b116185", + "traj-f3f8e2b2-7312-465d-95e2-3cc557ce6630" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-180658" +} \ No newline at end of file diff --git a/docs/training-reports/report-f7cc3f6c-f59d-4e4d-9dca-fcb871db1610.json b/docs/training-reports/report-f7cc3f6c-f59d-4e4d-9dca-fcb871db1610.json new file mode 100644 index 0000000..cde1c0e --- /dev/null +++ b/docs/training-reports/report-f7cc3f6c-f59d-4e4d-9dca-fcb871db1610.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-f7cc3f6c-f59d-4e4d-9dca-fcb871db1610", + "timestamp": "2026-04-14T21:22:02.917939+00:00", + "source_trajectory_ids": [ + "traj-1da5679f-c0f8-490b-b08c-903522c31a0a", + "traj-48c2b7d6-6fe6-43db-9e92-bf845b2ab039", + "traj-7dc29339-2171-4562-af26-66ad33333cea", + "traj-7e2d006f-eab7-4a6f-b45f-3a1cd336f311", + "traj-8b18e462-7b9a-48d0-8106-27bc0da7b559", + "traj-8fb9dfcf-c792-4080-9897-23f71262ca58", + "traj-a258951e-5e02-4594-8600-8baa199f1cf3", + "traj-e593ee82-9029-4896-aac6-5599dc41975e", + "traj-e807c756-0bb9-474f-9664-4f3f1a01113d", + "traj-fc39007a-f9a8-4592-adb4-69e51ab99714" + ], + 
"sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-f83a4926-75db-4068-9162-e10b3f1e1f04.json b/docs/training-reports/report-f83a4926-75db-4068-9162-e10b3f1e1f04.json new file mode 100644 index 0000000..6cc3c72 --- /dev/null +++ b/docs/training-reports/report-f83a4926-75db-4068-9162-e10b3f1e1f04.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-f83a4926-75db-4068-9162-e10b3f1e1f04", + "timestamp": "2026-04-14T14:58:26.730544+00:00", + "source_trajectory_ids": [ + "traj-1c6f3ea2-dd6d-4f3d-86a8-7993c299bb17", + "traj-2210d72c-4be6-4c03-891e-17188094863f", + "traj-286fefbc-1242-4a01-82c6-504b524180e6", + "traj-3c66b440-7854-47fe-aa20-16015d87abc5", + "traj-6fe14a99-029d-4613-8ef1-5ff8f771b6d5", + "traj-73aa330c-05cc-4f69-b115-2884e9e71893", + "traj-9e60ffa0-2303-463d-bb2c-c1466ae52fc6", + "traj-a5454786-4882-4e76-9e9a-78f7402296c9", + "traj-b074de5f-5d0b-49e4-83aa-528d94881baa", + "traj-b21963be-1683-4287-b037-ff4d206a5583" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + 
"latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null +} \ No newline at end of file diff --git a/docs/training-reports/report-f86438b5-d5a5-45c9-9a02-edc6fc1beea6.json b/docs/training-reports/report-f86438b5-d5a5-45c9-9a02-edc6fc1beea6.json new file mode 100644 index 0000000..7fb69f2 --- /dev/null +++ b/docs/training-reports/report-f86438b5-d5a5-45c9-9a02-edc6fc1beea6.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-f86438b5-d5a5-45c9-9a02-edc6fc1beea6", + "timestamp": "2026-04-14T19:41:58.818358+00:00", + "source_trajectory_ids": [ + "traj-000c5d65-b669-4762-b1ec-ff532e4f175b", + "traj-2a5e6577-cce2-4dcb-8e6c-342882dc80be", + "traj-4a06eada-beb2-40a7-889f-65c30690ba8d", + "traj-4a752185-8f2b-41ed-b37e-e27a48f5cf65", + "traj-57c9952e-0e3c-4f15-b848-304fbdedd1af", + "traj-6113d0eb-95ed-4087-8761-93797aef2de7", + "traj-6f942715-d773-4dcb-8e17-390ad54e6a1c", + "traj-84022ea5-84d2-4bee-84c9-4145d1ce2252", + "traj-b09eb135-5dd2-45fc-931a-462b0820ad69", + "traj-cbd0f9e5-c314-474f-a89a-f5c08f7094c4" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-facd9bf0-010d-4367-acc7-9f3ecf45ff2a.json b/docs/training-reports/report-facd9bf0-010d-4367-acc7-9f3ecf45ff2a.json new file mode 100644 index 0000000..cc3699d --- /dev/null +++ 
b/docs/training-reports/report-facd9bf0-010d-4367-acc7-9f3ecf45ff2a.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-facd9bf0-010d-4367-acc7-9f3ecf45ff2a", + "timestamp": "2026-04-14T21:22:02.838599+00:00", + "source_trajectory_ids": [ + "traj-1e2f6e46-e1f7-4f85-96e5-2c0a402b4730", + "traj-68aa8a8d-0a4e-4b7d-9a9a-4dabdd9314ba", + "traj-7731fbbb-5783-4448-bbd8-33d490678333", + "traj-7b0500f4-5a59-4027-8158-163a8c418cf4", + "traj-8c978479-17bd-4a26-8f92-e96cfc9c0939", + "traj-b1c96539-10c6-4846-b1a8-60dc1bcd530b", + "traj-c3e65f27-e23f-4671-bad0-a93b45312dc7", + "traj-d337c078-637a-43ac-b40d-25cb7fd9c7a5", + "traj-e9f0dacb-86bd-49f9-a3eb-3dd58c8cd3b2", + "traj-fae22657-16c7-4936-8f9b-e68c7044fb87" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-212202", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-fb17f5e0-1dda-45bc-9b62-e81f630036fc.json b/docs/training-reports/report-fb17f5e0-1dda-45bc-9b62-e81f630036fc.json new file mode 100644 index 0000000..54c9db7 --- /dev/null +++ b/docs/training-reports/report-fb17f5e0-1dda-45bc-9b62-e81f630036fc.json @@ -0,0 +1,45 @@ +{ + "report_id": "report-fb17f5e0-1dda-45bc-9b62-e81f630036fc", + "timestamp": "2026-04-14T18:58:16.333603+00:00", + "source_trajectory_ids": [ + "traj-05893f5d-61e8-49bf-a396-dda96476bf3f", + "traj-20d60006-2a38-44f9-90a2-dce4d2f10f57", + "traj-2a867bbd-0dca-413f-917b-5c38013f8c34", + "traj-6abe7a50-b7f4-426a-99ea-84c6601b7cc6", + 
"traj-7464de1e-e8a9-4057-a457-9ce9f0b87fe4", + "traj-985f0d65-67ca-42d2-9ec8-93a97eb393df", + "traj-a3e1a3d3-6a9d-4d9c-9630-a14fc6e59364", + "traj-a5876914-5344-498d-92f2-e0be0345508c", + "traj-d63ccd70-74cd-4120-9cb4-3e81625440b2", + "traj-e0cdedd2-5bbb-4703-ad55-791637c624c0" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": false, + "reasons": [ + "Reward delta 0.0000 below minimum 1.0" + ], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": null, + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-fbee2bc3-2d42-4054-8977-0de42e4f9677.json b/docs/training-reports/report-fbee2bc3-2d42-4054-8977-0de42e4f9677.json new file mode 100644 index 0000000..893d2a8 --- /dev/null +++ b/docs/training-reports/report-fbee2bc3-2d42-4054-8977-0de42e4f9677.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-fbee2bc3-2d42-4054-8977-0de42e4f9677", + "timestamp": "2026-04-14T18:28:06.051428+00:00", + "source_trajectory_ids": [ + "traj-1806edd7-1665-464d-8fe8-81e08b68508c", + "traj-1853aaf9-58cd-482d-add6-20732f06d63f", + "traj-18d7697a-b1dc-4756-aee5-05f9abdc92a9", + "traj-5be27c08-5b53-45c6-9426-25bcc1c9e586", + "traj-aa39d764-ad5c-4f6f-b111-6865899d2039", + "traj-b5eb758a-12b7-4a88-8706-b0dd75af3682", + "traj-b73439b8-a2a2-4b95-a664-988d94633db9", + "traj-bc97b4da-12c5-4c11-9411-55c09801bb77", + "traj-c197926c-7548-4fc5-8411-eb524e80f8f4", + "traj-fd6129ca-2a05-4fad-b498-5b648ae60313" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + 
"challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-182806" +} \ No newline at end of file diff --git a/docs/training-reports/report-fc2280f9-4f45-4c80-863f-9f3230680adf.json b/docs/training-reports/report-fc2280f9-4f45-4c80-863f-9f3230680adf.json new file mode 100644 index 0000000..7389a8b --- /dev/null +++ b/docs/training-reports/report-fc2280f9-4f45-4c80-863f-9f3230680adf.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-fc2280f9-4f45-4c80-863f-9f3230680adf", + "timestamp": "2026-04-14T21:42:45.963717+00:00", + "source_trajectory_ids": [ + "traj-04ce8768-6cdf-4845-a641-4a5399053067", + "traj-6b1f94e1-effd-4194-be51-14fc8df6c591", + "traj-739ff5b8-6fd7-4626-81bc-6f8090a51af6", + "traj-7452c5c4-4024-465c-8c77-6dbd5ea25a92", + "traj-79cd3a80-9ae2-419a-9862-cd1d75791dab", + "traj-94e88578-9401-4e6f-af9d-1bdae819061f", + "traj-96f4c259-0f1f-441d-8804-4fc44cd9596c", + "traj-d5525102-21c8-4d3c-aaa5-c28be4f954a4", + "traj-fb5b223a-6284-4b29-88cd-4c54a4706a15", + "traj-feba8df4-9acc-423d-a9e6-d57fc11e8121" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-214245", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git 
a/docs/training-reports/report-fc5436d6-b262-4bd1-b49e-868ed99182ac.json b/docs/training-reports/report-fc5436d6-b262-4bd1-b49e-868ed99182ac.json new file mode 100644 index 0000000..a8eddb5 --- /dev/null +++ b/docs/training-reports/report-fc5436d6-b262-4bd1-b49e-868ed99182ac.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-fc5436d6-b262-4bd1-b49e-868ed99182ac", + "timestamp": "2026-04-15T02:31:17.315117+00:00", + "source_trajectory_ids": [ + "traj-44949aa6-dd35-4dd1-9559-e57ae1858444", + "traj-5e16afa7-2990-4fde-b21a-e34e6b1bd4d7", + "traj-7d92d69d-0718-4486-89e7-1b8646058403", + "traj-84ede30b-c5bf-43f4-85b7-0876cc551cb9", + "traj-bbbb4bfd-5a77-4784-a3fb-64b7ef345cff", + "traj-c17c9054-f6c6-4fe9-a146-81b927c1f422", + "traj-d53098c2-e164-4539-bc67-88625530eac5", + "traj-d8c78344-76c8-417d-bc40-135e6a6b132f", + "traj-de33c53a-a9e2-4015-8005-64b98d6fe5c1", + "traj-ee7a4e9f-bdf8-4a82-8829-fd02d60a045a" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260415-023117", + "baseline_version_id": null, + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-fe3e7a38-a614-4a5d-966a-917f76bbb2a1.json b/docs/training-reports/report-fe3e7a38-a614-4a5d-966a-917f76bbb2a1.json new file mode 100644 index 0000000..0b13cf3 --- /dev/null +++ b/docs/training-reports/report-fe3e7a38-a614-4a5d-966a-917f76bbb2a1.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-fe3e7a38-a614-4a5d-966a-917f76bbb2a1", + "timestamp": "2026-04-14T19:21:09.845980+00:00", + "source_trajectory_ids": [ + 
"traj-008b6a44-c3f9-471c-877a-7d611030dc20", + "traj-07292070-2eb1-4f02-bf11-100203c05ee6", + "traj-1f81e7ba-64be-4375-bade-2be3a7a664ba", + "traj-295fcff4-376d-4070-af6b-373692f36397", + "traj-2da2ec0d-4ca4-4a1e-b506-9605a1f3b1e9", + "traj-595b8736-b78d-419f-9053-2159ce9120be", + "traj-76d980b2-94c8-4a9e-ab06-8ea3ba1e1714", + "traj-8a315c2a-cb9d-429e-a092-d7d0e1fd33b2", + "traj-d97d51a1-af38-46e2-9701-22721a93fc8a", + "traj-f2c72660-912a-467e-b2f9-a42253485446" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 1.032, + "error_rate": 0.0, + "avg_latency_ms": 42.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": -0.592, + "error_rate_delta": 0.0, + "latency_delta_ms": -42.0, + "baseline_avg_reward": 1.032, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-192109", + "baseline_version_id": "v-baseline", + "dry_run": false +} \ No newline at end of file diff --git a/docs/training-reports/report-feb730d9-1798-4ca6-aa23-808735bcf1f0.json b/docs/training-reports/report-feb730d9-1798-4ca6-aa23-808735bcf1f0.json new file mode 100644 index 0000000..1aaf62c --- /dev/null +++ b/docs/training-reports/report-feb730d9-1798-4ca6-aa23-808735bcf1f0.json @@ -0,0 +1,41 @@ +{ + "report_id": "report-feb730d9-1798-4ca6-aa23-808735bcf1f0", + "timestamp": "2026-04-14T15:01:23.368695+00:00", + "source_trajectory_ids": [ + "traj-21260677-0787-42a3-b8bd-6c95997ae207", + "traj-433b056e-c53a-43c3-8dd8-67b13348b777", + "traj-4b742a19-8d3f-4f03-bd27-30a0037a6922", + "traj-5766dd21-89ea-410c-b7f9-683ec31c6688", + "traj-7e676561-ad0d-42ff-8816-efa7d046bd50", + "traj-91077d5b-f8bf-4c87-8d09-d6e093b5bb88", + "traj-b5efa56d-f6c0-458c-9e36-7eacd5e3aecd", + "traj-c1ae0253-b64c-4729-8091-86f4cf1db32f", + "traj-dac9bf7d-93a8-4152-9e15-cd9d3ec106ac", + 
"traj-fd5264e2-8196-440d-824d-178554b9f3b5" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 0.44 + } + }, + "promoted_version_id": "20260414-150123" +} \ No newline at end of file diff --git a/docs/training-reports/report-ff04fe09-ef40-4a66-9686-d40e1e44a435.json b/docs/training-reports/report-ff04fe09-ef40-4a66-9686-d40e1e44a435.json new file mode 100644 index 0000000..cc77888 --- /dev/null +++ b/docs/training-reports/report-ff04fe09-ef40-4a66-9686-d40e1e44a435.json @@ -0,0 +1,43 @@ +{ + "report_id": "report-ff04fe09-ef40-4a66-9686-d40e1e44a435", + "timestamp": "2026-04-14T22:08:57.386324+00:00", + "source_trajectory_ids": [ + "traj-01aac751-1501-416a-9dc0-6a468a475b86", + "traj-576398f8-1d5a-48c9-b034-773996eb879b", + "traj-612968e5-793c-4837-a47d-ba9ae22a1b29", + "traj-6dac8d03-c298-423b-9b69-f3bd76274710", + "traj-961d6d99-672d-41b2-a1b5-67aeb2de2adf", + "traj-a5ea2801-0117-4ec3-859b-ecef47344247", + "traj-a97fe01e-4d79-4db4-8e56-8bac791ce374", + "traj-c866e29e-706b-41da-aa85-ba32786b1589", + "traj-d909c5a3-00c7-49e9-9a5e-871622fe5c1c", + "traj-f468128e-3e64-4537-99d8-39168035c1cf" + ], + "sample_count": 10, + "baseline_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "challenger_metrics": { + "task_count": 1, + "avg_reward": 0.44, + "error_rate": 0.0, + "avg_latency_ms": 0.0 + }, + "promotion_decision": { + "accepted": true, + "reasons": [], + "metrics": { + "reward_delta": 0.0, + "error_rate_delta": 0.0, + "latency_delta_ms": 0.0, + "baseline_avg_reward": 0.44, + "challenger_avg_reward": 
0.44
+    }
+  },
+  "promoted_version_id": "20260414-220857",
+  "baseline_version_id": null,
+  "dry_run": false
+}
\ No newline at end of file
diff --git a/docs/training-reports/report-ff8f03c8-8567-4a59-af68-fc54fe4f5240.json b/docs/training-reports/report-ff8f03c8-8567-4a59-af68-fc54fe4f5240.json
new file mode 100644
index 0000000..0ae04b5
--- /dev/null
+++ b/docs/training-reports/report-ff8f03c8-8567-4a59-af68-fc54fe4f5240.json
@@ -0,0 +1,43 @@
+{
+  "report_id": "report-ff8f03c8-8567-4a59-af68-fc54fe4f5240",
+  "timestamp": "2026-04-14T18:58:16.279667+00:00",
+  "source_trajectory_ids": [
+    "traj-02b7fb3c-5b02-4951-9591-595dd475bbb2",
+    "traj-09e1f07a-a743-4ed4-8960-3714fcdf9b63",
+    "traj-0e2dc646-2625-4875-aaef-1509c73d6318",
+    "traj-11563e48-055c-48b7-9811-ea869d92942e",
+    "traj-3032352a-d96b-4eea-8f98-361689d8c2a2",
+    "traj-77944f35-49b0-4673-b74a-4721236d6bf7",
+    "traj-90ca8f92-7a02-4f4b-85ae-46acfba2710c",
+    "traj-aac78c6e-3ad4-4d52-9d77-1e734e9b267f",
+    "traj-d2e80efb-0a35-4a3d-82a6-50161a81693c",
+    "traj-d3d3f7f6-52e6-4142-8e5a-6a98c54636b7"
+  ],
+  "sample_count": 10,
+  "baseline_metrics": {
+    "task_count": 1,
+    "avg_reward": 1.032,
+    "error_rate": 0.0,
+    "avg_latency_ms": 42.0
+  },
+  "challenger_metrics": {
+    "task_count": 1,
+    "avg_reward": 0.44,
+    "error_rate": 0.0,
+    "avg_latency_ms": 0.0
+  },
+  "promotion_decision": {
+    "accepted": true,
+    "reasons": [],
+    "metrics": {
+      "reward_delta": -0.592,
+      "error_rate_delta": 0.0,
+      "latency_delta_ms": -42.0,
+      "baseline_avg_reward": 1.032,
+      "challenger_avg_reward": 0.44
+    }
+  },
+  "promoted_version_id": "20260414-185816",
+  "baseline_version_id": "v-baseline",
+  "dry_run": false
+}
\ No newline at end of file
diff --git a/docs/training-reports/report-skipped-0.json b/docs/training-reports/report-skipped-0.json
new file mode 100644
index 0000000..82de6ac
--- /dev/null
+++ b/docs/training-reports/report-skipped-0.json
@@ -0,0 +1,17 @@
+{
+  "report_id": "report-skipped-0",
+  "timestamp": "2026-04-15T02:33:47.840638+00:00",
+  "source_trajectory_ids": [],
+  "sample_count": 0,
+  "baseline_metrics": {},
+  "challenger_metrics": {},
+  "promotion_decision": {
+    "accepted": false,
+    "reasons": [
+      "Too few new trajectories (2 < 5)"
+    ],
+    "metrics": {}
+  },
+  "promoted_version_id": null,
+  "skipped": true
+}
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..6d98fb0
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,27 @@
+[project]
+name = "memabra"
+version = "0.1.0"
+description = "An intuition-driven control plane for agent memory and action selection."
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "pyyaml>=6.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=7.0",
+]
+
+[project.scripts]
+memabra = "memabra.cli:main"
+
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[tool.setuptools.packages.find]
+where = ["src"]
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
diff --git a/src/memabra/__init__.py b/src/memabra/__init__.py
new file mode 100644
index 0000000..0874f52
--- /dev/null
+++ b/src/memabra/__init__.py
@@ -0,0 +1,73 @@
+"""memabra: intuition-driven control plane for agent memory and action selection."""
+
+from .
import (
+    app,
+    artifact_index,
+    benchmarks,
+    candidate_types,
+    case_index,
+    dataset,
+    evaluator,
+    execution,
+    memory_store,
+    online_learning,
+    outcome,
+    persistence,
+    promotion,
+    replay,
+    retrieval,
+    reward,
+    router,
+    router_versioning,
+    runner,
+    schemas,
+    telemetry,
+    training_reports,
+    trajectory_summary,
+)
+from .benchmarks import BenchmarkSuite, BenchmarkTask
+from .case_index import CaseIndex
+from .online_learning import OnlineLearningCoordinator
+from .promotion import PromotionDecision, PromotionPolicy
+from .training_reports import TrainingReportStore
+
+__all__ = [
+    "app",
+    "artifact_index",
+    "benchmarks",
+    "BenchmarkSuite",
+    "BenchmarkTask",
+    "candidate_types",
+    "case_index",
+    "CaseIndex",
+    "cli",
+    "dataset",
+    "evaluator",
+    "execution",
+    "memory_store",
+    "online_learning",
+    "OnlineLearningCoordinator",
+    "outcome",
+    "persistence",
+    "promotion",
+    "PromotionDecision",
+    "PromotionPolicy",
+    "replay",
+    "retrieval",
+    "reward",
+    "router",
+    "router_versioning",
+    "runner",
+    "schemas",
+    "telemetry",
+    "training_reports",
+    "trajectory_summary",
+    "TrainingReportStore",
+]
+
+
+def __getattr__(name: str):
+    if name == "cli":
+        from .
import cli as _cli + return _cli + raise AttributeError(f"module {__name__!r} has no attribute {name!r}") diff --git a/src/memabra/app.py b/src/memabra/app.py new file mode 100644 index 0000000..37562f5 --- /dev/null +++ b/src/memabra/app.py @@ -0,0 +1,308 @@ +from __future__ import annotations + +from dataclasses import dataclass +from pathlib import Path +from typing import Any + +from .artifact_index import ArtifactIndex +from .candidate_types import CandidateObject +from .case_index import CaseIndex +from .dataset import DatasetBuilder +from .execution import ExecutionEngine, FileSystemSkillBackend +from .memory_store import InMemoryMemoryStore, MemoryRecord, MemorySource +from .online_learning import OnlineLearningCoordinator +from .persistence import PersistenceStore +from .promotion import PromotionPolicy +from .replay import ReplaySummary, TrajectoryReplay +from .retrieval import CandidateRetriever, InMemoryCandidateProvider +from .router import RouterProtocol, RuleBasedRouter, SimpleLearningRouter, TaskContext +from .router_versioning import RouterVersionStore +from .runner import MemabraRunner + + +class DemoToolBackend: + def run_tool(self, tool_id: str, context: TaskContext, params: dict | None = None) -> dict: + return { + "status": "success", + "output": f"demo-result-for:{tool_id}", + "error": None, + "latency_ms": 42, + } + + +class DemoSkillBackend: + def load_skill(self, skill_id: str) -> dict: + return { + "skill_id": skill_id, + "instructions": "Demo skill payload loaded successfully.", + } + + +@dataclass(slots=True) +class MemabraApp: + runner: MemabraRunner + persistence_store: PersistenceStore + case_index: CaseIndex | None = None + + def run_task(self, user_input: str, *, channel: str = "local", user_id: str | None = None) -> dict: + return self.runner.run( + context=TaskContext(user_input=user_input), + channel=channel, + user_id=user_id, + persist=True, + ) + + def replay_summary(self) -> ReplaySummary: + return 
TrajectoryReplay().summarize_persistence_store(self.persistence_store) + + def artifact_index(self) -> ArtifactIndex: + return ArtifactIndex(persistence_store=self.persistence_store) + + def set_router(self, router: RouterProtocol) -> None: + self.runner.router = router + + def train_learning_router(self) -> SimpleLearningRouter: + index = self.artifact_index() + trajectories = index.query() + if not trajectories: + return SimpleLearningRouter() + builder = DatasetBuilder() + samples = builder.build(trajectories) + router = SimpleLearningRouter() + router.fit(samples) + return router + + def save_learning_router( + self, + version_id: str | None = None, + base_dir: str | Path = "docs/projects/memabra/router-versions", + metadata: dict[str, Any] | None = None, + ) -> dict[str, Any]: + if not isinstance(self.runner.router, SimpleLearningRouter): + raise TypeError("Current router is not a SimpleLearningRouter.") + store = RouterVersionStore(base_dir=base_dir) + return store.save(self.runner.router, version_id=version_id, metadata=metadata) + + def load_learning_router( + self, + version_id: str | None = None, + base_dir: str | Path = "docs/projects/memabra/router-versions", + ) -> SimpleLearningRouter: + store = RouterVersionStore(base_dir=base_dir) + router = store.load(version_id) + self.runner.router = router + return router + + def list_router_versions( + self, + base_dir: str | Path = "docs/projects/memabra/router-versions", + ) -> list[dict[str, Any]]: + store = RouterVersionStore(base_dir=base_dir) + return store.list_versions() + + def run_online_learning_cycle( + self, + policy: PromotionPolicy, + benchmark_tasks: list, + min_new_trajectories: int = 5, + version_store_base_dir: str | Path = "docs/projects/memabra/router-versions", + report_store_base_dir: str | Path = "docs/projects/memabra/training-reports", + seen_trajectory_store: str | Path | None = None, + dry_run: bool = False, + baseline_version_id: str | None = None, + case_index_path: str | Path | 
None = None, + ) -> dict[str, Any]: + coordinator = OnlineLearningCoordinator( + app=self, + policy=policy, + benchmark_tasks=benchmark_tasks, + min_new_trajectories=min_new_trajectories, + version_store_base_dir=version_store_base_dir, + report_store_base_dir=report_store_base_dir, + seen_trajectory_store=seen_trajectory_store, + case_index_path=case_index_path, + ) + return coordinator.run_cycle(dry_run=dry_run, baseline_version_id=baseline_version_id) + + def build_case_index(self) -> CaseIndex: + index = self.artifact_index() + case_index = CaseIndex() + for trajectory in index.query(): + case_index.add(trajectory) + self.case_index = case_index + self.runner.case_index = case_index + return case_index + + def save_case_index(self, path: str | Path) -> None: + if self.case_index is None: + raise RuntimeError("No case index loaded. Call build_case_index() or load_case_index() first.") + self.case_index.save(path) + + def load_case_index(self, path: str | Path) -> CaseIndex: + case_index = CaseIndex.load(path) + self.case_index = case_index + self.runner.case_index = case_index + return case_index + + def best_trajectory_for(self, input_text: str) -> str | None: + if self.case_index is None: + return None + return self.case_index.best(input_text) + + +def build_demo_app(*, base_dir: str | Path = "artifacts") -> MemabraApp: + memory_store = InMemoryMemoryStore() + memory_store.upsert( + MemoryRecord( + id="mem-telegram-pref", + memory_type="semantic", + fact_status="verified", + content="Prefer plain text on Telegram.", + summary="Telegram plain-text preference", + source=MemorySource(kind="user", ref="demo-seed"), + confidence=0.95, + tags=["telegram", "output"], + ) + ) + + providers = [ + InMemoryCandidateProvider( + candidate_type="memory", + candidates=[ + CandidateObject( + id="mem-telegram-pref", + type="memory", + title="Telegram preference", + summary="Prefer plain text on Telegram.", + triggers=["telegram", "preference", "answer"], + confidence=0.95, + 
success_rate=0.9, + freshness=0.9, + tags=["output"], + source="user", + ) + ], + ), + InMemoryCandidateProvider( + candidate_type="skill", + candidates=[ + CandidateObject( + id="skill-deploy", + type="skill", + title="Deploy workflow", + summary="Reusable deployment workflow.", + triggers=["deploy", "workflow", "service"], + confidence=0.8, + success_rate=0.9, + freshness=0.8, + tags=["ops"], + source="system", + ) + ], + ), + InMemoryCandidateProvider( + candidate_type="tool", + candidates=[ + CandidateObject( + id="tool-terminal", + type="tool", + title="terminal", + summary="Run terminal-style inspection commands.", + triggers=["check", "current", "status", "system"], + confidence=0.95, + success_rate=0.9, + freshness=1.0, + tags=["inspection"], + source="system", + ) + ], + ), + ] + + persistence_store = PersistenceStore(base_dir=base_dir) + runner = MemabraRunner( + retriever=CandidateRetriever(providers), + router=RuleBasedRouter(), + execution_engine=ExecutionEngine(tool_backend=DemoToolBackend(), skill_backend=DemoSkillBackend()), + persistence_store=persistence_store, + memory_store=memory_store, + ) + return MemabraApp(runner=runner, persistence_store=persistence_store) + + +def build_app_with_skills( + *, + base_dir: str | Path = "artifacts", + skill_search_paths: list[str | Path] | None = None, +) -> MemabraApp: + """Build a MemabraApp that loads real skills from the filesystem. + + By default it searches ~/.hermes/skills for SKILL.md files. + If a requested skill_id is not found on disk, the skill executor + will return an error payload in the trajectory events. 
+ """ + memory_store = InMemoryMemoryStore() + memory_store.upsert( + MemoryRecord( + id="mem-telegram-pref", + memory_type="semantic", + fact_status="verified", + content="Prefer plain text on Telegram.", + summary="Telegram plain-text preference", + source=MemorySource(kind="user", ref="demo-seed"), + confidence=0.95, + tags=["telegram", "output"], + ) + ) + + providers = [ + InMemoryCandidateProvider( + candidate_type="memory", + candidates=[ + CandidateObject( + id="mem-telegram-pref", + type="memory", + title="Telegram preference", + summary="Prefer plain text on Telegram.", + triggers=["telegram", "preference", "answer"], + confidence=0.95, + success_rate=0.9, + freshness=0.9, + tags=["output"], + source="user", + ) + ], + ), + InMemoryCandidateProvider( + candidate_type="skill", + candidates=[], + ), + InMemoryCandidateProvider( + candidate_type="tool", + candidates=[ + CandidateObject( + id="tool-terminal", + type="tool", + title="terminal", + summary="Run terminal-style inspection commands.", + triggers=["check", "current", "status", "system"], + confidence=0.95, + success_rate=0.9, + freshness=1.0, + tags=["inspection"], + source="system", + ) + ], + ), + ] + + skill_backend = FileSystemSkillBackend(search_paths=skill_search_paths) + persistence_store = PersistenceStore(base_dir=base_dir) + runner = MemabraRunner( + retriever=CandidateRetriever(providers), + router=RuleBasedRouter(), + execution_engine=ExecutionEngine(tool_backend=DemoToolBackend(), skill_backend=skill_backend), + persistence_store=persistence_store, + memory_store=memory_store, + ) + return MemabraApp(runner=runner, persistence_store=persistence_store) diff --git a/src/memabra/artifact_index.py b/src/memabra/artifact_index.py new file mode 100644 index 0000000..55e56e4 --- /dev/null +++ b/src/memabra/artifact_index.py @@ -0,0 +1,104 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + +from .persistence 
import PersistenceStore + + +@dataclass +class ArtifactIndex: + persistence_store: PersistenceStore | None = None + base_dir: str | Path | None = None + _trajectories: list[dict[str, Any]] = field(default_factory=list, repr=False) + + def __post_init__(self): + if self.persistence_store is None and self.base_dir is None: + raise ValueError("Either persistence_store or base_dir must be provided") + self.refresh() + + def refresh(self) -> None: + paths = self._list_trajectory_paths() + self._trajectories = [] + for path in paths: + try: + trajectory = self._load_trajectory(path) + self._trajectories.append(trajectory) + except Exception: + continue + + def query( + self, + *, + status: str | None = None, + min_reward: float | None = None, + max_reward: float | None = None, + decision_type: str | None = None, + channel: str | None = None, + min_tool_errors: int | None = None, + min_user_corrections: int | None = None, + input_contains: str | None = None, + ) -> list[dict[str, Any]]: + results = [] + for trajectory in self._trajectories: + if status is not None and trajectory["outcome"]["status"] != status: + continue + reward_total = trajectory["reward"]["total"] + if min_reward is not None and reward_total < min_reward: + continue + if max_reward is not None and reward_total > max_reward: + continue + if decision_type is not None: + decisions = trajectory.get("decisions", []) + if not any(d["decision_type"] == decision_type for d in decisions): + continue + if channel is not None and trajectory["task"]["channel"] != channel: + continue + if min_tool_errors is not None and trajectory["outcome"]["tool_errors"] < min_tool_errors: + continue + if min_user_corrections is not None and trajectory["outcome"]["user_corrections"] < min_user_corrections: + continue + if input_contains is not None: + task_input = trajectory["task"]["input"] + if input_contains.lower() not in task_input.lower(): + continue + results.append(trajectory) + return results + + def slice_dataset( + 
self, + *, + status: str | None = None, + min_reward: float | None = None, + max_reward: float | None = None, + decision_type: str | None = None, + channel: str | None = None, + min_tool_errors: int | None = None, + min_user_corrections: int | None = None, + input_contains: str | None = None, + ) -> list[str]: + results = self.query( + status=status, + min_reward=min_reward, + max_reward=max_reward, + decision_type=decision_type, + channel=channel, + min_tool_errors=min_tool_errors, + min_user_corrections=min_user_corrections, + input_contains=input_contains, + ) + return [r["trajectory_id"] for r in results] + + def _list_trajectory_paths(self) -> list[Path]: + if self.persistence_store is not None: + return self.persistence_store.list_trajectory_paths() + return sorted(Path(self.base_dir).glob("*.json")) + + def _load_trajectory(self, path: Path) -> dict[str, Any]: + if self.persistence_store is not None: + trajectory_id = path.stem + return self.persistence_store.load_trajectory(trajectory_id) + import json + + return json.loads(path.read_text(encoding="utf-8")) diff --git a/src/memabra/benchmarks.py b/src/memabra/benchmarks.py new file mode 100644 index 0000000..f0e172d --- /dev/null +++ b/src/memabra/benchmarks.py @@ -0,0 +1,63 @@ +from __future__ import annotations + +import json +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + +from .evaluator import BenchmarkTask + + +@dataclass(slots=True) +class BenchmarkSuite: + name: str + tasks: list[BenchmarkTask] = field(default_factory=list) + metadata: dict[str, Any] = field(default_factory=dict) + + +def save_benchmark_suite(suite: BenchmarkSuite, path: str | Path) -> None: + path = Path(path) + record = { + "name": suite.name, + "tasks": [ + { + "user_input": t.user_input, + "channel": t.channel, + "user_id": t.user_id, + } + for t in suite.tasks + ], + "metadata": suite.metadata, + } + path.write_text(json.dumps(record, indent=2), encoding="utf-8") + + +def 
load_benchmark_suite(path: str | Path) -> BenchmarkSuite: + path = Path(path) + record = json.loads(path.read_text(encoding="utf-8")) + tasks = [ + BenchmarkTask( + user_input=t["user_input"], + channel=t.get("channel", "local"), + user_id=t.get("user_id"), + ) + for t in record.get("tasks", []) + ] + return BenchmarkSuite( + name=record.get("name", "unnamed"), + tasks=tasks, + metadata=record.get("metadata", {}), + ) + + +def default_benchmark_suite() -> BenchmarkSuite: + return BenchmarkSuite( + name="default", + tasks=[ + BenchmarkTask(user_input="Recall my saved preference from memory."), + BenchmarkTask(user_input="Run the deploy workflow skill."), + BenchmarkTask(user_input="Check current system status with a tool."), + BenchmarkTask(user_input="Use multiple capabilities: memory, skill, and tool."), + ], + metadata={"source": "seed", "description": "Coverage over memory, skill, tool, and composite tasks"}, + ) diff --git a/src/memabra/candidate_types.py b/src/memabra/candidate_types.py new file mode 100644 index 0000000..a13b06f --- /dev/null +++ b/src/memabra/candidate_types.py @@ -0,0 +1,30 @@ +from dataclasses import dataclass, field +from typing import Any, Literal + +CandidateType = Literal["memory", "skill", "tool"] +DecisionType = Literal[ + "direct_answer", + "inject_memory", + "load_skill", + "call_tool", + "clarify", + "composite_action", +] + + +@dataclass(slots=True) +class CandidateObject: + id: str + type: CandidateType + title: str + summary: str + triggers: list[str] = field(default_factory=list) + cost: float = 0.0 + confidence: float = 0.0 + success_rate: float = 0.0 + freshness: float = 0.0 + risk: float = 0.0 + tags: list[str] = field(default_factory=list) + source: str = "generated" + preconditions: list[str] = field(default_factory=list) + type_payload: dict[str, Any] = field(default_factory=dict) diff --git a/src/memabra/case_index.py b/src/memabra/case_index.py new file mode 100644 index 0000000..3b28d7b --- /dev/null +++ 
b/src/memabra/case_index.py @@ -0,0 +1,48 @@ +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + + +class CaseIndex: + """Simple JSON-backed index that maps normalized task inputs to the best trajectory ID.""" + + def __init__(self) -> None: + self._index: dict[str, tuple[str, float]] = {} + + @staticmethod + def _normalize(text: str) -> str: + return " ".join(text.strip().lower().split()) + + def add(self, trajectory: dict[str, Any]) -> None: + trajectory_id = trajectory["trajectory_id"] + task_input = self._normalize(trajectory["task"]["input"]) + reward = float(trajectory["reward"]["total"]) + existing = self._index.get(task_input) + if existing is None or reward > existing[1]: + self._index[task_input] = (trajectory_id, reward) + + def best(self, input_text: str) -> str | None: + normalized = self._normalize(input_text) + entry = self._index.get(normalized) + if entry is None: + return None + return entry[0] + + def save(self, path: str | Path) -> None: + data = { + "cases": { + task_input: {"trajectory_id": traj_id, "reward": reward} + for task_input, (traj_id, reward) in self._index.items() + } + } + Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8") + + @classmethod + def load(cls, path: str | Path) -> CaseIndex: + data = json.loads(Path(path).read_text(encoding="utf-8")) + index = cls() + for task_input, entry in data.get("cases", {}).items(): + index._index[task_input] = (entry["trajectory_id"], float(entry["reward"])) + return index diff --git a/src/memabra/cli.py b/src/memabra/cli.py new file mode 100644 index 0000000..f832582 --- /dev/null +++ b/src/memabra/cli.py @@ -0,0 +1,411 @@ +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + +from .app import build_demo_app +from .benchmarks import default_benchmark_suite +from .evaluator import BenchmarkTask, Evaluator +from .promotion import PromotionPolicy + + +def run_wrapup_workflow(*, 
base_dir: str | Path = "artifacts") -> dict[str, Any]: + base_path = Path(base_dir) + app = build_demo_app(base_dir=base_path) + + seed_prompts = [ + "Use my telegram preference for this answer.", + "Check the current system status.", + "Deploy this service with the usual workflow.", + ] + for prompt in seed_prompts: + app.run_task(prompt, channel="telegram", user_id="oza") + + seed_summary = app.replay_summary() + learning_router = app.train_learning_router() + + evaluator = Evaluator(app) + benchmark_tasks = [ + BenchmarkTask(user_input="Use my telegram preference for this answer.", channel="telegram", user_id="oza"), + BenchmarkTask(user_input="Check the current system status.", channel="local", user_id="oza"), + BenchmarkTask(user_input="Deploy this service with the usual workflow.", channel="local", user_id="oza"), + ] + baseline = evaluator.run(benchmark_tasks) + challenger = evaluator.run(benchmark_tasks, router=learning_router) + comparison = { + "baseline": { + "avg_reward": baseline.avg_reward, + "error_rate": baseline.error_rate, + "avg_latency_ms": baseline.avg_latency_ms, + "decision_distribution": baseline.decision_distribution, + }, + "challenger": { + "avg_reward": challenger.avg_reward, + "error_rate": challenger.error_rate, + "avg_latency_ms": challenger.avg_latency_ms, + "decision_distribution": challenger.decision_distribution, + }, + **evaluator.compare(baseline, challenger), + } + + app.set_router(learning_router) + saved_version = app.save_learning_router( + base_dir=base_path / "router-versions", + metadata={ + "avg_reward": challenger.avg_reward, + "task_count": challenger.task_count, + "source": "wrapup_workflow", + }, + ) + + return { + "seed_summary": { + "trajectories": seed_summary.trajectories, + "success_count": seed_summary.success_count, + "failure_count": seed_summary.failure_count, + "average_reward": seed_summary.average_reward, + }, + "comparison": comparison, + "saved_version": saved_version, + } + + +def 
run_online_learning_workflow( + *, + base_dir: str | Path = "artifacts", + min_new_trajectories: int = 3, + seen_trajectory_store: str | Path | None = None, + dry_run: bool = False, + baseline_version: str | None = None, + case_index_path: str | Path | None = None, + rebuild_case_index: bool = False, +) -> dict[str, Any]: + base_path = Path(base_dir) + app = build_demo_app(base_dir=base_path) + + # Seed demo tasks if no artifacts exist yet + if not any((base_path / "trajectories").glob("*.json")): + seed_prompts = [ + "Use my telegram preference for this answer.", + "Check the current system status.", + "Deploy this service with the usual workflow.", + "Recall my saved preference from memory.", + "Run the deploy workflow skill.", + ] + for prompt in seed_prompts: + app.run_task(prompt, channel="local") + + # Handle case index loading or rebuilding + if case_index_path is not None: + case_index_file = Path(case_index_path) + if rebuild_case_index: + app.build_case_index() + app.save_case_index(case_index_file) + elif case_index_file.exists(): + app.load_case_index(case_index_file) + + policy = PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ) + benchmark_tasks = default_benchmark_suite().tasks + + result = app.run_online_learning_cycle( + policy=policy, + benchmark_tasks=benchmark_tasks, + min_new_trajectories=min_new_trajectories, + version_store_base_dir=base_path / "router-versions", + report_store_base_dir=base_path / "training-reports", + seen_trajectory_store=seen_trajectory_store, + dry_run=dry_run, + baseline_version_id=baseline_version, + case_index_path=case_index_path, + ) + + # Serialize dataclass objects for JSON compatibility + from dataclasses import asdict + + serializable = {} + for key, value in result.items(): + if hasattr(value, "__dataclass_fields__"): + serializable[key] = asdict(value) + else: + serializable[key] = value + return serializable + + +def 
show_status(*, base_dir: str | Path = "artifacts") -> dict[str, Any]: + base_path = Path(base_dir) + from .router_versioning import RouterVersionStore + from .training_reports import TrainingReportStore + + version_store = RouterVersionStore(base_dir=base_path / "router-versions") + report_store = TrainingReportStore(base_dir=base_path / "training-reports") + + current = version_store.get_current() + versions = version_store.list_versions() + reports = report_store.list_reports() + latest_report = reports[-1] if reports else None + + trajectory_dir = base_path / "trajectories" + trajectory_count = len(list(trajectory_dir.glob("*.json"))) if trajectory_dir.exists() else 0 + + return { + "base_dir": str(base_path), + "current_version_id": current.get("current_version_id"), + "version_count": len(versions), + "trajectory_count": trajectory_count, + "report_count": len(reports), + "latest_report": { + "report_id": latest_report.get("report_id"), + "timestamp": latest_report.get("timestamp"), + "promoted": latest_report.get("promotion_decision", {}).get("accepted") if latest_report else None, + } if latest_report else None, + } + + +def format_output(payload: dict[str, Any], *, output_format: str, mode: str) -> str: + if output_format == "json": + return json.dumps(payload, indent=2, ensure_ascii=False) + + def _as_mapping(value: Any) -> dict[str, Any]: + if value is None: + return {} + if isinstance(value, dict): + return value + if hasattr(value, "__dataclass_fields__"): + from dataclasses import asdict + + return asdict(value) + return { + key: getattr(value, key) + for key in ("avg_reward", "error_rate", "avg_latency_ms", "metrics", "reasons", "accepted") + if hasattr(value, key) + } + + def _fmt_bool(value: Any) -> str: + if value is None: + return "none" + return "yes" if bool(value) else "no" + + def _fmt_number(value: Any, *, digits: int = 4) -> str: + if value is None: + return "none" + if isinstance(value, bool): + return _fmt_bool(value) + if 
isinstance(value, (int, float)): + return f"{float(value):.{digits}f}" + return str(value) + + if mode == "status": + latest_report = payload.get("latest_report") or {} + lines = [ + "Memabra status", + "Summary", + f"Base dir: {payload.get('base_dir') or 'none'}", + f"Current version: {payload.get('current_version_id') or 'none'}", + f"Saved versions: {payload.get('version_count', 0)}", + f"Trajectory count: {payload.get('trajectory_count', 0)}", + f"Training reports: {payload.get('report_count', 0)}", + f"Latest report: {latest_report.get('report_id') or 'none'}", + ] + if latest_report.get("timestamp") is not None: + lines.append(f"Latest report time: {latest_report.get('timestamp')}") + if latest_report.get("promoted") is not None: + lines.append(f"Latest promotion accepted: {_fmt_bool(latest_report.get('promoted'))}") + return "\n".join(lines) + + if mode == "list_versions": + versions = payload.get("versions", []) + current_version_id = payload.get("current_version_id") + lines = [f"Saved router versions ({len(versions)} total)"] + lines.append(f"Current version: {current_version_id or 'none'}") + if not versions: + lines.append("(none)") + return "\n".join(lines) + for index, version in enumerate(versions, start=1): + metadata = version.get("metadata") or {} + metadata_parts = [] + if version.get("version_id") == current_version_id: + metadata_parts.append("current") + if metadata.get("source") is not None: + metadata_parts.append(f"source={metadata['source']}") + if metadata.get("avg_reward") is not None: + metadata_parts.append(f"avg_reward={metadata['avg_reward']}") + suffix = f" ({', '.join(metadata_parts)})" if metadata_parts else "" + lines.append(f"{index}. 
{version.get('version_id')}{suffix}") + return "\n".join(lines) + + if mode == "rollback": + return f"Rolled back current version to: {payload.get('current_version_id') or 'none'}" + + if mode == "workflow": + report_id = payload.get("report_id") or "none" + lines = [ + "Memabra online learning result", + "Summary", + f"Report ID: {report_id}", + f"Skipped: {_fmt_bool(payload.get('skipped'))}", + f"Promoted: {_fmt_bool(payload.get('promoted'))}", + ] + if "dry_run" in payload: + lines.append(f"Dry run: {_fmt_bool(payload.get('dry_run'))}") + + baseline_metrics = _as_mapping(payload.get("baseline_metrics")) + challenger_metrics = _as_mapping(payload.get("challenger_metrics")) + decision = _as_mapping(payload.get("decision")) + decision_metrics = _as_mapping(decision.get("metrics")) + + if baseline_metrics: + lines.extend([ + "Baseline", + f"Reward: {_fmt_number(baseline_metrics.get('avg_reward'))}", + f"Error rate: {_fmt_number(baseline_metrics.get('error_rate'))}", + f"Latency (ms): {_fmt_number(baseline_metrics.get('avg_latency_ms'))}", + ]) + if challenger_metrics: + lines.extend([ + "Challenger", + f"Reward: {_fmt_number(challenger_metrics.get('avg_reward'))}", + f"Error rate: {_fmt_number(challenger_metrics.get('error_rate'))}", + f"Latency (ms): {_fmt_number(challenger_metrics.get('avg_latency_ms'))}", + ]) + if decision_metrics: + lines.extend([ + "Deltas", + f"Reward delta: {_fmt_number(decision_metrics.get('reward_delta'))}", + f"Error rate delta: {_fmt_number(decision_metrics.get('error_rate_delta'))}", + f"Latency delta (ms): {_fmt_number(decision_metrics.get('latency_delta_ms'))}", + ]) + + reason = payload.get("reason") + if not reason: + decision_reasons = decision.get("reasons", []) if isinstance(decision, dict) else [] + if decision_reasons: + reason = "; ".join(str(item) for item in decision_reasons) + + error = payload.get("error") + if reason or error or decision.get("accepted") is not None: + lines.append("Decision") + if decision.get("accepted") 
is not None: + lines.append(f"Accepted: {_fmt_bool(decision.get('accepted'))}") + if reason: + lines.append(f"Reason: {reason}") + if error: + lines.append(f"Error: {error}") + + version_id = payload.get("version_id") or payload.get("promoted_version_id") + if version_id: + lines.append(f"Version ID: {version_id}") + return "\n".join(lines) + + return json.dumps(payload, indent=2, ensure_ascii=False) + + +def main(argv: list[str] | None = None) -> int: + import argparse + import sys + + if argv is None: + argv = sys.argv[1:] + # When running under pytest without explicit args, default to run subcommand + # to avoid argparse picking up pytest's own command-line arguments. + if "pytest" in sys.modules: + argv = ["run"] + + # Backward compat: default to 'run' when invoked without a known subcommand + known_commands = {"run", "status", "version"} + if not argv or argv[0] not in known_commands: + if argv and argv[0] in ("-h", "--help"): + pass # let top-level parser show help + else: + argv = ["run"] + list(argv) + + parser = argparse.ArgumentParser(description="memabra CLI") + subparsers = parser.add_subparsers(dest="command") + + run_parser = subparsers.add_parser("run", help="Run the online learning workflow") + run_parser.add_argument("--base-dir", default="artifacts", help="Base directory for artifacts") + run_parser.add_argument("--min-new-trajectories", type=int, default=3, help="Minimum new trajectories required to run a cycle") + run_parser.add_argument("--seen-trajectory-store", default=None, help="Path to persist seen trajectory IDs (defaults to seen-trajectories.json under --base-dir)") + run_parser.add_argument("--dry-run", action="store_true", help="Train and evaluate but do not promote or save a new router version") + run_parser.add_argument("--baseline-version", default=None, help="Load a specific router version as the baseline for evaluation") + run_parser.add_argument("--case-index", default=None, help="Path to a case index JSON file for episodic retrieval") + 
run_parser.add_argument("--rebuild-case-index", action="store_true", help="Rebuild and save the case index from existing trajectories before running") + run_parser.add_argument("--format", choices=("json", "text"), default="json", help="Output format for CLI results") + + status_parser = subparsers.add_parser("status", help="Show system status") + status_parser.add_argument("--base-dir", default="artifacts", help="Base directory for artifacts") + status_parser.add_argument("--format", choices=("json", "text"), default="json", help="Output format for CLI results") + + version_parser = subparsers.add_parser("version", help="Manage router versions") + version_subparsers = version_parser.add_subparsers(dest="version_command") + + list_parser = version_subparsers.add_parser("list", help="List all saved router versions") + list_parser.add_argument("--base-dir", default="artifacts", help="Base directory for artifacts") + list_parser.add_argument("--format", choices=("json", "text"), default="json", help="Output format for CLI results") + + rollback_parser = version_subparsers.add_parser("rollback", help="Roll back to a specific router version") + rollback_parser.add_argument("version_id", help="Router version ID to roll back to") + rollback_parser.add_argument("--base-dir", default="artifacts", help="Base directory for artifacts") + rollback_parser.add_argument("--format", choices=("json", "text"), default="json", help="Output format for CLI results") + + args = parser.parse_args(args=argv) + + base_path = Path(getattr(args, "base_dir", "artifacts")) + + if args.command == "status": + result = show_status(base_dir=base_path) + print(format_output(result, output_format=args.format, mode="status")) + return 0 + + if args.command == "version": + from .router_versioning import RouterVersionStore + + store = RouterVersionStore(base_dir=base_path / "router-versions") + if args.version_command == "rollback": + try: + rollback_result = store.rollback(args.version_id) + except ValueError as exc: + 
print(str(exc), file=sys.stderr) + return 1 + current_version_id = rollback_result.get("current_version_id") + if current_version_id is None: + current = store.get_current() + current_version_id = current.get("current_version_id") + print(format_output({"current_version_id": current_version_id}, output_format=args.format, mode="rollback")) + return 0 + elif args.version_command == "list": + versions = store.list_versions() + current = store.get_current() + print(format_output({"versions": versions, "current_version_id": current.get("current_version_id")}, output_format=args.format, mode="list_versions")) + return 0 + else: + version_parser.print_help() + return 2 + + if args.command == "run": + seen_store = args.seen_trajectory_store or str(base_path / "seen-trajectories.json") + case_index_path = args.case_index or (str(base_path / "case-index.json") if args.rebuild_case_index else None) + + result = run_online_learning_workflow( + base_dir=base_path, + min_new_trajectories=args.min_new_trajectories, + seen_trajectory_store=seen_store, + dry_run=args.dry_run, + baseline_version=args.baseline_version, + case_index_path=case_index_path, + rebuild_case_index=args.rebuild_case_index, + ) + print(format_output(result, output_format=args.format, mode="workflow")) + return 0 + + parser.print_help() + return 2 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/memabra/dataset.py b/src/memabra/dataset.py new file mode 100644 index 0000000..ba96a8f --- /dev/null +++ b/src/memabra/dataset.py @@ -0,0 +1,48 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any + + +@dataclass(slots=True) +class TrainingSample: + input_text: str + features: dict[str, float] + label: str + reward: float + + +class DatasetBuilder: + def build(self, trajectories: list[dict[str, Any]]) -> list[TrainingSample]: + samples: list[TrainingSample] = [] + for trajectory in trajectories: + task_input = trajectory["task"]["input"] + 
candidate_sets = trajectory["candidate_sets"] + decisions = trajectory.get("decisions", []) + label = decisions[0]["decision_type"] if decisions else "clarify" + reward_total = trajectory["reward"]["total"] + + memory = candidate_sets.get("memory", []) + skill = candidate_sets.get("skill", []) + tool = candidate_sets.get("tool", []) + + features: dict[str, float] = { + "input_length": float(len(task_input)), + "memory_count": float(len(memory)), + "skill_count": float(len(skill)), + "tool_count": float(len(tool)), + "top_memory_confidence": max((c.get("confidence", 0.0) for c in memory), default=0.0), + "top_skill_success_rate": max((c.get("success_rate", 0.0) for c in skill), default=0.0), + "top_tool_confidence": max((c.get("confidence", 0.0) for c in tool), default=0.0), + "top_tool_risk": max((c.get("risk", 0.0) for c in tool), default=0.0), + } + + samples.append( + TrainingSample( + input_text=task_input, + features=features, + label=label, + reward=reward_total, + ) + ) + return samples diff --git a/src/memabra/evaluator.py b/src/memabra/evaluator.py new file mode 100644 index 0000000..39ec575 --- /dev/null +++ b/src/memabra/evaluator.py @@ -0,0 +1,94 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import TYPE_CHECKING, Any + +from .dataset import DatasetBuilder +from .router import SimpleLearningRouter, TaskContext + +if TYPE_CHECKING: + from .app import MemabraApp + + +@dataclass(slots=True) +class BenchmarkTask: + user_input: str + channel: str = "local" + user_id: str | None = None + + +@dataclass(slots=True) +class EvaluationResult: + task_count: int = 0 + trajectories: list[dict[str, Any]] = field(default_factory=list) + avg_reward: float = 0.0 + decision_distribution: dict[str, int] = field(default_factory=dict) + error_rate: float = 0.0 + avg_latency_ms: float = 0.0 + + +class Evaluator: + def __init__(self, app: MemabraApp): + self.app = app + + def run(self, tasks: list[BenchmarkTask], router=None) -> 
EvaluationResult: + original_router = self.app.runner.router + if router is not None: + self.app.runner.router = router + + trajectories: list[dict[str, Any]] = [] + try: + for task in tasks: + trajectory = self.app.run_task( + task.user_input, + channel=task.channel, + user_id=task.user_id, + ) + trajectories.append(trajectory) + finally: + self.app.runner.router = original_router + + return self._analyze(trajectories) + + def _analyze(self, trajectories: list[dict[str, Any]]) -> EvaluationResult: + if not trajectories: + return EvaluationResult() + + total_reward = sum(t["reward"]["total"] for t in trajectories) + decisions = [t["decisions"][0]["decision_type"] for t in trajectories if t.get("decisions")] + distribution: dict[str, int] = {} + for d in decisions: + distribution[d] = distribution.get(d, 0) + 1 + + error_count = sum(1 for t in trajectories if t["outcome"]["status"] == "error") + total_latency = sum(t["outcome"]["latency_ms"] for t in trajectories) + + return EvaluationResult( + task_count=len(trajectories), + trajectories=trajectories, + avg_reward=round(total_reward / len(trajectories), 4), + decision_distribution=distribution, + error_rate=round(error_count / len(trajectories), 4), + avg_latency_ms=round(total_latency / len(trajectories), 4), + ) + + def compare(self, baseline: EvaluationResult, challenger: EvaluationResult) -> dict[str, Any]: + reward_delta = round(challenger.avg_reward - baseline.avg_reward, 4) + error_delta = round(challenger.error_rate - baseline.error_rate, 4) + latency_delta = round(challenger.avg_latency_ms - baseline.avg_latency_ms, 4) + + if reward_delta > 0.001: + winner = "challenger" + elif reward_delta < -0.001: + winner = "baseline" + else: + winner = "tie" + + return { + "winner": winner, + "avg_reward_delta": reward_delta, + "error_rate_delta": error_delta, + "avg_latency_ms_delta": latency_delta, + "baseline_avg_reward": baseline.avg_reward, + "challenger_avg_reward": challenger.avg_reward, + } diff --git 
a/src/memabra/execution.py b/src/memabra/execution.py new file mode 100644 index 0000000..b4133d4 --- /dev/null +++ b/src/memabra/execution.py @@ -0,0 +1,296 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any, Protocol + +import yaml + +from .memory_store import InMemoryMemoryStore +from .router import RouteDecision, TaskContext +from .telemetry import Event + + +@dataclass(slots=True) +class ActionResult: + decision_type: str + status: str + details: dict[str, Any] = field(default_factory=dict) + events: list[Event] = field(default_factory=list) + + +class ToolBackend(Protocol): + def run_tool(self, tool_id: str, context: TaskContext, params: dict[str, Any] | None = None) -> dict[str, Any]: + ... + + +class SkillBackend(Protocol): + def load_skill(self, skill_id: str) -> dict[str, Any]: + ... + + +@dataclass(slots=True) +class FileSystemSkillBackend: + search_paths: list[str | Path] = field(default_factory=lambda: [Path.home() / ".hermes" / "skills"]) + + def _discover(self) -> dict[str, Path]: + index: dict[str, Path] = {} + for base in self.search_paths: + base_path = Path(base) + if not base_path.exists(): + continue + for skill_file in base_path.rglob("SKILL.md"): + frontmatter = self._parse_frontmatter(skill_file) + name = frontmatter.get("name") if frontmatter else None + if name: + index[name] = skill_file + return index + + def _parse_frontmatter(self, path: Path) -> dict[str, Any] | None: + text = path.read_text(encoding="utf-8") + if not text.startswith("---"): + return None + try: + _, rest = text.split("---", 1) + fm_text, _ = rest.split("---", 1) + return yaml.safe_load(fm_text) or {} + except Exception: + return None + + def load_skill(self, skill_id: str) -> dict[str, Any]: + index = self._discover() + skill_path = index.get(skill_id) + if skill_path is None: + return { + "skill_id": skill_id, + "status": "error", + "error": f"Skill '{skill_id}' not found in 
{self.search_paths}.", + } + text = skill_path.read_text(encoding="utf-8") + frontmatter = self._parse_frontmatter(skill_path) or {} + body = text + if text.startswith("---"): + try: + _, rest = text.split("---", 1) + _, body = rest.split("---", 1) + except Exception: + pass + return { + "skill_id": skill_id, + "status": "success", + "name": frontmatter.get("name", skill_id), + "description": frontmatter.get("description", ""), + "version": frontmatter.get("version", ""), + "author": frontmatter.get("author", ""), + "content": body.strip(), + "path": str(skill_path), + **frontmatter, + } + + +@dataclass(slots=True) +class LocalFunctionToolAdapter: + func: Any + + def run_tool(self, tool_id: str, context: TaskContext, params: dict[str, Any] | None = None) -> dict[str, Any]: + params = params or {} + start = __import__("time").time() + try: + output = self.func(**params) + latency_ms = int((__import__("time").time() - start) * 1000) + return {"status": "success", "output": output, "error": None, "latency_ms": latency_ms} + except Exception as exc: + latency_ms = int((__import__("time").time() - start) * 1000) + return {"status": "error", "output": None, "error": str(exc), "latency_ms": latency_ms} + + +@dataclass(slots=True) +class SubprocessToolAdapter: + command: str + + def run_tool(self, tool_id: str, context: TaskContext, params: dict[str, Any] | None = None) -> dict[str, Any]: + import subprocess + import time + + start = time.time() + try: + proc = subprocess.run(self.command, shell=True, capture_output=True, text=True, timeout=30) + latency_ms = int((time.time() - start) * 1000) + if proc.returncode == 0: + return {"status": "success", "output": proc.stdout.strip(), "error": None, "latency_ms": latency_ms} + return {"status": "error", "output": proc.stdout.strip(), "error": proc.stderr.strip(), "latency_ms": latency_ms} + except Exception as exc: + latency_ms = int((time.time() - start) * 1000) + return {"status": "error", "output": None, "error": str(exc), 
"latency_ms": latency_ms} + + +@dataclass(slots=True) +class ToolRegistry: + _tools: dict[str, ToolBackend] = field(default_factory=dict) + + def register(self, tool_id: str, backend: ToolBackend) -> None: + self._tools[tool_id] = backend + + def run_tool(self, tool_id: str, context: TaskContext, params: dict[str, Any] | None = None) -> dict[str, Any]: + backend = self._tools.get(tool_id) + if backend is None: + return {"status": "error", "output": None, "error": f"Tool '{tool_id}' not found in registry.", "latency_ms": 0} + return backend.run_tool(tool_id, context, params) + + +@dataclass(slots=True) +class MemoryExecutor: + memory_store: InMemoryMemoryStore | None = None + + def execute(self, decision: RouteDecision, context: TaskContext, trajectory_id: str) -> ActionResult: + selected_ids = list(decision.selected_ids) + events: list[Event] = [] + for record_id in selected_ids: + if self.memory_store is not None and self.memory_store.get(record_id) is not None: + self.memory_store.mark_used(record_id) + events.append( + Event( + event_id=f"evt-memory-{trajectory_id}-{record_id}", + trajectory_id=trajectory_id, + stage="execution", + event_type="memory_injected", + payload={"record_id": record_id, "input": context.user_input}, + ) + ) + return ActionResult( + decision_type=decision.decision_type, + status="executed" if selected_ids else "skipped", + details={"selected_ids": selected_ids, "latency_ms": 0}, + events=events, + ) + + +@dataclass(slots=True) +class SkillExecutor: + backend: SkillBackend | None = None + + def execute(self, decision: RouteDecision, context: TaskContext, trajectory_id: str) -> ActionResult: + selected_ids = list(decision.selected_ids) + payloads: list[dict[str, Any]] = [] + events: list[Event] = [] + for skill_id in selected_ids: + payload = self.backend.load_skill(skill_id) if self.backend is not None else {"skill_id": skill_id} + payloads.append(payload) + event_payload = {"skill_id": skill_id, "input": context.user_input, **payload} + 
events.append( + Event( + event_id=f"evt-skill-{trajectory_id}-{skill_id}", + trajectory_id=trajectory_id, + stage="execution", + event_type="skill_loaded", + payload=event_payload, + ) + ) + return ActionResult( + decision_type=decision.decision_type, + status="executed" if selected_ids else "skipped", + details={"selected_ids": selected_ids, "payloads": payloads, "latency_ms": 0}, + events=events, + ) + + +@dataclass(slots=True) +class ToolExecutor: + backend: ToolBackend | None = None + + def execute(self, decision: RouteDecision, context: TaskContext, trajectory_id: str) -> ActionResult: + selected_ids = list(decision.selected_ids) + events: list[Event] = [] + result_status = "executed" if selected_ids else "skipped" + result_payloads: list[dict[str, Any]] = [] + max_latency = 0 + for idx, tool_id in enumerate(selected_ids): + params = decision.selected_payloads[idx] if idx < len(decision.selected_payloads) else {} + backend_result = ( + self.backend.run_tool(tool_id, context, params=params) + if self.backend is not None + else {"status": "success", "output": "mock-success", "error": None, "latency_ms": 0} + ) + result_payloads.append({"tool_id": tool_id, **backend_result}) + max_latency = max(max_latency, int(backend_result.get("latency_ms", 0) or 0)) + if backend_result.get("status") == "error": + result_status = "error" + events.extend( + [ + Event( + event_id=f"evt-tool-{trajectory_id}-{tool_id}", + trajectory_id=trajectory_id, + stage="execution", + event_type="tool_called", + payload={"tool_id": tool_id, "input": context.user_input}, + ), + Event( + event_id=f"evt-tool-result-{trajectory_id}-{tool_id}", + trajectory_id=trajectory_id, + stage="execution", + event_type="tool_result", + payload={"tool_id": tool_id, **backend_result}, + ), + ] + ) + return ActionResult( + decision_type=decision.decision_type, + status=result_status, + details={"selected_ids": selected_ids, "results": result_payloads, "latency_ms": max_latency}, + events=events, + ) + + +# 
Backward compatibility alias +MockToolExecutor = ToolExecutor + + +@dataclass(slots=True) +class ExecutionEngine: + memory_executor: MemoryExecutor = field(default_factory=MemoryExecutor) + skill_executor: SkillExecutor = field(default_factory=SkillExecutor) + tool_executor: ToolExecutor = field(default_factory=ToolExecutor) + + def __init__( + self, + memory_executor: MemoryExecutor | None = None, + skill_executor: SkillExecutor | None = None, + tool_executor: ToolExecutor | None = None, + tool_backend: ToolBackend | None = None, + skill_backend: SkillBackend | None = None, + ): + self.memory_executor = memory_executor or MemoryExecutor() + self.skill_executor = skill_executor or SkillExecutor(backend=skill_backend) + self.tool_executor = tool_executor or ToolExecutor(backend=tool_backend) + + def execute(self, decision: RouteDecision, context: TaskContext, trajectory_id: str) -> ActionResult: + if decision.decision_type == "inject_memory": + return self.memory_executor.execute(decision, context, trajectory_id) + if decision.decision_type == "load_skill": + return self.skill_executor.execute(decision, context, trajectory_id) + if decision.decision_type == "call_tool": + return self.tool_executor.execute(decision, context, trajectory_id) + if decision.decision_type == "composite_action": + events: list[Event] = [] + steps: list[dict[str, Any]] = [] + total_latency = 0 + status = "executed" + for step in decision.composite_steps: + step_result = self.execute(step, context, trajectory_id) + events.extend(step_result.events) + steps.append({"decision_type": step.decision_type, "status": step_result.status, "details": step_result.details}) + total_latency += int(step_result.details.get("latency_ms", 0) or 0) + if step_result.status in ("error", "failure"): + status = "error" + return ActionResult( + decision_type="composite_action", + status=status, + details={"steps": steps, "latency_ms": total_latency}, + events=events, + ) + return ActionResult( + 
decision_type=decision.decision_type, + status="noop", + details={"reason": "No executor needed for this decision type.", "latency_ms": 0}, + events=[], + ) diff --git a/src/memabra/memory_store.py b/src/memabra/memory_store.py new file mode 100644 index 0000000..dd9883b --- /dev/null +++ b/src/memabra/memory_store.py @@ -0,0 +1,107 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from datetime import UTC, datetime +from typing import Literal + +MemoryType = Literal["semantic", "procedural", "episodic", "working"] +FactStatus = Literal["draft", "assumed", "verified", "deprecated", "revoked"] + + +@dataclass(slots=True) +class MemorySource: + kind: str + ref: str + + +@dataclass(slots=True) +class VerificationState: + status: str = "unknown" + last_checked_at: str | None = None + check_method: str | None = None + + +@dataclass(slots=True) +class MemoryRecord: + id: str + memory_type: MemoryType + fact_status: FactStatus + content: str + summary: str + source: MemorySource + confidence: float + created_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat()) + updated_at: str = field(default_factory=lambda: datetime.now(UTC).isoformat()) + tags: list[str] = field(default_factory=list) + related_entities: list[str] = field(default_factory=list) + last_used_at: str | None = None + expires_at: str | None = None + verification: VerificationState = field(default_factory=VerificationState) + revocation: dict[str, str] | None = None + + def to_dict(self) -> dict: + return { + "id": self.id, + "memory_type": self.memory_type, + "fact_status": self.fact_status, + "content": self.content, + "summary": self.summary, + "source": {"kind": self.source.kind, "ref": self.source.ref}, + "confidence": self.confidence, + "tags": list(self.tags), + "related_entities": list(self.related_entities), + "created_at": self.created_at, + "updated_at": self.updated_at, + "last_used_at": self.last_used_at, + "expires_at": self.expires_at, + 
"verification": { + "status": self.verification.status, + "last_checked_at": self.verification.last_checked_at, + "check_method": self.verification.check_method, + }, + "revocation": self.revocation, + } + + +class InMemoryMemoryStore: + def __init__(self): + self._records: dict[str, MemoryRecord] = {} + + def upsert(self, record: MemoryRecord) -> None: + record.updated_at = datetime.now(UTC).isoformat() + self._records[record.id] = record + + def get(self, record_id: str) -> MemoryRecord | None: + return self._records.get(record_id) + + def list_by_type(self, memory_type: MemoryType | None = None) -> list[MemoryRecord]: + records = list(self._records.values()) + if memory_type is None: + return records + return [record for record in records if record.memory_type == memory_type] + + def mark_used(self, record_id: str) -> None: + record = self._require_record(record_id) + now = datetime.now(UTC).isoformat() + record.last_used_at = now + record.updated_at = now + + def verify(self, record_id: str, *, status: str, check_method: str) -> None: + record = self._require_record(record_id) + now = datetime.now(UTC).isoformat() + record.fact_status = "verified" if status == "confirmed" else record.fact_status + record.verification = VerificationState(status=status, last_checked_at=now, check_method=check_method) + record.updated_at = now + + def revoke(self, record_id: str, *, reason: str) -> None: + record = self._require_record(record_id) + now = datetime.now(UTC).isoformat() + record.fact_status = "revoked" + record.revocation = {"reason": reason, "revoked_at": now} + record.updated_at = now + + def _require_record(self, record_id: str) -> MemoryRecord: + record = self.get(record_id) + if record is None: + raise KeyError(f"Unknown memory record: {record_id}") + return record diff --git a/src/memabra/online_learning.py b/src/memabra/online_learning.py new file mode 100644 index 0000000..d64cc70 --- /dev/null +++ b/src/memabra/online_learning.py @@ -0,0 +1,175 @@ +from 
__future__ import annotations + +import json +from dataclasses import dataclass, field +from datetime import datetime, timezone +from pathlib import Path +from typing import Any + +from .benchmarks import BenchmarkTask +from .dataset import DatasetBuilder +from .evaluator import Evaluator, EvaluationResult +from .promotion import PromotionDecision, PromotionPolicy +from .router import SimpleLearningRouter +from .router_versioning import RouterVersionStore +from .training_reports import TrainingReportStore, build_report + + +@dataclass +class OnlineLearningCoordinator: + app: Any + policy: PromotionPolicy + benchmark_tasks: list[BenchmarkTask] + min_new_trajectories: int = 5 + version_store_base_dir: str | Path = "docs/projects/memabra/router-versions" + report_store_base_dir: str | Path = "docs/projects/memabra/training-reports" + seen_trajectory_store: str | Path | None = None + case_index_path: str | Path | None = None + _seen_trajectory_ids: set[str] = field(default_factory=set, repr=False) + + def __post_init__(self): + if self.seen_trajectory_store is not None: + path = Path(self.seen_trajectory_store) + if path.exists(): + data = json.loads(path.read_text(encoding="utf-8")) + self._seen_trajectory_ids = set(data.get("seen_trajectory_ids", [])) + + def _version_store(self) -> RouterVersionStore: + return RouterVersionStore(base_dir=self.version_store_base_dir) + + def _save_seen_trajectories(self) -> None: + if self.seen_trajectory_store is not None: + path = Path(self.seen_trajectory_store) + path.write_text( + json.dumps({"seen_trajectory_ids": sorted(self._seen_trajectory_ids)}, indent=2), + encoding="utf-8", + ) + + def run_cycle(self, dry_run: bool = False, baseline_version_id: str | None = None) -> dict[str, Any]: + index = self.app.artifact_index() + all_trajectories = index.query() + new_trajectories = [ + t for t in all_trajectories if t["trajectory_id"] not in self._seen_trajectory_ids + ] + + if len(new_trajectories) < self.min_new_trajectories: + 
report = { + "report_id": f"report-skipped-{len(self._seen_trajectory_ids)}", + "timestamp": datetime.now(timezone.utc).isoformat(), + "source_trajectory_ids": [], + "sample_count": 0, + "baseline_metrics": {}, + "challenger_metrics": {}, + "promotion_decision": {"accepted": False, "reasons": [f"Too few new trajectories ({len(new_trajectories)} < {self.min_new_trajectories})"], "metrics": {}}, + "promoted_version_id": None, + "skipped": True, + } + self._save_report(report) + return { + "skipped": True, + "reason": f"Too few new trajectories ({len(new_trajectories)} < {self.min_new_trajectories})", + "new_count": len(new_trajectories), + "min_required": self.min_new_trajectories, + "report_id": report["report_id"], + } + + original_router = self.app.runner.router  # bound before the try so the finally clause can always restore it + try: + # Train challenger on all available trajectories + dataset_builder = DatasetBuilder() + samples = dataset_builder.build(all_trajectories) + challenger = SimpleLearningRouter() + if samples: + challenger.fit(samples) + + # Load baseline version if specified + if baseline_version_id is not None: + baseline_router = self._version_store().load(baseline_version_id) + self.app.set_router(baseline_router) + + # Evaluate baseline vs challenger + evaluator = Evaluator(self.app) + baseline_result = evaluator.run(self.benchmark_tasks) + challenger_result = evaluator.run(self.benchmark_tasks, router=challenger) + except Exception as exc: + report = build_report( + source_trajectory_ids=[t["trajectory_id"] for t in all_trajectories], + baseline=EvaluationResult(task_count=0, trajectories=[], avg_reward=0.0, error_rate=0.0, avg_latency_ms=0.0, decision_distribution={}), + challenger=EvaluationResult(task_count=0, trajectories=[], avg_reward=0.0, error_rate=0.0, avg_latency_ms=0.0, decision_distribution={}), + decision=PromotionDecision(accepted=False, reasons=[f"Cycle failed: {exc}"], metrics={}), + promoted_version_id=None, + ) + self._save_report(report) + return { + "skipped": False, + "promoted": False, +
"error": str(exc), + "report_id": report["report_id"], + } + finally: + # Restore original router if a baseline version was loaded + if baseline_version_id is not None: + self.app.set_router(original_router) + + # Refresh index to capture trajectories generated during evaluation + # and mark everything as seen so benchmark runs don't retrigger cycles. + index.refresh() + post_eval_trajectories = index.query() + for t in post_eval_trajectories: + self._seen_trajectory_ids.add(t["trajectory_id"]) + self._save_seen_trajectories() + + if self.case_index_path is not None: + self.app.build_case_index() + self.app.save_case_index(self.case_index_path) + + decision = self.policy.evaluate(baseline_result, challenger_result) + + version_id: str | None = None + if decision.accepted and not dry_run: + store = RouterVersionStore(base_dir=self.version_store_base_dir) + version_record = store.save( + challenger, + metadata={ + "source": "online_learning", + "benchmark_summary": decision.metrics, + }, + ) + version_id = version_record["version_id"] + self.app.set_router(challenger) + + report = build_report( + source_trajectory_ids=[t["trajectory_id"] for t in all_trajectories], + baseline=baseline_result, + challenger=challenger_result, + decision=decision, + promoted_version_id=version_id, + baseline_version_id=baseline_version_id, + ) + report["dry_run"] = dry_run + self._save_report(report) + + if not decision.accepted or dry_run: + return { + "skipped": False, + "promoted": False, + "decision": decision, + "baseline_metrics": baseline_result, + "challenger_metrics": challenger_result, + "report_id": report["report_id"], + "dry_run": dry_run, + } + + return { + "skipped": False, + "promoted": True, + "decision": decision, + "version_id": version_id, + "baseline_metrics": baseline_result, + "challenger_metrics": challenger_result, + "report_id": report["report_id"], + } + + def _save_report(self, report: dict[str, Any]) -> None: + store = 
TrainingReportStore(base_dir=self.report_store_base_dir) + store.save(report) diff --git a/src/memabra/outcome.py b/src/memabra/outcome.py new file mode 100644 index 0000000..bdf080a --- /dev/null +++ b/src/memabra/outcome.py @@ -0,0 +1,138 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any + +from .execution import ActionResult +from .retrieval import RetrievalResult +from .router import RouteDecision +from .telemetry import RewardBreakdown + + +@dataclass(slots=True) +class Outcome: + status: str + steps: int + latency_ms: int + user_corrections: int + tool_errors: int + notes: str | None = None + + +class OutcomeEngine: + def build_outcome(self, decision: RouteDecision, execution_result: ActionResult | None = None) -> Outcome: + latency_ms = int((execution_result.details.get("latency_ms", 0) if execution_result is not None else 0) or 0) + steps = 1 + user_corrections = 0 + + if execution_result is not None and execution_result.status == "error": + tool_errors = self._count_tool_errors(decision, execution_result) + status = self._resolve_status(decision, execution_result, tool_errors) + notes = "Execution failed during runner dispatch." + if status == "partial_success": + notes = "Some tools succeeded, but errors were encountered." + return Outcome( + status=status, + steps=steps, + latency_ms=latency_ms, + user_corrections=user_corrections, + tool_errors=tool_errors, + notes=notes, + ) + + status = "partial_success" if decision.decision_type == "clarify" else "success" + notes = "Draft trajectory generated by MemabraRunner with execution hooks." if execution_result else "Draft trajectory generated by MemabraRunner." 
+ return Outcome( + status=status, + steps=steps, + latency_ms=latency_ms, + user_corrections=user_corrections, + tool_errors=0, + notes=notes, + ) + + def _count_tool_errors(self, decision: RouteDecision, execution_result: ActionResult) -> int: + if decision.decision_type != "call_tool": + return 0 + results = execution_result.details.get("results", []) + if not results: + return 1 + return sum(1 for r in results if r.get("status") == "error") + + def _resolve_status(self, decision: RouteDecision, execution_result: ActionResult, tool_errors: int) -> str: + if decision.decision_type != "call_tool": + return "failure" + total_tools = max(len(decision.selected_ids), len(execution_result.details.get("results", []))) + if total_tools > 0 and 0 < tool_errors < total_tools: + return "partial_success" + return "failure" + + +class RewardEngine: + def compute( + self, + decision: RouteDecision, + outcome: Outcome, + execution_result: ActionResult | None = None, + retrieval_result: RetrievalResult | None = None, + ) -> RewardBreakdown: + latency_ms = outcome.latency_ms + latency_penalty = self._latency_tier_penalty(latency_ms) + tool_error = self._tool_error_penalty(outcome) + context_cost = self._context_cost(retrieval_result) + + if decision.decision_type == "clarify": + return RewardBreakdown( + task_success=0.4, + retrieval_hit=0.1, + user_correction=0.0, + latency=latency_penalty, + context_cost=context_cost, + tool_error=tool_error, + ) + + if decision.decision_type == "call_tool": + task_success = self._tool_task_success(tool_error, outcome) + useful_reuse = 0.05 if outcome.status in ("success", "partial_success") and tool_error == 0.0 else 0.0 + return RewardBreakdown( + task_success=task_success, + retrieval_hit=0.25, + useful_reuse=useful_reuse, + latency=latency_penalty, + context_cost=context_cost, + tool_error=tool_error, + ) + + return RewardBreakdown( + task_success=0.8 if outcome.status == "success" else 0.5, + retrieval_hit=0.2, + useful_reuse=0.1 if 
outcome.status == "success" else 0.0, + latency=latency_penalty, + context_cost=context_cost, + tool_error=tool_error, + ) + + def _latency_tier_penalty(self, latency_ms: int) -> float: + if latency_ms < 500: + return round(latency_ms / 5000, 3) + if latency_ms < 1500: + return round(latency_ms / 2000, 3) + return round(latency_ms / 1000, 3) + + def _tool_error_penalty(self, outcome: Outcome) -> float: + base = 0.35 if outcome.tool_errors > 0 else 0.0 + extra = 0.15 * max(0, outcome.tool_errors - 1) + return round(min(base + extra, 1.0), 3) + + def _context_cost(self, retrieval_result: RetrievalResult | None) -> float: + if retrieval_result is None: + return 0.0 + total = len(retrieval_result.memory) + len(retrieval_result.skill) + len(retrieval_result.tool) + return round(total * 0.02, 3) + + def _tool_task_success(self, tool_error: float, outcome: Outcome) -> float: + if tool_error == 0.0: + return 0.8 + if outcome.status == "partial_success": + return max(0.2, 0.6 - tool_error) + return max(0.0, 0.2 - tool_error) diff --git a/src/memabra/persistence.py b/src/memabra/persistence.py new file mode 100644 index 0000000..e9323e9 --- /dev/null +++ b/src/memabra/persistence.py @@ -0,0 +1,40 @@ +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + +from .memory_store import MemoryRecord + + +class PersistenceStore: + def __init__(self, base_dir: str | Path = "docs/projects/memabra/artifacts"): + self.base_dir = Path(base_dir) + self.trajectories_dir = self.base_dir / "trajectories" + self.memories_dir = self.base_dir / "memories" + self.trajectories_dir.mkdir(parents=True, exist_ok=True) + self.memories_dir.mkdir(parents=True, exist_ok=True) + + def save_trajectory(self, trajectory: dict[str, Any]) -> Path: + path = self.trajectories_dir / f"{trajectory['trajectory_id']}.json" + path.write_text(json.dumps(trajectory, indent=2, ensure_ascii=False), encoding="utf-8") + return path + + def load_trajectory(self, 
trajectory_id: str) -> dict[str, Any]: + path = self.trajectories_dir / f"{trajectory_id}.json" + return json.loads(path.read_text(encoding="utf-8")) + + def list_trajectory_paths(self) -> list[Path]: + return sorted(self.trajectories_dir.glob("*.json")) + + def save_memory_record(self, record: MemoryRecord) -> Path: + path = self.memories_dir / f"{record.id}.json" + path.write_text(json.dumps(record.to_dict(), indent=2, ensure_ascii=False), encoding="utf-8") + return path + + def load_memory_record(self, record_id: str) -> dict[str, Any]: + path = self.memories_dir / f"{record_id}.json" + return json.loads(path.read_text(encoding="utf-8")) + + def list_memory_paths(self) -> list[Path]: + return sorted(self.memories_dir.glob("*.json")) diff --git a/src/memabra/promotion.py b/src/memabra/promotion.py new file mode 100644 index 0000000..921a1cb --- /dev/null +++ b/src/memabra/promotion.py @@ -0,0 +1,59 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any + +from .evaluator import EvaluationResult + + +@dataclass(slots=True) +class PromotionDecision: + accepted: bool + reasons: list[str] + metrics: dict[str, Any] + + +@dataclass(slots=True) +class PromotionPolicy: + min_reward_delta: float + max_error_rate_increase: float + max_latency_increase_ms: float + required_task_count: int + + def evaluate(self, baseline: EvaluationResult, challenger: EvaluationResult) -> PromotionDecision: + reasons: list[str] = [] + reward_delta = challenger.avg_reward - baseline.avg_reward + error_rate_delta = challenger.error_rate - baseline.error_rate + latency_delta_ms = challenger.avg_latency_ms - baseline.avg_latency_ms + + if challenger.task_count < self.required_task_count: + reasons.append( + f"Task count {challenger.task_count} below required {self.required_task_count}" + ) + + if reward_delta < self.min_reward_delta: + reasons.append( + f"Reward delta {reward_delta:.4f} below minimum {self.min_reward_delta}" + ) + + if 
error_rate_delta > self.max_error_rate_increase: + reasons.append( + f"Error rate increase {error_rate_delta:.4f} exceeds max {self.max_error_rate_increase}" + ) + + if latency_delta_ms > self.max_latency_increase_ms: + reasons.append( + f"Latency increase {latency_delta_ms:.1f}ms exceeds max {self.max_latency_increase_ms}ms" + ) + + return PromotionDecision( + accepted=len(reasons) == 0, + reasons=reasons, + metrics={ + "reward_delta": round(reward_delta, 4), + "error_rate_delta": round(error_rate_delta, 4), + "latency_delta_ms": round(latency_delta_ms, 4), + "baseline_avg_reward": baseline.avg_reward, + "challenger_avg_reward": challenger.avg_reward, + }, + ) diff --git a/src/memabra/replay.py b/src/memabra/replay.py new file mode 100644 index 0000000..ebe8508 --- /dev/null +++ b/src/memabra/replay.py @@ -0,0 +1,88 @@ +from __future__ import annotations + +import json +from dataclasses import dataclass +from pathlib import Path +from typing import Any + +from .persistence import PersistenceStore + + +@dataclass(slots=True) +class ReplaySummary: + trajectories: int + success_count: int + partial_success_count: int + failure_count: int + average_reward: float + average_latency_ms: float + average_steps: float + average_user_corrections: float + direct_answer_count: int + memory_action_count: int + skill_action_count: int + tool_action_count: int + clarify_count: int + composite_action_count: int + + +class TrajectoryReplay: + def load(self, path: str | Path) -> dict[str, Any]: + trajectory_path = Path(path) + with trajectory_path.open("r", encoding="utf-8") as handle: + return json.load(handle) + + def load_many(self, paths: list[str | Path]) -> list[dict[str, Any]]: + return [self.load(path) for path in paths] + + def summarize(self, trajectories: list[dict[str, Any]]) -> ReplaySummary: + total = len(trajectories) + if total == 0: + return ReplaySummary(0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0, 0, 0, 0, 0, 0) + + success_count = sum(1 for t in trajectories if 
t["outcome"]["status"] == "success") + partial_success_count = sum(1 for t in trajectories if t["outcome"]["status"] == "partial_success") + failure_count = sum(1 for t in trajectories if t["outcome"]["status"] == "failure") + average_reward = sum(t["reward"]["total"] for t in trajectories) / total + average_latency_ms = sum(t["outcome"]["latency_ms"] for t in trajectories) / total + average_steps = sum(t["outcome"]["steps"] for t in trajectories) / total + average_user_corrections = sum(t["outcome"]["user_corrections"] for t in trajectories) / total + + decisions = [decision for trajectory in trajectories for decision in trajectory.get("decisions", [])] + counts = { + "direct_answer": 0, + "inject_memory": 0, + "load_skill": 0, + "call_tool": 0, + "clarify": 0, + "composite_action": 0, + } + for decision in decisions: + decision_type = decision["decision_type"] + counts[decision_type] = counts.get(decision_type, 0) + 1 + + return ReplaySummary( + trajectories=total, + success_count=success_count, + partial_success_count=partial_success_count, + failure_count=failure_count, + average_reward=average_reward, + average_latency_ms=average_latency_ms, + average_steps=average_steps, + average_user_corrections=average_user_corrections, + direct_answer_count=counts["direct_answer"], + memory_action_count=counts["inject_memory"], + skill_action_count=counts["load_skill"], + tool_action_count=counts["call_tool"], + clarify_count=counts["clarify"], + composite_action_count=counts["composite_action"], + ) + + def summarize_directory(self, directory: str | Path) -> ReplaySummary: + base = Path(directory) + paths = sorted(base.glob("*.json")) + trajectories = self.load_many(paths) + return self.summarize(trajectories) + + def summarize_persistence_store(self, persistence_store: PersistenceStore) -> ReplaySummary: + return self.summarize(self.load_many(persistence_store.list_trajectory_paths())) diff --git a/src/memabra/retrieval.py b/src/memabra/retrieval.py new file mode 100644 
index 0000000..90753f4 --- /dev/null +++ b/src/memabra/retrieval.py @@ -0,0 +1,88 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import Iterable, Protocol + +from .candidate_types import CandidateObject, CandidateType +from .router import TaskContext + + +class CandidateProvider(Protocol): + candidate_type: CandidateType + + def list_candidates(self) -> Iterable[CandidateObject]: + """Return all available candidates for this provider.""" + + +@dataclass(slots=True) +class InMemoryCandidateProvider: + candidate_type: CandidateType + candidates: list[CandidateObject] + + def list_candidates(self) -> Iterable[CandidateObject]: + return list(self.candidates) + + +@dataclass(slots=True) +class RetrievalResult: + memory: list[CandidateObject] + skill: list[CandidateObject] + tool: list[CandidateObject] + + +class CandidateRetriever: + def __init__(self, providers: Iterable[CandidateProvider]): + self.providers = list(providers) + + def retrieve(self, context: TaskContext, top_k: int = 3) -> RetrievalResult: + grouped: dict[CandidateType, list[CandidateObject]] = { + "memory": [], + "skill": [], + "tool": [], + } + + for provider in self.providers: + candidates = [candidate for candidate in provider.list_candidates() if candidate.type == provider.candidate_type] + ranked = sorted( + candidates, + key=lambda candidate: self._score_candidate(candidate, context), + reverse=True, + ) + grouped[provider.candidate_type].extend(ranked[:top_k]) + + return RetrievalResult( + memory=self._dedupe_and_rank(grouped["memory"], context, top_k), + skill=self._dedupe_and_rank(grouped["skill"], context, top_k), + tool=self._dedupe_and_rank(grouped["tool"], context, top_k), + ) + + def _dedupe_and_rank( + self, + candidates: list[CandidateObject], + context: TaskContext, + top_k: int, + ) -> list[CandidateObject]: + deduped: dict[str, CandidateObject] = {} + for candidate in candidates: + current = deduped.get(candidate.id) + if current is None or 
self._score_candidate(candidate, context) > self._score_candidate(current, context): + deduped[candidate.id] = candidate + + return sorted( + deduped.values(), + key=lambda candidate: self._score_candidate(candidate, context), + reverse=True, + )[:top_k] + + def _score_candidate(self, candidate: CandidateObject, context: TaskContext) -> float: + text = " ".join( + [ + context.user_input.lower(), + context.conversation_summary.lower(), + context.environment_summary.lower(), + ] + ) + lexical_hits = sum(1 for token in candidate.triggers + candidate.tags if token.lower() in text) + base = candidate.confidence + candidate.success_rate + candidate.freshness + penalty = candidate.cost + candidate.risk + return base + (0.2 * lexical_hits) - penalty diff --git a/src/memabra/reward.py b/src/memabra/reward.py new file mode 100644 index 0000000..7fd2fd2 --- /dev/null +++ b/src/memabra/reward.py @@ -0,0 +1,22 @@ +from .telemetry import RewardBreakdown + + +def compute_reward( + *, + task_success: float, + retrieval_hit: float, + tool_error: float, + user_correction: float, + latency: float, + context_cost: float, + useful_reuse: float, +) -> RewardBreakdown: + return RewardBreakdown( + task_success=task_success, + retrieval_hit=retrieval_hit, + tool_error=tool_error, + user_correction=user_correction, + latency=latency, + context_cost=context_cost, + useful_reuse=useful_reuse, + ) diff --git a/src/memabra/router.py b/src/memabra/router.py new file mode 100644 index 0000000..e730c97 --- /dev/null +++ b/src/memabra/router.py @@ -0,0 +1,337 @@ +from dataclasses import dataclass, field +from typing import Any, Iterable, Protocol, runtime_checkable + +from .candidate_types import CandidateObject, DecisionType +from .dataset import TrainingSample + + +@runtime_checkable +class RouterProtocol(Protocol): + def choose( + self, + context: "TaskContext", + memory_candidates: Iterable[CandidateObject], + skill_candidates: Iterable[CandidateObject], + tool_candidates: 
Iterable[CandidateObject], + ) -> "RouteDecision": + ... + + +@dataclass(slots=True) +class RouteDecision: + decision_type: DecisionType + selected_ids: list[str] = field(default_factory=list) + selected_payloads: list[dict[str, Any]] = field(default_factory=list) + rationale: str = "" + estimated_cost: float = 0.0 + score_breakdown: dict[str, float] = field(default_factory=dict) + composite_steps: list["RouteDecision"] = field(default_factory=list) + + +@dataclass(slots=True) +class TaskContext: + user_input: str + conversation_summary: str = "" + environment_summary: str = "" + recent_failures: list[str] = field(default_factory=list) + + +class RuleBasedRouter: + """Baseline placeholder router for Phase 1. + + The initial implementation is intentionally simple: + - prefer direct answer for low-ambiguity, no-tool tasks + - prefer memory when user/environment facts appear relevant + - prefer skill when a reusable procedure is clearly triggered + - prefer tool when current state or side effects must be observed + """ + + def choose( + self, + context: TaskContext, + memory_candidates: Iterable[CandidateObject], + skill_candidates: Iterable[CandidateObject], + tool_candidates: Iterable[CandidateObject], + ) -> RouteDecision: + text = context.user_input.lower() + if any(token in text for token in ["why", "think", "design", "name"]): + return RouteDecision( + decision_type="direct_answer", + rationale="Looks like a reasoning-first task with no strong tool trigger.", + ) + + tool_matches = [c for c in tool_candidates if c.confidence >= 0.6 and c.risk <= 0.7] + if any(token in text for token in ["check", "run", "open", "current", "list", "time"]): + if tool_matches: + best = sorted(tool_matches, key=lambda c: (c.confidence + c.success_rate - c.cost), reverse=True)[0] + return RouteDecision( + decision_type="call_tool", + selected_ids=[best.id], + selected_payloads=[dict(best.type_payload)], + rationale="Task asks for current state or external action; tool use is 
justified.", + estimated_cost=best.cost, + ) + + memory_matches = [c for c in memory_candidates if c.confidence >= 0.65 and c.freshness >= 0.3] + if any(token in text for token in ["prefer", "remember", "usually", "my", "our"]): + if memory_matches: + best = sorted(memory_matches, key=lambda c: (c.confidence + c.freshness + c.success_rate), reverse=True)[0] + return RouteDecision( + decision_type="inject_memory", + selected_ids=[best.id], + selected_payloads=[dict(best.type_payload)], + rationale="Task likely depends on stable user/project facts.", + estimated_cost=best.cost, + ) + + skill_matches = [c for c in skill_candidates if c.confidence >= 0.55 and c.success_rate >= 0.4] + if any(token in text for token in ["fix", "deploy", "review", "setup", "workflow"]): + if skill_matches: + best = sorted(skill_matches, key=lambda c: (c.success_rate + c.confidence - c.cost), reverse=True)[0] + return RouteDecision( + decision_type="load_skill", + selected_ids=[best.id], + selected_payloads=[dict(best.type_payload)], + rationale="Task resembles a reusable procedure; load a skill before action.", + estimated_cost=best.cost, + ) + + return RouteDecision( + decision_type="clarify", + rationale="No high-confidence route found from the current heuristic baseline.", + ) + + +class FeatureScoringRouter: + """Router v2 with explicit feature scoring, failure penalties, and composite action preconditions.""" + + def choose( + self, + context: TaskContext, + memory_candidates: Iterable[CandidateObject], + skill_candidates: Iterable[CandidateObject], + tool_candidates: Iterable[CandidateObject], + ) -> RouteDecision: + scored: list[tuple[CandidateObject, str, float]] = [] + breakdown: dict[str, float] = {} + + for c in memory_candidates: + score = self._score(c, "memory", context) + scored.append((c, "memory", score)) + breakdown[c.id] = score + + for c in skill_candidates: + score = self._score(c, "skill", context) + scored.append((c, "skill", score)) + breakdown[c.id] = score + + 
for c in tool_candidates: + score = self._score(c, "tool", context) + scored.append((c, "tool", score)) + breakdown[c.id] = score + + filtered = [item for item in scored if self._passes_threshold(item[0], item[1])] + + if not filtered: + return RouteDecision( + decision_type="clarify", + rationale="No high-confidence route found from feature scoring.", + score_breakdown=breakdown, + ) + + best_candidate, best_type, best_score = max(filtered, key=lambda x: x[2]) + + if best_candidate.preconditions: + composite_steps: list[RouteDecision] = [] + for precondition in best_candidate.preconditions: + pre_candidate = self._find_best_precondition(precondition, scored) + if pre_candidate is not None: + pre_type = precondition + composite_steps.append( + RouteDecision( + decision_type=self._decision_type_for_candidate_type(pre_type), + selected_ids=[pre_candidate.id], + selected_payloads=[dict(pre_candidate.type_payload)], + rationale=f"Satisfy precondition for {best_candidate.id}.", + estimated_cost=pre_candidate.cost, + ) + ) + if composite_steps: + composite_steps.append( + RouteDecision( + decision_type=self._decision_type_for_candidate_type(best_type), + selected_ids=[best_candidate.id], + selected_payloads=[dict(best_candidate.type_payload)], + rationale=f"Best {best_type} candidate after feature scoring.", + estimated_cost=best_candidate.cost, + score_breakdown={best_candidate.id: best_score}, + ) + ) + return RouteDecision( + decision_type="composite_action", + rationale=f"Composite action required for {best_candidate.id}.", + composite_steps=composite_steps, + score_breakdown=breakdown, + ) + + return RouteDecision( + decision_type=self._decision_type_for_candidate_type(best_type), + selected_ids=[best_candidate.id], + selected_payloads=[dict(best_candidate.type_payload)], + rationale=f"Best {best_type} candidate after feature scoring.", + estimated_cost=best_candidate.cost, + score_breakdown=breakdown, + ) + + def _score(self, candidate: CandidateObject, 
candidate_type: str, context: TaskContext) -> float: + if candidate_type == "memory": + score = ( + candidate.confidence * 0.35 + + candidate.freshness * 0.25 + + candidate.success_rate * 0.25 + - candidate.cost * 0.1 + - candidate.risk * 0.05 + ) + elif candidate_type == "skill": + score = ( + candidate.confidence * 0.25 + + candidate.success_rate * 0.35 + - candidate.cost * 0.2 + - candidate.risk * 0.2 + ) + else: # tool + score = ( + candidate.confidence * 0.3 + + candidate.success_rate * 0.3 + - candidate.cost * 0.1 + - candidate.risk * 0.3 + ) + if candidate.id in context.recent_failures: + score -= 0.5 + return round(score, 4) + + def _passes_threshold(self, candidate: CandidateObject, candidate_type: str) -> bool: + if candidate_type == "memory": + return candidate.confidence >= 0.65 and candidate.freshness >= 0.3 + if candidate_type == "skill": + return candidate.confidence >= 0.55 and candidate.success_rate >= 0.4 + if candidate_type == "tool": + return candidate.confidence >= 0.6 and candidate.risk <= 0.7 + return True + + def _decision_type_for_candidate_type(self, candidate_type: str) -> DecisionType: + if candidate_type == "memory": + return "inject_memory" + if candidate_type == "skill": + return "load_skill" + if candidate_type == "tool": + return "call_tool" + return "clarify" + + def _find_best_precondition( + self, + precondition: str, + scored: list[tuple[CandidateObject, str, float]], + ) -> CandidateObject | None: + matches = [ + item for item in scored if item[1] == precondition and self._passes_threshold(item[0], precondition) + ] + if not matches: + return None + best, _, _ = max(matches, key=lambda x: x[2]) + return best + + +def _extract_features( + context: TaskContext, + memory_candidates: Iterable[CandidateObject], + skill_candidates: Iterable[CandidateObject], + tool_candidates: Iterable[CandidateObject], +) -> dict[str, float]: + memory = list(memory_candidates) + skill = list(skill_candidates) + tool = list(tool_candidates) + return 
{ + "input_length": float(len(context.user_input)), + "memory_count": float(len(memory)), + "skill_count": float(len(skill)), + "tool_count": float(len(tool)), + "top_memory_confidence": max((c.confidence for c in memory), default=0.0), + "top_skill_success_rate": max((c.success_rate for c in skill), default=0.0), + "top_tool_confidence": max((c.confidence for c in tool), default=0.0), + "top_tool_risk": max((c.risk for c in tool), default=0.0), + } + + +class SimpleLearningRouter: + """Lightweight learning router that trains reward-weighted feature vectors per decision type.""" + + def __init__(self) -> None: + self._weights: dict[str, dict[str, float]] = {} + self._feature_keys: list[str] = [] + + def fit(self, samples: list[TrainingSample]) -> None: + from collections import defaultdict + + sums: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float)) + counts: dict[str, float] = defaultdict(float) + for sample in samples: + label = sample.label + reward = sample.reward + for key, value in sample.features.items(): + sums[label][key] += value * reward + counts[label] += reward + if not self._feature_keys: + self._feature_keys = list(sample.features.keys()) + + self._weights = {} + for label, feature_sums in sums.items(): + total_reward = counts[label] + if total_reward == 0: + continue + self._weights[label] = {k: v / total_reward for k, v in feature_sums.items()} + + def choose( + self, + context: TaskContext, + memory_candidates: Iterable[CandidateObject], + skill_candidates: Iterable[CandidateObject], + tool_candidates: Iterable[CandidateObject], + ) -> RouteDecision: + features = _extract_features(context, memory_candidates, skill_candidates, tool_candidates) + if not self._weights: + return RouteDecision( + decision_type="clarify", + rationale="Learning router has not been trained yet.", + ) + + best_label: str | None = None + best_score = float("-inf") + for label, weights in self._weights.items(): + score = sum(features.get(k, 0.0) * w for k, 
w in weights.items()) + if score > best_score: + best_score = score + best_label = label + + assert best_label is not None + selected_ids: list[str] = [] + selected_payloads: list[dict[str, Any]] = [] + if best_label == "inject_memory" and memory_candidates: + best = max(memory_candidates, key=lambda c: c.confidence) + selected_ids = [best.id] + selected_payloads = [dict(best.type_payload)] + elif best_label == "load_skill" and skill_candidates: + best = max(skill_candidates, key=lambda c: c.success_rate) + selected_ids = [best.id] + selected_payloads = [dict(best.type_payload)] + elif best_label == "call_tool" and tool_candidates: + best = max(tool_candidates, key=lambda c: c.confidence - c.risk) + selected_ids = [best.id] + selected_payloads = [dict(best.type_payload)] + + return RouteDecision( + decision_type=best_label, + selected_ids=selected_ids, + selected_payloads=selected_payloads, + rationale=f"Predicted by learning router (score={round(best_score, 4)}).", + ) diff --git a/src/memabra/router_versioning.py b/src/memabra/router_versioning.py new file mode 100644 index 0000000..11cb968 --- /dev/null +++ b/src/memabra/router_versioning.py @@ -0,0 +1,97 @@ +from __future__ import annotations + +import json +from dataclasses import dataclass, field +from datetime import datetime, timezone +from pathlib import Path +from typing import Any + +from .router import SimpleLearningRouter + + +@dataclass +class RouterVersionStore: + base_dir: str | Path = field(default="docs/projects/memabra/router-versions") + + def __post_init__(self): + self._base = Path(self.base_dir) + self._versions_dir = self._base / "versions" + self._versions_dir.mkdir(parents=True, exist_ok=True) + self._current_file = self._base / "current.json" + + def save( + self, + router: SimpleLearningRouter, + version_id: str | None = None, + metadata: dict[str, Any] | None = None, + ) -> dict[str, Any]: + if version_id is None: + version_id = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S") + + 
version_path = self._versions_dir / f"{version_id}.json" + record = { + "version_id": version_id, + "weights": router._weights, + "feature_keys": router._feature_keys, + "metadata": metadata or {}, + } + version_path.write_text(json.dumps(record, indent=2), encoding="utf-8") + + prior = self._read_current() + prior_version_id = prior.get("current_version_id") + + current_record = { + "current_version_id": version_id, + "promotion_source": (metadata or {}).get("promotion_source"), + "benchmark_summary": (metadata or {}).get("benchmark_summary"), + "prior_version_id": prior_version_id, + "saved_at": datetime.now(timezone.utc).isoformat(), + } + self._current_file.write_text(json.dumps(current_record, indent=2), encoding="utf-8") + return record + + def load(self, version_id: str | None = None) -> SimpleLearningRouter: + if version_id is None: + current = self._read_current() + version_id = current.get("current_version_id") + if version_id is None: + raise ValueError("No version_id provided and no current version set.") + + version_path = self._versions_dir / f"{version_id}.json" + record = json.loads(version_path.read_text(encoding="utf-8")) + router = SimpleLearningRouter() + router._weights = record.get("weights", {}) + router._feature_keys = record.get("feature_keys", []) + return router + + def list_versions(self) -> list[dict[str, Any]]: + versions = [] + for path in sorted(self._versions_dir.glob("*.json")): + record = json.loads(path.read_text(encoding="utf-8")) + versions.append({ + "version_id": record.get("version_id"), + "metadata": record.get("metadata", {}), + }) + return versions + + def rollback(self, version_id: str) -> dict[str, Any]: + version_path = self._versions_dir / f"{version_id}.json" + if not version_path.exists(): + raise ValueError(f"Version '{version_id}' not found.") + prior = self._read_current() + current_record = { + "current_version_id": version_id, + "rollback_from": prior.get("current_version_id"), + "rolled_back_at": 
datetime.now(timezone.utc).isoformat(), + "prior_version_id": prior.get("prior_version_id"), + } + self._current_file.write_text(json.dumps(current_record, indent=2), encoding="utf-8") + return {"current_version_id": version_id} + + def get_current(self) -> dict[str, Any]: + return self._read_current() + + def _read_current(self) -> dict[str, Any]: + if not self._current_file.exists(): + return {} + return json.loads(self._current_file.read_text(encoding="utf-8")) diff --git a/src/memabra/runner.py b/src/memabra/runner.py new file mode 100644 index 0000000..0bf8dd4 --- /dev/null +++ b/src/memabra/runner.py @@ -0,0 +1,237 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from datetime import UTC, datetime +from typing import Any +from uuid import uuid4 + +from .candidate_types import CandidateObject +from .case_index import CaseIndex +from .execution import ExecutionEngine +from .memory_store import InMemoryMemoryStore, MemoryRecord, MemorySource +from .outcome import OutcomeEngine, RewardEngine +from .persistence import PersistenceStore +from .replay import TrajectoryReplay +from .retrieval import CandidateRetriever, RetrievalResult +from .router import RouteDecision, RuleBasedRouter, TaskContext +from .telemetry import Event, RewardBreakdown +from .trajectory_summary import TrajectorySummarizer + + +@dataclass(slots=True) +class MemabraRunner: + retriever: CandidateRetriever + router: RuleBasedRouter + execution_engine: ExecutionEngine | None = None + persistence_store: PersistenceStore | None = None + memory_store: InMemoryMemoryStore | None = None + case_index: CaseIndex | None = None + outcome_engine: OutcomeEngine = field(default_factory=OutcomeEngine) + reward_engine: RewardEngine = field(default_factory=RewardEngine) + + def run( + self, + *, + context: TaskContext, + channel: str = "local", + user_id: str | None = None, + top_k: int = 3, + persist: bool = False, + ) -> dict[str, Any]: + trajectory_id = f"traj-{uuid4()}" + 
task_id = f"task-{uuid4()}" + started_at = datetime.now(UTC).isoformat() + + retrieval_result = self.retriever.retrieve(context, top_k=top_k) + if self.case_index is not None: + best_trajectory_id = self.case_index.best(context.user_input) + if best_trajectory_id is not None: + summary = f"Previous successful trajectory: {best_trajectory_id}" + if self.persistence_store is not None: + try: + past_trajectory = self.persistence_store.load_trajectory(best_trajectory_id) + summary = TrajectorySummarizer().summarize(past_trajectory) + except Exception: + pass + episodic_candidate = CandidateObject( + id=f"episodic-{best_trajectory_id}", + type="memory", + title="Episodic case", + summary=summary, + triggers=["episodic"], + confidence=0.95, + success_rate=0.95, + freshness=1.0, + tags=["episodic"], + source="case_index", + ) + retrieval_result.memory.insert(0, episodic_candidate) + decision = self.router.choose( + context, + retrieval_result.memory, + retrieval_result.skill, + retrieval_result.tool, + ) + events = self._build_events(trajectory_id, context, retrieval_result, decision) + execution_result = None + if self.execution_engine is not None: + execution_result = self.execution_engine.execute(decision, context, trajectory_id) + events.extend(execution_result.events) + self._write_back_memory(decision, context, execution_result) + outcome = self.outcome_engine.build_outcome(decision, execution_result) + reward = self.reward_engine.compute( + decision, + outcome, + execution_result=execution_result, + retrieval_result=retrieval_result, + ) + outcome_dict = { + "status": outcome.status, + "steps": outcome.steps, + "latency_ms": outcome.latency_ms, + "user_corrections": outcome.user_corrections, + "tool_errors": outcome.tool_errors, + "notes": outcome.notes, + } + + trajectory = { + "trajectory_id": trajectory_id, + "task": { + "task_id": task_id, + "input": context.user_input, + "channel": channel, + "created_at": started_at, + "user_id": user_id, + }, + 
"context_snapshot": { + "conversation_summary": context.conversation_summary, + "environment_summary": context.environment_summary, + "recent_failures": list(context.recent_failures), + }, + "candidate_sets": { + "memory": [self._candidate_to_dict(candidate) for candidate in retrieval_result.memory], + "skill": [self._candidate_to_dict(candidate) for candidate in retrieval_result.skill], + "tool": [self._candidate_to_dict(candidate) for candidate in retrieval_result.tool], + }, + "decisions": [self._decision_to_dict(decision)], + "events": [self._event_to_dict(event) for event in events], + "outcome": outcome_dict, + "reward": { + "total": reward.total, + "components": { + "task_success": reward.task_success, + "retrieval_hit": reward.retrieval_hit, + "tool_error": reward.tool_error, + "user_correction": reward.user_correction, + "latency": reward.latency, + "context_cost": reward.context_cost, + "useful_reuse": reward.useful_reuse, + }, + }, + } + if persist and self.persistence_store is not None: + self.persistence_store.save_trajectory(trajectory) + return trajectory + + def summarize_runs(self, trajectories: list[dict[str, Any]]): + replay = TrajectoryReplay() + return replay.summarize(trajectories) + + def _write_back_memory(self, decision: RouteDecision, context: TaskContext, execution_result) -> None: + if self.memory_store is None: + return + if decision.decision_type == "inject_memory": + for record_id in decision.selected_ids: + if self.memory_store.get(record_id) is None: + self.memory_store.upsert( + MemoryRecord( + id=record_id, + memory_type="semantic", + fact_status="assumed", + content=context.user_input, + summary=f"Writeback placeholder for {record_id}", + source=MemorySource(kind="system", ref="runner-writeback"), + confidence=0.5, + ) + ) + self.memory_store.mark_used(record_id) + + def _build_events( + self, + trajectory_id: str, + context: TaskContext, + retrieval_result: RetrievalResult, + decision: RouteDecision, + ) -> list[Event]: + 
task_event_id = f"evt-{uuid4()}" + retrieve_event_id = f"evt-{uuid4()}" + decision_event_id = f"evt-{uuid4()}" + return [ + Event( + event_id=task_event_id, + trajectory_id=trajectory_id, + stage="retrieval", + event_type="task_received", + payload={"input": context.user_input}, + ), + Event( + event_id=retrieve_event_id, + trajectory_id=trajectory_id, + stage="retrieval", + event_type="candidates_recalled", + parent_event_id=task_event_id, + payload={ + "memory_ids": [candidate.id for candidate in retrieval_result.memory], + "skill_ids": [candidate.id for candidate in retrieval_result.skill], + "tool_ids": [candidate.id for candidate in retrieval_result.tool], + }, + ), + Event( + event_id=decision_event_id, + trajectory_id=trajectory_id, + stage="policy", + event_type="action_selected", + parent_event_id=retrieve_event_id, + payload=self._decision_to_dict(decision), + ), + ] + + def _candidate_to_dict(self, candidate) -> dict[str, Any]: + return { + "id": candidate.id, + "type": candidate.type, + "title": candidate.title, + "summary": candidate.summary, + "triggers": list(candidate.triggers), + "cost": candidate.cost, + "confidence": candidate.confidence, + "success_rate": candidate.success_rate, + "freshness": candidate.freshness, + "risk": candidate.risk, + "tags": list(candidate.tags), + "source": candidate.source, + "type_payload": dict(candidate.type_payload), + } + + def _decision_to_dict(self, decision: RouteDecision) -> dict[str, Any]: + return { + "step": 1, + "decision_type": decision.decision_type, + "selected_ids": list(decision.selected_ids), + "selected_payloads": [dict(payload) for payload in decision.selected_payloads], + "rejected_ids": [], + "rationale": decision.rationale, + "estimated_cost": decision.estimated_cost, + } + + def _event_to_dict(self, event: Event) -> dict[str, Any]: + return { + "event_id": event.event_id, + "trajectory_id": event.trajectory_id, + "timestamp": event.timestamp, + "stage": event.stage, + "event_type": 
event.event_type, + "payload": event.payload, + "metrics": event.metrics, + "parent_event_id": event.parent_event_id, + } diff --git a/src/memabra/schemas.py b/src/memabra/schemas.py new file mode 100644 index 0000000..bb36ff4 --- /dev/null +++ b/src/memabra/schemas.py @@ -0,0 +1,44 @@ +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + + +class SchemaValidationError(ValueError): + pass + + +class SchemaRegistry: + def __init__(self, schema_dir: str | Path = "docs/projects/memabra/schemas"): + self.schema_dir = Path(schema_dir) + + def load_schema(self, name: str) -> dict[str, Any]: + path = self.schema_dir / name + with path.open("r", encoding="utf-8") as handle: + return json.load(handle) + + def validate_trajectory(self, document: dict[str, Any]) -> None: + self._require_keys(document, ["trajectory_id", "task", "context_snapshot", "candidate_sets", "decisions", "events", "outcome", "reward"]) + self._require_keys(document["task"], ["task_id", "input", "channel", "created_at"]) + self._require_keys(document["context_snapshot"], ["conversation_summary", "environment_summary"]) + self._require_keys(document["candidate_sets"], ["memory", "skill", "tool"]) + self._require_keys(document["outcome"], ["status", "steps", "latency_ms", "user_corrections"]) + self._require_keys(document["reward"], ["total", "components"]) + self._require_keys( + document["reward"]["components"], + ["task_success", "retrieval_hit", "tool_error", "user_correction", "latency", "context_cost", "useful_reuse"], + ) + + def validate_memory_record(self, document: dict[str, Any]) -> None: + self._require_keys( + document, + ["id", "memory_type", "fact_status", "content", "summary", "source", "confidence", "created_at", "updated_at", "verification"], + ) + self._require_keys(document["source"], ["kind", "ref"]) + self._require_keys(document["verification"], ["status", "last_checked_at", "check_method"]) + + def _require_keys(self, document: 
dict[str, Any], keys: list[str]) -> None: + missing = [key for key in keys if key not in document] + if missing: + raise SchemaValidationError(f"Missing required keys: {', '.join(missing)}") diff --git a/src/memabra/telemetry.py b/src/memabra/telemetry.py new file mode 100644 index 0000000..ef7abda --- /dev/null +++ b/src/memabra/telemetry.py @@ -0,0 +1,38 @@ +from dataclasses import dataclass, field +from datetime import datetime, UTC +from typing import Any + + +@dataclass(slots=True) +class Event: + event_id: str + trajectory_id: str + stage: str + event_type: str + payload: dict[str, Any] + metrics: dict[str, Any] = field(default_factory=dict) + parent_event_id: str | None = None + timestamp: str = field(default_factory=lambda: datetime.now(UTC).isoformat()) + + +@dataclass(slots=True) +class RewardBreakdown: + task_success: float = 0.0 + retrieval_hit: float = 0.0 + tool_error: float = 0.0 + user_correction: float = 0.0 + latency: float = 0.0 + context_cost: float = 0.0 + useful_reuse: float = 0.0 + + @property + def total(self) -> float: + return ( + self.task_success + + self.retrieval_hit + - self.tool_error + - self.user_correction + - self.latency + - self.context_cost + + self.useful_reuse + ) diff --git a/src/memabra/training_reports.py b/src/memabra/training_reports.py new file mode 100644 index 0000000..c1d06f3 --- /dev/null +++ b/src/memabra/training_reports.py @@ -0,0 +1,75 @@ +from __future__ import annotations + +import json +from dataclasses import dataclass, field +from datetime import datetime, timezone +from pathlib import Path +from typing import Any +from uuid import uuid4 + +from .evaluator import EvaluationResult +from .promotion import PromotionDecision + + +def build_report( + *, + source_trajectory_ids: list[str], + baseline: EvaluationResult, + challenger: EvaluationResult, + decision: PromotionDecision, + promoted_version_id: str | None = None, + baseline_version_id: str | None = None, +) -> dict[str, Any]: + return { + "report_id": 
f"report-{uuid4()}", + "timestamp": datetime.now(timezone.utc).isoformat(), + "source_trajectory_ids": source_trajectory_ids, + "sample_count": len(source_trajectory_ids), + "baseline_metrics": { + "task_count": baseline.task_count, + "avg_reward": baseline.avg_reward, + "error_rate": baseline.error_rate, + "avg_latency_ms": baseline.avg_latency_ms, + }, + "challenger_metrics": { + "task_count": challenger.task_count, + "avg_reward": challenger.avg_reward, + "error_rate": challenger.error_rate, + "avg_latency_ms": challenger.avg_latency_ms, + }, + "promotion_decision": { + "accepted": decision.accepted, + "reasons": decision.reasons, + "metrics": decision.metrics, + }, + "promoted_version_id": promoted_version_id, + "baseline_version_id": baseline_version_id, + } + + +@dataclass +class TrainingReportStore: + base_dir: str | Path = field(default="docs/projects/memabra/training-reports") + + def __post_init__(self): + self._base = Path(self.base_dir) + self._base.mkdir(parents=True, exist_ok=True) + + def save(self, report: dict[str, Any]) -> dict[str, Any]: + report_id = report["report_id"] + path = self._base / f"{report_id}.json" + path.write_text(json.dumps(report, indent=2), encoding="utf-8") + return {"report_id": report_id, "path": str(path)} + + def list_reports(self) -> list[dict[str, Any]]: + reports = [] + for path in sorted(self._base.glob("*.json")): + record = json.loads(path.read_text(encoding="utf-8")) + reports.append(record) + return reports + + def get_report(self, report_id: str) -> dict[str, Any] | None: + path = self._base / f"{report_id}.json" + if not path.exists(): + return None + return json.loads(path.read_text(encoding="utf-8")) diff --git a/src/memabra/trajectory_summary.py b/src/memabra/trajectory_summary.py new file mode 100644 index 0000000..0a67a75 --- /dev/null +++ b/src/memabra/trajectory_summary.py @@ -0,0 +1,35 @@ +from __future__ import annotations + +from typing import Any + + +class TrajectorySummarizer: + def summarize(self, 
trajectory: dict[str, Any]) -> str: + task_input = "" + if "task" in trajectory and isinstance(trajectory["task"], dict): + task_input = trajectory["task"].get("input", "") + if len(task_input) > 60: + task_input = task_input[:57] + "..." + + decisions = trajectory.get("decisions", []) + action_types = [d.get("decision_type", "unknown") for d in decisions] if isinstance(decisions, list) else [] + action_str = " -> ".join(action_types) if action_types else "none" + + outcome = trajectory.get("outcome", {}) if isinstance(trajectory.get("outcome"), dict) else {} + status = outcome.get("status", "unknown") + reward = trajectory.get("reward", {}).get("total", 0.0) if isinstance(trajectory.get("reward"), dict) else 0.0 + steps = outcome.get("steps", 0) + tool_errors = outcome.get("tool_errors", 0) + user_corrections = outcome.get("user_corrections", 0) + + parts = [ + f"Task: '{task_input}'", + f"Actions: {action_str}", + f"Outcome: {status} (reward={reward}, steps={steps})", + ] + if tool_errors: + parts.append(f"Tool errors: {tool_errors}") + if user_corrections: + parts.append(f"User corrections: {user_corrections}") + + return " | ".join(parts) diff --git a/tests/test_app.py b/tests/test_app.py new file mode 100644 index 0000000..69b063f --- /dev/null +++ b/tests/test_app.py @@ -0,0 +1,197 @@ +from pathlib import Path + +from memabra.app import MemabraApp, build_app_with_skills, build_demo_app + + +def test_build_demo_app_runs_task_and_produces_summary(tmp_path: Path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + + trajectory = app.run_task("Use my telegram preference for this answer.", channel="telegram", user_id="oza") + summary = app.replay_summary() + + assert trajectory["trajectory_id"].startswith("traj-") + assert summary.trajectories == 1 + assert any(event["event_type"] == "memory_injected" for event in trajectory["events"]) + assert len(list((tmp_path / "demo-artifacts" / "trajectories").glob("*.json"))) == 1 + + +def 
test_app_can_run_tool_task_with_demo_backend(tmp_path: Path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + + trajectory = app.run_task("Check the current system status.") + + assert trajectory["decisions"][0]["decision_type"] == "call_tool" + assert any(event["event_type"] == "tool_result" for event in trajectory["events"]) + assert trajectory["outcome"]["status"] == "success" + + +def test_build_app_with_skills_loads_real_skill_from_filesystem(tmp_path: Path): + skill_dir = tmp_path / "skills" / "github-auth" + skill_dir.mkdir(parents=True) + (skill_dir / "SKILL.md").write_text( + "---\n" + "name: github-auth\n" + "description: Authenticate with GitHub.\n" + "---\n\n" + "# GitHub Auth\n\n" + "Use git or gh.\n" + ) + + app = build_app_with_skills(base_dir=tmp_path / "artifacts", skill_search_paths=[tmp_path / "skills"]) + + # github-auth is not in the candidate set by default, so router won't trigger it. + # We test that the app builds and a memory task still works. + trajectory = app.run_task("Use my telegram preference for this answer.", channel="telegram", user_id="oza") + assert trajectory["decisions"][0]["decision_type"] == "inject_memory" + + # Now verify the skill backend is actually wired by loading directly + backend = app.runner.execution_engine.skill_executor.backend + payload = backend.load_skill("github-auth") + assert payload["name"] == "github-auth" + assert "Use git or gh." 
in payload["content"] + + +def test_app_artifact_index_queries_persisted_trajectories(tmp_path: Path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + + app.run_task("Use my telegram preference for this answer.", channel="telegram", user_id="u1") + app.run_task("Check the current system status.", channel="local", user_id="u2") + + index = app.artifact_index() + telegram_trajs = index.query(channel="telegram") + tool_trajs = index.query(decision_type="call_tool") + + assert len(telegram_trajs) == 1 + assert telegram_trajs[0]["task"]["input"] == "Use my telegram preference for this answer." + assert len(tool_trajs) == 1 + assert tool_trajs[0]["task"]["input"] == "Check the current system status." + + slice_ids = index.slice_dataset(channel="local") + assert len(slice_ids) == 1 + + +def test_app_run_online_learning_cycle_returns_report(tmp_path: Path): + from memabra.benchmarks import BenchmarkTask + from memabra.promotion import PromotionPolicy + + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + # Seed trajectories + for i in range(10): + app.run_task(f"Task {i}") + + result = app.run_online_learning_cycle( + policy=PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ), + benchmark_tasks=[BenchmarkTask(user_input="Task 0")], + min_new_trajectories=1, + ) + + assert "skipped" in result + assert "promoted" in result or result["skipped"] is True + assert "report_id" in result + + +def test_app_run_online_learning_cycle_uses_baseline_version(tmp_path: Path): + from memabra.benchmarks import BenchmarkTask + from memabra.promotion import PromotionPolicy + from memabra.router import SimpleLearningRouter + from memabra.router_versioning import RouterVersionStore + + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + for i in range(10): + app.run_task(f"Task {i}") + + # Save a baseline version + baseline_router = SimpleLearningRouter() + 
baseline_router._weights = {"call_tool": {"input_length": 0.99}} + baseline_router._feature_keys = ["input_length"] + version_dir = tmp_path / "versions" + store = RouterVersionStore(base_dir=version_dir) + store.save(baseline_router, version_id="v-baseline") + + # Change current router + app.set_router(SimpleLearningRouter()) + + result = app.run_online_learning_cycle( + policy=PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ), + benchmark_tasks=[BenchmarkTask(user_input="Task 0")], + min_new_trajectories=1, + version_store_base_dir=version_dir, + baseline_version_id="v-baseline", + ) + + assert result["skipped"] is False + assert "baseline_metrics" in result + assert "challenger_metrics" in result + + +def test_app_run_online_learning_cycle_rebuilds_case_index(tmp_path: Path): + from memabra.benchmarks import BenchmarkTask + from memabra.promotion import PromotionPolicy + + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + for i in range(10): + app.run_task(f"Task {i}") + + case_index_path = tmp_path / "case-index.json" + result = app.run_online_learning_cycle( + policy=PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ), + benchmark_tasks=[BenchmarkTask(user_input="Task 0")], + min_new_trajectories=1, + case_index_path=case_index_path, + ) + + assert result["skipped"] is False + assert case_index_path.exists() + from memabra.case_index import CaseIndex + + index = CaseIndex.load(case_index_path) + assert index.best("Task 0") is not None + + +def test_app_build_case_index_from_trajectories(tmp_path: Path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + app.run_task("Hello world", channel="local", user_id="u1") + app.run_task("Hello world", channel="local", user_id="u2") + + case_index = app.build_case_index() + + assert case_index.best("Hello world") is not None + + 
+def test_app_save_and_load_case_index(tmp_path: Path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + app.run_task("Persist this case", channel="local", user_id="u1") + + case_index_path = tmp_path / "case-index.json" + app.build_case_index() + app.save_case_index(case_index_path) + loaded_app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + loaded_app.load_case_index(case_index_path) + + assert loaded_app.case_index is not None + assert loaded_app.case_index.best("Persist this case") is not None + + +def test_app_best_trajectory_for_input(tmp_path: Path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + trajectory = app.run_task("Find the best trajectory", channel="local", user_id="u1") + + app.build_case_index() + best_id = app.best_trajectory_for("Find the best trajectory") + + assert best_id == trajectory["trajectory_id"] diff --git a/tests/test_artifact_index.py b/tests/test_artifact_index.py new file mode 100644 index 0000000..e62495a --- /dev/null +++ b/tests/test_artifact_index.py @@ -0,0 +1,169 @@ +from pathlib import Path + +from memabra.persistence import PersistenceStore +from memabra.artifact_index import ArtifactIndex + + +def _make_trajectory( + trajectory_id: str, + *, + status: str = "success", + decision_type: str = "direct_answer", + channel: str = "local", + reward_total: float = 1.0, + latency_ms: int = 100, + tool_errors: int = 0, + user_corrections: int = 0, + input_text: str = "Hello", + created_at: str = "2026-01-15T10:00:00Z", +): + return { + "trajectory_id": trajectory_id, + "task": { + "task_id": f"task-{trajectory_id}", + "input": input_text, + "channel": channel, + "created_at": created_at, + "user_id": None, + }, + "context_snapshot": {"conversation_summary": "", "environment_summary": "", "recent_failures": []}, + "candidate_sets": {"memory": [], "skill": [], "tool": []}, + "decisions": [ + { + "step": 1, + "decision_type": decision_type, + "selected_ids": [], + "selected_payloads": [], + 
"rejected_ids": [], + "rationale": "", + "estimated_cost": 0.0, + } + ], + "events": [], + "outcome": { + "status": status, + "steps": 1, + "latency_ms": latency_ms, + "user_corrections": user_corrections, + "tool_errors": tool_errors, + "notes": None, + }, + "reward": { + "total": reward_total, + "components": { + "task_success": 1.0 if status == "success" else 0.0, + "retrieval_hit": 0.0, + "tool_error": 0.1 * tool_errors, + "user_correction": 0.1 * user_corrections, + "latency": 0.0, + "context_cost": 0.0, + "useful_reuse": 0.0, + }, + }, + } + + +def test_artifact_index_lists_all_trajectories(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", status="success")) + persistence.save_trajectory(_make_trajectory("traj-2", status="failure")) + + index = ArtifactIndex(persistence_store=persistence) + results = index.query() + + assert len(results) == 2 + assert {r["trajectory_id"] for r in results} == {"traj-1", "traj-2"} + + +def test_artifact_index_filters_by_status(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", status="success")) + persistence.save_trajectory(_make_trajectory("traj-2", status="failure")) + persistence.save_trajectory(_make_trajectory("traj-3", status="partial_success")) + + index = ArtifactIndex(persistence_store=persistence) + successes = index.query(status="success") + failures = index.query(status="failure") + + assert len(successes) == 1 + assert successes[0]["trajectory_id"] == "traj-1" + assert len(failures) == 1 + assert failures[0]["trajectory_id"] == "traj-2" + + +def test_artifact_index_filters_by_reward_range(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", reward_total=0.9)) + persistence.save_trajectory(_make_trajectory("traj-2", reward_total=0.5)) + 
persistence.save_trajectory(_make_trajectory("traj-3", reward_total=-0.2)) + + index = ArtifactIndex(persistence_store=persistence) + high = index.query(min_reward=0.6) + low = index.query(max_reward=0.0) + + assert len(high) == 1 and high[0]["trajectory_id"] == "traj-1" + assert len(low) == 1 and low[0]["trajectory_id"] == "traj-3" + + +def test_artifact_index_filters_by_decision_type_and_channel(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", decision_type="direct_answer", channel="local")) + persistence.save_trajectory(_make_trajectory("traj-2", decision_type="call_tool", channel="telegram")) + + index = ArtifactIndex(persistence_store=persistence) + tools = index.query(decision_type="call_tool") + telegram = index.query(channel="telegram") + + assert len(tools) == 1 and tools[0]["trajectory_id"] == "traj-2" + assert len(telegram) == 1 and telegram[0]["trajectory_id"] == "traj-2" + + +def test_artifact_index_filters_by_tool_errors_and_user_corrections(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", tool_errors=0, user_corrections=0)) + persistence.save_trajectory(_make_trajectory("traj-2", tool_errors=2, user_corrections=1)) + + index = ArtifactIndex(persistence_store=persistence) + with_errors = index.query(min_tool_errors=1) + with_corrections = index.query(min_user_corrections=1) + + assert len(with_errors) == 1 and with_errors[0]["trajectory_id"] == "traj-2" + assert len(with_corrections) == 1 and with_corrections[0]["trajectory_id"] == "traj-2" + + +def test_artifact_index_filters_by_input_text(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", input_text="Deploy the service")) + persistence.save_trajectory(_make_trajectory("traj-2", input_text="Check status")) + + index = 
ArtifactIndex(persistence_store=persistence) + deploy = index.query(input_contains="deploy") + status = index.query(input_contains="STATUS") + + assert len(deploy) == 1 and deploy[0]["trajectory_id"] == "traj-1" + assert len(status) == 1 and status[0]["trajectory_id"] == "traj-2" + + +def test_artifact_index_slice_dataset_returns_ids(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1", status="success", reward_total=0.9)) + persistence.save_trajectory(_make_trajectory("traj-2", status="failure", reward_total=-0.1)) + persistence.save_trajectory(_make_trajectory("traj-3", status="success", reward_total=0.95)) + + index = ArtifactIndex(persistence_store=persistence) + slice_ids = index.slice_dataset(status="success", min_reward=0.8) + + assert slice_ids == ["traj-1", "traj-3"] + + +def test_artifact_index_refresh_picks_up_new_files(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory(_make_trajectory("traj-1")) + + index = ArtifactIndex(persistence_store=persistence) + assert len(index.query()) == 1 + + persistence.save_trajectory(_make_trajectory("traj-2")) + index.refresh() + + assert len(index.query()) == 2 diff --git a/tests/test_benchmarks.py b/tests/test_benchmarks.py new file mode 100644 index 0000000..fff3554 --- /dev/null +++ b/tests/test_benchmarks.py @@ -0,0 +1,38 @@ +from __future__ import annotations + +from memabra.benchmarks import BenchmarkSuite, BenchmarkTask, save_benchmark_suite, load_benchmark_suite, default_benchmark_suite + + +def test_benchmark_suite_roundtrip(tmp_path): + path = tmp_path / "suite.json" + suite = BenchmarkSuite( + name="test-suite", + tasks=[ + BenchmarkTask(user_input="Hello", channel="local", user_id="u1"), + BenchmarkTask(user_input="World", channel="telegram"), + ], + ) + + save_benchmark_suite(suite, path) + loaded = load_benchmark_suite(path) + + assert loaded.name == 
"test-suite" + assert len(loaded.tasks) == 2 + assert loaded.tasks[0].user_input == "Hello" + assert loaded.tasks[0].channel == "local" + assert loaded.tasks[0].user_id == "u1" + assert loaded.tasks[1].user_input == "World" + assert loaded.tasks[1].channel == "telegram" + assert loaded.tasks[1].user_id is None + + +def test_default_benchmark_suite_covers_expected_categories(): + suite = default_benchmark_suite() + + assert suite.name == "default" + assert len(suite.tasks) >= 4 + inputs = [t.user_input.lower() for t in suite.tasks] + assert any("memory" in i or "preference" in i for i in inputs) + assert any("skill" in i or "deploy" in i for i in inputs) + assert any("tool" in i or "status" in i for i in inputs) + assert any("composite" in i or "multiple" in i for i in inputs) diff --git a/tests/test_case_index.py b/tests/test_case_index.py new file mode 100644 index 0000000..e2d5a06 --- /dev/null +++ b/tests/test_case_index.py @@ -0,0 +1,50 @@ +from memabra.case_index import CaseIndex + + +def test_case_index_adds_and_retrieves_best_trajectory(): + index = CaseIndex() + trajectory = { + "trajectory_id": "traj-1", + "task": {"input": "Hello world"}, + "outcome": {"status": "success"}, + "reward": {"total": 1.0}, + } + index.add(trajectory) + assert index.best("Hello world") == "traj-1" + + +def test_case_index_returns_none_for_unknown_input(): + index = CaseIndex() + assert index.best("Unknown input") is None + + +def test_case_index_keeps_higher_reward_for_same_input(): + index = CaseIndex() + index.add({ + "trajectory_id": "traj-low", + "task": {"input": "Same input"}, + "outcome": {"status": "success"}, + "reward": {"total": 0.5}, + }) + index.add({ + "trajectory_id": "traj-high", + "task": {"input": "Same input"}, + "outcome": {"status": "success"}, + "reward": {"total": 1.5}, + }) + assert index.best("Same input") == "traj-high" + + +def test_case_index_save_and_round_trip(tmp_path): + index = CaseIndex() + index.add({ + "trajectory_id": "traj-save", + "task": 
{"input": "Persist me"}, + "outcome": {"status": "success"}, + "reward": {"total": 2.0}, + }) + path = tmp_path / "case_index.json" + index.save(path) + + loaded = CaseIndex.load(path) + assert loaded.best("Persist me") == "traj-save" diff --git a/tests/test_cli_workflow.py b/tests/test_cli_workflow.py new file mode 100644 index 0000000..a6d3d39 --- /dev/null +++ b/tests/test_cli_workflow.py @@ -0,0 +1,574 @@ +from pathlib import Path + +from memabra.cli import format_output, run_online_learning_workflow, run_wrapup_workflow + + +def test_run_wrapup_workflow_trains_evaluates_and_versions_router(tmp_path: Path): + result = run_wrapup_workflow(base_dir=tmp_path / "demo-artifacts") + + assert result["seed_summary"]["trajectories"] >= 3 + assert "baseline" in result["comparison"] + assert "challenger" in result["comparison"] + assert result["saved_version"]["version_id"] + assert (tmp_path / "demo-artifacts" / "router-versions" / "current.json").exists() + + +def test_run_online_learning_workflow_runs_cycle_and_returns_report(tmp_path: Path): + result = run_online_learning_workflow(base_dir=tmp_path / "demo-artifacts") + + assert "skipped" in result + assert "report_id" in result + # Since it seeds tasks, it should not skip + assert result["skipped"] is False + assert result["promoted"] is True + assert (tmp_path / "demo-artifacts" / "training-reports").exists() + + +def test_format_output_workflow_text_includes_decision_reason_and_dry_run(): + payload = { + "report_id": "report-123", + "skipped": False, + "promoted": False, + "dry_run": True, + "decision": { + "accepted": False, + "reasons": ["Reward delta too small", "Latency increased"], + "metrics": { + "reward_delta": -0.12, + "error_rate_delta": 0.02, + "latency_delta_ms": 12.5, + }, + }, + "baseline_metrics": { + "avg_reward": 1.0, + "error_rate": 0.1, + "avg_latency_ms": 120.0, + }, + "challenger_metrics": { + "avg_reward": 0.88, + "error_rate": 0.12, + "avg_latency_ms": 132.5, + }, + } + + rendered = 
format_output(payload, output_format="text", mode="workflow") + + assert "Memabra online learning result" in rendered + assert "Summary" in rendered + assert "Report ID: report-123" in rendered + assert "Skipped: no" in rendered + assert "Promoted: no" in rendered + assert "Dry run: yes" in rendered + assert "Baseline" in rendered + assert "Reward: 1.0000" in rendered + assert "Error rate: 0.1000" in rendered + assert "Latency (ms): 120.0000" in rendered + assert "Challenger" in rendered + assert "Reward: 0.8800" in rendered + assert "Deltas" in rendered + assert "Reward delta: -0.1200" in rendered + assert "Error rate delta: 0.0200" in rendered + assert "Latency delta (ms): 12.5000" in rendered + assert "Decision" in rendered + assert "Reason: Reward delta too small; Latency increased" in rendered + + +def test_format_output_workflow_text_includes_error_details(): + payload = { + "report_id": "report-err", + "skipped": False, + "promoted": False, + "error": "benchmark crashed", + } + + rendered = format_output(payload, output_format="text", mode="workflow") + + assert "Error: benchmark crashed" in rendered + + +def test_format_output_status_text_includes_latest_report_details(): + payload = { + "base_dir": "/tmp/demo-artifacts", + "current_version_id": "v2", + "version_count": 2, + "trajectory_count": 8, + "report_count": 3, + "latest_report": { + "report_id": "report-9", + "timestamp": "2026-04-15T06:00:00+00:00", + "promoted": True, + }, + } + + rendered = format_output(payload, output_format="text", mode="status") + + assert "Memabra status" in rendered + assert "Current version: v2" in rendered + assert "Latest report: report-9" in rendered + assert "Latest report time: 2026-04-15T06:00:00+00:00" in rendered + assert "Latest promotion accepted: yes" in rendered + + +def test_format_output_list_versions_text_marks_current_version(): + payload = { + "current_version_id": "v2", + "versions": [ + {"version_id": "v1", "metadata": {"source": "seed", "avg_reward": 
1.2}}, + {"version_id": "v2", "metadata": {"source": "online_learning", "avg_reward": 1.4}}, + ], + } + + rendered = format_output(payload, output_format="text", mode="list_versions") + + assert "Saved router versions (2 total)" in rendered + assert "Current version: v2" in rendered + assert "1. v1 (source=seed, avg_reward=1.2)" in rendered + assert "2. v2 (current, source=online_learning, avg_reward=1.4)" in rendered + + +def test_main_entrypoint_uses_online_learning_workflow(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, **kwargs): + calls.append({"base_dir": str(base_dir), "min_new_trajectories": min_new_trajectories, "seen_trajectory_store": seen_trajectory_store}) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main() + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["min_new_trajectories"] == 3 + + +def test_main_entrypoint_parses_base_dir_argument(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, **kwargs): + calls.append({"base_dir": str(base_dir) if base_dir else None, "min_new_trajectories": min_new_trajectories, "seen_trajectory_store": seen_trajectory_store}) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--base-dir", "/custom/path"]) + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["base_dir"] == "/custom/path" + + +def test_main_entrypoint_parses_min_new_trajectories_argument(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, 
**kwargs): + calls.append({"base_dir": str(base_dir) if base_dir else None, "min_new_trajectories": min_new_trajectories, "seen_trajectory_store": seen_trajectory_store}) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--min-new-trajectories", "10"]) + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["min_new_trajectories"] == 10 + + +def test_run_online_learning_workflow_skips_on_second_run_when_seen_store_provided(tmp_path: Path): + base_dir = tmp_path / "demo-artifacts" + seen_store = tmp_path / "seen.json" + + result1 = run_online_learning_workflow( + base_dir=base_dir, + min_new_trajectories=1, + seen_trajectory_store=seen_store, + ) + assert result1["skipped"] is False + + result2 = run_online_learning_workflow( + base_dir=base_dir, + min_new_trajectories=1, + seen_trajectory_store=seen_store, + ) + assert result2["skipped"] is True + assert "too few new trajectories" in result2["reason"].lower() + + +def test_main_entrypoint_passes_default_seen_trajectory_store(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, dry_run=False, **kwargs): + calls.append({ + "base_dir": str(base_dir) if base_dir else None, + "min_new_trajectories": min_new_trajectories, + "seen_trajectory_store": str(seen_trajectory_store) if seen_trajectory_store else None, + "dry_run": dry_run, + }) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main() + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["seen_trajectory_store"] is not None + assert "seen-trajectories.json" in calls[0]["seen_trajectory_store"] + assert calls[0]["dry_run"] is False + + +def 
test_main_entrypoint_passes_dry_run_flag(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, dry_run=False, **kwargs): + calls.append({ + "base_dir": str(base_dir) if base_dir else None, + "min_new_trajectories": min_new_trajectories, + "seen_trajectory_store": str(seen_trajectory_store) if seen_trajectory_store else None, + "dry_run": dry_run, + "baseline_version": kwargs.get("baseline_version"), + }) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--dry-run"]) + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["dry_run"] is True + + +def test_main_entrypoint_passes_baseline_version_flag(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, dry_run=False, baseline_version=None, **kwargs): + calls.append({ + "base_dir": str(base_dir) if base_dir else None, + "min_new_trajectories": min_new_trajectories, + "seen_trajectory_store": str(seen_trajectory_store) if seen_trajectory_store else None, + "dry_run": dry_run, + "baseline_version": baseline_version, + }) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--baseline-version", "v1"]) + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["baseline_version"] == "v1" + + +def test_main_entrypoint_supports_text_format_for_workflow(monkeypatch, capsys): + from memabra import cli + + def mock_online_learning_workflow(**kwargs): + return { + "skipped": False, + "promoted": False, + "report_id": "report-text", + "dry_run": True, + "decision": { + "accepted": False, + "reasons": ["Dry run requested"], + "metrics": { + 
"reward_delta": 0.05, + "error_rate_delta": 0.0, + "latency_delta_ms": 4.0, + }, + }, + "baseline_metrics": { + "avg_reward": 0.8, + "error_rate": 0.1, + "avg_latency_ms": 90.0, + }, + "challenger_metrics": { + "avg_reward": 0.85, + "error_rate": 0.1, + "avg_latency_ms": 94.0, + }, + } + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--format", "text", "--dry-run"]) + + captured = capsys.readouterr() + assert rc == 0 + assert "Memabra online learning result" in captured.out + assert "Summary" in captured.out + assert "Dry run: yes" in captured.out + assert "Baseline" in captured.out + assert "Reward: 0.8000" in captured.out + assert "Challenger" in captured.out + assert "Reward: 0.8500" in captured.out + assert "Deltas" in captured.out + assert "Reward delta: 0.0500" in captured.out + assert "Reason: Dry run requested" in captured.out + + +def test_main_entrypoint_passes_case_index_flags(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, dry_run=False, baseline_version=None, case_index_path=None, rebuild_case_index=False, **kwargs): + calls.append({ + "base_dir": str(base_dir) if base_dir else None, + "case_index_path": str(case_index_path) if case_index_path else None, + "rebuild_case_index": rebuild_case_index, + }) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--case-index", "/tmp/cases.json", "--rebuild-case-index"]) + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["case_index_path"] == "/tmp/cases.json" + assert calls[0]["rebuild_case_index"] is True + + +def test_run_online_learning_workflow_loads_existing_case_index(tmp_path: Path): + base_dir = tmp_path / "demo-artifacts" + case_index_path = tmp_path / "case-index.json" + + # Run once 
to create trajectories and rebuild case index + result1 = run_online_learning_workflow(base_dir=base_dir, min_new_trajectories=1, rebuild_case_index=True, case_index_path=case_index_path) + assert result1["skipped"] is False + assert case_index_path.exists() + + # Second run should load the existing case index + result2 = run_online_learning_workflow(base_dir=base_dir, min_new_trajectories=1, rebuild_case_index=False, case_index_path=case_index_path) + assert result2["skipped"] is False + + +def test_run_online_learning_workflow_rebuilds_case_index_after_cycle(tmp_path: Path): + base_dir = tmp_path / "demo-artifacts" + case_index_path = tmp_path / "case-index.json" + + result = run_online_learning_workflow( + base_dir=base_dir, + min_new_trajectories=1, + case_index_path=case_index_path, + ) + assert result["skipped"] is False + assert case_index_path.exists() + from memabra.case_index import CaseIndex + + index = CaseIndex.load(case_index_path) + # The benchmark task during the cycle should produce a trajectory that gets indexed + assert index.best("Use my telegram preference for this answer.") is not None + + +def test_main_entrypoint_defaults_case_index_path_when_rebuild_flag_set(monkeypatch): + from memabra import cli + + calls = [] + + def mock_online_learning_workflow(*, base_dir=None, min_new_trajectories=3, seen_trajectory_store=None, dry_run=False, baseline_version=None, case_index_path=None, rebuild_case_index=False, **kwargs): + calls.append({ + "base_dir": str(base_dir) if base_dir else None, + "case_index_path": str(case_index_path) if case_index_path else None, + "rebuild_case_index": rebuild_case_index, + }) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + rc = cli.main(["--rebuild-case-index"]) + + assert rc == 0 + assert len(calls) == 1 + assert calls[0]["rebuild_case_index"] is True + assert calls[0]["case_index_path"] is not 
None + assert "case-index.json" in calls[0]["case_index_path"] + + +def test_main_status_flag_prints_status_and_skips_workflow(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + + workflow_calls = [] + + def mock_online_learning_workflow(**kwargs): + workflow_calls.append(kwargs) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["status", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 0 + assert len(workflow_calls) == 0 + assert "current_version_id" in captured.out + + +def test_main_status_flag_supports_text_format(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + + workflow_calls = [] + + def mock_online_learning_workflow(**kwargs): + workflow_calls.append(kwargs) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["status", "--format", "text", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 0 + assert len(workflow_calls) == 0 + assert "Memabra status" in captured.out + assert "Current version:" in captured.out + assert "Trajectory count:" in captured.out + + +def test_main_rollback_flag_rolls_back_and_skips_workflow(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + from memabra.router_versioning import RouterVersionStore + + workflow_calls = [] + rollback_calls = [] + + def mock_online_learning_workflow(**kwargs): + workflow_calls.append(kwargs) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + def mock_rollback(self, version_id: str): + rollback_calls.append(version_id) + return 
{"current_version_id": version_id} + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + monkeypatch.setattr(RouterVersionStore, "rollback", mock_rollback) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["version", "rollback", "v1", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 0 + assert len(workflow_calls) == 0 + assert len(rollback_calls) == 1 + assert rollback_calls[0] == "v1" + assert "current_version_id" in captured.out + + +def test_main_rollback_flag_supports_text_format(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + from memabra.router_versioning import RouterVersionStore + + def mock_rollback(self, version_id: str): + return {"current_version_id": version_id} + + monkeypatch.setattr(RouterVersionStore, "rollback", mock_rollback) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["version", "rollback", "v1", "--format", "text", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 0 + assert "Rolled back current version to: v1" in captured.out + + +def test_main_rollback_missing_version_prints_error_and_exits_nonzero(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + from memabra.router_versioning import RouterVersionStore + + def mock_rollback(self, version_id: str): + raise ValueError(f"Version '{version_id}' not found.") + + monkeypatch.setattr(RouterVersionStore, "rollback", mock_rollback) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["version", "rollback", "v99", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 1 + assert "not found" in captured.err.lower() + + +def test_main_list_versions_flag_prints_versions_and_skips_workflow(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + from 
memabra.router_versioning import RouterVersionStore + + workflow_calls = [] + + def mock_online_learning_workflow(**kwargs): + workflow_calls.append(kwargs) + return {"skipped": False, "promoted": True, "report_id": "report-test"} + + def mock_list_versions(self): + return [ + {"version_id": "v1", "metadata": {"source": "test"}}, + {"version_id": "v2", "metadata": {"source": "test"}}, + ] + + monkeypatch.setattr(cli, "run_online_learning_workflow", mock_online_learning_workflow) + monkeypatch.setattr(RouterVersionStore, "list_versions", mock_list_versions) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["version", "list", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 0 + assert len(workflow_calls) == 0 + assert "v1" in captured.out + assert "v2" in captured.out + + +def test_main_list_versions_flag_supports_text_format(tmp_path: Path, monkeypatch, capsys): + from memabra import cli + from memabra.router_versioning import RouterVersionStore + + def mock_list_versions(self): + return [ + {"version_id": "v1", "metadata": {"source": "seed", "avg_reward": 1.2}}, + {"version_id": "v2", "metadata": {"source": "online_learning", "avg_reward": 1.4}}, + ] + + def mock_get_current(self): + return {"current_version_id": "v2"} + + monkeypatch.setattr(RouterVersionStore, "list_versions", mock_list_versions) + monkeypatch.setattr(RouterVersionStore, "get_current", mock_get_current) + + base_dir = tmp_path / "demo-artifacts" + base_dir.mkdir(parents=True, exist_ok=True) + + rc = cli.main(["version", "list", "--format", "text", "--base-dir", str(base_dir)]) + + captured = capsys.readouterr() + assert rc == 0 + assert "Saved router versions (2 total)" in captured.out + assert "Current version: v2" in captured.out + assert "2. 
v2 (current, source=online_learning, avg_reward=1.4)" in captured.out diff --git a/tests/test_dataset.py b/tests/test_dataset.py new file mode 100644 index 0000000..2617c66 --- /dev/null +++ b/tests/test_dataset.py @@ -0,0 +1,49 @@ +from memabra.dataset import DatasetBuilder, TrainingSample + + +def test_dataset_builder_extracts_features_and_label(): + trajectories = [ + { + "task": {"input": "hello world"}, + "candidate_sets": { + "memory": [{"confidence": 0.8}], + "skill": [{"success_rate": 0.9}], + "tool": [{"confidence": 0.7, "risk": 0.2}], + }, + "decisions": [{"decision_type": "direct_answer"}], + "reward": {"total": 0.95}, + } + ] + builder = DatasetBuilder() + samples = builder.build(trajectories) + assert len(samples) == 1 + sample = samples[0] + assert sample.input_text == "hello world" + assert sample.label == "direct_answer" + assert sample.reward == 0.95 + assert sample.features["input_length"] == 11 + assert sample.features["memory_count"] == 1 + assert sample.features["skill_count"] == 1 + assert sample.features["tool_count"] == 1 + assert sample.features["top_memory_confidence"] == 0.8 + assert sample.features["top_skill_success_rate"] == 0.9 + assert sample.features["top_tool_confidence"] == 0.7 + assert sample.features["top_tool_risk"] == 0.2 + + +def test_dataset_builder_handles_empty_candidates(): + trajectories = [ + { + "task": {"input": "hi"}, + "candidate_sets": {"memory": [], "skill": [], "tool": []}, + "decisions": [{"decision_type": "clarify"}], + "reward": {"total": 0.0}, + } + ] + builder = DatasetBuilder() + samples = builder.build(trajectories) + assert len(samples) == 1 + assert samples[0].features["top_memory_confidence"] == 0.0 + assert samples[0].features["top_skill_success_rate"] == 0.0 + assert samples[0].features["top_tool_confidence"] == 0.0 + assert samples[0].features["top_tool_risk"] == 0.0 diff --git a/tests/test_evaluator.py b/tests/test_evaluator.py new file mode 100644 index 0000000..f6bbc89 --- /dev/null +++ 
b/tests/test_evaluator.py @@ -0,0 +1,54 @@ +from memabra.app import build_demo_app +from memabra.evaluator import BenchmarkTask, Evaluator + + +def test_evaluator_runs_benchmark_and_reports_metrics(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + evaluator = Evaluator(app) + tasks = [ + BenchmarkTask(user_input="Use my telegram preference."), + BenchmarkTask(user_input="Check the current system status."), + ] + result = evaluator.run(tasks) + + assert result.task_count == 2 + assert result.avg_reward >= 0.0 + assert "inject_memory" in result.decision_distribution + assert "call_tool" in result.decision_distribution + assert result.error_rate == 0.0 + + +def test_evaluator_ab_compares_two_routers(tmp_path): + from memabra.router import RuleBasedRouter, TaskContext + + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + evaluator = Evaluator(app) + tasks = [ + BenchmarkTask(user_input="Use my telegram preference."), + BenchmarkTask(user_input="Check the current system status."), + ] + + baseline = evaluator.run(tasks, router=RuleBasedRouter()) + # Using same router for both arms in this test; real tests would compare different routers + challenger = evaluator.run(tasks, router=RuleBasedRouter()) + comparison = evaluator.compare(baseline, challenger) + + assert comparison["winner"] in ("baseline", "challenger", "tie") + assert "avg_reward_delta" in comparison + assert "error_rate_delta" in comparison + + +def test_app_trains_learning_router_from_artifact_index(tmp_path): + from memabra.router import SimpleLearningRouter + + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + # Generate some training data + app.run_task("Use my telegram preference.", channel="local") + app.run_task("Check the current system status.", channel="local") + + router = app.train_learning_router() + + assert isinstance(router, SimpleLearningRouter) + # After training, the router should be able to make predictions (not fallback to clarify for known 
patterns) + trajectory = app.run_task("Use my telegram preference.", channel="local") + assert trajectory["reward"]["total"] >= 0.0 diff --git a/tests/test_execution_persistence.py b/tests/test_execution_persistence.py new file mode 100644 index 0000000..ea44bba --- /dev/null +++ b/tests/test_execution_persistence.py @@ -0,0 +1,265 @@ +from pathlib import Path + +from memabra.candidate_types import CandidateObject +from memabra.execution import ExecutionEngine, MemoryExecutor, ToolExecutor +from memabra.memory_store import InMemoryMemoryStore, MemoryRecord, MemorySource +from memabra.persistence import PersistenceStore +from memabra.retrieval import CandidateRetriever, InMemoryCandidateProvider +from memabra.router import RouteDecision, RuleBasedRouter, TaskContext +from memabra.runner import MemabraRunner +from memabra.schemas import SchemaRegistry + + +class FailingToolBackend: + def run_tool(self, tool_id: str, context: TaskContext, params: dict | None = None) -> dict: + return {"status": "error", "output": None, "error": f"{tool_id} failed", "latency_ms": 123} + + +class MixedResultToolBackend: + def run_tool(self, tool_id: str, context: TaskContext, params: dict | None = None) -> dict: + if tool_id == "tool-ok": + return {"status": "success", "output": "ok", "error": None, "latency_ms": 50} + return {"status": "error", "output": None, "error": f"{tool_id} failed", "latency_ms": 100} + + +class StaticSkillBackend: + def load_skill(self, skill_id: str) -> dict: + return {"skill_id": skill_id, "instructions": "Follow the documented deployment workflow."} + + +def test_execution_engine_marks_memory_used_and_runner_persists(tmp_path: Path): + memory_store = InMemoryMemoryStore() + memory_store.upsert( + MemoryRecord( + id="mem-telegram-pref", + memory_type="semantic", + fact_status="verified", + content="Prefer plain text on Telegram.", + summary="Telegram preference", + source=MemorySource(kind="user", ref="session-1"), + confidence=0.95, + ) + ) + retriever = 
CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="memory", + candidates=[ + CandidateObject( + id="mem-telegram-pref", + type="memory", + title="Telegram preference", + summary="Prefer plain text on Telegram.", + triggers=["telegram", "preference"], + confidence=0.95, + success_rate=0.9, + freshness=0.9, + ) + ], + ) + ] + ) + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + runner = MemabraRunner( + retriever=retriever, + router=RuleBasedRouter(), + execution_engine=ExecutionEngine(memory_executor=MemoryExecutor(memory_store=memory_store)), + persistence_store=persistence, + memory_store=memory_store, + ) + + trajectory = runner.run( + context=TaskContext(user_input="Use my telegram preference for this answer."), + channel="telegram", + user_id="oza", + persist=True, + ) + + SchemaRegistry().validate_trajectory(trajectory) + assert any(event["event_type"] == "memory_injected" for event in trajectory["events"]) + assert memory_store.get("mem-telegram-pref").last_used_at is not None + assert persistence.load_trajectory(trajectory["trajectory_id"])["trajectory_id"] == trajectory["trajectory_id"] + + +def test_persistence_store_round_trip_memory_record(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + record = MemoryRecord( + id="mem-1", + memory_type="semantic", + fact_status="assumed", + content="User likes concise replies.", + summary="Concise reply preference", + source=MemorySource(kind="user", ref="session-2"), + confidence=0.7, + ) + + persistence.save_memory_record(record) + loaded = persistence.load_memory_record("mem-1") + assert loaded["id"] == "mem-1" + assert len(persistence.list_memory_paths()) == 1 + + +def test_runner_records_tool_failures_in_outcome_and_reward(tmp_path: Path): + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="tool", + candidates=[ + CandidateObject( + id="tool-terminal", + type="tool", + title="terminal", + summary="Run terminal 
commands.", + triggers=["check", "current"], + confidence=0.95, + success_rate=0.9, + freshness=1.0, + ) + ], + ) + ] + ) + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + runner = MemabraRunner( + retriever=retriever, + router=RuleBasedRouter(), + execution_engine=ExecutionEngine(tool_backend=FailingToolBackend()), + persistence_store=persistence, + ) + + trajectory = runner.run( + context=TaskContext(user_input="Check the current status."), + channel="telegram", + persist=True, + ) + + assert trajectory["outcome"]["status"] == "failure" + assert trajectory["outcome"]["tool_errors"] == 1 + assert trajectory["reward"]["components"]["tool_error"] > 0 + assert trajectory["reward"]["components"]["latency"] > 0 + assert any(event["event_type"] == "tool_result" for event in trajectory["events"]) + + +def test_runner_loads_skill_payload_from_backend(): + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="skill", + candidates=[ + CandidateObject( + id="skill-deploy", + type="skill", + title="deploy workflow", + summary="Reusable deployment procedure.", + triggers=["deploy", "workflow"], + confidence=0.9, + success_rate=0.95, + freshness=0.8, + ) + ], + ) + ] + ) + runner = MemabraRunner( + retriever=retriever, + router=RuleBasedRouter(), + execution_engine=ExecutionEngine(skill_backend=StaticSkillBackend()), + ) + + trajectory = runner.run(context=TaskContext(user_input="Deploy this service with the usual workflow.")) + + skill_events = [event for event in trajectory["events"] if event["event_type"] == "skill_loaded"] + assert skill_events + assert skill_events[0]["payload"]["instructions"] == "Follow the documented deployment workflow." 
+ + +def test_runner_detects_partial_success_for_mixed_tool_results(): + class BothToolsRouter: + def choose(self, context, memory, skill, tool): + from memabra.router import RouteDecision + return RouteDecision( + decision_type="call_tool", + selected_ids=["tool-ok", "tool-fail"], + selected_payloads=[{}, {}], + rationale="Force both tools for testing.", + ) + + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="tool", + candidates=[ + CandidateObject( + id="tool-ok", + type="tool", + title="ok tool", + summary="Always succeeds.", + triggers=["check", "current"], + confidence=0.95, + success_rate=0.9, + freshness=1.0, + ), + CandidateObject( + id="tool-fail", + type="tool", + title="failing tool", + summary="Always fails.", + triggers=["check", "current"], + confidence=0.9, + success_rate=0.5, + freshness=1.0, + ), + ], + ) + ] + ) + runner = MemabraRunner( + retriever=retriever, + router=BothToolsRouter(), + execution_engine=ExecutionEngine(tool_backend=MixedResultToolBackend()), + ) + + trajectory = runner.run( + context=TaskContext(user_input="Check the current status."), + channel="local", + ) + + assert trajectory["outcome"]["status"] == "partial_success" + assert trajectory["outcome"]["tool_errors"] == 1 + assert trajectory["reward"]["components"]["tool_error"] > 0 + assert trajectory["reward"]["components"]["context_cost"] > 0 + + +def test_execution_engine_executes_composite_action_sequentially(): + memory_store = InMemoryMemoryStore() + memory_store.upsert( + MemoryRecord( + id="mem-1", + memory_type="semantic", + fact_status="verified", + content="Prefer concise replies.", + summary="Concise preference", + source=MemorySource(kind="user", ref="session-1"), + confidence=0.9, + ) + ) + engine = ExecutionEngine( + memory_executor=MemoryExecutor(memory_store=memory_store), + tool_executor=ToolExecutor(backend=MixedResultToolBackend()), + ) + decision = RouteDecision( + decision_type="composite_action", + composite_steps=[ + 
RouteDecision(decision_type="inject_memory", selected_ids=["mem-1"]), + RouteDecision(decision_type="call_tool", selected_ids=["tool-ok"], selected_payloads=[{}]), + ], + ) + result = engine.execute(decision, TaskContext(user_input="composite test"), trajectory_id="traj-comp") + + assert result.status == "executed" + assert any(event.event_type == "memory_injected" for event in result.events) + assert any(event.event_type == "tool_result" for event in result.events) + assert len(result.details["steps"]) == 2 + assert result.details["steps"][0]["decision_type"] == "inject_memory" + assert result.details["steps"][1]["decision_type"] == "call_tool" + diff --git a/tests/test_learning_router.py b/tests/test_learning_router.py new file mode 100644 index 0000000..568adb5 --- /dev/null +++ b/tests/test_learning_router.py @@ -0,0 +1,91 @@ +from memabra.candidate_types import CandidateObject +from memabra.dataset import TrainingSample +from memabra.router import SimpleLearningRouter, TaskContext + + +def test_learning_router_fits_and_predicts(): + router = SimpleLearningRouter() + samples = [ + TrainingSample( + input_text="run tool", + features={ + "input_length": 8, + "memory_count": 0, + "skill_count": 0, + "tool_count": 1, + "top_memory_confidence": 0.0, + "top_skill_success_rate": 0.0, + "top_tool_confidence": 0.9, + "top_tool_risk": 0.1, + }, + label="call_tool", + reward=1.0, + ), + TrainingSample( + input_text="remember", + features={ + "input_length": 8, + "memory_count": 1, + "skill_count": 0, + "tool_count": 0, + "top_memory_confidence": 0.9, + "top_skill_success_rate": 0.0, + "top_tool_confidence": 0.0, + "top_tool_risk": 0.0, + }, + label="inject_memory", + reward=1.0, + ), + ] + router.fit(samples) + + tool = CandidateObject( + id="t1", + type="tool", + title="t", + summary="s", + triggers=[], + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.0, + risk=0.1, + ) + decision = router.choose( + TaskContext(user_input="run tool"), + 
memory_candidates=[], + skill_candidates=[], + tool_candidates=[tool], + ) + assert decision.decision_type == "call_tool" + + mem = CandidateObject( + id="m1", + type="memory", + title="m", + summary="s", + triggers=[], + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.0, + risk=0.0, + ) + decision = router.choose( + TaskContext(user_input="remember"), + memory_candidates=[mem], + skill_candidates=[], + tool_candidates=[], + ) + assert decision.decision_type == "inject_memory" + + +def test_learning_router_falls_back_to_clarify_when_untrained(): + router = SimpleLearningRouter() + decision = router.choose( + TaskContext(user_input="hi"), + memory_candidates=[], + skill_candidates=[], + tool_candidates=[], + ) + assert decision.decision_type == "clarify" diff --git a/tests/test_memory_store.py b/tests/test_memory_store.py new file mode 100644 index 0000000..cf8c382 --- /dev/null +++ b/tests/test_memory_store.py @@ -0,0 +1,27 @@ +from memabra.memory_store import InMemoryMemoryStore, MemoryRecord, MemorySource +from memabra.schemas import SchemaRegistry + + +def test_memory_store_verify_and_revoke_round_trip(): + store = InMemoryMemoryStore() + record = MemoryRecord( + id="mem-pref-1", + memory_type="semantic", + fact_status="assumed", + content="User prefers plain text on Telegram.", + summary="Telegram plain-text preference", + source=MemorySource(kind="user", ref="session-1"), + confidence=0.9, + ) + store.upsert(record) + store.verify("mem-pref-1", status="confirmed", check_method="user-confirmed") + store.mark_used("mem-pref-1") + store.revoke("mem-pref-1", reason="User changed preference") + + updated = store.get("mem-pref-1") + assert updated is not None + assert updated.verification.status == "confirmed" + assert updated.last_used_at is not None + assert updated.fact_status == "revoked" + + SchemaRegistry().validate_memory_record(updated.to_dict()) diff --git a/tests/test_online_learning.py b/tests/test_online_learning.py new file mode 100644 
index 0000000..bd5e40e --- /dev/null +++ b/tests/test_online_learning.py @@ -0,0 +1,348 @@ +from __future__ import annotations + +from memabra.app import build_demo_app +from memabra.benchmarks import BenchmarkTask +from memabra.dataset import DatasetBuilder +from memabra.evaluator import Evaluator +from memabra.online_learning import OnlineLearningCoordinator +from memabra.promotion import PromotionPolicy +from memabra.router_versioning import RouterVersionStore + + +def _seed_trajectories(app, count: int): + for i in range(count): + app.run_task(f"Test task {i}", channel="local") + + +def test_coordinator_skips_when_too_few_new_trajectories(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 2) + + coordinator = OnlineLearningCoordinator( + app=app, + policy=PromotionPolicy( + min_reward_delta=0.01, + max_error_rate_increase=0.05, + max_latency_increase_ms=100.0, + required_task_count=1, + ), + benchmark_tasks=[BenchmarkTask(user_input="test")], + min_new_trajectories=5, + ) + + result = coordinator.run_cycle() + + assert result["skipped"] is True + assert "too few new trajectories" in result["reason"].lower() + + +def test_coordinator_rejects_when_policy_fails(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + # Seed enough trajectories for training and benchmarking + _seed_trajectories(app, 10) + + # Use a very strict policy that will reject any challenger + policy = PromotionPolicy( + min_reward_delta=1.0, # impossible to meet + max_error_rate_increase=0.0, + max_latency_increase_ms=0.0, + required_task_count=1, + ) + + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + version_store_base_dir=tmp_path / "versions", + ) + + result = coordinator.run_cycle() + + assert result["skipped"] is False + assert result["promoted"] is False + assert "decision" in result + assert 
result["decision"].accepted is False + + +def test_coordinator_accepts_and_saves_version_when_policy_passes(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 10) + + # Lenient policy that should pass + policy = PromotionPolicy( + min_reward_delta=-1.0, # always passes + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ) + + version_dir = tmp_path / "versions" + report_dir = tmp_path / "reports" + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + version_store_base_dir=version_dir, + report_store_base_dir=report_dir, + ) + + result = coordinator.run_cycle() + + assert result["skipped"] is False + assert result["promoted"] is True + assert "version_id" in result + assert result["decision"].accepted is True + + # Verify version was saved + store = RouterVersionStore(base_dir=version_dir) + versions = store.list_versions() + assert len(versions) == 1 + assert versions[0]["version_id"] == result["version_id"] + + # Verify report was saved + from memabra.training_reports import TrainingReportStore + report_store = TrainingReportStore(base_dir=report_dir) + reports = report_store.list_reports() + assert len(reports) == 1 + assert reports[0]["promoted_version_id"] == result["version_id"] + + +def test_coordinator_saves_report_on_rejection(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 10) + + policy = PromotionPolicy( + min_reward_delta=1.0, + max_error_rate_increase=0.0, + max_latency_increase_ms=0.0, + required_task_count=1, + ) + + report_dir = tmp_path / "reports" + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + report_store_base_dir=report_dir, + ) + + result = coordinator.run_cycle() + + assert result["promoted"] 
is False + from memabra.training_reports import TrainingReportStore + report_store = TrainingReportStore(base_dir=report_dir) + reports = report_store.list_reports() + assert len(reports) == 1 + assert reports[0]["promotion_decision"]["accepted"] is False + + +def test_coordinator_catches_training_exception_and_returns_error_report(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 10) + + policy = PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ) + + report_dir = tmp_path / "reports" + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + report_store_base_dir=report_dir, + ) + + # Force a training failure by monkeypatching DatasetBuilder.build to raise + original_build = DatasetBuilder.build + DatasetBuilder.build = lambda self, trajectories: (_ for _ in ()).throw(RuntimeError("simulated training failure")) + + try: + result = coordinator.run_cycle() + finally: + DatasetBuilder.build = original_build + + assert result["skipped"] is False + assert result["promoted"] is False + assert "error" in result + assert "simulated training failure" in result["error"] + + # Verify error report was saved + from memabra.training_reports import TrainingReportStore + report_store = TrainingReportStore(base_dir=report_dir) + reports = report_store.list_reports() + assert len(reports) == 1 + assert reports[0]["promotion_decision"]["accepted"] is False + assert "simulated training failure" in reports[0]["promotion_decision"]["reasons"][0] + + +def test_coordinator_persists_seen_trajectory_ids_across_restarts(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 5) + + policy = PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + 
required_task_count=1, + ) + benchmark_tasks = [BenchmarkTask(user_input="Test task 0")] + seen_store = tmp_path / "seen_trajectories.json" + version_dir = tmp_path / "versions" + report_dir = tmp_path / "reports" + + coordinator1 = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=benchmark_tasks, + min_new_trajectories=1, + version_store_base_dir=version_dir, + report_store_base_dir=report_dir, + seen_trajectory_store=seen_store, + ) + result1 = coordinator1.run_cycle() + assert result1["skipped"] is False + + # New coordinator instance pointing to same store + coordinator2 = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=benchmark_tasks, + min_new_trajectories=1, + version_store_base_dir=version_dir, + report_store_base_dir=report_dir, + seen_trajectory_store=seen_store, + ) + result2 = coordinator2.run_cycle() + assert result2["skipped"] is True + assert "too few new trajectories" in result2["reason"].lower() + + +def test_coordinator_dry_run_does_not_promote_or_save_version(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 10) + + policy = PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ) + + version_dir = tmp_path / "versions" + report_dir = tmp_path / "reports" + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + version_store_base_dir=version_dir, + report_store_base_dir=report_dir, + ) + + result = coordinator.run_cycle(dry_run=True) + + assert result["skipped"] is False + assert result["promoted"] is False + assert "decision" in result + assert result["decision"].accepted is True # policy would accept, but dry_run blocks promotion + + # No version should be saved + store = RouterVersionStore(base_dir=version_dir) + assert len(store.list_versions()) == 0 + + # Report 
should still be saved for audit + from memabra.training_reports import TrainingReportStore + + report_store = TrainingReportStore(base_dir=report_dir) + reports = report_store.list_reports() + assert len(reports) == 1 + assert reports[0].get("dry_run") is True + + +def test_coordinator_rebuilds_case_index_when_path_provided(tmp_path): + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 10) + + policy = PromotionPolicy( + min_reward_delta=-1.0, + max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ) + + case_index_path = tmp_path / "case-index.json" + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + case_index_path=case_index_path, + ) + + result = coordinator.run_cycle() + + assert result["skipped"] is False + assert case_index_path.exists() + from memabra.case_index import CaseIndex + + index = CaseIndex.load(case_index_path) + assert index.best("Test task 0") is not None + + +def test_coordinator_uses_specified_baseline_version(tmp_path): + from memabra.router import SimpleLearningRouter + + app = build_demo_app(base_dir=tmp_path / "demo-artifacts") + _seed_trajectories(app, 10) + + # Save a baseline version with known weights + baseline_router = SimpleLearningRouter() + baseline_router._weights = {"call_tool": {"input_length": 0.99}} + baseline_router._feature_keys = ["input_length"] + version_dir = tmp_path / "versions" + store = RouterVersionStore(base_dir=version_dir) + store.save(baseline_router, version_id="v-baseline", metadata={"note": "baseline"}) + + # Change app's current router to something different + different_router = SimpleLearningRouter() + different_router._weights = {"clarify": {"input_length": 0.01}} + different_router._feature_keys = ["input_length"] + app.set_router(different_router) + + policy = PromotionPolicy( + min_reward_delta=-1.0, + 
max_error_rate_increase=1.0, + max_latency_increase_ms=10000.0, + required_task_count=1, + ) + + report_dir = tmp_path / "reports" + coordinator = OnlineLearningCoordinator( + app=app, + policy=policy, + benchmark_tasks=[BenchmarkTask(user_input="Test task 0")], + min_new_trajectories=1, + version_store_base_dir=version_dir, + report_store_base_dir=report_dir, + ) + + result = coordinator.run_cycle(baseline_version_id="v-baseline") + + assert result["skipped"] is False + assert "baseline_metrics" in result + assert "challenger_metrics" in result + + # Verify report records the baseline version + from memabra.training_reports import TrainingReportStore + + report_store = TrainingReportStore(base_dir=report_dir) + reports = report_store.list_reports() + assert len(reports) == 1 + assert reports[0].get("baseline_version_id") == "v-baseline" diff --git a/tests/test_outcome_reward.py b/tests/test_outcome_reward.py new file mode 100644 index 0000000..a77c26a --- /dev/null +++ b/tests/test_outcome_reward.py @@ -0,0 +1,126 @@ +from memabra.execution import ActionResult +from memabra.outcome import OutcomeEngine, RewardEngine +from memabra.retrieval import RetrievalResult +from memabra.router import RouteDecision, TaskContext +from memabra.telemetry import RewardBreakdown + + +def test_outcome_engine_success_for_memory_injection(): + engine = OutcomeEngine() + decision = RouteDecision(decision_type="inject_memory", selected_ids=["mem-1"]) + result = ActionResult(decision_type="inject_memory", status="executed", details={"latency_ms": 50}) + + outcome = engine.build_outcome(decision, result) + + assert outcome.status == "success" + assert outcome.steps == 1 + assert outcome.latency_ms == 50 + assert outcome.tool_errors == 0 + + +def test_outcome_engine_failure_for_tool_error(): + engine = OutcomeEngine() + decision = RouteDecision(decision_type="call_tool", selected_ids=["tool-1"]) + result = ActionResult(decision_type="call_tool", status="error", details={"latency_ms": 
120}) + + outcome = engine.build_outcome(decision, result) + + assert outcome.status == "failure" + assert outcome.latency_ms == 120 + assert outcome.tool_errors == 1 + + +def test_outcome_engine_counts_multiple_tool_errors(): + engine = OutcomeEngine() + decision = RouteDecision(decision_type="call_tool", selected_ids=["tool-1", "tool-2"]) + result = ActionResult( + decision_type="call_tool", + status="error", + details={ + "latency_ms": 200, + "results": [ + {"tool_id": "tool-1", "status": "error"}, + {"tool_id": "tool-2", "status": "error"}, + ], + }, + ) + + outcome = engine.build_outcome(decision, result) + + assert outcome.status == "failure" + assert outcome.tool_errors == 2 + + +def test_outcome_engine_partial_success_for_mixed_tool_results(): + engine = OutcomeEngine() + decision = RouteDecision(decision_type="call_tool", selected_ids=["tool-1", "tool-2"]) + result = ActionResult( + decision_type="call_tool", + status="error", + details={ + "latency_ms": 200, + "results": [ + {"tool_id": "tool-1", "status": "success"}, + {"tool_id": "tool-2", "status": "error"}, + ], + }, + ) + + outcome = engine.build_outcome(decision, result) + + assert outcome.status == "partial_success" + assert outcome.tool_errors == 1 + + +def test_reward_engine_penalizes_latency_by_tier(): + outcome_engine = OutcomeEngine() + reward_engine = RewardEngine() + decision = RouteDecision(decision_type="call_tool") + outcome_fast = outcome_engine.build_outcome(decision, ActionResult(decision_type="call_tool", status="success", details={"latency_ms": 200})) + outcome_slow = outcome_engine.build_outcome(decision, ActionResult(decision_type="call_tool", status="success", details={"latency_ms": 2500})) + + reward_fast = reward_engine.compute(decision, outcome_fast) + reward_slow = reward_engine.compute(decision, outcome_slow) + + assert reward_fast.latency < reward_slow.latency + assert reward_slow.latency > 0.5 + + +def test_reward_engine_context_cost_based_on_candidate_count(): + from 
memabra.candidate_types import CandidateObject + + outcome_engine = OutcomeEngine() + reward_engine = RewardEngine() + decision = RouteDecision(decision_type="direct_answer") + outcome = outcome_engine.build_outcome(decision, ActionResult(decision_type="direct_answer", status="skipped", details={"latency_ms": 0})) + dummy_candidate = CandidateObject(id="c1", type="memory", title="t", summary="s", triggers=[]) + retrieval = RetrievalResult(memory=[dummy_candidate, dummy_candidate, dummy_candidate], skill=[dummy_candidate, dummy_candidate], tool=[dummy_candidate]) + + reward = reward_engine.compute(decision, outcome, retrieval_result=retrieval) + + assert reward.context_cost > 0 + + +def test_reward_engine_reduces_task_success_for_multiple_errors(): + outcome_engine = OutcomeEngine() + reward_engine = RewardEngine() + decision = RouteDecision(decision_type="call_tool") + outcome = outcome_engine.build_outcome( + decision, + ActionResult( + decision_type="call_tool", + status="error", + details={ + "latency_ms": 100, + "results": [ + {"tool_id": "tool-1", "status": "error"}, + {"tool_id": "tool-2", "status": "error"}, + ], + }, + ), + ) + + reward = reward_engine.compute(decision, outcome) + + assert reward.task_success < 0.5 + assert reward.tool_error >= 0.5 diff --git a/tests/test_package_exports.py b/tests/test_package_exports.py new file mode 100644 index 0000000..5b9deee --- /dev/null +++ b/tests/test_package_exports.py @@ -0,0 +1,22 @@ +def test_memabra_package_exports_alpha_modules(): + from src import memabra + + assert hasattr(memabra, "promotion") + assert hasattr(memabra, "benchmarks") + assert hasattr(memabra, "online_learning") + assert hasattr(memabra, "training_reports") + + +def test_memabra_top_level_imports(): + from memabra import PromotionPolicy, BenchmarkSuite, OnlineLearningCoordinator, TrainingReportStore, CaseIndex + + assert PromotionPolicy is not None + assert BenchmarkSuite is not None + assert OnlineLearningCoordinator is not None + assert 
TrainingReportStore is not None + assert CaseIndex is not None + + +def test_benchmark_task_exported_from_package(): + from memabra import BenchmarkTask + assert BenchmarkTask is not None diff --git a/tests/test_promotion.py b/tests/test_promotion.py new file mode 100644 index 0000000..4cce6ac --- /dev/null +++ b/tests/test_promotion.py @@ -0,0 +1,112 @@ +from __future__ import annotations + +import pytest + +from memabra.promotion import PromotionDecision, PromotionPolicy +from memabra.evaluator import EvaluationResult + + +class TestPromotionPolicy: + def test_accepted_when_challenger_improves_on_all_metrics(self): + policy = PromotionPolicy( + min_reward_delta=0.01, + max_error_rate_increase=0.05, + max_latency_increase_ms=100.0, + required_task_count=2, + ) + baseline = EvaluationResult( + task_count=2, + avg_reward=0.5, + error_rate=0.1, + avg_latency_ms=50.0, + ) + challenger = EvaluationResult( + task_count=2, + avg_reward=0.6, + error_rate=0.05, + avg_latency_ms=45.0, + ) + + decision = policy.evaluate(baseline, challenger) + + assert isinstance(decision, PromotionDecision) + assert decision.accepted is True + assert decision.reasons == [] + assert decision.metrics["reward_delta"] == pytest.approx(0.1, abs=0.001) + assert decision.metrics["error_rate_delta"] == pytest.approx(-0.05, abs=0.001) + assert decision.metrics["latency_delta_ms"] == pytest.approx(-5.0, abs=0.001) + + def test_rejected_when_reward_delta_below_minimum(self): + policy = PromotionPolicy( + min_reward_delta=0.1, + max_error_rate_increase=0.05, + max_latency_increase_ms=100.0, + required_task_count=2, + ) + baseline = EvaluationResult(task_count=2, avg_reward=0.5, error_rate=0.1, avg_latency_ms=50.0) + challenger = EvaluationResult(task_count=2, avg_reward=0.55, error_rate=0.1, avg_latency_ms=50.0) + + decision = policy.evaluate(baseline, challenger) + + assert decision.accepted is False + assert any("reward" in r.lower() for r in decision.reasons) + + def 
test_rejected_when_error_rate_increase_exceeds_max(self): + policy = PromotionPolicy( + min_reward_delta=0.01, + max_error_rate_increase=0.05, + max_latency_increase_ms=100.0, + required_task_count=2, + ) + baseline = EvaluationResult(task_count=2, avg_reward=0.5, error_rate=0.1, avg_latency_ms=50.0) + challenger = EvaluationResult(task_count=2, avg_reward=0.6, error_rate=0.2, avg_latency_ms=50.0) + + decision = policy.evaluate(baseline, challenger) + + assert decision.accepted is False + assert any("error" in r.lower() for r in decision.reasons) + + def test_rejected_when_latency_increase_exceeds_max(self): + policy = PromotionPolicy( + min_reward_delta=0.01, + max_error_rate_increase=0.05, + max_latency_increase_ms=10.0, + required_task_count=2, + ) + baseline = EvaluationResult(task_count=2, avg_reward=0.5, error_rate=0.1, avg_latency_ms=50.0) + challenger = EvaluationResult(task_count=2, avg_reward=0.6, error_rate=0.1, avg_latency_ms=65.0) + + decision = policy.evaluate(baseline, challenger) + + assert decision.accepted is False + assert any("latency" in r.lower() for r in decision.reasons) + + def test_rejected_when_task_count_below_required(self): + policy = PromotionPolicy( + min_reward_delta=0.01, + max_error_rate_increase=0.05, + max_latency_increase_ms=100.0, + required_task_count=5, + ) + baseline = EvaluationResult(task_count=2, avg_reward=0.5, error_rate=0.1, avg_latency_ms=50.0) + challenger = EvaluationResult(task_count=2, avg_reward=0.6, error_rate=0.1, avg_latency_ms=50.0) + + decision = policy.evaluate(baseline, challenger) + + assert decision.accepted is False + assert any("task count" in r.lower() for r in decision.reasons) + + def test_multiple_rejection_reasons_accumulate(self): + policy = PromotionPolicy( + min_reward_delta=0.2, + max_error_rate_increase=0.01, + max_latency_increase_ms=10.0, + required_task_count=10, + ) + baseline = EvaluationResult(task_count=2, avg_reward=0.5, error_rate=0.1, avg_latency_ms=50.0) + challenger = 
EvaluationResult(task_count=2, avg_reward=0.55, error_rate=0.15, avg_latency_ms=70.0) + + decision = policy.evaluate(baseline, challenger) + + assert decision.accepted is False + assert len(decision.reasons) >= 3 diff --git a/tests/test_replay.py b/tests/test_replay.py new file mode 100644 index 0000000..2685c31 --- /dev/null +++ b/tests/test_replay.py @@ -0,0 +1,57 @@ +from pathlib import Path + +from memabra.persistence import PersistenceStore +from memabra.replay import TrajectoryReplay + + +EXAMPLE_DIR = "docs/examples" + + +def test_replay_summary_counts_outcomes_and_actions(): + replay = TrajectoryReplay() + summary = replay.summarize_directory(EXAMPLE_DIR) + + assert summary.trajectories == 4 + assert summary.success_count == 2 + assert summary.partial_success_count == 1 + assert summary.failure_count == 1 + assert summary.direct_answer_count == 1 + assert summary.memory_action_count == 1 + assert summary.tool_action_count == 2 + assert summary.skill_action_count == 0 + + +def test_replay_can_summarize_persisted_artifacts(tmp_path: Path): + persistence = PersistenceStore(base_dir=tmp_path / "artifacts") + persistence.save_trajectory( + { + "trajectory_id": "traj-1", + "task": {"task_id": "task-1", "input": "A", "channel": "local", "created_at": "2026-01-01T00:00:00Z", "user_id": None}, + "context_snapshot": {"conversation_summary": "", "environment_summary": "", "recent_failures": []}, + "candidate_sets": {"memory": [], "skill": [], "tool": []}, + "decisions": [{"step": 1, "decision_type": "direct_answer", "selected_ids": [], "rejected_ids": [], "rationale": "", "estimated_cost": 0}], + "events": [], + "outcome": {"status": "success", "steps": 1, "latency_ms": 10, "user_corrections": 0, "tool_errors": 0, "notes": None}, + "reward": {"total": 1.0, "components": {"task_success": 1.0, "retrieval_hit": 0.0, "tool_error": 0.0, "user_correction": 0.0, "latency": 0.0, "context_cost": 0.0, "useful_reuse": 0.0}}, + } + ) + persistence.save_trajectory( + { + 
"trajectory_id": "traj-2", + "task": {"task_id": "task-2", "input": "B", "channel": "local", "created_at": "2026-01-01T00:00:00Z", "user_id": None}, + "context_snapshot": {"conversation_summary": "", "environment_summary": "", "recent_failures": []}, + "candidate_sets": {"memory": [], "skill": [], "tool": []}, + "decisions": [{"step": 1, "decision_type": "call_tool", "selected_ids": ["tool-1"], "rejected_ids": [], "rationale": "", "estimated_cost": 0.1}], + "events": [], + "outcome": {"status": "failure", "steps": 1, "latency_ms": 50, "user_corrections": 0, "tool_errors": 1, "notes": None}, + "reward": {"total": -0.2, "components": {"task_success": 0.2, "retrieval_hit": 0.0, "tool_error": 0.3, "user_correction": 0.0, "latency": 0.05, "context_cost": 0.0, "useful_reuse": 0.0}}, + } + ) + + replay = TrajectoryReplay() + summary = replay.summarize_persistence_store(persistence) + + assert summary.trajectories == 2 + assert summary.success_count == 1 + assert summary.failure_count == 1 + assert summary.tool_action_count == 1 diff --git a/tests/test_retrieval.py b/tests/test_retrieval.py new file mode 100644 index 0000000..42b4068 --- /dev/null +++ b/tests/test_retrieval.py @@ -0,0 +1,45 @@ +from memabra.candidate_types import CandidateObject +from memabra.retrieval import CandidateRetriever, InMemoryCandidateProvider +from memabra.router import TaskContext + + +def test_retriever_ranks_trigger_matches_first(): + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="memory", + candidates=[ + CandidateObject( + id="mem-weak", + type="memory", + title="Generic preference", + summary="A weak preference record", + confidence=0.4, + success_rate=0.4, + freshness=0.4, + triggers=["generic"], + ), + CandidateObject( + id="mem-strong", + type="memory", + title="Formatting preference", + summary="Telegram prefers plain text", + confidence=0.8, + success_rate=0.9, + freshness=0.9, + triggers=["telegram", "formatting"], + tags=["output"], + ), + ], + ) 
+ ] + ) + + result = retriever.retrieve( + TaskContext(user_input="Use my telegram formatting preference for the output."), + top_k=2, + ) + + assert [candidate.id for candidate in result.memory] == ["mem-strong", "mem-weak"] + assert result.skill == [] + assert result.tool == [] diff --git a/tests/test_router_feature_scoring.py b/tests/test_router_feature_scoring.py new file mode 100644 index 0000000..5975eb3 --- /dev/null +++ b/tests/test_router_feature_scoring.py @@ -0,0 +1,137 @@ +from memabra.candidate_types import CandidateObject +from memabra.router import FeatureScoringRouter, TaskContext + + +def test_feature_scoring_router_computes_score_breakdown_and_selects_best(): + router = FeatureScoringRouter() + memory = CandidateObject( + id="mem-1", + type="memory", + title="m1", + summary="s1", + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.1, + risk=0.1, + ) + tool = CandidateObject( + id="tool-1", + type="tool", + title="t1", + summary="s1", + confidence=0.8, + success_rate=0.8, + freshness=0.8, + cost=0.1, + risk=0.1, + ) + decision = router.choose( + TaskContext(user_input="do something"), + memory_candidates=[memory], + skill_candidates=[], + tool_candidates=[tool], + ) + assert decision.decision_type == "inject_memory" + assert "mem-1" in decision.score_breakdown + assert "tool-1" in decision.score_breakdown + assert decision.score_breakdown["mem-1"] > decision.score_breakdown["tool-1"] + + +def test_feature_scoring_router_applies_failure_penalty(): + router = FeatureScoringRouter() + tool_a = CandidateObject( + id="tool-a", + type="tool", + title="ta", + summary="sa", + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.0, + risk=0.0, + ) + tool_b = CandidateObject( + id="tool-b", + type="tool", + title="tb", + summary="sb", + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.0, + risk=0.0, + ) + context = TaskContext(user_input="run tool", recent_failures=["tool-b"]) + decision = router.choose( + context, + 
memory_candidates=[], + skill_candidates=[], + tool_candidates=[tool_a, tool_b], + ) + assert decision.decision_type == "call_tool" + assert decision.selected_ids == ["tool-a"] + assert decision.score_breakdown["tool-b"] < decision.score_breakdown["tool-a"] + + +def test_feature_scoring_router_emits_composite_action_for_preconditions(): + router = FeatureScoringRouter() + memory = CandidateObject( + id="mem-1", + type="memory", + title="m1", + summary="s1", + confidence=0.7, + success_rate=0.5, + freshness=0.3, + cost=0.0, + risk=0.0, + ) + tool = CandidateObject( + id="tool-1", + type="tool", + title="t1", + summary="s1", + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.0, + risk=0.0, + preconditions=["memory"], + ) + decision = router.choose( + TaskContext(user_input="run tool"), + memory_candidates=[memory], + skill_candidates=[], + tool_candidates=[tool], + ) + assert decision.decision_type == "composite_action" + assert len(decision.composite_steps) == 2 + assert decision.composite_steps[0].decision_type == "inject_memory" + assert decision.composite_steps[0].selected_ids == ["mem-1"] + assert decision.composite_steps[1].decision_type == "call_tool" + assert decision.composite_steps[1].selected_ids == ["tool-1"] + + +def test_feature_scoring_router_fallback_when_precondition_missing(): + router = FeatureScoringRouter() + tool = CandidateObject( + id="tool-1", + type="tool", + title="t1", + summary="s1", + confidence=0.9, + success_rate=0.9, + freshness=0.9, + cost=0.0, + risk=0.0, + preconditions=["memory"], + ) + decision = router.choose( + TaskContext(user_input="run tool"), + memory_candidates=[], + skill_candidates=[], + tool_candidates=[tool], + ) + assert decision.decision_type == "call_tool" + assert decision.selected_ids == ["tool-1"] diff --git a/tests/test_router_protocol.py b/tests/test_router_protocol.py new file mode 100644 index 0000000..7475110 --- /dev/null +++ b/tests/test_router_protocol.py @@ -0,0 +1,12 @@ +from memabra.router 
import ( + FeatureScoringRouter, + RouterProtocol, + RuleBasedRouter, + SimpleLearningRouter, +) + + +def test_all_router_implementations_conform_to_router_protocol(): + assert isinstance(RuleBasedRouter(), RouterProtocol) + assert isinstance(FeatureScoringRouter(), RouterProtocol) + assert isinstance(SimpleLearningRouter(), RouterProtocol) diff --git a/tests/test_router_smoke.py b/tests/test_router_smoke.py new file mode 100644 index 0000000..08ae048 --- /dev/null +++ b/tests/test_router_smoke.py @@ -0,0 +1,25 @@ +from memabra.candidate_types import CandidateObject +from memabra.router import RuleBasedRouter, TaskContext + + +def test_router_prefers_memory_for_preference_queries(): + router = RuleBasedRouter() + decision = router.choose( + TaskContext(user_input="Remember my preferred deployment region"), + memory_candidates=[ + CandidateObject( + id="mem-1", + type="memory", + title="Preferred region", + summary="User prefers us-west-2", + confidence=0.9, + freshness=0.8, + success_rate=0.9, + ) + ], + skill_candidates=[], + tool_candidates=[], + ) + + assert decision.decision_type == "inject_memory" + assert decision.selected_ids == ["mem-1"] diff --git a/tests/test_router_versioning.py b/tests/test_router_versioning.py new file mode 100644 index 0000000..367293c --- /dev/null +++ b/tests/test_router_versioning.py @@ -0,0 +1,115 @@ +import json +from pathlib import Path + +from memabra.router import SimpleLearningRouter +from memabra.router_versioning import RouterVersionStore + + +def test_save_and_load_router_version(tmp_path): + store = RouterVersionStore(base_dir=tmp_path) + router = SimpleLearningRouter() + router._weights = {"call_tool": {"input_length": 0.5, "tool_count": 1.2}} + router._feature_keys = ["input_length", "tool_count"] + + store.save(router, version_id="v1", metadata={"avg_reward": 0.75}) + loaded = store.load("v1") + + assert loaded._weights == router._weights + assert loaded._feature_keys == router._feature_keys + + +def 
test_list_versions_returns_metadata(tmp_path): + store = RouterVersionStore(base_dir=tmp_path) + router = SimpleLearningRouter() + router._weights = {"inject_memory": {"memory_count": 0.8}} + router._feature_keys = ["memory_count"] + + store.save(router, version_id="v1", metadata={"avg_reward": 0.75}) + store.save(router, version_id="v2", metadata={"avg_reward": 0.82}) + + versions = store.list_versions() + assert len(versions) == 2 + assert versions[0]["version_id"] == "v1" + assert versions[0]["metadata"]["avg_reward"] == 0.75 + assert versions[1]["version_id"] == "v2" + assert versions[1]["metadata"]["avg_reward"] == 0.82 + + +def test_rollback_changes_current_version(tmp_path): + store = RouterVersionStore(base_dir=tmp_path) + router = SimpleLearningRouter() + router._weights = {"a": {"x": 1.0}} + router._feature_keys = ["x"] + + store.save(router, version_id="v1") + store.save(router, version_id="v2") + assert store.get_current()["current_version_id"] == "v2" + + store.rollback("v1") + current = store.get_current() + assert current["current_version_id"] == "v1" + assert current.get("rollback_from") == "v2" + assert "rolled_back_at" in current + + +def test_save_tracks_active_router_metadata(tmp_path): + store = RouterVersionStore(base_dir=tmp_path) + router = SimpleLearningRouter() + router._weights = {"a": {"x": 1.0}} + router._feature_keys = ["x"] + + store.save( + router, + version_id="v1", + metadata={"promotion_source": "online_learning", "benchmark_summary": {"reward_delta": 0.1}}, + ) + + current = store.get_current() + assert current["current_version_id"] == "v1" + assert current["promotion_source"] == "online_learning" + assert current["benchmark_summary"]["reward_delta"] == 0.1 + assert current.get("prior_version_id") is None + + +def test_save_records_prior_version_id(tmp_path): + store = RouterVersionStore(base_dir=tmp_path) + router = SimpleLearningRouter() + router._weights = {"a": {"x": 1.0}} + router._feature_keys = ["x"] + + store.save(router, 
version_id="v1") + store.save(router, version_id="v2") + + current = store.get_current() + assert current["current_version_id"] == "v2" + assert current["prior_version_id"] == "v1" + + +def test_load_without_version_uses_current(tmp_path): + store = RouterVersionStore(base_dir=tmp_path) + router = SimpleLearningRouter() + router._weights = {"call_tool": {"input_length": 0.5}} + router._feature_keys = ["input_length"] + + store.save(router, version_id="v1") + loaded = store.load() + + assert loaded._weights == router._weights + + +def test_app_save_and_load_learning_router(tmp_path): + from memabra.app import MemabraApp, build_demo_app + + app = build_demo_app(base_dir=tmp_path / "artifacts") + router = SimpleLearningRouter() + router._weights = {"clarify": {"input_length": 0.1}} + router._feature_keys = ["input_length"] + app.runner.router = router + + version_dir = tmp_path / "router-versions" + app.save_learning_router(version_id="v-test", base_dir=version_dir, metadata={"note": "test"}) + loaded_app = build_demo_app(base_dir=tmp_path / "artifacts") + loaded_app.load_learning_router(version_id="v-test", base_dir=version_dir) + + assert loaded_app.runner.router._weights == router._weights + assert loaded_app.runner.router._feature_keys == router._feature_keys diff --git a/tests/test_runner.py b/tests/test_runner.py new file mode 100644 index 0000000..bf63247 --- /dev/null +++ b/tests/test_runner.py @@ -0,0 +1,96 @@ +from memabra.candidate_types import CandidateObject +from memabra.retrieval import CandidateRetriever, InMemoryCandidateProvider +from memabra.router import RuleBasedRouter, TaskContext +from memabra.runner import MemabraRunner +from memabra.schemas import SchemaRegistry + + +def test_runner_produces_valid_draft_trajectory(): + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="memory", + candidates=[ + CandidateObject( + id="mem-1", + type="memory", + title="Output preference", + summary="Prefer plain text on 
Telegram.", + triggers=["telegram", "preference"], + confidence=0.9, + success_rate=0.8, + freshness=0.9, + tags=["output"], + ) + ], + ) + ] + ) + runner = MemabraRunner(retriever=retriever, router=RuleBasedRouter()) + + trajectory = runner.run( + context=TaskContext( + user_input="Use my telegram preference for this answer.", + conversation_summary="User often cares about output formatting.", + ), + channel="telegram", + user_id="oza", + ) + + SchemaRegistry().validate_trajectory(trajectory) + assert trajectory["decisions"][0]["decision_type"] == "inject_memory" + assert trajectory["candidate_sets"]["memory"][0]["id"] == "mem-1" + assert len(trajectory["events"]) == 3 + + +def test_runner_injects_episodic_candidate_when_case_index_matches(tmp_path): + from memabra.case_index import CaseIndex + from memabra.persistence import PersistenceStore + + store = PersistenceStore(base_dir=tmp_path / "artifacts") + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="memory", + candidates=[], + ), + InMemoryCandidateProvider( + candidate_type="skill", + candidates=[], + ), + InMemoryCandidateProvider( + candidate_type="tool", + candidates=[], + ), + ] + ) + runner = MemabraRunner(retriever=retriever, router=RuleBasedRouter(), persistence_store=store) + + # First run creates a trajectory + trajectory1 = runner.run( + context=TaskContext(user_input="Hello world"), + channel="local", + persist=True, + ) + + # Build case index from the trajectory + case_index = CaseIndex() + case_index.add(trajectory1) + + # Second run with case index should inject an episodic candidate + runner_with_case = MemabraRunner( + retriever=retriever, + router=RuleBasedRouter(), + persistence_store=store, + case_index=case_index, + ) + trajectory2 = runner_with_case.run( + context=TaskContext(user_input="Hello world"), + channel="local", + persist=True, + ) + + memory_candidates = trajectory2["candidate_sets"]["memory"] + assert any(c["id"].startswith("episodic-") for c 
in memory_candidates) + # With a persistence store, the runner should generate a rich episodic summary + assert any("Task:" in c["summary"] for c in memory_candidates) diff --git a/tests/test_schemas.py b/tests/test_schemas.py new file mode 100644 index 0000000..d53fb10 --- /dev/null +++ b/tests/test_schemas.py @@ -0,0 +1,30 @@ +import pytest + +from memabra.schemas import SchemaRegistry, SchemaValidationError + + +EXAMPLE_TRAJECTORY = "docs/examples/trajectory_success_memory.json" + + +def test_schema_registry_validates_example_trajectory(): + registry = SchemaRegistry() + with open(EXAMPLE_TRAJECTORY, "r", encoding="utf-8") as f: + example = __import__("json").load(f) + registry.validate_trajectory(example) + + +def test_schema_registry_rejects_missing_required_keys(): + registry = SchemaRegistry() + with pytest.raises(SchemaValidationError): + registry.validate_trajectory({"trajectory_id": "oops"}) + + +def test_no_resource_warning_from_schema_validation(): + import warnings + + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always", ResourceWarning) + test_schema_registry_validates_example_trajectory() + + resource_warnings = [x for x in w if issubclass(x.category, ResourceWarning)] + assert len(resource_warnings) == 0 diff --git a/tests/test_skill_adapters.py b/tests/test_skill_adapters.py new file mode 100644 index 0000000..baa6d23 --- /dev/null +++ b/tests/test_skill_adapters.py @@ -0,0 +1,107 @@ +from pathlib import Path + +from memabra.candidate_types import CandidateObject +from memabra.execution import ExecutionEngine, FileSystemSkillBackend, SkillExecutor +from memabra.retrieval import CandidateRetriever, InMemoryCandidateProvider +from memabra.router import RouteDecision, RuleBasedRouter, TaskContext +from memabra.runner import MemabraRunner + + +def test_filesystem_skill_backend_loads_skill_from_directory(tmp_path: Path): + skill_dir = tmp_path / "category-a" / "skill-demo" + skill_dir.mkdir(parents=True) + skill_file = 
skill_dir / "SKILL.md" + skill_file.write_text( + "---\n" + "name: skill-demo\n" + "description: A demo skill for testing.\n" + "version: 1.0.0\n" + "---\n\n" + "# Demo Skill\n\n" + "This is the demo skill body.\n" + ) + + backend = FileSystemSkillBackend(search_paths=[tmp_path]) + payload = backend.load_skill("skill-demo") + + assert payload["skill_id"] == "skill-demo" + assert payload["name"] == "skill-demo" + assert payload["description"] == "A demo skill for testing." + assert "This is the demo skill body." in payload["content"] + + +def test_filesystem_skill_backend_returns_error_for_missing_skill(tmp_path: Path): + backend = FileSystemSkillBackend(search_paths=[tmp_path]) + payload = backend.load_skill("nonexistent") + + assert payload["skill_id"] == "nonexistent" + assert payload["status"] == "error" + assert "not found" in payload["error"].lower() + + +def test_skill_executor_uses_filesystem_backend_to_load_payload(tmp_path: Path): + skill_dir = tmp_path / "ops" / "skill-deploy" + skill_dir.mkdir(parents=True) + skill_file = skill_dir / "SKILL.md" + skill_file.write_text( + "---\n" + "name: skill-deploy\n" + "description: Deploy workflow skill.\n" + "---\n\n" + "# Deploy Workflow\n\n" + "1. Build\n2. Test\n3. Deploy\n" + ) + + backend = FileSystemSkillBackend(search_paths=[tmp_path]) + executor = SkillExecutor(backend=backend) + decision = RouteDecision(decision_type="load_skill", selected_ids=["skill-deploy"]) + result = executor.execute(decision, TaskContext(user_input="deploy"), trajectory_id="traj-1") + + assert result.status == "executed" + assert result.details["payloads"][0]["name"] == "skill-deploy" + assert "1. 
Build" in result.details["payloads"][0]["content"] + assert any(event.event_type == "skill_loaded" for event in result.events) + + +def test_execution_engine_runs_skill_path_end_to_end(tmp_path: Path): + skill_dir = tmp_path / "ops" / "skill-deploy" + skill_dir.mkdir(parents=True) + (skill_dir / "SKILL.md").write_text( + "---\n" + "name: skill-deploy\n" + "description: Deploy workflow skill.\n" + "---\n\n" + "Deploy steps here.\n" + ) + + retriever = CandidateRetriever( + [ + InMemoryCandidateProvider( + candidate_type="skill", + candidates=[ + CandidateObject( + id="skill-deploy", + type="skill", + title="deploy workflow", + summary="Reusable deployment procedure.", + triggers=["deploy", "workflow"], + confidence=0.9, + success_rate=0.95, + freshness=0.8, + ) + ], + ) + ] + ) + runner = MemabraRunner( + retriever=retriever, + router=RuleBasedRouter(), + execution_engine=ExecutionEngine(skill_backend=FileSystemSkillBackend(search_paths=[tmp_path])), + ) + + trajectory = runner.run(context=TaskContext(user_input="Deploy this service with the usual workflow.")) + + skill_events = [event for event in trajectory["events"] if event["event_type"] == "skill_loaded"] + assert skill_events + assert skill_events[0]["payload"]["name"] == "skill-deploy" + assert "Deploy steps here." 
in skill_events[0]["payload"]["content"] diff --git a/tests/test_tool_adapters.py b/tests/test_tool_adapters.py new file mode 100644 index 0000000..e1cdd1a --- /dev/null +++ b/tests/test_tool_adapters.py @@ -0,0 +1,66 @@ +from memabra.router import TaskContext + + +def test_local_function_tool_adapter_executes_callable(): + from memabra.execution import LocalFunctionToolAdapter + + def add(a: int, b: int) -> int: + return a + b + + adapter = LocalFunctionToolAdapter(func=add) + result = adapter.run_tool("add", TaskContext(user_input="add 1 and 2"), {"a": 1, "b": 2}) + + assert result["status"] == "success" + assert result["output"] == 3 + assert result["error"] is None + + +def test_subprocess_tool_adapter_executes_command(): + from memabra.execution import SubprocessToolAdapter + + adapter = SubprocessToolAdapter(command="echo hello") + result = adapter.run_tool("echo", TaskContext(user_input="say hello")) + + assert result["status"] == "success" + assert "hello" in result["output"] + assert result["error"] is None + assert result["latency_ms"] >= 0 + + +def test_tool_registry_resolves_and_runs_tools(): + from memabra.execution import LocalFunctionToolAdapter, ToolRegistry + + registry = ToolRegistry() + registry.register("double", LocalFunctionToolAdapter(func=lambda x: x * 2)) + + result = registry.run_tool("double", TaskContext(user_input="double 5"), {"x": 5}) + + assert result["status"] == "success" + assert result["output"] == 10 + + +def test_tool_registry_returns_error_for_unknown_tool(): + from memabra.execution import ToolRegistry + + registry = ToolRegistry() + result = registry.run_tool("missing", TaskContext(user_input="missing")) + + assert result["status"] == "error" + assert "not found" in result["error"].lower() + + +def test_tool_executor_uses_registry_and_produces_result_events(): + from memabra.execution import ToolExecutor, ToolRegistry, LocalFunctionToolAdapter + from memabra.router import RouteDecision + + registry = ToolRegistry() + 
registry.register("add", LocalFunctionToolAdapter(func=lambda a, b: a + b)) + + executor = ToolExecutor(backend=registry) + decision = RouteDecision(decision_type="call_tool", selected_ids=["add"], selected_payloads=[{"a": 2, "b": 3}]) + result = executor.execute(decision, TaskContext(user_input="add 2 and 3"), trajectory_id="traj-1") + + assert result.status == "executed" + assert result.details["results"][0]["output"] == 5 + assert any(event.event_type == "tool_called" for event in result.events) + assert any(event.event_type == "tool_result" for event in result.events) diff --git a/tests/test_training_reports.py b/tests/test_training_reports.py new file mode 100644 index 0000000..b6f358c --- /dev/null +++ b/tests/test_training_reports.py @@ -0,0 +1,74 @@ +from __future__ import annotations + +from datetime import datetime, timezone + +from memabra.evaluator import EvaluationResult +from memabra.promotion import PromotionDecision, PromotionPolicy +from memabra.training_reports import TrainingReportStore, build_report + + +def test_build_report_includes_all_required_fields(): + baseline = EvaluationResult(task_count=2, avg_reward=0.5, error_rate=0.1, avg_latency_ms=50.0) + challenger = EvaluationResult(task_count=2, avg_reward=0.6, error_rate=0.05, avg_latency_ms=45.0) + decision = PromotionDecision(accepted=True, reasons=[], metrics={"reward_delta": 0.1}) + + report = build_report( + source_trajectory_ids=["t1", "t2"], + baseline=baseline, + challenger=challenger, + decision=decision, + promoted_version_id="v-2026", + ) + + assert report["source_trajectory_ids"] == ["t1", "t2"] + assert report["sample_count"] == 2 + assert "timestamp" in report + assert report["promoted_version_id"] == "v-2026" + assert report["baseline_metrics"]["avg_reward"] == 0.5 + assert report["challenger_metrics"]["avg_reward"] == 0.6 + assert report["promotion_decision"]["accepted"] is True + + +def test_training_report_store_save_and_list(tmp_path): + store = 
TrainingReportStore(base_dir=tmp_path / "reports") + report = build_report( + source_trajectory_ids=["t1"], + baseline=EvaluationResult(task_count=1, avg_reward=0.5, error_rate=0.0, avg_latency_ms=10.0), + challenger=EvaluationResult(task_count=1, avg_reward=0.6, error_rate=0.0, avg_latency_ms=10.0), + decision=PromotionDecision(accepted=False, reasons=["reward too low"], metrics={}), + ) + + saved = store.save(report) + reports = store.list_reports() + + assert len(reports) == 1 + assert reports[0]["report_id"] == saved["report_id"] + assert reports[0]["promotion_decision"]["accepted"] is False + + +def test_training_report_store_get_report_returns_specific_report(tmp_path): + from memabra.training_reports import TrainingReportStore, build_report + from memabra.evaluator import EvaluationResult + from memabra.promotion import PromotionDecision + + store = TrainingReportStore(base_dir=tmp_path) + report = build_report( + source_trajectory_ids=["t1", "t2"], + baseline=EvaluationResult(task_count=1, trajectories=[], avg_reward=0.5, error_rate=0.0, avg_latency_ms=10.0, decision_distribution={}), + challenger=EvaluationResult(task_count=1, trajectories=[], avg_reward=0.6, error_rate=0.0, avg_latency_ms=10.0, decision_distribution={}), + decision=PromotionDecision(accepted=True, reasons=[], metrics={}), + promoted_version_id="v1", + ) + store.save(report) + + fetched = store.get_report(report["report_id"]) + assert fetched is not None + assert fetched["report_id"] == report["report_id"] + assert fetched["promoted_version_id"] == "v1" + + +def test_training_report_store_get_report_missing_returns_none(tmp_path): + from memabra.training_reports import TrainingReportStore + + store = TrainingReportStore(base_dir=tmp_path) + assert store.get_report("nonexistent") is None diff --git a/tests/test_trajectory_summary.py b/tests/test_trajectory_summary.py new file mode 100644 index 0000000..2f91623 --- /dev/null +++ b/tests/test_trajectory_summary.py @@ -0,0 +1,58 @@ +from 
memabra.trajectory_summary import TrajectorySummarizer + + +def test_summarize_direct_answer_success(): + summarizer = TrajectorySummarizer() + trajectory = { + "task": {"input": "What is 2+2?"}, + "decisions": [{"decision_type": "direct_answer"}], + "outcome": {"status": "success", "steps": 1, "tool_errors": 0, "user_corrections": 0}, + "reward": {"total": 1.0}, + } + summary = summarizer.summarize(trajectory) + assert "Task: 'What is 2+2?'" in summary + assert "Actions: direct_answer" in summary + assert "Outcome: success (reward=1.0, steps=1)" in summary + + +def test_summarize_multi_step_with_tool_errors(): + summarizer = TrajectorySummarizer() + trajectory = { + "task": {"input": "Run analysis"}, + "decisions": [ + {"decision_type": "clarify"}, + {"decision_type": "call_tool"}, + {"decision_type": "direct_answer"}, + ], + "outcome": {"status": "partial_success", "steps": 3, "tool_errors": 1, "user_corrections": 1}, + "reward": {"total": 0.5}, + } + summary = summarizer.summarize(trajectory) + assert "Actions: clarify -> call_tool -> direct_answer" in summary + assert "Outcome: partial_success (reward=0.5, steps=3)" in summary + assert "Tool errors: 1" in summary + assert "User corrections: 1" in summary + + +def test_summarize_truncates_long_input(): + summarizer = TrajectorySummarizer() + long_input = "a" * 100 + trajectory = { + "task": {"input": long_input}, + "decisions": [{"decision_type": "direct_answer"}], + "outcome": {"status": "success", "steps": 1, "tool_errors": 0, "user_corrections": 0}, + "reward": {"total": 0.9}, + } + summary = summarizer.summarize(trajectory) + assert "Task: '" in summary + assert "..." in summary + assert len(summary) < 300 + + +def test_summarize_handles_missing_fields_gracefully(): + summarizer = TrajectorySummarizer() + trajectory = {} + summary = summarizer.summarize(trajectory) + assert "Task: ''" in summary + assert "Actions: none" in summary + assert "Outcome: unknown (reward=0.0, steps=0)" in summary
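The tests in `tests/test_trajectory_summary.py` above pin down the summary format but not the implementation, which is not part of this excerpt. Below is a minimal sketch of a function that would produce strings satisfying those assertions. The function name `summarize_trajectory`, the 60-character truncation limit, and the ` | ` separator are illustrative assumptions, not taken from `memabra.trajectory_summary`.

```python
from typing import Any, Mapping

# Assumed limit: the tests only require that long inputs contain "..." and
# that the whole summary stays under 300 characters.
MAX_INPUT_CHARS = 60


def summarize_trajectory(trajectory: Mapping[str, Any]) -> str:
    """Hypothetical stand-in for TrajectorySummarizer.summarize()."""
    # Missing sections fall back to empty values so an empty dict still works.
    task_input = str(trajectory.get("task", {}).get("input", ""))
    if len(task_input) > MAX_INPUT_CHARS:
        task_input = task_input[:MAX_INPUT_CHARS] + "..."

    decisions = trajectory.get("decisions", [])
    actions = " -> ".join(d.get("decision_type", "unknown") for d in decisions) or "none"

    outcome = trajectory.get("outcome", {})
    status = outcome.get("status", "unknown")
    steps = outcome.get("steps", 0)
    reward = float(trajectory.get("reward", {}).get("total", 0.0))

    parts = [
        f"Task: '{task_input}'",
        f"Actions: {actions}",
        f"Outcome: {status} (reward={reward}, steps={steps})",
    ]
    # Error and correction counts are only reported when non-zero.
    if outcome.get("tool_errors", 0):
        parts.append(f"Tool errors: {outcome['tool_errors']}")
    if outcome.get("user_corrections", 0):
        parts.append(f"User corrections: {outcome['user_corrections']}")
    return " | ".join(parts)
```

Defaulting every lookup keeps the empty-trajectory case from raising, which is the behaviour `test_summarize_handles_missing_fields_gracefully` expects; the real summarizer may achieve this differently.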