Initial standalone memabra release
84
README.md
Normal file
@@ -0,0 +1,84 @@
# memabra

An intuition-driven control plane for agent memory and action selection.

## What is memabra?

memabra is a local-first, observable, trainable, and replayable agent memory and action orchestration system.

Instead of being a simple memory database, memabra acts as a meta-cognitive controller for agents: given a task, it quickly decides whether to answer directly, recall memory, load a skill, or invoke a tool, and it continuously improves this judgment based on task outcomes.

## Install

```bash
git clone https://github.com/TacitLab/memabra.git
cd memabra
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

## Quick start

### 1. See the available commands

```bash
memabra --help
```

### 2. Run a dry-run evaluation

A safe way to see the full workflow without actually promoting a new router version:

```bash
memabra run --dry-run --format text
```

### 3. Check system status

```bash
memabra status --format text
```

### 4. List saved router versions

```bash
memabra version list --format text
```

### 5. Roll back to a previous version

```bash
memabra version rollback <version-id> --format text
```

## CLI subcommands

| Command | Description |
|---------|-------------|
| `memabra run` | Run the online learning workflow |
| `memabra status` | Show current system state |
| `memabra version list` | List all saved router versions |
| `memabra version rollback <id>` | Roll back to a specific version |

## Text output format

By default, memabra prints JSON. For operator-friendly summaries, add `--format text`:

- **Status**: current version, trajectory/report counts, latest report timing and promotion outcome.
- **Version list**: total count, current active version highlighted.
- **Workflow**: grouped into Summary, Baseline, Challenger, Deltas, and Decision sections with normalized `yes/no` flags and fixed-precision metrics.

## Running tests

```bash
pytest tests/ -q
```

## Project status

See [docs/PROGRESS.md](docs/PROGRESS.md) for a detailed capability roadmap and [docs/DEMO.md](docs/DEMO.md) for walkthrough examples.

## License

MIT
252
docs/ALPHA_ITERATION_1_PLAN.md
Normal file
@@ -0,0 +1,252 @@
# memabra Alpha Iteration 1 Plan

> For Hermes: continue this plan autonomously in small TDD-driven increments. Each run should complete one or more concrete tasks, update this file's progress section, run targeted tests first, then run the full memabra test suite.

Goal: turn memabra from a showable prototype into a safe self-improving alpha by adding an online learning loop with automatic training, evaluation, gated promotion, and rollback-safe router deployment.

Architecture:
- Keep the current layered design.
- Do not replace existing routers; add an orchestration layer around them.
- Promotion must be benchmark-gated: no automatic router switch without passing evaluation thresholds.
- Persist every training/promotion attempt as an auditable artifact.

Tech stack:
- Existing memabra Python package under `src/memabra/`
- Existing pytest suite under `tests/memabra/`
- Existing persistence via JSON artifacts; keep it simple for alpha

---

## Acceptance criteria

Alpha Iteration 1 is complete when memabra can:
1. detect newly accumulated trajectories
2. build a training dataset from eligible trajectories
3. train a challenger router automatically
4. run challenger vs baseline on a fixed benchmark set
5. promote challenger only if thresholds are met
6. save a versioned promoted router
7. keep an auditable training/promotion report
8. leave the currently active router unchanged when challenger loses

---

## Implementation phases

### Phase A: Benchmark-gated online learning loop

#### Task A1: Add a promotion policy object
Objective: define explicit acceptance rules for promoting a challenger router.

Files:
- Create: `src/memabra/promotion.py`
- Create: `tests/memabra/test_promotion.py`

Required behavior:
- Define a `PromotionPolicy` dataclass
- Inputs should include at least:
  - `min_reward_delta`
  - `max_error_rate_increase`
  - `max_latency_increase_ms`
  - `required_task_count`
- Provide `evaluate(baseline, challenger) -> PromotionDecision`
- `PromotionDecision` should include:
  - `accepted: bool`
  - `reasons: list[str]`
  - `metrics: dict`

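The required behavior for Task A1 could look roughly like the sketch below. It is not the contents of `src/memabra/promotion.py`: the metric keys (`avg_reward`, `error_rate`, `avg_latency_ms`, `task_count`), the default thresholds, and the reason strings are all assumptions chosen for illustration. Only the class names, the four policy fields, and the `evaluate` signature come from the plan.

```python
from dataclasses import dataclass, field


@dataclass
class PromotionDecision:
    accepted: bool
    reasons: list[str] = field(default_factory=list)
    metrics: dict = field(default_factory=dict)


@dataclass
class PromotionPolicy:
    # Default thresholds are placeholders, not values taken from the project.
    min_reward_delta: float = 0.01
    max_error_rate_increase: float = 0.05
    max_latency_increase_ms: float = 100.0
    required_task_count: int = 10

    def evaluate(self, baseline: dict, challenger: dict) -> PromotionDecision:
        # Assumed metric keys: avg_reward, error_rate, avg_latency_ms, task_count.
        metrics = {
            "reward_delta": challenger["avg_reward"] - baseline["avg_reward"],
            "error_rate_delta": challenger["error_rate"] - baseline["error_rate"],
            "latency_delta_ms": challenger["avg_latency_ms"] - baseline["avg_latency_ms"],
            "task_count": challenger["task_count"],
        }
        reasons = []
        if metrics["task_count"] < self.required_task_count:
            reasons.append("benchmark ran too few tasks")
        if metrics["reward_delta"] < self.min_reward_delta:
            reasons.append("reward gain below threshold")
        if metrics["error_rate_delta"] > self.max_error_rate_increase:
            reasons.append("error rate increased too much")
        if metrics["latency_delta_ms"] > self.max_latency_increase_ms:
            reasons.append("latency increased too much")
        # Accept only when no gate was violated.
        return PromotionDecision(accepted=not reasons, reasons=reasons, metrics=metrics)
```

Collecting every violated gate in `reasons`, rather than stopping at the first one, keeps a rejected run's report useful for auditing.
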
TDD steps:
1. Write failing tests for accepted and rejected cases.
2. Run targeted tests and verify failure.
3. Implement minimal policy logic.
4. Re-run targeted tests.
5. Re-run full memabra suite.

#### Task A2: Add benchmark suite persistence
Objective: store and load a fixed benchmark task set for repeatable evaluations.

Files:
- Create: `src/memabra/benchmarks.py`
- Create: `tests/memabra/test_benchmarks.py`

Required behavior:
- Define a serializable benchmark suite format
- Load/save benchmark tasks from JSON
- Provide a default benchmark seed for memory/tool/skill/composite coverage

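A JSON round trip for the benchmark suite might be sketched as follows. The field names (`task_id`, `prompt`, `category`), the `BenchmarkSuite` wrapper, and the helper names `save_suite`, `load_suite`, and `default_seed` are hypothetical; the prompts in the seed are borrowed from the demo prompts elsewhere in this commit, except the composite one, which is invented here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class BenchmarkTask:
    # Field names are illustrative assumptions, not the project's schema.
    task_id: str
    prompt: str
    category: str  # "memory", "tool", "skill", or "composite"


@dataclass
class BenchmarkSuite:
    name: str
    tasks: list[BenchmarkTask] = field(default_factory=list)


def save_suite(suite: BenchmarkSuite, path: Path) -> None:
    # asdict() recurses into the nested BenchmarkTask dataclasses.
    path.write_text(json.dumps(asdict(suite), indent=2), encoding="utf-8")


def load_suite(path: Path) -> BenchmarkSuite:
    raw = json.loads(path.read_text(encoding="utf-8"))
    return BenchmarkSuite(name=raw["name"], tasks=[BenchmarkTask(**t) for t in raw["tasks"]])


def default_seed() -> BenchmarkSuite:
    # One task per coverage area named in the plan.
    return BenchmarkSuite(
        name="default",
        tasks=[
            BenchmarkTask("b-memory", "Use my telegram preference for this answer.", "memory"),
            BenchmarkTask("b-tool", "Check the current system status.", "tool"),
            BenchmarkTask("b-skill", "Deploy this service with the usual workflow.", "skill"),
            BenchmarkTask("b-composite", "Deploy, then report the system status.", "composite"),
        ],
    )


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    suite_path = Path(tmp) / "benchmark-suite.json"
    save_suite(default_seed(), suite_path)
    restored = load_suite(suite_path)
```
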
TDD steps:
1. Write failing benchmark round-trip tests.
2. Verify RED.
3. Implement load/save helpers.
4. Verify GREEN.
5. Run full suite.

#### Task A3: Add online training coordinator
Objective: orchestrate dataset selection, training, evaluation, and promotion.

Files:
- Create: `src/memabra/online_learning.py`
- Create: `tests/memabra/test_online_learning.py`

Required behavior:
- Define `OnlineLearningCoordinator`
- It should:
  - query trajectories from `ArtifactIndex`
  - enforce minimum new trajectory count
  - train a challenger with `DatasetBuilder`
  - evaluate challenger with `Evaluator`
  - apply `PromotionPolicy`
  - save promoted routers via `RouterVersionStore`
  - emit a structured report whether accepted or rejected

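The control flow of one coordinator cycle can be sketched without committing to any real API. In the sketch below every collaborator is a plain callable standing in for the named component; `CycleSketch`, its field names, and the report keys are hypothetical and do not describe the actual `OnlineLearningCoordinator`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CycleSketch:
    # Each callable is a hypothetical stand-in for the real collaborator.
    list_trajectory_ids: Callable[[], list]   # role of ArtifactIndex
    train: Callable[[list], object]           # role of DatasetBuilder plus training
    evaluate: Callable[[object], dict]        # role of Evaluator
    accept: Callable[[dict, dict], bool]      # role of PromotionPolicy
    save_version: Callable[[object], str]     # role of RouterVersionStore
    baseline_metrics: dict
    min_new_trajectories: int = 3
    seen: set = field(default_factory=set)

    def run_cycle(self) -> dict:
        new_ids = [t for t in self.list_trajectory_ids() if t not in self.seen]
        if len(new_ids) < self.min_new_trajectories:
            # Too little new data: report the skip instead of training.
            return {"skipped": True, "promoted": False, "version_id": None}
        challenger = self.train(new_ids)
        challenger_metrics = self.evaluate(challenger)
        promoted = self.accept(self.baseline_metrics, challenger_metrics)
        version_id: Optional[str] = self.save_version(challenger) if promoted else None
        self.seen.update(new_ids)
        # A report is produced whether the challenger was accepted or rejected.
        return {
            "skipped": False,
            "promoted": promoted,
            "version_id": version_id,
            "sample_count": len(new_ids),
            "challenger_metrics": challenger_metrics,
        }
```

Injecting the collaborators makes the three test cases listed in the TDD steps (skip, reject, accept) easy to drive with small fakes.
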
TDD steps:
1. Write failing tests for:
   - skip when too few new trajectories
   - reject when policy fails
   - accept and save version when policy passes
2. Verify failure.
3. Implement minimal coordinator.
4. Verify targeted tests.
5. Run full suite.

### Phase B: Auditability and safe deployment

#### Task B1: Add training run reports
Objective: persist every online-learning attempt, not just successful promotions.

Files:
- Extend: `src/memabra/persistence.py` or create `src/memabra/training_reports.py`
- Create: `tests/memabra/test_training_reports.py`

Required behavior:
- Save a JSON report per training run
- Include:
  - timestamp
  - source trajectory ids
  - sample count
  - baseline metrics
  - challenger metrics
  - promotion decision
  - promoted version id if any

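A per-run report containing the fields listed for Task B1 could be written like this. The function name `save_training_report`, the JSON key names, and the file-naming scheme are assumptions for illustration; the real `TrainingReportStore` may differ.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_training_report(
    report_dir: Path,
    trajectory_ids: list,
    baseline_metrics: dict,
    challenger_metrics: dict,
    decision: dict,
    promoted_version_id: Optional[str] = None,
) -> Path:
    report_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    report = {
        "timestamp": now.isoformat(),
        "source_trajectory_ids": list(trajectory_ids),
        "sample_count": len(trajectory_ids),
        "baseline_metrics": baseline_metrics,
        "challenger_metrics": challenger_metrics,
        "promotion_decision": decision,
        # Stays None (JSON null) when the challenger was rejected.
        "promoted_version_id": promoted_version_id,
    }
    # Hypothetical naming scheme: one file per run, keyed by UTC timestamp.
    path = report_dir / f"report-{now.strftime('%Y%m%dT%H%M%S%f')}.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


# A rejected run still leaves a report behind.
with tempfile.TemporaryDirectory() as tmp:
    report_path = save_training_report(
        Path(tmp) / "training-reports",
        ["t1", "t2"],
        {"avg_reward": 0.5},
        {"avg_reward": 0.4},
        {"accepted": False, "reasons": ["reward gain below threshold"]},
    )
    saved_report = json.loads(report_path.read_text(encoding="utf-8"))
```
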
#### Task B2: Add active router metadata tracking
Objective: make it obvious which router is active and why.

Files:
- Extend: `src/memabra/router_versioning.py`
- Extend: `tests/memabra/test_router_versioning.py`

Required behavior:
- Track metadata for current active router
- Record promotion source, benchmark result summary, and prior version
- Make rollback preserve audit trail

### Phase C: Product surface and automation

#### Task C1: Add app-level online learning entrypoint
Objective: expose one-call retrain/evaluate/promote behavior from `MemabraApp`.

Files:
- Extend: `src/memabra/app.py`
- Extend: `tests/memabra/test_app.py`

Required behavior:
- Add a method like `run_online_learning_cycle(...)`
- Return a structured result dict/report

#### Task C2: Add CLI entrypoint for the alpha loop
Objective: make the safe online-learning loop runnable from the command line.

Files:
- Extend: `src/memabra/cli.py`
- Extend: `tests/memabra/test_cli_workflow.py`
- Update: `docs/projects/memabra/DEMO.md`

Required behavior:
- Add a callable workflow that:
  - seeds or uses existing artifacts
  - runs one online-learning cycle
  - prints the report JSON

#### Task C3: Update docs and wrap-up materials
Objective: document the alpha loop clearly.

Files:
- Update: `docs/projects/memabra/PROGRESS.md`
- Update: `docs/projects/memabra/ROADMAP.md`
- Update: `docs/projects/memabra/DEMO.md`
- Optional: create `docs/projects/memabra/ONLINE_LEARNING.md`

Required behavior:
- Explain promotion gates
- Explain how to run one cycle manually
- Explain where reports and versions are stored

---

## Suggested run order for autonomous 20-minute cycles

Cycle group 1:
- A1 promotion policy
- A2 benchmark suite persistence

Cycle group 2:
- A3 online training coordinator

Cycle group 3:
- B1 training run reports
- B2 active router metadata tracking

Cycle group 4:
- C1 app-level entrypoint
- C2 CLI workflow
- C3 docs cleanup

---

## Estimated autonomous runs

Recommended initial budget: 18 runs, one every 20 minutes.

Reasoning:
- 3 to 4 runs for Phase A
- 3 to 4 runs for Phase B
- 2 to 3 runs for Phase C
- remaining runs as slack for regression fixes, docs cleanup, and one or two extra quality passes

At 20 minutes per run, 18 runs gives about 6 hours of autonomous iteration, which is a reasonable overnight alpha push.

---

## Progress tracker

- [x] Task A1: promotion policy
- [x] Task A2: benchmark suite persistence
- [x] Task A3: online training coordinator
- [x] Task B1: training run reports
- [x] Task B2: active router metadata tracking
- [x] Task C1: app-level online learning entrypoint
- [x] Task C2: CLI online learning workflow
- [x] Task C3: docs cleanup and operator guidance
- [x] Task D1: baseline version selection for online learning
- [x] Task E1: task case index for episodic retrieval

## Run log

- 2026-04-14: Plan created. Ready for autonomous overnight execution.
- 2026-04-14 22:52 UTC: Completed Tasks A1–A3. Promotion policy, benchmark persistence, and online training coordinator implemented with tests. Full suite: 71 passed.
- 2026-04-14 23:22 UTC: Completed Tasks B1–C3. Training reports, active router metadata tracking, app/CLI entrypoints, and docs implemented with tests. Full suite: 78 passed.
- 2026-04-14 23:24 UTC: Quality pass: CLI `main()` now defaults to online-learning workflow, fixed schema test resource warning, added missing alpha module exports to package `__init__.py`. Full suite: 82 passed.
- 2026-04-14 23:50 UTC: Docs and repo hygiene pass: updated DEMO.md and ONLINE_LEARNING.md to reflect that `python -m src.memabra.cli` runs the online-learning workflow; added `docs/projects/memabra/demo-artifacts/` to `.gitignore`; verified CLI end-to-end (promoted=true, version saved, report emitted). Full suite: 82 passed.
- 2026-04-15 00:49 UTC: Safety and usability pass: added exception handling in `OnlineLearningCoordinator` so training/evaluation failures emit error reports instead of crashing; added CLI argument parsing (`--base-dir`, `--min-new-trajectories`); fixed `python -m src.memabra.cli` RuntimeWarning via lazy `cli` import; added `TrainingReportStore.get_report()` for by-id lookup; exported `BenchmarkTask` from package `__init__.py`; updated DEMO.md and ONLINE_LEARNING.md. Full suite: 88 passed.
- 2026-04-15 01:15 UTC: Repo hygiene and commit pass: verified end-to-end CLI workflow produced a promoted router, version, and report; updated `.gitignore` to exclude runtime artifact directories (`router-versions/`, `training-reports/`); committed entire memabra alpha codebase (67 files, 6,818 insertions). Full suite: 88 passed.
- 2026-04-15 02:00 UTC: Persistence pass: `OnlineLearningCoordinator` now supports `seen_trajectory_store` to persist seen trajectory IDs across restarts, preventing duplicate retraining in cron jobs. Added `test_coordinator_persists_seen_trajectory_ids_across_restarts`. Fixed evaluation leakage by refreshing the artifact index after benchmarking and marking post-evaluation trajectories as seen. Wired `seen_trajectory_store` through `app.py` and `cli.py`; CLI now defaults to `<base-dir>/seen-trajectories.json`. Added corresponding tests. Full suite: 91 passed.
- 2026-04-15 02:27 UTC: Dry-run pass: committed pending persistence-pass changes, then added `--dry-run` CLI flag and `dry_run` parameter through the full stack (`OnlineLearningCoordinator`, `app.py`, `cli.py`). In dry-run mode training and evaluation execute but promotion and version saving are skipped; an audit report is still emitted with `dry_run: true`. Added `test_coordinator_dry_run_does_not_promote_or_save_version` and `test_main_entrypoint_passes_dry_run_flag`. Updated `ONLINE_LEARNING.md`. Full suite: 93 passed.
- 2026-04-15 02:51 UTC: Baseline-version pass: added `baseline_version_id` parameter to `OnlineLearningCoordinator.run_cycle()`, `MemabraApp.run_online_learning_cycle()`, and CLI `--baseline-version` flag. This lets operators evaluate a challenger against a specific saved router version rather than the currently active one. Added tests for coordinator, app, and CLI. Updated `ONLINE_LEARNING.md`. Full suite: 96 passed.
- 2026-04-15 03:18 UTC: Verification pass: confirmed all tasks A1–D1 are complete and stable. Ran full memabra suite (96 passed) and end-to-end CLI workflow (promoted=true, version saved, report emitted). No code changes required; repo is clean and ready for operator review.
- 2026-04-15 04:02 UTC: Started Phase E: added `CaseIndex` (`src/memabra/case_index.py`) for task-level episodic retrieval. Maps normalized task inputs to the highest-reward trajectory ID, with JSON save/load. Added `tests/memabra/test_case_index.py` (4 tests). Full suite: 100 passed.
- 2026-04-15 04:27 UTC: Integrated `CaseIndex` into `MemabraApp` and `MemabraRunner` for episodic retrieval. Added app-level methods (`build_case_index`, `save_case_index`, `load_case_index`, `best_trajectory_for`). Runner now injects an episodic memory candidate when a case index hit occurs. Added CLI flags `--case-index` and `--rebuild-case-index`. Updated docs. Full suite: 107 passed.
- 2026-04-15 04:54 UTC: Added `case_index_path` support to `OnlineLearningCoordinator` so the case index is automatically rebuilt after each online-learning cycle (including benchmark-generated trajectories). Wired parameter through `app.py` and `cli.py`. Added tests for coordinator, app, and CLI. Full suite: 110 passed.
- 2026-04-15 05:18 UTC: Added `TrajectorySummarizer` (`src/memabra/trajectory_summary.py`) for generating human-readable trajectory summaries. Integrated summarizer into `MemabraRunner` so episodic memory candidates contain rich summaries when a `persistence_store` is available. Added `tests/memabra/test_trajectory_summary.py` (4 tests) and updated runner test. Full suite: 114 passed.
- 2026-04-15 05:42 UTC: Added CLI `--status` flag (`src/memabra/cli.py`) to print current system state (active router version, version count, trajectory count, report count, latest report summary) without running a learning cycle. Added `tests/memabra/test_cli_workflow.py::test_main_status_flag_prints_status_and_skips_workflow`. Full suite: 115 passed.
- 2026-04-15 06:05 UTC: Added CLI `--rollback` and `--list-versions` flags for operator-safe router version management. Added error handling for missing rollback targets (exits 1 with clean message). Added corresponding tests. Full suite: 118 passed. Updated `ONLINE_LEARNING.md` and `DEMO.md` documentation.
219
docs/ARCHITECTURE.md
Normal file
@@ -0,0 +1,219 @@
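# Architecture

## 1. Problem definition

The problem we want to solve is not "how to make the model remember more", but rather:
when an agent encounters a task, how can it quickly decide, under limited context, a limited tool budget, and limited time, whether to call on memory, a skill, or a tool, and how can that decision process be made trainable and correctable?

## 2. System overview

The system uses a four-layer architecture.

### 2.1 Retrieval Layer (candidate recall)
Inputs:
- The current user task
- A short summary of the conversation
- The current environment state
- Failure history / recent corrections

Outputs:
- top-k memory candidates
- top-k skill candidates
- top-k tool candidates

Responsibilities:
- Recall candidate objects from different sources
- Normalize them into a standard candidate format
- Make no final decision; only narrow the search space
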
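### 2.2 Policy Layer (intuition / routing)
Inputs:
- A representation of the current task
- The set of candidate objects
- Features of past selections
- Cost and risk signals

Outputs:
- Answer directly
- Read a specific memory
- Load a specific skill
- Call a specific tool
- A composite action (for example, skill first, then tool)
- Ask for clarification

Responsibilities:
- Simulate "intuition"
- Make a fast action selection first
- Later, upgrade step by step from rules to a classifier, reranker, bandit, or RL policy

### 2.3 Execution Layer
Responsibilities:
- Inject memory into the context
- Load skill instructions
- Call real tools
- Record execution steps, elapsed time, errors, and outputs

### 2.4 Evaluation Layer (feedback / attribution)
Responsibilities:
- Judge whether the task succeeded
- Analyze step count, retry count, error rate, and number of user corrections
- Decompose the reward
- Produce trainable trajectories

Without this layer there is no real "learning", only parameter tuning by guesswork.
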
## 3. Unified object model

Although memory, skill, and tool differ in nature, they can be unified as candidate objects during recall and routing:

```json
{
  "id": "string",
  "type": "memory|skill|tool",
  "title": "string",
  "summary": "string",
  "triggers": ["string"],
  "cost": 0.0,
  "confidence": 0.0,
  "success_rate": 0.0,
  "freshness": 0.0,
  "risk": 0.0,
  "embedding": "vector-ref",
  "tags": ["string"],
  "source": "user|system|generated|external"
}
```

Note: what is unified is the candidate interface, not the semantic ontology.
The three object types must keep their boundaries:
- memory stores facts
- skill stores procedures
- tool stores action capabilities

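The candidate schema above maps naturally onto a typed record. The sketch below mirrors the JSON field names; the class name `Candidate`, the defaults, and the validation in `__post_init__` are assumptions added for illustration and are not taken from the memabra source.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

CandidateType = Literal["memory", "skill", "tool"]
CandidateSource = Literal["user", "system", "generated", "external"]


@dataclass
class Candidate:
    id: str
    type: CandidateType
    title: str
    summary: str
    triggers: list[str] = field(default_factory=list)
    cost: float = 0.0
    confidence: float = 0.0
    success_rate: float = 0.0
    freshness: float = 0.0
    risk: float = 0.0
    embedding: str = ""  # a reference to a stored vector, not the vector itself
    tags: list[str] = field(default_factory=list)
    source: CandidateSource = "system"

    def __post_init__(self) -> None:
        # Literal types are not enforced at runtime, so check explicitly.
        if self.type not in get_args(CandidateType):
            raise ValueError(f"unknown candidate type: {self.type!r}")
        if self.source not in get_args(CandidateSource):
            raise ValueError(f"unknown candidate source: {self.source!r}")
```

Keeping `type` as a closed set of three values is one way to enforce the boundary rule: the interface is shared, but a candidate can never be "a bit of both".
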
## 4. Memory system layers

### 4.1 Semantic Memory (facts)
For example:
- User preferences
- Machine environment
- Project conventions
- API limits

### 4.2 Procedural Memory
That is, skills:
- The handling workflow for a class of tasks
- Lessons learned from past pitfalls
- Verification steps

### 4.3 Episodic Memory
- The concrete trajectory of a particular task
- Which resources were used at the time
- Why it succeeded or failed

### 4.4 Working Memory
- Temporary state of the current task
- Intermediate products of the current round of reasoning
- Should not be persisted directly as long-term memory

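## 5. Training strategy: external policy first, end-to-end later

### 5.1 Phase A: leave the base model weights unchanged
First train a small policy model that decides:
- Whether to consult memory
- Which kind of memory to consult
- Whether a skill is needed
- Which tool to use first

Possible implementations:
- Rules plus score fusion
- A lightweight classifier
- reranker
- contextual bandit

### 5.2 Phase B: learn reranking / routing from trajectories
Training inputs:
- Task context
- The set of candidate objects
- The action actually taken
- The resulting reward

Training objectives:
- Maximize the task completion rate
- Minimize wasted calls
- Reduce how often users must repeat information
- Reduce unnecessary context bloat

### 5.3 Phase C: end-to-end experiments
Worth considering only when all of the following hold:
- High-quality trajectory data already exists
- Credit assignment is feasible
- A stable offline evaluation environment exists
- Catastrophic forgetting can be controlled
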
## 6. Feedback & Reward design

The reward cannot look only at whether the task succeeded. It should be split into several terms:
- task_success: whether the task was ultimately completed
- efficiency: how many steps were used
- retrieval_hit: whether the key memory/skill/tool was hit
- user_correction_penalty: whether the user had to correct the agent
- tool_error_penalty: whether an invalid tool call was triggered
- context_cost_penalty: whether the context grew excessively
- latency_penalty: whether the run was too slow

These can be combined as:

```text
R = a*task_success + b*retrieval_hit - c*tool_error - d*user_correction - e*latency - f*context_cost
```

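The combined reward can be computed directly from the formula. In this sketch the weight values and the assumption that every signal is already normalized to a comparable scale are illustrative only; the document does not specify them. Note that the formula as written has no `efficiency` term, so the sketch omits it as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder weights; the document leaves a..f unspecified.
    a: float = 1.0   # task_success
    b: float = 0.5   # retrieval_hit
    c: float = 0.5   # tool_error
    d: float = 0.5   # user_correction
    e: float = 0.25  # latency
    f: float = 0.25  # context_cost


def total_reward(signals: dict, w: RewardWeights = RewardWeights()) -> float:
    # R = a*task_success + b*retrieval_hit - c*tool_error
    #     - d*user_correction - e*latency - f*context_cost
    return (
        w.a * signals["task_success"]
        + w.b * signals["retrieval_hit"]
        - w.c * signals["tool_error"]
        - w.d * signals["user_correction"]
        - w.e * signals["latency"]
        - w.f * signals["context_cost"]
    )
```

Storing the individual signals alongside the total, rather than only the total, keeps the decomposition available for later attribution.
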
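## 7. Key challenges

### 7.1 Credit Assignment
When a task succeeds, which component deserves the credit?
Counterfactual analysis requires recording the candidate set, the final selection, and the alternatives that were not chosen.

### 7.2 False Reinforcement
An incorrect memory that keeps getting hit reinforces itself.
This calls for:
- Confidence scores
- Revocability
- Time of most recent verification
- Source tracking

### 7.3 Exploitation vs Exploration
Always picking the most reliable object makes the system conservative, and it never learns new patterns.
A safe exploration mechanism is needed.
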
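The document does not say which exploration mechanism it has in mind. One common option, shown here purely as an illustration, is epsilon-greedy selection restricted to candidates below a risk cap, so that exploration never picks a high-risk object. The function name and the `epsilon` and `max_risk` parameters are hypothetical; the `risk` and `success_rate` keys follow the candidate schema in section 3.

```python
import random
from typing import Optional


def choose_candidate(
    candidates: list,
    epsilon: float = 0.1,
    max_risk: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Optional[dict]:
    if rng is None:
        rng = random.Random()
    # Safety gate: high-risk candidates are never eligible, even for exploration.
    safe = [c for c in candidates if c["risk"] <= max_risk]
    if not safe:
        return None
    # Explore: with probability epsilon, pick any safe candidate at random.
    if rng.random() < epsilon:
        return rng.choice(safe)
    # Exploit: otherwise pick the safe candidate with the best track record.
    return max(safe, key=lambda c: c["success_rate"])
```
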
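### 7.4 Type Boundary Collapse
If memory, skill, and tool are mixed into one large vector pool, the system becomes increasingly blurry.

## 8. Recommended MVP

### MVP-1: Observable system
- Define the object schema
- Define the event schema
- Record trajectories in a uniform way
- Implement basic retrieval
- Route with rules

### MVP-2: Lightweight learned routing
- Add a candidate scorer
- Train an action selector from high-quality trajectories
- Run offline replay evaluation

### MVP-3: Online adaptation
- Use bandit / preference updates
- Fine-tune the routing policy based on task outcomes

### MVP-4: End-to-end testbed
- Small-scale experimental training
- Compare against the layered approach
- Verify whether there is a real benefit

## 9. Core principles

1. Observable first, then learnable
2. Learn routing first, then learn the brain
3. Layered attribution first, then end-to-end optimization
4. Optimize "when to rely on what" rather than blindly optimizing for "the model looks smarter"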
94
docs/DECISIONS.md
Normal file
@@ -0,0 +1,94 @@
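# Design Decisions

## D-001: End-to-end training is not a first-phase goal

Decision:
The first phase adopts a layered architecture and does not directly train a black-box large model that maps tasks to actions.

Reasons:
- Feedback is sparse
- Credit assignment is hard
- With too little data, the model easily learns the wrong thing
- Interpretability is too poor, which makes debugging hard

Impact:
The project first builds the observability, logging, router, and reward layers.

## D-002: Unify memory, skill, and tool behind one candidate object interface without conflating their types

Decision:
During recall and ranking, the three share a unified candidate schema; during storage, execution, and evaluation, they keep strongly typed boundaries.

Reasons:
- Unified recall makes routing decisions easier
- Keeping type boundaries avoids semantic collapse

Impact:
Later schema design must support both unified features and type-specific fields.

## D-003: Memory is divided into four layers: facts / procedures / episodes / working

Decision:
The long-term system distinguishes at least:
- facts
- procedures
- episodes
- working memory

Reasons:
"Memory" is not one lump of text; effective human intuition comes from several memory systems working together.

Impact:
Every write must first be assigned to a layer rather than being pushed straight into a unified vector store.
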
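## D-004: Optimize the routing policy first, then consider learning the base model's internal weights

Decision:
The learning target is the external policy first, not the parameters of the foundation model.

Reasons:
- Small models are cheaper
- Training is more stable
- Experiment results are easier to compare
- Better suited to local deployment

Impact:
Router features, training samples, and an offline evaluation framework need dedicated design.

## D-005: The reward must be decomposed; a single task success/failure signal is not used

Decision:
The reward is split into factors such as success, efficiency, retrieval_hit, user_correction, tool_error, latency, and context_cost.

Reasons:
Looking only at task success hides many quality problems in intermediate behavior.

Impact:
Event-level logging is required; storing only the final answer is not enough.

## D-006: All learning is built on replayable trajectories

Decision:
Every policy update must be traceable to a complete trajectory.

Reasons:
Without replay, policy regressions cannot be diagnosed, and human audits are impossible.

Impact:
The trajectory schema and replay tooling become infrastructure, not optional extras.

## D-007: The project is officially named memabra

Decision:
The official project name is `memabra`.

Subtitle:
An intuition-driven control plane for agent memory and action selection.

Reasons:
- The project needs a short name that can be branded and spread
- The subtitle supplies the technical substance
- It avoids the old name misleading people into seeing the project as "just a memory management tool"

Impact:
All subsequent prototype code, documentation, schema identifiers, and demo materials use memabra consistently.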
148
docs/DEMO.md
Normal file
@@ -0,0 +1,148 @@
# Demo

memabra now has a polished wrap-up workflow in addition to the lower-level demo app.

## Quick run

If you installed the repo in editable mode, prefer the dedicated CLI command:

```bash
source venv/bin/activate
memabra
```

The legacy developer entrypoint still works too:

```bash
source venv/bin/activate
python -m src.memabra.cli
```

This runs the online-learning loop: it seeds demo tasks, trains a challenger router, evaluates it against a benchmark suite, promotes it if thresholds are met, and prints a JSON report.

You can override the default artifact directory and minimum trajectory threshold:

```bash
source venv/bin/activate
memabra run --base-dir /custom/artifacts --min-new-trajectories 5
```

You can also enable episodic retrieval by rebuilding the case index from saved trajectories:

```bash
source venv/bin/activate
memabra run --rebuild-case-index
```

You can check system status, list versions, or roll back without running a learning cycle:

```bash
source venv/bin/activate
memabra status
memabra version list
memabra version rollback 20260414-123456
```

If you want operator-friendly output instead of raw JSON, use `--format text`:

```bash
source venv/bin/activate
memabra status --format text
memabra version list --format text
memabra version rollback 20260414-123456 --format text
memabra run --dry-run --format text
```

The text formatter is aimed at operators: status output includes the latest report timing/outcome, version listings highlight the currently active router version, and workflow output is grouped into summary/baseline/challenger/deltas/decision sections with normalized yes/no and fixed-precision metrics.

You can also call it programmatically:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
result = run_online_learning_workflow()
print(result)
PY
```

The online-learning workflow will:
1. build a demo app
2. seed example tasks (if no trajectories exist yet)
3. run one online-learning cycle
4. train a challenger router
5. evaluate it against the baseline on a fixed benchmark suite
6. promote it only if the promotion policy accepts
7. persist a training report under `training-reports/`
8. print a JSON report

## Python API

```python
from src.memabra.cli import run_wrapup_workflow, run_online_learning_workflow

# Legacy wrap-up demo
result = run_wrapup_workflow()
print(result)

# Safe online-learning loop with benchmark-gated promotion
result = run_online_learning_workflow()
print(result)
```

## Lower-level demo app

You can still drive the app manually:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.app import build_demo_app
app = build_demo_app()

for prompt in [
    'Use my telegram preference for this answer.',
    'Check the current system status.',
    'Deploy this service with the usual workflow.',
]:
    trajectory = app.run_task(prompt, channel='telegram', user_id='oza')
    print(prompt)
    print(trajectory['decisions'][0]['decision_type'], trajectory['outcome']['status'], trajectory['reward']['total'])
    print([event['event_type'] for event in trajectory['events']])
    print('---')

print(app.replay_summary())
PY
```

## Output locations

By default the workflows write to:
- `docs/projects/memabra/demo-artifacts/trajectories/`
- `docs/projects/memabra/demo-artifacts/memories/`
- `docs/projects/memabra/demo-artifacts/router-versions/`
- `docs/projects/memabra/demo-artifacts/training-reports/`

## What this proves

The alpha is able to demonstrate the whole loop:
- retrieval
- routing
- execution
- persistence
- replay
- training
- evaluation
- router versioning
- benchmark-gated promotion
- auditable training reports

## Limits

This is still an alpha:
- learning is lightweight, not a deep model
- storage is JSON-file based
- promotion policy thresholds are manually configured
- tool/skill integration is still narrower than a production agent platform

But it is now a safe, self-improving alpha, not just a pile of modules.
77
docs/EXECUTION_AND_PERSISTENCE.md
Normal file
@@ -0,0 +1,77 @@
# Execution and Persistence

## Goal

Give memabra the two pieces of structure that actually ground the system:
- execution: let routing decisions reach an executable action layer
- persistence: let trajectories and memory records be written to disk

## Current implementation

### execution.py
Provides:
- `ActionResult`
- `MemoryExecutor`
- `SkillExecutor`
- `ToolExecutor` (formerly MockToolExecutor, now upgraded to connect to real backends)
- `ExecutionEngine`
- `ToolBackend` protocol (supports passing `params`)
- `LocalFunctionToolAdapter`: maps a tool to a local Python function
- `SubprocessToolAdapter`: maps a tool to a shell command
- `ToolRegistry`: registers, looks up, and executes tools by `tool_id`

Current behavior:
- `inject_memory` emits a `memory_injected` event and, when a memory store is present, marks `last_used_at`
- `load_skill` emits a `skill_loaded` event
- `call_tool` calls the real backend through the `ToolBackend` protocol and emits `tool_called` and `tool_result` events
- `RouteDecision` now carries `selected_payloads`, so candidate parameters can be passed through `ToolExecutor` to the backend
- Other decision_type values are a noop for now

What this step means:
memabra has an execution stage for the first time, not just a policy stage.
In addition, the tool layer can now connect to real local-function or subprocess backends and is no longer purely a mock.

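To make the relationship between the backend protocol, the local-function adapter, and the registry concrete, here is a minimal sketch. The class names match those listed above, but every method name and signature (`run`, `register`, `execute`) and the shape of the result dict are assumptions; the real `execution.py` may be organized differently.

```python
from typing import Any, Callable, Optional, Protocol


class ToolBackend(Protocol):
    # Assumed protocol shape: one method that takes params and returns a result dict.
    def run(self, params: dict) -> dict: ...


class LocalFunctionToolAdapter:
    """Wraps a local Python function so it satisfies the assumed ToolBackend protocol."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self._func = func

    def run(self, params: dict) -> dict:
        try:
            return {"status": "ok", "output": self._func(**params)}
        except Exception as exc:  # surface failures as data instead of crashing the run
            return {"status": "error", "error": str(exc)}


class ToolRegistry:
    def __init__(self) -> None:
        self._backends: dict = {}

    def register(self, tool_id: str, backend: ToolBackend) -> None:
        self._backends[tool_id] = backend

    def execute(self, tool_id: str, params: Optional[dict] = None) -> dict:
        backend = self._backends.get(tool_id)
        if backend is None:
            return {"status": "error", "error": f"unknown tool: {tool_id}"}
        return backend.run(params or {})


registry = ToolRegistry()
registry.register("add", LocalFunctionToolAdapter(lambda a, b: a + b))
result = registry.execute("add", {"a": 1, "b": 2})
```

Returning errors as result dicts rather than raising fits the event-based design described above, where a failed call should still produce a `tool_result` event.
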
### persistence.py
Provides:
- `PersistenceStore`

Current capabilities:
- Save a trajectory to `artifacts/trajectories/`
- Read a trajectory
- List trajectory files
- Save a memory record to `artifacts/memories/`
- Read a memory record
- List memory files

This means prototype artifacts no longer exist only in memory.

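A JSON-file store with the save, read, and list capabilities described above can be sketched in a few lines. The class name `JsonArtifactStore` and its method names are hypothetical and deliberately differ from `PersistenceStore`, whose actual interface is not shown in this document; only the `trajectories/` and `memories/` directory names come from the text.

```python
import json
import tempfile
from pathlib import Path


class JsonArtifactStore:
    """One JSON file per artifact, grouped into a subdirectory per kind."""

    def __init__(self, base_dir: Path) -> None:
        self.base_dir = Path(base_dir)

    def _dir(self, kind: str) -> Path:
        directory = self.base_dir / kind
        directory.mkdir(parents=True, exist_ok=True)
        return directory

    def save(self, kind: str, artifact_id: str, payload: dict) -> Path:
        path = self._dir(kind) / f"{artifact_id}.json"
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return path

    def load(self, kind: str, artifact_id: str) -> dict:
        path = self._dir(kind) / f"{artifact_id}.json"
        return json.loads(path.read_text(encoding="utf-8"))

    def list_ids(self, kind: str) -> list:
        # Sorted so listings are stable across platforms.
        return sorted(p.stem for p in self._dir(kind).glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    store = JsonArtifactStore(Path(tmp) / "artifacts")
    store.save("trajectories", "traj-2", {"reward": 0.4})
    store.save("trajectories", "traj-1", {"reward": 0.9})
    store.save("memories", "mem-1", {"summary": "prefers short answers"})
    trajectory_ids = store.list_ids("trajectories")
    loaded = store.load("trajectories", "traj-1")
```
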
### runner writeback integration
The runner now supports:
- Attaching an execution engine
- Attaching a persistence store
- Attaching a memory store
- Extending execution events after execution
- Optionally writing the trajectory to disk
- Basic writeback / mark_used for memory inject decisions

## Current closed loop

The minimal system flow is now:
task -> retrieval -> router -> execution -> trajectory -> validation -> persistence -> replay

This is starting to feel like a real agent runtime.

## Current limitations

- ~~Tool execution is still a mock~~ Upgraded to pluggable real backends
- Skill execution exists only at the event level; skills are not actually loaded
- The writeback logic is still crude
- Persistence is currently JSON files with no index layer

## Suggested next steps

1. ~~Build a real `ToolExecutor` / `SkillExecutor` adapter protocol~~ The tool adapter is done
2. Build a real `SkillExecutor` adapter (loading the skill payload from the file system)
3. Connect persistence to replay as the default data source
4. Give the runner real update logic for outcome / reward
5. Add richer telemetry and attribution for failure events
48
docs/NAMING.md
Normal file
@@ -0,0 +1,48 @@
# Naming

The final name is:

# memabra

Subtitle:
An intuition-driven control plane for agent memory and action selection.

## Why this name

The name works because it satisfies two things at once:

1. As a brand name, it is short, memorable, and distinctive.
2. As a system name, combined with the subtitle, it conveys accurately that the project is not a "memory store" but an action selection and control system for memory, skill, and tool.

## Naming strategy

- Brand name: `memabra`
- Technical description: `An intuition-driven control plane for agent memory and action selection.`

With this split:
- `memabra` makes the project memorable
- The subtitle makes it understandable

## Why not a purely functional name

A name that describes the function directly, such as `Agent Memory Manager`, is too narrow:
- It sounds too much like a storage tool
- It does not convey routing / policy / evaluation / learning
- It does not convey that the project is the agent's metacognitive controller

## Suggested internal wording

In technical documentation, memabra can be described as:
- local-first metacognitive router
- agent memory and action orchestration system
- intuition-driven control plane

These three phrasings suit, respectively:
- Research contexts
- Engineering contexts
- External introductions

## Conclusion

The naming no longer emphasizes "memory manager" and instead emphasizes "intuition-driven control".
That is closer to the project's real skeleton.
171
docs/ONLINE_LEARNING.md
Normal file
@@ -0,0 +1,171 @@
|
||||
# Online Learning Operator Guide
|
||||
|
||||
## What it does
|
||||
|
||||
memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met.
|
||||
|
||||
## How to run one cycle
|
||||
|
||||
### From Python
|
||||
|
||||
```python
|
||||
from src.memabra.cli import run_online_learning_workflow
|
||||
|
||||
result = run_online_learning_workflow()
|
||||
print(result)
|
||||
```
|
||||
|
||||
### From the shell
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli
|
||||
```
|
||||
|
||||
Or with custom options:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5
|
||||
```
|
||||
|
||||
By default the CLI persists seen trajectory IDs to `<base-dir>/seen-trajectories.json` so repeated runs skip already-processed data. You can override the path:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json
|
||||
```
|
||||
|
||||
### Dry-run mode
|
||||
|
||||
To train and evaluate a challenger without actually promoting it or saving a new router version:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --dry-run
|
||||
```
|
||||
|
||||
This still produces a training report (with `dry_run: true`) so you can inspect what would have happened before allowing a real promotion.
|
||||
|
||||
### Evaluate against a specific baseline version
|
||||
|
||||
By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --baseline-version 20260414-123456
|
||||
```
|
||||
|
||||
This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record `baseline_version_id` for audit.
|
||||
|
||||
### Episodic retrieval with case index
|
||||
|
||||
You can load or rebuild a case index for episodic retrieval during task execution:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --rebuild-case-index
|
||||
```
|
||||
|
||||
This builds a `CaseIndex` from all saved trajectories and saves it to the default path (`<base-dir>/case-index.json`). On subsequent runs, load it without rebuilding:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --case-index /custom/artifacts/case-index.json
|
||||
```
|
||||
|
||||
When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval.
|
||||
|
||||
When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router.
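
The lookup behavior described above (normalized input mapped to the best past trajectory) can be sketched as follows. This is a hedged illustration, not the real `CaseIndex`: the class name `SimpleCaseIndex`, the normalization rule (lowercase plus whitespace collapsing), and the use of reward to pick the "best" trajectory are all assumptions.

```python
import json


def normalize(text):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


class SimpleCaseIndex:
    """Hypothetical sketch of an episodic case index; not memabra's CaseIndex."""

    def __init__(self):
        # normalized input -> (trajectory_id, reward)
        self._best = {}

    def add(self, task_input, trajectory_id, reward):
        """Keep only the highest-reward trajectory seen for each normalized input."""
        key = normalize(task_input)
        current = self._best.get(key)
        if current is None or reward > current[1]:
            self._best[key] = (trajectory_id, reward)

    def lookup(self, task_input):
        """Return the best trajectory ID for a matching input, or None."""
        entry = self._best.get(normalize(task_input))
        return entry[0] if entry else None

    def to_json(self):
        """Serialize the index so it could be saved to a file such as case-index.json."""
        payload = {
            key: {"trajectory_id": tid, "reward": reward}
            for key, (tid, reward) in self._best.items()
        }
        return json.dumps(payload, indent=2)
```

An exact-match key is cheap and predictable, but it only fires for repeated inputs; near-duplicate tasks would need a fuzzier key than this sketch uses.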
|
||||
|
||||
You can also run one learning cycle inline from the shell:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python - <<'PY'
|
||||
from src.memabra.cli import run_online_learning_workflow
|
||||
print(run_online_learning_workflow())
|
||||
PY
|
||||
```
|
||||
|
||||
## Promotion gates
|
||||
|
||||
A challenger is promoted only when **all** of the following are true:
|
||||
|
||||
- `reward_delta >= min_reward_delta` — the challenger must improve average reward by at least this amount
|
||||
- `error_rate_delta <= max_error_rate_increase` — the challenger must not increase errors beyond this limit
|
||||
- `latency_delta_ms <= max_latency_increase_ms` — the challenger must not become slower beyond this limit
|
||||
- `task_count >= required_task_count` — the benchmark must include at least this many tasks
|
||||
|
||||
Default policy in the CLI workflow is lenient for alpha exploration. In production you should tighten these thresholds.
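
The four gates above can be sketched as a single check. This is a minimal illustration, assuming the benchmark summary is a plain dict with the field names used in the list; the `GateThresholds` class, the `check_gates` function, and all threshold values are hypothetical and are not the real `PromotionPolicy` API.

```python
from dataclasses import dataclass


@dataclass
class GateThresholds:
    """Hypothetical container for the four thresholds named above."""
    min_reward_delta: float
    max_error_rate_increase: float
    max_latency_increase_ms: float
    required_task_count: int


def check_gates(summary, thresholds):
    """Return (accepted, reasons); the challenger passes only if every gate holds."""
    reasons = []
    if summary["reward_delta"] < thresholds.min_reward_delta:
        reasons.append("reward_delta below min_reward_delta")
    if summary["error_rate_delta"] > thresholds.max_error_rate_increase:
        reasons.append("error_rate_delta above max_error_rate_increase")
    if summary["latency_delta_ms"] > thresholds.max_latency_increase_ms:
        reasons.append("latency_delta_ms above max_latency_increase_ms")
    if summary["task_count"] < thresholds.required_task_count:
        reasons.append("task_count below required_task_count")
    return (not reasons, reasons)


# Threshold and benchmark values below are made up for illustration.
thresholds = GateThresholds(
    min_reward_delta=0.0,
    max_error_rate_increase=0.05,
    max_latency_increase_ms=50.0,
    required_task_count=3,
)
losing = {"reward_delta": -0.2, "error_rate_delta": 0.0, "latency_delta_ms": -21.0, "task_count": 3}
winning = {"reward_delta": 0.12, "error_rate_delta": 0.0, "latency_delta_ms": 5.0, "task_count": 3}
```

Collecting every failed gate, rather than stopping at the first, gives a rejection report that lists all reasons at once.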
|
||||
|
||||
## Where reports and versions are stored
|
||||
|
||||
By default everything lands under:
|
||||
|
||||
- `docs/projects/memabra/demo-artifacts/trajectories/` — raw task trajectories
|
||||
- `docs/projects/memabra/demo-artifacts/router-versions/versions/` — versioned router weights
|
||||
- `docs/projects/memabra/demo-artifacts/router-versions/current.json` — active router metadata (includes promotion source, benchmark summary, prior version, rollback history)
|
||||
- `docs/projects/memabra/demo-artifacts/training-reports/` — one JSON report per training run
|
||||
|
||||
## What happens when the challenger loses
|
||||
|
||||
- The active router in the app **remains unchanged**
|
||||
- A training report is still saved with the rejection reasons
|
||||
- No new version is registered as current
|
||||
|
||||
## Rolling back
|
||||
|
||||
You can roll back to any previous version from Python:
|
||||
|
||||
```python
|
||||
from src.memabra.router_versioning import RouterVersionStore
|
||||
|
||||
store = RouterVersionStore()
|
||||
store.rollback("20260414-123456")
|
||||
current = store.get_current()
|
||||
print(current)
|
||||
```
|
||||
|
||||
Or from the CLI:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --rollback 20260414-123456
|
||||
```
|
||||
|
||||
To see all available versions before rolling back:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --list-versions
|
||||
```
|
||||
|
||||
Rollback preserves an audit trail in `current.json` (`rollback_from`, `rolled_back_at`).
|
||||
|
||||
## Status check
|
||||
|
||||
To quickly inspect the current system state without running a learning cycle:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -m src.memabra.cli --status
|
||||
```
|
||||
|
||||
## Architecture summary
|
||||
|
||||
```
Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger)
                                                          |
                                                          v
                          BenchmarkSuite -> Evaluator -> baseline vs challenger
                                                          |
                                                          v
                                              PromotionPolicy.evaluate()
                                                          |
                                      +-------------------+-------------------+
                                      | accepted                              | rejected
                                      v                                       v
                          RouterVersionStore.save()               training report saved
                          app.set_router(challenger)              active router unchanged
```
|
||||
162
docs/PROGRESS.md
Normal file
@@ -0,0 +1,162 @@
|
||||
# memabra Progress
|
||||
|
||||
## Current status
|
||||
|
||||
Project status: safe self-improving alpha, benchmark-gated online learning loop complete
|
||||
Date: 2026-04-15
|
||||
Project: memabra
|
||||
Subtitle: An intuition-driven control plane for agent memory and action selection.
|
||||
|
||||
## What exists now
|
||||
|
||||
memabra now has a complete safe self-improving alpha control-plane loop:
|
||||
- candidate retrieval
|
||||
- routing decisions
|
||||
- memory / skill / tool execution
|
||||
- telemetry events
|
||||
- trajectory construction
|
||||
- runtime validation
|
||||
- artifact persistence
|
||||
- replay and analytics
|
||||
- artifact indexing and dataset slicing
|
||||
- lightweight learning router training
|
||||
- A/B evaluation
|
||||
- router weight versioning and rollback
|
||||
- benchmark-gated promotion with explicit policy thresholds
|
||||
- auditable training reports
|
||||
- exception-safe online learning coordinator
|
||||
- configurable CLI entrypoint
|
||||
- persisted seen-trajectory tracking across restarts (safe for cron jobs)
|
||||
- dry-run mode for training/evaluation without promotion risk
|
||||
- baseline version selection for challenger evaluation
|
||||
- task case index (`CaseIndex`) for episodic retrieval: maps normalized inputs to the best past trajectory ID
|
||||
- `CaseIndex` integration into `MemabraApp` (build, save, load, lookup) and `MemabraRunner` (injects episodic candidate on matching inputs)
|
||||
- CLI flags `--case-index` and `--rebuild-case-index` for operator-managed episodic retrieval
|
||||
- `OnlineLearningCoordinator` auto-rebuilds case index after each cycle when `case_index_path` is provided, ensuring benchmark-generated trajectories are indexed
|
||||
- `TrajectorySummarizer` generates human-readable trajectory summaries from task input, decisions, outcome, and reward
|
||||
- `MemabraRunner` enriches episodic memory candidate summaries using `TrajectorySummarizer` when `persistence_store` is available
|
||||
- CLI `--status` flag prints current system state (active router version, counts, latest report) without triggering a learning cycle
|
||||
- CLI is now subcommand-driven (`run`, `status`, `version list`, `version rollback`) with a dedicated packaged `memabra` entrypoint
|
||||
- CLI `--format text` mode provides operator-friendly summaries for status checks, version listings, rollbacks, and workflow runs, including latest report details, current-version highlighting, sectioned workflow summaries, normalized yes/no flags, and fixed-precision benchmark/promotion metrics
|
||||
|
||||
## Major completed capabilities
|
||||
|
||||
### Foundations
|
||||
- project naming, architecture, roadmap, decisions, reward spec
|
||||
- candidate / event / trajectory / memory schemas
|
||||
- prototype package structure under `src/memabra/`
|
||||
|
||||
### Runtime path
|
||||
- `retrieval.py`: typed candidate retrieval
|
||||
- `router.py`: heuristic router, feature-scoring router, learning router
|
||||
- `execution.py`: memory, skill, tool executors and adapters
|
||||
- `runner.py`: end-to-end task -> trajectory orchestration
|
||||
- `persistence.py`: trajectory and memory artifact storage
|
||||
- `replay.py`: replay summaries over examples and persisted runs
|
||||
- `memory_store.py`: typed memory records with verify/revoke support
|
||||
|
||||
### Adapters and evaluation
|
||||
- real tool adapters:
|
||||
- `LocalFunctionToolAdapter`
|
||||
- `SubprocessToolAdapter`
|
||||
- `ToolRegistry`
|
||||
- real skill loading:
|
||||
- `FileSystemSkillBackend`
|
||||
- richer evaluation path:
|
||||
- `OutcomeEngine`
|
||||
- `RewardEngine`
|
||||
- `ArtifactIndex`
|
||||
- `DatasetBuilder`
|
||||
- `Evaluator`
|
||||
- `RouterVersionStore`
|
||||
- Alpha Iteration 1 — online learning loop:
|
||||
- `PromotionPolicy` with benchmark-gated promotion rules
|
||||
- `BenchmarkSuite` persistence (JSON load/save + default seed)
|
||||
- `OnlineLearningCoordinator` for retrain/evaluate/promote cycles
|
||||
- exception-safe coordinator: training/evaluation failures emit auditable error reports instead of crashing
|
||||
- `TrainingReportStore.get_report()` for by-id report lookup
|
||||
|
||||
### Product/demo surface
|
||||
- `app.py`: `MemabraApp`, demo builders, artifact index access, training hooks, `run_online_learning_cycle`
|
||||
- `cli.py`: wrap-up workflow and `run_online_learning_workflow` with benchmark-gated promotion
|
||||
- `cli.py`: argument parsing (`--base-dir`, `--min-new-trajectories`) and clean `python -m src.memabra.cli` execution
|
||||
- `DEMO.md`: runnable walkthrough with CLI options
|
||||
|
||||
## Current test status
|
||||
|
||||
Command:
|
||||
`source venv/bin/activate && python -m pytest tests/memabra -q`
|
||||
|
||||
Latest result:
|
||||
`118 passed`
|
||||
|
||||
All alpha iteration 1 source, tests, and documentation have been committed to the repository (commit `34cf507c`).
|
||||
|
||||
## Most important current files
|
||||
|
||||
### Core package
|
||||
- `src/memabra/app.py`
|
||||
- `src/memabra/cli.py`
|
||||
- `src/memabra/router.py`
|
||||
- `src/memabra/runner.py`
|
||||
- `src/memabra/execution.py`
|
||||
- `src/memabra/evaluator.py`
|
||||
- `src/memabra/router_versioning.py`
|
||||
- `src/memabra/promotion.py`
|
||||
- `src/memabra/online_learning.py`
|
||||
- `src/memabra/training_reports.py`
|
||||
- `src/memabra/benchmarks.py`
|
||||
- `src/memabra/case_index.py`
|
||||
|
||||
### Tests
|
||||
- `tests/memabra/test_app.py`
|
||||
- `tests/memabra/test_cli_workflow.py`
|
||||
- `tests/memabra/test_package_exports.py`
|
||||
- `tests/memabra/test_promotion.py`
|
||||
- `tests/memabra/test_online_learning.py`
|
||||
- `tests/memabra/test_training_reports.py`
|
||||
- `tests/memabra/test_benchmarks.py`
|
||||
- `tests/memabra/test_router_versioning.py`
|
||||
- `tests/memabra/test_evaluator.py`
|
||||
- `tests/memabra/test_router_protocol.py`
|
||||
- `tests/memabra/test_execution_persistence.py`
|
||||
|
||||
## Wrap-up status
|
||||
|
||||
The project is now in a safe self-improving alpha state.
|
||||
It can:
|
||||
- run realistic demo tasks
|
||||
- persist trajectories
|
||||
- replay and inspect results
|
||||
- train a lightweight router from saved artifacts
|
||||
- compare baseline vs challenger routers
|
||||
- apply a promotion policy with explicit thresholds
|
||||
- save and reload router versions with metadata
|
||||
- emit auditable training reports
|
||||
- run an online-learning cycle from the CLI
|
||||
- leave the active router unchanged when challenger fails
|
||||
- survive training/evaluation failures gracefully and emit error reports
|
||||
- accept CLI overrides for artifact directory and trajectory thresholds
|
||||
- persist seen-trajectory state across restarts so cron jobs don't retrain on the same data
|
||||
- default CLI `main()` persists seen trajectories to `<base-dir>/seen-trajectories.json`
|
||||
- run in dry-run mode to evaluate a challenger without promoting it
|
||||
- run in baseline-version mode to compare a challenger against a specific saved version instead of the currently active router
|
||||
- index successful task cases by normalized input for episodic retrieval (`CaseIndex`)
|
||||
- build/save/load a case index from `MemabraApp`
|
||||
- inject episodic memory candidates during runner retrieval when a similar past task exists
|
||||
- use `--case-index` and `--rebuild-case-index` CLI flags to manage episodic retrieval
|
||||
- online-learning cycles automatically refresh the case index after training/evaluation when a case-index path is configured
|
||||
- episodic memory candidates now include rich human-readable summaries when the past trajectory is available via `persistence_store`
|
||||
- CLI `--status` flag provides a quick read-only snapshot of the active router, versions, trajectories, and reports
|
||||
- CLI `--rollback` and `--list-versions` flags enable operator-safe router version management without touching code
|
||||
|
||||
## Next sensible frontier
|
||||
|
||||
1. tighter integration with real Hermes trajectories
|
||||
2. multi-turn conversation state and working-memory updates
|
||||
3. richer real-world tool ecosystem integration (MCP, web, git, files)
|
||||
4. stronger storage/index backend beyond plain JSON files
|
||||
|
||||
## One-line summary
|
||||
|
||||
memabra is now a runnable, test-covered safe self-improving alpha for agent memory/action routing, with online learning, benchmark-gated promotion, and auditable reports.
|
||||
90
docs/PROTOTYPE_LAYOUT.md
Normal file
@@ -0,0 +1,90 @@
# Prototype Layout

## Goal

Establish a minimal runnable prototype directory layout for memabra, so that the rule-based router, replay harness, sample trajectories, and training-sample generation all have a clear home.

## Directory structure

```text
src/memabra/
├── __init__.py
├── candidate_types.py      # unified candidate objects and decision types
├── router.py               # rule-based router baseline
├── telemetry.py            # runtime structures for events, reward, trajectories
├── reward.py               # reward aggregation logic
├── retrieval.py            # later: candidate retrieval interface
├── memory_store.py         # later: long-term memory storage
├── replay.py               # later: trajectory replay and evaluation
└── schemas.py              # later: schema loading / validation

tests/memabra/
└── test_router_smoke.py    # baseline smoke test
```

## Already in place

Created:
- `src/memabra/__init__.py`
- `src/memabra/candidate_types.py`
- `src/memabra/router.py`
- `src/memabra/telemetry.py`
- `src/memabra/reward.py`
- `tests/memabra/test_router_smoke.py`

## Module boundaries

### candidate_types.py
Responsible for:
- `CandidateObject`
- `DecisionType`
- later: extensible with memory/skill/tool type-specific adapters

### router.py
Responsible for:
- `TaskContext`
- `RouteDecision`
- `RuleBasedRouter`

Only the baseline heuristics are implemented for now; planned upgrades:
- feature scorer
- reranker
- learned policy

### telemetry.py
Responsible for:
- atomic event structures
- reward breakdown
- later: trajectory runtime objects

### reward.py
Responsible for:
- reward composition and computation
- later: weight versioning

## Design principles

1. Get a runnable baseline first, then abstract complex interfaces
2. Keep data structures simple at first, but keep field names consistent with the Phase 0 schema
3. Ensure replayability first, then consider performance
4. Do not introduce database or vector-store coupling prematurely

## Next landing points

- `retrieval.py`: define the candidate retrieval interface
- `replay.py`: implement trajectory loading, replay, and metric computation
- `schemas.py`: turn the JSON schemas into a runtime validation entry point
- `sample_data/`: hold sample candidates and trajectories

## Suggested verification

Run from the project root:

```bash
source venv/bin/activate
python -m pytest tests/memabra/test_router_smoke.py -q
```

Expected:
- the baseline router smoke test passes
- this shows the minimal prototype skeleton can be imported and called
87
docs/README.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# memabra
|
||||
|
||||
An intuition-driven control plane for agent memory and action selection.
|
||||
|
||||
## Quick start
|
||||
|
||||
If you are working from this repository, activate the virtualenv and install the project in editable mode so the dedicated `memabra` command is available:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
uv pip install -e ".[dev]"
|
||||
memabra --help
|
||||
memabra run --base-dir /tmp/memabra-demo --format text --dry-run
|
||||
```
|
||||
|
||||
The dedicated CLI is the fastest way to experience the alpha. It supports subcommands for different operations:
|
||||
|
||||
- `memabra run` — run the online-learning loop
|
||||
- `memabra status` — show system status
|
||||
- `memabra version list` — list saved router versions
|
||||
- `memabra version rollback <id>` — roll back to a version
|
||||
|
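The goal of memabra is not to be a "memory store that saves things" but a metacognitive controller for local agents:
when facing a task, it should judge quickly, the way human intuition does, whether to answer directly, look up memory, load a skill, or call a tool, and keep improving that judgment based on task outcomes.

In one sentence:
it is a local-first, observable, trainable, and replayable agent memory and action orchestration system.

## Why build it

Common problems with traditional agents:
- context keeps bloating, with everything stuffed into the prompt
- memory, skills, and tools are three disconnected systems
- after a success or failure, it is hard to tell which step actually mattered
- when you want the agent to "learn", there is no attributable trajectory data

The essential question memabra sets out to answer is:
when to rely on what.

## Core view

Do not start with end-to-end, all-in-one neural network training.
First build a four-layer structure:
1. Retrieval layer: recall candidate memories / skills / tools
2. Routing layer: decide what to invoke and in what order
3. Execution layer: actually inject memory, load skills, call tools
4. Evaluation layer: record results, assign credit, and produce training samples

If these four layers are not observable, training end to end will most likely learn the wrong habit of "calling fewer tools and letting the model guess".
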
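## Project outputs

For now this directory consists mainly of plans and design documents:
- `ARCHITECTURE.md`: system architecture
- `ROADMAP.md`: phased roadmap
- `DECISIONS.md`: key design decisions
- `PROGRESS.md`: current progress and next steps
- `schemas/`: the unified Phase 0 schemas
- `reward_spec.md`: draft reward design

To be added later:
- `experiments/`: training and evaluation experiments
- `src/`: prototype code
- `tests/`: verification and regression tests

## Target capabilities

The eventual goals are to:
- manage three kinds of long-term information in one place: facts / procedures / episodes
- build a unified candidate retrieval mechanism for memory, skills, and tools
- let an "intuition policy" make fast action selections
- infer how good the policy is from task outcomes
- move gradually from a rule-based system to a learnable policy
- keep evolving in a local environment

## Current status

The project has been initialized and has entered the Phase 0 foundational definition stage:
- direction clarified
- layered roadmap established
- naming done
- project directory created
- first versions of the architecture, roadmap, decisions, and progress documents written
- schema and reward specifications about to be filled in

The suggested next step is to go straight into Phase 0:
define the unified object model, the trajectory log structure, and the reward decomposition scheme.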
60
docs/REPLAY_AND_RETRIEVAL.md
Normal file
@@ -0,0 +1,60 @@
# Replay and Retrieval

## Goal

Connect memabra's minimal closed loop:
- retrieval recalls memory / skill / tool candidates
- replay reads trajectories and summarizes behavioral outcomes

Once the two are connected, the system is no longer just static documents plus a standalone router; it has:
- candidate inputs
- decision outputs
- trajectory replay
- basic statistics

## 当前实现
|
||||
|
||||
### retrieval.py
|
||||
提供:
|
||||
- `CandidateProvider` 协议
|
||||
- `InMemoryCandidateProvider`
|
||||
- `CandidateRetriever`
|
||||
- `RetrievalResult`
|
||||
|
||||
当前策略:
|
||||
- 使用 trigger/tag 与任务文本做简单 lexical matching
|
||||
- 结合 confidence / success_rate / freshness / cost / risk 做 baseline 排序
|
||||
- 对不同 provider 输出做按类型聚合与去重
|
||||
|
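
The retrieval strategy above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code in `retrieval.py`: candidates are plain dicts, providers are plain lists, the function names are hypothetical, and the additive ranking formula is assumed, since the document does not give the exact combination.

```python
def lexical_hits(task_text, triggers):
    """Count trigger words that appear as whole words in the task text."""
    words = set(task_text.lower().split())
    return sum(1 for trigger in triggers if trigger.lower() in words)


def baseline_score(candidate):
    """Assumed additive combination of the five shared fields."""
    return (
        candidate["confidence"]
        + candidate["success_rate"]
        + candidate["freshness"]
        - candidate["cost"]
        - candidate["risk"]
    )


def retrieve(task_text, providers):
    """Merge provider outputs, drop duplicates by id, and rank within each type."""
    seen_ids = set()
    grouped = {}
    for provider in providers:
        for candidate in provider:
            hits = lexical_hits(task_text, candidate["triggers"])
            if hits == 0 or candidate["id"] in seen_ids:
                continue
            seen_ids.add(candidate["id"])
            row = (hits, baseline_score(candidate), candidate["id"])
            grouped.setdefault(candidate["type"], []).append(row)
    # More lexical hits first, then the higher baseline score.
    return {
        ctype: [cid for _, _, cid in sorted(rows, key=lambda row: (-row[0], -row[1]))]
        for ctype, rows in grouped.items()
    }
```

Because matching is on whole words, punctuation attached to a word (for example `files?`) would prevent a hit, which is one face of the brittleness of lexical matching.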
### replay.py
Provides:
- `TrajectoryReplay`
- `ReplaySummary`

Current capabilities:
- load a single trajectory JSON file
- load multiple trajectories from a directory
- aggregate outcome counts
- aggregate reward, latency, steps, and user corrections
- count occurrences of each decision_type

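
The aggregation described above can be sketched as one function over already-loaded trajectories. This is a hedged illustration, not the real `TrajectoryReplay`: the trajectory field names (`outcome`, `reward`, `latency_ms`, `decisions`, `decision_type`) and the shape of the returned summary are assumptions.

```python
from collections import Counter


def summarize(trajectories):
    """Aggregate outcome counts, decision-type counts, and simple averages."""
    count = len(trajectories)
    outcome_counts = Counter(t["outcome"] for t in trajectories)
    decision_counts = Counter(
        decision["decision_type"] for t in trajectories for decision in t["decisions"]
    )
    total_reward = sum(t["reward"] for t in trajectories)
    total_latency = sum(t["latency_ms"] for t in trajectories)
    return {
        "count": count,
        "outcome_counts": dict(outcome_counts),
        "decision_type_counts": dict(decision_counts),
        # Guard against division by zero when no trajectories were loaded.
        "avg_reward": total_reward / count if count else 0.0,
        "avg_latency_ms": total_latency / count if count else 0.0,
    }
```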
## Why this step matters

Without retrieval, the router can only go through the motions on an empty candidate set.
Without replay, rewards and trajectories are just JSON specimens lying on disk.

After this step, memabra has a minimal closed loop for the first time:
task -> candidates -> decision -> trajectory -> replay statistics

## Current limitations

- retrieval is still lexical matching, not embeddings or learned ranking
- replay only aggregates; it does no schema validation or counterfactual comparison
- the router and retriever are not yet wired into an end-to-end runner

## Next steps

1. Add `schemas.py` for runtime validation
2. Build `memory_store.py` and the provider interface
3. Build `runner.py` to wire retrieval + router + telemetry together
4. Add baseline comparison and reward breakdown analysis to replay
136
docs/ROADMAP.md
Normal file
@@ -0,0 +1,136 @@
# Roadmap

## Overall goal

Build a memory management and metacognitive control system for local agents, so that an agent can make learnable action selections among memory, skills, and tools, and gradually improve its policy through task feedback.

## Phase 0: Foundations

Goal: define the "objects" and "trajectories" clearly first.

Deliverables:
- unified candidate object schema
- type boundary definitions for memory / skill / tool
- event log schema
- trajectory schema
- draft reward decomposition
- draft evaluation metrics
- draft prototype directory layout
- baseline router design document
- sample trajectories

Success criteria:
- for any task, a complete record exists of what was seen, what was chosen, and how it turned out
- the documents are clear enough that later implementation does not rely on guesswork
- a first batch of success / failure trajectory samples is available for replay

Status: done

## Phase 1: Observable MVP

Goal: build a version that does not learn but runs and records end to end.

Deliverables:
- candidate retrieval module
- unified candidate interface for memory/skill/tool
- rule-based or heuristic router
- execution adapter layer
- trajectory logs persisted to disk
- basic visualization / replay

Success criteria:
- given a task, the system can select an action
- every action can be reviewed after the fact
- simple metrics can be computed: hit rate, tool-call rate, task completion rate

Status: done

## Phase 2: Learned Router

Goal: make the "intuition" trainable.

Deliverables:
- candidate feature engineering
- training-sample construction pipeline
- lightweight classifier / reranker / bandit
- offline evaluation baseline
- A/B comparison of routing policies

Success criteria:
- the learned router beats the rule-based router in offline replay
- clearly useless calls are reduced
- high-value memory / skill / tool scenarios are recognized

Status: done (SimpleLearningRouter, DatasetBuilder, Evaluator, A/B comparison, RouterVersionStore)

## Phase 3: Rewarded Adaptation

Goal: use task outcomes to keep updating the policy.

Deliverables:
- reward aggregator
- user-correction signal intake
- online / batch update mechanism
- safe exploration strategy
- memory confidence update mechanism
- benchmark-gated promotion policy
- training run reports
- active router metadata tracking

Success criteria:
- the policy improves over consecutive tasks
- a small amount of bad feedback does not make it collapse quickly
- incorrect memories can be identified and down-weighted
- promotion must pass benchmark validation

Status: done (online learning coordinator, promotion policy, training reports, version metadata, benchmark-gated promotion, active router tracking, and app/CLI entrypoints are implemented)

## Phase 4: Episodic Learning

Goal: turn past task trajectories into genuinely useful episodic memory.

Deliverables:
- task case index (done)
- episode retrieval (done, via CaseIndex and runner injection)
- reuse across similar tasks (done, runner injects episodic candidate)
- trajectory summarization (done, `TrajectorySummarizer` generates human-readable summaries)

Success criteria:
- for repetitive tasks, the system can reuse historically successful paths
- episodes do not pollute factual memory or the skill library

Status: in progress (core functionality done)

## Phase 5: End-to-End Experiments

Goal: verify whether it is worth internalizing routing further into neural model weights.

Deliverables:
- training dataset definition
- SFT / preference / RL experiment plan
- comparative evaluation against the layered system
- risk analysis: forgetting, overfitting, behavioral drift

Success criteria:
- outperforms the layered baseline on at least one well-defined set of tasks
- does not significantly reduce interpretability or stability

Status: not started

## Bottom lines for every phase

- must be replayable
- must be attributable
- must keep memory, skill, and tool distinct
- must include failure samples, not only successes
- must be able to revoke incorrect memories and incorrect policies

## Current priorities

1. real adapters
2. richer reward/outcome updates
3. persistence-backed replay
4. router scoring v2
5. only then, the learned router

Without a solid footing on these five steps, any later training is a castle in the air.
213
docs/ROUTER_BASELINE.md
Normal file
@@ -0,0 +1,213 @@
# Rule-Based Router Baseline

## Goal

Define the first routing policy memabra uses in Phase 1. This version does not learn; it selects actions purely from explicit rules and candidate object attributes.

Its value lies not in being clever but in being:
- observable
- explainable
- replayable
- usable as the baseline for a learned router

## Action space

Actions the router currently allows:

1. `direct_answer`
2. `inject_memory`
3. `load_skill`
4. `call_tool`
5. `clarify`
6. `composite_action`

### direct_answer
When it applies:
- pure analysis, naming, structural design, or explanation tasks
- no dependence on real-time state
- no clear need to call external resources

### inject_memory
When it applies:
- user preferences
- project conventions
- environment facts
- stable facts known from history

### load_skill
When it applies:
- the task looks like a reusable procedure
- a known workflow exists
- reuse has paid off in similar past tasks

### call_tool
When it applies:
- the current state must be fetched
- real-time information such as files, the system, web pages, processes, or the time must be accessed
- an action must be executed rather than pure reasoning

### clarify
When it applies:
- high risk with weak candidate signals
- missing information would significantly change the action choice
- all candidates have low confidence

### composite_action
When it applies:
- memory first, then tool
- skill first, then tool
- memory first, then skill

The current baseline focuses on single actions; composite actions are kept as a reserved action type for now.

## Candidate scoring approach

Every candidate object has these common fields:
- `confidence`
- `success_rate`
- `cost`
- `freshness`
- `risk`

The baseline does no complex learning; it scores with a simple linear heuristic.

### memory score

```text
memory_score = confidence + freshness + success_rate - cost - risk
```

### skill score

```text
skill_score = confidence + success_rate - cost - risk
```

### tool score

```text
tool_score = confidence + success_rate - cost - risk
```

Notes:
- memory weighs freshness more
- tool weighs risk more
- skill weighs success_rate more

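
The three formulas above translate directly into code. This is a minimal sketch: the `Candidate` dataclass is a hypothetical stand-in that carries only the five shared fields, not the real `CandidateObject`, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in carrying only the five shared fields."""
    confidence: float
    success_rate: float
    cost: float
    freshness: float
    risk: float


def memory_score(c):
    # Memory is the only type whose formula includes freshness.
    return c.confidence + c.freshness + c.success_rate - c.cost - c.risk


def skill_score(c):
    return c.confidence + c.success_rate - c.cost - c.risk


def tool_score(c):
    return c.confidence + c.success_rate - c.cost - c.risk


candidate = Candidate(confidence=0.75, success_rate=0.5, cost=0.25, freshness=0.5, risk=0.25)
print(memory_score(candidate))  # 1.25
print(tool_score(candidate))    # 0.75
```

As written, the skill and tool formulas are identical; the notes about tools weighing risk and skills weighing success_rate would only show up once per-type weights are introduced.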
## First-version rules

### Rule 1: reasoning-first tasks prefer direct_answer
If the user input clearly contains any of these signals:
- why
- think
- design
- name

and no strong tool trigger word is present, prefer `direct_answer`.

### Rule 2: prefer a tool when real-time state is needed
If the input contains:
- check
- run
- open
- current
- list
- time

look first for a high-confidence `tool` candidate.

Additional thresholds:
- `confidence >= 0.6`
- `risk <= 0.7`

### Rule 3: stable user/project facts prefer memory
If the input contains:
- prefer
- remember
- usually
- my
- our

look first for a high-confidence, reasonably fresh `memory` candidate.

Additional thresholds:
- `confidence >= 0.65`
- `freshness >= 0.3`

### Rule 4: reusable workflows prefer skill
If the input contains:
- fix
- deploy
- review
- setup
- workflow

look first for a `skill` candidate with a high success_rate.

Additional thresholds:
- `confidence >= 0.55`
- `success_rate >= 0.4`

### Rule 5: clarify when unsure
If no candidate of any type meets its thresholds, return `clarify`.

This rule is ugly but necessary.
Better to ask one question than to blindly fire off a pile of tools and blow the roof off.

## Conflict resolution order

When several actions trigger at once, the baseline uses this priority:

```text
tool > memory > skill > direct_answer > clarify
```

Rationale:
- the need for real-time information is usually the hardest constraint
- factual constraints come next
- a skill acts more like an enhancer
- a plain answer is for cases with clearly no external need

Later versions could switch to:
- task intent classification first
- then per-type ranking
- finally global arbitration

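
The five rules and the priority order can be combined into one small routing function. This is a sketch of one possible reading, not the actual `RuleBasedRouter`: the trigger words and thresholds are taken from this document, while the function names, the dict-based candidates, whole-word matching, and the way a failed higher-priority rule falls through to the next one are assumptions.

```python
TRIGGERS = {
    "call_tool": {"check", "run", "open", "current", "list", "time"},
    "inject_memory": {"prefer", "remember", "usually", "my", "our"},
    "load_skill": {"fix", "deploy", "review", "setup", "workflow"},
    "direct_answer": {"why", "think", "design", "name"},
}
# tool > memory > skill > direct_answer, with clarify as the fallback.
PRIORITY = ["call_tool", "inject_memory", "load_skill", "direct_answer"]
CANDIDATE_TYPE = {"call_tool": "tool", "inject_memory": "memory", "load_skill": "skill"}


def meets_thresholds(action, candidate):
    """Apply the per-rule thresholds listed in the rules above."""
    if action == "call_tool":
        return candidate["confidence"] >= 0.6 and candidate["risk"] <= 0.7
    if action == "inject_memory":
        return candidate["confidence"] >= 0.65 and candidate["freshness"] >= 0.3
    if action == "load_skill":
        return candidate["confidence"] >= 0.55 and candidate["success_rate"] >= 0.4
    return False


def route(task_text, candidates):
    """Return the first action in priority order whose rule fires, else clarify."""
    words = set(task_text.lower().split())
    for action in PRIORITY:
        if not words & TRIGGERS[action]:
            continue
        if action == "direct_answer":
            # Rule 1 applies only when no tool trigger word is present.
            if not words & TRIGGERS["call_tool"]:
                return action
            continue
        wanted = CANDIDATE_TYPE[action]
        if any(c["type"] == wanted and meets_thresholds(action, c) for c in candidates):
            return action
    return "clarify"
```

For example, `route("check the current time", [...])` with a confident, low-risk tool candidate selects `call_tool`, while `route("deploy the service", [])` falls through to `clarify` because no skill candidate meets its thresholds.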
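## Known limitations

1. Keyword triggers are too brittle
2. Long-range context is ignored
3. Real composite action planning is not supported
4. No counterfactual comparison of choices
5. Easily misled by surface vocabulary

## What the baseline is really for

Not to achieve high intelligence, but to provide:
- a first runnable system
- a first batch of recordable trajectories
- a first batch of failure samples
- a comparison target for the learned router

## Next steps

There are three ways to grow from this baseline:
1. introduce explicit feature engineering
2. introduce a candidate reranker
3. introduce bandit / lightweight policy learning

Until then, do not rush to dress the heuristics up as "pseudo-intelligence". Build replay and metrics first.
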
---

## Implementation progress: FeatureScoringRouter (v2)

`FeatureScoringRouter` has been implemented in `src/memabra/router.py` as an upgrade over `RuleBasedRouter`:

- Explicit feature scoring: memory / skill / tool each combine `confidence`, `success_rate`, `freshness`, `cost`, and `risk` with different weights
- Failure penalty: when a candidate's `id` appears in `TaskContext.recent_failures`, 0.5 points are deducted automatically
- Composite action preconditions: `CandidateObject` gains a `preconditions` field that can declare prerequisite types such as `["memory"]`
- Composite action execution: `ExecutionEngine` now supports the `composite_action` decision type and executes sub-steps recursively in `composite_steps` order
- Score transparency: `RouteDecision.score_breakdown` records each candidate's final score for traceability and evaluation

`FeatureScoringRouter` stays explainable while providing structured feature output for later learned policies.
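
The per-type weighting and the failure penalty described above can be sketched as follows. This is an illustration only: the 0.5 penalty comes from the text, but the weight values, the function names, and the dict-based candidates are assumptions and do not reflect the actual weights in `src/memabra/router.py`.

```python
FAILURE_PENALTY = 0.5  # from the description above

# Assumed, illustrative weights; the real values are not given in this document.
WEIGHTS = {
    "memory": {"confidence": 1.0, "success_rate": 0.5, "freshness": 1.0, "cost": -0.5, "risk": -0.5},
    "skill": {"confidence": 0.75, "success_rate": 1.0, "freshness": 0.25, "cost": -0.5, "risk": -0.5},
    "tool": {"confidence": 1.0, "success_rate": 0.5, "freshness": 0.25, "cost": -0.5, "risk": -1.0},
}


def feature_score(candidate, recent_failures):
    """Weighted sum of the shared fields, minus a penalty for recent failures."""
    weights = WEIGHTS[candidate["type"]]
    total = sum(weight * candidate[field] for field, weight in weights.items())
    if candidate["id"] in recent_failures:
        total -= FAILURE_PENALTY
    return total


def score_breakdown(candidates, recent_failures):
    """Map each candidate id to its final score, for later inspection."""
    return {c["id"]: feature_score(c, recent_failures) for c in candidates}
```

Keeping the breakdown as a plain id-to-score mapping makes it easy to log alongside the decision and to diff between router versions.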
83
docs/RUNNER_AND_STORE.md
Normal file
@@ -0,0 +1,83 @@
# Runner, Schemas, and Memory Store

## Goal

Move memabra from "can retrieve, route, and replay separately" to "can produce a valid draft trajectory, validate data, and manage typed memory records".

## Current implementation

### runner.py
Provides:
- `MemabraRunner`

Capabilities:
- accepts a `TaskContext`
- calls the retriever to fetch candidates
- calls the router to produce an action decision
- automatically generates a draft trajectory
- emits a minimal event stream:
  - `task_received`
  - `candidates_recalled`
  - `action_selected`

Significance:
this gives memabra its first real task-to-trajectory entry point.

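
The task-to-trajectory flow above can be sketched as a single function that records the three events in order. This is a hedged illustration, not `MemabraRunner` itself: the function name `run_task`, the callable-based retriever and router, and the shape of the event and trajectory dicts are assumptions; only the three event names come from the text.

```python
def run_task(task_input, retrieve, route):
    """Run retrieval and routing once and return a draft trajectory dict."""
    events = [{"type": "task_received", "input": task_input}]

    candidates = retrieve(task_input)
    events.append({"type": "candidates_recalled", "count": len(candidates)})

    decision_type = route(task_input, candidates)
    events.append({"type": "action_selected", "decision_type": decision_type})

    # "draft" marks that nothing has been executed or rewarded yet.
    return {
        "task_input": task_input,
        "status": "draft",
        "decision_type": decision_type,
        "events": events,
    }
```

Passing the retriever and router in as callables keeps the sketch independent of any concrete retrieval or routing implementation.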
### schemas.py
Provides:
- `SchemaRegistry`
- `SchemaValidationError`

Current strategy:
- start with lightweight runtime validation
- no external library dependencies
- validate the key required keys first

This is not yet a full JSON Schema engine, but it is enough to hold the floor for now and keep sample structures from drifting.

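
Required-key validation without external libraries can be sketched like this. The names `RequiredKeyRegistry` and `MissingKeysError` are hypothetical and chosen to avoid implying that this is the real `SchemaRegistry` API; the schema name and key list in the usage note are also made up.

```python
class MissingKeysError(ValueError):
    """Raised when a record lacks required keys (sketch only)."""


class RequiredKeyRegistry:
    """Hypothetical registry that checks only for the presence of required keys."""

    def __init__(self):
        self._required = {}

    def register(self, schema_name, required_keys):
        """Remember which keys a record of this schema must contain."""
        self._required[schema_name] = tuple(required_keys)

    def validate(self, schema_name, record):
        """Raise MissingKeysError if the schema is unknown or keys are missing."""
        if schema_name not in self._required:
            raise MissingKeysError(f"unknown schema: {schema_name}")
        missing = [key for key in self._required[schema_name] if key not in record]
        if missing:
            raise MissingKeysError(f"{schema_name} is missing required keys: {missing}")
```

A caller would register a schema once, for example `registry.register("trajectory", ["trajectory_id", "events"])`, and then call `registry.validate("trajectory", record)` before persisting each record.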
### memory_store.py
Provides:
- `MemoryRecord`
- `MemorySource`
- `VerificationState`
- `InMemoryMemoryStore`

Current capabilities:
- upsert
- get
- list_by_type
- mark_used
- verify
- revoke

Significance:
memabra no longer merely "talks about memory"; it now has a typed memory record runtime.

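
The six operations listed above can be sketched with a dict-backed store. This is an illustration under stated assumptions, not the real `InMemoryMemoryStore`: the `Record` fields, the string-valued verification states, and the `TinyMemoryStore` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical memory record; the field names are assumptions."""
    id: str
    memory_type: str
    content: str
    verification: str = "unverified"  # assumed states: unverified / verified / revoked
    use_count: int = 0


class TinyMemoryStore:
    """Dict-backed sketch of the operations listed above."""

    def __init__(self):
        self._records = {}

    def upsert(self, record):
        # Insert a new record or overwrite the one with the same id.
        self._records[record.id] = record

    def get(self, record_id):
        return self._records.get(record_id)

    def list_by_type(self, memory_type):
        return [r for r in self._records.values() if r.memory_type == memory_type]

    def mark_used(self, record_id):
        self._records[record_id].use_count += 1

    def verify(self, record_id):
        self._records[record_id].verification = "verified"

    def revoke(self, record_id):
        self._records[record_id].verification = "revoked"
```

Keeping revoked records in the store, instead of deleting them, preserves the history needed to trace which trajectories relied on a memory that later turned out to be wrong.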
## Current loop

What exists now:
- retrieval
- router
- runner
- replay
- memory store
- schema validation

That is:
task -> candidate retrieval -> routing decision -> draft trajectory -> replay statistics
In addition, memory records themselves can be validated and have their state changed.

## What is still missing

- execution adapter (real tool/skill/memory injection)
- full JSON Schema validation
- trajectory persistence layer
- richer reward aggregation
- counterfactual replay

## Suggested next steps

1. Build `execution.py`
2. Build `persistence.py`
3. Connect the runner to the memory store and telemetry writeback
4. Build richer router scoring v2
13
docs/demo-artifacts/router-versions/current.json
Normal file
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"current_version_id": "20260414-165018",
|
||||
"promotion_source": null,
|
||||
"benchmark_summary": {
|
||||
"reward_delta": -0.446,
|
||||
"error_rate_delta": 0.0,
|
||||
"latency_delta_ms": -21.0,
|
||||
"baseline_avg_reward": 0.886,
|
||||
"challenger_avg_reward": 0.44
|
||||
},
|
||||
"prior_version_id": "20260414-155224",
|
||||
"saved_at": "2026-04-14T16:50:18.865976+00:00"
|
||||
}
|
||||
@@ -0,0 +1,50 @@
|
||||
{
|
||||
"version_id": "20260414-143742",
|
||||
"weights": {
|
||||
"inject_memory": {
|
||||
"input_length": 43.0,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.95,
|
||||
"top_skill_success_rate": 0.9,
|
||||
"top_tool_confidence": 0.95,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"load_skill": {
|
||||
"input_length": 44.0,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.95,
|
||||
"top_skill_success_rate": 0.9,
|
||||
"top_tool_confidence": 0.95,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"call_tool": {
|
||||
"input_length": 32.0,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9500000000000001,
|
||||
"top_skill_success_rate": 0.9000000000000001,
|
||||
"top_tool_confidence": 0.9500000000000001,
|
||||
"top_tool_risk": 0.0
|
||||
}
|
||||
},
|
||||
"feature_keys": [
|
||||
"input_length",
|
||||
"memory_count",
|
||||
"skill_count",
|
||||
"tool_count",
|
||||
"top_memory_confidence",
|
||||
"top_skill_success_rate",
|
||||
"top_tool_confidence",
|
||||
"top_tool_risk"
|
||||
],
|
||||
"metadata": {
|
||||
"avg_reward": 1.04,
|
||||
"task_count": 3,
|
||||
"source": "wrapup_workflow"
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,50 @@
|
||||
{
|
||||
"version_id": "20260414-152738",
|
||||
"weights": {
|
||||
"load_skill": {
|
||||
"input_length": 42.15803814713897,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9499999999999997,
|
||||
"top_skill_success_rate": 0.9,
|
||||
"top_tool_confidence": 0.9499999999999997,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"call_tool": {
|
||||
"input_length": 32.0,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.95,
|
||||
"top_skill_success_rate": 0.9000000000000001,
|
||||
"top_tool_confidence": 0.95,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"inject_memory": {
|
||||
"input_length": 42.99999999999999,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.95,
|
||||
"top_skill_success_rate": 0.8999999999999999,
|
||||
"top_tool_confidence": 0.95,
|
||||
"top_tool_risk": 0.0
|
||||
}
|
||||
},
|
||||
"feature_keys": [
|
||||
"input_length",
|
||||
"memory_count",
|
||||
"skill_count",
|
||||
"tool_count",
|
||||
"top_memory_confidence",
|
||||
"top_skill_success_rate",
|
||||
"top_tool_confidence",
|
||||
"top_tool_risk"
|
||||
],
|
||||
"metadata": {
|
||||
"avg_reward": 1.04,
|
||||
"task_count": 3,
|
||||
"source": "wrapup_workflow"
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,55 @@
|
||||
{
|
||||
"version_id": "20260414-155224",
|
||||
"weights": {
|
||||
"load_skill": {
|
||||
"input_length": 42.38663484486874,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9499999999999997,
|
||||
"top_skill_success_rate": 0.9,
|
||||
"top_tool_confidence": 0.9499999999999997,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"call_tool": {
|
||||
"input_length": 32.0,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9500000000000001,
|
||||
"top_skill_success_rate": 0.9000000000000001,
|
||||
"top_tool_confidence": 0.9500000000000001,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"inject_memory": {
|
||||
"input_length": 41.75894988066825,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9499999999999997,
|
||||
"top_skill_success_rate": 0.8999999999999999,
|
||||
"top_tool_confidence": 0.9499999999999997,
|
||||
"top_tool_risk": 0.0
|
||||
}
|
||||
},
|
||||
"feature_keys": [
|
||||
"input_length",
|
||||
"memory_count",
|
||||
"skill_count",
|
||||
"tool_count",
|
||||
"top_memory_confidence",
|
||||
"top_skill_success_rate",
|
||||
"top_tool_confidence",
|
||||
"top_tool_risk"
|
||||
],
|
||||
"metadata": {
|
||||
"source": "online_learning",
|
||||
"benchmark_summary": {
|
||||
"reward_delta": 0.154,
|
||||
"error_rate_delta": 0.0,
|
||||
"latency_delta_ms": -21.0,
|
||||
"baseline_avg_reward": 0.886,
|
||||
"challenger_avg_reward": 1.04
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,65 @@
|
||||
{
|
||||
"version_id": "20260414-165018",
|
||||
"weights": {
|
||||
"load_skill": {
|
||||
"input_length": 41.594896331738454,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9499999999999998,
|
||||
"top_skill_success_rate": 0.9000000000000001,
|
||||
"top_tool_confidence": 0.9499999999999998,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"call_tool": {
|
||||
"input_length": 32.85406896551724,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.95,
|
||||
"top_skill_success_rate": 0.9,
|
||||
"top_tool_confidence": 0.95,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"clarify": {
|
||||
"input_length": 51.0,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.95,
|
||||
"top_skill_success_rate": 0.9,
|
||||
"top_tool_confidence": 0.95,
|
||||
"top_tool_risk": 0.0
|
||||
},
|
||||
"inject_memory": {
|
||||
"input_length": 41.45435244161358,
|
||||
"memory_count": 1.0,
|
||||
"skill_count": 1.0,
|
||||
"tool_count": 1.0,
|
||||
"top_memory_confidence": 0.9499999999999996,
|
||||
"top_skill_success_rate": 0.8999999999999998,
|
||||
"top_tool_confidence": 0.9499999999999996,
|
||||
"top_tool_risk": 0.0
|
||||
}
|
||||
},
|
||||
"feature_keys": [
|
||||
"input_length",
|
||||
"memory_count",
|
||||
"skill_count",
|
||||
"tool_count",
|
||||
"top_memory_confidence",
|
||||
"top_skill_success_rate",
|
||||
"top_tool_confidence",
|
||||
"top_tool_risk"
|
||||
],
|
||||
"metadata": {
|
||||
"source": "online_learning",
|
||||
"benchmark_summary": {
|
||||
"reward_delta": -0.446,
|
||||
"error_rate_delta": 0.0,
|
||||
"latency_delta_ms": -21.0,
|
||||
"baseline_avg_reward": 0.886,
|
||||
"challenger_avg_reward": 0.44
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,52 @@
|
||||
{
|
||||
"report_id": "report-886de309-18d0-4be6-b626-0f7d2edc8b72",
|
||||
"timestamp": "2026-04-14T15:52:24.610516+00:00",
|
||||
"source_trajectory_ids": [
|
||||
"traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
|
||||
"traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
|
||||
"traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
|
||||
"traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
|
||||
"traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
|
||||
"traj-707b1dec-1d9a-4a71-a07a-54841155103c",
|
||||
"traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
|
||||
"traj-819443a2-79ea-48b7-a543-8bb7356dba36",
|
||||
"traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
|
||||
"traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
|
||||
"traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
|
||||
"traj-c9c11bdc-852b-4aef-851c-f2968806e535",
|
||||
"traj-d2d3a115-36d8-466f-9d14-bf741316f698",
|
||||
"traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
|
||||
"traj-dd361c81-40a1-4892-9914-2140870fff95"
|
||||
],
|
||||
"sample_count": 21,
|
||||
"baseline_metrics": {
|
||||
"task_count": 4,
|
||||
"avg_reward": 0.886,
|
||||
"error_rate": 0.0,
|
||||
"avg_latency_ms": 21.0
|
||||
},
|
||||
"challenger_metrics": {
|
||||
"task_count": 4,
|
||||
"avg_reward": 1.04,
|
||||
"error_rate": 0.0,
|
||||
"avg_latency_ms": 0.0
|
||||
},
|
||||
"promotion_decision": {
|
||||
"accepted": true,
|
||||
"reasons": [],
|
||||
"metrics": {
|
||||
"reward_delta": 0.154,
|
||||
"error_rate_delta": 0.0,
|
||||
"latency_delta_ms": -21.0,
|
||||
"baseline_avg_reward": 0.886,
|
||||
"challenger_avg_reward": 1.04
|
||||
}
|
||||
},
|
||||
"promoted_version_id": "20260414-155224"
|
||||
}
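
The `promotion_decision.metrics` values in the report above are consistent with simple challenger-minus-baseline differences. The sketch below recomputes them from `baseline_metrics` and `challenger_metrics`. The function name and the rounding to three decimals are assumptions; the source does not show how memabra actually derives these fields.

```python
from __future__ import annotations


def benchmark_summary(baseline: dict[str, float], challenger: dict[str, float]) -> dict[str, float]:
    """Recompute the delta fields as challenger minus baseline (assumed formula)."""
    return {
        "reward_delta": round(challenger["avg_reward"] - baseline["avg_reward"], 3),
        "error_rate_delta": round(challenger["error_rate"] - baseline["error_rate"], 3),
        "latency_delta_ms": round(challenger["avg_latency_ms"] - baseline["avg_latency_ms"], 3),
        "baseline_avg_reward": baseline["avg_reward"],
        "challenger_avg_reward": challenger["avg_reward"],
    }


# Values copied from the report above.
baseline = {"task_count": 4, "avg_reward": 0.886, "error_rate": 0.0, "avg_latency_ms": 21.0}
challenger = {"task_count": 4, "avg_reward": 1.04, "error_rate": 0.0, "avg_latency_ms": 0.0}
summary = benchmark_summary(baseline, challenger)
```

Under this reading, a negative `latency_delta_ms` means the challenger was faster, and a positive `reward_delta` means it scored higher than the baseline.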
|
||||
@@ -0,0 +1,60 @@
|
||||
{
|
||||
"report_id": "report-e7050e1f-fa3c-42e4-9178-e57f69b2dc1d",
|
||||
"timestamp": "2026-04-14T16:50:18.866221+00:00",
|
||||
"source_trajectory_ids": [
|
||||
"traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
|
||||
"traj-217ccafa-716c-4534-813b-a489ed7d6079",
|
||||
"traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
|
||||
"traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
|
||||
"traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
|
||||
"traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
|
||||
"traj-707b1dec-1d9a-4a71-a07a-54841155103c",
|
||||
"traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
|
||||
"traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
|
||||
"traj-819443a2-79ea-48b7-a543-8bb7356dba36",
|
||||
"traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
|
||||
"traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"traj-b786c15f-388d-4228-9da4-c9e82b61570a",
|
||||
"traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
|
||||
"traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
|
||||
"traj-c9c11bdc-852b-4aef-851c-f2968806e535",
|
||||
"traj-d2d3a115-36d8-466f-9d14-bf741316f698",
|
||||
"traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
|
||||
"traj-dd361c81-40a1-4892-9914-2140870fff95",
|
||||
"traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb",
|
||||
"traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17",
|
||||
"traj-f1d895a0-5442-448f-8936-4ee8b07822e6",
|
||||
"traj-ffb40d01-7956-4d7b-a41c-9618487fe619"
|
||||
],
|
||||
"sample_count": 29,
|
||||
"baseline_metrics": {
|
||||
"task_count": 4,
|
||||
"avg_reward": 0.886,
|
||||
"error_rate": 0.0,
|
||||
"avg_latency_ms": 21.0
|
||||
},
|
||||
"challenger_metrics": {
|
||||
"task_count": 4,
|
||||
"avg_reward": 0.44,
|
||||
"error_rate": 0.0,
|
||||
"avg_latency_ms": 0.0
|
||||
},
|
||||
"promotion_decision": {
|
||||
"accepted": true,
|
||||
"reasons": [],
|
||||
"metrics": {
|
||||
"reward_delta": -0.446,
|
||||
"error_rate_delta": 0.0,
|
||||
"latency_delta_ms": -21.0,
|
||||
"baseline_avg_reward": 0.886,
|
||||
"challenger_avg_reward": 0.44
|
||||
}
|
||||
},
|
||||
"promoted_version_id": "20260414-165018"
|
||||
}
|
||||
@@ -0,0 +1,192 @@
|
||||
{
|
||||
"trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"task": {
|
||||
"task_id": "task-5977495f-189b-4a87-8924-4834bded854c",
|
||||
"input": "Check the current system status.",
|
||||
"channel": "local",
|
||||
"created_at": "2026-04-14T14:37:42.381631+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "load_skill",
|
||||
"selected_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Predicted by learning router (score=1413.615).",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-501ed3a1-622f-4e8a-b90b-2fb0384d89bd",
|
||||
"trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"timestamp": "2026-04-14T14:37:42.381702+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Check the current system status."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-4b6839de-ac61-414f-8939-3ba335a93cfa",
|
||||
"trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"timestamp": "2026-04-14T14:37:42.381707+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-501ed3a1-622f-4e8a-b90b-2fb0384d89bd"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-1b229a15-af51-4924-932d-4d0318f0ba26",
|
||||
"trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"timestamp": "2026-04-14T14:37:42.381711+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "load_skill",
|
||||
"selected_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Predicted by learning router (score=1413.615).",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-4b6839de-ac61-414f-8939-3ba335a93cfa"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-skill-traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd-skill-deploy",
|
||||
"trajectory_id": "traj-004e53d5-006c-4e61-91a4-dc51cf7ee9bd",
|
||||
"timestamp": "2026-04-14T14:37:42.381718+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "skill_loaded",
|
||||
"payload": {
|
||||
"skill_id": "skill-deploy",
|
||||
"input": "Check the current system status.",
|
||||
"instructions": "Demo skill payload loaded successfully."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 0,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.04,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.2,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.0,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.1
|
||||
}
|
||||
}
|
||||
}
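
The `reward.total` in the trajectory above matches the sum of the three gain components minus the four penalty components. The sketch below encodes that reading. The split into gains and penalties, the function name, and the rounding are inferred from the numbers in this demo data and are not a confirmed description of memabra's reward aggregation.

```python
from __future__ import annotations

# Assumed grouping, inferred from the demo numbers.
GAIN_KEYS = ("task_success", "retrieval_hit", "useful_reuse")
PENALTY_KEYS = ("tool_error", "user_correction", "latency", "context_cost")


def total_reward(components: dict[str, float]) -> float:
    """Sum the gain components and subtract the penalty components."""
    gained = sum(components[key] for key in GAIN_KEYS)
    lost = sum(components[key] for key in PENALTY_KEYS)
    return round(gained - lost, 3)


# Components copied from the trajectory above.
components = {
    "task_success": 0.8,
    "retrieval_hit": 0.2,
    "tool_error": 0.0,
    "user_correction": 0.0,
    "latency": 0.0,
    "context_cost": 0.06,
    "useful_reuse": 0.1,
}
total = total_reward(components)
```

For the components above this gives 0.8 + 0.2 + 0.1 - 0.06 = 1.04, which agrees with the stored `total`.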
|
||||
@@ -0,0 +1,207 @@
|
||||
{
|
||||
"trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"task": {
|
||||
"task_id": "task-78a318e6-c8b4-4d05-bfd8-2ebe4b19710f",
|
||||
"input": "Check the current system status.",
|
||||
"channel": "local",
|
||||
"created_at": "2026-04-14T15:27:38.518486+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "call_tool",
|
||||
"selected_ids": [
|
||||
"tool-terminal"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task asks for current state or external action; tool use is justified.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-be0db4ba-93b9-4cf7-bd76-51c1af70c6d4",
|
||||
"trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"timestamp": "2026-04-14T15:27:38.518550+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Check the current system status."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-fb7734b7-bdab-4e24-8dec-a9debf02529d",
|
||||
"trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"timestamp": "2026-04-14T15:27:38.518556+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-be0db4ba-93b9-4cf7-bd76-51c1af70c6d4"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-8ed4e73b-2b45-44a6-9ab6-cc6184202dc0",
|
||||
"trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"timestamp": "2026-04-14T15:27:38.518561+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "call_tool",
|
||||
"selected_ids": [
|
||||
"tool-terminal"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task asks for current state or external action; tool use is justified.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-fb7734b7-bdab-4e24-8dec-a9debf02529d"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-tool-traj-120aec7e-a74d-42d6-8846-c472680cc2f3-tool-terminal",
|
||||
"trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"timestamp": "2026-04-14T15:27:38.518572+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "tool_called",
|
||||
"payload": {
|
||||
"tool_id": "tool-terminal",
|
||||
"input": "Check the current system status."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-tool-result-traj-120aec7e-a74d-42d6-8846-c472680cc2f3-tool-terminal",
|
||||
"trajectory_id": "traj-120aec7e-a74d-42d6-8846-c472680cc2f3",
|
||||
"timestamp": "2026-04-14T15:27:38.518575+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "tool_result",
|
||||
"payload": {
|
||||
"tool_id": "tool-terminal",
|
||||
"status": "success",
|
||||
"output": "demo-result-for:tool-terminal",
|
||||
"error": null,
|
||||
"latency_ms": 42
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 42,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.032,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.25,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.008,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.05
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,207 @@
|
||||
{
|
||||
"trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"task": {
|
||||
"task_id": "task-c0d9120f-4b28-4815-bcbc-1ea1cb523129",
|
||||
"input": "Check the current system status.",
|
||||
"channel": "telegram",
|
||||
"created_at": "2026-04-14T15:27:38.512676+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "call_tool",
|
||||
"selected_ids": [
|
||||
"tool-terminal"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task asks for current state or external action; tool use is justified.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-2e159144-a5dc-4bab-bb15-026b156788a7",
|
||||
"trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"timestamp": "2026-04-14T15:27:38.512756+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Check the current system status."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-84681604-ee59-4618-8b1b-bdc521e58e7d",
|
||||
"trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"timestamp": "2026-04-14T15:27:38.512762+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-2e159144-a5dc-4bab-bb15-026b156788a7"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-6404a35f-8775-4fc1-9648-62a27f4a1b23",
|
||||
"trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"timestamp": "2026-04-14T15:27:38.512767+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "call_tool",
|
||||
"selected_ids": [
|
||||
"tool-terminal"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task asks for current state or external action; tool use is justified.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-84681604-ee59-4618-8b1b-bdc521e58e7d"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-tool-traj-179d0c19-3f0f-4429-a85b-3e01802290d3-tool-terminal",
|
||||
"trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"timestamp": "2026-04-14T15:27:38.512781+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "tool_called",
|
||||
"payload": {
|
||||
"tool_id": "tool-terminal",
|
||||
"input": "Check the current system status."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-tool-result-traj-179d0c19-3f0f-4429-a85b-3e01802290d3-tool-terminal",
|
||||
"trajectory_id": "traj-179d0c19-3f0f-4429-a85b-3e01802290d3",
|
||||
"timestamp": "2026-04-14T15:27:38.512785+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "tool_result",
|
||||
"payload": {
|
||||
"tool_id": "tool-terminal",
|
||||
"status": "success",
|
||||
"output": "demo-result-for:tool-terminal",
|
||||
"error": null,
|
||||
"latency_ms": 42
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 42,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.032,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.25,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.008,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.05
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,192 @@
|
||||
{
|
||||
"trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
|
||||
"task": {
|
||||
"task_id": "task-f3701d8c-4931-4e43-8488-5fc670e5b2b1",
|
||||
"input": "Deploy this service with the usual workflow.",
|
||||
"channel": "local",
|
||||
"created_at": "2026-04-14T14:37:42.380802+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "load_skill",
|
||||
"selected_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task resembles a reusable procedure; load a skill before action.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-480e859f-7e5f-42f0-bfcc-f3cb954f75d5",
|
||||
"trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
|
||||
"timestamp": "2026-04-14T14:37:42.380861+00:00",
|
||||
"stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-398d16c2-3d12-44a7-8af2-aa306e20195c",
      "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
      "timestamp": "2026-04-14T14:37:42.380867+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-480e859f-7e5f-42f0-bfcc-f3cb954f75d5"
    },
    {
      "event_id": "evt-b63063ea-1ac7-4b85-a6c7-76a03791bc85",
      "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
      "timestamp": "2026-04-14T14:37:42.380871+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task resembles a reusable procedure; load a skill before action.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-398d16c2-3d12-44a7-8af2-aa306e20195c"
    },
    {
      "event_id": "evt-skill-traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303-skill-deploy",
      "trajectory_id": "traj-1ac5bb3d-f865-4c8c-8ff4-a9c29472b303",
      "timestamp": "2026-04-14T14:37:42.380877+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Deploy this service with the usual workflow.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,170 @@
{
  "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0",
  "task": {
    "task_id": "task-bb730dc5-88ed-4455-9dbb-6cbba55ad0ce",
    "input": "Check current system status with a tool.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.864549+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "clarify",
      "selected_ids": [],
      "selected_payloads": [],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=2045.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-f491ed7a-0017-463f-a346-2b13aac2ef27",
      "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0",
      "timestamp": "2026-04-14T16:50:18.864653+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check current system status with a tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-9b88da4b-fe41-4522-ba53-e88adf3df3b4",
      "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0",
      "timestamp": "2026-04-14T16:50:18.864663+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-f491ed7a-0017-463f-a346-2b13aac2ef27"
    },
    {
      "event_id": "evt-2fc97f2c-8219-44d3-98c7-5a86ad88326d",
      "trajectory_id": "traj-1c2b1a9e-7290-4ea4-be52-c6ba60b72da0",
      "timestamp": "2026-04-14T16:50:18.864669+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "clarify",
        "selected_ids": [],
        "selected_payloads": [],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=2045.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-9b88da4b-fe41-4522-ba53-e88adf3df3b4"
    }
  ],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 0.44,
    "components": {
      "task_success": 0.4,
      "retrieval_hit": 0.1,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.0
    }
  }
}
@@ -0,0 +1,207 @@
{
  "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee",
  "task": {
    "task_id": "task-c5221ec3-e5b9-4a2f-9774-fbb75018fe08",
    "input": "Check current system status with a tool.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.862393+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": [
        "tool-terminal"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task asks for current state or external action; tool use is justified.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-93525bc5-5e71-481c-a7d4-0282ef59e0a3",
      "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee",
      "timestamp": "2026-04-14T16:50:18.862483+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check current system status with a tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-a01d1dff-a6dc-4c25-a5a5-14efd6f182b2",
      "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee",
      "timestamp": "2026-04-14T16:50:18.862492+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-93525bc5-5e71-481c-a7d4-0282ef59e0a3"
    },
    {
      "event_id": "evt-28946864-c699-42fd-9802-dbfe6cb09043",
      "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee",
      "timestamp": "2026-04-14T16:50:18.862498+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "call_tool",
        "selected_ids": [
          "tool-terminal"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task asks for current state or external action; tool use is justified.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-a01d1dff-a6dc-4c25-a5a5-14efd6f182b2"
    },
    {
      "event_id": "evt-tool-traj-1ea60d6e-0b83-4cdf-a601-159373c780ee-tool-terminal",
      "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee",
      "timestamp": "2026-04-14T16:50:18.862511+00:00",
      "stage": "execution",
      "event_type": "tool_called",
      "payload": {
        "tool_id": "tool-terminal",
        "input": "Check current system status with a tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-tool-result-traj-1ea60d6e-0b83-4cdf-a601-159373c780ee-tool-terminal",
      "trajectory_id": "traj-1ea60d6e-0b83-4cdf-a601-159373c780ee",
      "timestamp": "2026-04-14T16:50:18.862515+00:00",
      "stage": "execution",
      "event_type": "tool_result",
      "payload": {
        "tool_id": "tool-terminal",
        "status": "success",
        "output": "demo-result-for:tool-terminal",
        "error": null,
        "latency_ms": 42
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 42,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.032,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.25,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.008,
      "context_cost": 0.06,
      "useful_reuse": 0.05
    }
  }
}
@@ -0,0 +1,170 @@
{
  "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079",
  "task": {
    "task_id": "task-5f14e5ed-0635-44a0-82e8-419187b040f3",
    "input": "Use multiple capabilities: memory, skill, and tool.",
    "channel": "local",
    "created_at": "2026-04-14T15:52:24.605025+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "clarify",
      "selected_ids": [],
      "selected_payloads": [],
      "rejected_ids": [],
      "rationale": "No high-confidence route found from the current heuristic baseline.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-13ccd07e-9bfd-4ff8-8080-47c400f0be6f",
      "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079",
      "timestamp": "2026-04-14T15:52:24.605116+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use multiple capabilities: memory, skill, and tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-7ecaa289-b7bb-4ac6-ad62-9afb4a49d4a8",
      "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079",
      "timestamp": "2026-04-14T15:52:24.605126+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-13ccd07e-9bfd-4ff8-8080-47c400f0be6f"
    },
    {
      "event_id": "evt-ad398931-c79d-411a-93f8-8c5834f5446d",
      "trajectory_id": "traj-217ccafa-716c-4534-813b-a489ed7d6079",
      "timestamp": "2026-04-14T15:52:24.605138+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "clarify",
        "selected_ids": [],
        "selected_payloads": [],
        "rejected_ids": [],
        "rationale": "No high-confidence route found from the current heuristic baseline.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-7ecaa289-b7bb-4ac6-ad62-9afb4a49d4a8"
    }
  ],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 0.44,
    "components": {
      "task_success": 0.4,
      "retrieval_hit": 0.1,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.0
    }
  }
}
@@ -0,0 +1,185 @@
{
  "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
  "task": {
    "task_id": "task-aeed227c-2e87-45d8-8d98-e270656556b6",
    "input": "Use my telegram preference for this answer.",
    "channel": "telegram",
    "created_at": "2026-04-14T06:53:08.731336+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "rejected_ids": [],
      "rationale": "Task likely depends on stable user/project facts.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-d71b1fdf-5343-4ac1-89a0-75488c1ce30b",
      "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
      "timestamp": "2026-04-14T06:53:08.731418+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-1f750475-1127-41e5-9f94-c87e4b019ee2",
      "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
      "timestamp": "2026-04-14T06:53:08.731427+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-d71b1fdf-5343-4ac1-89a0-75488c1ce30b"
    },
    {
      "event_id": "evt-741967a5-41b9-4917-9b95-4047f89e6e19",
      "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
      "timestamp": "2026-04-14T06:53:08.731432+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "rejected_ids": [],
        "rationale": "Task likely depends on stable user/project facts.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-1f750475-1127-41e5-9f94-c87e4b019ee2"
    },
    {
      "event_id": "evt-memory-traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15-mem-telegram-pref",
      "trajectory_id": "traj-3f6687ff-3a55-4a26-a7bc-8397d8da7d15",
      "timestamp": "2026-04-14T06:53:08.731437+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.1,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.0,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,207 @@
{
  "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
  "task": {
    "task_id": "task-cde62e1c-0106-4803-9c7d-a0c2f58206d6",
    "input": "Check the current system status.",
    "channel": "local",
    "created_at": "2026-04-14T14:37:42.380386+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": [
        "tool-terminal"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task asks for current state or external action; tool use is justified.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-9252427a-3ceb-476a-b72d-a7e4f812194c",
      "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
      "timestamp": "2026-04-14T14:37:42.380442+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-333fbd7f-75b1-495f-acfa-6a66348ef16e",
      "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
      "timestamp": "2026-04-14T14:37:42.380447+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-9252427a-3ceb-476a-b72d-a7e4f812194c"
    },
    {
      "event_id": "evt-7f4eddba-f609-4d72-bf7c-cd6a938233a7",
      "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
      "timestamp": "2026-04-14T14:37:42.380452+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "call_tool",
        "selected_ids": [
          "tool-terminal"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task asks for current state or external action; tool use is justified.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-333fbd7f-75b1-495f-acfa-6a66348ef16e"
    },
    {
      "event_id": "evt-tool-traj-439e4552-f248-43cb-b4eb-25db14da1ebc-tool-terminal",
      "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
      "timestamp": "2026-04-14T14:37:42.380461+00:00",
      "stage": "execution",
      "event_type": "tool_called",
      "payload": {
        "tool_id": "tool-terminal",
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-tool-result-traj-439e4552-f248-43cb-b4eb-25db14da1ebc-tool-terminal",
      "trajectory_id": "traj-439e4552-f248-43cb-b4eb-25db14da1ebc",
      "timestamp": "2026-04-14T14:37:42.380464+00:00",
      "stage": "execution",
      "event_type": "tool_result",
      "payload": {
        "tool_id": "tool-terminal",
        "status": "success",
        "output": "demo-result-for:tool-terminal",
        "error": null,
        "latency_ms": 42
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 42,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.032,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.25,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.008,
      "context_cost": 0.06,
      "useful_reuse": 0.05
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
  "task": {
    "task_id": "task-0c82e670-45ab-45f9-af74-c5920f5eb9b3",
    "input": "Deploy this service with the usual workflow.",
    "channel": "telegram",
    "created_at": "2026-04-14T14:37:42.378256+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task resembles a reusable procedure; load a skill before action.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-757f035e-551f-4b55-a506-2aac41134885",
      "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
      "timestamp": "2026-04-14T14:37:42.378322+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-dfcdd452-1902-4a6c-97fc-fd6a993c2045",
      "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
      "timestamp": "2026-04-14T14:37:42.378327+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-757f035e-551f-4b55-a506-2aac41134885"
    },
    {
      "event_id": "evt-c680ed8f-a6b0-48d1-bcd4-7423089aa916",
      "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
      "timestamp": "2026-04-14T14:37:42.378332+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task resembles a reusable procedure; load a skill before action.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-dfcdd452-1902-4a6c-97fc-fd6a993c2045"
    },
    {
      "event_id": "evt-skill-traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5-skill-deploy",
      "trajectory_id": "traj-58ec7a90-3ada-4b78-bc6a-6351be4eb4b5",
      "timestamp": "2026-04-14T14:37:42.378339+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Deploy this service with the usual workflow.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,191 @@
{
  "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
  "task": {
    "task_id": "task-549e2de3-bb55-4797-a862-e59f8d69a7e5",
    "input": "Use my telegram preference for this answer.",
    "channel": "telegram",
    "created_at": "2026-04-14T15:27:38.519692+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1854.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-369333af-5ca9-4c11-b163-6144d925ba91",
      "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
      "timestamp": "2026-04-14T15:27:38.519774+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-51d31531-c49b-4af7-86f8-9fc3b5aff7a0",
      "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
      "timestamp": "2026-04-14T15:27:38.519780+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-369333af-5ca9-4c11-b163-6144d925ba91"
    },
    {
      "event_id": "evt-3a842acf-5111-4b77-98a2-2a18c5a4a61d",
      "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
      "timestamp": "2026-04-14T15:27:38.519784+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1854.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-51d31531-c49b-4af7-86f8-9fc3b5aff7a0"
    },
    {
      "event_id": "evt-memory-traj-6a5aaff5-9336-4a1d-b102-80f1196427ae-mem-telegram-pref",
      "trajectory_id": "traj-6a5aaff5-9336-4a1d-b102-80f1196427ae",
      "timestamp": "2026-04-14T15:27:38.519790+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,207 @@
{
  "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c",
  "task": {
    "task_id": "task-23d5816f-12f3-4247-8c4f-9c01d13b1fd8",
    "input": "Check the current system status.",
    "channel": "telegram",
    "created_at": "2026-04-14T14:37:42.377746+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": [
        "tool-terminal"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task asks for current state or external action; tool use is justified.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-15616207-b055-41b3-98e7-fca3fdd89ce9",
      "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c",
      "timestamp": "2026-04-14T14:37:42.377821+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-431bb458-0488-4712-93d5-d7a689048022",
      "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c",
      "timestamp": "2026-04-14T14:37:42.377827+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-15616207-b055-41b3-98e7-fca3fdd89ce9"
    },
    {
      "event_id": "evt-8bb2db02-56ae-4fad-a0bc-e30cd7fed98e",
      "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c",
      "timestamp": "2026-04-14T14:37:42.377831+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "call_tool",
        "selected_ids": [
          "tool-terminal"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task asks for current state or external action; tool use is justified.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-431bb458-0488-4712-93d5-d7a689048022"
    },
    {
      "event_id": "evt-tool-traj-707b1dec-1d9a-4a71-a07a-54841155103c-tool-terminal",
      "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c",
      "timestamp": "2026-04-14T14:37:42.377843+00:00",
      "stage": "execution",
      "event_type": "tool_called",
      "payload": {
        "tool_id": "tool-terminal",
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-tool-result-traj-707b1dec-1d9a-4a71-a07a-54841155103c-tool-terminal",
      "trajectory_id": "traj-707b1dec-1d9a-4a71-a07a-54841155103c",
      "timestamp": "2026-04-14T14:37:42.377846+00:00",
      "stage": "execution",
      "event_type": "tool_result",
      "payload": {
        "tool_id": "tool-terminal",
        "status": "success",
        "output": "demo-result-for:tool-terminal",
        "error": null,
        "latency_ms": 42
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 42,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.032,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.25,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.008,
      "context_cost": 0.06,
      "useful_reuse": 0.05
    }
  }
}
@@ -0,0 +1,207 @@
{
  "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
  "task": {
    "task_id": "task-e0c612c6-d846-4dc0-9c30-4a66d0a78d2a",
    "input": "Check current system status with a tool.",
    "channel": "local",
    "created_at": "2026-04-14T15:52:24.604470+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": [
        "tool-terminal"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task asks for current state or external action; tool use is justified.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-7befe34c-6cf6-422b-9615-11fd64b50899",
      "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
      "timestamp": "2026-04-14T15:52:24.604556+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check current system status with a tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-8533f7c9-696d-413d-8484-d434ffccdd02",
      "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
      "timestamp": "2026-04-14T15:52:24.604565+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-7befe34c-6cf6-422b-9615-11fd64b50899"
    },
    {
      "event_id": "evt-2f878de3-e77d-42f6-8252-b692a11a69ac",
      "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
      "timestamp": "2026-04-14T15:52:24.604571+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "call_tool",
        "selected_ids": [
          "tool-terminal"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task asks for current state or external action; tool use is justified.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-8533f7c9-696d-413d-8484-d434ffccdd02"
    },
    {
      "event_id": "evt-tool-traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1-tool-terminal",
      "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
      "timestamp": "2026-04-14T15:52:24.604584+00:00",
      "stage": "execution",
      "event_type": "tool_called",
      "payload": {
        "tool_id": "tool-terminal",
        "input": "Check current system status with a tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-tool-result-traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1-tool-terminal",
      "trajectory_id": "traj-74e92442-04fd-4f5a-979f-2dd81a7f08e1",
      "timestamp": "2026-04-14T15:52:24.604588+00:00",
      "stage": "execution",
      "event_type": "tool_result",
      "payload": {
        "tool_id": "tool-terminal",
        "status": "success",
        "output": "demo-result-for:tool-terminal",
        "error": null,
        "latency_ms": 42
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 42,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.032,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.25,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.008,
      "context_cost": 0.06,
      "useful_reuse": 0.05
    }
  }
}
@@ -0,0 +1,191 @@
{
  "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06",
  "task": {
    "task_id": "task-ad6649f7-dcca-4dd3-9521-3409c5f4e746",
    "input": "Recall my saved preference from memory.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.861213+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task likely depends on stable user/project facts.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-61f191c1-68c7-4f0b-ab9b-f22b131e2637",
      "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06",
      "timestamp": "2026-04-14T16:50:18.861293+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Recall my saved preference from memory."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-f0fbf671-18c5-4db2-86e5-68950b030992",
      "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06",
      "timestamp": "2026-04-14T16:50:18.861299+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-61f191c1-68c7-4f0b-ab9b-f22b131e2637"
    },
    {
      "event_id": "evt-168a76e7-3c64-4f65-8a74-0969942d6d94",
      "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06",
      "timestamp": "2026-04-14T16:50:18.861304+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task likely depends on stable user/project facts.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-f0fbf671-18c5-4db2-86e5-68950b030992"
    },
    {
      "event_id": "evt-memory-traj-77ab4624-013b-4f56-b600-b3e0cbef7a06-mem-telegram-pref",
      "trajectory_id": "traj-77ab4624-013b-4f56-b600-b3e0cbef7a06",
      "timestamp": "2026-04-14T16:50:18.861310+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Recall my saved preference from memory."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
  "task": {
    "task_id": "task-37fe7921-66da-4390-a9bf-31209ae8a890",
    "input": "Use my telegram preference for this answer.",
    "channel": "telegram",
    "created_at": "2026-04-14T14:37:42.381229+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1897.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-01cb59f2-27b0-4be7-b0f9-c878634363ba",
      "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
      "timestamp": "2026-04-14T14:37:42.381299+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-4281fe16-c753-4024-a0ff-e82f518e16dc",
      "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
      "timestamp": "2026-04-14T14:37:42.381305+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-01cb59f2-27b0-4be7-b0f9-c878634363ba"
    },
    {
      "event_id": "evt-ccc6afd4-82c2-4774-ba2a-732ffa9296a4",
      "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
      "timestamp": "2026-04-14T14:37:42.381309+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1897.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-4281fe16-c753-4024-a0ff-e82f518e16dc"
    },
    {
      "event_id": "evt-skill-traj-80784ce5-fc14-4fee-9f5f-90dcec26179b-skill-deploy",
      "trajectory_id": "traj-80784ce5-fc14-4fee-9f5f-90dcec26179b",
      "timestamp": "2026-04-14T14:37:42.381314+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Use my telegram preference for this answer.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,191 @@
{
  "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36",
  "task": {
    "task_id": "task-8e991184-4d09-47bd-9a70-2f3d591d875c",
    "input": "Use my telegram preference for this answer.",
    "channel": "telegram",
    "created_at": "2026-04-14T14:37:42.377206+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task likely depends on stable user/project facts.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-79db8272-c394-40e1-b0d3-c905c305ea26",
      "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36",
      "timestamp": "2026-04-14T14:37:42.377281+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-22367f19-007b-49cd-9ac4-30bbcc77e8a2",
      "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36",
      "timestamp": "2026-04-14T14:37:42.377287+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-79db8272-c394-40e1-b0d3-c905c305ea26"
    },
    {
      "event_id": "evt-84fe05fe-8ccc-4782-8cd2-28d56a659658",
      "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36",
      "timestamp": "2026-04-14T14:37:42.377292+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task likely depends on stable user/project facts.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-22367f19-007b-49cd-9ac4-30bbcc77e8a2"
    },
    {
      "event_id": "evt-memory-traj-819443a2-79ea-48b7-a543-8bb7356dba36-mem-telegram-pref",
      "trajectory_id": "traj-819443a2-79ea-48b7-a543-8bb7356dba36",
      "timestamp": "2026-04-14T14:37:42.377297+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
  "task": {
    "task_id": "task-57677ff6-710a-478e-9a5d-e1367db05212",
    "input": "Deploy this service with the usual workflow.",
    "channel": "telegram",
    "created_at": "2026-04-14T15:27:38.514525+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task resembles a reusable procedure; load a skill before action.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-fce4e540-2400-45f8-8050-50f7631422e4",
      "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
      "timestamp": "2026-04-14T15:27:38.514602+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-eef1d203-79d4-4037-ae0b-6dff74e035f5",
      "trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
      "timestamp": "2026-04-14T15:27:38.514609+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-fce4e540-2400-45f8-8050-50f7631422e4"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-da150fe5-beff-45b0-a67d-9860205a9690",
|
||||
"trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
|
||||
"timestamp": "2026-04-14T15:27:38.514615+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "load_skill",
|
||||
"selected_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task resembles a reusable procedure; load a skill before action.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-eef1d203-79d4-4037-ae0b-6dff74e035f5"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-skill-traj-9144cbc3-1ccf-4660-aad9-8db5797461eb-skill-deploy",
|
||||
"trajectory_id": "traj-9144cbc3-1ccf-4660-aad9-8db5797461eb",
|
||||
"timestamp": "2026-04-14T15:27:38.514623+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "skill_loaded",
|
||||
"payload": {
|
||||
"skill_id": "skill-deploy",
|
||||
"input": "Deploy this service with the usual workflow.",
|
||||
"instructions": "Demo skill payload loaded successfully."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 0,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.04,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.2,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.0,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,191 @@
|
||||
{
|
||||
"trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"task": {
|
||||
"task_id": "task-9f58c7ff-0bfb-4a46-bfbc-94b72b454f44",
|
||||
"input": "Use my telegram preference for this answer.",
|
||||
"channel": "telegram",
|
||||
"created_at": "2026-04-14T14:37:42.379938+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "inject_memory",
|
||||
"selected_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task likely depends on stable user/project facts.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-1e3b099e-dc40-45e4-9710-3d7f96dc459c",
|
||||
"trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"timestamp": "2026-04-14T14:37:42.379999+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Use my telegram preference for this answer."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-95d17ab2-a0af-44e6-97db-55600c5d0517",
|
||||
"trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"timestamp": "2026-04-14T14:37:42.380024+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-1e3b099e-dc40-45e4-9710-3d7f96dc459c"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-cef79d76-9bcf-41c7-a430-13e18d46e95f",
|
||||
"trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"timestamp": "2026-04-14T14:37:42.380029+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "inject_memory",
|
||||
"selected_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task likely depends on stable user/project facts.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-95d17ab2-a0af-44e6-97db-55600c5d0517"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-memory-traj-9190707c-5486-4266-a6c8-32f34c6c63ec-mem-telegram-pref",
|
||||
"trajectory_id": "traj-9190707c-5486-4266-a6c8-32f34c6c63ec",
|
||||
"timestamp": "2026-04-14T14:37:42.380034+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "memory_injected",
|
||||
"payload": {
|
||||
"record_id": "mem-telegram-pref",
|
||||
"input": "Use my telegram preference for this answer."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 0,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.04,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.2,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.0,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,207 @@
|
||||
{
|
||||
"trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"task": {
|
||||
"task_id": "task-18b8251b-4a68-45e1-93ba-645fe21a279f",
|
||||
"input": "Run the deploy workflow skill.",
|
||||
"channel": "local",
|
||||
"created_at": "2026-04-14T15:52:24.603850+00:00",
|
||||
"user_id": null
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "call_tool",
|
||||
"selected_ids": [
|
||||
"tool-terminal"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task asks for current state or external action; tool use is justified.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-33bd0017-cf1b-44ac-892b-c2004bc44c1a",
|
||||
"trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"timestamp": "2026-04-14T15:52:24.603951+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Run the deploy workflow skill."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-de288e29-228d-46d4-a657-34edae35fea4",
|
||||
"trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"timestamp": "2026-04-14T15:52:24.603961+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-33bd0017-cf1b-44ac-892b-c2004bc44c1a"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-256cd272-bcee-48e2-b36b-a4048b6aef3e",
|
||||
"trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"timestamp": "2026-04-14T15:52:24.603968+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "call_tool",
|
||||
"selected_ids": [
|
||||
"tool-terminal"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task asks for current state or external action; tool use is justified.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-de288e29-228d-46d4-a657-34edae35fea4"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-tool-traj-9edc5088-09cc-42d6-a160-cede5357f535-tool-terminal",
|
||||
"trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"timestamp": "2026-04-14T15:52:24.603984+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "tool_called",
|
||||
"payload": {
|
||||
"tool_id": "tool-terminal",
|
||||
"input": "Run the deploy workflow skill."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-tool-result-traj-9edc5088-09cc-42d6-a160-cede5357f535-tool-terminal",
|
||||
"trajectory_id": "traj-9edc5088-09cc-42d6-a160-cede5357f535",
|
||||
"timestamp": "2026-04-14T15:52:24.603990+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "tool_result",
|
||||
"payload": {
|
||||
"tool_id": "tool-terminal",
|
||||
"status": "success",
|
||||
"output": "demo-result-for:tool-terminal",
|
||||
"error": null,
|
||||
"latency_ms": 42
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 42,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.032,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.25,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.008,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.05
|
||||
}
|
||||
}
|
||||
}
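The `reward.total` values in these sample trajectories appear consistent with a simple signed sum of the `components`: `task_success`, `retrieval_hit`, and `useful_reuse` count as gains, while `tool_error`, `user_correction`, `latency`, and `context_cost` count as penalties. For example, 0.8 + 0.25 + 0.05 - 0.008 - 0.06 = 1.032. The sketch below is a sanity check under that assumption only. The sign convention, the rounding to three decimals, and the helper names `reward_total` and `check_trajectory` are inferred from the fixtures for illustration and are not taken from memabra's source.

```python
import math
from typing import Mapping

# Assumed sign convention, inferred from the sample fixtures only.
POSITIVE_KEYS = ("task_success", "retrieval_hit", "useful_reuse")
PENALTY_KEYS = ("tool_error", "user_correction", "latency", "context_cost")


def reward_total(components: Mapping[str, float]) -> float:
    """Return gains minus penalties, rounded to three decimals (assumed)."""
    gains = sum(components.get(key, 0.0) for key in POSITIVE_KEYS)
    penalties = sum(components.get(key, 0.0) for key in PENALTY_KEYS)
    return round(gains - penalties, 3)


def check_trajectory(trajectory: Mapping) -> bool:
    """Check that a trajectory's stored total matches its components."""
    reward = trajectory["reward"]
    return math.isclose(
        reward_total(reward["components"]), reward["total"], abs_tol=1e-9
    )
```

A check like this could be run over every fixture file after loading it with `json.load`, to flag any trajectory whose stored total drifts from its components.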
|
||||
@@ -0,0 +1,191 @@
|
||||
{
|
||||
"trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"task": {
|
||||
"task_id": "task-66d9a459-4bad-40a5-beda-a9cb30f2e790",
|
||||
"input": "Use my telegram preference for this answer.",
|
||||
"channel": "telegram",
|
||||
"created_at": "2026-04-14T15:27:38.517870+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "inject_memory",
|
||||
"selected_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task likely depends on stable user/project facts.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-57882c3b-f081-4cb6-b622-98594bfd7b82",
|
||||
"trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"timestamp": "2026-04-14T15:27:38.517938+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Use my telegram preference for this answer."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-6eafb4ae-7960-4f17-a928-77834f432cbb",
|
||||
"trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"timestamp": "2026-04-14T15:27:38.517945+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-57882c3b-f081-4cb6-b622-98594bfd7b82"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-b90de7a7-83a3-4bed-b63d-bf07ba3fc06a",
|
||||
"trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"timestamp": "2026-04-14T15:27:38.517950+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "inject_memory",
|
||||
"selected_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task likely depends on stable user/project facts.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-6eafb4ae-7960-4f17-a928-77834f432cbb"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-memory-traj-adb05c91-4c0c-493a-af84-517efea3f406-mem-telegram-pref",
|
||||
"trajectory_id": "traj-adb05c91-4c0c-493a-af84-517efea3f406",
|
||||
"timestamp": "2026-04-14T15:27:38.517955+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "memory_injected",
|
||||
"payload": {
|
||||
"record_id": "mem-telegram-pref",
|
||||
"input": "Use my telegram preference for this answer."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 0,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.04,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.2,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.0,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,186 @@
|
||||
{
|
||||
"trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"task": {
|
||||
"task_id": "task-c88d23cc-88f6-4352-a506-e37187a0e28a",
|
||||
"input": "Deploy this service with the usual workflow.",
|
||||
"channel": "telegram",
|
||||
"created_at": "2026-04-14T06:53:08.732451+00:00",
|
||||
"user_id": "oza"
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "load_skill",
|
||||
"selected_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task resembles a reusable procedure; load a skill before action.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-56b47bb2-7cd9-4d1a-9364-b2b6c2b82759",
|
||||
"trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"timestamp": "2026-04-14T06:53:08.732515+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Deploy this service with the usual workflow."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-62bc72e7-4b3f-4a72-a98e-1ad5bf86aaa4",
|
||||
"trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"timestamp": "2026-04-14T06:53:08.732521+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-56b47bb2-7cd9-4d1a-9364-b2b6c2b82759"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-23968c32-845c-4fb2-86bb-723d70dfec80",
|
||||
"trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"timestamp": "2026-04-14T06:53:08.732525+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "load_skill",
|
||||
"selected_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task resembles a reusable procedure; load a skill before action.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-62bc72e7-4b3f-4a72-a98e-1ad5bf86aaa4"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-skill-traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc-skill-deploy",
|
||||
"trajectory_id": "traj-affbeb5b-eb52-40fd-94cb-48b7c374f1fc",
|
||||
"timestamp": "2026-04-14T06:53:08.732531+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "skill_loaded",
|
||||
"payload": {
|
||||
"skill_id": "skill-deploy",
|
||||
"input": "Deploy this service with the usual workflow.",
|
||||
"instructions": "Demo skill payload loaded successfully."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 0,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.1,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.2,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.0,
|
||||
"context_cost": 0.0,
|
||||
"useful_reuse": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,191 @@
|
||||
{
|
||||
"trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a",
|
||||
"task": {
|
||||
"task_id": "task-920b26df-8e03-47b3-af48-99454d142e90",
|
||||
"input": "Recall my saved preference from memory.",
|
||||
"channel": "local",
|
||||
"created_at": "2026-04-14T15:52:24.603298+00:00",
|
||||
"user_id": null
|
||||
},
|
||||
"context_snapshot": {
|
||||
"conversation_summary": "",
|
||||
"environment_summary": "",
|
||||
"recent_failures": []
|
||||
},
|
||||
"candidate_sets": {
|
||||
"memory": [
|
||||
{
|
||||
"id": "mem-telegram-pref",
|
||||
"type": "memory",
|
||||
"title": "Telegram preference",
|
||||
"summary": "Prefer plain text on Telegram.",
|
||||
"triggers": [
|
||||
"telegram",
|
||||
"preference",
|
||||
"answer"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.9,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"output"
|
||||
],
|
||||
"source": "user",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"skill": [
|
||||
{
|
||||
"id": "skill-deploy",
|
||||
"type": "skill",
|
||||
"title": "Deploy workflow",
|
||||
"summary": "Reusable deployment workflow.",
|
||||
"triggers": [
|
||||
"deploy",
|
||||
"workflow",
|
||||
"service"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.8,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 0.8,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"ops"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
],
|
||||
"tool": [
|
||||
{
|
||||
"id": "tool-terminal",
|
||||
"type": "tool",
|
||||
"title": "terminal",
|
||||
"summary": "Run terminal-style inspection commands.",
|
||||
"triggers": [
|
||||
"check",
|
||||
"current",
|
||||
"status",
|
||||
"system"
|
||||
],
|
||||
"cost": 0.0,
|
||||
"confidence": 0.95,
|
||||
"success_rate": 0.9,
|
||||
"freshness": 1.0,
|
||||
"risk": 0.0,
|
||||
"tags": [
|
||||
"inspection"
|
||||
],
|
||||
"source": "system",
|
||||
"type_payload": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"decisions": [
|
||||
{
|
||||
"step": 1,
|
||||
"decision_type": "inject_memory",
|
||||
"selected_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task likely depends on stable user/project facts.",
|
||||
"estimated_cost": 0.0
|
||||
}
|
||||
],
|
||||
"events": [
|
||||
{
|
||||
"event_id": "evt-795ad519-4e78-4fdd-b1a9-3e1e2b2cdea0",
|
||||
"trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a",
|
||||
"timestamp": "2026-04-14T15:52:24.603384+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "task_received",
|
||||
"payload": {
|
||||
"input": "Recall my saved preference from memory."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
},
|
||||
{
|
||||
"event_id": "evt-1fbe3cfc-ed78-40f6-b0d9-25ccd14a0110",
|
||||
"trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a",
|
||||
"timestamp": "2026-04-14T15:52:24.603390+00:00",
|
||||
"stage": "retrieval",
|
||||
"event_type": "candidates_recalled",
|
||||
"payload": {
|
||||
"memory_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"skill_ids": [
|
||||
"skill-deploy"
|
||||
],
|
||||
"tool_ids": [
|
||||
"tool-terminal"
|
||||
]
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-795ad519-4e78-4fdd-b1a9-3e1e2b2cdea0"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-a57f0922-dbfe-424a-a704-2a382ffa219b",
|
||||
"trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a",
|
||||
"timestamp": "2026-04-14T15:52:24.603396+00:00",
|
||||
"stage": "policy",
|
||||
"event_type": "action_selected",
|
||||
"payload": {
|
||||
"step": 1,
|
||||
"decision_type": "inject_memory",
|
||||
"selected_ids": [
|
||||
"mem-telegram-pref"
|
||||
],
|
||||
"selected_payloads": [
|
||||
{}
|
||||
],
|
||||
"rejected_ids": [],
|
||||
"rationale": "Task likely depends on stable user/project facts.",
|
||||
"estimated_cost": 0.0
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": "evt-1fbe3cfc-ed78-40f6-b0d9-25ccd14a0110"
|
||||
},
|
||||
{
|
||||
"event_id": "evt-memory-traj-b786c15f-388d-4228-9da4-c9e82b61570a-mem-telegram-pref",
|
||||
"trajectory_id": "traj-b786c15f-388d-4228-9da4-c9e82b61570a",
|
||||
"timestamp": "2026-04-14T15:52:24.603401+00:00",
|
||||
"stage": "execution",
|
||||
"event_type": "memory_injected",
|
||||
"payload": {
|
||||
"record_id": "mem-telegram-pref",
|
||||
"input": "Recall my saved preference from memory."
|
||||
},
|
||||
"metrics": {},
|
||||
"parent_event_id": null
|
||||
}
|
||||
],
|
||||
"outcome": {
|
||||
"status": "success",
|
||||
"steps": 1,
|
||||
"latency_ms": 0,
|
||||
"user_corrections": 0,
|
||||
"tool_errors": 0,
|
||||
"notes": "Draft trajectory generated by MemabraRunner with execution hooks."
|
||||
},
|
||||
"reward": {
|
||||
"total": 1.04,
|
||||
"components": {
|
||||
"task_success": 0.8,
|
||||
"retrieval_hit": 0.2,
|
||||
"tool_error": 0.0,
|
||||
"user_correction": 0.0,
|
||||
"latency": 0.0,
|
||||
"context_cost": 0.06,
|
||||
"useful_reuse": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,192 @@
|
||||
{
|
||||
"trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
|
||||
"task": {
|
||||
"task_id": "task-35b31642-86af-4e2c-a255-cdbe19659101",
|
||||
    "input": "Deploy this service with the usual workflow.",
    "channel": "local",
    "created_at": "2026-04-14T14:37:42.382074+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1941.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-8f072b70-4161-46fc-bede-cceb930d4cc2",
      "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
      "timestamp": "2026-04-14T14:37:42.382140+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-fc15daf4-f738-455e-8b12-39143b3c3d6c",
      "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
      "timestamp": "2026-04-14T14:37:42.382146+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-8f072b70-4161-46fc-bede-cceb930d4cc2"
    },
    {
      "event_id": "evt-d899c751-6157-4548-893e-b766eeafeb3d",
      "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
      "timestamp": "2026-04-14T14:37:42.382150+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1941.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-fc15daf4-f738-455e-8b12-39143b3c3d6c"
    },
    {
      "event_id": "evt-skill-traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e-skill-deploy",
      "trajectory_id": "traj-bcad8fa2-ffd3-4e5b-9ddb-720f3898826e",
      "timestamp": "2026-04-14T14:37:42.382155+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Deploy this service with the usual workflow.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,207 @@
{
  "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43",
  "task": {
    "task_id": "task-1a24d0bb-b2e6-44f0-8095-2ed74368dc9d",
    "input": "Run the deploy workflow skill.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.861760+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": [
        "tool-terminal"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task asks for current state or external action; tool use is justified.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-b4437076-cc94-4903-a2c7-3dd7c644dcc5",
      "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43",
      "timestamp": "2026-04-14T16:50:18.861861+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Run the deploy workflow skill."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-dd3dad15-7ace-47f7-9dd0-cf4955aa16ec",
      "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43",
      "timestamp": "2026-04-14T16:50:18.861871+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-b4437076-cc94-4903-a2c7-3dd7c644dcc5"
    },
    {
      "event_id": "evt-c3b04a4a-2506-47db-8d08-c8939c0eba08",
      "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43",
      "timestamp": "2026-04-14T16:50:18.861878+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "call_tool",
        "selected_ids": [
          "tool-terminal"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task asks for current state or external action; tool use is justified.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-dd3dad15-7ace-47f7-9dd0-cf4955aa16ec"
    },
    {
      "event_id": "evt-tool-traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43-tool-terminal",
      "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43",
      "timestamp": "2026-04-14T16:50:18.861901+00:00",
      "stage": "execution",
      "event_type": "tool_called",
      "payload": {
        "tool_id": "tool-terminal",
        "input": "Run the deploy workflow skill."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-tool-result-traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43-tool-terminal",
      "trajectory_id": "traj-c0faa5d1-dcb4-4e86-ac6b-2abb15026f43",
      "timestamp": "2026-04-14T16:50:18.861906+00:00",
      "stage": "execution",
      "event_type": "tool_result",
      "payload": {
        "tool_id": "tool-terminal",
        "status": "success",
        "output": "demo-result-for:tool-terminal",
        "error": null,
        "latency_ms": 42
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 42,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.032,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.25,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.008,
      "context_cost": 0.06,
      "useful_reuse": 0.05
    }
  }
}
@@ -0,0 +1,191 @@
{
  "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
  "task": {
    "task_id": "task-c1f58e80-f0eb-47e9-92ab-9b1a84351dff",
    "input": "Use my telegram preference for this answer.",
    "channel": "telegram",
    "created_at": "2026-04-14T15:27:38.512116+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task likely depends on stable user/project facts.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-212f6d74-bafd-483b-b8ec-cf4a33bf67da",
      "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
      "timestamp": "2026-04-14T15:27:38.512204+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-34b409a4-9ba9-4921-b3a6-e4c41bf7660c",
      "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
      "timestamp": "2026-04-14T15:27:38.512211+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-212f6d74-bafd-483b-b8ec-cf4a33bf67da"
    },
    {
      "event_id": "evt-d117772a-0e77-4068-8ca5-0adacfcee184",
      "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
      "timestamp": "2026-04-14T15:27:38.512216+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task likely depends on stable user/project facts.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-34b409a4-9ba9-4921-b3a6-e4c41bf7660c"
    },
    {
      "event_id": "evt-memory-traj-c5907bfb-61d2-47f9-a6c5-2300701bb551-mem-telegram-pref",
      "trajectory_id": "traj-c5907bfb-61d2-47f9-a6c5-2300701bb551",
      "timestamp": "2026-04-14T15:27:38.512223+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Use my telegram preference for this answer."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,191 @@
{
  "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535",
  "task": {
    "task_id": "task-c08fbd42-a324-4430-8277-94c666661238",
    "input": "Check the current system status.",
    "channel": "local",
    "created_at": "2026-04-14T15:27:38.520185+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1381.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-2e0920c4-6830-4c86-a4a3-139028e46176",
      "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535",
      "timestamp": "2026-04-14T15:27:38.520262+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-f81b2a77-a012-4c62-9700-93f1b31daeb2",
      "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535",
      "timestamp": "2026-04-14T15:27:38.520268+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-2e0920c4-6830-4c86-a4a3-139028e46176"
    },
    {
      "event_id": "evt-2b1fe09d-30b3-46c9-a706-373d5c8da08e",
      "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535",
      "timestamp": "2026-04-14T15:27:38.520273+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1381.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-f81b2a77-a012-4c62-9700-93f1b31daeb2"
    },
    {
      "event_id": "evt-memory-traj-c9c11bdc-852b-4aef-851c-f2968806e535-mem-telegram-pref",
      "trajectory_id": "traj-c9c11bdc-852b-4aef-851c-f2968806e535",
      "timestamp": "2026-04-14T15:27:38.520280+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,201 @@
{
  "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698",
  "task": {
    "task_id": "task-00ccd7d0-72d9-458f-87fa-be0ee5571e44",
    "input": "Check the current system status.",
    "channel": "telegram",
    "created_at": "2026-04-14T06:53:08.731950+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": [
        "tool-terminal"
      ],
      "rejected_ids": [],
      "rationale": "Task asks for current state or external action; tool use is justified.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-63d64eb8-16b1-4dc7-ae03-7c094bc6e64f",
      "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698",
      "timestamp": "2026-04-14T06:53:08.732042+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-04ef718b-6973-465d-920e-bc501a6e02ad",
      "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698",
      "timestamp": "2026-04-14T06:53:08.732049+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-63d64eb8-16b1-4dc7-ae03-7c094bc6e64f"
    },
    {
      "event_id": "evt-50f19e1e-8771-42c1-8846-95b5e4a6f491",
      "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698",
      "timestamp": "2026-04-14T06:53:08.732053+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "call_tool",
        "selected_ids": [
          "tool-terminal"
        ],
        "rejected_ids": [],
        "rationale": "Task asks for current state or external action; tool use is justified.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-04ef718b-6973-465d-920e-bc501a6e02ad"
    },
    {
      "event_id": "evt-tool-traj-d2d3a115-36d8-466f-9d14-bf741316f698-tool-terminal",
      "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698",
      "timestamp": "2026-04-14T06:53:08.732064+00:00",
      "stage": "execution",
      "event_type": "tool_called",
      "payload": {
        "tool_id": "tool-terminal",
        "input": "Check the current system status."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-tool-result-traj-d2d3a115-36d8-466f-9d14-bf741316f698-tool-terminal",
      "trajectory_id": "traj-d2d3a115-36d8-466f-9d14-bf741316f698",
      "timestamp": "2026-04-14T06:53:08.732068+00:00",
      "stage": "execution",
      "event_type": "tool_result",
      "payload": {
        "tool_id": "tool-terminal",
        "status": "success",
        "output": "demo-result-for:tool-terminal",
        "error": null,
        "latency_ms": 42
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 42,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.058,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.25,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.042,
      "context_cost": 0.0,
      "useful_reuse": 0.05
    }
  }
}
@@ -0,0 +1,191 @@
{
  "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
  "task": {
    "task_id": "task-9db54b7d-a508-49ac-bd3c-bd5af3eabc61",
    "input": "Deploy this service with the usual workflow.",
    "channel": "local",
    "created_at": "2026-04-14T15:27:38.520867+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": [
        "mem-telegram-pref"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1897.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-3e658630-fea8-44c3-afd2-fc936a2eed37",
      "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
      "timestamp": "2026-04-14T15:27:38.520945+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-03990424-3433-4147-a963-353863758b31",
      "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
      "timestamp": "2026-04-14T15:27:38.520951+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-3e658630-fea8-44c3-afd2-fc936a2eed37"
    },
    {
      "event_id": "evt-10dfab37-ded7-473e-9de9-2f922c5bf7c8",
      "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
      "timestamp": "2026-04-14T15:27:38.520956+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "inject_memory",
        "selected_ids": [
          "mem-telegram-pref"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1897.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-03990424-3433-4147-a963-353863758b31"
    },
    {
      "event_id": "evt-memory-traj-d3575889-7458-44b9-b3f1-f04cd766ca76-mem-telegram-pref",
      "trajectory_id": "traj-d3575889-7458-44b9-b3f1-f04cd766ca76",
      "timestamp": "2026-04-14T15:27:38.520961+00:00",
      "stage": "execution",
      "event_type": "memory_injected",
      "payload": {
        "record_id": "mem-telegram-pref",
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,170 @@
{
  "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f",
  "task": {
    "task_id": "task-9cda8e38-dcdf-4877-bc19-48444df0531e",
    "input": "Use multiple capabilities: memory, skill, and tool.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.865109+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "clarify",
      "selected_ids": [],
      "selected_payloads": [],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=2606.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-88a21058-c409-4836-a1b8-ef6cc63ac51e",
      "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f",
      "timestamp": "2026-04-14T16:50:18.865214+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use multiple capabilities: memory, skill, and tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-44d46564-2d71-4bed-8a3f-d3fc96fce9ef",
      "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f",
      "timestamp": "2026-04-14T16:50:18.865225+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-88a21058-c409-4836-a1b8-ef6cc63ac51e"
    },
    {
      "event_id": "evt-e21e8afe-d676-4839-b9d0-fd60441b983a",
      "trajectory_id": "traj-d99b5307-1749-4e80-867a-877e087f226f",
      "timestamp": "2026-04-14T16:50:18.865231+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "clarify",
        "selected_ids": [],
        "selected_payloads": [],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=2606.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-44d46564-2d71-4bed-8a3f-d3fc96fce9ef"
    }
  ],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 0.44,
    "components": {
      "task_success": 0.4,
      "retrieval_hit": 0.1,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.0
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95",
  "task": {
    "task_id": "task-789e89f1-828b-405e-ab11-43dd00107f5f",
    "input": "Deploy this service with the usual workflow.",
    "channel": "local",
    "created_at": "2026-04-14T15:27:38.519101+00:00",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Task resembles a reusable procedure; load a skill before action.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-ec9bd980-c648-43fc-8428-83a6ce0cf375",
      "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95",
      "timestamp": "2026-04-14T15:27:38.519171+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Deploy this service with the usual workflow."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-e0f1f4e9-2a70-424d-bff6-34a156134b0f",
      "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95",
      "timestamp": "2026-04-14T15:27:38.519177+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-ec9bd980-c648-43fc-8428-83a6ce0cf375"
    },
    {
      "event_id": "evt-9b1ea6f8-ac54-4aa4-ae0f-44aa3a0128dd",
      "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95",
      "timestamp": "2026-04-14T15:27:38.519181+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Task resembles a reusable procedure; load a skill before action.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-e0f1f4e9-2a70-424d-bff6-34a156134b0f"
    },
    {
      "event_id": "evt-skill-traj-dd361c81-40a1-4892-9914-2140870fff95-skill-deploy",
      "trajectory_id": "traj-dd361c81-40a1-4892-9914-2140870fff95",
      "timestamp": "2026-04-14T15:27:38.519188+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Deploy this service with the usual workflow.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb",
  "task": {
    "task_id": "task-144d7465-796c-4dd0-a4e2-c2be42872c4a",
    "input": "Run the deploy workflow skill.",
    "channel": "local",
    "created_at": "2026-04-14T15:52:24.606059+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1277.214).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-184ab2f3-c1c6-4af1-8241-d55b4731e606",
      "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb",
      "timestamp": "2026-04-14T15:52:24.606169+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Run the deploy workflow skill."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-9dd959ce-a5ce-42fd-b975-a03dd713adf6",
      "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb",
      "timestamp": "2026-04-14T15:52:24.606180+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-184ab2f3-c1c6-4af1-8241-d55b4731e606"
    },
    {
      "event_id": "evt-537a8488-f6eb-4f15-94ac-3e1f195c584a",
      "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb",
      "timestamp": "2026-04-14T15:52:24.606193+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1277.214).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-9dd959ce-a5ce-42fd-b975-a03dd713adf6"
    },
    {
      "event_id": "evt-skill-traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb-skill-deploy",
      "trajectory_id": "traj-e197ee51-e87c-4203-b9ee-c2f2d530cceb",
      "timestamp": "2026-04-14T15:52:24.606202+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Run the deploy workflow skill.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,170 @@
{
  "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229",
  "task": {
    "task_id": "task-f61f5344-3be7-4a7a-9dfa-b8d2a9c30a42",
    "input": "Recall my saved preference from memory.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.863539+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "clarify",
      "selected_ids": [],
      "selected_payloads": [],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1994.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-21762ef9-6490-4e3f-8f3c-2ba17e20c050",
      "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229",
      "timestamp": "2026-04-14T16:50:18.863643+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Recall my saved preference from memory."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-40f5b045-1e94-4c07-8cf5-5a245a946b9d",
      "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229",
      "timestamp": "2026-04-14T16:50:18.863652+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-21762ef9-6490-4e3f-8f3c-2ba17e20c050"
    },
    {
      "event_id": "evt-5ed49c2e-d2b3-46ec-859e-ec00f8c001c2",
      "trajectory_id": "traj-e9c37170-8764-4d70-ba0d-90213b275229",
      "timestamp": "2026-04-14T16:50:18.863659+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "clarify",
        "selected_ids": [],
        "selected_payloads": [],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1994.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-40f5b045-1e94-4c07-8cf5-5a245a946b9d"
    }
  ],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 0.44,
    "components": {
      "task_success": 0.4,
      "retrieval_hit": 0.1,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.0
    }
  }
}
@@ -0,0 +1,170 @@
{
  "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5",
  "task": {
    "task_id": "task-d7578bf3-95da-43f2-9b31-2c80ccb4fe33",
    "input": "Run the deploy workflow skill.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.864056+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "clarify",
      "selected_ids": [],
      "selected_payloads": [],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1535.615).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-4e1aa172-112d-4000-8708-f2184e114ee5",
      "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5",
      "timestamp": "2026-04-14T16:50:18.864163+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Run the deploy workflow skill."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-b9e7f5b9-2f27-4f4d-8f76-8ba6b39620eb",
      "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5",
      "timestamp": "2026-04-14T16:50:18.864173+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-4e1aa172-112d-4000-8708-f2184e114ee5"
    },
    {
      "event_id": "evt-07dcc07d-d9c4-4698-881d-925294dadadf",
      "trajectory_id": "traj-ebc0d1f0-d01f-4c1f-8cdb-23c3d184b2c5",
      "timestamp": "2026-04-14T16:50:18.864179+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "clarify",
        "selected_ids": [],
        "selected_payloads": [],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1535.615).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-b9e7f5b9-2f27-4f4d-8f76-8ba6b39620eb"
    }
  ],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 0.44,
    "components": {
      "task_success": 0.4,
      "retrieval_hit": 0.1,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.0
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17",
  "task": {
    "task_id": "task-d9131553-8868-4dac-8f06-69be44c43f4e",
    "input": "Use multiple capabilities: memory, skill, and tool.",
    "channel": "local",
    "created_at": "2026-04-14T15:52:24.607062+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=2167.3334).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-a8bc2d4a-1557-4029-899f-7fa93b764b11",
      "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17",
      "timestamp": "2026-04-14T15:52:24.607165+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use multiple capabilities: memory, skill, and tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-832ac7e6-d619-4e24-ad74-bcca1042806e",
      "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17",
      "timestamp": "2026-04-14T15:52:24.607175+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-a8bc2d4a-1557-4029-899f-7fa93b764b11"
    },
    {
      "event_id": "evt-0aaff11d-de9f-4e28-bc92-6def76857a20",
      "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17",
      "timestamp": "2026-04-14T15:52:24.607182+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=2167.3334).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-832ac7e6-d619-4e24-ad74-bcca1042806e"
    },
    {
      "event_id": "evt-skill-traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17-skill-deploy",
      "trajectory_id": "traj-ed1d8812-f0ac-4994-86ab-21b3cf0fcb17",
      "timestamp": "2026-04-14T15:52:24.607192+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Use multiple capabilities: memory, skill, and tool.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6",
  "task": {
    "task_id": "task-053282d0-1f43-409f-a230-343d3faa02df",
    "input": "Check current system status with a tool.",
    "channel": "local",
    "created_at": "2026-04-14T15:52:24.606551+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1701.0804).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-5900439a-2a97-41fe-a82e-96181c99fee1",
      "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6",
      "timestamp": "2026-04-14T15:52:24.606656+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Check current system status with a tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-e0965597-ddee-4ccd-ae72-b51105101428",
      "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6",
      "timestamp": "2026-04-14T15:52:24.606666+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-5900439a-2a97-41fe-a82e-96181c99fee1"
    },
    {
      "event_id": "evt-047dc545-d6c2-4a67-b0db-26b79e994e63",
      "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6",
      "timestamp": "2026-04-14T15:52:24.606672+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1701.0804).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-e0965597-ddee-4ccd-ae72-b51105101428"
    },
    {
      "event_id": "evt-skill-traj-f1d895a0-5442-448f-8936-4ee8b07822e6-skill-deploy",
      "trajectory_id": "traj-f1d895a0-5442-448f-8936-4ee8b07822e6",
      "timestamp": "2026-04-14T15:52:24.606681+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Check current system status with a tool.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
@@ -0,0 +1,170 @@
{
  "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3",
  "task": {
    "task_id": "task-c3c52f6d-4793-4687-9838-d98fd99a6074",
    "input": "Use multiple capabilities: memory, skill, and tool.",
    "channel": "local",
    "created_at": "2026-04-14T16:50:18.863031+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "clarify",
      "selected_ids": [],
      "selected_payloads": [],
      "rejected_ids": [],
      "rationale": "No high-confidence route found from the current heuristic baseline.",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-1cfd8f39-f961-43da-9fb4-9e37dd7072f0",
      "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3",
      "timestamp": "2026-04-14T16:50:18.863119+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Use multiple capabilities: memory, skill, and tool."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-a7f6a38f-76c5-4342-a592-4acbd15efe9f",
      "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3",
      "timestamp": "2026-04-14T16:50:18.863129+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-1cfd8f39-f961-43da-9fb4-9e37dd7072f0"
    },
    {
      "event_id": "evt-79e3d820-34bf-4c20-9286-2e20dd3e068c",
      "trajectory_id": "traj-f511e978-ad79-4be6-bbab-461b5ad9ecb3",
      "timestamp": "2026-04-14T16:50:18.863136+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "clarify",
        "selected_ids": [],
        "selected_payloads": [],
        "rejected_ids": [],
        "rationale": "No high-confidence route found from the current heuristic baseline.",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-a7f6a38f-76c5-4342-a592-4acbd15efe9f"
    }
  ],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 0.44,
    "components": {
      "task_success": 0.4,
      "retrieval_hit": 0.1,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.0
    }
  }
}
@@ -0,0 +1,192 @@
{
  "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619",
  "task": {
    "task_id": "task-f0aed2e6-8d9b-42f8-a20c-5eb8af052d3b",
    "input": "Recall my saved preference from memory.",
    "channel": "local",
    "created_at": "2026-04-14T15:52:24.605509+00:00",
    "user_id": null
  },
  "context_snapshot": {
    "conversation_summary": "",
    "environment_summary": "",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-telegram-pref",
        "type": "memory",
        "title": "Telegram preference",
        "summary": "Prefer plain text on Telegram.",
        "triggers": [
          "telegram",
          "preference",
          "answer"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 0.9,
        "risk": 0.0,
        "tags": [
          "output"
        ],
        "source": "user",
        "type_payload": {}
      }
    ],
    "skill": [
      {
        "id": "skill-deploy",
        "type": "skill",
        "title": "Deploy workflow",
        "summary": "Reusable deployment workflow.",
        "triggers": [
          "deploy",
          "workflow",
          "service"
        ],
        "cost": 0.0,
        "confidence": 0.8,
        "success_rate": 0.9,
        "freshness": 0.8,
        "risk": 0.0,
        "tags": [
          "ops"
        ],
        "source": "system",
        "type_payload": {}
      }
    ],
    "tool": [
      {
        "id": "tool-terminal",
        "type": "tool",
        "title": "terminal",
        "summary": "Run terminal-style inspection commands.",
        "triggers": [
          "check",
          "current",
          "status",
          "system"
        ],
        "cost": 0.0,
        "confidence": 0.95,
        "success_rate": 0.9,
        "freshness": 1.0,
        "risk": 0.0,
        "tags": [
          "inspection"
        ],
        "source": "system",
        "type_payload": {}
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "load_skill",
      "selected_ids": [
        "skill-deploy"
      ],
      "selected_payloads": [
        {}
      ],
      "rejected_ids": [],
      "rationale": "Predicted by learning router (score=1658.6938).",
      "estimated_cost": 0.0
    }
  ],
  "events": [
    {
      "event_id": "evt-44233637-eb1a-47de-972c-942ee409dd78",
      "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619",
      "timestamp": "2026-04-14T15:52:24.605614+00:00",
      "stage": "retrieval",
      "event_type": "task_received",
      "payload": {
        "input": "Recall my saved preference from memory."
      },
      "metrics": {},
      "parent_event_id": null
    },
    {
      "event_id": "evt-01123ad4-7d52-4c82-bca1-1a3b5014196f",
      "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619",
      "timestamp": "2026-04-14T15:52:24.605625+00:00",
      "stage": "retrieval",
      "event_type": "candidates_recalled",
      "payload": {
        "memory_ids": [
          "mem-telegram-pref"
        ],
        "skill_ids": [
          "skill-deploy"
        ],
        "tool_ids": [
          "tool-terminal"
        ]
      },
      "metrics": {},
      "parent_event_id": "evt-44233637-eb1a-47de-972c-942ee409dd78"
    },
    {
      "event_id": "evt-a9a657a1-1e3e-49f3-8ea0-9528c12c633f",
      "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619",
      "timestamp": "2026-04-14T15:52:24.605632+00:00",
      "stage": "policy",
      "event_type": "action_selected",
      "payload": {
        "step": 1,
        "decision_type": "load_skill",
        "selected_ids": [
          "skill-deploy"
        ],
        "selected_payloads": [
          {}
        ],
        "rejected_ids": [],
        "rationale": "Predicted by learning router (score=1658.6938).",
        "estimated_cost": 0.0
      },
      "metrics": {},
      "parent_event_id": "evt-01123ad4-7d52-4c82-bca1-1a3b5014196f"
    },
    {
      "event_id": "evt-skill-traj-ffb40d01-7956-4d7b-a41c-9618487fe619-skill-deploy",
      "trajectory_id": "traj-ffb40d01-7956-4d7b-a41c-9618487fe619",
      "timestamp": "2026-04-14T15:52:24.605642+00:00",
      "stage": "execution",
      "event_type": "skill_loaded",
      "payload": {
        "skill_id": "skill-deploy",
        "input": "Recall my saved preference from memory.",
        "instructions": "Demo skill payload loaded successfully."
      },
      "metrics": {},
      "parent_event_id": null
    }
  ],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 0,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Draft trajectory generated by MemabraRunner with execution hooks."
  },
  "reward": {
    "total": 1.04,
    "components": {
      "task_success": 0.8,
      "retrieval_hit": 0.2,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.0,
      "context_cost": 0.06,
      "useful_reuse": 0.1
    }
  }
}
66
docs/examples/trajectory_failure_missed_memory.json
Normal file
@@ -0,0 +1,66 @@
{
  "trajectory_id": "traj-failure-missed-memory-001",
  "task": {
    "task_id": "task-004",
    "input": "Use my usual formatting preferences for this write-up.",
    "channel": "telegram",
    "created_at": "2026-04-14T13:05:00Z",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "User has repeated stable formatting preferences in earlier sessions.",
    "environment_summary": "No tool call required.",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-format-1",
        "type": "memory",
        "title": "Telegram formatting preference",
        "summary": "Prefer plain text over markdown for Telegram delivery.",
        "triggers": ["format", "telegram", "write-up"],
        "cost": 0.05,
        "confidence": 0.9,
        "success_rate": 0.95,
        "freshness": 0.95,
        "risk": 0.05,
        "tags": ["preference", "output"],
        "source": "system"
      }
    ],
    "skill": [],
    "tool": []
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "direct_answer",
      "selected_ids": [],
      "rejected_ids": ["mem-format-1"],
      "rationale": "Router failed to recognize a preference-triggered task and skipped memory injection.",
      "estimated_cost": 0.0
    }
  ],
  "events": [],
  "outcome": {
    "status": "partial_success",
    "steps": 1,
    "latency_ms": 300,
    "user_corrections": 1,
    "tool_errors": 0,
    "notes": "Answer was serviceable but ignored known formatting preference."
  },
  "reward": {
    "total": 0.18,
    "components": {
      "task_success": 0.5,
      "retrieval_hit": -0.1,
      "tool_error": 0.0,
      "user_correction": 0.2,
      "latency": 0.02,
      "context_cost": 0.0,
      "useful_reuse": 0.0
    }
  }
}
67
docs/examples/trajectory_failure_overtool.json
Normal file
@@ -0,0 +1,67 @@
{
  "trajectory_id": "traj-failure-overtool-001",
  "task": {
    "task_id": "task-003",
    "input": "Name this project.",
    "channel": "telegram",
    "created_at": "2026-04-14T13:04:00Z",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "User asks for naming help for an agent memory project.",
    "environment_summary": "No real-time state lookup required.",
    "recent_failures": ["The agent previously overused tools for pure reasoning tasks."]
  },
  "candidate_sets": {
    "memory": [],
    "skill": [],
    "tool": [
      {
        "id": "tool-web-1",
        "type": "tool",
        "title": "web_search",
        "summary": "Search the web for information.",
        "triggers": ["name", "idea"],
        "cost": 0.4,
        "confidence": 0.62,
        "success_rate": 0.55,
        "freshness": 1.0,
        "risk": 0.3,
        "tags": ["research"],
        "source": "system"
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": ["tool-web-1"],
      "rejected_ids": [],
      "rationale": "Incorrectly treated naming as a research task rather than a reasoning task.",
      "estimated_cost": 0.4
    }
  ],
  "events": [],
  "outcome": {
    "status": "failure",
    "steps": 2,
    "latency_ms": 2400,
    "user_corrections": 1,
    "tool_errors": 1,
    "notes": "Over-tooled a pure reasoning task and forced unnecessary latency."
  },
  "reward": {
    "total": -1.12,
    "components": {
      "task_success": -0.3,
      "retrieval_hit": 0.0,
      "tool_error": 0.35,
      "user_correction": 0.25,
      "latency": 0.12,
      "context_cost": 0.1,
      "useful_reuse": 0.0
    }
  }
}
66
docs/examples/trajectory_success_memory.json
Normal file
@@ -0,0 +1,66 @@
{
  "trajectory_id": "traj-success-memory-001",
  "task": {
    "task_id": "task-001",
    "input": "Remember my preferred deployment region and use it next time.",
    "channel": "telegram",
    "created_at": "2026-04-14T13:02:00Z",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "User is defining a local agent memory project and references recurring preferences.",
    "environment_summary": "No live tool call required.",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [
      {
        "id": "mem-region-1",
        "type": "memory",
        "title": "Preferred deployment region",
        "summary": "User prefers us-west-2 for deployments.",
        "triggers": ["deployment", "region", "preference"],
        "cost": 0.1,
        "confidence": 0.93,
        "success_rate": 0.88,
        "freshness": 0.9,
        "risk": 0.1,
        "tags": ["preference", "deployment"],
        "source": "user"
      }
    ],
    "skill": [],
    "tool": []
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "inject_memory",
      "selected_ids": ["mem-region-1"],
      "rejected_ids": [],
      "rationale": "User request depends on a stable preference, so memory injection is the lowest-cost correct route.",
      "estimated_cost": 0.1
    }
  ],
  "events": [],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 350,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Correctly identified preference storage request without unnecessary tools."
  },
  "reward": {
    "total": 1.72,
    "components": {
      "task_success": 1.0,
      "retrieval_hit": 0.45,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.03,
      "context_cost": 0.05,
      "useful_reuse": 0.35
    }
  }
}
67
docs/examples/trajectory_success_tool.json
Normal file
@@ -0,0 +1,67 @@
{
  "trajectory_id": "traj-success-tool-001",
  "task": {
    "task_id": "task-002",
    "input": "Check the current test status for the prototype.",
    "channel": "telegram",
    "created_at": "2026-04-14T13:03:00Z",
    "user_id": "oza"
  },
  "context_snapshot": {
    "conversation_summary": "User wants concrete progress on the memabra prototype.",
    "environment_summary": "Pytest is available in the local repo environment.",
    "recent_failures": []
  },
  "candidate_sets": {
    "memory": [],
    "skill": [],
    "tool": [
      {
        "id": "tool-terminal-1",
        "type": "tool",
        "title": "terminal",
        "summary": "Run shell commands in the local environment.",
        "triggers": ["check", "current", "test"],
        "cost": 0.2,
        "confidence": 0.95,
        "success_rate": 0.92,
        "freshness": 1.0,
        "risk": 0.2,
        "tags": ["system", "tests"],
        "source": "system"
      }
    ]
  },
  "decisions": [
    {
      "step": 1,
      "decision_type": "call_tool",
      "selected_ids": ["tool-terminal-1"],
      "rejected_ids": [],
      "rationale": "Current test status is a live system fact and must be observed with a tool.",
      "estimated_cost": 0.2
    }
  ],
  "events": [],
  "outcome": {
    "status": "success",
    "steps": 1,
    "latency_ms": 700,
    "user_corrections": 0,
    "tool_errors": 0,
    "notes": "Terminal used appropriately to inspect live test state."
  },
  "reward": {
    "total": 1.6,
    "components": {
      "task_success": 1.0,
      "retrieval_hit": 0.4,
      "tool_error": 0.0,
      "user_correction": 0.0,
      "latency": 0.08,
      "context_cost": 0.02,
      "useful_reuse": 0.3
    }
  }
}
191
docs/reward_spec.md
Normal file
@@ -0,0 +1,191 @@
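# Reward Specification

## Goal

memabra's reward does not simply judge whether the task got done. It evaluates:

- whether the right memory / skill / tool was selected
- whether the run was efficient
- whether the behavior was stable
- whether repeated user input and corrections were reduced
- whether tool cost and context cost were kept under control

The purpose of the reward is not to make the score look better, but to give the routing policy an attributable, optimizable training signal.
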
## Reward composition

The total reward is written as:

```text
R = ws*S + wr*H - we*E - wc*C - wl*L - wx*X + wu*U
```

where:

- `S` = task success
- `H` = retrieval hit quality
- `E` = execution/tool error penalty
- `C` = user correction penalty
- `L` = latency penalty
- `X` = context cost penalty
- `U` = useful reuse bonus

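
The formula above can be sketched in Python as follows. This is a minimal illustration, not necessarily how memabra computes rewards internally: the function name `total_reward` is hypothetical, and the dictionary keys are assumed to mirror the `reward.components` field names used in the example trajectories under `docs/examples/`.

```python
# Sign of each term in R = ws*S + wr*H - we*E - wc*C - wl*L - wx*X + wu*U.
# Key names are an assumption: they mirror the `components` object in the
# example trajectory files.
SIGNS = {
    "task_success": +1,     # S
    "retrieval_hit": +1,    # H
    "tool_error": -1,       # E
    "user_correction": -1,  # C
    "latency": -1,          # L
    "context_cost": -1,     # X
    "useful_reuse": +1,     # U
}


def total_reward(components, weights=None):
    """Return the signed, weighted sum of the reward components.

    `weights` maps a component name to its weight; a missing weight
    defaults to 1.0, and a missing component counts as 0.0.
    """
    weights = weights or {}
    total = sum(
        sign * weights.get(name, 1.0) * components.get(name, 0.0)
        for name, sign in SIGNS.items()
    )
    return round(total, 4)
```

With unit weights, the components recorded in `docs/examples/trajectory_success_memory.json` (1.0, 0.45, 0.0, 0.0, 0.03, 0.05, 0.35) give 1.72, which matches that file's `total`. This suggests the example files store already-weighted components, though the spec does not say so explicitly.
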
## 1. Task Success (`S`)

Definition: whether the task was ultimately completed, and how well.

Suggested values:

- `1.0`: goal fully achieved
- `0.5`: partially achieved
- `0.0`: not completed
- `-0.5`: clearly misleading, or headed in the wrong direction

Data sources:

- automatic task acceptance checker
- explicit user feedback
- replay comparison rules

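
As a small sketch, the suggested values could be looked up from an outcome status. The `success`, `partial_success`, and `failure` labels appear in the example trajectories; the `misleading` label and the function name are hypothetical, introduced only to cover the `-0.5` case.

```python
# Suggested S values from the list above. "misleading" is a hypothetical
# label for the -0.5 case; the other three labels appear in the example
# trajectory files.
SUGGESTED_TASK_SUCCESS = {
    "success": 1.0,
    "partial_success": 0.5,
    "failure": 0.0,
    "misleading": -0.5,
}


def task_success_score(status: str) -> float:
    """Map an outcome status to the suggested task-success value S."""
    try:
        return SUGGESTED_TASK_SUCCESS[status]
    except KeyError:
        raise ValueError(f"unknown outcome status: {status!r}") from None
```

An unknown status raises instead of silently scoring `0.0`, so a mislabeled trajectory cannot be mistaken for a plain failure.
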
## 2. Retrieval Hit Quality (`H`)

Definition: whether the memory / skill / tool that was retrieved actually helped the task.

Suggested breakdown:

- `Hm`: memory hit
- `Hs`: skill hit
- `Ht`: tool hit

Scoring guidance:

- hit a high-value candidate and helped reduce steps: positive reward
- recalled many candidates but none were useful: low reward or 0
- missed a key candidate: negative reward

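## 3. Execution / Tool Error Penalty (`E`)

Definition: whether there were invalid calls, wrong calls, or clearly redundant calls.

Examples:

- called a tool that should not have been called
- tool arguments were clearly wrong
- repeated the same ineffective action
- took a long chain when a direct answer would have sufficed

Suggested values:

- each minor error: `0.1` to `0.3`
- severe error: `0.5` to `1.0`
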
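## 4. User Correction Penalty (`C`)

Definition: whether the user had to supply information the system should already have known, or correct a wrong action.

Examples:

- the user restates a preference
- the user points out that the wrong tool was called
- the user asks to retract an incorrect memory

Rationale:
This term is critical for a long-lived system, because it directly reflects whether the system has actually learned.
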
## 5. Latency Penalty (`L`)

Definition: whether the time and number of steps the system spent on the task were excessive.

Suggested inputs:

- wall-clock latency
- action count
- retry count

Guidance:

- a small amount of extra reasoning is acceptable
- large, pointless detours must be penalized

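## 6. Context Cost Penalty (`X`)

Definition: whether the context was inflated excessively.

This includes:

- injecting too many irrelevant memories
- loading unnecessary skills
- emitting oversized intermediate content

Rationale:
An agent can easily "add a bit more just to be safe" and end up bogging down the context.
This cost must enter the reward explicitly.
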
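## 7. Useful Reuse Bonus (`U`)

Definition: whether correct long-term information was reused and actually improved efficiency or quality.

Examples:

- successfully reusing a user preference, avoiding another confirmation
- reusing a validated skill, reducing trial and error
- reusing a similar episode to finish the task faster
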
## Suggested initial weights

A naive first version to start with:

```text
ws = 1.0
wr = 0.35
we = 0.30
wc = 0.40
wl = 0.15
wx = 0.20
wu = 0.25
```

Rationale:

- success carries the highest weight
- user correction is penalized fairly heavily, because it directly exposes that the system has not learned
- retrieval hit has clear value, but must not outweigh the outcome
- latency and context cost matter, but should not be weighted too heavily early on

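## Signal sources

Reward signals can come from three kinds of sources:

### A. Explicit signals
- the user says "right" or "wrong"
- the user corrects the agent
- the user asks for a redo

### B. Implicit signals
- whether the number of steps was reduced
- whether an error was triggered
- whether the same question was asked again
- whether the run timed out

### C. Programmatic acceptance
- whether tests pass
- whether the target file was generated
- whether the specified fields match
- whether tool execution succeeded
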
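## Counterfactual logging requirements

To support later training, the following must be recorded:

- which candidates were in the candidate set
- which one was finally selected
- which high-scoring candidates were not selected
- the local outcome of each action

Otherwise the reward can only be assigned to "the whole process", and no specific routing policy can be learned from it.
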
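
One way to read the "high-scoring candidates that were not selected" requirement against the trajectory format is sketched below. The helper name, the `threshold` default, and the use of each candidate's `confidence` field as its score are all assumptions made for illustration; the field names `candidate_sets`, `decisions`, `selected_ids`, and `confidence` come from the example trajectories.

```python
def unselected_high_scorers(trajectory, step=1, threshold=0.9):
    """List candidate ids scoring >= threshold that the decision at `step` skipped.

    Uses each candidate's `confidence` as a stand-in score (an assumption).
    Raises StopIteration if the trajectory has no decision for `step`.
    """
    decision = next(d for d in trajectory["decisions"] if d["step"] == step)
    selected = set(decision["selected_ids"])
    missed = []
    # candidate_sets maps a kind ("memory", "skill", "tool") to its candidates.
    for candidates in trajectory["candidate_sets"].values():
        for cand in candidates:
            if cand["id"] not in selected and cand["confidence"] >= threshold:
                missed.append(cand["id"])
    return missed
```

Applied to the data in `docs/examples/trajectory_failure_missed_memory.json`, where the router chose `direct_answer` and selected nothing, this returns `["mem-format-1"]`, which is the counterfactual that makes the low reward attributable to a routing mistake.
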
## Initial strategy

In Phase 0 / Phase 1, using the reward directly to update large-model weights is not recommended.
Use it first for:

- evaluating routing rules
- labeling samples
- optimizing candidate ranking
- bandit / reranker training

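
To make the bandit option concrete, here is a toy epsilon-greedy bandit over routing actions. It is a generic textbook technique, not memabra's router: the class name is hypothetical, it ignores task context entirely (a real router would be contextual), and the action names are simply the `decision_type` values seen in the example trajectories.

```python
import random


class EpsilonGreedyRouter:
    """Toy context-free epsilon-greedy bandit over routing actions."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}  # running mean reward
        self._rng = random.Random(seed)

    def select(self):
        """Explore with probability epsilon, otherwise pick the best mean."""
        if self._rng.random() < self.epsilon:
            return self._rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        """Fold one observed reward into the action's running mean."""
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```

Each trajectory's `reward.total` would be fed to `update` for the action that was taken; keeping `epsilon` above zero preserves some exploration while the estimates are still noisy.
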
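## Risks

- looking only at success rewards lucky guesses
- looking only at efficiency makes the system afraid to explore
- looking only at user feedback exposes the reward to noise in how users express themselves
- without counterfactual records, training is largely blind
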
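## Current conclusion

In memabra, the reward is not an accessory; it is core infrastructure for the learning loop.
If the reward design is unclear, every later attempt to "update weights based on outcomes" turns into pseudo-learning.
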
13
docs/router-versions/current.json
Normal file
@@ -0,0 +1,13 @@
{
  "current_version_id": "20260415-023347",
  "promotion_source": null,
  "benchmark_summary": {
    "reward_delta": 0.0,
    "error_rate_delta": 0.0,
    "latency_delta_ms": 0.0,
    "baseline_avg_reward": 0.44,
    "challenger_avg_reward": 0.44
  },
  "prior_version_id": "20260415-023347",
  "saved_at": "2026-04-15T02:33:47.916903+00:00"
}
35
docs/router-versions/versions/20260414-150123.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-150123",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-150127.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-150127",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-150228.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-150228",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-150426.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-150426",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-152505.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-152505",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-152530.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-152530",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-152625.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-152625",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-152935.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-152935",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-152941.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-152941",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-155036.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-155036",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-155251.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-155251",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-155350.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-155350",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-164944.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-164944",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-165138.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-165138",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-165207.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-165207",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-165241.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-165241",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-165316.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-165316",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-165359.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-165359",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-165450.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-165450",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-171516.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-171516",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-171623.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-171623",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-171651.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-171651",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-171757.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-171757",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-173832.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-173832",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180027.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180027",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180106.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180106",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180343.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180343",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180515.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180515",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180553.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180553",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180625.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180625",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-180658.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-180658",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-182721.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-182721",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-182806.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-182806",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-183024.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-183024",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
35
docs/router-versions/versions/20260414-183107.json
Normal file
@@ -0,0 +1,35 @@
{
  "version_id": "20260414-183107",
  "weights": {
    "clarify": {
      "input_length": 6.0,
      "memory_count": 1.0,
      "skill_count": 1.0,
      "tool_count": 1.0,
      "top_memory_confidence": 0.9500000000000001,
      "top_skill_success_rate": 0.8999999999999998,
      "top_tool_confidence": 0.9500000000000001,
      "top_tool_risk": 0.0
    }
  },
  "feature_keys": [
    "input_length",
    "memory_count",
    "skill_count",
    "tool_count",
    "top_memory_confidence",
    "top_skill_success_rate",
    "top_tool_confidence",
    "top_tool_risk"
  ],
  "metadata": {
    "source": "online_learning",
    "benchmark_summary": {
      "reward_delta": 0.0,
      "error_rate_delta": 0.0,
      "latency_delta_ms": 0.0,
      "baseline_avg_reward": 0.44,
      "challenger_avg_reward": 0.44
    }
  }
}
Some files were not shown because too many files have changed in this diff.