# Online Learning Operator Guide
## What it does

memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met.
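The loop can be sketched as a small control-flow function. Everything below is illustrative: the callables stand in for memabra's real components, and none of these names are the actual API.

```python
# Illustrative sketch of the online-learning cycle (hypothetical names,
# not memabra's actual API).

def run_cycle(train, evaluate, policy, promote):
    """Train a challenger, benchmark it against the baseline, and
    promote it only when every promotion gate passes."""
    challenger = train()           # fit a challenger on accumulated trajectories
    report = evaluate(challenger)  # baseline-vs-challenger benchmark
    accepted = policy(report)      # explicit threshold check
    if accepted:
        promote(challenger)        # register the challenger as active
    return {"accepted": accepted, "report": report}
```

If the policy rejects the report, `promote` is never called, which matches the behavior described under "What happens when the challenger loses".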
## How to run one cycle

### From Python

```python
from src.memabra.cli import run_online_learning_workflow

result = run_online_learning_workflow()
print(result)
```
### From the shell

```bash
source venv/bin/activate
python -m src.memabra.cli
```
Or with custom options:

```bash
source venv/bin/activate
python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5
```
By default the CLI persists seen trajectory IDs to `<base-dir>/seen-trajectories.json` so repeated runs skip already-processed data. You can override the path:

```bash
source venv/bin/activate
python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json
```
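A minimal sketch of what such a seen-ID store could look like, assuming a flat JSON list of IDs. This is not memabra's actual implementation, only a plausible shape for it.

```python
import json
from pathlib import Path

# Hypothetical seen-trajectory store: a JSON file holding the IDs of
# trajectories that earlier runs have already processed.

def load_seen(path: Path) -> set:
    """Return the set of already-processed trajectory IDs."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def mark_seen(path: Path, ids) -> None:
    """Persist the full set of processed IDs back to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(ids)))
```

Reading the file back into a set makes "skip already-processed data" a simple membership check per trajectory.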
### Dry-run mode

To train and evaluate a challenger without actually promoting it or saving a new router version:

```bash
source venv/bin/activate
python -m src.memabra.cli --dry-run
```

This still produces a training report (with `dry_run: true`) so you can inspect what would have happened before allowing a real promotion.
### Evaluate against a specific baseline version

By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead:

```bash
source venv/bin/activate
python -m src.memabra.cli --baseline-version 20260414-123456
```

This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record `baseline_version_id` for audit.
### Episodic retrieval with case index

You can load or rebuild a case index for episodic retrieval during task execution:

```bash
source venv/bin/activate
python -m src.memabra.cli --rebuild-case-index
```

This builds a `CaseIndex` from all saved trajectories and saves it to the default path (`<base-dir>/case-index.json`). On subsequent runs, load it without rebuilding:

```bash
source venv/bin/activate
python -m src.memabra.cli --case-index /custom/artifacts/case-index.json
```

When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval.

When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router.
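The injection step can be sketched roughly as follows. The candidate shape and the lookup are assumptions; the `CaseIndex` is treated here as a simple mapping from task keys to best past trajectories.

```python
# Rough sketch of episodic candidate injection (not memabra's actual API):
# if the incoming task matches a previously seen one, prepend its best
# past trajectory as an extra retrieval candidate for the router.

def with_episodic_candidate(candidates, case_index, task_key):
    best = case_index.get(task_key)  # best past trajectory for this task, if any
    if best is None:
        return candidates
    return [{"source": "episodic", "trajectory": best}] + candidates
```

Unmatched inputs pass through unchanged, so episodic retrieval only adds a hint and never removes candidates.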
You can also run the full workflow inline from the shell:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
print(run_online_learning_workflow())
PY
```
## Promotion gates

A challenger is promoted only when **all** of the following are true:

- `reward_delta >= min_reward_delta` — the challenger must improve average reward by at least this amount
- `error_rate_delta <= max_error_rate_increase` — the challenger must not increase errors beyond this limit
- `latency_delta_ms <= max_latency_increase_ms` — the challenger must not become slower beyond this limit
- `task_count >= required_task_count` — the benchmark must include at least this many tasks
The default policy in the CLI workflow is lenient to allow alpha exploration. In production you should tighten these thresholds.
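The four gates combine with a logical AND. A direct sketch, with metric and threshold names taken from the list above but a function that is illustrative rather than memabra's actual `PromotionPolicy`:

```python
# Illustrative promotion-gate check: every gate must hold, or the
# challenger is rejected and the active router stays unchanged.

def gates_pass(report, policy):
    return (
        report["reward_delta"] >= policy["min_reward_delta"]
        and report["error_rate_delta"] <= policy["max_error_rate_increase"]
        and report["latency_delta_ms"] <= policy["max_latency_increase_ms"]
        and report["task_count"] >= policy["required_task_count"]
    )
```

Because the conjunction short-circuits, a single failing gate is enough to reject the challenger.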
## Where reports and versions are stored

By default everything lands under:

- `docs/projects/memabra/demo-artifacts/trajectories/` — raw task trajectories
- `docs/projects/memabra/demo-artifacts/router-versions/versions/` — versioned router weights
- `docs/projects/memabra/demo-artifacts/router-versions/current.json` — active router metadata (includes promotion source, benchmark summary, prior version, rollback history)
- `docs/projects/memabra/demo-artifacts/training-reports/` — one JSON report per training run
## What happens when the challenger loses

- The active router in the app **remains unchanged**
- A training report is still saved with the rejection reasons
- No new version is registered as current
## Rolling back

You can roll back to any previous version from Python:

```python
from src.memabra.router_versioning import RouterVersionStore

store = RouterVersionStore()
store.rollback("20260414-123456")
current = store.get_current()
print(current)
```
Or from the CLI:

```bash
source venv/bin/activate
python -m src.memabra.cli --rollback 20260414-123456
```
To see all available versions before rolling back:

```bash
source venv/bin/activate
python -m src.memabra.cli --list-versions
```

Rollback preserves an audit trail in `current.json` (`rollback_from`, `rolled_back_at`).
## Status check

To quickly inspect the current system state without running a learning cycle:

```bash
source venv/bin/activate
python -m src.memabra.cli --status
```
## Architecture summary

```
Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger)
                                      |
                                      v
            BenchmarkSuite -> Evaluator -> baseline vs challenger
                                      |
                                      v
                         PromotionPolicy.evaluate()
                                      |
              +-----------------------+----------------------+
              | accepted                                     | rejected
              v                                              v
  RouterVersionStore.save()                      training report saved
  app.set_router(challenger)                     active router unchanged
```