# Online Learning Operator Guide

## What it does

memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met.

## How to run one cycle

### From Python

```python
from src.memabra.cli import run_online_learning_workflow

result = run_online_learning_workflow()
print(result)
```

### From the shell

```bash
source venv/bin/activate
python -m src.memabra.cli
```

Or with custom options:

```bash
source venv/bin/activate
python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5
```

By default the CLI persists seen trajectory IDs to `/seen-trajectories.json` so repeated runs skip already-processed data. You can override the path:

```bash
source venv/bin/activate
python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json
```

### Dry-run mode

To train and evaluate a challenger without actually promoting it or saving a new router version:

```bash
source venv/bin/activate
python -m src.memabra.cli --dry-run
```

This still produces a training report (with `dry_run: true`) so you can inspect what would have happened before allowing a real promotion.

### Evaluate against a specific baseline version

By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead:

```bash
source venv/bin/activate
python -m src.memabra.cli --baseline-version 20260414-123456
```

This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record `baseline_version_id` for audit.
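The skip-already-processed behavior described above can be sketched as a small persistent store. The class name and methods here (`SeenTrajectoryStore`, `filter_new`, `mark_seen`) are hypothetical illustrations of the idea, not the CLI's actual implementation:

```python
import json
from pathlib import Path


class SeenTrajectoryStore:
    """Persist processed trajectory IDs so repeat runs skip them.

    Hypothetical sketch; the real CLI store may differ in layout and naming.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.seen: set[str] = set()
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()))

    def filter_new(self, trajectory_ids: list[str]) -> list[str]:
        # Keep only IDs that have not been processed in a previous run.
        return [tid for tid in trajectory_ids if tid not in self.seen]

    def mark_seen(self, trajectory_ids: list[str]) -> None:
        # Record the IDs and persist immediately so a crash cannot lose them.
        self.seen.update(trajectory_ids)
        self.path.write_text(json.dumps(sorted(self.seen)))
```

Keeping the store as a flat JSON list of IDs makes it trivial to inspect or reset by hand, which matters during alpha operation.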
### Episodic retrieval with case index

You can load or rebuild a case index for episodic retrieval during task execution:

```bash
source venv/bin/activate
python -m src.memabra.cli --rebuild-case-index
```

This builds a `CaseIndex` from all saved trajectories and saves it to the default path (`/case-index.json`). On subsequent runs, load it without rebuilding:

```bash
source venv/bin/activate
python -m src.memabra.cli --case-index /custom/artifacts/case-index.json
```

When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval. When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router.

You can also run a full cycle inline from the shell:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
print(run_online_learning_workflow())
PY
```

## Promotion gates

A challenger is promoted only when **all** of the following are true:

- `reward_delta >= min_reward_delta` — the challenger must improve average reward by at least this amount
- `error_rate_delta <= max_error_rate_increase` — the challenger must not increase errors beyond this limit
- `latency_delta_ms <= max_latency_increase_ms` — the challenger must not become slower beyond this limit
- `task_count >= required_task_count` — the benchmark must include at least this many tasks

The default policy in the CLI workflow is lenient, to allow alpha exploration. In production you should tighten these thresholds.
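A minimal sketch of how these four gates compose, using the threshold names listed above. The dataclass fields and `evaluate` signature are assumptions for illustration; the real `PromotionPolicy` may expose a different interface:

```python
from dataclasses import dataclass


@dataclass
class PromotionPolicy:
    """Thresholds for the four promotion gates (hypothetical sketch)."""

    min_reward_delta: float = 0.0
    max_error_rate_increase: float = 0.0
    max_latency_increase_ms: float = 0.0
    required_task_count: int = 1

    def evaluate(
        self,
        reward_delta: float,
        error_rate_delta: float,
        latency_delta_ms: float,
        task_count: int,
    ) -> tuple[bool, list[str]]:
        # The challenger is promoted only when every gate passes;
        # failed gates are collected as rejection reasons for the report.
        reasons = []
        if reward_delta < self.min_reward_delta:
            reasons.append("reward_delta below min_reward_delta")
        if error_rate_delta > self.max_error_rate_increase:
            reasons.append("error_rate_delta above max_error_rate_increase")
        if latency_delta_ms > self.max_latency_increase_ms:
            reasons.append("latency_delta_ms above max_latency_increase_ms")
        if task_count < self.required_task_count:
            reasons.append("task_count below required_task_count")
        return (not reasons, reasons)
```

Returning the full list of failed gates, rather than a bare boolean, is what lets a training report record *why* a losing challenger was rejected.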
## Where reports and versions are stored

By default everything lands under:

- `docs/projects/memabra/demo-artifacts/trajectories/` — raw task trajectories
- `docs/projects/memabra/demo-artifacts/router-versions/versions/` — versioned router weights
- `docs/projects/memabra/demo-artifacts/router-versions/current.json` — active router metadata (includes promotion source, benchmark summary, prior version, rollback history)
- `docs/projects/memabra/demo-artifacts/training-reports/` — one JSON report per training run

## What happens when the challenger loses

- The active router in the app **remains unchanged**
- A training report is still saved with the rejection reasons
- No new version is registered as current

## Rolling back

You can roll back to any previous version from Python:

```python
from src.memabra.router_versioning import RouterVersionStore

store = RouterVersionStore()
store.rollback("20260414-123456")
current = store.get_current()
print(current)
```

Or from the CLI:

```bash
source venv/bin/activate
python -m src.memabra.cli --rollback 20260414-123456
```

To see all available versions before rolling back:

```bash
source venv/bin/activate
python -m src.memabra.cli --list-versions
```

Rollback preserves an audit trail in `current.json` (`rollback_from`, `rolled_back_at`).

## Status check

To quickly inspect the current system state without running a learning cycle:

```bash
source venv/bin/activate
python -m src.memabra.cli --status
```

## Architecture summary

```
Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger)
                                                            |
                                                            v
                          BenchmarkSuite -> Evaluator -> baseline vs challenger
                                                            |
                                                            v
                                              PromotionPolicy.evaluate()
                                                            |
                       +------------------------------------+------------------------------------+
                       | accepted                                                       rejected |
                       v                                                                         v
            RouterVersionStore.save()                                        training report saved
            app.set_router(challenger)                                    active router unchanged
```
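The rollback audit trail described in the "Rolling back" section (`rollback_from` and `rolled_back_at` written into `current.json`) can be sketched as below. The `rollback` function and the exact field layout are hypothetical illustrations of the described behavior, not the actual `RouterVersionStore` code:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def rollback(current_path: str, target_version_id: str) -> dict:
    """Point current.json at target_version_id, recording an audit trail.

    Hypothetical sketch; the real RouterVersionStore.rollback may keep
    additional metadata (promotion source, benchmark summary, etc.).
    """
    path = Path(current_path)
    current = json.loads(path.read_text()) if path.exists() else {}
    # Record which version we are leaving and when the rollback happened.
    current["rollback_from"] = current.get("version_id")
    current["rolled_back_at"] = datetime.now(timezone.utc).isoformat()
    current["version_id"] = target_version_id
    path.write_text(json.dumps(current, indent=2))
    return current
```

Because the trail lives inside `current.json` itself, `--status` can surface the last rollback without consulting any other file.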