Online Learning Operator Guide
What it does
memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met.
How to run one cycle
From Python
from src.memabra.cli import run_online_learning_workflow
result = run_online_learning_workflow()
print(result)
From the shell
source venv/bin/activate
python -m src.memabra.cli
Or with custom options:
source venv/bin/activate
python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5
By default the CLI persists seen trajectory IDs to <base-dir>/seen-trajectories.json so repeated runs skip already-processed data. You can override the path:
source venv/bin/activate
python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json
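The skip-already-processed behaviour can be pictured with a minimal sketch. It assumes the store is a flat JSON list of trajectory ID strings; the real file layout may differ, and the helper names `load_seen_ids`, `save_seen_ids`, and `filter_new` are illustrative, not part of memabra's API.

```python
import json
from pathlib import Path


def load_seen_ids(path: Path) -> set[str]:
    """Return the persisted IDs, or an empty set if the store does not exist yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_seen_ids(path: Path, seen: set[str]) -> None:
    """Write the IDs as a sorted JSON list so the file is stable between runs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(seen), indent=2), encoding="utf-8")


def filter_new(trajectory_ids: list[str], seen: set[str]) -> list[str]:
    """Keep only IDs that have not been processed yet, preserving input order."""
    return [tid for tid in trajectory_ids if tid not in seen]
```

Under these assumptions, a run would load the set, process only what `filter_new` returns, then save the union so the next run skips the same trajectories.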
Dry-run mode
To train and evaluate a challenger without actually promoting it or saving a new router version:
source venv/bin/activate
python -m src.memabra.cli --dry-run
This still produces a training report (with dry_run: true) so you can inspect what would have happened before allowing a real promotion.
Evaluate against a specific baseline version
By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead:
source venv/bin/activate
python -m src.memabra.cli --baseline-version 20260414-123456
This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record baseline_version_id for audit.
Episodic retrieval with case index
You can load or rebuild a case index for episodic retrieval during task execution:
source venv/bin/activate
python -m src.memabra.cli --rebuild-case-index
This builds a CaseIndex from all saved trajectories and saves it to the default path (<base-dir>/case-index.json). On subsequent runs, load it without rebuilding:
source venv/bin/activate
python -m src.memabra.cli --case-index /custom/artifacts/case-index.json
When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval.
When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router.
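To make the "best past trajectory as a hint" idea concrete, here is a toy sketch. It is not memabra's `CaseIndex`: the class name `ToyCaseIndex`, the trajectory fields `id`, `input`, and `reward`, and the whitespace/case normalization used for matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCaseIndex:
    """Hypothetical stand-in: maps a normalized task input to its best trajectory."""

    best: dict = field(default_factory=dict)

    @staticmethod
    def _key(task_input: str) -> str:
        # Collapse whitespace and case so trivially different inputs still match.
        return " ".join(task_input.lower().split())

    def add(self, trajectory: dict) -> None:
        """Keep only the highest-reward trajectory for each distinct input."""
        key = self._key(trajectory["input"])
        current = self.best.get(key)
        if current is None or trajectory["reward"] > current["reward"]:
            self.best[key] = trajectory

    def lookup(self, task_input: str) -> Optional[dict]:
        """Return the best past trajectory for a previously seen input, else None."""
        return self.best.get(self._key(task_input))
```

In this sketch a runner would call `lookup` on each incoming task and, when it returns a trajectory, add it to the retrieval candidates as the episodic hint.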
To run one online-learning cycle inline from the shell:
source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
print(run_online_learning_workflow())
PY
Promotion gates
A challenger is promoted only when all of the following are true:
- reward_delta >= min_reward_delta: the challenger must improve average reward by at least this amount
- error_rate_delta <= max_error_rate_increase: the challenger must not increase errors beyond this limit
- latency_delta_ms <= max_latency_increase_ms: the challenger must not become slower beyond this limit
- task_count >= required_task_count: the benchmark must include at least this many tasks
The default policy in the CLI workflow is deliberately lenient to suit alpha exploration. Tighten these thresholds before relying on it in production.
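The four gates can be sketched as a single check. This is a minimal illustration of the comparisons listed above, not the actual `PromotionPolicy` implementation: the `GateThresholds` class, the `check_gates` function, and the wording of the rejection reasons are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical container for the four thresholds named in the gate list."""

    min_reward_delta: float
    max_error_rate_increase: float
    max_latency_increase_ms: float
    required_task_count: int


def check_gates(
    reward_delta: float,
    error_rate_delta: float,
    latency_delta_ms: float,
    task_count: int,
    thresholds: GateThresholds,
) -> tuple[bool, list[str]]:
    """Return (accepted, reasons), where reasons names every gate that failed."""
    reasons = []
    if reward_delta < thresholds.min_reward_delta:
        reasons.append("reward_delta below min_reward_delta")
    if error_rate_delta > thresholds.max_error_rate_increase:
        reasons.append("error_rate_delta above max_error_rate_increase")
    if latency_delta_ms > thresholds.max_latency_increase_ms:
        reasons.append("latency_delta_ms above max_latency_increase_ms")
    if task_count < thresholds.required_task_count:
        reasons.append("task_count below required_task_count")
    # Promotion requires every gate to pass, so any recorded reason rejects.
    return (not reasons, reasons)
```

For example, with illustrative thresholds `GateThresholds(0.02, 0.01, 50.0, 10)` (not the CLI defaults), a challenger with `reward_delta=0.01` and `latency_delta_ms=80.0` would be rejected with two reasons. Collecting every failed gate rather than stopping at the first mirrors the guide's note that rejection reasons are saved in the training report.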
Where reports and versions are stored
By default everything lands under:
- docs/projects/memabra/demo-artifacts/trajectories/: raw task trajectories
- docs/projects/memabra/demo-artifacts/router-versions/versions/: versioned router weights
- docs/projects/memabra/demo-artifacts/router-versions/current.json: active router metadata (includes promotion source, benchmark summary, prior version, rollback history)
- docs/projects/memabra/demo-artifacts/training-reports/: one JSON report per training run
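Since each training run writes one JSON report, a small helper can pull up the most recent one for inspection. The sketch below assumes reports carry a `.json` extension and that modification time is a good enough proxy for "latest"; the report naming scheme is not documented here, and `latest_report` is a hypothetical helper, not part of memabra.

```python
import json
from pathlib import Path
from typing import Optional


def latest_report(report_dir: Path) -> Optional[dict]:
    """Load the most recently modified *.json report, or None if there are none."""
    reports = sorted(report_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not reports:
        return None
    return json.loads(reports[-1].read_text(encoding="utf-8"))
```

Calling `latest_report(Path("docs/projects/memabra/demo-artifacts/training-reports"))` and reading fields such as `dry_run` or `baseline_version_id` with `dict.get` is one way to check what the last cycle did without assuming which other keys are present.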
What happens when the challenger loses
- The active router in the app remains unchanged
- A training report is still saved with the rejection reasons
- No new version is registered as current
Rolling back
You can roll back to any previous version from Python:
from src.memabra.router_versioning import RouterVersionStore
store = RouterVersionStore()
store.rollback("20260414-123456")
current = store.get_current()
print(current)
Or from the CLI:
source venv/bin/activate
python -m src.memabra.cli --rollback 20260414-123456
To see all available versions before rolling back:
source venv/bin/activate
python -m src.memabra.cli --list-versions
Rollback preserves an audit trail in current.json (rollback_from, rolled_back_at).
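The audit fields can be read back with a few lines of Python. This sketch assumes `rollback_from` and `rolled_back_at` are top-level keys in current.json, which this guide does not confirm; `read_current` and `rollback_summary` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Optional


def read_current(path: Path) -> dict:
    """Load the active router metadata from a current.json file."""
    return json.loads(path.read_text(encoding="utf-8"))


def rollback_summary(current: dict) -> Optional[str]:
    """Describe the recorded rollback, or return None if no rollback is recorded."""
    source = current.get("rollback_from")
    if source is None:
        return None
    when = current.get("rolled_back_at")
    return f"rolled back from {source} at {when}"
```

If the keys turn out to be nested under another field, only `rollback_summary` needs to change.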
Status check
To quickly inspect the current system state without running a learning cycle:
source venv/bin/activate
python -m src.memabra.cli --status
Architecture summary
Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger)
                                                            |
                                                            v
                  BenchmarkSuite -> Evaluator -> baseline vs challenger
                                                            |
                                                            v
                                               PromotionPolicy.evaluate()
                                                            |
                                        +-------------------+-------------------+
                                        | accepted                              | rejected
                                        v                                       v
                            RouterVersionStore.save()                 training report saved
                            app.set_router(challenger)                active router unchanged