memabra/docs/ONLINE_LEARNING.md

Online Learning Operator Guide

What it does

memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met.

How to run one cycle

From Python

from src.memabra.cli import run_online_learning_workflow

result = run_online_learning_workflow()
print(result)

From the shell

source venv/bin/activate
python -m src.memabra.cli

Or with custom options:

source venv/bin/activate
python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5

By default the CLI persists seen trajectory IDs to <base-dir>/seen-trajectories.json so repeated runs skip already-processed data. You can override the path:

source venv/bin/activate
python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json

Dry-run mode

To train and evaluate a challenger without actually promoting it or saving a new router version:

source venv/bin/activate
python -m src.memabra.cli --dry-run

This still produces a training report (with dry_run: true) so you can inspect what would have happened before allowing a real promotion.

Evaluate against a specific baseline version

By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead:

source venv/bin/activate
python -m src.memabra.cli --baseline-version 20260414-123456

This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record baseline_version_id for audit.

Episodic retrieval with case index

You can load or rebuild a case index for episodic retrieval during task execution:

source venv/bin/activate
python -m src.memabra.cli --rebuild-case-index

This builds a CaseIndex from all saved trajectories and saves it to the default path (<base-dir>/case-index.json). On subsequent runs, load it without rebuilding:

source venv/bin/activate
python -m src.memabra.cli --case-index /custom/artifacts/case-index.json

When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval.

When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router.
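The idea can be sketched as a lookup from task signature to the best-rewarded past trajectory. The names below (`Trajectory` fields, `best_for`) are illustrative assumptions, not memabra's actual CaseIndex API.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    # Hypothetical shape of a saved trajectory for this sketch.
    task_signature: str
    reward: float
    steps: list = field(default_factory=list)

class CaseIndex:
    """Sketch: keep only the highest-reward trajectory per task signature."""

    def __init__(self, trajectories):
        self.best = {}
        for t in trajectories:
            current = self.best.get(t.task_signature)
            if current is None or t.reward > current.reward:
                self.best[t.task_signature] = t

    def best_for(self, task_signature):
        """Return the best past trajectory for a matching task, or None."""
        return self.best.get(task_signature)
```

A runner holding such an index can surface `best_for(signature)` as one extra retrieval candidate, leaving the routing decision to the router.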

Or inline:

source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
print(run_online_learning_workflow())
PY

Promotion gates

A challenger is promoted only when all of the following are true:

  • reward_delta >= min_reward_delta — the challenger must improve average reward by at least this amount
  • error_rate_delta <= max_error_rate_increase — the challenger must not increase errors beyond this limit
  • latency_delta_ms <= max_latency_increase_ms — the challenger must not become slower beyond this limit
  • task_count >= required_task_count — the benchmark must include at least this many tasks

The default policy in the CLI workflow is deliberately lenient to support alpha-stage exploration. Tighten these thresholds before running in production.
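The gates above can be sketched as a single all-must-pass check. Threshold names mirror the bullet list; the class shape and `evaluate` signature are assumptions, not memabra's actual PromotionPolicy.

```python
from dataclasses import dataclass

@dataclass
class PromotionPolicy:
    # Threshold names follow the bullets above; defaults are illustrative.
    min_reward_delta: float = 0.0
    max_error_rate_increase: float = 0.05
    max_latency_increase_ms: float = 100.0
    required_task_count: int = 10

    def evaluate(self, reward_delta, error_rate_delta, latency_delta_ms, task_count):
        """Return (promote, reasons); the challenger is promoted only if every gate passes."""
        reasons = []
        if reward_delta < self.min_reward_delta:
            reasons.append("reward_delta below min_reward_delta")
        if error_rate_delta > self.max_error_rate_increase:
            reasons.append("error_rate_delta above max_error_rate_increase")
        if latency_delta_ms > self.max_latency_increase_ms:
            reasons.append("latency_delta_ms above max_latency_increase_ms")
        if task_count < self.required_task_count:
            reasons.append("task_count below required_task_count")
        return (not reasons, reasons)
```

When the challenger loses, the collected reasons are exactly what a training report would record as rejection reasons.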

Where reports and versions are stored

By default everything lands under:

  • docs/projects/memabra/demo-artifacts/trajectories/ — raw task trajectories
  • docs/projects/memabra/demo-artifacts/router-versions/versions/ — versioned router weights
  • docs/projects/memabra/demo-artifacts/router-versions/current.json — active router metadata (includes promotion source, benchmark summary, prior version, rollback history)
  • docs/projects/memabra/demo-artifacts/training-reports/ — one JSON report per training run

What happens when the challenger loses

  • The active router in the app remains unchanged
  • A training report is still saved with the rejection reasons
  • No new version is registered as current

Rolling back

You can roll back to any previous version from Python:

from src.memabra.router_versioning import RouterVersionStore

store = RouterVersionStore()
store.rollback("20260414-123456")
current = store.get_current()
print(current)

Or from the CLI:

source venv/bin/activate
python -m src.memabra.cli --rollback 20260414-123456

To see all available versions before rolling back:

source venv/bin/activate
python -m src.memabra.cli --list-versions

Rollback preserves an audit trail in current.json (rollback_from, rolled_back_at).

Status check

To quickly inspect the current system state without running a learning cycle:

source venv/bin/activate
python -m src.memabra.cli --status

Architecture summary

Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger)
                                        |
                                        v
BenchmarkSuite -> Evaluator -> baseline vs challenger
                                        |
                                        v
                              PromotionPolicy.evaluate()
                                        |
                    +-------------------+-------------------+
                    | accepted                              | rejected
                    v                                       v
        RouterVersionStore.save()                 training report saved
       app.set_router(challenger)                active router unchanged