# Online Learning Operator Guide
## What it does

memabra's online learning loop lets the system safely retrain its router from accumulated trajectories, evaluate the new challenger against the current baseline, and promote it only if explicit thresholds are met.
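The loop can be sketched as a small control-flow function. Everything below is illustrative: the callables stand in for memabra's real components, and none of these names are the actual API.

```python
# Illustrative sketch of the online-learning cycle (hypothetical names,
# not memabra's actual API).

def run_cycle(train, evaluate, policy, promote):
    """Train a challenger, benchmark it against the baseline, and
    promote it only when every promotion gate passes."""
    challenger = train()           # fit a challenger on accumulated trajectories
    report = evaluate(challenger)  # baseline-vs-challenger benchmark
    accepted = policy(report)      # explicit threshold check
    if accepted:
        promote(challenger)        # register the challenger as active
    return {"accepted": accepted, "report": report}
```

If the policy rejects the report, `promote` is never called, which matches the behavior described under "What happens when the challenger loses".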
## How to run one cycle

### From Python

```python
from src.memabra.cli import run_online_learning_workflow

result = run_online_learning_workflow()
print(result)
```
### From the shell

```bash
source venv/bin/activate
python -m src.memabra.cli
```
Or with custom options:

```bash
source venv/bin/activate
python -m src.memabra.cli --base-dir /custom/artifacts --min-new-trajectories 5
```
By default the CLI persists seen trajectory IDs to `<base-dir>/seen-trajectories.json` so repeated runs skip already-processed data. You can override the path:

```bash
source venv/bin/activate
python -m src.memabra.cli --seen-trajectory-store /custom/artifacts/seen.json
```
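A minimal sketch of what such a seen-ID store could look like, assuming a flat JSON list of IDs. This is not memabra's actual implementation, only a plausible shape for it.

```python
import json
from pathlib import Path

# Hypothetical seen-trajectory store: a JSON file holding the IDs of
# trajectories that earlier runs have already processed.

def load_seen(path: Path) -> set:
    """Return the set of already-processed trajectory IDs."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def mark_seen(path: Path, ids) -> None:
    """Persist the full set of processed IDs back to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(ids)))
```

Reading the file back into a set makes "skip already-processed data" a simple membership check per trajectory.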
### Dry-run mode

To train and evaluate a challenger without actually promoting it or saving a new router version:

```bash
source venv/bin/activate
python -m src.memabra.cli --dry-run
```

This still produces a training report (with `dry_run: true`) so you can inspect what would have happened before allowing a real promotion.
### Evaluate against a specific baseline version

By default the online-learning cycle uses the currently active router as the baseline. You can pin the baseline to a specific saved version instead:

```bash
source venv/bin/activate
python -m src.memabra.cli --baseline-version 20260414-123456
```

This is useful when you want to compare a challenger against a known-good version rather than whatever happens to be active right now. The report will record `baseline_version_id` for audit.
### Episodic retrieval with case index

You can load or rebuild a case index for episodic retrieval during task execution:

```bash
source venv/bin/activate
python -m src.memabra.cli --rebuild-case-index
```

This builds a `CaseIndex` from all saved trajectories and saves it to the default path (`<base-dir>/case-index.json`). On subsequent runs, load it without rebuilding:

```bash
source venv/bin/activate
python -m src.memabra.cli --case-index /custom/artifacts/case-index.json
```

When a case index path is provided, the online-learning cycle automatically rebuilds the index after training and evaluation, so benchmark-generated trajectories are included for future episodic retrieval.

When a case index is loaded, the runner injects an episodic memory candidate into retrieval for inputs that match a previously seen task, surfacing the best past trajectory as a hint to the router.
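The injection step can be sketched roughly as follows. The candidate shape and the lookup are assumptions; the `CaseIndex` is treated here as a simple mapping from task keys to best past trajectories.

```python
# Rough sketch of episodic candidate injection (not memabra's actual API):
# if the incoming task matches a previously seen one, prepend its best
# past trajectory as an extra retrieval candidate for the router.

def with_episodic_candidate(candidates, case_index, task_key):
    best = case_index.get(task_key)  # best past trajectory for this task, if any
    if best is None:
        return candidates
    return [{"source": "episodic", "trajectory": best}] + candidates
```

Unmatched inputs pass through unchanged, so episodic retrieval only adds a hint and never removes candidates.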
You can also run the full workflow inline from the shell:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
print(run_online_learning_workflow())
PY
```
## Promotion gates

A challenger is promoted only when **all** of the following are true:

- `reward_delta >= min_reward_delta` — the challenger must improve average reward by at least this amount
- `error_rate_delta <= max_error_rate_increase` — the challenger must not increase errors beyond this limit
- `latency_delta_ms <= max_latency_increase_ms` — the challenger must not become slower beyond this limit
- `task_count >= required_task_count` — the benchmark must include at least this many tasks
The default policy in the CLI workflow is lenient to allow alpha exploration. In production you should tighten these thresholds.
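The four gates combine with a logical AND. A direct sketch, with metric and threshold names taken from the list above but a function that is illustrative rather than memabra's actual `PromotionPolicy`:

```python
# Illustrative promotion-gate check: every gate must hold, or the
# challenger is rejected and the active router stays unchanged.

def gates_pass(report, policy):
    return (
        report["reward_delta"] >= policy["min_reward_delta"]
        and report["error_rate_delta"] <= policy["max_error_rate_increase"]
        and report["latency_delta_ms"] <= policy["max_latency_increase_ms"]
        and report["task_count"] >= policy["required_task_count"]
    )
```

Because the conjunction short-circuits, a single failing gate is enough to reject the challenger.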
## Where reports and versions are stored

By default everything lands under:

- `docs/projects/memabra/demo-artifacts/trajectories/` — raw task trajectories
- `docs/projects/memabra/demo-artifacts/router-versions/versions/` — versioned router weights
- `docs/projects/memabra/demo-artifacts/router-versions/current.json` — active router metadata (includes promotion source, benchmark summary, prior version, rollback history)
- `docs/projects/memabra/demo-artifacts/training-reports/` — one JSON report per training run
## What happens when the challenger loses

- The active router in the app **remains unchanged**
- A training report is still saved with the rejection reasons
- No new version is registered as current
## Rolling back

You can roll back to any previous version from Python:

```python
from src.memabra.router_versioning import RouterVersionStore

store = RouterVersionStore()
store.rollback("20260414-123456")
current = store.get_current()
print(current)
```
Or from the CLI:

```bash
source venv/bin/activate
python -m src.memabra.cli --rollback 20260414-123456
```
To see all available versions before rolling back:

```bash
source venv/bin/activate
python -m src.memabra.cli --list-versions
```

Rollback preserves an audit trail in `current.json` (`rollback_from`, `rolled_back_at`).
## Status check

To quickly inspect the current system state without running a learning cycle:

```bash
source venv/bin/activate
python -m src.memabra.cli --status
```
## Architecture summary

```
Trajectories -> ArtifactIndex -> DatasetBuilder -> SimpleLearningRouter (challenger)
                                      |
                                      v
            BenchmarkSuite -> Evaluator -> baseline vs challenger
                                      |
                                      v
                         PromotionPolicy.evaluate()
                                      |
              +-----------------------+----------------------+
              | accepted                                     | rejected
              v                                              v
  RouterVersionStore.save()                      training report saved
  app.set_router(challenger)                     active router unchanged
```