# memabra Progress

## Current status

- Project status: safe self-improving alpha; benchmark-gated online learning loop complete
- Date: 2026-04-15
- Project: memabra
- Subtitle: An intuition-driven control plane for agent memory and action selection.

## What exists now

memabra now has a complete safe self-improving alpha control-plane loop:

- candidate retrieval
- routing decisions
- memory / skill / tool execution
- telemetry events
- trajectory construction
- runtime validation
- artifact persistence
- replay and analytics
- artifact indexing and dataset slicing
- lightweight learning router training
- A/B evaluation
- router weight versioning and rollback
- benchmark-gated promotion with explicit policy thresholds
- auditable training reports
- exception-safe online learning coordinator
- configurable CLI entrypoint
- persisted seen-trajectory tracking across restarts (safe for cron jobs)
- dry-run mode for training/evaluation without promotion risk
- baseline version selection for challenger evaluation
- task case index (`CaseIndex`) for episodic retrieval: maps normalized inputs to the best past trajectory ID
- `CaseIndex` integration into `MemabraApp` (build, save, load, lookup) and `MemabraRunner` (injects an episodic candidate on matching inputs)
- CLI flags `--case-index` and `--rebuild-case-index` for operator-managed episodic retrieval
- `OnlineLearningCoordinator` auto-rebuilds the case index after each cycle when `case_index_path` is provided, ensuring benchmark-generated trajectories are indexed
- `TrajectorySummarizer` generates human-readable trajectory summaries from task input, decisions, outcome, and reward
- `MemabraRunner` enriches episodic memory candidate summaries using `TrajectorySummarizer` when `persistence_store` is available
- CLI `--status` flag prints current system state (active router version, counts, latest report) without triggering a learning cycle
- CLI is now subcommand-driven (`run`, `status`, `version list`, `version rollback`) with a
dedicated packaged `memabra` entrypoint
- CLI `--format text` mode provides operator-friendly summaries for status checks, version listings, rollbacks, and workflow runs, including latest report details, current-version highlighting, sectioned workflow summaries, normalized yes/no flags, and fixed-precision benchmark/promotion metrics

## Major completed capabilities

### Foundations

- project naming, architecture, roadmap, decisions, reward spec
- candidate / event / trajectory / memory schemas
- prototype package structure under `src/memabra/`

### Runtime path

- `retrieval.py`: typed candidate retrieval
- `router.py`: heuristic router, feature-scoring router, learning router
- `execution.py`: memory, skill, and tool executors and adapters
- `runner.py`: end-to-end task -> trajectory orchestration
- `persistence.py`: trajectory and memory artifact storage
- `replay.py`: replay summaries over examples and persisted runs
- `memory_store.py`: typed memory records with verify/revoke support

### Adapters and evaluation

- real tool adapters:
  - `LocalFunctionToolAdapter`
  - `SubprocessToolAdapter`
  - `ToolRegistry`
- real skill loading:
  - `FileSystemSkillBackend`
- richer evaluation path:
  - `OutcomeEngine`
  - `RewardEngine`
  - `ArtifactIndex`
  - `DatasetBuilder`
  - `Evaluator`
  - `RouterVersionStore`
- Alpha Iteration 1 — online learning loop:
  - `PromotionPolicy` with benchmark-gated promotion rules
  - `BenchmarkSuite` persistence (JSON load/save + default seed)
  - `OnlineLearningCoordinator` for retrain/evaluate/promote cycles
  - exception-safe coordinator: training/evaluation failures emit auditable error reports instead of crashing
  - `TrainingReportStore.get_report()` for by-id report lookup

### Product/demo surface

- `app.py`: `MemabraApp`, demo builders, artifact index access, training hooks, `run_online_learning_cycle`
- `cli.py`: wrap-up workflow and `run_online_learning_workflow` with benchmark-gated promotion
- `cli.py`: argument parsing (`--base-dir`, `--min-new-trajectories`)
and clean `python -m src.memabra.cli` execution
- `DEMO.md`: runnable walkthrough with CLI options

## Current test status

Command: `source venv/bin/activate && python -m pytest tests/memabra -q`

Latest result: `118 passed`

All alpha iteration 1 source, tests, and documentation have been committed to the repository (commit `34cf507c`).

## Most important current files

### Core package

- `src/memabra/app.py`
- `src/memabra/cli.py`
- `src/memabra/router.py`
- `src/memabra/runner.py`
- `src/memabra/execution.py`
- `src/memabra/evaluator.py`
- `src/memabra/router_versioning.py`
- `src/memabra/promotion.py`
- `src/memabra/online_learning.py`
- `src/memabra/training_reports.py`
- `src/memabra/benchmarks.py`
- `src/memabra/case_index.py`

### Tests

- `tests/memabra/test_app.py`
- `tests/memabra/test_cli_workflow.py`
- `tests/memabra/test_package_exports.py`
- `tests/memabra/test_promotion.py`
- `tests/memabra/test_online_learning.py`
- `tests/memabra/test_training_reports.py`
- `tests/memabra/test_benchmarks.py`
- `tests/memabra/test_router_versioning.py`
- `tests/memabra/test_evaluator.py`
- `tests/memabra/test_router_protocol.py`
- `tests/memabra/test_execution_persistence.py`

## Wrap-up status

The project is now in a safe self-improving alpha state.
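To make the benchmark-gated promotion concrete, here is a minimal sketch of how explicit policy thresholds can gate a challenger router. All names, signatures, and threshold values below are illustrative assumptions; the real `PromotionPolicy` in `src/memabra/promotion.py` may differ.

```python
from dataclasses import dataclass


@dataclass
class PromotionPolicy:
    """Illustrative sketch of benchmark-gated promotion thresholds."""

    min_reward_delta: float = 0.02       # challenger must beat baseline by this margin
    min_benchmark_pass_rate: float = 0.9  # fraction of benchmark cases that must pass
    min_trajectories: int = 20            # require enough evidence before promoting

    def should_promote(
        self,
        baseline_reward: float,
        challenger_reward: float,
        pass_rate: float,
        n_trajectories: int,
    ) -> bool:
        # Every gate must pass; otherwise the active router is left unchanged.
        return (
            challenger_reward - baseline_reward >= self.min_reward_delta
            and pass_rate >= self.min_benchmark_pass_rate
            and n_trajectories >= self.min_trajectories
        )


policy = PromotionPolicy()
print(policy.should_promote(0.70, 0.75, 0.95, 32))  # all gates pass -> True
print(policy.should_promote(0.70, 0.71, 0.95, 32))  # reward delta too small -> False
```

The point of the all-gates-must-pass shape is that a failed cycle is a no-op: the challenger is simply discarded and the active router stays in place, which is what makes the loop safe to run unattended.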
It can:

- run realistic demo tasks
- persist trajectories
- replay and inspect results
- train a lightweight router from saved artifacts
- compare baseline vs. challenger routers
- apply a promotion policy with explicit thresholds
- save and reload router versions with metadata
- emit auditable training reports
- run an online-learning cycle from the CLI
- leave the active router unchanged when the challenger fails
- survive training/evaluation failures gracefully and emit error reports
- accept CLI overrides for the artifact directory and trajectory thresholds
- persist seen-trajectory state across restarts so cron jobs don't retrain on the same data
- persist seen trajectories to `/seen-trajectories.json` by default in CLI `main()`
- run in dry-run mode to evaluate a challenger without promoting it
- run in baseline-version mode to compare a challenger against a specific saved version instead of the currently active router
- index successful task cases by normalized input for episodic retrieval (`CaseIndex`)
- build/save/load a case index from `MemabraApp`
- inject episodic memory candidates during runner retrieval when a similar past task exists
- use the `--case-index` and `--rebuild-case-index` CLI flags to manage episodic retrieval
- automatically refresh the case index after training/evaluation when a case-index path is configured
- enrich episodic memory candidates with human-readable summaries when the past trajectory is available via `persistence_store`
- provide a quick read-only snapshot of the active router, versions, trajectories, and reports via the CLI `--status` flag
- enable operator-safe router version management, without touching code, via the CLI `--rollback` and `--list-versions` flags

## Next sensible frontier

1. tighter integration with real Hermes trajectories
2. multi-turn conversation state and working-memory updates
3. richer real-world tool ecosystem integration (MCP, web, git, files)
4.
stronger storage/index backend beyond plain JSON files

## One-line summary

memabra is now a runnable, test-covered, safe self-improving alpha for agent memory/action routing, with online learning, benchmark-gated promotion, and auditable reports.