memabra Progress
Current status
Project status: safe self-improving alpha, benchmark-gated online learning loop complete
Date: 2026-04-15
Project: memabra
Subtitle: An intuition-driven control plane for agent memory and action selection.
What exists now
memabra now has a complete safe self-improving alpha control-plane loop:
- candidate retrieval
- routing decisions
- memory / skill / tool execution
- telemetry events
- trajectory construction
- runtime validation
- artifact persistence
- replay and analytics
- artifact indexing and dataset slicing
- lightweight learning router training
- A/B evaluation
- router weight versioning and rollback
- benchmark-gated promotion with explicit policy thresholds
- auditable training reports
- exception-safe online learning coordinator
- configurable CLI entrypoint
- persisted seen-trajectory tracking across restarts (safe for cron jobs)
- dry-run mode for training/evaluation without promotion risk
- baseline version selection for challenger evaluation
- task case index (CaseIndex) for episodic retrieval: maps normalized inputs to the best past trajectory ID
- CaseIndex integration into MemabraApp (build, save, load, lookup) and MemabraRunner (injects episodic candidate on matching inputs)
- CLI flags --case-index and --rebuild-case-index for operator-managed episodic retrieval
- OnlineLearningCoordinator auto-rebuilds case index after each cycle when case_index_path is provided, ensuring benchmark-generated trajectories are indexed
- TrajectorySummarizer generates human-readable trajectory summaries from task input, decisions, outcome, and reward
- MemabraRunner enriches episodic memory candidate summaries using TrajectorySummarizer when persistence_store is available
- CLI --status flag prints current system state (active router version, counts, latest report) without triggering a learning cycle
- CLI is now subcommand-driven (run, status, version list, version rollback) with a dedicated packaged memabra entrypoint
- CLI --format text mode provides operator-friendly summaries for status checks, version listings, rollbacks, and workflow runs, including latest report details, current-version highlighting, sectioned workflow summaries, normalized yes/no flags, and fixed-precision benchmark/promotion metrics
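The case index described above (normalized input mapped to the best past trajectory ID) can be sketched roughly as follows. This is a minimal illustration, not memabra's actual CaseIndex: the class name CaseIndexSketch, the normalize_input helper, the normalization rule (lowercase, collapsed whitespace), and the choice of "highest reward wins" are all assumptions made for the example.

```python
from __future__ import annotations

import json
import re
from pathlib import Path


def normalize_input(text: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


class CaseIndexSketch:
    """Illustrative stand-in for a case index; not memabra's real class."""

    def __init__(self) -> None:
        # normalized input -> (trajectory_id, reward)
        self._cases: dict[str, tuple[str, float]] = {}

    def add(self, task_input: str, trajectory_id: str, reward: float) -> None:
        key = normalize_input(task_input)
        current = self._cases.get(key)
        # Assumption: "best" means the highest reward seen for this input.
        if current is None or reward > current[1]:
            self._cases[key] = (trajectory_id, reward)

    def lookup(self, task_input: str) -> str | None:
        entry = self._cases.get(normalize_input(task_input))
        return entry[0] if entry else None

    def save(self, path: Path) -> None:
        # Tuples are serialized as two-element JSON arrays.
        path.write_text(json.dumps(self._cases, indent=2))

    @classmethod
    def load(cls, path: Path) -> "CaseIndexSketch":
        index = cls()
        raw = json.loads(path.read_text())
        index._cases = {key: (tid, reward) for key, (tid, reward) in raw.items()}
        return index
```

A runner could call lookup() on each incoming task and, on a hit, inject the returned trajectory ID as an episodic candidate alongside the regular retrieval results.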
Major completed capabilities
Foundations
- project naming, architecture, roadmap, decisions, reward spec
- candidate / event / trajectory / memory schemas
- prototype package structure under src/memabra/
Runtime path
- retrieval.py: typed candidate retrieval
- router.py: heuristic router, feature-scoring router, learning router
- execution.py: memory, skill, tool executors and adapters
- runner.py: end-to-end task -> trajectory orchestration
- persistence.py: trajectory and memory artifact storage
- replay.py: replay summaries over examples and persisted runs
- memory_store.py: typed memory records with verify/revoke support
Adapters and evaluation
- real tool adapters:
  - LocalFunctionToolAdapter
  - SubprocessToolAdapter
  - ToolRegistry
- real skill loading:
  - FileSystemSkillBackend
- richer evaluation path:
  - OutcomeEngine
  - RewardEngine
  - ArtifactIndex
  - DatasetBuilder
  - Evaluator
  - RouterVersionStore
- Alpha Iteration 1, online learning loop:
  - PromotionPolicy with benchmark-gated promotion rules
  - BenchmarkSuite persistence (JSON load/save + default seed)
  - OnlineLearningCoordinator for retrain/evaluate/promote cycles
  - exception-safe coordinator: training/evaluation failures emit auditable error reports instead of crashing
  - TrainingReportStore.get_report() for by-id report lookup
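To make the benchmark-gated, exception-safe cycle concrete, here is a minimal sketch of how a threshold policy and a guarded retrain/evaluate/promote step might fit together. It is not the real PromotionPolicy or OnlineLearningCoordinator: the names PromotionPolicySketch and run_cycle, the threshold fields, the metric keys, and the report shape are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromotionPolicySketch:
    # Hypothetical thresholds; memabra's real policy fields may differ.
    min_reward_delta: float = 0.02
    max_error_rate_increase: float = 0.0
    min_benchmark_tasks: int = 10

    def should_promote(self, baseline: dict, challenger: dict) -> tuple[bool, str]:
        """Return (accepted, reason) by checking each threshold in turn."""
        if challenger["task_count"] < self.min_benchmark_tasks:
            return False, "too few benchmark tasks"
        if challenger["avg_reward"] - baseline["avg_reward"] < self.min_reward_delta:
            return False, "reward gain below threshold"
        if challenger["error_rate"] - baseline["error_rate"] > self.max_error_rate_increase:
            return False, "error rate regressed"
        return True, "all thresholds met"


def run_cycle(
    train: Callable[[], object],
    evaluate: Callable[[object], tuple[dict, dict]],
    policy: PromotionPolicySketch,
    dry_run: bool = False,
) -> dict:
    """One retrain/evaluate/promote cycle that never raises.

    Failures in train() or evaluate() become an error report, so the
    active router is left untouched.
    """
    try:
        challenger = train()
        baseline_metrics, challenger_metrics = evaluate(challenger)
    except Exception as exc:
        return {"status": "error", "error": repr(exc), "promoted": False}

    accepted, reason = policy.should_promote(baseline_metrics, challenger_metrics)
    return {
        "status": "ok",
        "accepted": accepted,
        # In dry-run mode the decision is reported but not acted on.
        "promoted": accepted and not dry_run,
        "reason": reason,
    }
```

Returning a report dictionary in every branch, including the failure branch, is what keeps such a loop auditable: each cycle leaves a record whether it promoted, rejected, or failed.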
Product/demo surface
- app.py: MemabraApp, demo builders, artifact index access, training hooks, run_online_learning_cycle
- cli.py: wrap-up workflow and run_online_learning_workflow with benchmark-gated promotion
- cli.py: argument parsing (--base-dir, --min-new-trajectories) and clean python -m src.memabra.cli execution
- DEMO.md: runnable walkthrough with CLI options
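As a rough picture of the command-line surface, the sketch below wires the flag and subcommand names mentioned in this document into an argparse parser. It is an assumption-heavy illustration, not the contents of cli.py: the function name build_parser, the defaults, the version_id positional, and the placement of each flag (global versus per-subcommand) are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; flag placement and defaults are assumptions."""
    parser = argparse.ArgumentParser(prog="memabra")
    # Assumed default; the real default artifact directory is not stated here.
    parser.add_argument("--base-dir", default="artifacts")
    # Only the "text" format is named in this document.
    parser.add_argument("--format", choices=["text"], default=None)

    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run an online-learning cycle")
    run.add_argument("--min-new-trajectories", type=int, default=1)
    run.add_argument("--case-index", default=None)
    run.add_argument("--rebuild-case-index", action="store_true")

    sub.add_parser("status", help="print a read-only snapshot")

    version = sub.add_parser("version", help="manage router versions")
    version_sub = version.add_subparsers(dest="version_command", required=True)
    version_sub.add_parser("list")
    rollback = version_sub.add_parser("rollback")
    # Hypothetical positional naming the version to restore.
    rollback.add_argument("version_id")

    return parser
```

With a layout like this, `memabra --base-dir ./artifacts run --min-new-trajectories 5` and `memabra version rollback <id>` would parse into a namespace that a main() function can dispatch on via the command attribute.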
Current test status
Command:
source venv/bin/activate && python -m pytest tests/memabra -q
Latest result:
118 passed
All alpha iteration 1 source, tests, and documentation have been committed to the repository (commit 34cf507c).
Most important current files
Core package
- src/memabra/app.py
- src/memabra/cli.py
- src/memabra/router.py
- src/memabra/runner.py
- src/memabra/execution.py
- src/memabra/evaluator.py
- src/memabra/router_versioning.py
- src/memabra/promotion.py
- src/memabra/online_learning.py
- src/memabra/training_reports.py
- src/memabra/benchmarks.py
- src/memabra/case_index.py
Tests
- tests/memabra/test_app.py
- tests/memabra/test_cli_workflow.py
- tests/memabra/test_package_exports.py
- tests/memabra/test_promotion.py
- tests/memabra/test_online_learning.py
- tests/memabra/test_training_reports.py
- tests/memabra/test_benchmarks.py
- tests/memabra/test_router_versioning.py
- tests/memabra/test_evaluator.py
- tests/memabra/test_router_protocol.py
- tests/memabra/test_execution_persistence.py
Wrap-up status
The project is now in a safe self-improving alpha state. It can:
- run realistic demo tasks
- persist trajectories
- replay and inspect results
- train a lightweight router from saved artifacts
- compare baseline vs challenger routers
- apply a promotion policy with explicit thresholds
- save and reload router versions with metadata
- emit auditable training reports
- run an online-learning cycle from the CLI
- leave the active router unchanged when challenger fails
- survive training/evaluation failures gracefully and emit error reports
- accept CLI overrides for artifact directory and trajectory thresholds
- persist seen-trajectory state across restarts so cron jobs don't retrain on the same data
- default CLI main() persists seen trajectories to <base-dir>/seen-trajectories.json
- run in dry-run mode to evaluate a challenger without promoting it
- run in baseline-version mode to compare a challenger against a specific saved version instead of the currently active router
- index successful task cases by normalized input for episodic retrieval (CaseIndex)
- build/save/load a case index from MemabraApp
- inject episodic memory candidates during runner retrieval when a similar past task exists
- use --case-index and --rebuild-case-index CLI flags to manage episodic retrieval
- online-learning cycles automatically refresh the case index after training/evaluation when a case-index path is configured
- episodic memory candidates now include rich human-readable summaries when the past trajectory is available via persistence_store
- CLI --status flag provides a quick read-only snapshot of the active router, versions, trajectories, and reports
- CLI --rollback and --list-versions flags enable operator-safe router version management without touching code
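The seen-trajectory tracking in the list above is what makes repeated cron invocations safe. A minimal sketch of the idea follows, assuming the state file is a JSON array of trajectory IDs; the function names and file layout are hypothetical, and only the seen-trajectories.json file name and the minimum-new-trajectories threshold come from this document.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Load previously seen trajectory IDs; an absent file means none seen."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_seen(path: Path, seen: set[str]) -> None:
    # Write to a temporary file, then rename, so an interrupted run
    # cannot leave a half-written state file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(seen)))
    tmp.replace(path)


def select_new_trajectories(
    base_dir: Path, available_ids: list[str], min_new: int
) -> list[str]:
    """Return unseen trajectory IDs, or [] if fewer than min_new are new.

    Assumed file layout: a JSON array at <base-dir>/seen-trajectories.json.
    """
    seen_path = base_dir / "seen-trajectories.json"
    seen = load_seen(seen_path)
    new_ids = [tid for tid in available_ids if tid not in seen]
    if len(new_ids) < min_new:
        # Not enough new data: skip this cycle and leave the state unchanged.
        return []
    save_seen(seen_path, seen | set(new_ids))
    return new_ids
```

This sketch marks trajectories as seen as soon as they are selected. A coordinator that wants to retry after a failed cycle might instead record them only once training and evaluation have completed.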
Next sensible frontier
- tighter integration with real Hermes trajectories
- multi-turn conversation state and working-memory updates
- richer real-world tool ecosystem integration (MCP, web, git, files)
- stronger storage/index backend beyond plain JSON files
One-line summary
memabra is now a runnable, test-covered safe self-improving alpha for agent memory/action routing, with online learning, benchmark-gated promotion, and auditable reports.