memabra/docs/PROGRESS.md
2026-04-15 11:06:05 +08:00


memabra Progress

Current status

Project: memabra
Subtitle: An intuition-driven control plane for agent memory and action selection.
Date: 2026-04-15
Status: safe, self-improving alpha; benchmark-gated online-learning loop complete

What exists now

memabra now has a complete control-plane loop for the safe, self-improving alpha:

  • candidate retrieval
  • routing decisions
  • memory / skill / tool execution
  • telemetry events
  • trajectory construction
  • runtime validation
  • artifact persistence
  • replay and analytics
  • artifact indexing and dataset slicing
  • lightweight learning router training
  • A/B evaluation
  • router weight versioning and rollback
  • benchmark-gated promotion with explicit policy thresholds
  • auditable training reports
  • exception-safe online learning coordinator
  • configurable CLI entrypoint
  • persisted seen-trajectory tracking across restarts (safe for cron jobs)
  • dry-run mode for training/evaluation without promotion risk
  • baseline version selection for challenger evaluation
  • task case index (CaseIndex) for episodic retrieval: maps normalized inputs to the best past trajectory ID
  • CaseIndex integration into MemabraApp (build, save, load, lookup) and MemabraRunner (injects episodic candidate on matching inputs)
  • CLI flags --case-index and --rebuild-case-index for operator-managed episodic retrieval
  • OnlineLearningCoordinator auto-rebuilds case index after each cycle when case_index_path is provided, ensuring benchmark-generated trajectories are indexed
  • TrajectorySummarizer generates human-readable trajectory summaries from task input, decisions, outcome, and reward
  • MemabraRunner enriches episodic memory candidate summaries using TrajectorySummarizer when persistence_store is available
  • CLI --status flag prints current system state (active router version, counts, latest report) without triggering a learning cycle
  • CLI is now subcommand-driven (run, status, version list, version rollback) with a dedicated packaged memabra entrypoint
  • CLI --format text mode provides operator-friendly summaries for status checks, version listings, rollbacks, and workflow runs, including latest report details, current-version highlighting, sectioned workflow summaries, normalized yes/no flags, and fixed-precision benchmark/promotion metrics
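Several items above (CaseIndex, episodic candidate injection) hinge on normalized-input lookup. The following is a minimal sketch of the idea — the class, method names, and normalization rule are illustrative assumptions, not the actual memabra API:

```python
from dataclasses import dataclass, field

def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())

@dataclass
class CaseIndex:
    # normalized input -> (trajectory_id, reward); keeps the best-scoring case.
    cases: dict = field(default_factory=dict)

    def record(self, task_input: str, trajectory_id: str, reward: float) -> None:
        key = normalize(task_input)
        best = self.cases.get(key)
        if best is None or reward > best[1]:
            self.cases[key] = (trajectory_id, reward)

    def lookup(self, task_input: str):
        # Returns the best past trajectory ID for a matching input, or None.
        entry = self.cases.get(normalize(task_input))
        return entry[0] if entry else None
```

With this shape, the runner can inject an episodic candidate whenever `lookup` finds a match for the incoming task input.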

Major completed capabilities

Foundations

  • project naming, architecture, roadmap, decisions, reward spec
  • candidate / event / trajectory / memory schemas
  • prototype package structure under src/memabra/

Runtime path

  • retrieval.py: typed candidate retrieval
  • router.py: heuristic router, feature-scoring router, learning router
  • execution.py: memory, skill, tool executors and adapters
  • runner.py: end-to-end task -> trajectory orchestration
  • persistence.py: trajectory and memory artifact storage
  • replay.py: replay summaries over examples and persisted runs
  • memory_store.py: typed memory records with verify/revoke support

Adapters and evaluation

  • real tool adapters:
    • LocalFunctionToolAdapter
    • SubprocessToolAdapter
    • ToolRegistry
  • real skill loading:
    • FileSystemSkillBackend
  • richer evaluation path:
    • OutcomeEngine
    • RewardEngine
    • ArtifactIndex
    • DatasetBuilder
    • Evaluator
    • RouterVersionStore
  • Alpha Iteration 1 — online learning loop:
    • PromotionPolicy with benchmark-gated promotion rules
    • BenchmarkSuite persistence (JSON load/save + default seed)
    • OnlineLearningCoordinator for retrain/evaluate/promote cycles
    • exception-safe coordinator: training/evaluation failures emit auditable error reports instead of crashing
    • TrainingReportStore.get_report() for by-id report lookup
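The benchmark-gated promotion rule can be pictured as a small policy object: a challenger is promoted only if it clears an absolute benchmark floor and beats the baseline by a margin. This is a sketch under assumed threshold names and values, not the real PromotionPolicy:

```python
from dataclasses import dataclass

@dataclass
class PromotionPolicy:
    # Hypothetical thresholds; the actual policy fields may differ.
    min_improvement: float = 0.02
    min_benchmark_score: float = 0.7

    def should_promote(self, baseline_score: float, challenger_score: float) -> bool:
        return (challenger_score >= self.min_benchmark_score
                and challenger_score - baseline_score >= self.min_improvement)
```

When `should_promote` returns False, the coordinator leaves the active router version untouched and records the decision in a training report.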

Product/demo surface

  • app.py: MemabraApp, demo builders, artifact index access, training hooks, run_online_learning_cycle
  • cli.py: wrap-up workflow and run_online_learning_workflow with benchmark-gated promotion
  • cli.py: argument parsing (--base-dir, --min-new-trajectories) and clean python -m src.memabra.cli execution
  • DEMO.md: runnable walkthrough with CLI options
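The subcommand-driven CLI (run, status, version list, version rollback) could be wired with argparse roughly as follows. The parser structure and defaults here are a sketch based on the flags named in this document, not the actual cli.py:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="memabra")
    parser.add_argument("--format", choices=["json", "text"], default="json")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run an online-learning workflow")
    run.add_argument("--base-dir", default="artifacts")
    run.add_argument("--min-new-trajectories", type=int, default=1)
    run.add_argument("--dry-run", action="store_true")

    sub.add_parser("status", help="read-only system snapshot")

    version = sub.add_parser("version", help="router version management")
    vsub = version.add_subparsers(dest="version_command", required=True)
    vsub.add_parser("list")
    rollback = vsub.add_parser("rollback")
    rollback.add_argument("version_id")
    return parser
```

Nested subparsers give the `version list` / `version rollback <id>` shape without any custom dispatch code.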

Current test status

Command: source venv/bin/activate && python -m pytest tests/memabra -q

Latest result: 118 passed

All alpha iteration 1 source, tests, and documentation have been committed to the repository (commit 34cf507c).

Most important current files

Core package

  • src/memabra/app.py
  • src/memabra/cli.py
  • src/memabra/router.py
  • src/memabra/runner.py
  • src/memabra/execution.py
  • src/memabra/evaluator.py
  • src/memabra/router_versioning.py
  • src/memabra/promotion.py
  • src/memabra/online_learning.py
  • src/memabra/training_reports.py
  • src/memabra/benchmarks.py
  • src/memabra/case_index.py

Tests

  • tests/memabra/test_app.py
  • tests/memabra/test_cli_workflow.py
  • tests/memabra/test_package_exports.py
  • tests/memabra/test_promotion.py
  • tests/memabra/test_online_learning.py
  • tests/memabra/test_training_reports.py
  • tests/memabra/test_benchmarks.py
  • tests/memabra/test_router_versioning.py
  • tests/memabra/test_evaluator.py
  • tests/memabra/test_router_protocol.py
  • tests/memabra/test_execution_persistence.py

Wrap-up status

The project is now in a safe, self-improving alpha state. It can:

  • run realistic demo tasks
  • persist trajectories
  • replay and inspect results
  • train a lightweight router from saved artifacts
  • compare baseline vs challenger routers
  • apply a promotion policy with explicit thresholds
  • save and reload router versions with metadata
  • emit auditable training reports
  • run an online-learning cycle from the CLI
  • leave the active router unchanged when the challenger fails promotion
  • survive training/evaluation failures gracefully and emit error reports
  • accept CLI overrides for artifact directory and trajectory thresholds
  • persist seen-trajectory state across restarts so cron jobs don't retrain on the same data
  • persist seen trajectories by default to <base-dir>/seen-trajectories.json via the CLI main()
  • run in dry-run mode to evaluate a challenger without promoting it
  • run in baseline-version mode to compare a challenger against a specific saved version instead of the currently active router
  • index successful task cases by normalized input for episodic retrieval (CaseIndex)
  • build/save/load a case index from MemabraApp
  • inject episodic memory candidates during runner retrieval when a similar past task exists
  • use --case-index and --rebuild-case-index CLI flags to manage episodic retrieval
  • refresh the case index automatically after each online-learning cycle when a case-index path is configured
  • enrich episodic memory candidate summaries with human-readable text when the past trajectory is available via persistence_store
  • print a quick read-only snapshot of the active router, versions, trajectories, and reports via the CLI --status flag
  • manage router versions from the CLI (--rollback, --list-versions) without touching code

Next sensible frontier

  1. tighter integration with real Hermes trajectories
  2. multi-turn conversation state and working-memory updates
  3. richer real-world tool ecosystem integration (MCP, web, git, files)
  4. stronger storage/index backend beyond plain JSON files

One-line summary

memabra is now a runnable, test-covered, safe, self-improving alpha for agent memory/action routing, with online learning, benchmark-gated promotion, and auditable reports.