Initial standalone memabra release
docs/PROGRESS.md (new file, 162 lines)
# memabra Progress

## Current status

Project status: safe self-improving alpha; benchmark-gated online learning loop complete
Date: 2026-04-15
Project: memabra
Subtitle: An intuition-driven control plane for agent memory and action selection.
## What exists now

memabra now has a complete, safe, self-improving alpha control-plane loop:

- candidate retrieval
- routing decisions
- memory / skill / tool execution
- telemetry events
- trajectory construction
- runtime validation
- artifact persistence
- replay and analytics
- artifact indexing and dataset slicing
- lightweight learning-router training
- A/B evaluation
- router weight versioning and rollback
- benchmark-gated promotion with explicit policy thresholds
- auditable training reports
- exception-safe online learning coordinator
- configurable CLI entrypoint
- persisted seen-trajectory tracking across restarts (safe for cron jobs)
- dry-run mode for training/evaluation without promotion risk
- baseline version selection for challenger evaluation
- task case index (`CaseIndex`) for episodic retrieval: maps normalized inputs to the best past trajectory ID
- `CaseIndex` integration into `MemabraApp` (build, save, load, lookup) and `MemabraRunner` (injects an episodic candidate on matching inputs)
- CLI flags `--case-index` and `--rebuild-case-index` for operator-managed episodic retrieval
- `OnlineLearningCoordinator` auto-rebuilds the case index after each cycle when `case_index_path` is provided, ensuring benchmark-generated trajectories are indexed
- `TrajectorySummarizer` generates human-readable trajectory summaries from task input, decisions, outcome, and reward
- `MemabraRunner` enriches episodic memory candidate summaries using `TrajectorySummarizer` when a `persistence_store` is available
- CLI `--status` flag prints the current system state (active router version, counts, latest report) without triggering a learning cycle
- subcommand-driven CLI (`run`, `status`, `version list`, `version rollback`) with a dedicated packaged `memabra` entrypoint
- CLI `--format text` mode with operator-friendly summaries for status checks, version listings, rollbacks, and workflow runs, including latest-report details, current-version highlighting, sectioned workflow summaries, normalized yes/no flags, and fixed-precision benchmark/promotion metrics
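The episodic-retrieval idea behind `CaseIndex` — mapping a normalized task input to the ID of the best-rewarded past trajectory — can be sketched roughly as follows. The class shape, method names, and normalization rule here are illustrative assumptions, not the actual memabra API:

```python
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    # Collapse case and whitespace so near-identical inputs share a key.
    return " ".join(text.lower().split())


@dataclass
class CaseIndex:
    # normalized input -> (trajectory_id, reward) of the best past run
    _cases: dict = field(default_factory=dict)

    def record(self, task_input: str, trajectory_id: str, reward: float) -> None:
        # Keep only the highest-reward trajectory per normalized input.
        key = normalize(task_input)
        best = self._cases.get(key)
        if best is None or reward > best[1]:
            self._cases[key] = (trajectory_id, reward)

    def lookup(self, task_input: str):
        # Return the best past trajectory ID for a matching input, or None.
        best = self._cases.get(normalize(task_input))
        return best[0] if best else None


index = CaseIndex()
index.record("Summarize the report", "traj-001", 0.4)
index.record("summarize  the report", "traj-002", 0.9)
print(index.lookup("SUMMARIZE THE REPORT"))  # traj-002
```

Under this sketch, the runner would call `lookup` during retrieval and, on a hit, inject the matching past trajectory as an extra episodic candidate.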
## Major completed capabilities

### Foundations

- project naming, architecture, roadmap, decisions, reward spec
- candidate / event / trajectory / memory schemas
- prototype package structure under `src/memabra/`
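The candidate / event / trajectory schemas mentioned above might take shapes along these lines; every field name below is a hypothetical illustration, not memabra's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    # One retrievable option presented to the router: a memory, skill, or tool.
    kind: str            # "memory" | "skill" | "tool"
    ref: str             # identifier of the underlying artifact
    score: float = 0.0   # retrieval score used by the router


@dataclass
class Event:
    # One telemetry event emitted during a run.
    name: str
    payload: dict = field(default_factory=dict)


@dataclass
class Trajectory:
    # The full record of one task: input, routing decisions, events, reward.
    task_input: str
    decisions: list = field(default_factory=list)
    events: list = field(default_factory=list)
    reward: Optional[float] = None  # filled in later by a reward engine


t = Trajectory(task_input="demo")
t.decisions.append(Candidate(kind="tool", ref="echo", score=0.8))
t.events.append(Event(name="route", payload={"chosen": "echo"}))
```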
### Runtime path

- `retrieval.py`: typed candidate retrieval
- `router.py`: heuristic router, feature-scoring router, learning router
- `execution.py`: memory, skill, and tool executors and adapters
- `runner.py`: end-to-end task -> trajectory orchestration
- `persistence.py`: trajectory and memory artifact storage
- `replay.py`: replay summaries over examples and persisted runs
- `memory_store.py`: typed memory records with verify/revoke support
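The end-to-end shape of this runtime path (retrieval -> routing -> execution -> trajectory) can be sketched as below. The functions are stand-ins for the modules listed above, not their real signatures:

```python
def retrieve(task_input: str) -> list[dict]:
    # Stand-in for retrieval.py: return typed candidates for the task.
    return [{"kind": "tool", "ref": "echo", "score": 0.9},
            {"kind": "memory", "ref": "note-1", "score": 0.4}]


def route(candidates: list[dict]) -> dict:
    # Stand-in for router.py: a heuristic router picks the top-scoring candidate.
    return max(candidates, key=lambda c: c["score"])


def execute(choice: dict, task_input: str) -> str:
    # Stand-in for execution.py: dispatch to a memory/skill/tool executor.
    if choice["kind"] == "tool":
        return f"tool:{choice['ref']}({task_input})"
    return f"{choice['kind']}:{choice['ref']}"


def run_task(task_input: str) -> dict:
    # Stand-in for runner.py: orchestrate one task into a trajectory record.
    candidates = retrieve(task_input)
    choice = route(candidates)
    result = execute(choice, task_input)
    return {"input": task_input, "choice": choice["ref"], "result": result}


trajectory = run_task("hello")
print(trajectory["result"])  # tool:echo(hello)
```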
### Adapters and evaluation

- real tool adapters:
  - `LocalFunctionToolAdapter`
  - `SubprocessToolAdapter`
  - `ToolRegistry`
- real skill loading:
  - `FileSystemSkillBackend`
- richer evaluation path:
  - `OutcomeEngine`
  - `RewardEngine`
  - `ArtifactIndex`
  - `DatasetBuilder`
  - `Evaluator`
  - `RouterVersionStore`
- Alpha Iteration 1 — online learning loop:
  - `PromotionPolicy` with benchmark-gated promotion rules
  - `BenchmarkSuite` persistence (JSON load/save + default seed)
  - `OnlineLearningCoordinator` for retrain/evaluate/promote cycles
  - exception-safe coordinator: training/evaluation failures emit auditable error reports instead of crashing
  - `TrainingReportStore.get_report()` for by-ID report lookup
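The benchmark-gated rule in `PromotionPolicy` can be sketched like this; the field names and threshold values are illustrative, not memabra's actual configuration:

```python
from dataclasses import dataclass


@dataclass
class PromotionPolicy:
    # Illustrative gate: the challenger must clear an absolute benchmark
    # score AND beat the baseline by a margin. Both thresholds are made up.
    min_benchmark_score: float = 0.7
    min_improvement: float = 0.02

    def should_promote(self, baseline_score: float, challenger_score: float) -> bool:
        return (challenger_score >= self.min_benchmark_score
                and challenger_score - baseline_score >= self.min_improvement)


policy = PromotionPolicy()
print(policy.should_promote(0.70, 0.75))  # True
print(policy.should_promote(0.70, 0.71))  # False: improvement below margin
```

Gating on both an absolute score and a relative margin is what makes the loop "safe": a challenger that merely matches the baseline, or that improves on a broken baseline, never replaces the active router.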
### Product/demo surface

- `app.py`: `MemabraApp`, demo builders, artifact index access, training hooks, `run_online_learning_cycle`
- `cli.py`: wrap-up workflow and `run_online_learning_workflow` with benchmark-gated promotion
- `cli.py`: argument parsing (`--base-dir`, `--min-new-trajectories`) and clean `python -m src.memabra.cli` execution
- `DEMO.md`: runnable walkthrough with CLI options
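A subcommand-driven CLI of the kind described (`run`, `status`, `version list`, `version rollback`, plus `--base-dir` and `--format`) could be laid out with `argparse` roughly as follows; the real memabra parser may differ in defaults and flag placement:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative shape of a subcommand-driven CLI; not the actual memabra code.
    parser = argparse.ArgumentParser(prog="memabra")
    parser.add_argument("--base-dir", default="artifacts")
    parser.add_argument("--format", choices=["json", "text"], default="json")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run an online-learning workflow")
    run.add_argument("--min-new-trajectories", type=int, default=1)
    run.add_argument("--dry-run", action="store_true")

    sub.add_parser("status", help="print a read-only system snapshot")

    version = sub.add_parser("version", help="manage router versions")
    vsub = version.add_subparsers(dest="version_command", required=True)
    vsub.add_parser("list")
    rollback = vsub.add_parser("rollback")
    rollback.add_argument("version_id")
    return parser


args = build_parser().parse_args(["--format", "text", "version", "rollback", "v3"])
print(args.command, args.version_command, args.version_id)  # version rollback v3
```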
## Current test status

Command:
`source venv/bin/activate && python -m pytest tests/memabra -q`

Latest result:
`118 passed`

All Alpha Iteration 1 source, tests, and documentation have been committed to the repository (commit `34cf507c`).
## Most important current files

### Core package

- `src/memabra/app.py`
- `src/memabra/cli.py`
- `src/memabra/router.py`
- `src/memabra/runner.py`
- `src/memabra/execution.py`
- `src/memabra/evaluator.py`
- `src/memabra/router_versioning.py`
- `src/memabra/promotion.py`
- `src/memabra/online_learning.py`
- `src/memabra/training_reports.py`
- `src/memabra/benchmarks.py`
- `src/memabra/case_index.py`
### Tests

- `tests/memabra/test_app.py`
- `tests/memabra/test_cli_workflow.py`
- `tests/memabra/test_package_exports.py`
- `tests/memabra/test_promotion.py`
- `tests/memabra/test_online_learning.py`
- `tests/memabra/test_training_reports.py`
- `tests/memabra/test_benchmarks.py`
- `tests/memabra/test_router_versioning.py`
- `tests/memabra/test_evaluator.py`
- `tests/memabra/test_router_protocol.py`
- `tests/memabra/test_execution_persistence.py`
## Wrap-up status

The project is now in a safe self-improving alpha state.
It can:

- run realistic demo tasks
- persist trajectories
- replay and inspect results
- train a lightweight router from saved artifacts
- compare baseline vs challenger routers
- apply a promotion policy with explicit thresholds
- save and reload router versions with metadata
- emit auditable training reports
- run an online-learning cycle from the CLI
- leave the active router unchanged when the challenger fails
- survive training/evaluation failures gracefully and emit error reports
- accept CLI overrides for artifact directory and trajectory thresholds
- persist seen-trajectory state across restarts so cron jobs don't retrain on the same data
- persist seen trajectories to `<base-dir>/seen-trajectories.json` by default from the CLI `main()`
- run in dry-run mode to evaluate a challenger without promoting it
- run in baseline-version mode to compare a challenger against a specific saved version instead of the currently active router
- index successful task cases by normalized input for episodic retrieval (`CaseIndex`)
- build/save/load a case index from `MemabraApp`
- inject episodic memory candidates during runner retrieval when a similar past task exists
- use the `--case-index` and `--rebuild-case-index` CLI flags to manage episodic retrieval
- refresh the case index automatically after training/evaluation in online-learning cycles when a case-index path is configured
- include rich human-readable summaries in episodic memory candidates when the past trajectory is available via `persistence_store`
- provide a quick read-only snapshot of the active router, versions, trajectories, and reports via the CLI `--status` flag
- enable operator-safe router version management, without touching code, via the CLI `--rollback` and `--list-versions` flags
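The cron-safety property above — never retraining on already-seen trajectories across restarts — comes down to persisting a set of seen IDs. A minimal sketch, assuming a JSON state file as described (class and method names are illustrative):

```python
import json
import tempfile
from pathlib import Path


class SeenTrajectories:
    # Persist the set of already-processed trajectory IDs across restarts,
    # so a cron-driven learning cycle never retrains on the same data.
    def __init__(self, path: Path):
        self.path = path
        self._seen = set(json.loads(path.read_text())) if path.exists() else set()

    def unseen(self, trajectory_ids):
        # Filter out IDs that were processed in any previous run.
        return [tid for tid in trajectory_ids if tid not in self._seen]

    def mark_seen(self, trajectory_ids):
        # Record the new IDs and flush the full set to disk.
        self._seen.update(trajectory_ids)
        self.path.write_text(json.dumps(sorted(self._seen)))


state_path = Path(tempfile.mkdtemp()) / "seen-trajectories.json"
store = SeenTrajectories(state_path)
print(store.unseen(["t1", "t2"]))         # ['t1', 't2']
store.mark_seen(["t1", "t2"])
store2 = SeenTrajectories(state_path)      # simulate a restart
print(store2.unseen(["t1", "t2", "t3"]))   # ['t3']
```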
## Next sensible frontier

1. tighter integration with real Hermes trajectories
2. multi-turn conversation state and working-memory updates
3. richer real-world tool ecosystem integration (MCP, web, git, files)
4. stronger storage/index backend beyond plain JSON files
## One-line summary

memabra is now a runnable, test-covered, safe self-improving alpha for agent memory/action routing, with online learning, benchmark-gated promotion, and auditable reports.