Initial standalone memabra release
docs/DEMO.md
# Demo

memabra now has a polished wrap-up workflow in addition to the lower-level demo app.

## Quick run

If you installed the repo in editable mode, prefer the dedicated CLI command:

```bash
source venv/bin/activate
memabra
```

The legacy developer entrypoint still works too:

```bash
source venv/bin/activate
python -m src.memabra.cli
```

This runs the online-learning loop: it seeds demo tasks, trains a challenger router, evaluates it against a benchmark suite, promotes it if thresholds are met, and prints a JSON report.
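
The "promotes it if thresholds are met" step can be pictured with a small sketch. This is a hypothetical stand-in, not memabra's actual promotion policy: the `success_rate` field name and the threshold values are assumptions for illustration.

```python
# Hypothetical sketch of a benchmark-gated promotion check. The real
# promotion policy, metric names, and thresholds live inside memabra;
# everything here is an illustrative assumption.

def should_promote(baseline: dict, challenger: dict,
                   min_delta: float = 0.0,
                   min_success_rate: float = 0.8) -> bool:
    """Accept the challenger router only if it beats the baseline on the
    benchmark suite and clears an absolute success-rate floor."""
    delta = challenger["success_rate"] - baseline["success_rate"]
    return delta > min_delta and challenger["success_rate"] >= min_success_rate

print(should_promote({"success_rate": 0.70}, {"success_rate": 0.85}))  # True
print(should_promote({"success_rate": 0.90}, {"success_rate": 0.85}))  # False
```

The point of the two-condition gate is that a challenger that merely edges out a weak baseline still cannot be promoted below the absolute floor.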

You can override the default artifact directory and minimum trajectory threshold:

```bash
source venv/bin/activate
memabra run --base-dir /custom/artifacts --min-new-trajectories 5
```

You can also enable episodic retrieval by rebuilding the case index from saved trajectories:

```bash
source venv/bin/activate
memabra run --rebuild-case-index
```
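
To make the rebuild step concrete, here is a minimal sketch of turning saved trajectory files into a retrievable case index. The `outcome.status` and `reward.total` fields mirror what the demo app prints later in this doc, but the top-level `prompt` field and the index shape are assumptions; memabra's actual case-index format is internal.

```python
# Illustrative sketch only: rebuild an episodic case index from saved
# trajectory JSON files. Paths and the `prompt` field are assumptions.
import json
from pathlib import Path

def rebuild_case_index(trajectory_dir: str) -> dict:
    """Map each trajectory's prompt to its outcome and reward so that
    similar future tasks can retrieve past episodes."""
    index = {}
    for path in sorted(Path(trajectory_dir).glob("*.json")):
        record = json.loads(path.read_text())
        index[record["prompt"]] = {
            "outcome": record["outcome"]["status"],
            "reward": record["reward"]["total"],
        }
    return index
```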

You can check system status, list versions, or roll back without running a learning cycle:

```bash
source venv/bin/activate
memabra status
memabra version list
memabra version rollback 20260414-123456
```

If you want operator-friendly output instead of raw JSON, use `--format text`:

```bash
source venv/bin/activate
memabra status --format text
memabra version list --format text
memabra version rollback 20260414-123456 --format text
memabra run --dry-run --format text
```

The text formatter is aimed at operators: status output includes the latest report's timing and outcome, version listings highlight the currently active router version, and workflow output is grouped into summary/baseline/challenger/deltas/decision sections with normalized yes/no values and fixed-precision metrics.
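
A rough sketch of that value normalization (the real formatter does more, such as section grouping and marking the active version; the three-decimal precision here is an assumption):

```python
# Sketch of the yes/no and fixed-precision normalization described above.
# The precision (3 decimals) is an assumed example, not memabra's setting.

def fmt_value(value) -> str:
    """Render booleans as yes/no and floats at fixed precision."""
    if isinstance(value, bool):  # check bool before numeric types
        return "yes" if value else "no"
    if isinstance(value, float):
        return f"{value:.3f}"
    return str(value)

print(fmt_value(True))       # yes
print(fmt_value(0.8571428))  # 0.857
```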

You can also call it programmatically:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.cli import run_online_learning_workflow
result = run_online_learning_workflow()
print(result)
PY
```

The online-learning workflow will:
1. build a demo app
2. seed example tasks (if no trajectories exist yet)
3. run one online-learning cycle
4. train a challenger router
5. evaluate it against the baseline on a fixed benchmark suite
6. promote it only if the promotion policy accepts
7. persist a training report under `training-reports/`
8. print a JSON report

## Python API

```python
from src.memabra.cli import run_wrapup_workflow, run_online_learning_workflow

# Legacy wrap-up demo
result = run_wrapup_workflow()
print(result)

# Safe online-learning loop with benchmark-gated promotion
result = run_online_learning_workflow()
print(result)
```

## Lower-level demo app

You can still drive the app manually:

```bash
source venv/bin/activate
python - <<'PY'
from src.memabra.app import build_demo_app

app = build_demo_app()

for prompt in [
    'Use my telegram preference for this answer.',
    'Check the current system status.',
    'Deploy this service with the usual workflow.',
]:
    trajectory = app.run_task(prompt, channel='telegram', user_id='oza')
    print(prompt)
    print(trajectory['decisions'][0]['decision_type'], trajectory['outcome']['status'], trajectory['reward']['total'])
    print([event['event_type'] for event in trajectory['events']])
    print('---')

print(app.replay_summary())
PY
```

## Output locations

By default the workflows write to:
- `docs/projects/memabra/demo-artifacts/trajectories/`
- `docs/projects/memabra/demo-artifacts/memories/`
- `docs/projects/memabra/demo-artifacts/router-versions/`
- `docs/projects/memabra/demo-artifacts/training-reports/`
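
Given that layout, the most recent training report can be picked up with a few lines. `latest_training_report` is a hypothetical helper, not part of memabra; it only assumes reports are JSON files whose sorted names order by time, as timestamped filenames like `20260414-123456.json` do.

```python
import json
from pathlib import Path

# Hypothetical convenience helper; assumes training reports are JSON files
# with timestamp-ordered names under the training-reports/ directory above.
def latest_training_report(base_dir: str) -> dict:
    reports = sorted(Path(base_dir, "training-reports").glob("*.json"))
    if not reports:
        raise FileNotFoundError("no training reports yet; run the workflow first")
    return json.loads(reports[-1].read_text())
```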

## What this proves

The alpha demonstrates the whole loop:
- retrieval
- routing
- execution
- persistence
- replay
- training
- evaluation
- router versioning
- benchmark-gated promotion
- auditable training reports

## Limits

This is still an alpha:
- learning is lightweight, not a deep model
- storage is JSON-file based
- promotion policy thresholds are manually configured
- tool/skill integration is still narrower than a production agent platform

But it is now a safe, self-improving alpha, not just a pile of modules.