`memo serve` exposes the same action surface as the CLI over HTTP+JSON,
so hot-path callers skip the ~10-20ms per-call subprocess overhead while
preserving the exact request/response shape.
Endpoints (all JSON):
GET /v1/health — liveness probe
GET /v1/backends — list backend descriptors
GET /v1/backends/<type> — full schema + actions
GET /v1/instances — list configured instances
POST /v1/run/<type>/<name>/<action> — execute action (body = params)
Auth via --auth-token (or MEMO_AUTH_TOKEN env). Constant-time bearer
comparison. Default bind 127.0.0.1:8765 when no token; explicit --addr
required to expose externally.
Behind the scenes:
- one cached session per (type, name); reused across requests
- filestore + pgstore are already concurrent-safe so caching is sound
- graceful shutdown on SIGINT/SIGTERM drains in-flight requests then
closes every cached session (so the postgres pool releases cleanly)
23 HTTP-level tests covering health, introspection, run dispatch on both
backends, every error mapping (unknown backend/instance/action, bad
params, body must be JSON object), auth (no header / wrong token /
correct), session cache reuse, concurrent appends, closeAll release,
Content-Type. 164 tests total, 0 failures.
examples/daemon/daemon.sh: self-contained walkthrough — starts daemon,
exercises every endpoint via curl + jq, cleans up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
memo
Agent-memory CLI. Conversation history, scratchpad, and other thread-scoped shapes that an LLM workflow needs to remember across runs. Same shape as pantograf (pgf): self-describing actions, hand-written schemas, encrypted credentials, per-instance state.
memo is intentionally narrow: storage-backed memory shapes only. Semantic
search over long-lived memory is a separate concern (use semantic-layer).
External-system connectors are a separate concern (use pgf).
Concepts
- Backend: the storage type. sqlite, postgres, jsonl, ... Same role as a pantograf connector. Each backend implements `Open(cred) → Session` plus a set of `Action`s.
- Instance: one configured backend + credentials, named. `sqlite/default`, `postgres/team`. Persisted at `~/.config/memo/instances/<type>/<name>.yaml`. Secret fields are field-level sealed with NaCl secretbox (`~/.config/memo/master.key`).
- Action: a verb the agent calls. The action set defines the shapes of memory: conversation (`append-turn` / `read-thread`), scratchpad (`scratchpad-set` / `scratchpad-get`), and whatever else makes sense. The shape logic lives in the action; the storage is swappable.
- State store: per-instance `[]byte` k/v for backend bookkeeping (last-id cursors, etc.) at `~/.local/state/memo/state/<type>/<name>/`.
CLI
memo backends # list backend types compiled in
memo actions <type> # list actions a backend exposes
memo connect <type> <name> # wizard to add an instance
memo instances # list configured instances
memo rm <type>/<name> # remove an instance
memo run <type>/<name> <action> [-p k=v ...] # execute an action
memo serve [--addr 127.0.0.1:8765] [--auth-token TOK] # HTTP daemon mode
history backend actions
| Action | Purpose |
|---|---|
| `append` | Add one record (chat / tool call / tool result / summary / fact). |
| `read` | Return records, with role/not_role/since/until/offset/limit/tail filters. |
| `sessions` | List sessions in this instance. |
| `delete` | Remove a whole session. |
| `purge` | Atomically remove records matching a filter (role / not_role / before / after / older_than). Rejected without any filter; use `delete` for that. |
kv backend actions
| Action | Purpose |
|---|---|
| `set` | Store a value at a key. Overwrites. |
| `get` | Fetch the value at a key. Errors if unset. |
| `delete` | Remove a single key. |
| `list` | Enumerate keys; optional prefix (with trailing / to walk a namespace, or bare for string-prefix match). |
Keys are /-separated paths. Per-segment chars: [A-Za-z0-9._-]. The
on-disk layout mirrors the keys exactly (one file per key, directories
per namespace) — find $dir -type f gives you a readable inventory.
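The key rules above can be checked client-side before a call is made. The following is a minimal Python sketch, not memo's own validator: the function name `is_valid_key` is hypothetical, and rejecting empty segments and the names `.` and `..` is an assumption made here because keys map onto file paths.

```python
import re

# Per-segment character set quoted from the text above.
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_key(key: str) -> bool:
    """Return True if every /-separated segment uses only the allowed characters.

    Assumption: empty segments (leading, trailing, or doubled slashes) and the
    special names "." and ".." are rejected, since keys map to file paths.
    """
    segments = key.split("/")
    return all(
        _SEGMENT.fullmatch(seg) is not None and seg not in {".", ".."}
        for seg in segments
    )
```

Under these assumptions `chat-2026-05-17/draft-reply` passes, while `a//b` and `has space` do not.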
Scratchpad is a convention, not a backend. Create an instance you treat as ephemeral and prefix each key with the session id:
memo connect --input '{"dir":"~/.memo-scratch"}' kv scratch
memo run kv/scratch set -p key=chat-2026-05-17/draft-reply -p value="..."
memo run kv/scratch set -p key=chat-2026-05-17/checkout-step -p value=2
# fetch every scratch note for one session
memo run kv/scratch list -p prefix=chat-2026-05-17/
# clean up at end of conversation — iterate and delete
memo run kv/scratch list -p prefix=chat-2026-05-17/ \
| jq -r '.[].key' \
| xargs -I{} memo run kv/scratch delete -p key={}
When to reach for kv vs history:
- history — anything ordered, append-only, that you'll want to read as a sequence: chat turns, tool calls, summaries, extracted facts.
- kv — anything you SET (overwrite) and GET: current step in a flow, draft text being edited, agent's latest decision, last-seen cursor.
-p repeats; comma-separated lists work for string_list fields. A
JSON object can be supplied with --input '{...}' or --input @file.json.
Environment
| Var | Default | Purpose |
|---|---|---|
| `MEMO_STORE_DIR` | `~/.config/memo/instances` | Credential YAMLs root |
| `MEMO_STATE_DIR` | `~/.local/state/memo/state` | Per-instance state root |
| `MEMO_KEY_DIR` | `~/.config/memo` | master.key location |
| `MEMO_MASTER_KEY` | (unset) | Base64 32-byte master key override (overrides on-disk key entirely) |
| `MEMO_ALLOWED_PATHS` | (unset) | Colon-separated allow-list for IsPath schema fields. When set, every IsPath:true field must resolve under one of these roots, otherwise the command is rejected before any storage I/O. See pantograf's SECURITY.md for the threat model; memo applies the same gate. |
| `MEMO_AUTH_TOKEN` | (unset) | Bearer token required for memo serve if --auth-token flag isn't set. Empty = no auth (loopback bind only by default). |
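To make the `MEMO_ALLOWED_PATHS` gate concrete, here is a small Python sketch of an allow-list check. It is an illustration under stated assumptions, not memo's implementation: the function names are hypothetical, and resolving `~` and symlinks before comparing is an assumption about what "resolve under one of these roots" means.

```python
import os


def parse_allowed_paths(value: str) -> list[str]:
    """Split a colon-separated MEMO_ALLOWED_PATHS value into resolved roots."""
    return [os.path.realpath(os.path.expanduser(p)) for p in value.split(":") if p]


def is_under_allowed_root(path: str, roots: list[str]) -> bool:
    """Return True if path resolves to a root or to something beneath one.

    An empty root list models the unset variable: nothing is restricted.
    """
    if not roots:
        return True
    resolved = os.path.realpath(os.path.expanduser(path))
    for root in roots:
        try:
            # commonpath compares whole path components, so /data2 does not
            # count as being under /data.
            if os.path.commonpath([resolved, root]) == root:
                return True
        except ValueError:
            # Raised on Windows when the paths sit on different drives.
            continue
    return False
```

Comparing whole components after resolution is what keeps `..` segments and sibling directories with a shared name prefix from slipping through.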
Status
Two backends compiled in. Backend names describe the FUNCTION; storage
technology is configured per instance via the credential's driver
field, so the same agent scripts work across local files and a shared
Postgres without edits.
| Backend | Shape | Drivers | Use when |
|---|---|---|---|
| `history` | append-only chat records (chat / tool calls / summaries / facts) | file (JSONL on disk), postgres (shared DB) | Conversation history, summary chains, fact extraction, tool-call linkage. Pick file for single-host dev / cron, postgres for multi-host or multi-agent. |
| `kv` | hierarchical key→string (scratchpad, agent state, config) | file | Per-turn scratch notes, durable agent state, anything not append-only. A postgres driver is the natural next addition. |
Actions on history: append, read, sessions, delete, purge.
read supports windowing (offset+limit+tail), date filtering
(since+until), and role filtering (role / not_role) — covers
summarization windows and cursor-less last-summary lookups.
Actions on kv: set, get, delete, list.
The wizard supports conditional fields (FieldSpec.ShowWhen), so picking
driver=file only prompts for dir and driver=postgres only prompts
for dsn. When you pick postgres, memo runs the embedded migrations
idempotently and seals the DSN via the secrets vault. See examples/
for shell-composed agent patterns — every example works against either
driver by changing the credential, not the script.
Shape designed for shell composition with pgf:
memo connect --input '{"dir":"~/chat"}' history main
memo run history/main append -p session=s1 -p role=user -p content="hi"
# xargs strips the double quotes from JSON input, so capture the array instead
MESSAGES=$(memo run history/main read -p session=s1 | jq -c 'map({role,content})')
pgf run llm/local chat-completion -p messages="$MESSAGES"
The read output is a bare JSON array of records with role+content
at the top level — drops straight into any OpenAI-compatible LLM client
via jq 'map({role,content})'.
Storage is behind a store.Store interface in
backends/history/store. Two drivers live in tree today:
backends/history/filestore (JSONL files) and
backends/history/pgstore (Postgres + go:embed migrations).
Backend.Open dispatches on the credential's driver field.
Daemon mode (memo serve)
For hot-path callers — many sessions on one box, batch summarizers, agent
loops issuing hundreds of storage calls per turn — the per-call subprocess
overhead (~10-20ms) adds up. memo serve exposes the same action surface
over HTTP+JSON so callers skip the fork/exec cost while preserving the
exact same request/response shape as the CLI.
memo serve # 127.0.0.1:8765, no auth
memo serve --addr :8765 --auth-token TOK # all interfaces, token
MEMO_AUTH_TOKEN=TOK memo serve --addr :8765 # env-var form
Endpoints (all JSON):
| Method | Path | Purpose |
|---|---|---|
| GET | `/v1/health` | Liveness probe ({status, backends}). |
| GET | `/v1/backends` | List backend descriptors. |
| GET | `/v1/backends/<type>` | Backend descriptor + credential schema + actions. |
| GET | `/v1/instances` | List configured instances. |
| POST | `/v1/run/<type>/<name>/<action>` | Execute action; request body is the JSON params object, response is the action's return value. |
Auth: when --auth-token (or MEMO_AUTH_TOKEN) is set, every endpoint
requires Authorization: Bearer <token>. Comparison is constant-time.
When the token is unset, the default bind is loopback only; to expose
the daemon outside the host you must set both --auth-token and an
explicit --addr like 0.0.0.0:8765 or a non-loopback IP.
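For readers unfamiliar with constant-time comparison, the check described above could look roughly like this in Python. This is a sketch of the technique only: the function name `bearer_ok` is hypothetical, and the daemon's real handler is not shown in this document.

```python
import hmac


def bearer_ok(authorization_header, expected_token: str) -> bool:
    """Check an Authorization header value against the configured token.

    hmac.compare_digest takes time that does not depend on where the two
    inputs first differ, which avoids leaking the token through timing.
    """
    if not expected_token:
        # No token configured: auth is disabled (loopback bind by default).
        return True
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

A plain `==` on strings can return as soon as the first differing character is found, which is the behavior the constant-time comparison is meant to avoid.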
Behind the scenes the daemon caches one open session per instance and
reuses it across requests, so the postgres pool is created once and
amortized. Graceful shutdown on SIGINT/SIGTERM drains in-flight
requests (default 30s budget) and closes every cached session.
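The caching behavior described above can be pictured with a short Python sketch. It is an illustration, not the daemon's code: `SessionCache` and the `opener(type, name)` callable are hypothetical names, and the only thing assumed about a session is that it has a `close()` method.

```python
import threading


class SessionCache:
    """Cache one open session per (backend type, instance name)."""

    def __init__(self, opener):
        # opener(backend_type, name) is a hypothetical callable that opens a
        # session object exposing close().
        self._opener = opener
        self._lock = threading.Lock()
        self._sessions = {}

    def get(self, backend_type, name):
        """Return the cached session for this instance, opening it on first use."""
        key = (backend_type, name)
        with self._lock:
            session = self._sessions.get(key)
            if session is None:
                session = self._opener(backend_type, name)
                self._sessions[key] = session
            return session

    def close_all(self):
        """Close every cached session, e.g. during graceful shutdown."""
        with self._lock:
            sessions = list(self._sessions.values())
            self._sessions.clear()
        for session in sessions:
            session.close()
```

Holding the lock while opening keeps two concurrent first requests from opening the same instance twice, at the cost of serializing opens; that trade-off is a choice made for this sketch.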
From any language, the subprocess call becomes an HTTP POST with a JSON body and a JSON response. Shell example:
curl -H "Authorization: Bearer $TOK" -H "Content-Type: application/json" \
-d '{"session":"demo","role":"user","content":"hi"}' \
http://localhost:8765/v1/run/history/main/append
Full walkthrough (start + auth + every endpoint, all in one script):
examples/daemon/daemon.sh.
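The same call as the curl example can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_run_request` and `run_action` are hypothetical, the URL layout and headers are taken from the endpoint table and auth description above, and the shape of error responses is not covered because this document does not specify it.

```python
import json
import urllib.request


def build_run_request(base_url, backend_type, name, action, params, token=None):
    """Build the POST request for /v1/run/<type>/<name>/<action>."""
    url = f"{base_url.rstrip('/')}/v1/run/{backend_type}/{name}/{action}"
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_action(base_url, backend_type, name, action, params, token=None, timeout=10.0):
    """Send the request and decode the JSON response (the action's return value)."""
    req = build_run_request(base_url, backend_type, name, action, params, token)
    # Non-2xx responses raise urllib.error.HTTPError.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a daemon running locally, `run_action("http://localhost:8765", "history", "main", "append", {"session": "demo", "role": "user", "content": "hi"}, token="TOK")` would mirror the shell example.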
Conversation shapes
memo is intentionally shape-agnostic at the storage layer (every record has the same fields), but the combinations of fields encode four distinct patterns. All four live in the same JSONL file; the action set is the same.
1. Plain chat — user / assistant / system
{"ts":"...","role":"system","content":"You are a concise assistant."}
{"ts":"...","role":"user","content":"What's the capital of France?"}
{"ts":"...","role":"assistant","content":"Paris."}
Pipe into any OpenAI-compatible LLM client by stripping memo-only fields:
memo run history/main read -p session=s | jq 'map({role,content})'
2. Tool calls — assistant emits, tool replies, linked by id
memo carries the OpenAI/Anthropic tool-use shape natively. An assistant
turn stores tool_calls: [{id, name, arguments}]; a tool-result turn uses
role=tool, tool_call_id (matching the call's id), and name (the tool
that ran). The id is the join key — same pattern simple-agent uses.
{"role":"user","content":"What's the weather in Berlin?"}
{"role":"assistant","tool_calls":[{"id":"call_42","name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}]}
{"role":"tool","tool_call_id":"call_42","name":"get_weather","content":"{\"temp_c\":11,\"sky\":\"cloudy\"}"}
{"role":"assistant","content":"Berlin is 11°C and cloudy."}
Append a tool call:
memo run history/main append -p session=s -p role=assistant \
-p tool_calls='[{"id":"call_42","name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}]'
Append the matching tool result:
memo run history/main append -p session=s -p role=tool \
-p tool_call_id=call_42 \
-p name=get_weather \
-p content='{"temp_c":11,"sky":"cloudy"}'
Join calls to their results with jq:
memo run history/main read -p session=s | jq '
(map(select(.role=="tool")) | map({(.tool_call_id): .content}) | add) as $results
| map(select(.tool_calls) | .tool_calls[] | {id, name, args: .arguments, result: $results[.id]})
'
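For callers that read history over HTTP rather than through jq, the same join can be done in Python. This is a sketch that assumes `records` is the decoded array returned by `read`, with the field names shown in the examples above; `join_tool_calls` is a hypothetical helper name.

```python
def join_tool_calls(records):
    """Pair every tool call with the content of its matching tool result."""
    # Index tool results by the id of the call they answer.
    results = {
        r["tool_call_id"]: r.get("content")
        for r in records
        if r.get("role") == "tool" and "tool_call_id" in r
    }
    joined = []
    for record in records:
        for call in record.get("tool_calls") or []:
            joined.append({
                "id": call["id"],
                "name": call["name"],
                "args": call["arguments"],
                # None when the call has no recorded result yet.
                "result": results.get(call["id"]),
            })
    return joined
```

Calls without a stored result come back with `result` set to `None`, which makes pending tool calls easy to spot.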
When feeding history back to the LLM in a follow-up turn, re-nest the
flat tool_calls shape into the OpenAI {type, function:{name, arguments}}
form (and keep role=tool records with their tool_call_id):
memo run history/main read -p session=s -p 'role!=summary' | jq '
map(
if .tool_calls then
{role, content: (.content // ""),
tool_calls: [.tool_calls[] | {id, type:"function", function:{name, arguments}}]}
elif .role == "tool" then
{role, content, tool_call_id}
else
{role, content}
end
)'
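The re-nesting step translates directly to Python for callers that build the request in-process. The sketch below mirrors the jq filter above, including the summary exclusion; `to_openai_messages` is a hypothetical name, and the input is assumed to be the decoded `read` output.

```python
def to_openai_messages(records):
    """Convert memo history records into OpenAI-style chat messages."""
    messages = []
    for r in records:
        if r.get("role") == "summary":
            # Mirrors the role!=summary filter in the shell version.
            continue
        if r.get("tool_calls"):
            messages.append({
                "role": r["role"],
                "content": r.get("content") or "",
                "tool_calls": [
                    {
                        "id": c["id"],
                        "type": "function",
                        "function": {"name": c["name"], "arguments": c["arguments"]},
                    }
                    for c in r["tool_calls"]
                ],
            })
        elif r.get("role") == "tool":
            messages.append({
                "role": "tool",
                "content": r.get("content"),
                "tool_call_id": r.get("tool_call_id"),
            })
        else:
            messages.append({"role": r.get("role"), "content": r.get("content")})
    return messages
```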
The full assistant ↔ tool ↔ assistant loop is examples/tool-loop/.
3. Summaries — derived record with boundary in meta
For long conversations: every N turns, an LLM is asked to summarize the
batch. The summary lands as a record with role=summary and a
meta.from / meta.to boundary that says which turn-indices it covers.
No separate state file — the summary IS the cursor.
{"role":"user","content":"... 50 turns ..."}
{"role":"summary","content":"<summary text>","meta":{"from":"0","to":"50"}}
{"role":"user","content":"... 50 more turns ..."}
{"role":"summary","content":"<summary text>","meta":{"from":"50","to":"100"}}
Find where the last summary ended (cursor lookup, no state file):
LAST=$(memo run history/main read -p session=s \
-p role=summary -p limit=1 -p tail=true \
| jq -r '.[0].meta.to // "0"')
Read only the un-summarized turns:
memo run history/main read -p session=s \
-p 'role!=summary' \
-p offset="$LAST"
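The cursor logic of the two commands above can be sketched in Python over an already-fetched record list. Two assumptions are made here and are not confirmed by this document: records arrive in append order, and the offset counts records after the summary filter has been applied. The function names are hypothetical.

```python
def last_summary_end(records):
    """Return meta.to of the most recent summary record, or 0 if there is none."""
    for record in reversed(records):
        if record.get("role") == "summary":
            meta = record.get("meta") or {}
            # meta values are stored as strings in the examples above.
            return int(meta.get("to", "0"))
    return 0


def unsummarized_turns(records):
    """Return the non-summary records that follow the last summarized index."""
    offset = last_summary_end(records)
    turns = [r for r in records if r.get("role") != "summary"]
    return turns[offset:]
```

Because the boundary lives in the summary record itself, re-running the function after a new summary is appended needs no other bookkeeping.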
Inject the latest summary as system context in a follow-up call:
SUMMARY=$(memo run history/main read -p session=s \
-p role=summary -p limit=1 -p tail=true \
| jq -r '.[0].content // ""')
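# "\n" is expanded inside the jq string literal; inside shell double quotes it
# would stay a literal backslash-n. xargs would also strip the JSON quotes.
MESSAGES=$(jq -nc \
  --arg summary "$SUMMARY" \
  --arg q "$QUESTION" \
  '[{role:"system",content:("Summary of earlier conversation:\n" + $summary)},{role:"user",content:$q}]')
pgf run llm/proxy chat-completion -p model=qwen36 -p messages="$MESSAGES"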
Full demo: examples/recall/.
4. Facts — extracted preferences / corrections / decisions
Same shape as summaries, different role and meta keys. Each fact is one
record; meta carries the type (pref, correction, decision,
context) and the source-window boundary so re-runs are cursor-less.
{"role":"fact","content":"prefers tabs over spaces","meta":{"type":"pref","from":"0","to":"50"}}
{"role":"fact","content":"running Go 1.25, not 1.21","meta":{"type":"correction","from":"0","to":"50"}}
{"role":"fact","content":"docker compose v2 only","meta":{"type":"pref","from":"50","to":"100"}}
List all facts:
memo run history/main read -p session=s -p role=fact
Inject them as system context for the next call:
FACTS=$(memo run history/main read -p session=s -p role=fact \
| jq -r 'map("- [\(.meta.type)] \(.content)") | join("\n")')
Substring search (cheap recall, no index):
memo run history/main read -p session=s -p role=fact \
| jq 'map(select(.content | test("docker"; "i")))'
Ranked / semantic recall has two levels:
- For per-session, ephemeral semantic search (no index), embed query + candidates on the fly via pgf llm embed, cosine-rank in jq. Right when you have ≲ a few hundred candidates per session and don't want persistent state. See examples/embed-recall/.
- For long-lived, cross-session semantic memory, push records into wissen (hybrid BM25 + vectors with a persistent index). memo stores; wissen retrieves.
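The per-session ranking step is plain cosine similarity over embedding vectors. Below is a minimal Python sketch of that ranking alone; the vectors are assumed to have been obtained already (the output format of `pgf llm embed` is not shown in this document), and `cosine` and `rank_top_k` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_top_k(query_vec, candidates, k=3):
    """Rank (text, vector) pairs by similarity to query_vec, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this is reasonable at the few-hundred-candidate scale mentioned above; beyond that a persistent index is the better fit.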
semantic-layer is a different concern — NL → domain-term routing
for SQL/agent dispatch, not embedding retrieval. Don't reach for it for
chat-record search.
Full demo: examples/extract-facts/ (writes) + examples/embed-recall/
(reads).
Examples
End-to-end shell scripts in examples/ showing memo + pgf composition:
| Script | What it does |
|---|---|
| `examples/chat/chat.sh` | Interactive REPL. Reads a line, persists, asks the LLM with full history, persists the reply. Resume by re-running with the same $SESSION. |
| `examples/summarize/summarize.sh` | Folds un-summarized turns into a rolling summary. Cursor-less: the summary record itself carries its boundary in meta.from / meta.to. Idempotent; safe to cron. |
| `examples/recall/recall.sh` | Proves the loop works both ways: the LLM recalls a seeded fact (a) from in-context history, then (b) from a summary-only context with no raw turns sent. |
| `examples/tool-loop/tool-loop.sh` | Autonomous tool-use agent. LLM picks tools (e.g. hn_top, now), script dispatches, results linked via tool_call_id, loop until final answer. All state in memo. |
| `examples/extract-facts/extract-facts.sh` | Pulls preferences / corrections / decisions out of a session and appends them as role=fact records (with meta.type). Same cursor-less pattern as summarize.sh; idempotent. |
| `examples/embed-recall/embed-recall.sh` | Semantic search over a session's facts/summaries. Embeds query + candidates via pgf llm embed, cosine in jq, returns ranked top-K. No persistent index; for that, push to wissen. |
| `examples/retention/retention.sh` | Tiered cleanup via memo purge: tools after 24h, raw turns after 30d, summaries after 365d, facts never. Cursor-less and idempotent; safe to cron. |
All seven rely only on memo + pgf + jq. They expect:
memo connect history <inst>
pgf connect llm <inst>
pgf connect rss <inst> # tool-loop only