memo — agent-memory CLI. history (file/postgres) + kv backends. Short-term and durable agent state.
Hannes Lehmann f5865b3408 memo: add HTTP daemon mode (memo serve)
`memo serve` exposes the same action surface as the CLI over HTTP+JSON,
so hot-path callers skip the ~10-20ms per-call subprocess overhead while
preserving the exact request/response shape.

Endpoints (all JSON):
  GET  /v1/health                       — liveness probe
  GET  /v1/backends                     — list backend descriptors
  GET  /v1/backends/<type>              — full schema + actions
  GET  /v1/instances                    — list configured instances
  POST /v1/run/<type>/<name>/<action>   — execute action (body = params)

Auth via --auth-token (or MEMO_AUTH_TOKEN env). Constant-time bearer
comparison. Default bind 127.0.0.1:8765 when no token; explicit --addr
required to expose externally.

Behind the scenes:
  - one cached session per (type, name); reused across requests
  - filestore + pgstore are already concurrent-safe so caching is sound
  - graceful shutdown on SIGINT/SIGTERM drains in-flight requests then
    closes every cached session (so the postgres pool releases cleanly)

23 HTTP-level tests covering health, introspection, run dispatch on both
backends, every error mapping (unknown backend/instance/action, bad
params, body must be JSON object), auth (no header / wrong token /
correct), session cache reuse, concurrent appends, closeAll release,
Content-Type. 164 tests total, 0 failures.

examples/daemon/daemon.sh: self-contained walkthrough — starts daemon,
exercises every endpoint via curl + jq, cleans up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:37:55 +02:00
backend memo: collapse history-pg into history with driver-discriminated credential 2026-05-17 23:16:37 +02:00
backends memo: collapse history-pg into history with driver-discriminated credential 2026-05-17 23:16:37 +02:00
cmd/memo memo: add HTTP daemon mode (memo serve) 2026-05-26 21:37:55 +02:00
examples memo: add HTTP daemon mode (memo serve) 2026-05-26 21:37:55 +02:00
secrets memo: bootstrap CLI skeleton (no backends) 2026-05-17 13:00:03 +02:00
state memo: bootstrap CLI skeleton (no backends) 2026-05-17 13:00:03 +02:00
storage memo: bootstrap CLI skeleton (no backends) 2026-05-17 13:00:03 +02:00
go.mod memo: add postgres (history-pg) backend with go:embed migrations + docker tests 2026-05-17 22:14:46 +02:00
go.sum memo: add postgres (history-pg) backend with go:embed migrations + docker tests 2026-05-17 22:14:46 +02:00
README.md memo: add HTTP daemon mode (memo serve) 2026-05-26 21:37:55 +02:00

memo

Agent-memory CLI. Conversation history, scratchpad, and other thread-scoped shapes that an LLM workflow needs to remember across runs. Same shape as pantograf (pgf): self-describing actions, hand-written schemas, encrypted credentials, per-instance state.

memo is intentionally narrow: storage-backed memory shapes only. Semantic search over long-lived memory is a separate concern (use wissen). External-system connectors are a separate concern (use pgf).

Concepts

  • Backend: the memory function. history, kv. Same role as a pantograf connector. Each backend implements Open(cred) → Session plus a set of Actions. The storage technology (file, postgres) is chosen per instance through the credential's driver field.
  • Instance: one configured backend + credentials, named. history/main, kv/scratch. Persisted at ~/.config/memo/instances/<type>/<name>.yaml. Secret fields are field-level sealed with NaCl secretbox (~/.config/memo/master.key).
  • Action: a verb the agent calls. The action set defines the shapes of memory: history (append / read / sessions / delete / purge), kv (set / get / delete / list). The shape logic lives in the action; the storage is swappable.
  • State store: per-instance []byte k/v for backend bookkeeping (last-id cursors, etc.) at ~/.local/state/memo/state/<type>/<name>/.

CLI

memo backends                      # list backend types compiled in
memo actions <type>                # list actions a backend exposes
memo connect <type> <name>         # wizard to add an instance
memo instances                     # list configured instances
memo rm <type>/<name>              # remove an instance
memo run <type>/<name> <action> [-p k=v ...]    # execute an action
memo serve [--addr 127.0.0.1:8765] [--auth-token TOK]   # HTTP daemon mode

history backend actions

| Action | Purpose |
|---|---|
| append | Add one record (chat / tool call / tool result / summary / fact). |
| read | Return records, with role/not_role/since/until/offset/limit/tail filters. |
| sessions | List sessions in this instance. |
| delete | Remove a whole session. |
| purge | Atomically remove records matching a filter (role / not_role / before / after / older_than). Rejected without any filter; use delete for that. |

kv backend actions

| Action | Purpose |
|---|---|
| set | Store a value at a key. Overwrites. |
| get | Fetch the value at a key. Errors if unset. |
| delete | Remove a single key. |
| list | Enumerate keys; optional prefix (with trailing / to walk a namespace, or bare for string-prefix match). |

Keys are /-separated paths. Per-segment chars: [A-Za-z0-9._-]. The on-disk layout mirrors the keys exactly (one file per key, directories per namespace) — find $dir -type f gives you a readable inventory.
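The key rules above are cheap to check before calling set, which catches malformed keys without a round trip. A minimal sketch, assuming empty segments and leading or trailing slashes are invalid (the text does not say so explicitly); valid_kv_key is a hypothetical helper, not part of memo:

```shell
# valid_kv_key KEY: succeed if KEY is a /-separated path whose segments
# use only [A-Za-z0-9._-]. Hypothetical helper, not a memo command.
valid_kv_key() {
  case $1 in
    '' | /* | */ | *//*) return 1 ;;   # empty key or empty segment (assumed invalid)
    *[!A-Za-z0-9._/-]*) return 1 ;;    # character outside the documented set
  esac
  return 0
}
```

It can guard a write, for example valid_kv_key "$KEY" && memo run kv/scratch set -p key="$KEY" -p value="...". The sketch does not reject . or .. segments; whether memo does is not stated here.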

Scratchpad is a convention, not a backend. Create an instance you treat as ephemeral and prefix each key with the session id:

memo connect --input '{"dir":"~/.memo-scratch"}' kv scratch
memo run kv/scratch set -p key=chat-2026-05-17/draft-reply -p value="..."
memo run kv/scratch set -p key=chat-2026-05-17/checkout-step -p value=2

# fetch every scratch note for one session
memo run kv/scratch list -p prefix=chat-2026-05-17/

# clean up at end of conversation — iterate and delete
memo run kv/scratch list -p prefix=chat-2026-05-17/ \
  | jq -r '.[].key' \
  | xargs -I{} memo run kv/scratch delete -p key={}

When to reach for kv vs history:

  • history — anything ordered, append-only, that you'll want to read as a sequence: chat turns, tool calls, summaries, extracted facts.
  • kv — anything you SET (overwrite) and GET: current step in a flow, draft text being edited, agent's latest decision, last-seen cursor.

-p repeats; comma-separated lists work for string_list fields. A JSON object can be supplied with --input '{...}' or --input @file.json.

Environment

| Var | Default | Purpose |
|---|---|---|
| MEMO_STORE_DIR | ~/.config/memo/instances | Credential YAMLs root |
| MEMO_STATE_DIR | ~/.local/state/memo/state | Per-instance state root |
| MEMO_KEY_DIR | ~/.config/memo | master.key location |
| MEMO_MASTER_KEY | (unset) | Base64 32-byte master key override (overrides on-disk key entirely) |
| MEMO_ALLOWED_PATHS | (unset) | Colon-separated allow-list for IsPath schema fields. When set, every IsPath:true field must resolve under one of these roots, otherwise the command is rejected before any storage I/O. See pantograf's SECURITY.md for the threat model; memo applies the same gate. |
| MEMO_AUTH_TOKEN | (unset) | Bearer token for memo serve, used when the --auth-token flag isn't set. Empty = no auth (the default bind is loopback only). |
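For scripts that need to locate an instance's files (backups, inspection), the defaults and overrides above can be combined as follows. This is a sketch under two assumptions: an override replaces only the root while the <type>/<name> layout beneath it stays the same, and an empty variable is treated like an unset one. Both helper names are hypothetical.

```shell
# Hypothetical helpers: print where an instance's files would live,
# falling back to the documented defaults when the variables are unset.
memo_instance_file() {   # usage: memo_instance_file <type> <name>
  printf '%s/%s/%s.yaml\n' "${MEMO_STORE_DIR:-$HOME/.config/memo/instances}" "$1" "$2"
}

memo_state_dir() {       # usage: memo_state_dir <type> <name>
  printf '%s/%s/%s/\n' "${MEMO_STATE_DIR:-$HOME/.local/state/memo/state}" "$1" "$2"
}
```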

Status

Two backends compiled in. Backend names describe the FUNCTION; storage technology is configured per instance via the credential's driver field, so the same agent scripts work across local files and a shared Postgres without edits.

| Backend | Shape | Drivers | Use when |
|---|---|---|---|
| history | append-only chat records (chat / tool calls / summaries / facts) | file (JSONL on disk), postgres (shared DB) | Conversation history, summary chains, fact extraction, tool-call linkage. Pick file for single-host dev / cron, postgres for multi-host or multi-agent. |
| kv | hierarchical key→string (scratchpad, agent state, config) | file | Per-turn scratch notes, durable agent state, anything not append-only. A postgres driver is the natural next addition. |

Actions on history: append, read, sessions, delete, purge. read supports windowing (offset+limit+tail), date filtering (since+until), and role filtering (role / not_role) — covers summarization windows and cursor-less last-summary lookups.

Actions on kv: set, get, delete, list.

The wizard supports conditional fields (FieldSpec.ShowWhen), so picking driver=file only prompts for dir and driver=postgres only prompts for dsn. When you pick postgres, memo runs the embedded migrations idempotently and seals the DSN via the secrets vault. See examples/ for shell-composed agent patterns — every example works against either driver by changing the credential, not the script.
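The non-interactive equivalent of that driver choice might look like the sketch below. It assumes --input accepts a driver key alongside dir or dsn, using the field names mentioned above; connect_history is a hypothetical wrapper, and the exact credential schema should be checked with memo actions or the wizard.

```shell
# Hypothetical wrapper: create a history instance for either driver.
# The JSON keys (driver, dir, dsn) follow the field names in the text.
# Values are spliced in unescaped, so they must not contain " or \.
connect_history() {   # usage: connect_history <name> file <dir> | <name> postgres <dsn>
  case $2 in
    file)     memo connect --input "{\"driver\":\"file\",\"dir\":\"$3\"}" history "$1" ;;
    postgres) memo connect --input "{\"driver\":\"postgres\",\"dsn\":\"$3\"}" history "$1" ;;
    *)        echo "unknown driver: $2" >&2; return 2 ;;
  esac
}
```

Scripts that only ever call memo run history/<name> ... stay the same whichever branch created the instance.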

Shape designed for shell composition with pgf:

memo connect --input '{"dir":"~/chat"}' history main
memo run history/main append -p session=s1 -p role=user -p content="hi"
# build the messages array as compact JSON, then hand it to pgf as one argument
MESSAGES=$(memo run history/main read -p session=s1 | jq -c 'map({role,content})')
pgf run llm/local chat-completion -p messages="$MESSAGES"

The read output is a bare JSON array of records with role+content at the top level — drops straight into any OpenAI-compatible LLM client via jq 'map({role,content})'.

Storage is behind a store.Store interface in backends/history/store. Two drivers live in tree today: backends/history/filestore (JSONL files) and backends/history/pgstore (Postgres + go:embed migrations). Backend.Open dispatches on the credential's driver field.

Daemon mode (memo serve)

For hot-path callers — many sessions on one box, batch summarizers, agent loops issuing hundreds of storage calls per turn — the per-call subprocess overhead (~10-20ms) adds up. memo serve exposes the same action surface over HTTP+JSON so callers skip the fork/exec cost while preserving the exact same request/response shape as the CLI.

memo serve                                          # 127.0.0.1:8765, no auth
memo serve --addr :8765 --auth-token TOK            # all interfaces, token
MEMO_AUTH_TOKEN=TOK memo serve --addr :8765         # env-var form

Endpoints (all JSON):

| Method | Path | Purpose |
|---|---|---|
| GET | /v1/health | Liveness probe ({status, backends}). |
| GET | /v1/backends | List backend descriptors. |
| GET | /v1/backends/<type> | Backend descriptor + credential schema + actions. |
| GET | /v1/instances | List configured instances. |
| POST | /v1/run/<type>/<name>/<action> | Execute action; request body is the JSON params object, response is the action's return value. |

Auth: when --auth-token (or MEMO_AUTH_TOKEN) is set, every endpoint requires Authorization: Bearer <token>. Comparison is constant-time. When the token is unset, the default bind is loopback only; to expose the daemon outside the host you must set both --auth-token and an explicit --addr like 0.0.0.0:8765 or a non-loopback IP.
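The exposure rule can also be enforced by whatever launches the daemon. A minimal sketch, assuming an address counts as loopback when its host part is 127.x.x.x, localhost, or [::1]; serve_config_ok is a hypothetical pre-flight helper, and nothing here says whether memo performs the same check itself:

```shell
# serve_config_ok ADDR TOKEN: succeed when the pair is safe to pass to
# `memo serve`: either a loopback bind, or any bind with a token set.
serve_config_ok() {
  case $1 in
    127.*:* | localhost:* | '[::1]:'*) return 0 ;;   # loopback: token optional
  esac
  [ -n "$2" ]                                        # anything else needs a token
}
```

A launcher could then run serve_config_ok "$ADDR" "$MEMO_AUTH_TOKEN" && memo serve --addr "$ADDR", relying on the environment variable to carry the token.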

Behind the scenes the daemon caches one open session per instance and reuses it across requests, so the postgres pool is created once and amortized. Graceful shutdown on SIGINT/SIGTERM drains in-flight requests (default 30s budget) and closes every cached session.
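A script that starts the daemon itself usually needs to wait for the liveness probe before sending work, and to send SIGTERM on exit so the drain described above can run. The following is one possible shape, not taken from examples/daemon/daemon.sh; wait_healthy and its defaults (10 tries, one second apart) are assumptions of this sketch.

```shell
# wait_healthy [BASE_URL] [TRIES]: poll GET /v1/health until it answers
# with a success status, sleeping one second between attempts.
wait_healthy() {
  health_url=${1:-http://127.0.0.1:8765}/v1/health
  tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    curl -fsS -o /dev/null "$health_url" 2>/dev/null && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Typical use (no auth, default bind):
#   memo serve & pid=$!
#   trap 'kill -TERM "$pid"; wait "$pid"' EXIT
#   wait_healthy || exit 1
```

If a token is configured, the probe would also need the Authorization header, since every endpoint requires it.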

Driving it from any language means swapping the subprocess call for an HTTP POST with a JSON body. Shell example:

curl -H "Authorization: Bearer $TOK" -H "Content-Type: application/json" \
  -d '{"session":"demo","role":"user","content":"hi"}' \
  http://localhost:8765/v1/run/history/main/append

Full walkthrough (start + auth + every endpoint, all in one script): examples/daemon/daemon.sh.
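For scripts that issue many calls, the curl invocation can be wrapped once. A minimal sketch: memo_post is a hypothetical helper, and MEMO_URL is a variable invented for this example (memo itself does not read it). The Authorization header is only sent when MEMO_AUTH_TOKEN is non-empty.

```shell
# memo_post TYPE NAME ACTION JSON: POST the params object to the daemon.
# MEMO_URL is a hypothetical variable for this sketch, not something memo reads.
memo_post() {
  run_url="${MEMO_URL:-http://127.0.0.1:8765}/v1/run/$1/$2/$3"
  if [ -n "${MEMO_AUTH_TOKEN:-}" ]; then
    curl -fsS -H "Authorization: Bearer $MEMO_AUTH_TOKEN" \
      -H "Content-Type: application/json" -d "$4" "$run_url"
  else
    curl -fsS -H "Content-Type: application/json" -d "$4" "$run_url"
  fi
}
```

With that in place, memo_post history main append '{"session":"demo","role":"user","content":"hi"}' sends the same request as the curl command above. The -f flag makes curl exit non-zero on HTTP errors, so failures surface in the script's exit status.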

Conversation shapes

memo is intentionally shape-agnostic at the storage layer (every record has the same fields), but the combinations of fields encode four distinct patterns. All four live in the same JSONL file (with the file driver); the action set is the same.

1. Plain chat — user / assistant / system

{"ts":"...","role":"system","content":"You are a concise assistant."}
{"ts":"...","role":"user","content":"What's the capital of France?"}
{"ts":"...","role":"assistant","content":"Paris."}

Pipe into any OpenAI-compatible LLM client by stripping memo-only fields:

memo run history/main read -p session=s | jq 'map({role,content})'

2. Tool calls — assistant emits, tool replies, linked by id

memo carries the OpenAI/Anthropic tool-use shape natively. An assistant turn stores tool_calls: [{id, name, arguments}]; a tool-result turn uses role=tool, tool_call_id (matching the call's id), and name (the tool that ran). The id is the join key — same pattern simple-agent uses.

{"role":"user","content":"What's the weather in Berlin?"}
{"role":"assistant","tool_calls":[{"id":"call_42","name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}]}
{"role":"tool","tool_call_id":"call_42","name":"get_weather","content":"{\"temp_c\":11,\"sky\":\"cloudy\"}"}
{"role":"assistant","content":"Berlin is 11°C and cloudy."}

Append a tool call:

memo run history/main append -p session=s -p role=assistant \
  -p tool_calls='[{"id":"call_42","name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}]'

Append the matching tool result:

memo run history/main append -p session=s -p role=tool \
  -p tool_call_id=call_42 \
  -p name=get_weather \
  -p content='{"temp_c":11,"sky":"cloudy"}'

Join calls to their results with jq:

memo run history/main read -p session=s | jq '
  (map(select(.role=="tool")) | map({(.tool_call_id): .content}) | add) as $results
  | map(select(.tool_calls) | .tool_calls[] | {id, name, args: .arguments, result: $results[.id]})
'

When feeding history back to the LLM in a follow-up turn, re-nest the flat tool_calls shape into the OpenAI {type, function:{name, arguments}} form (and keep role=tool records with their tool_call_id):

memo run history/main read -p session=s -p not_role=summary | jq '
  map(
    if .tool_calls then
      {role, content: (.content // ""),
       tool_calls: [.tool_calls[] | {id, type:"function", function:{name, arguments}}]}
    elif .role == "tool" then
      {role, content, tool_call_id}
    else
      {role, content}
    end
  )'

The full assistant ↔ tool ↔ assistant loop is examples/tool-loop/.

3. Summaries — derived record with boundary in meta

For long conversations: every N turns, an LLM is asked to summarize the batch. The summary lands as a record with role=summary and a meta.from / meta.to boundary that says which turn-indices it covers. No separate state file — the summary IS the cursor.

{"role":"user","content":"... 50 turns ..."}
{"role":"summary","content":"<summary text>","meta":{"from":"0","to":"50"}}
{"role":"user","content":"... 50 more turns ..."}
{"role":"summary","content":"<summary text>","meta":{"from":"50","to":"100"}}

Find where the last summary ended (cursor lookup, no state file):

LAST=$(memo run history/main read -p session=s \
  -p role=summary -p limit=1 -p tail=true \
  | jq -r '.[0].meta.to // "0"')

Read only the un-summarized turns:

memo run history/main read -p session=s \
  -p not_role=summary \
  -p offset="$LAST"
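Deciding whether a new summary is due is then plain arithmetic on the cursor. The sketch below assumes that meta.to is an exclusive end index counted over non-summary turns, which matches the 0 to 50, 50 to 100 example and the offset="$LAST" read above. next_summary_window is a hypothetical helper, and the fixed batch size is an assumption; examples/summarize/summarize.sh may fold turns differently.

```shell
# next_summary_window LAST TOTAL BATCH
#   LAST  = meta.to of the latest summary (0 if there is none)
#   TOTAL = number of non-summary turns in the session
#   BATCH = turns per summary
# Prints "FROM TO" when a full batch is waiting, otherwise returns 1.
next_summary_window() {
  [ $(( $2 - $1 )) -ge "$3" ] || return 1
  echo "$1 $(( $1 + $3 ))"
}
```

TOTAL can be obtained by piping a read with -p not_role=summary into jq length. For example, next_summary_window 50 120 50 prints 50 100, while next_summary_window 50 80 50 prints nothing and fails, so a cron job can exit early.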

Inject the latest summary as system context in a follow-up call:

SUMMARY=$(memo run history/main read -p session=s \
  -p role=summary -p limit=1 -p tail=true \
  | jq -r '.[0].content // ""')

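# build the messages array in jq (so \n becomes a real newline), then pass it as one argument
MESSAGES=$(jq -nc \
  --arg summary "$SUMMARY" \
  --arg q "$QUESTION" \
  '[{role:"system",content:("Summary of earlier conversation:\n" + $summary)},{role:"user",content:$q}]')
pgf run llm/proxy chat-completion -p model=qwen36 -p messages="$MESSAGES"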

Full demo: examples/recall/.

4. Facts — extracted preferences / corrections / decisions

Same shape as summaries, different role and meta keys. Each fact is one record; meta carries the type (pref, correction, decision, context) and the source-window boundary so re-runs are cursor-less.

{"role":"fact","content":"prefers tabs over spaces","meta":{"type":"pref","from":"0","to":"50"}}
{"role":"fact","content":"running Go 1.25, not 1.21","meta":{"type":"correction","from":"0","to":"50"}}
{"role":"fact","content":"docker compose v2 only","meta":{"type":"pref","from":"50","to":"100"}}

List all facts:

memo run history/main read -p session=s -p role=fact

Inject them as system context for the next call:

FACTS=$(memo run history/main read -p session=s -p role=fact \
  | jq -r 'map("- [\(.meta.type)] \(.content)") | join("\n")')

Substring search (cheap recall, no index):

memo run history/main read -p session=s -p role=fact \
  | jq 'map(select(.content | test("docker"; "i")))'

Ranked / semantic recall has two levels:

  • For per-session, ephemeral semantic search (no index), embed query + candidates on the fly via pgf llm embed, cosine-rank in jq. Right when you have ≲ a few hundred candidates per session and don't want persistent state. See examples/embed-recall/.
  • For long-lived, cross-session semantic memory, push records into wissen (hybrid BM25 + vectors with a persistent index). memo stores; wissen retrieves.

semantic-layer is a different concern — NL → domain-term routing for SQL/agent dispatch, not embedding retrieval. Don't reach for it for chat-record search.

Full demo: examples/extract-facts/ (writes) + examples/embed-recall/ (reads).

Examples

End-to-end shell scripts in examples/ showing memo + pgf composition:

| Script | What it does |
|---|---|
| examples/chat/chat.sh | Interactive REPL. Reads a line, persists, asks the LLM with full history, persists the reply. Resume by re-running with the same $SESSION. |
| examples/summarize/summarize.sh | Folds un-summarized turns into a rolling summary. Cursor-less: the summary record itself carries its boundary in meta.from / meta.to. Idempotent; safe to cron. |
| examples/recall/recall.sh | Proves the loop works both ways: the LLM recalls a seeded fact (a) from in-context history, then (b) from a summary-only context with no raw turns sent. |
| examples/tool-loop/tool-loop.sh | Autonomous tool-use agent. LLM picks tools (e.g. hn_top, now), script dispatches, results linked via tool_call_id, loop until final answer. All state in memo. |
| examples/extract-facts/extract-facts.sh | Pulls preferences / corrections / decisions out of a session and appends them as role=fact records (with meta.type). Same cursor-less pattern as summarize.sh; idempotent. |
| examples/embed-recall/embed-recall.sh | Semantic search over a session's facts/summaries. Embeds query + candidates via pgf llm embed, cosine in jq, returns ranked top-K. No persistent index; for that, push to wissen. |
| examples/retention/retention.sh | Tiered cleanup via memo purge: tools after 24h, raw turns after 30d, summaries after 365d, facts never. Cursor-less and idempotent, so safe to cron. |

All seven rely only on memo + pgf + jq. They expect:

memo connect history <inst>
pgf  connect llm     <inst>
pgf  connect rss     <inst>   # tool-loop only