memo — agent-memory CLI. history (file/postgres) + kv backends. Short-term and durable agent state.

Hannes Lehmann f5865b3408 memo: add HTTP daemon mode (memo serve)	2026-05-26 21:37:55 +02:00

`memo serve` exposes the same action surface as the CLI over HTTP+JSON, so hot-path callers skip the ~10-20ms per-call subprocess overhead while preserving the exact request/response shape.

Endpoints (all JSON):
GET /v1/health — liveness probe
GET /v1/backends — list backend descriptors
GET /v1/backends/<type> — full schema + actions
GET /v1/instances — list configured instances
POST /v1/run/<type>/<name>/<action> — execute action (body = params)

Auth via --auth-token (or MEMO_AUTH_TOKEN env). Constant-time bearer comparison. Default bind 127.0.0.1:8765 when no token; explicit --addr required to expose externally.

Behind the scenes:
- one cached session per (type, name); reused across requests
- filestore + pgstore are already concurrent-safe so caching is sound
- graceful shutdown on SIGINT/SIGTERM drains in-flight requests then closes every cached session (so the postgres pool releases cleanly)

23 HTTP-level tests covering health, introspection, run dispatch on both backends, every error mapping (unknown backend/instance/action, bad params, body must be JSON object), auth (no header / wrong token / correct), session cache reuse, concurrent appends, closeAll release, Content-Type. 164 tests total, 0 failures.

examples/daemon/daemon.sh: self-contained walkthrough — starts daemon, exercises every endpoint via curl + jq, cleans up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

backend	memo: collapse history-pg into history with driver-discriminated credential	2026-05-17 23:16:37 +02:00
backends	memo: collapse history-pg into history with driver-discriminated credential	2026-05-17 23:16:37 +02:00
cmd/memo	memo: add HTTP daemon mode (memo serve)	2026-05-26 21:37:55 +02:00
examples	memo: add HTTP daemon mode (memo serve)	2026-05-26 21:37:55 +02:00
secrets	memo: bootstrap CLI skeleton (no backends)	2026-05-17 13:00:03 +02:00
state	memo: bootstrap CLI skeleton (no backends)	2026-05-17 13:00:03 +02:00
storage	memo: bootstrap CLI skeleton (no backends)	2026-05-17 13:00:03 +02:00
go.mod	memo: add postgres (history-pg) backend with go:embed migrations + docker tests	2026-05-17 22:14:46 +02:00
go.sum	memo: add postgres (history-pg) backend with go:embed migrations + docker tests	2026-05-17 22:14:46 +02:00
README.md	memo: add HTTP daemon mode (memo serve)	2026-05-26 21:37:55 +02:00

README.md

memo

Agent-memory CLI. Conversation history, scratchpad, and other thread-scoped shapes that an LLM workflow needs to remember across runs. Same shape as pantograf (pgf): self-describing actions, hand-written schemas, encrypted credentials, per-instance state.

memo is intentionally narrow: storage-backed memory shapes only. Semantic search over long-lived memory is a separate concern (use wissen). External-system connectors are a separate concern (use pgf).

Concepts

Backend — the memory shape: history, kv. Same role as a pantograf connector. Each backend implements Open(cred) → Session plus a set of Actions. The storage technology (file, postgres) is chosen per instance through the credential's driver field.
Instance — one configured backend + credentials, named: history/main, kv/scratch. Persisted at ~/.config/memo/instances/<type>/<name>.yaml. Secret fields are field-level sealed with NaCl secretbox (key at ~/.config/memo/master.key).
Action — a verb the agent calls. The action set defines the shapes of memory: history (append / read), key-value state (set / get), and whatever else makes sense. The shape logic lives in the action; the storage is swappable.
State store — per-instance []byte k/v for backend bookkeeping (last-id cursors, etc.) at ~/.local/state/memo/state/<type>/<name>/.

CLI

memo backends                      # list backend types compiled in
memo actions <type>                # list actions a backend exposes
memo connect <type> <name>         # wizard to add an instance
memo instances                     # list configured instances
memo rm <type>/<name>              # remove an instance
memo run <type>/<name> <action> [-p k=v ...]    # execute an action
memo serve [--addr 127.0.0.1:8765] [--auth-token TOK]   # HTTP daemon mode

history backend actions

Action	Purpose
`append`	Add one record (chat / tool call / tool result / summary / fact).
`read`	Return records, with `role`/`not_role`/`since`/`until`/`offset`/`limit`/`tail` filters.
`sessions`	List sessions in this instance.
`delete`	Remove a whole session.
`purge`	Atomically remove records matching a filter (`role` / `not_role` / `before` / `after` / `older_than`). Rejected without any filter — use `delete` for that.
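
The no-filter guard on `purge` can be pictured as a validation step in front of the storage call. A minimal Go sketch: the filter names come from the table above, while the `PurgeFilter` struct, its string-typed fields, and the error text are hypothetical and not taken from memo's source.

```go
package main

import "errors"

// PurgeFilter mirrors the filter names listed for the purge action.
// The struct is hypothetical; values stay strings because their exact
// formats are not specified in this README.
type PurgeFilter struct {
	Role      string
	NotRole   string
	Before    string
	After     string
	OlderThan string
}

var errNoFilter = errors.New("purge: at least one filter is required; use delete to remove a whole session")

// Validate rejects an empty filter so that purge can never wipe a
// session by accident.
func (f PurgeFilter) Validate() error {
	if f == (PurgeFilter{}) {
		return errNoFilter
	}
	return nil
}
```

With this shape, `PurgeFilter{Role: "tool"}` passes and the zero value is refused.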

kv backend actions

Action	Purpose
`set`	Store a value at a key. Overwrites.
`get`	Fetch the value at a key. Errors if unset.
`delete`	Remove a single key.
`list`	Enumerate keys; optional `prefix` (with trailing `/` to walk a namespace, or bare for string-prefix match).

Keys are /-separated paths. Per-segment chars: [A-Za-z0-9._-]. The on-disk layout mirrors the keys exactly (one file per key, directories per namespace) — find $dir -type f gives you a readable inventory.
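
Callers that build keys from untrusted input may want to check them before calling memo. A minimal Go sketch of the key rules stated above (`/`-separated segments, each drawn from `[A-Za-z0-9._-]`). The function name `validKey` is hypothetical, and rejecting empty segments plus the `.` and `..` segments is an added assumption, made because keys map directly to on-disk paths.

```go
package main

import (
	"regexp"
	"strings"
)

// segmentRE matches one key segment: one or more of the characters the
// README allows per segment.
var segmentRE = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

// validKey reports whether key is a well-formed kv key.
// Assumption (not stated in the README): empty segments and the special
// segments "." and ".." are rejected, because keys become file paths.
func validKey(key string) bool {
	if key == "" {
		return false
	}
	for _, seg := range strings.Split(key, "/") {
		if seg == "." || seg == ".." || !segmentRE.MatchString(seg) {
			return false
		}
	}
	return true
}
```

Under these rules `chat-2026-05-17/draft-reply` is accepted, while `a//b`, `a b`, and `chat/../etc` are not.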

Scratchpad is a convention, not a backend. Create an instance you treat as ephemeral and prefix each key with the session id:

memo connect --input '{"dir":"~/.memo-scratch"}' kv scratch
memo run kv/scratch set -p key=chat-2026-05-17/draft-reply -p value="..."
memo run kv/scratch set -p key=chat-2026-05-17/checkout-step -p value=2

# fetch every scratch note for one session
memo run kv/scratch list -p prefix=chat-2026-05-17/

# clean up at end of conversation — iterate and delete
memo run kv/scratch list -p prefix=chat-2026-05-17/ \
  | jq -r '.[].key' \
  | xargs -I{} memo run kv/scratch delete -p key={}

When to reach for kv vs history:

history — anything ordered, append-only, that you'll want to read as a sequence: chat turns, tool calls, summaries, extracted facts.
kv — anything you SET (overwrite) and GET: current step in a flow, draft text being edited, agent's latest decision, last-seen cursor.

-p repeats; comma-separated lists work for string_list fields. A JSON object can be supplied with --input '{...}' or --input @file.json.
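
One way to picture how repeated `-p` values fold into a single params object is the Go sketch below. It covers plain `k=v` pairs only. `parseParams` and the `listFields` argument are hypothetical names; how memo itself learns which fields are `string_list` (presumably from the action schema) is not shown, and "last occurrence wins" for repeated scalar keys is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseParams folds repeated "-p k=v" values into one params map.
// listFields names the fields treated as string_list; their values are
// split on commas. Everything else is kept as a plain string.
func parseParams(pairs []string, listFields map[string]bool) (map[string]any, error) {
	params := make(map[string]any, len(pairs))
	for _, p := range pairs {
		// Split on the first "=" only, so values may contain "=".
		k, v, ok := strings.Cut(p, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("bad -p value %q: want k=v", p)
		}
		if listFields[k] {
			params[k] = strings.Split(v, ",")
			continue
		}
		params[k] = v // assumption: a repeated scalar key overwrites
	}
	return params, nil
}
```

The resulting map has the same shape as the object passed with `--input '{...}'`.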

Environment

Var	Default	Purpose
`MEMO_STORE_DIR`	`~/.config/memo/instances`	Credential YAMLs root
`MEMO_STATE_DIR`	`~/.local/state/memo/state`	Per-instance state root
`MEMO_KEY_DIR`	`~/.config/memo`	`master.key` location
`MEMO_MASTER_KEY`	(unset)	Base64 32-byte master key override (overrides on-disk key entirely)
`MEMO_ALLOWED_PATHS`	(unset)	Colon-separated allow-list for `IsPath` schema fields. When set, every `IsPath:true` field must resolve under one of these roots, otherwise the command is rejected before any storage I/O. See pantograf's `SECURITY.md` for the threat model — memo applies the same gate.
`MEMO_AUTH_TOKEN`	(unset)	Bearer token for `memo serve`, used when the `--auth-token` flag isn't set. Empty = no auth (loopback bind only by default).
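
The `MEMO_ALLOWED_PATHS` gate amounts to a containment check against each root. A minimal Go sketch, assuming Unix-style paths; `pathAllowed` is a hypothetical name, and the check is purely lexical (no symlink resolution, no conversion of relative paths), so it illustrates the idea rather than memo's actual gate.

```go
package main

import (
	"path/filepath"
	"strings"
)

// pathAllowed reports whether p lies under one of the roots in allowList
// (colon-separated, as in MEMO_ALLOWED_PATHS). An empty allowList means
// the gate is off, matching the "(unset)" default.
// Limitation of this sketch: symlinks are not resolved.
func pathAllowed(p, allowList string) bool {
	if allowList == "" {
		return true
	}
	p = filepath.Clean(p) // collapses "a/../b" before comparing
	for _, root := range strings.Split(allowList, ":") {
		if root == "" {
			continue
		}
		rel, err := filepath.Rel(filepath.Clean(root), p)
		if err != nil {
			continue
		}
		// Inside the root means the relative path does not climb out.
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```

Comparing via `filepath.Rel` rather than a raw string prefix avoids accepting `/data/memo-evil` for the root `/data/memo`.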

Status

Two backends compiled in. Backend names describe the FUNCTION; storage technology is configured per instance via the credential's driver field, so the same agent scripts work across local files and a shared Postgres without edits.

Backend	Shape	Drivers	Use when
`history`	append-only chat records (chat / tool calls / summaries / facts)	`file` (JSONL on disk), `postgres` (shared DB)	Conversation history, summary chains, fact extraction, tool-call linkage. Pick `file` for single-host dev / cron, `postgres` for multi-host or multi-agent.
`kv`	hierarchical key→string (scratchpad, agent state, config)	`file`	Per-turn scratch notes, durable agent state, anything not append-only. A `postgres` driver is the natural next addition.

Actions on history: append, read, sessions, delete, purge. read supports windowing (offset+limit+tail), date filtering (since+until), and role filtering (role / not_role) — covers summarization windows and cursor-less last-summary lookups.

Actions on kv: set, get, delete, list.

The wizard supports conditional fields (FieldSpec.ShowWhen), so picking driver=file only prompts for dir and driver=postgres only prompts for dsn. When you pick postgres, memo runs the embedded migrations idempotently and seals the DSN via the secrets vault. See examples/ for shell-composed agent patterns — every example works against either driver by changing the credential, not the script.

Shape designed for shell composition with pgf:

memo connect --input '{"dir":"~/chat"}' history main
memo run history/main append -p session=s1 -p role=user -p content="hi"
# pass the compact JSON array as a single argument
pgf run llm/local chat-completion \
  -p messages="$(memo run history/main read -p session=s1 | jq -c 'map({role,content})')"

The read output is a bare JSON array of records with role+content at the top level — drops straight into any OpenAI-compatible LLM client via jq 'map({role,content})'.

Storage is behind a store.Store interface in backends/history/store. Two drivers live in tree today: backends/history/filestore (JSONL files) and backends/history/pgstore (Postgres + go:embed migrations). Backend.Open dispatches on the credential's driver field.
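
The driver dispatch described above could look roughly like the following Go sketch. The `Store` method set, `memStore`, and `openStore` are all stand-ins invented for illustration; the real `store.Store` interface and the `filestore` / `pgstore` constructors are not shown in this README. Only the `driver`, `dir`, and `dsn` credential fields come from the text.

```go
package main

import "fmt"

// Store is a stand-in for store.Store; these two methods are assumptions.
type Store interface {
	Append(session, line string) error
	Close() error
}

// memStore is an in-memory placeholder used for both drivers in this
// sketch; the real drivers are filestore (JSONL) and pgstore (Postgres).
type memStore struct {
	driver string
	lines  map[string][]string
}

func (m *memStore) Append(session, line string) error {
	m.lines[session] = append(m.lines[session], line)
	return nil
}

func (m *memStore) Close() error { return nil }

// openStore picks a driver from the credential and checks the one field
// each driver needs: dir for file, dsn for postgres.
func openStore(cred map[string]string) (Store, error) {
	switch d := cred["driver"]; d {
	case "file":
		if cred["dir"] == "" {
			return nil, fmt.Errorf("driver file: dir is required")
		}
	case "postgres":
		if cred["dsn"] == "" {
			return nil, fmt.Errorf("driver postgres: dsn is required")
		}
	default:
		return nil, fmt.Errorf("unknown driver %q", d)
	}
	return &memStore{driver: cred["driver"], lines: map[string][]string{}}, nil
}
```

Because actions only see the interface, the same `append` logic runs unchanged whichever driver the credential selects.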

Daemon mode (`memo serve`)

For hot-path callers — many sessions on one box, batch summarizers, agent loops issuing hundreds of storage calls per turn — the per-call subprocess overhead (~10-20ms) adds up. memo serve exposes the same action surface over HTTP+JSON so callers skip the fork/exec cost while preserving the exact same request/response shape as the CLI.

memo serve                                          # 127.0.0.1:8765, no auth
memo serve --addr :8765 --auth-token TOK            # all interfaces, token
MEMO_AUTH_TOKEN=TOK memo serve --addr :8765         # env-var form

Endpoints (all JSON):

Method	Path	Purpose
`GET`	`/v1/health`	Liveness probe (`{status, backends}`).
`GET`	`/v1/backends`	List backend descriptors.
`GET`	`/v1/backends/<type>`	Backend descriptor + credential schema + actions.
`GET`	`/v1/instances`	List configured instances.
`POST`	`/v1/run/<type>/<name>/<action>`	Execute action; request body is the JSON params object, response is the action's return value.

Auth: when --auth-token (or MEMO_AUTH_TOKEN) is set, every endpoint requires Authorization: Bearer <token>. Comparison is constant-time. When the token is unset, the default bind is loopback only; to expose the daemon outside the host you must set both --auth-token and an explicit --addr like 0.0.0.0:8765 or a non-loopback IP.
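
For readers unfamiliar with constant-time comparison, here is a minimal Go sketch of a bearer check built on the standard library's `crypto/subtle`. The function name `bearerOK` is hypothetical and this is not memo's actual handler code; it only illustrates the technique named above.

```go
package main

import (
	"crypto/subtle"
	"strings"
)

// bearerOK reports whether an Authorization header value carries the
// expected token. An empty want means auth is disabled, as when neither
// --auth-token nor MEMO_AUTH_TOKEN is set.
func bearerOK(header, want string) bool {
	if want == "" {
		return true
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got := header[len(prefix):]
	// ConstantTimeCompare returns 1 only when both slices are equal. For
	// equal-length inputs its running time does not depend on where the
	// inputs differ, so a mismatch position cannot be timed.
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}
```

A plain `==` on strings can return as soon as the first differing byte is found, which is the timing signal this comparison avoids.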

Behind the scenes the daemon caches one open session per instance and reuses it across requests, so the postgres pool is created once and amortized. Graceful shutdown on SIGINT/SIGTERM drains in-flight requests (default 30s budget) and closes every cached session.
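
The session cache can be sketched in Go as a mutex-guarded map keyed by (type, name). All names here (`Session`, `sessionCache`, `closerFunc`) are hypothetical, and holding the lock while opening is a simplification chosen for brevity; memo's real cache may be structured differently.

```go
package main

import "sync"

// Session is a stand-in for an open backend session.
type Session interface{ Close() error }

// closerFunc adapts a plain function to Session (handy for stubs).
type closerFunc func() error

func (f closerFunc) Close() error { return f() }

type cacheKey struct{ typ, name string }

// sessionCache keeps one open session per (type, name) instance.
type sessionCache struct {
	mu       sync.Mutex
	open     func(typ, name string) (Session, error)
	sessions map[cacheKey]Session
}

func newSessionCache(open func(typ, name string) (Session, error)) *sessionCache {
	return &sessionCache{open: open, sessions: make(map[cacheKey]Session)}
}

// get returns the cached session, opening it on first use. The lock is
// held across open, so two concurrent first requests cannot open the
// same instance twice.
func (c *sessionCache) get(typ, name string) (Session, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := cacheKey{typ, name}
	if s, ok := c.sessions[k]; ok {
		return s, nil
	}
	s, err := c.open(typ, name)
	if err != nil {
		return nil, err
	}
	c.sessions[k] = s
	return s, nil
}

// closeAll closes every cached session and empties the cache. It returns
// the first close error, if any, but still closes the rest.
func (c *sessionCache) closeAll() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	var first error
	for k, s := range c.sessions {
		if err := s.Close(); err != nil && first == nil {
			first = err
		}
		delete(c.sessions, k)
	}
	return first
}
```

On shutdown, a daemon built this way would call `closeAll` after in-flight requests have drained.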

Driving it from any language means replacing the subprocess call with a POST that carries a JSON body. Shell example:

curl -H "Authorization: Bearer $TOK" -H "Content-Type: application/json" \
  -d '{"session":"demo","role":"user","content":"hi"}' \
  http://localhost:8765/v1/run/history/main/append

Full walkthrough (start + auth + every endpoint, all in one script): examples/daemon/daemon.sh.
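
The same call from Go, as a sketch using only the standard library. The endpoint path, headers, and body shape follow the table and curl example above; the helper names `newRunRequest` and `run` are hypothetical, and treating every non-2xx status as an error is an assumption, since the daemon's error mapping is not spelled out here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newRunRequest builds the POST for /v1/run/<type>/<name>/<action>.
// base is the daemon address, e.g. "http://127.0.0.1:8765"; token may be
// empty when the daemon runs without auth.
func newRunRequest(base, token, typ, name, action string, params map[string]any) (*http.Request, error) {
	if params == nil {
		params = map[string]any{} // the body must be a JSON object, not null
	}
	body, err := json.Marshal(params)
	if err != nil {
		return nil, err
	}
	u := strings.TrimRight(base, "/") + "/v1/run/" +
		url.PathEscape(typ) + "/" + url.PathEscape(name) + "/" + url.PathEscape(action)
	req, err := http.NewRequest(http.MethodPost, u, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

// run sends the request and decodes the JSON response into out.
func run(client *http.Client, req *http.Request, out any) error {
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("memo serve: %s", resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}
```

Reusing one `*http.Client` across calls keeps connections alive, which is where the saving over fork/exec comes from.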

Conversation shapes

memo is intentionally shape-agnostic at the storage layer (every record has the same fields), but the combinations of fields encode four distinct patterns. All four live in the same JSONL file; the action set is the same.

1. Plain chat — user / assistant / system

{"ts":"...","role":"system","content":"You are a concise assistant."}
{"ts":"...","role":"user","content":"What's the capital of France?"}
{"ts":"...","role":"assistant","content":"Paris."}

Pipe into any OpenAI-compatible LLM client by stripping memo-only fields:

memo run history/main read -p session=s | jq 'map({role,content})'

2. Tool calls — assistant emits, tool replies, linked by id

memo carries the OpenAI/Anthropic tool-use shape natively. An assistant turn stores tool_calls: [{id, name, arguments}]; a tool-result turn uses role=tool, tool_call_id (matching the call's id), and name (the tool that ran). The id is the join key — same pattern simple-agent uses.

{"role":"user","content":"What's the weather in Berlin?"}
{"role":"assistant","tool_calls":[{"id":"call_42","name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}]}
{"role":"tool","tool_call_id":"call_42","name":"get_weather","content":"{\"temp_c\":11,\"sky\":\"cloudy\"}"}
{"role":"assistant","content":"Berlin is 11°C and cloudy."}

Append a tool call:

memo run history/main append -p session=s -p role=assistant \
  -p tool_calls='[{"id":"call_42","name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}]'

Append the matching tool result:

memo run history/main append -p session=s -p role=tool \
  -p tool_call_id=call_42 \
  -p name=get_weather \
  -p content='{"temp_c":11,"sky":"cloudy"}'

Join calls to their results with jq:

memo run history/main read -p session=s | jq '
  (map(select(.role=="tool")) | map({(.tool_call_id): .content}) | add) as $results
  | map(select(.tool_calls) | .tool_calls[] | {id, name, args: .arguments, result: $results[.id]})
'

When feeding history back to the LLM in a follow-up turn, re-nest the flat tool_calls shape into the OpenAI {type, function:{name, arguments}} form (and keep role=tool records with their tool_call_id):

memo run history/main read -p session=s -p 'role!=summary' | jq '
  map(
    if .tool_calls then
      {role, content: (.content // ""),
       tool_calls: [.tool_calls[] | {id, type:"function", function:{name, arguments}}]}
    elif .role == "tool" then
      {role, content, tool_call_id}
    else
      {role, content}
    end
  )'

The full assistant ↔ tool ↔ assistant loop is examples/tool-loop/.

3. Summaries — derived record with boundary in meta

For long conversations: every N turns, an LLM is asked to summarize the batch. The summary lands as a record with role=summary and a meta.from / meta.to boundary that says which turn-indices it covers. No separate state file — the summary IS the cursor.

{"role":"user","content":"... 50 turns ..."}
{"role":"summary","content":"<summary text>","meta":{"from":"0","to":"50"}}
{"role":"user","content":"... 50 more turns ..."}
{"role":"summary","content":"<summary text>","meta":{"from":"50","to":"100"}}

Find where the last summary ended (cursor lookup, no state file):

LAST=$(memo run history/main read -p session=s \
  -p role=summary -p limit=1 -p tail=true \
  | jq -r '.[0].meta.to // "0"')

Read only the un-summarized turns:

memo run history/main read -p session=s \
  -p 'role!=summary' \
  -p offset="$LAST"
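
For callers that consume `read` output in Go rather than jq, the cursor lookup can be sketched as below. The `record` struct covers only the fields this sketch needs, and it assumes `meta` values are strings, as in the examples above. `lastSummaryTo` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// record holds the fields this sketch needs from a read result; any
// other fields in the JSON are ignored by the decoder.
type record struct {
	Role    string            `json:"role"`
	Content string            `json:"content"`
	Meta    map[string]string `json:"meta"`
}

// lastSummaryTo scans the JSON array returned by read and returns the
// meta.to of the last summary record. It returns 0 when there is no
// summary or the summary has no meta.to, mirroring the `// "0"` default
// in the jq version.
func lastSummaryTo(readOutput []byte) (int, error) {
	var recs []record
	if err := json.Unmarshal(readOutput, &recs); err != nil {
		return 0, err
	}
	for i := len(recs) - 1; i >= 0; i-- {
		if recs[i].Role != "summary" {
			continue
		}
		to, ok := recs[i].Meta["to"]
		if !ok {
			return 0, nil
		}
		return strconv.Atoi(to)
	}
	return 0, nil
}
```

The returned value plays the role of `$LAST` in the shell version: it is the offset from which un-summarized turns begin.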

Inject the latest summary as system context in a follow-up call:

SUMMARY=$(memo run history/main read -p session=s \
  -p role=summary -p limit=1 -p tail=true \
  | jq -r '.[0].content // ""')

MESSAGES=$(jq -nc \
  --arg summary "$SUMMARY" \
  --arg q "$QUESTION" \
  '[{role:"system",content:("Summary of earlier conversation:\n" + $summary)},{role:"user",content:$q}]')

pgf run llm/proxy chat-completion -p model=qwen36 -p messages="$MESSAGES"

Full demo: examples/recall/.

4. Facts — extracted preferences / corrections / decisions

Same shape as summaries, different role and meta keys. Each fact is one record; meta carries the type (pref, correction, decision, context) and the source-window boundary so re-runs are cursor-less.

{"role":"fact","content":"prefers tabs over spaces","meta":{"type":"pref","from":"0","to":"50"}}
{"role":"fact","content":"running Go 1.25, not 1.21","meta":{"type":"correction","from":"0","to":"50"}}
{"role":"fact","content":"docker compose v2 only","meta":{"type":"pref","from":"50","to":"100"}}

List all facts:

memo run history/main read -p session=s -p role=fact

Inject them as system context for the next call:

FACTS=$(memo run history/main read -p session=s -p role=fact \
  | jq -r 'map("- [\(.meta.type)] \(.content)") | join("\n")')

Substring search (cheap recall, no index):

memo run history/main read -p session=s -p role=fact \
  | jq 'map(select(.content | test("docker"; "i")))'

Ranked / semantic recall has two levels:

For per-session, ephemeral semantic search (no index), embed query + candidates on the fly via pgf llm embed, then cosine-rank in jq. This fits when you have at most a few hundred candidates per session and don't want persistent state. See examples/embed-recall/.
For long-lived, cross-session semantic memory, push records into wissen (hybrid BM25 + vectors with a persistent index). memo stores; wissen retrieves.
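
The first level, ranking without an index, comes down to cosine similarity plus a sort. A minimal Go sketch of that ranking step, assuming the embeddings have already been fetched as `[]float64` vectors; `cosine` and `topK` are hypothetical names, and the example script itself does this in jq.

```go
package main

import (
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the indices of the k candidates most similar to query,
// best first. Ties keep their original order.
func topK(query []float64, candidates [][]float64, k int) []int {
	idx := make([]int, len(candidates))
	scores := make([]float64, len(candidates))
	for i, c := range candidates {
		idx[i] = i
		scores[i] = cosine(query, c)
	}
	sort.SliceStable(idx, func(x, y int) bool { return scores[idx[x]] > scores[idx[y]] })
	if k < 0 {
		k = 0
	}
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}
```

This is a linear scan over every candidate per query, which is why it suits a few hundred records and not a long-lived corpus.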

semantic-layer is a different concern — NL → domain-term routing for SQL/agent dispatch, not embedding retrieval. Don't reach for it for chat-record search.

Full demo: examples/extract-facts/ (writes) + examples/embed-recall/ (reads).

Examples

End-to-end shell scripts in examples/ showing memo + pgf composition:

Script	What it does
`examples/chat/chat.sh`	Interactive REPL. Reads a line, persists, asks the LLM with full history, persists the reply. Resume by re-running with the same `$SESSION`.
`examples/summarize/summarize.sh`	Folds un-summarized turns into a rolling summary. Cursor-less — the summary record itself carries its boundary in `meta.from` / `meta.to`. Idempotent; safe to cron.
`examples/recall/recall.sh`	Proves the loop works both ways: the LLM recalls a seeded fact (a) from in-context history, then (b) from a summary-only context with no raw turns sent.
`examples/tool-loop/tool-loop.sh`	Autonomous tool-use agent. LLM picks tools (e.g. `hn_top`, `now`), script dispatches, results linked via `tool_call_id`, loop until final answer. All state in memo.
`examples/extract-facts/extract-facts.sh`	Pulls preferences / corrections / decisions out of a session and appends them as `role=fact` records (with `meta.type`). Same cursor-less pattern as `summarize.sh`; idempotent.
`examples/embed-recall/embed-recall.sh`	Semantic search over a session's facts/summaries. Embeds query + candidates via `pgf llm embed`, cosine in jq, returns ranked top-K. No persistent index — for that, push to `wissen`.
`examples/retention/retention.sh`	Tiered cleanup via the history `purge` action: tools after 24h, raw turns after 30d, summaries after 365d, facts never. Cursor-less and idempotent; safe to cron.
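
The retention tiers in the last row can be read as a role-to-window lookup. A minimal Go sketch of that policy: the windows come from the table, while `maxAgeHours`, `expired`, and the mapping of "raw turns" to the user, assistant, and system roles are assumptions. The script itself expresses the tiers as `purge` filters rather than per-record checks.

```go
package main

// maxAgeHours maps a record role to its retention window in hours.
// Assumption: "raw turns" means the user, assistant, and system roles.
var maxAgeHours = map[string]int{
	"tool":      24,
	"user":      30 * 24,
	"assistant": 30 * 24,
	"system":    30 * 24,
	"summary":   365 * 24,
}

// expired reports whether a record of the given role and age is due for
// purging. Roles without an entry (fact, or anything unknown) never
// expire.
func expired(role string, ageHours int) bool {
	limit, ok := maxAgeHours[role]
	return ok && ageHours > limit
}
```

Leaving `fact` out of the map, rather than giving it a very large window, makes "never" explicit.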

All seven rely only on memo + pgf + jq. They expect:

memo connect history <inst>
pgf  connect llm     <inst>
pgf  connect rss     <inst>   # tool-loop only