
Memory strategy

Two distinct memory systems; don't conflate them.

SOMA has two distinct memory layers. They solve different problems and must not be collapsed into one.

| Layer | Store | Purpose | Retention |
| --- | --- | --- | --- |
| Conversational (short-term) | Mastra Memory + PostgresStore | Last N turns per thread, used to keep multi-turn context coherent | `lastMessages: 12` |
| Knowledge (long-term) | entities / edges / events / facts (the graph) | Queried by tools | Forever (or until forget / supersede) |

Conversational memory

Mastra's built-in Memory tracks the message history of a thread, identified by `threadId`. On the web, the thread id is `web:<userId>:<epochMs>`; on the bot, it's `tg:<chatId>:<epochMs>`. The agent uses the last 12 turns as immediate context.
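The thread id scheme above can be sketched as a pair of helpers. The function names (`makeWebThreadId`, `makeTelegramThreadId`) are illustrative, not SOMA's actual API; only the id format comes from the docs.

```typescript
// Illustrative helpers for the documented thread id formats.
// web:<userId>:<epochMs> and tg:<chatId>:<epochMs>
function makeWebThreadId(userId: string, epochMs: number = Date.now()): string {
  return `web:${userId}:${epochMs}`;
}

function makeTelegramThreadId(chatId: number, epochMs: number = Date.now()): string {
  return `tg:${chatId}:${epochMs}`;
}
```

Because the epoch timestamp is baked into the id, each new session gets a fresh thread rather than reusing one per user or chat.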

Gated on opt-in. Mastra's PgVector worker has a history of failing inside Next.js serverless (thread-stream worker resolution). We keep conversational memory disabled by default and enable it only when `SOMA_MASTRA_MEMORY=1` is set. SOMA's knowledge graph covers long-term recall through tool calls anyway.
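A minimal sketch of the opt-in gate. The env var name comes from the docs; the helper itself is hypothetical.

```typescript
// Conversational memory is off unless SOMA_MASTRA_MEMORY=1 is set.
// memoryEnabled() is an illustrative helper, not SOMA's actual code.
function memoryEnabled(
  env: Record<string, string | undefined> = process.env,
): boolean {
  return env.SOMA_MASTRA_MEMORY === "1";
}
```

The strict `=== "1"` check means any other value (including `"true"`) leaves memory disabled, which keeps the default-off behavior unambiguous.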

Knowledge memory

The graph is the canonical memory. Tools hit it directly:

  • `memory_recall` — semantic search over entities via pgvector HNSW, optionally reranked.
  • `search_entities` — lexical full-text search over `entities.name` and searchable properties.
  • `graph_neighbors` — BFS walk from a root entity, up to 3 hops.
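The hop-bounded BFS behind graph_neighbors can be sketched over an in-memory adjacency map. This stands in for the real tool, which walks the edges table; the function and its shape are illustrative.

```typescript
// Illustrative hop-bounded BFS, mirroring graph_neighbors' 3-hop limit.
// adj maps an entity id to the ids of its directly connected entities.
function neighbors(
  adj: Map<string, string[]>,
  root: string,
  maxHops = 3,
): Set<string> {
  const seen = new Set<string>([root]);
  let frontier = [root];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const nb of adj.get(node) ?? []) {
        if (!seen.has(nb)) {
          seen.add(nb);
          next.push(nb);
        }
      }
    }
    frontier = next;
  }
  seen.delete(root); // the root itself is not one of its neighbors
  return seen;
}
```

The hop cap keeps result size bounded on dense graphs; everything reachable within `maxHops` is returned, nothing beyond it.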

Facts are pulled in separately when the agent needs durable preferences ("user prefers morning workouts"). The fact-extract workflow runs async per conversation turn.
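The async fire-and-forget pattern for the fact-extract workflow might look like the sketch below. The workflow name, payload shape, and the naive "prefers …" extraction are all assumptions for illustration; the real workflow is not documented here.

```typescript
type FactExtractInput = { threadId: string; turnText: string };

// Naive stand-in for real fact extraction: pull "prefers ..." clauses.
// (Assumption for illustration only.)
function extractFacts(turnText: string): string[] {
  return turnText.match(/prefers [^.]+/g) ?? [];
}

// Hypothetical workflow entry point; the real one writes facts to the graph.
async function runFactExtractWorkflow(input: FactExtractInput): Promise<string[]> {
  return extractFacts(input.turnText);
}

// The turn handler kicks off extraction without awaiting it, so the
// agent's reply is never blocked on fact extraction.
function onTurnComplete(input: FactExtractInput): void {
  void runFactExtractWorkflow(input).catch((err) =>
    console.error("fact-extract failed", err),
  );
}
```

The `void` + `.catch` combination makes the fire-and-forget explicit while still surfacing failures in logs.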

:::note
Mastra's built-in `semanticRecall` is disabled. It would duplicate work (the knowledge graph already handles semantic recall) and require a second embedder config. The `semanticRecall: { ... }` block was removed from the Memory constructor.
:::
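The resulting Memory setup might look like this sketch, assuming Mastra's `Memory` / `PostgresStore` constructor shapes (verify against the Mastra version in use). The point is what's absent: no `semanticRecall` block, only the conversational window.

```typescript
import { Memory } from "@mastra/memory";
import { PostgresStore } from "@mastra/pg";

// Conversational memory only: a 12-message window per thread.
// Semantic recall is deliberately NOT configured here; the knowledge
// graph's tools (memory_recall etc.) cover it instead.
const memory = new Memory({
  storage: new PostgresStore({ connectionString: process.env.DATABASE_URL! }),
  options: {
    lastMessages: 12,
  },
});
```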

Why two layers

Conversational memory answers "what did I just say two messages ago?". Knowledge memory answers "what do I know about Atomic Habits?". They have different:

  • Lifetimes — conversational is bounded (12 messages); knowledge is unbounded.
  • Granularity — conversational stores raw messages; knowledge stores extracted entities + facts.
  • Access patterns — conversational is always full-dump-on-read; knowledge is query-by-semantic-similarity.

Collapsing them would require either caching raw messages indefinitely (storage bloat) or discarding them immediately (losing multi-turn context). The split keeps both paths optimal.