# Models & stack
Which LLM drives which path, and why.
The agent is a Mastra Agent instance with a tool registry and a streaming Anthropic model. The stack is deliberately small: one agent framework (Mastra), one LLM vendor (Anthropic), one embedding vendor (Voyage), plus OpenAI's Whisper for voice transcription only.
| Role | Model | Where |
|---|---|---|
| Primary streaming agent | claude-sonnet-4-5-20250929 | somaAgent in packages/agent/src/agents/soma.ts |
| Deep synthesis (weekly review, brief) | claude-opus-4-5 | somaAgentOpus, same file |
| Embeddings (entities + facts + query) | voyage-3-large (1024d) | @soma/tools/shared/embed.ts |
| Reranking (top-K over top-3K) | rerank-2 | @soma/tools/shared/rerank.ts |
| Voice transcription (bot) | Whisper (OpenAI) | @soma/tools/shared/transcribe.ts |
| Fact extraction | claude-haiku-4-5 | packages/agent/src/inngest/functions/fact-extract.ts |
## Why Sonnet for streaming
Sonnet 4.5 is the right cost/latency/quality trade-off for typed-tool-call-driven dialog. It reliably picks tools from the registry, rarely hallucinates entities that don't appear in memory_recall results, and keeps tone consistent across turns. Opus is a noticeable upgrade for long-form synthesis (the weekly review) but 3-5× slower and roughly 5× more expensive, which makes it overkill for chat turns.
## Why Haiku for fact extraction
Fact extraction runs asynchronously after each conversation turn. We need structured output (an array of fact objects), good-enough quality, and very low cost; Haiku hits all three. generateObject with a Zod schema makes the output shape non-negotiable.
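The facts then get a cheap cleanup pass before they touch the graph. A minimal sketch of that pass, assuming a fact shape with subject/predicate/object/confidence fields — the field names here are illustrative, not the actual Zod schema in fact-extract.ts:

```typescript
// Hypothetical shape of one extracted fact. The real schema is a Zod
// object in fact-extract.ts; these field names are assumptions.
interface Fact {
  subject: string;
  predicate: string;
  object: string;
  confidence: number; // 0..1, as reported by the model
}

// Drop malformed or low-confidence facts and dedupe exact repeats
// before anything reaches the knowledge graph.
function normalizeFacts(raw: Fact[], minConfidence = 0.5): Fact[] {
  const seen = new Set<string>();
  const out: Fact[] = [];
  for (const f of raw) {
    if (!f.subject || !f.predicate || !f.object) continue;
    if (f.confidence < minConfidence) continue;
    const key = `${f.subject}|${f.predicate}|${f.object}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(f);
  }
  return out;
}
```

The actual pipeline gets the array from generateObject, so the Zod parse has already enforced types; this pass only handles semantic junk (duplicates, empty strings, low confidence).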
## Why Voyage instead of OpenAI embeddings
voyage-3-large sits at or near the top of MTEB retrieval leaderboards at 1024 dimensions, and the accompanying rerank-2 cross-encoder lifts top-K precision dramatically over dense-only recall. The cost is comparable to text-embedding-3-large. Swapping later is a one-file change in @soma/tools/shared/embed.ts, but there's no reason to.
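The two-stage shape of that retrieval path can be sketched in a few lines: dense recall narrows the corpus by cosine similarity, then a reranker re-scores only that smaller candidate set. Here the rerank-2 API call is replaced by a plain scoring callback, so this is a structural sketch, not the production code in embed.ts/rerank.ts:

```typescript
type Scored = { id: string; score: number };

// Cosine similarity between two dense vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1: dense recall — top-N candidates by cosine similarity.
function denseRecall(
  query: number[],
  docs: Map<string, number[]>,
  topN: number,
): Scored[] {
  return [...docs.entries()]
    .map(([id, vec]) => ({ id, score: cosine(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

// Stage 2: rerank — re-score the candidates with a cross-encoder
// (rerank-2 in production; an injected callback here) and keep top-K.
function rerank(
  candidates: Scored[],
  score: (id: string) => number,
  topK: number,
): Scored[] {
  return candidates
    .map(({ id }) => ({ id, score: score(id) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The point of the split is cost: the cross-encoder is expensive per pair, so it only ever sees the dense top-3K, never the whole corpus.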
## Agent definition
```ts
// packages/agent/src/agents/soma.ts
export const somaAgent = new Agent({
  name: 'soma',
  description: 'SOMA — personal AI assistant with knowledge-graph memory.',
  instructions: SYSTEM_PROMPT,
  model: anthropic('claude-sonnet-4-5-20250929'),
  tools: allTools,
  ...(memory ? { memory } : {}),
});
```

allTools is the dictionary of tool handles from @soma/tools/index.ts. See Tool registry.
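For orientation, the registry is just a plain object keyed by tool name. A hedged sketch of that shape — the real entries are Mastra tool handles with richer typing, and these tool names are assumptions, not the actual exports of @soma/tools/index.ts:

```typescript
// Illustrative shape of the tool registry. Production entries are Mastra
// tool handles; this sketch only shows the dictionary structure the
// agent receives via its `tools` option.
type ToolHandle = {
  description: string;
  execute: (input: Record<string, unknown>) => Promise<unknown>;
};

const allTools: Record<string, ToolHandle> = {
  memory_recall: {
    description: 'Search the knowledge graph for entities and facts.',
    execute: async (input) => ({ hits: [], query: input.query }),
  },
  memory_write: {
    description: 'Persist a new fact to the knowledge graph.',
    execute: async () => ({ ok: true }),
  },
};
```

Because the registry is a flat dictionary, the model sees each key as a callable tool name, which is why stable, descriptive keys matter more than the handler internals.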