# Models & stack
Which LLM drives which path, and why.
The agent is a Mastra Agent instance with a tool registry and a streaming Anthropic model. The stack is deliberately small: one agent framework (Mastra), one LLM vendor (Anthropic), one embedding vendor (Voyage), plus OpenAI's Whisper for voice transcription only.
| Role | Model | Where |
|---|---|---|
| Primary streaming agent | claude-sonnet-4-5-20250929 | somaAgent in packages/agent/src/agents/soma.ts |
| Deep synthesis (weekly review, brief) | claude-opus-4-5 | somaAgentOpus, same file |
| Embeddings (entities + facts + query) | voyage-3-large (1024d) | @soma/tools/shared/embed.ts |
| Reranking (top-K over top-3K) | rerank-2 | @soma/tools/shared/rerank.ts |
| Voice transcription (bot) | Whisper (OpenAI) | @soma/tools/shared/transcribe.ts |
| Fact extraction | claude-haiku-4-5 | packages/agent/src/inngest/functions/fact-extract.ts |
## Why Sonnet for streaming
Sonnet 4.5 is the right cost/latency/quality trade-off for typed-tool-call-driven dialog. It reliably picks tools from the registry, rarely hallucinates entities that don't appear in memory_recall results, and keeps tone consistent across turns. Opus is a noticeable upgrade for long-form synthesis (the weekly review) but 3-5× slower and roughly 5× more expensive, which makes it overkill for chat turns.
## Why Haiku for fact extraction
Fact extraction runs asynchronously after each conversation turn. We need structured output (an array of fact objects), good-enough quality, and very low cost; Haiku hits all three. generateObject with a Zod schema makes the output shape non-negotiable.
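The facts then get a cheap cleanup pass before they touch the graph. A minimal sketch of that pass, assuming a fact shape with subject/predicate/object/confidence fields — the field names here are illustrative, not the actual Zod schema in fact-extract.ts:

```typescript
// Hypothetical shape of one extracted fact. The real schema is a Zod
// object in fact-extract.ts; these field names are assumptions.
interface Fact {
  subject: string;
  predicate: string;
  object: string;
  confidence: number; // 0..1, as reported by the model
}

// Drop malformed or low-confidence facts and dedupe exact repeats
// before anything reaches the knowledge graph.
function normalizeFacts(raw: Fact[], minConfidence = 0.5): Fact[] {
  const seen = new Set<string>();
  const out: Fact[] = [];
  for (const f of raw) {
    if (!f.subject || !f.predicate || !f.object) continue;
    if (f.confidence < minConfidence) continue;
    const key = `${f.subject}|${f.predicate}|${f.object}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(f);
  }
  return out;
}
```

The actual pipeline gets the array from generateObject, so the Zod parse has already enforced types; this pass only handles semantic junk (duplicates, empty strings, low confidence).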
## Why Voyage instead of OpenAI embeddings
voyage-3-large sits at or near the top of MTEB retrieval leaderboards at 1024 dimensions, and the accompanying rerank-2 cross-encoder lifts top-K precision dramatically over dense-only recall. The cost is comparable to text-embedding-3-large. Swapping later is a one-file change in @soma/tools/shared/embed.ts, but there's no reason to.
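The two-stage shape of that retrieval path can be sketched in a few lines: dense recall narrows the corpus by cosine similarity, then a reranker re-scores only that smaller candidate set. Here the rerank-2 API call is replaced by a plain scoring callback, so this is a structural sketch, not the production code in embed.ts/rerank.ts:

```typescript
type Scored = { id: string; score: number };

// Cosine similarity between two dense vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1: dense recall — top-N candidates by cosine similarity.
function denseRecall(
  query: number[],
  docs: Map<string, number[]>,
  topN: number,
): Scored[] {
  return [...docs.entries()]
    .map(([id, vec]) => ({ id, score: cosine(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

// Stage 2: rerank — re-score the candidates with a cross-encoder
// (rerank-2 in production; an injected callback here) and keep top-K.
function rerank(
  candidates: Scored[],
  score: (id: string) => number,
  topK: number,
): Scored[] {
  return candidates
    .map(({ id }) => ({ id, score: score(id) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The point of the split is cost: the cross-encoder is expensive per pair, so it only ever sees the dense top-3K, never the whole corpus.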
## Agent definition
```ts
// packages/agent/src/agents/soma.ts
export const somaAgent = new Agent({
  name: 'soma',
  description: 'SOMA — personal AI assistant with knowledge-graph memory.',
  instructions: SYSTEM_PROMPT,
  model: anthropic('claude-sonnet-4-5-20250929'),
  tools: allTools,
  ...(memory ? { memory } : {}),
});
```

allTools is the dictionary of tool handles from @soma/tools/index.ts. See Tool registry.
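For orientation, the registry is just a plain object keyed by tool name. A hedged sketch of that shape — the real entries are Mastra tool handles with richer typing, and these tool names are assumptions, not the actual exports of @soma/tools/index.ts:

```typescript
// Illustrative shape of the tool registry. Production entries are Mastra
// tool handles; this sketch only shows the dictionary structure the
// agent receives via its `tools` option.
type ToolHandle = {
  description: string;
  execute: (input: Record<string, unknown>) => Promise<unknown>;
};

const allTools: Record<string, ToolHandle> = {
  memory_recall: {
    description: 'Search the knowledge graph for entities and facts.',
    execute: async (input) => ({ hits: [], query: input.query }),
  },
  memory_write: {
    description: 'Persist a new fact to the knowledge graph.',
    execute: async () => ({ ok: true }),
  },
};
```

Because the registry is a flat dictionary, the model sees each key as a callable tool name, which is why stable, descriptive keys matter more than the handler internals.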