
Runbook

Daily commands, common diagnostic steps, secret rotation.

Daily

| Command | What |
| --- | --- |
| `doppler run -- pnpm dev` | Full local stack (Next dev + bot polling on demand) |
| `pnpm typecheck` | All 10 workspaces, ~5s warm |
| `pnpm test` | Vitest across core/db/tools/ui/web/bot |
| `pnpm lint` | ESLint across all 9 packages with lint scripts |
| `pnpm format:check` | Prettier check |
| `doppler open --project soma --config prod` | Opens Doppler UI |
| `vercel logs soma-ai.cc --follow` | Tail prod logs |
| `vercel --prod --yes` | Deploy web |

Bot dev loop

# Terminal 1: web (Inngest handler + Telegram webhook)
doppler run -- pnpm --filter @soma/web dev

# Terminal 2: bot polling (no public URL needed)
doppler run -- pnpm --filter @soma/bot dev

With both running, the polling bot receives messages directly, so you can exercise agent flows without any Telegram webhook setup.

Checking the agent is alive

curl -s https://soma-ai.cc/api/health
# {"ok":true,"service":"web","db":true,"ts":"..."}

curl -s https://soma-ai.cc/api/inngest \
  | python3 -c "import json,sys;d=json.load(sys.stdin);print({k:d[k] for k in ['mode','function_count']})"
# {'mode': 'cloud', 'function_count': 8}
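The two checks above can be folded into one script for cron or CI. A minimal sketch in Python, assuming the response shapes shown above; the helper names are hypothetical, not part of the codebase:

```python
import json
import urllib.request

def health_ok(payload: dict) -> bool:
    """True when /api/health reports the service and DB both up."""
    return payload.get("ok") is True and payload.get("db") is True

def inngest_ok(payload: dict, expected_functions: int = 8) -> bool:
    """True when /api/inngest is in cloud mode with all functions registered."""
    return (payload.get("mode") == "cloud"
            and payload.get("function_count") == expected_functions)

def fetch(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Usage (hits the network):
#   ok = health_ok(fetch("https://soma-ai.cc/api/health")) \
#        and inngest_ok(fetch("https://soma-ai.cc/api/inngest"))
```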

Rotating a secret

  1. Update in Doppler: doppler secrets set KEY='...' --project soma --config prod
  2. Push to Vercel: printf '%s' "$NEW" | vercel env add KEY production --force
  3. Redeploy: vercel --prod --yes
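Since the three steps must run in order and the key name appears twice, a tiny dry-run helper that only builds the command strings (nothing is executed; `rotation_commands` is a hypothetical helper, and the new value still comes from `$NEW` in your shell) can make the sequence copy-pasteable:

```python
import shlex

def rotation_commands(key: str, project: str = "soma", config: str = "prod") -> list[str]:
    """Return the three rotation steps for a secret, in order.
    The secret value itself is never embedded in these strings."""
    k = shlex.quote(key)
    return [
        f"doppler secrets set {k}='...' --project {project} --config {config}",
        f"printf '%s' \"$NEW\" | vercel env add {k} production --force",
        "vercel --prod --yes",
    ]

for cmd in rotation_commands("EXAMPLE_KEY"):
    print(cmd)
```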

For Telegram webhook secret specifically, also re-run setWebhook with the new secret.

OAuth encryption key rotation is more involved: existing encrypted tokens in oauth_connections are keyed to the current value, so rotating means either re-encrypting every row with a background job or forcing users to reconnect. There is no shortcut.
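The re-encryption option has the shape of a one-off background job. A sketch of that shape only: the cipher here is a throwaway XOR-keystream stand-in, not the real scheme, and `access_token_enc` is a hypothetical column name; real table access, batching, and checkpointing are omitted:

```python
import base64
import hashlib

def _keystream(key: str, n: int) -> bytes:
    # Stand-in cipher for illustration only -- NOT the real scheme.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:n]

def encrypt(plaintext: str, key: str) -> str:
    data = plaintext.encode()
    ks = _keystream(key, len(data))
    return base64.b64encode(bytes(a ^ b for a, b in zip(data, ks))).decode()

def decrypt(token: str, key: str) -> str:
    data = base64.b64decode(token)
    ks = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks)).decode()

def reencrypt_rows(rows: list[dict], old_key: str, new_key: str) -> None:
    """Decrypt each stored token with the old key and re-encrypt with the
    new one, in place. A real job would batch, checkpoint, and verify."""
    for row in rows:
        row["access_token_enc"] = encrypt(
            decrypt(row["access_token_enc"], old_key), new_key)
```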

Diagnosing a slow chat response

  1. Open Langfuse. Filter tags: chat. Find the trace for the slow request.
  2. Check the Claude span — if time_to_first_token is >3s, Anthropic latency is the cause.
  3. If tool calls dominate, look at memory_recall — is it hitting HNSW or doing a seq scan? EXPLAIN ANALYZE the query.
  4. If both LLM and tools look fine, check Vercel function logs for cold-start + GC pauses.
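For step 3, a quick way to flag a plan that fell back to a sequential scan: a sketch that just inspects EXPLAIN ANALYZE output text (pgvector HNSW hits appear as "Index Scan using <index name>"; the heuristic and function name are illustrative, not a real tool):

```python
def uses_hnsw(plan_text: str) -> bool:
    """True when the query plan shows an index scan rather than a
    sequential scan over the memories table."""
    lowered = plan_text.lower()
    return "index scan using" in lowered and "seq scan on" not in lowered
```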

Diagnosing "agent stream onError"

The chat route logs every stream-level error via log.error({ err, userId }, 'agent stream onError'). Check Axiom or Vercel logs for the api/chat surface. Common causes:

  • Claude 400 "invalid tool name" — a tool id has a . (dot). Anthropic pattern is ^[a-zA-Z0-9_-]{1,128}$. See Tool registry.
  • Mastra worker crash — the worker thread exited, usually because a Pino transport or PgVector tried to initialize in the Next runtime. See Observability.
  • Postgres connection refused — Supabase direct hostname is IPv6-only on Pro without add-on. Use the session pooler.
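The first bullet's constraint is cheap to enforce at registry build time instead of discovering it as a Claude 400. A sketch using the pattern above (`assert_valid_tool_id` is a hypothetical helper, not in the codebase):

```python
import re

# Anthropic's tool name pattern, per the bullet above.
TOOL_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")

def assert_valid_tool_id(tool_id: str) -> str:
    """Raise early, when the registry is built, instead of at request time."""
    if not TOOL_ID_RE.fullmatch(tool_id):
        raise ValueError(
            f"tool id {tool_id!r} violates ^[a-zA-Z0-9_-]{{1,128}}$ "
            "(dots are the usual culprit)")
    return tool_id
```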

Incident response

  1. Site down? Vercel dashboard → Deployments → Promote the previous build (or vercel rollback).
  2. Telegram bot silent? Call getWebhookInfo and check last_error_message. If Telegram can't reach the webhook, check the Vercel deployment and DNS.
  3. Workflows not firing? Inngest dashboard → Apps → verify soma-ai.cc/api/inngest is healthy. Re-sync if needed.
  4. DB errors? Supabase Dashboard → Reports → check connection count and slow queries.