Runbook
Daily commands, common diagnostic steps, secret rotation.
Daily
| Command | What it does |
|---|---|
| `doppler run -- pnpm dev` | Full local stack (Next dev + bot polling on demand) |
| `pnpm typecheck` | All 10 workspaces, ~5s warm |
| `pnpm test` | Vitest across core/db/tools/ui/web/bot |
| `pnpm lint` | ESLint across all 9 packages with lint scripts |
| `pnpm format:check` | Prettier check |
| `doppler open --project soma --config prod` | Opens Doppler UI |
| `vercel logs soma-ai.cc --follow` | Tail prod logs |
| `vercel --prod --yes` | Deploy web |
Bot dev loop
# Terminal 1: web (Inngest handler + Telegram webhook)
doppler run -- pnpm --filter @soma/web dev
# Terminal 2: bot polling (no public URL needed)
doppler run -- pnpm --filter @soma/bot dev
With both running, the polling bot receives the messages, so you can test agent flows without setting up a Telegram webhook.
Checking the agent is alive
curl -s https://soma-ai.cc/api/health
# {"ok":true,"service":"web","db":true,"ts":"..."}
curl -s https://soma-ai.cc/api/inngest \
| python3 -c "import json,sys;d=json.load(sys.stdin);print({k:d[k] for k in ['mode','function_count']})"
# {'mode': 'cloud', 'function_count': 8}
Rotating a secret
- Update in Doppler: `doppler secrets set KEY='...' --project soma --config prod`
- Push to Vercel: `printf '%s' "$NEW" | vercel env add KEY production --force`
- Redeploy: `vercel --prod --yes`
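The three steps above can be scripted so they always run in the same order. This is a minimal sketch, not an existing project script: `rotation_plan` and `rotate` are hypothetical names, and the only CLI flags used are the ones already shown in this runbook. It defaults to a dry run that prints redacted commands.

```python
from __future__ import annotations

import subprocess

PROJECT = "soma"  # project and config names taken from the commands above
CONFIG = "prod"


def rotation_plan(key: str, value: str) -> list[tuple[list[str], str | None]]:
    """Return (argv, stdin) pairs for the three rotation steps, in order."""
    if not key or not value:
        raise ValueError("key and value must be non-empty")
    return [
        # 1. Update in Doppler. As in the runbook command, the value is passed
        #    on argv, so it is visible in the local process list while running.
        (["doppler", "secrets", "set", f"{key}={value}",
          "--project", PROJECT, "--config", CONFIG], None),
        # 2. Push to Vercel. The value is piped on stdin, like `printf '%s'`.
        (["vercel", "env", "add", key, "production", "--force"], value),
        # 3. Redeploy so the running functions pick up the new value.
        (["vercel", "--prod", "--yes"], None),
    ]


def rotate(key: str, value: str, dry_run: bool = True) -> list[str]:
    """Run the plan. With dry_run=True, only return the redacted commands."""
    shown = []
    for argv, stdin in rotation_plan(key, value):
        shown.append(" ".join(arg.replace(value, "***") for arg in argv))
        if not dry_run:
            # check=True raises on the first failing step, so a failed
            # Doppler update never leads to a redeploy.
            subprocess.run(argv, input=stdin, text=True, check=True)
    return shown
```

Calling `rotate("KEY", new_value)` returns the three command lines with the value masked; pass `dry_run=False` only after reviewing them.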
For the Telegram webhook secret specifically, also re-run `setWebhook` with the new secret.
Rotating the OAuth encryption key is more complex. Existing tokens in `oauth_connections` are encrypted with the current key, so rotating it means either re-encrypting all rows with a background job or forcing users to reconnect. There is no shortcut.
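The re-encryption option can be sketched as a single pass over the rows. This is an illustration under assumptions, not the project's job: the row keys `id` and `encrypted_token` are hypothetical, and `decrypt_old` / `encrypt_new` stand in for whatever cipher the app actually uses with the old and new keys. Rows that fail to decrypt are collected so those users can be asked to reconnect.

```python
from __future__ import annotations

from typing import Callable, Iterable


def reencrypt_rows(
    rows: Iterable[dict],
    decrypt_old: Callable[[bytes], bytes],
    encrypt_new: Callable[[bytes], bytes],
) -> tuple[list[dict], list]:
    """Re-encrypt each row's token; return (updated_rows, failed_row_ids)."""
    updated, failed = [], []
    for row in rows:
        try:
            plaintext = decrypt_old(row["encrypted_token"])
        except Exception:
            # Undecryptable row: leave it untouched and report it, so the
            # fallback (forcing a reconnect) applies only to these users.
            failed.append(row["id"])
            continue
        # Copy the row so the caller decides when to write it back.
        updated.append({**row, "encrypted_token": encrypt_new(plaintext)})
    return updated, failed
```

Returning the updated rows instead of writing them keeps the sketch independent of the database layer; a real job would write each batch in a transaction and switch the app to the new key only once `failed` has been dealt with.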
Diagnosing a slow chat response
- Open Langfuse. Filter `tags: chat`. Find the trace for the slow request.
- Check the Claude span: if `time_to_first_token` is >3s, Anthropic latency is the cause.
- If tool calls dominate, look at `memory_recall`: is it hitting HNSW or doing a seq scan? `EXPLAIN ANALYZE` the query.
- If both LLM and tools look fine, check Vercel function logs for cold-start + GC pauses.
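The decision order above can be written down as a small triage helper. This is a sketch under assumptions: the function names are hypothetical, the durations are read off the trace by hand, and "tool calls dominate" is interpreted here as tool time exceeding LLM time. The second helper relies only on the fact that PostgreSQL's text-format `EXPLAIN` output names sequential scans as `Seq Scan on <table>`.

```python
from __future__ import annotations

TTFT_LIMIT_S = 3.0  # threshold from the runbook


def triage_slow_chat(ttft_s: float, llm_s: float, tools_s: float) -> str:
    """Map span timings from one trace to the next place to look."""
    if ttft_s > TTFT_LIMIT_S:
        return "anthropic-latency"
    if tools_s > llm_s:  # assumption: "dominate" means more time than the LLM
        return "tools"
    # LLM and tools both look fine: check function logs for cold starts.
    return "platform"


def has_seq_scan(explain_output: str) -> bool:
    """True if the plan text contains a (parallel) sequential scan node."""
    return "Seq Scan on " in explain_output
```

For example, a trace with a 0.8 s first token, 2 s of LLM time, and 5 s in tools returns `"tools"`, which points at `memory_recall` and its query plan.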
Diagnosing "agent stream onError"
The chat route logs every stream-level error via log.error({ err, userId }, 'agent stream onError'). Check Axiom or Vercel logs for the api/chat surface. Common causes:
- Claude 400 "invalid tool name": a tool id has a `.` (dot). Anthropic pattern is `^[a-zA-Z0-9_-]{1,128}$`. See Tool registry.
- Mastra worker crash: `the worker thread exited`. Pino transport or PgVector trying to init in Next runtime. See Observability.
- Postgres connection refused: Supabase direct hostname is IPv6-only on Pro without add-on. Use the session pooler.
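The first cause is easy to catch before a request ever reaches Claude. A minimal sketch, assuming the pattern quoted above is the one enforced; `invalid_tool_names` is a hypothetical helper, and apart from `memory_recall` the names in the usage note are made up for illustration.

```python
import re

# Pattern quoted in the runbook. fullmatch() anchors both ends, so the
# ^ and $ are not needed and a trailing newline cannot slip through.
TOOL_NAME = re.compile(r"[a-zA-Z0-9_-]{1,128}")


def invalid_tool_names(names):
    """Return the names that would be rejected, in their original order."""
    return [name for name in names if TOOL_NAME.fullmatch(name) is None]
```

`invalid_tool_names(["memory_recall", "calendar.list"])` returns `["calendar.list"]`, because the dot is outside the allowed character set.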
Incident response
- Site down? Vercel dashboard → Deployments → Promote the previous build (or `vercel rollback`).
- Telegram bot silent? Call `getWebhookInfo` and check for `last_error_message`. If Telegram can't reach the webhook, check Vercel deployment and DNS.
- Workflows not firing? Inngest dashboard → Apps → verify `soma-ai.cc/api/inngest` is healthy. Re-sync if needed.
- DB errors? Supabase Dashboard → Reports → check connection count and slow queries.
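During an incident, the checks from "Checking the agent is alive" and the `getWebhookInfo` step can be run as one script. This is a sketch, not an existing tool: the function names are hypothetical, the expected field values come from the sample outputs in this runbook, and the Telegram helper assumes you pass it the parsed JSON body of a `getWebhookInfo` call you made yourself.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

BASE_URL = "https://soma-ai.cc"


def fetch_json(path: str, timeout: float = 10.0) -> dict:
    """GET a JSON endpoint on the production site."""
    with urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.load(resp)


def web_problems(health: dict) -> list[str]:
    """Check the /api/health payload: expects ok and db to be true."""
    problems = []
    if not health.get("ok"):
        problems.append("web: ok is not true")
    if not health.get("db"):
        problems.append("web: db check failed")
    return problems


def inngest_problems(info: dict, expected_functions: int | None = None) -> list[str]:
    """Check the /api/inngest payload: expects mode 'cloud'."""
    problems = []
    if info.get("mode") != "cloud":
        problems.append(f"inngest: mode is {info.get('mode')!r}")
    if expected_functions is not None and info.get("function_count") != expected_functions:
        problems.append(f"inngest: function_count is {info.get('function_count')!r}")
    return problems


def telegram_problems(webhook_info: dict) -> list[str]:
    """Report last_error_message from a getWebhookInfo response, if any."""
    message = webhook_info.get("result", {}).get("last_error_message")
    return [f"telegram: {message}"] if message else []


def check_site() -> list[str]:
    """Run both HTTP checks; an empty list means nothing looked wrong."""
    return web_problems(fetch_json("/api/health")) + inngest_problems(
        fetch_json("/api/inngest")
    )
```

`check_site()` makes live requests, so an exception from it (timeout, DNS failure, non-200 status) is itself a signal that the site is down rather than degraded.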