
Runbook

Daily commands, common diagnostic steps, secret rotation.

Daily

| Command | What |
| --- | --- |
| `doppler run -- pnpm dev` | Full local stack (Next dev + bot polling on demand) |
| `pnpm typecheck` | All 10 workspaces, ~5s warm |
| `pnpm test` | Vitest across core/db/tools/ui/web/bot |
| `pnpm lint` | ESLint across all 9 packages with lint scripts |
| `pnpm format:check` | Prettier check |
| `doppler open --project soma --config prod` | Opens Doppler UI |
| `vercel logs soma-ai.cc --follow` | Tail prod logs |
| `vercel --prod --yes` | Deploy web |

Bot dev loop

# Terminal 1: web (Inngest handler + Telegram webhook)
doppler run -- pnpm --filter @soma/web dev

# Terminal 2: bot polling (no public URL needed)
doppler run -- pnpm --filter @soma/bot dev

With both running, the polling bot receives messages directly, so you can exercise agent flows without any Telegram webhook setup.

Checking the agent is alive

curl -s https://soma-ai.cc/api/health
# {"ok":true,"service":"web","db":true,"ts":"..."}

curl -s https://soma-ai.cc/api/inngest \
  | python3 -c "import json,sys;d=json.load(sys.stdin);print({k:d[k] for k in ['mode','function_count']})"
# {'mode': 'cloud', 'function_count': 8}
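The two checks above can be folded into one script for cron or CI. A minimal sketch in Python, assuming the response shapes shown above; the helper names are hypothetical, not part of the codebase:

```python
import json
import urllib.request

def health_ok(payload: dict) -> bool:
    """True when /api/health reports the service and DB both up."""
    return payload.get("ok") is True and payload.get("db") is True

def inngest_ok(payload: dict, expected_functions: int = 8) -> bool:
    """True when /api/inngest is in cloud mode with all functions registered."""
    return (payload.get("mode") == "cloud"
            and payload.get("function_count") == expected_functions)

def fetch(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Usage (hits the network):
#   ok = health_ok(fetch("https://soma-ai.cc/api/health")) \
#        and inngest_ok(fetch("https://soma-ai.cc/api/inngest"))
```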

Rotating a secret

  1. Update in Doppler: doppler secrets set KEY='...' --project soma --config prod
  2. Push to Vercel: printf '%s' "$NEW" | vercel env add KEY production --force
  3. Redeploy: vercel --prod --yes
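Since the three steps must run in order and the key name appears twice, a tiny dry-run helper that only builds the command strings (nothing is executed; `rotation_commands` is a hypothetical helper, and the new value still comes from `$NEW` in your shell) can make the sequence copy-pasteable:

```python
import shlex

def rotation_commands(key: str, project: str = "soma", config: str = "prod") -> list[str]:
    """Return the three rotation steps for a secret, in order.
    The secret value itself is never embedded in these strings."""
    k = shlex.quote(key)
    return [
        f"doppler secrets set {k}='...' --project {project} --config {config}",
        f"printf '%s' \"$NEW\" | vercel env add {k} production --force",
        "vercel --prod --yes",
    ]

for cmd in rotation_commands("EXAMPLE_KEY"):
    print(cmd)
```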

For Telegram webhook secret specifically, also re-run setWebhook with the new secret.

OAuth encryption key rotation is more involved: existing encrypted tokens in oauth_connections are keyed to the current value, so rotating means either re-encrypting every row with a background job or forcing users to reconnect. There is no shortcut.
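The re-encryption option has the shape of a one-off background job. A sketch of that shape only: the cipher here is a throwaway XOR-keystream stand-in, not the real scheme, and `access_token_enc` is a hypothetical column name; real table access, batching, and checkpointing are omitted:

```python
import base64
import hashlib

def _keystream(key: str, n: int) -> bytes:
    # Stand-in cipher for illustration only -- NOT the real scheme.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:n]

def encrypt(plaintext: str, key: str) -> str:
    data = plaintext.encode()
    ks = _keystream(key, len(data))
    return base64.b64encode(bytes(a ^ b for a, b in zip(data, ks))).decode()

def decrypt(token: str, key: str) -> str:
    data = base64.b64decode(token)
    ks = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks)).decode()

def reencrypt_rows(rows: list[dict], old_key: str, new_key: str) -> None:
    """Decrypt each stored token with the old key and re-encrypt with the
    new one, in place. A real job would batch, checkpoint, and verify."""
    for row in rows:
        row["access_token_enc"] = encrypt(
            decrypt(row["access_token_enc"], old_key), new_key)
```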

Diagnosing a slow chat response

  1. Open Langfuse. Filter tags: chat. Find the trace for the slow request.
  2. Check the Claude span — if time_to_first_token is >3s, Anthropic latency is the cause.
  3. If tool calls dominate, look at memory_recall — is it hitting HNSW or doing a seq scan? EXPLAIN ANALYZE the query.
  4. If both LLM and tools look fine, check Vercel function logs for cold-start + GC pauses.
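For step 3, a quick way to flag a plan that fell back to a sequential scan: a sketch that just inspects EXPLAIN ANALYZE output text (pgvector HNSW hits appear as "Index Scan using <index name>"; the heuristic and function name are illustrative, not a real tool):

```python
def uses_hnsw(plan_text: str) -> bool:
    """True when the query plan shows an index scan rather than a
    sequential scan over the memories table."""
    lowered = plan_text.lower()
    return "index scan using" in lowered and "seq scan on" not in lowered
```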

Diagnosing "agent stream onError"

The chat route logs every stream-level error via log.error({ err, userId }, 'agent stream onError'). Check Axiom or Vercel logs for the api/chat surface. Common causes:

  • Claude 400 "invalid tool name" — a tool id has a . (dot). Anthropic pattern is ^[a-zA-Z0-9_-]{1,128}$. See Tool registry.
  • Mastra worker crash — the worker thread exited, usually because a Pino transport or PgVector tried to initialize in the Next runtime. See Observability.
  • Postgres connection refused — Supabase direct hostname is IPv6-only on Pro without add-on. Use the session pooler.
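The first bullet's constraint is cheap to enforce at registry build time instead of discovering it as a Claude 400. A sketch using the pattern above (`assert_valid_tool_id` is a hypothetical helper, not in the codebase):

```python
import re

# Anthropic's tool name pattern, per the bullet above.
TOOL_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")

def assert_valid_tool_id(tool_id: str) -> str:
    """Raise early, when the registry is built, instead of at request time."""
    if not TOOL_ID_RE.fullmatch(tool_id):
        raise ValueError(
            f"tool id {tool_id!r} violates ^[a-zA-Z0-9_-]{{1,128}}$ "
            "(dots are the usual culprit)")
    return tool_id
```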

Incident response

  1. Site down? Vercel dashboard → Deployments → Promote the previous build (or vercel rollback).
  2. Telegram bot silent? Call getWebhookInfo and check last_error_message. If Telegram can't reach the webhook, check the Vercel deployment and DNS.
  3. Workflows not firing? Inngest dashboard → Apps → verify soma-ai.cc/api/inngest is healthy. Re-sync if needed.
  4. DB errors? Supabase Dashboard → Reports → check connection count and slow queries.