Why your bot's database secretly runs the show
The first "successful" bot I shipped in 2018 fell over not because the model was bad, or the code was slow, but because the database was a mess.
Users got replies meant for someone else. Sessions randomly "forgot" who you were. Analytics were useless.
All because I treated the DB like an afterthought.
If you're building bots for real users, the database is the spine. Not the model. Not the gateway. The data model.
In 2026 we have nicer LLMs and shinier frameworks, but the same old pain:
- "Where do I store conversation state?"
- "How do I keep sessions separate?"
- "What do I log without blowing up storage?"
Let me walk through how I design databases for bots that actually run in production, not just in demos.
Core entities: users, sessions, messages
Every bot I ship starts with three tables. Names change, ideas don't:
- users - who is talking to you
- sessions - a span of related conversation
- messages - what was said, in order
users
I treat users as "who this platform thinks you are", not "the philosophical concept of a person".
```
users
  id           (pk)
  external_id  (unique, e.g., slack_user_id, phone)
  platform     (slack | web | whatsapp | etc.)
  created_at
  deleted_at   (nullable)
```
A Slack user and a WhatsApp user might be the same human, but I don't over-engineer "global identity" until I actually need it.
If you try to solve identity perfectly on day one, you stall.
sessions
A session is "a coherent conversation thread." For a support bot, it might start when someone asks a question and end after 30 minutes of silence or when an agent closes the ticket.
```
sessions
  id          (pk)
  user_id     (fk users.id)
  platform    (again, slack/web/etc.)
  status      (active | closed)
  started_at
  ended_at    (nullable)
  metadata    (jsonb)  -- e.g., topic, channel_id, ticket_id
```
I always store metadata as JSON for platform-specific junk I don't want leaking into the core schema.
Example: Slack channel_id, web browser_session_id, whatever.
messages
Messages are the single source of truth for "what actually happened."
```
messages
  id                   (pk)
  session_id           (fk sessions.id)
  sender_type          (user | bot | system)
  sender_id            (nullable, fk users.id for user, maybe bot_id later)
  external_message_id  (nullable, e.g., slack_ts)
  content              (text)
  content_type         (text | json | rich)
  created_at
  raw_payload          (jsonb, nullable)
```
Two reasons this table matters:
- Debugging: when someone says "the bot went crazy," you need an exact transcript.
- Training: you'll want real conversations for fine-tuning or prompt tweaks.
For a bot handling ~50k messages/day (one we run in Postgres 15 on AWS RDS), this structure has held up fine with proper indexing on session_id and created_at. Querying a session's last 50 messages is still cheap.
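That access path is worth seeing concretely. Here is a minimal sketch of the query and the composite index behind it, using in-memory SQLite as a stand-in for Postgres (the SQL is the same shape); table and column names follow the sketch above.

```python
import sqlite3

# Composite index on (session_id, created_at): the DB can walk the index
# backwards for one session instead of sorting the whole messages table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        session_id TEXT,
        sender_type TEXT,
        content TEXT,
        created_at TEXT
    );
    CREATE INDEX idx_messages_session_created
        ON messages (session_id, created_at);
""")

# Seed 60 messages in one session, one per minute.
conn.executemany(
    "INSERT INTO messages (session_id, sender_type, content, created_at)"
    " VALUES (?, ?, ?, ?)",
    [("s1", "user", f"msg {i}", f"2026-01-10 09:{i:02d}:00") for i in range(60)],
)

# "Last 50 messages of a session": filter on session_id, newest first.
last_50 = conn.execute(
    """SELECT content FROM messages
       WHERE session_id = ?
       ORDER BY created_at DESC
       LIMIT 50""",
    ("s1",),
).fetchall()
```

With the index in place this stays an index scan plus a LIMIT, which is why it stays cheap even as the table grows.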
Where to put bot "state" so it doesn't bite you
The biggest design mistake I see: stuffing state everywhere. Redis. JWTs. Random JSON columns. Hidden in prompt strings.
Then nobody knows which state is the real one.
Short-term state: cache it
Things you need only during the current interaction, and that you can reconstruct from scratch, go into fast storage:
- Redis, KeyDB, or an in-memory store for ephemeral keys
- Expiration in minutes, not days
Example keys I actually use:
```
session_state:{session_id} -> json, ttl=30m
rate_limit:{user_id}       -> counters, ttl=1h
otp:{phone}                -> code, ttl=5m
```
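A minimal sketch of how those keys get written, assuming a redis-py-style client (any object with a `set(key, value, ex=seconds)` method works). The helper names are illustrative, not from a real codebase.

```python
from datetime import timedelta

# Key-naming helpers matching the layout above. Centralizing key formats
# in functions keeps "session_state:{session_id}" from being retyped
# (and mistyped) all over the codebase.
def session_state_key(session_id: str) -> str:
    return f"session_state:{session_id}"

def rate_limit_key(user_id: str) -> str:
    return f"rate_limit:{user_id}"

def save_session_state(client, session_id: str, state_json: str) -> None:
    # ex= sets the TTL in seconds: 30 minutes, matching the table above.
    ttl = int(timedelta(minutes=30).total_seconds())
    client.set(session_state_key(session_id), state_json, ex=ttl)
```

Passing the client in (rather than creating it at import time) also makes this trivial to test with a stub.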
If Redis dies, your bot might forget where it was in a form. Annoying, but not catastrophic.
That's the right kind of thing to put there.
Persistent state: database it
Anything youâd regret losing goes into a real database:
- Onboarding progress
- Feature flags per user
- Long-running workflows ("loan application #1234 step 3/7")
I usually keep this in a separate table instead of bloating sessions:
```
conversation_state
  id          (pk)
  session_id  (fk sessions.id, unique)
  state_name  (text)   -- e.g., "verify_email", "collect_address"
  data        (jsonb)  -- arbitrary structured state
  updated_at
```
Yes, JSON. No, not for everything. But for bot flow state, JSON is fine. It changes often, and you're the only consumer.
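The unique constraint on session_id is what makes this table safe to write repeatedly: saving state is an upsert, not an insert. A sketch of the pattern, demonstrated with in-memory SQLite (in Postgres, data would be jsonb and the ON CONFLICT syntax is the same):

```python
import json
import sqlite3

# One row of flow state per session, enforced by UNIQUE(session_id).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conversation_state (
        id         INTEGER PRIMARY KEY,
        session_id TEXT UNIQUE NOT NULL,
        state_name TEXT NOT NULL,
        data       TEXT NOT NULL,
        updated_at TEXT NOT NULL DEFAULT (datetime('now'))
    )
""")

def save_state(session_id: str, state_name: str, data: dict) -> None:
    # Upsert: re-saving overwrites the session's row instead of duplicating it,
    # so "what is the current state?" is always a single-row lookup.
    conn.execute(
        """INSERT INTO conversation_state (session_id, state_name, data)
           VALUES (?, ?, ?)
           ON CONFLICT(session_id) DO UPDATE SET
               state_name = excluded.state_name,
               data       = excluded.data,
               updated_at = datetime('now')""",
        (session_id, state_name, json.dumps(data)),
    )

save_state("s1", "collect_address", {"step": 1})
save_state("s1", "collect_address", {"step": 2})  # overwrites, no duplicate
```

The payoff of the unique key is that readers never have to ask "which of these rows is the real state?"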
Don't hide state in JWTs
Storing session logic in JWT claims is tempting. It's also how you get bugs where multiple clients disagree about reality.
I keep JWTs for auth, and that's it. State lives in the DB or cache.
Logging, analytics, and not drowning in data
Bots generate an absurd amount of data. If you log every token for every request and keep it forever, your infra bill will remind you monthly.
Split "runtime logs" and "analytics data"
I treat them as two different concerns:
- Runtime logs: for debugging. Short retention (7-30 days). Ship them to something like Loki, Elasticsearch, or CloudWatch.
- Analytics data: structured, queryable, long-term. Lives in SQL or a warehouse.
For analytics I like a separate schema or database:
```
conversation_metrics
  id          (pk)
  session_id
  user_id
  messages_user_count
  messages_bot_count
  started_at
  ended_at
  first_response_ms
  resolved    (bool)
```
You compute this via a job that scans messages nightly or via streaming.
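The nightly job is mostly one GROUP BY over messages. A sketch of the aggregation, using in-memory SQLite as a stand-in for Postgres; column names follow the messages and conversation_metrics sketches:

```python
import sqlite3

# Stand-in messages table with a tiny sample session.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        session_id TEXT,
        sender_type TEXT,   -- user | bot | system
        created_at TEXT
    );
    INSERT INTO messages (session_id, sender_type, created_at) VALUES
        ('s1', 'user', '2026-01-10 09:00:00'),
        ('s1', 'bot',  '2026-01-10 09:00:03'),
        ('s1', 'user', '2026-01-10 09:01:00');
""")

# One pass over messages, grouped by session. A real job would filter to
# yesterday's rows and INSERT the results into conversation_metrics.
metrics = conn.execute("""
    SELECT session_id,
           SUM(sender_type = 'user') AS messages_user_count,
           SUM(sender_type = 'bot')  AS messages_bot_count,
           MIN(created_at)           AS started_at,
           MAX(created_at)           AS ended_at
    FROM messages
    GROUP BY session_id
""").fetchall()
```

Because the metrics are derived entirely from messages, you can always drop and recompute them; the raw transcript stays the source of truth.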
In one customer bot we turned on nightly aggregation in March 2025 and cut support "what is going on?" questions in half because we could actually see dropout points.
Prompt and model logs
If your bot uses LLMs, keep a record of what you sent and what you got back, with a pointer to the message/session.
```
llm_calls
  id                (pk)
  session_id
  message_id        (fk messages.id, nullable)
  provider          (openai | anthropic | local)
  model
  prompt_tokens
  completion_tokens
  cost_usd
  request_payload   (jsonb)  -- redacted
  response_payload  (jsonb)  -- redacted
  created_at
```
Two important notes:
- Redact secrets and PII before persisting logs.
- Set a retention policy. You don't need 3 years of raw prompts for 99% of bots.
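On the redaction point: it has to happen before the row touches the database, not at query time. A minimal sketch; the two regexes are illustrative only (real PII detection needs far more than this), and build_llm_call_row is a hypothetical helper, not a real library API.

```python
import re

# Toy redaction patterns: emails and phone-number-ish digit runs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def build_llm_call_row(session_id: str, model: str,
                       prompt: str, response: str) -> dict:
    # Redact *before* persisting, so raw PII never reaches llm_calls.
    return {
        "session_id": session_id,
        "provider": "openai",
        "model": model,
        "request_payload": {"prompt": redact(prompt)},
        "response_payload": {"text": redact(response)},
    }
```

Keeping redaction in one function also gives you a single place to tighten the rules later without touching every call site.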
SQL vs NoSQL vs vector stores (and how I mix them)
Short version: I default to Postgres. I'll add other stores when the pain is real, not hypothetical.
Relational (Postgres, MySQL)
Perfect for:
- Users, sessions, messages, state
- Reporting, joins, migrations
- Strong consistency for "who said what and when"
Postgres with jsonb gives you enough flexibility to avoid spinning up five different databases too early.
Cache (Redis)
Use this for:
- Rate limits
- Ephemeral session state
- Small, hot lookup tables (config flags, feature toggles)
Vector store
If your bot does retrieval-augmented generation (RAG), you'll want embeddings and similarity search.
In practice I see three patterns:
- Postgres + pgvector
- Dedicated services like Pinecone, Weaviate, Qdrant
- Cloud-specific options like Aurora + pgvector, AlloyDB, etc.
I lean on pgvector until:
- Embeddings > ~5-10 million rows or
- Query latency requirements get tight (< 50ms) with heavy traffic
Schema sketch:
```
documents
  id         (pk)
  source     (kb | ticket | faq)
  source_id
  content
  metadata   (jsonb)
  created_at

document_embeddings
  id           (pk)
  document_id  (fk documents.id)
  embedding    vector(1536)
  created_at
```
Keep the "what" (documents) separate from the "how we search it" (embeddings).
You'll change models; you don't want that tangled with your core content.
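To make the split concrete, here is a pure-Python toy of the two-table pattern: rank by similarity over embeddings, then join back to documents via the foreign key. In production this would be a pgvector `ORDER BY embedding <=> query LIMIT k` instead; the 2-d vectors and sample rows are made up for illustration.

```python
import math

# "What": core content, keyed by document id.
documents = {
    1: "How do I reset my password?",
    2: "Shipping takes 3-5 business days.",
}

# "How we search it": embeddings in a separate structure, keyed by the
# same ids. Swap the embedding model and only this side gets rebuilt.
document_embeddings = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
}

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query_vec: list, k: int = 1) -> list:
    # Rank doc ids by similarity, then join back to documents.
    ranked = sorted(
        document_embeddings,
        key=lambda doc_id: cosine(query_vec, document_embeddings[doc_id]),
        reverse=True,
    )
    return [documents[doc_id] for doc_id in ranked[:k]]
```

Re-embedding with a new model is then a rebuild of document_embeddings only; documents never changes.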
FAQ: schema questions I keep getting
Q: Should I store full messages in the DB or just summaries?
Store full messages, at least for 30â90 days. Summaries are great for search and analytics, but when something breaks, you want the raw text.
If you're worried about storage, set a retention policy and archive old messages to cheaper storage.
Q: How do I handle multi-tenant bots (many customers)?
Add a tenant_id everywhere that matters: users, sessions, messages, conversation_state.
Index it. Every query that touches user data must filter by tenant_id.
If you need stricter isolation later, you can split tenants across schemas or databases, but start with a solid tenant column.
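One cheap way to enforce "every query filters by tenant_id" is to funnel user-data queries through helpers that refuse to run without a tenant. A sketch with in-memory SQLite; the helper name and sample data are illustrative:

```python
import sqlite3

# sessions with a tenant_id column, indexed as the answer above suggests.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        user_id TEXT NOT NULL
    );
    CREATE INDEX idx_sessions_tenant ON sessions (tenant_id);
    INSERT INTO sessions (tenant_id, user_id) VALUES
        ('acme', 'u1'),
        ('globex', 'u2');
""")

def sessions_for_tenant(tenant_id: str) -> list:
    # Guard rail: no tenant, no query. This turns a cross-tenant data leak
    # into a loud error instead of a silent bug.
    if not tenant_id:
        raise ValueError("tenant_id is required for all user-data queries")
    return conn.execute(
        "SELECT id, user_id FROM sessions WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()
```

If you later split tenants across schemas or databases, the helper is also the one place the routing logic has to change.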
Q: Where do I store files (images, PDFs) users send?
Not in the database. Store them in S3/GCS/Blob storage and keep references:
```
attachments
  id          (pk)
  message_id  (fk messages.id)
  file_url
  file_type
  size_bytes
  created_at
```
If you need to run OCR or embedding on them, store that output in separate tables linked back via attachment_id.
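The flow is: upload the bytes to object storage, then write only the reference row. A sketch with in-memory SQLite and a stub uploader (upload_to_object_store stands in for a real boto3 / google-cloud-storage call; the bucket name is made up):

```python
import sqlite3
import uuid

# Reference table only: the file bytes never enter the database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attachments (
        id          TEXT PRIMARY KEY,
        message_id  INTEGER,
        file_url    TEXT,
        file_type   TEXT,
        size_bytes  INTEGER
    )
""")

def upload_to_object_store(data: bytes, key: str) -> str:
    # Stub: a real version would PUT to S3/GCS and return the object URL.
    return f"s3://bot-attachments/{key}"

def save_attachment(message_id: int, data: bytes, file_type: str) -> str:
    att_id = str(uuid.uuid4())
    url = upload_to_object_store(data, att_id)
    conn.execute(
        "INSERT INTO attachments VALUES (?, ?, ?, ?, ?)",
        (att_id, message_id, url, file_type, len(data)),
    )
    return url
```

OCR or embedding output for a file would then go in its own table keyed by the attachment id, keeping derived data separate from the reference.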