A serious AI voter-engagement system is not a chatbot with a phone number. It is a distributed system that ingests millions of voter records, routes them through telephony in compliance with TRAI rules, manages real-time multi-language conversations on top of a state-of-the-art ML stack, and produces structured intelligence that feeds back into the campaign's strategic decisions — all while honouring ECI rules and DPDP constraints.
This guide is the reference architecture. It is what a serious engineering team would build, and what serious vendors deliver as a service. Everything below is what we wish someone had written down in 2023.
High-level architecture
The engine has six layers:
- Data ingestion — voter lists, electoral roll, DND scrubbing
- Orchestration — call scheduling, throttling, retry logic
- Conversation runtime — STT, LLM, TTS, telephony
- Storage and audit — call records, transcripts, audit logs
- Analytics and dashboard — sentiment, intent, daily aggregates
- CRM and CRM-adjacent integrations — Salesforce, panna pramukh apps, WhatsApp, SMS
Each layer has its own service boundary, its own SLOs and its own failure modes. The mistake first-time builders make is to treat this as a single monolithic application.
Layer 1: Data ingestion
Inputs flow in from three sources:
- ECI electoral roll (per-AC PDF/CSV, contains voter ID, name, age, gender, booth)
- Campaign-collected data (door-to-door survey results, event attendance, donations)
- Third-party enrichment (phone-to-name matching from telco partner, scheme-eligibility flags from government data partner)
The ingestion pipeline runs daily during a campaign and weekly otherwise. Key transformations:
- Phone normalisation to international format (+91…) with validity check
- DND scrubbing against the National DND registry (real-time API)
- Deduplication by EPIC voter ID, then by phone, then by name+location fuzzy match
- Booth-level segmentation — every voter is tagged with their booth + AC + parliamentary constituency
- Language tag — best-guess at voter's primary language based on geography + name pattern
Output: a normalised voter table with hashed phone numbers and metadata. Stored in a Postgres-compatible database (CloudSQL India, RDS Mumbai, or self-hosted on India-hosted infrastructure for DPDP residency).
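As a rough illustration of the first and third transformations, the sketch below normalises a phone number and deduplicates by EPIC ID and then by phone. It assumes Indian mobile numbers are 10 digits starting with 6 to 9; the function names and the record keys `epic` and `phone` are hypothetical, and the fuzzy name+location pass is left out.

```python
import re
from typing import Dict, List, Optional

# Assumption: a valid Indian mobile number is 10 digits starting with 6-9.
_MOBILE_RE = re.compile(r"^[6-9]\d{9}$")


def normalise_phone(raw: str) -> Optional[str]:
    """Return the number as +91XXXXXXXXXX, or None if it fails the validity check."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, '+', brackets
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # strip country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]  # strip trunk prefix
    return f"+91{digits}" if _MOBILE_RE.match(digits) else None


def dedupe(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record per EPIC ID, then the first per phone.

    The fuzzy name+location pass from the text is intentionally omitted.
    """
    seen_epic, seen_phone, kept = set(), set(), []
    for rec in records:
        epic, phone = rec.get("epic"), rec.get("phone")
        if (epic and epic in seen_epic) or (phone and phone in seen_phone):
            continue
        if epic:
            seen_epic.add(epic)
        if phone:
            seen_phone.add(phone)
        kept.append(rec)
    return kept
```

Numbers that return `None` would typically be routed to a rejects table for manual review instead of being silently dropped.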
Layer 2: Orchestration
This is the layer that decides who gets called when. It has more business logic than the conversation runtime.
Scheduling. Each call wave has a target voter cohort (e.g., "all undecided voters in AC-204, booths 1-50"), a target window (T-30 to T-25 days) and a target completion rate (e.g., 70%). The orchestrator distributes calls across the window to:
- Avoid concurrency spikes
- Respect telco rate limits (typically 50–200 calls/second per sender ID)
- Honour time-of-day rules (no calls before 9am or after 8pm per TRAI guidelines)
- Avoid contacting the same voter more than once per 48 hours (retries of calls that never connected are handled separately)
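The time-of-day and 48-hour rules above reduce to a small eligibility check, and avoiding concurrency spikes can be approximated by spacing calls evenly over the window. This is a minimal sketch: the function names are hypothetical, timestamps are assumed to be naive local (IST) datetimes, and telco rate limits are not modelled.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional

WINDOW_OPEN = time(9, 0)    # from the text: no calls before 9am
WINDOW_CLOSE = time(20, 0)  # from the text: no calls after 8pm
MIN_GAP = timedelta(hours=48)


def may_call(now: datetime, last_contact: Optional[datetime]) -> bool:
    """True if `now` is inside the calling window and the voter has rested 48 hours."""
    in_window = WINDOW_OPEN <= now.time() < WINDOW_CLOSE
    rested = last_contact is None or (now - last_contact) >= MIN_GAP
    return in_window and rested


def spread_evenly(n_calls: int, start: datetime, end: datetime) -> List[datetime]:
    """Space n_calls at equal intervals across [start, end) to avoid bursts."""
    if n_calls <= 0:
        return []
    step = (end - start) / n_calls
    return [start + i * step for i in range(n_calls)]
```

A production scheduler would also add jitter and cap the per-second rate per sender ID; even spacing is only the simplest starting point.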
Throttling. When carrier spam-flagging is detected (connect rate drops below 50% within an hour), the orchestrator automatically reduces calls/hour and rotates sender IDs.
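One way to sketch this throttle is shown below. The 50% trigger comes from the text; halving the rate, the floor of 100 calls/hour and the round-robin sender rotation are assumptions chosen for illustration, and the `Throttle` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Throttle:
    calls_per_hour: int
    sender_ids: List[str]
    floor: int = 100   # assumption: never throttle below this rate
    active: int = 0    # index of the sender ID currently in use

    @property
    def current_sender(self) -> str:
        return self.sender_ids[self.active]

    def observe(self, attempted: int, connected: int) -> bool:
        """Feed in the last hour's counts; returns True if throttling kicked in."""
        if attempted == 0 or connected / attempted >= 0.5:
            return False
        # Assumption: back off by halving, then rotate to the next sender ID.
        self.calls_per_hour = max(self.floor, self.calls_per_hour // 2)
        self.active = (self.active + 1) % len(self.sender_ids)
        return True
```

The orchestrator would call `observe` once per hourly window and read `calls_per_hour` and `current_sender` when building the next batch.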
Retry logic. Unanswered calls retry once after 4 hours. Busy lines retry twice. Hangups don't retry — the voter has expressed disinterest.
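These rules fit in a small lookup table. In the sketch below the retry counts and the 4-hour delay for unanswered calls come from the text; the 30-minute delay for busy lines is an assumption, since the text does not give one, and the names are hypothetical.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NO_ANSWER = "no_answer"
    BUSY = "busy"
    HANGUP = "hangup"
    COMPLETED = "completed"


MAX_RETRIES = {Outcome.NO_ANSWER: 1, Outcome.BUSY: 2}  # hangups: no retry
RETRY_DELAY = {
    Outcome.NO_ANSWER: timedelta(hours=4),   # from the text
    Outcome.BUSY: timedelta(minutes=30),     # assumption, not from the text
}


def next_retry_delay(outcome: Outcome, retries_so_far: int) -> Optional[timedelta]:
    """Return how long to wait before retrying, or None if no retry is due."""
    if retries_so_far >= MAX_RETRIES.get(outcome, 0):
        return None
    return RETRY_DELAY[outcome]
```

Any retry produced here still has to pass the orchestrator's time-of-day check before it is dialled.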
Use-case routing. A single voter may be in multiple campaigns simultaneously (persuasion wave + GOTV wave + scheme-awareness). The orchestrator decides which use case wins for any given moment, based on priority rules.
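The simplest form of such a priority rule is a fixed ordering. The order below (GOTV first, then persuasion, then scheme awareness) is purely a hypothetical example; real campaigns would likely vary it by days remaining to the poll.

```python
from typing import Iterable, Optional

# Hypothetical priority order, highest first.
PRIORITY = ("gotv", "persuasion", "scheme_awareness")


def pick_use_case(enrolled: Iterable[str]) -> Optional[str]:
    """Return the highest-priority use case the voter is enrolled in, if any."""
    enrolled_set = set(enrolled)
    for name in PRIORITY:
        if name in enrolled_set:
            return name
    return None
```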
Output: a stream of call-execution requests sent to Layer 3.
Layer 3: Conversation runtime
This is the core. For each call:
- Telephony dial via SIP trunk or WebRTC. Indian-numbered sender pool. Carrier routing through Jio/Airtel/Vi/BSNL.
- STT stream opens as soon as voter answers. 16kHz PCM, streaming partial transcripts.
- LLM inference with system prompt + conversation history + RAG-retrieved KB chunks.
- TTS stream outputs response audio in chunks. 24kHz PCM transcoded to 8kHz for telephony.
- VAD (voice activity detection) runs continuously. Detects voter speaking — pauses TTS. Detects end of utterance — triggers next LLM inference.
- HARD STOP rules evaluated at each turn: goodbye detected, anger detected, two silent turns, 90-sec cap.
- Call termination with structured record write to Layer 4.
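The HARD STOP check in the list above can be expressed as a pure function evaluated once per turn. This is a sketch only: `TurnState` and its fields are hypothetical, "two silent turns" is read as two consecutive silent turns, and the goodbye and anger flags are assumed to come from upstream classifiers.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CALL_SECONDS = 90   # from the text: 90-sec cap
MAX_SILENT_TURNS = 2    # from the text: two silent turns


@dataclass
class TurnState:
    elapsed_sec: float
    consecutive_silent_turns: int
    goodbye_detected: bool
    anger_detected: bool


def hard_stop_reason(state: TurnState) -> Optional[str]:
    """Return the reason the call must end now, or None to continue."""
    if state.goodbye_detected:
        return "goodbye"
    if state.anger_detected:
        return "anger"
    if state.consecutive_silent_turns >= MAX_SILENT_TURNS:
        return "silence"
    if state.elapsed_sec >= MAX_CALL_SECONDS:
        return "time_cap"
    return None
```

Returning a reason string instead of a boolean lets the structured call record capture why each call ended.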
Latency targets per stage (recap from the technical guide):
- STT: <250ms
- LLM time-to-first-token: <500ms
- TTS time-to-first-audio: <250ms
- Telephony (in-country): <200ms
Total end-to-end: <1200ms in target conditions.
The runtime must be co-located across the path. Running STT in Singapore, LLM in Mumbai and TTS in Frankfurt produces 2–3× the target latency through accumulated network hops.
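The per-stage targets above sum to the end-to-end figure, which makes it easy to check a measured call against the budget. A minimal sketch, with hypothetical stage keys:

```python
from typing import Dict, List

# Per-stage targets from the list above, in milliseconds.
STAGE_TARGET_MS: Dict[str, int] = {
    "stt": 250,
    "llm_first_token": 500,
    "tts_first_audio": 250,
    "telephony": 200,
}
END_TO_END_TARGET_MS = sum(STAGE_TARGET_MS.values())  # 1200


def stages_over_target(measured_ms: Dict[str, float]) -> List[str]:
    """Return the stages whose measured latency misses its target."""
    return [
        stage
        for stage, target in STAGE_TARGET_MS.items()
        if measured_ms.get(stage, 0) >= target
    ]
```

Logging the offending stage per call, not just the total, is what makes a cross-region hop visible when latency regresses.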
Layer 4: Storage and audit
Three storage tiers, each with different access patterns:
Hot tier (real-time):
- Active calls table — currently-running conversations
- 24-hour transcripts — for war-room review
- Daily sentiment aggregates per booth
- Storage: Postgres + Redis (cache)
Warm tier (last 90 days):
- Full call recordings (compressed audio)
- Searchable transcript index
- Per-voter conversation history
- Storage: S3-compatible object storage (India region) + Postgres index
Cold tier (compliance retention):
- Audit log — every system prompt change, every model swap, every release
- Voter-erasure log — every right-to-erasure request and confirmation
- Annual reports for ECI compliance
- Storage: cold S3/Glacier-equivalent, 24-month retention default
DPDP-compliance specifics:
- Phone numbers are hashed (HMAC-SHA256 with a rotating secret key) before storage. Plaintext phone exists only in transit and in the active-calls table.
- Voter consent is captured at first contact and recorded as a separate document tied to the voter hash.
- Right-to-erasure pipeline scans all tiers and produces a "completed" attestation within 7 days.
- All admin access to voter-identifiable data is logged.
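The phone-hashing step can be sketched with Python's standard `hmac` and `hashlib` modules. HMAC is keyed, so the rotating secret is passed as the key here. The version prefix on the output is an assumption added so that records hashed under an older secret remain identifiable after rotation; the function name is hypothetical.

```python
import hashlib
import hmac


def hash_phone(phone_e164: str, key: bytes, key_version: int) -> str:
    """Return a versioned HMAC-SHA256 digest of a normalised phone number.

    Assumption: the key is loaded from a secrets manager, never from code.
    """
    digest = hmac.new(key, phone_e164.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"v{key_version}:{digest}"


def same_voter(stored_hash: str, phone_e164: str, key: bytes, key_version: int) -> bool:
    """Constant-time comparison of a stored hash against a candidate number."""
    candidate = hash_phone(phone_e164, key, key_version)
    return hmac.compare_digest(stored_hash, candidate)
```

Because the same number hashes to the same value under a given key, the hash can still serve as a join key across tiers, which is what the erasure pipeline needs in order to find every record for a voter.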
Layer 5: Analytics and dashboard
The output that the campaign manager actually looks at.
Real-time dashboards (refresh every 60 seconds):
- Total calls today, completion rate, sentiment distribution
- Top 10 issues mentioned in last hour
- Booths with anomalous metrics (low completion, high hangup, negative sentiment)
- Active call count and concurrency
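The anomalous-booth panel can start out as simple threshold checks on per-booth metrics. The thresholds and metric names below are hypothetical placeholders, and sentiment is assumed to be a mean score on a -1 to 1 scale.

```python
from typing import Dict, List

# Hypothetical thresholds; tune per campaign.
MIN_COMPLETION = 0.50
MAX_HANGUP = 0.30
MIN_SENTIMENT = -0.20


def anomalous_booths(stats: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Map each flagged booth ID to the list of reasons it was flagged."""
    flagged: Dict[str, List[str]] = {}
    for booth, m in stats.items():
        reasons = []
        if m["completion_rate"] < MIN_COMPLETION:
            reasons.append("low_completion")
        if m["hangup_rate"] > MAX_HANGUP:
            reasons.append("high_hangup")
        if m["mean_sentiment"] < MIN_SENTIMENT:
            reasons.append("negative_sentiment")
        if reasons:
            flagged[booth] = reasons
    return flagged
```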
Daily aggregates (auto-generated 6am):
- Sentiment by booth-AC-PC, comparison vs prior day
- Top 20 issues with trending direction
- Demographic breakdown (age, gender) by intent class
- Cost per call, cost per meaningful conversation
Weekly reports (auto-generated Monday morning):
- Sentiment trajectory chart per booth
- Issue cluster analysis (which issues are correlated)
- Persuasion-window booths (where sentiment is moveable)
- Operations stats (uptime, latency p99, cost variance)
On-demand analysis:
- "Show me all calls from a specific district where voters mentioned roads"
- "Compare sentiment between 9am and 5pm calls"
- "What's the message that resonates best in this region?"
The intelligence layer here is what turns the raw conversation stream into strategic input for the campaign. Without this, the campaign is running calls blind.
Layer 6: Integrations
The engine has to plug into the rest of the campaign's stack.
Inbound integrations:
- ECI electoral roll fetch (per-AC, on-demand)
- DND registry sync (real-time API)
- Scheme-eligibility data (where available from state govt)
- Campaign survey results (door-to-door, rally attendance)
Outbound integrations:
- WhatsApp Business API (for cross-channel handoff)
- SMS gateway (for time-critical reminders)
- Panna pramukh / booth-worker apps (for ground-team coordination)
- CRM (Salesforce, custom Postgres, etc.) for grievance ticket creation
Webhook surface:
- Per-call completion webhook (to update CRM, trigger follow-ups)
- Sentiment threshold alert (page the war-room if sentiment in a booth drops below threshold)
- Operations alerts (telephony spam-flag, infrastructure failure)
Engineering team structure
For a campaign that builds and operates this engine in-house (rare — recommended only for permanent multi-election deployment):
- Lead engineer (full stack, owns architecture)
- Telephony engineer (SIP, DLT, carrier relationships)
- ML/AI engineer (system prompt, dialect tuning, retrieval)
- Data engineer (pipeline, dashboard, integrations)
- DevOps / SRE (infrastructure, latency, on-call)
- Compliance / legal liaison (ECI, TRAI, DPDP)
That's 6 specialised roles, ~₹3-5 cr annual personnel cost. Most state-scale campaigns cannot justify this in-house. Specialist vendors absorb this team cost across many campaigns.
SLOs to demand from any vendor
If you outsource, these are the SLOs to bake into the contract:
- Uptime: 99.5% during campaign window (about 3.6 hours max downtime per 30-day month)
- First-token latency p95: <800ms
- End-to-end call latency p95: <1500ms
- Call completion rate: 50%+ (lower indicates bad voice/script/list)
- Sentiment classification accuracy: 80%+ on a held-out test set
- Dashboard refresh latency: <5 minutes
- Right-to-erasure SLA: 7 days
- War-room support response: <15 minutes during campaign window, <2 hours otherwise
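When negotiating the uptime figure, it helps to convert the percentage into an actual downtime allowance. A small sketch, assuming a 30-day month by default (the function name is hypothetical):

```python
def max_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of allowed downtime over `days` at the given uptime percentage."""
    return (1 - uptime_pct / 100) * days * 24 * 60
```

At 99.5% over 30 days this gives 216 minutes (3.6 hours); at 99.9% it gives about 43 minutes. The gap between those two numbers is worth making explicit in the contract.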
What this engine looks like at maturity
A mature deployment runs the same engine across:
- The current campaign (active outbound + inbound)
- Permanent governance helpline (inbound, 5-year)
- Quarterly sentiment surveys (small outbound waves)
- Special-occasion campaigns (scheme awareness drives, emergency response)
The infrastructure is the same. The configuration (system prompt, voter cohort, language settings) changes per use case. Over a 5-year cycle, a constituency can accumulate ~10 million conversation records — a uniquely deep grounded dataset that no opinion poll can replicate.
Where to go next
- The 30-Day Deployment Playbook — execution
- 10 Must-Have Features — vendor checklist
- Voter Sentiment Analysis Pipeline — analytics deep dive
- AI Election Agent Pricing — cost benchmarking
The engine is six layers, each with its own complexity. Building it well takes a small dedicated team and 90+ days. Operating it well takes a 5-year horizon — which is exactly the horizon Indian elections run on.