Should we build this in-house?

For constituency-scale and most state-scale campaigns, no. The technical complexity (multi-region failover, dialect ML, telephony integration, DPDP-compliant data lake) is too high for a campaign team to deliver in the 30-90 day window. For permanent infrastructure used across multiple elections, a hybrid build-and-buy model can work.

What's the minimum technical team to operate the engine?

During campaign: 1 engineer + 1 data analyst + 1 ops on rotating coverage. Steady state: 0.5 engineer (vendor handles most production work).

How is voter privacy ensured in this architecture?

Phone numbers are hashed before storage. Call transcripts are stored with encryption-at-rest. Access logs track every read of voter-identifiable data. Right-to-erasure pipeline removes all data within 7 days of request. India-hosted infrastructure satisfies DPDP residency defaults.

Can this engine integrate with our existing CRM?

Yes — typical CRMs (Salesforce, HubSpot, custom Postgres-based systems) integrate via REST API or message queue. Field-team CRMs used by Indian parties (panna pramukh apps) integrate via a custom adapter, typically 3–5 days of engineering work.

What's the latency budget at each tier of the architecture?

STT: <250ms. LLM: <500ms. TTS: <250ms. Telephony: <200ms one-way. Total <1200ms end-to-end. Tighter budgets (<800ms) require GPU co-location and high-tier providers.

Building a Two-Way AI Voter Engagement Engine: A Technical Playbook

A serious AI voter-engagement system is not a chatbot with a phone number. It is a distributed system that ingests millions of voter records, routes them through telephony in compliance with TRAI rules, manages real-time multi-language conversations on top of a state-of-the-art ML stack, and produces structured intelligence that feeds back into the campaign's strategic decisions — all while honouring ECI rules and DPDP constraints.

This guide is the reference architecture. It is what a serious engineering team would build, and what serious vendors deliver as a service. Everything below is what we wish someone had written down in 2023.

High-level architecture

The engine has six layers:

Data ingestion — voter lists, electoral roll, DND scrubbing
Orchestration — call scheduling, throttling, retry logic
Conversation runtime — STT, LLM, TTS, telephony
Storage and audit — call records, transcripts, audit logs
Analytics and dashboard — sentiment, intent, daily aggregates
CRM and CRM-adjacent integrations — Salesforce, panna pramukh apps, WhatsApp, SMS

Each layer has its own service boundary, its own SLOs and its own failure modes. The mistake first-time builders make is to treat this as a single monolithic application.

Layer 1: Data ingestion

Inputs flow in from three sources:

ECI electoral roll (per-AC PDF/CSV, contains voter ID, name, age, gender, booth)
Campaign-collected data (door-to-door survey results, event attendance, donations)
Third-party enrichment (phone-to-name matching from telco partner, scheme-eligibility flags from government data partner)

The ingestion pipeline runs daily during campaign and weekly otherwise. Key transformations:

Phone normalisation to international format (+91…) with validity check
DND scrubbing against the National DND registry (real-time API)
Deduplication by EPIC voter ID, then by phone, then by name+location fuzzy match
Booth-level segmentation — every voter is tagged with their booth + AC + parliamentary constituency
Language tag — best-guess at voter's primary language based on geography + name pattern

Output: a normalised voter table with hashed phone numbers and metadata. Stored in a Postgres-compatible database (CloudSQL India, RDS Mumbai, or self-hosted on India-hosted infrastructure for DPDP residency).
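The phone normalisation and the first two deduplication passes can be sketched as follows. This is a minimal illustration, not the production pipeline: the record keys `epic` and `phone` are hypothetical, the validity check assumes 10-digit Indian mobile numbers starting with 6 to 9, and the fuzzy name+location pass is left out.

```python
import re
from typing import Optional


def normalise_indian_phone(raw: str) -> Optional[str]:
    """Return the number as +91XXXXXXXXXX, or None if it fails the validity check."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, brackets and '+'
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # strip the country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]  # strip the trunk prefix
    # Assumption: valid mobile numbers are 10 digits and start with 6, 7, 8 or 9.
    if len(digits) == 10 and digits[0] in "6789":
        return "+91" + digits
    return None


def dedupe(records: list) -> list:
    """Keep the first record per EPIC voter ID, then per normalised phone number."""
    seen_epic, seen_phone, kept = set(), set(), []
    for rec in records:
        epic = rec.get("epic")
        phone = normalise_indian_phone(rec.get("phone") or "")
        if epic and epic in seen_epic:
            continue  # same voter ID already ingested
        if phone and phone in seen_phone:
            continue  # same phone already ingested under another ID
        if epic:
            seen_epic.add(epic)
        if phone:
            seen_phone.add(phone)
        kept.append({**rec, "phone": phone})
    return kept
```

Records whose phone fails validation are kept with `phone` set to `None`, so they remain available for non-telephony channels; whether to drop them instead is a campaign decision.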

Layer 2: Orchestration

This is the layer that decides who gets called when. It has more business logic than the conversation runtime.

Scheduling. Each call wave has a target voter cohort (e.g., "all undecided voters in AC-204, booths 1-50"), a target window (e.g., T-30 to T-25 days) and a target completion rate (e.g., 70%). The orchestrator distributes calls across the window to:

Avoid concurrency spikes
Respect telco rate limits (typically 50–200 calls/second per sender ID)
Honour time-of-day rules (no calls before 9am or after 8pm per TRAI guidelines)
Avoid calling the same voter more than once per 48 hours
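The time-of-day and 48-hour rules reduce to a per-voter eligibility check that the scheduler can run before queuing a call. The sketch below assumes timestamps are naive datetimes in the voter's local time; the function and constant names are illustrative.

```python
from datetime import datetime, time, timedelta
from typing import Optional

CALL_WINDOW_START = time(9, 0)   # no calls before 9am
CALL_WINDOW_END = time(20, 0)    # no calls after 8pm
MIN_GAP = timedelta(hours=48)    # at most one call per voter per 48 hours


def may_call(now: datetime, last_contact: Optional[datetime]) -> bool:
    """True if the voter can be called at `now` under both rules."""
    in_window = CALL_WINDOW_START <= now.time() < CALL_WINDOW_END
    rested = last_contact is None or now - last_contact >= MIN_GAP
    return in_window and rested
```

Rate limits and concurrency smoothing are properties of the whole wave rather than of one voter, so they belong in the dispatcher that drains the queue, not in this check.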

Throttling. When carrier spam-flagging is detected (connect rate drops below 50% within an hour), the orchestrator automatically reduces calls/hour and rotates sender IDs.
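One way to detect the spam-flagging condition is a sliding one-hour window over call outcomes. In this sketch the class name and the `min_samples` floor (which avoids reacting to a handful of calls) are assumptions; the actual rate reduction and sender-ID rotation are left to the caller.

```python
from collections import deque


class ConnectRateMonitor:
    """Tracks connect rate over a sliding window and flags when it drops too low."""

    def __init__(self, window_seconds: float = 3600, threshold: float = 0.5,
                 min_samples: int = 100):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_samples = min_samples  # assumption: ignore very small samples
        self.events = deque()           # (timestamp_seconds, connected)

    def record(self, ts: float, connected: bool) -> None:
        self.events.append((ts, connected))
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop outcomes that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def should_throttle(self, now: float) -> bool:
        self._evict(now)
        if len(self.events) < self.min_samples:
            return False
        connected = sum(1 for _, ok in self.events if ok)
        return connected / len(self.events) < self.threshold
```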

Retry logic. Unanswered calls retry once after 4 hours. Busy lines retry twice. Hangups don't retry — the voter has expressed disinterest.
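The retry rules fit in a small lookup table. In the sketch below the outcome labels are hypothetical, and the 30-minute delay for busy lines is an assumption, since only the 4-hour delay for unanswered calls is specified above.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_RETRIES = {"no_answer": 1, "busy": 2, "hangup": 0}
NO_ANSWER_DELAY = timedelta(hours=4)
BUSY_DELAY = timedelta(minutes=30)  # assumption: not specified in the text


def next_attempt(outcome: str, retries_so_far: int,
                 ended_at: datetime) -> Optional[datetime]:
    """Return when to retry, or None if the voter should not be retried."""
    if retries_so_far >= MAX_RETRIES.get(outcome, 0):
        return None
    delay = NO_ANSWER_DELAY if outcome == "no_answer" else BUSY_DELAY
    return ended_at + delay
```

Any retry time returned here still has to pass the scheduler's time-of-day check before the call is placed.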

Use-case routing. A single voter may be in multiple campaigns simultaneously (persuasion wave + GOTV wave + scheme-awareness). The orchestrator decides which use case wins for any given moment, based on priority rules.

Output: a stream of call-execution requests sent to Layer 3.

Layer 3: Conversation runtime

This is the core. For each call:

Telephony dial via SIP trunk or WebRTC. Indian-numbered sender pool. Carrier routing through Jio/Airtel/Vi/BSNL.
STT stream opens as soon as voter answers. 16kHz PCM, streaming partial transcripts.
LLM inference with system prompt + conversation history + RAG-retrieved KB chunks.
TTS stream outputs response audio in chunks. 24kHz PCM transcoded to 8kHz for telephony.
VAD (voice activity detection) runs continuously. Detects voter speaking — pauses TTS. Detects end of utterance — triggers next LLM inference.
HARD STOP rules evaluated at each turn: goodbye detected, anger detected, two silent turns, 90-sec cap.
Call termination with structured record write to Layer 4.
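The HARD STOP check can be expressed as a pure function evaluated once per turn. This is a sketch under stated assumptions: the `TurnState` fields and the reason strings are illustrative, and the goodbye and anger flags are taken to come from upstream classifiers that are not shown.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CALL_SECONDS = 90
MAX_SILENT_TURNS = 2


@dataclass
class TurnState:
    elapsed_seconds: float
    consecutive_silent_turns: int
    goodbye_detected: bool
    anger_detected: bool


def hard_stop_reason(state: TurnState) -> Optional[str]:
    """Return why the call must end now, or None if it may continue."""
    if state.goodbye_detected:
        return "goodbye"
    if state.anger_detected:
        return "anger"
    if state.consecutive_silent_turns >= MAX_SILENT_TURNS:
        return "silence"
    if state.elapsed_seconds >= MAX_CALL_SECONDS:
        return "time_cap"
    return None
```

Returning the reason rather than a boolean lets the termination step write it into the structured call record.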

Latency targets per stage (recap from the technical guide):

STT: <250ms
LLM time-to-first-token: <500ms
TTS time-to-first-audio: <250ms
Telephony (in-country): <200ms

Total end-to-end: <1200ms in target conditions.
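The per-stage targets sum exactly to the end-to-end figure, which makes them easy to enforce as a budget. A minimal sketch, with hypothetical stage keys, that reports which stages of a measured call turn missed their target:

```python
BUDGET_MS = {
    "stt": 250,
    "llm_first_token": 500,
    "tts_first_audio": 250,
    "telephony": 200,
}
TOTAL_BUDGET_MS = 1200


def over_budget(measured_ms: dict) -> list:
    """Return the stages whose measured latency reached or exceeded the target."""
    return sorted(
        stage for stage, limit in BUDGET_MS.items()
        if measured_ms.get(stage, 0) >= limit
    )


# The stage targets add up to the end-to-end target.
assert sum(BUDGET_MS.values()) == TOTAL_BUDGET_MS
```

Because the budget has no slack, any stage that overruns pushes the whole turn past 1200ms unless another stage comes in under its target.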

Every stage of the runtime must be co-located in the same region. Running STT in Singapore, LLM in Mumbai and TTS in Frankfurt produces 2–3× the target latency through accumulated network hops.

Layer 4: Storage and audit

Three storage tiers, each with different access patterns:

Hot tier (real-time):

Active calls table — currently-running conversations
24-hour transcripts — for war-room review
Daily sentiment aggregates per booth
Storage: Postgres + Redis (cache)

Warm tier (last 90 days):

Full call recordings (compressed audio)
Searchable transcript index
Per-voter conversation history
Storage: S3-compatible object storage (India region) + Postgres index

Cold tier (compliance retention):

Audit log — every system prompt change, every model swap, every release
Voter-erasure log — every right-to-erasure request and confirmation
Annual reports for ECI compliance
Storage: cold S3/Glacier-equivalent, 24-month retention default

DPDP-compliance specifics:

Phone numbers are hashed (HMAC-SHA256 with rotating salt) before storage. Plaintext phone exists only in transit and in the active-calls table.
Voter consent is captured at first contact and recorded as a separate document tied to the voter hash.
Right-to-erasure pipeline scans all tiers and produces a "completed" attestation within 7 days.
All admin access to voter-identifiable data is logged.
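The phone-hashing step can be sketched with Python's standard `hmac` module. HMAC takes a secret key, so this sketch treats the rotating salt as a rotating key and prefixes a hypothetical key identifier, so that hashes produced under different keys stay distinguishable. Key storage and the rotation schedule are out of scope here.

```python
import hashlib
import hmac


def hash_phone(phone_e164: str, key_id: str, key: bytes) -> str:
    """Return '<key_id>:<hex digest>' for a normalised phone number.

    Assumption: `key` is fetched from a secrets manager and `key_id`
    names the key version currently in rotation.
    """
    digest = hmac.new(key, phone_e164.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{key_id}:{digest}"
```

One consequence of rotation is that the same phone hashes differently under each key, so deduplication and erasure lookups need to try every key version still in use.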

Layer 5: Analytics and dashboard

The output that the campaign manager actually looks at.

Real-time dashboards (refresh every 60 seconds):

Total calls today, completion rate, sentiment distribution
Top 10 issues mentioned in last hour
Booths with anomalous metrics (low completion, high hangup, negative sentiment)
Active call count and concurrency

Daily aggregates (auto-generated 6am):

Sentiment by booth-AC-PC, comparison vs prior day
Top 20 issues with trending direction
Demographic breakdown (age, gender) by intent class
Cost per call, cost per meaningful conversation
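The per-booth figures behind these aggregates come from a single pass over the day's call records. The sketch below assumes hypothetical record keys (`booth`, `completed`, `sentiment`, `cost`), a sentiment score in the range -1 to 1, and treats a completed call as a meaningful conversation; a real pipeline would likely apply a stricter definition.

```python
from collections import defaultdict


def booth_aggregates(calls: list) -> dict:
    """Summarise call records per booth: volume, completion, sentiment, cost."""
    acc = defaultdict(lambda: {"calls": 0, "completed": 0,
                               "sentiment_sum": 0.0, "cost": 0.0})
    for call in calls:
        a = acc[call["booth"]]
        a["calls"] += 1
        a["cost"] += call["cost"]
        if call["completed"]:
            a["completed"] += 1
            a["sentiment_sum"] += call["sentiment"]

    summary = {}
    for booth, a in acc.items():
        done = a["completed"]
        summary[booth] = {
            "calls": a["calls"],
            "completion_rate": done / a["calls"],
            "cost_per_call": a["cost"] / a["calls"],
            # Undefined when no call in the booth completed.
            "mean_sentiment": a["sentiment_sum"] / done if done else None,
            "cost_per_completed": a["cost"] / done if done else None,
        }
    return summary
```

Comparing against the prior day is then a per-booth difference between two such summaries.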

Weekly reports (auto-generated Monday morning):

Sentiment trajectory chart per booth
Issue cluster analysis (which issues are correlated)
Persuasion-window booths (where sentiment is moveable)
Operations stats (uptime, latency p99, cost variance)

On-demand analysis:

"Show me all calls from a specific district where voters mentioned roads"
"Compare sentiment between 9am and 5pm calls"
"What's the message that resonates best in this region?"

The intelligence layer here is what turns the raw conversation stream into strategic input for the campaign. Without this, the campaign is running calls blind.

Layer 6: Integrations

The engine has to plug into the rest of the campaign's stack.

Inbound integrations:

ECI electoral roll fetch (per-AC, on-demand)
DND registry sync (real-time API)
Scheme-eligibility data (where available from state govt)
Campaign survey results (door-to-door, rally attendance)

Outbound integrations:

WhatsApp Business API (for cross-channel handoff)
SMS gateway (for time-critical reminders)
Panna pramukh / booth-worker apps (for ground-team coordination)
CRM (Salesforce, custom Postgres, etc.) for grievance ticket creation

Webhook surface:

Per-call completion webhook (to update CRM, trigger follow-ups)
Sentiment threshold alert (page the war-room if sentiment in a booth drops below threshold)
Operations alerts (telephony spam-flag, infrastructure failure)

Engineering team structure

For a campaign that builds and operates this engine in-house (rare — recommended only for permanent multi-election deployment):

Lead engineer (full stack, owns architecture)
Telephony engineer (SIP, DLT, carrier relationships)
ML/AI engineer (system prompt, dialect tuning, retrieval)
Data engineer (pipeline, dashboard, integrations)
DevOps / SRE (infrastructure, latency, on-call)
Compliance / legal liaison (ECI, TRAI, DPDP)

That's 6 specialised roles, ~₹3-5 cr annual personnel cost. Most state-scale campaigns cannot justify this in-house. Specialist vendors absorb this team cost across many campaigns.

SLOs to demand from any vendor

If you outsource, these are the SLOs to bake into the contract:

Uptime: 99.5% during campaign window (about 3.6 hours max downtime per 30-day month)
First-token latency p95: <800ms
End-to-end call latency p95: <1500ms
Call completion rate: 50%+ (lower indicates bad voice/script/list)
Sentiment classification accuracy: 80%+ on a held-out test set
Dashboard refresh latency: <5 minutes
Right-to-erasure SLA: 7 days
War-room support response: <15 minutes during campaign window, <2 hours otherwise
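Two of these SLOs are worth checking independently of the vendor's own reporting. The sketch below converts an uptime percentage into a downtime allowance and computes a nearest-rank p95 from raw latency samples; the 30-day month and the nearest-rank method are assumptions, and a contract should state which definitions apply.

```python
def max_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Downtime allowed per period for a given uptime percentage."""
    return days * 24 * 60 * (100 - uptime_pct) / 100


def p95(samples: list) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]
```

For a 30-day month, 99.5% uptime allows 216 minutes of downtime, while 99.9% would allow roughly 43 minutes, so the percentage and the stated allowance should be checked against each other before signing.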

What this engine looks like at maturity

A mature deployment runs the same engine across:

The current campaign (active outbound + inbound)
Permanent governance helpline (inbound, 5-year)
Quarterly sentiment surveys (small outbound waves)
Special-occasion campaigns (scheme awareness drives, emergency response)

The infrastructure is the same. The configuration (system prompt, voter cohort, language settings) changes per use case. Over a 5-year cycle, a constituency can accumulate ~10 million conversation records — a uniquely deep grounded dataset that no opinion poll can replicate.

Where to go next

The 30-Day Deployment Playbook — execution
10 Must-Have Features — vendor checklist
Voter Sentiment Analysis Pipeline — analytics deep dive
AI Election Agent Pricing — cost benchmarking

The engine is six layers, each with its own complexity. Building it well takes a small dedicated team and 90+ days. Operating it well takes a 5-year horizon — which is exactly the horizon Indian elections run on.