Converse API
The Converse API is a comprehensive backend service that provides conversational AI capabilities, real-time messaging, optical character recognition, text-to-speech synthesis, and image generation. Built on FastAPI with Firebase integration, it serves as the backbone for multi-modal communication applications.
Overview
Converse API represents a modern approach to conversational platforms, integrating multiple AI services into a unified interface. The system employs a microservices-oriented architecture where each functional domain (AI chat, OCR, TTS, messaging) operates as a discrete service boundary while sharing common authentication and storage infrastructure.
The API follows RESTful principles with JSON as the primary data interchange format. All endpoints requiring user context implement bearer token authentication via Firebase ID tokens, ensuring secure, stateless request validation. Cross-Origin Resource Sharing (CORS) is currently configured to allow all origins, which suits open public clients but should be restricted for production deployments (see CORS Configuration).
Key Features
- Conversational AI: Natural language processing with contextual responses powered by Pollinations AI's OpenAI-compatible endpoint
- Visual Generation: Automatic image synthesis based on conversation context using prompt-based image generation
- Optical Character Recognition: Text extraction from images using OCR.space API with multi-language support
- Text-to-Speech: Neural voice synthesis with customizable voice profiles via Microsoft Edge TTS
- Real-time Messaging: WhatsApp-inspired chat architecture with conversation indexing and push notifications
- Push Notifications: Firebase Cloud Messaging integration for system and chat notifications
Architecture
System Design
The architecture follows a layered design pattern consisting of presentation (API endpoints), business logic (service functions), and data access (Firebase SDK) layers. The system maintains separation of concerns through dedicated modules for authentication, AI processing, messaging, and media handling.
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Web Framework | FastAPI 0.x | Asynchronous HTTP server with automatic OpenAPI documentation |
| Authentication | Firebase Admin SDK | JWT verification and user identity management |
| Database | Cloud Firestore | NoSQL document store with real-time capabilities |
| HTTP Client | HTTPX | Async HTTP requests to external services |
| Image Processing | Pillow (PIL) | Image format conversion and optimization |
| TTS Engine | edge-tts | Microsoft Edge Text-to-Speech synthesis |
Data Flow
Request processing follows this sequence: (1) CORS middleware validates origin, (2) endpoint handler receives request, (3) authentication middleware verifies Firebase token, (4) business logic processes data with external service calls as needed, (5) Firestore transactions update state, (6) response serialization returns JSON or streaming media.
Authentication
The API implements Firebase Authentication using ID tokens as bearer credentials. All protected endpoints expect an Authorization header that carries a Firebase-issued JWT as a bearer token.
Token Verification Process
The verify_user() function extracts the bearer token from the Authorization header and validates it against Firebase's public keys. Upon successful verification, the function returns the authenticated user's UID (User Identifier), which serves as the primary key for all user-scoped operations.
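The verification step described above could look roughly like the following minimal sketch. It is not the service's actual code: AuthError, extract_bearer_token and the injected verify_token callable are hypothetical names introduced here so the example runs without Firebase. In the real service the callable would presumably be firebase_admin.auth.verify_id_token, which returns the decoded claims including "uid".

```python
from typing import Callable, Mapping, Optional


class AuthError(Exception):
    """Hypothetical error type: header missing or token rejected."""


def extract_bearer_token(authorization: Optional[str]) -> str:
    """Return the token part of an 'Authorization: Bearer <token>' value."""
    if not authorization:
        raise AuthError("missing Authorization header")
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("expected 'Bearer <token>'")
    return token.strip()


def verify_user(authorization: Optional[str],
                verify_token: Callable[[str], Mapping[str, object]]) -> str:
    """Validate the bearer token and return the caller's UID.

    verify_token is injected (an assumption of this sketch) so the
    function can be exercised without network access.
    """
    token = extract_bearer_token(authorization)
    try:
        claims = verify_token(token)
    except Exception as exc:  # bad signature, expired token, and so on
        raise AuthError("invalid or expired token") from exc
    return str(claims["uid"])
```

An endpoint would typically map AuthError to a 401 response.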
Authentication Flow
- Client authenticates with Firebase Authentication (email/password, OAuth, etc.)
- Firebase returns an ID token to the client
- Client includes token in Authorization header for all API requests
- API validates token signature and expiration using Firebase Admin SDK
- API extracts UID from validated token for user-specific operations
API Endpoints
Health Check & Documentation
Returns this comprehensive HTML documentation page. Used for service monitoring and developer reference.
Response
User Initialization
Creates or updates user profile information in Firestore. This endpoint should be called after initial authentication to establish user presence in the system and register FCM tokens for push notifications.
Authentication
Required. Firebase ID token in Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | User's display name |
| | string | No | User's email address |
| fcm_token | string | Yes | Firebase Cloud Messaging device token |
Response
Note: The user document is written with set() and merge=True, which performs an upsert operation. Existing user data is preserved unless explicitly overwritten by new values.
AI Conversation
Processes natural language input through the AI system, generating a textual response and corresponding visual representation. The conversation history is persisted to Firestore for retrieval via the history endpoint.
Authentication
Required. Firebase ID token in Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | User's natural language input |
Processing Pipeline
- User Verification: Validates Firebase token and ensures user document exists
- AI Processing: Sends message to Pollinations AI with system prompt context
- Image Generation: Uses AI response as prompt for image generation API
- Image Optimization: Converts generated image to JPEG format at 85% quality
- Data Persistence: Stores conversation turn (user message, AI reply, image) in Firestore
- Response Serialization: Returns AI text and base64-encoded image
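The final serialization step can be illustrated with a short sketch. The field names reply and image_base64 are assumptions made for this example; the actual response keys are not shown in this document.

```python
import base64
import json


def serialize_turn(reply: str, jpeg_bytes: bytes) -> str:
    """Build a JSON body carrying the AI text and the image as base64.

    The key names are hypothetical placeholders.
    """
    payload = {
        "reply": reply,
        # base64 keeps the binary JPEG safe inside a JSON string
        "image_base64": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def decode_image(body: str) -> bytes:
    """Client-side counterpart: recover the raw JPEG bytes."""
    return base64.b64decode(json.loads(body)["image_base64"])
```

Base64 adds roughly one third to the payload size, a trade-off for returning text and image in a single JSON response.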
Response
Firestore Storage Schema
Chat turns are stored in: users/{uid}/chats/{auto_id}
Optical Character Recognition
Extracts text content from uploaded images using OCR.space API, then provides an AI-powered explanation of what the image contains. Returns a natural language description rather than raw OCR text.
Authentication
Required. Firebase ID token in Authorization header.
Request
Multipart form data with file upload:
| Field | Type | Description |
|---|---|---|
| file | UploadFile | Image file containing text to extract |
OCR Configuration
- Language: English (eng), configurable via the language parameter
- Overlay: Disabled; returns only text without positional metadata
- Processing: Executed in thread pool to avoid blocking event loop
- AI Enhancement: Extracted text is sent to AI for natural explanation
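The thread pool item above can be sketched as follows. blocking_ocr_call is a hypothetical stand-in for the synchronous request to the OCR provider; the real call and its parameters are not shown in this document. asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_ocr_call(image_bytes: bytes) -> str:
    """Stand-in for a synchronous HTTP request to the OCR provider."""
    time.sleep(0.05)  # simulates network latency
    return f"{len(image_bytes)} bytes processed"


async def extract_text(image_bytes: bytes) -> str:
    # Runs the blocking function in the default thread pool so the
    # event loop stays free to serve other requests meanwhile.
    return await asyncio.to_thread(blocking_ocr_call, image_bytes)


async def main() -> list:
    # The two calls overlap because neither blocks the event loop.
    return await asyncio.gather(extract_text(b"abc"), extract_text(b"abcdef"))
```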
Response
Error Handling
If OCR fails or no text is detected, returns:
Text-to-Speech Synthesis
Converts text input to natural-sounding speech using Microsoft Edge's neural TTS engine. Returns streaming MP3 audio with prosody enhancements for improved naturalness.
Authentication
Required. Firebase ID token in Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text content to synthesize |
| voice | string | No | Voice profile identifier (default: en-US-JennyNeural) |
Speech Processing
The for_speech() function applies prosody modifications to improve speech naturalness:
- Breathing Pauses: Adds line breaks after sentences (periods, question marks)
- Thinking Pauses: Replaces em-dashes and semicolons with ellipses
- Pacing Control: Prepends ellipsis to short responses to prevent rushed delivery
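A minimal sketch of the three rules above, assuming a regex-based implementation. The actual for_speech() body is not shown in this document, and the SHORT_RESPONSE_WORDS threshold is a hypothetical value chosen for illustration.

```python
import re

SHORT_RESPONSE_WORDS = 4  # hypothetical cutoff for a "short" response


def for_speech(text: str) -> str:
    """Apply simple prosody hints before handing text to the TTS engine."""
    text = text.strip()
    # Thinking pauses: em-dashes (U+2014) and semicolons become ellipses.
    text = re.sub(r"\s*[\u2014;]\s*", "... ", text)
    # Breathing pauses: line break after a sentence-ending period or
    # question mark; the lookbehind skips the dots of an ellipsis.
    text = re.sub(r"(?<!\.)([.?])\s+", r"\1\n", text)
    # Pacing control: lead short responses with an ellipsis.
    if len(text.split()) <= SHORT_RESPONSE_WORDS:
        text = "... " + text
    return text
```

For example, for_speech("Sure.") returns "... Sure.".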
Response
Streaming audio response:
Available Voices
The system supports all Microsoft Edge TTS voices. Common options include:
- en-US-JennyNeural - Female, American English (default)
- en-US-GuyNeural - Male, American English
- en-GB-SoniaNeural - Female, British English
- en-AU-NatashaNeural - Female, Australian English
Image Generation Proxy
Generates images from text prompts using Pollinations AI's image synthesis API. Acts as a proxy with image optimization and caching headers.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Text description of desired image |
Processing Steps
- URL-encodes the prompt parameter
- Requests image from Pollinations API (60-second timeout)
- Converts image to RGB color space
- Re-encodes as JPEG at 90% quality
- Returns optimized image with cache headers
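The first and last steps above (URL encoding and cache headers) can be sketched without any image library. UPSTREAM_BASE is a placeholder rather than the real upstream address, and the max-age value is an assumption; only the 60-second timeout comes from the steps listed here.

```python
from urllib.parse import quote

UPSTREAM_BASE = "https://image-backend.example/prompt/"  # hypothetical placeholder
UPSTREAM_TIMEOUT_SECONDS = 60  # timeout named in the processing steps


def build_upstream_url(prompt: str) -> str:
    """URL-encode the prompt so it forms a single, safe path segment."""
    # safe="" also encodes "/" so a prompt cannot add path segments
    return UPSTREAM_BASE + quote(prompt, safe="")


def cache_headers(max_age_seconds: int = 86400) -> dict:
    """Headers letting browsers and CDNs reuse the generated image."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```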
Response
Chat History Retrieval
Fetches the authenticated user's complete AI conversation history, ordered chronologically by timestamp.
Authentication
Required. Firebase ID token in Authorization header.
Response
Note: Chat turns are read from the users/{uid}/chats collection in Firestore, enabling full history retrieval and cross-device synchronization.
Send Chat Message
Sends a message to another user, creating or updating a conversation. Implements WhatsApp-style chat architecture with real-time WebSocket delivery and FCM fallback notifications.
Authentication
Required. Firebase ID token in Authorization header.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| to_uid | string | Yes | Recipient's user ID |
| text | string | Yes | Message content |
Processing Flow
- User Validation: Verifies sender authentication and prevents self-messaging
- Recipient Verification: Ensures target user exists in Firestore
- Conversation Management: Creates or updates conversation using deterministic ID (sorted UIDs)
- Message Storage: Persists message in conversations/{convo_id}/messages
- Chat Index Update: Updates both sender and recipient's chat index with unread counters
- Real-time Delivery: Attempts WebSocket delivery to active connections
- Push Notification: Sends FCM notification as fallback for offline users
Response
Conversation ID Structure
Conversation IDs are deterministic: the two participant UIDs are sorted alphabetically and joined with an underscore, giving the form uid1_uid2.
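As a minimal sketch of this rule (the function name conversation_id is an assumption, not taken from the service code):

```python
def conversation_id(uid_a: str, uid_b: str) -> str:
    """Derive the same ID regardless of who sends the first message."""
    if uid_a == uid_b:
        # Mirrors the self-messaging check in the processing flow
        raise ValueError("self-messaging is not allowed")
    return "_".join(sorted([uid_a, uid_b]))
```

Because the inputs are sorted, conversation_id("bob", "alice") and conversation_id("alice", "bob") both yield "alice_bob", so either participant can locate the conversation without a lookup.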
Firestore Data Structure
Conversation Document: conversations/{convo_id}
Message Document: conversations/{convo_id}/messages/{message_id}
Chat Index Document: users/{uid}/chat_index/{convo_id}
List User Conversations
Retrieves all conversations the authenticated user is participating in, ordered by most recent activity.
Authentication
Required. Firebase ID token in Authorization header.
Query Strategy
Uses Firestore's array-contains query to find conversations where the user is a participant:
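A hedged sketch of such a query follows. It assumes db behaves like a google.cloud.firestore.Client and that each conversation document carries a timestamp field, here called updated_at; that field name is hypothetical. Sorting is done in Python to keep the sketch independent of composite index setup.

```python
def list_conversations(db, uid: str) -> list:
    """Return the user's conversations, most recently active first.

    db is expected to expose the Firestore client interface
    (collection().where().stream()); "updated_at" is an assumed field.
    """
    query = db.collection("conversations").where(
        "participants", "array_contains", uid
    )
    conversations = [snapshot.to_dict() for snapshot in query.stream()]
    conversations.sort(key=lambda c: c.get("updated_at", 0), reverse=True)
    return conversations
```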
Response
Note: For building chat list UIs, prefer the /chat/index endpoint, which reads from a user-specific subcollection for better performance.
Get Conversation Messages
Retrieves all messages from a specific conversation in chronological order. Implements permission checking to ensure users can only access conversations they participate in.
Authentication
Required. Firebase ID token in Authorization header.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| convo_id | string | Conversation identifier (format: uid1_uid2) |
Authorization Check
The endpoint verifies:
- Conversation exists in Firestore
- Authenticated user is in the conversation's participants array
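The two checks above map naturally to the 404 and 403 responses documented for this endpoint. The sketch below uses a hypothetical ApiError class in place of the framework's HTTP exception so that it runs standalone.

```python
from typing import Optional


class ApiError(Exception):
    """Hypothetical stand-in for the framework's HTTP exception."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def authorize_conversation_access(conversation: Optional[dict], uid: str) -> dict:
    """conversation is the Firestore document as a dict, or None if absent."""
    if conversation is None:
        raise ApiError(404, "Conversation not found")
    if uid not in conversation.get("participants", []):
        raise ApiError(403, "Not allowed")
    return conversation
```

Checking existence first means a non-participant can tell whether a conversation ID exists; returning 404 in both cases would hide that, at the cost of less precise errors.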
Response
Error Responses
| Status Code | Condition | Description |
|---|---|---|
| 404 | Conversation not found | Specified convo_id does not exist |
| 403 | Not allowed | User is not a participant in this conversation |
Note: Messages are sorted with order_by("ts") to maintain conversation flow consistency.
Get User Chat Index
Retrieves the authenticated user's personalized chat index, providing a WhatsApp-style conversation list with buddy information, message previews, and unread counts. This is the recommended endpoint for building chat list UIs.
Authentication
Required. Firebase ID token in Authorization header.
Architecture Advantage
Unlike /chat/list which queries the global conversations collection, this endpoint reads from a user-specific subcollection (users/{uid}/chat_index), providing:
- Read cost that depends only on the user's own conversation count, not on the total number of conversations in the system
- Denormalized buddy information (name, photo) for instant display
- Per-user unread counters
- Optimized for client-side rendering
Response
Note: Prefer this endpoint over /chat/list. The denormalized structure eliminates the need for additional queries to fetch participant details.
Unread Counter Management
The unread_count field is automatically managed:
- Incremented: When a message is received (via /chat/send)
- Reset to 0: When the user sends a message in that conversation
- Manual Reset: Client can implement mark-as-read by updating the counter directly
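The counter rules above can be modeled with plain dictionaries. This in-memory sketch only illustrates the bookkeeping: last_message is an assumed field name, and a real implementation would write to users/{uid}/chat_index/{convo_id}, presumably using an atomic increment instead of a read-modify-write.

```python
def apply_message_to_indexes(indexes: dict, convo_id: str,
                             sender_uid: str, recipient_uid: str,
                             text: str) -> None:
    """Update both participants' chat index entries for one sent message.

    indexes maps uid -> {convo_id -> entry}; it stands in for the
    per-user chat_index subcollections.
    """
    sender = indexes.setdefault(sender_uid, {}).setdefault(
        convo_id, {"unread_count": 0})
    sender["last_message"] = text
    sender["unread_count"] = 0  # sending implies the sender has caught up

    recipient = indexes.setdefault(recipient_uid, {}).setdefault(
        convo_id, {"unread_count": 0})
    recipient["last_message"] = text
    recipient["unread_count"] += 1  # one more unread message


def mark_as_read(indexes: dict, uid: str, convo_id: str) -> None:
    """Manual reset, as a client-driven mark-as-read would do."""
    indexes[uid][convo_id]["unread_count"] = 0
```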
Admin Notification Interface
Web-based admin panel for sending push notifications to users. Protected by admin key authentication.
Authentication
Query parameter key must match ADMIN_NOTIFY_KEY environment variable.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | Admin authentication key |
Features
- User selection with "Select All" checkbox
- Custom notification title and body
- Real-time delivery status feedback
- Material Design UI
Send Admin Notifications
Programmatic endpoint for sending bulk notifications to selected users. Used by the admin UI but can also be called directly via API.
Authentication
Required. Admin key in admin-key header.
Request Headers
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| uids | array[string] | Yes | List of user IDs to notify |
| title | string | Yes | Notification title |
| body | string | Yes | Notification message |
Response
Delivery Logic
- Validates admin key
- For each UID:
  - Checks if user document exists
  - Retrieves FCM token from user document
  - Sends FCM notification if token exists
  - Silently skips users without tokens
- Returns count of successful deliveries
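The delivery logic above could be sketched as follows. The users mapping and the send_push callable are hypothetical stand-ins for Firestore lookups and the FCM send call, and the constant-time key comparison is a suggestion of this sketch rather than documented behavior.

```python
import hmac
from typing import Callable, Iterable, Mapping


def send_admin_notifications(provided_key: str, expected_key: str,
                             uids: Iterable[str], title: str, body: str,
                             users: Mapping[str, dict],
                             send_push: Callable[[str, str, str], None]) -> int:
    """Return the number of notifications handed to the push service."""
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(provided_key.encode(), expected_key.encode()):
        raise PermissionError("admin key mismatch")

    delivered = 0
    for uid in uids:
        user = users.get(uid)
        if user is None:
            continue  # no user document
        token = user.get("fcm_token")
        if not token:
            continue  # silently skip users without a token
        send_push(token, title, body)
        delivered += 1
    return delivered
```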
Note: Admin notifications carry the data payload {"type": "admin", "ts": timestamp}, which clients can use to differentiate them from chat notifications.
Real-time WebSocket Connection
Establishes a persistent WebSocket connection for receiving real-time chat messages. Enables instant message delivery without polling.
Authentication
Required. Firebase ID token as query parameter.
Connection URL
Connection Lifecycle
- Handshake: Client connects with Firebase token in query params
- Verification: Server validates token and extracts UID
- Registration: Connection added to ConnectionManager for user's UID
- Keep-alive: Client sends periodic pings to maintain connection
- Message Delivery: Server pushes JSON messages when events occur
- Disconnect: Connection removed from manager on close/error
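The registration, delivery, and disconnect steps above suggest a structure like the following. Only the name ConnectionManager comes from this document; the method names are assumptions, and the sketch presumes each socket exposes an async send_json method, as Starlette's WebSocket does.

```python
from collections import defaultdict


class ConnectionManager:
    """Tracks open sockets per UID; a user may have several devices."""

    def __init__(self) -> None:
        self._connections = defaultdict(set)

    def connect(self, uid: str, websocket) -> None:
        self._connections[uid].add(websocket)

    def disconnect(self, uid: str, websocket) -> None:
        self._connections[uid].discard(websocket)
        if not self._connections[uid]:
            del self._connections[uid]  # drop empty entries

    def is_online(self, uid: str) -> bool:
        return bool(self._connections.get(uid))

    async def send_to_user(self, uid: str, message: dict) -> int:
        """Push a JSON message to every socket of uid; return deliveries."""
        delivered = 0
        # Iterate over a copy because failed sockets are removed on the fly.
        for websocket in list(self._connections.get(uid, ())):
            try:
                await websocket.send_json(message)
                delivered += 1
            except Exception:
                self.disconnect(uid, websocket)
        return delivered
```

A return value of 0 would be the natural trigger for the FCM fallback described under Send Chat Message.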
Message Format
Chat messages received via WebSocket:
Error Handling
| Close Code | Reason | Description |
|---|---|---|
| 1008 | Policy Violation | Missing or invalid authentication token |
Data Models
User Document
Location: users/{uid}
Chat Turn Document
Location: users/{uid}/chats/{chat_id}
Conversation Document
Location: conversations/{convo_id}
Message Document
Location: conversations/{convo_id}/messages/{message_id}
Chat Index Document
Location: users/{uid}/chat_index/{convo_id}
Error Handling
The API uses standard HTTP status codes and returns JSON error responses:
Common Error Responses
| Status Code | Meaning | Common Causes |
|---|---|---|
| 400 | Bad Request | Invalid request body, missing required fields, self-messaging attempt |
| 401 | Unauthorized | Missing or invalid Firebase token, expired token |
| 403 | Forbidden | Accessing conversation without participation, admin key mismatch |
| 404 | Not Found | User or conversation doesn't exist |
| 500 | Internal Server Error | External service failure (AI, image generation, OCR), Firestore errors |
Error Response Format
Firebase Integration
Service Account Configuration
The API requires a Firebase Admin SDK service account stored in the FIREBASE_SERVICE_ACCOUNT environment variable as JSON string:
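Loading that variable might look like the sketch below. The REQUIRED_KEYS check is an addition of this example, not documented behavior; the key names are standard fields of a Firebase service account file. The resulting dict can then be passed to firebase_admin.credentials.Certificate, which accepts either a file path or a dict.

```python
import json
import os

# Fields present in every Firebase service account key file
REQUIRED_KEYS = ("project_id", "private_key", "client_email")


def load_service_account(env_var: str = "FIREBASE_SERVICE_ACCOUNT") -> dict:
    """Parse the service account JSON string from the environment."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    info = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise RuntimeError(f"{env_var} is missing keys: {', '.join(missing)}")
    return info
```

Failing at startup with a clear message is usually easier to diagnose than a credential error on the first request.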
Firebase Services Used
- Authentication: ID token verification for API access
- Firestore: User profiles, chat history, conversations, messages
- Cloud Messaging (FCM): Push notifications for offline users
FCM Notification Structure
Notifications are sent as data messages with system notification overlay:
Security Considerations
Authentication Best Practices
- Token Expiration: Firebase ID tokens expire after 1 hour. Implement automatic refresh in client applications
- Secure Storage: Never store tokens in localStorage; use secure, HTTP-only cookies or in-memory storage
- HTTPS Only: Always use HTTPS in production to prevent token interception
Authorization Model
The API implements resource-level authorization:
- Users can only access their own chat history and user data
- Conversation access requires participant verification
- Admin endpoints require separate admin key authentication
API Key Management
| Key Type | Environment Variable | Purpose |
|---|---|---|
| Pollinations AI | POLLINATIONS_API_KEY | AI chat and image generation |
| OCR.space | OCR_SPACE_API_KEY | Text extraction from images |
| Admin Notify | ADMIN_NOTIFY_KEY | Broadcast notification access |
| Firebase Service Account | FIREBASE_SERVICE_ACCOUNT | Authentication and Firestore access |
CORS Configuration
The API currently allows all origins (allow_origins=["*"]). For production deployments, restrict to specific domains:
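One possible way to build such an allow-list is sketched below. The ALLOWED_ORIGINS variable name and the chosen methods and headers are assumptions of this example. The returned dict matches the keyword arguments of FastAPI's CORSMiddleware, so it could be applied with app.add_middleware(CORSMiddleware, **cors_settings()).

```python
import os


def cors_settings(env_var: str = "ALLOWED_ORIGINS") -> dict:
    """Build CORS keyword arguments from a comma-separated origin list.

    An unset or empty variable yields an empty allow-list, meaning no
    cross-origin browser access.
    """
    raw = os.environ.get(env_var, "")
    origins = [
        origin.strip().rstrip("/")  # origins carry no trailing slash
        for origin in raw.split(",")
        if origin.strip()
    ]
    return {
        "allow_origins": origins,
        "allow_credentials": True,
        "allow_methods": ["GET", "POST"],
        "allow_headers": ["Authorization", "Content-Type", "admin-key"],
    }
```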
Rate Limiting
Consider implementing rate limiting for public endpoints to prevent abuse:
- OCR endpoint: 10 requests/minute per user
- Image generation: 5 requests/minute per user
- Chat messages: 100 requests/minute per user
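A per-user limit of this kind can be sketched with a sliding window held in memory. The class below is illustrative only: it is not part of the API, and a multi-process deployment would need a shared store instead of a local dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Limits suggested above, keyed by a hypothetical endpoint label
LIMITERS = {
    "ocr": SlidingWindowLimiter(10),
    "image": SlidingWindowLimiter(5),
    "chat": SlidingWindowLimiter(100),
}
```

A handler would call LIMITERS["ocr"].allow(uid) and respond with 429 Too Many Requests when it returns False.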
For support or feature requests, contact Vinaycharyvelpula@gmail.com