Design: Chat Agent (LangGraph)
Overview
The "AI Teacher" is a stateful agent managed by a LangGraph StateGraph. This allows for complex logic, error handling, and multi-step reasoning.
Graph Definition (teacher_agent.py)
Nodes
- check_wallet: Fetches wallet and profile data via supabase_service. Gets tier limits from TierConfig. Reserves tokens via wallet_service.reserve().
- retrieve: Calls retrieval_pipeline.retrieve() for vector search + reranking. Builds the context string and sources list.
- ask_clarifying: Triggered when the question is too vague or the retrieval score is low. Returns a localized clarification message (French/Arabic).
- finalize: Calculates the actual cost, calls wallet_service.finalize() to settle the reservation, and logs usage to usage_logs via supabase_service.
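A minimal sketch of the first two nodes as plain state-in/state-out callables, following LangGraph's convention that a node receives the state dict and returns a partial update. The service interfaces and the `services` injection shape are illustrative assumptions, not the production signatures:

```python
def check_wallet(state: dict, services: dict) -> dict:
    """Reserve tokens and load tier limits before doing any work.
    (Hypothetical sketch: service method names are assumptions.)"""
    tier = services["supabase"].get_profile(state["user_id"])["tier"]
    limits = services["tier_config"][tier]
    rid = services["wallet"].reserve(state["user_id"], limits["reserve_amount"])
    return {
        "tier": tier,
        "retrieval_top_k": limits["top_k"],
        "rerank_top_n": limits["top_n"],
        "reservation_id": rid,
    }

def retrieve(state: dict, services: dict) -> dict:
    """Vector search + rerank, then build the context string and sources list."""
    matches = services["retrieval"].retrieve(
        state["question"],
        top_k=state["retrieval_top_k"],
        top_n=state["rerank_top_n"],
    )
    context = "\n\n".join(m["text"] for m in matches)
    return {
        "matches_selected": matches,
        "context": context,
        "sources": [m["source"] for m in matches],
    }
```

In the real graph these would be registered with `StateGraph.add_node` and receive their services via closures or partials rather than an explicit `services` argument.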
Agent Flow
graph TD
START([Student sends question]) --> CW[check_wallet]
CW --> D{decide_path}
D -->|Vague question<br>or missing context| AC[ask_clarifying]
D -->|Specific question<br>with grade/subject| R[retrieve_and_rerank]
R --> D2{Retrieval quality?}
D2 -->|Score >= 0.35| F[finalize]
D2 -->|Score < 0.35| AC
AC --> F
F --> END([Return response])
style CW fill:#e1f5fe
style R fill:#e8f5e9
style AC fill:#fff3e0
style F fill:#f3e5f5
Conditional Edges
- Wallet -> Decision: If the question is very short or context is missing, go to ask_clarifying.
- Retrieval -> Decision: If the retrieval score < 0.35, go to ask_clarifying. Otherwise, go to finalize.
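The two decisions above can be sketched as the routing functions LangGraph would call from conditional edges. The 0.35 threshold comes from the design; the minimum-length heuristic and the `best_score` state key are illustrative assumptions:

```python
RERANK_SCORE_THRESHOLD = 0.35   # from the design: below this, ask to clarify
MIN_QUESTION_CHARS = 15         # illustrative heuristic for "very short"

def decide_path(state: dict) -> str:
    """Wallet -> Decision edge: route vague or underspecified questions
    to clarification instead of retrieval."""
    question = state.get("question", "").strip()
    if (len(question) < MIN_QUESTION_CHARS
            or not state.get("grade")
            or not state.get("subject")):
        return "ask_clarifying"
    return "retrieve"

def decide_retrieval_quality(state: dict) -> str:
    """Retrieval -> Decision edge: gate on the best rerank score
    (best_score is an assumed state key)."""
    if state.get("best_score", 0.0) < RERANK_SCORE_THRESHOLD:
        return "ask_clarifying"
    return "finalize"
```

These would be wired with `add_conditional_edges`, mapping each returned string to the target node.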
Billing Integration (Reserve/Finalize)
The agent uses the atomic reservation pattern:
1. check_wallet reserves BASE_CHAT_COST + TOKEN_BUFFER tokens.
2. The reservation_id is stored in the agent state.
3. finalize calculates actual cost and calls wallet_service.finalize(reservation_id, actual).
4. The delta is refunded if actual < estimated.
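The reserve/finalize settlement can be illustrated with a minimal in-memory wallet. The real WalletReservationService is database-backed and atomic; the storage here is a stand-in, and only the method names follow the document:

```python
import uuid

class InMemoryWallet:
    """Toy model of the atomic reservation pattern described above."""

    def __init__(self, balance: int):
        self.balance = balance
        self.reservations: dict[str, int] = {}

    def reserve(self, amount: int) -> str:
        """Hold the estimated cost up front; fail fast on insufficient funds."""
        if amount > self.balance:
            raise ValueError("insufficient tokens")
        self.balance -= amount
        rid = str(uuid.uuid4())
        self.reservations[rid] = amount
        return rid

    def finalize(self, reservation_id: str, actual: int) -> int:
        """Settle at the actual cost; refund the delta if actual < estimated."""
        reserved = self.reservations.pop(reservation_id)
        refund = max(reserved - actual, 0)
        self.balance += refund
        return refund
```

For example, reserving 60 tokens (BASE_CHAT_COST + TOKEN_BUFFER) and finalizing at an actual cost of 42 refunds 18 tokens to the wallet.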
Retrieval Pipeline
The agent uses RetrievalPipeline from app/core/dependencies.py:
1. Language Detection: GPT-mini detects the query language.
2. Translation: If query language differs from corpus language, translate automatically.
3. Dense Search: Get top-K matches from Pinecone (K = tier-specific).
4. Rerank Cache Check: Return cached results if available (TTL: 15 min).
5. Reranking: Use GPT-mini to select top-N most relevant snippets (N = tier-specific).
6. Chunk Text Fetch: Retrieve full text from Postgres cache or DB.
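The six steps above can be sketched as one pipeline function with injected components. The component interfaces (`llm`, `index`, `db`) and the cache shape are assumptions; the 15-minute TTL and the tier-specific K/N come from the design:

```python
import time

RERANK_CACHE_TTL = 15 * 60  # seconds, per the design

def retrieve(query, tier_top_k, tier_top_n, llm, index, db, cache, now=time.time):
    """Sketch of RetrievalPipeline.retrieve with stand-in component interfaces."""
    # 1. Language detection (an LLM call in the real pipeline)
    lang = llm.detect_language(query)
    # 2. Translate if the query language differs from the corpus language
    search_query = llm.translate(query, lang, "fr") if lang != "fr" else query
    # 3. Dense search: top-K candidates from the vector index
    matches = index.query(search_query, top_k=tier_top_k)
    # 4-5. Rerank, behind a TTL cache keyed on (query, N)
    key = (search_query, tier_top_n)
    hit = cache.get(key)
    if hit is not None and now() - hit[0] < RERANK_CACHE_TTL:
        ranked = hit[1]
    else:
        ranked = llm.rerank(query, matches, top_n=tier_top_n)
        cache[key] = (now(), ranked)
    # 6. Fetch the full chunk text for the selected snippets
    return [db.fetch_chunk(m["id"]) for m in ranked]
```

The cache key deliberately includes the tier-specific N so that users on different tiers never share truncated rerank results.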
Retrieval Pipeline Detail
sequenceDiagram
participant Agent as Teacher Agent
participant RP as RetrievalPipeline
participant GPTm as GPT-4o-mini
participant PC as Pinecone
participant Cache as LRU Cache
participant DB as Postgres
Agent->>RP: retrieve(query, tier, namespace)
RP->>GPTm: detect_language(query)
GPTm-->>RP: "ar"
RP->>GPTm: translate(query, "ar" → "fr")
GPTm-->>RP: translated_query
RP->>PC: query(vector, top_k=10)
PC-->>RP: raw matches
RP->>Cache: check rerank cache
Cache-->>RP: miss
RP->>GPTm: rerank(query, matches, top_n=3)
GPTm-->>RP: ranked results
RP->>Cache: store rerank result
RP->>DB: fetch chunk text
DB-->>RP: full text
RP-->>Agent: context + sources
Multilingual Support (Arabic/French)
To support students asking in Arabic when the curriculum is in French:
1. Translation Step: If the input language is detected as Arabic, the query is translated to French using the LLM before vector search.
2. Cross-Lingual Retrieval: Vector search is performed using the French query against the French corpus.
3. Cross-Lingual Reranking: The reranker is instructed to match the original question (and its translation) against snippets, accounting for language differences.
4. Arabic Response: The final answer is generated in Arabic using the French context.
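The final cross-lingual step can be sketched as a prompt-assembly helper. The wording is illustrative, not the production prompt; the point is that the answer-generation call sees the Arabic question, its French translation, and the French context together:

```python
def build_answer_prompt(question_ar: str, question_fr: str, context_fr: str) -> str:
    """Assemble a prompt that answers in Arabic from French curriculum context.
    (Hypothetical helper; the real prompt template may differ.)"""
    return (
        "You are a teacher. The curriculum context below is in French.\n"
        "Answer the student's question in Arabic, using only this context.\n\n"
        f"Context (French):\n{context_fr}\n\n"
        f"Question (Arabic): {question_ar}\n"
        f"Question (French translation): {question_fr}\n"
    )
```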
State Object (TeacherState)
The agent maintains a state dictionary throughout the lifecycle:
- question, language, namespace, grade, subject
- tier, token_balance, hint_level
- reservation_id: UUID of the active token reservation
- retrieval_top_k, rerank_top_n: Tier-specific limits
- matches_selected: The final snippets used
- context: The formatted string passed to the LLM
- answer, answer_chars: The generated response
- cost_tokens: Final token cost after finalization
- sources: Source references for the answer
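The state keys listed above map naturally onto a TypedDict. The field names follow the document; the concrete types are reasonable assumptions:

```python
from typing import Any, TypedDict

class TeacherState(TypedDict, total=False):
    """Sketch of the agent state; types are assumed, names are from the design."""
    # Request
    question: str
    language: str
    namespace: str
    grade: str
    subject: str
    # Account / billing
    tier: str
    token_balance: int
    hint_level: int
    reservation_id: str          # UUID of the active token reservation
    # Tier-specific retrieval limits
    retrieval_top_k: int
    rerank_top_n: int
    # Retrieval output
    matches_selected: list[dict[str, Any]]
    context: str
    # Response
    answer: str
    answer_chars: int
    cost_tokens: int             # final token cost after finalization
    sources: list[dict[str, Any]]
```

`total=False` reflects that the state is built up incrementally as the graph runs, so most keys are absent at the start.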
Dependency Injection
All services are injected as singletons from app/core/dependencies.py:
- wallet_service (WalletReservationService)
- retrieval_pipeline (RetrievalPipeline)
- supabase_service (Supabase Client)
- openai_client (OpenAI Client)
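One common way to get process-wide singletons from a dependencies module is `functools.lru_cache` over zero-argument factory functions. A sketch under that assumption, with stand-in classes for the real services:

```python
from functools import lru_cache

# Hypothetical stand-ins for the real service classes.
class WalletReservationService: ...
class RetrievalPipeline: ...

@lru_cache(maxsize=1)
def get_wallet_service() -> WalletReservationService:
    """First call constructs the service; later calls return the same instance."""
    return WalletReservationService()

@lru_cache(maxsize=1)
def get_retrieval_pipeline() -> RetrievalPipeline:
    return RetrievalPipeline()
```

This pattern keeps construction lazy and makes the singletons easy to override in tests via `cache_clear()`.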