Design: Chat Agent (LangGraph)

Overview

The "AI Teacher" is a stateful agent managed by a LangGraph StateGraph. The graph structure supports conditional branching, error handling, and multi-step reasoning within a single request.

Graph Definition (teacher_agent.py)

Nodes

  1. check_wallet: Fetches wallet and profile data via supabase_service. Gets tier limits from TierConfig. Reserves tokens via wallet_service.reserve().
  2. retrieve: Calls retrieval_pipeline.retrieve() for vector search + reranking. Builds context string and sources list.
  3. ask_clarifying: Triggered when the question is too vague or the retrieval score is low. Returns a localized clarification message (French/Arabic); a sketch of this node follows the list.
  4. finalize: Calculates actual cost, calls wallet_service.finalize() to settle the reservation, and logs usage to usage_logs via supabase_service.
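
As an illustration of node shape, here is a minimal sketch of ask_clarifying: each node receives the current state and returns a partial update that LangGraph merges back in. The message templates and fallback logic are hypothetical; the French/Arabic pair mirrors the localization described above.

CLARIFY_MESSAGES = {
    "fr": "Peux-tu préciser ta question ? Indique la matière, le niveau et le chapitre.",
    "ar": "هل يمكنك توضيح سؤالك؟ حدّد المادة والمستوى والفصل.",
}

def ask_clarifying(state: dict) -> dict:
    """Return a localized clarification request as a partial state update."""
    lang = state.get("language", "fr")
    message = CLARIFY_MESSAGES.get(lang, CLARIFY_MESSAGES["fr"])
    return {"answer": message, "answer_chars": len(message)}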

Agent Flow

graph TD
    START([Student sends question]) --> CW[check_wallet]
    CW --> D{decide_path}
    D -->|Vague question<br>or missing context| AC[ask_clarifying]
    D -->|Specific question<br>with grade/subject| R[retrieve_and_rerank]
    R --> D2{Retrieval quality?}
    D2 -->|Score >= 0.35| F[finalize]
    D2 -->|Score < 0.35| AC
    AC --> F
    F --> END([Return response])

    style CW fill:#e1f5fe
    style R fill:#e8f5e9
    style AC fill:#fff3e0
    style F fill:#f3e5f5

Conditional Edges

  • Wallet -> Decision: If the question is very short or context is missing, go to ask_clarifying; otherwise, go to retrieve.
  • Retrieval -> Decision: If retrieval score < 0.35, go to ask_clarifying. Otherwise, go to finalize.
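
Putting the nodes and edges together, a minimal wiring sketch might look like the following. The node and router names come from this document, the 0.35 threshold matches the diagram, and the node bodies are stubbed; everything else is an assumption.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict, total=False):
    # Trimmed schema; see the full TeacherState under "State Object" below.
    question: str
    retrieval_score: float   # hypothetical field carrying the rerank score
    answer: str

def check_wallet(state: State) -> State:
    return {}                          # reserve tokens (see Billing Integration)

def retrieve(state: State) -> State:
    return {"retrieval_score": 0.8}    # vector search + reranking

def ask_clarifying(state: State) -> State:
    return {"answer": "Peux-tu préciser ta question ?"}

def finalize(state: State) -> State:
    return {}                          # settle the reservation, log usage

def decide_path(state: State) -> str:
    # Vague/short question or missing context -> ask for clarification.
    return "retrieve" if len(state.get("question", "").split()) > 3 else "ask_clarifying"

def retrieval_quality(state: State) -> str:
    return "finalize" if state["retrieval_score"] >= 0.35 else "ask_clarifying"

graph = StateGraph(State)
graph.add_node("check_wallet", check_wallet)
graph.add_node("retrieve", retrieve)
graph.add_node("ask_clarifying", ask_clarifying)
graph.add_node("finalize", finalize)
graph.add_edge(START, "check_wallet")
graph.add_conditional_edges("check_wallet", decide_path)
graph.add_conditional_edges("retrieve", retrieval_quality)
graph.add_edge("ask_clarifying", "finalize")
graph.add_edge("finalize", END)
agent = graph.compile()

# A specific question walks the happy path: check_wallet -> retrieve -> finalize.
result = agent.invoke({"question": "Explique la photosynthèse pour la 4e année"})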

Billing Integration (Reserve/Finalize)

The agent uses the atomic reservation pattern:

  1. check_wallet reserves BASE_CHAT_COST + TOKEN_BUFFER tokens.
  2. The reservation_id is stored in the agent state.
  3. finalize calculates the actual cost and calls wallet_service.finalize(reservation_id, actual).
  4. If actual < estimated, the delta is refunded.
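
The accounting can be illustrated with a self-contained toy. The real WalletReservationService is backed by Supabase and enforces atomicity at the database level; this in-memory stand-in only shows the reserve/settle/refund flow.

import uuid

class ToyWallet:
    """In-memory stand-in for WalletReservationService's accounting."""

    def __init__(self, balance: int):
        self.balance = balance
        self.reservations: dict[str, int] = {}

    def reserve(self, amount: int) -> str:
        if self.balance < amount:
            raise ValueError("insufficient tokens")
        self.balance -= amount                  # hold the full estimate up front
        reservation_id = str(uuid.uuid4())
        self.reservations[reservation_id] = amount
        return reservation_id

    def finalize(self, reservation_id: str, actual: int) -> None:
        held = self.reservations.pop(reservation_id)
        self.balance += max(held - actual, 0)   # refund the delta if actual < estimated

wallet = ToyWallet(balance=1000)
rid = wallet.reserve(120)          # e.g. BASE_CHAT_COST + TOKEN_BUFFER
wallet.finalize(rid, actual=90)    # 30 unused tokens flow back
assert wallet.balance == 910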

Retrieval Pipeline

The agent uses RetrievalPipeline from app/core/dependencies.py:

  1. Language Detection: GPT-4o-mini detects the query language.
  2. Translation: If the query language differs from the corpus language, the query is translated automatically.
  3. Dense Search: Get the top-K matches from Pinecone (K is tier-specific).
  4. Rerank Cache Check: Return cached results if available (TTL: 15 min).
  5. Reranking: Use GPT-4o-mini to select the top-N most relevant snippets (N is tier-specific).
  6. Chunk Text Fetch: Retrieve the full text from the Postgres cache or the DB.
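
The cache check in step 4 can be sketched as a small in-process TTL map. The 15-minute TTL comes from this document; the key shape and storage are assumptions, and the production cache may differ.

import hashlib
import time

RERANK_TTL_SECONDS = 15 * 60
_rerank_cache: dict[str, tuple[float, list]] = {}

def _cache_key(query: str, namespace: str, top_n: int) -> str:
    raw = f"{namespace}:{top_n}:{query}".encode()
    return hashlib.sha256(raw).hexdigest()

def get_cached_rerank(query: str, namespace: str, top_n: int):
    entry = _rerank_cache.get(_cache_key(query, namespace, top_n))
    if entry and time.monotonic() - entry[0] < RERANK_TTL_SECONDS:
        return entry[1]                  # cache hit: skip the GPT-4o-mini call
    return None                          # miss or expired

def store_rerank(query: str, namespace: str, top_n: int, results: list) -> None:
    _rerank_cache[_cache_key(query, namespace, top_n)] = (time.monotonic(), results)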

Retrieval Pipeline Detail

sequenceDiagram
    participant Agent as Teacher Agent
    participant RP as RetrievalPipeline
    participant GPTm as GPT-4o-mini
    participant PC as Pinecone
    participant Cache as LRU Cache
    participant DB as Postgres

    Agent->>RP: retrieve(query, tier, namespace)
    RP->>GPTm: detect_language(query)
    GPTm-->>RP: "ar"
    RP->>GPTm: translate(query, "ar" → "fr")
    GPTm-->>RP: translated_query
    RP->>PC: query(vector, top_k=10)
    PC-->>RP: raw matches
    RP->>Cache: check rerank cache
    Cache-->>RP: miss
    RP->>GPTm: rerank(query, matches, top_n=3)
    GPTm-->>RP: ranked results
    RP->>Cache: store rerank result
    RP->>DB: fetch chunk text
    DB-->>RP: full text
    RP-->>Agent: context + sources

Multilingual Support (Arabic/French)

To support students asking in Arabic when the curriculum is in French:

  1. Translation Step: If the input language is detected as Arabic, the query is translated to French using the LLM before vector search.
  2. Cross-Lingual Retrieval: Vector search is performed with the French query against the French corpus.
  3. Cross-Lingual Reranking: The reranker is instructed to match the original question (and its translation) against snippets, accounting for language differences.
  4. Arabic Response: The final answer is generated in Arabic using the French context.
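
A sketch of step 1 using the OpenAI chat API, which the sequence diagram above shows the pipeline calling; the prompts here are illustrative stand-ins for the production ones.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def _ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content.strip()

def normalize_query(query: str) -> str:
    """Return a French query for vector search against the French corpus."""
    lang = _ask("Reply with only the ISO 639-1 code of the user's language.", query)
    if lang.lower() == "fr":
        return query
    return _ask("Translate the question into French. Return only the translation.", query)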

State Object (TeacherState)

The agent maintains a state dictionary throughout the lifecycle:

  • question, language, namespace, grade, subject
  • tier, token_balance, hint_level
  • reservation_id: UUID of the active token reservation
  • retrieval_top_k, rerank_top_n: Tier-specific limits
  • matches_selected: The final snippets used
  • context: The formatted string passed to the LLM
  • answer, answer_chars: The generated response
  • cost_tokens: Final token cost after finalization
  • sources: Source references for the answer
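
The same fields written out as a TypedDict; the names come from the list above, while the concrete types are reasonable assumptions.

from typing import TypedDict

class TeacherState(TypedDict, total=False):
    question: str
    language: str
    namespace: str
    grade: str
    subject: str
    tier: str
    token_balance: int
    hint_level: int
    reservation_id: str           # UUID of the active token reservation
    retrieval_top_k: int          # tier-specific limit
    rerank_top_n: int             # tier-specific limit
    matches_selected: list[dict]  # the final snippets used
    context: str                  # formatted string passed to the LLM
    answer: str
    answer_chars: int
    cost_tokens: int              # final token cost after finalization
    sources: list[dict]           # source references for the answer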

Dependency Injection

All services are injected as singletons from app/core/dependencies.py:

  • wallet_service (WalletReservationService)
  • retrieval_pipeline (RetrievalPipeline)
  • supabase_service (Supabase client)
  • openai_client (OpenAI client)
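
One common way to wire such singletons is functools.lru_cache; whether app/core/dependencies.py uses this exact mechanism, and the service import paths and env var names below, are assumptions.

import os
from functools import lru_cache

from openai import OpenAI
from supabase import Client, create_client

@lru_cache(maxsize=1)
def get_openai_client() -> OpenAI:
    return OpenAI()  # reads OPENAI_API_KEY from the environment

@lru_cache(maxsize=1)
def get_supabase_client() -> Client:
    # Env var names are assumptions.
    return create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

@lru_cache(maxsize=1)
def get_wallet_service():
    # Import path is hypothetical; the class name comes from this document.
    from app.services.wallet_service import WalletReservationService
    return WalletReservationService(get_supabase_client())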

Back to Index