BacMR Backend Architecture

Status: ✅ Phases A-F Implemented (Integration In Progress)
Audience: Developers, contributors, and operators
Last updated: 2026-02-17
Implementation Branch: feature/sonnet-impl-20260217-155229

🎯 Implementation Status Summary

Phase	Services	Migrations	API Endpoints	Status
A - Core Schema	✅ Complete	✅ 12-17	⏳ 70%	Services ready, some endpoints need wiring
B - Security	✅ Complete	✅ 18-19	✅ 95%	Auth, wallet, admin working
C - Caching	✅ Complete	N/A	N/A	Internal services ready
D - Retrieval	✅ Complete	N/A	⏳ 60%	Services ready, endpoints need integration
E - Scraper	✅ Complete	N/A	⏳ 60%	Services ready, endpoints need integration
F - Observability	✅ Complete	N/A	✅ 90%	Metrics working, background jobs ready

Overall: Services 95% ✅ | Migrations 100% ✅ | API Integration 70% ⏳

✅ Working End-to-End

Auth (signup, signin, profile, JWT custom claims)
Wallet (balance, reservations, reserve/finalize pattern)
Admin (user management, role updates, admin access control)
Metrics (Prometheus + JSON endpoints)

⏳ Services Ready, Endpoints Need Integration

Chat with RAG retrieval (RetrievalPipeline ready, router needs wiring)
Quiz generation (QuizGeneratorService ready, router needs wiring)
PDF ingestion (IngestionService ready, router needs creation)
Scraper sync (ScraperService ready, router needs wiring)

See: docs/90_ops/implementation_status.md for detailed status

See: docs/90_ops/dependency_injection.md for service wiring pattern

High-Level Architecture Diagram (includes Request ID Propagation & Rate Limiting)
Service Responsibility List
Storage Responsibilities
Postgres Schema Additions
Ingestion Pipeline
Pinecone Index & Metadata Schema
Retrieval → Rerank → Reasoning Pipeline
API Contract
RLS & Auth Plan
Billing & Wallet (Reservation Pattern)
Scraper & Canonicalization
Observability & Resiliency
Security
Testing Matrix
Sonnet Task List

1. High-Level Architecture Diagram

graph TB
    subgraph Client
        U[Student / Teacher / Admin<br/>Next.js Frontend]
    end

    subgraph API_Gateway ["API Gateway (FastAPI)"]
        GW[FastAPI App<br/>CORS · Rate Limit · Auth Middleware]
    end

    subgraph Auth ["Auth Layer"]
        SA[Supabase Auth<br/>JWT + Custom Claims Hook]
    end

    subgraph Core_Services ["Core Services"]
        FU[File Upload Service<br/>Presigned URL → S3/GCS]
        PW[Parser Workers<br/>PDF Extract · OCR · Normalize]
        EW[Embedding Worker<br/>tiktoken chunker · OpenAI embed]
        VA[Vector Adapter<br/>Pinecone upsert/query]
        GM[GPT-mini Service<br/>Rerank · Validate · Detect Language]
        RS[Reasoning Service<br/>LangGraph Teacher Agent · GPT-4o]
        BS[Billing Service<br/>Reserve · Finalize · Reconcile]
    end

    subgraph Data_Stores ["Data Stores"]
        PG[(Supabase Postgres<br/>profiles · wallet · chunks<br/>ingestion_jobs · reservations)]
        PC[(Pinecone<br/>curriculum-1536<br/>vectors + lightweight metadata)]
        S3[(Blob Store<br/>S3 / GCS<br/>Raw PDFs)]
        RC[(Cache<br/>Redis / In-Memory LRU)]
    end

    subgraph Infra ["Infrastructure"]
        SM[Secret Manager<br/>Vault / Cloud KMS]
        MON[Monitoring<br/>Structured Logs · Metrics · Alerts]
        RJ[Reindex Job<br/>Scheduled · DR Export]
    end

    U -->|"HTTPS + JWT"| GW
    GW -->|"Verify JWT"| SA
    GW -->|"Upload PDF"| FU
    GW -->|"/ask, /chat"| RS
    GW -->|"/ingestion/jobs"| PW
    GW -->|"/wallet/*"| BS

    FU -->|"Store raw PDF"| S3
    PW -->|"Extract text"| S3
    PW -->|"Write chunks"| PG
    PW -->|"Request embeddings"| EW
    EW -->|"OpenAI embed API"| OAI_E[OpenAI Embeddings]
    EW -->|"Upsert vectors"| VA
    VA -->|"Upsert/Query"| PC

    RS -->|"Dense search"| VA
    RS -->|"Rerank candidates"| GM
    GM -->|"gpt-4o-mini API"| OAI_M[OpenAI gpt-4o-mini]
    RS -->|"Generate answer"| OAI_C[OpenAI gpt-4o]
    RS -->|"Reserve / finalize tokens"| BS
    RS -->|"Fetch chunk text"| RC
    RC -->|"Cache miss"| PG

    BS -->|"Ledger + Reservations"| PG
    PW -->|"Audit log"| PG

    GW -->|"Read secrets"| SM
    GW -->|"Emit metrics"| MON
    RS -->|"Emit metrics"| MON
    RJ -->|"Export chunks + vectors"| S3
    RJ -->|"Re-embed"| EW

    style PG fill:#336791,color:#fff
    style PC fill:#1a73e8,color:#fff
    style S3 fill:#e47911,color:#fff
    style RC fill:#dc382c,color:#fff

Data-Flow Paths

Path	Flow
Ingestion	Admin → FastAPI → File Upload → S3 → Parser Worker → Embedding Worker → Pinecone + Postgres
Retrieval (Chat)	Student → FastAPI → Auth → Billing Reserve → Pinecone query → Cache/Postgres (chunk text) → GPT-mini rerank → GPT-4o reason → SSE stream → Billing Finalize
Billing	FastAPI → Reserve (Postgres TX) → LLM call → Finalize (Postgres TX) → Ledger entry
Scraping	Admin → FastAPI → Scraper → Canonicalize → Dedupe → Postgres `references`

Request ID Propagation

Every inbound HTTP request receives a request_id (UUID v4) at the API gateway. This ID is the single correlation key across every subsystem — without it, debugging an LLM failure that spans Pinecone, OpenAI, wallet, and audit tables is nearly impossible.

Generation: FastAPI middleware generates request_id = uuid4() at the start of every request (or adopts X-Request-ID from the client/load-balancer if present).

Propagation path:

Component	How `request_id` is used
Structured logs	Every log line includes `request_id` as a top-level JSON field
Wallet / Reservations	`reservations.request_id` and `wallet_ledger.request_id` link billing to the originating request
Usage logs	`usage_logs.request_id` correlates the RAG interaction
OpenAI calls	Passed as `user` parameter in OpenAI API calls (enables cost attribution in OpenAI Dashboard)
Pinecone queries	Logged alongside query parameters for post-hoc debugging
Ingestion audit	`ingestion_audit.request_id` (nullable — only set when triggered via API, not cron)
SSE stream	Returned in the final `done` event: `{"type": "done", "request_id": "uuid", ...}`
Error responses	Every error response body includes `"request_id": "uuid"` so the client can report it

Implementation:
- Middleware sets request.state.request_id.
- A contextvars.ContextVar makes it available to every service layer without passing it explicitly through each call.
- File: app/core/middleware.py
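A minimal sketch of the generate-or-adopt step, written as pure ASGI middleware so it runs without FastAPI installed. Assumptions: the class name RequestIDMiddleware is hypothetical and not the contents of app/core/middleware.py; only well-formed UUIDs are adopted from X-Request-ID (the request_id columns such as reservations.request_id are typed UUID, so arbitrary strings would break inserts); and writing into scope["state"] relies on recent Starlette versions building request.state from that dict.

```python
import contextvars
import uuid
from typing import Optional

# Current request's correlation ID, readable from any layer (services, log filters).
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


def _normalize_uuid(value: str) -> Optional[str]:
    """Return the canonical UUID string, or None if value is not a UUID."""
    try:
        return str(uuid.UUID(value))
    except ValueError:
        return None


class RequestIDMiddleware:
    """Adopt a valid X-Request-ID header or mint a UUID v4, then echo it back."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        headers = dict(scope.get("headers") or [])  # ASGI header names are lowercase bytes
        incoming = headers.get(b"x-request-id", b"").decode("latin-1")
        request_id = _normalize_uuid(incoming) or str(uuid.uuid4())

        token = request_id_var.set(request_id)
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_header(message):
            # Echo the ID so clients and load balancers can correlate logs.
            if message["type"] == "http.response.start":
                extra = [(b"x-request-id", request_id.encode("latin-1"))]
                message = {**message, "headers": list(message.get("headers", [])) + extra}
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            request_id_var.reset(token)  # never leak the ID into the next request
```

With FastAPI this would be registered via app.add_middleware(RequestIDMiddleware); a logging.Filter that copies request_id_var.get() onto each record then puts the ID on every structured log line.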

Rate Limiting

Students will spam refresh, open multiple tabs, and trigger parallel queries. Without rate limiting, a single user can exhaust the platform's OpenAI quota.

Strategy: Per-user (authenticated) rate limit with per-IP fallback for unauthenticated endpoints.

Scope	Limit	Window	Applies to
Per-user (JWT `user_id`)	10 requests	1 minute	`/ask`, `/chat`, `/quizzes/generate`, `/search/semantic`
Per-user (JWT `user_id`)	30 requests	1 minute	`/wallet/*`, `/upload/*`
Per-IP (unauthenticated)	5 requests	1 minute	`/auth/signup`, `/auth/login`
Per-user (admin)	60 requests	1 minute	`/admin/*`, `/ingestion/*`, `/scraping/*`

Enforcement:
- In-memory sliding-window counter (sufficient at single-instance scale).
- Multiple instances: a Redis-backed counter (the same Redis as the cache layer), because in-memory counters are per-process and N instances would otherwise allow up to N× the limit.
- Response on breach: HTTP 429 Too Many Requests with a Retry-After header (seconds until the oldest counted request leaves the window).
- request_id is included in the 429 response body for support debugging.

Response format:

{
  "error": "rate_limited",
  "request_id": "uuid",
  "retry_after": 23,
  "limit": 10,
  "window": "1m"
}

Implementation: FastAPI middleware in app/core/middleware.py (same file as request-ID middleware).
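The in-memory sliding-window counter can be sketched as a per-key log of request timestamps. The class name, the key format ("user:<id>" or "ip:<addr>") and the injectable clock are assumptions for illustration; retry_after is the time until the oldest logged request leaves the window.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Sliding-window (timestamp log) limiter. Single process only."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str) -> Tuple[bool, int]:
        """Record one request for `key`; return (allowed, retry_after_seconds)."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # A slot frees up when the oldest hit leaves the window.
        retry_after = max(1, math.ceil(hits[0] + self.window_s - now))
        return False, retry_after
```

A 10-per-minute limiter that saw ten requests at t=0 answers a request at t=37 with retry_after=23, the value shown in the response body above. The dict never evicts idle keys and lives in one process, which is one more reason the multi-instance case moves to Redis.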

2. Service Responsibility List

2.1 Implemented Services (Phase A-F)

Service	Responsibility	Implementation File	Status
API Gateway	Route requests, CORS, auth middleware, request validation	`app/main.py`, `app/api/routers/`	✅ Working
Request Middleware	Generate/adopt `request_id` (UUID), enforce per-user and per-IP rate limits, inject `request_id` into `contextvars`	`app/core/middleware.py`	✅ Implemented
Auth Service	JWT verification, role extraction from custom claims (`app_metadata.role`), admin guard	`app/core/auth.py`	✅ Working
Dependency Registry	Centralized singleton service instances with proper dependency wiring	`app/core/dependencies.py`	✅ Working
Chunking Service	Token-based chunking (tiktoken), deterministic chunk IDs (`sha256(file_id:page:chunk_index)`), language-specific sizes	`app/services/chunking.py`	✅ Implemented (S1)
Ingestion Service	State machine (queued → ready/failed), retry logic (max 3), audit trail	`app/services/ingestion.py`	✅ Implemented (S2)
Wallet Reservation Service	Reserve tokens (atomic), finalize after LLM, expire stale reservations, reconcile ledger	`app/services/wallet_reservation.py`	✅ Working (S3)
Pinecone Adapter	Upsert/query vectors with lightweight metadata (<1 KB), no full text storage	`app/services/pinecone_adapter.py`	✅ Implemented (S4)
Embedding Service	Generate embeddings (OpenAI), track refs in embedding_refs table, upsert to Pinecone	`app/services/embedding_service.py`	✅ Implemented (S5)
Upload Service	Generate presigned URLs for S3/GCS/Supabase Storage, validate file type/size	`app/services/upload.py`	✅ Implemented (S21)
Cache Service	Dual LRU cache (rerank 15-min TTL, chunk text 1-hour TTL), invalidation on re-ingestion	`app/services/cache.py`	✅ Implemented (S10-S11)
Tier Config	Free/Standard/Premium limits (top-K, rerank-N, tokens), cost estimation	`app/services/tier_config.py`	✅ Implemented (S12)
GPT-mini Service	Rerank candidates, detect language (French/Arabic/Hassaniya), translate queries, validate input, circuit breaker	`app/services/gpt_mini.py`	✅ Implemented (S20)
Retrieval Pipeline	Full flow: detect language → translate → embed → dense search → rerank → fetch chunks	`app/services/retrieval_pipeline.py`	✅ Implemented (Phase D)
Quiz Generator	RAG-based quiz generation with GPT-4o, multiple-choice with explanations and source pages	`app/services/quiz_generator.py`	✅ Implemented (S22)
Circuit Breaker	Protection for OpenAI/Pinecone calls, 3 failures → open, 120s recovery, fallback strategies	`app/services/circuit_breaker.py`	✅ Implemented (S16)
Text Normalizer	Arabic canonicalization (alef unification, tatweel removal, boilerplate removal)	`app/services/text_normalizer.py`	✅ Implemented (S14)
Deduplication Service	SimHash (64-bit) with Hamming distance ≤ 3 for duplicate detection	`app/services/deduplication.py`	✅ Implemented (S13)
Quality Checker	Content quality heuristics (min length, OCR confidence, encoding validation)	`app/services/quality_checker.py`	✅ Implemented (S15)
Scraper Service	Automated pipeline: canonicalize → quality check → dedupe → insert canonical refs	`app/services/scraper_service.py`	✅ Implemented (Phase E)
Monitoring	Structured JSON logging, Prometheus-compatible metrics (counters, histograms, gauges)	`app/core/logging.py`, `app/core/metrics.py`	✅ Implemented (S17)
Config Management	Settings with env vars, defaults for all parameters	`app/core/config.py`	✅ Updated
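The Circuit Breaker row (3 failures → open, 120 s recovery, fallback strategies) describes the classic closed/open/half-open pattern. A minimal synchronous sketch of that pattern follows; the class and method names are hypothetical rather than the API of app/services/circuit_breaker.py, and the real service wraps async OpenAI/Pinecone calls.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback is given (maps to HTTP 503)."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_s = recovery_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery_s:
            return "half_open"  # recovery window elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "open":
            if fallback is None:
                raise CircuitOpenError("circuit open")
            return fallback()
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Re-open on a failed half-open trial, or when closed and the threshold is hit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            if fallback is None:
                raise
            return fallback()
        self.failures = 0  # any success closes the circuit
        self.opened_at = None
        return result
```

After 120 s the next call goes through as a trial: success closes the circuit, failure re-opens it for another 120 s. The sketch does not limit how many concurrent trial calls pass in the half-open state.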

2.2 Background Jobs (Phase F)

Job	Responsibility	Implementation File	Schedule	Status
Reservation Expiry	Expire un-finalized reservations older than 5 min, refund tokens	`scripts/expire_reservations.py`	Continuous (60s loop)	✅ Ready
Wallet Reconciliation	Compare wallet balance with ledger sum, flag discrepancies (no auto-correct)	`scripts/reconcile_wallets.py`	Daily 2 AM (cron)	✅ Ready (S18)
DR Export	Export chunks to NDJSON, upload to blob store for disaster recovery	`scripts/export_chunks.py`	Weekly Sunday 3 AM (cron)	✅ Ready (S19)
Reindex	Re-embed chunks with new model, create new namespace, verify counts	`scripts/reindex.py`	On-demand (manual)	✅ Ready (S19)

2.3 Legacy Services (Pre-existing, Kept for Compatibility)

Service	File	Notes
Legacy Embeddings	`app/services/embeddings.py`	Kept for backward compatibility; new code uses `embedding_service.py`
Legacy Wallet	`app/services/wallet.py`	Kept for backward compatibility; new code uses `wallet_reservation.py`
Legacy Pinecone	`app/services/pinecone_store.py`	Kept; new code uses `pinecone_adapter.py`
Legacy Retrieval	`app/services/retrieval.py`	Kept; new code uses `retrieval_pipeline.py`

GPT-mini Validator/Reranker — Hosting & SLA

Hosted on: OpenAI API (same API key as main models). Model: gpt-4o-mini.
SLA: Same as OpenAI API (99.9% target). No self-hosted fallback needed at current scale.
Fallback: If gpt-4o-mini returns error or latency > 5 seconds:
Reranking: Skip rerank, return dense-retrieval order (graceful degradation).
Language detection: Fall back to simple regex-based Arabic/French detector.
Input validation: fail open for format/validation checks (let the request through), but fail closed for safety checks (reject the request).
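The rerank and language-detection fallbacks can be sketched as follows. The reranker callable (returning candidate indices in ranked order) and the Unicode ranges in the regex detector are assumptions. A script-based regex cannot tell Hassaniya from MSA, so this fallback only returns 'ar' or 'fr'.

```python
import asyncio
import re
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Arabic, Arabic Supplement and Arabic Extended-A blocks.
ARABIC_RE = re.compile(r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF]")
LATIN_RE = re.compile(r"[A-Za-z\u00C0-\u00FF]")


def detect_language_fallback(text: str) -> str:
    """'ar' if Arabic-script characters outnumber Latin letters, else 'fr'."""
    return "ar" if len(ARABIC_RE.findall(text)) > len(LATIN_RE.findall(text)) else "fr"


async def rerank_with_fallback(
    query: str,
    candidates: Sequence[T],
    rerank: Callable[[str, Sequence[T]], Awaitable[List[int]]],
    top_n: int,
    timeout_s: float = 5.0,
) -> List[T]:
    """Reranked top-N, or dense-retrieval order if the reranker fails or is slow."""
    try:
        order = await asyncio.wait_for(rerank(query, candidates), timeout=timeout_s)
        return [candidates[i] for i in order[:top_n]]
    except Exception:  # API error, malformed indices, or timeout
        # Graceful degradation: keep Pinecone's order (log / emit a metric here).
        return list(candidates[:top_n])
```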

3. Storage Responsibilities

What Goes Where

Data	Store	Rationale
Full chunk text	Postgres (`chunks.content`)	Source of truth; enables full-text search; avoids Pinecone 40 KB metadata limit
Embedding vectors (1536-dim)	Pinecone	Optimized for ANN search
Lightweight metadata per vector	Pinecone metadata	Filter fields only: `chunk_id`, `file_id`, `language`, `grade`, `subject`, `source_url`, `page_number`, `ingestion_ts`
Raw PDF files	S3 / GCS (Blob Store)	Archival; enables re-ingestion without re-downloading
User data (profiles, wallets, ledger)	Postgres	Relational, RLS-protected
Ingestion state machine	Postgres (`ingestion_jobs`)	Transactional state with audit trail
Reservation state	Postgres (`reservations`)	Must be atomic with wallet balance
Cached rerank results	Redis / In-memory	Ephemeral; TTL 15 min
Cached chunk text	Redis / In-memory	LRU; TTL 1 hour

Canonical Chunk-Store Approach

┌────────────────┐      ┌──────────────────────┐
│    Pinecone    │      │      Postgres        │
│                │      │                      │
│  vector_id ────┼──────┼→ chunks.chunk_id     │
│  metadata:     │      │  chunks.content      │
│   chunk_id     │      │  chunks.file_id      │
│   file_id      │      │  chunks.page_number  │
│   language     │      │  chunks.token_count  │
│   grade        │      │                      │
│   subject      │      └──────────────────────┘
│   source_url   │
│   page_number  │
│   ingestion_ts │
└────────────────┘

At retrieval time:
1. Query Pinecone → get the chunk_id list.
2. Fetch chunk text from the cache (Redis/LRU); on a miss, query the Postgres chunks table.
3. Pass the text to the reranker and the reasoning model.
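Step 2 is a cache-aside lookup. A sketch, assuming a dict-like cache and a hypothetical fetch_from_db callable that runs one batched query for all misses (for example SELECT chunk_id, content FROM chunks WHERE chunk_id = ANY(:ids)), while keeping Pinecone's ranking order:

```python
from typing import Callable, Dict, List, MutableMapping, Sequence, Tuple


def fetch_chunk_texts(
    chunk_ids: Sequence[str],
    cache: MutableMapping[str, str],
    fetch_from_db: Callable[[List[str]], Dict[str, str]],
) -> List[Tuple[str, str]]:
    """Return (chunk_id, text) pairs in the given (Pinecone) order."""
    found: Dict[str, str] = {}
    misses: List[str] = []
    for cid in chunk_ids:
        text = cache.get(cid)
        if text is None:
            misses.append(cid)
        else:
            found[cid] = text
    if misses:
        # One round-trip for all misses instead of one query per chunk.
        for cid, text in fetch_from_db(misses).items():
            cache[cid] = text  # populate the cache for later requests
            found[cid] = text
    # Keep the ranking order; skip IDs with no Postgres row (e.g. a stale vector).
    return [(cid, found[cid]) for cid in chunk_ids if cid in found]
```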

4. Postgres Schema Additions

`ingestion_jobs`

CREATE TABLE IF NOT EXISTS ingestion_jobs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    reference_id    UUID NOT NULL REFERENCES "references"(id),  -- quoted: REFERENCES is a reserved word
    file_id         UUID,                                       -- FK to documents table if applicable
    status          TEXT NOT NULL DEFAULT 'queued'
                    CHECK (status IN ('queued','parsing','tokenizing',
                           'embedding_request_sent','embedding_upserted',
                           'ready','failed')),
    chunks_created  INT DEFAULT 0,                              -- count of chunks produced
    vectors_upserted INT DEFAULT 0,                             -- count of vectors sent to Pinecone
    retry_count     INT DEFAULT 0,                              -- current retry attempt
    max_retries     INT DEFAULT 3,
    error_message   TEXT,                                       -- last error (nullable)
    created_at      TIMESTAMPTZ DEFAULT now(),
    updated_at      TIMESTAMPTZ DEFAULT now()
);

-- Index for status polling
CREATE INDEX IF NOT EXISTS idx_ingestion_jobs_status ON ingestion_jobs(status);
-- Index for reference lookup
CREATE INDEX IF NOT EXISTS idx_ingestion_jobs_reference ON ingestion_jobs(reference_id);

`chunks` (enhanced)

-- If chunks table already exists, ALTER; otherwise CREATE.
-- This shows the target schema.
CREATE TABLE IF NOT EXISTS chunks (
    chunk_id        TEXT PRIMARY KEY,                           -- sha256(file_id:page:chunk_index)
    file_id         UUID NOT NULL REFERENCES documents(id),
    page_number     INT NOT NULL,
    chunk_index     INT NOT NULL,                              -- position within the page
    content         TEXT NOT NULL,                              -- full chunk text
    token_count     INT NOT NULL,                              -- token count (tiktoken cl100k_base)
    language        TEXT NOT NULL DEFAULT 'fr',                 -- 'fr', 'ar', 'ha' (Hassaniya)
    embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
    ingestion_job_id UUID REFERENCES ingestion_jobs(id),
    created_at      TIMESTAMPTZ DEFAULT now()
);

-- Composite index for idempotency check
CREATE UNIQUE INDEX IF NOT EXISTS idx_chunks_deterministic
    ON chunks(file_id, page_number, chunk_index);

`reservations`

CREATE TABLE IF NOT EXISTS reservations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES auth.users(id),
    estimated       INT NOT NULL,                              -- estimated token cost reserved
    actual          INT,                                       -- actual token cost (set on finalize)
    status          TEXT NOT NULL DEFAULT 'reserved'
                    CHECK (status IN ('reserved','finalized','expired','refunded')),
    request_id      UUID,                                      -- links to usage_logs
    created_at      TIMESTAMPTZ DEFAULT now(),
    finalized_at    TIMESTAMPTZ,
    expires_at      TIMESTAMPTZ DEFAULT now() + INTERVAL '5 minutes'
);

CREATE INDEX IF NOT EXISTS idx_reservations_user ON reservations(user_id);
CREATE INDEX IF NOT EXISTS idx_reservations_status ON reservations(status)
    WHERE status = 'reserved';                                 -- partial index for expiry job

`wallet_ledger` (enhanced — add reservation_id)

-- ALTER existing table
ALTER TABLE wallet_ledger
    ADD COLUMN IF NOT EXISTS reservation_id UUID REFERENCES reservations(id);

`embedding_refs`

CREATE TABLE IF NOT EXISTS embedding_refs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chunk_id        TEXT NOT NULL REFERENCES chunks(chunk_id),
    pinecone_vector_id TEXT NOT NULL,                           -- the ID used in Pinecone
    pinecone_namespace TEXT NOT NULL,                           -- e.g. grade-12-math
    embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
    upserted_at     TIMESTAMPTZ DEFAULT now()
);

CREATE UNIQUE INDEX IF NOT EXISTS idx_embedding_refs_vector
    ON embedding_refs(pinecone_vector_id, pinecone_namespace);

`ingestion_audit`

CREATE TABLE IF NOT EXISTS ingestion_audit (
    id              BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ingestion_job_id UUID NOT NULL REFERENCES ingestion_jobs(id),
    from_status     TEXT,
    to_status       TEXT NOT NULL,
    request_id      UUID,                                      -- set only when triggered via API (NULL for cron)
    message         TEXT,                                      -- error detail or info
    created_at      TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_ingestion_audit_job
    ON ingestion_audit(ingestion_job_id);

`references` enhancements

ALTER TABLE "references"                                      -- quoted: REFERENCES is a reserved word
    ADD COLUMN IF NOT EXISTS content_fingerprint BIGINT,        -- SimHash for dedupe
    ADD COLUMN IF NOT EXISTS canonical_id UUID REFERENCES "references"(id),
    ADD COLUMN IF NOT EXISTS last_checked_at TIMESTAMPTZ,
    ADD COLUMN IF NOT EXISTS ocr_confidence REAL;               -- 0.0–1.0
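content_fingerprint holds a 64-bit SimHash, and the Deduplication Service treats two references as duplicates when the Hamming distance is ≤ 3. The sketch below uses unweighted word tokens and BLAKE2b as the per-token hash; both are assumptions, and the feature choice in app/services/deduplication.py may differ. Because BIGINT is signed, a fingerprint with the top bit set must be mapped to a negative value before storage.

```python
import hashlib
import re

MASK64 = (1 << 64) - 1


def _token_hash64(token: str) -> int:
    # Stable across processes, unlike Python's salted built-in hash().
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash64(text: str) -> int:
    """64-bit SimHash over word tokens (Unicode-aware, so Arabic words count too)."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        h = _token_hash64(token)
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming64(a: int, b: int) -> int:
    """Number of differing bits; also correct for signed BIGINT values."""
    return bin((a ^ b) & MASK64).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 3) -> bool:
    return hamming64(a, b) <= max_distance


def to_pg_bigint(fingerprint: int) -> int:
    """Map an unsigned 64-bit value onto Postgres' signed BIGINT range."""
    return fingerprint - (1 << 64) if fingerprint >= (1 << 63) else fingerprint


def from_pg_bigint(value: int) -> int:
    return value & MASK64
```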

5. Ingestion Pipeline

Deterministic Chunk ID

chunk_id = sha256( file_id + ":" + page_number + ":" + chunk_index )

file_id: UUID from the documents table.
page_number: 0-indexed page from PDF extraction.
chunk_index: 0-indexed position of the chunk within that page.

This ensures that re-ingesting the same file with the same parser produces identical chunk IDs → Pinecone upserts are idempotent (overwrite, no duplicates).
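In Python the formula is a SHA-256 hex digest over the colon-joined key. Normalizing file_id to the canonical lowercase UUID string is an added assumption: without it, the same document passed as an upper-case string would get different chunk IDs and silently break idempotency.

```python
import hashlib
import uuid
from typing import Union


def make_chunk_id(file_id: Union[uuid.UUID, str], page_number: int, chunk_index: int) -> str:
    """sha256(file_id:page_number:chunk_index) as a 64-character hex string.

    page_number and chunk_index are both 0-indexed.
    """
    canonical_file_id = str(uuid.UUID(str(file_id)))  # lowercase, hyphenated form
    key = f"{canonical_file_id}:{page_number}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The same value doubles as the Pinecone vector ID, so a retried upsert overwrites rather than duplicates.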

Token-Based Chunking Strategy

Language	Tokenizer	Chunk Size	Overlap	Notes
French	`tiktoken` / `cl100k_base`	512 tokens	64 tokens	Standard Latin-script tokenization
Arabic (MSA)	`tiktoken` / `cl100k_base`	384 tokens	48 tokens	Arabic tokenizes at ~1.5× expansion; smaller chunks maintain quality
Hassaniya	`tiktoken` / `cl100k_base`	384 tokens	48 tokens	Treated as Arabic-script; same tokenizer with cultural localization
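A sketch of per-page token windows using the sizes from the table. window_tokens is pure; chunk_page assumes the tiktoken package (get_encoding, encode, decode), and falling back to French sizes for unknown languages is an assumption. Because windows cut at token boundaries, a multi-byte character (common in Arabic) can be split across two chunks; tiktoken's decode replaces the broken bytes with U+FFFD by default, so a production chunker may prefer to snap boundaries to whitespace.

```python
from typing import Dict, List, Sequence, Tuple

# (chunk_size, overlap) in tokens per language; 'ha' = Hassaniya.
CHUNK_PARAMS: Dict[str, Tuple[int, int]] = {"fr": (512, 64), "ar": (384, 48), "ha": (384, 48)}


def window_tokens(tokens: Sequence[int], size: int, overlap: int) -> List[Sequence[int]]:
    """Windows of `size` tokens, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows: List[Sequence[int]] = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end of the page
            break
    return windows


def chunk_page(text: str, language: str) -> List[str]:
    """Chunk one page; list position = chunk_index within the page."""
    import tiktoken  # imported lazily so window_tokens works without the package

    enc = tiktoken.get_encoding("cl100k_base")
    size, overlap = CHUNK_PARAMS.get(language, CHUNK_PARAMS["fr"])
    # disallowed_special=() encodes strings like "<|endoftext|>" as text instead of raising.
    tokens = enc.encode(text, disallowed_special=())
    return [enc.decode(list(window)) for window in window_tokens(tokens, size, overlap)]
```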

Ingestion Job State Machine

stateDiagram-v2
    [*] --> queued : POST /ingestion/jobs
    queued --> parsing : Worker picks up job
    parsing --> tokenizing : Text extracted successfully
    parsing --> failed : PDF corrupt / download error
    tokenizing --> embedding_request_sent : Chunks created, embeddings requested
    tokenizing --> failed : Tokenizer error
    embedding_request_sent --> embedding_upserted : OpenAI returns embeddings
    embedding_request_sent --> embedding_request_sent : Transient error (retry ≤ 3)
    embedding_request_sent --> failed : Max retries exceeded
    embedding_upserted --> ready : Pinecone upsert confirmed
    embedding_upserted --> failed : Pinecone upsert error (after retries)
    failed --> queued : Manual retry via admin API
    ready --> [*]
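The diagram can be enforced in code as an allowed-transitions table checked before every status update. This is a sketch with hypothetical names. Pairing it with a compare-and-set UPDATE (UPDATE ingestion_jobs SET status = :to WHERE id = :id AND status = :from) and an ingestion_audit insert in the same transaction would keep two workers from advancing the same job.

```python
from typing import Dict, FrozenSet

# Mirrors the state diagram; keys match the CHECK constraint on ingestion_jobs.status.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "queued": frozenset({"parsing"}),
    "parsing": frozenset({"tokenizing", "failed"}),
    "tokenizing": frozenset({"embedding_request_sent", "failed"}),
    # Self-loop = transient-error retry (bounded by retry_count <= max_retries).
    "embedding_request_sent": frozenset({"embedding_request_sent", "embedding_upserted", "failed"}),
    "embedding_upserted": frozenset({"ready", "failed"}),
    "failed": frozenset({"queued"}),  # manual retry via the admin API
    "ready": frozenset(),  # terminal
}


class InvalidTransition(Exception):
    pass


def check_transition(from_status: str, to_status: str) -> None:
    """Raise InvalidTransition unless the diagram allows from_status -> to_status."""
    if to_status not in ALLOWED.get(from_status, frozenset()):
        raise InvalidTransition(f"{from_status} -> {to_status}")
```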

Retry Semantics

Transient errors (HTTP 429, 500, 503 from OpenAI/Pinecone): Retry with exponential backoff (1 s, 4 s, 16 s). Max 3 retries.
Permanent errors (HTTP 400, invalid PDF): Move to failed immediately; no retry.
Idempotent upserts: Because chunk IDs and Pinecone vector IDs are deterministic, a retry that re-sends the same vectors is safe.
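The retry rules translate to a small wrapper. UpstreamError and call_with_retry are hypothetical names; a real worker would map OpenAI/Pinecone client exceptions to HTTP status codes first. The sleep function is injectable so the 1 s, 4 s, 16 s schedule can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

TRANSIENT_STATUSES = frozenset({429, 500, 503})
BACKOFF_S = (1, 4, 16)  # delay before retry 1, 2 and 3


class UpstreamError(Exception):
    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retry(fn: Callable[[], T], sleep: Callable[[float], None] = time.sleep) -> T:
    """One initial attempt plus up to len(BACKOFF_S) retries on transient errors."""
    for attempt in range(len(BACKOFF_S) + 1):
        try:
            return fn()
        except UpstreamError as exc:
            if exc.status not in TRANSIENT_STATUSES or attempt == len(BACKOFF_S):
                raise  # permanent error or retries exhausted: caller marks the job failed
            sleep(BACKOFF_S[attempt])
    raise AssertionError("unreachable")  # the loop always returns or raises
```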

6. Pinecone Index & Metadata Schema

Index Configuration

Property	Value
Index name	`curriculum-1536`
Dimensions	1536
Metric	cosine
Cloud	Serverless (AWS or GCP)

Namespace Strategy

Format: grade-{grade}-{subject} (e.g., grade-12-math, grade-10-physics). Default namespace: default (for unclassified content).

Vector ID Format

vector_id = chunk_id   (i.e., the same sha256 hash)

Metadata Fields per Vector

{
  "chunk_id": "a1b2c3...",
  "file_id": "uuid-...",
  "language": "fr",
  "grade": "12",
  "subject": "math",
  "source_url": "https://koutoubi.mr/...",
  "page_number": 5,
  "ingestion_ts": "2026-02-17T10:30:00Z"
}

Note: text is NOT stored in Pinecone metadata. Full text lives in Postgres chunks.content.

Recommended Filter Fields

For query(filter=...):
- language: prefilter to the corpus language.
- grade: scope to the student's grade level.
- subject: scope to the subject being studied.
- file_id: useful for admin queries ("show all vectors from this document").
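Building the namespace and metadata filter can be sketched as below; the function names are hypothetical. Values are converted to strings because the metadata example stores grade as "12", and an $eq filter on the number 12 would not match that string.

```python
from typing import Any, Dict, Optional


def namespace_for(grade: Optional[Any], subject: Optional[str]) -> str:
    """grade-{grade}-{subject}, or 'default' for unclassified content."""
    if grade and subject:
        return f"grade-{grade}-{subject}"
    return "default"


def build_filter(language: Optional[str] = None, grade: Optional[Any] = None,
                 subject: Optional[str] = None, file_id: Optional[Any] = None) -> Dict[str, Any]:
    """Pinecone metadata filter with an $eq clause for each provided field."""
    fields = {"language": language, "grade": grade, "subject": subject, "file_id": file_id}
    # Metadata values are stored as strings (e.g. grade "12"), so compare as strings.
    return {name: {"$eq": str(value)} for name, value in fields.items() if value is not None}
```

The result plugs into the Pinecone client as index.query(vector=embedding, top_k=top_k, namespace=ns, filter=flt, include_metadata=True). When the namespace already encodes grade and subject, filtering on them again is redundant but harmless; the filter matters mostly for the default namespace and for file_id lookups.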

7. Retrieval → Rerank → Reasoning Pipeline

flowchart LR
    Q[User Query] --> LD[Language Detect<br/>GPT-mini]
    LD --> TR{Translation<br/>needed?}
    TR -->|Yes| TRANS[Translate to<br/>corpus language]
    TR -->|No| DR
    TRANS --> DR[Dense Retrieval<br/>Pinecone top-K]
    DR --> LP{Lexical<br/>prefilter?}
    LP -->|Arabic query| BM[BM25 Keyword<br/>Filter]
    LP -->|No| CC
    BM --> CC[Rerank Cache Check<br/>sha256 query+ns+tier]
    CC -->|Hit| RES
    CC -->|Miss| FT[Fetch candidate text<br/>Cache → Postgres]
    FT --> RR[Rerank<br/>GPT-mini top-N]
    RR --> CS[Cache Store<br/>TTL 15 min]
    CS --> RES[Fetch top-N chunk text<br/>Cache → Postgres]
    RES --> GEN[Reasoning<br/>GPT-4o + context]
    GEN --> SSE[Stream SSE<br/>to client]

Cost Policy by Tier

Tier	top-K (Dense)	Rerank-N	Reranker	Cache TTL
Free	10	3	gpt-4o-mini	15 min
Standard	20	5	gpt-4o-mini	15 min
Premium	30	8	gpt-4o-mini	15 min

Caching Behavior

Cache	Key	TTL	Invalidation
Rerank results	`sha256(query + namespace + tier)`	15 min	On re-ingestion of any file in the namespace
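Chunk text	`chunk_id`	1 hour	On re-ingestion of the file. chunk_id is derived from position (file_id, page, chunk_index), not content, so changed text keeps the same key and stale entries must be evicted explicitly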
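Both caches can share one TTL-plus-LRU structure. In this sketch (class and function names are hypothetical) the rerank key is prefixed with the namespace, an assumption beyond the table's sha256(query + namespace + tier): a bare hash cannot be traced back to its namespace, so "invalidate on re-ingestion of any file in the namespace" would otherwise mean flushing the whole cache. Hashing a JSON array instead of plain concatenation also keeps ("ab", "c") and ("a", "bc") from colliding.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    """Bounded LRU cache whose entries also expire after ttl_s seconds."""

    def __init__(self, maxsize: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl_s, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, predicate: Callable[[Hashable], bool]) -> int:
        """Delete every key matching predicate; return how many were removed."""
        doomed = [k for k in self._data if predicate(k)]
        for k in doomed:
            del self._data[k]
        return len(doomed)


def rerank_cache_key(query: str, namespace: str, tier: str) -> str:
    digest = hashlib.sha256(
        json.dumps([query, namespace, tier], ensure_ascii=False).encode("utf-8")
    ).hexdigest()
    return f"{namespace}:{digest}"  # prefix enables per-namespace invalidation
```

With ttl_s=15 * 60 for rerank results and ttl_s=60 * 60 for chunk text, this matches the TTLs above (maxsize is a placeholder to tune). After a re-ingestion, invalidate(lambda k: k.startswith(f"{namespace}:")) clears one namespace's rerank entries; chunk-text entries can be evicted by the file's chunk IDs, read from the chunks table.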

8. API Contract

8.1 Auth Endpoints

`POST /auth/signup`

Delegates to Supabase Auth. The handle_new_user database trigger then creates the profiles row and initializes the wallet.

Field	Value
Auth	None (public)
Request	`{ "email": "student@example.mr", "password": "...", "metadata": { "full_name": "Ahmed" } }`
Response 201	`{ "user_id": "uuid", "email": "...", "role": "student" }`
Error 400	`{ "error": "email_already_registered" }`

`POST /auth/login`

Delegates to Supabase Auth, returns JWT.

Field	Value
Auth	None (public)
Request	`{ "email": "...", "password": "..." }`
Response 200	`{ "access_token": "jwt...", "refresh_token": "...", "expires_in": 3600 }`
Error 401	`{ "error": "invalid_credentials" }`

8.2 File Upload

`POST /upload/file`

Field	Value
Auth	Bearer JWT (admin role)
Request	`{ "filename": "math_12.pdf", "content_type": "application/pdf", "grade": "12", "subject": "math", "language": "fr" }`
Response 200	`{ "upload_url": "https://s3.../presigned...", "file_id": "uuid", "expires_in": 300 }`
Error 403	`{ "error": "admin_required" }`
Error 400	`{ "error": "invalid_file_type", "allowed": ["application/pdf"] }`

The client uploads directly to the presigned URL. After upload, the client calls POST /ingestion/jobs.

8.3 Ingestion

`POST /ingestion/jobs`

Field	Value
Auth	Bearer JWT (admin role)
Request	`{ "reference_id": "uuid", "force": false }`
Response 202	`{ "job_id": "uuid", "status": "queued" }`
Error 409	`{ "error": "ingestion_already_in_progress" }`
Error 404	`{ "error": "reference_not_found" }`

force: true re-ingests even if the reference is already ready.

`GET /ingestion/jobs/{id}`

Field	Value
Auth	Bearer JWT (admin role)
Response 200	`{ "job_id": "uuid", "status": "embedding_upserted", "chunks_created": 42, "vectors_upserted": 42, "retry_count": 0, "created_at": "...", "updated_at": "..." }`
Error 404	`{ "error": "job_not_found" }`

8.4 Chat / Ask

`POST /ask`

Field	Value
Auth	Bearer JWT (student/teacher/admin)
Request	`{ "question": "ما هي الترجمة في الرياضيات؟", "grade": "12", "subject": "math", "language": "ar", "stream": true }`
Response 200 (stream)	`text/event-stream` with SSE events: `data: {"token": "...", "type": "content"}` ... `data: {"type": "done", "sources": [...], "tokens_used": 12, "request_id": "uuid"}`
Response 200 (JSON)	`{ "answer": "...", "sources": [{"page": 45, "file": "math_12.pdf", "snippet": "..."}], "tokens_used": 12, "reservation_id": "uuid" }`
Error 402	`{ "error": "insufficient_balance", "balance": 3, "estimated_cost": 5 }`
Error 503	`{ "error": "service_unavailable", "reason": "llm_circuit_open" }`

Internal flow:
1. Pre-validate input (GPT-mini: safety check, language detection).
2. Reserve tokens (POST /wallet/reserve, called internally).
3. Retrieve from Pinecone (dense search).
4. Optionally rerank (GPT-mini).
5. Generate the answer (GPT-4o via LangGraph).
6. Finalize the reservation with actual token usage.
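For stream: true, steps 5 and 6 can be sketched as an async generator that relays model text as SSE frames, finalizes billing, and only then emits the done event, so tokens_used is the billed figure and request_id is included. The deltas iterator and finalize coroutine are hypothetical stand-ins for the LangGraph stream and the wallet finalize step.

```python
import json
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List


def sse_event(payload: Dict[str, Any]) -> str:
    """One SSE frame: a single `data:` line followed by a blank line."""
    # json.dumps escapes newlines in model output, keeping the payload on one line.
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


async def stream_answer(
    deltas: AsyncIterator[str],
    finalize: Callable[[], Awaitable[int]],
    request_id: str,
    sources: List[Dict[str, Any]],
) -> AsyncIterator[str]:
    """Relay model text as `content` events, finalize billing, then emit `done`."""
    async for text in deltas:
        yield sse_event({"type": "content", "token": text})
    tokens_used = await finalize()  # finalize before `done` so the figure is final
    yield sse_event({
        "type": "done",
        "sources": sources,
        "tokens_used": tokens_used,
        "request_id": request_id,
    })
```

If the client disconnects mid-stream, the generator is closed before finalize runs; the reservation then stays reserved until the expiry job refunds it, so an abandoned answer is not billed.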

8.5 Quiz Generation

`POST /quizzes/generate`

Field	Value
Auth	Bearer JWT (student/teacher/admin)
Request	`{ "grade": "12", "subject": "math", "topic": "translations", "num_questions": 5, "language": "fr" }`
Response 200	`{ "quiz_id": "uuid", "questions": [{ "q": "...", "options": ["A","B","C","D"], "correct": "B", "explanation": "...", "source_page": 45 }], "tokens_used": 20 }`
Error 402	`{ "error": "insufficient_balance" }`

8.6 Wallet / Billing

`POST /wallet/reserve`

Field	Value
Auth	Internal (service-to-service; not exposed publicly)
Request	`{ "user_id": "uuid", "estimated": 10, "request_id": "uuid" }`
Response 200	`{ "reservation_id": "uuid", "balance_after_reserve": 40 }`
Error 402	`{ "error": "insufficient_balance", "balance": 3, "estimated": 10 }`

`POST /wallet/finalize`

Field	Value
Auth	Internal (service-to-service)
Request	`{ "reservation_id": "uuid", "actual": 8 }`
Response 200	`{ "reservation_id": "uuid", "status": "finalized", "refunded": 2, "balance_after": 42 }`
Error 404	`{ "error": "reservation_not_found" }`
Error 409	`{ "error": "reservation_already_finalized" }`

`GET /wallet/balance`

Field	Value
Auth	Bearer JWT
Response 200	`{ "user_id": "uuid", "token_balance": 50, "subscription_tier": "free", "pending_reservations": 0 }`

8.7 Semantic Search

`GET /search/semantic`

Field	Value
Auth	Bearer JWT
Query params	`?q=translation&grade=12&subject=math&language=fr&limit=5`
Response 200	`{ "results": [{ "chunk_id": "...", "text": "...", "score": 0.92, "page": 45, "source": "math_12.pdf" }] }`

8.8 Admin Endpoints

`POST /admin/scraping/{source}/sync`

Field	Value
Auth	Bearer JWT (admin role)
Response 200	`{ "run_id": "uuid", "status": "success", "found": 15, "new": 3, "duplicates": 2, "errors": 0 }`

`POST /admin/reindex`

Field	Value
Auth	Bearer JWT (admin role)
Request	`{ "namespace": "grade-12-math", "reason": "model_upgrade" }` (omit namespace to reindex all)
Response 202	`{ "reindex_job_id": "uuid", "status": "queued", "estimated_chunks": 1200 }`

`PATCH /admin/users/{user_id}/role`

Field	Value
Auth	Bearer JWT (admin role)
Request	`{ "role": "teacher" }`
Response 200	`{ "user_id": "uuid", "role": "teacher" }`
Error 400	`{ "error": "invalid_role", "allowed": ["student","teacher","admin"] }`

Error Code Summary

Every error response includes request_id for correlation:

{ "error": "<code>", "request_id": "uuid-...", ... }

HTTP	Code	Meaning
400	`bad_request`	Invalid input
401	`unauthorized`	Missing or invalid JWT
402	`insufficient_balance`	Wallet balance too low
403	`forbidden`	Role not authorized
404	`not_found`	Resource does not exist
409	`conflict`	Duplicate or already-in-progress
429	`rate_limited`	Too many requests (includes `retry_after` seconds)
503	`service_unavailable`	LLM or Pinecone circuit open

9. RLS & Auth Plan

Current State

RLS Phase 2 complete: all public tables have RLS enabled.
Admin auth uses user_metadata.role (not custom claims yet).
x-admin-key still accepted (deprecated).

Target State

Roles in JWT custom claims via Postgres hook (app_metadata.role).
x-admin-key removed entirely.
New tables (ingestion_jobs, reservations, embedding_refs, ingestion_audit) have RLS.

RLS Policy Templates

User-facing tables (profiles, wallet, wallet_ledger, usage_logs, reservations)

-- Users can SELECT their own rows
CREATE POLICY "user_select_own" ON {table}
    FOR SELECT
    USING (auth.uid() = user_id);

-- No INSERT/UPDATE/DELETE via public API
-- (service_role bypasses RLS for backend operations)

System tables (ingestion_jobs, chunks, embedding_refs, ingestion_audit, documents)

ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;

-- Only service_role can access
CREATE POLICY "service_role_only" ON {table}
    FOR ALL
    USING (auth.role() = 'service_role');

Admin tables (references, scrape_runs)

-- Admin can SELECT and INSERT/UPDATE
CREATE POLICY "admin_read_write" ON {table}
    FOR ALL
    USING (
        auth.role() = 'service_role'
        OR (auth.jwt() -> 'app_metadata' ->> 'role') = 'admin'
    );

JWT Custom Claims Migration Checklist

Create Postgres hook function:

CREATE OR REPLACE FUNCTION public.custom_access_token_hook(event jsonb)
RETURNS jsonb LANGUAGE plpgsql STABLE AS $$
DECLARE
    claims jsonb;
    user_role TEXT;
BEGIN
    SELECT role INTO user_role FROM public.profiles
        WHERE user_id = (event->>'user_id')::uuid;

    claims := event->'claims';
    IF user_role IS NOT NULL THEN
        claims := jsonb_set(claims, '{app_metadata,role}', to_jsonb(user_role));
    ELSE
        claims := jsonb_set(claims, '{app_metadata,role}', '"student"');
    END IF;

    event := jsonb_set(event, '{claims}', claims);
    RETURN event;
END;
$$;

-- Grant necessary permissions
GRANT USAGE ON SCHEMA public TO supabase_auth_admin;
GRANT EXECUTE ON FUNCTION public.custom_access_token_hook TO supabase_auth_admin;
REVOKE EXECUTE ON FUNCTION public.custom_access_token_hook FROM authenticated, anon, public;
GRANT SELECT ON TABLE public.profiles TO supabase_auth_admin;

Register in Supabase Dashboard: Authentication → Hooks → "Customize Access Token" → select custom_access_token_hook.
Update RLS policies: Change (auth.jwt() ->> 'role') to (auth.jwt() -> 'app_metadata' ->> 'role') in all admin policies.

Update FastAPI auth dependency:

# In app/core/auth.py — get_current_admin
# Read role from: jwt_payload["app_metadata"]["role"]
# Instead of: jwt_payload["user_metadata"]["role"]

Test with canary user: Create a test admin, verify JWT contains app_metadata.role = "admin", verify all admin endpoints accept the new token.
Remove x-admin-key support: Delete the x-admin-key header check from all routers. Update .env.example to remove ADMIN_API_KEY.
Rollback procedure:
If hook fails: Disable the hook in Supabase Dashboard. JWTs revert to default claims.
Keep user_metadata.role as fallback in get_current_admin for 2 weeks after migration.
Monitor auth error rates; if > 1% increase, rollback.
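The "Update FastAPI auth dependency" step, together with the two-week user_metadata fallback from the rollback procedure, can be sketched as a pure function over the decoded JWT payload. Function names are hypothetical. Two points motivate the design: the top-level role claim in Supabase JWTs is the Postgres role (authenticated), not the application role, and Supabase lets signed-in users update their own user_metadata, which is why the legacy fallback should be switched off (allow_legacy=False) once the window closes.

```python
from typing import Any, Mapping, Optional

VALID_ROLES = frozenset({"student", "teacher", "admin"})


def role_from_claims(payload: Mapping[str, Any], allow_legacy: bool = True) -> Optional[str]:
    """Read the role from app_metadata, optionally falling back to user_metadata."""
    role = (payload.get("app_metadata") or {}).get("role")
    if role is None and allow_legacy:
        role = (payload.get("user_metadata") or {}).get("role")
    # Reject anything that is not one of the known application roles.
    return role if isinstance(role, str) and role in VALID_ROLES else None


def is_admin(payload: Mapping[str, Any], allow_legacy: bool = True) -> bool:
    return role_from_claims(payload, allow_legacy) == "admin"
```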

10. Billing & Wallet (Reservation Pattern)

Reservation Flow

sequenceDiagram
    participant C as Client
    participant API as FastAPI
    participant W as Wallet Service
    participant PG as Postgres
    participant LLM as OpenAI

    C->>API: POST /ask {question}
    API->>W: reserve(user_id, estimated=10)
    W->>PG: BEGIN TX: deduct estimated from wallet, insert reservation
    PG-->>W: reservation_id
    W-->>API: reservation_id, balance_after=40

    API->>LLM: Retrieve + Rerank + Generate
    LLM-->>API: answer (actual_tokens=8)

    API->>W: finalize(reservation_id, actual=8)
    W->>PG: BEGIN TX: update reservation, refund delta (2), insert ledger
    PG-->>W: OK
    W-->>API: finalized, refunded=2

    API-->>C: SSE stream + tokens_used=8

DB Transaction — Reserve

BEGIN;
    -- Deduct estimated amount from wallet
    UPDATE wallet
    SET token_balance = token_balance - :estimated,
        updated_at = now()
    WHERE user_id = :uid
      AND token_balance >= :estimated;
    -- If no row updated → insufficient balance → ROLLBACK

    -- Create reservation record
    INSERT INTO reservations (user_id, estimated, status, request_id, created_at, expires_at)
    VALUES (:uid, :estimated, 'reserved', :request_id, now(), now() + INTERVAL '5 minutes')
    RETURNING id;
COMMIT;

DB Transaction — Finalize

BEGIN;
    -- Mark reservation finalized
    UPDATE reservations
    SET actual = :actual,
        status = 'finalized',
        finalized_at = now()
    WHERE id = :reservation_id
      AND status = 'reserved';
    -- If no row updated → already finalized or expired → ROLLBACK

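    -- Settle against the hold: refund if actual < estimated, deduct extra if
    -- actual > estimated (total charge capped at 2× the estimate, see T19)
    UPDATE wallet
    SET token_balance = token_balance + (:estimated - LEAST(:actual, 2 * :estimated)),
        updated_at = now()
    WHERE user_id = :uid;

    -- Record the amount actually charged, so the ledger matches the wallet
    INSERT INTO wallet_ledger (user_id, delta, reason, request_id, reservation_id)
    VALUES (:uid, -LEAST(:actual, 2 * :estimated), 'agent_chat', :request_id, :reservation_id);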
COMMIT;
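The arithmetic of the reserve and finalize transactions can be checked with a toy in-memory model. This is a sketch, not `app/services/wallet_reservation.py`. `InMemoryWallet` and the exception names are hypothetical, and Python exceptions stand in for ROLLBACK. The 2× cap on overruns follows test T19. The numbers reproduce the sequence diagram: a 50-token wallet reserves 10, the answer costs 8, and 2 tokens come back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class InsufficientBalance(Exception):
    """Reserve failed: the conditional UPDATE matched no row (API returns 402)."""


class ReservationConflict(Exception):
    """Finalize failed: reservation missing, finalized, or expired (API returns 409)."""


@dataclass
class InMemoryWallet:
    balance: int
    reservations: Dict[int, dict] = field(default_factory=dict)
    ledger: List[Tuple[int, str]] = field(default_factory=list)  # (delta, reason)
    next_id: int = 1

    def reserve(self, estimated: int) -> int:
        # Same guard as "WHERE token_balance >= :estimated".
        if self.balance < estimated:
            raise InsufficientBalance()
        self.balance -= estimated
        rid = self.next_id
        self.next_id += 1
        self.reservations[rid] = {"estimated": estimated, "status": "reserved"}
        return rid

    def finalize(self, rid: int, actual: int) -> int:
        """Settle a reservation; returns the refund (negative means an extra charge)."""
        res = self.reservations.get(rid)
        # Same guard as "WHERE status = 'reserved'".
        if res is None or res["status"] != "reserved":
            raise ReservationConflict()
        charged = min(actual, 2 * res["estimated"])  # overrun cap from T19
        refund = res["estimated"] - charged
        self.balance += refund
        res.update(status="finalized", actual=actual)
        self.ledger.append((-charged, "agent_chat"))
        return refund


if __name__ == "__main__":
    wallet = InMemoryWallet(balance=50)
    rid = wallet.reserve(10)
    print(wallet.balance)           # 40
    print(wallet.finalize(rid, 8))  # 2
    print(wallet.balance)           # 42
```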

Expiry Job (Background)

Runs every 60 seconds:

BEGIN;
    -- Find expired, un-finalized reservations
    UPDATE reservations
    SET status = 'expired'
    WHERE status = 'reserved'
      AND expires_at < now()
    RETURNING id, user_id, estimated;

    -- For each expired reservation, return the held amount to the wallet
    UPDATE wallet
    SET token_balance = token_balance + :estimated,
        updated_at = now()
    WHERE user_id = :uid;

    -- Audit row only: the hold was never written to the ledger,
    -- so the release carries delta 0 to keep reconciliation balanced
    INSERT INTO wallet_ledger (user_id, delta, reason, reservation_id)
    VALUES (:uid, 0, 'reservation_expired', :reservation_id);
COMMIT;
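The expiry job is deployed as a continuous service rather than a cron entry, so the process itself has to drive the 60-second cadence. Below is one possible shape. It assumes an asyncio process and a hypothetical `expire_once` coroutine that runs the statements above. `scripts/expire_reservations.py` may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodically(job: Callable[[], Awaitable[None]],
                           interval_s: float,
                           stop: asyncio.Event) -> None:
    """Run `job` every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            await job()
        except Exception as exc:
            # Keep the loop alive; a failed run is retried on the next tick.
            # Real code would log this through the structured logger.
            print(f"expiry run failed: {exc!r}")
        try:
            # Sleep for the interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


# Usage inside the service's entry point (expire_once is hypothetical):
#     stop = asyncio.Event()
#     await run_periodically(expire_once, 60, stop)
```

Waiting on the stop event instead of calling `asyncio.sleep` lets a shutdown signal end the loop without waiting out the remaining interval.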

Reconciliation (Nightly)

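-- Compare ledger sum vs wallet balance. Amounts held by active reservations
-- are already deducted from the wallet but not yet in the ledger, so add them back.
SELECT
    w.user_id,
    w.token_balance AS current_balance,
    COALESCE(l.ledger_sum, 0) AS ledger_sum,
    w.token_balance + COALESCE(r.held, 0) - COALESCE(l.ledger_sum, 0) AS discrepancy
FROM wallet w
LEFT JOIN (SELECT user_id, SUM(delta) AS ledger_sum
           FROM wallet_ledger GROUP BY user_id) l ON l.user_id = w.user_id
LEFT JOIN (SELECT user_id, SUM(estimated) AS held
           FROM reservations WHERE status = 'reserved' GROUP BY user_id) r ON r.user_id = w.user_id
WHERE w.token_balance + COALESCE(r.held, 0) <> COALESCE(l.ledger_sum, 0);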

Flag any non-zero discrepancy as an alert. Do not auto-correct; require manual investigation.
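The nightly comparison can be prototyped in Python against exported rows, for example in a unit test for `scripts/reconcile_wallets.py`. This is a sketch with hypothetical names. As an assumption beyond the original query, it also adds back tokens held by reservations still in `reserved` status. The reserve transaction deducts those tokens from the wallet without writing a ledger row, so a run that overlaps an in-flight chat would otherwise report a false discrepancy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def find_discrepancies(
    balances: Dict[str, int],
    ledger: Iterable[Tuple[str, int]],
    active_holds: Iterable[Tuple[str, int]] = (),
) -> Dict[str, int]:
    """Return {user_id: discrepancy} for every wallet that disagrees with its ledger.

    balances:     current wallet.token_balance per user
    ledger:       (user_id, delta) rows from wallet_ledger
    active_holds: (user_id, estimated) for reservations still in 'reserved' status
    """
    ledger_sum: Dict[str, int] = defaultdict(int)
    for user_id, delta in ledger:
        ledger_sum[user_id] += delta
    held: Dict[str, int] = defaultdict(int)
    for user_id, estimated in active_holds:
        held[user_id] += estimated

    flagged: Dict[str, int] = {}
    for user_id, balance in balances.items():
        # Held tokens left the wallet but have no ledger row yet, so add them back.
        diff = balance + held[user_id] - ledger_sum[user_id]
        if diff != 0:  # any non-zero drift is reported, never auto-corrected
            flagged[user_id] = diff
    return flagged
```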

11. Scraper & Canonicalization

Pipeline (Fully Automated)

flowchart TD
    SC[Scraper Fetches Sitemap] --> DL[Download PDF]
    DL --> NORM[Canonicalize Text]
    NORM --> FP[Compute SimHash Fingerprint]
    FP --> DD{Hamming Distance ≤ 3<br/>from existing?}
    DD -->|Yes| DUP[Mark as duplicate<br/>link canonical_id]
    DD -->|No| QC{Quality Check}
    QC -->|Pass| STORE[Insert into references<br/>status: discovered]
    QC -->|Fail| LOG[Log to ingestion_audit<br/>reason: quality_failed]
    DUP --> DONE[Done]
    STORE --> DONE
    LOG --> DONE

Canonicalization Steps

Whitespace normalization: Collapse runs of spaces, tabs, and newlines into a single space; trim leading and trailing whitespace.
Arabic script normalization:
Unify alef variants: أ إ آ ا → ا
Remove tatweel (kashida): ـ → (empty)
Normalize taa marbuta: ة → ه (context-dependent, configurable)
Normalize hamza: ؤ ئ → و ي + hamza (optional, configurable)
Boilerplate removal: Per-source regex patterns (configurable in scraper_config.json):
Remove page headers/footers matching known patterns (e.g., "Page X of Y", site watermarks).
Non-content page filtering: Skip pages with < 50 characters after normalization.
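Below is a compact Python sketch of the deterministic steps above, assuming plain Unicode text as input. The function names are hypothetical and are not the API of `app/services/text_normalizer.py`. The sketch leaves out the optional hamza step and the per-source boilerplate regexes. Taa marbuta folding is off by default because the list marks it as context-dependent.

```python
import re

# Alef with hamza above/below and alef with madda, folded to bare alef (U+0627).
_ALEF_FOLD = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})
_TATWEEL = "\u0640"       # kashida, removed entirely
_TAA_MARBUTA = "\u0629"   # taa marbuta
_HAA = "\u0647"           # haa
_WHITESPACE = re.compile(r"\s+")


def canonicalize(text: str, fold_taa_marbuta: bool = False) -> str:
    text = text.translate(_ALEF_FOLD)
    text = text.replace(_TATWEEL, "")
    if fold_taa_marbuta:  # context-dependent, so opt-in
        text = text.replace(_TAA_MARBUTA, _HAA)
    # Collapse spaces, tabs and newlines into single spaces; trim the ends.
    return _WHITESPACE.sub(" ", text).strip()


def is_content_page(page_text: str, min_chars: int = 50) -> bool:
    """Non-content filter: keep pages with at least 50 characters after normalization."""
    return len(canonicalize(page_text)) >= min_chars
```

Because the steps are pure string transforms, the same input always yields the same canonical text, which is what the SimHash fingerprint downstream relies on.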

Deduplication

Algorithm: SimHash (64-bit) on the normalized full text of the PDF.
Threshold: Hamming distance ≤ 3 → considered duplicate.
Storage: references.content_fingerprint stores the SimHash value.
Canonical reference: references.canonical_id (self-FK) points to the first-discovered version.
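A 64-bit SimHash with Hamming-distance comparison fits in a few lines. This sketch rests on three assumptions:

- Tokens are whitespace-separated words. Production code may hash shingles instead.
- Each token is hashed with 8-byte BLAKE2b. Python's built-in `hash()` is salted per process, so it cannot produce stable stored fingerprints.
- `content_fingerprint` is a signed BIGINT column, which is why the sketch includes a `to_signed64` helper.

None of these names come from `app/services/deduplication.py`.

```python
import hashlib


def _token_hash(token: str) -> int:
    # Stable 64-bit hash of one token.
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash64(text: str) -> int:
    """64-bit SimHash; repeated tokens count once per occurrence."""
    weights = [0] * 64
    for token in text.split():
        h = _token_hash(token)
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, threshold: int = 3) -> bool:
    return hamming(a, b) <= threshold


def to_signed64(x: int) -> int:
    # Postgres BIGINT is signed; map the unsigned fingerprint into its range.
    return x - (1 << 64) if x >= (1 << 63) else x
```

XOR-and-popcount gives the Hamming distance directly. The `≤ 3` threshold is therefore one integer comparison per candidate.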

Provenance Metadata

Every references row contains:

Field	Purpose
`source_url`	Canonical URL (after redirect resolution)
`discovered_at`	First time scraper found this PDF
`last_checked_at`	Last time scraper verified URL is live
`content_fingerprint`	SimHash for deduplication
`scrape_run_id`	Which scrape run discovered it
`canonical_id`	Points to canonical (non-duplicate) reference

Content Quality Heuristics

Check	Threshold	Action
Minimum text length (per page)	≥ 200 chars after normalization	Skip page, log reason
OCR confidence (Arabic/Hassaniya)	≥ 0.70	Flag for review if below
OCR confidence (French)	≥ 0.80	Flag for review if below
Encoding	Valid UTF-8	Reject and log
File size	≤ 100 MB	Reject oversized files
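The heuristics table maps onto a single check function. In this sketch, the language codes, the MiB reading of "100 MB", and the `QualityReport` shape are assumptions. Whitespace collapsing stands in for the full canonicalization step. `app/services/quality_checker.py` may organize this differently.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

MAX_FILE_BYTES = 100 * 1024 * 1024  # "100 MB", read as MiB (assumption)
MIN_PAGE_CHARS = 200
# Hypothetical language codes mapped to the OCR thresholds in the table.
OCR_MIN_CONFIDENCE = {"ar": 0.70, "hassaniya": 0.70, "fr": 0.80}


@dataclass
class QualityReport:
    accepted: bool = True        # False means the whole file is rejected
    needs_review: bool = False   # low OCR confidence
    kept_pages: List[str] = field(default_factory=list)
    reasons: List[str] = field(default_factory=list)  # logged to ingestion_audit


def check_quality(file_size: int, page_bytes: List[bytes], language: str,
                  ocr_confidence: Optional[float] = None) -> QualityReport:
    report = QualityReport()
    if file_size > MAX_FILE_BYTES:
        report.accepted = False
        report.reasons.append("file_too_large")
        return report
    for i, raw in enumerate(page_bytes):
        try:
            text = raw.decode("utf-8")  # extracted text must be valid UTF-8
        except UnicodeDecodeError:
            report.accepted = False
            report.reasons.append(f"page {i}: invalid_utf8")
            return report
        text = re.sub(r"\s+", " ", text).strip()
        if len(text) < MIN_PAGE_CHARS:
            report.reasons.append(f"page {i}: too_short")  # skip page, log reason
            continue
        report.kept_pages.append(text)
    threshold = OCR_MIN_CONFIDENCE.get(language)
    if ocr_confidence is not None and threshold is not None and ocr_confidence < threshold:
        report.needs_review = True
        report.reasons.append("low_ocr_confidence")
    return report
```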

12. Observability & Resiliency

Metrics to Emit

Metric	Type	Labels	Purpose
`ingestion_job_duration_seconds`	Histogram	`status`, `language`	Track ingestion performance
`ingestion_job_status_total`	Counter	`status`	Track job outcomes
`pinecone_query_duration_seconds`	Histogram	`namespace`	Vector search latency
`pinecone_upsert_duration_seconds`	Histogram	`namespace`	Upsert latency
`openai_request_duration_seconds`	Histogram	`model`, `endpoint`	LLM call latency
`openai_tokens_used_total`	Counter	`model`, `type` (input/output)	Cost tracking
`wallet_reservation_total`	Counter	`status` (reserved/finalized/expired)	Billing flow health
`wallet_balance_discrepancy`	Gauge	—	Reconciliation drift
`circuit_breaker_state`	Gauge	`service` (openai/pinecone)	0=closed, 1=open, 2=half-open
`http_request_duration_seconds`	Histogram	`method`, `path`, `status`	API latency
`rerank_cache_hit_ratio`	Gauge	—	Cache effectiveness
`active_reservations`	Gauge	—	Currently reserved, un-finalized
`rate_limit_rejected_total`	Counter	`scope` (user/ip), `path`	Rate-limit enforcement activity

Request ID propagation (not a metric): all log lines, wallet rows, usage_logs entries, and error responses include `request_id`.

Circuit Breaker Configuration

Service	Failure threshold	Window	Recovery timeout	Fallback
OpenAI Embeddings	3 failures	60 s	120 s	Queue job for later retry
OpenAI gpt-4o-mini (rerank)	3 failures	60 s	120 s	Skip reranking; use dense order
OpenAI gpt-4o (reasoning)	3 failures	60 s	120 s	Return 503 to client
Pinecone (query)	3 failures	60 s	120 s	Return 503 to client
Pinecone (upsert)	3 failures	60 s	120 s	Queue for retry
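The breaker behavior in the table can be sketched as a small state machine: 3 failures inside 60 s open the circuit, and a trial call is allowed after 120 s. The clock is injectable so the sketch is testable. This is a single-threaded illustration, not `app/services/circuit_breaker.py`. A concurrent version would need a lock and should admit only one trial call while half-open. `rerank_or_dense` illustrates the rerank fallback row. The other services would plug in their own fallbacks: queue, 503.

```python
import time
from typing import Any, Callable, List, Optional

# Values for the circuit_breaker_state gauge listed in the metrics table.
STATE_GAUGE = {"closed": 0, "open": 1, "half_open": 2}


class CircuitOpenError(Exception):
    """Raised instead of calling the service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, window_s: float = 60.0,
                 recovery_timeout_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.window_s = window_s
        self.recovery_timeout_s = recovery_timeout_s
        self.clock = clock
        self.state = "closed"
        self._failures: List[float] = []  # timestamps of recent failures
        self._opened_at: Optional[float] = None

    def _refresh_state(self) -> str:
        # After the recovery timeout, let a trial call through (half-open).
        if (self.state == "open" and self._opened_at is not None
                and self.clock() - self._opened_at >= self.recovery_timeout_s):
            self.state = "half_open"
        return self.state

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._refresh_state() == "open":
            raise CircuitOpenError("circuit is open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success (including the half-open trial) closes the circuit.
        self.state = "closed"
        self._failures.clear()
        return result

    def _record_failure(self) -> None:
        now = self.clock()
        if self.state == "half_open":
            self._open(now)  # trial call failed: reopen immediately
            return
        # Keep only failures inside the sliding window, then add this one.
        self._failures = [t for t in self._failures if now - t < self.window_s]
        self._failures.append(now)
        if len(self._failures) >= self.failure_threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "open"
        self._opened_at = now
        self._failures.clear()


def rerank_or_dense(breaker: CircuitBreaker, rerank: Callable[[list], list],
                    candidates: list) -> list:
    """Rerank fallback: on failure or an open circuit, keep dense-retrieval order."""
    try:
        return breaker.call(rerank, candidates)
    except Exception:
        return candidates
```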

Alert Thresholds

Alert	Condition	Severity
High ingestion failure rate	> 20% of jobs in `failed` state in last hour	Critical
Wallet discrepancy detected	Any non-zero discrepancy in reconciliation	Warning
Circuit breaker opened	Any circuit breaker transitions to `open`	Critical
High reservation expiry rate	> 10% of reservations expiring (not finalized) in last hour	Warning
OpenAI latency spike	p99 > 30 seconds for any model	Warning
Pinecone latency spike	p99 > 5 seconds	Warning
Stale reservations	> 50 reservations in `reserved` status older than 5 min	Warning
Single user rate-limited repeatedly	Same user_id rate-limited > 20 times in 5 min	Warning (potential abuse)

Reindex & Disaster Recovery

Reindex Strategy

Export all canonical chunks from the Postgres chunks table.
Re-embed using the new model.
Upsert to a new Pinecone namespace (e.g., grade-12-math-v2).
Swap the active namespace in config once verification passes.
Delete the old namespace.

Disaster Recovery

Blob store: Raw PDFs are archived in S3/GCS, so the corpus can be re-ingested from scratch.
Postgres: Supabase provides automatic daily backups + point-in-time recovery.
Pinecone: If Pinecone data is lost, re-embed from the Postgres chunks table (canonical chunks are the source of truth).
Export schedule: Weekly export of chunks table to blob store as Parquet/CSV for offline recovery.

13. Security

Secrets Management

Secret	Current	Target
`OPENAI_API_KEY`	`.env` file	Cloud Secret Manager (GCP/AWS/Azure) or Vault
`PINECONE_API_KEY`	`.env` file	Cloud Secret Manager
`SUPABASE_SERVICE_KEY`	`.env` file	Cloud Secret Manager
`ADMIN_API_KEY`	`.env` file	Remove entirely (replace with JWT admin role)

Rules:
- In production, secrets MUST NOT be stored in plain environment variables or .env files.
- Use the deployment platform's secret injection (e.g., Render's Environment Groups, GCP Secret Manager, AWS Secrets Manager).
- Rotate keys quarterly. Automate rotation where possible.
- SUPABASE_SERVICE_KEY should only be available to the backend service, never to the frontend.

PII Redaction

Before sending any user data to OpenAI (chat, rerank, quiz generation):
Strip email addresses (regex: \S+@\S+\.\S+).
Strip phone numbers (regex: Mauritanian format +222...).
Do NOT send user_id or wallet balance to OpenAI.
Log redacted versions in usage_logs.
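The two regex steps can be sketched as follows. The email pattern is the one given above. The phone pattern is an assumption: it targets Mauritanian numbers written as `+222` or `00222` followed by eight digits, with optional space, dot, or dash separators. Check it against real traffic before relying on it. Keeping `user_id` and wallet data out of prompts depends on what the caller puts in the payload, so that step is not shown.

```python
import re

EMAIL_RE = re.compile(r"\S+@\S+\.\S+")  # pattern from the list above
# Assumption: +222 / 00222, then 8 digits with optional single separators.
PHONE_RE = re.compile(r"(?:\+|\b00)222[\s.-]?(?:\d[\s.-]?){7}\d\b")


def redact_pii(text: str) -> str:
    """Replace emails and Mauritanian phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)  # emails first, so digits inside them are gone
    return PHONE_RE.sub("[PHONE]", text)
```

The greedy `\S+` in the email pattern also swallows trailing punctuation attached to the address, such as a comma. Over-redacting in that way is the safer direction for a prompt filter.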

Audit Logging

Event	Table	Fields
User login	Supabase Auth logs (built-in)	timestamp, user_id, IP
Admin action (role change, sync, reindex)	`ingestion_audit` or dedicated `admin_audit`	admin_user_id, action, target, timestamp
Wallet mutation	`wallet_ledger`	user_id, delta, reason, request_id, reservation_id
Ingestion state change	`ingestion_audit`	job_id, from_status, to_status, message
RLS policy violation	Postgres logs	query, user, table, policy

14. Testing Matrix

RLS Tests

#	Test Case	Table	User	Operation	Expected
T1	Student reads own profile	`profiles`	authenticated (student)	SELECT WHERE user_id = self	Allowed
T2	Student reads other profile	`profiles`	authenticated (student)	SELECT WHERE user_id = other	Denied (0 rows)
T3	Student reads own wallet	`wallet`	authenticated (student)	SELECT WHERE user_id = self	Allowed
T4	Student updates own wallet	`wallet`	authenticated (student)	UPDATE	Denied
T5	Anonymous reads profiles	`profiles`	anon	SELECT	Denied (0 rows)
T6	Service role reads all	`wallet`	service_role	SELECT	Allowed (all rows)
T7	Admin reads references	`references`	authenticated (admin)	SELECT	Allowed
T8	Student reads references	`references`	authenticated (student)	SELECT	Denied (0 rows)
T9	Student reads ingestion_jobs	`ingestion_jobs`	authenticated (student)	SELECT	Denied (0 rows)
T10	Student reads own reservations	`reservations`	authenticated (student)	SELECT WHERE user_id = self	Allowed
T11	Student reads other reservations	`reservations`	authenticated (student)	SELECT WHERE user_id = other	Denied (0 rows)

Ingestion Idempotency Tests

#	Test Case	Expected
T12	Ingest same file twice	Second run produces same chunk_ids; Pinecone vector count unchanged; no duplicate chunks in Postgres
T13	Ingest file, modify content, re-ingest	New chunk_ids generated; old vectors replaced; old chunks marked stale
T14	Ingestion fails mid-embedding	Job status = `failed`; partial chunks cleaned up; re-trigger starts fresh
T15	Concurrent ingestion of same reference	Second job returns 409 conflict

Reservation / Billing Tests

#	Test Case	Expected
T16	Reserve with sufficient balance	Balance decremented; reservation created with status `reserved`
T17	Reserve with insufficient balance	402 error; balance unchanged; no reservation created
T18	Finalize with actual < estimated	Delta refunded to wallet; ledger entry = -actual
T19	Finalize with actual > estimated (capped)	Additional deduction from wallet (capped at 2× estimate); ledger entry = -actual
T20	Reservation expires (not finalized)	Expiry job refunds estimated amount; reservation status = `expired`
T21	Double-finalize same reservation	Second call returns 409; no double-deduction
T22	Reconciliation detects discrepancy	Alert fired; no auto-correction

Rerank & Caching Tests

#	Test Case	Expected
T23	Same query within TTL	Second call hits cache; no GPT-mini call
T24	Same query after TTL expires	Cache miss; GPT-mini called; new cache entry
T25	Re-ingestion invalidates cache	After re-ingestion of file in namespace, cached reranks for that namespace are evicted

Circuit Breaker Tests

#	Test Case	Expected
T26	OpenAI returns 500 three times	Circuit opens; rerank calls fall back to dense order; after 120 s the circuit goes half-open, and a successful call closes it
T27	Pinecone times out during ingestion	Job retries (up to 3); if all fail, circuit opens; ingestion jobs queued

Rate Limiting & Request ID Tests

#	Test Case	Expected
T28	Student sends 11 `/ask` requests in 1 minute	First 10 succeed; 11th returns 429 with `retry_after` and `request_id`
T29	Unauthenticated IP sends 6 `/auth/login` in 1 minute	First 5 succeed; 6th returns 429
T30	Admin sends 61 `/admin/*` requests in 1 minute	First 60 succeed; 61st returns 429
T31	Every successful `/ask` response includes `request_id`	`request_id` present in SSE `done` event and in JSON response
T32	Every error response includes `request_id`	400, 401, 402, 403, 429, 503 responses all contain `request_id` field
T33	`request_id` propagates to `reservations` and `wallet_ledger`	After a chat, `reservations.request_id` and `wallet_ledger.request_id` match the API response's `request_id`
T34	Client-provided `X-Request-ID` is adopted	If client sends `X-Request-ID: custom-uuid`, the response and logs use that same ID
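The behavior tested by T28 to T30 and T34 can be sketched with an in-process sliding-window limiter and a request-ID helper. The class and function names and the 128-character cap on client-supplied IDs are hypothetical. `app/core/middleware.py` may differ. A deployment with several workers would need a shared store (for example Redis) rather than per-process memory.

```python
import time
import uuid
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

MAX_REQUEST_ID_LEN = 128  # hypothetical cap on client-supplied IDs


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds); key is a user_id or client IP."""
        now = self.clock()
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop requests that left the window
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0.0
        # Rejected: the oldest hit in the window decides when a slot frees up.
        return False, self.window_s - (now - hits[0])


def resolve_request_id(header_value: Optional[str]) -> str:
    """T34: adopt a client-supplied X-Request-ID, otherwise mint a UUID4."""
    if header_value and len(header_value) <= MAX_REQUEST_ID_LEN:
        return header_value
    return str(uuid.uuid4())
```

A rejected `check` supplies the `retry_after` value for the 429 body, and the resolved ID is echoed in that same body, as T28 expects.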

Integration / E2E Tests

#	Test Case	Expected
T35	Full chat flow (reserve → retrieve → rerank → answer → finalize)	Correct answer returned; wallet balance = original - actual; ledger entry exists; all rows share same `request_id`
T36	Arabic query against French corpus	Translation occurs; relevant French chunks retrieved; answer in Arabic
T37	Admin triggers scrape → ingest → search	New references discovered; ingestion completes; semantic search returns results from new content

15. Sonnet Task List (Implementation Status)

Priority 1: Correctness & Data Integrity

#	Title	Implementation Status	Files Created	Testing Status
S1	Deterministic chunk IDs	✅ COMPLETE	`app/services/chunking.py`, migration 13	⏳ Needs integration testing
S2	Ingestion jobs state machine	✅ COMPLETE	`app/services/ingestion.py`, migration 12	✅ Wired to admin router
S3	Reservation-based billing	✅ COMPLETE & TESTED	`app/services/wallet_reservation.py`, migration 14, `scripts/expire_reservations.py`	✅ Working in wallet router
S4	Lightweight Pinecone metadata	✅ COMPLETE	`app/services/pinecone_adapter.py`	⏳ Needs retrieval testing
S5	Embedding refs tracking	✅ COMPLETE	`app/services/embedding_service.py`, migration 15	⏳ Needs ingestion testing
S21	Presigned upload service	✅ COMPLETE	`app/services/upload.py`	⏳ Needs router implementation

Priority 2: Security & RLS Hardening

#	Title	Implementation Status	Files Created	Testing Status
S6	JWT custom claims hook	✅ COMPLETE	Migration 18, `app/core/auth.py` updated	⏳ Hook needs manual registration in Dashboard
S7	Remove x-admin-key support	⏳ DEPRECATED (warnings added)	`app/core/auth.py` updated	✅ Logs deprecation warnings
S8	RLS for new tables	✅ COMPLETE	Migration 16	⏳ Needs database migration run
S9	Secrets management	⏳ PARTIAL (config updated)	`app/core/config.py`, `.env.example`	⏳ Production deployment needed
S9b	Request-ID + rate limiting	✅ COMPLETE	`app/core/middleware.py`	⏳ Needs production testing

Priority 3: Caching & Cost Control

#	Title	Implementation Status	Files Created	Testing Status
S10	Rerank result caching	✅ COMPLETE	`app/services/cache.py`	⏳ Needs retrieval pipeline testing
S11	Chunk text cache	✅ COMPLETE	`app/services/cache.py`	⏳ Needs retrieval pipeline testing
S12	Tier-based retrieval limits	✅ COMPLETE	`app/services/tier_config.py`	⏳ Needs retrieval pipeline testing

Priority 4: Scraper Hardening

#	Title	Implementation Status	Files Created	Testing Status
S13	SimHash deduplication	✅ COMPLETE	`app/services/deduplication.py`, migration 17	⏳ Needs scraper integration testing
S14	Arabic canonicalization	✅ COMPLETE	`app/services/text_normalizer.py`	⏳ Needs scraper testing
S15	Quality heuristics	✅ COMPLETE	`app/services/quality_checker.py`	⏳ Needs scraper testing

Priority 5: Observability & DR

#	Title	Implementation Status	Files Created	Testing Status
S16	Circuit breaker	✅ COMPLETE	`app/services/circuit_breaker.py`	⏳ Needs failure simulation testing
S17	Structured logging & metrics	✅ COMPLETE	`app/core/logging.py`, `app/core/metrics.py`, `app/api/routers/metrics.py`	✅ Metrics endpoints working
S18	Wallet reconciliation job	✅ COMPLETE	`scripts/reconcile_wallets.py`	⏳ Ready to run (needs cron setup)
S19	Reindex & DR export	✅ COMPLETE	`scripts/reindex.py`, `scripts/export_chunks.py`	⏳ Ready to run (needs cron setup)

Priority 6: API & Integration

#	Title	Implementation Status	Files Created	Testing Status
S20	GPT-mini service	✅ COMPLETE	`app/services/gpt_mini.py`	⏳ Needs retrieval pipeline testing
S21	Presigned upload endpoint	✅ COMPLETE	`app/services/upload.py`	⏳ Needs router implementation
S22	Quiz generation	✅ COMPLETE	`app/services/quiz_generator.py`, `app/api/routers/quiz.py`	⏳ Needs dependency injection + testing

Integration & Fixes (Post-Sonnet)

#	Title	Implementation Status	Files Created/Modified	Testing Status
I1	Dependency injection pattern	✅ COMPLETE & WORKING	`app/core/dependencies.py`	✅ Used by wallet, admin routers
I2	Wallet router integration	✅ COMPLETE & WORKING	`app/api/routers/wallet.py` (updated)	✅ Balance, reservations tested
I3	Admin router fixes	✅ COMPLETE & WORKING	`app/api/routers/admin.py` (fixed)	✅ Users, roles tested
I4	Auth fixes	✅ COMPLETE & WORKING	`app/core/auth.py` (fixed)	✅ JWT flow tested
I5	Chat router integration	⏳ TODO	`app/api/routers/chat.py` (stub exists)	⏳ Needs retrieval_pipeline wiring
I6	Quiz router integration	⏳ TODO	`app/api/routers/quiz.py` (stub exists)	⏳ Needs quiz_generator wiring
I7	Ingestion router creation	⏳ TODO	`app/api/routers/ingestion.py` (missing)	⏳ Needs creation
I8	Scraper router integration	⏳ TODO	`app/api/routers/scraper_admin.py` (stub exists)	⏳ Needs scraper_service wiring

Appendix: Prioritization Rules

The Sonnet task list is ordered by these rules:

Correctness first (S1–S5): Fix ingestion idempotency, billing atomicity, and data integrity. Without these, the system produces duplicates and loses revenue.
Security second (S6–S9): Harden auth and RLS. Without these, students can access admin data or bypass billing.
Cost control third (S10–S12): Add caching and tier enforcement. Without these, the platform overspends on LLM calls.
Scraper quality fourth (S13–S15): Add dedupe and canonicalization. Without these, the vector index contains duplicates and noise.
Observability fifth (S16–S19): Add circuit breakers, metrics, and DR. Without these, outages go undetected and recovery is manual.
New features last (S20–S22): GPT-mini service, file upload, quiz generation. These add value but depend on the foundation above.

16. Dependency Injection Implementation

Overview

A centralized service registry pattern was implemented to properly wire all services to routers.

File: app/core/dependencies.py
Pattern: Singleton instances initialized at module level
Benefit: Proper dependency injection, testable, no duplicate instances

Service Initialization Order

# 1. External Clients
openai_client = OpenAI(api_key=settings.OPENAI_API_KEY)
supabase_service = create_client(settings.SUPABASE_URL, settings.SUPABASE_SERVICE_ROLE_KEY)

# 2. Base Adapters
pinecone_adapter = PineconeAdapter(api_key=..., index_name=...)
cache_service = CacheService()

# 3. Core Services
embedding_service = EmbeddingService(openai_client, supabase_service, pinecone_adapter)
gpt_mini_service = GPTMiniService(openai_client)
wallet_service = WalletReservationService(supabase_service)
ingestion_service = IngestionService(supabase_service)

# 4. Pipelines
retrieval_pipeline = RetrievalPipeline(
    openai_client, supabase_service, pinecone_adapter,
    embedding_service, gpt_mini_service, cache_service
)

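# 5. Feature Services
# (text_normalizer, deduplication_service and quality_checker must already be initialized; setup not shown)
quiz_generator = QuizGeneratorService(openai_client, retrieval_pipeline)
scraper_service = ScraperService(supabase_service, text_normalizer, deduplication_service, quality_checker)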

Usage in Routers

Working Example (wallet router):

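from uuid import UUID
from fastapi import APIRouter, Depends
from app.core.auth import get_current_user  # assumed location of the auth dependency
from app.core.dependencies import wallet_service

router = APIRouter()

# Plain `def`: wallet_service.get_balance is a blocking call, so FastAPI runs it in its threadpool.
# WalletBalanceResponse is the wallet router's Pydantic response model.
@router.get("/balance")
def get_balance(user: dict = Depends(get_current_user)):
    balance_data = wallet_service.get_balance(UUID(user["id"]))
    return WalletBalanceResponse(**balance_data)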

See: docs/90_ops/dependency_injection.md for full documentation

17. Implementation Status & Next Steps

✅ What's Working (Tested on Personal Laptop)

Auth & Profile: Signup, signin, JWT, profile management
Wallet & Billing: Balance, reservations, reserve/finalize pattern
Admin: User management, role updates
Metrics: Prometheus and JSON endpoints

⏳ What's Ready But Needs Router Wiring

Chat with RAG: RetrievalPipeline ready, chat router needs integration (2-3 hours)
Quiz Generation: QuizGeneratorService ready, quiz router needs wiring (1 hour)
PDF Ingestion: IngestionService ready, router needs creation (1-2 hours)
Scraper Sync: ScraperService ready, scraper router needs wiring (1 hour)
File Upload: UploadService ready, endpoint needs creation (30 min)

Estimated Remaining Work: 5.5-7.5 hours of router and endpoint integration (sum of the estimates above)

🔧 Fixes Applied (Post-Sonnet Implementation)

Fix	File	Issue	Resolution	Status
Dependency injection	`app/core/dependencies.py`	Services not wired to routers	Created singleton registry	✅ Working
Wallet router	`app/api/routers/wallet.py`	Endpoints were stubs	Implemented actual logic	✅ Working
Admin router	`app/api/routers/admin.py`	Not using dependencies	Imported singletons	✅ Working
Auth service	`app/core/auth.py`	Client pattern mismatch	Fixed service client usage	✅ Working
Type hints	Multiple services	Missing imports	Added List, Dict, Any imports	✅ Fixed

📖 Documentation Organization

All documentation has moved to the MkDocs structure:
- docs/00_overview/ - High-level guides (architecture, start_here)
- docs/20_runbooks/ - Operational guides (quick_start, deployment)
- docs/30_design/ - Design docs (plan, RLS, auth, etc.)
- docs/90_ops/ - Implementation guides (status, dependency injection, phase guides)
- docs/Artifacts/ - Phase completion guides, checklists
- docs/Postman/ - Postman testing guides

See: MkDocs navigation for searchable documentation

18. Testing Strategy (Updated)

Unit Tests (Ready)

tests/unit/test_chunking.py - Deterministic chunk ID tests
Additional unit tests can be created for each service

Integration Tests (Needs Router Completion)

Chat flow: reserve → retrieve → answer → finalize
Ingestion flow: upload → parse → chunk → embed → upsert
Scraper flow: sync → dedupe → quality check → insert

Postman Collection (Ready)

postman/collection_v2.json - 40+ endpoints
10 testing workflows documented
Auto-capture of JWT, request-ID, job-ID, etc.

Background Jobs (Ready to Deploy)

Reservation expiry: Continuous systemd service
Wallet reconciliation: Daily cron at 2 AM
DR export: Weekly cron on Sunday 3 AM

This document reflects the actual implementation state as of 2026-02-17, after the Sonnet implementation and integration fixes. For next steps, see docs/90_ops/implementation_status.md.