
BacMR Backend Architecture

Status: ✅ Phases A-F Implemented (Integration In Progress)
Audience: Developers, contributors, and operators
Last updated: 2026-02-17
Implementation Branch: feature/sonnet-impl-20260217-155229


🎯 Implementation Status Summary

Phase Services Migrations API Endpoints Status
A - Core Schema ✅ Complete ✅ 12-17 ⏳ 70% Services ready, some endpoints need wiring
B - Security ✅ Complete ✅ 18-19 ✅ 95% Auth, wallet, admin working
C - Caching ✅ Complete N/A N/A Internal services ready
D - Retrieval ✅ Complete N/A ⏳ 60% Services ready, endpoints need integration
E - Scraper ✅ Complete N/A ⏳ 60% Services ready, endpoints need integration
F - Observability ✅ Complete N/A ✅ 90% Metrics working, background jobs ready

Overall: Services 95% ✅ | Migrations 100% ✅ | API Integration 70% ⏳

✅ Working End-to-End

  • Auth (signup, signin, profile, JWT custom claims)
  • Wallet (balance, reservations, reserve/finalize pattern)
  • Admin (user management, role updates, admin access control)
  • Metrics (Prometheus + JSON endpoints)

⏳ Services Ready, Endpoints Need Integration

  • Chat with RAG retrieval (RetrievalPipeline ready, router needs wiring)
  • Quiz generation (QuizGeneratorService ready, router needs wiring)
  • PDF ingestion (IngestionService ready, router needs creation)
  • Scraper sync (ScraperService ready, router needs wiring)

See: docs/90_ops/implementation_status.md for detailed status

See: docs/90_ops/dependency_injection.md for service wiring pattern


Table of Contents

  1. High-Level Architecture Diagram (includes Request ID Propagation & Rate Limiting)
  2. Service Responsibility List
  3. Storage Responsibilities
  4. Postgres Schema Additions
  5. Ingestion Pipeline
  6. Pinecone Index & Metadata Schema
  7. Retrieval → Rerank → Reasoning Pipeline
  8. API Contract
  9. RLS & Auth Plan
  10. Billing & Wallet (Reservation Pattern)
  11. Scraper & Canonicalization
  12. Observability & Resiliency
  13. Security
  14. Testing Matrix
  15. Sonnet Task List
  16. Dependency Injection Implementation
  17. Implementation Status & Next Steps
  18. Testing Strategy (Updated)

1. High-Level Architecture Diagram

graph TB
    subgraph Client
        U[Student / Teacher / Admin<br/>Next.js Frontend]
    end

    subgraph API_Gateway ["API Gateway (FastAPI)"]
        GW[FastAPI App<br/>CORS · Rate Limit · Auth Middleware]
    end

    subgraph Auth ["Auth Layer"]
        SA[Supabase Auth<br/>JWT + Custom Claims Hook]
    end

    subgraph Core_Services ["Core Services"]
        FU[File Upload Service<br/>Presigned URL → S3/GCS]
        PW[Parser Workers<br/>PDF Extract · OCR · Normalize]
        EW[Embedding Worker<br/>tiktoken chunker · OpenAI embed]
        VA[Vector Adapter<br/>Pinecone upsert/query]
        GM[GPT-mini Service<br/>Rerank · Validate · Detect Language]
        RS[Reasoning Service<br/>LangGraph Teacher Agent · GPT-4o]
        BS[Billing Service<br/>Reserve · Finalize · Reconcile]
    end

    subgraph Data_Stores ["Data Stores"]
        PG[(Supabase Postgres<br/>profiles · wallet · chunks<br/>ingestion_jobs · reservations)]
        PC[(Pinecone<br/>curriculum-1536<br/>vectors + lightweight metadata)]
        S3[(Blob Store<br/>S3 / GCS<br/>Raw PDFs)]
        RC[(Cache<br/>Redis / In-Memory LRU)]
    end

    subgraph Infra ["Infrastructure"]
        SM[Secret Manager<br/>Vault / Cloud KMS]
        MON[Monitoring<br/>Structured Logs · Metrics · Alerts]
        RJ[Reindex Job<br/>Scheduled · DR Export]
    end

    U -->|"HTTPS + JWT"| GW
    GW -->|"Verify JWT"| SA
    GW -->|"Upload PDF"| FU
    GW -->|"/ask, /chat"| RS
    GW -->|"/ingestion/jobs"| PW
    GW -->|"/wallet/*"| BS

    FU -->|"Store raw PDF"| S3
    PW -->|"Extract text"| S3
    PW -->|"Write chunks"| PG
    PW -->|"Request embeddings"| EW
    EW -->|"OpenAI embed API"| OAI_E[OpenAI Embeddings]
    EW -->|"Upsert vectors"| VA
    VA -->|"Upsert/Query"| PC

    RS -->|"Dense search"| VA
    RS -->|"Rerank candidates"| GM
    GM -->|"gpt-4o-mini API"| OAI_M[OpenAI gpt-4o-mini]
    RS -->|"Generate answer"| OAI_C[OpenAI gpt-4o]
    RS -->|"Reserve / finalize tokens"| BS
    RS -->|"Fetch chunk text"| RC
    RC -->|"Cache miss"| PG

    BS -->|"Ledger + Reservations"| PG
    PW -->|"Audit log"| PG

    GW -->|"Read secrets"| SM
    GW -->|"Emit metrics"| MON
    RS -->|"Emit metrics"| MON
    RJ -->|"Export chunks + vectors"| S3
    RJ -->|"Re-embed"| EW

    style PG fill:#336791,color:#fff
    style PC fill:#1a73e8,color:#fff
    style S3 fill:#e47911,color:#fff
    style RC fill:#dc382c,color:#fff

Data-Flow Paths

Path Flow
Ingestion Admin → FastAPI → File Upload → S3 → Parser Worker → Embedding Worker → Pinecone + Postgres
Retrieval (Chat) Student → FastAPI → Auth → Billing Reserve → Pinecone query → Cache/Postgres (chunk text) → GPT-mini rerank → GPT-4o reason → SSE stream → Billing Finalize
Billing FastAPI → Reserve (Postgres TX) → LLM call → Finalize (Postgres TX) → Ledger entry
Scraping Admin → FastAPI → Scraper → Canonicalize → Dedupe → Postgres references

Request ID Propagation

Every inbound HTTP request receives a request_id (UUID v4) at the API gateway. This ID is the single correlation key across every subsystem — without it, debugging an LLM failure that spans Pinecone, OpenAI, wallet, and audit tables is nearly impossible.

Generation: FastAPI middleware generates request_id = uuid4() at the start of every request (or adopts X-Request-ID from the client/load-balancer if present).

Propagation path:

Component How request_id is used
Structured logs Every log line includes request_id as a top-level JSON field
Wallet / Reservations reservations.request_id and wallet_ledger.request_id link billing to the originating request
Usage logs usage_logs.request_id correlates the RAG interaction
OpenAI calls Passed as user parameter in OpenAI API calls (enables cost attribution in OpenAI Dashboard)
Pinecone queries Logged alongside query parameters for post-hoc debugging
Ingestion audit ingestion_audit.request_id (nullable — only set when triggered via API, not cron)
SSE stream Returned in the final done event: {"type": "done", "request_id": "uuid", ...}
Error responses Every error response body includes "request_id": "uuid" so the client can report it

Implementation:

  • Middleware sets request.state.request_id.
  • A contextvars.ContextVar makes it available to all service layers without explicit threading.
  • File: app/core/middleware.py
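
A minimal sketch of such a middleware, assuming Starlette's BaseHTTPMiddleware; the class and variable names here are illustrative, not the exact contents of app/core/middleware.py:

import uuid
from contextvars import ContextVar

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

# Readable from any service layer without passing the ID explicitly.
request_id_var: ContextVar[str] = ContextVar("request_id", default="")

class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Adopt the client/load-balancer header if present, otherwise generate a UUID v4.
        request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
        request.state.request_id = request_id
        token = request_id_var.set(request_id)
        try:
            response = await call_next(request)
        finally:
            request_id_var.reset(token)
        response.headers["X-Request-ID"] = request_id  # echo back for client-side reporting
        return response

Registered with app.add_middleware(RequestIDMiddleware); services read request_id_var.get() when logging.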

Rate Limiting

Students will spam refresh, open multiple tabs, and trigger parallel queries. Without rate limiting, a single user can exhaust the platform's OpenAI quota.

Strategy: Per-user (authenticated) rate limit with per-IP fallback for unauthenticated endpoints.

Scope Limit Window Applies to
Per-user (JWT user_id) 10 requests 1 minute /ask, /chat, /quizzes/generate, /search/semantic
Per-user (JWT user_id) 30 requests 1 minute /wallet/*, /upload/*
Per-IP (unauthenticated) 5 requests 1 minute /auth/signup, /auth/login
Per-user (admin) 60 requests 1 minute /admin/*, /ingestion/*, /scraping/*

Enforcement:

  • In-memory sliding-window counter (sufficient at single-instance scale).
  • If deploying multiple instances: Redis-backed counter (same Redis as the cache layer).
  • Response on breach: HTTP 429 Too Many Requests with a Retry-After header (seconds until the window resets).
  • request_id is included in the 429 response body for support debugging.

Response format:

{
  "error": "rate_limited",
  "request_id": "uuid",
  "retry_after": 23,
  "limit": 10,
  "window": "1m"
}

Implementation: FastAPI middleware in app/core/middleware.py (same file as request-ID middleware).
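
For illustration, a sliding-window counter of the kind described above could look like the following; this is a sketch, not the middleware actually shipped in app/core/middleware.py:

import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """In-memory sliding window; adequate for a single instance, swap for Redis when scaling out."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def check(self, key: str) -> tuple[bool, int]:
        """key is e.g. 'user:<user_id>' or 'ip:<addr>'; returns (allowed, retry_after_seconds)."""
        now = time.monotonic()
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window:   # drop timestamps outside the window
            hits.popleft()
        if len(hits) >= self.limit:
            return False, int(self.window - (now - hits[0])) + 1
        hits.append(now)
        return True, 0

ask_limiter = SlidingWindowLimiter(limit=10)            # /ask, /chat, /quizzes/generate, /search/semantic
allowed, retry_after = ask_limiter.check("user:1234")
if not allowed:
    print({"error": "rate_limited", "retry_after": retry_after, "limit": 10, "window": "1m"})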


2. Service Responsibility List

2.1 Implemented Services (Phase A-F)

Service Responsibility Implementation File Status
API Gateway Route requests, CORS, auth middleware, request validation app/main.py, app/api/routers/ ✅ Working
Request Middleware Generate/adopt request_id (UUID), enforce per-user and per-IP rate limits, inject request_id into contextvars app/core/middleware.py ✅ Implemented
Auth Service JWT verification, role extraction from custom claims (app_metadata.role), admin guard app/core/auth.py ✅ Working
Dependency Registry Centralized singleton service instances with proper dependency wiring app/core/dependencies.py ✅ Working
Chunking Service Token-based chunking (tiktoken), deterministic chunk IDs (sha256(file_id:page:chunk_index)), language-specific sizes app/services/chunking.py ✅ Implemented (S1)
Ingestion Service State machine (queued → ready/failed), retry logic (max 3), audit trail app/services/ingestion.py ✅ Implemented (S2)
Wallet Reservation Service Reserve tokens (atomic), finalize after LLM, expire stale reservations, reconcile ledger app/services/wallet_reservation.py ✅ Working (S3)
Pinecone Adapter Upsert/query vectors with lightweight metadata (<1 KB), no full text storage app/services/pinecone_adapter.py ✅ Implemented (S4)
Embedding Service Generate embeddings (OpenAI), track refs in embedding_refs table, upsert to Pinecone app/services/embedding_service.py ✅ Implemented (S5)
Upload Service Generate presigned URLs for S3/GCS/Supabase Storage, validate file type/size app/services/upload.py ✅ Implemented (S21)
Cache Service Dual LRU cache (rerank 15-min TTL, chunk text 1-hour TTL), invalidation on re-ingestion app/services/cache.py ✅ Implemented (S10-S11)
Tier Config Free/Standard/Premium limits (top-K, rerank-N, tokens), cost estimation app/services/tier_config.py ✅ Implemented (S12)
GPT-mini Service Rerank candidates, detect language (French/Arabic/Hassaniya), translate queries, validate input, circuit breaker app/services/gpt_mini.py ✅ Implemented (S20)
Retrieval Pipeline Full flow: detect language → translate → embed → dense search → rerank → fetch chunks app/services/retrieval_pipeline.py ✅ Implemented (Phase D)
Quiz Generator RAG-based quiz generation with GPT-4o, multiple-choice with explanations and source pages app/services/quiz_generator.py ✅ Implemented (S22)
Circuit Breaker Protection for OpenAI/Pinecone calls, 3 failures → open, 120s recovery, fallback strategies app/services/circuit_breaker.py ✅ Implemented (S16)
Text Normalizer Arabic canonicalization (alef unification, tatweel removal, boilerplate removal) app/services/text_normalizer.py ✅ Implemented (S14)
Deduplication Service SimHash (64-bit) with Hamming distance ≤ 3 for duplicate detection app/services/deduplication.py ✅ Implemented (S13)
Quality Checker Content quality heuristics (min length, OCR confidence, encoding validation) app/services/quality_checker.py ✅ Implemented (S15)
Scraper Service Automated pipeline: canonicalize → quality check → dedupe → insert canonical refs app/services/scraper_service.py ✅ Implemented (Phase E)
Monitoring Structured JSON logging, Prometheus-compatible metrics (counters, histograms, gauges) app/core/logging.py, app/core/metrics.py ✅ Implemented (S17)
Config Management Settings with env vars, defaults for all parameters app/core/config.py ✅ Updated

2.2 Background Jobs (Phase F)

Job Responsibility Implementation File Schedule Status
Reservation Expiry Expire un-finalized reservations older than 5 min, refund tokens scripts/expire_reservations.py Continuous (60s loop) ✅ Ready
Wallet Reconciliation Compare wallet balance with ledger sum, flag discrepancies (no auto-correct) scripts/reconcile_wallets.py Daily 2 AM (cron) ✅ Ready (S18)
DR Export Export chunks to NDJSON, upload to blob store for disaster recovery scripts/export_chunks.py Weekly Sunday 3 AM (cron) ✅ Ready (S19)
Reindex Re-embed chunks with new model, create new namespace, verify counts scripts/reindex.py On-demand (manual) ✅ Ready (S19)

2.3 Legacy Services (Pre-existing, Kept for Compatibility)

Service File Notes
Legacy Embeddings app/services/embeddings.py Kept for backward compatibility; new code uses embedding_service.py
Legacy Wallet app/services/wallet.py Kept for backward compatibility; new code uses wallet_reservation.py
Legacy Pinecone app/services/pinecone_store.py Kept; new code uses pinecone_adapter.py
Legacy Retrieval app/services/retrieval.py Kept; new code uses retrieval_pipeline.py

GPT-mini Validator/Reranker — Hosting & SLA

  • Hosted on: OpenAI API (same API key as main models). Model: gpt-4o-mini.
  • SLA: Same as OpenAI API (99.9% target). No self-hosted fallback needed at current scale.
  • Fallback: If gpt-4o-mini returns an error or takes longer than 5 seconds:
      • Reranking: Skip the rerank and return the dense-retrieval order (graceful degradation).
      • Language detection: Fall back to a simple regex-based Arabic/French detector.
      • Input validation: Allow the request through (fail-open for validation; fail-closed for safety).

3. Storage Responsibilities

What Goes Where

Data Store Rationale
Full chunk text Postgres (chunks.content) Source of truth; enables full-text search; avoids Pinecone 40 KB metadata limit
Embedding vectors (1536-dim) Pinecone Optimized for ANN search
Lightweight metadata per vector Pinecone metadata Filter fields only: chunk_id, file_id, language, grade, subject, source_url, page_number, ingestion_ts
Raw PDF files S3 / GCS (Blob Store) Archival; enables re-ingestion without re-downloading
User data (profiles, wallets, ledger) Postgres Relational, RLS-protected
Ingestion state machine Postgres (ingestion_jobs) Transactional state with audit trail
Reservation state Postgres (reservations) Must be atomic with wallet balance
Cached rerank results Redis / In-memory Ephemeral; TTL 15 min
Cached chunk text Redis / In-memory LRU; TTL 1 hour

Canonical Chunk-Store Approach

┌──────────────┐      ┌──────────────────────┐
│   Pinecone   │      │      Postgres        │
│              │      │                      │
│  vector_id ──┼──────┼→ chunks.chunk_id     │
│  metadata:   │      │  chunks.content      │
│   chunk_id   │      │  chunks.file_id      │
│   file_id    │      │  chunks.page_number  │
│   language   │      │  chunks.token_count  │
│   grade      │      │                      │
│   subject    │      └──────────────────────┘
│   page_number│
│   ingestion_ts│
└──────────────┘

At retrieval time:

  1. Query Pinecone → get chunk_id list.
  2. Fetch chunk text from cache (Redis/LRU) → on miss, query the Postgres chunks table.
  3. Pass the text to the reranker and reasoning model.
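
A sketch of that lookup, assuming a simple get/set cache interface and the supabase-py client; names are illustrative:

def fetch_chunk_texts(chunk_ids, cache, supabase):
    """Resolve Pinecone chunk_ids to full text: cache first, Postgres chunks table on miss."""
    texts, missing = {}, []
    for cid in chunk_ids:
        cached = cache.get(cid)                          # step 2: Redis / in-memory LRU
        if cached is not None:
            texts[cid] = cached
        else:
            missing.append(cid)
    if missing:
        rows = (
            supabase.table("chunks")
            .select("chunk_id, content")
            .in_("chunk_id", missing)                    # single round-trip for all misses
            .execute()
            .data
        )
        for row in rows:
            texts[row["chunk_id"]] = row["content"]
            cache.set(row["chunk_id"], row["content"])   # warm the cache (1 h TTL)
    return texts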


4. Postgres Schema Additions

ingestion_jobs

CREATE TABLE IF NOT EXISTS ingestion_jobs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    reference_id    UUID NOT NULL REFERENCES "references"(id),  -- "references" must be quoted (reserved word)
    file_id         UUID,                                       -- FK to documents table if applicable
    status          TEXT NOT NULL DEFAULT 'queued'
                    CHECK (status IN ('queued','parsing','tokenizing',
                           'embedding_request_sent','embedding_upserted',
                           'ready','failed')),
    chunks_created  INT DEFAULT 0,                              -- count of chunks produced
    vectors_upserted INT DEFAULT 0,                             -- count of vectors sent to Pinecone
    retry_count     INT DEFAULT 0,                              -- current retry attempt
    max_retries     INT DEFAULT 3,
    error_message   TEXT,                                       -- last error (nullable)
    created_at      TIMESTAMPTZ DEFAULT now(),
    updated_at      TIMESTAMPTZ DEFAULT now()
);

-- Index for status polling
CREATE INDEX IF NOT EXISTS idx_ingestion_jobs_status ON ingestion_jobs(status);
-- Index for reference lookup
CREATE INDEX IF NOT EXISTS idx_ingestion_jobs_reference ON ingestion_jobs(reference_id);

chunks (enhanced)

-- If chunks table already exists, ALTER; otherwise CREATE.
-- This shows the target schema.
CREATE TABLE IF NOT EXISTS chunks (
    chunk_id        TEXT PRIMARY KEY,                           -- sha256(file_id:page:chunk_index)
    file_id         UUID NOT NULL REFERENCES documents(id),
    page_number     INT NOT NULL,
    chunk_index     INT NOT NULL,                              -- position within the page
    content         TEXT NOT NULL,                              -- full chunk text
    token_count     INT NOT NULL,                              -- token count (tiktoken cl100k_base)
    language        TEXT NOT NULL DEFAULT 'fr',                 -- 'fr', 'ar', 'ha' (Hassaniya)
    embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
    ingestion_job_id UUID REFERENCES ingestion_jobs(id),
    created_at      TIMESTAMPTZ DEFAULT now()
);

-- Composite index for idempotency check
CREATE UNIQUE INDEX IF NOT EXISTS idx_chunks_deterministic
    ON chunks(file_id, page_number, chunk_index);

reservations

CREATE TABLE IF NOT EXISTS reservations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES auth.users(id),
    estimated       INT NOT NULL,                              -- estimated token cost reserved
    actual          INT,                                       -- actual token cost (set on finalize)
    status          TEXT NOT NULL DEFAULT 'reserved'
                    CHECK (status IN ('reserved','finalized','expired','refunded')),
    request_id      UUID,                                      -- links to usage_logs
    created_at      TIMESTAMPTZ DEFAULT now(),
    finalized_at    TIMESTAMPTZ,
    expires_at      TIMESTAMPTZ DEFAULT now() + INTERVAL '5 minutes'
);

CREATE INDEX IF NOT EXISTS idx_reservations_user ON reservations(user_id);
CREATE INDEX IF NOT EXISTS idx_reservations_status ON reservations(status)
    WHERE status = 'reserved';                                 -- partial index for expiry job

wallet_ledger (enhanced — add reservation_id)

-- ALTER existing table
ALTER TABLE wallet_ledger
    ADD COLUMN IF NOT EXISTS reservation_id UUID REFERENCES reservations(id);

embedding_refs

CREATE TABLE IF NOT EXISTS embedding_refs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chunk_id        TEXT NOT NULL REFERENCES chunks(chunk_id),
    pinecone_vector_id TEXT NOT NULL,                           -- the ID used in Pinecone
    pinecone_namespace TEXT NOT NULL,                           -- e.g. grade-12-math
    embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
    upserted_at     TIMESTAMPTZ DEFAULT now()
);

CREATE UNIQUE INDEX IF NOT EXISTS idx_embedding_refs_vector
    ON embedding_refs(pinecone_vector_id, pinecone_namespace);

ingestion_audit

CREATE TABLE IF NOT EXISTS ingestion_audit (
    id              BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ingestion_job_id UUID NOT NULL REFERENCES ingestion_jobs(id),
    from_status     TEXT,
    to_status       TEXT NOT NULL,
    message         TEXT,                                      -- error detail or info
    created_at      TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_ingestion_audit_job
    ON ingestion_audit(ingestion_job_id);

references enhancements

ALTER TABLE "references"                                         -- quoted: REFERENCES is a reserved word in Postgres
    ADD COLUMN IF NOT EXISTS content_fingerprint BIGINT,        -- SimHash for dedupe
    ADD COLUMN IF NOT EXISTS canonical_id UUID REFERENCES "references"(id),
    ADD COLUMN IF NOT EXISTS last_checked_at TIMESTAMPTZ,
    ADD COLUMN IF NOT EXISTS ocr_confidence REAL;               -- 0.0–1.0

5. Ingestion Pipeline

Deterministic Chunk ID

chunk_id = sha256( file_id + ":" + page_number + ":" + chunk_index )
  • file_id: UUID from the documents table.
  • page_number: 0-indexed page from PDF extraction.
  • chunk_index: 0-indexed position of the chunk within that page.

This ensures that re-ingesting the same file with the same parser produces identical chunk IDs → Pinecone upserts are idempotent (overwrite, no duplicates).
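
A minimal sketch of the formula above (the function name is illustrative):

import hashlib

def make_chunk_id(file_id: str, page_number: int, chunk_index: int) -> str:
    """sha256 over 'file_id:page_number:chunk_index'; the same inputs always yield the same ID."""
    return hashlib.sha256(f"{file_id}:{page_number}:{chunk_index}".encode("utf-8")).hexdigest()

# Re-running ingestion reproduces the IDs, so Pinecone upserts overwrite rather than duplicate.
assert make_chunk_id("9f0c2c1e-file-uuid", 4, 0) == make_chunk_id("9f0c2c1e-file-uuid", 4, 0)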

Token-Based Chunking Strategy

Language Tokenizer Chunk Size Overlap Notes
French tiktoken / cl100k_base 512 tokens 64 tokens Standard Latin-script tokenization
Arabic (MSA) tiktoken / cl100k_base 384 tokens 48 tokens Arabic tokenizes at ~1.5× expansion; smaller chunks maintain quality
Hassaniya tiktoken / cl100k_base 384 tokens 48 tokens Treated as Arabic-script; same tokenizer with cultural localization
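
A sketch of the windowing logic under those settings, using tiktoken's cl100k_base encoding; the real ChunkingService in app/services/chunking.py may differ:

import tiktoken

def chunk_tokens(text: str, language: str = "fr") -> list[str]:
    """Split text into overlapping token windows (512/64 for French, 384/48 for Arabic/Hassaniya)."""
    size, overlap = (512, 64) if language == "fr" else (384, 48)
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(enc.decode(tokens[start:start + size]))
        if start + size >= len(tokens):
            break
        start += size - overlap        # slide forward, keeping `overlap` tokens of context
    return chunks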

Ingestion Job State Machine

stateDiagram-v2
    [*] --> queued : POST /ingestion/jobs
    queued --> parsing : Worker picks up job
    parsing --> tokenizing : Text extracted successfully
    parsing --> failed : PDF corrupt / download error
    tokenizing --> embedding_request_sent : Chunks created, embeddings requested
    tokenizing --> failed : Tokenizer error
    embedding_request_sent --> embedding_upserted : OpenAI returns embeddings
    embedding_request_sent --> embedding_request_sent : Transient error (retry ≤ 3)
    embedding_request_sent --> failed : Max retries exceeded
    embedding_upserted --> ready : Pinecone upsert confirmed
    embedding_upserted --> failed : Pinecone upsert error (after retries)
    failed --> queued : Manual retry via admin API
    ready --> [*]

Retry Semantics

  • Transient errors (HTTP 429, 500, 503 from OpenAI/Pinecone): Retry with exponential backoff (1 s, 4 s, 16 s). Max 3 retries.
  • Permanent errors (HTTP 400, invalid PDF): Move to failed immediately; no retry.
  • Idempotent upserts: Because chunk IDs and Pinecone vector IDs are deterministic, a retry that re-sends the same vectors is safe.
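
As a sketch of those retry semantics (the exception and function names are illustrative, not the actual worker code):

import time

class TransientError(Exception):
    """Raised for HTTP 429/500/503 from OpenAI or Pinecone; safe to retry."""

def with_retries(call, max_retries: int = 3):
    """Retry transient failures with exponential backoff (1 s, 4 s, 16 s); permanent errors are not caught."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise                   # max retries exceeded: the ingestion job transitions to `failed`
            time.sleep(4 ** attempt)    # 1 s, 4 s, 16 s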

6. Pinecone Index & Metadata Schema

Index Configuration

Property Value
Index name curriculum-1536
Dimensions 1536
Metric cosine
Cloud Serverless (AWS or GCP)

Namespace Strategy

Format: grade-{grade}-{subject} (e.g., grade-12-math, grade-10-physics). Default namespace: default (for unclassified content).

Vector ID Format

vector_id = chunk_id   (i.e., the same sha256 hash)

Metadata Fields per Vector

{
  "chunk_id": "a1b2c3...",
  "file_id": "uuid-...",
  "language": "fr",
  "grade": "12",
  "subject": "math",
  "source_url": "https://koutoubi.mr/...",
  "page_number": 5,
  "ingestion_ts": "2026-02-17T10:30:00Z"
}

Note: text is NOT stored in Pinecone metadata. Full text lives in Postgres chunks.content.

For query(filter=...):

  • language — prefilter to the corpus language.
  • grade — scope to the student's grade level.
  • subject — scope to the subject being studied.
  • file_id — useful for admin queries ("show all vectors from this document").
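
A sketch of such a filtered query using the pinecone Python client (exact response access can vary by client version; values are examples):

# `index` is a handle to the curriculum-1536 index; `query_embedding` is the 1536-dim query vector.
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="grade-12-math",
    filter={
        "language": {"$eq": "fr"},
        "grade": {"$eq": "12"},
        "subject": {"$eq": "math"},
    },
    include_metadata=True,   # returns chunk_id, file_id, page_number, source_url, ...
)
chunk_ids = [match.metadata["chunk_id"] for match in results.matches]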


7. Retrieval → Rerank → Reasoning Pipeline

flowchart LR
    Q[User Query] --> LD[Language Detect<br/>GPT-mini]
    LD --> TR{Translation<br/>needed?}
    TR -->|Yes| TRANS[Translate to<br/>corpus language]
    TR -->|No| DR
    TRANS --> DR[Dense Retrieval<br/>Pinecone top-K]
    DR --> LP{Lexical<br/>prefilter?}
    LP -->|Arabic query| BM[BM25 Keyword<br/>Filter]
    LP -->|No| RR
    BM --> RR[Rerank<br/>GPT-mini top-N]
    RR --> CC[Cache Check<br/>sha256 query+ns+tier]
    CC -->|Hit| RES
    CC -->|Miss| RR2[Call GPT-mini<br/>reranker]
    RR2 --> CS[Cache Store<br/>TTL 15 min]
    CS --> RES[Fetch Chunk Text<br/>Cache → Postgres]
    RES --> GEN[Reasoning<br/>GPT-4o + context]
    GEN --> SSE[Stream SSE<br/>to client]

Cost Policy by Tier

Tier top-K (Dense) Rerank-N Reranker Cache TTL
Free 10 3 gpt-4o-mini 15 min
Standard 20 5 gpt-4o-mini 15 min
Premium 30 8 gpt-4o-mini 15 min

Caching Behavior

Cache Key TTL Invalidation
Rerank results sha256(query + namespace + tier) 15 min On re-ingestion of any file in the namespace
Chunk text chunk_id 1 hour On re-ingestion (chunk_id changes if content changes)
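
For reference, a sketch of the rerank cache key derivation (the separator is an illustrative choice):

import hashlib

def rerank_cache_key(query: str, namespace: str, tier: str) -> str:
    """sha256 over query + namespace + tier, per the table above; entries expire after 15 minutes."""
    return hashlib.sha256(f"{query}|{namespace}|{tier}".encode("utf-8")).hexdigest()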

8. API Contract

8.1 Auth Endpoints

POST /auth/signup

Delegates to Supabase Auth. The backend creates a profiles row and initializes a wallet via the handle_new_user trigger.

Field Value
Auth None (public)
Request { "email": "student@example.mr", "password": "...", "metadata": { "full_name": "Ahmed" } }
Response 201 { "user_id": "uuid", "email": "...", "role": "student" }
Error 400 { "error": "email_already_registered" }

POST /auth/login

Delegates to Supabase Auth, returns JWT.

Field Value
Auth None (public)
Request { "email": "...", "password": "..." }
Response 200 { "access_token": "jwt...", "refresh_token": "...", "expires_in": 3600 }
Error 401 { "error": "invalid_credentials" }

8.2 File Upload

POST /upload/file

Field Value
Auth Bearer JWT (admin role)
Request { "filename": "math_12.pdf", "content_type": "application/pdf", "grade": "12", "subject": "math", "language": "fr" }
Response 200 { "upload_url": "https://s3.../presigned...", "file_id": "uuid", "expires_in": 300 }
Error 403 { "error": "admin_required" }
Error 400 { "error": "invalid_file_type", "allowed": ["application/pdf"] }

The client uploads directly to the presigned URL. After upload, the client calls POST /ingestion/jobs.


8.3 Ingestion

POST /ingestion/jobs

Field Value
Auth Bearer JWT (admin role)
Request { "reference_id": "uuid", "force": false }
Response 202 { "job_id": "uuid", "status": "queued" }
Error 409 { "error": "ingestion_already_in_progress" }
Error 404 { "error": "reference_not_found" }

force: true re-ingests even if the reference is already ready.

GET /ingestion/jobs/{id}

Field Value
Auth Bearer JWT (admin role)
Response 200 { "job_id": "uuid", "status": "embedding_upserted", "chunks_created": 42, "vectors_upserted": 42, "retry_count": 0, "created_at": "...", "updated_at": "..." }
Error 404 { "error": "job_not_found" }

8.4 Chat / Ask

POST /ask

Field Value
Auth Bearer JWT (student/teacher/admin)
Request { "question": "ما هي الترجمة في الرياضيات؟", "grade": "12", "subject": "math", "language": "ar", "stream": true }
Response 200 (stream) text/event-stream with SSE events: data: {"token": "...", "type": "content"} ... data: {"type": "done", "sources": [...], "tokens_used": 12}
Response 200 (JSON) { "answer": "...", "sources": [{"page": 45, "file": "math_12.pdf", "snippet": "..."}], "tokens_used": 12, "reservation_id": "uuid" }
Error 402 { "error": "insufficient_balance", "balance": 3, "estimated_cost": 5 }
Error 503 { "error": "service_unavailable", "reason": "llm_circuit_open" }

Internal flow:

  1. Pre-validate input (GPT-mini: safety check, language detection).
  2. Reserve tokens (POST /wallet/reserve internally).
  3. Retrieve from Pinecone (dense search).
  4. Optionally rerank (GPT-mini).
  5. Generate the answer (GPT-4o via LangGraph).
  6. Finalize the reservation with actual token usage.


8.5 Quiz Generation

POST /quizzes/generate

Field Value
Auth Bearer JWT (student/teacher/admin)
Request { "grade": "12", "subject": "math", "topic": "translations", "num_questions": 5, "language": "fr" }
Response 200 { "quiz_id": "uuid", "questions": [{ "q": "...", "options": ["A","B","C","D"], "correct": "B", "explanation": "...", "source_page": 45 }], "tokens_used": 20 }
Error 402 { "error": "insufficient_balance" }

8.6 Wallet / Billing

POST /wallet/reserve

Field Value
Auth Internal (service-to-service; not exposed publicly)
Request { "user_id": "uuid", "estimated": 10, "request_id": "uuid" }
Response 200 { "reservation_id": "uuid", "balance_after_reserve": 40 }
Error 402 { "error": "insufficient_balance", "balance": 3, "estimated": 10 }

POST /wallet/finalize

Field Value
Auth Internal (service-to-service)
Request { "reservation_id": "uuid", "actual": 8 }
Response 200 { "reservation_id": "uuid", "status": "finalized", "refunded": 2, "balance_after": 42 }
Error 404 { "error": "reservation_not_found" }
Error 409 { "error": "reservation_already_finalized" }

GET /wallet/balance

Field Value
Auth Bearer JWT
Response 200 { "user_id": "uuid", "token_balance": 50, "subscription_tier": "free", "pending_reservations": 0 }

8.7 Semantic Search

GET /search/semantic

Field Value
Auth Bearer JWT
Query params ?q=translation&grade=12&subject=math&language=fr&limit=5
Response 200 { "results": [{ "chunk_id": "...", "text": "...", "score": 0.92, "page": 45, "source": "math_12.pdf" }] }

8.8 Admin Endpoints

POST /admin/scraping/{source}/sync

Field Value
Auth Bearer JWT (admin role)
Response 200 { "run_id": "uuid", "status": "success", "found": 15, "new": 3, "duplicates": 2, "errors": 0 }

POST /admin/reindex

Field Value
Auth Bearer JWT (admin role)
Request { "namespace": "grade-12-math", "reason": "model_upgrade" } (omit namespace to reindex all)
Response 202 { "reindex_job_id": "uuid", "status": "queued", "estimated_chunks": 1200 }

PATCH /admin/users/{user_id}/role

Field Value
Auth Bearer JWT (admin role)
Request { "role": "teacher" }
Response 200 { "user_id": "uuid", "role": "teacher" }
Error 400 { "error": "invalid_role", "allowed": ["student","teacher","admin"] }

Error Code Summary

Every error response includes request_id for correlation:

{ "error": "<code>", "request_id": "uuid-...", ... }
HTTP Code Meaning
400 bad_request Invalid input
401 unauthorized Missing or invalid JWT
402 insufficient_balance Wallet balance too low
403 forbidden Role not authorized
404 not_found Resource does not exist
409 conflict Duplicate or already-in-progress
429 rate_limited Too many requests (includes retry_after seconds)
503 service_unavailable LLM or Pinecone circuit open

9. RLS & Auth Plan

Current State

  • RLS Phase 2 complete: all public tables have RLS enabled.
  • Admin auth uses user_metadata.role (not custom claims yet).
  • x-admin-key still accepted (deprecated).

Target State

  • Roles in JWT custom claims via Postgres hook (app_metadata.role).
  • x-admin-key removed entirely.
  • New tables (ingestion_jobs, reservations, embedding_refs, ingestion_audit) have RLS.

RLS Policy Templates

User-facing tables (profiles, wallet, wallet_ledger, usage_logs, reservations)

-- Users can SELECT their own rows
CREATE POLICY "user_select_own" ON {table}
    FOR SELECT
    USING (auth.uid() = user_id);

-- No INSERT/UPDATE/DELETE via public API
-- (service_role bypasses RLS for backend operations)

System tables (ingestion_jobs, chunks, embedding_refs, ingestion_audit, documents)

ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;

-- Only service_role can access
CREATE POLICY "service_role_only" ON {table}
    FOR ALL
    USING (auth.role() = 'service_role');

Admin tables (references, scrape_runs)

-- Admin can SELECT and INSERT/UPDATE
CREATE POLICY "admin_read_write" ON {table}
    FOR ALL
    USING (
        auth.role() = 'service_role'
        OR (auth.jwt() -> 'app_metadata' ->> 'role') = 'admin'
    );

JWT Custom Claims Migration Checklist

  1. Create Postgres hook function:

    CREATE OR REPLACE FUNCTION public.custom_access_token_hook(event jsonb)
    RETURNS jsonb LANGUAGE plpgsql STABLE AS $$
    DECLARE
        claims jsonb;
        user_role TEXT;
    BEGIN
        SELECT role INTO user_role FROM public.profiles
            WHERE user_id = (event->>'user_id')::uuid;
    
        claims := event->'claims';
        IF user_role IS NOT NULL THEN
            claims := jsonb_set(claims, '{app_metadata,role}', to_jsonb(user_role));
        ELSE
            claims := jsonb_set(claims, '{app_metadata,role}', '"student"');
        END IF;
    
        event := jsonb_set(event, '{claims}', claims);
        RETURN event;
    END;
    $$;
    
    -- Grant necessary permissions
    GRANT USAGE ON SCHEMA public TO supabase_auth_admin;
    GRANT EXECUTE ON FUNCTION public.custom_access_token_hook TO supabase_auth_admin;
    REVOKE EXECUTE ON FUNCTION public.custom_access_token_hook FROM authenticated, anon, public;
    GRANT SELECT ON TABLE public.profiles TO supabase_auth_admin;
    

  2. Register in Supabase Dashboard: Authentication → Hooks → "Customize Access Token" → select custom_access_token_hook.

  3. Update RLS policies: Change (auth.jwt() ->> 'role') to (auth.jwt() -> 'app_metadata' ->> 'role') in all admin policies.

  4. Update FastAPI auth dependency:

    # In app/core/auth.py — get_current_admin
    # Read role from: jwt_payload["app_metadata"]["role"]
    # Instead of: jwt_payload["user_metadata"]["role"]
    

  5. Test with canary user: Create a test admin, verify JWT contains app_metadata.role = "admin", verify all admin endpoints accept the new token.

  6. Remove x-admin-key support: Delete the x-admin-key header check from all routers. Update .env.example to remove ADMIN_API_KEY.

  7. Rollback procedure:
      • If the hook fails: disable the hook in the Supabase Dashboard; JWTs revert to default claims.
      • Keep user_metadata.role as a fallback in get_current_admin for 2 weeks after the migration.
      • Monitor auth error rates; if they increase by more than 1%, roll back.

10. Billing & Wallet (Reservation Pattern)

Reservation Flow

sequenceDiagram
    participant C as Client
    participant API as FastAPI
    participant W as Wallet Service
    participant PG as Postgres
    participant LLM as OpenAI

    C->>API: POST /ask {question}
    API->>W: reserve(user_id, estimated=10)
    W->>PG: BEGIN TX: deduct estimated from wallet, insert reservation
    PG-->>W: reservation_id
    W-->>API: reservation_id, balance_after=40

    API->>LLM: Retrieve + Rerank + Generate
    LLM-->>API: answer (actual_tokens=8)

    API->>W: finalize(reservation_id, actual=8)
    W->>PG: BEGIN TX: update reservation, refund delta (2), insert ledger
    PG-->>W: OK
    W-->>API: finalized, refunded=2

    API-->>C: SSE stream + tokens_used=8

DB Transaction — Reserve

BEGIN;
    -- Deduct estimated amount from wallet
    UPDATE wallet
    SET token_balance = token_balance - :estimated,
        updated_at = now()
    WHERE user_id = :uid
      AND token_balance >= :estimated;
    -- If no row updated → insufficient balance → ROLLBACK

    -- Create reservation record
    INSERT INTO reservations (user_id, estimated, status, request_id, created_at, expires_at)
    VALUES (:uid, :estimated, 'reserved', :request_id, now(), now() + INTERVAL '5 minutes')
    RETURNING id;
COMMIT;

DB Transaction — Finalize

BEGIN;
    -- Mark reservation finalized
    UPDATE reservations
    SET actual = :actual,
        status = 'finalized',
        finalized_at = now()
    WHERE id = :reservation_id
      AND status = 'reserved';
    -- If no row updated → already finalized or expired → ROLLBACK

    -- Refund delta if actual < estimated
    UPDATE wallet
    SET token_balance = token_balance + GREATEST(:estimated - :actual, 0),
        updated_at = now()
    WHERE user_id = :uid;

    -- Record in ledger
    INSERT INTO wallet_ledger (user_id, delta, reason, request_id, reservation_id)
    VALUES (:uid, -:actual, 'agent_chat', :request_id, :reservation_id);
COMMIT;
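
At the service layer, the two transactions bracket the LLM call roughly as follows (a sketch; the method names on the wallet service are illustrative, not the exact API of wallet_reservation.py):

async def answer_with_billing(user_id, question, request_id, wallet_service, run_rag, estimated=10):
    """Reserve before the LLM call, finalize with actual usage afterwards."""
    reservation = wallet_service.reserve(user_id=user_id, estimated=estimated, request_id=request_id)
    try:
        answer, actual_tokens = await run_rag(question)
    except Exception:
        # Leave the reservation to the expiry job (refund after 5 minutes),
        # or refund eagerly here if the wallet service exposes a cancel path.
        raise
    wallet_service.finalize(reservation_id=reservation["reservation_id"], actual=actual_tokens)
    return answer, actual_tokens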

Expiry Job (Background)

Runs every 60 seconds:

-- Find expired, un-finalized reservations
UPDATE reservations
SET status = 'expired'
WHERE status = 'reserved'
  AND expires_at < now()
RETURNING user_id, estimated;

-- For each expired reservation, refund the wallet
UPDATE wallet
SET token_balance = token_balance + :estimated
WHERE user_id = :uid;

INSERT INTO wallet_ledger (user_id, delta, reason, reservation_id)
VALUES (:uid, :estimated, 'reservation_expired', :reservation_id);

Reconciliation (Nightly)

-- Compare ledger sum vs wallet balance
SELECT
    w.user_id,
    w.token_balance AS current_balance,
    COALESCE(SUM(wl.delta), 0) AS ledger_sum,
    w.token_balance - COALESCE(SUM(wl.delta), 0) AS discrepancy
FROM wallet w
LEFT JOIN wallet_ledger wl ON w.user_id = wl.user_id
GROUP BY w.user_id, w.token_balance
HAVING w.token_balance != COALESCE(SUM(wl.delta), 0);

Flag any non-zero discrepancy as an alert. Do not auto-correct; require manual investigation.


11. Scraper & Canonicalization

Pipeline (Fully Automated)

flowchart TD
    SC[Scraper Fetches Sitemap] --> DL[Download PDF]
    DL --> NORM[Canonicalize Text]
    NORM --> FP[Compute SimHash Fingerprint]
    FP --> DD{Hamming Distance ≤ 3<br/>from existing?}
    DD -->|Yes| DUP[Mark as duplicate<br/>link canonical_id]
    DD -->|No| QC{Quality Check}
    QC -->|Pass| STORE[Insert into references<br/>status: discovered]
    QC -->|Fail| LOG[Log to ingestion_audit<br/>reason: quality_failed]
    DUP --> DONE[Done]
    STORE --> DONE
    LOG --> DONE

Canonicalization Steps

  1. Whitespace normalization: Collapse multiple spaces, tabs, and newlines to a single space. Trim leading/trailing whitespace.
  2. Arabic script normalization (see the sketch after this list):
      • Unify alef variants: أ / إ / آ → ا
      • Remove tatweel (kashida): ـ → (empty)
      • Normalize taa marbuta: ة → ه (context-dependent, configurable)
      • Normalize hamza carriers: ؤ / ئ → و / ي + hamza (optional, configurable)
  3. Boilerplate removal: Per-source regex patterns (configurable in scraper_config.json), e.g. remove page headers/footers matching known patterns ("Page X of Y", site watermarks).
  4. Non-content page filtering: Skip pages with < 50 characters after normalization.
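
A minimal sketch of the Arabic-script pass, assuming plain str/regex replacements; the actual normalizer in app/services/text_normalizer.py is configurable per the list above:

import re

ALEF_VARIANTS = "أإآ"     # unified to bare alef
TATWEEL = "\u0640"         # kashida

def canonicalize_arabic(text: str) -> str:
    """Whitespace collapse, alef unification, tatweel removal, taa marbuta normalization."""
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(f"[{ALEF_VARIANTS}]", "ا", text)   # أ / إ / آ → ا
    text = text.replace(TATWEEL, "")                 # drop tatweel
    text = text.replace("ة", "ه")                    # taa marbuta (context-dependent in the real pass)
    return text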

Deduplication

  • Algorithm: SimHash (64-bit) on the normalized full text of the PDF.
  • Threshold: Hamming distance ≤ 3 → considered duplicate.
  • Storage: references.content_fingerprint stores the SimHash value.
  • Canonical reference: references.canonical_id (self-FK) points to the first-discovered version.
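
A sketch of the fingerprint and comparison (the per-token hash here is md5 for brevity; the DeduplicationService may hash differently):

import hashlib

def simhash64(text: str) -> int:
    """64-bit SimHash over whitespace tokens of the normalized text."""
    weights = [0] * 64
    for token in text.split():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)

def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# Duplicate when fingerprints differ in at most 3 bits.
is_duplicate = hamming_distance(simhash64("normalized text a"), simhash64("normalized text b")) <= 3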

Provenance Metadata

Every references row contains:

Field Purpose
source_url Canonical URL (after redirect resolution)
discovered_at First time scraper found this PDF
last_checked_at Last time scraper verified URL is live
content_fingerprint SimHash for deduplication
scrape_run_id Which scrape run discovered it
canonical_id Points to canonical (non-duplicate) reference

Content Quality Heuristics

Check Threshold Action
Minimum text length (per page) ≥ 200 chars after normalization Skip page, log reason
OCR confidence (Arabic/Hassaniya) ≥ 0.70 Flag for review if below
OCR confidence (French) ≥ 0.80 Flag for review if below
Encoding Valid UTF-8 Reject and log
File size ≤ 100 MB Reject oversized files

12. Observability & Resiliency

Metrics to Emit

Metric Type Labels Purpose
ingestion_job_duration_seconds Histogram status, language Track ingestion performance
ingestion_job_status_total Counter status Track job outcomes
pinecone_query_duration_seconds Histogram namespace Vector search latency
pinecone_upsert_duration_seconds Histogram namespace Upsert latency
openai_request_duration_seconds Histogram model, endpoint LLM call latency
openai_tokens_used_total Counter model, type (input/output) Cost tracking
wallet_reservation_total Counter status (reserved/finalized/expired) Billing flow health
wallet_balance_discrepancy Gauge Reconciliation drift
circuit_breaker_state Gauge service (openai/pinecone) 0=closed, 1=open, 2=half-open
http_request_duration_seconds Histogram method, path, status API latency
rerank_cache_hit_ratio Gauge Cache effectiveness
active_reservations Gauge Currently reserved, un-finalized
rate_limit_rejected_total Counter scope (user/ip), path Rate-limit enforcement activity

Note: request_id propagation is not a metric; all log lines, wallet rows, usage_logs entries, and error responses include request_id as the correlation field.

Circuit Breaker Configuration

Service Failure threshold Window Recovery timeout Fallback
OpenAI Embeddings 3 failures 60 s 120 s Queue job for later retry
OpenAI gpt-4o-mini (rerank) 3 failures 60 s 120 s Skip reranking; use dense order
OpenAI gpt-4o (reasoning) 3 failures 60 s 120 s Return 503 to client
Pinecone (query) 3 failures 60 s 120 s Return 503 to client
Pinecone (upsert) 3 failures 60 s 120 s Queue for retry
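
A simplified sketch of the breaker behavior (the 60 s failure window is omitted for brevity; app/services/circuit_breaker.py implements the full table above):

import time

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; half-opens after `recovery_timeout` seconds."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 120.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the recovery timeout, let a probe request through.
        return time.monotonic() - self.opened_at >= self.recovery_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

# Example: if the gpt-4o-mini breaker is open, skip reranking and keep the dense order.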

Alert Thresholds

Alert Condition Severity
High ingestion failure rate > 20% of jobs in failed state in last hour Critical
Wallet discrepancy detected Any non-zero discrepancy in reconciliation Warning
Circuit breaker opened Any circuit breaker transitions to open Critical
High reservation expiry rate > 10% of reservations expiring (not finalized) in last hour Warning
OpenAI latency spike p99 > 30 seconds for any model Warning
Pinecone latency spike p99 > 5 seconds Warning
Stale reservations > 50 reservations in reserved status older than 5 min Warning
Single user rate-limited repeatedly Same user_id rate-limited > 20 times in 5 min Warning (potential abuse)

Reindex & Disaster Recovery

Reindex Strategy

  1. Export all canonical chunks from Postgres chunks table.
  2. Re-embed using the new model.
  3. Upsert to a new Pinecone namespace (e.g., grade-12-math-v2).
  4. Swap the active namespace in config once verification passes.
  5. Delete the old namespace.

Disaster Recovery

  • Blob store: Raw PDFs archived in S3/GCS. Can re-ingest from scratch.
  • Postgres: Supabase provides automatic daily backups + point-in-time recovery.
  • Pinecone: If Pinecone data is lost, re-embed from Postgres chunks table (canonical chunks are the source of truth).
  • Export schedule: Weekly export of chunks table to blob store as Parquet/CSV for offline recovery.

13. Security

Secrets Management

Secret Current Target
OPENAI_API_KEY .env file Cloud Secret Manager (GCP/AWS/Azure) or Vault
PINECONE_API_KEY .env file Cloud Secret Manager
SUPABASE_SERVICE_KEY .env file Cloud Secret Manager
ADMIN_API_KEY .env file Remove entirely (replace with JWT admin role)

Rules:

  • In production, secrets MUST NOT be stored in plain environment variables or .env files.
  • Use the deployment platform's secret injection (e.g., Render's Environment Groups, GCP Secret Manager, AWS Secrets Manager).
  • Rotate keys quarterly. Automate rotation where possible.
  • SUPABASE_SERVICE_KEY should only be available to the backend service, never to the frontend.

PII Redaction

  • Before sending any user data to OpenAI (chat, rerank, quiz generation):
      • Strip email addresses (regex: \S+@\S+\.\S+).
      • Strip phone numbers (regex: Mauritanian format +222...).
  • Do NOT send user_id or wallet balance to OpenAI.
  • Log redacted versions in usage_logs.
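
A sketch of that redaction step (the phone regex is an illustrative approximation of Mauritanian numbers):

import re

EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
PHONE_RE = re.compile(r"\+222[\s-]?\d{2}[\s-]?\d{2}[\s-]?\d{2}[\s-]?\d{2}")

def redact_pii(text: str) -> str:
    """Strip emails and phone numbers before any text is sent to OpenAI."""
    text = EMAIL_RE.sub("[email]", text)
    text = PHONE_RE.sub("[phone]", text)
    return text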

Audit Logging

Event Table Fields
User login Supabase Auth logs (built-in) timestamp, user_id, IP
Admin action (role change, sync, reindex) ingestion_audit or dedicated admin_audit admin_user_id, action, target, timestamp
Wallet mutation wallet_ledger user_id, delta, reason, request_id, reservation_id
Ingestion state change ingestion_audit job_id, from_status, to_status, message
RLS policy violation Postgres logs query, user, table, policy

14. Testing Matrix

RLS Tests

# Test Case Table User Operation Expected
T1 Student reads own profile profiles authenticated (student) SELECT WHERE user_id = self Allowed
T2 Student reads other profile profiles authenticated (student) SELECT WHERE user_id = other Denied (0 rows)
T3 Student reads own wallet wallet authenticated (student) SELECT WHERE user_id = self Allowed
T4 Student updates own wallet wallet authenticated (student) UPDATE Denied
T5 Anonymous reads profiles profiles anon SELECT Denied (0 rows)
T6 Service role reads all wallet service_role SELECT Allowed (all rows)
T7 Admin reads references references authenticated (admin) SELECT Allowed
T8 Student reads references references authenticated (student) SELECT Denied (0 rows)
T9 Student reads ingestion_jobs ingestion_jobs authenticated (student) SELECT Denied (0 rows)
T10 Student reads own reservations reservations authenticated (student) SELECT WHERE user_id = self Allowed
T11 Student reads other reservations reservations authenticated (student) SELECT WHERE user_id = other Denied (0 rows)

Ingestion Idempotency Tests

# Test Case Expected
T12 Ingest same file twice Second run produces same chunk_ids; Pinecone vector count unchanged; no duplicate chunks in Postgres
T13 Ingest file, modify content, re-ingest New chunk_ids generated; old vectors replaced; old chunks marked stale
T14 Ingestion fails mid-embedding Job status = failed; partial chunks cleaned up; re-trigger starts fresh
T15 Concurrent ingestion of same reference Second job returns 409 conflict

Reservation / Billing Tests

# Test Case Expected
T16 Reserve with sufficient balance Balance decremented; reservation created with status reserved
T17 Reserve with insufficient balance 402 error; balance unchanged; no reservation created
T18 Finalize with actual < estimated Delta refunded to wallet; ledger entry = -actual
T19 Finalize with actual > estimated (capped) Additional deduction from wallet (capped at 2× estimate); ledger entry = -actual
T20 Reservation expires (not finalized) Expiry job refunds estimated amount; reservation status = expired
T21 Double-finalize same reservation Second call returns 409; no double-deduction
T22 Reconciliation detects discrepancy Alert fired; no auto-correction

Rerank & Caching Tests

# Test Case Expected
T23 Same query within TTL Second call hits cache; no GPT-mini call
T24 Same query after TTL expires Cache miss; GPT-mini called; new cache entry
T25 Re-ingestion invalidates cache After re-ingestion of file in namespace, cached reranks for that namespace are evicted

Circuit Breaker Tests

# Test Case Expected
T26 OpenAI returns 500 three times Circuit opens; rerank calls skip to dense order; circuit resets after 120 s
T27 Pinecone times out during ingestion Job retries (up to 3); if all fail, circuit opens; ingestion jobs queued

Rate Limiting & Request ID Tests

# Test Case Expected
T28 Student sends 11 /ask requests in 1 minute First 10 succeed; 11th returns 429 with retry_after and request_id
T29 Unauthenticated IP sends 6 /auth/login in 1 minute First 5 succeed; 6th returns 429
T30 Admin sends 61 /admin/* requests in 1 minute First 60 succeed; 61st returns 429
T31 Every successful /ask response includes request_id request_id present in SSE done event and in JSON response
T32 Every error response includes request_id 400, 401, 402, 403, 429, 503 responses all contain request_id field
T33 request_id propagates to reservations and wallet_ledger After a chat, reservations.request_id and wallet_ledger.request_id match the API response's request_id
T34 Client-provided X-Request-ID is adopted If client sends X-Request-ID: custom-uuid, the response and logs use that same ID

Integration / E2E Tests

# Test Case Expected
T35 Full chat flow (reserve → retrieve → rerank → answer → finalize) Correct answer returned; wallet balance = original - actual; ledger entry exists; all rows share same request_id
T36 Arabic query against French corpus Translation occurs; relevant French chunks retrieved; answer in Arabic
T37 Admin triggers scrape → ingest → search New references discovered; ingestion completes; semantic search returns results from new content

15. Sonnet Task List (Implementation Status)

Priority 1: Correctness & Data Integrity

# Title Implementation Status Files Created Testing Status
S1 Deterministic chunk IDs COMPLETE app/services/chunking.py, migration 13 ⏳ Needs integration testing
S2 Ingestion jobs state machine COMPLETE app/services/ingestion.py, migration 12 ✅ Wired to admin router
S3 Reservation-based billing COMPLETE & TESTED app/services/wallet_reservation.py, migration 14, scripts/expire_reservations.py ✅ Working in wallet router
S4 Lightweight Pinecone metadata COMPLETE app/services/pinecone_adapter.py ⏳ Needs retrieval testing
S5 Embedding refs tracking COMPLETE app/services/embedding_service.py, migration 15 ⏳ Needs ingestion testing
S21 Presigned upload service COMPLETE app/services/upload.py ⏳ Needs router implementation

Priority 2: Security & RLS Hardening

# Title Implementation Status Files Created Testing Status
S6 JWT custom claims hook COMPLETE Migration 18, app/core/auth.py updated ⏳ Hook needs manual registration in Dashboard
S7 Remove x-admin-key support DEPRECATED (warnings added) app/core/auth.py updated ✅ Logs deprecation warnings
S8 RLS for new tables COMPLETE Migration 16 ⏳ Needs database migration run
S9 Secrets management PARTIAL (config updated) app/core/config.py, .env.example ⏳ Production deployment needed
S9b Request-ID + rate limiting COMPLETE app/core/middleware.py ⏳ Needs production testing

Priority 3: Caching & Cost Control

# Title Implementation Status Files Created Testing Status
S10 Rerank result caching COMPLETE app/services/cache.py ⏳ Needs retrieval pipeline testing
S11 Chunk text cache COMPLETE app/services/cache.py ⏳ Needs retrieval pipeline testing
S12 Tier-based retrieval limits COMPLETE app/services/tier_config.py ⏳ Needs retrieval pipeline testing

Priority 4: Scraper Hardening

# Title Implementation Status Files Created Testing Status
S13 SimHash deduplication COMPLETE app/services/deduplication.py, migration 17 ⏳ Needs scraper integration testing
S14 Arabic canonicalization COMPLETE app/services/text_normalizer.py ⏳ Needs scraper testing
S15 Quality heuristics COMPLETE app/services/quality_checker.py ⏳ Needs scraper testing

Priority 5: Observability & DR

# Title Implementation Status Files Created Testing Status
S16 Circuit breaker COMPLETE app/services/circuit_breaker.py ⏳ Needs failure simulation testing
S17 Structured logging & metrics COMPLETE app/core/logging.py, app/core/metrics.py, app/api/routers/metrics.py ✅ Metrics endpoints working
S18 Wallet reconciliation job COMPLETE scripts/reconcile_wallets.py ⏳ Ready to run (needs cron setup)
S19 Reindex & DR export COMPLETE scripts/reindex.py, scripts/export_chunks.py ⏳ Ready to run (needs cron setup)

Priority 6: API & Integration

# Title Implementation Status Files Created Testing Status
S20 GPT-mini service COMPLETE app/services/gpt_mini.py ⏳ Needs retrieval pipeline testing
S21 Presigned upload endpoint COMPLETE app/services/upload.py ⏳ Needs router implementation
S22 Quiz generation COMPLETE app/services/quiz_generator.py, app/api/routers/quiz.py ⏳ Needs dependency injection + testing

Integration & Fixes (Post-Sonnet)

# Title Implementation Status Files Created/Modified Testing Status
I1 Dependency injection pattern COMPLETE & WORKING app/core/dependencies.py ✅ Used by wallet, admin routers
I2 Wallet router integration COMPLETE & WORKING app/api/routers/wallet.py (updated) ✅ Balance, reservations tested
I3 Admin router fixes COMPLETE & WORKING app/api/routers/admin.py (fixed) ✅ Users, roles tested
I4 Auth fixes COMPLETE & WORKING app/core/auth.py (fixed) ✅ JWT flow tested
I5 Chat router integration TODO app/api/routers/chat.py (stub exists) ⏳ Needs retrieval_pipeline wiring
I6 Quiz router integration TODO app/api/routers/quiz.py (stub exists) ⏳ Needs quiz_generator wiring
I7 Ingestion router creation TODO app/api/routers/ingestion.py (missing) ⏳ Needs creation
I8 Scraper router integration TODO app/api/routers/scraper_admin.py (stub exists) ⏳ Needs scraper_service wiring

Appendix: Prioritization Rules

The Sonnet task list is ordered by these rules:

  1. Correctness first (S1–S5): Fix ingestion idempotency, billing atomicity, and data integrity. Without these, the system produces duplicates and loses revenue.
  2. Security second (S6–S9): Harden auth and RLS. Without these, students can access admin data or bypass billing.
  3. Cost control third (S10–S12): Add caching and tier enforcement. Without these, the platform overspends on LLM calls.
  4. Scraper quality fourth (S13–S15): Add dedupe and canonicalization. Without these, the vector index contains duplicates and noise.
  5. Observability fifth (S16–S19): Add circuit breakers, metrics, and DR. Without these, outages go undetected and recovery is manual.
  6. New features last (S20–S22): GPT-mini service, file upload, quiz generation. These add value but depend on the foundation above.

16. Dependency Injection Implementation

Overview

A centralized service registry pattern was implemented to properly wire all services to routers.

File: app/core/dependencies.py
Pattern: Singleton instances initialized at module level
Benefit: Proper dependency injection, testable, no duplicate instances

Service Initialization Order

# 1. External Clients
openai_client = OpenAI(api_key=settings.OPENAI_API_KEY)
supabase_service = create_client(settings.SUPABASE_URL, settings.SUPABASE_SERVICE_ROLE_KEY)

# 2. Base Adapters
pinecone_adapter = PineconeAdapter(api_key=..., index_name=...)
cache_service = CacheService()

# 3. Core Services
embedding_service = EmbeddingService(openai_client, supabase_service, pinecone_adapter)
gpt_mini_service = GPTMiniService(openai_client)
wallet_service = WalletReservationService(supabase_service)
ingestion_service = IngestionService(supabase_service)

# 4. Pipelines
retrieval_pipeline = RetrievalPipeline(
    openai_client, supabase_service, pinecone_adapter,
    embedding_service, gpt_mini_service, cache_service
)

# 5. Feature Services
quiz_generator = QuizGeneratorService(openai_client, retrieval_pipeline)
scraper_service = ScraperService(supabase_service, text_normalizer, deduplication_service, quality_checker)

Usage in Routers

Working Example (wallet router):

from app.core.dependencies import wallet_service, supabase_service

@router.get("/balance")
async def get_balance(user: dict = Depends(get_current_user)):
    balance_data = wallet_service.get_balance(UUID(user["id"]))
    return WalletBalanceResponse(**balance_data)

See: docs/90_ops/dependency_injection.md for full documentation


17. Implementation Status & Next Steps

✅ What's Working (Tested on Personal Laptop)

  1. Auth & Profile: Signup, signin, JWT, profile management
  2. Wallet & Billing: Balance, reservations, reserve/finalize pattern
  3. Admin: User management, role updates
  4. Metrics: Prometheus and JSON endpoints

⏳ What's Ready But Needs Router Wiring

  1. Chat with RAG: RetrievalPipeline ready, chat router needs integration (2-3 hours)
  2. Quiz Generation: QuizGeneratorService ready, quiz router needs wiring (1 hour)
  3. PDF Ingestion: IngestionService ready, router needs creation (1-2 hours)
  4. Scraper Sync: ScraperService ready, scraper router needs wiring (1 hour)
  5. File Upload: UploadService ready, endpoint needs creation (30 min)

Estimated Remaining Work: 5-7 hours of router integration

🔧 Fixes Applied (Post-Sonnet Implementation)

Fix File Issue Resolution Status
Dependency injection app/core/dependencies.py Services not wired to routers Created singleton registry ✅ Working
Wallet router app/api/routers/wallet.py Endpoints were stubs Implemented actual logic ✅ Working
Admin router app/api/routers/admin.py Not using dependencies Imported singletons ✅ Working
Auth service app/core/auth.py Client pattern mismatch Fixed service client usage ✅ Working
Type hints Multiple services Missing imports Added List, Dict, Any imports ✅ Fixed

📖 Documentation Organization

All documentation moved to the mkdocs structure:

  • docs/00_overview/ - High-level guides (architecture, start_here)
  • docs/20_runbooks/ - Operational guides (quick_start, deployment)
  • docs/30_design/ - Design docs (plan, RLS, auth, etc.)
  • docs/90_ops/ - Implementation guides (status, dependency injection, phase guides)
  • docs/Artifacts/ - Phase completion guides, checklists
  • docs/Postman/ - Postman testing guides

See: MkDocs navigation for searchable documentation


18. Testing Strategy (Updated)

Unit Tests (Ready)

  • tests/unit/test_chunking.py - Deterministic chunk ID tests
  • Additional unit tests can be created for each service

Integration Tests (Needs Router Completion)

  • Chat flow: reserve → retrieve → answer → finalize
  • Ingestion flow: upload → parse → chunk → embed → upsert
  • Scraper flow: sync → dedupe → quality check → insert

Postman Collection (Ready)

  • postman/collection_v2.json - 40+ endpoints
  • 10 testing workflows documented
  • Auto-capture of JWT, request-ID, job-ID, etc.

Background Jobs (Ready to Deploy)

  • Reservation expiry: Continuous systemd service
  • Wallet reconciliation: Daily cron at 2 AM
  • DR export: Weekly cron on Sunday 3 AM

This document reflects the actual implementation state as of 2026-02-17 after Sonnet implementation and integration fixes. For next steps, see docs/90_ops/implementation_status.md