BacMR Backend Architecture
Status: ✅ Phases A-F Implemented (Integration In Progress)
Audience: Developers, contributors, and operators
Last updated: 2026-02-17
Implementation Branch: feature/sonnet-impl-20260217-155229
🎯 Implementation Status Summary
| Phase | Services | Migrations | API Endpoints | Status |
|---|---|---|---|---|
| A - Core Schema | ✅ Complete | ✅ 12-17 | ⏳ 70% | Services ready, some endpoints need wiring |
| B - Security | ✅ Complete | ✅ 18-19 | ✅ 95% | Auth, wallet, admin working |
| C - Caching | ✅ Complete | N/A | N/A | Internal services ready |
| D - Retrieval | ✅ Complete | N/A | ⏳ 60% | Services ready, endpoints need integration |
| E - Scraper | ✅ Complete | N/A | ⏳ 60% | Services ready, endpoints need integration |
| F - Observability | ✅ Complete | N/A | ✅ 90% | Metrics working, background jobs ready |
Overall: Services 95% ✅ | Migrations 100% ✅ | API Integration 70% ⏳
✅ Working End-to-End
- Auth (signup, signin, profile, JWT custom claims)
- Wallet (balance, reservations, reserve/finalize pattern)
- Admin (user management, role updates, admin access control)
- Metrics (Prometheus + JSON endpoints)
⏳ Services Ready, Endpoints Need Integration
- Chat with RAG retrieval (RetrievalPipeline ready, router needs wiring)
- Quiz generation (QuizGeneratorService ready, router needs wiring)
- PDF ingestion (IngestionService ready, router needs creation)
- Scraper sync (ScraperService ready, router needs wiring)
See: docs/90_ops/implementation_status.md for detailed status
See: docs/90_ops/dependency_injection.md for service wiring pattern
Table of Contents
- High-Level Architecture Diagram (includes Request ID Propagation & Rate Limiting)
- Service Responsibility List
- Storage Responsibilities
- Postgres Schema Additions
- Ingestion Pipeline
- Pinecone Index & Metadata Schema
- Retrieval → Rerank → Reasoning Pipeline
- API Contract
- RLS & Auth Plan
- Billing & Wallet (Reservation Pattern)
- Scraper & Canonicalization
- Observability & Resiliency
- Security
- Testing Matrix
- Sonnet Task List
1. High-Level Architecture Diagram
graph TB
subgraph Client
U[Student / Teacher / Admin<br/>Next.js Frontend]
end
subgraph API_Gateway ["API Gateway (FastAPI)"]
GW[FastAPI App<br/>CORS · Rate Limit · Auth Middleware]
end
subgraph Auth ["Auth Layer"]
SA[Supabase Auth<br/>JWT + Custom Claims Hook]
end
subgraph Core_Services ["Core Services"]
FU[File Upload Service<br/>Presigned URL → S3/GCS]
PW[Parser Workers<br/>PDF Extract · OCR · Normalize]
EW[Embedding Worker<br/>tiktoken chunker · OpenAI embed]
VA[Vector Adapter<br/>Pinecone upsert/query]
GM[GPT-mini Service<br/>Rerank · Validate · Detect Language]
RS[Reasoning Service<br/>LangGraph Teacher Agent · GPT-4o]
BS[Billing Service<br/>Reserve · Finalize · Reconcile]
end
subgraph Data_Stores ["Data Stores"]
PG[(Supabase Postgres<br/>profiles · wallet · chunks<br/>ingestion_jobs · reservations)]
PC[(Pinecone<br/>curriculum-1536<br/>vectors + lightweight metadata)]
S3[(Blob Store<br/>S3 / GCS<br/>Raw PDFs)]
RC[(Cache<br/>Redis / In-Memory LRU)]
end
subgraph Infra ["Infrastructure"]
SM[Secret Manager<br/>Vault / Cloud KMS]
MON[Monitoring<br/>Structured Logs · Metrics · Alerts]
RJ[Reindex Job<br/>Scheduled · DR Export]
end
U -->|"HTTPS + JWT"| GW
GW -->|"Verify JWT"| SA
GW -->|"Upload PDF"| FU
GW -->|"/ask, /chat"| RS
GW -->|"/ingestion/jobs"| PW
GW -->|"/wallet/*"| BS
FU -->|"Store raw PDF"| S3
PW -->|"Extract text"| S3
PW -->|"Write chunks"| PG
PW -->|"Request embeddings"| EW
EW -->|"OpenAI embed API"| OAI_E[OpenAI Embeddings]
EW -->|"Upsert vectors"| VA
VA -->|"Upsert/Query"| PC
RS -->|"Dense search"| VA
RS -->|"Rerank candidates"| GM
GM -->|"gpt-4o-mini API"| OAI_M[OpenAI gpt-4o-mini]
RS -->|"Generate answer"| OAI_C[OpenAI gpt-4o]
RS -->|"Reserve / finalize tokens"| BS
RS -->|"Fetch chunk text"| RC
RC -->|"Cache miss"| PG
BS -->|"Ledger + Reservations"| PG
PW -->|"Audit log"| PG
GW -->|"Read secrets"| SM
GW -->|"Emit metrics"| MON
RS -->|"Emit metrics"| MON
RJ -->|"Export chunks + vectors"| S3
RJ -->|"Re-embed"| EW
style PG fill:#336791,color:#fff
style PC fill:#1a73e8,color:#fff
style S3 fill:#e47911,color:#fff
style RC fill:#dc382c,color:#fff
Data-Flow Paths
| Path | Flow |
|---|---|
| Ingestion | Admin → FastAPI → File Upload → S3 → Parser Worker → Embedding Worker → Pinecone + Postgres |
| Retrieval (Chat) | Student → FastAPI → Auth → Billing Reserve → Pinecone query → Cache/Postgres (chunk text) → GPT-mini rerank → GPT-4o reason → SSE stream → Billing Finalize |
| Billing | FastAPI → Reserve (Postgres TX) → LLM call → Finalize (Postgres TX) → Ledger entry |
| Scraping | Admin → FastAPI → Scraper → Canonicalize → Dedupe → Postgres references |
Request ID Propagation
Every inbound HTTP request receives a request_id (UUID v4) at the API gateway. This ID is the single correlation key across every subsystem — without it, debugging an LLM failure that spans Pinecone, OpenAI, wallet, and audit tables is nearly impossible.
Generation: FastAPI middleware generates request_id = uuid4() at the start of every request (or adopts X-Request-ID from the client/load-balancer if present).
Propagation path:
| Component | How request_id is used |
|---|---|
| Structured logs | Every log line includes request_id as a top-level JSON field |
| Wallet / Reservations | reservations.request_id and wallet_ledger.request_id link billing to the originating request |
| Usage logs | usage_logs.request_id correlates the RAG interaction |
| OpenAI calls | Passed as user parameter in OpenAI API calls (enables cost attribution in OpenAI Dashboard) |
| Pinecone queries | Logged alongside query parameters for post-hoc debugging |
| Ingestion audit | ingestion_audit.request_id (nullable; only set when triggered via API, not cron) |
| SSE stream | Returned in the final done event: {"type": "done", "request_id": "uuid", ...} |
| Error responses | Every error response body includes "request_id": "uuid" so the client can report it |
Implementation:
- Middleware sets request.state.request_id.
- A contextvars.ContextVar makes it available to all service layers without explicit threading.
- File: app/core/middleware.py
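The adopt-or-generate rule and the ContextVar hand-off described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of app/core/middleware.py: the names `request_id_var`, `bind_request_id`, and `current_request_id` are hypothetical, and in the real app the binding would happen inside the FastAPI middleware.

```python
import uuid
from contextvars import ContextVar
from typing import Mapping, Optional

# Hypothetical module-level variable; the real name in middleware.py may differ.
request_id_var: ContextVar[Optional[str]] = ContextVar("request_id", default=None)


def bind_request_id(headers: Mapping[str, str]) -> str:
    """Adopt X-Request-ID if the caller sent one, otherwise generate a UUID v4."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get("x-request-id") or str(uuid.uuid4())
    request_id_var.set(request_id)
    return request_id


def current_request_id() -> Optional[str]:
    """Read the ID from any service layer without passing it as an argument."""
    return request_id_var.get()
```

A service function can then call `current_request_id()` when writing a log line or a ledger row, which is what removes the need to thread the ID through every signature.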
Rate Limiting
Students will spam refresh, open multiple tabs, and trigger parallel queries. Without rate limiting, a single user can exhaust the platform's OpenAI quota.
Strategy: Per-user (authenticated) rate limit with per-IP fallback for unauthenticated endpoints.
| Scope | Limit | Window | Applies to |
|---|---|---|---|
| Per-user (JWT user_id) | 10 requests | 1 minute | /ask, /chat, /quizzes/generate, /search/semantic |
| Per-user (JWT user_id) | 30 requests | 1 minute | /wallet/*, /upload/* |
| Per-IP (unauthenticated) | 5 requests | 1 minute | /auth/signup, /auth/login |
| Per-user (admin) | 60 requests | 1 minute | /admin/*, /ingestion/*, /scraping/* |
Enforcement:
- In-memory sliding-window counter (sufficient at single-instance scale).
- If deploying multiple instances: Redis-backed counter (same Redis as cache layer).
- Response on breach: HTTP 429 Too Many Requests with Retry-After header (seconds until window resets).
- request_id is included in the 429 response body for support debugging.
Response format:
{
"error": "rate_limited",
"request_id": "uuid",
"retry_after": 23,
"limit": 10,
"window": "1m"
}
Implementation: FastAPI middleware in app/core/middleware.py (same file as request-ID middleware).
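For the single-instance case, an in-memory sliding-window counter might look like the sketch below. The class name `SlidingWindowLimiter` and its `check` method are assumptions made for illustration; the key would be the JWT user_id or the client IP, and the returned `retry_after` is what would populate the Retry-After header and the 429 body.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Hypothetical per-key sliding-window limiter (single process only)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request by `key`."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest remaining hit determines when a slot frees up.
        retry_after = math.ceil(self.window - (now - hits[0]))
        return False, max(retry_after, 1)
```

A multi-instance deployment would need the Redis-backed counter mentioned above, because each process here only sees its own requests.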
2. Service Responsibility List
2.1 Implemented Services (Phase A-F)
| Service | Responsibility | Implementation File | Status |
|---|---|---|---|
| API Gateway | Route requests, CORS, auth middleware, request validation | app/main.py, app/api/routers/ | ✅ Working |
| Request Middleware | Generate/adopt request_id (UUID), enforce per-user and per-IP rate limits, inject request_id into contextvars | app/core/middleware.py | ✅ Implemented |
| Auth Service | JWT verification, role extraction from custom claims (app_metadata.role), admin guard | app/core/auth.py | ✅ Working |
| Dependency Registry | Centralized singleton service instances with proper dependency wiring | app/core/dependencies.py | ✅ Working |
| Chunking Service | Token-based chunking (tiktoken), deterministic chunk IDs (sha256(file_id:page:chunk_index)), language-specific sizes | app/services/chunking.py | ✅ Implemented (S1) |
| Ingestion Service | State machine (queued → ready/failed), retry logic (max 3), audit trail | app/services/ingestion.py | ✅ Implemented (S2) |
| Wallet Reservation Service | Reserve tokens (atomic), finalize after LLM, expire stale reservations, reconcile ledger | app/services/wallet_reservation.py | ✅ Working (S3) |
| Pinecone Adapter | Upsert/query vectors with lightweight metadata (<1 KB), no full text storage | app/services/pinecone_adapter.py | ✅ Implemented (S4) |
| Embedding Service | Generate embeddings (OpenAI), track refs in embedding_refs table, upsert to Pinecone | app/services/embedding_service.py | ✅ Implemented (S5) |
| Upload Service | Generate presigned URLs for S3/GCS/Supabase Storage, validate file type/size | app/services/upload.py | ✅ Implemented (S21) |
| Cache Service | Dual LRU cache (rerank 15-min TTL, chunk text 1-hour TTL), invalidation on re-ingestion | app/services/cache.py | ✅ Implemented (S10-S11) |
| Tier Config | Free/Standard/Premium limits (top-K, rerank-N, tokens), cost estimation | app/services/tier_config.py | ✅ Implemented (S12) |
| GPT-mini Service | Rerank candidates, detect language (French/Arabic/Hassaniya), translate queries, validate input, circuit breaker | app/services/gpt_mini.py | ✅ Implemented (S20) |
| Retrieval Pipeline | Full flow: detect language → translate → embed → dense search → rerank → fetch chunks | app/services/retrieval_pipeline.py | ✅ Implemented (Phase D) |
| Quiz Generator | RAG-based quiz generation with GPT-4o, multiple-choice with explanations and source pages | app/services/quiz_generator.py | ✅ Implemented (S22) |
| Circuit Breaker | Protection for OpenAI/Pinecone calls, 3 failures → open, 120s recovery, fallback strategies | app/services/circuit_breaker.py | ✅ Implemented (S16) |
| Text Normalizer | Arabic canonicalization (alef unification, tatweel removal, boilerplate removal) | app/services/text_normalizer.py | ✅ Implemented (S14) |
| Deduplication Service | SimHash (64-bit) with Hamming distance ≤ 3 for duplicate detection | app/services/deduplication.py | ✅ Implemented (S13) |
| Quality Checker | Content quality heuristics (min length, OCR confidence, encoding validation) | app/services/quality_checker.py | ✅ Implemented (S15) |
| Scraper Service | Automated pipeline: canonicalize → quality check → dedupe → insert canonical refs | app/services/scraper_service.py | ✅ Implemented (Phase E) |
| Monitoring | Structured JSON logging, Prometheus-compatible metrics (counters, histograms, gauges) | app/core/logging.py, app/core/metrics.py | ✅ Implemented (S17) |
| Config Management | Settings with env vars, defaults for all parameters | app/core/config.py | ✅ Updated |
2.2 Background Jobs (Phase F)
| Job | Responsibility | Implementation File | Schedule | Status |
|---|---|---|---|---|
| Reservation Expiry | Expire un-finalized reservations older than 5 min, refund tokens | scripts/expire_reservations.py | Continuous (60s loop) | ✅ Ready |
| Wallet Reconciliation | Compare wallet balance with ledger sum, flag discrepancies (no auto-correct) | scripts/reconcile_wallets.py | Daily 2 AM (cron) | ✅ Ready (S18) |
| DR Export | Export chunks to NDJSON, upload to blob store for disaster recovery | scripts/export_chunks.py | Weekly Sunday 3 AM (cron) | ✅ Ready (S19) |
| Reindex | Re-embed chunks with new model, create new namespace, verify counts | scripts/reindex.py | On-demand (manual) | ✅ Ready (S19) |
2.3 Legacy Services (Pre-existing, Kept for Compatibility)
| Service | File | Notes |
|---|---|---|
| Legacy Embeddings | app/services/embeddings.py | Kept for backward compatibility; new code uses embedding_service.py |
| Legacy Wallet | app/services/wallet.py | Kept for backward compatibility; new code uses wallet_reservation.py |
| Legacy Pinecone | app/services/pinecone_store.py | Kept; new code uses pinecone_adapter.py |
| Legacy Retrieval | app/services/retrieval.py | Kept; new code uses retrieval_pipeline.py |
GPT-mini Validator/Reranker — Hosting & SLA
- Hosted on: OpenAI API (same API key as main models). Model: gpt-4o-mini.
- SLA: Same as OpenAI API (99.9% target). No self-hosted fallback needed at current scale.
- Fallback: If gpt-4o-mini returns an error or latency exceeds 5 seconds:
  - Reranking: Skip rerank, return dense-retrieval order (graceful degradation).
  - Language detection: Fall back to simple regex-based Arabic/French detector.
  - Input validation: Allow the request through (fail-open for validation; fail-closed for safety).
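One way the regex-based language fallback could work is to measure the share of Arabic-script letters in the query. The sketch below is an assumption about that detector, not the code in app/services/gpt_mini.py: the function name, the 0.5 threshold, and the choice of "fr" as the default are all illustrative, and it cannot tell Hassaniya from MSA.

```python
import re

# Basic Arabic Unicode block; a fuller detector might add the supplement blocks.
ARABIC_LETTER = re.compile(r"[\u0600-\u06FF]")


def detect_language_fallback(text: str) -> str:
    """Return 'ar' when most letters are Arabic-script, otherwise 'fr'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "fr"  # assumed default, matching the chunks.language default
    arabic = sum(1 for ch in letters if ARABIC_LETTER.match(ch))
    return "ar" if arabic / len(letters) > 0.5 else "fr"
```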
3. Storage Responsibilities
What Goes Where
| Data | Store | Rationale |
|---|---|---|
| Full chunk text | Postgres (chunks.content) | Source of truth; enables full-text search; avoids Pinecone 40 KB metadata limit |
| Embedding vectors (1536-dim) | Pinecone | Optimized for ANN search |
| Lightweight metadata per vector | Pinecone metadata | Filter fields only: chunk_id, file_id, language, grade, subject, source_url, page_number, ingestion_ts |
| Raw PDF files | S3 / GCS (Blob Store) | Archival; enables re-ingestion without re-downloading |
| User data (profiles, wallets, ledger) | Postgres | Relational, RLS-protected |
| Ingestion state machine | Postgres (ingestion_jobs) | Transactional state with audit trail |
| Reservation state | Postgres (reservations) | Must be atomic with wallet balance |
| Cached rerank results | Redis / In-memory | Ephemeral; TTL 15 min |
| Cached chunk text | Redis / In-memory | LRU; TTL 1 hour |
Canonical Chunk-Store Approach
┌──────────────┐ ┌──────────────────────┐
│ Pinecone │ │ Postgres │
│ │ │ │
│ vector_id ──┼──────┼→ chunks.chunk_id │
│ metadata: │ │ chunks.content │
│ chunk_id │ │ chunks.file_id │
│ file_id │ │ chunks.page_number │
│ language │ │ chunks.token_count │
│ grade │ │ │
│ subject │ └──────────────────────┘
│ page_number│
│ ingestion_ts│
└──────────────┘
At retrieval time:
1. Query Pinecone → get chunk_id list.
2. Fetch chunk text from cache (Redis/LRU) → on miss, query Postgres chunks table.
3. Pass text to reranker and reasoning model.
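Step 2 above, the cache-first lookup with a Postgres fallback, can be sketched as follows. The function name `fetch_chunk_texts` and the `load_from_postgres` callable are hypothetical stand-ins: the cache is modelled as a plain dict and the loader would wrap a query on the chunks table.

```python
from typing import Callable, Dict, List, Sequence


def fetch_chunk_texts(
    chunk_ids: Sequence[str],
    cache: Dict[str, str],
    load_from_postgres: Callable[[List[str]], Dict[str, str]],
) -> List[str]:
    """Return chunk texts in Pinecone result order, filling the cache on misses."""
    # Deduplicate misses while keeping order, so Postgres is queried once.
    missing = list(dict.fromkeys(cid for cid in chunk_ids if cid not in cache))
    if missing:
        cache.update(load_from_postgres(missing))
    # Skip IDs that exist in Pinecone but are absent from Postgres.
    return [cache[cid] for cid in chunk_ids if cid in cache]
```

Batching all misses into one loader call keeps the Postgres round trips at one per query regardless of top-K.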
4. Postgres Schema Additions
ingestion_jobs
CREATE TABLE IF NOT EXISTS ingestion_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
reference_id UUID NOT NULL REFERENCES "references"(id), -- "references" is a reserved word in Postgres and must be quoted
file_id UUID, -- FK to documents table if applicable
status TEXT NOT NULL DEFAULT 'queued'
CHECK (status IN ('queued','parsing','tokenizing',
'embedding_request_sent','embedding_upserted',
'ready','failed')),
chunks_created INT DEFAULT 0, -- count of chunks produced
vectors_upserted INT DEFAULT 0, -- count of vectors sent to Pinecone
retry_count INT DEFAULT 0, -- current retry attempt
max_retries INT DEFAULT 3,
error_message TEXT, -- last error (nullable)
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Index for status polling
CREATE INDEX IF NOT EXISTS idx_ingestion_jobs_status ON ingestion_jobs(status);
-- Index for reference lookup
CREATE INDEX IF NOT EXISTS idx_ingestion_jobs_reference ON ingestion_jobs(reference_id);
chunks (enhanced)
-- If chunks table already exists, ALTER; otherwise CREATE.
-- This shows the target schema.
CREATE TABLE IF NOT EXISTS chunks (
chunk_id TEXT PRIMARY KEY, -- sha256(file_id:page:chunk_index)
file_id UUID NOT NULL REFERENCES documents(id),
page_number INT NOT NULL,
chunk_index INT NOT NULL, -- position within the page
content TEXT NOT NULL, -- full chunk text
token_count INT NOT NULL, -- token count (tiktoken cl100k_base)
language TEXT NOT NULL DEFAULT 'fr', -- 'fr', 'ar', 'ha' (Hassaniya)
embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
ingestion_job_id UUID REFERENCES ingestion_jobs(id),
created_at TIMESTAMPTZ DEFAULT now()
);
-- Composite index for idempotency check
CREATE UNIQUE INDEX IF NOT EXISTS idx_chunks_deterministic
ON chunks(file_id, page_number, chunk_index);
reservations
CREATE TABLE IF NOT EXISTS reservations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id),
estimated INT NOT NULL, -- estimated token cost reserved
actual INT, -- actual token cost (set on finalize)
status TEXT NOT NULL DEFAULT 'reserved'
CHECK (status IN ('reserved','finalized','expired','refunded')),
request_id UUID, -- links to usage_logs
created_at TIMESTAMPTZ DEFAULT now(),
finalized_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ DEFAULT now() + INTERVAL '5 minutes'
);
CREATE INDEX IF NOT EXISTS idx_reservations_user ON reservations(user_id);
CREATE INDEX IF NOT EXISTS idx_reservations_status ON reservations(status)
WHERE status = 'reserved'; -- partial index for expiry job
wallet_ledger (enhanced — add reservation_id)
-- ALTER existing table
ALTER TABLE wallet_ledger
ADD COLUMN IF NOT EXISTS reservation_id UUID REFERENCES reservations(id);
embedding_refs
CREATE TABLE IF NOT EXISTS embedding_refs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chunk_id TEXT NOT NULL REFERENCES chunks(chunk_id),
pinecone_vector_id TEXT NOT NULL, -- the ID used in Pinecone
pinecone_namespace TEXT NOT NULL, -- e.g. grade-12-math
embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
upserted_at TIMESTAMPTZ DEFAULT now()
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_embedding_refs_vector
ON embedding_refs(pinecone_vector_id, pinecone_namespace);
ingestion_audit
CREATE TABLE IF NOT EXISTS ingestion_audit (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
ingestion_job_id UUID NOT NULL REFERENCES ingestion_jobs(id),
from_status TEXT,
to_status TEXT NOT NULL,
message TEXT, -- error detail or info
request_id UUID, -- nullable; set only when the transition was triggered via API, not cron
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_ingestion_audit_job
ON ingestion_audit(ingestion_job_id);
references enhancements
-- "references" is a reserved word in Postgres, so the table name must be quoted
ALTER TABLE "references"
ADD COLUMN IF NOT EXISTS content_fingerprint BIGINT, -- SimHash for dedupe
ADD COLUMN IF NOT EXISTS canonical_id UUID REFERENCES "references"(id),
ADD COLUMN IF NOT EXISTS last_checked_at TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS ocr_confidence REAL; -- 0.0–1.0
5. Ingestion Pipeline
Deterministic Chunk ID
chunk_id = sha256( file_id + ":" + page_number + ":" + chunk_index )
file_id: UUID from the documents table.
page_number: 0-indexed page from PDF extraction.
chunk_index: 0-indexed position of the chunk within that page.
This ensures that re-ingesting the same file with the same parser produces identical chunk IDs → Pinecone upserts are idempotent (overwrite, no duplicates).
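As a minimal sketch of the formula above, assuming the hash is taken over the UTF-8 encoding of the colon-joined string and stored as a hex digest (the function name `make_chunk_id` is hypothetical):

```python
import hashlib


def make_chunk_id(file_id: str, page_number: int, chunk_index: int) -> str:
    """Derive a deterministic chunk ID from file, page, and position."""
    key = f"{file_id}:{page_number}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because the content is not part of the key, editing a page's text and re-ingesting keeps the same IDs; only a change in chunk boundaries shifts which text each ID points to.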
Token-Based Chunking Strategy
| Language | Tokenizer | Chunk Size | Overlap | Notes |
|---|---|---|---|---|
| French | tiktoken / cl100k_base | 512 tokens | 64 tokens | Standard Latin-script tokenization |
| Arabic (MSA) | tiktoken / cl100k_base | 384 tokens | 48 tokens | Arabic tokenizes at ~1.5× expansion; smaller chunks maintain quality |
| Hassaniya | tiktoken / cl100k_base | 384 tokens | 48 tokens | Treated as Arabic-script; same tokenizer with cultural localization |
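The size and overlap values in the table translate into a sliding window over the token sequence. The sketch below works on an already-tokenized list so it stays independent of the tokenizer; in practice the tokens would presumably come from tiktoken's cl100k_base encoding. `CHUNK_PARAMS` and `chunk_tokens` are illustrative names, not the API of app/services/chunking.py.

```python
from typing import Dict, List, Sequence, Tuple

# (chunk size, overlap) in tokens, keyed by language code, from the table above.
CHUNK_PARAMS: Dict[str, Tuple[int, int]] = {
    "fr": (512, 64),
    "ar": (384, 48),
    "ha": (384, 48),
}


def chunk_tokens(tokens: Sequence[int], language: str) -> List[List[int]]:
    """Split tokens into fixed-size windows that overlap by the configured amount."""
    size, overlap = CHUNK_PARAMS[language]
    step = size - overlap  # each window starts this many tokens after the last
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the current window already reached the end of the page
    return chunks
```

The early break avoids emitting a trailing chunk that would consist only of overlap already contained in the previous one.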
Ingestion Job State Machine
stateDiagram-v2
[*] --> queued : POST /ingestion/jobs
queued --> parsing : Worker picks up job
parsing --> tokenizing : Text extracted successfully
parsing --> failed : PDF corrupt / download error
tokenizing --> embedding_request_sent : Chunks created, embeddings requested
tokenizing --> failed : Tokenizer error
embedding_request_sent --> embedding_upserted : OpenAI returns embeddings
embedding_request_sent --> embedding_request_sent : Transient error (retry ≤ 3)
embedding_request_sent --> failed : Max retries exceeded
embedding_upserted --> ready : Pinecone upsert confirmed
embedding_upserted --> failed : Pinecone upsert error (after retries)
failed --> queued : Manual retry via admin API
ready --> [*]
Retry Semantics
- Transient errors (HTTP 429, 500, 503 from OpenAI/Pinecone): Retry with exponential backoff (1 s, 4 s, 16 s). Max 3 retries.
- Permanent errors (HTTP 400, invalid PDF): Move to failed immediately; no retry.
- Idempotent upserts: Because chunk IDs and Pinecone vector IDs are deterministic, a retry that re-sends the same vectors is safe.
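The retry rules above can be sketched as a small wrapper. `UpstreamError` and `call_with_retry` are hypothetical names; a real implementation would map the OpenAI and Pinecone client exceptions onto the transient and permanent categories.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

TRANSIENT_STATUS = {429, 500, 503}
BACKOFF_SECONDS = (1, 4, 16)  # one delay per retry, so at most 3 retries


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status of a failed upstream call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


def call_with_retry(op: Callable[[], T], sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying transient failures with exponential backoff."""
    for delay in BACKOFF_SECONDS:
        try:
            return op()
        except UpstreamError as exc:
            if exc.status not in TRANSIENT_STATUS:
                raise  # permanent error: the job moves to failed immediately
            sleep(delay)
    return op()  # final attempt; any error propagates to the caller
```

Passing `sleep` as a parameter is only a testing convenience; it lets the backoff schedule be checked without waiting.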
6. Pinecone Index & Metadata Schema
Index Configuration
| Property | Value |
|---|---|
| Index name | curriculum-1536 |
| Dimensions | 1536 |
| Metric | cosine |
| Cloud | Serverless (AWS or GCP) |
Namespace Strategy
Format: grade-{grade}-{subject} (e.g., grade-12-math, grade-10-physics).
Default namespace: default (for unclassified content).
Vector ID: vector_id = chunk_id (i.e., the same sha256 hash). Each vector carries metadata of the following shape:
{
"chunk_id": "a1b2c3...",
"file_id": "uuid-...",
"language": "fr",
"grade": "12",
"subject": "math",
"source_url": "https://koutoubi.mr/...",
"page_number": 5,
"ingestion_ts": "2026-02-17T10:30:00Z"
}
Note: text is NOT stored in Pinecone metadata. Full text lives in Postgres chunks.content.
Recommended Filter Fields
For query(filter=...):
- language — prefilter to corpus language.
- grade — scope to student's grade level.
- subject — scope to the subject being studied.
- file_id — useful for admin queries ("show all vectors from this document").
7. Retrieval → Rerank → Reasoning Pipeline
flowchart LR
Q[User Query] --> LD[Language Detect<br/>GPT-mini]
LD --> TR{Translation<br/>needed?}
TR -->|Yes| TRANS[Translate to<br/>corpus language]
TR -->|No| DR
TRANS --> DR[Dense Retrieval<br/>Pinecone top-K]
DR --> LP{Lexical<br/>prefilter?}
LP -->|Arabic query| BM[BM25 Keyword<br/>Filter]
LP -->|No| RR
BM --> RR[Rerank<br/>GPT-mini top-N]
RR --> CC[Cache Check<br/>sha256 query+ns+tier]
CC -->|Hit| RES
CC -->|Miss| RR2[Call GPT-mini<br/>reranker]
RR2 --> CS[Cache Store<br/>TTL 15 min]
CS --> RES[Fetch Chunk Text<br/>Cache → Postgres]
RES --> GEN[Reasoning<br/>GPT-4o + context]
GEN --> SSE[Stream SSE<br/>to client]
Cost Policy by Tier
| Tier | top-K (Dense) | Rerank-N | Reranker | Cache TTL |
|---|---|---|---|---|
| Free | 10 | 3 | gpt-4o-mini | 15 min |
| Standard | 20 | 5 | gpt-4o-mini | 15 min |
| Premium | 30 | 8 | gpt-4o-mini | 15 min |
Caching Behavior
| Cache | Key | TTL | Invalidation |
|---|---|---|---|
| Rerank results | sha256(query + namespace + tier) | 15 min | On re-ingestion of any file in the namespace |
| Chunk text | chunk_id | 1 hour | Explicit eviction on re-ingestion of the file (chunk_id is derived from file_id, page, and chunk index, so it stays the same when content changes) |
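The rerank cache key could be computed as in the sketch below. The unit-separator character between the three parts is an assumption added here so that different (query, namespace, tier) triples cannot collide after concatenation; the function name is hypothetical.

```python
import hashlib


def rerank_cache_key(query: str, namespace: str, tier: str) -> str:
    """Hash query, namespace, and tier into a fixed-length cache key."""
    # Assumed separator: keeps ("ab", "c") distinct from ("a", "bc").
    payload = "\x1f".join((query, namespace, tier))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Including the tier in the key matters because each tier reranks a different number of candidates, so a Free-tier result should not be served to a Premium request.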
8. API Contract
8.1 Auth Endpoints
POST /auth/signup
Delegates to Supabase Auth. The backend creates a profiles row and initializes a wallet via the handle_new_user trigger.
| Field | Value |
|---|---|
| Auth | None (public) |
| Request | { "email": "student@example.mr", "password": "...", "metadata": { "full_name": "Ahmed" } } |
| Response 201 | { "user_id": "uuid", "email": "...", "role": "student" } |
| Error 400 | { "error": "email_already_registered" } |
POST /auth/login
Delegates to Supabase Auth, returns JWT.
| Field | Value |
|---|---|
| Auth | None (public) |
| Request | { "email": "...", "password": "..." } |
| Response 200 | { "access_token": "jwt...", "refresh_token": "...", "expires_in": 3600 } |
| Error 401 | { "error": "invalid_credentials" } |
8.2 File Upload
POST /upload/file
| Field | Value |
|---|---|
| Auth | Bearer JWT (admin role) |
| Request | { "filename": "math_12.pdf", "content_type": "application/pdf", "grade": "12", "subject": "math", "language": "fr" } |
| Response 200 | { "upload_url": "https://s3.../presigned...", "file_id": "uuid", "expires_in": 300 } |
| Error 403 | { "error": "admin_required" } |
| Error 400 | { "error": "invalid_file_type", "allowed": ["application/pdf"] } |
The client uploads directly to the presigned URL. After upload, the client calls POST /ingestion/jobs.
8.3 Ingestion
POST /ingestion/jobs
| Field | Value |
|---|---|
| Auth | Bearer JWT (admin role) |
| Request | { "reference_id": "uuid", "force": false } |
| Response 202 | { "job_id": "uuid", "status": "queued" } |
| Error 409 | { "error": "ingestion_already_in_progress" } |
| Error 404 | { "error": "reference_not_found" } |
force: true re-ingests even if the reference is already ready.
GET /ingestion/jobs/{id}
| Field | Value |
|---|---|
| Auth | Bearer JWT (admin role) |
| Response 200 | { "job_id": "uuid", "status": "embedding_upserted", "chunks_created": 42, "vectors_upserted": 42, "retry_count": 0, "created_at": "...", "updated_at": "..." } |
| Error 404 | { "error": "job_not_found" } |
8.4 Chat / Ask
POST /ask
| Field | Value |
|---|---|
| Auth | Bearer JWT (student/teacher/admin) |
| Request | { "question": "ما هي الترجمة في الرياضيات؟", "grade": "12", "subject": "math", "language": "ar", "stream": true } |
| Response 200 (stream) | text/event-stream with SSE events: data: {"token": "...", "type": "content"} ... data: {"type": "done", "sources": [...], "tokens_used": 12} |
| Response 200 (JSON) | { "answer": "...", "sources": [{"page": 45, "file": "math_12.pdf", "snippet": "..."}], "tokens_used": 12, "reservation_id": "uuid" } |
| Error 402 | { "error": "insufficient_balance", "balance": 3, "estimated_cost": 5 } |
| Error 503 | { "error": "service_unavailable", "reason": "llm_circuit_open" } |
Internal flow:
1. Pre-validate input (GPT-mini: safety check, language detect).
2. Reserve tokens (POST /wallet/reserve internally).
3. Retrieve from Pinecone (dense search).
4. Optionally rerank (GPT-mini).
5. Generate answer (GPT-4o via LangGraph).
6. Finalize reservation with actual token usage.
8.5 Quiz Generation
POST /quizzes/generate
| Field | Value |
|---|---|
| Auth | Bearer JWT (student/teacher/admin) |
| Request | { "grade": "12", "subject": "math", "topic": "translations", "num_questions": 5, "language": "fr" } |
| Response 200 | { "quiz_id": "uuid", "questions": [{ "q": "...", "options": ["A","B","C","D"], "correct": "B", "explanation": "...", "source_page": 45 }], "tokens_used": 20 } |
| Error 402 | { "error": "insufficient_balance" } |
8.6 Wallet / Billing
POST /wallet/reserve
| Field | Value |
|---|---|
| Auth | Internal (service-to-service; not exposed publicly) |
| Request | { "user_id": "uuid", "estimated": 10, "request_id": "uuid" } |
| Response 200 | { "reservation_id": "uuid", "balance_after_reserve": 40 } |
| Error 402 | { "error": "insufficient_balance", "balance": 3, "estimated": 10 } |
POST /wallet/finalize
| Field | Value |
|---|---|
| Auth | Internal (service-to-service) |
| Request | { "reservation_id": "uuid", "actual": 8 } |
| Response 200 | { "reservation_id": "uuid", "status": "finalized", "refunded": 2, "balance_after": 42 } |
| Error 404 | { "error": "reservation_not_found" } |
| Error 409 | { "error": "reservation_already_finalized" } |
GET /wallet/balance
| Field | Value |
|---|---|
| Auth | Bearer JWT |
| Response 200 | { "user_id": "uuid", "token_balance": 50, "subscription_tier": "free", "pending_reservations": 0 } |
8.7 Semantic Search
GET /search/semantic
| Field | Value |
|---|---|
| Auth | Bearer JWT |
| Query params | ?q=translation&grade=12&subject=math&language=fr&limit=5 |
| Response 200 | { "results": [{ "chunk_id": "...", "text": "...", "score": 0.92, "page": 45, "source": "math_12.pdf" }] } |
8.8 Admin Endpoints
POST /admin/scraping/{source}/sync
| Field | Value |
|---|---|
| Auth | Bearer JWT (admin role) |
| Response 200 | { "run_id": "uuid", "status": "success", "found": 15, "new": 3, "duplicates": 2, "errors": 0 } |
POST /admin/reindex
| Field | Value |
|---|---|
| Auth | Bearer JWT (admin role) |
| Request | { "namespace": "grade-12-math", "reason": "model_upgrade" } (omit namespace to reindex all) |
| Response 202 | { "reindex_job_id": "uuid", "status": "queued", "estimated_chunks": 1200 } |
PATCH /admin/users/{user_id}/role
| Field | Value |
|---|---|
| Auth | Bearer JWT (admin role) |
| Request | { "role": "teacher" } |
| Response 200 | { "user_id": "uuid", "role": "teacher" } |
| Error 400 | { "error": "invalid_role", "allowed": ["student","teacher","admin"] } |
Error Code Summary
Every error response includes request_id for correlation:
{ "error": "<code>", "request_id": "uuid-...", ... }
| HTTP | Code | Meaning |
|---|---|---|
| 400 | bad_request | Invalid input |
| 401 | unauthorized | Missing or invalid JWT |
| 402 | insufficient_balance | Wallet balance too low |
| 403 | forbidden | Role not authorized |
| 404 | not_found | Resource does not exist |
| 409 | conflict | Duplicate or already-in-progress |
| 429 | rate_limited | Too many requests (includes retry_after seconds) |
| 503 | service_unavailable | LLM or Pinecone circuit open |
9. RLS & Auth Plan
Current State
- RLS Phase 2 complete: all public tables have RLS enabled.
- Admin auth uses user_metadata.role (not custom claims yet).
- x-admin-key still accepted (deprecated).
Target State
- Roles in JWT custom claims via Postgres hook (app_metadata.role).
- x-admin-key removed entirely.
- New tables (ingestion_jobs, reservations, embedding_refs, ingestion_audit) have RLS.
RLS Policy Templates
User-facing tables (profiles, wallet, wallet_ledger, usage_logs, reservations)
-- Users can SELECT their own rows
CREATE POLICY "user_select_own" ON {table}
FOR SELECT
USING (auth.uid() = user_id);
-- No INSERT/UPDATE/DELETE via public API
-- (service_role bypasses RLS for backend operations)
System tables (ingestion_jobs, chunks, embedding_refs, ingestion_audit, documents)
ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;
-- Only service_role can access
CREATE POLICY "service_role_only" ON {table}
FOR ALL
USING (auth.role() = 'service_role');
Admin tables (references, scrape_runs)
-- Admin can SELECT and INSERT/UPDATE
CREATE POLICY "admin_read_write" ON {table}
FOR ALL
USING (
auth.role() = 'service_role'
OR (auth.jwt() -> 'app_metadata' ->> 'role') = 'admin'
);
JWT Custom Claims Migration Checklist
- Create Postgres hook function:
  CREATE OR REPLACE FUNCTION public.custom_access_token_hook(event jsonb)
  RETURNS jsonb LANGUAGE plpgsql STABLE AS $$
  DECLARE
    claims jsonb;
    user_role TEXT;
  BEGIN
    SELECT role INTO user_role FROM public.profiles
    WHERE user_id = (event->>'user_id')::uuid;
    claims := event->'claims';
    -- jsonb_set does not create missing parent keys, so make sure app_metadata exists
    IF jsonb_typeof(claims->'app_metadata') IS NULL THEN
      claims := jsonb_set(claims, '{app_metadata}', '{}');
    END IF;
    IF user_role IS NOT NULL THEN
      claims := jsonb_set(claims, '{app_metadata,role}', to_jsonb(user_role));
    ELSE
      -- No profile row found: default to the least-privileged role
      claims := jsonb_set(claims, '{app_metadata,role}', '"student"');
    END IF;
    event := jsonb_set(event, '{claims}', claims);
    RETURN event;
  END;
  $$;
  -- Grant necessary permissions
  GRANT USAGE ON SCHEMA public TO supabase_auth_admin;
  GRANT EXECUTE ON FUNCTION public.custom_access_token_hook TO supabase_auth_admin;
  REVOKE EXECUTE ON FUNCTION public.custom_access_token_hook FROM authenticated, anon, public;
  GRANT SELECT ON TABLE public.profiles TO supabase_auth_admin;
- Register in Supabase Dashboard: Authentication → Hooks → "Customize Access Token" → select custom_access_token_hook.
- Update RLS policies: Change (auth.jwt() ->> 'role') to (auth.jwt() -> 'app_metadata' ->> 'role') in all admin policies.
- Update FastAPI auth dependency:
  # In app/core/auth.py, get_current_admin:
  # Read role from: jwt_payload["app_metadata"]["role"]
  # Instead of: jwt_payload["user_metadata"]["role"]
- Test with canary user: Create a test admin, verify JWT contains app_metadata.role = "admin", verify all admin endpoints accept the new token.
- Remove x-admin-key support: Delete the x-admin-key header check from all routers. Update .env.example to remove ADMIN_API_KEY.
- Rollback procedure:
  - If hook fails: Disable the hook in Supabase Dashboard. JWTs revert to default claims.
  - Keep user_metadata.role as fallback in get_current_admin for 2 weeks after migration.
  - Monitor auth error rates; if they increase by more than 1%, roll back.
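The role lookup from the checklist, including the temporary user_metadata fallback from the rollback procedure, might be written as below. This is a sketch over an already-verified JWT payload; the function name `extract_role` and the `allow_legacy_fallback` flag are hypothetical, and the "student" default mirrors the hook's own default.

```python
from typing import Any, Mapping

ALLOWED_ROLES = {"student", "teacher", "admin"}


def extract_role(jwt_payload: Mapping[str, Any], allow_legacy_fallback: bool = True) -> str:
    """Prefer app_metadata.role; optionally fall back to user_metadata.role."""
    role = (jwt_payload.get("app_metadata") or {}).get("role")
    if role is None and allow_legacy_fallback:
        # Legacy location, kept only during the migration window.
        role = (jwt_payload.get("user_metadata") or {}).get("role")
    # Unknown or missing roles get the least-privileged default.
    return role if role in ALLOWED_ROLES else "student"
```

Once the two-week window has passed, calling it with `allow_legacy_fallback=False` would stop honoring the legacy field without touching the callers.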
10. Billing & Wallet (Reservation Pattern)
Reservation Flow
sequenceDiagram
participant C as Client
participant API as FastAPI
participant W as Wallet Service
participant PG as Postgres
participant LLM as OpenAI
C->>API: POST /ask {question}
API->>W: reserve(user_id, estimated=10)
W->>PG: BEGIN TX: deduct estimated from wallet, insert reservation
PG-->>W: reservation_id
W-->>API: reservation_id, balance_after=40
API->>LLM: Retrieve + Rerank + Generate
LLM-->>API: answer (actual_tokens=8)
API->>W: finalize(reservation_id, actual=8)
W->>PG: BEGIN TX: update reservation, refund delta (2), insert ledger
PG-->>W: OK
W-->>API: finalized, refunded=2
API-->>C: SSE stream + tokens_used=8
DB Transaction — Reserve
BEGIN;

-- Deduct estimated amount from wallet
UPDATE wallet
SET token_balance = token_balance - :estimated,
    updated_at = now()
WHERE user_id = :uid
  AND token_balance >= :estimated;
-- If no row updated → insufficient balance → ROLLBACK

-- Create reservation record
INSERT INTO reservations (user_id, estimated, status, request_id, created_at, expires_at)
VALUES (:uid, :estimated, 'reserved', :request_id, now(), now() + INTERVAL '5 minutes')
RETURNING id;

COMMIT;
DB Transaction — Finalize
BEGIN;

-- Mark reservation finalized; RETURNING supplies :uid and :estimated for the statements below
UPDATE reservations
SET actual = :actual,
    status = 'finalized',
    finalized_at = now()
WHERE id = :reservation_id
  AND status = 'reserved'
RETURNING user_id, estimated;
-- If no row updated → already finalized or expired → ROLLBACK

-- Refund delta if actual < estimated (GREATEST clamps the refund to 0 when actual >= estimated)
UPDATE wallet
SET token_balance = token_balance + GREATEST(:estimated - :actual, 0),
    updated_at = now()
WHERE user_id = :uid;

-- Record in ledger
INSERT INTO wallet_ledger (user_id, delta, reason, request_id, reservation_id)
VALUES (:uid, -:actual, 'agent_chat', :request_id, :reservation_id);

COMMIT;
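The arithmetic of the two transactions can be checked with a small in-memory model. This is a sketch for reasoning about the flow only: the `Wallet` class and its exception names are hypothetical, there is no database or locking, and the starting balance of 50 is inferred from the sequence diagram (balance_after=40 after reserving 10).

```python
from dataclasses import dataclass, field
from itertools import count


class InsufficientBalance(Exception):
    """Raised when the wallet cannot cover the estimate (the 402 case)."""


class ReservationConflict(Exception):
    """Raised when a reservation is no longer in 'reserved' status (the 409 case)."""


@dataclass
class Wallet:
    balance: int
    reservations: dict = field(default_factory=dict)
    ledger: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def reserve(self, estimated: int) -> int:
        # Mirrors: UPDATE wallet ... WHERE token_balance >= :estimated
        if self.balance < estimated:
            raise InsufficientBalance()
        self.balance -= estimated
        rid = next(self._ids)
        self.reservations[rid] = {"estimated": estimated, "status": "reserved"}
        return rid

    def finalize(self, rid: int, actual: int) -> int:
        # Mirrors: UPDATE reservations ... WHERE status = 'reserved'
        res = self.reservations[rid]
        if res["status"] != "reserved":
            raise ReservationConflict()
        refund = max(res["estimated"] - actual, 0)  # GREATEST(estimated - actual, 0)
        res.update(actual=actual, status="finalized")
        self.balance += refund
        self.ledger.append(-actual)
        return refund
```

With the numbers from the diagram, `Wallet(balance=50)` drops to 40 after `reserve(10)`, and `finalize(rid, 8)` refunds 2, leaving 42 with a single ledger entry of -8. A second `finalize` on the same reservation raises `ReservationConflict`, which corresponds to the "already finalized or expired" rollback branch.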
Expiry Job (Background)
Runs every 60 seconds:
-- Find expired, un-finalized reservations
-- (id is returned so the ledger insert below can reference the reservation)
UPDATE reservations
SET status = 'expired'
WHERE status = 'reserved'
  AND expires_at < now()
RETURNING id, user_id, estimated;

-- For each expired reservation, refund the wallet
UPDATE wallet
SET token_balance = token_balance + :estimated
WHERE user_id = :uid;

INSERT INTO wallet_ledger (user_id, delta, reason, reservation_id)
VALUES (:uid, :estimated, 'reservation_expired', :reservation_id);
Reconciliation (Nightly)
-- Compare ledger sum vs wallet balance
SELECT
w.user_id,
w.token_balance AS current_balance,
COALESCE(SUM(wl.delta), 0) AS ledger_sum,
w.token_balance - COALESCE(SUM(wl.delta), 0) AS discrepancy
FROM wallet w
LEFT JOIN wallet_ledger wl ON w.user_id = wl.user_id
GROUP BY w.user_id, w.token_balance
HAVING w.token_balance != COALESCE(SUM(wl.delta), 0);
Flag any non-zero discrepancy (positive or negative) as an alert. Do not auto-correct; require manual investigation.
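The comparison performed by the reconciliation query can be expressed in a few lines of Python, which is handy for unit-testing the alerting logic without a database. This is a sketch only; `find_discrepancies` and its input shapes are hypothetical and not taken from scripts/reconcile_wallets.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def find_discrepancies(
    balances: Dict[str, int],
    ledger_rows: Iterable[Tuple[str, int]],
) -> Dict[str, int]:
    """Return {user_id: balance - ledger_sum} for every user where they differ."""
    ledger_sum: Dict[str, int] = defaultdict(int)
    for user_id, delta in ledger_rows:
        ledger_sum[user_id] += delta
    # Users with no ledger rows get a sum of 0, like COALESCE(SUM(delta), 0).
    return {
        uid: bal - ledger_sum[uid]
        for uid, bal in balances.items()
        if bal != ledger_sum[uid]
    }
```

A discrepancy can be negative (the wallet holds less than the ledger implies) as well as positive, so the alert condition should test for non-zero rather than greater than zero.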
11. Scraper & Canonicalization
Pipeline (Fully Automated)
flowchart TD
SC[Scraper Fetches Sitemap] --> DL[Download PDF]
DL --> NORM[Canonicalize Text]
NORM --> FP[Compute SimHash Fingerprint]
FP --> DD{Hamming Distance ≤ 3<br/>from existing?}
DD -->|Yes| DUP[Mark as duplicate<br/>link canonical_id]
DD -->|No| QC{Quality Check}
QC -->|Pass| STORE[Insert into references<br/>status: discovered]
QC -->|Fail| LOG[Log to ingestion_audit<br/>reason: quality_failed]
DUP --> DONE[Done]
STORE --> DONE
LOG --> DONE
Canonicalization Steps
- Whitespace normalization: Collapse multiple spaces, tabs, newlines to single space. Trim leading/trailing.
- Arabic script normalization:
- Unify alef variants: أ إ آ ا → ا
- Remove tatweel (kashida): ـ → (empty)
- Normalize taa marbuta: ة → ه (context-dependent, configurable)
- Normalize hamza: ؤ ئ → و ي + hamza (optional, configurable)
- Boilerplate removal: Per-source regex patterns (configurable in `scraper_config.json`):
- Remove page headers/footers matching known patterns (e.g., "Page X of Y", site watermarks).
- Non-content page filtering: Skip pages with < 50 characters after normalization.
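The whitespace and Arabic-script steps above can be sketched as follows. This is a minimal illustration and not the contents of app/services/text_normalizer.py: the function names and the `normalize_taa_marbuta` flag are hypothetical, and the hamza rule and per-source boilerplate patterns are left out because their exact configuration is not specified here.

```python
import re

# Alef with hamza above (U+0623), hamza below (U+0625), madda (U+0622) -> bare alef (U+0627)
_ALEF_VARIANTS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})
_TATWEEL = "\u0640"
_TAA_MARBUTA = "\u0629"
_HAA = "\u0647"


def canonicalize(text: str, normalize_taa_marbuta: bool = False) -> str:
    """Apply alef unification, tatweel removal, and whitespace collapsing."""
    text = text.translate(_ALEF_VARIANTS)
    text = text.replace(_TATWEEL, "")
    if normalize_taa_marbuta:
        # Optional because the mapping is context-dependent.
        text = text.replace(_TAA_MARBUTA, _HAA)
    # Collapse runs of spaces, tabs, and newlines; trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def is_content_page(page_text: str, min_chars: int = 50) -> bool:
    """Skip pages with fewer than min_chars characters after normalization."""
    return len(canonicalize(page_text)) >= min_chars
```

Running normalization before fingerprinting means two PDFs that differ only in kashida stretching or alef spelling produce the same canonical text.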
Deduplication
- Algorithm: SimHash (64-bit) on the normalized full text of the PDF.
- Threshold: Hamming distance ≤ 3 → considered duplicate.
- Storage: `references.content_fingerprint` stores the SimHash value.
- Canonical reference: `references.canonical_id` (self-FK) points to the first-discovered version.
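A 64-bit SimHash with the Hamming-distance threshold described above might look like the sketch below. Several choices here are assumptions rather than details from app/services/deduplication.py: features are lowercase word unigrams with equal weight, and each token is hashed with BLAKE2b truncated to 8 bytes.

```python
import hashlib
import re


def simhash64(text: str) -> int:
    """Compute a 64-bit SimHash over word tokens of already-normalized text."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            # Each token votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_duplicate(fp_a: int, fp_b: int, threshold: int = 3) -> bool:
    """Two documents count as duplicates when the distance is <= threshold."""
    return hamming_distance(fp_a, fp_b) <= threshold
```

Comparing a new fingerprint against every stored one is linear in the number of references; that is fine for a sketch, while a larger corpus would typically index fingerprints by bit blocks to narrow the candidates.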
Every references row contains:
| Field | Purpose |
|---|---|
| `source_url` | Canonical URL (after redirect resolution) |
| `discovered_at` | First time scraper found this PDF |
| `last_checked_at` | Last time scraper verified URL is live |
| `content_fingerprint` | SimHash for deduplication |
| `scrape_run_id` | Which scrape run discovered it |
| `canonical_id` | Points to canonical (non-duplicate) reference |
Content Quality Heuristics
| Check | Threshold | Action |
|---|---|---|
| Minimum text length (per page) | ≥ 200 chars after normalization | Skip page, log reason |
| OCR confidence (Arabic/Hassaniya) | ≥ 0.70 | Flag for review if below |
| OCR confidence (French) | ≥ 0.80 | Flag for review if below |
| Encoding | Valid UTF-8 | Reject and log |
| File size | ≤ 100 MB | Reject oversized files |
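The heuristics in the table can be combined into one check, sketched below. This is illustrative only and not the contents of app/services/quality_checker.py: the function and field names, the reason strings, and the language codes "ar" and "fr" are hypothetical, and 100 MB is interpreted here as 100 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_FILE_BYTES = 100 * 1024 * 1024  # assumption: binary megabytes
MIN_PAGE_CHARS = 200
OCR_THRESHOLDS = {"ar": 0.70, "fr": 0.80}  # "ar" covers Arabic/Hassaniya


@dataclass
class QualityResult:
    accepted: bool
    flagged_for_review: bool = False
    reasons: List[str] = field(default_factory=list)


def check_quality(
    file_size_bytes: int,
    text_bytes: bytes,
    language: str,
    ocr_confidence: Optional[float] = None,
) -> QualityResult:
    """Reject on size or encoding; flag (but accept) on low OCR confidence."""
    if file_size_bytes > MAX_FILE_BYTES:
        return QualityResult(False, reasons=["file_too_large"])
    try:
        text_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return QualityResult(False, reasons=["invalid_utf8"])
    threshold = OCR_THRESHOLDS.get(language)
    if ocr_confidence is not None and threshold is not None and ocr_confidence < threshold:
        return QualityResult(True, flagged_for_review=True, reasons=["low_ocr_confidence"])
    return QualityResult(True)


def keep_page(normalized_page_text: str) -> bool:
    """Pages shorter than the minimum are skipped (and the reason logged by the caller)."""
    return len(normalized_page_text) >= MIN_PAGE_CHARS
```

Hard failures (size, encoding) short-circuit, while a low OCR score only sets the review flag, matching the "Reject" versus "Flag for review" actions in the table.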
12. Observability & Resiliency
Metrics to Emit
| Metric | Type | Labels | Purpose |
|---|---|---|---|
| `ingestion_job_duration_seconds` | Histogram | status, language | Track ingestion performance |
| `ingestion_job_status_total` | Counter | status | Track job outcomes |
| `pinecone_query_duration_seconds` | Histogram | namespace | Vector search latency |
| `pinecone_upsert_duration_seconds` | Histogram | namespace | Upsert latency |
| `openai_request_duration_seconds` | Histogram | model, endpoint | LLM call latency |
| `openai_tokens_used_total` | Counter | model, type (input/output) | Cost tracking |
| `wallet_reservation_total` | Counter | status (reserved/finalized/expired) | Billing flow health |
| `wallet_balance_discrepancy` | Gauge | — | Reconciliation drift |
| `circuit_breaker_state` | Gauge | service (openai/pinecone) | 0=closed, 1=open, 2=half-open |
| `http_request_duration_seconds` | Histogram | method, path, status | API latency |
| `rerank_cache_hit_ratio` | Gauge | — | Cache effectiveness |
| `active_reservations` | Gauge | — | Currently reserved, un-finalized |
| `rate_limit_rejected_total` | Counter | scope (user/ip), path | Rate-limit enforcement activity |
| `request_id_propagation` | — | — | All log lines, wallet rows, usage_logs, and error responses include request_id |
Circuit Breaker Configuration
| Service | Failure threshold | Window | Recovery timeout | Fallback |
|---|---|---|---|---|
| OpenAI Embeddings | 3 failures | 60 s | 120 s | Queue job for later retry |
| OpenAI gpt-4o-mini (rerank) | 3 failures | 60 s | 120 s | Skip reranking; use dense order |
| OpenAI gpt-4o (reasoning) | 3 failures | 60 s | 120 s | Return 503 to client |
| Pinecone (query) | 3 failures | 60 s | 120 s | Return 503 to client |
| Pinecone (upsert) | 3 failures | 60 s | 120 s | Queue for retry |
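The threshold, window, and recovery timeout in the table map onto a small state machine. The sketch below is one possible shape and not the contents of app/services/circuit_breaker.py; the class and its method names are hypothetical, and the clock is injectable so the timing can be tested without sleeping. State values follow the circuit_breaker_state gauge (0=closed, 1=open, 2=half-open).

```python
import time
from collections import deque
from typing import Callable, Deque, Optional

CLOSED, OPEN, HALF_OPEN = 0, 1, 2


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        window_s: float = 60.0,
        recovery_timeout_s: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.window_s = window_s
        self.recovery_timeout_s = recovery_timeout_s
        self._clock = clock
        self._failures: Deque[float] = deque()
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> int:
        if self._opened_at is None:
            return CLOSED
        if self._clock() - self._opened_at >= self.recovery_timeout_s:
            return HALF_OPEN  # one trial request may go through
        return OPEN

    def allow_request(self) -> bool:
        return self.state != OPEN

    def record_success(self) -> None:
        self._failures.clear()
        self._opened_at = None

    def record_failure(self) -> None:
        now = self._clock()
        if self.state == HALF_OPEN:
            self._opened_at = now  # trial request failed: reopen
            return
        self._failures.append(now)
        # Only failures inside the rolling window count toward the threshold.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self._opened_at = now
```

A caller wraps each outbound call: if `allow_request()` is false it applies the fallback from the table (for example, skipping rerank and using dense order); otherwise it makes the call and reports the outcome with `record_success()` or `record_failure()`.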
Alert Thresholds
| Alert | Condition | Severity |
|---|---|---|
| High ingestion failure rate | > 20% of jobs in failed state in last hour | Critical |
| Wallet discrepancy detected | Any non-zero discrepancy in reconciliation | Warning |
| Circuit breaker opened | Any circuit breaker transitions to open | Critical |
| High reservation expiry rate | > 10% of reservations expiring (not finalized) in last hour | Warning |
| OpenAI latency spike | p99 > 30 seconds for any model | Warning |
| Pinecone latency spike | p99 > 5 seconds | Warning |
| Stale reservations | > 50 reservations in reserved status older than 5 min | Warning |
| Single user rate-limited repeatedly | Same user_id rate-limited > 20 times in 5 min | Warning (potential abuse) |
Reindex & Disaster Recovery
Reindex Strategy
- Export all canonical chunks from Postgres `chunks` table.
- Re-embed using the new model.
- Upsert to a new Pinecone namespace (e.g., `grade-12-math-v2`).
- Swap the active namespace in config once verification passes.
- Delete the old namespace.
Disaster Recovery
- Blob store: Raw PDFs archived in S3/GCS. Can re-ingest from scratch.
- Postgres: Supabase provides automatic daily backups + point-in-time recovery.
- Pinecone: If Pinecone data is lost, re-embed from Postgres `chunks` table (canonical chunks are the source of truth).
- Export schedule: Weekly export of `chunks` table to blob store as Parquet/CSV for offline recovery.
13. Security
Secrets Management
| Secret | Current | Target |
|---|---|---|
| `OPENAI_API_KEY` | .env file | Cloud Secret Manager (GCP/AWS/Azure) or Vault |
| `PINECONE_API_KEY` | .env file | Cloud Secret Manager |
| `SUPABASE_SERVICE_KEY` | .env file | Cloud Secret Manager |
| `ADMIN_API_KEY` | .env file | Remove entirely (replace with JWT admin role) |
Rules:
- In production, secrets MUST NOT be stored in plain environment variables or .env files.
- Use the deployment platform's secret injection (e.g., Render's Environment Groups, GCP Secret Manager, AWS Secrets Manager).
- Rotate keys quarterly. Automate rotation where possible.
- SUPABASE_SERVICE_KEY should only be available to the backend service, never to the frontend.
PII Redaction
- Before sending any user data to OpenAI (chat, rerank, quiz generation):
- Strip email addresses (regex: `\S+@\S+\.\S+`).
- Strip phone numbers (regex: Mauritanian format `+222...`).
- Do NOT send `user_id` or wallet balance to OpenAI.
- Log redacted versions in `usage_logs`.
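A redaction pass along these lines could be applied to any text before it reaches OpenAI. This is a sketch under stated assumptions: the email pattern is the one given above, but the phone pattern (country code +222 followed by eight digits, optionally separated by spaces or hyphens) is an assumed reading of the "Mauritanian format", and the `[EMAIL]` / `[PHONE]` placeholders and the function name are hypothetical.

```python
import re

EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
# Assumption: +222 followed by 8 digits, with optional space/hyphen separators.
PHONE_RE = re.compile(r"\+222(?:[\s-]?\d){8}")


def redact_pii(text: str) -> str:
    """Replace email addresses and +222 phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Note that `\S+@\S+\.\S+` is deliberately broad: it also swallows punctuation attached to an address, which errs on the side of over-redaction.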
Audit Logging
| Event | Table | Fields |
|---|---|---|
| User login | Supabase Auth logs (built-in) | timestamp, user_id, IP |
| Admin action (role change, sync, reindex) | ingestion_audit or dedicated admin_audit | admin_user_id, action, target, timestamp |
| Wallet mutation | wallet_ledger | user_id, delta, reason, request_id, reservation_id |
| Ingestion state change | ingestion_audit | job_id, from_status, to_status, message |
| RLS policy violation | Postgres logs | query, user, table, policy |
14. Testing Matrix
RLS Tests
| # | Test Case | Table | User | Operation | Expected |
|---|---|---|---|---|---|
| T1 | Student reads own profile | profiles | authenticated (student) | SELECT WHERE user_id = self | Allowed |
| T2 | Student reads other profile | profiles | authenticated (student) | SELECT WHERE user_id = other | Denied (0 rows) |
| T3 | Student reads own wallet | wallet | authenticated (student) | SELECT WHERE user_id = self | Allowed |
| T4 | Student updates own wallet | wallet | authenticated (student) | UPDATE | Denied |
| T5 | Anonymous reads profiles | profiles | anon | SELECT | Denied (0 rows) |
| T6 | Service role reads all | wallet | service_role | SELECT | Allowed (all rows) |
| T7 | Admin reads references | references | authenticated (admin) | SELECT | Allowed |
| T8 | Student reads references | references | authenticated (student) | SELECT | Denied (0 rows) |
| T9 | Student reads ingestion_jobs | ingestion_jobs | authenticated (student) | SELECT | Denied (0 rows) |
| T10 | Student reads own reservations | reservations | authenticated (student) | SELECT WHERE user_id = self | Allowed |
| T11 | Student reads other reservations | reservations | authenticated (student) | SELECT WHERE user_id = other | Denied (0 rows) |
Ingestion Idempotency Tests
| # | Test Case | Expected |
|---|---|---|
| T12 | Ingest same file twice | Second run produces same chunk_ids; Pinecone vector count unchanged; no duplicate chunks in Postgres |
| T13 | Ingest file, modify content, re-ingest | New chunk_ids generated; old vectors replaced; old chunks marked stale |
| T14 | Ingestion fails mid-embedding | Job status = failed; partial chunks cleaned up; re-trigger starts fresh |
| T15 | Concurrent ingestion of same reference | Second job returns 409 conflict |
Reservation / Billing Tests
| # | Test Case | Expected |
|---|---|---|
| T16 | Reserve with sufficient balance | Balance decremented; reservation created with status reserved |
| T17 | Reserve with insufficient balance | 402 error; balance unchanged; no reservation created |
| T18 | Finalize with actual < estimated | Delta refunded to wallet; ledger entry = -actual |
| T19 | Finalize with actual > estimated (capped) | Additional deduction from wallet (capped at 2× estimate); ledger entry = -actual |
| T20 | Reservation expires (not finalized) | Expiry job refunds estimated amount; reservation status = expired |
| T21 | Double-finalize same reservation | Second call returns 409; no double-deduction |
| T22 | Reconciliation detects discrepancy | Alert fired; no auto-correction |
Rerank & Caching Tests
| # | Test Case | Expected |
|---|---|---|
| T23 | Same query within TTL | Second call hits cache; no GPT-mini call |
| T24 | Same query after TTL expires | Cache miss; GPT-mini called; new cache entry |
| T25 | Re-ingestion invalidates cache | After re-ingestion of file in namespace, cached reranks for that namespace are evicted |
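The three behaviours under test (hit within TTL, miss after TTL, eviction by namespace) can be captured by a small in-memory cache. This is a sketch for illustrating T23 to T25, not the contents of app/services/cache.py: the class name, the `(namespace, query)` key shape, and the TTL passed by the caller are all assumptions.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class RerankCache:
    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}

    def get(self, namespace: str, query: str) -> Optional[List[str]]:
        key = (namespace, query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, ranked_ids = entry
        if self._clock() - stored_at >= self.ttl_s:
            del self._entries[key]  # expired: treat as a miss
            return None
        return ranked_ids

    def put(self, namespace: str, query: str, ranked_ids: List[str]) -> None:
        self._entries[(namespace, query)] = (self._clock(), ranked_ids)

    def invalidate_namespace(self, namespace: str) -> int:
        """Evict every cached rerank for a namespace; returns the number removed."""
        stale = [key for key in self._entries if key[0] == namespace]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

Keying on the namespace makes the re-ingestion case cheap to express: the ingestion path calls `invalidate_namespace` once its upsert completes, and other namespaces keep their cached results.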
Circuit Breaker Tests
| # | Test Case | Expected |
|---|---|---|
| T26 | OpenAI returns 500 three times | Circuit opens; rerank calls skip to dense order; circuit moves to half-open after the 120 s recovery timeout |
| T27 | Pinecone times out during ingestion | Job retries (up to 3); if all fail, circuit opens; ingestion jobs queued |
Rate Limiting & Request ID Tests
| # | Test Case | Expected |
|---|---|---|
| T28 | Student sends 11 /ask requests in 1 minute | First 10 succeed; 11th returns 429 with retry_after and request_id |
| T29 | Unauthenticated IP sends 6 /auth/login in 1 minute | First 5 succeed; 6th returns 429 |
| T30 | Admin sends 61 /admin/* requests in 1 minute | First 60 succeed; 61st returns 429 |
| T31 | Every successful /ask response includes request_id | request_id present in SSE done event and in JSON response |
| T32 | Every error response includes request_id | 400, 401, 402, 403, 429, 503 responses all contain request_id field |
| T33 | request_id propagates to reservations and wallet_ledger | After a chat, reservations.request_id and wallet_ledger.request_id match the API response's request_id |
| T34 | Client-provided X-Request-ID is adopted | If client sends X-Request-ID: custom-uuid, the response and logs use that same ID |
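The limits exercised by T28 to T30 (10, 5, and 60 requests per minute) can be reproduced in tests with a sliding-window counter. The sketch below is an assumption about the algorithm, since the middleware in app/core/middleware.py may use a different one; the class name and the `(allowed, retry_after)` return shape are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    def __init__(
        self,
        limit: int,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for a user_id or IP key."""
        now = self._clock()
        hits = self._hits[key]
        # Drop requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest request in the window determines when a slot frees up.
            return False, self.window_s - (now - hits[0])
        hits.append(now)
        return True, 0.0
```

For T28, a limiter created with `limit=10` accepts ten calls for the same key and rejects the eleventh with a `retry_after` value that the API layer can place in the 429 response.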
Integration / E2E Tests
| # | Test Case | Expected |
|---|---|---|
| T35 | Full chat flow (reserve → retrieve → rerank → answer → finalize) | Correct answer returned; wallet balance = original - actual; ledger entry exists; all rows share same request_id |
| T36 | Arabic query against French corpus | Translation occurs; relevant French chunks retrieved; answer in Arabic |
| T37 | Admin triggers scrape → ingest → search | New references discovered; ingestion completes; semantic search returns results from new content |
15. Sonnet Task List (Implementation Status)
Priority 1: Correctness & Data Integrity
| # | Title | Implementation Status | Files Created | Testing Status |
|---|---|---|---|---|
| S1 | Deterministic chunk IDs | ✅ COMPLETE | app/services/chunking.py, migration 13 | ⏳ Needs integration testing |
| S2 | Ingestion jobs state machine | ✅ COMPLETE | app/services/ingestion.py, migration 12 | ✅ Wired to admin router |
| S3 | Reservation-based billing | ✅ COMPLETE & TESTED | app/services/wallet_reservation.py, migration 14, scripts/expire_reservations.py | ✅ Working in wallet router |
| S4 | Lightweight Pinecone metadata | ✅ COMPLETE | app/services/pinecone_adapter.py | ⏳ Needs retrieval testing |
| S5 | Embedding refs tracking | ✅ COMPLETE | app/services/embedding_service.py, migration 15 | ⏳ Needs ingestion testing |
| S21 | Presigned upload service | ✅ COMPLETE | app/services/upload.py | ⏳ Needs router implementation |
Priority 2: Security & RLS Hardening
| # | Title | Implementation Status | Files Created | Testing Status |
|---|---|---|---|---|
| S6 | JWT custom claims hook | ✅ COMPLETE | Migration 18, app/core/auth.py updated | ⏳ Hook needs manual registration in Dashboard |
| S7 | Remove x-admin-key support | ⏳ DEPRECATED (warnings added) | app/core/auth.py updated | ✅ Logs deprecation warnings |
| S8 | RLS for new tables | ✅ COMPLETE | Migration 16 | ⏳ Needs database migration run |
| S9 | Secrets management | ⏳ PARTIAL (config updated) | app/core/config.py, .env.example | ⏳ Production deployment needed |
| S9b | Request-ID + rate limiting | ✅ COMPLETE | app/core/middleware.py | ⏳ Needs production testing |
Priority 3: Caching & Cost Control
| # | Title | Implementation Status | Files Created | Testing Status |
|---|---|---|---|---|
| S10 | Rerank result caching | ✅ COMPLETE | app/services/cache.py | ⏳ Needs retrieval pipeline testing |
| S11 | Chunk text cache | ✅ COMPLETE | app/services/cache.py | ⏳ Needs retrieval pipeline testing |
| S12 | Tier-based retrieval limits | ✅ COMPLETE | app/services/tier_config.py | ⏳ Needs retrieval pipeline testing |
Priority 4: Scraper Hardening
| # | Title | Implementation Status | Files Created | Testing Status |
|---|---|---|---|---|
| S13 | SimHash deduplication | ✅ COMPLETE | app/services/deduplication.py, migration 17 | ⏳ Needs scraper integration testing |
| S14 | Arabic canonicalization | ✅ COMPLETE | app/services/text_normalizer.py | ⏳ Needs scraper testing |
| S15 | Quality heuristics | ✅ COMPLETE | app/services/quality_checker.py | ⏳ Needs scraper testing |
Priority 5: Observability & DR
| # | Title | Implementation Status | Files Created | Testing Status |
|---|---|---|---|---|
| S16 | Circuit breaker | ✅ COMPLETE | app/services/circuit_breaker.py | ⏳ Needs failure simulation testing |
| S17 | Structured logging & metrics | ✅ COMPLETE | app/core/logging.py, app/core/metrics.py, app/api/routers/metrics.py | ✅ Metrics endpoints working |
| S18 | Wallet reconciliation job | ✅ COMPLETE | scripts/reconcile_wallets.py | ⏳ Ready to run (needs cron setup) |
| S19 | Reindex & DR export | ✅ COMPLETE | scripts/reindex.py, scripts/export_chunks.py | ⏳ Ready to run (needs cron setup) |
Priority 6: API & Integration
| # | Title | Implementation Status | Files Created | Testing Status |
|---|---|---|---|---|
| S20 | GPT-mini service | ✅ COMPLETE | app/services/gpt_mini.py | ⏳ Needs retrieval pipeline testing |
| S21 | Presigned upload endpoint | ✅ COMPLETE | app/services/upload.py | ⏳ Needs router implementation |
| S22 | Quiz generation | ✅ COMPLETE | app/services/quiz_generator.py, app/api/routers/quiz.py | ⏳ Needs dependency injection + testing |
Integration & Fixes (Post-Sonnet)
| # | Title | Implementation Status | Files Created/Modified | Testing Status |
|---|---|---|---|---|
| I1 | Dependency injection pattern | ✅ COMPLETE & WORKING | app/core/dependencies.py | ✅ Used by wallet, admin routers |
| I2 | Wallet router integration | ✅ COMPLETE & WORKING | app/api/routers/wallet.py (updated) | ✅ Balance, reservations tested |
| I3 | Admin router fixes | ✅ COMPLETE & WORKING | app/api/routers/admin.py (fixed) | ✅ Users, roles tested |
| I4 | Auth fixes | ✅ COMPLETE & WORKING | app/core/auth.py (fixed) | ✅ JWT flow tested |
| I5 | Chat router integration | ⏳ TODO | app/api/routers/chat.py (stub exists) | ⏳ Needs retrieval_pipeline wiring |
| I6 | Quiz router integration | ⏳ TODO | app/api/routers/quiz.py (stub exists) | ⏳ Needs quiz_generator wiring |
| I7 | Ingestion router creation | ⏳ TODO | app/api/routers/ingestion.py (missing) | ⏳ Needs creation |
| I8 | Scraper router integration | ⏳ TODO | app/api/routers/scraper_admin.py (stub exists) | ⏳ Needs scraper_service wiring |
Appendix: Prioritization Rules
The Sonnet task list is ordered by these rules:
- Correctness first (S1–S5): Fix ingestion idempotency, billing atomicity, and data integrity. Without these, the system produces duplicates and loses revenue.
- Security second (S6–S9): Harden auth and RLS. Without these, students can access admin data or bypass billing.
- Cost control third (S10–S12): Add caching and tier enforcement. Without these, the platform overspends on LLM calls.
- Scraper quality fourth (S13–S15): Add dedupe and canonicalization. Without these, the vector index contains duplicates and noise.
- Observability fifth (S16–S19): Add circuit breakers, metrics, and DR. Without these, outages go undetected and recovery is manual.
- New features last (S20–S22): GPT-mini service, file upload, quiz generation. These add value but depend on the foundation above.
16. Dependency Injection Implementation
Overview
A centralized service registry wires services to routers. The wallet and admin routers already import from it; the chat, quiz, ingestion, and scraper routers still need to be connected.
File: app/core/dependencies.py
Pattern: Singleton instances initialized at module level
Benefit: Proper dependency injection, testable, no duplicate instances
Service Initialization Order
# 1. External Clients
openai_client = OpenAI(api_key=settings.OPENAI_API_KEY)
supabase_service = create_client(settings.SUPABASE_URL, settings.SUPABASE_SERVICE_ROLE_KEY)
# 2. Base Adapters
pinecone_adapter = PineconeAdapter(api_key=..., index_name=...)
cache_service = CacheService()
# 3. Core Services
embedding_service = EmbeddingService(openai_client, supabase_service, pinecone_adapter)
gpt_mini_service = GPTMiniService(openai_client)
wallet_service = WalletReservationService(supabase_service)
ingestion_service = IngestionService(supabase_service)
# 4. Pipelines
retrieval_pipeline = RetrievalPipeline(
    openai_client, supabase_service, pinecone_adapter,
    embedding_service, gpt_mini_service, cache_service
)
# 5. Feature Services
quiz_generator = QuizGeneratorService(openai_client, retrieval_pipeline)
scraper_service = ScraperService(supabase_service, text_normalizer, deduplication_service, quality_checker)
Usage in Routers
Working Example (wallet router):
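from uuid import UUID
from fastapi import APIRouter, Depends
from app.core.dependencies import wallet_service, supabase_service
# get_current_user and WalletBalanceResponse come from the project's auth and schema modules (imports omitted here).
router = APIRouter()

@router.get("/balance")
async def get_balance(user: dict = Depends(get_current_user)):
    balance_data = wallet_service.get_balance(UUID(user["id"]))  # shared singleton, not built per request
    return WalletBalanceResponse(**balance_data)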
See: docs/90_ops/dependency_injection.md for full documentation
17. Implementation Status & Next Steps
✅ What's Working (Tested on Personal Laptop)
- Auth & Profile: Signup, signin, JWT, profile management
- Wallet & Billing: Balance, reservations, reserve/finalize pattern
- Admin: User management, role updates
- Metrics: Prometheus and JSON endpoints
⏳ What's Ready But Needs Router Wiring
- Chat with RAG: RetrievalPipeline ready, chat router needs integration (2-3 hours)
- Quiz Generation: QuizGeneratorService ready, quiz router needs wiring (1 hour)
- PDF Ingestion: IngestionService ready, router needs creation (1-2 hours)
- Scraper Sync: ScraperService ready, scraper router needs wiring (1 hour)
- File Upload: UploadService ready, endpoint needs creation (30 min)
Estimated Remaining Work: roughly 5.5-7.5 hours of router integration (sum of the per-item estimates above)
🔧 Fixes Applied (Post-Sonnet Implementation)
| Fix | File | Issue | Resolution | Status |
|---|---|---|---|---|
| Dependency injection | app/core/dependencies.py | Services not wired to routers | Created singleton registry | ✅ Working |
| Wallet router | app/api/routers/wallet.py | Endpoints were stubs | Implemented actual logic | ✅ Working |
| Admin router | app/api/routers/admin.py | Not using dependencies | Imported singletons | ✅ Working |
| Auth service | app/core/auth.py | Client pattern mismatch | Fixed service client usage | ✅ Working |
| Type hints | Multiple services | Missing imports | Added List, Dict, Any imports | ✅ Fixed |
📖 Documentation Organization
All documentation has been moved into the MkDocs structure:
- docs/00_overview/ - High-level guides (architecture, start_here)
- docs/20_runbooks/ - Operational guides (quick_start, deployment)
- docs/30_design/ - Design docs (plan, RLS, auth, etc.)
- docs/90_ops/ - Implementation guides (status, dependency injection, phase guides)
- docs/Artifacts/ - Phase completion guides, checklists
- docs/Postman/ - Postman testing guides
See: MkDocs navigation for searchable documentation
18. Testing Strategy (Updated)
Unit Tests (Ready)
- `tests/unit/test_chunking.py` - Deterministic chunk ID tests
- Additional unit tests can be created for each service
Integration Tests (Needs Router Completion)
- Chat flow: reserve → retrieve → answer → finalize
- Ingestion flow: upload → parse → chunk → embed → upsert
- Scraper flow: sync → dedupe → quality check → insert
Postman Collection (Ready)
- `postman/collection_v2.json` - 40+ endpoints
- 10 testing workflows documented
- Auto-capture of JWT, request-ID, job-ID, etc.
Background Jobs (Ready to Deploy)
- Reservation expiry: Continuous systemd service
- Wallet reconciliation: Daily cron at 2 AM
- DR export: Weekly cron on Sunday 3 AM
This document reflects the actual implementation state as of 2026-02-17 after Sonnet implementation and integration fixes.
For next steps, see docs/90_ops/implementation_status.md