Backend Gap-Filling Architecture
Date: 2026-02-18
Purpose: Fill the 12 missing endpoints needed to support the BacMR-UI frontend.
1. Documents Table Enrichment
Problem
The documents table is lean: it lacks title, major, weight, and category. Students need this metadata when browsing available content, but it currently exists only in references.
Solution
Add optional columns to documents. They are populated automatically when ingesting from a reference and left NULL for manual uploads; an admin can update them later.
Migration: 20260218000021_documents_enrichment.sql
ALTER TABLE documents
ADD COLUMN IF NOT EXISTS title TEXT,
ADD COLUMN IF NOT EXISTS major TEXT,
ADD COLUMN IF NOT EXISTS weight INT DEFAULT 0,
ADD COLUMN IF NOT EXISTS category TEXT,
ADD COLUMN IF NOT EXISTS reference_id UUID REFERENCES "references"(id), -- table name quoted: REFERENCES is a reserved word in PostgreSQL
ADD COLUMN IF NOT EXISTS status TEXT DEFAULT 'ready',
ADD COLUMN IF NOT EXISTS pinecone_index TEXT,
ADD COLUMN IF NOT EXISTS pinecone_namespace TEXT,
ADD COLUMN IF NOT EXISTS pinecone_vector_count INT;
CREATE INDEX IF NOT EXISTS idx_documents_status ON documents(status);
CREATE INDEX IF NOT EXISTS idx_documents_grade_subject ON documents(grade, subject);
CREATE INDEX IF NOT EXISTS idx_documents_reference ON documents(reference_id) WHERE reference_id IS NOT NULL;
Data flow
- Ingest from reference: Copy title, major, weight, category from the reference row. Set reference_id to link them.
- Manual upload: These fields are NULL. Admin can update via PATCH /admin/documents/{id}.
- Student queries: Query documents WHERE status = 'ready'; no joins needed.
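The copy-or-NULL rule above can be sketched as a small helper. This is a minimal illustration rather than the project's ingestion code: the name build_document_fields and the dict-shaped reference row are assumptions.

```python
from typing import Any, Optional

# Metadata columns copied from a reference row (names from the migration above).
ENRICHMENT_FIELDS = ("title", "major", "weight", "category")


def build_document_fields(reference: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return the enrichment values for a new documents row.

    Hypothetical helper: with a reference, copy its metadata and link it;
    for a manual upload (reference is None), every field stays None (NULL).
    """
    if reference is None:
        fields: dict[str, Any] = {name: None for name in ENRICHMENT_FIELDS}
        fields["reference_id"] = None
        return fields
    fields = {name: reference.get(name) for name in ENRICHMENT_FIELDS}
    fields["reference_id"] = reference["id"]
    return fields
```

Keeping the rule in one function means the ingest path and any later backfill would produce the same columns.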
2. Student Content Browsing
New router: app/api/routers/curriculum.py
Prefix: /curriculum
Auth: None (public endpoints)
Endpoints
GET /curriculum/subjects
Returns distinct subjects from ingested documents.
Query: SELECT DISTINCT subject FROM documents WHERE status = 'ready' AND subject IS NOT NULL
Response: a SubjectsResponse (see Schemas below) carrying the subjects list and its total.
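As a rough sketch of this endpoint's data path, the snippet below runs the query against an in-memory SQLite table and shapes the result like SubjectsResponse (subjects, total). SQLite stands in for the real database, the sample rows are invented, and the ORDER BY is an added assumption so the output is deterministic.

```python
import sqlite3


def list_subjects(conn: sqlite3.Connection) -> dict:
    """Run the DISTINCT query above and shape it like SubjectsResponse."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM documents "
        "WHERE status = 'ready' AND subject IS NOT NULL "
        "ORDER BY subject"  # assumption: stable ordering for clients
    ).fetchall()
    subjects = [row[0] for row in rows]
    return {"subjects": subjects, "total": len(subjects)}


# In-memory stand-in for the documents table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT, subject TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("d1", "Mathematiques", "ready"),
        ("d2", "Mathematiques", "ready"),   # duplicate subject collapses
        ("d3", "Physique", "processing"),   # not ready, so excluded
        ("d4", None, "ready"),              # NULL subject, so excluded
    ],
)
print(list_subjects(conn))
```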
GET /curriculum/textbooks
Returns available textbooks grouped by education level (grade).
Query params: ?grade=...&subject=...&language=... (all optional filters)
Query: SELECT * FROM documents WHERE status = 'ready' (plus any filters supplied). Levels are derived by grouping the results client-side or by selecting the distinct grade values.
Response:
{
"levels": ["elementary-1", "secondary-3", "high-school-7"],
"textbooks": [
{
"id": "uuid",
"title": "Manuel de Mathematiques 7C",
"grade": "high-school-7",
"subject": "Mathematiques",
"major": "C",
"language": "French",
"weight": 9,
"page_count": 120,
"chunk_count": 85,
"namespace": "grade-high-school-7-Mathematiques"
}
]
}
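The optional filters and the levels list could be handled along these lines. This is a sketch over plain dicts: list_textbooks is a hypothetical name, rows are returned unprojected, and deriving levels from the filtered set (rather than from all ready documents) is an assumption the design does not settle.

```python
from typing import Any, Optional


def list_textbooks(
    documents: list[dict[str, Any]],
    grade: Optional[str] = None,
    subject: Optional[str] = None,
    language: Optional[str] = None,
) -> dict[str, Any]:
    """Filter ready documents and derive the distinct levels (grades)."""
    filters = {"grade": grade, "subject": subject, "language": language}
    textbooks = [
        doc
        for doc in documents
        if doc.get("status") == "ready"
        # A filter left as None matches everything.
        and all(want is None or doc.get(key) == want for key, want in filters.items())
    ]
    levels = sorted({doc["grade"] for doc in textbooks if doc.get("grade")})
    return {"levels": levels, "textbooks": textbooks}
```

In the real router the same filtering would more likely be pushed into the SQL WHERE clause; the in-memory version only makes the semantics explicit.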
GET /curriculum/textbooks/{id}
Returns details for a single textbook including its page range (derived from chunks).
Query: SELECT * FROM documents WHERE id = {id} + SELECT MIN(page_number), MAX(page_number) FROM chunks WHERE file_id = {id}
Response:
{
"document": { "id": "...", "title": "...", "grade": "...", "..." : "..." },
"page_range": { "min": 1, "max": 120 }
}
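The page-range lookup is sketched below with SQLite as a stand-in database and invented sample rows. Two details are worth showing: the id is bound as a parameter rather than interpolated into the SQL, and an aggregate over zero rows still returns one row of NULLs, which the hypothetical get_page_range helper maps to None.

```python
import sqlite3
from typing import Optional


def get_page_range(conn: sqlite3.Connection, document_id: str) -> Optional[dict]:
    """Return {"min": ..., "max": ...} for a document, or None if it has no chunks."""
    low, high = conn.execute(
        "SELECT MIN(page_number), MAX(page_number) FROM chunks WHERE file_id = ?",
        (document_id,),
    ).fetchone()
    if low is None:  # no chunks matched: MIN/MAX both come back as NULL
        return None
    return {"min": low, "max": high}


# In-memory stand-in for the chunks table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (file_id TEXT, page_number INTEGER)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [("doc-1", 1), ("doc-1", 57), ("doc-1", 120), ("doc-2", 3)],
)
```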
Schemas: app/schemas/curriculum.py
- SubjectsResponse(subjects: List[str], total: int)
- TextbookOut(id, title, grade, subject, major, language, weight, page_count, chunk_count, namespace)
- TextbookListResponse(levels: List[str], textbooks: List[TextbookOut])
- TextbookDetailResponse(document: TextbookOut, page_range: dict)
3. Ingestion Job Management
Flow change
The current POST /admin/ingest/{reference_id} runs synchronously. This changes to a job-based model:
POST /admin/ingest/{reference_id}
→ Creates ingestion_job (status: "queued")
→ Creates document record (status: "processing")
→ Returns { job_id, status: "queued" }
POST /admin/jobs/dispatch
→ Picks oldest "queued" job
→ Starts processing as a FastAPI BackgroundTask
→ Pipeline: download PDF → extract pages → chunk → embed → upsert Pinecone
→ On success: job status → "ready", document status → "ready"
→ On failure: job status → "failed", document status → "failed"
→ Returns { job_id, status: "parsing" }
GET /admin/jobs?status=...&limit=...&offset=...
→ List jobs with filters
GET /admin/jobs/{id}
→ Single job details with audit trail
POST /admin/jobs/{id}/requeue
→ Reset "failed" job to "queued", increment retry_count
→ Returns updated job
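The queue transitions above (create, dispatch oldest, requeue failed) can be illustrated with an in-memory model. This is a sketch under stated assumptions: JobQueue and Job are hypothetical stand-ins for the ingestion_jobs table, and a monotonically increasing id substitutes for ordering by creation time.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_sequence = itertools.count(1)


@dataclass
class Job:
    reference_id: str
    status: str = "queued"
    retry_count: int = 0
    id: int = field(default_factory=lambda: next(_sequence))


class JobQueue:
    """In-memory stand-in for the ingestion_jobs table."""

    def __init__(self) -> None:
        self.jobs: dict[int, Job] = {}

    def enqueue(self, reference_id: str) -> Job:
        # POST /admin/ingest/{reference_id}: record the job, do no work yet.
        job = Job(reference_id)
        self.jobs[job.id] = job
        return job

    def dispatch(self) -> Optional[Job]:
        # POST /admin/jobs/dispatch: pick the oldest queued job, if any.
        queued = [job for job in self.jobs.values() if job.status == "queued"]
        if not queued:
            return None
        job = min(queued, key=lambda j: j.id)
        job.status = "parsing"
        return job

    def requeue(self, job_id: int) -> Job:
        # POST /admin/jobs/{id}/requeue: only failed jobs may be reset.
        job = self.jobs[job_id]
        if job.status != "failed":
            raise ValueError(f"job {job_id} is {job.status!r}, not 'failed'")
        job.status = "queued"
        job.retry_count += 1
        return job
```

With a real database, dispatch would also need to claim the job atomically so that two concurrent dispatch calls cannot pick the same row.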
Processing pipeline (inside dispatch background task)
- Fetch job + reference from DB
- Download PDF from reference.pdf_source via httpx
- Extract pages via pdf_processor.extract_pages()
- Chunk via pdf_processor.build_chunks()
- Embed via embedding_service.generate_embeddings()
- Upsert to Pinecone via pinecone_adapter.index.upsert()
- Create/update documents row with metadata from reference
- Update job status → "ready"
- Insert audit trail entries at each state transition
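One way to structure the background task is a generic runner that takes the pipeline steps as callables, records an audit entry per transition, and converts any exception into a "failed" job. The sketch below assumes a dict-shaped job and uses step names as intermediate statuses; both are illustrative choices, and the real steps (pdf_processor, embedding_service, pinecone_adapter) would be passed in as the callables.

```python
from typing import Any, Callable

Step = tuple[str, Callable[[Any], Any]]


def run_pipeline(job: dict, steps: list[Step], payload: Any = None) -> dict:
    """Run steps in order, recording an audit entry per state transition.

    Each step receives the previous step's output. Any exception marks the
    job "failed" and stores the message; otherwise the job ends as "ready".
    """
    audit = job.setdefault("audit", [])
    try:
        for name, step in steps:
            job["status"] = name  # hypothetical intermediate status
            audit.append(name)
            payload = step(payload)
    except Exception as exc:  # broad on purpose: any step failure fails the job
        job["status"] = "failed"
        job["error_message"] = str(exc)
        audit.append("failed")
        return job
    job["status"] = "ready"
    audit.append("ready")
    return job
```

Injecting the steps keeps the runner testable without network access: a test can pass lambdas in place of the download, embedding, and upsert calls.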
Endpoints added to: app/api/routers/admin.py
The existing POST /admin/ingest/{reference_id} is modified from synchronous to job-creating. Four new endpoints are added for job management.
Schemas: app/schemas/ingestion.py (extend existing)
- IngestionJobOut(id, reference_id, file_id, status, chunks_created, vectors_upserted, retry_count, error_message, created_at, updated_at)
- IngestionJobListResponse(items: List[IngestionJobOut], total: int)
4. Admin User CRUD
New endpoints added to: app/api/routers/admin.py
All use Supabase Auth admin API via supabase_service.
POST /admin/users
Create a new user account.
Request:
{
"email": "student@example.com",
"password": "securepassword",
"full_name": "Ahmed Mohamed",
"role": "student"
}
Implementation:
supabase_service.auth.admin.create_user({
"email": email,
"password": password,
"email_confirm": True,
"user_metadata": {"full_name": full_name, "role": role}
})
The matching profiles row is expected to be created by the handle_new_user trigger.
DELETE /admin/users/{user_id}
Delete a user account.
Implementation:
1. Delete from profiles table
2. Delete from Supabase Auth: supabase_service.auth.admin.delete_user(user_id)
POST /admin/users/{user_id}/reset-password
Reset a user's password.
Request: { "new_password": "..." }
Implementation:
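The implementation is not spelled out here. One plausible approach, assuming supabase_service is a supabase-py client created with the service-role key, is the Auth admin call update_user_by_id. The wrapper name reset_user_password and the minimum-length check are hypothetical and not part of this design.

```python
from typing import Any

MIN_PASSWORD_LENGTH = 8  # hypothetical policy, not taken from the design


def reset_user_password(supabase_service: Any, user_id: str, new_password: str) -> None:
    """Set a new password through the Supabase Auth admin API.

    Assumes `supabase_service` exposes auth.admin.update_user_by_id, as
    supabase-py clients do; the client is passed in so it can be faked in tests.
    """
    if len(new_password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"new_password must be at least {MIN_PASSWORD_LENGTH} characters")
    supabase_service.auth.admin.update_user_by_id(user_id, {"password": new_password})
```

Validating before the remote call keeps obviously bad requests from reaching Supabase; the endpoint would translate the ValueError into a 4xx response.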
Schemas: app/schemas/admin.py (extend existing)
- UserCreateRequest(email, password, full_name, role)
- UserResetPasswordRequest(new_password)
5. Admin Stats
Endpoint added to: app/api/routers/admin.py
GET /admin/stats
Returns dashboard statistics.
Response:
{
"documents": { "total": 45, "ready": 40, "processing": 3, "failed": 2 },
"chunks": { "total": 3200 },
"references": { "total": 120, "discovered": 75, "ready": 40, "failed": 5 },
"users": { "total": 150, "admins": 2, "students": 148 },
"jobs": { "queued": 2, "running": 1, "completed": 42, "failed": 3 },
"wallet": { "total_tokens_in_circulation": 75000 }
}
Implementation: Multiple count queries against documents, chunks, references, profiles, ingestion_jobs, wallet.
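For the tables that carry a status column, the per-status counts can come from a single GROUP BY per table instead of one COUNT query per status. The sketch below uses SQLite as a stand-in with invented rows; count_by_status and the table allow-list are assumptions. Table names cannot be bound as query parameters, hence the allow-list, and the name is quoted because references is an SQL keyword.

```python
import sqlite3

# Tables assumed to have a status column (hypothetical allow-list).
STATUS_TABLES = {"documents", "references", "ingestion_jobs"}


def count_by_status(conn: sqlite3.Connection, table: str) -> dict:
    """Return {"total": n, "<status>": n, ...} for one table."""
    if table not in STATUS_TABLES:
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f'SELECT status, COUNT(*) FROM "{table}" GROUP BY status'
    ).fetchall()
    counts = {status: n for status, n in rows}
    return {"total": sum(counts.values()), **counts}


# In-memory stand-in for the documents table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("d1", "ready"), ("d2", "ready"), ("d3", "failed")],
)
print(count_by_status(conn, "documents"))
```

Statuses with no rows are simply absent from the result, so the endpoint would need to fill in zeros for the keys the dashboard expects.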
6. Document Delete
Endpoint added to: app/api/routers/admin.py
DELETE /admin/documents/{id}
Delete a document and all associated data.
Implementation:
1. Fetch document (get pinecone_namespace, id)
2. Delete Pinecone vectors: pinecone_adapter.delete_by_file_id(document.id, namespace)
3. Delete chunks: DELETE FROM chunks WHERE file_id = document.id (CASCADE should handle this)
4. Delete document row
5. If reference_id exists, update reference status back to discovered
Response: { "status": "deleted", "chunks_removed": 85, "vectors_removed": 85 }
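The ordering of the steps matters, so a sketch may help. Here db and vectors are hypothetical adapters: only delete_by_file_id is named in this design, its return value (a count of removed vectors) is assumed, and the db method names are invented for illustration.

```python
from typing import Any


def delete_document(db: Any, vectors: Any, document_id: str) -> dict:
    """Delete vectors first, then rows, then reset the linked reference."""
    document = db.get_document(document_id)
    if document is None:
        raise LookupError(f"document {document_id} not found")
    # Vectors go first: if this call fails, the row still exists and the
    # request can be retried with the namespace still known.
    vectors_removed = vectors.delete_by_file_id(
        document["id"], document["pinecone_namespace"]
    )
    chunks_removed = db.delete_chunks(document["id"])
    db.delete_document_row(document["id"])
    if document.get("reference_id"):
        db.set_reference_status(document["reference_id"], "discovered")
    return {
        "status": "deleted",
        "chunks_removed": chunks_removed,
        "vectors_removed": vectors_removed,
    }
```

If the chunks are removed by ON DELETE CASCADE instead, chunks_removed would have to be counted before the document row is deleted.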
7. Router Registration
Update app/api/router.py to include the new curriculum router:
from app.api.routers import chat, admin, wallet, scraping, auth, me, curriculum
api_router.include_router(curriculum.router, tags=["curriculum"])
8. Files to Create/Modify
| File | Action |
|---|---|
| db/migrations/20260218000021_documents_enrichment.sql | New migration |
| app/schemas/curriculum.py | New file: student browsing schemas |
| app/api/routers/curriculum.py | New file: student browsing router |
| app/schemas/admin.py | Add UserCreateRequest, UserResetPasswordRequest |
| app/schemas/ingestion.py | Add IngestionJobOut, IngestionJobListResponse |
| app/api/routers/admin.py | Add jobs, user CRUD, stats, document delete endpoints |
| app/api/router.py | Register curriculum router |
| app/api/routers/admin.py | Modify ingest endpoint to create job instead of running synchronously |
9. Endpoint Summary
| # | Method | Path | Auth | Category |
|---|---|---|---|---|
| 1 | GET | /curriculum/subjects | None | Student browsing |
| 2 | GET | /curriculum/textbooks | None | Student browsing |
| 3 | GET | /curriculum/textbooks/{id} | None | Student browsing |
| 4 | GET | /admin/jobs | Admin | Job management |
| 5 | GET | /admin/jobs/{id} | Admin | Job management |
| 6 | POST | /admin/jobs/dispatch | Admin | Job management |
| 7 | POST | /admin/jobs/{id}/requeue | Admin | Job management |
| 8 | POST | /admin/users | Admin | User CRUD |
| 9 | DELETE | /admin/users/{user_id} | Admin | User CRUD |
| 10 | POST | /admin/users/{user_id}/reset-password | Admin | User CRUD |
| 11 | GET | /admin/stats | Admin | Dashboard |
| 12 | DELETE | /admin/documents/{id} | Admin | Cleanup |
Plus modification of existing POST /admin/ingest/{reference_id} to job-based model, and PATCH /admin/documents/{id} for metadata updates.