Backend Gap-Filling Architecture

Date: 2026-02-18
Purpose: Fill the 12 missing endpoints needed to support the BacMR-UI frontend.


1. Documents Table Enrichment

Problem

The documents table is lean — it lacks title, major, weight, and category. Students need this metadata when browsing available content. Currently this data only exists in references.

Solution

Add optional columns to documents. They are populated automatically when a document is ingested from a reference and keep their column defaults (NULL for the metadata fields) for manual uploads. An admin can update them later.

Migration: 20260218000021_documents_enrichment.sql

ALTER TABLE documents
    ADD COLUMN IF NOT EXISTS title TEXT,
    ADD COLUMN IF NOT EXISTS major TEXT,
    ADD COLUMN IF NOT EXISTS weight INT DEFAULT 0,
    ADD COLUMN IF NOT EXISTS category TEXT,
    ADD COLUMN IF NOT EXISTS reference_id UUID REFERENCES "references"(id),
    ADD COLUMN IF NOT EXISTS status TEXT DEFAULT 'ready',
    ADD COLUMN IF NOT EXISTS pinecone_index TEXT,
    ADD COLUMN IF NOT EXISTS pinecone_namespace TEXT,
    ADD COLUMN IF NOT EXISTS pinecone_vector_count INT;

CREATE INDEX IF NOT EXISTS idx_documents_status ON documents(status);
CREATE INDEX IF NOT EXISTS idx_documents_grade_subject ON documents(grade, subject);
CREATE INDEX IF NOT EXISTS idx_documents_reference ON documents(reference_id) WHERE reference_id IS NOT NULL;

Data flow

  • Ingest from reference: Copy title, major, weight, category from the reference row. Set reference_id to link them.
  • Manual upload: title, major, and category stay NULL, and weight keeps its default of 0. Admin can update them via PATCH /admin/documents/{id}.
  • Student queries: Query documents WHERE status = 'ready' — no joins needed.
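
The copy-on-ingest rule in the data flow above could look roughly like this. It is a minimal sketch, not the project's code: build_document_row and the dict-shaped rows are hypothetical, and only the column names come from the migration.

```python
from typing import Any, Optional

# Columns copied from the reference row, as listed in the data flow above.
COPIED_FIELDS = ("title", "major", "weight", "category")


def build_document_row(base: dict[str, Any],
                       reference: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return the values for a new documents row (hypothetical helper).

    `base` holds the fields every document already has (grade, subject, ...).
    `reference` is the source row when ingesting from a reference, or None
    for a manual upload.
    """
    row = dict(base)
    if reference is None:
        # Manual upload: omit the metadata so the column defaults apply.
        row["reference_id"] = None
        return row
    for field in COPIED_FIELDS:
        row[field] = reference.get(field)
    row["reference_id"] = reference["id"]
    return row
```

Because the metadata is duplicated onto the document at ingest time, the student-facing queries can read documents alone, which is the "no joins" property the design relies on.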

2. Student Content Browsing

New router: app/api/routers/curriculum.py

Prefix: /curriculum
Auth: None (public endpoints)

Endpoints

GET /curriculum/subjects

Returns distinct subjects from ingested documents.

Query: SELECT DISTINCT subject FROM documents WHERE status = 'ready' AND subject IS NOT NULL

Response:

{
    "subjects": ["Mathematiques", "Physique", "Arabe", "Francais"],
    "total": 4
}
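
To make the query-to-response mapping concrete, here is a small sketch that runs the same SELECT against an in-memory SQLite table. The SQLite connection, the list_subjects helper, and the ORDER BY clause are assumptions for illustration; the real endpoint would go through the project's database client.

```python
import sqlite3


def list_subjects(conn: sqlite3.Connection) -> dict:
    """Build the /curriculum/subjects payload from ready documents."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM documents "
        "WHERE status = 'ready' AND subject IS NOT NULL "
        "ORDER BY subject"  # assumption: sorted output; the design does not specify an order
    ).fetchall()
    subjects = [row[0] for row in rows]
    return {"subjects": subjects, "total": len(subjects)}


# In-memory stand-in for the real documents table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, subject TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO documents (subject, status) VALUES (?, ?)",
    [("Physique", "ready"), ("Physique", "ready"), ("Arabe", "ready"),
     ("Francais", "processing"), (None, "ready")],
)
print(list_subjects(conn))  # {'subjects': ['Arabe', 'Physique'], 'total': 2}
```

The duplicate, the still-processing row, and the NULL subject are all excluded, so only subjects with at least one ready document reach students.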

GET /curriculum/textbooks

Returns available textbooks grouped by education level (grade).

Query params: ?grade=...&subject=...&language=... (all optional filters)

Query: SELECT * FROM documents WHERE status = 'ready', with the optional filters applied. Grouping by level happens client-side, or the server derives levels from the distinct grade values.

Response:

{
    "levels": ["elementary-1", "secondary-3", "high-school-7"],
    "textbooks": [
        {
            "id": "uuid",
            "title": "Manuel de Mathematiques 7C",
            "grade": "high-school-7",
            "subject": "Mathematiques",
            "major": "C",
            "language": "French",
            "weight": 9,
            "page_count": 120,
            "chunk_count": 85,
            "namespace": "grade-high-school-7-Mathematiques"
        }
    ]
}
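
One way to handle the three optional filters is to build the WHERE clause incrementally. The sketch below is an assumption-laden illustration: build_textbook_query and distinct_levels are hypothetical helpers, and the ? placeholders follow the DB-API "qmark" style rather than whatever the project's database client expects.

```python
from typing import Optional


def build_textbook_query(grade: Optional[str] = None,
                         subject: Optional[str] = None,
                         language: Optional[str] = None) -> tuple[str, list[str]]:
    """Return (sql, params) for the textbook listing with optional filters."""
    clauses = ["status = 'ready'"]
    params: list[str] = []
    for column, value in (("grade", grade), ("subject", subject), ("language", language)):
        if value is not None:
            # Column names are fixed here; only the values are user-supplied,
            # and they go through placeholders rather than string formatting.
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT * FROM documents WHERE " + " AND ".join(clauses)
    return sql, params


def distinct_levels(textbooks: list[dict]) -> list[str]:
    """Derive the `levels` array: distinct grades in first-seen order."""
    return list(dict.fromkeys(t["grade"] for t in textbooks))
```

Deriving levels from the already-fetched rows avoids a second query, at the cost of levels reflecting only the filtered result set.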

GET /curriculum/textbooks/{id}

Returns details for a single textbook including its page range (derived from chunks).

Queries: SELECT * FROM documents WHERE id = {id}, then SELECT MIN(page_number), MAX(page_number) FROM chunks WHERE file_id = {id}

Response:

{
    "document": { "id": "...", "title": "...", "grade": "...", "..." : "..." },
    "page_range": { "min": 1, "max": 120 }
}

Schemas: app/schemas/curriculum.py

  • SubjectsResponse(subjects: List[str], total: int)
  • TextbookOut(id, title, grade, subject, major, language, weight, page_count, chunk_count, namespace)
  • TextbookListResponse(levels: List[str], textbooks: List[TextbookOut])
  • TextbookDetailResponse(document: TextbookOut, page_range: dict)
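
TextbookDetailResponse leaves page_range as a plain dict. The sketch below shows one way to pin down its shape, including the case where a document has no chunks yet. It uses standard-library dataclasses to stay self-contained; PageRange, page_range_from_chunks, and textbook_detail are hypothetical names, and the project's actual schemas may be defined differently.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PageRange:
    min: Optional[int]
    max: Optional[int]


def page_range_from_chunks(page_numbers: list[int]) -> PageRange:
    """Mirror SELECT MIN(page_number), MAX(page_number) over a document's chunks."""
    if not page_numbers:
        # SQL MIN/MAX over zero rows yields NULL, so no chunks means no range.
        return PageRange(min=None, max=None)
    return PageRange(min=min(page_numbers), max=max(page_numbers))


def textbook_detail(document: dict, page_numbers: list[int]) -> dict:
    """Assemble the TextbookDetailResponse payload."""
    return {"document": document,
            "page_range": asdict(page_range_from_chunks(page_numbers))}
```

Making min and max nullable forces the frontend contract to account for documents whose chunks have not been written.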

3. Ingestion Job Management

Flow change

The current POST /admin/ingest/{reference_id} runs synchronously. This changes to a job-based model:

POST /admin/ingest/{reference_id}
    → Creates ingestion_job (status: "queued")
    → Creates document record (status: "processing")
    → Returns { job_id, status: "queued" }

POST /admin/jobs/dispatch
    → Picks oldest "queued" job
    → Starts processing as a FastAPI BackgroundTask
    → Pipeline: download PDF → extract pages → chunk → embed → upsert Pinecone
    → On success: job status → "ready", document status → "ready"
    → On failure: job status → "failed", document status → "failed"
    → Returns { job_id, status: "parsing" }

GET /admin/jobs?status=...&limit=...&offset=...
    → List jobs with filters

GET /admin/jobs/{id}
    → Single job details with audit trail

POST /admin/jobs/{id}/requeue
    → Reset "failed" job to "queued", increment retry_count
    → Returns updated job
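
The job lifecycle in the flow above can be sketched with an in-memory queue. This is only an illustration of the state transitions: Job and JobQueue are hypothetical stand-ins for the ingestion_jobs table, and list order substitutes for ordering by creation time.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)


@dataclass
class Job:
    reference_id: str
    id: int = field(default_factory=lambda: next(_ids))
    status: str = "queued"
    retry_count: int = 0


class JobQueue:
    """In-memory stand-in for the ingestion_jobs table."""

    def __init__(self) -> None:
        self.jobs: list[Job] = []

    def create(self, reference_id: str) -> Job:
        # POST /admin/ingest/{reference_id}: enqueue only, no processing here.
        job = Job(reference_id)
        self.jobs.append(job)
        return job

    def dispatch(self) -> Optional[Job]:
        # POST /admin/jobs/dispatch: pick the oldest queued job, if any.
        for job in self.jobs:  # list order is creation order
            if job.status == "queued":
                job.status = "parsing"
                return job
        return None

    def requeue(self, job_id: int) -> Job:
        # POST /admin/jobs/{id}/requeue: only failed jobs may be reset.
        job = next((j for j in self.jobs if j.id == job_id), None)
        if job is None:
            raise KeyError(job_id)
        if job.status != "failed":
            raise ValueError(f"job {job_id} is {job.status!r}, not 'failed'")
        job.status = "queued"
        job.retry_count += 1
        return job
```

Note that a requeued job keeps its original position, so under "oldest first" it is dispatched ahead of newer queued jobs. If retries should go to the back of the queue, the ordering key would need to be the requeue time instead.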

Processing pipeline (inside dispatch background task)

  1. Fetch job + reference from DB
  2. Download PDF from reference.pdf_source via httpx
  3. Extract pages via pdf_processor.extract_pages()
  4. Chunk via pdf_processor.build_chunks()
  5. Embed via embedding_service.generate_embeddings()
  6. Upsert to Pinecone via pinecone_adapter.index.upsert()
  7. Create/update documents row with metadata from reference
  8. Update job status → "ready"
  9. Insert audit trail entries at each state transition
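
The steps above can be sketched as a runner that executes stages in order and records an audit entry at each transition. The stages are injected as plain callables so the sketch does not depend on pdf_processor, embedding_service, or Pinecone. run_pipeline, the dict-shaped job, and the use of stage names as intermediate statuses are assumptions; the design only names "queued", "parsing", "ready", and "failed".

```python
from typing import Callable

# A stage is (status name to record, function transforming the payload).
Stage = tuple[str, Callable[[object], object]]


def run_pipeline(job: dict, stages: list[Stage], audit: list[dict]) -> dict:
    """Run stages in order, recording an audit entry at each transition."""
    payload: object = job["reference_id"]
    try:
        for name, stage in stages:
            job["status"] = name
            audit.append({"job_id": job["id"], "status": name})
            payload = stage(payload)  # output of one stage feeds the next
    except Exception as exc:  # any stage failure marks the job failed
        job["status"] = "failed"
        job["error_message"] = str(exc)
        audit.append({"job_id": job["id"], "status": "failed"})
        return job
    job["status"] = "ready"
    audit.append({"job_id": job["id"], "status": "ready"})
    return job
```

Catching the exception inside the runner matters for a background task: nothing is waiting on the HTTP response any more, so the job row and audit trail are the only places a failure can surface.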

Endpoints added to: app/api/routers/admin.py

The existing POST /admin/ingest/{reference_id} is modified from synchronous to job-creating. Four new endpoints are added for job management.

Schemas: app/schemas/ingestion.py (extend existing)

  • IngestionJobOut(id, reference_id, file_id, status, chunks_created, vectors_upserted, retry_count, error_message, created_at, updated_at)
  • IngestionJobListResponse(items: List[IngestionJobOut], total: int)

4. Admin User CRUD

New endpoints added to: app/api/routers/admin.py

All of these endpoints use the Supabase Auth admin API via supabase_service.

POST /admin/users

Create a new user account.

Request:

{
    "email": "student@example.com",
    "password": "securepassword",
    "full_name": "Ahmed Mohamed",
    "role": "student"
}

Implementation:

supabase_service.auth.admin.create_user({
    "email": email,
    "password": password,
    "email_confirm": True,
    "user_metadata": {"full_name": full_name, "role": role}
})

The profile row is auto-created by the existing handle_new_user trigger.

DELETE /admin/users/{user_id}

Delete a user account.

Implementation:

  1. Delete from profiles table
  2. Delete from Supabase Auth: supabase_service.auth.admin.delete_user(user_id)

POST /admin/users/{user_id}/reset-password

Reset a user's password.

Request: { "new_password": "..." }

Implementation:

supabase_service.auth.admin.update_user_by_id(user_id, {"password": new_password})

Schemas: app/schemas/admin.py (extend existing)

  • UserCreateRequest(email, password, full_name, role)
  • UserResetPasswordRequest(new_password)
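
As an illustration of how UserCreateRequest might validate input before it reaches Supabase, the sketch below uses a standard-library dataclass. The allowed role set is an assumption (the stats payload mentions only admins and students), the email check is deliberately minimal, and to_supabase_payload is a hypothetical helper that reproduces the attributes shown in the create-user call above.

```python
from dataclasses import dataclass

# Assumption: these are the only roles; the design does not enumerate them.
ALLOWED_ROLES = {"student", "admin"}


@dataclass(frozen=True)
class UserCreateRequest:
    email: str
    password: str
    full_name: str
    role: str

    def __post_init__(self) -> None:
        if "@" not in self.email:
            raise ValueError("email must contain '@'")
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")

    def to_supabase_payload(self) -> dict:
        """Shape the attributes passed to auth.admin.create_user."""
        return {
            "email": self.email,
            "password": self.password,
            "email_confirm": True,
            "user_metadata": {"full_name": self.full_name, "role": self.role},
        }
```

Rejecting unknown roles at the schema boundary keeps arbitrary strings out of user_metadata, which the handle_new_user trigger presumably reads when creating the profile row.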

5. Admin Stats

Endpoint added to: app/api/routers/admin.py

GET /admin/stats

Returns dashboard statistics.

Response:

{
    "documents": { "total": 45, "ready": 40, "processing": 3, "failed": 2 },
    "chunks": { "total": 3200 },
    "references": { "total": 120, "discovered": 75, "ready": 40, "failed": 5 },
    "users": { "total": 150, "admins": 2, "students": 148 },
    "jobs": { "queued": 2, "running": 1, "completed": 42, "failed": 3 },
    "wallet": { "total_tokens_in_circulation": 75000 }
}

Implementation: Multiple count queries against documents, chunks, references, profiles, ingestion_jobs, wallet.
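
For the tables that have a status column, a single GROUP BY can replace several per-status counts. The sketch below demonstrates the idea against SQLite; counts_by_status is a hypothetical helper, and the real implementation would use the project's database client.

```python
import sqlite3


def counts_by_status(conn: sqlite3.Connection, table: str,
                     statuses: tuple[str, ...]) -> dict:
    """Return {"total": n, <status>: n, ...} for one table in a single query."""
    # `table` must come from a fixed allow-list, never from user input,
    # because identifiers cannot be bound as query parameters.
    rows = conn.execute(
        f"SELECT status, COUNT(*) FROM {table} GROUP BY status"
    ).fetchall()
    found = dict(rows)
    result = {"total": sum(found.values())}
    for status in statuses:
        result[status] = found.get(status, 0)  # report zero for absent statuses
    return result
```

Filling in zeros for statuses with no rows keeps the response shape stable, so the dashboard does not need to handle missing keys.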


6. Document Delete

Endpoint added to: app/api/routers/admin.py

DELETE /admin/documents/{id}

Delete a document and all associated data.

Implementation:

  1. Fetch document (get pinecone_namespace, id)
  2. Delete Pinecone vectors: pinecone_adapter.delete_by_file_id(document.id, namespace)
  3. Delete chunks: DELETE FROM chunks WHERE file_id = document.id (CASCADE should handle this)
  4. Delete document row
  5. If reference_id exists, update reference status back to discovered

Response: { "status": "deleted", "chunks_removed": 85, "vectors_removed": 85 }
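
The ordering of the deletion steps can be sketched with the side effects injected as callables. Everything here is hypothetical scaffolding: delete_document and its parameters are illustrative, and the sketch assumes the vector and chunk deletions report how many items they removed, which the design does not confirm.

```python
from typing import Callable, Optional


def delete_document(document: dict,
                    delete_vectors: Callable[[str, Optional[str]], int],
                    delete_chunks: Callable[[str], int],
                    delete_row: Callable[[str], None],
                    reset_reference: Callable[[str], None]) -> dict:
    """Run the deletion steps in order and build the response payload."""
    doc_id = document["id"]
    # Vectors go first: if this call fails, the database rows still exist
    # and the delete can be retried without leaving orphaned vectors.
    vectors_removed = delete_vectors(doc_id, document.get("pinecone_namespace"))
    chunks_removed = delete_chunks(doc_id)
    delete_row(doc_id)
    if document.get("reference_id"):
        reset_reference(document["reference_id"])  # status back to "discovered"
    return {"status": "deleted",
            "chunks_removed": chunks_removed,
            "vectors_removed": vectors_removed}
```

If chunks are removed by ON DELETE CASCADE instead of an explicit statement, the chunk count would need to be read before the document row is deleted.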


7. Router Registration

Update app/api/router.py to include the new curriculum router:

from app.api.routers import chat, admin, wallet, scraping, auth, me, curriculum

api_router.include_router(curriculum.router, tags=["curriculum"])

8. Files to Create/Modify

File                                                   Action
db/migrations/20260218000021_documents_enrichment.sql  New migration
app/schemas/curriculum.py                              New file: student browsing schemas
app/api/routers/curriculum.py                          New file: student browsing router
app/schemas/admin.py                                   Add UserCreateRequest, UserResetPasswordRequest
app/schemas/ingestion.py                               Add IngestionJobOut, IngestionJobListResponse
app/api/routers/admin.py                               Add jobs, user CRUD, stats, document delete endpoints
app/api/router.py                                      Register curriculum router
app/api/routers/admin.py                               Modify ingest endpoint to create job instead of running synchronously

9. Endpoint Summary

#   Method  Path                                   Auth   Category
1   GET     /curriculum/subjects                   None   Student browsing
2   GET     /curriculum/textbooks                  None   Student browsing
3   GET     /curriculum/textbooks/{id}             None   Student browsing
4   GET     /admin/jobs                            Admin  Job management
5   GET     /admin/jobs/{id}                       Admin  Job management
6   POST    /admin/jobs/dispatch                   Admin  Job management
7   POST    /admin/jobs/{id}/requeue               Admin  Job management
8   POST    /admin/users                           Admin  User CRUD
9   DELETE  /admin/users/{user_id}                 Admin  User CRUD
10  POST    /admin/users/{user_id}/reset-password  Admin  User CRUD
11  GET     /admin/stats                           Admin  Dashboard
12  DELETE  /admin/documents/{id}                  Admin  Cleanup

Beyond these 12, the existing POST /admin/ingest/{reference_id} changes to the job-based model, and PATCH /admin/documents/{id} covers metadata updates.