ADR 0002: Scraping Platform with Generic Sources
Context
Curriculum material is hosted on multiple portals; koutoubi.mr is the first one we scrape. We need a design that can extend to other sources without reworking the core pipeline.
Decision
Create a BaseScraper class and a references table that acts as a unified catalog.
- Each scraper is responsible for mapping its source-specific HTML/metadata into the standard references schema.
- The status field in references tracks each record's state as it moves from "Discovered" to "Ready" (indexed).
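The decision above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the Reference fields other than status, the method names fetch_raw, to_reference and run, the mark_ready helper, and the example.invalid URL are all assumptions made for the example. Only BaseScraper, the references catalog, and the "Discovered" and "Ready" states come from this ADR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Status(Enum):
    # The two states named in this ADR; any intermediate states are omitted.
    DISCOVERED = "Discovered"
    READY = "Ready"


@dataclass
class Reference:
    """One row of the unified catalog (field names are hypothetical)."""
    source: str
    url: str
    title: str
    grade: Optional[str] = None
    subject: Optional[str] = None
    status: Status = Status.DISCOVERED


class BaseScraper(ABC):
    """Shared contract: each source maps its own data into Reference."""
    source_name: str = ""

    @abstractmethod
    def fetch_raw(self) -> Iterable[dict]:
        """Return source-specific items (parsed HTML, metadata, ...)."""

    @abstractmethod
    def to_reference(self, raw: dict) -> Reference:
        """Map one source-specific item into the standard schema."""

    def run(self) -> List[Reference]:
        return [self.to_reference(raw) for raw in self.fetch_raw()]


class KoutoubiScraper(BaseScraper):
    source_name = "koutoubi.mr"

    def fetch_raw(self) -> Iterable[dict]:
        # Placeholder data instead of a real HTTP request.
        return [{"href": "https://example.invalid/book-1.pdf", "label": "Book 1"}]

    def to_reference(self, raw: dict) -> Reference:
        return Reference(source=self.source_name, url=raw["href"], title=raw["label"])


def mark_ready(ref: Reference) -> Reference:
    """Move a record from Discovered to Ready once it has been indexed."""
    if ref.status is not Status.DISCOVERED:
        raise ValueError(f"cannot mark {ref.status.value} record as Ready")
    ref.status = Status.READY
    return ref
```

Under this sketch, a new source only has to subclass BaseScraper and implement the two abstract methods; the catalog and the status handling stay unchanged.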
Consequences
- Positive: Adding a new source (e.g., a government PDF portal) only requires writing a new scraper class.
- Negative: The "inference" logic that derives grades and subjects becomes complex, because each site uses different naming conventions.
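One way to contain the inference complexity is a per-source alias table that maps raw labels to canonical values. The sketch below is illustrative only: the alias entries, the "gov-pdf-portal" source key, the canonical "grade-7" value, and the function names are hypothetical and do not reflect the real labels used by any portal.

```python
import re
from typing import Dict, Optional

# Hypothetical alias tables: source name -> {normalized raw label -> canonical grade}.
GRADE_ALIASES: Dict[str, Dict[str, str]] = {
    "koutoubi.mr": {"7th year": "grade-7"},
    "gov-pdf-portal": {"g7": "grade-7", "grade 7": "grade-7"},
}


def normalize(label: str) -> str:
    """Collapse whitespace and lowercase so lookups ignore formatting noise."""
    return re.sub(r"\s+", " ", label).strip().lower()


def infer_grade(source: str, raw_label: str) -> Optional[str]:
    """Return the canonical grade, or None when the label is unknown for this source."""
    return GRADE_ALIASES.get(source, {}).get(normalize(raw_label))
```

Returning None for unknown labels, rather than guessing, lets a record stay in "Discovered" until someone extends the table for that source.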