
ADR 0002: Scraping Platform with Generic Sources

Context

Curriculum materials are spread across multiple portals, the first being koutoubi.mr. We need a scraping system that can be extended to additional sources.

Decision

Create a BaseScraper class and a references table that acts as a unified catalog across sources.

  • Each scraper is responsible for mapping its source-specific HTML and metadata into the standard references schema.
  • The status field in references tracks each entry's transition from "Discovered" to "Ready" (i.e., indexed).
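A minimal sketch of how the decision could look in code. The ADR does not define the references schema, so the `Reference` fields (`source`, `url`, `title`, `grade`, `subject`), the `fetch_raw`/`to_reference` method split, and `ExamplePortalScraper` are hypothetical. Only the two states "Discovered" and "Ready" come from the ADR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class ReferenceStatus(Enum):
    # Only these two states are named in the ADR; "Ready" means indexed.
    DISCOVERED = "discovered"
    READY = "ready"


@dataclass
class Reference:
    # Hypothetical columns of the standard references schema.
    source: str
    url: str
    title: str
    grade: Optional[str] = None
    subject: Optional[str] = None
    status: ReferenceStatus = ReferenceStatus.DISCOVERED

    def mark_ready(self) -> None:
        # Allow only the forward transition Discovered -> Ready.
        if self.status is not ReferenceStatus.DISCOVERED:
            raise ValueError(f"cannot mark a {self.status.value} reference as ready")
        self.status = ReferenceStatus.READY


class BaseScraper(ABC):
    # Identifier stored in Reference.source; each subclass sets its own.
    source_name: str = ""

    @abstractmethod
    def fetch_raw(self) -> Iterable[dict]:
        """Yield source-specific records parsed from HTML/metadata."""

    @abstractmethod
    def to_reference(self, raw: dict) -> Reference:
        """Map one source-specific record into the standard schema."""

    def discover(self) -> List[Reference]:
        # Every mapped record enters the catalog in the DISCOVERED state.
        return [self.to_reference(raw) for raw in self.fetch_raw()]


class ExamplePortalScraper(BaseScraper):
    # Hypothetical source: field names and URL are placeholders, not a real site.
    source_name = "example-portal"

    def fetch_raw(self) -> Iterable[dict]:
        # Stub record standing in for a parsed page; a real scraper fetches here.
        yield {"href": "https://example.org/doc/1", "name": "Algebra I", "level": "5AS"}

    def to_reference(self, raw: dict) -> Reference:
        return Reference(
            source=self.source_name,
            url=raw["href"],
            title=raw["name"],
            grade=raw.get("level"),
        )
```

Calling `ExamplePortalScraper().discover()` returns one `Reference` in the `DISCOVERED` state. A separate indexing step (not shown) would call `mark_ready()` once the document is indexed. Keeping the transition on the record itself means every scraper shares the same lifecycle rules.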

Consequences

  • Positive: Adding a new source (e.g., a government PDF portal) only requires writing a new scraper class.
  • Negative: The "inference" logic becomes complex, because each site uses its own naming conventions for grades and subjects.
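One way to contain the inference complexity is to keep a per-source alias table that maps each site's raw labels to a canonical value. The sketch below assumes that approach. The source names, the raw labels ("5AS", "5e AS"), and the canonical value "secondary-5" are all hypothetical placeholders, not conventions documented for any real portal.

```python
import re
from typing import Dict, Optional

# Hypothetical per-source alias tables: raw (normalized) label -> canonical grade.
GRADE_ALIASES: Dict[str, Dict[str, str]] = {
    "example-portal": {"5as": "secondary-5"},
    "other-portal": {"5e as": "secondary-5", "grade 5 (secondary)": "secondary-5"},
}


def normalize_label(label: str) -> str:
    # Lowercase, trim, and collapse internal whitespace so trivial
    # formatting differences do not defeat the lookup.
    return re.sub(r"\s+", " ", label.strip().lower())


def infer_grade(source: str, raw_label: str) -> Optional[str]:
    # Return None for unknown sources or labels so the caller can
    # leave the field empty or flag the record for manual review.
    aliases = GRADE_ALIASES.get(source, {})
    return aliases.get(normalize_label(raw_label))
```

For example, `infer_grade("example-portal", "  5AS ")` and `infer_grade("other-portal", "5e  AS")` both resolve to `"secondary-5"`. Returning `None` instead of guessing keeps unmapped labels visible rather than silently miscategorized.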
