
Pinecone (Vector DB) State

Index Configuration

  • Index Name: curriculum-1536
  • Dimension: 1536
  • Metric: cosine
  • Cloud: aws / gcp (Serverless)
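One way to keep these settings consistent across ingestion and query code is to hold them in a single typed constant. The sketch below assumes a hypothetical `IndexConfig` dataclass, `CURRICULUM_INDEX` constant and `validate_config` helper; none of these names come from the project itself, and `cloud="aws"` is picked only as an example of the two documented options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    """Hypothetical container mirroring the index settings listed above."""
    name: str
    dimension: int
    metric: str
    cloud: str  # "aws" or "gcp" for a serverless index


# Hypothetical single source of truth for the documented values.
CURRICULUM_INDEX = IndexConfig(
    name="curriculum-1536",
    dimension=1536,
    metric="cosine",
    cloud="aws",
)


def validate_config(cfg: IndexConfig) -> None:
    """Raise ValueError if a config contradicts the documented setup."""
    if cfg.cloud not in ("aws", "gcp"):
        raise ValueError(f"unsupported cloud: {cfg.cloud!r}")
    if cfg.metric != "cosine":
        raise ValueError(f"expected cosine metric, got {cfg.metric!r}")
    if cfg.dimension != 1536:
        raise ValueError(f"expected 1536 dimensions, got {cfg.dimension}")
```

In the Pinecone Python SDK (v3 and later), these fields correspond to the `name`, `dimension` and `metric` arguments of `create_index`, with the cloud passed through `ServerlessSpec(cloud=..., region=...)`. The region is not documented on this page, so it would have to be supplied separately.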

Namespace Strategy

To optimize retrieval speed and relevance, we isolate data by namespace:

  • Format: grade-{grade}-{subject} (e.g., grade-12-math).
  • Default: default (used for miscellaneous or untagged documents).
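The namespace rule can be expressed as a small helper. This is a minimal sketch: `namespace_for` is a hypothetical name, and stripping whitespace and lower-casing the subject are assumptions rather than documented behavior. A document that is missing either tag falls back to the default namespace.

```python
from typing import Optional

DEFAULT_NAMESPACE = "default"


def namespace_for(grade: Optional[str], subject: Optional[str]) -> str:
    """Map a document's grade/subject tags to a namespace (hypothetical helper).

    Whitespace stripping and lower-casing of the subject are assumptions.
    """
    grade = (grade or "").strip()
    subject = (subject or "").strip().lower()
    # Untagged or partially tagged documents go to the default namespace.
    if not grade or not subject:
        return DEFAULT_NAMESPACE
    return f"grade-{grade}-{subject}"
```

For example, `namespace_for("12", "math")` yields `grade-12-math`, and `namespace_for(None, "math")` yields `default`.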

Metadata Schema

Every vector stored in Pinecone includes the following metadata:

{
  "text": "The actual chunk text (max 1000 chars)",
  "pdf_source": "filename.pdf or URL",
  "page_number": 12,
  "grade": "12",
  "subject": "math",
  "language": "fr",
  "page_start": 0,
  "page_end": 1000
}
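The schema above can be enforced at ingestion time. The sketch below uses a hypothetical `build_metadata` helper that rejects chunks longer than 1000 characters rather than silently truncating them, which is a design assumption. The page does not define the semantics of `page_start` and `page_end`, so the helper passes them through unchanged.

```python
from typing import Any, Dict

MAX_TEXT_CHARS = 1000  # limit stated for the "text" field


def build_metadata(
    text: str,
    pdf_source: str,
    page_number: int,
    grade: str,
    subject: str,
    language: str,
    page_start: int,
    page_end: int,
) -> Dict[str, Any]:
    """Assemble the metadata dict for one chunk (hypothetical helper)."""
    # Assumption: over-long chunks are rejected, not truncated.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"chunk text is {len(text)} chars; max is {MAX_TEXT_CHARS}")
    return {
        "text": text,
        "pdf_source": pdf_source,
        "page_number": page_number,
        "grade": grade,
        "subject": subject,
        "language": language,
        # Semantics of these two fields are not specified here; passed through as-is.
        "page_start": page_start,
        "page_end": page_end,
    }
```

Rejecting over-long chunks surfaces chunking bugs early. Truncating them would hide the problem, and the stored text would no longer match what was embedded.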

Dimensions Warning

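The system is hardcoded for 1536 dimensions (compatible with OpenAI text-embedding-3-small).

  • Constraint: Attempting to upsert or query with vectors from a model of a different dimension (e.g., a 768-dimensional HuggingFace model) will fail, because Pinecone rejects vectors whose length does not match the index dimension.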
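A cheap client-side guard can catch a model mismatch before any request reaches Pinecone. This is a minimal sketch; `check_dimension` and `INDEX_DIMENSION` are hypothetical names and not part of any SDK.

```python
from typing import Sequence

INDEX_DIMENSION = 1536  # dimension of the curriculum-1536 index


def check_dimension(vector: Sequence[float], expected: int = INDEX_DIMENSION) -> None:
    """Raise ValueError early instead of letting the upsert or query fail remotely."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions; index expects {expected}"
        )
```

Calling it on every embedding before an upsert or query turns a remote error into an immediate local one that names the actual and expected sizes.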
