Pinecone (Vector DB) State
Index Configuration
- Index Name: curriculum-1536
- Dimension: 1536
- Metric: cosine
- Cloud: aws/gcp (Serverless)
Namespace Strategy
To optimize retrieval speed and relevance, we isolate data by Namespace:
- Format: grade-{grade}-{subject} (e.g., grade-12-math).
- Default: default (used for miscellaneous or untagged documents).
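The naming rule above can be sketched as a small helper. This is an illustration, not the project's actual code: the function name `build_namespace` and the normalization steps (lowercasing the subject, replacing whitespace with hyphens) are assumptions added here.

```python
import re

DEFAULT_NAMESPACE = "default"


def build_namespace(grade=None, subject=None):
    """Return 'grade-{grade}-{subject}', or 'default' if either tag is missing.

    Hypothetical helper: the normalization below is an assumption,
    not something the schema itself specifies.
    """
    if grade is None or subject is None:
        return DEFAULT_NAMESPACE
    grade_part = str(grade).strip()
    # Assumed normalization: lowercase, collapse whitespace into hyphens.
    subject_part = re.sub(r"\s+", "-", str(subject).strip().lower())
    if not grade_part or not subject_part:
        return DEFAULT_NAMESPACE
    return f"grade-{grade_part}-{subject_part}"
```

Untagged documents fall through to the default namespace, so callers never have to special-case missing tags.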
Metadata Schema
Every vector stored in Pinecone includes the following metadata:
{
"text": "The actual chunk text (max 1000 chars)",
"pdf_source": "filename.pdf or URL",
"page_number": 12,
"grade": "12",
"subject": "math",
"language": "fr",
"page_start": 0,
"page_end": 1000
}
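A minimal sketch of how a record matching this schema might be assembled before upsert. The helper name `build_metadata` is hypothetical, and truncating over-long text (rather than rejecting it) is an assumption; the schema only states the 1000-character limit.

```python
MAX_TEXT_CHARS = 1000  # limit stated in the schema for the "text" field


def build_metadata(text, pdf_source, page_number, grade, subject,
                   language, page_start, page_end):
    """Build a metadata dict with the keys listed in the schema.

    Hypothetical helper. Assumption: text longer than the limit is
    truncated instead of raising an error.
    """
    return {
        "text": text[:MAX_TEXT_CHARS],
        "pdf_source": pdf_source,
        "page_number": int(page_number),
        "grade": str(grade),  # stored as a string, as in the sample ("12")
        "subject": subject,
        "language": language,
        "page_start": int(page_start),
        "page_end": int(page_end),
    }
```

Storing `grade` as a string mirrors the sample above, so metadata filters compare against "12" rather than 12.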
Dimensions Warning
The system is hardcoded for 1536 dimensions (compatible with OpenAI text-embedding-3-small).
- Constraint: Upserting or querying with vectors of any other dimension (e.g., from a 768-dimensional HuggingFace model) will fail, because the vector length must match the index dimension.
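One way to surface this problem early is to check the vector length on the client side before calling Pinecone. The sketch below is an assumption about how such a guard could look; `check_dimension` is a hypothetical name, not part of the Pinecone client.

```python
EXPECTED_DIMENSION = 1536  # must match the index dimension


def check_dimension(vector):
    """Return the vector unchanged, or raise if its length is wrong.

    Hypothetical client-side guard, run before any upsert or query.
    """
    if len(vector) != EXPECTED_DIMENSION:
        raise ValueError(
            f"Expected {EXPECTED_DIMENSION} dimensions, got {len(vector)}; "
            "check which embedding model produced this vector."
        )
    return vector
```

Note that the guard only checks length. A different model that also happens to emit 1536 dimensions would pass the check but produce vectors that are not comparable with those already in the index.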