Design: Wallet & Billing System
Token Model
BacMR uses a virtual currency ("Tokens") to manage AI costs.
- Deduction Rule:
  - Base Cost: 5 tokens (covers retrieval and processing).
  - Output Cost: 1 token per 200 characters generated.
  - Clarification: Flat 2 tokens (cheaper).
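The deduction rule can be sketched as a small function. This is an illustration only: the names `compute_cost`, `BASE_COST`, and the other constants are hypothetical, and rounding a partial 200-character block up to a full token is an assumption, since the rule does not say how partial blocks are billed.

```python
import math

BASE_COST = 5                  # tokens; covers retrieval and processing
CHARS_PER_OUTPUT_TOKEN = 200   # 1 token per 200 generated characters
CLARIFICATION_COST = 2         # flat rate for clarifications


def compute_cost(output_chars: int, is_clarification: bool = False) -> int:
    """Return the token cost of one request under the deduction rule."""
    if is_clarification:
        return CLARIFICATION_COST
    # Assumption: a partial 200-character block is billed as one full token.
    output_cost = math.ceil(output_chars / CHARS_PER_OUTPUT_TOKEN)
    return BASE_COST + output_cost


print(compute_cost(600))                         # 5 + 3 = 8
print(compute_cost(601))                         # 5 + 4 = 9
print(compute_cost(1000, is_clarification=True)) # flat 2
```

Under these assumptions, a 600-character answer costs 8 tokens.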
Reserve / Finalize Pattern
The billing system uses atomic reservations to prevent over-billing:
sequenceDiagram
participant User
participant Agent as Teacher Agent
participant WS as WalletService
participant DB as Postgres
User->>Agent: POST /chat
Agent->>WS: reserve(user_id, estimated=15)
WS->>DB: Deduct 15 from wallet
WS->>DB: INSERT reservation (status=reserved, TTL=5min)
WS-->>Agent: reservation_id, balance=85
Note over Agent: LLM retrieval + generation
Agent->>WS: finalize(reservation_id, actual=8)
WS->>DB: UPDATE reservation (status=finalized, actual=8)
WS->>DB: Refund delta (15-8=7) to wallet
WS->>DB: INSERT ledger entry (delta=-8)
WS-->>Agent: balance=92
- Reserve: Before the LLM call, deduct the estimated cost from the wallet and create a reservations record (status: reserved, TTL: 5 minutes).
- LLM Call: Execute the retrieval + generation pipeline.
- Finalize: After the LLM response, update the reservation with the actual cost. Refund the delta if actual < estimated. Cap the actual cost at 2x the estimate.
- Expire: A background job (scripts/expire_reservations.py) returns tokens for stale reservations.
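The four steps can be sketched with an in-memory stand-in for the wallet and reservations tables. This is not the actual WalletReservationService: the class `InMemoryWallet`, its method signatures, and the `now` parameter are hypothetical, and a real implementation would run each step inside a Postgres transaction. Charging the overage when the actual cost exceeds the estimate (up to the 2x cap) is also an assumption.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, Optional

RESERVATION_TTL_SECONDS = 5 * 60  # TTL: 5 minutes


@dataclass
class Reservation:
    user_id: str
    estimated: int
    expires_at: float
    status: str = "reserved"
    actual: Optional[int] = None


class InMemoryWallet:
    """Hypothetical in-memory model of the reserve / finalize / expire steps."""

    def __init__(self, balances: Dict[str, int]):
        self.balances = dict(balances)
        self.reservations: Dict[str, Reservation] = {}

    def reserve(self, user_id: str, estimated: int, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if self.balances.get(user_id, 0) < estimated:
            raise ValueError("insufficient balance")
        self.balances[user_id] -= estimated  # deduct the estimate up front
        rid = str(uuid.uuid4())
        self.reservations[rid] = Reservation(
            user_id, estimated, expires_at=now + RESERVATION_TTL_SECONDS
        )
        return rid

    def finalize(self, reservation_id: str, actual: int) -> int:
        res = self.reservations[reservation_id]
        if res.status != "reserved":
            raise ValueError(f"reservation is already {res.status}")
        actual = min(actual, 2 * res.estimated)  # cap actual at 2x estimated
        # Positive difference is a refund; a negative one charges the overage
        # (assumption: the source only describes the refund case).
        self.balances[res.user_id] += res.estimated - actual
        res.status, res.actual = "finalized", actual
        return self.balances[res.user_id]

    def expire_stale(self, now: Optional[float] = None) -> int:
        """Return tokens for reservations past their TTL; report how many expired."""
        now = time.time() if now is None else now
        expired = 0
        for res in self.reservations.values():
            if res.status == "reserved" and res.expires_at <= now:
                self.balances[res.user_id] += res.estimated
                res.status = "expired"
                expired += 1
        return expired


wallet = InMemoryWallet({"u1": 100})
rid = wallet.reserve("u1", estimated=15)   # balance drops to 85
print(wallet.finalize(rid, actual=8))      # refund of 7 brings it to 92
```

The usage lines reproduce the numbers in the sequence diagram: reserving 15 from a balance of 100 leaves 85, and finalizing at 8 refunds 7.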
Implementation
- Service: WalletReservationService in app/services/wallet_reservation.py
- Singleton: wallet_service in app/core/dependencies.py
- Agent integration: the check_wallet node calls wallet_service.reserve(), and the finalize node calls wallet_service.finalize()
Ledger System
Every transaction is logged in the wallet_ledger table.
- Delta: Positive (top-up) or negative (usage).
- Reason: agent_chat, topup, reservation_expired, etc.
- Request ID: Links the deduction to a specific usage_log entry.
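A ledger row with these three fields might be modeled as follows. The class `LedgerEntry`, its field names, and the helper `balance_from_ledger` are hypothetical. Treating the wallet balance as the sum of all deltas is an assumption about how the ledger could be used for auditing, not a statement about the actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LedgerEntry:
    user_id: str
    delta: int                        # positive = top-up, negative = usage
    reason: str                       # e.g. "agent_chat", "topup"
    request_id: Optional[str] = None  # links a deduction to a usage_log entry


def balance_from_ledger(entries: Iterable[LedgerEntry], user_id: str) -> int:
    """Replay the ledger: sum every delta recorded for one user."""
    return sum(e.delta for e in entries if e.user_id == user_id)


ledger = [
    LedgerEntry("u1", +100, "topup"),
    LedgerEntry("u1", -8, "agent_chat", request_id="req-1"),
]
print(balance_from_ledger(ledger, "u1"))  # 100 - 8 = 92
```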
Financial Transactions
A separate transactions table tracks real-money flows (top-ups, refunds, API costs, server expenses).
- Direction: credit (incoming) or debit (outgoing).
- Types: topup, refund, promotion, api_cost, server_cost, infrastructure.
- Currency: Supports MRU (Mauritanian Ouguiya), USD, EUR.
- Payment methods: cash, bank_transfer, mobile_money, bankily, masrivi, seddad.
- Migration: db/migrations/20260217000020_transactions.sql
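The allowed values listed above could be enforced in application code along these lines. This is a sketch only: the class and field names are hypothetical and are not taken from the migration, and the actual table may enforce these constraints in SQL instead.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class Direction(str, Enum):
    CREDIT = "credit"  # incoming
    DEBIT = "debit"    # outgoing


class Currency(str, Enum):
    MRU = "MRU"
    USD = "USD"
    EUR = "EUR"


TRANSACTION_TYPES = {
    "topup", "refund", "promotion", "api_cost", "server_cost", "infrastructure",
}
PAYMENT_METHODS = {
    "cash", "bank_transfer", "mobile_money", "bankily", "masrivi", "seddad",
}


@dataclass(frozen=True)
class Transaction:
    direction: Direction
    type: str
    amount: Decimal  # Decimal avoids float rounding on money
    currency: Currency
    payment_method: Optional[str] = None

    def __post_init__(self):
        if self.type not in TRANSACTION_TYPES:
            raise ValueError(f"unknown transaction type: {self.type}")
        if self.payment_method is not None and self.payment_method not in PAYMENT_METHODS:
            raise ValueError(f"unknown payment method: {self.payment_method}")
        if self.amount <= 0:
            raise ValueError("amount must be positive; direction carries the sign")


tx = Transaction(Direction.CREDIT, "topup", Decimal("500"), Currency.MRU, "bankily")
print(tx.direction.value, tx.currency.value)  # credit MRU
```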
Top-Up Flow
graph LR
A[Admin calls<br>POST /wallet/topup] --> B[Increment wallet<br>balance]
B --> C[Insert wallet_ledger<br>delta = +tokens]
C --> D[Insert transaction<br>direction=credit<br>type=topup]
D --> E[Return new balance]
- Admin calls POST /wallet/topup with student UUID, token amount, and payment details.
- Wallet balance is incremented.
- A wallet_ledger entry is created (delta = +token_amount).
- A transactions record is created (direction = credit, type = topup).
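The top-up steps can be sketched as one function over in-memory stand-ins for the three tables. The function `top_up`, the shape of the `state` dictionary, and the keys inside `payment` are hypothetical. In a real service the three writes would presumably share one database transaction so that a failure cannot leave the wallet and the ledger out of sync.

```python
from typing import Any, Dict


def top_up(state: Dict[str, Any], student_id: str, token_amount: int,
           payment: Dict[str, Any]) -> int:
    """Apply a top-up and return the new wallet balance."""
    if token_amount <= 0:
        raise ValueError("token_amount must be positive")
    # 1. Increment the wallet balance.
    balances = state["balances"]
    balances[student_id] = balances.get(student_id, 0) + token_amount
    # 2. Record the token movement in the ledger.
    state["wallet_ledger"].append(
        {"user_id": student_id, "delta": token_amount, "reason": "topup"}
    )
    # 3. Record the real-money flow.
    state["transactions"].append(
        {"user_id": student_id, "direction": "credit", "type": "topup", **payment}
    )
    return balances[student_id]


state = {"balances": {"student-1": 20}, "wallet_ledger": [], "transactions": []}
payment = {"amount": "500", "currency": "MRU", "payment_method": "bankily"}
print(top_up(state, "student-1", 100, payment))  # 20 + 100 = 120
```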
Tiers & Caps
Different subscription tiers control the "quality" of the AI:

| Tier | Top-K (Search) | Rerank-N | Max Tokens/Request | Daily Limit |
| :--- | :--- | :--- | :--- | :--- |
| Free | 10 | 3 | 500 | 50 |
| Standard | 20 | 5 | 2,000 | 500 |
| Premium | 30 | 8 | 4,000 | Unlimited |
Tier configuration is managed by TierConfig in app/services/tier_config.py.
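One way to represent the tier table in code is a lookup of immutable records. This is not the actual TierConfig: the names `TierLimits`, `TIERS`, and `within_daily_limit` are hypothetical, `None` is used here to stand for "Unlimited", and the unit of the daily limit is left abstract because the table does not state it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TierLimits:
    top_k: int                     # documents retrieved by search
    rerank_n: int                  # documents kept after reranking
    max_tokens_per_request: int
    daily_limit: Optional[int]     # None stands for "Unlimited"


TIERS: Dict[str, TierLimits] = {
    "free": TierLimits(10, 3, 500, 50),
    "standard": TierLimits(20, 5, 2000, 500),
    "premium": TierLimits(30, 8, 4000, None),
}


def within_daily_limit(tier: str, used_today: int) -> bool:
    """Return True if another unit of usage is still allowed today."""
    limit = TIERS[tier].daily_limit
    return limit is None or used_today < limit


print(within_daily_limit("free", 49))        # True
print(within_daily_limit("free", 50))        # False
print(within_daily_limit("premium", 10**6))  # True
```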