Motivation / problem
A multi-provider RAG system spends real money on every chat turn and every ingestion embedding, across five providers with different price sheets. Without a governance layer you get the classic FinOps failure mode: the bill is only legible after it arrives, there is no per-tenant attribution for chargeback, and nothing stops a runaway agent loop or a mispriced model from burning the monthly budget in an afternoon. Worse, AskMyDocs historically guessed per-turn cost from a staticconfig/ai.php price sheet resolved client-side — a number that drifted from
the real bill the moment a provider changed its rates.
AskMyDocs answers this with padosoft/laravel-ai-finops
(+ the companion React cockpit
padosoft/laravel-ai-finops-admin):
an immutable usage ledger, hierarchical budgets, a declarative policy
DSL, chargeback/cost-centers, forecasting + anomaly detection,
cost-aware routing, price-watch and multi-channel alerts — all tenant-scoped,
all priced by the package’s own multi-source cost cascade.
Theory & background
FinOps splits into three loops: inform (meter every call, price it, attribute it), govern (budgets + policies that can warn or block), and optimize (forecast, route by quality-per-dollar, watch for price changes). The package implements all three; AskMyDocs ships inform on by default and leaves govern’s hard blocks opt-in (observe-first). The package meters at a single point: it listens on thelaravel/ai SDK
lifecycle events (AgentPrompted / AgentStreamed / EmbeddingsGenerated). Any
call that flows through the SDK is metered and priced automatically — for free.
The shape of the integration changed in v8.16/W2. AskMyDocs used to call four
of its five providers (OpenAI / Anthropic / Gemini / OpenRouter) over raw
Illuminate\Support\Facades\Http, leaving the SDK’s automatic metering blind to
everything but Regolo. W2 migrated provider transport onto the laravel/ai
SDK (ADR 0015): Anthropic, Gemini and Regolo now run
fully through the SDK, and OpenAI / OpenRouter run through it for ordinary chat +
embeddings. The SDK’s native metering therefore covers almost all traffic. A thin
residual bridge closes the one remaining gap (below).
Design
Native SDK metering does the heavy lifting: every Anthropic / Gemini / Regolo call and every no-tools OpenAI / OpenRouter chat + embedding fires a lifecycle event the package’sMeteringListener records and prices automatically.
The one path the SDK cannot see is the MCP tool-calling turn for the two hybrid
providers (OpenAI / OpenRouter): a with-tools turn is still issued over raw
Http:: because the SDK’s tool-call surface does not yet match the host’s MCP loop.
App\FinOps\AiCallMeter is the residual bridge for exactly that turn — it feeds
the raw-Http:: result into the same MeteringListener pricing pipeline so the
tool turn lands in the ledger with identical pricing, tenant attribution and
subscription coverage. App\Ai\AiManager::bridgeShouldMeterChat() gates it: the
bridge fires only when a hybrid provider issues a with-tools turn
(array_key_exists('tools', …) or the history contains a tool turn), so an
SDK-metered call is never double-counted. The bridge is non-blocking and fully
try/catch’d (the ChatLogManager discipline): a metering
failure never breaks a chat turn or an ingestion run.
Tenant attribution is wired through App\FinOps\HostTenantResolver (a
config-cacheable class-string, not a closure), which returns the request-scoped
tenant id from App\Support\TenantContext — so every
ledger row, budget and rollup belongs to the active tenant (R30).
Real per-turn cost on every chat log (W3)
The old “guess from a static sheet, client-side” cost model is gone. AtChatLogManager time, App\FinOps\ChatTurnCostResolver resolves the real
cost of the turn through the package’s CostResolutionService cascade (actual
billed → tokens × tariff → estimated) and persists it onto the chat_logs row
(cost, cost_currency), plus a trace_id that joins the row to its
ai_finops_usage_ledger row(s). The chat API echoes the resolved cost in
meta.cost / meta.cost_currency (additive — R27), and the chat UI’s cost meter
reads the server value instead of recomputing. Streaming turns stamp the same
cost + trace_id. The whole path is gated on AI_FINOPS_METERING so a deployment
with FinOps off pays no price-feed cost and writes no dangling trace id.
Data model / contract
The package owns its tables under theai_finops_ prefix (created by php artisan migrate after install). The load-bearing ones:
| Table | Holds |
|---|---|
ai_finops_usage_ledger | One immutable row per metered call: provider, model, tokens, cost_total, cost_method, tenant_id, trace tags. |
ai_finops_budgets | N-scope budgets (global → tenant → user → cost-center → provider → model → agent → purpose), soft/hard, daily→yearly + rolling. |
ai_finops_policies / ai_finops_approvals | Declarative policy DSL (block / require-approval / downgrade / throttle / queue) + the approval workflow. |
ai_finops_kill_switches | Global + scoped kill switches. |
ai_finops_cost_centers | Chargeback / showback allocation. |
ai_finops_pricing_overrides / ai_finops_subscription_windows | Manual price overrides + flat-rate coverage windows (covered calls cost 0; tokens still tracked). |
ai_finops_routing_rules / ai_finops_whatif_scenarios | Cost-aware routing + the what-if re-pricing simulator. |
ai_finops_price_watch_* / ai_finops_alerts_* / ai_finops_anomaly_acks | Provider price-change watch, alert channels/rules/log, anomaly acks. |
ai_finops_audit_log | Immutable governance-mutation trail (budgets, policies, kill-switches, …). |
chat_logs table (v8.16/W3):
cost decimal(18,8), cost_currency char(3), and an indexed trace_id
string(64) — the join key back to the ledger. The precision mirrors the ledger’s
cost_total so the two never round apart.
The capability is reachable on all three R44 surfaces over one shared core:
- PHP — the
ai-finops:*Artisan commands (report,capture-prices,check-alerts,prune) + the package services. - HTTP —
api/admin/ai-finops/*(readdashboard/kpis,usage,budgets,footprint/summary,forecast,audit, … ; writebudgets,policies,cost-centers,settings/kill-switch, …). - MCP — three read tools on the
enterprise-kbserver (v8.16/W4), tenant-scoped (R30) and OFF-path safe (R43):FinOpsSpendSummaryTool(total cost + tokens + per-(provider, model) breakdown over a window),FinOpsTopModelsTool(costliest models with cost-share), andFinOpsBudgetStatusTool(each tenant-scoped budget’s limit / spend / state, delegating toBudget::status()).
ai-finops:capture-prices, ai-finops:check-alerts, ai-finops:prune
(ai-finops:report is on-demand).
Security & flags (R32 / R30 / R43)
- Method-aware authorization. Every privileged route sits behind
App\Http\Middleware\FinOpsAuthorize: safe methods (GET/HEAD) require theviewAiFinOpsgate (super-admin + admin); mutating methods (POST/PUT/PATCH/DELETE) requiremanageAiFinOps(super-admin only). The package controllers do no internal authorization — this middleware is the boundary, and it is regression-locked inAdminAuthorizationMatrixTest(R32). - Secure-by-host override. The package default route stack is
['api']+['auth'](no Sanctum, no tenant scope). The hostconfig/ai-finops.phpreplaces it withEncryptCookies + StartSession + auth:sanctum + tenant.authorize(+finops.authorize). The package’shealthprobe is wrapped too, so it is authenticated, not an anonymous uptime endpoint. - Tenant isolation (R30).
HostTenantResolverbinds every ledger row to the active tenant; the MCP read tools filter every ledger / budget query by the active tenant and deliberately exclude global/cross-scope budgets (whose spend aggregates across tenants) — cross-tenant cost leakage is not possible through the API or MCP. - Default-OFF surfaces (R43). Both master
AI_FINOPS_ENABLEDand the admin SPAAI_FINOPS_ADMIN_ENABLEDdegrade cleanly when off (routes unregistered → 404, never a 500); the MCP tools return an empty, well-formed payload when the ledger/budgets tables are absent. Hard enforcement (AI_FINOPS_ENFORCEMENT) is also default-OFF — observe first, then turn on blocking once budgets/policies are seeded.
Decision rationale (ADR-style)
- Package over a hand-rolled cost table. AskMyDocs used to keep static
cost_ratesinconfig/ai.php, but that is a price sheet, not governance — no ledger, no budgets, no per-tenant rollup, no enforcement. Adopting the maintained package buys the entire FinOps loop (and multi-source live pricing) instead of reinventing it. See architecture decisions. - Provider transport migrated to the SDK (ADR 0015). v8.16/W2 reversed the
long-standing raw-
Http::decision for the metered providers: routing chat + embeddings throughlaravel/aimakes metering, pricing and tenant attribution fire natively on the SDK’s own lifecycle events, instead of being bolted on by a bridge. Anthropic / Gemini / Regolo are fully SDK; OpenAI / OpenRouter are SDK for chat + embeddings. - A bounded residual bridge, not a half-migration. The single path still on
raw
Http::is the OpenAI / OpenRouter MCP with-tools turn, whose tool-call shape the SDK does not yet match the host’s loop on.AiCallMeterbridges only that turn into the same pricing pipeline, gated bybridgeShouldMeterChat()so nothing is double-counted. It retires entirely once the SDK’s tool surface lands. - Server-side cost authority (W3). Per-turn cost is resolved once,
server-side, at
ChatLogManagertime and persisted onchat_logswith atrace_idjoin to the ledger. The client no longer guesses cost from a static sheet — it renders the authoritative server number. The whole resolver is gated onAI_FINOPS_METERINGso an off deployment pays nothing and stamps no dangling trace id.
Worked example
A normal chat turn (defaultopenrouter, no tools) is metered natively by the SDK
and writes one tenant-scoped ledger row — no host bridge involved:
chat_logs row carries the resolved cost and the join key:
viewer returns 403 (gate denied); unauthenticated returns
401. Or ask the same question through the MCP surface — an agent on the
enterprise-kb server calls FinOpsSpendSummaryTool and gets the active tenant’s
30-day spend + per-model breakdown, scoped to its own tenant only. Turn the cockpit
on with AI_FINOPS_ADMIN_ENABLED=true and
php artisan vendor:publish --tag=ai-finops-admin-assets --force, then open
/admin/ai-finops.
Gotchas & operations
- Currency. Budgets compare against spend in the base currency (
USD, matching provider list prices);EURis display-only. Set an FX provider to budget in another currency. - Enforcement is off by default. Seed budgets/policies first, then flip
AI_FINOPS_ENFORCEMENT=true— turning it on with no budgets does nothing; with a mis-scoped hard budget it can 402 live traffic. - Pre-flight enforcement is bounded to the SDK path. Hard budget/policy blocks
(HTTP 402) fire on the SDK-metered calls; the residual raw-
Http::MCP tool turn is metered after the call, so it is recorded but not pre-flight blocked. It closes when the bridge retires. - Cost resolution follows
AI_FINOPS_METERING. With metering off,chat_logsrows carrynullcost + notrace_id, and the resolver never touches the price feed — by design, so the off path is free and side-effect-less. - Regolo manual pricing. Regolo is feed-less; enter its rates via
pricing/overridesor it prices at zero.
AI providers
The five-provider federation FinOps meters — now on the
laravel/ai SDK.Multi-tenant isolation
The
TenantContext every ledger row, budget and rollup is scoped to.Scheduler & maintenance
Where the
ai-finops:* Tier-1 crons run.Architecture decisions
ADR 0015 — the provider-transport SDK migration this page builds on.