License management
Every QRY tenant runs against a license: a GCP service-account JSON key that, on validation, returns the tenant's user cap, datasource cap, and feature flags. The license is checked on backend startup and every 6 hours thereafter, so tenants are bounded to what they paid for.
The key word is "bounded": QRY doesn't crash without a valid license; it degrades gracefully. This page covers the model and the operational concerns.
What the license carries
For each tenant:
- Maximum users — hard cap on accounts active at any time.
- Maximum datasources — cap on configured datasources.
- Feature flags — granular: rag, batch-profiling, scheduled-tasks, workspaces, domain-agents, forge, lakeflow, nexus, ml-hub, etc.
- Expiration — when the license becomes invalid.
- Tenant id — scoping; one license can't be reused for another tenant.
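As a rough illustration, the fields above could be modelled as a small record. This is a sketch, not QRY's actual schema: the class name License, its field names, and the is_valid_for helper are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class License:
    """Hypothetical shape of a validated license (field names are assumed)."""

    tenant_id: str
    max_users: int
    max_datasources: int
    features: FrozenSet[str]
    expires_at: datetime

    def is_valid_for(self, tenant_id: str, now: datetime) -> bool:
        # A license is scoped to one tenant and stops being valid at expiry.
        return tenant_id == self.tenant_id and now < self.expires_at


# Example values only; they do not describe a real plan.
lic = License(
    tenant_id="acme",
    max_users=50,
    max_datasources=10,
    features=frozenset({"rag", "workspaces"}),
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
```

Making the record immutable reflects the idea that a license is replaced as a whole on each successful validation rather than edited in place.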
Validation cadence
- On startup — the backend container won't accept queries until validation passes.
- Every 6 hours — a background validator re-checks. Detects revocations, expiries, and provider outages.
- 24-hour grace period — if validation fails (e.g. GCP outage), QRY stays operational on the last-known-good license for up to 24 hours. After that, the tenant moves to degraded mode.
The grace period is the difference between "GCP had a hiccup, no impact" and "GCP outage took down our tenant for half a day". 24 hours is a generous default; it is configurable through license.grace_hours for high-sensitivity tenants who would rather fail closed.
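The grace-period rule can be sketched as a small pure function. This is a minimal sketch under assumptions: the function name license_state, its parameters, and the state names "valid", "grace", and "degraded" are illustrative and not taken from the QRY codebase.

```python
from datetime import datetime, timedelta, timezone


def license_state(last_success: datetime, last_check_ok: bool,
                  now: datetime, grace_hours: float = 24) -> str:
    """Return 'valid', 'grace', or 'degraded' (state names are assumed)."""
    if last_check_ok:
        return "valid"
    # Validation is failing: keep serving the last-known-good license
    # until the grace window since the last success has elapsed.
    if now - last_success < timedelta(hours=grace_hours):
        return "grace"
    return "degraded"


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
state_after_7h = license_state(t0, False, t0 + timedelta(hours=7))
state_after_25h = license_state(t0, False, t0 + timedelta(hours=25))
```

With grace_hours set to 0 the comparison can never be true, so the first failed check moves the tenant straight out of the grace state, which is the fail-closed behaviour.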
Usage snapshots
A daily cronjob writes a usage snapshot per tenant:
- Active users today.
- Active datasources.
- Total queries.
- Total LLM tokens.
- Total scheduled-task executions.
Snapshots feed the licensing dashboard for plan-bumping decisions and cost attribution.
Enforcement
- User cap exceeded — admin can't create new users until existing ones are removed or the plan is bumped.
- Datasource cap exceeded — admin can't add new datasources.
- Feature flag off — the feature's API endpoints return 403; the UI hides the navigation entry.
User-facing copy on enforcement is in Admin > Branding > License messages and can be customised.
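The three enforcement rules reduce to simple comparisons. The sketch below assumes hypothetical function names and takes the current counts as plain integers; how QRY actually counts users and datasources is not shown here.

```python
from http import HTTPStatus


def can_create_user(user_count: int, max_users: int) -> bool:
    # New accounts are blocked once the counted users reach the cap.
    return user_count < max_users


def can_add_datasource(datasource_count: int, max_datasources: int) -> bool:
    # Same rule for configured datasources.
    return datasource_count < max_datasources


def feature_status(flag: str, enabled_flags: set) -> HTTPStatus:
    # Endpoints of a feature that is switched off answer 403 Forbidden.
    return HTTPStatus.OK if flag in enabled_flags else HTTPStatus.FORBIDDEN
```

For example, feature_status("forge", {"rag"}) yields HTTPStatus.FORBIDDEN, while can_create_user(49, 50) still allows one more account.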
Configuring on a tenant
When a tenant is provisioned, the license JSON key is dropped into the tenant's namespace as a Kubernetes secret (qry-license-key). The provisioning script (provision_tenant.sh) handles this — see Multi-tenant provisioning.
For an existing tenant, rotate via:
kubectl create secret generic qry-license-key -n qry-<tenant> \
--from-file=key.json=/path/to/new-key.json \
--dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/qry-backend -n qry-<tenant>
Validation re-runs on backend restart and picks up the new key.
Key rotation
Plan: 180-day rotation. Beyond that, the GCP service account key is considered stale.
Rotation steps:
- Create a new key for the tenant's service account in GCP IAM.
- Update the qry-license-key secret as above.
- Restart qry-backend.
- Verify validation passes (backend logs show "License validated" on startup).
- After 24-48 hours of healthy operation, delete the old key in GCP.
Don't delete the old key before the secret is updated and the backend has restarted. Deleting a key in GCP invalidates it immediately, so a tenant still configured with that key fails validation and is left running on the grace period alone.
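A check for the 180-day rotation window might look like the following. It is a sketch only: the helper name key_is_stale is hypothetical, and the creation timestamp is assumed to come from wherever you record it (for example, the key's creation time as reported by GCP IAM).

```python
from datetime import datetime, timedelta, timezone

# Rotation period from the plan above.
ROTATION_PERIOD = timedelta(days=180)


def key_is_stale(created_at: datetime, now: datetime) -> bool:
    # A key is stale once it is older than the rotation period.
    return now - created_at > ROTATION_PERIOD


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
stale = key_is_stale(created, created + timedelta(days=181))
```

Running such a check on a schedule gives a reminder before the key ages out, rather than discovering a stale key during an incident.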
What happens during a GCP outage
- Validation calls fail.
- Within the 24h grace, the tenant operates normally on the cached last-known-good license.
- If GCP recovers within 24h, you don't notice.
- After 24h, the tenant moves to a degraded mode: read-only conversations, no new users / datasources / scheduled tasks, no feature gates re-checked. (Partial; configurable.)
- After validation succeeds again, normal operation resumes.
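The degraded-mode restrictions listed above can be pictured as a deny-list applied only while the tenant is degraded. The action names and the is_allowed helper are invented for this sketch; the actual set of blocked operations is configurable and may differ.

```python
# Hypothetical action names for the operations blocked in degraded mode.
DEGRADED_BLOCKED = frozenset({
    "conversation.write",
    "user.create",
    "datasource.create",
    "scheduled_task.create",
})


def is_allowed(action: str, degraded: bool) -> bool:
    # Outside degraded mode nothing is blocked by this rule; inside it,
    # only the listed write operations are refused.
    return not (degraded and action in DEGRADED_BLOCKED)
```

Reading a conversation stays allowed in both modes, while is_allowed("user.create", degraded=True) is refused until validation succeeds again.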
Common issues
Backend startup fails with "license invalid".
The service-account JSON key is malformed, expired, or for the wrong tenant. Check kubectl get secret qry-license-key -n qry-<tenant> -o yaml and decode the value.
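Once you have the base64 value from the secret, a quick structural check can rule out a malformed key. The sketch below is an assumption-laden helper, not part of QRY: check_key is a hypothetical name, and the field list covers a few fields found in a standard GCP service-account JSON key. It cannot tell whether the key is expired or belongs to another tenant.

```python
import base64
import json

# A few fields present in a standard GCP service-account JSON key.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id",
                   "private_key", "client_email")


def check_key(b64_value: str) -> list:
    """Return a list of problems in a base64-encoded key (empty if none)."""
    try:
        key = json.loads(base64.b64decode(b64_value))
    except ValueError as exc:
        # Covers invalid base64, invalid UTF-8, and invalid JSON.
        return [f"not valid base64-encoded JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

An empty result only means the key is well formed; the backend's own validation remains the source of truth.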
License says my plan covers 50 users but admin can't create the 41st.
The cap includes inactive accounts: soft-deleted users are kept for retention purposes and still count toward it. Either bump the plan or hard-delete users that are beyond the retention window.
Feature flag is on but the UI doesn't show the feature.
Browser cache. The user has to hard-refresh after a feature flag toggle. The backend reflects the change immediately on the next request.
Daily snapshot job hasn't run.
Check the Celery beat scheduler. The job is license.usage_snapshot, which runs daily at 00:00 UTC by default.
24h grace doesn't fit our compliance posture.
Set license.grace_hours to a smaller value (e.g. 1). A value of 0 means fail closed immediately on the first validation failure.
See also
- Multi-tenant provisioning — how the license secret lands in a fresh tenant.
- Users and groups — user-cap interactions.
- License Management reference — full feature reference.