Rate limits

Per-user quotas and how to handle 429 responses.

Rate limits protect the upstream model providers from abuse and keep latency predictable for everyone. Limits are applied per Supabase user ID, not per IP, so multiple clients running under the same account share a single budget.

Current limits

Endpoint	Method	Limit	Window
`/api/ai`	POST	30 requests	10 minutes (rolling)

The window slides continuously: each request counts against your budget for 10 minutes after it is made, then drops out. There is no fixed reset point (for example, at the top of the hour or on a 10-minute boundary).
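One way to model this behavior is a sliding-window log. It keeps each request's timestamp per user and discards timestamps that are 10 minutes old. A new request is admitted only while fewer than 30 timestamps remain. The Python sketch below is illustrative only, and `SlidingWindowLimiter` and its `check` method are hypothetical names. It assumes rejected requests are not counted against the budget, which this page does not specify. It also shows where a Retry-After value comes from: the time until the oldest logged request ages out.

```python
from __future__ import annotations

import math
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log keyed by user ID (illustrative model only)."""

    def __init__(self, limit: int = 30, window_s: float = 600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._log: dict[str, deque[float]] = {}  # user ID -> request timestamps

    def check(self, user_id: str, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for a request at time `now`."""
        log = self._log.setdefault(user_id, deque())
        # A request stops counting exactly window_s seconds after it was made.
        while log and now - log[0] >= self.window_s:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True, 0
        # Assumption: rejected requests are not logged, so they never extend the wait.
        return False, math.ceil(log[0] + self.window_s - now)
```

For example, sending 30 requests at t = 0 s through t = 29 s and a 31st at t = 30 s returns `(False, 570)` for the 31st. At that point, the t = 0 request still has 570 seconds before it ages out. A different user ID has its own, untouched budget.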

429 response

When you exceed the limit, the server returns HTTP 429 Too Many Requests with a JSON body and a Retry-After header.

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 312
Content-Type: application/json

{
  "error": "rate_limited",
  "retry_after_seconds": 312
}
```

Retry-After is the number of seconds until your oldest in-window request expires. The body repeats the same value as `retry_after_seconds`. Wait at least this long before retrying. Retrying earlier will be rejected with another 429, so it adds load without making progress.
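A minimal Python sketch of reading the wait time is shown below. It checks the header first and falls back to the `retry_after_seconds` body field. The function name `retry_after_seconds` is hypothetical. HTTP in general also allows Retry-After to carry a date. Because this API documents a number of seconds, the sketch treats any non-integer header value as missing.

```python
import json
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str], body: str) -> Optional[int]:
    """Seconds to wait after a 429: header first, then the JSON body field."""
    for name, value in headers.items():
        # Header names are case-insensitive in HTTP.
        if name.lower() == "retry-after":
            try:
                return int(value)
            except ValueError:
                break  # not delta-seconds (e.g. an HTTP-date); try the body
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    if not isinstance(payload, dict):
        return None
    field = payload.get("retry_after_seconds")
    return int(field) if isinstance(field, (int, float)) else None
```

A `None` result means neither source gave a usable value, and the client should fall back to its own backoff.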

Recommended client behavior

Treat 429 as a normal control-flow signal, not an error.
Read Retry-After; if absent, fall back to exponential backoff starting at 30 seconds.
Cap concurrent in-flight requests at 2 to 3 per user.
Cache deterministic prompt outputs locally where possible.
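The backoff, concurrency, and caching recommendations can be sketched in Python as follows. The sketch assumes a synchronous client that calls a hypothetical `send(prompt)` function. The names `backoff_delay`, `limited`, and `cached_call` are illustrative choices, not part of the API. So is the 600-second backoff ceiling.

```python
from __future__ import annotations

import threading
from typing import Callable

MAX_IN_FLIGHT = 2        # low end of the 2 to 3 suggested above
BASE_BACKOFF_S = 30.0    # first fallback delay when Retry-After is missing
MAX_BACKOFF_S = 600.0    # assumption: never wait longer than one full window

_in_flight = threading.BoundedSemaphore(MAX_IN_FLIGHT)
_cache: dict[str, str] = {}


def backoff_delay(attempt: int) -> float:
    """Exponential backoff: 30 s, 60 s, 120 s, ... capped at MAX_BACKOFF_S."""
    return min(MAX_BACKOFF_S, BASE_BACKOFF_S * 2 ** attempt)


def limited(send: Callable[[str], str], prompt: str) -> str:
    """Call `send` while holding one of MAX_IN_FLIGHT concurrency slots."""
    with _in_flight:
        return send(prompt)


def cached_call(send: Callable[[str], str], prompt: str) -> str:
    """Serve repeated deterministic prompts from memory instead of the API."""
    # Two threads racing on the same new prompt may both call the API; harmless here.
    if prompt not in _cache:
        _cache[prompt] = limited(send, prompt)
    return _cache[prompt]
```

The semaphore caps concurrency only within one process. The budget is shared by every client on the account, so separate processes or machines would need their own coordination. Cache only when the output is genuinely deterministic for a given prompt.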

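Minimal retry loop (Python)

```python
import time

MAX_ATTEMPTS = 3  # assumed cap; tune for your workload
for attempt in range(MAX_ATTEMPTS):
    response = call_api()  # your existing request function
    if response.status != 429 or attempt == MAX_ATTEMPTS - 1:
        break
    # Prefer Retry-After; otherwise back off exponentially from 30 s.
    time.sleep(int(response.headers.get("Retry-After", 30 * 2 ** attempt)))
```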

Infrastructure note

Rate state is currently held in memory on each edge instance. In practice, requests from a given region usually reach the same instance. In rare cases, though, your effective budget may be slightly higher than 30 requests per 10 minutes if your requests fan out across regions. Do not rely on this for capacity planning; treat 30 requests per 10 minutes as the contract.
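The effect of per-instance state can be shown with a toy model in which each edge instance keeps its own tally for one user. Instance names and the `accepted_requests` helper are hypothetical. The sketch assumes all requests land inside a single 10-minute window.

```python
from __future__ import annotations

from collections import Counter

LIMIT = 30  # documented per-user limit per 10-minute window


def accepted_requests(served_by: list[str]) -> int:
    """Count one user's accepted requests when each instance tallies separately.

    served_by[i] names the (hypothetical) edge instance that handled request i;
    all requests are assumed to fall within a single 10-minute window.
    """
    per_instance: Counter[str] = Counter()
    accepted = 0
    for instance in served_by:
        if per_instance[instance] < LIMIT:  # each instance only sees its own count
            per_instance[instance] += 1
            accepted += 1
    return accepted
```

When all 40 requests reach one instance, 30 are accepted. When they alternate between two instances, all 40 pass because neither instance sees more than 20. This extra headroom is a side effect of where the state lives, not a limit to plan around.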

Need a higher limit?

If your integration legitimately needs a higher cap (research workloads, batch backfills, multi-account operations), get in touch through the in-app feature request form. Include a description of the workload and an estimate of peak requests per minute. Custom limits are available on the Lifetime plan and via enterprise arrangements.