Rate limiting allows you to enhance system security and protect API servers by limiting the
number of API requests that can be made within a given period of time. A request can be as simple
as a request for the homepage of a website or a request on a log-in form.
You develop rate limiting configurations that specify Request Quotas and Spike Arrest limits.
Request Quota and Spike Arrest are counters that have different lifetime windows. The Request
Quota time window is typically much longer than the Spike Arrest time window. Spike Arrest
prevents a shorter-term “spike” in the request rate, so that the requests allowed in the longer
window are not all used up immediately.
For example, suppose you have a Request Quota of 1000 requests per minute and a Spike Arrest of
50 requests per 10 seconds. If 50 requests are received in the first 5 seconds, then for the next
5 seconds all requests are sent to the fallback branch, but they are still counted against the
Request Quota. The 1-minute window continues.
If clients send 100 requests in 10 seconds, the first 50 requests are passed and the next 50
requests are sent to the fallback branch (but still count toward the quota). After 10 seconds,
the available quota has been reduced by 100, to 900.
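
The arithmetic in this example can be checked with a small simulation. The following Python
sketch is illustrative only and is not the product's implementation: it assumes both counters
use simple fixed windows that start at time 0, and the names FixedWindowCounter, RateLimiter,
handle, and quota_remaining are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowCounter:
    """Counts requests inside a fixed window (an assumed windowing model)."""
    limit: int
    window_s: float
    window_start: float = 0.0
    count: int = 0

    def roll(self, now: float) -> None:
        # Start a fresh window once the current one has expired.
        if now - self.window_start >= self.window_s:
            elapsed = int((now - self.window_start) // self.window_s)
            self.window_start += elapsed * self.window_s
            self.count = 0


class RateLimiter:
    """Hypothetical limiter that combines a Request Quota and a Spike Arrest."""

    def __init__(self, quota: FixedWindowCounter, spike: FixedWindowCounter):
        self.quota = quota
        self.spike = spike

    def handle(self, now: float) -> str:
        self.quota.roll(now)
        self.spike.roll(now)
        # Every request counts toward both counters, including requests
        # that end up on the fallback branch.
        self.quota.count += 1
        self.spike.count += 1
        if self.spike.count > self.spike.limit or self.quota.count > self.quota.limit:
            return "fallback"
        return "allow"

    def quota_remaining(self) -> int:
        return max(0, self.quota.limit - self.quota.count)


# 1000 requests per minute, 50 requests per 10 seconds.
limiter = RateLimiter(FixedWindowCounter(1000, 60.0), FixedWindowCounter(50, 10.0))

# 100 requests spread evenly over the first 10 seconds (t = 0.0 .. 9.9).
results = [limiter.handle(i / 10) for i in range(100)]

print(results.count("allow"))      # 50
print(results.count("fallback"))   # 50
print(limiter.quota_remaining())   # 900
```

In this model, a request arriving at t = 10 seconds falls into a new Spike Arrest window and is
allowed again, while the 1-minute quota window keeps counting.
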
Companies that provide APIs often charge for different levels of access to the API. By
classifying API requests and using rate limiting, you can direct different classes of users to
different quotas. You can use multiple Rate Limiting agents in a per-request policy to impose
additional restrictions, as needed.
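
As a rough illustration of directing different classes of users to different quotas, the
following Python sketch classifies each request by API key and applies a per-class quota. The
tier names, limits, API keys, and the classify and TieredQuota names are all assumptions made
for this example; they are not part of the product configuration.

```python
# Hypothetical tiers: (requests allowed, window length in seconds).
TIER_QUOTAS = {"free": (100, 60.0), "premium": (1000, 60.0)}

# Hypothetical mapping from API key to tier; unknown keys fall back to "free".
API_KEY_TIERS = {"key-abc": "premium"}


def classify(api_key: str) -> str:
    """Return the tier for an API key."""
    return API_KEY_TIERS.get(api_key, "free")


class TieredQuota:
    """Applies a separate fixed-window quota per API key, sized by tier."""

    def __init__(self, tier_quotas):
        self.tier_quotas = tier_quotas
        self.windows = {}  # api_key -> (window_start, count)

    def handle(self, api_key: str, now: float) -> str:
        limit, window_s = self.tier_quotas[classify(api_key)]
        start, count = self.windows.get(api_key, (now, 0))
        if now - start >= window_s:
            # The previous window expired, so start counting again.
            start, count = now, 0
        count += 1
        self.windows[api_key] = (start, count)
        return "allow" if count <= limit else "fallback"


quotas = TieredQuota(TIER_QUOTAS)
free_results = [quotas.handle("key-xyz", 0.0) for _ in range(150)]
premium_results = [quotas.handle("key-abc", 0.0) for _ in range(150)]

print(free_results.count("allow"))     # 100
print(premium_results.count("allow"))  # 150
```
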