
26. Rate Limiting

About this chapter

In this chapter, we'll introduce Rate Limiting to the API - a critical feature of any public-facing API (and arguably any API). Rate limiting controls the number of requests that can be made to an API over a period of time, using one of several algorithms. It ensures fair use of the API's resources, avoids "noisy neighbor" impacts, and avoids rewarding poor integration practices.

rate limiting in production

As called out in Chapter 25 - Compression we are aiming to build a production-quality API in this book. To that end, I would not implement Rate Limiting using .NET Middleware as we're going to do in this chapter.

Instead, I'd leverage one of the many solutions that provide distributed rate limiting at the infrastructure level, such as API Gateways (AWS API Gateway, Azure API Management), reverse proxies (NGINX Plus), or cloud-based services (Cloudflare). These solutions offer better scalability, consistency across multiple API instances, and centralized management.

The solution we'll implement here is certainly better than nothing, and illustrates the key concepts of Rate Limiting theory well - which is sufficient for our purposes.


Learning outcomes:

  • Understand what rate limiting is and why it's essential for API protection
  • Learn about common rate limiting algorithms and their trade-offs
  • Implement rate limiting using .NET's built-in middleware
  • Configure Fixed Window rate limiting with IP-based partitioning
  • Return proper HTTP 429 responses with Retry-After headers
  • Understand the limitations of middleware-based rate limiting for production scenarios
  • Recognize when to use infrastructure-level rate limiting solutions

Architecture Checkpoint

While we are making code changes in this chapter, they do not directly relate to the solution architecture. The code changes all relate to the request pipeline configured in Program.cs.


Companion Code
  • The code for this section can be found here on GitHub
  • The complete finished code can be found here on GitHub

Feature branch

Ensure that main is current, then create a feature branch called: chapter_26_rate_limiting, and check it out:

git branch chapter_26_rate_limiting
git checkout chapter_26_rate_limiting
tip

If you can't remember the full workflow, refer back to Chapter 5.

Rate Limiting

Rate limiting is a technique used to control the rate at which clients can make requests to an API. At its core, it's about enforcing constraints on resource consumption to maintain system stability and fairness.

Why Rate Limiting?

APIs are shared resources that need protection from both malicious actors and well-intentioned but poorly designed clients. Rate limiting serves several critical purposes:

  • Prevent Resource Exhaustion: Uncontrolled request volumes can overwhelm server resources (CPU, memory, database connections), degrading performance for all users
  • Mitigate DDoS Attacks: By limiting requests per client, you reduce the impact of distributed denial-of-service attacks
  • Ensure Fair Usage: Prevents a single client from monopolizing API resources at the expense of others (the "noisy neighbor" problem)
  • Cost Control: For APIs backed by metered infrastructure, limiting requests helps control operational costs
  • Enforce SLA Tiers: Different clients can have different rate limits based on subscription levels or service agreements

Common Rate Limiting Algorithms

Several algorithms exist for implementing rate limiting, each with different characteristics and trade-offs:

Fixed Window: Allows a fixed number of requests within a time window (e.g., 100 requests per minute). Simple to implement but can allow bursts at window boundaries.

Sliding Window: Similar to fixed window but uses a rolling time window, providing smoother rate limiting without boundary burst issues.

Token Bucket: Tokens are added to a bucket at a constant rate up to a maximum. Each request consumes a token. Allows for controlled bursts while maintaining average rate limits.

Leaky Bucket: Requests enter a queue (bucket) and are processed at a constant rate. Excess requests overflow and are rejected. Smooths out bursts but can delay requests.

Concurrency Limit: Instead of limiting request rate, this algorithm limits the number of concurrent requests, useful for long-running operations.

For our implementation, we'll use .NET's built-in rate limiting middleware which supports several of these algorithms.
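As a quick illustration of that support, each algorithm (there is no built-in Leaky Bucket limiter) can be registered as a named policy via extension methods in Microsoft.AspNetCore.RateLimiting. This is a sketch only - the policy names and limits below are placeholders, not something our API will use:

```csharp
// Illustrative sketch: named policies for the built-in algorithms.
// Assumes: using Microsoft.AspNetCore.RateLimiting;
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixed", opt =>
    {
        opt.PermitLimit = 100;                // 100 requests...
        opt.Window = TimeSpan.FromMinutes(1); // ...per minute
    });

    options.AddSlidingWindowLimiter("sliding", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.SegmentsPerWindow = 6;            // six 10-second segments
    });

    options.AddTokenBucketLimiter("token", opt =>
    {
        opt.TokenLimit = 100;                               // bucket capacity
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10); // refill interval
        opt.TokensPerPeriod = 20;                           // tokens added per interval
    });

    options.AddConcurrencyLimiter("concurrency", opt =>
    {
        opt.PermitLimit = 10; // max simultaneous requests, regardless of rate
    });
});
```

We won't use named policies in this chapter; instead we'll apply a single global limiter, as shown next.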

Implement

Program.cs

Open Program.cs and add the following using statement:

using System.Threading.RateLimiting;

Register the rate limiting service:

// .
// .
// .
// Existing code

builder.Services.AddCors(opt => {
    opt.AddPolicy("JavaScriptClient",
        policyBuilder => {
            policyBuilder.WithOrigins(allowedOrigins)
                .AllowAnyHeader()
                .AllowAnyMethod()
                .AllowCredentials();
        });
});

builder.Services.AddRateLimiter(options =>
{
    // Default policy: fixed window, 5 requests per minute per IP
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
    {
        var ip = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: ip,
            factory: partition => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromMinutes(1),
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0 // Don't queue requests
            });
    });

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.HttpContext.Response.Headers["Retry-After"] = "60"; // Retry after 60 seconds
        context.HttpContext.Response.ContentType = "application/json";
        await context.HttpContext.Response.WriteAsync(
            "{\"error\":\"Too many requests. Please try again later.\"}",
            cancellationToken);
    };
});

var app = builder.Build();

// Existing code
// .
// .
// .

This code:

  • Registers rate limiting services with AddRateLimiter()
  • Creates a GlobalLimiter that applies to all incoming requests
  • Partitions rate limits by client IP address (each IP gets its own limit counter)
  • Implements a Fixed Window algorithm allowing 5 requests per minute per IP
  • Sets QueueLimit to 0, meaning requests exceeding the limit are rejected immediately rather than queued
  • Defines an OnRejected handler that:
    • Returns an HTTP 429 (Too Many Requests) status code
    • Supplies a Retry-After header with a value of 60 (retry after 60 seconds)
    • Supplies a JSON error message
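As an aside: because we use a GlobalLimiter, every endpoint is covered automatically. If you later need per-endpoint control, the middleware also supports attaching named policies to, or exempting, individual endpoints. A brief sketch - the "api" policy name and the endpoints shown are hypothetical, not part of our API:

```csharp
// Hypothetical: attach a named policy to one endpoint...
app.MapGet("/api/platforms", () => Results.Ok())
    .RequireRateLimiting("api");

// ...or exempt an endpoint (e.g. a health check) from rate limiting
app.MapGet("/health", () => Results.Ok())
    .DisableRateLimiting();
```

Controller-based APIs can do the same with the [EnableRateLimiting("api")] and [DisableRateLimiting] attributes.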
how low can you go

I have set the number of requests to 5 in the above code so that we can easily test the rate limiting functionality. This is a very low threshold that would be far too restrictive for most applications - the right value depends on the use-case for your API, but you would almost certainly increase it in more production-focused scenarios.

Then register in the request pipeline:

// .
// .
// .
// Existing code

var app = builder.Build();

app.UseMiddleware<GlobalExceptionHandlerMiddleware>();

app.UseSerilogRequestLogging();

app.UseRateLimiter();

if (app.Environment.IsDevelopment())
{
    app.MapOpenApi();
    app.UseHangfireDashboard();
}

// Existing code
// .
// .
// .

Rate limiting should come early in the pipeline, before any significant processing. In this case I've placed only Global Exception Handling and Logging ahead of it. You could be even more aggressive (i.e. move rate limiting earlier still), but I feel the current placement strikes a good balance.

Exercising

Save everything, and run up the API. As we have added rate limiting globally, every endpoint is subject to it, so we can test with any endpoint, e.g.:

### Get all platforms
GET {{baseUrl}}/api/platforms

Make 6 requests within a 1-minute window and you should hit the rate limit:

HTTP/1.1 429 Too Many Requests
Connection: close
Content-Type: application/json
Date: Thu, 05 Mar 2026 13:21:39 GMT
Server: Kestrel
Retry-After: 60
Transfer-Encoding: chunked

{
"error": "Too many requests. Please try again later."
}

You can see we get:

  • HTTP 429 response
  • Retry-After header with a value of 60

The Fixed Window algorithm's 60-second window begins with the first request it counts (which creates the limiter for your IP), not with the first rejected request. Once the limit is hit, subsequent requests are rejected until the window elapses and the permits are replenished.
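You can observe this window behavior in isolation by driving the limiter class directly - a small console-style sketch, separate from our API code:

```csharp
using System;
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 5,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0 // reject rather than queue, as in our middleware config
});

// Requests 1-5 acquire a permit; request 6 is rejected until the window resets
for (var i = 1; i <= 6; i++)
{
    using var lease = limiter.AttemptAcquire();
    Console.WriteLine($"Request {i}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
```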

E.g. if you try a 7th request while still inside the same window, you'll get:

HTTP/1.1 429 Too Many Requests
Connection: close
Content-Type: application/json
Date: Thu, 05 Mar 2026 13:21:51 GMT
Server: Kestrel
Retry-After: 60
Transfer-Encoding: chunked

{
"error": "Too many requests. Please try again later."
}

One limitation of our implementation is that we have hardcoded the Retry-After value to 60, meaning that every rejected request within the fixed window reports 60 seconds, when in fact the remaining time will usually be lower.

We could add some rather verbose timing logic to remedy this, but it's a bit convoluted to the point where I don't think it's worth it.
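That said, if you do want a more accurate value, the rejected lease can carry the time remaining in the current window as metadata. A hedged sketch of an alternative OnRejected handler - I haven't wired this into our API, and it's worth verifying the metadata behavior against the System.Threading.RateLimiting docs:

```csharp
options.OnRejected = async (context, cancellationToken) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    // The fixed window limiter can attach the remaining window time
    // to the rejected lease; fall back to the full window if absent.
    var retryAfter = "60";
    if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var delay))
    {
        retryAfter = ((int)Math.Ceiling(delay.TotalSeconds)).ToString();
    }

    context.HttpContext.Response.Headers["Retry-After"] = retryAfter;
    context.HttpContext.Response.ContentType = "application/json";
    await context.HttpContext.Response.WriteAsync(
        "{\"error\":\"Too many requests. Please try again later.\"}",
        cancellationToken);
};
```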

Revert limit

As a reminder, we set the rate limit very low to 5 requests to facilitate simple testing. Having completed our simple exercises, you may choose to update this value to something a little more generous. E.g. 100 requests a minute is still very conservative.

Version Control

With the code complete, it's time to commit our code. A summary of those steps can be found below; for a more detailed overview refer to Chapter 5.

  • Save all files
  • git add .
  • git commit -m "add rate limiting"
  • git push (will fail - copy suggestion)
  • git push --set-upstream origin chapter_26_rate_limiting
  • Move to GitHub and complete the PR process through to merging
  • Back at a command prompt: git checkout main
  • git pull

Conclusion

In this chapter, we implemented rate limiting using .NET's built-in middleware with a Fixed Window algorithm partitioned by client IP address. While functional and useful for learning the core concepts, this implementation has notable limitations—particularly the hardcoded Retry-After header and lack of distributed state across multiple API instances.

For production APIs, especially those running at scale or across multiple servers, infrastructure-level rate limiting solutions (API Gateways, CDNs, or dedicated rate limiting services) offer superior performance, accurate retry information, and centralized management. That said, the middleware approach we've implemented here provides basic protection and is significantly better than no rate limiting at all.

Rate limiting is a critical defense mechanism for any API. Even an imperfect implementation protects against resource exhaustion and abuse, giving you time to detect issues and respond appropriately. As your API matures and traffic grows, migrating to a more robust solution should be a priority.