Best Practices for API Development

Explore top LinkedIn content from expert professionals.

  • Puneet Patwari

    Principal Software Engineer @Atlassian | Ex-Sr. Engineer @Microsoft | Sharing insights on SW Engineering, Career Growth & Interview Preparation

    69,298 followers

    A candidate interviewing for a Senior Engineer role @ Meta was asked to design a rate limiter. Another candidate in Google's L5 loop got hit with the same question. I've been asked it three times across different companies.

    Rate-limiting questions look simple until you add one layer of complexity:
    – Add distributed rate limiting? Now you're dealing with race conditions and clock skew.
    – Add multiple rate limit tiers? Welcome to priority queues and quota management.
    – Add per-user, per-IP, and per-API-key limits? Your Redis bill just exploded.

    Here's my personal checklist of 15 things you must get right when building rate limiters:

    1. Always do rate limiting on the server, not the client → Client-side limits are easily bypassed, so always enforce limits on your backend.
    2. Choose the right placement → For most web APIs, place the rate limiter at the API gateway or load balancer (the "edge") for global protection and minimal added latency.
    3. Identify users correctly → Use a combination of user ID, API key, and IP address. Apply stricter limits to anonymous/IP-only clients, higher limits to authenticated or premium users.
    4. Support multiple rule types → Allow per-user, per-IP, and per-endpoint limits. Make rules configurable, not hardcoded.
    5. Pick an algorithm that fits your needs → Know the pros/cons:
       – Fixed Window: easy, but suffers from burst issues.
       – Sliding Log: accurate, but memory-heavy.
       – Sliding Window Counter: good balance, small memory footprint.
       – Token Bucket: handles bursts and steady rates, an industry standard for distributed systems.
    6. Store rate limit state in a fast, shared store → Use an in-memory cache like Redis or Memcached. Every gateway instance must read and write to this store so limits are enforced globally.
    7. Make every check atomic → Use atomic operations (e.g., Redis Lua scripts or MULTI/EXEC) to avoid race conditions and double-accepting requests.
    8. Shard your cache for scale → Don't rely on a single Redis instance. Use Redis Cluster or consistent hashing to scale horizontally and handle millions of users/requests.
    9. Build in replication and failover → Each cache node should have replicas. If a primary fails, a replica takes over. This keeps the system available and fault-tolerant.
    10. Decide your "failure mode" → Fail-open (let all requests through if the cache is down) risks backend overload; fail-closed (block all requests) means user-facing downtime. For critical APIs, prefer fail-closed to protect the backend.
    11. Return proper status codes and headers → Use HTTP 429 for "Too Many Requests" and include headers like X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After so clients know when to back off.
    12. Use connection pooling for cache access → Avoid reconnecting to Redis on every check. Pool connections to minimize latency.

    Continued in Comments...
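Items 6 and 7 of the checklist come together as a single atomic check against the shared store. A minimal sketch of that idea, assuming StackExchange.Redis; the key scheme and limits are illustrative, not from the post:

```csharp
// Minimal sketch: an atomic fixed-window check in Redis via a Lua script,
// so concurrent gateway instances cannot double-accept the same quota.
// Assumes StackExchange.Redis; key naming and limits are illustrative.
using System;
using StackExchange.Redis;

public static class AtomicRateLimiter
{
    // INCR the counter and set its expiry in one script invocation:
    // Redis runs the script atomically, removing the read-modify-write race.
    private const string Script = @"
        local count = redis.call('INCR', KEYS[1])
        if count == 1 then
            redis.call('PEXPIRE', KEYS[1], ARGV[1])
        end
        return count";

    public static bool IsAllowed(IDatabase redis, string userId, int limit, TimeSpan window)
    {
        var key = $"rl:{userId}";
        var count = (long)redis.ScriptEvaluate(
            Script,
            new RedisKey[] { key },
            new RedisValue[] { (long)window.TotalMilliseconds });
        return count <= limit;
    }
}
```

Item 12's pooling concern is largely handled by reusing one ConnectionMultiplexer, which StackExchange.Redis multiplexes all checks over, rather than reconnecting per request.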

  • Milan Jovanović

    Practical .NET and Software Architecture Tips | Microsoft MVP

    277,337 followers

    In-memory rate limiting works perfectly... right until you scale. If you're running a single API instance, storing rate limits in memory is easy and fast. But the second you scale out to multiple instances, that approach breaks down: your services have no centralized way to agree on the current counts, leaving your API vulnerable. That's when you need a distributed rate limiter.

    In my new video, we tackle this head-on. I'll show you exactly how to build a production-ready, distributed rate limiter in C# using Redis. We walk through the code for:
    🔹 Why native .NET in-memory limits fall short at scale
    🔹 Building a Fixed Window algorithm using Redis counters
    🔹 Implementing a more advanced Sliding Window algorithm with Redis sorted sets
    🔹 Hooking it all up with custom middleware in .NET 10

    If you want to protect your APIs from abuse and handle high traffic correctly, check out the full step-by-step breakdown here: https://lnkd.in/gWk4SaSM
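Not the code from the video, but a minimal sketch of the sorted-set idea it describes, assuming StackExchange.Redis. Note that the check and the insert below are not atomic; a Lua script or MULTI/EXEC would close that gap:

```csharp
// Minimal sketch of a sliding-window-log limiter on a Redis sorted set:
// members are request timestamps, old entries are trimmed, and the remaining
// cardinality is the request count inside the window.
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class SlidingWindowLimiter
{
    private readonly IDatabase _redis;

    public SlidingWindowLimiter(IDatabase redis) => _redis = redis;

    public async Task<bool> IsAllowedAsync(string clientId, int limit, TimeSpan window)
    {
        var key = $"rl:sliding:{clientId}";
        var now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
        var windowStart = now - (long)window.TotalMilliseconds;

        // Drop timestamps that have slid out of the window.
        await _redis.SortedSetRemoveRangeByScoreAsync(key, 0, windowStart);

        if (await _redis.SortedSetLengthAsync(key) >= limit)
            return false;

        // Record this request; the Guid keeps simultaneous requests distinct.
        await _redis.SortedSetAddAsync(key, $"{now}:{Guid.NewGuid():N}", now);
        await _redis.KeyExpireAsync(key, window);
        return true;
    }
}
```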

  • Priyanka Vergadia

    #1 Visual Storyteller in Tech | VP Level Product & GTM | TED Speaker | Enterprise AI Adoption at Scale

    117,536 followers

    🛑 "429 Too Many Requests" isn't just an error code; it's a survival strategy for your distributed systems. Stop treating Rate Limiting as a simple counter. To prevent crashes, you need the right algorithm. This visual explains the patterns you need to know. 𝐇𝐨𝐰 𝐰𝐞 𝐜𝐨𝐮𝐧𝐭: 1️⃣ Token Bucket: User gets a "bucket" of tokens that refills at a constant rate. Great for bursty traffic. If a user has been idle, they accumulate tokens and can make a sudden burst of requests without being throttled immediately. Use Case: Social media feeds or messaging apps. 2️⃣ Leaky Bucket: Requests enter a queue and are processed at a constant, fixed rate. Acts as a traffic shaper. It smooths out spikes, protecting your database from write-heavy shockwaves. Use Case: Throttling network packets or writing to legacy systems. 3️⃣ Fixed Window: A simple counter resets at specific time boundaries (e.g., the top of the minute). Easiest to implement but suffers from the "boundary double-hit" issue (e.g., 100 requests at 12:00:59 and 100 more at 12:01:01). Use Case: Basic internal tools where precision isn't critical. 4️⃣ Sliding Window Log: Tracks the timestamp of every request. Solves the boundary issue completely. It’s highly accurate but expensive on memory (O(N) space complexity) because you store logs, not just a count. Use Case: High-precision, low-volume APIs. 5️⃣ Sliding Window Counter: The hybrid approach. Approximates the rate by weighing the count of the previous window and the current window. Low memory footprint, high accuracy. Use Case: Large-scale systems handling millions of RPS. 𝐖𝐡𝐞𝐫𝐞 𝐰𝐞 𝐞𝐧𝐟𝐨𝐫𝐜𝐞 6️⃣ Distributed Rate Limiting: Essential for microservices. You cannot rely on local memory; you need a centralized store (like Redis with Lua scripts) to maintain a global count across the cluster. 7️⃣ Fixed Window with Quota: Often distinct from technical throttling. This is business logic—hard caps over long periods (months/years). Use Case: Tiered billing plans (e.g., "Free Tier: 10k calls/month"). 8️⃣ Adaptive Rate Limiting: The "smart" limiter. It doesn't use static numbers but monitors system health (CPU, memory, latency). If the system struggles, it tightens the limits automatically. Use Case: Auto-scaling systems and disaster recovery. 𝐖𝐡𝐨 𝐰𝐞 𝐥𝐢𝐦𝐢𝐭 9️⃣ IP-Based Rate Limiting: The first line of defense. Limits based on the source IP to prevent botnets or DDoS attacks. Use Case: Public-facing unauthenticated APIs. 🔟 User/Tenant-Based Rate Limiting: Limits based on API Key or User ID. Ensures one heavy user doesn't degrade performance for others ("Noisy Neighbor" problem). Use Case: SaaS platforms and multi-tenant architectures. 💡 For most production systems, Sliding Window Counter combined with Distributed Limiting is the gold standard. It offers the best balance of memory efficiency and user fairness. #SystemDesign #SoftwareArchitecture #API #Microservices #DevOps #BackendEngineering #RateLimiting #CloudComputing

  • Rocky Bhatia

    400K+ Engineers | Architect @ Adobe | GenAI & Systems at Scale

    215,406 followers

    Your API works perfectly - until someone hammers it with 10,000 requests in a second. Rate limiting is what stands between a stable system and a full outage. But not all rate limiting algorithms are equal 👇

    1. Fixed Window Counter
    Counts requests in a fixed time window and resets after each interval. Simple to implement but burst-prone at window boundaries.

    2. Sliding Window Log
    Stores each request timestamp and removes expired entries. Accurate limiting but memory-heavy at scale.

    3. Sliding Window Counter
    Combines current and previous window counts to smooth traffic. Lower memory usage, better burst protection than fixed windows.

    4. Token Bucket
    Adds tokens at a fixed rate; requests consume tokens. Supports controlled bursts while maintaining average rate limits. Most widely used.

    5. Leaky Bucket
    Processes requests at a fixed outflow rate. Smooths bursts by queuing or dropping excess traffic. Predictable but less flexible.

    6. Concurrency Limiter
    Limits how many requests run simultaneously - not per time window. Essential for protecting downstream services from overload.

    How to choose:
    → Need simplicity? Fixed Window
    → Need accuracy? Sliding Window Log
    → Need balance? Sliding Window Counter
    → Need burst tolerance? Token Bucket
    → Need smooth throughput? Leaky Bucket
    → Protecting a slow backend? Concurrency Limiter

    Most production systems combine 2–3 of these at different layers - gateway, service, and database. One algorithm rarely covers all your attack surfaces. Which one does your system rely on? 👇
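The concurrency limiter in item 6 is the odd one out: it caps in-flight work rather than a rate. A small in-process sketch; the class and its fail-fast behavior are illustrative, not from the post:

```csharp
// Hypothetical in-process concurrency limiter: caps how many requests are in
// flight at once (not per time window), e.g. to shield a slow downstream service.
using System;
using System.Threading;
using System.Threading.Tasks;

public class ConcurrencyLimiter
{
    private readonly SemaphoreSlim _slots;

    public ConcurrencyLimiter(int maxConcurrent) =>
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    public async Task<T> RunAsync<T>(Func<Task<T>> work)
    {
        // Fail fast instead of queueing, so the caller can return 429/503
        // with a Retry-After hint rather than piling up waiting requests.
        if (!await _slots.WaitAsync(TimeSpan.Zero))
            throw new InvalidOperationException("Concurrency limit reached");

        try { return await work(); }
        finally { _slots.Release(); }
    }
}
```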

  • Sameer Bhardwaj

    Co-founder @Layrs | Ex Google

    50,436 followers

    Imagine you're in a system design interview at Google for an L5 role, and the interviewer asks: "If 10M users hit your API at the same time and your rate limiter allows 1000 req/sec, what happens to the other 9.99M?"

    This is a classic overload-control + retry-amplification problem.

    Btw, if you're preparing for system design interviews, check out our AI Tutor: https://lnkd.in/gcWfR7jW You can:
    - voice chat about your questions in real time
    - get real-time feedback and improve with these sessions
    - learn concepts and practice HLD questions even if you're a complete beginner

    Here is how I would break it down.

    [1] Clarify what we actually need to build
    This is not just "return 429 when over the limit." It is:
    - protect the backend from overload
    - keep latency stable for the requests we do accept
    - avoid retry storms from rejected clients
    - give clients a fair chance to recover
    - degrade gracefully instead of turning 10M requests into 20M
    So the core problem is not only rate limiting. It is admission control plus controlled recovery behavior.

    [2] The other 9.99M cannot all get immediate retries
    If all rejected requests get a 429 and retry immediately, the limiter becomes part of the problem. A better model is:
    - accept up to the allowed rate
    - reject excess traffic quickly
    - return backoff hints like `Retry-After`
    - force clients and SDKs to use exponential backoff + jitter
    - optionally queue a small bounded overflow, only if the business case justifies it
    The key idea is simple: do not turn rejection into amplification.

    [3] High-level flow
    A reasonable design would be:
    - clients hit edge load balancers / API gateway
    - each request first passes through a distributed rate limiter
    - accepted requests move to the backend
    - rejected requests get a fast 429 or a graceful degradation response
    - clients retry later using backoff, not instantly
    - an observability layer tracks rejection rate, retry rate, queue depth, and user impact
    The limiter is only one part. The client behavior matters just as much.

    [4] What should happen to the rejected traffic?
    This depends on the API. For example:
    - interactive read APIs: reject fast, retry later
    - write APIs: maybe accept into a bounded queue if loss is costly
    - idempotent operations: safer to retry
    - non-critical traffic: drop or degrade early
    - premium / internal traffic: separate priority buckets
    So the answer is not "all 9.99M get blocked." The answer is "different classes of traffic may be handled differently."

    [5] The tradeoffs interviewers care about
    This is where the answer gets interesting:
    - an immediate 429 is cheap, but dangerous if clients retry badly
    - queues smooth bursts, but can increase latency and memory pressure
    - a token bucket handles bursts better than a strict per-second counter
    - fairness matters, so one tenant or region does not starve everyone else
    - backoff with jitter is critical to avoid synchronized retries
    - if the limiter itself fails, fail-open vs fail-closed depends on the API
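The backoff behavior described in [2] is worth seeing concretely. A hedged client-side sketch (the helper name and attempt counts are illustrative) that honors Retry-After and adds full jitter so rejected clients do not retry in lockstep:

```csharp
// Illustrative retry helper: exponential backoff with full jitter, preferring
// the server's Retry-After hint when a 429 is returned.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class BackoffClient
{
    private static readonly Random Jitter = new Random();

    public static async Task<HttpResponseMessage> GetWithBackoffAsync(
        HttpClient http, string url, int maxAttempts = 5)
    {
        for (var attempt = 0; ; attempt++)
        {
            var response = await http.GetAsync(url);
            if (response.StatusCode != (HttpStatusCode)429 || attempt >= maxAttempts - 1)
                return response;

            // Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s...).
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));

            // Full jitter spreads synchronized clients across the whole backoff window.
            delay = TimeSpan.FromMilliseconds(Jitter.NextDouble() * delay.TotalMilliseconds);
            await Task.Delay(delay);
        }
    }
}
```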

  • SERKUT YILDIRIM

    Microsoft MVP | .NET, Java, Software Development

    59,983 followers

    💡 𝗖#/.𝗡𝗘𝗧 𝗖𝗼𝗿𝗲 𝗧𝗶𝗽 - 𝗥𝗮𝘁𝗲 𝗟𝗶𝗺𝗶𝘁𝗶𝗻𝗴 🔥

    💎 𝗥𝗮𝘁𝗲 𝗟𝗶𝗺𝗶𝘁𝗶𝗻𝗴 𝗶𝗻 .𝗡𝗘𝗧 𝗶𝘀 𝗕𝘂𝗶𝗹𝘁-𝗜𝗻!
    Did you know .NET includes native rate limiting middleware? No third-party libraries needed to protect your APIs.

    ✅ 𝗪𝗵𝘆 𝗜𝘁 𝗠𝗮𝘁𝘁𝗲𝗿𝘀
    Rate limiting protects your APIs from abuse, DDoS attacks, and excessive resource consumption. It ensures fair usage across clients and prevents brute-force attempts on sensitive endpoints like authentication.

    ⚡ 𝗙𝗼𝘂𝗿 𝗣𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝘀 𝗢𝘂𝘁-𝗼𝗳-𝘁𝗵𝗲-𝗕𝗼𝘅
    ◾ 𝗙𝗶𝘅𝗲𝗱 𝗪𝗶𝗻𝗱𝗼𝘄: Simple time-based limits (e.g., 100 requests per minute).
    ◾ 𝗦𝗹𝗶𝗱𝗶𝗻𝗴 𝗪𝗶𝗻𝗱𝗼𝘄: Smoother distribution, prevents burst attacks at window boundaries.
    ◾ 𝗧𝗼𝗸𝗲𝗻 𝗕𝘂𝗰𝗸𝗲𝘁: Allows controlled bursts while maintaining an average rate.
    ◾ 𝗖𝗼𝗻𝗰𝘂𝗿𝗿𝗲𝗻𝗰𝘆: Limits simultaneous connections (perfect for database-heavy operations).

    🔥 𝗞𝗲𝘆 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀
    ◾ Per-endpoint policies with different rules per route.
    ◾ IP-based or user-based partitioning for granular control.
    ◾ Works in-memory by default, easily extends to Redis for multi-server deployments.
    ◾ Seamless integration with the .NET Core middleware pipeline.

    🤔 𝗪𝗵𝗶𝗰𝗵 𝗿𝗮𝘁𝗲 𝗹𝗶𝗺𝗶𝘁𝗶𝗻𝗴 𝗮𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺 𝗱𝗼 𝘆𝗼𝘂 𝘂𝘀𝗲?

    #csharp #dotnet #programming #softwareengineering #softwaredevelopment
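A minimal sketch of wiring the built-in middleware up in ASP.NET Core 7+; the policy name, limits, and endpoint are illustrative, not a recommendation:

```csharp
// Minimal ASP.NET Core sketch of the built-in rate limiting middleware.
// Assumes a web project with implicit usings; numbers are illustrative.
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.RateLimiting;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = 429; // default rejection status is 503

    // Fixed window: at most 100 requests per minute per policy partition.
    options.AddFixedWindowLimiter("api", limiter =>
    {
        limiter.PermitLimit = 100;
        limiter.Window = TimeSpan.FromMinutes(1);
        limiter.QueueLimit = 0; // reject instead of queueing excess requests
    });
});

var app = builder.Build();

app.UseRateLimiter();

// Opt a specific endpoint into the "api" policy.
app.MapGet("/orders", () => "ok").RequireRateLimiting("api");

app.Run();
```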

  • Tannika Majumder

    Senior Software Engineer at Microsoft | Ex Postman | Ex OYO | IIIT Hyderabad

    49,260 followers

    I spent over 18 months of my life working 50-60-hour weeks at Postman, after restarting my career post-masters at IIITH. One of my biggest learnings from dealing with APIs all day: if you're scaling a production API to millions of users, you need rate limiters. But how are they applied exactly, and how do they work? Let me break it down for you with the example of Stripe.

    ● Why Do APIs Need Rate Limiting?
    – When your app gets a sudden surge of users, API traffic can spike to millions of requests a minute.
    – Sometimes it's a runaway script, a spam bot, or just honest users trying to batch-process a lot of data.
    – If one user (or bug) floods your servers, it can slow down or crash your service for everyone.
    – Rate limiting sets boundaries so no single user, bug, or partner can bring your API down.

    ● How Does Stripe Use Rate Limiters?
    Let's break down Stripe's 4-layer rate limiting and load shedding system.

    ● Request Rate Limiter
    – Limits how many API requests a user can send per second (e.g., 100 req/sec).
    – Prevents a single customer or buggy script from overloading the system.
    – Stripe uses the token bucket algorithm: every request "spends" a token, and tokens refill at a steady pace.
    – Allows for small, quick bursts in traffic (like during a flash sale) but smooths things back down.
    – If you go over, you get an HTTP 429 ("Too Many Requests").

    ● Concurrent Requests Limiter
    – Restricts how many API requests you can have in progress at one time (e.g., only 20 running at once).
    – Useful for expensive endpoints that use a lot of CPU/memory.
    – Encourages users to finish their current jobs before sending new ones.
    – Solved real issues at Stripe, where too many simultaneous requests to heavy endpoints caused slowdowns.

    ● Fleet Usage Load Shedder
    – Reserves part of Stripe's infrastructure for the most critical API requests.
    – Keeps essential operations (like charging a card) flowing, even if less-critical actions (like listing charges) have to wait.
    – If non-critical traffic uses up too much capacity, those requests are dropped (shed) to protect high-priority traffic.
    – Stripe splits infrastructure: for example, 20% reserved for critical traffic, 80% for everything else.

    ● Worker Utilization Load Shedder
    – Monitors how busy the API workers (the backend servers) are.
    – When things get overloaded, it starts dropping less important requests in order:
      + test mode traffic goes first,
      + then GETs,
      + then POSTs,
      + critical actions are always last to be dropped.
    – This protects the system during major incidents or sudden surges.
    – Shedding ramps up slowly, so the system doesn't keep toggling between overload and normal (avoids "flapping").

    Continued in comments...
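The last layer is easiest to picture as a priority ladder. A hypothetical sketch of the idea; the thresholds and class names are mine, not Stripe's:

```csharp
// Hypothetical worker-utilization load shedder: the busier the fleet, the more
// request classes get shed, and critical traffic is always dropped last.
public enum RequestClass { TestMode, Read, Write, Critical }

public static class LoadShedder
{
    public static bool ShouldShed(RequestClass cls, double workerUtilization) => cls switch
    {
        RequestClass.TestMode => workerUtilization > 0.70, // test mode traffic goes first
        RequestClass.Read     => workerUtilization > 0.80, // then GETs
        RequestClass.Write    => workerUtilization > 0.90, // then POSTs
        _                     => false                     // critical actions are never shed here
    };
}
```

In practice the thresholds would ramp up gradually and hold through a cool-down period, which is what avoids the "flapping" the post mentions.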

  • Imagine your AI agent burned through $50K in API calls overnight. How could this happen? Simple: a lack of guardrails. Yes, autonomous AI systems are incredibly powerful, but they can also be incredibly dangerous without proper boundaries. This is why "Design for Controlled Autonomy" is a core design principle in AWS's GenAI Lens Framework.

    Think about this: would you give a junior developer root access to production on day one? No, so why would you let an AI agent operate without constraints?

    Here's what controlled autonomy looks like:

    ✓ Operational Requirements
    Define EXACTLY what your AI can and cannot do. Set token limits, rate limits, and scope boundaries. No exceptions.

    ✓ Security Controls
    Implement least-privilege access. Your AI should only touch what it needs to complete its task. The same applies to the tools you give it. Nothing more.

    ✓ Failure Conditions
    Build in stopping conditions. Set thresholds for when the system should stop, alert, or fail gracefully. Assume failures WILL happen.

    ✓ Cost Boundaries
    Set hard caps on API calls, compute resources, and data processing. Monitor usage in real time, not after the damage is done.

    ✓ Safe Parameters
    Define acceptable behavior ranges. If your AI starts acting outside these bounds, it should trigger immediate intervention.

    The goal is to implement your agent safely without limiting its potential. Autonomy without control = chaos. Control without autonomy = bottleneck. Controlled autonomy = scalable innovation.

    Most AI failures in production aren't model issues. They're architecture issues. Build the guardrails before you need them. Your future self (and your leadership) will thank you.

    What's your approach to setting AI guardrails? Drop your strategies below 👇🏾

    #AgenticAI #AIEngineering #CloudArchitecture #AWS #MachineLearning #MLOps #DevOps #ArtificialIntelligence
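As one illustration of the cost-boundary idea, a tiny budget guard an agent loop could call before each model or tool invocation; the names and error strategy are hypothetical, not an AWS API:

```csharp
// Illustrative hard cost cap for an agent loop: every model or tool call is
// charged against a budget before it runs, and the loop halts once exhausted.
using System;

public class CostGuardrail
{
    private readonly decimal _maxSpendUsd;
    private decimal _spentUsd;

    public CostGuardrail(decimal maxSpendUsd) => _maxSpendUsd = maxSpendUsd;

    // Call before each model/tool invocation with its estimated cost.
    public void Charge(decimal estimatedCostUsd)
    {
        if (_spentUsd + estimatedCostUsd > _maxSpendUsd)
            throw new InvalidOperationException(
                $"Cost guardrail tripped: {_spentUsd + estimatedCostUsd:F2} USD would exceed the {_maxSpendUsd:F2} USD cap");

        _spentUsd += estimatedCostUsd;
    }
}
```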

  • Bryan Dennstedt🌱

    AI Strategy & Fractional CTO | Partner at TechCXO | “AI with Bry” Podcast | Telemedicine & HealthTech Expert, Plant-based Sustainability Investor

    7,049 followers

    Your app stopped working at 2pm. Every user getting "Too Many Requests" errors. You check logs. 429 errors everywhere.

    You hit Stripe's API rate limit. You didn't know you had one. You're making 100 API calls per second. Their limit is 25. Your app has been hammering their API for weeks. They rate limited you. Now nothing works. Customer checkouts failing. Support overwhelmed.

    Cost: $6K in lost revenue. Half a day fixing it.

    Here's what happened: Your app grew. More users. More API calls. Nobody was tracking API usage against provider limits. Nobody set up caching. Every request hit the API directly. You assumed infinite capacity. You were wrong.

    Here's what should exist:
    – Know your API limits. Every third-party service has rate limits. Document them.
    – Implement caching. Don't call the API for data you already have.
    – Implement backoff. When you get rate limited, slow down automatically.
    – Monitor API usage. Track calls per minute. Alert before you hit limits.
    – Load test with realistic API constraints. Find limits before production does.

    If you don't know your third-party API limits, you will hit them in production. Not if. When.

    28 years in technology. 17 years as CTO. Every API integration I've built included rate limit handling from day one.

    Integrating with third-party APIs and not sure about rate limits? Schedule a call at bry.net before you hit them and take down your app.
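A hedged sketch of the caching step: repeated reads are answered from a short-lived in-memory cache so they never count against the provider's limit. The client shape and the 60-second TTL are illustrative assumptions:

```csharp
// Illustrative cached wrapper around a third-party API client.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class CachedApiClient
{
    private readonly HttpClient _http;
    private readonly IMemoryCache _cache = new MemoryCache(new MemoryCacheOptions());

    public CachedApiClient(HttpClient http) => _http = http;

    public async Task<string> GetAsync(string url) =>
        await _cache.GetOrCreateAsync(url, entry =>
        {
            // Tune the TTL to how stale this data is allowed to be.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(60);
            return _http.GetStringAsync(url);
        }) ?? string.Empty;
}
```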
