Sharding, Replication and System Design Week 51.1 Cohort 3.0
Hi there! I'm Aditya, a passionate Full-Stack Developer driven by a love for turning concepts into captivating digital experiences. With a blend of creativity and technical expertise, I specialize in crafting user-friendly websites and applications that leave a lasting impression. Let's connect and bring your digital vision to life!
Stateless vs Stateful Backend
Stateless backend
Each request is independent. The server does not keep client session state between requests.
Any server instance can handle any request if it has access to shared resources (database, cache).
Stateful backend
Server keeps client-specific state across requests (session, in-memory game state, websocket connection state).
Requests from the same client often need the same server instance or a way to retrieve that state.
When to choose
Use stateless when possible (typical web APIs, microservices, REST/HTTP services).
Use stateful when you need low-latency in-memory state or long-lived bidirectional connections and state replication is impractical.
Examples
Stateless: REST API, microservices reading DB, serverless functions.
Stateful: multiplayer game servers, WebSocket chat servers maintaining connection-specific state, in-memory queued tasks per user.
Common designs to combine strengths
Keep core logic stateless; store session or user state in a shared store (Redis, DB).
Use stateful servers only where needed and front them with a load balancer and sticky sessions, or use a dedicated routing layer.
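As a rough sketch of the "stateless logic + shared store" design, the handler below keeps nothing in process memory between calls. The `InMemorySessionStore` class and the `addToCart` function are hypothetical names used only for illustration; in a real deployment the store would be Redis or a database shared by every instance.

```javascript
// Stand-in for Redis or a database: any store exposing async get/set works here.
class InMemorySessionStore {
  constructor() {
    this.sessions = new Map();
  }
  async get(id) {
    return this.sessions.get(id) ?? null;
  }
  async set(id, data) {
    this.sessions.set(id, data);
  }
}

// A stateless handler: everything it needs comes from the arguments or the shared store,
// so any server instance could run it for any request.
async function addToCart(store, sessionId, item) {
  const session = (await store.get(sessionId)) ?? { cart: [] };
  const updated = { ...session, cart: [...session.cart, item] };
  await store.set(sessionId, updated);
  return updated;
}
```

Because the session lives in the store and not in the server process, two consecutive requests from the same user can land on different instances and still see the same cart.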
Stateless vs Stateful
| Aspect | Stateless | Stateful |
|---|---|---|
| Request handling | Any instance can handle any request | Requests may need a specific instance |
| Horizontal scaling | Easy | Harder (needs sticky routing / replication) |
| Failure recovery | Simple (restart any instance) | Complex (must recover in-memory state) |
| Session storage | External (DB/Redis) | Often in-process memory |
| Use cases | REST APIs, microservices, serverless | Game servers, long-lived sockets, session-heavy apps |
| Best for | High scale, simple deployment | Low-latency, connection-oriented workloads |
Scaling Node.js With the Cluster Module
What Cluster does
Node.js Cluster runs multiple Node.js worker processes that share the same server port.
Each worker is a separate process with its own event loop and memory, which lets you use multi-core machines.
The primary (master) process distributes incoming connections to workers (Node handles the load balancing across workers).
Benefits
Simple way to utilize all CPU cores on one machine.
Auto-restart crashed workers (supervisor behavior via master).
Limitations & considerations
Works only on a single host. To scale across machines you still need a load balancer or service mesh.
Workers do not share memory — use Redis/DB for shared state.
Not a replacement for process managers (PM2, Docker orchestration). Use cluster + process manager or container orchestrator.
Sticky sessions: if your app is stateful (needs the same worker), Node’s default distribution is not sticky. You may need an external sticky LB or a custom master distribution.
Alternatives / complements
PM2 cluster mode (manages forks and restarts).
Worker threads (for CPU-bound tasks within one process).
Multiple containers / pods behind a load balancer for multi-host scaling (Kubernetes, ECS).
Example
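A minimal sketch of the Cluster module, assuming Node.js 16 or newer (where `cluster.isPrimary` is available). The `workerCount` and `start` helpers and the port number are hypothetical names chosen for this example, not part of any library.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

// Hypothetical helper: one worker per CPU core, capped by an optional limit.
function workerCount(cpuCount, limit = Infinity) {
  return Math.max(1, Math.min(cpuCount, limit));
}

// Forks workers when run in the primary process; starts an HTTP server in each worker.
function start(port = 3000) {
  if (cluster.isPrimary) {
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i++) cluster.fork();

    // Replace a worker that exits so capacity stays constant.
    cluster.on("exit", (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
      cluster.fork();
    });
  } else {
    // Every worker listens on the same port; the primary hands out connections.
    http
      .createServer((req, res) => {
        res.end(`handled by worker ${process.pid}\n`);
      })
      .listen(port);
  }
}
```

Calling `start()` at the bottom of the file and running it with `node` should fork one worker per core. Repeated requests to the port are then answered by different process IDs.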
Scaling Stateless Backend: Load Balancer Patterns
Why load balancers
Distribute incoming requests across multiple backend instances.
Improve availability (health checks, failover).
Centralize cross-cutting concerns (SSL termination, rate limits, authentication gateways).
Types of load balancers (simple)
Layer 4 (Transport): TCP/UDP level; efficient, fast. e.g., many cloud LBs in TCP mode.
Layer 7 (Application): HTTP-aware; can route by URL, headers; supports SSL termination and advanced routing. e.g., NGINX, HAProxy, ALB (AWS).
Key features to use for stateless services
Health checks: automatically remove unhealthy instances.
Auto-scaling integration: add/remove instances based on metrics.
SSL/TLS termination: offload crypto to LB to simplify backends.
HTTP routing and path-based rules: send traffic to appropriate services.
Sticky sessions: not needed for stateless backends — avoid when possible.
Rate limiting + WAF: protect backends from abuse and attacks.
Connection reuse and keepalive: improve latency and resource usage.
Load balancing algorithms
Round robin: simple, evenly distributes.
Least connections: prefers less-busy instances (good for uneven request durations).
IP hash: deterministic routing (can be used for “soft” stickiness).
Weighted routing: prefer stronger/faster instances.
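To make the first three algorithms concrete, here is a small sketch of how a balancer might pick an instance. The function names and the `activeConnections` field are assumptions for illustration; real load balancers such as NGINX or HAProxy implement these internally. Weighted routing is left out to keep the example short.

```javascript
// Round robin: cycle through instances in order.
function createRoundRobin(instances) {
  let next = 0;
  return () => {
    const chosen = instances[next];
    next = (next + 1) % instances.length;
    return chosen;
  };
}

// Least connections: pick the instance with the fewest active connections.
function pickLeastConnections(instances) {
  return instances.reduce((best, inst) =>
    inst.activeConnections < best.activeConnections ? inst : best
  );
}

// IP hash: the same client IP always maps to the same instance.
function pickByIpHash(instances, clientIp) {
  let hash = 0;
  for (const ch of clientIp) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return instances[hash % instances.length];
}
```

Note that IP hash stays deterministic only while the instance list is unchanged. Adding or removing an instance changes the modulo and moves many clients to a different backend.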
Sticky sessions and statelessness
Avoid sticky sessions for stateless services — they reduce scheduler flexibility and can cause hot spots.
If you must support sessions: store session in a fast shared store (Redis) and keep backends stateless.
SSL Termination on Load Balancer
Rate Limiting on Load Balancer Level
DDoS Protection on Load Balancer Level
Auto Scaling vs Manual Scaling
Sticky Backend (Sticky Connection)
Scaling Stateful Backend Server
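Mostly done manually.
ASGs (AWS Auto Scaling Groups) and MIGs (Google Cloud Managed Instance Groups) let you do metric-based scaling.
Kubernetes has the Horizontal Pod Autoscaler (HPA), which scales pods; nodes are scaled separately by a node autoscaler.
Karpenter adds and removes nodes based on what the applications' pending pods need.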
Stopping bot abuse and bulk buying for high-demand releases like tickets and sneaker drops
Distributed bots use many IPs, like proxies and botnets, making IP limits useless. Attackers can bypass IP or session limits with multiple accounts or stolen credentials. Bots mimic human-like requests to avoid detection.
CAPTCHA farms let attackers use real people to solve challenges, making low-rate defenses weak. Strict limits can hurt real users during traffic spikes, so limits are often kept lenient, which attackers exploit. Attackers also spread requests over time and across different endpoints to avoid simple rate limits.
CAPTCHAs + rate limiting - why combine them
Layered defense: rate limits slow mass requests; CAPTCHAs raise attacker cost by requiring human (or costly solver) involvement.
Adaptive application: show CAPTCHAs only after suspicious behavior or rate-thresholds, reducing friction for normal users.
Reduce solver ROI: limits constrain how many CAPTCHA solves an attacker can use, making bulk buys expensive or slow.
Complementary signals: CAPTCHA failures, challenge frequency, and rate-limit hits feed a risk score for stronger actions (block, require verification, queue).
Combine adaptive rate limiting, targeted CAPTCHAs, behavioral analysis, and policies (per-user quotas, verified accounts, lotteries/queues) for effective protection without wrecking UX.
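One way to picture the layered approach is a counter that escalates from "allow" to "captcha" to "block" as a client sends more requests. This is only a sketch: the `FixedWindowCounter` class, the `decide` function, and the thresholds (5 and 20 requests per window) are hypothetical values picked for illustration.

```javascript
// Fixed-window counter keyed by any identifier (IP, account ID, session).
class FixedWindowCounter {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.windows = new Map(); // key -> { start, count }
  }

  // Records one request and returns the count within the current window.
  hit(key, now = Date.now()) {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return 1;
    }
    w.count += 1;
    return w.count;
  }
}

// Hypothetical thresholds: challenge after 5 requests per window, block after 20.
function decide(count, { captchaAfter = 5, blockAfter = 20 } = {}) {
  if (count > blockAfter) return "block";
  if (count > captchaAfter) return "captcha";
  return "allow";
}
```

A request handler could call `decide(counter.hit(accountId))` and show a CAPTCHA only when the result is `"captcha"`, so normal users never see a challenge. Keying the counter by account as well as by IP helps against attackers who rotate IP addresses.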
How to Add Frontend Page Captcha with Cloudflare
Cloudflare provides an easy way to protect your frontend pages from spam, bots, fake signups, credential stuffing, and automated attacks using Turnstile Captcha. Unlike traditional captchas, Cloudflare Turnstile focuses on user privacy and smooth user experience.
Steps to Add Cloudflare Turnstile Captcha
1. Create a Turnstile Site
Go to Cloudflare Turnstile
Log in to your Cloudflare account
Create a new Turnstile widget
Add your domain
Copy:
Site Key
Secret Key
2. Add Captcha Widget in Frontend
<form id="signup-form" action="/submit" method="POST">
  <input type="email" name="email" placeholder="Enter Email" />
  <div
    class="cf-turnstile"
    data-sitekey="YOUR_SITE_KEY">
  </div>
  <button type="submit">Submit</button>
</form>
<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>
3. Verify Captcha on Backend
Never rely on frontend validation alone. Always verify the captcha response on the backend.
Example using Node.js:
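const express = require("express");
const axios = require("axios");

const app = express();
// Parse URL-encoded form bodies so req.body["cf-turnstile-response"] is populated.
app.use(express.urlencoded({ extended: false }));

app.post("/submit", async (req, res) => {
  const token = req.body["cf-turnstile-response"];
  if (!token) {
    return res.status(400).send("Captcha Missing");
  }
  try {
    // Ask Cloudflare whether the token is valid for this secret key.
    const response = await axios.post(
      "https://challenges.cloudflare.com/turnstile/v0/siteverify",
      new URLSearchParams({
        secret: process.env.TURNSTILE_SECRET,
        response: token
      })
    );
    if (response.data.success) {
      return res.send("Human Verified");
    }
    res.status(400).send("Captcha Failed");
  } catch (err) {
    // Network or Cloudflare error: fail closed instead of letting the request through.
    res.status(502).send("Captcha Verification Unavailable");
  }
});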
Features of Cloudflare Turnstile
No annoying image puzzles in many cases
Privacy-focused alternative to reCAPTCHA
Reduces fake signup attacks
Works against automated scripts and bots
Easy frontend integration
Backend verification support
Supports invisible verification mode
Frontend Captcha vs Backend Captcha
Both frontend and backend captchas are important, but they serve different purposes.
| Feature | Frontend Captcha | Backend Captcha Verification |
|---|---|---|
| Runs On | Browser/UI | Server |
| Purpose | User interaction validation | Actual security verification |
| Security Level | Medium | High |
| Can Be Bypassed | Yes | Harder |
| Prevents Fake Requests | Partially | Strongly |
| User Experience | Visible to user | Invisible |
| Recommended | Yes | Mandatory |
Frontend Captcha
Frontend captcha is the visible challenge shown to the user before form submission.
Examples:
Checkbox verification
Image selection
Invisible behavioral detection
Limitation
Attackers can directly hit your backend API and bypass frontend checks if backend verification is missing.
Backend Captcha Verification
Backend verification validates the captcha token with Cloudflare servers.
Why It Is Important
Without backend verification:
Anyone can fake requests
Bots can bypass frontend UI
API endpoints remain vulnerable
Best Practice
Always use:
Frontend captcha widget
Backend token verification
Rate limiting
WAF protection
Together they create layered security.
Invisible Captchas vs Visible Captchas in Cloudflare
Cloudflare Turnstile supports both invisible and visible captcha modes.
| Feature | Invisible Captcha | Visible Captcha |
|---|---|---|
| User Interaction | Minimal | Required sometimes |
| UX Experience | Smooth | Slight interruption |
| Bot Detection | Behavioral analysis | Challenge-based |
| Accessibility | Better | Moderate |
| Security | Good | Strong |
| Best Use Case | Modern apps | High-risk forms |
Bot Protection with Cloudflare by Proxied Request
One of Cloudflare’s strongest security advantages is its reverse proxy architecture.
When Cloudflare proxy is enabled:
User → Cloudflare → Your Server
Your real server IP remains hidden behind Cloudflare infrastructure.
How Cloudflare Bot Protection Works
Cloudflare analyzes requests before they reach your server.
It checks IP reputation, request frequency, browser fingerprint, ASN/network reputation, bot signatures, JavaScript execution, TLS fingerprinting, and conducts behavioral analysis. Suspicious traffic can be blocked, challenged, rate limited, redirected, or logged.
Common Cloudflare Security Features
| Feature | Purpose |
|---|---|
| WAF | Blocks malicious requests |
| Rate Limiting | Prevents spam/flood attacks |
| Bot Protection | Detects automated traffic |
| DDoS Protection | Prevents server overload |
| SSL/TLS | Encrypts traffic |
| CDN | Improves speed globally |
AWS Has a Similar Service Called WAF
Amazon Web Services also provides security services similar to Cloudflare.
The main one is AWS WAF, which filters and monitors HTTP requests reaching your applications.
AWS WAF vs Cloudflare
| Feature | Cloudflare | AWS WAF |
|---|---|---|
| Works As | Reverse Proxy CDN | Cloud Firewall |
| DDoS Protection | Built-in | Via AWS Shield |
| Global CDN | Included | CloudFront needed |
| Bot Protection | Strong built-in | Additional configs |
| Setup Complexity | Easier | More AWS-oriented |
| Best For | General web apps | AWS ecosystem apps |
When to Use Cloudflare
Use Cloudflare if:
You want easy setup
You need CDN + security together
You want origin IP protection
You need strong bot mitigation
You want fast global delivery
When to Use AWS WAF
Use AWS WAF if:
Your infrastructure is fully on AWS
You already use CloudFront
You need deep AWS integrations
You want custom enterprise security rules
How Rate Limiting Can Be Bypassed
Rate limiting controls incoming server requests to allocate resources fairly and prevent abuse.
However, attackers can bypass limits by distributing requests across multiple IPs or using botnets.
They might also exploit flaws in rate limiting logic, like resetting counters or targeting unprotected endpoints. Understanding these vulnerabilities is key to enhancing security.
What is a CDN?
A Content Delivery Network (CDN) is a system of distributed servers that work together to deliver web content quickly to users based on their geographic location.
By caching content closer to the user's location, CDNs reduce latency and improve load times. They also help handle large volumes of traffic and protect against certain types of cyber attacks by distributing the load across multiple servers.
This makes them essential for enhancing the performance and reliability of websites and applications.
Different Ways to Distribute a React Frontend with a CDN
Scaling Databases
Serverless Databases
Managed Database
Self Hosted Databases
Good questions
Distributed vs in-memory cache
Write first or read first
In-memory cache vs Redis cache
Which is the right operation ordering (client → cache → database)?
Below are common candidate orders for handling a write when you have a client, a Redis cache, and a database.
Option A — Update the cache, then update the database, then return to the client.
Option B — Update the database, then update the cache, then return to the client.
Option C — Update the database, then invalidate (clear) the cache, then return to the client.
Option D — Invalidate (clear) the cache, then update the database, then return to the client.
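Answer: Option D. Clearing the cache first means that if the database write fails, no stale value is left in the cache; the next read simply reloads from the database. One caveat: a read that arrives between the invalidation and the database write can put the old value back into the cache, so many systems also give cache entries a short TTL or delete the key again after the write.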
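A minimal sketch of the Option D ordering, using in-memory `Map` objects as hypothetical stand-ins for Redis and the database. The `writeUser` and `readUser` names are made up for this example.

```javascript
// In-memory stand-ins for Redis and the database (assumptions for illustration).
const cache = new Map();
const db = new Map();

// Option D: invalidate the cache entry, then write to the database, then return.
async function writeUser(id, value) {
  cache.delete(id);  // 1. drop the cached copy
  db.set(id, value); // 2. persist the new value
  return value;      // 3. respond to the client
}

// Cache-aside read: try the cache, fall back to the database and repopulate.
async function readUser(id) {
  if (cache.has(id)) return cache.get(id);
  const value = db.get(id) ?? null;
  if (value !== null) cache.set(id, value);
  return value;
}
```

After a write, the first read misses the cache, loads the fresh value from the database, and caches it for later reads.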
Indexing in Databases
Indexing is a database optimization technique used to make data searching faster. Without indexing, the database scans every row one by one, which is called a Full Table Scan.
With indexing, the database can directly locate the required data, similar to how we use an index page in a book to quickly find a topic.
In real-world applications containing millions of records, indexing is extremely important because users expect very fast responses.
Most production databases heavily rely on indexes for performance optimization.
How Indexing Works
An index stores a special data structure (commonly B-Tree) that maintains references to the actual rows in sorted order. When a query searches indexed columns, the database engine uses the index instead of scanning the whole table.
Example:
SELECT * FROM users WHERE email = 'test@gmail.com';
If email is indexed:
- The database finds the row directly through the index.
If email is not indexed:
- The database checks every row (a full table scan).
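The difference can be imitated in plain JavaScript. This sketch uses a sorted array with binary search as a simplified stand-in for a B-Tree; it is not how a database engine is actually implemented, and all function names here are hypothetical.

```javascript
// Full table scan: inspect rows one by one until a match is found.
function findByScan(rows, column, value) {
  for (const row of rows) {
    if (row[column] === value) return row;
  }
  return null;
}

// Build a sorted index of { key, rowIndex } entries for one column.
function buildIndex(rows, column) {
  return rows
    .map((row, rowIndex) => ({ key: row[column], rowIndex }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search over the sorted index: about log2(n) comparisons instead of n.
function findIndexed(rows, index, value) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key === value) return rows[index[mid].rowIndex];
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}
```

The index has to be rebuilt or updated whenever rows change, which is the same reason indexes make inserts, updates, and deletes slower in a real database.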
Common Columns Used for Indexing
User ID
Email
Username
Phone Number
Frequently searched fields
| Advantages of Indexing | Disadvantages of Indexing |
|---|---|
| Faster search operations | Requires additional storage |
| Improves query performance | Insert/Update/Delete becomes slightly slower |
| Reduces response time | Too many indexes can reduce write performance |
| Useful for filtering and sorting | |
Database Replicas
Database replication means creating copies of a primary database server. These copies are called replicas or read replicas. Replication is mainly used to improve scalability, availability, and performance in large-scale systems.
In most real-world applications, read operations are much higher than write operations. To reduce load on the primary database, read requests are distributed across multiple replicas.
Primary Database vs Read Replica
| Database Type | Purpose |
|---|---|
| Primary DB | Handles write operations |
| Read Replica | Handles read operations |
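The read/write split in the table can be sketched as a small router. The `ReplicatedRouter` class is hypothetical, and treating every statement that starts with `SELECT` as a read is a deliberate simplification.

```javascript
// Routes writes to the primary and spreads reads across replicas (round robin).
class ReplicatedRouter {
  constructor(primary, replicas) {
    this.primary = primary;
    this.replicas = replicas;
    this.nextReplica = 0;
  }

  // Naive classification: SELECT statements are reads, everything else is a write.
  pick(sql) {
    const isRead = /^\s*select\b/i.test(sql);
    if (!isRead || this.replicas.length === 0) return this.primary;
    const replica = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return replica;
  }
}
```

Reads that must see a value the user has just written are usually sent to the primary instead, because a replica can be slightly behind.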
95% Reads and 5% Writes Scenario
In applications like social media platforms, where 95% of traffic involves reading data and only 5% involves writing, multiple read replicas should be created to reduce the load on the primary database.
Replication Lag
Replication lag occurs because, in many systems, replication is asynchronous: the primary database applies a write immediately, while replicas receive it after a short delay. Replicas can therefore be temporarily out of date, especially during high traffic.
What Should Be the Configuration of Read Replicas?
In a system where most operations are read operations (for example 95% reads and 5% writes), the number of Read Replicas should be configured according to the read traffic load.
The general idea is:
More read traffic → More read replicas
Fewer read replicas → Higher load on each replica
If there are too few replicas for the incoming read load, each replica is overloaded, queries slow down, and the replicas can fall further behind the primary (higher replication lag).
Recommended Configuration
| Traffic Type | Suggested Configuration |
|---|---|
| Low read traffic | 1–2 Read Replicas |
| Medium read traffic | 3–5 Read Replicas |
| Very high read traffic | Multiple replicas with load balancer |
Real-World Example
User changes profile photo:
Primary DB updates immediately
Replica may still show the old photo for a few seconds
<img
src="database-replica-system-design.png"
alt="System design architecture showing one primary database and multiple read replicas handling read traffic"
/>
| Advantages | Disadvantages |
|---|---|
| Handles large read traffic | Replication lag |
| Improves scalability | Increased infrastructure cost |
| Better fault tolerance | Data consistency challenges |
| Reduces database load | Replicas may temporarily show outdated data |
| Read requests can be distributed across multiple servers | More infrastructure management complexity |
| Improves application performance and response time | Network synchronization overhead between databases |
Archiving Old Data (Cold Storage)
As applications grow, databases store massive amounts of data such as logs, analytics, transactions, and user activity. Keeping all historical data inside the main database becomes expensive and affects performance.
To solve this problem, old or rarely accessed data is moved to cheaper storage systems called cold storage or archive storage.
AWS Glacier
A typical strategy involves storing recent data (0–1 year) in the main database and moving older data (1+ years) to Amazon S3 Glacier, which is designed for long-term storage, backup systems, and archived data, keeping the active database small and efficient.
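The age-based split might look like the sketch below. The `splitByAge` function and the `createdAt` field are assumptions for illustration; actually uploading the archived records to cold storage is outside the scope of this example.

```javascript
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Splits records into those to keep in the main database and those to archive.
function splitByAge(records, now = Date.now(), maxAgeMs = ONE_YEAR_MS) {
  const keep = [];
  const archive = [];
  for (const record of records) {
    const age = now - record.createdAt.getTime();
    (age > maxAgeMs ? archive : keep).push(record);
  }
  return { keep, archive };
}
```

A scheduled job could run this periodically, write the `archive` batch to cold storage, and delete those rows from the active database only after the upload succeeds.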
| Advantages | Disadvantages |
|---|---|
| Reduces storage cost | Archived data retrieval is slower |
| Improves DB performance | Extra management complexity |
| Easier database scaling | Data recovery may take additional time |
| Better long-term storage management | Requires archive policies and monitoring |
| Keeps active database smaller and faster | Accessing old data is less convenient |
| Helps optimize production database resources | Additional storage architecture setup required |
<img
src="aws-glacier-archive-system.png"
alt="Architecture diagram showing recent data in active database and old data archived to AWS Glacier"
/>
Sharding in Databases
Sharding is a database scaling technique where one large database is divided into multiple smaller databases called shards. Each shard stores only a portion of the total data.
Sharding is used when a single database server cannot handle massive traffic, storage, or millions of users.
Large-scale companies like Instagram, Netflix, and Facebook use sharding to scale their systems.
Types of Database Sharding
There are several ways to decide which shard stores a given record. Each one helps an application scale horizontally, with different trade-offs.
1. Range-Based Sharding
In range-based sharding, data is divided according to a specific range of values such as IDs, dates, or numbers. This method is simple because records are stored in sequential order.
Users 1-1000 → Shard 1
Users 1001-2000 → Shard 2
Users 2001-3000 → Shard 3
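The mapping above can be written as a small lookup. The `ranges` table and `shardForUser` function are hypothetical names for this sketch.

```javascript
// Each shard owns a contiguous range of user IDs (inclusive bounds).
const ranges = [
  { shard: "Shard 1", min: 1, max: 1000 },
  { shard: "Shard 2", min: 1001, max: 2000 },
  { shard: "Shard 3", min: 2001, max: 3000 },
];

// Returns the shard whose range contains the given user ID.
function shardForUser(userId) {
  const match = ranges.find((r) => userId >= r.min && userId <= r.max);
  if (!match) throw new RangeError(`no shard configured for user ${userId}`);
  return match.shard;
}
```

If new users always get the highest IDs, most new traffic lands on the last shard, which is one way the hotspot problem appears.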
Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Simple implementation | Uneven load distribution |
| Easy to understand | One shard may become overloaded |
| Easy debugging | Hotspot problem can occur |
| Works well for ordered data | Scaling becomes difficult later |
2. Hash-Based Sharding
In hash-based sharding, a hash function decides which shard will store the data. This helps distribute records more evenly across multiple servers. This technique is commonly used in high-scale systems to balance traffic.
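A minimal sketch of hash-based shard selection, assuming SHA-256 as the hash function and a simple modulo over the shard count. Both choices, and the `shardIndexFor` name, are illustrative; production systems often use consistent hashing instead.

```javascript
const crypto = require("node:crypto");

// Hash the key, read the first 4 bytes as an unsigned integer, and take it modulo the shard count.
function shardIndexFor(key, shardCount) {
  const digest = crypto.createHash("sha256").update(String(key)).digest();
  return digest.readUInt32BE(0) % shardCount;
}
```

The same key always maps to the same shard, but changing `shardCount` remaps most keys. That is why resharding is harder with plain modulo hashing.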
Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Better load balancing | Difficult resharding |
| Prevents hotspot issues | Complex shard management |
| More even data distribution | Harder debugging |
| Useful for high-scale systems | Migration becomes difficult |
3. Geo-Based Sharding
Geo-based sharding divides data according to geographic regions or user locations. Different regions use different database servers.
India Users → India Server
US Users → US Server
Europe Users → Europe Server
This approach improves speed for global users by keeping data closer to their location.
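In code, geo-based routing can be as simple as a lookup table. The region codes (`IN`, `US`, `EU`) and the `shardForRegion` function are assumptions made for this sketch.

```javascript
// Hypothetical region codes mapped to the servers listed above.
const regionToShard = new Map([
  ["IN", "India Server"],
  ["US", "US Server"],
  ["EU", "Europe Server"],
]);

// Returns the server responsible for a user's region.
function shardForRegion(regionCode) {
  const shard = regionToShard.get(regionCode);
  if (!shard) throw new Error(`no shard configured for region ${regionCode}`);
  return shard;
}
```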
Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Faster local response time | Difficult cross-region queries |
| Reduced latency | Complex data synchronization |
| Better user experience | Higher infrastructure complexity |
| Improved regional performance | Cross-region transactions are difficult |
Problems with Sharding
1. Complex Queries
Joining data across shards becomes difficult and slow.
2. Rebalancing Issues
When one shard becomes overloaded, moving data between shards is complicated.
3. Cross-Shard Transactions
Transactions involving multiple shards are difficult to maintain consistently.
4. Hotspot Problem
Some shards may receive much higher traffic than others.
Example:
Viral users
Celebrity accounts
5. Increased Complexity
Application architecture becomes harder to manage.
A sharded system needs:
Routing logic
Monitoring
Shard management