Sharding, Replication and System Design Week 51.1 Cohort 3.0

Stateless vs Stateful Backend

Stateless backend

Each request is independent. The server does not keep client session state between requests.

Any server instance can handle any request if it has access to shared resources (database, cache).

Stateful backend

Server keeps client-specific state across requests (session, in-memory game state, websocket connection state).

Requests from the same client often need the same server instance or a way to retrieve that state.

When to choose

Use stateless when possible (typical web APIs, microservices, REST/HTTP services).
Use stateful when you need low-latency in-memory state or long-lived bidirectional connections and state replication is impractical.

Examples

Stateless: REST API, microservices reading DB, serverless functions.
Stateful: multiplayer game servers, WebSocket chat servers maintaining connection-specific state, in-memory queued tasks per user.

Common designs to combine strengths

Keep core logic stateless; store session or user state in a shared store (Redis, DB).
Use stateful servers only where needed and front them with a load balancer and sticky sessions, or use a dedicated routing layer.

Stateless vs Stateful

Aspect	Stateless	Stateful
Request handling	Any instance can handle any request	Requests may need a specific instance
Horizontal scaling	Easy	Harder (needs sticky routing / replication)
Failure recovery	Simple (restart any instance)	Complex (must recover in-memory state)
Session storage	External (DB/Redis)	Often in-process memory
Use cases	REST APIs, microservices, serverless	Game servers, long-lived sockets, session-heavy apps
Best for	High scale, simple deployment	Low-latency, connection-oriented workloads

Scaling Node.js With the Cluster Module

What Cluster does

Node.js Cluster runs multiple Node.js worker processes that share the same server port.

Each worker is a separate process using its own event loop and memory - lets you use multi-core machines.

master process distributes connections to workers (Node handles the LB across workers).

Benefits

Simple way to utilize all CPU cores on one machine.
Auto-restart crashed workers (supervisor behavior via master).

Limitations & considerations

Works only on a single host. To scale across machines you still need a load balancer or service mesh.
Workers do not share memory — use Redis/DB for shared state.
Not a replacement for process managers (PM2, Docker orchestration). Use cluster + process manager or container orchestrator.
Sticky sessions: if your app is stateful (needs the same worker), Node’s default distribution is not sticky. You may need an external sticky LB or a custom master distribution.

Alternatives / complements

PM2 cluster mode (manages forks and restarts).
Worker threads (for CPU-bound tasks within one process).
Multiple containers / pods behind a load balancer for multi-host scaling (Kubernetes, ECS).

Example

Scaling Stateless Backend: Load Balancer Patterns

Why load balancers

Distribute incoming requests across multiple backend instances.
Improve availability (health checks, failover).
Centralize cross-cutting concerns (SSL termination, rate limits, authentication gateways).

Types of load balancers (simple)

Layer 4 (Transport): TCP/UDP level; efficient, fast. e.g., many cloud LBs in TCP mode.
Layer 7 (Application): HTTP-aware; can route by URL, headers; supports SSL termination and advanced routing. e.g., NGINX, HAProxy, ALB (AWS).

Key features to use for stateless services

Health checks: automatically remove unhealthy instances.
Auto-scaling integration: add/remove instances based on metrics.
SSL/TLS termination: offload crypto to LB to simplify backends.
HTTP routing and path-based rules: send traffic to appropriate services.
Sticky sessions: not needed for stateless backends — avoid when possible.
Rate limiting + WAF: protect backends from abuse and attacks.
Connection reuse and keepalive: improve latency and resource usage.

Load balancing algorithms

Round robin: simple, evenly distributes.
Least connections: prefers less-busy instances (good for uneven request durations).
IP hash: deterministic routing (can be used for “soft” stickiness).
Weighted routing: prefer stronger/faster instances.

Sticky sessions and statelessness

Avoid sticky sessions for stateless services — they reduce scheduler flexibility and can cause hot spots.

If you must support sessions: store session in a fast shared store (Redis) and keep backends stateless.

SSL Termination on Load Balancer

Rate Limiting on Load Balancer Level

DDos Protection on Load Balancer Level

Auto Scaling VS Mannual Scaling

Sticky Backend (Sticky Connection)

Scaling Stateful Backend Server

Mostly done manually

ASGs/MIGs let you do metric based scaling

K8s has HPA that lets you scale pods and nodes

Karpenter lets you scale node pools based on applications

Stopping bot abuse and bulk buying for high-demand releases like tickets and sneaker drops

Distributed bots use many IPs, like proxies and botnets, making IP limits useless. Attackers can bypass IP or session limits with multiple accounts or stolen credentials. Bots mimic human-like requests to avoid detection.

CAPTCHA farms let attackers use real people to solve challenges, making low-rate defenses weak. Strict limits can hurt real users during traffic spikes, so thresholds are kept low, which attackers exploit. Attackers also spread requests over time and different endpoints to avoid simple rate limits.

CAPTCHAs + rate limiting - why combine them

Layered defense: rate limits slow mass requests; CAPTCHAs raise attacker cost by requiring human (or costly solver) involvement.
Adaptive application: show CAPTCHAs only after suspicious behavior or rate-thresholds, reducing friction for normal users.
Reduce solver ROI: limits constrain how many CAPTCHA solves an attacker can use, making bulk buys expensive or slow.
Complementary signals: CAPTCHA failures, challenge frequency, and rate-limit hits feed a risk score for stronger actions (block, require verification, queue).

Combine adaptive rate limiting, targeted CAPTCHAs, behavioral analysis, and policies (per-user quotas, verified accounts, lotteries/queues) for effective protection without wrecking UX.

How to Add Frontend Page Captcha with Cloudflare

Cloudflare provides an easy way to protect your frontend pages from spam, bots, fake signups, credential stuffing, and automated attacks using Turnstile Captcha. Unlike traditional captchas, Cloudflare Turnstile focuses on user privacy and smooth user experience.

Steps to Add Cloudflare Turnstile Captcha

1. Create a Turnstile Site

Go to Cloudflare Turnstile
Login to your Cloudflare account
Create a new Turnstile widget
Add your domain
Copy:
- Site Key
- Secret Key

<form id="signup-form">
  <input type="email" placeholder="Enter Email" />

  <div
    class="cf-turnstile"
    data-sitekey="YOUR_SITE_KEY">
  </div>

  <button type="submit">Submit</button>
</form>

<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>

3. Verify Captcha on Backend

Never trust only frontend validation. Always verify captcha response from backend.

Example using Node.js:

const axios = require("axios");

app.post("/submit", async (req, res) => {

   const token = req.body["cf-turnstile-response"];

   const response = await axios.post(
      "https://challenges.cloudflare.com/turnstile/v0/siteverify",
      new URLSearchParams({
         secret: process.env.TURNSTILE_SECRET,
         response: token
      })
   );

   if(response.data.success){
      return res.send("Human Verified");
   }

   res.status(400).send("Captcha Failed");

});

Features of Cloudflare Turnstile

No annoying image puzzles in many cases
Privacy-focused alternative to reCAPTCHA
Reduces fake signup attacks
Works against automated scripts and bots
Easy frontend integration
Backend verification support
Supports invisible verification mode

Frontend Captcha vs Backend Captcha

Both frontend and backend captchas are important, but they serve different purposes.

Feature	Frontend Captcha	Backend Captcha Verification
Runs On	Browser/UI	Server
Purpose	User interaction validation	Actual security verification
Security Level	Medium	High
Can Be Bypassed	Yes	Harder
Prevents Fake Requests	Partially	Strongly
User Experience	Visible to user	Invisible
Recommended	Yes	Mandatory

Frontend Captcha

Frontend captcha is the visible challenge shown to the user before form submission.

Examples:

Checkbox verification
Image selection
Invisible behavioral detection

Limitation

Attackers can directly hit your backend API and bypass frontend checks if backend verification is missing.

Backend Captcha Verification

Backend verification validates the captcha token with Cloudflare servers.

Why It Is Important

Without backend verification:

Anyone can fake requests
Bots can bypass frontend UI
API endpoints remain vulnerable

Best Practice

Always use:

Frontend captcha widget
Backend token verification
Rate limiting
WAF protection

Together they create layered security.

Invisible Captchas vs Visible Captchas in Cloudflare

Cloudflare Turnstile supports both invisible and visible captcha modes.

Feature	Invisible Captcha	Visible Captcha
User Interaction	Minimal	Required sometimes
UX Experience	Smooth	Slight interruption
Bot Detection	Behavioral analysis	Challenge-based
Accessibility	Better	Moderate
Security	Good	Strong
Best Use Case	Modern apps	High-risk forms

Bot Protection with Cloudflare by Proxied Request

One of Cloudflare’s strongest security advantages is its reverse proxy architecture.

When Cloudflare proxy is enabled:

User → Cloudflare → Your Server

Your real server IP remains hidden behind Cloudflare infrastructure.

How Cloudflare Bot Protection Works

Cloudflare analyzes requests before they reach your server.

It checks IP reputation, request frequency, browser fingerprint, ASN/network reputation, bot signatures, JavaScript execution, TLS fingerprinting, and conducts behavioral analysis. Suspicious traffic can be blocked, challenged, rate limited, redirected, or logged.

Common Cloudflare Security Features

Feature	Purpose
WAF	Blocks malicious requests
Rate Limiting	Prevents spam/flood attacks
Bot Protection	Detects automated traffic
DDoS Protection	Prevents server overload
SSL/TLS	Encrypts traffic
CDN	Improves speed globally

AWS Has Similar Service Called WAF

Amazon Web Services also provides security services similar to Cloudflare.

The major one is: AWS WAF helps filter and monitor HTTP requests reaching your applications.

AWS WAF vs Cloudflare

Feature	Cloudflare	AWS WAF
Works As	Reverse Proxy CDN	Cloud Firewall
DDoS Protection	Built-in	Via AWS Shield
Global CDN	Included	CloudFront needed
Bot Protection	Strong built-in	Additional configs
Setup Complexity	Easier	More AWS-oriented
Best For	General web apps	AWS ecosystem apps

When to Use Cloudflare

Use Cloudflare if:

You want easy setup
You need CDN + security together
You want origin IP protection
You need strong bot mitigation
You want fast global delivery

When to Use AWS WAF

Use AWS WAF if:

Your infrastructure is fully on AWS
You already use CloudFront
You need deep AWS integrations
You want custom enterprise security rules

How Rate Limiting Can Be Bypassed

Rate limiting controls incoming server requests to allocate resources fairly and prevent abuse.

However, attackers can bypass limits by distributing requests across multiple IPs or using botnets.

They might also exploit flaws in rate limiting logic, like resetting counters or targeting unprotected endpoints. Understanding these vulnerabilities is key to enhancing security.

What is CDN ?

A Content Delivery Network (CDN) is a system of distributed servers that work together to deliver web content quickly to users based on their geographic location.

By caching content closer to the user's location, CDNs reduce latency and improve load times. They also help handle large volumes of traffic and protect against certain types of cyber attacks by distributing the load across multiple servers.

This makes them essential for enhancing the performance and reliability of websites and applications.

How to Distribute React Frontend with CDN different ways

Scaling Databases

Serverless Databases

Managed Database

Self Hosted Databases

Good questions

Distributed vs in mem cache
Write first or read first

In memory Cache vs Redis Cache

Which is the Right operation ordering (client → cache → database)

Below are common candidate orders for handling a write when you have a client, a Redis cache, and a database.

Option A — Update the cache, then update the database, then return to the client.
Option B — Update the database, then update the cache, then return to the client.
Option C — Update the database, then invalidate (clear) the cache, then return to the client.
Option D — Invalidate (clear) the cache, then update the database, then return to the client.

Answer : Option D

Indexing in Databases

Indexing is a database optimization technique used to make data searching faster. Without indexing, the database scans every row one by one, which is called a Full Table Scan.

With indexing, the database can directly locate the required data, similar to how we use an index page in a book to quickly find a topic.

In real-world applications containing millions of records, indexing is extremely important because users expect very fast responses.

Most production databases heavily rely on indexes for performance optimization.

How Indexing Works

An index stores a special data structure (commonly B-Tree) that maintains references to the actual rows in sorted order. When a query searches indexed columns, the database engine uses the index instead of scanning the whole table.

Example:

SELECT * FROM users WHERE email = "test@gmail.com";

If email is indexed:

Database directly finds the row.

If email is not indexed:

Database checks every row manually.

Common Columns Used for Indexing

User ID

Email
Username
Phone Number
Frequently searched fields

Advantages of Indexing	Disadvantages of Indexing
Faster search operations	Requires additional storage
Improves query performance	Insert/Update/Delete becomes slightly slower
Reduces response time	Too many indexes can reduce write performance
Useful for filtering and sorting

Database Replicas

Database replication means creating copies of a primary database server. These copies are called replicas or read replicas. Replication is mainly used to improve scalability, availability, and performance in large-scale systems.

In most real-world applications, read operations are much higher than write operations. To reduce load on the primary database, read requests are distributed across multiple replicas.

Primary Database vs Read Replica

Database Type	Purpose
Primary DB	Handles write operations
Read Replica	Handles read operations

95% Reads and 5% Writes Scenario

In applications like social media platforms, where 95% of traffic involves reading data and only 5% involves writing, multiple read replicas should be created to reduce the load on the primary database.

Replication Lag

Replication lag occurs because, in many systems, replication is asynchronous, meaning the primary database updates instantly while replicas receive updates after a short delay, which can cause replicas to become temporarily outdated during high traffic.

What Should Be the Configuration of Read Replicas?

In a system where most operations are read operations (for example 95% reads and 5% writes), the number of Read Replicas should be configured according to the read traffic load.

The general idea is:

More read traffic → More read replicas
Fewer read replicas → Higher load on each replica

If the replicas are not sufficient to handle incoming requests, the database can experience Replication Lag.

Recommended Configuration

Traffic Type	Suggested Configuration
Low read traffic	1–2 Read Replicas
Medium read traffic	3–5 Read Replicas
Very high read traffic	Multiple replicas with load balancer

Real-World Example

User changes profile photo:

Primary DB updates immediately
Replica may still show old photo for few seconds


<img 
  src="database-replica-system-design.png" 
  alt="System design architecture showing one primary database and multiple read replicas handling read traffic"
/>

Advantages	Disadvantages
Handles large read traffic	Replication lag
Improves scalability	Increased infrastructure cost
Better fault tolerance	Data consistency challenges
Reduces database load	Replicas may temporarily show outdated data
Read requests can be distributed across multiple servers	More infrastructure management complexity
Improves application performance and response time	Network synchronization overhead between databases

Archiving Old Data (Cold Storage)

As applications grow, databases store massive amounts of data such as logs, analytics, transactions, and user activity. Keeping all historical data inside the main database becomes expensive and affects performance.

To solve this problem, old or rarely accessed data is moved to cheaper storage systems called cold storage or archive storage.

AWS Glacier

A typical strategy involves storing recent data (0–1 year) in the main database and moving older data (1+ years) to Amazon S3 Glacier, which is designed for long-term storage, backup systems, and archived data, keeping the active database small and efficient.

Advantages	Disadvantages
Reduces storage cost	Archived data retrieval is slower
Improves DB performance	Extra management complexity
Easier database scaling	Data recovery may take additional time
Better long-term storage management	Requires archive policies and monitoring
Keeps active database smaller and faster	Accessing old data is less convenient
Helps optimize production database resources	Additional storage architecture setup required

<img 
  src="aws-glacier-archive-system.png" 
  alt="Architecture diagram showing recent data in active database and old data archived to AWS Glacier"
/>

Sharding in Databases

Sharding is a database scaling technique where one large database is divided into multiple smaller databases called shards. Each shard stores only a portion of the total data.

Sharding is used when a single database server cannot handle massive traffic, storage, or millions of users.

Large-scale companies like:

Instagram
Netflix
Facebook

use sharding to scale their systems.

Types of Database Sharding

When a database becomes too large to handle on a single server, engineers divide the data into smaller parts called shards. This technique is known as Sharding. It helps applications scale horizontally and manage large amounts of traffic and data efficiently.

1. Range-Based Sharding

In range-based sharding, data is divided according to a specific range of values such as IDs, dates, or numbers. This method is simple because records are stored in sequential order.

Users 1-1000    → Shard 1
Users 1001-2000 → Shard 2
Users 2001-3000 → Shard 3

Advantages and Disadvantages

Advantages	Disadvantages
Simple implementation	Uneven load distribution
Easy to understand	One shard may become overloaded
Easy debugging	Hotspot problem can occur
Works well for ordered data	Scaling becomes difficult later

2. Hash-Based Sharding

In hash-based sharding, a hash function decides which shard will store the data. This helps distribute records more evenly across multiple servers. This technique is commonly used in high-scale systems to balance traffic.

Advantages and Disadvantages

Advantages	Disadvantages
Better load balancing	Difficult resharding
Prevents hotspot issues	Complex shard management
More even data distribution	Harder debugging
Useful for high-scale systems	Migration becomes difficult

3. Geo-Based Sharding

Geo-based sharding divides data according to geographic regions or user locations. Different regions use different database servers.

India Users  → India Server
US Users     → US Server
Europe Users → Europe Server

This approach improves speed for global users by keeping data closer to their location.

Advantages and Disadvantages

Advantages	Disadvantages
Faster local response time	Difficult cross-region queries
Reduced latency	Complex data synchronization
Better user experience	Higher infrastructure complexity
Improved regional performance	Cross-region transactions are difficult

Problems with Sharding

1. Complex Queries

Joining data across shards becomes difficult and slow.

2. Rebalancing Issues

When one shard becomes overloaded, moving data between shards is complicated.

3. Cross-Shard Transactions

Transactions involving multiple shards are difficult to maintain consistently.

4. Hotspot Problem

Some shards may receive much higher traffic than others.

Example:

Viral users
Celebrity accounts

5. Increased Complexity

Application architecture becomes harder to manage.

Need:

Routing logic
Monitoring
Shard management

Command Palette

Stateless vs Stateful Backend

Stateless backend

Stateful backend

When to choose

Examples

Common designs to combine strengths

Stateless vs Stateful

Scaling Node.js With the Cluster Module

What Cluster does

Benefits

Limitations & considerations

Alternatives / complements

Scaling Stateless Backend: Load Balancer Patterns

Why load balancers

Types of load balancers (simple)

Key features to use for stateless services

Load balancing algorithms

Sticky sessions and statelessness

SSL Termination on Load Balancer

Rate Limiting on Load Balancer Level

DDos Protection on Load Balancer Level

Auto Scaling VS Mannual Scaling

Sticky Backend (Sticky Connection)

Scaling Stateful Backend Server

Stopping bot abuse and bulk buying for high-demand releases like tickets and sneaker drops

CAPTCHAs + rate limiting - why combine them

How to Add Frontend Page Captcha with Cloudflare

Steps to Add Cloudflare Turnstile Captcha

1. Create a Turnstile Site

2. Add Captcha Widget in Frontend

3. Verify Captcha on Backend

Features of Cloudflare Turnstile

Frontend Captcha vs Backend Captcha

Frontend Captcha

Limitation

Backend Captcha Verification

Why It Is Important

Best Practice

Invisible Captchas vs Visible Captchas in Cloudflare

Bot Protection with Cloudflare by Proxied Request

How Cloudflare Bot Protection Works

Common Cloudflare Security Features

AWS Has Similar Service Called WAF

AWS WAF vs Cloudflare

When to Use Cloudflare

When to Use AWS WAF

How Rate Limiting Can Be Bypassed

What is CDN ?

How to Distribute React Frontend with CDN different ways

Scaling Databases

Serverless Databases

Managed Database

Self Hosted Databases

In memory Cache vs Redis Cache

Which is the Right operation ordering (client → cache → database)

Indexing in Databases

How Indexing Works

Database Replicas

Primary Database vs Read Replica

95% Reads and 5% Writes Scenario

Replication Lag

What Should Be the Configuration of Read Replicas?

Recommended Configuration

Real-World Example

Archiving Old Data (Cold Storage)

AWS Glacier

Sharding in Databases

Types of Database Sharding

1. Range-Based Sharding

Advantages and Disadvantages

2. Hash-Based Sharding

Advantages and Disadvantages

3. Geo-Based Sharding

Advantages and Disadvantages

Problems with Sharding

Comments

More from this blog