Skip to main content

Command Palette

Search for a command to run...

Sharding, Replication and System Design Week 51.1 Cohort 3.0

Updated
18 min read
A

Hi there! I'm Aditya, a passionate Full-Stack Developer driven by a love for turning concepts into captivating digital experiences. With a blend of creativity and technical expertise, I specialize in crafting user-friendly websites and applications that leave a lasting impression. Let's connect and bring your digital vision to life!

Stateless vs Stateful Backend

Stateless backend

Each request is independent. The server does not keep client session state between requests.

Any server instance can handle any request if it has access to shared resources (database, cache).

Stateful backend

Server keeps client-specific state across requests (session, in-memory game state, websocket connection state).

Requests from the same client often need the same server instance or a way to retrieve that state.

When to choose

  • Use stateless when possible (typical web APIs, microservices, REST/HTTP services).

  • Use stateful when you need low-latency in-memory state or long-lived bidirectional connections and state replication is impractical.

Examples

  • Stateless: REST API, microservices reading DB, serverless functions.

  • Stateful: multiplayer game servers, WebSocket chat servers maintaining connection-specific state, in-memory queued tasks per user.

Common designs to combine strengths

  • Keep core logic stateless; store session or user state in a shared store (Redis, DB).

  • Use stateful servers only where needed and front them with a load balancer and sticky sessions, or use a dedicated routing layer.


Stateless vs Stateful

Aspect Stateless Stateful
Request handling Any instance can handle any request Requests may need a specific instance
Horizontal scaling Easy Harder (needs sticky routing / replication)
Failure recovery Simple (restart any instance) Complex (must recover in-memory state)
Session storage External (DB/Redis) Often in-process memory
Use cases REST APIs, microservices, serverless Game servers, long-lived sockets, session-heavy apps
Best for High scale, simple deployment Low-latency, connection-oriented workloads

Scaling Node.js With the Cluster Module

What Cluster does

Node.js Cluster runs multiple Node.js worker processes that share the same server port.

Each worker is a separate process using its own event loop and memory - lets you use multi-core machines.

master process distributes connections to workers (Node handles the LB across workers).

Benefits

  • Simple way to utilize all CPU cores on one machine.

  • Auto-restart crashed workers (supervisor behavior via master).

Limitations & considerations

  • Works only on a single host. To scale across machines you still need a load balancer or service mesh.

  • Workers do not share memory — use Redis/DB for shared state.

  • Not a replacement for process managers (PM2, Docker orchestration). Use cluster + process manager or container orchestrator.

  • Sticky sessions: if your app is stateful (needs the same worker), Node’s default distribution is not sticky. You may need an external sticky LB or a custom master distribution.

Alternatives / complements

  • PM2 cluster mode (manages forks and restarts).

  • Worker threads (for CPU-bound tasks within one process).

  • Multiple containers / pods behind a load balancer for multi-host scaling (Kubernetes, ECS).

Example

Scaling Stateless Backend: Load Balancer Patterns

Why load balancers

  • Distribute incoming requests across multiple backend instances.

  • Improve availability (health checks, failover).

  • Centralize cross-cutting concerns (SSL termination, rate limits, authentication gateways).

Types of load balancers (simple)

  • Layer 4 (Transport): TCP/UDP level; efficient, fast. e.g., many cloud LBs in TCP mode.

  • Layer 7 (Application): HTTP-aware; can route by URL, headers; supports SSL termination and advanced routing. e.g., NGINX, HAProxy, ALB (AWS).

Key features to use for stateless services

  • Health checks: automatically remove unhealthy instances.

  • Auto-scaling integration: add/remove instances based on metrics.

  • SSL/TLS termination: offload crypto to LB to simplify backends.

  • HTTP routing and path-based rules: send traffic to appropriate services.

  • Sticky sessions: not needed for stateless backends — avoid when possible.

  • Rate limiting + WAF: protect backends from abuse and attacks.

  • Connection reuse and keepalive: improve latency and resource usage.

Load balancing algorithms

  • Round robin: simple, evenly distributes.

  • Least connections: prefers less-busy instances (good for uneven request durations).

  • IP hash: deterministic routing (can be used for “soft” stickiness).

  • Weighted routing: prefer stronger/faster instances.

Sticky sessions and statelessness

Avoid sticky sessions for stateless services — they reduce scheduler flexibility and can cause hot spots.

If you must support sessions: store session in a fast shared store (Redis) and keep backends stateless.

SSL Termination on Load Balancer

Rate Limiting on Load Balancer Level

DDos Protection on Load Balancer Level

Auto Scaling VS Mannual Scaling

Sticky Backend (Sticky Connection)

Scaling Stateful Backend Server

Mostly done manually

ASGs/MIGs let you do metric based scaling

K8s has HPA that lets you scale pods and nodes

Karpenter lets you scale node pools based on applications

Stopping bot abuse and bulk buying for high-demand releases like tickets and sneaker drops

Distributed bots use many IPs, like proxies and botnets, making IP limits useless. Attackers can bypass IP or session limits with multiple accounts or stolen credentials. Bots mimic human-like requests to avoid detection.

CAPTCHA farms let attackers use real people to solve challenges, making low-rate defenses weak. Strict limits can hurt real users during traffic spikes, so thresholds are kept low, which attackers exploit. Attackers also spread requests over time and different endpoints to avoid simple rate limits.

CAPTCHAs + rate limiting - why combine them

  • Layered defense: rate limits slow mass requests; CAPTCHAs raise attacker cost by requiring human (or costly solver) involvement.

  • Adaptive application: show CAPTCHAs only after suspicious behavior or rate-thresholds, reducing friction for normal users.

  • Reduce solver ROI: limits constrain how many CAPTCHA solves an attacker can use, making bulk buys expensive or slow.

  • Complementary signals: CAPTCHA failures, challenge frequency, and rate-limit hits feed a risk score for stronger actions (block, require verification, queue).

Combine adaptive rate limiting, targeted CAPTCHAs, behavioral analysis, and policies (per-user quotas, verified accounts, lotteries/queues) for effective protection without wrecking UX.

How to Add Frontend Page Captcha with Cloudflare

Cloudflare provides an easy way to protect your frontend pages from spam, bots, fake signups, credential stuffing, and automated attacks using Turnstile Captcha. Unlike traditional captchas, Cloudflare Turnstile focuses on user privacy and smooth user experience.

Steps to Add Cloudflare Turnstile Captcha

1. Create a Turnstile Site

  1. Go to Cloudflare Turnstile

  2. Login to your Cloudflare account

  3. Create a new Turnstile widget

  4. Add your domain

  5. Copy:

    • Site Key

    • Secret Key

2. Add Captcha Widget in Frontend

<form id="signup-form">
  <input type="email" placeholder="Enter Email" />

  <div
    class="cf-turnstile"
    data-sitekey="YOUR_SITE_KEY">
  </div>

  <button type="submit">Submit</button>
</form>

<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>

3. Verify Captcha on Backend

Never trust only frontend validation. Always verify captcha response from backend.

Example using Node.js:

const axios = require("axios");

app.post("/submit", async (req, res) => {

   const token = req.body["cf-turnstile-response"];

   const response = await axios.post(
      "https://challenges.cloudflare.com/turnstile/v0/siteverify",
      new URLSearchParams({
         secret: process.env.TURNSTILE_SECRET,
         response: token
      })
   );

   if(response.data.success){
      return res.send("Human Verified");
   }

   res.status(400).send("Captcha Failed");

});

Features of Cloudflare Turnstile

  • No annoying image puzzles in many cases

  • Privacy-focused alternative to reCAPTCHA

  • Reduces fake signup attacks

  • Works against automated scripts and bots

  • Easy frontend integration

  • Backend verification support

  • Supports invisible verification mode

Frontend Captcha vs Backend Captcha

Both frontend and backend captchas are important, but they serve different purposes.

Feature Frontend Captcha Backend Captcha Verification
Runs On Browser/UI Server
Purpose User interaction validation Actual security verification
Security Level Medium High
Can Be Bypassed Yes Harder
Prevents Fake Requests Partially Strongly
User Experience Visible to user Invisible
Recommended Yes Mandatory

Frontend Captcha

Frontend captcha is the visible challenge shown to the user before form submission.

Examples:

  • Checkbox verification

  • Image selection

  • Invisible behavioral detection

Limitation

Attackers can directly hit your backend API and bypass frontend checks if backend verification is missing.

Backend Captcha Verification

Backend verification validates the captcha token with Cloudflare servers.

Why It Is Important

Without backend verification:

  • Anyone can fake requests

  • Bots can bypass frontend UI

  • API endpoints remain vulnerable

Best Practice

Always use:

  1. Frontend captcha widget

  2. Backend token verification

  3. Rate limiting

  4. WAF protection

Together they create layered security.

Invisible Captchas vs Visible Captchas in Cloudflare

Cloudflare Turnstile supports both invisible and visible captcha modes.

Feature Invisible Captcha Visible Captcha
User Interaction Minimal Required sometimes
UX Experience Smooth Slight interruption
Bot Detection Behavioral analysis Challenge-based
Accessibility Better Moderate
Security Good Strong
Best Use Case Modern apps High-risk forms

Bot Protection with Cloudflare by Proxied Request

One of Cloudflare’s strongest security advantages is its reverse proxy architecture.

When Cloudflare proxy is enabled:

User → Cloudflare → Your Server

Your real server IP remains hidden behind Cloudflare infrastructure.

How Cloudflare Bot Protection Works

Cloudflare analyzes requests before they reach your server.

It checks IP reputation, request frequency, browser fingerprint, ASN/network reputation, bot signatures, JavaScript execution, TLS fingerprinting, and conducts behavioral analysis. Suspicious traffic can be blocked, challenged, rate limited, redirected, or logged.

Common Cloudflare Security Features

Feature Purpose
WAF Blocks malicious requests
Rate Limiting Prevents spam/flood attacks
Bot Protection Detects automated traffic
DDoS Protection Prevents server overload
SSL/TLS Encrypts traffic
CDN Improves speed globally

AWS Has Similar Service Called WAF

Amazon Web Services also provides security services similar to Cloudflare.

The major one is: AWS WAF helps filter and monitor HTTP requests reaching your applications.

AWS WAF vs Cloudflare

Feature Cloudflare AWS WAF
Works As Reverse Proxy CDN Cloud Firewall
DDoS Protection Built-in Via AWS Shield
Global CDN Included CloudFront needed
Bot Protection Strong built-in Additional configs
Setup Complexity Easier More AWS-oriented
Best For General web apps AWS ecosystem apps

When to Use Cloudflare

Use Cloudflare if:

  • You want easy setup

  • You need CDN + security together

  • You want origin IP protection

  • You need strong bot mitigation

  • You want fast global delivery

When to Use AWS WAF

Use AWS WAF if:

  • Your infrastructure is fully on AWS

  • You already use CloudFront

  • You need deep AWS integrations

  • You want custom enterprise security rules

How Rate Limiting Can Be Bypassed

Rate limiting controls incoming server requests to allocate resources fairly and prevent abuse.

However, attackers can bypass limits by distributing requests across multiple IPs or using botnets.

They might also exploit flaws in rate limiting logic, like resetting counters or targeting unprotected endpoints. Understanding these vulnerabilities is key to enhancing security.

What is CDN ?

A Content Delivery Network (CDN) is a system of distributed servers that work together to deliver web content quickly to users based on their geographic location.

By caching content closer to the user's location, CDNs reduce latency and improve load times. They also help handle large volumes of traffic and protect against certain types of cyber attacks by distributing the load across multiple servers.

This makes them essential for enhancing the performance and reliability of websites and applications.

How to Distribute React Frontend with CDN different ways

Scaling Databases

Serverless Databases

Managed Database

Self Hosted Databases

Good questions

  1. Distributed vs in mem cache

  2. Write first or read first

In memory Cache vs Redis Cache

Which is the Right operation ordering (client → cache → database)

Below are common candidate orders for handling a write when you have a client, a Redis cache, and a database.

  • Option A — Update the cache, then update the database, then return to the client.

  • Option B — Update the database, then update the cache, then return to the client.

  • Option C — Update the database, then invalidate (clear) the cache, then return to the client.

  • Option D — Invalidate (clear) the cache, then update the database, then return to the client.

Answer : Option D

Indexing in Databases

Indexing is a database optimization technique used to make data searching faster. Without indexing, the database scans every row one by one, which is called a Full Table Scan.

With indexing, the database can directly locate the required data, similar to how we use an index page in a book to quickly find a topic.

In real-world applications containing millions of records, indexing is extremely important because users expect very fast responses.

Most production databases heavily rely on indexes for performance optimization.

How Indexing Works

An index stores a special data structure (commonly B-Tree) that maintains references to the actual rows in sorted order. When a query searches indexed columns, the database engine uses the index instead of scanning the whole table.

Example:

SELECT * FROM users WHERE email = "test@gmail.com";

If email is indexed:

  • Database directly finds the row.

If email is not indexed:

  • Database checks every row manually.

Common Columns Used for Indexing

User ID

  • Email

  • Username

  • Phone Number

  • Frequently searched fields

Advantages of Indexing Disadvantages of Indexing
Faster search operations Requires additional storage
Improves query performance Insert/Update/Delete becomes slightly slower
Reduces response time Too many indexes can reduce write performance
Useful for filtering and sorting

Database Replicas

Database replication means creating copies of a primary database server. These copies are called replicas or read replicas. Replication is mainly used to improve scalability, availability, and performance in large-scale systems.

In most real-world applications, read operations are much higher than write operations. To reduce load on the primary database, read requests are distributed across multiple replicas.

Primary Database vs Read Replica

Database Type Purpose
Primary DB Handles write operations
Read Replica Handles read operations

95% Reads and 5% Writes Scenario

In applications like social media platforms, where 95% of traffic involves reading data and only 5% involves writing, multiple read replicas should be created to reduce the load on the primary database.

Replication Lag

Replication lag occurs because, in many systems, replication is asynchronous, meaning the primary database updates instantly while replicas receive updates after a short delay, which can cause replicas to become temporarily outdated during high traffic.

What Should Be the Configuration of Read Replicas?

In a system where most operations are read operations (for example 95% reads and 5% writes), the number of Read Replicas should be configured according to the read traffic load.

The general idea is:

  • More read traffic → More read replicas

  • Fewer read replicas → Higher load on each replica

If the replicas are not sufficient to handle incoming requests, the database can experience Replication Lag.

Traffic Type Suggested Configuration
Low read traffic 1–2 Read Replicas
Medium read traffic 3–5 Read Replicas
Very high read traffic Multiple replicas with load balancer

Real-World Example

User changes profile photo:

  • Primary DB updates immediately

  • Replica may still show old photo for few seconds


<img 
  src="database-replica-system-design.png" 
  alt="System design architecture showing one primary database and multiple read replicas handling read traffic"
/>
Advantages Disadvantages
Handles large read traffic Replication lag
Improves scalability Increased infrastructure cost
Better fault tolerance Data consistency challenges
Reduces database load Replicas may temporarily show outdated data
Read requests can be distributed across multiple servers More infrastructure management complexity
Improves application performance and response time Network synchronization overhead between databases

Archiving Old Data (Cold Storage)

As applications grow, databases store massive amounts of data such as logs, analytics, transactions, and user activity. Keeping all historical data inside the main database becomes expensive and affects performance.

To solve this problem, old or rarely accessed data is moved to cheaper storage systems called cold storage or archive storage.

AWS Glacier

A typical strategy involves storing recent data (0–1 year) in the main database and moving older data (1+ years) to Amazon S3 Glacier, which is designed for long-term storage, backup systems, and archived data, keeping the active database small and efficient.

Advantages Disadvantages
Reduces storage cost Archived data retrieval is slower
Improves DB performance Extra management complexity
Easier database scaling Data recovery may take additional time
Better long-term storage management Requires archive policies and monitoring
Keeps active database smaller and faster Accessing old data is less convenient
Helps optimize production database resources Additional storage architecture setup required
<img 
  src="aws-glacier-archive-system.png" 
  alt="Architecture diagram showing recent data in active database and old data archived to AWS Glacier"
/>

Sharding in Databases

Sharding is a database scaling technique where one large database is divided into multiple smaller databases called shards. Each shard stores only a portion of the total data.

Sharding is used when a single database server cannot handle massive traffic, storage, or millions of users.

Large-scale companies like:

  • Instagram

  • Netflix

  • Facebook

use sharding to scale their systems.

Types of Database Sharding

When a database becomes too large to handle on a single server, engineers divide the data into smaller parts called shards. This technique is known as Sharding. It helps applications scale horizontally and manage large amounts of traffic and data efficiently.

1. Range-Based Sharding

In range-based sharding, data is divided according to a specific range of values such as IDs, dates, or numbers. This method is simple because records are stored in sequential order.

Users 1-1000    → Shard 1
Users 1001-2000 → Shard 2
Users 2001-3000 → Shard 3

Advantages and Disadvantages

Advantages Disadvantages
Simple implementation Uneven load distribution
Easy to understand One shard may become overloaded
Easy debugging Hotspot problem can occur
Works well for ordered data Scaling becomes difficult later

2. Hash-Based Sharding

In hash-based sharding, a hash function decides which shard will store the data. This helps distribute records more evenly across multiple servers. This technique is commonly used in high-scale systems to balance traffic.

Advantages and Disadvantages

Advantages Disadvantages
Better load balancing Difficult resharding
Prevents hotspot issues Complex shard management
More even data distribution Harder debugging
Useful for high-scale systems Migration becomes difficult

3. Geo-Based Sharding

Geo-based sharding divides data according to geographic regions or user locations. Different regions use different database servers.

India Users  → India Server
US Users     → US Server
Europe Users → Europe Server

This approach improves speed for global users by keeping data closer to their location.

Advantages and Disadvantages

Advantages Disadvantages
Faster local response time Difficult cross-region queries
Reduced latency Complex data synchronization
Better user experience Higher infrastructure complexity
Improved regional performance Cross-region transactions are difficult

Problems with Sharding

1. Complex Queries

Joining data across shards becomes difficult and slow.

2. Rebalancing Issues

When one shard becomes overloaded, moving data between shards is complicated.

3. Cross-Shard Transactions

Transactions involving multiple shards are difficult to maintain consistently.

4. Hotspot Problem

Some shards may receive much higher traffic than others.

Example:

  • Viral users

  • Celebrity accounts

5. Increased Complexity

Application architecture becomes harder to manage.

Need:

  • Routing logic

  • Monitoring

  • Shard management