The 10 Horsemen of the Software Apocalypse

In software development, certain failures show up again and again: concurrency bugs, resource leaks, numeric overflows, cascade effects, and dependencies that spiral out of control. In this article I go through ten of the worst: Race Conditions, Deadlocks, Memory Leaks, Integer Overflow, Thundering Herd, Cache Stampede, SQL/NoSQL Injection, Buffer Overflow, Dependency Hell, and Retry Storms. They’re not theory; they’re failures that take down systems in production.

1. Race Condition

What it is: Two or more threads or processes access a shared resource, and the result depends on the order of execution. If that order isn’t guaranteed, the final state can be wrong.

Example: A shared counter incremented by two threads. Without synchronization, you can lose increments or read inconsistent values.

How to mitigate: Locks, atomic types, immutability, or designs that avoid shared state (e.g. queues and single-consumer workers).
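As a minimal sketch of the lock-based option, the shared counter from the example can be wrapped so that every increment happens under a lock. The names `SafeCounter` and `run_workers` are illustrative, not from any library.

```python
import threading


class SafeCounter:
    """Counter whose read-modify-write steps are serialized by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads could read the same old value
        # and both write back old + 1, losing one increment.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(counter: SafeCounter, n_threads: int = 4,
                increments_per_thread: int = 10_000) -> int:
    """Increment the counter from several threads and return the final value."""

    def work() -> None:
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run_workers(SafeCounter())` should always return `n_threads * increments_per_thread`. The trade-off is contention: under heavy load, designs that avoid the shared counter altogether may scale better.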

2. Deadlock

What it is: Two or more threads block each other because each is waiting for a resource held by another. No one makes progress.

Example: Thread A holds lock 1 and waits for lock 2; thread B holds lock 2 and waits for lock 1.

How to mitigate: Fixed order when acquiring locks, timeouts, avoiding multiple locks when possible, or try-lock and backoff patterns.
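The fixed-order rule can be sketched as a small helper that always acquires locks in the same global order, no matter how the caller lists them. Ordering by `id()` is an assumption made for this sketch; a real system might order by an explicit resource ID instead. `acquire_in_order` and `transfer_demo` are hypothetical names.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_in_order(*locks):
    """Acquire all given locks in one consistent global order (by id)."""
    # De-duplicate by id so passing the same lock twice cannot self-deadlock.
    ordered = [lock for _, lock in sorted({id(l): l for l in locks}.items())]
    for lock in ordered:
        lock.acquire()
    try:
        yield
    finally:
        # Release in reverse order of acquisition.
        for lock in reversed(ordered):
            lock.release()


def transfer_demo(rounds: int = 1000) -> dict:
    """Two threads move money in opposite directions without deadlocking."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    balances = {"a": 100, "b": 100}

    def move(src: str, dst: str, first, second) -> None:
        for _ in range(rounds):
            # The caller's argument order differs per thread, but the
            # helper normalizes it, so both threads lock in the same order.
            with acquire_in_order(first, second):
                balances[src] -= 1
                balances[dst] += 1

    t1 = threading.Thread(target=move, args=("a", "b", lock_a, lock_b))
    t2 = threading.Thread(target=move, args=("b", "a", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return balances
```

If each thread instead acquired its locks in the order it received them, this demo would reproduce the thread A / thread B situation from the example.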

3. Memory Leak

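What it is: Memory is allocated but never released, either because it is never freed or because lingering references keep it reachable. Over time the process’s memory use grows until it exhausts available RAM and crashes or gets killed by the OS.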

Example: Listeners or subscriptions that are never removed, unbounded caches, references that keep objects alive unnecessarily.

How to mitigate: Clear lifecycle (register and unregister), bounded caches, profiling and memory monitoring in staging/production.
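One way to picture a bounded cache is a least-recently-used (LRU) policy on top of an ordered dictionary. This is a minimal sketch; the class name `BoundedCache` and the LRU eviction policy are choices made for the illustration.

```python
from collections import OrderedDict


class BoundedCache:
    """Cache that evicts the least recently used entry once it is full."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the entry as recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the "oldest" end until we are back under the limit.
        while len(self._data) > self._max:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

For caching function results, the standard library’s `functools.lru_cache(maxsize=...)` provides the same kind of bound without custom code.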

4. Integer Overflow / Underflow

What it is: An integer calculation goes beyond the representable range (overflow) or below it (underflow). In languages without automatic checks, the value “wraps” and can be used in indices, sizes, or money with catastrophic results.

Example: total = quantity * price computed in a narrow integer type (e.g. 16 or 32 bits) can overflow; an index that underflows below zero can lead to an out-of-bounds write.

How to mitigate: Use larger types when needed, validate ranges before critical operations, and where the language allows, checked arithmetic or big-integer libraries for money and large counts.
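Python integers don’t overflow on their own, so the sketch below simulates a signed 32-bit type to show both the silent wrap and a checked alternative. The helper names `wrap_int32` and `checked_mul_int32` are hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def checked_mul_int32(a: int, b: int) -> int:
    """Multiply and raise instead of wrapping if the result leaves int32."""
    result = a * b  # exact, because Python ints have arbitrary precision
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} * {b} does not fit in a signed 32-bit integer")
    return result


quantity, price = 100_000, 50_000
wrapped_total = wrap_int32(quantity * price)  # 705032704, not 5000000000
```

Here the true total is 5,000,000,000, but a wrapping 32-bit multiplication would yield 705,032,704. `checked_mul_int32(quantity, price)` raises `OverflowError` instead of returning that wrong value.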

5. Thundering Herd

What it is: Many clients or processes react to the same event (e.g. lock release or cache entry expiry) and all hit the same resource at once. The result is a massive spike in load that can take the system down.

Example: Thousands of workers sleeping on a lock; when it’s released, they all wake up and hit the database at the same time.

How to mitigate: Queues, exponential backoff, a single “leader” that renews or refreshes, or randomized retry delay to spread load.
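The randomized-delay idea can be sketched as follows: instead of every worker acting the moment the event fires, each one waits a random extra amount within a spread window. `jittered_delay`, `wake_times`, and the 10-second window are assumptions for illustration.

```python
import random


def jittered_delay(base_seconds: float, spread_seconds: float, rng=None) -> float:
    """Return base_seconds plus a uniform random extra in [0, spread_seconds]."""
    if rng is None:
        rng = random
    return base_seconds + rng.uniform(0.0, spread_seconds)


def wake_times(n_workers: int, spread_seconds: float, seed: int = 0) -> list:
    """Simulate when each worker would act after a shared event at t=0."""
    rng = random.Random(seed)  # seeded only to make the simulation repeatable
    return sorted(jittered_delay(0.0, spread_seconds, rng) for _ in range(n_workers))


# 1000 workers spread over a 10-second window instead of all acting at t=0.
times = wake_times(1000, 10.0)
```

In a real worker, each process would call something like `time.sleep(jittered_delay(0, 10))` before touching the database. The window size is a trade-off between peak load and added latency.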

6. Cache Stampede

What it is: A cache key expires; many requests arrive at once, miss the cache, and all trigger the same expensive operation (e.g. a heavy query). The backend gets overloaded.

Example: Homepage cache that expires at midnight; thousands of users load the page and all regenerate the same content at once.

How to mitigate: Single-flight (only one request regenerates, others wait), probabilistic early expiration, or background refresh before expiry (stale-while-revalidate).
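Single-flight can be sketched with a lock and one event per in-progress key: the first caller for a key becomes the leader and runs the expensive function; callers that arrive meanwhile wait and reuse its result. This is a simplified illustration with hypothetical names (`SingleFlight`, `do`), not a production implementation.

```python
import threading


class _Call:
    """State of one in-flight computation."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Ensure only one concurrent execution of fn per key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in progress

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call

        if leader:
            try:
                call.result = fn()
            except BaseException as exc:
                call.error = exc
            finally:
                # Remove the entry first so later callers start a fresh flight,
                # then wake everyone who was waiting on this one.
                with self._lock:
                    del self._calls[key]
                call.done.set()
        else:
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

A request handler would call something like `flight.do("homepage", regenerate_homepage)` on a cache miss, where `regenerate_homepage` stands in for the heavy query. Note that this only collapses calls within one process; coordinating several servers needs a shared lock or a similar mechanism.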

7. SQL / NoSQL Injection

What it is: User input is concatenated directly into a query (SQL or NoSQL) without escaping or parameters. An attacker can inject logic and read, modify, or delete data.

Example: "SELECT * FROM users WHERE id = " + userInput allows injecting 1 OR 1=1 or full statements.

How to mitigate: Always parameterized queries or ORMs that use them; input validation and sanitization; least privilege in the DB; never trust user input inside the query.
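To make the difference concrete, here is a small sketch using Python’s built-in `sqlite3` module with an in-memory database. The `users` table contents and the function names are made up for the illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny users table (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, user_input: str) -> list:
    # VULNERABLE: the input becomes part of the SQL text itself.
    query = "SELECT id, name FROM users WHERE id = " + user_input
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, user_input: str) -> list:
    # The input is bound as a value; it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_input,)
    ).fetchall()


conn = make_demo_db()
leaked = find_user_unsafe(conn, "1 OR 1=1")   # returns every row
blocked = find_user_safe(conn, "1 OR 1=1")    # returns no rows
```

With the concatenated query, the input `1 OR 1=1` changes the logic of the statement and every user comes back. With the parameterized query, the same string is just a value that matches no ID.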

8. Buffer Overflow

What it is: More data is written than the allocated buffer can hold. In C/C++ and low-level code, that can overwrite adjacent memory (variables, return addresses) and open the door to exploits.

Example: strcpy(dest, userInput) with userInput longer than dest.

How to mitigate: Use safe APIs (bounded copies, known sizes), languages or routines that check bounds; in modern code, prefer memory-safe languages (Rust, Go, etc.) where possible.

9. Dependency Hell

What it is: A project accumulates many dependencies and runs into incompatible versions (A wants lib X 1.0, B wants X 2.0), breaking updates, or huge, fragile dependency trees. Builds or deploys become impossible or unreproducible.

Example: Updating one library for a security fix and finding another depends on an old version with a different API.

How to mitigate: Lockfiles (package-lock.json, yarn.lock, Cargo.lock, etc.), fewer direct dependencies, incremental review and upgrades, and tooling that detects conflicts and vulnerabilities.

10. Retry Storms

What it is: A service degrades or goes down; clients retry over and over. The volume of retries multiplies load on the failing service and shared dependencies, preventing recovery.

Example: An API slows down; every client retries every few seconds; the request volume quickly multiplies (e.g. 10x) and the system fully collapses.

How to mitigate: Exponential backoff and jitter on retries, circuit breakers to stop calling when the service is down, concurrency limits and queues, and clear timeouts so connections don’t hang.
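The circuit breaker idea can be sketched as a small wrapper that counts consecutive failures and, past a threshold, rejects calls immediately for a cooldown period. The class names, the threshold, and the cooldown values are assumptions for this illustration, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_after_seconds: float = 30.0,
                 clock=time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                # Fail fast: do not add load to a service that is down.
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown elapsed: let one trial call through ("half-open").
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A client would wrap each outbound request, e.g. `breaker.call(lambda: fetch_profile(user_id))`, where `fetch_profile` is a placeholder for the real call. This complements backoff with jitter rather than replacing it: backoff spaces out retries, while the breaker stops them entirely while the service is clearly down.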

My personal perspective

These “horsemen” aren’t edge cases; they’re structural risks. Race conditions and deadlocks appear when there’s concurrency and shared state; memory leaks and integer overflow, when limits and lifecycles aren’t considered; thundering herd and cache stampede, when many actors react to the same event; injection and buffer overflow, when user input is trusted without defenses; dependency hell and retry storms, when dependencies and failure modes aren’t managed explicitly.

Mitigation isn’t “avoid concurrency” or “don’t use dependencies”; it’s designing with these failures in mind: less shared state, more immutability, input validation and parameterized queries, limits and backoff, and observability to catch problems before it’s too late. Knowing these ten helps you anticipate where a system can break and make safer design choices.