What Is a Database, Really? The Invariants Behind Every Query
A database is not a spreadsheet with an API. It's a system that makes promises about your data — and spends extraordinary effort keeping them.
Every application you’ve ever built stores state somewhere. A JSON file. A Redis key. A row in PostgreSQL. But there’s a difference between storing data and trusting that the data is correct after a crash, a concurrent write, and a disk failure — all at the same time.
That difference is a database.
A database is not a spreadsheet with an API. It’s not a fancy file. It’s a system that makes invariants — promises about your data that hold true no matter what the machine, the network, or your application does. You insert a row inside a transaction, the database commits, and the power goes out one millisecond later. That row is there when you come back. That’s not a feature. That’s a promise. And the entire architecture of the database exists to keep it.
This series is about those promises: what a database guarantees, how it enforces each guarantee, and what breaks when it can’t.
graph TB
subgraph App["Your Application"]
Q1["INSERT INTO orders ..."]
Q2["SELECT * FROM orders ..."]
Q3["UPDATE orders SET ..."]
end
subgraph DB["Database"]
direction LR
QP["Query Processor"]
SE["Storage Engine"]
TM["Transaction Manager"]
BM["Buffer Manager"]
end
subgraph HW["Storage"]
DISK["Disk / SSD"]
WAL["Write-Ahead Log"]
end
App -->|SQL| DB
QP --> SE
SE --> BM
TM --> WAL
BM --> DISK
style DB fill:#1a1a2e,stroke:#e94560,color:#fff
style HW fill:#0f3460,stroke:#16213e,color:#fff
Four Promises, One Acronym
You’ve heard of ACID. Most developers know it as a checkbox on a comparison chart: “PostgreSQL is ACID-compliant.” But ACID isn’t a feature list. It’s four invariants, each one solving a specific failure mode that will destroy your data if left unaddressed.
Atomicity: All or Nothing
A transaction does ten things. Either all ten happen, or none of them do. There is no state where five happened and five didn’t.
1
2
3
4
5
6
7
8
9
10
11
12
13
// Transfer $500 from account A to account B
//
// Without atomicity:
// 1. Subtract $500 from A ✓
// 2. — crash —
// 3. Add $500 to B ✗ (never executed)
// Result: $500 vanished from the system
//
// With atomicity:
// 1. Subtract $500 from A (logged, not yet committed)
// 2. — crash —
// On recovery: transaction was not committed → roll back step 1
// Result: both accounts unchanged. Money is conserved.
This has nothing to do with concurrency. Atomicity is about crash recovery. The database must guarantee that a half-finished transaction leaves no trace. The mechanism? A write-ahead log (WAL) that records every change before it’s applied. On crash, the database replays committed transactions and rolls back uncommitted ones.
The atomicity invariant: a transaction either completes entirely or has no effect. There is no partial state.
Consistency: Rules That Cannot Be Broken
Your schema defines rules. A primary key is unique. A foreign key references an existing row. A balance cannot be negative. The database enforces these on every transaction, every write, every statement.
1
2
3
4
5
6
7
8
9
10
11
12
// You define a constraint:
// CREATE TABLE accounts (
// id SERIAL PRIMARY KEY,
// balance DECIMAL NOT NULL CHECK (balance >= 0)
// );
//
// Now try:
// UPDATE accounts SET balance = -100 WHERE id = 1;
//
// The database rejects this. Not your application code.
// Not a validation library. The database itself.
// The constraint is an invariant — it cannot be violated.
graph LR
T["Transaction"] -->|"applies changes"| S1["State A (valid)"]
S1 -->|"constraints checked"| S2["State B (valid)"]
T -.->|"constraint violated"| REJECT["ROLLBACK"]
style S1 fill:#16c79a,stroke:#16c79a,color:#fff
style S2 fill:#16c79a,stroke:#16c79a,color:#fff
style REJECT fill:#e94560,stroke:#e94560,color:#fff
The consistency invariant: every transaction moves the database from one valid state to another. No constraint is ever violated, not even temporarily (within the transaction’s visible scope).
Isolation: Transactions Don’t Interfere
Two transactions run concurrently. Each one behaves as if it were the only transaction in the system. One transaction’s uncommitted writes are invisible to another.
This is the hardest invariant to enforce and the most expensive. Full isolation (serializability) means concurrent transactions produce the same result as if they ran one at a time. Databases offer weaker levels — Read Committed, Repeatable Read — that trade safety for speed. Every level is a different point on the isolation-performance curve.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Transaction 1 (T1): Transaction 2 (T2):
// BEGIN; BEGIN;
// SELECT balance FROM accounts
// WHERE id = 1; → $1000
// UPDATE accounts
// SET balance = 500
// WHERE id = 1;
// COMMIT;
// SELECT balance FROM accounts
// WHERE id = 1; → ???
//
// Under READ COMMITTED: T1 sees $500 (reads committed data)
// Under REPEATABLE READ: T1 sees $1000 (snapshot from start)
// Under SERIALIZABLE: T1 sees $1000 (full isolation)
graph TB
subgraph Levels["Isolation Levels"]
direction LR
RC["Read Committed"]
RR["Repeatable Read"]
SR["Serializable"]
end
RC -->|"more isolation"| RR
RR -->|"more isolation"| SR
PERF["Performance"] -.->|"decreases"| SR
SAFE["Safety"] -.->|"increases"| SR
style RC fill:#e94560,stroke:#e94560,color:#fff
style RR fill:#c77a30,stroke:#c77a30,color:#fff
style SR fill:#16c79a,stroke:#16c79a,color:#fff
The isolation invariant: concurrent transactions do not corrupt each other’s results. The strength of the guarantee depends on the isolation level — but at every level, the database prevents specific, named anomalies.
Durability: Surviving the Crash
Once a transaction commits, the data survives any failure — power loss, kernel panic, disk controller crash. The database said “committed,” and it meant it.
This sounds like the persistence invariant from operating systems, and it is — but harder. The OS promises that fsync makes data durable. The database must make transactions durable, which means ordering writes correctly, flushing the WAL before acknowledging the commit, and recovering consistently after a crash.
1
2
3
4
5
6
7
8
9
10
11
12
// The durability sequence:
//
// 1. Transaction writes changes to WAL (in memory)
// 2. WAL is flushed to disk (fsync)
// 3. Database returns "COMMIT OK" to the client
// 4. — crash can happen here —
// 5. On recovery: replay WAL → all committed data is restored
//
// The critical ordering:
// WAL must hit disk BEFORE the commit acknowledgment.
// If you ack first and crash before flush, you lied.
// The client thinks the data is safe. It isn't.
sequenceDiagram
participant C as Client
participant DB as Database
participant WAL as WAL (disk)
participant D as Data Files (disk)
C->>DB: COMMIT
DB->>WAL: Write transaction log
WAL-->>DB: fsync complete
DB->>C: COMMIT OK ✓
Note over C: Client can trust the data is safe
DB->>D: Write data pages (async, later)
Note over D: Actual data files updated lazily
Note over WAL,D: If crash happens before data files update:<br/>Recovery replays WAL → data is restored
The durability invariant: once a transaction is committed, no failure can lose it. The WAL is the mechanism. The fsync before the ack is the enforcement. Skip either one and you have a database that lies about commits.
The Architecture of Trust
Every database, from SQLite to PostgreSQL to CockroachDB, is built from the same fundamental components. The names change. The invariants don’t.
graph TB
subgraph Client["Client Layer"]
SQL["SQL / Wire Protocol"]
end
subgraph Query["Query Processing"]
PARSE["Parser"]
OPT["Optimizer"]
EXEC["Executor"]
end
subgraph Storage["Storage Engine"]
BUF["Buffer Pool"]
IDX["Index Manager (B-Tree)"]
TXN["Transaction Manager"]
LOG["WAL / Log Manager"]
end
subgraph Disk["Persistent Storage"]
DF["Data Files (pages)"]
LF["Log Files (WAL)"]
end
SQL --> PARSE --> OPT --> EXEC
EXEC --> BUF
EXEC --> IDX
EXEC --> TXN
TXN --> LOG
BUF --> DF
LOG --> LF
style Query fill:#1a1a2e,stroke:#e94560,color:#fff
style Storage fill:#0f3460,stroke:#16213e,color:#fff
style Disk fill:#16213e,stroke:#0f3460,color:#fff
Query Processor — parses SQL, optimizes the execution plan, runs the query. This is where the database decides how to answer your question — full table scan or index lookup, nested loop join or hash join.
Buffer Pool — an in-memory cache of disk pages. The database doesn’t read from disk on every query. It keeps hot pages in memory and evicts cold ones. Same idea as the OS page cache, but the database manages it directly because it knows its access patterns better than the OS does.
Index Manager — B-Trees (or LSM-Trees, or hash indexes) that let you find rows without scanning every page. An index on user_id means SELECT * FROM orders WHERE user_id = 42 touches a handful of pages instead of millions.
Transaction Manager — coordinates concurrent transactions, enforces isolation, manages locks or MVCC (Multi-Version Concurrency Control) snapshots.
WAL / Log Manager — the write-ahead log. Every change is logged before it’s applied. This is the foundation of both atomicity and durability.
Why Not Just Use Files?
You can store data in a file. serde_json can serialize any Rust struct to disk in three lines. Why bother with a database at all?
Because files don’t give you any of the four invariants.
| Problem | File | Database |
|---|---|---|
| Crash during write | Partial write, corrupted file | WAL ensures atomic recovery |
| Two processes write simultaneously | Last writer wins, data lost | Transactions serialize access |
| Find one record in a million | Read entire file, linear scan | B-Tree index, O(log n) |
| Schema changes | Hope your parsing code handles it | ALTER TABLE, constraints enforced |
| Data exceeds RAM | Rewrite everything on every change | Buffer pool, page-level I/O |
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// The "just use a file" approach:
use std::fs;
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct AppState {
users: Vec<User>,
orders: Vec<Order>,
}
fn save(state: &AppState) -> std::io::Result<()> {
let json = serde_json::to_string(state)?;
fs::write("state.json", json)?; // ← NOT atomic
Ok(())
}
// What happens when:
// - Power fails during fs::write? Corrupted file.
// - Two instances call save() at once? Last write wins.
// - You have 10 million orders? Serialize/deserialize ALL of them
// just to update one.
// - You need orders by user_id? Linear scan every time.
//
// A database solves all of these. That's the point.
What This Series Will Cover
We’ll go deep into each layer of the database, starting from the bottom — where bytes meet disk — and working up to the query optimizer.
| Post | Layer | The Invariant |
|---|---|---|
| This post | Overview | ACID: the four promises |
| Storage & Pages | Disk I/O | Fixed-size pages, no partial reads |
| Buffer Pool | Memory | Hot pages cached, eviction is invisible |
| B-Trees | Indexing | Sorted, balanced, O(log n) always |
| WAL | Recovery | Committed = durable, always |
| Transactions & MVCC | Concurrency | Readers don’t block writers |
| Query Processing | Optimization | Same result, minimum cost |
The goal is not to memorize PostgreSQL internals. It’s to understand the invariants that every database must uphold — and then see how specific systems (PostgreSQL, SQLite, RocksDB) choose different mechanisms to enforce them. The invariants are universal. The implementations are tradeoffs.
Next up: disk I/O and pages — why databases don’t read bytes, they read pages, and why the fixed-size page is the most important data structure you’ve never thought about.
This is Post 1 of the series Invariants the Storage Engine Keeps — databases through the guarantees they make about your data, and what breaks when they can’t.
Stay in the loop
Subscribe via RSS to get new posts on systems, Rust, and cryptography.
Subscribe to RSS