Post

What Is a Database, Really? The Invariants Behind Every Query

A database is not a spreadsheet with an API. It's a system that makes promises about your data — and spends extraordinary effort keeping them.

What Is a Database, Really? The Invariants Behind Every Query

Every application you’ve ever built stores state somewhere. A JSON file. A Redis key. A row in PostgreSQL. But there’s a difference between storing data and trusting that the data is correct after a crash, a concurrent write, and a disk failure — all at the same time.

That difference is a database.

A database is not a spreadsheet with an API. It’s not a fancy file. It’s a system that makes invariantspromises about your data that hold true no matter what the machine, the network, or your application does. You insert a row inside a transaction, the database commits, and the power goes out one millisecond later. That row is there when you come back. That’s not a feature. That’s a promise. And the entire architecture of the database exists to keep it.

This series is about those promises: what a database guarantees, how it enforces each guarantee, and what breaks when it can’t.

graph TB
    subgraph App["Your Application"]
        Q1["INSERT INTO orders ..."]
        Q2["SELECT * FROM orders ..."]
        Q3["UPDATE orders SET ..."]
    end

    subgraph DB["Database"]
        direction LR
        QP["Query Processor"]
        SE["Storage Engine"]
        TM["Transaction Manager"]
        BM["Buffer Manager"]
    end

    subgraph HW["Storage"]
        DISK["Disk / SSD"]
        WAL["Write-Ahead Log"]
    end

    App -->|SQL| DB
    QP --> SE
    SE --> BM
    TM --> WAL
    BM --> DISK

    style DB fill:#1a1a2e,stroke:#e94560,color:#fff
    style HW fill:#0f3460,stroke:#16213e,color:#fff

Four Promises, One Acronym

You’ve heard of ACID. Most developers know it as a checkbox on a comparison chart: “PostgreSQL is ACID-compliant.” But ACID isn’t a feature list. It’s four invariants, each one solving a specific failure mode that will destroy your data if left unaddressed.

Atomicity: All or Nothing

A transaction does ten things. Either all ten happen, or none of them do. There is no state where five happened and five didn’t.

1
2
3
4
5
6
7
8
9
10
11
12
13
// Transfer $500 from account A to account B
//
// Without atomicity:
//   1. Subtract $500 from A  ✓
//   2. — crash —
//   3. Add $500 to B         ✗ (never executed)
//   Result: $500 vanished from the system
//
// With atomicity:
//   1. Subtract $500 from A  (logged, not yet committed)
//   2. — crash —
//   On recovery: transaction was not committed → roll back step 1
//   Result: both accounts unchanged. Money is conserved.

This has nothing to do with concurrency. Atomicity is about crash recovery. The database must guarantee that a half-finished transaction leaves no trace. The mechanism? A write-ahead log (WAL) that records every change before it’s applied. On crash, the database replays committed transactions and rolls back uncommitted ones.

The atomicity invariant: a transaction either completes entirely or has no effect. There is no partial state.

Consistency: Rules That Cannot Be Broken

Your schema defines rules. A primary key is unique. A foreign key references an existing row. A balance cannot be negative. The database enforces these on every transaction, every write, every statement.

1
2
3
4
5
6
7
8
9
10
11
12
// You define a constraint:
// CREATE TABLE accounts (
//     id SERIAL PRIMARY KEY,
//     balance DECIMAL NOT NULL CHECK (balance >= 0)
// );
//
// Now try:
// UPDATE accounts SET balance = -100 WHERE id = 1;
//
// The database rejects this. Not your application code.
// Not a validation library. The database itself.
// The constraint is an invariant — it cannot be violated.
graph LR
    T["Transaction"] -->|"applies changes"| S1["State A (valid)"]
    S1 -->|"constraints checked"| S2["State B (valid)"]
    T -.->|"constraint violated"| REJECT["ROLLBACK"]

    style S1 fill:#16c79a,stroke:#16c79a,color:#fff
    style S2 fill:#16c79a,stroke:#16c79a,color:#fff
    style REJECT fill:#e94560,stroke:#e94560,color:#fff

The consistency invariant: every transaction moves the database from one valid state to another. No constraint is ever violated, not even temporarily (within the transaction’s visible scope).

Isolation: Transactions Don’t Interfere

Two transactions run concurrently. Each one behaves as if it were the only transaction in the system. One transaction’s uncommitted writes are invisible to another.

This is the hardest invariant to enforce and the most expensive. Full isolation (serializability) means concurrent transactions produce the same result as if they ran one at a time. Databases offer weaker levels — Read Committed, Repeatable Read — that trade safety for speed. Every level is a different point on the isolation-performance curve.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Transaction 1 (T1):               Transaction 2 (T2):
// BEGIN;                             BEGIN;
// SELECT balance FROM accounts
//   WHERE id = 1;  → $1000
//                                    UPDATE accounts
//                                      SET balance = 500
//                                      WHERE id = 1;
//                                    COMMIT;
// SELECT balance FROM accounts
//   WHERE id = 1;  → ???
//
// Under READ COMMITTED:   T1 sees $500 (reads committed data)
// Under REPEATABLE READ:  T1 sees $1000 (snapshot from start)
// Under SERIALIZABLE:     T1 sees $1000 (full isolation)
graph TB
    subgraph Levels["Isolation Levels"]
        direction LR
        RC["Read Committed"]
        RR["Repeatable Read"]
        SR["Serializable"]
    end

    RC -->|"more isolation"| RR
    RR -->|"more isolation"| SR

    PERF["Performance"] -.->|"decreases"| SR
    SAFE["Safety"] -.->|"increases"| SR

    style RC fill:#e94560,stroke:#e94560,color:#fff
    style RR fill:#c77a30,stroke:#c77a30,color:#fff
    style SR fill:#16c79a,stroke:#16c79a,color:#fff

The isolation invariant: concurrent transactions do not corrupt each other’s results. The strength of the guarantee depends on the isolation level — but at every level, the database prevents specific, named anomalies.

Durability: Surviving the Crash

Once a transaction commits, the data survives any failure — power loss, kernel panic, disk controller crash. The database said “committed,” and it meant it.

This sounds like the persistence invariant from operating systems, and it is — but harder. The OS promises that fsync makes data durable. The database must make transactions durable, which means ordering writes correctly, flushing the WAL before acknowledging the commit, and recovering consistently after a crash.

1
2
3
4
5
6
7
8
9
10
11
12
// The durability sequence:
//
// 1. Transaction writes changes to WAL (in memory)
// 2. WAL is flushed to disk (fsync)
// 3. Database returns "COMMIT OK" to the client
// 4. — crash can happen here —
// 5. On recovery: replay WAL → all committed data is restored
//
// The critical ordering:
//   WAL must hit disk BEFORE the commit acknowledgment.
//   If you ack first and crash before flush, you lied.
//   The client thinks the data is safe. It isn't.
sequenceDiagram
    participant C as Client
    participant DB as Database
    participant WAL as WAL (disk)
    participant D as Data Files (disk)

    C->>DB: COMMIT
    DB->>WAL: Write transaction log
    WAL-->>DB: fsync complete
    DB->>C: COMMIT OK ✓

    Note over C: Client can trust the data is safe

    DB->>D: Write data pages (async, later)
    Note over D: Actual data files updated lazily

    Note over WAL,D: If crash happens before data files update:<br/>Recovery replays WAL → data is restored

The durability invariant: once a transaction is committed, no failure can lose it. The WAL is the mechanism. The fsync before the ack is the enforcement. Skip either one and you have a database that lies about commits.

The Architecture of Trust

Every database, from SQLite to PostgreSQL to CockroachDB, is built from the same fundamental components. The names change. The invariants don’t.

graph TB
    subgraph Client["Client Layer"]
        SQL["SQL / Wire Protocol"]
    end

    subgraph Query["Query Processing"]
        PARSE["Parser"]
        OPT["Optimizer"]
        EXEC["Executor"]
    end

    subgraph Storage["Storage Engine"]
        BUF["Buffer Pool"]
        IDX["Index Manager (B-Tree)"]
        TXN["Transaction Manager"]
        LOG["WAL / Log Manager"]
    end

    subgraph Disk["Persistent Storage"]
        DF["Data Files (pages)"]
        LF["Log Files (WAL)"]
    end

    SQL --> PARSE --> OPT --> EXEC
    EXEC --> BUF
    EXEC --> IDX
    EXEC --> TXN
    TXN --> LOG
    BUF --> DF
    LOG --> LF

    style Query fill:#1a1a2e,stroke:#e94560,color:#fff
    style Storage fill:#0f3460,stroke:#16213e,color:#fff
    style Disk fill:#16213e,stroke:#0f3460,color:#fff

Query Processor — parses SQL, optimizes the execution plan, runs the query. This is where the database decides how to answer your question — full table scan or index lookup, nested loop join or hash join.

Buffer Pool — an in-memory cache of disk pages. The database doesn’t read from disk on every query. It keeps hot pages in memory and evicts cold ones. Same idea as the OS page cache, but the database manages it directly because it knows its access patterns better than the OS does.

Index Manager — B-Trees (or LSM-Trees, or hash indexes) that let you find rows without scanning every page. An index on user_id means SELECT * FROM orders WHERE user_id = 42 touches a handful of pages instead of millions.

Transaction Manager — coordinates concurrent transactions, enforces isolation, manages locks or MVCC (Multi-Version Concurrency Control) snapshots.

WAL / Log Manager — the write-ahead log. Every change is logged before it’s applied. This is the foundation of both atomicity and durability.

Why Not Just Use Files?

You can store data in a file. serde_json can serialize any Rust struct to disk in three lines. Why bother with a database at all?

Because files don’t give you any of the four invariants.

ProblemFileDatabase
Crash during writePartial write, corrupted fileWAL ensures atomic recovery
Two processes write simultaneouslyLast writer wins, data lostTransactions serialize access
Find one record in a millionRead entire file, linear scanB-Tree index, O(log n)
Schema changesHope your parsing code handles itALTER TABLE, constraints enforced
Data exceeds RAMRewrite everything on every changeBuffer pool, page-level I/O
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// The "just use a file" approach:
use std::fs;
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct AppState {
    users: Vec<User>,
    orders: Vec<Order>,
}

fn save(state: &AppState) -> std::io::Result<()> {
    let json = serde_json::to_string(state)?;
    fs::write("state.json", json)?;  // ← NOT atomic
    Ok(())
}

// What happens when:
// - Power fails during fs::write? Corrupted file.
// - Two instances call save() at once? Last write wins.
// - You have 10 million orders? Serialize/deserialize ALL of them
//   just to update one.
// - You need orders by user_id? Linear scan every time.
//
// A database solves all of these. That's the point.

What This Series Will Cover

We’ll go deep into each layer of the database, starting from the bottom — where bytes meet disk — and working up to the query optimizer.

PostLayerThe Invariant
This postOverviewACID: the four promises
Storage & PagesDisk I/OFixed-size pages, no partial reads
Buffer PoolMemoryHot pages cached, eviction is invisible
B-TreesIndexingSorted, balanced, O(log n) always
WALRecoveryCommitted = durable, always
Transactions & MVCCConcurrencyReaders don’t block writers
Query ProcessingOptimizationSame result, minimum cost

The goal is not to memorize PostgreSQL internals. It’s to understand the invariants that every database must uphold — and then see how specific systems (PostgreSQL, SQLite, RocksDB) choose different mechanisms to enforce them. The invariants are universal. The implementations are tradeoffs.

Next up: disk I/O and pages — why databases don’t read bytes, they read pages, and why the fixed-size page is the most important data structure you’ve never thought about.


This is Post 1 of the series Invariants the Storage Engine Keeps — databases through the guarantees they make about your data, and what breaks when they can’t.

This post is licensed under CC BY 4.0 by the author.