SQLite for Continuous Ingestion — They Do Things Differently There


SQLite is the right database for most desktop applications. It’s embedded, zero-config, transactional, and fast enough for almost any local workload. The problem is that its out-of-the-box defaults were calibrated for a different era and a different access pattern — single-process interactive queries, not continuous multi-threaded ingestion pipelines.

The cracks appear the moment you have more than two threads writing simultaneously. Reads blocking on writes. Periodic fsync spikes visible as battery drain. Query latency climbing as tables grow. The workload here is a local-first desktop tool that continuously ingests events from multiple sources: filesystem watchers, API pollers, behavioral samplers firing at a fixed interval.

You can fix SQLite for this. The fix is a small set of pragmas applied at connection open time — none of them obscure, all of them documented, the combination just not obvious until you’ve hit the problems they solve. Going through that exercise also clarified when the answer is something else entirely: redb, a pure-Rust embedded database that gets the concurrency properties right by construction rather than configuration, at the cost of giving up SQL.

Both are worth knowing.

Act I: The Default Journal Mode #

SQLite’s default mode is rollback journaling. On a write, SQLite copies the original pages to a journal file, writes new data to the main database, then deletes the journal on commit. Safe, simple, correct. Also completely serializing: a write in progress blocks readers, a reader holding a shared lock makes writers wait.

For a desktop app with a background ingestion thread and a UI thread reading to render, this means your interface freezes on every background commit. Under any sustained write load it stops being intermittent. It’s constant.

WAL mode — Write-Ahead Logging — fixes this at the architecture level. Writes append to a separate WAL file rather than modifying the database directly. Readers hold a snapshot of the last checkpointed state and apply any relevant WAL pages on top. Readers and writers operate on different views of the data simultaneously, without blocking each other.

PRAGMA journal_mode=WAL;

This is the single most impactful change. Everything else is tuning. This one changes the concurrency model.

There’s a side-effect worth knowing before you ship. WAL mode turns your single database.db file into three: database.db, database.db-wal, and database.db-shm. The WAL file holds pending writes; the SHM file is a shared memory index for coordinating readers. If you copy just database.db for backup or migration, you may be missing data. Before any backup operation, checkpoint to flush everything back into the main file:

PRAGMA wal_checkpoint(TRUNCATE);

Act II: The fsync Tax #

The default synchronous=FULL calls fsync after every write. For a financial ledger this is the right call — it guarantees durability against OS crashes. For an ingestion buffer where the source data is recoverable from origin (the Slack thread still exists in Slack, the GitHub event still exists in the API, the filesystem change is still on disk), it’s an expensive guarantee you did not ask for.

synchronous=NORMAL skips some intermediate fsyncs but still syncs at WAL checkpoints. You can lose at most the current WAL window on an OS crash, and even that is recoverable from source. The throughput difference on a modern SSD is roughly 3–5× for small sequential writes. On battery the impact is more significant — each unnecessary fsync is a disk wakeup.

PRAGMA synchronous=NORMAL;

Act III: The Page Size Gotcha #

SQLite’s default page size is 4096 bytes. For an ingestion workload with moderate row sizes and frequent sequential scans — “give me the last N events”, “scan all snapshots for this pattern” — 4096 means more B-tree levels and more page reads for the same data.

8192 doubles the rows per page and reduces tree depth. Sequential scans touch fewer pages; the page cache holds more data per entry.

PRAGMA page_size=8192;

The catch, and it is a serious one: this pragma has no effect if any tables already exist. It must be set before schema initialization on a new database. In Rust:

fn open(path: &Path) -> Result<Self> {
    let is_new = !path.exists();
    let conn = Connection::open(path)?;

    if is_new {
        conn.execute_batch("PRAGMA page_size=8192;")?;
    }

    conn.execute_batch("
        PRAGMA journal_mode=WAL;
        PRAGMA synchronous=NORMAL;
        -- rest of configuration
    ")?;

    Self::initialize_schema(&conn)?;
    Ok(Self { conn })
}

If you set it after tables exist, SQLite silently ignores it. No error. No warning. Wrong page size forever. The !path.exists() check before the first connection is the correct gate.

Act IV: The Rest of the Block #

Five additional settings round out the configuration for an ingestion-heavy workload:

PRAGMA cache_size=-64000;       -- 64MB page cache (negative = kilobytes)
PRAGMA temp_store=MEMORY;       -- intermediate results in RAM, not disk
PRAGMA mmap_size=268435456;     -- 256MB memory-mapped I/O
PRAGMA wal_autocheckpoint=1000; -- checkpoint every ~8MB of WAL at 8192 page size
PRAGMA foreign_keys=ON;         -- not on by default; should always be on

cache_size controls how many pages SQLite holds in memory. The default is 2MB — adequate for a small database, inadequate once analytical queries start spanning thousands of rows. 64MB is reasonable for a desktop app on a machine with 16GB RAM. The negative sign specifies kilobytes rather than a page count, making the setting machine-independent as page size varies.

temp_store=MEMORY moves SQLite’s internal temporary tables — used for intermediate results in joins, ORDER BY, GROUP BY — from disk to RAM. For queries over large result sets this eliminates a class of disk I/O that otherwise appears as an unexplained slowdown partway through query execution.

mmap_size enables memory-mapped I/O up to the specified limit. Instead of read() syscalls, reads become memory accesses and the OS manages paging. The value is a ceiling: SQLite maps at most that much of the database file, and a smaller database is simply mapped in full. For sequential scan workloads this is meaningfully faster than buffered reads.

wal_autocheckpoint=1000 is the default, but worth being explicit about. With 8192-byte pages, 1000 pages is roughly 8MB of WAL before autocheckpoint — a fine cadence for most ingestion workloads. If writes are particularly bursty, lower it; if writes are infrequent and reads are latency-sensitive, raise it.

One more setting — not a data model pragma but essential for multi-threaded access:

conn.busy_timeout(Duration::from_millis(5000))?;

Without this, if two threads attempt a write simultaneously, the second receives SQLITE_BUSY immediately and you need retry logic everywhere. With busy_timeout, SQLite waits up to the specified duration before returning busy. For a desktop app with a small number of concurrent writers, 5 seconds means you should never see a legitimate busy error in practice — and if you do, it signals real contention worth investigating rather than a transient timing issue to paper over with a retry.

The Complete Constructor #

use std::path::Path;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use rusqlite::Connection;

impl SqliteStore {
    pub fn open(path: &Path) -> Result<Self> {
        let is_new = !path.exists();
        let conn = Connection::open(path)?;

        conn.busy_timeout(Duration::from_millis(5000))?;

        if is_new {
            // Must precede all table creation — silently ignored otherwise
            conn.execute_batch("PRAGMA page_size=8192;")?;
        }

        conn.execute_batch("
            PRAGMA journal_mode=WAL;
            PRAGMA synchronous=NORMAL;
            PRAGMA cache_size=-64000;
            PRAGMA temp_store=MEMORY;
            PRAGMA mmap_size=268435456;
            PRAGMA wal_autocheckpoint=1000;
            PRAGMA foreign_keys=ON;
        ")?;

        Self::initialize_schema(&conn)?;
        Ok(Self { conn: Arc::new(Mutex::new(conn)) })
    }
}

This belongs in the constructor, not scattered in application code. Every connection gets the right configuration automatically. These pragmas are not optional tuning — they are SQLite working correctly for this workload. The defaults just happen to target a different one.

Why SQLite Ships This Way #

WAL was added in 2010. The SQLite authors are famously conservative about changing defaults because doing so breaks existing deployments that rely on implicit behavior. WAL remains opt-in even though for modern desktop app workloads it is almost always the right choice.

Apple’s Core Data stack has enabled WAL mode by default since iOS 7 / OS X 10.9. That is about as close to an official acknowledgment as you will get that these settings are not esoteric — they are sensible defaults that the library cannot adopt without breaking backward compatibility.

The pragmas above are not a workaround for SQLite limitations. They are SQLite working as designed. The defaults are aimed at a different target, and the library has to serve both.

Act V: When the Alternative Is Simpler by Design #

Working through the pragma configuration clarified something. What WAL mode gives you is a specific concurrency property: readers never block writers, writers never block readers. SQLite achieves this by configuration. redb achieves it structurally.

redb stores data in copy-on-write B-trees. A reader holds a snapshot of the tree at transaction start. A writer creates new versions of modified pages. Reader and writer are looking at different tree versions simultaneously with no locking between them. The concurrency property is not something you configure — it is how the storage model works.

// redb — no pragma block, no busy_timeout, no page_size ordering constraint
use redb::{Database, TableDefinition};

// The table's key/value types are fixed at compile time.
const EVENT_LOG: TableDefinition<u64, &[u8]> = TableDefinition::new("event_log");

impl EventStore {
    pub fn open(path: &Path) -> Result<Self> {
        let db = Database::create(path)?;

        let write_txn = db.begin_write()?;
        write_txn.open_table(EVENT_LOG)?;
        write_txn.commit()?;

        Ok(Self { db })
    }

    pub fn append(&self, offset: u64, payload: &[u8]) -> Result<()> {
        let write_txn = self.db.begin_write()?;
        {
            let mut table = write_txn.open_table(EVENT_LOG)?;
            table.insert(offset, payload)?;
        }
        write_txn.commit()?;
        Ok(())
    }

    pub fn read_from(&self, from: u64, limit: usize) -> Result<Vec<(u64, Vec<u8>)>> {
        let read_txn = self.db.begin_read()?;
        let table = read_txn.open_table(EVENT_LOG)?;
        table.range(from..)?
            .take(limit)
            .map(|r| r.map(|(k, v)| (k.value(), v.value().to_vec())))
            .collect()
    }
}

No pragma block. No busy_timeout. No page size ordering constraint. The table type is checked at compile time — EVENT_LOG is a TableDefinition<u64, &[u8]> and the compiler enforces it throughout. The constructor is open and go.

The backup story is also cleaner. redb is a single file. No -wal and -shm companions to track. A file copy is always a consistent snapshot because copy-on-write writes never partially overwrite committed data.

Where SQLite still wins. redb is a key-value store. There is no SQL, no joins, no WHERE timestamp > x AND source_id = y. If your access patterns require relational queries — aggregations over many columns, joins across tables, full-text search, complex filtering — SQLite with the pragma block is the right tool and redb cannot substitute for it.

The natural split for a multi-role storage layer is to use each where it fits: redb for the event log (append, read from offset, advance cursor), SQLite or a richer embedded store for anything requiring relational expressiveness. A trait boundary between the two keeps the substitution localized; nothing upstream or downstream changes.
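As a sketch of that boundary — the trait name, methods, and the toy in-memory implementation below are illustrative, not from the original codebase:

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Minimal event-log boundary: the ingestion pipeline writes through this
/// trait and never sees whether redb, SQLite, or anything else sits behind it.
pub trait EventLog {
    fn append(&self, offset: u64, payload: &[u8]) -> Result<(), String>;
    fn read_from(&self, from: u64, limit: usize) -> Result<Vec<(u64, Vec<u8>)>, String>;
}

/// Toy in-memory implementation, just to show the shape of the boundary.
pub struct MemLog(Mutex<BTreeMap<u64, Vec<u8>>>);

impl EventLog for MemLog {
    fn append(&self, offset: u64, payload: &[u8]) -> Result<(), String> {
        self.0.lock().unwrap().insert(offset, payload.to_vec());
        Ok(())
    }

    fn read_from(&self, from: u64, limit: usize) -> Result<Vec<(u64, Vec<u8>)>, String> {
        Ok(self.0.lock().unwrap().range(from..)
            .take(limit)
            .map(|(k, v)| (*k, v.clone()))
            .collect())
    }
}
```

Swapping a redb-backed implementation for a SQLite-backed one is then a change local to one module; callers compile unchanged.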

The Rule of Thumb #

If your storage role is append-heavy with simple read patterns and no SQL requirement, redb gives you the correct concurrency model without any configuration ceremony. If you need SQL, use SQLite and apply the pragma block above — it is SQLite working as it should, just not as it ships by default.

The defaults are set for a different world. For this one, you have to bring your own.