Expand SQLite3 data validation by PimSanders · Pull Request #23 · fox-it/dissect.database

PimSanders · 2026-02-02T15:30:23Z

This PR close #16 by expanding the data validation capabilities in SQLite3.

The SQLite3 WAL file can store multiple versions of the same frame, when reading only valid frames should be returned. The docs define a valid frame as follows:

A frame is considered valid if and only if the following conditions are true:

The salt-1 and salt-2 values in the frame-header match salt values in the wal-header

The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes and the content of all frames up to and including the current frame.

The first check was already implemented, I have interpreted the second check as:

The checksum values in the final 8 bytes of the frame-header (checksum-1 and checksum-2) exactly match the computed checksum over:

the first 24 bytes of the WAL header

the first 8 bytes of each frame header (up to and including this frame)

the page data of each frame (up to and including this frame)

When initializing a database the option validate_checksum can be passed to use the new validation. I have chosen to only calculate the salts by default (just like before) as this will probably be good enough, and a lot faster. See the example below for the time impact:

In [1]: %timeit -n10 list(list(sqlite3.SQLite3(Path("./big.sqlite"), Path("./big.sqlite-wal"), validate_checksum=False).tables())[0].rows())
33 ms ± 1.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [2]: %timeit -n10 list(list(sqlite3.SQLite3(Path("./big.sqlite"), Path("./big.sqlite-wal"), validate_checksum=True).tables())[0].rows())
1.05 s ± 4.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

codecov · 2026-02-02T15:33:06Z

Codecov Report

❌ Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (64ae6d8) to head (f4b6ffb).
⚠️ Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
dissect/database/sqlite3/wal.py	0.00%	22 Missing ⚠️
dissect/database/sqlite3/sqlite3.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##            main     #23    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files        146     150     +4     
  Lines       3881    4086   +205     
======================================
- Misses      3881    4086   +205

Flag	Coverage Δ
unittests	`0.00% <0.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codspeed-hq · 2026-02-02T15:48:05Z

Merging this PR will not alter performance

✅ 6 untouched benchmarks

_{Comparing PimSanders:improvement/expand-wal-validation (f4b6ffb) with main (6149d6f)¹}

No successful run was found on main (798cf10) during the generation of this report, so 6149d6f was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

Schamper

Maybe add a benchmark test too? I'll look at the actual checksum checking part later when I have a bit more time.

dissect/database/sqlite3/sqlite3.py