TL;DR - RegreSQL brings PostgreSQL's regression testing methodology to your application queries, catching both correctness bugs and performance regressions before production.
As puzzling as it might seem, the common problem with production changes is the ever-present "aha" moment when things start slowing down or crashing right after deployment. Testing isn't easy as it is, but there's a widespread practice gap when it comes to testing SQL queries. Some pretend to "fix it" by using ORMs to abstract the problem away. Others treat SQL as "just glue code" that doesn't deserve systematic testing. Most settle for integration tests that verify the application layer works, never actually testing whether their queries will survive the next schema change or index modification.
For PostgreSQL development itself, the project has a robust regression test suite that has been preventing disasters in core development for decades. The database itself knows how to test SQL systematically - we just don't use those same techniques for our own queries. Enter RegreSQL, a tool originally created by Dimitri Fontaine for The Art of PostgreSQL book (which is excellent for understanding and mastering PostgreSQL as a database system), designed to bring the same regression testing framework to our application queries.
I've been trying to use it for some time, but due to missing features and limitations I gave up several times. Until now. I decided to fork the project and spend the time needed to take it to the next level.
Introduction
The RegreSQL promise starts with the biggest strength and perceived weakness of SQL queries: they are just strings. And unless you use something like sqlc (for Go), PG'OCaml, or Rust's SQLx toolkit to give you compile-time checking, your queries are validated only when they are executed. In the better case that means a usually slow-ish test suite or integration tests; in the worst case it means only after deployment. ORMs are another possibility - completely abstracting away SQL (but more on that later).
But even with compile-time checking, you are only checking for one class of problems: schema mismatches. What about behavior changes after schema migration or performance regressions? What about understanding whether your optimization actually made things faster or just moved the problem elsewhere?
This is where RegreSQL comes in. Rather than trying to turn SQL into something else, RegreSQL embraces the "SQL as strings" reality and applies the same testing methodology PostgreSQL itself uses: regression testing. You write (or generate - continue reading) your SQL queries, provide input data, and RegreSQL verifies that future changes don't break the expected results.
The features don't stop there though - it tracks performance baselines, detects common query plan regressions (like sequential scans), and gives you a framework for systematic experimentation with schema changes and query change management.
Basic regression testing
Enough with theory. Let's jump straight into the action and see what a sample run of RegreSQL looks like:
$ regresql test
Connecting to 'postgres://radim:password123@192.168.139.28/cdstore_test'… ✓
Running regression tests...
✓ album-by-artist_list-albums-by-artist.1.json (0.00s)
✓ album-by-artist_list-albums-by-artist.2.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.2.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.1.json (0.00s)
✓ artist_top-artists-by-album.1.json (0.00s)
✓ genre-topn_genre-top-n.top-1.json (0.00s)
✓ genre-topn_genre-top-n.top-3.json (0.00s)
✓ genre-tracks_tracks-by-genre.json (0.00s)
Results: 8 passed, 0 failed, 8 skipped (0.00s)
In this example, based on the Chinook database (as used originally in The Art of PostgreSQL book), RegreSQL scans the current directory (or one provided by -C /path/to/project) for *.sql files and attempts to run all queries against the configured PostgreSQL connection.
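Several RegreSQL directories come up later in this post (regresql/plans, regresql/fixtures, regresql/baselines), so here's a rough sketch of how a project can be laid out. The expected/ name is an assumption carried over from the original tool, and the rest is inferred from the output above:

.
├── album-by-artist.sql   # your *.sql query files, anywhere under the scanned directory
├── artist.sql
└── regresql/
    ├── plans/            # per-query parameter bindings (YAML)
    ├── fixtures/         # test data definitions (YAML)
    ├── baselines/        # cost baselines created by regresql baseline
    └── expected/         # expected result files (*.json)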
The individual files can contain either a single query or multiple SQL queries, like the following example:
-- name: top-artists-by-album
-- Get the list of the N artists with the most albums
SELECT
artist.name,
count(*) AS albums
FROM
artist
LEFT JOIN album USING (artist_id)
GROUP BY
artist.name
ORDER BY
albums DESC
LIMIT :n;
The query syntax supports both positional arguments (like $1, known from the libpq library) and (preferred) psql-style variables (:varname). Each identified query (not file) is then executed 0..N times, based on the number of predefined plans, and verified against the expected results - validating that the expected data matches what was returned. Support for handling SQL files is available separately as https://github.com/boringSQL/queries (Go version only for now).
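Those variables get their values from plan files. For the query above, a plan supplying :n might look like this (a sketch; the file path and exact layout are assumptions based on the plan file format shown later in this post):

"1":
  n: 10

Each named plan becomes one execution of the query - a single plan named "1" matches the artist_top-artists-by-album.1 test in the output above, while the genre-top-n.top-1 and .top-3 tests come from a plan file with two named entries.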
This gives you what the original RegreSQL tool introduced - change your schema, refactor a query, run regresql test, and see immediately what broke. The test suite now has the ability to catch regressions before they are committed or shipped. The current version builds on top of it, giving you a better console formatter instead of the TAP-style output, as well as JUnit, JSON and GitHub Actions formatters for better integration into your CI/CD pipelines.
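As a minimal sketch of what running this in CI can look like - the PostgreSQL service wiring, the install step, and how RegreSQL picks up its connection configuration are assumptions for illustration; only the regresql test command itself comes from this post:

on: [push]
jobs:
  sql-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:17
        env:
          POSTGRES_PASSWORD: password123
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4
      # install RegreSQL and load the test schema here (tool- and project-specific)
      - name: Run SQL regression tests
        run: regresql test

The JUnit or GitHub Actions formatters mentioned above can then feed the results straight into the pipeline's test reporting.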
Performance regression testing
Basic regression testing catches correctness issues - wrong results, broken queries, schema mismatches. But there's another class of production issues it misses: performance regressions. As unbelievable as it might sound, queries get deployed without appropriate indexes, or they change over time. A simple change - whether in handwritten SQL or ORM code - can push a query from milliseconds to seconds. You add an index that helps one query but tanks another. You modify a condition and accidentally force a sequential scan over millions of rows. This is where it hurts.
RegreSQL addresses this by tracking performance baselines alongside correctness. Once baselines are generated
$ regresql baseline
Connecting to 'postgres://appuser:password123@192.168.139.28/cdstore_test'… ✓
Creating baselines directory: regresql/baselines
Creating directory 'regresql/baselines'
Creating baselines for queries:
./
Created baseline: album-by-artist_list-albums-by-artist.1.json
Created baseline: album-by-artist_list-albums-by-artist.2.json
Created baseline: album-tracks_list-tracks-by-albumid.1.json
Created baseline: album-tracks_list-tracks-by-albumid.2.json
Created baseline: artist_top-artists-by-album.1.json
Created baseline: genre-topn_genre-top-n.top-1.json
Created baseline: genre-topn_genre-top-n.top-3.json
Created baseline: genre-tracks_tracks-by-genre.json
Baselines have been created successfully!
Baseline files are stored in: regresql/baselines
the test command not only checks for regressions against the captured costs, but also detects common bad patterns in query execution plans. For now it warns about sequential scans - both on their own and inside nested loops - and about multiple sort operations. I believe this alone can provide valuable insights and reduce mishaps in production. It's also where further development of RegreSQL will take place.
To demonstrate this, let's review the test output with baselines in place.
Connecting to 'postgres://appuser:password123@192.168.139.28/cdstore_test'… ✓
Running regression tests...
✓ album-by-artist_list-albums-by-artist.1.json (0.00s)
✓ album-by-artist_list-albums-by-artist.2.json (0.00s)
✓ album-by-artist_list-albums-by-artist.1.cost (22.09 <= 22.09 * 110%) (0.00s)
⚠️ Sequential scan detected on table 'artist'
Suggestion: Consider adding an index if this table is large or this query is frequently executed
⚠️ Nested loop join with sequential scan detected
Suggestion: Add index on join column to avoid repeated sequential scans
✓ album-by-artist_list-albums-by-artist.2.cost (22.09 <= 22.09 * 110%) (0.00s)
⚠️ Sequential scan detected on table 'artist'
Suggestion: Consider adding an index if this table is large or this query is frequently executed
⚠️ Nested loop join with sequential scan detected
Suggestion: Add index on join column to avoid repeated sequential scans
✓ album-tracks_list-tracks-by-albumid.1.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.2.json (0.00s)
✓ album-tracks_list-tracks-by-albumid.1.cost (8.23 <= 8.23 * 110%) (0.00s)
✓ album-tracks_list-tracks-by-albumid.2.cost (8.23 <= 8.23 * 110%) (0.00s)
✓ artist_top-artists-by-album.1.json (0.00s)
✓ artist_top-artists-by-album.1.cost (35.70 <= 35.70 * 110%) (0.00s)
⚠️ Multiple sequential scans detected on tables: album, artist
Suggestion: Review query and consider adding indexes on filtered/joined columns
✓ genre-topn_genre-top-n.top-1.json (0.00s)
✓ genre-topn_genre-top-n.top-3.json (0.00s)
✓ genre-topn_genre-top-n.top-1.cost (6610.59 <= 6610.59 * 110%) (0.00s)
⚠️ Multiple sequential scans detected on tables: genre, artist
Suggestion: Review query and consider adding indexes on filtered/joined columns
⚠️ Multiple sort operations detected (2 sorts)
Suggestion: Consider composite indexes for ORDER BY clauses to avoid sorting
⚠️ Nested loop join with sequential scan detected
Suggestion: Add index on join column to avoid repeated sequential scans
✓ genre-topn_genre-top-n.top-3.cost (6610.59 <= 6610.59 * 110%) (0.00s)
⚠️ Multiple sequential scans detected on tables: artist, genre
Suggestion: Review query and consider adding indexes on filtered/joined columns
⚠️ Multiple sort operations detected (2 sorts)
Suggestion: Consider composite indexes for ORDER BY clauses to avoid sorting
⚠️ Nested loop join with sequential scan detected
Suggestion: Add index on join column to avoid repeated sequential scans
✓ genre-tracks_tracks-by-genre.json (0.00s)
✓ genre-tracks_tracks-by-genre.cost (37.99 <= 37.99 * 110%) (0.00s)
⚠️ Multiple sequential scans detected on tables: genre, track
Suggestion: Review query and consider adding indexes on filtered/joined columns
Results: 16 passed (0.00s)
As you can see, even apart from the baseline cost comparison itself, RegreSQL detects basic bad patterns that should be addressed before queries can be considered "production ready".
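For example, the nested loop with sequential scan warning on the album-by-artist query points at the join column. A possible fix in the Chinook schema (an assumption on my part, not a suggestion emitted by RegreSQL) would be:

CREATE INDEX IF NOT EXISTS album_artist_id_idx ON album (artist_id);

Re-running regresql baseline afterwards captures the new plan cost as the reference point for future runs.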
In some cases, sequential scan detection or query cost baseline tracking may be undesirable and would only lead to false positives. RegreSQL lets you address this with query metadata, as demonstrated below.
-- name: query_name
-- metadata: key1=value1, key2=value2
SELECT ...;
At this point RegreSQL recognizes:
- notest to skip the query testing altogether (not just cost tracking)
- nobaseline to skip cost tracking
- noseqscanwarn to keep cost tracking but disable sequential scan warnings
- difffloattolerance to set the cost failure threshold (default 10% at the moment)
-- name: query_name
-- regresql: notest, nobaseline
-- regresql: noseqscanwarn
-- regresql: difffloattolerance:0.25
-- query that can vary in cost by 25% without being considered a failure
SELECT ...;
ORM enters the room
ORMs abstract away SQL, but they still generate it - and that generated SQL can have performance problems you won't catch until production. Consider this common scenario: you start with a simple SQLAlchemy query that works fine, then months later add eager loading for related data:
from sqlalchemy.orm import joinedload, selectinload

orders = (
    session.query(Order)
    .filter(Order.user_id == user_id)
    .options(
        joinedload(Order.user),
        joinedload(Order.shipping_address),
        selectinload(Order.items),  # NEW: load order items
    )
    .all()
)
That innocent selectinload(Order.items) generates a separate query - and without an index on order_items.order_id, it performs a sequential scan.
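Roughly speaking, the extra query looks like this (table and column names are taken from the example above; the exact SQL SQLAlchemy emits may differ slightly):

SELECT order_items.*
FROM order_items
WHERE order_items.order_id IN (...); -- one entry per order loaded above

Without an index on order_items.order_id, PostgreSQL has no choice but to scan the whole order_items table to answer it.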
RegreSQL can catch this by intercepting ORM-generated SQL using SQLAlchemy's event system:
from sqlalchemy import event

captured_queries = []

@event.listens_for(engine, "before_cursor_execute")
def capture_sql(conn, cursor, statement, *args):
    # record every statement the ORM sends to PostgreSQL
    captured_queries.append(statement)
Run your ORM code, capture the SQL, save it as a .sql file, and test it with RegreSQL. The performance baseline testing will flag the missing index before it hits production. This is currently experimental, but ORM integration is a key area for RegreSQL's future development.
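A minimal sketch of that capture-and-save step - the output path and file naming are assumptions, and any bind-parameter placeholders in the captured SQL still need to be translated into the :varname or $1 style RegreSQL understands:

# write each captured statement as a RegreSQL-style .sql file
for i, statement in enumerate(captured_queries, start=1):
    with open(f"queries/orm-query-{i}.sql", "w") as f:
        f.write(f"-- name: orm-query-{i}\n{statement};\n")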
Test Data Management
Up until now we have covered how RegreSQL verifies query correctness and tracks performance regressions. But there's a critical prerequisite we've only skimmed over: every regression test needs consistent, reproducible data. Change the data or its cardinality, and your expected results become meaningless. Your performance baselines drift. Your tests become flaky.
Traditional approaches to creating test data all come with problems:
- Database dumps become unmanageable - 500MB files you can't review, can't understand, that break with every schema migration, and whose data becomes stale as production evolves. Which version of the dump are your tests even using?
- SQL scripts might be better than dumps, but still imperative and hard to maintain. You end up with INSERT statements scattered across multiple files, managing foreign keys manually, and debugging constraint violations.
- Factories in application code might work great for integration tests, but we're testing SQL directly. Do you really want to maintain parallel data generation in your application language just for SQL tests?
- A shared test database is a synonym for the classic "works on my machine" problem. State leaks between tests. Parallel execution becomes impossible. Debugging is a nightmare.
What we need is something that's declarative (what data, not how to insert it), reproducible (similar data every time), composable (build complex scenarios from simple pieces), and scalable (from 10 rows to 100,000).
This is where the next improvement comes in: RegreSQL's fixture system. Think of it as infrastructure-as-code for your test data. You describe the data you need in YAML files, and RegreSQL handles the rest - dependencies, cleanup, foreign keys, and even realistic data generation at scale.
RegreSQL's fixture system lets you define test data in YAML files stored in regresql/fixtures/. Here's a simple example
fixture: basic_users
description: a handful of test users
cleanup: rollback
data:
  - table: users
    rows:
      - id: 1
        email: alice@example.com
        name: Alice Anderson
        created_at: 2024-01-15
      - id: 2
        email: bob@example.com
        name: Bob Builder
        created_at: 2024-02-20
To use this fixture in your tests, reference it in the query's plan file (regresql/plans/get-user.yaml):
fixtures:
  - basic_users
"1":
  email: alice@example.com
"2":
  email: bob@example.com
And when you run regresql test, the fixture is automatically loaded before the query executes and cleaned up afterward. No manual setup scripts, no state leakage between tests. But it does not stop with static fixtures. When you want to test queries against realistic volumes, you can use a range of data generators, including
- sequences, random integer, decimal, string, uuid, email and name generators
- date_between for generating random timestamps within a range
- foreign key references to reuse data from other tables' fixtures
- range to select values from predefined sources
- Go template support
fixture: realistic_orders
generate:
  - table: customers
    count: 1000
    columns:
      id:
        generator: sequence
        start: 1
      email:
        generator: email
        domain: shop.example.com
      name:
        generator: name
        type: full
      created_at:
        generator: date_between
        start: "2023-01-01"
        end: "2024-12-31"
  - table: orders
    count: 5000
    columns:
      id:
        generator: sequence
        start: 1
      customer_id:
        generator: int
        min: 1
        max: 1000
      amount:
        generator: decimal
        min: 10.00
        max: 999.99
        precision: 2
      order_date:
        generator: date_between
        start: "2023-01-01"
        end: "2024-12-31"
This generates 1,000 customers and 5,000 orders with realistic-looking data - names, emails, dates, and amounts that feel production-like.
The fixtures are also stackable and can be built on top of each other. For example, if you need to make sure the users fixture is created before the orders fixture, just declare the dependency (a planned improvement is support for automatic foreign-key detection to avoid hard-coding IDs). RegreSQL loads fixtures in dependency order and handles cleanup in reverse.
fixture: orders_with_shipping
depends_on:
  - basic_users
data:
  - table: orders
    rows:
      - id: 101
        user_id: 1 # References Alice from basic_users
        total: 99.99
        status: shipped
Should the available options for fixtures (manual data or data generators) not be enough, you can always fall back to good old SQL-based data generation.
fixture: mixed_setup
description: Combine SQL with YAML and generated data
cleanup: rollback
# SQL executes first (either as file or inline)
sql:
  - file: sql/setup_schema.sql
  - inline: "INSERT INTO config (key, value) VALUES ('version', '1.0');"
# followed by YAML data
data:
  - table: users
    rows:
      - id: 1
        email: admin@example.com
# and finally generated data
generate:
  - table: orders
    count: 100
    columns:
      id:
        generator: sequence
        start: 1
      user_id:
        generator: int
        min: 1
        max: 1
RegreSQL provides commands to inspect and validate your fixtures
# List all available fixtures
regresql fixtures list
# Show fixture details and dependencies
regresql fixtures show realistic_orders
# Validate fixture definitions
regresql fixtures validate
# Show dependency graph
regresql fixtures deps
# Apply fixture manually (for debugging)
regresql fixtures apply basic_users
The fixture system has been designed to transform test data from a maintenance burden into a documented, version-controlled process. Your YAML files become the single source of truth for what data your tests need, making it easy to understand test scenarios and maintain test data as the application evolves.
RegreSQL future
Introducing a new open source project is an ambitious goal, and RegreSQL is just starting out, despite the fork being in the works for almost two years. In the coming weeks and months I plan further improvements, as well as better documentation and more tutorials. The project is maintained as part of my boringSQL brand, where it's a vital component for building SQL Labs, which (as I sincerely hope) will provide a foundation for its further development.
At the same time, RegreSQL is an attempt to give back to the welcoming PostgreSQL community, make the developer experience slightly better where possible, and (just maybe) provide one more argument against the claim that SQL queries are not testable.
RegreSQL is available on GitHub - feel free to open an issue, drop me an email about the project at radim@boringsql.com, or connect on LinkedIn.