Same rows, different SUM

Everyone knows not to store money as a double precision. One can hope. The rule is so well drilled that it has stopped being interesting, and it is also not where the trouble usually starts. The float is already in the schema before anyone weighs in on it: a measurement column someone later sums for a report, telemetry that drifts into a finance dashboard, a third-party feed ingested as double precision because that is how it arrived.

Here is the part the rule does not warn you about. Take a table of five million floating-point readings, sum the column, and run it three times in a row. Nothing else touches the table. Same connection, same data, same statement.

SELECT sum(reading) FROM measurements;

        sum
--------------------
 2500519211.7874823

       sum
-------------------
 2500519211.787477

        sum
--------------------
 2500519211.7874575

Three runs, three different totals. No UPDATE, no concurrent writer, no random seed. The rows did not change between runs, and the query is the same character for character. Yet the answer is not.

This is not a Postgres bug, and it is not specific to Postgres. It is what happens when floating-point arithmetic meets parallel aggregation, and it has been generating "my dashboard total changed and I have no idea why" tickets for as long as databases have parallelized. The non-determinism does not wait for you to opt into bad practice; it shows up the moment a parallel plan runs over whatever floats you happen to have.

The schema

One column of double precision, five million rows.

CREATE TABLE measurements (
    id      integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    reading double precision NOT NULL
);

INSERT INTO measurements (reading)
SELECT random() * 1000
FROM generate_series(1, 5000000);

ANALYZE measurements;

Five million rows is enough that Postgres parallelizes the sum on its own. No settings forced, defaults all the way:

EXPLAIN (COSTS OFF) SELECT sum(reading) FROM measurements;

                     QUERY PLAN
-----------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on measurements

Two workers each sum part of the table in parallel, and a final step folds the partial sums together. Everything that follows is in that split.

Floating point does not associate

On paper, (a + b) + c and a + (b + c) are the same number. Postgres is not doing arithmetic on paper. Floating-point addition is not associative: change the grouping and you can change the answer, because every intermediate result is rounded to fit 64 bits. Move the parentheses and you move which roundings happen.

The textbook demonstration takes three values and one subtraction:

SELECT 1.0e20::float8 + 1.0::float8 - 1.0e20::float8;

 ?column?
----------
        0

The answer should be 1. Read the expression left to right and watch where it goes wrong:

1.0e20 + 1.0 should be 100000000000000000001. But a double only carries about 16 significant digits, and that number needs 21. The trailing 1 falls off the end, below the smallest digit the number can represent. So the addition rounds back to exactly 1.0e20.
1.0e20 - 1.0e20 is then exactly 0.

The 1 never made it past the first step. It was swallowed by the addition, before the subtraction even ran. This is absorption: a value too small relative to its neighbour to leave a mark. It is not a database phenomenon. The same expression evaluates to 0 in C, Python, JavaScript, and every other language that uses IEEE 754 doubles.

A double precision number keeps 52 bits for the mantissa, the digits that carry precision. This is around 15 to 17 decimal digits, no more. When the running total becomes big and the next number is small, the small one does not fit into those bits, so the rounding throws its tail away. Sum five million numbers and the total is, for most of its life, much bigger than any single number you add to it.

So the sum of a column depends on the order in which the rows are added. Not on which rows, on the order. That is the seam parallel query pries open.

Why parallelism surfaces it

The plan spells out exactly how. Parallel Seq Scan does not split the table into fixed slices upfront. It hands every worker a shared pointer into the same block iterator, and each worker grabs the next block, processes it, and comes back for another. Each worker runs its own Partial Aggregate, summing only the rows it happened to grab, and by default the leader process pitches in as a third one (parallel_leader_participation). Gather collects those partial sums in whatever order they arrive, and Finalize Aggregate folds them into a single total.

Two things vary from run to run, and neither is under your control:

Which rows each process sees. Because the blocks are pulled from a shared iterator on demand, the split is a race. A worker that gets a few microseconds ahead, for any reason the OS scheduler decides, grabs the next block, so the same worker ends up with a different set of rows on every run. The division is never guaranteed to be the same twice.
The order the partial sums come back. Finalize Aggregate folds the partials into its total in arrival order. With only two values this would not matter, since a + b equals b + a. But there are three of them, the two workers and the leader, and combining three floats is once again an associativity question: which two get added first decides the rounding.

Both feed directly into the one thing float addition is sensitive to: grouping. Different row split, different partial sums; different arrival order, different final grouping of the additions. The total lands a few bits away from where it landed last time. Across five million readings that shows up in the sixth decimal place, exactly where the three runs above disagreed.

sum(float8) is marked parallel-safe, which is why the planner is free to do this. It is safe in the sense that it will not crash or corrupt anything. It is not safe in the sense people assume, that the answer is a function of the input.

This is not unique to SUM. AVG, STDDEV, VARIANCE, and any other floating-point aggregate that accumulates across rows inherit the same order sensitivity. SUM is just the easiest to demonstrate and the one most likely to land in a number a human reads.

It was never really parallelism

It is tempting to file this under "parallel query is flaky" and turn it off. That misses the point. Parallelism does not cause the non-determinism, it only exposes an order-dependence that was there the whole time.

Turn parallelism off completely and the sum becomes stable:

SET max_parallel_workers_per_gather = 0;

SELECT sum(reading) FROM measurements;  -- 2500519211.787516
SELECT sum(reading) FROM measurements;  -- 2500519211.787516
SELECT sum(reading) FROM measurements;  -- 2500519211.787516

Stable, but only because a serial sequential scan visits the heap in the same physical order every time, so the additions group the same way every time. The order is fixed, so the rounding is fixed. Change the order and the serial sum moves too. We can force different orders without any parallelism at all, using an ordered-set aggregate:

SET max_parallel_workers_per_gather = 0;     -- serial, so ONLY order varies

SELECT sum(reading ORDER BY id)           FROM measurements;  -- 2500519211.787516
SELECT sum(reading ORDER BY reading)      FROM measurements;  -- 2500519211.7876067
SELECT sum(reading ORDER BY reading DESC) FROM measurements;  -- 2500519211.7875423

Same rows, same single thread, three summation orders, three answers. Parallelism is one way to shuffle the order, but a plain ORDER BY inside the aggregate does it just as well. The dependence is on order, full stop.

Deterministic is not the same as correct

There is a subtler trap hiding in the "turn it off and it is stable" result. A stable answer is not necessarily the right answer. To see the true total, sum the same values as numeric, which adds in exact decimal with no rounding until the very end:

SELECT sum(reading::numeric) FROM measurements;

              sum
-------------------------------
 2500519211.787503050078417764

That is the exact sum of the stored values. Now line everything up:

How the sum was computed	Result	Off by
Exact (`numeric`)	`2500519211.787503050…`	reference
Serial `float8`, physical order	`2500519211.787516`	~1.3e-5
Parallel `float8`, run A	`2500519211.7874823`	~2.1e-5
Parallel `float8`, run B	`2500519211.787477`	~2.6e-5
Ordered by value ascending	`2500519211.7876067`	~1.0e-4

Every floating-point total is wrong, including the "stable" serial one. They differ from the exact value in the fifth and sixth decimal places. Parallelism did not introduce the error. It removed the one comforting property the serial sum had, which was getting the same wrong answer every time. That sameness is what people mistake for correctness.

Summing smallest-to-largest is the classic textbook advice for reducing floating-point error, and here ascending order happens to land further from the exact value than physical order does. Order changes the error; it does not reliably shrink it. If you need a smaller error there are compensated-summation algorithms (Kahan and friends), but inside a SQL aggregate you do not get to choose them.

Where it actually bites

In the sixth decimal of a five-million-row random sum, nobody cares. The problem is the places where someone does.

The obvious one is reconciliation. A finance team sums a revenue column stored as double precision, exports it, and the number disagrees with the same query run an hour later, or with the same logic in a different tool running a different parallel degree. Nothing in the data changed, so the hunt goes to caching, to replicas, to "a dirty read somewhere" long before anyone suspects the aggregate itself.

The same thing wrecks regression tests, just less visibly. Assert that a float total equals an exact literal, or compare two pipelines for byte-identical sums, and the test will flap. It passes on a small fixture that runs serially, then fails in CI on a larger one that crosses the parallel threshold, with no code change in between. The test is asserting a property the type does not have, and it will keep flapping until someone notices that.

And anywhere a float sum feeds an equality check, a join, or a GROUP BY key, you are building on sand. Two totals that "should" be equal can differ in their last bits, and now your join silently drops rows.

Postgres already knows this is dangerous

The most convincing evidence that this is a real, known hazard and not a corner case is that Postgres deliberately gives up an optimization because of it.

Window functions can compute a moving sum efficiently using an inverse transition function: as the frame slides forward, subtract the value leaving the window instead of re-summing the whole frame. It works for numeric and integers. For float4 and float8, Postgres refuses to do it. From the documentation on user-defined aggregates:

An example of an aggregate for which adding an inverse transition function seems easy at first, yet where this requirement cannot be met is sum over float4 or float8 inputs.

The docs then show why, with the same absorption we saw earlier. If a window holds 1e20 and 1, subtracting 1e20 as it leaves the frame yields 0, not the 1 that should remain. So the built-in float sum simply does not use the inverse optimization, and a moving sum(float8) recomputes the frame to stay correct:

SELECT n, sum(x) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM (VALUES (1, 1.0e20::float8), (2, 1.0::float8)) AS v(n, x);

 n | sum
---+-------
 1 | 1e+20
 2 |     1

The second row is 1, correct, because Postgres declined the shortcut that would have made it 0. The engine is quietly working around the imprecision of float sums on your behalf. The non-determinism in the parallel case is the same hazard showing up somewhere the engine cannot paper over it.

What to do about it

The fix is almost always the type, not the query.

Use numeric for anything anyone reconciles, compares, or reads as a figure. Money, totals on reports, anything that has to match another system or itself across runs. numeric sums in exact decimal and is order-independent: same rows, same answer, parallel or not, in any order.

SET max_parallel_workers_per_gather = 4;
SELECT sum(reading::numeric) FROM measurements;  -- 2500519211.787503050078417764
SELECT sum(reading::numeric) FROM measurements;  -- 2500519211.787503050078417764

The cast above stabilizes a single query, but the durable fix is storing the column as numeric in the first place, so the exact values are what live on disk.

numeric is not free. It is arbitrary-precision arithmetic done in software, so it is slower to compute and takes more space on disk than a hardware float the CPU adds in a single instruction. For anything that has to reconcile, that is the price of correctness, and it is almost always worth paying. For a column nobody ever sums into a figure a person reads, it usually is not.

If the data is genuinely floating point (sensor readings, scientific measurements, anything where the input is already an approximation), then accept that aggregates over it are approximate and do not assert exact equality on them. Round to a sensible number of decimals before comparing or displaying, and never use a raw float sum as a join or grouping key.

Pinning max_parallel_workers_per_gather = 0 makes a float sum repeatable, but does not make it correct, and it costs you parallelism on every other query in the session. It is a way to get a stable wrong answer, not a fix. Reach for the type instead.

None of this is Postgres-specific. Any database that parallelizes aggregation, and they all do, has the same behavior over the same IEEE 754 floats. The trigger that makes it visible is simply crossing whatever threshold turns parallelism on, which is why it so often shows up when a table grows past a certain size and "nothing else changed."