Every query starts with a plan. Every slow query probably starts with a bad one. And more often than not, the statistics are to blame. But how does it really work? PostgreSQL doesn't run the query to find out — it estimates the cost. It reads pre-computed data from pg_class and pg_statistic and does the maths to figure out the cheapest path to your data.
In an ideal scenario, the numbers it reads are accurate, and you get the plan you expect. But when they are stale, things get out of control. The planner estimates 500 rows, plans a nested loop, and hits 25,000. What looked like an optimal plan turns into a cascading failure.
How do statistics get stale? A bulk load, a schema migration, faster-than-expected growth, or simply VACUUM not keeping up. Whatever the cause, the result is the same: the planner is flying blind, choosing paths based on a reality that no longer exists.
In this post we will look inside the two catalogs the planner depends on, understand what ANALYZE actually computes from a 30,000-row sample, and see how those numbers determine whether your query takes milliseconds or minutes.
Sample schema
For demonstration purposes we will use the same schema as in the article Reading Buffer statistics in EXPLAIN output.
CREATE TABLE customers (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE orders (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id integer NOT NULL REFERENCES customers(id),
amount numeric(10,2) NOT NULL,
status text NOT NULL DEFAULT 'pending',
note text,
created_at date NOT NULL DEFAULT CURRENT_DATE
);
INSERT INTO customers (name)
SELECT 'Customer ' || i
FROM generate_series(1, 2000) AS i;
INSERT INTO orders (customer_id, amount, status, note, created_at)
SELECT
(random() * 1999 + 1)::int,
(random() * 500 + 5)::numeric(10,2),
(ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int],
CASE WHEN random() < 0.3 THEN 'Some note text here for padding' ELSE NULL END,
'2022-01-01'::date + (random() * 1095)::int
FROM generate_series(1, 100000);
ANALYZE customers;
ANALYZE orders;
What the Planner Reads
As mentioned above, every decision the planner makes is based on two sources:
- table-level metadata from pg_class
- column-level metadata from pg_statistic
pg_class - relation-level stats
| Column | Meaning |
|---|---|
relpages | Number of 8KB pages representing the table on disk |
reltuples | Estimated number of live rows in the table |
relallvisible | Pages where all tuples are visible to all transactions |
For our sample table it looks like this.
SELECT relname, relpages, reltuples, relallvisible
FROM pg_class
WHERE relname = 'orders';
 relname | relpages | reltuples | relallvisible
---------+----------+-----------+---------------
 orders  |      856 |    100000 |           856
The planner sees 100,000 rows spread across 856 pages. Every cost estimate starts from those two numbers. relpages drives sequential scan cost - each page is one unit of I/O work, as configured via seq_page_cost. reltuples drives per-row processing cost (each row costs cpu_tuple_cost, which defaults to 0.01) and controls estimates for joins, aggregations, and pretty much everything else.
The reltuples value is only an estimate, not a live count. It's updated by ANALYZE (and autovacuum), not by individual INSERTs or DELETEs. Between ANALYZE runs, PostgreSQL scales reltuples proportionally when relpages value changes — if the table grows by 10% in pages, the planner assumes 10% more rows too.
This works well enough for normal growth, but breaks down with bloat. If dead tuples inflate the page count without adding real rows, the planner overestimates the table size.
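The proportional scaling described above can be sketched as follows. This is a toy model of the behaviour, not PostgreSQL source, and the page counts are hypothetical:

```python
def scaled_reltuples(reltuples, relpages_at_analyze, relpages_now):
    """Between-ANALYZE row estimate: scale the last known reltuples
    by the growth in pages since the last ANALYZE."""
    if relpages_at_analyze == 0:
        return reltuples
    density = reltuples / relpages_at_analyze  # rows per page at last ANALYZE
    return density * relpages_now

# Table grew from 856 to 942 pages (~10%) since the last ANALYZE,
# so the planner assumes ~10% more rows too:
print(round(scaled_reltuples(100_000, 856, 942)))  # ~110,000
```

If the extra pages are bloat rather than new rows, this is exactly where the overestimate comes from.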
The remaining column matters for a specific operation: relallvisible tells the planner how much of the table an index-only scan can answer from the index alone, without checking the heap for visibility.
pg_statistic (via pg_stats) - column-level stats
Knowing the size of a table is only half the picture. To estimate how many rows might match, the planner needs to understand the data inside each column. PostgreSQL maintains these statistics in the pg_statistic catalog, which you're most likely never going to query directly. In practice you will use the pg_stats view, which presents the same data in human-friendly form.
The most interesting values it exposes are:
| Statistic | What it tells the planner |
|---|---|
null_frac | Fraction of entries that are NULL values |
avg_width | Average width in bytes |
n_distinct | Number of distinct values (negative means fraction of rows) |
most_common_vals | Most frequent values |
most_common_freqs | Frequencies of those values |
histogram_bounds | Values dividing the remaining data into equal-population buckets (most_common_vals are excluded) |
correlation | Statistical correlation between physical row ordering and logical ordering of the column values |
SELECT attname, null_frac, avg_width, n_distinct,
most_common_vals, most_common_freqs, histogram_bounds, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
-[ RECORD 1 ]-----+--------------------------------------
attname | status
null_frac | 0
avg_width | 8
n_distinct | 4
most_common_vals | {pending,shipped,delivered,cancelled}
most_common_freqs | {0.25396666,0.25,0.24973333,0.2463}
histogram_bounds |
correlation | 0.2524199
From this the planner knows there are exactly 4 distinct values, more or less evenly distributed, with no NULLs. When you write a predicate WHERE status = 'pending', it estimates ~25% of rows will match. All without running the query, just by reading a catalog row.
The situation is different for the note column.
SELECT attname, null_frac, avg_width, n_distinct,
array_length(most_common_vals, 1) AS mcv_count,
array_length(histogram_bounds, 1) AS histogram_buckets
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'note';
-[ RECORD 1 ]-----+-------
attname | note
null_frac | 0.6982
avg_width | 32
n_distinct | 1
mcv_count | 1
histogram_buckets |
For this column ~70% of values are NULL and there is only one distinct value. A predicate WHERE note IS NOT NULL will therefore estimate ~30% of rows.
Selectivity in Action
Now that we have covered what data the planner has available, let's look at how it's used to estimate how many rows a given part of a query will return. This "guess" is called selectivity, a floating-point number between 0 and 1.
The formula is pretty simple:
Estimated Rows = Total Rows * Selectivity
But the way Selectivity is calculated depends entirely on the operator you use (=, <, > or LIKE).
The equality (Most Common Values)
Equality is the easiest to start with. When you use WHERE status = 'shipped', the planner first checks the most_common_vals (MCV) list. If there's a match, the selectivity is simply the corresponding entry in most_common_freqs.
If the value isn't in the list, the planner assumes the value is part of the "remaining" population. It subtracts all MCV frequencies from 1.0 and divides the remainder by the number of other distinct values.
(1.0 - (SELECT sum(s) FROM unnest(most_common_freqs) s))
/
(n_distinct - array_length(most_common_vals, 1))
The Range lookup (The Histogram)
Life would be easy if we only ever looked for exact values. Most of the time we need range lookups, for example WHERE amount > 400.
MCVs are useless in this case as there might be thousands or millions of unique constants. This is where histogram_bounds comes in. PostgreSQL divides the column values into a number of buckets, where each bucket contains an equal number of rows (not values).
The selectivity in this case is determined by how many buckets your query covers. In our example, if we have bounds (100, 200, 300, 400, 500, 600) the planner will establish the condition covers 2 full buckets. Since there are 5 buckets in total, the selectivity is 0.4 (2/5).
Wondering where 2 and 5 come from? The histogram_bounds array defines the boundaries between buckets. With bounds (100, 200, 300, 400, 500, 600) there are 5 buckets in total:
(100-200), (200-300), (300-400), (400-500), (500-600)
And the same logic applies to matching. The condition amount > 400 matches on (400-500) and (500-600) buckets (2 in total).
Things get slightly more complex when the value falls inside a bucket. For WHERE amount > 350 the planner locates the bucket containing 350 (in our example (300-400)), assumes the data is linearly distributed within it, and calculates the fraction of that bucket above the value.
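Both paths can be sketched in a few lines. This is an illustrative toy, not the real implementation (which lives in PostgreSQL's selfuncs.c and handles many more edge cases); the MCV list and bounds are the hypothetical numbers used above:

```python
from bisect import bisect_left

def eq_selectivity(value, mcv_vals, mcv_freqs, n_distinct):
    """Equality: use the MCV frequency if present, otherwise split the
    remaining frequency evenly among the other distinct values."""
    if value in mcv_vals:
        return mcv_freqs[mcv_vals.index(value)]
    remaining = 1.0 - sum(mcv_freqs)
    return remaining / (n_distinct - len(mcv_vals))

def gt_selectivity(value, bounds):
    """Range (> value): count full buckets above the value, plus a linear
    interpolation inside the bucket the value falls into."""
    n_buckets = len(bounds) - 1
    i = min(max(bisect_left(bounds, value) - 1, 0), n_buckets - 1)
    lo, hi = bounds[i], bounds[i + 1]
    frac_inside = min(max((hi - value) / (hi - lo), 0.0), 1.0)
    full_buckets = n_buckets - i - 1
    return (full_buckets + frac_inside) / n_buckets

bounds = [100, 200, 300, 400, 500, 600]
print(gt_selectivity(400, bounds))  # 2 full buckets of 5 -> 0.4
print(gt_selectivity(350, bounds))  # 2 full + half a bucket -> 0.5

mcvs = ['pending', 'shipped', 'delivered', 'cancelled']
freqs = [0.254, 0.25, 0.2497, 0.2463]
print(eq_selectivity('shipped', mcvs, freqs, 4))  # straight from the MCV list
```

Multiply any of these selectivities by reltuples and you have the row estimate EXPLAIN shows you.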
Search and pattern matching
This is probably the most treacherous territory for the planner. For substring matching patterns like WHERE note LIKE '%middle%', there's no histogram or list of values to rely on. The planner must fall back to "magic constants" hardcoded in the PostgreSQL source code.
The default for generic patterns is 0.5% of the total rows, defined as
#define DEFAULT_MATCH_SEL 0.005
The situation is slightly better for prefix matches like WHERE note LIKE 'boringSQL%', where PostgreSQL can rewrite the pattern as a range condition and use the histogram bounds. A subtle difference on paper, but night and day in practice. Unfortunately, few developers are aware of it.
Correlation and index scan cost
Remember correlation from pg_stats? It measures how closely the physical order of rows on disk matches the logical order of the column values. Values close to ±1.0 mean high correlation; values near 0 mean the data is scattered randomly across pages.
If you recall how data is organized into 8KB pages, row locality matters: it determines whether an index scan is worth it. The planner assumes a random page read costs 4× more than a sequential one (random_page_cost = 4.0 vs seq_page_cost = 1.0). When correlation is high, the rows an index points to are physically adjacent — the planner expects sequential I/O and costs the scan cheaply. When correlation is low, each lookup likely hits a different page, and the planner costs each of those reads at the higher random rate. That difference alone can be enough to make a sequential scan cheaper than an index scan.
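To make the statistic concrete, here's a toy Pearson correlation between physical row position and column value, computed over hypothetical data (not planner code):

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

positions = list(range(1000))   # physical order of rows on disk
ordered = list(range(1000))     # values inserted in order, e.g. a serial id
shuffled = ordered[:]
random.seed(1)
random.shuffle(shuffled)        # values scattered across pages

print(pearson(positions, ordered))   # 1.0: index scan reads pages sequentially
print(pearson(positions, shuffled))  # near 0: every lookup priced as random I/O
```

A freshly loaded table sorted by the indexed column behaves like the first case; after months of random updates it drifts toward the second.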
n_distinct and join estimation
The n_distinct value plays an important role in join estimation, which further stresses the importance of running ANALYZE after bulk data changes.
Let's look at a simplified version of the logic for an equality join. The planner first accounts for null_frac (NULLs don't join) and the MCV lists on both sides; if both sides have MCVs, it calculates the "inner product" of their frequencies. For the rest, the fallback formula is:
estimated_rows = (rows_left × rows_right) / max(n_distinct_left, n_distinct_right)
Let's say you're trying to join two tables, both having roughly 2,000 distinct values for the join key. Outdated n_distinct will cause significant estimation drifts and the planner may pick a wrong join strategy altogether.
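A sketch of the fallback formula with our sample schema's numbers (the real planner also discounts NULLs and consults MCVs first):

```python
def eq_join_rows(rows_left, rows_right, ndistinct_left, ndistinct_right):
    """Simplified equality-join estimate: cross product divided by the
    larger distinct count of the join key."""
    return rows_left * rows_right / max(ndistinct_left, ndistinct_right)

# Fresh stats: 2,000 distinct customer ids on both sides
print(eq_join_rows(2_000, 100_000, 2_000, 2_000))   # 100,000 rows

# Stale n_distinct (say 200, left over from before a bulk load)
print(eq_join_rows(2_000, 100_000, 200, 200))       # 1,000,000 rows, 10x off
```

A 10x drift like that is exactly what pushes the planner from a hash join into a nested loop, or vice versa.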
This is the heap we are talking about
Please, keep in mind this describes how the planner estimates rows for a basic heap scan. Indexes, join selectivity, and complex types like JSONB each come with their own estimation logic, operator handling and quirks. The fundamentals of estimation are the same nevertheless.
What if there are no statistics?
So far we have touched on why up-to-date statistics are a must. But what if there are no statistics at all? For example, on a new table or column that ANALYZE has not yet run on.
In those cases PostgreSQL falls back to hardcoded defaults.
| Condition type | Default selectivity | Constant |
|---|---|---|
Equality (=) | 0.5% | DEFAULT_EQ_SEL = 0.005 |
Range (>, <) | 33.3% | DEFAULT_INEQ_SEL = 0.3333 |
Range (BETWEEN) | 0.5% | DEFAULT_RANGE_INEQ_SEL = 0.005 |
Pattern matching (LIKE) | 0.5% | DEFAULT_MATCH_SEL = 0.005 |
IS NULL | 0.5% | DEFAULT_UNK_SEL = 0.005 |
IS NOT NULL | 99.5% | DEFAULT_NOT_UNK_SEL = 0.995 |
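Plugging those defaults into the selectivity formula shows how far off plans can start. The arithmetic below is illustrative; the constants come from PostgreSQL's selfuncs.h, and the 1,000,000-row size is a hypothetical planner guess:

```python
# Hardcoded fallback selectivities used when no statistics exist
DEFAULT_EQ_SEL = 0.005
DEFAULT_INEQ_SEL = 0.3333
DEFAULT_MATCH_SEL = 0.005

# Suppose the planner believes a never-analyzed table has 1,000,000 rows:
rows = 1_000_000
print(rows * DEFAULT_EQ_SEL)     # WHERE id = 42         -> ~5,000 rows
print(rows * DEFAULT_INEQ_SEL)   # WHERE amount > 100    -> ~333,300 rows
print(rows * DEFAULT_MATCH_SEL)  # WHERE note LIKE '%x%' -> ~5,000 rows
```

Whether those guesses are anywhere near reality is pure luck, which is the whole problem.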
Nothing that a quick ANALYZE can't fix, correct? Or maybe not.
While this article focuses on statistics and keeping them right, there are situations where no statistics will ever be available, or not predictably. If you believe this won't affect you, think twice.
- CTEs and subqueries, when not inlined/materialized, have no statistics.
- Temporary tables are not touched by autovacuum, so there is no automatic ANALYZE.
- Foreign tables do not guarantee that stats are propagated.
- And, to the surprise of many, computed expressions in WHERE, like WHERE amount * 1.1 > 500 or lower(email) = 'hello@example.com', unless you create an expression index or extended statistics.
How ANALYZE Works
As we have already mentioned several times, ANALYZE is the only mechanism that populates pg_class and pg_statistic with fresh data. Understanding what it samples, what it computes, and what it misses, is key to understanding why statistics are sometimes wrong.
The process itself consists of 6 separate steps (as reported by pg_stat_progress_analyze):
- initializing
- acquiring sample rows — sampling pages from the table
- acquiring inherited sample rows (child/partitioned tables)
- computing statistics (MCVs, histograms, correlation, etc.)
- computing extended statistics (see below)
- finalizing and writing to pg_statistic
For our purposes we will only cover sampling, computing statistics and writing to pg_statistic.
The sampling mechanism
ANALYZE doesn't actually read the entire table. It samples a statistically justified minimum, defined as 300 rows per unit of statistics target (a factor that comes from the reservoir sampling literature). For PostgreSQL that means 300 × default_statistics_target rows.
SHOW default_statistics_target;
 default_statistics_target
---------------------------
100
(1 row)
With the default target of 100, that's 30,000 rows. For our 100,000-row orders table, ANALYZE reads about 30% of the data. For a 50-million-row table, it reads 0.06%. The same target also controls the size of the MCV list and the histogram: up to 100 entries each.
histogram_bounds contains 101 values: 100 buckets need 101 boundaries to close them.
The sampling is two-stage. First, ANALYZE selects a random set of pages. Then it reads all live rows from those pages. This gives a representative cross-section without reading every page.
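The two-stage idea can be sketched like this. It's a simplification (the real ANALYZE uses Vitter's reservoir algorithm on top of random block selection), and the 117 rows-per-page figure is a hypothetical density matching our orders table:

```python
import random

default_statistics_target = 100
target_rows = 300 * default_statistics_target    # 30,000 rows to sample

# Hypothetical table: 856 pages, ~117 rows per page
rows_per_page = 117
table = {page: [(page, i) for i in range(rows_per_page)]
         for page in range(856)}

# Stage 1: pick enough random pages to cover the target row count
random.seed(42)
pages_needed = -(-target_rows // rows_per_page)  # ceiling division -> 257
sampled_pages = random.sample(sorted(table), pages_needed)

# Stage 2: read all live rows from the chosen pages
sample = [row for page in sampled_pages for row in table[page]]
print(len(sampled_pages), len(sample))  # 257 of 856 pages, >= 30,000 rows
```

Sampling whole pages instead of individual rows is what keeps ANALYZE cheap: one page read yields ~117 sample rows.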
Computing statistics
Once ANALYZE has its 30,000 sample rows (considering the default values), it processes each column independently. The pipeline for a single column looks roughly like this:
First, it counts NULLs and calculates null_frac and avg_width — the cheapest statistics to compute. Then it sorts the non-null values and builds the MCV list by counting duplicates. Values that appear frequently enough make the cut; the rest are passed to the histogram builder, which divides them into equal-population buckets. Finally, because the values are already sorted, ANALYZE compares the logical sort order against the physical tuple positions (which page each row came from) to compute correlation.
The key detail here is that MCVs and histograms are not built from the same pool. Values that land in most_common_vals are excluded from histogram_bounds. This is why you'll sometimes see a column with MCVs but no histogram, or a histogram but no MCVs. They represent different slices of the same data.
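Under heavy simplification, the per-column pipeline looks like this. The MCV threshold and bucket logic below are toy stand-ins for the real algorithm, shown only to make the MCV/histogram split tangible:

```python
from collections import Counter

def column_stats(sample, max_mcvs=4, n_buckets=4):
    non_null = [v for v in sample if v is not None]
    null_frac = sample.count(None) / len(sample)

    # MCV pass: values that appear "often enough" (toy threshold: more
    # often than the average value) make the most_common_vals list.
    counts = Counter(non_null)
    avg = len(non_null) / len(counts)
    mcvs = {v: c / len(sample) for v, c in counts.most_common(max_mcvs)
            if c > avg}

    # Histogram pass: only the values NOT captured as MCVs are divided
    # into equal-population buckets.
    rest = sorted(v for v in non_null if v not in mcvs)
    bounds = [rest[i * (len(rest) - 1) // n_buckets]
              for i in range(n_buckets + 1)] if rest else []
    return null_frac, mcvs, bounds

# 50 copies of one value, 40 unique values, 10 NULLs
sample = [1] * 50 + list(range(2, 42)) + [None] * 10
null_frac, mcvs, bounds = column_stats(sample)
print(null_frac)  # 0.1
print(mcvs)       # {1: 0.5} -- the dominant value becomes an MCV
print(bounds)     # bounds built only from the remaining values 2..41
```

Note how the value 1 appears in the MCV list but never in the bounds: the two structures describe disjoint slices of the sample, just as in pg_stats.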
Writing to pg_statistic
Once all columns are processed, ANALYZE writes the results into pg_statistic — one row per column. If a row for that column already exists, it's updated in place. This is a regular heap update, which means the old row becomes a dead tuple. On tables with many columns or frequent ANALYZE runs, this can cause pg_statistic itself to bloat.
After pg_statistic is updated, ANALYZE refreshes relpages, reltuples, and relallvisible in pg_class. These values are recalculated from the sampling, not from a full table scan (they are estimates too).
Controlling statistics quality
default_statistics_target
The default target of 100 works well for most columns. It means up to 100 MCVs, 101 histogram bounds, and a 30,000-row sample. Increasing it helps when:
- A column has many distinct values and the top 100 don't cover enough of the distribution
- Range queries on skewed data produce bad estimates because histogram buckets are too coarse
- Join estimates are off because n_distinct is inaccurate
The cost scales linearly. Setting it to 1000 means 300,000 sampled rows, up to 1000 MCVs, more catalog storage, and slower planning from larger arrays to search. The maximum is 10,000.
You don't have to raise it globally. For a single problematic column:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
Now status gets up to 500 MCVs and 501 histogram bounds, while every other column stays at 100.
Extended statistics
Standard statistics treat each column independently. This means the planner can't know that city = 'Edinburgh' and country = 'UK' are correlated — it multiplies their selectivities independently, potentially underestimating by orders of magnitude.
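A back-of-the-envelope illustration of that underestimate, with hypothetical selectivities for the Edinburgh/UK example:

```python
# Suppose 2% of rows have city = 'Edinburgh' and 10% have country = 'UK',
# and every Edinburgh row is also a UK row (fully correlated columns).
rows = 1_000_000
sel_city, sel_country = 0.02, 0.10

independent_estimate = rows * sel_city * sel_country  # planner's default
actual = rows * sel_city                              # city implies country

print(independent_estimate)  # ~2,000 rows estimated
print(actual)                # ~20,000 rows actually match: 10x underestimate
```

With more columns in the filter the multiplication compounds, which is why the error can reach orders of magnitude.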
Extended statistics solve this for specific column combinations:
CREATE STATISTICS orders_status_date (dependencies, ndistinct, mcv)
ON status, created_at FROM orders;
ANALYZE orders;
This tells ANALYZE to compute functional dependencies, combined distinct counts, and combined MCVs between the columns. The planner can then use these to avoid the independence assumption for queries filtering on both columns.
The three types of extended statistics serve different purposes:
- dependencies capture functional dependencies between columns. Helps when one column's value determines or narrows another's (e.g. zip_code largely determines city).
- ndistinct tracks the number of distinct value combinations across columns. Helps with GROUP BY on multiple columns, where the planner would otherwise multiply distinct counts independently.
- mcv builds a combined most-common-values list for column tuples. The most powerful but most expensive option. Helps with multi-column WHERE conditions on correlated values.
You can create extended statistics with any combination of these types. Start with dependencies as it's cheapest, add mcv when multi-column filter estimates are consistently wrong.
Extended statistics are worth creating when you see EXPLAIN estimates that are consistently wrong on multi-column filters, and the columns are logically correlated. They are computed during step 5 of the ANALYZE process and stored in pg_statistic_ext_data.
Diagnosing bad estimates
When a query is slow, the first question should always be: did the planner estimate correctly? Compare the estimate to reality with EXPLAIN ANALYZE.
An estimate off by a handful of rows means the statistics are fine. But when you see estimates off by 10x or more, that's where planning goes wrong. A nested loop that looks cheap at 100 rows becomes a disaster at 10,000.
The statistics tell you what the planner believed, and comparing that with reality tells you what to do next: run ANALYZE, tune the statistics target for a specific column, or create extended statistics if multiple columns are involved.
The planner is only as good as what it reads from the catalog. When estimates go wrong, don't blame the planner. Check the data it's working with.