In the article about Buffers in PostgreSQL we kept adding EXPLAIN (ANALYZE, BUFFERS) to every query without giving much thought to the output. Time to fix that. PostgreSQL breaks down buffer usage for each plan node, and once you learn to read those numbers, you'll know exactly where your query spent time waiting for I/O - and where it didn't have to. That's about as fundamental as it gets when diagnosing performance problems.

PostgreSQL 18: BUFFERS by Default
Starting with PostgreSQL 18, EXPLAIN ANALYZE automatically includes buffer statistics - you no longer need to explicitly add BUFFERS. The examples below use the explicit syntax for compatibility with older versions, but on PG18+ a simple EXPLAIN ANALYZE gives you the same information.
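
A quick sketch of the difference (using pg_class only as a stand-in target; exact output will vary): on PG18+ the first statement should already print the Buffers lines, and BUFFERS OFF suppresses them if you don't want the extra detail.

EXPLAIN ANALYZE SELECT count(*) FROM pg_class;                 -- PG18+: Buffers shown by default
EXPLAIN (ANALYZE, BUFFERS OFF) SELECT count(*) FROM pg_class;  -- explicitly opt out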

A complete example

For this article we will use the following schema and seed data.

CREATE TABLE customers (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers(id),
    amount numeric(10,2) NOT NULL,
    status text NOT NULL DEFAULT 'pending',
    note text,
    created_at date NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO customers (name)
SELECT 'Customer ' || i
FROM generate_series(1, 2000) AS i;

-- seed data: ~100,000 orders spread across 2022-2025
INSERT INTO orders (customer_id, amount, status, note, created_at)
SELECT
    (random() * 1999 + 1)::int,
    (random() * 500 + 5)::numeric(10,2),
    (ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int],
    CASE WHEN random() < 0.3 THEN 'Some note text here for padding' ELSE NULL END,
    '2022-01-01'::date + (random() * 1095)::int  -- ~3 years of data
FROM generate_series(1, 100000);

-- make sure stats are up to date
ANALYZE customers;
ANALYZE orders;

-- we are going to skip indexes on purpose

-- and fire a sample query
select count(1) from customers;
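
Since we'll be counting pages constantly, it helps to know how many pages each table occupies. After ANALYZE, pg_class carries the estimates - expect numbers close to the 13 and 857 pages we'll see in the plans below:

SELECT relname, relpages, reltuples::bigint AS rows_estimate
FROM pg_class
WHERE relname IN ('customers', 'orders');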

Let's start with a random query

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2024-01-01';

and its output.

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=58.00..2253.87 rows=33784 width=71) (actual time=0.835..26.695 rows=33239.00 loops=1)
   Hash Cond: (o.customer_id = c.id)
   Buffers: shared hit=13 read=857
   ->  Seq Scan on orders o  (cost=0.00..2107.00 rows=33784 width=58) (actual time=0.108..18.106 rows=33239.00 loops=1)
         Filter: (created_at > '2024-01-01'::date)
         Rows Removed by Filter: 66761
         Buffers: shared read=857
   ->  Hash  (cost=33.00..33.00 rows=2000 width=17) (actual time=0.697..0.698 rows=2000.00 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 118kB
         Buffers: shared hit=13
         ->  Seq Scan on customers c  (cost=0.00..33.00 rows=2000 width=17) (actual time=0.007..0.231 rows=2000.00 loops=1)
               Buffers: shared hit=13
 Planning:
   Buffers: shared hit=130 read=29 dirtied=3
 Planning Time: 1.585 ms
 Execution Time: 28.067 ms
(16 rows)

As that's quite a bit of information, let's break it down by individual categories.

Shared buffers: hit, read, dirtied & written

As described in the previous article, these are the most common buffer statistics you will see.

shared hit is the number of pages found in shared buffers (i.e. already cached). This is the fast path where no disk I/O is required. Higher is better for performance.

shared read is the number of pages not found in shared buffers that had to be fetched from disk (or the OS cache); each one of them adds potential I/O latency.

shared dirtied is the number of pages modified by this query. The query changed data that was already cached in the buffer pool, and those pages will eventually need to be written to disk.

If you see a SELECT that reports dirtied pages, it's not a bug. PostgreSQL sets hint bits and prunes HOT chains during reads - the first backend to read a page after writes will dirty it. Normal behavior, not a problem.
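
If you want to see this for yourself, here's a rough sketch (exact counts will vary, and a background vacuum may beat you to it): insert a batch of rows, then watch the first read dirty the freshly written pages.

-- add some rows whose pages have no hint bits yet
INSERT INTO orders (customer_id, amount, status)
SELECT 1, 9.99, 'pending' FROM generate_series(1, 5000);

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders WHERE customer_id = 1;
-- first run typically shows: Buffers: shared hit=... read=... dirtied=...
-- run it again and dirtied disappears - the hint bits are already set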

shared written is the number of pages written to disk during query execution. As a reminder, this happens when the query needs buffer space and has to evict dirty pages synchronously. Seeing this repeatedly during a SELECT can be a warning sign - your background writer is not keeping up as it should.
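
On PostgreSQL 16+ you can cross-check whether client backends are doing such writes at all via pg_stat_io (a sketch; on older versions the buffers_backend counter in pg_stat_bgwriter played a similar role):

SELECT backend_type, context, writes, evictions
FROM pg_stat_io
WHERE backend_type = 'client backend'
  AND object = 'relation';
-- steadily growing writes/evictions here mean backends are evicting dirty pages themselves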

Let's have a look at our query's top-level buffer stats:

Buffers: shared hit=13 read=857

Only 13 pages were cached in shared buffers, while 857 had to be fetched from disk (or OS cache). No pages were dirtied or written - expected for a pure SELECT with no side effects.

But where did those 13 hits come from? The breakdown by node tells us:

->  Seq Scan on orders o
      Buffers: shared read=857
->  Seq Scan on customers c
      Buffers: shared hit=13

The customers table (small in this case - 2,000 rows, 13 pages) was fully cached - likely because it is accessed frequently or, as in our case, was accessed recently. The orders table (100,000 rows, 857 pages) had zero hits - every single page required I/O. This is typical after a restart or when scanning a table that doesn't fit comfortably in shared buffers.
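
If you want to verify what is actually sitting in shared buffers, the pg_buffercache extension can show you - a sketch, assuming you can install the extension and have superuser or pg_monitor privileges:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffered_pages
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
WHERE c.relname IN ('customers', 'orders')
GROUP BY c.relname;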

Interpreting the ratio

In the context of this article we're going to consider the ratio between shared hit and the total buffers processed. Is there a perfect ratio you should strive for? As we will demonstrate, there is no such universal value.

Let's calculate it for our query:

hit_ratio = shared hit / (shared hit + shared read)
          = 13 / (13 + 857)
          = 1.5%

Honestly, 1.5% looks terrible. In an OLTP workload, the same small set of rows gets accessed over and over - fetch a customer by ID, look up an order by reference number, check inventory for a product. The working set is a small fraction of the total database. These queries touch a handful of pages each, and those pages stay hot in shared buffers because they keep getting requested. A well-tuned OLTP system naturally converges toward high hit ratios - not because someone set a target, but because the access pattern keeps the relevant data cached. If this were the ratio of most of your OLTP queries, you could easily say there's a problem.

But this query was run on a freshly loaded dataset with cold caches - every page of the orders table had to be fetched for the first time. Run the same query again and you'll likely see most of those 857 reads become hits as shared buffers and the OS page cache warm up. On a test environment (where nothing else runs) you will most likely hit 100%.

What matters is the hit ratio per query, tracked over time, compared to its own baseline:

  • A reporting query scanning a large date range might consistently show 10-30% hit ratio. That's fine - it's expected to touch cold data.
  • A query serving your login page should be near 100%. If it drops to 80%, something changed - maybe the table grew, an index was rebuilt, or shared_buffers is under pressure from a new workload.
  • A query that ran at 95% hit ratio last week and now runs at 40% deserves investigation, regardless of whether 40% sounds "good" or "bad" in isolation.

The ratio is a diagnostic tool, not a scorecard. Use it to spot regressions, compare before-and-after when tuning, and understand where your query's time is actually going. A low ratio paired with high execution time points you toward I/O as the bottleneck. A high ratio with high execution time tells you to look elsewhere - maybe CPU, maybe row count, maybe a bad plan.

Context matters more than absolute numbers. Compare similar queries over time, not arbitrary benchmarks.
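
The same ratio can also be tracked per table over time using the cumulative statistics views - a sketch based on pg_statio_user_tables:

SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit /
         nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct
FROM pg_statio_user_tables
WHERE relname IN ('customers', 'orders');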

Local buffers

Local buffers track I/O for temporary tables. Unlike regular tables that live in shared buffers, temp tables use per-backend memory - each connection gets its own local buffer pool, controlled by the temp_buffers setting.

CREATE TEMP TABLE temp_large_orders AS
SELECT o.id, o.amount, o.status, o.created_at, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.amount > 200;

EXPLAIN (ANALYZE, BUFFERS)
SELECT status, count(*), sum(amount)
FROM temp_large_orders
GROUP BY status;

The first thing you should notice is that there are no shared buffers at all, at least in the execution phase. The entire query ran against local buffers because temp tables are invisible to other backends.

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1281.60..1284.10 rows=200 width=72) (actual time=24.659..24.661 rows=4.00 loops=1)
   Group Key: status
   Batches: 1  Memory Usage: 32kB
   Buffers: local hit=576
   ->  Seq Scan on temp_large_orders  (cost=0.00..979.20 rows=40320 width=48) (actual time=0.009..5.965 rows=60731.00 loops=1)
         Buffers: local hit=576
 Planning:
   Buffers: shared hit=36 read=5
 Planning Time: 0.294 ms
 Execution Time: 24.708 ms
(10 rows)

The individual values that may be reported are local hit/read - the same concept as shared hit/read, but for temp tables in the per-backend buffer pool.

Another value is local dirtied/written representing temp table modifications. "Dirtied" means the query modified pages in the local buffer pool. "Written" means dirty pages had to be flushed to disk to make room for new ones - the same clock-sweep eviction mechanism as shared buffers, but against the local buffer pool. Unlike shared buffers, temp table writes don't generate WAL and aren't subject to checkpointing.

In practice, local written is rare to see - PostgreSQL handles temp table overflow efficiently enough that you're unlikely to encounter it unless your temp_buffers is severely undersized relative to your temp table workload.
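
If your workload leans heavily on temp tables, temp_buffers can be raised per session - a sketch; note that the value can only be changed before the session touches its first temp table:

SHOW temp_buffers;           -- 8MB by default
SET temp_buffers = '64MB';   -- must run before any temp table is accessed in this session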

Temp buffers: when work_mem isn't enough

While local buffers are rarely a problem - or even visible - temp buffers track cases where sorts, hashes, and other operations exceed the current work_mem setting and spill from memory to disk.

SET work_mem = '256kB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.amount, o.status, o.created_at, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.amount DESC;

We explicitly forced a lower work_mem to see the impact.

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=38374.70..38874.70 rows=200000 width=36) (actual time=109.345..120.574 rows=200000.00 loops=1)
   Sort Key: o.amount DESC
   Sort Method: external merge  Disk: 9736kB
   Buffers: shared hit=1738, temp read=3636 written=3722
   ->  Hash Join  (cost=116.00..4353.56 rows=200000 width=36) (actual time=1.597..34.857 rows=200000.00 loops=1)
         Hash Cond: (o.customer_id = c.id)
         Buffers: shared hit=1738
         ->  Seq Scan on orders o  (cost=0.00..3712.00 rows=200000 width=27) (actual time=0.016..6.973 rows=200000.00 loops=1)
               Buffers: shared hit=1712
         ->  Hash  (cost=66.00..66.00 rows=4000 width=17) (actual time=1.568..1.569 rows=4000.00 loops=1)
               Buckets: 4096  Batches: 1  Memory Usage: 235kB
               Buffers: shared hit=26
               ->  Seq Scan on customers c  (cost=0.00..66.00 rows=4000 width=17) (actual time=0.012..0.629 rows=4000.00 loops=1)
                     Buffers: shared hit=26
 Planning:
   Buffers: shared hit=15
 Planning Time: 1.184 ms
 Execution Time: 123.932 ms

There you can see temp read/written - the number of pages read from and written to temporary files on disk. This indicates the operation couldn't fit in memory.

Naming Confusion Alert

temp read/written in EXPLAIN has nothing to do with the temp_buffers parameter.

  • temp_buffers = memory for temporary tables (CREATE TEMP TABLE)
  • temp read/written = disk spill from sorts/hashes (governed by work_mem)

The Sort Method: external merge Disk: 9736kB confirms it - sorting 200,000 rows with only 256kB of work_mem forced PostgreSQL to spill ~9.7MB to temporary files on disk. The temp written=3722 happened during the sort phase as pages were flushed out, and temp read=3636 happened during the merge phase as PostgreSQL read them back to produce the final sorted result.

Notice something else: the Hash Join and everything below it shows only shared hit=1738 with no temp buffers at all. The hash table for 4,000 customers fit comfortably in 235kB of memory. The temp spill is isolated to the Sort node - buffer stats always attribute I/O to the node that caused it.

Try bumping work_mem to something reasonable and the spill disappears:

SET work_mem = '16MB';

You should see no temp buffers at all. The sort completed in memory, execution time dropped, and the only I/O was reading the actual table data.

To reduce temp file usage you can:

  • Increase work_mem (careful there, don't forget it's a per-operation setting, not per-query, so a complex query with multiple sorts or hash joins allocates work_mem for each one)
  • Optimize the query to process fewer rows before the sort
  • And probably most importantly, consider adding indexes to avoid sorts entirely - an index on orders(amount DESC) would eliminate the sort node altogether (see the sketch after this list)
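
A sketch of that last option (the index name is just an example): depending on costs and work_mem, the planner can switch to a plan driven by the new index that returns rows already ordered, removing both the Sort node and the temp spill.

CREATE INDEX orders_amount_desc_idx ON orders (amount DESC);

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.amount, o.status, o.created_at, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.amount DESC;
-- look for an index scan on orders_amount_desc_idx and the absence of temp read/written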

Planning buffers

Up until now we have avoided planning buffers completely. They are an addition introduced in PostgreSQL 13 that lets you see buffer usage during query planning, separately from execution:

 Planning:
   Buffers: shared hit=36 read=5

Why does planning need buffers? The planner reads system catalogs (pg_class, pg_statistic, pg_index, etc.) to understand table structures and statistics. Complex queries touching many tables can have a non-trivial impact on planning-time I/O.

A high read count in the planning phase suggests either that system catalogs aren't cached (most likely a cold start), or that your query touches many tables or columns.

If planning time is a problem, ensure system catalogs stay hot. On systems with many partitions, planning overhead can become significant - this is one reason partition pruning matters.
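
An easy way to see catalog caching at work is to plan the same statement twice in one session - a sketch, with counts that will differ on your system:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE id = 42;
-- fresh session: Planning: Buffers: shared hit=... read=...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE id = 42;
-- second run: reads turn into hits, or the planning Buffers line disappears entirely
-- once everything is served from the catalog caches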

The blurry line between planning and execution buffers

While writing the articles for this blog I often wonder whether I have everything correct - with a complex system like PostgreSQL, one is always learning. I recently learned more about planning buffers after starting from a somewhat technically imprecise assumption.

PostgreSQL actually does not resolve all metadata during the planning phase. The planner does the minimum work needed to choose the best plan, but defers some catalog lookups to execution time. When a Sort node first runs, it looks up the comparison function from pg_amproc via get_opfamily_proc(). That lookup hits shared buffers and gets counted as execution buffers. On the second run in the same session, the syscache already has that information - no buffer access, fewer reported buffers.
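
You can watch this happen - a sketch to run in a fresh session, with counts that will vary:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM customers ORDER BY name LIMIT 10;
-- first run: the Sort node reports a few extra shared hits from catalog lookups
-- (comparison function, collation) counted as execution buffers
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM customers ORDER BY name LIMIT 10;
-- second run: those extra hits are gone - the syscache already has the answers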

Putting it together

Here's a sample output of a query with problems across every buffer category:

 Buffers: shared hit=50 read=15000 written=847
          temp read=2500 written=2500
 Planning:
   Buffers: shared hit=12 read=156
 Planning Time: 45.678 ms
 Execution Time: 12345.678 ms

Reading this top to bottom: the hit ratio is abysmal (50 hits vs 15,000 reads), so the working set isn't cached. The written=847 means the query forced synchronous evictions - the background writer can't keep up. The temp spill points to an operation exceeding work_mem. Even planning needed 156 reads, suggesting system catalogs got evicted from cache.

Each number points to a specific tuning lever: shared_buffers, bgwriter_lru_maxpages, work_mem, or query optimization to touch less data.
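
A quick way to check where those levers currently sit (a sketch):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'temp_buffers', 'bgwriter_lru_maxpages');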

Looking beyond single queries

Single query analysis is useful, but patterns across your workload matter more. pg_stat_statements exposes the same buffer counters aggregated over time:

SELECT
    substring(query, 1, 60) AS query,
    calls,
    shared_blks_hit,
    shared_blks_read,
    round(100.0 * shared_blks_hit /
      nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct,
    temp_blks_written
FROM pg_stat_statements
WHERE calls > 100
ORDER BY shared_blks_read DESC
LIMIT 10;

This shows which queries are causing the most disk reads across your system - often more actionable than analyzing one query at a time.
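
If that view comes back empty, the extension probably isn't set up yet - a sketch of the usual steps (the preload setting requires a restart):

-- postgresql.conf (restart required):
--   shared_preload_libraries = 'pg_stat_statements'

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- optional: reset counters before measuring a specific workload
SELECT pg_stat_statements_reset();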

Conclusion

Buffer statistics transform EXPLAIN from "here's the plan" to "here's exactly where the time went." Every number points to a specific cause and a specific fix. Once you start reading them, you stop guessing and start tuning.

If you need a bigger picture of buffer management, check out Introduction to Buffers in PostgreSQL.