Most applications start with a single PostgreSQL database, but over time the need to scale out, distribute load, or integrate with other systems naturally arises. PostgreSQL's logical replication meets these demands by streaming row-level changes from one PostgreSQL instance to another using a publish-subscribe model. Logical replication is more than an advanced feature; it provides a flexible framework you can build on to further distribute and integrate PostgreSQL within your architecture.

In this article, we will start with the foundation, explore the core ideas behind logical replication, and learn how to use it.

Physical vs. Logical Replication

Before we can dive deeper, let's understand the role of replication in PostgreSQL and how it's built on top of the Write-Ahead Log (WAL).

The WAL is a sequential, append-only log that records every change made to the cluster's data. For durability, all modifications are first written to the WAL and only afterwards applied to the data files on disk. This allows PostgreSQL to recover from crashes by replaying the logged changes.
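
If you want to see the WAL in action, you can ask PostgreSQL for its current write position; a quick peek from any superuser session (run it again after a few writes and the position will have advanced):

-- current write position in the WAL and the name of the WAL file that holds it
SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());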

Versioned changes, necessitated by concurrent transactions, are managed through Multi-Version Concurrency Control (MVCC). Instead of overwriting data directly, MVCC creates multiple versions of rows, allowing each transaction to see a consistent snapshot of the database. It is the WAL that captures these versioned changes along with the transactional metadata to ensure data consistency at any given point in time.

Physical replication is built directly on the Write-Ahead Log. It streams the binary WAL data from the primary server to one or more standby servers (replicas), effectively creating a byte-for-byte copy of the entire cluster. Because the standbys are exact binary copies, they are read-only, which makes them ideal candidates for failover or read scaling.

Logical replication, while also built on top of the WAL, takes a fundamentally different approach. Instead of streaming the raw binary data, it decodes the WAL into logical, row-level changes – such as INSERT, UPDATE, and DELETE – and only then sends them to the subscribers using a publish-subscribe model. Compared to physical replication, this allows selective replication and writable subscribers that are not strictly tied to a single publisher, which makes far more flexible setups possible; the trade-off is that logical replication does not replicate DDL changes.

                     | Physical                                   | Logical
Data Streamed        | Binary WAL segments                        | Row-level SQL changes
Scope                | Byte-for-byte stream                       | Selected tables
Node Type            | Read-only standby                          | Fully writable instance
PostgreSQL Version   | All servers must match major version       | Works across major versions
Database Schema      | Changes automatically replicated           | Changes must be applied on subscriber(s) separately
Use Case             | Failover, high availability, read scaling  | Integration, zero-downtime upgrades and schema migrations, complex topologies

Physical replication is your go-to for high availability and disaster recovery, where you want a fast, exact copy of the entire database cluster that can take over in case of failure. It’s simple to set up and very efficient but limited in flexibility.

Logical replication shines when you need fine-grained control over what data is replicated, require writable replicas, or want to integrate PostgreSQL with other systems or versions. It’s ideal for zero-downtime upgrades, multi-region deployments, and building scalable, modular architectures.

Setting up Logical Replication

Enough of boring theory for now. To get started, you will need two instances of PostgreSQL – it's up to you whether you provision two virtual machines or two clusters running on the same computer. We will call one the publisher and the other the subscriber.

Preparing the Publisher

First, you need to prepare the publisher to emit logical changes. Modify postgresql.conf (or add a dedicated file under conf.d) with the following parameters:

wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

Your publisher also needs to be reachable by the subscriber, in most cases via a TCP socket.

listen_addresses = '*'

Here's what these settings do:

  • wal_level set to logical is a crucial piece of the configuration. It tells PostgreSQL how much information to write to the WAL, in this case, to support logical decoding.
  • max_replication_slots defines the maximum number of replication slots that can be created on the server. Each logical replication subscription needs its own slot.
  • max_wal_senders should be set high enough to accommodate all expected concurrent replication connections from subscribers; each active subscription consumes one WAL sender process. Keep it higher than or equal to max_replication_slots to avoid replication problems.
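
After the restart described below, you can quickly verify that the settings took effect; a small sanity check on the publisher:

-- confirm the replication-related settings
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');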

You will also need to configure client authentication in pg_hba.conf so the subscriber can connect. Note that logical replication connections target a regular database, so the special replication keyword in the DATABASE column does not match them – use the database name or all (for simplification, we allow all users and all databases):

# TYPE      DATABASE        USER            ADDRESS                 METHOD
host        all             all             subscriber_ip/32        scram-sha-256

For the purposes of this article we will assume the use of a superuser, which keeps things convenient:

CREATE USER my_user_name SUPERUSER PASSWORD 'my_secure_password';

It is recommended to use a dedicated replication user for real-life deployments:

CREATE USER replication_user WITH REPLICATION ENCRYPTED PASSWORD 'my_secure_password';
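
For the initial table copy, the replication role also needs read access to the tables it will replicate; a minimal sketch, assuming the tables live in the default public schema:

GRANT USAGE ON SCHEMA public TO replication_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replication_user;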

Once configured, restart PostgreSQL for the configuration changes to take effect. After that, connect to the server and prepare the environment.

-- example for psql

CREATE DATABASE logical_demo_publisher;
\c logical_demo_publisher;

Create a sample table and seed initial data:

CREATE TABLE products (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    description TEXT,
    is_active BOOLEAN DEFAULT TRUE
);

-- seed some data (10 records)
INSERT INTO products (
    name,
    category,
    price,
    stock_quantity,
    description,
    is_active
)
SELECT
    'Product Batch ' || s.id AS name,
    CASE (s.id % 5)
        WHEN 0 THEN 'Electronics'
        WHEN 1 THEN 'Books'
        WHEN 2 THEN 'Home Goods'
        WHEN 3 THEN 'Apparel'
        ELSE 'Miscellaneous'
    END AS category,
    ROUND((RANDOM() * 500 + 10)::numeric, 2) AS price,
    FLOOR(RANDOM() * 200)::int AS stock_quantity,
    'Auto-generated description for product ID ' || s.id || '. Lorem ipsum dolor sit amet, consectetur adipiscing elit.' AS description,
    (s.id % 10 <> 0) AS is_active
FROM generate_series(1, 10) AS s(id);

The final step for the publisher is to create a publication to define what data we want to publish.

CREATE PUBLICATION my_publication FOR TABLE products;

Preparing the Subscriber

Next, we will use our subscriber instance to receive the logical changes. There's no need to make any configuration changes just for subscribing, as the subscriber isn't emitting changes itself (note the "just" – that will matter in more advanced setups).

Then create the target database and schema. The schema creation is an important part, as logical replication does not create it for you:

-- example for psql

CREATE DATABASE my_subscriber_db;
\c my_subscriber_db;

CREATE TABLE products (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    description TEXT,
    is_active BOOLEAN DEFAULT TRUE
);
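
Writing the DDL by hand is fine for a single table; for anything larger you can copy the schema straight from the publisher instead. A sketch using pg_dump, assuming the host names and credentials used earlier in this article:

pg_dump --schema-only --table=products \
    -h publisher_ip_address -U my_user_name logical_demo_publisher \
  | psql -U my_user_name my_subscriber_db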

Now we have the foundation for testing logical replication. The simplest way is to create a subscription using connection details and the name of the publication to subscribe to.

CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher_ip_address port=5432 user=your_replication_user password=my_secure_password dbname=logical_demo_publisher'
    PUBLICATION my_publication;

  • my_subscription should be a descriptive name for your subscription.
  • CONNECTION defines how to connect to the publisher (which is a regular connection string and could also be a service).
  • PUBLICATION specifies the name of the publication you created earlier on the publisher.

If your connection details are correct, the subscription will by default start with an initial data sync and then listen for incoming changes. If you used the queries above, you can validate that the data is now on the subscriber.

# SELECT count(1) FROM products;
 count
---------
    10
(1 row)

The number of records should match the number of records created using generate_series above (10 in our example). Go ahead and insert a single row on your publisher instance and validate that it gets replicated to the subscriber, as in the sketch below.
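
A quick way to do that – run the INSERT on the publisher and the SELECT on the subscriber a moment later (the product name is just an example):

-- on the publisher
INSERT INTO products (name, category, price)
VALUES ('Replicated Widget', 'Electronics', 42.00);

-- on the subscriber, a moment later
SELECT id, name, price FROM products WHERE name = 'Replicated Widget';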

Congratulations! You have set up your first logical replication in PostgreSQL!

Core Concepts of Logical Replication

That was easy, right? And that's the goal. Now that you've seen logical replication in action, let's delve deeper into the core concepts that made this work.

Publication

As the name implies, a publication is where it all starts. It's essentially a catalogue of data you offer from the publisher. You can publish a number of objects:

--- specific tables
CREATE PUBLICATION my_publication FOR TABLE products, orders;

--- everything in your database (make sure you really want to do this)
CREATE PUBLICATION all_data FOR ALL TABLES;

--- replicate specific columns only (PostgreSQL 15 and higher)
CREATE PUBLICATION generic_data FOR TABLE products (id, name, price);

--- filter published data (PostgreSQL 15 and higher)
CREATE PUBLICATION active_products FOR TABLE products WHERE (is_active = true);

--- filter different operations
CREATE PUBLICATION my_publication FOR TABLE products, orders
WITH (publish = 'insert');

--- or you can mix & match it to get exactly what you need
CREATE PUBLICATION eu_customers FOR
    TABLE customers (id, email, country, created_at)
        WHERE (country IN ('DE', 'FR', 'IT')),
    TABLE orders (id, customer_id, total_amount)
        WHERE (total_amount > 100)
WITH (publish = 'insert, update');

As you can see, logical replication really gives you quite a lot of options and is far from the rigid, byte-for-byte copying of physical replication. This is exactly the characteristic that allows you to build complex topologies.

Once you set up your publication(s), you can review it:

--- in psql
\dRp
\dRp+ my_publication

--- using pg_catalog
SELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';

From these options, it's easy to think of a publication as a customisable data feed.

Subscription

The other side of logical replication is a subscription. It defines what and how to consume events from a publisher. You subscribe to a publication using connection details and a number of options.

--- basic subscription with full connection detail
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher_host port=5432 user=repl_user password=secret dbname=source_db'
    PUBLICATION my_publication;

-- subscription using service definition
CREATE SUBSCRIPTION realtime_only
    CONNECTION 'service=my_source_db'
    PUBLICATION my_publication;

The default behaviour of the subscription is to copy existing data, start immediately, and continue with streaming data changes. Once you start experimenting with logical replication, you can control that behaviour.

--- subscribe without initial copy of the data
CREATE SUBSCRIPTION streaming_only
    CONNECTION 'service=my_source_db'
    PUBLICATION my_publication
    WITH (copy_data = false);

--- or defer the start for later
CREATE SUBSCRIPTION manual_start
    CONNECTION 'service=my_source_db'
    PUBLICATION my_publication
    WITH (enabled = false);
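
One thing that catches people out: if you later add tables to a publication, existing subscriptions won't pick them up until you refresh them on the subscriber side. A sketch, reusing the names from this article (the orders table is assumed to exist on both sides):

--- on the publisher: extend the publication
ALTER PUBLICATION my_publication ADD TABLE orders;

--- on the subscriber: pick up the new table (and copy its existing data)
ALTER SUBSCRIPTION my_subscription REFRESH PUBLICATION;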

You can monitor your subscriptions:

--- in psql
\dRs
\dRs+ my_subscription

--- or their status using pg_catalog
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
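
You can also check how far the initial table synchronisation has progressed for each table; a small sketch using the pg_subscription_rel catalog on the subscriber (state r means the table is ready, i.e., fully synchronised and streaming):

--- per-table synchronisation state
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;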

Replication Slot

Publications and subscriptions establish the data flow, but there's a critical piece missing: how does the publisher keep track of multiple subscribers reading the WAL at different speeds? Replication slots are the answer. They act as a persistent bookmark in the WAL stream that tracks exactly where each subscriber is (or was the last time it connected), ensuring no changes are lost even if the connection drops.

Let's have a look at what a replication slot can tell us about itself.

# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+---------------
slot_name           | my_subscription
plugin              | pgoutput
slot_type           | logical
datoid              | 16390
database            | my_db_source
temporary           | f
active              | t
active_pid          | 3162
xmin                |
catalog_xmin        | 777
restart_lsn         | 0/1DEFBF0
confirmed_flush_lsn | 0/1DEFC28
wal_status          | reserved
safe_wal_size       |
two_phase           | f
inactive_since      |
conflicting         | f
invalidation_reason |
failover            | f
synced              | f

The most interesting attributes are:

  • slot_name
  • plugin used for logical replication
  • slot_type confirming it's for logical replication
  • active indicates whether there's a subscriber reading from this slot

The most important of these are:

  • restart_lsn – the LSN from which this slot "holds" WAL files, preventing them from being released
  • confirmed_flush_lsn – the last LSN the subscriber has confirmed as successfully processed (i.e., applied)

While we won't go into details about the WAL here, it's enough to say that an LSN (Log Sequence Number) is a unique address in the WAL, identifying a specific position in the stream of database changes.

In most cases, you don't have to create replication slots manually – subscriptions create and manage them. The persistent nature of the slots guarantees they survive the restart of both publisher and subscriber.

Please note, if the subscription is not actively consuming changes, the publisher won't release WAL files that contain changes the slot has not yet consumed. This prevents data loss, but can easily fill up the disk if any of the subscribers falls too far behind.

You can monitor the WAL retention per each replication slot.

# SELECT
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained
FROM pg_replication_slots;

-[ RECORD 1 ]+--------------------
slot_name    | my_subscription
active       | t
wal_retained | 265 MB
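
If a subscriber is gone for good – for example, a subscription was dropped while the publisher was unreachable, leaving its slot behind – the orphaned slot will retain WAL forever. In that case you can remove it manually on the publisher; be certain nothing will ever need it again, as this is irreversible:

-- drop an orphaned (inactive) replication slot
SELECT pg_drop_replication_slot('my_subscription');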

Replication Identity

As we have already covered, logical replication works with row-level changes, such as INSERT, UPDATE, and DELETE. And not all operations are equal. While INSERT is relatively straightforward (a new row is sent to the subscriber), for both UPDATE and DELETE operations, PostgreSQL needs to be able to uniquely identify the target row to modify on the subscriber.

This is managed by the REPLICA IDENTITY property of each published table. The default (REPLICA IDENTITY DEFAULT) uses the table's primary key (it has one, right?). Alternatively, you can point it at a unique index with USING INDEX, or set it to FULL (which should generally be avoided) to send all of the row's old column values so the subscriber can find the target row. Please note that if a table has neither a primary key nor an explicitly configured replica identity, UPDATE and DELETE on that table will fail on the publisher once it is published.

Whenever possible, please always:

  • Choose REPLICA IDENTITY DEFAULT (the primary key) – it's the most efficient choice. If you don't have a primary key, consider adding one (a surrogate key is always an option).
  • Use REPLICA IDENTITY USING INDEX index_name if a smaller (or better fitting) unique index matches the business logic or architecture of your database model. Both options are sketched below.
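
Changing the replica identity is a simple ALTER TABLE; a sketch of the options above (idx_products_sku is a hypothetical unique index, not one created earlier in this article):

-- default: use the primary key
ALTER TABLE products REPLICA IDENTITY DEFAULT;

-- use a specific unique index instead
ALTER TABLE products REPLICA IDENTITY USING INDEX idx_products_sku;

-- last resort: send all old column values (generally avoid)
ALTER TABLE products REPLICA IDENTITY FULL;

-- inspect the current setting (d = default, i = index, f = full, n = nothing)
SELECT relreplident FROM pg_class WHERE relname = 'products';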

There are special cases of what not to do when it comes to replication identity, but we will touch on those cases in the advanced section of this guide.

Schema Changes

Although it was already mentioned, it's vital to reiterate that logical replication, as it streams only row-level changes rather than full WAL segments, does not automatically replicate Data Definition Language (DDL) changes.

Any schema changes made on the publisher must be manually applied to all subscribers before data reflecting the change is replicated.

We will cover schema specifically in later parts of this guide, but to sum it up, this is the recommended order of operations when it comes to schema changes:

  1. (Optional, but recommended for breaking changes) Pause replication where applicable.
  2. Apply and verify DDL changes to the subscribers.
  3. Apply and verify DDL changes on the publisher.
  4. Resume the replication if applicable.

While this might seem like a major drawback compared to physical replication, it's the characteristic that gives us the most flexibility.

Other Considerations

As logical replication reliably handles row-level changes, it's crucial to understand other specific limitations and behaviours. Specifically:

  • It's important to understand that sequences are not replicated. While they are used to generate values on the publisher, and their value advances there, the corresponding sequence (if it exists) on the subscriber won't be updated (see the sketch after this list).
  • Other database objects won't be replicated either. That might be expected for views, stored procedures, triggers, and rules, but you also can't rely on it for materialized views.
  • Special consideration (similar to schema changes) must be paid to user-defined data types. If a replicated table uses a user-defined type (e.g., a column using an ENUM), the type must already be available on the subscriber(s) and have exactly the same name and structure. As ENUMs are represented as ordered sets, even the order of the values matters!
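
For the sequence gap mentioned above, a common workaround when you eventually promote a subscriber (or simply want its sequences usable) is to bump them past the highest replicated value; a sketch for our products table:

-- advance the identity sequence on the subscriber past the replicated data
SELECT setval(pg_get_serial_sequence('products', 'id'),
              (SELECT COALESCE(max(id), 1) FROM products));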

Living with Logical Replication

In our previous example, we set up the publisher and subscriber, relied on the initial copy of the data, and let things run. Both of them will survive a restart, and thanks to the replication slot, they will keep going as soon as they come online.

But what if you need to perform maintenance or do regular things like a schema update on the subscriber or the publisher? Sometimes you might need to temporarily stop the replication. PostgreSQL makes this straightforward with subscription management, allowing you to stop consuming the changes.

ALTER SUBSCRIPTION my_subscription DISABLE;

As we hinted earlier, when disabled, the subscription stops consuming changes from the publisher, while keeping the replication slot. This means:

  • No new changes are applied on the subscriber.
  • WAL files will start to accumulate on the publisher (since the replication slot holds its position).

While the subscription is disabled, there isn't much status to check from the subscriber side – only the replication slot on the publisher knows how far behind you are. You can use the query we mentioned above on the publisher to check the status.

SELECT
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained
FROM pg_replication_slots;

When ready, you can resume the subscription.

ALTER SUBSCRIPTION my_subscription ENABLE;

Coordinating Schema Changes

As we mentioned earlier, since logical replication does not automatically replicate DDL changes, you need to coordinate schema updates manually. Here's a basic overview of how to perform such a change, step by step.

First, disable the replication on the subscriber and apply your schema changes.

--- disable replication
ALTER SUBSCRIPTION my_subscription DISABLE;

--- apply changes on the subscriber
ALTER TABLE products ADD COLUMN category_id INTEGER;
CREATE INDEX products_by_category ON products(category_id);

Only then can you apply the changes to the publisher.

ALTER TABLE products ADD COLUMN category_id INTEGER;
CREATE INDEX products_by_category ON products(category_id);

And resume the replication.

ALTER SUBSCRIPTION my_subscription ENABLE;

If you applied the schema change to the publisher first, any new data using the new column would fail to replicate to the subscriber that doesn't have the column yet.

Handling Incompatible Changes

PostgreSQL's default conflict resolution is very simple. When a conflict occurs, logical replication stops, and the subscription enters an error state. For example, if an incoming row collides with a product row already present on the subscriber, you will see something like this in the subscriber's log files:

ERROR: duplicate key value violates unique constraint "products_pkey"
DETAIL: Key (id)=(452) already exists.
CONTEXT:  processing remote data for replication origin "pg_16444" during message type "INSERT" for replication target relation "public.products" in transaction 783, finished at 0/1E020E8
LOG:  background worker "logical replication apply worker" (PID 457) exited with exit code 1

As you can see, logical replication stopped with an error code and won't continue until you manually resolve the conflict. To do so, you need to identify the conflicting data and decide what to do with it.

-- option 1: remove the conflicting data
DELETE FROM products WHERE id = 452;

-- option 2: update the data to match the expected state
UPDATE products SET name = 'New Product', price = 29.99 WHERE id = 452;
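
If the offending transaction simply should not be applied at all, PostgreSQL 15 and newer also let you skip it entirely, using the finish LSN reported in the error context above. Use this sparingly – the skipped changes are lost on the subscriber:

-- option 3 (PostgreSQL 15+): skip the whole conflicting transaction
ALTER SUBSCRIPTION my_subscription SKIP (lsn = '0/1E020E8');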

Only after the conflict is resolved can logical replication continue. This was only a very simple example of a conflict that might arise during logical replication. Other examples might involve:

  • Data problems, constraint violations, or type mismatch
  • Schema conflicts (missing or renamed table/column)
  • Permissions issues or row-level security

In all those cases, you still need to go and fix the problem before logical replication can resume.

Wrap Up

Logical replication in PostgreSQL opens up powerful possibilities beyond traditional physical replication. We have covered the foundations that can get you started with setting up your own logical replication environment, including understanding the differences between physical and logical replication, configuring publishers and subscribers, and managing core components like publications, subscriptions, and replication slots.

This article is part of the upcoming guide Mastering Logical Replication in PostgreSQL. If you are interested in the topic, please consider subscribing to get the latest articles as they are published.