In First Steps with Logical Replication we set up a basic working replication between a publisher and a subscriber and were introduced to the fundamental concepts. In this article, we expand on the practical aspects of operating logical replication, cover monitoring, and dive deeper into the foundations of logical decoding.

Initial Data Copy

As we demonstrated in the first part, when setting up the subscriber you can choose whether or not to rely on the initial data copy using the option WITH (copy_data = false). While the default copy is incredibly useful behavior, it has characteristics you should understand before using it in a production environment.

The mechanism effectively asks the publisher to take a snapshot of the table data (courtesy of MVCC), send it to the subscriber, and, thanks to the replication slot "bookmark," seamlessly continue streaming the changes from the point the snapshot was taken.

Simplicity is the key feature here, as a single command handles the snapshot, transfer, and transition to ongoing streaming.

The trade-off is performance: the copy uses a single process per table. While it completes almost instantly for test tables, you will encounter notable delay and overhead when dealing with tables holding gigabytes of data.

Although parallelism across tables can be controlled with the max_sync_workers_per_subscription configuration parameter, the initial copy might still leave you waiting for hours (or even days) on any real-life database. You can monitor whether the tables have already been synchronized or are still waiting/in progress using the pg_subscription_rel catalog.

SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;

Where each table will have one of the following states:

  • i initialization, copy not yet started
  • d data copy in progress
  • f table copy finished
  • s synchronized
  • r ready, done & replicating

Luckily, once a table reaches state r, streaming of its changes can start even if other tables are still synchronizing. Nevertheless, the replication slot retains its LSN position, which means the publisher will retain WAL files until the subscriber has caught up.
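
If the initial copy itself is the bottleneck, the max_sync_workers_per_subscription parameter mentioned above is the main knob. A minimal sketch on the subscriber side (the value 4 is only an illustration, and the workers are taken from the max_logical_replication_workers pool):

-- subscriber side: allow more tables to be copied in parallel
ALTER SYSTEM SET max_sync_workers_per_subscription = 4;
-- a reload is enough for this parameter
SELECT pg_reload_conf();

Keep in mind that each sync worker still copies one table at a time, so this only helps when many tables are waiting for their initial copy.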

Manual Synchronization for Production Workloads

As mentioned above, the implicit copy brings performance trade-offs that can simply be too much for production use, unless you account for it while designing your logical replication topology. In all other scenarios, manual synchronization is the way forward.

The entire process has one leading principle: the data you load manually must be consistent with the point in time from which the logical replication stream begins. This is achieved by creating a logical replication slot before (or synchronized with) the point at which you restore the data (PITR) and enabling the subscription only once the data is transferred. Only then can all subsequent changes be correctly applied on the subscriber.

There are several ways to achieve this, and you will have to evaluate the mechanism based on the available constraints in your particular use case:

  1. Use a backup & restore mechanism that supports Point-in-Time Recovery (PITR) if the amount of data to be synchronized approaches the total size of the backed-up data (i.e., most of the tables).
  2. Orchestrate data ingestion on the publisher around the backup by stopping incoming changes (for example, during a planned maintenance window) and resuming them only once a consistent snapshot at a known LSN is available. This works where you can control table ingestion and the amount of data being transferred fits the maintenance window.
  3. If neither is available, you might be able to manually advance the replication slot to a predefined LSN, as sketched right after this list. Treat this as a last resort; it is firmly expert territory.
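
A minimal sketch of option 3, assuming the slot name and the target LSN ('C/4A000060' is purely illustrative) match the point your restored data is consistent with:

-- on the publisher: create the slot first so WAL is retained from this point on
SELECT pg_create_logical_replication_slot('manual_sync_slot', 'pgoutput');

-- after the manual data load, move the slot forward to the LSN the data is consistent with
SELECT pg_replication_slot_advance('manual_sync_slot', 'C/4A000060'::pg_lsn);

The subscription then attaches to this slot using WITH (create_slot = false, slot_name = 'manual_sync_slot', copy_data = false).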

The recommended way to get reliable backup and restore handling is pgBackRest, as it allows you to restore a particular backup ahead of time and later apply only the changes needed to advance to the selected point in time, significantly reducing the time required for the initial sync. On the other hand, if your subscriber only needs a small portion of the data within the backup, restoring all of it might create resource constraints that are not acceptable for the setup.

While pg_dump can't be considered a full-fledged backup tool, it can help when you need a consistent copy of a small subset of tables, using a synchronized snapshot created with pg_export_snapshot:

pg_dump -d publisher_db --snapshot="000004A2-1" --data-only -Fc -f initial_data.dump
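
For illustration, the snapshot identifier passed to --snapshot comes from a session that keeps the exporting transaction open for the whole duration of the dump; a minimal sketch:

-- session 1 on the publisher: export a snapshot and keep the transaction open
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns the identifier to pass to pg_dump --snapshot

-- run the pg_dump command above in a second terminal while this transaction is open,
-- then finish the exporting session
COMMIT;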

Please note that the topic of consistent backups and Point-in-Time Recovery is well beyond the scope of this logical replication guide.

Monitoring Logical Replication

Once you set up your publishers and subscribers and have the initial data in place, it's time to start thinking about how to keep that process running.

For day-to-day operations, the basic building block for monitoring is the pg_replication_slots catalog. Sample query:

SELECT
    slot_name,
    plugin,
    slot_type,
    active,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
    ) AS wal_retained_size
FROM pg_replication_slots
WHERE
    slot_type = 'logical';

giving you a result like this:

-[ RECORD 1 ]-----+-----------------------------
slot_name         | my_sample_subscription
plugin            | pgoutput
slot_type         | logical
active            | t
wal_retained_size | 49 MB

You should look for two things:

  • active indicates whether the slot is currently used by a connection. A slot that is inactive for a short period might be acceptable, but a persistently inactive slot can be a sign that the subscriber is down or not configured properly.
  • The computed wal_retained_size is a critical metric. A steadily increasing value indicates the subscriber is not consuming changes, and your cluster might be at risk of running out of disk space.

Both values are important as they might indicate problems you need to pay attention to. The active flag might indicate dangling slots, where the subscriber has been destroyed (or simply wiped) without properly dropping the subscription—hence leaving the replication slot behind. In such cases, the only direct response is dropping the slot by its name:

SELECT pg_drop_replication_slot('my_dangling_slot');

If you want to dig deeper, you can also use pg_stat_replication, which provides more (albeit more volatile) data about the replication status.
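
For example, a quick look at the current lag of each connected subscriber:

SELECT
    application_name,   -- usually the subscription name
    state,
    sent_lsn,
    replay_lsn,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;

Unlike pg_replication_slots, rows only exist here while a walsender process is connected, which is exactly why the data is more volatile.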

To wrap up the monitoring part, you can either design your custom checks or use a modified version of the query above:

SELECT
  slot_name,
  CASE WHEN active
    THEN 1
    ELSE 0
  END as active_status,
  pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as wal_bytes_retained,
  extract(epoch from now()) as time
FROM pg_replication_slots
WHERE slot_type = 'logical';

as part of a Grafana alerting rule:

Alert Rule: "Inactive Replication Slot"
Condition:
  - Query: active_status
  - Reducer: last()
  - Evaluator: below 1
  - For: 30m   # if inactive for 30+ minutes

Additional conditions (AND):
  - Query: wal_bytes_retained
  - Reducer: last()
  - Evaluator: above 1073741824   # 1GB limit

Nevertheless, you need to evaluate which monitoring details apply to your particular use case(s). There are also other options. For example, if your architecture allows for it, you can set max_slot_wal_keep_size so that a replication slot is marked invalid and its WAL files are released once it falls behind by more than the configured value (for example, 100GB). While this might sound reasonable, you must consider whether your application or use case is able to recover from such data loss.
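
A minimal sketch of that safeguard on the publisher, assuming PostgreSQL 13 or newer for the wal_status and safe_wal_size columns:

-- publisher side: cap the WAL any replication slot may retain
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
SELECT pg_reload_conf();

-- slots that have fallen too far behind are invalidated and show up as 'lost'
SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots;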

Evolving Publications

Going back to the first part of our journey into logical replication, there's an important distinction to make. While we described the initial setup, in real life your publisher schema will change over time. Tables will be created and included implicitly (via FOR ALL TABLES) or explicitly added/removed:

ALTER PUBLICATION my_publication ADD TABLE public.new_table;
ALTER PUBLICATION my_publication DROP TABLE public.old_table;

But what happens when you add a new table? For the existing subscription, the answer is: not much at all. The only way for the subscriber to reflect changes in the underlying configuration is to refresh the subscription. During a refresh, PostgreSQL compares the current subscription table list with the publication table list, starts the initial synchronization of newly added tables (if configured to do so), and once that finishes, moves on to applying the streamed changes.

As mentioned, you can refresh a subscription with or without copying the initial data. The copy_data option controls this behavior and is currently the only option accepted by REFRESH PUBLICATION.

ALTER SUBSCRIPTION my_subscription
    REFRESH PUBLICATION;

ALTER SUBSCRIPTION my_subscription
    REFRESH PUBLICATION WITH (copy_data = false);

When you remove a table from the publication, the situation is much simpler, as changes to that particular table are no longer considered for logical decoding.

But what happens if you add a new table to the publication without refreshing the subscription? Although the WAL contains changes for all tables and logical decoding processes every published table, there is no additional WAL retention to worry about. The subscriber keeps advancing confirmed_flush_lsn as before, simply because the initial state for the newly added table has not yet been recorded, so its streamed changes are (for the moment) ignored.
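
To spot tables that were added to the publication but are not yet part of the subscription, you can compare the two catalogs; a short sketch using the publication and subscription names from above (run the first query on the publisher, the second on the subscriber):

-- publisher: tables currently included in the publication
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'my_publication';

-- subscriber: tables the subscription currently tracks; anything missing needs REFRESH PUBLICATION
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
WHERE srsubid = (SELECT oid FROM pg_subscription WHERE subname = 'my_subscription');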

Logical Decoding

If you paid attention throughout our journey or have already worked a bit with logical replication, you might have come across something called the pgoutput plugin. This is the built-in default output plugin for logical decoding, the mechanism that reads the WAL stream and transforms it into a higher-level format.

For example, instead of the physical replication view of byte X at offset Y in page Z, logical decoding translates the WAL stream into row-level changes. It uses the context of the source database to understand those changes, mapping object identifiers to schema, table, and column names, as well as the respective values (both old and new). It also reassembles the changes into the correct transaction flow.

The plugin's job is to format that output in a particular way and by itself is not aware of:

  • any transport layer details
  • replication slots
  • any retry logic

PostgreSQL offers a robust framework for logical decoding. The built-in plugins are just a start, and developers can create custom output plugins. But not only that—we can also look inside the logical decoded stream manually.

Let's consider this example:

-- create new publication
CREATE PUBLICATION all_data FOR ALL TABLES;

-- create new replication slot with pgoutput plugin
SELECT pg_create_logical_replication_slot('plugin_demo', 'pgoutput');

-- sample table
CREATE TABLE IF NOT EXISTS demo_table (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT,
    email TEXT
);

-- generate some changes with insert/update/delete
INSERT INTO demo_table (name, email) VALUES ('John Doe', 'john@example.com');
UPDATE demo_table SET email = 'doe@example.com' WHERE id = 1;
DELETE FROM demo_table WHERE id = 1;

-- get binary changes
SELECT * FROM pg_logical_slot_get_binary_changes('plugin_demo', NULL, NULL, 'proto_version', '1', 'publication_names', 'all_data');

-- drop the replication slot, the table, and the publication
SELECT pg_drop_replication_slot('plugin_demo');
DROP TABLE demo_table;
DROP PUBLICATION all_data;

Giving us a peek into the recently performed changes (output is shortened):

    lsn       | xid  |                                                                 data
------------+------+---------------------------------
 1/D6997380 | 1038 | \x4200000001d69974180
 1/D6997380 | 1038 | \x52000042057075626c696300646...
 1/D6997380 | 1038 | \x49000042054e000374000...
 1/D6997448 | 1038 | \x430000000001d6997...
 1/D6997448 | 1039 | \x4200000001d6997...
 1/D6997448 | 1039 | \x49000042054e0....
 1/D6997510 | 1039 | \x430000000001d6...

This is correct, but it won't help us demonstrate the flow. Don't forget that pgoutput is the plugin actually used for logical replication, and its output is binary. But don't despair: PostgreSQL also offers the test_decoding plugin, which decodes the changes into a human-readable text representation.

-- setup the publication
CREATE PUBLICATION all_data FOR ALL TABLES;

-- create replication slot with test_decoding plugin
SELECT pg_create_logical_replication_slot('plugin_demo', 'test_decoding');

CREATE TABLE IF NOT EXISTS demo_table (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT,
    email TEXT
);

-- generate some changes with insert/update/delete
INSERT INTO demo_table (name, email) VALUES ('John Doe', 'john@example.com');
UPDATE demo_table SET email = 'doe@example.com' WHERE id = 1;
DELETE FROM demo_table WHERE id = 1;

-- get human readable changes
SELECT * FROM pg_logical_slot_get_changes('plugin_demo', NULL, NULL);

-- drop the replication slot, the table, and the publication
SELECT pg_drop_replication_slot('plugin_demo');
DROP TABLE demo_table;
DROP PUBLICATION all_data;

This time the output is actually going to be helpful.

    lsn       | xid  |                                                        data
------------+------+-----------------------------------------------------------------------------------------------------
 1/D69CCC30 | 1048 | BEGIN 1048
 1/D69CCC98 | 1048 | table public.demo_table: INSERT: id[integer]:1 name[text]:'John Doe' email[text]:'john@example.com'
 1/D69CCDC0 | 1048 | COMMIT 1048
 1/D69CCDC0 | 1049 | BEGIN 1049
 1/D69CCDC0 | 1049 | table public.demo_table: UPDATE: id[integer]:1 name[text]:'John Doe' email[text]:'doe@example.com'
 1/D69CCE50 | 1049 | COMMIT 1049
 1/D69CCE50 | 1050 | BEGIN 1050
 1/D69CCE50 | 1050 | table public.demo_table: DELETE: id[integer]:1
 1/D69CCEC0 | 1050 | COMMIT 1050

Just reading it, you can easily follow the sequence of transactions (each statement auto-committed by psql) and track the individual changes.

We can also use this example to reiterate the importance of REPLICA IDENTITY. If you double-check the DDL for demo_table, you will see the id column is declared as an identity column and PRIMARY KEY. With the default replica identity, that primary key is what identifies rows in the replicated changes. You can use the test_decoding plugin to demonstrate the extra verbosity a different setting creates.

(The setup and teardown from the previous example are omitted here.)

ALTER TABLE demo_table REPLICA IDENTITY FULL;

INSERT INTO demo_table (id, name, email) OVERRIDING SYSTEM VALUE VALUES (999, 'John Doe', 'john@example.com');
UPDATE demo_table SET email = 'doe@example.com' WHERE id = 999;
DELETE FROM demo_table WHERE id = 999;

SELECT * FROM pg_logical_slot_get_changes('plugin_demo', NULL, NULL);

And you can observe the difference. While INSERT by its nature gives you the same output (all values are sent anyway), UPDATE and DELETE no longer rely on the primary key. Instead, they have to provide all column values, in the UPDATE case both the old and new data:

lsn  | 1/D69F8770
xid  | 1062
data | table public.demo_table: UPDATE: old-key: id[integer]:999 name[text]:'John Doe' email[text]:'john@example.com' new-tuple: id[integer]:999 name[text]:'John Doe' email[text]:'doe@example.com'

and for DELETE, the full old row:

lsn  | 1/D69F8828
xid  | 1063
data | table public.demo_table: DELETE: id[integer]:999 name[text]:'John Doe' email[text]:'doe@example.com'

This makes it obvious how a misconfigured replica identity can increase the amount of data decoded and replicated.
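
If you want to check which replica identity a table currently uses before it starts inflating your change stream, the setting is stored in pg_class; a short sketch:

-- d = default (primary key), f = full, n = nothing, i = index
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'demo_table';

-- switch back to the default (primary key based) identity
ALTER TABLE demo_table REPLICA IDENTITY DEFAULT;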

Fine-grained Publication Control

Before we introduce the advanced replication topologies in later parts of this series, let's have a look at how you can precisely control what data is replicated to the subscribers. Fine-grained control helps you limit replication overhead, reduce network traffic, enhance security, and ensure subscribers only receive the data they need.

The first option to filter the data is an explicit column list, allowing you to exclude sensitive (PII or similar) or unnecessary columns. When selecting the columns, you must include the primary key or the replica identity columns if the publication also publishes UPDATE or DELETE; PostgreSQL will not add them automatically, and such operations on the table will fail otherwise.

CREATE PUBLICATION hr_analytics
    FOR TABLE hr.employees (employee_id, first_name, last_name, department, start_date);

Similar to column selection, you can control what operations a publication replicates. The available options are 'insert', 'update', 'delete', and 'truncate'.

CREATE PUBLICATION hr_analytics
    FOR TABLE hr.employees (employee_id, first_name, last_name, department, start_date)
        WITH (publish = 'insert,delete');
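
To verify what a publication actually exposes, you can inspect the catalogs; a short sketch, assuming PostgreSQL 15 or newer for the attnames column:

-- which tables and columns the publication exposes
SELECT pubname, schemaname, tablename, attnames
FROM pg_publication_tables
WHERE pubname = 'hr_analytics';

-- which operations it publishes
SELECT pubname, pubinsert, pubupdate, pubdelete, pubtruncate
FROM pg_publication
WHERE pubname = 'hr_analytics';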

Securing Logical Replication

In the previous section, we touched on a crucial element of logical replication: limiting access to publications. What purpose would restricting published columns serve if another publication could still expose them without restrictions? So far in this series, we have relied on SUPERUSER access for simplicity to create and manage publications. To design logical replication for real-life production use, you need a least-privilege model.

Luckily for us, the PostgreSQL security model for logical replication follows the same rules as for regular queries. If a user can't SELECT from a specific table, or specific columns or rows, they can't include them in a publication either. Therefore, this section assumes you are already familiar with the PostgreSQL permission model.

The high-level overview of the permissions needed is that the replication user must have CONNECT permission on the database, USAGE on the schema, and SELECT on specific tables (columns). You also need to ensure (if applicable) row-level security (RLS) is correctly managed.

Using the fine-grained example above, we can demonstrate the setup. First, create a role with the LOGIN and REPLICATION attributes and grant it database and schema access:

CREATE ROLE hr_analytics_role WITH LOGIN REPLICATION PASSWORD 'a_strong_password';

GRANT CONNECT ON DATABASE my_publisher_db TO hr_analytics_role;
GRANT USAGE ON SCHEMA hr TO hr_analytics_role;

The crucial step is to correctly define the GRANT for selecting the data.

GRANT SELECT
    (employee_id, first_name, last_name, department, start_date)
ON hr.employees TO hr_analytics_role;

This approach ensures that even if someone gains access to the replication role, they cannot expose data beyond what was explicitly granted. The publication will respect these column-level restrictions, creating a secure boundary for your replicated data.
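
Putting it together, the subscriber would then connect as this role; a minimal sketch with an illustrative host name and subscription name:

CREATE SUBSCRIPTION hr_analytics_sub
    CONNECTION 'host=publisher.example.com dbname=my_publisher_db user=hr_analytics_role password=a_strong_password'
    PUBLICATION hr_analytics;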

Wrap Up

Today we've moved beyond the basics of logical replication to cover essential practices for production environments and to deepen our understanding of how it actually works.

This article is the second part of the upcoming guide Mastering Logical Replication in PostgreSQL. If you are interested in the topic, please consider subscribing to get the latest articles as they are published.