Copy-Paste your Database: Logical Replication with PostgreSQL on AWS RDS

Geert-Jan Zwiers
8 min read · Oct 11, 2021


Major database upgrades are among the most challenging tasks for any team of developers building a high-availability infrastructure. Upgrading a PostgreSQL (Postgres) instance to a new major version on AWS Relational Database Service (RDS), for example, involves multiple instance reboots while the database engine is updated, often leading to a significant service interruption.

Logical replication can be used to vastly reduce the downtime required for such operations. A logical replica is essentially a new instance of the database that applies the same transactions (insert, update, delete, etc.) in the same order as the original, starting from a certain point in time. This is what ‘logical’ means in this context, as opposed to physical replication, where data is copied byte-by-byte at the storage level. The chosen point in time could be when the instance was created or when a snapshot of the data was taken.

On AWS, logical replicas can be used to transfer data to instances with a shiny new major version, different hardware capacity (instance class) or any other modification that would normally cause a lot of downtime. On the other hand, logical replication is not a silver bullet. Only regular tables can be replicated, and they need to have a primary key or some other unique identifier. Furthermore, the database schema and sequence data are not replicated and need to be transferred or updated manually.

In this post, I will go over the steps to take in order to create and synchronize a logical replica on RDS and how to switch an application to the replica when ready.

Preparations

The first thing to do is to check whether an RDS Postgres instance can be replicated. Log in to the instance with a SQL client like pgAdmin, then run the following query, which lists all tables that lack a primary key (ideally, it returns no rows):

SELECT tbl.table_schema,
       tbl.table_name
FROM information_schema.tables tbl
WHERE tbl.table_type = 'BASE TABLE'
  AND tbl.table_schema NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
    SELECT 1
    FROM information_schema.key_column_usage kcu
    WHERE kcu.table_name = tbl.table_name
      AND kcu.table_schema = tbl.table_schema
  );

Next, go to the AWS RDS Console and find your instance. Click on the Configuration tab and find the parameter group for this database. Make sure this group has rds.logical_replication set to 1 (true) and that max_replication_slots, max_logical_replication_workers and max_worker_processes are set to values > 0.
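If you prefer the command line, the same parameter change can be sketched with the AWS CLI (the parameter group name here is a placeholder):

```shell
# rds.logical_replication is a static parameter, so it must be
# applied with ApplyMethod=pending-reboot. 'my-pg-params' is a
# hypothetical parameter group name.
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-pg-params \
  --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"
```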

Applying the parameter group changes requires a reboot of the instance, which causes a very short (20–30s) downtime. If you’re working with a production database, it’s best to schedule the reboot at a quiet moment or during a planned maintenance window.
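After the reboot, you can confirm from a SQL session that the settings are in effect:

```sql
-- Should show rds.logical_replication = on and
-- non-zero values for the slot/worker settings.
SELECT name, setting
FROM pg_settings
WHERE name IN ('rds.logical_replication',
               'max_replication_slots',
               'max_logical_replication_workers',
               'max_worker_processes');
```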

The next step is needed when your database sits in a Virtual Private Cloud (VPC). For example, you may have to connect to the instance over SSH via a bastion host, after which you can access the instance on a localhost port. In that case, the security group rules for inbound connections need to allow traffic from the VPC’s address ranges, so that the instance accepts traffic from other devices on the same VPC. If the instance is not in a VPC, make sure it accepts connections from your local IP address instead.
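As a sketch, an SSH tunnel through a bastion host might look like this (the bastion hostname and RDS endpoint are placeholders):

```shell
# Forward local port 5432 to the RDS endpoint through the bastion.
# 'bastion.example.com' and the endpoint below are hypothetical.
ssh -N -L 5432:mydb.abc123.eu-west-1.rds.amazonaws.com:5432 ec2-user@bastion.example.com
```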

Finally, if you’re just trying this out, it helps to add some test data to the database so that you can verify later that everything was replicated. We can create a very simple users table where each user has just an id and a username:

CREATE TABLE users(
  id INT PRIMARY KEY NOT NULL,
  username TEXT
);

INSERT INTO users(id, username)
VALUES (1, 'Mary'),
       (2, 'Jack'),
       (3, 'Mel');

Creating a target database

Now that the original instance, the publisher, has been prepared, a new instance can be created that will replicate the data from the publisher. This instance can be configured with a different major version, a different instance class, or any other property that would normally amount to serious downtime. For example, maybe the publisher instance is on Postgres 12, but the new instance should be on Postgres 14.

You can either create a new instance with the AWS Console, or, if you’re using an infra-as-code solution, deploy a new instance from your stacks. This new instance should be on the same VPC as the original (either your own configuration or a default VPC).

Transfer the database schema

Before the replication process can begin, the table schema(s) need to be copied from the publisher to the subscriber. This can be achieved with the pg_dump and pg_restore commands. An easy approach is to tunnel through a bastion host over SSH, which allows you to specify localhost as the host argument for both commands:

pg_dump --schema-only -Fc -h localhost -p 5432 -U admin -d postgres -f schema.archive
pg_restore --schema-only -h localhost -p 5433 -U admin -d postgres schema.archive

Note the use of --schema-only to copy only the database schema. Furthermore, these commands assume the publisher instance is available on port 5432 of localhost, while the subscriber is on port 5433. You could also run the first command, then close the tunnel, reconnect it to the subscriber on port 5432, and run the second command. Finally, the -U parameter is only required if your master user has a different name than the default postgres, such as admin.

For simplicity, the master user is used in this example, though it’s good practice to create a dedicated replication user that has only the database grants needed for the job.
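As a sketch of such a user on RDS (the role name and password are placeholders), you would grant the RDS-specific rds_replication role plus read access to the published tables:

```sql
-- Hypothetical dedicated replication user
CREATE ROLE repl_user WITH LOGIN PASSWORD 'change-me';
GRANT rds_replication TO repl_user;  -- RDS role that grants replication rights
GRANT USAGE ON SCHEMA public TO repl_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO repl_user;
```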

Logical replication

Now that both databases are ready, the subscriber can begin synchronizing with the publisher, thereby creating a logical replica. First, create a publication on the publisher:

CREATE PUBLICATION mypub FOR ALL TABLES;

If you only want to transfer a few specific tables, you can use FOR TABLE mytable instead of FOR ALL TABLES.

Next, on the subscriber, create a subscription to the publisher:

CREATE SUBSCRIPTION mysub
CONNECTION 'dbname=postgres host=publisher.rds.amazonaws.com user=admin password=mypassword'
PUBLICATION mypub;

The default database name in Postgres is postgres; if you’re using a different name, change the value of dbname in the connection string. You may have noticed that host is not localhost in this command, even though that was used when transferring the schema. That’s because both databases are on the same network (the VPC) and don’t need a bastion host to connect, assuming the security groups are properly configured. Instead of localhost, you can use the instance’s endpoint, which can be found, for example, in the RDS Console.

Once you run the subscription command, the subscriber will begin replicating. You can verify this by looking at the database logs in RDS, which should show a timestamp followed by something like:

logical replication apply worker for subscription "mysub" has started
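You can also check the subscriber side directly with the pg_stat_subscription view:

```sql
-- On the subscriber: one row per subscription worker
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```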

To monitor the progress, run the following on the publisher instance:

SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_lsn(),
       (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical';
This returns a value for lsn_distance, a measure of how far the subscriber is behind the publisher. When lsn_distance reaches 0, the subscriber is in sync.

Switching to the replica

Once the subscriber has caught up to the publisher, it’s time to update the app to switch over to the new database. From a purely technical standpoint this is not too hard: update the database connection parameters or connection string. However, it’s important to do so in a way that ensures no data is lost. If some new data comes in last-minute on the publisher, the replication worker may not be able to sync it to the subscriber before the switch is made.

To prevent this, we can disable write operations on the publisher while the app switches. This results in a small window of time in which the original database is ‘readonly’, but if timed well it need only be a minute or so, far briefer than the downtime of upgrading the original instance’s database engine in place. Users can still be affected by the readonly mode, though, so for live databases it’s advisable to perform the switch at a quiet time and announce it beforehand.

To make the publisher readonly for all connections, set the following parameter to 1 (on) in the instance’s parameter group. (Note that running SET default_transaction_read_only = on; in a SQL session would only affect that session, not other clients.)

default_transaction_read_only

Normally this should not require a reboot, but you can check to be sure with:

SELECT pending_restart FROM pg_settings WHERE NAME = 'default_transaction_read_only';

Using the query shown earlier, make one final check that lsn_distance is indeed 0. If not, your publisher still has some last changes to synchronize. When it’s done, remove the subscription on the subscriber:

DROP SUBSCRIPTION mysub;

The last thing to do is to reset the table sequences on the subscriber. If you don’t, Postgres may attempt to insert new rows with key values that already exist, because logical replication does not cover sequence data. To update all sequences in the database, run the following SQL:

WITH sequences AS (
  SELECT *
  FROM (
    SELECT table_schema,
           table_name,
           column_name,
           pg_get_serial_sequence(format('%I.%I', table_schema, table_name), column_name) AS col_sequence
    FROM information_schema.columns
    WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
  ) t
  WHERE col_sequence IS NOT NULL
), maxvals AS (
  SELECT table_schema, table_name, column_name, col_sequence,
         (xpath('/row/max/text()',
            query_to_xml(format('select max(%I) from %I.%I', column_name, table_schema, table_name), true, true, ''))
         )[1]::text::bigint AS max_val
  FROM sequences
)
SELECT table_schema,
       table_name,
       column_name,
       col_sequence,
       coalesce(max_val, 0) AS max_val,
       setval(col_sequence, coalesce(max_val, 1))
FROM maxvals;

This big chunk of code selects all sequences owned by a column, then gets the max value for that column associated with the sequence. Finally, it applies that max value to each sequence with setval.

And that is it! There is now a logical replica ready for use with the same data, which on AWS could be on a new Postgres major version or a larger instance class with more hardware capacity, all with very little downtime (just a short reboot and a few minutes of readonly mode).

Switching the app depends a bit on your AWS cloud environment. When using AWS ECS it is very easy to switch to the new database by creating a new ECS task definition with the updated database parameters. This can be done from infra-as-code or manually, but take into account the extra time an automated pipeline adds when deploying the database switch that way.

Logical replication can seem complex at first, but with some practice it becomes a powerful tool for engineers maintaining high-availability infrastructures. In this post, we covered the steps needed to prepare and configure a publisher and then set up and synchronize a subscriber. Although AWS VPCs, CDK and ECS were mentioned, the focal point of this guide was AWS RDS with Postgres and how to set up logical replication in that context, because the other AWS services are each a topic of their own. I hope this post helped you learn a new technique to improve your database skills and that it enables you to perform your mission-critical database changes in the Cloud with confidence.
