PostgreSQL hot standby feedback. How it works.

From time to time, people come to us with a problem, „Our queries cancelled on standby servers, and we enabled hot_standby_feedback, but queries on standby get cancelled. Why?“. Indeed, after reading the documentation, it may seem that the feedback solves query cancelling problems, and if it is enabled the conflicts will disappear. However, this is not true. In this article we take a look at the feedback, how it works and which problems it is intended to solve.
To begin with let us turn to the official documentation:
hot_standby_feedback – Specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. This parameter can be used to eliminate query cancels caused by cleanup records, but can cause database bloat on the primary for some workloads. Feedback messages will not be sent more frequently than once per wal_receiver_status_interval. The default value is off. This parameter can only be set in the postgresql.conf file or on the server command line.

As seen from the description, standby server sends information about queries which are currently running on it. What kind of information? By default the feedback is disabled, in which case wal receiver process sends only information about which parts of the transaction log are written, flushed and replayed (see pg_stat_replication view for details). To send feedback, enable the option and reload postgres.
Now we take a look into postgresql source, particularly on the code of the wal receiver process. This process runs on standby server and receives WALs from master server. We search src/backend/replication/walreceiver.c for the function XLogWalRcvSendHSFeedback. WAL receiver process use this function for sending feedback. It sends feedback only when following conditions are met:

wal_receiver_status_interval parameter is greater than zero;
hot_standby_feedback parameter enabled.

Feedback send interval specified in wal_receiver_status_interval. Before sending feedback the oldest xmin is to be determined: it is the ID of the oldest transaction which executes longer than others transactions (see GetOldestXmin function in src/backend/storage/ipc/procarray.c for details). Note, while searching oldest xmin, PostgreSQL checks all currently running transactions and all logical replication slots. Slots have their own values of xmin and catalog_xmin. Xmin is the oldest transaction ID which retention for this database is needed by this slot. Catalog_xmin is the oldest transaction which affects the system catalogs that should retain in the database for this slot. VACUUM cannot remove tuples deleted by any later transaction.

When the id of the oldest transaction is determined, the vacuum_defer_cleanup_age is subtracted from this value. vacuum_defer_cleanup_age parameter is a means to postpone the VACUUM, it is a parameter in the earlier versions of postgresql that is now replaced by feedback:
vacuum_defer_cleanup_age – Specifies the number of transactions by which VACUUM and HOT updates will defer cleanup of dead row versions. The default is zero transactions, meaning that dead row versions can be removed as soon as possible, that is, as soon as they are no longer visible to any open transaction. You may wish to set this to a non-zero value on a primary server that is supporting hot standby servers, as described in Section 25.5. This allows more time for queries on the standby to complete without incurring conflicts due to early cleanup of rows. However, since the value is measured in terms of number of write transactions occurring on the primary server, it is difficult to predict just how much additional grace time will be made available to standby queries. This parameter can only be set in the postgresql.conf file or on the server command line.

Thus the oldest xmin is equal to the oldest transaction id running on the standby minus vacuum_defer_cleanup_age, or the oldest transaction id that is needed for logical replication slot minus vacuum_defer_cleanup_age.

But that’s not all, now we need to take into account the epoch, because it may happen that the epoch has changed, and transactions counter was restarted. To do this, we take the value of the next transaction and compare it to the oldest xmin. If the oldest xmin is greater than the next transaction, then the epoch value is decreased.

Now we have to construct our feedback message and send it to the master. To do this, initialize a blank message, add the label which says that the this message is feedback, then add current timestamp, oldest xmin and epoch. Now send the message to the master, using walrcv_send (actually libpqrcv_send) function. At this point, the XLogWalRcvSendHSFeedback is done.

Next, the message will be accepted by wal sender process that is running on the master or an upstream server. Oldest xmin value will be recorded in the WAL sender’s internal structure MyPgXact->xmin. Every postgres backend has it’s own structure. In the future, when the vacuum occurs, and before the vacuuming relation, the oldest transaction will also be determined (with the function GetOldestXmin that we already know) and the WAL senders’s MyPgXact->xmin will be also taken into account. The calculated oldest xmin will be used to check tuple visibility in any open transactions. If the tuple is still seeing at least one transaction, it cannot be vacuumed.

Thus wal sender process acts as an intermediary for standby and if the vacuum has to process tuples which are visible on standby transactions, it will be forced to skip these tuples.

Summary. The parameter hot_standby_feedback does nothing with conflict resolution of streaming replication and its only purpose is to avoid queries cancellation on standby in the case the vacuum on master server cleans something up.

Thanks!

Bemerkungen: 8

8 Antworten zu “PostgreSQL hot standby feedback. How it works.”

dif sagt:

September 23, 2015 um 12:03 am Uhr

> "Our queries cancelled on standby servers, and we enabled hot_standby_feedback, but queries on standby get cancelled. Why?"

Ok, what should i do to not get canceled?

Antworten
dif sagt:

September 23, 2015 um 12:37 am Uhr

This comment has been removed by the author.

Antworten
Alexey Lesovsky sagt:

Oktober 5, 2015 um 7:11 am Uhr

Sorry for late answer, see max_standby_streaming_delay.

Antworten
ioguix sagt:

Oktober 13, 2015 um 10:02 pm Uhr

…or if your concurrency and workload allows it, put your replication on hold with pg_xlog_replay_pause(), send your query, then resume your replication with pg_xlog_replay_resume().

Antworten
Ilya Kosmodemiansky sagt:

Oktober 15, 2015 um 11:02 am Uhr

The key word is "if". The first issue in such case will be autovacuum performance on master if there is any heavy write workload

Antworten
Anonym sagt:

Mai 30, 2023 um 5:41 am Uhr

what is the role of hot standby feedback in logical replication

Antworten
raj sagt:

Mai 30, 2023 um 5:42 am Uhr

role of hot_standby_feedback parameter in case of logical replication

Antworten
- Sergei Kornilov sagt:
  
  Juni 4, 2023 um 8:22 pm Uhr
  
  Up to 16th version of PostgreSQL hot_standby_feedback doesn’t play any role in logical replication. Logical decoding can be started only on primary, and PostgreSQL tracks the required oldest xmin for logical decoding internally (pg_replication_slots.catalog_xmin) regardless whether or not hot_standby_feedback is activated on the logical replication side. I took a quick look at the code: hot_standby_feedback is actually only used in the wal receiver code, not in the logical replication worker. That is, it only makes sense for streaming (physical) replication.
  
  Starting Postgres 16 (release is planned for sometime in the fall) there will be a nuance here. The process is the same if we set up logical replication from primary to another primary, however, it will be possible to start logical decoding from a physical replica: primary -> streaming replication -> standby, and from that standby to launch logical replication to a different Postgresql instance. In this case, that standby should have hot_standby_feedback turned on to work normally.
  
  Hope this helps,
  Sergei
  
  Antworten

Schreibe einen Kommentar Antworten abbrechen

Paket	Preis*
PostgreSQL Configuration-Analyse	ab 500 €
Hardware und Betriebssystem Analyse	ab 600 €
Analyse langsamer SQL-Abfragen	ab 500 €
Umfassender Health Check	ab 3500 €
* für eine einzelne PostgreSQL Installation (1 master und 1 replica)

Basic	Premium	VIP
Inkl. bis zu 5 Stunden Beratungszeit pro Monat	Inkl. bis zu 15 Stunden Beratungszeit pro Monat	Inkl. bis zu 30 Stunden Beratungszeit pro Monat
Slack, Mattermost, Telegram	Slack, Mattermost, Telegram	Slack, Mattermost, Telegram
Antwortzeit bis zu — 1 Stunde	Antwortzeit bis zu — 1 Stunde	Antwortzeit bis zu — 30 Minuten
2 PostgreSQL Instance max	5 PostgreSQL Instance max	10 PostgreSQL Instance max

Basic

Premium

VIP

Inkl. bis zu 5 Stunden Beratungszeit pro Monat

Inkl. bis zu 15 Stunden Beratungszeit pro Monat

Inkl. bis zu 30 Stunden Beratungszeit pro Monat

Slack, Mattermost, Telegram

Antwortzeit bis zu — 1 Stunde

Antwortzeit bis zu — 30 Minuten

2 PostgreSQL Instance max

5 PostgreSQL Instance max

10 PostgreSQL Instance max

Nachrichten und Blog zurück

PostgreSQL hot standby feedback. How it works.

Das könnte Sie auch interessieren:

Operating PostgreSQL as a Data Source for Analytics Pipelines – Recap from the Stuttgart Meetup

Data archiving and retention in PostgreSQL. Best practices for large datasets

Taming large datasets in PostgreSQL: archiving and retention without the pain

How to Upgrade RDS PostgreSQL with Minimal Downtime