How to Properly Load Balance Write Functions With Pgpool
pgpool
Pgpool-II is a middleware that resides between PostgreSQL servers and clients. It can handle connection pooling, replication, and load balancing. It can also limit connections, handle watchdog tasks to coordinate the cluster, and even serve memory query cache.
In master-slave (active-passive) replication topologies, slaves are used only for read operations; by default, pgpool load balances connections in such manner. But in some cases, the default behavior is not useful. For example, specific procedures or functions used in SELECT statements can make write operations while reading. The most basic way of disabling load balancing for a single query is to add a particular comment before the statement.
/*NO LOAD BALANCE*/ SELECT writing_function()
But in some cases, we don't get to alter code. Also, some database adaptors can remove the comments from statements to improve performance. To ensure such statements always go to the master node, we can specify regex-based rules for pgpool.
Test Environment
To test this behavior, we will be using bitnami/pgpool containers. There is an official docker-compose file that quickly deploys and configures three containers, one master PostgreSQL, one slave PostgreSQL, and a pgpool container.
curl -sSL https://raw.githubusercontent.com/bitnami/bitnami-docker-pgpool/master/docker-compose.yml > docker-compose.yml
docker-compose up -d
Some extra parameters need to set to produce more detailed logs. This will help us see the effects of the rules more clearly.
We can alter these options by using the following statements on a live system to enable such logging:
pgpool SET log_per_node_statement=on;
pgpool SET log_statement=on;
pgpool SET log_client_messages=on;
For these commands to work, we need to connect to the database using pgpool. Using .pgpass file can save time.
The blacklist parameters cannot be changed without restarting pgpool. These options are not available as docker environment variables. We need to add custom configuration for the pgpool container.
image: bitnami/pgpool:4
ports:
- 5432:5432
+ volumes:
+ - /path/to/extra.conf:/config/extra.conf
environment:
+ - PGPOOL_USER_CONF_FILE=/config/extra.conf
- PGPOOL_BACKEND_NODES=0:pg-0:5432,1:pg-1:5432
- PGPOOL_SR_CHECK_USER=customuser
The contents of the file:
~$ cat extra.conf
log_destination = 'syslog,stderr'
log_statement = on
log_per_node_statement=on
log_client_messages=on
After altering the docker-compose file, we need to restart the container.
docker-compose restart pgpool
Now we can trace the queries that go over pgpool.
docker-compose logs -f
Now we can test it by running statements with over pgsql:
pgpool_1 | 2020-07-10 15:59:06: pid 132: DETAIL: query: "create table foo(a int);"
pgpool_1 | 2020-07-10 15:59:06: pid 132: LOG: statement: create table foo(a int);
pgpool_1 | 2020-07-10 15:59:06: pid 132: LOG: DB node id: 0 backend pid: 7973 statement: create table foo(a int);
pgpool_1 | 2020-07-10 15:59:10: pid 132: LOG: Query message from frontend.
pgpool_1 | 2020-07-10 15:59:10: pid 132: DETAIL: query: "create table bar(a int);"
pgpool_1 | 2020-07-10 15:59:10: pid 132: LOG: statement: create table bar(a int);
pgpool_1 | 2020-07-10 15:59:10: pid 132: LOG: DB node id: 0 backend pid: 7973 statement: create table bar(a int);
pgpool_1 | 2020-07-10 15:59:15: pid 132: LOG: Query message from frontend.
pgpool_1 | 2020-07-10 15:59:15: pid 132: DETAIL: query: "select * from foo;"
pgpool_1 | 2020-07-10 15:59:15: pid 132: LOG: statement: select * from foo;
pgpool_1 | 2020-07-10 15:59:15: pid 132: LOG: DB node id: 1 backend pid: 20604 statement: select * from foo;
Notice that INSERT and CREATE TABLE statements routed to database 0, which is the master node. SELECT operations routed to database 1, which is the slave node.
Load Balancing
Function Load Balancing
Pgpool supports both whitelist and blacklist approaches to fine-tune how statements routed. We chose to proceed with blacklisting. To set rules based on SQL function names, blackfunctionlist can be used.
The contents of the configuration file:
~$ cat extra.conf
log_destination = 'syslog,stderr'
log_statement = on
log_per_node_statement=on
log_client_messages=on
black_function_list = 'nextval,setval,lastval,currval,functionX.*'
Now lets watch the logs as we run following statements:
SELECT * FROM functionY(1,3);
SELECT * FROM functionX(1,3);
pgpool_1 | 2020-07-10 16:14:46: pid 138: LOG: Query message from frontend.
pgpool_1 | 2020-07-10 16:14:46: pid 138: DETAIL: query: "select * from functionX(1,3);"
pgpool_1 | 2020-07-10 16:14:46: pid 138: LOG: statement: select * from functionX(1,3);
pgpool_1 | 2020-07-10 16:14:46: pid 138: LOG: DB node id: 0 backend pid: 219 statement: select * from functionX(1,3);
pgpool_1 | 2020-07-10 16:14:52: pid 138: LOG: Query message from frontend.
pgpool_1 | 2020-07-10 16:14:52: pid 138: DETAIL: query: "select * from functionY(1,3);"
pgpool_1 | 2020-07-10 16:14:52: pid 138: LOG: statement: select * from functionY(1,3);
pgpool_1 | 2020-07-10 16:14:52: pid 138: LOG: DB node id: 1 backend pid: 201 statement: select * from functionY(1,3);
Statement Load Balancing
What if we are dealing with a particular function that only does write operations when a specific parameter is set? We can maker a finer tuning with black_query_pattern_list
Lets us assume:
- SELECT * FROM functionX(1,3) makes write operations
- SELECT * FROM functionX(2,3) does not make write operations.
The black_function_list
parameter compares the given Regex only with functions called in the statements. black_query_pattern_list
is similar parameter, but it compares the given Regex with the whole statement instead.
We replace the black_function_list
parameter with black_query_pattern_list
in the configuration file:
~$ cat extra.conf
log_destination = 'syslog,stderr'
log_statement = on
log_per_node_statement=on
log_client_messages=on
black_query_pattern_list = '.*functionX\(1.*'
Note that parenthesis needs to be escaped with \
.
We run the following statements:
SELECT * FROM functionX(2,3);
SELECT * FROM functionX(1,3);
pgpool_1 | 2020-07-10 16:28:11: pid 137: DETAIL: query: "select * from functionX(2,3);"
pgpool_1 | 2020-07-10 16:28:11: pid 137: LOG: statement: select * from functionX(2,3);
pgpool_1 | 2020-07-10 16:28:11: pid 137: LOG: DB node id: 1 backend pid: 331 statement: select * from functionX(2,3);
pgpool_1 | 2020-07-10 16:28:15: pid 137: DETAIL: query: "select * from functionX(1,3);"
pgpool_1 | 2020-07-10 16:28:15: pid 137: LOG: statement: select * from functionX(1,3);
pgpool_1 | 2020-07-10 16:28:15: pid 137: LOG: DB node id: 0 backend pid: 467 statement: select * from functionX(1,3);
There are also other ways to configure how pgpool balances the load.
If the developers and DBAs can agree on a naming convention for such functions, a single rule can redirect all such queries.