Resources / Blogs /

Load Balancing/Query Routing using PGPOOL-II

Load Balancing/Query Routing using PGPOOL-II

In This blog we will focus on one prominent feature of Pgpool-II,

which is Load Balancing/Query Routing.

What is Load Balancing?

1)Load balancing is a technique that involves the distribution of a set of tasks among a group of computing resources to optimize processing efficiency.

2)Load balancing is necessary in PostgreSQL to distribute traffic across multiple servers and prevent overutilization or underutilization. This ensures that the workload is evenly distributed, reducing the load on each server.

3)The concept of load balancing involves assigning tasks to resources in a way that maximizes throughput, minimizes response time, and ensures high availability,

4)The load balancing as the process of distributing network traffic across multiple servers to optimize resource utilization and avoid overloading any single resource.

5)In PostgreSQL, load balancing is achieved through tools like Pgpool, which can manage connections to multiple servers, distribute queries across them, and automatically failover to standby servers if needed.

Why do we need Pgpool-II for load balancing?

PostgreSQL does not have a built-in mechanism for load balancing queries across servers in its replication topology, so the responsibility falls on the application or third-party tools such as Pgpool-II. Using Pgpool-II can enable the distribution of read queries across PostgreSQL servers without requiring significant changes in the application layer. A primary-replica server architecture with Pgpool-II acting as an intermediate layer between applications and databases can be visualized in the diagram below.

How is it actually work?

It sit between the application and databases, Pgpool-II acts as an intermediary layer that receives queries from the application and routes them to the backend database servers. To optimize query distribution, Pgpool-II parses each query to determine whether it is a read or write operation before sending it to the appropriate database server.

Types of Load Balancing modes

The pgpool-II provides two options for load balancing. Based on the use case, it is mandatory to pick either one of them before configuration.

  1. Session Level Load Balancing(Default)
  2. Statement Level Load Balancing
Session Level Load Balancing:
  1. This is the default mode
  2. This mode determines the back-end database server to send read queries when the client connects to the pgpool.

For Example, if we have two database servers, one of them is selected randomly each time a new session is created.

Statement Level Load Balancing:
  • Enabling load balancing in this manner requires activating the statement_level_load_balance variable.
  • This mode determines the back-enddatabase server to which read queries are sent whenever a query is initiated.
  • Applications that utilize a single session to execute numerous queries may experience improper load balancing using the default mode, as most queries are directed to the same server.

Now, let’s start the demonstration for load balancing. For this demonstration, the following database architecture will be used

Server NameIP AddressPurpose
pgpool192.168.0.101The server hosted pgpool-II
primary192.168.0.102PostgreSQL Primary Server
replica1192.168.0.103PostgreSQL Read Replica 1
replica2192.168.0.104PostgreSQL Read Replica 2
replica2192.168.0.105PostgreSQL Read Replica 3

The architecture diagram for this demonstration is,

Session Level Load Balancing:

Here, pgpool is running on port 9999. So, accessing the database through pgpool-II as follows

root@pgpool:~# psql -p 9999 -h localhost -U postgres -d postgres -c “show pool_nodes;”

root@pgpool:~#

node_id  |  hostname  | port | status  | pg_status  | lb_weight  |  role   | pg_role  | select_cnt | load_balance_node

————-+——————-+——+————+—————-+—————+————+————+————+—————–

0       | 192.168.0.103 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 2       | true

1       | 192.168.0.104 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0       | false

2       | 192.168.0.105 | 5432 | unused | unknown   | 0.333333  | standby | unknown | 0       | false

(3 rows)

The current session is connected to node_id 0, which can be verified by checking the value TRUE in the load_balance_mode column. We will now execute several select queries to determine the target server.

postgres=# select 1;

?column?

———-

1

(1 row)

postgres=# select 1;

?column?

———-

1

(1 row)

postgres=# select 1;

?column?

———-

1

(1 row)

postgres =# show pool_nodes;
node_id |  hostname    | port | status | pg_status   | lb_weight |  role   | pg_role | select_cnt | load_balance_node

———+———————-+——+———-+—————-+—————+————+————+————–+———————

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 5          | true

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0          | false

2       | 192.168.0.103 | 5432 | unused | unknown   | 0.333333  | standby | unknown | 0        | false

(3 rows)

All the queries executed in this session are pointed to the same server node_id 0. Select_cnt values have increased from 2 to 5 confirming it.

Statement Level Load Balancing:

To switch to statement-based load balancing in Pgpool-II, add the following variable to the /etc/pgpool-II/pgpool.conf configuration file and then execute the command pgpool reload (restarting is not necessary).

  • statement_level_load_balance = on

postgres=# show pool_nodes;

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node

———+—————+——+——–+———–+———–+———+———+————+———————

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 12         | true

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 5          | false

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false

(3 rows)

postgres=# select 1;

?column?

———
1

(1 row)

postgres=# show pool_nodes;

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node

——–+—————+——+——–+———–+———–+———+———+————+——————–

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 12         | false

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 6          | true

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false

(3 rows)

Refer to the value TRUE in load_balance_mode column to check the actual target of the last select execution.

When using statement mode, the initial select query will be directed to node_id 0, while the second query will go to node_id 1.
Neither load balancing method will send queries to node_id 2, as Pgpool-II has marked it as down. This prevents query failures caused by accessing failed replicas or replicas with replication lag.

More Granular Load Balancing:

Pgpool-II offers more sophisticated query routing options beyond those discussed above. These include load balancing based on application and database names, routing specific queries to the primary server, avoiding reading queries from replicas with lag, and adjusting node weightage.

Load Balancing Based on Application Names:

There may be situations where it is necessary to connect the application to a specific database server without load balancing, such as directing report queries to the reporting replica. This can be accomplished by utilizing the app_name_redirect_preference_list variable.

For instance, the following configuration would direct the dashboard app to exclusively access the replica server and the payment app to access the primary server:

app_name_redirect_preference_list=’payment:0,dashboard:1′

After adding this configuration to the file, pgpool must be reloaded.

root@pgpool:~# psql -p 9999 -h localhost -U postgres -d postgres -c “show pool_nodes;”

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node

——–+—————+——+——–+———–+———–+———+———+————+——————–

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 0          | true

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0          | false

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false

(3 rows)

root@pgpool:~# psql -p 9999 -h localhost -U postgres -d postgres -c “show pool_nodes;”

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node |

———+—————+——+——–+———–+———–+———+———+————+——————-+–

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 0          | false             |

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0          | true              |

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false             |

(3 rows)

Now it is verified that the payment app hits to node_id 0 and the dashboard app hits to node_id 1.

Load Balancing Based on Database Names:

Pgpool-II also supports load balancing based on database names, which can be accomplished using the , database_redirect_preference_list variable.

Similar to app_name_redirect_preference_list, this variable routes queries based on database names instead of application names. It is beneficial when there are separate live and reporting databases on a server, and queries need to be directed to specific servers based on the database.

For example, to route live database queries to the primary server and reporting database queries to the replica server, the following configuration can be used:

database_redirect_preference_list=’live:0,report:1′

After adding this configuration to the file, pgpool must be reloaded.

This requires pgpool reload post added it in the config file.

root@pgpool:~# psql -p 9999 -h localhost -U postgres -d postgres -c “show pool_nodes;”

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node |

———+—————+——+——–+———–+———–+———+———+————+——————-+

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 0          | true              |

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0          | false             |

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false             |

(3 rows)

root@pgpool:~# psql -p 9999 -h localhost -U postgres -d postgres -c “show pool_nodes;”

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node |

———+—————+——+——–+———–+———–+———+———+————+——————-+

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 0          | false             |

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0          | true              |

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false             |

(3 rows)

As per the configuration, the queries are hitting the databases live and report on node_id 0 and node_id 1 respectively.

Query routing only to the primary server(Query Pattern Based Routing):

Certain queries can be routed only to the primary server by adding comments in the query like /*NO LOAD BALANCE*/ .

root@pgpool:~# psql -h localhost -p9999 -U postgres -c “select /*NO LOAD BALANCE*/ count(*) from pg_stat_activity”

count

——-

8

(1 row)

root@pgpool:~# psql -h localhost -p9999 -U postgres -c “select /*NO LOAD BALANCE*/ count(*) from pg_stat_activity”

count

——-

9

(1 row)

root@pgpool:~# psql -h localhost -p9999 -U postgres -c “select /*NO LOAD BALANCE*/ count(*) from pg_stat_activity”

count

——-
10

(1 row)

root@pgpool:~# psql -h localhost  -p9999 -U postgres live -c “show pool_nodes”

node_id |   hostname    | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node |

———+—————+——+——–+———–+———–+———+———+————+——————-+

0       | 192.168.0.101 | 5432 | up     | unknown   | 0.333333  | primary | unknown | 3          | true              |

1       | 192.168.0.102 | 5432 | up     | unknown   | 0.333333  | standby | unknown | 0          | false             |

2       | 192.168.0.103 | 5432 | down   | unknown   | 0.333333  | standby | unknown | 0          | false             |

(3 rows)

All the queries were directed only to the primary server.

Another option for query pattern-based routing can be accomplished by utilizing the variable “primary_routing_query_pattern_list.” By setting this variable, you can route queries that match a particular pattern to the primary server.

For example, you can route all the queries on table pg_stat_activity to the primary server by setting

primary_routing_query_pattern_list = “.*pg_stat_activity.”.

Replication Lag Handling:

Setting the variable delay_threshold is important to prevent reading outdated data from replicas that have replication lag. This value represents the maximum acceptable value of replication lag in WAL bytes on the replica server compared to the primary server.

When the delay exceeds the configured value, queries won’t be sent to the specific replica until the lag is reduced.

Node Weightage:

Pgpool-II sends queries to the backend servers randomly. However, there may be situations where it is necessary to avoid sending read traffic to the primary server completely.

This can be achieved using weights. To avoid further read traffic to the primary server, set the value of backend_weight0 = 0.

A question that often arises is how Pgpool-II handles a transaction. By default, once a write query is executed in an explicit transaction, subsequent read queries are not load-balanced until the transaction ends. This behavior is controlled by the variable

disable_load_balance_on_write, which has a default value of “transaction.” This value is suitable for most cases and should not be changed. However, this variable also supports other values such as “OFF,” “trans_transaction,” and “Always.”

OFFRead queries are load balanced even if write queries appear. The client may see older data.
TransactionOnce a write query is executed in an explicit transaction, the subsequent read queries are not load-balanced until the transaction ends
Trans_transactionOnce a write query appears in an explicit transaction, subsequent read queries are not load balanced in the transaction, and subsequent explicit transaction until the session ends
AlwaysOnce write queries appear, subsequent read queries are not load balanced until the session ends regardless they are in explicit transactions or not.