Figuring out when to cluster can be a mountain to climb in itself, but combining that with figuring out which software to use can often be an insurmountable task. In this article I will go over the clustering options available for PostgreSQL and highlight their advantages and disadvantages.
Available Software
When talking about clustering and replication you have to keep a set of standards in mind. You want the sync type to be Synchronous so you don’t lose data, you want some form of load balancing to keep performance optimal and lastly, a form of connection pooling to reduce connection overhead.
Keeping these standards in mind, there are 3 possible solutions that fit this criteria.
Pgpool-II
This is a middleware solution that sits between your DB server and DB client. It also provides high availability, provisioning, read scaling, parallel queries and failover. This is open source and available to anyone who wants to use it. This solution does come with a GUI.
Disadvantages
Configuring and installing this solution can be a bit of a pain. The documentation for the process is a bit lacking and doesn’t cover all operating system-specific differences. There are existing guides you can find but they don’t always work and are sometimes for older versions, leaving you to fill in the gaps. Support can be hard to find and you’re often left to your own methods of troubleshooting when time is important.
Postgres-XL
This is an open source solution that also provides MPP(massively parallel processing), but keep in mind that this is a database solution and not middleware. If you’re already using Postgres, you’ll have to migrate data to your new cluster.
Disadvantages
There are a number of features in the standard Postgres package that are not supported in Postgres-XL.
- Hot Standby
- Savepoints
- Large Objects
- Triggers and Event Triggers
- Foreign Data Wrappers
- Notify and Listen
Citus
This is a Postgres extension so it provides all of the standard PostgreSQL constructs; it also uses MPP to increase query performance. This solution can be installed on premise or in the cloud based on your preference. The community edition is free but lacks some functionality provided by cloud or enterprise solutions like shard rebalancing, tenant isolation, and some security controls.
Disadvantages
- INSERT INTO SELECT isn’t currently supported
- Temporary Tables are not supported
- Composite shard keys are not supported
Conclusion
All of these solutions work and are currently being used in large production environments — the only decision that needs to be made is what are you willing to sacrifice for each scenario. If you’re not advanced with Postgres installation and parameters, then pgpool might not be for you. If you rely heavily on core functionality of Postgres you might want to skip Postgres-XL. Each choice will provide a replication solution and are being used in high end environments by leading tech companies. Evaluation and testing is paramount when deciding on the solution that is best for you.