When we started working with a customer they asked us to setup for them more reliable and performant MySQL database. At the time they were using Percona XtraDB cluster with ELB to balance read traffic between PXC nodes. There was no writer redundancy – if a writer node was down, the app was down. The app by the way is a Magento backed online shop. All stuff was running in AWS.

We started to design the new database architecture. The first component we threw away was ELB. A simple OLTP sysbench test showed that ELB was a bottleneck. The ELB was capable to deliver somewhat around 6800 qps while with ProxySQL the Percona XtraDB Cluster was a bottleneck. We squeezed 10k qps from ProxySQL + PXC. That basically means we got all ProxySQL feature free of charge.

But one ProxySQL instance is a SPOF. To make ProxySQL highly available we put another instance and managed it with keepalived.

Eventually the architecture looked like depicted.

Read/Write queries split

It was very important to distribute load across the cluster. The load on the app was pretty substantial and that eventually hit the database. We needed to make all but a writer nodes to serve reads. The writer however must be one at all the time.

We used a built-in scheduler to manage the nodes. However we realized quickly that it wasn’t the best choice. Soon after deployment we hit some nasty bugs. For example, occasionally the scheduler made two nodes writers. That’s bad because the app doesn’t really like error on COMMIT. You know, with Galera it’s possible due to optimistic locking.

Worse happened when a node from which we took backups was selected as writer.

The worst part that the scheduler was written in Shell. You cannot implement efficient data model in Shell. You cannot write unit test in Shell. I will elaborate later on, but the point is Shell is a bad choice for such a complex job as ProxySQL scheduler.

So, Ovais Tariq did an exceptional job and rewrote the scheduler in Python. The scheduler is a part of our open source project ProxySQL Tools. As of then we could cover the critical parts of the code with unit test as well as with integration tests.

High Availability

A single ProxySQL instance is a single point of failure (SPOF). We needed the second ProxySQL node. Two ProxySQL nodes are managed by keepalived. When active ProxySQL node goes down the second, passive, takes over and continues to serve the app requests.

Keepalived implements VRRP protocol originally designed to provide redundancy of a default gateway in L2 networks.
On Linux servers it basically moves an IP alias between two nodes.

In AWS that doesn’t work that way, unfortunately. Amazon does let to assign an alias to an interface. So we came up with different idea.

We created an Elastic Network Interface, assigned an IP address to it and use that IP as a virtual IP. The application would connect to this IP to reach the database.

When keepalived makes a node active (master in VRRP terminology) it calls a script given by notify_master configuration option. That script is another tool from ProxySQL Tools. We wrote about it before but essentially keepalived runs proxysql-tool aws notify_master. What this tool does is it moves Elastic Network Interface (ENI) to the active node.

Query Rewrite

From time to time we hit a nasty bug # 1258464 in Galera. The bug was confirmed four years ago, but yet not fixed. I really look forward to the Group Replication, it has a chance to perform better here.

It looks like Magento developers stuck in 2004 when MyISAM was still a thing. On some unknown reason Magento regularly sends ALTER TABLE ENABLE/DISABLE KEYS. Occasionally that triggered the bug and a writer node got blocked.

Query rewrite feature in ProxySQL helped us a lot. René suggested to replace ALTER TABLE with some NOOP query. So far it works like a charm.


  • ProxySQL is performant and feature rich proxy for Percona XtraDB Cluster.
  • ProxySQL Tools provide better helper tools for ProxySQL (for a scheduler and HA).