TwinDB Really Loves Backups
A week or two ago, one of my former colleagues at Percona, Jervin Real, gave a talk titled "Evolving Backups Strategy, Deploying pyxbackup" at Percona Live 2015 in Amsterdam. I think Jervin raised some very good points about where MySQL backup solutions in general fall short. There are definitely a lot of tools and scripts out there that claim to do MySQL backups correctly but do not. What I am more interested in, though, is measuring TwinDB against the points that Jervin highlighted, to see whether TwinDB falls short too.
We distribute the TwinDB agent as a package that can be installed with the standard OS package manager: YUM on CentOS, RHEL and Amazon Linux, or APT on Debian and Ubuntu. Because the package goes through the standard packaging system, its dependencies are resolved in the standard way. The packages are publicly available in the TwinDB repositories:
- https://packagecloud.io/twindb/main for CentOS, RHEL, Debian and Ubuntu
- https://packagecloud.io/twindb/amzn_main for Amazon Linux
Note that we had to package the TwinDB agent separately for Amazon Linux because, since version 2015.03, it has been something of a special case: it can be treated as neither RHEL 6 nor RHEL 7. One issue we hit was with the perl-DBD-MySQL package, which is a dependency of the Percona XtraBackup package. We had to repackage perl-DBD-MySQL; details on why and how can be found in one of our earlier blog posts, "Xtrabackup and MySQL 5.6 on Amazon instance".
We have also tried to make the instructions for installing the agent very clear.
The TwinDB agent itself is written in Python. Python is a portable language and runs on many Unix variants, as mentioned in the General Python FAQ. To test portability further, we built a Continuous Integration/Continuous Deployment pipeline that runs automated integration tests on all the platforms that we support. Below is a list of those platforms:
- CentOS 6
- CentOS 7
- Ubuntu Precise
- Ubuntu Trusty
- Debian Wheezy
- Debian Jessie
- Amazon 2014.09
- Amazon 2015.03
You can rest assured that we take portability very seriously and that nothing gets rolled out without being automatically tested.
I will publish a separate post detailing how we implemented the Continuous Integration/Continuous Deployment pipeline.
A Tool for non-DBAs
The main reason we developed the TwinDB agent was to simplify the backup process. We did not want to build something that was complex to use or that would require the user to be a DBA. That is why in most cases no configuration is needed on the part of the user. Things like the backup schedule and retention policy already have sane defaults that work in most cases. A configuration can be tied to a group of MySQL servers or to a single MySQL server, and if you want to change the backup schedule, it is very simple to do so from the GUI.
You do not need to SSH into the MySQL server to do any kind of configuration of the TwinDB agent. The agent is also smart enough to detect the replication topology and choose an appropriate slave to take backups from. If the replication topology changes, the agent detects that and automatically chooses a different slave to take backups from.
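To make the idea of topology-aware selection concrete, here is a minimal Python sketch. It is not the agent's actual algorithm, which this post does not describe. The `choose_backup_host` function, the server-to-master map, and the "prefer a slave that has no slaves of its own" policy are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def choose_backup_host(topology: Dict[str, Optional[str]]) -> str:
    """Pick a backup host from a map of server -> its master (None for a master).

    Hypothetical policy: prefer a slave that has no slaves of its own,
    and fall back to any server when there are no such slaves.
    """
    if not topology:
        raise ValueError("topology is empty")
    # Every server that some other server replicates from.
    masters = {m for m in topology.values() if m is not None}
    # Leaf slaves: they replicate from a master but feed nobody else.
    leaves: List[str] = sorted(
        host for host, master in topology.items()
        if master is not None and host not in masters
    )
    if leaves:
        return leaves[0]
    # Standalone server or circular replication: pick deterministically.
    return sorted(topology)[0]


before = {"db1": None, "db2": "db1", "db3": "db1"}
after = {"db1": None, "db3": "db1"}  # db2 has left the topology
print(choose_backup_host(before))
print(choose_backup_host(after))
```

Re-running the selection whenever the topology map changes is what moves backups to a different slave in this sketch. A real implementation would also need to weigh factors such as replication lag, which are left out here.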
No babysitting needed
As you may have already noticed from the description above, in the majority of cases the agent is autonomous. The only times you really need to interact with it are, for example, when you make configuration changes through the GUI or when you need to restore backups. And then you have TwinDB support itself, whenever and wherever you need it.
Remote streaming of backups
The TwinDB agent stores the backup on dedicated storage space and does not store it on the same server running MySQL. You do not even have to manage the storage space. TwinDB manages the storage space and provides you with two options:
- TwinDB-hosted scalable and secure storage in the cloud
- On-premises secure hosting on your hardware, be it bare metal or your private cloud
There are many reasons why you would back up a MySQL server, but the most important one is to be able to restore the backup as and when needed. We here at TwinDB understand that and have simplified the restore process so that you can restore a backup with just a click.
Off-site backups have traditionally been a starting point for data protection. However, there are risks involved at multiple stages: transporting the backups and storing them. Therefore, from our perspective, encryption of data in flight and encryption of data at rest are equally important, as together they enable both safe transport and safe storage of backed-up data. The next logical question would be: how exactly is encryption done? TwinDB uses asymmetric encryption with GPG (an implementation of the OpenPGP standard). The GPG key is auto-generated on the MySQL server that is being backed up by the agent, and the users own the GPG private key. Furthermore, the encryption is always on. So whether you use the TwinDB agent as an off-site backup solution or back up to on-premises storage space, you can rest assured that the backups are always encrypted.
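As a rough illustration of asymmetric encryption applied to a backup stream, the Python sketch below builds a `gpg` command line that encrypts whatever arrives on standard input to a recipient's public key. It is not the agent's code. The function names, the recipient address and the output file name are hypothetical, and the sketch assumes `gpg` is installed and the recipient's public key is already in the keyring and trusted.

```python
import subprocess
from typing import BinaryIO, List


def gpg_encrypt_argv(recipient: str, output_path: str) -> List[str]:
    """Build a gpg command that encrypts stdin to `recipient`'s public key."""
    return [
        "gpg", "--batch", "--yes",
        "--encrypt",
        "--recipient", recipient,
        "--output", output_path,
    ]


def encrypt_stream(source: BinaryIO, recipient: str, output_path: str) -> None:
    """Pipe a backup stream through gpg.

    `source` must be a real file or pipe (it needs a file descriptor).
    Requires gpg and the recipient's trusted public key on this machine.
    """
    subprocess.run(gpg_encrypt_argv(recipient, output_path), stdin=source, check=True)


# Hypothetical recipient and file name, shown only to illustrate the command.
print(" ".join(gpg_encrypt_argv("backup@example.com", "backup.xbstream.gpg")))
```

Only the public key is needed to encrypt, so the private key never has to leave its owner. Decrypting the backup for a restore is the one step that requires the private key.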
Full and incremental backups
The TwinDB agent supports both full and incremental backups. There is no additional configuration needed to enable incremental backups. The GUI allows you to configure the backup schedule as shown in one of the screenshots above. You can configure how often full and incremental backups get taken.
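To show how a schedule with both backup types might be expressed, here is a small Python sketch. The `backup_type` function and the "one full backup every seven days, incrementals on the days in between" policy are assumptions for illustration; the post does not state the agent's actual defaults.

```python
from datetime import date, timedelta


def backup_type(day: date, first_full: date, full_every_days: int = 7) -> str:
    """Return "full" on every `full_every_days`-th day since `first_full`, else "incremental"."""
    if full_every_days < 1:
        raise ValueError("full_every_days must be at least 1")
    elapsed = (day - first_full).days
    if elapsed < 0:
        raise ValueError("day is before the first full backup")
    return "full" if elapsed % full_every_days == 0 else "incremental"


# Hypothetical start date; prints the backup type for eight consecutive days.
start = date(2015, 10, 4)
for offset in range(8):
    day = start + timedelta(days=offset)
    print(day.isoformat(), backup_type(day, start))
```

Each incremental backup depends on the full backup before it, so a shorter interval between full backups means fewer increments to apply during a restore, at the cost of more storage.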
We are working on a feature that will allow in-flight backup validation. Our extensive experience with the InnoDB Data Recovery Tool, and with data recoveries themselves, gives us a head start on how we want to implement backup validation. In-flight backup validation will provide immediate feedback, in contrast to the backup validation techniques generally used right now. This is all the more beneficial for users with large datasets.
At the moment, though, our backup validation strategy still revolves around regular backup restores.
Having read through all that I have written, I hope you have a better picture of how the TwinDB agent stacks up against the other backup solutions. I certainly got a very clear picture when comparing it against the points that my former colleague Jervin raised. I think TwinDB is a perfect example of how a backup solution should be implemented and, furthermore, it just works.