Show Differences Found by pt-table-checksum

First of all, pt-table-checksum is perfect for checking if the master and its slaves are in-sync. It answers the question “Are the slaves consistent with the master?”. However, if they’re not, pt-table-checksum doesn’t actually tell you what exactly is wrong.

master# pt-table-checksum -q
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-03T22:21:10 0 1 2 1 0 0.013 mysql.proxies_priv
09-03T22:21:10 0 1 9 1 0 0.010 mysql.user

From pt-table-checksum to pt-table-sync

Nevertheless, pt-table-sync may give you a partial answer. It can print SQL statements to sync the replication cluster. Thus, reading the SQL code, you may guess what records were missing/extra or differ.

master# pt-table-sync --print --replicate percona.checksums localhost
DELETE FROM `mysql`.`proxies_priv` WHERE `host`='slave.box' AND `user`='root' AND `proxied_host`='' AND `proxied_user`='' LIMIT 1 /*percona-toolkit src_db:mysql src_tbl:proxies_priv src_dsn:h=localhost dst_db:mysql dst_tbl:proxies_priv dst_dsn:h=ip-192-168-35-251.ec2.internal lock:1 transaction:0 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:21261 user:root host:master.box*/;
REPLACE INTO `mysql`.`proxies_priv`(`host`, `user`, `proxied_host`, `proxied_user`, `with_grant`, `grantor`, `timestamp`) VALUES ('localhost', 'root', '', '', '1', '', '2016-09-03 20:02:28') /*percona-toolkit src_db:mysql src_tbl:proxies_priv src_dsn:h=localhost dst_db:mysql dst_tbl:proxies_priv dst_dsn:h=ip-192-168-35-251.ec2.internal lock:1 transaction:0 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:21261 user:root host:master.box*/;
REPLACE INTO `mysql`.`proxies_priv`(`host`, `user`, `proxied_host`, `proxied_user`, `with_grant`, `grantor`, `timestamp`) VALUES ('master.box', 'root', '', '', '1', '', '2016-09-03 20:02:28') /*percona-toolkit src_db:mysql src_tbl:proxies_priv src_dsn:h=localhost dst_db:mysql dst_tbl:proxies_priv dst_dsn:h=ip-192-168-35-251.ec2.internal lock:1 transaction:0 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:21261 user:root host:master.box*/;
...

This is not very convenient. I’d note that pt-table-sync serves its main purpose – syncing two or more servers. It was never a goal to show you inconsistencies between two servers. Which is something you’d like to know.

Using TwinDB Table Compare

TwinDB Table Compare is a tool that shows the exact difference between a slave and its master. It’s published on PyPi, therefore, you can install it right away:

slave# pip install twindb-table-compare

Old versions of twindb-table-compare used MySQLDb, which depended on libmysql. Not anymore. Now it uses pure Python implementation pymysql. A little convenience for users.

TwinDB Table Compare works side-by-side with pt-table-checksum. You know, pt-table-checksum runs against the master and populates table percona.checksums.

twindb-table-compare reads mismatched chunks from percona.checksums, then runs a SELECT against the master and the slave. Then compares the result and shows it to a user as a diff.

Let’s see twindb-table-compare in action. To illustrate how it works, I intentionally deleted a record directly from the slave.

slave> delete from t1 where id = 393188;

Then, I ran pt-table-checksum on the master, that found an inconsistency. To check if there are any mismatching chunks you can run a query against percona.checksums.

slave# mysql -e "SELECT * FROM percona.checksums WHERE this_cnt<>master_cnt OR this_crc<>master_crc AND db='test'\G"
*************************** 1. row ***************************
            db: test
           tbl: t1
         chunk: 3
    chunk_time: 0.202156
   chunk_index: PRIMARY
lower_boundary: 319858
upper_boundary: 393197
      this_crc: 376517ca
      this_cnt: 73339
    master_crc: 0
    master_cnt: 73340
            ts: 2020-09-13 22:31:28

Now, that we know there is an inconsistency, we need to know what exactly it is.

slave# twindb-table-compare --user dba --password qwerty
...
2020-09-13 22:33:47,642: INFO: compare.get_inconsistencies():543: Executing: SELECT chunk FROM percona.checksums WHERE (this_crc<>master_crc OR this_cnt<>master_cnt) AND db='test' AND tbl='t1'
2020-09-13 22:33:47,643: INFO: compare.get_inconsistencies():554: Found 1 inconsistent chunk
2020-09-13 22:33:47,643: INFO: compare.build_chunk_query():360: # test.t1, chunk 3
2020-09-13 22:33:47,643: INFO: compare.get_chunk_index():63: Executing SELECT chunk_index FROM percona.checksums WHERE db='test' AND tbl='t1' AND chunk = 3
2020-09-13 22:33:47,700: INFO: compare.build_chunk_query():362: # chunk index: PRIMARY
2020-09-13 22:33:47,700: INFO: compare.get_index_fields():86: Executing SELECT COLUMN_NAME FROM information_schema.STATISTICS WHERE TABLE_SCHEMA='test' AND TABLE_NAME='t1' AND INDEX_NAME='PRIMARY' ORDER BY SEQ_IN_INDEX
2020-09-13 22:33:47,702: INFO: compare.get_boundary():129: Executing SELECT lower_boundary, upper_boundary FROM percona.checksums WHERE db='test' AND tbl='t1' AND chunk = 3
2020-09-13 22:33:47,705: INFO: compare.print_horizontal():402: Executing: SELECT id, name FROM test.t1 USE INDEX (PRIMARY) WHERE ( 0 OR ( 1 AND id >= '319858' ) ) AND ( 0 OR ( 1 AND id <= '393197' ) ) 2020-09-13 22:33:52,175: INFO: compare.print_horizontal():424: Executing: SELECT id, name FROM test.t1 USE INDEX (PRIMARY) WHERE ( 0 OR ( 1 AND id >= '319858' ) ) AND ( 0 OR ( 1 AND id <= '393197' ) )
2020-09-13 22:33:56,944: INFO: compare.get_inconsistencies():566: Differences between slave localhost and its master:
@@ -73328,7 +73328,6 @@
393185 40.79178826957176
393186 40.79178826957176
393187 40.79178826957176
-393188 40.79178826957176
393189 40.79178826957176
393190 40.79178826957176
393191 40.79178826957176

From the output we see that the slave is missing a record with id 393188 - the one I deleted.

Efficient Use Of Indexes In MySQL

These are the slides of the “Efficient Use Of Indexes In MySQL” talk we delivered on the SFMySQL Meetup.

This is an introductory talk for developers on MySQL indexes. In my opinion, it’s quite important to understand how InnoDB organizes data. If you know how MySQL accesses data, it’s easier to write optimal queries.

When working with queries, I imagine secondary indexes as a table with records sorted by secondary key fields. This is a powerful concept that helps to understand the MySQL logic. It’s also easy to understand complex optimizations like loose index scan.

For example, for index (last_name, rank) the secondary index table looks like:

Enjoy the slides!

The talk has been recorded. Unfortunately, sound is very quiet. Probably a webinar on the topic will make it up. Let me know if you’d be interested in the comments below!