Recover Table Structure From InnoDB Dictionary

sys_parser

When a table gets dropped, MySQL removes the respective .frm file. This post explains how to recover the table structure if the table was dropped.

You need the table structure to recover a dropped table from the InnoDB tablespace. The B+tree structure of the InnoDB index doesn’t contain any information about field types. MySQL needs to know that in order to access records of the InnoDB table. Normally, MySQL gets the table structure from the .frm file. But when MySQL drops a table the respective frm file removed too.

Fortunately, there’s one more place where MySQL keeps the table structure. It’s the InnoDB dictionary.

The InnoDB dictionary is a set of tables where InnoDB keeps information about the tables. I reviewed them in detail in a separate InnoDB Dictionary post earlier. After the DROP, InnoDB deletes records related to the dropped table from the dictionary. So, we need to recover deleted records from the dictionary and then get the table structure.

Compiling Data Recovery Tool

First, we need to get the source code. The code is hosted on GitHub.

git clone https://github.com/twindb/undrop-for-innodb.git

To compile it, we need gcc, bison and flex. Install these packages with a package manager (yum/apt/etc). Then, time to compile.

# make
cc -g -O3 -I./include -c stream_parser.c
cc -g -O3 -I./include  -pthread -lm stream_parser.o -o stream_parser
flex  sql_parser.l
bison  -o sql_parser.c sql_parser.y
sql_parser.y: conflicts: 6 shift/reduce
cc -g -O3 -I./include -c sql_parser.c
cc -g -O3 -I./include -c c_parser.c
cc -g -O3 -I./include -c tables_dict.c
cc -g -O3 -I./include -c print_data.c
cc -g -O3 -I./include -c check_data.c
cc -g -O3 -I./include  sql_parser.o c_parser.o tables_dict.o print_data.o check_data.o -o c_parser -pthread -lm
cc -g -O3 -I./include -o innochecksum_changer innochecksum.c

Recover The InnoDB Dictionary

Now, let’s create dictionary tables in the sakila_recovered database. The data recovery tool comes with the structure of the dictionary tables.

# cat dictionary/SYS_* | mysql sakila_recovered

The dictionary is stored in the ibdata1 file. So, let’s parse it.

./stream_parser -f /var/lib/mysql/ibdata1
 
...
 
Size to process:                  79691776 (76.000 MiB)
Worker(0): 84.13% done. 2014-09-03 16:31:20 ETA(in 00:00:00). Processing speed: 7.984 MiB/sec
Worker(2): 84.21% done. 2014-09-03 16:31:20 ETA(in 00:00:00). Processing speed: 8.000 MiB/sec
Worker(1): 84.21% done. 2014-09-03 16:31:21 ETA(in 00:00:00). Processing speed: 4.000 MiB/sec
All workers finished in 2 sec

Now, we need to extract the dictionary records from the InnoDB pages. Let’s create a directory for the table dumps.

# mkdir -p dumps/default

And now we can generate table dumps and LOAD INFILE commands to load the dumps. We also need to specify the -D option to the c_parser because the records we need were deleted from the dictionary when the table was dropped.

SYS_TABLES

# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page \
    -t dictionary/SYS_TABLES.sql \
    > dumps/default/SYS_TABLES \
    2> dumps/default/SYS_TABLES.sql

SYS_INDEXES

# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page \
    -t dictionary/SYS_INDEXES.sql \
    > dumps/default/SYS_INDEXES \
    2> dumps/default/SYS_INDEXES.sql

SYS_COLUMNS

# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000002.page \
    -t dictionary/SYS_COLUMNS.sql \
    > dumps/default/SYS_COLUMNS \
    2> dumps/default/SYS_COLUMNS.sql

SYS_FIELDS

# ./c_parser -4Df pages-ibdata1/FIL_PAGE_INDEX/0000000000000004.page \
    -t dictionary/SYS_FIELDS.sql \
    > dumps/default/SYS_FIELDS \
    2> dumps/default/SYS_FIELDS.sql

With the generated LOAD INFILE commands it’s easy to load the dumps.

# cat dumps/default/*.sql | mysql sakila_recovered

Now we have the InnoDB dictionary loaded into normal InnoDB tables.

Compiling sys_parser

ys_parser is a tool that reads the dictionary from tables stored in MySQL and generates the CREATE TABLE structure for a table.

To compile it we need MySQL libraries and development files. Depending on distribution, they may be in -devel or -dev package. On RedHat based systems, you can check it with the command yum provides “*/mysql_config” . On my server it was the mysql-community-devel package.

If all necessary packages are installed, the compilation boils down to a simple command:

# make sys_parser
/usr/bin/mysql_config
cc `mysql_config --cflags` `mysql_config --libs` -o sys_parser sys_parser.c

Recover Table Structure

Now sys_parser can do its magic. Just run it to get the CREATE statement in the standard output.

# ./sys_parser
sys_parser [-h <host>] [-u <user>] [-p <passowrd>] [-d <db>] databases/table

It will use “root” as the username to connect to MySQL, “querty” – as the password. The dictionary is stored in SYS_* tables in the sakila_recovered database. What we want to recover is sakila.actor. InnoDB uses a slash “/” as a separator between database name and table name, so does sys_parser.

# ./sys_parser -u root -p qwerty  -d sakila_recovered sakila/actor
CREATE TABLE `actor`(
`actor_id` SMALLINT UNSIGNED NOT NULL,
`first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_update` TIMESTAMP NOT NULL,
PRIMARY KEY (`actor_id`)
) ENGINE=InnoDB;
# ./sys_parser -u root -p qwerty  -d sakila_recovered sakila/customer
CREATE TABLE `customer`(
`customer_id` SMALLINT UNSIGNED NOT NULL,
`store_id` TINYINT UNSIGNED NOT NULL,
`first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
`email` VARCHAR(50) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci',
`address_id` SMALLINT UNSIGNED NOT NULL,
`active` TINYINT NOT NULL,
`create_date` DATETIME NOT NULL,
`last_update` TIMESTAMP NOT NULL,
PRIMARY KEY (`customer_id`)
) ENGINE=InnoDB;

There are few caveats though.

  1. InnoDB doesn’t store all information you can find in the .frm file. For example, if a field is AUTO_INCREMENT, the InnoDB dictionary knows nothing about it. Therefore, sys_parser won’t recover that property. If there were any field or table level comments, they’ll be lost.
  2. sys_parser generates the table structure eligible for further data recovery. It could but it doesn’t recover secondary indexes, or foreign keys.
  3. InnoDB stores the DECIMAL type as a binary string. It doesn’t store the precision of a DECIMAL field. So, that information will be lost.

For example, table payment uses DECIMAL to store money.

# ./sys_parser -u root -p qwerty  -d sakila_recovered sakila/payment
CREATE TABLE `payment`(
        `payment_id` SMALLINT UNSIGNED NOT NULL,
        `customer_id` SMALLINT UNSIGNED NOT NULL,
        `staff_id` TINYINT UNSIGNED NOT NULL,
        `rental_id` INT,
        `amount` DECIMAL(6,0) NOT NULL,
        `payment_date` DATETIME NOT NULL,
        `last_update` TIMESTAMP NOT NULL,
        PRIMARY KEY (`payment_id`)
) ENGINE=InnoDB;

Fortunately, Oracle is planning to extend the InnoDB dictionary and finally get rid of .frm files. I salute that decision, having the structure in two places leads to inconsistencies.

Image credit: m.khajoo

InnoDB Dictionary

Why Does InnoDB Need A Dictionary

An InnoDB dictionary is a set of internal tables InnoDB uses to maintain various information about user tables. It serves as API between a human and the database. While the humans refer to tables by their names, InnoDB works with integer identifiers. The dictionary stores correspondence between table name and index_id.

The dictionary tables are normal InnoDB tables, but they’re not visible to a user. However, some versions of MySQL provide read-only access to the dictionary in information_schema database.

The dictionary is stored in ibdata1. Root page of SYS_TABLES, for example, has id 8, so it’s the eighth page from the beginning of ibdata1.

The dictionary pages are in REDUNDANT format even if you use MySQL 5.6. I hope to write more about record formats in future posts. For now, it’s enough to mention that REDUNDANT is the oldest record format. It was available since 4.0 and was the only format by that time.

SYS_TABLES

CREATE TABLE `SYS_TABLES` (
    `NAME` varchar(255) NOT NULL DEFAULT '',
    `ID` bigint(20) unsigned NOT NULL DEFAULT '0',
    `N_COLS` int(10) DEFAULT NULL,
    `TYPE` int(10) unsigned DEFAULT NULL,
    `MIX_ID` bigint(20) unsigned DEFAULT NULL,
    `MIX_LEN` int(10) unsigned DEFAULT NULL,
    `CLUSTER_NAME` varchar(255) DEFAULT NULL,
    `SPACE` int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1; 

All dictionary tables have predefined hard-coded index_ids. No surprise, as long as InnoDB doesn’t have a dictionary for the dictionary :). SYS_TABLES has index_id 0-1. Root page of SYS_TABLES is page id 8.

NAME is the human readable table name. Well, the database name and the table name. For example, for table sakila.actor NAME will be “sakila/actor”.

ID is the table identifier. Just an integer number. The table is referenced to by this table_id in other dictionary tables.

N_COLS is the number of columns in the table.

I will pretend TYPE, MIX_ID, MIX_LEN and CLUSTER_NAME are not important. Actually, I have no idea what these fields are. Anyway, they were never used for data recovery, so let’s leave them for future investigations.

SPACE is an identifier of a tablespace. Thus ibdata1 is SPACE 0, ibdata2 is SPACE 1, Every ibd file has its own SPACE identifier. SPACE tells in what tablespace the table’s indexes are stored.

Sample record for table sakila.actor:

# ./c_parser \
        -4f pages-1402815586/FIL_PAGE_INDEX/0-1 \
        | grep sakila/actor
...
"sakila/actor" 61 4 1 0 80 "" 47

“sakila/actor” is the table name obviously. 61 is the table id. 4 is the number of fields in the table:

CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL 
             AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL 
                DEFAULT CURRENT_TIMESTAMP
                ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB 
  AUTO_INCREMENT=201 
  DEFAULT CHARSET=utf8 

47 is the SPACE identifier. It would be 0 if innodb_file_per_table was disabled. It’s not in my environment:

 mysql> show variables like 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+
1 row in set (0.00 sec) 

SYS_INDEXES

Another important dictionary table is SYS_INDEXES. It stores information about InnoDB indexes.

Its index_id is 0-3 and the root page is stored in page 11.

CREATE TABLE `SYS_INDEXES` (
  `TABLE_ID` bigint(20) unsigned NOT NULL 
             DEFAULT '0',
  `ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `NAME` varchar(120) DEFAULT NULL,
  `N_FIELDS` int(10) unsigned DEFAULT NULL,
  `TYPE` int(10) unsigned DEFAULT NULL,
  `SPACE` int(10) unsigned DEFAULT NULL,
  `PAGE_NO` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`TABLE_ID`,`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Fields of the table:

TABLE_ID is the table identifier. It’s one from SYS_TABLES.

ID is the index identifier. This is a very important field. The matter is, every InnoDB page contains the index_id of the index it belongs to in the header. So, if we want to recover a table, we need to find all pages that belong to the PRIMARY index of the table, which means we need all pages with a specific index_id.

NAME is the name of an index. Table sakila.actor has two indexes: PRIMARY and idx_actor_last_name. So, these are names in SYS_INDEXES.

N_FIELDS is the number of fields in an index. idx_actor_last_name has one field, so is it in SYS_INDEXES.

TYPE is not an important field 🙂

SPACE is an InnoDB tablespace identifier showing where the index is stored. Probably, InnoDB developers foresaw the possibility to store indexes in different files, but for current MySQL versions it’s the same as in SYS_TABLES.

PAGE_NO is an identifier of the root page of the index. As you know, InnoDB stores indexes in B+tree data structure. Thus, root node of the B+tree is the root page identified by this field.

SYS_COLUMNS

SYS_COLUMNS describes individual fields in a table. It is stored as index_id 0-2.

 CREATE TABLE `SYS_COLUMNS` (
  `TABLE_ID` bigint(20) unsigned NOT NULL,
  `POS` int(10) unsigned NOT NULL,
  `NAME` varchar(255) DEFAULT NULL,
  `MTYPE` int(10) unsigned DEFAULT NULL,
  `PRTYPE` int(10) unsigned DEFAULT NULL,
  `LEN` int(10) unsigned DEFAULT NULL,
  `PREC` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`TABLE_ID`,`POS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1; 

TABLE_ID is the table identifier, it comes from SYS_TABLES

POS is a relative position of the field in the table.

NAME is the name of the field.

MTYPE and PRTYPE encode the MySQL type of the field. InnoDB started its history as a separate product, not related to MySQL. These two fields are legacy from that time. There’s no strict correspondence between MySQL and the InnoDB type. InnoDB uses its own type that suits a particular MySQL type the best. Encoding and collation are stored here as well.

LEN is the length of the field.

PREC is precision for some MySQL types.

Isn’t it interesting to see an example of the table for sakila.actor? Here it goes. Remember, table_id is 61? We know that from SYS_TABLES.

61      0       "actor\_id"     6       1794    2       0
61      1       "first\_name"   12      2162959 135     0
61      2       "last\_name"    12      2162959 135     0
61      3       "last\_update"  3       525575  4       0 

As we see, smallint is two bytes, timestamp is four bytes. And varchar(45) is 135 bytes. Apparently, this is the maximum size, not the actual size of a field which is different for every record. But why 135, not 45? Because the field is utf8, where each character may use up to three bytes. 3 x 45 adds up to 135.

SYS_FIELDS

The last table for today (but not least) is SYS_FIELDS. It’s stored in index_id 0-4.
It lists individual fields on all indexes.

 CREATE TABLE `SYS_FIELDS` (
  `INDEX_ID` bigint(20) unsigned NOT NULL,
  `POS` int(10) unsigned NOT NULL,
  `COL_NAME` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`INDEX_ID`,`POS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1; 

The table is quite simple.

INDEX_ID is the index identifier that we can find in SYS_INDEXES.

POS is the position of the field in the index.

And COL_NAME is the field name.

Why Do We Care?

Why do we need an InnoDB dictionary anyway?

There are two reasons. First, if we need to recover table A, we need to know index_id of its PRIMARY index. The easiest way to learn it is the InnoDB dictionary.

Second, as you see, the InnoDB dictionary stores some information about field names and types. Which means, in case we don’t know the table structure we can recover it from the dictionary.