Slow MySQL Start Time in GTID mode? Binary Log File Size May Be The Issue
Tuesday, January 8, 2019 at 10:44AM
Kristi Anderson in Database, GTID, InnoDB, MySQL, administration, data management, database, database scalability, databases, mysql, mysql load, mysql update frequent, recovery, replication, sql, sql

Have you been experiencing slow MySQL startup times in GTID mode? We recently ran into this issue on one of our MySQL hosting deployments and set out to solve the problem. In this blog, we break down the issue that could be slowing down your MySQL restart times, how to debug for your deployment, and what you can do to decrease your start time and improve your understanding of GTID-based replication.

How We Found The Problem

We were investigating slow MySQL startup times on a low-end, disk-based MySQL 5.7.21 deployment which had GTID mode enabled. The system was part of a master-slave pair and was under a moderate write load. When restarting during a scheduled maintenance, we noticed that the database server took 5-10 minutes to start up and begin accepting connections. That kind of delay didn’t make sense, so we set out to investigate.

Debugging Your Slow MySQL Start Time

We used the popular Percona tool pt-ioprofile to see what the database was doing. pt-ioprofile is a very important utility in Percona’s popular toolkit that’s used to debug MySQL issues, and you can see the complete list of features in their documentation. The pt-ioprofile tool uses strace and lsof to watch a process’s I/O and print out a table of files and I/O activity.

So, we started MySQL, waited for the mysqld process to get spawned, and started pt-ioprofile to see what the issue might be:

# pt-ioprofile --profile-process mysqld --run-time 200
Tue Oct 9 15:42:24 UTC 2018
Tracing process ID 18677
total      pread       read     pwrite      write      fsync  fdatasync       open      close   getdents      lseek      fcntl filename
...
216.550641   0.000000  216.550565   0.000000   0.000000   0.000000   0.000000   0.000015   0.000040   0.000000   0.000021   0.000000 /mysql_data/binlogs/mysql-bin.000014
...

What’s Slowing Your MySQL Restart?

On running this multiple times, we observed the following:

These observations clearly pointed to the fact that MySQL was spending a lot of time reading the latest binary log file. If the file was small, like it would be when the log file was flushed before a shutdown, the startup was fast.

Understanding Binlog GTID Recovery

As it turns out, in order to populate the values of gtid_executed and gtid_purged, the MySQL server must parse the binary log files.

Here’s the summary the MySQL 5.7 documentation method recommendation based on a FALSE or TRUE reading:

When binlog_gtid_simple_recovery = FALSE:

To compute gtid_executed:

This process can be very time consuming if there are a large number of binary log files without GTIDs, for example, created when gtid_mode=OFF.

Similarly, to compute gtid_purged:

So, this forms the basis of our understanding on how things used to work in older versions. However, certain optimizations can be made when binlog_gtid_simple_recovery is TRUE. This is the case we are interested in:

When binlog_gtid_simple_recovery = TRUE:

(Note, this is the default in MySQL 5.7.7 and later)

So, for MySQL versions 5.7.7 and above, the latest and the old binary log files are always read during system startup to correctly initialize GTID system variables. Reading the oldest binary log file isn’t as expensive since the event MySQL is looking for, Previous_gtids_log_event, is always the first event in a binary log file.

However, in order to correctly compute gtid_executed, the server must read through the entire latest binary log file and collect all the events in that file. So, the system startup time becomes directly proportional to the size of the latest binary log file.

Note that the situation is even worse when binlog_gtid_simple_recovery is FALSE. Since it is no longer the default option in recent releases, it is not much of a concern.

How To Resolve Your Slow Start Time

Having understood the cause of the issue we were running into, the solution we decided on was fairly obvious – reduce the size of the binary log files. The default size of binary log files is 1GB. It takes time to parse through a file of this size during startup, so it makes sense to decrease the value of max_binlog_size to a lower value.

If decreasing the size of the binary log file is not an option, then flushing the binary log files just before a maintenance shutdown of the mysqld process can help to decrease the binlog GTID recovery times.

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.