Introduction
As software engineers, we have all used debugging tools like console.log print statements, or a debugger with breakpoints in our favourite IDE or browser. Similarly, for MySQL databases, there are diagnostic queries that can help us debug.
In this article, I’ll share more about MySQL diagnostic queries that you can add into your debugging toolkit.
What Went Wrong?
I recently encountered a case where my database migration was stuck for a long time and I had to decide what to do next. To give a bit of context, I am working on a Go application with a MySQL 8.0 database using the InnoDB engine. The application uses Goose for database migrations, which relies on migration files to run data definition language (DDL) queries such as ALTER TABLE ... ADD COLUMN.
Whenever we need to modify the database schema, we’d create a .sql migration file to apply the changes. In this case, a migration file was created with the following DDL queries:
ALTER TABLE `table_a` ADD COLUMN `column_b`;
CREATE INDEX column_b_idx ON `table_a` (column_b);

The database migration was executed, and even after an hour there was still no feedback on whether it had completed or failed. The table I was altering held a whopping ~30 million rows! So it made sense that the migration would take some time to finish, but more than an hour was unusually long.
In such a situation, we could:
- Abort the migration, retry, and pray that it works now
- Do nothing
- Figure out the root cause for the long-running migration then decide what to do next
If we choose option 1, we risk aborting mid-operation - and would that leave the index creation incomplete? Aborting a database operation like this feels like pressing that giant red button that screams danger. Unlike Dee Dee, I had to figure out what exactly that red button does.
I also value identifying the root cause before settling on the right solution, rather than attempting blind fixes. Options 1 and 2 skip that step entirely, so I opted for option 3.
But how can we figure this out? We’d need to identify what queries were running at that point in time, so let’s dive into MySQL diagnostic queries in the next section.
The Smoking Gun: What SHOW PROCESSLIST Revealed
MySQL gives us built-in tools to peek under the hood and see exactly what’s happening. In this particular case, I needed to know which queries were still running, so I ran SHOW PROCESSLIST; (use SHOW FULL PROCESSLIST; if you need the complete statement text in the Info column rather than a truncated version). And there it was - the smoking gun that revealed exactly what was blocking our migration:
*************************** 1. ***************************
Id: 1
Time: 4210 -- seconds
State: altering table
Info: OPTIMIZE TABLE table_a
*************************** 2. ***************************
Id: 2
Time: 3611 -- seconds
State: waiting for table metadata lock
Info: ALTER TABLE `table_a` ADD COLUMN `column_b`

Analysing the above output, it’s now clear that:
- Process 1: There is an ongoing OPTIMIZE TABLE query that has held the metadata lock for table_a for over an hour
- Process 2: The database migration applied to the same table_a is stuck in a queue and has not been executed yet. It is waiting for process 1 to release its lock before it can acquire the exclusive lock it needs to execute the query
Previously, I hypothesised that the delay was due to the sheer size of the table making the index creation slow. However, a simple SHOW PROCESSLIST; proved me wrong - we were still waiting to add the new column.
The application has a daily maintenance job to delete stale data, and once the clean-up job is done, an OPTIMIZE TABLE query is executed to defragment the table. On InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE, which rebuilds the table (hence the altering table state in the output above). The rebuild can take a long time on a table this size, and it holds a metadata lock on table_a that blocks the database migration from executing.
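For illustration, the job boils down to something like this (the created_at column and 30-day retention window are made-up placeholders, not the application’s actual logic):

-- Daily clean-up: remove stale rows (placeholder retention condition)
DELETE FROM `table_a` WHERE created_at < NOW() - INTERVAL 30 DAY;

-- Defragment and reclaim space afterwards; the rebuild holds a metadata
-- lock on the table for as long as it runs
OPTIMIZE TABLE `table_a`;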
Now we know for sure that aborting the migration is safe, since it has not even started executing. Once OPTIMIZE TABLE finally finished and released the lock, our migration completed in just under 20 minutes. The issue wasn’t our migration at all - it was just stuck waiting in the queue.
Plugging the Gaps
This debugging marathon has revealed so much about MySQL metadata locks. Now we can turn that pain into gain with some proactive safeguards. Based on the experience so far, a few pain points stood out:
- Lack of observability to identify conflicting operations
- Late feedback on the success/failure of the operation
Build Visibility
Given that this article centres around diagnostic queries, let’s explore a few that provide observability into the database state, so we can identify conflicting operations early.
Firstly, let’s understand what a metadata lock is and when it comes into play. MySQL uses metadata locks to ensure data consistency when there are concurrent changes to a table. For example, the following operations all take a metadata lock on the table they touch (a short sketch of how they collide follows the list):
- OPTIMIZE TABLE
- ALTER TABLE
- CREATE INDEX
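To make the blocking behaviour concrete, here is a minimal sketch of the contention across three sessions, reusing the table and column names from the migration above (the INT column type is just a placeholder):

-- Session 1: a long-running rebuild that takes a metadata lock on table_a
OPTIMIZE TABLE `table_a`;

-- Session 2: issued while session 1 is still running; it needs its own
-- lock on the same table, so it queues up behind session 1
ALTER TABLE `table_a` ADD COLUMN `column_b` INT;

-- Session 3: observe the contention; session 2's state shows
-- "waiting for table metadata lock" until session 1 finishes
SHOW PROCESSLIST;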
MySQL exposes the current state via built-in tables such as performance_schema.metadata_locks, performance_schema.threads, and information_schema.processlist. Note that metadata_locks identifies lock owners by their Performance Schema thread ID, so we join through threads to map it back to a processlist ID. By logging the result of the SQL query below before running a database migration, we can identify potentially conflicting operations early.
SELECT
    p.ID,
    ml.OBJECT_NAME, -- table name
    p.TIME,         -- seconds the session has spent in its current state
    p.STATE,        -- state of the query
    p.INFO          -- query that is holding the lock
FROM performance_schema.metadata_locks ml
JOIN performance_schema.threads t ON ml.OWNER_THREAD_ID = t.THREAD_ID
JOIN information_schema.PROCESSLIST p ON t.PROCESSLIST_ID = p.ID
WHERE ml.LOCK_STATUS = 'GRANTED' AND ml.LOCK_TYPE = 'EXCLUSIVE';
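If the sys schema is available (it ships with MySQL 5.7 and 8.0 by default), there is also a convenience view that pairs each session waiting on a table metadata lock with the session blocking it, regardless of the lock type involved. The exact columns vary between versions, so a plain SELECT * is a reasonable starting point:

-- Waiting/blocking session pairs for table metadata locks,
-- including ready-made KILL statements for the blocking session
SELECT * FROM sys.schema_table_lock_waits;

This complements the query above: the join tells us who currently holds locks, while the view surfaces who is stuck waiting and on whom.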
Fail Fast, Configure Timeouts

We can also set lock_wait_timeout so that a blocked operation aborts instead of queueing indefinitely, giving us early feedback about the failure - the default is 31536000 seconds (a full year!), which is why a blocked migration can appear to hang forever. I decided on a 30-minute maximum wait: the index creation itself took ~20 minutes at the table’s current size, so 30 minutes seemed like a reasonable allowance. You may want to adjust it according to how long your DDL operations typically take!
SET SESSION lock_wait_timeout = 1800; -- 30 minutes max wait for metadata locks
ALTER TABLE `table_a` ADD COLUMN `column_b`;
CREATE INDEX column_b_idx ON `table_a` (column_b);

Although it’s pretty trivial to debug once we understand the various diagnostic queries available, I think it’s generally more robust if such observability and practices are also baked into the database migration process. With these two methods in place, we get early feedback when migrations are stuck. More importantly, we can see exactly which queries are causing the blockage - no more guesswork if it happens again in the future!
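As a sketch of what baking this in could look like in our setup: Goose SQL migrations are annotated with -- +goose Up / -- +goose Down markers and, by default, each migration runs on a single connection, so a session-level timeout set at the top of the file should apply to the DDL that follows (worth verifying against your own Goose configuration). The column type and the Down steps here are illustrative placeholders.

-- +goose Up
-- Fail fast instead of queueing indefinitely behind other DDL
SET SESSION lock_wait_timeout = 1800;
ALTER TABLE `table_a` ADD COLUMN `column_b` INT;
CREATE INDEX column_b_idx ON `table_a` (column_b);

-- +goose Down
DROP INDEX column_b_idx ON `table_a`;
ALTER TABLE `table_a` DROP COLUMN `column_b`;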
Conclusion
Sometimes the best debugging sessions are the ones that teach you something completely unexpected. I went in thinking our migration was slow because of the table size, but discovered it wasn’t even running yet - just waiting in line behind a daily maintenance operation that I never knew existed.
This experience reinforced that investigating root causes beats blind fixes every time. The diagnostic queries we explored also proved to be an essential part of any engineer’s debugging toolkit.