MySQL: wait for table metadata lock

问题

mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.25, for Linux (x86_64) using  EditLine wrapper

Connection id:      17190053
Current database:   gkbb
Current user:       root@10.9.165.246
SSL:            Not in use
Current pager:      less
Using outfile:      ''
Using delimiter:    ;
Server version:     5.5.5-10.1.26-MariaDB MariaDB Server
Protocol version:   10
Connection:     10.9.108.125 via TCP/IP
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
TCP port:       3306
Uptime:         13 days 21 hours 14 min 38 sec

Threads: 276  Questions: 31378648  Slow queries: 212  Opens: 2977  Flush tables: 1  Open tables: 2761  Queries per second avg: 26.155
--------------

mysql> show global variables like "innodb_version";
+----------------+-------------+
| Variable_name  | Value       |
+----------------+-------------+
| innodb_version | 5.6.36-82.1 |
+----------------+-------------+
1 row in set (0.06 sec)

PS: 查看 status 还有一个快捷方式 \s

编辑测试库表结构(添加字段),卡住,任何操作都不行了,等一个多小时,还是不行。。
还一度怀疑是不是表结构设计问题,字段、数据是不是太多了。

过程

偶尔想起看看会话情况:

SELECT * FROM information_schema.processlist WHERE db = 'mydb';

或命令:

  • mysqladmin -uroot -p123456 processlist
  • mysql -uroot -p123456 -e 'SHOW PROCESSLIST'

看到里面好几个会话的状态都是 wait for table metadata lock,这就有点奇怪了,之前没有见过。

网上的资料显示:

为了在并发环境下维护表元数据的数据一致性,在表上有活动事务(显式或隐式)的时候,不可以对元数据进行写入操作。因此 MySQL 引入了 metadata lock ,来保护表的元数据信息。
因此在对表进行上述操作时,如果表上有活动事务(未提交或回滚),请求写入的会话会等待在 Metadata lock wait 。

如果资料没错,那么就是说,如果有事务没有结束,DDL 操作请求 MDL(metadata lock)时会卡住这张表。

我想起我们的服务中确实存在会话没有关闭的情况。

  1. 用了 SQLAlchemy 做 ORM
  2. 每次查询都使用一个会话,包括 SELECT
  3. 增删改操作都立即 commit 了,SELECT 却没有(记得是有个什么原因特意如此)

合理怀疑:这个查询 SESSION 没有关闭,导致 ALTER 语句进入 MDL 等待状态,然后导致了表无法进行任何操作(包括查询,至于为什么这样,我不知道)。

本地复现

  1. 开两个终端,分别建立 MySQL 连接。
  2. 其中一个终端(A):
    1. SET SESSION auto_commit = 0;
    2. SELECT * FROM test.test LIMIT 1;
  3. 另一个终端(B)只需要:TRUNCATE test.test;,然后发现:卡住了。

PS:
1. DDL 需要 metadata 锁。
2. TRUNCATE 属于 DDL,可能因为其非事务性(不支持提交和回滚)。参考:https://dba.stackexchange.com/questions/36607/why-is-truncate-ddl

现在,回到终端 A:

mysql> select * from information_schema.processlist where db = 'test';
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------+
| ID | USER | HOST      | DB   | COMMAND | TIME | STATE                           | INFO                                                           |
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------+
|  3 | root | localhost | test | Query   |    0 | executing                       | select * from information_schema.processlist where db = 'test' |
|  5 | root | localhost | test | Query   | 6111 | Waiting for table metadata lock | truncate test                                                  |
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> select * from information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421232684444408
                 trx_state: RUNNING
               trx_started: 2019-03-29 16:06:14
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 0
       trx_mysql_thread_id: 3
                 trx_query: select * from information_schema.innodb_trx
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 0
     trx_lock_memory_bytes: 1136
           trx_rows_locked: 0
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)

mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2019-03-29 19:04:40 0x7f1bcc1d6700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 3 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 7 srv_active, 0 srv_shutdown, 11228 srv_idle
srv_master_thread log flush and writes: 11234
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 16
OS WAIT ARRAY INFO: signal count 10
RW-shared spins 0, rounds 27, OS waits 12
RW-excl spins 0, rounds 32, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 27.00 RW-shared, 32.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 54542
Purge done for trx's n:o < 54542 undo n:o < 0 state: running but idle
History list length 53
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421232684445328, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421232684443488, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
639 OS file reads, 99 OS file writes, 21 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 4 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 26417867
Log flushed up to   26417867
Pages flushed up to 26417867
Last checkpoint at  26417858
0 pending log flushes, 0 pending chkp writes
17 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 133441
Buffer pool size   8192
Free buffers       7710
Database pages     482
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 446, created 41, written 72
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 482, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Process ID=4492, Main thread ID=139757271107328, state: sleeping
Number of rows inserted 6, updated 0, deleted 0, read 20
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set (0.00 sec)

表明:事务存在,TRUNCATE 锁等待。

如果,kill 3 干掉这个没有 commit 的查询 SESSION,TRUNCATE 就会正常执行下去。

结论

  1. 不管怎样,应该保持事务的短小精干,快速执行和退出。
    PS:业务代码中 SELECT 之前为什么不提交的问题需要进一步检查。
  2. DDL 需要 MDL,而没有结束的事务会阻止 MDL。
    更多细节,需要更多深入了解。