Can I/O latency cause a simple UPDATE to take seconds in MySQL?
My MySQL application is experiencing slow performance when running some UPDATE, INSERT and DELETE queries. In this question I will only discuss one particular UPDATE, because it's enough to demonstrate the problem:
UPDATE projects SET ring = 5 WHERE id = 1
This UPDATE is usually fast enough, around 0.2ms, but every now and then (often enough to be a problem) it takes several seconds. Here's an excerpt from the log (look at the 4th line):
~ (0.000282) UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
~ (0.000214) UPDATE `projects` SET `ring` = 6 WHERE `id` = 1
~ (0.000238) UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
~ (3.986502) UPDATE `projects` SET `ring` = 8 WHERE `id` = 1
~ (0.000186) UPDATE `projects` SET `ring` = 9 WHERE `id` = 1
~ (0.000217) UPDATE `projects` SET `ring` = 0 WHERE `id` = 1
~ (0.000162) UPDATE `projects` SET `ring` = 1 WHERE `id` = 1
projects is an InnoDB table with 6 columns of types INT and VARCHAR, 17 rows and an index on id. The problem happens with other tables too, but here I'm focusing on this one. When trying to solve the problem, I ensured that the queries were all sequential, so this is not a locking issue. The UPDATE above is executed in the context of a transaction. Other information on the server:
The "was" bit above means it didn't work before or after the upgrade.
What I've tried so far, to no avail:

- innodb_flush_log_at_trx_commit set to 0, 1 or 2
- innodb_locks_unsafe_for_binlog on or off
- timed_mutexes on or off
- innodb_flush_method changed from the default to O_DSYNC or O_DIRECT
- innodb_buffer_pool_size raised from the default to 600M and then to 3000M
- innodb_log_file_size raised from the default to 128M
- SHOW PROCESSLIST, which informs me that the state is "updating"
- SHOW PROFILE ALL, which says that almost all the time was spent on "updating", and that, within that step, not much time was spent on CPU cycles and there were many voluntary context switches (like 30)
- SHOW STATUS, watching for changes in Innodb_buffer_pool_pages_dirty. There may be some relation between dirty pages being flushed and the slow queries, but the correlation isn't clear.

Then I decided to check the system's I/O latency with ioping. This is my first VPS, so I was surprised to see this result:
4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms
4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms
4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms
4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms
4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms
4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms
Pretty erratic, I would say.
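For reference, those latency numbers came from simply pointing ioping at the current directory on the VPS, along these lines (the request count is inferred from the output above; each request reads 4096 bytes, the default):

ioping -c 10 .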
Having said all of that, I ask:
1. Can the I/O latency be occasionally killing MySQL performance? I always thought that, when you ran an UPDATE, the thread taking care of that connection wasn't going to flush data to disk or wait for such a flush; it would return immediately and the flushing would be done by another thread at another time.
2. If it can't be disk I/O, is there anything else I can possibly try, short of renting a dedicated server?
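For completeness, in my.cnf terms the InnoDB settings I experimented with (listed under "What I've tried" above) look roughly like this; the values shown are just the ones I cycled through, not a recommendation:

[mysqld]
# each of these was varied independently while trying to reproduce the problem
innodb_flush_log_at_trx_commit = 2        # also tried 0 and 1
innodb_flush_method            = O_DIRECT # also tried O_DSYNC and the default
innodb_buffer_pool_size        = 600M     # later raised to 3000M
innodb_log_file_size           = 128M     # up from the default
innodb_locks_unsafe_for_binlog = 1        # also tried 0
timed_mutexes                  = 1        # also tried 0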
I'm replying to my own question with additional data that I collected based on your answers.
I used two notebooks connected over a wireless network. On notebook A, I mounted a directory of notebook B using sshfs. Then, on notebook A, I started up MySQL specifying that mounted directory as its data directory. This should provide MySQL with a very slow I/O device. MySQL was started with innodb_flush_log_at_trx_commit = 0.
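Roughly, the setup on notebook A looked like this (user names, host names and paths are made up for illustration, and the directory on notebook B already held an initialized data directory):

# mount a directory of notebook B over the wireless link
mkdir -p /mnt/slow-datadir
sshfs user@notebook-b:/var/lib/mysql-test /mnt/slow-datadir

# start MySQL against the remote (and therefore very slow) data directory
mysqld_safe --datadir=/mnt/slow-datadir --innodb_flush_log_at_trx_commit=0 &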
I defined 3 sets of queries, each set consisting of an update and a select query repeated 10,000 times, without explicit transactions. The experiments were US1SID, US1MID and US2MID (see the tables below).
Each set was run twice using a shell script (hence the timings are slower than those in my original question): once under normal conditions and once after executing the following command:
tc qdisc replace dev wlan0 root handle 1:0 netem delay 200ms
The command above adds a mean delay of 200ms when transmitting packets through wlan0.
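The shell script driving each set was essentially a loop like the following sketch (database, table and column names are placeholders; this is the same-id variant, and the other sets varied the id or the table being selected from). Timing the mysql client from the shell is also why these numbers include connection overhead:

#!/bin/bash
# run 10,000 update+select pairs and record the wall-clock time of each statement
for i in $(seq 1 10000); do
    for q in "UPDATE t1 SET c = $((i % 10)) WHERE id = 1" \
             "SELECT * FROM t1 WHERE id = 1"; do
        start=$(date +%s.%N)
        mysql -u root -D test -e "$q" > /dev/null
        end=$(date +%s.%N)
        # log "<seconds> <query>" so updates and selects can be separated later
        echo "$(echo "$end - $start" | bc) $q" >> timings.log
    done
done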
First, here are the mean times (in seconds) of the fastest 99% of the updates and selects, and of the slowest 1% of the updates and selects.
|         |        Delay: 0ms        |       Delay: 200ms       |
|         | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID |
| top99%u | 0.0064 | 0.0064 | 0.0064 | 0.0063 | 0.0063 | 0.0063 |
| top99%s | 0.0062 | 0.0063 | 0.0063 | 0.0062 | 0.0062 | 0.0062 |
| bot01%u | 1.1834 | 1.2239 | 0.9561 | 1.9461 | 1.7492 | 1.9731 |
| bot01%s | 0.4600 | 0.5391 | 0.3417 | 1.4424 | 1.1557 | 1.6426 |
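(For reference, the means above were extracted from the per-query log produced by the script sketched earlier, with one-liners along these lines; timings.log and the 10,000-query count come from that sketch.)

# mean of the fastest 99% of the updates (9,900 out of 10,000);
# use tail -n 100 instead of head for the slowest 1%, and grep ' SELECT ' for selects
grep ' UPDATE ' timings.log | sort -n | head -n 9900 | awk '{ s += $1 } END { print s / NR }'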
As is clear, even with really, really poor I/O performance, MySQL manages to execute most queries really fast. But what concerns me most are the worst cases, so here's another table, showing the 10 slowest queries of each run. A "u" means it was an update, an "s" a select.
|         Delay: 0ms          |          Delay: 200ms          |
| US1SID  | US1MID  | US2MID  |  US1SID  |  US1MID  |  US2MID  |
| 5.443 u | 5.946 u | 5.315 u | 11.500 u | 10.860 u | 11.424 s |
| 5.581 u | 5.954 s | 5.466 u | 11.649 s | 10.995 u | 11.496 s |
| 5.863 s | 6.291 u | 5.658 u | 12.551 s | 11.020 u | 12.221 s |
| 6.192 u | 6.513 u | 5.685 u | 12.893 s | 11.370 s | 12.599 u |
| 6.560 u | 6.521 u | 5.736 u | 13.526 u | 11.387 u | 12.803 u |
| 6.562 u | 6.555 u | 5.743 u | 13.997 s | 11.497 u | 12.920 u |
| 6.872 u | 6.575 u | 5.869 u | 14.662 u | 12.825 u | 13.625 u |
| 6.887 u | 7.908 u | 5.996 u | 19.953 u | 12.860 u | 13.828 s |
| 6.937 u | 8.100 u | 6.330 u | 20.623 u | 14.015 u | 16.292 u |
| 8.665 u | 8.298 u | 6.893 u | 27.102 u | 22.042 s | 17.131 u |
Conclusions:
- Poor I/O performance can indeed slow MySQL to a crawl. It's not clear why or when exactly, but it does happen.
- The slowing down applies to both selects and updates, with updates suffering more.
- For some reason, even selects on a table that wasn't involved in any changes, and which had recently been populated, were also slowed down, as is clear from US2MID above.
- As for the test cases proposed by mentatkgs, it seems that updating different rows instead of the same ones does help a little, but doesn't solve the problem.
I guess I will either adapt my software to tolerate such delays or try to move to another provider. Renting a dedicated server is too expensive for this project.
Thank you all for the comments.
As you are hosting your VPS in the cloud, you might be running into issues that are completely out of your control.
VPSs are subject to the whims of the host servers that run them. For example, CPU cycle priority at the Rackspace Cloud is weighted based on the size of the VPS. The bigger your VPS, the better likelihood that your app will perform smoothly. If there is a bigger VPS on the host you're using, it's possible that there is weighted bursting to blame. It's pretty hard to say.
Have you tried running this locally on your own machine? If it runs perfectly on your own system, and you need guaranteed performance, then your best bet will be to move to a dedicated server.
You have a VPS-related IO problem. It's not MySQL's fault.
Are you by any chance using Elastic Block Store with Amazon, or possibly RDS? Both of those use remote storage, and an IP protocol layer to talk to the storage; they can have nasty lag at times.