Finding disk bottlenecks on Linux

We recently had a client that had some issues with their site slowing down. They thought initially it was due to MySQL locks but this was not the case.  It was clear that the problem was with the disk. Some process was utilizing the disk. When running top we could see that the CPU wait time was 25-30%.

Also running vmstat we could see the wait time was quite high, so the question was which process was causing the issue. Interestingly doing a Google web search brings up almost no coherent posts on finding disk bottlenecks. The solution is good old iostat. That provides the information about the disk read and writes per partition but it does not tell you which process is causing the disk i/o. The later versions of linux kernel provide quite good diagnostic information about the disk i/o but this is not documented in the reasonably popular older posts on the subject of disk thrashing.

For the lastest kernel versions you can use iotop to pinpoint the process that is specifically doing the disk i/o. To do this:

1. Start iotop

2. press the left arrow twice so that the sort field is on disk write.

3. You will now be in real time mode of which process i is writing to the disk so you can see specficially

4. If you wish to get a historic view of writes to date then press ‘a’ again (just press ‘a’ one more time to switch back).

In this clients case the issue was their temp directory was on the same physical drive as their site and MySQL DB. Moving the temp diretory to a separate drive resolved the issue.