Disk IO Issues - please help!

#1
I'm experiencing serious disk IO issues on my web server and can't pinpoint the problem. During peak hours disk IO is constantly maxed out at 98-100%, and even off peak it rarely drops below 80%.

I'm running 2-CPU LiteSpeed on 2 x Harpertown 5410s, 4GB RAM, 2 x 750GB SATA2 drives in hardware RAID 1, CentOS 4.5/cPanel.
DB server is separate and there are no problems there.

I've been running LS for about 4 years and have never had disk IO issues. This time last year we had 75% more traffic, so the current problem makes no sense at all. According to the LiteSpeed real-time stats we peak at about 80 requests per second.

My host and I have been looking for any obscure logging but have found nothing.
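
In case it helps, the sort of check we've tried looks roughly like this (assuming the standard lshttpd process name and the older cPanel domlogs path, with lsof/pgrep installed):

Code:
pgrep -f lshttpd                                 # LiteSpeed PIDs
lsof -p "$(pgrep -d, -f lshttpd)" | grep -i log  # log files the server holds open
lsof +D /usr/local/apache/domlogs                # anything still writing to domlogs?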

Steps I've taken so far to resolve or at least improve the situation:

- Upgrade from 4.0.12 to 4.2.2
- Disable access logs (domlogs already disabled)
- Enable AIO
- Lower Max connections/Max keep-alive requests from 2000 to 1000
- Increase Max MMAP File Size from 256K to 1M
- Increase Total MMAP Cache Size from 40M to 128M

I'm seeing no improvement at all.

The only changes on the server since this time last year are that we upgraded PHP to 5.3.21 and upgraded to vBulletin 4.2. I noticed no IO problems immediately after those upgrades, and the problem seems to have worsened over the last week with no increase in traffic.
We are a download site, but we only serve about 20-30 downloads per minute at the moment, which is much lower than in previous years.

iostat output (70% of peak):

Code:
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          5.00 142.00 123.50 34.50 4468.00 1420.00  2234.00   710.00    37.27    23.25  137.32   6.33 100.05
dm-0         0.00   0.00 131.50 177.50 4468.00 1420.00  2234.00   710.00    19.06    40.54  125.97   3.24 100.05
dm-1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.50 328.22 66.83 135.15 1366.34 3706.93   683.17  1853.47    25.12    68.24  338.74   4.94  99.70
dm-0         0.00   0.00 67.33 463.37 1366.34 3706.93   683.17  1853.47     9.56   260.83  491.82   1.88  99.70
dm-1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          2.50 245.00 102.00 84.50 2252.00 3392.00  1126.00  1696.00    30.26    42.75  219.96   5.36 100.05
dm-0         0.00   0.00 101.50 424.00 2252.00 3392.00  1126.00  1696.00    10.74   201.47  379.76   1.90 100.05
dm-1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
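(For anyone wanting to watch the same counters live: these are iostat's extended device stats, so something along the lines of the following, though exact flags vary by sysstat version.)

Code:
iostat -x 5      # extended per-device stats (r/s, w/s, await, avgqu-sz, %util) every 5 seconds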
I'd really appreciate some help.
 

webizen

Well-Known Member
#2
You have more writes than reads; that's likely the reason for the high I/O, so downloads (which should only cause disk reads) are not the cause. Run a tool like 'atop' or 'iotop' to see which process is causing the I/O.
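
Something like this should narrow it down (iotop needs per-task I/O accounting in the kernel, which an older CentOS 4 kernel may not have, so atop is the safer bet):

Code:
iotop -o -a      # only tasks actually doing I/O, with accumulated totals
atop 5           # inside atop, press 'd' for per-process disk figures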
 
#3
I investigated this and kjournald is writing the most alongside pdflush:

Code:
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
kjournald              593       1498          0       1498          0 dm-0
pdflush              27441        425          0        425          0 dm-0, loop0
lsphp5               18169        279         22        257          0 dm-0
lsphp5               15794        240         54        186          0 dm-0
kjournald             3092        123          0        123          0 loop0
lsphp5               16402        118         25         93          0 dm-0
lsphp5               18179        101        101          0          0 dm-0
I've not seen this problem before; it has only appeared since upgrading LiteSpeed and PHP. I wonder if there is some sort of problem with my LiteSpeed installation, as I'm also having trouble stopping it. I wanted to check whether kjournald kept up the heavy writes with LS stopped, but it just won't stop with "service lsws stop" or "/etc/init.d/lsws stop". When I did manage to stop it a few days ago, I noticed via iostat that disk I/O dropped to pretty much zero.
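
For reference, this is roughly the test sequence I have in mind, assuming the standard lsws init script and that the server processes show up as lshttpd/lsphp5:

Code:
/etc/init.d/lsws stop              # or: service lsws stop
ps -ef | grep -i 'lshttpd\|lsphp'  # confirm nothing is left running
iostat -x 5                        # watch whether writes and %util drop with LS down
/etc/init.d/lsws start             # bring the server back up afterwards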

Is there anything I can do to troubleshoot further?

Thanks for your help.
 

webizen

Well-Known Member
#4
Are you suggesting LS is causing the I/O? I'm a little confused. From your first post, LS (which had been running for quite a while with no issue) was upgraded to help alleviate the problem, and it helped a bit.

Anyway, you can google "kjournald high IO" for more discussions/answers.
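
Since kjournald is the ext3 journalling thread, most of those discussions end up pointing at journal commits and atime updates. A quick first check, with device names taken from your iostat output (options and paths will vary):

Code:
cat /proc/mounts                          # data= journal mode and atime options on dm-0 / loop0
tune2fs -l /dev/dm-0 | grep -i journal    # journal features on the ext3 filesystem (or use the /dev/mapper node)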
 