View Full Version : Quick Edit and iowait
anewday
04-18-2008, 11:55 PM
On my forums, occassionally when a user clicks on the Complete button using the quick edit function..it causes the iowait of the server to spike as much as 40%. As a result, it takes more than 5 seconds for it to be processed, should be a second only. This happens during not so busy times, load under 1 with barely any iowait. The weird thing is I almost never see this type of behavior with Apache. :confused:
mistwang
04-19-2008, 07:58 AM
I/O wait is most likely cuased by heavy random access to HD, I don't know if there is really any corelation with quick edit. If yes, you need to check the HTTP request used for quick edit and find out why.
anewday
04-19-2008, 08:39 AM
How can I find the http request for quick edit?
anewday
04-19-2008, 01:39 PM
Ok, I met this 3 times today and I was able to see consistencies in the error_log file.
2008-04-19 15:37:48.666 INFO [AdminPHP] pid list size: 1
2008-04-19 15:37:48.666 INFO [AdminPHP] add child process pid: 23637
The above 2 lines appeared each time. the pid is of course different.
anewday
04-19-2008, 10:12 PM
George, do you think it has anything to do with the Request/Response settings? I am using the default values.
anewday
04-20-2008, 12:28 PM
It happened again, this time I did a top and sorted it by user:
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.4% us, 3.5% sy, 0.5% ni, 30.5% id, 49.9% wa, 0.3% hi, 0.0% si
Mem: 3114144k total, 3095836k used, 18308k free, 12976k buffers
Swap: 2096440k total, 17304k used, 2079136k free, 2359324k cached
PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
28656 nobody 17 0 2 0:00.05 0.3 78572 7840 3956 S lsphp4
24031 nobody 15 0 0 3:33.35 0.3 11948 9180 3616 S lshttpd
3339 nobody 26 10 0 0:47.98 0.0 5800 1396 844 S proftpd: (accepting connections)
24030 nobody 15 0 0 0:00.30 0.1 5380 2440 1112 S lshttpd
24039 nobody 15 0 0 0:00.04 0.1 75440 2740 1988 S lsphp4
24552 nobody 16 0 0 2:20.38 0.7 79996 20m 15m S lsphp4
25544 nobody 16 0 0 2:23.60 0.6 79016 19m 14m S lsphp4
26915 nobody 16 0 0 1:16.68 0.6 79808 17m 12m S lsphp4
27096 nobody 16 0 0 0:33.31 0.6 79548 17m 12m S lsphp4
28522 nobody 20 0 0 0:02.71 0.4 79520 11m 6936 S lsphp4
28533 nobody 15 0 0 0:01.02 0.3 78932 9576 5276 S lsphp4
28539 nobody 20 0 0 0:01.97 0.4 79844 11m 6576 S lsphp4
28540 nobody 17 0 0 0:00.02 0.2 77680 6460 3460 S lsphp4
28541 nobody 17 0 0 0:00.07 0.3 78636 8480 4572 S lsphp4
28546 nobody 17 0 0 0:00.06 0.3 78504 8064 4272 S lsphp4
28548 nobody 15 0 0 0:00.34 0.3 79832 9380 4504 S lsphp4
28558 nobody 20 0 0 0:00.50 0.3 79548 9524 4856 S lsphp4
28559 nobody 20 0 0 0:00.45 0.3 79276 9196 4772 S lsphp4
28565 nobody 20 0 0 0:00.46 0.3 79488 9320 4708 S lsphp4
28566 nobody 20 0 0 0:00.66 0.4 79504 10m 6516 S lsphp4
28567 nobody 17 0 0 0:00.07 0.3 78504 7956 4192 S lsphp4
28573 nobody 15 0 0 0:00.36 0.3 79832 9380 4504 S lsphp4
28574 nobody 19 0 0 0:00.30 0.3 78960 8888 4696 S lsphp4
28575 nobody 19 0 0 0:01.14 0.3 79832 10m 5904 S lsphp4
28587 nobody 16 0 0 0:00.37 0.3 78664 10m 6384 S lsphp4
28592 nobody 16 0 0 0:00.31 0.3 79004 9460 5152 S lsphp4
28594 nobody 16 0 0 0:01.83 0.4 79080 10m 6784 S lsphp4
28595 nobody 16 0 0 0:00.52 0.3 78552 8460 4540 S lsphp4
28601 nobody 19 0 0 0:01.38 0.3 79900 10m 5752 S lsphp4
28602 nobody 16 0 0 0:00.99 0.3 78400 9352 5584 S lsphp4
28605 nobody 17 0 0 0:00.02 0.2 77680 6460 3460 S lsphp4
28607 nobody 17 0 0 0:00.23 0.3 78884 9912 5788 S lsphp4
28613 nobody 20 0 0 0:00.44 0.3 79284 9108 4700 S lsphp4
28614 nobody 17 0 0 0:00.10 0.3 78444 7952 4216 S lsphp4
28620 nobody 19 0 0 0:00.35 0.3 79832 9380 4504 S lsphp4
28626 nobody 19 0 0 0:00.32 0.3 79732 9428 4608 S lsphp4
28627 nobody 17 0 0 0:00.25 0.3 78864 8880 4768 S lsphp4
28633 nobody 19 0 0 0:01.71 0.3 79736 10m 5676 S lsphp4
28639 nobody 16 0 0 0:00.82 0.3 78616 8356 4376 S lsphp4
28645 nobody 18 0 0 0:00.06 0.3 78500 7888 4144 S lsphp4
28646 nobody 18 0 0 0:00.07 0.3 78504 7956 4192 S lsphp4
anewday
04-20-2008, 12:31 PM
Won't let me paste that much in the message.
After about 5 minutes, this is the output when iowait returned to normal.
top - 14:28:51 up 245 days, 17:35, 1 user, load average: 1.41, 1.88, 1.22
Tasks: 160 total, 3 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.2% us, 5.5% sy, 10.6% ni, 58.2% id, 0.5% wa, 0.0% hi, 0.0% si
Mem: 3114144k total, 3095708k used, 18436k free, 13744k buffers
Swap: 2096440k total, 17232k used, 2079208k free, 2354344k cached
PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
28758 nobody 16 0 30 0:17.87 0.4 79760 13m 8428 S lsphp4
24552 nobody 16 0 28 2:33.19 0.7 79932 20m 15m R lsphp4
24031 nobody 15 0 0 3:34.38 0.3 11372 8628 3064 S lshttpd
3339 nobody 26 10 0 0:47.98 0.0 5800 1396 844 S proftpd: (accepting connections)
24030 nobody 15 0 0 0:00.31 0.1 5380 2440 1112 S lshttpd
24039 nobody 16 0 0 0:00.04 0.1 75440 2740 1988 S lsphp4
25544 nobody 16 0 0 2:23.60 0.6 79016 19m 14m S lsphp4
28522 nobody 16 0 0 0:02.71 0.4 78964 11m 6936 S lsphp4
28533 nobody 16 0 0 0:01.03 0.3 78820 9576 5388 S lsphp4
28540 nobody 16 0 0 0:00.02 0.2 77680 6532 3532 S lsphp4
28546 nobody 16 0 0 0:00.06 0.3 78504 8352 4552 S lsphp4
28559 nobody 16 0 0 0:00.46 0.3 78508 8672 4796 S lsphp4
28566 nobody 16 0 0 0:00.66 0.3 78808 10m 6516 S lsphp4
28567 nobody 16 0 0 0:00.09 0.3 78700 8496 4428 S lsphp4
28574 nobody 16 0 0 0:00.30 0.3 78652 8736 4720 S lsphp4
28587 nobody 16 0 0 0:00.39 0.3 78728 10m 6452 S lsphp4
28601 nobody 16 0 0 0:01.38 0.3 78760 9876 5752 S lsphp4
28602 nobody 16 0 0 0:01.02 0.3 78820 9900 5712 S lsphp4
28605 nobody 16 0 0 0:00.02 0.2 77680 6532 3532 S lsphp4
28607 nobody 16 0 0 0:00.24 0.3 78360 9520 5788 S lsphp4
28613 nobody 16 0 0 0:00.44 0.3 78388 8456 4700 S lsphp4
28614 nobody 16 0 0 0:00.13 0.3 78848 8628 4408 S lsphp4
28620 nobody 16 0 0 0:00.36 0.3 78692 8588 4532 S lsphp4
28626 nobody 16 0 0 0:00.32 0.3 78668 8668 4636 S lsphp4
28627 nobody 16 0 0 0:00.25 0.3 78588 8724 4768 S lsphp4
28633 nobody 16 0 0 0:01.71 0.3 78696 9740 5676 S lsphp4
28645 nobody 16 0 0 0:00.09 0.3 78800 8716 4548 S lsphp4
28646 nobody 16 0 0 0:00.08 0.3 78704 8500 4428 S lsphp4
28662 nobody 16 0 0 0:00.02 0.2 77680 6532 3532 S lsphp4
28670 nobody 16 0 0 0:00.58 0.3 78828 9068 4872 S lsphp4
28671 nobody 16 0 0 0:00.11 0.3 78848 8628 4408 S lsphp4
28700 nobody 16 0 0 0:00.89 0.3 78400 8676 4912 S lsphp4
28701 nobody 16 0 0 0:00.43 0.3 78376 8624 4880 S lsphp4
28702 nobody 16 0 0 0:00.20 0.3 79084 9128 4736 S lsphp4
28708 nobody 16 0 0 0:03.17 0.3 78664 9832 5800 S lsphp4
28719 nobody 16 0 0 0:00.56 0.3 78568 8812 4876 S lsphp4
28730 nobody 16 0 0 0:00.11 0.3 78852 8628 4404 S lsphp4
28731 nobody 16 0 0 0:00.52 0.3 79092 9384 4956 S lsphp4
28744 nobody 16 0 0 0:00.20 0.3 79084 9132 4736 S lsphp4
28757 nobody 16 0 0 0:00.56 0.3 78732 8920 4824 S lsphp4
28760 nobody 16 0 0 0:00.20 0.3 79032 9064 4728 S lsphp4
28761 nobody 16 0 0 0:01.26 0.3 78468 9200 5364 S lsphp4
28763 nobody 16 0 0 0:00.14 0.2 7776 5744 1596 S admin_php -c ../conf/php.ini
28770 nobody 16 0 0 0:00.76 0.4 79000 10m 6572 S lsphp4
28776 nobody 16 0 0 0:01.44 0.3 78504 10m 6756 S lsphp4
Usually, there are only 4-5 lsphp4 processes. :eek: :confused: I believe something is wrong with lsphp, why does it open so many connections for a simple task? lsphp is killing the disk I/O :( I hope this provides some clues to the problem.
anewday
04-20-2008, 12:53 PM
When I merged 5 consecutive posts (All are less than 30 letters). It got stuck again. This never happens with Apache. :mad:
2008-04-20 14:50:45.162 INFO [24.x.x.x:33055-0#APVH_forum.......com] Connection idle time: 31 while in state: 5 watching for event: 25,close!
2008-04-20 14:50:45.163 NOTICE [24.x.x.x:33055-0#APVH_forum.......com] Content len: 219, Request line:
POST /index.php?act=mod&CODE=postchoice&tact=merge&checked=1 HTTP/1.1
2008-04-20 14:50:45.302 INFO [24.x.x.x:33055-0#APVH_forum......com] HttpExtConnector state: 8, request body sent: 219, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.
mistwang
04-20-2008, 06:03 PM
I think your MySQL is the bottle neck which causes high I/O wait, and stall all PHP processes, which causes LSWS to start more PHP processes.
Maybe merging posts will cause MySQL to block and takes long time to process?
Please post your lsphp configuration for reviewing.
anewday
04-20-2008, 06:49 PM
my.cnf
[mysqld]
max_connections=500
max_user_connections=300
skip-locking
skip-innodb
query_cache_limit=2M
query_cache_size=128M
query_cache_type=1
interactive_timeout=60
wait_timeout=60
connect_timeout=6
thread_cache_size=128
key_buffer_size=640M
max_allowed_packet=16M
table_cache=2048
join_buffer=1M
sort_buffer=2M
record_buffer=1M
sort_buffer_size=2M
read_buffer_size=2M
safe-show-database
max_connect_errors=10
thread_concurrency=2
myisam_sort_buffer_size=64M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
#log_slow_queries=/var/log/mysql/mysql_slow-queries.log
long_query_time=3
tmp_table_size=32M
max_heap_table_size=32M
#log-queries-not-using-indexes=/var/log/mysql/mysql_queries_no-indexes.log
join_buffer_size=2M
low_priority_updates=1
read_rnd_buffer_size=2M
[mysql.server]
user=mysql
basedir=/var/lib
[safe_mysqld]
err-log=/var/log/mysqld.log
var/lib/mysql/mysql.pid
open_files_limit=8192
[mysqldump]
quick
max_allowed_packet=16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer=32M
sort_buffer=32M
read_buffer=16M
write_buffer=16M
[myisamchk]
key_buffer=32M
sort_buffer=32M
read_buffer=16M
write_buffer=16M
[mysqlhotcopy]
interactive-timeout
eaccelearator settings in php.ini
extension_dir = "/usr/local/lsws/lsphp4/lib/php/extensions/no-debug-non-zts-20020429"
extension="eaccelerator.so"
eaccelerator.shm_size="64"
eaccelerator.cache_dir="/tmp/lswseaccelerator"
eaccelerator.enable="1"
eaccelerator.optimizer="1"
eaccelerator.check_mtime="0"
eaccelerator.debug="0"
eaccelerator.filter=""
eaccelerator.shm_max="0"
eaccelerator.shm_ttl="86400"
eaccelerator.shm_prune_period="7200"
eaccelerator.shm_only="0"
eaccelerator.compress="1"
eaccelerator.compress_level="9"
eaccelerator.keys="shm"
eaccelerator.sessions="shm"
eaccelerator.content="shm"
lsphp config (you set these last time with the exception of max process time)
http://img153.imageshack.us/img153/6334/lsphp4zd3.th.png (http://img153.imageshack.us/my.php?image=lsphp4zd3.png)
I have switched back to Apache now to see how things will go. The weird thing is this issue doesn't happen before I used litespeed. Anything else I can provide to help? Thanks George.
anewday
04-21-2008, 09:20 AM
A member said it happens with regular Add Reply as well. My site serves large multimedia files. But then still cannot explain why it causes iowait to jump to 40% when the server is 95% idle at 3 in the morning....
anewday
04-21-2008, 10:24 AM
I checked the load graphs and noticed that it spiked up to 15 at around 8AM EST.
http://img368.imageshack.us/img368/1807/loaddayyt0.png
Traffic during that time was only 2.5Mbit/s and the number of processes jumped from an average of 120 to over 160. I bet those are lsphp4 processes when many members are writing to the forums. I really believe there is a bug with litespeed. :(
mistwang
04-21-2008, 10:39 AM
Maybe there was some I/O intensive tasks like backup DB or some other files. Please check your graph history see if it happen every day around the same time.
LSWS itself alone will not cause high I/O wait.
More lsphp4 processes is the result of high I/O wait, not the cause. You need to find out which process causes high I/O wait, pay special attention to your MySQL and backup scripts.
anewday
04-21-2008, 10:48 AM
Maybe there was some I/O intensive tasks like backup DB or some other files. Please check your graph history see if it happen every day around the same time.
LSWS itself alone will not cause high I/O wait.
More lsphp4 processes is the result of high I/O wait, not the cause. You need to find out which process causes high I/O wait, pay special attention to your MySQL and backup scripts.
There are no I/O intensive tasks running at that time, cpanel backup and database backup scripts runs at 1AM daily. As I said earlier, with Apache, I never had an issue like this.
I'll send the http headers to you by PM.
mistwang
04-21-2008, 10:58 AM
Are your server experiencing similar high load during the backup time, there are two spikes in your graph at night time.
You need to check your server when the I/O wait is high, pay attention to "SWAP" memory being used, if for some reason the server start to use swap, you need to find out which process causes it.
And you can reduce "Max conn" for lsphp to avoid starting too many lsphp processes. when your server is short of memory, running more lsphp will only make it worse.
Another suggestion is to run Apache for a day or two and compare the load graph with that of LSWS.
anewday
04-21-2008, 11:08 AM
Are your server experiencing similar high load during the backup time, there are two spikes in your graph at night time.
You need to check your server when the I/O wait is high, pay attention to "SWAP" memory being used, if for some reason the server start to use swap, you need to find out which process causes it.
And you can reduce "Max conn" for lsphp to avoid starting too many lsphp processes. when your server is short of memory, running more lsphp will only make it worse.
Another suggestion is to run Apache for a day or two and compare the load graph with that of LSWS.
Load during backups is 2-3 only, doesn't really slow down browsing. The other spike during night time is when server copies the database to NAS, only takes a few minutes to complete.
According to the memory graphs, it has over 2GB of cached memory all the time.
http://img291.imageshack.us/img291/1249/memdayvd1.th.png (http://img291.imageshack.us/my.php?image=memdayvd1.png)
mistwang
04-21-2008, 11:21 AM
If your DB file is larger than 2GB, it will pretty much invalidate the whole 2GB cache memory, 2GB cache memory can guarantee that server do not have a disk I/O problem.
You need to find out what causes the high I/O wait. And if you think Apache does not have this problem, give it a try and see what you get. We are interested in the result as well.
anewday
04-21-2008, 11:34 AM
The database is around 900Mb only. I will switch back to Apache later to test. It performed fine during the 3 busiest months of last year. This year isn't nearly as active.
anewday
04-21-2008, 08:29 PM
I just discovered the nFLT field in top and its value is 463. Isn't it rather high?
PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR nFLT S COMMAND
1999 nobody 15 0 0 0:46.20 0.2 8552 5700 1428 463 S lshttpd
mistwang
04-22-2008, 07:55 AM
If you stream media file through PHP, it may have this problem. should use internal redirect instead.
You can try increasing the max io buffer size.
http://www.litespeedtech.com/docs/webserver/config/general/#inMemBufSize
anewday
04-22-2008, 10:19 AM
Ok, I don't stream any media files on the server. Max I/O is set to 150M, it should be enough for a forum withi media downloads? The ethernet port is 100Mbit. :)
mistwang
04-23-2008, 09:22 AM
I/O Buffer is only for PHP request/response, won't affect media files, unless the media files were sent through PHP. Does it work better with larger I/O buffer today?
anewday
04-23-2008, 05:08 PM
Can't notice any difference I have tried 150M and 200M. Another observation is that posting messages for the first time doesn't cause any high iowait isues. It's just subsequent edits that do. Very strange...
mistwang
04-23-2008, 05:18 PM
I think probably know what is going on. LSWS use on-disk buffer to store the body of large post requests. this will cause high I/O wait.
The solution is to change the swapping directory to use /dev/shm instead of /tmp/lshttpd/swap.
http://www.litespeedtech.com/docs/webserver/config/general/#swappingDir
anewday
04-23-2008, 05:48 PM
I think probably know what is going on. LSWS use on-disk buffer to store the body of large post requests. this will cause high I/O wait.
The solution is to change the swapping directory to use /dev/shm instead of /tmp/lshttpd/swap.
http://www.litespeedtech.com/docs/webserver/config/general/#swappingDir
Ok, I will try it later. :) But some posts I edit contain less than 10 letters.
furimedia
04-23-2008, 11:42 PM
I think I also have this problem.
My server was always usually less than 1 but now it just averages 1 and higher.
What does setting to /dev/shm do? I am guessing it will uses System RAM instead of accessing disk and if this is the case, shouldn't using /dev/shm be the default behaviour for maximum performance?
anewday
04-24-2008, 09:50 AM
That's what I want to know too. Also, if Memory I/O Buffer is not full (as in my case) then how would setting the path to /dev/shm help since it doesn't have a need for it. :confused: Please clarify.
mistwang
04-24-2008, 02:15 PM
We use on disk buffer mainly for buffering the large request body for those site dealing with large media files, let a 600MB media file fill up the memory is not a good idea, and there could be 20 upload sessions at the same time.
Let's see what we can do about it. At least we can set the threshold higher when there are enough free memory.
anewday
04-24-2008, 04:37 PM
George, so setting the swapping directory to /dev/shm is useless since my Memory I/O Buffer is not full, right? I don't stream any media files by php, either.
mistwang
04-24-2008, 04:40 PM
No, it should help. as current LSWS release use 1MB as the threshold on whether to use the on disk buffer.
When a request has a request body larger than the threshold, the on disk buffer will be used no matter the memory I/O buffer is full or not.
We will change it to 10MB in 3.3.11 release.
anewday
04-24-2008, 04:50 PM
Ok, sounds good. Should include the threshold value on the tooltip in web console. Maybe even allow user to define it. :) Thanks!
miracle
04-24-2008, 11:02 PM
10MB is more than enough in most cases? Why not make LSWS uses as much memory as possible, instead of use disk buffer?
anewday
04-25-2008, 10:57 PM
Quick/regular edit and moving topics still causes high iowait occassionally in 3.3.11 Do you think it's good to disable mysql persistent connections in php.ini ?
anewday
04-26-2008, 09:00 AM
2008-04-26 10:52:22.678 INFO [AdminPHP] pid list size: 1
2008-04-26 10:52:22.678 INFO [AdminPHP] add child process pid: 7016
This shows up in the error_log file whenever the load spikes due to iowait. hmmmm.
How to lessen those on the error_log file?
mistwang
04-26-2008, 12:56 PM
2008-04-26 10:52:22.678 INFO [AdminPHP] pid list size: 1
2008-04-26 10:52:22.678 INFO [AdminPHP] add child process pid: 7016This shows up in the error_log file whenever the load spikes due to iowait. hmmmm.
How to lessen those on the error_log file?
That's just because you are using the Admin console.
mistwang
04-26-2008, 12:56 PM
Quick/regular edit and moving topics still causes high iowait occassionally in 3.3.11 Do you think it's good to disable mysql persistent connections in php.ini ?
does not matter.
anewday
04-26-2008, 05:50 PM
That's just because you are using the Admin console.
I wasn't using the Admin Console during the timestamps though.