100% CPU after 4.1.1 upgrade from 4.0.20

Discussion in 'Bug Reports' started by henpa, Jul 6, 2011.

  1. henpa

    henpa New Member

    Hi, I've been running LiteSpeed for a long time and I don't remember the last time I had any problems with it.

    Yesterday I upgraded from 4.0.20 to 4.1.1 (for no special reason) and later I started noticing an oddity: sometimes the litespeed process (lshttpd) used 100% of the CPU.

    This happens when users use my service and when they try to upload a file, CPU of "litespeed" process goes to 100% and stays there....

    I switched back and forth from 4.0.20 to 4.1.1 and I can reproduce this with 4.1.1 only.

    I'm pretty sure it's a bug somewhere... but I haven't isolated the problem yet... this only happens when the user tries to upload a file (I tested with a 4MB file). I know I haven't done anything besides upgrading LiteSpeed.

    This web service is a Perl script using CGI::Fast, on a Debian 5 linux running 2.6.26.

    I did a "strace" to this process, and when the CPU goes to 100% it loops with messages like these:

    Code:
    10:45:23.817680 gettimeofday({1309959923, 817690}, NULL) = 0
    10:45:23.817732 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.817785 gettimeofday({1309959923, 817795}, NULL) = 0
    10:45:23.817836 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.817890 gettimeofday({1309959923, 817900}, NULL) = 0
    10:45:23.817941 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.817994 gettimeofday({1309959923, 818004}, NULL) = 0
    10:45:23.818046 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818099 gettimeofday({1309959923, 818109}, NULL) = 0
    10:45:23.818150 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818204 gettimeofday({1309959923, 818214}, NULL) = 0
    10:45:23.818255 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818308 gettimeofday({1309959923, 818318}, NULL) = 0
    10:45:23.818360 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818413 gettimeofday({1309959923, 818423}, NULL) = 0
    10:45:23.818464 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818518 gettimeofday({1309959923, 818528}, NULL) = 0
    10:45:23.818569 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818622 gettimeofday({1309959923, 818632}, NULL) = 0
    10:45:23.818674 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818727 gettimeofday({1309959923, 818737}, NULL) = 0
    10:45:23.818778 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818832 gettimeofday({1309959923, 818841}, NULL) = 0
    10:45:23.818883 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.818939 gettimeofday({1309959923, 818949}, NULL) = 0
    10:45:23.818991 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    10:45:23.819044 gettimeofday({1309959923, 819054}, NULL) = 0
    10:45:23.819096 epoll_wait(6, {{EPOLLOUT, {u32=24, u64=24}}}, 10, 100) = 1
    
    I switched back to 4.0.20. Let me know if I can help you with any other info!

    Regards!
  2. cmanns

    cmanns New Member

    time needed?

    noatime enabled?
  3. henpa

    henpa New Member

    William, thanks for your reply, but I didn't understand what you mean. Could you please rewrite in "user-friendly language"?

    Like I said, I only upgraded LiteSpeed from 4.0.20 to 4.1.1 from the admin web console. Is there an upgrade requirement that I am missing here?


    Thanks
  4. mistwang

    mistwang LiteSpeed Staff

  5. henpa

    henpa New Member

    hi mistwang, I never used this feature and I have both Outbound Bandwidth and Inbound Bandwidth set to 0.

    Let me know if you need any other info.


    Thanks
  6. mistwang

    mistwang LiteSpeed Staff

    Are you using a FastCGI script to handle the upload?
  7. henpa

    henpa New Member

    Yes I am! A Perl script with CGI::Fast module.
  8. mistwang

    mistwang LiteSpeed Staff

    can you please reproduce the bug with debug logging enabled?
    You can search related log entries by your IP address, please send the log to bug@litespeedtech...
  9. henpa

    henpa New Member

    mistwang, I will do this tonight when users are not using my server ok? Thanks!
  10. mistwang

    mistwang LiteSpeed Staff

    sure. no problem.
  11. IrPr

    IrPr New Member

    I'm facing the same issue on 4.1.1 and 4.1.2 both, no anything special found on error log
    No any CGI application, no any Output/Inout BW limitation, just simple lsphp5 processes are running
    AIO, epoll and noatime (fstab) are used on the box


    here is top head output:
    top - 00:19:02 up 3 days, 11:21, 5 users, load average: 185.35, 175.61, 102.62
    Tasks: 1153 total, 160 running, 993 sleeping, 0 stopped, 0 zombie
    Cpu(s): 21.3%us, 77.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
    Mem: 66007536k total, 45632468k used, 20375068k free, 2394380k buffers
    Swap: 2096472k total, 0k used, 2096472k free, 30505516k cached

    look at sys CPU usage, lshttpd is eating all CPU power

    I'll switch to 4.0 and seeing, but any idea on this?
  12. mistwang

    mistwang LiteSpeed Staff

    there is a bug in mod_security implementation may cause high CPU usage after lshttpd run for a while. Please download and upgrade to the latest 4.1.2 build.
    If it happen again, please try "strace -tt -T -p <pid_of_lshttpd_using_100%_cpu>"
  13. IrPr

    IrPr New Member

    upgraded to latest 4.1.12 and it happened again

    strace is very ugly but i found these bellow lines there:

    Code:
    00:56:46.039465 epoll_ctl(7, EPOLL_CTL_MOD, 3, {EPOLLERR|EPOLLHUP, {u32=0, u64=12884901888}}) = 0 <0.005406>
    00:56:46.046294 epoll_ctl(7, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=12884901888}}) = 0 <0.000630>
    00:56:46.049874 recvfrom(28, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000928>
    00:56:46.051698 epoll_ctl(7, EPOLL_CTL_MOD, 28, {EPOLLERR|EPOLLHUP, {u32=0, u64=120259084288}}) = 0 <0.001321>
    00:56:46.053730 epoll_ctl(7, EPOLL_CTL_MOD, 28, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=120259084288}}) = 0 <0.003656>
    00:56:46.059891 recvfrom(41, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.003208>
    00:56:46.063269 epoll_ctl(7, EPOLL_CTL_MOD, 41, {EPOLLERR|EPOLLHUP, {u32=0, u64=176093659136}}) = 0 <0.002374>
    00:56:46.067189 epoll_ctl(7, EPOLL_CTL_MOD, 41, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=176093659136}}) = 0 <0.000409>
    00:56:46.068408 recvfrom(43, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000439>
    00:56:46.070474 epoll_ctl(7, EPOLL_CTL_MOD, 74, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=317827579904}}) = 0 <0.002435>
    00:56:46.132436 recvfrom(98, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000715>
    00:56:46.133559 epoll_ctl(7, EPOLL_CTL_MOD, 98, {EPOLLERR|EPOLLHUP, {u32=0, u64=420906795008}}) = 0 <0.002848>
    00:56:46.136911 epoll_ctl(7, EPOLL_CTL_MOD, 98, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=420906795008}}) = 0 <0.000948>
    00:56:46.138793 epoll_ctl(7, EPOLL_CTL_MOD, 101, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=433791696896}}) = 0 <0.000544>
    00:56:46.140099 epoll_ctl(7, EPOLL_CTL_MOD, 101, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=433791696896}}) = 0 <0.000976>
    00:56:46.141473 epoll_ctl(7, EPOLL_CTL_MOD, 105, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=450971566080}}) = 0 <0.000787>
    00:56:46.143161 recvfrom(111, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.002742>
    00:56:46.148236 epoll_ctl(7, EPOLL_CTL_MOD, 111, {EPOLLERR|EPOLLHUP, {u32=0, u64=476741369856}}) = 0 <0.001239>
    00:56:46.151945 epoll_ctl(7, EPOLL_CTL_MOD, 111, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=476741369856}}) = 0 <0.001868>
    00:56:46.185697 recvfrom(112, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.004630>
    00:56:46.191481 epoll_ctl(7, EPOLL_CTL_MOD, 112, {EPOLLERR|EPOLLHUP, {u32=0, u64=481036337152}}) = 0 <0.003293>
    00:56:46.195478 epoll_ctl(7, EPOLL_CTL_MOD, 112, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=481036337152}}) = 0 <0.000903>
    00:56:46.198602 epoll_ctl(7, EPOLL_CTL_MOD, 116, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=498216206336}}) = 0 <0.001762>
    00:56:46.201337 epoll_ctl(7, EPOLL_CTL_MOD, 116, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=498216206336}}) = 0 <0.002027>
    00:56:46.204415 epoll_ctl(7, EPOLL_CTL_MOD, 117, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=502511173632}}) = 0 <0.000342>
    00:56:46.205935 epoll_ctl(7, EPOLL_CTL_MOD, 117, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=502511173632}}) = 0 <0.002864>
    00:56:46.209402 recvfrom(118, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000978>
    00:56:46.212161 epoll_ctl(7, EPOLL_CTL_MOD, 118, {EPOLLERR|EPOLLHUP, {u32=0, u64=506806140928}}) = 0 <0.000267>
    00:56:46.215712 epoll_ctl(7, EPOLL_CTL_MOD, 118, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=506806140928}}) = 0 <0.000114>
    00:56:46.216778 recvfrom(122, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000940>
    00:56:46.218182 epoll_ctl(7, EPOLL_CTL_MOD, 122, {EPOLLERR|EPOLLHUP, {u32=0, u64=523986010112}}) = 0 <0.006464>
    00:56:46.225484 epoll_ctl(7, EPOLL_CTL_MOD, 122, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=523986010112}}) = 0 <0.000672>
    00:56:46.251877 recvfrom(126, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000898>
    00:56:46.253708 epoll_ctl(7, EPOLL_CTL_MOD, 126, {EPOLLERR|EPOLLHUP, {u32=0, u64=541165879296}}) = 0 <0.003339>
    00:56:46.257553 epoll_ctl(7, EPOLL_CTL_MOD, 126, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=541165879296}}) = 0 <0.000904>
    00:56:46.258900 recvfrom(143, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000041>
    00:56:46.260824 epoll_ctl(7, EPOLL_CTL_MOD, 143, {EPOLLERR|EPOLLHUP, {u32=0, u64=614180323328}}) = 0 <0.003327>
    00:56:46.266498 epoll_ctl(7, EPOLL_CTL_MOD, 143, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=614180323328}}) = 0 <0.000084>
    00:56:46.270110 recvfrom(146, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.025589>
    00:56:46.296781 epoll_ctl(7, EPOLL_CTL_MOD, 146, {EPOLLERR|EPOLLHUP, {u32=0, u64=627065225216}}) = 0 <0.097823>
    00:56:46.395563 epoll_ctl(7, EPOLL_CTL_MOD, 146, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=0, u64=627065225216}}) = 0 <0.000030>
    00:56:46.395680 epoll_ctl(7, EPOLL_CTL_MOD, 151, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=648540061696}}) = 0 <0.000009>
    00:56:46.395724 epoll_ctl(7, EPOLL_CTL_MOD, 151, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=0, u64=648540061696}}) = 0 <0.000005>
    00:56:46.395765 recvfrom(155, 0x7fff11b5d05f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
    00:56:46.395812 epoll_ctl(7, EPOLL_CTL_MOD, 155, {EPOLLERR|EPOLLHUP, {u32=0, u64=6657<0.000005>
    
  14. mistwang

    mistwang LiteSpeed Staff

    Can also do "lsof -p <pid_of_100%_cpu>", then

    kill -USR2 -p <pid_of_100%_cpu>

    to turn on debug logging dynamically, Log Level must be set to "DEBUG", and "Debug Level" "NONE".

    If it is too big, please send the log file to bug@litespeedtech...
  15. IrPr

    IrPr New Member

    Unfortunately its a production server and i couldn't risk to switch back into 4.1
    but there are other servers which may this problem occur in them, I will update with the result of commands if i possible

    Hope henpa provide more information at the night to investigate this odd issue
  16. mistwang

    mistwang LiteSpeed Staff

    No problem. this type of issue usually used up all available idle CPU cycles on one core, by spinning in the event loop, maybe a little bit sluggish, but should not block a request.
  17. henpa

    henpa New Member

    Hi mistwang, I'm sorry I didn't get the info that you've asked yet, but it is also a production server and I didn't have the opportunity yet, but I will eventually sometime in the next few days when the usage is low.

    Thanks for your help so far and sorry for not writing here again for a few days.
  18. IrPr

    IrPr New Member

    I still experience this issue on busy servers but rarely
    is it addressed/fixed yet?
  19. andreas

    andreas New Member

    Had the 100% CPU problem with the latest 4.1.2 today. Restarting didn't fix it, but after killing and deleting /tmp/lshttpd it seems to work for now.

    strace:
    Code:
    gettimeofday({1312205537, 345598}, NULL) = 0
    epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}, {EPOLLIN, {u32=9, u64=9}}}, 10, 100) = 2
    gettimeofday({1312205537, 345644}, NULL) = 0
    epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}, {EPOLLIN, {u32=9, u64=9}}}, 10, 100) = 2
    gettimeofday({1312205537, 345690}, NULL) = 0
    epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}, {EPOLLIN, {u32=9, u64=9}}}, 10, 100) = 2
    gettimeofday({1312205537, 345736}, NULL) = 0
    epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}, {EPOLLIN, {u32=9, u64=9}}}, 10, 100) = 2
    gettimeofday({1312205537, 345782}, NULL) = 0
    epoll_wait(6, {{EPOLLIN, {u32=7, u64=7}}, {EPOLLIN, {u32=9, u64=9}}}, 10, 100) = 2
    
  20. andreas

    andreas New Member

    Keeps happening... back to 4.0.20, once again.

Share This Page