LiteSpeed Technologies
Download Download     Blog Blog     Wiki Wiki     Forum Forum     Store     Contact Contact    

Go Back   LiteSpeed Support Forums > External Applications > Ruby/Rails > unusually high cpu usage by lshttpd

Reply
 
Thread Tools Display Modes
  #1  
Old 04-29-2007, 09:23 AM
babelian babelian is offline
Member
 
Join Date: Apr 2007
Posts: 15
Default unusually high cpu usage by lshttpd

i've been using litespeed for almost a year now on a rails/mongrel based site, but in the last few weeks i've been noticing cpu load spikes. up until today i thought it was to do with some changes in the ruby app, but today watching 'top' i've noticed that the lshttpd process occasionally spikes up to 40% of the cpu for a few minutes at a time. i can't tell what its doing (access/error/rails/debugging logs show little-to-no activity).

today (sunday, which should be quiet) the load average is spiking up to 3.5 for a few minutes at a time, and back down to a regular 0.40.

has anybody else experienced any like this? if i siege test the home page litespeed only takes up 15-20% of the cpu, so those ~40% spikes on a quiet day seem very strange.

thanks, zach
Reply With Quote
  #2  
Old 04-29-2007, 12:31 PM
mistwang mistwang is offline
LiteSpeed Staff
 
Join Date: May 2003
Location: New Jersey
Posts: 7,583
Have you upgrade to the latest release? Are you using epoll or poll?
We need more information to look into this issue, maybe you can strace the lshttpd process when you caught that.
Reply With Quote
  #3  
Old 04-29-2007, 03:33 PM
babelian babelian is offline
Member
 
Join Date: Apr 2007
Posts: 15
i'm using 2.2.6 at the moment, a friend had an unsmooth upgrade to 3.0 (upgrade failed and he scrubbed the directories and installed fresh) so i'm waiting a month as we're switching hosting providers then anyway.

i'm using poll, though if its recommend i can switch to epoll - i assume thats preferable (am on fedora 4).

thanks for the response
Reply With Quote
  #4  
Old 04-29-2007, 05:08 PM
babelian babelian is offline
Member
 
Join Date: Apr 2007
Posts: 15
attached is some strace output from lshttpd (also here http://www.arenaflowers.com:8080/strace_output.txt if the attachement didn't work). interestingly, when it runs it squishes lshttpd back down to about 3%. (it was at 35-40% before i started the trace, and would go back up again if i canceled it).

here's the top w/out the strace:

top - 01:03:06 up 28 days, 19:45, 2 users, load average: 1.75, 1.49, 1.19
Tasks: 111 total, 5 running, 105 sleeping, 0 stopped, 1 zombie
Cpu(s): 48.0% us, 27.1% sy, 0.0% ni, 13.1% id, 0.0% wa, 0.0% hi, 11.8% si
Mem: 2074912k total, 1924328k used, 150584k free, 39180k buffers
Swap: 2031608k total, 92k used, 2031516k free, 1052560k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19968 rails 16 0 97316 87m 4692 R 60.3 4.3 216:54.26 ruby /usr/lib/ruby/gems/1.8/gems/mongrel-1.0.1/bin/mongrel_rails start -d -e pro
19970 rails 16 0 67968 58m 4668 R 58.3 2.9 186:58.65 ruby /usr/lib/ruby/gems/1.8/gems/mongrel-1.0.1/bin/mongrel_rails start -d -e pro
27995 rails 15 0 6368 3760 1516 R 51.4 0.2 19:58.76 lshttpd
12954 postfix 15 0 7024 2432 2028 S 0.7 0.1 0:08.28 smtpd -n arenaflowers.com:25 -t inet -u -o smtpd_sasl_auth_enable yes
31287 root 15 0 178m 166m 2276 S 0.3 8.2 3:09.93 backgroundrb



i'm still not convinced its a litespeed problem so i'll try tracing the mongrels as well and see if i see anything. i noticed when i restarted the mongrel it all dropped back to normal, though i should do that a few times to make sure it's not just a coincidence.
Reply With Quote
  #5  
Old 05-03-2007, 12:06 PM
babelian babelian is offline
Member
 
Join Date: Apr 2007
Posts: 15
what seems to be happening is some sort of loop between mongrel and lightspeed. using strace i can repetitions of one file, for example

writev(4, [{"GET /flower_delivery/florists_in"..., 441}, {"X-Forwarded-For: 74.124.192.171\r"..., 35}], 2) = 476

in lightspeed

and a similar line saying read in mongrel:

read(218, "GET /flower_delivery/florists_in"..., 16384) = 476

the repetition goes on much longer than it would take to transfer the file (minutes), and if i gracefully restart lightspeed (or mongrel) everything returns to normal. there's nothing wrong in the ruby app (ie clicking on the same link to test won't cause a problem, and it happens links in various parts of the site, all of which use caches_page).

is the bad file descriptor the issue? see below:


full lightspeed loop block:

gettimeofday({1178219476, 926735}, NULL) = 0
select(216, [3 215], [], [], {0, 619176}) = 1 (in [3], left {0, 619176})
accept(3, {sa_family=AF_INET, sin_port=htons(36453), sin_addr=inet_addr("127.0.0.1")}, [16]) = 218
fcntl64(218, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(218, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f3c000
_llseek(218, 0, 0xbfe86bbc, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fcntl64(218, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(218, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f3b000
_llseek(218, 0, 0xbfe86bbc, SEEK_CUR) = -1 ESPIPE (Illegal seek)
setsockopt(218, SOL_TCP, TCP_CORK, [1], 4) = 0
gettimeofday({1178219476, 930155}, NULL) = 0
select(219, [215 218], [], [], {0, 0}) = 1 (in [218], left {0, 0})
gettimeofday({1178219476, 930767}, NULL) = 0
gettimeofday({1178219476, 931024}, NULL) = 0
select(216, [3 215], [], [], {0, 0}) = 0 (Timeout)
read(218, "GET /flower_delivery/florists_in"..., 16384) = 476
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(218) = 0
munmap(0xb7f3b000, 4096) = 0
close(218) = -1 EBADF (Bad file descriptor)
munmap(0xb7f3c000, 4096) = 0


full mongrel block:

connect(4, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=4, u64=4}}) = 0
time(NULL) = 1178219495
gettimeofday({1178219495, 538716}, NULL) = 0
epoll_wait(6, {{EPOLLOUT, {u32=4, u64=4}}}, 341, 100) = 1
getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
writev(4, [{"GET /flower_delivery/florists_in"..., 441}, {"X-Forwarded-For: 74.124.192.171\r"..., 35}], 2) = 476
epoll_ctl(6, EPOLL_CTL_MOD, 4, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=4, u64=4}}) = 0
time(NULL) = 1178219495
gettimeofday({1178219495, 540226}, NULL) = 0
epoll_wait(6, {{EPOLLIN, {u32=4, u64=4}}}, 341, 100) = 1
read(4, "", 1460) = 0
epoll_ctl(6, EPOLL_CTL_DEL, 4, {0, {u32=4, u64=4}}) = 0
close(4) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

Last edited by babelian; 05-03-2007 at 12:11 PM..
Reply With Quote
  #6  
Old 05-03-2007, 12:42 PM
mistwang mistwang is offline
LiteSpeed Staff
 
Join Date: May 2003
Location: New Jersey
Posts: 7,583
Looks like that a Mongrel instance has gone bad, it closes the connection without sending back anything, and LiteSpeed keep trying the bad Mongrel, so the end results are both lshttpd and mongrel uses a lot of CPU.
Please try the latest 3.1 release, it should be as stable as 2.2.6 if not better, if the problem is still there, we will investigate more.
Reply With Quote
  #7  
Old 09-14-2007, 06:27 AM
andreas andreas is offline
Senior Member
 
Join Date: Aug 2006
Posts: 91
I have exactly the same problem with 3.2.2. A few times per day the CPU usage of lsws and mongrel explodes for no apparent reason:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6985 www-data 16 0 33616 30m 2272 R 52.5 1.5 9:25.64 mongrel_rails
18228 www-data 15 0 10196 7772 2152 S 30.9 0.4 11:35.12 lshttpd

lsws is set up as proxy for mongrel, which is delivering static files (only a few per second). There is no excessive traffic to the app when it happens, and I can't reproduce the problem by hammering the server with requests. I think the problem started with 3.1, but I'm not sure.

A strace shows that lsws lseems to be sending the same request to mongrel again and again:
writev(14, [{"GET /math/722713bae093b63c9c5fd9"..., 705},...
...
read(12, "", 1460) = 0
close(12) = 0
Reply With Quote
  #8  
Old 09-14-2007, 06:33 AM
babelian babelian is offline
Member
 
Join Date: Apr 2007
Posts: 15
it was 2.2.6 that had the problem for me, and the problem went away with 3.1.1, but this was part of a migration to a completely different setup (two boxes, fresh installs of all packages etc) so it could easily have been something else. sorry i haven't got more to go on
Reply With Quote
  #9  
Old 10-29-2007, 10:44 AM
babelian babelian is offline
Member
 
Join Date: Apr 2007
Posts: 15
After 3-4 months on new servers, this problem has returned again. I'm pretty sure it has nothing with the apps code (definitely between mongrel and litespeed). While the problem is happening I can view the cached page in question fine in a browser, and all the mongrels seem to behave fine when pinged directly (or through LSWS).

I think possibly it might be to do with Google crawling and hitting pages at a certain speed (the pages that cause the problem are always SEO related pages). It happens any time of day/night, so it doesn't seem resource related.

I'm using the LSWS 3.2.4 and mongrel 1.0.1, so the upgrade hasn't helped. Any ideas on which settings I could tweak (Max keep alives etc)? I'm using EPOLL.
Reply With Quote
  #10  
Old 10-29-2007, 10:56 AM
mistwang mistwang is offline
LiteSpeed Staff
 
Join Date: May 2003
Location: New Jersey
Posts: 7,583
Strace output from the process with high CPU utilization should help identifying the problem. If LSWS keep sending the same request to mongrel without getting any valid response back, it probably a issue with mongrel. Have you tried restart mongrels when it heppens?

The better solution probably is to switch to our ruby lsapi.
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -7. The time now is 11:34 AM.



- Archive - Top
© Copyright 2003-2011 LiteSpeed Technologies, Inc. All rights reserved. Privacy Policy.