lsapi processes not being used, build up to > max_connections

Discussion in 'Bug Reports' started by fantasydreaming, Nov 20, 2006.

  1. fantasydreaming

    fantasydreaming New Member

    Even more alarmingly, the real-time report shows all 10 of my lsapi processes as 'in use' (on a fairly busy server), but when I strace them, they're all sitting in select(11...):

    Scope Type Name Max CONN Eff Max Pool In Use Idle WaitQ Req/Sec
    ap LSAPI Rails:ap:/ 10 10 10 10 0 8 14

    Scope Type Name Max CONN Eff Max Pool In Use Idle WaitQ Req/Sec
    ap LSAPI Rails:ap:/ 10 10 10 10 0 27 0

    From what I can tell with strace, only one of the lsapi processes looks like it's actually doing anything.

    Additionally, maxconns is 10, but there are 22 running:
    [root@rhyme data]# ps auxww | grep RailsRunner | wc -l
    22

    Any ideas? It's generally very fast, but requests can back up and slow down during busy times. I recently upgraded from lighttpd to LiteSpeed and am overall very happy, but this has me worried.

    MySQL shows no slow queries, and server load is only 0.4.

    Thank you,
    Kevin
  2. fantasydreaming

    fantasydreaming New Member

    I should add that one of them seems stuck in nanosleep() rather than the expected select():

    nanosleep({0, 10000000}, NULL) = 0
    nanosleep({0, 10000000}, NULL) = 0
    nanosleep({0, 10000000}, NULL) = 0
    nanosleep({0, 10000000}, NULL) = 0
    nanosleep({0, 10000000}, NULL) = 0
  3. fantasydreaming

    fantasydreaming New Member

    Two other kinds of sleep behavior:

    Fresh processes (this still looks right):
    select(1, [0], NULL, NULL, {0, 433000}) = 0 (Timeout)
    kill(27359, SIG_0) = 0
    select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
    kill(27359, SIG_0) = 0
    select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)



    Another one; I'm not sure what this is, but perhaps it's just something inside my application. There's more gettimeofday() going on:

    gettimeofday({1164067094, 613573}, NULL) = 0
    gettimeofday({1164067094, 613617}, NULL) = 0
    select(8, [3 7], [], [], {0, 999956}) = 0 (Timeout)
    gettimeofday({1164067095, 612726}, NULL) = 0
    select(8, [3 7], [], [], {0, 847}) = 0 (Timeout)
    gettimeofday({1164067095, 613680}, NULL) = 0
    select(8, [3 7], [], [], {0, 0}) = 0 (Timeout)
    kill(27360, SIG_0) = 0
    gettimeofday({1164067095, 613777}, NULL) = 0
    gettimeofday({1164067095, 613805}, NULL) = 0
    select(8, [3 7], [], [], {0, 999971} <unfinished ...>

    When I did an lswsctrl restart before, it wouldn't kill off the lsapi processes stuck in select() without the kill() every few msec. They must be somewhat crashed, but the 'max workers' checker isn't picking up on it, and I have to kill -5 them to ever make them go away.

    It doesn't seem to happen for a while after starting the server. Restarting the server fairly early in its life results in all the processes being killed and re-created as expected.
  4. mistwang

    mistwang LiteSpeed Staff

    Is the server doing OK now?
  5. fantasydreaming

    fantasydreaming New Member

    nope

    They apparently all crashed this morning, resulting in some short downtime.

    My server error log has things like this in it, though I think they're normal:

    2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Connection idle time: 16 while in state: 5 watching for event: 25,close!
    2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Content len: 0, Request line:
    GET /poem/add HTTP/1.1
    2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] Redirect: #1, URL: /dispatch.cgi
    2006-11-22 17:58:35.754 [INFO] [218.185.94.226:16092-0#ap] HttpExtConnector state: 8, request body sent: 0, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.
    2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Connection idle time: 16 while in state: 5 watching for event: 25,close!
    2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Content len: 1555097, Request line:
    POST /user/face HTTP/1.1
    2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] Redirect: #1, URL: /dispatch.cgi
    2006-11-22 17:58:45.954 [INFO] [72.75.105.178:60212-0#ap] HttpExtConnector state: 10, request body sent: 131072, response body size: 0, response body sent:0, left in buffer: 0, attempts: 0.

    Also, a 'graceful restart' drives the iowait load on my server crazy and pushes the load up to 30 - not sure if perhaps I have a bad SCSI cable or something, but it may have to do with the Rails processes stuck in the 'bad' select(). If I killall -5 ruby & lshttpd and then start the server, it's quick and painless... Likewise, a graceful restart fairly soon after a fresh start (i.e. no 'bad' select()s yet) seems to work fine.

    Note: I'm running this as a user other than 'nobody' - not sure if that could have an impact at all.

    Please let me know if there's any debug output or log details I can give you that would help!
  6. mistwang

    mistwang LiteSpeed Staff

    You need to find out what exactly causes the bad select(). I think it is in Ruby or your Rails app, not in the LSAPI code.

    You can try "lsof", or start "strace" at the beginning of a request.
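
    For example, one way to know which process to attach to is to log the worker PID when a request starts. This is just a sketch for a Rails app - the filter name and log message here are made up:

    # Sketch: log the worker PID at the start of each request so you know
    # which process to attach strace/lsof to. Filter name is hypothetical.
    class ApplicationController < ActionController::Base
      before_filter :log_worker_pid

      private

      def log_worker_pid
        logger.info "pid #{Process.pid} handling #{request.request_uri}"
      end
    end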

    Ruby always resumes a function call if it is interrupted (EINTR) by a signal, so sometimes it becomes very difficult to kill a Ruby process in the normal way.
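
    As a minimal illustration (a hypothetical worker loop; the socket and flag are made up), a handler that only sets a flag never gets to stop a call that blocks and is then silently restarted:

    # The TERM handler only sets a flag. If IO.select blocks indefinitely,
    # the interrupted call is restarted after the handler returns (EINTR
    # retry), so the flag is never rechecked and the process never exits.
    $shutdown = false
    trap("TERM") { $shutdown = true }

    loop do
      break if $shutdown
      IO.select([sock])   # sock stands in for the worker's listen socket
      # ... handle one request ...
    end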
  7. fantasydreaming

    fantasydreaming New Member

    This was caused by a search using the ruby-google module... apparently http-access and http-access2 can both get stuck waiting 'forever' for a response from Google (or likely anywhere else).

    The solution for me was to wrap it in a Timeout::timeout(5) do ... end block.
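
    Something like this (just a sketch - google_search stands in for the actual ruby-google call in my app):

    require 'timeout'

    # Cap the remote call at 5 seconds so a silent remote end can't leave
    # the LSAPI worker stuck in select() forever. google_search is a
    # hypothetical stand-in for the real ruby-google call.
    def fetch_results(query)
      Timeout::timeout(5) do
        google_search(query)
      end
    rescue Timeout::Error
      []    # degrade to no results instead of hanging the worker
    end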

    I was able to debug it really sweetly using gdb, following the instructions here: http://eigenclass.org/hiki.rb?ruby live process introspection
  8. mistwang

    mistwang LiteSpeed Staff

    Cool! I will add that to our Wiki. Thanks!
  9. fantasydreaming

    fantasydreaming New Member

    Glad I could help :)

    One other place where I had problems with hanging was wherever I was attempting to resolve IP addresses to DNS names... maybe it was just being glacial, but I'd sometimes get an 'execution timeout expired' error as well.
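
    The same kind of cap seems to work there too; a sketch using Resolv (the 2-second limit is arbitrary):

    require 'resolv'
    require 'timeout'

    # Reverse-resolve an IP with a hard cap so a slow or dead DNS server
    # can't hang the request; fall back to the raw IP on timeout/failure.
    def hostname_for(ip)
      Timeout::timeout(2) do
        Resolv.getname(ip)
      end
    rescue Timeout::Error, Resolv::ResolvError
      ip
    end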
