Premature end of response header, and 503 service errors (maybe related, maybe not)

fantasydreaming

Well-Known Member
#1
I'm getting tons of these, and I can't figure out why. Several per second when it's happening, which isn't always.

2006-12-07 20:19:27.370 [NOTICE] [136.182.158.153:61844-0#ap] Premature end of response header.
2006-12-07 20:19:27.709 [NOTICE] [24.171.94.109:2024-0#ap] Premature end of response header.
2006-12-07 20:19:27.822 [NOTICE] [69.246.14.128:2349-0#ap] Premature end of response header.

Also, when my rails application attempts to send a 404 header, litespeed sends a 503 header and doesn't log anything about why to it's stderr or error logfiles.

The logfile shows that a 200 response was sent - wierd!
67.171.73.40 - - [08/Dec/2006:11:07:26 -0600] "GET /asdfasdf2 HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060613 Camino/1.0.2"

Curl shows the 503 header
everybean:~/ap inspire$ curl --head http://allpoetry.com/asdfasdf2
HTTP/1.1 503 Service Unavailable
Server: LiteSpeed
Date: Fri, 08 Dec 2006 17:08:06 GMT
Connection: close

Lightppd at the localhost (sorry, rails' default), shows the proper header
everybean:~/ap inspire$ curl --head http://localhost:3000/asdfasdf2
HTTP/1.1 404 Not Found
Connection: close
Date: Fri, 08 Dec 2006 17:07:36 GMT
Content-Type: text/html; charset=iso-8859-1
Set-Cookie: _session_id=blah; path=/; expires=Fri, 05 Jan 2007 16:29:51 GMT
Cache-Control: no-cache
Expires: -1
Server: lighttpd/1.3.13

No core files, running 2.2.5 std, though the same happens on my other server running 2.2.6

Thanks for any help!
 

fantasydreaming

Well-Known Member
#2
While testing the 404 -> 503 problems, no "premature response" messages were generated in the log, so likely they're different problems, but you never know.
 

mistwang

LiteSpeed Staff
#3
Premature end of response header probably is caused by crashed ruby process, which did not send back the response header or just half of it.
For 404 -> 503 problem, I don't know why it happens, maybe something wrong with that the way a routing error was handled in your application. Create a test Rails application and give it try.
 

fantasydreaming

Well-Known Member
#4
1) Shouldn't crashed ruby processes show up in the error log?

2) The 404 is returned fine under lighttpd and webrick... it's only when litespeed is in the steps that it's a problem.

Exact same application, webrick @ port 3000, litespeed at port 80
everybean:~/storywrite inspire$ curl --head http://allpoetry.com:3000/testas3
HTTP/1.1 404 Not Found
Cache-Control: no-cache
Connection: Keep-Alive
Expires: -1
Date: Fri, 08 Dec 2006 22:24:25 GMT
Content-Type: text/html; charset=iso-8859-1
Server: WEBrick/1.3.1 (Ruby/1.8.5/2006-08-25)
Content-Length: 0
Set-Cookie: _session_id=bb2ed7dcf338e4dd1ccb730eb570b012; path=/; expires=Fri, 05 Jan 2007 22:24:05 GMT

everybean:~/storywrite inspire$ curl --head http://allpoetry.com/testas3
HTTP/1.1 503 Service Unavailable
Server: LiteSpeed
Date: Fri, 08 Dec 2006 22:24:34 GMT
Accept-Ranges: bytes
Connection: close
ETag: "dad-45799445-4ed154"
Last-Modified: Fri, 08 Dec 2006 16:35:17 GMT
Content-Type: text/html
Content-Length: 3501
 

mistwang

LiteSpeed Staff
#5
We need more information to figure out what exactly causing the problem.

Please try turn on LSWS debug logging by changing debug level to "high", post the section of error.log for the failed request.

Strace the ruby process. First send a good request to let the ruby process start, then do "strace -p <pid_of_ruby>", trace the process with larger PID. send a request result in 503 error, post the strace result.

check rails' log files.
 

fantasydreaming

Well-Known Member
#7
Rails renders the static 404 page with :status => 404... litespeed can handle you setting status codes, right? I don't see why not.

I'll check the high log level setting tomorrow morning, thanks
 

fantasydreaming

Well-Known Member
#8
Short answer: The rails log shows it successfully processed. The strace of the Railsrunner shows it trying and closing the socket 3 times, and the web server agrees. I think perhaps it's something wrong with the LSAPI hating it when yous end a Status: header with the result?

LITESPEED LOG

2006-12-09 09:50:06.521 [DEBUG] [127.0.0.1:64798-0#storywrite] New request:
Method=[HEAD], URI=[/testas4],
QueryString=[]
Content Length=0
REWRITE] Rule: Match '/testas4' with pattern '^.*$', result: 1
REWRITE] stat( /Users/inspire/Projects/storywrite/public/../../down.html ) failed
REWRITE] Rule: Match '/testas4' with pattern '^$', result: -1
REWRITE] Rule: Match '/testas4' with pattern '^([^.]+)$', result: 2
REWRITE] Source URI: '/testas4' => Result URI: '/testas4.html'
REWRITE] Rule: Match '/testas4.html' with pattern '^(.*)$', result: 2
REWRITE] stat( /Users/inspire/Projects/storywrite/public/testas4.html ) failed
REWRITE] Source URI: '/testas4.html' => Result URI: 'dispatch.cgi'
REWRITE] Last Rule, stop!
Find context with URI: [/dispatch.cgi], location: []
processContextPath() return 0
run lsapi processor.
request [127.0.0.1:64798-0#storywrite:lsapi] is assigned with connection!
[ExtConn] reconnect()
[ExtConn] connecting to [uds://tmp/lshttpd/storywrite:_.sock]...
ExtConn::eek:nWrite()
request header is done
ExtConn::continueRead()
Request body done!
ExtConn::suspendWrite()
processNewReq() return 0.
ExtConn::eek:nRead()
LsapiConn::doRead()
process packet header 8 bytes
process packet header 0 bytes
ExtConn::eek:nRead()
LsapiConn::doRead()
process packet header -1 bytes
connection to [uds://tmp/lshttpd/storywrite:_.sock] on request #0, error: Connection reset by peer!
[ExtConn] close()
No Request has been processed successfully through this connection, the maximum connections allowed will be reduced!
HttpExtConnector::tryRecover()...
trying to recover from connection problem, attempt: #1!
request [127.0.0.1:64798-0#storywrite:lsapi] is assigned with connection!
[ExtConn] reconnect()
[ExtConn] connecting to [uds://tmp/lshttpd/storywrite:_.sock]...
ExtConn::eek:nWrite()
request header is done
ExtConn::continueRead()
Request body done!
ExtConn::suspendWrite()
ExtConn::eek:nRead()
LsapiConn::doRead()
process packet header 8 bytes
process packet header 0 bytes
ExtConn::eek:nRead()
LsapiConn::doRead()
process packet header -1 bytes
connection to [uds://tmp/lshttpd/storywrite:_.sock] on request #0, error: Connection reset by peer!
[ExtConn] close()
No Request has been processed successfully through this connection, the maximum connections allowed will be reduced!
HttpExtConnector::tryRecover()...
trying to recover from connection problem, attempt: #2!
request [127.0.0.1:64798-0#storywrite:lsapi] is assigned with connection!
[ExtConn] reconnect()
[ExtConn] connecting to [uds://tmp/lshttpd/storywrite:_.sock]...
ExtConn::eek:nWrite()
request header is done
ExtConn::continueRead()
Request body done!
ExtConn::suspendWrite()
ExtConn::eek:nRead()
LsapiConn::doRead()
process packet header 8 bytes
process packet header 0 bytes
o such file or directory
o such file or directory
ExtConn::eek:nRead()
LsapiConn::doRead()
process packet header -1 bytes
connection to [uds://tmp/lshttpd/storywrite:_.sock] on request #0, error: Connection reset by peer!
[ExtConn] close()
No Request has been processed successfully through this connection, the maximum connections allowed will be reduced!
HttpExtConnector::tryRecover()...
HttpIOLink::continueWrite()...
write resumed!
HttpIOLink::handleEvents() events=4!
HttpIOLink::suspendWrite()...
HttpConnection::sendHttpError(),code=503 Service Unavailable
HttpConnection::flush()!
Written to client: 111
HttpConnection::nextRequest()!
HttpExtConnector::cleanUp() ...
abort request...
release ExtProcessor!
Non-KeepAlive, CLOSING!
Shutting down out-bound socket ...
HttpIOLink::handleEvents() events=17!
Close socket ...

******* STRACE OF RAILSRUNNER

select(1, [0], NULL, NULL, {1, 0}) = 1 (in [0], left {0, 525000})
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG|WSTOPPED, NULL) = 8839
wait4(-1, 0x7fbfffd454, WNOHANG|WSTOPPED, NULL) = 0
rt_sigreturn(0xffffffffffffffff) = 1
nanosleep({0, 10000}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
accept(0, {sa_family=AF_FILE, path="??<"}, [18051271735726047234]) = 3
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2a955619d0) = 9153
close(3) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
kill(8492, SIG_0) = 0
select(1, [0], NULL, NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG|WSTOPPED, NULL) = 9153
wait4(-1, 0x7fbfffd454, WNOHANG|WSTOPPED, NULL) = 0
rt_sigreturn(0xffffffffffffffff) = -1 EINTR (Interrupted system call)
select(1, [0], NULL, NULL, {0, 953000}) = 1 (in [0], left {0, 575000})
nanosleep({0, 10000}, 0) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG|WSTOPPED, NULL) = 8502
wait4(-1, 0x7fbfffd474, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff) = -1 EINTR (Interrupted system call)
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
accept(0, {sa_family=AF_FILE, path="??<"}, [18051271735726047234]) = 3
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2a955619d0) = 9154
close(3) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
kill(8492, SIG_0) = 0
select(1, [0], NULL, NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG|WSTOPPED, NULL) = 9154
wait4(-1, 0x7fbfffd454, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff) = -1 EINTR (Interrupted system call)
select(1, [0], NULL, NULL, {0, 934000}) = 0 (Timeout)

****** RAILS LOG ********


Processing UserController#show (for 67.171.73.40 at 2006-12-09 09:17:35) [HEAD]
Session ID: 0ca5e149498d890db933851b193e6421
Parameters: {"action"=>"show", "id"=>"testas6", "controller"=>"user"}
*application.rb:124 reloading vars, last was 2 hours ago
MemCache Get online
MemCache Get user/show/testas6
User Load (0.000910) SELECT * FROM users WHERE (name='testas6') LIMIT 1
Namechange Load (0.000369) SELECT * FROM ap.namechanges WHERE (start='testas6') ORDER BY created_at desc LIMIT 1
Rendering public/404.html (404)
MemCache Put session:0ca5e149498d890db933851b193e6421 expires 0
UserController: missing default helper path user_helper
MemCache Get session:bda40d81bf8501de7187f7beb6b71e3f
 

mistwang

LiteSpeed Staff
#9
The lsws log shows that, ruby just simply closed the connection, not sending anything back.

The ruby process you straced is the parent ruby process which does nothing but accept a connection and fork a child ruby process to handle the request. I really interested is what happened in the child process. I think it crashed, probably.

Is the rails log from litespeed or from other setup?

Are you setup your Rails app with the easy configuration? or step by step manually?
You are using rewrite rules to dispatch the request, have you tried using 404 handler?

Let me know your exact rails setup, I will try to duplicate it in our lab. :)
 
#10
I'm using the "easy" setup. I could try switching to the manual one... So it's 404 handler is automatically handling the ruby process? Maybe it's angry that I'm returning 404 again, so it tries to send it back to the 404 handler, which sends it there again ... etc.

Rails "suexec virtual host" template included, I turned off "restricted, changed the user/group to a new user, and added the rewrite rules:
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ dispatch.cgi [QSA,L]

How would I get the PID of the actual ruby process? Does it have the same name as "ruby /opt/lsws/fcgi-bin/RailsRunner.rb"? Each "process" really starts two of these? It thought it was the web server that handled the forking/etc...?
 

mistwang

LiteSpeed Staff
#11
I tried your configuration, proper 404 page was returned.

How would I get the PID of the actual ruby process? Does it have the same name as "ruby /opt/lsws/fcgi-bin/RailsRunner.rb"? Each "process" really starts two of these? It thought it was the web server that handled the forking/etc...?
LSWS only start one ruby process, this process will fork children process to handle all requests. This way, the startup cost of RoR framework is the lowest.

To get PID of the children process, send a normal request without resulting in 503 first, so the child process will be forked already, you should see two "ruby ...RailsRunner.rb" process at this moment, the child process should have a higher PID number. strace that process.

And check if the child process disappeared after processing the 404 request, it should not. if it does, means it crashed for some reason.
 
#12
Solved!

It appears that the problem was that I was being stupid (doh!) and using a relative url to render the 404 file, "../../public/404.html". Switching it to "#{RAILS_ROOT}/public/404.html" solved the problem.

The parent-fork thing must change the working directory, which caused the discrepancy between litespeed and other servers. It might be better to reset the cmd if possible, just to keep it consistent.

Stracing the proper pid let me see that it was getting the error, though i'm still not sure why 1) rails crashes like that, or 2) litespeed shows it as a 503 rather than 500. It's not a timeout, it happens right away...

write(1, "Rendering public/404.html (404)\n", 32) = 32
open("public/404.html", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
getcwd("/usr/local/bin", 200) = 15
getcwd("/usr/local/bin", 200) = 15
getcwd("/usr/local/bin", 200) = 15
open("public/404.html", O_RDONLY) = -1 ENOENT (No such file or directory)
write(4, "set aph:session:f802551cd64f1e21"..., 961) = 961
read(4, "STORED\r\n", 1024) = 8
gettimeofday({1165712051, 488544}, NULL) = 0
fstat(1, {st_mode=S_IFREG|0644, st_size=5343777, ...}) = 0
write(1, "MemCache Put session:f802551cd64"..., 64) = 64
rt_sigaction(SIGINT, {SIG_DFL}, {0x46b990, [INT], SA_RESTORER|SA_RESTART, 0x3cfa62e410}, 8) = 0
close(4) = 0
munmap(0x2a97b36000, 4096) = 0
close(4) = -1 EBADF (Bad file descriptor)
munmap(0x2a97b35000, 4096) = 0
poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0
setsockopt(5, SOL_SOCKET, SO_SNDTIMEO, "\2003\341\1\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
write(5, "\1\0\0\0\1", 5) = 5
shutdown(5, 2 /* send and receive */) = 0
close(5) = 0
exit_group(1) = ?
Process 13473 detached

Thanks again! You guys rock! Litespeed is the bestest project ever....
 

mistwang

LiteSpeed Staff
#13
Glad you find the problem, we will address the compatibility issue in next ruby-lsapi release for sure. :)

What exactly is the result of getcwd() under other environment? #{RAILS_ROOT}/public?
 
Top