High Traffic vBulletin forum with 503 Errors

nolnet

Active Member
#1
I've spent the last month trying to debug an issue with a very large vBulletin forum with around 3,000 concurrent users. It runs on a Dell enterprise-class server with 8 cores and 8x 15K drives. Server load and page load times are fine.

The issue is constant 503 errors. I've hired third parties to review the issue, including some well-known folks on this forum. I think Gary also took a quick look.

So far we have tried:
  1. Disabling and tweaking eAccelerator
  2. Disabling and tweaking the ionCube loaders
  3. Increasing the PHP5 memory limit from 256MB to 800MB
  4. Recompiling Apache several times

We tailed the error_log, access_log and domlogs for days during this whole process and found that a 400 error seems to be the cause of the issue when hitting some of the links on the site, if that makes any sense.
This 400 error is generated when a forum link is hit.

*********
XX.XXX.254.XXX - - [16/Feb/2010:14:42:41 -0600] "GET /forums/showthread.php?t=1185464 HTTP/1.1" 503 400 "http://xxxxx.com/forums/forumdisplay.php?f=40" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7"
XX.XXX.254.XXX - - [16/Feb/2010:17:06:13 -0600] "GET /forums/showthread.php?t=1185550 HTTP/1.1" 503 400 "http://www.xxxxx.com/" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7"
XX.XXX.254.XXX - - [16/Feb/2010:17:07:22 -0600] "GET /forums/showthread.php?t=1185573 HTTP/1.1" 503 400 "http://xxxxx.com/forums/forumdisplay.php?f=40" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7"
*********

Needless to say we've run out of ideas and help is appreciated!
 

mistwang

LiteSpeed Staff
#2
In the access log, 503 is the status code and 400 is the size of the response body returned with the 503 error.
The 503 error is caused by a PHP crash. Please take a closer look at the error log and stderr.log for messages related to the 503 error.
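To make the two fields easier to tell apart, here is a minimal Python sketch that splits one of the quoted access log lines into status code and body size. It assumes the log is in the usual Apache "combined" format, which is what the quoted lines look like; the names `LOG_RE` and `parse_line` are made up for this example.

```python
import re

# Apache "combined" log format (assumed from the lines quoted above):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return (status, body_size, request) for one log line, or None."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    # A "-" in the size column means no body was sent.
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return int(m.group("status")), size, m.group("request")

sample = ('XX.XXX.254.XXX - - [16/Feb/2010:14:42:41 -0600] '
          '"GET /forums/showthread.php?t=1185464 HTTP/1.1" 503 400 '
          '"http://xxxxx.com/forums/forumdisplay.php?f=40" '
          '"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.7) '
          'Gecko/20091221 Firefox/3.5.7"')

print(parse_line(sample))
# (503, 400, 'GET /forums/showthread.php?t=1185464 HTTP/1.1')
```

So the "400" in those lines is just the byte count of the 503 error page, not a separate HTTP 400 response.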

Please post your lsphp5 external app configuration.

Have you tried a different PHP version?

It is easier to troubleshoot if the 503 consistently happens with a certain URL; if it is random, it is hard to debug.
 

nolnet

Active Member
#3
The 503s appear to be generated when using forumdisplay.php, search.php and showthread.php... there is no apparent pattern as to which URLs are most frequent.

I have attached a PDF with the lsphp5 configuration.

Thanks!
 


nolnet

Active Member
#10
Well darn, they still occur.

############
xx.201.xxx.xx - - [18/Feb/2010:15:57:57 -0600] "GET /forums/showthread.php?t=1186180 HTTP/1.1" 503 400 "http://www.xxxxx.com/" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100214 Ubuntu/9.10 (karmic) Firefox/3.5.8"
root@xxxxxx [~]# date
Thu Feb 18 16:15:21 CST 2010
############
 
#11
It is best to record the 503 error rate by grepping error.log.
E.g., count how many 503 errors occur in a given period (1 hour, 12 hours, etc.) before and after adjusting the LSWS settings, so we can see whether there is any improvement.
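One way to get comparable numbers is to bucket requests by hour and count the 503s in each bucket. The sketch below is only an illustration: it works on access log lines rather than error.log, assumes the same combined log format as the lines quoted earlier in the thread, and counts only requests whose path ends in `.php`. The function name and the three sample lines are made up.

```python
import re
from collections import Counter

# Capture the hour bucket (dd/Mon/yyyy:HH), the request path and the status.
LINE_RE = re.compile(
    r'\[(?P<hour>\d{2}/\w{3}/\d{4}:\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def rate_503_by_hour(lines):
    """Map hour -> (php_requests, count_503), ignoring non-PHP requests."""
    totals, errors = Counter(), Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        path = m.group("path").split("?", 1)[0]  # drop the query string
        if not path.endswith(".php"):
            continue
        totals[m.group("hour")] += 1
        if m.group("status") == "503":
            errors[m.group("hour")] += 1
    return {hour: (totals[hour], errors[hour]) for hour in totals}

# Made-up sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [18/Feb/2010:15:57:57 -0600] "GET /forums/showthread.php?t=1 HTTP/1.1" 503 400 "-" "-"',
    '1.2.3.4 - - [18/Feb/2010:15:58:01 -0600] "GET /forums/index.php HTTP/1.1" 200 5120 "-" "-"',
    '1.2.3.4 - - [18/Feb/2010:16:02:10 -0600] "GET /forums/clear.gif HTTP/1.1" 200 43 "-" "-"',
]

for hour, (total, bad) in rate_503_by_hour(sample).items():
    print(hour, total, bad, f"{bad / total:.4%}")
```

Running it over the same log before and after a settings change gives per-hour rates that can be compared directly, instead of raw counts that depend on traffic.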
 

nolnet

Active Member
#12
From 18/Feb/2010:21:21:28 to 19/Feb/2010:09:20:38 (12 hours), there were 2,973,419 dynamic (PHP file) requests to the server. 76 of these returned a 503 error (about 0.0026%).
 

mistwang

LiteSpeed Staff
#13
That's caused by random crashes in PHP. There is no good solution other than trying a different PHP version, turning off eAccelerator, etc.
As far as I know, PHP 5.2.12 + eAccelerator is not that stable.
Another thing you can try is to let lsphp5 dump a core file, so you can get an idea of what the problem in PHP is and file a bug report with the PHP group.
To do that, you need to set the "LSAPI_ALLOW_CORE_DUMP" env for the lsphp5 external app:
http://www.litespeedtech.com/php-litespeed-sapi.html

Run "ulimit -c unlimited" from the command line, then stop/start LSWS from the command line.
You may also need to make sure the user that the PHP process runs as has permission to write to the directory holding the PHP script.
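Before restarting LSWS it can help to confirm that the shell really has core dumps enabled. The following is a rough sketch, not an official LiteSpeed tool: it reads the core-size limit of the current process with Python's standard `resource` module and, on Linux, the kernel's `core_pattern`. The function name `core_dump_status` is made up. Run it from the same shell that will start LSWS, since child processes inherit that shell's limits.

```python
import resource

def core_dump_status(pattern_file="/proc/sys/kernel/core_pattern"):
    """Report this process's core-size limit and the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        with open(pattern_file) as fh:
            pattern = fh.read().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable
    return {
        "soft_unlimited": soft == resource.RLIM_INFINITY,
        "soft": soft,
        "hard": hard,
        "core_pattern": pattern,
    }

status = core_dump_status()
print(status)
if status["soft"] == 0:
    print('core dumps are disabled here; run "ulimit -c unlimited" first')
```

If `core_pattern` is a plain relative name such as `core`, the kernel writes the file into the crashing process's current working directory, which is why the PHP user needs write permission there.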
 

mistwang

LiteSpeed Staff
#14
We have improved LSWS 4.0.13 a little bit; it may help with your 503 errors.
Please upgrade to 4.0.13. If you have already done that, please upgrade again with "Force reinstall" from the web console, or upgrade manually from the command line.
 

nolnet

Active Member
#16
We tried upgrading to LSWS 4.0.13, turned persistent connections off, and also raised the number of MySQL connections. The 503s still happen randomly.

LSAPI_ALLOW_CORE_DUMP is set, but we cannot see any core dumps. Perhaps if you could explain more about this, we could finally isolate the real cause.
 

nolnet

Active Member
#19
This is the error that is causing the 503:

2010-03-01 21:43:59.244 [INFO] [IP:50229-0#APVH_*******] connection to [/tmp/lshttpd/lsphp5.sock.344] on request #35, confirmed, 1, associated process: 29307, running: 1, error: Connection reset by peer!
2010-03-01 21:43:59.244 [NOTICE] [IP:50229-0#APVH_*******] Max retries has been reached, 503!
2010-03-01 21:43:59.244 [NOTICE] [IP:50229-0#APVH_********] oops! 503 Service Unavailable
2010-03-01 21:43:59.244 [NOTICE] [**********:50229-0#APVH_********] Content len: 0, Request line:
--------------------------------------------------------------------------------------

Could this help in isolating the issue? The folder is writable by the nobody user, and we can see no reason why this error is happening. The memory limit for lsphp and the connection timeout have been raised, but the 503s continue.
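To see whether the resets cluster on particular sockets or processes, the [INFO] lines can be pulled apart with a small script. This is a minimal sketch that assumes every such line has the same layout as the one pasted above; the regex and the name `parse_backend_errors` are made up for this example and are not part of LSWS.

```python
import re

# Layout assumed from the [INFO] line pasted above.
RESET_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \[INFO\] .*'
    r'connection to \[(?P<sock>[^\]]+)\] on request #(?P<req>\d+),.*'
    r'associated process: (?P<pid>\d+),.*error: (?P<err>[^!]+)!'
)

def parse_backend_errors(lines):
    """Extract timestamp, socket, request number, PID and error text."""
    found = []
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            found.append({
                "ts": m.group("ts"),
                "sock": m.group("sock"),
                "req": int(m.group("req")),
                "pid": int(m.group("pid")),
                "err": m.group("err"),
            })
    return found

sample = [
    "2010-03-01 21:43:59.244 [INFO] [IP:50229-0#APVH_*******] connection to "
    "[/tmp/lshttpd/lsphp5.sock.344] on request #35, confirmed, 1, "
    "associated process: 29307, running: 1, error: Connection reset by peer!",
    "2010-03-01 21:43:59.244 [NOTICE] [IP:50229-0#APVH_*******] "
    "Max retries has been reached, 503!",
]

for entry in parse_backend_errors(sample):
    print(entry["ts"], entry["pid"], entry["sock"], entry["err"])
# 2010-03-01 21:43:59 29307 /tmp/lshttpd/lsphp5.sock.344 Connection reset by peer
```

The PID and timestamp from each entry can then be matched against any core files or stderr.log messages written at the same moment.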
 

mistwang

LiteSpeed Staff
#20
Have you downloaded and upgraded to the latest build of 4.0.13? It will reduce false alarms.

Usually "Connection reset" is caused by PHP crashing in the middle of processing a request.
 