Hung Process Help

GOT

Well-Known Member
#1
I have over 100 LiteSpeed licenses, so this isn't my first rodeo, but this one setup has given me fits for months and I would like another set of eyeballs on it.

We have two load-balanced dedicated servers running a single WordPress site, with a third server as the database server.

What we are seeing is that index.php at times (once or twice a day in the worst case, a couple of times a week on average) hangs and LiteSpeed stops handling new requests. The process tree looks like this:

11899 ? S 0:05 litespeed (lshttpd)
11903 ? S 0:00 \_ httpd (lscgid)
11904 ? Sl 1:00 \_ litespeed (lshttpd)
1379 ? S 0:00 | \_ lsphp5
1387 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1490 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1529 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1542 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1556 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1560 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1564 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1610 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1622 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1710 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1758 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1771 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1772 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
1773 ? S 0:00 | \_ lsphp5:/home/ocfcom/public_html/index.php
11905 ? Sl 0:26 \_ litespeed (lshttpd)
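To put numbers on a listing like the one above (how many lsphp5 workers are busy, and in which script), a small parser helps. This is a minimal sketch that assumes the `ps axf` column layout shown here (PID, TTY, STAT, TIME, then the command) and that LSAPI workers show their script as `lsphp5:/path/to/script.php`; the names `tally_lsphp` and `snapshot` are made up for illustration.

```python
import subprocess
from collections import Counter


def tally_lsphp(ps_output: str) -> Counter:
    """Count lsphp5 processes per script in `ps axf` output.

    Assumes the process title is the last whitespace-separated field,
    e.g. "lsphp5:/home/ocfcom/public_html/index.php"; the bare parent
    title "lsphp5" is counted under the key "(parent)".
    """
    counts: Counter = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 5:  # PID TTY STAT TIME COMMAND at minimum
            continue
        title = fields[-1]
        if title == "lsphp5":
            counts["(parent)"] += 1
        elif title.startswith("lsphp5:"):
            counts[title[len("lsphp5:"):]] += 1
    return counts


def snapshot() -> str:
    """Return the current `ps axf` listing (Linux procps assumed)."""
    return subprocess.run(["ps", "axf"], capture_output=True, text=True, check=True).stdout
```

On a live box, `tally_lsphp(snapshot()).most_common()` lists each script with its worker count; for the listing above it would report 14 workers in index.php plus one parent.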

They're running the latest version, and both have 2-CPU licenses.

Here are our external app settings:

http://d.pr/i/18oQO

As you can see there, I have raised the number of PHP processes to a staggering 1500, though that seems to be getting ignored.

I'd appreciate any feedback.
 

wanah

Well-Known Member
#2
When this happens, can you access PHP files on that account that don't contain MySQL commands (like a simple <?php echo 'hello world'; )? Can you still access static files? You could also enable logging, filter by your IP, and send the result to LiteSpeed's bugs e-mail address to see if they can spot anything wrong.
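The triage suggested here (static file, PHP file without MySQL, then WordPress) can be scripted so it runs the moment a hang is noticed. A minimal sketch using only the Python standard library; the three URLs are hypothetical placeholders, and `hello.php` is assumed to contain nothing but `<?php echo 'hello world';`.

```python
import http.client
import urllib.request
from typing import Dict

# Hypothetical URLs and files: aim them at a single web server rather than
# the load-balanced name, so you know which box you are testing.
PROBES: Dict[str, str] = {
    "static": "http://server1.example.com/static-test.txt",
    "plain_php": "http://server1.example.com/hello.php",
    "wordpress": "http://server1.example.com/index.php",
}


def probe(url: str, timeout: float = 10.0) -> bool:
    """True if `url` returns a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        return False


def diagnose(static_ok: bool, php_ok: bool, wp_ok: bool) -> str:
    """Map the three probe results to the layer that is failing."""
    if not static_ok:
        return "LiteSpeed itself is not serving requests"
    if not php_ok:
        return "PHP (lsphp5) is not serving requests"
    if not wp_ok:
        return "PHP works; the hang is inside the WordPress request path"
    return "all probes OK"


def run_probes() -> str:
    """Probe every URL in PROBES and return the diagnosis."""
    results = {name: probe(url) for name, url in PROBES.items()}
    return diagnose(results["static"], results["plain_php"], results["wordpress"])
```

The order of the checks matters: each probe only tells you something new if the layer below it is already known to work.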
 

Lauren

LiteSpeed Staff
Staff member
#3
The ideal setup would be LSLB (LiteSpeed Load Balancer) in front of 2 LSWS, with cache on the LSLB. Can your WP site use the LSCache plugin? If the 2 LSWS sit behind LSLB, you can use LSLB for HTTPS termination. With cache served from the load balancer side, you should see a big improvement.

If you need us to check your server, you can create a ticket from your account and provide login details.
 

GOT

Well-Known Member
#4
We're not using LSLB in this case; we use DNSMadeEasy with failover. That works well.

Did you look at the settings? See anything obvious in there?

Why are there so few PHP processes? Shouldn't there be a LOT more?

I'm not the designer of the site, so I would have to talk to them about the cache plugin.
 

NiteWave

Administrator
#5
Quote from wanah: "When this happens, can you access PHP files on that account that don't contain MySQL commands (like a simple <?php echo 'hello world'; )? Can you still access static files?"
Yes, this is what we want to know first.

Quote from GOT: "Why are there so few PHP processes? Shouldn't there be a LOT more?"
New PHP processes are started on demand.
If there are enough idle PHP processes available, there is no need to start a new one.
 

GOT

Well-Known Member
#6
We had another event this morning, and I have confirmed that non-WordPress PHP files are still accessible; it's just WordPress that won't load.
 

mistwang

LiteSpeed Staff
#8
If both servers hang at the same time, it is likely something wrong with the MySQL DB server.
If one server hangs while the other is fine, it could be a deadlock in the PHP opcode cache.

What you can do is strace some PHP processes while the server is hung to find the source of the hang.
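One way to act on this without racing individual index.php workers is to attach strace to the parent `lsphp5` process with fork-following enabled, so every child it spawns afterwards is traced from its first syscall. A minimal sketch, assuming the parent's command line reads exactly `lsphp5` as in the listing in post #1, that strace is installed, and that it runs as root; `find_pids`, `strace_cmd` and the `/tmp/lsphp-trace` prefix are illustrative names.

```python
import os
from typing import List


def find_pids(title: str, proc_root: str = "/proc") -> List[int]:
    """PIDs whose /proc/<pid>/cmdline, NULs turned into spaces, equals `title`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:  # the process exited while we were scanning
            continue
        if raw.replace(b"\0", b" ").decode(errors="replace").strip() == title:
            pids.append(int(entry))
    return sorted(pids)


def strace_cmd(pid: int, out_prefix: str = "/tmp/lsphp-trace") -> List[str]:
    """Arguments for strace attached to `pid`.

    -ff     follow fork/clone (as -f does) and, with -o, write one
            file per traced process: <out_prefix>.<pid>
    -tt     wall-clock timestamps with microseconds
    -s 200  print up to 200 bytes of each string argument
    """
    return ["strace", "-ff", "-tt", "-s", "200", "-o", out_prefix, "-p", str(pid)]
```

As root, `subprocess.Popen(strace_cmd(pid))` for each PID from `find_pids("lsphp5")` starts the traces. Started while a server is hung, the last lines of the newest `/tmp/lsphp-trace.*` files show which call the workers were blocked in (a lock or socket wait, for example).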
 

GOT

Well-Known Member
#9
It's definitely one server at a time. I had an incident today; I checked the database and it was fine, with no hung queries. I also found that the processes were not actually hung: they were being terminated, but another was filling its place almost immediately. I tried doing a trace, but each time the process was already gone before I could attach to it.
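The pattern described here (workers exiting and being replaced almost immediately) can be measured rather than eyeballed by sampling the set of worker PIDs a few times a second. A minimal sketch, assuming Linux /proc and worker titles that start with `lsphp5:` as in post #1; `worker_pids`, `churn` and `watch` are illustrative names.

```python
import os
import time
from typing import Set, Tuple


def worker_pids(prefix: str = "lsphp5:", proc_root: str = "/proc") -> Set[int]:
    """PIDs whose command line starts with `prefix` (LSAPI worker titles)."""
    pids = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                if f.read().startswith(prefix.encode()):
                    pids.add(int(entry))
        except OSError:  # exited between listdir() and open()
            continue
    return pids


def churn(before: Set[int], after: Set[int]) -> Tuple[Set[int], Set[int]]:
    """(exited, started) between two PID snapshots."""
    return before - after, after - before


def watch(samples: int = 20, interval: float = 0.25) -> None:
    """Print worker count and turnover for `samples` intervals."""
    prev = worker_pids()
    for _ in range(samples):
        time.sleep(interval)
        cur = worker_pids()
        exited, started = churn(prev, cur)
        print(f"{len(cur):4d} workers  exited={len(exited):3d}  started={len(started):3d}")
        prev = cur
```

A roughly constant worker count with non-zero exited/started on most lines matches what is described above; a flat count with zero turnover would instead point to genuinely stuck processes.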

We are not using any opcode caching that I can find. The only thing I see is that APC is loaded but disabled:

apc
APC Support => disabled
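`php -i` prints most settings as `name => value` lines (ini directives as `name => local value => master value`), so checking APC's state can be scripted. A minimal sketch; it assumes the CLI `php` binary reads the same php.ini as lsphp5, which is not guaranteed, so a phpinfo() page served through LiteSpeed remains the authoritative check. `parse_php_info` and `php_info` are illustrative names.

```python
import subprocess
from typing import Dict, List


def parse_php_info(text: str) -> Dict[str, List[str]]:
    """Map each 'name => value [=> value]' line of `php -i` to its values."""
    info: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("=>")]
        if len(parts) >= 2 and parts[0]:
            info[parts[0]] = parts[1:]
    return info


def php_info(php_binary: str = "php") -> Dict[str, List[str]]:
    """Run `<php_binary> -i` and parse its output."""
    out = subprocess.run([php_binary, "-i"], capture_output=True, text=True, check=True).stdout
    return parse_php_info(out)
```

For the output quoted above, `parse_php_info(...)["APC Support"]` is `["disabled"]`; the `apc.enabled` directive, if present, shows the local and master values side by side.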
 

GOT

Well-Known Member
#11
Well, I saw APC mentioned in the logs a number of times, and this weekend we disabled APC and did not have a single incident.

Is there something I can do or modify to make this more stable? Or is APC just inherently a problem?
 

GOT

Well-Known Member
#13
Well, the problem is that they are using it mostly for its ability to cache database query data, not as an opcode cache.
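What the site relies on here is APC's user cache in a cache-aside pattern: look a query result up by key (`apc_fetch`), and on a miss run the query and store the result with a TTL (`apc_store`). The Python sketch below shows only that pattern; `TTLCache` and `cached_query` are hypothetical, the cache is a per-process dict standing in for APC, and it deliberately leaves out APC's shared-memory locking across PHP processes, which is where the deadlock suspected in this thread would live.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for APC's user cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def fetch(self, key: str) -> Tuple[Any, bool]:
        """Return (value, hit); expired entries count as misses."""
        entry = self._data.get(key)
        if entry is None:
            return None, False
        expires, value = entry
        if expires and self._clock() >= expires:
            del self._data[key]
            return None, False
        return value, True

    def store(self, key: str, value: Any, ttl: float = 0) -> None:
        """Store `value`; ttl <= 0 means no expiry (as with apc_store)."""
        expires = self._clock() + ttl if ttl > 0 else 0.0
        self._data[key] = (expires, value)


def cached_query(cache: TTLCache, key: str, run_query: Callable[[], Any], ttl: float = 300) -> Any:
    """Cache-aside: serve from cache on a hit, else run the query and store it."""
    value, hit = cache.fetch(key)
    if hit:
        return value
    value = run_query()
    cache.store(key, value, ttl)
    return value
```

Whatever replaces APC for this purpose only has to offer the same two operations: fetch by key, and store with a TTL.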

If you don't know of any way to keep APC from deadlocking under LiteSpeed, I'll have to discuss it with my client.
 