requests time out (sometimes...)

sofatime

Well-Known Member
#1
I have almost finished the installation of lsws and I am really happy with it. There is just one strange problem left: Sometimes requests do not get answered and time out. This only happens very rarely. It is difficult to say how often as this is still only a test box, but I would say less than 1%. It seems to be happening only with php requests, but I cannot say that for sure.
If I stop the request and reload it gets answered immediately.
It also seems to me that it happens more often when I do a graceful reload after a configuration change. If I stop and start using lswsctrl it does not happen often.

about the installation:
Solaris 10 x86
lsws enterprise 2.1.18 trial with chroot
using the php binary that came with lsws
php.ini is mostly the default one
using a lsphp process per virtual host with "cgi set uid mode" set to "docroot ID"

Any idea?

Thank you
Daniel
 

mistwang

LiteSpeed Staff
#2
Is the timeout consistent once it starts to happen, or is it random? A few tips to identify the problem: set "max connection" to "1", enable debug logging, and "ktrace" lsphp and lshttpd when it happens, then compare the trace with one taken when everything is normal.
 

sofatime

Well-Known Member
#3
No, the timeout is not consistent. It seems to be random. I will try with debug logging.
Just came across another problem: Apparently DNS lookups do not work in php, e.g. in DB configuration I have to use the IP of the MySQL server. First I had lsws installed without chroot, where this did not happen, now I have it installed with chroot. So I suspect it has to do with that. In the chroot env I have /etc/resolv.conf (was copied during install) with correct values.

BTW: The login to the LiteSpeed Admin GUI Demo is not valid anymore (this is always a good tool to check for default values in configuration...).
 

mistwang

LiteSpeed Staff
#4
Maybe some shared libraries needed for DNS lookups are missing in the chroot jail. You can use ktrace to find out, by comparing the system calls made by a normal PHP and a jailed PHP when run from the command line.
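
One way to do that comparison is to collect the paths that fail with ENOENT in each trace and keep those that fail only inside the jail. The sketch below is a minimal illustration in Python, assuming truss-style output lines (truss is the Solaris counterpart of ktrace). The function names and the sample trace excerpts are made up for the example and are not real output from this server.

```python
import re

# Matches truss-style lines such as:
#   open("/usr/lib/nss_dns.so.1", O_RDONLY)   Err#2 ENOENT
# The exact line layout is an assumption; adjust the pattern to your tracer.
FAILED_LOOKUP = re.compile(r'\b(?:open|stat|access)\w*\("([^"]+)".*\bENOENT\b')


def missing_paths(trace_text):
    """Return the set of paths whose open/stat/access call failed with ENOENT."""
    paths = set()
    for line in trace_text.splitlines():
        match = FAILED_LOOKUP.search(line)
        if match:
            paths.add(match.group(1))
    return paths


def missing_only_in_jail(normal_trace, jailed_trace):
    """Paths that are missing in the jailed run but were found in the normal run."""
    return sorted(missing_paths(jailed_trace) - missing_paths(normal_trace))


# Hypothetical trace excerpts, for illustration only.
normal = '''open("/var/ld/ld.config", O_RDONLY)  Err#2 ENOENT
open("/usr/lib/nss_dns.so.1", O_RDONLY)  = 4
'''
jailed = '''open("/var/ld/ld.config", O_RDONLY)  Err#2 ENOENT
open("/usr/lib/nss_dns.so.1", O_RDONLY)  Err#2 ENOENT
open("/etc/netconfig", O_RDONLY)  Err#2 ENOENT
'''

print(missing_only_in_jail(normal, jailed))
```

Paths reported from the jailed trace are relative to the chroot root, so a hit for /etc/netconfig means the file is missing under the chroot directory, not on the host.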
 

sofatime

Well-Known Member
#5
The following files were missing:

chroot/usr/lib/nss_dns.so.1
chroot/usr/lib/nss_files.so.1
chroot/etc/netconfig

I also copied over chroot/etc/nsswitch.conf, although that is not needed. I am also not sure if nss_files.so.1 is really needed, maybe nss_dns.so.1 is enough, did not test that (I assume nss_files.so would be to access the hosts file).

Maybe you can include that in the chroot-create-script for solaris.
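
For anyone who wants to script this step, here is a minimal sketch in Python that copies a list of host files to the same relative location inside a chroot directory. It is not the LiteSpeed chroot-create script; the function name, the `root` parameter and the placeholder chroot path in the comment are assumptions made for the example.

```python
import shutil
from pathlib import Path

# The files reported missing in this post.
NEEDED = [
    "/usr/lib/nss_dns.so.1",
    "/usr/lib/nss_files.so.1",
    "/etc/netconfig",
]


def copy_into_chroot(files, chroot, root="/"):
    """Copy each absolute path in `files` from `root` to the same place under `chroot`.

    Files that do not exist under `root` are skipped. Returns the list of
    target paths that were written.
    """
    copied = []
    for name in files:
        relative = Path(name).relative_to("/")
        source = Path(root) / relative
        target = Path(chroot) / relative
        if not source.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 follows symlinks by default, so the real library is copied
        # even when the source path is only a symlink.
        shutil.copy2(source, target)
        copied.append(str(target))
    return copied


# Example call (placeholder chroot path, adjust to your installation):
# copy_into_chroot(NEEDED, "/path/to/chroot")
```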
 

mistwang

LiteSpeed Staff
#6
Thanks, I will add them.
Actually, the script tries to copy nss_dns.so.* and nss_files.so.* from the /lib/ directory, but they are in /usr/lib/ on your server. :)
 

sofatime

Well-Known Member
#7
No, they are in /lib. In /usr/lib I have symlinks to the ones in /lib. I will move them to chroot/lib then.
I see in the script that it copies the nss_* files too, but not for SunOS. There is an "if" structure and the nss_* files only get copied for "else" and not for "SunOS".
 

sofatime

Well-Known Member
#8
About the timeout problem: I solved that by compiling my own lsphp (5.1.4) with apc according to the instructions in the wiki. With this new lsphp no timeout problems occur anymore (tested a lot with ab).
I couldn't compile in gd yet, but that's another problem...

I had to copy several libs in the chroot for the new lsphp to work (libxml.so.2, libiconv.so.2 and others). Why were they not necessary with the stock lsphp? Can (should) I compile these libs statically in lsphp? How? I don't mind having these libs in chroot, just wondering. And I also wonder why my lsphp is 12MB while the stock lsphp is only 2 MB. (?)
 

mistwang

LiteSpeed Staff
#9
The stock lsphp is 4.4.2, which does not use libxml etc. You don't need to link those libs statically, as you don't need to distribute the binary to different OS environments.
Running "strip lsphp" will shrink the size of the executable by removing debug symbols, but you'd better leave them alone, just in case you need to debug a PHP problem. They should not increase memory usage at run-time.
 

sofatime

Well-Known Member
#10
I am pretty happy with my setup now (even managed to compile in gd), there are only two small problems:

1) The mentioned timeout problems occur in the admin interface sometimes. That is not really a problem, because after just clicking again it works.

2) Graceful restart does not work. After a graceful restart it seems like the server is not being properly restarted; sometimes even the admin interface does not respond anymore. I always use lswsctrl stop/start, then everything works. But with lswsctrl it takes a long time (or several lswsctrl stop) until all processes are stopped.

Any idea?
 

mistwang

LiteSpeed Staff
#11
The admin interface uses the pre-built PHP binary, so it has the timeout problem.

Not sure about the graceful restart problem though, maybe something is wrong with Solaris 10. Does it work without chroot? Is your server 64-bit or 32-bit?
 

sofatime

Well-Known Member
#12
The server is 64-bit (Opteron).

I have tested with and without chroot: It seems it has nothing to do with chroot, as this happens without chroot too. But it looks like it has to do with the normal lsphp processes. This is what I did:
- started lsws
- opened admin, changed some setting, restarted using the admin
- restart worked and I could use the admin (even without timeouts), could change another setting, restart again, no problem
... until I opened a php-page of a normal site. From that moment the admin does not respond anymore (until restarted using lswsctrl).

If I do the same as above, but I stop/start the server after config changes using lswsctrl everything works normally.

Does that make sense?

About the fact that stopping takes a long time using lswsctrl: That only happens when normal lsphp processes are running. If I stop the server right after starting (when no lsphp processes are running yet) it stops immediately.

Apart from that the server works normally, also php.
 

mistwang

LiteSpeed Staff
#13
That's pretty strange.
Do you mean that the admin stops working completely once a php page of a normal site has been accessed? Not even able to load any admin page? Is there any "admin_php" process running?
You can try to replace "lsws/admin/fcgi-bin/admin_php" with your customized lsphp binary, but I am not sure whether the admin console will work cleanly under PHP5. You can give it a try. :)

Have you tried "lswsctrl restart"? Does that work when the admin console has failed?
 

sofatime

Well-Known Member
#14
Yes, admin stops working completely, and yes, there is an admin_php process running. But it does not happen every time. Sometimes I can do several changes/restarts without problems.
I replaced admin_php with my own php. Didn't change. Some config changes/restarts were ok, then it blocked again.

I can not really say anything about "lswsctrl restart". I tried it a few times and it worked but I remember that it did not work in the past. Only complete stop and start works every time.

How is the restart using admin_php done? As there is no kill command in the chroot I assume some other way?

BTW: I didn't notice any problems with the admin interface using php 5.1.4.

Any other idea? If not - no big deal. I can live with that for now, there shouldn't be too many config changes.
 

sofatime

Well-Known Member
#15
Sorry, I misunderstood your question about "lswsctrl restart". When the admin console failed and I do "lswsctrl restart" it works again (until I load a normal php page).
 

mistwang

LiteSpeed Staff
#16
Which version of Solaris 10 are you using?

"How is the restart using admin_php done? As there is no kill command in the chroot I assume some other way?"
The admin interface makes a connection back to the web server and tells it to restart. It does not use kill.

If you don't mind, can you please post the "ktrace" result of admin_php when it stops working?
Also, debug logging can be turned on for the admin interface by changing the log level to "debug" for the admin vhost and the "debug level" to "HIGH" at server level, if your server is not in production yet.

Thanks.
 

sofatime

Well-Known Member
#17
It's Solaris 10 3/05.

About ktrace: Unfortunately I do not know this tool. Some research showed me this is called truss on Solaris. I will try this tomorrow, and also the debug log for the admin vhost. Maybe you can give me instructions on how to use truss?
 

mistwang

LiteSpeed Staff
#18
"About ktrace: Unfortunately I do not know this tool. Some research showed me this is called truss on Solaris. I will try this tomorrow, and also the debug log for the admin vhost. Maybe you can give me instructions on how to use truss?"
Sorry about that, ktrace is probably a BSD thing. Usually, you can use "truss -p <pid>" to attach to a running process and trace the system calls it makes; adding "-d" puts a time stamp on each line.
This way, you can get a rough idea of what a process was doing.
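
As a rough illustration of how the truss options fit together, the Python sketch below only builds the command line; it does not run truss. The helper name and the output file name are made up for the example. The options used are the standard ones: -p attaches to a running process, -d adds time stamps, -o writes the trace to a file.

```python
import shlex


def truss_argv(pid, outfile=None, timestamps=True):
    """Build an argument list for attaching truss to a running process."""
    if int(pid) <= 0:
        raise ValueError("pid must be a positive integer")
    argv = ["truss"]
    if timestamps:
        argv.append("-d")            # time stamp on every trace line
    if outfile:
        argv += ["-o", outfile]      # write the trace to a file
    argv += ["-p", str(int(pid))]    # attach to the already running process
    return argv


# Print the command for a hypothetical PID; pass the list to
# subprocess.run() on the Solaris box to actually start the trace.
print(shlex.join(truss_argv(1234, outfile="admin_php.truss")))
```

Stop the trace with Ctrl-C once the hanging request has timed out, then look at the last few system calls before the process went quiet.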
 

sofatime

Well-Known Member
#19
I did the following:
- debug level to high at server level
- log level to debug at vhost admin level
- truss -p PID

Then I changed a setting in lsws admin, did graceful restart, opened a normal PHP page, then admin blocked (browser waiting for answer).

truss-output:
read(0, 0x0827A670, 8192) (sleeping...)
read(0, 0x0827A670, 8192) = 0
close(0) = 0
_exit(0)
The first line immediately, the last three lines after a few minutes. Then admin_php was gone! Until now I didn't wait so long, so I didn't notice that admin_php stops. And it does not get restarted again.
The truss-thing was a bit difficult, because apparently admin_php gets stopped when I do a graceful restart and a new process gets started, so I had to do a new truss with the new PID.
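
For readers not used to truss output: a read() that returns 0 means end of file, i.e. whatever was connected to descriptor 0 was closed, and the process then closed the descriptor and exited. The Python sketch below reproduces that pattern with a pipe standing in for the real connection. It is only an illustration of the system-call sequence, not a description of how admin_php is implemented.

```python
import os


def read_until_eof(fd, bufsize=8192):
    """Read from fd until read() returns 0 bytes (EOF), then close it."""
    chunks = []
    while True:
        data = os.read(fd, bufsize)   # blocks until data arrives or the peer closes
        if not data:                  # zero bytes: the other end is gone
            break
        chunks.append(data)
    os.close(fd)
    return b"".join(chunks)


# A pipe stands in for the connection on descriptor 0.
reader, writer = os.pipe()
os.write(writer, b"request")
os.close(writer)                      # closing the write end produces EOF for the reader
print(read_until_eof(reader))
```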

error_log:
There is no entry concerning the request which does not get answered. The last entry is:
2006-08-09 10:24:59.206 [DEBUG] [192.168.x.x:3075-1] Keep-alive timeout, close!
2006-08-09 10:24:59.206 [DEBUG] [192.168.x.x:3075-1] Close socket ...
But that was from the last request before restart (I think).

I also did netstat -a which shows an established connection to 7080.

I also want to stress again, that this does not happen every time. It's about every 5th time, and I don't know when it happens and when it works.
 

mistwang

LiteSpeed Staff
#20
It looks like something is wrong with the event dispatcher: the server does not get the event from the OS for the blocked request at all.

Are you using devpoll()? If so, please give plain poll() a try and see if it helps.
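
To make the idea of an event dispatcher concrete, here is a minimal sketch of waiting for a readable socket with plain poll(), written in Python for brevity. It is not LiteSpeed's dispatcher; the function name and the socket pair are made up for the example. If the OS never reports the event, a loop like this simply never sees the request, which matches the symptom described.

```python
import select
import socket


def wait_readable(sock, timeout_ms):
    """Return True if poll() reports the socket readable within the timeout."""
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    events = poller.poll(timeout_ms)          # list of (fd, event mask) pairs
    return any(fd == sock.fileno() and (mask & select.POLLIN)
               for fd, mask in events)


# A connected socket pair stands in for a client connection.
client, server = socket.socketpair()
print(wait_readable(server, 100))             # nothing sent yet
client.sendall(b"GET / HTTP/1.0\r\n\r\n")
print(wait_readable(server, 100))             # data is waiting now
client.close()
server.close()
```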

Thank you very much for the debug information. :)
 