Solaris: LiteSpeed Enterprise always dumps core on upgrade

jrmarino

Well-Known Member
#1
I'm running LiteSpeed on two servers with identical hardware (Sun Fire X4100 M2, I think). One runs the Enterprise edition, the other the Standard edition. Both run Solaris 10.

When I use the upgrade feature, the Standard edition always works as expected: it upgrades and restarts itself with no issue.

The Enterprise edition never works. It upgrades itself, runs for a few seconds, and then dumps core. This has happened at least three times: every time I've used the upgrade feature.

My workaround is to upgrade and then immediately disable LiteSpeed via SMF. With 3.3.6 it dumped core before I could disable it. When I enable it again, it runs normally.

These environments are about as identical as you can get: the same version of Solaris, the same Sun hardware, and similar configurations for the other software. Something specific to the Enterprise edition is causing the core dump.
 

jrmarino

Well-Known Member
#3
Hi mistwang,

This may be related. Today I intentionally restarted the Enterprise web server twice to update the AWStats alias field. Both times I received this email:

At [11/Mar/2008:05:20:11 -0500], web server with pid=21294 received unexpected signal=9, no core file is created. A new instance of web server will be started automatically!

I made the same AWStats alias change on the Standard web server and did not receive that email.

The Enterprise web server is at version 3.3.6.
The Standard web server is at version 3.3.7.
 

jrmarino

Well-Known Member
#5
I don't recall seeing that before. I think it just started happening with version 3.3.6.

I looked at the SMF logs: the watchdog is handling the restarts and is not itself terminating. Here is the tail end of the SMF log, which shows the core dumps I mentioned earlier. The last entry is from March 2, when I restarted the server to upgrade to 3.3.6.

Code:
[ Oct 23 01:53:16 Method "start" exited with status 0 ]
[ Jan 11 16:59:24 Stopping because process dumped core. ]
[ Jan 11 16:59:25 Executing stop method ("/opt/lsws/bin/lswsctrl stop") ]
[OK] lshttpd: stopped.
[ Jan 11 16:59:25 Method "stop" exited with status 0 ]
[ Jan 11 17:00:25 Method or service exit timed out.  Killing contract 1167 ]
[ Jan 11 17:01:22 Leaving maintenance because disable requested. ]
[ Jan 11 17:01:22 Disabled. ]
[ Jan 11 17:01:30 Enabled. ]
[ Jan 11 17:01:30 Executing start method ("/opt/lsws/bin/lswsctrl start") ]
[OK] lshttpd: pid=2499.
[ Jan 11 17:01:30 Method "start" exited with status 0 ]
[ Jan 28 12:33:03 Stopping because process dumped core. ]
[ Jan 28 12:33:03 Executing stop method ("/opt/lsws/bin/lswsctrl stop") ]
[OK] lshttpd: stopped.
[ Jan 28 12:33:03 Method "stop" exited with status 0 ]
[ Jan 28 12:34:03 Method or service exit timed out.  Killing contract 1475 ]
[ Jan 28 14:20:50 Leaving maintenance because disable requested. ]
[ Jan 28 14:20:50 Disabled. ]
[ Jan 28 14:20:56 Enabled. ]
[ Jan 28 14:20:56 Executing start method ("/opt/lsws/bin/lswsctrl start") ]
[OK] lshttpd: pid=8965.
[ Jan 28 14:20:56 Method "start" exited with status 0 ]
[ Mar  2 05:41:03 Stopping because process dumped core. ]
[ Mar  2 05:41:03 Executing stop method ("/opt/lsws/bin/lswsctrl stop") ]
[OK] lshttpd: stopped.
[ Mar  2 05:41:03 Method "stop" exited with status 0 ]
[ Mar  2 05:42:04 Method or service exit timed out.  Killing contract 1549 ]
[ Mar  2 05:42:04 Leaving maintenance because disable requested. ]
[ Mar  2 05:42:04 Disabled. ]
[ Mar  2 05:42:04 Enabled. ]
[ Mar  2 05:42:04 Executing start method ("/opt/lsws/bin/lswsctrl start") ]
[OK] lshttpd: pid=28159.
[ Mar  2 05:42:04 Method "start" exited with status 0 ]
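Each of the three incidents in this log follows the same sequence: "process dumped core", the stop method exiting with status 0, and then "Method or service exit timed out. Killing contract" about a minute later. A minimal Python sketch for pulling those timings out of an SMF service log (on Solaris 10 these live under /var/svc/log/), assuming the `[ Mon DD HH:MM:SS message ]` layout shown above. The log omits the year, so the caller passes one; it only affects the dates, not the gaps.

```python
import re
from datetime import datetime

# One timestamped SMF service-log entry, for example:
#   [ Jan 11 16:59:24 Stopping because process dumped core. ]
# Method output such as "[OK] lshttpd: stopped." does not match and is skipped.
ENTRY = re.compile(r"^\[ (\w{3}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) (.*?) \]$")


def parse_smf_log(text, year):
    """Return (timestamp, message) pairs. SMF logs omit the year, so the caller supplies it."""
    events = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            mon, day, hms, msg = m.groups()
            ts = datetime.strptime(f"{year} {mon} {day} {hms}", "%Y %b %d %H:%M:%S")
            events.append((ts, msg))
    return events


def core_dumps(events):
    """Timestamps of the 'process dumped core' entries."""
    return [ts for ts, msg in events if "dumped core" in msg]


def stop_to_kill_gaps(events):
    """Seconds between the stop method exiting and SMF killing the service contract."""
    gaps, stopped_at = [], None
    for ts, msg in events:
        if msg.startswith('Method "stop" exited'):
            stopped_at = ts
        elif msg.startswith("Method or service exit timed out") and stopped_at is not None:
            gaps.append((ts - stopped_at).total_seconds())
            stopped_at = None
    return gaps
```

Fed the full log above, `core_dumps` finds three entries and `stop_to_kill_gaps` returns `[60.0, 60.0, 61.0]`: in every incident the contract is killed roughly a minute after the stop method returns.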
 

mistwang

LiteSpeed Staff
#6
[ Mar 2 05:42:04 Method or service exit timed out. Killing contract 1549 ]
Does this mean SMF killed the process with "-9" (SIGKILL)? That would explain the unexpected signal 9.
LSWS performs a graceful restart/stop, which tries to finish all pending requests before exiting.
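The sequence described here (a stop request, a graceful drain that outlasts the supervisor's timeout, then a forced kill) can be reproduced in miniature. This Python sketch is not SMF or LSWS code; the child process is a hypothetical stand-in that catches SIGTERM and keeps working. It shows why the outcome is reported as signal 9 with no core file: SIGKILL cannot be caught or ignored, and its default action terminates the process without dumping core. POSIX only.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a server doing a graceful stop: it handles SIGTERM
# but keeps "finishing pending requests" (sleeping) for longer than allowed.
SLOW_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, lambda *_: None)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def stop_with_timeout(timeout):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc = subprocess.Popen([sys.executable, "-c", SLOW_CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()            # wait until the SIGTERM handler is installed
    proc.send_signal(signal.SIGTERM)  # polite request, like a stop method
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                   # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
    proc.stdout.close()
    return proc.returncode            # negative N means "terminated by signal N"


def sigkill_is_catchable():
    """SIGKILL can never have a handler; the OS rejects the attempt."""
    try:
        signal.signal(signal.SIGKILL, signal.SIG_IGN)
    except (OSError, ValueError):
        return False
    return True
```

`stop_with_timeout(0.5)` returns `-9`: the child ignored the polite request and was killed outright, exactly the kind of event a watchdog would report as "unexpected signal=9".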

We will investigate the "process dumped core" issue. The log does not show which process dumped core.
If it was lshttpd, you should have received a similar report by email.

Can you locate the core file and check it with GDB? lshttpd core files should be under /tmp/lshttpd.
 

jrmarino

Well-Known Member
#7
There are two separate issues here. The "signal 9" issue has nothing to do with SMF. SMF starts the watchdog and would log any action it takes, and nothing has been logged since March 2. I don't think SMF would send a kill signal to a child process without logging it.

If I can find the core file, and if gdb is installed on that server, I will try to get a backtrace. I agree that it was probably the watchdog that dumped core, especially since it happened during a web server restart.
 

jrmarino

Well-Known Member
#8
Hmm, this backtrace isn't nearly as descriptive as the one from an earlier problem. Hopefully it is useful to you.

Code:
shepard-root# gdb lshttpd core
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.8"...lshttpd: No such file or directory.

Core was generated by `./lshttpd'.
Program terminated with signal 11, Segmentation fault.
#0  0x0804e799 in ?? ()
(gdb) bt
#0  0x0804e799 in ?? ()
#1  0xfee5001f in ?? ()
#2  0x00000012 in ?? ()
#3  0x00000000 in ?? ()
#4  0x08045a1c in ?? ()
#5  0x080459e4 in ?? ()
#6  0xfee465d9 in ?? ()
#7  0x00000012 in ?? ()
#8  0x00000000 in ?? ()
#9  0x08045a1c in ?? ()
#10 0x0804e794 in ?? ()
#11 0x00000000 in ?? ()
#12 0x00000000 in ?? ()
#13 0xfee78000 in ?? ()
#14 0x00000004 in ?? ()
#15 0x0804e794 in ?? ()
#16 0x00020000 in ?? ()
#17 0x00000000 in ?? ()
#18 0x00000000 in ?? ()
#19 0x00000000 in ?? ()
#20 0x00000000 in ?? ()
#21 0x00000000 in ?? ()
#22 0xfee7bca0 in ?? ()
#23 0xfee7bc80 in ?? ()
---Type <return> to continue, or q <return> to quit---
#24 0x00000012 in ?? ()
#25 0xfecb2000 in ?? ()
#26 0xfee7b700 in ?? ()
#27 0x08045a08 in ?? ()
#28 0xfee46759 in ?? ()
#29 0x00000012 in ?? ()
#30 0x00000000 in ?? ()
#31 0x08045a1c in ?? ()
#32 0x00000000 in ?? ()
#33 0xfecb2000 in ?? ()
#34 0xfee78000 in ?? ()
#35 0x08045a1c in ?? ()
#36 0x08045c3c in ?? ()
#37 <signal handler called>
#38 0xfee50957 in ?? ()
#39 0xfee4571f in ?? ()
#40 0x08045ca0 in ?? ()
#41 0x00000001 in ?? ()
#42 0x08045c60 in ?? ()
#43 0x00000000 in ?? ()
#44 0x08045c60 in ?? ()
#45 0x000003e8 in ?? ()
#46 0xfee78000 in ?? ()
#47 0x08045c6c in ?? ()
---Type <return> to continue, or q <return> to quit---
#48 0xfedfa812 in ?? ()
#49 0x08045ca0 in ?? ()
#50 0x00000001 in ?? ()
#51 0x08045c60 in ?? ()
#52 0x00000000 in ?? ()
#53 0x08047e3c in ?? ()
#54 0x08047dc0 in ?? ()
#55 0x0828bfe0 in ?? ()
#56 0x00000001 in ?? ()
#57 0x00000000 in ?? ()
#58 0xfeffc908 in ?? ()
#59 0x08045cbc in ?? ()
#60 0x0804f4f4 in ?? ()
#61 0x08045ca0 in ?? ()
#62 0x00000001 in ?? ()
#63 0x000003e8 in ?? ()
#64 0x0804f532 in ?? ()
#65 0x0828bfe0 in ?? ()
#66 0xfee90290 in ?? ()
#67 0x00000001 in ?? ()
#68 0x0804dfd1 in ?? ()
#69 0x08045cb4 in ?? ()
#70 0xfefcc1d4 in ?? ()
#71 0x0828c100 in ?? ()
---Type <return> to continue, or q <return> to quit---
#72 0x00000006 in ?? ()
#73 0x08040001 in ?? ()
#74 <signal handler called>
Cannot access memory at address 0x5c
(gdb)
 

mistwang

LiteSpeed Staff
#9
It is not useful at all, but thanks! :)
Only a backtrace from a core produced by the debug build of lshttpd is really useful. The debug build is lsws-3.3.x/bin/lshttpd.dbg; you need to manually replace /opt/lsws/bin/lshttpd.3.3.x with it. You should be able to find past installation packages under the lsws/autoupdate directory.
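The `?? ()` frames mean gdb had no symbols to resolve those addresses against. In this session it could not open the executable at all ("lshttpd: No such file or directory"), and a release binary may additionally be stripped. A minimal Python sketch for telling the two kinds of binary apart, assuming ELF files (the format Solaris 10 uses): it reports whether an SHT_SYMTAB section survives, which `strip` removes. It does not tell you whether the binary was also compiled with debug info, so a True result is necessary rather than sufficient for a readable backtrace.

```python
import struct

SHT_SYMTAB = 2  # full static symbol table; `strip` removes it (SHT_DYNSYM stays)


def has_symtab(data: bytes) -> bool:
    """True if the ELF image in `data` still contains an SHT_SYMTAB section.

    Sketch only: ignores the rare extended section numbering (e_shnum == 0).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    order = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian (x86), 2 = big (SPARC)
    if data[4] == 2:                         # EI_CLASS 2 = 64-bit header layout
        (shoff,) = struct.unpack_from(order + "Q", data, 40)
        shentsize, shnum = struct.unpack_from(order + "HH", data, 58)
    else:                                    # EI_CLASS 1 = 32-bit header layout
        (shoff,) = struct.unpack_from(order + "I", data, 32)
        shentsize, shnum = struct.unpack_from(order + "HH", data, 46)
    for i in range(shnum):
        # sh_type is the second 32-bit word of every section header
        (sh_type,) = struct.unpack_from(order + "I", data, shoff + i * shentsize + 4)
        if sh_type == SHT_SYMTAB:
            return True
    return False


def file_has_symtab(path: str) -> bool:
    """Convenience wrapper: read the whole file and check it."""
    with open(path, "rb") as f:
        return has_symtab(f.read())
```

Running `file_has_symtab` on the installed lshttpd and on lshttpd.dbg should show the difference, if the release build is in fact stripped.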

Where is the core file located? Are you using the 32-bit or the 64-bit binary?

This GDB was configured as "i386-pc-solaris2.8"...lshttpd: No such file or directory.
You probably need to give the full path to lshttpd.
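Combining the two suggestions in this thread (core files under /tmp/lshttpd, full path to the binary), a small Python helper that checks the binary actually exists and pairs it with the newest core file. Assumptions: core files are named starting with `core` (Solaris lets coreadm change this, so the prefix is a parameter), and the binary you pass is the exact lshttpd build that produced the core.

```python
import os
import shlex


def gdb_command(binary, core_dir, prefix="core"):
    """Return a gdb command line pairing `binary` with the newest core file in `core_dir`."""
    if not os.path.isfile(binary):
        # The same failure gdb reported above: "lshttpd: No such file or directory".
        raise FileNotFoundError(binary)
    # Only consider regular files whose names start with the (assumed) core prefix.
    cores = [os.path.join(core_dir, name) for name in os.listdir(core_dir)
             if name.startswith(prefix)]
    cores = [p for p in cores if os.path.isfile(p)]
    if not cores:
        raise FileNotFoundError(f"no {prefix}* files in {core_dir}")
    newest = max(cores, key=os.path.getmtime)
    return shlex.join(["gdb", os.path.abspath(binary), newest])
```

Run the printed command, enter `set height 0` to turn off the "Type <return> to continue" pager, then `bt`.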
 