[RESOLVED] Restart problem (old server isn't killed)

Discussion in 'Bug Reports' started by Grzegorz Derebecki, Feb 3, 2010.

  1. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    After swith to Litespeed 4.0.12 enterprise (before i use standard version) i have problem with restarting

    Litespeed version: freebsd 64bit ent 4.0.12

    Before restarting:
    Code:
    root@core2:~# ps aux |grep lshttp
    www    33858  0.2  0.1 11572  6504  ??  D     8:40PM   0:00.09 ./lshttpd (lshttpd.4.0.12)
    root   33854  0.0  0.1 10412  5248  ??  S     8:40PM   0:00.01 ./lshttpd (lshttpd.4.0.12)
    
    After press restart in admin pannel
    Code:
    root@core2:~# ps aux |grep lshttp
    www    33858  4.6  0.1 13036  7536  ??  SN    8:40PM   0:04.40 ./lshttpd (lshttpd.4.0.12)
    root   33854  0.0  0.1 10412  5256  ??  S     8:40PM   0:00.03 ./lshttpd (lshttpd.4.0.12)
    root   34344  0.0  0.1 10412  5252  ??  S     8:41PM   0:00.01 lshttpd (lshttpd.4.0.12)
    www    34346  0.0  0.1 10492  5668  ??  S     8:41PM   0:00.03 lshttpd (lshttpd.4.0.12)
    
    And after while (when old server is ready to end):
    Code:
    root@core2:~# ps aux |grep lshttp
    www    33858  0.0  0.1 12572  7076  ??  SN    8:40PM   0:04.42 ./lshttpd (lshttpd.4.0.12)
    root   34344  0.0  0.1 10412  5252  ??  S     8:41PM   0:00.02 lshttpd (lshttpd.4.0.12)
    www    34346  0.0  0.1 11636  6680  ??  S     8:41PM   0:00.48 lshttpd (lshttpd.4.0.12)
    
    As we can see pid 33858 is still running and even i send kill 33858 it don't do anything. Also all lsapi process are runing (php, ruby)



    i tested this on 4.1RC2 and there problem don't exists.
    Last edited by a moderator: Feb 11, 2010
  2. mistwang

    mistwang LiteSpeed Staff

    due to cPanel 11.25 graceful restart issue, we have changed graceful restart of 4.0.12 to finish all pending requests completely, my guess is that there are still requests in process. It takes longer for the lingering process to quit.

    Please strace and lsof the running process. if there are active ESTABLISHED connections to port 80, it is still serving requests.
  3. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    truss:

    Code:
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242110.972211 },0x0)		 = 0 (0x0)
    gettimeofday({1265242110.972268 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.071037 },0x0)		 = 0 (0x0)
    stat("/base/opt/lsws/logs/access.log",{ mode=-rw-r--r-- ,inode=2922266,size=11951,blksize=4096 }) = 0 (0x0)
    stat("/base/opt/lsws/logs/error.log",{ mode=-rw-r--r-- ,inode=2923218,size=4336,blksize=4096 }) = 0 (0x0)
    stat("/base/opt/lsws/logs/stderr.log",{ mode=-rw-r--r-- ,inode=2923034,size=3977,blksize=4096 }) = 0 (0x0)
    gettimeofday({1265242111.071355 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.169718 },0x0)		 = 0 (0x0)
    gettimeofday({1265242111.169785 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.268581 },0x0)		 = 0 (0x0)
    gettimeofday({1265242111.268650 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.367619 },0x0)		 = 0 (0x0)
    gettimeofday({1265242111.367685 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.467696 },0x0)		 = 0 (0x0)
    gettimeofday({1265242111.467748 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.567390 },0x0)		 = 0 (0x0)
    gettimeofday({1265242111.567450 },0x0)		 = 0 (0x0)
    kevent(7,{},0,{},16,{0.100000000 })		 = 0 (0x0)
    gettimeofday({1265242111.667418 },0x0)		 = 0 (0x0)
    gettimeofday({1265242111.667470 },0x0)		 = 0 (0x0)
    

    lsof:

    Code:
    root@core2:/opt/nginx# lsof -n |grep 84359
    lshttpd.4 84359    www  cwd     VDIR               0,96               3584  4616504 /tmp/lshttpd
    lshttpd.4 84359    www  rtd     VDIR               0,96                512        2 /
    lshttpd.4 84359    www  txt     VREG               0,96            3452719  2922218 /base/opt/lsws/bin/lshttpd.4.0.12
    lshttpd.4 84359    www  txt     VREG               0,96             197944  1158833 /libexec/ld-elf.so.1
    lshttpd.4 84359    www  txt     VREG               0,96              32512  2239860 /base/usr/local/lib/compat/libcrypt.so.3
    lshttpd.4 84359    www  txt     VREG               0,96             122032  2244221 /base/usr/local/lib/compat/libm.so.4
    lshttpd.4 84359    www  txt     VREG               0,96            1102888  2239855 /base/usr/local/lib/compat/libc.so.6
    lshttpd.4 84359    www    0u    VCHR               0,29                0t0       29 /dev/null
    lshttpd.4 84359    www    1u    VCHR               0,29                0t0       29 /dev/null
    lshttpd.4 84359    www    2u    unix 0xffffff009fa952d0                0t0          ->0xffffff0100a32870
    lshttpd.4 84359    www    3w    VREG               0,96               5004  2923218 /base/opt/lsws/logs/error.log
    lshttpd.4 84359    www    4r    VCHR               0,10            0t28928       10 /dev/random
    lshttpd.4 84359    www    5u    unix 0xffffff00560fb870                0t0          /tmp/lshttpd/fdb.pl_rails.sock.912
    lshttpd.4 84359    www    7u  KQUEUE 0xffffff0206f4fb00                             count=0, state=0x2
    lshttpd.4 84359    www   11u    IPv4 0xffffff0006afd440                0t0      UDP *:*
    lshttpd.4 84359    www   12u    unix 0xffffff0100a32870                0t0          ->0xffffff009fa952d0
    lshttpd.4 84359    www   13u    unix 0xffffff0106aa42d0                0t0          /base/opt/lsws/admin/cgid/cgid.sock
    lshttpd.4 84359    www   15u    unix 0xffffff0056e8b000                0t0          ->0xffffff0056dfa2d0
    lshttpd.4 84359    www   18u    unix 0xffffff00896495a0                0t0          ->0xffffff0213f1eb40
    lshttpd.4 84359    www   20u    unix 0xffffff001692f000                0t0          ->0xffffff020bec52d0
    lshttpd.4 84359    www   23u    unix 0xffffff0089836b40                0t0          /tmp/lshttpd/gfx1.fdb.pl_lsphp.sock.022
    lshttpd.4 84359    www   24u    unix 0xffffff01ff15a5a0                0t0          /tmp/lshttpd/fdb.pl_lsphp.sock.583
    lshttpd.4 84359    www   28u    unix 0xffffff01c786f000                0t0          ->0xffffff01530122d0
    lshttpd.4 84359    www   29u    unix 0xffffff0029780000                0t0          /tmp/lshttpd/app.eurobattle.net_rails_old.sock.513
    lshttpd.4 84359    www   31u    unix 0xffffff01a22b5870                0t0          ->0xffffff008c0e5000
    lshttpd.4 84359    www   33w    VREG               0,96              12054  2922266 /base/opt/lsws/logs/access.log
    lshttpd.4 84359    www   34w    VREG               0,96               3977  2923034 /base/opt/lsws/logs/stderr.log
    lshttpd.4 84359    www   35u    unix 0xffffff0089bc2b40                0t0          /tmp/lshttpd/admin_php.sock.222
    lshttpd.4 84359    www   36u    unix 0xffffff01a2a5f2d0                0t0          ->0xffffff01ff0de000
    lshttpd.4 84359    www   41u    unix 0xffffff0117b822d0                0t0          ->0xffffff0106ae1870
    lshttpd.4 84359    www   43u    unix 0xffffff02293e5000                0t0          ->0xffffff0117aee870
    lshttpd.4 84359    www   44u    unix 0xffffff0089d75870                0t0          ->0xffffff008e8e52d0
    lshttpd.4 84359    www   54u    unix 0xffffff01a2b145a0                0t0          ->0xffffff022774a5a0
    
  4. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    after few hour old lshttpd is still runing.
  5. mistwang

    mistwang LiteSpeed Staff

    Please try a force reinstall from web console to apply the latest build of 4.0.12, add some code to deal with it.
  6. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    Reinstalled (manualy becous force don't works)

    web server quite but root quits first then after wihle no-root server quit and live all lsapi process

    Code:
    root@core2:~# ps aux |grep RAILS|grep in4max
    in4max 86743 13.5  2.2 757572 180600  ??  R    11:25AM   0:25.26 ruby: RAILS: fdb.pl (production ID:86734) (ruby)
    in4max 86746 10.9  2.2 757572 180840  ??  S    11:25AM   0:28.11 ruby: RAILS: fdb.pl (production ID:86734) (ruby)
    in4max 86745  7.1  1.8 728800 152428  ??  S    11:25AM   0:30.65 ruby: RAILS: fdb.pl (production ID:86734) (ruby)
    in4max 86744  4.6  2.2 758596 182116  ??  S    11:25AM   0:26.07 ruby: RAILS: fdb.pl (production ID:86734) (ruby)
    in4max 84884  0.0  1.8 721632 145796  ??  Ss   11:16AM   0:04.57 ruby: RAILS: fdb.pl (production ID:84884) (ruby)
    in4max 84898  0.0  2.3 767812 191056  ??  S    11:16AM   0:19.42 ruby: RAILS: fdb.pl (production ID:84884) (ruby)
    in4max 84899  0.0  2.0 738332 162800  ??  S    11:16AM   0:25.92 ruby: RAILS: fdb.pl (production ID:84884) (ruby)
    in4max 84900  0.0  2.3 762096 186568  ??  S    11:16AM   0:21.38 ruby: RAILS: fdb.pl (production ID:84884) (ruby)
    in4max 84901  0.0  1.8 727776 150796  ??  S    11:16AM   0:27.83 ruby: RAILS: fdb.pl (production ID:84884) (ruby)
    in4max 85694  0.0  1.8 721632 145796  ??  Ss   11:20AM   0:04.45 ruby: RAILS: fdb.pl (production ID:85694) (ruby)
    in4max 85703  0.0  2.2 757572 180596  ??  S    11:20AM   0:23.22 ruby: RAILS: fdb.pl (production ID:85694) (ruby)
    in4max 85704  0.0  2.2 757572 181160  ??  S    11:20AM   0:31.89 ruby: RAILS: fdb.pl (production ID:85694) (ruby)
    in4max 85705  0.0  2.2 757572 181136  ??  S    11:20AM   0:27.93 ruby: RAILS: fdb.pl (production ID:85694) (ruby)
    in4max 85706  0.0  2.2 757572 180572  ??  S    11:20AM   0:28.23 ruby: RAILS: fdb.pl (production ID:85694) (ruby)
    in4max 86734  0.0  1.8 721632 145796  ??  Ss   11:25AM   0:04.42 ruby: RAILS: fdb.pl (production ID:86734) (ruby)
    
    (id = belongs to 1 litespeed server)

    this is how it looks after few restarts. php isn't stoped too.

    by the way swithing version don't change symlink to lscgid :) and after that we have (after this test i checked to rc2 for another test)

    Code:
    www    88224  2.9  0.1 19208  9548  ??  D    11:32AM   0:02.18 lshttpd (lshttpd.4.1RC2)
    root   88222  0.0  0.0 11972  4012  ??  S    11:32AM   0:00.03 lshttpd (lshttpd.4.1RC2)
    root   88223  0.0  0.0  2568   716  ??  S    11:32AM   0:00.00 httpd (lscgid.4.0.12)
    Last edited: Feb 4, 2010
  7. mistwang

    mistwang LiteSpeed Staff

    Please check those PHP and Rails process that refuse to quit with ktrace, LSAPI should check ppid every second, if ppid is -1, means that parent process is dead, LSAPI app should quit. I wonder if those LSAPI app stuck some where.

    I will check the lscgid issue.
  8. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    I'm using LSAPI_PPID_NO_CHECK becous of my settings in freebsd:
    security.bsd.see_other_uids=0
    security.bsd.see_other_gids=0

    here are my lsapi config:

    LSAPI_CHILDREN=4
    LSAPI_AVOID_FORK=1
    LSAI_MAX_IDLE_CHILDREN=2
    LSAPI_MAX_IDLE=600
    LSAPI_PPID_NO_CHECK=1
    LSAPI_EXTRA_CHILDREN=0

    also notice that ANY lsapi process isn't stoped (php/ruby)

    In 4.1RC2 all works perfectly.
  9. mistwang

    mistwang LiteSpeed Staff

    That's why those processes do not quit when parent process quit. Need to change the way how graceful shutdown works.
  10. mistwang

    mistwang LiteSpeed Staff

    Maybe you can try remove LSAPI_PPID_NO_CHECK, as latest LSAPI uses getppid(), instead of kill( ppid, 0 ) to test if parent process is alive or not, so the security setting in FreeBSD does affect kill(), but may not affect getppid() at all.
  11. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    i will test it on ruby becous my lsapi for php is older
  12. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    it won't help

    server exits and leave lsapi process (ID: 16273) and


    Code:
    in4max@core2:~% ps aux |grep RAILS
    in4max 16506 25.4  1.8 728968 152608  ??  S     8:07PM   0:16.78 ruby: RAILS: fdb.pl (production ID:16496) (ruby)
    in4max 16507 22.7  1.8 728968 151572  ??  S     8:07PM   0:30.46 ruby: RAILS: fdb.pl (production ID:16496) (ruby)
    in4max 16505 15.0  1.8 727944 149984  ??  S     8:07PM   0:18.65 ruby: RAILS: fdb.pl (production ID:16496) (ruby)
    in4max 16273  0.0  1.8 721800 145036  ??  Ss    8:06PM   0:04.22 ruby: RAILS: fdb.pl (production ID:16273) (ruby)
    in4max 16281  0.0  1.8 727944 149996  ??  S     8:06PM   0:06.69 ruby: RAILS: fdb.pl (production ID:16273) (ruby)
    in4max 16282  0.0  1.8 727944 150796  ??  S     8:06PM   0:07.13 ruby: RAILS: fdb.pl (production ID:16273) (ruby)
    in4max 16283  0.0  1.8 728968 152244  ??  S     8:06PM   0:14.94 ruby: RAILS: fdb.pl (production ID:16273) (ruby)
    in4max 16496  0.0  1.8 721800 145040  ??  Ss    8:07PM   0:04.32 ruby: RAILS: fdb.pl (production ID:16496) (ruby)
    in4max 16508  0.0  1.8 727944 149956  ??  S     8:07PM   0:07.46 ruby: RAILS: fdb.pl (production ID:16496) (ruby)
    
    when i truss them are killed:

    Code:
    root@core2:/var/db/mysql# truss -p 16281
    clock_gettime(4,{4022.010929191 })		 = 0 (0x0)
    select(1,{0},0x0,0x0,{0.774455 })		 = 0 (0x0)
    getppid(0x1,0x7fffffffe0e0,0x0,0x0,0x7fffffffdff0,0x0) = 17682 (0x4512)
    sigaction(SIGINT,{ SIG_DFL SA_RESTART ss_t },{ 0x45ee24 SA_RESTART ss_t }) = 0 (0x0)
    close(3)					 = 0 (0x0)
    close(6)					 = 0 (0x0)
    close(6)					 ERR#9 'Bad file descriptor'
    write(5,"\^A\0\0\0\^A",5)			 = 5 (0x5)
    shutdown(5,SHUT_RDWR)				 = 0 (0x0)
    close(5)					 = 0 (0x0)
    sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,0x0,0x0)		 = 0 (0x0)
    ...
    
    Last edited: Feb 4, 2010
  13. mistwang

    mistwang LiteSpeed Staff

    There should be one ruby process with parent PID of 1, that's the one started by LSWS, the other worker ruby processes are forked by that process.
    So, check that ruby process with ktrace/kdump, see if it is doing getppid(), and the result.

    The configuration change only affect newly started ruby processes, so you may need to restart LSWS the second time to check the effect.
  14. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    still sometimes all lsapi process arent turned off. Why server exit before it checkc that all sapi process are killed?

    (i meen root process not www with not have rights to kill sapi process)

    This is lived lsapi process:

    Code:
    root@core2:/opt/lsws/fcgi-bin# truss -p 68635
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367432.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    nanosleep({0.100000000 })			 = 0 (0x0)
    clock_gettime(13,{1265367433.000000000 })	 = 0 (0x0)
    

    if i do kill 68635 it property kills all childs. Why litespeed don't killed them?
    Last edited: Feb 5, 2010
  15. mistwang

    mistwang LiteSpeed Staff

    Are you sure you removed "LSAPI_PPID_NO_CHECK" completely from env, instead of set it to "0"?
    As long as "LSAPI_PPID_NO_CHECK" is present, regardless value, LSAPI will not check PPID.

    lshttpd process that starts the LSAPI app in suEXEC mode cannot kill it due to permission issue, only the parent lshttpd process running as "root" can, but it has exit already at this stage.
  16. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    Hmm right i set it to 0 strange :)

    of corse parent should to it i think. But i will test it with completly removed LSAPI_PPID_NO_CHECK
  17. Grzegorz Derebecki

    Grzegorz Derebecki New Member

    Ok after few days tests all working ok :)

Share This Page