[solved] Performance Bottleneck

Discussion in 'Install/Configuration' started by Deltabee, Jul 6, 2012.

  1. Deltabee

    Deltabee New Member

    Hi,

    New to LSWS, just trialing a copy out on cloudlinux/cpanel.

    All installed fine, just have a performance issue, so to speak.

    The problem is that we seem to be able to get a very high throughput for a short amount of time, and then the throughput drops to nothing. It seems to be after just over 12K requests.

    We're using ab with the following test:

    ----------------------

    Code:
    
    [root@vhost1 ~]# ab -k -c 10 -t 30 http://ourtestserver:2080/
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/
    
    Benchmarking ourtestserver (be patient)
    Completed 5000 requests
    Completed 10000 requests
    Finished 10875 requests
    
    
    Server Software:        LiteSpeed
    Server Hostname:        ourtestserver 
    Server Port:            2080
    
    Document Path:          /
    Document Length:        13 bytes
    
    Concurrency Level:      10
    Time taken for tests:   32.605 seconds
    Complete requests:      10875
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    0
    Total transferred:      2631750 bytes
    HTML transferred:       141375 bytes
    Requests per second:    333.54 [#/sec] (mean)
    Time per request:       29.982 [ms] (mean)
    Time per request:       2.998 [ms] (mean, across all concurrent requests)
    Transfer rate:          78.82 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0   14 378.3      0   20992
    Processing:     0    2   5.0      1      91
    Waiting:        0    2   4.8      1      90
    Total:          0   15 378.3      1   20992
    
    Percentage of the requests served within a certain time (ms)
      50%      1
      66%      2
      75%      2
      80%      2
      90%      3
      95%      3
      98%      4
      99%     60
     100%  20992 (longest request)
    
    
    ----------------------

    It's hard to see from this output, but those 10875 requests were done very quickly, then it hits a wall.

    If we run the same test immediately after, we get next to no throughput (1578 requests). If we wait 5 mins and run it again, we get the same (similar) 10000+ requests.

    If we run it from a different server immediately afterwards, we get next to no throughput, suggesting it's not a per host throttle (we noticed that was a setting).

    If we run a test to Apache on the same server, we get to 50000 requests indicating that it doesn't seem to be an OS issue.

    Any ideas on what setting might be causing this limit?
    Last edited by a moderator: Jul 8, 2012
  2. webizen

    webizen New Member

    Please post the ab result for the next run (the one where only 1578 requests went through).
  3. Deltabee

    Deltabee New Member

    This is one run straight afterwards:

    Code:
    
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/
    
    Benchmarking ourtestserver (be patient)
    Finished 16 requests
    
    
    Server Software:        LiteSpeed
    Server Hostname:        ourtestserver
    Server Port:            2080
    
    Document Path:          /
    Document Length:        13 bytes
    
    Concurrency Level:      10
    Time taken for tests:   12.016 seconds
    Complete requests:      16
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    0
    Total transferred:      3872 bytes
    HTML transferred:       208 bytes
    Requests per second:    1.33 [#/sec] (mean)
    Time per request:       7510.159 [ms] (mean)
    Time per request:       751.016 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.31 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0 2815 3376.9   2998    9017
    Processing:     0    1   0.3      1       1
    Waiting:        0    0   0.3      0       1
    Total:          0 2816 3376.8   2998    9018
    
    Percentage of the requests served within a certain time (ms)
      50%   2998
      66%   2998
      75%   2999
      80%   2999
      90%   9018
      95%   9018
      98%   9018
      99%   9018
     100%   9018 (longest request)
    
    
    
    Last edited: Jul 7, 2012
  4. webizen

    webizen New Member

    Your connect time in the 2nd run is too long. What kind of hardware do you have? Also check /usr/local/apache/logs/error_log and see if there are any messages.
  5. Deltabee

    Deltabee New Member

    Hi,

    It's an 8 core, 4gb ram, Xen VM.

    Watched the error log and it reports nothing.

    Interestingly, if I run the LSWS benchmark, I can't connect to Apache straight away afterwards, which would suggest it's an OS limit, or something LSWS imposes on the entire OS?
  6. NiteWave

    NiteWave Administrator

    I'm interested in this issue and have time available today. please PM your server's access if it's ok with you.
  7. Deltabee

    Deltabee New Member

    OK, I think it's something to do with the number of connections that are in TIME_WAIT.

    They seem to get to 13000, and then no new TCP connections are allowed.
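
For anyone wanting to check the same thing, here is a minimal sketch that tallies connections per TCP state. It assumes Linux `netstat -ant` output, where the state is the sixth column; the function name `count_tcp_states` is made up for illustration.

```shell
# Tally TCP connection states from `netstat -ant`-style output on stdin.
# Assumes the Linux column layout, where the state is field 6 of each tcp line.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live system (assumes net-tools netstat is installed):
#   netstat -ant | count_tcp_states
```

Running it repeatedly during a benchmark should show whether the TIME_WAIT count climbs towards a ceiling before new connections start stalling.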
  8. Deltabee

    Deltabee New Member

    I've tried adding this:

    net.ipv4.tcp_fin_timeout = 15
    net.ipv4.tcp_max_tw_buckets_ub = 50000
    net.ipv4.tcp_tw_reuse = 1

    to /etc/sysctl.conf

    and rebooted, but the problem is the same.
  9. NiteWave

    NiteWave Administrator

    I noticed something unusual:
    [root@vhost1 ~]# ab -k -c 10 -t 30 http://ourtestserver:2080/
    Keep-Alive requests: 0

    You enabled keep-alive with -k, but the results show that no requests actually used keep-alive.

    Document Path: /
    Document Length: 13 bytes

    What is in those 13 bytes? Is it a PHP file or a plain HTML file?

    You can try a small image file for the keep-alive test and see if keep-alive works.
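
A quick way to spot this mismatch in saved ab output is sketched below. It only reads the `Complete requests` and `Keep-Alive requests` lines that ab prints; the function name `ab_keepalive_ratio` is hypothetical.

```shell
# Read ab output on stdin and print "keepalive/complete" request counts.
# A left-hand value of 0 on a run started with -k means keep-alive was not used.
ab_keepalive_ratio() {
    awk -F': *' '
        /^ *Complete requests:/   { total = $2 }
        /^ *Keep-Alive requests:/ { ka = $2 }
        END { print ka "/" total }
    '
}

# Usage (assumes ab is installed and the URL is reachable):
#   ab -k -c 10 -t 30 http://ourtestserver:2080/ | ab_keepalive_ratio
```

Fed the first report in this thread, it would print 0/10875.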
  10. Deltabee

    Deltabee New Member

    Well spotted!!

    Code:
     ab -k -c 10 -t 60 http://ourtestserver:2080/youstudio/images/joomla_green.gif
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/
    
    Benchmarking ourtestserver (be patient)
    Completed 5000 requests
    Completed 10000 requests
    Completed 15000 requests
    Completed 20000 requests
    Completed 25000 requests
    Completed 30000 requests
    Completed 35000 requests
    Completed 40000 requests
    Completed 45000 requests
    Completed 50000 requests
    Finished 50000 requests
    
    
    Server Software:        LiteSpeed
    Server Hostname:        ourtestserver
    Server Port:            2080
    
    Document Path:          /youstudio/images/joomla_green.gif
    Document Length:        2103 bytes
    
    Concurrency Level:      10
    Time taken for tests:   4.245 seconds
    Complete requests:      50000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    49510
    Total transferred:      122181870 bytes
    HTML transferred:       105150000 bytes
    Requests per second:    11779.81 [#/sec] (mean)
    Time per request:       0.849 [ms] (mean)
    Time per request:       0.085 [ms] (mean, across all concurrent requests)
    Transfer rate:          28110.92 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       1
    Processing:     0    1   1.2      1      79
    Waiting:        0    1   1.2      1      79
    Total:          0    1   1.2      1      79
    
    Percentage of the requests served within a certain time (ms)
      50%      1
      66%      1
      75%      1
      80%      1
      90%      1
      95%      2
      98%      2
      99%      2
     100%     79 (longest request)
    
    and on the server

    Code:
    root@cloudhost1 [~]# netstat -an | grep 'TIME_WAIT' |wc -l
    495
    
    no more 13,000 TIME_WAIT

    So the index.html is just a file that has

    Code:
    <html>
    <head>
    </head>
    <body>
    <h1>test</h1>
    </body>
    </html>
    
    in it.

    If I point ab directly at the file, as in http://ourtestserver/index.html, I get the same thing.
  11. Deltabee

    Deltabee New Member

    Just as a control:

    If we take out the -k and test on apache, we get the same thing.

    So two issues:

    1) Why doesn't LSWS use KA for our index.html?
    2) How do I get round this 13,000 connection limit! :)
  12. Deltabee

    Deltabee New Member

    Ok found the problem.

    If we turn smart keep alives off, it works like this:

    Code:
    Server Software:        LiteSpeed
    Server Hostname:        ourtestserver
    Server Port:            2080
    
    Document Path:          /index.html
    Document Length:        59 bytes
    
    Concurrency Level:      10
    Time taken for tests:   2.599 seconds
    Complete requests:      50000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    49510
    Total transferred:      16281870 bytes
    HTML transferred:       2950000 bytes
    Requests per second:    19235.25 [#/sec] (mean)
    Time per request:       0.520 [ms] (mean)
    Time per request:       0.052 [ms] (mean, across all concurrent requests)
    Transfer rate:          6116.91 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       1
    Processing:     0    1   0.2      0       3
    Waiting:        0    0   0.2      0       3
    Total:          0    1   0.2      0       3
    ERROR: The median and mean for the processing time are more than twice the standard
           deviation apart. These results are NOT reliable.
    ERROR: The median and mean for the total time are more than twice the standard
           deviation apart. These results are NOT reliable.
    
    Percentage of the requests served within a certain time (ms)
      50%      0
      66%      1
      75%      1
      80%      1
      90%      1
      95%      1
      98%      1
      99%      1
     100%      3 (longest request)
    
    The only problem is that a simple PHP file that just calls phpinfo() still doesn't seem to use KAs. Any ideas?
  13. NiteWave

    NiteWave Administrator

    lsws admin console->Server->Tuning->Smart Keep-Alive

    probably current value is : Yes
    set it to: No

    and test again.
  14. Deltabee

    Deltabee New Member

    See above :)

    The only problem now is php files not using KAs for some reason.
  15. Deltabee

    Deltabee New Member

    ok changed the content of the file, and it started using KA for php.

    Any ideas on how to overcome this 13,000 tcp connection limit for content that doesn't use a KA for whatever reason?
  16. NiteWave

    NiteWave Administrator

    Maybe it's a conntrack issue. When the ab test is running very slowly, please check:
    #sysctl net.ipv4.netfilter.ip_conntrack_max
    #sysctl net.ipv4.netfilter.ip_conntrack_count
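
The two values can be combined into a single utilisation figure. The sketch below is only an illustration: `conntrack_usage_pct` is a hypothetical helper, and the sysctl names are the ones quoted above, which may differ on other kernels.

```shell
# Print conntrack table usage as a whole-number percentage (rounded down).
# Arguments: $1 = current entry count, $2 = table maximum.
conntrack_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# On a live host the inputs could be read with `sysctl -n`, for example:
#   conntrack_usage_pct \
#       "$(sysctl -n net.ipv4.netfilter.ip_conntrack_count)" \
#       "$(sysctl -n net.ipv4.netfilter.ip_conntrack_max)"
```

A figure close to 100 while ab is stalling would point at the connection-tracking table being full. On a virtualised setup it is worth running the check on the host as well as inside the guest.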
  17. Deltabee

    Deltabee New Member

    Ok found the issue.

    While the VM had a high net.ipv4.netfilter.ip_conntrack_max value, the dom0 XenServer didn't.

    All the connections of the guests count against the max value of the dom0 server.

    Increased the limit by putting the following into /etc/sysctl.conf:

    net.ipv4.netfilter.ip_conntrack_max = 131072

    All works perfectly now. Probably will re-enable smart keepalives.

    Consider this issue solved. Thanks for your help, enterprise license bought.

    PS: with Apache I was getting about 300 req/s on a PHP script; with LiteSpeed I get 20851 req/s, almost 70 times the performance!
