[solved] Performance Bottleneck

#1
Hi,

New to LSWS, just trialing a copy on CloudLinux/cPanel.

All installed fine, just have a performance issue, so to speak.

The problem is that we seem to be able to get a very high throughput for a short amount of time, and then the throughput drops to nothing. It seems to be after just over 12K requests.

We're using ab with the following test:

----------------------

Code:
[root@vhost1 ~]# ab -k -c 10 -t 30 http://ourtestserver:2080/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking ourtestserver (be patient)
Completed 5000 requests
Completed 10000 requests
Finished 10875 requests


Server Software:        LiteSpeed
Server Hostname:        ourtestserver 
Server Port:            2080

Document Path:          /
Document Length:        13 bytes

Concurrency Level:      10
Time taken for tests:   32.605 seconds
Complete requests:      10875
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      2631750 bytes
HTML transferred:       141375 bytes
Requests per second:    333.54 [#/sec] (mean)
Time per request:       29.982 [ms] (mean)
Time per request:       2.998 [ms] (mean, across all concurrent requests)
Transfer rate:          78.82 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   14 378.3      0   20992
Processing:     0    2   5.0      1      91
Waiting:        0    2   4.8      1      90
Total:          0   15 378.3      1   20992

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      2
  75%      2
  80%      2
  90%      3
  95%      3
  98%      4
  99%     60
 100%  20992 (longest request)
----------------------

It's hard to see from this output, but those 10875 requests were done very quickly, then it hits a wall.

If we run the same test immediately after, we get next to no throughput (1578 requests). If we wait 5 mins and run it again, we get the same (similar) 10000+ requests.

If we run it from a different server immediately afterwards, we get next to no throughput, suggesting it's not a per host throttle (we noticed that was a setting).

If we run a test to Apache on the same server, we get to 50000 requests indicating that it doesn't seem to be an OS issue.

Any ideas on what setting might be causing this limit?
 
#3
This is one run straight afterwards

Code:
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking ourtestserver (be patient)
Finished 16 requests


Server Software:        LiteSpeed
Server Hostname:        ourtestserver
Server Port:            2080

Document Path:          /
Document Length:        13 bytes

Concurrency Level:      10
Time taken for tests:   12.016 seconds
Complete requests:      16
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      3872 bytes
HTML transferred:       208 bytes
Requests per second:    1.33 [#/sec] (mean)
Time per request:       7510.159 [ms] (mean)
Time per request:       751.016 [ms] (mean, across all concurrent requests)
Transfer rate:          0.31 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0 2815 3376.9   2998    9017
Processing:     0    1   0.3      1       1
Waiting:        0    0   0.3      0       1
Total:          0 2816 3376.8   2998    9018

Percentage of the requests served within a certain time (ms)
  50%   2998
  66%   2998
  75%   2999
  80%   2999
  90%   9018
  95%   9018
  98%   9018
  99%   9018
 100%   9018 (longest request)
 

webizen

Well-Known Member
#4
Your connect time in the 2nd run is too long. What kind of hardware have you got? Also check /usr/local/apache/logs/error_log and see if there are any messages.
 
#5
Hi,

It's an 8 core, 4gb ram, Xen VM.

Watched the error log and it reports nothing.

Interestingly, if I run the LSWS benchmark, I can't connect to Apache straight away afterwards, which would suggest it's an OS limit, or something LSWS imposes on the entire OS?
 
#7
OK, I think it's something to do with the number of connections that are in TIME_WAIT.

They seem to get to 13000 and then not allow any new TCP connections.
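
To see how connections are spread across all TCP states, rather than counting TIME_WAIT alone, something like the following could be used. It is only a sketch: the function name `tally_tcp_states` is made up for illustration, and it assumes `netstat -ant` style output in which the state is the sixth column.

```shell
# Tally TCP connection states from `netstat -ant`-style text on stdin.
# Assumes the usual column layout: Proto Recv-Q Send-Q Local Foreign State.
tally_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# On a live system (not run here):
#   netstat -ant | tally_tcp_states
```

A TIME_WAIT figure that climbs steadily during a benchmark and then plateaus as new connections stall would be consistent with the behaviour described above.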
 
#8
I've tried adding this:

net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_tw_buckets_ub = 50000
net.ipv4.tcp_tw_reuse = 1

to /etc/sysctl.conf

and rebooted, but the problem is the same.
 

NiteWave

Administrator
#9
I noticed the following, which is unusual:
[root@vhost1 ~]# ab -k -c 10 -t 30 http://ourtestserver:2080/
Keep-Alive requests: 0

You enabled keep-alive with -k, but the actual test shows no keep-alive requests.

Document Path: /
Document Length: 13 bytes

What is the content of those 13 bytes? Is it a PHP or a plain HTML file?

You can try a small image file for the keep-alive test and see if keep-alive works.
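
A quick way to check whether keep-alive was actually used is to compare the "Keep-Alive requests" line with "Complete requests" in the ab report. The helper below is a hypothetical sketch (the name `keepalive_percent` is not part of ab); it reads a saved ab report on stdin and prints the share of requests served over a kept-alive connection as a whole percentage.

```shell
# Print the percentage of requests that reused a connection,
# based on the summary lines of an ApacheBench report read from stdin.
keepalive_percent() {
    awk -F: '
        /^Complete requests/   { total = $2 + 0 }
        /^Keep-Alive requests/ { ka = $2 + 0 }
        END {
            if (total > 0) printf "%d\n", ka * 100 / total
            else print "n/a"
        }
    '
}

# Example (not run here):
#   ab -k -c 10 -t 30 http://ourtestserver:2080/ | keepalive_percent
```

A result of 0 with -k set means every request opened a new TCP connection, which is what fills up TIME_WAIT on short benchmarks.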
 
#10
Well spotted!!

Code:
 ab -k -c 10 -t 60 http://ourtestserver:2080/youstudio/images/joomla_green.gif
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking ourtestserver (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests


Server Software:        LiteSpeed
Server Hostname:        ourtestserver
Server Port:            2080

Document Path:          /youstudio/images/joomla_green.gif
Document Length:        2103 bytes

Concurrency Level:      10
Time taken for tests:   4.245 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    49510
Total transferred:      122181870 bytes
HTML transferred:       105150000 bytes
Requests per second:    11779.81 [#/sec] (mean)
Time per request:       0.849 [ms] (mean)
Time per request:       0.085 [ms] (mean, across all concurrent requests)
Transfer rate:          28110.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     0    1   1.2      1      79
Waiting:        0    1   1.2      1      79
Total:          0    1   1.2      1      79

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      2
  98%      2
  99%      2
 100%     79 (longest request)
and on the server

Code:
root@cloudhost1 [~]# netstat -an | grep 'TIME_WAIT' |wc -l
495
No more 13,000 connections in TIME_WAIT.

So the index.html is just a file that has

Code:
<html>
<head>
</head>
<body>
<h1>test</h1>
</body>
</html>
in it.

If I point ab directly at the file, as in http://ourtestserver/index.html, I get the same thing.
 
#11
Just as a control:

If we take out the -k and test on apache, we get the same thing.

So two issues:

1) Why doesn't LSWS use KA for our index.html?
2) How do I get round this 13,000 connection limit? :)
 
#12
Ok found the problem.

If we turn smart keep alives off, it works like this:

Code:
Server Software:        LiteSpeed
Server Hostname:        ourtestserver
Server Port:            2080

Document Path:          /index.html
Document Length:        59 bytes

Concurrency Level:      10
Time taken for tests:   2.599 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    49510
Total transferred:      16281870 bytes
HTML transferred:       2950000 bytes
Requests per second:    19235.25 [#/sec] (mean)
Time per request:       0.520 [ms] (mean)
Time per request:       0.052 [ms] (mean, across all concurrent requests)
Transfer rate:          6116.91 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     0    1   0.2      0       3
Waiting:        0    0   0.2      0       3
Total:          0    1   0.2      0       3
ERROR: The median and mean for the processing time are more than twice the standard
       deviation apart. These results are NOT reliable.
ERROR: The median and mean for the total time are more than twice the standard
       deviation apart. These results are NOT reliable.

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%      3 (longest request)
The only problem is that on a simple PHP file that just calls phpinfo(), it still doesn't seem to use KAs. Any ideas?
 
#15
OK, changed the content of the file, and it started using KA for PHP.

Any ideas on how to overcome this 13,000 tcp connection limit for content that doesn't use a KA for whatever reason?
 

NiteWave

Administrator
#16
Maybe a conntrack issue. When the ab test is running very slowly, please check:
# sysctl net.ipv4.netfilter.ip_conntrack_max
# sysctl net.ipv4.netfilter.ip_conntrack_count
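
The two values are easier to read as a fill level. The sketch below assumes the legacy `net.ipv4.netfilter.ip_conntrack_*` key names used in this thread (newer kernels expose them as `net.netfilter.nf_conntrack_count` and `net.netfilter.nf_conntrack_max`); the function name `conntrack_usage` and the numbers in the example are purely illustrative.

```shell
# Print how full the connection tracking table is, as a whole percentage.
# $1 = current entry count, $2 = configured maximum.
conntrack_usage() {
    if [ "$2" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}

# Illustrative numbers only: 13000 tracked connections against a limit of 16384.
conntrack_usage 13000 16384

# On a live system (not run here):
#   conntrack_usage "$(sysctl -n net.ipv4.netfilter.ip_conntrack_count)" \
#                   "$(sysctl -n net.ipv4.netfilter.ip_conntrack_max)"
```

A value close to 100 while the benchmark stalls would point at the conntrack table rather than at the web server. On a virtualised setup it may be worth running the same check on the host as well as in the guest.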
 
#17
Ok found the issue.

While the VM had a high net.ipv4.netfilter.ip_conntrack_max value, the dom0 XenServer didn't.

All the connections of the guests count against the max value of the dom0 server.

Increased the limit by putting the following into /etc/sysctl.conf:

net.ipv4.netfilter.ip_conntrack_max = 131072

All works perfectly now. Probably will re-enable smart keepalives.
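
For anyone scripting the same change, a setting like this can be written to the config file idempotently so that repeated runs do not add duplicate lines. This is a rough sketch, not a tested tool: `set_sysctl_conf` is a made-up helper name, it assumes GNU sed (for `-i`), and the dots in the key are treated as regex wildcards, which is good enough for a key this specific.

```shell
# Set "key = value" in a sysctl.conf-style file:
# update the line if the key is already present, otherwise append it.
# $1 = key, $2 = value, $3 = path to the config file.
set_sysctl_conf() {
    if grep -q "^[[:space:]]*$1[[:space:]]*=" "$3" 2>/dev/null; then
        sed -i "s|^[[:space:]]*$1[[:space:]]*=.*|$1 = $2|" "$3"
    else
        printf '%s = %s\n' "$1" "$2" >> "$3"
    fi
}

# Example (as root, not run here):
#   set_sysctl_conf net.ipv4.netfilter.ip_conntrack_max 131072 /etc/sysctl.conf
#   sysctl -p    # reload /etc/sysctl.conf without rebooting
```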

Consider this issue solved. Thanks for your help, enterprise license bought.

PS: with Apache I was getting about 300 req/s on a PHP script; with LiteSpeed I get 20851 req/s, almost 70 times the performance!
 