Discussion in 'Feedback/Feature Requests' started by cyberzen, Oct 23, 2004.
Could you include more web servers in your benchmark, like thttpd and boa?
A more complete benchmark result will be available soon.
I suggest that anyone who wants other web servers included in the benchmark post in this thread.
Another web server to benchmark: lighttpd
Sure, if we have additional time to play with it. Our focus is on Apache though.
We tried it before, and the results showed that its keep-alive performance is not as good as boa's.
Any news about more benchmarks?
New benchmarks will be released with the 2.0 release.
Something's not right in the new benchmark.
According to the product page, LiteSpeed Standard cannot handle more than 300 concurrent connections, yet all the benchmarks show it handling up to 1000 concurrent connections. What gives?
It's not only LSWS: by default, Apache can only have 255 concurrent connections, yet it is still able to survive the test at a concurrency level of 1000.
It is true that at any given time, the Standard Edition can have at most 300 concurrent connections. However, it can serve more clients when it does not keep connections alive. LSWS switches to non-keepalive mode when the total number of concurrent connections reaches a certain level. That's why it can successfully pass a test like "ab -n xxxxx -c 1000 -k ..." with most requests served non-keepalive.
ApacheBench still considers the concurrency level to be 1000.
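A rough sketch of the switch described above, for illustration only. This is not LSWS code: the function name `connection_header` and the `keepalive_cutoff` parameter are made up here, and the actual level at which LSWS changes mode is not stated in this thread.

```python
def connection_header(active_connections, keepalive_cutoff, client_wants_keepalive):
    """Pick the Connection header for a response.

    keepalive_cutoff is a hypothetical threshold: once the server already
    holds that many connections, it answers "close" even if the client
    (for example "ab -k") asked for keep-alive, so the slot is freed
    right after the response.
    """
    if client_wants_keepalive and active_connections < keepalive_cutoff:
        return "keep-alive"
    return "close"
```

With a cutoff below the connection limit, a client that always asks for keep-alive still gets most of its requests served non-keepalive once the server is busy, which matches what ab reports in the test above.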
Regarding the lighttpd benchmark results: the performance would have been drastically better if the event handler had been set up correctly (linux-sysepoll). But you probably know that already, and that's why it is commented out in the config...?
It is just the opposite. We tried sysepoll, and the score was lower than with poll() for the concurrency levels we test. That holds not only for lighttpd but also for LiteSpeed and other web servers we tested before, like 'userver'.
You would probably only see the benefit of sys_epoll at a concurrency level of around 10,000.
Just do a simple over the network test, not over the loopback, and see the result yourself.
It is true that epoll is slightly slower than poll when used in the same manner. However, if the design of the work loop is changed to actually use epoll's ability to store an address directly identifying the data set/class member, instead of going the long way via the port number, the increase should be significant. Then epoll can easily gain several percent in speed over poll. At least that's my experience from testing several worker core solutions for hssTVS.
If I ever manage to get hssTVS feature complete you can add it to your benchmark too. :wink: It should be around as fast as your professional version for static content (Linux only though) currently, but it is a long way from even offering the most basic features of your servers and it still lacks any support for dynamic content.
It's always a good idea to use each API in the most efficient way.
But there are weaknesses in epoll that make it less efficient than poll in such intensive performance tests. Because the epoll API can only handle one file handle at a time, there are too many kernel/user-land context switches. For example, three context switches are required for each non-keepalive connection just for event handling, while poll() only needs one. poll() can combine multiple events in one function call; epoll() cannot. In this kind of performance test, multiple events are usually combined and reported by one poll() call, whereas epoll() has to fetch them one by one.
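To make the counting above concrete, here is a small Python sketch (Linux only) of one non-keepalive connection handled through epoll. Reading the "three" as add, modify and delete is an assumption; the post does not say which calls were counted. `CountingEpoll` and `serve_one_connection` are names invented for this sketch. In Python's `select.epoll`, each of `register()`, `modify()` and `unregister()` issues one `epoll_ctl()` call.

```python
import select
import socket


class CountingEpoll:
    """Thin wrapper around select.epoll that counts interest-set changes."""

    def __init__(self):
        self._ep = select.epoll()
        self.ctl_calls = 0   # number of epoll_ctl() calls issued
        self.wait_calls = 0  # number of epoll_wait() calls issued

    def register(self, fd, mask):
        self.ctl_calls += 1
        self._ep.register(fd, mask)

    def modify(self, fd, mask):
        self.ctl_calls += 1
        self._ep.modify(fd, mask)

    def unregister(self, fd):
        self.ctl_calls += 1
        self._ep.unregister(fd)

    def poll(self, timeout):
        self.wait_calls += 1
        return self._ep.poll(timeout)

    def close(self):
        self._ep.close()


def serve_one_connection(ep, conn):
    """Handle one request/response pair, then close the connection."""
    ep.register(conn.fileno(), select.EPOLLIN)   # epoll_ctl(ADD)
    ep.poll(1.0)                                 # wait until the request is readable
    request = conn.recv(4096)
    ep.modify(conn.fileno(), select.EPOLLOUT)    # epoll_ctl(MOD)
    ep.poll(1.0)                                 # wait until the socket is writable
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    ep.unregister(conn.fileno())                 # epoll_ctl(DEL)
    conn.close()
    return request


def demo():
    # A socketpair stands in for an accepted client connection.
    client, server = socket.socketpair()
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")
    ep = CountingEpoll()
    try:
        request = serve_one_connection(ep, server)
        reply = client.recv(4096)
    finally:
        ep.close()
        client.close()
    return ep.ctl_calls, request, reply
```

With poll(), the interest set is passed in on every call, so the same three changes would be plain user-space bookkeeping. Whether that difference shows up in a benchmark depends on how the work loop is designed.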
The design of kqueue() in FreeBSD is better.
We can include your hssTVS in the next benchmark update if it is feature complete and ready for production use by that time.
My view of epoll is quite different. In my opinion it is superior. With poll you have to go through your complete fd set when it returns to find the active descriptors, and you additionally have to look up the right work data set for each. With epoll you just register the interesting descriptors once and it returns only the ones that have been active since the last call. This can very well be multiple data sets, limited only by the size of the provided buffer. So, if 20 of 200 open connections were active, I get 20 data sets at once, each ready with the pointer to the class containing the information and the methods to handle it. The problems that come with the needed add/modify/delete calls are largely a problem of the design philosophy that's common among event-driven servers. I found ways to work around that, and that's part of the reason my core is so fast.
In case I finish hssTVS far enough to be worthy I'll contact you. But don't hold your breath, I don't have the time I'd need to get there any time soon. Feel free to check the current version. Most parts needed for the dynamic content are already in there, so a more final version shouldn't be slower than build 178.
Recent benchmarks from TVS:
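The "20 out of 200" case above can be tried with a short Python sketch (Linux only). It is only an illustration and has nothing to do with the hssTVS code: the `Connection` class and the `demo` function are invented here. Python's `select.epoll` only hands back the file descriptor, so the sketch looks up the per-connection object in a dict; in C, the `data.ptr` field of `struct epoll_event` can carry the pointer directly, which is the shortcut described in the post.

```python
import select
import socket


class Connection:
    """Hypothetical per-connection state, standing in for the 'data set'."""

    def __init__(self, sock, name):
        self.sock = sock
        self.name = name

    def handle_readable(self):
        return self.name, self.sock.recv(4096)


def demo(total=200, active=20):
    ep = select.epoll()
    by_fd = {}   # fd -> Connection (a C server could use epoll_event.data.ptr)
    peers = []
    try:
        for i in range(total):
            ours, theirs = socket.socketpair()
            by_fd[ours.fileno()] = Connection(ours, "conn-%d" % i)
            # Register once; the kernel keeps the interest set from now on.
            ep.register(ours.fileno(), select.EPOLLIN)
            peers.append(theirs)

        # Only the first `active` connections receive data.
        for peer in peers[:active]:
            peer.sendall(b"ping")

        # One call returns every ready descriptor, and only those.
        events = ep.poll(1.0)
        return [by_fd[fd].handle_readable() for fd, _mask in events]
    finally:
        ep.close()
        for conn in by_fd.values():
            conn.sock.close()
        for peer in peers:
            peer.close()
```

Calling `demo()` returns one `(name, data)` pair per active connection from a single `ep.poll()` call; the 180 idle connections are never touched.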
"Update to 2.0... The 2.0 Version uses poll and is overall 11% faster for
non-keep-alive requests and around 22% faster when keep alive is enabled.
I am somewhat baffled though about the very low number shown at 250 concurrent
keep-alive connections. The status of ZB shows that the number of kept-alive
requests is by far too low (5056 from 10000), especially as the max concurrent
value was set to 300(default setting). Here it was even slower than 1.5. Well,
while it is considerably faster than the old version 1.5 I tested before, it
still fails to overcome Boa or even my dptTVS. Actually, comparing my numbers
with the results they show on their page, my hssTVS is able to beat the
professional LiteSpeed and comes even near Tux 2.2, so I have every right to
go party tonight. The 2.0 Version is also considerably faster when it comes
to bigger files. So, in case you still use 1.5 you really should upgrade to
this newer version."
Is there any plan to add zero-copy support (sendfile/sendfilev) to the Standard edition? If not, what is the anticipated perf/CPU boost of the feature in the Pro edition?
No plan yet.
sendfile() will greatly reduce CPU utilization when serving large files; for smaller files, the impact is minimal.
For serving a large set of static files, the Enterprise edition should be used even on a single-CPU server, as disk I/O wait will slow down any single-process event-driven server unless aio is used for disk I/O.
For zero-copy sendfile to work you'll also need hardware that supports this feature. So far I've only found some rather expensive 1 Gbit cards with such support. But it's still more effective than pumping the data with read/write, and the usage is straightforward and easy to implement.
At least the Linux 2.6 kernel supports non-blocking file access. But I guess portability concerns don't allow optimizing the code to use that feature. On the other hand, it still seems to lack signal support for normal file descriptors.
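For anyone who wants to compare the two approaches, here is a minimal Python sketch (Linux only) that sends the same file once with `os.sendfile()` and once with a plain read/write loop. It is not taken from any of the servers discussed here; the function names and the socketpair test harness are invented for illustration, and the sketch makes no claim about the size of the CPU saving.

```python
import os
import socket
import tempfile
import threading


def send_with_sendfile(sock, path):
    """Send a whole file with os.sendfile(); the data is not copied into user space."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


def send_with_read_write(sock, path, bufsize=65536):
    """Send a whole file by reading it into a user-space buffer and writing it out."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def receive_all(sock, sink):
    """Drain the socket until the sender closes its end."""
    while True:
        data = sock.recv(65536)
        if not data:
            return
        sink.append(data)


def demo(send_func, payload):
    """Write payload to a temp file, send it over a socketpair, return what arrived."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    sink = []
    reader = threading.Thread(target=receive_all, args=(receiver, sink))
    reader.start()
    try:
        sent = send_func(sender, path)
    finally:
        sender.close()   # signals end of stream to the reader thread
        reader.join()
        receiver.close()
        os.unlink(path)
    return sent, b"".join(sink)
```

Both functions deliver identical bytes; the difference is that the read/write loop moves every chunk through a user-space buffer, which is where the extra CPU time for large files goes.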
I suppose this is indeed the first impression of most people looking at the benchmarks (mine too). So maybe the servers should be measured with epoll and plain poll too, just to show there is no malice.
aio_sendfile() will be the killer.
And maybe a little late but could you add Cherokee httpd to the benchmark? Thanks.
Yes, could you please include hssTVS in a new benchmark as well? I would be interested to see how it compares in serving up static content (say a 10 KB static file) vs LSWS Pro/Enterprise.
Thanks for the great work guys!