One factor is what "ab" does at the end of a benchmark run, which can affect one web server more than the others. To reduce this effect, combine several short runs into one long run, so that the end-of-run behavior makes up a smaller fraction of the total measured time.
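To show why one long run helps, here is a small arithmetic sketch. It rests on a simplifying assumption that is not from the original text: the server handles requests at a steady `true_rps`, and every run adds a fixed `tail_seconds` at the end. The function name and both parameters are hypothetical and only illustrate how a fixed per-run overhead gets amortized.

```python
def measured_rps(total_requests: int, true_rps: float, tail_seconds: float) -> float:
    """Requests/sec a benchmark would report under a hypothetical model:
    steady service at true_rps, plus a fixed tail_seconds of end-of-run
    time added to every run."""
    elapsed = total_requests / true_rps + tail_seconds
    return total_requests / elapsed


# The same fixed tail distorts short runs far more than long ones.
for n in (1_000, 10_000, 100_000):
    print(n, round(measured_rps(n, true_rps=1000.0, tail_seconds=0.5), 1))
# 1000 666.7
# 10000 952.4
# 100000 995.0
```

Under this model, averaging ten runs of 1,000 requests still reports about 667 requests/sec, while a single run of 10,000 requests (for example, by raising ab's `-n` request count) reports about 952, much closer to the assumed true rate.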
Another factor is the web server's concurrency model: prefork, threaded, or event-driven. Prefork, which dedicates a separate process to each connection, clearly scales poorly as concurrency grows.