[solved] Transfers stalling over SSL with Chrome on macOS

#1
This one has plagued us for a while, but I haven't been able to home in on the specific problem. Only recently have I pinned down a specific, repeatable scenario that triggers it.

What we're seeing
Using Chrome on macOS, open these two videos, hosted on our LiteSpeed server, in separate tabs. For repeatability, I recommend using a fresh Incognito window; otherwise, you'll probably need to quit Chrome entirely to get a clean slate for each test. There seems to be "something" in Chrome's profiles (normal or Incognito) that interacts poorly with LiteSpeed.
  • https://social5.com/uploads/GetFound-web.mp4
  • https://social5.com/uploads/GetNoticed-web.mp4
The first video begins playing immediately. The second one stalls/freezes; Chrome's Developer Tools show that only a small number of bytes have been transferred. Note that if you close the first tab (the one playing), the remaining tab will *usually* start loading and playing its video.
Interestingly, something slightly different happens if we load a third video as well. Again in a *new* Incognito window, open the first two videos above in tabs, then also open this one:
  • https://social5.com/uploads/GetSocial-web.mp4
The first video plays, and the second and third videos stall. However, this time, if you close the first tab, the video in the second tab does not start loading and playing.
There seem to be all sorts of small nuances to this issue, but I'm inclined to think it only occurs with Chrome over SSL.

Some interesting additional information
Performing the same tests with these non-SSL versions of the same files, served from the same LiteSpeed server, does not show any odd behavior (the social5.com server is configured to forward all non-SSL requests to their SSL equivalents, so for the time being we can't simply test the URLs above without SSL):
  • http://static.social5.com/GetFound-web.mp4
  • http://static.social5.com/GetNoticed-web.mp4
  • http://static.social5.com/GetSocial-web.mp4
Performing the same tests with the same files served over SSL (or without SSL) from a non-LiteSpeed server (Apache, I believe) does not show any odd behavior either:
  • https://featurific.com/GetFound-web.mp4
  • https://featurific.com/GetNoticed-web.mp4
  • https://featurific.com/GetSocial-web.mp4
Any idea what's happening? I've searched extensively for anyone with a similar issue, or for a setting I might tweak. I'm totally lost, so hopefully someone here can shed some light on this! Fingers crossed. :)
 
#2
Also, for the record, I don't think this has anything to do with the files being videos. Once the connections to the server have stalled, even simply loading https://social5.com stalls. To see this, open the first two SSL videos on social5.com (the first two URLs in the post above), then try loading https://social5.com: the tab does not load.
 
#3
Another bit of info: I tested the same scenario in Chrome on Windows and could not reproduce the problem. I've also confirmed the "bug" on multiple macOS/OS X machines. In all my testing, I used the latest Chrome (version 66) on both macOS and Windows 7.
 
#5
By the way, I just updated our LiteSpeed to the latest version, 5.2.6. After the graceful restart, my first attempt succeeded (!) without showing the problem. My next attempts played the first and second files fine, but not the third. I suspect the trend will continue; that is, eventually only the first file will play and both the second and third will stall.
 

Pong

Administrator
Staff member
#6
We tried these videos in different tabs on macOS High Sierra with Chrome 66, and all of them seem to play well in our lab. We're not sure how to reliably reproduce the issue.
 

mistwang

LiteSpeed Staff
#7
We could not easily reproduce this problem internally with Chrome on Mac.
Can you turn on web server debug logging, then try to reproduce the problem and send us the log file for analysis?
You need to:
* remove the old /usr/local/apache/logs/error_log
* log in to the server WebAdmin, then under the server->log tab, change "Debug Level" to HIGH and restart the web server
* reproduce the problem
* rename /usr/local/apache/logs/error_log to debug_log
* change "Debug Level" back to "NONE", then restart the web server.

Debug logging will fill up the disk pretty quickly, so do this as fast as you can. :)
 
#8
Interesting. When I tried to reproduce the problem this morning, everything initially worked fine. It took several minutes of hitting the server with requests (just me, manually; nothing high-load) before the problem behavior started.

I generated a debug_log file according to your instructions. It's 123 MB! I've sent a link to you, mistwang, via a Conversation.

For what it's worth, although I'm a LiteSpeed novice, I did some initial analysis of the log file and saw some curious entries about GOAWAY frames and SSL connections being shut down/recycled. Maybe that has something to do with it?

If it helps with the analysis, the IP address I used when generating the problem requests was 45.56.4.94. You may also want to search for the .mp4 requests specifically; I made some requests to the main homepage too, and those stalled as well.
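Narrowing a 123 MB debug log down to the relevant requests can be done with a simple streaming filter. The sketch below is a hypothetical helper, not a LiteSpeed tool; it assumes the relevant log lines contain the client IP (45.56.4.94) and the strings ".mp4" or "GOAWAY" verbatim, which depends on the actual debug log format.

```python
from typing import Iterable, Iterator, Tuple

# Values taken from the post above; the keyword list is an assumption about
# what the LiteSpeed debug log lines actually contain.
CLIENT_IP = "45.56.4.94"
KEYWORDS: Tuple[str, ...] = (".mp4", "GOAWAY")


def matching_lines(lines: Iterable[str], ip: str = CLIENT_IP,
                   keywords: Tuple[str, ...] = KEYWORDS) -> Iterator[str]:
    """Yield lines that mention `ip` and at least one of `keywords`."""
    for line in lines:
        if ip in line and any(k in line for k in keywords):
            yield line


def filter_log(path: str, out_path: str) -> int:
    """Read `path` line by line (never loading the whole file into memory),
    write matching lines to `out_path`, and return how many matched."""
    count = 0
    with open(path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in matching_lines(src):
            dst.write(line)
            count += 1
    return count
```

For example, `filter_log("debug_log", "debug_log.filtered")` would write only the matching lines to a second file. If the GOAWAY entries are tagged with a connection ID rather than the client IP, the filter would need to follow those IDs instead.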

If it turns out that the log file doesn't help, please let me know what else I can do on my end to help troubleshoot the issue. I'm happy to try creating another log file, etc.
 

mistwang

LiteSpeed Staff
#9
Thanks for the debug log; it is helpful.
I think it might be related to the QUIC transport; we may have hit a bug. Our developer is looking at it.
In the meantime, can you try turning off QUIC and using just HTTP/2 to serve the video files, to see if that fixes the stalling problem?
 
#10
mistwang, thank you SO much! I disabled QUIC and have been hammering our server with no sign of the problematic behavior described above. I believe this has resolved our issue. Thanks again! If the QUIC bug ends up getting fixed, I'd love to know so we can re-enable it on our server.
 

mistwang

LiteSpeed Staff
#12
Yes, we have pushed a debug build; you can update to it with this command:

/usr/local/lsws/admin/misc/lsup.sh -d -f -v 5.2.7

After that, you can try re-enabling QUIC.
 
#14
Hi, mistwang!

I used the command you provided to upgrade to 5.2.7, then re-enabled QUIC. From what I can see, the problematic behavior is gone. Initial performance is looking good!

Thanks for the quick response and short turnaround time for a fix! :)
 

mistwang

LiteSpeed Staff
#17
Hi Ryan,
Welcome!
5.2.7 build 2 was pushed a few minutes ago.
It was a bug in our QUIC code.
The bug was that a RST_STREAM frame was ignored if the peer had already sent a FIN. This could cause a stall, as the server would keep sending STREAM frames after the reset until it eventually exhausted either the stream or the connection flow-control window.
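To illustrate why ignoring one RST_STREAM could stall unrelated requests on the same connection (such as the homepage load mentioned earlier in the thread), here is a toy Python model of shared connection-level flow control. It is a simplified sketch, not LiteSpeed's QUIC code: the `Stream`/`Connection` classes, the `buggy` switch, the window sizes, and the assumption that the client grants no further credit during the run are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Stream:
    stream_id: int
    pending: int            # bytes the server still wants to send
    window: int             # stream-level flow-control credit
    peer_fin: bool = False  # peer has already sent FIN (request fully sent)
    reset: bool = False     # peer has sent RST_STREAM


@dataclass
class Connection:
    window: int  # connection-level credit, shared by all streams
    streams: Dict[int, Stream] = field(default_factory=dict)

    def on_rst_stream(self, stream_id: int, buggy: bool) -> None:
        stream = self.streams[stream_id]
        if buggy and stream.peer_fin:
            return  # the bug: RST_STREAM ignored because FIN already arrived
        stream.reset = True  # fixed behavior: always stop sending on reset

    def send_round(self, chunk: int) -> Dict[int, int]:
        """Send up to `chunk` bytes on each live stream; return bytes sent per stream."""
        sent: Dict[int, int] = {}
        for stream in self.streams.values():
            if stream.reset or stream.pending == 0:
                continue
            n = min(chunk, stream.pending, stream.window, self.window)
            stream.pending -= n
            stream.window -= n
            self.window -= n
            sent[stream.stream_id] = n
        return sent


def simulate(buggy: bool) -> int:
    """Return how many bytes a request opened after the reset receives."""
    # Toy assumption: the client grants no new credit during the simulation.
    conn = Connection(window=100)
    conn.streams[1] = Stream(1, pending=1_000, window=1_000, peer_fin=True)
    conn.on_rst_stream(1, buggy=buggy)  # client cancels stream 1
    for _ in range(10):
        conn.send_round(chunk=10)       # if buggy, stream 1 drains the shared window
    conn.streams[3] = Stream(3, pending=50, window=1_000, peer_fin=True)
    delivered = 0
    for _ in range(10):
        delivered += conn.send_round(chunk=10).get(3, 0)
    return delivered


print(simulate(buggy=True))   # 0: the new request stalls
print(simulate(buggy=False))  # 50: the new request completes
```

With `buggy=True`, stream 1 keeps consuming the shared connection window after the reset, so stream 3 (standing in for the next video or the homepage request) receives nothing; with the fix, stream 3 is served in full. Real QUIC stacks also replenish credit through window updates, which this toy model deliberately omits.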
 