LiteSpeed Support Forums

Clockwork 08-11-2010 06:17 AM

lslb - ExtConn timed out while connecting.
 
My lslb error.log is getting flooded by the following notice:

Quote:

2010-08-11 15:10:42.000 NOTICE [xxx] ExtConn timed out while connecting.
The site is working fine, but what does this message mean?

edit:

debug stuff:

Quote:

2010-08-11 18:02:55.000 NOTICE [ip:52299-21#sitename:loadbalancer] ExtConn timed out while connecting.
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] connection to [192.168.0.3:80] on request #1, error: Connection timed out!
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] [ExtConn] close()
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] HttpExtConnector::tryRecover()...
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] trying to recover from connection problem, attempt: #1!
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] Get SESSION_ID from COOKIE: [hash].
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] Found worker [clusterHTTP_s2] by strategy [0].
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] [LB] retry worker: [clusterHTTP_s2]
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] trying to recover from connection problem, attempt: #1!
2010-08-11 18:02:55.000 DEBUG [192.168.0.4:80] connection available!
2010-08-11 18:02:55.000 DEBUG [192.168.0.4:80] request [ip:52299-21#sitename:loadbalancer] is assigned with connection!
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] [ExtConn] reconnect()
2010-08-11 18:02:55.000 DEBUG [ip:52299-21#sitename:loadbalancer] [ExtConn] connecting to [192.168.0.4:80]...
edit:

sometimes I'm getting the following warning:

Quote:

2010-08-11 19:09:05.000 NOTICE [clusterHTTP_s4] PingConn timed out while connecting.
2010-08-11 19:09:05.000 WARN [192.168.0.5:80] Failure detected: Connection Failure, 110:Connection timed out
2010-08-11 19:09:05.000 NOTICE [clusterHTTP_s2] PingConn timed out while connecting.
2010-08-11 19:09:05.000 WARN [192.168.0.4:80] Failure detected: Connection Failure, 110:Connection timed out
2010-08-11 19:09:05.899 INFO [192.168.0.4:80] Fail all outstanding requests!
2010-08-11 19:09:05.899 INFO [192.168.0.4:80] Fail all outstanding requests!
2010-08-11 19:09:06.000 NOTICE [ip:1612-0#sitename] ExtConn timed out while connecting.
2010-08-11 19:09:06.000 INFO [192.168.0.5:80] Fail all outstanding requests!
The problem started after we moved our database server's connection to a gigabit port, but that change didn't affect the load balancer or the web servers; it only improved the page load speed.

By the way, we saw the "ExtConn timed out while connecting." notice occasionally before, but not nearly as often as now.

edit:

nginx seems to load-balance without any problems, so this looks like an lslb problem.
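
To see which backend the timeouts concentrate on, the "connection to [...] error" DEBUG lines quoted above can be tallied per target address. A rough Python sketch, assuming the log format matches those excerpts; `count_connect_errors` is an illustrative name, not part of lslb:

```python
import re
from collections import Counter

# Matches DEBUG lines such as:
#   ... connection to [192.168.0.3:80] on request #1, error: Connection timed out!
CONNECT_ERROR = re.compile(
    r"connection to \[([^\]]+)\] on request #\d+, error: (.+?)!?\s*$"
)

def count_connect_errors(lines):
    """Return a Counter keyed by (backend address, error text)."""
    counts = Counter()
    for line in lines:
        match = CONNECT_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Calling `count_connect_errors(open("error.log"))` (path assumed) and printing `.most_common()` shows whether one node dominates or all of them fail about equally.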

mistwang 08-11-2010 11:33 AM

Please run the command "telnet 192.168.0.4 80" from the command line several times and see whether you sometimes get a long delay connecting to the target server.
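
If telnet is not available, the same repeated-connect check can be scripted. A minimal sketch, assuming Python 3 on the load balancer; the function names, the attempt count and the 0.5 s "slow" threshold are arbitrary choices for illustration:

```python
import socket
import time

def time_connects(host, port, attempts=20, timeout=5.0):
    """Open `attempts` TCP connections; return a list of (seconds, error) tuples."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        error = None
        try:
            # Connect and close immediately; only the handshake time matters here.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:  # includes timeouts and refused connections
            error = str(exc)
        results.append((time.monotonic() - start, error))
    return results

def summarize(results, slow_threshold=0.5):
    """Count failed and slow connects and report the longest attempt."""
    durations = [d for d, _ in results]
    return {
        "attempts": len(results),
        "failed": sum(1 for _, e in results if e is not None),
        "slow": sum(1 for d, e in results if e is None and d >= slow_threshold),
        "max_seconds": max(durations) if durations else 0.0,
    }
```

For example, `summarize(time_connects("192.168.0.4", 80))`. On a healthy LAN every connect should finish in about a millisecond, so attempts that take a second or more would suggest that a handshake packet was dropped and retransmitted.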

GaryT 08-11-2010 01:19 PM

edit: wrong section

Clockwork 08-11-2010 01:27 PM

I don't have telnet installed, but I tried it with nmap and nc; no problems so far.

Clockwork 08-12-2010 12:54 PM

Oops, could someone move this topic to the load balancer forum? My mistake.

I've switched to nginx until there is a solution, since lslb isn't running stably at the moment. I hope you can help us fix this problem: lslb is our DDoS protection and performs far better than nginx.

mistwang 08-12-2010 01:58 PM

No problem, moved.
Did you specify the source IP when you configured each node?
It looks like lslb has problems connecting to all of the backend servers. Could it be a problem with a NIC port or a switch port? If you use a dedicated connection to communicate with the backend servers, you can check the packet loss on that specific NIC.
LSLB uses persistent connections while nginx does not, so there could be more ESTABLISHED connections with LSLB. Is there a firewall between LSLB and the web servers?

If you do think it is a LSLB bug, could you strace lslbd while the problem is happening to help analyze the cause of the problem?
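
One way to compare the number of ESTABLISHED connections per backend under lslb and under nginx is to read the kernel's connection table. A sketch, assuming a Linux host with IPv4 backends and a little-endian CPU (such as x86); the function names are illustrative:

```python
import socket
import struct
from collections import Counter

TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp

def decode_endpoint(field):
    """Turn '0400A8C0:0050' into ('192.168.0.4', 80) on a little-endian host."""
    addr_hex, port_hex = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)

def established_by_remote(proc_net_tcp_text):
    """Count ESTABLISHED sockets per remote (address, port)."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            counts[decode_endpoint(fields[2])] += 1
    return counts
```

Usage: `established_by_remote(open("/proc/net/tcp").read())`, run once while lslb is serving traffic and once with nginx.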

Clockwork 08-12-2010 02:49 PM

clusterHTTP config:
<nodeAddresses>(s1)127.0.0.1->192.168.0.3, (s2)127.0.0.1->192.168.0.4, (s4)127.0.0.1->192.168.0.5</nodeAddresses>

clusterStatic config:
<nodeAddresses>(s3)127.0.0.1->192.168.0.1:81</nodeAddresses>

Quote:

could it be a problem with NIC port, switch port?
I'll ask my provider if he could check the ports.

Quote:

you can check the packet loss of that specific NIC
--- 192.168.0.3 ping statistics ---
272 packets transmitted, 263 received, 3% packet loss, time 271515ms
rtt min/avg/max/mdev = 0.110/1.030/10.585/1.856 ms

Quote:

LSLB uses persistent connections
I've disabled persistent connections in both clusters that I use.

Quote:

Is there a firewall between LSLB and web servers?
nope

Quote:

If you do think it is a LSLB bug, could you strace lslbd while the problem is happening to help analyze the cause of the problem?
I'll do that, but first I need to read some strace how-tos :p

mistwang 08-12-2010 03:24 PM

I think the problem is the source IP: you should either use a 192.168.0.x IP assigned to that server or not set a source IP at all.

Clockwork 08-12-2010 03:35 PM

I've tried both, same problem.

mistwang 08-12-2010 05:30 PM

Quote:

--- 192.168.0.3 ping statistics ---
272 packets transmitted, 263 received, 3% packet loss, time 271515ms
rtt min/avg/max/mdev = 0.110/1.030/10.585/1.856 ms
3% packet loss for a LAN environment is extremely high.
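
To put that figure in context, here is a back-of-the-envelope sketch. It assumes that TCP packets are lost at the same rate as the pings and that losses are independent, neither of which the ping output can confirm:

```python
def loss_percent(transmitted, received):
    """Packet loss as a percentage of packets sent."""
    return 100.0 * (transmitted - received) / transmitted

def handshake_hit_probability(p_loss):
    """Chance that the SYN or the SYN-ACK is dropped, assuming independent loss."""
    return 1.0 - (1.0 - p_loss) ** 2

loss = loss_percent(272, 263)                      # about 3.3, shown by ping as 3%
stalled = handshake_hit_probability(loss / 100.0)  # about 0.065
print(f"loss: {loss:.1f}%, handshakes needing a retransmit: {stalled:.1%}")
```

Under those assumptions roughly one new connection in fifteen would have to wait for a retransmission before it can be established, which would be consistent with frequent "timed out while connecting" notices.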


All times are GMT -7.