06-12-2010, 11:01 PM
Yil

Can you check whether this line appears in your Debug.log file? I'm guessing it should, and I'm interested in the IP (only if it's not what you expect) and the error number.
Services_Test: Connect failed for service '<service>' IP=<ip>: <errornum>
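If Debug.log is long, a short script can pull those lines out. This is a minimal sketch that assumes the log lines match the template above exactly and that `<errornum>` is a plain integer. The names `find_connect_failures` and `ConnectFailure` are hypothetical and are not part of the server.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed format, taken from the template in the post:
#   Services_Test: Connect failed for service '<service>' IP=<ip>: <errornum>
CONNECT_FAILED = re.compile(
    r"Services_Test: Connect failed for service '(?P<service>[^']*)' "
    r"IP=(?P<ip>\S+): (?P<errnum>-?\d+)"
)


class ConnectFailure(NamedTuple):
    service: str
    ip: str
    errnum: int


def find_connect_failures(lines: Iterable[str]) -> Iterator[ConnectFailure]:
    """Yield one ConnectFailure for every matching log line."""
    for line in lines:
        m = CONNECT_FAILED.search(line)  # search: tolerate a timestamp prefix
        if m:
            yield ConnectFailure(m["service"], m["ip"], int(m["errnum"]))
```

For example, `list(find_connect_failures(open("Debug.log", errors="replace")))` returns every failed self-connect with its service, IP and error number.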

In this particular case I'm going to classify this behavior as a bug. The server tests itself by connecting once a minute, so an occasional attempt can fail without the server actually being down. It shouldn't treat a single miss as a failure so quickly.

There is already code that waits for all services to "fail" (3 failed connections in a row). At the very least, it should apply the 3-in-a-row rule to a single service. Alternatively, I could just stick with requiring all services to fail, even though that might take a bit longer to detect a problem. In fact, I can see the benefit of the "all" rule combined with a local/private service defined on a non-exported port. That service would exist purely to handle a DoS-type attack: a flood on the main port alone wouldn't make the whole server look failed.
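The two detection rules discussed above can be sketched as follows. This is an illustration under stated assumptions, not the server's actual code. `ServiceMonitor`, `record` and `require_all` are hypothetical names. Each service keeps a count of consecutive failures that resets on any success, and 3 in a row marks that service as failed. With `require_all=True`, the server counts as failed only when every service has failed, which is the "all" rule. With `require_all=False`, one failed service is enough.

```python
from __future__ import annotations

from dataclasses import dataclass, field

FAILS_IN_A_ROW = 3  # "3 failed connections in a row", per the post


@dataclass
class ServiceMonitor:
    """Hypothetical tracker for the once-a-minute self-connection test."""

    services: list[str]
    require_all: bool = True  # True: "all" rule; False: any single service
    streak: dict[str, int] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        self.streak = {name: 0 for name in self.services}

    def record(self, service: str, connected: bool) -> None:
        # A success resets the streak; a failure extends it.
        # An unknown service name raises KeyError.
        self.streak[service] = 0 if connected else self.streak[service] + 1

    def service_failed(self, service: str) -> bool:
        return self.streak[service] >= FAILS_IN_A_ROW

    def server_failed(self) -> bool:
        states = [self.service_failed(s) for s in self.services]
        return all(states) if self.require_all else any(states)
```

With a second, private service on a non-exported port, `require_all=True` keeps a flood on the main port from flagging the server, as long as the private service still connects.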

This behavior needs to be changed, but the larger question remains. Did the server have trouble connecting to itself because someone was mass-connecting with 5-10 logins at once, using up the whole listen() backlog on a non-server OS? Did something odd happen, like the server's IP changing? Or was this just the first sign of the server locking up? Let's see what the error code tells us.
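A small lookup can turn the raw error number into a name. This sketch assumes the logged number is a socket error code. On Windows that would typically be a Winsock code in the 10000 range (for example 10061 is WSAECONNREFUSED and 10060 is WSAETIMEDOUT). Elsewhere, Python's `errno.errorcode` table covers the usual values. The helper `error_name` is hypothetical, and the short Winsock table is included only so the lookup also works when the log is read on a non-Windows machine. Knowing whether the connect was refused or timed out could help tell the scenarios above apart.

```python
import errno

# A few common Winsock connect() errors, listed explicitly so the lookup
# also works when the log is examined on a non-Windows machine.
WINSOCK_ERRORS = {
    10051: "WSAENETUNREACH",
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
    10061: "WSAECONNREFUSED",
    10065: "WSAEHOSTUNREACH",
}


def error_name(errnum: int) -> str:
    """Return a symbolic name for a socket error number, if one is known."""
    if errnum in WINSOCK_ERRORS:
        return WINSOCK_ERRORS[errnum]
    # errno.errorcode maps this platform's errno values to names.
    return errno.errorcode.get(errnum, f"unknown error {errnum}")
```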

If you see this happening too often, let me know and I'll put out a fix with just this change.