Maverick Posted March 10, 2010

> Hey Mav, my firewall script kicked in and I never saved after banning your IP, it is unbanned again now.

Alright, hopefully things will go well this time.
L.C. (Author) Posted March 10, 2010

> Hey Mav, my firewall script kicked in and I never saved after banning your IP, it is unbanned again now.

LC, 4 hours = 14,400, 240 = 4 minutes. Oops.
Lynx Posted March 11, 2010

http://bw.krslynx.com/

Traffic is slowly starting to build. I've verified with iftop that it comes from drpepper.dreamhost.com; if it follows the same trend it will continue to swell.
doc flabby Posted March 11, 2010 (edited)

> My directory client does receive an unknown packet (0x90) upon connecting to the directory server.

This is unusual, because nowhere in the code for Central does it send such a packet. Are you making sure you listen on a single IP for replies? I had a similar issue with Central until I bound it to a single IP.

> http://bw.krslynx.com/ Traffic is slowly starting to build. I've verified with iftop that it comes from drpepper.dreamhost.com; if it follows the same trend it will continue to swell.

This is outgoing traffic; it's coming FROM bw.krslynx.com.

> Hey Mav, my firewall script kicked in and I never saved after banning your IP, it is unbanned again now. Alright, hopefully things will go well this time.

Your script is doing something weird. I noticed I had the same traffic problem with Central: it was spamming up to 3 MB/s of UDP at drpepper. I think your script is leaving a connection open and then reusing it in some strange way, creating a loop: the directory server spams data back, which causes your script to spam data back, which triggers the process again. It might be that you're not creating a new socket every time you connect, so it reuses the same port number, but that is a total guess. It's interesting that both Central and Snrrrub's dir server suffer from the same problem; it suggests we've both implemented the SS protocol pretty faithfully.

I've since patched Central to guard against this kind of problem today: any connection older than 60 seconds is forcefully killed, eliminating the chance of a connection loop occurring.

Edited March 11, 2010 by doc flabby
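For what it's worth, a rough sketch of that kind of age-based guard (illustrative Python, not Central's actual code; the data structure and names are hypothetical):

```python
import time

MAX_CONNECTION_AGE = 60  # seconds; connections older than this are forcefully dropped

class ConnectionTable:
    """Tracks when each client (ip, port) first connected to the directory server."""
    def __init__(self):
        self.started = {}

    def touch(self, addr):
        # Record the first-packet time for a new connection; leave existing entries alone.
        self.started.setdefault(addr, time.monotonic())

    def reap(self):
        # Drop any connection open longer than MAX_CONNECTION_AGE, which breaks a
        # request/response loop even if the client never disconnects cleanly.
        now = time.monotonic()
        expired = [a for a, t in self.started.items() if now - t > MAX_CONNECTION_AGE]
        for addr in expired:
            del self.started[addr]
        return expired  # caller stops sending zone-list packets to these addresses
```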
Maverick Posted March 11, 2010 (edited)

Meh, I'll stop all directory server polling and get someone to look at the source code.

EDIT: Data sources disabled. Cacti will stop executing the directory client each minute now.

EDIT2: I'll take another look at the source code. I'm sure that the directory client is properly disconnecting after getting the 0x07 disconnection packet from the directory server, though.

Edited March 11, 2010 by Maverick
Maverick Posted March 12, 2010 (edited)

I checked the source of my directory client again with the help of Doc Flabby. He found out that a List in my application wasn't thread-safe while multiple threads were accessing it. I made the List thread-safe and then booted up a local Snrrrub directory server to see if there was anything out of the ordinary.

Immediately after sending the client key to the directory server, the server kept sending zone lists indefinitely (the log showed it repeatedly sending a zone list to a client). Shutting down the client even made the directory server crash. Obviously something was going very wrong.

I researched the protocol and read CatId's documentation on it after a tip from Doc Flabby. Doc Flabby also said I should use a packet sniffer like Wireshark to see what was going on. I ran Wireshark on the Continuum client while it downloaded a zone list from the directory server and found that Continuum sends ACK packets for the 0x0A packets (there were more differences in the protocol the client uses, but this was the biggest one). There are two possible causes for this problem: either the documented SubSpace protocol for a directory client is missing large pieces of information about acknowledging 0x0A packets, or there are big differences between using the SubSpace protocol and the Continuum protocol (something I didn't expect). In the latter case it would be an easy fix to make my directory client identify itself as using the SubSpace protocol.

Nevertheless, my directory client basically DoS'ed Snrrrub's and Doc Flabby's directory servers (I haven't tested it on PriitK's directory server yet, though). A client should never be able to do such a thing, and even if a client can, the server should stop transmitting packets once the client has stopped. Lynx's directory server is currently still transmitting about 1.2 Mbit/s!

Anyhow, my apologies for the inconvenience caused. I will continue work and test my directory client more thoroughly on Snrrrub's, Doc Flabby's and PriitK's directory servers.

Edited March 12, 2010 by Maverick
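The thread-safety part boils down to serialising access to the shared list; a minimal sketch of the idea (illustrative Python, since the actual client code isn't shown here and these names are made up):

```python
import threading

class SafeZoneList:
    """A list of zone entries that several worker threads may read and update."""
    def __init__(self):
        self._zones = []
        self._lock = threading.Lock()  # serialize access so concurrent updates can't corrupt the list

    def add(self, zone):
        with self._lock:
            self._zones.append(zone)

    def snapshot(self):
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return list(self._zones)
```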
Lynx Posted March 12, 2010

Good job, I've reset my dirserver again.
rootbear75 Posted March 12, 2010

hahaha unintentional DoS.... awesome mav... lol
Samapico Posted March 12, 2010

I'm wondering why the server doesn't automatically block the client or something when such things happen? Not trying to blame the server or whatever, but I would have thought a single client would not be able to pull that much data from a server without being blocked; or is it because it opens multiple connections?
doc flabby Posted March 12, 2010 (edited)

> I'm wondering why the server doesn't automatically block the client or something when such things happen? Not trying to blame the server or whatever, but I would have thought a single client would not be able to pull that much data from a server without being blocked; or is it because it opens multiple connections?

I've adjusted Central to do just this now, as I didn't have time to try and work out what exactly was going wrong. There's some kind of complex interaction going on.

Edited March 12, 2010 by doc flabby
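Central's actual cut-off isn't posted in this thread, but a toy version of that kind of per-client limit might look like this (hypothetical Python; the 5 MB-per-minute threshold is invented purely for illustration):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_BYTES_PER_WINDOW = 5 * 1024 * 1024  # illustrative threshold, not Central's real value

class FloodGuard:
    """Stop replying to any client that has been sent too much data in the current window."""
    def __init__(self):
        self.sent = defaultdict(int)          # bytes sent per client address in this window
        self.window_start = time.monotonic()

    def allow(self, addr, payload_len):
        now = time.monotonic()
        if now - self.window_start > WINDOW_SECONDS:
            self.sent.clear()                 # start a fresh accounting window
            self.window_start = now
        if self.sent[addr] + payload_len > MAX_BYTES_PER_WINDOW:
            return False                      # blocked: a looping client stops getting replies
        self.sent[addr] += payload_len
        return True
```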
L.C. (Author) Posted March 12, 2010

> I'm wondering why the server doesn't automatically block the client or something when such things happen? Not trying to blame the server or whatever, but I would have thought a single client would not be able to pull that much data from a server without being blocked; or is it because it opens multiple connections?

It's a design flaw of the directory servers. Even with incomplete knowledge of the protocol, one should, in my opinion, naturally build in fail-safes.
PoLiX Posted March 13, 2010

Mm. Been averaging 800 KB/s with peaks of 1.47 MB/s. Nothing looks abnormally insane.
Maverick Posted March 18, 2010

I found out what was wrong with my directory client. Basically, it kept sending 0x01 server list requests to the directory server because of a bug: the receive buffer was never cleared. Luckily, not much else was wrong in the protocol implementation, although the bug was hard to hunt down.

This is the protocol I've implemented:

    CLIENT                                          SERVER
    --------------------------------------          --------------------------------------------------
    Encryption request 0x01                ------->
                                            <------  0x02 Encryption response
    Server list request (reliable) 0x03    ------->
    Synchronization request 0x05           ------->
                                            <------  0x03 Reliable packet
    Reliable ACK 0x04                      ------->
                                            <------  0x0E Cluster
                                            <------  0x03 Reliable packet (contains 0x0A Massive Chunk)
    Reliable ACK 0x04                      ------->
                                            (0x03 is received multiple times for the complete server list)
    Disconnect 0x07                        ------->
                                            <------  0x07 Confirm disconnection

Everything seems to work fine on CatId's, Snrrrub's and Doc Flabby's Central directory server... well, more or less. Snrrrub's directory server seems to crash if my directory client closes the connection during the server list download. CatId's and Doc Flabby's Central directory servers don't confirm the disconnection, so my client waits for the connection to hit its timeout and then closes it. Luckily this still works, though :-)

I'm going to restart the process of polling the directory servers every minute. Please let me know if you see any problems popping up again. I hope none do, as I've tested things thoroughly this time.
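For reference, a bare-bones sketch of that flow (illustrative Python, not the actual directory client; the payload layouts beyond the leading 0x00/type byte pair, the hard-coded client key, and the default port 4990 are assumptions, and the 0x05 synchronization and 0x0E cluster packets are glossed over):

```python
import socket
import struct

def ack(sock, addr, seq):
    # 0x00 0x04: reliable ACK carrying the sequence number being acknowledged
    sock.sendto(struct.pack('<BBI', 0x00, 0x04, seq), addr)

def fetch_server_list(host, port=4990, timeout=5.0):
    addr = (host, port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)

    # 0x00 0x01: encryption request (client key and protocol version fields are assumed)
    sock.sendto(struct.pack('<BBIH', 0x00, 0x01, 0x12345678, 0x0001), addr)
    sock.recvfrom(512)  # expect the 0x00 0x02 encryption response

    # Reliable packet (0x00 0x03, sequence 0) wrapping the 0x01 server list request
    sock.sendto(struct.pack('<BBI', 0x00, 0x03, 0) + b'\x01' + struct.pack('<I', 0), addr)

    chunks = []
    while True:
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            break
        if data[:2] == b'\x00\x03':              # reliable packet carrying list data
            seq = struct.unpack_from('<I', data, 2)[0]
            ack(sock, addr, seq)                 # Continuum ACKs every one of these; so must we
            chunks.append(data[6:])              # payload, still wrapped in its 0x0A chunk headers
        elif data[:2] == b'\x00\x07':            # server confirmed the disconnect
            break

    sock.sendto(b'\x00\x07', addr)               # tell the server we are done so it stops resending
    sock.close()
    return b''.join(chunks)
```

Calling `fetch_server_list('sscentral.com')` would then return the raw reliable-packet payloads; parsing the 0x0A chunk contents into actual zone entries is left out of the sketch.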
Maverick Posted March 19, 2010 (edited)

L.C., ds.hlrse.net seems to be down? I just checked http://bw.krslynx.com and the outbound traffic rate seems to be climbing again?

Edited March 19, 2010 by Maverick
Maverick Posted March 19, 2010 (edited)

I fear that every time my directory client closes the connection without finishing (either because of a timeout or because its process got killed by the server), the directory server keeps sending server lists (as no ACK packets are received by the server). On top of that, each time the directory client runs it grabs a random port and doesn't reuse the same one, so it can't finish the existing exchange with the server. If this is really what happens, I wouldn't know how to fix it on my end. I can test running the directory client on the same port each time, but that might just break it when it suddenly starts receiving server list packets it never asked for.

EDIT: Querying directory servers disabled again. Querying the 7 directory servers took too long when 2 or 3 of them are down (the zone population count and directory server querying together take longer than 1 minute).

EDIT2: Maybe I'll set the directory client to always use the same local port and make a separate program that opens this local port and just sends a 0x07 disconnection packet to the directory server after the directory client has run. If the directory client gets killed, the separate program will close the connection and make the directory server stop sending packets (theoretically).

Edited March 19, 2010 by Maverick
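A sketch of what that separate cleanup program could look like (hypothetical Python; the fixed local port and the server list are placeholders, and 4990 is assumed as the directory server port):

```python
import socket

CLIENT_LOCAL_PORT = 14990   # placeholder: the fixed local port the directory client would use
DIRECTORY_SERVERS = [('ssdir.playsubspace.com', 4990)]  # placeholder list; 4990 assumed

def send_disconnects():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The client process has just exited, so the port should be free; SO_REUSEADDR guards
    # against a lingering binding.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('', CLIENT_LOCAL_PORT))
    for addr in DIRECTORY_SERVERS:
        # 0x00 0x07: core-protocol disconnect, sent from the same source port the client used,
        # so the server matches it to the stale connection and stops resending server lists.
        sock.sendto(b'\x00\x07', addr)
    sock.close()

if __name__ == '__main__':
    send_disconnects()
```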
L.C. (Author) Posted March 19, 2010

hlrse.net went offline yesterday. It's back online now, though I may or may not boot ds.hlrse.net back up until things get sorted out. Our webserver backups are being downloaded because we fear the HDD may be failing, so we aren't booting up any of the regular services like Apache, MySQL, etc., until we get everything settled. ds1.hlrse.net, on the other hand, is alright. (But if hlrse.net goes offline, so does the DNS for ds1.hlrse.net.)
Maverick Posted March 19, 2010 (edited)

Doc Flabby, L.C. and PoLiX, please restart your directory server if you notice excess bandwidth usage, which is likely to be caused by my directory client misfiring. I don't want to cause any excess bandwidth costs. Apologies for the inconvenience (sigh).

Edited March 19, 2010 by Maverick
doc flabby Posted March 19, 2010 (edited)

> Doc Flabby, L.C. and PoLiX,

Central (ssdir.playsubspace.com) now has flood protection, so it isn't affected by this problem, as you can see from my bandwidth graph below. Feel free to test on Central.

The problem with your code design is this: once you've received the entire directory listing (you know you have it when the 0x0A packet is fully reconstructed), you should immediately send the server two 0x07 packets and disconnect. Always send two 0x07 packets, as that minimises the chance of one being lost. The server doesn't have to send a 0x07 back, and even if it does, the packet can get lost. If you follow this design it will ensure no problems.

Edited March 19, 2010 by doc flabby
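In sketch form, the shutdown doc flabby describes might look like this (illustrative Python; it assumes the 0x0A chunk header tells the client the total listing length, which is how it knows the download is complete):

```python
import socket

def finish_if_complete(sock: socket.socket, server_addr, assembled: bytes, expected_len: int) -> bool:
    """Once the whole listing is in hand, send two 0x07 disconnects and close."""
    if len(assembled) < expected_len:
        return False              # the 0x0A massive chunk isn't fully reconstructed yet
    for _ in range(2):            # two 0x07 packets minimise the chance that one is lost
        sock.sendto(b'\x00\x07', server_addr)
    sock.close()                  # don't wait for the server's 0x07; it may never arrive
    return True
```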
Maverick Posted March 19, 2010 (edited)

Hm, yeah, true, I will do that. Unfortunately I don't think that will solve the problem of Snrrrub's directory server endlessly spamming packets if a client closes the connection without finishing the process.

Edited March 19, 2010 by Maverick
Maverick Posted March 28, 2010 (edited)

Looking at Lynx's bandwidth report, it seems it's also possible for Continuum clients to misfire when downloading the zone list. This time it couldn't have been my directory client causing it, as it hasn't run for quite some time now. I would recommend that all server owners running Snrrrub's directory server restart it daily to prevent excessive bandwidth use.

Edited March 28, 2010 by Maverick
Maverick Posted July 1, 2010 (edited)

Note: I've recommenced polling the directory servers every 5 minutes for the population statistics. Currently only sscentral.com, ds1.krslynx.com and ssdir.playsubspace.com are online and responding (nanavati.net is online but isn't working like it should). I'm using the PHP directory client this time. If you notice any problems with the directory servers that may be related to the 5-minute interval polling done by the population statistics (server name: panamacity.dreamhost.com), let me know.

Edited July 1, 2010 by Maverick