-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 12/12/2011 9:14 PM, Mike Perry wrote:
It looks like Moritz is seeing some evidence of TCP source port exhaustion in his Tor logs: "[warn] Error binding network socket: Address already in use".
He's also monitoring TCP connection counts on each IP interface: netstat -ntap | grep $INTERFACE_IP | wc -l
It appears that right now he's at only about 10k connections per IP and isn't seeing any of those log lines at the moment. Is it possible this is a transient condition caused by overly aggressive scrapers and/or torrenters who flock to the node for a short while and then move on?
Reports from others on the recent appearance, or increased prevalence, of that warn or other warns would be helpful.
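For anyone who wants per-IP numbers for several addresses at once, the netstat pipeline quoted above can be approximated in Python. This is only a rough sketch: the function name is made up for illustration, it assumes Linux-style `netstat -nta` columns (local address in the fourth field), and unlike the grep it counts by local address only, so totals can differ slightly from the one-liner.

```python
from collections import Counter


def count_connections_by_local_ip(netstat_output: str) -> Counter:
    """Count TCP sockets per local IP in `netstat -nta`-style output.

    Assumption: data rows look like
        tcp  0  0  LOCAL_IP:PORT  REMOTE_IP:PORT  STATE ...
    Header lines and non-TCP rows are skipped.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        # Split on the last colon so IPv6-style addresses keep their colons.
        local_ip, _, _port = fields[3].rpartition(":")
        counts[local_ip] += 1
    return counts


if __name__ == "__main__":
    import subprocess

    # Requires netstat to be installed; -p is omitted because the PID column
    # is not needed for counting.
    out = subprocess.run(
        ["netstat", "-nta"], capture_output=True, text=True, check=True
    ).stdout
    for ip, n in count_connections_by_local_ip(out).most_common():
        print(f"{ip}\t{n}")
```

Counting by local address seems the closer match to source port exhaustion, since the ephemeral port range is consumed per local IP.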
Hey Mike,
We're not seeing source port exhaustion, but we are seeing two warns, one of which I haven't been able to nail down:
2011 Dec 13 20:22:07.000|[notice] We stalled too much while trying to write 8542 bytes to address "[scrubbed]". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 409, type Directory, state 2, marked at main.c:990).
2011 Dec 13 22:26:45.000|[warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [18 similar message(s) suppressed in last 60 seconds]
The second warn I figure I should be tuning myself with MaxAdvertisedBandwidth, and it's happening on BigBoy, the relay on this box that's doing the majority of its bandwidth. So I'm not sure if it's anything that your feedback loop should be involved in or not.
The first warn, however, is happening on all four of the other relays on the box, somewhat sporadically (GreenDragon, GoldDragon, WhiteDragon, RedDragon). It seems, though I can't say for sure, that this (or something else I can't quite find) is constraining those four relays and preventing them from growing to anywhere near BigBoy's bandwidth consumption. They're all configured identically (non-exit middle / eventually entry if/when they get Guard). Those four also differ from BigBoy in that they kept names/configs but lost fingerprints/keys (i.e. they're all specifically Unnamed in the consensus) during a rebuild when trying to milk more bandwidth out of them. My understanding is that this /shouldn't/ make a big difference, but perhaps it does?
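To compare how often each of the two messages shows up on each relay, the logs could be tallied with something like the sketch below. It assumes the "timestamp|[level] message" layout of the lines pasted above, which may be specific to this setup. The labels and function name are hypothetical, and counting the "similar message(s) suppressed" figure as extra occurrences is an assumption about how the rate limiting should be read.

```python
import re
from collections import Counter

# Assumed line layout, taken from the pasted log lines: "<timestamp>|[<level>] <message>"
LINE_RE = re.compile(r"^(?P<ts>[^|]+)\|\[(?P<level>\w+)\] (?P<msg>.*)$")
SUPPRESSED_RE = re.compile(
    r"\[(\d+) similar message\(s\) suppressed in last \d+ seconds\]"
)

# Hypothetical labels mapped to substrings of the two messages in question.
PATTERNS = {
    "stalled_write": "We stalled too much while trying to write",
    "circuit_overload": "too slow to handle this many circuit creation requests",
}


def tally_messages(lines):
    """Return a Counter of label -> occurrences, including suppressed repeats."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        for label, needle in PATTERNS.items():
            if needle in msg:
                suppressed = SUPPRESSED_RE.search(msg)
                extra = int(suppressed.group(1)) if suppressed else 0
                counts[label] += 1 + extra
                break
    return counts
```

Running this once per relay log and comparing the counters side by side would show whether the stalled-write notice really is confined to the four *Dragon relays.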
I should note that this all could be entirely unrelated to your PID feedback experiments; I coincidentally started experimenting with trying to bring these nodes up to their hardware potential at around the same time, so it could all be a red herring from my experimentation (but I must say, the fact that we're both experimenting together sure does make drawing valid conclusions difficult. :)).
One other data point: I have seen (also sporadically) some indications in my system logs of hardware hangs on the ethernet interface that all of this runs through, so I'm slightly suspicious that it's to blame for the *Dragon problems. That doesn't really explain why BigBoy isn't affected, though, and I haven't been able to prove anything definitively yet, so I'm just not sure.
Let me know if any more info would be of use (or if anyone has any ideas about that first message).
Thanks, Tim
- --
Tim Wilde, Senior Software Engineer, Team Cymru, Inc.
twilde@cymru.com | +1-847-378-3333 | http://www.team-cymru.org/