Re: [tor-relays] Bandwidth Authority PID Feedback Experiment #2 Starting

13 Dec 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/12/2011 9:14 PM, Mike Perry wrote:
...
It looks like Moritz is seeing some evidence of TCP source port
exhaustion in his Tor logs: "[warn] Error binding network socket:
Address already in use".

He's also monitoring TCP connection counts on each IP interface:

  netstat -ntap | grep $INTERFACE_IP | wc -l

It appears that right now he's at only about 10k connections per
IP, and not seeing any of those log lines at the moment. Is it
possible this is a transient condition caused by overly aggressive
scrapers and/or torrenters who flock to the node for a short while
and then move on?

Reports on the recent appearance or increased prevalence of that or
other warns from others will be helpful.

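
For anyone who wants per-IP numbers in one pass rather than one grep
per interface, here is a rough Python sketch. It is not what Moritz
runs: unlike the grep pipeline it keys on the local address column
only and skips LISTEN sockets, so its totals will be somewhat lower.
The function name and the sample addresses (documentation ranges) are
made up for illustration.

```python
from collections import Counter

def count_connections_by_local_ip(netstat_output):
    """Count TCP rows in `netstat -nta`-style output, keyed by local IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; this skips the two header lines.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "LISTEN":
            continue  # a listening socket is not a connection
        # Local address is "ip:port"; split on the last colon so IPv6 works too.
        local_ip, _, _port = fields[3].rpartition(":")
        counts[local_ip] += 1
    return counts

# Hypothetical sample in the Linux net-tools layout (not real relay data).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.0.2.10:9001         0.0.0.0:*               LISTEN      1234/tor
tcp        0      0 192.0.2.10:9001         198.51.100.7:51234      ESTABLISHED 1234/tor
tcp        0      0 192.0.2.10:40112        203.0.113.5:443         ESTABLISHED 1234/tor
tcp        0      0 192.0.2.11:9001         198.51.100.9:40000      TIME_WAIT   -
"""

for ip, n in sorted(count_connections_by_local_ip(SAMPLE).items()):
    print(ip, n)
```

To run it against a live box, the text could come from
subprocess.run(["netstat", "-nta"], capture_output=True, text=True).stdout,
assuming netstat is installed.
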
Hey Mike,

We're not seeing source port exhaustion, but we are seeing two log
messages, the first of which I haven't been able to nail down:

2011 Dec 13 20:22:07.000|[notice] We stalled too much while trying to
write 8542 bytes to address "[scrubbed]".  If this happens a lot,
either something is wrong with your network connection, or something
is wrong with theirs. (fd 409, type Directory, state 2, marked at
main.c:990).

2011 Dec 13 22:26:45.000|[warn] Your computer is too slow to handle
this many circuit creation requests! Please consider using the
MaxAdvertisedBandwidth config option or choosing a more restricted
exit policy. [18 similar message(s) suppressed in last 60 seconds]

The second warn I figure I should be tuning myself with
MaxAdvertisedBandwidth, and it's happening on BigBoy, the relay on
this box that's doing the majority of its bandwidth.  So I'm not sure
if it's anything that your feedback loop should be involved in or not.

The first message, however, is happening on all four of the other
relays on the box, somewhat sporadically (GreenDragon, GoldDragon,
WhiteDragon, RedDragon).  It seems, though I can't say for sure, that
this (or something else I can't quite find) is constraining those four
relays and preventing them from growing nearly as high in bandwidth
consumption as BigBoy.  They're all configured identically (non-exit
middle / eventually entry if/when they get Guard).  Those four also
differ from BigBoy in that they kept names/configs but lost
fingerprints/keys (i.e., they're all specifically Unnamed in the
consensus) during a rebuild when trying to milk more bandwidth out of
them.  My understanding is that this /shouldn't/ make a big
difference, but perhaps it does?
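
To put a number on "somewhat sporadically", one option is to tally
the two messages per hour for each relay's log. The sketch below
assumes each log entry is a single line in the "timestamp|[severity]
message" layout of the excerpts above; the labels, the function name,
and the second sample line are invented for illustration.

```python
import re
from collections import Counter

# "2011 Dec 13 20:22:07.000|[notice] ..." -> hour bucket, severity, message.
LINE_RE = re.compile(
    r"^(?P<hour>\d{4} \w{3} \d{1,2} \d{2}):\d{2}:\d{2}\.\d+"
    r"\|\[(?P<sev>\w+)\] (?P<msg>.*)$"
)

# Substrings taken from the two messages quoted above; labels are arbitrary.
PATTERNS = {
    "stalled_write": "We stalled too much while trying to write",
    "circuit_overload": "too slow to handle this many circuit creation requests",
}

def tally_by_hour(lines):
    """Count matching log lines per (hour, label) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not in the expected layout
        for label, needle in PATTERNS.items():
            if needle in m.group("msg"):
                counts[(m.group("hour"), label)] += 1
    return counts

# First and third lines abbreviate the real excerpts; the second is made up.
SAMPLE_LOG = [
    '2011 Dec 13 20:22:07.000|[notice] We stalled too much while trying to write 8542 bytes to address "[scrubbed]".',
    '2011 Dec 13 20:41:10.000|[notice] We stalled too much while trying to write 512 bytes to address "[scrubbed]".',
    '2011 Dec 13 22:26:45.000|[warn] Your computer is too slow to handle this many circuit creation requests!',
]

for (hour, label), n in sorted(tally_by_hour(SAMPLE_LOG).items()):
    print(hour, label, n)
```

Note that the circuit creation warn is rate-limited ("18 similar
message(s) suppressed in last 60 seconds"), so a line count like this
understates how often that condition actually fires.
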

I should note that this all could be entirely unrelated to your PID
feedback experiments; I coincidentally started experimenting with
trying to bring these nodes up to their hardware potential at around
the same time, so it could all be a red herring from my
experimentation (but I must say, the fact that we're both
experimenting together sure does make drawing valid conclusions
difficult. :)).

One other data point: I have seen (also sporadically) some indications
in my system logs of hardware hangs on the ethernet interface all of
this is running through, so I'm slightly suspicious that it's to blame
for the *Dragon problems.  It doesn't really explain why BigBoy isn't
affected though, and I haven't been able to definitively prove
anything yet, so I'm just not sure.

Let me know if any more info would be of use (or if anyone has any
ideas about that first message).

Thanks,
Tim

- -- 
Tim Wilde, Senior Software Engineer, Team Cymru, Inc.
twilde@cymru.com | +1-847-378-3333 | http://www.team-cymru.org/
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAk7n5EUACgkQluRbRini9thmCgCdGMgAYyJ4+yGk7MMEi6IjVcyE
sOMAn3e411v1h11asitncJl9abwNJ2YC
=Zdc9
-----END PGP SIGNATURE-----