On 05/08/2016 03:49 AM, Roger Dingledine wrote:
On Fri, Apr 22, 2016 at 04:22:48PM -0400, Zack Weinberg wrote:
I'm working on an exitmap module that wants to feed order of 5000 short-lived streams through each exit relay. I think this is running foul of some sort of upper limit (in STEM, or in Tor itself, not sure) on the number of streams a circuit can be used for, or how long, or something.
Tor has a built-in "if a stream has been waiting for a connected cell for 10 seconds, something has gone wrong" feature, where it disconnects from that circuit, and marks the circuit as not worth using for future streams. (Assuming the stream hasn't timed out, Tor would then say "oh hey, I need to find a circuit for this stream", and start the process over again.)
It looks from your logs like you're attaching a bunch of streams to a circuit, and one or more of the begin cells (heck, maybe even all of them) aren't getting any answer in the first 10 seconds.
Yeah, that's expected; the policy you describe is very, very wrong for this experiment. Is there any way to turn it off?
What's going on is that I have ~500 hosts in known locations worldwide, and I'm measuring the round-trip latency to each. Unfortunately, I can't count on all of them to be up and responding to SYNs at any given time. The module asks each exit to make 10 connections to each host, and it deliberately targets a *closed* port whenever possible, so I'm expecting to get back mostly RELAY_END/REASON_CONNECTREFUSED replies, and timeouts.
The module internally times out each *stream* after ten seconds, but it wants to use the same circuit forever - this is both because of design limitations in exitmap (see below), and because changing circuits in the middle of this test will spoil at least one measurement with circuit-creation overhead. Changing the route to the exit is also bad, as it will change the intra-Tor latency, which ideally would be the same for all measurements. (This last is mitigated by exitmap using two-hop circuits and being able to pin the entry node, but one would like to be certain that it won't happen.)
Frankly, assuming that something is wrong with a *circuit* when one of its *streams* suffers a connection timeout seems inappropriate in general. That can happen for any number of reasons beyond Tor's control, not least that, simply, the exit node's SYN to the requested destination got dropped by a firewall. What was the original rationale for this policy?
2016-04-22 16:07:54,115 [DEBUG]: Circuit status change: CIRC 6 BUILT [fp],[fp] PURPOSE=GENERAL TIME_CREATED=2016-04-22T20:07:53.305851
[...]
2016-04-22 16:07:58,118 [DEBUG]: Attempting to attach stream 98 to circuit 6. 2016-04-22 16:08:04,697 [DEBUG]: Circuit status change: CIRC 6 CLOSED [fp],[fp] PURPOSE=PATH_BIAS_TESTING TIME_CREATED=2016-04-22T20:07:53.305851 REASON=FINISHED
This is interesting. If the later streams were still attached to the circuit, the circuit shouldn't be closing here. Are you sure the streams actually get attached (and stay attached)?
It looks like you're only giving me exitmap logs, not Tor logs, so it's hard to tell for sure what's going on underneath.
Yeah, I'll try to get some Tor logs (that'll mean more changes to exitmap core, yay). I may not get to it till June, though. The later streams might be a red herring; the module closes each stream immediately after the connection resolves and that may not be visible in these logs.
I am not sure why the circuit is labeled PATH_BIAS_TESTING in the first place; it's possible that exitmap has gotten mixed up about which circuit it is supposed to be using.
You can see that circuit 6 is no longer available, but the module is still trying to use it.
That does indeed sound like an exitmap bug.
exitmap core assumes that one circuit per exit is all you will ever need, and has no way of notifying an experiment module that its circuit has gotten destroyed. It seems pretty clear that this is a design error, but it's going to take a lot of work to change ...
zw