Nick Mathewson:
On Tue, Sep 11, 2012 at 1:12 PM, Jacob Appelbaum jacob@appelbaum.net wrote:
Hi Scott,
It is nice to see you posting again; I had wondered where you had gone.
Scott Bennett:
I know this really belongs on tor-talk, but I haven't been subscribed to it for a long time now. Sorry if posting this here bothers anyone.
Seems like a fine place to discuss relay problems, which is what it sounds like, no?
Maaybe! The very best place would be the bugtracker, of course. (I do seem to recall that you have some issues with trac -- I'm just mentioning the bugtracker so that other people don't get the idea that the mailing lists are the best place for bug reports. But a bug report on the mailing list is much much better than no bug report at all.)
Oh, I don't mean to discourage filing bugs; rather, if we have a guard that fails circuits, I'd say we should discuss it openly. Is it a load issue? Or something else?
Back in early July, I upgraded from 0.2.3.13-alpha to 0.2.3.18-rc.
I immediately ran into problems with a Python script that honors the http_proxy environment variable, which I normally have set to the localhost port for privoxy, which, in turn, connects to tor's SOCKS port. I couldn't really see what was going wrong, but using arm to ask for a new identity sometimes seemed to help get a circuit that worked. Sending tor a SIGHUP instead seemed to work about as often.
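To rule out the proxy chain itself (script -> privoxy -> tor) before blaming circuits, a quick check along these lines might help. This is only a sketch: the function names are made up for illustration, and the 127.0.0.1:8118 value in the usage note is an assumption based on privoxy's default listen address, not something stated in the report.

```python
import os
import socket
from urllib.parse import urlsplit


def http_proxy_endpoint(env=None):
    """Return (host, port) taken from http_proxy, or None if it is unset.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get("http_proxy", "").strip()
    if not value:
        return None
    # Accept both "http://host:port" and a bare "host:port".
    if "://" not in value:
        value = "http://" + value
    parts = urlsplit(value)
    return parts.hostname, parts.port


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With http_proxy set to http://127.0.0.1:8118, http_proxy_endpoint() would return ("127.0.0.1", 8118), and port_is_open() on that pair only says whether something is listening there. It says nothing about whether tor can build a working circuit behind it.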
If you use 0.2.2.x - what happens?
I'm not sure what the bug described here is, fwiw. What is the behavior for the circuits that don't work, and to what extent is 0.2.2.x better?
Scott?
A bit over a week ago, I switched to 0.2.3.20-rc, and the problem still occurs. However, 0.2.3.20-rc now also emits a new message from time to time, the most recent occurrence of which is
Sep 06 06:02:45.934 [notice] Low circuit success rate 7/21 for guard TORy0=753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B.
That is an interesting message - I wonder if the author of that message might chime in?
Looks like bug #6475.
Ok.
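For anyone who wants to collect these notices from their own logs, here is a minimal sketch that pulls the counts and the guard out of a line shaped like the one quoted above. The regular expression is inferred from that single log line, and reading the second number as "circuits attempted" is an assumption, not something confirmed from the tor source.

```python
import re

# Pattern inferred from the single notice quoted in this thread.
NOTICE_RE = re.compile(
    r"Low circuit success rate (?P<ok>\d+)/(?P<total>\d+) for guard "
    r"(?P<nick>[^=\s]+)=(?P<fp>[0-9A-Fa-f]{40})"
)


def parse_low_success_notice(line):
    """Return a dict describing the notice, or None if the line does not match."""
    match = NOTICE_RE.search(line)
    if match is None:
        return None
    successes = int(match.group("ok"))
    attempts = int(match.group("total"))  # assumed meaning of the denominator
    return {
        "nickname": match.group("nick"),
        "fingerprint": match.group("fp").upper(),
        "successes": successes,
        "attempts": attempts,
        "rate": successes / attempts if attempts else 0.0,
    }
```

Applied to the notice above, this gives nickname TORy0 with 7 successes out of 21, a rate of one third.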
Wondering whether such circuit-building failures might be related to the other problem, I began a little experiment: each time I saw a "Low circuit success rate" message, I added the key fingerprint of the node in question to my ExcludeNodes list in torrc and sent tor a SIGHUP. The problem is still occurring, though, and when I look at the circuits involved, they all seem to have at least one of the excluded nodes in them, usually in the entry position.
So my question is, what changed between 0.2.3.13-alpha and 0.2.3.18-rc (or possibly 0.2.3.20-rc) in the handling of nodes listed in the ExcludeNodes line in torrc? And is there anything I can do to get the ExcludeNodes list to work again the way it used to work?
Thanks in advance for any relevant information.
It seems that there are two issues: one is that a guard is failing to build circuits, the other is that you can't seem to exclude them. I have to admit, I'm more interested in the former... Is there a pattern to the failures? That is, for the 7 successes for that node, did you see anything interesting? Were, say, the nodes that worked somehow in the same country as that guard? Or perhaps were the other failed circuits all seemingly unrelated to the guard?
As for ExcludeNodes: did you set StrictNodes at the same time? Are you also a relay?
Any other configuration info would be helpful here too.
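To make the configuration under discussion concrete, here is a rough sketch that builds the torrc lines for the experiment described above and checks a circuit path against the exclusion list. The helper names are hypothetical. It assumes that ExcludeNodes takes a comma-separated list of 40-hex-digit fingerprints and that StrictNodes 1 asks tor to treat that list as a hard requirement; please check both against the manual page for the version you are running.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def normalize_fingerprint(fp):
    """Upper-case a relay fingerprint and drop an optional leading '$'."""
    fp = fp.strip().lstrip("$").upper()
    if len(fp) != 40 or not set(fp) <= HEX_DIGITS:
        raise ValueError("not a 40-hex-digit fingerprint: %r" % fp)
    return fp


def exclude_nodes_lines(fingerprints, strict=False):
    """Return torrc lines excluding the given relays (duplicates removed)."""
    unique = []
    for fp in map(normalize_fingerprint, fingerprints):
        if fp not in unique:
            unique.append(fp)
    lines = ["ExcludeNodes " + ",".join(unique)]
    if strict:
        lines.append("StrictNodes 1")
    return lines


def excluded_positions(path, excluded):
    """Return the hop indexes (0 = entry) of path relays that are excluded."""
    banned = {normalize_fingerprint(fp) for fp in excluded}
    return [i for i, fp in enumerate(path)
            if normalize_fingerprint(fp) in banned]
```

For the guard named in the notice, exclude_nodes_lines(["753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B"], strict=True) yields an ExcludeNodes line followed by StrictNodes 1. A result of [0] from excluded_positions() would match the observation that the excluded node usually shows up in the entry position.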
(To answer your question: looking through the changelogs, and the commit logs for src/or/circuitbuild.c and src/or/routerlist.c, I can't find anything that stands out to me as something that might cause an ExcludeNodes regression. So more investigation will be needed!)
I didn't see anything either. My first thought was, of course, of the worst case: "a guard that selectively fails circuits, perhaps only allowing creation to nodes that they also control/watch/etc."
All the best, Jake