On Tue, 11 Sep 2012 16:24:36 -0400 Nick Mathewson <nickm@freehaven.net>
wrote:
On Tue, Sep 11, 2012 at 1:12 PM, Jacob Appelbaum <jacob@appelbaum.net> wrote:
Hi Scott,
It is nice to see you posting again; I had wondered where you had gone.
Scott Bennett:
I know this really belongs on tor-talk, but I haven't been subscribed
to it for a long time now. Sorry if posting this here bothers anyone.
Seems like a fine place to discuss relay problems, which is what it
sounds like, no?
Maaybe! The very best place would be the bugtracker, of course. (I do
seem to recall that you have some issues with trac -- I'm just
mentioning the bugtracker so that other people don't get the idea that
the mailing lists are the best place for bug reports. But a bug
report on the mailing list is much much better than no bug report at
all.)
You switched trackers a year or two ago. I don't recall whether I've
tried the new one. Either way, I hesitate to submit a bug report until
I'm pretty sure I'm looking at a bug, which is why I asked on this list
whether anyone else could suggest anything.
Back in early July, I upgraded from 0.2.3.13-alpha to 0.2.3.18-rc.
I immediately ran into problems with a python script that honors the
http_proxy environment variable, which I normally have set to the localhost
port for privoxy; privoxy, in turn, connects to tor's SOCKS port. I couldn't
really see what was going wrong, but using arm to ask for a new identity
sometimes seemed to help me get a circuit that worked. Sending tor a
SIGHUP instead also seemed to work about as often.
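For concreteness, the chain looks roughly like the following minimal sketch
(127.0.0.1:8118 is privoxy's default listen address and is just an assumption
here; my actual ports, URLs, and script are not shown):

import os
import urllib.request

# Point http_proxy at a local privoxy instance; privoxy itself is
# configured (in its own config file) to forward to tor's SOCKS port.
os.environ["http_proxy"] = "http://127.0.0.1:8118/"

# urllib honors the http_proxy environment variable, so this request
# travels client -> privoxy -> tor -> exit relay.
with urllib.request.urlopen("http://example.com/", timeout=60) as resp:
    print(resp.status, len(resp.read()))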
If you use 0.2.2.x - what happens?
I'm not sure what the bug described here is, fwiw. What is the
behavior for the circuits that don't work, and to what extent is
0.2.2.x better?
The problem is that the python script in question often has no trouble,
but just as often it does. When it does, it usually gets a good connection
twice and then fails on the third connection. (Or at least I *think* that's
what it does.) Then I either ask tor nicely with arm for a new set of
circuits or I pound on tor with a SIGHUP to the same end, and then try the
python script again.
The process is slow, so it's very irritating to have to babysit it repeatedly
until it finally gets a good connection. If I've left the script a small
list of work to do, the failure may come at any point in the list, which then
terminates the script. :-( I think it may have an option to ignore errors,
but I don't see an option like that as being very helpful in this situation,
because the script would just fail its way through the rest of the list from
that point, and that would take a long time. Quitting outright is much
faster.
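As an aside, the "ask tor nicely" step doesn't have to go through arm by
hand; roughly the following does the same thing over the control port (the
ControlPort 9051 and the password are placeholders, not my actual settings):

import socket

# Ask tor for fresh circuits, the same NEWNYM signal arm's
# "new identity" sends, rather than a full SIGHUP reload.
with socket.create_connection(("127.0.0.1", 9051)) as ctrl:
    ctrl.sendall(b'AUTHENTICATE "my-control-password"\r\n')
    print(ctrl.recv(1024).decode().strip())  # expect "250 OK"
    ctrl.sendall(b"SIGNAL NEWNYM\r\n")
    print(ctrl.recv(1024).decode().strip())  # expect "250 OK"
    ctrl.sendall(b"QUIT\r\n")

NEWNYM just marks existing circuits as unusable for new streams, so it is a
lighter touch than the config reload a SIGHUP causes.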
A bit over a week ago, I switched to 0.2.3.20-rc, and the problem
still occurs. However, 0.2.3.20-rc now also emits a new message from time
to time, the most recent occurrence of which is
Sep 06 06:02:45.934 [notice] Low circuit success rate 7/21 for guard TORy0=753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B.
That is an interesting message - I wonder if the author of that message
might chime in?
Looks like bug #6475.
Wondering whether such circuit-building failures might be related to the
other problem, I began a little experiment: each time I saw a "Low circuit
success rate" message, I added the key fingerprint of the node in question
to my ExcludeNodes list in torrc and sent tor a SIGHUP.
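Concretely, each round of that experiment amounted to roughly the
following, with the fingerprint taken from the log message above (the rest
of the list is omitted here):

ExcludeNodes $753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B,...

followed by a kill -HUP on the tor process.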
The problem is still occurring, though, and when I look at the
circuits involved, they all seem to have at least one of the excluded
nodes in them, usually in the entry position. So my question is, what
changed between 0.2.3.13-alpha and 0.2.3.18-rc (or possibly 0.2.3.20-rc)
in the handling of nodes listed in the ExcludeNodes line in torrc? And
is there anything I can do to get the ExcludeNodes list to work again
the way it used to work?
Thanks in advance for any relevant information.
It seems that there are two issues - one is that a guard is failing to
build circuits, and the other is that you can't seem to exclude it. I have
to admit, I'm more interested in the former... Is there a pattern to the
failures? That is, for the 7 successes for that node, did you see
anything interesting? Were, say, the nodes that worked somehow in the
same country as that guard? Or perhaps were the other failed circuits
all seemingly unrelated to the guard?
As for the ExcludeNodes - did you set StrictNodes at the same time?
Are you also a relay?
Any other configuration info would be helpful here too.
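(The reason I ask: without StrictNodes, tor treats ExcludeNodes as a strong
preference but can still use an excluded relay for a few purposes such as
directory fetches or reachability self-tests. Making the exclusion a hard
requirement looks roughly like

ExcludeNodes $753E0B5922E34BF98F0D21CC08EA7D1ADEEE2F6B
StrictNodes 1

with the caveat that tor will then fail outright rather than fall back when
it can't build an acceptable path. The fingerprint here is just the one from
the log message quoted earlier.)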
Okay. Skipping over local port and IP address usage, here's some
client-side stuff.
TunnelDirConns 1
PreferTunneledDirConns 1
UseEntryGuards 1
NumEntryGuards [if you really need the number here, let me know--SB]
AllowDotExit 0
LongLivedPorts 20-23,47,115,119,143,144,152,178,194,563,706,989,990,992-994,1863,5050,5190-5193,5222,5223,6000-6063,6523,6667,6697,8021,8300
There are also several NodeFamily statements and extensive ExcludeNodes and
ExcludeExitNodes statements. If there is other stuff you want, let me know,
but depending upon what you ask for, I may choose to send it to you directly
rather than post it here.
(To answer your question: looking through the changelogs, and the
commit logs for src/or/circuitbuild.c and src/or/routerlist.c, I can't
find anything that stands out to me as something that might cause an
ExcludeNodes regression. So more investigation will be needed!)
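(For anyone who wants to repeat that check, it is roughly

git log tor-0.2.3.13-alpha..tor-0.2.3.20-rc -- src/or/circuitbuild.c src/or/routerlist.c

run in a tor git checkout, assuming the release tags follow the usual
tor-<version> naming.)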
The only thing remotely related that I saw was the following item in
the Changelog for 0.2.3.15-rc.
o Minor bugfixes (on 0.2.2.x and earlier):
...
- After we pick a directory mirror, we would refuse to use it if
it's in our ExcludeExitNodes list, resulting in mysterious failures
to bootstrap for people who just wanted to avoid exiting from
certain locations. Fixes bug 5623; bugfix on 0.2.2.25-alpha.
It probably has nothing to do with what has been causing me trouble, but
it was the only item I found about either of the Exclude{,Exit}Nodes
statements that mentioned a change between 0.2.3.13-alpha and 0.2.3.18-rc.
Scott