[tor-relays] More consensus weight problems

Aaron Gibson aagbsn at extc.org
Mon Jun 29 18:46:44 UTC 2015


On 2015-06-29 17:51, Speak Freely wrote:
> Hello,
> 
> First of all, I love Tor. I love Tor Browser, and I love running relays.
> 
> When the problems are solved, I will most likely spin up more relays.
> 
> I'm leaving my fastest relay running, as a method of checking the status
> for myself. The rest have already started to expire, and within the next
> week or so most of the other ones will have expired as well.
> 
> I'm going to try tor-dev-alpha 2.7.1 and change fingerprints, as per a
> suggestion from s7r, seeing as how I have nothing to lose.
> 
> I just wish the bwauths could scan relays based off previous relative
> consensus weights... If this particular relay was at 27000, it should be
> higher on the list to check compared to another one I have that is at
> 487. My one relay was blazing fast with thousands of connections, my

Well, relays are ranked by capacity and split over several scanner 
processes, so they do get measured against their relative peers. But it 
seems that when they fall out of the measurement process ('Unmeasured') 
they must start again at the beginning. This is expected because all 
relays start Unmeasured and gradually increase their position in the 
consensus (per relative capacity). That dampens sudden changes and 
limits sybil attacks by requiring relays to stick around for a while, 
which increases the cost to an adversary. Still, historically 
long-running relays probably should not have to start at the bottom 
again just because they went unmeasured for a short period of time.
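
To make the "ranked by capacity and split over several scanner 
processes" idea concrete, here is a rough sketch. It assumes 
equal-sized slices and a plain fingerprint-to-capacity mapping; the 
function name and the slice boundaries are illustrative only and are 
not taken from the real scanner code, which may divide the network 
differently.

```python
def assign_scanners(relays, num_scanners=4):
    """Map each relay to a scanner index, 0 being the fastest slice.

    relays: dict of fingerprint -> capacity (hypothetical input format).
    Equal-count slices are an assumption made for illustration.
    """
    # Rank relays from highest to lowest capacity.
    ranked = sorted(relays, key=relays.get, reverse=True)
    # Ceiling division so every relay lands in some slice.
    per_slice = -(-len(ranked) // num_scanners)
    return {fp: i // per_slice for i, fp in enumerate(ranked)}


if __name__ == "__main__":
    example = {"A": 27000, "B": 487, "C": 5000, "D": 20}
    print(assign_scanners(example, num_scanners=2))
```

With a layout like this, a relay at 27000 and one at 487 would normally 
be handled by different scanner processes, so each is compared against 
peers of similar capacity.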

We are in the process of testing an increase in the number of scanners 
and accompanying tor instances from 4 to 9 (double, plus one for 
currently unmeasured relays). The aim is to decrease the time each 
fraction of the network takes to measure and to ensure that new or 
unmeasured relays are measured often. There are additional patches that 
introduce extra exits into a slice of relays if it has no suitable exits 
to measure with. This likely won't address the above behavior, but we 
hope it will reduce the number of relays that go missing. Currently we 
seem to have mixed results, with one Bandwidth Authority operator 
claiming minimal (50) unmeasured relays, and another claiming ~600 mixed 
relays. These numbers are not directly comparable because they were not 
sampled at the same time, and they may not be representative of typical 
behavior. It's a little too soon to tell.

It's a bit tricky to test these changes on the live tor network, 
demonstrate that they produce sane results, and convince the directory 
authority operators and partner bandwidth authority operators to 
upgrade. Nor do we want to do that all at once; gradual change is 
better. So, the goal is to produce results that will convince operators 
they should update, improve the situation for relay operators, and then 
start looking at longer term solutions for the measurement problem that 
are more maintainable and scalable.


> other is painfully useless with dozens, but my fastest one lost its
> consensus while the slowest one kept its consensus. It just seems
> silly. That being said, I don't know how/if the bwauths scan in any
> order or just willy-nilly, (that's not entirely true, I know it's
> segmented to some degree as I recall reading a blog post about how it's
> chopped up) but... I'd be much less upset if my best relays worked and
> my worst relays didn't. More complaining... bleh.
> 

I hope to have a testable hypothesis as to why your faster relays 
suffer(ed) more than the slower relays - it could be that the fraction 
of network by capacity allocated to a particular scanner is not well 
balanced, and that fraction is taking significantly longer to measure. 
In order to evaluate that statement I need to understand the common 
characteristics of relays that become unmeasured/lose rank and see if 
they are from a similar segment of the network, and whether or not that 
segment of the network takes longer to measure than other segments.
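
One way to check the first half of that hypothesis is to compare, per 
segment, how many relays dropped out. A minimal sketch follows, 
assuming we already have a mapping from fingerprint to segment and a 
list of relays that became unmeasured; both inputs and the function 
name are hypothetical, since no such data set exists yet.

```python
from collections import Counter


def drop_rate_by_segment(segment_of, dropped):
    """Return the fraction of relays in each segment that became unmeasured.

    segment_of: dict of fingerprint -> segment index (assumed input).
    dropped: iterable of fingerprints that lost their measurement.
    """
    totals = Counter(segment_of.values())
    # Ignore dropped fingerprints we have no segment information for.
    drops = Counter(segment_of[fp] for fp in dropped if fp in segment_of)
    return {seg: drops[seg] / totals[seg] for seg in totals}


if __name__ == "__main__":
    segments = {"A": 0, "F": 0, "C": 1, "E": 1, "B": 2, "G": 2}
    print(drop_rate_by_segment(segments, ["A", "F", "C"]))
```

A noticeably higher rate in one segment would point at that segment's 
scanner; comparing it with per-segment scan times would be the next 
step.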

Another hypothesis is that your relays are on the boundary between two 
segments, and that a transition between scanner instances causes enough 
missed measurements to drop your relays. It would be helpful to know 
what rank your relays (or others) held at their last good measurement 
before becoming unmeasured.
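
Given such a rank, a quick check for the boundary hypothesis could look 
like the sketch below. It assumes equal-sized segments and 0-based 
ranks (0 = fastest); the function name is made up for illustration and 
the real segment boundaries may differ.

```python
def ranks_to_nearest_boundary(rank, total, num_scanners=4):
    """Return how many rank positions a relay must move to change segment.

    rank: 0-based position by capacity, 0 being the fastest relay.
    total: number of ranked relays.
    Returns None when there is only one segment to be in.
    """
    per_slice = -(-total // num_scanners)  # ceiling division
    seg = rank // per_slice
    first = seg * per_slice                      # first rank in this segment
    last = min(first + per_slice, total) - 1     # last rank in this segment
    candidates = []
    if first > 0:
        candidates.append(rank - first + 1)      # moving up into a faster segment
    if last < total - 1:
        candidates.append(last - rank + 1)       # moving down into a slower segment
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print(ranks_to_nearest_boundary(rank=24, total=100))
```

Small values mean the relay sits close to a transition between scanner 
instances, which is where this hypothesis would expect missed 
measurements to cluster.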

It will require some cooperation with the existing deployed Bandwidth 
Authorities in order to learn what their current scan times are. I 
will be writing some simple scripts to scrape these results so that we 
can collect and publish some useful heuristics about the scanner 
processes and debug this problem more effectively.

> 
> One thing I would like to point out though... it appears... These
> problems have at least a casual relationship with MyFamily.
> 
> One group of MyFamily is completely done - all of them stuck at 20.
> Another group of MyFamily is working happily.
> I've been doing some tests over the past few months trying to understand
> why I keep having problems, and one thing has consistently popped up...
> MyFamily.

That is very interesting, because MyFamily should have nothing to do 
with the scanner process at all - I'll need to think about this some 
more.

> 
> As one of MyFamily lost consensus, another family gained consensus back
> on or around the same time.
> 
> Yes, especially nusenu, I know I'm supposed to have it all configured to
> be under 1 MyFamily... But in a way I'm glad I didn't, as the casual
> relationship I see really could only be seen having done what I did.
> 
> I say casual because I have no proof of causation. But... it is
> interesting. If no one else has experienced similar problems, then I'd
> chalk it up to a completely unexpected unrelated set of mysterious
> circumstances that should not have happened for which there is no
> explanation.
> 
> Aaron, if there is anything I can do to help you please let me know.

If anything that I said above sparks a thought, please let me know :)

> 
> 
> So in conclusion, I'm not done, I'm just not happy.
> 
> This was supposed to be a short email, oops.
> 
> 
> Matt
> Speak Freely


