[tor-relays] bandwidth authority algorithm is cracked

julien.robin28 at free.fr julien.robin28 at free.fr
Tue Jan 21 22:59:04 UTC 2014


Hi Mike,

What you said is very interesting. It was the missing piece for me to understand why the weight of the relay in the consensus can drop (or rise again) so quickly (sometimes 3 times per day) without being caused by any change in the bandwidth used on the network link of the dedicated machine: the only visible changes in the server's bandwidth appeared a few minutes or hours after the changes in consensus weight, and they were proportional.

So the cause of the problem I encountered should be found here: at the moment the authority servers were measuring its bandwidth.

Maybe an abnormally large number of circuits (depending on the origins and networks of those circuits) were almost unusable: for example, if there was a 50 percent chance of being measured with very, very low bandwidth when the authority server did its job. In that case the algorithm made no error, it did its job: bad bandwidth, bad ratio.
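
To put rough numbers on that idea, here is a small Python sketch. It assumes the measurement is a plain average over circuits, and all speeds are made-up values (both are my assumptions, not taken from the scanner code):

```python
def expected_ratio(p_slow, slow_speed, fast_speed, network_avg):
    """Expected measured-speed / network-average ratio when a fraction
    p_slow of the measurement circuits is slow.

    Hypothetical model: every name and number here is illustrative only.
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    if network_avg <= 0:
        raise ValueError("network_avg must be positive")
    # Weighted average of the slow and fast circuit speeds.
    expected_speed = p_slow * slow_speed + (1.0 - p_slow) * fast_speed
    return expected_speed / network_avg


# Made-up speeds in KB/s: slow circuits at 10, fast ones at 500,
# network average at 250.
healthy = expected_ratio(0.0, 10, 500, 250)
degraded = expected_ratio(0.5, 10, 500, 250)
print(healthy, degraded)
```

With these example values, the ratio falls to roughly half as soon as half of the circuits are slow, even though the machine itself has not changed.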



With this information, I would think that my server encounters (or encountered) some difficulty in one of these 2 possible locations:

1-In the machine itself, causing randomly bad bandwidth on some circuits, or circuits that cannot be established*, even with few people connected (at a quarter of the machine's capacity the problem was still present, even with new identities running alone with pretty low bandwidth, so we can exclude normal TCP/IP socket congestion between my relay and other ones). Also, I had no "Your computer is too slow to handle this many creation requests" message while the problem was present.

2-Difficulty communicating with a particular point of the Internet that is involved when the authorities measure bandwidth. If the problem is still present, I will have to verify this by observing, on Tor Atlas, other relays that are with the same network service provider, if possible in the same datacenter (Iliad DC3, France, addresses like 88.191.xxx.xxx but maybe not only those).

Another 2-Maybe large-scale geographic network problems at my ISP gave circuits a 50 percent chance of working fast and fine (and using all the available bandwidth) and a 50 percent chance of being slow and unusable, but that looks like too big an affair; I'm not sure it's really possible (and 50 percent is an example value).

Is the measurement method the same for Exit Nodes and Middle/Entry Nodes?

*What does the authority algorithm decide when one or more circuits through the relay being measured cannot be established?

Thank you in advance! I will wait and watch over the following days or weeks and keep you informed.
Julien ROBIN

PS: while I am at it, first drop (consensus weight fraction divided by 2, 0.137% to 0.067%, now 12100, 0.77%) on ArachnideFR94v2 a few minutes ago (but with such low values, variations may be normal, so we have to wait and see; I will record the consensus weight values in an Excel sheet to be sure of what I see over the following days and weeks). Exit probability from 0.400 to 0.200 :(
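
For the record keeping, something like the following Python sketch could stand in for the spreadsheet. The timestamps, the function names and the 40 percent threshold are made up for illustration; only the two fraction values come from the numbers above.

```python
def relative_change(previous, current):
    """Relative change between two successive observations (-0.5 = halved)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


def find_sudden_changes(observations, threshold=0.4):
    """Return (timestamp, previous, current, change) for every step whose
    relative change is at least `threshold` in absolute value.

    `observations` is a list of (timestamp, value) pairs in time order.
    """
    flagged = []
    for (_, prev), (when, cur) in zip(observations, observations[1:]):
        change = relative_change(prev, cur)
        if abs(change) >= threshold:
            flagged.append((when, prev, cur, change))
    return flagged


# Consensus weight fraction in percent; the timestamps are hypothetical.
history = [
    ("2014-01-21 20:00", 0.137),
    ("2014-01-21 22:00", 0.067),
]
for when, prev, cur, change in find_sudden_changes(history):
    print(f"{when}: {prev} -> {cur} ({change:+.0%})")
```

Logging the values with a timestamp each time makes it possible to line up the drops later with the bandwidth graph of the machine.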


----- Original message -----
From: "Mike Perry" <mikeperry at torproject.org>
To: tor-relays at lists.torproject.org
Sent: Tuesday, 21 January 2014 19:49:27
Subject: Re: [tor-relays] bandwidth authority algorithm is cracked

Also keep in mind that what the bandwidth authorities actually measure
is not total capacity but spare stream capacity (by downloading large
files through at least 5 different two hop circuits for each
relay). They then use this stream throughput measurement to create a
multiplier to multiply your descriptor bandwidth by. The multiplier is
the ratio of your average measured spare stream capacity to the network
average spare stream capacity.
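
To make the arithmetic concrete, here is a minimal Python sketch of the multiplier as described above. It is not the actual bandwidth scanner code: the function names and the sample numbers are made up for illustration.

```python
from statistics import mean


def stream_ratio(relay_speeds, network_avg_speed):
    """Ratio of a relay's average measured spare stream capacity to the
    network-wide average. Hypothetical helper, not part of any Tor tool."""
    if not relay_speeds:
        raise ValueError("need at least one measurement")
    if network_avg_speed <= 0:
        raise ValueError("network average must be positive")
    return mean(relay_speeds) / network_avg_speed


def scaled_bandwidth(descriptor_bw, relay_speeds, network_avg_speed):
    """Descriptor bandwidth multiplied by the stream ratio."""
    return descriptor_bw * stream_ratio(relay_speeds, network_avg_speed)


# Made-up numbers: five stream measurements in KB/s, average 400,
# against a network average of 200, so the multiplier is 2.0.
speeds = [400, 420, 380, 410, 390]
print(scaled_bandwidth(1000, speeds, 200))
```

A relay whose streams measure at half the network average would get a multiplier of 0.5 in this sketch, whatever its descriptor bandwidth says.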

The reasoning behind this is that the bandwidth authorities are a load
balancing mechanism that is meant to reallocate consensus weight to
relays that are underutilized from relays that are overutilized. If your
relay experiences bursts of traffic, the authorities may measure you as
having low stream capacity. However, there are 5 of them, and we take
the median measurement of all 5. Again, each bandwidth authority
also performs 5 measurements of each relay in two hop circuits, pairing
it with relays of similarly observed spare stream capacity.
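
As a rough illustration of why the median matters, here is a short Python sketch. The values are invented and the function name is hypothetical; it only shows the median step, not how the authorities actually vote.

```python
from statistics import median


def consensus_measurement(authority_values):
    """Median of the values reported by the bandwidth authorities."""
    if not authority_values:
        raise ValueError("need at least one authority value")
    return median(authority_values)


# One authority happened to measure during a traffic burst:
# the single low value is ignored by the median.
one_outlier = [52000, 48000, 6000, 50000, 51000]

# Three of five authorities measured low: the median follows them.
mostly_low = [6000, 5000, 7000, 50000, 51000]

print(consensus_measurement(one_outlier))
print(consensus_measurement(mostly_low))
```

So a single unlucky measurement should not move the result, but a relay that looks slow to a majority of the authorities will see its value drop sharply.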

But yes, it is possible that something has broken in the years since
they have had serious attention. Currently Aaron Gibson devotes some
cycles to fixing issues among his other responsibilities, but we could
use a dedicated pair of eyes keeping track of their behavior, especially
as new Tor versions are released.

Karsten Loesing:
> Hi starlight, hi Julien,
> 
> the bandwidth scanner system is quite complex, so it might be the case
> that part of it is broken.  But from this thread that's hard to say, and
> it's impossible to know what part needs fixing.
> 
> Want to help us debug the problem(s) you observed?
> 
> Here are a few possible starting points:
> 
>  - Search your relays in Atlas at https://atlas.torproject.org/, look at
> the graphs at the bottom, and tell us at what times you think the
> "consensus weight fraction" plot is totally off.
> 
>  - Read Roger's blog post
> https://blog.torproject.org/blog/lifecycle-of-a-new-relay and tell us
> how much your findings overlap or do not overlap with the expectations
> stated in that blog post.
> 
>  - More ambitiously, download the vote documents from the metrics
> website at https://metrics.torproject.org/data.html, find your relay in
> the votes produced by bandwidth authorities, and tell us what unexpected
> things you found while doing so.
> 
>  - Even more ambitiously, read the bandwidth scanner spec at
> https://gitweb.torproject.org/torflow.git/blob/HEAD:/NetworkScanners/BwAuthority/README.spec.txt
> and tell us what data we could obtain from the bandwidth scanners to
> further debug this problem.
> 
> Thanks!
> 
> All the best,
> Karsten
> 
> 
> On 1/10/14 9:37 AM, julien.robin28 at free.fr wrote:
> > I have had the same problem since the beginning of December on my
> > ArachnideFR94 server (88.191.192.25, service provider: Iliad -
> > Online.net): consensus weight went from more than 100,000 abruptly
> > down to 6,000 and 20,000, after a while rose back up to 50,000,
> > then abruptly fell back to 3,000 and 10,000 the following day, then
> > 30,000, 12,000, 8,000... After an entire month of bandwidth never
> > recovering and falling even lower, and after I tried everything
> > (creating a new server identity, but after some weeks, same
> > problem), it kept getting worse and worse. At the end of November
> > my bandwidth was about 20MB/s (sometimes in the top 5 of the
> > world's biggest servers!); it was about 0.9MB/s when I decided to
> > close it.
> > 
> > No bandwidth problem with the service provider: the bandwidth graph
> > only started to drop abruptly a couple of minutes after the
> > consensus weight fell. I was thinking it was because too many Tor
> > relays are running at this service provider (since the end of July
> > 2013, Tor relays are accepted by this service provider, and it also
> > opened up internationally with attractive prices).
> > 
> > And I have no problem with my 2 other servers at the "Digicube"
> > service provider.
> > 
> > If it can help!
> > 
> > Best regards Julien ROBIN
> > 
> > ----- Original message ----- From: "starlight 2014q1"
> > <starlight.2014q1 at binnacle.cx> To: tor-relays at lists.torproject.org
> > Sent: Friday, 10 January 2014 05:49:20 Subject: [tor-relays]
> > bandwidth authority algorithm is cracked
> > 
> > The bandwidth authorities assign all kinds of wildly incorrect
> > capacities to the Tor node here.
> > 
> > The Tor relay software has been up for 45 days and has not been down
> > for more than five minutes for three or four months.
> > 
> > Occasional outages from the ISP mucking with their network, but
> > nothing more than ten or fifteen minutes in any week.
> > 
> > The local node bandwidth calculation is consistently 490-495
> > Kbytes/sec.  Very stable.  Very consistent.
> > 
> > The Tor bandwidth authorities assign values anywhere from 100
> > Kbytes/sec to almost 700 Kbytes/sec in an oscillating pattern with a
> > period of about one week.
> > 
> > Something is seriously wrong with that.
> > 
> > _______________________________________________ tor-relays mailing
> > list tor-relays at lists.torproject.org 
> > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays 
> > 
> 

-- 
Mike Perry


