[tor-bugs] #1854 [Analysis]: Investigate raising the minimum bandwidth for getting the Fast flag

Thu Sep 20 18:59:28 UTC 2012

#1854: Investigate raising the minimum bandwidth for getting the Fast flag
---------------------------------------+------------------------------------
 Reporter:  arma                       |          Owner:  arma          
     Type:  task                       |         Status:  needs_revision
 Priority:  normal                     |      Milestone:                
Component:  Analysis                   |        Version:                
 Keywords:  performance loadbalancing  |         Parent:                
   Points:                             |   Actualpoints:                
---------------------------------------+------------------------------------
Changes (by karsten):

  * status:  needs_review => needs_revision

Comment:

 Replying to [comment:27 arma]:
 > I talked to Ian and Aaron a bit more about this analysis. What we'd like
 to see, for a given consensus, is a graph with bandwidth cutoff on the x
 axis and L_\inf on the y axis. L_\inf is the largest distance between the
 two probability distributions -- one being the probability distribution of
 which relay you'd pick from the pristine consensus, and the other the
 distribution in the modified consensus. "largest distance" means the
 element (i.e. relay) with the largest difference.

 Sounds doable.  I'd say let's start with plain consensus weight fractions
 and postpone exit, guard, country, and AS probabilities until we have a
 better handle on this type of analysis.

 A possible output file could look like this:

 {{{
 validafter,min_advbw,relays,linf
 2012-09-10 01:00:00,1,3040,0.03553
 2012-09-10 01:00:00,2,[...]
 }}}

 Here, `validafter` is the consensus valid-after time, `min_advbw` is the
 minimum advertised bandwidth of relays kept in the modified consensus,
 `relays` is the number of those relays, and `linf` is the largest absolute
 difference, taken over all relays, between a relay's probability in the
 pristine consensus and its probability in the modified consensus.  The
 probability in the pristine consensus is always the consensus weight
 fraction.  The probability in the modified consensus is 0 if the relay was
 excluded; otherwise it is the relay's consensus weight divided by the ''new''
 consensus weight sum (which is lower than the original consensus weight
 sum, because we cut out some relays).  We'll want to compare probabilities
 of all relays, including those that we excluded, because they have
 non-zero probability in the pristine consensus and zero probability in the
 modified consensus.
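
 To make the definition concrete, here is a minimal sketch of the per-cutoff
 computation.  It is not taken from pyentropy.py: the function name
 `linf_for_cutoff`, the `(advertised_bw, consensus_weight)` tuple layout, and
 the rule that a relay is kept when its advertised bandwidth is greater than
 or equal to `min_advbw` are all assumptions made for illustration.

```python
def linf_for_cutoff(relays, min_advbw):
    """Return (kept_count, linf) for one consensus and one cutoff.

    relays: list of (advertised_bw, consensus_weight) tuples (assumed layout).
    A relay is kept if advertised_bw >= min_advbw (assumed rule).
    """
    total = sum(cw for _, cw in relays)
    if total <= 0:
        raise ValueError("consensus weight sum must be positive")

    kept = [(bw, cw) for bw, cw in relays if bw >= min_advbw]
    new_total = sum(cw for _, cw in kept)

    linf = 0.0
    for bw, cw in relays:
        # Probability in the pristine consensus.
        p = cw / total
        # Probability in the modified consensus: 0 for excluded relays,
        # otherwise the weight relative to the new (smaller) weight sum.
        if bw >= min_advbw and new_total > 0:
            q = cw / new_total
        else:
            q = 0.0
        linf = max(linf, abs(p - q))

    return len(kept), linf
```

 The loop runs over all relays, not just the kept ones, so that an excluded
 relay with a large pristine probability can still determine `linf`.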

 > Then we should consider time: looking at C consensuses over the past
 year or something, for a given cutoff, we should graph the cdf of these C
 data points where each data point is the L_\inf of that consensus for that
 cutoff. The hope is that for some cutoffs, the cdf has very high area-
 under-the-curve.

 Sure, we should be able to plot those graphs from the file format above.
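
 As a rough sketch of how the per-cutoff cdf could be derived from a file in
 that format: the function name `linf_cdf_by_cutoff` is hypothetical, and the
 code assumes that every row is complete and that `min_advbw` values are
 integers.  It only produces the (linf, cumulative fraction) points and
 leaves the plotting to whatever tool we end up using.

```python
import csv
import io
from collections import defaultdict


def linf_cdf_by_cutoff(csv_text):
    """Map each min_advbw cutoff to its empirical cdf over consensuses.

    csv_text: contents of a file with the header
    validafter,min_advbw,relays,linf (one row per consensus and cutoff).
    Returns {cutoff: [(linf, cumulative_fraction), ...]} sorted by linf.
    """
    by_cutoff = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_cutoff[int(row["min_advbw"])].append(float(row["linf"]))

    cdfs = {}
    for cutoff, values in by_cutoff.items():
        values.sort()
        n = len(values)
        # The i-th smallest value (0-based) has cumulative fraction (i+1)/n.
        cdfs[cutoff] = [(v, (i + 1) / n) for i, v in enumerate(values)]
    return cdfs
```

 A cutoff whose points climb to 1.0 at small `linf` values is one where the
 modified consensus stayed close to the pristine one across the sampled
 consensuses.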

 Sathya, want to look into modifying pyentropy.py for the linf stuff?

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/1854#comment:28>