[tor-bugs] #6232 [Analysis]: Make entropy-over-time graph

Wed Jul 18 08:58:51 UTC 2012

#6232: Make entropy-over-time graph
-------------------------+--------------------------------------------------
 Reporter:  arma         |          Owner:                
     Type:  enhancement  |         Status:  needs_revision
 Priority:  normal       |      Milestone:                
Component:  Analysis     |        Version:                
 Keywords:               |         Parent:                
   Points:               |   Actualpoints:                
-------------------------+--------------------------------------------------

Comment(by gsathya):

 Replying to [comment:32 karsten]:
 Excellent, more coding!

 > A few comments after re-reading the whole ticket:
 >
 >  - I wonder if entropies based on subsets of Exit and Guard flagged
 relays are correct.  I spent yesterday afternoon trying to learn how
 path selection really works
 ([https://trac.torproject.org/projects/tor/ticket/5755#comment:11 #5755]).
 I think we'll have to take bandwidth weights as reported in the footer
 section of a consensus into account, too.  Those bandwidth weights
 influence, for example, how to weight the consensus weight of a relay with
 the Exit flag and a relay with Exit ''and'' Guard flag for the exit
 position.  In a consensus published yesterday, the former was weighted
 with Wee=1.0, whereas the latter was weighted with Wed=0.4272.  Similarly,
 bandwidth weights for the guard position were Wgd=0.2864 and Wgg=0.6446,
 so quite different.  If we only look at the Exit ''or'' Guard flag of a
 relay, we might be quite off.  But before we change anything here, I want
 to hear back from Mike or Roger if my understanding of path selection is
 correct.
 >
 >  - The GeoIP database is part of the sources in metrics-tasks.git,
 right?  Can we change that and have users provide their own geoip file?
 I'm worried that the current "a1" madness (relays mapped to MaxMind's "A1" anonymous-proxy code) influences the results, and I'd
 like to swap the current database with the one from February which didn't
 have "a1" relays all over.
 >
 >  - Can we add AS-based entropy values, too?  There's an AS database from
 Maxmind that we could use here.  Again, users could provide that database
 file, so there's no need to commit it to the Git repo.

 Yep, all three of these can be done pretty easily.
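A minimal sketch of how the footer bandwidth weights could enter the entropy computation. It assumes relays come as dicts with hypothetical keys `fingerprint`, `weight` (consensus weight), `flags` and `country`, and that the weights have already been divided by the weight scale (10000 by default) to give fractions like Wee=1.0 and Wed=0.4272. Only the four weights quoted above are handled; relays lacking the position's flag get weight 0, which is a simplification. The `key` argument groups relays before computing Shannon entropy, so the same function could give per-country or per-AS values once a lookup from a user-supplied GeoIP or AS file (not shown) fills in the grouping field.

```python
import math
from collections import defaultdict


def position_weight(flags, position, w):
    """Bandwidth-weight multiplier for a relay with `flags` in `position`.

    Covers only Wee, Wed, Wgg and Wgd; relays without the position's flag
    get 0 here (a simplification, not Tor's full path selection).
    """
    is_exit = "Exit" in flags
    is_guard = "Guard" in flags
    if position == "exit":
        if is_exit:
            return w["Wed"] if is_guard else w["Wee"]
        return 0.0
    if position == "guard":
        if is_guard:
            return w["Wgd"] if is_exit else w["Wgg"]
        return 0.0
    raise ValueError("unknown position: %r" % position)


def entropy(relays, position, w, key=lambda r: r["fingerprint"]):
    """Shannon entropy (bits) of the position-weighted selection distribution.

    Relays that share the same `key` value (e.g. country or AS) count as one entity.
    """
    totals = defaultdict(float)
    for r in relays:
        totals[key(r)] += r["weight"] * position_weight(r["flags"], position, w)
    total = sum(totals.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for v in totals.values():
        if v > 0:
            p = v / total
            h -= p * math.log2(p)
    return h
```

With Wee=1.0 and Wed=0.4272, an Exit+Guard relay contributes less than half of its consensus weight to the exit distribution, which is the effect that looking only at the Exit flag would miss.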

 >  - In the longer term, do we want to include family diversity?  That
 metric would consider all relays in the same relay family as one entity,
 similar to how we consider all relays in the same country as one entity in
 the country diversity metric.  I admit that it's hard to extract families
 using the current code, because we'd have to parse server descriptors for
 that, too.  I'm also not certain that the results will be meaningful.  So,
 longer-term.
 >
 >  - A shorter-term goal could be to compute bandwidth diversity based on
 the relays' advertised bandwidths, not based on their consensus weights.
 Relays report their advertised bandwidth in their server descriptor; it's
 the minimum of bandwidth rate, burst, and observed bandwidth.  We'll want
 to compute bandwidth diversity for all relays and for exit/guard subsets
 as well as location diversity.  This is what Roger was referring to in the
 last but one paragraph of the ticket description.  Again, I admit that
 it's non-trivial to extract advertised bandwidths, because we'll have to
 parse server descriptors.  But it's easier to compute than relay families.

 Actually, stem can parse server descriptors now, so this wouldn't be
 hard at all. I can teach the script to use stem for both families and
 advertised bandwidths.
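A minimal sketch of both pieces, assuming stem's relay server descriptor objects (attributes `fingerprint`, `average_bandwidth`, `burst_bandwidth`, `observed_bandwidth`, `family`) and `stem.descriptor.parse_file`; the helper names are hypothetical. Advertised bandwidth follows the definition above (minimum of rate, burst and observed bandwidth). For families, only `$fingerprint` entries are used (nickname entries are ignored), and two relays are joined only if each lists the other. stem is imported lazily so the helpers can be exercised with stand-in objects.

```python
from types import SimpleNamespace


def advertised_bandwidth(desc):
    """Minimum of bandwidth rate, burst, and observed bandwidth (bytes/s)."""
    return min(desc.average_bandwidth, desc.burst_bandwidth, desc.observed_bandwidth)


def declared_fingerprints(desc):
    """Fingerprints listed in a descriptor's family line; nickname entries are skipped."""
    fps = set()
    for entry in desc.family:
        if entry.startswith("$"):
            # Entries may read "$FP", "$FP=nick" or "$FP~nick"; keep the 40 hex digits.
            fps.add(entry[1:41].upper())
    return fps


def family_groups(descs):
    """Map each fingerprint to a family representative.

    Two relays are joined only if each lists the other (mutual declaration);
    joined pairs are then merged transitively with a small union-find.
    """
    by_fp = {d.fingerprint: d for d in descs}
    declared = {fp: declared_fingerprints(d) for fp, d in by_fp.items()}
    parent = {fp: fp for fp in by_fp}

    def find(fp):
        while parent[fp] != fp:
            parent[fp] = parent[parent[fp]]  # path halving
            fp = parent[fp]
        return fp

    for fp, others in declared.items():
        for other in others:
            if other in by_fp and fp in declared[other]:
                parent[find(fp)] = find(other)
    return {fp: find(fp) for fp in by_fp}


def load_server_descriptors(path):
    """Read relay server descriptors from a file with stem (hypothetical helper)."""
    from stem.descriptor import parse_file  # requires stem
    return list(parse_file(path, descriptor_type="server-descriptor 1.0"))


if __name__ == "__main__":
    # Stand-ins with the same attribute names as stem's server descriptors.
    a = SimpleNamespace(fingerprint="A" * 40, average_bandwidth=100,
                        burst_bandwidth=200, observed_bandwidth=50,
                        family={"$" + "B" * 40})
    b = SimpleNamespace(fingerprint="B" * 40, average_bandwidth=300,
                        burst_bandwidth=300, observed_bandwidth=400,
                        family={"$" + "A" * 40 + "=nickA"})
    print(advertised_bandwidth(a), family_groups([a, b]))
```

Merging mutual pairs transitively is a simplification for the "one entity per family" metric: Tor itself checks family membership pair by pair, so A-B and B-C being families does not make A and C one. Whether the transitive grouping gives meaningful results is part of the open question raised above.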

 > gsathya, are you up for more coding fun?  Didn't you worry that this
 task might be too trivial for a thesis?  Hah! :)

 Heh indeed! Fun :)

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6232#comment:34>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online