[tor-bugs] #6232 [Analysis]: Make entropy-over-time graph

Thu Jul 26 10:59:06 UTC 2012

#6232: Make entropy-over-time graph
-------------------------+--------------------------------------------------
 Reporter:  arma         |          Owner:                
     Type:  enhancement  |         Status:  needs_revision
 Priority:  normal       |      Milestone:                
Component:  Analysis     |        Version:                
 Keywords:               |         Parent:  #6460         
   Points:               |   Actualpoints:                
-------------------------+--------------------------------------------------

Comment(by karsten):

 Replying to [comment:47 gsathya]:
 > Cool. I think atagar mentioned that stem can keep track of which
 consensus files it has already read; I'll take a look at this now.

 Neat!

 >> That's 2 * 3 * 4 = 24 possible combinations.  We have implemented five
 of them.  For example, in arma's first comment on #6443 he's asking for
 advertised bandwidths in the exit position for single relays.  We don't
 have those numbers yet.  Want to add the remaining 19 combinations, each
 of them with entropy and max entropy?
 > Looks like #6443 is using code from #5755, which is all Java. That would
 mean I'd have to rewrite it in Python or just continue in Java, which would
 essentially mean you'd have to rewrite all my Java code again to make it
 work ;).

 Ah, I meant implementing the 24 possible combinations in your #6232 code,
 not in #6443.  But you're right, we should merge the code of both tickets
 at some point, ideally rewriting #6443 in Python.  The two scripts do the
 same calculations but output different values (entropy values vs. top
 bandwidths).  I'm not sure yet which values #6443 should provide,
 particularly with respect to the 24 combinations; that's a huge amount of
 data.  I'd say let's wait until we're clearer about #6443 and then
 integrate the code into #6232.
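 A minimal Python sketch of the shared calculation, assuming each relay is a
 dict with a "weight" field (consensus weight or advertised bandwidth) and a
 grouping attribute such as "country". These field names, the helper
 functions, and the relay data are hypothetical, not taken from the #6232 or
 #6443 code. Entropy here is the Shannon entropy of the weight distribution,
 and max entropy is log2 of the number of entities, i.e. the entropy if all
 of them had equal weight. The same grouped weights can be reduced either to
 entropy values or to top bandwidths.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy in bits of the distribution defined by non-negative weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def max_entropy(weights):
    """Entropy if every entity with non-zero weight had equal weight: log2(n)."""
    n = sum(1 for w in weights if w > 0)
    return math.log2(n) if n > 0 else 0.0


def group_weights(relays, key, weight_field="weight"):
    """Sum relay weights per group (e.g. per country) and return the group totals."""
    totals = defaultdict(float)
    for relay in relays:
        totals[relay[key]] += relay[weight_field]
    return list(totals.values())


def top_weights(weights, k):
    """The k largest weights, similar in spirit to a top-bandwidths output."""
    return sorted(weights, reverse=True)[:k]


# Hypothetical relays; field names and numbers are made up for illustration.
relays = [
    {"nickname": "relayA", "country": "de", "weight": 4000},
    {"nickname": "relayB", "country": "de", "weight": 2000},
    {"nickname": "relayC", "country": "us", "weight": 1000},
    {"nickname": "relayD", "country": "nl", "weight": 1000},
]

per_relay = [r["weight"] for r in relays]
per_country = group_weights(relays, "country")

print(entropy(per_relay), max_entropy(per_relay))      # 1.75 2.0
print(entropy(per_country), max_entropy(per_country))
print(top_weights(per_relay, 2))                       # [4000, 2000]
```

 Grouping by country merges relayA and relayB, so the per-country entropy
 (about 1.06 bits) is lower than the per-relay entropy, and its maximum
 drops to log2(3).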

 > Should I pick some other metrics ticket?

 Want to look into #6471?

 > Yes, I was wondering about this. Why are we using the bandwidth and not
 the advertised bandwidth?

 Because clients use consensus weights to make path-selection decisions.
 These consensus weights are measured and voted on by bandwidth
 authorities, unlike the self-reported advertised bandwidths.  We're still
 interested in advertised bandwidths, though: what would all these fine
 metrics look like if we had no bandwidth authorities measuring the
 network?
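 To compare the two, the same entropy function can be run once over
 consensus weights and once over advertised bandwidths. A sketch with
 hypothetical field names (consensus_weight, advertised_bandwidth) and
 invented numbers:

```python
import math


def entropy(weights):
    """Shannon entropy in bits of the distribution defined by non-negative weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


# Hypothetical relays with both a measured consensus weight and a
# self-reported advertised bandwidth; all numbers are invented.
relays = [
    {"nickname": "relayA", "consensus_weight": 4000, "advertised_bandwidth": 1000},
    {"nickname": "relayB", "consensus_weight": 2000, "advertised_bandwidth": 1000},
    {"nickname": "relayC", "consensus_weight": 1000, "advertised_bandwidth": 1000},
    {"nickname": "relayD", "consensus_weight": 1000, "advertised_bandwidth": 1000},
]


def entropy_by_field(relays, fields=("consensus_weight", "advertised_bandwidth")):
    """Compute the same entropy metric once per bandwidth source."""
    return {field: entropy([r[field] for r in relays]) for field in fields}


result = entropy_by_field(relays)
print(result)  # {'consensus_weight': 1.75, 'advertised_bandwidth': 2.0}
```

 With these made-up numbers the advertised bandwidths are perfectly even, so
 their entropy equals the maximum of 2 bits, while the measured weights are
 skewed and yield 1.75 bits.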

 > It means giving different weights to these metrics when calculating the
 total entropy:
 > 1. Legislative diversity - based on countries
 > 2. Organizational diversity - based on Family
 > 3. Financial diversity - ??
 > 4. Physical location diversity - ??
 > 5. Network diversity - Subnets

 I still don't understand.  Would such a metric consist of 30% legislative
 diversity, 20% organizational diversity, etc.?  I think we should compare
 these metrics separately, not combine them into a single number that
 nobody can interpret.
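 A small sketch of why a single weighted score is hard to interpret,
 assuming each diversity dimension has already been normalized to [0, 1]
 (for example, entropy divided by max entropy). The 30%/20% split comes from
 the question above; the third "network" dimension carrying the remaining
 50% and both profiles are invented for illustration:

```python
# Hypothetical weights: 30% legislative, 20% organizational, and a made-up
# "network" dimension standing in for the rest.
WEIGHTS = {"legislative": 0.3, "organizational": 0.2, "network": 0.5}


def combined_score(profile, weights=WEIGHTS):
    """Weighted sum of per-dimension diversity scores, each assumed to be in [0, 1]."""
    return sum(weights[dim] * profile[dim] for dim in weights)


# Two invented network snapshots with very different diversity profiles.
x = {"legislative": 1.0, "organizational": 0.5, "network": 0.2}
y = {"legislative": 0.0, "organizational": 0.5, "network": 0.8}

print(combined_score(x), combined_score(y))  # both approximately 0.5
for dim in WEIGHTS:
    # The separate values show how different the two snapshots really are.
    print(dim, x[dim], y[dim])
```

 Snapshot y has no legislative diversity at all, yet it gets the same
 combined score as x; only the per-dimension values reveal that.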

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6232#comment:52>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online