[tor-bugs] #2680 [Metrics]: present bridge usage data so researchers can focus on the math

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Fri Mar 11 14:06:58 UTC 2011


#2680: present bridge usage data so researchers can focus on the math
---------------------+------------------------------------------------------
 Reporter:  arma     |          Owner:  karsten 
     Type:  task     |         Status:  assigned
 Priority:  normal   |      Milestone:          
Component:  Metrics  |        Version:          
 Keywords:           |         Parent:          
   Points:           |   Actualpoints:          
---------------------+------------------------------------------------------
Changes (by karsten):

  * status:  new => assigned


Comment:

 Here's my first attempt at presenting bridge usage data in a way that is
 more useful to researchers:

 We have (at least) four data sources that are relevant for analyzing
 bridge usage:

  1. ''Bridge descriptors'':  Bridges publish bridge descriptors to the
 bridge authority at least once every 18 hours.

  2. ''Bridge network statuses'':  The bridge authority forms an opinion on
 all bridges that published a descriptor recently, decides whether it
 considers them running, and writes these opinions to a bridge network
 status document every 30 minutes.

  3. ''BridgeDB pool assignments'':  BridgeDB learns about currently
 running bridges from the bridge authority and allocates these bridges to
 distributors like email or https or keeps them unallocated for manual
 distribution.

  4. ''Relay consensuses'':  The directory authorities vote on running
 relays (not bridges) every hour and publish a network status consensus.
 If a bridge uses the same identity key that it also used as a relay, it
 might observe more users than it would observe as a pure bridge.
 Therefore, bridges that have been running as relays before should be
 excluded from bridge statistics.
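
 A minimal sketch of the exclusion described in item 4, assuming the
 identity fingerprints of bridges and relays have already been extracted
 into plain sets of strings in the same representation.  If the bridge
 data carries hashed fingerprints, the relay fingerprints would have to be
 hashed the same way first; that step is left out here.  The function name
 and the sample fingerprints are made up for illustration:

```python
def exclude_former_relays(bridge_fingerprints, relay_fingerprints):
    """Return the bridges whose identity was never seen in a relay consensus."""
    # Normalize case so that "abcd" and "ABCD" compare equal.
    seen_as_relay = {fp.upper() for fp in relay_fingerprints}
    return {fp for fp in bridge_fingerprints if fp.upper() not in seen_as_relay}


# Hypothetical fingerprints, shortened for readability.
bridges = {"AAAA1111", "BBBB2222", "CCCC3333"}
relays = {"bbbb2222", "DDDD4444"}
pure_bridges = exclude_former_relays(bridges, relays)
print(sorted(pure_bridges))  # ['AAAA1111', 'CCCC3333']
```

 The relay set would need to cover every consensus in the period of
 interest (and ideally earlier ones), since a single match at any time is
 enough to exclude a bridge.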

 When Roger and I talked about this idea on IRC, I thought that we could
 merge data from these 4 sources into a single file.  Let's step back.  We
 should start with 4 data formats that are easier to parse than the current
 data sources and let researchers assemble the files themselves.  We can
 discuss merging these 4 data formats into 1 at a later time.
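
 As one example of what researchers might do with separate files: since
 statuses are written every 30 minutes, counting the statuses in which a
 bridge is listed as running gives a rough estimate of its running time.
 The record layout below (timestamp, fingerprint, running flag) is an
 assumption for illustration, not the actual format in the tarball:

```python
from collections import Counter

STATUS_INTERVAL_HOURS = 0.5  # statuses are written every 30 minutes


def estimate_running_hours(status_entries):
    """Map fingerprint -> rough hours running, from (time, fingerprint, running) tuples."""
    # Count only the statuses in which the bridge was considered running.
    running_counts = Counter(
        fingerprint for _, fingerprint, running in status_entries if running
    )
    return {fp: n * STATUS_INTERVAL_HOURS for fp, n in running_counts.items()}


# Hypothetical status entries for two bridges.
entries = [
    ("2011-01-01 00:00", "AAAA1111", True),
    ("2011-01-01 00:30", "AAAA1111", True),
    ("2011-01-01 01:00", "AAAA1111", False),
    ("2011-01-01 00:00", "BBBB2222", True),
]
print(estimate_running_hours(entries))  # {'AAAA1111': 1.0, 'BBBB2222': 0.5}
```

 This is only an approximation: a missing status document would make a
 bridge look less available than it was.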

 I wrote two Java programs to parse the data on the metrics website and
 generate 3 of these 4 data formats.  (We're still in the process of
 patching BridgeDB to dump its pool assignments to a file for the 3rd data
 source in the list above.  Once we're done with that, I'll write another
 Java program to provide the remaining data format.)  I can integrate these
 programs into metrics-db and provide these formats on a daily basis, but
 before doing so, I'd like to know whether the formats are useful to people
 at all.

 I uploaded a tarball of the three new data formats for
 [http://freehaven.net/~karsten/volatile/bridge-usage-data-2011-01.tar.bz2
 January 2011] (39M).  The source code to transform our standard tarballs
 into the new data formats plus a more detailed description of the data
 formats is in the
 [https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-2680
 metrics-tasks repository].

 I'm going to make the 3rd data format (BridgeDB pool assignments) for
 January 2011 available as soon as I have it (hopefully within a week).

 Also, I'm going to ignore the research questions listed in the ticket
 description above and let others answer them.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2680#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

