[tor-bugs] #10680 [Analysis]: Obtain attributes of current public bridges

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Feb 3 11:23:19 UTC 2014


#10680: Obtain attributes of current public bridges
--------------------------+------------------------------
     Reporter:  sysrqb    |      Owner:
         Type:  task      |     Status:  new
     Priority:  normal    |  Milestone:
    Component:  Analysis  |    Version:
   Resolution:            |   Keywords:  bridgedb-parsers
Actual Points:            |  Parent ID:
       Points:            |
--------------------------+------------------------------

Comment (by karsten):

 Looks like a fine start!  I'll comment on the output csv files first:

  - Instead of "date", can you change the first column to
 "status_published" and put in the publication time of the bridge network
 status?  I'll aggregate that file using R, similar to how I'm aggregating
 the advbwdist-validafter.csv file here:
 https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/modules/advbwdist/aggregate.R

  - The "ec2bridge" column in the current servers.csv is actually a boolean
 type, not a number type.  Whenever there's a "t" in that column, the
 "bridges" column contains the number of bridges that are in the EC2
 cloud.  What you're doing is combining two dimensions, version and
 ec2bridge, by reporting how many of the EC2 bridges are running Linux.
 The current servers.csv does not combine dimensions, so there's just one
 line for the number of Linux bridges and one line for the number of EC2
 bridges.  That's sufficient for most use cases, so I'd say let's not
 combine dimensions for now.

  - The column headers should not be repeated for every bridge status.  You
 could check if the output csv file exists and only write the header line
 if it doesn't.
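
 A minimal sketch of the three csv points above, assuming a hypothetical
 column layout (status_published, version, platform, ec2bridge, bridges)
 and a hypothetical append_status_counts() helper that receives bridges as
 plain dicts.  It writes one line per dimension value without combining
 dimensions, marks the EC2 count with "t", and writes the header only when
 the file is new or empty:

```python
import csv
import os
from collections import Counter

# Hypothetical column layout: one dimension per line, the other dimension
# columns left empty, in the spirit of servers.csv as described above.
FIELDS = ["status_published", "version", "platform", "ec2bridge", "bridges"]


def append_status_counts(path, status_published, bridges):
    """Append bridge counts for one bridge network status to `path`.

    `status_published` is a datetime; `bridges` is an iterable of dicts
    with the hypothetical keys "version", "platform" and "ec2bridge" (bool).
    """
    published = status_published.strftime("%Y-%m-%d %H:%M:%S")
    bridges = list(bridges)

    # Total number of bridges, with all dimension columns empty.
    rows = [{"status_published": published, "bridges": len(bridges)}]
    # One line per version and one per platform; dimensions are not combined.
    for column in ("version", "platform"):
        counts = Counter(b[column] for b in bridges)
        for value, count in sorted(counts.items()):
            rows.append({"status_published": published, column: value,
                         "bridges": count})
    # ec2bridge is a boolean dimension: "t" marks the number of EC2 bridges.
    rows.append({"status_published": published, "ec2bridge": "t",
                 "bridges": sum(1 for b in bridges if b["ec2bridge"])})

    # Write the header line only if the file doesn't exist yet (or is empty).
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)  # missing keys -> ""
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

 Calling it once per bridge status appends to the same file, so the header
 appears exactly once no matter how many statuses are processed.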

 Regarding options to run your script: I'd appreciate a default mode of
 operation that processes only those bridge statuses that it did not
 process in an earlier run.  I think stem has an option to keep a parse
 history of some kind that you might be able to use here.  Note that you'll
 have to re-read server descriptors and extra-info descriptors in any case,
 because they might be referenced from many statuses.
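
 Instead of relying on stem's parse-history option (whose exact interface
 isn't confirmed here), this sketch swaps in a hand-rolled history file: a
 JSON map from bridge status file path to modification time.  The file
 format and function names are hypothetical.  Only statuses are tracked;
 descriptors are still re-read on every run, as noted above:

```python
import json
import os


def load_history(history_path):
    """Return {status file path: mtime} as recorded by earlier runs."""
    if not os.path.exists(history_path):
        return {}
    with open(history_path) as f:
        return json.load(f)


def save_history(history_path, history):
    with open(history_path, "w") as f:
        json.dump(history, f)


def process_new_statuses(status_paths, history_path, process_status):
    """Call process_status(path) only for bridge statuses that are new or
    changed since the last run.

    Server descriptors and extra-info descriptors are deliberately not
    tracked: they may be referenced from many statuses, so they still have
    to be re-read on every run.
    """
    history = load_history(history_path)
    for path in sorted(status_paths):
        mtime = os.path.getmtime(path)
        if history.get(path) == mtime:
            continue  # already processed in an earlier run
        process_status(path)
        history[path] = mtime
        # Save after each status so a crash doesn't lose earlier progress.
        save_history(history_path, history)
```

 Keying on modification time as well as path means a status file that gets
 replaced is processed again rather than silently skipped.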

 And finally, here are some quick comments on the code, though I can do
 another, more thorough code review later:

  - Bridge has quite a few attributes that we won't need.  For example,
 os_version isn't something we include in the output.  And we wouldn't
 include versions of other Tor-speaking programs like nTor anytime soon
 (but rather count them as "other" versions).  Oh, and there are no usable
 contact lines in bridge descriptors, so we don't need the contact
 attribute.  I guess what I'm saying is that this is dead code that
 shouldn't be there.  YAGNI.

  - I didn't see where you store the bridge status publication time in
 Bridge.

  - Both __init__ and set_descriptor_details could accept stem objects
 rather than several individual parameters.

  - unpadded_base64_to_base_16 looks like something that stem should do for
 you.  If it doesn't, you should ask atagar to implement it in stem.
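
 A rough sketch of a trimmed-down Bridge along the lines of the comments
 above: it keeps only output-relevant attributes, stores the status
 publication time, and takes whole objects instead of single parameters.
 The attribute names fingerprint, flags, platform and tor_version are
 assumptions about what stem's status-entry and server-descriptor objects
 expose (check the stem docs); the platform buckets and the
 unpadded_base64_to_hex() helper are hypothetical, the latter only
 relevant if stem doesn't already provide this conversion:

```python
import base64


def unpadded_base64_to_hex(value):
    """Decode base64 whose trailing '=' padding was stripped and return the
    bytes as upper-case hex.  Hypothetical helper."""
    padded = value + "=" * (-len(value) % 4)
    return base64.b64decode(padded).hex().upper()


# Hypothetical platform buckets; everything else counts as "other".
PLATFORMS = ("Linux", "Windows", "Darwin", "FreeBSD")


class Bridge:
    """Keeps only what ends up in the output csv files (YAGNI)."""

    def __init__(self, status_entry, status_published):
        # status_entry is assumed to expose .fingerprint and .flags, like a
        # stem router status entry.
        self.fingerprint = status_entry.fingerprint
        self.flags = set(status_entry.flags)
        # Publication time of the bridge network status containing the entry.
        self.status_published = status_published
        self.version = None
        self.platform = None

    def set_descriptor_details(self, server_descriptor):
        # server_descriptor is assumed to expose .platform (the raw platform
        # line, possibly as bytes) and .tor_version.
        platform = server_descriptor.platform or ""
        if isinstance(platform, bytes):
            platform = platform.decode("utf-8", "replace")
        if platform.startswith("Tor "):
            self.version = str(server_descriptor.tor_version)
        else:
            # Other Tor-speaking programs are counted as "other" versions.
            self.version = "other"
        self.platform = next((p for p in PLATFORMS if p in platform), "other")
```

 Contact information and os_version are simply not stored, since neither
 makes it into the output.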

 I haven't made it further through the code yet, but I'm happy to do another
 review soon.  Let me know!

 Thanks!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/10680#comment:17>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

