[tor-bugs] #3261 [Analysis]: Analyze how wrong our bridge usage statistics are

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Tue Nov 1 09:49:02 UTC 2011


#3261: Analyze how wrong our bridge usage statistics are
----------------------+-----------------------------------------------------
 Reporter:  karsten   |          Owner:     
     Type:  task      |         Status:  new
 Priority:  major     |      Milestone:     
Component:  Analysis  |        Version:     
 Keywords:            |         Parent:     
   Points:            |   Actualpoints:     
----------------------+-----------------------------------------------------

Comment(by karsten):

 Replying to [comment:3 arma]:
 > Assuming most bridge users find out about a bridge via one of the
 bridgedb mechanisms, I think we should look at 'fraction of bridges' as
 the primary question rather than 'fraction of bytes'. Bridgedb doesn't
 look at capacity after all when deciding what addresses to give out.
 >
 > So I would ask "Given this hour's networkstatus (written by Tonga), what
 fraction of the Running bridges never send us stats covering this hour?"

 You're right.  Unfortunately, I cannot change the analysis to include
 network statuses, at least not easily.  I'm only parsing bridge extra-info
 descriptors, and even that keeps my machine busy for a few hours per year
 of data, let alone the time I'd have to spend on rewriting the analysis
 code.

 But I changed the analysis to look at bridge uptime seconds per day that
 are covered by stats instead of written bytes.  I'm adding up the seconds
 for which bridges report usage statistics and the seconds for which they
 report written or read bytes.  The first sum divided by the second is the
 percentage we're looking for.  This analysis should be quite close to what
 you describe.  At least it gives us an idea whether we're talking about
 10, 30, 50, 70, or 90% here.
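
 As a rough sketch of that quotient (the record layout, the names, and the
 numbers below are made up for illustration; this is not the actual
 analysis code, which parses bridge extra-info descriptors):

```python
from dataclasses import dataclass


@dataclass
class BridgeDay:
    """Hypothetical per-bridge, per-day summary (not a real Tor format)."""
    stats_seconds: int    # seconds covered by reported usage statistics
    history_seconds: int  # seconds covered by written or read byte histories


def fraction_covered(records):
    """Return sum(stats_seconds) / sum(history_seconds), or None if no uptime."""
    stats_total = sum(r.stats_seconds for r in records)
    history_total = sum(r.history_seconds for r in records)
    if history_total == 0:
        return None  # no byte histories at all, so the quotient is undefined
    return stats_total / history_total


# Made-up example: three bridges on one day, only the first reports statistics.
example_day = [
    BridgeDay(stats_seconds=86400, history_seconds=86400),
    BridgeDay(stats_seconds=0, history_seconds=86400),
    BridgeDay(stats_seconds=0, history_seconds=43200),
]
example_fraction = fraction_covered(example_day)  # 86400 / 216000 = 0.4
```

 Summing before dividing weights each bridge by its uptime, so a bridge
 that was only up for half a day counts half as much as one that was up
 all day.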

 See the attached graph that I just updated.  The upper part contains the
 old approach where we weight by written bytes, and the lower part is the
 new analysis that weights by uptime seconds.  So, the fraction of bridges
 reporting statistics was at 20% until August 2011 and then magically
 increased to 40%.

 > (Treating load as uniform across bridges is the wrong thing to do for
 users who learn their bridge through a non-bridgedb mechanism, like
 hearing from a friend what bridge they use. I wonder how we can estimate
 what fraction of bridge users learn about their bridge in what way. We
 could say that there probably aren't many such users because it involves
 manual interaction; or we could say that there aren't many users of the
 bridgedb approach because it gives out bridges that don't work in China so
 they're moot. I'm inclined toward the former.)

 Do we have any data about users who learn about their bridges through a
 non-BridgeDB mechanism?  You mean public bridges, right?  Because we don't
 have statistics from private bridges, which is an unrelated problem.  I
 don't know what data to use here, so I'm going to ignore the fact that
 non-BridgeDB bridge discovery mechanisms exist for now.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3261#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online