[tor-bugs] #6064 [Metrics Website]: Bridge usage statistics on metrics website are broken

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Tue Jun 5 12:51:29 UTC 2012


#6064: Bridge usage statistics on metrics website are broken
-----------------------------+----------------------------------------------
 Reporter:  karsten          |          Owner:  karsten
     Type:  defect           |         Status:  new    
 Priority:  major            |      Milestone:         
Component:  Metrics Website  |        Version:         
 Keywords:                   |         Parent:         
   Points:                   |   Actualpoints:         
-----------------------------+----------------------------------------------
 The graph on [https://metrics.torproject.org/users.html#bridge-users
 bridge users from all countries] recently went up from 10,000 to 50,000.
 There was no event that could explain this increase, so I looked for a
 possible bug.

 Here's the bug: when we aggregate bridge users per day, we write single
 observations to a file with lines like this:

 {{{
 bridge,date,time,??,a1,a2,...,all
 0007BC3A0CFC768DB2FA1E3EB6FB4ABF4EBE2D13,2012-05-24,07:12:18,NA,1.12,NA,...,30.55
 }}}

 In the next step we aggregate these lines by summing up all observations
 of a given day.
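
 A minimal sketch of that per-day summing step, not the actual metrics
 code.  It assumes the comma-separated layout shown above, and the function
 name `sum_by_day` and the choice to skip `NA` cells are assumptions made
 for illustration:

```python
import csv
from collections import defaultdict

# Columns that identify an observation rather than count users.
ID_COLUMNS = ("bridge", "date", "time")

def sum_by_day(lines):
    """Sum every numeric column over all observations of each date.

    `lines` is an iterable of CSV lines, header first.
    Assumption: a cell containing 'NA' means "no data" and is skipped.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(lines):
        day = row["date"]
        for column, value in row.items():
            if column in ID_COLUMNS or value == "NA":
                continue
            totals[day][column] += float(value)
    # Convert nested defaultdicts to plain dicts for the caller.
    return {day: dict(columns) for day, columns in totals.items()}
```

 Because the daily total is a plain sum over whatever lines are present, a
 damaged single-observations file changes the totals without raising any
 error.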

 It turns out the file with single observations was truncated and we didn't
 notice.  Whenever we add lines to that file, we read it into memory, add
 the new observations, and write the whole file back to disk.  The file is
 always kept ordered by bridge fingerprint.  Here's the distribution of
 bridge fingerprints in the file, grouped by their first character:

 {{{
 0 24567
 1 24623
 2 11687
 3  1526
 4  1124
 5   825
 6  1352
 7  1422
 8  1271
 9  1287
 A  1336
 B  1048
 C  1525
 D  1227
 E  1497
 F   994
 }}}

 We would expect roughly the same number of entries in each bucket, because
 fingerprints are hashes and so spread evenly over the first character.  It
 looks like the file was truncated after writing half of the fingerprints
 starting with 2.  This could have happened due to Java running out of
 memory, the server being restarted while writing the file, etc.
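
 The bucket check above can be sketched as a small sanity test.  This is a
 heuristic for illustration only: the function names and the threshold of
 5 are assumptions, and the counts are the ones listed above:

```python
from collections import Counter

HEX_DIGITS = "0123456789ABCDEF"

# Bucket sizes as reported in this ticket.
TICKET_COUNTS = {
    "0": 24567, "1": 24623, "2": 11687, "3": 1526,
    "4": 1124, "5": 825, "6": 1352, "7": 1422,
    "8": 1271, "9": 1287, "A": 1336, "B": 1048,
    "C": 1525, "D": 1227, "E": 1497, "F": 994,
}

def first_char_counts(fingerprints):
    """Count fingerprints by their first hex character."""
    return Counter(fp[0].upper() for fp in fingerprints if fp)

def looks_truncated(counts, ratio=5.0):
    """Return True if the buckets are badly unbalanced.

    Assumption: an empty bucket, or a largest bucket more than `ratio`
    times the smallest, is suspicious.  The threshold is arbitrary.
    """
    buckets = [counts.get(digit, 0) for digit in HEX_DIGITS]
    smallest = min(buckets)
    return smallest == 0 or max(buckets) / smallest > ratio
```

 With the counts from this ticket the largest bucket is roughly 30 times
 the smallest, so `looks_truncated(TICKET_COUNTS)` flags the file.  On a
 very small file an empty bucket can occur by chance, so the check only
 makes sense for files with many lines.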

 The quick fix is to aggregate bridge usage statistics again and replace
 the single-observations file on yatei.  I'm going to do that now.

 The next fix is to avoid truncating the file by writing to a temp file and
 replacing the original file with it once we're done writing.  I'll look
 into that next.
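
 A minimal sketch of the temp-file approach, written in Python for brevity
 rather than in the language of the metrics code.  The function name
 `write_atomically` is hypothetical.  It relies on `os.replace`, which
 swaps the file in atomically on POSIX systems when the temp file is on the
 same filesystem as the target:

```python
import os
import tempfile

def write_atomically(path, lines):
    """Write `lines` to `path` without ever exposing a partial file.

    The data goes to a temp file in the same directory first.  Only after
    it is fully written and synced does it replace the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        # Writing failed: drop the temp file, leave the original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

 If the process dies or the write fails partway, the original file is
 still complete and at worst a stray temp file is left behind.  Note that
 `tempfile.mkstemp` creates the file readable only by its owner, so the
 permissions may need adjusting before the replace.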

 The real fix is to stop using flat files for something that requires a
 database.  That's going to take me quite a bit longer.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6064>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

