[tor-bugs] #4499 [Analysis]: Investigate scaling points to handle more bridges

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Thu Feb 2 18:11:44 UTC 2012


#4499: Investigate scaling points to handle more bridges
----------------------+-----------------------------------------------------
 Reporter:  runa      |          Owner:  karsten                  
     Type:  task      |         Status:  assigned                 
 Priority:  normal    |      Milestone:  Sponsor E: March 15, 2012
Component:  Analysis  |        Version:                           
 Keywords:            |         Parent:                           
   Points:            |   Actualpoints:                           
----------------------+-----------------------------------------------------

Comment(by karsten):

 I started this analysis by writing a small tool to generate sample data
 for BridgeDB and metrics-db.  This tool takes the contents from one of
 Tonga's bridge tarballs as input, copies them a given number of times, and
 overwrites the first two bytes of relay fingerprints in every copy with
 0000, 0001, etc.  The tool also fixes references between network statuses,
 server descriptors, and extra-info descriptors.  This is sufficient to
 trick BridgeDB and metrics-db into thinking that relays in the copies are
 distinct relays.  I used the tool to generate tarballs with 2, 4, 8, 16,
 32, and 64 times as many bridge descriptors in them.
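
 A minimal sketch of the fingerprint rewrite described above, not the
 actual tool.  It assumes fingerprints are 40-character hex strings and
 that the copy number is written as four hex digits; the function name
 rewrite_fingerprint is made up for illustration.  Fixing up references
 between statuses and descriptors is not covered here.

```python
import re

# Assumption: a relay fingerprint is a SHA-1 digest written as 40 hex characters.
HEX_FINGERPRINT = re.compile(r"[0-9A-Fa-f]{40}")


def rewrite_fingerprint(fingerprint_hex: str, copy_index: int) -> str:
    """Overwrite the first two bytes (four hex characters) of a fingerprint
    with the copy index, so that each copy looks like a distinct relay."""
    if not HEX_FINGERPRINT.fullmatch(fingerprint_hex):
        raise ValueError("expected a 40-character hex fingerprint")
    if not 0 <= copy_index <= 0xFFFF:
        raise ValueError("copy index must fit into two bytes")
    prefix = format(copy_index, "04X")  # e.g. 0 -> "0000", 1 -> "0001"
    return prefix + fingerprint_hex[4:].upper()


if __name__ == "__main__":
    original = "0123456789ABCDEF0123456789ABCDEF01234567"  # made-up fingerprint
    for index in range(3):
        print(rewrite_fingerprint(original, index))
```

 Two bytes allow for 65536 distinct copies, which is far more than the 64
 needed here.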

 In the next step I fed the tarballs into BridgeDB and metrics-db.
 BridgeDB reads the network statuses and server descriptors from the latest
 tarball and writes them to a local database.  metrics-db sanitizes two
 half-hourly created tarballs every hour, establishes an internal mapping
 between descriptors, and writes sanitized descriptors with fixed
 references to disk.

 The attached graph shows the results.

 The upper graph shows how the tarballs grow in size with more bridge
 descriptors in them.  This growth is, unsurprisingly, linear.  One thing
 to keep in mind here is that bandwidth and storage requirements for the
 hosts transferring and storing bridge tarballs grow along with the
 tarballs.  We'll want to pay extra attention to disk space running out on
 those hosts.

 The middle graph shows how long BridgeDB takes to load descriptors from a
 tarball.  This graph is linear, too, which indicates that BridgeDB can
 handle an increase in the number of bridges pretty well.  One thing I
 couldn't check is whether BridgeDB's ability to serve client requests is
 in any way affected during the descriptor import.  I assume it'll be fine.
 Aaron, are there other things in BridgeDB that I overlooked that may not
 scale?

 The lower graph shows how metrics-db can or cannot handle more bridges.
 The growth is slightly worse than linear.  In any case, the absolute time
 required to handle 25K bridges is worrisome (I didn't try 50K).
 metrics-db runs in an hourly cronjob, and if that cronjob doesn't finish within 1
 hour, we cannot start the next run and will be missing some data.  We
 might have to sanitize bridge descriptors in a different thread or process
 than the one that fetches all the other metrics data.  I can also look
 into other Java libraries to handle .gz-compressed files that are faster
 than the one we're using.  So, we can probably handle 25K bridges somehow,
 and maybe even 50K.  Somehow.
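
 One way to put a number on "slightly worse than linear" is to fit
 t = c * n^k to the measured run times and look at the exponent k, where
 k = 1 means linear growth.  The sketch below does this with a
 least-squares fit in log-log space.  The function name is made up, and
 the inputs in the usage part are synthetic placeholders, not the
 measurements from the graph.

```python
import math


def scaling_exponent(sizes, times):
    """Return the least-squares slope of log(time) against log(size).

    A slope near 1.0 indicates linear growth; larger values indicate
    worse-than-linear growth.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    denominator = sum((x - mean_x) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("sizes must not all be equal")
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic example: multipliers as used for the tarballs, and
    # made-up run times that grow as n^1.2.
    factors = [1, 2, 4, 8, 16, 32, 64]
    fake_times = [f ** 1.2 for f in factors]
    print(round(scaling_exponent(factors, fake_times), 3))
```

 The same fit can then be used to extrapolate to 50K bridges and compare
 the result against the 1-hour budget of the cronjob.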

 Finally, note that I left out the most important part of this analysis:
 can Tonga, or more generally, a single bridge authority handle this
 increase in bridges?  I'm not sure how to test such a setting, at least
 not without running 50K bridges in a private network.  I could imagine this
 requires some more sophisticated sample data generation including getting
 the crypto right and then talking to Tonga's DirPort.  If there's an easy
 way to test this, I'll do it.  If not, we can always hope for the best.
 What could go wrong?

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4499#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

