[tor-bugs] #3015 [BridgeDB]: Enhance bucket functionality

Thu Apr 28 19:25:21 UTC 2011

#3015: Enhance bucket functionality
-------------------------+--------------------------------------------------
 Reporter:  kaner        |          Owner:  kaner   
     Type:  enhancement  |         Status:  assigned
 Priority:  normal       |      Milestone:          
Component:  BridgeDB     |        Version:          
 Keywords:               |         Parent:          
   Points:               |   Actualpoints:          
-------------------------+--------------------------------------------------

Comment(by kaner):

 Replying to [comment:4 karsten]:
 > So, after discussing this on IRC and trying out the code, I think
 BridgeDB does ''not'' remove non-running bridges from file buckets.  At
 least the number of bridges never decreased, but always stayed the same or
 increased in my tests.  This also makes sense in the code, because
 Bucket.py does not import anything from Bridges.py, and Bridges.py is
 where we store which bridges are running.

 This is certain. The reason is simple: currently, the bucket mechanism
 does not use the bridges imported from the descriptors through the
 "splitter -> distributor -> rings" path, but reads them from the database.

 Basically, there are two ways to implement the buckets feature:

 a) Similar to what Karsten did for the `dump pool assignments` feature.
 Trigger a dump each time SIGHUP is received and only dump bridges that are
 in the current descriptor file. This approach has the advantage of
 fitting in nicely with the rest of BridgeDB's distribution mechanisms.

 b) Read bridges directly from the database. This provides more flexibility
 than a), because not only the bridges listed in the latest descriptor file
 can be given out, but also bridges that may not have been online for a
 few hours (people might be shutting down their bridges on the weekend,
 for instance; or at night). Also, bridges can be dumped without
 interfering with the running instance of BridgeDB, because a separate
 instance can be triggered via command line at any time. The disadvantage
 here surely is that it uses a different way of distribution than the rest
 of BridgeDB.

 Initially, nobody seemed to object to this tradeoff, so I went for the
 implementation as it is now. Adding a configurable timespan that
 eliminates bridges that are too old to give out seems okay to me. This is
 what part a) of the patch on branch bug3015 does.
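
 As a rough illustration of such a timespan filter (a sketch only, not the
 code on the branch): `recent_bridges`, `max_age_hours` and the `last_seen`
 field are made-up names, and the rows stand in for whatever the database
 actually returns.

```python
from datetime import datetime, timedelta


def recent_bridges(rows, max_age_hours, now):
    """Keep only bridges last seen within max_age_hours of `now`.

    `rows` is assumed to be an iterable of dicts with a `last_seen`
    datetime; this layout is hypothetical, not BridgeDB's schema.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return [row for row in rows if row["last_seen"] >= cutoff]


if __name__ == "__main__":
    now = datetime(2011, 4, 28, 19, 0, 0)
    rows = [
        {"id": "A", "last_seen": now - timedelta(hours=2)},
        {"id": "B", "last_seen": now - timedelta(hours=50)},
    ]
    # With a 24-hour timespan, only bridge A is recent enough.
    print([row["id"] for row in recent_bridges(rows, 24, now)])
```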

 Anyway, I'm willing to drop it, since you don't seem to be a big fan of it
 (and the design of a) is cleaner), and switch back to approach a).

 > We could make the BucketManager in Buckets.py, that decides which
 bridges are written to which file, extend from Bridges.BridgeHolder in the
 same way as the *Distributor classes in Dist.py do.  That's how the
 BucketManager would learn which bridges are running at the moment.
 >
 > We could also remove the command-line option to dump bridges to buckets
 and simply dump them whenever we're done reloading the network status and
 bridge descriptors.  External tools wouldn't have to run BridgeDB with the
 command-line option, but could simply read the files whenever they like.
 >
 > As a side-effect, this approach would implement the first half of #2755
 by removing the time gap between dumping bridges to file buckets (and
 thereby possibly changing their pseudo distributors) and dumping
 assignments upon the next HUP.
 >
 > Does that make sense?

 It does, if I understood correctly. Maybe we should try to merge the `dump
 pool assignments` and the `bucket` mechanisms somewhat, after all. On
 closer inspection, I think large parts of the code for both features would
 look quite similar, if we decide on this solution.
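
 To make sure we mean the same thing, here is a minimal sketch of a bucket
 manager that learns about running bridges the way a distributor would. The
 `BridgeHolder` class below is only a stand-in for Bridges.BridgeHolder, and
 `bucket_sizes`, `clear()` and `assign()` are assumptions for illustration;
 the real interface may differ.

```python
class BridgeHolder:
    """Stand-in for Bridges.BridgeHolder; the real interface may differ."""

    def insert(self, bridge):
        raise NotImplementedError


class BucketManager(BridgeHolder):
    """Collects running bridges and splits them into file buckets."""

    def __init__(self, bucket_sizes):
        # Hypothetical config: bucket name -> number of bridges wanted.
        self.bucket_sizes = dict(bucket_sizes)
        self.running = []

    def clear(self):
        # Forget all bridges, e.g. before reloading descriptors on HUP.
        self.running = []

    def insert(self, bridge):
        # Called once per running bridge while descriptors are loaded.
        self.running.append(bridge)

    def assign(self):
        """Return {bucket name: [bridges]}, filled in insertion order."""
        buckets = {}
        pool = list(self.running)
        for name, size in self.bucket_sizes.items():
            buckets[name], pool = pool[:size], pool[size:]
        return buckets


if __name__ == "__main__":
    manager = BucketManager({"bucket-x": 2, "bucket-y": 1})
    for bridge in ["bridge1", "bridge2", "bridge3", "bridge4"]:
        manager.insert(bridge)
    print(manager.assign())
```

 Since only bridges passed to insert() end up in a bucket, non-running
 bridges would drop out on the next reload without any database lookup.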

 Thanks for your suggestions.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3015#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online