[tor-bugs] #5232 [BridgeDB]: Import bridges into BridgeDB in a separate thread and database transaction

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat Mar 29 14:23:16 UTC 2014


#5232: Import bridges into BridgeDB in a separate thread and database transaction
-------------------------+-------------------------------------------------
     Reporter:  karsten  |      Owner:  sysrqb
         Type:  defect   |     Status:  closed
     Priority:  major    |  Milestone:
    Component:           |    Version:
  BridgeDB               |   Keywords:  bridgedb-email, bridgedb-db,
   Resolution:  fixed    |  bridgedb-https, bridgedb-0.1.7
Actual Points:           |  Parent ID:
       Points:           |
-------------------------+-------------------------------------------------
Changes (by isis):

 * status:  needs_review => closed
 * keywords:  bridgedb-email, bridgedb-db, bridgedb-https, bridgedb-0.1.x =>
     bridgedb-email, bridgedb-db, bridgedb-https, bridgedb-0.1.7
 * resolution:   => fixed


Comment:

 Replying to [comment:17 sysrqb]:
 > Timings from another run:
 >
 > 1,000 bridges:
 > {{{
 >   * Starting the servers took:      6s
 >   * Restarting (SIGHUP) took:       7s
 > }}}
 >
 > 10,000 bridges:
 > {{{
 >   * Starting the servers took:      1m 17s
 >   * Restarting (SIGHUP) took:          54s
 > }}}
 >
 > Master, 10,000
 > {{{
 >   * Starting the server took:      56m 44s
 > }}}
 >
 > These numbers seem...inconsistent(?).
 >

 They do... that `56m 44s` is from a second run? I didn't timeit, but I got
 something like ~15s startup on your
 `bug5232_adding_concurrent_processing_squashed` branch just now, for 250
 descriptors.

 > Pushed
 [https://gitweb.torproject.org/user/sysrqb/bridgedb.git/shortlog/refs/heads/bug5232_adding_concurrent_processing_squashed
 bug5232_adding_concurrent_processing_squashed]. Please review
 >
 > https://travis-ci.org/sysrqbci/bridgedb/builds/21812112

 Here goes!

   1) For `43c6f8778b244ca4b19628677a293a83ae1d1e09`
      `Reparse descriptors in background thread on signal`:

      In `Main.updateBridgeHistory()`:
      {{{
      +                for ID in bridges:
      +                    bridge = bridges[ID]
      }}}
      I'm pretty sure we had this code like this previously, but I don't
 know why we're not using:
      {{{
      +                for bridge in bridges.values():
      }}}
      instead. But this is shouldn't make any functional difference, it was
 probably done like this way back when, probably to solve some weird
 Python2.5 bug. Not really a problem, just nitpickings and archaeological
 ponderings.

      In the same function:
      {{{
      +            with bridgedb.Storage.getDB() as db:
      [...]
      +                        for timestamp in ts:
      +                            logging.debug(
      +                                "Updating BridgeHistory timestamps
 for %s: %s"
      +                                % (bridge.fingerprint, timestamp))
      +                            reactor.callFromThread(
      +
 bridgedb.Stability.addOrUpdateBridgeHistory,
      +                                bridge, timestamp)
      +            logging.debug("Stability calculations complete")
      +
      +    reactor.callInThread(updateBridgeHistory, bridges, timestamps)
      }}}
      Calling `reactor.callFromThread` will run the call inside the main
 reactor loop... unless the `with bridgedb.Storage.getDB() as db` is doing
 some context magic to put it back into a separate thread again. Unless I'm
 misunderstanding something, which I probably am. Or, does it call it in
 the parent thread, i.e. the `reactor.callInThread(_reloadFn)` thread?
 Because the `t.i.interfaces.IReactorThreads` docs make it seem like
 `reactor.callFromThread()` will ''always'' call the main reactor thread.

      Wow. This distributor swapping. Is just wow. OMGWTFBBQ, Python.

   2) For `e2275120ea8d95e0b2c9f25f1f42a293957363dc`
      `Add unit tests for bridgedb.Storage`:

      I like this function:
      {{{
      +    def _runAndDie(self, timeout, func):
      +        with func():
      +            sleep(timeout)
      }}}

      Just thought I'd say that somewhere. :)

      Thanks for adding unittests!

   3) The context wrapping for the `with bridgedb.Storage.getDB() as db`
 idiom is some serious hacking. I'm impressed. It works. :D

   4) In `b67862d317a282d81e2f53726e4dbff8bb8cb1f1`
      `More unit tests for Email Server`:
       Why `from StringIO import StringIO`? Isn't that the old, deprecated
 one?

   5) In `91dc3a6e079b318a3af1bbd4bb63cf249e2bbdb6`
      `Rewrite database lock acquisition function`:
      A typo got accidentally added:
      {{{
      -    Always return a :class:`bridgedb.Storage.Database` that is
      +    lways return a :class:`bridgedb.Storage.Database` that is
      }}}

   6) In `3c9e04ac9006ff21f30aeb033f6e2560f09541f6`
      `Make sure leekspin created our descriptors`:
      Sorry for the `/usr/share/dict` problem in leekspin, it's a good idea
 to add these checks.

   7) Super! CHANGELOG entries! Awesome! :D

 Overall, the context manager wrapping for the semaphores on db accesses is
 a bit of madness, but doing threading in Twisted Python is a bit of
 madness, and I'm amazed this is working. Threading in Twisted is seriously
 no small feat. You did a really brilliant job on this!

 Unless someone has something more to add, I'm merging this now for 0.1.7.
 :D

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5232#comment:18>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list