[tor-bugs] #5232 [BridgeDB]: Import bridges into BridgeDB in a separate thread and database transaction

Tor Bug Tracker & Wiki blackhole at torproject.org
Sun Mar 30 14:26:51 UTC 2014


#5232: Import bridges into BridgeDB in a separate thread and database transaction
-------------------------+-------------------------------------------------
     Reporter:  karsten  |      Owner:  sysrqb
         Type:  defect   |     Status:  closed
     Priority:  major    |  Milestone:
    Component:           |    Version:
  BridgeDB               |   Keywords:  bridgedb-email, bridgedb-db,
   Resolution:  fixed    |  bridgedb-https, bridgedb-0.1.7
Actual Points:           |  Parent ID:
       Points:           |
-------------------------+-------------------------------------------------

Comment (by sysrqb):

 Replying to [comment:18 isis]:
 > Replying to [comment:17 sysrqb]:
 > > Pushed
 [https://gitweb.torproject.org/user/sysrqb/bridgedb.git/shortlog/refs/heads/bug5232_adding_concurrent_processing_squashed
 bug5232_adding_concurrent_processing_squashed]. Please review
 > >
 > > https://travis-ci.org/sysrqbci/bridgedb/builds/21812112
 >
 > Here goes!

 I decided to rewrite history, again because they were relatively minor
 changes.

 >
 >   1) For `43c6f8778b244ca4b19628677a293a83ae1d1e09`
 >      `Reparse descriptors in background thread on signal`:
 >      In `Main.updateBridgeHistory()`:
 >      {{{
 >      +                for ID in bridges:
 >      +                    bridge = bridges[ID]
 >      }}}
 >      I'm pretty sure we had this code like this previously, but I don't
 know why we're not using:
 >      {{{
 >      +                for bridge in bridges.values():
 >      }}}
 >      instead. But this is shouldn't make any functional difference, it
 was probably done like this way back when, probably to solve some weird
 Python2.5 bug. Not really a problem, just nitpickings and archaeological
 ponderings.

 Fixed.

 >
 >      In the same function:
 >      {{{
 >      +            with bridgedb.Storage.getDB() as db:
 >      [...]
 >      +                        for timestamp in ts:
 >      +                            logging.debug(
 >      +                                "Updating BridgeHistory timestamps
 for %s: %s"
 >      +                                % (bridge.fingerprint, timestamp))
 >      +                            reactor.callFromThread(
 >      +
 bridgedb.Stability.addOrUpdateBridgeHistory,
 >      +                                bridge, timestamp)
 >      +            logging.debug("Stability calculations complete")
 >      +
 >      +    reactor.callInThread(updateBridgeHistory, bridges, timestamps)
 >      }}}
 >      Calling `reactor.callFromThread` will run the call inside the main
 reactor loop... unless the `with bridgedb.Storage.getDB() as db` is doing
 some context magic to put it back into a separate thread again. Unless I'm
 misunderstanding something, which I probably am. Or, does it call it in
 the parent thread, i.e. the `reactor.callInThread(_reloadFn)` thread?
 Because the `t.i.interfaces.IReactorThreads` docs make it seem like
 `reactor.callFromThread()` will ''always'' call the main reactor thread.

 Nope, you're completey correct. I *think* that was a typo and I wanted to
 use callInThread(), but I don't actually know why I wanted to do that
 either, it's already in a different thread than reactor loop. I can't
 justify spawning a couple thousand threads for this so I deleted that
 invocation and now it directly calls addOrUpdateBridgeHistory().

 >
 >      Wow. This distributor swapping. Is just wow. OMGWTFBBQ, Python.
 >
 >   3) The context wrapping for the `with bridgedb.Storage.getDB() as db`
 idiom is some serious hacking. I'm impressed. It works. :D
 >

 Yeah. I know, me too!

 >   4) In `b67862d317a282d81e2f53726e4dbff8bb8cb1f1`
 >      `More unit tests for Email Server`:
 >       Why `from StringIO import StringIO`? Isn't that the old,
 deprecated one?
 >

 100% correct. That was my mistake. But then fixing that caused some
 breakage because we used StringIO.StringIO in EmailServer.py. When I
 switched that to io.StringIO it broke because io.StringIO only accepts
 unicode and we used MimeWriter.MimeWriter which used string.
 {{{
 Traceback (most recent call last):
   File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-
 packages/bridgedb-0.1.5_78_gca67179_dirty-
 py2.7.egg/bridgedb/test/test_EmailServer.py", line 215, in
 test_getMailResponseMailContent
     ret = EmailServer.getMailResponse(lines, self.ctx)
   File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-
 packages/bridgedb-0.1.5_78_gca67179_dirty-
 py2.7.egg/bridgedb/EmailServer.py", line 213, in getMailResponse
     gpgContext=ctx.gpgContext)
   File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-
 packages/bridgedb-0.1.5_78_gca67179_dirty-
 py2.7.egg/bridgedb/EmailServer.py", line 443, in composeEmail
     mailbody = w.startbody("text/plain")
   File "/usr/lib64/python2.7/MimeWriter.py", line 141, in startbody
     self.flushheaders()
   File "/usr/lib64/python2.7/MimeWriter.py", line 125, in flushheaders
     self._fp.writelines(self._headers)
 exceptions.TypeError: unicode argument expected, got 'str'
 }}}
 As a result, #11370. Which should be resolved now.

 >   5) In `91dc3a6e079b318a3af1bbd4bb63cf249e2bbdb6`
 >      `Rewrite database lock acquisition function`:
 >      A typo got accidentally added:
 >      {{{
 >      -    Always return a :class:`bridgedb.Storage.Database` that is
 >      +    lways return a :class:`bridgedb.Storage.Database` that is
 >      }}}
 >

 Nice catch

 >   6) In `3c9e04ac9006ff21f30aeb033f6e2560f09541f6`
 >      `Make sure leekspin created our descriptors`:
 >      Sorry for the `/usr/share/dict` problem in leekspin, it's a good
 idea to add these checks.
 >

 I debated deleting this commit but I decided to leave it because it caused
 no harm to check for the files.

 >   7) Super! CHANGELOG entries! Awesome! :D
 >

 And now there's another one!

 > Overall, the context manager wrapping for the semaphores on db accesses
 is a bit of madness, but doing threading in Twisted Python is a bit of
 madness, and I'm amazed this is working. Threading in Twisted is seriously
 no small feat. You did a really brilliant job on this!
 >

 This comment made it all worth it. Thanks! :)

 > Unless someone has something more to add, I'm merging this now for
 0.1.7. :D

 yay! Ok, so after the changes,

 5092364f7e3ce42b61b0336a95519639d0e59308 needs review for #11370
 bd05227c5eefc528b72d57d53ac3f139fc2f0d5e (was 91dc3a6e) contains the re-
 added letter (5)
 0556e7daa8504756b34384a0f9b5dfa753556c09 (was b67862d3) now uses
 io.StringIO (4)
 bd652aa560cfaf15b9b37979aba73e9f8f548b4e (was 43c6f877) fixes
 callFromThread() and iterator (1)

 And I added 8dfcd5d7f77950828d3c95f93417baf7ea70c6b7 for a doc and to fix
 long lines.

 Pushed
 [https://gitweb.torproject.org/user/sysrqb/bridgedb.git/shortlog/refs/heads/bug5232_adding_concurrent_processing_squashed_r1
 bug5232_adding_concurrent_processing_squashed_r1] for rereview!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5232#comment:19>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list