[tor-bugs] #14744 [GetTor]: Automate upload of latest Tor Browser to cloud services

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat Apr 4 04:54:56 UTC 2015


#14744: Automate upload of latest Tor Browser to cloud services
------------------------+----------------------
     Reporter:  ilv     |      Owner:  ilv
         Type:  defect  |     Status:  reopened
     Priority:  major   |  Milestone:
    Component:  GetTor  |    Version:
   Resolution:          |   Keywords:
Actual Points:          |  Parent ID:
       Points:          |
------------------------+----------------------
Changes (by ilv):

 * status:  closed => reopened
 * resolution:  implemented =>


Comment:

 Replying to [comment:6 isis]:
 >
 > Hey ilv! Great work! I see that
 [https://github.com/ilv/gettor/blob/develop/upload/fetch_latest_torbrowser.py
 your current script] still uses `os.system(cmd)`… were you still planning
 to use Twisted?  Using `os.system()` is really not recommended in the
 Python world.
 >

 \\
 hey isis, thanks! and thanks for taking the time to review this! tbh, I
 dropped the idea of using Twisted (for SSL verification) because wget
 already fails (and the whole script with it) if the certificate is invalid.
 \\


 > Some issues I see with the current implementation are:
 >
 >   1. If the `os.system("wget […]")` command fails entirely, or only
 downloads a portion of a bundle, you'll never know because you're not
 checking the returned exit status code.
 >
 >   2. There is no mechanism for resuming downloads, if #1 happens.
 >
 \\
 Correct, thanks for pointing this out.
 \\
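 For points 1 and 2, something along these lines might work. It is only a
 minimal sketch, assuming `wget` stays as the downloader: `run_checked()`
 and `download()` are names I made up for illustration, and `-c` is wget's
 option for continuing a partially downloaded file.

```python
import subprocess


def run_checked(cmd):
    """Run cmd (a list of arguments) and return True only if it exited with 0."""
    # subprocess.call() hands back the exit status, and passing a list
    # avoids going through the shell.
    return subprocess.call(cmd) == 0


def download(url, dest_dir, retries=3):
    """Hypothetical helper: fetch url with wget, resuming partial files."""
    # -c resumes a partial download, -P sets the target directory.
    cmd = ["wget", "-c", "-P", dest_dir, url]
    for _ in range(retries):
        if run_checked(cmd):
            return True
    return False
```

 The retry count of 3 is arbitrary; the point is only that a failed or
 partial download gets noticed and retried instead of silently ignored.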
 >   3. Doing
 >      {{{
 >      for provider in UPLOAD_SCRIPTS:
 >          os.system("python2.7 %s" % UPLOAD_SCRIPTS[provider])
 >      }}}
 >      doesn't scale to more provider scripts than the Gettor machine has
 CPU cores, since most Python scripts will stupidly hog an entire core.  It
 also doesn't take into account memory limitations (and thus, the more
 providers Gettor has, the more likely for this code to OOM the Gettor
 machine), nor network bandwidth limitations (nor the effect that any
 network bandwidth limitations might have on other upload scripts being
 executed).
 >
 \\
 Correct me if I'm wrong, but the scripts for each provider should be
 executed sequentially (`os.system()` blocks until the command exits), so
 I'm not sure about the scalability problems related to the CPU cores. And
 you are right again, I have taken into account neither the memory
 limitations nor the network bandwidth limitations. I guess Twisted should
 be helpful for these points.
 \\
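 To make the sequential part explicit, here is a rough sketch (not the
 current script) that runs the upload scripts one at a time and records
 which providers failed. It assumes `UPLOAD_SCRIPTS` maps a provider name
 to a script path, as in the loop quoted above; `run_uploads()` and its
 `interpreter` parameter are made-up names.

```python
import subprocess
import sys


def run_uploads(upload_scripts, interpreter=sys.executable):
    """Run each provider's upload script in turn; return the providers that failed."""
    failed = []
    for provider, script in sorted(upload_scripts.items()):
        # call() blocks until the script exits, so only one upload runs
        # at a time and its exit status is available afterwards.
        status = subprocess.call([interpreter, script])
        if status != 0:
            failed.append(provider)
    return failed
```

 Calling `run_uploads(UPLOAD_SCRIPTS)` would then give back a list of
 provider names to retry or report.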

 >   Second, which doesn't matter, but the syntax is a bit odd; normally
 one might do
 >   {{{
 >   for provider, script in UPLOAD_SCRIPTS.items():
 >       os.system("python2.7 %s" % script)
 >   }}}
 >   or, if nothing is using `provider`, then the for loop should more
 optimally look like:
 >   {{{
 >   for script in UPLOAD_SCRIPTS.values():
 >       […]
 >   }}}
 >
 \\
 /me is still a python noob :P
 \\
 > By using Twisted instead, particularly if you have the
 [https://pypi.python.org/pypi/service_identity service_identity] module
 installed, and then with a trivially implementable amount of extra code,
 having leaf or root certificate pinning is possible.  Not to mention the
 speed increases and parallelisation that become possible using Twisted.
 If you want an example of a standalone script for downloading something
 over TLS with Twisted,
 [https://gitweb.torproject.org/user/isis/bridgedb.git/tree/scripts/get-tor-exits?h=develop
 BridgeDB's script for downloading the list of Tor Exit
 relays] (into memory or a file, in this case) might be helpful, as well as
 [https://gitweb.torproject.org/user/isis/bridgedb.git/tree/lib/bridgedb/proxy.py?h=develop#n358
 the way BridgeDB uses this script as a Protocol]
 (`twisted.internet.protocol.Protocol`) and
 [https://gitweb.torproject.org/user/isis/bridgedb.git/tree/lib/bridgedb/proxy.py?h=develop#n32
 manages that Protocol within a Twisted program] (so that the list in this
 case is loaded directly into memory for the servers in the cluster without
 wasting a bunch of time doing disk I/O. This latter part is less
 applicable to your case, but it does demonstrate how tasks such as these
 can be running parallel to the rest of your program. Oh, and they can also
 be
 [https://gitweb.torproject.org/user/isis/bridgedb.git/tree/lib/bridgedb/Main.py?h=develop#n525
 easily scheduled], because f!@# cron too.)

 \\
 Thanks a lot for this info! Now I'm convinced again that I should use
 Twisted :)
 \\
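 Just to check that I understand the pinning idea: as far as I can tell it
 boils down to comparing a fingerprint of the server's certificate against
 a known value. Below is a minimal sketch using only the standard library
 rather than Twisted (so this is not how BridgeDB does it);
 `fingerprint_matches()` is a made-up name and the pinned value would have
 to come from somewhere trusted.

```python
import hashlib
import hmac


def fingerprint_matches(der_cert, pinned_sha256_hex):
    """Return True if the SHA-256 of the DER-encoded certificate equals the pin."""
    actual = hashlib.sha256(der_cert).hexdigest()
    # compare_digest() avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

 The DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`
 once the handshake is done.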

 > You could also quite easily check the `*.asc` files on the downloaded
 bundles to ensure that the whole thing downloaded properly. If you were to
 use [https://pypi.python.org/pypi/gnupg python-gnupg] to do it, it would
 look something like:
 >
 > {{{
 > import glob
 > import gnupg
 >
 > # GNUPG_HOME_DIR should have the correct signing keys in its pubring.gpg
 > # file (so geko's and mikeperry's keys, and the Tor Browser signing key,
 > # at the minimum).
 > gpg = gnupg.GPG(homedir=GNUPG_HOME_DIR)
 > signatures = glob.glob("%s/*.asc" % latest_version)
 > verified = []
 > unverified = []
 > for sig in signatures:
 >     # Drop the trailing ".asc" to get the path of the bundle itself.
 >     bundle = sig[:-len(".asc")]
 >     with open(bundle, 'rb') as fh:
 >         # Check the bundle against its detached signature file.
 >         result = gpg.verify_file(fh, sig_file=sig)
 >     if result.valid:
 >         verified.append(bundle)
 >     else:
 >         unverified.append(bundle)
 > }}}
 \\
 '''Awesome''', thanks again!
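
 One small note for when I wire this in: to get the bundle path from the
 signature path I would slice off the `.asc` suffix explicitly, since
 `str.rstrip()` removes a set of characters rather than a suffix. A tiny
 sketch; `bundle_path_for()` is a made-up name.

```python
def bundle_path_for(sig_path, suffix=".asc"):
    """Return the path of the bundle that a detached signature file refers to."""
    if not sig_path.endswith(suffix):
        raise ValueError("not a signature file: %r" % sig_path)
    # Slicing removes exactly the suffix and nothing else.
    return sig_path[:-len(suffix)]
```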

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/14744#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online