commit 1befc9a07f30490decb01c9ab873d37ea77e1fae
Author: Isis Lovecruft <isis@torproject.org>
Date: Sat Apr 18 00:04:34 2015 +0000
Refactor b.p.descriptors.deduplicate() to improve benchmark results.
---
lib/bridgedb/parse/descriptors.py | 104 +++++++++++++++++++------------------
1 file changed, 53 insertions(+), 51 deletions(-)
diff --git a/lib/bridgedb/parse/descriptors.py b/lib/bridgedb/parse/descriptors.py
index efdedc9..bec10b3 100644
--- a/lib/bridgedb/parse/descriptors.py
+++ b/lib/bridgedb/parse/descriptors.py
@@ -144,7 +144,29 @@ def parseServerDescriptorsFile(filename, validate=True):
routers = list(document)
return routers
-def deduplicate(descriptors):
+def __cmp_published__(x, y):
+ """A custom ``cmp()`` which sorts descriptors by published date.
+
+ :rtype: int
+    :returns: A negative number if x<y, zero if x==y, a positive number if x>y.
+ """
+ if x.published < y.published:
+ return -1
+ elif x.published == y.published:
+ # This *shouldn't* happen. It would mean that two descriptors for
+ # the same router had the same timestamps, probably meaning there
+ # is a severely-messed up OR implementation out there. Let's log
+ # its fingerprint (no matter what!) so that we can look up its
+ # ``platform`` line in its server-descriptor and tell whoever
+ # wrote that code that they're probably (D)DOSing the Tor network.
+ logging.warn(("Duplicate descriptor with identical timestamp (%s) "
+ "for bridge %s with fingerprint %s !") %
+ (x.published, x.nickname, x.fingerprint))
+ return 0
+ elif x.published > y.published:
+ return 1
+
+def deduplicate(descriptors, statistics=False):
"""Deduplicate some descriptors, returning only the newest for each router.
.. note:: If two descriptors for the same router are discovered, AND both
@@ -155,68 +177,48 @@ def deduplicate(descriptors):
:param list descriptors: A list of
:api:`stem.descriptor.server_descriptor.RelayDescriptor`s,
:api:`stem.descriptor.extrainfo_descriptor.BridgeExtraInfoDescriptor`s,
- :api:`stem.descriptor.router_status_entry.RouterStatusEntryV2`s.
+ or :api:`stem.descriptor.router_status_entry.RouterStatusEntryV2`s.
+ :param bool statistics: If ``True``, log some extra statistics about the
+ number of duplicates.
+ :rtype: dict
+ :returns: A dictionary mapping router fingerprints to their newest
+ available descriptor.
"""
duplicates = {}
- nonDuplicates = {}
+ newest = {}
for descriptor in descriptors:
fingerprint = descriptor.fingerprint
-
logging.debug("Deduplicating %s descriptor for router %s"
% (str(descriptor.__class__).rsplit('.', 1)[1],
safelog.logSafely(fingerprint)))
-
- if fingerprint in nonDuplicates.keys():
- # We already found a descriptor for this fingerprint:
- conflict = nonDuplicates[fingerprint]
-
- # If the descriptor we are currently parsing is newer than the
- # last one we found:
- if descriptor.published > conflict.published:
- # And if this is the first duplicate we've found for this
- # router, then create a list in the ``duplicates`` dictionary
- # for the router:
- if not fingerprint in duplicates.keys():
- duplicates[fingerprint] = list()
- # Add this to the list of duplicates for this router:
- duplicates[fingerprint].append(conflict)
- # Finally, put the newest descriptor in the ``nonDuplicates``
- # dictionary:
- nonDuplicates[fingerprint] = descriptor
- # Same thing, but this time the one we're parsing is older:
- elif descriptor.published < conflict.published:
- if not fingerprint in duplicates.keys():
- duplicates[fingerprint] = list()
- duplicates[fingerprint].append(descriptor)
- # This *shouldn't* happen. It would mean that two descriptors for
- # the same router had the same timestamps, probably meaning there
- # is a severely-messed up OR implementation out there. Let's log
- # its fingerprint (no matter what!) so that we can look up its
- # ``platform`` line in its server-descriptor and tell whoever
- # wrote that code that they're probably (D)DOSing the Tor network.
- else:
- try:
- raise DescriptorWarning(
- ("Duplicate descriptor with identical timestamp (%s) "
- "for router with fingerprint '%s'!")
- % (descriptor.published, fingerprint))
- # And just in case it does happen, catch the warning:
- except DescriptorWarning as descwarn:
- logging.warn("DescriptorWarning: %s" % str(descwarn))
-
- # Hoorah! No duplicates! (yet...)
+ if fingerprint in duplicates:
+ duplicates[fingerprint].append(descriptor)
else:
- nonDuplicates[fingerprint] = descriptor
+ duplicates[fingerprint] = [descriptor,]
+
+ for fingerprint, dupes in duplicates.items():
+ dupes.sort(cmp=__cmp_published__)
+        newest_descriptor = dupes.pop()  # ascending sort: pop() yields the newest
+        newest[fingerprint] = newest_descriptor
+ duplicates[fingerprint] = dupes
+
+ if statistics:
+        # Rank (count, fingerprint) pairs in descending order so that the
+        # bridges with the most duplicates come first for the Top-N report.
+        totals = sorted([(len(v), k,) for k, v in duplicates.viewitems()], reverse=True)
+ total = sum([k for (k, v) in totals])
+ bridges = len(duplicates)
+ top = 10 if bridges >= 10 else bridges
+ logging.info("Number of bridges with duplicates: %5d" % bridges)
+ logging.info("Total duplicate descriptors: %5d" % total)
+ logging.info("Bridges with the most duplicates (Top %d):" % top)
+ for i, (subtotal, bridge) in zip(range(1, top + 1), totals[:top]):
+ logging.info(" #%d %s: %d duplicates" % (i, bridge, subtotal))
logging.info("Descriptor deduplication finished.")
- logging.info("Number of duplicates: %d" % len(duplicates))
- for (fingerprint, dittos) in duplicates.items():
- logging.info(" For %s: %d duplicates"
- % (safelog.logSafely(fingerprint), len(dittos)))
- logging.info("Number of non-duplicates: %d" % len(nonDuplicates))
- return nonDuplicates
+ return newest
def parseExtraInfoFiles(*filenames, **kwargs):
"""Parse files which contain ``@type bridge-extrainfo-descriptor``s.