[tor-bugs] #4771 [BridgeDB]: bridgedb should make clearer in its logs which addresses it knows are from bulk-exitlist

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Mar 30 01:08:17 UTC 2015


#4771: bridgedb should make clearer in its logs which addresses it knows are from
bulk-exitlist
--------------------------+--------------------------------------------
     Reporter:  arma      |      Owner:  isis
         Type:  defect    |     Status:  needs_information
     Priority:  minor     |  Milestone:
    Component:  BridgeDB  |    Version:
   Resolution:            |   Keywords:  isis2015Q1Q2, isisExB, isisExC
Actual Points:            |  Parent ID:
       Points:            |
--------------------------+--------------------------------------------

Comment (by isis):

 I am considering changing the way we treat clients making requests from
 Tor exits. Actually, I'm questioning if this has ever worked.

 The part of BridgeDB's codebase where this is supposed to happen is buried
 deep in `lib/bridgedb/Dist.py`, in a function that I've hardly touched,
 and which still contains code from when Nick wrote BridgeDB and the code
 was kept in SVN. The function is
 `bridgedb.Dist.IPBasedDistributor.getBridgesForIP()`, and it currently
 looks like:

 {{{
     def getBridgesForIP(self, ip, epoch, N=1, countryCode=None,
                         bridgeFilterRules=None):
         """Return a list of bridges to give to a user.

         :param str ip: The user's IP address, as a dotted quad.
         :param str epoch: The time period when we got this request.  This
 can
                           be any string, so long as it changes with every
                           period.
         :param int N: The number of bridges to try to give back. (default:
 1)
         :param str countryCode: The two-letter geoip country code of the
             client's IP address. If given, the client will be placed in
 that
             "area". Clients within the same area receive the same bridges
 per
             period. If not given, the **ip** is truncated to it's CIDR /24
             representation and used as the "area". (default: None)
         :param list bridgeFilterRules: A list of callables used filter the
                                        bridges returned in the response to
 the
                                        client. See
 :mod:`~bridgedb.Filters`.
         :rtype: list
         :return: A list of :class:`~bridgedb.Bridges.Bridge`s to include
 in
                  the response. See
 :meth:`bridgedb.HTTPServer.WebResource.getBridgeRequestAnswer`
                  for an example of how this is used.
         """
         logging.info("Attempting to return %d bridges to client %s..."
                      % (N, ip))

         if not bridgeFilterRules:
             bridgeFilterRules=[]

         if not len(self.splitter):
             logging.warn("Bailing! Splitter has zero bridges!")
             return []

         logging.debug("Bridges in splitter:\t%d" % len(self.splitter))
         logging.debug("Client request epoch:\t%s" % epoch)
         logging.debug("Active bridge filters:\t%s"
                       % ' '.join([x.func_name for x in
 bridgeFilterRules]))

         area = self.areaMapper(ip)
         logging.debug("IP mapped to area:\t%s"
                       % logSafely("{0}.0/24".format(area)))

         key1 = ''
         pos = 0
         n = self.nClusters

         # only one of ip categories or area clustering is active
         # try to match the request to an ip category
         for category in self.categories:
             # IP Categories
             if category.contains(ip):
                 g = filterAssignBridgesToRing(self.splitter.hmac,
                                               self.nClusters +
                                               len(self.categories),
                                               n)
                 bridgeFilterRules.append(g)
                 logging.info("category<%s>%s", epoch, logSafely(area))
                 pos = self.areaOrderHmac("category<%s>%s" % (epoch, area))
                 key1 = getHMAC(self.splitter.key,
                                "Order-Bridges-In-Ring-%d" % n)
                 break
             n += 1

         # if no category matches, use area clustering
         else:
             # IP clustering
             h = int( self.areaClusterHmac(area)[:8], 16)
             # length of numClusters
             clusterNum = h % self.nClusters

             g = filterAssignBridgesToRing(self.splitter.hmac,
                                           self.nClusters +
                                           len(self.categories),
                                           clusterNum)
             bridgeFilterRules.append(g)
             pos = self.areaOrderHmac("<%s>%s" % (epoch, area))
             key1 = getHMAC(self.splitter.key,
                            "Order-Bridges-In-Ring-%d" % clusterNum)

         # try to find a cached copy
         ruleset = frozenset(bridgeFilterRules)

         # See if we have a cached copy of the ring,
         # otherwise, add a new ring and populate it
         if ruleset in self.splitter.filterRings.keys():
             logging.debug("Cache hit %s" % ruleset)
             _, ring = self.splitter.filterRings[ruleset]

         # else create the ring and populate it
         else:
             logging.debug("Cache miss %s" % ruleset)
             ring = bridgedb.Bridges.BridgeRing(key1,
 self.answerParameters)
             self.splitter.addRing(ring,
                                   ruleset,
                                   filterBridgesByRules(bridgeFilterRules),
                                   populate_from=self.splitter.bridges)

         # get an appropriate number of bridges
         numBridgesToReturn = getNumBridgesPerAnswer(ring,
 max_bridges_per_answer=N)
         answer = ring.getBridges(pos, numBridgesToReturn)
         return answer
 }}}

 A couple things to note:

  * The `countryCode` parameter is entirely unused. What was it for?

    The only place in BridgeDB's code where
 `IPBasedDistributor.getBridgesForIP()` is called is in
 `bridgedb.HTTPServer.WebResourceBridges.getBridgeRequestAnswer()`, where
 the `countryCode` is passed in, and it is the two-letter geoIP countryCode
 for the client's IP address.

    Are we supposed to be grouping clients by which country they are coming
 from? Shouldn't we group them by whether or they are coming from Tor or
 another known open proxy, and if not, what country they are coming from?

    Or was the `countryCode` parameter supposed to be the country which the
 bridge shouldn't be blocked in?

    (By the way, the docstring was written by me. It was my best guess as
 to what `countryCode` was supposed to be for. There was previously no
 documentation on it.)

  * Should we still be grouping clients by `/24`s? What adversary is that
 effective against? I realise that it isn't very difficult to get a class C
 subnet, but it isn't very difficult to get addresses in different `/24`s.
 Should we make the groups bigger, i.e. group clients by which `/16` they
 are coming from?

  * Why are we still using the `/24` (the `area`) in the code for serving
 bridges to clients coming from Tor exits? This means that changing your
 exit node would get you different bridges. (But not the same bridges as
 people not using Tor.)

  * It seems like a lot of these bugs come from
 [https://gitweb.torproject.org/bridgedb.git/commit?id=f022b905ca01a193aabd4d78107f27fce85c40cd
 commit f022b905ca01a193aabd4d78107f27fce85c40cd] which implemented #4297
 and is where this whole business of making an "IPv5 hashring" and putting
 Tor users in it came from… we should probably look at the other changes in
 that commit and review them.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4771#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list