[tor-dev] BridgeDB - Bridge Distribution Modifications

Matthew Finkel matthew.finkel at gmail.com
Tue May 14 06:08:01 UTC 2013

Hi all,

Over the last few weeks I've been working with George and Aaron on
updating BridgeDB's code with respect to how it handles pluggable
transports. I've made some decent progress, but there are some
questions that I'd like to ask (because I'm not sure I should be the
one making the decision). I've also started updating the spec and
there are some parts on which I'd like some clarification. I'll try to
summarize the thoughts on the matter we/I have thus far. See [A] if
you're unfamiliar with the BridgeDB code/spec/idea.

1) How should BridgeDB decide the number of transports, and types, it
   should hand out?

  - My current patch returns transports in proportion to their share
    of all running bridges, so that if we hand out four bridges and
    obfs2 bridges account for 3/10 of all running bridges, then
    BridgeDB will hand out (4*(3/10)) = 1.2 obfs2 bridges with each
    request, on average.
  - I've also added an option into bridgedb.conf to set the (expected)
    minimum and maximum number of bridges which support a specific PT
    that BridgeDB should hand out per request.
  - I have a verification check that tries to force us to meet these
    values; however, with its current implementation this is not
    guaranteed, only probabilistic. I think this is okay for now.
  - So, is this enough? Do we want/need a deterministic method of
    supplying bridges with a supported set of transports?
  - Another option is to place each transport into its own subring and
    select from each of the subrings to ensure we meet the requirement.
    The more I've thought about this, the more I think this defeats
    the purpose of constructing the rings, though.
  - Last (for now), if a bridge supports multiple PTs, should we return
    all of them to the user or randomly select one or select one with a
    bias? We agreed that we really shouldn't do the first because that
    would just accelerate the ability of a censor to block more bridges.
    The middle option works, but given that many bridges now support
    obfs2 and obfs3, is it a good idea to, again, probabilistically
    return each type (roughly) half the time?
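
To make the arithmetic in the first bullet concrete, here is a minimal
sketch of the proportional calculation. It is not the code from my patch:
the function name, its parameters, and the idea of clamping the expected
value to the configured minimum/maximum are all assumptions made for
illustration only.

```python
from fractions import Fraction


def expected_pt_bridges(n_requested, pt_running, total_running,
                        pt_min=0, pt_max=None):
    """Expected number of bridges of one transport per response.

    Hypothetical helper: pt_min/pt_max stand in for the per-transport
    limits from bridgedb.conf; clamping is an assumed policy.
    """
    if total_running <= 0:
        return Fraction(0)
    if pt_max is None:
        pt_max = n_requested
    # Share of all running bridges that support this transport.
    share = Fraction(pt_running, total_running)
    expected = n_requested * share
    # Clamp the expectation into [pt_min, pt_max].
    return max(Fraction(pt_min), min(Fraction(pt_max), expected))


if __name__ == "__main__":
    # Four bridges per response, obfs2 is 3/10 of running bridges.
    print(float(expected_pt_bridges(4, 3, 10)))
```

Since a response can only hold whole bridges, a real implementation would
still have to decide how to round, which is where the probabilistic
behaviour comes from.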

2) Should we prefer to distribute PT bridges over regular bridges which
   have their ORPort on 443?
  - Right now returning ORPorts on 443 is the highest priority and
    transports are a secondary best-effort operation.

3) Unless I incorrectly understand the code, the bridges never rotate.
   The bridge interval is set to NoSchedule(), which means it returns
   a static time. Is there a reason for this? This is counter to the
   spec. Just wondering. :)
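
To illustrate why a static interval means no rotation (see steps 8 and 9
in [A]), here is a small sketch. The function names, the interval labels,
and the message format fed to the hmac are hypothetical and are not taken
from BridgeDB; only the idea that the ring position is an hmac of the
interval and the area comes from the description in [A].

```python
import hashlib
import hmac


def static_interval():
    """A schedule that never advances: always the same label."""
    return "static"


def rotating_interval(now, period_seconds):
    """A label that changes once every period_seconds (assumed scheme)."""
    return str(int(now) // period_seconds)


def ring_position(key, interval, area):
    """Position in the ring, derived from the interval and the area."""
    msg = ("<%s>%s" % (interval, area)).encode()
    return hmac.new(key, msg, hashlib.sha1).hexdigest()


if __name__ == "__main__":
    key, area = b"example-key", "1.2.3.0/24"
    # Static interval: the same area always maps to the same position.
    print(ring_position(key, static_interval(), area))
    # Rotating interval: the position changes when the period rolls over.
    print(ring_position(key, rotating_interval(0, 3600), area))
    print(ring_position(key, rotating_interval(3600, 3600), area))
```

With the static label, a given area always lands on the same position and
so sees the same bridges for as long as the ring contents don't change.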

(I had some other points I wanted to raise, but I'm blanking on them
now. I think this is a good start, though.)

Please also let me know and correct anything I may have gotten wrong.

Thanks everyone, and thanks to George and Aaron for their help, as well.

- Matt

A. For those who don't know the details of the code, the simplified
   version is as follows:

   1) All bridges send their bridge descriptors and misc information
      to the Bridge Authority.
   2) Bridge Authority provides a network status file containing all
      known bridges described by their name, fingerprint, digest,
      time of publication, IP addr, ORPort, DirPort. Bridge Auth also
      provides a bridge descriptor file specifying each bridge's
      IP addr, ORPort, and fingerprint. Last, it supplies an extra-info
      file that contains all the extra info that the bridges
      provide - mainly their transports, in our case.
   3) BridgeDB parses all of these files and associates the information
      to a single instance of a bridge.
   4) BridgeDB assigns each running bridge to a distributor (website,
      email, etc) based on an hmac of the bridge's ID. Once assigned,
      the bridge is inserted into the distributor's list of bridges.
   5) BridgeDB then further organizes the bridges assigned to each
      distributor by moving them into rings and subrings.
     - A ring is simply a sorted list of hmacs of the bridges' IDs
       which, when traversed, wraps around to the beginning if it ever
       reaches the end.
     - The hmac of the bridge's ID is used to retrieve the actual
       bridge instance from a hash, which is stored alongside the ring.
   6) Some distributors, such as https, are 'initialized' with a few
      rings based on filters.
     - https starts out with a ring containing all bridges assigned to
       it, a ring only containing bridges which support IPv4
       connections, and a ring only containing bridges which support
       IPv6 connections.
     - Every ring also contains two subrings (currently). One subring
       is the subset of bridges from the parent ring which have their
       ORPort listening on port 443. The other subring is the subset
       of bridges from the parent ring which have the stable flag set.
     - For example,
        - Cluster 1 Ring
          - subring (stable)
          - subring (https)
        - Cluster 2 Ring
          - subring (stable)
          - subring (https)
        - IPv4 Cluster 1 Ring
          - subring (stable)
          - subring (https)
        - IPv4 Cluster 2 Ring
          - subring (stable)
          - subring (https)
        - IPv6 Cluster 1 Ring
          - subring (stable)
          - subring (https)
        - IPv6 Cluster 2 Ring
          - subring (stable)
          - subring (https)
   7) When BridgeDB receives a request for bridges from its website, it
      forwards the query on to the IP distributor. The details will
      include whether a specific PT was requested, the IP version the
      bridge should support, the country within which the bridge should
      not be blocked, the requesting IP address, and the interval.
   8) The distributor then decides on the "area" of the IP address,
      currently the /24 mask, and then finds the "cluster" within that
      area (by taking the first eight bytes of an hmac of the area and
      using the result (modulus "the number of clusters")). A filter is
      then constructed based on the requested information. If a ring
      already exists that satisfies exactly these filters then that is
      used. Else a new ring (with subrings) is constructed to satisfy
      this request. The distributor also computes the position in the
      ring as the hmac of the interval and the area.
   9) Once the correct ring exists, it determines how many bridges it
      can find in the ring's subrings to satisfy the request. This is
      done by taking the previously computed position and finding it
      in the list of bridge ID hmacs and then selecting the next
      consecutive "requested number of bridges" from the list (wrapping
      around to the beginning, if necessary). The same is then done for
      the main ring. The results from these searches are then joined and
      the first "requested number of bridges" unique keys are selected
      from the list. This list is then sorted and returned, propagating
      back up to the user.
  10) Similar actions are taken by the other distributors. For example,
      the email distributor doesn't use an "area" to decide which
      bridges to distribute; instead it uses the normalized
      requesting/source mail address.
  11) Misc:
    - Because the rings are sorted by an hmac of the bridge's ID, I
      expect that they will be uniformly distributed around the ring.
      As such, I don't expect there to be a bias for one type of
      bridge/transport/ORPort over any other. (Is this incorrect?)
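
For readers who prefer code, steps 5, 8 and 9 above can be sketched
roughly as follows. This is a simplified illustration and not BridgeDB's
actual classes: the names Ring, area_of, cluster_of and position_for, the
use of SHA-1, the hmac message format, and reading "first eight bytes" as
raw digest bytes are all assumptions. It also handles IPv4 only and
leaves out the subrings and filters.

```python
import bisect
import hashlib
import hmac
import ipaddress


def hmac_hex(key, msg):
    return hmac.new(key, msg.encode(), hashlib.sha1).hexdigest()


def area_of(ip):
    """The "area" of an IPv4 address: its /24 network."""
    return str(ipaddress.ip_network(ip + "/24", strict=False))


def cluster_of(key, area, n_clusters):
    """First eight bytes of an hmac of the area, modulo the cluster count."""
    digest = hmac.new(key, area.encode(), hashlib.sha1).digest()
    return int.from_bytes(digest[:8], "big") % n_clusters


def position_for(key, interval, area):
    """Ring position: an hmac of the interval and the area."""
    return hmac_hex(key, "<%s>%s" % (interval, area))


class Ring:
    """Sorted list of hmacs of bridge IDs, plus a hash to the bridges."""

    def __init__(self, key):
        self.key = key
        self.positions = []  # sorted hmacs of bridge IDs
        self.bridges = {}    # hmac of bridge ID -> bridge instance

    def insert(self, bridge_id, bridge):
        pos = hmac_hex(self.key, bridge_id)
        if pos not in self.bridges:
            bisect.insort(self.positions, pos)
        self.bridges[pos] = bridge

    def get_n(self, position, n):
        """Return the next n bridges at or after position, wrapping around."""
        n = min(n, len(self.positions))
        start = bisect.bisect_left(self.positions, position)
        picked = []
        for i in range(n):
            pos = self.positions[(start + i) % len(self.positions)]
            picked.append(self.bridges[pos])
        return picked


if __name__ == "__main__":
    key = b"example-key"
    ring = Ring(key)
    for i in range(10):
        ring.insert("bridge-id-%d" % i, "bridge %d" % i)
    area = area_of("1.2.3.4")
    print(area, cluster_of(key, area, 4))
    print(ring.get_n(position_for(key, "static", area), 3))
```

In this sketch, two requests from the same /24 in the same interval get
the same position and therefore the same bridges, which is the property
the real rings are meant to provide.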
