[tor-commits] [bridgedb/master] Finish first draft of bridge-db-spec.txt.

karsten at torproject.org karsten at torproject.org
Tue Apr 12 18:53:33 UTC 2011


commit 9d7dad7f97a05eba479c7a84a10e6663c79205ee
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Mon Feb 14 14:52:21 2011 +0100

    Finish first draft of bridge-db-spec.txt.
---
 bridge-db-spec.txt |  245 ++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 171 insertions(+), 74 deletions(-)

diff --git a/bridge-db-spec.txt b/bridge-db-spec.txt
index a9e2c8f..5c8ca57 100644
--- a/bridge-db-spec.txt
+++ b/bridge-db-spec.txt
@@ -5,98 +5,168 @@
 
    This document specifies how BridgeDB processes bridge descriptor files
    to learn about new bridges, maintains persistent assignments of bridges
-   to distributors, and decides which descriptors to give out upon user
+   to distributors, and decides which bridges to give out upon user
    requests.
 
 1. Importing bridge network statuses and bridge descriptors
 
    BridgeDB learns about bridges from parsing bridge network statuses and
-   bridge descriptors as specified in Tor's directory protocol.  BridgeDB
-   SHOULD parse one bridge network status file and at least one bridge
-   descriptor file.
+   bridge descriptors as specified in Tor's directory protocol.
+   BridgeDB SHOULD parse one bridge network status file first and at least
+   one bridge descriptor file afterwards.
 
 1.1. Parsing bridge network statuses
 
    Bridge network status documents contain the information which bridges
-   are known to the bridge authority at a certain time.  We expect bridge
-   network statuses to contain at least the following two lines for every
-   bridge in the given order:
-
-   "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort SP
-       DirPort NL
-   "s" SP Flags NL
-
-   BridgeDB parses the identity from the "r" line and scans the "s" line
-   for flags Stable and Running.  BridgeDB MUST discard all bridges that
-   do not have the Running flag.  BridgeDB MAY only consider bridges as
-   running that have the Running flag in the most recently parsed bridge
-   network status.  BridgeDB MUST also discard all bridges for which it
-   does not find a bridge descriptor.  BridgeDB memorizes all remaining
-   bridges as the set of running bridges that can be given out to bridge
-   users.
-# I'm not 100% sure if BridgeDB discards (or rather doesn't use) bridges
-# for which it doesn't have a bridge descriptor.  But as far as I can see,
-# it wouldn't learn the bridge's IP and OR port in that case, so we
-# shouldn't use it.  Is this a bug?  -KL
-# What's the reason for parsing bridge descriptors anyway?  Can't we learn
-# a bridge's IP address and OR port from the "r" line, too?  -KL
+   are known to the bridge authority and which flags the bridge authority
+   assigns to them.
+   We expect bridge network statuses to contain at least the following two
+   lines for every bridge in the given order:
+
+      "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort
+          SP DirPort NL
+      "s" SP Flags NL
+
+   BridgeDB parses the identity from the "r" line and the assigned flags
+   from the "s" line.
+   BridgeDB MUST discard all bridges that do not have the Running flag.
+   BridgeDB memorizes all remaining bridges as the set of running bridges
+   that can be given out to bridge users.
+   BridgeDB SHOULD memorize assigned flags if it wants to ensure that sets
+   of bridges given out contain at least a given number of bridges with
+   these flags.
 
 1.2. Parsing bridge descriptors
 
    BridgeDB learns about a bridge's most recent IP address and OR port
-   from parsing bridge descriptors.  Bridge descriptor files MAY contain
-   one or more bridge descriptors.  We expect bridge descriptor to contain
-   at least the following lines in the stated order:
-
-   "@purpose" SP purpose NL
-   "router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
-   ["opt "] "fingerprint" SP fingerprint NL
-
-   BridgeDB parses the purpose, IP, ORPort, and fingerprint.  BridgeDB
-   MUST discard bridge descriptors if the fingerprint is not contained in
-   the bridge network status(es) parsed in the same execution or if the
-   bridge does not have the Running flag.  BridgeDB MAY discard bridge
-   descriptors which have a different purpose than "bridge".  BridgeDB
-   memorizes the IP addresses and OR ports of the remaining bridges.  If
-   there is more than one bridge descriptor with the same fingerprint,
+   from parsing bridge descriptors.
+   Bridge descriptor files MAY contain one or more bridge descriptors.
+   We expect bridge descriptors to contain at least the following lines in
+   the stated order:
+
+      "@purpose" SP purpose NL
+      "router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
+      ["opt" SP] "fingerprint" SP fingerprint NL
+
+   BridgeDB parses the purpose, IP, ORPort, and fingerprint from these
+   lines.
+   BridgeDB MUST discard bridge descriptors if the fingerprint is not
+   contained in the bridge network status parsed before or if the bridge
+   does not have the Running flag.
+   BridgeDB MAY discard bridge descriptors which have a different purpose
+   than "bridge".
+   BridgeDB memorizes the IP addresses and OR ports of the remaining
+   bridges.
+   If there is more than one bridge descriptor with the same fingerprint,
    BridgeDB memorizes the IP address and OR port of the most recently
    parsed bridge descriptor.
-# I think that BridgeDB simply assumes that descriptors in the bridge
-# descriptor files are in chronological order.  If not, it would overwrite
-# a bridge's IP address and OR port with an older descriptor, which would
-# be bad.  The current cached-descriptors* files should write descriptors
-# in chronological order.  But we might change that, e.g., when trying to
-# limit the number of descriptors in Tor.  Should we make the assumption
-# that descriptors are ordered chronologically, or should we specify that
-# we have to check that explicitly?  -KL
+# I confirmed that BridgeDB simply assumes that descriptors in the bridge
+# descriptor files are in chronological order and that descriptors in
+# cached-descriptors.new are newer than those in cached-descriptors.  If
+# this is not the case, BridgeDB overwrites a bridge's IP address and OR
+# port with those from an older descriptor!  I think that the current
+# cached-descriptors* files that Tor produces always have descriptors in
+# chronological order.  But what if we change that, e.g., when trying to
+# limit the number of descriptors that Tor memorizes?  Should we make the
+# assumption that descriptors are ordered chronologically, or should we
+# specify that we have to check that explicitly and fix BridgeDB to do
+# that?  We could also look at the bridge descriptor that is referenced
+# from the bridge network status by its descriptor identifier, even though
+# that would require us to calculate the descriptor hash.  -KL
+   If BridgeDB does not find a bridge descriptor for a bridge contained in
+   the bridge network status parsed before, it MUST discard that bridge.
+# I confirmed that BridgeDB discards (or at least doesn't use) bridges for
+# which it doesn't have a bridge descriptor.  What's the reason for
+# parsing bridge descriptors anyway?  Can't we learn a bridge's IP address
+# and OR port from the "r" line, too?  -KL
 
 2. Assigning bridges to distributors
 
-# In this section I'm planning to write how BridgeDB should decide to
-# which distributor (https, email, unallocated/file bucket) it assigns a
-# new bridge.  I should also write down whether BridgeDB changes
-# assignments of already known bridges (I think it doesn't).  The latter
-# includes cases when we increase/reduce the probability of bridges being
-# assigned to a distributor or even turn off a distributor completely.
-# -KL
-
-3. Selecting bridges to be given out via https
-
-# This section is about the specifics of the https distributor, like which
-# IP addresses get bridges from the same ring, how often the results
-# change, etc.  -KL
-
-4. Selecting bridges to be given out via email
-
-# This section is about the specifics of the email distributor, like which
-# characters do we recognize in email addresses, how long we don't give
-# out new bridges to the same email address, etc.  -KL
-
-5. Selecting unallocated bridges to be stored in file buckets
-
-# This section is about kaner's bucket mechanism.  I want to cover how
-# BridgeDB decides which of the unallocated bridges to add to a file
-# bucket.  -KL
+   BridgeDB assigns bridges to distributors on a probabilistic basis and
+   makes these assignments persistent.
+   BridgeDB MAY be configured to support only a non-empty subset of the
+   distributors specified in this document.
+   BridgeDB MAY define different probabilities for assigning new bridges
+   to distributors.
+   BridgeDB MUST NOT change existing assignments of bridges to
+   distributors, even if probabilities for assigning bridges to
+   distributors change or distributors are disabled entirely.
+
+3. Giving out bridges upon requests
+
+   BridgeDB gives out a subset of the bridges from a given distributor
+   upon request.
+   BridgeDB MUST only give out bridges that are contained in the most
+   recently parsed bridge network status and that have the Running flag
+   set.
+   BridgeDB MAY define a different number of bridges (typically 3) to be
+   given out depending on the distributor.
+   BridgeDB MAY define an arbitrary number of rules saying that a certain
+   number of bridges SHOULD have a given OR port or a given bridge relay
+   flag.
+
+4. Selecting bridges to be given out based on IP addresses
+
+   BridgeDB MAY support one or more distributors that are giving out
+   bridges based on the requestor's IP address.
+   BridgeDB MUST fix the set of bridges to be returned for a defined time
+   period.
+   BridgeDB SHOULD consider two IP addresses coming from the same /24 as
+   the same IP address and return the same set of bridges.
+   BridgeDB SHOULD divide the IP address space equally into a small number
+   of areas (typically 4) and return different results to requests coming
+   from these areas.
+# I found that BridgeDB is not strict in returning only bridges for a
+# given area.  If a ring is empty, it considers the next one.  Therefore,
+# it's SHOULD in the sentence above and not MUST.  Is this expected
+# behavior?  -KL
+# I also found that BridgeDB does not make the assignment to areas
+# persistent in the database.  So, if we change the number of rings, it
+# will assign bridges to other rings.  I assume this is okay?  -KL
+   BridgeDB SHOULD be able to respect a list of proxy IP addresses and
+   return the same set of bridges to requests coming from these IP
+   addresses.
+   The bridges returned to proxy IP addresses SHOULD NOT come from the
+   same set as those for the general IP address space.
+   BridgeDB MAY include bridge fingerprints in replies along with bridge
+   IP addresses and OR ports.
+
+5. Selecting bridges to be given out based on email addresses
+
+   BridgeDB MAY support one or more distributors that are giving out
+   bridges based on the requestor's email address.
+   BridgeDB SHOULD reject email addresses containing characters other than
+   the ones that RFC 2822 allows.
+   BridgeDB MAY reject email addresses containing other characters it
+   might not process correctly.
+   BridgeDB MUST reject email addresses coming from other domains than a
+   configured set of permitted domains.
+   BridgeDB MAY normalize email addresses by removing "." characters and
+   by removing parts after the first "+" character.
+   BridgeDB MAY discard requests that do not have the value "pass" in
+   their X-DKIM-Authentication-Result header or that lack this header.
+   BridgeDB SHOULD NOT return a new set of bridges to the same email
+   address until a given time period (typically a few hours) has passed.
+# Why don't we fix the bridges we give out for a global 3-hour time period
+# like we do for IP addresses?  This way we could avoid storing email
+# addresses.  -KL
+   BridgeDB MAY include bridge fingerprints in replies along with bridge
+   IP addresses and OR ports.
+
+6. Selecting unallocated bridges to be stored in file buckets
+
+   BridgeDB MAY reserve a subset of bridges and not give them out via one
+   of the distributors.
+   BridgeDB MAY assign reserved bridges to one or more file buckets of
+   fixed sizes and write these file buckets to disk for manual
+   distribution.
+   BridgeDB SHOULD ensure that a file bucket always contains the requested
+   number of running bridges.
+   If the requested number of bridges in a file bucket is reduced or the
+   file bucket is not required anymore, the unassigned bridges are
+   returned to the reserved set of bridges.
+   If a bridge stops running, BridgeDB SHOULD replace it with another
+   bridge from the reserved set of bridges.
 # I'm not sure if there's a design bug in file buckets.  What happens if
 # we add a bridge X to file bucket A, and X goes offline?  We would add
 # another bridge Y to file bucket A.  OK, but what if X comes back?  We
@@ -104,3 +174,30 @@
 # add it to a different file bucket?  Doesn't that mean that most bridges
 # will be contained in most file buckets over time?  -KL
 
+7. Writing bridge assignments for statistics
+
+   BridgeDB MAY write bridge assignments to disk for statistical analysis.
+   The start of a bridge assignment is marked by the following line:
+
+      "bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS NL
+
+   YYYY-MM-DD HH:MM:SS is the time, in UTC, when BridgeDB has completed
+   loading new bridges and assigning them to distributors.
+
+   For every running bridge there is a line with the following format:
+
+      fingerprint SP distributor (SP key "=" value)* NL
+
+   The distributor is one of "email", "https", or "unallocated".
+
+   Both "email" and "https" distributors support adding keys for "port"
+   and "flag" and the port number and flag name as values to indicate that
+   a bridge matches certain port or flag criteria of requests.
+
+   The "https" distributor also allows the key "ring" with a number as
+   value to indicate to which IP address areas the bridge is returned.
+
+   The "unallocated" distributor allows the key "bucket" with the file
+   bucket name as value to indicate which file bucket a bridge is assigned
+   to.
+
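
The bridge assignment format from section 7 of the patch above can be read back with a few lines of Python. This is only a minimal sketch, not BridgeDB's own code: the function name parse_assignments and the returned data layout are assumptions made for illustration, and the sketch expects exactly one assignment block per input string.

```python
from datetime import datetime, timezone

# Distributor names as listed in section 7 of the spec.
DISTRIBUTORS = ("email", "https", "unallocated")


def parse_assignments(text):
    """Parse one bridge pool assignment block.

    Returns (timestamp, entries), where entries maps each fingerprint to
    a (distributor, params) tuple.  Function name and return layout are
    hypothetical; they are not taken from BridgeDB.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")

    # Header: "bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS
    header = lines[0].split(" ", 1)
    if len(header) != 2 or header[0] != "bridge-pool-assignment":
        raise ValueError("missing bridge-pool-assignment header")
    timestamp = datetime.strptime(header[1], "%Y-%m-%d %H:%M:%S")
    timestamp = timestamp.replace(tzinfo=timezone.utc)  # spec says UTC

    entries = {}
    for line in lines[1:]:
        # fingerprint SP distributor (SP key "=" value)*
        fingerprint, distributor, *pairs = line.split(" ")
        if distributor not in DISTRIBUTORS:
            raise ValueError("unknown distributor: %r" % distributor)
        params = dict(pair.split("=", 1) for pair in pairs)
        entries[fingerprint] = (distributor, params)
    return timestamp, entries
```

Values are kept as strings, so a consumer that needs the ring or port as a number has to convert them itself.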

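Section 4 of the patch above asks BridgeDB to treat addresses from the same /24 as one requestor, to split the IP address space into a small number of areas, and to keep answers fixed for a time period. The sketch below shows one way to derive those three values. It is an illustration under assumptions: the helper names are hypothetical, the areas are modelled as equal contiguous IPv4 ranges (the spec only says "equally"), and the 3-hour period is taken from the editor's note in section 5.

```python
import ipaddress


def subnet_key(ip, prefix_len=24):
    """Map an IPv4 address to its enclosing /24 network address."""
    network = ipaddress.ip_network("%s/%d" % (ip, prefix_len), strict=False)
    return str(network.network_address)


def area_index(ip, n_areas=4):
    """Return the area (0 .. n_areas-1) an IPv4 address falls into.

    Assumption: areas are equal, contiguous slices of the 32-bit
    address space.  BridgeDB may partition the space differently.
    """
    return int(ipaddress.IPv4Address(ip)) * n_areas // 2 ** 32


def period_index(unix_time, period_seconds=3 * 3600):
    """Return the index of the fixed time period containing unix_time."""
    return int(unix_time) // period_seconds
```

Two requests with the same subnet_key, area_index, and period_index would then be expected to receive the same set of bridges.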

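The email address handling in section 5 of the patch above (permitted domains, removing "." characters, and dropping everything after the first "+") can be sketched as follows. The function name normalize_email is hypothetical. Two details are assumptions rather than statements from the spec: dots are removed from the local part only, since removing them from the domain would change it, and the domain is lowercased before the comparison.

```python
def normalize_email(address, permitted_domains):
    """Return a normalized address or raise ValueError if it is rejected.

    permitted_domains is a collection of lowercase domain names.
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: %r" % address)

    # Assumption: domain comparison is case-insensitive.
    domain = domain.lower()
    if domain not in permitted_domains:
        raise ValueError("domain not permitted: %r" % domain)

    # Drop the part after the first "+", then remove "." characters.
    # Assumption: both steps apply to the local part only.
    local = local.split("+", 1)[0].replace(".", "")
    if not local:
        raise ValueError("empty local part after normalization")
    return "%s@%s" % (local, domain)
```

With this normalization, john.doe+tor@example.com and johndoe@example.com map to the same key, so both would be subject to the same time period before a new set of bridges is returned.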


