commit e3b46883f593271829787f42d123d9fc3bc13f33
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Tue Apr 12 10:19:18 2011 +0200

    Respond to Nick's comments.
---
 bridge-db-spec.txt |   82 +++++++++++++++++----------------------------------
 1 files changed, 27 insertions(+), 55 deletions(-)

diff --git a/bridge-db-spec.txt b/bridge-db-spec.txt
index 48a9590..89f0e5c 100644
--- a/bridge-db-spec.txt
+++ b/bridge-db-spec.txt
@@ -41,15 +41,8 @@
    BridgeDB parses the identity from the "r" line and the assigned flags
    from the "s" line.
-   BridgeDB MUST discard all bridges that do not have the Running flag.
-# I don't think that "discard" is the right word here: we don't actually
-# seem to "Forget they exist". Instead, we remember that they are not
-# running. (See how parseStatusFile yields a flag that says if the bridges
-# are running, and how Main.load sets the bridge's status to running or
-# non-running appropriately before passing it to splitter. At no point here
-# does a non-running bridge get "discarded", sfaict). -NM
-   BridgeDB memorizes all remaining bridges as the set of running bridges
-   that can be given out to bridge users.
+   BridgeDB memorizes all bridges that have the Running flag as the set of
+   running bridges that can be given out to bridge users.
    BridgeDB SHOULD memorize assigned flags if it wants to ensure that sets
    of bridges given out SHOULD contain at least a given number of bridges
    with these flags.
@@ -58,6 +51,12 @@
    BridgeDB learns about a bridge's most recent IP address and OR port from
    parsing bridge descriptors.
+   In theory, both IP address and OR port of a bridge are also contained
+   in the "r" line of the bridge network status, so there is no mandatory
+   reason for parsing bridge descriptors. But this functionality is still
+   implemented in case we need information from the bridge descriptor in
+   the future.
+
    Bridge descriptor files MAY contain one or more bridge descriptors.
    We expect bridge descriptor to contain at least the following lines in the
    stated order:
@@ -68,45 +67,20 @@
    BridgeDB parses the purpose, IP, ORPort, and fingerprint from these
    lines.
-   BridgeDB MUST discard bridge descriptors if the fingerprint is not
-   contained in the bridge network status parsed before or if the bridge
-   does not have the Running flag.
-# See comment above -NM
-   BridgeDB MAY discard bridge descriptors which have a different purpose
-   than "bridge".
-# "MAY" isn't good enough; we need to know whether it's safe to give
-# bridgedb a list of non-bridge-purpose descriptors or not. If it
-# discards them, then you shouldn't give bridgedb non-bridge descriptors
-# if you _do_ want them handed out. If it doesn't discard them, then you
-# shouldn't give bridgedb non-bridge descriptors _unless_ you want them
-# handed out.
+   BridgeDB skips bridge descriptors if the fingerprint is not contained
+   in the bridge network status parsed before or if the bridge does not
+   have the Running flag.
+   BridgeDB discards bridge descriptors which have a different purpose
+   than "bridge". BridgeDB can be configured to only accept descriptors
+   with another purpose or not discard descriptors based on purpose at
+   all.
    BridgeDB memorizes the IP addresses and OR ports of the remaining
    bridges.
    If there is more than one bridge descriptor with the same fingerprint,
    BridgeDB memorizes the IP address and OR port of the most recently
    parsed bridge descriptor.
-# I confirmed that BridgeDB simply assumes that descriptors in the bridge
-# descriptor files are in chronological order and that descriptors in
-# cached-descriptors.new are newer than those in cached-descriptors. If
-# this is not the case, BridgeDB overwrites a bridge's IP address and OR
-# port with those from an older descriptor! I think that the current
-# cached-descriptors* files that Tor produces always have descriptors in
-# chronological order. But what if we change that, e.g., when trying to
-# limit the number of descriptors that Tor memorizes. Should we make the
-# assumption that descriptors are ordered chronologically, or should we
-# specify that we have to check that explicitly and fix BridgeDB to do
-# that? We could also look at the bridge descriptor that is referenced
-# from the bridge network status by its descriptor identifier, even though
-# that would require us to calculate the descriptor hash. -KL
-# We should just look at the 'published' dates in the bridges. Call this a bug,
-# I'd say. -NM
    If BridgeDB does not find a bridge descriptor for a bridge contained in
    the bridge network status parsed before, it MUST discard that bridge.
-# I confirmed that BridgeDB discards (or at least doesn't use) bridges for
-# which it doesn't have a bridge descriptor. What's the reason for
-# parsing bridge descriptors anyway? Can't we learn a bridge's IP address
-# and OR port from the "r" line, too? -KL
-# I forget. -NM
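
Editorial note on the descriptor-ordering comments removed in this hunk: Nick's suggestion to compare 'published' dates rather than rely on parse order could look roughly like the sketch below. The function name, the dictionary keys, and the assumption that each descriptor has already been parsed into a dict with a "YYYY-MM-DD HH:MM:SS" published string are hypothetical and are not taken from BridgeDB's code.

```python
from datetime import datetime

PUBLISHED_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def latest_by_published(descriptors):
    """Return, per fingerprint, the descriptor with the newest 'published' time.

    `descriptors` is an iterable of dicts with at least the (hypothetical)
    keys "fingerprint" and "published". Input order does not matter.
    """
    newest = {}
    for desc in descriptors:
        published = datetime.strptime(desc["published"], PUBLISHED_FORMAT)
        current = newest.get(desc["fingerprint"])
        # Replace the stored descriptor only if this one is strictly newer.
        if current is None or published > current[0]:
            newest[desc["fingerprint"]] = (published, desc)
    return {fp: desc for fp, (_, desc) in newest.items()}
```

With this approach an older descriptor that happens to be parsed later no longer overwrites the address and OR port taken from a newer one.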
2. Assigning bridges to distributors
@@ -114,22 +88,19 @@
    given) to clients.
    The current distributors are "email", "https", and "unallocated".

-   BridgeDB assigns bridges to distributors on a probabilistic basis and
-   makes these assignments persistent.
-# Not exactly probabilistic: it's based on an HMAC hash of the bridge's ID
-# and a secret. -NM
-   Persistence is achieved by using a database to map node ID to distributor.
+   BridgeDB assigns bridges to distributors based on an HMAC hash of the
+   bridge's ID and a secret and makes these assignments persistent.
+   Persistence is achieved by using a database to map node ID to
+   distributor.
    Each bridge is assigned to exactly one distributor (including the
    "unallocated" distributor).
    BridgeDB MAY be configured to support only a non-empty subset of the
    distributors specified in this document.
    BridgeDB MAY define different probabilities for assigning new bridges to
    distributors.
-   BridgeDB MUST NOT change existing assignments of bridges to
+   BridgeDB does not change existing assignments of bridges to
    distributors, even if probabilities for assigning bridges to
    distributors change or distributors are disabled entirely.
-# Why "MUST NOT" here? This seems like a potentially desirable feature.
-# "Does not" would be more accurate. -MN
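
Editorial note: the HMAC-based, persistent assignment described in this hunk can be sketched as follows. This is an illustration under stated assumptions, not BridgeDB's implementation: the choice of SHA-1, the integer weights in `shares`, the use of a plain dict as the "database", and both function names are hypothetical.

```python
import hashlib
import hmac


def assign_distributor(bridge_id: bytes, secret: bytes, shares: dict) -> str:
    """Map a bridge ID to a distributor name, deterministically.

    `shares` maps distributor name -> integer weight (an assumption; the
    spec only says that probabilities MAY differ per distributor).
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must contain at least one positive weight")
    digest = hmac.new(secret, bridge_id, hashlib.sha1).digest()
    # Interpret part of the digest as a number and reduce it to [0, total).
    position = int.from_bytes(digest[:8], "big") % total
    for name in sorted(shares):
        if position < shares[name]:
            return name
        position -= shares[name]
    raise AssertionError("unreachable: position is always below total")


def get_or_assign(db: dict, bridge_id: bytes, secret: bytes, shares: dict) -> str:
    """Return the stored assignment, creating it only for unseen bridges."""
    if bridge_id not in db:
        db[bridge_id] = assign_distributor(bridge_id, secret, shares)
    return db[bridge_id]
```

Because `get_or_assign` consults the stored mapping first, changing `shares` later affects only bridges seen for the first time, which matches the "does not change existing assignments" wording.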
3. Giving out bridges upon requests
@@ -215,12 +186,12 @@
    might not process correctly.
    BridgeDB MUST reject email addresses coming from other domains than a
    configured set of permitted domains.
-   BridgeDB SHOULD normalize email addresses by removing "." characters and
-   by removing parts after the first "+" character.
+   BridgeDB SHOULD normalize email addresses by removing "." characters
+   and by removing parts after the first "+" character.
    BridgeDB MAY discard requests that do not have the value "pass" in their
    X-DKIM-Authentication-Result header or does not have this header.
-# This means in practice that the incoming mail stack needs to check DKIM
-# authentication and set X-DKIM-Authentication-Result.
+   The X-DKIM-Authentication-Result header is set by the incoming mail
+   stack that needs to check DKIM authentication.
    BridgeDB SHOULD NOT return a new set of bridges to the same email address
    until a given time period (typically a few hours) has passed.
 # Why don't we fix the bridges we give out for a global 3-hour time period
@@ -230,12 +201,13 @@
 # time values, then people get new bridges when bridges show up, as
 # opposed to then we decide to reset the bridges we give them. (Yes, this
 # problem exists for the IP distributor). -NM
+# I'm afraid I don't fully understand what you mean here. -KL
    BridgeDB MAY include bridge fingerprints in replies along with bridge IP
    addresses and OR ports.
    BridgeDB SHOULD periodically discard old email-address-to-bridge mappings.
-   BridgeDB SHOULD reject too many email requires too frequently from the
-   same normalized address.
+   BridgeDB SHOULD reject too frequent email requests coming from the same
+   normalized address.
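
Editorial note: the domain check and the normalization rule in the hunks above might be sketched as below. Applying the rule to the local part only and lowercasing the domain are assumptions, as is the function name; the spec text only states that "." characters and everything after the first "+" are removed.

```python
def normalize_email(address: str, permitted_domains: set) -> str:
    """Return the normalized address, or raise ValueError if it is rejected."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError("not a usable email address")
    domain = domain.lower()  # assumption: domains compare case-insensitively
    if domain not in permitted_domains:
        raise ValueError("domain is not in the permitted set")
    # Drop everything after the first "+", then remove "." characters.
    local = local.split("+", 1)[0].replace(".", "")
    return local + "@" + domain
```

Under these assumptions "john.doe+tor@example.com" and "johndoe@example.com" map to the same normalized address, so both would fall under the same rate limit and the same address-to-bridge mapping.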
To map previously unseen email addresses to a set of bridges, BridgeDB proceeds as follows: