Hi all,
Over the last few weeks I've been working with George and Aaron on updating BridgeDB's code with respect to how it handles pluggable transports. I've made some decent progress, but there are some questions that I'd like to ask (because I'm not sure I should be the one making the decision). I've also started updating the spec and there are some parts on which I'd like some clarification. I'll try to summarize the thoughts on the matter we/I have thus far. See [A] if you're unfamiliar with the BridgeDB code/spec/idea.
1) How should BridgeDB decide the number of transports, and types, it should hand out?
- My current patch returns transports in proportion to their share of all running bridges. So if we hand out four bridges and obfs2 bridges account for 3/10 of all running bridges, then BridgeDB will hand out 4 * (3/10) = 1.2 obfs2 bridges with each request, on average.
- I've also added an option to bridgedb.conf to set the (expected) minimum and maximum number of bridges supporting a specific PT that BridgeDB should hand out per request.
- I have a verification check that tries to force us to meet these values; however, with its current implementation this is not guaranteed, only probabilistic. I think this is okay for now.
- So, is this enough? Do we want/need a deterministic method of supplying bridges with a supported set of transports?
- Another option is to place each transport into its own subring and select from each of the subrings to ensure we meet the requirement. The more I've thought about this, though, the more I think it defeats the purpose of constructing the rings.
- Last (for now), if a bridge supports multiple PTs, should we return all of them to the user, randomly select one, or select one with a bias? We agreed that we really shouldn't do the first because that would just make it easier for a censor to block more bridges. The middle option works, but given that many bridges now support both obfs2 and obfs3, is it a good idea to, again, probabilistically return each type (roughly) half the time?
2) Should we prefer to distribute PT bridges over regular bridges which have their ORPort on 443?
- Right now returning ORPorts on 443 is the highest priority and transports are a secondary best-effort operation.
3) Unless I incorrectly understand the code, the bridges never rotate. The bridge interval is set to NoSchedule(), which means it returns a static time. Is there a reason for this? This is counter to the spec. Just wondering. :)
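To make the arithmetic in question 1 concrete, here is a minimal sketch (not BridgeDB code). It assumes the bridges in a response are drawn uniformly at random without replacement, which only approximates walking an hmac-sorted ring, and the function names are made up for illustration. It shows the expected number of PT bridges per request and why a configured minimum is only met probabilistically.

```python
from math import comb


def expected_pt_bridges(n_requested, pt_fraction):
    """Average number of PT bridges per response under proportional selection."""
    return n_requested * pt_fraction


def prob_at_least(min_count, n_requested, n_pt, n_total):
    """Chance that a response holds at least min_count PT bridges.

    Assumption: n_requested bridges are drawn uniformly without replacement
    from n_total running bridges, of which n_pt support the transport
    (a hypergeometric model).
    """
    total = comb(n_total, n_requested)
    hits = sum(
        comb(n_pt, k) * comb(n_total - n_pt, n_requested - k)
        for k in range(min_count, min(n_requested, n_pt) + 1)
    )
    return hits / total
```

With four bridges per request and 3 obfs2 bridges out of 10 running, expected_pt_bridges(4, 3/10) gives 1.2, while prob_at_least(1, 4, 3, 10) is 5/6, so under this model roughly one response in six would contain no obfs2 bridge at all.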
(I had some other points I wanted to raise, but I'm blanking on them now. I think this is a good start, though.)
Please also correct anything I may have gotten wrong.
Thanks everyone, and thanks to George and Aaron for their help, as well.
- Matt
A. For those who don't know the details of the code, the simplified version is as follows:
1) All bridges send their bridge descriptors and misc information to the Bridge Authority.
2) Bridge Authority provides a network status file containing all known bridges described by their name, fingerprint, digest, time of publication, IP addr, ORPort, DirPort. Bridge Auth also provides a bridge descriptor file specifying each bridge's IP addr, ORPort, and fingerprint. Last, it supplies an extra-info file that contains all the extra info that the bridges provide - mainly their transports, in our case.
3) BridgeDB parses all of these files and associates the information with a single instance of a bridge.
4) BridgeDB assigns each running bridge to a distributor (website, email, etc.) based on an hmac of the bridge's ID. Once assigned, the bridge is inserted into the distributor's list of bridges.
5) BridgeDB then further organizes the bridges assigned to each distributor by moving them into rings and subrings.
   - A ring is simply a sorted list of hmacs of the bridges' IDs which, when traversed, wraps around to the beginning if it ever reaches the end.
   - The hmac of the bridge's ID is used to retrieve the actual bridge instance from a hash, which is stored alongside the ring.
6) Some distributors, such as https, are 'initialized' with a few rings based on filters.
   - https starts out with a ring containing all bridges assigned to it, a ring only containing bridges which support IPv4 connections, and a ring only containing bridges which support IPv6 connections.
   - Every ring also contains two subrings (currently). One subring is the subset of bridges from the parent ring which have their ORPort listening on port 443. The other subring is the subset of bridges from the parent ring which have the stable flag set.
   - For example:
       - Cluster 1 Ring
           - subring (stable)
           - subring (https)
       - Cluster 2 Ring
           - subring (stable)
           - subring (https)
       - IPv4 Cluster 1 Ring
           - subring (stable)
           - subring (https)
       - IPv4 Cluster 2 Ring
           - subring (stable)
           - subring (https)
       - IPv6 Cluster 1 Ring
           - subring (stable)
           - subring (https)
       - IPv6 Cluster 2 Ring
           - subring (stable)
           - subring (https)
7) When BridgeDB receives a request for bridges from its website, it forwards the query on to the IP distributor. The details include whether a specific PT was requested, the IP version the bridge should support, the country within which the bridge should not be blocked, the requesting IP address, and the interval.
8) The distributor then decides on the "area" of the IP address, currently the /24 mask, and then finds the "cluster" within that area (by taking the first eight bytes of an hmac of the area and using the result modulo the number of clusters). A filter is then constructed based on the requested information. If a ring already exists that satisfies exactly these filters then that is used. Else a new ring (with subrings) is constructed to satisfy this request. The distributor also computes the position in the ring as the hmac of the interval and the area.
9) Once the correct ring exists, it determines how many bridges it can find in the ring's subrings to satisfy the request. This is done by taking the previously computed position, finding it in the list of bridge ID hmacs, and then selecting the next consecutive "requested number of bridges" from the list (wrapping around to the beginning, if necessary). The same is then done for the main ring. The results from these searches are then joined and the first "requested number of bridges" unique keys are selected from the list. This list is then sorted and returned, propagating back up to the user.
10) Similar actions are taken by the other distributors. For example, the email distributor doesn't use an "area" to decide which bridges to distribute; it uses the normalized requesting/source mail address.
11) Misc:
   - Because the rings are sorted by an hmac of the bridge's ID, I expect that the bridges will be uniformly distributed around the ring. As such, I don't expect there to be a bias for one type of bridge/transport/ORPort over any other. (Is this incorrect?)
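For anyone who finds code easier to follow than prose, steps 5, 8 and 9 can be sketched roughly as below. This is a simplified illustration, not the actual BridgeDB implementation: the names (Ring, keyed_hmac, area_of, cluster_of, ring_position), the use of SHA-1, the key handling, and the way the interval and area are joined into one string are all assumptions of mine, and subrings and filters are left out entirely.

```python
import bisect
import hashlib
import hmac
import ipaddress


def keyed_hmac(key, value):
    """HMAC of a string under a secret key (SHA-1 is an assumption here)."""
    return hmac.new(key, value.encode(), hashlib.sha1).digest()


class Ring:
    """Sorted list of bridge-ID hmacs plus a hash from hmac to bridge."""

    def __init__(self, key):
        self.key = key
        self.positions = []  # sorted hmacs of bridge IDs
        self.bridges = {}    # hmac -> bridge instance

    def insert(self, bridge_id, bridge):
        pos = keyed_hmac(self.key, bridge_id)
        if pos not in self.bridges:
            bisect.insort(self.positions, pos)
        self.bridges[pos] = bridge

    def get_from(self, position, n):
        """Return up to n consecutive bridges starting at position, wrapping."""
        if not self.positions:
            return []
        start = bisect.bisect_left(self.positions, position)
        count = min(n, len(self.positions))
        size = len(self.positions)
        picked = [self.positions[(start + i) % size] for i in range(count)]
        return [self.bridges[p] for p in picked]


def area_of(ip):
    """The "area" of an IPv4 address: its /24 network."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(net.network_address)


def cluster_of(key, area, n_clusters):
    """First eight bytes of an hmac of the area, modulo the cluster count."""
    digest = keyed_hmac(key, area)
    return int.from_bytes(digest[:8], "big") % n_clusters


def ring_position(key, interval, area):
    """Starting position in the ring: an hmac of the interval and the area."""
    return keyed_hmac(key, f"<{interval}>{area}")
```

In this sketch a request from 192.0.2.77 maps to area "192.0.2.0", and ring.get_from(ring_position(key, interval, area), 3) returns the same three bridges for as long as the interval string stays the same. That is also why a schedule that returns a static time would mean the answer for a given area never changes.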