George Kadianakis transcribed 4.1K bytes:
Currently, OONI bridge reachability reports look like this: https://ooni.torproject.org/reports/0.1/CN/bridge_reachability-2014-07-02T00... and you can retrieve them from this directory listing: https://ooni.torproject.org/reports/0.1/
A few concerns:
1. The tests have no control.
I am concerned that the test has no real control. One cannot say, "The experiment is testing if these bridges are reachable from China, and the control is whether or not they are reachable from the US." The problem with that is that there is absolutely no way to determine if the act of measurement is affecting the data being measured. How do you know that the test isn't causing the bridges to get blocked?
2. This test is attempting to connect simultaneously to multiple bridges with multiple different PT protocols.
That is, this test is doing precisely what we all decided that Tor Browser should *not* do, because the Great Firewall probably can't ask for better filter training material. :(
3. That test still isn't able to reliably start some transports, e.g. fteproxy.
4. The fingerprint should always be in the bridge line; otherwise you've got no proof that you've actually connected to the bridge. :)
5. There is unnecessarily unsafe data in the report output.
BridgeDB sends the bridge descriptors to the Metrics backend, so that Metrics can process them, come up with all the rest of the graphs we have, and put the sanitised data in Onionoo. What if these reports were to contain only data which is public, such as the data which Onionoo currently has?
To play it safe, I would prefer not to have a bunch of bridge fingerprints and ip:ports lying around, on a thousand poorly maintained machines all over the planet. The generated reports could instead output:
* The hashed fingerprint (as is the case for bridges in Onionoo)
* The hashed ip:port
* The transport name
* [true|false|null] for whether the test was successful.
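A minimal sketch of what producing such an entry might look like. The function and field names are hypothetical, not anything OONI or Onionoo defines. The fingerprint hash here is SHA-1 over the binary fingerprint, rendered as uppercase hex, which is my understanding of how Onionoo derives hashed bridge fingerprints; hashing the literal "ip:port" string is purely an assumption for illustration.

```python
import binascii
import hashlib
import json


def hash_fingerprint(fingerprint_hex):
    """SHA-1 of the binary fingerprint, as uppercase hex (assumed Onionoo-style)."""
    raw = binascii.unhexlify(fingerprint_hex.replace(" ", ""))
    return hashlib.sha1(raw).hexdigest().upper()


def hash_address(ip, port):
    """SHA-1 of the literal 'ip:port' string (an assumption for this sketch)."""
    address = "%s:%d" % (ip, port)
    return hashlib.sha1(address.encode("ascii")).hexdigest().upper()


def sanitised_entry(fingerprint_hex, ip, port, transport, success):
    """Build a report entry that contains no raw fingerprint or address.

    success is True, False, or None (None: the test could not be run).
    """
    return {
        "hashed_fingerprint": hash_fingerprint(fingerprint_hex),
        "hashed_address": hash_address(ip, port),
        "transport": transport,
        "success": success,
    }


# Placeholder inputs: a dummy fingerprint and a documentation-range address.
entry = sanitised_entry("A" * 40, "192.0.2.1", 443, "obfs3", None)
print(json.dumps(entry, sort_keys=True))
```

Serialised with json.dumps, a success value of None comes out as null, matching the [true|false|null] field above.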
This way, the data could be added to the rest of the bridge's data in Onionoo, and all the visualisation/metrics tools which use Onionoo (all of them, I believe) won't need to do anything different. Then BridgeDB could get the data from Onionoo.
6. Your tests would give more accurate data if they didn't use "real" bridges.
I've mentioned this in #ooni on IRC, but for everyone else: To figure out if a PT protocol is blocked, you do not need to use "real" bridges from Tor Browser or BridgeDB. If you set up (ideally automatically) a couple of bridges for each protocol, this would:
* Reduce the number of test inputs, making test runs complete faster and use less memory.
* Eliminate the potential to get "real" bridges blocked through testing.
* Test both sides of the connection, thus reducing false negatives.
* Allow us to more accurately control variables while attempting to determine if a PT protocol is blocked by a certain country.
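To illustrate the "both sides of the connection" point, here is a rough sketch of how a single measurement against a bridge we operate ourselves might be interpreted. The function name, its three inputs, and the result strings are all hypothetical; in particular it assumes we can read the test bridge's own logs and can run a control probe from an uncensored vantage point.

```python
def classify(client_connected, bridge_saw_client, control_connected):
    """Interpret one measurement against a self-operated test bridge.

    client_connected:  the probe in the tested country completed a connection
    bridge_saw_client: the bridge's own log shows the probe's attempt arriving
    control_connected: a probe from an uncensored vantage point could connect
    """
    if client_connected:
        return "reachable"
    if not control_connected:
        # The bridge itself may be down, so the failure says nothing
        # about blocking.
        return "inconclusive: bridge down or misconfigured"
    if bridge_saw_client:
        return "failed after reaching bridge: possible interference mid-connection"
    return "likely blocked: attempt never reached the bridge"


print(classify(False, False, True))
```

With "real" bridges only the first input is available, so a bridge that is simply offline is indistinguishable from one that is blocked.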
Here is one that shows which PTs are blocked in which countries: https://people.torproject.org/~asn/bridget_vis/countries_pts.jpg The list would only include countries that are blocking at least one bridge. Green is "works", red is "blocked". Also, you can imagine the same visualization, but instead of PT names for columns it has distribution methods ("BridgeDB HTTP distributor", "BridgeDB mail distributor", "Private bridge", etc.).
To be honest, I don't care which pool. Also, that data is already publicly available in Onionoo (or deducible via its lack of availability).
And here is another one that shows how fast jurisdictions block the default TBB bridges: https://people.torproject.org/~asn/bridget_vis/tbb_blocked_timeline.jpg
Neat idea!
These visualizations could be helpful, but they are not the only ones.
What other use cases do you imagine using this dataset for?
In order to better hand out bridges, it would be quite excellent if BridgeDB could someday have something like:
  { hashed_bridge_address: SHA1('IP:PORT'),
    hashed_bridge_fingerprint: SHA1('FINGERPRINT'),
    pt_method: PT_METHOD|'vanilla',
    regions: {
      ...,
      BR: { reachable: false,
            since: TIMESTAMP_WHEN_IT_FIRST_BECAME_UNREACHABLE },
      ...,
      CA: { reachable: true,
            since: TIMESTAMP_WHEN_IT_FIRST_BECAME_REACHABLE },
      CN: { reachable: false,
            since: TIMESTAMP_WHEN_IT_FIRST_BECAME_UNREACHABLE },
      ...,
    },
  },
  ...,
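
A rough sketch of how such a record might be kept up to date as measurements arrive. This is not existing BridgeDB code; the helper names are hypothetical, and the inputs are assumed to be already hashed. The point it shows is that "since" should only change when reachability flips, so that it keeps meaning "when it first became (un)reachable".

```python
import time


def new_record(hashed_address, hashed_fingerprint, pt_method="vanilla"):
    """Create an empty per-bridge record in the shape outlined above."""
    return {
        "hashed_bridge_address": hashed_address,
        "hashed_bridge_fingerprint": hashed_fingerprint,
        "pt_method": pt_method,
        "regions": {},
    }


def update_region(record, country_code, reachable, now=None):
    """Record one measurement for a country.

    The 'since' timestamp is only replaced when the reachability state
    changes (or when the country has no entry yet).
    """
    now = time.time() if now is None else now
    current = record["regions"].get(country_code)
    if current is None or current["reachable"] != reachable:
        record["regions"][country_code] = {"reachable": reachable, "since": now}
    return record["regions"][country_code]


# Placeholder hashes and timestamps, for illustration only.
record = new_record("HASHED_ADDRESS", "HASHED_FINGERPRINT", "obfs3")
update_region(record, "CN", False, now=100)
update_region(record, "CN", False, now=200)  # same state: 'since' stays at 100
update_region(record, "CA", True, now=150)
```

A repeated "unreachable" result for CN leaves the original timestamp in place; only a later "reachable" result would replace it.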