[tor-dev] On the visualization of OONI bridge reachability data

Matthew Finkel matthew.finkel at gmail.com
Mon Oct 6 16:28:18 UTC 2014


On Sat, Oct 04, 2014 at 06:27:22PM -0700, M. C. McGrath wrote:
> Hi,
> These were a few possibilities for visualization that we came up with
> at the OTF summit (I can send the full notes from that discussion if
> everyone is okay with it):
> - Timelines (by protocol, pool, country)
> - Pie charts for above
> - Timeline/graph of time it takes to block bridge from when added to
> TBB (github parser)

Similar to the next one, I wonder if showing a map corresponding to
this data would also help. At t0, zero countries block the built-in
bridges, at t1 = only China blocks, at t2 = China + Iran block, at t3 =
China + Iran + Syria block, t4 = t3 + Turkey, etc. I'm thinking this
would be nice in addition to the timeline which George sketched (where
some of the time points are clickable and update the map). I don't
actually know how difficult this is to make.
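
To make the map idea a bit more concrete, here is a minimal Python
sketch of the data a clickable timeline would need: for each time
point, the cumulative set of countries blocking the built-in bridges.
The event list, dates, and function name are all made up for
illustration; none of this comes from real measurements.

```python
from datetime import date

# Hypothetical "first seen blocking" events, invented for illustration only.
BLOCK_EVENTS = [
    ("CN", date(2014, 1, 10)),
    ("IR", date(2014, 2, 3)),
    ("SY", date(2014, 3, 21)),
    ("TR", date(2014, 4, 15)),
]


def cumulative_blockers(events):
    """Return [(date, sorted list of countries blocking as of that date)]."""
    blocked = set()
    frames = []
    for country, when in sorted(events, key=lambda e: e[1]):
        blocked.add(country)
        frame = (when, sorted(blocked))
        if frames and frames[-1][0] == when:
            # Several countries started blocking on the same day: one frame.
            frames[-1] = frame
        else:
            frames.append(frame)
    return frames


for when, countries in cumulative_blockers(BLOCK_EVENTS):
    print(when.isoformat(), ", ".join(countries))
```

Each frame would correspond to one clickable point on the timeline, and
the country list is what the map would highlight at that point.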

> - Geographic breakdown by region (if enough data points). Could be
> similar to this map of % of internet users who use Tor by country
> https://transparencytoolkit.org/tormap.html

That's cool. Are you able to add a legend to it?

Having something like this or similar to RSF's Press Freedom Index [0],
based on the number of bridge users, would be nice. This is doable
today, using the available metrics data. We'll probably never be able
to know the number of users per protocol per country, but at least we
can visualize where in the world bridges (in general) are used most
and if this changes over time.

[0] http://rsf.org/index2014/en-index2014.php

But, it would also be really cool if we can create a map like this
based on the reachability of bridges per country per protocol and
maybe, in addition, color-code/denote how the ISPs/country are
interfering with the connection (e.g. throttling, DNS cache
poisoning, IP addr/port blocking).
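
As a rough sketch of the aggregation such a map would sit on top of,
the Python below reduces individual measurements to a success fraction
per (country, transport) pair. The record layout and the sample values
are assumptions for illustration, not the actual OONI report format.

```python
from collections import defaultdict

# Hypothetical measurement records: (country, transport, reachable).
MEASUREMENTS = [
    ("CN", "obfs3", False),
    ("CN", "obfs3", False),
    ("CN", "fte", True),
    ("IR", "obfs3", True),
    ("IR", "obfs3", False),
    ("NL", "obfs3", True),  # control vantage point
]


def reachability_matrix(measurements):
    """Map (country, transport) -> fraction of measurements that succeeded."""
    counts = defaultdict(lambda: [0, 0])  # [successes, total]
    for country, transport, reachable in measurements:
        cell = counts[(country, transport)]
        cell[0] += int(reachable)
        cell[1] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}


for (country, transport), fraction in sorted(reachability_matrix(MEASUREMENTS).items()):
    print(f"{country} {transport}: {fraction:.0%} reachable")
```

A map could then color each country by the fraction for a selected
transport; the type of interference would need an extra field per
record, which this sketch leaves out.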

> - At what point in the tor bootstrapping does it fail (may be
> difficult to determine, especially anonymized)?

Yes, but there's already a risk to running ooni-probe (at least right
now, hopefully this will change in time). We will eventually need
probes running in most countries if we want a good understanding of
what network interference is taking place and who is affected.

> - In all visualizations, compare with control (filter, line break,
> plot alongside, etc)
> 
> And the variables we thought would be relevant to visualizations:
> Protocol
> Pool
> Country (and region)- Iran, China, Netherlands (control)
> Time it takes to be blocked
> Point in bootstrap where it fails
> Classify the bridges by commercial/residential connection
> Time we started scanning the bridge, and from where
> 

Maybe latency measurements per protocol? Initially, I'm thinking
"the time it takes to download a consensus from the bridge" but
there are many variables that may affect this. Anyone have a better
idea?
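
One way to damp some of that noise might be to repeat the download a
few times and report the median. A rough Python sketch, where `fetch`
is a placeholder for whatever actually downloads the consensus through
the bridge (no particular ooni-probe or Tor API is assumed here):

```python
import statistics
import time


def median_latency(fetch, repeats=5):
    """Call fetch() `repeats` times and return the median duration in seconds.

    `fetch` is a hypothetical zero-argument callable standing in for
    "download a consensus through the bridge under test".
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs on its own.
    print(median_latency(lambda: time.sleep(0.01), repeats=3))
```

The median is less sensitive than the mean to a single slow run, but it
does nothing about systematic differences between vantage points.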

I think this mostly covers it. The only addition I can think of right
now is comparing different control countries against each other (and
different ISPs within the control countries). Maybe we'll find
something interesting.

> It should be relatively simple to make rough versions of a lot of
> visualizations to see what works once we have a parser/converter that
> will generate JSONs (or similar) from OONI output that include the
> variables listed above.
> 

Is someone already working on this? I'm not really volunteering, merely
curious if this is in progress. :)
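
For the sake of discussion, here is a sketch of what one record from
such a converter could look like, with one field per variable in the
quoted list. Every field name and sample value is an assumption made
for illustration; this is not the OONI report schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BridgeMeasurement:
    # All field names are hypothetical, chosen to mirror the variable list.
    transport: str                      # protocol / pluggable transport
    pool: str                           # distribution pool
    country: str                        # probe country code
    is_control: bool                    # True for control vantage points
    hours_until_blocked: Optional[float]   # None if never seen blocked
    bootstrap_failure_point: Optional[str]  # None if bootstrap succeeded
    connection_type: str                # "commercial" or "residential"
    scan_started: str                   # ISO 8601 timestamp of first scan


def to_json(measurement):
    """Serialize one measurement to a JSON string with stable key order."""
    return json.dumps(asdict(measurement), sort_keys=True)


example = BridgeMeasurement(
    transport="obfs3",
    pool="https",
    country="NL",
    is_control=True,
    hours_until_blocked=None,
    bootstrap_failure_point=None,
    connection_type="commercial",
    scan_started="2014-07-02T00:00:21Z",
)
print(to_json(example))
```

A flat record like this keeps the visualizations simple to prototype,
since each chart becomes a filter and a group-by over a list of
records.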

> Are there any other variables that would be particularly helpful to
> track or visualize? And are there any visualizations (listed or
> otherwise) that anyone would find particularly helpful?
> 
> 
> On 10/04/2014 06:10 PM, George Kadianakis wrote:
> > == What is bridge reachability data? ==
> > 
> > By bridge reachability data I'm referring to information about
> > which Tor bridges are censored in different parts of the world.
> > 
> > The OONI project has been developing a test that allows probes in 
> > censored countries to test which bridges are blocked and which are 
> > not. The test simply takes as input a list of bridges and tests 
> > whether they work. It's also able to test obfuscated bridges with 
> > various pluggable transports (PTs).
> > 
> > == Why do we care about this bridgability data? ==
> > 
> > A few different parties care about the results of the bridge 
> > reachability test [0]. Some examples:
> > 
> > Tor developers and censorship researchers can study the bridge 
> > reachability data to learn which PTs are currently useful around
> > the world, by seeing which pluggable transports get blocked and
> > where.  We can also learn which bridge distribution mechanisms are
> > busted and which are not.
> > 
> > Bridge operators, the press, funders and curious people can learn
> > which countries conduct censorship and how advanced their technology
> > is. They can also learn how long it takes jurisdictions to
> > block public bridges. And in general, they can get a better
> > understanding of how well Tor is doing in censorship circumvention
> > around the world.
> > 
> > Finally, censored users and world travelers can use the data to
> > learn which PTs are safe to use in a given jurisdiction.
> > 
> > == Visualizing bridge reachability data ==
> > 
> > So let's look at the data.
> > 
> > Currently, OONI bridge reachability reports look like this: 
> > https://ooni.torproject.org/reports/0.1/CN/bridge_reachability-2014-07-02T000021Z-AS4538-probe.yamloo
> >
> > and you can retrieve them from this directory listing:
> > https://ooni.torproject.org/reports/0.1/
> > 
> > That's nice, but I doubt that many people will be able to access
> > (let alone understand) those reports. Hence, we need some kind of 
> > visualization (and better dir listing) to conveniently display the 
> > data to human beings.
> > 
> > However, a simple x-to-y graph will not suffice: our problem is
> > multidimensional. There are many use cases for the data and
> > bridges have various characteristics (obfuscation method,
> > distribution method, etc.), hence there is more than one useful
> > way to visualize this dataset.
> > 
> > To give you an idea, I will show you two mockups of visualizations 
> > that I would find useful. Please don't pay attention to the data 
> > itself, I just made some things up while on a train.
> > 
> > Here is one that shows which PTs are blocked in which countries: 
> > https://people.torproject.org/~asn/bridget_vis/countries_pts.jpg
> > The list would only include countries that are blocking at least
> > one bridge. Green is "works", red is "blocked". Also, you can imagine
> > the same visualization, but instead of PT names for columns it has 
> > distribution methods ("BridgeDB HTTP distributor", "BridgeDB mail 
> > distributor", "Private bridge", etc.).
> > 
> > And here is another one that shows how fast jurisdictions block
> > the default TBB bridges: 
> > https://people.torproject.org/~asn/bridget_vis/tbb_blocked_timeline.jpg
> >
> >  These visualizations could be helpful, but they are not the only
> > ones.
> > 
> > What other use cases do you imagine using this dataset for?
> > 
> > What graphs or visualizations would you like to see?
> > 
> > [0]: Here are some use cases:
> > 
> > Tor developers / Researchers:
> > *** Which pluggable transports are blocked and where?
> > *** Do they do DPI? Or did they just block the TBB hardcoded
> >     bridges?
> > *** Which jurisdictions are most aggressive and what blocking
> >     technology do they use?
> > *** Do they block based on IP or on (IP && PORT)?
> > 
> > Users:
> > *** Which pluggable transport should I use in my jurisdiction?
> > 
> > Bridge operators / Press / Funders / Curious people:
> > *** Which jurisdictions conduct Tor censorship? (block pluggable
> >     transports/distribution methods)
> > *** How quickly do jurisdictions block bridges?
> > *** How many users/traffic (and which locations) did the blocked
> >     bridges serve?
> > **** Can be found out through extrainfo descriptors.
> > *** How well are Tor bridges doing in censorship circumvention?
> > 

