[ooni-dev] Let's come up with the roadmap for the future of OONI

Thu Feb 19 17:02:23 UTC 2015

Thank you all for the feedback and very useful comments!

I will quote replies from another thread and comment on some of the
topics there.

At the end, based on the feedback received, I will list the main areas
of focus for the next 6 months, some of the relevant tickets, and the
new tickets that should be created.

If you have ideas on some specific topic/issue/feature that you believe
should be tackled, please do append to this list.

On 2/17/15 5:28 PM, Nick Feamster wrote:
> 
> There are many interesting ideas in here.  Thanks!
> 
> Off the top of my head, one could get a better global picture of censorship using data from the “global censorship measurement” tools (Sam’s Encore tool, Roya and Ben’s “Censored Planet” project) to trigger measurements from OONI, and vice versa.
> 
> Tools such as Encore and CP can get a signal of filtering from a larger and more diverse set of clients than an OONI deployment could (and, at lower risk); the drawback is a lack of detailed information (typically the information is a binary “yes/no” about whether filtering is taking place in TCP/IP, DNS, or HTTP, but not much else).
> 
> I could imagine an OONI deployment using information about observed filtering from Encore or CP to trigger a more extensive and detailed set of tests from the OONI nodes.  Likewise, blocking observed at one OONI node at one layer could be fed back into Encore or CP to see whether any observable filtering behavior is observed across a broader range of sites.  
> 
> I view this as a specific “research use case” for the integration, orchestration, and data analysis points that Arturo lists below.
> 
> I think Roya and Ben are probably not quite ready for this with CP (I’d want to ask them), but Encore is very stable and we could look into good ways of passing information back and forth, either to OONI or to a common visualization engine (or both).
> 
> Some of the other efforts look interesting, but the research payoff isn’t as readily apparent to me.  (I’m suspicious of monthly reports without broader, more global baseline coverage like Encore or CP could provide, but we are also quite interested in visualization tools and reports for that data, so integrating all of that into a single “dashboard” would be something that could be very useful, based on what political scientists and others seem to be asking for.)

This is indeed a very interesting concept. With respect to using
ooniprobe data to trigger other tests, this will be easier once we
finish implementing the API to the pipeline.
We now have all the data collected with ooniprobe inside of a database
and will be working (mainly to provide analytics and visualization
tools) on exposing access and query functionality.

Work on this has only recently started, but you can find the code for it
here: https://github.com/hellais/ooni-app.

With respect to triggering ooniprobe measurements based on data
collected by Encore, Censored Planet, etc., I think we would need to
support https://trac.torproject.org/projects/tor/ticket/12551, unless we
consider it acceptable to wait some time before all deployed
probes run the measurement.

On 2/17/15 5:36 PM, Phillipa Gill wrote:
> I would say the daily measurements from stable nodes is less interesting for us since ICLab is already 99% of the way there on those (ie., running baseline tests from VPN + deployed endpoints). 
> 
> I think the idea of detecting different censorship circumvention tools could be interesting and if the tests are well specified this could be something that is ported/run in ICLab as well.

Yes, we plan on openly specifying all the tests that we will be deploying
in a study in a specific country.

I am not sure how many details I can disclose at the moment on this, but
will be sure to update the list when more is known.

On 2/17/15 8:53 PM, Meredith Whittaker wrote:
> The biggest issues that I see for OONI, and all efforts in this nascent
> space are around:
> 
>    - *Getting consistent measurements at scale, over time, across broad
>    geographies.*
>       - Consistent = from a stable set of tests that, *if they do change,
>       are clearly documented as changed*, and this versioning is reflected
>       in collected data and elsewhere. The less change, the better (which isn't
>       to say that new tests are bad, at all, but that there needs to be a set
>       "canon" of core tests that work to set a baseline. Without this, much of
>       the research etc. is much, much less valuable.)
>       - at scale = a lot -- a significant number of representative links,
>       mapping diurnal patterns, etc. (omg obvs, but still)
>       - over time = for as long as possible -- letting us compare the
>       results of Stable Test X for Place Y in 2015 vs 2017, etc..
>       - Across broad geographies = how *are* UK censorship techniques
>       different from those used in Liberia? Etc.
>    - *Dealing with issues around user consent and risk. *
>       - This is huge. We all know that. For now, the more OONI probes and
>       measurement points can be decoupled from individuals (i.e. can be
>       deployed without requiring a user to download, install, flash,
>       access), the better. Impersonal Pis, or similar, seem like the best
>       option for this currently. But, even these pose risk if they can be
>       linked to an individual (scapegoat) in a hostile country. The bigger
>       OONI's impact, the bigger this problem.
> 
> I would advocate that any work undertaken focus on these two goals -- which
> I believe to be fundamental to many of the great other goals that have been
> put forward here.
> 
> With that declaration, my specific comments on the individual ideas in
> Arturo's thoughtful email:
> 
>    - *Getting daily OONI measurements from 50 countries*
>    This is clearly a laudable goal. I am concerned with the means suggested
>    for achieving this. How can it be done without placing users in danger? How
>    can an answer to that question be obtained for 50 countries (and orders of
>    magnitude more regions, factions, climates within each country)? This is
>    also ambitious practically/logistically -- this is an increase in vantage
>    points that will likely require localization, maintenance, support,
>    troubleshooting, and consistent and well-documented updates, such that
>    apples can be compared to apples and data can stand as "proof." A roadmap
>    that narrows the focus, from 50 to "a pilot of 5," and that addresses the
>    above issues, would be welcome.
> 
>    - *Tests of circumvention tools*
>    This seems cool, but could as easily be titled "The How Well is My
>    Censorship Working Index". Answering the question "whom does this serve,
>    and how?" would seem to be the next step to assessing the value of this
>    proposition.
> 
>    - *Orchestrating OONI probes*
>    This seems HUGELY problematic WRT privacy, user consent, and security. I
>    am also concerned with how this butts up against the need for consistent,
>    and durational testing (which, IMO, is more important as a research tool
>    and a means to understand censorship than novel tests run a couple times
>    during a given month).
> 
>    - *Data analytics and visualizations*
>    Is there enough consistent available data (and a roadmap that would
>    guarantee a consistent pipeline of consistent data over time) to make this
>    useful and worthwhile currently? If/when yes, I would suggest bringing in
>    people who have professional experience with data visualizations and
>    analysis. With M-Lab this goal has been a continual challenge -- there
>    aren't accepted statistical ways of working with network data (more on that
>    if you want in another thread), and visualizations need to be gorgeous,
>    need to be written in a browser-friendly language, need to be maintained
>    and updated, need to not have [too many] gaps, etc.. You are, in this
>    effort, producing a user-facing product. All of the forever-work that
>    attends a product attends this effort.
> 
>    - *Pub system*
>    This seems potentially useful (I don't know enough to be concrete here),
>    but again, my question is, Whose needs does this serve, specifically, and
>    how does serving those needs further a longer-term OONI strategy? More
>    generally, shared storage and transport mechanisms for measured data are
>    something this space could use, for sure. Would this potentially help build
>    those systems?
> 
>    - *Production-quality OONI Pis. *
>    I like this, and I like the idea of partnering with CI lab (hey!) or
>    others. Deployment and maintenance is expensive. The more work these Pis
>    can do once they're deployed, before they die, the better. I would be more
>    enthusiastic about this if it involved a deployment partner, as I don't see
>    a huge value in spending dev time to ensure that the handful of people
>    who'd flash a Pi can.
> 
>    - *OONI on mobile*
>    I vote to have a more stable OONI before launching a mobile test. We at
>    M-Lab have explored mobile at length, and it's tricky (I'm not sure tests
>    written for a non-mobile environment would be as relevant to a mobile
>    environment), deployment is hard (marketing is key! and, who wants to use
>    their data cap on something that isn't whatsapp? (etc.).)
> 
>    - *Research based on OONI*
>    If there's enough consistent data, it could be interesting. But, not
>    sure that there is (?)
> 
>    - *Monthly reports*
>    As above, I think this is premature. Getting good data, and getting
>    enough of it, should come first.
> 
>    - *Adopt an OONI probe*
>    I worry about consent and permissions here. Once those are figured out,
>    I would suggest getting a big donor to adopt (deploy) a bunch of OONIs,
>    instead of a smaller campaign.
> 
>    - *Integration with other censorship measurement projects*
>    Really like the sound of this! The more resources can be shared, the
>    better. I'll let y'all discuss...
> 
>    - *Reaching out to communities inside censored regions (like the UK?)*
>    I'm all for this, but I think this should be led by groups like Citizen
>    Lab, maybe Amnesty, and others that have experience in qualitative user
>    studies and access to networks on the ground.
> 
>    - *Redesign the OONI website*
>    Definitely necessary. Not clear on its priority. I would suggest that
>    any redesign minimize the focus on "censorship" (and the use of the term).
>    For all the reasons we've discussed. And that a technical writer and
>    someone with some communications training be employed in drafting and
>    tinkering.
> 
>    - *Internet censorship conference*
>    What would the goals be? How would things be better/different when these
>    goals were achieved? Without a concrete motive, I worry that this could be
>    another "fly the same people somewhere new and pretend we're innovating"
>    model.
> 
>    - *Implement a GUI for OONI*
>    I would prioritize this after the backend is stable, and the consent and
>    permissions issues have been worked through. (I also think this is
>    something that should engage designers and UX experts outside of the OONI
>    core team, because all the reasons 

I agree fully with all that you say. In particular I think, although I
had not put it amongst the options to vote on, that we should dedicate
quite a bit of time working on sorting out the informed consent issue.

On 2/19/15 12:08 AM, Jed Crandall wrote:
> Sorry for chiming in late, have been a bit under the weather this week.
> I'll just second Philipp's comment:
> 
> https://lists.torproject.org/pipermail/ooni-dev/2015-February/000253.html
> 
> I.e., a "me too" for giving "Implement data analytics and visualization
> for OONI tests" a 5.  I don't mind writing a bit of code to start
> looking at some data, but there's so much data out there and I'm more
> likely to start looking at data (or suggest that a research or
> networking class student do so) if I already have some idea of what's
> there.
> 
> I'll also add that a "wish list" of problems you'd like solved could be
> helpful, like:
> 
> https://research.torproject.org/ideas.html
> https://www.torproject.org/getinvolved/volunteer.html.en#Research
> 
> For example, I've found IP geolocation services like MaxMind to be
> pretty bad in certain places.  It says something is in a country and
> it's not, which makes debugging why that data point doesn't behave like
> others that are supposed to be in that country a pain.  Is that a
> problem that would help OONI if solved?  If not, is something else?
> 
> FWIW, this is a step in the direction of doing IP geolocation better in
> parts of the world other than the U.S. and Europe:
> 
> http://www.cs.unm.edu/~crandall/infocom2015rtt.pdf
> 
> Lastly, in terms of baselines, even very specific baselines like "Tor
> bridge reachability in Country X" can be very illuminating if the amount
> of space and time the data is taken over is broad enough.  Measuring
> everything everywhere all the time would be nice, of course, but like
> Salvador Dali said, "Have no fear of perfection - you'll never reach
> it."
> 

The suggestion of a wish list of research problems we would like solved
is a very good one.

There are a few of those and I think it would be a good idea to write
them down somewhere.

The MaxMind issue is indeed a problem for us. Luckily we will also
collect the ASN by default, and in most cases you will be able to
identify the inconsistency manually by looking up the details of the ASN
(in a whois database or similar).
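
To make the manual check above a bit more concrete, here is a minimal
sketch in Python. It assumes each measurement carries a GeoIP country
code and an ASN (called probe_cc and probe_asn here), and that somebody
has built a small ASN-to-country table by hand from whois lookups. The
table, the sample records and the function name are all hypothetical,
and the ASNs come from the range reserved for documentation.

```python
# Hypothetical ASN -> registration-country table, filled in by hand from
# whois lookups. AS64496-AS64511 are reserved for documentation use.
ASN_COUNTRY = {
    "AS64500": "IR",
    "AS64501": "CN",
}


def find_geoip_mismatches(measurements, asn_country=ASN_COUNTRY):
    """Return measurements whose GeoIP country disagrees with the ASN's country."""
    mismatches = []
    for m in measurements:
        expected = asn_country.get(m["probe_asn"])
        # Unknown ASNs are skipped: there is nothing to compare against.
        if expected is not None and expected != m["probe_cc"]:
            mismatches.append(m)
    return mismatches


# Illustrative records only, not real OONI data.
measurements = [
    {"probe_cc": "IR", "probe_asn": "AS64500"},
    {"probe_cc": "US", "probe_asn": "AS64501"},  # GeoIP says US, whois says CN
    {"probe_cc": "RU", "probe_asn": "AS64510"},  # ASN not in the table
]

for m in find_geoip_mismatches(measurements):
    print(m["probe_asn"], "geolocated to", m["probe_cc"])
```

Keep in mind that the country an ASN is registered in is not always
where its network actually operates, so a mismatch is a hint to look
closer and not proof that the geolocation is wrong.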

Regarding reachability of Tor bridges, we have at this point about 3-4
months of data (daily measurements) for obfs2/obfs3/scramblesuit/fte
bridges in Iran, China, Russia and Ukraine.

The latest data is not yet public, because it needs to be scrubbed of
the metadata and I have not yet finished re-setting up the pipeline
(after our other machine ran out of disk space).

We also have a visualization for it that still needs to be completed;
if somebody is interested in hacking on this, I could give them access
to the data and code.

....

And now onto the results of the voting session:

Implement data analytics and visualization for OONI tests
4.75

Reach production quality ooni raspberry-pi (beagle-board) images
4.25

Develop OONI tests for censorship circumvention tools
4

Develop scheme for orchestrating ooni-probes
4

Promote and further develop OONI on mobile (Android, iOS)
4

Get daily OONI measurements from 50 countries
3.75

Publish monthly reports about the status of internet censorship in a country
3.75

Implement a GUI for ooniprobes
3.75

Do research based on OONI
3.67

Implement pub-sub system for ooni collectors
3.5

Integration with other censorship measurement projects
3.5

Reaching out to communities inside of censored regions
3.5

Run "adopt an ooni-probe" campaign
3.25

Redesign the website for ooni
2.75

Hold an international internet censorship conference
2.25

To me these results are not surprising at all; they reflect more or
less what have already been the main areas of work for the past couple
of months.

I think therefore we should continue in this direction and focus on
three main areas: "Implement data analytics and visualization for OONI
tests", "Reach production quality ooni raspberry-pi (beagle-board)
images", and "Informed consent research". We will also be working on
"Develop OONI tests for censorship circumvention tools" for a project in
a specific country.

Here is a list of the existing tickets in these areas and of some new
tickets that could be created.

# Implement data analytics and visualization for OONI tests

## Existing tickets

Add generation of reports index to the export task of the pipeline
https://trac.torproject.org/projects/tor/ticket/13842

Migrating OONI data-pipeline containers and server configuration on a
different server
https://trac.torproject.org/projects/tor/ticket/13825

Better and more efficient database schema
https://trac.torproject.org/projects/tor/ticket/13803

Mongodb queries for the nettest visualization
https://trac.torproject.org/projects/tor/ticket/13759

Brainstorm ideas for possible visualisations
https://trac.torproject.org/projects/tor/ticket/13731

Investigate possible performance improvements to the ooni-pipeline
https://trac.torproject.org/projects/tor/ticket/13720

Align the dates in the visual timeline
https://trac.torproject.org/projects/tor/ticket/13639

Better tokening in the output json data format for bridge reachability
visualisation
https://trac.torproject.org/projects/tor/ticket/13638

## New tickets

### Design and implement OONI reports explorer

This will allow users of OONI to explore the data that we have so far
collected, by filtering and searching it.
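
As a rough illustration of the kind of filtering meant here, a minimal
sketch in Python follows. It is not the design of the explorer: the
record fields (probe_cc, test_name, start_date), the sample reports and
the filter_reports function are assumptions made up for this example.

```python
from datetime import date


def filter_reports(reports, country=None, test_name=None, since=None, until=None):
    """Return the reports matching every criterion that was supplied."""
    matches = []
    for r in reports:
        if country is not None and r["probe_cc"] != country:
            continue
        if test_name is not None and r["test_name"] != test_name:
            continue
        if since is not None and r["start_date"] < since:
            continue
        if until is not None and r["start_date"] > until:
            continue
        matches.append(r)
    return matches


# Illustrative records only, not real OONI data.
reports = [
    {"probe_cc": "IR", "test_name": "bridge_reachability", "start_date": date(2015, 1, 10)},
    {"probe_cc": "CN", "test_name": "bridge_reachability", "start_date": date(2015, 2, 1)},
    {"probe_cc": "IR", "test_name": "http_requests", "start_date": date(2015, 2, 5)},
]

# Reports from Iran that started on or after 1 February 2015.
recent_iran = filter_reports(reports, country="IR", since=date(2015, 2, 1))
print(recent_iran)
```

In practice the same criteria would more likely be turned into database
queries than applied to an in-memory list; the sketch only shows which
filters a user might combine.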

# Reach production quality ooni raspberry-pi (beagle-board) images

## Existing tickets

OONI on Raspberry Pi
https://trac.torproject.org/projects/tor/ticket/13870

## New tickets

### Embedded device configuration wizard

Set up an OONI wifi network on the Raspberry Pi to configure the device
at first start. This will allow the user to configure how ooni-probe
should connect to the internet and what measurements should be run.

It would also be useful to provide an informed consent information page.

# Informed consent research

## Existing tickets

Write documentation of benefits for running ooniprobe
https://trac.torproject.org/projects/tor/ticket/14760

Brainstorm on possible ways of minimizing the risks involved with
running ooniprobe while keeping the benefits
https://trac.torproject.org/projects/tor/ticket/14761

Redesign how we inform the user of the risks of running ooniprobe and
get informed consent from them
https://trac.torproject.org/projects/tor/ticket/14762

## New tickets

Get legal feedback for the risks of running ooniprobe in a set of specific
countries

Thanks for taking the time to read this.

I will soon send out an email to schedule next week's IRC meeting, since
we have skipped it this week.

Have fun!

~ Arturo