[ooni-dev] Feedback on OONI data collection, aggregation, and visualization

Arturo Filastò art at torproject.org
Mon Dec 15 13:29:01 UTC 2014

On 12/9/14, 11:23 AM, Karsten Loesing wrote:
>> I like the idea that interaction with BridgeDB is opaque to us. All
>> we care about is that they give us a JSON dictionary that has some keys
>> we expect.
> Oh, good, you're already talking to BridgeDB people about this.
> Note that I stopped collecting and sanitizing bridge pool assignments
> in CollecTor yesterday.  There has been no discussion on the new
> config line yet.

How could this affect our ability to produce the results we are
currently producing? I am not very familiar with the BridgeDB
component, so I am assuming they will give us data in the format
described in ticket #13570. As long as that is the case, we should
have nothing to worry about.
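To make the "opaque BridgeDB" idea concrete, here is a minimal sketch of the only check our side would need: parse the JSON dictionary and confirm that the keys we rely on are present, ignoring anything else. The key names below are hypothetical placeholders; the real ones are whatever ticket #13570 specifies.

```python
import json

# Hypothetical placeholder keys: the real set is whatever ticket
# #13570 ends up specifying.
EXPECTED_KEYS = frozenset({"fingerprint", "distribution_method", "transports"})


def load_bridgedb_record(raw: str) -> dict:
    """Parse one JSON dictionary from BridgeDB and check its keys.

    BridgeDB stays opaque to us: extra keys are ignored, and only the
    presence of the keys we depend on is enforced.
    """
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object (dictionary)")
    # frozenset.difference accepts any iterable; iterating a dict yields its keys.
    missing = EXPECTED_KEYS.difference(record)
    if missing:
        raise ValueError(f"missing expected keys: {sorted(missing)}")
    return record
```

If BridgeDB later adds fields, nothing on our side breaks; only a change to the agreed keys would.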

Will this make implementing #13570 more difficult for the BridgeDB team?

> So, the redesign of Tor Metrics and its navigation is not done yet,
> but it's at a point where we can add new visualizations on bridge
> reachability quite easily.
> Just note that we should only add visualizations that are directly
> related to the Tor network, which is probably only a subset of what
> OONI produces.  That's why I mentioned bridge reachability as an example.
> Given your deadline, how about we start with one or more "Link" pages
> like this one?
> https://metrics.torproject.org/oxford-anonymous-internet.html

I am not yet certain we will hit the expected deadline. I believe the
Choke Point Project is working on it, but I have not spoken to them
recently, so I am not sure how far along they are.

I am adding them to Cc to find out whether the changes I have requested
are likely to be implemented by the end of this year.

However, adding it to the "Link" pages sounds like a good interim step
once we are ready to go public with it.

> For each of these pages, I need a title ("Tor users as percentage of
> larger Internet population"), a permanent graph identifier
> ("oxford-anonymous-internet"), a short description ("The Oxford
> Internet Institute made..."), and the link
> ("http://geography.oii.ox.ac.uk/?page=tor").

Tor Bridge Reachability Timeline

OONI is conducting a study on the reachability of Tor bridges inside
countries that are known to block access to them. These visualizations
show how many of the sampled bridges work from the countries in
question and which types of pluggable transports are more or less
likely to work.

> Or, if you have visualizations that don't require server-side code,
> like d3.js, we can add that code directly to the website.  For example:
> https://metrics.torproject.org/bubbles.html

Our visualizations don't require any server-side code, but I would need
to periodically update some static files on that server with data from
the latest measurements. I believe I should be able to do that by
running one of weasel's magic scripts.
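As an illustration of what "periodically update some static files" could involve, here is a minimal sketch of the aggregation-and-publish step: it counts working versus total sampled bridges per country and transport, and writes the result as a JSON file a d3.js page could load. The measurement fields (`country`, `transport`, `reachable`) and the output path are hypothetical; real OONI reports are structured differently, and this is not one of weasel's scripts.

```python
import json
import os
import tempfile
from collections import defaultdict


def summarize(measurements):
    """Aggregate per-(country, transport) reachability counts.

    `measurements` is an iterable of dicts with the hypothetical keys
    "country", "transport" and "reachable" (bool).
    """
    counts = defaultdict(lambda: {"working": 0, "total": 0})
    for m in measurements:
        bucket = counts[(m["country"], m["transport"])]
        bucket["total"] += 1
        if m["reachable"]:
            bucket["working"] += 1
    # Flatten into a sorted list of rows, a convenient shape for d3.js.
    return [
        {"country": c, "transport": t, **v}
        for (c, t), v in sorted(counts.items())
    ]


def write_static_json(rows, path):
    """Write rows to `path` atomically, so the web server never serves
    a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(rows, f)
        # mkstemp creates the file as 0600; make it world-readable
        # so the web server can serve it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A cron job could then call `write_static_json(summarize(latest), "bridge-reachability.json")` (hypothetical file name) after each sync. Writing to a temporary file and renaming it means a visitor loading the page mid-update sees either the old data or the new data, never a truncated file.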

>> We do have in mind a multi host sync protocol that follows a
>> pub-sub paradigm, but for the moment it's implemented using just
>> simple rsync based polling.
>>> - I could imagine extending CollecTor to also collect and archive
>>> OONI reports, as a long-term thing.  Right now CollecTor does
>>> that for Tor relay and bridge descriptors, TORDNSEL exit lists,
>>> BridgeDB pool assignment files, and Torperf performance
>>> measurement results.  But note that it's written in Java and that
>>> I hardly have development time to keep it afloat; so somebody
>>> else would have to extend it towards supporting OONI reports.
>>> I'd be willing to review and merge things.  We should also keep
>>> CollecTor pure Java, because I want to make it easier for others
>>> to run their own mirror and help us make data more redundant. 
>>> Anyway, I can also imagine keeping the OONI report collector
>>> distinct from CollecTor and only exchanging design ideas and
>>> experiences if that's easier.
>> That would be awesome!
>> Can you point me to relevant CollecTor code portions that would be 
>> helpful to implement this?
>> It would be great if you could perhaps write a ticket, under the
>> OONI component of Trac, giving some pointers for whoever may be
>> interested in implementing this.
> Or, before we talk about code, can you elaborate on the pub-sub
> paradigm that you mention above?
> Maybe we can combine my efforts to make CollecTor more redundant with
> your wish to do the same for OONI reports.  I could imagine running
> two nodes that add Tor descriptors and mirror OONI reports, and you
> run nodes that add OONI reports and mirror Tor descriptors.

I just realized that I never wrote down the ideas I had on this
anywhere, although I did start writing (**very** little) code to
implement them.

I propose we start by writing a specification for the protocol and then
discuss how we can implement it:

> And Java is not an issue for you? :)

I would prefer not to have to deal with it, but if it means running
the software on shared infrastructure, I would be down for it.

I have written a bit of Java and even have a university exam to prove it!

~ Arturo
