[ooni-dev] Feedback on OONI data collection, aggregation, and visualization

Mon Dec 15 14:03:07 UTC 2014

On 15/12/2014 14:29, Arturo Filastò wrote:
> On 12/9/14, 11:23 AM, Karsten Loesing wrote:
>>> I like the idea that interaction with bridgeDB is opaque to us. All
>>> we care is that they give us a JSON dictionary that has some keys
>>> we expect.
>> Oh, good, you're already talking to BridgeDB people about this.
>>
>> Note that I stopped collecting and sanitizing bridge pool assignments
>> in CollecTor yesterday.  There has been no discussion on the new
>> config line yet.
>>
> What could be the impact of this on our ability to produce the results
> we are currently producing? I am not very familiar with the BridgeDB
> component, so I am sort of expecting them to give us the data format
> described in the ticket #13570 and as long as that is the case we will
> not have worries.
>
> Will this make implementing #13570 more difficult for the BridgeDB team?
>
>> So, the redesign of Tor Metrics and its navigation is not done yet,
>> but it's at a point where we can add new visualizations on bridge
>> reachability quite easily.
>>
>> Just note that we should only add visualizations that are directly
>> related to the Tor network, which is probably only a subset of what
>> OONI produces.  That's why I mentioned bridge reachability as an example.
>>
>> Given your deadline, how about we start with one or more "Link" pages
>> like this one?
>>
>> https://metrics.torproject.org/oxford-anonymous-internet.html
>>
> I am not yet fully certain we will hit the expected deadline as I
> believe Choke Point Project is working on it, but have not spoken to
> them recently so I am not sure how far along they are.
>
> I am adding them to cc to learn whether it is likely that the changes I
> have requested will be implemented by the end of this year.
>
> However adding it to the link pages sounds like a good interim move once
> we are ready to go public with it.
We are still aiming to deliver the changes before the end of the year.
We're planning to throw time and attention at it starting the 19th. If
anything changes in that plan I will let you know asap.
>
>> For each of these pages, I need a title ("Tor users as percentage of
>> larger Internet population"), a permanent graph identifier
>> ("oxford-anonymous-internet"), a short description ("The Oxford
>> Internet Institute made..."), and the link
>> ("http://geography.oii.ox.ac.uk/?page=tor").
>>
> Title:
> Tor Bridge Reachability Timeline
>
> Description:
> OONI is conducting a study on the reachability of Tor bridges inside of
> countries that are known to block access to them. These visualizations
> show how many of the sampled bridges are working from the countries in
> question and which types of pluggable transports are more likely to work
> or not.
>
>> Or, if you have visualizations that don't require server-side code,
>> like d3.js, we can add that code directly to the website.  For example:
>>
>> https://metrics.torproject.org/bubbles.html
>>
> Our visualizations don't require any server-side code, but I would need
> to periodically update some static files on that server with data from
> the latest measurements. I believe I should be able to do that by running
> one of weasel's magic scripts.
>
>>> We do have in mind a multi host sync protocol that follows a
>>> pub-sub paradigm, but for the moment it's implemented using just
>>> simple rsync based polling.
>>>> - I could imagine extending CollecTor to also collect and archive
>>>> OONI reports, as a long-term thing.  Right now CollecTor does
>>>> that for Tor relay and bridge descriptors, TorDNSEL exit lists,
>>>> BridgeDB pool assignment files, and Torperf performance
>>>> measurement results.  But note that it's written in Java and that
>>>> I hardly have development time to keep it afloat; so somebody
>>>> else would have to extend it towards supporting OONI reports.
>>>> I'd be willing to review and merge things.  We should also keep
>>>> CollecTor pure Java, because I want to make it easier for others
>>>> to run their own mirror and help us make data more redundant. 
>>>> Anyway, I can also imagine keeping the OONI report collector
>>>> distinct from CollecTor and only exchanging design ideas and
>>>> experiences if that's easier.
>>> That would be awesome!
>>> Can you point me to relevant CollecTor code portions that would be 
>>> helpful to implement this?
>>> It would be great if you could perhaps write a ticket giving some
>>> pointers to whoever may be interested in implementing this under the
>>> OONI component of trac.
>> Or, before we talk about code, can you elaborate on the pub-sub
>> paradigm that you mention above?
>>
>> Maybe we can combine my efforts to make CollecTor more redundant with
>> your wish to do the same for OONI reports.  I could imagine running
>> two nodes that add Tor descriptors and mirror OONI reports, and you
>> run nodes that add OONI reports and mirror Tor descriptors.
>>
> I just noticed that I have not written down the ideas I had on this
> anywhere, but I did start writing (**very** little) code to implement it.
>
> I propose we start by writing a specification for the protocol and then
> discuss how we can implement it:
> https://trac.torproject.org/projects/tor/ticket/13964
>
>
>> And Java is not an issue for you? :)
> I would prefer not to have to deal with that, but if that means running
> the software on shared infrastructure I would be down for it.
>
> I have written a bit of Java and even have a university exam to prove it!
>
>
> ~ Arturo