[ooni-dev] Feedback on OONI data collection, aggregation, and visualization

Karsten Loesing karsten at torproject.org
Mon Dec 15 13:52:32 UTC 2014

On 15/12/14 14:29, Arturo Filastò wrote:
> On 12/9/14, 11:23 AM, Karsten Loesing wrote:
>>> I like the idea that interaction with bridgeDB is opaque to us.
>>> All we care is that they give us a JSON dictionary that has
>>> some keys we expect.
>> Oh, good, you're already talking to BridgeDB people about this.
>> Note that I stopped collecting and sanitizing bridge pool
>> assignments in CollecTor yesterday.  There has been no discussion
>> on the new config line yet.
> What could be the impact of this on our ability to produce the
> results we are currently producing? I am not very familiar with the
> BridgeDB component, so I am sort of expecting them to give us the
> data format described in the ticket #13570 and as long as that is
> the case we will not have worries.
> Will this make implementing #13570 more difficult for the BridgeDB
> team?

No, it shouldn't.  This would only affect you if you were downloading
data from CollecTor or Onionoo.  But if you receive data from BridgeDB
in a new format, that is unrelated.
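
To illustrate the "JSON dictionary that has some keys we expect" idea
quoted above, here is a minimal sketch of such a check. The key names
are purely hypothetical placeholders; the actual format would be
whatever #13570 ends up specifying, and the function name is made up
for this example.

```python
import json

# Hypothetical key names for illustration only; the real ones would be
# defined by the format discussed in ticket #13570.
EXPECTED_KEYS = {"fingerprint", "transport", "address"}


def missing_keys(raw_json, expected=EXPECTED_KEYS):
    """Return the sorted list of expected keys absent from a JSON dictionary."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON dictionary at the top level")
    # Extra keys are ignored on purpose: the consumer only cares that the
    # keys it relies on are present.
    return sorted(set(expected) - set(data))
```

A consumer could call missing_keys() on each entry it receives and skip
or log any entry for which the result is non-empty, without needing to
know how BridgeDB produced the data.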

>> So, the redesign of Tor Metrics and its navigation is not done
>> yet, but it's at a point where we can add new visualizations on
>> bridge reachability quite easily.
>> Just note that we should only add visualizations that are
>> directly related to the Tor network, which is probably only a
>> subset of what OONI produces.  That's why I mentioned bridge
>> reachability as an example.
>> Given your deadline, how about we start with one or more "Link"
>> pages like this one?
>> https://metrics.torproject.org/oxford-anonymous-internet.html
> I am not yet fully certain we will hit the expected deadline as I 
> believe Choke Point Project is working on it, but have not spoken
> to them recently so I am not sure how far along they are.
> I am adding them to Cc to learn whether the changes I have
> requested are likely to be implemented by the end of this year.
> However adding it to the link pages sounds like a good interim move
> once we are ready to go public with it.

Sounds good.

>> For each of these pages, I need a title ("Tor users as percentage
>> of larger Internet population"), a permanent graph identifier 
>> ("oxford-anonymous-internet"), a short description ("The Oxford 
>> Internet Institute made..."), and the link 
>> ("http://geography.oii.ox.ac.uk/?page=tor").
> Title: Tor Bridge Reachability Timeline
> Description: OONI is conducting a study on the reachability of Tor
> bridges inside countries that are known to block access to them.
> These visualizations show how many of the sampled bridges are
> working from the countries in question and which types of pluggable
> transports are more likely to work or not.

I need such a description for each graph you want to see on Tor
Metrics.  I can help write that as soon as I have the graphs.

>> Or, if you have visualizations that don't require server-side
>> code, like d3.js, we can add that code directly to the website.
>> For example:
>> https://metrics.torproject.org/bubbles.html
> Our visualizations don't require any server-side code, but I would
> need to periodically update some static files on that server with
> data from the latest measurements. I believe I should be able to do
> that by running one of weasel's magic scripts.

You mean building graphs on a host and distributing them to Tor's
various web servers?  Sure, we just need a URL we can put on Tor Metrics.

>>> We do have in mind a multi host sync protocol that follows a 
>>> pub-sub paradigm, but for the moment it's implemented using
>>> just simple rsync based polling.
>>>> - I could imagine extending CollecTor to also collect and
>>>> archive OONI reports, as a long-term thing.  Right now
>>>> CollecTor does that for Tor relay and bridge descriptors,
>>>> TorDNSEL exit lists, BridgeDB pool assignment files, and
>>>> Torperf performance measurement results.  But note that it's
>>>> written in Java and that I hardly have development time to
>>>> keep it afloat; so somebody else would have to extend it
>>>> towards supporting OONI reports. I'd be willing to review and
>>>> merge things.  We should also keep CollecTor pure Java,
>>>> because I want to make it easier for others to run their own
>>>> mirror and help us make data more redundant. Anyway, I can
>>>> also imagine keeping the OONI report collector distinct from
>>>> CollecTor and only exchange design ideas and experiences if
>>>> that's easier.
>>> That would be awesome!
>>> Can you point me to relevant CollecTor code portions that would
>>> be helpful to implement this?
>>> It would be great if you could perhaps write a ticket giving
>>> some pointers to who may be interested in implementing this
>>> under the OONI component of trac.
>> Or, before we talk about code, can you elaborate on the pub-sub 
>> paradigm that you mention above?
>> Maybe we can combine my efforts to make CollecTor more redundant
>> with your wish to do the same for OONI reports.  I could imagine
>> running two nodes that add Tor descriptors and mirror OONI
>> reports, and you run nodes that add OONI reports and mirror Tor
>> descriptors.
> I just noticed that I have not written down my ideas on this
> anywhere, but I did start writing (**very** little) code to
> implement this.
> I propose we start by writing a specification for the protocol and
> then discuss how we can implement it: 
> https://trac.torproject.org/projects/tor/ticket/13964

Great, thanks!  Will comment on the ticket once I have good ideas.
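
As a point of comparison for that discussion, the polling approach
mentioned earlier in the thread can be sketched roughly as follows.
This is not OONI's or CollecTor's actual setup: it swaps rsync for a
plain Python copy loop so the example is self-contained, and the
function name and directory layout are assumptions made for
illustration.

```python
import shutil
from pathlib import Path


def poll_once(source, mirror):
    """Copy files that are missing from, or newer than, the mirror copy.

    Returns the sorted relative paths that were copied in this pass.
    This only approximates what a single rsync run would do; it never
    deletes anything from the mirror.
    """
    source, mirror = Path(source), Path(mirror)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = mirror / rel
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied.append(str(rel))
    return copied
```

A mirror would call poll_once() on a timer. A pub-sub design would
instead have the source notify subscribers when a new report arrives,
which avoids rescanning the whole tree on every pass.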

>> And Java is not an issue for you? :)
> I would prefer not to have to deal with that, but if that means
> running the software on shared infrastructure I would be down for
> it.
> I have written a bit of Java and even have a university exam to
> prove it!

Okay. :)

All the best,
