[ooni-dev] [tor-dev] OONI hackfest summary

David Stainton dstainton415 at gmail.com
Tue Nov 4 11:05:43 UTC 2014

Dear Arturo,

Thanks for being so organized with this whole project.
I will help out more soon... (I've been busy with some other projects lately)
Specifically, I see several trac tickets related to docker and ansible
that I'll be able to help with.



On Tue, Nov 4, 2014 at 10:26 AM, Arturo Filastò <art at torproject.org> wrote:
> From October 24th to 26th the OONI team gathered in Berlin for a
> hackfest. Around 20 people showed up and, although most of them
> were seasoned Oonitarians, some fairly new people joined us who I hope
> will become part of the growing OONI community.
> The hackfest focused on data analytics and visualization,
> with special attention to the Tor bridge reachability study we are
> currently running.
> # Bridge reachability study
> The goal of this study [1] is to answer some questions
> concerning the blocking of Tor bridges [2] and pluggable transport [3]
> enabled bridges in China, Iran, Russia and Ukraine
> (test vantage points).
> To establish a baseline and eliminate the cases in which a bridge is
> marked as blocked while it is in fact just offline, we also measure
> from a vantage point located in the Netherlands.
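
The baseline logic described above can be sketched roughly as follows. This is only an illustration of the idea, not code from ooni-probe or the pipeline; the function name `classify` and the three labels are assumptions made for the example.

```python
def classify(probe_ok, control_ok):
    """Label one bridge measurement, using the control vantage point as baseline.

    probe_ok:   True if the bridge was reachable from the test vantage point.
    control_ok: True if the bridge was reachable from the control (Netherlands).
    The labels returned here are hypothetical, chosen for illustration.
    """
    if probe_ok:
        # Reachable from the test vantage point: nothing to explain.
        return "reachable"
    if control_ok:
        # Fails from the probe but works from the control: possible blocking.
        return "blocked"
    # Fails from both places: the bridge is most likely just offline.
    return "offline"
```

For example, `classify(False, True)` gives "blocked", while `classify(False, False)` gives "offline".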
> For every test vantage point we perform two types of measurements:
> * A Bridge reachability measurement [4][5] that attempts to build a tor
> circuit using the bridge in question
> * A TCP connect measurement [6][7] that simply does a TCP connect to the
> bridge IP and port
> We run both measurements to help debug why the blocking is
> happening, whether it is due to a TCP RST, direct IP
> blocking, or a tor malfunction.
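
To make the TCP connect measurement concrete, here is a minimal sketch using only the Python standard library. It is not the ooni-probe test linked in [7]; the function name `tcp_connect`, the default timeout and the result strings are all assumptions made for this example.

```python
import socket


def tcp_connect(host, port, timeout=10.0):
    """Attempt one TCP connect to host:port and return a coarse result string."""
    try:
        # create_connection performs the TCP handshake; the socket is closed
        # as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return "success"
    except socket.timeout:
        # No answer at all within the timeout (packets may be dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The handshake was answered with a RST.
        return "refused"
    except ConnectionResetError:
        return "reset"
    except OSError as exc:
        # Anything else (no route to host, DNS failure, ...).
        return "error: %s" % exc
```

Telling "timeout" apart from "refused" is what helps separate silent dropping from an active reset, which is the kind of distinction the paragraph above is after.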
> So far this study has been running for a little less than 1 month.
> # OONI data pipeline
> In order to produce the aggregate data needed to build visualizations we
> have built a data pipeline [8].
> This consists of a series of operations that are done to the raw reports
> in order to strip out sensitive information and place the collected data
> into a database.
> The nice thing is that the data pipeline we have designed is not
> specific to this study: it can, and in the future will, be expanded to
> export the data needed to visualize the other types of measurements
> done by OONI as well.
> The data pipeline consists of 3 steps (or states, depending on how
> you want to look at it).
> When the data is submitted to an OONI collector it is synchronized with
> the aggregator.
> This is a central machine responsible for running all the data
> processing tasks, storing the collected data in a database and hosting a
> public interface to the sanitised reports. Since all the steps are
> independent of one another, they do not all need to run on the same
> machine; the setup may also be more distributed.
> Once the data is on the aggregator machine it is said to be in the RAW
> state. The sanitise task is then run on the RAW data to remove sensitive
> information and strip out some superfluous information. A RAW copy of
> every report is also stored in a private compressed archive for future
> reference.
> Once the data is sanitised it is said to be in the SANITISED state. At this
> point an import task is run on the data to place it inside a database.
> The SANITISED reports are then placed in a directory that is publicly
> exposed to the internet so that people can also download a copy of the
> YAML reports.
> At this point it is possible to run any export task that performs
> queries on the database and produces as output some documents to be used
> in the data visualizations (think JSON, CSV, etc.).
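
As a rough illustration of the RAW -> SANITISED -> database -> export flow described above, here is a small standard-library sketch. It is not the ooni-pipeline code from [8]: the field names (`probe_ip`, `bridge_address`, `country`, `transport`, `success`), the table layout and the use of SQLite are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical names for the fields we treat as sensitive in this sketch.
SENSITIVE_KEYS = {"probe_ip", "bridge_address"}


def sanitise(raw_report):
    """RAW -> SANITISED: return a copy of the report without sensitive fields."""
    return {k: v for k, v in raw_report.items() if k not in SENSITIVE_KEYS}


def import_report(db, report):
    """SANITISED -> database: store one row per sanitised report."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(country TEXT, transport TEXT, success INTEGER)"
    )
    db.execute(
        "INSERT INTO reports VALUES (?, ?, ?)",
        (report["country"], report["transport"], int(report["success"])),
    )


def export_summary(db):
    """database -> JSON: success counts per country and transport."""
    rows = db.execute(
        "SELECT country, transport, SUM(success), COUNT(*) FROM reports "
        "GROUP BY country, transport ORDER BY country, transport"
    ).fetchall()
    return json.dumps(
        [{"country": c, "transport": t, "ok": ok, "total": n}
         for c, t, ok, n in rows]
    )
```

Because each function only depends on the output of the previous one, the steps could run on different machines, which matches the point about the pipeline steps being independent.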
> # The OONI hackfest
> The first day of the hackfest was spent going over the scope of the
> project we would be working on in the following days, as well as working
> in groups that were interested in tackling the design of one aspect of
> the problem.
> Sticky notes were plentiful and helped us have a clear vision of what
> lay ahead of us.
> By the end of the first day we had a clear picture of the set of tasks
> needed to achieve our goals and of which teams would be responsible
> for doing what.
> The second day was almost entirely dedicated to hacking. Everybody
> had a task to complete, and every task was completed by the end of the
> day or sooner. Some people even finished their initially assigned task
> early and came back asking for more!
> By the end of the second day we had a real data set to hand over to the
> visualization team, to start producing some pretty graphs based on real
> data.
> We decided that the first visualization we wanted to do should be kept
> as simple as possible and be something that we could also use to debug
> the data we had collected. It should tell us which bridges were working
> when and it should present the information in a way that would highlight
> the country involved and the pluggable transport type.
> A prototype of it can be seen here:
> http://reports.ooni.nu/analytics/bridge_reachability/timeline/
> The code for this visualization can be found here:
> https://github.com/Shidash/OONI-Bridge-Reachability-Timeline
> # Next steps
> * Write scripts for generating the bridge_db.json document based on the
> data that is given to us from the bridge db team
> https://trac.torproject.org/projects/tor/ticket/13570
> * Align the dates in the visual timeline
> https://trac.torproject.org/projects/tor/ticket/13639
> * Better tokenising for bridges so that bridges that have the same
>   fingerprint, but different transport are grouped properly
> https://trac.torproject.org/projects/tor/ticket/13638
> * Finish setting up the docker containers for the steps of the data
>   pipeline
> https://trac.torproject.org/projects/tor/ticket/13568
> * Set up a disaster recovery procedure and backup:
> https://trac.torproject.org/projects/tor/ticket/13584
> * Set up monitoring of the probes.
> https://trac.torproject.org/projects/tor/ticket/12549
> * Add support for obfs4
> https://trac.torproject.org/projects/tor/ticket/13597
> * Set upper bound in comparison with the control in the bridge
> reachability timeline
> https://trac.torproject.org/projects/tor/ticket/13640
> * Make sure that the control measurement is for the specific bridge
> measurement
> https://trac.torproject.org/projects/tor/ticket/13655
> Questions and comments should be directed to the ooni-dev mailing list
> or to the #ooni channel on irc.oftc.net.
> Have fun!
> ~ Arturo
> [1] https://lists.torproject.org/pipermail/ooni-dev/2014-October/000184.html
> [2] https://www.torproject.org/docs/bridges
> [3] https://www.torproject.org/docs/pluggable-transports.html.en
> [4]
> https://gitweb.torproject.org/ooni/spec.git/blob/HEAD:/test-specs/ts-011-bridge-reachability.md
> [5]
> https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/nettests/blocking/bridge_reachability.py
> [6]
> https://gitweb.torproject.org/ooni/spec.git/blob/HEAD:/test-specs/ts-008-tcpconnect.md
> [7]
> https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/nettests/blocking/tcp_connect.py
> [8]
> https://github.com/TheTorProject/ooni-pipeline/blob/master/Readme.md#ooni-pipeline
> _______________________________________________
> tor-dev mailing list
> tor-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
