Dear all,
This is a reminder that today there will be the weekly OONI meeting.
It will happen as usual on the #ooni channel on irc.oftc.net at 17:00
UTC (19:00 CEST, 13:00 EST, 10:00 PST).
Everybody is welcome to join us and bring their questions and feedback.
See you later,
-Simone
Hello Oonitarians,
This is a reminder that today there will be the weekly OONI gathering.
It will happen as usual on the #ooni channel on irc.oftc.net at 17:00
UTC (19:00 CEST, 13:00 EST, 10:00 PST).
Everybody is welcome to join us and bring their questions and feedback.
See you later,
~ Arturo
Greetings!
We are very pleased to announce that registration for the OONI Open Data hackathon “A Dive Into Network Anomalies” is now open!
If you would like to come to Rome, Italy on October 1st-2nd 2015 and hack on internet censorship data in the Italian Parliament, NOW is the moment to register.
At the end of the page at https://ooni.torproject.org/event/adina15/ you will find a list of projects that we have proposed ourselves. We strongly encourage people to submit their own ideas and don’t want to put any limits on what you can do.
We are also very happy to announce that AirVPN has offered to sponsor the prizes, so the winning teams will have something interesting to look forward to.
Unfortunately, due to particular house rules, people are required to submit their full legal name, and we need to send these names to the Italian Parliament before the event. Moreover, the style police will be making sure that people are wearing jackets at all times while inside the Italian Parliament (again, sorry, this is not our rule, but be creative: what is the REAL definition of a jacket? ;)).
If you need travel and accommodation support, do not hesitate to contact me: we do have some (small) budget for that and would be very happy if it were used by somebody who would not otherwise be able to attend.
Nothing is left but to invite you to register. See you soon in Italy!
More information at:
https://ooni.torproject.org/event/adina15
#ADINA15
https://twitter.com/OpenObservatory/status/634416448820092929
Have fun!
The OONI team & Arturo
Hi there,
I had an email exchange with Arturo and he recommended I post this on the list.
> Arturo, have you looked at https://zeppelin.incubator.apache.org/ ? Also, remember I mentioned SlamData? It claims to do very fast MongoDB analytics (http://slamdata.com/use-cases/#operational-analytics) - and Michele, they are functional heads (using PureScript for the front-end https://github.com/slamdata/slamdata and talking about relational algebra http://slamdata.com/murray/). Also, I think you mentioned that you tried to put your terabyte of data in Mongo but that there were size leaks? Can you talk more about that? Also, have you looked at http://gopulsar.io/ ? What is the best place to ask questions about 1/ the data processing pipeline systems and 2/ the type of analytics that could be done on the data?
These are all great suggestions, and I think the only one I had briefly glanced at is Zeppelin. Since it is still an incubator project it may not be something good to rely on, but it is perhaps something interesting to play around with (for similar tasks we currently import a partition of the data into Elasticsearch and then use Kibana to explore it).
Makes sense. I haven’t tried any of them myself; I just stumbled upon SlamData because of my interest in functional front-end approaches. They seem to be doing interesting stuff with Mongo, and a kind of Python-notebook approach to creating analysis documents, which I thought was cool.
It would be great if this information were posted to the ooni-dev mailing list (https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev) as perhaps some people there with more knowledge than me can comment on them too.
Done!
Regarding the MongoDB size leaks, I didn’t do much investigation into it, though I still have a copy of the original database if you would like to check it out. Basically the problem was that we had something like ~100GB of raw data (~10GB compressed) that, when placed inside the database, would grow to ~500GB. I didn’t dig too much into the issue, as that iteration of the pipeline was also presenting other scaling issues (the processing of the batch jobs was not distributed, but done on a single machine), so we ditched the MongoDB approach and went for Elasticsearch, which a community member had recommended.
From what I hear, Elasticsearch is generally doing the job. I’d be interested in understanding the current and planned split between what is done by Elasticsearch (I guess simple full-text or maybe also faceted queries?), what is done by the Hadoop cluster, and what is done by the Storm cluster.
One of the main questions I have with regard to the overall analysis capability of the data pipeline is: is there an approach that would allow analysts (rather than developers) to rapidly deploy complex queries into the data pipeline? In other words, can we shorten the feedback loop between thinking “hey, that would be a cool analysis to try” and “look, here’s the result!”? It seems that this would help us iterate and be more creative with analysing the data.
Maybe it’s something like a staging/analysis cluster which is a replica of the production cluster but with additional ‘analysis’ processes running and write access for analysts to deploy their experiments? I guess something where you can deploy “queries” as map-reduce jobs, to both a batch processing cluster and a streaming processing cluster? A toy sketch of what I mean is below.
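To make that concrete, here is the rough shape of “query” I have in mind: a plain map/reduce pair that an analyst could write without touching the pipeline internals. This is only a toy sketch; the field names (probe_cc, anomaly) are placeholders I made up, not the actual report schema.

from collections import defaultdict

def map_measurement(measurement):
    # Emit (country, 1) for every measurement flagged as anomalous.
    if measurement.get("anomaly"):
        yield measurement.get("probe_cc", "ZZ"), 1

def reduce_counts(pairs):
    # Sum the per-country counts emitted by the map step.
    totals = defaultdict(int)
    for country, count in pairs:
        totals[country] += count
    return dict(totals)

# Run locally over an in-memory batch; the point is that the same two
# functions could, in principle, be handed to a batch or a streaming runner.
batch = [
    {"probe_cc": "IT", "anomaly": True},
    {"probe_cc": "IT", "anomaly": False},
    {"probe_cc": "KR", "anomaly": True},
]
pairs = [kv for m in batch for kv in map_measurement(m)]
print(reduce_counts(pairs))  # {'IT': 1, 'KR': 1}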
Anyhow. This is surely not a priority but maybe it’ll be interesting to some folks here.
Cheers,
Jun
--
Jun Matsushita
Founder, CEO - information innovation lab - iilab.org - @iilab
+49 157 530 86174 - jun(a)iilab.org - skype: junjulien
I'm working on a project where we're using OONI http_requests reports to
find servers that block Tor users. I'd like you to check my
understanding of where the URL lists for testing come from. I wrote:
A single OONI report typically tests many URLs. For the most
part, reports use the Citizen Lab URL testing lists
(https://github.com/citizenlab/test-lists) which contain about
1,200 "global" URLs, plus up to about about 900 additional
country-specific URLs that are tested depending on the country
in which ooniprobe is run. Users may run ooniprobe with their
own custom URL lists; the results are uploaded to the global
report pool. (There are several one-off reports in the data that
test a single URL.)
Is this true? Has it always been like this? How do ooniprobe users get
updates when the list changes?
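For concreteness, this is roughly how I picture the layout of the lists, as a sketch against a local checkout of the citizenlab/test-lists repository. The lists/global.csv and lists/<cc>.csv file names and the "url" CSV column are my reading of the repo, not anything ooniprobe-internal; please correct me if the probe consumes the lists differently.

import csv
import os

def load_urls(repo_path, country_code=None):
    # Read the global list plus, optionally, one country-specific list.
    files = ["global.csv"]
    if country_code:
        files.append(country_code.lower() + ".csv")
    urls = []
    for name in files:
        path = os.path.join(repo_path, "lists", name)
        if not os.path.exists(path):
            continue
        with open(path) as f:
            for row in csv.DictReader(f):
                urls.append(row["url"])
    return urls

# e.g. everything a probe in Italy would test: the global URLs plus the
# Italy-specific additions.
print(len(load_urls("test-lists", "it")))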
Hello Oonitarians,
This is a reminder that tomorrow there will be the weekly OONI gathering.
It will happen as usual on the #ooni channel on irc.oftc.net at 17:00
UTC (19:00 CEST, 13:00 EST, 10:00 PST).
Everybody is welcome to join us and bring their questions and feedback.
See you later,
~ Arturo
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
I got a server from ehostidc.cn.
They provide servers UNDER the Korean restrictions: hundreds of thousands of
destinations (including porn and gambling) are redirected here:
http://warning.or.kr/
This happens even if I address an IP directly (it's not DNS-based censorship).
Now I need to give up that server, and I cannot install an OONI probe on it.
I am writing here because if anyone wants to investigate South Korea,
this info can be a useful resource.
Clodo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAEBCAAGBQJVsNVPAAoJEC/ixHrG0m4LriAH/RSY+1+mh5ncirS6UiF1k7EH
LtXkJyF3OMl29NCAFNjbMu13EmgJQ22Fp45raX4Xh5Cjkez8CzR70b/nR/VA6qVt
DgF2De26vSQjvvt8y74+ANdZqeWtLtjOJNB4kplbem2ifjlN+Q4GFf3TFyl8tUuq
5ck5jZDhDAq6O6XHsUO7TSUO/VPkW6Y0k/zhT17ShnQb/sLgfgfx8fxYX22z7rs8
0Ue4GGSTuJR6xshWDSa2liTsTQE0xiEcj8/OeKPUKio538uwQXBQ1zt9LvkMAR4I
Ri7rC64ctlQ7uMffomdGFUsct8wD4v4xzKgVWDKfb5cPp9UM2krrM4BxhqAlxMc=
=Uvn0
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hi, I recently did some maintenance on a website called ipleak.net.
I added a JSON/API feature that I think can be useful in an OONI probe
to detect DNS spoofing/injection.
For example, fetch this (changing the third-level domain to a new random hash
for every request):
https://a_long_random_hash_for_every_request.ipleak.net?mode=json
The domain is resolved by the ISP: their DNS resolver asks our authoritative
server to resolve the random domain, and our server records the IP address of
the last ISP DNS resolver that requested the domain and reports it in the
HTTP response.
Note that if the ISP has multiple DNS servers (load-balancing), doing
multiple requests (each with a new hash) can return several DNS IPs.
For example, here in Italy it doesn't matter if I try to use Google DNS
8.8.8.8: my ISP (Vodafone) always does 'transparent DNS', capturing any
request over port 53 and redirecting it to their own DNS servers.
If a country does the same thing for censorship reasons, you can detect
it with this technique.
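A minimal client-side sketch of the idea in Python, just to illustrate how a
probe could use it; I am not listing the exact fields of the JSON response
here, so the sketch simply keeps the raw payload from each request.

import json
import uuid
import urllib.request

def query_once():
    # A fresh random label forces the query all the way to our authoritative
    # server, so caching resolvers cannot hide the real resolver's address.
    label = uuid.uuid4().hex
    url = "https://%s.ipleak.net/?mode=json" % label
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Repeat a few times: a load-balanced ISP may expose several resolver IPs.
seen = set()
for _ in range(5):
    seen.add(json.dumps(query_once(), sort_keys=True))
for entry in sorted(seen):
    print(entry)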
If this feature is interesting for OONI, feel free to use it on
ipleak.net through our API.
Otherwise, if you prefer to implement it yourself, I'm here for free
support.
You need a domain with an NS record that points to a server you control
(the authoritative DNS), a bind9 instance, a named pipe between bind9 and a
script, and a wildcard SSL certificate if you want everything under SSL.
Ciao!
Clodo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAEBCAAGBQJVpRXGAAoJEC/ixHrG0m4LVQoH/RAJ4dPCTcCsV0tWpgjOprGk
CQJ2BLuqqc+1tr9sUansLYhPjTM3MVmsBwqrE18k4jLx/wb9CxJicSWR3BUFCtUt
eZANQPWjo+wr9xrH2G448pHosjsUKUJH+45f05M/RL2ucCwHw39JSh8vrz3SBxiv
QOOrbC5SXFY20kOVtX4uDPjsSyf5e1cpwDmawNUE/anaM7TtOWYMtSQADYozl7/1
6cl1sCfeH0uR4mIshgnIevDf4BYOwUrzVtsxuNh3Z4FCwxX0qcVCgVVv6Iur6O3H
BuMUPAiy5GruD+T95AlHYu5mkh2/0tkCQHu3+xgq4hp0s9IyzL5Bv5sfy9hRVzU=
=20ib
-----END PGP SIGNATURE-----
Hello Oonitarians,
This is a reminder that today there will be the weekly OONI gathering.
It will happen as usual on the #ooni channel on irc.oftc.net at 17:00
UTC (19:00 CEST, 13:00 EST, 10:00 PST).
Everybody is welcome to join us and bring their questions and feedback.
See you later,
~ Arturo