Hi Andri,
Thanks for your interest in OONI and your kind words!
On 10/02/2017 09:10 +0000, Andri Effendi wrote:
> Why are some websites appearing in the results as being censored, yet
> when I try to go to the URL it works?
Yes, there are some false positives in the results. We do take some
measures to reduce them, but they still occur sometimes.
We reduce them further in the data processing pipeline.
False positives can occur because:
1) Sometimes when you do a DNS resolution for a domain (converting
a name like google.com into an IP address such as 8.8.8.8) from two
different locations you get back different IPs. We compensate for this
by also doing reverse lookups on the IPs and checking whether the
reverse names match, but sometimes this is not enough.
In the data pipeline we have more advanced heuristics to take this
into account.
2) Sometimes the content of a site changes very dramatically between
two different locations (for example, when you access a site from
Turkey the content is localized in Turkish). The state of the art in
this field is to use the body length (see:
https://www.cs.princeton.edu/~bj6/papers/imc2014-blockpage-detection.pdf).
We also use other signals, such as the HTTP headers and the HTML title
tag, but sometimes this is not enough.
3) Sometimes the site is not particularly reliable: it is down when
the probe connects to it, but up when our control connects to it.
We are working on addressing this issue by measuring the global
availability of all sites we test and using this as a factor to
reduce false positives in this area.
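To make the first two points a bit more concrete, here is a minimal
Python sketch of the kind of comparison involved. It is not the actual
ooniprobe or pipeline code: the function names, the dict layout, the 0.7
threshold and the final rule for combining the signals are all
assumptions made for illustration.

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def dns_consistent(probe_ips, control_ips, probe_ptrs=(), control_ptrs=()):
    """True if the IP sets overlap or, failing that, if the
    reverse-lookup names overlap (point 1)."""
    if set(probe_ips) & set(control_ips):
        return True
    return bool(set(probe_ptrs) & set(control_ptrs))


def body_length_ratio(probe_body, control_body):
    """Shorter body length divided by the longer one, in [0, 1] (point 2)."""
    a, b = len(probe_body), len(control_body)
    if a == 0 and b == 0:
        return 1.0
    return min(a, b) / max(a, b)


def html_title(body):
    """Text of the first <title> tag, or None if there is none."""
    match = TITLE_RE.search(body)
    return match.group(1).strip() if match else None


def compare(probe, control, min_ratio=0.7):
    """Compare a probe measurement with a control measurement.

    Both arguments are dicts with the (hypothetical) keys
    'ips', 'ptrs' and 'body'. min_ratio is an illustrative threshold.
    """
    probe_title = html_title(probe["body"])
    signals = {
        "dns": dns_consistent(probe["ips"], control["ips"],
                              probe["ptrs"], control["ptrs"]),
        "length": body_length_ratio(probe["body"], control["body"]) >= min_ratio,
        # Two pages without any title are not counted as a match.
        "title": probe_title is not None
                 and probe_title == html_title(control["body"]),
    }
    # Assumed rule: flag an anomaly if DNS disagrees, or if neither the
    # body length nor the title agrees.
    signals["anomaly"] = not signals["dns"] or not (
        signals["length"] or signals["title"])
    return signals


# Different IPs (for example a CDN), same reverse name, localized body.
probe = {"ips": ["1.1.1.1"], "ptrs": ["cdn.example.net"],
         "body": "<html><title>Example</title>" + "x" * 100}
control = {"ips": ["2.2.2.2"], "ptrs": ["cdn.example.net"],
           "body": "<html><title>Example</title>" + "y" * 400}
result = compare(probe, control)
```

In this example the IPs and the body lengths differ, but the reverse
names and the title agree, so the sketch does not flag an anomaly. With
only the body length as a signal it would have been a false positive.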
> I know there are risks of running OONI, but what level is the risk?
You can read more about the risks associated with running ooniprobe
on this page:
https://ooni.torproject.org/about/risks/
> Is it just going on sites like thepiratebay.org?
> Or OONI going to be probing (Internationally) illegal sites like Drugs
> and Abuse material going to be probed in tests as well?
The list of sites we use for testing is a project we work on
together with the CitizenLab. You can find the full list of sites we
test here:
https://github.com/citizenlab/test-lists/blob/master/lists/global.csv
Some of the rationale behind how they are chosen is described here:
https://github.com/citizenlab/test-lists#what-is-it
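If you want a quick overview of what kinds of sites are in the list, a
small Python sketch like the one below counts the entries per category.
It assumes the CSV has a header row with `url` and `category_code`
columns; the sample rows here are made up for illustration and are not
taken from the real list.

```python
import csv
import io
from collections import Counter


def count_by_category(csv_text):
    """Count test-list rows per category code.

    Assumes a header row containing a 'category_code' column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["category_code"] for row in reader)


# Made-up sample in the assumed format (not real list entries).
sample = """url,category_code
http://news.example.com/,NEWS
http://press.example.org/,NEWS
http://files.example.net/,FILE
"""

counts = count_by_category(sample)
for code, n in counts.most_common():
    print(code, n)
```

To run it on the real list, download global.csv and pass the contents
of the file to count_by_category() instead of the sample string.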
Let me know if you have any further questions,
~ Arturo