Hi Andri,

Thanks for your interest in OONI and your kind words!


On 10/02/2017 09:10 +0000, Andri Effendi wrote:
> Why are some websites appearing in the results as being censored, yet
> when I try to go to the URL it works?

Yes, the results contain some false positives. We take measures to reduce them, but they still occur sometimes.

We reduce them further in the data processing pipeline.

False positives can occur because:

1) Sometimes when you do a DNS resolution for a domain (converting a name like google.com into an IP address) from two different locations, you get back different IPs. We compensate for this by also doing reverse lookups on the IPs and checking whether the reverse lookups match, but sometimes this is not enough.
In the data pipeline we have more advanced heuristics to take this into account.

2) Sometimes the content of a site changes very dramatically between two different locations (for example, when you access a site from Turkey the content is localized in Turkish). The state of the art in this field is to use the body length (see: https://www.cs.princeton.edu/~bj6/papers/imc2014-blockpage-detection.pdf). We also use other signals, such as HTTP headers and the HTML title tag, but sometimes this is still not enough.

3) Sometimes the site is not particularly reliable: it is down when the probe connects to it, but up when our control connects to it. We are working on addressing this issue by measuring the global availability of all sites we test and using this as a factor to reduce false positives in this area.
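
To make the first two points more concrete, here is a rough sketch of how a probe measurement can be compared against a control measurement. This is not the actual ooniprobe code: the function names, the dictionary layout and the 0.7 body-length threshold are assumptions made up for this illustration, and the IPs come from documentation-only ranges.

```python
import re


def ip_sets_consistent(probe_ips, control_ips):
    """True if probe and control resolutions share at least one IP."""
    return bool(set(probe_ips) & set(control_ips))


def body_length_ratio(probe_body, control_body):
    """Length of the shorter body divided by the longer one, in [0, 1]."""
    shorter, longer = sorted((len(probe_body), len(control_body)))
    return 1.0 if longer == 0 else shorter / longer


def html_title(body):
    """Return the text of the first <title> tag, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", body,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def differing_signals(probe, control, min_body_ratio=0.7):
    """List the signals on which probe and control disagree.

    min_body_ratio is a hypothetical threshold chosen for this example.
    """
    signals = []
    if not ip_sets_consistent(probe["ips"], control["ips"]):
        signals.append("dns")
    if body_length_ratio(probe["body"], control["body"]) < min_body_ratio:
        signals.append("body_length")
    if html_title(probe["body"]) != html_title(control["body"]):
        signals.append("title")
    # Compare header names only, ignoring case.
    probe_headers = {name.lower() for name in probe["headers"]}
    control_headers = {name.lower() for name in control["headers"]}
    if probe_headers != control_headers:
        signals.append("headers")
    return signals


control = {
    "ips": ["192.0.2.1"],
    "headers": {"Content-Type": "text/html", "Server": "nginx"},
    "body": "<html><title>Example</title>" + "x" * 1000 + "</html>",
}
# Same site, but localized: only the title differs.
localized = {
    "ips": ["192.0.2.1"],
    "headers": {"content-type": "text/html", "server": "nginx"},
    "body": "<html><title>Ornek</title>" + "y" * 1000 + "</html>",
}
# A short block page served from a different IP.
blockpage = {
    "ips": ["198.51.100.7"],
    "headers": {"Content-Type": "text/html"},
    "body": "<html><title>Blocked</title></html>",
}

print(differing_signals(localized, control))  # ['title']
print(differing_signals(blockpage, control))
# ['dns', 'body_length', 'title', 'headers']
```

The localized page disagrees with the control on the title alone, which is the kind of case that can show up as a false positive if a single signal is trusted too much.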

> I know there are risks of running OONI, but what level is the risk?

You can read more about the risks associated with running ooniprobe on this page:
https://ooni.torproject.org/about/risks/

> Is it just going on sites like thepiratebay.org?
>
> Or is OONI also going to probe (internationally) illegal sites, such as
> drugs and abuse material, in tests as well?

The list of sites we use for testing is a project we work on together with the CitizenLab. You can find the full list of sites we test here:
https://github.com/citizenlab/test-lists/blob/master/lists/global.csv

Some of the rationale behind how they are chosen is described here:
https://github.com/citizenlab/test-lists#what-is-it
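
If you want a quick overview of what kinds of sites are in a list, a small script along these lines can count entries per category. It is only a sketch: it assumes the CSV has a header row with a `category_code` column, and the sample rows below are made up for illustration rather than taken from the real list.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the assumed column layout; not real list entries.
SAMPLE = """url,category_code,category_description,date_added,source,notes
https://example.org/,NEWS,News Media,2017-01-01,hypothetical,
https://example.com/,FILE,File-sharing,2017-01-01,hypothetical,
https://example.net/,NEWS,News Media,2017-01-01,hypothetical,
"""


def count_categories(csv_text):
    """Count how many URLs fall under each category code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["category_code"] for row in reader)


for code, count in count_categories(SAMPLE).most_common():
    print(code, count)
```

To run it on the real list, you would pass the contents of a downloaded copy of global.csv to count_categories() instead of SAMPLE.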

Let me know if you have any further questions,

~ Arturo