This is an email I sent to someone at the Internet Archive who wanted to
know about blocking of archive.org. The URLs "http://archive.org" and
"https://archive.org/web/" are in test-lists, so they are being tested
by OONI. See the README for notes on how I do analysis using ooni-sync,
jq, and R.
https://people.torproject.org/~dcf/graphs/archive.org-anomalies-20170709/RE…
https://people.torproject.org/~dcf/graphs/archive.org-anomalies-20170709/bl…
https://people.torproject.org/~dcf/graphs/archive.org-anomalies-20170709.zip
Here is a description of some basic analysis using OONI to check for
blocking of archive.org. It's based on 2,080 reports covering 59
countries, dated between 2017-07-01 and 2017-07-06. I'm attaching the
source code and a graph that it produces. There are anomalous
measurements found in China, Russia, Venezuela, Mexico, Brazil, and
France. Of these, the ones in China and Russia are clearly the result of
censorship, while the others are ambiguous, and might be random
measurement error or very localized blocking. For a clearer view, you
would want to use reports from a longer time period.
Here is a summary of the countries with anomalous measurements, showing
how many anomalous measurements there were out of how many total.
   country anomalous total percent_anomalous
1:      CN         1     1            100.0%
2:      RU        19    54             35.2%
3:      VE         1     4             25.0%
4:      MX         1    10             10.0%
5:      BR         3    42              7.1%
6:      FR         1   100              1.0%
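As an illustration of how a summary like the one above can be computed, here is a minimal Python sketch. It is not the attached script (which uses jq and R); the function name summarize and the (country, is_anomalous) input shape are made up for the example. The sample data reuses the RU and FR counts from the table.

```python
from collections import Counter

def summarize(measurements):
    """Tally anomalous/total per country.

    `measurements` is an iterable of (country_code, is_anomalous) pairs;
    this input shape is an assumption made for the sketch.
    """
    total = Counter()
    anomalous = Counter()
    for cc, is_anomalous in measurements:
        total[cc] += 1
        if is_anomalous:
            anomalous[cc] += 1
    rows = []
    for cc in total:
        if anomalous[cc] > 0:  # only list countries with at least one anomaly
            pct = 100.0 * anomalous[cc] / total[cc]
            rows.append((cc, anomalous[cc], total[cc], pct))
    # Highest anomaly rate first, as in the table.
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows

# Sample data rebuilt from two rows of the table: RU 19/54 and FR 1/100.
sample = ([("RU", True)] * 19 + [("RU", False)] * 35
          + [("FR", True)] * 1 + [("FR", False)] * 99)

for cc, a, t, pct in summarize(sample):
    print(f"{cc} {a} {t} {pct:.1f}%")
```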
The process of making the graph is basically (1) download OONI reports,
(2) filter them for archive.org measurements, and (3) process the data
using another script. The longest part of the process is downloading the
report files, because they include tests of many domains other than
archive.org (typically about a thousand). Currently it's necessary to
download the full report files and filter them locally. However, OONI
plans to soon deploy a system that will make it possible to download
measurements for just one domain at a time.
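Step (2), the local filtering, can be sketched roughly as follows. This is a hedged Python illustration rather than the jq filter from the attached source code. It assumes each report file holds one JSON measurement per line and that the tested URL is stored in a field named "input"; the "probe_cc" field in the sample lines is likewise an assumption used only for the demonstration.

```python
import json

# The two URLs from test-lists mentioned at the top of this message.
TARGET_INPUTS = {"http://archive.org", "https://archive.org/web/"}

def filter_measurements(lines, targets=TARGET_INPUTS):
    """Yield parsed measurements whose "input" is one of the target URLs.

    Assumes one JSON object per line and an "input" field holding the URL.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        measurement = json.loads(line)
        if measurement.get("input") in targets:
            yield measurement

# In-memory stand-in for the lines of a downloaded report file.
example_lines = [
    '{"input": "http://archive.org", "probe_cc": "RU"}',
    '{"input": "http://example.com", "probe_cc": "RU"}',
    '{"input": "https://archive.org/web/", "probe_cc": "CN"}',
]

for m in filter_measurements(example_lines):
    print(m["probe_cc"], m["input"])
```

Since a report typically covers about a thousand domains, almost every line is discarded, which is why downloading dominates the running time.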
== China ==
The one test from China shows blocking by DNS injection (this type of
blocking is characteristic and well documented for the Great Firewall).
In this case, the false DNS response for archive.org that they injected
was the IP address 31.13.69.228, which actually belongs to Facebook.
https://explorer.ooni.torproject.org/measurement/20170701T065636Z_AS4808_oh…
== Russia ==
About 35% of tests in Russia were blocked, which is not surprising given
that a block of archive.org was ordered in 2015.
https://arstechnica.com/tech-policy/2015/06/wayback-machines-485-billion-we…
It's not unusual for a site to be available in some places, even when
ordered blocked, when enforcement of the block is left to individual
ISPs, as seems to be the case here.
The blocked tests came from AS41661 and AS21378. The unblocked tests
came from AS3239, AS8369, AS8427, AS12389, AS16345, AS21127, AS41661,
and AS42668.
The blocks from AS41661 were by DNS injection, affecting both HTTP and
HTTPS. The false IP address returned was 92.255.241.100, whose reverse
DNS is law.filter.ertelecom.ru. The web server at
http://law.filter.ertelecom.ru/ serves a block page in Russian.
https://explorer.ooni.torproject.org/measurement/20170701T190029Z_AS41661_E…
The block from AS21378 was by TCP blocking: the DNS request gave the
correct response 207.241.224.2 and the client was able to establish a
TCP connection to the server, but the firewall did not permit the HTTP
response to arrive.
https://explorer.ooni.torproject.org/measurement/20170701T135420Z_AS21378_c…
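The distinction drawn in this message between DNS-level and TCP-level interference can be written down as a rough decision rule. The Python sketch below is only an illustration: the function classify, its parameters, and its labels are hypothetical and are not part of OONI or of the attached scripts. Comparing resolved addresses against a fixed expected set is also a simplification that would misfire for sites served from many addresses.

```python
def classify(resolved, expected, tcp_connected, http_response_received):
    """Roughly classify one measurement (hypothetical helper and labels).

    resolved: list of IP addresses returned by the DNS query
    expected: list of IP addresses known to be correct for the site
    """
    if not resolved:
        return "dns_no_answer"           # e.g. the VE and MX anomalies
    if not set(resolved) & set(expected):
        return "dns_mismatch"            # consistent with DNS injection
    if not tcp_connected:
        return "tcp_connect_failed"
    if not http_response_received:
        return "no_http_response_after_connect"
    return "ok"

EXPECTED = ["207.241.224.2"]  # the correct archive.org address cited above

# AS41661: DNS answer pointed at law.filter.ertelecom.ru.
print(classify(["92.255.241.100"], EXPECTED, True, True))
# AS21378: correct DNS answer, TCP connected, no HTTP response.
print(classify(["207.241.224.2"], EXPECTED, True, False))
```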
== Venezuela ==
One test from AS8048 did not get a response to its DNS request. However,
it may just be a random failure (not blocking), because there were two
other successful tests from AS8048, and one successful test from AS6306.
https://explorer.ooni.torproject.org/measurement/20170705T141354Z_AS8048_Ku…
== Mexico ==
As in the Venezuela case, there was one test from AS8151 that didn't get
a DNS response; however there were 9 other successful tests, including
others from AS8151.
https://explorer.ooni.torproject.org/measurement/20170703T060009Z_AS8151_GO…
== Brazil ==
Of the five Brazilian ASes present in the sample of reports, only one
shows anomalies: AS1916, Rede Nacional de Ensino e Pesquisa (National
Education and Research Network). In this network, requests for
http://archive.org (which redirects to https://archive.org) succeed,
while those directly requesting https://archive.org/web/ consistently
time out. I don't have a good explanation for this. Certain kinds of
stateful firewall could plausibly cause such behavior.
== France ==
A single measurement (out of 100) in France timed out requesting
http://archive.org. It was in AS197422 and there were no other reports
in the sample from that AS, so it's hard to say whether it's due to a
block or a random failure.
https://explorer.ooni.torproject.org/measurement/20170705T232621Z_AS197422_…
Currently you can do queries with order_by=test_start_time,
order_by=probe_cc, etc., but you cannot do order_by=index.
https://measurements.ooni.torproject.org/api/v1/files?limit=1&order_by=index
{
  "error_code": 400,
  "error_message": "Invalid order_by"
}
As I understand it, the difference between index and test_start_time is
that index is always increasing over time (newly uploaded reports always
get a higher index than existing reports), while newly uploaded reports
can have a test_start_time that is in the past (if the probe was not
able to upload for a time, for example).
The ability to order_by=index would allow a slight robustness
enhancement in ooni-sync, in the case when a new report is uploaded
while ooni-sync is running. Currently ooni-sync always does
    order=asc&order_by=test_start_time&limit=1000
That is, starting with the oldest reports, get a page of 1000 reports at
a time. The issue is what happens when a report from the past is
uploaded while ooni-sync is downloading. In this case ooni-sync will not
notice the new report right away. Here is an example with made-up
indexes and dates:
    ooni-sync starts downloading page 0 from index=5000 (2016-01-01) to index=5999 (2016-03-31)
    new report with index=9999 (2016-02-01) appears, gets inserted into page 0
    ooni-sync finishes downloading page 0
    ooni-sync starts downloading page 1 from index=5999 (2016-03-31) to index=6998 (2016-04-05)
    ooni-sync finishes downloading page 1
In this example, ooni-sync does not download the report with index=9999
during this run. Also, it sees index=5999 twice, because index=9999
pushed index=5999 from page 0 to page 1.
An order_by=index option would prevent newly uploaded reports from
misaligning the pages like that (at least when order=asc is used).
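The page-shifting effect is easy to reproduce in a toy model. The Python sketch below is not ooni-sync or the OONI API: it assumes offset-style paging over a sorted list, uses integer timestamps, and the function fetch_all and the tiny data set are invented for the illustration. A late report with an old test_start_time is added after page 0 has been fetched, once with ordering by test_start_time and once with ordering by index.

```python
def fetch_all(store, key, page_size, insert_after_page, late_report):
    """Page through `store` sorted ascending by `key`.

    After the page numbered `insert_after_page` has been fetched,
    `late_report` is added to the store, simulating an upload that
    happens while the download is in progress. Returns the list of
    report indexes seen, in order.
    """
    seen = []
    page = 0
    while True:
        ordered = sorted(store, key=lambda r: r[key])
        chunk = ordered[page * page_size:(page + 1) * page_size]
        if not chunk:
            break
        seen.extend(r["index"] for r in chunk)
        if page == insert_after_page:
            store.append(late_report)
        page += 1
    return seen

# Six existing reports; index and test_start_time both increase.
base = [{"index": i, "test_start_time": 10 * i} for i in range(6)]
# New upload: highest index, but a start time that falls inside page 0.
late = {"index": 6, "test_start_time": 5}

by_time = fetch_all(list(base), "test_start_time", 3, 0, late)
by_index = fetch_all(list(base), "index", 3, 0, late)

print(by_time)   # [0, 1, 2, 2, 3, 4, 5]
print(by_index)  # [0, 1, 2, 3, 4, 5, 6]
```

With ordering by test_start_time, index 2 is seen twice and index 6 is missed in this run. With ordering by index, the late report lands at the end and is picked up without disturbing earlier pages.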
The reasons why this is minor minor minor and hardly worth mentioning:
* index=9999 will get downloaded the next time you run ooni-sync
* it can't cause ooni-sync to skip any already uploaded reports (it
would, with order=desc, but that's why ooni-sync uses order=asc)
* ooni-sync will see but won't actually download index=5999 twice
* newly uploaded reports are likely to be on the last page anyway