-------- Original Message --------
Subject: [icfp-active] Griffin's ICFP February
Date: 2016-03-03 13:50
From: Griffin Boyce <griffin@cryptolab.net>
To: ICFP Active <icfp-active@opentechfund.org>
Hello all,
In February, I fought logistics and small delays, but got some
interesting initial results from Russia. Ultimately, I was not able to
gather enough data to complete a paper before the PETS deadline.
However, I have proposed a talk for the HOPE conference in NYC this
summer [1]. My co-author, Jeff Landale, and I will continue working
on a paper with the goal of submitting to USENIX FOCI.
Initial results from Russia have been surprising. Of the Alexa top
1M sites, 153,000 seem to be blocked. I've run the test twice, and
will re-run it from another location, but if the block percentage
really is ~15.3%, that is quite extreme! I had expected only around
2,000 sites to be blocked.
Towards the end of the month I prepared for and attended the Internet
Freedom Festival in Valencia, Spain. As part of the paper-writing
process, I wrote up a first draft of the project methodology, which
appears below. Please let me know if you see any gaps or have
additional tests to suggest.
Methodology:
# Website Diagnostics
The sites to be tested are divided into segments of 10,000 websites,
plus custom tests for sites focused on circumvention. This way, if
there’s an error during the test, it’s easier to perform a re-test
without having to test over a million sites (again).
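As a concrete illustration, here is a minimal sketch of the
segmentation step in Python. The filename and CSV format are
assumptions for the example, not the actual pipeline:

    import csv

    SEGMENT_SIZE = 10_000  # re-test granularity

    def load_segments(path="alexa-top-1m.csv"):
        # Assumed format: "rank,domain" rows, e.g. "1,google.com".
        with open(path, newline="") as f:
            domains = [row[1] for row in csv.reader(f)]
        # Chunk the list so a failed run only requires re-testing
        # one segment of 10,000 sites, not the full million.
        return [domains[i:i + SEGMENT_SIZE]
                for i in range(0, len(domains), SEGMENT_SIZE)]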
The first test is a simple check to see whether the site is available
(code 200) or down. This test is performed from both the target
region and a presumed-uncensored area (typically Germany or the US)
to ensure that the site is not simply down for maintenance. (In the
process, I determine whether it's a DNS block or an IP block.) Once
I have a list of non-200 sites, I dig deeper: I take a screenshot of
each site using EyeWitness, then check whether the site is silently
blocked or returns a proper block page. If the block pages follow a
common format *and* give the block reason, I scrape the reasons and
map them to the domains. Interesting differences surface here. For
example, in Indonesia SMBC Comics is blocked for 'pornography and
promoting bigotry and sectarian violence,' even though the site is
not pornographic and contains minimal fantasy violence.
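To make the first check concrete, here is a hedged sketch of the
availability probe. The classification labels are mine, and a real
run uses the full test battery rather than this toy:

    import socket
    import requests

    def probe(domain, timeout=10):
        try:
            socket.gethostbyname(domain)       # DNS resolution
        except socket.gaierror:
            return "dns-failure"               # possible DNS block
        try:
            r = requests.get(f"http://{domain}/", timeout=timeout)
            return "ok" if r.status_code == 200 else f"http-{r.status_code}"
        except requests.exceptions.ConnectionError:
            return "connect-failure"           # possible IP block
        except requests.exceptions.Timeout:
            return "timeout"

Running the same probe from the target region and from a control
location separates censorship from ordinary outages.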
If the site is inaccessible for whatever reason, I check for TCP reset
packets, check if there is an SSL/TLS error, check for forced downloads,
and for some sites check for MitM. The MitM test is performed only for
sites where a user would typically be trying to download something, such
as Lantern or Tor Browser. In cases of possible MitM, I collect PCAP
data (packet captures) and attempt to download the relevant parts of the
website to compare with a non-suspicious version of the website. While
some regional differences may exist -- news in different languages, for
example -- differences in executables or malware injections are
indicators of active man-in-the-middle attacks.
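For the download-comparison step, the core check reduces to hashing
the same artifact fetched from both vantage points. A rough sketch
(the file names are placeholders):

    import hashlib

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    # Differing executables are a strong indicator of an active MitM.
    if sha256_of("tbb-target-region.tar.xz") != sha256_of("tbb-control.tar.xz"):
        print("Mismatch: possible tampering; inspect the PCAPs")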
Once all website tests are performed, I categorize the sites by
cross-referencing Alexa data. Categorizing website content is one of
the harder problems at scale, so relying on Alexa for categorization
appears to be the best solution. For example, manually categorizing
the 153,000 sites blocked in Russia would be infeasible given the
sheer number involved.
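The cross-referencing itself is a simple join. A sketch, assuming a
local export of Alexa category data as "domain,category" rows (the
actual data source and format may differ):

    import csv
    from collections import Counter

    def categorize(blocked_domains, path="alexa-categories.csv"):
        with open(path, newline="") as f:
            category_of = dict(csv.reader(f))
        # Tally blocked sites per category; unknown domains are
        # left as "uncategorized" for manual review.
        return Counter(category_of.get(d, "uncategorized")
                       for d in blocked_domains)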
# Tor & Lantern tests
Once all websites are categorized, I check whether or not Tor Browser
and Lantern are usable within the country. Lantern has a built-in test
for this purpose. For Tor Browser, I use a modified version of Philipp
Winter’s ExitMap, and feed it an up-to-date list of Tor nodes. It then
cycles through thousands of individual Tor circuits to check whether
every entry node is available from the test location. From there, I
determine whether all of Tor is blocked, or only some nodes. If only
some guard nodes are blocked, this may mean, depending on the age of
the unblocked nodes relative to the blocked ones, either that the
unblocked guards were set up to track users, or that the system for
blocking Tor nodes hasn't been updated since those nodes were
created. I
then test whether different kinds of bridges are blocked within the
country by making circuits using bridges (including
obfs2/obfs3/obfs4/fte/scramblesuit and standard bridges). For some
locations, we will also test flashproxy- and snowflake-backed Tor
connectivity. All Tor-related tests are run twice, with pre-selected
nodes where possible, to minimize errors and ensure test accuracy.
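For flavor, a minimal sketch of the guard-reachability check using
the stem library: try to build a two-hop circuit with each candidate
guard as the first hop. The fingerprints and control port are
placeholders, and the real runs go through ExitMap:

    import stem
    from stem.control import Controller

    def reachable_guards(guard_fps, second_hop_fp):
        reachable = []
        with Controller.from_port(port=9051) as ctl:
            ctl.authenticate()
            for fp in guard_fps:
                try:
                    # A circuit that builds means this guard is
                    # reachable from the test location.
                    ctl.new_circuit([fp, second_hop_fp],
                                    await_build=True, timeout=30)
                    reachable.append(fp)
                except (stem.CircuitExtensionFailed, stem.Timeout):
                    pass  # blocked, offline, or too slow
        return reachable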
# Tests Performed
OONI: website accessibility, HTTP error codes, TCP reset packets
ExitMap (hacked/customized): checks Tor network node connectivity
Tor daemon: tests bridge connectivity
EyeWitness: takes screenshots
Custom nmap script: gathers data on SSL/TLS certificates
Custom scripts: compare SSL/TLS certificates between test locations
(a sketch appears below), compare websites for MitM determination,
collect blocked-website HTML, collect block reasons (as needed), and
format collected data
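Of the custom scripts, the certificate comparison is the simplest to
sketch: grab the certificate presented in the target region and
compare it against one recorded from a control location (the file
name is a placeholder):

    import ssl

    def cert_matches_control(domain, control_pem_path):
        pem = ssl.get_server_certificate((domain, 443))
        with open(control_pem_path) as f:
            control = f.read()
        # A different certificate in the target region suggests
        # TLS interception rather than an ordinary outage.
        return pem.strip() == control.strip()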
# Websites Tested
The top one million sites, as determined by Alexa’s traffic analyses.
This includes a very diverse group of religious and LGBT websites. In
addition, I am testing some circumvention websites. As mentioned above,
these are grouped into segments so that re-tests can be performed
without having to re-test everything.
Box status:
RU =)
UA =|
EG =)
TN =( -- coordinating local tests instead
KG =)
KZ =( -- ISP set up CrunchBang instead of Debian
[1] Due to this, I will not be attending PETS.
--
“We have to create; it is the only thing louder than destruction.”
~ Andrea Gibson