[ooni-dev] Turbo OONI report downloader

David Fifield david at bamsoftware.com
Wed Feb 22 21:58:40 UTC 2017


I wrote a program that uses the OONI API to download reports and keep a
local directory of reports up to date. It's much faster than the Wget
loop I used to use and it finishes quickly when there is nothing new to
download.

git clone https://www.bamsoftware.com/git/ooni-sync.git

For example, lately I've had to download a lot of tcp_connect reports. I
run it like this:
	ooni-sync -xz -directory reports.tcp_connect/ test_name=tcp_connect
This command downloads the index of tcp_connect reports, then fetches
only the reports that are not already present locally, compressing
each one with xz. The next time I need to update, I run the same
command again, and it only downloads reports that are new since the last
time.

You can use other query parameters supported by the API, like probe_cc,
probe_asn, since, and until. For example:
	ooni-sync -xz -directory reports.is/ probe_cc=IS since=2017-01-01
	ooni-sync -xz -directory reports.as25/ probe_asn=AS25
	ooni-sync -xz -directory reports.tor-turkey/ test_name=vanilla_tor probe_cc=TR
	ooni-sync -xz -directory reports.web_connectivity/ test_name=web_connectivity since=2017-01-01 until=2017-01-02

I prefer to keep all the reports compressed on disk, so I always use the
-xz option, but by default reports are saved unmodified.
-------------- next part --------------
Fast downloader of OONI reports using the OONI API. It works by
downloading an index of available files, and only downloading the files
that are not already present locally. You can run it again and again to
keep a local directory up to date with newly published reports.
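The index-then-diff idea can be sketched in a few lines of Python 3. This is a minimal sketch, not ooni-sync's actual code. It assumes the index has already been fetched into a list of report filenames, and the helper name `missing_reports` is hypothetical. It also assumes that a report saved with -xz is stored under its original name plus a ".xz" suffix, so both forms count as already present.

```python
import os


def missing_reports(index_filenames, directory):
    """Return the index entries that have no local copy in directory.

    Hypothetical helper. ASSUMPTION: a compressed copy is stored as the
    index filename plus ".xz", so "x.json" and "x.json.xz" both count
    as already downloaded.
    """
    try:
        local = os.listdir(directory)
    except FileNotFoundError:
        # The directory does not exist yet, so nothing is downloaded.
        local = []
    present = set()
    for name in local:
        # Strip the assumed ".xz" suffix before comparing with the index.
        present.add(name[:-len(".xz")] if name.endswith(".xz") else name)
    # Keep index order so downloads proceed in a predictable sequence.
    return [name for name in index_filenames if name not in present]
```

When every file in the index is already on disk, the function returns an empty list, which is why a rerun with nothing new finishes quickly.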

Example usage:
	ooni-sync -xz -directory reports test_name=tcp_connect
This command will create the directory "reports" if it doesn't exist,
download all tcp_connect reports that are not already present in the
directory, and compress the downloaded reports with xz.

The query is composed of name=value pairs. Possible query parameters
include:
	test_name=[name]
	probe_cc=[cc]
	probe_asn=AS[num]
	since=[yyyy-mm-dd]
	until=[yyyy-mm-dd]
The value of "test_name" can be "tcp_connect", "web_connectivity",
"nbt", etc. The query parameters "order", "offset", and "limit" are used
internally by the program and will be overridden if present. More
documentation on the API is available from
https://measurements.ooni.torproject.org/api/.
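One way to turn the command-line name=value pairs into a query string is sketched below in Python 3. It is an illustration under assumptions: the function name `build_query` is hypothetical, and the values shown for "order", "offset", and "limit" are placeholders, not the ones ooni-sync actually sends. What it does show is the override rule described above: user-supplied values for those three keys are replaced.

```python
from urllib.parse import urlencode

# Keys the program manages itself; any user-supplied values are overridden.
# PLACEHOLDER values, not ooni-sync's real settings.
INTERNAL_PARAMS = {"order": "asc", "offset": "0", "limit": "1000"}


def build_query(args):
    """Turn e.g. ["test_name=tcp_connect", "probe_cc=IS"] into a query string."""
    params = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError("query parameter must be name=value: %r" % arg)
        params[name] = value
    # Apply the internal parameters last so they take precedence.
    params.update(INTERNAL_PARAMS)
    return urlencode(params)
```

For example, `build_query(["probe_cc=IS", "since=2017-01-01", "limit=5"])` keeps probe_cc and since but replaces the user's limit with the internal value.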

Because the program only uses filename comparison to know whether a
report file has already been downloaded, it is careful not to store a
file under its final filename until it has been completely downloaded.
It is safe to interrupt the program or run two copies simultaneously.
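The usual way to get this guarantee is to write into a temporary file in the same directory and rename it into place only after the data is complete. The Python 3 sketch below shows that technique; `save_report` is a hypothetical name, and this is not claimed to be how ooni-sync itself is written. Because os.replace is atomic on POSIX when source and destination are on the same filesystem, another process listing the directory sees either no file or the complete file under the final name, never a partial one.

```python
import os
import tempfile


def save_report(directory, filename, chunks):
    """Write byte chunks to directory/filename without ever exposing a
    partially written file under the final name (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem; the ".tmp-" prefix keeps it distinct from
    # real report names.
    tmp = tempfile.NamedTemporaryFile(dir=directory, prefix=".tmp-", delete=False)
    try:
        with tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Only now does the report appear under its final name.
        os.replace(tmp.name, os.path.join(directory, filename))
    except BaseException:
        # Interrupted (including Ctrl-C) or failed: discard the partial file.
        os.unlink(tmp.name)
        raise
```

If the process is killed outright, a stray ".tmp-" file may remain, but it never matches a report filename, so a later run still downloads that report.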

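To make your Python scripts capable of processing xz-compressed files,
just call this function instead of open(filename):
	import subprocess

	def open_magic(filename):
	    # Decompress .xz files by piping them through the external xz tool.
	    if filename.endswith(".xz"):
	        # universal_newlines=True yields text (str) lines, matching open().
	        p = subprocess.Popen(["xz", "-d", "-c", filename],
	                             stdout=subprocess.PIPE, universal_newlines=True)
	        return p.stdout
	    else:
	        return open(filename)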
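On Python 3.3 and later, the standard-library lzma module can decompress .xz files directly, which removes the dependency on an external xz binary. The variant below swaps in that technique rather than reproducing the recipe above. It opens both kinds of file in text mode, so callers get str lines either way.

```python
import lzma


def open_magic(filename):
    # Decompress .xz files in-process with the stdlib lzma module;
    # otherwise open the file normally. Both return text-mode streams.
    if filename.endswith(".xz"):
        return lzma.open(filename, "rt")
    return open(filename)
```

Typical use is the same as with open(): `with open_magic(path) as f: for line in f: ...`.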

Contact: David Fifield <david at bamsoftware.com>