[ooni-dev] Compressing reports?

Will willscott at gmail.com
Sat Mar 19 05:21:21 UTC 2016


Would you be okay with monthly archives of all tests, or would you want the
archives to be separated by test type?
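To make the two options concrete, here is a minimal Python sketch that groups report files into one .tar.xz per month, or per month and test type. It assumes a hypothetical naming scheme, `YYYYMMDDTHHMMSSZ-<country>-<ASN>-<test_name>-...json`, which may not match the real report filenames. The helpers `archive_key`, `group_reports`, and `build_archives` are made up for illustration.

```python
import tarfile
from collections import defaultdict
from pathlib import Path

# ASSUMPTION (hypothetical naming scheme): report files look like
#   20160318T101500Z-NL-AS1234-http_requests-<id>.json
# with a timestamp first and the test name as the 4th "-"-separated field.


def archive_key(filename: str, by_test_type: bool) -> str:
    """Return the name of the archive that a report file belongs in."""
    fields = Path(filename).name.split("-")
    month = fields[0][:4] + "-" + fields[0][4:6]  # "20160318T..." -> "2016-03"
    if by_test_type:
        return f"{month}-{fields[3]}.tar.xz"
    return f"{month}.tar.xz"


def group_reports(filenames, by_test_type=False):
    """Map each archive name to the sorted list of report files it holds."""
    groups = defaultdict(list)
    for name in filenames:
        groups[archive_key(name, by_test_type)].append(name)
    return {key: sorted(files) for key, files in groups.items()}


def build_archives(filenames, by_test_type=False, out_dir="."):
    """Write one xz-compressed tarball per group into out_dir."""
    for archive, files in group_reports(filenames, by_test_type).items():
        with tarfile.open(str(Path(out_dir) / archive), "w:xz") as tar:
            for f in files:
                tar.add(f, arcname=Path(f).name)  # store without directories


if __name__ == "__main__":
    demo = [
        "20160301T000000Z-NL-AS1234-http_requests-a.json",
        "20160315T120000Z-US-AS7922-dns_consistency-b.json",
    ]
    print(group_reports(demo))
    print(group_reports(demo, by_test_type=True))
```

Monthly archives keep the number of downloads small, while per-test archives let someone who only wants http_requests skip everything else. Under this scheme both layouts come from the same grouping function.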

On Fri, Mar 18, 2016 at 10:16 PM, David Fifield <david at bamsoftware.com> wrote:

> I just downloaded all the http_requests reports from
> https://measurements.ooni.torproject.org/. It took quite a long time and
> I wonder if we can make things more efficient by compressing the reports
> on the server.
>
> This is the command I ran to download the reports:
>         wget -c -r -l 2 -np --no-directories -A '*http_requests*' --no-http-keep-alive https://measurements.ooni.torproject.org/
> This resulted in 309 GB and 6387 files.
>
> If I compress the files with xz,
>         xz -v *.json
> they only take up 29 GB (9%).
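For reference, 29 GB / 309 GB ≈ 0.094, which rounds to the 9% quoted. The same compression can be done from Python with the standard-library lzma module. The sketch below is an in-process stand-in for `xz -v *.json`, not a description of how the server does it, and the helper name `xz_compress` is hypothetical. `lzma.open` writes the .xz container at preset 6 by default, which is also the xz command's default level. Unlike `xz`, it keeps the original file.

```python
import lzma
import os
import shutil
import sys


def xz_compress(path: str) -> float:
    """Compress path to path + ".xz"; return compressed size / original size.

    The original file is left in place (xz would delete it).
    """
    out_path = path + ".xz"
    with open(path, "rb") as src, lzma.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, never loads the whole file
    original = os.path.getsize(path)
    return os.path.getsize(out_path) / original if original else 0.0


if __name__ == "__main__":
    # Usage: python xz_compress.py *.json
    for name in sys.argv[1:]:
        print(f"{name}: {xz_compress(name):.0%} of original size")
```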
>
> Processing xz-compressed files is pretty easy, as long as you don't have
> to seek. Just do something like this:
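>         import json
>         import subprocess
>
>         def open_xz(filename):
>             # Decompress to stdout with the xz CLI and stream its output.
>             p = subprocess.Popen(["xz", "-dc", filename],
>                                  stdout=subprocess.PIPE, bufsize=-1)
>             return p.stdout
>
>         # xz replaces report.json with report.json.xz
>         for line in open_xz("report.json.xz"):
>             doc = json.loads(line)  # one JSON document per line
>             ...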
> Of course you can do the same thing with gzip.
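The same streaming read can also be done in-process with Python's standard-library lzma and gzip modules, which avoids spawning an xz child process. This is a minimal sketch. The helper names `open_compressed` and `iter_reports`, and picking the decompressor from the file extension, are illustrative assumptions. Like the snippet above, it assumes one JSON document per line.

```python
import gzip
import json
import lzma


def open_compressed(filename):
    """Open a report for line-by-line text reading, decompressing on the fly.

    ASSUMPTION: compressed files end in .xz or .gz; anything else is
    treated as plain text.
    """
    if filename.endswith(".xz"):
        return lzma.open(filename, "rt", encoding="utf-8")
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt", encoding="utf-8")
    return open(filename, "rt", encoding="utf-8")


def iter_reports(filename):
    """Yield one parsed JSON document per non-empty line."""
    with open_compressed(filename) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

According to the module documentation, these file objects do offer seek(), but it is emulated by decompressing and can be very slow, so sequential reading remains the efficient case.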
> _______________________________________________
> ooni-dev mailing list
> ooni-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev
>

