<div dir="ltr">Would you be okay with monthly archives of all tests, or would you want the archives separated by test type?<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 18, 2016 at 10:16 PM, David Fifield <span dir="ltr"><<a href="mailto:david@bamsoftware.com" target="_blank">david@bamsoftware.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I just downloaded all the http_requests reports from<br>
<a href="https://measurements.ooni.torproject.org/" rel="noreferrer" target="_blank">https://measurements.ooni.torproject.org/</a>. It took quite a long time and<br>
I wonder if we can make things more efficient by compressing the reports<br>
on the server.<br>
<br>
This is the command I ran to download the reports:<br>
        wget -c -r -l 2 -np --no-directories -A '*http_requests*' --no-http-keep-alive <a href="https://measurements.ooni.torproject.org/" rel="noreferrer" target="_blank">https://measurements.ooni.torproject.org/</a><br>
This resulted in 309 GB and 6387 files.<br>
<br>
If I compress the files with xz,<br>
        xz -v *.json<br>
they only take up 29 GB (9%).<br>
<br>
Processing xz-compressed files is pretty easy, as long as you don't have<br>
to seek. Just do something like this:<br>
        import json<br>
        import subprocess<br>
<br>
        def open_xz(filename):<br>
            # Stream decompressed output from xz; no temporary file needed.<br>
            p = subprocess.Popen(["xz", "-dc", filename], stdout=subprocess.PIPE, bufsize=-1)<br>
            return p.stdout<br>
<br>
        for line in open_xz("report.json.xz"):<br>
            doc = json.loads(line)<br>
            ...<br>
Of course you can do the same thing with gzip.<br>
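For gzip you don't even need a subprocess: Python's built-in gzip module streams line by line on its own. A minimal self-contained sketch (the file path and sample records here are made up for illustration, not real OONI data):<br>
<br>
```python
import gzip
import json
import os
import tempfile

# Hypothetical stand-ins for two lines of an OONI report.
records = [{"test_name": "http_requests", "id": 1},
           {"test_name": "http_requests", "id": 2}]

path = os.path.join(tempfile.mkdtemp(), "report.json.gz")

# Write one JSON document per line, gzip-compressed.
with gzip.open(path, "wt") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Stream it back: gzip.open in text mode yields one decoded line at a
# time, so memory use stays flat regardless of report size.
with gzip.open(path, "rt") as f:
    docs = [json.loads(line) for line in f]

assert docs == records
```
<br>
As with the xz version, this only supports sequential reads, which is all the line-per-document report format requires.<br>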
_______________________________________________<br>
ooni-dev mailing list<br>
<a href="mailto:ooni-dev@lists.torproject.org">ooni-dev@lists.torproject.org</a><br>
<a href="https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev" rel="noreferrer" target="_blank">https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev</a><br>
</blockquote></div><br></div>