Re: [ooni-dev] Request for feedback on resuming of publishing of the reports

Hi Arturo,

---
Currently for our data processing needs we have begun to bucket reports by date (every date corresponds to when a certain report has been submitted to the collector).
What I would like to know is of the two following options what would be most convenient to you for accessing the data.
The options are:

OPTION A: Have 1 JSON stream for every day of measurements (either gzipped or plain)
ex.
- https://ooni.torproject.org/reports/json/2016-01-01.json
- https://ooni.torproject.org/reports/json/2016-01-02.json
- https://ooni.torproject.org/reports/json/2016-01-03.json
etc.

OPTION B: Have 1 JSON stream for every ooni-probe test run and publish them inside of a directory with the timestamp of when it was collected
ex.
- https://ooni.torproject.org/reports/json/2016-01-01/20160101T204732Z NL-AS3265-http_requests-v1-probe.json.gz
- https://ooni.torproject.org/reports/json/2016-01-01/20160101T204732Z US-AS3265-dns_consistency-v1-probe.json.gz
etc.

Since we are internally using the daily batches for doing the processing and analysis of reports unless there is an explicit request to publish them on a test run basis we will probably end up going for option A, so don’t be shy to reply :)
---

I agree with David that it will be easier to access specific ooni-probe test results using option (B) (i.e. the current solution). What benefits did you identify when considering a switch to option (A)?

A few reasons to stick with option (B) include:
- Retaining the ability to run ooni-pipeline on a subset of reports associated with a given time period by filtering by date prefix, and substrings within key names;
- Retaining the ability to distribute small units of work easily among subprocesses;
- Retaining the idempotent nature of ooni-pipeline, and the luigi framework: switching from lots of small files to a single large file for a given day will invariably increase the time required to recover from failures (e.g. if a small dnst-based test fails to normalise, you'll have to renormalise everything as opposed to a single test);
- Developers will not have to download hundreds of megabytes of data in order to access a traceroute test result that is only a few kilobytes in size; and
- It's generally easier to work with smaller files than it is to work with big files.

Cheers,
Tyler

GPG fingerprint: 8931 45DF 609B EE2E BC32 5E71 631E 6FC3 4686 F0EB (tyler@tylerfisher.org)
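To make the first point concrete, here is a minimal Python sketch of selecting a subset of per-test-run reports by date prefix and by substrings within the key name. It is not taken from ooni-pipeline; `select_reports` is a hypothetical helper, and the keys are made-up examples modelled on the option (B) URLs, assuming the timestamp and the country code are joined by a hyphen.

```python
def select_reports(keys, date_prefix="", substrings=()):
    """Return the keys that start with date_prefix and contain every substring."""
    return [
        key
        for key in keys
        if key.startswith(date_prefix) and all(s in key for s in substrings)
    ]


# Hypothetical keys in the "<date>/<report filename>" layout of option (B).
# The hyphen after the timestamp is an assumption about the real naming scheme.
keys = [
    "2016-01-01/20160101T204732Z-NL-AS3265-http_requests-v1-probe.json.gz",
    "2016-01-01/20160101T204732Z-US-AS3265-dns_consistency-v1-probe.json.gz",
    "2016-01-02/20160102T101500Z-NL-AS3265-dns_consistency-v1-probe.json.gz",
]

# Only the dns_consistency runs collected on 2016-01-01.
jan_first_dns = select_reports(keys, "2016-01-01/", ["dns_consistency"])

# Every run from the NL probe, regardless of date.
all_nl = select_reports(keys, substrings=["-NL-"])
```

With one file per day (option A), the same selection would presumably require opening and scanning each daily stream instead of just looking at names.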

Hi,
Are we going to have a per-country directory for published reports? It's a useful option to download all reports from a given country.
~Vasilis

On Mar 11, 2016, at 18:34, Vasilis <andz@torproject.org> wrote:
Hi,
Are we going to have a per-country directory for published reports? It's a useful option to download all reports from a given country.
No, that is not going to be supported in the directory listing.
You will be able to list the reports for a given country from the API or perform the filtering yourself by inspecting the filename.
The directory structure of the new reports can be seen here: https://ooni.torproject.org/reports/
Note: since there is no more space on the torproject.org box hosting these, the HTTP requests tests are not being published there at the moment and the rsync has stopped. Once we set up the web server for hosting the report dumps, /reports will redirect to it.
~ Arturo
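As a rough illustration of filtering by inspecting the filename, the Python sketch below splits a report name into its fields and groups names by country. The pattern `<timestamp>-<country>-<ASN>-<test_name>-<version>-probe.json[.gz]` is an assumption inferred from the example filenames earlier in this thread, not a documented format, and `parse_report_name` and `group_by_country` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed filename layout (not a documented format):
#   <timestamp>-<country>-<ASN>-<test_name>-<version>-probe.json[.gz]
REPORT_NAME = re.compile(
    r"^(?P<timestamp>\d{8}T\d{6}Z)-(?P<country>[A-Z]{2})-(?P<asn>AS\d+)-"
    r"(?P<test_name>[a-z0-9_]+)-(?P<version>v\d+)-probe\.json(?:\.gz)?$"
)


def parse_report_name(filename):
    """Return the filename's fields as a dict, or None if it does not match."""
    match = REPORT_NAME.match(filename)
    return match.groupdict() if match else None


def group_by_country(filenames):
    """Map each country code to the report filenames that carry it."""
    by_country = defaultdict(list)
    for name in filenames:
        fields = parse_report_name(name)
        if fields is not None:  # skip anything that is not a report file
            by_country[fields["country"]].append(name)
    return dict(by_country)


# Made-up filenames for illustration only.
names = [
    "20160101T204732Z-NL-AS3265-http_requests-v1-probe.json.gz",
    "20160101T204732Z-US-AS3265-dns_consistency-v1-probe.json.gz",
    "index.html",
]
grouped = group_by_country(names)
```

If the real naming scheme differs (for example in the version field), only the regular expression would need adjusting.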

On 11/03/16 14:39, Arturo Filastò wrote:
On Mar 11, 2016, at 18:34, Vasilis <andz@torproject.org> wrote: Are we going to have a per-country directory for published reports? It's a useful option to download all reports from a given country.
No, that is not going to be supported in the directory listing.
You will be able to list the reports for a given country from the API or perform the filtering yourself by inspecting the filename.
Perhaps we should add a daily per-country dump of all reports, so that a user can download every report from a given country?
The directory structure of the new reports can be seen here: https://ooni.torproject.org/reports/
Note: since there is no more space on the torproject.org box hosting these, the HTTP requests tests are not being published there at the moment and the rsync has stopped. Once we set up the web server for hosting the report dumps, /reports will redirect to it.
The web server hosting the updated reports is ready and can be found here: http://141.20.103.26:8080
I'm waiting for the DNS A record of 'measurements.ooni.io' to be added so that I can set up a letsencrypt certificate.
~Vasilis

On Mar 11, 2016, at 20:15, Vasilis <andz@torproject.org> wrote:
The web server hosting the updated reports is ready and can be found here: http://141.20.103.26:8080
I'm waiting for the DNS A record of 'measurements.ooni.io' to be added so that I can set up a letsencrypt certificate.
$ dig A +short measurements.ooni.io
141.20.103.26
~ A.

Awesome! The measurements server in that form is already very useful. I wouldn't worry too much about getting country-specific views; it seems pretty trivial to crawl the current listings to find all reports for a country of interest.
On Fri, Mar 11, 2016 at 1:56 PM, Arturo Filastò <art@torproject.org> wrote:
On Mar 11, 2016, at 20:15, Vasilis <andz@torproject.org> wrote:
The web server hosting the updated reports is ready and can be found here: http://141.20.103.26:8080
I'm waiting for the DNS A record of 'measurements.ooni.io' to be added so that I can set up a letsencrypt certificate.
$ dig A +short measurements.ooni.io
141.20.103.26
~ A.
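For what it's worth, crawling a listing for one country could look roughly like the Python sketch below. It assumes the server returns a plain HTML index whose links are the report filenames, and that the country code appears as `-CC-` in each name; both are assumptions, since the thread does not show the listing markup. `LinkCollector`, `reports_for_country` and `SAMPLE_LISTING` are hypothetical names, and the sample HTML is made up so the sketch runs without network access.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def reports_for_country(listing_html, country_code):
    """Return the links in listing_html whose name contains -<country_code>-."""
    collector = LinkCollector()
    collector.feed(listing_html)
    marker = "-{}-".format(country_code)
    return [link for link in collector.links if marker in link]


# Made-up listing page; the real server's markup may differ.
SAMPLE_LISTING = """
<html><body>
<a href="../">../</a>
<a href="20160101T204732Z-NL-AS3265-http_requests-v1-probe.json.gz">report</a>
<a href="20160101T204732Z-US-AS3265-dns_consistency-v1-probe.json.gz">report</a>
</body></html>
"""

nl_reports = reports_for_country(SAMPLE_LISTING, "NL")
```

In practice the HTML for each dated directory would be fetched first (for instance with `urllib.request.urlopen`) and passed to `reports_for_country`.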
_______________________________________________
ooni-dev mailing list
ooni-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev
participants (4)
- Arturo Filastò
- Tyler Fisher
- Vasilis
- Will