
On 28/08/15 16:58, Arturo Filastò wrote:
On 09 Aug 2015, at 12:57, Daniel Ramsay <daniel@dretzq.org.uk> wrote:
We're already emulating a collector in the blocked.org.uk API, so perhaps we can go directly to peering with the pipeline. Is there any reference information that I can read on how to go about getting this set up (protocols, hostnames, etc)?
Currently, peering works like this: I give you an AWS shared secret, and you run the following invoke task daily (or hourly): https://github.com/TheTorProject/ooni-pipeline-ng/blob/master/tasks.py#L228
The documentation for these components is basically nonexistent, but if you are familiar with fabric and Python software, their usage should feel quite natural.
Basically, once you have installed invoke and the required dependencies (listed in requirements.txt), edit the invoke.yaml file (using the .example file as a template) to include the AWS shared secret.
Then you can configure an hourly cronjob to run:
invoke sync_reports /PATH/TO/YOUR/REPORTS/ARCHIVE
Once that job is running, you will be peered with the OONI data pipeline.
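For anyone who would rather have cron call a small script than the raw command, the step above could be wrapped roughly as follows. This is only a sketch: the function names, the repo_dir argument, and the assumption that invoke has to be started from the directory holding tasks.py and invoke.yaml are illustrative and not taken from the pipeline itself.

```python
import subprocess


def build_sync_command(archive_dir):
    """Return the argument list for the sync task quoted above."""
    return ["invoke", "sync_reports", archive_dir]


def run_sync(archive_dir, repo_dir, runner=subprocess.call):
    """Run the sync task and return its exit status.

    repo_dir is assumed to be a checkout of the pipeline repository,
    i.e. the directory containing tasks.py and the edited invoke.yaml.
    runner is injectable so the wrapper can be exercised without
    invoke or the AWS shared secret being present.
    """
    return runner(build_sync_command(archive_dir), cwd=repo_dir)
```

A cron entry would then call a short entry point that passes the archive path and the checkout path to run_sync and treats a non-zero return value as a failed sync.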
I've read through the scripts, and I think it's a bit more elaborate than I was looking for at this stage, though it would be a better way to work in the long term.

I did run a quick test using the command-line oonireport tool (sending a small number of valid probe results), and the time to run the submission isn't bad at all. Writing an export script to extract ooniprobe results from the blocked database and using oonireport to send them via Tor is a lot more performant than I'd expected.

You mentioned a while ago setting up a staging collector; is that still a possibility? If not, I can still submit a small number of results to the main collector, which could be verified before enabling all reports to be sent.

Many thanks,
Daniel