I was experimenting with adapting ooni-sync to the /api/v1/measurements endpoint. A minimal proof of concept patch is attached. While trying it, I found that the API was returning duplicate measurements and measurements that don't seem to match the query. I'm using this command:

    ./ooni-sync -xz -directory measurements.archive input=archive.org since=2017-01-01
Here is a query that at the moment happens to return two results with the same measurement_id and measurement_url, but different input and measurement_start_time. There are a few more examples of this phenomenon (I found it 6 times in the first 1000 measurements I downloaded).
https://measurements-beta.ooni.io/api/v1/measurements?input=archive.org&...

    {
      "input": "http://archive.org",
      "measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
      "measurement_start_time": "2017-01-04T01:55:06Z",
      "measurement_url": "https://measurements.ooni.torproject.org/api/v1/measurement/51daa51b-07d2-49...",
      "probe_asn": "AS3243",
      "probe_cc": "PT",
      "report_id": "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT",
      "test_name": "web_connectivity"
    },
    {
      "input": "http://wayback.archive.org",
      "measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
      "measurement_start_time": "2017-01-04T09:19:42Z",
      "measurement_url": "https://measurements.ooni.torproject.org/api/v1/measurement/51daa51b-07d2-49...",
      "probe_asn": "AS3243",
      "probe_cc": "PT",
      "report_id": "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT",
      "test_name": "web_connectivity"
    },
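For anyone who wants to check their own downloads for the same symptom, here is a rough sketch of how the duplicates could be detected. It is not part of the attached patch. The function name `find_duplicate_ids` is made up for illustration, and the sketch assumes the result entries have already been parsed from JSON into a list of dicts. The sample data is the pair of entries quoted above, trimmed to the relevant fields.

```python
from collections import defaultdict


def find_duplicate_ids(results):
    """Return {measurement_id: [entries]} for every id that occurs more than once.

    `results` is assumed to be a list of dicts, each with a "measurement_id" key.
    """
    by_id = defaultdict(list)
    for entry in results:
        by_id[entry["measurement_id"]].append(entry)
    return {mid: entries for mid, entries in by_id.items() if len(entries) > 1}


# The two entries quoted in this ticket, reduced to the fields that matter here.
results = [
    {
        "input": "http://archive.org",
        "measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
        "measurement_start_time": "2017-01-04T01:55:06Z",
    },
    {
        "input": "http://wayback.archive.org",
        "measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
        "measurement_start_time": "2017-01-04T09:19:42Z",
    },
]

duplicates = find_duplicate_ids(results)
for mid, entries in duplicates.items():
    # Show which inputs share the same measurement_id.
    print(mid, [e["input"] for e in entries])
```

Grouping by measurement_id rather than by the whole entry is deliberate: the entries in question differ in input and measurement_start_time, so a plain equality check would not flag them.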
I also found that the results included some entries whose "input" field didn't seem to match the query. Here is a small sample of them. So far I've found 57/5783 (about 1%) of downloads whose input doesn't contain "archive.org".
https://measurements-beta.ooni.io/api/v1/measurement/00868be9-2441-42fb-9691... "http://www.imdb.com"
https://measurements-beta.ooni.io/api/v1/measurement/0214f18c-058c-44ef-b291... "http://666games.net"
https://measurements-beta.ooni.io/api/v1/measurement/03b771b2-9f2c-4eee-8835... "http://www.cesr.org"
https://measurements-beta.ooni.io/api/v1/measurement/0cc491bb-30a0-4dea-9271... "http://adultfriendfinder.com"
https://measurements-beta.ooni.io/api/v1/measurement/0fdfa57f-836f-4a75-8543... "http://last.fm"
https://measurements-beta.ooni.io/api/v1/measurement/10f1f5ad-91c4-46f5-9d6a... "http://www.earthwatch.org"
https://measurements-beta.ooni.io/api/v1/measurement/14520322-b00d-437c-be29... "http://abpr2.railfan.net"
https://measurements-beta.ooni.io/api/v1/measurement/1bb1aa36-f4fe-440c-be03... "http://666games.net"
https://measurements-beta.ooni.io/api/v1/measurement/210703f1-e52c-4740-99b3... "http://amphetamines.com"
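The mismatch check I used amounts to a substring test on the "input" field, roughly like the sketch below. The helper name `mismatched_inputs` is hypothetical, and treating the input= query parameter as a substring match is my assumption about what the API intends, not something I have confirmed. The sample list mixes two matching inputs with a few of the non-matching ones listed above.

```python
def mismatched_inputs(query, inputs):
    """Return the inputs that do not contain `query` as a substring.

    Assumption: the API's input= parameter is meant to behave like a
    substring match on the measurement's "input" field.
    """
    return [value for value in inputs if query not in value]


# Inputs taken from this ticket: two that match, three that do not.
sample = [
    "http://archive.org",
    "http://wayback.archive.org",
    "http://www.imdb.com",
    "http://666games.net",
    "http://last.fm",
]

bad = mismatched_inputs("archive.org", sample)
print(len(bad), "of", len(sample), "inputs do not contain the query:", bad)
```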