[ooni-dev] Wrong measurements from beta Measurements API?

Arturo Filastò arturo at openobservatory.org
Tue Jul 25 12:04:46 UTC 2017


On July 25, 2017 at 2:46:07 AM, David Fifield (david at bamsoftware.com) wrote:
I was experimenting with adapting ooni-sync to the /api/v1/measurements  
endpoint. A minimal proof of concept patch is attached. While trying it,  
I found that the API was returning duplicate measurements and  
measurements that don't seem to match the query. I'm using this command:  
./ooni-sync -xz -directory measurements.archive input=archive.org since=2017-01-01  
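For reference, the search that ooni-sync makes against this endpoint can be reproduced by hand. Below is a minimal Python sketch that builds the paginated /api/v1/measurements URLs. The parameter names are copied from the example query URL quoted in this thread. `page_urls` and the limit/offset paging scheme are illustrative assumptions, not necessarily how the attached patch works.

```python
from urllib.parse import urlencode

# Base URL of the beta API, as used in the example query in this thread.
API_BASE = "https://measurements-beta.ooni.io/api/v1/measurements"

def page_urls(input_, since, page_size=100, pages=3):
    """Yield one search URL per page, stepping through results with limit/offset.

    Parameter names and their order mirror the example URL in this thread;
    the paging loop itself is an assumption about how a sync client iterates.
    """
    for page in range(pages):
        params = {
            "input": input_,
            "limit": page_size,
            "offset": page * page_size,
            "order": "asc",
            "order_by": "measurement_start_time",
            "since": since,
        }
        yield f"{API_BASE}?{urlencode(params)}"

for url in page_urls("archive.org", "2017-01-01", pages=2):
    print(url)
```

With pages=6, the sixth URL has offset=500, which is the same query as the example discussed in this thread.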



Excellent, thanks for trying it out!

Here is a query that at the moment happens to return two results with 
the same measurement_id and measurement_url, but different input and 
measurement_start_time. There are a few more examples of this phenomenon
(I found it 6 times in the first 1000 measurements I downloaded). 

https://measurements-beta.ooni.io/api/v1/measurements?input=archive.org&limit=100&offset=500&order=asc&order_by=measurement_start_time&since=2017-01-01 
{ 
"input": "http://archive.org", 
"measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146", 
"measurement_start_time": "2017-01-04T01:55:06Z", 
"measurement_url": "https://measurements.ooni.torproject.org/api/v1/measurement/51daa51b-07d2-491e-ba2b-9189e1a08146", 
"probe_asn": "AS3243", 
"probe_cc": "PT", 
"report_id": "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT", 
"test_name": "web_connectivity" 
}, 
{ 
"input": "http://wayback.archive.org", 
"measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146", 
"measurement_start_time": "2017-01-04T09:19:42Z", 
"measurement_url": "https://measurements.ooni.torproject.org/api/v1/measurement/51daa51b-07d2-491e-ba2b-9189e1a08146", 
"probe_asn": "AS3243", 
"probe_cc": "PT", 
"report_id": "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT", 
"test_name": "web_connectivity" 
}, 
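A client could flag this case by grouping search results on `measurement_id` and reporting any id that appears more than once. Here is a minimal Python sketch. It assumes the JSON response has already been parsed into a list of dicts. `duplicate_ids` is a hypothetical helper, and the sample data is the pair quoted above, trimmed to the relevant fields.

```python
from collections import defaultdict

def duplicate_ids(results):
    """Map each measurement_id seen more than once to its sorted distinct inputs."""
    by_id = defaultdict(list)
    for r in results:
        by_id[r["measurement_id"]].append(r)
    return {
        mid: sorted({r["input"] for r in rows})
        for mid, rows in by_id.items()
        if len(rows) > 1
    }

# The two entries quoted above, trimmed to the fields that matter here.
results = [
    {"input": "http://archive.org",
     "measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
     "measurement_start_time": "2017-01-04T01:55:06Z"},
    {"input": "http://wayback.archive.org",
     "measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
     "measurement_start_time": "2017-01-04T09:19:42Z"},
]
print(duplicate_ids(results))
```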



So I have confirmed that this is in fact an issue (see: https://github.com/TheTorProject/ooni-pipeline/issues/54). We were already somewhat aware of it, but only after digging further did the full extent of the problem become apparent; see https://github.com/TheTorProject/ooni-pipeline/issues/70.


I also found that the results included some entries whose "input" field 
didn't seem to match the query. Here is a small sample of them. So far 
I've found 57/5783 (about 1%) of downloads whose input doesn't contain
"archive.org". 

https://measurements-beta.ooni.io/api/v1/measurement/00868be9-2441-42fb-9691-95501d6b93df "http://www.imdb.com"
https://measurements-beta.ooni.io/api/v1/measurement/0214f18c-058c-44ef-b291-9db88cc923dc "http://666games.net"
https://measurements-beta.ooni.io/api/v1/measurement/03b771b2-9f2c-4eee-8835-5128bf9e7832 "http://www.cesr.org"
https://measurements-beta.ooni.io/api/v1/measurement/0cc491bb-30a0-4dea-9271-1f6ba23c2b8a "http://adultfriendfinder.com"
https://measurements-beta.ooni.io/api/v1/measurement/0fdfa57f-836f-4a75-8543-c8dcade5455a "http://last.fm"
https://measurements-beta.ooni.io/api/v1/measurement/10f1f5ad-91c4-46f5-9d6a-38e455fd7158 "http://www.earthwatch.org"
https://measurements-beta.ooni.io/api/v1/measurement/14520322-b00d-437c-be29-22dc9b2cdc75 "http://abpr2.railfan.net"
https://measurements-beta.ooni.io/api/v1/measurement/1bb1aa36-f4fe-440c-be03-6029955c90ea "http://666games.net"
https://measurements-beta.ooni.io/api/v1/measurement/210703f1-e52c-4740-99b3-36c2db849cc1 "http://amphetamines.com"

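On the client side, checking for this is a simple substring test. Here is a minimal Python sketch. It assumes that `input=archive.org` is meant to match any input containing that string, since both http://archive.org and http://wayback.archive.org came back for it. `mismatched` is a hypothetical helper, and the sample rows reuse inputs from the list above. The last lines redo the arithmetic on the reported counts.

```python
def mismatched(results, needle):
    """Return the results whose "input" does not contain the queried substring."""
    return [r for r in results if needle not in r["input"]]

sample = [
    {"input": "http://archive.org"},
    {"input": "http://wayback.archive.org"},
    {"input": "http://www.imdb.com"},
    {"input": "http://last.fm"},
]
print([r["input"] for r in mismatched(sample, "archive.org")])

# Reported counts: 57 mismatching downloads out of 5783.
share = 57 / 5783
print(f"{share:.1%}")  # one decimal place, as a percentage
```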
So the problem here is that, since `measurement_id` is not actually unique, retrieving an individual measurement returns the first matching entry rather than the measurement you were looking for (see this bit of the code: https://github.com/TheTorProject/ooni-measurements/blob/master/measurements/api/measurements.py#L178).
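To make the failure mode concrete, here is a toy Python model of a lookup keyed on a non-unique id. It is not the code in measurements.py. It only sketches the behaviour described above: the first matching row always wins, so the second measurement sharing the id 51daa51b-07d2-491e-ba2b-9189e1a08146 can never be reached by id. `get_measurement` is a hypothetical name.

```python
# Two rows that share an id, as in the pair quoted earlier in the thread.
rows = [
    {"measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
     "input": "http://archive.org"},
    {"measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",
     "input": "http://wayback.archive.org"},
]

def get_measurement(rows, measurement_id):
    """Return the first row whose id matches, or None if nothing matches."""
    for row in rows:
        if row["measurement_id"] == measurement_id:
            return row
    return None

# Looking up the wayback.archive.org measurement by its id yields the archive.org row.
print(get_measurement(rows, "51daa51b-07d2-491e-ba2b-9189e1a08146")["input"])
```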

I think at this point the best thing to do is to use some other key that uniquely identifies the measurement you care about, and to expose that in place of `measurement_id` in the top-level `/measurements` API search endpoint.
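Here is one possible shape for such a key, purely as an illustration. In the duplicated pair quoted earlier, `report_id` is shared but `input` differs, so a hypothetical composite key of (`report_id`, `input`) would tell them apart. Whether that pair is unique across the whole dataset is an unverified assumption. For that reason, this Python sketch raises on any collision instead of silently keeping the first row. `measurement_key` and `index_by_key` are hypothetical names.

```python
REPORT_ID = "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT"

def measurement_key(row):
    # Hypothetical composite key; not an existing field of the API.
    return (row["report_id"], row["input"])

def index_by_key(rows):
    """Index rows by composite key, refusing to overwrite on a collision."""
    index = {}
    for row in rows:
        key = measurement_key(row)
        if key in index:
            raise ValueError(f"key still not unique: {key}")
        index[key] = row
    return index

rows = [
    {"report_id": REPORT_ID, "input": "http://archive.org",
     "measurement_start_time": "2017-01-04T01:55:06Z"},
    {"report_id": REPORT_ID, "input": "http://wayback.archive.org",
     "measurement_start_time": "2017-01-04T09:19:42Z"},
]
index = index_by_key(rows)
print(index[(REPORT_ID, "http://wayback.archive.org")]["measurement_start_time"])
```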

~ Arturo