<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">On July 25, 2017 at 2:46:07 AM, David Fifield (<a href="mailto:david@bamsoftware.com">david@bamsoftware.com</a>) wrote:</div> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>I was experimenting with adapting ooni-sync to the /api/v1/measurements
<br>endpoint. A minimal proof of concept patch is attached. While trying it,
<br>I found that the API was returning duplicate measurements and
<br>measurements that don't seem to match the query. I'm using this command:
<br>      ./ooni-sync -xz -directory measurements.archive input=archive.org since=2017-01-01
<br>
<br>
<br></div></div></span></blockquote><div><br></div><div>Excellent, thanks for trying it out!</div><div><br></div><div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;"><span>Here is a query that at the moment happens to return two results with<span class="Apple-converted-space"> </span><br>the same measurement_id and measurement_url, but different input and<span class="Apple-converted-space"> </span><br>measurement_start_time. There are a few more example of this phenomenon<span class="Apple-converted-space"> </span><br>(I found it 6 times in the first 1000 measurements I downloaded).<span class="Apple-converted-space"> </span><br><br>https://measurements-beta.ooni.io/api/v1/measurements?input=archive.org&limit=100&offset=500&order=asc&order_by=measurement_start_time&since=2017-01-01<span class="Apple-converted-space"> </span><br>{<span class="Apple-converted-space"> </span><br>"input": "http://archive.org",<span class="Apple-converted-space"> </span><br>"measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",<span class="Apple-converted-space"> </span><br>"measurement_start_time": "2017-01-04T01:55:06Z",<span class="Apple-converted-space"> </span><br>"measurement_url": "https://measurements.ooni.torproject.org/api/v1/measurement/51daa51b-07d2-491e-ba2b-9189e1a08146",<span class="Apple-converted-space"> </span><br>"probe_asn": "AS3243",<span class="Apple-converted-space"> </span><br>"probe_cc": "PT",<span class="Apple-converted-space"> </span><br>"report_id": "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT",<span class="Apple-converted-space"> </span><br>"test_name": "web_connectivity"<span 
class="Apple-converted-space"> </span><br>},<span class="Apple-converted-space"> </span><br>{<span class="Apple-converted-space"> </span><br>"input": "http://wayback.archive.org",<span class="Apple-converted-space"> </span><br>"measurement_id": "51daa51b-07d2-491e-ba2b-9189e1a08146",<span class="Apple-converted-space"> </span><br>"measurement_start_time": "2017-01-04T09:19:42Z",<span class="Apple-converted-space"> </span><br>"measurement_url": "https://measurements.ooni.torproject.org/api/v1/measurement/51daa51b-07d2-491e-ba2b-9189e1a08146",<span class="Apple-converted-space"> </span><br>"probe_asn": "AS3243",<span class="Apple-converted-space"> </span><br>"probe_cc": "PT",<span class="Apple-converted-space"> </span><br>"report_id": "20170104T105911Z_AS3243_OadZCx9yRNvqKYsLQaQDa3c1swLofXEQNtcplXQ14QrXemKcCT",<span class="Apple-converted-space"> </span><br>"test_name": "web_connectivity"<span class="Apple-converted-space"> </span><br>},<span class="Apple-converted-space"> </span><br><br></span></blockquote><div><br></div><div><br></div><div>I have confirmed that this is in fact an issue (see: <a href="https://github.com/TheTorProject/ooni-pipeline/issues/54">https://github.com/TheTorProject/ooni-pipeline/issues/54</a>). We were already somewhat aware of it, but its full extent only became apparent after digging into it further; see: <a href="https://github.com/TheTorProject/ooni-pipeline/issues/70">https://github.com/TheTorProject/ooni-pipeline/issues/70</a>.</div><div><br></div><blockquote type="cite" class="clean_bq" style="font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;"><span><br>I also found that the results included some entries whose "input" field<span 
class="Apple-converted-space"> </span><br>didn't seem to match the query. Here is a small sample of them. So far<span class="Apple-converted-space"> </span><br>I've found 57/5783 (10%) of downloads whose input doesn't contain<span class="Apple-converted-space"> </span><br>"archive.org".<span class="Apple-converted-space"> </span><br><br>https://measurements-beta.ooni.io/api/v1/measurement/00868be9-2441-42fb-9691-95501d6b93df "http://www.imdb.com"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/0214f18c-058c-44ef-b291-9db88cc923dc "http://666games.net"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/03b771b2-9f2c-4eee-8835-5128bf9e7832 "http://www.cesr.org"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/0cc491bb-30a0-4dea-9271-1f6ba23c2b8a "http://adultfriendfinder.com"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/0fdfa57f-836f-4a75-8543-c8dcade5455a "http://last.fm"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/10f1f5ad-91c4-46f5-9d6a-38e455fd7158 "http://www.earthwatch.org"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/14520322-b00d-437c-be29-22dc9b2cdc75 "http://abpr2.railfan.net"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/1bb1aa36-f4fe-440c-be03-6029955c90ea "http://666games.net"<span class="Apple-converted-space"> </span><br>https://measurements-beta.ooni.io/api/v1/measurement/210703f1-e52c-4740-99b3-36c2db849cc1 "http://amphetamines.com"<span class="Apple-converted-space"> </span></span></blockquote></div><p>So the problem here is that since `measurement_id` is actually not unique, when you go to retrieve the individual measurement, you will be getting the first entry instead of the actual 
measurement you were searching for (see this bit of the code: <a href="https://github.com/TheTorProject/ooni-measurements/blob/master/measurements/api/measurements.py#L178">https://github.com/TheTorProject/ooni-measurements/blob/master/measurements/api/measurements.py#L178</a>).</p><p>I think the best thing to do at this point is to use some other key that points to the actual measurement you care about, and to expose that key in place of the `measurement_id` in the top-level `/measurements` API search endpoint.</p><p>~ Arturo</p></body></html>