[ooni-dev] ooni-pipeline using local time zone when parsing timestamps

Arturo Filastò art at torproject.org
Thu Mar 17 13:54:49 UTC 2016


So I looked a bit into this matter and it seems like the root of the problem is actually in ooni-probe:

The problem is this function here:
https://github.com/TheTorProject/ooni-probe/blob/master/ooni/otime.py#L87

that is called with the result of time.time(). Since time.time() should return a
seconds since UTC Epoch, the result of then converting it again with utcfromtimestamp()
leads it to actually being converted back into the local timezone of the probe.

This issue seems like something much harder to fix in historical data, but
I am going to include a fix for this in versions 1.4.0 of ooni-probe together
with the rest of the JSON data format changes.

~ Arturo

On Mar 17, 2016, at 07:04, David Fifield <david at bamsoftware.com> wrote:
> 
> Parts of the ooni-pipeline are using datetime.fromtimestamp instead of
> datetime.utcfromtimestamp. It means the timestamp will be parsed
> differently according to the time zone in which the code is run:
> 
> https://github.com/TheTorProject/ooni-pipeline/blob/355ac1780f1f05eefb9ea3bf5b5c0148904e888c/pipeline/batch/daily_workflow.py#L521
>        entry['test_start_time'] = datetime.fromtimestamp(entry.pop('start_time',
>                                        0)).strftime("%Y-%m-%d %H:%M:%S")
> 
> https://github.com/TheTorProject/ooni-pipeline/blob/355ac1780f1f05eefb9ea3bf5b5c0148904e888c/pipeline/batch/daily_workflow.py#L583
>        start_time = datetime.fromtimestamp(header.get('start_time', 0))
> 
> Now, it appears that there's no actual harm done by this bug, because it
> looks like the timestamps were read and written by someone in UTC+1, so
> the conversion evens out in the final JSON. But it could go wrong if run
> by someone in a different time zone.
> 
> I noticed this because when I process the new JSON files, I get
> timestamps that are an hour different than I got parsing the YAML files.
> To parse the JSON times, I use datetime.datetime.strptime(..., "%Y-%m-%d %H:%M:%S"),
> which gives the correct answer because of the reason in the previous
> paragraph. My old YAML times were off by an hour, because I used
> datetime.datetime.utcfromtimestamp(...), parsing a UTC+1 timestamp as if
> it were UTC.
> 
> 
> Here's some related information I found while researching the above. It
> looks like individual test results are *also* using local timestamps, or
> have their clocks set wrong, or something. Because when I compare the
> "test_start_time" field to the "Date" header (which is always supposed
> to be UTC) in a response to an http_requests test, I get some widely
> divergent values (I saw up to around 9 hours).
> 
> For example consider
> https://measurements.ooni.torproject.org/2015-01-01/20150101T060029Z-BE-AS29182-http_requests-no_report_id-0.1.0-probe.json
> It has a difference of about 6 hours:
> 	{
> 		"test_start_time":"2015-01-01 07:00:29",
> 		"test_keys": {
> 			"requests": [
> 				{
> 					"response": {
> 						"headers": {
> 							"Date": "Thu, 01 Jan 2015 01:00:35 GMT"
> 							...
> 						}
> 					}
> 				}
> 			]
> 		},
> 		probe_cc: "BE"
> 	}
> probe_cc is "BE", Belgium, which is supposed to be UTC+1, so I don't
> know where the 6-hour discrepancy is coming from. Note also that the
> timestamp in the file name is not the same: 060029.
> 
> Here's a summary of the http_requests reports from 2015-01-01. Most of
> them match, but some are off by 6 or 1 hour:
> 
> CC: CH  test_start_time: 2015-01-01 03:00:28  Date: Thu, 01 Jan 2015 02:59:14 GMT  +0
> CC: BE  test_start_time: 2015-01-01 07:00:29  Date: Thu, 01 Jan 2015 01:00:35 GMT  +6
> CC: FR  test_start_time: 2015-01-01 09:41:08  Date: Thu, 01 Jan 2015 09:41:14 GMT  +0
> CC: FR  test_start_time: 2015-01-01 10:47:20  Date: Thu, 01 Jan 2015 10:47:23 GMT  +0
> CC: NL  test_start_time: 2015-01-01 12:42:16  Date: Thu, 01 Jan 2015 12:42:21 GMT  +0
> CC: NL  test_start_time: 2015-01-01 15:41:32  Date: Thu, 01 Jan 2015 15:41:44 GMT  +0
> CC: CZ  test_start_time: 2015-01-01 15:46:27  Date: Thu, 01 Jan 2015 15:46:34 GMT  +0
> CC: NL  test_start_time: 2015-01-01 17:19:39  Date: Thu, 01 Jan 2015 17:19:43 GMT  +0
> CC: HK  test_start_time: 2015-01-01 18:03:57  Date: Thu, 01 Jan 2015 17:04:10 GMT  +1
> CC: NL  test_start_time: 2015-01-01 21:47:32  Date: Thu, 01 Jan 2015 21:47:37 GMT  +0
> 
> I used this script:
> 
> import json
> import sys
> for line in sys.stdin:
>    doc = json.loads(line)
>    try:
>        probe_cc = doc["probe_cc"]
>        test_start_time = doc["test_start_time"]
>        for request in doc["test_keys"]["requests"]:
>            response_time = request["response"]["headers"]["Date"]
>            print "CC: %s  test_start_time: %s  Date: %s" % (probe_cc, test_start_time, response_time)
>    except KeyError:
>        pass
> _______________________________________________
> ooni-dev mailing list
> ooni-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 236 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.torproject.org/pipermail/ooni-dev/attachments/20160317/ebb88b10/attachment.sig>


More information about the ooni-dev mailing list