[ooni-dev] test_start_time in JSON reports

Arturo Filastò art at torproject.org
Thu Mar 17 13:33:03 UTC 2016

On Mar 17, 2016, at 07:28, David Fifield <david at bamsoftware.com> wrote:
> The YAML reports had two time fields:
> 	start_time: timestamp of the start of ooni-probe run
> 	test_start_time: timestamp of the start of each individual test
> Within a single report file, start_time was constant, while
> test_start_time would advance with each successive test, depending on
> how long each test took to run.
> The JSON format reports have just one of the fields, test_start_time,
> but it confusingly appears to have the same meaning as start_time in the
> old YAML reports (it doesn't change within a report file):
> 	test_start_time: timestamp of the start of ooni-probe run
> It might be because of this code:
> https://github.com/TheTorProject/ooni-pipeline/blob/355ac1780f1f05eefb9ea3bf5b5c0148904e888c/pipeline/batch/daily_workflow.py#L521
>        entry['test_start_time'] = datetime.fromtimestamp(entry.pop('start_time',
>                                        0)).strftime("%Y-%m-%d %H:%M:%S")

Gosh, you are right. This is a pretty serious bug in the data pipeline.

The goal there was to retain only the ‘test_start_time’ field and drop
‘start_time’, since the latter is no longer relevant now that the discrete
unit is a single measurement.

Since losing information is not ideal, I will add both of them back
in the final JSONs.

Since we currently use test_start_time in a lot of the database views for
generating aggregates, the most straightforward thing to do is to add
another field called “measurement_start_time” that will hold the value
of what used to be “test_start_time”, while “test_start_time” will continue
to mean what was previously “start_time”.
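
A minimal sketch of that mapping, not the actual pipeline code. It assumes
both source fields arrive as epoch seconds, that timestamps are rendered in
UTC, and that a missing test_start_time falls back to start_time; the helper
names add_time_fields and format_timestamp are hypothetical.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_timestamp(epoch_seconds):
    # Assumption: render in UTC so the output does not depend on the host's timezone.
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(TIME_FORMAT)


def add_time_fields(entry):
    # Hypothetical helper: keeps both timestamps instead of overwriting one.
    # 'start_time' is the start of the whole ooni-probe run (constant per report).
    report_start = entry.pop("start_time", 0)
    # The old per-test 'test_start_time' becomes 'measurement_start_time'.
    # Assumption: fall back to the report start if the per-test value is missing.
    measurement_start = entry.pop("test_start_time", report_start)
    entry["test_start_time"] = format_timestamp(report_start)
    entry["measurement_start_time"] = format_timestamp(measurement_start)
    return entry
```

With this, test_start_time stays constant within a report file, so the
existing database views keep working, while measurement_start_time advances
with each measurement.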

> Some tests can take many minutes or hours to run, so by the end, the
> JSON test_start_time might be far off from the real time when it was
> run. Is there a way we could get both timestamps in each record again?
> What I'm doing now is incrementing a counter according to the
> test_runtime field of each record, and adding that counter to the
> test_start_time in order to estimate the individual test's start time.
> But I feel that is only approximate, and some older reports do not have
> test_runtime.
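
For reference, the workaround described above could look roughly like the
sketch below. It assumes the records of one report are processed in order,
that test_start_time is a "%Y-%m-%d %H:%M:%S" string, and that test_runtime
is in seconds; the function name estimate_measurement_starts is hypothetical.
Records without test_runtime add nothing to the counter, which is why the
result is only approximate.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def estimate_measurement_starts(records):
    # Hypothetical helper: estimate each measurement's start time by adding
    # the accumulated runtime of the preceding records to the report start.
    elapsed = timedelta(0)
    estimates = []
    for record in records:
        report_start = datetime.strptime(record["test_start_time"], TIME_FORMAT)
        estimates.append(report_start + elapsed)
        runtime = record.get("test_runtime")
        if runtime is not None:
            # Older reports may lack test_runtime; those records are skipped here.
            elapsed += timedelta(seconds=runtime)
    return estimates
```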

I will re-run the pipeline on all historical data to repopulate these fields
according to the changes described above.

~ Arturo
