[ooni] proposed specification for ooni backend component

6 Mar 2013

      Greetings!

It would be great to get some feedback on the specification of oonib, the
backend component of ooniprobe.

As noted later in the document, not everything described here is implemented
yet, but this document details what oonib should become.

Moreover, if there is something that you believe we should pay special
attention to when implementing this, it would be of great use to know.

If you wish to modify this spec directly, you can send us a pull request on GitHub:

https://github.com/TheTorProject/ooni-spec

-------------------------------------

# oonib specification

* version: 0.1
* date: 2013-03-04
* author: Arturo Filastò

This document aims at providing a functional specification of oonib. At the
time of writing this document not all parts are fully implemented, though the
application interface to oonib is.

# 1.0 System overview

oonib is the backend component of ooni. It is responsible for:

  * Collecting the results of tests from ooni-probes (Collector).

  * Exposing a set of services that are needed for allowing ooni-probes to
    perform their tests (Test Helpers).

# 2.0 Collector

## 2.1 System overview

The oonib collector exposes a JSON-RPC-like HTTP interface and allows probes to
submit the results of their measurements. Once the probe's measurement is
complete, the test result is published as an ooni test report in the YAML format.

The oonib collector shall be exposed as a Tor Hidden Service and as an HTTPS
service.

## 2.2 Threat model

The collector shall provide end-to-end confidentiality and authentication
between the probe and the oonib collector.

A malicious actor should not be able to update test results that they have not
created.

It is outside the scope of the oonib collector to provide blocking resistance
or to conceal from a passive network observer the fact that a probe is
communicating with a collector.
Such properties are to be provided by other software components, for example
Tor or obfsproxy.

## 2.3 Test result submission interface

Unless otherwise stated all of the network operations below can be performed
either via HTTPS or HTTPO (HTTP over Tor Hidden Service).

Note: we will eventually want to migrate to YAML instead of JSON as the data
exchange format. Embedding YAML data inside JSON data adds unnecessary
overhead.

### 2.3.1 Create a new report

When a probe starts a test it will *create* a new report with the oonib
collector backend.
The HTTP request it performs is:

`POST /report`

    {

     'software_name':
        `string` the name of the software that is creating a report (ex. "ooni-probe")

     'software_version':
        `string` the version of the software creating the report (ex. "0.0.10-beta")

     'probe_asn':
        `string` the Autonomous System Number of the network the test is
          related to prefixed by "AS" (ex. "AS1234")

     'test_name':
        `string` the name of the test performing the network measurement. In
          the case of ooni-probe this is the test filename without the ".py"
          extension.

     'test_version':
        `string` the version of the test performing the network measurement.

     'content':
        (optional) `string` it is optionally possible to create a report with
          already some data inside of it.

     'probe_ip':
        (optional) `string` the IP Address of the ooniprobe client. When the
          test requires a test_helper the probe should inform oonib of its IP
          address. We need to know this since we are not sure if the probe is
          accessing the report collector via Tor or not.
     }

The server will respond with the version of the backend software and a report
identifier that will then allow the probe to update the report, as follows:

    {

      'backend_version':
        `string` containing the version of the backend

      'report_id':
        `string` report identifier of the format detailed below.

      'test_helper_address':
        `string` the address of a test helper for the requested test.

    }
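
As an illustration of the exchange above, here is a minimal sketch of how a
client might build the `POST /report` body and read the reply. The function
names are hypothetical and not part of oonib or ooniprobe. Treating
`test_helper_address` as possibly absent from the response is an assumption;
the spec does not say whether it is always present.

```python
import json

REQUIRED_FIELDS = ("software_name", "software_version", "probe_asn",
                   "test_name", "test_version")
OPTIONAL_FIELDS = ("content", "probe_ip")


def build_create_report_body(**fields):
    """Serialize the JSON body for `POST /report` (hypothetical helper)."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing required fields: %s" % ", ".join(missing))
    unknown = set(fields) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError("unknown fields: %s" % ", ".join(sorted(unknown)))
    if not fields["probe_asn"].startswith("AS"):
        raise ValueError('probe_asn must be prefixed by "AS"')
    return json.dumps(fields)


def parse_create_report_response(raw):
    """Return (backend_version, report_id, test_helper_address)."""
    response = json.loads(raw)
    # Assumption: test_helper_address may be missing, so use .get().
    return (response["backend_version"],
            response["report_id"],
            response.get("test_helper_address"))


body = build_create_report_body(
    software_name="ooni-probe",
    software_version="0.0.10-beta",
    probe_asn="AS1234",
    test_name="http_test",
    test_version="0.1",
)
```

The resulting `body` string is what would be sent as the request payload over
HTTPS or HTTPO.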

The report identifier allows the probe to update the report and it will be
constructed as follows:

  ISO 8601 timestamp + '_' + probe ASN + '_' + 50 mixed lowercase and uppercase characters

A report identifier can be for example:

  1912-06-23T101234Z_AS1234_ihvhaeZDcBpDwYTbmdyqZSLCDNuJSQuoOdJMciiQWwAWUKJwmR

Note:
The report identifier should be at least 256 bits and generated by means of a
CSPRNG.
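
A minimal sketch of generating an identifier in the format above, using
Python's `secrets` module as the CSPRNG. The function name
`generate_report_id` is hypothetical. Fifty characters drawn uniformly from
the 52 ASCII letters carry roughly 285 bits of randomness, which satisfies the
256 bit requirement.

```python
import secrets
import string
from datetime import datetime, timezone


def generate_report_id(probe_asn, now=None):
    """Build '<timestamp>_<ASN>_<50 random letters>' (hypothetical helper).

    `now` must be a datetime expressed in UTC; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H%M%SZ")
    # secrets.choice draws from the operating system's CSPRNG.
    random_part = "".join(
        secrets.choice(string.ascii_letters) for _ in range(50)
    )
    return "%s_%s_%s" % (timestamp, probe_asn, random_part)


example_id = generate_report_id(
    "AS1234", datetime(2013, 3, 4, 10, 12, 34, tzinfo=timezone.utc)
)
```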

Client implementation notes:
Probes should not expect the report identifier to be in a particular format as
the report id may be changed in the future.

### 2.3.2 Update a report

Once the probe has a report ID it will be able to add test related content to
the report by referencing it by id:

`PUT /report`

    {

    'report_id':
      `string` the report identifier

    'content':
      `string` content to be added to the report. This can be one or more
        report entries in the format specified in df-000-base.md

    }

The backend should validate the request to make sure it is a valid YAML Stream.

New collectors should use the following format for updating reports:

`POST /report/<report_id>`

    {

    'content':
      `string` content to be added to the report. This can be one or more
        report entries in the format specified in df-000-base.md

    }
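
The two update formats above can be sketched as a single helper that returns
the HTTP method, path and JSON body for a given report. The name
`build_update_request` and its `legacy` flag are hypothetical, introduced only
for this example.

```python
import json
from urllib.parse import quote


def build_update_request(report_id, content, legacy=False):
    """Return (method, path, body) for a report update (hypothetical helper).

    legacy=True produces the `PUT /report` form, otherwise the
    `POST /report/<report_id>` form is used.
    """
    if legacy:
        body = json.dumps({"report_id": report_id, "content": content})
        return "PUT", "/report", body
    # Percent-encode the identifier so it is always a single path segment.
    path = "/report/%s" % quote(report_id, safe="")
    return "POST", path, json.dumps({"content": content})


method, path, body = build_update_request(
    "2013-03-04T101234Z_AS1234_abc", "---\nfoo: bar\n...\n"
)
```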

### 2.3.3 Closing a report

This request is made by a probe to tell the backend that it has finished
running the test and the report can be considered done:

`POST /report/<report_id>/close`

To create a new report a probe will perform an HTTP POST request to the resource
/report.

The collector MUST implement the following HTTP JSON-RPC-like API:

    /report

## 2.4 Report lifecycle

When a report is created (section 2.3.1) it should be marked as NEW and a
timestamp should be recorded.

Once it is updated (section 2.3.2) it should be marked as ACTIVE. An ACTIVE
report can be updated with new data. The collector must keep track of the last
time a certain report is updated with new and valid data.

A report is considered CLOSED when either the probe instructs the collector to
close the report (section 2.3.3) or when a report is in ACTIVE state and has
not been updated with valid data for more than 2 hours.
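
The lifecycle above can be sketched as a small state machine. The `Report`
class and its method names are hypothetical, and rejecting updates to a CLOSED
report is an assumption that the text does not state explicitly. Callers pass
the current time so that the timeout logic can be exercised deterministically.

```python
from datetime import datetime, timedelta, timezone

NEW, ACTIVE, CLOSED = "NEW", "ACTIVE", "CLOSED"
ACTIVE_TIMEOUT = timedelta(hours=2)


class Report:
    """Tracks the state of a single report (hypothetical class)."""

    def __init__(self, report_id, now):
        self.report_id = report_id
        self.state = NEW
        self.created_at = now
        self.last_update = None

    def expire(self, now):
        """Close an ACTIVE report with no valid update for over 2 hours."""
        if self.state == ACTIVE and now - self.last_update > ACTIVE_TIMEOUT:
            self.state = CLOSED

    def update(self, now):
        """Record an update that carried new and valid data."""
        self.expire(now)
        if self.state == CLOSED:
            # Assumption: a closed report no longer accepts data.
            raise ValueError("report is closed")
        self.state = ACTIVE
        self.last_update = now

    def close(self):
        """Close the report on request of the probe."""
        self.state = CLOSED


start = datetime(2013, 3, 4, 12, 0, tzinfo=timezone.utc)
report = Report("example", start)
report.update(start + timedelta(minutes=5))
report.expire(start + timedelta(hours=3))
```

After the last call the report is CLOSED, because its only update happened
more than 2 hours earlier.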

## 2.5 Report publishing and cleaning

Once a report is closed it should be made available to the public for download
and analysis. This shall happen as soon as a report reaches the CLOSED state.

Reports should be discarded and deleted if:

  * They have been in the NEW state for more than 4 hours

  * They are CLOSED, but contain only one Report Entry meaning that the report
    entry contains only the base test data (see df-000-base.md).
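
The two discard rules above can be sketched as a single predicate. The
function name `should_discard` and its parameters are hypothetical;
`entry_count` is assumed to be the number of Report Entries stored for the
report.

```python
from datetime import datetime, timedelta, timezone

NEW_TIMEOUT = timedelta(hours=4)


def should_discard(state, created_at, entry_count, now):
    """Return True if the report should be deleted (hypothetical helper)."""
    if state == "NEW":
        # Never updated: discard once it is more than 4 hours old.
        return now - created_at > NEW_TIMEOUT
    if state == "CLOSED":
        # Only the base test data entry is present.
        return entry_count == 1
    return False


created = datetime(2013, 3, 4, 12, 0, tzinfo=timezone.utc)
stale_new = should_discard("NEW", created, 0, created + timedelta(hours=5))
```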

Reports should be published to:

`https://ooni.torproject.org/reports/` **reportFormatVersion** `/` **CC** `/`

Where CC is the two-letter country code as specified by ISO 3166-1 alpha-2.

Requesting such a URL may also result in a 302 redirect to the location of
reports for that specific country.

For example, the reports for Italy (CC is IT) of reportFormatVersion 0.1 may be
found at:

https://ooni.torproject.org/reports/0.1/IT/

This directory shall contain the various reports for the test using the
following convention:

test name - timestamp in ISO8601 format - probe AS number - probe|backend.yamloo

The timestamp is expressed using ISO 8601 including seconds and with no : to
delimit hours, minutes, and seconds.

This timestamp is the time at which the report was created and must be set by
the backend.

Like so:

YEAR - MONTH - DAY T HOURS MINUTES SECONDS Z

The time is always expressed in UTC.
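
A minimal sketch of the naming convention above. The function name
`report_filename` is hypothetical, `created_at` is assumed to be a
timezone-aware datetime, and collision handling is not covered here.

```python
from datetime import datetime, timezone


def report_filename(test_name, created_at, probe_asn, origin="probe"):
    """Build '<test name>-<timestamp>-<ASN>-<probe|backend>.yamloo'."""
    if origin not in ("probe", "backend"):
        raise ValueError("origin must be 'probe' or 'backend'")
    # Convert to UTC, then format without ':' between hours, minutes, seconds.
    timestamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return "%s-%s-%s-%s.yamloo" % (test_name, timestamp, probe_asn, origin)


name = report_filename(
    "two_way_traceroute",
    datetime(2012, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "AS3",
)
```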

If a collision is detected then an integer (starting with 1) will be appended
to the test.

For example, two reports created on the first of January 2012 at noon (UTC)
sharp from MIT (AS3) will be stored here:

https://ooni.torproject.org/reports/0.1/US/http_test-2012-01-01T120000Z-AS3-...
https://ooni.torproject.org/reports/0.1/US/http_test-2012-01-01T120000Z-AS3-...

Implementation notes:
The task of publishing a report should be made modular so that we can replace
the publishing mechanism if we run into the limitations of this system (disk
space is not infinite).

The basic implementation shall just scp the files to a machine that is
specified in the configuration file.

?? Question:
How does this integrate into the m-lab infrastructure?

How will this work when some report results are stored on m-lab and some are
not?

## 2.6 Test helper collector relationship

Some tests also add to the report data that is collected by means of a test
helper. This is the case in the two way traceroute test, where the report
should also include a traceroute from the vantage point of the ooni backend.

In these circumstances the report from the vantage point of the backend should
be inside a separate file that has the same name as the probe report, but with
"-probe" replaced by "-backend".

For example the backend part of the report for a traceroute test called
`two_way_traceroute-2012-01-01T120000Z-AS3-probe.yamloo`, shall be called

`two_way_traceroute-2012-01-01T120000Z-AS3-backend.yamloo`
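
The renaming rule above can be sketched as follows. The function name
`backend_report_name` is hypothetical. Only the trailing "-probe" is replaced,
so a test name that happens to contain "-probe" is left untouched.

```python
def backend_report_name(probe_report_name):
    """Derive the backend report file name from the probe report file name."""
    suffix = "-probe.yamloo"
    if not probe_report_name.endswith(suffix):
        raise ValueError("not a probe report name: %r" % probe_report_name)
    return probe_report_name[:-len(suffix)] + "-backend.yamloo"


backend_name = backend_report_name(
    "two_way_traceroute-2012-01-01T120000Z-AS3-probe.yamloo"
)
```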

# 3.0 Test Helpers

These are services exposed to ooniprobe clients that assist in performing
network measurements.

## 3.1 System overview

By default, probes receive the address of the test helper running on the same
machine as the collector.

They are pointed to a test helper on a different machine only if the test
helper for the test the user wants to run is not available on that machine.

This can happen, for example, if the machine does not have two network
interfaces with two different IP addresses. In this case the HTTP Return JSON
Headers test helper cannot run at the same time as the TCP echo test helper
(both bind to port 80).

Some test helpers require communication with the collector. This is the case
for the two way traceroute test, where a multiport multiprotocol traceroute
must also be performed from the backend to the probe.

Test helpers are of two kinds:

  * Reply: test helpers that reply to requests from probes. For example, HTTP
    test helpers are of this kind.

  * Active: test helpers that actively perform requests towards the probe
    independently of probe requests.

Implementation notes:
Although the collector and the test helpers are described here as two different
software components, they both run inside the same process and are part of the
same piece of software.

## 3.2 Threat model (or non-goals)

Because of the nature of the services that they are exposing it is not possible
to guarantee end to end confidentiality and authentication of the data
transmitted to and from test helpers.

Moreover we are currently not making any particular effort to make test helpers
look like something that they are not (i.e. make test helper traffic not look
like test helper traffic).

## 3.3 Test Helper collector mapping

When a report that requires a test helper is created with the collector
component of oonib, the test helper should be notified of the probe's IP
address (the probe_ip).

This allows the test helper to know either from which IP address it should
expect the probe to connect or towards which IP address it should perform an
active measurement.

## 3.4 Reply Test Helpers

### 3.4.1 HTTP Return JSON Headers

This test helper will bind on port 80 and expect HTTP requests from a probe. It
shall respond to every HTTP request with the HTTP Headers and HTTP request line
as seen from the backend point of view.

The response is structured in JSON as follows:

    {

        'request_headers':
            [[HTTP header1 name, HTTP header1 value], [HTTP header2 name, HTTP header2 value]]
            the list is ordered based on how the headers were received.

        'request_line':
            the value of the HTTP request line.

        'headers_dict':
            `dict` containing as keys the HTTP header name (normalized) and as
            value a list containing the values of such header

    }

For example:

    {

        "request_headers":
            [["User-Agent", "IE6"], ["Content-Length", 200]],

        "request_line":
            "GET / HTTP/1.1",

        "headers_dict":
            {"User-Agent": ["IE6"], "Content-Length": [200]}
    }
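
A minimal sketch of how such a response could be built from the raw bytes of
an HTTP request. The function name `build_headers_response` is hypothetical.
The spec does not define how header names are normalized, so capitalizing each
dash-separated part is an assumption, as is keeping every header value as a
string.

```python
import json


def build_headers_response(raw_request):
    """Return the JSON response for a raw HTTP request (hypothetical helper)."""
    # Keep only the request line and headers, dropping any body.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    request_line = lines[0]
    request_headers = []
    headers_dict = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        value = value.strip()
        # Preserve the order and spelling in which headers were received.
        request_headers.append([name, value])
        # Assumption: normalize 'user-agent' to 'User-Agent'.
        normalized = "-".join(part.capitalize() for part in name.split("-"))
        headers_dict.setdefault(normalized, []).append(value)
    return json.dumps({
        "request_headers": request_headers,
        "request_line": request_line,
        "headers_dict": headers_dict,
    })


response = json.loads(build_headers_response(
    b"GET / HTTP/1.1\r\nuser-agent: IE6\r\nContent-Length: 200\r\n\r\n"
))
```

Comparing `request_headers` against what the probe actually sent lets the
probe detect headers that were reordered, rewritten or added in transit.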

### 3.4.2 DNS Test Helper

This test helper shall provide a DNS resolver over UDP and TCP exposed on port
53. This DNS resolver shall not filter any DNS query.

### 3.4.3 TCP Echo Helper

This shall expose a TCP echo service that is bound to port 80.

## 3.5 Active Test Helpers

### 3.5.1 Two way traceroute

This shall perform the ooniprobe traceroute test and attach the result to the
final report as described in section 2.6.

Arturo Filastò
