Greetings!
It would be great to get some feedback on the specification of oonib, the backend component of ooniprobe.
As noted below, not everything described here is implemented, but this document details what oonib should become.
Moreover, if there is something that you believe we should pay special attention to when implementing this, it would be very useful to know.
If you wish to modify this spec directly you can send us a pull request on github:
https://github.com/TheTorProject/ooni-spec
-------------------------------------
# oonib specification
* version: 0.1
* date: 2013-03-04
* author: Arturo Filastò
This document aims at providing a functional specification of oonib. At the time of writing this document not all parts are fully implemented, though the application interface to oonib is.
# 1.0 System overview
oonib is the backend component of ooni. It is responsible for:
* Collecting the results of tests from ooni-probes (Collector).
* Exposing a set of services that are needed for allowing ooni-probes to perform their tests (Test Helpers).
# 2.0 Collector
## 2.1 System overview
The oonib collector exposes a JSON RPC like HTTP interface that allows probes to submit the results of their measurements. Once a probe's measurement is complete, the test result is published as an ooni test report in the YAML format.
The oonib collector shall be exposed as a Tor Hidden Service and as a HTTPS service.
## 2.2 Threat model
The collector shall provide end-to-end confidentiality and authentication between the probe and the oonib collector.
A malicious actor should not be able to update test results that they have not created.
It is outside the scope of the oonib collector to provide blocking resistance or to conceal from a passive network observer the fact that a probe is communicating with a collector. Such properties are to be provided by other software components, for example Tor or obfsproxy.
## 2.3 Test result submission interface
Unless otherwise stated all of the network operations below can be performed either via HTTPS or HTTPO (HTTP over Tor Hidden Service).
Note: we will eventually want to migrate to YAML instead of JSON as the data exchange format. Until then, embedding YAML data inside JSON data adds unnecessary overhead.
### 2.3.1 Create a new report
When a probe starts a test it will *create* a new report with the oonib collector backend. The HTTP request it performs is:
`POST /report`
{
'software_name': `string` the name of the software that is creating a report (ex. "ooni-probe")
'software_version': `string` the version of the software creating the report (ex. "0.0.10-beta")
'probe_asn': `string` the Autonomous System Number of the network the test is related to prefixed by "AS" (ex. "AS1234")
'test_name': `string` the name of the test performing the network measurement. In the case of ooni-probe this is the test filename without the ".py" extension.
'test_version': `string` the version of the test performing the network measurement.
'content': (optional) `string` it is optionally possible to create a report with already some data inside of it.
'probe_ip': (optional) `string` the IP Address of the ooniprobe client. When the test requires a test_helper the probe should inform oonib of its IP address. We need to know this since we are not sure if the probe is accessing the report collector via Tor or not.
}
The server will respond with the version of the backend software and a report identifier that the probe will then use to update the report, as follows:
{
'backend_version': `string` containing the version of the backend
'report_id': `string` report identifier of the format detailed below.
'test_helper_address': `string` the address of a test helper for the requested test.
}
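As a rough illustration of the exchange above, the following Python sketch builds the JSON body for `POST /report` and reads the fields of the response. The function names `build_create_report_body` and `parse_create_report_response` are hypothetical and not part of oonib, no network request is made, and treating `test_helper_address` as possibly absent is an assumption.

```python
import json


def build_create_report_body(software_name, software_version, probe_asn,
                             test_name, test_version,
                             content=None, probe_ip=None):
    """Return the JSON body for `POST /report` as a string."""
    if not probe_asn.startswith("AS"):
        raise ValueError("probe_asn must be prefixed by 'AS', e.g. 'AS1234'")
    body = {
        "software_name": software_name,
        "software_version": software_version,
        "probe_asn": probe_asn,
        "test_name": test_name,
        "test_version": test_version,
    }
    # Optional fields are only included when the probe supplies them.
    if content is not None:
        body["content"] = content
    if probe_ip is not None:
        body["probe_ip"] = probe_ip
    return json.dumps(body)


def parse_create_report_response(raw):
    """Return (backend_version, report_id, test_helper_address)."""
    data = json.loads(raw)
    # Assumption: the helper address may be missing when no helper is needed.
    return (data["backend_version"], data["report_id"],
            data.get("test_helper_address"))
```

The probe would keep the returned `report_id` and use it for every later update of the same report.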
The report identifier allows the probe to update the report and it will be constructed as follows:
ISO 8601 timestamp + '_' + probe ASN + '_' + 50 mixed lowercase uppercase characters
A report identifier can be for example:
1912-06-23T101234Z_AS1234_ihvhaeZDcBpDwYTbmdyqZSLCDNuJSQuoOdJMciiQWwAWUKJwmR
Note: The report identifier should be at least 256 bits and generated by means of a CSPRNG.
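A minimal sketch of how such an identifier could be generated, assuming Python's `secrets` module as the CSPRNG. The function name `generate_report_id` is hypothetical. Fifty characters drawn uniformly from the 52 ASCII letters carry roughly 285 bits, which satisfies the 256 bit minimum.

```python
import secrets
import string
from datetime import datetime, timezone


def generate_report_id(probe_asn, now=None):
    """Build: ISO 8601 timestamp + '_' + probe ASN + '_' + 50 random letters."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Express the (timezone-aware) time in UTC, without ':' separators.
    timestamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    # secrets draws from the operating system's CSPRNG.
    random_part = "".join(secrets.choice(string.ascii_letters)
                          for _ in range(50))
    return f"{timestamp}_{probe_asn}_{random_part}"
```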
Client implementation notes: Probes should not expect the report identifier to be in a particular format as the report id may be changed in the future.
### 2.3.2 Update a report
Once the probe has a report ID it can add test related content to the report by referencing it by ID:
`PUT /report`
{
'report_id': `string` the report identifier
'content': `string` content to be added to the report. This can be one or more report entries in the format specified in df-000-base.md
}
The backend should validate the request to make sure it is a valid YAML Stream.
New collectors should use the following format for updating reports:
`POST /report/<report_id>`
{
'content': `string` content to be added to the report. This can be one or more report entries in the format specified in df-000-base.md
}
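The two update forms differ only in where the report ID travels. The sketch below, with the hypothetical helper `build_update_request`, returns the method, path and JSON body for either form without performing any network operation. Percent-encoding the ID in the path is an assumption made here because probes should not rely on the ID format.

```python
import json
from urllib.parse import quote


def build_update_request(report_id, content, legacy=False):
    """Return (method, path, body) for adding entries to a report."""
    if legacy:
        # Older form: the report ID travels inside the JSON body.
        body = json.dumps({"report_id": report_id, "content": content})
        return "PUT", "/report", body
    # Newer form: the report ID is part of the resource path.
    path = "/report/" + quote(report_id, safe="")
    return "POST", path, json.dumps({"content": content})
```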
### 2.3.3 Closing a report
A probe sends this request to tell the backend that it has finished running the test and that the report can be considered done:
`POST /report/<report_id>/close`
To create a new report, a probe will perform an HTTP POST request to the resource /report.
The collector MUST implement the following HTTP JSON RPC like API:
/report
## 2.4 Report lifecycle
When a report is created (section 2.3.1) it should be marked as NEW and a timestamp should be recorded.
Once it is updated (section 2.3.2) it should be marked as ACTIVE. An ACTIVE report can be updated with new data. The collector must keep track of the last time a certain report is updated with new and valid data.
A report is considered CLOSED when either the probe instructs the collector to close the report (section 2.3.3) or when a report is in ACTIVE state and has not been updated with valid data for more than 2 hours.
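The lifecycle above can be read as a small state machine. The following is a sketch under stated assumptions, not the oonib implementation: the class and method names are hypothetical, and rejecting updates to a CLOSED report is an assumption that the text implies but does not state.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

STALE_AFTER = timedelta(hours=2)


class State(Enum):
    NEW = "NEW"
    ACTIVE = "ACTIVE"
    CLOSED = "CLOSED"


class Report:
    def __init__(self, now):
        # A freshly created report is NEW and records its creation time.
        self.state = State.NEW
        self.created_at = now
        self.last_update = None

    def update(self, now):
        """Record a valid update; the report becomes (or stays) ACTIVE."""
        if self.state is State.CLOSED:
            raise ValueError("cannot update a CLOSED report")  # assumption
        self.state = State.ACTIVE
        self.last_update = now

    def close(self):
        """Explicit close requested by the probe."""
        self.state = State.CLOSED

    def expire_if_stale(self, now):
        """Close an ACTIVE report with no valid update for more than 2 hours."""
        if self.state is State.ACTIVE and now - self.last_update > STALE_AFTER:
            self.close()
```

A periodic task in the collector could call `expire_if_stale` on every ACTIVE report.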
## 2.5 Report publishing and cleaning
Once a report is closed it should be made available to the public for download and analysis. This shall happen as soon as a report reaches the CLOSED state.
Reports should be discarded and deleted if:
* They have been in the NEW state for more than 4 hours
* They are CLOSED, but contain only one Report Entry meaning that the report entry contains only the base test data (see df-000-base.md).
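The two discard rules can be expressed as a single predicate. This is a sketch only: `should_discard` and its parameters are hypothetical names, and the state is passed as a plain string for brevity.

```python
from datetime import datetime, timedelta, timezone

NEW_TIMEOUT = timedelta(hours=4)


def should_discard(state, created_at, entry_count, now):
    """Return True if the report should be discarded and deleted."""
    if state == "NEW":
        # Created but never updated for more than 4 hours.
        return now - created_at > NEW_TIMEOUT
    if state == "CLOSED":
        # Only the base test data entry is present.
        return entry_count == 1
    return False
```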
Reports should be published to:
`https://ooni.torproject.org/reports/` **reportFormatVersion** `/` **CC** `/`
Requesting such a URL may also result in a 302 redirect to the location of reports for that specific country.
Where CC is the two letter country code as specified by ISO 3166-1 alpha-2.
For example, the reports for Italy (CC is IT) with reportFormatVersion 0.1 may be found at:
https://ooni.torproject.org/reports/0.1/IT/
This directory shall contain the various reports for the test using the following convention:
test name - timestamp in ISO8601 format - probe AS number - probe|backend.yamloo
The timestamp is expressed using ISO 8601 including seconds and with no : to delimit hours, minutes and seconds.
This timestamp is the time at which the report was created and must be set by the backend.
Like so:
YEAR - MONTH - DAY T HOURS MINUTES SECONDS Z
The time is always expressed in UTC.
If a collision is detected then an int (starting with 1) will get appended to the test.
For example, two reports created on the first of January 2012 at noon (UTC) sharp from MIT (AS3) will be stored here:
https://ooni.torproject.org/reports/0.1/US/http_test-2012-01-01T120000Z-AS3-...
https://ooni.torproject.org/reports/0.1/US/http_test-2012-01-01T120000Z-AS3-...
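The naming convention can be sketched as a small function. The name `report_filename` is hypothetical, and the collision suffix is deliberately left out because its exact placement is not shown in the examples above.

```python
from datetime import datetime, timezone


def report_filename(test_name, created_at, probe_asn, origin="probe"):
    """Return: test name - timestamp - probe AS number - probe|backend.yamloo"""
    if origin not in ("probe", "backend"):
        raise ValueError("origin must be 'probe' or 'backend'")
    # created_at must be timezone-aware; it is always rendered in UTC.
    timestamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return f"{test_name}-{timestamp}-{probe_asn}-{origin}.yamloo"
```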
Implementation notes: The task of publishing a report should be made modular so that we can replace the publishing mechanism if we run into the limitations of this system (for example, finite disk space).
The basic implementation shall just scp the files to a machine that is configurable by config file.
?? Question: How does this integrate into the m-lab infrastructure?
How will this work when some report results are stored on m-lab and some are not?
## 2.6 Test helper collector relationship
Some tests also add to the report data that is collected by means of a test helper. This is the case for the two way traceroute test, where the report should also include a traceroute from the vantage point of the ooni backend.
In these circumstances the report from the vantage point of the backend should be placed in a separate file that has the same name as the probe report, with "-probe" replaced by "-backend".
For example the backend part of the report for a traceroute test called `two_way_traceroute-2012-01-01T120000Z-AS3-probe.yamloo`, shall be called
`two_way_traceroute-2012-01-01T120000Z-AS3-backend.yamloo`
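A short sketch of deriving the backend file name from the probe file name. The helper name `backend_report_name` is hypothetical. Only the trailing "-probe" is replaced, so a test name that happens to contain "-probe" is left alone.

```python
def backend_report_name(probe_report_name):
    """Derive the backend report file name from the probe report file name."""
    suffix = "-probe.yamloo"
    if not probe_report_name.endswith(suffix):
        raise ValueError("not a probe report file name")
    return probe_report_name[:-len(suffix)] + "-backend.yamloo"
```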
# 3.0 Test Helpers
These are services exposed to ooniprobe clients that assist in performing network measurements.
## 3.1 System overview
By default, probes will be given the address of the test helper running on the same machine as the collector.
They will only be pointed to a test helper on a different machine if the test helper for the test the user wants to run is not available on that machine.
This can happen, for example, if the machine does not have two network interfaces with two different IP addresses. In this case the HTTP Return JSON Headers test helper cannot run at the same time as the TCP echo test helper (both bind to port 80).
Some communication with the collector is required. This is the case of the two way traceroute test, where a multiport multiprotocol traceroute must also be performed from the backend to the probe.
Test helpers are of two kinds:
* Reply: are test helpers that reply to requests from probes. For example HTTP test helpers are of this kind.
* Active: are test helpers that actively perform requests towards the probe independently of probe requests.
Implementation notes: Although I am talking about the collector and the test helpers as two different software components, they both run inside the same process and are part of the same piece of software.
## 3.2 Threat model (or non-goals)
Because of the nature of the services that they are exposing it is not possible to guarantee end to end confidentiality and authentication of the data transmitted to and from test helpers.
Moreover we are currently not making any particular effort to make test helpers look like something that they are not (i.e. make test helper traffic not look like test helper traffic).
## 3.3 Test Helper collector mapping
When a report that requires a test helper is created with the collector component of oonib, the test helper should be notified of the probe's IP address (the probe_ip).
This allows the test helper to know either from which IP address it should expect the probe to connect or towards which IP address it should perform an active measurement.
## 3.4 Reply Test Helpers
### 3.4.1 HTTP Return JSON Headers
This test helper will bind on port 80 and expect HTTP requests from a probe. It shall respond to every HTTP request with the HTTP Headers and HTTP request line as seen from the backend point of view.
The response is structured in JSON as follows:
{
'request_headers': [[HTTP header1 name, HTTP header1 value], [HTTP header2 name, HTTP header2 value]] the list is ordered based on how the headers were received.
'request_line': the value of the HTTP request line.
'headers_dict': `dict` containing as keys the HTTP header name (normalized) and as value a list containing the values of such header
}
For example:
{
'request_headers': [['User-Agent', 'IE6'], ['Content-Length', 200]]
'request_line': 'GET / HTTP/1.1'
'headers_dict': {'User-Agent': ['IE6'], 'Content-Length': [200]}
}
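To make the response structure concrete, here is a sketch that builds it from the raw bytes of a request. The function name `describe_request` is hypothetical. The normalization rule (capitalizing each hyphen-separated part of the header name) is an assumption, since the text does not define it, and header values are kept as strings rather than converted to numbers.

```python
import json


def describe_request(raw_request):
    """Build the helper's JSON response from the raw bytes of an HTTP request."""
    # Only the request line and headers matter; any body is ignored.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    request_headers = []
    headers_dict = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        value = value.strip()
        # Keep names exactly as received, in the order they were received.
        request_headers.append([name, value])
        # Assumed normalization: capitalize each hyphen-separated part.
        key = "-".join(part.capitalize() for part in name.strip().split("-"))
        headers_dict.setdefault(key, []).append(value)
    return json.dumps({
        "request_headers": request_headers,
        "request_line": lines[0],
        "headers_dict": headers_dict,
    })
```

Comparing `request_headers` with what the probe actually sent is what lets a test detect header reordering or rewriting along the path.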
### 3.4.2 DNS Test Helper
Shall provide a DNS resolver over UDP and TCP exposed on port 53. Such DNS resolver shall not filter any DNS query.
### 3.4.3 TCP Echo Helper
This shall expose a TCP echo service that is bound to port 80.
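A minimal sketch of such an echo service using Python's standard `socketserver` module. The names `EchoHandler`, `start_echo_server` and `echo_once` are hypothetical, and the default port 0 (pick any free port) is used here only so the example runs without privileges. A deployment following this section would bind to port 80.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Send back every chunk exactly as received until the peer closes.
        while True:
            data = self.request.recv(4096)
            if not data:
                break
            self.request.sendall(data)


def start_echo_server(host="127.0.0.1", port=0):
    """Start the echo service in a background thread and return the server."""
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_once(address, payload):
    """Send payload to an echo service and return what comes back."""
    with socket.create_connection(address) as conn:
        conn.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = conn.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
```

A probe side check is then a comparison between the payload sent and the value returned by `echo_once`.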
## 3.5 Active Test Helpers
### 3.5.1 Two way traceroute
This shall perform the ooniprobe traceroute test and attach the result to the final report as described in section 2.5.