I would like to follow up on the discussion we had in Florence on some design choices behind OONIB.
In particular, the most controversial point was whether to use HTTP or rsync.
Before discussing the pros and cons of one choice over the other, it would be useful to frame exactly what the requirements for OONIB are.
# What is OONIB
OONIB is the backend of OONI. It will run mainly on one centralized machine and may at a later stage run distributed across multiple machines. Currently we have not thought about how to make it scale to being distributed, so we will look at it as if it were running on only one central machine.
It will be responsible for:
* Reporting
  a) Collecting reports from OONIProbes. Such reports are the results of tests.
  b) Collecting reports from third party censorship measurement tools (e.g. Bismark, NeuBot, etc.)
* Assistance in test running
  Certain OONI tests require a backend component (e.g. b0wser). On OONIB we will have the server-side component that will assist us in running the test.
  Note: Certain tests require the server to make connections to the client. This means that the client will need to ask the server to probe it.
* Control Channel Signaling
  This is required for some measurements, to verify that the test-specific data received by the backend matches the data sent by the client.
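
To make the control channel idea concrete, here is a minimal sketch of such a check. It assumes the client reports, over the control channel, a SHA-256 digest of what it sent on the test channel, and that the backend compares it with a digest of what the test helper actually received. The function names and the choice of SHA-256 are illustrative assumptions, not part of any agreed design.

```python
import hashlib
import hmac


def payload_digest(payload: bytes) -> str:
    """Hex SHA-256 digest of the bytes sent or received on the test channel."""
    return hashlib.sha256(payload).hexdigest()


def payload_was_modified(client_digest: str, received_payload: bytes) -> bool:
    """True if what the backend received differs from what the client says it sent.

    client_digest is assumed to arrive over the (authenticated) control channel.
    """
    server_digest = payload_digest(received_payload)
    # compare_digest does a constant-time comparison of the two hex strings.
    return not hmac.compare_digest(client_digest, server_digest)


# Hypothetical example: the request was rewritten somewhere on the path.
sent = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
tampered = sent.replace(b"example.com", b"example.org")
digest_from_client = payload_digest(sent)
```

Here `payload_was_modified(digest_from_client, sent)` is false, while the same call with `tampered` is true, which is the signal the measurement is after.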
# What properties we would like it to have
note: these are not ordered.
* Efficient even over high latency networks.
* Ease of integration for third party developers.
* Expandable to support requirements of new tests we develop.
* Anonymous
* Secure
# HTTP and rsync comparison
note: I will not deal with the security aspects of OONIB. We will assume that we have an encrypted and authenticated transport (this can be TLS, Tor Hidden Services, etc.)
## Rsync:
Pro:
* It supports good compression algorithms
* It's efficient and supports resume
* It does integrity checking on the uploaded files
Contra:
* It's designed only for copying files, which means we can't implement any more advanced API-like logic. [*]
* It's not supported by many languages (for example, in Python we only have an implementation of the rsync algorithm, not of the protocol [1])
* It's not as commonly used by other application developers that have similar requirements.
* Painful to do sanitization of the data sent by clients.
* Does not allow bidirectional communication (Request-Response pattern)
[*] I would like to be able to create a session ID for a specific test and to reference that ID when interacting with the test helpers. rsync is one-way: I push data to the server, but the server cannot signal back to me with any data. This largely impedes its usefulness as an API interface.
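
As a rough illustration of the request-response pattern I have in mind, here is a sketch with the transport left out: the client asks for a new report, the backend answers with an ID, and later requests reference that ID. The class, method names, and JSON fields are all hypothetical; a real backend would expose these as HTTP handlers.

```python
import json
import uuid


class ReportBackend:
    """In-memory stand-in for the backend (hypothetical, no network involved)."""

    def __init__(self):
        self.reports = {}

    def create_report(self, request_body: str) -> str:
        """Handle a 'create' request and answer with a fresh report ID."""
        request = json.loads(request_body)
        report_id = uuid.uuid4().hex
        self.reports[report_id] = {"test_name": request["test_name"], "entries": []}
        return json.dumps({"report_id": report_id})

    def update_report(self, request_body: str) -> str:
        """Append content to an existing report, referenced by its ID."""
        request = json.loads(request_body)
        report = self.reports.get(request["report_id"])
        if report is None:
            # The server can signal an error back, which a one-way push cannot do.
            return json.dumps({"error": "unknown report_id"})
        report["entries"].append(request["content"])
        return json.dumps({"status": "ok", "entries": len(report["entries"])})


backend = ReportBackend()
created = json.loads(backend.create_report(json.dumps({"test_name": "b0wser"})))
reply = json.loads(backend.update_report(
    json.dumps({"report_id": created["report_id"], "content": "first result"})
))
rejected = json.loads(backend.update_report(
    json.dumps({"report_id": "does-not-exist", "content": "x"})
))
```

The point is only that the second request depends on data returned by the first, which is what a pure file-copy protocol does not give us.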
## HTTP:
note: I am not necessarily talking only about HTTP; we could use any other protocol with similar properties (e.g. SPDY). I will discuss HTTP because it is the one that I am most familiar with, but don't
Pro:
* Industry standard for exposing APIs
* Supported natively in most programming languages
* Well understood protocol
* Implementation of sanitization of passed data can be done more easily
* Allows bidirectional communication
* Good support in Twisted (the framework we use for OONIB)
Contra:
* Compression is not enabled by default (we can use gzip compression with HTTP 1.1), and there is no compression for headers.
* No resume support (this can be implemented on top of HTTP; we could even implement the rsync algorithm on top of HTTP).
* No support for deltas (we can use the rsync protocol over HTTP if we really need this).
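
To show that the first two points are not hard to work around, here is a small sketch of gzip-compressed, resumable uploads, with the HTTP layer left out. It assumes the server can tell the client how many bytes it already holds and that each chunk carries its starting offset (for instance in a header or query parameter). `UploadSlot` and `upload` are hypothetical names, and the dropped connection is simulated.

```python
import gzip


class UploadSlot:
    """Server-side state for one upload (what an HTTP handler would keep)."""

    def __init__(self):
        self.received = bytearray()

    def offset(self) -> int:
        # The answer to "how much of this upload do you already have?"
        return len(self.received)

    def append(self, offset: int, chunk: bytes) -> bool:
        # Reject chunks that do not start exactly where the stored data ends.
        if offset != len(self.received):
            return False
        self.received.extend(chunk)
        return True


def upload(slot, body, chunk_size=1024, fail_after=None):
    """Send body in chunks, starting from the server's current offset.

    fail_after simulates a connection that drops after that many chunks.
    Returns True once the whole body has been sent.
    """
    sent_chunks = 0
    position = slot.offset()
    while position < len(body):
        if fail_after is not None and sent_chunks >= fail_after:
            return False
        chunk = body[position:position + chunk_size]
        slot.append(position, chunk)
        position += len(chunk)
        sent_chunks += 1
    return True


report = b"measurement line\n" * 2000
body = gzip.compress(report)          # compress the body before sending
slot = UploadSlot()
first_try = upload(slot, body, chunk_size=8, fail_after=2)   # drops after 16 bytes
second_try = upload(slot, body, chunk_size=8)                # resumes at offset 16
```

After the second call, `gzip.decompress(bytes(slot.received))` gives back the original report. Deltas would need more work than this, but resume and body compression are a few dozen lines.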
I feel like we are comparing apples and oranges a bit here, and I don't see why we could not use the rsync algorithm on top of HTTP. Anyway, I would like to get some feedback on what we should use for something that should have the properties described above.
Thoughts?
- Art.