[anti-censorship-team] Request for comments: Generalising BridgeDB

Tue Jun 9 16:09:53 UTC 2020

We have been struggling with maintaining GetTor and BridgeDB because
both projects have accumulated a considerable amount of technical debt.
Besides, they perform similar tasks (generally speaking, they
distribute resources to users), which means that we are maintaining
redundant code.

Given these issues, we have been thinking about generalising and merging
GetTor's and BridgeDB's architectures.  That could mean 1) turning GetTor
into a BridgeDB distributor or 2) building a new system.  The rest of
this email discusses the latter option.  Below is a diagram that
proposes a design.  On the left side, we have scripts and tools that
take as input resources (e.g., bridge descriptors, Tor Browser download
links, or even HTTPS or Snowflake proxy information) and write them to
our resource DB.  On the right side, we have several distributors (e.g.,
our existing Email/HTTPS/Moat distributors, but possibly also Salmon,
GetTor, or wolpertinger, our "hand bridges to OONI" distributor).  In
the middle sits the Matchmaker, which receives requests from
distributors and answers them with data from our resource DB.

                                                         ┌───────┐
                                                         │User DB│
 ┌───────────┐                                           └───┬───┘
 │Bridge desc├──┐                                      ┌─────┴─────┐
 │   parser  │  │                                    ┌─┤Salmon dist│
 └───────────┘  │                                    │ └───────────┘
 ┌────────────┐ │┌───────────┐      ┌──────────┐     │ ┌─────────┐
 │Tor Browser ├─┼┤Resource DB├─ SQL─┤Matchmaker├ TCP ┼─┤Moat dist│
 │link scraper│ │└───────────┘      └──────────┘     │ └─────────┘
 └────────────┘ │                                    │ ┌───────────┐
 ┌────────────┐ │                                    └─┤GetTor dist│
 │Snowflake or├─┘                                      └───────────┘
 │https proxy?│
 └────────────┘
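
As a rough illustration of the left-hand side of the diagram, here is a
sketch of how a parser or scraper might write resources to the resource
DB.  It uses SQLite purely for convenience; the table layout, the column
names, and the helper functions are all hypothetical and not something
this proposal specifies.

```python
import sqlite3

# Hypothetical schema: one row per resource, regardless of its type.
SCHEMA = """
CREATE TABLE IF NOT EXISTS resources (
    id      INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,               -- e.g. 'obfs4' or 'tor-browser-link'
    payload TEXT NOT NULL,               -- bridge line, download URL, etc.
    online  INTEGER NOT NULL DEFAULT 1,  -- 0 once the resource goes offline
    UNIQUE (type, payload)
);
"""


def open_db(path=":memory:"):
    """Open (and if necessary initialise) the resource DB."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_resource(conn, rtype, payload):
    """Insert a resource, or mark it online again if it is already known."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO resources (type, payload) VALUES (?, ?)",
            (rtype, payload),
        )
        conn.execute(
            "UPDATE resources SET online = 1 WHERE type = ? AND payload = ?",
            (rtype, payload),
        )


def online_resources(conn, rtype):
    """Return the payloads of all online resources of the given type."""
    rows = conn.execute(
        "SELECT payload FROM resources WHERE type = ? AND online = 1 ORDER BY id",
        (rtype,),
    )
    return [payload for (payload,) in rows]
```

With something like this, each tool on the left only needs to call
store_resource(), and the Matchmaker only needs read access to the table.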

Cecylia suggested that all (or some) of these components can live in
separate processes.  That has the following advantages:

* The system becomes easier to maintain and test.
* We can build distributors in different languages.
* A failing distributor doesn't affect the rest of the system.
* A distributor doesn't have direct access to our database.

For now, let's assume that distributors talk to the Matchmaker over TCP
and exchange messages in a serialisation format like JSON, gob, or
protobufs.  When a distributor asks the Matchmaker for a resource, it
provides the following information:

* The resource type (e.g., an obfs4 bridge)
* A client-specific ID (e.g., the client's IP address)
* The client's location (e.g., Russia)
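
To make the request more concrete, here is a minimal sketch of how a
distributor could encode it as one line of JSON.  The field names, the
two-letter country code, and the newline-delimited framing are
assumptions for illustration only; none of them are decided yet.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ResourceRequest:
    """A distributor's request to the Matchmaker (hypothetical field names)."""
    resource_type: str  # e.g. "obfs4"
    client_id: str      # e.g. the client's IP address
    location: str       # e.g. "ru"; the country-code format is an assumption


def encode_request(req):
    """Serialise a request as a single newline-terminated JSON object."""
    return (json.dumps(asdict(req), sort_keys=True) + "\n").encode("utf-8")


def decode_request(line):
    """Parse one line received from a distributor.

    Raises TypeError if a field is missing or unexpected.
    """
    fields = json.loads(line.decode("utf-8"))
    return ResourceRequest(**fields)
```

A distributor would write the output of encode_request() to its TCP
connection, and the Matchmaker would call decode_request() on each line
it reads.  Swapping JSON for gob or protobufs would only change these
two functions.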

The Matchmaker then responds with one or more resources, similar to how
BridgeDB currently does it.  If any of these resources go offline at
some point, the Matchmaker should notify the distributor so it can react
accordingly.
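
The following toy, in-memory sketch shows one way the Matchmaker could
answer requests and work out whom to notify when a resource goes
offline.  It is not BridgeDB's actual algorithm: the HMAC-based
selection, the class and method names, and the bookkeeping of handed-out
resources are all assumptions, and the client's location is ignored for
brevity.

```python
import hashlib
import hmac
from collections import defaultdict


class Matchmaker:
    """Toy Matchmaker; a real one would read from the resource DB."""

    def __init__(self, secret, resources):
        self._secret = secret
        # Resource type -> list of resources currently believed to be online.
        self._resources = {t: list(rs) for t, rs in resources.items()}
        # Resource -> names of the distributors that were handed it.
        self._handed_out = defaultdict(set)

    def request(self, distributor, resource_type, client_id, count=1):
        """Return up to `count` resources, chosen deterministically per client."""
        pool = sorted(self._resources.get(resource_type, []))
        if not pool:
            return []
        # Keyed hash of the client ID, so the same client keeps getting the
        # same resources for as long as the pool does not change.
        digest = hmac.new(
            self._secret, client_id.encode("utf-8"), hashlib.sha256
        ).digest()
        start = int.from_bytes(digest[:8], "big") % len(pool)
        chosen = [pool[(start + i) % len(pool)]
                  for i in range(min(count, len(pool)))]
        for resource in chosen:
            self._handed_out[resource].add(distributor)
        return chosen

    def mark_offline(self, resource_type, resource):
        """Remove a resource; return the distributors that should be notified."""
        pool = self._resources.get(resource_type, [])
        if resource in pool:
            pool.remove(resource)
        return sorted(self._handed_out.pop(resource, set()))
```

Whatever sends the actual notification would take the list returned by
mark_offline() and message each distributor over its existing
connection.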

I would love to hear your thoughts on the above, even if it's far from
comprehensive.  Do you think that the above design sketch accommodates
our present and foreseeable future needs?

Cheers,
Philipp