[anti-censorship-team] Plans for BridgeDB's future

Roger Dingledine arma at torproject.org
Thu May 14 14:28:53 UTC 2020


On Tue, May 12, 2020 at 03:17:07PM -0700, Philipp Winter wrote:
> In a nutshell, Salmon hands out bridges to users while maintaining
> per-user reputation scores.  Alice's reputation goes up if her bridge
> remains unblocked after she got it, and it goes down if the bridge is
> blocked after Alice got it.  Users can invite each other, which results
> in an invitation graph.  Eventually, we will hopefully have a Salmon
> distributor which will live alongside our Moat, HTTPS, and Email
> distributors.

I am still optimistic about Salmon. Here are some thoughts I wrote while
re-reading it, highlighting unanswered pieces and potential thorny issues
once you look deeper into the design.

(1) How critical is the initial anti-sybil step? That is, how much impact
does its effectiveness have on whether the rest of the system works? My
mental model here is like 'sensitivity analysis', where the question is
whether a small change here produces a big change later in the system.

I guess the answer is: when not under attack, it doesn't much matter
how effective the anti-sybil step is. But under a good attack, if the
attacker can become "most" of the new users joining the system, and
especially if they can sustain that, then there's a bad spiral where
new real users can't ever get good service so they leave, making it
easier for the attacker to sustain "most".

On the plus side though, the attacker really does need to sustain the
attack -- if they fail to get one of their users onto a given bridge,
then the real users who got that bridge will be winners.
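
(To make that 'sensitivity' question concrete, here's a toy
calculation -- not the paper's model, and the group size and attacker
fractions are made up: suppose each bridge serves n users drawn
independently from the new-user stream, and a fraction f of that
stream is attacker-controlled. Then a bridge stays attacker-free with
probability (1-f)^n, which falls off fast as f grows.)

    # Toy sketch, not Salmon's actual assignment model: chance that a
    # bridge serving n independently-drawn users gets zero attackers,
    # as a function of the attacker's share f of new signups.
    def p_bridge_clean(f, n=10):
        return (1 - f) ** n

    for f in (0.01, 0.05, 0.10, 0.30, 0.50):
        print(f"attacker share {f:.2f}: "
              f"P(10-user bridge is clean) = {p_bridge_clean(f):.3f}")

Even a 10% attacker share already poisons most 10-user bridges in this
toy model, which is the bad spiral above in miniature.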

(2) From an engineering angle, is "check the Facebook account" the
right primitive to use here? How does the user prove possession of the
Facebook account to us? Are there libraries that do this, and if not,
what do they know that we don't know? :)
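
(One plausible flow, if we did lean on Facebook: run a standard OAuth
login, have the user hand us the resulting access token, and ask the
Graph API whose token it is. Sketch only -- the /me call is the public
Graph API, everything else here is invented for illustration, and a
real deployment would also want to confirm via /debug_token that the
token was issued to our app.)

    import requests

    GRAPH_ME = "https://graph.facebook.com/me"

    def facebook_user_id(access_token):
        """Return the app-scoped Facebook user id behind this access
        token, or None if it doesn't check out. Sketch only."""
        resp = requests.get(GRAPH_ME,
                            params={"fields": "id",
                                    "access_token": access_token},
                            timeout=30)
        if resp.status_code != 200:
            return None
        return resp.json().get("id")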

(3) How dangerous is the social graph of which users recommended which
users? The authors argue that the censors essentially already have this
data from real social networks, so there's no additional risk here. Do
we buy that?

Can this "social graph" component of the design be treated as a modular
thing and made less dangerous, like with cool crypto techniques,
without losing functionality? Is that essentially a hybrid design with
other papers?

(One example I was pondering would be to add randomized "noise" links
for some fraction of new users, so it looks like they were recommended,
and now when you see a recommendation link, you know it's 50% likely
to be artificial. But that idea seems to be at odds with the strategy
of putting low-trust users onto high-trust servers alongside their
high-trust recommenders.)
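
(A minimal sketch of that noise idea, with made-up names: when a user
joins without an invitation, sometimes attach them to a random
existing user anyway, tuned so that real and artificial recommendation
edges end up roughly balanced.)

    import random

    def record_join(invite_graph, new_user, recommender,
                    existing_users, invited_fraction):
        """Add a (possibly artificial) recommendation edge. If a
        fraction f = invited_fraction of joiners arrive with a real
        invitation, giving each uninvited joiner a noise edge with
        probability f/(1-f) makes any single observed edge about 50%
        likely to be artificial. Sketch only."""
        if recommender is not None:
            invite_graph[new_user] = recommender    # real edge
        else:
            p_noise = min(1.0, invited_fraction /
                          max(1e-9, 1.0 - invited_fraction))
            if existing_users and random.random() < p_noise:
                invite_graph[new_user] = random.choice(existing_users)
            else:
                invite_graph[new_user] = None

    # The tension from the paragraph above: if we can't distinguish
    # noise edges later, we also end up seating these users on their
    # fake recommender's (possibly high-trust) bridge.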

(4) I still worry about false positives, where a bridge gets blocked
for reasons other than "one of the users we assigned to it was bad". At
the extreme end, imagine Iran rolls out protocol whitelisting for a day:
obfs4 stops working, and we blame all the users of all the bridges.

In the analysis, imagine another variable, which is the chance that
the adversary discovered the bridge through some other (external)
mechanism. The paper assumes this chance is 0%, but if it goes to 5%,
how much falls apart? This question makes me especially nervous because
of the follow-up: what if the rate isn't independent for each bridge?

Similarly, another variable is the false positive rate where we
mistakenly think a server got blocked but actually it just went down. In
theory we can make this rate 0% by doing external reachability checking
too, but if those results are noisy, what does it do to the design? All
users eventually get kicked out no matter what? Do we need a slow
'forgiveness' multiplier to make things work in the long term? Or does
regrowing trust undermine the rest of the design too much, because it
is more useful to the adversary than it is to the users?
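
(Here's a toy simulation of that worry, using made-up numbers and not
the paper's trust arithmetic: an entirely honest user holds one bridge
at a time, each epoch the bridge gets blocked "externally" with
probability q through no fault of theirs, each such event is a strike,
and four strikes is a ban. The forgiveness knob drops one old strike
every so many epochs.)

    import random

    def epochs_until_ban(q, strikes=4, forgive=None, max_epochs=5000):
        """Epochs until an honest user is banned (capped at
        max_epochs). q = per-epoch external-blocking probability;
        forgive = drop one strike every `forgive` epochs. Toy model."""
        s = epoch = 0
        while s < strikes and epoch < max_epochs:
            epoch += 1
            if random.random() < q:
                s += 1                  # blamed for an external block
            if forgive and epoch % forgive == 0 and s > 0:
                s -= 1                  # slow forgiveness
        return epoch

    def avg(q, forgive=None, trials=500):
        return sum(epochs_until_ban(q, forgive=forgive)
                   for _ in range(trials)) / trials

    for q in (0.01, 0.05):
        print(f"q={q}: banned after ~{avg(q):.0f} epochs, "
              f"~{avg(q, forgive=50):.0f} with forgiveness every 50")

None of this captures the scarier case where the external blocking
isn't independent across bridges (the Iran-whitelisting day), which
wipes out everyone at once no matter how small q is on average.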

(5) How to adapt this design to multiple countries? Right now all the
analysis is "if the bridge gets blocked", but if it gets blocked in
China but not Venezuela, is it clear what we do? It seems like we need
to blame every user (no matter what country they claim to be in), but
maybe we only need to move the blocked ones.

Do we want "per-country trust scores" for each user? Or is "four strikes,
anywhere in the world, even if two of them are China and two of them are
Venezuela, and you're out" good enough? (Back in 2016, the authors told
me they had in mind to track per-country trust scores, and ban the user
if they got four strikes in any single country.)

In order to move only the blocked users, do we need to know which country
each user is in? How much does it matter that sometimes blocking isn't
uniform within a country?
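
(A sketch of the bookkeeping that per-country trust scores imply, with
made-up structures: count strikes per (user, country), ban at four
strikes in any single country, blame everyone on a blocked bridge, but
only reassign the users who claim to be in the blocking country.)

    from collections import defaultdict

    STRIKE_LIMIT = 4           # four strikes in one country = out

    # strikes[user][country] -> blocking events blamed on this user
    strikes = defaultdict(lambda: defaultdict(int))

    def handle_block(bridge_users, user_country, blocked_country):
        """bridge_users: users assigned to the blocked bridge.
        user_country: country each user claims to be in.
        Returns (banned, needs_new_bridge). Sketch only."""
        banned, needs_new_bridge = [], []
        for user in bridge_users:
            strikes[user][blocked_country] += 1
            if strikes[user][blocked_country] >= STRIKE_LIMIT:
                banned.append(user)
            elif user_country.get(user) == blocked_country:
                needs_new_bridge.append(user)   # only move affected ones
        return banned, needs_new_bridge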

(6) What is the right mechanism for the out-of-band (meta communication)
channel? The paper suggests email, which has the advantage that if the
account identifier is an email address, we can use it to push info to
users, like "we think your bridge went away, here's a new one."

But the moat experience is compelling too, where Tor Browser
uses meek-azure to get updates as needed, and then so long as that
mechanism still works, the entire process of replacing the bridge can be
automated.

Tying the identity into the Tor Browser install introduces some subtle
usability questions, like how to export your identity if you want to
switch Tor Browsers, and whether many Tor Browsers can "be" the same
person (and what that does to load balancing on bridges).

In fact, the paper expects this automation, with ideas like "when the
client tells the directory that the server is offline" -- in 2016 the
authors told me they wanted to automate the email interaction, including
logging in to the email server, auto parsing the mails, etc. But now
moat might be easier to build as the first iteration.

(7) Do we need to batch "reassignment" events? That is, if ten users
have a bridge, and the bridge gets blocked, now we have ten users at the
same trust level who need a new bridge. If we assign all ten pending
users to the same new bridge, then we're not actually spreading users
out after blocking events. Other reputation designs have epochs where
all the reconfiguration changes happen between epochs, but here the
security comes from minimizing reconfigurations. How do we balance the
need to randomize replacement allocations vs getting users up and running
quickly with a new working bridge?
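
(One way to split the difference, parameters made up: reassign
displaced users right away, but shuffle them and cap how many any
single replacement bridge absorbs, so a blocking event never just
moves the old group intact onto one new bridge.)

    import random

    def reassign(displaced_users, fresh_bridges, max_per_bridge=3):
        """Spread users from a blocked bridge across several
        replacement bridges. Returns ({bridge: [users]}, left_waiting).
        Sketch only."""
        users = list(displaced_users)
        random.shuffle(users)
        assignment = {b: [] for b in fresh_bridges}
        left_waiting = []
        for user in users:
            open_bridges = [b for b in fresh_bridges
                            if len(assignment[b]) < max_per_bridge]
            if open_bridges:
                assignment[random.choice(open_bridges)].append(user)
            else:
                left_waiting.append(user)   # hold for the next batch
        return assignment, left_waiting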

(8) How much better can the adversary strategies get, for blocking as
many users as possible?

For a first example, here's a better strategy than any of the ones in
the paper: "the attacker defects early when k or fewer attackers are
in the group, and it waits a while to defect when it has more than k
attackers in the group."

An even better strategy might take advantage of whatever reassignment
batching algorithm we pick in #7 above, to decide which *combination*
of bridges to fail at once, in order to get the desired set of users in
the reassignment phase.

--Roger



