[tor-dev] Brainstorming Domain Fronted Bridge Distribution (was meek costs)

isis isis at torproject.org
Wed May 6 10:59:13 UTC 2015

WARNING: much text. so email. very long.

Mike Perry transcribed 13K bytes:
> isis:
> > Additionally, SOW.9. is actually the chronological precursor to SOW.10., the
> > latter of which is implementing rBridge (or at least getting started on it).
> > (Work on this is still waiting on OTF to officially grant me the fellowship,
> > along with the other prerequisite tasks getting finished.)
> > 
> > But just to be clear — since it sounds like you've asked for several new
> > things in that last paragraph :) — which do you want:
> > 
> >   1. Tor Browser users use meek to get to BridgeDB, to get non-meek bridges by:
> >        1.a. Retrieving and solving a CAPTCHA inside Tor Launcher.
> >        1.b. Solving a CAPTCHA on a BridgeDB web page.
> > 
> >   2. Tor Browser users use BridgeDB's domain front, to get non-meek bridges by:
> >        2.a. Retrieving and solving a CAPTCHA inside Tor Launcher.
> >        2.b. Solving a CAPTCHA on a BridgeDB web page.
> >
> > If you want #2, then we're essentially transferring the domain-fronting costs
> > (and the DDoS risks) from meek to BridgeDB, and we'd need to decide who is
> > going to maintain that service, and who is going to pay for it.  Could The
> > Tor Project fund BridgeDB domain fronting?
> I proposed two things in my original email. My #1 is your #1.b. My #2 is
> your #2.a.
> For my #2 (your #2.a), what I want is a separate domain front for
> BridgeDB. It makes the most sense to me for Tor to run its own domain
> front for this.

Got it.

> If for some reason #2.a can't be done, we could do #1.a and use all of
> meek+Tor, but this seems excessive, slow, and potentially confusing for
> users (their Tor client would have to bootstrap twice for each bridge
> set they test).

Well… the cost of the second bootstrap *could* be cut down by persisting the
state file from the first bootstrap… but I see what you mean.  The user
experience doesn't seem like it'd be as smooth.

> I only consider my #1 and #1.b emergency stopgaps, though. In fact, if
> any aspect of this process is too slow and/or confusing, we won't
> take any load off of meek (unless the browser also starts regularly
> yelling at meek users to donate or something).

Agreed, except for the part about yelling at users to donate.  Asking nicely
and suggesting once or twice, I could get behind. :)

> Honestly, though, I think this is less likely now. If China wasn't
> somehow discouraged from this behavior via some diplomatic backchannel
> or just general public backlash, GreatFire.org would probably still be
> under attack right now.

It seems more likely that China was firing the Great Cannon at GreatFire.org
as a demonstration/warning.

> > > Now that we have a browser updater, I think it is also OK for us to
> > > provide autoprobing options for Tor Launcher, so long as the user is
> > > informed what this means before they select it, and it only happens
> > > once.
> > 
> > Probing all of the different Pluggable Transport types simultaneously provides
> > an excellent training classifier for DPI boxes to learn what new Pluggable
> > Transport traffic looks like.
> > 
> > As long as it happens only once, and only uses the bridges bundled in Tor
> > Browser, I don't see any issue with auto-selecting from the drop-down of
> > transport methodnames in a predefined order.  It's what users do anyway.
> Oh, yes. I am still against "connect to all of the things at the same
> time." The probing I had in mind was to cycle through the transport list
> and try each type, except also obtain the bridges for each type from
> BridgeDB.

But why does Tor Browser need to get bridges from BridgeDB, if it doesn't know
yet which ones will work?  Why not autoprobe with the bundled bridges, then
ask BridgeDB for some more of the kind that works?
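
A rough sketch of "autoprobe with the bundled bridges, one transport at a
time", in Python.  This is not Tor Launcher code: the probing order, the
`bundled` mapping, and the `try_bootstrap` callback are all hypothetical
stand-ins, and the only idea taken from this thread is that probing is
sequential and starts with obfs4.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical order; the thread only asks that obfs4 be tried first.
PROBE_ORDER = ["obfs4", "scramblesuit", "fte", "obfs3"]


def autoprobe(
    bundled: Dict[str, List[str]],
    try_bootstrap: Callable[[str, List[str]], bool],
    order: Optional[List[str]] = None,
) -> Optional[str]:
    """Return the first transport whose bundled bridges bootstrap, else None.

    `try_bootstrap` is an assumed callback that attempts a bootstrap with
    the given transport and bridge lines and reports success or failure.
    """
    for transport in (PROBE_ORDER if order is None else order):
        bridges = bundled.get(transport, [])
        if not bridges:
            continue  # nothing bundled for this transport, skip it
        # Strictly one transport at a time: connecting to everything at
        # once would give DPI boxes a labelled sample of every transport.
        if try_bootstrap(transport, bridges):
            return transport
    return None
```

The returned transport name is what would then be sent to BridgeDB to ask for
more bridges of the kind that works.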

> I also think we should be careful about the probing order. I want to
> probe the most popular and resilient transports (such as obfs4) first.

Currently, obfs4 isn't blocked anywhere… so why probe at all when we know
definitely that the first thing we try is going to work?

> > > The autoprobing could then keep asking for non-meek bridges for either a
> > > given type of the user's choice, or optionally all non-meek types (with
> > > an additional warning that this increases their risk of being discovered
> > > as a Tor user).
> > 
> > If the autoprobing is going to include asking BridgeDB (multiple times?) for
> > different types of bridges in the process, whether through a BridgeDB domain
> > front or not, then I think there needs to be more discussion…
> > 
> >   * Do you think you could explain more about the steps this autoprobing
> >     entails?
> 1. User starts a fresh Tor Browser (or one that fails to bootstrap)
> 2. User clicks "Configure" instead of "Connect"
> 3. User says they are censored
> 4. User selects a third radio button on the bridge dialog
>    "Please help me obtain bridges".
> 5. Tor Browser launches a JSON-RPC request to BridgeDB's domain front
>    for bridges of type $TYPE
> 6. BridgeDB responds with a Captcha
> 7. User solves captcha; response is posted back to BridgeDB.
> 8. BridgeDB responds with bridges (or a captcha error)
> 9. Tor Launcher attempts to bootstrap with these bridges.
> 10. If bootstrap fails, goto step 5.
> The number of loops for steps 5-10 for each $TYPE probably requires some
> intuition on how frequently we expect bridges that we hand out to be
> blocked due to scraping, and how many bridge addresses we really want to
> hand out per Captcha+IP address combination.

Currently, you get the same bridges every time you ask (for some arbitrary
period).  This would definitely require a new Distributor on the backend (not
a problem, and not difficult, just saying).

Why ask multiple times?  Why not just get three bridges per request, and if
that appears to be failing to get 99% of users connected to a bridge, increase
it to four?
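
For discussion, the loop in steps 5-10 quoted above could be sketched like
this.  It assumes nothing about the real wire format: the four callbacks
(`request_captcha`, `ask_user`, `submit_solution`, `try_bootstrap`) are
hypothetical placeholders for the JSON-RPC calls and the Tor Launcher UI, and
`max_rounds` is an invented knob, since the number of loops is exactly the
open question.

```python
from typing import Callable, List, Optional


def obtain_working_bridges(
    transport: str,
    request_captcha: Callable[[str], str],
    ask_user: Callable[[str], str],
    submit_solution: Callable[[str, str], Optional[List[str]]],
    try_bootstrap: Callable[[List[str]], bool],
    max_rounds: int = 3,
) -> Optional[List[str]]:
    """Repeat steps 5-10 at most `max_rounds` times for one transport type.

    Returns the bridge lines that bootstrapped, or None if every round
    failed (CAPTCHA error or unreachable bridges).
    """
    for _ in range(max_rounds):
        challenge = request_captcha(transport)          # steps 5-6
        solution = ask_user(challenge)                  # step 7
        bridges = submit_solution(challenge, solution)  # step 8
        if bridges is None:
            continue  # CAPTCHA error; this still uses up a round
        if try_bootstrap(bridges):                      # step 9
            return bridges
        # step 10: bootstrap failed, go around again
    return None
```

With `max_rounds=1` and three or four bridges per response, this collapses to
the single-request behaviour suggested above.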

> Later, we can replace Captchas with future RBridge-style crypto, though
> we should design the domain front independently from RBridge, IMO.

Agreed; I'm not proposing any crazy crypto here.  Rather, I'd like to know how
this Tor Browser Distributor should behave, and preferably also something
resembling a rationale for why it behaves that way.

Perhaps also it would be better to make yet another new rBridge Distributor
later, separate from the Tor Browser Distributor (this thing needs a name!).
I could imagine some old computers running Tails, and some
Magickal-Anonymity-Routers-Which-Are-Neither-Secure-Nor-Anonymous might not be
capable of running the (modified) rBridge crypto code.

> >   * Is the autoprobing meant to solve the issue of not knowing which transport
> >     will work?  Or the problem of not knowing whether the bridges in Tor
> >     Browser are already blocked?  Or some other problem?
> Both problems at once, though I suspect (or at least hope) that the
> current transport types included with Tor Browser are more likely to be
> blocked by scraping BridgeDB for IP addresses than by DPI.

IMO, the most efficient way is to run a middle relay without the Guard flag,
and log every connection from something that isn't currently in the consensus.

FWIW, the number of suspicious attempts to the HTTPS Distributor has dropped
substantially in the last four months, and to the email distributor has stayed
about the same.  Off the top of my head, this is likely, hopefully,
something to do with:

 1. actually distributing separate bridges to Tor/proxy users, and actually
    rate limiting them (!!), [0] [1]

 2. actually rotating available bridges, such that large amounts of both time
    and IP space are required to effectively scrape, [2] [3]

 3. semi-automatedly blacklisting stupid email bots, [4]

 4. moving away from ReCAPTCHA to our own CAPTCHA module and expiring stale
    CAPTCHAs after 10 minutes. [5]
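
To illustrate item 4, here is a minimal sketch of one way to expire CAPTCHAs
after 10 minutes without keeping server-side state: the challenge carries its
issue time, bound to the expected answer with an HMAC.  This is an assumed
construction for illustration, not a description of BridgeDB's CAPTCHA
module; the function names and the challenge format are made up.

```python
import hashlib
import hmac
import time
from typing import Optional

CAPTCHA_LIFETIME = 10 * 60  # seconds; the 10 minutes mentioned above


def make_challenge(key: bytes, answer: str, now: Optional[float] = None) -> str:
    """Return "<issued>:<tag>", where tag = HMAC(key, "<issued>:<answer>")."""
    issued = int(time.time() if now is None else now)
    tag = hmac.new(key, f"{issued}:{answer}".encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{tag}"


def check_solution(key: bytes, challenge: str, solution: str,
                   now: Optional[float] = None) -> bool:
    """Accept only a correct solution to a challenge that is still fresh."""
    current = time.time() if now is None else now
    try:
        issued_str, tag = challenge.split(":", 1)
        issued = int(issued_str)
    except ValueError:
        return False  # malformed challenge
    if current - issued > CAPTCHA_LIFETIME:
        return False  # stale CAPTCHA
    expected = hmac.new(key, f"{issued}:{solution}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), tag.encode())
```

Note that this only bounds how long a solved CAPTCHA stays useful; it does
not stop the same solution being replayed inside the 10-minute window.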

> > If we follow BridgeDB's spec, [1] and we wish for the logic controlling
> > how Tor Browser users are handled to be separate (and thus more maintainable),
> > then this will require a new bridge Distributor, and we should probably start
> > thinking about the threat model/security requirements, and behaviours, of the
> > new Distributor.  Some design questions we'll need to answer include:
> > 
> >   * Should all points on the Distributor's hashring be reachable at a given
> >     time (i.e., should there be some feasible way, at any given point in time,
> >     to receive any and every Bridge allocated to the Distributor)?
> >
> >   * Or should the Distributor's hashring rotate per time period?  Or should it
> >     have sub-hashrings which rotate in and out of commission?
> > 
> >   * Should it attempt to balance the distribution of clients to Bridges, so
> >     that a (few) Bridge(s) at a time aren't hit with tons of new clients?
> > 
> >   * Should it treat users coming from the domain front as separate from those
> >     coming from elsewhere?  (Is it even possible for clients to come from
> >     elsewhere?  Can clients use Tor to reach this distributor?  Can Tor
> >     Browser connect directly to BridgeDB, not through the domain front?)
> > 
> >   * If we're going to do autoprobing, should it still give out a maximum of
> >     three Bridges per request?  More?  Less?
> Personally, I think the domain fronting distributor should behave
> identically to the closest equivalent distributor that isn't domain
> fronted, both to reduce implementation complexity, and to keep the
> system easy to reason about.
> Before RBridge is implemented, this would mean using the
> X-Domain-Fronted-For header's IP address as if it were the real IP
> address, and index into the hashrings in the same way as we do with the
> web distributor.

"Index into the hashrings" is a culmination of rather complicated structures
to produce a behaviour particular to the current best known configuration of a
particular Distributor.

If you would like it to behave the same as the HTTPS Distributor, should
bridges for Tor/proxy users still be kept in a separate subhashring?  And
users grouped by /32 if IPv6, and /16 if IPv4?  And only allow one set of
bridges every three hours for non-Tor/proxy users, and four sets for all
Tor/proxy users?  And rotate sub-sub-hashrings once per day?  And expire
CAPTCHAs after 10 minutes?  And will you accept that, should I need to change
any of those things for the HTTPS Distributor, that your Tor Browser
Distributor might break?
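
To make "index into the hashrings" a bit more concrete, here is a simplified
sketch using only the parameters listed above: clients grouped by /16 (IPv4)
or /32 (IPv6), and one set of bridges per three-hour period.  It is not
BridgeDB's implementation; the HMAC construction, the flat sorted-list ring,
and all function names are assumptions, and the Tor/proxy subhashring and the
daily rotation are left out.

```python
import bisect
import hashlib
import hmac
import ipaddress
from typing import List, Tuple

PERIOD_SECONDS = 3 * 60 * 60  # one set of bridges every three hours


def client_group(ip: str) -> str:
    """Collapse an address to its /16 (IPv4) or /32 (IPv6) network."""
    addr = ipaddress.ip_address(ip)
    prefix = 16 if addr.version == 4 else 32
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def ring_position(key: bytes, ip: str, now: float) -> int:
    """Map (client group, time period) to a point on the ring."""
    period = int(now) // PERIOD_SECONDS
    msg = f"{client_group(ip)}|{period}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha1).digest(), "big")


def pick_bridges(ring: List[Tuple[int, str]], position: int, n: int = 3) -> List[str]:
    """Walk clockwise from `position` on a ring sorted by point; take n bridges."""
    if not ring:
        return []
    points = [point for point, _ in ring]
    start = bisect.bisect_left(points, position)
    return [ring[(start + i) % len(ring)][1] for i in range(min(n, len(ring)))]
```

Every client in the same /16 (or /32) lands on the same point for a given
period, so asking again from a neighbouring address returns the same bridges
until the period rolls over.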

(I'm going to start calling the Tor Browser Distributor "Charon", unless
someone has a less gruesome name.)

Otherwise, it should likely be separate.  What problem would be solved by
keeping them as the same Distributor, other than (at least at the initial
implementation time) the Distributor code won't be duplicated?  Also, it's not
very much duplication:

(bdb)∃!isisⒶwintermute:(fix/12505-refactor-hashrings_r7 *$<>)~/code/torproject/bridgedb ∴ cloc --quiet lib/bridgedb/https/distributor.py

http://cloc.sourceforge.net v 1.60  T=0.01 s (74.0 files/s, 28780.8 lines/s)
Language                     files          blank        comment           code
Python                           1             56            183            150

Also, I'm about to finish #12506, giving BridgeDB multi-process support and a
speed increase for every Distributor that runs separately.

> I could see an argument that the set of bridges held by the domain
> fronting distributor should be kept separate from the web distributor,
> because heck, way more people should be able to access the domain
> fronted version, and maybe we want to drastically reduce the web
> distributor's pool because nobody can reach it (except for whitelisted
> scrapers and people who don't really need bridges).

Actually!  Current statistical estimates say that many people (~75,000 times
per day) can not only reach it, but can get bridges from it. [6]  :)

If you want more bridges allocated to the Tor Browser Distributor from the
beginning, there's a trick to dump the unallocated bridges into your
Distributor as soon as it first exists.  That's 1/5th the size of the HTTPS
pool, and 2/5ths the size of the email pool; roughly 1,000 bridges.

> However, if you do keep the domain front pool separate from the web
> distributor pool, you should ensure that you also properly handle the
> case where Tor IP addresses appear in the X-Domain-Fronted-For header.
> Again, for this case, I think the simplest answer is "use the same rules
> as the current web distributor does", though if the domain front pool is
> separate, perhaps the Tor fraction should be much smaller.


Fortunately, there are separate structures for handling the current Tor exit
list, in memory, in the main process, [7] [8] because I prefer keeping various
functionalities organised separately, from generalised to specialised, to
reduce code duplication, and increase reusability and maintainability.

No changes required to take Tor users into consideration, or not take them
into consideration.  ItJustWorks™.  :)

[0]: https://bugs.torproject.org/4771#comment:14
[1]: https://bugs.torproject.org/4405
[2]: https://bugs.torproject.org/1839
[3]: https://bugs.torproject.org/15517
[4]: https://bugs.torproject.org/9385
[5]: https://bugs.torproject.org/11215
[6]: https://people.torproject.org/~isis/otf-etfp-sow.pdf#subsection.1.1
[7]: https://gitweb.torproject.org/bridgedb.git/tree/lib/bridgedb/proxy.py
[8]: https://gitweb.torproject.org/bridgedb.git/tree/scripts/get-tor-exits

 ♥Ⓐ isis agora lovecruft
OpenPGP: 4096R/0A6A58A14B5946ABDE18E207A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt