[tor-dev] Brainstorming Domain Fronted Bridge Distribution (was meek costs)

Thu May 7 05:31:31 UTC 2015

isis:
> WARNING: much text. so email. very long.

Right. If I cut your previous text, assume I'm in agreement, not
ignoring it.

> Mike Perry transcribed 13K bytes:
> > isis:
> > > > Now that we have a browser updater, I think it is also OK for us to
> > > > provide autoprobing options for Tor Launcher, so long as the user is
> > > > informed what this means before they select it, and it only happens
> > > > once.
> > > 
> > > Probing all of the different Pluggable Transport types simultaneously provides
> > > an excellent training classifier for DPI boxes to learn what new Pluggable
> > > Transport traffic looks like.
> > > 
> > > As long as it happens only once, and only uses the bridges bundled in Tor
> > > Browser, I don't see any issue with auto-selecting from the drop-down of
> > > transport methodnames in a predefined order.  It's what users do anyway.
> > 
> > Oh, yes. I am still against "connect to all of the things at the same
> > time." The probing I had in mind was to cycle through the transport list
> > and try each type, except also obtain the bridges for each type from
> > BridgeDB.
> 
> But why does Tor Browser need to get bridges from BridgeDB, if it doesn't know
> yet which ones will work?  Why not autoprobe with the bundled bridges, then
> ask BridgeDB for some more of the kind that works?

Sure, the first autoprobe can (and should) test the local bridges before
asking for more, but I expect people are using meek because none of those
actually work.

We can do better about trying to sneak fresh bridges into the TBB
distribution immediately before every release, but I doubt that will
help much, since the adversary can scrape them with just a wget from our
git repos.

> > I also think we should be careful about the probing order. I want to
> > probe the most popular and resilient transports (such as obfs4) first.
> 
> Currently, obfs4 isn't blocked anywhere… so why probe at all when we know
> definitely that the first thing we try is going to work?

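Mostly because of IP blocking. Even while the obfs4 protocol itself is
unblocked, the addresses of individual obfs4 bridges can still be blocked.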

> > > > The autoprobing could then keep asking for non-meek bridges for either a
> > > > given type of the user's choice, or optionally all non-meek types (with
> > > > an additional warning that this increases their risk of being discovered
> > > > as a Tor user).
> > > 
> > > If the autoprobing is going to include asking BridgeDB (multiple times?) for
> > > different types of bridges in the process, whether through a BridgeDB domain
> > > front or not, then I think there needs to be more discussion…
> > > 
> > >   * Do you think you could explain more about the steps this autoprobing
> > >     entails?
> > 
> > 1. User starts a fresh Tor Browser (or one that fails to bootstrap)
> > 2. User clicks "Configure" instead of "Connect"
> > 3. User says they are censored
> > 4. User selects a third radio button on the bridge dialog
> >    "Please help me obtain bridges".
> > 5. Tor Browser launches a JSON-RPC request to BridgeDB's domain front
> >    for bridges of type $TYPE
> > 6. BridgeDB responds with a Captcha
> > 7. User solves captcha; response is posted back to BridgeDB.
> > 8. BridgeDB responds with bridges (or a captcha error)
> > 9. Tor Launcher attempts to bootstrap with these bridges.
> > 10. If bootstrap fails, goto step 5.
> > 
> > The number of loops for steps 5-10 for each $TYPE probably requires some
> > intuition on how frequently we expect bridges that we hand out to be
> > blocked due to scraping, and how many bridge addresses we really want to
> > hand out per Captcha+IP address combination.
> 
> Currently, you get the same bridges every time you ask (for some arbitrary
> period).  This would definitely require a new Distributor on the backend (not
> a problem, and not difficult, just saying).
> 
> Why ask multiple times?  Why not just get three bridges per request, and if
> that appears to be failing to get 99% of users connected to a bridge, increase
> it to four?

Handing out four at a time might work better than requesting the same
type again and again. Then again, simply trying the other transport
types might also work.
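
The probing order discussed here (bundled bridges first, then fresh
bridges from BridgeDB, then the next transport type) could be sketched
roughly as follows. This is an illustration only, not Tor Launcher or
BridgeDB code: `autoprobe`, `fetch_bridges`, `try_bootstrap`, the
transport order in `PROBE_ORDER`, and the `max_rounds` cap are all
hypothetical, and the CAPTCHA exchange is assumed to happen inside
`fetch_bridges`.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical probe order: most popular/resilient transport first.
PROBE_ORDER = ["obfs4", "obfs3", "fte", "scramblesuit"]


def autoprobe(
    bundled: Dict[str, List[str]],
    fetch_bridges: Callable[[str], List[str]],
    try_bootstrap: Callable[[List[str]], bool],
    max_rounds: int = 2,
) -> Optional[str]:
    """Probe one transport type at a time, never all of them at once.

    Returns the first transport that bootstraps, or None if all fail.
    """
    for transport in PROBE_ORDER:
        # Test the bridges shipped with the browser before asking for more.
        local = bundled.get(transport, [])
        if local and try_bootstrap(local):
            return transport
        # Then ask the distributor a bounded number of times (assumed cap).
        for _ in range(max_rounds):
            bridges = fetch_bridges(transport)
            if bridges and try_bootstrap(bridges):
                return transport
    return None
```

Capping the rounds per type keeps the number of bridges handed out per
user bounded; what the right cap is remains the open question above.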

How hard is it to get analytics on the requests to BridgeDB? If we give
you a special request parameter (like "&justfailed=obfs4") that means
"Hey, I'm asking you again for this new transport type because the obfs4
bridges you just gave me didn't work", can you count that? Can you break
down that count by GeoIP country for the requesting IP?

That metric would be a useful hint that obfs4 has suddenly been blocked
in some country, or that by some other mechanism the censor has
discovered all/most of the IP addresses of the obfs4 bridges.
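
A minimal sketch of how such a count might be kept, assuming the hint
arrives as a query-string parameter named `justfailed` as in the example
above. The function `count_justfailed`, the `justfailed_counts` table,
and the `lookup_country` callable are hypothetical; a real deployment
would plug in whatever GeoIP lookup BridgeDB already has, and the request
might be JSON-RPC rather than a query string.

```python
from collections import Counter
from typing import Callable
from urllib.parse import parse_qs

# (country code, failed transport) -> number of re-requests seen
justfailed_counts: Counter = Counter()


def count_justfailed(query: str, client_ip: str,
                     lookup_country: Callable[[str], str]) -> None:
    """Record one re-request if the query carries a justfailed parameter."""
    params = parse_qs(query)
    for transport in params.get("justfailed", []):
        country = lookup_country(client_ip)
        justfailed_counts[(country, transport)] += 1
```

A sudden jump in the count for one (country, transport) pair would then
be the signal to look at.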

> FWIW, the number of suspicious attempts to the HTTPS Distributor has dropped
> substantially in the last four months, and to the email distributor has stayed
> about the same.  Off the top of my head, this is likely, hopefully,
> something to do with:
> 
>  1. actually distributing separate bridges to Tor/proxy users, and actually
>     rate limiting them (!!), [0] [1]
> 
>  2. actually rotating available bridges, such that large amounts of both time
>     and IP space are required to effectively scrape, [2] [3]
> 
>  3. semi-automatedly blacklisting stupid email bots, [4]
> 
>  4. moving away from ReCAPTCHA to our own CAPTCHA module and expiring stale
>     CAPTCHAs after 10 minutes. [5]

Interesting, I think this might be a good argument for "If it ain't
broke, don't fix it" wrt how we hand out bridges for the Domain Front
distributor.

> > Personally, I think the domain fronting distributor should behave
> > identically to the closest equivalent distributor that isn't domain
> > fronted, both to reduce implementation complexity, and to keep the
> > system easy to reason about.
> >
> > Before RBridge is implemented, this would mean using the
> > X-Domain-Fronted-For header's IP address as if it were the real IP
> > address, and index into the hashrings in the same way as we do with the
> > web distributor.
> 
> "Index into the hashrings" is a culmination of rather complicated structures
> to produce a behaviour particular to the current best known configuration of a
> particular Distributor.
> 
> If you would like it to behave the same as the HTTPS Distributor, should
> bridges for Tor/proxy users still be kept in a separate subhashring?  And
> users grouped by /32 if IPv6, and /16 if IPv4?  And only allow one set of
> bridges every three hours for non-Tor/proxy users, and four sets for all
> Tor/proxy users?  And rotate sub-sub-hashrings once per day?  And expire
> CAPTCHAs after 10 minutes?  And will you accept that, should I need to change
> any of those things for the HTTPS Distributor, that your Tor Browser
> Distributor might break?

Yes to all. Well, except the last one. I'd hope you could guarantee API
compatibility, at least.

As I said, I think starting out with all of this as it is in the HTTPS
distributor is fine, so long as we can measure how many users actually
don't need to ask for more bridges of a different type, and figure out
some way to change that if it is clearly not working.
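
To illustrate what treating the X-Domain-Fronted-For address "as if it
were the real IP address" could mean, here is a simplified sketch using
the parameters from the thread (/16 for IPv4, /32 for IPv6, one set of
bridges per three hours). It is not BridgeDB's actual hashring code:
`client_area`, `ring_position`, the HMAC-SHA256 construction, and the
key are assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress

PERIOD_SECONDS = 3 * 60 * 60  # one set of bridges per three hours


def client_area(ip: str) -> str:
    """Group clients by /16 for IPv4 and by /32 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 16 if addr.version == 4 else 32
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def ring_position(headers: dict, key: bytes, now: float) -> str:
    """Derive a hashring position from the fronted address and time period."""
    ip = headers["X-Domain-Fronted-For"]
    interval = int(now // PERIOD_SECONDS)
    msg = f"{client_area(ip)}|{interval}".encode()
    # Assumed construction: keyed hash of (area, interval).
    return hmac.new(key, msg, hashlib.sha256).hexdigest()
```

Two clients in the same /16 asking within the same three-hour period
land on the same position, and so would get the same bridges.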

> Otherwise, it should likely be separate.  What problem would be solved by
> keeping them as the same Distributor, other than (at least at the initial
> implementation time) the Distributor code won't be duplicated?

I have nothing against keeping them separate, and for just the bridge
pool assignments that may be wise, since the access mechanisms are
slightly different than the non-domain fronted HTTPS distributor. If
retaining the ability to change the behavior for just the Tor Browser
distributor also means keeping the implementation separate, that can
also work. 

I only suspect that when this functionality is in Tor Launcher, it will
be the primary way that people get bridges, and should thus have the
most bridges out of any distributor, until we start inventing other
distributor types to replace it.

After all, Tor Launcher is used by Tails, Tor Browser, Tor Birdy, and
Tor Messenger. I bet Orbot will make use of the API too. That's
basically the entire Tor userbase, right there.

> (I'm going to start calling the Tor Browser Distributor "Charon", unless
> someone has a less gruesome name.)

Yikes. I'm going to pretend this is just an internal secret project
codename and keep it off of user-facing UI :)

> > I could see an argument that the set of bridges held by the domain
> > fronting distributor should be kept separate from the web distributor,
> > because heck, way more people should be able to access the domain
> > fronted version, and maybe we want to drastically reduce the web
> > distributor's pool because nobody can reach it (except for whitelisted
> > scrapers and people who don't really need bridges).
> 
> Actually!  Current statistical estimates say that many people (~75,000 times
> per day) can not only reach it, but can get bridges from it. [6]  :)

How do we know these weren't bot/scraper requests? Especially since as
you say the suspicious request rate has fallen since you fixed several
issues related to the web distributor hash rings and rate limiting?

> If you want more bridges allocated to the Tor Browser Distributor from the
> beginning, there's a trick to dump the unallocated bridges into your
> Distributor as soon as it first exists.  That's 1/5th the size of the HTTPS
> pool, and 2/5ths the size of the email pool; roughly 1,000 bridges.

1/5 of the HTTPS pool is unlikely to serve the entire Tor userbase long
term. I really expect this to be the primary way people get bridges,
unless it gets blocked or scraped, or we build something else we like
better.
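
For scale, the pool sizes implied by those fractions can be worked out
directly. This is back-of-the-envelope arithmetic that reads the quoted
"roughly 1,000 bridges" as the size of the unallocated pool; the
variable names are illustrative only.

```python
from fractions import Fraction

unallocated = 1000  # "roughly 1,000 bridges" per the quoted message

# unallocated = 1/5 of the HTTPS pool and 2/5 of the email pool
https_pool = unallocated / Fraction(1, 5)
email_pool = unallocated / Fraction(2, 5)

print(int(https_pool), int(email_pool))  # prints "5000 2500"
```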

> [0]: https://bugs.torproject.org/4771#comment:14
> [1]: https://bugs.torproject.org/4405
> [2]: https://bugs.torproject.org/1839
> [3]: https://bugs.torproject.org/15517
> [4]: https://bugs.torproject.org/9385
> [5]: https://bugs.torproject.org/11215
> [6]: https://people.torproject.org/~isis/otf-etfp-sow.pdf#subsection.1.1
> [7]: https://gitweb.torproject.org/bridgedb.git/tree/lib/bridgedb/proxy.py
> [8]: https://gitweb.torproject.org/bridgedb.git/tree/scripts/get-tor-exits

-- 
Mike Perry