[tor-dev] A proposal to phase out CAPTCHAs for BridgeDB

Roger Dingledine arma at torproject.org
Mon Aug 2 10:42:01 UTC 2021


On Thu, Jul 29, 2021 at 04:46:37PM -0400, Cecylia Bocovich wrote:
> I would like to propose that we remove the CAPTCHAs from BridgeDB
> entirely, but I'd like to know whether there is research out there
> *specifically that fits with the anti-censorship context* showing that
> these CAPTCHAs are actually doing useful work to prevent Bridge
> enumeration. But, even if the CAPTCHAs are preventing a small number of
> censors from enumerating more bridges, is the usability impact worth
> what marginal benefit we get from it?

Right. As another data point, the original bridge distribution design
did not intend for the https bridge bucket to use captchas:
https://svn-archive.torproject.org/svn/projects/design-paper/blocking.html#tth_sEc7.4
The original plan around captchas was to rely on Gmail's captcha, or
whatever Gmail uses as an account creation rate limiter, for the email
distribution bucket. That way *they* keep up with captcha research rather
than forcing us to become (and stay) captcha experts.

Thought #1: While of course we don't necessarily need to stick to
the vision from 15 years ago, I think there's a lot of merit to the
let-a-thousand-flowers-bloom approach to distribution strategies, where
we don't need to glue captchas on to every one of them. I support your
goal of dropping captchas from the https distributor, on the theory that
they are implicitly included (and done better!) for the email distributor.

Thought #2: Are there adversaries who would happily scrape the https
distributor if it were trivial to do, and just the barrier of solving
the captchas dissuades them? I'm thinking of the Belarus A1 censorship
event for example:
https://gitlab.torproject.org/tpo/anti-censorship/censorship-analysis/-/blob/main/reports/2020/belarus/2020-belarus-report.md
where our analysis indicates that they scraped the email distributor but
not the https distributor. Maybe they already had the gmail accounts in
place from some other attack, so it was cheap to use them for scraping.

Thought #3: We added captchas for the https distributor, but then when
we added the Moat distributor we put captchas on it too. And the Moat
distributor doesn't have any *other* rate-limiting or defense (compare to
the isolation-by-address-block for answers from the https distributor). So
Moat seems extra vulnerable to cheap full enumeration.
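
For illustration, the isolation-by-address-block idea mentioned above
can be sketched roughly as follows. This is a minimal sketch, not
BridgeDB's actual code: the /16 (IPv4) and /32 (IPv6) block sizes, the
HMAC-SHA256 ranking, and the names address_block and bridges_for_client
are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress


def address_block(client_ip: str) -> str:
    """Collapse a client address to its enclosing block.

    Assumed block sizes for this sketch: /16 for IPv4, /32 for IPv6.
    """
    ip = ipaddress.ip_address(client_ip)
    prefix = 16 if ip.version == 4 else 32
    return str(ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False))


def bridges_for_client(client_ip: str, bridges: list[str],
                       key: bytes, count: int = 3) -> list[str]:
    """Return the fixed subset of bridges assigned to the client's block.

    Each bridge is ranked by a keyed hash of (block, bridge), and the
    lowest-ranked `count` bridges are handed out. Every client in the
    same block therefore sees the same answer.
    """
    block = address_block(client_ip).encode()

    def score(bridge: str) -> bytes:
        return hmac.new(key, block + b"|" + bridge.encode(),
                        hashlib.sha256).digest()

    return sorted(bridges, key=score)[:count]


# Hypothetical bridge names and secret key, for demonstration only.
EXAMPLE_BRIDGES = [f"bridge-{i}" for i in range(20)]
EXAMPLE_KEY = b"distributor-secret"
```

With this kind of scheme, repeated requests from inside one address
block keep returning the same few bridges, so an enumerator needs
addresses in many blocks rather than just many requests. A distributor
without it has only the captcha between a scraper and the full list.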

Thought #4: We would be in a much better position to experiment here if
we had a better measurement and feedback infrastructure in place. Like,
if we removed the captchas today, how would we know what the impacts are
in terms of higher risk of blocking?

So, I too am tempted to get rid of the captchas, but especially since
we use them in the Moat distributor too, it is unclear how much losing
them would impact usability and security, and it is unclear how we would
learn the answer to that in practice.

My suggestion would be to focus on getting that measurement and
feedback infrastructure in place first, before considering removing
the captchas. We know we need it to know how things are going now, and
we're going to need it to understand the impact of any changes we make.

--Roger


