[tor-bugs] #18361 [Tor Browser]: Issues with corporate censorship and mass surveillance
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Feb 25 14:47:00 UTC 2016
#18361: Issues with corporate censorship and mass surveillance
Reporter: ioerror | Owner: tbb-team
Type: enhancement | Status: new
Priority: High | Milestone:
Component: Tor Browser | Version:
Severity: Critical | Resolution:
Keywords: security, privacy, anonymity | Actual Points:
Parent ID: | Points:
Comment (by paxxa2):
Here is a summary of some unaddressed points CloudFlare could come back
to, if they are wondering how to continue to engage with this ticket:
1. What kind of per browser session tracking is actually happening?
1. What would a reasonable solution look like for a company like
1. What is reasonable for a user to do? (~17 CAPTCHAs for one site == not
1. Would "Warning this site is under surveillance by Cloudflare" be a
reasonable warning or should we make it more general?
1. What is the difference between one super cookie and ~1m cookies on a
per site basis? The anonymity set appears to be *strictly* worse. Or do
you guys not do any stats on the backend? Do you claim that you can't and
don't link these things?
1. Cloudflare asks: “Is it possible to prove that a visitor is indeed
human, once, but not allow the CDN/DDoS company to deanonymize / correlate
the traffic across many domains?” Answer and follow-up question: Here is a
non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA
on GET requests. For such a user - how will you protect any information
you've collected from them? Will that information be of higher value or
richer technical information if there is a cookie (super, regular,
whatever) tied to that data?
1. Let's be clear on one point: humans do not request web pages. User-
Agents request web pages ... It might be true that there is some kind of
elaborate ZKP protocol that would allow a user to prove to CloudFlare that
their User-Agent behaves the way CloudFlare demands, without revealing all
of the user's browsing history to CloudFlare and Google. Among other
things, this would require CloudFlare to explicitly and precisely describe
both their threat model and their definition of 'good behaviour', which as
far as I know they have never done.
1. How many people are actively testing with Tor Browser on a daily basis
for regressions? Does anyone use it full-time?
1. If I was logged into Google (as they use a Google Captcha...), could
they vouch for my account and auto solve it? Effectively creating an ID
system for the entire web where Cloudflare is the MITM for all the users
visiting sites cached/terminated by them?
1. Regarding “What sort of data would qualify as an 'i'm a human' bit?
Let's start with something not-worse than now: a captcha solved in last
<XX> minutes.” – Is this something that CloudFlare has actually found
effective? Are there metrics on how many challenged requests that
successfully solved a CAPTCHA turned out to actually be malicious?
1. I'd really like it if it was CAPTCHA free entirely until there is a
POST request, for example. A read only version of the website, rather than
a CAPTCHA prompt just to read would be better wouldn't it?
hook requests that would result in a POST and challenge after say,
clicking the submit button? It seems reasonable in many cases to redirect
them on pages where this is a relevant concern? POST fails, failure page
asks for a captcha solution, etc.
1. Actually, a censorship page with specific information ala HTTP 451
would be a nearly in spec answer to this problem. Why not use that?
1. Why not just serve them an older cached copy?
1. Do you have any open data on (“unfortunately many Tor exit IP's have
bad IP reputation, because they _ARE_ often used for unwanted activity”)?
1. CF asks: “What do we do to implement zero-knowledge proofs both on
ddos protection side and on TBB side?” My first order proposition would be
to serve a cached copy of the site in "read only" mode with no changes on
the TBB side. We can get this from other third parties if CF doesn't want
to serve it directly - that was part of my initial suggestion. Why not
just serve that data directly?
1. What about slowing down recurrent requests? It's really not something
that can be solved on the Tor side.
1. What kind of DoS can you guys possibly see through Tor? The network in
total capacity has to be less than a tiny fraction of the capacity at
*one* of your PoPs. Could you please give us actual data here? I've seen
some basic CF API data - what is exposed seems to be quite minimal. As far
as I can tell - the main data is score data that is from project honeynet.
That has a lot of history that is extremely problematic in my view.
1. Those at-risk parties are not just a matter of ethics, they are a
source of surveillance capital for CloudFlare which is useful for
generating so-called "threat" scores as well as other data. I assume that
0days found in that process are submitted to CERT, the same CERT that
exploited Tor Hidden Service users, I might add.
1. In short - those at risk services are paying for this protection with
their user/attacker data which is extracted with surveillance by
CloudFlare. It may be ethical in motivation but unless I completely
misunderstand the monitoring by CloudFlare of its own network, it appears
to be sustained with surveillance more than pure good will.
1. Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?
That is, present a CAPTCHA only when: (a) the server owner has specifically
requested that CAPTCHAs be used, (b) the server is actively under DoS attack,
and (c) the client's IP address is currently a source of the DoS.
1. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's
problems with Google?
1. Could the FBI go to Google to get data on all CloudFlare users? Does
CF protect it? If so - who protects users more?
1. Building the infrastructure for a zero-knowledge proof system sounds
like a fascinating but expensive and long-term project. And I wouldn't be
confident that CloudFlare would even adopt such a thing once it became
available, unless they made a significant investment in the work at the
1. Marek, do you have any thoughts about my suggestions for reducing
CAPTCHA use in comment:17?
1. What does attempting to prove "i'm-a-human" have to do with addressing
1. Centralization ensures that your company is a high value target. The
ability to run code in the browsers of millions of computers is highly
attractive. The fact that CF and Google both appear in those
captcha prompts probably ensures CF isn't even in control of the entirety
of the risk. Is it the case that for all the promises CF makes, Google is
actually in control of the Captcha - and thus is by proxy given the
ability to run code in the browsers of users visiting CF terminated sites?
1. Should we be reaching out to Google here?
1. Is (Project HoneyNet) data the reason that Google's captchas are so
hard to solve? (stated answer “I don't know if there's any connection
between Project Honeypot and Google's CAPTCHAs” is not an answer).
1. How do we vet this information or these so-called "threat scores"
other than trusting what someone says?
1. Are you convinced that (offering up a read only page) is strictly
worse than the current situation? I'm convinced that it is strictly better
to only toss up a captcha that loads a Google resource when a user is
about to interact with the website in a major way.
1. Does that mean that Google, in addition to CF, has data on everyone
hitting those captchas?
1. When a user is given a CF captcha - does Google see any request from
them directly? Do they see the Tor Exit IP hitting them? Is it just CF or
1. Could you run the exact same test against all Comcast IP addresses
aggregated as just one, or against another significant ISP?
1. How are you handling CGNs so far?
1. So what happens if I (as a site/server admin) don't need this (or
part of this).[[BR]]Specifically:[[BR]]As a server admin, if my site is
not under DDoS (or spam) attack, then its visitors should not get the
captcha challenge. [[BR]]As a server admin I should be able to choose if I
want this kind of protection and potentially completely disable it.
[[BR]]As a server admin, I want more sane defaults (lower security level).
1. CAPTCHAs are a fundamentally untenable solution to dealing with DDOS
attacks. Algorithmic solutions will always catch up to evolving CAPTCHA
methods. CloudFlare and other service providers should recognize that this is
the inevitable direction technology is going and abandon it now. An
alternate solution is a client proof-of-work protocol.
1. Why does CloudFlare not run a .onion proxy for their sites?
1. If all Tor exits are known, why isn't there even a control panel
option for customers to say "Okay, I know Tor traffic is good, allow it
unconditionally"? → Answer: “We will add this feature. Our customers will
be able to 'whitelist' Tor so that Tor users visiting their web sites will
not be challenged. This feature is coded and will be released shortly.”
Follow up: Will that be the new default until a site decides to actively
1. So my question is to Cloudflare, their CTO was on here earlier. Why
exactly are you not able to just implement a Captcha system that works??
Seriously, is it that hard? As far as I know, you have recently moved over
to serving up Google captchas, but it still doesn't work? Is CF's CTO
really OK and comfortable with the fact that his team couldn't implement
this after apparently trying for a few years? Seriously!! Captchas as a
concept have been around for a pretty long time now.
1. I always find it a demeaning and insulting attitude towards humans
that we are being asked by rooms full of servers (which handle enormous
amounts of requests and should be able to handle the few extra ones coming
from Tor exits without breaking an electronic sweat, honestly) to solve
puzzles. I am very angry about this attitude btw because my time is
infinitely more valuable than your servers'. Being treated as a CAPTCHA-
solving bot makes people angry, understand?
1. Fantastic to hear that you are experiencing the same issues (CAPTCHA
loops) as the rest of us. How do we ensure that it not only gets fixed but
that it also never is left to our end users alone to detect these kinds of
1. So Mr. jgrahamc, are captchas part of Government Technology? Why is
correlate it with previous data sets? With enough signal gathered, can you
then establish unique profiles of people?
1. So how much would these customers care for the anonymous eyeballs of a
relatively small group (in relation to the rest of the "net") of privacy-
active users of a technology that attempts to destroy their βusiness
model? Isn't this also your βusiness model, Cloudflare? Isn't this the very
thing you do with the traffic from all those sites you MITM? I wonder who
do you sell to though, hmm...
1. My concern about Google is not that people should not be free to use
their services - it is that CF *colludes* with Google when a user has not
at all consented. How many server operators know that the CAPTCHA is
hosted by Google, when they use CF for "protection" services? All of them?
None of them? Did anyone get a choice? Tor users certainly did not get a
choice when they are automatically flagged based on an IP reputation
system and then redirected to Google.
1. CF says: “We fixed the bug that caused a new CAPTCHA to be served for
a site when the circuit changes.” Doesn't this mean that you've now got
cross circuit tracking for Tor Browser users, effectively? I assume that
is by issuing a cookie that isn't tied to a given IP address - though
again without any transparency, I feel like it is unclear what was
actually done in any technical sense.
1. CF says: “We've reproduced the "CAPTCHA loop" problem and have an
engineer looking into what's happening.” Is there a timeline for this?
Will they report back on this bug?
1. Does this indeed mean that Google, because of actions by CF, has data
on every person prompted for a CAPTCHA?
1. Any American third party presents similar problems as Google. On the
one hand, they are a PRISM provider. On the other, they probably have the
best security team in the world. Why aren't you guys just hosting your own
CAPTCHA solution or proxying it to Google in such a way that Google gets
nothing directly from your users?
1. Does this fixed CAPTCHA record users' reaction time, order of clicks,
or mouse movements? CF Answers: “The fix I am talking about does not
of my repeated question is the ability of CF CAPTCHA to capture response
of users such as mouse movements, reaction time, order of checkbox
selection. Is this kind of information transferred out of users device by
submitting a CF CAPTCHA?
1. It would be nice if this wasn't a closed discussion (at Cloudflare)
with answers thrown over the wall. How can we include other people in
1. Why is CF even blocking Tor on sites that don't historically receive
abusive traffic from Tor IPs in the first place? The "whitelist tor IPs"
thing should be the default on all sites and only turned off when
significant abusive traffic patterns are detected from Tor IPs.
1. I'd also seriously look into how you are addressing DDoS from the
network layer (specifically your edge router/firewall/load balancing
configurations), how you scale your client infrastructure elastically, and
specifically how you define your threat model. Two subpoints: your own
engineer has admitted that captcha isn't a scalable way to
address this problem, stating "we struggle to even serve captchas" [edit:
while under attack]. So I'd challenge that this is an effective solution
1. I'm with several others here seriously questioning the SNR and
throughput constraints around blanket allowance of Tor infrastructure.
It's like using a hatchet to remove a fly from your friend's forehead.
Small problem, oblique solution. Please remember that exit nodes are
communal, so pretend for example that every time you wanted to blacklist a
/32 IPv4 address, instead you were blacklisting an entire /24 public
1. Could you please make some comparisons of the abuse in question? Is
CloudFlare really just using Project Honeynet data here?
1. What is the p value as asked above?
1. Please reply to the analysis of the XFF dataset: Does CloudFlare
censor the entire country of Vietnam as hard as it does to many Tor exit
1. Another comment about the broken CloudFlare captchas is that they're
always in English for me. Is that always the case? For those who don't
speak English, they're even more confused when they are censored with a
looping and thus broken captcha security solution...?
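
Two of the suggestions above (never prompt for a CAPTCHA on GET requests, and
present a CAPTCHA only when the owner asked for it, the server is under attack,
and the client's IP is a source of that attack) can be combined into one small
decision rule. The following is only a sketch under stated assumptions: the
names RequestContext and should_challenge are hypothetical, the three boolean
inputs are assumed to be available from somewhere, and treating HEAD and
OPTIONS like GET is an assumption that goes beyond what the ticket says. It is
not a description of how CloudFlare actually decides.

```python
from dataclasses import dataclass

# Read-only HTTP methods. The ticket names only GET; HEAD and OPTIONS are
# added here as an assumption because they also do not modify state.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request facts; none of these names come from CloudFlare."""
    method: str                 # HTTP method of the incoming request
    owner_wants_captcha: bool   # site owner explicitly opted in to CAPTCHAs
    site_under_attack: bool     # the site is actively under DoS right now
    ip_is_attack_source: bool   # this client IP is currently part of that DoS


def should_challenge(ctx: RequestContext) -> bool:
    """Return True only when every condition proposed in the ticket holds."""
    if ctx.method.upper() in SAFE_METHODS:
        return False  # read-only requests are served without a CAPTCHA
    return (ctx.owner_wants_captcha
            and ctx.site_under_attack
            and ctx.ip_is_attack_source)
```

Under this rule a Tor user reading a page is never challenged, and a POST is
challenged only while all three conditions hold at the same time.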
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18361#comment:144>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online