[tor-bugs] #18361 [Tor Browser]: Issues with corporate censorship and mass surveillance

Thu Feb 25 14:47:00 UTC 2016

#18361: Issues with corporate censorship and mass surveillance
------------------------------------------+--------------------------
 Reporter:  ioerror                       |          Owner:  tbb-team
     Type:  enhancement                   |         Status:  new
 Priority:  High                          |      Milestone:
Component:  Tor Browser                   |        Version:
 Severity:  Critical                      |     Resolution:
 Keywords:  security, privacy, anonymity  |  Actual Points:
Parent ID:                                |         Points:
  Sponsor:                                |
------------------------------------------+--------------------------

Comment (by paxxa2):

 Here is a summary of some unaddressed points CloudFlare could come back
 to, if they are wondering how to continue to engage with this ticket:

  1. What kind of per browser session tracking is actually happening?
  1. What would a reasonable solution look like for a company like
 Cloudflare?
  1. What is reasonable for a user to do? (~17 CAPTCHAs for one site == not
 reasonable)
  1. Would "Warning this site is under surveillance by Cloudflare" be a
 reasonable warning or should we make it more general?
  1. What is the difference between one super cookie and ~1m cookies on a
 per site basis? The anonymity set appears to be *strictly* worse. Or do
 you guys not do any stats on the backend? Do you claim that you can't and
 don't link these things?
  1. Cloudflare asks: “Is it possible to prove that a visitor is indeed
 human, once, but not allow the CDN/DDoS company to deanonymize / correlate
 the traffic across many domains?” Answer and follow-up question: Here is a
 non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA
 on GET requests. For such a user - how will you protect any information
 you've collected from them? Will that information be of higher value or
 richer technical information if there is a cookie (super, regular,
 whatever) tied to that data?
  1. Let's be clear on one point: humans do not request web pages. User-
 Agents request web pages ... It might be true that there is some kind of
 elaborate ZKP protocol that would allow a user to prove to CloudFlare that
 their User-Agent behaves the way CloudFlare demands, without revealing all
 of the user's browsing history to CloudFlare and Google. Among other
 things, this would require CloudFlare to explicitly and precisely describe
 both their threat model and their definition of 'good behaviour', which as
 far as I know they have never done.
  1. How many people are actively testing with Tor Browser on a daily basis
 for regressions? Does anyone use it full-time?
  1. If I was logged into Google (as they use a Google Captcha...), could
 they vouch for my account and auto solve it? Effectively creating an ID
 system for the entire web where Cloudflare is the MITM for all the users
 visiting sites cached/terminated by them?
  1. Regarding “What sort of data would qualify as an 'i'm a human' bit?
 Let's start with something not-worse than now: a captcha solved in last
 <XX> minutes.” – Is this something that CloudFlare has actually found
 effective? Are there metrics on how many challenged requests that
 successfully solved a CAPTCHA turned out to actually be malicious?
  1. I'd really like it if it was CAPTCHA free entirely until there is a
 POST request, for example. A read only version of the website, rather than
 a CAPTCHA prompt just to read would be better wouldn't it?
  1. CloudFlare is in a position to inject JavaScript into sites. Why not
 hook requests that would result in a POST and challenge after say,
 clicking the submit button? It seems reasonable in many cases to redirect
 them on pages where this is a relevant concern? POST fails, failure page
 asks for a captcha solution, etc.
  1. Actually, a censorship page with specific information à la HTTP 451
 would be a nearly in-spec answer to this problem. Why not use that?
  1. Why not just serve them an older cached copy?
  1. Do you have any open data on (“unfortunately many Tor exit IP's have
 bad IP reputation, because they _ARE_ often used for unwanted activity”)?
  1. CF asks: “What do we do to implement zero-knowledge proofs both on
 ddos protection side and on TBB side?” My first order proposition would be
 to serve a cached copy of the site in "read only" mode with no changes on
 the TBB side. We can get this from other third parties if CF doesn't want
 to serve it directly - that was part of my initial suggestion. Why not
 just serve that data directly?
  1. What about slowing down recurrent requests? It's really not something
 that can be solved on the Tor side.
  1. What kind of DoS can you guys possibly see through Tor? The network in
 total capacity has to be less than a tiny fraction of the capacity at
 *one* of your PoPs. Could you please give us actual data here? I've seen
 some basic CF API data - what is exposed seems to be quite minimal. As far
 as I can tell - the main data is score data that is from project honeynet.
 That has a lot of history that is extremely problematic in my view.
  1. Those at-risk parties are not just a matter of ethics, they are a
 source of surveillance capital for CloudFlare which is useful for
 generating so-called "threat" scores as well as other data. I assume that
 0days found in that process are submitted to CERT, the same CERT that
 exploited Tor Hidden Service users, I might add.
  1. In short - those at risk services are paying for this protection with
 their user/attacker data which is extracted with surveillance by
 CloudFlare. It may be ethical in motivation but unless I completely
 misunderstand the monitoring by CloudFlare of its own network, it appears
 to be sustained with surveillance more than pure good will.
  1. Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?
 That is, present a CAPTCHA only when: the server owner has specifically
 requested that CAPTCHAs be used; the server is actively under DoS attack;
 and the client's IP address is currently a source of the DoS.
  1. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's
 problems with Google?
  1. Could the FBI go to Google to get data on all CloudFlare users? Does
 CF protect it? If so - who protects users more?
  1. Building the infrastructure for a zero-knowledge proof system sounds
 like a fascinating but expensive and long-term project. And I wouldn't be
 confident that CloudFlare would even adopt such a thing once it became
 available, unless they made a significant investment in the work at the
 beginning.
  1. Marek, do you have any thoughts about my suggestions for reducing
 CAPTCHA use in comment:17?
  1. What does attempting to prove "i'm-a-human" have to do with addressing
 DDoS attacks?
  1. Centralization ensures that your company is a high value target. The
 ability to run code in the browsers of millions of computers is highly
 attractive. The fact that CF and Google both appear in those
 captcha prompts probably ensures CF isn't even in control of the entirety
 of the risk. Is it the case that for all the promises CF makes, Google is
 actually in control of the Captcha - and thus is by proxy given the
 ability to run code in the browsers of users visiting CF terminated sites?
  1. Should we be reaching out to Google here?
  1. Is (Project HoneyNet) data the reason that Google's captchas are so
 hard to solve? (stated answer “I don't know if there's any connection
 between Project Honeypot and Google's CAPTCHAs” is not an answer).
  1. How do we vet this information or these so-called "threat scores"
 other than trusting what someone says?
  1. Are you convinced that (offering up a read only page) is strictly
 worse than the current situation? I'm convinced that it is strictly better
 to only toss up a captcha that loads a Google resource when a user is
 about to interact with the website in a major way.
  1. Does that mean that Google, in addition to CF, has data on everyone
 hitting those captchas?
  1. When a user is given a CF captcha - does Google see any request from
 them directly? Do they see the Tor Exit IP hitting them? Is it just CF or
 is it also Google? Do both companies get to run javascript in this user's
 browser?
  1. Could you run the exact same test against all Comcast IP addresses
 aggregated as just one, or against another significant ISP?
  1. How are you handling CGNs so far?
  1. So what happens if I (as a site/server admin) don't need this (or
 part of this)?[[BR]]Specifically:[[BR]]As a server admin, if my site is
 not under DDoS (or spam) attack, then its visitors should not get the
 captcha challenge. [[BR]]As a server admin I should be able to choose if I
 want this kind of protection and potentially completely disable it.
 [[BR]]As a server admin, I want more sane defaults (lower security level).
  1. CAPTCHAs are a fundamentally untenable solution to dealing with DDOS
 attacks. Algorithmic solutions will always catch up to evolving CAPTCHA
 methods. CloudFlare and other service providers should recognize that this
 is the inevitable direction technology is going and abandon CAPTCHAs now. An
 alternate solution is a client proof-of-work protocol.
  1. Why does CloudFlare not run a .onion proxy for their sites?
  1. If all Tor exits are known, why isn't there even a control panel
 option for customers to say "Okay, I know Tor traffic is good, allow it
 unconditionally"? → Answer: “We will add this feature. Our customers will
 be able to 'whitelist' Tor so that Tor users visiting their web sites will
 not be challenged. This feature is coded and will be released shortly.”
 Follow up: Will that be the new default until a site decides to actively
 block Tor?
  1. So my question is to Cloudflare, their CTO was on here earlier. Why
 exactly are you not able to just implement a Captcha system that works??
 Seriously, is it that hard? As far as I know, you have recently moved over
 to serving up Google captchas, but it still doesn't work? Is CF's CTO
 really OK and comfortable with the fact that his team couldn't implement
 this after apparently trying for a few years? Seriously!! Captchas as a
 concept have been around for a pretty long time now.
  1. I always find it a demeaning and insulting attitude towards humans
 that we are being asked by rooms full of servers (which handle enormous
 amounts of requests and should be able to handle the few extra ones coming
 from Tor exits without breaking an electronic sweat, honestly) to solve
 puzzles. I am very angry about this attitude btw because my time is
 infinitely more valuable than your servers'. Being treated as a CAPTCHA-
 solving bot makes people angry, understand?
  1. Fantastic to hear that you are experiencing the same issues (CAPTCHA
 loops) as the rest of us. How do we ensure that it not only gets fixed but
 that it also never is left to our end users alone to detect these kinds of
 issues?
  1. So Mr. jgrahamc, are captchas part of Government Technology? Why is
 javascript so necessary? Do you measure per click reaction time? Do you
 correlate it with previous data sets? With enough signal gathered, can you
 then establish unique profiles of people?
  1. So how much would these customers care for the anonymous eyeballs of a
 relatively small group (in relation to the rest of the "net") of privacy-
 active users of a technology that attempts to destroy their βusiness
 model? Isn't this also your βusiness model, Cloudflare? Isn't this the very
 thing you do with the traffic from all those sites you MITM? I wonder who
 do you sell to though, hmm...
  1. My concern about Google is not that people should not be free to use
 their services - it is that CF *colludes* with Google when a user has not
 at all consented. How many server operators know that the CAPTCHA is
 hosted by Google, when they use CF for "protection" services? All of them?
 None of them? Did anyone get a choice? Tor users certainly did not get a
 choice when they are automatically flagged based on an IP reputation
 system and then redirected to Google.
  1. CF says: “We fixed the bug that caused a new CAPTCHA to be served for
 a site when the circuit changes.” Doesn't this mean that you've now got
 cross circuit tracking for Tor Browser users, effectively? I assume that
 is by issuing a cookie that isn't tied to a given IP address - though
 again without any transparency, I feel like it is unclear what was
 actually done in any technical sense.
  1. CF says: “We've reproduced the "CAPTCHA loop" problem and have an
 engineer looking into what's happening.”  Is there a timeline for this?
 Will they report back on this bug?
  1. Does this indeed mean that Google, because of actions by CF, has data
 on every person prompted for a CAPTCHA?
  1. Any American third party presents similar problems as Google. On the
 one hand, they are a PRISM provider. On the other, they probably have the
 best security team in the world. Why aren't you guys just hosting your own
 CAPTCHA solution or proxying it to Google in such a way that Google gets
 nothing directly from your users?
  1. Does this fixed CAPTCHA record users' reaction time, order of clicks,
 or mouse movements? CF Answers: “The fix I am talking about does not
 involve JavaScript or any of those things at all.” Follow up: The subject
 of my repeated question is the ability of CF CAPTCHA to capture response
 of users such as mouse movements, reaction time, order of checkbox
 selection. Is this kind of information transferred out of users device by
 submitting a CF CAPTCHA?
  1. It would be nice if this wasn't a closed discussion (at Cloudflare)
 with answers thrown over the wall. How can we include other people in
 these discussions?
  1. Why is CF even blocking Tor on sites that don't historically receive
 abusive traffic from Tor IPs in the first place? The "whitelist tor IPs"
 thing should be the default on all sites and only turned off when
 significant abusive traffic patterns are detected from Tor IPs.
  1. I'd also seriously look into how you are addressing DDoS from the
 network layer (specifically your edge router/firewall/load balancing
 configurations), how you scale your client infrastructure elastically, and
 specifically how you define your threat model. Two subpoints: your own
 engineer has admitted that captcha isn't a scalable way to
 address this problem, stating "we struggle to even serve captchas" [edit:
 while under attack]. So I'd challenge that this is an effective solution
 for DDoS.
  1. I'm with several others here seriously questioning the SNR and
 throughput constraints around blanket allowance of Tor infrastructure.
 It's like using a hatchet to remove a fly from your friend's forehead.
 Small problem, oblique solution.  Please remember that exit nodes are
 communal, so pretend for example that every time you wanted to blacklist a
 /32 IPv4 address, instead you were blacklisting an entire /24 public
 network.
  1. Could you please make some comparisons of the abuse in question? Is
 CloudFlare really just using Project Honeynet data here?
  1. What is the p value as asked above?
  1. Please reply to the analysis of the XFF dataset: Does CloudFlare
 censor the entire country of Vietnam as hard as it does many Tor exit
 nodes?
  1. Another comment about the broken CloudFlare captchas is that they're
 always in English for me. Is that always the case? For those who don't
 speak English, they're even more confused when they are censored with a
 looping and thus broken captcha security solution...?
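
 Two of the suggestions above (never prompt for a CAPTCHA on GET requests,
 and present a CAPTCHA only when the owner asked for it, the server is under
 DoS attack, and the client's IP is a source of that attack) can be read
 together as one decision rule. The sketch below is only an illustration of
 that reading. Every name in it (RequestContext, should_challenge, the field
 names) is hypothetical, treating HEAD as read-only alongside GET is an
 assumption, and nothing here describes how CloudFlare actually decides.

```python
from dataclasses import dataclass

# Methods treated as read-only. The ticket only names GET;
# including HEAD here is an assumption made for this sketch.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical inputs; none of these names come from a real CDN API."""
    method: str
    owner_enabled_captcha: bool       # site owner explicitly asked for CAPTCHAs
    site_under_attack: bool           # the site is actively under DoS right now
    client_ip_is_attack_source: bool  # this IP is currently part of that DoS


def should_challenge(ctx: RequestContext) -> bool:
    """Return True only when a CAPTCHA would be shown under the combined rule."""
    # Read-only requests are never challenged.
    if ctx.method.upper() in READ_ONLY_METHODS:
        return False
    # For state-changing requests, all three conditions must hold.
    return (
        ctx.owner_enabled_captcha
        and ctx.site_under_attack
        and ctx.client_ip_is_attack_source
    )
```

 Under this rule a Tor user reading a page is never interrupted, and a POST
 is challenged only on a site whose owner opted in while it is being
 attacked from that user's exit IP.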
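
 The "client proof-of-work protocol" mentioned above as an alternative to
 CAPTCHAs is not specified anywhere in this ticket. As a rough illustration
 only, here is a hashcash-style sketch: the server hands out a random
 challenge, the client searches for a nonce whose SHA-256 digest has a given
 number of leading zero bits, and the server checks the answer with a single
 hash. The function names, the 8-byte nonce encoding, and the difficulty
 value are all assumptions made for this example.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


def new_challenge() -> bytes:
    """Server side: a fresh random challenge so that old answers cannot be reused."""
    return os.urandom(16)
```

 The cost is asymmetric: the client pays for the search while the server
 pays for one hash, and no cookie or account is needed. Whether this
 actually deters a botnet, which has plenty of CPU to spend, is a separate
 question that the sketch does not answer.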

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18361#comment:144>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online