[tor-bugs] #9316 [Circumvention/BridgeDB]: BridgeDB should export statistics

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Jun 11 23:08:05 UTC 2019


#9316: BridgeDB should export statistics
-------------------------------------------------+-------------------------
 Reporter:  asn                                  |          Owner:  phw
     Type:  task                                 |         Status:
                                                 |  assigned
 Priority:  Medium                               |      Milestone:
Component:  Circumvention/BridgeDB               |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  metrics, bridgedb, prometheus, ex-   |  Actual Points:
  sponsor-19, anti-censorship-roadmap            |
Parent ID:  #19332                               |         Points:  3
 Reviewer:                                       |        Sponsor:
                                                 |  Sponsor30-must
-------------------------------------------------+-------------------------

Comment (by phw):

 We just heard back from Tor's Research Safety Board. You can find the
 response below. The reviewer writes that our proposal wouldn't be an issue
 in a one-off setting but could be problematic in the long run. I think a
 reasonable way forward would be to implement the proposal, run it in a
 one-off setting for, say, a week, and then evaluate if we should change
 data collection. In the long run, we should also transition to PrivCount
 as the reviewer mentions.

 {{{
 Tor Research Safety Board Paper #20 Reviews and Comments
 ===========================================================================
 Paper #20 Collecting BridgeDB usage statistics


 Review #20A
 ===========================================================================
 * Updated: 11 Jun 2019 6:02:53pm EDT

 Overall merit
 -------------
 4. Accept

 Reviewer expertise
 ------------------
 3. Knowledgeable

 Paper summary
 -------------
 The document proposes collecting a new set of usage statistics through data
 available from the operation of BridgeDB. The statistics would be useful for
 better prioritizing development tasks, improving reaction time to bridge
 enumeration attacks and blockages, reducing failure rates, and helping to
 promote censorship circumvention research.

 Comments for author
 -------------------
 If this were a short-term study, I would say go for it, no questions asked.
 The benefits are clear and I agree that they outweigh the risks.

 However, I think it was implied (although not explicitly stated) that the
 new statistics would be regularly collected and published on an ongoing
 basis. I think there are more risks associated with such an ongoing
 collection as opposed to a one-off or short-term study, so we should
 carefully weigh the cost/effort of safer collection methods against the
 privacy benefits of such methods.

 The most concerning statistics to me are the per-country statistics and the
 per-service (gmail, yahoo, etc.) statistics. I think it is clear from
 Sections 3 and 4 that you understand the risks associated with collecting
 these statistics: a single user from an unpopular country could be
 identified because the 1-10 bucket suddenly changed from a 0 count to a 1
 count. This issue might also exist if unpopular email service providers are
 selected. This issue is already present in Tor's per-country user
 statistics, and I believe there is a plan to transition away from these
 statistics because of the safety concerns. The bucketing proposal (round to
 the nearest 10) does provide some uncertainty, but it's hard to reason
 about what protection it is providing.

 In an ideal world, we would collect these statistics with a
 privacy-preserving statistics collection tool. In fact, I think most if not
 all of these could be collected with PrivCount (assuming it was extended to
 support the new event types).

 One useful thing about PrivCount is secure aggregation, meaning that if you
 have multiple data collectors, you can securely count a total across all of
 them without leaking individual inputs. In this case, it seems like there
 is only one BridgeDB data source, so we would not benefit from PrivCount's
 secure aggregation.

 The other useful thing that PrivCount provides is differential privacy.
 This is where you could get most of the benefit. Rather than rounding to 10
 and not knowing how much privacy that provides, you instead start by
 defining how much privacy each statistic should achieve based on your
 operational environment (these are called action bounds), and then
 PrivCount will add noise to the statistics in a way that will guarantee
 differential privacy under those constraints. If these constraints add too
 much noise for the resulting statistics to be useful, then you have to
 consider whether the measurement is too privacy-invasive for the actions
 you are trying to protect, in which case you possibly shouldn't collect it.

 Tor has PrivCount on the roadmap (I believe), so one option could be to
 implement the non-PrivCount version now and eventually transition the
 statistics to PrivCount. Another option would be to set up a PrivCount
 instance using the open-source tool rather than waiting for the
 PrivCount-in-Tor version to be ready. In fact, if the data is collected at
 BridgeDB, then I'm not sure that having PrivCount in Tor would help anyway
 (unless BridgeDB runs Tor).

 There has been some work to use PrivCount for measurement and also to
 explain the process of defining action bounds. I think the most relevant is
 the IMC paper:
     - https://torusage-imc2018.github.io
 }}}
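
 To make the reviewer's two points concrete, here is a minimal Python sketch
 that contrasts bucketing with noise addition. It is neither BridgeDB nor
 PrivCount code, and every name in it (bucket, laplace_noise, noisy_count) is
 hypothetical. It assumes that counts are rounded up to the next multiple of
 10, so that anything in 1-10 is reported as 10. For the noise it uses the
 textbook Laplace mechanism, which may differ from the mechanism PrivCount
 actually implements. The sensitivity parameter loosely stands in for an
 action bound, and the epsilon value is an arbitrary placeholder.

```python
import math
import random


def bucket(count: int, size: int = 10) -> int:
    """Round a non-negative count up to the next multiple of `size`.

    Assumption: 0 stays 0 and 1..10 is reported as 10.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return math.ceil(count / size) * size


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centred on 0.

    The difference of two independent exponential samples with mean
    `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count: int, sensitivity: float, epsilon: float,
                rng: random.Random) -> float:
    """Textbook Laplace mechanism: add noise of scale sensitivity/epsilon.

    `sensitivity` is how much one protected user's activity may change
    the count (a hypothetical stand-in for an action bound).
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed only to make the demo repeatable
    for true_count in (0, 1):
        print("true:", true_count,
              "bucketed:", bucket(true_count),
              "noisy:", round(noisy_count(true_count, 1.0, 0.5, rng), 2))
```

 With bucketing, the step from zero users to one user is fully visible
 (0 becomes 10), which is the case the reviewer worries about for unpopular
 countries. With noise, both cases yield overlapping ranges of outputs, and
 the privacy level is set explicitly through sensitivity and epsilon, at the
 cost of less accurate small counts.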

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9316#comment:25>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

