[tor-dev] Should we disable the collection of some stats published in extra-infos?
rob.g.jansen at nrl.navy.mil
Thu Feb 11 19:51:51 UTC 2016
> On Jan 19, 2016, at 3:45 AM, Karsten Loesing <karsten at torproject.org> wrote:
> On 15/01/16 23:00, Rob Jansen wrote:
> > Hello,
> Hi Rob,
> I'm moving this discussion from metrics-team@ to tor-dev@, because I
> think it's relevant for little-t-tor devs who are not subscribed to
> metrics-team@. Hope you don't mind.
> > Should Tor still be collecting these things? Should Tor disable the
> > collection of these statistics until we have a more
> > privacy-preserving way to collect and aggregate them?
> > The good news is that privacy-preserving techniques exist that can
> > reduce information leakage. I'm developing a tool based on the
> > secret-sharing variant of PrivEx to collect some of these types
> > of statistics while providing privacy guarantees. We are currently
> > using it to collect only those stats that are useful for producing
> > Tor traffic models. A great advantage of this tool is that the
> > various counters that we store during the collection phase get
> > noise added and are randomized during initialization; only the
> > aggregates are ever known and revealed by the aggregation server,
> > limiting the information that is lost if a relay is compromised.
> > This is a large improvement over the current collection method,
> > which only adds noise before publication and reveals statistics on
> > a per-relay basis.
> Suggestion: How about we evaluate these statistics published by relays
> in the past years to see if there are other benefits or risks we
> didn't think of, and then we decide whether to leave them in, modify
> them, or take them out?
Sounds great, though I'm not sure how this evaluation will happen.
> The reason is that I'd want to avoid removing this code only to
> realize shortly after that we overlooked a good reason for keeping it.
The problem is that people are unlikely to speak up until *after* we remove the stats, so we may not learn about all of the use cases until the stats are already gone. At least for me, it's not just a matter of thinking hard enough about it.
That said, I think that for some of these stats, the risk is high enough that it is hard to imagine continuing to collect them the way Tor currently does.
> These statistics have been collected for years now, and it might take
> another year or so for relays to upgrade and stop collecting them. So
> what's another month?
To be clear, I am not suggesting that we simply remove everything and never look back. I'm actually suggesting using secure aggregation to *replace* the current method for counting and aggregating. Maybe the secure counting/aggregation happens occasionally, or maybe continuously. The details there still need to be worked out (working on it).
I would suggest that we wait until those details are in fact worked out and we discuss a transition plan before removing the old collection methods, but I think that some stats have enough risk that it may not be worth waiting. Maybe we can remove the riskiest stats (IP addresses, exit ports, exit bytes) and wait to remove the others until I have more details about a replacement.
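For readers who have not seen the secret-sharing idea before, here is a rough sketch of how blinded, noisy counters can be set up so that only the aggregate is ever revealed. This is not the actual tool and not PrivEx's code; the names (Relay, run_round), the modulus Q, and the use of Gaussian noise are all assumptions made purely for illustration.

```python
import random
import secrets

Q = 2**61 - 1  # assumed modulus, far larger than any realistic count


class Relay:
    """Hypothetical relay whose stored counter is blinded and noised from the start."""

    def __init__(self, num_keepers, noise_sigma, rng):
        # One uniformly random blinding share per share keeper.
        # In a real deployment these would be sent to the keepers and then
        # erased; they are kept here only so the simulation can hand them over.
        self.shares = [secrets.randbelow(Q) for _ in range(num_keepers)]
        # Noise is added at initialization, not just before publication.
        # (Gaussian noise from a non-cryptographic RNG is an assumption here.)
        noise = round(rng.gauss(0, noise_sigma))
        # The stored value is noise minus all shares, reduced mod Q, so on
        # its own it looks uniformly random.
        self.counter = (noise - sum(self.shares)) % Q

    def increment(self, n=1):
        # Counting an event only ever touches the blinded value.
        self.counter = (self.counter + n) % Q


def run_round(true_counts, num_keepers=3, noise_sigma=0.0, seed=None):
    """Simulate one collection round and return the (noisy) aggregate count."""
    rng = random.Random(seed)
    relays = [Relay(num_keepers, noise_sigma, rng) for _ in true_counts]

    # Each keeper publishes only the sum of the shares it received.
    keeper_sums = [
        sum(r.shares[k] for r in relays) % Q for k in range(num_keepers)
    ]

    # Relays count events during the collection phase.
    for relay, count in zip(relays, true_counts):
        relay.increment(count)

    # The aggregator adds blinded counters and keeper sums; the shares cancel,
    # leaving the sum of the true counts plus the sum of the noise values.
    total = (sum(r.counter for r in relays) + sum(keeper_sums)) % Q

    # Map back to a signed value in case the total noise is negative.
    return total - Q if total > Q // 2 else total
```

With noise_sigma set to 0, run_round([5, 7, 11]) returns the exact sum of the counts; with a nonzero sigma the result is that sum plus the combined noise. The point of the design is that no single relay's stored counter, and no single keeper's sum, says anything useful by itself.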
> Thanks for (re-)starting this discussion!