[tor-bugs] #4957 [Metrics Data Processor]: Decide how to sanitize pluggable transport lines in bridge descriptors

Thu Jun 28 16:29:07 UTC 2012

#4957: Decide how to sanitize pluggable transport lines in bridge descriptors
------------------------------------+---------------------------------------
 Reporter:  karsten                 |          Owner:  karsten
     Type:  task                    |         Status:  new    
 Priority:  normal                  |      Milestone:         
Component:  Metrics Data Processor  |        Version:         
 Keywords:                          |         Parent:         
   Points:                          |   Actualpoints:         
------------------------------------+---------------------------------------

Comment(by karsten):

 Replying to [comment:2 asn]:
 > Transport lines will look like this:
 > {{{
 > transport SP <methodname> SP <address:port> [SP arglist] NL
 > }}}
 > and there is also an optional field for supplemental data:
 > {{{
 > transport-info SP <methodname> [SP arglist] NL
 > }}}
 > I think you can ignore `transport-info` for now since it's not
 implemented and there are no transports that need it yet.

 So, it looks like the contents of `transport-info` lines will be no more
 sensitive than the `[SP arglist]` part of `transport` lines, right?  If we
 want to keep `[SP arglist]` in `transport` lines, we might as well keep
 `transport-info` lines, even if they're not in use yet.
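
 To make the two line formats concrete, here is a minimal parsing sketch.
 It assumes fields are separated by single spaces exactly as in the
 grammar quoted above.  The `TransportLine` type, the function name, and
 the sample values are made up for illustration and are not part of any
 existing code.

```python
from typing import NamedTuple, Optional


class TransportLine(NamedTuple):
    """Hypothetical container for one parsed line."""
    keyword: str              # "transport" or "transport-info"
    method: str               # <methodname>
    addrport: Optional[str]   # <address:port>, only on "transport" lines
    arglist: Optional[str]    # optional [SP arglist], kept as one string


def parse_transport_line(line: str) -> Optional[TransportLine]:
    """Parse a transport or transport-info line; return None otherwise."""
    parts = line.rstrip("\n").split(" ")
    if parts[0] == "transport" and len(parts) >= 3:
        arglist = " ".join(parts[3:]) or None
        return TransportLine("transport", parts[1], parts[2], arglist)
    if parts[0] == "transport-info" and len(parts) >= 2:
        arglist = " ".join(parts[2:]) or None
        return TransportLine("transport-info", parts[1], None, arglist)
    return None


# Example with made-up values (documentation address, arbitrary port):
parsed = parse_transport_line("transport obfs2 192.0.2.1:9001\n")
```

 Both line types end in the same optional `arglist` field, which is why
 the two cases are handled almost identically here.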

 > As far as sanitization is concerned, I'm not sure which approach is
 better. I'm also not completely sure how bridge descriptors are used; I
 assume they are used when analyzing bridge stats, and when a user wants to
 look at the descriptor of her bridge in atlas. Are there other use cases?

 Those are the two major use cases.  I'm mainly interested in the bridge
 stats part, though.  It would be good to see how widely the different
 transports are deployed and maybe be able to infer which of them are
 blocked or not.
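
 As a rough sketch of the deployment statistic I have in mind, the
 following counts how many bridges advertise each transport.  It assumes
 each descriptor is available as a string and that `transport` lines
 follow the format quoted above.  The function name and the sample
 descriptors are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_transports(descriptors: Iterable[str]) -> Counter:
    """Count, per method name, how many descriptors list that transport."""
    counts = Counter()
    for descriptor in descriptors:
        methods = set()
        for line in descriptor.splitlines():
            parts = line.split(" ")
            # Only "transport SP <methodname> SP <address:port> ..." lines.
            if parts[0] == "transport" and len(parts) >= 3:
                methods.add(parts[1])
        # A set ensures each bridge counts once per transport.
        counts.update(methods)
    return counts


# Made-up descriptors for illustration only:
sample = [
    "transport obfs2 10.1.2.3:9001\ntransport other 10.1.2.3:9002\n",
    "transport obfs2 10.4.5.6:443\n",
]
totals = count_transports(sample)
```

 Note that this only needs the `<methodname>` field, so it would work on
 descriptors sanitized under any of the approaches that keep that field.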

 > Some sanitization approaches:
 >
 > a) No sanitization. Pluggable transports and their ports are disclosed to
 people who know a bridge.

 Note that everyone can learn the contents of sanitized bridge descriptors
 by downloading the tarballs or rsync'ing them from metrics.  It's not just
 people who know a bridge who'll receive the sanitized descriptors.

 If this a) includes leaving in the `address` part, I disagree.  We should
 sanitize the `address` part in the same way that we sanitize bridge IP
 addresses.  We can probably leave the `port` part in, because it ''might''
 give us some hints as to whether a specific port works better than other
 ports for a given transport.

 What does the `arglist` tell us that would be useful for statistical
 analysis?  There are no shared secrets in that line, are there?  If we
 take out the `arglist` part, I think we're already deciding against keeping
 `transport-info` lines in the future, because their only purpose seems to
 be to add another `arglist` to an existing transport.

 > b) Sanitization. Only display whether the bridge supports pluggable
 transports or not. Or maybe the number of transports it supports. Or maybe
 something else.

 The simple fact that a bridge supports pluggable transports or the number
 of supported transports seems hardly useful for statistical analysis.
 What we ''could'' do is only keep `transport SP <methodname>` for each
 transport that a bridge supports.  But I don't yet see how the sanitized
 address and (non-sanitized) port are sensitive information that we'd have
 to remove.

 > c) Paranoia. '''Don't''' display any pluggable transport-related
 information.

 That's bad, because we should come up with ''some'' stats to show how
 successful pluggable transports are, if we can.

 > If I were to select one I would probably go with a). It's good both for
 analysis and for users who want to know more about their bridges.

 I agree.

 > I'm also not sold on the use case of a bridge operator who supports
 multiple transports, has a public bridge, and wants to hide some of her
 transports from her users. However, Tor users have many different use
 cases and I only know of a few, so if others think that b) or c) (or d))
 are more reasonable (or support a larger range of use cases) I'm OK with
 it.

 Okay.  Here's what I'm going to do, unless you or somebody else tells me
 it's a bad idea:

  - Sanitize `transport` lines by replacing the `address` part in the same
 way we sanitize other addresses, and keep the rest of the line
 unchanged.

  - Leave in `transport-info` lines without changing them at all.

 Does that make sense?  (Thanks!)
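
 The plan above could be sketched roughly as follows.  This is only an
 illustration: the keyed-hash mapping into `10.x.y.z` is an assumption
 standing in for whatever we already do for other addresses, and the
 function names, the secret, and the sample line are all hypothetical.

```python
import hashlib
import hmac


def sanitize_address(address: str, secret: bytes) -> str:
    """Map an address to a stable pseudonym in 10.0.0.0/8.

    Assumption: a keyed hash stands in for the real address sanitizer.
    """
    digest = hmac.new(secret, address.encode("ascii"), hashlib.sha256).digest()
    return "10.%d.%d.%d" % (digest[0], digest[1], digest[2])


def sanitize_line(line: str, secret: bytes) -> str:
    """Sanitize the address of a transport line; keep everything else."""
    parts = line.rstrip("\n").split(" ")
    if parts[0] == "transport" and len(parts) >= 3:
        # Split on the last colon so the port is kept as-is.
        address, sep, port = parts[2].rpartition(":")
        if sep:
            parts[2] = sanitize_address(address, secret) + ":" + port
    # transport-info lines and all other lines pass through unchanged.
    return " ".join(parts) + "\n"


# Made-up input line and secret for illustration only:
result = sanitize_line("transport obfs2 192.0.2.1:9001 k=v\n", b"example-secret")
```

 In this sketch the method name, port, and `arglist` survive untouched,
 and the same input address always maps to the same sanitized address
 for a given secret.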

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4957#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online