Proposal 171 (revised): Separate streams across circuits by connection metadata

Wed Dec 15 04:55:15 UTC 2010

On Tue, Dec 14, 2010 at 5:35 PM, Robert Hogan <robert at roberthogan.net> wrote:
> A lot to digest!
>
>
> On Tuesday 07 December 2010 16:02:05 you wrote:
>>
>>   Generally, all the streams on a session come from a single
>>   application.  Unfortunately, isolating streams by application
>>   automatically isn't feasible, given the lack of any nice
>>   cross-platform way to tell which local process originated a given
>>   connection.  (Yes, lsof works.  But a quick review of the lsof code
>>   should be sufficient to scare you away from thinking there is a
>>   portable option, much less a portable O(1) option.)  So instead, we'll
>>   have to use some other aspect of a Tor request as a proxy for the
>>   application.
>
> It's hard to credit there isn't a good interface for this. The closest
> Linux gets is taskstats - see
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-06/msg01125.html,
> but the requirement here (which is shared by programs like Wireshark
> and nearly every network management application) is to match inode to
> pid reliably.
>
> Interestingly, Unix sockets allow you to collect the gid and uid of the
> process on the other side of the socket. Not the pid unfortunately.

Yeah.  If there's a good interface here, I would love to see it, but
afaict there isn't one. I spent a while grepping through the kernel
source and didn't see an easy way to get this stuff.  If somebody can
point to some working code, that would be neat.

>> Design:
>>
>>   When a stream arrives at Tor, we have the following data to examine:
>>     1) The destination address
>>     2) The destination port (unless this is a DNS lookup)
>>     3) The protocol used by the application to send the stream to Tor:
>>        SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
>>        mechanism the kernel gives us.
>>     4) The port used by the application to send the stream to Tor --
>>        that is, the SOCKSListenAddress or TransListenAddress that the
>>        application used, if we have more than one.
>>     5) The SOCKS username and password, if any.
>>     6) The source address and port for the application.
>>
>
> Why not the source address too? Robert Ransom made the point in a previous
> thread that Tor could be serving a local network.

I don't understand the question.  My option 6 above includes the
source address; that's what the option "IsolateClientAddr" is meant to
do.

Oh!  Was it confusing when I wrote, "We propose to use options 3, 4,
and 5 as a backchannel for the application to tell Tor about different
sessions"?  I didn't mean that Tor should not look at 1, 2, and 6:
what I meant is that 3..5 are more or less arbitrary choices under the
application's control, whereas 1, 2, and 6 are generally not something
that the application gets to pick.  So I considered 3, 4, and 5 to be a
"backchannel" and 1, 2, and 6 to be the actual routing information.

IOW, I agree that isolating by client address is valuable.  In fact, I
think it's crucial.  That's why the Security Risks section says
"Disabling IsolateClientAddr is a pretty bad idea".

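To make the backchannel/routing split concrete, here is a minimal Python sketch of how the six pieces of stream data might be reduced to an isolation key, assuming two streams may share a circuit only when their keys are equal. StreamInfo, isolation_key, and may_share_circuit are hypothetical names, not Tor code. The sketch also assumes that streams arriving on different listeners are always kept apart, and 127.0.0.1:9050 is just a sample listener.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class StreamInfo:
    """Hypothetical record of what Tor knows about an incoming stream."""
    dest_addr: str                          # 1) destination address
    dest_port: Optional[int]                # 2) destination port (None for a DNS lookup)
    client_protocol: str                    # 3) e.g. "SOCKS4", "SOCKS4A", "SOCKS5", "TRANS"
    listener: Tuple[str, int]               # 4) listener address/port the stream arrived on
    socks_auth: Optional[Tuple[str, str]]   # 5) SOCKS username/password, if any
    client_addr: str                        # 6) source address of the application


# On by default, per the proposal text.
DEFAULT_FLAGS: FrozenSet[str] = frozenset({"IsolateSOCKSUser", "IsolateClientAddr"})


def isolation_key(s: StreamInfo, flags: FrozenSet[str] = DEFAULT_FLAGS) -> tuple:
    """Build the tuple that must match for two streams to share a circuit."""
    # Assumption: streams from different listeners are always kept apart.
    key = [("listener", s.listener)]
    if "IsolateDestAddr" in flags:
        key.append(("dest_addr", s.dest_addr))
    if "IsolateDestPort" in flags:
        key.append(("dest_port", s.dest_port))
    if "IsolateClientProtocol" in flags:
        key.append(("protocol", s.client_protocol))
    if "IsolateSOCKSUser" in flags:
        key.append(("socks_auth", s.socks_auth))
    if "IsolateClientAddr" in flags:
        key.append(("client_addr", s.client_addr))
    return tuple(key)


def may_share_circuit(a: StreamInfo, b: StreamInfo,
                      flags: FrozenSet[str] = DEFAULT_FLAGS) -> bool:
    """True if the two streams are allowed on the same circuit."""
    return isolation_key(a, flags) == isolation_key(b, flags)


# Usage: two SOCKS users on the same listener, same destination.
alice = StreamInfo("www.example.com", 80, "SOCKS5", ("127.0.0.1", 9050),
                   ("alice", "x"), "127.0.0.1")
bob = replace(alice, socks_auth=("bob", "x"))
alice_https = replace(alice, dest_port=443)

print(may_share_circuit(alice, alice_https))   # True
print(may_share_circuit(alice, bob))           # False
print(may_share_circuit(alice, alice_https,
                        DEFAULT_FLAGS | {"IsolateDestPort"}))  # False
```

Under the proposal's defaults (IsolateSOCKSUser and IsolateClientAddr on), Alice's two streams can share a circuit while Bob's cannot join them; turning on IsolateDestPort splits Alice's port-80 and port-443 streams as well.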
>>   The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
>>   default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
>>   turn them off.  The IsolateDestPort and IsolateDestAddr and
>>   IsolateClientProtocol options are off by default.  NoIsolateDestPort and
>>   NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
>
> Why is IsolateClientProtocol off by default? Seems like a cheap,
> opportunistic way of distinguishing client applications.

No objection there.
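The on/off rules quoted above can be sketched as a tiny option resolver. This is a hypothetical Python illustration (effective_flags is not a real Tor function) that follows the defaults exactly as quoted, including IsolateClientProtocol being off. Making it on by default, as suggested above, would only mean moving it from OFF_BY_DEFAULT to ON_BY_DEFAULT.

```python
from typing import FrozenSet, Iterable

# Defaults as stated in the quoted proposal text.
ON_BY_DEFAULT = frozenset({"IsolateSOCKSUser", "IsolateClientAddr"})
OFF_BY_DEFAULT = frozenset({"IsolateDestPort", "IsolateDestAddr",
                            "IsolateClientProtocol"})


def effective_flags(options: Iterable[str]) -> FrozenSet[str]:
    """Turn a sequence of option names into the set of active isolation flags.

    Options are applied in order; a "No" form only ever removes one of the
    on-by-default flags.
    """
    flags = set(ON_BY_DEFAULT)
    for opt in options:
        if opt in ON_BY_DEFAULT or opt in OFF_BY_DEFAULT:
            flags.add(opt)
        elif opt.startswith("No") and opt[2:] in ON_BY_DEFAULT:
            flags.discard(opt[2:])   # NoIsolateSOCKSUser, NoIsolateClientAddr
        elif opt.startswith("No") and opt[2:] in OFF_BY_DEFAULT:
            pass                     # these negations "have no effect"
        else:
            raise ValueError(f"unknown isolation option: {opt!r}")
    return frozenset(flags)


print(sorted(effective_flags([])))
# ['IsolateClientAddr', 'IsolateSOCKSUser']
print(sorted(effective_flags(["NoIsolateClientAddr", "IsolateDestPort"])))
# ['IsolateDestPort', 'IsolateSOCKSUser']
```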

>>   Handling DNS can be a challenge.  We can get hostnames by one of three
>>   means:
>>
>>     A) A SOCKS4a request, or a SOCKS5 request with a hostname.  This
>>        case is handled trivially using the rules above.
>>     B) A RESOLVE request on a SOCKSPort.  This case is handled using the
>>        rules above, except that port isolation can't work to isolate
>>        RESOLVE requests into a proper session, since we don't know which
>>        port will eventually be used when we connect to the returned
>>        address.
>>     C) A request on a DNSPort.  We have no way of knowing which
>>        address/port will be used to connect to the requested address.
>>
>>   When B or C is required but problematic, we could favor the use of
>>   AutomapHostsOnResolve.
>>
>
> I'm not clear when it will be problematic. Can you clarify?

Well, suppose that we're configured to isolate requests by destination
port.  When you get a new request to resolve (say) example.com via a
DNSPort request or via a SOCKS resolve request, what circuit should
you put it on?  To make this concrete, suppose that IsolateDestPort is
set, and that the user makes DNS requests for www.example.com and
gopher.example.com, then connects to the IP for www.example.com on
port 80 and to the IP for gopher.example.com on port 70.

We *could* call both of these DNS requests "port 53"; if you did this,
then the exit node for the port-53 circuit would get a complete list of
all the hosts you were connecting to, which would partially defeat the
purpose of session isolation.  Instead, we'd want to have the DNS
request go out on the circuit that will eventually connect to the
resolved address -- we'd like to have the www.example.com DNS request
made on the port-80 circuit, and the gopher.example.com DNS request
made on the port-70 circuit.  Doing it like this would confine the
information about our DNS requests to the circuits handling the
applications that use them.
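A toy model of the two choices may help. The sketch below tracks which hostnames each exit learns from DNS lookups when every lookup is filed under "port 53", versus when each lookup rides on the circuit for the port that will eventually be used. It assumes, as in the example, that we already know the eventual port, which is exactly the information a real RESOLVE or DNSPort request lacks; dns_exposure is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# The example above: (hostname, port the application later connects to).
REQUESTS: List[Tuple[str, int]] = [
    ("www.example.com", 80),
    ("gopher.example.com", 70),
]


def dns_exposure(requests: List[Tuple[str, int]],
                 resolve_on_port_53: bool) -> Dict[int, Set[str]]:
    """Return, for each port-isolated circuit, the hostnames whose DNS
    lookups its exit node sees.  Only lookups are modelled here; the later
    connections go out on the port-80 and port-70 circuits either way."""
    seen: Dict[int, Set[str]] = defaultdict(set)
    for host, port in requests:
        circuit = 53 if resolve_on_port_53 else port
        seen[circuit].add(host)
    return dict(seen)


print(dns_exposure(REQUESTS, resolve_on_port_53=True))
print(dns_exposure(REQUESTS, resolve_on_port_53=False))
```

With resolve_on_port_53=True a single exit (key 53) sees both hostnames; with it off, the port-80 exit sees only www.example.com and the port-70 exit sees only gopher.example.com.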

But of course the problem is that when we see only a RESOLVE request
or a DNS request, we do not yet know what port or ports will
actually be used when we connect to the resulting address.  Hence my
suggestion to use AutomapHostsOnResolve with IsolateDestPort: it
postpones the real lookup until we're actually connecting to the
target host, and we know what port it's using.
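Here is a rough Python sketch of that deferral, assuming a simple in-memory map from hostnames to placeholder addresses. The class name, method names, and the 127.192.0.0/10 placeholder range are illustrative assumptions, not a description of Tor's actual code. The point is only that the hostname stays on the client until a connection arrives with a port, at which point the exit of the circuit chosen for that port can do the real lookup.

```python
import ipaddress
from typing import Dict, Tuple


class AutomapSketch:
    """Hypothetical model of deferring a lookup until the port is known."""

    def __init__(self, network: str = "127.192.0.0/10") -> None:
        # Assumption: placeholder addresses are handed out from this range.
        self._free = ipaddress.ip_network(network).hosts()
        self._addr_for_host: Dict[str, str] = {}
        self._host_for_addr: Dict[str, str] = {}

    def resolve(self, hostname: str) -> str:
        """Answer a RESOLVE/DNSPort request locally; nothing goes to an exit."""
        if hostname not in self._addr_for_host:
            addr = str(next(self._free))
            self._addr_for_host[hostname] = addr
            self._host_for_addr[addr] = hostname
        return self._addr_for_host[hostname]

    def connect(self, addr: str, port: int) -> Tuple[int, str, int]:
        """Map a connection back to its hostname once the port is known.

        Returns (circuit key, target host, target port).  The hostname is
        sent to the exit of the circuit chosen for this port, which does the
        real lookup.  The circuit key is simply the port, as with
        IsolateDestPort.
        """
        host = self._host_for_addr.get(addr, addr)  # unmapped addresses pass through
        return port, host, port


m = AutomapSketch()
www = m.resolve("www.example.com")
gopher = m.resolve("gopher.example.com")
print(m.connect(www, 80))     # (80, 'www.example.com', 80)
print(m.connect(gopher, 70))  # (70, 'gopher.example.com', 70)
```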

(Not sure if that makes sense; if not, just say so: I am pretty sleepy atm)

-- 
Nick