Ian Goldberg:
On Wed, Jun 26, 2013 at 03:55:58PM -0400, David Goulet wrote:
Hi everyone,
For those who don't know, I've been working on a new version of Torsocks for the last three weeks or so.
https://lists.torproject.org/pipermail/tor-dev/2013-June/004959.html
I just wanted to give a quick status report on development.
DNS resolution is working for domain names (PTR) and IPv4 addresses. Tor does not currently support IPv6 resolution, but the torsocks code supports it.
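Some background on how resolution goes through Tor: Tor extends SOCKS5 with a RESOLVE command (0xF0) and a RESOLVE_PTR command (0xF1). The sketch below only builds those two requests. It is not torsocks code, and the function names are hypothetical. It also assumes the SOCKS5 method-negotiation greeting has already been exchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tor's SOCKS5 extension commands. */
#define SOCKS5_CMD_RESOLVE     0xF0
#define SOCKS5_CMD_RESOLVE_PTR 0xF1
#define SOCKS5_ATYP_IPV4       0x01
#define SOCKS5_ATYP_DOMAIN     0x03

/* Build a RESOLVE request for `hostname` into `buf`.
 * Returns the number of bytes written, or -1 if the name is empty, longer
 * than 255 bytes (SOCKS5 stores the length in one byte), or `buf` is too small. */
int socks5_build_resolve(const char *hostname, uint8_t *buf, size_t buflen)
{
    size_t len = strlen(hostname);
    size_t need = 4 + 1 + len + 2;   /* header, length byte, name, port */

    if (len == 0 || len > 255 || buflen < need)
        return -1;

    buf[0] = 0x05;                   /* VER */
    buf[1] = SOCKS5_CMD_RESOLVE;     /* CMD */
    buf[2] = 0x00;                   /* RSV */
    buf[3] = SOCKS5_ATYP_DOMAIN;     /* ATYP */
    buf[4] = (uint8_t)len;
    memcpy(buf + 5, hostname, len);
    buf[5 + len] = 0;                /* DST.PORT is unused for RESOLVE */
    buf[6 + len] = 0;
    return (int)need;
}

/* Build a RESOLVE_PTR (reverse lookup) request for an IPv4 address given
 * in network byte order. Returns 10, or -1 if `buf` is too small. */
int socks5_build_resolve_ptr(const uint8_t addr[4], uint8_t *buf, size_t buflen)
{
    if (buflen < 10)
        return -1;
    buf[0] = 0x05;
    buf[1] = SOCKS5_CMD_RESOLVE_PTR;
    buf[2] = 0x00;
    buf[3] = SOCKS5_ATYP_IPV4;
    memcpy(buf + 4, addr, 4);
    buf[8] = 0;                      /* DST.PORT, unused */
    buf[9] = 0;
    return 10;
}
```

For RESOLVE, Tor's answer comes back in the BND.ADDR field of an ordinary SOCKS5 reply. Nothing is ever connected, which is why torsocks controls the whole exchange here.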
Hidden service (.onion) address resolution is also working. It uses a "dead IP range": an address from that range acts as a cookie that is returned to the application and mapped back to the .onion address in the hijacked connect().
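Here is a minimal sketch of the cookie idea. It assumes a fixed-size table and a dead range such as 127.42.42.0/24; the message gives neither the range nor the data structure, and all names are hypothetical. The resolver hands out the next free address for a new .onion name. The connect() path maps that address back to the name, which then goes into the SOCKS5 request.

```c
#include <stdint.h>
#include <string.h>

#define ONION_NAME_MAX 256
#define ONION_POOL_MAX 256

/* One cookie address handed to the application, mapped to its .onion name. */
struct onion_entry {
    uint32_t ip;                     /* host byte order */
    char name[ONION_NAME_MAX];
};

struct onion_pool {
    uint32_t base;                   /* first address of the dead range */
    uint32_t size;                   /* usable addresses in the range */
    uint32_t used;
    struct onion_entry entries[ONION_POOL_MAX];
};

void onion_pool_init(struct onion_pool *p, uint32_t base, uint32_t size)
{
    p->base = base;
    p->size = size < ONION_POOL_MAX ? size : ONION_POOL_MAX;
    p->used = 0;
}

/* Resolve path: return the cookie address for `name`, allocating one if
 * needed. Returns 0 if the name is too long or the range is exhausted. */
uint32_t onion_pool_get_ip(struct onion_pool *p, const char *name)
{
    for (uint32_t i = 0; i < p->used; i++)
        if (strcmp(p->entries[i].name, name) == 0)
            return p->entries[i].ip;

    if (p->used >= p->size || strlen(name) >= ONION_NAME_MAX)
        return 0;

    struct onion_entry *e = &p->entries[p->used];
    e->ip = p->base + p->used;
    strcpy(e->name, name);
    p->used++;
    return e->ip;
}

/* connect() path: map a cookie address back to its .onion name, or return
 * NULL if the address is not one that was handed out. */
const char *onion_pool_get_name(const struct onion_pool *p, uint32_t ip)
{
    if (ip < p->base || ip - p->base >= p->used)
        return NULL;
    return p->entries[ip - p->base].name;
}
```

A linear scan is fine for a table this small. A real implementation would also need locking and some policy for when the range runs out.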
I've changed the configuration file (torsocks.conf) quite a bit to match the style of Tor's torrc. At this point, the Tor address and port can be configured, as well as the "dead IP range" mentioned above. More options are coming, but it's pretty simple for now.
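To illustrate torrc-style parsing (one "Key Value" pair per line, with # starting a comment), here is a short C sketch. The option names TorAddress, TorPort and OnionAddrRange are assumptions made for the example; the message does not specify them.

```c
#include <stdlib.h>
#include <string.h>

struct ts_config {
    char tor_address[64];
    int  tor_port;
    char onion_range[64];
};

/* Parse one torrc-style line in place. Returns 0 on success or for a
 * blank/comment line, and -1 for an unknown key or a bad value.
 * The key names are illustrative assumptions. */
int ts_config_parse_line(struct ts_config *cfg, char *line)
{
    char *hash = strchr(line, '#');
    if (hash)
        *hash = '\0';                            /* drop trailing comment */

    char *key = strtok(line, " \t\r\n");
    if (!key)
        return 0;                                /* blank or comment-only */
    char *value = strtok(NULL, " \t\r\n");
    if (!value || strtok(NULL, " \t\r\n"))
        return -1;                               /* exactly one value */

    if (strcmp(key, "TorAddress") == 0) {
        if (strlen(value) >= sizeof(cfg->tor_address))
            return -1;
        strcpy(cfg->tor_address, value);
    } else if (strcmp(key, "TorPort") == 0) {
        char *end;
        long port = strtol(value, &end, 10);
        if (*end != '\0' || port < 1 || port > 65535)
            return -1;
        cfg->tor_port = (int)port;
    } else if (strcmp(key, "OnionAddrRange") == 0) {
        if (strlen(value) >= sizeof(cfg->onion_range))
            return -1;
        strcpy(cfg->onion_range, value);
    } else {
        return -1;
    }
    return 0;
}
```

strtok() is not thread-safe. That is acceptable for a one-time parse at library load, but not for anything called from the hijacked libc functions.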
Logging, the connection registry and thread safety are working as well. There is also a compat layer for mutexes. Once I start porting the project to other *nix systems (BSD, OS X, ...), more subsystems will probably be added to that compat layer.
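A compat layer of this kind can be as thin as a typedef plus wrappers, so that only one file knows which threading API sits underneath. Below is a sketch over POSIX threads; the tsocks_mutex_* and registry_* names are hypothetical.

```c
#include <pthread.h>

/* Compat type: the rest of the code only sees tsocks_mutex_*, so porting
 * to another platform means changing this layer alone. */
typedef struct {
    pthread_mutex_t m;
} tsocks_mutex_t;

#define TSOCKS_MUTEX_INITIALIZER { PTHREAD_MUTEX_INITIALIZER }

int tsocks_mutex_init(tsocks_mutex_t *mx)    { return pthread_mutex_init(&mx->m, NULL); }
int tsocks_mutex_destroy(tsocks_mutex_t *mx) { return pthread_mutex_destroy(&mx->m); }
int tsocks_mutex_lock(tsocks_mutex_t *mx)    { return pthread_mutex_lock(&mx->m); }
int tsocks_mutex_unlock(tsocks_mutex_t *mx)  { return pthread_mutex_unlock(&mx->m); }

/* Example use: a connection-registry counter guarded by the compat mutex. */
static tsocks_mutex_t registry_lock = TSOCKS_MUTEX_INITIALIZER;
static unsigned int registry_count;

void registry_add(void)
{
    tsocks_mutex_lock(&registry_lock);
    registry_count++;
    tsocks_mutex_unlock(&registry_lock);
}
```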
So, in a nutshell, some libc calls still need to be implemented, plus *moar* tests and support for other OSes. I'm confident I'll have a beta version to present to the community in a couple of weeks (if nothing goes wrong).
Feel free to browse the code, comment on it, contribute!, etc...
Are non-blocking sockets, select/poll/etc. (especially at connect() time), and optimistic data on the to-do list?
Yes! Good point, I should have included the to-do list. So yes, non-blocking socket support is on it.
For optimistic data, it is kind of tricky. I can use it for DNS resolution without a problem, because torsocks controls the complete flow of data, from opening a SOCKS5 connection to closing it after the DNS response is received. For actual data (sendmsg, send, ...), however, a connect() is needed first. That means connect() would return "yes, OK, socket connected" when in fact that is not really true. So, when the first data is sent, the Tor connection may already have failed, or we may block for an unknown amount of time during the send*/write() call.
Now the question is: is this kind of behavior acceptable? It basically means lying to the caller at connect(), possibly blocking in I/O calls, and returning something like ECONNRESET or ENOTCONN if the Tor SOCKS5 connection fails.
This is *really* tricky, especially with non-blocking sockets, if torsocks needs to make a possibly blocking call for the SOCKS5 reply during an I/O call that the caller expects not to block. Furthermore, since data from the SOCKS5 negotiation *might* arrive at any time on the connection, the caller could put the file descriptor in poll(), wake up and try to receive data when in fact it's the SOCKS5 reply... It's possible to handle that, but it seems like VERY intrusive behavior. Is optimistic data worth it here, given the complexity of handling it and how intrusive it is?
Cheers! David
Thanks,
- Ian
On Thu, Jun 27, 2013 at 03:11:23PM -0400, David Goulet wrote:
It *is* kind of tricky. (See #3711.) But I don't think it's that much trickier than properly handling non-blocking sockets in the first place. For example:
- Application calls connect()
- Torsocks intercepts, calls connect()
- Now you have to do a fancy dance. The application is going to select() to wait for the connection to complete, but torsocks has to get the connection to complete, *and* send the connect request, *and* wait for the connect reply. (In fact, with optimistic data, you *don't* have to do that last step.)

So you have to play around a bit with the parameters to the select() call, etc. The torsocks versions of select, poll, etc., have to recognize when select is called while *any* socket is not fully end-to-end connected, and add "ready for write" / "ready for read" events for those sockets as appropriate. If libc_select returns ready for those sockets, handle them inside torsocks before returning to the caller. (But no blocking!) Take the case you describe above, where the application is polling for read but it's really just the SOCKS5 reply that has come in. There, the poll() in torsocks will need to read the reply first and mark the socket as fully connected. If there's more data (or if any other socket is ready), then great, return to the caller. If not, poll() again.
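The select() part of that dance can be sketched as two steps around the real select(). First, widen the caller's sets with the events torsocks needs for sockets still mid-handshake. Then, hide those events from the result. The state table and function names are hypothetical, both fd_set pointers are assumed non-NULL, and the non-blocking handshake step itself is left out.

```c
#include <sys/select.h>

/* Per-socket handshake state (a stand-in for torsocks' connection registry). */
enum conn_state {
    CONN_NONE = 0,          /* not managed, or fully connected end to end */
    CONN_TCP_PENDING,       /* non-blocking connect() to Tor in progress */
    CONN_SOCKS_WAIT_REPLY   /* SOCKS request sent, waiting for Tor's reply */
};

#define MAX_TRACKED_FDS 64
static enum conn_state conn_states[MAX_TRACKED_FDS];

static enum conn_state state_of(int fd)
{
    return (fd >= 0 && fd < MAX_TRACKED_FDS) ? conn_states[fd] : CONN_NONE;
}

/* Before the real select(): remember what the caller asked for, then add
 * the event torsocks needs on sockets still mid-handshake. */
void ts_select_prepare(int nfds, fd_set *rfds, fd_set *wfds,
                       fd_set *orig_r, fd_set *orig_w)
{
    *orig_r = *rfds;
    *orig_w = *wfds;
    for (int fd = 0; fd < nfds; fd++) {
        if (!FD_ISSET(fd, orig_r) && !FD_ISSET(fd, orig_w))
            continue;
        if (state_of(fd) == CONN_TCP_PENDING)
            FD_SET(fd, wfds);       /* connect completion shows up as writable */
        else if (state_of(fd) == CONN_SOCKS_WAIT_REPLY)
            FD_SET(fd, rfds);       /* Tor's reply shows up as readable */
    }
}

/* After the real select(): readiness on sockets still mid-handshake belongs
 * to torsocks (which would advance the handshake here, without blocking),
 * and readiness the caller never asked for is dropped. Returns the
 * caller-visible count; 0 means "select() again". */
int ts_select_filter(int nfds, fd_set *rfds, fd_set *wfds,
                     const fd_set *orig_r, const fd_set *orig_w)
{
    int ready = 0;
    for (int fd = 0; fd < nfds; fd++) {
        int ours = state_of(fd) != CONN_NONE;
        if (FD_ISSET(fd, rfds)) {
            if (ours || !FD_ISSET(fd, orig_r))
                FD_CLR(fd, rfds);
            else
                ready++;
        }
        if (FD_ISSET(fd, wfds)) {
            if (ours || !FD_ISSET(fd, orig_w))
                FD_CLR(fd, wfds);
            else
                ready++;
        }
    }
    return ready;
}
```

When ts_select_filter() returns 0, the wrapper would call the real select() again with whatever timeout remains, which corresponds to the "if not, poll() again" step.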
The only thing optimistic data changes is that (a) you don't wait for the Tor SOCKS5 connected response, (b) you have to be ready to eat that response when it comes, and (c) if the response is an error, ECONNRESET (or something) the socket.
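Point (a) amounts to sending the CONNECT request and the application's first bytes back to back, with no wait in between. The sketch below builds that buffer for a domain-name CONNECT (ATYP 0x03). The function name is hypothetical, and the SOCKS5 greeting is assumed to have been exchanged already.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a SOCKS5 CONNECT request for host:port and append the application's
 * first bytes so that both go out in a single write. Returns the total
 * length, or -1 on a bad hostname or a too-small buffer. */
int socks5_build_connect_optimistic(const char *host, uint16_t port,
                                    const uint8_t *payload, size_t payload_len,
                                    uint8_t *buf, size_t buflen)
{
    size_t hlen = strlen(host);
    size_t req_len = 4 + 1 + hlen + 2;   /* header, length byte, name, port */

    if (hlen == 0 || hlen > 255 || buflen < req_len + payload_len)
        return -1;

    buf[0] = 0x05;                       /* VER */
    buf[1] = 0x01;                       /* CMD = CONNECT */
    buf[2] = 0x00;                       /* RSV */
    buf[3] = 0x03;                       /* ATYP = domain name */
    buf[4] = (uint8_t)hlen;
    memcpy(buf + 5, host, hlen);
    buf[5 + hlen] = (uint8_t)(port >> 8);     /* DST.PORT, network byte order */
    buf[6 + hlen] = (uint8_t)(port & 0xFF);

    /* Optimistic data: the application's first bytes follow immediately,
     * without waiting for Tor's CONNECT reply. */
    if (payload_len > 0)
        memcpy(buf + req_len, payload, payload_len);
    return (int)(req_len + payload_len);
}
```

Points (b) and (c) then move to the read side. The reply has to be stripped from the first bytes received, and a failure code has to turn into an error on the socket.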
Is it worth it? There's *significant* (like up to 33%) improvement in time-to-first-byte latency for client-speaks-first protocols (like HTTP). I believe that's worth it. But you're the one doing the implementation. ;-)
- Ian
On Thu, Jun 27, 2013 at 03:11:23PM -0400, David Goulet wrote:
Ian Goldberg:
Are non-blocking sockets, select/poll/etc. (especially at connect() time), and optimistic data on the to-do list?
Yes! Good point, I should have included the to-do list. So yes, non-blocking socket support is on it.
For optimistic data, it is kind of tricky.
It definitely is tricky. You just need to find the best way to have torsocks return the least untrue response that's allowed by the OS. I'm not going to reiterate what Ian said, but I'll just make some points about what I did.
I can use it for DNS resolution without a problem, because torsocks controls the complete flow of data, from opening a SOCKS5 connection to closing it after the DNS response is received. For actual data (sendmsg, send, ...), however, a connect() is needed first. That means connect() would return "yes, OK, socket connected" when in fact that is not really true. So, when the first data is sent, the Tor connection may already have failed, or we may block for an unknown amount of time during the send*/write() call.
Yup, this is exactly the case (in addition to SOCKS4/A also).
Now the question is: is this kind of behavior acceptable? It basically means lying to the caller at connect(), possibly blocking in I/O calls, and returning something like ECONNRESET or ENOTCONN if the Tor SOCKS5 connection fails.
The main problem I foresee with this is when torsocks wraps a program that does not fully implement error handling or does not implement it correctly. And, to be honest, I don't think you can let potentially faulty programs influence the features of *your* program (too much).
For what it's worth, I returned ENOTCONN and EBADF. I think ENOTCONN is the most descriptive; I'm just not sure most programs check for it after a send()/write().
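Where a more specific error helps, the REP field of the SOCKS5 reply (RFC 1928) can be mapped onto errno values. The mapping below is one possible choice; it is not what Matt's code or torsocks does. For general failures and unknown codes it falls back to ECONNRESET, in line with Ian's suggestion.

```c
#include <errno.h>
#include <stdint.h>

/* Map a SOCKS5 REP code to an errno value. Returns 0 on success.
 * The specific errno choices are assumptions. */
int socks5_rep_to_errno(uint8_t rep)
{
    switch (rep) {
    case 0x00: return 0;             /* succeeded */
    case 0x02: return EACCES;        /* connection not allowed by ruleset */
    case 0x03: return ENETUNREACH;   /* network unreachable */
    case 0x04: return EHOSTUNREACH;  /* host unreachable */
    case 0x05: return ECONNREFUSED;  /* connection refused */
    case 0x06: return ETIMEDOUT;     /* TTL expired */
    case 0x07: return EOPNOTSUPP;    /* command not supported */
    case 0x08: return EAFNOSUPPORT;  /* address type not supported */
    case 0x01:                       /* general SOCKS server failure */
    default:   return ECONNRESET;
    }
}
```

With optimistic data, whichever value is chosen only surfaces on a later send()/recv(), not at connect().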
This is *really* tricky, especially with non-blocking sockets, if torsocks needs to make a possibly blocking call for the SOCKS5 reply during an I/O call that the caller expects not to block. Furthermore, since data from the SOCKS5 negotiation *might* arrive at any time on the connection, the caller could put the file descriptor in poll(), wake up and try to receive data when in fact it's the SOCKS5 reply... It's possible to handle that, but it seems like VERY intrusive behavior. Is optimistic data worth it here, given the complexity of handling it and how intrusive it is?
Well, you actually have more guarantees than you may think (unless I misunderstand you). You know torsocks will send the SOCKS5/4{,A} request. You also know that before torsocks returns anything to the client application, torsocks *must* receive a response from Tor on whether the proxy connection was established. So suppose you receive optdata from the client app and pass it to Tor, which will then pass it to the endpoint (if possible). You know Tor *must* return a SOCKS reply *before* you receive any client data, so you simply read that off the buffer and then handle the connection appropriately. Simple. :)
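Here is a sketch of "read that off the buffer" for the SOCKS5 case only. It inspects the head of the first received bytes, works out the reply length from ATYP, and reports how many bytes to strip before the rest goes to the application. The names are hypothetical, and SOCKS4/4a replies (a fixed 8 bytes) are not handled.

```c
#include <stddef.h>
#include <stdint.h>

enum socks5_reply_status {
    REPLY_INCOMPLETE,   /* need more bytes before deciding */
    REPLY_OK,           /* REP == 0x00: strip *reply_len bytes, rest is data */
    REPLY_FAILED,       /* REP != 0x00: fail the socket, *rep holds the code */
    REPLY_MALFORMED     /* not a SOCKS5 reply */
};

enum socks5_reply_status socks5_parse_reply(const uint8_t *buf, size_t len,
                                            size_t *reply_len, uint8_t *rep)
{
    size_t addr_len;

    if (len < 4)
        return REPLY_INCOMPLETE;
    if (buf[0] != 0x05)
        return REPLY_MALFORMED;

    switch (buf[3]) {                   /* ATYP decides the BND.ADDR length */
    case 0x01: addr_len = 4;  break;    /* IPv4 */
    case 0x04: addr_len = 16; break;    /* IPv6 */
    case 0x03:                          /* domain name: length byte + name */
        if (len < 5)
            return REPLY_INCOMPLETE;
        addr_len = 1 + (size_t)buf[4];
        break;
    default:
        return REPLY_MALFORMED;
    }

    size_t total = 4 + addr_len + 2;    /* header + BND.ADDR + BND.PORT */
    if (len < total)
        return REPLY_INCOMPLETE;

    *reply_len = total;
    *rep = buf[1];
    return buf[1] == 0x00 ? REPLY_OK : REPLY_FAILED;
}
```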
Regarding poll(), torsocks really needs to wrap the multiplexing I/O syscalls ({p,}poll, {p,}select, epoll, kqueue, etc) or else you will run into some major problems (select() and poll() being much more important than the others). This is intrusive, but it's only a single write request (for all values of "write").
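For poll(), the same idea as with select() can be sketched by temporarily rewriting each pollfd's events. The state lookup below is a hypothetical stand-in for the connection registry, and the non-blocking handshake step itself is omitted.

```c
#include <poll.h>
#include <stddef.h>

enum hs_state { HS_DONE = 0, HS_TCP_PENDING, HS_WAIT_REPLY };

#define HS_MAX_FD 64
static enum hs_state hs_state[HS_MAX_FD];

static enum hs_state hs_lookup(int fd)
{
    return (fd >= 0 && fd < HS_MAX_FD) ? hs_state[fd] : HS_DONE;
}

/* Before the real poll(): save the caller's events and request the event
 * torsocks needs for each socket still mid-handshake. */
void ts_poll_prepare(struct pollfd *fds, nfds_t nfds, short *saved_events)
{
    for (nfds_t i = 0; i < nfds; i++) {
        saved_events[i] = fds[i].events;
        switch (hs_lookup(fds[i].fd)) {
        case HS_TCP_PENDING: fds[i].events = POLLOUT; break;  /* connect done */
        case HS_WAIT_REPLY:  fds[i].events = POLLIN;  break;  /* SOCKS reply */
        default: break;
        }
    }
}

/* After the real poll(): restore the caller's events and hide readiness on
 * sockets torsocks still owns. Returns the caller-visible count; 0 means
 * poll() again with the remaining timeout. */
int ts_poll_finish(struct pollfd *fds, nfds_t nfds, const short *saved_events)
{
    int ready = 0;
    for (nfds_t i = 0; i < nfds; i++) {
        fds[i].events = saved_events[i];
        if (hs_lookup(fds[i].fd) != HS_DONE)
            fds[i].revents = 0;   /* handshake would be advanced here, non-blocking */
        if (fds[i].revents != 0)
            ready++;
    }
    return ready;
}
```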
Personally, I think the most important feature of the optdata implementation is that you make it configurable.
- Matt
Matthew Finkel:
Yup, this is exactly the case (in addition to SOCKS4/A also).
Actually, I don't see any reason why this rewrite should support SOCKS4. Is there one?
Well, you actually have more guarantees than you may think (unless I misunderstand you). You know torsocks will send the SOCKS5/4{,A} request. You also know that before torsocks returns anything to the client application, torsocks *must* receive a response from Tor on whether the proxy connection was established. So suppose you receive optdata from the client app and pass it to Tor, which will then pass it to the endpoint (if possible). You know Tor *must* return a SOCKS reply *before* you receive any client data, so you simply read that off the buffer and then handle the connection appropriately. Simple. :)
Yup, agreed. Actually, the question here was more about whether supporting optdata is worth that non-trivial effort, but 33% is quite a big factor to consider for performance :).
Regarding poll(), torsocks really needs to wrap the multiplexing I/O syscalls ({p,}poll, {p,}select, epoll, kqueue, etc) or else you will run into some major problems (select() and poll() being much more important than the others). This is intrusive, but it's only a single write request (for all values of "write").
Can you detail why it's very important? You did some hacking in the old code base and I would like to know your experience on that. What possible major problems? What happens if it's not hijacked?
Personally, I think the most important feature of the optdata implementation is that you make it configurable.
By configurable, do you mean enabled or disabled? What else is there to configure?
Thanks! David
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Thu, Jun 27, 2013 at 05:39:08PM -0400, David Goulet wrote:
Matthew Finkel:
Yup, this is exactly the case (in addition to SOCKS4/A also).
Actually, I don't see any reason why this rewrite should support SOCKS4. Is there one?
As far as I know, latency is probably the reason SOCKS4 is still useful, but you can leave it on the TODO list as "patches welcome" if you don't think it's too important.
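For comparison, a SOCKS4a CONNECT is a single message. There is no method-negotiation exchange as in SOCKS5, and the hostname follows a 0.0.0.x placeholder address. Presumably that is where the latency advantage comes from. Below is a minimal sketch with a hypothetical function name and an empty USERID.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a SOCKS4a CONNECT request for host:port. Returns the length,
 * or -1 on an empty hostname or a too-small buffer. */
int socks4a_build_connect(const char *host, uint16_t port,
                          uint8_t *buf, size_t buflen)
{
    size_t hlen = strlen(host);
    size_t need = 8 + 1 + hlen + 1;   /* header, empty USERID + NUL, host + NUL */

    if (hlen == 0 || buflen < need)
        return -1;

    buf[0] = 0x04;                    /* VN */
    buf[1] = 0x01;                    /* CD = CONNECT */
    buf[2] = (uint8_t)(port >> 8);    /* DSTPORT, network byte order */
    buf[3] = (uint8_t)(port & 0xFF);
    buf[4] = 0;                       /* DSTIP 0.0.0.1 means "hostname follows" */
    buf[5] = 0;
    buf[6] = 0;
    buf[7] = 1;
    buf[8] = 0;                       /* empty USERID, NUL-terminated */
    memcpy(buf + 9, host, hlen + 1);  /* hostname including its NUL */
    return (int)need;
}
```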
Regarding poll(), torsocks really needs to wrap the multiplexing I/O syscalls ({p,}poll, {p,}select, epoll, kqueue, etc) or else you will run into some major problems (select() and poll() being much more important than the others). This is intrusive, but it's only a single write request (for all values of "write").
Can you detail why it's very important? You did some hacking in the old code base and I would like to know your experience on that. What possible major problems? What happens if it's not hijacked?
Hrm. So my initial response was "Everything breaks :)" but then I thought about this and that's actually not true, at all. The real benefit to hijacking them is to progress the SOCKS handshake with a single select()/poll() from the client app rather than multiple calls. So, in retrospect, I'm not sure how important this is. I now realize I was actually thinking about another bug.
Personally, I think the most important feature of the optdata implementation is that you make it configurable.
By configurable, do you mean enabled or disabled? What else is there to configure?
Yeah, sorry, I only meant the ability to enable/disable it, unless you can think of other nifty features to add.
- Matt