Hi everyone,
About a week ago I've sent an email to the Torsocks maintainers to address some concerns I had about important issues of the current code base. For the reasons found below, I proposed a rewrite that I am willing to do and submit it to the community once I feel ok with it.
After some back and forth, the consensus was that I should go ahead and do it. Best case scenario, I come up with a tested library that has a smaller set of problems (hopefully) and could eventually be considered to replace the current version. Worst case, I lose my time :).
Now, I'm not looking to do that alone in my corner or anything, what I want is this effort to be as open as possible for everyone that want to help, contribute or/and follow it. Some help would be really appreciated for the compat. part (Windows, OS X, BSD, etc...).
Here is the repository I've created and started working on the rewrite. It has been branched from upstream master at commit e0bf65b5d3ca0e28b827c08d80b0fe1841d5a149
https://github.com/dgoulet/torsocks/tree/rewrite
There might be some rebasing going on in that branch in the next weeks so please don't consider it right now as "git stable" since I'm in the heavy part of getting everything together before starting a "normally growing" master branch. If you like to contribute or help, let me know and we'll make something work.
Big thanks to nickm, ioerror and sysrqb for their help on all this.
Cheers! David
Torsocks rewrite reasons: ---------------
(You can see that as either we put in 2000 lines of code to fix it in a ~3000 lines repository or we start fresh with some key aspect, maintainability and security in mind.)
Code and Maintenance Issues:
1) The code needs to be easier to maintain, so that it isn't sporadically maintained. Considering the number of issues (see below), it needs more love on a regular basis right now.
2) It has never been safe from the start so a rewrite will NOT impact the current state of security. There is not much tests right now.
3) Small code base. Fix it clean before it gets too big.
4) One of the biggest technical issues is that it's not thread safe and fixing it right now would add an important impact on performance adding locks to multiple global data structures. A clean rewrite would allow to take into account this *very* important feature from the start without any ugly hacks in the current code base.
5) This is a very, and I can't stress this enough, _VERY_ intrusive library so extra care MUST be put in making it bullet proof in terms of error handling. Right now, there is multiple call sites that can fail on error.
6) There is not even close to enough tests, thus not loosing that stable portion of a project.
7) No compatibility layer is present for multiple OS and architecture.
8) FD passing through Unix socket support. * Should at least be detected and notify the user or even deny execution since the process receiving the socket is not under torsocks control.
9) Symbol name space polluted. (should all be prefixed).
10) Libc hijacked calls of torsocks don't return correct values in error path. * This is bad because for instance the caller expect errno to be set and makes decision upon that value that MUST be right once returned. There is multiple call sites that are problematic.
11) IPv6 support. ----------------
David Goulet:
Hi everyone,
About a week ago I've sent an email to the Torsocks maintainers to address some concerns I had about important issues of the current code base. For the reasons found below, I proposed a rewrite that I am willing to do and submit it to the community once I feel ok with it.
Here is the repository I've created and started working on the rewrite. It has been branched from upstream master at commit e0bf65b5d3ca0e28b827c08d80b0fe1841d5a149
https://github.com/dgoulet/torsocks/tree/rewrite
Torsocks rewrite reasons:
- The code needs to be easier to maintain, so that it isn't
sporadically maintained. Considering the number of issues (see below), it needs more love on a regular basis right now.
- It has never been safe from the start so a rewrite will NOT impact
the current state of security. There is not much tests right now.
- One of the biggest technical issues is that it's not thread safe and
fixing it right now would add an important impact on performance adding locks to multiple global data structures. A clean rewrite would allow to take into account this *very* important feature from the start without any ugly hacks in the current code base.
Please be extra careful with locking, especially since you're going to redesign this.
I haven't personally reviewed torsock's current data structure use, but in general, when somebody looks at a codebase and says "this code is not threadsafe - it needs more locking around its data structure use", that person is usually wrong, or at best only half right. It's almost always better to get the locking out of your critical path and use thread local storage, message passing, and/or job scheduling instead of direct lock acquisition+release for commonly used data structures.
This is not just for performance reasons. Deadlocks are forever.
Library call hooking like this is especially tricky, because it could easily lead you to subtle instances of lock order reversal deadlock if you attempt to acquire your data structure lock before an OS/app lock via one socket call path, but after it in another (for example, due to OS/app abstractions that cause certain socket calls to be merely wrappers for combinations of others).
Such app-specific and race-prone deadlock conditions will be very annoying to track down. Careful design of how you deploy locking and concurrency is a very wise time investment for this reason. You should probably post a design doc and ask for review of it.
- This is a very, and I can't stress this enough, _VERY_ intrusive
library so extra care MUST be put in making it bullet proof in terms of error handling. Right now, there is multiple call sites that can fail on error.
Mike Perry:
David Goulet:
Hi everyone,
About a week ago I've sent an email to the Torsocks maintainers to address some concerns I had about important issues of the current code base. For the reasons found below, I proposed a rewrite that I am willing to do and submit it to the community once I feel ok with it.
Here is the repository I've created and started working on the rewrite. It has been branched from upstream master at commit e0bf65b5d3ca0e28b827c08d80b0fe1841d5a149
https://github.com/dgoulet/torsocks/tree/rewrite
Torsocks rewrite reasons:
- The code needs to be easier to maintain, so that it isn't
sporadically maintained. Considering the number of issues (see below), it needs more love on a regular basis right now.
- It has never been safe from the start so a rewrite will NOT impact
the current state of security. There is not much tests right now.
- One of the biggest technical issues is that it's not thread safe and
fixing it right now would add an important impact on performance adding locks to multiple global data structures. A clean rewrite would allow to take into account this *very* important feature from the start without any ugly hacks in the current code base.
Please be extra careful with locking, especially since you're going to redesign this.
Of course. :)
I haven't personally reviewed torsock's current data structure use, but in general, when somebody looks at a codebase and says "this code is not threadsafe - it needs more locking around its data structure use", that person is usually wrong, or at best only half right. It's almost always better to get the locking out of your critical path and use thread local storage, message passing, and/or job scheduling instead of direct lock acquisition+release for commonly used data structures.
This is not just for performance reasons. Deadlocks are forever.
Library call hooking like this is especially tricky, because it could easily lead you to subtle instances of lock order reversal deadlock if you attempt to acquire your data structure lock before an OS/app lock via one socket call path, but after it in another (for example, due to OS/app abstractions that cause certain socket calls to be merely wrappers for combinations of others).
Such app-specific and race-prone deadlock conditions will be very annoying to track down. Careful design of how you deploy locking and concurrency is a very wise time investment for this reason. You should probably post a design doc and ask for review of it.
That makes sense. I'm not sure yet how complex will be the "locking model" given that there is simply none right now, I'll surely submit a design proposal if it becomes non trivial.
I come from a Linux kernel and RCU (https://en.wikipedia.org/wiki/Read-copy-update) background so I'm used to heavily document any locking scheme and create design proposal before making any production code. So, I agree 100% with you on the importance of thinking this through!
Thanks! David
- This is a very, and I can't stress this enough, _VERY_ intrusive
library so extra care MUST be put in making it bullet proof in terms of error handling. Right now, there is multiple call sites that can fail on error.
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
(snip!)
- One of the biggest technical issues is that it's not thread safe and
fixing it right now would add an important impact on performance adding locks to multiple global data structures. A clean rewrite would allow to take into account this *very* important feature from the start without any ugly hacks in the current code base.
Please be extra careful with locking, especially since you're going to redesign this.
I haven't personally reviewed torsock's current data structure use, but in general, when somebody looks at a codebase and says "this code is not threadsafe - it needs more locking around its data structure use", that person is usually wrong, or at best only half right. It's almost always better to get the locking out of your critical path and use thread local storage, message passing, and/or job scheduling instead of direct lock acquisition+release for commonly used data structures.
Since everybody is piling on advice I might as well throw in some of my own, bear in mind I don't claim to fully understand the extent of what torsocks is capable of.
Doesn't it's behavior/functionality map well to a libev/libevent loop instead of a multi-threaded model?
It does mean you likely need to approach the problem from a slightly different angle, but with the I/O calls that are being made, it would provide a good high performance alternative without a lot of concurrency overhead.
I highly recommend reading some of the redis project's source code (https://github.com/antirez/redis), which also relies on libev and is quite readable (as far as C goes).
- warms0x --- xmpp: warms0x@riseup.net http: http://warms0x.github.com
Hello! Comments below.
warms0x:
(snip!)
- One of the biggest technical issues is that it's not thread safe and
fixing it right now would add an important impact on performance adding locks to multiple global data structures. A clean rewrite would allow to take into account this *very* important feature from the start without any ugly hacks in the current code base.
Please be extra careful with locking, especially since you're going to redesign this.
I haven't personally reviewed torsock's current data structure use, but in general, when somebody looks at a codebase and says "this code is not threadsafe - it needs more locking around its data structure use", that person is usually wrong, or at best only half right. It's almost always better to get the locking out of your critical path and use thread local storage, message passing, and/or job scheduling instead of direct lock acquisition+release for commonly used data structures.
Since everybody is piling on advice I might as well throw in some of my own, bear in mind I don't claim to fully understand the extent of what torsocks is capable of.
Doesn't it's behavior/functionality map well to a libev/libevent loop instead of a multi-threaded model?
You can see Torsocks as an "inprocess" library that is loaded using LD_PRELOAD and hijacks a good amount of symbols of the Libc to make sure any communications goes through the Tor network.
Now, this means that Torsocks is quite dependent on the application threading model which means that loading into an application that uses a lot of threads brings the possibility for anything *in* Torsocks to be concurrent.
Using a "big fat lock" would solve the issue but the performance hit would be quite important and it's undesirable.
I sent some days ago the locking design to handle such a thing. Maybe you could have a look at it if you are interested? :)
Cheers! David
It does mean you likely need to approach the problem from a slightly different angle, but with the I/O calls that are being made, it would provide a good high performance alternative without a lot of concurrency overhead.
I highly recommend reading some of the redis project's source code (https://github.com/antirez/redis), which also relies on libev and is quite readable (as far as C goes).
- warms0x
xmpp: warms0x@riseup.net http: http://warms0x.github.com
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Mon, Jun 03, 2013 at 09:38:30PM -0400, David Goulet wrote:
There might be some rebasing going on in that branch in the next weeks so please don't consider it right now as "git stable" since I'm in the heavy part of getting everything together before starting a "normally growing" master branch. If you like to contribute or help, let me know and we'll make something work.
Big thanks to nickm, ioerror and sysrqb for their help on all this.
David,
Does the current version of torsocks support Optimistic Data? That saves a round trip through the Tor network, and makes things snappier. Tor clients, servers, and recently the Tor Browser now support it.
Regardless, if you're rewriting torsocks, you might ensure the new version has such support. (Ticket #3711 is about this; there appears to be an unmerged branch for it.)
- Ian
On 6/4/13 2:08 PM, Ian Goldberg wrote:
David,
Does the current version of torsocks support Optimistic Data? That saves a round trip through the Tor network, and makes things snappier. Tor clients, servers, and recently the Tor Browser now support it.
I'd also say that you should consider extended socks code support, to know if a Tor Hidden Service exists or not: https://trac.torproject.org/projects/tor/ticket/6031
Fabio
Fabio Pietrosanti (naif):
On 6/4/13 2:08 PM, Ian Goldberg wrote:
David,
Does the current version of torsocks support Optimistic Data? That saves a round trip through the Tor network, and makes things snappier. Tor clients, servers, and recently the Tor Browser now support it.
Right now it does not but sysrqb did patches for that. Indeed, this rewrite _must_ support that.
I'd also say that you should consider extended socks code support, to know if a Tor Hidden Service exists or not: https://trac.torproject.org/projects/tor/ticket/6031
I'll certainly add this feature!
Thanks for the pointers! David
Fabio _______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Before rewriting torsocks, would it make sense to takeover proxychains development either and/or to fork it?
Of what I can see, proxychains and tsocks have not been updated since 2006 and 2002 (http://sourceforge.net/projects/proxychains/). I'm not sure how "alive" these projects are.
Torsocks is of course Tor centric so it can be aware of different things like what Ian talked about which is to use Optimistic data and also hidden services availability, etc...
So, from my point of view proxychains is useful but for a broader range of use case where torsocks would be only for Tor with more specifics features.
Cheers! David
adrelanos:
Before rewriting torsocks, would it make sense to takeover proxychains development either and/or to fork it? _______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
David Goulet:
Of what I can see, proxychains and tsocks have not been updated since 2006 and 2002 (http://sourceforge.net/projects/proxychains/). I'm not sure how "alive" these projects are.
The sourceforge project is dead.
There are two forks, maybe alive by your definition.
8 months ago: https://github.com/haad/proxychains
1 month ago: https://github.com/rofl0r/proxychains
On Wed, Jun 5, 2013 at 2:43 PM, adrelanos adrelanos@riseup.net wrote:
Before rewriting torsocks, would it make sense to takeover proxychains development either and/or to fork it?
I'm not sure that proxychains is a great stating point either. There hasn't been a new release since 2006 as far as I can tell, if I'm looking at the right page. Also, some of the code issues there make me cringe on first reading. (strcat and strcpy and sprintf into stack-allocated buffers; doing "int len=0; read(fd,(char*) &len, 1)" to read a one-byte value from the network as an integer; etc).
-- Nick