Hi everyone,
I'm posting the small design document I've written for the Torsocks locking mechanism.
I'm looking for reviews, comments, improvements... well, anything useful! :) (Even English mistakes, since English is not my primary language.)
Any changes needed once this thread is done will be pushed here:
https://github.com/dgoulet/torsocks/blob/rewrite/doc/proposals/01-Thread-saf...
Thanks people! David
On Sun, Jun 09, 2013 at 04:52:45PM -0400, David Goulet wrote:
Hi everyone,
Hi David!
I'm posting the small design document I've written for the Torsocks locking mechanism.
I'm looking for reviews, comments, improvements... well, anything useful! :) (Even English mistakes, since English is not my primary language.)
Any changes needed once this thread is done will be pushed here:
https://github.com/dgoulet/torsocks/blob/rewrite/doc/proposals/01-Thread-saf...
Great! Thanks for putting in the time/thought for this!
Thanks people! David
Date: 09/06/2013
Author: David Goulet dgoulet@ev0ke.net
Contributors:
- Mathieu Desnoyers mathieu.desnoyers@efficios.com
This document details the design of torsocks locking for thread safety.
The basis is that at *any* time any thread can interact with Torsocks (connect(2), DNS resolution, ...). Considering that, some sort of synchronization is needed, since Torsocks needs to keep track of per-socket connections. They are kept in a central registry that every thread needs to access.
This sounds reasonable.
The reason why it needs to be shared accross threads is because of fd passing.
s/accross/across/ :)
It is possible, and even not uncommon, for threads to exchange file descriptor(s), so we can't keep a registry of connections using TLS (Thread Local Storage). Furthermore, a process could easily, for instance, close(2) a socket within a signal handler, making Torsocks access thread storage inside a signal handler, and this SHOULD NOT be done, ever.
Considering the above, a locking mechanism is needed to protect the registry. Insertion, deletion and lookup in that registry CAN occur at the same time so a mutex has to be used to protect any action *on* it. This protection is provided by a mutex named "registry lock".
Sure, this sounds like a good idea.
Now, the objects inside that registry have to be protected during their lifetime (as long as a reference is held) after a lookup. To protect the free() from concurrent access, a refcount is used. Basically, each possible action on a connection object checks whether the refcount has dropped to 0. If so, no one is holding a reference to the object and it is ready to be destroyed. For that, the refcount needs to start with a value of 1.
The beginning of this paragraph seems a bit unclear to me (maybe it's just me). As of right now, as I understand it, we have an opaque object that, at a minimum, holds references to the set of (open socket, ref count) pairs. (Please correct me if I'm wrong.) In addition to this, whenever this pair is retrieved, the refcount is incremented by 1 within the registry thus allowing for use of this reference outside of any critical sections. Then, when the function no longer needs the reference, it decrements the refcount. If refcount reaches 0, then the conn is not needed.
For this scheme to work, the connection object MUST be immutable once created. Any part that can change during the lifetime of the connection object MUST be protected with a mutex.
This mechanism is used to avoid heavy contention on the registry lock. Without the refcount, a connection would have to be protected by the registry lock to avoid a race between an access to the object and freeing it. As long as lookups in the registry are not that frequent, this should scale, since the critical section is pretty short.
I think this should be okay. As it is currently implemented, Torsocks only "remembers" sockets that *it* needs. By this I mean that it only maintains a reference to a socket if it is currently using the connection in a SOCKS handshake or DNS resolution. Once Torsocks has finished its necessary operations, it returns complete control to the application and "forgets" about the socket.
If you plan to keep a similar design, then registry entries will mostly be short-lived and this should scale well. If your plan is to have the registry maintain a complete list of all open connections, then this may still scale and have little, or no, adverse effect on performance, but I'm unsure if this provides much benefit. Have you decided which of these designs you want to use?
Here is the algorithm for a read/write and destroy operation.
Prerequisites:
Refcount of a connection object starts at 1. When down to 0, it can be destroyed.
Add to registry (e.g.: connect(2)):

1) lock(registry)
2) new_conn refcount = 1;
3) add to registry
   /* We could also make a lookup for duplicates. */
4) unlock(registry)
Read/Write op. (e.g.: DNS Lookup):

1) lock(registry)
2) conn = lookup
3) if conn:
4)     atomic_inc(conn refcount)
5) unlock(registry)
6) [action using the conn]
7) if atomic_dec_return(conn refcount) == 0:
       /*
        * This is safe because at this point the connection object is not
        * visible anymore to any thread so we can safely free the object
        * after unlocking it.
        */
8)     free conn
Destroy from registry (e.g.: close(2)):

1) lock(registry)
2) conn = lookup
3) if conn:
       /*
        * Make sure the connection object is not visible once we release the
        * registry lock. Note that we DO NOT increment the refcount here
        * because we are on the destroy path so we have to make the refcount
        * come down to 0.
        */
4)     remove from registry
5) unlock(registry)
6) if atomic_dec_return(conn refcount) == 0:
7)     free(conn)
I think these look good. They seem to lock a very small amount of code, which is good. I suppose this is a nitpicky question but regarding atomic_dec_return() and atomic_inc(), do they both return the updated counter value? Or is atomic_dec_return() the only one that does that (denoted by the _return suffix)?
I think this is simple, and simple is good. I have some questions about the actual registry, but I'll defer asking them until they're more on-topic. I'll try to take another pass of this when I'm less asleep.
Thanks again!
- Matt
Matthew Finkel:
On Sun, Jun 09, 2013 at 04:52:45PM -0400, David Goulet wrote:
Hi everyone,
Hi David!
I'm posting the small design document I've written for the Torsocks locking mechanism.
I'm looking for reviews, comments, improvements... well, anything useful! :) (Even English mistakes, since English is not my primary language.)
Any changes needed once this thread is done will be pushed here:
https://github.com/dgoulet/torsocks/blob/rewrite/doc/proposals/01-Thread-saf...
Great! Thanks for putting in the time/thought for this!
Thanks people! David
Date: 09/06/2013
Author: David Goulet dgoulet@ev0ke.net
Contributors:
- Mathieu Desnoyers mathieu.desnoyers@efficios.com
This document details the design of torsocks locking for thread safety.
The basis is that at *any* time any thread can interact with Torsocks (connect(2), DNS resolution, ...). Considering that, some sort of synchronization is needed, since Torsocks needs to keep track of per-socket connections. They are kept in a central registry that every thread needs to access.
This sounds reasonable.
The reason why it needs to be shared accross threads is because of fd passing.
s/accross/across/ :)
It is possible, and even not uncommon, for threads to exchange file descriptor(s), so we can't keep a registry of connections using TLS (Thread Local Storage). Furthermore, a process could easily, for instance, close(2) a socket within a signal handler, making Torsocks access thread storage inside a signal handler, and this SHOULD NOT be done, ever.
Considering the above, a locking mechanism is needed to protect the registry. Insertion, deletion and lookup in that registry CAN occur at the same time so a mutex has to be used to protect any action *on* it. This protection is provided by a mutex named "registry lock".
Sure, this sounds like a good idea.
Now, the objects inside that registry have to be protected during their lifetime (as long as a reference is held) after a lookup. To protect the free() from concurrent access, a refcount is used. Basically, each possible action on a connection object checks whether the refcount has dropped to 0. If so, no one is holding a reference to the object and it is ready to be destroyed. For that, the refcount needs to start with a value of 1.
The beginning of this paragraph seems a bit unclear to me (maybe it's just me). As of right now, as I understand it, we have an opaque object that, at a minimum, holds references to the set of (open socket, ref count) pairs. (Please correct me if I'm wrong.) In addition to this, whenever this pair is retrieved, the refcount is incremented by 1 within the registry thus allowing for use of this reference outside of any critical sections. Then, when the function no longer needs the reference, it decrements the refcount. If refcount reaches 0, then the conn is not needed.
Exactly that!
I'll clarify the "object" contained in this registry in the document. Basically, the registry holds connection objects, each identified by a socket we've intercepted. There are a few other things in there as well, but the important part is how a connection object can be used and accessed outside the critical section.
For this scheme to work, the connection object MUST be immutable once created. Any part that can change during the lifetime of the connection object MUST be protected with a mutex.
This mechanism is used to avoid heavy contention on the registry lock. Without the refcount, a connection would have to be protected by the registry lock to avoid a race between an access to the object and freeing it. As long as lookups in the registry are not that frequent, this should scale, since the critical section is pretty short.
I think this should be okay. As it is currently implemented, Torsocks only "remembers" sockets that *it* needs. By this I mean that it only maintains a reference to a socket if it is currently using the connection in a SOCKS handshake or DNS resolution. Once Torsocks has finished its necessary operations, it returns complete control to the application and "forgets" about the socket.
If you plan to keep a similar design, then registry entries will mostly be short-lived and this should scale well. If your plan is to have the registry maintain a complete list of all open connections, then this may still scale and have little, or no, adverse effect on performance, but I'm unsure if this provides much benefit. Have you decided which of these designs you want to use?
The registry contains every connection tracked by Torsocks that goes through the Tor network. For instance, as you said, a simple DNS resolution creates a short-lived connection that does not need to go into the registry, since once we return the DNS answer there is no point in keeping it.
Here is the algorithm for a read/write and destroy operation.
Prerequisites:
Refcount of a connection object starts at 1. When down to 0, it can be destroyed.
Add to registry (e.g.: connect(2)):

1) lock(registry)
2) new_conn refcount = 1;
3) add to registry
   /* We could also make a lookup for duplicates. */
4) unlock(registry)
Read/Write op. (e.g.: DNS Lookup):

1) lock(registry)
2) conn = lookup
3) if conn:
4)     atomic_inc(conn refcount)
5) unlock(registry)
6) [action using the conn]
7) if atomic_dec_return(conn refcount) == 0:
       /*
        * This is safe because at this point the connection object is not
        * visible anymore to any thread so we can safely free the object
        * after unlocking it.
        */
8)     free conn
Destroy from registry (e.g.: close(2)):

1) lock(registry)
2) conn = lookup
3) if conn:
       /*
        * Make sure the connection object is not visible once we release the
        * registry lock. Note that we DO NOT increment the refcount here
        * because we are on the destroy path so we have to make the refcount
        * come down to 0.
        */
4)     remove from registry
5) unlock(registry)
6) if atomic_dec_return(conn refcount) == 0:
7)     free(conn)
I think these look good. They seem to lock a very small amount of code, which is good. I suppose this is a nitpicky question but regarding atomic_dec_return() and atomic_inc(), do they both return the updated counter value? Or is atomic_dec_return() the only one that does that (denoted by the _return suffix)?
Yes, "_return" means that the function returns the result of the atomic action. This avoids a race between the atomic action and a separate atomic read.
Ref: http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fatomic-Builtins.html
I think this is simple, and simple is good. I have some questions about the actual registry, but I'll defer asking them until they're more on-topic. I'll try to take another pass of this when I'm less asleep.
Please, if you have questions about the registry, ask them :). I haven't made any final decisions yet on the data structure, but a hash table makes the most sense to me at the moment, containing connection objects indexed by socket fd, since we know the fd is unique for the lifetime of the process, as long as we catch the connect/close of course.
Thanks! David
Thanks again!
- Matt
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev