[tor-dev] Proposal: Controller events to better understand connection/circuit usage
karsten at torproject.org
Sat Feb 23 21:22:50 UTC 2013
On 2/22/13 8:50 PM, Rob Jansen wrote:
> On Fri, Feb 22, 2013 at 12:59 PM, Karsten Loesing <karsten at torproject.org> wrote:
>> If anything here sounds strange to you, please let me know. I'm not
>> 100% certain that this is the best approach to track circuits from
>> client to exit, or if it's even correct.
>> For example, I assume here that circuit IDs are unique between two
>> nodes, which I think is correct. But before working on this I also
>> assumed that a circuit uses a single connection for both inbound and
>> outbound directions (which is apparently not the case).
> Whether or not your assumption about circuit ids is correct depends on
> which circuit id you are referring to - the source relay circuit id, the
> destination relay circuit id, or the id(s) written in the cells. Here is
> how I understand the ids work:
> Suppose relayA and relayB are part of the same circuit, and relayA is
> closer to the client than relayB. The id by which relayA refers to its
> circuit_t state is only unique to relayA and is chosen when the circuit_t
> struct is created. Similarly, the id by which relayB refers to its
> circuit_t state is only unique to relayB and also chosen when the circuit_t
> struct is created. Check this in the *circuit_new() functions in
> circuitlist.c. Let's refer to these as circuit UIDs, as they are unique and
> only known to individual relays.
> Now, relayA writes a circuit id into cells it sends to relayB, so that
> relayB knows which circuit the cells belong to, but it does not necessarily
> use its circuit UID. The id for this is computed in
> get_unique_circ_id_by_conn() in circuitbuild.c and is stored in n_circ_id
> in the circuit_t struct at relayA. Similarly, relayB uses p_circ_id from
> the or_circuit_t struct when sending cells back to relayA.
> Upon receiving cells from relayA, relayB immediately uses the id written in
> the cell (relayA's circ->n_circ_id) to look up relayB's UID for its
> circuit_t state. There is a circuit id map that is kept for this purpose.
> This works similarly from relayB to relayA.
> Now, the problem that prevents us from linking these is that all of the
> controller events print out the UIDs, but not the n_circ_id or p_circ_id. I
> tried printing out all of these various IDs but never verified that it
> actually worked.
> Please let me know if my understanding is flawed in some way.
Your understanding of n_circ_id and p_circ_id matches mine, but are you
sure there's a UID for circuits other than origin circuits? I think you
mean origin_circuit_t->global_identifier. But there's no such field for
or_circuit_t or circuit_t. Or do you mean something else?
Anyway, your description made me realize that my previous attempt to
link CELL_STATS events was flawed for (at least) two reasons:
1. I assumed that a circuit ID (n_circ_id or p_circ_id) is unique for a
given pair of nodes, but it's really only unique for a given connection
between two nodes.
2. I also assumed that a circuit uses two different connections for
inbound and outbound queues, but it's actually the same connection that
is known under different connection IDs by the involved nodes.
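To make point 1 concrete: a circuit ID only names a circuit together with the connection it runs over. A tool that indexes CELL_STATS queues therefore needs a key of (node, connection ID, circuit ID), not (node, peer, circuit ID). Below is a minimal Java sketch of the difference (Java 16 or later, for records). The names CircuitKeys, PeerKey and QueueKey, and the second connection ID 13, are hypothetical and chosen for illustration; they are not Tor or proposal 218 identifiers.

```java
import java.util.HashMap;
import java.util.Map;

class CircuitKeys {
    // Flawed key: assumes a circuit ID is unique for a pair of nodes.
    record PeerKey(String node, String peer, int circId) {}

    // Better key: a circuit ID is only unique per connection at a node.
    record QueueKey(String node, int connId, int circId) {}

    // Two connections between fileclient and tokenconn (hypothetical IDs
    // 12 and 13), each carrying a circuit that happens to use ID 34760.
    static int distinctWithPeerKey() {
        Map<PeerKey, String> byPeer = new HashMap<>();
        byPeer.put(new PeerKey("fileclient", "tokenconn", 34760), "circuit on conn 12");
        byPeer.put(new PeerKey("fileclient", "tokenconn", 34760), "circuit on conn 13");
        return byPeer.size(); // the second put replaced the first entry
    }

    static int distinctWithQueueKey() {
        Map<QueueKey, String> byQueue = new HashMap<>();
        byQueue.put(new QueueKey("fileclient", 12, 34760), "circuit on conn 12");
        byQueue.put(new QueueKey("fileclient", 13, 34760), "circuit on conn 13");
        return byQueue.size(); // both circuits are kept apart
    }

    public static void main(String[] args) {
        System.out.println("keyed by peer:       " + distinctWithPeerKey());
        System.out.println("keyed by connection: " + distinctWithQueueKey());
    }
}
```

With the pair-based key the second circuit silently overwrites the first, which is exactly the failure mode of assumption 1.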
Maybe this gets clearer when looking again at the example:
00:23:16 [fileclient-188.8.131.52] >>> circ=34760 conn=12
00:23:16 [tokenconn-184.108.40.206] <<< circ=34760 conn=31
Circuit ID 34760 is what fileclient picked as n_circ_id for this circuit
and what tokenconn stored as p_circ_id. There's a single OR connection
carrying this circuit which fileclient identifies as 12 and tokenconn as 31.
The part where this gets tricky is when we need to find out (reliably)
that connection IDs 12 and 31 refer to the same connection. fileclient
never tells tokenconn that it locally refers to this connection as ID
12, and vice versa. We also can't simply assume that any connection
between the two nodes will do: there can be more than one connection
between them, and it's perfectly valid for each of those connections to
carry a circuit with ID 34760.
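Once the mapping between local connection IDs is known, joining the two sides of a hop is simple, because the circuit ID in the cells is the same at both ends of a connection (fileclient's n_circ_id is tokenconn's p_circ_id). Here is a rough Java sketch under that assumption. ConnectionMap, ConnEnd and QueueRef are hypothetical names, and the sketch takes the connection pairs as given rather than deriving them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ConnectionMap {
    // One end of an OR connection: the node and its local connection ID.
    record ConnEnd(String node, int connId) {}

    // A queue as printed in the example: node, local connection ID, circuit ID.
    record QueueRef(String node, int connId, int circId) {}

    private final Map<ConnEnd, ConnEnd> peerOf = new HashMap<>();

    // Record that two local connection IDs denote the same OR connection.
    void link(ConnEnd a, ConnEnd b) {
        peerOf.put(a, b);
        peerOf.put(b, a);
    }

    // The circuit ID in the cells is identical at both ends of a connection
    // (n_circ_id at the node closer to the client, p_circ_id at the other
    // node), so only the node and the connection ID need translating.
    Optional<QueueRef> otherEnd(QueueRef q) {
        ConnEnd peer = peerOf.get(new ConnEnd(q.node(), q.connId()));
        if (peer == null) {
            return Optional.empty(); // connection not matched (yet)
        }
        return Optional.of(new QueueRef(peer.node(), peer.connId(), q.circId()));
    }

    public static void main(String[] args) {
        ConnectionMap map = new ConnectionMap();
        map.link(new ConnEnd("fileclient", 12), new ConnEnd("tokenconn", 31));
        // fileclient's outbound queue on conn 12 pairs with tokenconn's inbound queue on conn 31.
        System.out.println(map.otherEnd(new QueueRef("fileclient", 12, 34760)));
        // The same circuit ID on another, unmatched connection is not mixed up with it.
        System.out.println(map.otherEnd(new QueueRef("fileclient", 13, 34760)));
    }
}
```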
I modified my Java program to parse ORCONN events to match corresponding
connection IDs based on state transitions from NEW or LAUNCHED to
CONNECTED. This works in 99% of cases and only fails for OR connections
that were launched before the controller registered for ORCONN events.
But we probably don't care about that early bootstrapping phase anyway.
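One way such ORCONN matching could look is sketched below. This is not the actual program, only a guess at the idea under stated assumptions. It uses a simplified, hypothetical event record (timestamp, reporting node, peer nickname, local connection ID, status), whereas real ORCONN events identify the peer by address or fingerprint. It pairs an outgoing connection at A toward B (LAUNCHED, then CONNECTED) with the incoming connection at B from A (NEW, then CONNECTED) whose CONNECTED time is closest; that closest-time rule is an assumed heuristic. Connections whose LAUNCHED or NEW event was missed are skipped, mirroring the bootstrapping caveat above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrConnMatcher {
    // Simplified ORCONN event with hypothetical fields; real events name the
    // peer by address or fingerprint, not by nickname.
    record OrConnEvent(long timeMillis, String node, String peer, int connId, String status) {}

    // Two local connection IDs judged to denote the same OR connection.
    record Match(String nodeA, int connA, String nodeB, int connB) {}

    // A CONNECTED event plus whether it followed LAUNCHED (outgoing) or NEW (incoming).
    record Established(OrConnEvent connected, boolean outgoing) {}

    static List<Established> established(List<OrConnEvent> eventsSortedByTime) {
        List<Established> result = new ArrayList<>();
        Set<String> launched = new HashSet<>();
        Set<String> accepted = new HashSet<>();
        for (OrConnEvent e : eventsSortedByTime) {
            String key = e.node() + "/" + e.connId();
            switch (e.status()) {
                case "LAUNCHED" -> launched.add(key);
                case "NEW" -> accepted.add(key);
                case "CONNECTED" -> {
                    if (launched.remove(key)) {
                        result.add(new Established(e, true));
                    } else if (accepted.remove(key)) {
                        result.add(new Established(e, false));
                    }
                    // Otherwise the connection predates our subscription: skip it.
                }
                default -> { } // FAILED, CLOSED, etc. are not needed here
            }
        }
        return result;
    }

    // Pair each outgoing connection A->B with the unmatched incoming
    // connection at B from A whose CONNECTED time is closest (assumed heuristic).
    static List<Match> match(List<OrConnEvent> eventsSortedByTime) {
        List<Established> all = established(eventsSortedByTime);
        Set<Established> used = new HashSet<>();
        List<Match> matches = new ArrayList<>();
        for (Established out : all) {
            if (!out.outgoing()) continue;
            OrConnEvent a = out.connected();
            Established best = null;
            for (Established incoming : all) {
                OrConnEvent b = incoming.connected();
                boolean candidate = !incoming.outgoing() && !used.contains(incoming)
                        && b.node().equals(a.peer()) && b.peer().equals(a.node());
                if (candidate && (best == null
                        || Math.abs(b.timeMillis() - a.timeMillis())
                           < Math.abs(best.connected().timeMillis() - a.timeMillis()))) {
                    best = incoming;
                }
            }
            if (best != null) {
                used.add(best);
                OrConnEvent peerEvent = best.connected();
                matches.add(new Match(a.node(), a.connId(), peerEvent.node(), peerEvent.connId()));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<OrConnEvent> events = List.of(
                new OrConnEvent(1000, "fileclient", "tokenconn", 12, "LAUNCHED"),
                new OrConnEvent(1010, "tokenconn", "fileclient", 31, "NEW"),
                new OrConnEvent(1050, "tokenconn", "fileclient", 31, "CONNECTED"),
                new OrConnEvent(1060, "fileclient", "tokenconn", 12, "CONNECTED"));
        System.out.println(match(events));
    }
}
```

The closest-time rule matters once two connections between the same pair of nodes come up, which is exactly the case that rules out matching on node pairs alone.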
Here's the same circuit from my original example, this time with
explanations (IDs might differ, because this example comes from another
This queue is for circuit 34761 which runs over connection 12 at
fileclient. It's an outbound queue, though that isn't explicitly stated
here. The ID 34761 is what fileclient picked as n_circ_id.
00:23:16 [fileclient-220.127.116.11] >>> circ=34761 conn=12
Three cells in the outbound direction were reported in this CELL_STATS
event. I'm leaving out the other CELL_STATS events here.
This queue is also for circuit 34761, but running over connection 31 at
tokenconn. It's an inbound queue, so 34761 is a p_circ_id, chosen by
fileclient.
00:23:16 [tokenconn-18.104.22.168] <<< circ=34761 conn=31
Two cells in the inbound direction.
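The annotated lines follow a fixed shape, so they are easy to turn into records for the joining step. Below is a small Java parser. It assumes, as in the two annotated examples, that >>> marks an outbound queue whose circ= value is the node's n_circ_id and <<< marks an inbound queue whose circ= value is its p_circ_id. It keeps only the nickname from the bracketed node label; Tor nicknames are alphanumeric, so the first '-' separates nickname from address. QueueLineParser and its nested types are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueueLineParser {
    enum Direction { INBOUND, OUTBOUND }

    // One queue line as printed in the examples. Following the annotations,
    // circId is the node's n_circ_id for OUTBOUND and its p_circ_id for INBOUND.
    record QueueLine(String time, String node, Direction direction, int circId, int connId) {}

    // "HH:MM:SS [nickname-address] >>> circ=N conn=M" (or <<< for inbound).
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{2}:\\d{2}:\\d{2}) \\[([A-Za-z0-9]+)-[^\\]]*\\] (>>>|<<<) circ=(\\d+) conn=(\\d+)$");

    static Optional<QueueLine> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty(); // not a queue line
        }
        Direction dir = m.group(3).equals(">>>") ? Direction.OUTBOUND : Direction.INBOUND;
        return Optional.of(new QueueLine(m.group(1), m.group(2), dir,
                Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5))));
    }

    public static void main(String[] args) {
        System.out.println(parse("00:23:16 [fileclient-220.127.116.11] >>> circ=34761 conn=12"));
        System.out.println(parse("00:23:16 [tokenconn-18.104.22.168] <<< circ=34761 conn=31"));
    }
}
```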
Queue for circuit 26405 on connection 16 at relay tokenconn.
tl;dr: I _think_ it's possible to reconstruct circuits from ORCONN and
CELL_STATS events as they are currently specified in proposal 218.
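Putting the pieces together, reconstruction is a walk along the circuit. Take a node's outbound connection and circuit ID, translate the connection ID to the peer's ID via the ORCONN-derived mapping, and look up the peer's queue with that inbound connection and the same circuit ID. The Java sketch below assumes that each node's inbound and outbound sides of a circuit have already been grouped into one Hop record, as with tokenconn's inbound circuit 34761 on connection 31 and, presumably, its circuit 26405 on connection 16. The third relay "exitrelay" and its connection ID 9 are hypothetical, as are all class and method names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CircuitReconstructor {
    // One end of an OR connection: the node and its local connection ID.
    record ConnEnd(String node, int connId) {}

    // One node's view of a circuit. The origin has no inbound side and the
    // last hop no outbound side; missing sides are null.
    record Hop(String node, Integer inConn, Integer inCirc, Integer outConn, Integer outCirc) {}

    // Lookup key for a hop's inbound side: node, local connection ID, p_circ_id.
    private record InboundKey(String node, int connId, int circId) {}

    // Walk the circuit from its origin: translate the outbound connection ID to
    // the peer's ID for the same connection, then find the peer's hop whose
    // inbound side uses that connection and the same circuit ID.
    static List<String> path(Hop origin, List<Hop> hops, Map<ConnEnd, ConnEnd> peerOf) {
        Map<InboundKey, Hop> byInbound = new HashMap<>();
        for (Hop h : hops) {
            if (h.inConn() != null && h.inCirc() != null) {
                byInbound.put(new InboundKey(h.node(), h.inConn(), h.inCirc()), h);
            }
        }
        List<String> nodes = new ArrayList<>();
        Hop current = origin;
        // Bounded loop guards against cycles in inconsistent input.
        for (int steps = 0; current != null && steps <= hops.size(); steps++) {
            nodes.add(current.node());
            if (current.outConn() == null || current.outCirc() == null) {
                break; // last hop reached
            }
            ConnEnd peer = peerOf.get(new ConnEnd(current.node(), current.outConn()));
            if (peer == null) {
                break; // connection was never matched via ORCONN
            }
            current = byInbound.get(new InboundKey(peer.node(), peer.connId(), current.outCirc()));
        }
        return nodes;
    }

    public static void main(String[] args) {
        Hop origin = new Hop("fileclient", null, null, 12, 34761);
        List<Hop> hops = List.of(
                new Hop("tokenconn", 31, 34761, 16, 26405),
                new Hop("exitrelay", 9, 26405, null, null)); // hypothetical third hop
        Map<ConnEnd, ConnEnd> peerOf = Map.of(
                new ConnEnd("fileclient", 12), new ConnEnd("tokenconn", 31),
                new ConnEnd("tokenconn", 31), new ConnEnd("fileclient", 12),
                new ConnEnd("tokenconn", 16), new ConnEnd("exitrelay", 9),
                new ConnEnd("exitrelay", 9), new ConnEnd("tokenconn", 16));
        System.out.println(path(origin, hops, peerOf)); // [fileclient, tokenconn, exitrelay]
    }
}
```

A walk that stops early points either at a connection that ORCONN matching missed or at a hop whose queue was not reported.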