[tor-dev] Proposal: Controller events to better understand connection/circuit usage
karsten at torproject.org
Thu Feb 28 11:34:03 UTC 2013
[Nick, can you merge my torspec proposal218 branch, please? Thanks!]
On 2/28/13 2:08 AM, Rob Jansen wrote:
> On Mon, Feb 25, 2013 at 10:28 AM, Karsten Loesing <karsten at torproject.org>
>> On 2/23/13 11:20 PM, Rob Jansen wrote:
>>> On Feb 23, 2013 4:22 PM, "Karsten Loesing" <karsten at torproject.org>
>>>> Your understanding of n_circ_id and p_circ_id matches mine, but are you
>>>> sure there's a UID for circuits other than origin circuits? I think you
>>>> mean origin_circuit_t->global_identifier. But there's no such field for
>>>> or_circuit_t or circuit_t. Or do you mean something else?
>>> Ok. Though I thought my original patch moved the global Id to the base
>>> circuit struct. Perhaps I didn't. Anyway, I'm not sure it matters...
>> Your original patch did move the ID to circuit_t, but I thought we
>> wanted to avoid numbering non-origin circuits (mostly because that
>> affects relays in non-TestingTorNetwork mode and could lead to busy
>> relays running out of IDs at some point), which is why I took this
>> change out.
> Right, I remember this now. I could imagine ways to get around the 'running
> out of ids' problem, like resetting the id counter after a given timeframe.
> In fact, we could just let the counter overflow and loop back to 0 on its
> own (assuming it's an unsigned int type) since we really only need them to be
> unique approximately for the life of a circuit. If we see the ID come up
> again after an hour, we can assume it's a new circuit.
I'd rather not change behavior in non-simulation mode. If we
really need a UID for non-origin circuits, we could add a separate
uint64_t global_identifier to circuit_t and only use that in simulation
mode. I'm not sure that we need it, though.
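
To illustrate the wrap-around idea quoted above (it is not something proposal 218 specifies), here is a minimal Python sketch. The 32-bit width, the one-hour reuse gap, and the names WrappingIdAllocator and is_new_circuit are assumptions made up for this example; none of them exist in Tor.

```python
ID_BITS = 32                   # assumed counter width ("unsigned int")
ID_MASK = (1 << ID_BITS) - 1
REUSE_GAP_SECONDS = 3600       # assumed: an ID seen again "after an hour" is new


class WrappingIdAllocator:
    """Hands out IDs that silently wrap back to 0, like an unsigned counter."""

    def __init__(self, start=0):
        self._next = start & ID_MASK

    def allocate(self):
        uid = self._next
        # Masking emulates unsigned overflow, which Python ints do not have.
        self._next = (self._next + 1) & ID_MASK
        return uid


def is_new_circuit(last_seen, now, gap=REUSE_GAP_SECONDS):
    """Treat a reappearing ID as a new circuit if it was unseen for `gap` seconds."""
    return last_seen is None or now - last_seen >= gap
```

An allocator started at ID_MASK returns ID_MASK and then 0, so IDs are only unique within one pass over the counter range. That is why a consumer needs a rule like is_new_circuit to tell a reused ID from a long-lived circuit.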
>> Also, even if we moved the field to circuit_t, the new CELL_STATS events
>> would be the only ones using these UIDs, because all other events are
>> for origin circuits only. I don't see how these IDs would help us. We
>> never learn what UID the next or previous node in a circuit picks for a
>> given circuit.
> Not currently, but couldn't we implement the functionality where relays log
> or export their circuit UIDs and next/prev IDs over the control port?
> Though I'm not sure if you'd want this in mainline Tor if it's only useful
> in simulation mode...
We could have relays log local circuit UIDs together with p_circ_id,
p_conn_id, n_circ_id, and n_conn_id in simulation mode. But I don't see
how that would facilitate circuit tracking compared to the current
approach using ORCONN and CELL_STATS events.
Here's an example:
- fileclient creates a circuit with
  - UID 14,
  - n_conn_id 15, and
  - n_circ_id 19403.
- tokenglobal is the first hop in this circuit and reports
  - (new) UID 12345,
  - p_conn_id 32,
  - p_circ_id 19403,
  - n_conn_id 18, and
  - n_circ_id 6710.
- tokenrelay is the second hop and reports
  - (new) UID 23456,
  - p_conn_id 17,
  - p_circ_id 6710,
  - n_conn_id 16, and
  - n_circ_id 34402.
- exit2 is the third and final hop and reports
  - (new) UID 34567,
  - p_conn_id 15, and
  - p_circ_id 34402.
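
The hop reports in this example can be chained roughly as in the Python sketch below. It is only an illustration: the dictionary layout and the trace_circuit helper are hypothetical, each node is assumed to carry just one circuit, and the CONN_PEERS table stands in for whatever a parser would learn from ORCONN events about which connection IDs on two nodes are the same connection. Only the first pairing (fileclient 15 and tokenglobal 32) is stated in this mail; the other two are assumed from the example's order.

```python
# Per node: (conn_id, circ_id) towards the previous hop ("p") and next hop ("n").
REPORTS = {
    "fileclient":  {"uid": 14,    "p": None,        "n": (15, 19403)},
    "tokenglobal": {"uid": 12345, "p": (32, 19403), "n": (18, 6710)},
    "tokenrelay":  {"uid": 23456, "p": (17, 6710),  "n": (16, 34402)},
    "exit2":       {"uid": 34567, "p": (15, 34402), "n": None},
}

# (node, conn_id) -> (peer node, peer's conn_id) for the same connection.
# Assumed here; in practice this would have to come from ORCONN events.
CONN_PEERS = {
    ("fileclient", 15): ("tokenglobal", 32),
    ("tokenglobal", 18): ("tokenrelay", 17),
    ("tokenrelay", 16): ("exit2", 15),
}


def trace_circuit(origin, reports, conn_peers):
    """Follow n_conn_id/n_circ_id hop by hop, starting at the origin node."""
    path = [origin]
    node = origin
    # The length guard stops the walk if inconsistent data forms a loop.
    while reports[node]["n"] is not None and len(path) <= len(reports):
        n_conn, n_circ = reports[node]["n"]
        peer = conn_peers.get((node, n_conn))
        if peer is None:
            break  # the other end of this connection is unknown
        peer_node, peer_conn = peer
        # The next hop must see the same circ ID on its side of the connection.
        if reports[peer_node]["p"] != (peer_conn, n_circ):
            break
        path.append(peer_node)
        node = peer_node
    return path
```

Calling trace_circuit("fileclient", REPORTS, CONN_PEERS) walks all four nodes in order. The locally chosen UIDs are never consulted: the walk depends only on the connection pairing and the shared circuit IDs.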
We need ORCONN events to know that fileclient's n_conn_id 15 is the same
connection as tokenglobal's p_conn_id 32. How would the new UIDs make
>>>> tl;dr: I _think_ it's possible to reconstruct circuits from ORCONN and
>>>> CELL_STATS events as they are currently specified in proposal 218.
>>> Great, but do we really expect every Tor controller parser to get this
>>> right? It seems complicated enough that there should be an easier way.
>>> Maybe it's just wishful thinking on my part.
>> I agree that reconstructing circuits from ORCONN and CELL_STATS events
>> is far from trivial. I don't really see how to make it simpler though.
> What about linking all of the IDs and UIDs as described above?
>> From an earlier mail in this thread:
>>>> Finally, Rob, should I look into the CIRC_BW events you suggested a while
>>>> ago? If yes, what did you have in mind for how that event would look,
>>>> and when/by whom would it be emitted?
>>> If we want to do this, it would likely be an aggregation of all STREAM_BW
>>> events for a circuit, but only during the time when those streams are
>>> attached to that circuit. I don't think it makes sense to emit it for every
>>> STREAM_BW event though, so what if we aggregate and emit it once per
>>> second? A format similar to the STREAM_BW format should work fine.
>> Done. I specified and implemented such a CIRC_BW event.
>> Here's the updated proposal 218 (Nick, please don't merge this yet):
> In section 5.3/5.4/5.5, these events are emitted in the
> second_elapsed_callback(), right? I wanted to verify that a relay that
> hasn't sent anything in a few seconds and then starts sending again will
> emit the event at the end of the second in which it resumed sending,
> rather than on the first bytes after it resumed sending.
Yes, all these events are emitted in second_elapsed_callback().
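
As a rough illustration of that once-per-second behavior for CIRC_BW, here is a Python sketch. The CircBwAggregator class and its method names are hypothetical, and it returns plain tuples rather than the event syntax from the proposal. It only shows the accumulate-then-flush pattern, not Tor's actual implementation.

```python
from collections import defaultdict


class CircBwAggregator:
    """Sums per-stream byte counts per circuit and flushes once per second."""

    def __init__(self):
        # circ_id -> [bytes_read, bytes_written] since the last flush
        self._totals = defaultdict(lambda: [0, 0])

    def on_stream_bytes(self, circ_id, read, written):
        """Record bytes for a stream attached to circ_id; emits nothing."""
        totals = self._totals[circ_id]
        totals[0] += read
        totals[1] += written

    def on_second_elapsed(self):
        """Return (circ_id, read, written) per active circuit, then reset."""
        events = [
            (circ_id, read, written)
            for circ_id, (read, written) in sorted(self._totals.items())
            if read or written
        ]
        self._totals.clear()
        return events
```

Because on_stream_bytes never emits anything itself, a circuit that resumes sending after an idle period shows up only at the next on_second_elapsed() call, and idle seconds produce no event at all.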
> The last word in 5.4 should be 'reading' instead of 'read'.
> Is the specification of 5.5 and 5.6 complex enough to warrant including
> example outputs?
Sure, can't hurt. Added a few examples.
> In 5.6, were we planning on explaining how buckets can go negative and how
> that affects the reporting of the TB_EMPTY events?
I tried a better explanation:
ReadBucketEmpty (WriteBucketEmpty) is the time in millis that the read
(write) bucket was empty since the last refill. LastRefill is the
time in millis since the last refill.
If a bucket went negative and if refilling tokens didn't make it go
positive again, there will be multiple consecutive TB_EMPTY events for
each refill interval during which the bucket contained zero tokens or
less. In such a case, ReadBucketEmpty or WriteBucketEmpty are capped
at LastRefill in order not to report empty times more than once.
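
The capping rule can be sketched in Python as follows. This is a simplified model of a single bucket with made-up names (BucketEmptyTracker, spend, refill). It is meant to show why the cap avoids double counting, not how Tor implements its token buckets.

```python
class BucketEmptyTracker:
    """Models one token bucket; all times are in milliseconds."""

    def __init__(self, tokens, now_ms):
        self.tokens = tokens
        self.last_refill_ms = now_ms
        # Time at which the bucket reached zero or less, or None if it has tokens.
        self.empty_since_ms = None if tokens > 0 else now_ms

    def spend(self, amount, now_ms):
        self.tokens -= amount
        if self.tokens <= 0 and self.empty_since_ms is None:
            self.empty_since_ms = now_ms

    def refill(self, amount, now_ms):
        """Refill; return (empty_ms, last_refill_ms), or None if not empty."""
        last_refill = now_ms - self.last_refill_ms
        report = None
        if self.empty_since_ms is not None:
            # Cap at LastRefill so time already reported is not counted again.
            empty = min(now_ms - self.empty_since_ms, last_refill)
            report = (empty, last_refill)
        self.tokens += amount
        self.last_refill_ms = now_ms
        if self.tokens > 0:
            self.empty_since_ms = None  # bucket is positive again
        return report
```

For example, suppose a bucket starts with 100 tokens at time 0, spends 250 at time 40, and is refilled with 100 tokens every 100 ms. The first refill reports (60, 100). The bucket is still negative afterwards, so the second refill reports the capped value (100, 100) rather than 160. The third refill reports nothing because the bucket is positive again.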
> Here's the tor branch:
>> Here's a Shadow log file containing all new events:
>> Here's the Java program that I used to parse the Shadow log file:
> Ugh. I really don't look forward to writing parsers for this. I guess there
> may be few projects that actually require this information, and those that
> do can use your code:)
I hope that some of the complexity goes away once we use a parsing
library like Stem. And I think we should provide parsing code to other
people looking into this.
>> And finally, here's the output, which should be easier to understand now:
>> Search for "Circuit [fileclient-184.108.40.206]:14" to find the circuit I
>> mentioned earlier in this thread.
>> Can you review the proposal changes and tell me if they make sense to you?
> See comments above. Overall, it looks good. I'm still wondering about the
> UIDs / IDs issue. It may be that we don't want to include that for other
> reasons, in which case the current implementation is fine.
Okay, in that case let's consider proposal 218 done for the moment,
unless we come up with a better idea to solve the circuit-tracking thing.
I asked Nick above to merge my changes, so that I can put proposal 218
on the sponsor F year 3 wiki page as one result of the February 28
milestone. (That doesn't mean we cannot make it even better after