Filename: 214-longer-circids.txt
Title: Allow 4-byte circuit IDs in a new link protocol
Author: Nick Mathewson
Created: 6 Nov 2012
Status: Open
0. Overview
Relays are running out of circuit IDs. It's time to make the field bigger.
1. Background and Motivation
Long ago, we thought that 65535 circuit IDs would be enough for anybody. It wasn't. But our cell format in link protocols is still:
   Cell [512 bytes]
      CircuitID [2 bytes]
      Command [1 byte]
      Payload [509 bytes]

   Variable-length cell [Length+5 bytes]
      CircID [2 bytes]
      Command [1 byte]
      Length [2 bytes]
      Payload [Length bytes]
This means that a relay can run out of circuit IDs pretty easily.
2. Design
I propose a new link cell format for relays that support it. It should be:
   Cell [514 bytes]
      CircuitID [4 bytes]
      Command [1 byte]
      Payload [509 bytes]

   Variable-length cell [Length+7 bytes]
      CircID [4 bytes]
      Command [1 byte]
      Length [2 bytes]
      Payload [Length bytes]
We need to keep the payload size in fixed-length cells to its current value, since otherwise the relay protocol won't work.
This new cell format should be used only when the link protocol is 4. (To negotiate link protocol 4, both sides need to use the "v3" handshake, and include "4" in their version cells. If version 4 or later is negotiated, this is the cell format to use.)
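As a rough illustration of the two layouts, here is a minimal Python sketch that packs and unpacks cells with a circuit ID width chosen by the negotiated link protocol. The function names, the big-endian byte order, the zero padding, and the command numbers used in any call are assumptions made for the example; they are not taken from the proposal or from the Tor source.

```python
import struct

FIXED_PAYLOAD_LEN = 509  # payload size is identical in the old and new formats


def circ_id_len(link_proto: int) -> int:
    """Circuit ID width in bytes: 4 for link protocol 4 or later, otherwise 2."""
    return 4 if link_proto >= 4 else 2


def pack_fixed_cell(link_proto: int, circ_id: int, command: int,
                    payload: bytes = b"") -> bytes:
    """Build a fixed-length cell: CircID, Command, then a padded payload."""
    width = circ_id_len(link_proto)
    if not 0 <= circ_id < 1 << (8 * width):
        raise ValueError("circuit ID does not fit in the field")
    if len(payload) > FIXED_PAYLOAD_LEN:
        raise ValueError("payload too long")
    # Assumption: big-endian IDs and zero padding.
    header = circ_id.to_bytes(width, "big") + bytes([command])
    return header + payload.ljust(FIXED_PAYLOAD_LEN, b"\x00")


def pack_var_cell(link_proto: int, circ_id: int, command: int,
                  payload: bytes) -> bytes:
    """Build a variable-length cell: CircID, Command, Length, Payload."""
    width = circ_id_len(link_proto)
    if not 0 <= circ_id < 1 << (8 * width):
        raise ValueError("circuit ID does not fit in the field")
    return (circ_id.to_bytes(width, "big")
            + struct.pack("!BH", command, len(payload))
            + payload)


def unpack_fixed_cell(link_proto: int, cell: bytes):
    """Split a fixed-length cell into (circ_id, command, payload)."""
    width = circ_id_len(link_proto)
    if len(cell) != width + 1 + FIXED_PAYLOAD_LEN:
        raise ValueError("wrong cell size for this link protocol")
    circ_id = int.from_bytes(cell[:width], "big")
    return circ_id, cell[width], cell[width + 1:]
```

With these helpers, a fixed-length cell comes out at 512 bytes under link protocol 3 and 514 bytes under link protocol 4, matching the sizes listed above.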
2.1. Better allocation of circuitID space
In the current Tor design, circuit ID allocation is determined by whose RSA public key has the lower modulus. How ridiculous! Instead, I propose that when the version 4 link protocol is in use, the connection initiator use the low half of the circuit ID space, and the responder use the low half of the circuit ID space.
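A minimal sketch of this allocation rule follows. It assumes the intended split is initiator in the low half and responder in the high half, which is the reading confirmed later in this thread. The function names, the random-probe strategy, and the choice to skip ID 0 are assumptions for illustration only.

```python
import random

CIRC_ID_BITS = 32  # 4-byte circuit IDs under link protocol 4


def allowed_range(is_initiator: bool) -> range:
    """Return the half of the circuit ID space this side may allocate from."""
    half = 1 << (CIRC_ID_BITS - 1)
    if is_initiator:
        # Low half. Assumption: ID 0 is skipped as a reserved value.
        return range(1, half)
    # High half.
    return range(half, 1 << CIRC_ID_BITS)


def pick_circ_id(is_initiator: bool, in_use: set, max_tries: int = 64):
    """Probe randomly for an unused ID in our half; return None on failure."""
    ids = allowed_range(is_initiator)
    for _ in range(max_tries):
        candidate = random.randrange(ids.start, ids.stop)
        if candidate not in in_use:
            return candidate
    return None
```

Because the two ranges are disjoint, the two ends of a connection can never pick the same ID, and neither side needs to compare public keys to decide which half it owns.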
3. Discussion
* Why 4 bytes?
Because 3 would result in an odd cell size, and 8 seems like overkill.
* Will this be distinguishable from the v3 protocol?
Yes. Anybody who knows they're seeing the Tor protocol can probably tell by the TLS record sizes which version of the protocol is in use. Probably not a huge deal though; which approximate range of versions of Tor a client or server is running is not something we've done much to hide in the past.
* Why a new link protocol and not a new cell type?
Because pretty much every cell has a meaningful circuit ID.
* Okay, why a new link protocol and not a new _set of_ cell types?
Because it's a bad idea to mix short and long circIDs on the same channel. (That would leak which cells go with which kind of circuit ID, potentially.)
* How hard is this to implement?
I wasn't sure, so I coded it up. I've got a probably-buggy implementation in branch "wide_cird_ids" in my public repository. Be afraid! More testing is needed!
On Tue, Nov 06, 2012 at 09:36:34PM -0500, Nick Mathewson wrote:
> Relays are running out of circuit IDs. It's time to make the field bigger.
I don't doubt the second sentence, but is the first sentence actually true? Do we have any evidence / measurements / something here?
(Since circids are relative to the connection they're on, it's not clear to me that any given TLS connection accrues more than a few tens of thousands of circuits. And if a very few do, maybe the solution is to move to a new TLS connection for those rare cases, rather than impose a 2-byte penalty on every cell in all cases.)
--Roger
On Tue, Nov 6, 2012 at 9:55 PM, Roger Dingledine arma@mit.edu wrote:
> On Tue, Nov 06, 2012 at 09:36:34PM -0500, Nick Mathewson wrote:
>> Relays are running out of circuit IDs. It's time to make the field bigger.
> I don't doubt the second sentence, but is the first sentence actually true? Do we have any evidence / measurements / something here?
> (Since circids are relative to the connection they're on, it's not clear to me that any given TLS connection accrues more than a few tens of thousands of circuits.
I think that's enough? 32K from A to B, or from B to A, is where we run out. So if A is a popular middle node, and B is a popular exit, most of the circuits between A and B will be A->B. So if we get "a few tens of thousands" of circuits from A to B, we hit the limit.
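The 32K figure follows from each side of a connection owning half of the ID space. A small sketch of that arithmetic (the function name is made up for the example, and reserved values are ignored):

```python
def ids_per_side(circ_id_bytes: int) -> int:
    """Circuit IDs available to each end when the ID space is split in half."""
    return (1 << (8 * circ_id_bytes)) // 2


print(ids_per_side(2))  # 32768 with the current 2-byte field
print(ids_per_side(4))  # 2147483648 with the proposed 4-byte field
```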
> And if a very few do, maybe the solution is to move to a new TLS connection for those rare cases, rather than impose a 2-byte penalty on every cell in all cases.)
Maaaybe, but I sure can't think of a sane testable design for that. Can you? To do this sanely, we'd need to negotiate this before we exchange any actual data, and predict in advance that we'd want it. (We wouldn't want to do it on-the-fly for connections that happen to have large numbers of circuits: that way lies madness.)
Also, I think those "rare cases" are communications between the busiest Tor nodes. I think those communications might represent a reasonably large fraction of total Tor bytes, such that having a fallback mode might not save us so much.
And also, this only adds 1/256 additional overhead before TLS happens. Not huge IMO. We could save far more than that by more intelligent TLS use, if we needed to.
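The 1/256 figure can be checked directly from the fixed-length cell sizes in the proposal. A short sketch of the calculation (variable names are illustrative):

```python
from fractions import Fraction

OLD_CELL_LEN = 512  # 2-byte circuit ID
NEW_CELL_LEN = 514  # 4-byte circuit ID

# Extra bytes per cell, relative to the old cell size.
overhead = Fraction(NEW_CELL_LEN - OLD_CELL_LEN, OLD_CELL_LEN)

print(overhead)                    # 1/256
print(f"{float(overhead):.2%}")    # 0.39%
```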
On Tue, Nov 06, 2012 at 10:10:15PM -0500, Nick Mathewson wrote:
>> And if a very few do, maybe the solution is to move to a new TLS connection for those rare cases, rather than impose a 2-byte penalty on every cell in all cases.)
> Maaaybe, but I sure can't think of a sane testable design for that. Can you? To do this sanely, we'd need to negotiate this before we exchange any actual data, and predict in advance that we'd want it. (We wouldn't want to do it on-the-fly for connections that happen to have large numbers of circuits: that way lies madness.)
> Also, I think those "rare cases" are communications between the busiest Tor nodes. I think those communications might represent a reasonably large fraction of total Tor bytes, such that having a fallback mode might not save us so much.
Ah. By "a new TLS connection", I didn't mean a new design or anything -- I meant simply a second TLS connection.
> And also, this only adds 1/256 additional overhead before TLS happens. Not huge IMO. We could save far more than that by more intelligent TLS use, if we needed to.
I agree that it's an ok price to pay if we decide it's the best way to go.
--Roger
On Wed, Nov 7, 2012 at 12:51 AM, Roger Dingledine arma@mit.edu wrote:
> On Tue, Nov 06, 2012 at 10:10:15PM -0500, Nick Mathewson wrote:
>>> And if a very few do, maybe the solution is to move to a new TLS connection for those rare cases, rather than impose a 2-byte penalty on every cell in all cases.)
>> Maaaybe, but I sure can't think of a sane testable design for that. Can you? To do this sanely, we'd need to negotiate this before we exchange any actual data, and predict in advance that we'd want it. (We wouldn't want to do it on-the-fly for connections that happen to have large numbers of circuits: that way lies madness.)
>> Also, I think those "rare cases" are communications between the busiest Tor nodes. I think those communications might represent a reasonably large fraction of total Tor bytes, such that having a fallback mode might not save us so much.
> Ah. By "a new TLS connection", I didn't mean a new design or anything -- I meant simply a second TLS connection.
I wouldn't feel very good about this route: there are enough places in our design that assume one canonical OR connection with any given relay that changing this assumption would be emphatically nontrivial and error-prone.
On the other hand, reports of circuit ID exhaustion might be premature; I get no hits searching for "No unused circ IDs. Failing" except for our source code. Has anybody seen that warning IRL?
On Tue, Nov 6, 2012 at 9:36 PM, Nick Mathewson nickm@freehaven.net wrote:
> 2.1. Better allocation of circuitID space
> In the current Tor design, circuit ID allocation is determined by whose RSA public key has the lower modulus. How ridiculous! Instead, I propose that when the version 4 link protocol is in use, the connection initiator use the low half of the circuit ID space, and the responder use the low half of the circuit ID space.
Shouldn't this say "the responder use the high half of the circuit ID space"?
Tim
On 11/10/12 11:50 AM, Tim Wilde wrote:
> On Tue, Nov 6, 2012 at 9:36 PM, Nick Mathewson nickm@freehaven.net wrote:
>> 2.1. Better allocation of circuitID space
>> In the current Tor design, circuit ID allocation is determined by whose RSA public key has the lower modulus. How ridiculous! Instead, I propose that when the version 4 link protocol is in use, the connection initiator use the low half of the circuit ID space, and the responder use the low half of the circuit ID space.
> Shouldn't this say "the responder use the high half of the circuit ID space"?
Yes. Nick already fixed this in the torspec.git repository:
- and the responder use the low half of the circuit ID space.
+ and the responder use the high half of the circuit ID space.
https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/214-longer-ci...
Best, Karsten