Route selection

Paul Syverson syverson at
Tue Dec 9 17:49:08 UTC 2003

It looks like we're all largely in agreement, which is good, although
we seem to be talking past each other a bit, which could be better.
So, rather than just respond point to point, I'm going to try to summarize
first based on what we've all said.

Directory servers have 4 primary things they announce about
nodes. (I'm speaking now about relatively short term design. What we
should design say three years from now requires, as we all keep
saying, more research.)

1. Who the nodes in the network are.
2. What the exit policy of a node is, as of last check.
3. Who is considered up as of last check (not nec. same freq. as in 2).
4. Who is considered reliable.

We have been using reputation to some extent for all of these things.
I only meant "reputation" to apply to 1 and 4, although in different
ways. We have also been talking about voting for all of these things.
When I talk about voting, how often the votes need occur varies for
each of these. We may decide to combine votes for efficiency, but we
should work out the needs first.
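As a concrete sketch of the four items above, a per-node directory entry might look like the following (field names are my own illustration, not Tor's actual descriptor format):

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of the four things a directory server announces
# about a node; names are illustrative, not Tor's real wire format.
@dataclass
class NodeEntry:
    identity_key: str       # 1. who the node is (long-term identity)
    exit_policy: List[str]  # 2. exit policy as of last check,
                            #    e.g. ["accept *:80", "reject *:*"]
    is_up: bool             # 3. considered up as of last check
    reliable: bool          # 4. considered reliable (possibly never
                            #    published to clients; see discussion below)
```

Note that items 1 and 4 are the ones subject to "reputation" in the sense used here, while 2 and 3 are just observations as of the last check.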

For the basic open Internet Tor network:

Voting on who the nodes in the network are probably needs doing not
more than once a day or even less frequently. I'm going to talk below
about the process of deciding who gets added (not the vote itself,
which by comparison should be straightforward).  This point takes the
perspective of adding nodes. If a node (key) is going to be retired,
the timeframe remains about the same. For revocation due to
compromise, etc., it should be possible to have an immediate vote and
change (quickly propagating the change to the mirrors etc. raises
issues I leave aside for now).

Exit policy changes are probably also infrequent, so the voted
announcement of exit policies should only need updating fairly
infrequently, say once or twice a day.

We seem to have been in violent agreement about who is considered up
as of last check, and why we are telling clients about it. It should
only be a suggestion, a good bet, that trying to connect to that node
will succeed. Nobody is actually proposing a reputation here, although
we each seem to be rejecting a reputation proposal we perceived the
other to be making. We've argued some about how frequently this should
be voted, but we all seem to agree that experience will provide
guidance. A default of half an hour to an hour seems mutually acceptable
as a starting point. I agree that including node membership and exit
policy probably adds little to the overhead of this vote, so for
convenience it can all be lumped together. We just need to remember
why each item is in there, so we can change the lumping if what is
convenient changes.
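A minimal sketch of the lumped vote, assuming simple majority rule across directory servers (an illustration only, not Tor's actual voting protocol):

```python
# Combine directory servers' up/down observations by strict majority.
# votes: list of per-server ballots, each mapping node id -> bool (up?).
def tally_up_votes(votes):
    counts = {}
    for ballot in votes:
        for node, up in ballot.items():
            yes, total = counts.get(node, (0, 0))
            counts[node] = (yes + int(up), total + 1)
    # A node is listed as up iff a strict majority of servers saw it up.
    return {node: yes * 2 > total
            for node, (yes, total) in counts.items()}
```

Membership and exit-policy fields would ride along in the same ballots; the point of keeping the purposes distinct is that their natural frequencies (daily vs. hourly) differ even if the mechanism is shared.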

There are some subtleties we've been skirting in deciding (based
solely on a directory server's own testing) whether to vote that a
node is up or not and whether node state is enough. I'll defer those
to below.

On reliability: this is where we're all worried about making it too
complex. If you persistently run nodes that are unreliable enough,
that should presumably get you booted from the network as a node
operator, which is really the who-the-nodes-are category.  For
reasons I hinted at and Roger noted in a little more detail, it may be
that there is little value in giving this out.  There will be much
complexity and chance to create reputation attacks, thus it may be
better to simply limit to membership and up state (but see comment
below about links). But we would need some criteria by which directory
servers judge a node so unreliable that it should be removed from the
network list. I'll leave that issue
for another time. I guess I'm proposing that a reputation for
reliability however determined not be given out to clients since it is
likely to do more harm than good. Hmmm.

On Mon, Dec 08, 2003 at 08:50:38PM -0500, Roger Dingledine wrote:
> On Fri, Dec 05, 2003 at 02:08:40PM -0500, Paul Syverson wrote:
> > On Fri, Dec 05, 2003 at 11:07:41AM -0500, Adam Shostack wrote:
> > > | The reason we need to tell clients which nodes are up right now is so
> > > 
> > > You can't know what nodes are up "right now" except by sending a
> > > packet from a near neighbor.  The best you can do is know which nodes
> > > have been "fairly stable" for "a while" and expect that past
> > > performance is an indicator of future returns.
> Tor has a really strong advantage here over the high-latency
> systems. Mixminion must solve the reputation problem because its
> directories must list which nodes will still be up in several hours as the
> message slowly goes from hop to hop. Tor directories just suggest nodes to
> try -- clients can adapt to dead nodes and still complete transactions.
> Let's remember what we'd like the directory to do. First, we'd like
> to detect and exclude _honest_ nodes that are down or flaky. We could
> require that listed nodes must have been connected continuously for, say,
> 30 minutes. For short-lived Tor transactions, this should be plenty,
> but since ssh or IM can span hours, I agree that we'll need to measure
> experimental results to find a good value for 30. But it's probably
> good enough for now. (We might also come up with a way to shift streams
> from one circuit to another so they don't break when they lose a node
> (other than the exit node), and Nick and I have been pondering it in the
> background -- but it's probably not worth it if our paths are so short
> that they have few/no intermediate hops.)

If we have at least three nodes in most connections, we will want to
have the ability to circuit hop long-lived connections so that there
is not some single long-lived circuit for the intermediate node to go
looking for the endpoints of. Circuit opening and closing should
happen on roughly the same basis for all connections, if that is
feasible.  Also, we're
not ready for synchronous batch networks (wrt data) yet obviously, but
we should shoot to be compatible with synchronous circuits as a step
in the right direction.

> Second, we'd like to detect and exclude misbehaving nodes. We've
> speculated about verification procedures to make sure a node is performing
> correctly and fairly -- make sure it's connecting to all routers, make
> sure it's processing cells, etc, even from indirect requests so the node
> can't distinguish the verification from a normal request. This level of
> verification is needed to reduce the opportunity for a node to single
> out a target victim and trace him. It's also needed to keep a bad node
> from degrading performance and reliability on the network. We have some
> ideas on verification approaches we could explore, but this is a largely
> unsolved (and hard) problem. Verification won't help misbehavior at
> the network edges (e.g. accepting certain clients, or connecting to
> certain websites), and will have limited ability to detect problems
> with long-lived transactions, unless the dirserver starts simulating
> three-hour ssh connections.

Another reason for circuit hopping. If clients are going to be
attempting to choose helpers, they should probably be building their
own local reputations for who the reliable helpers are. System
guidance on criteria may be useful, but the directory servers need not
supply the performance information as long as they are doing a good
job on membership and up state.

> In addition to how directories handle the partitioning opportunities,
> we must also figure out how clients should handle them. If you make a
> circuit to Alice and try to extend to Bob and she says he's down, but the
> directory says he's up, do you believe Alice? We would like clients to
> behave uniformly, to reduce partitioning problems. Do you give up on Bob
> entirely, because something fishy is going on? Do you give up on Alice?
> Either one could be framing the other. If you give up on Alice, do you do
> it just for that circuit (so she can't dictate your next hop in the path
> by only letting you connect to the one she wants), or for later circuits
> too because if something fishy is going on you should remember about it?

KISS. Choose the first node based on the membership and state
information and local experience. Barring specific exit needs
(including an exit helper), choose the remainder based on just
membership and state information.
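The KISS rule above can be sketched as follows (a sketch under my own naming assumptions, not the actual client code): local experience influences only the first hop, and everything else is drawn uniformly from nodes that are members and up.

```python
import random

# Sketch of the KISS path-selection rule: first node chosen from
# membership + up-state + local experience; remaining nodes from
# membership + up-state only. Names are illustrative.
def choose_path(directory, locally_trusted, length=3):
    """directory: dict node -> is_up; locally_trusted: nodes the
    client's own experience favors for the first hop."""
    up = [n for n, is_up in sorted(directory.items()) if is_up]
    # Prefer locally trusted nodes for the entry position; fall back
    # to any up node if local experience offers no candidate.
    first_candidates = [n for n in up if n in locally_trusted] or up
    first = random.choice(first_candidates)
    rest = random.sample([n for n in up if n != first], length - 1)
    return [first] + rest
```

Exit-policy constraints (or an exit helper) would filter the candidate set for the last position, but the remaining choices stay uniform so that clients behave as alike as possible.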

We've often been assuming that node state is all we need present, but
link state must be relevant because (as we all have been saying) if
Alice says she can't talk to Bob, there is no way to know who is lying
or broken or if it really is the network link between them. So even if
node state is all we need for reliability sans insider adversaries,
link information avoids the complexity of this judgment. The link
between A and B is down: no blame attached. Again, if the directory
servers have some criteria by which they decide Alice is down even
though she is reachable because she only processes connections with
Bob, Charlie, and Danielle and no one else, that's another
matter. More hard stuff to work out, but now it can be looked at
separately.

> Ideally our verification procedures above would limit the situations
> where this becomes an issue, but it still must be solved. Hard problem.


> > That said, even reliable nodes do go down for maintainance, etc.  If
> > someone goes on vacation, has some major change, etc.  they might
> > choose to take their node down for a day/few days/week,
> I've been planning a 'friendly shutdown' procedure: you hit ^c once, and
> that switches the node into a 'closing down' state, where it rejects all
> new streams but continues to pass cells for old ones. When the last stream
> closes, it exits; if you hit ^c again, or perhaps after a certain timeout,
> it just closes them and exits. It could easily do something to signal
> the directory server to take it off the list (for example, by no longer
> submitting descriptors to the dirservers); thus it can retire safely.
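The friendly-shutdown behavior described in the quoted text can be sketched as a small state machine (illustrative only, not Tor's actual implementation):

```python
# Sketch of 'friendly shutdown': the first ^C puts the node in a
# closing-down state that rejects new streams; the node exits when
# the last open stream closes, or immediately on a second ^C.
class Node:
    def __init__(self):
        self.closing = False
        self.open_streams = set()

    def accept_stream(self, stream_id):
        if self.closing:
            return False              # reject new streams while closing
        self.open_streams.add(stream_id)
        return True

    def close_stream(self, stream_id):
        self.open_streams.discard(stream_id)
        # True means "safe to exit now": closing and fully drained
        return self.closing and not self.open_streams

    def sigint(self):
        if not self.closing:
            self.closing = True       # first ^C: enter closing-down state
            return not self.open_streams  # exit immediately if idle
        return True                   # second ^C: force exit
```

Ceasing to submit descriptors to the dirservers during the closing-down state is what lets the node drop off the list without a revocation vote.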
> > but don't want
> > to lose the identity and reputation associated with that node
> > indefinitely.
> Identity is based on manual approval by the dirserver operators, not
> on performance. It seems to me that 'reputation' for honest nodes (that
> is, has he been up recently) doesn't need any long-term state at all. I
> don't think our reputation system wants to be complex enough to reach
> for results like "every Sunday night his node becomes flaky". Reputation
> systems to identify dishonest nodes probably needs more thinking and
> more state.

I didn't mean to suggest anything like that. I meant that we don't
want to count as the same a node that is flaky every night from 5-9PM
and one that is intentionally down during that same period. The
friendly shutdown procedure should make it easy to maintain that
distinction.

> >  Having a voted state of c. every hour or so can allow
> > honest directory servers to reasonably do testing, provide a reasonable
> > directory etc. Nodes that are regularly listed as up but are tested
> > as down from multiple independent paths during that hour will have their
> > reputation affected in ways that we need to work out.
> I don't see that the voting frequency should matter much here. You do
> testing, and when (more correctly, while) you're satisfied with somebody
> then you vote yes on him. If a vote happens while you're not satisfied
> with him then you vote no on him.
> > > I think the right approach is to experimentally determine how often
> > > your nodes actually crash, and from that try to find a set of "fairly
> > > stable" and "a while" that allows voting to happen rarely while still
> > > reflecting the facts well.
> Again, I want to distinguish "is up now so will probably be up for the
> next little while" from "tends to be up so probably will continue tending
> to be up".

Agreed. I was talking about the former, and saying that we need to pin
down what we mean by "a little while" so that this is a prediction
that is reasonably accurate.

> For short-lived streams like a web page, is it better to choose from as
> large as possible a pool of nodes, each of which has fine reliability for
> your purpose (and assuming most of them are honest), or better to choose
> only from the very stable nodes (again assuming most of them are honest)?
> I don't think it's so critical to make voting infrequent. If it's
> reasonably frequent then even if things start getting rough, we're more
> likely to have a not-too-old decision from the dirservers.

Hope that has been addressed at the beginning above.

> > Yes exactly. The hour or so suggestion is just a wild guess at this point.
> > 
> > > We (ZKS) found that two hops was a nice idea, because there's an
> > > obvious security win in that no one node knows all.  Proving that a
> > > third node actually wins against an adversary that is competent to
> > > count packets was harder, so two gives you a nice understandable
> Foo. You're right, my security argument for 3 is much weaker than my
> argument for 2. My argument for 3 goes something like "Imagine adversary
> Bob owns the last hop, and adversary Billy owns the first hop. Bob sees
> something neat come out of the circuit. He knows that since paths are
> only 2 hops, all he needs to do is call in a favor from Billy. 3 hops
> means he doesn't know who he'll have to go to."
> How about a new one: correctness verification is going to be hell if
> the adversary can be reasonably confident both that he's an edge node
> in the circuit, and that he's talking to the other edge node. I guess
> we'll cross that bridge when we come to it.
> Do helper nodes get screwy with only 2 hops? On first thought, they
> seem ok.
> >  But, I don't want to make choices that will leave the
> > other areas as clearly identifiable addons. We need to consider how
> > easy it is for a middle node to know whether he is one or two hops
> > from the end of a circuit and what he might do with that info.
> Right, Paul has another related reason here. If we use more hops for
> enclave-based circuits, then a node in that circuit may be able to guess
> it's an enclave-based circuit.

I still see a clear case for a minimum of three (tor nodes, hops is
annoyingly ambiguous) for all of the above stated reasons and
more. Adam's point seems to assume a global adversary, against which
we are broken anyway.  Put another way, if the entrance node
knows where any circuit exits the OR network and the exit node knows
where any circuit enters the OR network then it is trivial for an
adversary owning a node to just watch its own local traffic and then
devise attack strategies: decide where to place packet counters on
exit lines, which nodes to try to compromise or DoS.  With three nodes
we are back to traffic confirmation.

For the enclave issue, if we have a helper node, then three node
routes are constant and fixed. This is part of the reasoning that led
to the five-node default in the original design.
