Route selection

Roger Dingledine arma at
Tue Dec 9 01:50:38 UTC 2003

On Fri, Dec 05, 2003 at 02:08:40PM -0500, Paul Syverson wrote:
> On Fri, Dec 05, 2003 at 11:07:41AM -0500, Adam Shostack wrote:
> > | The reason we need to tell clients which nodes are up right now is so
> > 
> > You can't know what nodes are up "right now" except by sending a
> > packet from a near neighbor.  The best you can do is know which nodes
> > have been "fairly stable" for "a while" and expect that past
> > performance is an indicator of future returns.

Tor has a really strong advantage here over the high-latency
systems. Mixminion must solve the reputation problem because its
directories must list which nodes will still be up in several hours as the
message slowly goes from hop to hop. Tor directories just suggest nodes to
try -- clients can adapt to dead nodes and still complete transactions.

Let's remember what we'd like the directory to do. First, we'd like
to detect and exclude _honest_ nodes that are down or flaky. We could
require that listed nodes must have been connected continuously for, say,
30 minutes. For short-lived Tor transactions, this should be plenty,
but since ssh or IM can span hours, I agree that we'll need to measure
experimental results to find a good value for 30. But it's probably
good enough for now. (We might also come up with a way to shift streams
from one circuit to another so they don't break when they lose a node
other than the exit node -- Nick and I have been pondering it in the
background -- but it's probably not worth it if our paths are so short
that they have few or no intermediate hops.)
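The uptime rule above is simple enough to sketch. Here's a toy filter for what a directory might list, assuming node descriptors carry a connection timestamp; the 30-minute window, the field names, and the node names are all hypothetical, not actual Tor code:

```python
import time

# The "30 minutes" from the text is a guess that would need tuning.
STABILITY_WINDOW = 30 * 60  # seconds

def stable_nodes(nodes, now=None):
    """Return the nodes a directory would list: those whose current
    connection has lasted at least STABILITY_WINDOW seconds."""
    now = time.time() if now is None else now
    return [n for n in nodes
            if n["connected"] and now - n["connected_since"] >= STABILITY_WINDOW]

# Hypothetical descriptors: only "moria" has been up long enough.
nodes = [
    {"name": "moria",   "connected": True,  "connected_since": 0},
    {"name": "tor26",   "connected": True,  "connected_since": 9000},  # up 1000 s
    {"name": "lefkada", "connected": False, "connected_since": 0},
]
print([n["name"] for n in stable_nodes(nodes, now=10000)])  # -> ['moria']
```

The point of the sketch is that the directory only filters; clients still adapt on their own when a listed node turns out to be dead.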

Second, we'd like to detect and exclude misbehaving nodes. We've
speculated about verification procedures to make sure a node is performing
correctly and fairly -- make sure it's connecting to all routers, make
sure it's processing cells, etc, even from indirect requests so the node
can't distinguish the verification from a normal request. This level of
verification is needed to reduce the opportunity for a node to single
out a target victim and trace him. It's also needed to keep a bad node
from degrading performance and reliability on the network. We have some
ideas on verification approaches we could explore, but this is a largely
unsolved (and hard) problem. Verification won't help with misbehavior at
the network edges (e.g. accepting certain clients, or connecting to
certain websites), and will have limited ability to detect problems
with long-lived transactions, unless the dirserver starts simulating
three-hour ssh connections.

In addition to how directories handle the partitioning opportunities,
we must also figure out how clients should handle them. If you make a
circuit to Alice and try to extend to Bob and she says he's down, but the
directory says he's up, do you believe Alice? We would like clients to
behave uniformly, to reduce partitioning problems. Do you give up on Bob
entirely, because something fishy is going on? Do you give up on Alice?
Either one could be framing the other. If you give up on Alice, do you do
it just for that circuit (so she can't dictate your next hop in the path
by only letting you connect to the one she wants), or for later circuits
too, because if something fishy is going on you should remember it?

Ideally our verification procedures above would limit the situations
where this becomes an issue, but it still must be solved. Hard problem.

> That said, even reliable nodes do go down for maintenance, etc.  If
> someone goes on vacation, has some major change, etc.  they might
> choose to take their node down for a day/few days/week,

I've been planning a 'friendly shutdown' procedure: you hit ^c once, and
that switches the node into a 'closing down' state, where it rejects all
new streams but continues to pass cells for old ones. When the last stream
closes, it exits; if you hit ^c again, or perhaps after a certain timeout,
it just closes them and exits. It could easily do something to signal
the directory server to take it off the list (for example, by no longer
submitting descriptors to the dirservers); thus it can retire safely.
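The two-stage shutdown described above is basically a small state machine. A minimal sketch, with all names hypothetical (this is not actual Tor code):

```python
class Node:
    """Toy model of the 'friendly shutdown' procedure: first ^c enters
    a closing-down state, second ^c (or a timeout) forces exit."""

    def __init__(self):
        self.state = "running"
        self.publish_descriptor = True  # whether we keep telling dirservers we're up
        self.streams = set()

    def on_sigint(self):
        if self.state == "running":
            self.state = "closing"           # first ^c: reject new streams
            self.publish_descriptor = False  # drop off the directory list
        else:
            self.streams.clear()             # second ^c: drop remaining streams
            self.state = "exited"

    def accept_stream(self, stream_id):
        if self.state != "running":
            return False                     # closing/exited: refuse new work
        self.streams.add(stream_id)
        return True

    def close_stream(self, stream_id):
        self.streams.discard(stream_id)
        if self.state == "closing" and not self.streams:
            self.state = "exited"            # last old stream gone: retire safely

n = Node()
n.accept_stream(1)
n.on_sigint()                    # enter 'closing'
assert n.accept_stream(2) is False
n.close_stream(1)                # last stream closes -> node exits
print(n.state)                   # -> exited
```

The timeout path (force-exit after waiting too long in 'closing') is omitted, but it would just be another transition to the same "exited" state.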

> but don't want
> to lose the identity and reputation associated with that node
> indefinitely.

Identity is based on manual approval by the dirserver operators, not
on performance. It seems to me that 'reputation' for honest nodes (that
is, has he been up recently) doesn't need any long-term state at all. I
don't think our reputation system wants to be complex enough to reach
for results like "every Sunday night his node becomes flaky". Reputation
systems to identify dishonest nodes probably need more thinking and
more state.

>  Having a voted state of c. every hour or so can allow
> honest directory servers to reasonably do testing, provide a reasonable
> directory etc. Nodes that are regularly listed as up but are tested
> as down from multiple independent paths during that hour will have their
> reputation affected in ways that we need to work out.

I don't see that the voting frequency should matter much here. You do
testing, and when (more correctly, while) you're satisfied with somebody
you vote yes on him. If a vote happens while you're not satisfied with
him, you vote no on him.
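That is, each dirserver's vote just snapshots its current satisfaction, whatever the schedule. A toy sketch of that idea (the majority threshold and all names are my assumptions, not a specified Tor mechanism):

```python
def dirserver_vote(satisfied):
    """Each dirserver votes from its own current testing results."""
    return "yes" if satisfied else "no"

def listed(votes):
    """A node makes the directory when a majority of dirservers vote yes
    (a hypothetical rule, chosen here just for illustration)."""
    yes = sum(1 for v in votes if v == "yes")
    return yes * 2 > len(votes)

# Three dirservers snapshot their satisfaction at vote time.
votes = [dirserver_vote(s) for s in (True, True, False)]
print(listed(votes))  # -> True
```

Voting more often only makes the published decision fresher; it doesn't change what any individual dirserver does.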

> > I think the right approach is to experimentally determine how often
> > your nodes actually crash, and from that try to find a set of "fairly
> > stable" and "a while" that allows voting to happen rarely while still
> > reflecting the facts well.

Again, I want to distinguish "is up now so will probably be up for the
next little while" from "tends to be up so probably will continue tending
to be up".

For short-lived streams like a web page, is it better to choose from as
large a pool of nodes as possible, each of which has fine reliability for
your purpose (and assuming most of them are honest), or better to choose
only from the very stable nodes (again assuming most of them are honest)?

I don't think it's so critical to make voting infrequent. If it's
reasonably frequent then even if things start getting rough, we're more
likely to have a not-too-old decision from the dirservers.

> Yes exactly. The hour or so suggestion is just a wild guess at this point.
> > We (ZKS) found that two hops was a nice idea, because there's an
> > obvious security win in that no one node knows all.  Proving that a
> > third node actually wins against an adversary that is competent to
> > count packets was harder, so two gives you a nice understandable

Foo. You're right, my security argument for 3 is much weaker than my
argument for 2. My argument for 3 goes something like "Imagine adversary
Bob owns the last hop, and adversary Billy owns the first hop. Bob sees
something neat come out of the circuit. He knows that since paths are
only 2 hops, all he needs to do is call in a favor from Billy. 3 hops
means he doesn't know who he'll have to go to."

How about a new one: correctness verification is going to be hell if
the adversary can be reasonably confident both that he's an edge node
in the circuit, and that he's talking to the other edge node. I guess
we'll cross that bridge when we come to it.

Do helper nodes get screwy with only 2 hops? On first thought, they
seem ok.

>  But, I don't want to make choices that will leave the
> other areas as clearly identifiable addons. We need to consider how
> easy it is for a middle node to know whether he is one or two hops
> from the end of a circuit and what he might do with that info.

Right, Paul has another related reason here. If we use more hops for
enclave-based circuits, then a node in that circuit may be able to guess
it's an enclave-based circuit.

