Route selection

Fri Dec 5 19:29:54 UTC 2003

On Fri, Dec 05, 2003 at 02:08:40PM -0500, Paul Syverson wrote:
| On Fri, Dec 05, 2003 at 11:07:41AM -0500, Adam Shostack wrote:
| > On Fri, Dec 05, 2003 at 12:51:36AM -0500, Roger Dingledine wrote:
| > | On Sun, Nov 30, 2003 at 09:39:59AM -0500, Paul Syverson wrote:
| > | > So I woke up some time ago thinking that we were wrong to bother with
| > | > voting on network state in the directory servers, that they should
| > | > just be voting on membership and maybe exit policy. Working on it for
| > | > a while I now think we probably still want a voted network state at
| > | > least c. every hour or so, since `simpler' ideas now seem even more
| > | > complicated, but I think I uncovered another issue.
| > | 
| > | The reason we need to tell clients which nodes are up right now is so
| > | they can choose working nodes for their circuits. Getting a consensus
| > | prevents a bad directory server from partitioning clients. I think we
| > | want a quick turnaround. Probably the right way to implement it is to
| > 
| > You can't know what nodes are up "right now" except by sending a
| > packet from a near neighbor.  The best you can do is know which nodes
| > have been "fairly stable" for "a while" and expect that past
| > performance is an indicator of future returns.
| > 
| 
| Right, this was exactly my thought. Because we build a route by
| extension, we depend on what the node currently at the end of the
| route says is up when we attempt to extend. And what we care
| about are reasonably stable nodes. Even if I have a threshold-signed
| network state less than 5 minutes old, there's not much I can, or
| probably even want to, do if I try to make a circuit and get told by
| one of the "up" nodes that another "up" node is unreachable.

I think you want to report it somewhere, because this (plus some code)
makes it harder to lie about the state of a node.  That is, if Alice
tells you that Bob is down while others are using Bob, we want to
be able to detect that.
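One way to make Alice's claim checkable is to cross-check reports: if
Alice says Bob is unreachable while several other parties report using
Bob successfully in the same period, Alice's report is suspicious. This
is a minimal sketch that assumes a hypothetical report format
`(reporter, target, outcome)` and an arbitrary threshold `min_ok`;
neither comes from this thread, and a real design would also have to
decide who collects the reports and how to stop one operator from
posing as several reporters.

```python
from collections import defaultdict

# Hypothetical report format, not part of any existing protocol:
# (reporter, target, outcome), where outcome is "down" if the reporter
# could not extend to the target and "ok" if it used the target successfully.
Report = tuple[str, str, str]


def suspicious_reporters(reports: list[Report], min_ok: int = 2) -> set[str]:
    """Return reporters that called a node "down" while at least `min_ok`
    *other* reporters used that node successfully in the same period."""
    ok_by: dict[str, set[str]] = defaultdict(set)    # target -> who used it
    down_by: dict[str, set[str]] = defaultdict(set)  # target -> who said down
    for reporter, target, outcome in reports:
        if outcome == "ok":
            ok_by[target].add(reporter)
        elif outcome == "down":
            down_by[target].add(reporter)

    flagged: set[str] = set()
    for target, accusers in down_by.items():
        for accuser in accusers:
            # Ignore the accuser's own "ok" reports when counting contradictions.
            if len(ok_by[target] - {accuser}) >= min_ok:
                flagged.add(accuser)
    return flagged


reports = [
    ("alice", "bob", "down"),
    ("carol", "bob", "ok"),
    ("dave", "bob", "ok"),
    ("erin", "frank", "down"),
]
print(suspicious_reporters(reports))  # {'alice'}
```

Erin is not flagged: nobody contradicts her, and a report that is merely
uncorroborated is not by itself evidence of lying.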

| That said, even reliable nodes do go down for maintenance, etc.  If
| someone goes on vacation or has some major change, they might
| choose to take their node down for a day, a few days, or a week, but
| they don't want to lose the identity and reputation associated with
| that node indefinitely.  Having a voted state c. every hour or so lets
| honest directory servers do reasonable testing, provide a reasonable
| directory, etc. Nodes that are regularly listed as up but are tested
| as down from multiple independent paths during that hour will have their
| reputation affected in ways that we need to work out.

Right.  I think it's important to note who says what here.  I'm
tempted to say that nodes should be able to announce a planned
downtime of some length without it counting against them, but that
would let someone run a set of great nodes that are up only when
their target is online and at no other time.
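The hourly voted state Paul describes, which is also what keeps a single
bad directory server from partitioning clients, can be illustrated with
a simple strict-majority rule: a node is listed as up only if more than
half of the directory servers list it as up. The rule and the
`consensus_up` helper are assumptions for illustration; this thread does
not settle the actual voting rule. If the honest servers form a majority
and agree about a node, one dishonest server cannot change whether that
node is listed.

```python
from collections import Counter


def consensus_up(votes: dict[str, set[str]]) -> set[str]:
    """Nodes listed as up by a strict majority of directory servers.

    `votes` maps a (hypothetical) directory server name to the set of
    nodes that server's own testing found to be up during the period."""
    n_servers = len(votes)
    counts = Counter(node for up_set in votes.values() for node in up_set)
    return {node for node, c in counts.items() if 2 * c > n_servers}


votes = {
    "dir1": {"a", "b", "c"},
    "dir2": {"a", "b", "c"},
    "dir3": {"a", "evil"},  # a dishonest server adds a node and drops two
}
print(sorted(consensus_up(votes)))  # ['a', 'b', 'c']
```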

| > I think the right approach is to experimentally determine how often
| > your nodes actually crash, and from that try to find a set of "fairly
| > stable" and "a while" that allows voting to happen rarely while still
| > reflecting the facts well.
| > 
| Yes exactly. The hour or so suggestion is just a wild guess at this point.
| 
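Adam's suggestion to derive "fairly stable" and "a while" from measured
crash rates can be made concrete under one simplifying assumption: node
failures are memoryless (exponentially distributed), so a node that is
up when the vote is taken is still up T hours later with probability
exp(-T/MTBF). This is a sketch only; the function names, the use of mean
session length as the MTBF estimate, the target probability, and the
sample numbers are all illustrative, not testbed measurements.

```python
import math


def estimate_mtbf(session_hours: list[float]) -> float:
    """Crude mean time between failures: the average length of observed
    uptime sessions (hours a node stayed up before crashing)."""
    if not session_hours:
        raise ValueError("need at least one observed session")
    return sum(session_hours) / len(session_hours)


def max_vote_interval(mtbf_hours: float, min_still_up: float) -> float:
    """Longest interval T (hours) such that a node that is up at vote time
    is still up at the end of the interval with probability >= min_still_up.

    Assumes exponentially distributed (memoryless) failures, so
    P(still up after T) = exp(-T / MTBF) and T = -MTBF * ln(min_still_up)."""
    if not 0.0 < min_still_up < 1.0:
        raise ValueError("min_still_up must be strictly between 0 and 1")
    return -mtbf_hours * math.log(min_still_up)


mtbf = estimate_mtbf([100.0, 200.0, 300.0])  # made-up sample: 200 hours
print(round(max_vote_interval(mtbf, 0.99), 2))  # 2.01
```

A memoryless model deliberately ignores Adam's point that past
performance predicts future behavior; if real uptime data show that
long-running nodes tend to stay up, a heavier-tailed distribution fitted
to that data would capture it better.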
| 
| > | Agreed, this is a problem. There's a broader problem, though, which
| > | is that many of our ideas about rebuilding circuits from partway
| > | along, having streams leave circuits from intermediate nodes, etc.,
| > | seem to assume that circuits are many hops long, whereas for usability
| > | they should be as short as possible. The current planned compromise is
| > | around 3 hops, plus one if the first OR is revealing (e.g., it's you),
| > | and plus one if the last OR is revealing (e.g., it's your destination
| > | or it has a very limited exit policy).
| > 
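Roger's rule of thumb is simple enough to state as a function. This is
a sketch; the name `circuit_length` and the `base` parameter are
illustrative, and 3 is Roger's "around 3", not a fixed constant.

```python
def circuit_length(first_or_revealing: bool, last_or_revealing: bool,
                   base: int = 3) -> int:
    """Circuit length under the compromise Roger describes: about `base`
    hops, plus one if the first OR is revealing (e.g., it is you) and plus
    one if the last OR is revealing (e.g., it is your destination or has a
    very limited exit policy)."""
    return base + int(first_or_revealing) + int(last_or_revealing)


for first in (False, True):
    for last in (False, True):
        print(first, last, circuit_length(first, last))
```

Lowering `base` models the shorter defaults Adam argues for next.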
| > We (ZKS) found that two hops was a nice idea, because there's an
| > obvious security win in that no one node knows all.  Proving that a
| > third node actually wins against an adversary that is competent to
| > count packets was harder, so two gives you a nice understandable
| > security/performance tradeoff.  I know, I'm hammering on this, but I
| > think that the performance gain is important for the default
| > experience. 
| > 
| 
| I agree, but the Freedom threat/service model was a little different.
| Correct me if I'm wrong, but the goal was always to protect the
| individual user running a Freedom client, whose first node in a
| connection was always untrusted/semitrusted. A stated goal of Onion
| Routing from the start was to protect (hide) enclaves, i.e., not just
| the end clients but sometimes the nodes themselves. I agree that
| performance gain is crucial, and for our current testbed, where we are
| looking primarily at end-user protection, at least to start, I am happy
| setting things at just a few hops, i.e., two hops within the network =
| three nodes.  But I don't want to make choices that will leave the
| other areas as clearly identifiable add-ons. We need to consider how
| easy it is for a middle node to know whether he is one or two hops
| from the end of a circuit and what he might do with that info. We then
| may have a tradeoff between forcing everyone into longer routes,
| letting enclave connections be recognized as such by all nodes in the
| path, or punting on enclave protection entirely within the same network
| that protects clients.  It is impossible to say which until further
| analysis and experimentation indicate what is feasible at what
| performance level. Note that this point just elaborates on what Roger
| said; if it looks like I am disagreeing, then I am not being clear
| enough.

I mostly agree with you here.

We can do some pretty easy thought experiments about the cost of
extra links.  I'm using a cable modem, and getting 10-20 ms to the
cable company and about 100 ms to transit ISPs.  So, if nodes are to be
jurisdictionally separate along a chain, you're looking at probably
100+ ms per hop.  (If this doesn't dwarf TOR-router latency, you're
handling fewer than 100 packets per second.)  So we assume that this is
the key latency people see; each hop adds about 100 ms of latency.  I
seem to recall that latency starts being "annoying" at about 400-500
ms for non-interactive use.  Given that you'll have 100 ms to the first
hop, you're at 300 ms with 2 hops, 400 ms with 3, and 500 ms with 4.
Subject to verification, of course.
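Adam's arithmetic is a linear model: a fixed cost to reach the first
node, plus a roughly fixed cost per further hop. This sketch reads
"with 2 hops" as two hops after the 100 ms link to the first node, which
is how the 300 ms figure comes out. The 100 ms defaults are his
cable-modem estimates, not measurements, and `ANNOYING_MS` is the low
end of his recalled range; both names are illustrative.

```python
def circuit_latency_ms(hops: int, first_link_ms: float = 100.0,
                       per_hop_ms: float = 100.0) -> float:
    """Adam's rough model: the link to the first node, plus a roughly fixed
    cost for each further hop between jurisdictionally separate nodes.
    Router processing time is assumed to be negligible by comparison."""
    return first_link_ms + hops * per_hop_ms


ANNOYING_MS = 400.0  # low end of the recalled 400-500 ms "annoying" range

for hops in range(1, 5):
    total = circuit_latency_ms(hops)
    verdict = "annoying" if total >= ANNOYING_MS else "tolerable"
    print(f"{hops} hops: {total:.0f} ms ({verdict})")
```

Swapping in measured per-hop latencies is the verification Adam asks
for.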

Adam

-- 
"It is seldom that liberty of any kind is lost all at once."
					               -Hume