0.0.8pre1 works: now what?

Roger Dingledine arma at mit.edu
Mon Jul 26 11:24:43 UTC 2004

My premise for 0.0.8 was "can we make the Tor network degrade more
gracefully in the face of a large influx of users?"

The infrastructure we added for 0.0.8pre1 seems to pretty much
work. Clients that enable their ORPort will generate keys and upload
descriptors. They get into the directory immediately, without any sort
of manual verification process, but the manually-verified servers are
still distinguishable if clients prefer to use them (clients use only
verified servers currently). Tor nodes track and report their uptime, and
also track and report their bandwidth capacity. You can ask a server to
extend to another server -- even one he hasn't heard about before. Servers
don't need to stay connected to all the other servers anymore.

Now what?

In my eyes, there are three big issues remaining. We need to:
1) let clients describe how comfortable they are using unverified servers
in various positions in their paths -- and recommend good defaults.
2) detect which servers are suitable for given streams.
3) reduce dirserver bottlenecks.

Part 1: configuring clients; and good defaults.

Nodes should have an 'AllowUnverified' config option, which takes
combinations of entry/exit/middle, or any.
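If it lands roughly as proposed, the torrc line might look something like this (the option name and value syntax are just the suggestion above; nothing is implemented yet):

```
# hypothetical torrc syntax for the proposed option
AllowUnverified entry,middle
```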

Since users will have different needs, there isn't any good default.
People worried about traffic confirmation attacks where the website
colludes will want verified servers as entry points. People worried about
exit nodes modifying the web pages they give back, or reading their AIM
traffic, will want verified servers as exit points. People worried about
an adversary who can only observe a limited number of machines will want
as many node choices as they can get.

If we want to be conservative, we should say 'middle only' by default.
But if we use them only for middle, then people running unverified servers
get no benefits from plausible deniability (no users will enter at them,
and because paths are 3 hops, the hop after them knows whether they relayed
or originated the request). So we should allow them as entry nodes too.
That argues at least for allowing 'entry,middle'. And really, if we have
somebody willing to be an exit node, why on earth would we turn them away?

I think at this point we need to clarify (that is, change) what we mean
by verified. Up until now, the goal has been to make sure the adversary
doesn't get "too many" servers verified, but a server or two is probably
ok. Now that unverified servers can be put to use, I say we revise this
to also require "one of the dirserver operators knows them" (or typical
web-of-trust equivalents).

At what point in scaling the network does the statement "all people
offering exit services are either cypherpunks or the adversary" become
false?

And this doesn't even touch yet the idea of multiple separate adversaries
competing to take over the network.

Part 2: choosing suitable servers.

If we want to maintain the high quality of the Tor network, we need a
way to determine and indicate bandwidth (aka latency) and reliability
properties for each server.

Approach one: ask people to only sign up if they're high-quality nodes,
and also require them to send us an explanation in email so we can approve
their server. This works quite well, but if we take the required email
out of the picture, bad servers might start popping out of the woodwork.
(It's amazing how many people don't follow instructions.)

Approach two: nodes track their own uptime, and estimate their max
bandwidth. The way they track their max bandwidth right now is by
recording whenever bytes go in or out, and remembering a rolling average
over the past ten seconds, and then also the maximum rolling-average
observed in the past 12 hours. Then the estimated bandwidth is the smaller
of the in-max and the out-max. They report this in the descriptor they
upload, rounding it down to the nearest 10KB, and capping anything over
100KB to 100KB. Clients could be more likely to choose nodes with higher
bandwidth entries (maybe from a linear distribution, maybe something
else -- thoughts?).
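The self-measurement above can be sketched as a toy model (not Tor's actual C code; it assumes one sample per second rather than per-event recording, treats KB as 1024 bytes, and never expires the 12-hour maximum the post describes):

```python
from collections import deque

class BandwidthEstimator:
    """Toy sketch of a node estimating its own usable bandwidth."""

    WINDOW = 10          # seconds in the rolling average
    CAP = 100 * 1024     # advertise at most 100KB/s
    ROUND = 10 * 1024    # round down to the nearest 10KB/s

    def __init__(self):
        self.in_samples = deque(maxlen=self.WINDOW)
        self.out_samples = deque(maxlen=self.WINDOW)
        # Best rolling averages seen so far (the post keeps the max
        # over the past 12 hours; this sketch never expires it).
        self.in_max = 0.0
        self.out_max = 0.0

    def record_second(self, bytes_in, bytes_out):
        self.in_samples.append(bytes_in)
        self.out_samples.append(bytes_out)
        self.in_max = max(self.in_max, sum(self.in_samples) / len(self.in_samples))
        self.out_max = max(self.out_max, sum(self.out_samples) / len(self.out_samples))

    def advertised_bandwidth(self):
        # The estimate is the smaller of the in-max and out-max,
        # rounded down and capped as in the descriptor format.
        observed = min(self.in_max, self.out_max)
        return min(self.CAP, int(observed) // self.ROUND * self.ROUND)

est = BandwidthEstimator()
for _ in range(10):
    est.record_second(bytes_in=35 * 1024, bytes_out=60 * 1024)
print(est.advertised_bandwidth())  # 30720, i.e. 30KB/s: bounded by the slower direction
```

Note how the min of the two directions keeps an asymmetric link (fast downstream, slow upstream) from overstating what the node can relay.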

The premise is that we want to require a node to have some minimum
usefulness before we will risk actual user traffic on it, but after that
it will quickly observe whether it can handle a lot of bandwidth or just
a bit (it only takes a few fast connections to convince it of a higher
bandwidth value, if it has one).

How do we bootstrap from a node that isn't yet at 10KB? Maybe servers
(both the ones with known bandwidth and the ones with 0KB) should
periodically build circuits through them and pull down, or push, some
file. We could have a knob to turn such that servers being used for
actual stuff would test less frequently than servers just sitting around
waiting to be tested.

To accommodate servers that want to donate 20GB of bandwidth each month
but not more, they should set their bandwidthrate to be 7KB/s, and set
a huge token bucket. When the bucket runs dry, they should scale back
to advertising 10KB for a while, so they're not used as much. Perhaps
they'll flip-flop between advertising a little and advertising a lot,
but presumably we can tune that too.
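As a quick sanity check of the 7KB/s figure (assuming decimal gigabytes and a 30-day month):

```python
# Back-of-the-envelope check for the 20GB/month figure.
monthly_budget = 20 * 10**9           # 20GB in bytes (decimal units assumed)
seconds_per_month = 30 * 24 * 3600    # about 2.59 million seconds
rate = monthly_budget / seconds_per_month
print(int(rate))  # 7716 bytes/s, i.e. roughly the 7KB/s suggested above
```

With binary units the figure comes out a bit over 8KB/s, so a 7KB/s rate stays under the budget either way.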

Since uptime is published too, some streams (such as irc or aim) prefer
reliability to latency. Maybe we should prefer latency by default,
and have a Config option StreamPrefersReliability to specify by port
(or by addr:port, or anything exit-policy-style), that looks at uptime
rather than advertised bandwidth.
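One way a client might act on that, as a toy sketch (the descriptor tuples, the port set, and the linear bandwidth weighting are all assumptions drawn from the discussion above, not an existing interface):

```python
import random

# Hypothetical descriptors: (nickname, advertised bandwidth, uptime in seconds).
descriptors = [
    ("fast", 100 * 1024, 3600),          # high bandwidth, only up an hour
    ("steady", 20 * 1024, 86400 * 30),   # modest bandwidth, a month of uptime
]
reliability_ports = {6667, 5190}  # irc and aim, as in the example above

def pick_exit(port):
    if port in reliability_ports:
        # Long-lived streams prefer uptime over bandwidth.
        return max(descriptors, key=lambda d: d[2])
    # Otherwise weight the choice linearly by advertised bandwidth.
    return random.choices(descriptors, weights=[d[1] for d in descriptors])[0]

print(pick_exit(6667)[0])  # "steady": the irc stream picks the reliable node
```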

And of course, notice that we're trusting the servers to not lie. We
could do spot-checking by the dirservers, but I'm not sure what we would
do if one dirserver thought something was up, and the others were fine.
At least for jerks the dirservers can agree about, maybe we could
configure the dirservers to blacklist descriptors from certain IP spaces.

Approach three: Something else? Maybe a scheme for estimating bandwidth
capacity of others in an efficient way in an adversarial environment?

Part 3: reducing dirserver bottlenecks.

There are three bottlenecks here: bandwidth, reliability, and trust.

Bandwidth is simplest. Right now clients and servers cache the latest
directory they've fetched, and if they configure a DirPort, they serve it
to others too. This DirPort is advertised in the descriptor, so clients
choose evenly between anybody with an open DirPort, taking the load
off the authoritative dirservers. On the other hand, if you've got an
ORPort open and you've uploaded your descriptor, then you go directly
to the authdirservers for your directory. So it's a two-tiered system --
authdirservers to normal servers to clients. That's working nicely right
now. If that's not enough, we're also working on a scheme to distribute
the running-routers list independently of the rest of the directory,
so people can download directories less frequently but fetch updates
quite frequently.
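In miniature, the two-tier fetch might look like this (the node names and descriptor fields are made up for illustration):

```python
import random

# Hypothetical labels for the authoritative dirservers.
authdirservers = ["moria1", "moria2"]

def pick_directory_source(descriptors, we_are_a_server):
    # Servers that publish a descriptor go straight to an authoritative
    # dirserver; clients spread the load evenly across every node
    # advertising an open DirPort, falling back to the authorities.
    if we_are_a_server:
        return random.choice(authdirservers)
    mirrors = [d["nickname"] for d in descriptors if d.get("dirport")]
    return random.choice(mirrors) if mirrors else random.choice(authdirservers)

print(pick_directory_source([{"nickname": "cache1", "dirport": 9030}], False))  # "cache1"
```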

Reliability is a bugger -- when all the dirservers are down, new people
can't join the network (either as client or as server). Now that Tor nodes
cache the directory, we can let them store it to disk, so they know about
other people with open DirPorts when they come back, in case some of them
have recent directories. But fundamentally, if we rely on authdirservers
to let us learn about who else is in the network, we will continue to have
this issue. It would seem that if people don't require verified servers
everywhere in their path, they should be willing to hear about descriptors
from somebody other than the authdirservers. More on that later.

And similarly with trust -- if they don't care what we have to say,
why should we be holding them back?

Ok, sleep time. More later. Thoughts?
