Proposal: Speed up Tor

Nick Mathewson nickm at freehaven.net
Thu Jun 14 21:09:32 UTC 2007


On Fri, Jun 01, 2007 at 10:30:32AM -0700, Michael_google gmail_Gersten wrote:
> Proposal for Tor.

Hi, and sorry about the delay.  Stuff has been really busy.  I hope
Roger will have the time to chime in too.

First, this proposal needs a different name.  Generally, you want to
name proposals after the mechanism, not the goal: otherwise everybody
who has a proposal to make Tor faster writes something called "Speed
up Tor" or "Make Tor faster" or something.

Second, you should check out 111-local-traffic-priority.txt; that's
another proposal for speeding stuff up, based on creating incentives
for people to run more servers.

At the base, it looks like you want to do a few things:

    a) limit CPU usage on heavily loaded servers
    b) let servers turn down connections when they're already loaded.

I agree that a) is important.  I'm not sure that b) is the way to use
it; it seems to me that the end effect could easily be that mostly
people can't get on the network, but occasionally they get a really
fast circuit.  That's probably not an improvement.

Another danger here is that, if you're a greedy client, your logical
behavior under this proposal is to open as many circuits as you can
and use them forever.   This isn't so good for your anonymity though.
Hmmmm.

> The goal of this proposal is to support the following goals:
> 
> 1. An easy way to toggle between "At least speed X" (for
> single-threaded web browsing) and "Any speed, many connections" (for
> downloads).

Hmmm.  I'm not sure I really like that part; it seems to leak pretty
trivially what you're doing by your node selection.  Also see below.

> 2. A way to keep nodes from being CPU starved from the encryption
> processing (high bandwidth nodes)
> 3. A way to keep nodes from being bandwidth starved (the main limit on
> middle-speed nodes).
> 
> Motivation: Speed up Tor.
> 
> Design:
> 1. Add in a control message for switching torrc's. Add support in
> Vidiallia to toggle these.
> 	Flaw: Ideally the determination (high speed vs. high numbers) would
> be made based on who is making the request. For example, while the
> downloader is fetching 10 slow parts at once, I still want to
> browse.

This sounds like an unrelated thing to what you actually want.  If
you'd like some circuits to be built with a different node selection
approach than others, you don't want to switch torrc files: you want
to apply a different set of circuit building rules to them.
Otherwise, there's no way for circuits of different types to coexist
on the same client.

> 2. If the protocol for extending a circuit to a new node does not
> permit the new node to reject the connection, then add this ability.
> Otherwise, start using it. Nodes can prevent being CPU starved by
> refusing new connections when they are "full".

Or they could could rate-limit connections or something.

This "turn away circuits when you're full" approach would seem to mean
that once you've got a circuit, you can use it as much CPU and
bandwidth as you want, but it can get hard to get a circuit.  Roger,
what do you think here?  This seems to be a step away from fairness.

> 3. When a circuit is being built, estimates of bandwidth needed are
> transmitted as well. Similar to #2, nodes will reject new connections
> if the bandwidth isn't there.
> 
> Security implications: Absolutely no idea. How does having large
> numbers of connections affect Tor's tracability?

Sorry; who sends bandwidth estimates to whom?

> Specification (incomplete):
> 1. New control message would either take a filename of the new torrc,
> or the contents of the new torrc. I do not know Tor's inner workings,
> and cannot tell which is "better".

There's already control commands to change config options.

> 2. Nodes can measure the CPU cost per circuit, and tell how many they
> can afford CPU wise. There may be a configuration parameter to
> indicate how much CPU it can use; maybe the output of "uptime" is read
> to see what the CPU levels are (and Tor stops accepting when the load
> is .8 or higher.)

Seems like this would leak your CPU load.  Don't know if that's okay.

> 3. The simplest way to handle this is to put numbers in the config
> file, and pass them along. For example, if I'm in "single threaded
> browsing", I'll have numbers specifying a max speed of 150 KB/s, a
> burst speed of 100 KB/s, and an average speed of 10 KB/s. If I'm in
> "multi-threaded download", I'll specify a max speed of 25 KB, an
> average speed of 15 KB, and a burst speed of 18 KB.

We already specify numbers on the server side, and include them in
router descriptors, and never use more than we're configured to use.
Have you read dir-spec.txt (which says how we communicate about
routers) or path-spec.txt (which says how we choose circuit to build)?

> What to do with these numbers? Well, if the sum of the averages of all
> incoming circuits exceed my actual bandwidth, I say "No" when someone
> tries to connect. Similarly, if I cannot support the burst speed, I
> saw "No", to avoid slowing them down (this becomes the minimum speed
> needed). Finally, I know that the worst case for this circuit is the
> max speed, and I can do ... ? with it.

I don't know; is there a way to 

> The idea here: On my DSL, I cannot get more than 150 KB/s. While I
> want to get that full speed, I'll be happy to get 100 KB/s. On
> average, while I'm surfing, I'm not fetching pages all the time --
> hence, an average speed of 10 KB/s representing fetch, read, fetch,
> read, fetch, read.
> 
> Now, it's not perfect. I'm thinking that "Busy percentage" might make
> more sense -- 10% busy for web surfing, 95% busy for downloaders. This
> would also help CPU overhead calculations. It also helps tell when to
> say "This circuit has been idle for a while. It isn't active at all,
> and while it is inactive, we will regard it as having a speed demand
> of 0". This will prevent a node from being filled up with "idle"
> connections, and becoming wasted.
> 
> I'm also realizing that my concept of "burst" isn't quite right, and
> I'm hoping that someone else has a better idea. For downloading,
> "burst" means that while the average demand for a 10 part download is
> 15 KB/s per circuit, there will be variance, and a node might see a
> higher burst. Yet I will be happy even if a node can only give me 10
> KB/s, because I have 9 other circuits that will each get slightly more
> speed. So I think we need "This is my minimum acceptable speed, reject
> this circuit if you can't give me this much", "This is my average",
> and "This is my worst case / initial burst" (a lot of circuits will be
> busy at first, and idle afterwards), as well as "percentage of time I
> expect the circuit to be used".
> 
> Compatibility: The only change in how nodes talk to each other is in
> circuit building. I am not familiar with the current system to know
> how this will change things.

You can learn how nodes talk to each other by reading tor-spec.txt.


On the whole, I think this is a good start; we need to think about how
this impacts performance overall.  I think limiting CPU use is a
decent idea (though my top CPU priority right now is figuring out how
to do a better job when we're on multi-core machines).  We already
limit bandwidth use.

Our current bandwidth approach is (more or less; please correct me,
Roger) "share equally among circuits on connections; share equally on
connections."  That's not so smart; proposal 111 is a little better.
This proposal amounts to "Show up early, get a circuit.  Show up late,
get nothing."  How can we investigate whether this would actually
improve matters?

There's also a DoS problem to consider here: it doesn't cost much to
make a bunch of connections (possibly anonymously) to a node and say
"I'll need a big chunk of bandwidth."  If nodes don't overcommit, it's
pretty easy to make a node useless by allocating all its bw without
actually using any.

There are already better fair queueing algorithms than the one Tor is
using; could we improve throughput by switching to one of those instead?

Anyway, just a first round of thoughts.  If you want to do another
revision of your proposal, I'll check that in.  If you'd prefer to
just rename it and have me slap a number on it, I'll put it into svn
as is.

Also, I should really go through my to-do folder and see what else is
waiting for a number.

man. so busy.

peace,
-- 
Nick Mathewson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 652 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20070614/9d704829/attachment.pgp>


More information about the tor-dev mailing list