Proposal: Token Bucket

Wed Jan 19 22:14:30 UTC 2011

Dear Nick,

thanks a lot for the feedback.

On Mi, 2011-01-19 at 14:24 -0500, Nick Mathewson wrote:
> 2011/1/3 Florian Tschorsch <tschorsch at cs.uni-duesseldorf.de>:
>  [...]
> >  First, we observe that the token bucket for the relayed traffic on the
> >  outgoing connections is unnecessary: since no new such traffic is generated
> >  in an onion router, the rate of this traffic is already limited by the read
> >  bucket on the incoming side (cp. RelayedTokenBucket).
> 
> This is not strictly true, I think.  Outbound cells with no
> corresponding incoming can be generated by local directory requests,
> by local bandwidth testing, and probably a couple of other things too.

In the strict sense, you are of course right. However, the amount of
traffic caused by these functions is likely to be negligibly small (if I
remember that correctly, someone [Karsten Loesing?] investigated the
amount of directory traffic in some other context, and found that it
*is* negligible).

If incoming and outgoing bandwidth limits are equal, an increase in the
volume of traffic within the onion router, in fact, even constitutes an
even stronger argument for our proposed modifications: if you have a
system (here: the queues in the onion router) where you continuously
admit more data into the system (incoming bandwidth limit + additional
overhead and traffic) than you allow to leave the system (outgoing
bandwidth limit), you will necessarily build up longer and longer queues
within the system, which will in turn cause very long cell delays -
which is just what we currently observe in Tor. Therefore, limiting the
amount of traffic on both ends is simply not a good idea.

I really believe that this is a major issue which significantly (and
unnecessarily) limits the performance. Florian and I will be happy to
contribute a patch if everyone agrees.

> Also, when traffic arrives slowly on an edge connection, packaging it
> into cells adds significant overhead (since every outgoing cell is 512
> bytes+TLS overhead but not every outgoing cell is full).

One may of course argue that the (admittely in extreme cases
significant) packaging overhead should be accounted for when Tor
enforces the configured bandwidth limits. However, to be consistent, one
would then also have to account for TLS, TCP, and IP header overhead,
which is currently neglected. This overhead will also be very
significant if traffic arrives *that* slowly, i.e., if you have TCP
segments far smaller than a cell size.

This will certainly be interesting to assess in more detail. In case it
actually turns out to be an issue, I see two options:

i) the pragmatic solution: based on a measurement study how much the
traffic typically "grows" due to packaging overhead etc., reduce the
configured bandwidth limit by an appropriate percentage

ii) an "observe+adjust" mechanism: observe how much traffic is actually
leaving the router; if the outgoing traffic exceeds the configured
limit, reduce the rate at which data is read on the incoming side
accordingly

The latter, if done right, would be able to guarantee that configured
bandwidth limits are strictly enforced, despite additional traffic being
generated in the router, and in even in case of an arbitrarily high
packaging overhead. However, my feeling is that this is a too complex
solution for a very simple problem.

> >  We therefore propose
> >  to remove the rate limiting mechanism on the outgoing side. This will
> >  eliminate the "double door effect" discussed above, since all cells are
> >  allowed to flow freely out of the router once they passed the incoming rate
> >  limiter.
> >
> >  Second, the refill interval of the buckets should be shortened. The
> >  remaining token buckets should be refilled more often, with a
> >  correspondingly smaller amount of tokens. For instance, the buckets might
> >  be refilled every 10 milliseconds with one-hundredth of the amount of data
> >  admissible per second.
> 
> Smaller bucket refill intervals are already implemented in Tor master
> if you build with Libevent 2.0 and use the buffervents backend by
> passing --enable-bufferevents to configure.  The refill interval is
> currently set to 1/3 of a second, but that value was chosen more or
> less at random; it would be neat to see other values benchmarked as
> well.

We did some measurements on our onion router, and also many simulations.
1/3 second still seems far too coarse. One would have to come
significantly below the RTT of the TCP links between onion routers. We
therefore tend more towards something in the order of 1/100 s. (We did
not notice any difference in CPU utilization when we made this change -
so this apparently isn't an issue.)

Best regards

Björn