[tor-dev] DirAuth usage and 503 try again later

Roger Dingledine arma at torproject.org
Mon Jan 18 17:00:21 UTC 2021

On Sat, Jan 16, 2021 at 01:56:02AM +0300, James wrote:
> In any case, it seems to me that if there was some high-level description of
> logic for official tor client, it would be very useful.

Hi James! Thanks for starting this discussion.

While I was looking at moria1's directory activity during the overload,
I did say to myself "wow that's a lot of microdescriptor downloads".

So hearing that torpy isn't caching microdescriptors yet makes me think
that it's a good bet for explaining our overload last weekend.

I agree that we should have clearer docs for "how to be nice to the Tor
network." We actually have an open ticket for that goal but nobody has
worked on it in a while:

Quoting from that ticket:

"""Second, it's easy to make client-side decisions that harm the Tor
network. For examples, you can hold your TLS connections open too long,
or do too many TLS connections, or make circuits too often, or ask the
directory authorities for everything. We need to write up a spec to
clarify how well-behaving Tor clients should do things. Maybe that means
we write up some principles along the way, or maybe we just identify
every design point that matters and say what to do for each of them."""

And in fact, since Nick has been working a lot on Arti lately,
it might be a perfect time for him to help document the current Tor
behavior and the current Arti behavior, and we can think about where
there is room for improvement.

>  If you have
> some sort of statistic about increasing traffic we can compare that

Here's the most interesting graph so far:

So from that graph, the number of bytes handled by the directory
authorities doesn't go up a lot, because they were already rate limited
(instead, they just failed more often).

But the number of bytes handled by directory mirrors (including
fallbackdirs) shot up a huge amount. For context, if we imagine that
the normal Tor network handles between 2M and 8M daily users, then that
added dir mirror load would imply an extra 4M to 16M daily users if they
follow Tor's directory update habits. I'm guessing that the torpy users
weren't following Tor's directory update habits, and so a much smaller
set of users accounted for a much larger fraction of the load.

> >The logic that if a network_status document was already downloaded that
> >is used rather than trying to download a new one does not work.
> It works. But probably not in optimal way. It caches network_status only.

Here's my first start at three principles we should all follow when
writing Tor clients:

(1) Reduce redundant interactions. For example:

- Cache as much as possible of the directory information you fetch
(consensus documents, microdescriptors, certs)

- If a directory fetch failed, don't just relaunch a duplicate request
right after (because it will probably fail too).

- If your setup involves running multiple Tors locally, consider using a
shared directory cache, so only one of them needs to fetch new directory
info and then all of them can use it.
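
As a rough illustration of the first two points above, here is a minimal
Python sketch of an on-disk microdescriptor cache plus a retry delay that
grows after each failed fetch. The names MicrodescCache and retry_delay,
the hex-digest file naming, and the 60-second base and one-hour cap are
all assumptions made for the example; they are not taken from Tor, Arti,
or torpy.

```python
import random
from pathlib import Path


class MicrodescCache:
    """Hypothetical on-disk cache, one file per microdescriptor digest."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, digest):
        # Assumption: digests are hex strings, so they are safe as filenames.
        return self.directory / digest

    def get(self, digest):
        path = self._path(digest)
        return path.read_bytes() if path.exists() else None

    def put(self, digest, body):
        self._path(digest).write_bytes(body)

    def missing(self, digests):
        """Return only the digests that still need to be fetched."""
        return [d for d in digests if not self._path(d).exists()]


def retry_delay(failures, base=60.0, cap=3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed fetches.

    The ceiling doubles with each failure up to `cap`; the random jitter
    keeps many clients from retrying at the same moment. The numbers are
    illustrative, not Tor's real download schedule.
    """
    ceiling = min(cap, base * (2 ** min(failures, 16)))
    return rng.uniform(ceiling / 2, ceiling)
```

A client built this way would call missing() on the digests listed in a
fresh consensus and request only those, and would sleep for retry_delay(n)
seconds after its n-th consecutive failure instead of retrying right away.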

(2) Reduce impact of interactions. For example:

- Always use the "If-Modified-Since" header on consensus updates, so
the directory server doesn't send you a consensus that you already have.

- Try to use the consensus diff system, so if you have an existing
consensus you aren't fetching an entire new consensus.

- Ask for compression, to save overall bandwidth in the network.

- Move load off of directory authorities, and then off of fallback
directories, as soon as possible. That is, if you have a list of
fallbackdirs, ask them instead of directory authorities. And once
you have a consensus and you've chosen your directory guards, ask
them instead of the fallbackdirs.
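
The points in (2) could be sketched in Python roughly as follows. The
function names and parameters are hypothetical. The header name
"X-Or-Diff-From-Consensus" is my recollection of what dir-spec uses for
consensus diffs and should be checked against the spec, as should the
choice of timestamp for "If-Modified-Since" (the sketch simply takes
whatever time the caller associates with its cached consensus).

```python
from email.utils import formatdate


def consensus_request_headers(cached_time=None, cached_digest_hex=None):
    """Build HTTP headers for a consensus fetch.

    cached_time: Unix timestamp tied to the consensus we already hold
        (assumption: the caller decides which timestamp to use).
    cached_digest_hex: hex digest of the cached consensus, in whatever
        form the diff system expects (computed elsewhere).
    """
    # Ask for compression; deflate and gzip can be decoded with zlib.
    headers = {"Accept-Encoding": "deflate, gzip"}
    if cached_time is not None:
        # HTTP-date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        headers["If-Modified-Since"] = formatdate(cached_time, usegmt=True)
    if cached_digest_hex is not None:
        # Header name assumed from dir-spec; verify before relying on it.
        headers["X-Or-Diff-From-Consensus"] = cached_digest_hex
    return headers


def pick_directory_sources(directory_guards, fallback_dirs, authorities,
                           have_consensus):
    """Prefer the least-loaded tier that is available to us."""
    if have_consensus and directory_guards:
        return directory_guards
    if fallback_dirs:
        return fallback_dirs
    # Last resort only: the directory authorities.
    return authorities
```

The ordering in pick_directory_sources mirrors the last bullet: the
authorities are only contacted when nothing else is known, and the
fallbackdirs only until directory guards have been chosen.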

(3) Plan ahead for what your current code will do in a few years when
the world is different.

- To start here, check out the "slow zombies and fast zombies" discussion
in Proposal 266:

- Specifically, think about how your code handles failures, and design
your interactions with the Tor network so that if many people are running
your code in the future, and it's failing (for example, because it is
asking directory questions in an old format, or because the directory
servers have started rate limiting differently), it will back off rather
than become more aggressive.

- When possible, look for ways to recognize when your code is asking old
questions, so it can warn the user and stop interacting with the network.
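
One possible shape for the last two points, as a hedged Python sketch: a
small gate object that the client consults before every directory fetch.
The class name, the failure threshold, and the looks_obsolete flag are
all hypothetical; how a client would actually detect that it is "asking
old questions" is left open here, and a real client would also space out
its retries with increasing delays.

```python
import logging

log = logging.getLogger("dirclient")


class DirectoryFetchGate:
    """Hypothetical gate that stops directory fetches once they look futile."""

    def __init__(self, max_consecutive_failures=8):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.stopped_reason = None

    def may_fetch(self):
        """True while the client is still allowed to contact directories."""
        return self.stopped_reason is None

    def record_success(self):
        self.consecutive_failures = 0

    def record_failure(self, looks_obsolete=False):
        """Count a failure; stop for good if we seem to be out of date."""
        self.consecutive_failures += 1
        if looks_obsolete:
            self._stop("directory servers no longer understand our requests")
        elif self.consecutive_failures >= self.max_consecutive_failures:
            self._stop("too many consecutive directory failures")

    def _stop(self, reason):
        if self.stopped_reason is None:
            self.stopped_reason = reason
            # Warn the user once instead of retrying forever.
            log.warning("Stopping directory fetches: %s", reason)
```

The design choice is that failure makes the client quieter, never louder:
once the gate is closed, only user intervention (an upgrade or a restart)
reopens it.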

...What else should be on the list?

