[tor-relays] Metrics Error: staledesc

Roger Dingledine arma at torproject.org
Fri Jan 29 00:17:50 UTC 2021


On Thu, Jan 28, 2021 at 07:00:45PM +0100, lists at for-privacy.net wrote:
> Metrics showed my relay offline. But my Tor daemon is running normally.
> Then I saw _many_ relays suddenly have flag: staledesc
> ?
> 
> https://metrics.torproject.org/rs.html#search/flag:staledesc

Yep. That happens because the directory authorities are receiving too
many dirport connections from exit relays, while those same exit relays
also use a dirport connection to post their own descriptor.

So if we don't handle all of the dirport attempts, then we end up not
receiving some of the descriptor publish attempts.

I'm thinking that this part will still work out though, for two reasons.

One is that if *any* of the dir auths receive the descriptor, then they
will mention it in their next vote, and the other dir auths will learn
about it from that vote and ask for a copy.

And two is that relays watch to see if they are still listed in the
consensus, and if they're not then they try more often to upload a
new descriptor.
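
To make those two reasons concrete, here is a toy sketch in Python. It
is not Tor's code: the function names, the auth names, and the 1-hour
retry delay are all made up for illustration. Only the "18 hours" figure
comes from the publish schedule discussed in this thread.

```python
from typing import Dict, Set


def learn_from_votes(received: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Reason one: model a single voting round.

    `received` maps a dir auth name to the set of descriptor digests it
    got directly from relays. Every auth mentions what it has in its
    vote, and the others fetch whatever they are missing, so afterwards
    every auth holds the union.
    """
    mentioned = set().union(*received.values())
    return {auth: set(mentioned) for auth in received}


def next_upload_delay_h(listed_in_consensus: bool,
                        normal_h: float = 18.0,
                        retry_h: float = 1.0) -> float:
    """Reason two: a relay that is missing from the consensus retries sooner.

    retry_h is a hypothetical value, not Tor's real retry schedule.
    """
    return normal_h if listed_in_consensus else retry_h


if __name__ == "__main__":
    # Only one (hypothetical) auth managed to accept the upload.
    before = {"auth1": {"desc-A"}, "auth2": set(), "auth3": set()}
    print(learn_from_votes(before))
    print(next_upload_delay_h(listed_in_consensus=False))
```

The point of the first function is that a single successful upload is
enough, which is why having even one dir auth that reliably accepts
publishes goes a long way.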

So yes, we are making an effort to make sure there is at least one dir
auth that will be good at receiving descriptor publishes.

Some small fraction of relays are expected to get the StaleDesc flag in
normal network operation, because two behaviors interact badly: relays
publish a new descriptor "every 18 hours or when something important
changes", but dir auths ignore new descriptors that are too close in
time or other characteristics to one that they already have. So for
example there is a known bad interaction where you
restart your relay, and the relay publishes a new descriptor because
it doesn't know that it just published one earlier, but then the dir
auths discard that new descriptor because they already have the old one,
and then your relay waits 18 hours to create a new one.
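
The restart case can be walked through with a small toy model in Python.
This is a sketch under stated assumptions, not how the dir auths are
actually implemented: ToyDirAuth, MIN_ACCEPT_GAP_H (the "too close in
time" window, set to 2 hours here), and STALE_AFTER_H (the age at which
the auth's copy counts as stale, assumed equal to the 18-hour publish
interval) are names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PUBLISH_INTERVAL_H = 18.0  # "every 18 hours", from the text above
STALE_AFTER_H = 18.0       # assumption: auth's copy is stale past this age
MIN_ACCEPT_GAP_H = 2.0     # hypothetical "too close in time" window


@dataclass
class Descriptor:
    published_h: float  # publication time, in hours since the first publish
    contents: str       # stand-in for the "other characteristics"


class ToyDirAuth:
    """Keeps one descriptor and ignores near-duplicates of it."""

    def __init__(self) -> None:
        self.current: Optional[Descriptor] = None

    def offer(self, desc: Descriptor) -> bool:
        cur = self.current
        if (cur is not None
                and desc.contents == cur.contents
                and desc.published_h - cur.published_h < MIN_ACCEPT_GAP_H):
            return False  # too similar to what we already have: discard
        self.current = desc
        return True


def stale_window(restart_h: float) -> Optional[Tuple[float, float]]:
    """Return (start, end) hours during which the auth's copy is stale."""
    auth = ToyDirAuth()
    auth.offer(Descriptor(0.0, "same"))        # first publish, accepted
    auth.offer(Descriptor(restart_h, "same"))  # publish right after restart
    # The relay restarts its 18-hour timer whether or not the auth kept it.
    next_publish_h = restart_h + PUBLISH_INTERVAL_H
    stale_from_h = auth.current.published_h + STALE_AFTER_H
    if stale_from_h < next_publish_h:
        return (stale_from_h, next_publish_h)
    return None


if __name__ == "__main__":
    print(stale_window(1.0))  # restart one hour after the first publish
    print(stale_window(5.0))  # restart outside the "too close" window
```

In this model a restart one hour after the first publish leaves the auth
holding the hour-0 descriptor while the relay counts from hour 1, so the
auth's copy passes 18 hours of age one hour before the relay plans to
publish again. A restart outside the "too close" window is accepted and
no such gap appears.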

For much more backstory, see
https://gitlab.torproject.org/tpo/core/tor/-/issues/1810
https://gitlab.torproject.org/tpo/core/tor/-/issues/2479
https://gitlab.torproject.org/tpo/core/tor/-/issues/3327
https://gitweb.torproject.org/torspec.git/tree/proposals/293-know-when-to-publish.txt

But I guess the other way to look at it is: the StaleDesc flag is a
*feature*, to let your relay know that it has fallen into this edge case
so it can take steps to recover.

> https://metrics.torproject.org/rs.html#details/5D84900DBE6D6365684A9675B81A68ACE9577A68

This relay looks genuinely down.

--Roger


