[tor-relays] General overload -> DNS timeouts

nusenu nusenu-lists at riseup.net
Sun Nov 7 10:43:50 UTC 2021


Hi,

Out of 447 exit relays that support the new overload system
(it was added in tor 0.4.6.x, which recently reached the Tor Project's Debian repo),
over 400 (minus those affected by an onionoo bug)
are overloaded by tor's definition. I'll therefore write some
general recommendations for the DNS timeout case, because many more operators
will want to solve similar issues in the near future.
General overload does not always imply a DNS issue,
but in your case it does.

Generally speaking, it is a bit unfortunate that the
new MetricsPort prometheus feature in tor is not available
in the same tor releases as the new overload design,
since one of the first recommendations for investigating
DNS timeouts is to enable MetricsPort to monitor DNS timeout rates.
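
As a rough illustration of what monitoring the timeout rate could look like,
here is a minimal Python sketch that reads Prometheus text-format output and
computes timeouts divided by queries. The two metric names and the "reason"
label are assumptions about what MetricsPort exports, and the sample numbers
are made up (chosen so the result matches the fraction in the quoted nyx
notice), so check them against your own MetricsPort output.

```python
import re

# ASSUMPTION: these metric names and the "reason" label are guesses at what
# tor's MetricsPort exports; verify them against your own output.
QUERY_METRIC = "tor_relay_exit_dns_query_total"
ERROR_METRIC = "tor_relay_exit_dns_error_total"

# One sample line: name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        yield name, dict(_LABEL.findall(raw_labels or "")), float(value)


def dns_timeout_fraction(text):
    """Return timeouts / queries, or None if no queries were counted."""
    queries = 0.0
    timeouts = 0.0
    for name, labels, value in parse_samples(text):
        if name == QUERY_METRIC:
            queries += value
        elif name == ERROR_METRIC and labels.get("reason") == "timeout":
            timeouts += value
    return timeouts / queries if queries else None


# Made-up sample data, not real relay output.
sample = """\
# TYPE tor_relay_exit_dns_query_total counter
tor_relay_exit_dns_query_total{record="A"} 300
tor_relay_exit_dns_query_total{record="PTR"} 107
tor_relay_exit_dns_error_total{record="A",reason="timeout"} 6
tor_relay_exit_dns_error_total{record="A",reason="nxdomain"} 40
"""
fraction = dns_timeout_fraction(sample)
print(f"DNS timeout fraction: {fraction:.4%}")
```

In practice the text would come from an HTTP request to whatever address you
configured for MetricsPort, and a Prometheus server would normally do this
scraping and graphing for you.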

Currently the best tor version to use MetricsPort with is 0.4.7.2-alpha,
since 0.4.7.1-alpha is affected by a bug in that area,
but due to another issue [1] there are no Debian/Ubuntu tor alpha packages
for 0.4.7.2-alpha on deb.torproject.org yet.

I would recommend upgrading to the alpha packages once they become available.

An exit relay can generate large numbers of DNS queries that the
configured resolvers need to handle. Your exit relay is still ramping up,
and so is its DNS query rate.
So if tor already sees timeouts now, the situation might become more
problematic as your exit gets more traffic.

How does your DNS resolution work on your exit relay?
Do you have a local recursive resolver running?
Do you have operational monitoring for it that shows you timeout rates?

The relay documentation has a short section about DNS on exits:
https://community.torproject.org/relay/setup/exit/#dns-on-exit-relays


The overload documentation also has a short section on DNS, but
since you are running on Linux, the default timeout (5s) in resolv.conf
is more than enough and I would not change it.
https://support.torproject.org/relay-operators/relay-bridge-overloaded/

[1] https://gitlab.torproject.org/tpo/core/tor/-/issues/40505

Intrepid Ibex via tor-relays:
> nyx is giving me a notice every 10 minutes or so: [NOTICE] General overload ->
> DNS timeouts (6) fraction 1.4742% is above threshold of 1.0000%
> 
> DNS on this machine however works perfectly. I told my tor browser to use my
> specific exit node, everything works fine.

A timeout rate of about 1% is likely hard to "see" when testing manually,
because it also depends on which queries you send (due to DNS caching),
but it still affects tor users of your exit.
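
To put a rough number on that, here is a small Python sketch. It assumes every
manual lookup fails independently with the same probability, which is a
simplification (caching makes real lookups anything but independent), and it
uses the fraction from the nyx notice quoted above.

```python
def chance_of_seeing_no_timeout(timeout_rate, lookups):
    """Probability that `lookups` independent queries all succeed.

    ASSUMPTION: each query times out independently with probability
    `timeout_rate`; real resolver behaviour is more correlated than this.
    """
    return (1.0 - timeout_rate) ** lookups


rate = 0.014742  # 1.4742%, the fraction from the quoted nyx notice
for n in (10, 20, 50, 100):
    print(n, round(chance_of_seeing_no_timeout(rate, n), 3))
```

Under that assumption, 20 manual lookups would all succeed with a probability
of roughly 0.74, so a handful of test requests in Tor Browser can easily look
fine while tor still counts timeouts above its threshold.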

That said, since the entire overload system is new in tor, it is good to have
some operational monitoring of the DNS resolver (which is ideally used only by
your tor daemon) that can confirm what tor reports.

Once you have MetricsPort set up and graphs for the data, you can try to adapt
the DNS settings or tune your resolver and see if things improve (timeout rate goes down).
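
When comparing before and after a change, it helps to look at the rate over an
interval rather than at totals since startup, because cumulative counters
dilute recent changes. A minimal Python sketch follows; the DnsCounters type
and all numbers are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsCounters:
    """Hypothetical snapshot of cumulative counters at one point in time."""
    queries: float
    timeouts: float


def interval_timeout_rate(earlier, later):
    """Timeout rate between two snapshots, or None if it cannot be computed."""
    new_queries = later.queries - earlier.queries
    new_timeouts = later.timeouts - earlier.timeouts
    if new_queries <= 0 or new_timeouts < 0:
        # No traffic in the interval, or the counters were reset
        # (for example because tor was restarted).
        return None
    return new_timeouts / new_queries


# Made-up snapshots: one interval before a resolver change, one after.
before = interval_timeout_rate(DnsCounters(1000, 10), DnsCounters(5000, 70))
after = interval_timeout_rate(DnsCounters(5000, 70), DnsCounters(9000, 90))
print(f"before: {before:.2%}  after: {after:.2%}")
```

A monitoring system that graphs counter rates does the same thing
continuously, which is what makes an improvement visible shortly after a
change.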

kind regards,
nusenu

-- 
https://nusenu.github.io

