[tor-relays] exit operators: overall DNS failure rate above 5% - please check your DNS resolver

nusenu nusenu-lists at riseup.net
Tue Jul 2 13:35:00 UTC 2019


Tim Niemeyer:
> Maybe it is a load problem, because this machine has 100% cpu load? :(

Generally speaking, running a relay at 100% of its hardware resources all the time
will not make users happy, and we should optimize for a smooth Tor Browser
experience rather than for high bandwidth or hardware resource usage.

I don't think we have to worry about an exit failing 10% of DNS queries for a single day.

Single operators running a significant exit share (>0.5% exit probability)
whose exits fail DNS queries at a high rate (>10%) consistently over multiple
days are more relevant.
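The rule of thumb above can be expressed as a small check. This is a minimal sketch, not a tool the author uses: it assumes per-operator daily records (combined exit probability and DNS failure rate, both as fractions, oldest first), and it interprets "consistently over multiple days" as a hypothetical minimum of three consecutive days.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class OperatorDay:
    exit_probability: float  # operator's combined exit probability, 0.006 == 0.6%
    dns_failure_rate: float  # fraction of DNS queries that failed on that day

def needs_attention(history: List[OperatorDay],
                    min_exit_probability: float = 0.005,  # >0.5% exit probability
                    min_failure_rate: float = 0.10,       # >10% failed DNS queries
                    min_days: int = 3) -> bool:           # hypothetical "multiple days"
    """True if both thresholds were exceeded on at least `min_days`
    consecutive days (history is ordered oldest first)."""
    streak = 0
    for day in history:
        if (day.exit_probability > min_exit_probability
                and day.dns_failure_rate > min_failure_rate):
            streak += 1
            if streak >= min_days:
                return True
        else:
            streak = 0  # a good day breaks the streak
    return False

print(needs_attention([OperatorDay(0.01, 0.12)]))      # False: a single bad day
print(needs_attention([OperatorDay(0.01, 0.12)] * 3))  # True
```

A single day above 10% does not trigger the check, and neither does a small operator failing at a high rate, which matches the priorities described above.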

Since I don't currently see your exits showing up as failing, the remainder
of this email is not necessarily directed at you but is meant more for the general
record.

> A dedicated machine for dns may be good, but currently we have only
> this one machine.

I actually believe in running DNS resolvers locally to keep paths short.
The resources required by the resolver must be taken into account when
planning the capacity of the entire server. The resolver can also require
a decent amount of CPU time on fast exits.

In very constrained environments it might still make sense to run DNS resolvers
non-locally (while not using a resolver too far away),
since DNS resolvers for exits can also run where exits might not be welcome.

Using a non-local resolver is obviously still better than using a local resolver that
cannot keep up with the load.


> Another way could be to reduce exit capacity, but I
> don't know if it's a good idea to throttle it?


With the goal of having happy users (low-latency, reliable exits):

On a single server with multiple cores and >1 Gbit/s connectivity
(a server not limited by uplink bandwidth or memory) I'd suggest:

1) Determine your CPU's single-thread performance:
measure the peak bandwidth of Tor traffic it can handle for a given
exit policy when running a single instance with no bandwidth limits. Take
ramp-up time into account; exits also need time to ramp up.
(Use measured data, not advertised bandwidth; the two can be far apart.)

2) Determine how many DNS queries per second (QPS) that single Tor exit instance
generates and the resulting resolver CPU load (peak values after 1-2 weeks of operation).

3) Run as many instances as you have cores minus one, and set the bandwidth limit
(RelayBandwidthRate) in your torrc to ~80% of the peak value from (1),
while ensuring that there is enough spare capacity for the resolver and
the OS itself.
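The arithmetic in steps 1-3 can be sketched as follows. The names `plan_exits` and `ExitPlan` are hypothetical, the example inputs (8 cores, 250 Mbit/s, 400 QPS) are made up, and multiplying the single-instance DNS QPS by the number of instances is an assumption that only gives a rough upper bound, since that QPS was measured without a bandwidth limit.

```python
from dataclasses import dataclass

@dataclass
class ExitPlan:
    instances: int
    rate_mbits: int             # per-instance RelayBandwidthRate
    dns_qps_upper_bound: float  # rough resolver load estimate for all instances

    def torrc_line(self) -> str:
        # Goes into each instance's torrc; tor accepts bit units such as "MBits".
        return f"RelayBandwidthRate {self.rate_mbits} MBits"

def plan_exits(cores: int, peak_mbits: float, peak_dns_qps: float,
               limit_fraction: float = 0.8) -> ExitPlan:
    """peak_mbits / peak_dns_qps: single-instance peaks measured in steps 1 and 2."""
    if cores < 2:
        raise ValueError("need at least 2 cores to leave capacity for the resolver and OS")
    instances = cores - 1                      # cores minus one
    rate = round(peak_mbits * limit_fraction)  # ~80% of the measured peak
    # peak_dns_qps was measured without a bandwidth limit, so this overestimates
    return ExitPlan(instances, rate, instances * peak_dns_qps)

plan = plan_exits(cores=8, peak_mbits=250, peak_dns_qps=400)
print(plan.instances, plan.torrc_line(), plan.dns_qps_upper_bound)
# 7 RelayBandwidthRate 200 MBits 2800
```

The resulting QPS estimate is what the local resolver has to absorb on top of the exits, which is why the spare core matters.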

Optimize your resolver's performance and cache hit rate by experimenting with
the cache size and the number of threads.
Example for unbound:
https://nlnetlabs.nl/documentation/unbound/howto-optimise/
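To see whether such tuning helps, you could compare two snapshots of unbound's counters, for example taken with `unbound-control stats_noreset` some minutes apart (that variant does not reset the counters). This sketch assumes the `key=value` output format and the `time.now`, `total.num.queries` and `total.num.cachehits` keys; the sample numbers are made up.

```python
from typing import Dict

def parse_unbound_stats(text: str) -> Dict[str, float]:
    """Parse `key=value` lines as printed by unbound-control stats."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without '='
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            continue  # ignore non-numeric values
    return stats

def resolver_summary(before: str, after: str) -> Dict[str, float]:
    """Average QPS and cache hit rate between two stats snapshots."""
    a = parse_unbound_stats(before)
    b = parse_unbound_stats(after)
    elapsed = b["time.now"] - a["time.now"]
    if elapsed <= 0:
        raise ValueError("the 'after' snapshot must be taken later than 'before'")
    queries = b["total.num.queries"] - a["total.num.queries"]
    hits = b["total.num.cachehits"] - a["total.num.cachehits"]
    return {
        "qps": queries / elapsed,
        "cache_hit_rate": hits / queries if queries else 0.0,
    }

before = "time.now=1000.0\ntotal.num.queries=10000\ntotal.num.cachehits=7000\n"
after = "time.now=1060.0\ntotal.num.queries=16000\ntotal.num.cachehits=11800\n"
print(resolver_summary(before, after))  # {'qps': 100.0, 'cache_hit_rate': 0.8}
```

Using the non-resetting variant keeps the counters intact for any other monitoring that reads them.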

 
> Btw, in the mean time we got more upstream transit and now we are
> looking to get better / second hardware. But money is a limiting
> factor. :(

Maybe it helps to clearly communicate that you could easily provide X Gbit/s
of exit capacity if only you had the necessary hardware,
and to tell people where to enter their credit card details if they want
to see that happen ;)




-- 
https://twitter.com/nusenu_
https://mastodon.social/@nusenu
