Re: [tor-relays] Flooding of unbound via resolve attempts

10 Mar 2022

      On Thu, Mar 10, 2022 at 08:33:07AM +0000, Georg Koppen wrote:
...
Hello!
As you might know we are doing regular (at the moment weekly) scans of 
exit nodes to find and help with misconfigurations or errors that have 
potentially serious effects for Tor network usability and performance. 
The results we got so far after over a year of scanning are roughly 
single digit numbers of exit relays per week having mostly DNS 
configuration issues (unbound crashed, etc.).
However, this week we suddenly found almost 80 exit relays with 
malfunctioning DNS resolution[1] which was surprising. Additionally, 
after some of the servers got fixed the issue returned. DrWhax (thanks!) 
pointed us to a possible explanation tweeted by the unredacted folks:
https://twitter.com/unredacted_org/status/1501458345219215363
It seems that someone (intentionally or not) is overwhelming unbound 
leading to DNS resolution issues for those exit operators that do run 
this local resolver, which we currently recommend.
I find it interesting that it is possible to crash/DoS unbound through
Tor circuits to an exit relay. I would have assumed other factors
would limit before unbound would. They posted some CPU graphs on the
Twitter page, but it would have been interesting to see some
requests/s numbers if someone has any to share.
...
We've opened a ticket[2] for further investigation, but I hope this 
email raises some awareness so that exit operators can keep an eye on 
the situation.
Feel free to add insights you have to the ticket. Additionally, I bet if 
someone would share how they do monitoring for such a problem on their 
exits then a lot of exit operators would be happily picking up that 
setup and the Tor network would win. :)
I'm using Grafana + Prometheus + node_exporter to monitor my relays.
Grafana is a web UI for visualising data, Prometheus is a data
collector that scrapes data from node_exporter and stores it for
Grafana to fetch. node_exporter is a service that collects and
presents a bunch of data in the same format as the new Tor metrics
function.
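
For anyone who has not looked at that format before, here is a minimal
Python sketch of reading it. It assumes the plain Prometheus text
exposition format without trailing timestamps; the sample text is made
up for illustration and is not output from a real relay, and
parse_metrics is a hypothetical helper name.

```python
# Illustrative sample only; real output has many more series.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_network_receive_bytes_total{device="eth0"} 1.5e+09
"""


def parse_metrics(text):
    """Parse exposition-format lines into a {series: value} dict.

    Assumption: no optional timestamp follows the value, so the value
    is always the last space-separated field on the line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
```

Prometheus does this parsing itself when it scrapes, so the sketch is
only meant to show what the exporters hand over.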

(When I eventually get Tor daemons recent enough to get anything but
emptiness out of the metrics port, I'll add them to Prometheus for
scraping as well.)

Grafana is great and one can build dashboards that show pertinent
information and give a good overview. It is also possible to configure
alerts if metrics go outside of specified bounds. I have alerts
configured to mail me for a few statistics.
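
The idea behind such a bounds alert can be sketched in a few lines of
Python. This is not how Grafana implements alerting; it is only an
illustration, and the function name, the bounds and the
"three samples in a row" rule are assumptions made up for the example.

```python
def breaches(samples, low, high, min_consecutive=3):
    """Return True if at least `min_consecutive` samples in a row
    fall outside the closed interval [low, high]."""
    run = 0
    for value in samples:
        if value < low or value > high:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            # Back inside the bounds: a single spike does not alert.
            run = 0
    return False


# Hypothetical CPU usage readings in percent, upper bound 90.
sustained = breaches([10, 95, 97, 99, 20], low=0, high=90)
spiky = breaches([10, 95, 20, 97, 30], low=0, high=90)
```

Requiring several consecutive out-of-bounds samples is a common way to
avoid being mailed for every short spike.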

When it comes to unbound monitoring, I use unbound_exporter from the
letsencrypt project on GitHub[3]. It works the same way node_exporter
does, but exports unbound metrics and can be scraped by Prometheus. To
visualise the data, I use a pre-made dashboard for Grafana[4] that I
have tweaked a bit.
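
As for the requests/s numbers asked about earlier in the thread, they
can be derived from two readings of a monotonically increasing query
counter. Below is a minimal Python sketch of that arithmetic. The
function name and the sample numbers are hypothetical, and the reset
handling is a simplification of what a monitoring system would do.

```python
def per_second_rate(prev, curr, elapsed_seconds):
    """Average per-second rate between two counter readings.

    Assumption: the counter only goes down when the resolver has been
    restarted, in which case it is treated as having started from 0.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr - prev
    if delta < 0:
        # Counter reset between the two readings.
        delta = curr
    return delta / elapsed_seconds


# Two made-up readings taken 60 seconds apart.
normal = per_second_rate(1000, 1600, 60)
after_restart = per_second_rate(5000, 300, 60)
```

With the made-up readings above, 600 queries over 60 seconds gives 10
queries per second.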

Cordially,
Andreas Kempe
...
[1] https://gitlab.torproject.org/tpo/network-health/team/-/issues/197
[2] https://gitlab.torproject.org/tpo/network-health/analysis/-/issues/30
[3] https://github.com/letsencrypt/unbound_exporter
[4] https://grafana.com/grafana/dashboards/9604