[tor-relays] cases where relay overload can be a false positive

s7r s7r at sky-ip.org
Mon Jan 3 18:27:55 UTC 2022


Replying to myself:

s7r wrote:
[SNIP]
> 
> Metrics port says:
> 
> tor_relay_load_tcp_exhaustion_total 0
> 
> tor_relay_load_onionskins_total{type="tap",action="processed"} 52073
> tor_relay_load_onionskins_total{type="tap",action="dropped"} 0
> tor_relay_load_onionskins_total{type="fast",action="processed"} 0
> tor_relay_load_onionskins_total{type="fast",action="dropped"} 0
> tor_relay_load_onionskins_total{type="ntor",action="processed"} 8069522
> tor_relay_load_onionskins_total{type="ntor",action="dropped"} 273275
> 
> So if we compare the dropped ntor circuits against the processed ntor 
> circuits we end up with a reasonable % (>8 million processed vs. <300k 
> dropped).
> 
> So the question here is: does the computed consensus weight of a relay 
> change if that relay keeps sending reports to directory authorities that 
> it is being overloaded? If yes, could this be triggered by an attacker, 
> in order to arbitrarily decrease a relay's consensus weight even when it's 
> not really overloaded (to maybe increase the consensus weights of other 
> malicious relays that we don't know about)?
> 
> Also, as a side note, I think that if the dropped/processed ratio is not 
> over 15% or 20% a relay should not consider itself overloaded. Would 
> this be a good idea?
> 
> Sending this to tor-relays@ for now; if some of you have thoughts on it, 
> we can open a thread about it on tor-dev@ - please let me know if I 
> should do this.
> 

I am now positive that this particular relay is actively being probed: 
it is overloaded for just a few minutes every 2-4 days, and the rest of 
the time it performs just fine, with under 70% CPU usage and under 50% 
usage for RAM, SSD and bandwidth.

I can also confirm that after this latest overload report my consensus 
weight and advertised bandwidth decreased. So my concern stands: an 
overload report that can be triggered arbitrarily has a network-wide 
effect in terms of path selection probability, and might well suit 
someone's purpose.

I don't know what the gain is here or who is triggering this, nor 
whether other Guard relays are experiencing the same thing (maybe we can 
analyze onionoo datasets and find out), but until then I am switching to 
OverloadStatistics 0.
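
(In case anyone wants to do the same: as far as I know this is just a 
torrc change, and the option defaults to 1, so it has to be set 
explicitly and Tor reloaded:

OverloadStatistics 0

As far as I understand it, this only stops the relay from publishing the 
overload-general line in its extra-info descriptor; it does not change 
how the relay itself handles the load.)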


Here are today's Metrics Port results:

tor_relay_load_tcp_exhaustion_total 0

tor_relay_load_onionskins_total{type="tap",action="processed"} 62857
tor_relay_load_onionskins_total{type="tap",action="dropped"} 0
tor_relay_load_onionskins_total{type="fast",action="processed"} 0
tor_relay_load_onionskins_total{type="fast",action="dropped"} 0
tor_relay_load_onionskins_total{type="ntor",action="processed"} 10923543
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 819524

As you can see, just like in the first message of this thread, the 
calculated percentage of dropped vs. processed ntor onionskins is not a 
concern (over 10 million processed, under 900,000 dropped, i.e. roughly 
7.5%).
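
For reference, this is roughly how one can compute that percentage from 
the MetricsPort output. It's just a quick Python sketch, nothing 
official, and it assumes MetricsPort is reachable at 127.0.0.1:9035 with 
the counters served under /metrics - adjust to your own setup:

import re
import urllib.request

# Assumed setup (adjust to your torrc): MetricsPort 127.0.0.1:9035,
# MetricsPortPolicy accept 127.0.0.1. Tor serves the counters in
# Prometheus text format under /metrics.
METRICS_URL = "http://127.0.0.1:9035/metrics"
THRESHOLD = 0.20   # the 20% cutoff proposed earlier in this thread

def onionskin_count(text, action):
    # Pull a single ntor counter out of the MetricsPort output.
    m = re.search(r'tor_relay_load_onionskins_total\{type="ntor",'
                  r'action="%s"\} (\d+)' % action, text)
    return int(m.group(1)) if m else 0

metrics = urllib.request.urlopen(METRICS_URL).read().decode()
processed = onionskin_count(metrics, "processed")
dropped = onionskin_count(metrics, "dropped")
ratio = dropped / processed if processed else 0.0
print("ntor dropped/processed: %.2f%%" % (ratio * 100))
if ratio > THRESHOLD:
    print("would count as overloaded under the proposed 20% rule")

Something like this could run from cron and alert only when the ratio 
crosses the 20% line discussed above.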


Other relevant log messages that support my suspicion:
The following appeared while the relay was being hammered intentionally. 
As you can see, the overload only lasted 7 minutes; the previous 
overload lasted 5 minutes, and the one before that 6 minutes.

I think the attacker is saving resources, since it gets the same result 
by overloading the relay for 5 minutes as it would by overloading it 
24x7.

Jan 03 07:14:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [2004 similar message(s) suppressed in last 213900 seconds]
Jan 03 07:15:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [52050 similar message(s) suppressed in last 60 seconds]
Jan 03 07:16:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [92831 similar message(s) suppressed in last 60 seconds]
Jan 03 07:17:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [89226 similar message(s) suppressed in last 60 seconds]
Jan 03 07:18:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [74832 similar message(s) suppressed in last 60 seconds]
Jan 03 07:19:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [79933 similar message(s) suppressed in last 60 seconds]
Jan 03 07:20:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [68678 similar message(s) suppressed in last 60 seconds]
Jan 03 07:21:42.000 [warn] Your computer is too slow to handle this many 
circuit creation requests! Please consider using the 
MaxAdvertisedBandwidth config option or choosing a more restricted exit 
policy. [76461 similar message(s) suppressed in last 60 seconds]
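
Just to put a number on that burst: summing the suppressed-message 
counts from the 07:15-07:21 warnings above (and assuming each suppressed 
warning corresponds roughly to one dropped circuit creation request) 
gives on the order of 1,200-1,300 requests per second sustained for 
those 7 minutes:

# suppressed-message counts from the 07:15:42 - 07:21:42 warnings above
suppressed = [52050, 92831, 89226, 74832, 79933, 68678, 76461]
total = sum(suppressed)                 # 534,011 warnings in ~7 minutes
print(total, round(total / (7 * 60)))   # roughly 1,271 per second

That also lines up roughly with the growth of the dropped ntor counter 
between the two MetricsPort dumps in this thread (819524 - 273275 = 
546249), i.e. most of the dropped onionskins came from this one burst.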


Other stats from the log file:

14154 circuits open ; I've received 358682 connections on IPv4 and 14198 
on IPv6. I've made 185294 connections with IPv4 and 51900 with IPv6.

[notice] Heartbeat: DoS mitigation since startup: 1 circuits killed with 
too many cells, 27881 circuits rejected, 2 marked addresses, 0 same 
address concurrent connections rejected, 0 connections rejected, 0 
single hop clients refused, 0 INTRODUCE2 rejected.

[notice] Since our last heartbeat, 2878 circuits were closed because of 
unrecognized cells while we were the last hop. On average, each one was 
alive for 653.767547 seconds, and had 1.000000 unrecognized cells.

I have only started seeing this last message recently, but I now see it 
quite heavily (at every heartbeat); does anyone else see it?

My gut feeling, which has never let me down, tells me there's something 
here worth looking into. I want to analyze onionoo datasets to see 
whether the percentage of Guard relays reporting overload has increased 
in the last month, and to open an issue on GitLab proposing that Tor 
only report overload when the dropped/processed ratio is over 20%.
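
If anyone wants to beat me to the onionoo part, a possible starting 
point is below. It is only a sketch, and it assumes the details 
documents expose an overload_general_timestamp field for relays that 
recently reported overload (plus the usual flag/fields query 
parameters):

import json
import urllib.request

# Rough sketch: count how many relays with the Guard flag currently
# report general overload. Assumes the details documents expose the
# overload_general_timestamp field and support flag/fields parameters.
URL = ("https://onionoo.torproject.org/details?flag=Guard"
       "&fields=nickname,consensus_weight,overload_general_timestamp")

data = json.load(urllib.request.urlopen(URL))
relays = data.get("relays", [])
overloaded = [r for r in relays if "overload_general_timestamp" in r]
print("Guard relays: %d, reporting overload: %d (%.1f%%)"
      % (len(relays), len(overloaded),
         100.0 * len(overloaded) / max(len(relays), 1)))

Checking whether that percentage actually increased over the last month 
would need historical snapshots (e.g. from CollecTor) rather than this 
single current view.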