[tor-relays] cases where relay overload can be a false positive
s7r at sky-ip.org
Sun Jan 23 17:28:01 UTC 2022
Mike Perry wrote:
>> Correct, this is a possibility indeed. I'm not entirely certain that this is
>> the case at the moment as sbws (bandwidth authority software) might not be
>> downgrading the bandwidth weights just yet.
>> But regardless, the point is that it is where we are going to. But we have
>> control over this so now is a good time to notice these problems and act.
>> I'll try to get back to you asap after talking with the network team.
> My thinking is that sbws would avoid reducing weight of a relay that is
> overloaded until it sees a series of these overload lines, with fresh
> timestamps. For example, just one with a timestamp that never updates
> again could be tracked but not reacted to, until the timestamp changes N
> times.
> We can (and should) also have logic that prevents sbws from demoting the
> capacity of a Guard relay so much that it loses the Guard flag, so DoS
> attacks can't easily cause clients to abandon a Guard, unless it goes
> entirely down.
> Both of these things can be done on the sbws side. This would not stop
> short blips of overload from still being reported on the metrics portal,
> but maybe we want to keep that property.
I agree with this - sbws should see the overload reports often and with
some kind of continuity before reducing the weight of a relay, not
just "bursts" like the ones I was experiencing (maximum 3-5 minutes of
overload every 2-3 days).
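As a rough illustration of the "continuity" idea, here is a minimal Python sketch. It is not sbws code: the class name `OverloadTracker`, the default of 3 distinct reports, and the 3-day window are all assumptions made up for this example. It only shows that a single overload timestamp that never updates can be tracked without being acted on.

```python
from datetime import datetime, timedelta, timezone


class OverloadTracker:
    """Track distinct overload timestamps seen for one relay.

    Hypothetical helper for illustration only; not part of sbws.
    """

    def __init__(self, min_reports=3, window=timedelta(days=3)):
        self.min_reports = min_reports  # the "N" from the thread; value is a guess
        self.window = window            # how far back a report still counts; a guess
        self.seen = set()               # distinct overload timestamps

    def record(self, overload_ts):
        # A timestamp that never changes is stored once, so re-reading the
        # same overload line does not count as a new report.
        self.seen.add(overload_ts)

    def should_reduce_weight(self, now):
        # Only react once enough *fresh, distinct* timestamps have been seen.
        recent = [ts for ts in self.seen if now - ts <= self.window]
        return len(recent) >= self.min_reports


now = datetime(2022, 1, 23, 18, 0, tzinfo=timezone.utc)
tracker = OverloadTracker()

# The same stale timestamp observed ten times: tracked, but not reacted to.
stale = now - timedelta(hours=5)
for _ in range(10):
    tracker.record(stale)
single_burst = tracker.should_reduce_weight(now)

# Three more distinct timestamps: the overload now looks ongoing.
for hours_ago in (1, 2, 3):
    tracker.record(now - timedelta(hours=hours_ago))
ongoing = tracker.should_reduce_weight(now)

print(single_burst, ongoing)
```

With these made-up values, a burst like the ones described above (one short overload every few days) would never reach the threshold on its own.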
After switching to OverloadStatistics 0 the consensus weight is back to
normal, to what the relay can take with no effort, and the overload
bursts are so rare now (just one in 17 days).
The metrics port values are looking good as well. The percentage of
dropped ntors as opposed to processed ntors is very good (under 5%).
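For operators who want to check this number themselves, a small Python sketch follows. It assumes the MetricsPort output contains `tor_relay_load_onionskins_total` lines with `type` and `action` labels in the form shown; the sample text and the function name `ntor_drop_percent` are illustrative, so compare against what your own relay actually prints before relying on it.

```python
import re

# Illustrative sample, not real relay data. The exact line format is an
# assumption about the MetricsPort output.
SAMPLE = """\
tor_relay_load_onionskins_total{type="ntor",action="processed"} 100000
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 2500
"""

LINE_RE = re.compile(
    r'^tor_relay_load_onionskins_total\{type="ntor",action="(\w+)"\}\s+(\d+)\s*$'
)


def ntor_drop_percent(metrics_text):
    """Return dropped ntors as a percentage of processed ntors."""
    counts = {"processed": 0, "dropped": 0}
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(1) in counts:
            counts[match.group(1)] = int(match.group(2))
    if counts["processed"] == 0:
        return 0.0  # nothing processed yet, so no meaningful ratio
    return 100.0 * counts["dropped"] / counts["processed"]


percent = ntor_drop_percent(SAMPLE)
print(f"{percent:.2f}% of ntors dropped relative to processed")
```

This uses the dropped/processed ratio as discussed in this thread; dividing by the total (dropped plus processed) would be an equally reasonable choice and gives a slightly smaller number.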
I am not sure what the best approach for the metrics portal is, but I
think it's easier there to document what to look out for, and when the
relay operator should consider this a problem and when not.
>>> Also, as a side note, I think that if the dropped/processed ratio is not
>>> over 15% or 20% a relay should not consider itself overloaded. Would this be
>>> a good idea?
>> Plausible that it could be a better idea! Unclear what an optimal percentage is
>> but, personally, I'm leaning towards that we need a higher threshold so they are
>> not triggered in normal circumstances.
>> But I think if we raise this to 20% let's say, it might not stop an attacker
>> from triggering it. It might just make it take a bit longer.
> Hrmm. Parameterizing this threshold as a consensus parameter might be a
> good idea. I think that if we can make it such that an attack has to be
> "severe" and "ongoing" long enough such that a relay has lost capacity
> and/or lost the ability to complete circuits, and that relay can't do
> anything about it, that relay unfortunately should not be used as much.
> It's not like the circuit will be likely to succeed or be fast enough to
> use in that case anyway.
> We need better DoS defenses generally :/
Of course we need better defense, DoS is never actually fixed, no matter
what we do. It's just an arms race the way I see it. But if we reduce
the consensus weight or assume at network level that relay X should be
used less because of a super tiny percentage of dropped circuits, we could
end up wasting network resources on one side, and on the other side
maybe giving evil relays that we have not discovered yet a better chance
to grab circuits. A consensus parameter is of course
appropriate here, maybe 20% is a big threshold and should be less, but
right now even 0.1% is reported and treated as overload, IMO this is not
Thanks for looking into this David & Mike.
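To make the threshold discussion concrete, here is a minimal Python sketch of a parameterized check. The function name `is_overloaded` and the `threshold_percent` argument are hypothetical; no such consensus parameter is confirmed to exist, and the 20% figure is only the value floated in this thread. A threshold of 0 stands in for the current behaviour where any dropped ntor counts as overload.

```python
def is_overloaded(processed, dropped, threshold_percent):
    """Report overload only if dropped/processed exceeds the threshold.

    threshold_percent stands in for a hypothetical consensus parameter.
    """
    if processed == 0:
        # No processed ntors: any drop at all means the relay is not coping.
        return dropped > 0
    return 100.0 * dropped / processed > threshold_percent


# 0.1% dropped: flagged when any drop counts, ignored at a 20% threshold.
tiny_strict = is_overloaded(100000, 100, threshold_percent=0.0)
tiny_relaxed = is_overloaded(100000, 100, threshold_percent=20.0)

# 25% dropped: flagged even with the relaxed threshold.
severe_relaxed = is_overloaded(100000, 25000, threshold_percent=20.0)

print(tiny_strict, tiny_relaxed, severe_relaxed)
```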