[tor-dev] DirAuth usage and 503 try again later

Sebastian Hahn hahn.seb at web.de
Tue Jan 12 00:53:32 UTC 2021



> On 11. Jan 2021, at 23:20, James <jbrown299 at yandex.com> wrote:
> 
> Good day.
> 
> Is there any chance that torpy (https://github.com/torpyorg/torpy) was triggered this issue https://gitlab.torproject.org/tpo/core/tor/-/issues/33018 ?
> 
> Some worrying facts:
> - Torpy uses the old-style full consensus (not microdescs)
> - When the consensus is not present in the cache (first-time use), it downloads the consensus from random directory authorities only.
> - Before August 2020 it used plain HTTP requests to the DirAuths. Now it creates "CREATE_FAST" circuits to the DirAuths (is that the right way, by the way?)
> 
> On the other hand:
> - Torpy stores the consensus on disk (so when the client restarts it should not need to download the full consensus again)
> - It only tries to download a new consensus after the time set by the valid_time field of the consensus, which is more than an hour away (so it's not that often)
> - Torpy tries to get the consensus via the "diff" feature (to minimize traffic)
> 
> Still, maybe some of these features don't work well under some conditions, which could cause a lot of consensus downloads in Jan 2020... Or maybe you know more about this situation?

Hi there,

thanks for the message. I think it is very likely that torpy is
responsible for at least part of the increased load we're seeing on
the dirauths. I have taken a (very!) quick look at the source, and it
appears that there are some problems. Please excuse any inaccuracies;
I am not that strong in Python, nor have I done much Tor development
recently:

First, I found this string in the code: "Hardcoded into each Tor client
is the information about 10 beefy Tor nodes run by trusted volunteers".
The word beefy is definitely wrong here. The nodes are not particularly
powerful, which is why we have the fallback dir design for
bootstrapping.

The code counts Serge as a directory authority that signs the
consensus, and checks that over half of the dirauths signed it. But
Serge is only the bridge authority and never signs the consensus, so
torpy will reject some consensuses that are in fact valid. Once this
happens, torpy goes into an endless loop of "consensus invalid,
trying again". There are no timeouts or backoffs, and failures are
never recorded.

The code frequently throws exceptions, but when an exception occurs
it just continues doing what it was doing before. It makes absolutely
no effort to constrain its resource usage on the Tor network.

The logic of reusing an already-downloaded network_status document
rather than downloading a new one does not work: I have a
network_status document, yet the dirauths are contacted anyway.
Perhaps descriptors are not cached to disk and are re-downloaded on
every start of the application?
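A disk cache that honors the document's lifetime would avoid those
refetches entirely. A hypothetical sketch (the file names and the way
the expiry is stored are assumptions for illustration, not how torpy
actually persists things):

```python
import time

CACHE_PATH = "network_status.cache"  # hypothetical cache location

def load_cached_consensus():
    """Return the cached consensus text, or None if missing/expired."""
    try:
        with open(CACHE_PATH + ".valid_until") as f:
            valid_until = float(f.read().strip())
        if time.time() >= valid_until:
            return None          # expired: a fresh download is justified
        with open(CACHE_PATH) as f:
            return f.read()      # still fresh: do NOT contact the dirauths
    except (OSError, ValueError):
        return None              # no usable cache yet

def store_consensus(text, valid_until):
    """Persist the consensus and its expiry so restarts can reuse it."""
    with open(CACHE_PATH, "w") as f:
        f.write(text)
    with open(CACHE_PATH + ".valid_until", "w") as f:
        f.write(str(valid_until))
```

With a check like this at startup, a freshly launched client would
touch the network only after the cached document actually expires.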

New consensuses never seem to be downloaded from guards, only from
dirauths.

If my analysis above is at least mostly correct, then even a few
people running a scraper that calls the torpy binary in a loop will
quickly overload the dirauths, causing exactly the trouble we're
seeing. The effects compound, because torpy is relentless in
retrying. A scraper calling torpy in a loop would just conclude that
a single file failed to download and move on to the next, once again
creating load on all the dirauths.

There are probably more suboptimal things that I missed here.
Generally, I think torpy needs to implement the following quickly if
it wants to stop hurting the network. This is in order of priority,
but I think _ALL_ of these (and maybe more) are needed before torpy
stops being an abuser of the network:

- Stop automatically retrying on failure without backoff.
- Cache failures to disk to ensure a newly started torpy_cli does not
  re-request the resources the previous instance failed to get.
- Fix the consensus validation logic to work the same way as the tor
  client (maybe as easy as removing Serge).
- Use the microdesc consensus and microdescriptors, and cache
  descriptors.
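On the first point, even a minimal capped exponential backoff would
help enormously. A sketch of the idea (the fetch callable is a
placeholder for whatever download torpy performs):

```python
import random
import time

def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0):
    """Retry `fetch` with capped exponential backoff and jitter.

    Gives up after max_attempts instead of hammering the dirauths
    forever the way an unbounded retry loop would.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up: record the failure, do not keep looping
            delay = base_delay * (2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter de-syncs clients
```

The jitter matters: without it, many clients that failed at the same
moment would all retry at the same moment, too.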

I wonder if we can actively defend against network abuse like this in
a sensible way. Perhaps you have some ideas, too? I think torpy could
also quickly overwhelm the fallback dirs in its current
implementation, so simply switching it from the dirauths to them is
not a solution here. Defenses are probably necessary even if torpy
can be fixed very quickly, because older versions of torpy are out
there and I assume will continue to be used. Hopefully that
assumption is wrong?

Thanks
Sebastian


