[tor-dev] Understanding the guard/md issue (#21969)

teor teor2345 at gmail.com
Mon Oct 30 12:31:41 UTC 2017



-- 
Tim / teor

PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------

> On 30 Oct 2017, at 22:30, George Kadianakis <desnacked at riseup.net> wrote:
> 
> teor <teor2345 at gmail.com> writes:
> 
>>> On 29 Oct 2017, at 01:19, George Kadianakis <desnacked at riseup.net> wrote:
>>> 
>>> Hey Tim,
>>> 
>>> just wanted to ask a clarifying question wrt #21969.
>>> 
>>> First of all there are various forms of #21969 (aka the "missing
>>> descriptors for some of our primary entry guards" issue). Sometimes it
>>> occurs for 10 mins and then goes away, whereas for other people it
>>> disables their service permanently (until restart). I call this the
>>> hardcore case of #21969. It has happened to me and disabled my service
>>> for days, and I've also seen it happen to other people (e.g. dgoulet).
>>> 
>>> So. We have found various md-related bugs and put them as children of
>>> #21969. Do you think we have found the bugs that can cause the hardcore
>>> case of #21969? That is, is any of these bugs (or a bug combo) capable
>>> of permanently disabling an onion service?
>> 
>> Yes, this bug is disabling:
>> 
> 
> Thanks for the reply, Tim.
> 
>> #23862, where we don't update guard state unless we have enough
>> directory info.
>> 
>> When tor gets in a state where it doesn't have enough directory info
>> due to another bug, this makes sure it will never get out of that state.
>> Because it will never mark its directory guards as up when it gets a
>> new consensus, and therefore it will never fetch microdescs, find out
>> it has enough directory info, and build circuits.
>> 
> 
> Hmm, just want to make sure I get this.
> 
> My understanding with #23862 is that Tor would never mark its directory
> guards as up like you say, but it _would still_ fetch microdescs using
> fallback directories because of the way
> directory_pick_generic_dirserver() works. Isn't that the case?

No, because we're not actually marking those guards as down (#23863).
I think we might be putting them in a partly usable guard state instead.
(Fallbacks are only used when all directory guards or mirrors are down.)
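
To make the fallback rule concrete, here is a minimal sketch of the
selection logic as I described it above. It is an illustration only, not
tor's actual code: the GuardState names and pick_directory_source() are
hypothetical, and the real guard state machine has more states.

```python
from enum import Enum


class GuardState(Enum):
    # Hypothetical state names, for illustration only.
    UP = "up"
    PARTLY_USABLE = "partly usable"
    DOWN = "down"


def pick_directory_source(guard_states):
    """Return "fallback" only when every directory guard is marked DOWN.

    Any guard that is not DOWN (including a partly usable one) keeps us
    asking guards, so fallbacks are never tried.
    """
    if all(state is GuardState.DOWN for state in guard_states):
        return "fallback"
    return "guard"


# Guards that were never marked down keep fallbacks out of reach.
stuck = [GuardState.PARTLY_USABLE, GuardState.PARTLY_USABLE, GuardState.DOWN]
all_down = [GuardState.DOWN, GuardState.DOWN, GuardState.DOWN]

print(pick_directory_source(stuck))     # guard
print(pick_directory_source(all_down))  # fallback
```

Under this assumption, a single guard left in a not-quite-down state is
enough to stop tor from ever falling back.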

And so we keep trying guards until we back off for a long time (#23817).

As a result, some of our microdescs start expiring or change in the
consensus, which triggers #23862 (fixed in 0.3.2.3-alpha).

And we don't reset the right download state when we get an application
request (#23620), which makes it hard for tor to recover from this bug.

I might be wrong or missing a few of the details, but those bugs are
enough to cause the issues we're seeing. Hopefully we can find any
remaining bugs when we fix these ones.

This set of bugs is actually better than the alternative, which is clients
retrying too fast and DDoSing relays. If we hadn't implemented exponential
backoff, these retries could have caused either a slow or fast DDoS.
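
As a rough illustration of the tradeoff, here is a small sketch that
compares fixed-interval retries with a plain doubling backoff over one
hour. The 1 second initial delay, the multiplier of 2, and retry_times()
are assumptions for the example. Tor's real download schedules are
different and include randomisation.

```python
def retry_times(initial_delay, multiplier, window):
    """Return the times (in seconds) at which retries fire within window.

    Each retry waits `delay` seconds after the previous one, and the delay
    is multiplied by `multiplier` after every attempt.
    """
    times = []
    t = 0
    delay = initial_delay
    while t + delay <= window:
        t += delay
        times.append(t)
        delay *= multiplier
    return times


HOUR = 3600

# Hypothetical parameters: retry every second vs. doubling from one second.
fixed = retry_times(1, 1, HOUR)
backoff = retry_times(1, 2, HOUR)

print(len(fixed))    # 3600 requests hit the relay in an hour
print(len(backoff))  # 11 requests in the same hour
print(backoff[-1])   # 2047: the next retry is more than half an hour away
```

The same doubling that protects relays also means a stuck client can wait
a long time between attempts, which is the long backoff in #23817.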

T