[tor-bugs] #24667 [Core Tor/Tor]: OOM needs to consider the DESTROY queued cells
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Dec 22 04:10:12 UTC 2017
#24667: OOM needs to consider the DESTROY queued cells
----------------------------------------+----------------------------------
Reporter: dgoulet | Owner: (none)
Type: defect | Status: new
Priority: Medium | Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor | Version:
Severity: Normal | Resolution:
Keywords: tor-cell, tor-circuit, oom | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
----------------------------------------+----------------------------------
Comment (by arma):
Replying to [ticket:24667 dgoulet]:
> But also not sending those will affects other relays hanging on dead
> circuits.
Yeah, this is an ugly one. I was first thinking about the case where a
relay doesn't send back a destroy cell towards the client, so the client
ends up with an out-of-sync idea of what the circuit looks like. But in
that case, eventually the client might still try to close the circuit, and
things will take care of themselves.
Where it gets really ugly is if the relay doesn't send a destroy *forward*
on a circuit. Then the circuit essentially lives forever on the later
relays. The next relay will notice only when the orconn that would have
carried the destroy cell dies.
(If some other orconn on the dangling circuit dies, it could still trigger
splintered dangling circuits: the relay on the client side of the broken
orconn will send a truncated data cell towards the client, which will just
be ignored since there's no circuit that it corresponds to. And then the
splintered dangling circuit will live forever because nobody will ever
know to tell it to go away.)
So silently dropping destroy cells seems really bad, and we should try
hard to avoid it.
One option is to queue them somewhere, using the more efficient queue that
we put in with #24666, and then send them over the next "little while".
That is, it's not critical to send them immediately, so long as they are
sent sometime.
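To make the "send them over the next little while" idea concrete, here is a minimal Python sketch. It is not Tor code: the class name, the (circ_id, reason) tuple layout, and the batch size are all assumptions for illustration. It just shows pending destroys being kept as small records and drained a few at a time on a periodic tick.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical compact record: (circ_id, reason). Not Tor's actual structure.
DestroyEntry = Tuple[int, int]


class DeferredDestroyQueue:
    """Holds pending destroys as small tuples and drains them in batches."""

    def __init__(self, batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self._pending: Deque[DestroyEntry] = deque()

    def enqueue(self, circ_id: int, reason: int) -> None:
        # Remember the destroy instead of dropping it under memory pressure.
        self._pending.append((circ_id, reason))

    def drain_tick(self) -> List[DestroyEntry]:
        """Called periodically; returns at most batch_size entries to send now."""
        count = min(self.batch_size, len(self._pending))
        return [self._pending.popleft() for _ in range(count)]

    def __len__(self) -> int:
        return len(self._pending)
```

The point of the sketch is only that nothing is lost: every enqueued destroy eventually comes out of drain_tick(), just not all at once.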
Another option would be to rotate the long-term orconn once something has
happened that caused us to drop destroy cells. That is, try to work
towards closing the orconn, which will trigger destruction of the
remaining circuits. But if even one long-lived circuit remains, that
option is not so great, since it could remain for days or even weeks.
What do we know about the pattern of destroys when we are reacting to an
oom case? For example, do we end up making decisions like "close all the
circuits to that relay"? In that case we could close the entire orconn,
right there, rather than sending thousands of destroy cells. We'd probably
want to mark it for flush for a little while so its current contents have
a chance to go out, but that approach seems workable *if* that's the
pattern of destroys that we want to make.
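As a rough sketch of that decision, assuming we can list the live circuits per orconn and the set we have decided to kill (both hypothetical inputs, not existing Tor data structures), the check could look like this in Python:

```python
from typing import Dict, Set, Tuple


def plan_teardown(
    circuits_by_conn: Dict[str, Set[int]],
    doomed_by_conn: Dict[str, Set[int]],
) -> Tuple[Set[str], Dict[str, Set[int]]]:
    """Split the work into 'close the whole orconn' and 'send destroys'.

    Circuit IDs are scoped per connection, so both maps are keyed by a
    (hypothetical) connection name.
    """
    close_conns: Set[str] = set()
    destroy_cells: Dict[str, Set[int]] = {}
    for conn, doomed in doomed_by_conn.items():
        live = circuits_by_conn.get(conn, set())
        if live and live <= doomed:
            # Every circuit on this orconn is going away: close it instead.
            close_conns.add(conn)
        else:
            to_destroy = doomed & live
            if to_destroy:
                destroy_cells[conn] = to_destroy
    return close_conns, destroy_cells
```

Whether this helps depends entirely on the answer to the question above: if the oom handler rarely kills every circuit on an orconn, close_conns will usually be empty.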
Another option would be to make multidestroy cells that give you a huge
pile of circids+reasons in a single cell -- basically extend the notion of
the destroy queue into something that you can transport wholesale to a
neighbor relay.
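For a sense of how many entries such a cell could carry, here is a Python sketch of one possible payload layout. No such cell exists in the protocol; the 2-byte count header and the 4-byte circid plus 1-byte reason entry are assumptions made for illustration. The 509-byte payload size is that of a fixed-length Tor cell.

```python
import struct
from typing import List, Tuple

CELL_PAYLOAD_SIZE = 509            # payload bytes in a fixed-size Tor cell
COUNT = struct.Struct("!H")        # hypothetical 2-byte entry count
ENTRY = struct.Struct("!IB")       # hypothetical entry: circid (4) + reason (1)
MAX_ENTRIES = (CELL_PAYLOAD_SIZE - COUNT.size) // ENTRY.size


def pack_multidestroy(entries: List[Tuple[int, int]]) -> bytes:
    """Pack (circ_id, reason) pairs into one zero-padded cell payload."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError("too many entries for one cell")
    body = COUNT.pack(len(entries))
    body += b"".join(ENTRY.pack(circ_id, reason) for circ_id, reason in entries)
    return body.ljust(CELL_PAYLOAD_SIZE, b"\x00")


def unpack_multidestroy(payload: bytes) -> List[Tuple[int, int]]:
    """Recover the (circ_id, reason) pairs from a payload built above."""
    if len(payload) != CELL_PAYLOAD_SIZE:
        raise ValueError("unexpected payload length")
    (count,) = COUNT.unpack_from(payload, 0)
    if count > MAX_ENTRIES:
        raise ValueError("entry count exceeds cell capacity")
    return [
        ENTRY.unpack_from(payload, COUNT.size + i * ENTRY.size)
        for i in range(count)
    ]
```

With this particular layout one cell holds 101 circid+reason pairs, so thousands of destroys would shrink to a few dozen cells.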
Another option would be to make a destroy-except cell, where if you want
to close a big pile of circids but leave a few open, you send over the
ones *not* to destroy.
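The destroy-except idea comes down to a set difference on the receiving side. A minimal Python sketch, with hypothetical function names and plain sets of circids standing in for whatever the real circuit tables would be:

```python
from typing import Set, Tuple


def choose_encoding(all_circs: Set[int], doomed: Set[int]) -> Tuple[str, Set[int]]:
    """Sender side: pick whichever list of circids is shorter to transmit."""
    doomed = doomed & all_circs
    survivors = all_circs - doomed
    if len(survivors) < len(doomed):
        return ("destroy-except", survivors)
    return ("destroy", doomed)


def apply_destroy_except(local_circs: Set[int], keep: Set[int]) -> Set[int]:
    """Receiver side: everything on this connection not listed gets torn down."""
    return local_circs - keep
```

This only pays off when the survivors are a small minority of the circuits on the connection; otherwise a plain list of destroys is shorter.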
While we're at it, we might want to get rid of the "send a truncated cell
toward the client, and then let the client actually destroy the circuit"
design. We built Tor that way so that clients could choose to have some
smarter reaction in the future, like re-extending the circuit to some
different next hop. But in practice we haven't figured out a smarter
reaction that doesn't draw in a lot of complexity in terms of anonymity
analysis, so maybe we should opt to simplify the design (and thus reduce
network load).
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24667#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online