[tor-bugs] #28780 [Core Tor/Tor]: circpadding: Add machine flag for not closing circuit if machine is active

Tue Apr 30 22:07:43 UTC 2019

#28780: circpadding: Add machine flag for not closing circuit if machine is active
-------------------------------------------------+-------------------------
 Reporter:  asn                                  |          Owner:  (none)
     Type:  defect                               |         Status:
                                                 |  needs_information
 Priority:  Very High                            |      Milestone:  Tor:
                                                 |  0.4.1.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  wtf-pad, tor-relay, tor-cell,        |  Actual Points:  6
  padding, 041-proposed, network-team-           |
  roadmap-2019-Q1Q2                              |
Parent ID:  #28634                               |         Points:  5
 Reviewer:  asn                                  |        Sponsor:
                                                 |  Sponsor2
-------------------------------------------------+-------------------------
Changes (by mikeperry):

 * status:  needs_revision => needs_information

Comment:

 Replying to [comment:29 nickm]:
 > I want to talk about this approach more before we merge it.
 >
 > As I understand it, the idea here is that some circuits that would
 otherwise close should instead stay open longer, so that padding can be
 sent on them.  This patch achieves that by adding a call inside
 `circuit_mark_for_close` to `circpad_circuit_should_be_marked_for_close`,
 which potentially overrides the decision to mark the circuit, and changes
 its purpose instead.  Later, the circuit is supposed to get expired
 naturally when it times out.
 >
 > Here are some things that worry me about the logic:
 >
 > 1. Intercepting `circuit_mark_for_close()` is pretty risky IMO.  It no
 longer 'does what it says on the label' and instead might keep the circuit
 open.  There are over 100 callers to circuit_mark_for_close(); I think it
 will be hard to audit them all.

 This same reasoning made us prefer to hook circuit_mark_for_close(). If we
 want to keep a circuit open for padding, preventing current and future
 code from somehow closing it via one of these 100 other codepaths seemed
 daunting... This is also the reasoning that caused me to prefer purpose
 changes to flags: when a purpose changes, both existing code and not-yet-
 written code will stop trying to use that circuit. But when a flag gets
 added, not everybody knows to check that flag before proceeding.

 > 2. The function `circpad_circuit_should_be_marked_for_close()` is named
 as if it were a predicate function, but in fact it changes the circuit's
 purpose in some cases.  (Also, it doesn't answer the question "should this
 circuit be marked for close"; it answers "can this circuit be allowed to
 close, given that somebody else wants to mark it.")

 This is already the case since we added pathbias code.. It changes the
 purpose and sends a probe in some cases. Our circuit timeout code also
 changes circuit purpose to MEASUREMENT in a similar way, to hold them open
 long enough to get a better timeout estimate (at Roger's suggestion).

 I see circuit purpose as a signifying component ownership of a circuit. By
 doing it this way, we're sending a loud and clear signal to the rest of
 the code that we've taken over responsibility to use, update, and free
 this circuit.

 > 3. The function `circuit_expire_old_circuits_clientside()`, which we are
 relying on to kill of circuits eventually, uses `circuit_mark_for_close()`
 to close the circuits it wants to expire, and circuit_mark_for_close is
 now potentially overridden.

 This was also a deliberate design artifact: to avoid dead parrot issues,
 we wanted to use the exact same timeout logic (and conditional timeout
 values based on that logic) to tear down the circuit, once circuit padding
 was done padding on it. (Said another way, we're telling the
 circuit_expire_old_circuits_clientside() code, other timeout code, and all
 other current and future components: "Hey, this is now the padding
 component's circuit. Please don't time it out or close it).

 > Here's one possibility of we could do instead:
 >
 > 1. Rename the new purpose from C_CIRCUIT_PADDING to
 CIRCUIT_PADDING_SHUTDOWN to make it clear that the circuit's purpose is
 "sending padding until it dies."

 Hrmm.. I would rather have a purpose devoted to the padding subsystem as a
 whole, again as signifying ownership. In ticket:30092#comment:7, we
 realized that we may need to launch new circuits from the padding code
 eventually. The logic for keeping those alive will be the same as for
 these converted circuits (keep them alive until the padding machine is
 gone, and then use our normal circuit_expire_old_circuits_clientside()
 timers after that), and so using the same purpose for any padding-
 subsystem launched circuits also seems wise.

 > 2. Leave circuit_mark_for_close() untouched from the current codebase.
 Instead add a new function, `circuit_transition_to_shutdown()`.  This
 function should either mark the circuit for close immediately by calling
 circuit_mark_for_close(), or should change the circuit's purpose to
 CIRCUIT_PADDING_SHUTDOWN.

 Future callers would have to know to use this new function in all relevant
 future patches. The number of people and components that have to learn
 about this change to maintain correctness for circuitpadding will become
 very large. This seems failure prone.

 If we went this way, I would want to add asserts to
 circuit_mark_for_close() to prevent it from being called without a call to
 circuit_transition_to_shutdown() for those cases to future-proof it, which
 seems a little backwards. And I bet we'd still have breakage in the
 future.

 > 3. Audit the codebase so that we change some calls from
 circuit_mark_for_close to circuit_transition_to_shutdown.  We should only
 do this for calls that use REASON_FINISHED or REASON_NONE or
 REASON_IP_NOW_REDUNDANT today.

 Again, we'd need lots of BUG macros and future-proof checks to guard
 against reverts in behavior.. This seems more fragile an approach.

 > How does that sound?

 Can we back up and discuss the root reasons why you feel this approach
 risky? My read of the above is that your three concerns are: code clarity,
 memory leaks, and purpose changes. Is that right?

 If we try to step back and look at the places where this code might cause
 memory leaks and other ownership-related issues, maybe we can come up with
 either tests or refactoring that directly addresses those two concerns?

 Or are there other concerns that I am missing?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28780#comment:30>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online