[tor-bugs] #12595 [Core Tor/Tor]: Finalize design for improved guard-node behavior

Tue Jun 7 14:04:41 UTC 2016

#12595: Finalize design for improved guard-node behavior
-------------------------------------------------+-------------------------
 Reporter:  asn                                  |          Owner:  asn
     Type:  defect                               |         Status:
 Priority:  High                                 |  assigned
Component:  Core Tor/Tor                         |      Milestone:  Tor:
 Severity:  Normal                               |  0.2.9.x-final
 Keywords:  tor-guard, TorCoreTeam201606,        |        Version:  Tor:
  028-triaged, mike-can, prop259, tor-guards-    |  0.2.7
  revamp                                         |     Resolution:
Parent ID:                                       |  Actual Points:
 Reviewer:                                       |         Points:  3
                                                 |        Sponsor:
                                                 |  SponsorU-must
-------------------------------------------------+-------------------------

Comment (by asn):

 Hello. Here is a small status report on this project.

 Let me first mention the main problems of the current Tor guard algorithm
 that proposal 259 tries to address:

   '''ISSUE1''': The current Tor guard algorithm will attempt to connect to
 an infinite number of guards given enough time. This is a security issue,
 since a LAN adversary can firewall a Tor user until Tor eventually
 connects to an adversary-controlled entry guard. In prop259, we enforce an
 upper bound on the number of guards that Tor will ever attempt to connect
 to (a la prop241).

   '''ISSUE2''': There are various edge cases and race conditions where Tor
 will think that some guards are down, and connect to lower priority
 guards, even though the ones on the top are still up (e.g. bug #12450).
 While it's very hard to fix all these edge cases, proposal 259 aims to
 minimize the time Tor will spend connected to lower priority guards.

   '''ISSUE3''': The current Tor guard algorithm is completely unspecified
 and undocumented, making it very hard to fix issues and design
 improvements. With prop259 we aim to provide a proper algorithm
 specification (i.e. a documented state machine) that in the future could
 be modded to include various improvements (firewall heuristics, etc.). See
 [0] for a brief algorithm description.

   The idea was also to produce clean, isolated and tested code that can be
 reused and extended with ease in the future (e.g. to do multiple layers of
 guards a la prop247).  The current entry guard code is spaggheti and
 spewed all over the codebase.

 ----

 == State of prop259 ==

 You can find the latest version of proposal 259
 [https://github.com/twstrike/torspec/blob/review/proposals/259-guard-
 selection.txt here], and the thoughtworks crew
 [https://github.com/twstrike/tor_for_patching/tree/prop259 has already
 implemented a PoC of it in the Tor codebase].

 There is also [https://github.com/twstrike/tor_guardsim a Python
 simulation of prop259], but I'm not sure what's its current state relative
 to the prop259 spec and the little-t-tor implementation. Ola and Reinaldo
 showed me some graphs of it in Valencia, that looked about what you would
 expect.

 That said, IMO, both the design and the implementation need heavy
 improvements before we consider them for inclusion in our codebase:

 The main problem with the prop259 design right now, is that the algorithm
 was not designed to support multiple parallel invocations of it, which is
 exactly what Tor does (multiple circuits can pick guards at the same
 time). This causes problems like "Prop259 algorithm invocation #2 does not
 learn the guard reachability information that invocation #1 knows, and has
 to try the same dead guards again".

 It is my understanding that when the thoughtworks crew realized the above
 issue, they slightly changed the design such that one invocation of the
 algorithm can support multiple circuit creations. However, I feel this was
 done in a kludgy way on the implementation side. I feel that instead, the
 right way forward would be to refactor the state machine to support this
 usage model.

 IMO, this is the main actual design change that needs to be done to the
 algorithm. Also, the spec needs to be cleaned and simplified a bit
 (because it got edited multiple times by multiple people and there are
 ugly artifacts around), and some constants/states should be renamed to
 better names. I feel that fixing these issues properly should take about
 2-4 days of thinking time.

 I have not looked too deep
 [https://github.com/twstrike/tor_for_patching/tree/prop259 at the
 implementation], but it seems like it's indeed implementing some version
 of proposal 259. The code is workable, but it's also undocumented in
 parts, and it also uses some non-standard coding idioms that we would need
 to fix. Fortunately, the state machine is well tested, and it seems to
 work without crashing on my system. I feel that if we revise the design in
 a way we are happy, then we can use the thoughtworks implementation as a
 good basis for our branch.

 ----

 == Ways forward ==

 The righteous way:

 - I feel that the proper way to proceed here (if we could ignore
 deliverables, lack of
   engineers and funding issues) would be to finish up prop259, and
 implement it in Tor. Ideally, for the first months we would be able to run
 both algorithms in parallel and get statistics about them, so that we can
 actually test it in real life and evaluate its security.

 The honest way:

 - If we don't have time to do the above, but we are still concerned about
 the
   security of our current guard selection (i.e. ISSUE1), we could make a
 quick hotfix for ISSUE1 in our current codebase. For example, we could
 modify `choose_random_entry_impl()` or `pick_entry_guards()` so that
 instead of sampling guards from the whole consensus, we use a restricted
 set of guards to sample from.

   I imagine this is going to be easier to do than ''the righteous way'',
 but also it will not help us at all with ISSUE2 and ISSUE3 which are also
 quite important (e.g. since we hope to implement prop247 as well).

 The cheap way:

 - We already have a PoC of the current (incomplete) prop259: We have a
 branch
   for little-t-tor and a simulation in Python. I think this work could be
 enough to satisfy sponsor deliverables if we have no manpower to work on
 this. I don't actually want to take this approach, but we are all under
 fire from all sides... so dropping things shouldn't be forgotten as an
 option.

 I'm currently unsure which approach should be taken here. I feel that more
 people need to skim over prop259 before we decided that it's a good way
 forward, because me and the thoughtworks team are the only people who are
 actually familiar with it right now (and I'm already starting to forget
 it).

 ----

 == Footnotes ==

 [0]:

 With prop259, when Tor builds a circuit and needs a guard, Tor calls
 START() to initialize the prop259 algo for that particular circuit, and
 then it starts calling NEXT() to receive a candidate guard. Tor connects
 to the candidate guard, and updates the guard reachability state
 accordingly. If the candidate guard did not work, we go back to calling
 NEXT(). Otherwise, we call END() and the algorithm finishes successfuly.

 Prop259 also specifies the behavior of Tor when it receives a new
 consensus.

 Prop259 also specifies the already existing "network-up" heuristic of Tor
 when it manages to connect to a guard after long times of inactivity (see
 prop259 SHOULD_CONTINUE()). This heuristic is already implemented in Tor:
 see first_contact at entry_guard_register_connect_status().

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12595#comment:37>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online