[tor-dev] [proposal] New Entry Guard Selection Algorithm

Fri Oct 30 16:12:57 UTC 2015

Hello Isis,

thanks for the proposal. Looks good!

I think adding the logic for greater reachability of people behind
FascistFirewalls makes sense and will be very helpful.

Some comments on the proposal inlined below:

----

> Filename: xxx-guard-selection.txt
> Title: New Guard Selection Behaviour
> Author: Isis Lovecruft, George Kadianakis

Thanks for adding me as a co-author! Would also be great if you could
link to my original post for reference:

   https://lists.torproject.org/pipermail/tor-dev/2015-August/009328.html

> Created: 2015-10-28
> Status: Draft
> Extends: 241-suspicious-guard-turnover.txt
>
> <snip>
>
> §2. Design
>
>   Alice, an OP attempting to connect to the Tor network, should undertake the
>   following steps to determine information about the local network and to
>   select (some) appropriate entry guards.  In the following scenario, it is
>   assumed that Alice has already obtained a recent, valid, and verifiable
>   consensus document.
>
>   Before attempting the guard selection procedure, Alice initialises the guard
>   data structures and prepopulates the guardlist structures, including the
>   UTOPIC_GUARDLIST and DYSTOPIC_GUARDLIST (cf. §XXX).  Additionally, the
>   structures have been designed to make updates efficient both in terms of
>   memory and time, in order that these and other portions of the code which
>   require an up-to-date guard structure are capable of obtaining such.
>
>     0. Determine if the local network is potentially accessible.
>
>        Alice should attempt to discover if the local network is up or down,
>        based upon information such as the availability of network interfaces
>        and configured routing tables.  See #16120. [0]
>
>        [XXX: This section needs to be fleshed out more.  I'm ignoring it for
>        now, but since others have expressed interest in doing this, I've added
>        this preliminary step. —isis]
>
>     1. Check that we have not already attempted to add too many guards
>        (cf. proposal #241).
>

I'd suggest you extract all the functionality and variables you need
from prop#241 and incorporate them to your proposal. Otherwise it's
not immediately obvious how the two proposals interact, and which
parts of #prop241 are still active.

Personally, I think I only used CANDIDATE_THRESHOLD from #prop241
which I renamed to GUARDS_ATTEMPTED_THRESHOLD.

>     2. Then, if the PRIMARY_GUARDS on our list are marked offline, the
>        algorithm attempts to retry them, to ensure that they were not flagged
>        offline erroneously when the network was down. This retry attempt
>        happens only once every 20 mins to avoid infinite loops.
>
>        [Should we do an exponential decay on the retry as s7r suggested?
>        —isis]
>
>     3. Take the list of all available and fitting entry guards and return the
>        top one in the list.
>
>     4. If there were no available entry guards, the algorithm adds a new entry
>        guard and returns it.  [XXX detail what "adding" means]
>
>     5. Go through the steps 1-4 above algorithm, using the UTOPIC_GUARDLIST.
>
>        5.a. When the GUARDLIST_FAILOVER_THRESHOLD of the UTOPIC_GUARDLIST has
>             been tried (without success), Alice should begin trying steps 1-4
>             with entry guards from the DYSTOPIC_GUARDLIST as well.  Further,
>             if no nodes from UTOPIC_GUARDLIST work, and it appears that the
>             DYSTOPIC_GUARDLIST nodes are accessible, Alice should make a note
>             to herself that she is possibly behind a fascist firewall.
>
>        5.b. If no nodes from either the UTOPIC_GUARDLIST or the
>             DYSTOPIC_GUARDLIST are working, Alice should make a note to
>             herself that the network has potentially gone down.  Alice should
>             then schedule, at exponentially decaying times, to rerun steps
>             0-5.
>
>             [XXX Should we do step 0? Or just 1-4?  Should we retain any
>             previous assumptions about FascistFirewall?  —isis]
>
>     6. [XXX Insert potential other fallback mechanisms, e.g. switching to
>        using bridges? —isis]
>
>
> §3. New Data Structures, Consensus Parameters, & Configurable Variables
>
> §3.1. Consensus Parameters & Configurable Variables
>
>     Variables marked with an asterisk (*) SHOULD be consensus parameters.
>
>     DYSTOPIC_GUARDS ¹
>         All nodes listed in the most recent consensus which are marked with
>         the Guard flag and which advertise their ORPort(s) on 80, 443, or any
>         other addresses and/or ports controllable via the FirewallPorts and
>         ReachableAddresses configuration options.
>
>     UTOPIC_GUARDS
>         All nodes listed in the most recent consensus which are marked with
>         the Guard flag and which do NOT advertise their ORPort(s) on 80, 443,
>         or any other addresses and/or ports controllable via the FirewallPorts
>         and ReachableAddresses configuration options.
>

It's interesting that these two sets DYSTOPIC_GUARDS and UTOPIC_GUARDS
are disjoint. This means that there will be no 80/443 relays on the
UTOPIC guardlist. This means that 80/443 guards will only be used by
people under FascistFirewall; which makes them a bit like bridges in a
way, and also has significant effects on our load balancing.

Are we sure we want these two sets to be disjoint?

I could imagine an alternative design where we still have the two
guard lists, but we also allow 80/443 relays to be in UTOPIC_GUARDS.
If you were lucky and you had 80/443 guards in your UTOPIC guard list
you don't need to go down to the DYSTOPIC guard list. Or something
like this.

>     PRIMARY_GUARDS *
>        The number of first, active, PRIMARY_GUARDS on either the
>        UTOPIC_GUARDLIST or DYSTOPIC_GUARDLIST as "primary". We will go to
>        extra lengths to ensure that we connect to one of our primary guards,
>        before we fall back to a lower priority guard. By "active" we mean that
>        we only consider guards that are present in the latest consensus as
>        primary.
>
>     UTOPIC_GUARDS_ATTEMPTED_THRESHOLD *
>     DYSTOPIC_GUARDS_ATTEMPTED_THRESHOLD *
>        These thresholds limit the amount of guards from the UTOPIC_GUARDS and
>        DYSTOPIC_GUARDS which should be partitioned into a single
>        UTOPIC_GUARDLIST or DYSTOPIC_GUARDLIST respectively.  Thus, this
>        represents the maximum percentage of each of UTOPIC_GUARDS and
>        DYSTOPIC_GUARDS respectively which we will attempt to connect to.  If
>        this threshold is hit we assume that we are offline, filtered, or under
>        a path bias attack by a LAN adversary.
>
>        There are currently 1600 guards in the network.  We allow the user to
>        attempt 80 of them before failing (5% of the guards).  With regards to
>        filternet reachability, there are 450 guards on ports 80 or 443, so the
>        probability of picking such a guard guard here should be high.
>

oops "guard guard"

>
>        This logic is not based on bandwidth, but rather on the number of
>        relays which possess the Guard flag.  This is for three reasons: First,
>        because each possible *_GUARDLIST is roughly equivalent to others of
>        the same category in terms of bandwidth, it should be unlikely [XXX How
>        unlikely? —isis] for an OP to select a guardset which contains less
>        nodes of high bandwidth (or vice versa).  Second, the path-bias attacks
>        detailed in proposal #241 are best mitigated through limiting the
>        number of possible entry guards which an OP might attempt to use, and
>        varying the level of security an OP can expect based solely upon the
>        fact that the OP picked a higher number of low-bandwidth entry guards
>        rather than a lower number of high-bandwidth entry guards seems like a
>        rather cruel and unusual punishment in addition to the misfortune of
>        already having slower entry guards.  Third, we favour simplicity in the
>        redesign of the guard selection algorithm, and introducing bandwidth
>        weight fraction computations seems like an excellent way to
>        overcomplicate the design and implementation.
>
>
> §3.2. Data Structures
>
>     UTOPIC_GUARDLIST
>     DYSTOPIC_GUARDLIST
>         These lists consist of a subset of UTOPIC_GUARDS and DYSTOPIC_GUARDS
>         respectively.  The guards in these guardlists are the only guards to
>         which we will attempt connecting.
>
>         When an OP is attempting to connect to the network, she will construct
>         hashring structure containing all potential guard nodes from both
>         UTOPIC_GUARDS and DYSTOPIC_GUARDS.  The nodes SHOULD BE inserted into
>         the structure some number of times proportional to their consensus
>         bandwidth weight. From this, the client will hash some information
>         about themselves [XXX what info should we use? —isis] and, from that,
>         choose #P number of points on the ring, where #P is
>         {UTOPIC,DYSTOPIC}_GUARDLIST_ATTEMPTED_THRESHOLD proportion of the
>         total number of unique relays inserted (if a duplicate is selected, it
>         is discarded).  These selected nodes comprise the
>         {UTOPIC,DYSTOPIC}_GUARDLIST for (first) entry guards.  (We say "first"
>         in order to distinguish between entry guards and the vanguards
>         proposed for hidden services in proposal #247.)
>

I don't entirely understand why we prefer a hash ring over a simple
list here for sampling guards. I was imagining that when we need a new
guard, we would just put all guards in a list, and sample a random
guard weighted by bandwidth. I think this is what the current code is
doing in smartlist_choose_node_by_bandwidth_weights() and it seems easy!

>         [Perhaps we want some better terminology for this.  Suggestions
>         welcome. —isis]
>
>         Each GUARDLIST SHOULD have the property that the total sum of
>         bandwidth weights for the nodes contained within it is roughly equal
>         to each other guardlist of the same type (i.e. one UTOPIC_GUARDLIST is
>         roughly equivalent in terms of bandwidth to another UTOPIC_GUARDLIST,
>         but necessarily equivalent to a DYSTOPIC_GUARDLIST).
>

Why is it important for guard lists to have about the same total bandwidth?

Also, will this be enforced somehow, or is it a result of how the hash ring is
designed?

>         For space and time efficiency reasons, implementations of the
>         GUARDLISTs SHOULD support prepopulation(), update(), insert(), and
>         remove() functions.  A second data structure design consideration is

These operations sound good!

>         that the amount of "shifting" — that is, the differential between
>         constructed hashrings as nodes are inserted or removed (read: ORs
>         falling in and out of the network consensus) — SHOULD be minimised in
>         order to reduce the resources required for hashring update upon
>         receiving a newer consensus.
>

Why is shifting-resistance important here? I'm not sure what you mean
"reduce the resources required for hashring update". This is something
that happens about once every hour right?

>         The implementation we propose is to use a Consistent Hashring,
>         modified to dynamically allocate replications in proportion to
>         fraction of total bandwidth weight.  As with a normal Consistent
>         Hashring, replications determine the number times the relay is
>         inserted into the hashring.  The algorithm goes like this:
>
>           router          ← ⊥
>           key             ← 0
>           replications    ← 0
>           bw_weight_total ← 0
>           while router ∈ GUARDLIST:
>            | bw_weight_total ← bw_weight_total + BW(router)
>           while router ∈ GUARDLIST:
>            | replications ← FLOOR(CONSENSUS_WEIGHT_FRACTION(BW(router),
>           bw_total) * T)
>            | factor ← (S / replications)
>            | while replications != 0:
>            |  | key ← (TOINT(HMAC(ID)[:X] * replications * factor) mod S
>            |  | INSERT(key, router)
>            |  | replications <- replications - 1
>
>         where:
>
>           - BW is a function for extracting the value of an OR's `w bandwith=`
>             weight line from the consensus,
>           - GUARDLIST is either UTOPIC_GUARDLIST or DYSTOPIC_GUARDLIST,
>           - CONSENSUS_WEIGHT_FRACTION is a function for computing a router's
>             consensus weight in relation to the summation of consensus weights
>             (bw_total),
>           - T is some arbitrary number for translating a router's consensus
>             weight fraction into the number of replications,
>           - H is some collision-resistant hash digest,
>           - S is the total possible hash space of H (e.g. for SHA-1, with
>             digest sizes of 160 bits, this would be 2^160),
>           - HMAC is a keyed message authentication code which utilises H,
>           - ID is an hexadecimal string containing the hash of the router's
>             public identity key,
>           - X is some (arbitrary) number of bytes to (optionally) truncate the
>             output of the HMAC to,
>           - S[:X] signifies truncation of S, some array of bytes, to a
>             sub-array containing X bytes, starting from the first byte and
>             continuing up to and including the Xth byte, such that the
>             returned sub-array is X bytes in length.
>           - INSERT is an algorithm for inserting items into the hashring,
>           - TOINT convert hexadecimal to decimal integers,
>
>         For routers A and B, where B has a little bit more bandwidth than A,
>         this gets you a hashring which looks like this:
>
>                            B-´¯¯`-BA
>                         A,`        `.
>                         /            \
>                        B|            |B
>                         \            /
>                          `.        ,´A
>                           AB--__--´B
>
>         When B disappears, A remains in the same positions:
>
>                            _-´¯¯`-_A
>                         A,`        `.
>                         /            \
>                         |            |
>                         \            /
>                          `.        ,´A
>                           A`--__--´
>
>         And similarly if B disappears:
>

Hm, maybe you mean "if A disappears".

>
>                            B-´¯¯`-B
>                          ,`        `.
>                         /            \
>                        B|            |B
>                         \            /
>                          `.        ,´
>                            B--__--´B
>
>         Thus, no "shifting" problems, and recalculation of the hashring when a
>         new consensus arrives via the update() function is much more time
>         efficient.
>
>         Alternatively, for a faster and simpler algorithm, but non-uniform
>         distribution of the keys, one could remove the "factor" and replace
>         the derivation of "key" in the algorithm above with:
>
>                 key ← HMAC(ID || replications)[:X]
>
>         A reference implementation in Python is available². [1]
>
>
> §4. Footnotes
>
> ¹ "Dystopic" was chosen because those are the guards you should choose from if
>   you're behind a FascistFirewall.
>
> ² One tiny caveat being that the ConsistentHashring class doesn't dynamically
>   assign replication count by bandwidth weight; it gets initialised with the
>   number of replications.  However, nothing in the current implementation
>   prevents you from doing:
>       >>> h = ConsistentHashring('SuperSecureKey', replications=6)
>       >>> h.insert(A)
>       >>> h.replications = 23
>       >>> h.insert(B)
>       >>> h.replications = 42
>       >>> h.insert(C)
>
>
> §5. References
>
>   [0]: https://trac.torproject.org/projects/tor/ticket/16120
>   [1]:
>   https://gitweb.torproject.org/user/isis/bridgedb.git/tree/bridgedb/hashring.py?id=949d33e8#n481
>
>