[tor-dev] Latest state of the guard algorithm proposal (prop259) (April 2016)
fanjiang at thoughtworks.com
Thu Apr 21 21:50:28 UTC 2016
On Thu, Apr 21, 2016 at 4:32 AM, George Kadianakis <desnacked at riseup.net>
> Fan Jiang <fanjiang at thoughtworks.com> writes:
> > [ text/plain ]
> > Hi,
> >> Hello Fan and team,
> >> I think I'm not a big fan of the pending_guard and pending_dir_guard
> >> concept. To me it seems like a quick hack that tries to address
> >> issues with our algorithm that appeared when we tried to adapt the
> >> proposal to
> >> the tor codebase.
> > Yeah agree, this pending_guard hack was trying to avoid some
> > problem, we need to redesign.
> > I haven't got any good idea about this, that will be nice if you already
> > got some thoughts.
> >> I think one of the main issues with the current algorithm structure is
> >> _one run of the algorithm_ can be asked to _setup multiple circuits_,
> >> each
> >> of those circuits has different requirements for guards. That is, since
> >> do
> >> filtering on START based on the requirements of circuit #1, this means
> >> that any
> >> other circuits that appear before END is called, also have to adapt to
> >> requirements of circuit #1. This is obvious in the code since we use
> >> guard_selection->for_directory throughout the whole algorithm run, even
> >> though
> >> for_directory was just the restriction of circuit #1.
> >> Specifically about the pending_guard trick, I feel that it interacts in
> >> unpredictable ways with other features of the algorithm. For example,
> >> consider
> >> how it interacts with the primary guards heuristic. It could be time for
> >> the
> >> algorithm to reenter the primary guards state and retry the top guards
> >> the
> >> list, but because of the pending_guard thing we actually return the 15th
> >> guard
> >> to the circuit.
> >> IMO we should revisit the algorithm so that one run of the algorithm can
> >> accomodate multiple circuits by design and without the need for hacks.
> >> Here is
> >> an idea towards that direction:
> >> I think one very important change that we can do to simplify things
> is to
> >> remove the need to filter guards based on whether they are dirguards,
> >> fast,
> >> or stable. My suggestion here would be to *only* consider guards that
> >> dirguards _and_ fast _and_ stable. This way, any circuit that appears
> >> will be
> >> happy with all the guards in our list and there is no need to do the
> >> pending_dir_guard trick. See  on why I think that's safe to do.
> >> This is easy to do in the current codebase. You just need to call
> >> entry_is_live() with need_uptime, need_capacity and for_directory all
> >> enabled (instead of flags being 0).
> >> If you do the above, your sampled guard set will be able to accomodate
> >> any
> >> circuit that comes its way and that should simplify logic
> > Sounds great, that can simplify the logic a lot, I've done the change, no
> > more pending_dir_guard.
> Hm. Can't you also remove pending_guard? What's the point of it now?
Pending guard was actually for sampled_guard, say when we are in
we don't wan't to make the algo return a new random guard
picked_by_banwith, we want to keep the same one
until the callback of first hop comes. We can make it specific for
sampled_guards after checking the state.
Does this make sense?
BTW, looking at your e54551adbfd5be4bee795df10f925b53fc9e730d I suggest you
> also use entry_is_live() with the ENTRY_NEED_UPTIME and ENTRY_NEED_CAPACITY
> flags always enabled. So that it always returns Stable && Fast guards.
Yes, I have done this change.
We should also look at how ENTRY_ASSUME_REACHABLE and ENTRY_NEED_DESCRIPTOR
> used in the rest of the code, to see if we should enable them or not
> >> Never saw this before, will look into it.
> > - There is a memleak on 'extended' in filter_set().
> >> In general, I feel that logic in that function is a bit weird. The
> >> function
> >> is called filter_set() but it can actually end up adding guards to the
> >> list.
> >> Maybe it can be renamed?
> > Split it up to filter_set & expand_set probably can make this clear.
> > - What's up with the each_remaining_by_bandwidth() function name?
> > I guess it should be iterate_remaining_guards_by_bandwidth.
> Better. Or sample_guard_by_bandwidth() ? Or get_guard_by_bandwidth() ?
> IIUC that function is probalistically sampling from the 'guards' set, so
> not really iterating it.
We are actually pick and removing guards from remaining_set in this
And I saw the filter_set used this function wrong which has been fixed,
so maybe `pop` is better than `get`.
Do we still need decide_num_guards to define the n_primary_guards? and it's
the only remaining one is using for_directory.
Fan Jiang 蒋帆
Amateur Code Chef
skype fan at torchz.net
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the tor-dev