[tor-dev] Latest state of the guard algorithm proposal (prop259) (April 2016)

Fan Jiang fanjiang at thoughtworks.com
Thu Apr 21 21:50:28 UTC 2016


On Thu, Apr 21, 2016 at 4:32 AM, George Kadianakis <desnacked at riseup.net>
wrote:

> Fan Jiang <fanjiang at thoughtworks.com> writes:
>
> > [ text/plain ]
> > Hi,
> >
> >> Hello Fan and team,
> >>
> >> I think I'm not a big fan of the pending_guard and pending_dir_guard
> >> concept. To me it seems like a quick hack that tries to address
> fundamental
> >> issues with our algorithm that appeared when we tried to adapt the
> >> proposal to
> >> the tor codebase.
> >>
> >>
> > Yeah agree, this pending_guard hack was trying to avoid some
> implementation
> > problem, we need to redesign.
> > I haven't got any good idea about this, that will be nice if you already
> > got some thoughts.
> >
> >
> >> I think one of the main issues with the current algorithm structure is
> that
> >> _one run of the algorithm_ can be asked to _setup multiple circuits_,
> and
> >> each
> >> of those circuits has different requirements for guards. That is, since
> we
> >> do
> >> filtering on START based on the requirements of circuit #1, this means
> >> that any
> >> other circuits that appear before END is called, also have to adapt to
> the
> >> requirements of circuit #1. This is obvious in the code since we use
> >> guard_selection->for_directory throughout the whole algorithm run, even
> >> though
> >> for_directory was just the restriction of circuit #1.
> >>
> >> Specifically about the pending_guard trick, I feel that it interacts in
> >> unpredictable ways with other features of the algorithm. For example,
> >> consider
> >> how it interacts with the primary guards heuristic. It could be time for
> >> the
> >> algorithm to reenter the primary guards state and retry the top guards
> in
> >> the
> >> list, but because of the pending_guard thing we actually return the 15th
> >> guard
> >> to the circuit.
> >>
> >> IMO we should revisit the algorithm so that one run of the algorithm can
> >> accomodate multiple circuits by design and without the need for hacks.
> >> Here is
> >> an idea towards that direction:
> >>
> >>   I think one very important change that we can do to simplify things
> is to
> >>   remove the need to filter guards based on whether they are dirguards,
> >> fast,
> >>   or stable. My suggestion here would be to *only* consider guards that
> are
> >>   dirguards _and_ fast _and_ stable. This way, any circuit that appears
> >> will be
> >>   happy with all the guards in our list and there is no need to do the
> >>   pending_dir_guard trick. See [0] on why I think that's safe to do.
> >>
> >>   This is easy to do in the current codebase. You just need to call
> >>   entry_is_live() with need_uptime, need_capacity and for_directory all
> >>   enabled (instead of flags being 0).
> >>
> >>   If you do the above, your sampled guard set will be able to accomodate
> >> any
> >>   circuit that comes its way and that should simplify logic
> considerably.
> >>
> >>
> > Sounds great, that can simplify the logic a lot, I've done the change, no
> > more pending_dir_guard.
> >
>
> Hm. Can't you also remove pending_guard? What's the point of it now?
>
>
Pending guard was actually for sampled_guard, say when we are in
SAMPLED_GUARDS_STATE
we don't wan't to make the algo return a new random guard
picked_by_banwith, we want to keep the same one
until the callback of first hop comes. We can make it specific for
sampled_guards after checking the state.
Does this make sense?

BTW, looking at your e54551adbfd5be4bee795df10f925b53fc9e730d I suggest you
> also use entry_is_live() with the ENTRY_NEED_UPTIME and ENTRY_NEED_CAPACITY
> flags always enabled. So that it always returns Stable && Fast guards.
>
>
Yes, I have done this change.

We should also look at how ENTRY_ASSUME_REACHABLE and ENTRY_NEED_DESCRIPTOR
> are
> used in the rest of the code, to see if we should enable them or not
> ourselves.
>
> >> Never saw this before, will look into it.
> >
> > - There is a memleak on 'extended' in filter_set().
> >>
> >>   In general, I feel that logic in that function is a bit weird. The
> >> function
> >>   is called filter_set() but it can actually end up adding guards to the
> >> list.
> >>   Maybe it can be renamed?
> >>
> >>
> > Split it up to filter_set & expand_set probably can make this clear.
> >
> > - What's up with the each_remaining_by_bandwidth() function name?
> >>
> >>
> > I guess it should be iterate_remaining_guards_by_bandwidth.
> >
>
> Better. Or sample_guard_by_bandwidth() ? Or get_guard_by_bandwidth() ?
>
> IIUC that function is probalistically sampling from the 'guards' set, so
> it's
> not really iterating it.
>
>
We are actually pick and removing guards from remaining_set in this
function,
And I saw the filter_set used this function wrong which has been fixed,
so maybe `pop` is better than `get`.

Another thing:
Do we still need decide_num_guards to define the n_primary_guards? and it's
the only remaining one is using for_directory.


> Cheers!
>



-- 
____


Fan Jiang 蒋帆
Amateur Code Chef
Thoughtworks, Inc.
mobile +86-150-9189-3714
skype fan at torchz.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160421/a262eef1/attachment.html>


More information about the tor-dev mailing list