On Thu, Jul 23, 2015 at 11:56 PM, isis <isis@torproject.org> wrote:
Aaron Johnson transcribed 2.1K bytes:
That seems easy to fix. Make the number of Introduction Points the same as it was before and make them be selected in a bandwidth-weighted way. There is no cost to this. You need IPs to be online, and so whatever number was used in the past will yield the same availability now. And bandwidth-weighting should actually improve both performance and security.
Is it obvious how to build a bandwidth-weighted DHT that is stable across changes in the consensus? One advantage of using the hash ring is that the loss of an HSDir causes only local changes in the topology, so if the client is using a different version of the consensus they can still locate one of the responsible HSDirs. (Note: I do not claim this cannot be done; it just seems like an important detail to sort out…)
You could reserve a space after each relay in the hash ring with a length proportional to the relay's bandwidth, and then assign an onion service whose hash is a fraction f of the maximum possible hash value to the relay owning the space in which the fraction f of the total hash-ring length falls. Removing or adding a relay shifts the onion service locations by at most that relay's fraction of the total bandwidth. To ensure coverage for clients with older consensuses, the relay can maintain HSDir+IPs at the locations indicated by both the current and previous consensus.
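A minimal sketch of that interval scheme (function names and the list-of-pairs input format are my own, not from the thread): each relay owns a contiguous slice of [0, 1) proportional to its bandwidth, and a service's hash fraction f maps to the relay whose slice contains f.

```python
import bisect

def build_ring(relays):
    """Build cumulative bandwidth fractions for the interval scheme.

    relays: list of (name, bandwidth) pairs, assumed to already be in a
    canonical order (e.g. by fingerprint). Returns (names, boundaries),
    where boundaries[i] is the end of relay i's interval within [0, 1].
    """
    total = sum(bw for _, bw in relays)
    names, boundaries, acc = [], [], 0.0
    for name, bw in relays:
        acc += bw / total
        names.append(name)
        boundaries.append(acc)
    return names, boundaries

def responsible_relay(names, boundaries, f):
    """Map a service's hash fraction f in [0, 1) to the owning relay."""
    i = bisect.bisect_right(boundaries, f)
    return names[min(i, len(names) - 1)]
```

With relays [("A", 100), ("B", 300)], A owns [0, 0.25) and B owns [0.25, 1.0), so a service at f = 0.1 maps to A and one at f = 0.5 maps to B. Note that adding a relay shifts every boundary after it, which is exactly the "shifting" concern raised below.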
At one point, I wanted to do something like this for BridgeDB's hashrings, to increase the probability that Bridges with higher bandwidth would be distributed (assuming we live in a glorious future where Bridges are actually measured). The above design isn't the most time-efficient, and it also turned out to be *super* unfun to implement/debug. For HS reliability, it could be a bit disastrous, depending on how much "shifting" happens between consensuses; (at least in BridgeDB's case) my testing showed that even a small differential meant that nearly the entire hashring would be in an unusable state.
FWIW, I was running a simulation of this algorithm with the first week of July's consensuses when Isis posted the following way smarter algorithm:
A better algorithm would be a Consistent Hashring, modified to dynamically allocate replications in proportion to fraction of total bandwidth weight. As with a normal Consistent Hashring, the number of replications determines the number of times the relay is uniformly inserted into the hashring.
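A sketch of that consistent-hashring variant, assuming bandwidth-proportional replication counts (the hash choice, function names, and the points_per_unit knob are illustrative assumptions, not specified in the thread):

```python
import bisect
import hashlib

def ring_point(key):
    """Map a key to a point on the ring (a 160-bit integer)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

def build_consistent_ring(relays, points_per_unit=1):
    """Insert each relay at multiple pseudorandom ring positions.

    relays: list of (name, bandwidth) pairs. The replication count is
    proportional to bandwidth (points_per_unit is a tuning knob).
    Returns a sorted list of (point, name) pairs.
    """
    ring = []
    for name, bw in relays:
        replications = max(1, int(bw * points_per_unit))
        for i in range(replications):
            ring.append((ring_point("%s:%d" % (name, i)), name))
    ring.sort()
    return ring

def lookup(ring, key):
    """Return the relay at the first ring point at or after the key,
    wrapping around the end of the ring."""
    points = [p for p, _ in ring]
    i = bisect.bisect_left(points, ring_point(key)) % len(ring)
    return ring[i][1]
```

The stability win comes from the insertion points being fixed per relay: adding or removing one relay only changes ownership of the arcs adjacent to that relay's own points, instead of shifting every boundary as in the interval scheme.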
So I simulated this one also (with one exception: I didn't scale the number of replications by the total bandwidth. Instead, each HSDir simply gets a number of ring locations proportional to its measured bandwidth; the scaling should happen automatically.) The simulation works as follows: pick 10000 hash ring locations, and for each location, track how many distinct relays would be responsible for that location over one calendar day. The simulation was run for 7/1/15 through 7/7/15. With Aaron's algorithm, the average hash ring location mapped to 9.96 distinct relays each day; with Isis' consistent hash ring approach, the average location mapped to 1.41 distinct relays each day.
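The per-location stability metric can be expressed generically, with the lookup rule and per-consensus ring states passed in as parameters (these names are hypothetical, not from the actual simulation code):

```python
def average_distinct_owners(snapshots, sample_points, lookup):
    """Measure hashring churn across a sequence of consensuses.

    snapshots: one ring state per consensus in the window;
    sample_points: the hash-ring locations being probed;
    lookup(ring, point): returns the relay responsible for a point.
    Returns the mean number of distinct responsible relays per point.
    """
    total = 0
    for pt in sample_points:
        owners = {lookup(ring, pt) for ring in snapshots}
        total += len(owners)
    return total / len(sample_points)
```

A perfectly stable scheme would score 1.0 (one owner per location all day); higher numbers mean more locations changing hands between consensuses, which is the difference the 9.96 vs. 1.41 figures capture.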