[tor-dev] Proposal: Merging Hidden Service Directories and Introduction Points

Tue Jul 28 01:59:09 UTC 2015

On Thu, Jul 23, 2015 at 11:56 PM, isis <isis at torproject.org> wrote:
> Aaron Johnson transcribed 2.1K bytes:
>> > > That seems easy to fix. Make the number of Introduction Points the same
>> > > as it was before and make them be selected in a bandwidth-weight
>> > > way. There is no cost to this. You need IPs to be online, and so
>> > > whatever number was used in the past will yield the same availability
>> > > now. And bandwidth-weighting should actually improve both performance
>> > > and security.
>> >
>> > Is it obvious how to build a bandwidth-weighted DHT that is stable
>> > across changes in the consensus?  One advantage of using the hash ring
>> > is that the loss of an HSDir causes only local changes in the
>> > topology, so if the client is using a different version of the
>> > consensus they can still locate one of the responsible HSDirs.  (Note:
>> > I do not claim this cannot be done; it just seems like an important
>> > detail to sort out…)
>>
>> You could reserve a space after each relay in the hash ring with a length
>> equal to the relay's bandwidth, and then assign an onion service with a hash
>> that is a fraction f of the maximum possible hash value to the relay owning
>> the space in which the fraction f of the total hash-ring length is
>> located. Removing or adding a relay adjusts the onion service locations by
>> an amount that is at most the fraction that is that relay’s total bandwidth
>> fraction. To ensure coverage for clients with older consensuses, the relay
>> can maintain HSDir+IPs at the locations indicated by both the current and
>> previous consensus.
>
> At one point, I wanted to do something like this for BridgeDB's hashrings, to
> be able to increase the probability that Bridges with higher bandwidth would
> be distributed (assuming we live in a glorious future where Bridges are
> actually measured).  The above design isn't the most time efficient, and it
> also turned out to be *super* unfun to implement/debug.  For HS reliability,
> it could be a bit disastrous, depending on how much "shifting" happens between
> consensuses, and (at least, for BridgeDB's case) my testing showed that even a
> small differential meant that nearly the entire hashring would be in an
> unusable state.

FWIW, I was running a simulation of this algorithm with the first week
of July's consensuses when Isis posted the following way smarter
algorithm:

> A better algorithm would be a Consistent Hashring, modified to dynamically
> allocate replications in proportion to fraction of total bandwidth weight. As
> with a normal Consistent Hashring, replications determine the number times the
> relay is uniformly inserted into the hashring.

So I simulated this one also (with one exception: I didn't scale the
number of replications by the total bandwidth.  Instead, each HSDir
simply gets a number of ring locations proportional to its measured
bandwidth.  The scaling should happen automatically.)  The simulation
works as follows: pick 10000 hash ring locations, and for each
location,  track how many distinct relays would be responsible for
that location for one calendar day.  The simulation was run for 7/1/15
through 7/7/15.  With Aaron's algorithm, the average hash ring
location mapped to 9.96 distinct relays each day; with Isis'
consistent hash ring approach, the average location mapped to 1.41
distinct relays each day.

-- 
------------------------------------------------------------------------
Nicholas Hopper
Associate Professor, Computer Science & Engineering
University of Minnesota
------------------------------------------------------------------------