[tor-dev] Walking onions status update: week 3 notes

Nick Mathewson nickm at freehaven.net
Sun Mar 29 21:16:15 UTC 2020


On Fri, Mar 20, 2020 at 5:09 PM teor <teor at riseup.net> wrote:
>
> Hi Nick,
>
> On 21 Mar 2020, at 05:38, Nick Mathewson <nickm at torproject.org> wrote:
>
> Walking Onions: week 3 update
>
> As you might recall, the SNIPs need to be nice and small, but they
> need to be completely self-contained.  The ENDIVE, on the other
> hand, needs to be diff-friendly, and compressible.  Relays need to
> derive all the SNIPs from the ENDIVE.  The ENDIVE and SNIP formats
> that I worked on last week [SNIPFMT] were designed to meet these
> criteria.
>
> This week, I wrote up the algorithms to extract signed SNIPs from an
> ENDIVE.  This is in section 4 [RECONST] of the specs in my
> work-in-progress repository [GITREPO].  This was mainly a
> straightforward consequence of the work I did last week, but a few
> things turned up.
>
> One trickier aspect of this work is that we want to move most of the
> decision-making to the authorities, and have the relays simply do an
> expansion algorithm.  Updating all the relays would be slow, so relays
> need to be able to handle new index types and data types without
> fully understanding them.  That means that we need to specify an
> algorithm that can work flexibly with many future kinds of data.
>
> Another tricky aspect is that the data which the relays need to put
> into SNIPs all needs to be signed, so all of that data needs to come
> out the same when the relays calculate it as when the authorities
> calculate it. Thus, all the algorithms for building Merkle trees and
> indices and SNIPRouterData bodies need to be deterministic.
>
> I think that I got all of this relatively right in [RECONST], but
> there are probably some mistakes hiding in there.
>
> I'm hoping that there is a cleaner alternative to the weighted-index
> reconstruction algorithm -- the one that's there right now puts a
> lot of requirements on the input data, in order to avoid
> floating-point operations.
>
>
> There's a shorter encoding for Raw:
>
> for each [i, pos1, pos2] in index_ranges:
>   w = pos2 - pos1
>   j  = the index of pos1 among all sorted pos1s
>   new_encoding = [i, j, w]

This will work when w is an integer that we can work with additively,
but not so well when we're talking about something more like a hash
ring.  I think it makes sense to have two different possible encodings
here, though.  I'm calling this one "RawNumeric".
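
For concreteness, here's a rough Python sketch of how that encoding
could round-trip, assuming the ranges are contiguous (each pos1 is the
previous range's pos2).  The function names are mine, not spec text:

    def encode_raw_numeric(index_ranges):
        """Turn [i, pos1, pos2] triples into [i, j, w] triples."""
        sorted_pos1s = sorted(pos1 for _i, pos1, _pos2 in index_ranges)
        out = []
        for i, pos1, pos2 in index_ranges:
            w = pos2 - pos1               # width replaces two positions
            j = sorted_pos1s.index(pos1)  # rank of pos1 among all pos1s
            out.append((i, j, w))
        return out

    def decode_raw_numeric(encoded):
        """Rebuild [i, pos1, pos2]; relies on the ranges being
        contiguous, so positions are running sums of widths."""
        result = []
        pos = 0
        for i, _j, w in sorted(encoded, key=lambda t: t[1]):  # j order
            result.append((i, pos, pos + w))
            pos += w
        return result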

> [i, j, w] is an efficient encoding if index_ranges is sparse compared
> with ENDIVERouterData, because:
> * j has the same cardinality as i
> * w is smaller than pos1 and pos2
>
> If index_ranges is dense, there may be a more efficient encoding:
>   add missing i with weight 0
>   drop j
> With this encoding, you can drop a few of the constraints.
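
If I'm reading the dense variant right, it amounts to storing just a
width per index, with zero for indices that have no range -- something
like this hypothetical helper:

    def encode_dense(index_ranges, n_indices):
        """Store one width per index i; 0 marks an unused index.
        Hypothetical sketch of the dense variant, not spec text."""
        widths = [0] * n_indices
        for i, pos1, pos2 in index_ranges:
            widths[i] = pos2 - pos1
        return widths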


> There's a faster calculation and more efficient encoding for
> Weighted, because there's a common factor in each POS()
> calculation:
> (1 << 32) / total_weight
>
> If we remove that factor, we end up with a much simpler algorithm,
> that's also more flexible:
>
> Algorithm: Expanding a "Weighted" indexspec.
>
> Each weighted indexspec also has a multiplier, which may vary
> between indexspecs.
>
> Let total_weight = SUM(indexspec.index_weights)
> Verify total_weight * multiplier <= UINT64_MAX.
>
> Let total_so_far = 0.
> Let result_idx = {} (an empty mapping).
> Define POS(b) = b * multiplier.
>
> For 0 <= i < LEN(indexspec.index_weights):
>    Let w = indexspec.index_weights[i].
>    Let lo = POS(total_so_far).
>    Let total_so_far = total_so_far + w.
>    Let hi = POS(total_so_far) - 1.
>    Append (lo, hi) => i to result_idx.
>
> Return result_idx.

I'm not so sure about this one -- see the note below about the total value.
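
For reference, here's that expansion transcribed into Python as I read
it -- a sketch only, with names following the pseudocode:

    UINT64_MAX = 2**64 - 1

    def expand_weighted(index_weights, multiplier):
        """Expand a Weighted indexspec into a {(lo, hi): i} mapping,
        following the pseudocode above.  Sketch; minimal checking."""
        total_weight = sum(index_weights)
        assert total_weight * multiplier <= UINT64_MAX

        result_idx = {}
        total_so_far = 0
        for i, w in enumerate(index_weights):
            lo = total_so_far * multiplier      # POS(total_so_far)
            total_so_far += w
            hi = total_so_far * multiplier - 1  # POS(total_so_far) - 1
            result_idx[(lo, hi)] = i
        return result_idx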

> If multiplier is large, then the weights can be quite small.
> (Small weights sacrifice some accuracy.) And if the multiplier
> is 1, you effectively have a "Raw" index.
>
>
> If you make those changes, you should be able to use a similar
> process to expand all the different index types. (After multiplying,
> truncating, or hashing, you either end up with a delta, or an
> absolute position. You can turn deltas into absolute positions,
> and then feed them all into the same base algorithm.)
>
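
The delta-to-absolute step is just a running sum; sketching it to make
sure I follow (nonnegative integer deltas assumed):

    def deltas_to_positions(deltas):
        """Turn deltas into absolute positions with a running sum, so
        every index type can feed the same base algorithm.  Sketch."""
        positions = []
        pos = 0
        for d in deltas:
            pos += d
            positions.append(pos)
        return positions
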
> There are also a few things I think might be bugs:
>
> Was there meant to be a constraint that the Weighted total is
> UINT64_MAX? Or close to UINT64_MAX?

The actual intention is that it has to be "= UINT32_MAX", not
"<= UINT32_MAX".  Remember that the client picks its next relay by
specifying a random index value in the range 0..UINT32_MAX, and it
expects that the relay it is extending to _will_ have a SNIP that
covers that value.
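
In other words, the expanded ranges have to tile 0..UINT32_MAX exactly,
with no gaps or overlaps.  Something like this hypothetical check
captures the invariant:

    UINT32_MAX = 2**32 - 1

    def covers_index_space(result_idx):
        """True if the (lo, hi) ranges exactly tile 0..UINT32_MAX, so
        any 32-bit index a client picks lands in exactly one range.
        A sketch of the invariant, not spec text."""
        expected_lo = 0
        for lo, hi in sorted(result_idx):
            if lo != expected_lo:
                return False        # gap or overlap at this boundary
            expected_lo = hi + 1
        return expected_lo == UINT32_MAX + 1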

> The fixed parameters don't make much sense otherwise.
>
>
> I think v2 and v3 onion services assign descriptors to the next
> highest HSDir RSA id (or ed25519 hash). But
> INDEX_FROM_RING_KEYS() uses the relay input as the lowest value.
>
> There is no next member for the last member in
> INDEX_FROM_RING_KEYS(), but the code asks for one.
> (Perhaps there are some typos here that make the code hard to
> understand.)

Okay, I'll have to look into this.  I think you're probably right, and
you _are_ right about the ring wrapping issues.

> We'll need special treatment for ring wrapping. (That is, 0xFF… is not
> a real upper limit, it actually means, "use the lowest relay".)
>
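
Right -- the lookup has to wrap, roughly like this (a sketch with
made-up names; the real spec logic may differ):

    def ring_successor(desc_id, ring_ids):
        """Return the first ring member id >= desc_id, wrapping to the
        lowest member when desc_id is past the highest id."""
        for member_id in sorted(ring_ids):
            if member_id >= desc_id:
                return member_id
        return min(ring_ids)    # wrapped past the end: lowest relay
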
> It's weird to call a middle value a "suffix", but "infix" is also a bit of an
> unusual word.
>
> There are also a bunch of typos, let me know when you're ready for
> copy-editing.

Once I'm through with sections 3 and 5, I'm going to do a big revision
and consistency check of everything that's there before I move on to
sections 6-8.  Once that's done, I hope a copy edit will be more
helpful.

-- 
Nick

