[tor-dev] Walking onions status update: week 3 notes

teor teor at riseup.net
Fri Mar 20 21:08:52 UTC 2020


Hi Nick,

> On 21 Mar 2020, at 05:38, Nick Mathewson <nickm at torproject.org> wrote:
> 
> Walking Onions: week 3 update
> 
> As you might recall, the SNIPs need to be nice and small, but they
> need to be completely self-contained.  The ENDIVE, on the other
> hand, needs to be diff-friendly, and compressible.  Relays need to
> derive all the SNIPs from the ENDIVE.  The ENDIVE and SNIP formats
> that I worked on last week [SNIPFMT] were designed to meet these
> criteria.
> 
> This week, I wrote up the algorithms to extract signed SNIPs from an
> ENDIVE.  This is in section 4 [RECONST] of the specs in my
> work-in-progress repository [GITREPO].  This was mostly a
> straightforward consequence of the work I did last week, but a few
> things turned up.
> 
> One trickier aspect of this work is that we want to move most of the
> decision-making to the authorities, and have the relays simply do an
> expansion algorithm.  Updating all the relays would be slow, so relays
> need to be able to handle new index types and data types without
> fully understanding them.  That means that we need to specify an
> algorithm that can work flexibly with many future kinds of data.
> 
> Another tricky aspect is that the data which the relays need to put
> into SNIPs all needs to be signed, so all of that data needs to come
> out the same when relays calculate it as when the authorities
> calculate it. Thus, all the algorithms for building Merkle trees and
> indices and SNIPRouterData bodies need to be deterministic.
> 
> I think that I got all of this relatively right in [RECONST], but
> there are probably some mistakes hiding in there.
> 
> I'm hoping that there is a cleaner alternative to the weighted-index
> reconstruction algorithm -- the one that's there right now puts a
> lot of requirements on the input data, in order to avoid
> floating-point operations.

There's a shorter encoding for Raw:

for each [i, pos1, pos2] in index_ranges:
  w = pos2 - pos1
  j = the index of pos1 among all pos1 values, sorted
  new_encoding = [i, j, w]

[i, j, w] is an efficient encoding if index_ranges is sparse compared
with ENDIVERouterData, because:
* j has the same cardinality as i
* w is smaller than pos1 and pos2

If index_ranges is dense, there may be a more efficient encoding:
  add missing i with weight 0
  drop j

With this encoding, you can drop a few of the constraints.
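
A minimal sketch of that re-encoding (the function names and the
`index_ranges` structure are mine, not from the spec; decoding assumes
the ranges are contiguous when sorted by position, so that each pos1 is
the running sum of the earlier widths — which the shorter encoding
seems to rely on):

```python
def encode_raw(index_ranges):
    """Re-encode [i, pos1, pos2] triples as [i, j, w] triples.

    index_ranges: list of [i, pos1, pos2] with non-overlapping
    [pos1, pos2) ranges.
    """
    sorted_pos1s = sorted(p1 for _, p1, _ in index_ranges)
    encoded = []
    for i, pos1, pos2 in index_ranges:
        w = pos2 - pos1                # range width replaces pos2
        j = sorted_pos1s.index(pos1)   # rank of pos1 replaces pos1
        encoded.append([i, j, w])
    return encoded

def decode_raw(encoded):
    """Recover [i, pos1, pos2] triples, assuming the ranges tile the
    space contiguously: each range starts where the previous ends."""
    by_j = sorted(encoded, key=lambda t: t[1])
    pos = 0
    ranges = []
    for i, _, w in by_j:
        ranges.append([i, pos, pos + w])
        pos += w
    return ranges
```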

There's a faster calculation and more efficient encoding for
Weighted, because there's a common factor in each POS()
calculation:
(1 << 32) / total_weight

If we remove that factor, we end up with a much simpler algorithm,
that's also more flexible:

Algorithm: Expanding a "Weighted" indexspec.

Each weighted indexspec also has a multiplier, which may vary
between indexspecs.

Let total_weight = SUM(indexspec.index_weights)
Verify total_weight * multiplier <= UINT64_MAX.

Let total_so_far = 0.
Let result_idx = {} (an empty mapping).
Define POS(b) = b * multiplier

For 0 <= i < LEN(indexspec.index_weights):
   Let w = indexspec.index_weights[i].
   Let lo = POS(total_so_far).
   Let total_so_far = total_so_far + w.
   Let hi = POS(total_so_far) - 1.
   Append (lo, hi) => i to result_idx.

Return result_idx.

If multiplier is large, then the weights can be quite small.
(Small weights sacrifice some accuracy.) And if the multiplier
is 1, you effectively have a "Raw" index.
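
A direct transcription of the algorithm above into Python (the
function and parameter names are mine; the overflow check mirrors the
`total_weight * multiplier <= UINT64_MAX` verification step):

```python
def expand_weighted(index_weights, multiplier):
    """Expand a "Weighted" indexspec into {(lo, hi): i} ranges,
    with POS(b) = b * multiplier, as sketched above."""
    UINT64_MAX = 2**64 - 1
    total_weight = sum(index_weights)
    assert total_weight * multiplier <= UINT64_MAX

    total_so_far = 0
    result_idx = {}
    for i, w in enumerate(index_weights):
        lo = total_so_far * multiplier        # POS(total_so_far)
        total_so_far += w
        hi = total_so_far * multiplier - 1    # POS(total_so_far) - 1
        result_idx[(lo, hi)] = i
    return result_idx
```

With multiplier 1, each weight maps to a range of exactly its own
width, which is the "Raw" behaviour described above.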

If you make those changes, you should be able to use a similar
process to expand all the different index types. (After multiplying,
truncating, or hashing, you either end up with a delta, or an
absolute position. You can turn deltas into absolute positions,
and then feed them all into the same base algorithm.)
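
The delta-to-absolute step and the shared base algorithm could look
roughly like this (names are illustrative; the idea is only that every
index type reduces to a sorted list of absolute end positions):

```python
def deltas_to_positions(deltas):
    """Turn a list of deltas into absolute end positions
    (running sums)."""
    positions, total = [], 0
    for d in deltas:
        total += d
        positions.append(total)
    return positions

def ranges_from_positions(positions):
    """Base algorithm shared by all index types: turn sorted absolute
    end positions into (lo, hi) => i ranges."""
    result, lo = {}, 0
    for i, pos in enumerate(positions):
        result[(lo, pos - 1)] = i
        lo = pos
    return result
```

A weighted index would feed in `w * multiplier` deltas; a raw index
already has absolute positions and skips the first step.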

There are also a few things I think might be bugs:

Was there meant to be a constraint that the Weighted total is
UINT64_MAX? Or close to UINT64_MAX?
The fixed parameters don't make much sense otherwise.

I think v2 and v3 onion services assign descriptors to the next
highest HSDir RSA id (or ed25519 hash). But
INDEX_FROM_RING_KEYS() uses the relay input as the lowest value.

There is no next member for the last member in
INDEX_FROM_RING_KEYS(), but the code asks for one.
(Perhaps there are some typos here that make the code hard to
understand.)

We'll need special treatment for ring wrapping. (That is, 0xFF… is not
a real upper limit, it actually means, "use the lowest relay".)
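
For illustration, a next-highest ring lookup with wraparound might
look like this (a sketch, not the spec's INDEX_FROM_RING_KEYS; it
assumes "next highest" means the member at or after the target, and
that past the top of the ring you wrap to the lowest member):

```python
import bisect

def next_highest(ring_ids, target):
    """Return the ring member at or after target, wrapping past the
    top of the ring to the lowest member (so 0xFF... effectively
    maps to the lowest relay). ring_ids must be sorted."""
    i = bisect.bisect_left(ring_ids, target)
    return ring_ids[i % len(ring_ids)]
```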

It's weird to call a middle value a "suffix", but "infix" is also a bit of an
unusual word.

There are also a bunch of typos; let me know when you're ready for
copy-editing.

T