[tor-dev] Walking onions status update: week 3 notes

teor teor at riseup.net
Fri Mar 20 21:08:52 UTC 2020

Hi Nick,

> On 21 Mar 2020, at 05:38, Nick Mathewson <nickm at torproject.org> wrote:
> Walking Onions: week 3 update
> As you might recall, the SNIPs need to be nice and small, but they
> need to be completely self-contained.  The ENDIVE, on the other
> hand, needs to be diff-friendly, and compressible.  Relays need to
> derive all the SNIPs from the ENDIVE.  The ENDIVE and SNIP formats
> that I worked on during last week [SNIPFMT] was designed to meet these
> criteria.
> This week, I wrote up the algorithms to extract signed SNIPs from an
> ENDIVE.  This is in section 4 [RECONST] of the specs in my
> work-in-progress repository [GITREPO] This was mainly straightforward
> consequence from the work I did last week, but there were a few
> things that turned up.
> One trickier aspect of this work is that we want to move most of the
> decision-making to the authorities, and have the relays simply do an
> expansion algorithm.  Updating all the relays would be slow, so relays
> need to be able to handle new index types and data types without
> fully understanding them.  That means that we need to specify an
> algorithm that can work flexibly with many future kinds of data.
> Another tricky aspect is that the data which the relays need to put
> into SNIPs all needs to be signed, so all of that data needs to come
> out the same when relays calculate it as the when the authorities
> calculate it. Thus, all the algorithms for building Merkle trees and
> indices and SNIPRouterData bodies need to be deterministic.
> I think that I got all of this relatively right in [RECONST], but
> there are probably some mistakes hiding in there.
> I'm hoping that there is a cleaner alternative to the weighted-index
> reconstruction algorithm -- the one that's there right now puts a
> lot of requirements on the input data, in order to avoid
> floating-point operations.

There's a shorter encoding for Raw:

for each [i, pos1, pos2] in index_ranges:
  w = pos2 - pos1
  j  = the index of pos1 among all sorted pos1s
  new_encoding = [i, j, w]

[i, j, w] is an efficient encoding if index_ranges is sparse compared
with ENDIVERouterData, because:
* j has the same cardinality as I
* w is smaller than pos1 and pos2

If index_ranges is dense, there may be a more efficient encoding:
  add missing i with weight 0
  drop j

With this encoding, you can drop a few of the constraints.

There's a faster calculation and more efficient encoding for
Weighted, because there's a common factor in each POS()
(1 << 32) / total_weight

If we remove that factor, we end up with a much simpler algorithm,
that's also more flexible:

Algorithm: Expanding a "Weighted" indexspec.

Each weighted indexspec also has a multiplier, which may vary
between indexspecs.

Let total_weight = SUM(indexspec.index_weights)
Verify total_weight * multiplier <= UINT64_MAX.

Let total_so_far = 0.
Let result_idx = {} (an empty mapping).
Define POS(b) = b * multiplier

For 0 <= i < LEN(indexspec.indexweights):
   Let w = indexspec.indexweights[i].
   Let lo = POS(total_so_far).
   Let total_so_far = total_so_far + w.
   Let hi = POS(total_so_far) - 1.
   Append (lo, hi) => i to result_idx.

Return result_idx.

If multiplier is large, then the weights can be quite small.
(Small weights sacrifice some accuracy.) And if the multiplier
is 1, you effectively have a "Raw" index.

If you make those changes, you should be able to use a similar
process to expand all the different index types. (After multiplying,
truncating, or hashing, you either end up with a delta, or an
absolute position. You can turn deltas into absolute positions,
and then feed them all into the same base algorithm.)

There are also a few things I think might be bugs:

Was there meant to be a constraint that the Weighted total is
UINT64_MAX? Or close to UINT64_MAX?
The fixed parameters don't make much sense otherwise.

I think v2 and v3 onion services assign descriptors to the next
highest HSDir RSA id (or ed25519 hash). But
INDEX_FROM_RING_KEYS() uses the relay input as the lowest value.

There is no next member for the last member in
INDEX_FROM_RING_KEYS(), but the code asks for one.
(Perhaps there are some typos here that make the code hard to

We'll need special treatment for ring wrapping. (That is, 0xFF… is not
a real upper limit, it actually means, "use the lowest relay".)

It's weird to call a middle value a "suffix", but "infix" is also a bit of an
unusual word.

There are also a bunch of typos, let me know when you're ready for

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20200321/07bfa77a/attachment.html>

More information about the tor-dev mailing list