[tor-dev] Walking Onions status update: week 2 notes

teor teor at riseup.net
Sat Mar 14 04:44:15 UTC 2020

Hi Nick,

I'm interested in following along with Walking Onions, but I might
drop out when the relay IPv6 work gets busy.

I'm not sure how you'd like feedback, so I'm going to try to put it
in emails, or in pull requests.

(I made one comment on a git commit in walking-onions-wip, but
I'm not sure if you see those, so I'll repeat it here.)

> On 14 Mar 2020, at 03:52, Nick Mathewson <nickm at torproject.org> wrote:
> This week, I worked on specifying the nitty-gritty of the SNIP and
> ENDIVE document formats.  I used the CBOR meta-format [CBOR] to
> build them, and the CDDL specification language [CDDL] to specify
> what they should contain.
> As before, I've been working in a git repository at [GITREPO]; you
> can see the document I've been focusing on this week at
> [SNIPFMT].  (That's the thing to read if you want to send me
> patches for my grammar.)

I'm not sure if you've got to exit ports yet, but here's one possible
way to partition ports:
* choose large partitions so that all exits support all ports in the
  partition
* choose smaller categories so that most exits support most ports
  in the partition
* ignore small partitions, they're bad for client privacy anyway

For example, you might end up with:
* web (80 & 443)
* interactive (SSH, IRC, etc.)
* bulk (torrent, etc.)
* default exit policy
* reduced exit policy
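
As a rough sketch of how candidate partitions could be evaluated, the
snippet below computes, for each partition, the fraction of exits that
allow every port in it, and then drops partitions with too little
support. The function names, the 0.5 threshold, and the sample exits
are hypothetical and only for illustration; real exit policies also
involve addresses and accept/reject rules, which this ignores.

```python
def partition_support(exit_ports, partitions):
    """Map each partition name to the fraction of exits allowing all its ports.

    exit_ports: dict of exit name -> set of allowed ports (hypothetical input)
    partitions: dict of partition name -> set of ports in that partition
    """
    total = len(exit_ports)
    if total == 0:
        return {name: 0.0 for name in partitions}
    return {
        name: sum(1 for allowed in exit_ports.values() if ports <= allowed) / total
        for name, ports in partitions.items()
    }


def usable_partitions(support, min_fraction=0.5):
    """Keep only partitions supported by at least min_fraction of exits."""
    return sorted(name for name, frac in support.items() if frac >= min_fraction)


# Made-up example data.
exits = {
    "a": {80, 443, 22},
    "b": {80, 443},
    "c": {80, 443, 22, 6667},
    "d": {22},
}
partitions = {"web": {80, 443}, "interactive": {22, 6667}}
support = partition_support(exits, partitions)
```

Here "web" is supported by three of the four exits, while
"interactive" is supported by only one, so a 0.5 threshold would keep
"web" and ignore "interactive".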

I'm not sure if we will want separate categories for IPv4-only
and dual-stack policies. We can probably ignore IPv6-only
policies for the moment, but we should think about them in

> There were a few neat things to do here:
>   * I had to define SNIPs so that clients and relays can be
>     mostly agnostic about whether we're using a Merkle tree or a
>     bunch of signatures.
>   * I had to define a binary diff format so that relays can keep
>     on downloading diffs between ENDIVE documents. (Clients don't
>     download ENDIVEs).  I did a quick prototype of how to output
>     this format, using Python's difflib.

Can we make OrigBytesCmdId use start and length, rather than start
and end? The encoded length may be shorter than the encoded end
offset, and it will never be longer.

If we are doing chunk-based encoding, we could make start
relative to the last position in the original file. But that would
mean no back-tracking, which means we can't use some
more sophisticated diff algorithms.
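
To make the start-and-length idea concrete, here is a minimal sketch
built on Python's difflib. It is not the format from the spec: the
command tuples ("copy", start, length) and ("insert", data) and the
function names are hypothetical. The copy commands use absolute start
offsets, so a diff is free to back-track into earlier parts of the
original.

```python
from difflib import SequenceMatcher


def make_diff(old: bytes, new: bytes):
    """Build a list of ("copy", start, length) / ("insert", data) commands."""
    cmds = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Copy a run of bytes from the original, as (start, length).
            cmds.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that only exist in the new document are sent literally.
            cmds.append(("insert", new[j1:j2]))
        # "delete" needs no command: the bytes are simply never copied.
    return cmds


def apply_diff(old: bytes, cmds):
    """Rebuild the new document from the original and the command list."""
    out = bytearray()
    for cmd in cmds:
        if cmd[0] == "copy":
            _, start, length = cmd
            out += old[start:start + length]
        else:
            out += cmd[1]
    return bytes(out)


old_doc = b"relay A\nrelay B\nrelay C\n"
new_doc = b"relay A\nrelay X\nrelay C\n"
assert apply_diff(old_doc, make_diff(old_doc, new_doc)) == new_doc
```

With relative starts, a command list like
[("copy", 3, 3), ("copy", 0, 3)] could not be expressed, because the
second copy moves backwards in the original.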

>   * To make ENDIVE diffs as efficient as possible, it's important
>     not to transmit data that changes in every ENDIVE.  To this
>     end, I've specified ENDIVEs so that the most volatile parts
>     (Merkle trees and index ranges) are recomputed on the relay
>     side.  I still need to specify how these re-computations work,
>     but I'm pretty sure I got the formats right.
>     Doing this calculation should save relays a bunch of
>     bandwidth each hour, but cost some implementation complexity.
>     I'm going to have to come back to this choice going forward
>     to see whether it's worth it.
>   * Some object types are naturally extensible, some aren't.  I've
>     tried to err on the side of letting us expand important
>     things in the future, and using maps (key->value mappings)
>     for objects that are particularly important.
>     In CBOR, small integers are encoded with a little less space
>     than small strings.  To that end, I'm specifying the use of
>     small integers for dictionary keys that need to be encoded
>     briefly, and strings for non-tor and experimental extensions.
>   * This is a fine opportunity to re-think how we handle document
>     liveness.  Right now, consensus directories have an official
>     liveness interval on them, but parties that rely on
>     consensuses tolerate larger variance than is specified in the
>     consensus.  Instead of that approach, the usable lifetime of
>     each object is now specified in the object, and is ultimately
>     controlled by the authorities.  This gives the directory
>     authorities more ability to work around network tolerance
>     issues.
>     Having large lifetime tolerances in the context of walking
>     onions is a little risky: it opens us up to an attack where
>     a hostile relay holds multiple ENDIVEs, and decides which one
>     to use when responding to a request.  I think we can address this
>     attack, however, by making sure that SNIPs have a published
>     time in them, and that this time moves monotonically forward.

If the issue is having multiple valid ENDIVEs, then authorities could
also put a cap on the number of concurrently valid ENDIVEs.

There are two simple schemes to implement a cap:
* set a longer interval for rebuilding all ENDIVEs
  (the cap is the validity interval, divided by the rebuild interval)
* refuse to sign a new SNIP for a relay that's rapidly changing
  (or equivalently, leave that relay out of the next ENDIVE)

Both these schemes also limit the amount of bandwidth used
for a relay that's rapidly changing details.
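
A quick worked example of the first scheme, as a sketch only. It
assumes the cap is the validity interval divided by the rebuild
interval, that ENDIVEs are built on a fixed schedule, and that each
one becomes valid when it is built. The function name is hypothetical.

```python
import math
from fractions import Fraction


def max_concurrent_endives(validity_seconds: int, rebuild_seconds: int) -> int:
    """Upper bound on how many ENDIVEs can be valid at the same instant.

    Assumes one ENDIVE is built every rebuild_seconds, and each stays
    valid for validity_seconds from the moment it is built.
    """
    if validity_seconds <= 0 or rebuild_seconds <= 0:
        raise ValueError("intervals must be positive")
    # Fraction avoids floating-point rounding before taking the ceiling.
    return math.ceil(Fraction(validity_seconds, rebuild_seconds))


HOUR = 3600
hourly = max_concurrent_endives(3 * HOUR, HOUR)          # rebuilt every hour
three_hourly = max_concurrent_endives(3 * HOUR, 3 * HOUR)  # rebuilt every 3 hours
```

With a 3 hour validity interval, rebuilding hourly allows up to 3
concurrently valid ENDIVEs, and rebuilding every 3 hours allows 1.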

>   * As I work, I'm identifying other issues in tor that stand in
>     the way of a good efficient walking onion implementation that
>     will require other follow-up work.  This week I ran into a
>     need for non-TAP-based v2 hidden services, and a need for a
>     more efficient family encoding.  I'm keeping track of these
>     in my outline file.

Do "tricky restrictions" include the IP subnet restriction (avoid
relays in the same IPv4 /16 and IPv6 /32)?
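
For reference, the subnet check itself is simple. Here is a minimal
sketch using Python's ipaddress module; the function name is
hypothetical, and this is not how tor implements it. The hard part for
walking onions is applying a check like this when the client doesn't
hold the full relay list.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, v4_prefix: int = 16, v6_prefix: int = 32) -> bool:
    """Return True if both addresses fall in the same IPv4 /16 or IPv6 /32."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    if a.version != b.version:
        # An IPv4 address and an IPv6 address never share a subnet here.
        return False
    prefix = v4_prefix if a.version == 4 else v6_prefix
    # strict=False masks off the host bits of addr_a to get its network.
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return b in network
```

For example, 192.0.2.1 and 192.0.77.9 share a /16, so they conflict,
while 2001:db8::1 and 2001:db9::1 are in different /32s.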

What about a heterogeneous IPv4 / IPv6 network, where
IPv4-only relays can't connect to IPv6-only relays?

If we do decide to add IPv6-only relays, we'll probably add
them in this order:
* IPv6-only bridges (needs dual-stack bridge guards / middles?)
* IPv6-only exits (needs dual-stack middles)
* IPv6-only guards (needs dual-stack middles)
* IPv6-only middles (needs dual-stack or IPv6-only guards and
   exits, removes need for dual-stack middles)
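
The dependencies in that list can be sketched as a reachability check.
This is a simplification: it treats each relay as a set of address
families, and assumes two relays can connect whenever they share one.
It ignores connection direction and which addresses are actually
advertised. All names are hypothetical.

```python
IPV4, IPV6 = "ipv4", "ipv6"
IPV4_ONLY = frozenset({IPV4})
IPV6_ONLY = frozenset({IPV6})
DUAL_STACK = frozenset({IPV4, IPV6})


def can_connect(a, b):
    """Two relays can connect if they share at least one address family."""
    return bool(a & b)


def path_is_buildable(hops):
    """Check that every adjacent pair of hops in a circuit can connect."""
    return all(can_connect(x, y) for x, y in zip(hops, hops[1:]))


# An IPv6-only exit is reachable through a dual-stack middle...
ok_path = path_is_buildable([IPV4_ONLY, DUAL_STACK, IPV6_ONLY])
# ...but not through an IPv4-only middle.
bad_path = path_is_buildable([IPV4_ONLY, IPV4_ONLY, IPV6_ONLY])
```

This is why the earlier steps in the list need dual-stack middles: an
IPv4-only guard and an IPv6-only exit can only be joined by a relay
that speaks both.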

What about bridge guards?
(That is, can bridges add an extra hop into circuits, to protect
themselves from being discovered by middles?)

Maybe bridges could commit to their (blinded) bridge guards
in their own self-signed SNIP?
Or the bridge authority could distribute a bridge ENDIVE?
(We might need multiple bridge authorities for redundancy.)

> [CBOR] RFC 7049: "Concise Binary Object Representation (CBOR)"
>    https://tools.ietf.org/html/rfc7049
> [CDDL] RFC 8610: "Concise Data Definition Language (CDDL): A
>    Notational Convention to Express Concise Binary Object
>    Representation (CBOR) and JSON Data Structures"
>    https://tools.ietf.org/html/rfc8610
> [GITREPO]  https://github.com/nmathewson/walking-onions-wip
> [SNIPFMT] https://github.com/nmathewson/walking-onions-wip/blob/master/specs/02-endives-and-snips.md
