[tor-dev] Error-Correcting Onions with Bech32

Alec Muffett alec.muffett at gmail.com
Sun Dec 31 11:46:28 UTC 2017

On 31 December 2017 at 02:46, nullius <nullius at nym.zone> wrote:

> For the foregoing reasons, I will propose that subdomain data, if any, be
> kept separate from the Bech32 coding.  It may be either kept in a separate
> string, or somehow affixed with a special delimiter either before or after
> the Bech32 representation of the onion.  Off-the-cuff, which of these looks
> best to you?
>         www:onion19qzypww2zw3ykkkglr4tu9
>         onion19qzypww2zw3ykkkglr4tu9:www
>         another-level.www:onion19qzypww2zw3ykkkglr4tu9
> (My choice of a delimiter here may be wrong, if we want for the browser’s
> address bar to translate it.  I should think more about this.)

I need to think about this more, and after coffee, but my first concerns
would be:

1) that having multiple representations of a site's onion address is likely
to break many/most sites, because of Host/Origin headers being complicated
enough already.

2) anything involving colons in any position
("https://onion19qzypww2zw3ykkkglr4tu9:www/") is likely to break both
client-side-web-browsers and server-side-CMS-software unless they are
specially re-engineered for Tor, which is likely to inhibit use *of* Tor;
colons are a port-number separator in URLs, unless they come as part of an
IPv6 address in [square brackets].
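
To illustrate the colon concern, here is a minimal sketch using Python's
standard urllib.parse; it only shows how one common URL parser treats the
text after the colon, and the helper name describe() is my own, purely for
illustration. Other browsers and CMS stacks may fail in different ways.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (hostname, port) as a generic URL parser sees them.

    The port is reported as "invalid" when the text after the colon
    cannot be read as a number.
    """
    parts = urlsplit(url)
    try:
        port = parts.port          # raises ValueError for a non-numeric port
    except ValueError:
        port = "invalid"
    return parts.hostname, port


if __name__ == "__main__":
    for url in (
        "https://onion19qzypww2zw3ykkkglr4tu9:www/",   # subdomain after the colon
        "https://www:onion19qzypww2zw3ykkkglr4tu9/",   # subdomain before the colon
        "https://[::1]:8443/",                         # colons inside brackets are fine
    ):
        print(url, "->", describe(url))
```

In both colon forms the parser splits the name in two and tries to treat the
second half as a port number, so the site never sees the intended host.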

My general sense is that:

a) if Onion addresses suddenly stop looking very-similar-to DNS addresses,
Tor risks returning to a world where special expertise is necessary to
build software for it, thereby harming growth/adoption

b) if Onion addresses have 2+ forms, one like the current
(www.4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad.onion) and the
other being apparently more human-usable because it contains a CRC, the one
which allows access to websites will win.

My expectation to date has been that the problem with
"4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad" is that
there is no place for the eyeball to rest when typing it in; as such I've
presumed that a canonical form, defined by Tor, would be something like:


 ...being N groups of M characters (where N and M can be argued, feel
free...) and where any unused characters within the 63-character
DNS-compliant budget can be used to implement a credit-card-like running
checksum or CRC, for quick client-side checks; eg: the URL bar can identify
that you are typing in an Onion address and leave it red-or-grey until you
type something which satisfies the checksum, before flinging it at
tor-daemon for attempted resolution.
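
As a rough sketch of the "N groups of M characters" idea, the following
splits an address into hyphen-separated groups and checks the result
against the 63-character DNS label limit. The function names and the
default group size of 8 are assumptions made for illustration, not
anything defined by Tor.

```python
DNS_LABEL_MAX = 63  # maximum length of a single DNS label


def group(address, size=8, sep="-"):
    """Split an address into groups of `size` characters joined by `sep`."""
    return sep.join(address[i:i + size] for i in range(0, len(address), size))


def ungroup(grouped, sep="-"):
    """Recover the plain address by dropping the separators."""
    return grouped.replace(sep, "")


def fits_dns_label(label):
    """True if the label is short enough and does not start or end with '-'."""
    return (
        0 < len(label) <= DNS_LABEL_MAX
        and not label.startswith("-")
        and not label.endswith("-")
    )


if __name__ == "__main__":
    address = "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad"
    for size in (4, 8, 12):
        grouped = group(address, size)
        print(size, grouped, len(grouped), fits_dns_label(grouped))
```

Smaller groups give the eye more places to rest but spend more of the
budget on hyphens, which is the trade-off to argue about when picking N
and M.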

Or, indeed, you could leave out the hyphens and do the same; the Prop224
Onion address is 59 characters, leaving a budget of 63-59==4 characters or
20 bits; we could put these at the end, in the space marked "@@@@":


...and use those 20 bits to implement 5x 4-bit checksums over 12-character
chunks


...so that any UX component which wants to help the user can highlight (in
red? or bold?) where the problem is, picking out a chunk of 12 characters
which contain the typo:


Spot the errant 'j'.
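
A minimal sketch of the per-chunk idea, assuming a base32 alphabet, chunks
of 12 characters, and a made-up 4-bit checksum (an odd-weighted sum modulo
16). The checksum function and all names here are placeholders of mine; a
real design would want a more carefully chosen code.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"  # base32, 5 bits per character
CHUNK = 12                                     # characters covered by each checksum


def chunks(address):
    """Split the address into consecutive chunks of CHUNK characters."""
    return [address[i:i + CHUNK] for i in range(0, len(address), CHUNK)]


def nibble(chunk):
    """Hypothetical 4-bit checksum: odd-weighted sum of character values mod 16."""
    return sum((2 * i + 1) * ALPHABET.index(c) for i, c in enumerate(chunk)) % 16


def encode_check(address):
    """Pack five 4-bit checksums (20 bits) into four base32 characters."""
    nibbles = [nibble(c) for c in chunks(address)]
    if len(nibbles) != 5:
        raise ValueError("this sketch expects exactly five chunks")
    value = 0
    for n in nibbles:
        value = (value << 4) | n
    return "".join(ALPHABET[(value >> shift) & 31] for shift in (15, 10, 5, 0))


def bad_chunks(address, check):
    """Return the indexes of chunks whose checksum does not match `check`."""
    value = 0
    for c in check:
        value = (value << 5) | ALPHABET.index(c)
    expected = [(value >> shift) & 15 for shift in (16, 12, 8, 4, 0)]
    return [i for i, (c, e) in enumerate(zip(chunks(address), expected))
            if nibble(c) != e]


if __name__ == "__main__":
    good = "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad"
    check = encode_check(good)
    typo = good[:14] + "j" + good[15:]   # one mistyped character
    print(good + check, bad_chunks(good, check))
    print(typo + check, bad_chunks(typo, check))
```

A UX component could use the returned chunk indexes to decide which 12
characters to highlight. With this particular toy checksum a single wrong
character slips through only when its base32 value differs from the
correct one by exactly 16.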

The advantage of a system like this is that a typo mostly has to happen
twice, and be quite fortunate, to go undetected.

Of course it's not perfect, but nothing will be, and clever selection of
checksum and encoding will result in something which is still DNS- and

