[tor-dev] Error-Correcting Onions with Bech32

nullius nullius at nym.zone
Sun Dec 31 02:46:00 UTC 2017

On 2017-12-31 at 00:57:49 +0000, Alec Muffett <alec.muffett at gmail.com> wrote:
>Thanks! That's very interesting!  TIL :-)

Why, if it isn’t instant feedback from the RFC 7686 co-author!  In 
response to what you said, in brief:  I will propose that any subdomain 
data (which is presumably human-readable) be transmitted in a separate 
or affixed string, leaving Bech32 to deal with the pseudorandom blobs.  
Technical details follow.

>What would you propose to do with subdomains, like 
>www.facebookcorewwwi.onion? Or is that outside the scope of your 

Good question.  That had briefly occurred to me; but I couldn’t figure 
out any feasible means to stuff subdomains into the Bech32 string, for 
the following reasons:

(0) RFC 1034 DNS names may be up to 255 octets in length.  But Bech32 
strings are more length-limited.  After subtracting an HRP of “onion” (5 
chars), the required separator of “1”, and the 6 characters of ECC 
checksum in the data part, the 90-character total length limit can only 
spare up to 78 characters for the onion address data.  For both v2 and 
v3 onions, that’s more than sufficient.  But even if the length limit 
could be raised, an excessively long string would destroy the 
human-friendliness which is the raison d’être for Bech32.

(I *infer* that this last may be one reason for the length limit.  
Although of course I can’t say for certain, I’ve read Greg Maxwell 
discussing some of the user testing involved in the standard’s 
development; and 90 chars seems to me the extreme of what a mortal 
flesh-and-blood creature could handle with such a string.)

(1) Bech32 is a base-32 encoding, only with a different alphabet than 
RFC 4648.  Thus, it would be necessary to design another layer of 
encoding to most efficiently represent subdomain labels and the 
dot-separator with an alphabet of 38 characters [-0-9a-z.].  Worse, 
depending on which standards an implementation follows or ignores, that 
is not really a strict limitation on names seen in the wild.  How should 
the Bech32 transformation deal with names containing an underscore “_”?  
Or other characters?  I think it would only be safe to go with full 
octets.  This would severely exacerbate the problem of (0) above.

(Aside:  The special alphabet is bound to raise some eyebrows; so I will 
here quote its rationale from BIP 173:  “The character set is chosen to 
minimize ambiguity according to [this](https://hissa.nist.gov/~black/GTLD/) 
visual similarity data, and the ordering is chosen to minimize the 
number of pairs of similar characters (according to the same data) that 
differ in more than 1 bit.  As the checksum is chosen to maximize 
detection capabilities for low numbers of bit errors, this choice 
improves its performance under some error models.”  From what I 
understand, a large amount of CPU time was spent crunching over the data 
in search of the most error-resistant alphabet.)

(2) Most subdomains are human-memorable—in your example, “www”.  Coding 
them with Bech32 would decrease human-friendliness, which is the precise 
opposite of my objective in making this suggestion.  Bech32 is great for 
helping humans deal with pseudorandom blobs; for those, it improves upon 
RFC 4648 Base32, Base64, hexadecimal, or in Bitcoin’s case, the old 
base58-based address encoding.  But it is absolutely inappropriate as a 
coding format for text which humans can easily read, type, and remember.
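
To put rough numbers on points (0) and (1), here is a minimal Python 
sketch of the length budget.  The 16- and 56-character sizes for v2 and 
v3 addresses and the helper name max_name_chars are assumptions added 
for illustration; the other figures are the ones quoted above.

```python
import math

# Figures quoted in the message above.
BECH32_MAX_LEN = 90   # total length limit of a Bech32 string
HRP = "onion"         # human-readable part proposed in this thread
SEPARATOR_LEN = 1     # the mandatory "1" separator
CHECKSUM_LEN = 6      # checksum characters at the end of the data part
BITS_PER_CHAR = 5     # Bech32 is a base-32 encoding

payload_chars = BECH32_MAX_LEN - len(HRP) - SEPARATOR_LEN - CHECKSUM_LEN
payload_bits = payload_chars * BITS_PER_CHAR

# Assumption: onion addresses in their usual base32 form are
# 16 characters (v2) and 56 characters (v3), 5 bits per character.
V2_CHARS, V3_CHARS = 16, 56
spare_bits_v2 = payload_bits - V2_CHARS * BITS_PER_CHAR
spare_bits_v3 = payload_bits - V3_CHARS * BITS_PER_CHAR


def max_name_chars(spare_bits, bits_per_symbol):
    """Hypothetical helper: whole name symbols that fit in spare_bits."""
    return int(spare_bits // bits_per_symbol)


# Ideal dense packing over the 38-symbol alphabet [-0-9a-z.],
# compared with storing each name character as a full octet.
dense_bits = math.log2(38)

print("payload:", payload_chars, "chars,", payload_bits, "bits")
for label, spare in (("v2", spare_bits_v2), ("v3", spare_bits_v3)):
    print(label,
          "dense:", max_name_chars(spare, dense_bits),
          "octets:", max_name_chars(spare, 8))
```

Under these assumptions, a v3 address leaves room for only about 20 
characters of subdomain text with ideal 38-symbol packing, and about 13 
with full octets.  Either way, that is far short of what RFC 1034 allows.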

It is also important to consider relative impact in common usage.  I 
observe that most .onions do not use subdomains.  I do think that it’s 
important to support this use case; but if tradeoffs must be made, then 
I would optimize more for making that pseudorandom blob less brittle in 
human hands.

For the foregoing reasons, I will propose that subdomain data, if any, 
be kept separate from the Bech32 coding.  It may be either kept in a 
separate string, or somehow affixed with a special delimiter either 
before or after the Bech32 representation of the onion.  Off-the-cuff, 
which of these looks best to you?




(My choice of a delimiter here may be wrong, if we want the browser’s 
address bar to translate it.  I should think more about this.)

Finally, I think I should mention:  Yes, “onion19qzypww2zw3ykkkglr4tu9” 
is not as pretty as “facebookcorewwwi.onion”.  But few .onion sites have 
the compute power available to Facebook!  Moreover, my proposal should 
apply to v3 onions—where nobody on Earth will be able to fully 
bruteforce out a human-memorable string.
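
For anyone who wants to experiment, here is a minimal Python sketch of 
how such a string can be produced.  It follows the checksum algorithm 
published in BIP 173.  The HRP of “onion”, the direct mapping of each 
RFC 4648 base32 symbol of a v2 address to one 5-bit Bech32 symbol, and 
the helper name onion_to_bech32 are assumptions for illustration, not a 
specification.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # Bech32 alphabet (BIP 173)
RFC4648 = "abcdefghijklmnopqrstuvwxyz234567"  # base32 alphabet of .onion


def polymod(values):
    """BCH checksum state over a list of 5-bit values (BIP 173)."""
    gen = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= gen[i]
    return chk


def hrp_expand(hrp):
    """Spread the human-readable part into 5-bit values for the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def encode(hrp, data):
    """Build hrp + "1" + data symbols + six checksum symbols."""
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)


def verify(s):
    """Return True if the checksum of a lowercase Bech32 string is valid."""
    hrp, _, rest = s.rpartition("1")
    if not hrp or len(rest) < 6 or any(c not in CHARSET for c in rest):
        return False
    return polymod(hrp_expand(hrp) + [CHARSET.index(c) for c in rest]) == 1


def onion_to_bech32(label, hrp="onion"):
    """Hypothetical helper: re-code a base32 onion label symbol by symbol."""
    return encode(hrp, [RFC4648.index(c) for c in label])


addr = onion_to_bech32("facebookcorewwwi")
# Simulate a one-character typo in the data part.
typo = addr[:10] + ("q" if addr[10] != "q" else "p") + addr[11:]
print(addr, verify(addr))
print(typo, verify(typo))
```

With this mapping the 16 data characters come out as 
“9qzypww2zw3ykkkg”, which matches the example above; I have not 
hand-checked the six checksum characters.  The mistyped copy fails 
verification, since the code is designed to detect any error affecting 
up to four characters.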

I would advise users to stick to the DNS-style coding for 
facebookcorewwwi.onion, and take advantage of Bech32 as an alternative 
representation for http://yz7lpwfhhzcdyc5y.onion/ , 
http://5nca3wxl33tzlzj5.onion/ , and other such strings.  Those are pure 
pain for users now, and it will only get worse when v3 onions get uptake.  
Error-correcting codes do not make the names any easier to read; but 
they certainly do help with the inevitable mistakes in all the use cases 
which involve voice, handwriting, manual typing, carrier pigeons, etc.

nullius at nym.zone | PGP ECC: 0xC2E91CD74A4C57A105F6C21B5A00591B2F307E0C
Bitcoin: bc1qcash96s5jqppzsp8hy8swkggf7f6agex98an7h | (Segwit nested:
“‘If you’re not doing anything wrong, you have nothing to hide.’
No!  Because I do nothing wrong, I have nothing to show.” — nullius