On 2017-12-31 at 00:57:49 +0000, Alec Muffett <alec.muffett@gmail.com> wrote:
> Thanks! That's very interesting! TIL :-)
Why, if it isn’t instant feedback from the RFC 7686 co-author! In response to what you said, in brief: I will propose that any subdomain data (which is presumably human-readable) be transmitted in a separate or affixed string, leaving Bech32 to deal with the pseudorandom blobs. Technical details follow.
> What would you propose to do with subdomains, like www.facebookcorewwwi.onion? Or is that outside the scope of your proposal?
Good question. That had briefly occurred to me; but I couldn’t figure out any feasible means to stuff subdomains into the Bech32 string, for the following reasons:
(0) RFC 1034 DNS names may be up to 255 octets in length. But Bech32 strings are more length-limited. After subtracting an HRP of “onion” (5 chars), the required separator of “1”, and the 6 characters of ECC checksum in the data part, the 90-character total length limit can only spare up to 78 characters for the onion address data. For both v2 and v3 onions, that’s more than sufficient. But even if the length limit could be raised, an excessively long string would destroy the human-friendliness which is the raison d’être for Bech32.
(I *infer* that this last may be one reason for the length limit. Although of course I can’t say for certain, I’ve read Greg Maxwell discussing some of the user testing involved in the standard’s development; and 90 chars seems to me the extreme of what a mortal flesh-and-blood creature could handle with such a string.)
(1) Bech32 is a base-32 encoding, only with a different alphabet than RFC 4648. Thus, it would be necessary to design another layer of encoding to most efficiently represent subdomain labels and the dot-separator with an alphabet of 38 characters [-0-9a-z.]. Worse, depending on which standards an implementation follows or ignores, that is not really a strict limitation on names seen in the wild. How should the Bech32 transformation deal with names containing an underscore “_”? Or other characters? I think it would only be safe to go with full octets. This would severely exacerbate the problem of (0) above.
(Aside: The special alphabet is bound to raise some eyebrows; so I will here quote its rationale from BIP 173: “The character set is chosen to minimize ambiguity according to [this](https://hissa.nist.gov/~black/GTLD/) visual similarity data, and the ordering is chosen to minimize the number of pairs of similar characters (according to the same data) that differ in more than 1 bit. As the checksum is chosen to maximize detection capabilities for low numbers of bit errors, this choice improves its performance under some error models.” From what I understand, a large amount of CPU time was spent crunching over the data in search of the most error-resistant alphabet.)
(2) Most subdomains are human-memorable (in your example, “www”). Coding them with Bech32 would decrease human-friendliness, which is the precise opposite of my objective in making this suggestion. Bech32 is great for helping humans deal with pseudorandom blobs; for those, it improves upon RFC 4648 Base32, Base64, hexadecimal, or in Bitcoin’s case, the old base58-based address encoding. But it is absolutely inappropriate as a coding format for text which humans can easily read, type, and remember.
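To make the arithmetic in (0) and (1) concrete, here is a back-of-the-envelope sketch in Python. The 16- and 56-symbol figures are the base32 lengths of v2 and v3 onion addresses. The `subdomain_capacity` helper is hypothetical, and it ignores any framing overhead (a length prefix or terminator) that a real subdomain encoding would need, so the results are upper bounds.

```python
import math

BECH32_MAX_LEN = 90   # total length limit from BIP 173
HRP = "onion"         # human-readable part assumed in this proposal
SEPARATOR_LEN = 1     # the mandatory "1"
CHECKSUM_LEN = 6      # checksum characters at the end of the data part

# Characters left for the payload, 5 bits each.
payload_chars = BECH32_MAX_LEN - len(HRP) - SEPARATOR_LEN - CHECKSUM_LEN
payload_bits = payload_chars * 5

# Onion address sizes in 5-bit symbols (equal to their base32 lengths).
ONION_CHARS = {"v2": 16, "v3": 56}


def subdomain_capacity(spare_chars):
    """Upper bound on subdomain characters that fit in spare_chars symbols.

    Returns (as full octets, as symbols from the 38-character
    alphabet [-0-9a-z.]).
    """
    spare_bits = spare_chars * 5
    as_octets = spare_bits // 8
    as_38_symbols = math.floor(spare_bits / math.log2(38))
    return as_octets, as_38_symbols


print(f"payload: {payload_chars} chars, {payload_bits} bits")
for version, used in ONION_CHARS.items():
    spare = payload_chars - used
    octets, dense = subdomain_capacity(spare)
    print(f"{version}: {spare} spare chars, "
          f"at most {octets} octets or {dense} chars of [-0-9a-z.]")
```

For a v3 onion this leaves 22 spare characters, which is room for at most 13 octets of subdomain text, far short of what RFC 1034 allows.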
It is also important to consider relative impact in common usage. I observe that most .onions do not use subdomains. I do think that it’s important to support this use case; but if tradeoffs must be made, then I would optimize more for making that pseudorandom blob less brittle in human hands.
For the foregoing reasons, I will propose that subdomain data, if any, be kept separate from the Bech32 coding. It may be either kept in a separate string, or somehow affixed with a special delimiter either before or after the Bech32 representation of the onion. Off-the-cuff, which of these looks best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong, if we want the browser’s address bar to translate it. I should think more about this.)
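For illustration only, here is how a client might take apart the candidate forms above. This is a sketch under my own assumptions: the `split_onion_name` function is hypothetical, the “:” delimiter is the tentative one from the examples, and the Bech32 side is recognized simply by its “onion1” prefix.

```python
def split_onion_name(text, delimiter=":"):
    """Split a delimited name into (subdomain labels, Bech32 string).

    Accepts 'labels:onion1...', 'onion1...:labels', or a bare 'onion1...'.
    The Bech32 part is not validated here, only located.
    """
    parts = text.split(delimiter)
    if len(parts) == 1:
        sub, blob = "", parts[0]
    elif len(parts) == 2:
        # Whichever side starts with the HRP plus separator is the blob.
        # If both sides do, the prefix form (labels first) wins.
        if parts[1].lower().startswith("onion1"):
            sub, blob = parts
        elif parts[0].lower().startswith("onion1"):
            blob, sub = parts
        else:
            raise ValueError("no Bech32 onion part found")
    else:
        raise ValueError("too many delimiters")
    if not blob.lower().startswith("onion1"):
        raise ValueError("no Bech32 onion part found")
    labels = sub.split(".") if sub else []
    return labels, blob


for candidate in (
    "www:onion19qzypww2zw3ykkkglr4tu9",
    "onion19qzypww2zw3ykkkglr4tu9:www",
    "another-level.www:onion19qzypww2zw3ykkkglr4tu9",
):
    print(split_onion_name(candidate))
```

Note that a subdomain label which itself begins with “onion1” makes the suffix form ambiguous, which may be one argument for putting the subdomain first.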
Finally, I think I should mention: Yes, “onion19qzypww2zw3ykkkglr4tu9” is not as pretty as “facebookcorewwwi.onion”. But few .onion sites have the compute power available to Facebook! Moreover, my proposal should apply to v3 onions—where nobody on Earth will be able to fully bruteforce out a human-memorable string.
I would advise users to stick to the DNS-style coding for facebookcorewwwi.onion, and take advantage of Bech32 as an alternative representation for http://yz7lpwfhhzcdyc5y.onion/ , http://5nca3wxl33tzlzj5.onion/ , and other such strings. Those are pure pain for users now, and it will only get worse when v3 onions get uptake. Error-correcting codes do not make the names any easier to read; but they certainly do help with the inevitable mistakes in all the use cases which involve voice, handwriting, manual typing, carrier pigeons, etc.
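As a rough sketch of the transformation, the following Python follows the checksum construction from the BIP 173 reference code. It assumes an HRP of “onion” and maps each RFC 4648 base32 character of the onion label directly to its 5-bit value; that mapping is my reading of the example above, whose 16 data characters (9qzypww2zw3ykkkg) correspond symbol for symbol to facebookcorewwwi. The function names are illustrative, and the validity check is simplified (it does not reject mixed case).

```python
RFC4648_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)


def polymod(values):
    """Bech32 checksum polynomial over a sequence of 5-bit values."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp):
    """Spread the HRP into 5-bit values so the checksum covers it."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def create_checksum(hrp, data):
    """Compute the six 5-bit checksum values for hrp and data."""
    values = hrp_expand(hrp) + list(data)
    mod = polymod(values + [0] * 6) ^ 1
    return [(mod >> 5 * (5 - i)) & 31 for i in range(6)]


def bech32_encode(hrp, data):
    """Encode 5-bit values as hrp + '1' + data + checksum."""
    combined = list(data) + create_checksum(hrp, data)
    return hrp + "1" + "".join(BECH32_CHARSET[d] for d in combined)


def bech32_is_valid(text):
    """Check length, character set, and checksum (case check omitted)."""
    text = text.lower()
    pos = text.rfind("1")
    if pos < 1 or pos + 7 > len(text) or len(text) > 90:
        return False
    hrp, data_part = text[:pos], text[pos + 1:]
    if any(c not in BECH32_CHARSET for c in data_part):
        return False
    data = [BECH32_CHARSET.index(c) for c in data_part]
    return polymod(hrp_expand(hrp) + data) == 1


def onion_label_to_bech32(label, hrp="onion"):
    """Re-encode a base32 onion label (without '.onion') as Bech32."""
    data = [RFC4648_ALPHABET.index(c) for c in label.lower()]
    return bech32_encode(hrp, data)


encoded = onion_label_to_bech32("facebookcorewwwi")
print(encoded)
print(bech32_is_valid(encoded))

# A single mistyped character is caught by the checksum.
typo = encoded[:-1] + ("q" if encoded[-1] != "q" else "p")
print(bech32_is_valid(typo))
```

The same function applies unchanged to a 56-character v3 label, since both encodings carry 5 bits per character and only the alphabet and the appended checksum differ.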