# Synopsis
The Bech32 standard for error-correcting base32 strings was developed explicitly for relative ease and reliability in human communication of pseudorandom bitstrings. I invite discussion of specifying Bech32 as an alternative means for representing RFC 7686 .onion domain names. Should the response hereto be positive, then I will offer a formal proposal.
I have written and released a tool which automatically recognizes and encodes/decodes .onion addresses in Bech32. To complement whatever I here say, please get a hands-on feel for Bech32 .onions:
https://github.com/nym-zone/bech32
Manpage (yes, a real manpage!): https://raw.githubusercontent.com/nym-zone/bech32/master/bech32.1.txt
# Background: About Bech32
Bech32 is specified by the Bitcoin BIP 173 standard,[1] co-authored by Pieter Wuille and Greg Maxwell. According to Mr. Maxwell, “Bech32 is designed for human use and basically nothing else”; the underlying research and development process involved extensive testing with human users, analysis of NIST visual confusability data, and the integration of a BCH code with strong error correction and detection properties.
[1] https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
I refer to BIP 173 for further explanation of Bech32’s design properties, its rationales, and the limits of its error handling.
A specific application of Bech32 is Bitcoin’s new address format for the future, which I call “Bravo Charlie Addresses” after the letters “bc” specified for Bitcoin addresses in the standard’s “human-readable part” (HRP). However, the standard was written to permit general use in other applications.
Having in hand a standard explicitly designed to ease the pain which wetware suffers when it comes into contact with pseudorandom gibberish, the cypherpunk in me is overjoyed at the potentials. One is a concept which I call “PGP Descriptors”, which I am currently working to specify with a few extra features and nuances. And of course, I think of .onions!
# Bech32 for .onion
I hereby nominate “onion” as the logical HRP for RFC 7686 .onion special-use domain names.
Here is Bech32 .onion by example, using my bech32 tool with its built-in .onion support to encode and decode the name for the Tor Project’s .onion equivalent of its “www” site:
```
$ bech32 -e expyuzz4wqqyqhjn.onion
onion1yh0c5eeuksscs8fdyd8406
$ bech32 -d onion1yh0c5eeuksscs8fdyd8406
expyuzz4wqqyqhjn.onion
```
The string is longer, because it contains 6 base32 characters’ worth of error-correcting code. N.b. also, the foregoing should work just fine for v3 onions (formerly prop-224).
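For readers without the tool at hand, here is a minimal Python sketch of the transformation. It assumes that each 5-bit group of the RFC 4648 base32 label is re-mapped directly onto the Bech32 alphabet, and that the six checksum characters are computed as in BIP 173 with the HRP “onion”. The function names are illustrative only; this is not the code of the bech32 tool.

```python
# Sketch only: BIP 173 checksum arithmetic applied to a v2 onion label.
# Assumption: 5-bit base32 groups are re-mapped one-to-one; the HRP is "onion".
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # Bech32 alphabet from BIP 173
B32 = "abcdefghijklmnopqrstuvwxyz234567"      # RFC 4648 base32, lowercased
GEN = (0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3)


def polymod(values):
    """BCH checksum state over a sequence of 5-bit values (BIP 173)."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def hrp_expand(hrp):
    """Spread the human-readable part into 5-bit values for the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def onion_to_bech32(name, hrp="onion"):
    label = name.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    data = [B32.index(c) for c in label]  # ValueError on a non-base32 character
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    check = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + check)


def bech32_to_onion(text, hrp="onion"):
    text = text.lower()
    prefix, sep, body = text.rpartition("1")
    if not sep or prefix != hrp or len(body) < 7:
        raise ValueError("not an onion1 string")
    data = [CHARSET.index(c) for c in body]  # ValueError on an invalid character
    if polymod(hrp_expand(hrp) + data) != 1:
        raise ValueError("checksum mismatch")
    return "".join(B32[d] for d in data[:-6]) + ".onion"


encoded = onion_to_bech32("expyuzz4wqqyqhjn.onion")
print(encoded)
print(bech32_to_onion(encoded))
```

The 16 data characters come out as `yh0c5eeuksscs8fd`, the same data part as in the tool output above; the remaining six characters are the checksum, which accounts for the extra length.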
Imagine the impact on users who have a practical need to transmit a .onion address by verbal communication, or via a handwritten note. Now they can get some help with errors, instead of wondering why they can’t connect to a nonexistent .onion site.
The standard enjoins applications against autocorrecting Bitcoin addresses, so as to prevent even the slightest possibility of causing funds loss by being too “helpful”. But in applications where it would be safe to do so, Bech32 can indeed correct small errors (as well as reliably detecting much worse errors). I suggest that such automatic correction would be suitable for .onion addresses.
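To make the idea of “correcting small errors” concrete, here is a hedged Python sketch that repairs a single substituted character by brute force: it tries every alternative character at every position of the data part and keeps the candidates whose checksum verifies. This is an illustration under stated assumptions, not the algebraic BCH decoder used by the JavaScript demo; the helper names are hypothetical, and errors in the HRP are not handled.

```python
# Sketch only: brute-force repair of ONE substituted character in the data part.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # Bech32 alphabet from BIP 173
GEN = (0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3)


def polymod(values):
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def with_checksum(hrp, payload):
    """Append the six BIP 173 checksum characters to a Bech32 payload."""
    data = [CHARSET.index(c) for c in payload]
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    tail = "".join(CHARSET[(pm >> 5 * (5 - i)) & 31] for i in range(6))
    return hrp + "1" + payload + tail


def is_valid(text):
    hrp, sep, body = text.rpartition("1")
    if not sep or not hrp or any(c not in CHARSET for c in body):
        return False
    return polymod(hrp_expand(hrp) + [CHARSET.index(c) for c in body]) == 1


def correct_one_substitution(text):
    """Return every valid string that differs from text in at most one data character."""
    if is_valid(text):
        return [text]
    start = text.rfind("1") + 1
    fixes = []
    for i in range(start, len(text)):
        for c in CHARSET:
            candidate = text[:i] + c + text[i + 1:]
            if c != text[i] and is_valid(candidate):
                fixes.append(candidate)
    return fixes


good = with_checksum("onion", "yh0c5eeuksscs8fd")
typo = good[:9] + ("q" if good[9] != "q" else "p") + good[10:]
print(correct_one_substitution(typo) == [good])  # True
```

Because BIP 173 guarantees detection of any error affecting up to four characters, a string with one wrong character has exactly one valid neighbour at distance one, so the search returns a single candidate. An application could show that candidate to the user for confirmation instead of connecting silently.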
Bech32 co-author Dr. Wuille (sipa) has published JavaScript reference code, plus a JavaScript error-correction demo, under an MIT license. Perhaps this may be easily adapted into Torbutton, for automagic decoding of Bech32 “onion1” to .onion domains in the Tor Browser address bar. The code is in the same repository whence I copied the Bech32 reference C code I use internally in my tool:
https://github.com/sipa/bech32
# Conclusion—or, to be continued...
An alternative representational format with error-correcting codes will make .onion addresses more human-friendly. I look forward to the day when “onion1” addresses can be passed by handwritten notes, vocalized with a radio alphabet, stuffed into QR codes, scrawled on parchments placed in bottles tossed to sea, rocketed into space, and then conveniently transformed with appropriate corrections into the DNS-style .onion format specified by RFC 7686.
Here’s to the alternative Onion format of the future!
Thanks! That's very interesting! TIL :-)
What would you propose to do with subdomains, like www.facebookcorewwwi.onion? Or is that outside the scope of your proposal?
- alec
On 31 Dec 2017 00:53, "nullius" nullius@nym.zone wrote:
-- nullius@nym.zone | PGP ECC: 0xC2E91CD74A4C57A105F6C21B5A00591B2F307E0C Bitcoin: bc1qcash96s5jqppzsp8hy8swkggf7f6agex98an7h | (Segwit nested: 3NULL3ZCUXr7RDLxXeLPDMZDZYxuaYkCnG) (PGP RSA: 0x36EBB4AB699A10EE) “‘If you’re not doing anything wrong, you have nothing to hide.’ No! Because I do nothing wrong, I have nothing to show.” — nullius
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On 2017-12-31 at 00:57:49 +0000, Alec Muffett alec.muffett@gmail.com wrote:
Thanks! That's very interesting! TIL :-)
Why, if it isn’t instant feedback from the RFC 7686 co-author! In response to what you said, in brief: I will propose that any subdomain data (which is presumably human-readable) be transmitted in a separate or affixed string, leaving Bech32 to deal with the pseudorandom blobs. Technical details follow.
What would you propose to do with subdomains, like www.facebookcorewwwi.onion? Or is that outside the scope of your proposal?
Good question. That had briefly occurred to me; but I couldn’t figure out any feasible means to stuff subdomains into the Bech32 string, for the following reasons:
(0) RFC 1034 DNS names may be up to 255 octets in length. But Bech32 strings are more length-limited. After subtracting an HRP of “onion” (5 chars), the required separator of “1”, and the 6 characters of ECC checksum in the data part, the 90-character total length limit can only spare up to 78 characters for the onion address data. For both v2 and v3 onions, that’s more than sufficient. But even if the length limit could be raised, an excessively long string would destroy the human-friendliness which is the raison d’être for Bech32.
(I *infer* that this last may be one reason for the length limit. Although of course I can’t say for certain, I’ve read Greg Maxwell discussing some of the user testing involved in the standard’s development; and 90 chars seems to me the extreme of what a mortal flesh-and-blood creature could handle with such a string.)
(1) Bech32 is a base-32 encoding, only with a different alphabet than RFC 4648. Thus, it would be necessary to design another layer of encoding to most efficiently represent subdomain labels and the dot-separator with an alphabet of 38 characters [-0-9a-z.]. Worse, depending on which standards an implementation follows or ignores, that is not really a strict limitation on names seen in the wild. How should the Bech32 transformation deal with names containing an underscore “_”? Or other characters? I think it would only be safe to go with full octets. This would severely exacerbate the problem of (0) above.
(Aside: The special alphabet is bound to raise some eyebrows; so I will here quote its rationale from BIP 173: “The character set is chosen to minimize ambiguity according to [this](https://hissa.nist.gov/~black/GTLD/) visual similarity data, and the ordering is chosen to minimize the number of pairs of similar characters (according to the same data) that differ in more than 1 bit. As the checksum is chosen to maximize detection capabilities for low numbers of bit errors, this choice improves its performance under some error models.” From what I understand, a large amount of CPU time was spent crunching over the data in search of the most error-resistant alphabet.)
(2) Most subdomains are human-memorable—in your example, “www”. Coding them with Bech32 would decrease human-friendliness, which is the precise opposite of my objective in making this suggestion. Bech32 is great for helping humans deal with pseudorandom blobs; for those, it improves upon RFC 4648 Base32, Base64, hexadecimal, or in Bitcoin’s case, the old base58-based address encoding. But it is absolutely inappropriate as a coding format for text which humans can easily read, type, and remember.
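The length budget in point (0) and the alphabet count in point (1) can be checked with a few lines of Python. The sketch assumes that a v2 address occupies 16 data characters and a v3 address 56 (80 and 280 bits at 5 bits per character), and that subdomain text would be stored as full octets, as point (1) suggests.

```python
import string

MAX_TOTAL = 90   # overall Bech32 length limit (BIP 173)
HRP = "onion"    # proposed human-readable part
SEPARATOR = 1    # the mandatory "1"
CHECKSUM = 6     # checksum characters in the data part

payload_chars = MAX_TOTAL - len(HRP) - SEPARATOR - CHECKSUM

V2_CHARS = 16    # 80-bit v2 identifier
V3_CHARS = 56    # 280-bit v3 address (key, checksum, version)


def spare_octets(onion_chars):
    """Whole octets of subdomain text that would still fit after the address."""
    return (payload_chars - onion_chars) * 5 // 8


# Hostname alphabet from point (1): hyphen, dot, digits, lowercase letters.
dns_alphabet = "-." + string.digits + string.ascii_lowercase

print(payload_chars)           # 78
print(spare_octets(V2_CHARS))  # 38
print(spare_octets(V3_CHARS))  # 13
print(len(dns_alphabet))       # 38
```

With a v3 address, only 13 octets of subdomain text would fit under the 90-character limit, far short of what RFC 1034 allows.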
It is also important to consider relative impact in common usage. I observe that most .onions do not use subdomains. I do think that it’s important to support this use case; but if tradeoffs must be made, then I would optimize more for making that pseudorandom blob less brittle in human hands.
For the foregoing reasons, I will propose that subdomain data, if any, be kept separate from the Bech32 coding. It may be either kept in a separate string, or somehow affixed with a special delimiter either before or after the Bech32 representation of the onion. Off-the-cuff, which of these looks best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong, if we want for the browser’s address bar to translate it. I should think more about this.)
Finally, I think I should mention: Yes, “onion19qzypww2zw3ykkkglr4tu9” is not as pretty as “facebookcorewwwi.onion”. But few .onion sites have the compute power available to Facebook! Moreover, my proposal should apply to v3 onions—where nobody on Earth will be able to fully bruteforce out a human-memorable string.
I would advise users to stick to the DNS-style coding for facebookcorewwwi.onion, and take advantage of Bech32 as an alternative representation for http://yz7lpwfhhzcdyc5y.onion/ , http://5nca3wxl33tzlzj5.onion/ , and other such strings. Those are pure pain for users now, and it will only get worse when v3 onions get uptake. Error-correcting codes do not make the names any easier to read; but they certainly do help with the inevitable mistakes in all the use cases which involve voice, handwriting, manual typing, carrier pigeons, etc.
Hi,
Please read the naming layer API proposal before writing your proposal:
https://gitweb.torproject.org/torspec.git/tree/proposals/279-naming-layer-ap...
In particular, if you added a unique top-level domain (.bech?), you would only have to specify how the bech translation plugin works. (It would be a much shorter proposal.)
On 31 Dec 2017, at 13:46, nullius nullius@nym.zone wrote:
For the foregoing reasons, I will propose that subdomain data, if any, be kept separate from the Bech32 coding. It may be either kept in a separate string, or somehow affixed with a special delimiter either before or after the Bech32 representation of the onion. Off-the-cuff, which of these looks best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong, if we want for the browser’s address bar to translate it. I should think more about this.)
Why not:
www.onion19qzypww2zw3ykkkglr4tu9
Transforming the final 2 components and leaving the rest intact seems like the most usable form. Particularly if you're going to add a .bech at the end of the address for prop#279.
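A hedged Python sketch of that transformation: the last two labels (`<label>.onion`) are replaced by a single `onion1...` label, and any labels to the left are passed through untouched. The encoder follows BIP 173 with the HRP “onion”, as assumed elsewhere in this thread; the function names are illustrative and nothing here is part of prop#279.

```python
# Sketch only: rewrite "sub.label.onion" as "sub.onion1...", leaving "sub" intact.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # Bech32 alphabet from BIP 173
B32 = "abcdefghijklmnopqrstuvwxyz234567"      # RFC 4648 base32, lowercased
GEN = (0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3)


def polymod(values):
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def encode_onion_label(label, hrp="onion"):
    """Bech32-encode one base32 onion label (assumed one-to-one 5-bit mapping)."""
    data = [B32.index(c) for c in label]
    expanded = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
    pm = polymod(expanded + data + [0] * 6) ^ 1
    check = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + check)


def to_bech_hostname(hostname):
    """Transform the final two labels; keep any subdomain labels as they are."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2 or labels[-1] != "onion":
        raise ValueError("not a .onion name")
    return ".".join(labels[:-2] + [encode_onion_label(labels[-2])])


print(to_bech_hostname("www.facebookcorewwwi.onion"))
print(to_bech_hostname("another-level.www.facebookcorewwwi.onion"))
```

Because the subdomain labels never enter the checksum, a typo in “www” is not detected; only the pseudorandom label is protected.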
T
On 2017-12-31 at 14:23:39 +1100, teor teor2345@gmail.com wrote:
Please read the naming layer API proposal before writing your proposal:
https://gitweb.torproject.org/torspec.git/tree/proposals/279-naming-layer-ap...
In particular, if you added a unique top-level domain (.bech?), you would only have to specify how the bech translation plugin works. (It would be a much shorter proposal.)
Thanks, teor. I reviewed the spec (version 13cbcbc) carefully, and opened https://trac.torproject.org/24774 attaching a `git diff` patch with proposed changes.
The crux of the matter is support for what I will call alternative name representations. Prop-279 assumed quasi-DNS names resolved through some sort of a network or database lookup. However, an alternative representation can be entirely self-contained. Thus, one of the changes I request is to explicitly permit a global wildcard '*' tld for plugins which can be sandboxed with neither network nor filesystem access (and will return answers in microseconds).
I also proposed changes to permit the UTF-8 characters required for representing names in languages other than American English, and some other technical improvements. I added status code 5 to support plugins which can discern when a name is in a recognized format, but is intrinsically invalid e.g. due to checksum failure; and I expanded the description of status code 2, for plugins which do not have TLDs but do recognize a definite syntax.
The potential use cases here extend beyond my suggestion for Bech32-coded .onions. I also wish to encode .onion addresses in a mnemonic phrase, similar to those generated by this tool:
easyseed(1) BIP 39 mnemonic phrase generator
https://github.com/nym-zone/easyseed
manpage: https://raw.githubusercontent.com/nym-zone/easyseed/master/easyseed.1.txt
Out of the box, that will make a mnemonic from the raw data for a v3 .onion address, but not v2 (too short). I could easily draw up a spec to represent v2 .onions as 8 words, and v3 onions as 24–25 words, each including a simple checksum. The mnemonic standard I’ve been using includes carefully designed wordlists for eight different languages; I will soon be adding multilanguage support to my tool, which I could copy over to a prop-279 name system plugin.
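The word count for v2 follows from the arithmetic: BIP 39 wordlists hold 2048 words, so each word carries 11 bits, and 8 words carry 88 bits, enough for the 80-bit v2 identifier plus 8 checksum bits. The Python sketch below illustrates one possible layout. The checksum (first byte of SHA-256 over the raw identifier) is purely an assumption made for illustration, since no such spec has been written, and the output is a list of wordlist indices, not words.

```python
import base64
import hashlib

BITS_PER_WORD = 11  # BIP 39 wordlists hold 2048 = 2**11 words
WORDS = 8           # 8 * 11 = 88 bits = 80 data bits + 8 checksum bits


def v2_onion_to_indices(label):
    """Map a 16-character v2 label to 8 wordlist indices (hypothetical layout)."""
    raw = base64.b32decode(label.upper())     # 16 characters -> 10 bytes
    if len(raw) != 10:
        raise ValueError("expected an 80-bit v2 identifier")
    check = hashlib.sha256(raw).digest()[:1]  # ASSUMPTION: 8-bit checksum
    n = int.from_bytes(raw + check, "big")    # 88-bit integer
    return [(n >> (BITS_PER_WORD * (WORDS - 1 - i))) & 0x7FF for i in range(WORDS)]


def indices_to_v2_onion(indices):
    """Inverse mapping; raises ValueError if the checksum byte does not match."""
    n = 0
    for idx in indices:
        n = (n << BITS_PER_WORD) | idx
    blob = n.to_bytes(11, "big")
    raw, check = blob[:10], blob[10:]
    if hashlib.sha256(raw).digest()[:1] != check:
        raise ValueError("checksum mismatch")
    return base64.b32encode(raw).decode().lower()


indices = v2_onion_to_indices("facebookcorewwwi")
print(indices)
print(indices_to_v2_onion(indices))
```

Looking each index up in a 2048-word list for the chosen language would then yield the phrase; the same indices work across all wordlists.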
Now, imagine an activist under a repressive régime whispering in the ear of a whistleblower eight words for the address of a SecureDrop. Or scrawling a Bech32 address on a scrap of paper in a hurry. The possibilities are many.
Should my proposed changes be accepted, I will be eager to write tools and plugins for .onion alternative representations which look either like this (a real address, properly encoded in Bech32):
onion1kt50trm0nf4jxkskpcjy74
...or approximately like this (random words off a wordlist, for example only):
mad century mirror awkward glory shine cake fat
...with out-of-the-box support for Chinese (Simplified), Chinese (Traditional), French, Italian, Japanese, Korean, and Spanish, in addition to English.
Wordlists, all designed to minimize user error: https://github.com/bitcoin/bips/tree/master/bip-0039 (In the English list, all words are unique within the first four characters; and similar/confusable words are excluded.)
Given appropriate prop-279 changes, I won’t need to draw a proposal. I’ll simply write code!
I commented on the ticket but I'll do it here for completeness' sake:
On Sun, 31 Dec 2017 10:12:53 +0000 nullius nullius@nym.zone wrote:
I also proposed changes to permit the UTF-8 characters required for representing names in languages other than American English, and some other technical improvements. I added status code 5 to support plugins which can discern when a name is in a recognized format, but is intrinsically invalid e.g. due to checksum failure; and I expanded the description of status code 2, for plugins which do not have TLDs but do recognize a definite syntax.
This is pointless because internationalized domain names are standardized around Punycode encoding (Unicode<->ASCII), and said standard is supported by applications that support IDN queries.
I am firmly against this change, and I'm not particularly thrilled by the thought of homograph attacks either.
Given appropriate prop-279 changes, I won’t need to draw a proposal. I’ll simply write code!
It's worth keeping in mind that no one to my knowledge has implemented prop 279 in the tor code itself, though there is (IIRC) a python kludge that kind of allows development.
Regards,
On 2017-12-31 at 10:48:52 +0000, Yawning Angel yawning@schwanenlied.me wrote:
This is pointless because internationalized domain names are standardized around Punycode encoding (Unicode<->ASCII), and said standard is supported by applications that support IDN queries.
I am firmly against this change, and I'm not particularly thrilled by the thought of homograph attacks either.
Happy New Year, Yawning; and apologies for the delayed reply. I thought I’d best work up some code for an object demonstration of why I urge the importance of UTF-8 (and also embedded spaces, which I forgot to mention explicitly).
Here is an 8-word mnemonic phrase encoding for Wikileaks (http://wlupld3ptjvsgwqw.onion/), in 8 different languages or writing systems:
real element glow tennis pluck museum hair shuffle
洁 爱 唱 仰 泪 吴 乎 怒
潔 愛 唱 仰 淚 吳 乎 怒
parole distance fautif sombre notoire loyal flairer ratisser
retina erba idillio suonare potassio opposto india scuderia
にもつ けろけろ しちりん ほめる とかす たんまつ しゃうん はんしゃ
잠자리 반죽 상품 큰딸 이불 열차 선풍기 중반
pie dulce gimnasio tabla oscuro molde guerra repetir
Imagine an activist whispering this address in someone’s ear, in the people’s native tongue!
Respectively, those mnemonics are in English, Chinese (Simplified), Chinese (Traditional), French, Italian, Japanese, Korean, and Spanish. Those are not my selections; they are the languages for which wordlists are currently available in the standard I am adapting. Here is a hint on how to produce these phrases: https://github.com/nym-zone/easyseed/commit/ba77be1b1a1f0c6af50ceba5c89f4ade...
As for Punycode vs. UTF-8:
Homograph attacks are not “solved” by Punycode any more than they would be fixed by base64ing all addresses. Punycode is not a security feature; to the contrary! CVE-2013-7424, CVE-2015-8948, CVE-2016-6261, CVE-2016-6262, CVE-2017-14062.... Need I say more?
With some care, I can write a perfectly secure UTF-8 handler (forbidding non-shortest form, with a proper U+FFFD replacement algorithm, etc.). Whereas I have never seen a Punycode decoder which gives me confidence in its behaviour under all possible inputs. I assiduously avoid interacting with the bloat and pitfalls of IDNA and Punycode, insofar as I can. By contrast, UTF-8 has been happily in use on Unix/Plan9 systems for a quarter-century.
I know that as you say, applications which handle a string as a “domain” will Punycode it before Tor even sees it. But my thinking from the beginning was not in terms of DNS names. One of my constructive criticisms of prop-279 is that it makes that assumption.
The proper question is not, “How do we make more flexible pseudo-DNS lookups?”, but rather more generally: **How can we turn the pseudorandom binary data from .onion names into forms friendlier to humans?** If the Name System API could be in some way modified to admit better answers in the long term, then it would be my pleasure to help achieve that.
Now since I know that Alec Muffett is reading this thread, here are mnemonics in the same languages for facebookcorewwwi.onion:
chimney capital common neither demand certain hen athlete
身 热 界 巨 置 证 假 然
身 熱 界 巨 置 證 假 然
caméra boussole chasseur mairie crayon butiner fougère annuel
casuale buffone collare osare derivare capello intuito apatico
かいさつ おこす かんそう ちせい ぐうせい おもたい しゅらば いはつ
노력 기획 답변 예방 매장 남자 세월 고급
calor brazo centro mover crema cabeza helio antojo
Dare to dream outside the quasi-DNS box about how .onion addresses can be represented!
On Mon, 1 Jan 2018 08:45:57 +0000 nullius nullius@nym.zone wrote:
On 2017-12-31 at 10:48:52 +0000, Yawning Angel yawning@schwanenlied.me wrote:
This is pointless because internationalized domain names are standardized around Punycode encoding (Unicode<->ASCII), and said standard is supported by applications that support IDN queries.
I am firmly against this change, and I'm not particularly thrilled by the thought of homograph attacks either.
Happy New Year, Yawning; and apologies for the delayed reply. I thought I’d best work up some code for an object demonstration of why I urge the importance of UTF-8 (and also embedded spaces, which I forgot to mention explicitly).
I'm aware of the use cases for IDNs.
As for Punycode vs. UTF-8:
Homograph attacks are not “solved” by Punycode any more than they would be fixed by base64ing all addresses. Punycode is not a security feature; to the contrary! CVE-2013-7424, CVE-2015-8948, CVE-2016-6261, CVE-2016-6262, CVE-2017-14062.... Need I say more?
Sigh, the problem is encoding format agnostic.
My point was, by allowing non-ASCII characters the onus is on *someone* to solve the problem of homograph attacks (which admittedly is a bit of a tangent). I'm painfully aware that all browsers, including Tor Browser, have utterly inadequate solutions here.
I know that as you say, applications which handle a string as a “domain” will Punycode it before Tor even sees it. But my thinking from the beginning was not in terms of DNS names. One of my constructive criticisms of prop-279 is that it makes that assumption.
It makes that assumption because it is an entirely reasonable thing to do in the context of Tor.
Dare to dream outside the quasi-DNS box about how .onion addresses can be represented!
I will quote Alec Muffett here:
a) if Onion addresses suddenly stop looking very-similar-to DNS addresses, Tor risks returning to a world where special expertise is necessary to build software for it, thereby harming growth/adoption
The current proposal can get "very similar-to DNS addresses" IDNs by using the same encoding format that DNS uses.
Regards,
Yawning Angel yawning@schwanenlied.me writes:
It's worth keeping in mind that no one to my knowledge has implemented prop 279 in the tor code itself, though there is (IIRC) a python kludge that kind of allows development.
Said kludge is here, for completeness:
https://github.com/meejah/torns
(It's definitely not a thing you should use "in production" or whatever, but a nice toy if you want to play with a Prop279 implementation). I'm happy to merge PRs to fix things etc but I'm not "actively developing" it.
Also worth noting that Tor doesn't play nicely with multiple controllers that try to do stream-attaching; the above thing does stream-attaching.
On 31 December 2017 at 02:46, nullius nullius@nym.zone wrote:
For the foregoing reasons, I will propose that subdomain data, if any, be kept separate from the Bech32 coding. It may be either kept in a separate string, or somehow affixed with a special delimiter either before or after the Bech32 representation of the onion. Off-the-cuff, which of these looks best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong, if we want for the browser’s address bar to translate it. I should think more about this.)
I need to think about this more, and after coffee, but my first concerns would be:
1) that having multiple representations of a site's onion address is likely to break many/most sites, because of Host/Origin headers being complicated enough already.
2) anything involving colons in any position ("https://onion19qzypww2zw3ykkkglr4tu9:www/") is likely to break both client-side-web-browsers and server-side-CMS-software unless they are specially re-engineered for Tor, which is likely to inhibit use *of* Tor; colons are a port-number separator in URLs, unless they come as part of an IPv6 address in [square brackets].
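The colon problem is easy to reproduce with a stock URL parser. The Python sketch below uses the standard library's `urllib.parse.urlsplit`; the helper name is illustrative. It treats everything after the colon as a port, so the colon-delimited forms yield a truncated hostname and a port that cannot be parsed.

```python
from urllib.parse import urlsplit


def host_and_port(url):
    """Return (hostname, port), or (hostname, error text) if the port is unusable."""
    parts = urlsplit(url)
    try:
        return parts.hostname, parts.port
    except ValueError as exc:  # raised when the "port" is not a number
        return parts.hostname, f"error: {exc}"


for url in (
    "https://onion19qzypww2zw3ykkkglr4tu9:www/",
    "https://www:onion19qzypww2zw3ykkkglr4tu9/",
    "https://www.onion19qzypww2zw3ykkkglr4tu9/",
):
    print(url, "->", host_and_port(url))
```

Only the dot-separated form survives intact: the parser sees one hostname and no port.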
My general sense is that:
a) if Onion addresses suddenly stop looking very-similar-to DNS addresses, Tor risks returning to a world where special expertise is necessary to build software for it, thereby harming growth/adoption
b) if Onion addresses have 2+ forms, one like the current (www.4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad.onion) and the other being apparently more human-usable because it contains a CRC, the one which allows access to websites will win.
My expectation to date has been that the problem with "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad" is that there is no place for the eyeball to rest when typing it in; as such I've presumed that a canonical form, defined by Tor, would be something like:
https://www.4acth47i-6kxnvkew-tm6q7ib2-s3ufpo5-sqbsnzjp-bi7utij-cltosqem-ad.onion/
...being N groups of M characters (where N and M can be argued, feel free...) and where any unused characters within the 63-character DNS-compliant budget can be used to implement a credit-card-like running checksum or CRC, for quick client-side checks; eg: the URL bar can identify that you are typing in an Onion address and leave it red-or-grey until you type something which satisfies the checksum, before flinging it at tor-daemon for attempted resolution.
Or, indeed, you could leave out the hyphens and do the same; the Prop224 Onion address is 59 characters, leaving a budget of 63-59==4 characters or 20 bits; we could put these at the end, in the space marked "@@@@":
https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onion/
....and use those 20 bits to implement 5x 4-bit checksums over 12-character chunks:
https://{www4acth47i6}{kxnvkewtm6q7}{ib2s3ufpo5sq}{bsnzjpbi7uti}{jcltosqemad@}@@@.onion/
...so that any UX component which wants to help the user can highlight (in red? or bold?) where the problem is, picking out a chunk of 12 characters which contain the typo:
https://www4acth47i6kxnvkewtm6q7*ib2s3ujpo5sq*bsnzjpbi7utijcltosqemadwxyz.onion/
---------------------------------^^^^^^^^^^^^
Spot the errant 'j'.
The advantage of a system like this is that it's not perfect, but a typo mostly has to happen twice and be quite fortunate to go undetected.
Of course it's not perfect, but nothing will be, and clever selection of checksum and encoding will result in something which is still DNS- and Browser-compliant.
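A rough Python sketch of the per-chunk idea, under loudly stated assumptions: the 4-bit check is simply the sum of each chunk's base32 values modulo 16 (a placeholder, not a considered choice of checksum), and the 59-character label is cut into chunks of 12, 12, 12, 12 and 11 characters, so that the last chunk does not have to include one of the check characters. The five 4-bit values are packed into the four trailing base32 characters.

```python
B32 = "abcdefghijklmnopqrstuvwxyz234567"  # RFC 4648 base32, lowercased
CHUNK = 12


def chunk_checks(label):
    """One 4-bit value per chunk (hypothetical: sum of 5-bit values mod 16)."""
    chunks = [label[i:i + CHUNK] for i in range(0, len(label), CHUNK)]
    return [sum(B32.index(c) for c in chunk) % 16 for chunk in chunks]


def append_checks(label):
    """Pack five 4-bit checks (20 bits) into 4 trailing base32 characters."""
    checks = chunk_checks(label)
    if len(checks) != 5:
        raise ValueError("expected a label spanning exactly five chunks")
    n = 0
    for value in checks:
        n = (n << 4) | value
    return label + "".join(B32[(n >> shift) & 31] for shift in (15, 10, 5, 0))


def bad_chunks(full):
    """Indices of chunks whose check value disagrees with the trailing characters."""
    label, tail = full[:-4], full[-4:]
    n = 0
    for c in tail:
        n = (n << 5) | B32.index(c)
    expected = [(n >> shift) & 15 for shift in (16, 12, 8, 4, 0)]
    actual = chunk_checks(label)
    return [i for i in range(5) if actual[i] != expected[i]]


label = "www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad"
full = append_checks(label)
typo = full.replace("ib2s3ufpo5sq", "ib2s3ujpo5sq")  # the errant 'j'
print(len(full))         # 63
print(bad_chunks(full))  # []
print(bad_chunks(typo))  # [2]
```

A UI could highlight chunk 2 (characters 25 to 36) while the user is still typing. With only 4 bits per chunk, roughly one random error in sixteen goes unnoticed, which is why the choice of checksum would need more care than this placeholder.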
-a
On 31 December 2017 at 11:46, Alec Muffett alec.muffett@gmail.com wrote:
...so that any UX component which wants to help the user can highlight (in red? or bold?) where the problem is, picking out a chunk of 12 characters which contain the typo:
https://www4acth47i6kxnvkewtm6q7*ib2s3ujpo5sq*bsnzjpbi7utijcltosqemadwxyz.onion/
---------------------------------^^^^^^^^^^^^
Spot the errant 'j'.
The advantage of a system like this is that it's not perfect, but a typo mostly has to happen twice and be quite fortunate to go undetected.
Of course it's not perfect, but nothing will be, and clever selection of checksum and encoding will result in something which is still DNS- and Browser-compliant.
One other advantage: a DNS-format-compliant checksum like this could be trivially baked into an SSL certificate without requiring CA/Browser Forum to invent a wholly new kind of certificate just-for-Tor
This would result in Prop224 Onion Addresses which would not only be typo-resistant, but could also continue to be issued with EV certificates where site-attestation is beneficial.
Further: adding segment-checksum bits at the end would be (I think?) backwards compatible with existing Prop224 addresses.
-a
Hi,
b) if Onion addresses have 2+ forms, one like the current (www.4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad.onion) and the other being apparently more human-usable because it contains a CRC, the one which allows access to websites will win.
What if they both allow access to websites?
I had always thought that prop#279 addresses would be translated into their canonical forms before the browser acts on them. But the current proof-of-concept implementation would include them in the Host header, because the translation is done at the Tor layer (not the browser layer).
This also makes a mess of security certificates. (Or it means that both names would need to be in the certificate.)
And there's the issue of having two names for the same site.
My expectation to date has been that the problem with "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad" is that there is no place for the eyeball to rest when typing it in; as such I've presumed that a canonical form, defined by Tor, would be something like:
https://www.4acth47i-6kxnvkew-tm6q7ib2-s3ufpo5-sqbsnzjp-bi7utij-cltosqem-ad....
...being N groups of M characters (where N and M can be argued, feel free...)
That's not what's specified right now, and it's not what will be released in 0.3.2 in a few weeks.
But we could implement a grouping and checksum mechanism like this using a prop#279 plugin, much like the bech transform.
Depending on where we do the name translation, this change would cause the same Host header and certificate issues.
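The "N groups of M characters" idea discussed above can be sketched in a few lines. This is only an illustration, not anything specified by Tor or prop#279: the function names `group_label` and `ungroup_label`, the group size of 8, and the hyphen separator are all assumptions chosen for the example. Because the base32 alphabet used by .onion labels contains no hyphen, stripping the separators recovers the canonical label unambiguously.

```python
# Hypothetical helpers; names, group size and separator are illustrative only.

def group_label(label: str, size: int = 8, sep: str = "-") -> str:
    """Split a label into chunks of `size` characters joined by `sep`."""
    chunks = [label[i:i + size] for i in range(0, len(label), size)]
    return sep.join(chunks)


def ungroup_label(grouped: str, sep: str = "-") -> str:
    """Recover the canonical label by dropping the separators."""
    return grouped.replace(sep, "")


label = "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad"
grouped = group_label(label)
print(grouped)
print(len(grouped))  # 56 characters + 6 hyphens = 62, within the 63-octet label limit
```

With a group size of 8, a 56-character label becomes 7 groups and stays inside a single DNS label. Where the translation back to the canonical form happens (browser or Tor layer) is exactly the open question raised above.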
The advantage of a system like this is that it's not perfect, but a typo mostly has to happen twice and be quite fortunate to go undetected. Of course it's not perfect, but nothing will be, and clever selection of checksum and encoding will result in something which is still DNS- and Browser-compliant.
One other advantage: a DNS-format-compliant checksum like this could be trivially baked into an SSL certificate without requiring CA/Browser Forum to invent a wholly new kind of certificate just-for-Tor
This is true. We should make any schemes DNS-compliant, which is how the examples in prop#279 work.
This would result in Prop224 Onion Addresses which would not only be typo-resistant, but could also continue to be issued with EV certificates where site-attestation is beneficial.
Further: adding segment-checksum bits at the end would be (I think?) backwards compatible with existing Prop224 addresses.
They would be compatible, as would most prop#279 addresses, apart from the issues mentioned above.
Are you aware that there's already a checksum in v3 onion service addresses?
"The onion address of a hidden service includes its identity public key, a version field and a basic checksum."
https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n2012
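For reference, the checksum in rend-spec-v3 is the first two bytes of SHA3-256 over the string ".onion checksum", the 32-byte public key and the version byte; the address is the base32 encoding of public key, checksum and version. A minimal sketch of that construction follows. The function names are my own for illustration, and the all-zero key used in the demonstration is a dummy value, not a real service key.

```python
import base64
import hashlib

ONION_VERSION = 3  # version byte for v3 onion services


def onion_checksum(pubkey: bytes, version: int = ONION_VERSION) -> bytes:
    """First two bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION)."""
    digest = hashlib.sha3_256(b".onion checksum" + pubkey + bytes([version]))
    return digest.digest()[:2]


def encode_onion(pubkey: bytes) -> str:
    """base32(PUBKEY | CHECKSUM | VERSION) + ".onion" (35 bytes -> 56 chars)."""
    if len(pubkey) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    raw = pubkey + onion_checksum(pubkey) + bytes([ONION_VERSION])
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def is_valid_onion(address: str) -> bool:
    """Check length, version byte and embedded checksum of a v3 address."""
    name = address.lower()
    if not name.endswith(".onion"):
        return False
    label = name[:-len(".onion")].split(".")[-1]  # ignore any subdomain labels
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34]
    return version == ONION_VERSION and checksum == onion_checksum(pubkey, version)


addr = encode_onion(bytes(32))  # dummy all-zero key, for demonstration only
print(addr, is_valid_onion(addr))
```

Note that this checksum only detects errors; it does not say where in the address the typo is, which is what the segment-checksum idea earlier in the thread was after.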
T
On 31 Dec 2017 12:22, "teor" teor2345@gmail.com wrote:
Are you aware that there's already a checksum in v3 onion service addresses?
No I was not*, that's great!
"The onion address of a hidden service includes its identity public key,
a version field and a basic checksum."
It would be great to get the human interface elements to leverage this; perhaps overall we are premature in trying to solve the presumed HCI issues of long onions?
- a
*entirely
Date: Sun, 31 Dec 2017 11:46:28 +0000 From: Alec Muffett alec.muffett@gmail.com
Or, indeed, you could leave out the hyphens and do the same; the Prop224 Onion address is 59 characters, leaving a budget of 63-59==4 characters or 20 bits; we could put these at the end, in the space marked "@@@@":
https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onio...
Existing checksum in v3 addresses aside, what would prevent using a second DNS label for a longer checksum if you wanted a bigger budget?
The labels are limited to 63 octets, but the whole name can be up to 255 (including label length bytes).
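As a rough sketch of those two limits, the following checks a name against the 63-octet label limit and the 255-octet limit on the whole name, counting one length byte per label plus the terminating root label. The helper names are hypothetical and the check ignores everything else DNS requires (permitted characters, hyphen placement and so on).

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255  # wire format, including the label length bytes


def wire_length(name: str) -> int:
    """Octets the name occupies on the wire: (1 + len) per label, plus the root."""
    labels = name.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1


def fits_dns(name: str) -> bool:
    """True if every label and the whole name respect the DNS size limits."""
    labels = name.rstrip(".").split(".")
    labels_ok = all(1 <= len(label) <= MAX_LABEL_OCTETS for label in labels)
    return labels_ok and wire_length(name) <= MAX_NAME_OCTETS


# A 56-character onion label preceded by a hypothetical 20-character checksum label.
candidate = "c" * 20 + "." + "a" * 56 + ".onion"
print(wire_length(candidate), fits_dns(candidate))
```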
On 2018-01-01 at 22:36:53 +0000, Taylor R Campbell campbell+tor-dev@mumble.net wrote:
Date: Sun, 31 Dec 2017 11:46:28 +0000 From: Alec Muffett alec.muffett@gmail.com
Or, indeed, you could leave out the hyphens and do the same; the Prop224 Onion address is 59 characters, leaving a budget of 63-59==4 characters or 20 bits; we could put these at the end, in the space marked "@@@@":
https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onio...
Actually, the label part is 56 characters, not 59 characters. rend-spec-v3.txt, § 6 [ONIONADDRESS]. See also § 1.2 [NAMING] (“The result is a 56-character domain name”; nit, that should be “label”). Using the first example address therefrom:
$ bech32 -e pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd.onion
onion10x7vvfgcfvz3jjt4c29kddntq35l0aj4d7c6cvvf57d5phdr9u0jz3crm5jhsx
Of course, 56 + 6 = ...
$ echo -n \
    0x7vvfgcfvz3jjt4c29kddntq35l0aj4d7c6cvvf57d5phdr9u0jz3crm5jhsx \
    | wc -c
62
N.b. that this still includes the two octets of truncated SHA3-256, wrapped inside a format with 30 bits of error-correcting BCH code. Decoding/re-encoding the name to drop the SHA3 bits would cut the payload from 280 to 264 bits, which could be represented in 53+6=59 Bech32 characters with the BCH ECC.
I also question whether the onion version needs a whole octet. In the specific application of Bech32 to Bitcoin, the “witness version” (version of encoded tx auth program) is restricted to 0–16, inclusive; and the Bech32 coding is done with one of what I will call a “quintet” char (5 bits) for the version, followed by the encoding of 8-bit octets of the witness program.[0] If the .onion version were restricted to 0–15 so as to fit in 4 bits, then only 260 bits = 52 quintets would be needed to express the version plus the 256-bit master identity key. How many .onion address versions are expected in, say, the next 20–30 years? Adding a 6-char BCH code, the total label length would be 58 quintet characters.
At these lengths, I think every character of pseudorandom data which can be reasonably shaved off is a significant win for wetware UX.
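The length arithmetic above can be checked mechanically: each Bech32 character carries 5 bits, a payload is rounded up to a whole number of quintets, and the BCH code adds 6 characters. The function name below is just for illustration.

```python
BCH_CHECKSUM_CHARS = 6  # Bech32 checksum length in characters (30 bits)


def bech32_data_length(payload_bits: int) -> int:
    """Characters after the 'hrp1' prefix: ceil(bits / 5) quintets plus checksum."""
    quintets = -(-payload_bits // 5)  # ceiling division
    return quintets + BCH_CHECKSUM_CHARS


cases = {
    "key + SHA3 checksum + version octet": 256 + 16 + 8,  # 280 bits
    "key + version octet": 256 + 8,                       # 264 bits
    "key + 4-bit version": 256 + 4,                       # 260 bits
}
for description, bits in cases.items():
    print(f"{description}: {bits} bits -> {bech32_data_length(bits)} characters")
```

This gives 62, 59 and 58 characters respectively, matching the figures quoted above.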
0. Note, Bech32 encoding rules do not require that the encoded bit length be a multiple of 5. The standard prescribes the simple rule that strings of octets be zero-padded to a multiple of 5 bits when encoding, and decoded to octets with up to 4 trailing 0 bits discarded. https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
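The padding rule in that note can be sketched with a general bit-regrouping routine modelled on the `convertbits` helper in the BIP 173 reference code. Treat this as an illustrative re-statement rather than the normative implementation.

```python
def convertbits(data, frombits, tobits, pad=True):
    """Regroup a sequence of `frombits`-bit values into `tobits`-bit values.

    With pad=True (encoding), leftover bits are zero-padded into a final value.
    With pad=False (decoding), leftover bits must be zero padding shorter than
    one input group; otherwise None is returned.
    """
    acc = 0
    bits = 0
    ret = []
    maxv = (1 << tobits) - 1
    max_acc = (1 << (frombits + tobits - 1)) - 1
    for value in data:
        if value < 0 or (value >> frombits):
            return None
        acc = ((acc << frombits) | value) & max_acc
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None
    return ret


# One octet 0xff becomes two quintets: 11111 and 111 + two zero pad bits.
print(convertbits([0xFF], 8, 5))               # [31, 28]
print(convertbits([31, 28], 5, 8, pad=False))  # [255]
# 33 octets (264 bits) need 53 quintets, the last one carrying a single pad bit.
print(len(convertbits(bytes(33), 8, 5)))       # 53
```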
Existing checksum in v3 addresses aside, what would prevent using a second DNS label for a longer checksum if you wanted a bigger budget?
The labels are limited to 63 octets, but the whole name can be up to 255 (including label length bytes).
I expect that the user burden of a greater length of pseudorandom gibberish would outweigh any possible UX benefit of adding more checksum data. A 6-quintet BCH code already provides error correction, guarantees detection of errors affecting not more than 4 characters, and has a <10^-9 probability of failing to detect a greater number of errors. Is better than that really needed?
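To make the detection property concrete, here is a small sketch of Bech32 checksum creation and verification, following the polymod construction and constants from BIP 173. It is detection only (no error location or correction), it is not the code of the bech32 tool mentioned above, and the direct base32-quintet-to-Bech32 mapping with the "onion" HRP is my assumption based on the example output earlier in this message.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # Bech32 alphabet (BIP 173)
BASE32 = "abcdefghijklmnopqrstuvwxyz234567"   # RFC 4648 alphabet used by .onion


def bech32_polymod(values):
    """BCH checksum state over a sequence of 5-bit values."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk


def hrp_expand(hrp):
    """Spread the human-readable part into 5-bit values for the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_encode(hrp, quintets):
    """Return hrp + '1' + data characters + 6 checksum characters."""
    quintets = list(quintets)
    polymod = bech32_polymod(hrp_expand(hrp) + quintets + [0] * 6) ^ 1
    checksum = [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[q] for q in quintets + checksum)


def bech32_verify(text):
    """True if the checksum of a lowercase Bech32 string is consistent."""
    pos = text.rfind("1")
    hrp, data_part = text[:pos], text[pos + 1:]
    if pos < 1 or any(c not in CHARSET for c in data_part):
        return False
    data = [CHARSET.index(c) for c in data_part]
    return bech32_polymod(hrp_expand(hrp) + data) == 1


label = "pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd"
encoded = bech32_encode("onion", [BASE32.index(c) for c in label])
print(encoded)
print(bech32_verify(encoded))    # True

# Substitute one character; a single error is always detected.
i = 20
corrupted = encoded[:i] + ("q" if encoded[i] != "q" else "p") + encoded[i + 1:]
print(bech32_verify(corrupted))  # False
```

The printed string can be compared against the tool output quoted earlier in this message.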
Upon the same cryptographic self-validation principle which .onion applies in the first place, I have also considered such possibilities as encoding a TLS public key fingerprint in subdomain labels. The fingerprint could be automatically verified by the connecting TLS client against the same data it itself provides via SNI. This could alleviate the current need to get CAB Forum to approve some form of DV for .onion certificates. However, the results must be considered absolutely impracticable for human transcription. The usage model would rely exclusively on bookmarks, copypaste, etc.
On 2 Jan 2018, at 10:51, nullius nullius@nym.zone wrote:
On 2018-01-01 at 22:36:53 +0000, Taylor R Campbell campbell+tor-dev@mumble.net wrote:
Date: Sun, 31 Dec 2017 11:46:28 +0000 From: Alec Muffett alec.muffett@gmail.com
Or, indeed, you could leave out the hyphens and do the same; the Prop224 Onion address is 59 characters, leaving a budget of 63-59==4 characters or 20 bits; we could put these at the end, in the space marked "@@@@":
https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onio...
Actually, the label part is 56 characters, not 59 characters. rend-spec-v3.txt, § 6 [ONIONADDRESS]. See also § 1.2 [NAMING] (“The result is a 56-character domain name”—nit, that should be “label”).
We would happily take a patch that makes the wording more precise throughout the proposal and Tor's other specifications.
…
N.b. that this still includes the two octets of truncated SHA3-256, wrapped inside a format with 30 bits of error-correcting BCH code. Decoding/re-encoding the name to drop the SHA3 bits would cut the payload from 280 to 264 bits, which could be represented in 53+6=59 Bech32 characters with the BCH ECC.
You could safely drop and recalculate the hash, but if the onion address encoding changes in a future version, you would have to patch all the bech code.
I also question whether the onion version needs a whole octet. In the specific application of Bech32 to Bitcoin, the “witness version” (version of encoded tx auth program) is restricted to 0–16, inclusive; and the Bech32 coding is done with one of what I will call a “quintet” char (5 bits) for the version, followed by the encoding of 8-bit octets of the witness program.[0] If the .onion version were restricted to 0–15 so as to fit in 4 bits, then only 260 bits = 52 quintets would be needed to express the version plus the 256-bit master identity key. How many .onion address versions are expected in, say, the next 20–30 years? Adding a 6-char BCH code, the total label length would be 58 quintet characters.
At these lengths, I think every character of pseudorandom data which can be reasonably shaved off is a significant win for wetware UX.
We won't be revising the spec at this point, because it's been implemented. However, you could suggest that the next version of onion services only uses 5 bits to encode the version.
You could safely encode the current version 3 in zero bits, but if the onion address encoding changes in a future version, you would have to patch all the bech code.
One way of doing this is to make the bech prefix "onion3".
...
T