On 31 December 2017 at 02:46, nullius <nullius@nym.zone> wrote:
For the foregoing reasons, I will propose that subdomain data, if any, be kept separate from the Bech32 coding.  It may be either kept in a separate string, or somehow affixed with a special delimiter either before or after the Bech32 representation of the onion.  Off-the-cuff, which of these looks best to you?

        www:onion19qzypww2zw3ykkkglr4tu9

        onion19qzypww2zw3ykkkglr4tu9:www

        another-level.www:onion19qzypww2zw3ykkkglr4tu9

(My choice of a delimiter here may be wrong, if we want the browser’s address bar to translate it.  I should think more about this.)


I need to think about this more, and after coffee, but my first concerns would be:

1) that having multiple representations of a site's onion address is likely to break many/most sites, because Host/Origin headers are complicated enough already.

2) that anything involving colons in any position ("https://onion19qzypww2zw3ykkkglr4tu9:www/") is likely to break both client-side web browsers and server-side CMS software unless they are specially re-engineered for Tor, which is likely to inhibit use *of* Tor; colons are a port-number separator in URLs, unless they come as part of an IPv6 address in [square brackets].
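To illustrate concern (2) concretely: Python's standard URL parser (standing in here for any unmodified client or server software) treats whatever follows the host's colon as a port number, and refuses a non-numeric one.

```python
# Standard URL parsers treat the text after the host's colon as a port.
# urllib is used here as a stand-in for unmodified browser/CMS software.
from urllib.parse import urlsplit

parts = urlsplit("https://onion19qzypww2zw3ykkkglr4tu9:www/")
print(parts.hostname)  # 'onion19qzypww2zw3ykkkglr4tu9'
try:
    print(parts.port)
except ValueError as e:
    print("not a valid port:", e)  # ":www" cannot be parsed as a port
```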


My general sense is that: 

a) if Onion addresses suddenly stop looking very similar to DNS addresses, Tor risks returning to a world where special expertise is necessary to build software for it, thereby harming growth/adoption

b) if Onion addresses have 2+ forms, one like the current one (www.4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad.onion) and another which is apparently more human-usable because it contains a CRC, then the one which allows access to websites will win.


My expectation to date has been that the problem with "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad" is that there is no place for the eyeball to rest when typing it in; as such, I've presumed that a canonical form, defined by Tor, would be something like:

https://www.4acth47i-6kxnvkew-tm6q7ib2-s3ufpo5-sqbsnzjp-bi7utij-cltosqem-ad.onion/

 ...being N groups of M characters (where N and M can be argued, feel free...) and where any unused characters within the 63-character DNS-compliant label budget can be used to implement a credit-card-like running checksum or CRC, for quick client-side checks; e.g. the URL bar can identify that you are typing an Onion address and leave it red-or-grey until you type something which satisfies the checksum, before flinging it at the tor daemon for attempted resolution.
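To make the "credit-card-like running checksum" idea concrete, here is a rough sketch; the group size (8) and the checksum itself (sum of base32 symbol values, mod 32) are arbitrary illustrations for the sake of the example, not anything Tor has designed.

```python
# Hypothetical sketch: a hyphen-grouped onion label carrying one trailing
# checksum character. Group size and checksum are illustrative assumptions.
BASE32 = "abcdefghijklmnopqrstuvwxyz234567"

def checksum_char(body: str) -> str:
    """Naive running checksum: sum of base32 symbol values, mod 32."""
    return BASE32[sum(BASE32.index(c) for c in body) % 32]

def format_label(addr: str, group: int = 8) -> str:
    """Append a checksum character, then hyphenate every `group` chars."""
    body = addr + checksum_char(addr)
    return "-".join(body[i:i + group] for i in range(0, len(body), group))

def looks_valid(label: str) -> bool:
    """What an address bar might check before handing off to the tor daemon."""
    body = label.replace("-", "")
    return (len(body) > 1
            and all(c in BASE32 for c in body)
            and checksum_char(body[:-1]) == body[-1])
```

Any single-character substitution shifts the sum mod 32 by a nonzero amount and so is caught; transpositions are not, which is one reason a real design would want something CRC-like instead.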

Or, indeed, you could leave out the hyphens and do the same; the Prop224 Onion address plus the "www" prefix is 59 characters, leaving a budget of 63-59==4 characters, or 20 bits (at 5 bits per base32 character); we could put these at the end, in the space marked "@@@@":

  https://www4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad@@@@.onion/

...and use those 20 bits to implement 5x 4-bit checksums over 12-character chunks:

   https://{www4acth47i6}{kxnvkewtm6q7}{ib2s3ufpo5sq}{bsnzjpbi7uti}{jcltosqemad@}@@@.onion/

...so that any UX component which wants to help the user can highlight (in red? or bold?) where the problem is, picking out a chunk of 12 characters which contain the typo:

   https://www4acth47i6kxnvkewtm6q7ib2s3ujpo5sqbsnzjpbi7utijcltosqemadwxyz.onion/
  ---------------------------------^^^^^^^^^^^^ 

Spot the errant 'j'.
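As a concrete sketch of the chunk idea: everything here is an assumption for illustration, including the 4-bit checksum function (sum of base32 symbol values, mod 16) and the packing of the five nibbles into four trailing base32 characters; and to keep things simple it checksums only the 59-character body, so the fifth chunk is the trailing 11 characters rather than overlapping the checksum characters as in the braces diagram above.

```python
# Illustrative sketch only: the chunking, the nibble function, and the
# packing of five nibbles into four trailing base32 characters are all
# assumptions for the sake of example, not a concrete proposal.
BASE32 = "abcdefghijklmnopqrstuvwxyz234567"
CHUNK = 12

def nibble(chunk: str) -> int:
    """4-bit checksum of one chunk (naive: sum of symbol values mod 16)."""
    return sum(BASE32.index(c) for c in chunk) % 16

def with_checksums(body59: str) -> str:
    """Append 4 base32 chars (20 bits) holding five 4-bit chunk checksums."""
    chunks = [body59[i:i + CHUNK] for i in range(0, len(body59), CHUNK)]
    bits = 0
    for c in chunks:
        bits = (bits << 4) | nibble(c)
    return body59 + "".join(BASE32[(bits >> s) & 31] for s in (15, 10, 5, 0))

def failing_chunks(label63: str) -> list[int]:
    """Indices of chunks whose stored checksum does not match -- the UX
    layer could highlight exactly these 12-character spans in red."""
    body, tail = label63[:-4], label63[-4:]
    bits = 0
    for ch in tail:
        bits = (bits << 5) | BASE32.index(ch)
    chunks = [body[i:i + CHUNK] for i in range(0, len(body), CHUNK)]
    n = len(chunks)
    return [i for i, c in enumerate(chunks)
            if nibble(c) != (bits >> (4 * (n - 1 - i))) & 15]

body = "www" + "4acth47i6kxnvkewtm6q7ib2s3ufpo5sqbsnzjpbi7utijcltosqemad"
label = with_checksums(body)
typo = label.replace("s3uf", "s3uj", 1)   # the errant 'j' from above
print(failing_chunks(typo))               # [2]: the third chunk is suspect
```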

A system like this is not perfect, but a typo mostly has to happen twice, and be quite fortunate, to go undetected.

Of course it's not perfect, but nothing will be, and clever selection of checksum and encoding will result in something which is still DNS- and Browser-compliant.

    -a



--