Hello list,
we've had discussions over the past years about how to encode prop224 onion addresses. Here is the latest thread: https://lists.torproject.org/pipermail/tor-dev/2016-December/011734.html
Bikeshedding is over; it's time to finally pick a scheme! My suggested scheme basically follows from the discussion on that thread, and is heavily based on the Bitcoin address format:
  https://en.bitcoin.it/wiki/Base58Check_encoding
  https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_address...
Here is the suggested scheme:
  onion_address = base32(version + pubkey + checksum)
  checksum = SHA3(".onion checksum" + version + pubkey)

where:
  - pubkey is 32 bytes (ed25519)
  - version is one byte
  - checksum is _truncated_ to two bytes
With the above construction onion_address ends up being 56 characters long (35 bytes = 280 bits = 56 base32 characters, excluding the ".onion"):
  tbi5tdxbosiotphawjyu7f5pw5tlnvbvfjrj7meskbsnwr2bqbu2t4gg.onion
  tcrdnadkefvbdm3u56kz6lfh6v5lr24fpog5vzsy4n3djr2ymueu34ws.onion
  tcdw7lwmtp5pbwj2w7wf6amxdhmc62qitj2teu376r5s2fqke4r3uiq6.onion
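To make the construction concrete, here is a minimal Python sketch of it. Two things are assumptions on my side and not part of the scheme as written: "SHA3" is taken to mean SHA3-256, and the default version byte is the \x98 suggested in [D1.1]. The function names are made up for illustration.

```python
import base64
import hashlib
from typing import Tuple

CHECKSUM_PREFIX = b".onion checksum"
CHECKSUM_LEN = 2  # bytes kept after truncating the hash


def onion_checksum(version: bytes, pubkey: bytes) -> bytes:
    # Assumption: "SHA3" means SHA3-256; the scheme does not name a variant.
    digest = hashlib.sha3_256(CHECKSUM_PREFIX + version + pubkey).digest()
    return digest[:CHECKSUM_LEN]


def encode_onion_address(pubkey: bytes, version: bytes = b"\x98") -> str:
    # Assumption: \x98 is the default version byte (see [D1.1]).
    if len(pubkey) != 32 or len(version) != 1:
        raise ValueError("need a 32-byte pubkey and a 1-byte version")
    payload = version + pubkey + onion_checksum(version, pubkey)
    # 35 bytes encode to exactly 56 base32 characters, so no padding appears.
    return base64.b32encode(payload).decode("ascii").lower()


def decode_onion_address(address: str) -> Tuple[bytes, bytes]:
    # Takes the address without the ".onion" suffix.
    raw = base64.b32decode(address.upper())
    version, pubkey, checksum = raw[:1], raw[1:33], raw[33:]
    if len(raw) != 35 or checksum != onion_checksum(version, pubkey):
        raise ValueError("malformed address or checksum mismatch")
    return version, pubkey
```

Calling encode_onion_address() on any 32-byte key gives a 56-character string starting with 't'; decode_onion_address() rejects it again if the checksum part does not match.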
If people like the above suggestion, I will take the effort to engrave it in prop224.
Here is the discussion section. Please provide feedback!
[D1] How to use version field:
The version field is one byte long. If we use it as an integer we can encode 256 values in it; if we use it as a bitmap we could encode properties and such.
My suggestion is to simply use it as an integer like Bitcoin does. So we can assign one value to normal onion services (say \x01; see [D1.1] for the actual choice), and in the future we can assign more version tags if we need to. For example, we can give a different version field to onion services in the testnet. We can also reserve a range of values for application-specific purposes.
[D1.1] Default version value:
The next question is what version value to assign to normal onion services. In the above scheme where:
onion_address = base32(version + pubkey + checksum)
the value of 'version' fixes the first character of the onion address and constrains the second (a byte is 8 bits but a base32 character only carries 5, so the second character also depends on the top two bits of pubkey). In Bitcoin, they've made it such that the default version value prefixes addresses with "1", so all normal Bitcoin addresses start with 1, as in 14tDWDT9zqDufWZmiLqoaT9qJyHi7RRZPE
What should we do in Tor? My suggestion is to use '\x98' as the default version value which prefixes all addresses with 't' (as in Tor). Check the examples I cited above.
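For anyone who wants to play with other prefixes, here is a small Python sketch that maps version bytes to the leading characters they produce. It assumes the 35-byte payload layout from the scheme above (version first); the helper names are hypothetical.

```python
import base64


def first_char(version: int) -> str:
    # Only the top 5 bits of the version byte reach the first base32
    # character, so the rest of the payload can be zeroed here.
    payload = bytes([version]) + bytes(34)
    return base64.b32encode(payload).decode("ascii").lower()[0]


def versions_with_prefix(ch: str) -> list:
    # All version values whose addresses start with the given character.
    return [v for v in range(256) if first_char(v) == ch]


def second_chars(version: int) -> set:
    # The second character mixes the low 3 bits of the version byte with
    # the top 2 bits of the first pubkey byte, so try all four combinations.
    chars = set()
    for top_bits in range(4):
        payload = bytes([version, top_bits << 6]) + bytes(33)
        chars.add(base64.b32encode(payload).decode("ascii").lower()[1])
    return chars
```

With this, first_char(0x98) gives 't', versions_with_prefix('t') lists the eight values 0x98 through 0x9f, and second_chars(0x98) shows that the second character can only be one of a, b, c or d.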
An alternative is to turn the scheme into:
  onion_address = base32(pubkey + checksum + version)
where the version byte is at the end and has no effect on usability.
A heavier alternative would be to have two bytes of version so that we can just prefix them all with 'tor'...
[D2] Checksum strength:
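In the suggested scheme we use a hash-based checksum of two bytes (16 bits). This means that in case of an address typo, we have a 1/65536 probability of not detecting the error (false negative). Since every typo is checked against the one checksum embedded in the address, the birthday bound (256) does not apply here: it takes roughly 45,000 typos (65536 * ln 2) before there is a 50% probability of having missed at least one error.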
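To check the arithmetic: if we assume every typo is an independent trial that happens to match the 16-bit checksum with probability 2^-16, the probability of missing at least one error after n typos is 1 - (1 - 2^-16)^n. A small Python sketch for evaluating it (function names are made up):

```python
import math

CHECKSUM_BITS = 16
P_MISS = 2.0 ** -CHECKSUM_BITS  # chance that a single typo goes undetected


def miss_probability(n_typos: int, p: float = P_MISS) -> float:
    # Probability that at least one of n independent typos slips through.
    return 1.0 - (1.0 - p) ** n_typos


def typos_for_probability(target: float, p: float = P_MISS) -> int:
    # Smallest n for which miss_probability(n) reaches the target.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

Under that assumption, miss_probability(256) comes out at roughly 0.4%, and typos_for_probability(0.5) lands at about 45,000. Passing p=2.0**-32 gives the corresponding figures for a four-byte checksum.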
I feel like the above numbers are pretty good given the small checksum size.
The alternative would be to make the checksum four bytes (like in Bitcoin). This would _greatly_ increase the strength of our checksum, but it would also increase our address length by 4 base32 characters (and also force us to strip the trailing '=' padding from the base32 output, since 37 bytes is not a multiple of 5). This is what these 60-character addresses look like:
  tc2dty3zowj6oyhbyb5n3a2h3luztlx22hy2cwdvn37omsv7quy7rxiysn3a.onion
  tbdczrndtadzdhb6iyemnxf7f4i6x7yojnunarlrvt2virtmrecmwgx5golq.onion
  tc6pcgyorusw3jj5tosxakmcwfmcend2q4g2qnbjtkhuuh4dcgvs4rl4rdaa.onion
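The length and padding effect can be checked with a few lines of Python. This is only a sketch: it encodes an all-zero payload of the right size, since the length does not depend on the content, and the function name is made up.

```python
import base64


def address_length(checksum_len: int, version_len: int = 1,
                   pubkey_len: int = 32) -> tuple:
    # Returns (characters without padding, characters with '=' padding).
    payload = bytes(version_len + pubkey_len + checksum_len)
    padded = base64.b32encode(payload).decode("ascii")
    return len(padded.rstrip("=")), len(padded)
```

address_length(2) gives (56, 56), so the two-byte variant never produces padding, while address_length(4) gives (60, 64): the four-byte variant yields 60 useful characters plus four '=' that we would have to strip.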
You probably don't notice the size difference compared to the 56-character addresses, which perhaps is an argument for adopting a four byte checksum. Let me know what you think about this.
[D3] Do we like base32???
In this proposal I suggest we keep the base32 encoding since we've been using it for a while; but this is the perfect time to switch if we feel the need to.
For example, Bitcoin is using base58 which is much more compact than base32, and also has much better UX properties than base64: https://en.bitcoin.it/wiki/Base58Check_encoding#Background
If we wanted to get a more compact encoding, we could adopt base58 or make our own adaptation of it. In this proposal I'm using base32 for everything, but I could be persuaded that now is the time to use a better encoding.
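To get a feeling for how much shorter base58 would be, here is a rough Python sketch that encodes the same 35-byte payload both ways. The base58 encoder follows the Bitcoin convention (Bitcoin alphabet, leading zero bytes become '1'); whether we would want that exact alphabet for onion addresses is an open question, so treat it as an assumption.

```python
import base64

# Bitcoin's base58 alphabet: no 0, O, I or l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    # Treat the payload as one big-endian integer and convert it to base 58.
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    # Bitcoin convention: every leading zero byte is written as a leading '1'.
    n_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * n_zeros + out


def compare_lengths(payload: bytes) -> tuple:
    # Returns (base32 length without padding, base58 length).
    b32 = base64.b32encode(payload).decode("ascii").rstrip("=")
    return len(b32), len(base58_encode(payload))
```

For a 35-byte payload starting with \x98, compare_lengths() gives 56 characters in base32 against 48 in base58. Note that base58 is case-sensitive, which is a separate UX consideration.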
Let me know what you think!
Thanks :)