Hello George,
George Kadianakis wrote:
Hello list,
We've had discussions over the past years about how to encode prop224 onion addresses. Here is the latest thread: https://lists.torproject.org/pipermail/tor-dev/2016-December/011734.html
Bikeshedding is over; it's time to finally pick a scheme! My suggested scheme basically follows from the discussion on that thread, and is heavily based on the Bitcoin address format: https://en.bitcoin.it/wiki/Base58Check_encoding https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_address...
Here is the suggested scheme:
onion_address = base32(version + pubkey + checksum)
checksum = SHA3(".onion checksum" + version + pubkey)
where:
  pubkey is 32 bytes (ed25519)
  version is one byte
  checksum is _truncated_ to two bytes
With the above construction, onion_address ends up being 56 characters long (35 bytes = 280 bits = 56 base32 characters, excluding the ".onion"):
tbi5tdxbosiotphawjyu7f5pw5tlnvbvfjrj7meskbsnwr2bqbu2t4gg.onion
tcrdnadkefvbdm3u56kz6lfh6v5lr24fpog5vzsy4n3djr2ymueu34ws.onion
tcdw7lwmtp5pbwj2w7wf6amxdhmc62qitj2teu376r5s2fqke4r3uiq6.onion
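A minimal Python sketch of the construction, for anyone who wants to play with it. It assumes that "SHA3" means SHA3-256, that "+" is plain byte concatenation, that the base32 output is lower-cased like today's onion addresses, and that the default version is the 0x98 value suggested in [D1.1]. The function names are illustrative only and are not from prop224.

```python
import base64
import hashlib
import os

# Assumptions (not fixed by the text above): SHA3 = SHA3-256, "+" = byte
# concatenation, default version byte 0x98.
CHECKSUM_PREFIX = b".onion checksum"
DEFAULT_VERSION = 0x98


def onion_checksum(version: int, pubkey: bytes) -> bytes:
    """Two-byte truncated SHA3-256 over prefix + version + pubkey."""
    digest = hashlib.sha3_256(CHECKSUM_PREFIX + bytes([version]) + pubkey).digest()
    return digest[:2]


def encode_onion_address(pubkey: bytes, version: int = DEFAULT_VERSION) -> str:
    """base32(version + pubkey + checksum), lower-cased, plus '.onion'."""
    if len(pubkey) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    payload = bytes([version]) + pubkey + onion_checksum(version, pubkey)
    # 35 bytes = 280 bits = exactly 56 base32 characters, so no '=' padding.
    return base64.b32encode(payload).decode("ascii").lower() + ".onion"


def decode_onion_address(address: str) -> tuple[int, bytes]:
    """Return (version, pubkey); raise ValueError if the checksum fails."""
    label = address.removesuffix(".onion")
    payload = base64.b32decode(label, casefold=True)  # accepts lower case
    if len(payload) != 35:
        raise ValueError("wrong length")
    version, pubkey, checksum = payload[0], payload[1:33], payload[33:]
    if onion_checksum(version, pubkey) != checksum:
        raise ValueError("checksum mismatch (typo?)")
    return version, pubkey


if __name__ == "__main__":
    key = os.urandom(32)  # stand-in for a real ed25519 public key
    addr = encode_onion_address(key)
    print(addr, len(addr) - len(".onion"))
    assert decode_onion_address(addr) == (DEFAULT_VERSION, key)
```

Decoding recomputes the checksum from the version and key it just parsed, so a typo is caught unless it happens to land on a matching 16-bit value.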
If people like the above suggestion, I will make the effort to engrave it in prop224.
Here is the discussion section. Please provide feedback!
[D1] How to use version field:
The version field is one byte long. If we use it as an integer we can encode 256 values in it; if we use it as a bitmap we could encode properties and such. My suggestion is to simply use it as an integer like Bitcoin does. So we can assign value \x01 to normal onion services, and in the future we can assign more version tags if we need to. For example, we can give a different version field to onion services in the testnet. We can also reserve a range of values for application-specific purposes.
[D1.1] Default version value:
The next question is what version value to assign to normal onion services. In the above scheme, where:
onion_address = base32(version + pubkey + checksum)
the value of 'version' determines the first character of the onion address and constrains the second. In Bitcoin, they've made it such that the default version value prefixes addresses with "1"; so all normal Bitcoin addresses start with 1, as in:
14tDWDT9zqDufWZmiLqoaT9qJyHi7RRZPE
What should we do in Tor? My suggestion is to use '\x98' as the default version value, which prefixes all addresses with 't' (as in Tor). Check the examples I cited above.
An alternative is to turn the scheme into:
onion_address = base32(pubkey + checksum + version)
where the version byte is at the end, with no effect on usability. A heavier alternative would be to have two bytes of version so that we can prefix them all with 'tor'...
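Since base32 spends 5 bits per character, a leading 8-bit version fixes one character completely plus 3 of the 5 bits of the next, and a 16-bit version fixes three characters. A quick sketch to check the '\x98' → 't' claim and the two-byte 'tor' idea; the helper `fixed_prefix` is hypothetical, and 0x9BA2 is just one value whose top 15 bits spell "tor", not something anyone proposed.

```python
import base64


def fixed_prefix(version: bytes) -> str:
    """Base32 characters fully determined by a version field at the front.

    Hypothetical helper: each base32 character encodes 5 bits, so only the
    first (8 * len(version)) // 5 characters depend on the version alone.
    """
    nchars = (8 * len(version)) // 5
    # Append zero bytes so the encoder has input to work with; the characters
    # we keep do not depend on what follows the version.
    encoded = base64.b32encode(version + bytes(5)).decode("ascii").lower()
    return encoded[:nchars]


print(fixed_prefix(b"\x98"))              # -> t
print(fixed_prefix(bytes([0x9B, 0xA2])))  # -> tor
```

With 0x98 the three leftover bits are 000, so the second character can only be one of a, b, c or d (chosen by the top two bits of the key), which matches the tb.../tc... examples above.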
The version field is useful and leaves room for much of what we might need to do. I think it would be better to place it at the end of the address. To be honest, I don't think all addresses should start with the same prefix: it would make them slightly less distinguishable. As much as possible, users should be able to tell onion addresses apart, since they are reused over the long term. This is the opposite of Bitcoin, where the recommended practice is to use each address only once, so users just need to see a string that looks and reads like a Bitcoin address and make sure it was copied (or scanned) from and to the right place.
[D2] Checksum strength:
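In the suggested scheme we use a hash-based checksum of two bytes (16 bits). This means that, in case of an address typo, we have a 1/65536 probability of not detecting the error (false negative). By the birthday bound, it also means that among roughly 300 random addresses there is about a 50% probability that two of them share the same checksum (happy birthday!); the probability of missing at least one of many independent typos only reaches 50% after about 45,000 typos. I feel like the above numbers are pretty good given the small checksum size.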
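The checksum figures can be computed directly if we assume the truncated SHA3 output behaves like a uniformly random 16-bit value. The sketch separates two different quantities: how many independent typos it takes before the chance of missing at least one reaches 50%, and the birthday-bound number of random addresses before two are likely to share a checksum (using the standard sqrt(2N ln 2) approximation).

```python
import math

# Assumption: a truncated SHA3 checksum behaves like a uniform random
# 16-bit value, so a random typo slips through with probability 2**-16.
CHECKSUM_BITS = 16
SPACE = 2 ** CHECKSUM_BITS  # 65536 possible checksum values


def p_undetected(typos: int) -> float:
    """Chance that at least one of `typos` independent typos goes undetected."""
    return 1 - (1 - 1 / SPACE) ** typos


def typos_for_half() -> int:
    """Smallest number of independent typos with >= 50% chance of one slipping by."""
    return math.ceil(math.log(0.5) / math.log(1 - 1 / SPACE))


def birthday_half() -> int:
    """Approximate number of random addresses until two likely share a checksum."""
    return math.ceil(math.sqrt(2 * SPACE * math.log(2)))


print(f"single typo missed:      {1 / SPACE:.6f}")
print(f"p(miss) after 256 typos: {p_undetected(256):.4f}")
print(f"typos for 50% miss:      {typos_for_half()}")
print(f"birthday 50% bound:      {birthday_half()}")
```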
Yes, the numbers are very good.
The alternative would be to make the checksum four bytes (like in Bitcoin). This would _greatly_ increase the strength of our checksum, but it would also increase our address length by 4 base32 characters (and also force us to remove trailing padding from the base32 output). This is what these 60-character addresses look like:
tc2dty3zowj6oyhbyb5n3a2h3luztlx22hy2cwdvn37omsv7quy7rxiysn3a.onion
tbdczrndtadzdhb6iyemnxf7f4i6x7yojnunarlrvt2virtmrecmwgx5golq.onion
tc6pcgyorusw3jj5tosxakmcwfmcend2q4g2qnbjtkhuuh4dcgvs4rl4rdaa.onion
You probably don't notice the size difference compared to the 56-character addresses, which perhaps is an argument for adopting a four-byte checksum. Let me know what you think about this.
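The 56 vs 60 character figures follow from base32's 5 bits per character and its '=' padding to 8-character blocks. A small sketch, with zero bytes standing in for the payload since only its length matters:

```python
import base64

# Sizes from the thread: 1-byte version + 32-byte ed25519 key + checksum.
VERSION_LEN, PUBKEY_LEN = 1, 32


def address_lengths(checksum_len: int) -> tuple[int, int]:
    """Return (padded, unpadded) base32 length for a given checksum size."""
    payload = bytes(VERSION_LEN + PUBKEY_LEN + checksum_len)  # content irrelevant
    padded = base64.b32encode(payload).decode("ascii")
    return len(padded), len(padded.rstrip("="))


for n in (2, 4):
    print(n, address_lengths(n))
```

A 35-byte payload fills exactly seven 5-byte groups, so it needs no padding; the 37-byte payload encodes to 64 characters ending in "====", and getting to 60 means stripping that padding from the end.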
I don't think so. I think our best bet is a checksum of 2 bytes; this offers sufficient strength for our use cases.
[D3] Do we like base32???
In this proposal I suggest we keep the base32 encoding since we've been using it for a while; but this is the perfect time to switch if we feel the need to. For example, Bitcoin is using base58, which is more compact than base32 (about 5.86 bits per character instead of 5) and also has much better UX properties than base64: https://en.bitcoin.it/wiki/Base58Check_encoding#Background If we wanted a more compact encoding, we could adopt base58 or make our own adaptation of it. In this proposal I'm using base32 for everything, but I could be persuaded that now is the time to use a better encoding.
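For a rough sense of the compactness difference, here is a sketch that encodes the same 35-byte payload (version 0x98, a made-up key and a made-up 2-byte checksum) in base32 and in plain base58 with Bitcoin's alphabet, leaving out the Base58Check checksum step.

```python
import base64

# Bitcoin's base58 alphabet (no 0, O, I, l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Plain base58 (no Base58Check checksum), Bitcoin alphabet."""
    n = int.from_bytes(data, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(B58_ALPHABET[rem])
    # Each leading zero byte is encoded as a literal '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(out))


# Made-up 35-byte payload: version + 32-byte "key" + 2-byte "checksum".
payload = bytes([0x98]) + bytes(range(32)) + b"\xab\xcd"
print(len(base64.b32encode(payload)), len(base58_encode(payload)))
```

For any 35-byte payload starting with 0x98, base58 needs 48 characters against base32's 56, so the saving is 8 characters, paid for by mixing upper- and lower-case letters.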
Let me know what you think!
When talking about strings of more than 50 characters, I think memorizing them is very hard (for most users at least), regardless of whether the encoding is base58 or base32. What I think is more important is that onion addresses should not mix upper- and lower-case characters; they should look as much as possible like regular DNS hostnames. For this reason I would go for base32 here.
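As a sketch of the "looks like a DNS hostname" property, assuming the relevant constraints are a lower-case alphabet and the 63-octet DNS label limit from RFC 1035, the following checks one of the example onion labels from this thread and contrasts it with the Bitcoin address quoted in [D1.1]. The helper names are hypothetical.

```python
import base64
import re

# Hypothetical check: lower-case base32 alphabet, at most 63 characters
# (the RFC 1035 label limit), and insensitive to how the user capitalises it.
ONION_LABEL = re.compile(r"[a-z2-7]{1,63}")
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_dns_friendly(label: str) -> bool:
    """True if `label` uses only lower-case base32 characters and fits a DNS label."""
    return ONION_LABEL.fullmatch(label) is not None


def b32_case_insensitive(label: str) -> bool:
    """base32 decodes to the same bytes whatever capitalisation is typed."""
    return base64.b32decode(label.upper()) == base64.b32decode(label.lower(), casefold=True)


onion = "tbi5tdxbosiotphawjyu7f5pw5tlnvbvfjrj7meskbsnwr2bqbu2t4gg"  # from the thread
bitcoin = "14tDWDT9zqDufWZmiLqoaT9qJyHi7RRZPE"                       # from the thread

print(is_dns_friendly(onion), b32_case_insensitive(onion))
# Lower-casing a base58 string changes its meaning, and here it even
# produces a character ('l') that is not in the base58 alphabet at all.
print(set(bitcoin.lower()) - set(B58_ALPHABET))
```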