[tor-dev] Drafting a proposal for mnemonic onion URLs
nickm at alum.mit.edu
Wed Dec 21 05:16:58 UTC 2011
On Tue, Dec 20, 2011 at 11:20 PM, Sai <tor at saizai.com> wrote:
> Howdy all.
> I would like to make a mnemonic translation method for onion URLs. Not
> as in a full petname system with registration and so forth, just a
> "four little words" style way of rendering and entering existing
> 80-bit hashes in a way that is memorable to not just detect
> partial-overlap fuzzing attacks but actually memorize and produce the
> URL directly.
So, the last time I was talking about this (with a grad student whose
name I have just asked for permission to cite), I came up with the
following scheme. Let BW be the number of bits per natural-language
word; let BH be the number of bits per hidden service identity key
hash. Let H() be some hash function. Represent the onion key
identity hash as a sequence of BH/BW natural-language words, letting
each word W represent the first BW bits of H(W).
So if H() is SHA256, and BW is 20, and BH is 80, the hostname
would represent the 80-bit sequence
Ht("bookbind") Ht("electrodynamism") H("unsurnamed") H("squibbish")
where Ht(x) is the first 20 bits of SHA256(x).
(In practice, 80 bits is feeling way too short these days; but that's
The main advantages and disadvantages of this idea are:
+ You don't need to ship a shared wordlist with the client software.
(That's a good thing, since big dictionaries get annoying fast.)
+ You don't need to restrict yourself to any particular language; it
is not English-centric.
- If bw is large enough, there will be sequences of bits that are
not represented as the hash of any particular English word. (e.g., if
bw is 32, you are in trouble: no language has 4 billion words!) You
can work around this by having your search program generate new
english-sounding sequences, by using a bigger wordlist or by
generating keys until you find one whose fingerprint *is*
representable with words you have.
- If you make BW big, you are going to get some pretty obscure words.
- Encodings are not unique.
+ You can get a lot of bits per word. If you're willing to do stuff
like "every word in english" x "the numbers 00-99", you can get even
+ It scales to more than 80 bits pretty easily.
- It doesn't make pretty sentences by default, though you can keep
generating words till you find something you like the sound of.
- We'd need to watch out for words that look equivalent to a human,
/ You actually need to write a reasonably good search program. This
isn't too hard, and the database it generates isn't too big.
There are other issues and other possible solutions too.
More information about the tor-dev