On Tue, Dec 20, 2011 at 11:20 PM, Sai tor@saizai.com wrote:
Howdy all.
I would like to make a mnemonic translation method for onion URLs. Not as in a full petname system with registration and so forth, just a "four little words" style way of rendering and entering existing 80-bit hashes in a way that is memorable to not just detect partial-overlap fuzzing attacks but actually memorize and produce the URL directly.
So, the last time I was talking about this (with a grad student whose name I have just asked for permission to cite), I came up with the following scheme. Let BW be the number of bits per natural-language word; let BH be the number of bits per hidden service identity key hash. Let H() be some hash function. Represent the onion key identity hash as a sequence of BH/BW natural-language words, letting each word W represent the first BW bits of H(W).
So if H() is SHA256, and BW is 20, and BH is 80, the hostname bookbind.electrodynamism.unsurnamed.squibbish would represent the 80-bit sequence Ht("bookbind") Ht("electrodynamism") H("unsurnamed") H("squibbish") where Ht(x) is the first 20 bits of SHA256(x).
(In practice, 80 bits is feeling way too short these days; but that's another discussion.)
The main advantages and disadvantages of this idea are:
+ You don't need to ship a shared wordlist with the client software. (That's a good thing, since big dictionaries get annoying fast.) + You don't need to restrict yourself to any particular language; it is not English-centric. - If bw is large enough, there will be sequences of bits that are not represented as the hash of any particular English word. (e.g., if bw is 32, you are in trouble: no language has 4 billion words!) You can work around this by having your search program generate new english-sounding sequences, by using a bigger wordlist or by generating keys until you find one whose fingerprint *is* representable with words you have. - If you make BW big, you are going to get some pretty obscure words. - Encodings are not unique. + You can get a lot of bits per word. If you're willing to do stuff like "every word in english" x "the numbers 00-99", you can get even more. + It scales to more than 80 bits pretty easily. - It doesn't make pretty sentences by default, though you can keep generating words till you find something you like the sound of. - We'd need to watch out for words that look equivalent to a human, but aren't. / You actually need to write a reasonably good search program. This isn't too hard, and the database it generates isn't too big.
There are other issues and other possible solutions too.
yrs,