[torspec/master] Add proposal 194 by Saizai and Alex Fink

29 Feb 2012

commit 7b76d529802740b11af62b5ded6f61dc5d39c1d4
Author: Jacob Appelbaum <jacob@appelbaum.net>
Date:   Wed Feb 29 15:09:17 2012 -0400

    Add proposal 194 by Saizai and Alex Fink
---
 proposals/000-index.txt         |    2 +
 proposals/194-mnemonic-urls.txt |  200 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+), 0 deletions(-)

diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index 73cdd5b..95dd4d4 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -114,6 +114,7 @@ Proposals by number:
 191  Bridge Detection Resistance against MITM-capable Adversaries [OPEN]
 192  Automatically retrieve and store information about bridges [OPEN]
 193  Safe cookie authentication for Tor controllers [OPEN]
+194  Mnemonic .onion URLs [OPEN]
 
 
 Proposals by status:
@@ -149,6 +150,7 @@ Proposals by status:
    191  Bridge Detection Resistance against MITM-capable Adversaries
    192  Automatically retrieve and store information about bridges [for 0.2.[45].x]
    193  Safe cookie authentication for Tor controllers
+   194  Mnemonic .onion URLs
  ACCEPTED:
    117  IPv6 exits [for 0.2.3.x]
    140  Provide diffs between consensuses
diff --git a/proposals/194-mnemonic-urls.txt b/proposals/194-mnemonic-urls.txt
new file mode 100644
index 0000000..7d66196
--- /dev/null
+++ b/proposals/194-mnemonic-urls.txt
@@ -0,0 +1,200 @@
+Filename: 194-mnemonic-urls.txt
+Title: Mnemonic .onion URLs
+Author: Sai, Alex Fink
+Created: 29-Feb-2012
+Status: Open
+
+1. Overview
+
+  Currently, canonical Tor .onion URLs consist of a naked 80-bit hash[1]. This
+  is not something that users can even recognize for validity, let alone produce
+  directly. It is vulnerable to partial-match fuzzing attacks[2], where a
+  would-be MITM attacker generates a very similar hash and uses various social
+  engineering, wiki poisoning, or other methods to trick the user into visiting
+  the spoof site.
+
+  This proposal gives an alternative method for displaying and entering .onion
+  and other URLs, such that they will be easily remembered and generated by end
+  users, and easily published by hidden service websites, without any dependency
+  on a full domain name type system like e.g. namecoin[3]. This makes it easier
+  to implement (requiring only a change in the proxy).
+
+  This proposal could equally be used for IPv4, IPv6, etc, if normal DNS is for
+  some reason untrusted.
+
+  This is not a petname system[4], in that it does not allow service providers
+  or users[5] to associate a name of their choosing to an address[6]. Rather, it
+  is a mnemonic system that encodes the 80 bit .onion address into a
+  meaningful[7] and memorable sentence. A full petname system (based on
+  registration of some kind, and allowing for shorter, service-chosen URLs) can
+  be implemented in parallel[8].
+
+  This system has the three properties of being secure, distributed, and
+  human-meaningful — it just doesn't also have choice of name (except of course
+  by brute force creation of multiple keys to see if one has an encoding the
+  operator likes).
+
+  This is inspired by Jonathan Ackerman's "Four Little Words" proposal[9] for
+  doing the same thing with IPv4 addresses. We just need to handle 80+ bits, not
+  just 32 bits.
+
+  It is similar to Markus Jakobsson & Ruj Akavipat's FastWord system[10], except
+  that it does not permit user choice of passphrase, does not know what URL a
+  user will enter (vs verifying against a single stored password), and again has
+  to encode significantly more data.
+
+  This is also similar to RFC1751[11], RFC2289[12], and multiple other
+  fingerprint encoding systems[13] (e.g.  PGPfone[14] using the PGP
+  wordlist[15], and Arturo Filatsò's OnionURL[16]), but we aim to make something
+  that's as easy as possible for users to remember — and significantly easier
+  than just a list of words or pseudowords, which we consider only useful as an
+  active confirmation tool, not as something that can be fully memorized and
+  recalled, like a normal domain name.
+
+2. Requirements
+
+2.1. encodes at least 80 bits of random data (preferably more, e.g. for a
+checksum)
+
+2.2. valid, visualizable English sentence — not just a series of words[17]
+
+2.3. words are common enough that non-native speakers and bad spellers will have
+minimum difficulty remembering and producing (perhaps with some spellcheck help)
+
+2.4. not syntactically confusable (e.g. order should not matter)
+
+2.5. short enough to be easily memorized and fully recalled at will, not just
+recognized
+
+2.6. no dependency on an external service
+
+2.7. dictionary size small enough to be reasonable for end users to download as
+part of the onion package
+
+2.8. consistent across users (so that websites can e.g. reinforce their random
+hash's phrase with a clever drawing)
+
+2.9. does not create offensive sentences that service providers will reject
+
+2.10. resistant against semantic fuzzing (e.g. by having uniqueness against
+WordNet synsets[18])
+
+3. Possible implementations
+
+  This section is intentionally left unfinished; full listing of template
+  sentences and the details of their parser and generating implementation is
+  co-dependent on the creation of word class dictionaries fulfilling the above
+  criteria. Since that's fairly labor-intensive, we're pausing at this stage for
+  input first, to avoid wasting work.
+
+3.1. Have a fixed number of template sentences, such as:
+
+  1. Adj subj adv vtrans adj obj
+  2. Subj and subj vtrans adj obj
+  3. … etc
+
+  For a 6 word sentence, with 8 (3b) templates, we need ~12b (4k word)
+  dictionaries for each word category.
+
+  If multiple words of the same category are used, they must either play
+  different grammatical roles (e.g. subj vs obj, or adj on a different item), be
+  chosen from different dictionaries, or there needs to be an order-agnostic way
+  to join them at the bit level. Preferably this should be avoided, just to
+  prevent users forgetting the order.
+
+3.2. As 3.1, but treat sentence generation as decoding a prefix code, and have
+  a Huffman code for each word class.
+
+  We suppose it’s okay if the generated sentence has a few more words than it
+  might, as long as they’re common lean words.  E.g., for adjectives, "good"
+  might cost only six bits while "unfortunate" costs twelve.
+
+  Choice between different sentence syntaxes could be worked into the prefix
+  code as well, and potentially done separately for each syntactic constituent.
+
+4. Usage
+
+  To form a mnemonic .onion URL, just join the words with dashes or underscores,
+  stripping minimal words like 'a', 'the', 'and' etc., and append '.onion'. This
+  can be readily distinguished from standard hash-style .onion URLs by form.
+
+  Translation should take place at the client — though hidden service servers
+  should also be able to output the mnemonic form of hashes, to assist
+  website operators in publishing them (e.g. by posting an amusing drawing of
+  the described situation on their website to reinforce the mnemonic).
+
+  After the translation stage of name resolution, everything proceeds as normal
+  for an 80-bit hash onion URL.
+
+  The user should be notified of the mnemonic form of a hash URL in some way, and
+  have an easy way in the client UI to translate mnemonics to hashes and vice
+  versa. For the purposes of browser URLs and the like though, the mnemonic
+  should be treated on par with the hash; if the user enters a mnemonic URL they
+  should not be redirected to the hash version. (If anything, the opposite
+  may be true, so that users become used to seeing and verifying the mnemonic
+  version of hash URLs, and gain the security benefits against partial-match
+  fuzzing.)
+
+  Ideally, inputs that don't validly resolve should have a response page served
+  by the proxy that uses a simple spell-check system to suggest alternate domain
+  names that are valid hash encodings. This could hypothetically be done inline
+  in URL input, but would require changes on the browser (normally domain names
+  aren't subject to spellcheck), and this avoids that implementation problem.
+
+5. International support
+
+  It is not possible for this scheme to support non-English languages without
+  a) (usually) Unicode in domains (which is not yet well supported by browsers),
+  and
+  b) fully customized dictionaries and phrase patterns per language
+
+  The scheme must not be used in an attempted 'translation' by simply replacing
+  English words with glosses in the target language. Several of the necessary
+  features would be completely mangled by this (e.g. other languages have
+  different synonym, homonym, etc groupings, not to mention completely different
+  grammar).
+
+  It is unlikely a priori that URLs constructed using a non-English
+  dictionary/pattern setup would in any sense 'translate' semantically to
+  English; more likely is that each language would have completely unrelated
+  encodings for a given hash.
+
+  We intend to only make an English version at first, to avoid these issues
+  during testing.
+
+________________
+
+[1] https://trac.torproject.org/projects/tor/wiki/doc/HiddenServiceNames
+https://gitweb.torproject.org/torspec.git/blob/HEAD:/address-spec.txt
+[2] http://www.thc.org/papers/ffp.html
+[3] http://dot-bit.org/Namecoin
+[4] https://en.wikipedia.org/wiki/Zooko's_triangle
+[5] https://addons.mozilla.org/en-US/firefox/addon/petname-tool/
+[6] However, service operators can generate a large number of hidden service
+descriptors and check whether their hashes result in a desirable phrasal
+encoding (much like certain hidden services currently use brute force generated
+hashes to ensure their name is the prefix of their raw hash). This won't get you
+whatever phrase you want, but will at least improve the likelihood that it's
+something amusing and acceptable.
+[7] "Meaningful" here inasmuch as e.g. "Barnaby thoughtfully mangles simplistic
+yellow camels" is an absurdist but meaningful sentence. Absurdness is a feature,
+not a bug; it decreases the probability of mistakes if the scenario described is
+not one that the user would try to fit into a template of things they have
+previously encountered IRL. See research into linguistic schema for further
+details.
+[8] https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-oni
+on-nyms.txt
+[9] http://blog.rabidgremlin.com/2010/11/28/4-little-words/
+[10] http://fastword.me/
+[11] https://tools.ietf.org/html/rfc1751
+[12] http://tools.ietf.org/html/rfc2289
+[13] https://github.com/singpolyma/mnemonicode
+http://mysteryrobot.com
+https://github.com/zacharyvoase/humanhash
+[14] http://www.mathcs.duq.edu/~juola/papers.d/icslp96.pdf
+[15] http://en.wikipedia.org/wiki/PGP_word_list
+[16] https://github.com/hellais/Onion-url
+https://github.com/hellais/Onion-url/blob/master/dev/mnemonic.py
+[17] http://www.reddit.com/r/technology/comments/ecllk
+[18] http://wordnet.princeton.edu/wordnet/man2.1/wnstats.7WN.html
+

    

ioerror＠torproject.org
