[tor-dev] Mnemonic 80-bit phrases (proposal)

Sai tor at saizai.com
Wed Mar 21 02:47:56 UTC 2012


On Tue, Mar 20, 2012 at 20:11, Ken Takusagawa II
<ken.takusagawa.2 at gmail.com> wrote:
> 1. You need 2^8=256 templates, not just 8, to reach 6*12+8=80 bits.

We won't know for sure how it hashes out until we make both the
dictionaries and the syntax generator. The ambiguity was intentional.

But yes, it may well use a number of generated templates. We're
thinking of making it symbolic-expansion based, which is more
bit-efficient but also more complicated to describe before it's
fixed (and it'll require a parser library).
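Ken's arithmetic above (8 template bits plus six 12-bit word slots = 80 bits) can be sketched concretely. This is a toy bit layout I'm assuming for illustration, not the actual scheme, which is still undecided:

```python
# Hypothetical 80-bit layout: top 8 bits pick one of 2^8 = 256 templates,
# the remaining 72 bits are six 12-bit indices into 2^12 = 4096-word
# dictionaries, since 8 + 6*12 = 80.

def split_bits(value):
    """Split an 80-bit integer into (template_index, [six 12-bit word indices])."""
    assert 0 <= value < 2 ** 80
    template = value >> 72  # top 8 bits select the template
    words = [(value >> (12 * i)) & 0xFFF for i in range(5, -1, -1)]
    return template, words

def join_bits(template, words):
    """Inverse of split_bits: reassemble the 80-bit value."""
    value = template
    for w in words:
        value = (value << 12) | w
    return value

t, ws = split_bits(2 ** 80 - 1)
assert t == 255 and all(w == 0xFFF for w in ws)
assert join_bits(t, ws) == 2 ** 80 - 1
```

The round-trip property (join_bits after split_bits is the identity) is what the final scheme would need, whatever the real slot widths end up being.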

> 2. Having toyed with this idea in the past, let me warn that forming a 4096
> word dictionary of memorable, non-colliding words for each word category is
> going to be very difficult.  Too many words are semantically similar,
> phonetically similar, or just unfamiliar.

Our intention currently is to first take candidate dictionaries from
WordNet, and use a combination of WordNet and Google 1-gram frequency
data as part of the cutoff for whether words are adequately familiar.
(N-grams with n >= 2 are rather irrelevant to our needs, AFAICT.)
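A minimal sketch of that familiarity cutoff: take WordNet candidates for one slot, drop anything whose Google 1-gram count falls below a threshold, and keep the most frequent 4096. The function name, threshold, and data shape here are all my assumptions, not the real pipeline:

```python
# Hypothetical familiarity filter (threshold and names are assumptions):
# candidates come from WordNet, counts from the Google 1-gram data.

def build_slot_dictionary(candidates, ngram_counts, size=4096, min_count=100_000):
    """candidates: iterable of WordNet lemmas for one constituent type.
    ngram_counts: dict mapping word -> Google 1-gram occurrence count.
    Returns the `size` most frequent words that clear the cutoff."""
    familiar = [w for w in candidates if ngram_counts.get(w, 0) >= min_count]
    familiar.sort(key=lambda w: ngram_counts[w], reverse=True)
    return familiar[:size]

counts = {"water": 9_000_000, "coal": 2_000_000, "zymurgy": 40}
print(build_slot_dictionary(["water", "zymurgy", "coal"], counts))
# ['water', 'coal'] -- the obscure word is cut
```

In practice one would also want the collision filters Ken mentions (semantic and phonetic similarity), which a pure frequency cutoff doesn't address.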

> http://kenta.blogspot.com/2012/02/lefoezyy-some-notes-on-google-books.html

Thanks; that could be useful.

> Another way to go about it might be to first catalogue semantic categories
> (colors, animals, etc.) then list the most common (yet dissimilar) members
> of each category.  An attempt at 64 words is here:

This is something that WordNet has already done.

> http://kenta.blogspot.com/2011/10/xpmqawkv-common-words.html

I think you omit far more common words, which you shouldn't, e.g. air,
water, coal, man, house, etc.

But quibbling at this level is pointless; we'll need to deal with
dictionaries mostly on the order of a few thousand words, sorted
by *constituent types*, not by semantic categories. (E.g. one
dictionary would be "nouns that can be the target of a transitive
verb".)

> I'd propose that the "right" way to do this is not just sentences, but
> entire semantically consistent stories, written in rhyming verse, with
> entropy of perhaps only a few bits per sentence.  (Prehistoric oral
> tradition does prove we can memorize such poems.)  However, synthesizing
> these seems extremely difficult, an AI problem.

I think that's currently impossible to do, and furthermore that it's
*not* Right even if you could, because it would violate a key
constraint: that it can be reasonably typed as a domain name. It
shouldn't take longer than a few seconds to remember and type. It
won't be as fast as typing "google.com", and that's OK, but I think
that level of redundant expansion is way too much.

Creating unambiguously parseable syntaxes and dictionaries that meet
our stated constraints is already hard enough. ;-)

> 3. I presume people are familiar with Bubblebabble?  It doesn't solve all
> the problems, but does make bit strings seem less "dense".

BubbleBabble produces nonwords; as such it fails a basic requirement.
Making something merely look phonotactically valid isn't enough; it
has to be grammatically valid and composed entirely of known terms.

- Sai
