[tor-dev] Mnemonic 80-bit phrases (proposal)
tor at saizai.com
Wed Mar 21 02:47:56 UTC 2012
On Tue, Mar 20, 2012 at 20:11, Ken Takusagawa II
<ken.takusagawa.2 at gmail.com> wrote:
> 1. You need 2^8=256 templates, not just 8, to reach 6*12+8=80 bits.
We won't know for sure how it hashes out until we make both the
dictionaries and the syntax generator. The ambiguity was intentional.
But yes, it may well use a number of generated templates. We're
thinking of making it symbolic expansion based, which is more
efficient on bits but also more complicated to describe before it's
fixed (and it'll require a parser library).
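
For concreteness, here is a minimal sketch of the bit budget in Ken's arithmetic (6 words of 12 bits each plus an 8-bit template index, 6*12+8=80). The split itself is an assumption taken from the quoted numbers, not a fixed design, and the names `split_80_bits` and `join_80_bits` are made up for illustration.

```python
# Assumed layout (not final): 8-bit template index, then six 12-bit word indices.
WORD_BITS = 12       # 2**12 = 4096 words per dictionary
NUM_WORDS = 6
TEMPLATE_BITS = 8    # 2**8 = 256 templates

def split_80_bits(value):
    """Split an 80-bit integer into (template_index, [word_index, ...])."""
    total_bits = TEMPLATE_BITS + WORD_BITS * NUM_WORDS
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in 80 bits")
    template = value >> (WORD_BITS * NUM_WORDS)
    mask = (1 << WORD_BITS) - 1
    # Most significant word slot first.
    words = [(value >> (WORD_BITS * i)) & mask
             for i in reversed(range(NUM_WORDS))]
    return template, words

def join_80_bits(template, words):
    """Inverse of split_80_bits."""
    value = template
    for index in words:
        value = (value << WORD_BITS) | index
    return value
```

A symbolic-expansion grammar would not use fixed-width fields like this; the sketch only shows why 8 templates (3 bits) would fall short of 80 bits with six 4096-word slots.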
> 2. Having toyed with this idea in the past, let me warn that forming a 4096
> word dictionary of memorable, non-colliding words for each word category is
> going to be very difficult. Too many words are semantically similar,
> phonetically similar, or just unfamiliar.
Our intention currently is to first take candidate dictionaries from
WordNet, and use a combination of WordNet and Google 1-gram frequency
data as part of the cutoff for whether words are adequately familiar.
(N-grams with n >= 2 are rather irrelevant to our needs, AFAICT.)
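
A rough sketch of the frequency cutoff idea, assuming the candidate words (e.g. pulled from WordNet) and the 1-gram counts have already been loaded into plain Python structures. The function name `select_familiar` and the toy counts below are hypothetical; how the WordNet and 1-gram data would actually be combined is not settled.

```python
def select_familiar(candidates, unigram_counts, size):
    """Keep the `size` most frequent candidates that appear in the counts.

    Ties are broken alphabetically so the result is deterministic.
    """
    known = [w for w in candidates if unigram_counts.get(w, 0) > 0]
    known.sort(key=lambda w: (-unigram_counts[w], w))
    return known[:size]

# Toy data, made up purely for illustration.
toy_counts = {"air": 950, "water": 900, "coal": 120, "zymurgy": 1}
toy_candidates = ["zymurgy", "coal", "water", "air", "quux"]
familiar = select_familiar(toy_candidates, toy_counts, 3)
```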
Thanks; that could be useful.
> Another way to go about it might be to first catalogue semantic categories
> (colors, animals, etc.) then list the most common (yet dissimilar) members
> of each category. An attempt at 64 words is here:
This is something that WordNet has already done.
I think you omit far more common words, which you shouldn't, e.g. air,
water, coal, man, house, etc.
But quibbling at this level is pointless; we'll need to be dealing
with dictionaries mostly on the order of a few thousand words, sorted
by *constituent types*, not by semantic categories. (E.g. one
dictionary would be "nouns that can be the target of a transitive
> I'd propose that the "right" way to do this is not just sentences, but
> entire semantically consistent stories, written in rhyming verse, with
> entropy of perhaps only a few bits per sentence. (Prehistoric oral
> tradition does prove we can memorize such poems.) However, synthesizing
> these seem extremely difficult, an AI problem.
I think it's currently impossible to do that, and furthermore, that
it's *not* Right even if you could — because it would violate a key
constraint: that it can be reasonably typed as a domain. It shouldn't
take longer than a few seconds to remember and type. It won't be as
fast as typing "google.com", and that's OK, but I think that level of
redundant expansion is way too much.
Creating unambiguously parseable syntaxes and dictionaries that meet
our stated constraints is already hard enough. ;-)
> 3. I presume people are familiar with Bubblebabble? It doesn't solve all
> the problems, but does make bit strings seem less "dense".
BubbleBabble produces nonwords; as such it fails a basic requirement.
Making something merely look phonotactically valid isn't enough; it
has to be grammatically valid and composed entirely of known terms.