[tor-dev] Hash Visualizations to Protect Against Onion Phishing

Thu Aug 20 18:39:14 UTC 2015

> On 21 Aug 2015, at 02:56, Jesse V <kernelcorn at riseup.net> wrote:
> 
> 
>> Jacek Wielemborek <d33tah at gmail.com> writes:
>> 
>>> George Kadianakis writes:
>>>> Some real UX research needs to be done here, before we decide something terrible.
>>> 
>>> Just curious, has anybody seen any cognitive studies on the SSH
>>> randomart visualisation? I always found them impossible to remember.
>>> Perhaps adding a bit more color could help...
>> Hm. Indeed.
>> 
>> I can remember the general shape and edges of my SSH server's key, but not any
>> details.
>> 
>> I doubt I would remember the randomart of like 10 onion websites, especially if
>> I didn't visit them regularly. But maybe I would remember the randomart of my
>> webmail better than my SSH server's.
> 
> The main issue is that there's always going to be a tradeoff between memorability and security. It's difficult to cram sufficient entropy into a visualization and expect people to remember it. I agree that coloring schemes or perhaps faces stand the best chance of memorability, but they are difficult to deploy. I wonder if a simpler scheme is sufficient.

Visual schemes are only helpful to users who have the appropriate level of visual ability or processing:
* as has already been mentioned, colouring schemes are not as useful to the colourblind;
* facial recognition schemes are useless to the face-blind (including many autistic people);
and any visual scheme would need to have a text alternative for screen readers or other tools used by the visually impaired.

> I suggest using a word-bank to generate a series of words. First, take the .onion address or the hidden service public identity key (basically the same thing for proposal 224) and run it through scrypt or a similar algorithm. Then, based on the output, select a series of words from the dictionary. Present the series of words in George's mockup (https://people.torproject.org/~asn/tbb_randomart/randomart_mockup.png) in lieu of the art. It's not a new idea to use a word-list for this purpose: I recall reading a paper suggesting to use a word-list to encode .onion addresses rather than base32. The scheme has also been deployed by websites; Gfycat, for example, shows a series of words in its URL to provide an identifier for user images. It's also successful in practice: everyone in the /r/globaloffensive subreddit recognizes the DelayedAutisticGuppy reference before they even open the gif.
> 
> People remember random information best if it's grouped, but usually the maximum group size is about 4, which is why phone numbers are split with delimiters. To make it simple, I suggest showing one row of four words. Each character in a .onion address has 32 possible values, so it carries 5 bits. If you used a word-bank of 1,024 words, you cover two characters per word. If your dictionary consists of 32,768 words, you can capture three at a time. Assuming this latter case, if four words are used, you can cover 12 characters, or 60 bits. If you combine that with the number of characters in the address that users already check subconsciously, it's extremely difficult for an attacker to match all of them. The Shallot README indicates that it would take nearly 10 millennia to find a 12-character match, and this of course does not take into account a round of scrypt before the word-list is used. Everyone remembers "correct horse battery staple", right?
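
To make the arithmetic in the quoted proposal concrete, here is a minimal Python sketch of the word-bank idea: stretch the address with scrypt, take the top 60 bits of the output, and split them into four 15-bit indices into a 32,768-entry list. Everything specific here is an assumption for illustration only: the function name onion_words, the fixed salt, the scrypt cost parameters, and above all the placeholder word bank, which is built from 32 syllables (three per word, so 32^3 = 32,768 entries) because no real dictionary has been chosen.

```python
import hashlib
from itertools import product

# Placeholder word bank (an assumption, not a real dictionary):
# 16 consonants x 2 vowels = 32 syllables, three syllables per word,
# giving 32**3 = 32,768 distinct pseudo-words (15 bits per word).
SYLLABLES = [c + v for c in "bdfghjklmnprstvz" for v in "ae"]
WORD_BANK = ["".join(parts) for parts in product(SYLLABLES, repeat=3)]

BITS_PER_WORD = 15   # log2(32768)
NUM_WORDS = 4        # 4 words x 15 bits = 60 bits = 12 base32 characters


def onion_words(onion_address, salt=b"onion-words-demo"):
    """Map an onion address to four words (illustrative sketch only)."""
    # Normalise case so that the same address always yields the same words.
    data = onion_address.strip().lower().encode("ascii")
    # One round of scrypt; the cost parameters are illustrative, not tuned.
    digest = hashlib.scrypt(data, salt=salt, n=2**12, r=8, p=1, dklen=8)
    # Keep the top 60 of the 64 derived bits.
    value = int.from_bytes(digest, "big") >> (64 - BITS_PER_WORD * NUM_WORDS)
    mask = (1 << BITS_PER_WORD) - 1
    words = []
    for i in reversed(range(NUM_WORDS)):
        index = (value >> (i * BITS_PER_WORD)) & mask
        words.append(WORD_BANK[index])
    return words


if __name__ == "__main__":
    print(" ".join(onion_words("3g2upl4pq6kufc4m.onion")))
```

Because the words come from the scrypt output rather than directly from the address characters, an attacker searching for a matching word row has to pay the scrypt cost for every candidate key, which is the extra work factor mentioned in the quoted text.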

If we choose a list of English words, is that going to cause recognition issues for people who are non-native English speakers, or whose native script is a non-Latin script?
(We could test this out.)

Tim (teor)