[tor-talk] Fwd: Developing an open-source, user-friendly tool for avoiding stylometry; seeking input from community

Alden Page pagea at allegheny.edu
Mon Dec 15 04:14:37 UTC 2014


It has been shown that a person can be "fingerprinted" by their
writing style (preference for certain words, spelling mistakes,
eccentricities in grammar, etc.), and that this fingerprint can be
used to determine, to a high degree of statistical certainty, whether
or not that person authored an anonymous document. The process of
analyzing/fingerprinting a person's writing style is called stylometry.
It has been shown that it is possible to perform stylometry on sample
sizes of up to 100,000 authors with a surprising degree of success. I
hope that you will all agree that this poses a significant threat to
the preservation of the anonymity of Tor users. Please see the
following document for more information on the threat stylometry poses
to privacy, freedom of speech, and, more specifically, Tor users:
http://www.cs.berkeley.edu/~dawnsong/papers/2012%20On%20the%20Feasibility%20of%20Internet-Scale%20Author%20Identification.pdf
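
To make the idea of a writing-style "fingerprint" concrete, here is a
minimal Python sketch of the kind of features a stylometric analysis
might start from. It is only an illustration under my own assumptions:
the word list, the feature names and the function style_features are
hypothetical, and real systems such as the one in the paper above use
far richer feature sets and a trained classifier.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked list of function words, for illustration only.
FUNCTION_WORDS = ("the", "of", "and", "to", "that", "which", "however")


def style_features(text):
    """Return a crude dictionary of writing-style features for `text`."""
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    # Very rough sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input

    # Relative frequency of each function word.
    features = {w: counts[w] / total for w in FUNCTION_WORDS}
    features["avg_sentence_len"] = len(words) / (len(sentences) or 1)
    features["avg_word_len"] = sum(len(w) for w in words) / total
    return features
```

Comparing such feature dictionaries between an anonymous text and
samples from known authors is, very roughly, what an attacker would do;
a tool like the one proposed here would try to push these numbers away
from the author's usual values.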

Several members of the online privacy community have expressed
interest in a tool that helps circumvent stylometry, as seen on the
Tails bug tracker and in a few threads on tor-talk. There is a tool
called Anonymouth that sets out to do this by pointing out stylometric
"giveaways" in input text, but it is quite unstable, and aimed at
researchers rather than your everyday end-user, making it quite
difficult to use. For this reason, I am attempting to replicate the
functionality of Anonymouth in a stripped down, easy-to-use Python
application, which I believe may someday be suitable for prepackaging
in the Tails OS and inclusion in Debian repositories.

Development will begin in mid-January 2015 at the latest; source code
will be made available under the MIT license on May 1st, 2015. As much
as I would like to release it earlier, I am developing this software as
part of my senior thesis at my college, and must not accept outside
code contributions until I have turned in my project for grading. It
is my hope that I and any other interested developers will continue to
work on this project long after May 1st.

In the spirit of meeting the needs of the privacy community, I am
interested in hearing what potential users might have to say about the
design of such a tool. As of now, I envision this tool as a GUI
desktop application that provides suggestions for preserving anonymity
much like Anonymouth, although this will be targeted at Tails/Tor
users rather than researchers. I hope to at least partially automate
the anonymization process as well, perhaps automatically substituting
certain words with synonyms or slightly adjusting the structure of a
sentence in order to get rid of glaring indicators of writing style.
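
As a rough sketch of what the automatic synonym substitution could look
like, the Python snippet below swaps words found in a lookup table while
leaving everything else untouched. The table SYNONYMS and the function
substitute are hypothetical placeholders of mine, not part of any
existing tool; a real implementation would need a proper thesaurus and
some awareness of context so that meaning is preserved.

```python
import re

# Assumption: a hypothetical substitution table mapping distinctive
# words to more common alternatives. Keys must be lowercase.
SYNONYMS = {"utilize": "use", "whilst": "while", "commence": "begin"}


def substitute(text, table=SYNONYMS):
    """Replace words listed in `table`, keeping initial capitalization."""

    def swap(match):
        word = match.group(0)
        replacement = table.get(word.lower())
        if replacement is None:
            return word  # not in the table: leave the word as written
        if word[0].isupper():
            replacement = replacement.capitalize()
        return replacement

    # Only alphabetic runs are considered; punctuation and spacing pass through.
    return re.sub(r"[A-Za-z]+", swap, text)
```

For example, substitute("Whilst we utilize Tor, we commence.") yields
"While we use Tor, we begin." A plain word-for-word swap like this will
not hide sentence structure or punctuation habits, so it could only be
one part of the anonymization process.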

Please contact pagea (at) allegheny.edu if you would like to be
notified once the source code is available. For a (very rough) idea of
what I hope to accomplish with this project, please see a draft of my
research proposal here:
https://pdf.yt/d/HsAyoE0VGCYsnVxU

I look forward to reading your comments.

Cheers,
Alden Page

