[tor-dev] [GSoC] BridgeDB Twitter Distributor report

13 Jul 2014

Hi all,

preferring existing code over shiny code and being mad late, I

  * (re)wrote a simple but working churn control mechanism[1], which uses

  * a general persistable storage system:

    * in particular, the bot now has a central storage controller
which takes care of storage handlers which, in turn, may be of
different varieties. Each variety knows how to handle its own kind of
storage containers (simple objects with data as attributes). Some of
them may be persistable, others necessarily ephemeral (wipe data on
close);
    * right now we only make use of simple
pickle-dump-to-file-and-gzip persistable storage; we use it for churn
control and for challenge responses; everything is self-contained so
to speak;
    * we hash the user twitter handles (unique usernames / screen
names) and round up bridges-last-given-at timestamps;
    * we handle bot shutdown by catching the appropriate signal (then
properly closing down the twitter stream listener and asking the
storage controller to close down the handlers);
    * we use the storage system in the core bot via a general "bot
state" object (which is itself oblivious to how storage is actually
implemented);

  * wrote a simple and generic challenge-response system[2] (which
makes use of the persistent storage);
    * instead of doing something very smart, we use a general CR
system which takes care of particular challenge-responses; the general
CR is usable as-is; the particular CR objects can be easily subclassed
(and that's what we do now);
    * the current mock/bogus CR system that is in place (for testing
etc.) is a naive text-based question-answer CR, which asks the users
to add the number of characters in their twitter username to a given
verbal/English-word number;
    * I should now finish up with ``BridgeRequest``s, which are the
proper way to handle bridge requests in the bot while doing
challenge-responses (the current interaction between the core bot and
the CR system will lead / has been leading nowhere);
    * also, it's an open question whether the cached (and hashed)
answers to CRs should be persisted to storage at all (in case the bot
gets shut down while some challenges are pending).
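
For illustration, the churn control and storage points above could be
sketched roughly as follows. This is not the twidibot code: the names
(ChurnStore, hash_handle, round_up), the salt, the SHA-256 choice, the
lowercasing of handles and the one-hour rounding granularity are all
assumptions made for the example. It only shows the general shape:
hashed handles as keys, rounded-up timestamps as values, and a
pickle-dump-to-file-and-gzip step on close.

```python
import gzip
import hashlib
import math
import os
import pickle
import time

ROUND_TO = 3600  # hypothetical granularity (seconds) for stored timestamps


def hash_handle(handle, salt=b"hypothetical-salt"):
    """Return a hex digest that stands in for the Twitter screen name."""
    # Assumption: handles are compared case-insensitively, without "@".
    normalized = handle.strip().lstrip("@").lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


def round_up(timestamp, granularity=ROUND_TO):
    """Round a Unix timestamp up to the next multiple of `granularity`."""
    return int(math.ceil(timestamp / granularity)) * granularity


class ChurnStore(object):
    """Hypothetical persistable handler: hashed handle -> last-given time."""

    def __init__(self, path, min_interval=24 * 3600):
        self.path = path
        self.min_interval = min_interval
        self.last_given = {}
        if os.path.exists(path):
            # Load whatever an earlier run persisted on close.
            with gzip.open(path, "rb") as f:
                self.last_given = pickle.load(f)

    def may_receive(self, handle, now=None):
        """True if this user has not been given bridges too recently."""
        now = time.time() if now is None else now
        last = self.last_given.get(hash_handle(handle))
        return last is None or now - last >= self.min_interval

    def record(self, handle, now=None):
        """Remember (coarsely) when this user was last given bridges."""
        now = time.time() if now is None else now
        self.last_given[hash_handle(handle)] = round_up(now)

    def close(self):
        """Persist the state: pickle, dump to file, gzip."""
        with gzip.open(self.path, "wb") as f:
            pickle.dump(self.last_given, f)
```

An ephemeral handler would be the same thing with a close() that simply
drops the dictionary instead of writing it out.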

I've been unable to find[3] or to come up with a concept of a
user-friendly *text-based* CR that would stand against any kind of
thief who is able to create lots of Twitter users and to write
twenty-line scripts solving any text-based challenges/questions
presented. Either it will be a difficult problem that is more easily
solved by a computer than by a human (hence infeasible
general-UX-wise), or it will be so "symmetrical" that one
only has to view the source (if even that) to come up with a script
trivially solving the challenge presented.
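
To make the "symmetry" concrete, here is a rough sketch of a naive
question-answer CR of the kind described earlier (username length plus
a number spelled out in English), together with the attacker's side.
None of this is taken from the twidibot source: the question wording,
the word list, the function names and the use of SHA-256 for the cached
answer are assumptions for the example.

```python
import hashlib
import random

# Hypothetical word list; the bot's actual wording is not shown here.
WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen",
         "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
         "nineteen", "twenty"]


def _digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_challenge(handle, rng=random):
    """Return (question, hash of the expected answer) for a user."""
    n = rng.randrange(len(WORDS))
    question = ("Add the number of characters in your username to %s."
                % WORDS[n])
    # Only the hash of the expected answer is kept around.
    return question, _digest(str(len(handle) + n))


def check_answer(answer, expected_hash):
    """Compare the hash of the user's reply with the cached hash."""
    return _digest(answer.strip()) == expected_hash


def solve(handle, question):
    """The attacker's side: parse the last word and do the arithmetic."""
    word = question.rstrip(".").split()[-1]
    return str(len(handle) + WORDS.index(word))
```

The solver is shorter than the challenge generator, and anyone who can
read the question format can write it without ever seeing the bot's
code.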

Hence I've been slowly moving on with the
captcha-over-twitter-direct-messages idea, which is not pretty, but
which would at least ensure that we don't give up bridges more easily
than in, say, the current IPDistributor.

[1]: https://github.com/wfn/twidibot/compare/master...churn_rewrite
[2]: https://github.com/wfn/twidibot/compare/churn_rewrite...simple_cr2

[3]: it's quite hard to find anything of use in the "chatroom problem"
/ "text-based challenge response" area. Basically, it would be great
to have a "reverse Turing test"[4] that is not about captcha/OCR. I
realize this is in itself a very ambitious topic.
[4]: some context on early CAPTCHAs / precursors (have been trying to
familiarize myself with the general area),
http://www2.parc.com/istl/projects/captcha/docs/pessimalprint.pdf

--

Kostas.

0x0e5dce45 @ pgp.mit.edu

Kostas Jakeliunas