Hi all,
preferring existing code over shiny code and being mad late, I
* (re)wrote a simple but working churn control mechanism[1], which uses a general persistable storage system:
  * in particular, the bot now has a central storage controller which takes care of storage handlers, which in turn may be of different varieties. Each variety knows how to handle its own kind of storage container (a simple object with data as attributes). Some handlers may be persistable, others necessarily ephemeral (they wipe their data on close);
  * right now we only make use of simple pickle-dump-to-file-and-gzip persistable storage; we use it for churn control and for challenge responses; everything is self-contained, so to speak;
  * we hash the users' Twitter handles (unique usernames / screen names) and round up the bridges-last-given-at timestamps;
  * we handle bot shutdown by catching the appropriate signal (then properly closing down the Twitter stream listener and asking the storage controller to close down the handlers);
  * we use the storage system in the core bot via a general "bot state" object (which is itself oblivious to how storage is actually implemented);
* wrote a simple and generic challenge-response system[2] (which makes use of the persistent storage):
  * instead of doing something very smart, we use a general CR system which takes care of particular challenge-responses; the general CR is usable as-is; the particular CR objects can be easily subclassed (and that's what we do now);
  * the current mock/bogus CR system that is in place (for testing etc.) is a naive text-based question-answer CR, which asks users to add the number of characters in their Twitter username to a number given as an English word;
  * I should now finish up with ``BridgeRequest``s, which are the proper way to handle bridge requests in the bot while doing challenge-responses (the current interaction between the core bot and the CR system will lead / has been leading nowhere);
  * also, there's a question to be had whether the cached (and hashed) answers to CRs should be persisted to storage in the first place (in case the bot gets shut down while some challenges are pending).
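To make the storage bullets above a bit more concrete, here is a minimal sketch of the idea. It is not the actual twidibot code: the names (`hash_handle`, `round_up`, `ChurnContainer`, `GzipPickleHandler`), the SHA-256 hash, the salt, and the one-hour granularity are all assumptions made for illustration. It shows a container with plain attributes, a handler that pickles it into a gzipped file on close, and the handle hashing / timestamp rounding applied before anything is stored.

```python
import gzip
import hashlib
import math
import os
import pickle
import tempfile


def hash_handle(handle, salt=b"hypothetical-salt"):
    """Return a hex digest to store instead of the raw Twitter handle.

    The hash function and the salt are assumptions of this sketch.
    """
    normalized = handle.strip().lstrip("@").lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


def round_up(timestamp, granularity=3600):
    """Round a Unix timestamp up to the next multiple of `granularity` seconds."""
    return int(math.ceil(timestamp / float(granularity))) * granularity


class ChurnContainer(object):
    """Simple storage container: plain data kept as attributes."""

    def __init__(self):
        self.last_given_at = {}  # hashed handle -> rounded-up timestamp


class GzipPickleHandler(object):
    """Hypothetical persistable handler: pickle to a gzipped file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return a container, restored from disk if a dump exists."""
        container = ChurnContainer()
        if os.path.exists(self.path):
            with gzip.open(self.path, "rb") as f:
                # Only the attribute dict is stored, not the object itself.
                vars(container).update(pickle.load(f))
        return container

    def close(self, container):
        """Persist the container's attributes; called on bot shutdown."""
        with gzip.open(self.path, "wb") as f:
            pickle.dump(dict(vars(container)), f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "churn.pkl.gz")
    handler = GzipPickleHandler(path)
    state = handler.load()
    state.last_given_at[hash_handle("@SomeUser")] = round_up(1000)
    handler.close(state)
    print(GzipPickleHandler(path).load().last_given_at)
```

An ephemeral handler would expose the same `load` / `close` pair but simply drop the data on `close`, which is what lets the "bot state" object stay oblivious to how storage is implemented.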
I've been unable to find[3] or to come up with a concept of a user-friendly *text-based* CR that would stand up against a thief who is able to create lots of Twitter users and to write twenty-line scripts solving whatever text-based challenges/questions are presented. Either it will be a difficult problem that is more easily solved by a computer than by a human (hence infeasible UX-wise), or it will be so "symmetrical" that one only has to view the source (if even that) to come up with a script trivially solving the challenge presented.
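To illustrate how little effort such a script takes, here is a sketch of a solver for the mock username-length challenge described above. The exact challenge wording (`make_challenge`) and the range of number words are assumptions of this example, not what the bot actually sends.

```python
import re

# English number words the (hypothetical) challenge may use.
WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]
VALUES = dict((word, value) for value, word in enumerate(WORDS))


def make_challenge(word):
    """Build a challenge text; the wording is invented for this sketch."""
    return ("Add the number of characters in your Twitter username to %s."
            % word)


def solve(challenge_text, username):
    """Find the first number word in the text and add len(username) to it."""
    for token in re.findall(r"[a-z]+", challenge_text.lower()):
        if token in VALUES:
            return len(username) + VALUES[token]
    raise ValueError("no number word found in challenge")


if __name__ == "__main__":
    print(solve(make_challenge("seven"), "wfn"))
```

Anyone who can see the challenge text once can write this, and it then works unchanged for every freshly created account.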
Hence I've been slowly moving on with the captcha-over-twitter-direct-messages idea, which is not pretty, but which would at least ensure that we don't give up bridges more easily than in, say, the current IPDistributor.
[1]: https://github.com/wfn/twidibot/compare/master...churn_rewrite
[2]: https://github.com/wfn/twidibot/compare/churn_rewrite...simple_cr2
[3]: it's quite hard to find anything of use in the "chatroom problem" / "text-based challenge response" area. Basically, it would be great to have a "reverse Turing test"[4] that is not about captcha/OCR. I realize this is in itself a very ambitious topic.
[4]: some context on early CAPTCHAs / precursors (have been trying to familiarize myself with the general area), http://www2.parc.com/istl/projects/captcha/docs/pessimalprint.pdf
--
Kostas.
0x0e5dce45 @ pgp.mit.edu