Hi Kostas.
I've taken the liberty of hacking on your code for the GSoC project I'm working on: Revamp GetTor. One of the ideas I had in mind was to give out links to the bundles via Twitter. Thankfully, your code made things a lot easier for me! :-P I've just made some changes to see if I could accomplish what I was looking for, nothing big. One of the things I did was to add a class Messages in twitter_bot.py [0] to handle messages in various languages with i18n. I hope you find it useful in case you have considered translated messages too.
That said, I'd like to discuss some issues about creating a Twitter bot which I think affect the projects we're working on.
1) According to Twitter's "Automation rules and best practices" guide [1], in the section on "Automated following and un-following", if I understand correctly, applications using twidibot will be suspended, as the current behavior is to automatically follow and un-follow users. Similar issues are mentioned in "Following rules and best practices" [2] and "Twitter rules" [3]. Are you aware of this? If so, what's the plan?
2) I'm not very familiar with bridges, but from what I understand, one of the reasons to use obfuscated bridges is to hide the fact that you're using Tor. With the current behaviour of twidibot (both for the BridgeDB and GetTor Twitter distributors), a malicious user could follow the Twitter accounts and learn which users the bot started following and then un-followed, thus identifying all users that asked for bridges/bundles. If you're using an account with your real name, this could get you in trouble in places where using software to avoid censorship is prohibited. I think users should be warned about this. Have you considered this case, or am I just too paranoid?
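To make the concern in 2) concrete, here is a rough Python sketch of what such an observer could do. It assumes the observer can periodically obtain the bot's list of followed accounts; how those snapshots are collected is left out, and the function name and sample data are made up for illustration.

```python
def infer_requesters(snapshots):
    """Return accounts that the bot followed at some point and later un-followed.

    `snapshots` is an ordered sequence of sets of screen names, each one a
    (hypothetical) capture of the bot's "following" list at some moment.
    """
    seen = set()
    requesters = set()
    for following in snapshots:
        current = set(following)
        # Anything followed earlier but missing now was un-followed.
        requesters |= seen - current
        seen |= current
    return requesters


# Made-up sample data: "bob" is followed and then dropped again.
snapshots = [
    {"alice"},
    {"alice", "bob"},
    {"alice"},
    {"alice", "carol"},
]
print(sorted(infer_requesters(snapshots)))
```

Nothing here needs special access, only repeated looks at public follow lists, which is why I think a warning to users would be appropriate.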
I'll be glad to hear what you and others think about 1) and 2).
[0] https://github.com/ileiva/twidibot/blob/master/twidibot/twitter_bot.py
[1] https://support.twitter.com/articles/76915#
[2] http://support.twitter.com/articles/68916-following-rules-and-best-practices
[3] http://support.twitter.com/articles/18311-the-twitter-rules
2014-07-13 14:00 GMT-04:00 Kostas Jakeliunas kostas@jakeliunas.com:
Hi all,
preferring existing code over shiny code and being mad late, I
(re)wrote a simple but working churn control mechanism[1], which uses
a general persistable storage system:
- in particular, the bot now has a central storage controller which takes care of storage handlers which, in turn, may be of different varieties. Each variety knows how to handle its own kind of storage containers (simple objects with data as attributes). Some of them may be persistable, others necessarily ephemeral (wipe data on close);
  * right now we only make use of simple pickle-dump-to-file-and-gzip persistable storage; we use it for churn control and for challenge responses; everything is self-contained so to speak;
  * we hash the user twitter handles (unique usernames / screen names) and round up bridges-last-given-at timestamps;
  * we handle bot shutdown by catching the appropriate signal (then properly closing down the twitter stream listener and asking the storage controller to close down the handlers);
  * we use the storage system in the core bot via a general "bot state" object (which is itself oblivious to how storage is actually implemented);
- wrote a simple and generic challenge-response system[2] (which makes use of the persistent storage);
  * instead of doing something very smart, we use a general CR system which takes care of particular challenge-responses; the general CR is usable as-is; the particular CR objects can be easily subclassed (and that's what we do now);
  * the current mock/bogus CR system that is in place (for testing etc.) is a naive text-based question-answer CR, which asks the users to add the number of characters in their twitter username to a given verbal/English-word number;
  * I should now finish up with ``BridgeRequest``s, which are the proper way to handle bridge requests in the bot while doing challenge-responses (the current interaction between the core bot and the CR system will lead / has been leading nowhere);
  * also, there's a question to be had whether the cached (and hashed) answers to CRs should be persisted to storage in the first place (in case the bot gets shut down while some challenges are pending).
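For anyone trying to picture the churn-control storage from the first item above, a minimal Python sketch follows. It is not the actual twidibot code: the names `ChurnStore`, `hash_handle` and `round_up`, the salt, the choice of SHA-256, and the one-hour rounding granularity are all assumptions made for illustration. Only the general idea (hashed handles, rounded-up timestamps, pickle dumped to a gzipped file) comes from the description above.

```python
import gzip
import hashlib
import math
import os
import pickle
import tempfile
import time

ROUND_TO_SECONDS = 3600  # assumed granularity; the real value may differ


def hash_handle(handle, salt=b"example-salt"):
    """Hash a Twitter handle so the raw screen name is never stored."""
    normalized = handle.lstrip("@").lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


def round_up(timestamp, granularity=ROUND_TO_SECONDS):
    """Round a Unix timestamp up to the next multiple of `granularity`."""
    return int(math.ceil(timestamp / granularity) * granularity)


class ChurnStore:
    """Hypothetical persistable container: hashed handle -> rounded timestamp."""

    def __init__(self, path):
        self.path = path
        self.last_given = {}
        if os.path.exists(path):
            # Only load files the bot wrote itself; pickle is unsafe on
            # untrusted input.
            with gzip.open(path, "rb") as f:
                self.last_given = pickle.load(f)

    def record(self, handle, now=None):
        """Remember (coarsely) when this user was last given bridges."""
        now = time.time() if now is None else now
        self.last_given[hash_handle(handle)] = round_up(now)

    def may_request(self, handle, min_interval, now=None):
        """True if the user has no record or the interval has elapsed."""
        now = time.time() if now is None else now
        last = self.last_given.get(hash_handle(handle))
        return last is None or now - last >= min_interval

    def close(self):
        """Persist to a gzipped pickle; meant to be called on shutdown."""
        with gzip.open(self.path, "wb") as f:
            pickle.dump(self.last_given, f)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "churn.pkl.gz")
    store = ChurnStore(demo_path)
    store.record("@some_user")
    store.close()
    print(ChurnStore(demo_path).may_request("some_user", 24 * 3600))
```

In the bot itself, `close()` would be the kind of thing the storage controller calls from the shutdown signal handler.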
I've been unable to find[3] or to come up with a concept of a user-friendly *text-based* CR that would stand up against any kind of thief who is able to create lots of Twitter users and to write twenty-line scripts solving any text-based challenges/questions presented. Either it will be a difficult problem that is more easily solved by a computer than by a human (hence infeasible general-UX-wise), or it will be so "symmetrical" in the sense that one only has to view the source (if even that) to come up with a script trivially solving the challenge presented.
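To illustrate the "symmetry" problem, here is a sketch in Python of a question-answer challenge in the spirit of the mock CR described above, together with the kind of short script that defeats it. The class names, the question wording and the word list are assumptions for illustration, not the code that is in the repository.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


class Challenge:
    """Hypothetical base class; particular challenges subclass it."""

    def question(self):
        raise NotImplementedError

    def check(self, response):
        raise NotImplementedError


class UsernameLengthChallenge(Challenge):
    """Ask the user to add a spelled-out number to their username length."""

    def __init__(self, username, rng=random):
        self.username = username
        self.addend = rng.randrange(len(NUMBER_WORDS))

    def question(self):
        return ("Add %s to the number of characters in your Twitter "
                "username." % NUMBER_WORDS[self.addend])

    def check(self, response):
        try:
            return int(response.strip()) == len(self.username) + self.addend
        except ValueError:
            return False


def solve(question, username):
    """Attacker's side: find the number word, add the username length."""
    for word in question.lower().replace(".", " ").split():
        if word in NUMBER_WORDS:
            return str(NUMBER_WORDS.index(word) + len(username))
    raise ValueError("no number word found in question")


if __name__ == "__main__":
    challenge = UsernameLengthChallenge("some_user")
    print(challenge.question())
    print(challenge.check(solve(challenge.question(), "some_user")))
```

The solver is shorter than the challenge it breaks, and rewording the question only buys time until the script's word list is updated.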
Hence I've been slowly moving on with the captcha-over-twitter-direct-messages idea, which is not pretty, but which would at least ensure that we don't give up bridges more easily than in, say, the current IPDistributor.
[3] it's quite hard to find anything of use in the "chatroom problem" / "text-based challenge response" area. Basically, it would be great to have a "reverse Turing test"[4] that is not about captcha/OCR. I realize this is in itself a very ambitious topic.
[4] some context on early CAPTCHAs / precursors (have been trying to familiarize myself with the general area), http://www2.parc.com/istl/projects/captcha/docs/pessimalprint.pdf
--
Kostas.
0x0e5dce45 @ pgp.mit.edu
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev