Re: [tor-dev] [GSoC] BridgeDB Twitter Distributor report

11 Aug 2014

Hi Kostas,

I've taken the liberty of hacking on your code for the GSoC project I'm
working on: Revamp GetTor. One of the ideas was to hand out links to the
bundles via Twitter. Thankfully, your code made things a lot easier for me!
:-P I've only made a few small changes to see whether I could accomplish
what I was looking for. One of the things I did was to add a Messages class
to twitter_bot.py [0] that handles messages in various languages with i18n.
I hope you find it useful in case you have considered translated messages
too.
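
For anyone who hasn't looked at [0], here is a rough sketch of the idea
using Python's standard gettext module. It is not the actual class from
twitter_bot.py; the domain name "twidibot", the "locale" directory and the
get() method are assumptions made for illustration only.

```python
import gettext


class Messages(object):
    """Hypothetical sketch of a per-language message lookup."""

    def __init__(self, lang="en", domain="twidibot", localedir="locale"):
        # fallback=True makes gettext return a NullTranslations object when
        # no catalog exists for `lang`, so msgids pass through unchanged.
        self._translation = gettext.translation(
            domain, localedir, languages=[lang], fallback=True)

    def get(self, msgid):
        """Return the translated message, or msgid if no translation exists."""
        return self._translation.gettext(msgid)


if __name__ == "__main__":
    msgs = Messages(lang="es")
    print(msgs.get("Here are the links you asked for:"))
```

With no compiled catalog installed for the requested language, the English
string comes back as-is, so the bot keeps working while translations are
still missing.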

That said, I'd like to discuss some issues with creating a Twitter bot
that I think affect both of the projects we're working on.

1) According to Twitter's "Automation rules and best practices" guide [1],
in the section on "Automated following and un-following", if I understand
it correctly, applications using twidibot will be suspended, since the
current behavior is to automatically follow and un-follow users. Similar
issues are mentioned in "Following rules and best practices" [2] and
"Twitter rules" [3]. Are you aware of this? If so, what's the plan?

2) I'm not very familiar with bridges, but from what I understand, one of
the reasons to use obfuscated bridges is to hide the fact that you're using
Tor. With the current behaviour of twidibot (for both the BridgeDB and
GetTor Twitter distributors), a malicious user could follow the bots'
Twitter accounts, learn which users each bot started following and then
un-followed, and thus identify all users who asked for bridges/bundles. If
you're using an account with your real name, this could get you in trouble
in places where using software to circumvent censorship is prohibited. I
think users should be warned about this. Have you considered this case, or
am I just being too paranoid?

I'll be glad to hear what you and others think about 1) and 2).

[0] https://github.com/ileiva/twidibot/blob/master/twidibot/twitter_bot.py
[1] https://support.twitter.com/articles/76915#
[2] http://support.twitter.com/articles/68916-following-rules-and-best-practices
[3] http://support.twitter.com/articles/18311-the-twitter-rules

2014-07-13 14:00 GMT-04:00 Kostas Jakeliunas <kostas@jakeliunas.com>:
...
Hi all,

preferring existing code over shiny code and being mad late, I

* (re)wrote a simple but working churn control mechanism[1], which uses
  a general persistable storage system:
    * in particular, the bot now has a central storage controller
      which takes care of storage handlers which, in turn, may be of
      different varieties. Each variety knows how to handle its own kind of
      storage containers (simple objects with data as attributes). Some of
      them may be persistable, others necessarily ephemeral (wipe data on
      close);
    * right now we only make use of simple
      pickle-dump-to-file-and-gzip persistable storage; we use it for churn
      control and for challenge responses; everything is self-contained so
      to speak;
    * we hash the user twitter handles (unique usernames / screen
      names) and round up bridges-last-given-at timestamps;
    * we handle bot shutdown by catching the appropriate signal (then
      properly closing down the twitter stream listener and asking the
      storage controller to close down the handlers);
    * we use the storage system in the core bot via a general "bot
      state" object (which is itself oblivious to how storage is actually
      implemented);
* wrote a simple and generic challenge-response system[2] (which
  makes use of the persistent storage);
    * instead of doing something very smart, we use a general CR
      system which takes care of particular challenge-responses; the general
      CR is usable as-is; the particular CR objects can be easily subclassed
      (and that's what we do now);
    * the current mock/bogus CR system that is in place (for testing
      etc.) is a naive text-based question-answer CR, which asks the users
      to add the number of characters in their twitter username to a given
      verbal/English-word number;
    * I should now finish up with ``BridgeRequest``s, which are the
      proper way to handle bridge requests in the bot while doing
      challenge-responses (the current interaction between the core bot and
      the CR system will lead / has been leading nowhere);
    * also, there's a question to be had whether the cached (and
      hashed) answers to CRs should be persisted to storage (if bot gets
      shutdown while some challenges are pending) in the first place.
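
To make the churn-control points above concrete, here is a minimal sketch
of hashing handles, rounding up last-given timestamps, and persisting the
result with pickle and gzip. It is not the code from the churn_rewrite
branch: the names ChurnStore, hash_handle and round_up, the salt, the
one-hour rounding and the 24-hour window are all assumptions chosen for
illustration.

```python
import gzip
import hashlib
import math
import os
import pickle
import time

SALT = b"change-me"        # assumption: a per-deployment secret salt
ROUND_TO = 3600            # assumption: round timestamps up to the hour
MIN_INTERVAL = 24 * 3600   # assumption: one bridge request per day


def hash_handle(screen_name, salt=SALT):
    """Return a salted SHA-256 hex digest of a normalized Twitter handle."""
    name = screen_name.lstrip("@").lower().encode("utf-8")
    return hashlib.sha256(salt + name).hexdigest()


def round_up(timestamp, granularity=ROUND_TO):
    """Round a Unix timestamp up to the next multiple of `granularity`."""
    return int(math.ceil(timestamp / float(granularity))) * granularity


class ChurnStore(object):
    """Hypothetical store mapping hashed handles to rounded timestamps."""

    def __init__(self, path):
        self.path = path
        self.last_given = {}
        if os.path.exists(path):
            # Only unpickle files the bot wrote itself; pickle is not safe
            # for untrusted input.
            with gzip.open(path, "rb") as f:
                self.last_given = pickle.load(f)

    def may_serve(self, screen_name, now=None):
        """True if this user has not been given bridges within the window."""
        now = time.time() if now is None else now
        last = self.last_given.get(hash_handle(screen_name))
        return last is None or now - last >= MIN_INTERVAL

    def record(self, screen_name, now=None):
        """Remember (coarsely) when bridges were last given to this user."""
        now = time.time() if now is None else now
        self.last_given[hash_handle(screen_name)] = round_up(now)

    def close(self):
        """Persist state; meant to be called from the shutdown handler."""
        with gzip.open(self.path, "wb") as f:
            pickle.dump(self.last_given, f)
```

Neither the raw handle nor the exact request time ever reaches the disk in
this sketch, which is the point of hashing and rounding in the first place.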
I've been unable to find[3] or to come up with a concept of a
user-friendly *text-based* CR that would stand up against any kind of
thief who is able to create lots of Twitter users and to write
twenty-line scripts solving any text-based challenges/questions
presented. Either it will have to be a difficult problem that is
more easily solved by a computer than by a human (hence infeasible
general-UX-wise), or it will be so "symmetrical" in the sense that one
only has to view the source (if even that) to come up with a script
trivially solving the challenge presented.
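
As an illustration of that "symmetry", here is a sketch of the naive
add-the-username-length challenge next to the kind of script that defeats
it. The question wording, the word list and the function names are made up
for this example and are not taken from the simple_cr2 branch.

```python
import random
import re

# Assumption: the challenge only uses the number words zero through ten.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]


def make_challenge(screen_name, rng=random):
    """Return a (question, expected_answer) pair for the given user."""
    n = rng.randrange(len(WORDS))
    question = ("Add the number of characters in your username to %s."
                % WORDS[n])
    return question, len(screen_name) + n


def solve(question, screen_name):
    """What an attacker's script could look like: find the number word."""
    for word in re.findall(r"[a-z]+", question.lower()):
        if word in WORDS:
            return len(screen_name) + WORDS.index(word)
    return None


if __name__ == "__main__":
    question, expected = make_challenge("some_user")
    print(question)
    print(solve(question, "some_user") == expected)
```

The solver is shorter than the challenge generator, and an attacker with
many accounts can run it once per account at no extra cost.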
Hence I've been slowly moving on with the
captcha-over-twitter-direct-messages idea, which is not pretty, but
which would at least ensure that we don't give up bridges more easily
than in, say, the current IPDistributor.
[1]: https://github.com/wfn/twidibot/compare/master...churn_rewrite
[2]: https://github.com/wfn/twidibot/compare/churn_rewrite...simple_cr2
[3] it's quite hard to find anything of use in the "chatroom problem"
/ "text-based challenge response" area. Basically, it would be great
to have a "reverse Turing test"[4] that is not about captcha/OCR. I
realize this is in itself a very ambitious topic.
[4]: some context on early CAPTCHAs / precursors (have been trying to
familiarize myself with the general area),
http://www2.parc.com/istl/projects/captcha/docs/pessimalprint.pdf
--
Kostas.
0x0e5dce45 @ pgp.mit.edu
-- 
israel

Israel Leiva