[tor-dev] New BridgeDB Distributor (was: Re: New BridgeDB Distributor (Twitter/SocialDistributor intersections, etc.))

Kostas Jakeliunas kostas at jakeliunas.com
Tue Apr 22 14:05:12 UTC 2014

With isis' and sysrqb's permission, moving the new BridgeDB
Distributor (and maybe general bridgedb distributor architecture
discussion) thread onto tor-dev.

On 04/15/2014 10:30 PM, Kostas Jakeliunas wrote:
> On 03/29/2014 10:08 AM, Matthew Finkel wrote:
>> (I took the liberty of making this readable again :))
>> On Fri, Mar 28, 2014 at 08:00:17PM +0200, Kostas Jakeliunas
>> wrote:
>>> isis wrote:
>>>> Kostas Jakeliunas transcribed 7.9K bytes:
>>>>> Hey isis,
>>>>> wfn here. [...]
>>>> Hi!
>> Howdy!
>> I'm super excited to hear you're interested in working on this! 
>> [...]
>>>>> [...] a couple of questions (more like inconcrete musings)
>>>>> [...]:
>>>>> Would you personally think that incorporating "some" ideas
>>>>> from #7520[1] ("Design and implement a social distributor
>>>>> for BridgeDB") would be within the scope of a ~three+ month
>>>>> project? The way I see it, if a twitter (or, say, xmpp+otr
>>>>> as mentioned by you/others on IRC) distributor were to be 
>>>>> planned, it would either need to
>>>>> - incorporate some form of churn rate control / Sybil
>>>>> attack prevention, via e.g. recaptcha (I see that twitter
>>>>> direct (=personal) messages can include images; they'll
>>>>> probably be served by one of twitter media CDNs (would need
>>>>> to look things up), but it's probably safe to assume that
>>>>> as long as twitter itself is not blocked, those CDNs won't
>>>>> be, either);
>>>> Yes, this stuff is already built, and wouldn't be too hard
>>>> to incorporate. However, as I'm sure you already understand,
>>>> there is no Proof of Work system which actually works for
>>>> users while keeping adversaries out.
>>> For sure, we always have to keep this in mind. Hopefully
>>> there's a compromise that kinda-works, and eventually, given
>>> some more metrics/diagnostic info intersected with OONI
>>> hopefully being able to say which bridges don't work from which
>>> countries, it'll be possible to actually carry out tests in a
>>> kind-of-scientific/not-blind-guessing way..
>> At this point I just assume our adversary will always have more 
>> resources than us no matter which mechanism we use. More people,
>> more compute power/time, more money. At this point I think we
>> only have two things that they don't. We have more bridges and
>> more love for people. Leveraging this is...not easy, however. :(
>> POW is useful in some cases, for example, to prevent an asshole
>> from crawling bridgedb so that they can add all bridges to a
>> blacklist. When dealing with state-level adversaries I agree with
>> isis that they're of little use.
> Agree.
>>>>> - or take an idea from the social distributor in #7520,
>>>>> namely/probably, implement some form of token system.
>>>> This is not very doable in 6 weeks. It also, sadly, requires
>>>> the DB backend work (which I'll be doing over the next three
>>>> months, but might take more time).
>>> Aha, understood, yes. So basically, ideally I'd write code that
>>> could *later on* be easily extendable in relevant ways. But no
>>> tokens for now.
>> Ideally this sounds like a good idea, however I'm not sure we (or
>> at least I) have a good handle on what bridgedb will look like in
>> 6-12 months. It's undergoing a lot of change right now. Don't
>> interpret this as saying this is a bad idea because the more
>> abstract and extensible you make this distributor the more useful
>> it will be. I'm just a little worried about writing something for
>> the future. Perhaps there's a good way to design and plan for
>> this, though.
> Yeah, understood. As I understand it, isis is changing some things
> in bridgedb (bridgedb.Distributor, etc) right now / these days.
> For now, the idea is to have a thing that works that is more or
> less completely decoupled from the bridgedb codebase. If we do this
> right, it will hopefully be relatively easy to then integrate it in
> a way that will make sense at that point in time (e.g. as part of 
> bridgedb.Distributor, *or* as a client to a core RESTful 
> distributor/api/service that gives bridges to other 'third-party' 
> distributors (see below.))
>>>>> It might be possible to have some simplistic token system
>>>>> with pre-chosen seed nodes, etc. Of course, security and
>>>>> privacy implications ahoy - first and foremost, this would
>>>>> result in more than zero places/people knowing the entire
>>>>> social graph, unless your and other people's ideas (the
>>>>> whole Pandora box of; I should attempt an honest read of
>>>>> rBridge, et al.; have only skimmed as of now) re: oblivious
>>>>> transfer, etc. were incorporated. Here it becomes quite
>>>>> difficult to define short-ish term deliverables of course.
>>>>> I know that you did quite a lot of research on the
>>>>> private/secure social distributor idea.
>>>> Really, you don't want to get into this stuff. Or do, but
>>>> don't do it for GSoC. I've spent the past year painfully
>>>> writing proofs to correct the errors in that paper, and
>>>> discovered some major problems for anonymity in old 
>>>> "tried-and-true" cryptographic primitives in the process.
>>>> This is a HUGE project.
>>> Sounds insanely intense, in both a good and a bad way! It's
>>> definitely interesting, but this is a whole other level for
>>> sure. Ok. (Btw, interesting re: proofs for crypto/whatever
>>> concepts. Intense stuff.)
>> rBridge is definitely a large project. This is not to say your
>> help would not be appreciated [...] so I'm sure you can help,
>> however this is much larger than GSoC.
> For sure, OK! (and yes, it sounds like an interesting avenue / way
> of getting into zero knowledge madness etc etc. Will see how it
> goes..)
>>>>> But I wonder if it wouldn't make sense to attempt a
>>>>> simplistic token system, as well as (possibly) some
>>>>> hopefully-not-too-evil incarnation of recaptcha, to
>>>>> - have a system that would actually do its job; - have a
>>>>> system that would be easily extendible into cryptomadness
>>>>> later on / when the time comes; - have a framework for more
>>>>> complex social distributors later on. The latter is maybe
>>>>> because only doing a naive twitter distributor (via PMs) 
>>>>> project is not enough, in the sense of gsoc scope (or I'd
>>>>> like to attempt something a bit more ambitious anyway. But
>>>>> here of course dragon territory starts, with many slippery
>>>>> slopes.) This would hopefully be very useful for later
>>>>> work, with hopefully quite a bit of code reuse being
>>>>> possible (vs. twitter distributor that would use your
>>>>> twisted stuff + additional code which would likely not be
>>>>> beneficial for other distributors/projects.) 
>>>>> Architecturally, this token-based distributor would be a
>>>>> generic/parent class, with the particular twitter
>>>>> distributor inheriting from it. Hopefully this would all
>>>>> result in some clean, reusable code.
>>>>> Ideally, the token-distributor would work in a (generic)
>>>>> way that could be used in IRC, etc etc. I think a
>>>>> simplistic version of it would still prove useful (assuming
>>>>> we'd actually be OK with taking the responsibility of 
>>>>> maintaining and knowing the social graph, and so on.) This
>>>>> would be very nice indeed.
>>>> I know that you want this, and I know that you want it for
>>>> good reasons. But I refuse to have access to something which
>>>> might potentially get me shot in some countries.
>>> Yeah, I see the problem here. Ok.
>> +1
> Sorry for the kinda-delayed reply!
>>> Cool.
>>> So right now I'm thinking just doing a simplistic 
>>> twitter-direct-message-based bot. I do believe some churn 
>>> control / semi-working-PoW thing is needed. Reusing (large)
>>> parts of bridgedb's recaptcha makes sense to me.
>> I don't think this is a crazy idea.
> I wrote a simplistic twitter bot that pretends to give bridges to
> people who ask:
> https://github.com/wfn/twidibot
> (tweepy (a python module) is used to interact with the twitter
> API. Twitter has two main APIs: 'streaming' and the 'RESTful' API.
> The streaming API just gives you event data about things that
> you're interested in (so e.g. 'someone started following me',
> 'someone sent me a message.') The restful twitter api is for
> actually doing things (like sending messages) and getting event
> info on a per-query basis (we don't need the latter.))
> As of now, the bot is running under this account:
> https://twitter.com/wfntestacct
> Try following it; it should send you a direct message, to which you
> can reply with e.g.
> "get bridges" or "get me some bridges nao!" or "get obfs3
> scramblesuit fte bridges"
> The bridge data returned is of course beyond-bogus (code is really
> just a placeholder), but what is I think good is that I wrote the
> thing in a way that can be easily extended - the
> bridge-getting-process is abstracted away.
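A minimal sketch of how requests like the ones quoted above could be recognised, separate from both the Twitter plumbing and the bridge source. This is not the code in twidibot; the function name `parse_bridge_request` and the `KNOWN_TRANSPORTS` tuple are hypothetical, and the transport names are simply the ones from the example messages.

```python
import re

# Hypothetical list of transports the bot understands. The names come from
# the example messages in this thread, not from BridgeDB itself.
KNOWN_TRANSPORTS = ("obfs3", "scramblesuit", "fte")


def parse_bridge_request(text):
    """Return the requested transports, or None if this is not a request.

    An empty list means "plain bridges, no particular transport".
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Assumption: any message containing both "get" and "bridges" is a request.
    if "get" not in words or "bridges" not in words:
        return None
    # Keep the order of KNOWN_TRANSPORTS so the result is deterministic.
    return [t for t in KNOWN_TRANSPORTS if t in words]
```

Keeping the parser free of any Twitter-specific types would let the same function serve another front-end (XMPP, IRC) unchanged.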
> The main bot stuff is at 
> https://github.com/wfn/twidibot/blob/master/twidibot/twitter_bot.py
>  The stub 'bridge-getter' is at 
> https://github.com/wfn/twidibot/blob/master/twidibot/bridge_getter.py
(take a look at the comments at the top maybe[3])
> Basically I wanted to see if there'd be problems with the twitter
> api, and so on. I should try to somehow benchmark the thing, to see
> if anything breaks (on e.g. twitter's end[4]) when there are many
> requests, etc. (the code itself is kind-of-not-ready for that, but
> it should be easy to fix this up; but I'm also worried about
> twitter limits and that sort of thing - stuff that in the end may
> be hard to control on our end; hence the PoC, etc.)
> I also wanted to have some skeleton code that will support future 
> abstraction/extension (i.e. hopefully it's not an ugly script.)

[3]: the comments (pretty much all there is there) make it sound as if
going the way they say we should go is the right way. The comments
should probably be interpreted as a flaky proposal/ideas at best!

[4]: By "problems on twitter's end" I of course meant twitter possibly
rate-limiting the bot, or even blacklisting it somehow due to high
direct message load (maybe lots of direct messages => anomalous
behaviour, but then again, probably not.)

>> It also seems like we're combining the email and http distributor
>> with this, not that it is bad.
> ..so the whole plan in terms of bridgedb abstraction / 
> bridge-getting-mechanism abstraction is not clear, I suppose.
> Here's what asn wrote on that google melange project page[2] (as a 
> comment to the proposal):
> asn April 4, 2014, 11:09 p.m.:
>> Don't mind me too much, but it would be great if we could have a 
>> simple RESTful HTTPS distributor for BridgeDB, before writing
>> exotic distributors like Twitter. Such a distributor would expose
>> a simple REST API that clients (like tor-launcher etc.) could use
>> to fetch CAPTCHAs/bridges.
>> A RESTful distributor seems easier and it would also give us a
>> fair idea of the different methods that a BridgeDB distributor
>> needs to implement. It would serve as a basic distributor that
>> can be used as a skeleton to build more complicated ones.
>> just 2 cents
> I then spoke with him briefly on irc. Basically, I'm up for writing
> a RESTful bridge api/distributor, too (or focusing on it instead);
> but maybe it makes sense to continue with the twitter bot (and have
> the bridge-getting-mechanism be abstract enough to be completely
> turned over / extended / changed easily, etc.), and to continue
> thinking about the RESTful bridge api. Focusing on a twitter bot
> for now will probably lead to clearer deliverables and results. But
> we can develop things while thinking about related things in
> parallel.
> I'm sure isis has some ideas here, too (e.g. how feasible it is to 
> implement a restful bridge distributor soon.) But I agree that it
> would be nice to have a core distributor that could be used by
> tor-launcher, and by other distributors (e.g. this twitter bot, as
> well as future distributors.)
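One way to picture the decoupling discussed here, as a sketch only: the front-end talks to an abstract bridge source, so a stub can later be swapped for a client of a RESTful distributor without touching the bot logic. All class and method names below (`BridgeGetter`, `StubBridgeGetter`, `DirectMessageFrontEnd`, `get_bridges`, `reply_to`) are hypothetical and are not BridgeDB's or twidibot's actual interfaces.

```python
from abc import ABC, abstractmethod


class BridgeGetter(ABC):
    """Hypothetical interface between a front-end and a bridge source."""

    @abstractmethod
    def get_bridges(self, client_id, transports=(), count=3):
        """Return up to `count` bridge lines for `client_id`."""


class StubBridgeGetter(BridgeGetter):
    """Placeholder that returns obviously fake bridge lines."""

    def get_bridges(self, client_id, transports=(), count=3):
        prefix = (transports[0] + " ") if transports else ""
        # 192.0.2.0/24 is reserved for documentation (RFC 5737).
        return ["%s192.0.2.%d:443" % (prefix, i + 1) for i in range(count)]


class DirectMessageFrontEnd:
    """Front-end logic only; it does not know where bridges come from."""

    def __init__(self, getter):
        self.getter = getter

    def reply_to(self, handle, transports=()):
        lines = self.getter.get_bridges(handle, transports)
        return "Here are your bridges:\n" + "\n".join(lines)
```

A later `RestBridgeGetter` (again hypothetical) could implement the same `get_bridges` method by querying a core distributor over HTTPS, which is the point of keeping the interface this small.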
>> I do wonder if there is a better rate-limiting mechanism than a
>> captcha that we can use. I suspect there is something, but it
>> won't be much better, in reality. But, also, I wonder if we want
>> to use a POW at all. Maybe only rate-limit a handle by time
>> period, similar to how we handle emails? I don't know what is
>> best. I think this can be decided in the coming months, though.
> Yeah, for now I'll probably flesh out a
> rate-limiting-by-time-period thingie - simplistic, but this is OK.
> There are some nuances, e.g. this would require the twitter bot to
> remember every user who was given bridges, at least for a short
> while; even if this would not be persisted / would only be in
> memory, it's a delicate thing I suppose, security and privacy wise.
> etc.
> But yeah, I'm not sure if a captcha PoW scheme is the way to go,
> either. We can continue thinking about it, and meanwhile have
> something simplistic.
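A rough sketch of rate limiting by time period that tries to soften the privacy concern raised above: state lives only in memory, entries are dropped once the period has passed, and handles are stored as keyed digests under a per-process random key rather than in the clear. The class name `TimeWindowLimiter` and the three-hour default are assumptions made for illustration, not BridgeDB behaviour.

```python
import hashlib
import hmac
import os
import time


class TimeWindowLimiter:
    """Allow each handle one request per `period` seconds (in memory only)."""

    def __init__(self, period=3 * 3600, clock=time.time):
        self.period = period          # arbitrary default, chosen for the sketch
        self.clock = clock            # injectable so the logic can be tested
        self._key = os.urandom(32)    # per-process secret, never persisted
        self._last_served = {}        # keyed digest of handle -> timestamp

    def _digest(self, handle):
        data = handle.lower().encode("utf-8")
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def _expire(self, now):
        # Forget everyone whose period has elapsed.
        stale = [k for k, t in self._last_served.items()
                 if now - t >= self.period]
        for k in stale:
            del self._last_served[k]

    def allow(self, handle):
        now = self.clock()
        self._expire(now)
        key = self._digest(handle)
        if key in self._last_served:
            return False
        self._last_served[key] = now
        return True
```

Hashing does not make the stored state harmless (anyone holding the key and a candidate handle can test membership), so this narrows the exposure rather than removing it.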
>>> Or would any of you prefer a different kind of distributor? IRC
>>> was mentioned (churn control more difficult), as well as
>>> XMPP+OTR (the latter would be more difficult to do for sure,
>>> lots of things to integrate.) Whatsapp, too. Twitter, of
>>> course, sounds the most easy/easily-deliverable. I can try to
>>> come up with future goals/tasks should it turn out to be too
>>> easy (ha! i'm sure i'm bound to stumble into unforeseen
>>> problems..)
>> I think you should have enough time to complete the twitter
>> distributor and, at least, start working on another one. Twitter
>> seems like a good place to start. Some people would like an
>> XMPP+OTR distributor, but that will be a large and time consuming
>> project. Maybe a chat protocol that is popular in AsiaPAC is a
>> good second choice.
> Yeah! Sounds good. Federated-chat-systems (like XMPP)-based
> distributor would sure be nice. WhatsApp distributor would surely
> be very useful, and would be pretty hard to censor, if only in the 
> legal-repercussion-sense (lots of collateral damage so to speak.)
> Let's think about this.
>>> Basically, if there are other ideas afloat around bridgedb that
>>> are doable / can be incorporated into this, let me know (maybe
>>> you've been wishing to do something not too difficult but
>>> simply do not have time?)
>> There are a few outstanding tickets that we really should 
>> fix/implement/do, so maybe look through them and see if any seem 
>> interesting to you?
>>> As far as the twitter bot idea is concerned, it's pretty much 
>>> straightforward, I guess. I can't think of a proper way to
>>> subdivide the twitter distributor into further hashrings; i.e.
>>> there'd probably need to be one single hashring for twitter,
>>> and that's it. Of course the twitter-handlespace is larger
>>> than, say, IPv4-space. I assume this is not a problem in and of
>>> itself (there's a hashring for IPv6 (though not yet quite
>>> functional), as I understand, etc.) But maybe there are some
>>> nasty nuances to be found.
>> We could split the hashring into n partitions and then choose a 
>> partition based on some property of the handle. I don't know if
>> there is an advantage to this, though. Probably not considering
>> it would be trivial and cheap to create a new handle that is
>> mapped to another partition. Let us know if you think of
>> something :)
> Haven't yet. :) for now, I think having a normal hashring, and
> having a mechanism of giving only very few bridges (that remain the
> same, unless new bridges are inserted in the neighbourhood) makes
> sense.
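The "normal hashring" idea can be sketched as follows, assuming a plain consistent-hashing ring: bridges and handles are hashed onto the same ring, and a handle always receives the next few bridges clockwise from its own position. This is an illustration under those assumptions, not BridgeDB's hashring implementation; `HashRing` and `_position` are hypothetical names.

```python
import bisect
import hashlib


def _position(data):
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, bridges=()):
        # Sorted list of (position, bridge) pairs.
        self._ring = sorted((_position(b), b) for b in bridges)

    def insert(self, bridge):
        bisect.insort(self._ring, (_position(bridge), bridge))

    def get(self, handle, count=2):
        """Return the `count` bridges that follow the handle's position."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._ring, (_position(handle), ""))
        n = min(count, len(self._ring))
        size = len(self._ring)
        # Wrap around the end of the list to close the ring.
        return [self._ring[(start + i) % size][1] for i in range(n)]
```

With this layout a handle's answer only changes when a new bridge lands inside the small arc it reads from, which matches the "remain the same, unless new bridges are inserted in the neighbourhood" behaviour described above.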
>>>>> Will do some thinking, finally have time now.
>>>>> [1]: https://trac.torproject.org/projects/tor/ticket/7520
>>>>> --
>>>>> Kostas / wfn
>>>>> 0x0e5dce45 @ pgp.mit.edu
>>>> -- ♥Ⓐ isis agora lovecruft
>> As an aside, I'm usually not this cynical. It was a long day
>> at work and it made me grumpy. But in any case, I'm really happy
>> you decided to look at making one of these!
>> All the best, Matt
> Good stuff. :) (fwiw, nothing of this sounded cynical, at all.
> Though everything depends on definitions, ha! 
> https://en.wikipedia.org/wiki/Cynicism_(philosophy) )
> [2]: 
> http://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/wfn/5818821692620800
The TL;DR would be

  * working on a twitter-bridgedb-bot, PoC is at

  * feel free to try and break https://twitter.com/wfntestacct (flood
it with messages! follow, un-follow, re-follow, try to crash it if
you'd like!)

  * just found out attaching images to direct messages over API
directly is not possible: https://dev.twitter.com/discussions/24116
(whoops! And no updates to API doc elsewhere); still possible to link
to images, etc. But kind of sad nevertheless.

  * will implement a generic rate control mechanism. Will either use
requests-per-time-period (this would require the distributor to
remember some state about users for a while, which is a somewhat
delicate thing), (and/)or text-based challenge-response (which can
later on be replaced by something more sophisticated.)

  * as of now, planning not to stop with a twitter distributor. Would
sure be nice to also have an XMPP-based one, too.

  * RESTful BridgeDB Distributor / bridge API discussion is welcome.
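
For the text-based challenge-response option in the rate control item above, a deliberately naive sketch: the bot asks a small arithmetic question and hands out bridges only after a correct reply. The class name `TextChallenge` and the question format are made up for illustration; a challenge this simple is trivially scriptable and only stands in for whatever more sophisticated scheme replaces it later.

```python
import random


class TextChallenge:
    """Issue one pending question per handle and check a single answer."""

    def __init__(self, rng=None):
        # SystemRandom by default; a seeded generator can be passed for tests.
        self._rng = rng or random.SystemRandom()
        self._pending = {}  # handle -> expected answer (in memory only)

    def issue(self, handle):
        a = self._rng.randint(1, 9)
        b = self._rng.randint(1, 9)
        self._pending[handle] = str(a + b)
        return "What is %d plus %d?" % (a, b)

    def check(self, handle, answer):
        # pop() means each challenge allows exactly one attempt.
        expected = self._pending.pop(handle, None)
        if expected is None:
            return False
        return expected == answer.strip()
```

Like the time-period approach, this keeps some per-user state for a short while, so the same privacy caveat applies.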

Did I miss something, or garble things up?