Hello everybody,
I was talking to Roger on IRC yesterday, and he mentioned that "[S]tegotorus ... has a whole lot of problems". By now I have heard this many times, in different forms (in Florence, in the sponsor F discussion, etc.). But I have never seen these "lots of problems" broken down into a list, so that one could at least attack them one by one. It was always just "lots of problems".
So let this email be an appeal to all of you who have a problem, deficiency, architectural dissatisfaction, etc. with Stegotorus: please write back on this thread, so that at least we have a written account of these problems and dissatisfactions.
Then, as the very first step toward solving them, I'll turn them into tickets. The thread can also serve as a written account of the difficulties involved in building any HTTP transport.
Thank you in advance for your contributions.
Cheers, Vmon
On Thu, Jan 10, 2013 at 12:18:17PM -0700, vmonmoonshine@gmail.com wrote:
Hi vmon,
Thanks for starting the thread.
My main issues with Stegotorus currently are more on the research (ok, maybe it's better called design) side.
1) Like the FTE paper (https://www.torproject.org/docs/pluggable-transports), the main contribution of Stegotorus is to provide a framework for plugging in steg modules. There are several example steg modules to choose from. The idea is that even if the ones they offer now aren't suitable, if you *had* a good one, you could just pop it in. The trouble is that I don't know of any good ones, and I think that's a harder problem than people think.
2) There also remains the issue of where you get your covertexts. While FTE says "we will build a brilliant regexp to characterize the format of the thing we hide our content in" (which has its own problems -- anything your regexp misses is a crack in the armor), Stegotorus says "we will build a big library of example things, by crawling the Internet, and then we'll hide our content in them". Where does this library come from? How does every Stegotorus bridge get its own library? What happens when you reuse an item in your library? How do *clients* generate their own library? I think there are lots of ways to lose plausibility that haven't been explored.
2') One of the proposed ways for clients to generate their library of plausible covertexts is to basically wiretap the user and then replay her own traffic later with the Tor flow embedded in it. First, there are messy engineering questions about tapping the user in a portable way, but I worry even more about the privacy issues introduced by replaying earlier traffic. Also, does it introduce new distinguishing attacks, like "look for variations on the same request"? I recognize that *not* using real client traffic also creates problems, e.g. "why is that user, who usually uses IE, sending a user-agent of chromium?"
3) What's the overhead of putting your Tor traffic through each of the steg modules? It's my understanding that some of the Stegotorus steg modules produce immense size overhead (since the cover-item is large, and the part of the cover-item you can hide your message in is relatively small). What are the numbers for the current steg modules that people are talking about / have built? Is there some correlation between inefficiency (overhead) and plausibility (indistinguishability)? What are the tradeoffs if we adopt some sort of "choose the covertext from your library that minimizes your overhead" policy?
4) And then the last issue isn't so much a design issue as a community or resource issue -- Zack is busy being a student, and further development by SRI is complicated by their pub review requirement (which alas applies to their code contributions too).
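Point 3 can be made concrete with a little arithmetic. If a cover item occupies C bytes on the wire but can hide at most H bytes, then sending m payload bytes takes ⌈m/H⌉ cover items, i.e. ⌈m/H⌉·C wire bytes, an expansion factor of roughly C/H. The Python sketch below computes this and implements the naive "choose the covertext that minimizes your overhead" policy. The `CoverItem` type, the function names, and all sizes are hypothetical, not measurements of any existing StegoTorus module.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverItem:
    """Hypothetical library entry: size on the wire and hidden-payload capacity."""
    name: str
    size: int      # bytes the cover item occupies on the wire
    capacity: int  # bytes of hidden payload one copy can carry


def wire_bytes(item: CoverItem, payload_len: int) -> int:
    """Bytes sent if the whole payload is spread over copies of this cover item."""
    if item.capacity <= 0:
        raise ValueError("cover item cannot carry any payload")
    return math.ceil(payload_len / item.capacity) * item.size


def expansion(item: CoverItem, payload_len: int) -> float:
    """Overhead factor: wire bytes sent per payload byte."""
    return wire_bytes(item, payload_len) / payload_len


def cheapest_cover(library: list[CoverItem], payload_len: int) -> CoverItem:
    """Naive policy: always pick the item that minimizes wire bytes.

    Being deterministic, this keeps picking the same few items, which may
    itself be a fingerprint.
    """
    return min(library, key=lambda item: wire_bytes(item, payload_len))


# Made-up library, for illustration only.
library = [
    CoverItem("small.js", size=4_000, capacity=200),
    CoverItem("photo.jpg", size=80_000, capacity=2_000),
    CoverItem("page.html", size=20_000, capacity=1_500),
]

for item in library:
    print(item.name, wire_bytes(item, 3_000), round(expansion(item, 3_000), 1))
print("chosen:", cheapest_cover(library, 3_000).name)
```

With this made-up library, a 3000-byte payload costs 60000 wire bytes via `small.js`, 160000 via `photo.jpg`, and 40000 via `page.html`, so the policy always picks `page.html`. That determinism is the tradeoff point 3 asks about: the cheapest choice concentrates traffic on a few cover items, which works against plausibility.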
I think having some thorough explorations of 1-3 would put us in a much better position.
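As one small exploration of point 2', here is a very crude version of the "look for variations on the same request" distinguisher: group observed HTTP requests by request line and flag any request line that recurs with differing headers or bodies. The Python below is a sketch under stated assumptions: the `(request_line, rest)` capture format, the function name, and the threshold are all hypothetical, and a real censor would use far richer features.

```python
import hashlib
from collections import defaultdict


def suspicious_repeats(requests, min_variants=2):
    """Flag request lines that recur with differing headers/bodies.

    requests: iterable of (request_line, rest) pairs, where rest is the raw
    header/body bytes (a hypothetical, simplified capture format).
    Returns {request_line: number_of_distinct_variants} for request lines
    seen with at least min_variants distinct variants.
    """
    variants = defaultdict(set)
    for request_line, rest in requests:
        # Keep only a fingerprint of each variant, not the raw bytes.
        variants[request_line].add(hashlib.sha256(rest).hexdigest())
    return {
        line: len(digests)
        for line, digests in variants.items()
        if len(digests) >= min_variants
    }


captured = [
    ("GET /index.html HTTP/1.1", b"Cookie: id=abc\r\n\r\n"),
    ("GET /index.html HTTP/1.1", b"Cookie: id=abc\r\n\r\n"),
    # The same request replayed later, with the cookie altered to carry hidden data.
    ("GET /index.html HTTP/1.1", b"Cookie: id=q9Zx7Lr2\r\n\r\n"),
    ("GET /style.css HTTP/1.1", b"Cookie: id=abc\r\n\r\n"),
]

print(suspicious_repeats(captured))  # prints {'GET /index.html HTTP/1.1': 2}
```

Exact repeats are not flagged; only the altered replay is. Legitimate traffic also produces such variations (rotating session cookies, for instance), so the real question is whether a replaying client's variations look statistically different from normal ones.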
--Roger
On Thu, Jan 10, 2013 at 2:18 PM, vmonmoonshine@gmail.com wrote:
Sorry for taking so long to chime in here. From my perspective, there are four critical problems with Stegotorus as is. I consider the first three still my responsibility to see addressed, but as Roger mentioned, I'm busy being a student, and it's unlikely that I could get another publication out of fixing these problems, so it's hard for me to spend any significant time on it. SRI is still theoretically paying me to do it, but the time would have to come from somewhere, and if I start taking their money again, I go back under their collaboration restrictions. That said, in principle I would be *delighted* to work with anyone who was interested in taking on the bulk of the coding and debugging.
I don't take responsibility for the fourth problem because it's code I had basically no hand in, but it does remain a stumbling block, and if anyone wants to work on it, I am available at least for kibitzing.
1. The cryptography was never completed. In particular, the handshake protocol as described in the paper was never implemented. The program as-is uses "session keys" which are WRITTEN INTO THE SOURCE CODE. Obviously this renders it totally unfit for deployment outside laboratory test environments.
This is totally my fault; I should have forced SRI to let me finish it. But they considered the next problem (below) more time-critical, and I went along, and then I didn't finish that either :-/ I should also mention in this context that I am not all that happy with the cryptographic primitives I picked two years ago (especially the weird elliptic curve thing) and if I ever get back to this, the first thing I'm going to do is have a good hard look at better options (Curve25519, for instance, and I'd like to get away from OpenSSL entirely).
2. The chopper *protocol* has a known deadlock condition due to the lack of explicit acknowledgments. The *implementation* may still have bugs on top of that, and it's possible that sufficiently thorough testing will expose further protocol issues.
I started writing code to fix this but never finished it; there's a git branch (it should be on both GitHub and torproject's git server). This is probably the easiest thing for people to help with right now. It is very easy to trigger the deadlocks even in lab-test conditions; basically all you have to do is try to start up Tor over ST with no cached directory information and a network connection that drops SYN packets from time to time.
(2a. There is a desperate need for a more thorough automated test suite for this program, one that can reliably reproduce the deadlock conditions without your having to set up a remote server and/or a special loopback interface that drops packets.)
3. The code is harder to work on than it ought to be due to my having gotten bogged down in minutiae halfway through a C-to-C++ conversion.
I am actually still hacking on this one in my copious free time, but you shouldn't expect anything in the near future. There are nontrivial yaks still to be shaved (the biggest one is a proper C++ binding for libevent). If it ever gets done I expect that the other problems will become much easier to fix; however, it might be faster overall to scrap the entire existing body of code and start over with Pyobfsproxy or similar.
4. The steganography is not only a joke in terms of actually hiding the secret messages, it's just plain badly written. You know how C from the 4BSD era tends to have a buffer overflow every ten lines or so? It's that bad.
It has been my intention for quite some time to deal with this by (a) writing a "steg" module that just feeds chopper output onto the wire but randomizes packet size and connection length, thus providing a small but strict improvement on obfsproxy, (b) summarily scrapping the existing HTTP and Embed modules, and (c) getting after vmon to finish what he started :) Unfortunately (a) is blocked on problems 2 and 3 above, and (b) is difficult to do without pissing off people at SRI; I'm normally happy to be Mr. Nasty when it comes to code quality, but I only have so much political capital to burn and it might be more important to burn it on e.g. dealing with the collaboration restrictions.
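Option (a) is simple enough to sketch. The Python below is a hypothetical framing, not StegoTorus's actual chopper format: it assumes the chopper output is already encrypted, cuts it into records of random size, and spreads the records over connections carrying a random number of records. The function names, the 4-byte header, and the size limits are all assumptions; in a real module the headers would need to be encrypted too, or their fixed layout would become a fingerprint of its own.

```python
import os
import random
import struct

MAX_BODY = 1400                # hypothetical cap on a record's body size, in bytes
HEADER = struct.Struct(">HH")  # (body length, payload length), big-endian


def frame(data: bytes, rng: random.Random) -> list[bytes]:
    """Cut data into records whose body sizes are drawn uniformly from 1..MAX_BODY.

    Each body is filled with payload; only the final record, if the data runs
    out, is topped up with random padding.
    """
    records = []
    pos = 0
    while pos < len(data):
        body_len = rng.randint(1, MAX_BODY)
        chunk = data[pos:pos + body_len]
        padding = os.urandom(body_len - len(chunk))
        records.append(HEADER.pack(body_len, len(chunk)) + chunk + padding)
        pos += len(chunk)
    return records


def deframe(stream: bytes) -> bytes:
    """Inverse of frame(): strip headers and padding from a concatenated record stream."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        body_len, payload_len = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        out += stream[pos:pos + payload_len]
        pos += body_len
    return bytes(out)


def split_connections(records: list[bytes], rng: random.Random,
                      max_records: int = 8) -> list[list[bytes]]:
    """Group records into connections of random length (1..max_records records each)."""
    conns = []
    i = 0
    while i < len(records):
        n = rng.randint(1, max_records)
        conns.append(records[i:i + n])
        i += n
    return conns


# Usage: round-trip 10 kB of stand-in "chopper output" through the framing.
rng = random.Random(2013)
data = os.urandom(10_000)
records = frame(data, rng)
assert deframe(b"".join(records)) == data
connections = split_connections(records, rng)
print(len(records), "records over", len(connections), "connections")
```

Drawing sizes from a uniform distribution is only the simplest choice; matching the size and connection-length distributions of some real protocol would be the obvious next refinement.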
Speaking of, vmon, I know you had trouble getting libcurl and libevent to play nice -- did you ever look at libevhtp?
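Problems 2 and 2a above can be illustrated, and partly tested, without any real network. The Python below is a toy model, not the chopper protocol: a sender transmits numbered blocks over an in-memory channel that drops packets according to a seeded RNG. Without acknowledgments, a lost block is never resent (the sender never learns it was lost), so the receiver's reorder buffer waits forever; with cumulative acknowledgments and retransmission, the transfer completes. The function names and parameters are hypothetical.

```python
import random


def transfer(blocks, drop_rate, seed, use_acks, max_rounds=1000):
    """Toy model of sending numbered blocks over a lossy in-memory channel.

    Returns the blocks delivered in order if the transfer completes, or None
    if it stalls (the receiver is missing a block that will never arrive).
    """
    rng = random.Random(seed)  # seeded, so every run is reproducible
    next_expected = 0          # receiver: next in-order sequence number
    reorder = {}               # receiver: blocks that arrived ahead of a gap
    delivered = []
    acked = 0                  # sender: highest cumulative ack seen so far
    sent = set()               # sender: blocks transmitted at least once
    for _ in range(max_rounds):
        if next_expected == len(blocks):
            return delivered
        if use_acks:
            # Resend everything not yet acknowledged (go-back-N style).
            outstanding = list(range(acked, len(blocks)))
        else:
            # No acks: each block is sent exactly once.
            outstanding = [s for s in range(len(blocks)) if s not in sent]
            if not outstanding:
                return None  # nothing left to send, receiver still waiting: stall
        for seq in outstanding:
            sent.add(seq)
            if rng.random() < drop_rate:
                continue  # packet lost in transit
            if seq >= next_expected:
                reorder[seq] = blocks[seq]
        # Receiver delivers any contiguous run starting at next_expected.
        while next_expected in reorder:
            delivered.append(reorder.pop(next_expected))
            next_expected += 1
        # Receiver sends a cumulative ack, which may also be lost.
        if use_acks and rng.random() >= drop_rate:
            acked = next_expected
    return None  # gave up after max_rounds


def stalled_seeds(use_acks, seeds=range(100), drop_rate=0.1, n_blocks=20):
    """Return the seeds for which the transfer did not deliver every block in order."""
    blocks = [f"block-{i}".encode() for i in range(n_blocks)]
    return [
        seed for seed in seeds
        if transfer(blocks, drop_rate, seed, use_acks) != blocks
    ]


print("stalled without acks:", stalled_seeds(use_acks=False))
print("stalled with acks:", stalled_seeds(use_acks=True))
```

Because every run is driven by a seed, any stalled seed the harness reports can be replayed exactly, which is the reproducibility 2a asks for; a real suite would drive the actual chopper code through a channel like this instead of the toy `transfer`.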
----
There is also a philosophical concern with proceeding with StegoTorus: there are at least three other projects with which it ought to merge, or at least cross-pollinate: FlashProxy, Kevin Dyer's FTE http://eprint.iacr.org/2012/494.pdf, and the shiny new FreeWave http://www.cs.utexas.edu/users/amir/papers/FreeWave.pdf. Coordinating that kind of thing *is* worth me spending time on with my student hat on, since it's more likely to result in papers than straight-up coding, but I still have a bunch of other projects which are higher priority.
zw