On Thu, Jan 10, 2013 at 12:18:17PM -0700, vmonmoonshine@gmail.com wrote:
> I was talking to Roger yesterday on the IRC, and he mentioned that "[S]tegotorus ... has a whole lot of problems". I have heard this many times in different forms by now (in Florence, The sponsor F discussion, etc). But I never saw these "lot of problems" are broken down in a list, so at least one can attack them one by one. It was always "lots of problem".
> So, let this email be an appeal to all of you who have some problem, deficiency, architectural dissatisfaction, etc with Stegotorus, write back on this thread, so at least we have written account of these problems and dissatisfactions?

Hi vmon,
Thanks for starting the thread.
My main issues with Stegotorus currently are more on the research (ok, maybe it's better called design) side.
1) Like the FTE paper (https://www.torproject.org/docs/pluggable-transports), the main contribution of Stegotorus is to provide a framework for plugging in steg modules. There are several example steg modules to choose from. The idea is that even if the ones they offer now aren't suitable, if you *had* a good one, you could just pop it in. The trouble is that I don't know of any good ones, and I suspect that building one is a harder problem than people assume.
2) There also remains the issue of where you get your covertexts. While FTE says "we will build a brilliant regexp to characterize the format of the thing we hide our content in" (which has its own problems -- anything your regexp misses is a crack in the armor), Stegotorus says "we will build a big library of example things, by crawling the Internet, and then we'll hide our content in them". Where does this library come from? How does every Stegotorus bridge get its own library? What happens when you reuse an item in your library? How do *clients* generate their own library? I think there are lots of ways to lose plausibility that haven't been explored.
2') One of the proposed ways for clients to generate their library of plausible covertexts is to basically wiretap the user and then replay her own traffic later with the Tor flow embedded in it. First there are messy engineering questions about tapping the user in a portable way; but I worry even more about the privacy issues introduced by repeating earlier traffic. Also, does it introduce new distinguishing attacks, like "look for variations on the same request"? I recognize that *not* using real client traffic also creates problems, e.g. "why is that user, who usually uses IE, sending a user-agent of chromium?"
3) What's the overhead of putting your Tor traffic through each of the steg modules? It's my understanding that some of the Stegotorus steg modules produce immense size overhead (since the cover-item is large, and the part of the cover-item you can hide your message in is relatively small). What are the numbers for the current steg modules that people are talking about / have built? Is there some correlation between inefficiency (overhead) and plausibility (indistinguishability)? What are the tradeoffs if we adopt some sort of "choose the covertext from your library that minimizes your overhead" policy?
4) And then the last issue isn't so much a design issue as a community or resource issue -- Zack is busy being a student, and further development by SRI is complicated by their pub review requirement (which alas applies to their code contributions too).
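
To make the overhead question in point 3 concrete, here is a minimal Python sketch of a "choose the covertext from your library that minimizes your overhead" policy. It is not how Stegotorus works: CoverItem, overhead and pick_min_overhead are hypothetical names, and the sketch assumes each library item has a known wire size and a known payload capacity, which real steg modules may not expose in this form.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CoverItem:
    name: str       # hypothetical label for a library entry
    size: int       # bytes that go on the wire when this item is sent
    capacity: int   # payload bytes the steg module can hide in it


def overhead(item: CoverItem) -> float:
    """Wire bytes per hidden payload byte when the item is filled to capacity."""
    if item.capacity <= 0:
        raise ValueError("cover item cannot carry any payload")
    return item.size / item.capacity


def pick_min_overhead(library: Iterable[CoverItem],
                      payload_len: int) -> Optional[CoverItem]:
    """Return the smallest cover item that can hold payload_len bytes, or None."""
    candidates = [c for c in library if c.capacity >= payload_len]
    if not candidates:
        return None
    # For a fixed payload, the cheapest item is the one with the fewest wire bytes.
    return min(candidates, key=lambda c: c.size)
```

A deterministic policy like this would keep picking the same few cheap items, which feeds straight back into the reuse question in point 2. That is one way the overhead/plausibility tradeoff could show up in practice.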
I think having some thorough explorations of 1-3 would put us in a much better position.
--Roger