[tor-dev] All the problems about Stegotorous
vmonmoonshine at gmail.com
Wed Jan 16 08:23:03 UTC 2013
Thank you very much for replying to my appeal. I hope that other people
who are critical of Stegotorus will jump in along the way.
> 1) Like the FTE paper...
I'm going to discuss your first point at the end because it is not a
criticism of Stegotorus per se, but of HTTP transport in general.
> 2) There also remains the issue of where you get your
> covertexts. While
> FTE says "we will build a brilliant regexp to characterize the format
> of the thing we hide our content in" (which has its own problems --
> anything your regexp misses is a crack in the armor), Stegotorus says
> "we will build a big library of example things, by crawling the
> and then we'll hide our content in them". Where does this library come
> from? How does every Stegotorus bridge gets its own library? What
> when you reuse an item in your library? How do *clients* generate
> own library? I think there are lots of ways to lose plausibility that
> haven't been explored.
> 2') One of the proposed ways for clients to generate their library of
> plausible covertexts is to basically wiretap the user and then replay
> her own traffic later with the Tor flow embedded in it. First there
> are messy engineering questions to tapping the user in a portable way;
> but I worry even more about the privacy issues introduced by repeating
> earlier traffic. Also, does it introduce new distinguishing attacks,
> like "look for variations on the same request"?
So one thing I implemented last summer to address 2 and 2', at least at
the plausibility level, is this:
- To set up ST-server, you give it the address of an HTTP server and a
list of files stored on that server (alternatively, if you are running
Apache httpd on the same machine, ST-server will go and make a
list of all files in the Apache document_root).
- The client requests "http://ST-server/".
- In reply, ST-server sends the hash of its list of files.
- ST-client compares it with the hash of its own file list (which can be
zero if the client has no list).
- If they match, they start serving right away.
- If not, ST-server sends the list of files to the client.
(All of this happens inside the steg layer, of course.)
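
The exchange above can be sketched roughly as follows. This is only an
illustration, not the actual ST code: the function names, the use of
SHA-256, and the all-zero digest for an empty list are assumptions made
for the example.

```python
import hashlib

# Assumption: an empty client list is represented by an all-zero digest.
EMPTY_LIST_HASH = "0" * 64


def file_list_hash(paths):
    """Hash a list of file paths, independent of their order."""
    if not paths:
        return EMPTY_LIST_HASH
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode("utf-8") + b"\n")
    return digest.hexdigest()


def sync_file_list(server_paths, client_paths):
    """Simulate the handshake.

    Returns (list the client ends up with, whether the server had to
    send its full list).
    """
    server_hash = file_list_hash(server_paths)      # sent by ST-server
    if file_list_hash(client_paths) == server_hash:
        return list(client_paths), False            # match: serve right away
    return list(server_paths), True                 # mismatch: server sends list
```

A client with no list always triggers a transfer (as long as the server
has files), while a client that already holds the same files in any
order does not.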
Moreover, not only are the steg modules now pluggable, the payload
(covertext) loader is pluggable too, so if you don't like this way of
loading payloads you can write your own payload loader and ask ST to use it.
> I recognize that *not*
> using real client traffic also allows problems, e.g. "why is that
> who usually uses IE, sending a user-agent of chromium?"
This can be addressed as a ticket. cURL already has an option to pretend
to be another agent. You can just ask the user to browse to
127.0.0.1:ST-Port and get the agent that way.
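
As a minimal sketch of "get the agent that way": once the user's browser
has sent a request to the local ST port, the User-Agent can be read from
the raw request bytes. The function name below is hypothetical and this
is not existing ST code.

```python
def extract_user_agent(raw_request: bytes):
    """Return the User-Agent header of a raw HTTP request, or None."""
    # Keep only the header section (everything before the blank line).
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    # Skip the request line, then look for the header case-insensitively.
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"user-agent":
            return value.strip().decode("latin-1")
    return None
```

The returned string could then be reused in the requests that ST
generates, so they carry the same agent as the user's real browser.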
After all, I don't think this is a critical issue. First, most people
are sharing valid IPs. (I can't remember a single place in Iran where I got a valid IP.)
So the filter has no way to tell whether two people are using the connection
or one person is using two browsers. Besides, it is not uncommon
to use two browsers on the same computer.
The other thing is that an HTTP GET request doesn't have much room to
maneuver so it is much easier to make it exactly look like a specific
In the long term we can make a Firefox plug-in to interact with ST on the
client side. It shouldn't be hard because the payload provider is
pluggable now (which also takes care of the communication on the client side).
> 3) What's the overhead of putting your Tor traffic through each of the
> steg modules? It's my understanding that some of the Stegotorus steg
> modules produce immense size overhead (since the cover-item is large,
> and the part of the cover-item you can hide your message in is
Well, this is the usual efficiency/security trade-off question. What
would you tell somebody who says "but I browse the web much faster
without Tor..."? The answer is: if you have an idea for providing the same
service with higher efficiency, then go for it. Between no access and
inefficient access, I personally prefer the latter. If obfsproxy can get
by, I'm not going to use ST, but what if it can't?
Practically speaking, I have watched the entire "behind enemy lines" of
you and Jacob through ST over a quite good connection, and besides the 2 or 3
times that it stopped, I had an OK experience watching it.
I can come up with the exact value of the overhead, (traffic size in bytes
with ST)/(traffic size in bytes without ST), if you give me some time.
> What are the numbers for the current steg modules that people
> talking about / have built?
There are still 3 or 4 of them: pdf, swf, js, and js in html. Adding a steg
module seems to be the least of the problems and the most outsourceable
task. It is relatively easy to take an existing steg algorithm and feed it to
ST as a steg module.
>Is there some correlation between
> (overhead) and plausibility (indistinguishability)?
Obviously. You can choose the "no-steg" steg module and then you won't
have any overhead.
>What are the
> if we adopt some sort of "choose the covertext from your library that
> minimizes your overhead" policy?
Stegotorus already has something like that: it considers 10 random
candidates and, among them, chooses the one with the least overhead
that offers the capacity requested by the chopper.
It is a matter of a few lines of code to say "only use the xx% of
covertexts with the least overhead" (size/capacity; the payload loader is
aware of both).
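
To illustrate the selection policy, here is a rough sketch, not the
actual ST implementation. The tuple layout (name, size, capacity), the
function name, and the parameter names are assumptions; overhead is
taken to be size/capacity as described above.

```python
import random


def pick_covertext(library, needed_capacity, sample_size=10, rng=random):
    """Pick a covertext from a random sample of the library.

    library: list of (name, size_bytes, capacity_bytes) tuples.
    Returns the sampled item with the lowest size/capacity ratio that
    can hold needed_capacity bytes, or None if no sampled item fits.
    """
    candidates = rng.sample(library, min(sample_size, len(library)))
    usable = [c for c in candidates
              if c[2] > 0 and c[2] >= needed_capacity]
    if not usable:
        return None
    return min(usable, key=lambda c: c[1] / c[2])
```

Restricting the choice to the xx% of covertexts with the least overhead
would amount to sorting the library by the same ratio and sampling only
from the front of that list.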
Another possible approach is to ask the user for a security factor (between
0 and 1), and based on that the steg module knows how much change it is
allowed to make to a covertext.
Writing a simple classifier (e.g. using scikit-learn) that computes simple
statistical features to test the effectiveness of the security factor (telling
HTTP from ST) isn't a terribly ambitious task either.
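
As an example of what "simple statistical features" could mean, the
sketch below computes two of them for a payload using only the standard
library. The choice of features (byte entropy and fraction of printable
bytes) and the function names are assumptions for illustration; the
resulting vectors are the kind of input one could hand to a scikit-learn
classifier trained on labelled plain-HTTP and ST payloads.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def printable_fraction(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (0x20 to 0x7e)."""
    if not data:
        return 0.0
    return sum(1 for b in data if 0x20 <= b <= 0x7E) / len(data)


def features(data: bytes):
    """Feature vector for one payload: [entropy, printable fraction]."""
    return [byte_entropy(data), printable_fraction(data)]
```

A covertext whose feature values shift noticeably after embedding would
suggest that the chosen security factor is too aggressive.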
> 4) And then the last issue isn't so much a design issue as a community
> resource issue -- Zack is busy being a student,
I think by now I have made changes to all parts of the code besides the
crypto module, and I know the code quite well. I have the same
problem as Zack, though I'm expecting to graduate soon. But if we tell every
potential contributor "don't touch ST, it will burn", then the community won't grow.
Anyway, I think putting all of the above points on the trac as tickets is
probably the first step. If ST gets a page with a short explanation and a
link to its tickets (like Flashproxy does), maybe it will attract more attention.
> and further
> by SRI is complicated by their pub review requirement (which alas
> to their code contributions too).
I don't know about this "pub review requirement". Could you give a ref
to read more about it?
And now item 1.
> 1) Like the FTE paper
> (https://www.torproject.org/docs/pluggable-transports), the main
> contribution of Stegotorus is to provide a framework for plugging in
> modules. There are several example steg modules to choose from. The
> is that even if the ones they offer now aren't suitable, if you *had*
> a good one, you could just pop it in. The trouble is that I don't know
> of any good ones, and I think that's a harder problem than people
I think the actual question is "are we going to provide an HTTP
transport or not?" I think the scenario "I'm going to close many ports
besides 80 and X and Y, then do simple DPI so people don't divert
their https to 80" is a very likely one.
The problem that you think is "a harder problem than people
think" is the secure steg problem, which we are not trying to solve
here. We are banking on the fact that an accurate multilevel
statistical analysis is too expensive for a DPI that needs to judge GBs of
data per second.
At the expense of efficiency you can make the steg quite hard to
detect. Suppose you only use JPEG images and you only use the LSB
of the cosine coefficients. I don't see any easy way for a DPI to detect
the steg on the go.
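
To make the LSB idea concrete, here is a minimal sketch. It works on a
plain list of integers standing in for quantized cosine coefficients; it
does not decode or encode real JPEG files, and the function names are
made up for the example. Real schemes usually also skip certain
coefficient values, which this sketch does not do.

```python
def embed_bits(coefficients, bits):
    """Return a copy of coefficients with bits written into the LSBs."""
    if len(bits) > len(coefficients):
        raise ValueError("not enough coefficients for the message")
    out = list(coefficients)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        out[i] = (out[i] & ~1) | bit
    return out


def extract_bits(coefficients, count):
    """Read back the first count embedded bits."""
    return [c & 1 for c in coefficients[:count]]
```

Each coefficient changes by at most 1, which is why the capacity is low
and why spotting the change takes statistical analysis rather than a
simple pattern match.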
> I think having some thorough explorations of 1-3 would put us in a
> better position.
In a nutshell, ST doesn't stop you from writing better steg modules or payload
loaders, so that covers 2, 2' and 3, while providing a working version of
them right now. Item 1 is asking whether an HTTP transport is worth it; I
think the answer is yes.
Thanks again. I'm going to make some tickets for the more concrete aspects of
the above points and share them with the list.
> Message: 2
> Date: Sun, 13 Jan 2013 23:47:44 -0700
> From: Bin Wang <binwang.cu at gmail.com>
> To: tor-dev at lists.torproject.org
> Subject: [tor-dev] Multiple Tor
> <CAJHCcVbzQcXkQZurORupLLEJBm9cJATkKqYOYkUr=km-bo34pA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> Dear Guys,
> I am brand new to TOR and I feel like multiple TORs should be
> The multiple tors I mentioned here are not only multiple instances,
> also using different proxy ports for each, like what has been done
> I am trying to get started with 4 tors. However, the tutorial applies
> Arch Linux and I am using a headless EC2 ubuntu 64bits. It is really a
> going through the differences between Arch and Ubuntu. And here I am
> wondering is there anyone could offer some help to implement my idea
> 1. Four TORs running at the same time each with an individual port,
> or polipo or whatever are ok once it works.
> 8118 <- Privoxy <- TOR <- 9050
> 8129 <- Privoxy <- TOR <- 9150
> 8230 <- Privoxy <- TOR <- 9250
> 8321 <- Privoxy <- TOR <- 9350
> 2. In this way, if I try to return the ip of 127.0.0.1:8118, 8129,
> 8230 and
> 8321, they should return four different ips, which indicates there are
> different tors running at the same time. Then, a few minutes later,
> again, all four of them should have a new ips again.
> I know my simple 'dream' could come true in many ways, however... I am
> only new to tor, but even also to bash and python... That is why I
> here and see whether some of you could light me up.
> These links might be useful:
> Bin Wang