[tor-dev] [Stegotorus] Stegotorus stuff

Sun Mar 24 08:56:00 UTC 2013

Hello TorDev,

Here I'm reporting my recent progress on Stegotorus, as well as
answering some questions from the past.

During the past month, I mainly worked in "retransmit = false" mode,
trying to restore reliable browsing. I have resolved quite a few bugs.
Some of them were due to the incomplete implementation of
"retransmit = false" mode after retransmission was implemented.

But the major one seems to be the flush_interval implementation: if the
server has transmitted data to the client, the client opens a new
connection, "probably" in the near future, to retrieve more data. The
delay follows a geometric distribution to simulate user browsing
behavior.

This is OK most of the time. But when you transmit 1000s of packets, it
inevitably samples delays as long as 40-60 seconds. Unfortunately,
Firefox isn't patient enough to wait that long for the rest of the data
and judges the connection dead.
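
To put rough numbers on this, here is a small sketch of the tail of a
geometric delay. The parameters are made up for illustration (1-second
ticks and p = 0.1 per tick); they are not the values Stegotorus uses,
and the function names are hypothetical.

```python
def tail_prob(p, k):
    """P(X > k) for a geometric delay X (ticks until first success)."""
    return (1.0 - p) ** k


def prob_any_exceeds(p, k, n):
    """P(at least one of n independent delays exceeds k ticks)."""
    return 1.0 - (1.0 - tail_prob(p, k)) ** n


# Assumed numbers only: p = 0.1 per 1-second tick (mean delay 10 s).
p = 0.1
print("one delay over 40 s:", tail_prob(p, 40))
print("any of 1000 delays over 40 s:", prob_any_exceeds(p, 40, 1000))
```

Under these assumptions a single long delay is rare, but over 1000s of
samples at least one is close to certain, which matches what happens in
practice.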

I have now changed this so that if real data came from the server on the
previous connection, the client opens a new connection after 100 msec. I
know this defeats the point, but I wanted to restore reliability. A
better way might be to schedule a few connections at a time (instead of
one). That way it is very improbable that all of them sample
unacceptable delays.
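
A quick simulation sketch of the "few connections at a time" idea: with
k connections scheduled together, the client only waits for the
earliest one, i.e. the minimum of k independent delays. Again p, the
tick length and all names here are assumptions for illustration, not
Stegotorus code.

```python
import math
import random


def sample_delay(p, rng):
    """Draw one geometric delay in ticks (support 1, 2, ...)."""
    u = rng.random()
    return int(math.log(1.0 - u) / math.log(1.0 - p)) + 1


def prob_all_exceed(p, t, k):
    """P(all k independent delays exceed t ticks) = (1 - p)^(k * t)."""
    return ((1.0 - p) ** t) ** k


def worst_wait(p, k, n_rounds, rng):
    """Longest wait over n_rounds when k connections are scheduled per round."""
    return max(
        min(sample_delay(p, rng) for _ in range(k)) for _ in range(n_rounds)
    )


rng = random.Random(2013)
for k in (1, 2, 4):
    print("k =", k, "worst wait in ticks:", worst_wait(0.1, k, 1000, rng))
```

The worst wait should shrink quickly as k grows, since the chance that
all k delays are long falls off exponentially in k.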

I am going to debug retransmission in the next step. Now a few
questions that I was putting off, partly because I was tired of talking
and wanted to do something practical before talking more:

From: Roger Dingledine <arma at mit.edu> on Sun, 13 Jan 2013 13:04:23
-0500:  
> 3) What's the overhead of putting your Tor traffic through each of the
> steg modules? 

The chopper now measures the efficiency of the steg modules using the
following ratio for each circuit:

(no. of bytes transmitted from upstream)/(no. of bytes transmitted downstream
in order to transmit the upstream bytes)
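
As a sketch of how such a per-circuit counter could look (the class and
field names are hypothetical, not the chopper's actual code):

```python
class CircuitStats:
    """Per-circuit byte counters for the efficiency ratio (illustrative)."""

    def __init__(self):
        self.upstream_bytes = 0  # payload bytes that came from upstream
        self.wire_bytes = 0      # bytes sent downstream, cover included

    def record(self, payload_len, wire_len):
        """Account for one transmitted block and the cover that carried it."""
        self.upstream_bytes += payload_len
        self.wire_bytes += wire_len

    def efficiency(self):
        """Upstream bytes divided by downstream bytes (0.0 if nothing sent)."""
        if self.wire_bytes == 0:
            return 0.0
        return self.upstream_bytes / self.wire_bytes


stats = CircuitStats()
stats.record(payload_len=650, wire_len=1000)  # made-up example numbers
print(stats.efficiency())
```

An efficiency of 1 means no overhead at all; anything a steg module
adds as cover pushes the ratio below 1.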

The summary of the results measured using speedtest.net:

|-------------+----------+---------+------+------|
|             |     Mb/s |    Mb/s | Efficiency  |
|-------------+----------+---------+------+------|
| Steg module | Download |  Upload | S->C | C->S |
|-------------+----------+---------+------+------|
| Direct      |    19.52 |    2.08 |    1 |    1 |
| nosteg      |    12.98 | 2.26(!) | 0.98 | 0.98 |
| http        |     0.28 |    0.31 | 0.03 | 0.07 |
| http_apache |     1.43 |    0.47 | 0.65 | 0.52 |
|-------------+----------+---------+------+------|

(!) The upload through Stegotorus with nosteg being faster than the
direct connection is persistent and I can't figure out why. It might be
a bug in the speedtest.net algorithm or their weird notion of speed.

If you are downloading, the server->client efficiency is like this:

|-------------+-----------------|
| Steg module | S->C Efficiency |
|-------------+-----------------|
| Direct      |               1 |
| nosteg      |            0.98 |
| http        |            0.10 |
| http_apache |            0.65 |
|-------------+-----------------|

The efficiency of course varies depending on the cover texts. These were
measured for the http module using the fake covers generated by Zack,
and for http_apache using random JavaScript files that I downloaded plus
the PDF files I had on my computer.

The advantage of http_apache over the http steg module comes from
http_apache always choosing the most efficient cover possible, versus
the http module, which randomly picks 10 candidates and then chooses the
best one among them. The http_apache approach, however, isn't plausible
security-wise. The best solution seems to be a hybrid of the two, with
the user having a say in how efficient/secure they want it to be.
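
One way to picture the hybrid is a single knob k for the number of
random candidates: k = 10 behaves like the http module as described,
and k = "all covers" behaves like http_apache. This is only a sketch;
the cover tuples, the function name and the "smallest fitting cover"
rule are assumptions, not the modules' real logic.

```python
import random


def choose_cover(covers, payload_len, k, rng):
    """Pick a cover for payload_len bytes from k random candidates.

    covers: list of (name, capacity, size) tuples, all hypothetical.
    Returns the fitting candidate with the smallest on-wire size, or None
    if none of the sampled candidates can hold the payload.
    """
    candidates = rng.sample(covers, min(k, len(covers)))
    fitting = [c for c in candidates if c[1] >= payload_len]
    if not fitting:
        return None
    return min(fitting, key=lambda c: c[2])


# Made-up cover database: (name, capacity in bytes, size on the wire).
covers = [("a.js", 200, 4000), ("b.pdf", 900, 1500), ("c.js", 500, 700)]
rng = random.Random(0)
print(choose_cover(covers, 400, k=len(covers), rng=rng))  # always the best
print(choose_cover(covers, 400, k=1, rng=rng))            # luck of the draw
```

A small k makes the choice less predictable but less efficient, and a
large k does the opposite, so k is the efficiency/security dial.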

Despite all my cat-and-mouse games with curl, it still manages to steal
data from libevent and I lose 1 in 100 packets or so. In the presence of
retransmission this is not a big deal, but I have to solve it in a more
fundamental way.

From: George Kadianakis <desnacked at riseup.net> on Wed, 30 Jan 2013
14:09:35 +0200: 
> The code monkey inside me doesn't see any code-quality-related tickets
> in that URL. stegotorus boasts 12k LoC of C++ and some of that code
> was written in haste.

You can divide the ST code into three major parts: network, chopper and
the steg modules. The network part is basically obfsproxy. As for the
chopper, besides Zack's obsession with overloading functions under the
same name by changing the type/no. of parameters, I think the rest is
pretty good, standard, well-commented code (I suspect it is very similar
to a typical TCP implementation). What it lacks is a flow chart that
explains what exactly happens when data is received or sent.

The ugly code is in the steg modules (partly because it is in C rather
than C++), but then the steg modules are not an innate part of
Stegotorus and are meant to be re-implemented by interested parties.

Having said that, Zack has mentioned that hardening the code is the
number 1 priority. Only because he mentioned it later, I haven't
included it in the track list yet.

> Looking at this from a deployment point of view,
> I feel much more relaxed and easygoing with deploying programs written
> in a high-level language (like Flashproxy or pyobfsproxy) than with
> deploying an unaudited C++ program with many bad code moments.

I suspect that Python's inefficiency would break Stegotorus completely.
Right now, when I run Stegotorus it eats all the CPU time. I know this is
partly due to heavy logging and partly because I compile it with zero
optimization and full debug, but I can only imagine what it would look
like if it were written in Python.

There is a lot of exact timing in there, on the scale of msec. It is not
important when you are running the nosteg module, which is basically
obfsproxy with variable packet size. But when you turn to simulating
http with multiple connections and constant negotiation between the
chopper and the steg modules, I doubt that Python is an option. Besides,
the steg algorithms can get quite complicated.

> I think that a trac ticket with a bit of research on how painful would
> be to write a deployable <high-level language>-port of stegotorus
> would be a good idea. I know that Zack is also interested in exploring
> this avenue.

I haven't looked into pyobfsproxy, but translating the chopper into
Python on top of pyobfsproxy doesn't seem to be an impossible task (you
are looking at around 3000 LoC). But then you still need some steg
module to get something more than pyobfsproxy (if we close our eyes to
the performance doubts I raised). Maybe having a Cython/C interface to
the steg modules would be a middle-way compromise.

Wow I talk too much!

I'll spare everybody for now. 
Vmon