Hello TorDev,
Here I report my recent progress on Stegotorus and answer some questions from the past weeks.
During the past month, I mainly worked in "retransmit = false" mode, trying to restore reliable browsing. I have resolved quite a few bugs. Some of them were due to an incomplete implementation of "retransmit = false" mode after retransmission was implemented.
But the major one seems to be the flush_interval implementation: if the server has transmitted data to the client, the client opens a new connection, "probably" in the near future, to retrieve more data. The delay before that connection follows a geometric distribution, to simulate user browsing behavior.
This is fine most of the time. But when you transmit thousands of packets, it inevitably happens that a delay of 40-60 seconds is sampled. Unfortunately, Firefox isn't patient enough to wait that long for the rest of the data and judges the connection dead.
I have now changed this so that if real data came from the server on the previous connection, the client opens a new connection after 100 msec. I know this defeats the point, but I wanted to restore reliability first. A better way might be to schedule opening a few connections at a time (instead of one). That way it is very improbable that all of them sample unacceptable delays.
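A rough sketch of the arithmetic behind the "open a few connections at a time" idea, assuming each scheduled connection's delay is an independent geometric variable counted in 100 msec ticks. TICK_MS and P_OPEN are made-up illustration values, not the actual flush_interval parameters:

```python
import random

# Hypothetical parameters for illustration only; these are NOT Stegotorus's
# actual flush_interval settings.
TICK_MS = 100   # length of one tick of the simulated browsing clock
P_OPEN = 0.01   # per-tick probability of opening the next connection


def sample_delay_ticks(p, rng):
    """Sample a geometric delay on {1, 2, ...}: count ticks until the first success."""
    ticks = 1
    while rng.random() >= p:
        ticks += 1
    return ticks


def earliest_of_k(p, k, rng):
    """Schedule k connections at once; data can flow as soon as the first one opens."""
    return min(sample_delay_ticks(p, rng) for _ in range(k))


def prob_all_exceed(p, threshold_ticks, k):
    """P(all k independent geometric delays exceed threshold_ticks).

    For one connection P(X > n) = (1 - p)**n, so for k independent ones
    the probability that every delay is too long is (1 - p)**(n * k).
    """
    return (1 - p) ** (threshold_ticks * k)


if __name__ == "__main__":
    threshold = 40_000 // TICK_MS  # 40 s expressed in ticks
    for k in (1, 2, 3):
        print(k, prob_all_exceed(P_OPEN, threshold, k))
```

With these hypothetical numbers the mean delay is 10 s, yet a single connection still waits more than 40 s about 1.8% of the time, which is frequent over thousands of flushes; with three connections scheduled at once the chance that all of them stall drops to roughly 6e-6. The minimum of k such delays is itself geometric with per-tick probability 1 - (1 - p)**k, so short delays also become more common, which may or may not fit the intended browsing model.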
I am going to debug retransmission next. Now a few questions that I had been putting off, partially because I was tired of talking and wanted to do something practical before talking more:
From: Roger Dingledine arma@mit.edu on Sun, 13 Jan 2013 13:04:23 -0500:
- What's the overhead of putting your Tor traffic through each of the
steg modules?
The chopper now measures the efficiency of the steg modules with the following measure, for each circuit:
efficiency = (number of bytes received from upstream) / (number of bytes transmitted downstream in order to carry those upstream bytes)
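One way the per-circuit measure could be kept, sketched under the assumption of two plain byte counters per direction (the class and field names are hypothetical, not the chopper's actual ones). The inverse of the efficiency is the overhead factor Roger asked about:

```python
from dataclasses import dataclass


@dataclass
class CircuitCounters:
    """Hypothetical per-circuit, per-direction byte counters (names are illustrative)."""
    upstream_bytes: int = 0    # payload bytes handed to the chopper from upstream (Tor)
    downstream_bytes: int = 0  # bytes actually sent on the wire to carry that payload

    def record(self, payload, wire):
        """Account for one transmission: payload bytes carried, wire bytes spent."""
        self.upstream_bytes += payload
        self.downstream_bytes += wire

    def efficiency(self):
        """upstream bytes / downstream bytes; 1.0 means no overhead."""
        if self.downstream_bytes == 0:
            return 0.0
        return self.upstream_bytes / self.downstream_bytes

    def overhead_factor(self):
        """Wire bytes per payload byte, i.e. 1 / efficiency."""
        if self.upstream_bytes == 0:
            return float("inf")
        return self.downstream_bytes / self.upstream_bytes
```

For example, an S->C efficiency of 0.03 means roughly 33 bytes on the wire per byte of Tor payload, while 0.65 means about 1.5.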
A summary of the results, measured with speedtest.net:
|-------------+-----------------+---------------+-----------------+-----------------|
| Steg module | Download (Mb/s) | Upload (Mb/s) | Efficiency S->C | Efficiency C->S |
|-------------+-----------------+---------------+-----------------+-----------------|
| Direct      | 19.52           | 2.08          | 1               | 1               |
| nosteg      | 12.98           | 2.26(!)       | 0.98            | 0.98            |
| http        | 0.28            | 0.31          | 0.03            | 0.07            |
| http_apache | 1.43            | 0.47          | 0.65            | 0.52            |
|-------------+-----------------+---------------+-----------------+-----------------|
(!) The faster upload with nosteg through Stegotorus than with the direct connection is persistent, and I can't figure out why. It might be a bug in the speedtest.net algorithm or their odd notion of speed.
When you are downloading, the server->client efficiency looks like this:
|-------------+-----------------|
| Steg module | S->C Efficiency |
|-------------+-----------------|
| Direct      | 1               |
| nosteg      | 0.98            |
| http        | 0.10            |
| http_apache | 0.65            |
|-------------+-----------------|
The efficiency of course varies with the cover texts. These numbers were measured for the http module using the fake cover generated by Zack, and for http_apache using random JavaScript files I downloaded plus the PDF files I had on my computer.
The advantage of http_apache over the http steg module comes from the fact that http_apache always chooses the most efficient cover available, whereas the http module randomly picks 10 candidates and then chooses the best one among them. The http_apache approach, however, isn't plausible security-wise. The best solution seems to be a hybrid of the two, with the user having a say in how efficient/secure they want it to be.
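The difference between the two selection policies, and the hybrid suggested above, can be expressed with a single knob k: sample k covers at random and keep the one with the largest capacity. This is a sketch, not the modules' actual code; capacity() stands for whatever a module uses to estimate how many payload bytes a cover can carry, and the (name, capacity) cover representation is an assumption:

```python
import random


def choose_cover(covers, capacity, k, rng):
    """Best-of-k cover selection (illustrative sketch, not the actual module code).

    k >= len(covers): always the most efficient cover (http_apache-like)
    k == 10:          sample 10 at random, keep the best (http-module-like)
    k == 1:           a uniformly random cover
    """
    if not covers:
        raise ValueError("no cover available")
    k = max(1, min(k, len(covers)))
    candidates = rng.sample(covers, k)
    return max(candidates, key=capacity)


rng = random.Random(1)
covers = [("a.js", 10), ("b.pdf", 500), ("c.js", 40)]  # hypothetical (name, capacity) pairs
best = choose_cover(covers, lambda c: c[1], len(covers), rng)
```

Exposing k (or an equivalent efficiency/security setting) to the user would be one way to implement the hybrid: larger k gives up randomness in cover choice in exchange for efficiency.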
Despite all my cat-and-mouse games with curl, it still manages to steal data from libevent, and I lose about 1 in 100 packets. With retransmission enabled this is not a big deal, but I have to solve it in a more fundamental way.
From: George Kadianakis desnacked@riseup.net on Wed, 30 Jan 2013 14:09:35 +0200:
The code monkey inside me doesn't see any code-quality-related tickets in that URL. stegotorus boasts 12k LoC of C++ and some of that code was written in haste.
You can divide the ST code into three major parts: network, chopper and the steg modules. The network part is basically obfsproxy. As for the chopper, apart from Zack's obsession with overloading functions of the same name by changing the type/number of parameters, I think the rest is pretty good, standard, well-commented code (I suspect it is very similar to a typical TCP implementation). What it lacks is a flow chart that explains exactly what happens when data is received or sent.
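Since the chopper spreads a circuit's data over several connections, the receive side has to put blocks back in order, much like TCP. A generic sketch of that step, assuming each block carries a sequence number starting at 0 (this is not ST's actual wire format or code):

```python
class ReassemblyBuffer:
    """TCP-like in-order delivery of chopped blocks arriving over several connections.

    Generic sketch: block format and names are assumptions, not the ST wire format.
    """

    def __init__(self):
        self.next_seq = 0   # next sequence number expected by the upstream side
        self.pending = {}   # out-of-order blocks, keyed by sequence number

    def receive(self, seq, data):
        """Store a block and return whatever payload is now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return b""      # duplicate (e.g. a retransmission we already have)
        self.pending[seq] = data
        out = []
        while self.next_seq in self.pending:
            out.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return b"".join(out)
```

A block that arrives early on a fast connection simply waits in pending until the gap before it is filled.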
The ugly code is in the steg modules (partially because they are written in C rather than C++), but then the steg modules are not an innate part of Stegotorus and are meant to be reimplemented by interested parties.
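To illustrate the split between the chopper and pluggable steg modules, here is a hypothetical minimal interface in Python. The method names are assumptions for illustration and do not mirror ST's real C++ steg API; NoSteg is only loosely analogous to nosteg (no variable packet sizes), and ToyHttpSteg merely wraps data in a fixed header and hides nothing:

```python
from abc import ABC, abstractmethod


class StegModule(ABC):
    """Hypothetical pluggable steg interface (illustrative; not ST's real C++ API)."""

    @abstractmethod
    def transmit_room(self):
        """How many payload bytes the next cover can carry."""

    @abstractmethod
    def encode(self, payload: bytes) -> bytes:
        """Hide payload inside a cover message."""

    @abstractmethod
    def decode(self, cover: bytes) -> bytes:
        """Recover the payload from a cover message."""


class NoSteg(StegModule):
    """Identity module: payload goes on the wire as-is."""

    def transmit_room(self):
        return 1 << 16

    def encode(self, payload):
        return payload

    def decode(self, cover):
        return cover


class ToyHttpSteg(StegModule):
    """Toy module: prepends a fixed HTTP-like header, so efficiency is below 1."""

    HEADER = b"HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\n\r\n"

    def transmit_room(self):
        return 1 << 16

    def encode(self, payload):
        return self.HEADER + payload

    def decode(self, cover):
        # Everything after the first blank line is payload.
        _, _, body = cover.partition(b"\r\n\r\n")
        return body
```

The chopper would only ever talk to the abstract interface, which is what lets a third party replace the steg modules without touching the chopper.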
Having said that, Zack has mentioned that hardening the code is the number 1 priority. I haven't included it in the track list yet only because he mentioned it later.
Looking at this from a deployment point of view, I feel much more relaxed and easygoing with deploying programs written in a high-level language (like Flashproxy or pyobfsproxy) than with deploying an unaudited C++ program with many bad code moments.
I suspect that Python's inefficiency would break Stegotorus completely. Right now, when I run Stegotorus, it eats all the CPU time. I know this is partially due to heavy logging and partially because I compile it with zero optimization and full debug, but I can only imagine what it would look like if it were written in Python.
There is a lot of precise timing at the millisecond scale. This doesn't matter when you are running the nosteg module, which is basically obfsproxy with variable packet sizes. But when you move to simulating HTTP with multiple connections and constant negotiation between the chopper and the steg modules, I doubt Python is an option. Besides, the steg algorithms can get quite complicated.
I think that a trac ticket with a bit of research on how painful would be to write a deployable <high-level language>-port of stegotorus would be a good idea. I know that Zack is also interested in exploring this avenue.
I haven't looked into pyobfsproxy, but translating the chopper into Python on top of pyobfsproxy doesn't seem to be an impossible task (you are looking at around 3000 LoC). But then you still need some steg modules to get anything more than pyobfsproxy (if we close our eyes to the performance doubts I raised). Maybe having a Cython/C interface to the steg modules would be a middle-way compromise.
Wow, I talk too much!
I'll spare everybody for now.
Vmon