Hi folks,
I was helping a user in #tor get their ftp server running behind an onion service, which meant rediscovering the ftp protocol's weird separation between the control channel and the file transfer channels, and I realized that separation could be interesting in the obfs4 / "unclassifiable protocols" context.
To recap, when you ftp a file, you connect to the server's control port (21) and tell it which file you want to download. In the modern era, clients and servers use "passive mode" by default, where the server opens a high-numbered port and tells you which one over the control channel, and then you connect to that port and it dumps the file on you.
I might be mistaken (please tell me if I am), but I believe there are no protocol headers or preamble or handshake or anything to the download. You just connect and the bytes start flowing.
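For anyone who wants to poke at this, here is a minimal Python sketch of the client side of a passive-mode download, assuming the default "stream" transfer mode and no TLS on the data channel. The function names are mine, made up for illustration. The first function decodes the server's "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply into a host and port; the second just connects and reads until the server closes the connection, which is all the "protocol" there is on the data side as far as I can tell.

```python
import re
import socket


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Pull (host, port) out of a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port arrives as two decimal bytes, high byte first.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def read_data_channel(host: str, port: int) -> bytes:
    """Connect to the data port and read raw bytes until the server closes."""
    chunks = []
    with socket.create_connection((host, port)) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # EOF: closing the connection is what ends the file
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Note that nothing in read_data_channel is ftp-specific: it never sends a byte and never parses one, so an observer who sees only this flow has no protocol framing to match on.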
If this is so, and if ftp were still popular, then there would be a bunch of high-numbered-port connections that are hard to classify by protocol, because each one is simply a file on the wire.
Of course, many files have structure of their own, including some "header"-like preface that e.g. says it's a zip file.
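To make the "header"-like preface concrete, here is a rough sketch of the kind of check a classifier could run on the first few bytes of a flow. The signatures are the standard leading bytes for these formats; the table and function names are hypothetical, and a real DPI engine surely does a lot more than this.

```python
from typing import Optional

# Standard leading signatures ("magic bytes") for a few common file formats.
MAGIC_PREFIXES = {
    b"PK\x03\x04": "zip",          # zip local file header
    b"\x1f\x8b": "gzip",
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
}


def guess_file_type(first_bytes: bytes) -> Optional[str]:
    """Return a format name if the flow starts with a known signature, else None."""
    for prefix, name in MAGIC_PREFIXES.items():
        if first_bytes.startswith(prefix):
            return name
    return None
```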
By that reasoning, a variant of obfsproxy that wrapped Tor traffic to look like a password-protected zip file could give it many other things to blend with on a large network like China's backbone.
(Compression is good but not enough, because DPI engines already know how to uncompress a zipped flow to look inside it. So we need some sort of encryption or password or the like too.)
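As a strawman for the password-protected zip idea, here is a sketch that builds just the 30-byte zip local file header with the "encrypted" flag set, which a transport could emit before its (already random-looking) payload. The field layout is the standard zip local header; everything else is an assumption on my part, including the function name, the choice to also set the "sizes follow in a data descriptor" flag so the header can be written before the length is known, and the placeholder filename. A convincing imitation would need much more than this (the encryption header, a plausible central directory at the end, realistic sizes and timing).

```python
import struct

ZIP_LOCAL_HEADER_SIG = 0x04034B50  # little-endian on the wire: b"PK\x03\x04"
FLAG_ENCRYPTED = 0x0001            # general-purpose bit 0: entry is encrypted
FLAG_DATA_DESCRIPTOR = 0x0008      # bit 3: crc and sizes come after the data


def fake_encrypted_zip_preface(filename: bytes, method: int = 8) -> bytes:
    """Build a zip local file header claiming an encrypted, streamed entry.

    Hypothetical helper for illustration only; not part of any existing transport.
    """
    flags = FLAG_ENCRYPTED | FLAG_DATA_DESCRIPTOR
    header = struct.pack(
        "<IHHHHHIIIHH",
        ZIP_LOCAL_HEADER_SIG,
        20,             # version needed to extract (2.0)
        flags,
        method,         # 8 = deflate, 0 = stored
        0,              # last-modified time (DOS format), midnight
        0x0021,         # last-modified date (DOS format), 1980-01-01
        0,              # crc-32, deferred to the data descriptor
        0,              # compressed size, deferred
        0,              # uncompressed size, deferred
        len(filename),
        0,              # extra field length
    )
    return header + filename
```

For example, fake_encrypted_zip_preface(b"backup.bin") gives 40 bytes that start with "PK\x03\x04", after which the obfuscated stream would follow.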
By that reasoning also, it might be interesting to separate the two directional flows in an obfs connection -- i.e. so there's a "download" flow, and a separate "upload" flow. This approach will surely look weird in some contexts (two flows rather than one, and no ftp control connection, gotcha), but maybe in other contexts it will have many friends -- I'm thinking network backbones where there are many flows, many users are natted, and it's expensive to try to tie together state between permutations of flows.
If you google for 'why is ftp still used' you find a bunch of articles lamenting that people won't move away from it, and especially that large orgs won't move away from it. Maybe some of those large orgs are reasoning that if they secure the file contents themselves, then the transfer protocol doesn't matter so much. That scenario would play well into our goals of having a bunch of high-entropy files being passed around with no protocol headers.
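By "high-entropy" I mean something you could measure roughly like this: a small sketch that computes the Shannon entropy of a byte string in bits per byte. The function name is mine. Encrypted or well-compressed data should score close to 8, while plain text scores much lower, so a flow of encrypted files and a flow of obfuscated Tor traffic would look alike by this measure.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

This only looks at byte frequencies, so it is a crude first-pass test rather than anything a serious classifier would rely on alone.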
Hm, --Roger