Good point about joining the swarm. This is a part of the design that I'm not confident about; it's definitely questionable.
Suppose a non-bit-smuggler peer joins the swarm. If he starts torrenting the file, he will get a correct copy of it (no checksum failures on the pieces), because all bit-smuggler parties hold a full, correct copy of it to begin with.
ODDITIES
1. File content. As described in the docs, at the moment those files are just random data, generated with a pseudo-random generator from an integer seed. So an entropy analysis of the file may be a giveaway (it doesn't look like any real kind of file), and so may the fact that right now the fraction of the file that is available is fixed (1/2).
A solution here is to use real existing files, but this involves pre-downloading them (fetch Pirates of the Caribbean 3 first and then torrent it again with your bit-smuggler server). How much of the file is available can be randomized.
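To make the entropy concern concrete, here is a rough sketch of the kind of check an observer could run. The function names and the seed are made up for illustration, and the generator is Python's, not the one bit-smuggler actually uses.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def seeded_random_bytes(seed: int, size: int) -> bytes:
    """Hypothetical stand-in for the seeded cover-file generator."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


cover = seeded_random_bytes(seed=42, size=64 * 1024)
text_like = b"the quick brown fox jumps over the lazy dog " * 1000

print(f"seeded random data: {shannon_entropy(cover):.3f} bits/byte")
print(f"plain text:         {shannon_entropy(text_like):.3f} bits/byte")
```

Compressed or encrypted files also score close to 8 bits per byte, so a real detector would probably need more than this single number (for example, checking for a valid container format).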
2. Contact files. Another aspect is that the server now works by advertising a set of so-called contact files. Those contact files are BitTorrent files that a client needs to start downloading in order to tunnel a bit-smuggler connection through them. They are partially completed files (1/2 of the pieces are there). Once a file is depleted (all downloadable pieces have been downloaded), it is removed and a new partial copy is put in its place, so that new peer connections on that contact file have plenty of data flowing back and forth.
The fact that the server keeps refreshing its files is odd.
These files are part of the server descriptor.
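A minimal sketch of what a randomized partial copy and the depletion check could look like. Everything here (function names, the 30% to 70% range) is an assumption for illustration, not how the server currently does it.

```python
import random


def pick_available_pieces(num_pieces, min_fraction=0.3, max_fraction=0.7, rng=None):
    """Choose a random subset of piece indices to expose in a fresh partial copy."""
    rng = rng or random.Random()
    fraction = rng.uniform(min_fraction, max_fraction)
    # Keep at least one piece available, but never more than the file has.
    count = min(num_pieces, max(1, round(num_pieces * fraction)))
    return set(rng.sample(range(num_pieces), count))


def is_depleted(available, downloaded_by_peer):
    """A contact file is depleted once the peer holds every piece we exposed."""
    return available <= downloaded_by_peer


available = pick_available_pieces(100, rng=random.Random(1))
print(len(available), "of 100 pieces exposed")
print("depleted yet?", is_depleted(available, set()))
```

When `is_depleted` turns true, the server would swap in a new partial copy, which is exactly the refresh behaviour described above.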
Another way of doing it could be deciding the contact files dynamically. You could maybe have a small exchange at the very beginning between the server and the client through some other channel and steg some data in there.
Possible ways:
* the client can make a DHT request to the server, and the server would reply with what looks like a set of nodes, but the reply actually carries information about which contact file the client should use, so it is not a correct DHT query response.
* use the bitfield message of BitTorrent to do a request-response sequence between the bit-smuggler server and client about which contact file to use, and then switch to it.
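For reference, this is roughly what a bitfield message looks like on the wire according to the BitTorrent spec: a 4-byte big-endian length, message id 5, then one bit per piece with piece 0 in the high bit of the first byte. The sketch only covers the standard framing; how the contact-file choice would be hidden in those bits is not designed yet, and the helper names are made up.

```python
import struct

BITFIELD_ID = 5  # message id of "bitfield" in the peer wire protocol


def encode_bitfield(have, num_pieces):
    """Build a length-prefixed bitfield message from a set of piece indices."""
    bits = bytearray((num_pieces + 7) // 8)
    for index in have:
        if not 0 <= index < num_pieces:
            raise ValueError(f"piece index out of range: {index}")
        bits[index // 8] |= 0x80 >> (index % 8)
    return struct.pack(">IB", 1 + len(bits), BITFIELD_ID) + bytes(bits)


def decode_bitfield(message, num_pieces):
    """Parse a bitfield message back into a set of piece indices."""
    length, msg_id = struct.unpack(">IB", message[:5])
    bits = message[5:]
    if msg_id != BITFIELD_ID or length != 1 + len(bits):
        raise ValueError("not a well-formed bitfield message")
    if len(bits) != (num_pieces + 7) // 8:
        raise ValueError("bitfield length does not match piece count")
    return {i for i in range(num_pieces) if bits[i // 8] & (0x80 >> (i % 8))}


message = encode_bitfield({0, 9}, num_pieces=10)
print(message.hex())
print(sorted(decode_bitfield(message, num_pieces=10)))
```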
3. Upload slots per torrent = 1. The client and server instruct their BitTorrent clients to upload to a single peer. Basically I'm restricting swarms to a size of 2 to load balance things. If I disable this, it would just mean the file gets depleted faster.
So, given this setting, an outsider joining a swarm where a bit-smuggler server and client live would not actually be able to download.
ABOUT BITTORRENT
On BitTorrent: its wire protocol is not very complicated. Traditionally it runs over a TCP connection, starts with a handshake (containing, among other things, the infohash, i.e. the ID of the file being transferred) and continues with length-prefixed messages, which are mostly piece requests and piece data messages (the ones where I embed the payload), plus some control messages to control the data flow (choke, unchoke, interested).
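A minimal sketch of the length-prefixed framing, assuming the handshake has already been consumed and the stream is not obfuscated. Each message is a 4-byte big-endian length followed by a 1-byte id and the payload; a length of 0 is a keep-alive. The ids are the ones from the spec; the function name is made up.

```python
import struct

MESSAGE_NAMES = {
    0: "choke", 1: "unchoke", 2: "interested", 3: "not interested",
    4: "have", 5: "bitfield", 6: "request", 7: "piece", 8: "cancel",
}


def split_messages(stream: bytes):
    """Split a byte stream into (name, payload) pairs plus unparsed leftover bytes."""
    messages = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        end = offset + 4 + length
        if end > len(stream):
            break  # incomplete message, wait for more data
        if length == 0:
            messages.append(("keep-alive", b""))
        else:
            msg_id = stream[offset + 4]
            name = MESSAGE_NAMES.get(msg_id, f"unknown({msg_id})")
            messages.append((name, stream[offset + 5:end]))
        offset = end
    return messages, stream[offset:]


# unchoke, then a request for 16 KiB at offset 0 of piece 0, then a partial header
stream = (
    b"\x00\x00\x00\x01\x01"
    + struct.pack(">IBIII", 13, 6, 0, 0, 16384)
    + b"\x00\x00"
)
messages, leftover = split_messages(stream)
for name, payload in messages:
    print(name, len(payload))
```

The "piece" messages (id 7) are the ones whose payload carries the block data, so that is where the embedded payload would travel.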
You can probably yank the handshake out easily with a regex at the IP level, without any packet reconstruction. For the rest, I guess you need to parse the stream at the application level to make sense of it.
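Something like the following would do for the handshake. Per the spec it is a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte infohash and a 20-byte peer id. The sketch assumes the handshake sits in a single packet payload and that the connection is not using protocol encryption; the function name is made up.

```python
import re

# \x13 is the length byte (19) that precedes the protocol string.
HANDSHAKE_RE = re.compile(
    rb"\x13BitTorrent protocol"
    rb"(?P<reserved>.{8})(?P<info_hash>.{20})(?P<peer_id>.{20})",
    re.DOTALL,  # let "." match any byte, including newlines
)


def find_info_hash(payload: bytes):
    """Return the hex infohash if the payload contains a handshake, else None."""
    match = HANDSHAKE_RE.search(payload)
    return match.group("info_hash").hex() if match else None


info_hash = bytes(range(20))
packet = (
    b"\x13BitTorrent protocol"
    + bytes(8)                       # reserved bytes
    + info_hash
    + b"-XX0000-" + b"123456789012"  # made-up 20-byte peer id
)
print(find_info_hash(packet))
print(find_info_hash(b"GET / HTTP/1.1\r\n"))
```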
Having the infohash means you can fetch the torrent file.
Spec is here
POSSIBLY BETTER DESIGN
The bit-smuggler server joins arbitrary existing swarms on the internet and somehow informs the client which swarms to look for it in. Once they find each other, they start exchanging data.
Tech limitations
A crappy thing about uTorrent's interface is that it doesn't let you tell it to look for a certain peer (so the bit-smuggler client can't tell its BitTorrent process to just look for the bit-smuggler server's BitTorrent process in a big swarm). So who you connect to in a swarm is arbitrary.
A solution could be to intentionally join seeder-only swarms that just sit there.
If you don't want to break file integrity for the other peers, you'd better have a copy of the file whose swarm you are joining ahead of time. But this is necessary with the current design as well.
Any suggestions/comments are very welcome. It seems to me that BitTorrent is very hard to tame compared to, let's say, HTTP as a cover between a server and a client, so this might be a serious limitation for the project.