On 16 May (14:20:05), George Kadianakis wrote:
Hello!
4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
In this section we give an overview of what circuit construction looks like to a network or guard-level adversary. We use this knowledge to design padding machines that make intro and rend circuits look like these general circuits.
In particular, most general Tor circuits used to surf the web or download directory information start with the following sequence of 6 relay cells (cells in [brackets] are outgoing, the others are incoming):
[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
When this is done, the client has established a 3-hop circuit and also opened a stream to the other end. Usually after this comes a series of DATA cells that either fetches pages, establishes an SSL connection or fetches directory information:
[DATA] -> [DATA] -> DATA -> DATA
The above stream of 10 relay cells defines the grand majority of general circuits that come out of Tor Browser during our testing, and it's what we are going to use to make introduction and rendezvous circuits blend in.
Since the description includes "either fetches pages,...", I'm confused about how only 2 DATA cells in each direction can be the grand majority.
A simple "wget torproject.org" gives me an index.html of 16KB, meaning at least 32 DATA cells. Even a directory fetch can't be only 2 DATA cells... ?
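The estimate above can be checked with a quick calculation. This is a rough sketch that assumes the classic relay cell format with 498 usable payload bytes per DATA cell, and it ignores HTTP headers and TLS overhead; the function name is only illustrative.

```python
import math

# Assumption: usable payload bytes in one RELAY_DATA cell (classic relay cell format).
RELAY_PAYLOAD_SIZE = 498


def data_cells_needed(num_bytes: int) -> int:
    """Minimum number of DATA cells needed to carry num_bytes of stream data."""
    return math.ceil(num_bytes / RELAY_PAYLOAD_SIZE)


# A 16KB index.html, payload only.
cells_for_index = data_cells_needed(16 * 1024)
print(cells_for_index)
```

Under these assumptions a 16KB page needs a few dozen incoming DATA cells, far more than the 2 shown in the sequence.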
Or is the idea that "there will always be a minimum of 2 DATA cells both ways", so you want to match that for HS client circuits and then send a bunch of padding to match whatever comes next on a general circuit, on the grounds that "at least we'll have 10 cells like any other circuit"?
5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
These two machines are meant to hide client-side introduction circuits. The origin-side machine sits on the client and sends padding towards the introduction circuit, whereas the relay-side machine sits on the middle-hop (second hop of the circuit) and sends padding towards the client. The padding from the origin-side machine terminates at the middle-hop and does not get forwarded to the actual introduction point.
Both of these machines only get activated for introduction circuits, and only after an INTRODUCE1 cell has been sent out.
This means that before the machine gets activated our cell flow looks like this:
[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
Comparing the above with section [CIRCCONSTRUCTION], we see that the above cell sequence matches the one from general circuits up to the first 7 cells.
However, in normal introduction circuits this is followed by an INTRODUCE_ACK and then the circuit gets torn down, which does not match the sequence from [CIRCCONSTRUCTION].
Hence when our machine is used, after sending an [INTRODUCE1] cell, we also send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION] sequence up to the first 10 cells.
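The prefix-matching argument can be checked mechanically. The sketch below assumes the adversary only sees the direction of each cell (cell types are encrypted), and the list and function names are made up for illustration.

```python
OUT, IN = "out", "in"

# Direction of each cell as seen from the client, following the sequences above.
GENERAL = [OUT, IN, OUT, IN,      # EXTEND2/EXTENDED2 twice
           OUT, IN,               # BEGIN, CONNECTED
           OUT, OUT, IN, IN]      # [DATA] [DATA] DATA DATA

INTRO_PLAIN = [OUT, IN, OUT, IN, OUT, IN,  # EXTEND2/EXTENDED2 three times
               OUT,                        # INTRODUCE1
               IN]                         # INTRODUCE_ACK

INTRO_PADDED = [OUT, IN, OUT, IN, OUT, IN,  # EXTEND2/EXTENDED2 three times
                OUT, OUT,                   # INTRODUCE1, PADDING_NEGOTIATE
                IN, IN]                     # PADDING_NEGOTIATED, INTRODUCE_ACK


def matching_prefix(a, b):
    """Number of leading cells whose directions agree in both traces."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


print(matching_prefix(GENERAL, INTRO_PLAIN))   # without the machine
print(matching_prefix(GENERAL, INTRO_PADDED))  # with the machine
```

Without the machine the traces diverge at the eighth cell; with the extra negotiation cells all 10 directions line up.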
After that, we continue sending padding from the relay-side machine so as to fake a directory download, or an SSL connection setup. We also want to continue sending padding so that the connection stays up longer to destroy the "Duration of Activity" fingerprint.
I've looked at the implementation quickly, and these DROP cells aren't accounted for in our circuit flow control. That means there will be a difference between a "real" DATA circuit and a circuit being sent padding in order to look like the former: the flow control cell(s) (SENDME) coming back from the end point that is receiving the data.
In other words, one circuit (the padded one) will have only a long stream of cells going in one direction, while the other circuit (with legit data) will have that same long stream plus, now and then, a cell coming back down the circuit.
I believe this is quite a distinguisher between a circuit that sees a lot of padding and one that doesn't? :S
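A toy model of that difference is sketched below. It assumes a circuit-level SENDME is sent back for every 100 delivered cells and that DROP cells are never counted; stream-level SENDMEs are ignored, and the function name is hypothetical.

```python
# Assumption: one circuit-level SENDME per 100 cells delivered to the receiver.
CIRCWINDOW_INCREMENT = 100


def direction_trace(num_incoming, counted_by_flow_control):
    """Directions seen on a circuit receiving num_incoming cells.

    If the cells count toward flow control (real DATA), the receiver
    emits an outgoing SENDME after every CIRCWINDOW_INCREMENT cells.
    DROP padding is modelled as not counted, so nothing goes back.
    """
    trace = []
    delivered = 0
    for _ in range(num_incoming):
        trace.append("in")
        if counted_by_flow_control:
            delivered += 1
            if delivered % CIRCWINDOW_INCREMENT == 0:
                trace.append("out")  # SENDME travelling back to the sender
    return trace


real = direction_trace(250, counted_by_flow_control=True)
padded = direction_trace(250, counted_by_flow_control=False)
print(real.count("out"), padded.count("out"))
```

In this model the two traces only diverge once at least 100 cells have flowed in one direction, so the effect depends on how long the padding run is.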
To calculate the padding overhead, we see that the origin-side machine just sends a single [PADDING_NEGOATIATE] cell, whereas the relay-side machine
Typo here "PADDING_NEGOATIATE".
sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means that the average overhead of this machine is 11 padding cells.
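The overhead figure can be reproduced with a short calculation. This sketch assumes the number of DROP cells is drawn uniformly from 7 to 10 inclusive, which the text does not state; the variable names are illustrative.

```python
import math
from statistics import mean

NEGOTIATE_CELLS = 1          # the single negotiate cell sent by the client
NEGOTIATED_CELLS = 1         # the PADDING_NEGOTIATED reply
DROP_CHOICES = range(7, 11)  # 7..10 DROP cells, assumed equally likely

min_overhead = NEGOTIATE_CELLS + NEGOTIATED_CELLS + min(DROP_CHOICES)
max_overhead = NEGOTIATE_CELLS + NEGOTIATED_CELLS + max(DROP_CHOICES)
avg_overhead = NEGOTIATE_CELLS + NEGOTIATED_CELLS + mean(DROP_CHOICES)

print(min_overhead, max_overhead, avg_overhead, math.ceil(avg_overhead))
```

Under the uniform assumption the mean comes out at 10.5 cells, which rounds up to the 11 quoted above.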
In terms of WTF-PAD terminology, these machines have three states (START, OBF, END). They move from the START to OBF state when the first non-padding cell is received on the circuit, and they stay in the OBF state until all the padding gets depleted. The OBF state is controlled by a histogram which specifies the parameters described in the paragraphs above. After all the padding finishes, it moves to END state.
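The three-state behaviour described above can be sketched as a small state machine. This is only an illustration: the class and method names are invented, and a plain counter stands in for the histogram that drives the OBF state in the real machine.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    OBF = auto()
    END = auto()


class PaddingMachineSketch:
    """Toy START -> OBF -> END machine with a fixed padding budget."""

    def __init__(self, padding_budget):
        if padding_budget <= 0:
            raise ValueError("padding_budget must be positive")
        self.state = State.START
        self.padding_left = padding_budget

    def on_nonpadding_received(self):
        # The first non-padding cell on the circuit starts the obfuscation phase.
        if self.state is State.START:
            self.state = State.OBF

    def on_padding_sent(self):
        # Padding is only sent in OBF; once it is depleted the machine ends.
        if self.state is not State.OBF:
            return
        self.padding_left -= 1
        if self.padding_left == 0:
            self.state = State.END


machine = PaddingMachineSketch(padding_budget=3)
machine.on_nonpadding_received()
for _ in range(3):
    machine.on_padding_sent()
print(machine.state)
```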
We also set a special WTF-PAD flag which keeps the circuit open even after the introduction is performed. In particular, with this feature the circuit will stay alive for the same duration as normal web circuits before they expire (usually 10 minutes).
I would make sure that the implementation here flags the circuit "Unusable" after an introduction. If a client just re-picks it to introduce again (let's say for a second SOCKS connection with a different user/pass), the intro point will immediately tear it down, rendering this "keep open" feature a bit pointless :(.
Cheers! David