Re: [tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

21 May 2019

On 16 May (14:20:05), George Kadianakis wrote:

Hello!
...
4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
In this section we give an overview of what circuit construction looks like
   to a network or guard-level adversary. We use this knowledge to make the
   right padding machines that can make intro and rend circuits look like these
   general circuits.
In particular, most general Tor circuits used to surf the web or download
   directory information start with the following 6-cell relay cell sequence (cells
   surrounded in [brackets] are outgoing, the others are incoming):
[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
When this is done, the client has established a 3-hop circuit and also
   opened a stream to the other end. Usually after this comes a series of DATA
   cells that either fetch pages, establish an SSL connection or fetch
   directory information:
[DATA] -> [DATA] -> DATA -> DATA
The above stream of 10 relay cells defines the grand majority of general
   circuits that come out of Tor browser during our testing, and it's what we
   are gonna use to make introduction and rendezvous circuits blend in.
Since "either fetches pages,..." is in the description, I'm confused about how
only 2 DATA cells in each direction can be the grand majority.

A simple "wget torproject.org" gives me an index.html of 16 KB, meaning at least
32 DATA cells. Even a directory fetch can't be only 2 DATA cells, can it?
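
As a rough check on that estimate, here is a minimal Python sketch. It assumes
each relay DATA cell carries at most 498 bytes of stream data (509-byte cell
payload minus the 11-byte relay header) and ignores HTTP headers and TLS
overhead. The function name is made up for illustration.

```python
import math

# Assumption: a relay DATA cell carries at most 498 bytes of stream data
# (509-byte cell payload minus the 11-byte relay header).
RELAY_PAYLOAD_SIZE = 498


def min_data_cells(body_bytes: int) -> int:
    """Lower bound on the DATA cells needed to carry body_bytes of stream data."""
    return math.ceil(body_bytes / RELAY_PAYLOAD_SIZE)


if __name__ == "__main__":
    # A 16 KB index.html, as in the wget example above.
    print(min_data_cells(16 * 1024))
```

Under these assumptions a 16 KB body needs 33 cells, which is consistent with
the "at least 32" figure and far more than 2.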

Is the idea that "there will always be a minimum of 2 DATA cells each way", so
you want to match that for HS client circuits and then send a bunch of
padding to match whatever comes next on a general circuit, but "at least we'll
have 10 cells like any other circuit"?
...
5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
These two machines are meant to hide client-side introduction circuits. The
   origin-side machine sits on the client and sends padding towards the
   introduction circuit, whereas the relay-side machine sits on the middle-hop
   (second hop of the circuit) and sends padding towards the client. The
   padding from the origin-side machine terminates at the middle-hop and does
   not get forwarded to the actual introduction point.
Both of these machines only get activated for introduction circuits, and
   only after an INTRODUCE1 cell has been sent out.
This means that before the machine gets activated our cell flow looks like this:
[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
Comparing the above with section [CIRCCONSTRUCTION], we see that the above
   cell sequence matches the one from general circuits up to the first 7 cells.
However, in normal introduction circuits this is followed by an
   INTRODUCE_ACK and then the circuit gets torn down, which does not match
   the sequence from [CIRCCONSTRUCTION].
Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
   send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
   cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
   sequence up to the first 10 cells.
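
The 7-cell and 10-cell claims can be checked with a small Python sketch. It
assumes the adversary sees only the direction of each cell (the cell types are
encrypted), and it uses the [brackets] = outgoing convention from
[CIRCCONSTRUCTION]. The helper names are hypothetical.

```python
def directions(cells):
    """Map cell names to 'out'/'in' using the [brackets] = outgoing convention."""
    return ["out" if c.startswith("[") else "in" for c in cells]


def common_prefix_len(a, b):
    """Number of leading positions where the two direction sequences agree."""
    n = 0
    for x, y in zip(directions(a), directions(b)):
        if x != y:
            break
        n += 1
    return n


# General circuit sequence from [CIRCCONSTRUCTION].
GENERAL = ["[EXTEND2]", "EXTENDED2", "[EXTEND2]", "EXTENDED2",
           "[BEGIN]", "CONNECTED", "[DATA]", "[DATA]", "DATA", "DATA"]

# Introduction circuit up to and including [INTRODUCE1].
INTRO_SETUP = ["[EXTEND2]", "EXTENDED2"] * 3 + ["[INTRODUCE1]"]

# Without the machine: the ack comes straight back.
INTRO_UNPADDED = INTRO_SETUP + ["INTRODUCE_ACK"]

# With the machine: padding negotiation is interleaved before the ack.
INTRO_PADDED = INTRO_SETUP + ["[PADDING_NEGOTIATE]",
                              "PADDING_NEGOTIATED", "INTRODUCE_ACK"]

if __name__ == "__main__":
    print(common_prefix_len(GENERAL, INTRO_UNPADDED))
    print(common_prefix_len(GENERAL, INTRO_PADDED))
```

In this model the unpadded circuit diverges at the 8th cell, while the padded
one agrees in direction for all 10 cells.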
After that, we continue sending padding from the relay-side machine so as to
   fake a directory download, or an SSL connection setup. We also want to
   continue sending padding so that the connection stays up longer to destroy
   the "Duration of Activity" fingerprint.
I've looked at the implementation quickly, and these DROP cells aren't
accounted for in our circuit flow control. That means there will be a
difference between a "real" DATA circuit and a circuit being sent PADDING in
order to look like the former: the flow control cell(s) (SENDME) coming back
from the endpoint that is receiving the data.

In other words, one circuit (the padded one) will have only a long stream of
cells going in one direction and the second circuit (with legit data) will
have that long stream but now and then a cell coming back down the circuit.

I believe this is quite the distinguisher between any circuit seeing much
padding and one that doesn't? :S
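
To illustrate the asymmetry, here is a deliberately simplified Python model,
not Tor's implementation. It assumes the receiver sends one circuit-level
SENDME per 100 counted cells, that DROP cells are not counted (as described
above), and it ignores stream-level SENDMEs entirely.

```python
# Assumptions (simplified model, not Tor's code):
# - one circuit-level SENDME is sent back per 100 counted cells;
# - only DATA cells are counted, DROP (padding) cells are not.
CIRCUIT_SENDME_INCREMENT = 100


def sendmes_returned(cells):
    """Count circuit-level SENDMEs triggered by a list of received cell types."""
    counted = sum(1 for c in cells if c == "DATA")
    return counted // CIRCUIT_SENDME_INCREMENT


if __name__ == "__main__":
    real = ["DATA"] * 300    # legit download
    padded = ["DROP"] * 300  # padding pretending to be a download
    print(sendmes_returned(real), sendmes_returned(padded))
```

The legit circuit produces cells in the reverse direction and the padded one
produces none, which is the distinguisher described above.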
...
To calculate the padding overhead, we see that the origin-side machine just
   sends a single [PADDING_NEGOATIATE] cell, whereas the relay-side machine
Typo here: "PADDING_NEGOATIATE" should be "PADDING_NEGOTIATE".
...
sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means
   that the average overhead of this machine is 11 padding cells.
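
The arithmetic behind that average can be sketched as follows. The text does
not say how the DROP count is distributed, so this assumes it is uniform over
7..10 inclusive. The function name is hypothetical.

```python
from statistics import mean


def average_overhead(min_drop: int, max_drop: int) -> float:
    """Average padding cells per circuit, assuming a uniform DROP count."""
    negotiate = 1    # one [PADDING_NEGOTIATE] from the client
    negotiated = 1   # one PADDING_NEGOTIATED from the relay
    drops = mean(range(min_drop, max_drop + 1))
    return negotiate + negotiated + drops


if __name__ == "__main__":
    print(average_overhead(7, 10))
```

Under the uniform assumption this gives 10.5, which rounds to the 11 cells
quoted above.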
In terms of WTF-PAD terminology, these machines have three states (START,
   OBF, END). They move from the START to OBF state when the first
   non-padding cell is received on the circuit, and they stay in the OBF
   state until all the padding gets depleted. The OBF state is controlled by
   a histogram which specifies the parameters described in the paragraphs
   above. After all the padding finishes, it moves to END state.
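
The three-state behaviour can be sketched as a toy Python model. This is not
the circuit padding framework's API: the class and method names are
hypothetical, and a simple padding_budget counter stands in for the histogram
that controls the OBF state.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    OBF = auto()
    END = auto()


class PaddingMachine:
    """Toy model; padding_budget stands in for the histogram-driven padding."""

    def __init__(self, padding_budget: int):
        self.state = State.START
        self.remaining = padding_budget

    def on_nonpadding_received(self) -> None:
        # START -> OBF on the first non-padding cell; later cells change nothing.
        if self.state is State.START:
            self.state = State.OBF if self.remaining > 0 else State.END

    def send_padding(self) -> bool:
        """Send one padding cell if in OBF; return True if a cell was sent."""
        if self.state is not State.OBF:
            return False
        self.remaining -= 1
        if self.remaining == 0:
            # All padding depleted: OBF -> END.
            self.state = State.END
        return True


if __name__ == "__main__":
    m = PaddingMachine(padding_budget=8)
    m.on_nonpadding_received()
    sent = 0
    while m.send_padding():
        sent += 1
    print(m.state, sent)
```

The machine sends nothing in START, pads while in OBF, and stops for good once
it reaches END.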
We also set a special WTF-PAD flag which keeps the circuit open even after
   the introduction is performed. In particular, with this feature the circuit
   will stay alive for the same duration as normal web circuits before they
   expire (usually 10 minutes).
I would make sure that the implementation here flags the circuit "Unusable"
after an introduction. If a client just re-picks it to introduce again
(let's say a second SOCKS connection with a different user/pass), the intro
point will immediately tear it down, rendering this "keep open" feature a bit
pointless :(.

Cheers!
David

-- 
RvcA5t4gf8ZVGWkeAH8q2YX6s5pRuadzbdJisXSBhfA=

David Goulet