[tor-dev] Prototype Primitives for Website Traffic Fingerprinting defenses

Mike Perry mikeperry at torproject.org
Tue Jul 29 23:57:07 UTC 2014


I've inserted a couple of corrections and clarifications below. I'm
leaving original text fully quoted for the archive.

Mike Perry:
> Hello all,
> 
> What follows is a summary of the primitives that Marc Juarez aims to
> implement for his Google Summer of Code project on prototyping defenses
> for Website Traffic Fingerprinting and follow-on research.
> 
> After a review of Tamaraw[1], Adaptive Padding[2], CS-BuFLO[3], and the
> Supersequence[4] work, as well as some discussion at the Tor dev
> meeting, we came up with this list of padding message primitives.
> 
> These functions are meant to be sent as command cells from one endpoint
> to the other. Some are bidirectional commands, others can only be sent
> in one direction (client to relay, or relay to client).
> 
> For now, they will be implemented as ad-hoc messages between two
> modified obfsproxy (called wfpadtools) instances. The source code for
> this fork lives here for the moment:
> https://bitbucket.org/mjuarezm/obfsproxy-wfpadtools/
> 
> Long term, I am thinking that all of these should be specified as if
> they were their own RELAY_* cell type from tor-spec.txt:
> https://gitweb.torproject.org/torspec.git/blob/HEAD:/tor-spec.txt#l1215
> 
> This means that padding would be a circuit-level property, and it would
> be possible to send it to and from any hop in the circuit, due to
> leaky-pipe topology (using the "recognized" field). In a world of very
> cheap and excessive middle and Guard node bandwidth, we would run this
> padding to the middle node. For the wfpadtools prototype, it will
> obviously only cover the first hop.
>
> Here's the list of primitives, broken down by research paper:
> 
> Generic Messages (common to all defenses):
>  RELAY_IGNORE()
>    Description:
>      Simple fixed-length (CELL_SIZE) padding cell.
>    Direction:
>      Bidirectional (client to relay, relay to client)
>    Parameters:
>      None
> 
>  RELAY_SEND_PADDING(N,t)
>    Description:
>      Send the requested number of padding cells in response.
>    Direction:
>      Client to relay
>    Parameters:
>      N:
>        Number of padding cells to send in response to this cell
>      t:
>        microseconds delay before sending
> 
>  RELAY_APP_HINT(session_id, status)
>    Description:
>      A hint from the application layer for session start/stop
>    Direction:
>      Client to relay
>    Parameters:
>      session_id:
>        A string identifying the session (ie keyed hash of
>        url bar domain)
>      status:
>        "Start" or "Stop", indicating session start and end.
> 
> 
> Adaptive Padding Messages:
>  RELAY_BURST_HISTOGRAM(histogram[], labels_ms[], remove_toks, fuzz, when)
>    Description:
>      Specifies a histogram that encodes a delay distribution
>      representing the probability of sending a single padding packet
>      after a given delay in response to either an upstream cell, or a
>      client-originating cell.
>    Direction:
>      Client to relay
>    Parameters:
>      histogram[]:
>        Contains delay distribution of sending an IGNORE packet
>        after sending a real packet
>      labels_ms[]:
>        millisecond labels for the bins (with "Infinity"/"Ignore" bin to
>        allow encoding the probability of not sending any padding packet in
>        response to this packet).

In both of the Adaptive Padding primitives, a considerably more compact
form is possible for bin labeling. The original Adaptive Padding
literature used a 2^K exponential spacing between these bins, but didn't
rigorously analyze the choice. If experimentation finds this is indeed
reasonable/optimal, we can send a single parameter W instead of a
labels_ms. W would be bin width in milliseconds, and L is the length of
the histogram[] array. The labels would then be computed as W*2^K with K
ranging from 0 to L-2, with the addition of a zero-delay bin label.

For the prototype though, I think we should allow the labeling to remain
a free-form array, for maximum flexibility in research. We may decide we
want some form of polynomial spacing instead of exponential here, for
example.

>      remove_toks:
>        If true, follow Adaptive Padding token removal rules.
>        If false, histograms are immutable.
>      fuzz:
>        If true, randomize the delay uniformly between bin labels.
>        If false, use bin labels as exact delay values.

"fuzz" here might be better called "interpolate" in both this and the
below primitive, since linear interpolation is what I mean here.

I suppose we could imagine smoother types of interpolation, like
quadratic, cubic, etc, but I don't expect the actual interpolation
mechanism to matter a whole lot. I expect it to be most useful for
preventing the adversary from subtracting the padding due to its arrival
at a fixed delay quantity after another packet.

>      when:
>        If set to "receive", this histogram governs the probably of sending
>        a padding packet after some delay in response to a packet
>        originating from. If set to "send", this histogram governs padding
                         ^
                         | insert "the client" after that "from"

>        packets that are transmitted after a packet arrives from upstream
>        (the middle node). In both cases, the padding packet is sent in the
>        direction of the client.
> 
>  RELAY_GAP_HISTOGRAM(histogram[], labels_ms[], remove_toks, fuzz, when)
>    Description:
>      Specifies a histogram that encodes a delay distribution
>      representing the probability of sending a single additional padding
>      packet after a given delay following a padding packet that
>      originated at this hop. 
>    Direction:
>      Client to relay
>    Parameters:
>      histogram[]:
>        Contains delay distribution of sending an IGNORE packet after sending
>        an IGNORE packet
>      labels_ms[]:
>        millisecond labels for the bins (with "Infinity"/"Ignore" bin to
>        allow encoding the probability of not sending any padding packet in
>        response to this packet).
>      remove_toks:
>        If true, follow Adaptive Padding token removal rules.
>        If false, histograms are immutable.
>      fuzz:
>        If true, randomize the delay uniformly between bin labels.
>        If false, use bin labels as exact delay values.
>
>      when:
>        If "receive", this histogram applies to locally-inserted padding
>        packets that were initially sent in response to client-originated
>        data.  If "send", this histogram applies to packets sent in response
>        to locally-inserted padding packets sent in response to upstream
>        data. Note that this means that implementations must maintain this
>        metadata as internal state as the system transitions from
>        BURST_HISTOGRAM initiated padding into GAP_HISTOGRAM initiated
>        padding. In both cases, the padding packet is sent in the direction
>        of the client.
>
> CS-BuFLO Messages:
>  RELAY_TOTAL_PAD(session_id, t):
>    Description:
>      Requests that endpoint pad all batches to nearest 2^K cells total
>      or until APP_HINT(session_id, stop)
>    Direction:
>      Client to relay
>    Parameters:
>      session_id:
>        The session ID from APP_HINT()
>      t:
>        The number of microseconds to wait between cells to consider them
>        part of the same batch.
> 
>  RELAY_PAYLOAD_PAD(session_id, t, R):
>    XXX: The paper was not clear enough for me to understand this primitive
> 
> 
> Tamaraw:
>  RELAY_BATCH_PAD(session_id, L, t)
>    Description:
>      Requests that endpoint pad all batches of cells to L cells total
>      or until APP_HINT(session_id, stop)
>    Direction:
>      Client to relay
>    Parameters:
>      session_id:
>        The session ID from APP_HINT()
>      L:
>        The multiple of cells to pad to.
>      t:
>        The number of microseconds to wait between cells to consider them
>        part of the same batch.
> 
> 
> 
> Comments, questions, and suggestions are welcome! In particular, I'm not
> sure I got either the CS-Buflo or the tamaraw primitives completely
> correct.
>
> 
> 
> 
> 1. http://cacr.uwaterloo.ca/techreports/2013/cacr2013-30.pdf
> 2. http://freehaven.net/anonbib/cache/ShWa-Timing06.pdf
> 3. http://arxiv.org/abs/1401.6022
> 4. http://cacr.uwaterloo.ca/techreports/2014/cacr2014-05.pdf
> 
> 
> -- 
> Mike Perry



> _______________________________________________
> tor-dev mailing list
> tor-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


-- 
Mike Perry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Digital signature
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20140729/02ddc443/attachment.sig>


More information about the tor-dev mailing list