Michael Rogers:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On 25/11/14 12:45, George Kadianakis wrote:
Yes, integrating low-latency with high-latency anonymity is a very interesting probleml. Unfortunately, I haven't had any time to think about it.
For people who want to think about it there is the "Blending different latency traffic with alpha-mixing" paper. Roger mentioned that one of the big challenges of making the paper usable with Tor, is switching from the message-based approach to stream-based.
Other potential papers are "Stop-and-Go-MIX" by Kesdogan et al. and "Garbled Routing (GR): A generic framework towards unification of anonymous communication systems" by Madani et al. But I haven't looked into them at all...
Two of these papers were also mentioned in the guardian-dev thread, so I guess we're thinking along similar lines.
Alpha mixes and stop-and-go mixes are message-oriented, which as you said raises the question of how to integrate them into Tor. Judging by the abstract of the garbled routing paper (paywalled), it's a hybrid design combining message-oriented and circuit-oriented features. I think there might also be scope for circuit-oriented designs with higher latency than Tor currently provides, which might fit more easily into the Tor architecture than message-oriented or hybrid designs.
A circuit-oriented design would aim to prevent an observer from matching the circuits entering a relay with the circuits leaving the relay. In other words it would prevent traffic confirmation at each hop, and thus also end-to-end.
At least four characteristics can be used to match circuits entering and leaving a relay: start time, end time, total traffic volume and traffic timing. The design would need to provide ways to mix a circuit with other circuits with respect to each characteristic.
The current architecture allows start and end times to be mixed by pausing at each hop while building or tearing down a circuit. However, each hop of a given circuit must start earlier and end later than the next hop.
Traffic volumes can also be mixed by discarding padding at each hop, but each hop must carry at least as much traffic as the next hop (or vice versa for traffic travelling back towards the initiator). This is analogous to the problem of messages shrinking at each hop of a cypherpunk mix network, as padding is removed but not added.
There's currently no way to conceal traffic timing - each relay forwards cells as soon as it can.
Here's a crude sketch of a design that allows all four characteristics to be mixed, with fewer constraints than the current architecture. Each hop of a circuit must start earlier than the next hop, but it can end earlier or later, carry more or less traffic, and have different traffic timing.
The basic idea is that the initiator chooses a traffic pattern for each direction of each hop. The traffic pattern is described by a distribution of inter-cell delays. Each relay sends the specified traffic pattern regardless of whether it has any data to send, and regardless of what happens at other hops.
Whenever a relay forwards a data cell along a circuit, it picks a delay from the specified distribution, adds it to the current time, and writes the result on the circuit's queue. When the scheduler round-robins over circuits, it skips any circuits with future times written on them. If a circuit's time has come, the relay sends the first queued data cell if there is one; if not, it sends a single-hop padding cell.
You might also like the "Adaptive Padding" defense: http://freehaven.net/anonbib/cache/ShWa-Timing06.pdf. It implements pretty much what you describe here. It is one of my current low-resource favorites.
Marc Juarez actually has a reference implementation of this defense as a pluggable transport: https://bitbucket.org/mjuarezm/obfsproxy-wfpadtools/
It is a generalization of the Adaptive Padding scheme, and is described here: https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/idea...
Our goal is to study this as a defense for Website Traffic Fingerprinting, but I also believe it has promise for frustrating E2E correlation.
Unfortunately, hidden services also represent the hardest form of E2E correlation - where the adversary actually gets to control when you connect, and to a large degree what you send (both in terms of amount, and traffic pattern). In this sense, it actually resembles a Global Active Adversary more than a GPA. It may even be harder than a generic GAA, because of the "adversary also decides when you send data" property.
I am unfortunately not optimistic about simple low-overhead padding being successful here. At the very least, this problem will require something more like a congestion-sensitive "always pad if there is spare capacity available to grow the CWIND, but no data". We have an approximation of this defense, too, in the form of the "Basket" pluggable transport: https://lists.torproject.org/pipermail/tor-dev/2014-December/007977.html
If the adversary is *also* capable of Guard node coercion/compromise, then we also want something like "Basket" to pad all the way to the middle node, but to accomplish that, we actually need fair end-to-end flow control for clients (and by extension, a datagram/UDP transport).
(Though also note: We should not need such end-to-end flow control to use Adaptive Padding all the way to the middle node, because we can approximate fairness using statistics and consensus load balancing information).
So the padding approach is a long, long road for hidden services. Luckily, most of it is also stuff we want for other reasons, anyway.