[tor-bugs] #31788 [Core Tor/Tor]: Circuit padding trace simulator

Tue Oct 22 14:49:50 UTC 2019

#31788: Circuit padding trace simulator
-----------------------------------------------+------------------------
 Reporter:  mikeperry                          |          Owner:  (none)
     Type:  enhancement                        |         Status:  new
 Priority:  Medium                             |      Milestone:
Component:  Core Tor/Tor                       |        Version:
 Severity:  Normal                             |     Resolution:
 Keywords:  circpad-researchers-want, wtf-pad  |  Actual Points:
Parent ID:                                     |         Points:
 Reviewer:                                     |        Sponsor:
-----------------------------------------------+------------------------

Comment (by pulls):

 Replying to [comment:4 mikeperry]:
 > Replying to [comment:3 pulls]:
 > > The implementation now requires a client and relay trace (got a lazy
 python script to simulate a relay trace from a client trace as well). My
 biggest gripe right now is time. Every time a client/relay machine
 triggers padding, a corresponding event has to be added to the
 relay/client trace with an estimated time. This estimate will always be,
 well, wrong. I'm not sure it's possible to make this estimate in such a
 way that it'll fool time-based classifiers, even if we add in guard traces
 for better estimates and patches as you mention Mike.
 >
 > To be clear: I believe this simulator will only be accurate enough to do
 preliminary tuning of defenses against attacks, especially for expensive
 classifiers. I think final attack and defense evaluation, and possibly
 even some final tuning, should be done on the live network. At least until
 we discover that for all of our tested attack+defense combinations, the
 live network and the simulator agree.
 >
 > What do you mean by "wrong" though? We should try to make the simulator
 as close as possible. We are aware of the circuitmux problem, as well as
 delay introduced by libevent callbacks. These are both paths we hope we
 can optimize, though. Are there others?

 Agreed, we're on the same page. By "wrong" I mean, in addition to what you
 mentioned inside of tor, everything between traffic leaving tor at the
 client/relay and arriving in tor at the relay/client: basically the
 Internet! ;) We can never accurately capture this in a simulation right
 now; that's more something for Shadow++ to strive for.

 > > Right now I think it might be best as a starting point to just try to
 use the simulator to find optimal machines against attacks like Deep
 Fingerprinting that ignores time. Once we have a better understanding of
 how feasible and costly that is we can look more closely at how time
 changes things.
 >
 > Do you mean ignores the time deltas between the client/middles and the
 guard?

 Ignores time completely (beyond the ordering of cells and their
 directions). Deep Fingerprinting, like Wang's kNN and so on, operates on
 pure cell/packet traces:

 1
 1
 -1
 -1

 Etc., no time there at all. The simulator cannot get time exactly right,
 and some nasty deep learning machinery can likely pick up on padding
 cells being sampled from some non-real distribution given enough samples.
 So a first step is to get the simulation to produce correct cell traces
 with high probability. Finding (reasonably efficient) machines that can
 defend against attacks operating only on cells would be an awesome start
 and hopefully teach us a lot.
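
 To make the time-free representation concrete, here is a minimal sketch
 that reduces a timestamped trace to a pure direction sequence like the one
 above. The (time, event) tuple format, the event names, and the function
 to_directions are hypothetical, and the convention that 1 means a cell
 sent by the client and -1 a cell received is an assumption.

```python
def to_directions(trace):
    """Reduce a trace of (time, event_name) tuples to a list of +1/-1.

    Timestamps are used only to order the cells and are then discarded.
    Padding and non-padding cells map to the same value, since a
    classifier working on directions cannot tell them apart.
    """
    directions = []
    for _, event in sorted(trace, key=lambda entry: entry[0]):
        if event.endswith("_sent"):
            directions.append(1)
        elif event.endswith("_received"):
            directions.append(-1)
        # Any other event type is not a cell and is skipped.
    return directions


client_trace = [
    (0, "nonpadding_sent"),
    (120, "padding_sent"),
    (900, "nonpadding_received"),
    (950, "padding_received"),
]
example = to_directions(client_trace)  # [1, 1, -1, -1]
```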

 > > Any thoughts on this? Have I missed some other reason than time
 estimates for including guard traces?
 >
 > Well, I have always assumed that the most realistic adversary for these
 attacks is one that runs them from inside the Tor network, where they have
 much higher resolution over circuit construction and usage, and have full
 circuit multiplexing information.
 >
 > We can simulate such an adversary by looking at client traces, or guard
 TLS traces, I suppose.

 Thanks for clarifying, makes sense. With access to the client and
 (padding) relay traces, would it be possible to get closer to an "ideal"
 trace for an attacker (not necessarily the most realistic, but stronger)?
 For example, observing the exact time that cells are sent at the client
 and the exact time that cells are forwarded to the client at the relay
 seems close to ideal, right? That way you minimize the network noise.
 Padding machines that can defend against such an attacker should be able
 to deal with attackers with noisier traces.
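
 One way to picture such an "ideal" trace is sketched below: take send
 times for outgoing cells from the client trace and send times for
 incoming cells from the relay trace, then merge them by time. This
 assumes both traces share a clock and that the relay trace only records
 cells on the client-facing side. The trace format, event names, and the
 function ideal_attacker_trace are hypothetical.

```python
import heapq


def ideal_attacker_trace(client_trace, relay_trace):
    """Merge send-side timestamps from both ends into one attacker view.

    Both inputs are lists of (time, event_name) tuples (hypothetical
    format). Each cell is timestamped where it was sent, so network delay
    between client and relay never enters the result.
    """
    # Outgoing cells: observed at the moment the client sends them.
    outgoing = sorted((t, 1) for t, ev in client_trace if ev.endswith("_sent"))
    # Incoming cells: observed at the moment the relay forwards them.
    incoming = sorted((t, -1) for t, ev in relay_trace if ev.endswith("_sent"))
    return list(heapq.merge(outgoing, incoming))


client_trace = [(0, "nonpadding_sent"), (400, "nonpadding_received")]
relay_trace = [
    (50, "nonpadding_received"),
    (350, "nonpadding_sent"),
    (360, "padding_sent"),
]
ideal = ideal_attacker_trace(client_trace, relay_trace)
```

 Receive-side events are dropped on purpose: they carry the network noise
 that this attacker view is meant to avoid.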

 > > Also, if some other researcher working on this wants to collaborate
 please reach out.
 >
 > I now have some time to help with this a bit for the next couple weeks.
 Can you put your work in a branch on github?

 Awesome, it's here: https://github.com/pylls/circpad-sim . Added you as
 a collaborator. Updated the README with the current state of things. Going
 to work on input and output next so that I can clean up some of the debug
 code. Will continue to work actively on it now, apart from a trip
 Sunday-Wednesday.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/31788#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online