[tor-commits] [tech-reports/master] Add morpher sources.

Wed Aug 22 18:28:38 UTC 2012

commit e1eb25efc0eef9ed51cb23fd81d70a2bd655a4f4
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date:   Mon Aug 20 19:55:58 2012 +0200

    Add morpher sources.
    
    https://trac.torproject.org/projects/tor/attachment/ticket/5023/morpher_2.tex
---
 2012/morpher/.gitignore    |    3 +
 2012/morpher/500000_cs.pdf |  Bin 0 -> 27094 bytes
 2012/morpher/500000_sc.pdf |  Bin 0 -> 25864 bytes
 2012/morpher/https_cs.pdf  |  Bin 0 -> 20287 bytes
 2012/morpher/morpher.tex   |  298 ++++++++++++++++++++++++++++++++++++++++++++
 2012/morpher/tor_cs.pdf    |  Bin 0 -> 12335 bytes
 6 files changed, 301 insertions(+), 0 deletions(-)

diff --git a/2012/morpher/.gitignore b/2012/morpher/.gitignore
new file mode 100644
index 0000000..5f59b41
--- /dev/null
+++ b/2012/morpher/.gitignore
@@ -0,0 +1,3 @@
+morpher.pdf
+morpher-2012-03-13.pdf
+
diff --git a/2012/morpher/500000_cs.pdf b/2012/morpher/500000_cs.pdf
new file mode 100644
index 0000000..dd7abd0
Binary files /dev/null and b/2012/morpher/500000_cs.pdf differ
diff --git a/2012/morpher/500000_sc.pdf b/2012/morpher/500000_sc.pdf
new file mode 100644
index 0000000..fe954fe
Binary files /dev/null and b/2012/morpher/500000_sc.pdf differ
diff --git a/2012/morpher/https_cs.pdf b/2012/morpher/https_cs.pdf
new file mode 100644
index 0000000..b4e5d55
Binary files /dev/null and b/2012/morpher/https_cs.pdf differ
diff --git a/2012/morpher/morpher.tex b/2012/morpher/morpher.tex
new file mode 100644
index 0000000..8fd4935
--- /dev/null
+++ b/2012/morpher/morpher.tex
@@ -0,0 +1,298 @@
+\documentclass[]{article}
+\usepackage{graphicx}
+
+\usepackage[parfill]{parskip}
+\begin{document}
+
+\title{Packet Size Pluggable Transport and Traffic Morphing}
+\author{George Kadianakis}
+\maketitle
+
+\section{Traffic Morphing}
+
+\subsection{Overview}
+
+It is well known \cite{herrmann} that Tor traffic can be distinguished
+from other network protocols by its distinctive packet size. Due to the
+sized Tor cells, most TCP packets of Tor traffic are 586 bytes in size
+(See Figure \ref{tor_cs.pdf}).
+
+\begin{figure}[h]
+\begin{center}
+\includegraphics[scale=0.65]{tor_cs.pdf}
+\end{center}
+\caption{Packet size probability distribution of Tor Client-to-Server traffic \cite{caida}}
+\label{tor_cs.pdf}
+\end{figure}
+
+On the other hand, HTTPS, the protocol that Tor tries to simulate
+\cite{tls_norm}, has a much more spread out packet size probability
+distribution (See Figure \ref{https_cs.pdf})
+
+\begin{figure}[h]
+\begin{center}
+\includegraphics[scale=0.65]{https_cs.pdf}
+\end{center}
+\caption{Packet size probability distribution of HTTPS Client-to-Server traffic \cite{caida}}
+\label{https_cs.pdf}
+\end{figure}
+
+This means that an adversary can detect Tor traffic by using packets
+of size \textit{586} as distinguishers \cite{ll} \cite{herrmann}.
+
+\subsection{Solutions}
+
+An obvious solution to this problem is to introduce a padding scheme
+to Tor.
+
+Some network protocols already use padding to defend against traffic
+fingerprinting attacks. SSH and TLS, for example, both support padding
+in their messages \cite{ssh} \cite{gnutls}. Most implementations of
+those protocols don't pad by default. The ones that do, add a random
+amount of padding to the protocol message.
+
+The Tor protocol also supports padding cells (named \textit{PADDING}
+and \textit{VPADDING}) which attempt to generate covert traffic.
+
+While the random padding solution effectively hides the \textit{586}
+bytes identifier, it does not fully make Tor traffic resemble HTTPS
+traffic. This means that a sophisticated attacker can distinguish Tor
+traffic, by searching for SSL handshakes that try to look like HTTPS
+but actually follow a packet size probability distribution different
+from the one of HTTPS.
+
+A solution to that, is to collect a large amount of HTTPS packets and
+form their packet size probability distribution. We can then randomly
+sample that probability distribution for every packet we need to
+send. Specifically, for every Tor packet, we would sample the HTTPS
+probability distribution, find a target packet size and split or pad
+our packet accordingly. We call this method \textit{random sampling}.
+
+The problem with random sampling, is that it radically increases our
+bandwidth overhead.
+
+To reduce this overhead, we looked into a paper of Charles Wright,
+Scott Coulls and Fabian Monrose called \emph{Traffic Morphing: An
+  efficient defense against statistical traffic analysis} \cite{tm}.
+
+\subsection{Traffic Morphing}
+
+Traffic Morphing minimizes bandwidth overhead when transforming
+packets from one packet size probability distribution to another. That
+is, given two packet size probability distributions, one for the
+source protocol and another for the target protocol, it produces a
+\emph{morphing matrix} that acts as an oracle and describes how to
+efficiently morph packets between those two probability distributions.
+
+Traffic Morphing works by modeling the problem of transforming packets
+as a problem of linear programming.  To construct the morphing matrix,
+we consider \textit{bandwidth overhead} as the objective function of
+our linear program, and use appropriate problem constraints that
+transform the source probability distribution to the target
+probability distribution.
+
+\section{Woes of Traffic Morphing}
+
+\subsection{Issues}
+
+Unfortunately, morphing matrices are not a panacea, for the following
+reasons:
+
+\subsubsection{Statelessnes}
+
+Morphing matrices are not stateful. In other words, morphing matrices
+don't handle protocols whose packet size probability distribution
+changes through the protocol runtime.
+
+For example, HTTPS follows different packet size probability
+distributions between the handshake and the data phase, and morphing
+matrices are unable to understand the difference.
+
+\subsubsection{Splitting}
+
+The original traffic morphing paper did not specify an algorithm for
+splitting packets: When a morphing matrix suggests a packet size
+\textbf{smaller} than the original packet size, we have to split the
+original packet into two parts. The problem is that the original paper
+does not clearly define what should be done with the second half of
+the split packet.
+
+Sending the second half of the packet to the network is not correct,
+because its packet size does not belong to the packet size probability
+distribution of the target protocol (specifically, it belongs to the
+packet size probability distribution of the target protocol when split
+once by its morphing matrix).
+
+Querying the morphing matrix again, for the size of the second half of
+the split packet, is also not correct, since that packet size does
+not belong to the probability distribution of the source protocol and
+morphing matrices mishandle unknown packet sizes.
+
+Even though the original paper didn't specify the splitting algorithm
+that should be used, it hinted that doing a random sampling of the
+target probability distribution for the length of the split packet
+is what they did in their implementation.
+
+\subsection{Evaluation of issues}
+
+Unfortunately, the splitting problem of the previous section is not
+only theoretical. To evaluate the bandwidth overhead of morphing
+matrices, we made software \cite{morpher_eval} that simulates the
+morphing of a large number of packets. Specifically, our software
+morphs 500000 packets using both random sampling and traffic morphing,
+and plots the overhead. (In traffic morphing, we use random sampling
+for the second parts of a split packet.)
+
+We can see that in the case of Server-to-Client Tor-to-HTTPS packet
+morphing, morphing matrices actually reduce the bandwidth overhead by
+a substantial amount (See Figure \ref{0.5m_sc}).
+
+\begin{figure}[h]
+\begin{center}
+\includegraphics[scale=0.65]{500000_sc.pdf}
+\end{center}
+\caption{Bandwidth overhead for 0.5 million Server-to-Client packets}
+\label{0.5m_sc}
+\end{figure}
+
+However, in the case of Client-to-Server Tor-to-HTTPS packet size
+morphing, the bandwidth overhead of morphing matrices is actually
+larger than the bandwidth overhead of random sampling (See Figure
+\ref{0.5mcs}).
+\begin{figure}[h]
+\begin{center}
+\includegraphics[scale=0.65]{500000_cs.pdf}
+\end{center}
+\caption{Bandwidth overhead for 0.5 million Client-to-server packets}
+\label{0.5mcs}
+\end{figure}
+
+Our guess is that this happens because in the Client-to-Server case,
+HTTPS has large probabilities of outputting small packet sizes (See
+Figure \ref{https_cs.pdf}).
+
+This means that our morphing matrix tries to split our packets into
+small packets, and then when random sampling is used for the second
+half of our packet it tries to pad it to 1460 (which is also a very
+popular packet size in HTTPS):
+
+For example:
+
+\begin{verbatim}
+Packet size 586. We must morph it to 127. Splitting to 127+459 and sending 127.
+Packet size 459. We must morph it to 123. Splitting to 123+336 and sending 123.
+Packet size 336. We must morph it to 1459. Padding with 1123 and sending.
+\end{verbatim}
+
+In this case, morpher gets a packet of 586 bytes to morph. It queries
+the morphing matrix which suggests a packet size of 127 bytes. Morpher
+splits the original packet to 127+459 bytes, sends 127 bytes to the
+network, and is left with 459 bytes. Since the packet was split, our
+faulty splitting algorithm says that morpher should do a direct
+sampling over the HTTPS probability distribution. Morpher gets 123
+bytes as the result of random sampling, and splits the packet to
+123+336 bytes. It sends 123 bytes to the network, and is left with 459
+bytes. Morpher again does random sampling over the HTTPS probability
+distribution, and gets 1459 bytes as the result. This means that it
+must pad the 336 bytes packet to 1459 bytes. This results in an
+overhead of 1123 bytes.
+
+For the same case, random sampling was better due to the randomness of
+random sampling.
+
+\subsection{Fixes}
+
+A potential fix for the statelessness of morphing matrices, is to
+generate multiple morphing matrices for each phase of the protocol. We
+have not attempted this approach.
+
+A potential fix for the splitting algorithm issue, is to generate a
+morphing matrix for every split level (since the number of splits that
+can happen to a packet are finite), and use the appropriate morphing
+matrix everytime we have a split packet. We have not implemented or
+evaluated this fix yet.
+
+\section{Future}
+
+For the short-term future, we decided to postpone the development of
+pluggable transports based on Traffic Morphing. We think that more
+research is needed, as well as a stronger focus towards realistic
+adversaries.
+
+Specifically, we believe that if we investigate the strength and
+limitations of modern Deep Packet Inspection kits, we can provide
+adequate defences without the implementation complexity and drawbacks
+of the Traffic Morphing technology.
+
+We plan on investigating packet size pluggable transports with less
+overhead, even if they don't deliver a perfect probability
+distribution match to the target protocol, since we believe that such
+pluggable transports can effectively mitigate many practical
+packet-size-based detection attacks.
+
+At the same time, we also think it's important to remain up-to-date
+with modern academic research. For example, modern website
+fingerprinting research has showed that any kind of padding scheme is
+insufficient to protect against sophisticated packet size
+distinguishers \cite{ll} \cite{herrmann}. Furthermore, the academic
+community has found other network traffic features that can be used as
+distinguishers with surprisingly high accuracy \cite{panchenko}.
+
+Meanwhile, we are also researching alternative pluggable transports
+with better resistance against payload fingerprinting, which we think
+real-life attackers are likely to do, and also pluggable transports
+which can get through HTTP proxy servers.
+
+\section{Acknowledgments}
+
+Thanks to Steven J. Murdoch and the Tor Project for the fruitful
+conversations on packet size pluggable transports.
+
+\bibliographystyle{plain}
+\begin{thebibliography}{9}
+
+\bibitem{ssh} https://www.ietf.org/rfc/rfc4344.txt
+
+\bibitem{gnutls} https://www.gnu.org/software/gnutls/manual/gnutls.html\#On-Record-Padding
+
+\bibitem{ll}
+  M. Liberatore, B. N. Levine, \textit{Inferring the Source
+  of Encrypted HTTP Connections}, CCS2006, October 2006.
+
+\bibitem{herrmann}
+  Dominik Herrmann, Rolf Wendolsky, and Hannes
+  Federrath. 2009. Website fingerprinting: attacking popular privacy
+  enhancing technologies with the multinomial naÃ¯ve-bayes classifier.
+
+\bibitem{tls_norm}
+  http://archives.seul.org/or/dev/Jan-2011/msg00029.html
+
+\bibitem{morpher_eval}
+  \textit{Morpher pluggable transport: Select algorithm for packet size morphing}
+
+  https://trac.torproject.org/projects/tor/ticket/5023
+  http://archives.seul.org/or/dev/Jan-2011/msg00029.html
+
+\bibitem{panchenko}
+  Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas
+  Engel. 2011. \textit{Website fingerprinting in onion routing based
+  anonymization networks}.
+
+\bibitem{tm}
+  Charles Wright, Scott Coulls , Fabian Monrose. \textit{Traffic
+  Morphing: An efficient defense against statistical traffic
+  analysis.} In Proceedings of the 14th Annual Network and
+  Distributed Systems Symposium (NDSS), Feb, 2009.
+
+\bibitem{caida}
+  The packet length probability distributions were formed after
+  analysis of traffic traces kindly provided by CAIDA.  The traffic
+  traces were captured by monitoring an Equinix datacenter in Chicago,
+  IL.
+
+  https://gitorious.org/morpher/morpher/blobs/master/ACKNOWLEDGMENTS
+  https://gitorious.org/morpher/morpher/trees/master/data
+
+\end{thebibliography}
+
+\end{document}
diff --git a/2012/morpher/tor_cs.pdf b/2012/morpher/tor_cs.pdf
new file mode 100644
index 0000000..5fe6483
Binary files /dev/null and b/2012/morpher/tor_cs.pdf differ