[tor-dev] Request for comments: patch to mark exit traffic for routing and statistical analysis

Fri Sep 23 20:02:15 UTC 2016

Hi everybody,

Unfortunately, it took a bit longer than expected, but here goes...
FWIW, after the recent update (with subsequent downtime), our exit node
is fully up and running again (including this patch) and relaying over
1 TB a day at the moment.

On 2016-09-19 at 23:36, René Mayrhofer wrote:
> On 2016-09-19 at 20:24, grarpamp wrote:
>> On Mon, Sep 19, 2016 at 9:14 AM, René Mayrhofer <rm at ins.jku.at> wrote:
>>> Setup: Please note that our setup is a bit particular for reasons that
>>> we will explain in more detail in a later message (including a proposed
>>> patch to the current source which has been pending also because of the
>>> holiday situation...). Briefly summarizing, we use a different network
>>> interface for "incoming" (Tor encrypted traffic) than for "outgoing"
>>> (mostly clearnet traffic from the exit node, but currently still
>>> includes outgoing Tor relay traffic to other nodes). The outgoing
>>> interface has the default route associated, while the incoming interface
>>> will only originate traffic in response to those incoming connections.
>>> Consequently, we let our Tor node only bind to the IP address assigned
>>> to the incoming interface 193.171.202.146, while it will initiate new
>>> outgoing connections with IP 193.171.202.150.
>> There could be further benefit / flexibility in a 'proposed patch' that
>> would allow to take the incoming ORport traffic and further split
>> it outbound by a) OutboundBindAddressInt that which is going back
>> internal to tor, and b) OutboundBindAddressExt that which is going
>> out external to clearnet. Those two would include port specification
>> for optional use on the same IP. I do not recall if this splitting is
>> currently possible.
> That is exactly what we have patched our local Tor node to do, although
> with a different (slightly hacky, so the patch will be an RFC type)
> approach by marking real exit traffic with a ToS flag to leave the
> decision of what to do with it to the next layer (in our setup Linux
> kernel based policy routing on the same host). There may be a much
> better approach to achieve this goal. I plan on writing up our setup
> (and the rationale behind it) along with the "works for me but is not
> ready for upstream inclusion" patch tomorrow.
[Slightly long description of our setup to provide sufficient context
for the patch]
Attached you will find a PDF (sorry about the image artefacts, MS Office
vs. LibreOffice, etc.) describing our rough setup. The whole setup (Tor
node(s), monitoring server, switch, firewall, and soon a webcam watching
the rack with an unfiltered live-stream publicly available) is in a
separate small server room that does not host any other hardware. We use
an IPv4 range separate from the main university network (which is the
main reason why we don't relay IPv6 yet - we still have to acquire a
separate IPv6 range so as not to impact the reputation of the main
university subnet). We are highly thankful to the Johannes Kepler
University Linz and the Austrian ACOnet for supporting this!

Ideally, we would use 2 different providers to even further
compartmentalize "incoming" (i.e. encrypted Tor network) from "outgoing"
(for our exit node, mostly clearnet) traffic and make traffic
correlation harder (doesn't help against a global adversary as we know,
but at least a single ISP would not be able to directly correlate both
sides of the relay). Although we don't have two different providers at
this point, we still use two different network interfaces with
associated IP addresses (one advertised as the Tor node for incoming
traffic, and the other one with the default route assigned for outgoing
traffic). This has two main reasons (and a few minor ones listed in the
PDF):
* Technical: In the current project for statistical traffic analysis
(which is the reason for running the exit node, and the reason for the
gracious support by ACOnet), we are interested only in exit traffic
leaving the Tor network (i.e. into the "clear" net). We explicitly do
not want to analyze any traffic in which our node is an entry or middle
relay or traffic involving hidden services. This statistical analysis is
not done on the Tor node itself, but on a separate monitoring host (more
on that below).
* Legal: In case of a court order, it may be harder to compel us to
start monitoring incoming as well as outgoing traffic, as our system
architecture currently doesn't allow that. In other words, adding
traffic correlation would take more than adding or removing a filter on
the monitoring host; it would require a significant change to our setup.
That may raise the bar for a corresponding legal order (not that we have
received _any_ legal order concerning our node so far; this is really
just another layer of protection).

The monitoring server collects - anonymized - statistical data by
watching the outgoing interface. There is another layer of protection in
the form of a passive network tap: the switch is configured so as to
mirror traffic between the Tor node outgoing interface and the upstream
firewall to a network port on which the monitoring server can passively
sniff. That is, with this setup we cannot tamper with the (incoming or
outgoing) traffic in any way (another hurdle for potential legal
orders). On the monitoring server, we strip IP target addresses and only
record statistics on port numbers, AS numbers, and countries (based on a
local geoip database, without any external queries). The statistics are
computed using monthly batch jobs (we can barely aggregate the traffic
data in the same time frame that we collect netflows...) and are online
at https://www.ins.tor.net.eu.org/tor-info/index.html. We are still in
the process of fully automating the aggregation over anonymized
netflows, which is why the latest time frame fully analyzed is June 2016
at the time of this writing.
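
To make the anonymization step concrete, here is a minimal sketch of the
kind of aggregation described above. It is not our actual batch job: the
flow record layout, the function names, and the two lookup tables are
hypothetical stand-ins (documentation prefixes and documentation AS
numbers), and the lookup takes the first matching prefix rather than the
longest one. The point is only that the target address is used for the
local lookups and then dropped from what gets recorded.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix tables standing in for local AS and GeoIP databases.
# Prefixes and AS numbers are from the ranges reserved for documentation.
AS_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}
COUNTRY_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "AT",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}


def lookup(table, address, default):
    """Return the value of the first prefix containing address, else default."""
    ip = ipaddress.ip_address(address)
    for network, value in table.items():
        if ip in network:
            return value
    return default


def anonymize(flow):
    """Map one flow record to a key that no longer contains the target IP."""
    dst = flow["dst_ip"]
    return (
        flow["dst_port"],
        lookup(AS_TABLE, dst, None),
        lookup(COUNTRY_TABLE, dst, "??"),
    )


def aggregate(flows):
    """Sum bytes per (port, AS number, country) key."""
    totals = Counter()
    for flow in flows:
        totals[anonymize(flow)] += flow["bytes"]
    return totals


# Made-up example flows; only the local tables above are consulted.
flows = [
    {"dst_ip": "192.0.2.10", "dst_port": 443, "bytes": 1500},
    {"dst_ip": "192.0.2.77", "dst_port": 443, "bytes": 500},
    {"dst_ip": "198.51.100.5", "dst_port": 80, "bytes": 200},
]
totals = aggregate(flows)
```

With these made-up flows, the two port-443 records collapse into a
single (443, 64496, "AT") entry of 2000 bytes, and no target address
appears anywhere in the result.
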
An academic paper on the collected traffic statistics is to be submitted
within the next few weeks (showing e.g. that nearly all traffic that we
see is, with very high probability, legal in our jurisdiction and that
the percentage of encrypted traffic is slowly but steadily increasing).
In the spirit of full transparency, we have yet another precaution in
place in the form of different responsibilities: Michael Sonntag is the
only person with remote access to the monitoring server, and he is
running the data analysis. Rudolf Hörmanseder is the only person with
remote access to the switch and firewall. I am the only person with
remote access to the Tor node itself (and as a full, appointed professor
at an Austrian university, this falls under my right to research and may
be legally hard to forbid). In other words, none of us could, without
colluding with another person, increase the set of data items being
monitored/analyzed. Anybody with physical access could of course make
arbitrary changes to all parts of the setup, which is why we intend to
put a live webcam into that server room.
We will also publish a more complete description of our technical and
legal setup including the specific reasoning in an Austrian/European
jurisdiction.

[The patch]
Currently, both (clearnet) exit traffic and encrypted Tor traffic
(to other nodes and hidden services) use the outgoing interface,
as the Tor daemon simply creates TCP sockets and uses the default route
(which points at the outgoing interface). A patch as suggested by
grarpamp above could solve that issue. In the meantime, we have created
a slightly hacky patch as attached. The simplest way to record only exit
traffic and separate it from outgoing Tor traffic seemed to be to mark those
packets with a ToS value - which, as far as we can see, can be done with
a minimally invasive patch adding that option at a single point in
connection.c. At the moment, we use this ToS value in a filter
expression at the monitoring server to make sure that we do not analyze
outgoing Tor traffic. We also plan to use it for policy routing
rules at the Linux kernel level to send outgoing Tor traffic back out
the "incoming" interface (to distinguish between Tor traffic and clear
traffic). When that works, the ToS flag can actually be removed again
before the packets leave the Tor node.
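
For readers who do not want to open the attachment, the idea can be
sketched roughly as follows. This is not the attached patch (which is C
and touches connection.c); it is an illustration in Python, used here
only for brevity. The ToS value 0x28 and both function names are
placeholders chosen for this sketch, not what our node uses. The first
function shows the socket option involved (in C, the same
IPPROTO_IP/IP_TOS option via setsockopt()); the second shows the kind
of check a monitoring filter could apply to a captured IPv4 header.

```python
import socket

# Assumption: placeholder ToS value for illustration only; the value used
# by the attached patch is not stated in this message.
EXIT_TOS = 0x28


def mark_exit_socket(sock, tos=EXIT_TOS):
    """Set the IPv4 ToS byte on a socket, ideally before it connects."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)


def is_marked(ipv4_header, tos=EXIT_TOS):
    """Return True if a raw IPv4 header carries the given ToS marking.

    The ToS byte is the second byte of the IPv4 header. Its low two bits
    are the ECN field and may change in transit, so only the upper six
    bits are compared.
    """
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        return False  # too short, or not IPv4
    return (ipv4_header[1] & 0xFC) == (tos & 0xFC)
```

Comparing only the upper six bits is a design choice of this sketch: it
keeps the check stable if ECN is negotiated on a marked connection.
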
What do you think of that approach? Does that seem reasonable or would
there be a cleaner approach to achieve that kind of separation of exit
traffic from other traffic for analysis purposes? If this patch seems
useful, we can extend it to make this marking configurable for potential
upstream inclusion.

Rene
(Head of the Institute for Networks and Security at JKU)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tor-exit-traffic-marking.diff
Type: text/x-diff
Size: 674 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160923/26fce9a4/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Tor-Dev- Patch Submission.pdf
Type: application/pdf
Size: 147744 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160923/26fce9a4/attachment-0001.pdf>