Hi everybody,
Unfortunately, it took a bit longer than expected, but here goes... FWIW, after the recent update (with subsequent downtime), our exit node is fully up and running again (including this patch) and relaying over 1TB a day at the moment.
On 2016-09-19 at 23:36, René Mayrhofer wrote:
On 2016-09-19 at 20:24, grarpamp wrote:
On Mon, Sep 19, 2016 at 9:14 AM, René Mayrhofer rm@ins.jku.at wrote:
Setup: Please note that our setup is a bit particular for reasons that we will explain in more detail in a later message (including a proposed patch to the current source which has been pending also because of the holiday situation...). Briefly summarizing, we use a different network interface for "incoming" (Tor encrypted traffic) than for "outgoing" (mostly clearnet traffic from the exit node, but currently still includes outgoing Tor relay traffic to other nodes). The outgoing interface has the default route associated, while the incoming interface will only originate traffic in response to those incoming connections. Consequently, we let our Tor node only bind to the IP address assigned to the incoming interface 193.171.202.146, while it will initiate new outgoing connections with IP 193.171.202.150.
There could be further benefit / flexibility in a 'proposed patch' that would allow taking the incoming ORPort traffic and further splitting it outbound by a) OutboundBindAddressInt for traffic that is going back internal to Tor, and b) OutboundBindAddressExt for traffic that is going out external to clearnet. Those two would include port specification for optional use on the same IP. I do not recall if this splitting is currently possible.
That is exactly what we have patched our local Tor node to do, although with a different approach (slightly hacky, so the patch will be an RFC type): we mark real exit traffic with a ToS flag and leave the decision of what to do with it to the next layer (in our setup, Linux kernel based policy routing on the same host). There may be a much better approach to achieve this goal. I plan on writing up our setup (and the rationale behind it) along with the "works for me but is not ready for upstream inclusion" patch tomorrow.
[Slightly long description of our setup to provide sufficient context for the patch]
Attached you will find a PDF (sorry about the image artefacts, MS Office vs. LibreOffice, etc.) describing our rough setup. The whole setup (Tor node(s), monitoring server, switch, firewall, and soon a webcam watching the rack with an unfiltered live-stream publicly available) is in a separate small server room that does not host any other hardware. We use an IPv4 range separate from the main university network. This is also the main reason why we don't relay IPv6 yet: we still have to acquire a separate IPv6 range so as not to impact the reputation of the main university subnet. We are highly thankful to the Johannes Kepler University Linz and the Austrian ACOnet for supporting this!
Ideally, we would use 2 different providers to even further compartmentalize "incoming" (i.e. encrypted Tor network) from "outgoing" (for our exit node, mostly clearnet) traffic and make traffic correlation harder (doesn't help against a global adversary as we know, but at least a single ISP would not be able to directly correlate both sides of the relay). Although we don't have two different providers at this point, we still use two different network interfaces with associated IP addresses (one advertised as the Tor node for incoming traffic, and the other one with the default route assigned for outgoing traffic). This has two main reasons (and a few minor ones listed in the PDF):
* Technical: In the current project for statistical traffic analysis (which is the reason for running the exit node, and the reason for the gracious support by ACOnet), we are interested only in exit traffic leaving the Tor network (i.e. into the "clear" net). We explicitly do not want to analyze any traffic in which our node is an entry or middle relay or traffic involving hidden services. This statistical analysis is not done on the Tor node itself, but on a separate monitoring host (more on that below).
* Legal: In case of a court order, it may be harder to compel us to start monitoring incoming as well as outgoing traffic, as our system architecture currently doesn't allow that. In other words, adding traffic correlation would be more than adding or removing a filter on the monitoring host, but require a significant change in our setup. That may raise the bar for a corresponding legal order (not that we have received _any_ legal order concerning our node so far, this is really just another layer of protection).
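As a rough illustration of the two-address arrangement described above, here is a minimal Python sketch (not part of Tor, and not our actual configuration). The function names `open_listener` and `connect_from` are made up for this example. Note one deliberate difference: in our setup the source address of outgoing connections simply follows from the default route, whereas the sketch binds the source address explicitly, which is closer to what an OutboundBindAddress-style option does.

```python
import socket


def open_listener(incoming_addr: str, port: int = 0) -> socket.socket:
    """Listen only on the address assigned to the 'incoming' interface."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((incoming_addr, port))  # port 0 lets the kernel pick a free port
    srv.listen()
    return srv


def connect_from(source_addr: str, dest: tuple) -> socket.socket:
    """Open an outgoing TCP connection with an explicitly chosen source address."""
    out = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    out.bind((source_addr, 0))  # fix the source IP, leave the source port to the kernel
    out.connect(dest)
    return out
```

With the addresses from this thread, that would correspond to something like `open_listener("193.171.202.146", 443)` for incoming connections and `connect_from("193.171.202.150", (host, port))` for outgoing ones; the port 443 here is only a placeholder.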
The monitoring server collects - anonymized - statistical data by watching the outgoing interface. There is another layer of protection in the form of a passive network tap: the switch is configured so as to mirror traffic between the Tor node outgoing interface and the upstream firewall to a network port on which the monitoring server can passively sniff. That is, with this setup we cannot tamper with the (incoming or outgoing) traffic in any way (another hurdle for potential legal orders). On the monitoring server, we strip IP target addresses and only record statistics on port numbers, AS numbers, and countries (based on a local geoip database, without any external queries).
The statistics are computed using monthly batch jobs (we can barely aggregate the traffic data in the same time frame that we collect netflows...) and are online at https://www.ins.tor.net.eu.org/tor-info/index.html. We are still in the process of fully automating the aggregation over anonymized netflows, which is why the latest time frame fully analyzed is June 2016 at the time of this writing. An academic paper on the collected traffic statistics is to be submitted within the next few weeks (showing e.g. that nearly all traffic that we see is with a very high probability legal in our jurisdiction and that the percentage of encrypted traffic is slowly but steadily increasing).
In the spirit of full transparency, we have yet another precaution in place in the form of different responsibilities: Michael Sonntag is the only person with remote access to the monitoring server, and he is running the data analysis. Rudolf Hörmanseder is the only person with remote access to the switch and firewall. I am the only person with remote access to the Tor node itself (and as a full, appointed professor at an Austrian university, this falls under my right to research and may be legally hard to forbid).
In other words, none of us could, without colluding with another person, increase the set of data items being monitored/analyzed. Anybody with physical access could of course make arbitrary changes to all parts of the setup, which is why we intend to put a live webcam into that server room. We will also publish a more complete description of our technical and legal setup including the specific reasoning in an Austrian/European jurisdiction.
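To make the anonymization step on the monitoring server more concrete (stripping target addresses and keeping only port, AS, and country statistics), here is a minimal Python sketch. It is not our actual batch job: the record layout `(dst_ip, dst_port, n_bytes)`, the `LOCAL_DB` table, and the function names are all assumptions for illustration, and the prefixes and AS numbers are taken from ranges reserved for documentation.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical stand-in for a local geoip/AS database (no external queries).
LOCAL_DB = [
    (ip_network("192.0.2.0/24"), 64500, "AT"),
    (ip_network("198.51.100.0/24"), 64501, "DE"),
]


def lookup(dst_ip: str):
    """Map a destination address to (AS number, country), or (None, None)."""
    addr = ip_address(dst_ip)
    for net, asn, country in LOCAL_DB:
        if addr in net:
            return asn, country
    return None, None


def aggregate(flows):
    """Sum bytes per port, AS, and country; the address itself is never stored."""
    by_port, by_asn, by_country = Counter(), Counter(), Counter()
    for dst_ip, dst_port, n_bytes in flows:
        asn, country = lookup(dst_ip)  # dst_ip is used for the lookup only
        by_port[dst_port] += n_bytes
        by_asn[asn] += n_bytes
        by_country[country] += n_bytes
    return {"port": by_port, "asn": by_asn, "country": by_country}
```

The point of the design is that the returned counters contain no address at all, so nothing downstream of this step can recover individual destinations.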
[The patch]
Currently, both (clearnet) exit traffic as well as encrypted Tor traffic (to other nodes and hidden services) will use the outgoing interface, as the Tor daemon simply creates TCP sockets and uses the default route (which points at the outgoing interface). A patch as suggested by grarpamp above could solve that issue. In the meantime, we have created a slightly hacky patch as attached. The simplest way to only record exit traffic and separate that from outgoing Tor traffic seemed to be to mark those packets with a ToS value - which, as far as we can see, can be done with a minimally invasive patch adding that option at a single point in connection.c. At the moment, we use this ToS value in a filter expression at the monitoring server to make sure that we do not analyze outgoing Tor traffic. We also plan to use it for policy routing rules at the Linux kernel level to send outgoing Tor traffic back out the "incoming" interface (to distinguish between Tor traffic and clear traffic). When that works, the ToS flag can actually be removed again before the packets leave the Tor node.
What do you think of that approach? Does that seem reasonable or would there be a cleaner approach to achieve that kind of separation of exit traffic from other traffic for analysis purposes? If this patch seems useful, we can extend it to make this marking configurable for potential upstream inclusion.
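For readers without the attachment at hand, the marking idea and the monitoring-side check can be sketched roughly as follows. This is not the attached patch (which is C and touches connection.c), only a Python illustration of the same socket option. The value `EXIT_TOS = 0x20` and the names `mark_exit_socket` and `is_exit_packet` are assumptions made up for this example; the value actually used by the patch is not stated here.

```python
import socket

# Hypothetical marker value. The two low bits of the ToS byte are the ECN
# field, so the marker deliberately leaves them at zero.
EXIT_TOS = 0x20


def mark_exit_socket(sock: socket.socket, tos: int = EXIT_TOS) -> None:
    """Set the IPv4 ToS byte for all packets sent on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)


def is_exit_packet(ipv4_header: bytes, tos: int = EXIT_TOS) -> bool:
    """Monitoring-side check: does this IPv4 header carry the marker?

    Byte 0 holds version and header length, byte 1 is the ToS byte.
    The ECN bits are masked out before comparing.
    """
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        return False
    return (ipv4_header[1] & 0xFC) == (tos & 0xFC)
```

Only sockets opened for exit connections would be passed to `mark_exit_socket`; connections to other relays stay unmarked, so a filter like `is_exit_packet` (or an equivalent capture filter on the ToS byte) drops them from the analysis.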
Rene (Head of the Institute for Networks and Security at JKU)