Filename: compress traffic at exit.txt
Title: Compress traffic at exit
Author: Sebastian G. (aka bastik_tor)
Created: 20-JAN-2012
Type: idea

Overview:

This is about compressing traffic at the exit, where it is passed from
outside the network through other relays to the client, where it gets
decompressed. The compression should happen on the fly. This is NOT about
which compression method is actually used. It's an idea: it outlines what
has to be done, but not how.

Motivation:

The network has high bandwidth usage due to its massive user base. This
idea should reduce the outgoing traffic of the exit and take much of the
load off the middle relay and the entry point. The relays (exit or not)
have certain bandwidth limitations which might be circumvented by
compressing the traffic they have to handle, which would mean that they
could handle more users.

The main motivation is to improve the performance of the network, e.g. to
serve more users and remove bottlenecks.

Design:

The design assumes:
- that a fair amount of traffic is not compressed already
- that such traffic can be compressed
- that bandwidth is scarcer than resources (CPU/memory)

The client requests that the traffic be compressed. This could be enabled
by default. The client connects to the Tor network and loads a web page,
joins an IRC chat, checks mail, or does anything else that produces
traffic that can be compressed. (If the traffic can't be compressed, it is
treated the same as today.) The relays pass that request to the exit. The
exit loads the data the client asked for and compresses it. The relays
pass the compressed data to the client. The client decompresses the data.

Bridges would need to support this as well and behave like relays: they
pass the request on to another relay and forward the compressed data to
the actual client.

The exit can configure whether it wants to support compression. As the
exit operator might already know whether he is processing compressible
traffic, it would be good to give him a choice. The traffic would be
compressed blindly. Apart from that, an automated judgment based on the
ports in use could be made: whenever port 80, or any other port whose
traffic is most likely compressible, is "detected", the data is
compressed. This happens blindly, purely port based.

Security implications:

While thinking about this idea, nothing came up that suggests it would
affect the security of the network or its users, including their
anonymity. If a relay handled more users, it might be harder for an
adversary to track them. Compression could make website requests appear
different, which might be good because old fingerprints would no longer
work.

This idea was not made with security in mind; of course it should not be
harmful. The idea itself might not have a security impact, but the actual
implementation could, e.g. if the compression method used has a bug that
leads to exploits when fed with the "right" data.

Specification:

ReqCompress 0/1; client setting; request (1) compression or not (0)

Compress 0/1/Auto; exit setting
  0;    don't support compression
  1;    support compression, compress anything
  Auto; support compression, compress or not based on ports

// Whenever possible the cells should only be used when the data is to be
// compressed; otherwise everyone would need to support them, wouldn't
// they?
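The "Auto" setting is only described in prose above; below is a minimal
sketch (in C, since Tor is written in C) of the blind, port-based check an
exit could run per stream. The helper name should_compress_port and the
port list are illustrative assumptions and not part of the idea itself.

    /* Sketch of a port-based check for an exit with "Compress Auto".
     * Hypothetical helper; the port list is only an example. */

    #include <stddef.h>
    #include <stdint.h>

    /* Ports whose traffic is typically not compressed already:
     * HTTP, IRC, SMTP, POP3, IMAP. */
    static const uint16_t compressible_ports[] = { 80, 6667, 25, 110, 143 };

    /* Return 1 if the exit should compress data for this destination
     * port, 0 otherwise.  The decision is blind, purely port based. */
    static int
    should_compress_port(uint16_t port)
    {
      size_t i;
      for (i = 0; i < sizeof(compressible_ports)/sizeof(compressible_ports[0]); i++) {
        if (compressible_ports[i] == port)
          return 1;
      }
      return 0;
    }

Ports that mostly carry encrypted or already-compressed data (e.g. 443)
would simply be left off such a list.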
COMPRESS cell, carrying the compress request from the client to the exit;
the relays pass it on to the exit.

COMPRESS contains
  Compress;        tells the exit to compress (or not)
  CompressMethods; which compression methods are supported by the client
  Padding;         if required

COMPRESSED cell, serving as an indicator that the data is compressed; the
relays pass it on to the client.

COMPRESSED contains
  Compressed;      tells the client whether the data is compressed (or not)
  CompressMethod;  which compression method has been used by the exit
  Padding;         if required

Compatibility:

Clients that can't handle compressed traffic are incompatible, but those
should not request compression. Relays are not affected; they just ship
the traffic. Exits that do not support compression should not have
problems as long as they can ignore the request.

Compatibility can be achieved by making the exits and relays able to
understand that the data should be compressed before clients can request
it. The same holds for bridges.

Implementation:

Pick a compression method that has the best possible compression
ratio/CPU/memory trade-off. It should be stable and fast. The goal should
not be the smallest data at any price. The compression should not decrease
the performance of the exit.

Open Questions:

Is there enough data that can be compressed?
Can that data be compressed without delay?
Is the saved traffic worth the work?
Can the request be done without cells?
Can cells be used optionally, e.g. only when compression happens?

Performance:

The performance gain for the network depends on the compression ratio that
can be achieved. Compression requires CPU time and memory, so it could
hurt the performance of the exits, which would have less CPU time and
memory available for crypto operations.

Clients with low bandwidth could benefit from it as well. Clients on
weaker platforms might have problems with the CPU and memory consumed by
decompression; the client for those platforms might ship with different
defaults.

The positive effect should outweigh the work the exits have to put in.
This should be measured at the majority of exits. If they were not able to
answer requests even though bandwidth is available, because memory or CPU
is exhausted due to compression, then there would be no point in this
idea.
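To get a rough feeling for the ratio/CPU trade-off mentioned under
Implementation and Performance, here is a small standalone sketch (not Tor
code) that compresses a file with zlib at a few levels and reports output
size and CPU time. zlib and the chosen levels are only assumptions for
illustration; the idea deliberately does not fix a compression method.

    /* Standalone sketch: compress a file with zlib at levels 1, 6 and 9
     * and print the resulting size and CPU time.  Build with: cc -O2 sketch.c -lz */

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <zlib.h>

    int
    main(int argc, char **argv)
    {
      if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
      }

      /* Read the whole input file into memory. */
      FILE *f = fopen(argv[1], "rb");
      if (!f) {
        perror("fopen");
        return 1;
      }
      fseek(f, 0, SEEK_END);
      long in_len = ftell(f);
      fseek(f, 0, SEEK_SET);
      unsigned char *in = malloc((size_t)in_len);
      if (!in || fread(in, 1, (size_t)in_len, f) != (size_t)in_len) {
        fprintf(stderr, "read failed\n");
        return 1;
      }
      fclose(f);

      int levels[] = { 1, 6, 9 };  /* fast, zlib default, best ratio */
      for (size_t i = 0; i < sizeof(levels)/sizeof(levels[0]); i++) {
        uLongf out_len = compressBound((uLong)in_len);
        unsigned char *out = malloc(out_len);
        if (!out)
          return 1;
        clock_t start = clock();
        int rc = compress2(out, &out_len, in, (uLong)in_len, levels[i]);
        double cpu = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (rc != Z_OK) {
          fprintf(stderr, "compress2 failed: %d\n", rc);
          return 1;
        }
        printf("level %d: %ld -> %lu bytes (ratio %.2f), %.3f s CPU\n",
               levels[i], in_len, (unsigned long)out_len,
               (double)out_len / (double)in_len, cpu);
        free(out);
      }
      free(in);
      return 0;
    }

Feeding it a saved HTML page or a mail spool gives a first impression of
how much a typical exit stream might shrink per unit of CPU spent.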