Filename: compress traffic at exit.txt
Title: Compress traffic at exit
Author: Sebastian G. (aka bastik_tor)
Created: 20-JAN-2012
Type: idea

Overview:

This is about compressing traffic at the exit, where it is passed from
outside the network through other relays to the client, where it gets
decompressed. The compression should happen on the fly. This is NOT about
which compression method is actually used. It's an idea: it outlines what
has to be done, but not how.

Motivation:

The network has high bandwidth usage due to its massive user base. This
idea should reduce the outgoing traffic of the exit and take much of the
load off the middle relay and the entry point. The relays (exit or not)
have certain bandwidth limitations which might be circumvented by
compressing the traffic they have to handle, which would mean that they
could handle more users.

The main motivation is to improve the performance of the network, e.g. to
serve more users and remove bottlenecks.

Design:

The design assumes:
- that a fair amount of traffic is not compressed already
- that such traffic can be compressed
- that bandwidth is scarcer than resources (CPU/memory)

The client requests that the traffic be compressed. This could be enabled
by default. The client connects to the Tor network and loads a web page,
joins an IRC chat, checks mail, or does anything else that produces
traffic that can be compressed. (If the traffic can't be compressed, it is
treated the same as today.) The relays pass that request to the exit. The
exit loads the data the client asked for and compresses it. The relays
pass the compressed data to the client. The client decompresses the data.

Bridges would need to support this as well and behave like relays: they
pass the request on to another relay and forward the compressed data to
the actual client.

The exit can configure whether it wants to support compression. As the
exit operator might already know whether he is processing compressible
traffic, it would be good to give him a choice. The traffic would be
compressed blindly. Apart from that, an automated judgment based on the
ports in use could be made: whenever port 80, or any other port whose
traffic is most likely compressible, is "detected", the data is
compressed. This happens blindly, purely port based.

Security implications:

While thinking about this idea, nothing came up that suggests it would
affect the security of the network or its users, including their
anonymity. If a relay handled more users, it might be harder for an
adversary to track them. Compression could make website requests appear
different, which might be good because old fingerprints would no longer
work.

This idea was not made with security in mind; of course it should not be
harmful. The idea itself might not have a security impact, but the actual
implementation could, e.g. if the compression method used has a bug that
leads to exploits when fed with the "right" data.

Specification:

ReqCompress 0/1; client setting; request (1) compression or not (0)

Compress 0/1/Auto; exit setting
  0;    don't support compression
  1;    support compression, compress anything
  Auto; support compression, compress or not based on ports

// Whenever possible the cells should only be used when the data is to be
// compressed; otherwise everyone would need to support them, wouldn't
// they?
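The "Auto" setting is only described in prose above; below is a minimal
sketch (in C, since Tor is written in C) of the blind, port-based check an
exit could run per stream. The helper name should_compress_port and the
port list are illustrative assumptions and not part of the idea itself.

    /* Sketch of a port-based check for an exit with "Compress Auto".
     * Hypothetical helper; the port list is only an example. */

    #include <stddef.h>
    #include <stdint.h>

    /* Ports whose traffic is typically not compressed already:
     * HTTP, IRC, SMTP, POP3, IMAP. */
    static const uint16_t compressible_ports[] = { 80, 6667, 25, 110, 143 };

    /* Return 1 if the exit should compress data for this destination
     * port, 0 otherwise.  The decision is blind, purely port based. */
    static int
    should_compress_port(uint16_t port)
    {
      size_t i;
      for (i = 0; i < sizeof(compressible_ports)/sizeof(compressible_ports[0]); i++) {
        if (compressible_ports[i] == port)
          return 1;
      }
      return 0;
    }

Ports that mostly carry encrypted or already-compressed data (e.g. 443)
would simply be left off such a list.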
COMPRESS cell, carrying the compress request from the client to the exit;
the relays pass it on to the exit.

COMPRESS contains
  Compress;        tells the exit to compress (or not)
  CompressMethods; which compression methods are supported by the client
  Padding;         if required

COMPRESSED cell, serving as an indicator that the data is compressed; the
relays pass it on to the client.

COMPRESSED contains
  Compressed;      tells the client whether the data is compressed (or not)
  CompressMethod;  which compression method has been used by the exit
  Padding;         if required

Compatibility:

Clients that can't handle compressed traffic are incompatible, but those
should not request compression. Relays are not affected; they just ship
the traffic. Exits that do not support compression should not have
problems as long as they can ignore the request.

Compatibility can be achieved by making the exits and relays able to
understand that the data should be compressed before clients can request
it. The same holds for bridges.

Implementation:

Pick a compression method that has the best possible compression
ratio/CPU/memory trade-off. It should be stable and fast. The goal should
not be the smallest data at any price. The compression should not decrease
the performance of the exit.

Open Questions:

Is there enough data that can be compressed?
Can that data be compressed without delay?
Is the saved traffic worth the work?
Can the request be done without cells?
Can cells be used optionally, e.g. only when compression happens?

Performance:

The performance gain for the network depends on the compression ratio that
can be achieved. Compression requires CPU time and memory, so it could
hurt the performance of the exits, which would have less CPU time and
memory available for crypto operations.

Clients with low bandwidth could benefit from it as well. Clients on
weaker platforms might have problems with the CPU and memory consumed by
decompression; the client for those platforms might ship with different
defaults.

The positive effect should outweigh the work the exits have to put in.
This should be measured at the majority of exits. If they were not able to
answer requests even though bandwidth is available, because memory or CPU
is exhausted due to compression, then there would be no point in this
idea.
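To get a rough feeling for the ratio/CPU trade-off mentioned under
Implementation and Performance, here is a small standalone sketch (not Tor
code) that compresses a file with zlib at a few levels and reports output
size and CPU time. zlib and the chosen levels are only assumptions for
illustration; the idea deliberately does not fix a compression method.

    /* Standalone sketch: compress a file with zlib at levels 1, 6 and 9
     * and print the resulting size and CPU time.  Build with: cc -O2 sketch.c -lz */

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <zlib.h>

    int
    main(int argc, char **argv)
    {
      if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
      }

      /* Read the whole input file into memory. */
      FILE *f = fopen(argv[1], "rb");
      if (!f) {
        perror("fopen");
        return 1;
      }
      fseek(f, 0, SEEK_END);
      long in_len = ftell(f);
      fseek(f, 0, SEEK_SET);
      unsigned char *in = malloc((size_t)in_len);
      if (!in || fread(in, 1, (size_t)in_len, f) != (size_t)in_len) {
        fprintf(stderr, "read failed\n");
        return 1;
      }
      fclose(f);

      int levels[] = { 1, 6, 9 };  /* fast, zlib default, best ratio */
      for (size_t i = 0; i < sizeof(levels)/sizeof(levels[0]); i++) {
        uLongf out_len = compressBound((uLong)in_len);
        unsigned char *out = malloc(out_len);
        if (!out)
          return 1;
        clock_t start = clock();
        int rc = compress2(out, &out_len, in, (uLong)in_len, levels[i]);
        double cpu = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (rc != Z_OK) {
          fprintf(stderr, "compress2 failed: %d\n", rc);
          return 1;
        }
        printf("level %d: %ld -> %lu bytes (ratio %.2f), %.3f s CPU\n",
               levels[i], in_len, (unsigned long)out_len,
               (double)out_len / (double)in_len, cpu);
        free(out);
      }
      free(in);
      return 0;
    }

Feeding it a saved HTML page or a mail spool gives a first impression of
how much a typical exit stream might shrink per unit of CPU spent.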