On TLS Empty Record Insertion
-----------------------------

I was asked by the Tor project whether it was safe to disable the
insertion of empty records, at least in the case of Tor.

I claim that it is safe for a variety of applications, in fact, and,
in practice, is also so for Tor.

Background
----------

OpenSSL prepends an empty record to each outgoing TLS record, in order
to avoid an adaptive plaintext attack. See
http://www.openssl.org/~bodo/tls-cbc.txt. Since that is a rather
convoluted document, I reproduce the attack, which is due to Rogaway,
slightly rephrased for clarity, below:

"Phil Rogaway observed that CBC mode is not secure against
chosen-plaintext attack if the IV is known or can be predicted by the
attacker before he choses his plaintext [1]. Similarly, CBC mode is
not secure if the attacker can observe the last ciphertext block
before choosing the next block of plaintext, because the last block of
ciphertext essentially serves as the IV for the rest of the message.
 
The attack itself is very simple. Remember that in CBC mode, each
plaintext block is XOR'ed with the last ciphertext block and then
encrypted to produce the next ciphertext block. Suppose the attacker
suspects that plaintext block P_i might be x, and wants to test
whether that's the case, he would choose some later plaintext block P_j
to be x XOR C_(i-1) XOR C_(j-1). If his guess is correct, then C_j =
Encrypt(P_j XOR C_(j-1)) = Encrypt(P_i XOR C_(i-1)) = C_i, and so he
can confirm his guess by looking at whether C_j = C_i."

Analysis
--------

For this attack to work, the attacker must be able to control a
plaintext block for which he knows the preceding ciphertext block,
C_(j-1). SSL and TLS open up this problem because the IV for each
packet is the previous packet's final ciphertext block[1].

Thus, if the attacker can choose plaintext for a TLS packet that
immediately follows a TLS packet he has observed, he can take a guess
at the plaintext of any packet he has seen in that ciphertext stream
and test whether the guess is correct.

Not many applications exist that allow the attacker to do this. In
particular, if an attacker has an ability to inject plaintext into a
TLS connection that contains data unknown to him, then the connection
must contain data not supplied by him.

Therefore any protocol running over TLS that contains data entirely
controlled by a single party is immune. Note, though, that this does
not include protocols like POP3 or IMAP, where the payloads are the
result of many different people sending emails. Nor does it include
Tor. It does, however, include static web pages.

The next important requirement is that the very first block of the
plaintext for some TLS packet must be controlled by the attacker,
which means that the protocol wrapped in TLS must contain no preamble
at the start of each TLS packet. So, any protocol that has a preamble
for any attacker-controlled data it sends is also immune. This does
include POP3 and IMAP. Web pages, even with dynamic content, might
appear to be almost certain to be immune, too, because the preamble is
the HTTP header block, or various bits of HTML, but care should be
taken because of protocols like AJAX that can use long-lived HTTP
connections.

Beware, though - if the preamble is short, there may be a statistical
attack that relies on the preamble happening to come out right. Also
beware that buffering might cause natural breaks in the protocol to
not correspond to TLS packet boundaries. The risk from this seems low,
since the exact boundaries of the buffers would seem hard to control
by the attacker, and the buffer would have to be empty at exactly the
right moment.

Tor may be vulnerable to such a statistical attack (this attack is due
to Steven Murdoch). If padding blocks are enabled, then an attacker
can determine if a block is a padding block with (2^128-2^16)/2^128
certainty. The attack works because padding blocks are all zeroes. So,
if the attacker thinks block C_i travelling between nodes A and B may
be a padding block, he can test for this as follows.

The attacker first creates a circuit to A, then extends it to B. He
then waits for a TLS packet with an IV s.t. IV xor C_(i-1) has 0x03 as
the third byte. This is then used as the plaintext for a block the
attacker sends to A over the circuit. Since this block will be a valid
relay cell, when node A receives the block it will modify it by
mapping the first two bytes (the circuit ID) of the incoming packet to
a different value via a lookup table unknown to the attacker. If the
lookup happens to leave these two bytes unaltered, then A's encryption
of the packet will become the "test" packet described above and the
attacker will see C_j = C_i and can conclude that the target packet
was, indeed, a padding packet (with high probability). This means that
the attacker has a 1 in 2^16 chance of detecting the padding packet at
each attempt. Because only one byte (the third byte) of the plaintext
is fixed, the attacker can attempt this attack on 1 in 2^8 TLS
packets. Beacuse the attacker cannot be certain of the value of the
first two plaintext bytes, a plaintext that happens to be all zero
except the first two bytes could trigger a false positive, with
probablity 2^-112, so therefore the attacker knows the block is a
padding block with probability (2^128-2^16)/2^128.

Note that because A modifies the circuit ID, the attacker can choose
any value he likes in the packet he sends to A. This means that after
a successful identification of a padding packet the attacker now knows
one entry in the lookup table. If the desired value of the circuit ID
(i.e. the first two bytes of IV xor C_(i-1)) happens to match in a
future test, then the attacker can use his known value for his ciruit
ID and will then be certain of the value in the plaintext. After each
successful attack with a different circuit ID, the attacker may learn
another entry in this table, so after 2^16 successful attacks the
difficulty of an attack falls from 1 in 2^16 to something closer to 1
in 2.

Because Tor does not currently actually send padding packets, this
attack is not currently applicable. Furthermore, if Tor modified
padding to be random instead of all zeroes, this would counter the attack.

Finally, the attacker must be able to react in a timely way. Having
observed a TLS packet, they must be able to react quickly enough to
construct their new packet and have it be the payload for the next TLS
packet. This means that any multiplexed protocol that is reasonably
heavily loaded is immune. This, in practice, includes Tor.

Other Countermeasures
---------------------

A Nagle-like algorithm would also make attacks harder in general.

Conclusion
----------

Not only is Tor likely to be immune to the attack, in practice,
without the need for record insertion, a variety of other protocols
are, too. The most widely used protocol, HTTPS, in particular, is
immune in most cases.

If Tor enables padding, then an attacker could detect padding as
described above, but with such low probability it is unclear whether
this attack would actually be useful. In any case, randomizing padding
is a simple countermeasure.

Care should be taken to ensure that a protocol and any particular
implementation of that protocol are actually immune as described above
before disabling empty packet insertion.

[1] The defence of including an empty record before each "real" record
works because the attacker is rendered unable to predict the IV for
the record he controls.