Cell header changes / solving malleability

Thu Aug 28 06:31:54 UTC 2003

On Wed, 2003-08-27 at 20:03, Roger Dingledine wrote:
 [...]
> We fix the malleability problem by introducing randomness and redundancy
> inside the encrypted cell: we add 3 bytes of pseudorandomness in the
> space saved by reducing stream ID size, and we add 4 bytes (1/5 of a SHA1)
> where the sequence number used to be.

Just to clarify: I don't think you say what exactly the SHA-1 hash is
supposed to cover.  I assume that it covers all fields except ACI.  Is
this right?

Also, I assume that the purpose of the psuedorandomness is to prevent
cell contents from being predictable, so that an attacker can't modify
the contents and the checksum too.

>         ACI (anonymous circuit identifier)   [2 bytes]  \__outer header
>         Command                              [1 byte]   /
>         Pseudorandomness                     [3 bytes]  \__integrity header
>         Checksum (1/5 of a SHA-1)            [4 bytes]  /
>         Relay command                        [1 byte]   \
>         Length                               [1 byte]    == relay header
>         Stream ID                            [2 bytes]  /
>         Relay payload                        [242 bytes]
> 
> gives 14 bytes overhead and 242 bytes usable per relay cell. For relay
> cells, the outer header is rewritten at each hop, and the rest of the
> packet does the layered-encryption thing. Non-relay cells have the outer
> header and integrity header, followed by whatever is specific to that
> cell type (eg an onionskin).
>
> This means that relay cell integrity is only checked end-to-end. With 3
> bytes of randomness, it's pretty darn unlikely that the adversary could
> spoof a cell. The chance that he'll be able to keep up the spoofing for
> multiple cells is even lower.

> Note that *all cells* need to have this randomness in them. For example,
> if we decided padding cells can just use all zeros, then the adversary
> would be able to mutate padding cells into anything he wants.
> 
> We should also recognize that an observer could mutate a non-relay cell
> into a relay cell, and then since relay cells don't immediately check
> their integrity, it would propagate to the end of the circuit.
> 
> The computational overhead isn't so bad. The pseudorandomness is just
> a couple bytes of AES, and we're already doing 256 of them for the
> link encryption, and twice that for relay crypts. Making or receiving
> a non-relay cell or a non-intermediate relay cell requires a SHA-1,
> but that's not so bad either.

Um, the SHA1 cost is *not* negligable.  Doing this will add 50% to our
CPU overhead.

When I profiled an OR last week, I found that nearly all of the time
went into AES encryption.  Just now, I ran a benchmark for the following
cases:
   1) SHA1: An independent SHA1 on a 256-byte chunk of data.
   2) Incremental SHA1: Update an existing SHA1 structure with 256
bytes, and compute a hash of the data seen so far.
   3) AES: With an existing key, perform AES counter-mode encryption on
a 256-byte chunk of data.

I used the following sloppy code:

void 
bench_crypto()
{
  const int N = 1000000;
  crypto_cipher_env_t *env;
  SHA_CTX sha;
  struct timeval start;
  struct timeval end;
  int i;
  char data[256];
  char data2[256];
  char hash[20];
  crypto_rand(256, data);
  my_gettimeofday(&start);
  for (i=0;i<N;++i) {
    crypto_SHA_digest(data,256,hash);
  }
  my_gettimeofday(&end);
  printf("SHA took %f usec\n", (tv_udiff(&start,&end)/(double)N));

  SHA1_Init(&sha);
  my_gettimeofday(&start);
  for (i=0;i<N;++i) {
    SHA1_Update(&sha,data,256);
    SHA1_Final(hash,&sha);
  }
  my_gettimeofday(&end);
  printf("Incremental SHA took %f usec\n", (tv_udiff(&start,&end)/(double)N));

  env = crypto_new_cipher_env(CRYPTO_CIPHER_AES_CTR);
  crypto_cipher_generate_key(env);
  crypto_cipher_encrypt_init_cipher(env);
  my_gettimeofday(&start);  
  for (i=0;i<N;++i) {
    crypto_cipher_encrypt(env, data, 256, data2);
  }
  my_gettimeofday(&end);
  printf("AES took %f usec\n", (tv_udiff(&start,&end)/(double)N));

  crypto_free_cipher_env(env);
}

The results were:
SHA took 4.167264 usec
Incremental SHA took 3.419704 usec
AES took 8.421425 usec

In other words, SHA1 will significantly affect our processing demands.

But we may not need SHA1.  We only want a 4-byte digest, after all, and
we don't care so much about simple collision-resistance as we do about
preventing an attacker who knows A=E(H(msg|rand)) and B=E(msg|rand) and
msg from modifying A and B such that A'=E(H(B')).

There may be cheap functions that meet this need.

> Is there any real problem with AES'ing an odd number of bytes, rather
> than the block size? I would hope the AES implementation sticks the
> unused bytes in a buffer and gives them to me first when I ask for more,
> so amortized it's just like doing it by the block size.

Our current counter mode implementation does this, yes.

> We could instead back off on our threat model, and only concern ourselves
> with evil nodes in the middle of the circuit, rather than evil ISPs who
> own links. In that case non-relay nodes could skip the integrity header.

Or we could use an encryption mode other than Counter for our link
connections.  Since cells are 256 bytes long, and 256 is an even
multiple of our block size, we can probably use a mode where single-bit
errors turn the rest of the connection's data to junk.

> Reducing stream ID to 2 bytes increases the chance of collisions for
> crypted relay packets, but also improves general throughput. Consider
> 4 hops to a path and 3 exit connections per hop. 16 bits of stream ID,
> minus 4 bits from collisions (lg 4*4), plus 8 bits because cells are
> 256 bytes, means we expect a collision roughly every 2^20 bytes (one
> megabyte) of data flowing from the OP. The OP must examine outgoing
> relay cells as it crypts them, and if there will be a collision, it
> must send that cell instead as a relay-padding cell to the node where
> the collision will happen. One cell in 4096 needs to be sent twice (256
> byte overhead, .02%), compared to 4096 cells with the extra five bytes
> (20480 byte overhead, 2% overhead -- plus all the saved space in relay
> cells coming toward the OP, arguably the bulk of relay cells, that don't
> need to be checked for collisions).

In case you're curious, and in case I didn't miscalculate, the optimum
number of bits to use is:

        log_2 [ 8 * log_e(2) * CS * NS ]

where NS is the number of streams per circuit, and CS is the size of a
cell in bytes.  Simplifying:

        2.47 + lg[CS]+lg[NS] = 10.47 + lg[NS]

Assuming we want to use an integer number of bytes, two bytes is the
optimal number if we have a number of streams in the range 2...257.

-- 
Nick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20030828/6332d27b/attachment.pgp>