Here's an early draft document trying to sketch out the parameters for
migrating to better crypto ops and designs in the future.
Comments are invited, even comments of the form "you will need to be
much more specific here before I can say anything sensible."
This is proposals/ideas/xxx-new-crypto-sketch.txt in the torspec repository.
Title: Design sketch for new crypto ops
Date: 31 Oct 2011
Author: Nick Mathewson
0. Overview
The point of this document is to discuss what crypto we ought to be using.
See "Initial Thoughts on Migrating Tor to New Cryptography" from last year
for general guidelines and principles.
In broad strokes, the parts of our crypto are:
IDENTITY KEYS AND FINGERPRINTS
Addressed here in Section 2.
LINK CRYPTO (TLS) --
Addressed in proposals 176, 184. We say a little here in section 5,
though.
CREATE/EXTEND CRYPTO --
Addressed in xxx-ntor-handshake.txt and rransom's EXTEND draft at
[*] and subsequent discussion on the tor-dev mailing list. Not
considered here.
RELAY CRYPTO
Addressed here in Section 6.
DIRECTORY SYSTEM
Addressed here.
HIDDEN SERVICE SYSTEM
Addressed in a forthcoming document by rransom.
[*] https://lists.torproject.org/pipermail/tor-dev/2011-March/002547.html
1. Base algorithm choice
There seem to be two main candidate algorithms for signatures: RSA
with big keys (hereinafter "RSA>1024"); and Ed25519, which is DSA with
the sharp edges filed off on an Edwards curve related to DJB's
Curve25519. We can look at other ECC groups too. {But see ECC
Notes in 1.1 below.}
FOR DIFFIE-HELLMAN: Curve25519 seems like a decent choice; failing
that, one of the NIST P-groups. Failing that, DH on Z_p with big
groups (hereinafter "DH>1024"). {But see ECC Notes in 1.1 below.}
FOR A HASH FUNCTION: SHA256, switching to SHA3 in 2012 when it comes
out. It might be worthwhile waiting for SHA3 in most places and
skipping over the SHA256 stage entirely.
FOR A STREAM CIPHER: AES-CTR is in one sense a conservative choice
inasmuch as AES is well-analyzed, but AES's well-known issues with
cache-based timing attacks are pretty worrisome. We can mitigate that
some by using random secret IVs for AES-CTR, so that we will be
encrypting neither attacker-chosen nor attacker-known plaintext with
our AES cipher, but that's a bit kludgy. There are also supposed to
be time-invariant implementations that use Intel's AESNI instructions
where available, and time-invariant implementations that use
bit-slicing.
Salsa20 is what rransom likes these days, but IMO we aren't competent
to tell whether it looks good or not; the existing attacks against it
don't look like very bad news to me, but who knows whether it's
getting enough attention that we can read. See also ChaCha; see also
the other eSTREAM winners/finalists; see also SHA3 if the SHA3 winner
specifies a way to use it as a stream cipher, or specifies an
underlying stream/block cipher.
If we're feeling cautious, we could run two independently-keyed stream
ciphers and xor their streams together.
FOR A RANDOM NUMBER GENERATOR: We currently use OpenSSL seeded with
RAND_poll and with platform entropy. OpenSSL uses a message-digest-
based algorithm from SSLeay (See http://linux.die.net/man/3/sslrand
for the ugly details.) The platform entropy management can be messy,
obscure, or both. I suggest that:
* We should seed our PRNG with more entropy sources if we can find
some promising code with an appropriate license
* Instead of just using OpenSSL's PRNG, we should use OpenSSL's
MD-based PRNG xor'd with some other good PRNG. (Fortuna,
maybe. Is there a combine operation better than xor? See also SHA3
if the SHA3 winner is one that specifies a PRNG mode of
operation.)
* We should consider splicing this combined-stream PRNG into OpenSSL
as the RNG it uses for SSL and key generation.
* We should re-seed the RNG before and after very sensitive
operations, like private key generation.
1.1. ECC notes
ECC is the brave new[*] crypto of the future! It's faster[**] than
doing crypto in Z_n (as we do for RSA and DH now) for equivalent
levels of security, and the resulting outputs are much shorter.
As near as I can tell as a layman, Certicom is muddying the waters as
much as possible wrt claiming that it's nigh-impractical to deploy ECC
without licensing their patents. This is rather like the silliness
that PKP used to pull back in the day, where they claimed that their
patents covered not only the existing public key cryptography
algorithms, but also the very idea of public key cryptography itself.
DJB claims that for every patent he's aware of, either that patent
doesn't cover his code, or that patent is invalid because of prior
art. I'm not going to try to evaluate these claims, since I'm not
supposed to be reading patents for typical "let's avoid the appearance
of knowing infringement" reasons. But before we dive into the world
of ECC, we should see if we can ask any friendly patent attorneys and
ECC experts for a second or third opinion here.
I note in passing that nearly all of the patents that DJB mentions in
his list would appear to expire over the next 12 months or so.
Additionally, there are ECC groups out there less fast than DJB's, but
more widely available and analyzed. We should consider some of those
too.
One final issue to investigate is whether using these algorithms will
make any major free software distribution decide not to include us. I
seem to recall seeing that one or two of the big ones had at one point
decided to ship OpenSSL only with ECC disabled, either because of real
patent concerns, or because of an opinion that the Certicom license
for ECC use in TLS was problematic for free software, or something
like that. We should check that out.
[*] Actually, it's older than onion routing, and older than some
members of the Tor Project.
[**] Actually, because of the common practice of choosing a small-ish
prime value (65537) for e in RSA, RSA public key operations can be a
little faster than equivalent-security ECDH or ECDSA operations. The
private key operations in RSA are still much much slower.
2. New identities
Identity keys and their fingerprints are used:
- To sign router descriptors.
- To identify nodes in consensus directories.
- To make sure we're talking to the right node in the link handshake.
- To make sure that the extending node is talking to the right next
node when sending an extend cell.
- To identify particular nodes in the hidden service subsystem.
- To identify nodes in the UI in various places.
- Internally, to identify a node uniquely in the codebase.
- To determine which part of the circuit ID space to use on a Tor
instance's links.
2.1. New identities, option 1: "RSA>1024, slow migration"
In this option, we use RSA for identity keys indefinitely. Nearly all
operations done with an identity key are signature checking; signing
happens only a few times an hour per node even with pathological
cases. Since signature checking is really cheap with RSA, there's no
speed advantage for ECC here. (There is a space advantage, since the
keys are much smaller.)
The easiest way to migrate to longer identity keys is to tell all Tors
to begin accepting longer identity keys now, and to tweak all our
protocols so that longer RSA identity keys are understood. We should
then have a pair of parameters in the consensus that determines the
largest and smallest acceptable identity key size in the network.
Clients and servers should reject any keys longer or shorter than
specified. Once all versions of Tor can accept long identity keys, we
raise the maximum size from 1024 to somewhere in the 2048-4096 range.
2.2. New identities option 2: "RSA>1024, faster migration"
In this option, we use RSA for identity keys indefinitely as above.
But we allow nodes to begin having longer identities now, even though
older Tors won't understand them. This implies, of course, that every
such node needs to have at least 2 identities: one RSA1024 identity
for backward compatibility, one RSA>1024 identity for more secure
identification.
We would have these identities cross-certify as follows: All keys
would be listed in the router descriptor. RSA>1024 keys would be
called something other than identity-key, so as not to confuse older
clients. A signature with the RSA>1024 key would appear right before
the current RSA1024 signature. This way, signed material would
include both keys, and would be signed by both keys.
[In other words, descriptors would look something like:
router foo...
...
identity-key
-----BEGIN RSA KEY-----
1024-bit RSA key here
-----END RSA KEY-----
ext-identity-key
-----BEGIN RSA KEY-----
3072-bit RSA key here
-----END RSA KEY-----
...
ext-signature
-----BEGIN SIGNATURE-----
signature of everything through "ext-signature\n",
using the long key
-----END SIGNATURE-----
router-signature
-----BEGIN SIGNATURE-----
signature of everything through "router-signature\n",
using the short key
-----END SIGNATURE-----
]
See "UI notes" in the "new fingerprints" section below for some of the
implications of letting nodes have multiple identity keys.
We'll need to advertise these new identities in consensus directories
too; see 4.2 below for more info there.
2.3. New identities option 3: "RSA>1024 and/or Ed25519, faster migration"
As in option 2 above, but new keys can also be Ed25519. If we expect
that not all installations will allow Ed25519 (see "ECC Notes",
section 1.1), we'll need to say that every server with an Ed25519 key
must also have an RSA>1024 key.
2.4. Implications for current use of identity keys
Let's review our use of identity keys again and make sure that we can
handle all of them with the ideas above.
- To sign router descriptors.
We discussed this in 2.2.
- To make sure we're talking to the right node in the link handshake.
The current v3 link handshake can handle presenting multiple identity
certificates in the CERT cell. We should consider ourselves to be
connected to a node with identity X if _any_ of the identity
certificates that it presents in its authenticated CERT cell has
identity X. To handle EXTEND cells correctly, we should verify every
identity we can.
- To make sure that the extending node is talking to the right next node
when sending an extend cell.
The new extend cell format needs to allow the client to tell the
extending node about some identity for the destination node that the
extending node will be able to understand. This is a capability of
the extending node that the client needs to be able to check. (Also,
the extend cell needs to hash that identity in a form the extending
node can understand, but that's a fingerprint issue.)
- To determine which part of the circuit ID space to use on a Tor
instance's links.
We can continue to use RSA1024 identity key comparison here by
default. We can also use some other parameter of the v3 handshake, or
introduce a new link protocol where if the initiator authenticates,
the initiator always gets the low circIDs and the responder always
gets the high ones.
- To identify nodes in consensus directories.
- To identify nodes in the UI in various places.
- Internally, to identify a node uniquely in the codebase.
See sections 3 and 4 below.
- To identify particular nodes in the hidden service subsystem.
Out of scope.
2.5. Migrating away from short ID keys entirely
Eventually, no version of Tor that requires 1024-bit identity keys will
remain. When that happens, we should stop using them entirely. That
means that if we take any path other than the "slow migration" path of
2.1, we'll need to make everything that looks at a node's identity
also accept nodes with _only_ a RSA>1024/Ed25519 identity.
At the directory service level, we should have an option to allow
nodes without RSA1024 identity keys (off until all clients and nodes
accept new identity keys).
2.6. Selective correctness attacks
For any scheme based on having multiple signature types on a router
descriptor or other document, an attacker could mount a partitioning
attack by making a document which older clients will accept but newer
clients will reject.
It's easy to prevent this at the consensus step: directory authorities
MUST NOT accept any descriptor unless all clients will be able to
verify it.
For bridge descriptors, we need to investigate more carefully.
3. New fingerprints
Right now we compute fingerprints by taking the SHA1 hash of an ASN1
encoding of the RSA1024 identity key. We encode this in hex almost
everywhere, and sometimes prefix it with a $.
I propose that fingerprints of the future be determined by taking a
digest using SHA256 or SHA3 of:
"Hash Algorithm Name", "Key Type Name", encoded key
When representing these internally, we should include the hash
algorithm that was used. When representing them in the UI, we should
use the notation %b64, where b64 is a base-64 encoding, omitting the
trailing =s.
(Other plausible characters to use are @, ?, +, ~, =, etc. I like %,
but can be persuaded. Bikeshed bikeshed bikeshed.)
Since 43 base-64 characters is enough to represent a 256-bit digest,
with 2 bits left over, I propose that the b64 value encode
hh | D(hash algorithm name, key type, encoded key)
where hh is a 2-bit value, with one of the following values:
00 -- sha256
01 -- sha3
10 -- to be determined
11 -- reserved.
We should investigate in the interface whether it's plausible to allow
a prefix of a node ID where the full ID would otherwise be required.
That seems risky for short prefixes, though.
3.1. How many fingerprints is that anyway?!
Suppose that we allow sha256 and sha3 as hash algorithms, and we allow
each node to have 3 identity keys: one RSA1024, one RSA>1024, and one
ECC. Then we would have 7 fingerprints (6 plus the legacy
SHA1(RSA1024) fingerprint), for a total of 20+6*32==212 bytes per
node.
It's not a horrible problem to accept them all in the UI, but the UI
isn't the only place that needs to know fingerprints. Instead, let's
say that RSA1024 identities are only identified with SHA1 hashes.
This limits our fingerprint load to a more manageable 20+32*2 == 84
bytes per node. Still not great, though.
3.2. What does this imply for the UI?
In the UI we'll lose the property that no node has more than one
fingerprint: I do not believe that this actually hurts us.
3.3. Implications for directory information
Clients must know a hash for each node's identity key, or else they
can't make an authenticated connection to the node or tell ORs how to
extend to the node.
This means that if client Alice wants to connect to node Bob, Alice
must have a fingerprint of Bob's ID key such that she understands the
ID key type and the fingerprint algorithm. If Alice wants to extend
from Bob to Carol, she must have a fingerprint of Carol's ID key such
that Bob understands the ID key type and the fingerprint algorithm.
So for every node, Alice must not only know a fingerprint that *she*
can use for that node, but also a set of fingerprints such that every
node can understand at least one fingerprint in the set.
This implies a proliferation of fingerprints! We should tread
carefully here. To prevent proliferation, the easiest solution is not
to add too many new types and to have a good plan for retiring older
types.
3.4. Implications for EXTEND cells
As mentioned in 3.3, when a client Alice tells node Bob to extend
to node Carol, she needs to give Bob a fingerprint for Carol that Bob
will understand: one where Bob understands the digest algorithm, and
understands the identity key type.
There are two ways we can do this:
1) Alice's EXTEND cell contains every fingerprint for Carol that
Alice knows about. Bob treats the cell as valid if every one he
can verify is correct.
2) Alice knows which fingerprint types Bob understands (either via
his version, or something else in his directory info). She
selects a fingerprint for Carol using the best one of these
types.
The first seems more robust to me, if we have space for enough bytes.
If we proliferate too many types, though, we'll need to do the second.
4. Directory changes
4.1. Better cross-referencing
In some places, directory objects cross-reference one another by SHA1
hash. They should use a better hash algorithm instead.
This does make problems in a few cases.
Router descriptors and extrainfo descriptors:
One problematic case is in determining node families. If node A
and node B want to list each other as being in the same family,
they need to do so in a way that clients can interpret. That could
mean listing SHA1-RSA1024 fingerprints so old clients understand,
AND new fingerprints for security. (But *that* could create
interesting partitioning attacks wherein your family looks
different depending on who's looking.)
Solution: we need to move the responsibility for combining node
families into the consensus voting process, so clients don't
need to understand the cross-reference types themselves.
Another case is in certifying extrainfo documents from descriptors.
For that, we can list multiple extrainfo digests, either on the
extrainfo line, or on additional lines.
Voting and consensus documents:
Adding more fingerprints in votes isn't a problem; votes are a tiny
fraction of authority bw usage. Adding more hashes is easy.
For consensus documents, we ought to have flavors that you can
download depending on what set of fingerprint types you
understand.
For integrity purposes, consensuses can refer to microdescriptors
or descriptors by any digest type that the client understands. But
for downloading purposes, the digest type must be one that
directory caches also support: see 4.4.
4.2. More fingerprints
Because extending from node A to node B requires that we have node B's
fingerprint in a way that node A will understand, it is not enough to
get a set of identity fingerprints for each node in the format that
the client likes best -- see 3.3 and 3.4 above. So every flavor of
consensus we serve needs to include a node identity in a format the
client understands, and node identities in formats such that every
node will understand at least one.
4.3. An option: compound signatures on directory objects
In Tor 0.2.2.x and later, when we check a signature on a directory
object (not including hidden service descriptors), we only look at
the first DIGEST_LEN bytes of the RSA-signed data. Once 0.2.1.x is
obsolete, or on any types of signatures not checked in 0.2.1.x, we
can use the rest of the space. (We're using PKCS1 padding on our
signatures, which has an overhead of 11 bytes. Signing a SHA1 hash
with a 1024-bit key therefore leaves 128-11-20==97 more bytes we
could use for a SHA2 or a SHA3 hash.)
4.4. Downloading by digest
We should have directory caches support downloading objects by more
hash types. Right now, descriptors are downloaded by their SHA1
hashes and microdescriptors by their SHA256 hashes. This is okay for
now, but once SHA3 is out, we should support downloading all of these
by SHA3 digest.
5. Link crypto changes
Currently we use TLS. That's fine.
We should however look to longer link keys, bigger DH groups, etc.
Once TLS versions 1.1/1.2 are available in OpenSSL, we should move to
use them, I think. We should also look into how quickly we can
deprecate TLS 1.0 and SSL <= 3 usage.
6. Relay crypto changes
There are a few things we might want out of improved relay crypto.
They include:
- Resistance to end-to-end bitwise tagging attacks.
- Better resistance to malleability.
- If using counter mode, no block-cipher operations on any value
known to the attacker.
I'll try to provide these in increasing order of difficulty. None of
these is necessarily correct; I should look for a security proof or a
better construction for any that we seem likely to use.
Rationales: Our existing malleability resistance is a kludge. Doing
no block-cipher ops on attacker-known values increases our security
margins a little. Our arguments about tagging attacks hold that an
attacker who controls both ends has plenty of ways to win even if
tagging attacks are foiled; nonetheless, most of these ways are
technically slightly more difficult than xor-based tagging, and it
could be useful to boost our defense-in-depth a little bit, just in
case other active end-to-end attacks turn out to be harder than we'd
thought.
6.1. Option 1: Use AES-CTR in a less scary mode
When doing key expansion, in addition to establishing Kf, Kb, Df, and
Db, also establish IVf and IVb. Use the current relay crypto, except
instead of starting the counters at 0, start them at IVf and IVb.
This way, an attacker doesn't have any known plaintexts to work with,
which makes AES a little more robust.
6.2. Option 2: As 1, but tagging attacks garble the circuit after one block.
Keep an HMAC of all previously received encrypted cells on a circuit.
When decrypting a cell, use this HMAC value to determine the first 64
bits of the counter; increment the low 64 bits of the counter as
usual.
This way, if an adversary flips any bits before passing the stream
through an honest node, no _subsequent_ block will be recoverable.
To prevent any part of the stream from being re-used, close any
circuit if the low 64 bits of the counter would ever wrap (that is,
around 295 million terabytes).
(If we're using a stream cipher with fast re-key, then we can just
have the key used for each block be an HMAC of all previously
received ciphertext.)
6.3. Option 3: As 1, but tagging attacks garble the circuit in the same block.
Use a large-block cipher mode, such as BEAR or LIONESS (depending on
whether we need a PRP or SPRP). Base the key material for each block
on an HMAC of all previous blocks' ciphertexts.
This way, if an adversary makes any alteration in a block, that block
and all subsequent blocks will be garbled. It's more expensive than
2, though, especially if we need to use a LIONESS construction.
{I considered IGE here, with a trick where odd-numbered nodes on a
circuit start from the front of the block and even-numbered nodes
start from the end, but it didn't seem much better. We should
investigate relative performance, though.}
6.4. Option 4: Shall we have middle nodes be able to fast-stop bad data?
In all the above options, if a cell is altered, the middle node can
at best turn that cell and the rest of the cells on the circuit into
garbage, which the last node won't deliver (if honest) or can't
deliver (if dishonest).
Might we prefer to do as in mixnets, and have nodes kill circuits
upon receiving altered cells?
It's not such an obvious improvement. Including more MACs is more
expensive in per-cell overhead. The attacks that we would foil this
way but not with Option 3 are not so much better than the the passive or
timing-based-active end-to-end attacks that would still remain.
Consider that if option 3 is in place, an end-to-end attacker who
wants to do a tagging attack at one node can garble the rest of the
circuit and see if the output is garbled at the exit node. But such
an attacker could just as easily close the circuit at one of those
nodes and watch for a corresponding close event, or even better --
simply pause traffic on that circuit for a while and watch for a
corresponding gap at the exit. The only advantage of the garbling
attack would be that garbled cells are presumably rarer than circuit
closes or traffic pauses, and thus easier to use to distinguish
target circuits. But that's still questionable: the other attacks
win fine, and the pause attack doesn't risk detection as much.
So why might we want to do this? First, the overhead doesn't need to
be as bad as you might first expect (see below). Second, it would be
nice to increase the security margin as much as possible: "attacks
only get better".
So let's figure out how it would look.
To do this one, we'd want to have outgoing and incoming circuits
treated differently. Incoming cells would get decrypted as in 1
above, except that we'd have a MAC on them. For outgoing cells,
each node would check that the first N bytes of the cell
match a MAC of all data seen so far, *including the rest of the
cell*. They'd then remove the first N bytes, re-pad the cell
with bytes from a PRNG, and decrypt the resulting re-padded cell.
(This is basically how mixmaster works, and how mixminion works in
the common case.)
The space overhead here is kind of large: N bits per cell per node.
In the most paranoid case, if we used 256-bit HMACs on 3-node paths,
that's 96 bytes per cell, which is more than 20% of the total length.
But we can probably do better if we let the CREATE operation also
tell the node some N to check. For example, the first node doesn't
need to check any bits. The second and third nodes could check 64
bits apiece; that only has 16 bytes overhead total, and high
probability of catching any changes. (Birthday attacks don't matter
here, and an attacker who mounts this attack for long enough to
accidentally find a 64-bit MAC will break so many circuits in the
process as to become totally unreliable.)
All of this leaks the path lengths and position on the path to
various nodes. We might open ourselves up to partitioning attacks if
different clients choose different numbers of bits. What's more, we
might leak the length of the path to the last node by how much junk
there is at the end of the cell. So we'd need to be careful!
Here's a simple construction for this format, to be concrete:
The CREATE operation's KDF produces the following outputs:
Kf, IVf (stream cipher key and IV for forward direction)
Kb, IVb (stream cipher key and IV for reverse direction)
Mf (MAC key for forward direction)
Mb (MAC key for reverse direction)
SEEDf (PRNG key for forward direction)
And it also sets the following user-selected parameter:
MACBYTESf (an integer between 0 and 32 inclusive)
MACBYTESb (an integer between 0 and 32 inclusive)
CANEXIT (boolean: can we exit from this hop?)
Let Kf[i], Mf[i], etc denote the parameter Kf, Mf, etc as shared
between the client and the i'th node in its circuit.
Relay cells sent towards the client have the following plaintext
format:
Body:
Content:
Relay Command [1 byte]
StreamID [2 bytes]
Length [2 bytes]
Data [Up to CELL_DATA_LEN-5-MACBYTESb bytes]
Padding [randomly generated as needed to fill the
cell]
MAC(All previous encrypted content + encrypted content,
Mb)[:MACBYTESb] [MACBYTESb bytes]
The originator of the client-bound cell encrypts the content with
the next part of its Kb,IVb stream, then appends the MAC.
Non-clients receiving a client-bound relay cell encrypt the entire
cell body, MAC included, with the next part of the stream cipher
that was keyed with Kb,IVb.
When the client receives a relay cell body, it iteratively does:
For node i in circuit from 1..N:
Let cells_i = all previous cells which we previously decided
were from node i, or relayed by node i,
and let cellbody = the body of the cell, except for the last
MACBYTESb[i] bytes,
and let cellmac = the last MACBYTESb[i] bytes of this cell.
If cellmac is nonempty, check wither cellmac = mac_received,
where mac_received is the first MACBYTESb[i] bytes of
MAC(cells_i | cellbody, Mb[i]). If so, this cell is from node
i.
If this cell is from node i, add cellbody to cells_i, then
decrypt cellbody using the stream keyed with Kb[i],IVb[i].
Act on it as a relay cell.
Otherwise add the entire cell to cells_i, and decrypt it, MAC
included, with the stream keyed with Kb[i], IVb[i].
If no node sent this cell: it's junk and somebody is probably
messing with us! Destroy the circuit.
When the client *sends* a cell outbound to node N:
Let cells[i] start at "" for all i in 1...N initially, and get
updated as below.
Let MACLEN = SUM(MACBYTESf[1...N])
Let Body =
Relay Command [1 byte]
StreamID [2 bytes]
Length [2 bytes]
Data [Up to CELL_DATA_LEN-5-MACLEN bytes]
Padding [randomly generated,
CELL_DATA_LEN-5-MACLEN-len(Data) bytes]
Let PAD[i] = the next MACBYTESf[i] bytes from the PRNG keyed
with SEEDf[i], for i in 1...N
Let STREAM[i] = the next CELL_DATA_LEN bytes of
the stream keyed by Kf[i],IV[i], for i in 1...N
Let PADSEEN[1] == ""
For i in 2...N:
Let L = len(PADSEEN[i-1]) + len(PAD[i-1])
Let PADSEEN[i] = (PADSEEN[i-1] | PAD[i-1]) xor
STREAM[i-1][CELL_DATA_LEN-L:]
For i in N down to 1:
Let Encbody = Body xor STREAM[i][:len(Body)]
Let extra = "RECOGNIZED" if i == N, "OK" otherwise
Let cells[i] = cells[i] | Body | PADSEEN[i]
Let M = MAC(cells[i] | extra , Mf[i])
Let Body = M[:MACBYTESf[i]] | EncBody
To receive an outbound cell:
Let M be the first MACBYTESf bytes of the cell, let REST be the
rest of the cell, and let "cells" be all previous cells on this
circuit. If CANEXIT, and M = MAC(cells|rest|"RECOGNIZED",
Mb)[:MACBYTESf], and MACBYTESf > 0, this cell is for us. If M
= MAC(cells|rest|"OK", Mb)[:MACBYTESf], this cell is not for
us, but is valid. Otherwise, destroy the circuit.
Let PAD = the next MACBYTESf[i] bytes of the PRNG keyed with
SEEDf, and decrypt REST | PAD using the stream cipher keyed
with Kf,IVf. If this cell is for us, act on it as a relay
cell. Otherwise, relay it.
ANOTHER VARIANT:
If we restrict MACBYTESf values to range 0..HL/2, where HL is the
length of the MAC output, we can replace
MAC(x | "RECOGNIZED")[:MACBYTESf] and MAC(x | "OK")[:MACBYTESf]
with
MAC(x)[:MACBYTESf] and MAC(x)[HL-MACBYTESf:]
PICKING MACBYTESf,MACBYTESb.
We don't need to worry about birthday attacks:
Because we're using a MAC, only the parties who are making
the MACs could try to do a brute-force search for a
collision, but they have no reason to do so.
If a collision occurs accidentally, an adversary can't
substitute an earlier-seen cell for a later one with the
same MAC, since the MAC covers not only the cell, but all
previous cells on the circuit.
So 16 bytes is about the most we should ever do, given our
usual security parameters. Let me moot the number 8 for
MACBYTESb.
For outbound cells, for any hop we can exit from, choosing
MACBYTESf=6 gets us the current security level. For the first
hop, assuming we don't exit from it, choosing MACBYTESf=0 is
totally safe, since the link crypto guarantees that nothing was
corrupted on the way.
In general, to prevent an end-to-end tagging attack, it seems
sufficient to do something like setting MACBYTES=8 for the last
hop, and MACBYTES=8 for one hop in the middle.
OTHER VARIANTS:
Can we combine this approach with one of the approaches in 2 or
3 above to ensure that if corrupt data passes (because of our
use of truncated HMACs) it still corrupts the stream?
Can/should we use GCM or something here instead of separate
encrypt/hmac operations? It doesn't seem that GCM per se would
apply without some tweaking, which we probably do not have the
expertise to do.
OVERHEAD NOTES:
When computing additional overhead with this method, note that
it lets us replace the old 4 byte "digest" field and the 2 byte
"recognized" field.
I note in passing that we need at most 9 bits for the length
field, and at most 6 bits for the command field, yet we're using a
total of 3 bytes for those 15 bits. That's an opportunity to
save another byte.
ACKS
Lots of the good ideas and concerns here are due to Robert Ransom.
Michael Stone helped some with "relay option 4" above.
A few pointers. On the happy "yay we can use it freely" side, we have:
* RFC 6090
Pure gold! A catalog of ECC techniques which were published so long
ago that they cannot still be under patent. It includes digital
signatures and Diffie-Hellman.
* DJB's page stating that certain patents don't apply to Curve25519
(which you mentioned): http://cr.yp.to/ecdh/patents.html
* Jack Lloyd's statement that he avoided techniques newer than about
1990 in Botan: https://bugzilla.redhat.com/show_bug.cgi?id=615372#c19
On the sad "no we can't use it" side, we have:
* Fedora's insistence on removing all elliptic curve cryptography from
their Linux distribution and refusing to answer questions about why:
https://bugzilla.redhat.com/show_bug.cgi?id=615372
Regards,
Zooko
On Tue, Nov 1, 2011 at 4:40 PM, Marsh Ray <marsh(a)extendedsubset.com> wrote:
>
> On 11/01/2011 03:06 PM, Watson Ladd wrote:
>>
>> GCM is a Wegman-Carter authenticator with polynomial evaluation in
>> GF2^128 as the universal hash and AES as the encryption. As NIST
>> pointed out, neither of those papers had anything to say about the
>> actual security of GCM: the security claims in the GCM paper are still
>> valid. The Microsoft paper is simply irrelevant. 2^{n-k} looks scary,
>> but messages are 2^k blocks long, where a block is 128 bits. And the
>> GCM paper limited the length of messages to 2^36 bytes. The Joux paper
>> was a threat to a misfeature: Theorem 1 still holds from the GCM paper
>> (http://eprint.iacr.org/2004/193.pdf). Its just a matter of
>> remembering that unique means unique, and that supporting variable
>> length IVs by padding changes the meaning of the word unique.
>
> But those are exactly the kinds of details that get overlooked time and time
> again. Whether or not you consider it a "real attack", a little bit of
> controversy can help raise the important design constraints out of footnote
> status.
Definition of attack: something that breaks a security claim.
Microsoft breaks a claim the GCM designers never made, Joux very
cleverly breaks a part of the standard that their proof doesn't cover.
The designers also didn't claim anything near the real security.
(http://cr.yp.to/antiforgery/securitywcs-20050227.pdf is a tighter
proof).
True, NIST should put the claims front and center in the document,
especially when they have such a nice form.
But it is the responsibility of implementators of cryptography to
exhaustively document what their security claims
are. Hiding behind NIST won't save your data if you do something stupid.
>
>> I don't
>> think stupidity by standards designers is a strike against what they
>> are standardizing.
>
> :-)
>
> Everyone except fools and specialized cryptographers look to the recognized
> experts to guide them. NIST's function is to provide expert, conservative
> guidance on these topics, and they're pretty well regarded by engineers and
> others who choose which standard to use. By following NIST, at the very
> least it's not your fault if it turns out to be broken.
NIST is standardizing cryptographic algorithms for government use.
They make tradeoffs in
their algorithms the same as any other user: speed vs. security.
Blindly implementing standards
is a way to fall into a trap. DES is a standard, Blowfish isn't as
much. Which would you rather use?
Furthermore, NIST didn't write the draft that the comments from
Ferguson were about. As the response
indicates GCM could be secure if you made the right parameter choices.
Its not substantially different from
RSA in that regard. Now, why the options, why q=2^128 instead of some
prime? Because NIST wanted to support
high speed encryption of fiber optic links in hardware, and Galois
fields are easy in hardware. This is the
story I get from the comments on the standard. If you are working in
software, I would suggest something else.
>
> If, as you suggest, NIST can't get this unambiguously correct the first time
> then in peoples' minds it naturally raises the question of the whole scheme
> maybe being too complicated in practice.
The standard is complicated But the problem is the picking the IV. I
have no idea why NIST decided to use 96 bit IVs but fix the first 32
bits, so you are left with a 64 bit counter, then support longer ones
through hashing, and then ultimately have to backtrack and support
only 64 bit counters as nonces. But GCM is simple: it is polynomial
evaluation, and encryption of the resulting value. The NSA has had
these incidents before: ECMQV used to be
in Suit B, before unknown key share attacks were published against it.
The NIST P-256 was fast on 32 bit
machines, but put a reflection right in the middle of a word on 64 bit
machines. Etc, etc,. I'm sure there are
many we never hear about.
More fundamentally implementing a system such as Tor requires making
choices about how standards are used
and fit together. It is the rare (and wonderful!) standard that has no
options. Making those choices requires knowledge of the security
offered by the standard. It is never going to be possible to rely on
guidance alone.
Implementors who implement AES with giant tables are in for a surprise
when they run on cloud machines.
Hashing a password is pointless when that hash is a hash function:
checking all 10 character passwords takes about a week on a nice
graphics card. Understanding of cryptography is required even if
standards exist.
Nonces are required for protection against replay attacks. If nonces
are unique, GCM is secure and pretty fast.
Poly1305 is faster. HMAC does not have as high a margin of security
proven, and when a hash function breaks
so does HMAC. This is a good case for GCM or Poly1305 in Tor. Right
now we have nothing: cells are encrypted
with AES in counter mode only.
In order to do something constructive I might go through Torspec and
look at how things are used and what we
rely on them for. This might give us a better idea of what the requirements are.
>
>> KDF!=turn a password into a bunch of bits. The KDF Tor uses is used to
>> turn the result of a DH negotiation into keying material for several
>> different algorithms. The KDF we need just needs to take some
>> distribution to something random looking. We have enough randomness in
>> the initial DH negotiation to not have to worry about slowing down
>> brute force attacks.
>
> So you could almost use MD5(counter + DH material) for that :-)
You can indeed. Or any stream cipher really. (Provided it doesn't do
very bad things)
>
>> Where we use hashing to protect a password we
>> should use scrypt or similar functions.
>
> I didn't know if passwords was a use case under discussion. PBKDFs has been
> a hot topic lately and was just ranting in general.
>
> Thanks,
>
> - Marsh
>
Sincerely,
Watson Ladd
--
"Those who would give up Essential Liberty to purchase a little
Temporary Safety deserve neither Liberty nor Safety."
-- Benjamin Franklin
in reference to
https://lists.torproject.org/pipermail/tor-dev/2011-November/002999.html
We're doing a development effort and/or thought experiment in the
Tahoe-LAFS project. This development effort and/or thought experiment
is called "One Hundred Year Cryptography" [1].
The main result of the experiment so far is that you don't need to
worry very much future-proofing your digital signatures,
collision-resistant hashes, or MACs, but you *should* worry about
future-proofing your Diffie-Hellmans and ciphers!
Once you deploy a future version of Tor which no longer accepts
old-style signatures, then it will no longer matter to your users if
someone subsequently figures out how to forge old-style signatures
(for example by using quantum computers). On the other hand, once you
deploy a future version of Tor that uses a newer cipher, your users
remain vulnerable to an attacker who later breaks the older cipher and
reads their older plaintexts.
Combining two ciphers as you suggested by XORing their output (and
keying them independently—perhaps by deriving two subkeys from a
master key using a strong KDF) seems to be a safe and efficient way to
protect against this sort of threat and doesn't seem to impose too
high of a burden of complexity. That's what we intend to do in
Tahoe-LAFS. A Google Summer of Code student, Yu Xue, contributed some
code to generate subkeys and XOR the streams, but we haven't yet
integrated it into Tahoe-LAFS itself. (I intend to do that in the
release cycle that just opened up, so within the next 4 or so months.)
I haven't thought very much about future-proofing Diffie-Hellman
operations. That sounds like it might be more complex. I *have*
thought about how to combine two different digital signatures or
public key encryption algorithms. Those don't seem too dauntingly
complex, but like I said not so urgent.
I'm leaning toward combining AES-256 with XSalsa20. Other interesting
options include ChaCha (used in SHA-3 finalist Blake, and very similar
to Salsa20) and Threefish (used in SHA-3 finalist Skein).
Regards,
Zooko
[1] https://tahoe-lafs.org/trac/tahoe-lafs/wiki/OneHundredYearCryptography