[tor-talk] Fwd: [Full-disclosure] tor vulnerabilities?
nickm at alum.mit.edu
Sat Jun 29 21:53:57 UTC 2013
On Sat, Jun 29, 2013 at 4:43 PM, Cool Hand Luke
<coolhandluke at coolhandluke.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> the below text was posted to pastebin.com (see original e-mail to the
> full-disclosure list at the end of this message).
> - ----- BEGIN PASTEBIN -----
> Tor LOL:
> directory authorities are the point of contact for clients to locate
> relays/exit nodes/guard nodes/etc. This is determined by a consensus
> document that goes through an elaborate process to ensure its integrity
> and cause bad directory authorities to be identified also via consensus.
> However, Tor developers are not the quickest lot, and this is basically
> the only document that they serve that has integrity control on it. Most
> interestingly, the public keys for every other node in the network is
> served without any form of signature or other form of integrity control.
> As such, a rogue directory authority, which anyone can be simply with a
> configuration option and an IP, can introduce path bias and other such
> tricks by serving the wrong keys for relays/guards/exits that it doesnt
> control. This can result in essentially directing clients through the
> network by causing decryption failures, thereby allowing determination
> of the source and end-point of a given tor connection with little more
> than a couple relays and some rogue directory authorities. Moreover, it
> can use the simple-minded metrics made to identify rogue guard nodes and
> couple that together with the behavior of public key cryptography to
> actually cause legitimate guard nodes to be flagged as having excessive
> extend cell failures causing it ultimately to be marked as bad.
I think this guy is confused. I tried to tell him as much when he
twittered at me last night; you can see more or less the full record
if you look at the @nickm_tors from last night.
tl;dr: relay onion keys are indeed authenticated by the consensus
document. On discussion, it appears that the guy thinks we aren't
actually authenticating them, though. He posted
http://i.imgur.com/uVQTKlT.png to try to explain what he has in mind.
The attack doesn't work, though, as far as I can tell. Here's what I
started writing up about it.
Some preliminary notes to clear up:
- Being in the microdescriptor cache (the one implemented in
microdesc.c as microdesc_map) is not sufficient for a
microdescriptor to actually get *used*. It has to be linked to a
node_t. The function that does that is the one in nodelist.c. More
on this below.
- Directory authorities and directory mirrors are different.
Directory authorities are a closed set, whose public keys are
distributed with the source. Anybody can be a _directory mirror_
simply with a configuration option and an IP.
- There are indeed three paths to the microdescs_add_to_cache()
function. One of them (in directory.c, not dirserv.c), passes a
list "which" of the microdescriptor digests we requested
microdescriptors for. The other two don't. But those are the ones
that are reading microdescriptors from disk, so those
microdescriptors were already checked on a previous run of the
program. (Also, adding them to microdesc_map is harmless; see
- Note that a corrupt directory mirror could try to influence path
selection, by simply not answering requests for some nodes'
microdescriptors, and pretending not to have them. I'll call this
the "response filtering" attack. (Note also that it has nothing to
do with cryptographic verification.) To resist it:
* When clients want a directory resource, and they don't receive
it, they request it from other directory mirrors until they
do get it.
* Clients don't build client circuits until they have
information for a sufficient fraction of the nodes in the
network, as calculated in nodelist.c,
So unless the "response filtering" attacker controls all the
directory mirrors that the client uses, they can't prevent the user
from learning microdescriptors for all the nodes they want. And if
they temporarily prevent the user from learning a given descriptor,
the extent to which they can distort the user's view of the network
is limited by the minimum_dir_info check.
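To make the second defense concrete, here is a minimal sketch of a
"do we have enough directory info yet?" check. It is not the code in
nodelist.c: the function name, the 3/4 threshold, and the plain node
counting are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold (3/4). The real value and the real computation
 * live in nodelist.c and are not reproduced here. */
#define SKETCH_MIN_FRACTION_NUM 3
#define SKETCH_MIN_FRACTION_DEN 4

/* Return true if we hold descriptors for at least 3/4 of the nodes
 * listed in the consensus. */
bool sketch_have_enough_dir_info(size_t n_listed, size_t n_have)
{
  /* We can't hold usable descriptors for more nodes than are listed. */
  assert(n_have <= n_listed);
  if (n_listed == 0)
    return false; /* No consensus entries: nothing to build circuits with. */
  /* Compare n_have / n_listed against NUM / DEN without dividing. */
  return n_have * SKETCH_MIN_FRACTION_DEN >=
         n_listed * SKETCH_MIN_FRACTION_NUM;
}
```

With a check of this shape, a mirror that withholds a few descriptors
changes nothing, and one that withholds many just delays circuit
building until the client has fetched them elsewhere.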
Okay, so let's walk through the code.
Here's what's *supposed* to happen.
The client decides to make a request for microdescriptors. This
happens in update_microdesc_downloads, where they call
microdesc_list_missing_digest256 to get a list of the
microdescriptor digests listed in the microdesc-flavor consensus
such that the client does not have and is not already trying to
fetch a microdescriptor with that digest. The client passes this
list to launch_descriptor_downloads, which actually does the work of
sending the requests to one or more directory mirrors. The list of
microdescriptor digests requested is encoded in the
"requested_resource" field of the directory connection.
The directory mirror responds with a buffer, which the client hopes
will contain microdescriptors with those digests. In directory.c,
the client reconstructs the list of which digests it asked for (by
calling dir_split_resource_into_fingerprints) and passes that list
of requested digests, along with the directory's response, to
In microdescs_add_to_cache, the client first calls
microdescs_parse_from_string. Now "descriptors" contains a list of
the received microdescriptors. For every microdescriptor,
md->digest is a digest of all of its textual contents.
Then, it makes sure that the directory did not send it any
microdescriptors it hadn't asked for. It does this by using a
temporary map, "requested". It initializes requested as mapping D
to 1 for every digest in requested_digests256. It then iterates
over the microdescriptors. If a microdescriptor's digest is in
"requested", it sets the value in "requested" for that digest to 2,
indicating that the microdescriptor was found. If the
microdescriptor's digest is not in "requested", it frees the
microdescriptor, removes it from the "descriptors" list, and logs a
(The function then removes every digest corresponding to a received
microdescriptor from the 'requested_digests256' list, so that the
caller knows what it didn't receive.)
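The filtering step described above can be sketched roughly as follows.
This is a simplified stand-in, not the code from microdesc.c: the
sketch_* types and function are hypothetical, and it uses flat arrays
with a linear scan where the real code uses a map and a smartlist.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIGEST256_LEN 32

/* Hypothetical stand-in for a parsed microdescriptor. */
typedef struct {
  unsigned char digest[DIGEST256_LEN]; /* digest of the descriptor's text */
} sketch_md_t;

/* One entry of the temporary "requested" map. */
typedef struct {
  unsigned char digest[DIGEST256_LEN];
  int state; /* 1 = requested, 2 = requested and received */
} sketch_req_t;

/* Drop every received microdescriptor whose digest we did not ask for.
 * Survivors are compacted to the front of mds; returns how many remain. */
size_t sketch_drop_unrequested(sketch_md_t *mds, size_t n_mds,
                               sketch_req_t *requested, size_t n_requested)
{
  size_t kept = 0;
  assert(mds != NULL && requested != NULL);

  /* Initialize the map: every requested digest maps to 1. */
  for (size_t j = 0; j < n_requested; ++j)
    requested[j].state = 1;

  for (size_t i = 0; i < n_mds; ++i) {
    int found = 0;
    for (size_t j = 0; j < n_requested; ++j) {
      if (memcmp(mds[i].digest, requested[j].digest, DIGEST256_LEN) == 0) {
        requested[j].state = 2; /* mark as found */
        found = 1;
        break;
      }
    }
    if (found)
      mds[kept++] = mds[i]; /* keep it */
    /* Otherwise the microdescriptor is simply not carried forward. */
  }
  return kept;
}
```

After the call, any entry still in state 1 is a digest the mirror
failed to supply, which is what the caller needs in order to retry
elsewhere.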
Notice that at this point, it has not checked whether the
microdescriptors' digests match the digests listed for particular
nodes or not -- only that the client actually requested
microdescriptors with those digests. It hasn't even matched
microdescriptors up with nodes! That comes later.
Now we move on to microdescs_add_list_to_cache. Our job here is to
store the newly received microdescriptors to disk; to insert them
into microdesc_map, and finally pass them to
Before we pass the microdescriptors to nodelist_add_microdesc(), let's recap
where we are. The microdesc_map contains microdescriptors, indexed
by their digests. These are all microdescriptors that we read from
the disk cache from an earlier session, or ones we received in reply
to a request that we made to a directory mirror. They are not
yet associated with nodes. We have not yet checked that they still
match the consensus.
Now we get to nodelist_add_microdesc. This part is key. It looks
up, in the microdesc consensus, whether we have any routerstatus
whose listed microdescriptor digest (stored in its descriptor_digest
field) matches the digest of the microdescriptor we have received.
If so, it finds the corresponding node_t object, and associates the
microdescriptor with that node_t.
[Aside: if we already have a microdescriptor when we get a
new consensus, it gets associated with the node_t in the
nodelist_set_consensus function, where we look it up using
microdesc_cache_lookup_by_digest256 with the microdescriptor
digest listed in the consensus.]
Associating the microdescriptor with a node_t might seem like an
afterthought, but it's actually the security-critical part here.
When do we use an onion-key from a microdescriptor? When we extract
it in extend_info_from_node(). But that only looks at the
microdescriptor currently associated with a node by
nodelist_add_microdesc(). If the microdesc wasn't associated with a
node there, we wouldn't even find its onion key.
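Here is a minimal sketch of why the association step is the one that
matters. The types and function names are hypothetical stand-ins (the
real node_t, routerstatus, and extend_info code look different), and
strings stand in for identities and onion keys. What it shows is that a
key is only reachable through a node, and a node only gets a
microdescriptor whose digest equals the one the signed consensus lists
for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIGEST256_LEN 32

/* Hypothetical stand-in for a microdescriptor. */
typedef struct {
  unsigned char digest[DIGEST256_LEN]; /* digest of the descriptor text */
  const char *onion_key;               /* placeholder for the real key */
} sketch_md_t;

/* Hypothetical stand-in for a node plus its consensus entry. */
typedef struct {
  const char *identity;                   /* placeholder for the identity */
  unsigned char md_digest[DIGEST256_LEN]; /* listed in the signed consensus */
  const sketch_md_t *md;                  /* NULL until associated */
} sketch_node_t;

/* Attach md to the node whose consensus entry lists md's digest.
 * Returns that node, or NULL if no consensus entry lists the digest. */
sketch_node_t *sketch_add_microdesc(sketch_node_t *nodes, size_t n_nodes,
                                    const sketch_md_t *md)
{
  assert(nodes != NULL && md != NULL);
  for (size_t i = 0; i < n_nodes; ++i) {
    if (memcmp(nodes[i].md_digest, md->digest, DIGEST256_LEN) == 0) {
      nodes[i].md = md;
      return &nodes[i];
    }
  }
  return NULL; /* Not in the consensus: never linked, never used. */
}

/* The only way to reach an onion key is through a node's linked md. */
const char *sketch_onion_key_for_node(const sketch_node_t *node)
{
  assert(node != NULL);
  return node->md ? node->md->onion_key : NULL;
}
```

A microdescriptor that sits in the cache but matches no consensus entry
is never returned by the key lookup, no matter how it got there.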
So, what could go wrong?
1. Suppose that we start up with a microdescriptor cache that contains
some microdescriptors which aren't in the consensus.
In this case, microdescs_add_list_to_cache will indeed add them to
microdesc_map, indexed by their digests. But they won't get
associated with nodes, so they won't affect client behavior.
2. Suppose that the directory mirror sends the same (requested)
microdescriptor more than once in a given response.
In this case, the "md2 = HT_FIND(microdesc_map, &cache->map, md)"
check in microdescs_add_list_to_cache will make only one copy get
In any case, they will only get associated with nodes if they match
the digest listed for that node in the consensus.
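The duplicate case can be sketched like this. It is an illustration
only: the real cache is a hash table queried with HT_FIND, while this
hypothetical sketch_cache_t is a small fixed-size array with a linear
scan, and all sketch_* names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIGEST256_LEN 32
#define SKETCH_CACHE_CAP 16 /* arbitrary capacity for the sketch */

typedef struct {
  unsigned char digest[DIGEST256_LEN];
} sketch_md_t;

/* Hypothetical stand-in for the microdescriptor cache. */
typedef struct {
  sketch_md_t entries[SKETCH_CACHE_CAP];
  size_t n;
} sketch_cache_t;

/* Insert md unless an entry with the same digest is already cached.
 * Returns 1 if added, 0 if it was a duplicate, -1 if the cache is full. */
int sketch_cache_add(sketch_cache_t *cache, const sketch_md_t *md)
{
  assert(cache != NULL && md != NULL);
  for (size_t i = 0; i < cache->n; ++i) {
    if (memcmp(cache->entries[i].digest, md->digest, DIGEST256_LEN) == 0)
      return 0; /* Same digest already present: ignore the second copy. */
  }
  if (cache->n == SKETCH_CACHE_CAP)
    return -1;
  cache->entries[cache->n++] = *md;
  return 1;
}
```

Sending the same microdescriptor twice therefore leaves the cache in
the same state as sending it once.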
3. Suppose that the directory mirror sends some microdescriptors in the
directory response that did not have their digests listed in the
In that case, they'll not be found among the microdescs in
requested_digests256, and they'll be dropped.
Okay, now let's try the real thought experiment.
Suppose that according to the consensus, node N1 with identity ID1
has a microdescriptor M1 with digest D1, and node N2 with identity
ID2 has a microdescriptor M2 with digest D2, and so on. Suppose that
the client sees a consensus that lists D1 for ID1, D2 for ID2, and so
on. Suppose that the client requests D1...Dn. Suppose that the
directory mirror sends back ANYTHING OTHER THAN M1...Mn. What could
First off, any members of the response that are duplicates will get
dropped, and any whose digests don't appear in D1...Dn will get
dropped. It will be as if the directory mirror didn't send them at
all.
So the only stuff that will make it into the microdescriptor cache
will be microdescriptors whose digests match a subset of
D1...Dn. Assuming that SHA256 is collision-resistant, that means
that a subset of M1..Mn will make it in.
Can anything cause the client to associate M1 with a node other than
N1? No, since this association is done explicitly by the <ID1,D1>
mapping in the node's routerstatus in the signed consensus.
So the directory mirror can, at worst, cause the client to have a
subset of the answers it requested. This reduces to the "response
filtering" attack above, which has defenses.