Hi Tor devs,
It's surprisingly hard to work on Tor during the Tor developer meetings! My apologies for not publishing this text until now, despite my repeated ranting about the subject over the last few days.
Well, here it is, in an early draft version. Thank you to all who've listened patiently and given valuable feedback. I welcome more feedback from the list. Thanks in advance.
--8<---------------cut here---------------start------------->8---
Filename: xxx-tor-consensus-transparency.txt
Title: Tor Consensus Transparency
Author: Linus Nordberg
Created: 2014-06-28
Status: Draft
0. Introduction
WARNING!!! EARLY DRAFT -- MISSING IMPORTANT BITS AND PIECES!
This document describes how to provide and use public, append-only, untrusted logs containing Tor consensus documents, much like what Certificate Transparency [RFC6962] does for X.509 certificates. Tor relays and clients can then refuse to use a consensus that is not present in logs of their choosing.
WARNING!!! EARLY DRAFT -- MISSING IMPORTANT BITS AND PIECES!
1. Overview
Using a public, append-only, untrusted log like the history tree described in [CrosbyWallach], Tor clients and relays verify that consensus documents are present in one or more logs before using them.
Consensus-users, i.e. Tor clients and relays, expect to receive one or more "proofs of inclusion" with new consensus documents. A proof of inclusion consists of a hash representing the tree head of a log, signed with the log's private key, and an audit path listing the nodes in the tree needed to recreate that tree head. Consensus-users are configured to use one or more logs by listing a log address and a public key for each log. This configuration is used to verify that a given consensus document is present in a given log.
Anyone can submit a properly formatted and signed consensus document to a log and get a signed proof of inclusion in return. Directory authorities should do this and include the proofs when serving consensus documents. Directory caches and consensus-users receiving a consensus not including a proof of inclusion submit the document and use the proof they receive in return.
Auditing log behaviour and monitoring the contents of logs are performed in cooperation between the Tor network and external services. Directory caches act as log auditors with help from Tor clients gossiping about what they see. Directory authorities are good candidates for monitoring log content since they know what documents they have issued. Anybody can run both an auditor and a monitor, though, which is an important property of the proposed system.
Summary of proposed changes to Tor:
- Directory authorities start submitting newly created consensuses to at least one public log.
- Tor clients and relays receiving a consensus not accompanied by a proof of inclusion start submitting it to at least one public log.
- Consensus-users start rejecting consensuses accompanied by an invalid proof of inclusion.
- A new cell type LOG_GOSSIP is defined, for clients and relays to exchange information about tree heads seen and their validity.
- Consensus-users send LOG_GOSSIP cells with seen tree heads to relays.
- Relays validate tree heads received in LOG_GOSSIP cells (section 3.2.2) and send results to consensus-users in LOG_GOSSIP cells.
2. Motivation
Popping five boxes or factoring five RSA keys should not be ruled out as a possible attack against a subset of the Tor network. An attacker controlling a majority of the directory authorities' signing keys can, using man-in-the-middle or man-on-the-side attacks, serve consensus documents listing relays under their control. If the attack is mounted against a small subset of the network, the chance of detection is probably low. This proposal increases the cost of such an attack by raising the chance that it is detected.
The complexity of the proposed solution is justified by the decentralisation it provides. Anybody can run their own log and use it. Anybody can audit any existing log and verify its correct behaviour. This empowers people outside the group of Tor directory authority operators and those who trust them on a personal basis.
3. Design
Communication with logs is done over HTTP(S), similar to what [RFC6962] defines. This proposal does not use [[the TLS data structures]] but instead structures based on [[FIXME]]. Parameters for POSTs and all responses are encoded as name/value pairs in JSON objects [RFC4627].
Definitions:
- Log id: The SHA-256 hash of the log's public key, to be treated as an opaque byte string identifying the log.
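As a minimal sketch of the log id definition: the proposal does not say which encoding of the public key is hashed, so hashing the DER-encoded key (as RFC 6962 does for its LogID) is an assumption here, and the function names are hypothetical.

```python
import base64
import hashlib


def log_id(public_key_der: bytes) -> bytes:
    """Log id: SHA-256 over the log's public key, treated as opaque bytes.

    Assumption: the key is given in DER (SubjectPublicKeyInfo) encoding.
    """
    return hashlib.sha256(public_key_der).digest()


def log_id_b64(public_key_der: bytes) -> str:
    """Base64 form, as used for the "id" field in log responses."""
    return base64.b64encode(log_id(public_key_der)).decode("ascii")
```

Consumers would compare this value byte for byte and never parse it, matching "opaque byte string" above.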
3.1 Consensus submission
Logs accept consensus submissions from anyone as long as the consensus is signed by a majority of the directory authorities of the Tor network whose consensuses the log is logging.
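A log's acceptance check could look roughly like the following sketch. The `verify` callback is hypothetical and stands in for checking one authority's signature with that authority's signing key; what the sketch illustrates is only the counting rule: distinct known authorities, strictly more than half.

```python
from typing import Callable, Iterable, Set, Tuple

# A signature as (authority identity fingerprint, signature bytes).
Signature = Tuple[str, bytes]


def signed_by_majority(
    signatures: Iterable[Signature],
    known_authorities: Set[str],
    verify: Callable[[str, bytes], bool],
) -> bool:
    """Accept only if more than half of the known authorities signed.

    `verify` is a hypothetical callback checking one authority's signature
    over the consensus document.
    """
    valid_signers = {
        fingerprint
        for fingerprint, sig in signatures
        if fingerprint in known_authorities and verify(fingerprint, sig)
    }
    # Distinct signers only, so a repeated signature is not counted twice.
    return len(valid_signers) > len(known_authorities) // 2
```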
[[TODO: Move most of this to "specification" section?]]
Consensus documents are POSTed to the well-known URL
https://<log server>/tct/v1/add
Input:
consensus: A consensus status document as defined in [dir-spec] section 3.4.1.
Output:
id: The log id, base64 encoded.
tree_size: The size of the tree, in entries, in decimal.
timestamp: The timestamp, in decimal.
sha256_root_hash: The Merkle Tree Hash of a tree including the submitted entry, in base64.
tree_head_signature: A TreeHeadSignature ([RFC6962] section 3.5) for the above data.
audit_path: An array of base64-encoded Merkle tree nodes proving the inclusion of the submitted entry in the tree denoted by sha256_root_hash (see [RFC6962] section 2.1.1).
The output is what we call a proof of inclusion.
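The following sketch builds the submission and decodes the response using only the Python standard library. Assumptions: the request body is a JSON object with a single "consensus" member, tree_head_signature is base64 encoded like the other binary fields, and tree_size/timestamp may arrive as JSON numbers or decimal strings. The names ProofOfInclusion, build_add_request and parse_add_response are hypothetical.

```python
import base64
import json
import urllib.request
from dataclasses import dataclass
from typing import List


@dataclass
class ProofOfInclusion:
    log_id: bytes
    tree_size: int
    timestamp: int
    sha256_root_hash: bytes
    tree_head_signature: bytes
    audit_path: List[bytes]


def build_add_request(log_server: str, consensus: str) -> urllib.request.Request:
    """Build (but do not send) the POST to https://<log server>/tct/v1/add."""
    # Assumption: the consensus is sent as one JSON string member.
    body = json.dumps({"consensus": consensus}).encode("utf-8")
    return urllib.request.Request(
        f"https://{log_server}/tct/v1/add",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_add_response(raw: bytes) -> ProofOfInclusion:
    """Decode the JSON fields listed under "Output" above."""
    obj = json.loads(raw)
    b64 = base64.b64decode
    return ProofOfInclusion(
        log_id=b64(obj["id"]),
        tree_size=int(obj["tree_size"]),  # int() accepts numbers and strings
        timestamp=int(obj["timestamp"]),
        sha256_root_hash=b64(obj["sha256_root_hash"]),
        tree_head_signature=b64(obj["tree_head_signature"]),  # assumed base64
        audit_path=[b64(node) for node in obj["audit_path"]],
    )
```

Sending would be urllib.request.urlopen(req), with the response body passed to parse_add_response.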
The tree_head_signature is signed with the private key of the log.
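If the TreeHeadSignature from [RFC6962] section 3.5 is used as referenced above, the data the log signs is a fixed 50-byte structure. This sketch serializes only that input; producing or checking the actual signature with the log's key is left out, and the function name is hypothetical.

```python
import struct

V1 = 0          # RFC 6962: enum { v1(0), (255) } Version
TREE_HASH = 1   # RFC 6962: SignatureType tree_hash(1)


def tree_head_signature_input(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Serialize the TreeHeadSignature struct that the log signs."""
    if len(root_hash) != 32:
        raise ValueError("sha256_root_hash must be 32 bytes")
    # version (1 byte), signature_type (1 byte), timestamp (uint64),
    # tree_size (uint64), sha256_root_hash (32 bytes), all big-endian.
    return struct.pack(">BBQQ32s", V1, TREE_HASH, timestamp, tree_size, root_hash)
```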
3.2 Consensus verification
3.2.1 Log entry membership
Calculate a tree head from the hash of the received consensus and the audit path in the proof. Verify that it is identical to the tree head in the proof. Consensus-users can easily do this for each received consensus.
We now know that the consensus is part of a tree which the log claims to be The Tree. Whether this tree is the same tree that everybody else sees is unknown at this point.
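A sketch of this check, following the audit path definition in [RFC6962] section 2.1.1 and the iterative verification procedure later written out in RFC 9162. Two assumptions: the log entry is the raw consensus document, and the consumer knows the entry's leaf index, which the submission response in section 3.1 does not currently include. mth and audit_path_for are reference implementations of the RFC's recursive definitions, included only so the verifier can be tested against an in-memory tree.

```python
import hashlib
from typing import List


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 section 2.1: leaf hashes use a 0x00 prefix.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes use a 0x01 prefix, so leaves and nodes cannot collide.
    return hashlib.sha256(b"\x01" + left + right).digest()


def largest_power_of_two_below(n: int) -> int:
    k = 1
    while k << 1 < n:
        k <<= 1
    return k


def mth(entries: List[bytes]) -> bytes:
    """Merkle Tree Hash, RFC 6962 section 2.1 (recursive definition)."""
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = largest_power_of_two_below(len(entries))
    return node_hash(mth(entries[:k]), mth(entries[k:]))


def audit_path_for(m: int, entries: List[bytes]) -> List[bytes]:
    """PATH(m, D[n]), RFC 6962 section 2.1.1: what a log would return."""
    if len(entries) <= 1:
        return []
    k = largest_power_of_two_below(len(entries))
    if m < k:
        return audit_path_for(m, entries[:k]) + [mth(entries[k:])]
    return audit_path_for(m - k, entries[k:]) + [mth(entries[:k])]


def verify_inclusion(entry: bytes, leaf_index: int, tree_size: int,
                     audit_path: List[bytes], root_hash: bytes) -> bool:
    """Recompute the tree head from an entry and its audit path.

    Assumption: entry is the raw consensus document and leaf_index is
    known to the caller (not yet part of the section 3.1 response).
    """
    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash(entry)
    for p in audit_path:
        if sn == 0:
            return False  # path is longer than the tree is deep
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling p is on the left
            # Climb past levels where our node is last and has no sibling.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling p is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```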
3.2.2 Append-only property of the log
Ask the log for a consistency proof between the received tree head and a previously known good tree head. The known good head can be the empty tree. [[TODO add text about how to deal with received heads that are older than the last known good tree.]] Communication with logs is done over http(s) [[as described in [RFC6962] section 4 -- TODO specify protocol and encoding]].
[[description of consistency verification goes here]]
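Until that description is written, here is a sketch of how a consistency proof in the [RFC6962] section 2.1.2 format could be verified, using the iterative procedure later written out in RFC 9162. Treating a first tree of size zero as trivially consistent (matching "the known good head can be the empty tree") is an assumption, older received heads are simply rejected, and consistency_proof is a test helper standing in for a log's response.

```python
import hashlib
from typing import List


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()  # RFC 6962 leaf prefix


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # node prefix


def largest_power_of_two_below(n: int) -> int:
    k = 1
    while k << 1 < n:
        k <<= 1
    return k


def mth(entries: List[bytes]) -> bytes:
    """Merkle Tree Hash, RFC 6962 section 2.1."""
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = largest_power_of_two_below(len(entries))
    return node_hash(mth(entries[:k]), mth(entries[k:]))


def consistency_proof(m: int, entries: List[bytes], complete: bool = True) -> List[bytes]:
    """SUBPROOF(m, D[n], b), RFC 6962 section 2.1.2, for 0 < m <= n."""
    n = len(entries)
    if m == n:
        return [] if complete else [mth(entries)]
    k = largest_power_of_two_below(n)
    if m <= k:
        return consistency_proof(m, entries[:k], complete) + [mth(entries[k:])]
    return consistency_proof(m - k, entries[k:], False) + [mth(entries[:k])]


def verify_consistency(first_size: int, first_hash: bytes,
                       second_size: int, second_hash: bytes,
                       proof: List[bytes]) -> bool:
    """True if head (second_size, second_hash) extends (first_size, first_hash)."""
    if first_size > second_size:
        return False  # older received heads are not handled in this sketch
    if first_size == 0:
        # Assumption: every tree extends the empty tree; no proof needed.
        return not proof and first_hash == hashlib.sha256(b"").digest()
    if first_size == second_size:
        return not proof and first_hash == second_hash
    if not proof:
        return False
    path = list(proof)
    if first_size & (first_size - 1) == 0:
        # The old tree is a complete subtree, so its root is omitted from proofs.
        path.insert(0, first_hash)
    fn, sn = first_size - 1, second_size - 1
    while fn & 1:
        fn >>= 1
        sn >>= 1
    fr = sr = path[0]
    for c in path[1:]:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            sr = node_hash(sr, c)  # node that exists only in the newer tree
        fn >>= 1
        sn >>= 1
    # fr must rebuild the old head and sr the new one.
    return fr == first_hash and sr == second_hash and sn == 0
```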
Tor relays may do this for tree heads received in LOG_GOSSIP cells and communicate results in the same cells. [[TODO: Do this synchronously or asynchronously?]] Relays cache results to minimise communication with log servers and repeated calculations.
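Because a given pair of tree heads from the same log always yields the same answer, a relay could memoize results as below. check is a hypothetical callback that fetches and verifies a consistency proof; the sketch assumes it raises on network errors rather than returning False, so that transient failures are not cached. A real cache would also need a size bound.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key: (log id, first size, first hash, second size, second hash).
HeadPair = Tuple[bytes, int, bytes, int, bytes]


class ConsistencyCache:
    """Memoize consistency results per pair of tree heads (unbounded)."""

    def __init__(self, check: Callable[[HeadPair], bool]):
        # `check` is only called on a cache miss; it should raise on
        # network errors so that failures are not remembered as False.
        self._check = check
        self._results: Dict[HeadPair, bool] = {}

    def is_consistent(self, key: HeadPair) -> bool:
        if key not in self._results:
            self._results[key] = self._check(key)
        return self._results[key]
```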
We now know that the received tree is a superset of the known good tree.
3.3 Log auditing
A log auditor verifies that the log presents the same view to all its clients and that it is append-only, i.e. that no entries, once accepted by the log, are ever changed or removed. [[TODO describe the Tor network's role in auditing a bit more than what's mentioned in 3.2.2]]
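One concrete split-view check an auditor could run, sketched under the assumption that tree heads are collected from several sources (for instance LOG_GOSSIP cells): an honest append-only log has exactly one root hash per tree size, so two heads of equal size with different hashes are evidence of equivocation. Heads of different sizes need a consistency proof as in section 3.2.2 instead. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A tree head as reported by some source: (log id, tree size, root hash).
TreeHead = Tuple[bytes, int, bytes]


def find_split_views(heads: Iterable[TreeHead]) -> List[Tuple[bytes, int]]:
    """Return (log id, tree size) pairs for which different root hashes were seen."""
    seen: Dict[Tuple[bytes, int], Set[bytes]] = defaultdict(set)
    for log_id, size, root in heads:
        seen[(log_id, size)].add(root)
    # More than one root for the same size means the log showed two views.
    return sorted(key for key, roots in seen.items() if len(roots) > 1)
```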
3.4 Log monitoring
A log monitor verifies that the contents of the log are consistent with the rules of the Tor network, notably that all entries are properly formed and signed Tor consensus documents. Note that there can be more than one valid consensus document for a given point in time. One reason for this is that the number of signatures can differ due to consensus voting timing details. [[Are there more?]]
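A monitor could tell the benign case above (same document, different signature sets) apart from a suspicious one by comparing only the signed part of each entry. This sketch assumes, from our reading of [dir-spec] section 3.4.1, that the authorities sign the text from "network-status-version" through the space after the first "directory-signature"; it does not check the signatures themselves. Flagged valid-after times are candidates for a closer look, not proof of misbehaviour. All names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set

SIGNATURE_MARKER = "\ndirectory-signature "


def signed_part(consensus: str) -> str:
    """Text up to and including the space after the first directory-signature.

    Assumption (dir-spec 3.4.1): this is what the authorities sign, so
    documents differing only in their signature sets share it.
    """
    idx = consensus.find(SIGNATURE_MARKER)
    if idx < 0:
        raise ValueError("no directory-signature line")
    return consensus[: idx + len(SIGNATURE_MARKER)]


def valid_after(consensus: str) -> str:
    for line in consensus.splitlines():
        if line.startswith("valid-after "):
            return line[len("valid-after "):]
    raise ValueError("no valid-after line")


def conflicting_consensuses(entries: Iterable[str]) -> List[str]:
    """Valid-after times for which different signed bodies were logged."""
    bodies: Dict[str, Set[bytes]] = defaultdict(set)
    for doc in entries:
        digest = hashlib.sha256(signed_part(doc).encode("utf-8")).digest()
        bodies[valid_after(doc)].add(digest)
    return sorted(t for t, digests in bodies.items() if len(digests) > 1)
```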
[[TODO expand on monitoring strategies -- even if this is not part of proposed extensions to the Tor network it's good for understanding]]
3.5 Consensus-user behaviour
Keep an on-disk cache of consensus documents. Mark each one as being in one of three states:
LOG_STATE_UNKNOWN -- don't know whether it's present in enough logs or not
LOG_STATE_LOGGED -- have seen good proof(s) of inclusion
LOG_STATE_LOGGED_GOOD -- confident about the tree head representing a good tree
Newly arrived consensus documents start in LOG_STATE_UNKNOWN or LOG_STATE_LOGGED depending on whether they are accompanied by enough proofs or not. There are two possible state transitions:
- LOG_STATE_UNKNOWN --> LOG_STATE_LOGGED: Seen enough proofs of inclusion verifying correctly according to section 3.2.1. The number of good proofs needed is a policy setting in the configuration of the consensus-user.
- LOG_STATE_LOGGED --> LOG_STATE_LOGGED_GOOD: Seen enough gossiping to know that the tree head in the proof belongs to a known log.
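The states and transitions above can be sketched as a small state machine. Counting proofs per distinct log and modelling "enough gossiping" as a number of relay confirmations are assumptions; required_proofs and required_confirmations are hypothetical names for the policy settings.

```python
from enum import Enum


class LogState(Enum):
    UNKNOWN = "LOG_STATE_UNKNOWN"          # not known to be in enough logs
    LOGGED = "LOG_STATE_LOGGED"            # enough good proofs of inclusion seen
    LOGGED_GOOD = "LOG_STATE_LOGGED_GOOD"  # tree head confirmed via gossip


class CachedConsensus:
    """One cache entry and its logging state (policy names are hypothetical)."""

    def __init__(self, required_proofs: int = 1, required_confirmations: int = 1):
        self.required_proofs = required_proofs
        self.required_confirmations = required_confirmations
        self.good_proof_logs: set = set()
        self.confirmations = 0
        self.state = LogState.UNKNOWN

    def add_verified_proof(self, log_id: bytes) -> None:
        """Record a proof that verified per section 3.2.1; one counts per log."""
        self.good_proof_logs.add(log_id)
        if (self.state is LogState.UNKNOWN
                and len(self.good_proof_logs) >= self.required_proofs):
            self.state = LogState.LOGGED

    def add_gossip_confirmation(self) -> None:
        """Record a relay confirming the tree head in a LOG_GOSSIP cell."""
        if self.state is not LogState.LOGGED:
            return  # assumption: gossip only counts once the consensus is logged
        self.confirmations += 1
        if self.confirmations >= self.required_confirmations:
            self.state = LogState.LOGGED_GOOD

    def usable(self) -> bool:
        # Consensuses in LOG_STATE_UNKNOWN are submitted to logs, not used.
        return self.state is not LogState.UNKNOWN
```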
Consensuses in state LOG_STATE_UNKNOWN are not used but are instead submitted to one or more logs. This may take the consensus to LOG_STATE_LOGGED.
Consensuses in state LOG_STATE_LOGGED are used despite not being fully verified with regard to logging. LOG_GOSSIP cells with the tree heads from received proofs are sent to relays for further verification. Clients send them to all relays that they have a circuit to. Relays send them to three random relays that they have a circuit to.
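The gossip fan-out rule can be sketched as follows. Representing relays by identity strings is an assumption, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def gossip_targets(is_relay: bool, circuit_relays: Sequence[str],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick the relays to send LOG_GOSSIP cells to.

    Clients send to every relay they have a circuit to; relays send to
    three relays chosen at random among those they have a circuit to.
    """
    peers = list(dict.fromkeys(circuit_relays))  # de-duplicate, keep order
    if not is_relay:
        return peers
    rng = rng or random.Random()
    return rng.sample(peers, min(3, len(peers)))
```

Passing an explicit random.Random makes the relay's choice reproducible in tests.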
3.6 Relay behaviour when acting as an auditor
TODO
3.7 Notable differences from Certificate Transparency
- The data logged is "strictly time-stamped", i.e. ordered.
- Much shorter lifetime of logged data -- a day rather than a year. Are the effects of this difference of importance only for "one-shot attacks"?
- Directory authorities have consensus about what they're signing -- there are no "web sites knowing better".
- Submitters are not in the same hurry as CAs and can wait minutes rather than seconds for a proof of inclusion.
4. Security implications
TODO
5. Specification
TODO
? Compatibility
? Implementation
? Performance and scalability notes
A. Open issues
- handle all consensus flavours (i.e. microdescriptor consensuses)
- don't use "consensus verification" since that's misleading
- maybe add hash function agility, i.e. don't fixate on SHA-256 (but see CT discussion about why not and TODO summarize it here)
- add a blurb about the value of publishing logs as Tor hidden services
- should relays gossip amongst each other too?
- discuss compromise of log keys
- add 'version' and 'extensions' fields to the submission response?
- maybe log votes as well
B. Acknowledgements
This proposal leans heavily on [RFC6962]. Some definitions are copied verbatim from that document. Valuable feedback has been received from Ben Laurie and Karsten Loesing.
C. References
[CrosbyWallach] http://static.usenix.org/event/sec09/tech/full_papers/crosby.pdf
[dir-spec] https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt
[RFC4627] https://tools.ietf.org/html/rfc4627
[RFC6962] https://tools.ietf.org/html/rfc6962
--8<---------------cut here---------------end--------------->8---