Hello Tor devs,
Namecoin is interested in collaboration with Tor in relation to human-readable .onion names; I'm reaching out to see how open the Tor community would be to this, and to get feedback on how exactly the integration might work.
The new hidden service spec is going to substantially increase the length of .onion names, which presents usability concerns. Namecoin provides a way to resolve a human-readable .bit name to a .onion name. Another benefit of Namecoin is that it provides a way to look up TLS fingerprints for clearnet .bit sites, which reduces the risk of MITM attacks on clearnet websites from malicious or compromised CAs.
I had the pleasure of meeting Mike Perry at the Decentralized Web Summit at the Internet Archive in June; I talked to him about Namecoin's rough plans and he suggested I post here. I understand that Riccardo Spagni from Monero discussed this topic as well with Roger Dingledine at the Security in Times of Surveillance conference at Ei_PSI.
The two main concerns that I expect would be brought up involve anonymity and blockchain size. Here's how we plan to deal with these issues:
Namecoin already provides location-anonymity for name registrations assuming that it's routed via Tor. It's also necessary to broadcast transactions for different names to different peers, which isn't coded yet, but this is just coding work rather than an engineering challenge -- a usable workaround today is running multiple Namecoin wallets.
The more interesting challenge is blockchain anonymity for registrations, due to the linkability required for blockchain validation. An important point here is that transactions for a given name are inherently linkable to each other, and that this isn't problematic. The problem would come when multiple names are linked together, or when a name is linked with currency transactions. The solution I've come up with is to use atomic cross-chain trades, which let a user buy namecoins using a cryptocurrency that is designed to provide anonymity (such as Monero or Zcash, both of which have cryptographic proofs of anonymity, given a certain anonymity set and security assumptions). The user would use an anonymous cryptocurrency to buy a small amount of namecoins (enough to register a single name and keep it renewed for a while). If the user wanted to register another name, she would perform another atomic cross-chain trade, receiving namecoins that are not linked to the namecoins obtained for the first name. As long as those namecoins are not mixed by the wallet software, the names remain unlinked.
Many users won't want to download the full Namecoin blockchain (around 3 GB at the moment). I have a proof-of-concept SPV-based Namecoin name lookup client working as of early June. I just got a large part of that code upstreamed into libdohj, and I'm working on getting the rest upstreamed and released. It's in Java (based on BitcoinJ), so it's not subject to the memory safety concerns that C/C++ code is. The SPV name lookups are implemented in 3 ways, depending on the user's needs:
Option A:
1. Block headers are synced over the Namecoin P2P network. (Over clearnet this takes about 5 minutes the first time it runs.)
2. An index mapping unexpired block heights to block hashes is constructed, so that lookups can be done quickly. (This occurs when the SPV client starts, after syncup has completed; it's fast enough that I haven't found a need to benchmark it.)
3. When a name lookup request is received, the client asks a remote API server for the height of the last update of the name.
4. The client looks up the block hash of that height from its index, and requests that block over the P2P network.
5. The client verifies that the received block matches the correct hash and that the block follows Namecoin rules (e.g. verifying the merkle root).
6. The client looks through the transactions in the block until it finds the one that updates the name.
7. The client retrieves the value of the name from that transaction, and returns it to the user.
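As a rough illustration of the merkle root check in step 5, here is a minimal Python sketch (the actual client is Java; Python is used here only for brevity). It assumes Namecoin inherits Bitcoin's merkle tree construction: double SHA-256, with the last hash of an odd-sized level paired with itself. The function names are hypothetical and are not taken from libdohj.

```python
import hashlib
from typing import List


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin-style transaction and block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Compute a Bitcoin-style merkle root from raw 32-byte transaction hashes.

    The hashes are expected in internal byte order, i.e. not the
    byte-reversed hex form that block explorers usually display.
    """
    if not txids:
        raise ValueError("a block contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # An odd-sized level pairs its last hash with itself.
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def block_matches_header(txids: List[bytes], header_merkle_root: bytes) -> bool:
    """Return True if the block's transactions hash to the header's merkle root."""
    return merkle_root(txids) == header_merkle_root
```

This only covers the merkle root part of step 5; checking that the block hash matches the indexed header, and any other consensus rules, would be separate checks.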
Option B:
1 through 3. Same as Option A.
4. The API server also provides the full content of the transaction, as well as a merkle proof of inclusion in the block.
5. The client verifies that the merkle proof links the hash of the provided transaction to the merkle root of the block header with the given height.
6. The client retrieves the value of the name from the provided transaction, and returns it to the user.
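The proof check in step 5 of Option B can be sketched as follows, again in Python for brevity. This assumes a Bitcoin-style merkle branch: a list of sibling hashes from the leaf up to the root, plus the transaction's position in the block. The names and the proof encoding are assumptions for illustration; the format the API server actually returns may differ.

```python
import hashlib
from typing import List


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin-style merkle trees."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def verify_merkle_branch(txid: bytes, branch: List[bytes], index: int,
                         expected_root: bytes) -> bool:
    """Check that txid at position `index` hashes up to `expected_root`.

    `branch` holds one sibling hash per tree level, ordered from the leaf
    level upward. At each level, the low bit of `index` says whether the
    current node is the right child (1) or the left child (0).
    """
    node = txid
    for sibling in branch:
        if index & 1:
            node = dsha256(sibling + node)
        else:
            node = dsha256(node + sibling)
        index >>= 1
    return node == expected_root
```

The `expected_root` would come from the locally synced block header at the height the API server reported, so the server cannot substitute a transaction that isn't in that block.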
Option C:
1. Block headers are synced over the Namecoin P2P network, as well as full blocks for the past year (meaning that all full blocks that contain unexpired name data will be synced). (Over clearnet this takes about 10 minutes the first time it runs.)
2. An index mapping names to transactions is constructed as the full blocks are downloaded. (This uses LevelDB.)
3. When a name lookup request is received, the client looks up the transaction in the LevelDB index.
4. The client retrieves the value of the name from that transaction, and returns it to the user.
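To make the shape of the Option C index concrete, here is a small Python sketch that uses an in-memory dict as a stand-in for LevelDB. The `NameIndex` class and the (name, value) tuples are hypothetical simplifications; the real index maps names to full transactions and lives on disk.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class NameUpdate(NamedTuple):
    value: str   # the value carried by the name update
    height: int  # height of the block containing the update


class NameIndex:
    """In-memory stand-in for the name -> latest update index."""

    def __init__(self) -> None:
        self._latest: Dict[str, NameUpdate] = {}

    def add_block(self, height: int, updates: Iterable[Tuple[str, str]]) -> None:
        """Record every (name, value) update found in a downloaded block."""
        for name, value in updates:
            current = self._latest.get(name)
            # Keep only the most recent update, even if blocks arrive out of order.
            if current is None or height >= current.height:
                self._latest[name] = NameUpdate(value, height)

    def lookup(self, name: str) -> Optional[str]:
        """Return the latest known value for a name, or None if unknown."""
        entry = self._latest.get(name)
        return entry.value if entry is not None else None
```

Because lookups only touch the local index, they generate no network traffic.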
For Options A and B, if the API server is malicious, it can do any of the following:
1. Falsely claim that the name doesn't exist.
2. Provide outdated name data that is less than 36000 blocks old (the expiration period for Namecoin).
(Option C is not vulnerable to either of those attacks.)
If multiple API servers are consulted, and they return different results, it is easy to tell which is lying (although I haven't implemented any such logic yet).
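Since that logic isn't implemented, the following Python sketch shows only one way it might work. It assumes each server's answer has already been verified against the local header chain (as in Options A and B), so a server cannot forge an update at a height where none exists; the most recent verified, unexpired update therefore wins, and any server that reported something older, or reported nothing, is flagged. The function name, the data shapes, and the exact expiration boundary are assumptions.

```python
from typing import Dict, List, Optional, Tuple

EXPIRATION_BLOCKS = 36000  # Namecoin name expiration period, in blocks

# Maps server name -> (height of last update, value), or None if the
# server claimed that the name does not exist.
Answers = Dict[str, Optional[Tuple[int, str]]]


def reconcile(answers: Answers, tip_height: int) -> Tuple[Optional[str], List[str]]:
    """Pick the newest verified answer and list the servers that disagree."""
    live = {
        server: answer
        for server, answer in answers.items()
        if answer is not None and tip_height - answer[0] < EXPIRATION_BLOCKS
    }
    if not live:
        # No server produced a verifiable, unexpired update.
        return None, []
    best_height = max(height for height, _ in live.values())
    best_value = next(value for height, value in live.values() if height == best_height)
    suspects = sorted(
        server
        for server, answer in answers.items()
        if answer is None or answer[0] != best_height
    )
    return best_value, suspects
```

For example, `reconcile({"a": (100, "x.onion"), "b": (150, "y.onion"), "c": None}, 200)` returns `("y.onion", ["a", "c"])`. If every consulted server withholds the latest update, this check cannot detect it.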
The API server cannot do any of the following:
1. Provide name data that isn't from the blockchain with the most work.
2. Provide name data that is more than 36000 blocks old (the expiration period for Namecoin).
The reason an API server is used in Options A and B instead of the P2P network is that the P2P network is unauthenticated and easy to Sybil. The P2P network is great for getting data that is independently verifiable (e.g. block headers and contents of blocks), but it's unwise to rely on the P2P network to get unverifiable data such as the block height of a name's last update. An API server is authenticated (currently via CA-based TLS, but a cert pin or PGP signing is certainly doable), which reduces the possible points of attack. This is analogous to why Tor uses centralized directory authorities -- authenticated trust points are harder to Sybil.
(We do have longer term plans to introduce a way for SPV clients to get the latest transaction associated with a name, without using an API server or needing to download any full blocks, but that's out of scope of this email.)
Options A and B do reveal to the API server which name is being looked up. If Option A is used, it also reveals to a P2P peer which block height is being looked up (which narrows the set of possible names by a factor of ~36000). Therefore, Tor stream isolation should be used in such cases. (That's not implemented yet.) Option C doesn't generate any network traffic on lookups, so it doesn't reveal anything.
In my testing, an SPV-based name lookup using Option A takes around 650 milliseconds (over clearnet). The vast majority of this is latency to the API server (the server I'm testing with is on a low-budget hosting plan). The portion consisting of a block retrieval over P2P takes around 98 milliseconds (although it varies by block size). Option C takes around 4 milliseconds.
The storage overhead of Option C's LevelDB database is around 400 MB right now, although I believe it's feasible to reduce this significantly.
There are a few options I can think of for integrating this with Tor for .onion naming. One would be to modify OnioNS to call the Namecoin SPV client. This would concern me because OnioNS is in C++, which introduces the risk of memory safety vulnerabilities. Another would be to use an intermediate proxy like Yawning's or-ctl-filter. A third option would be to try to get external name resolution implemented in Tor itself, which I believe Jeff Burdges has suggested in the past. If Option A or B is used, any solution would need to pass the stream isolation info to the SPV client.
Integrating this with Tor Browser for TLS certificate validation might involve a Firefox patch. There are tricks that can be done with the CertDB and SiteSecurityService XPCOM interfaces that will do the job without Firefox patches, but XPCOM is being phased out by Mozilla in favor of WebExtensions, and I'm unaware of any equivalent features in WebExtensions. (Also, it's unclear to me whether CertDB and SiteSecurityService would introduce isolation issues -- I can't think of any obvious attacks, but I haven't thought very hard about it.) I'm trying to engage with Mozilla to see if we can work out a WebExtensions feature for this, but nothing conclusive has happened on that front yet.
On the subject of reproducible builds, I've never tried to build Java code in Gitian, so I'm not certain how difficult it's going to be. Since Android uses Java, maybe the Guardian Project devs would have some insight into the best way to do it. One of the Namecoin developers (Joseph Bisch) is really good with reproducible builds (you probably know him since he's the author of the Debian guest support in Gitian), so I'm reasonably confident that a way to do it can be found.
I'd love to hear feedback on all of this.
Cheers,
-Jeremy Rand
Lead Application Engineer of Namecoin