[tor-dev] Implementing an embeddable C++ Tor client, advice requested

Tue Jun 23 18:34:21 UTC 2020

On Fri, Jun 19, 2020 at 3:59 PM The Paranoia Project
<info at paranoia.tools> wrote:
>
> Hello everyone, I'm a long time user of Tor, first time poster here.
>
> Over the last few months, I have been working on a lightweight, client-only C++ implementation of the Tor protocol, intended to be used as an embedded library in other applications. It is now at the stage where it can complete bootstrapping, build circuits, as well as connect to services / host hidden services (v3). I have been building this primarily off the spec documents (which have generally been extremely helpful), as well as stepping through the official Tor implementation when needed for troubleshooting and to confirm specifics. Before I release this to the wider world, I'd like to confirm a few points that may not be explicitly stated in the specs.

Hello, P!  It'll be neat to have a look at this when it comes out;
building a Tor implementation is a lot of work.

> 1. In general, what are the things to look out for when implementing the Tor protocol beyond "making it work" and validating all data (signatures, timestamps, etc.)? One thing I'm concerned about is the risk of fingerprinting where the spec does not completely specify behaviour, e.g. the order in which link specifiers are passed in an extend cell, the exact criteria for when circuits are explicitly destroyed, etc. (I'm very excited to see the proposals around CBOR on this point, which would help greatly with knowing that a canonical data representation was used).

There are a lot of these, I'm afraid, and they're not all perfectly
documented.  Some of the trickier ones are about "being kind to the
network" -- not making too many circuits, not over-using resources
when idle, and so on.  These are under-documented, but I believe Roger
has talked a few times about starting a document to collect these.
Roger, do you remember if this ever went anywhere, and produced a
draft or something?

For your first few versions, I'd suggest that your best bet is to
label your software loudly as experimental and alpha, since there will
almost certainly be ways to distinguish your software and surprising
bugs. Trying to be completely indistinguishable is probably
impossible, due to timing issues at least -- about the best you can do
is try to avoid easy ways to passively distinguish your software.

In general, I wouldn't mind taking patches to enhance the specs by
describing a preferred behavior whenever there is more than one
possible behavior.
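
To make the link-specifier example from the question concrete, here is
a minimal sketch of emitting specifiers in one fixed order, so that the
encoding does not depend on how the caller happened to build the list.
The type codes and the NSPEC / LSTYPE / LSLEN / LSPEC layout are the
ones from tor-spec. The struct, the function name, and above all the
ordering itself (ascending type code) are assumptions made for
illustration; the order that other clients actually send should be
checked before relying on any particular choice.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Link specifier type codes as listed in tor-spec.
enum class LinkSpecType : std::uint8_t {
    TlsTcpIpv4 = 0x00,  // 4-byte address + 2-byte port
    TlsTcpIpv6 = 0x01,  // 16-byte address + 2-byte port
    LegacyId   = 0x02,  // 20-byte RSA identity digest
    Ed25519Id  = 0x03,  // 32-byte Ed25519 identity key
};

// Hypothetical container type for this sketch.
struct LinkSpecifier {
    LinkSpecType type;
    std::vector<std::uint8_t> body;
};

// Encode NSPEC followed by (LSTYPE, LSLEN, LSPEC) for each specifier.
// ASSUMPTION: sorting by ascending type code is an illustrative
// canonical order, not one taken from the spec or from C tor.
std::vector<std::uint8_t> encode_link_specifiers(std::vector<LinkSpecifier> specs) {
    if (specs.size() > 255) {
        throw std::length_error("too many link specifiers");
    }
    std::stable_sort(specs.begin(), specs.end(),
                     [](const LinkSpecifier& a, const LinkSpecifier& b) {
                         return static_cast<std::uint8_t>(a.type) <
                                static_cast<std::uint8_t>(b.type);
                     });

    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(specs.size()));  // NSPEC
    for (const auto& s : specs) {
        if (s.body.size() > 255) {
            throw std::length_error("link specifier body too long");
        }
        out.push_back(static_cast<std::uint8_t>(s.type));         // LSTYPE
        out.push_back(static_cast<std::uint8_t>(s.body.size()));  // LSLEN
        out.insert(out.end(), s.body.begin(), s.body.end());      // LSPEC
    }
    return out;
}
```

Whatever order is chosen, the point is that two callers passing the
same specifiers in a different order produce byte-identical output.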

> 2. When it comes to bootstrapping, the official implementation appears to favour accessing directories via plaintext HTTP rather than connecting on the OR port and using create fast / begin dir. What is the motivation for using the plaintext option (and for that matter, having a plaintext HTTP service open at all)? While the OR will learn just as much about the client regardless, it seems like the default plaintext access to directory information unnecessarily gives away details of how clients engage with the Tor network to third parties.

That isn't right.  It's preferred for clients to download directory
material over the ORPort via begindir.  Plaintext DirPorts are
supposed to be used by relays and authorities only.

What part of the spec or the implementation says that the plaintext
dirport should be preferred?  I'd like to correct that.
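
For a client-only library, the rule above could be sketched roughly
like this. The enum and function names are hypothetical, and the
fallback for relays and authorities is an assumption for illustration,
not a description of what C tor does.

```cpp
// Hypothetical roles and fetch methods for this sketch.
enum class Role { Client, Relay, Authority };
enum class DirFetch { BegindirOverOrPort, PlaintextDirPort };

// Clients always tunnel directory requests over the ORPort via
// begindir. Only relays and authorities may use a plaintext DirPort.
// ASSUMPTION: a relay or authority falls back to begindir when the
// server advertises no DirPort.
DirFetch choose_dir_fetch(Role self, bool server_has_dirport) {
    if (self == Role::Client) {
        return DirFetch::BegindirOverOrPort;
    }
    return server_has_dirport ? DirFetch::PlaintextDirPort
                              : DirFetch::BegindirOverOrPort;
}
```

An embedded client would only ever take the first branch, so it has no
need to implement plaintext DirPort fetching at all.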

> 3. When using bridges and in particular pluggable transports, how is the client intended to safely bootstrap in the cold start case where it does not know up front which bridge/relay it will be connected to (e.g. when using Snowflake)? The RSA identity can be accepted in blind faith based on the Tor handshake, and it's then possible to get the full details with create fast / begin dir, but how does a client know that it has been connected to a bridge that is "blessed" by the Tor network rather than a MITM actor?

Bridge addresses and identities need to be discovered out-of-band, by
some means like bridgedb.torproject.org, personal communication, or
bundling with software.  The provenance of this information is the
only way to tell whether you're getting a likely-to-be-good bridge or
likely-to-be-run-by-your-enemy bridge.
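
When the out-of-band source does supply an identity, the check on the
client side is a comparison between the configured fingerprint and the
identity digest the bridge presents in the handshake. A minimal
sketch, assuming the fingerprint is the usual 40-hex-character form of
the 20-byte RSA identity digest; the type and function names are
hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using IdentityDigest = std::array<std::uint8_t, 20>;

// Return the value of one hex digit, or -1 if it is not a hex digit.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse a 40-character hex fingerprint obtained out-of-band.
// Returns false (leaving `out` unspecified) on malformed input.
bool parse_fingerprint(const std::string& hex, IdentityDigest& out) {
    if (hex.size() != 2 * out.size()) return false;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const int hi = hex_value(hex[2 * i]);
        const int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return true;
}

// Compare the configured identity with the one seen in the handshake.
// All bytes are examined regardless of where the first mismatch is.
bool bridge_identity_matches(const IdentityDigest& configured,
                             const IdentityDigest& presented) {
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < configured.size(); ++i) {
        diff |= static_cast<std::uint8_t>(configured[i] ^ presented[i]);
    }
    return diff == 0;
}
```

A mismatch means the connection should be dropped. A match only shows
that you reached the bridge you were told about; whether that bridge
is trustworthy still depends on where the fingerprint came from.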

> 4. Finally, if anyone reading has been involved with or close to the development of other unofficial Tor implementations, what are the lessons learned on this front? I'm aware of among others Orchid (updated last in 2016), node-Tor (does not implement ECC) and torpy (does not implement hidden services v3). What makes these fail / stall?

I'll let developers answer here -- part of the issue is that it can be
hard to maintain feature parity over time.

For a longer list of implementations, see
https://gitlab.torproject.org/legacy/trac/-/wikis/doc/ListOfTorImplementations
[warning -- wiki migration in progress].

best wishes,
-- 
Nick