Quoting Adrien Luxey (2018-09-27 18:21:06)
Dear Tor developers,
As a PhD student in distributed systems, I am studying onion routing.
We would like to investigate an onion routing system that would run on users' devices, i.e. a lot of nodes with crappy bandwidth and intermittent connection. To compare against Tor, I have been looking for information about how you currently handle the disconnection of relays, but I found no "digest" on the Web. Your code seems to point in the direction of core/or/circuit*, which I need to investigate further. I would still love a first-hand, high-level description from you guys.
I have the following questions:
• When an OR disconnects while supporting active circuits, how is the failure detected?
If it shuts down properly, it sends a DESTROY cell in both directions along each circuit, notifying the neighboring hops that the circuit is now broken. Otherwise, Tor relies on the operating system's TCP timeout to learn that the connection has failed. The node that notices the failure then generates DESTROY cells as needed to propagate the error.
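To make the two detection paths concrete, here is a toy model in Python. It is only a sketch and not Tor's code: the Hop class, the fail_relay function and the log strings are names I made up for illustration, and a circuit is reduced to a list of hops ordered from client to exit.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    """One node on a circuit (hypothetical model, not a Tor data structure)."""
    name: str
    circuit_open: bool = True
    log: list = field(default_factory=list)


def fail_relay(path, index, graceful):
    """Remove path[index] and tear the circuit down on every other hop.

    path is ordered client -> guard -> middle -> exit.
    graceful=True models a clean shutdown (the relay sends DESTROY itself);
    graceful=False models a crash, noticed by neighbours via TCP timeout.
    """
    path[index].circuit_open = False
    # How the immediate neighbours learn about the failure.
    cause = "DESTROY from peer" if graceful else "TCP timeout"
    # Walk outward from the failed hop, once in each direction.
    for direction in (-1, 1):
        reason = cause
        i = index + direction
        while 0 <= i < len(path):
            path[i].circuit_open = False
            path[i].log.append(reason)
            # Hops further away only ever see a relayed DESTROY cell.
            reason = "DESTROY from peer"
            i += direction


path = [Hop("client"), Hop("guard"), Hop("middle"), Hop("exit")]
fail_relay(path, 2, graceful=False)  # the middle relay crashes
for hop in path:
    print(hop.name, hop.circuit_open, hop.log)
```

In the crash case only the direct neighbours of the dead relay wait for the TCP timeout; everyone else hears about it through DESTROY cells.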
• How are the paths subsequently rebuilt?
The same way they were built in the first place: the client builds a new circuit from scratch.
• To which extent would you say that Tor is resilient to churn? What would be the effects of a massive churn of relays? Where would be the bottleneck?
It depends on what you mean by churn. If you just mean that relays reboot, then circuits will fail frequently. If relays frequently disappear altogether, though, it becomes difficult to measure relay bandwidth (for proportioning network load) and to maintain guard persistence. These problems likely do not have any solution.
If you have any questions on my work, I will be pleased to answer (though a little ashamed, because I know our system will never hold your security properties so tight)!
Consider reading the Tor protocol specification: https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt.
It seems that your idea can basically be summarized as "implement circuit resumption". This is likely not inherently difficult to implement, except for the problem of knowing when to expire old sessions. If you just use the TCP rules, then you might as well just run Tor over multipath TCP or QUIC or something.
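As a rough illustration of the expiry problem, here is a minimal Python sketch of a table of suspended circuits with a fixed time-to-live. Everything in it is an assumption for the sake of the example: the SuspendedCircuits class, its method names and the 30-second TTL are hypothetical and do not correspond to anything in Tor.

```python
import time


class SuspendedCircuits:
    """Hypothetical table of circuits waiting for a peer to reconnect."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self._suspended = {}    # circuit id -> time the peer was lost

    def suspend(self, circ_id):
        """Record that the peer for this circuit has gone away."""
        self._suspended[circ_id] = self.clock()

    def resume(self, circ_id):
        """Return True if the circuit may resume, False if unknown or too old."""
        lost_at = self._suspended.pop(circ_id, None)
        if lost_at is None:
            return False
        return self.clock() - lost_at <= self.ttl

    def expire_old(self):
        """Drop every suspended circuit older than the TTL; return their ids."""
        now = self.clock()
        expired = [c for c, t in self._suspended.items() if now - t > self.ttl]
        for c in expired:
            del self._suspended[c]
        return expired


# Usage with a fake clock, so the example does not actually sleep.
now = [0.0]
table = SuspendedCircuits(ttl_seconds=30, clock=lambda: now[0])
table.suspend(1)
table.suspend(2)
now[0] = 10.0
print(table.resume(1))     # peer came back within the TTL
now[0] = 31.0
print(table.expire_old())  # circuit 2 waited too long
```

The code is the easy part. Choosing the TTL is the hard part: too short and you are back to ordinary TCP behaviour, too long and relays keep state for peers that will never return.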