Stress testing guard nodes
 
Hi,

I am trying to identify bottlenecks at an entry guard when it is used to create a large number of circuits. To do this, I started a set of clients that use a single entry guard to create 5K-10K circuits in total. At this scale, clients observe timeouts (waiting for a CREATED cell) while constructing circuits, and they consider the entry guard to be down.

I control the entry guard chosen in this experiment, and no other client should be using this relay as an entry guard, as it does not have the Guard flag in the consensus.

Lack of bandwidth at the entry guard is not the cause of the circuit timeouts: it consumed 80-90 KBps while its RelayBandwidthRate is set to 400 KBps. CPU at the entry guard is not the bottleneck either, as usage was always below 10% during the experiment.

Can anyone give me pointers as to why these circuits time out? And what would be an effective way to verify that this is indeed the bottleneck?

I tried profiling the entry guard with callgrind, but this is extremely slow. In the meantime, I will build a static tor instance so that gprof produces some meaningful results.

The motivation for this work is to enable a client to predict how many circuits it can create without degrading the performance of its entry guards. If a circuit is not created within a timeout period, the client perceives the entry guard to be down and starts using another guard node. If the client keeps creating excessive circuits, it is more likely to end up using a malicious (and resourceful) entry guard. The prediction model would help the client stay anonymous, as it would refrain from unnecessarily switching guard nodes.

Thanks,
Abhishek
 
            On 26 Apr 2016, at 02:10, Abhishek Singh <abhishek.singh.kumar@gmail.com> wrote:
Hi,
I am trying to identify bottlenecks at an entry guard when it is used to create a large number of circuits. To do this, I started a set of clients that use a single entry guard to create 5K-10K circuits in total. At this scale, clients observe timeouts (waiting for a CREATED cell) while constructing circuits, and they consider the entry guard to be down.
That's a large number of circuits to be creating on the public Tor network for an experiment. Have you considered starting up your own private tor test network? https://gitweb.torproject.org/chutney.git/
I control the entry guard chosen in this experiment, and no other client should be using this relay as an entry guard, as it does not have the Guard flag in the consensus.
That's not quite true: Tor had a bug where clients were happy to pick directory guards without the Guard flag. We fixed it a few months ago, but not all users will have updated. https://trac.torproject.org/projects/tor/ticket/17772
If you log detailed information about each connection, there is a risk you could expose a Tor user's IP, or expose the fact that they are using your guard. Again, I very strongly encourage you to use a private test network, where you can control these factors.
Lack of bandwidth at the entry guard is not the cause of the circuit timeouts: it consumed 80-90 KBps while its RelayBandwidthRate is set to 400 KBps. CPU at the entry guard is not the bottleneck either, as usage was always below 10% during the experiment.
Can anyone give me pointers as to why these circuits time out? And what would be an effective way to verify that this is indeed the bottleneck?
You might have filled the kernel buffers on your clients or your entry guard, and they will block until empty. You might have run into file descriptor or buffer limits on your clients or entry guard. Your clients or entry guard might be waiting for access to memory or the network, or on other blocking operations. Depending on the level you're logging at, you might even be blocking on writing debug log entries. Since tor isn't using much CPU, can you work out which calls are blocking?
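One of the possibilities above, the file descriptor limit, is cheap to rule out. Here is a minimal sketch for a Linux host, assuming you know the PID of the tor process and can read its /proc entries; the helper names `parse_max_open_files` and `fd_usage` are made up for illustration and are not part of tor.

```python
import os


def parse_max_open_files(limits_text):
    """Extract the (soft, hard) 'Max open files' values from the text of
    /proc/<pid>/limits. Values are returned as strings because the kernel
    may report 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a Linux process.
    Reading another user's process usually requires root."""
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    # Each entry in /proc/<pid>/fd is one open descriptor (sockets included).
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard
```

Calling `fd_usage(<tor pid>)` periodically on the guard and on each client while the circuits are being built would show whether the open descriptor count approaches the soft limit around the time the timeouts start.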
I tried profiling the entry guard with callgrind, but this is extremely slow. In the meantime, I will build a static tor instance so that gprof produces some meaningful results.
The motivation for this work is to enable a client to predict how many circuits it can create without degrading the performance of its entry guards. If a circuit is not created within a timeout period, the client perceives the entry guard to be down and starts using another guard node. If the client keeps creating excessive circuits, it is more likely to end up using a malicious (and resourceful) entry guard. The prediction model would help the client stay anonymous, as it would refrain from unnecessarily switching guard nodes.
Are you aware of the existing CircuitBuildTimeout option, and the existing dynamic model used to predict timeouts? If you gradually increase the load on the guard, your clients may have time to adapt this model.

Tim

Tim Wilson-Brown (teor)
teor2345 at gmail dot com
PGP 968F094B
ricochet:ekmygaiu4rzgsk6n
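For readers unfamiliar with the dynamic model mentioned above: tor clients learn their circuit build timeout by approximating the tail of observed build times with a Pareto distribution and cutting off at a quantile. The sketch below is a simplified illustration of that idea, not tor's implementation. It assumes the textbook maximum-likelihood fit with the minimum sample as the scale parameter and an 80% cutoff, and the function name `pareto_timeout` is hypothetical.

```python
import math


def pareto_timeout(build_times_ms, quantile=0.8):
    """Estimate a build timeout as the given quantile of a Pareto
    distribution fitted to observed circuit build times (milliseconds).

    Simplifications (assumptions, not tor's code): the scale Xm is the
    smallest sample and alpha is the plain maximum-likelihood estimate.
    """
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    xm = min(build_times_ms)
    if xm <= 0:
        raise ValueError("build times must be positive")
    log_sum = sum(math.log(t / xm) for t in build_times_ms)
    if log_sum == 0:
        raise ValueError("need at least two distinct build times")
    alpha = len(build_times_ms) / log_sum
    # Invert the Pareto CDF F(x) = 1 - (xm / x) ** alpha at the quantile.
    return xm / (1.0 - quantile) ** (1.0 / alpha)
```

Feeding this the build times a client records as the load on the guard ramps up gives a rough picture of how a learned timeout would drift. A fixed timeout would be compared against that drifting value.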
participants (2)
- Abhishek Singh
- Tim Wilson-Brown - teor