Hi Folks,
This is Sky from the University of Illinois. We are currently working on a research project related to Tor.
To help us better design and evaluate our proposal, we need some information about Tor relays that is currently unavailable from Atlas. If anyone who operates Tor relays could provide such information, your help would be greatly appreciated. :)
We hope to get an estimate of the computational capacity of Tor relays. For instance, how many circuits can a relay maintain when its CPU is driven to about 100%? On average, how many circuits does a busy guard maintain, and what is its CPU utilization? This kind of information would be really helpful.
Further, in Sep 2013, Tor saw a spike in its clients:
https://blog.torproject.org/blog/how-to-handle-millions-new-tor-clients
The suspicion is that a botnet used Tor as its C&C channel. Is there any data about relay utilization during that time? For instance, how many more circuits did a relay have to maintain?
Thanks a lot for your help. :)
On Fri, Aug 26, 2016 at 01:42:38AM +0000, Liu, Zhuotao wrote:
> This is Sky from the University of Illinois. We are currently working on a research project related to Tor.
> To help us better design and evaluate our proposal, we need some information about Tor relays that is currently unavailable from Atlas. If anyone who operates Tor relays could provide such information, your help would be greatly appreciated. :)
> We hope to get an estimate of the computational capacity of Tor relays. For instance, how many circuits can a relay maintain when its CPU is driven to about 100%? On average, how many circuits does a busy guard maintain, and what is its CPU utilization? This kind of information would be really helpful.
I don't know about CPU usage, but as for circuits, I think you can get them from @extra-info and @bridge-extra-info descriptors:
https://collector.torproject.org/#type-extra-info
https://collector.torproject.org/#type-bridge-extra-info
For example, see some of the files at
https://collector.torproject.org/recent/relay-descriptors/extra-infos/
The "cell-circuits-per-decile" lines might be interesting to you.
https://spec.torproject.org/dir-spec:
    "cell-circuits-per-decile" num NL
        Mean number of circuits that are included in any of the deciles, rounded up to the next integer.
For parsing the descriptor files, you can use Stem: https://stem.torproject.org/_modules/stem/descriptor/extrainfo_descriptor.ht...
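As a rough illustration of pulling these lines out of an extra-info file, here is a minimal standard-library sketch (it does not use Stem). The sample descriptor text, relay nicknames, fingerprints, and all numbers are made up for illustration, and the function names are hypothetical. Multiplying "cell-circuits-per-decile" by ten to approximate the total circuit count is my own reading of the spec text, not something the spec states; since the value is rounded up, it should be treated as a loose upper estimate.

```python
import re

# Hypothetical sample in the extra-info line format; every value is invented.
SAMPLE = """\
extra-info ExampleRelayA 0123456789ABCDEF0123456789ABCDEF01234567
published 2016-08-25 12:00:00
cell-stats-end 2016-08-25 11:30:00 (86400 s)
cell-processed-cells 1200,90,40,20,10,5,3,2,1,1
cell-circuits-per-decile 4210
extra-info ExampleRelayB 89ABCDEF0123456789ABCDEF0123456789ABCDEF
published 2016-08-25 13:00:00
"""


def parse_cell_stats(text):
    """Return one dict per descriptor that carries a cell-circuits-per-decile line."""
    results = []
    current = None
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword == "extra-info":
            # A new descriptor starts here; the first field is the nickname.
            fields = rest.split()
            current = {"nickname": fields[0] if fields else ""}
            results.append(current)
        elif current is None:
            continue  # skip annotations such as "@type ..." before the first descriptor
        elif keyword == "cell-stats-end":
            match = re.search(r"\((\d+) s\)", rest)
            if match:
                current["interval_s"] = int(match.group(1))
        elif keyword == "cell-processed-cells":
            current["processed_cells"] = [float(x) for x in rest.split(",")]
        elif keyword == "cell-circuits-per-decile":
            current["circuits_per_decile"] = int(rest)
    return [r for r in results if "circuits_per_decile" in r]


def estimate_total_circuits(stats):
    """Assumption: ten deciles of roughly equal size, so total is about 10x the mean."""
    return 10 * stats["circuits_per_decile"]


if __name__ == "__main__":
    for stats in parse_cell_stats(SAMPLE):
        print(stats["nickname"], estimate_total_circuits(stats),
              "circuits over", stats.get("interval_s"), "seconds")
```

Relays that do not report cell statistics (ExampleRelayB in the sample) are simply skipped, so any aggregate built this way only covers the reporting subset.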
Thanks for that info, David. That seems valuable to me. :)
However, I am a bit confused about the definition:
"cell-circuits-per-decile": Mean number of circuits that are included in any of the deciles, rounded up to the next integer.
What is the exact meaning of 'decile'? Is it one tenth of an hour? Or something else?
Thanks
________________________________________
From: David Fifield [david@bamsoftware.com]
Sent: Thursday, August 25, 2016 23:26
To: Liu, Zhuotao
Cc: tor-dev@lists.torproject.org
Subject: Re: [tor-dev] Some information about Tor relays
On Fri, Aug 26, 2016 at 04:46:45AM +0000, Liu, Zhuotao wrote:
> Thanks for that info, David. That seems valuable to me. :)
> However, I am a bit confused about the definition:
> "cell-circuits-per-decile": Mean number of circuits that are included in any of the deciles, rounded up to the next integer.
> What is the exact meaning of 'decile'? Is it one tenth of an hour? Or something else?
I don't know. My reading of dir-spec says it is probably the 0%–10%, 10%–20%, 20%–30%, etc. slices of circuits, ranked by number of cells.
https://spec.torproject.org/dir-spec:
    "cell-processed-cells" num,...,num NL
        Mean number of processed cells per circuit, subdivided into deciles of circuits by the number of cells they have processed in descending order from loudest to quietest circuits.
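To make that reading concrete, here is a small sketch of how such deciles could be computed from per-circuit cell counts: sort circuits from loudest to quietest, cut the ranking into ten equal slices, and report the mean cells per circuit in each slice plus the slice size rounded up. The function name and the bucketing rule are assumptions for illustration; this is not taken from Tor's source and may differ from what relays actually do.

```python
import math


def decile_stats(cells_per_circuit):
    """Return (mean cells per circuit for each of 10 deciles, circuits per decile).

    Circuits are ranked in descending order of processed cells, so the
    first decile holds the loudest circuits and the last the quietest.
    """
    ordered = sorted(cells_per_circuit, reverse=True)
    n = len(ordered)
    if n == 0:
        return [], 0

    sums = [0] * 10
    counts = [0] * 10
    for rank, cells in enumerate(ordered):
        decile = rank * 10 // n  # 0 for the top 10% of circuits, 9 for the bottom 10%
        sums[decile] += cells
        counts[decile] += 1

    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    # Mirrors the "rounded up to the next integer" wording in the spec.
    circuits_per_decile = math.ceil(n / 10)
    return means, circuits_per_decile


if __name__ == "__main__":
    # 20 made-up circuits that processed 1..20 cells each.
    means, per_decile = decile_stats(range(1, 21))
    print(means)
    print(per_decile)
```

Under this reading a decile is a slice of the circuit population, not a slice of time.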
I see. :) But thank you for the valuable info.
Thanks,
Zhuotao
________________________________________
From: David Fifield [david@bamsoftware.com]
Sent: Friday, August 26, 2016 0:04
To: Liu, Zhuotao
Cc: tor-dev@lists.torproject.org
Subject: Re: [tor-dev] Some information about Tor relays
On Fri, Aug 26, 2016 at 01:42:38AM +0000, Liu, Zhuotao wrote:
> We hope to get an estimate of the computational capacity of Tor relays. For instance, how many circuits can a relay maintain when its CPU is driven to about 100%? On average, how many circuits does a busy guard maintain, and what is its CPU utilization? This kind of information would be really helpful.
I used to report CPU exhaustion when pushing 15-25 high circuit flux application streams in parallel through a client and thus its guards. To gather and characterize current limitations in an operational context you might want to deploy a guard at your university and run some clients through it, instrumenting various things, until something saturates.
I'd be interested in seeing estimates of the net change in network-usable CPU headroom [1] when adding relays with certain fixed ratios of their own cpu/circuits and/or cpu/clients and/or cpu/bandwidth capacities.
Perhaps in other words... we roughly know how a client's stream over 3 or 6 hops might consume an additional 1Gbps added to the network. But what does adding its CPU to the network get us... and what is the effect of clients/net on that? And with each box added, are we adding the right ratio of CPU and bandwidth? Do we need a knob there to ensure optimum balanced benefit to the net, or is it better to let it float?
[1] Left over for network meta purposes like circuit construction, directory services, consensus, parametric pathing computation, etc.
Hi Zhuotao,
We have performed some privacy-preserving measurements, including the number of circuits and streams seen at exit relays, the amount of data transferred by exit relays, and the number of active/inactive clients connecting to entry relays. We only collected data over short timeframes, and we didn't collect anything related to relay computation capacity, but you may be able to make some inferences based on our results.
The measurement system we developed, called PrivCount [1], uses differential privacy and secure aggregation protocols and is described in our upcoming paper "Safely Measuring Tor" [2], which will appear at the 23rd ACM Conference on Computer and Communications Security (CCS) in October. The measurement results are also presented in that paper.
Some highlights from the paper:
+ Tor has about 710,000 unique connected clients at any given time on average, of which about 550,000 (77%) are active (the rest are connected but inactive). For comparison, Tor itself estimates about 1.75 million users *per day*, suggesting that the user population turns over about 2.5 times per day.
+ Data over ports 80 and 443 accounts for about 91% of the traffic exiting Tor, which is up from about 42% in 2010. This suggests that either there was a shift of file-sharing traffic onto standard web ports, or lower file-sharing usage overall, or both.
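The ratios in the first highlight can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures quoted above; the function names are made up, and "turnover" here simply means daily users divided by average concurrent clients.

```python
# Rounded figures quoted in the highlights above.
CONNECTED_CLIENTS = 710_000   # average unique connected clients at any given time
ACTIVE_CLIENTS = 550_000      # of which active
DAILY_USERS = 1_750_000       # Tor's own per-day user estimate


def active_fraction(active, connected):
    """Share of connected clients that are active."""
    return active / connected


def turnover_per_day(daily_users, concurrent_clients):
    """How many times the concurrent population is replaced per day."""
    return daily_users / concurrent_clients


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_CLIENTS, CONNECTED_CLIENTS):.1%}")
    print(f"turnover per day: {turnover_per_day(DAILY_USERS, CONNECTED_CLIENTS):.2f}")
```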
Hope this helps!
Cheers, Rob
[1] https://github.com/privcount/privcount
[2] http://www.robgjansen.com/publications/privcount-ccs2016.pdf
Thanks for your notes.
Besides these fairly expensive streams, we want a general estimate of the computational capacity of Tor relays. The concern is that when a botnet abuses Tor as its primary C&C channel, it will create tons of circuits through Tor, but it may not run any expensive streams, as we saw in Aug 2013. So the general question is how many (basic) circuits Tor relays can hold before their computation saturates. In other words, when a botnet switches to using Tor as its C&C channel, will that amount to a denial of service?
Thanks, Zhuotao
________________________________________
From: grarpamp [grarpamp@gmail.com]
Sent: Friday, August 26, 2016 1:15
To: tor-dev@lists.torproject.org
Cc: Liu, Zhuotao
Subject: Re: [tor-dev] Some information about Tor relays