[tor-dev] CellStatistics circuit distribution scale could perhaps use adjustment

Karsten Loesing karsten at torproject.org
Sat Jan 4 08:24:49 UTC 2014


On 12/21/13 7:33 PM, starlight at binnacle.cx wrote:
> Have been running a guard for a couple
> of months with 'CellStatistics' and
> noticed that the distribution looks
> out of whack:
> 
> cell-stats-end 2013-12-20 18:13:10 (86400 s)
> cell-processed-cells 1409,9,6,6,6,5,4,3,2,1
> cell-queued-cells 0.44,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> cell-time-in-queue 98,1,1,1,0,13,2,1,1,0
> cell-circuits-per-decile 15199
> 
> Seems like most of the circuits with significant
> traffic end up in the first bucket and the
> remaining nine buckets are of little
> significance.  I'm fairly certain that a
> relative handful of circuits account for
> 99.9% of the cell traffic with cell-counts
> in the tens-to-hundreds of thousands.
> Most of that bot traffic I suppose.
> 
> Perhaps a log-scaled "loudness" breakdown
> would make sense?
> 
> Nothing pressing here, just an observation
> and a thought.

Hi,

You're right that a log-scaled breakdown would be more meaningful than
simple deciles.  A possible downside is that the first bucket would be
pretty small, especially on slow relays.  I'm not saying that we
shouldn't do it, but we have to be careful not to provide statistics on
too small a set of observations.  This requires analysis and
experimentation.
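
To make the trade-off concrete, here is a minimal sketch of a
log-scaled breakdown, written in Python for illustration only.  The
function names, the base of 10, and the minimum-observations threshold
are all assumptions made for this example; none of them come from Tor
or from dir-spec.txt.

```python
from collections import Counter

def bucket_index(cells, base=10):
    """Return k such that base**k <= cells < base**(k+1).

    Circuits with fewer than `base` cells (including zero) land in
    bucket 0.  Integer division avoids floating-point log rounding.
    """
    k = 0
    while cells >= base:
        cells //= base
        k += 1
    return k

def log_buckets(cell_counts, base=10):
    """Count circuits per log-scaled bucket."""
    return dict(Counter(bucket_index(c, base) for c in cell_counts))

def small_buckets(buckets, min_circuits):
    """List the buckets holding fewer than min_circuits circuits.

    min_circuits is a hypothetical threshold: buckets below it might be
    too small a set of observations to report.
    """
    return sorted(k for k, n in buckets.items() if n < min_circuits)

# Made-up per-circuit cell counts, not measured data.
counts = [1, 5, 12, 80, 150, 1000, 250000]
buckets = log_buckets(counts)
print(buckets)
print(small_buckets(buckets, min_circuits=2))
```

With a skewed distribution like the one described above, the sparsely
populated buckets are exactly the ones that would need the analysis
before being published.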

One way to do the experimentation part is to run a large *private* Tor
network in Shadow, enable the new CELL_STATS event type, and aggregate
new cell statistics from logged events.  The next steps would be to
write a proposal, discuss it on this list, write a patch, and get it
reviewed and merged into master.
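
The aggregation step could look roughly like the Python sketch below.
It assumes that each logged event is a single line of space-separated
key=value fields following the CELL_STATS keyword, with per-cell-type
counts written as "type:n,type:n" lists; please check control-spec.txt
for the authoritative format.  The function names and the choice to
identify a circuit by its ID, queue, and connection fields are
assumptions for this example, and the sample lines in the usage part
are made up.

```python
from collections import defaultdict

# Fields assumed to identify a circuit within one relay's log.
ID_KEYS = ("ID", "InboundQueue", "InboundConn",
           "OutboundQueue", "OutboundConn")

def parse_cell_stats(line):
    """Return {key: value} for one CELL_STATS line, else None."""
    parts = line.split()
    if "CELL_STATS" not in parts:
        return None
    fields = {}
    for token in parts[parts.index("CELL_STATS") + 1:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def sum_by_type(value):
    """Sum the counts in a 'type:n,type:n' list."""
    total = 0
    for item in value.split(","):
        n = item.rpartition(":")[2]
        if n:
            total += int(n)
    return total

def processed_per_circuit(lines):
    """Sum cells removed from queues, grouped by circuit key."""
    totals = defaultdict(int)
    for line in lines:
        fields = parse_cell_stats(line)
        if not fields:
            continue
        key = tuple(fields.get(k) for k in ID_KEYS)
        for name in ("InboundRemoved", "OutboundRemoved"):
            if name in fields:
                totals[key] += sum_by_type(fields[name])
    return dict(totals)

# Made-up log lines for illustration only.
log = [
    "650 CELL_STATS ID=14 OutboundQueue=19403 OutboundConn=15 "
    "OutboundRemoved=create_fast:1,relay_early:2",
    "some unrelated log line",
    "650 CELL_STATS ID=14 OutboundQueue=19403 OutboundConn=15 "
    "OutboundRemoved=relay:5",
]
print(processed_per_circuit(log))
```

The per-circuit totals from such a run could then be fed into whatever
bucketing scheme is being evaluated.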

If we spend all this effort, we should also look into other changes we
want to make to cell statistics.  There's already a minor flaw in cell
statistics noted in dir-spec.txt: "Note that this statistic can be
inaccurate for circuits that had queued cells at the start or end of the
measurement interval."  If we touch cell statistics, we should fix that
issue, too.

More generally, we should step back and decide which questions we want
cell statistics to answer, and then design the statistics that we need
to answer those questions.

Want to help?

All the best,
Karsten


