[metrics-team] PrivCount in Tor session in Rome

teor teor2345 at gmail.com
Tue Mar 13 08:00:55 UTC 2018


> On 12 Mar 2018, at 20:47, Karsten Loesing <karsten at torproject.org> wrote:
> 
> On 2018-03-11 11:32, teor wrote:
>>> Just a reminder that we have the PrivCount in Tor session at 2pm
>>> on the team's day (Sunday).
> 
> Hi teor!
> 
> I just read the session notes
> (https://trac.torproject.org/projects/tor/wiki/org/meetings/2018Rome/Notes/PrivCountInTor
> -- thanks for taking notes and posting them!) and have a few
> remarks/questions:
> 
> 1. It's true that graphs on Tor Metrics only show totals over all
> relays/bridges. But keep in mind that some statistics are also shown per
> relay/bridge on Relay Search. For example, if we were to move bandwidth
> statistics to PrivCount, we couldn't provide bandwidth graphs per
> relay/bridge on Relay Search anymore.

Yes, there are some statistics that we can't put in PrivCount.

In particular, if relays stop reporting their bandwidth, we can't use
self-reported bandwidths as an input to the bandwidth authority
calculations.

But in these cases, we can (and should) add noise and then bin.
Adding noise protects the clients that contribute to each individual reported
value. Binning provides some uncertainty in the underlying value over time.
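
As a rough illustration of "add noise and then bin", here is a minimal
Python sketch. The Gaussian noise, the sigma value, the bin width, and
rounding to the nearest bin are all assumptions made for the example;
they are not PrivCount's actual parameters.

```python
import random

def noisy_binned(value, sigma, bin_width, rng=random):
    """Add zero-mean Gaussian noise to a value, then round to the nearest bin.

    sigma and bin_width are illustrative parameters (assumptions), not
    values taken from any PrivCount specification.
    """
    noisy = value + rng.gauss(0.0, sigma)
    return bin_width * round(noisy / bin_width)

# Example: a hypothetical bandwidth figure, noised and reported in bins of 100.
reported = noisy_binned(12345, sigma=50.0, bin_width=100)
```

The order matters: the noise is added to the raw value first, and only
the noisy result is binned.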

> 2. What analysis can the metrics team do to help with PrivCount
> design/development? There's something in the notes about flags changing
> in 24 hour periods or possible partition of relays. Can you elaborate
> and make these questions a lot more concrete? Maybe this is something I
> can do in the next few days, with enough time for you to discuss more
> with irl while you're in Rome?

We want to partition the reporting relays into 3 groups at random.
(Or maybe some other number: there is a tradeoff between the number of
groups, which resists manipulation by a single relay, and the quality of the
resulting statistic.)

If we select relays from the consensus at random, do we get a roughly
even distribution of consensus weight, guard weight, middle weight, and
exit weight?

What if we only have 5% of relays reporting statistics?
Can we still get roughly even total partition weights at random?
(Please choose relays on the latest tor versions, because they will be the
first to deploy PrivCount.)
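
To make the question concrete, the analysis could look something like
the Python sketch below. It assumes each relay is a dict with
"consensus", "guard", "middle" and "exit" weights already computed
from a consensus; those field names and both function names are
hypothetical. It samples reporting relays uniformly at random, so
filtering for recent tor versions would need to happen before
sampling.

```python
import random

WEIGHT_KEYS = ("consensus", "guard", "middle", "exit")

def sample_reporting(relays, fraction, rng=random):
    """Pick a uniformly random subset of relays, e.g. fraction=0.05 for 5%."""
    count = max(1, round(len(relays) * fraction))
    return rng.sample(relays, count)

def partition_shares(relays, groups=3, rng=random):
    """Split relays into groups at random and return each group's weight share.

    Returns a dict mapping each weight name to a list of per-group
    fractions of the total weight of the given relays.
    """
    shuffled = list(relays)
    rng.shuffle(shuffled)
    # Deal the shuffled relays round-robin into the groups.
    parts = [shuffled[i::groups] for i in range(groups)]
    shares = {}
    for key in WEIGHT_KEYS:
        total = sum(r[key] for r in relays)
        shares[key] = [
            sum(r[key] for r in part) / total if total else 0.0
            for part in parts
        ]
    return shares
```

Running partition_shares many times on the output of
sample_reporting(relays, 0.05) and looking at the spread of the shares
around 1/3 would show how even the random partitions are.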

If we can't get even partitions by choosing relays at random, we will need
to choose partitions weighted by consensus weight. Let's decide if we
want to do that analysis after we see the initial results.
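
For reference, one possible reading of "partitions weighted by
consensus weight" is a greedy assignment: take relays from heaviest to
lightest and put each one in the group with the smallest total so far.
This is only a sketch under that assumption, with a hypothetical
function name, and not a decided design. It is also deterministic, so
a real scheme would still need some randomness to resist manipulation.

```python
import heapq

def balanced_partition(weights, groups=3):
    """Greedily split relays into groups with similar total consensus weight.

    weights maps a relay identifier to its consensus weight.
    Returns a list of (total_weight, [relay ids]) tuples, one per group.
    """
    # Heap entries are (total, group index, members); the unique index
    # breaks ties so the member lists are never compared.
    heap = [(0, i, []) for i in range(groups)]
    heapq.heapify(heap)
    by_weight = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for relay, weight in by_weight:
        total, idx, members = heapq.heappop(heap)
        members.append(relay)
        heapq.heappush(heap, (total + weight, idx, members))
    ordered = sorted(heap, key=lambda entry: entry[1])
    return [(total, members) for total, _idx, members in ordered]
```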

> 3. The notes say that the metrics team will need somebody to analyze
> results from PrivCount. I think that's the perfect job for iwakeh. Do
> you have early/experimental PrivCount results that you could provide to
> iwakeh, to find out whether analyzing these numbers will be something
> they're comfortable with or not?

I have some experimental results that will feed into a paper. I will ask the
paper authors if I can share them.

The numbers we output will be similar, but the final format will be more
structured. We can include additional information if it will be helpful for
analysis, like the set of reporting relays.

> 4. Extending CollecTor to archive outputs from tally reporters is
> definitely something we're going to do. Do you have sample data or
> specifications that you can share?

No, we do not have specifications for the tally reporter outputs.
That specification isn't in our 6 month roadmap. It might get done
later in 2018 if we have time.

> And what's the timeframe when we
> should expect to start archiving this data?

6-18 months.
We should have a more specific date after our hackfest in May.

T

