[metrics-team] PrivCount in Tor session in Rome

Karsten Loesing karsten at torproject.org
Tue Mar 13 15:01:08 UTC 2018


On 2018-03-13 13:33, teor wrote:
> 
>> On 13 Mar 2018, at 12:57, Karsten Loesing <karsten at torproject.org> wrote:
>>
>>> On 2018-03-13 12:06, teor wrote:
>>>
>>>> On 13 Mar 2018, at 11:41, Karsten Loesing <karsten at torproject.org> wrote:
>>>>
>>>> Hi teor,
>>>>
>>>>> On 2018-03-13 09:00, teor wrote:
>>>>>>> 2. What analysis can the metrics team do to help with PrivCount
>>>>>>> design/development? There's something in the notes about flags changing
>>>>>>> in 24 hour periods or possible partition of relays. Can you elaborate
>>>>>>> and make these questions a lot more concrete? Maybe this is something I
>>>>>>> can do in the next few days, with enough time for you to discuss more
>>>>>>> with irl while you're in Rome?
>>>>>>
>>>>>> We want to partition the reporting relays into 3 groups at random.
>>>>>> (Or maybe some other number: there is a tradeoff between the number of
>>>>>> groups, which resists manipulation by a single relay, and the quality of the
>>>>>> resulting statistic.)
>>>>>>
>>>>>> If we select relays from the consensus at random, do we get a roughly
>>>>>> even distribution of consensus weight, guard weight, middle weight, and
>>>>>> exit weight?
>>>>>>
>>>>>> What if we only have 5% of relays reporting statistics?
>>>>>> Can we still get roughly even total partition weights at random?
>>>>>> (Please choose relays on the latest tor versions, because they will be the
>>>>>> first to deploy PrivCount.)
>>>>
>>>> Here's a graph (with and without annotations):
>>>>
>>>> https://people.torproject.org/~karsten/volatile/partitions-2018-03-13.pdf
>>>>
>>>> https://people.torproject.org/~karsten/volatile/partitions-2018-03-13-annotated.pdf
>>>
>>> 0.3.2 has the expected consensus weight distribution.
>>> And it's 2 months since 0.3.2 became stable:
>>> https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases
>>>
>>> I would be happy to wait 2 months after a stable release for good statistics.
>>>
>>>> Let me know if this makes sense, or which parameters I should tweak.
>>>
>>> Can we focus on 0.3.2, and all relays?
>>
>> That would be 0.3.2 or higher then. And all relays for comparison. Sure!
>>
>>>> For
>>>> example:
>>>>
>>>> - Different number of groups (currently 3).
>>>
>>> Can we try 3 and 5?
>>
>> Yep!
>>
>>>> - Different number of simulations (currently 1000).
>>>
>>> That's fine.
>>
>> Or, 40 simulations per consensus = 40 * 24 = 960 simulations in total.
>>
>>>> - Different number of consensuses as input (currently 1).
>>>
>>> We'll be collecting over a day, so please use 24 consensuses.
>>
>> Okay. Note that I'm simply taking 24 consensuses rather than 1 and
>> running simulations on those. I'm not tracking how relays stay online
>> over these 24 hours. That would be a different simulation.
>>
>>>>>> If we can't get even partitions by choosing relays at random, we will need
>>>>>> to choose partitions weighted by consensus weight. Let's decide if we
>>>>>> want to do that analysis after we see the initial results.
>>>>
>>>> Let me know if you want me to try out a different algorithm. The current
>>>> algorithm simply assigns relays to groups at random.
>>>
>>> That seems to get us what we want, let's keep selecting at random.
>>
>> Alright.
>>
>> New graph:
>>
>> https://people.torproject.org/~karsten/volatile/partitions-2018-03-13a.pdf
> 
> All these look fine.
> 
> But I'm having a bit of trouble seeing differences in the cumulative sum
> graphs. Can we do a distribution of total consensus weights for the next
> set of graphs? (That is, a graph that looks like this: _/\_, not this: _/-- )

Here's the same data in a histogram:

https://people.torproject.org/~karsten/volatile/partitions-2018-03-13b.pdf

The y axis is a bit harder to interpret here: it's the number of
partitions produced by the simulations. We're running 960 simulations and
creating 3 or 5 partitions in each simulation, so in the case of 3 (5)
groups the histogram counts 3 * 960 = 2880 (5 * 960 = 4800) partition
weights across its bars.
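
In case anyone wants to reproduce or tweak these graphs, here's a rough
sketch of the simulation step in Python. It's only an illustration of
the approach, not the actual script; the function name and the weights
input are placeholders.

import random

def simulate_partitions(weights, num_groups=3, num_simulations=960, seed=None):
    # Randomly assign each relay to one of num_groups groups and sum up
    # the consensus weight per group; repeat num_simulations times.
    rng = random.Random(seed)
    group_weights = []
    for _ in range(num_simulations):
        totals = [0.0] * num_groups
        for weight in weights:
            totals[rng.randrange(num_groups)] += weight
        group_weights.extend(totals)
    # The result has num_simulations * num_groups entries, which is what
    # the histogram counts: 2880 values for 3 groups, 4800 for 5.
    return group_weights

# weights = [...]  # consensus weights of relays on 0.3.2 or higher,
#                  # taken from the 24 consensuses of one day
# sums = simulate_partitions(weights, num_groups=3, num_simulations=40 * 24)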

> When we choose a set of statistics to move to PrivCount, let's do a
> simulation on historical relay stats, to check that we can add noise,
> partition, aggregate, and bin, and interpret the results.
> 
> We'll need to do some more work before we're ready to do a simulation.
> Like estimate individual client usage.

Sounds good.
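
Just to make those steps concrete, this is roughly the per-statistic
pipeline I'd picture for such a simulation. It's a sketch under
assumptions only: the Laplace noise scale, the bin edges, and the use of
NumPy are placeholders, not actual PrivCount parameters or code.

import numpy as np

def noisy_binned_total(values, noise_scale, bin_edges, rng=None):
    # Add per-relay Laplace noise, aggregate over one partition, and map
    # the aggregated total into a bin. All parameters are placeholders.
    rng = rng or np.random.default_rng()
    noisy = np.asarray(values, dtype=float)
    noisy += rng.laplace(0.0, noise_scale, size=len(noisy))
    total = noisy.sum()
    # np.digitize returns the index of the bin that the total falls into.
    return total, int(np.digitize(total, bin_edges))

# Example with made-up numbers: per-relay counts for one statistic in one
# partition, binned into steps of 1000.
# total, bin_index = noisy_binned_total([120, 80, 45], noise_scale=10.0,
#                                       bin_edges=[0, 1000, 2000, 3000])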

> T

All the best,
Karsten
