[tor-dev] Guardiness: Yet another external dirauth script

George Kadianakis desnacked at riseup.net
Wed Sep 17 13:54:38 UTC 2014


George Kadianakis <desnacked at riseup.net> writes:

> Damian Johnson <atagar at torproject.org> writes:
>
>>> - Q: Why do you use stem instead of parsing consensuses with Python on your own?
>>>
>>> This is another part where I might have taken the wrong design
>>> decision, but I decided to not get into the consensus parsing business
>>> and just rely on stem.
>>>
>>> This is also because I was hoping to use stem to verify consensus
>>> signatures. However, now that we might use Daniel's patch to populate
>>> our consensus database, maybe we don't need to treat consensuses as
>>> untrusted anymore.
>>>
>>> If you think that I should try to parse the consensuses on my own,
>>> please tell me and I will give it a try. Maybe it will be
>>> fast. Definitely not as fast as summary files, but maybe we can parse
>>> 3 months' worth of consensuses in 15 to 40 seconds.
>>
>> I'm not sure why you think it was the wrong choice. If Stem isn't
>> providing you the performance you want then seems like speeding it up
>> is the right option rather than writing your own parser. That is, of
>> course, unless you're looking for something highly specialized in
>> which case have fun.
>>
>> Nick improved parsing performance by around 30% in response to this...
>>
>>   https://trac.torproject.org/projects/tor/ticket/12859
>>
>> Between that and turning off validation I'd be a little curious where
>> the time is going if it's still too slow for you.
>
> Indeed, our use case is quite specialized. The only thing the
> guardiness script cares about is whether relays have the guard
> flag. No other consensus parsing actually needs to happen.
>
> However, you have a point that stem performance could be improved and
> I will look a bit more into stem parsing and see what I can do.
>
> That said, currently stem parses (with validation enabled) 24
> consensuses in 25 seconds. That's roughly one consensus per second.
> If we are aiming for 7000 consensuses in less than a minute, we need
> to parse ~120 consensuses a second. That will probably require quite
> a bit of optimization in stem, I think.

FWIW, turning off validation helps a bit, but not much. For example,
my laptop parses 24 consensuses with validation in 25 seconds, and in
22 seconds with validation disabled.

This means that to reach the rate of ~120 consensuses a second with
parse_file(), we need to make it about 100 times faster. That sounds
much harder than a 30% performance increase :/
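
To illustrate the specialized path mentioned earlier, here is a
minimal sketch of a guard-flag-only parser. It assumes nothing beyond
the v3 consensus format (an 'r' line introduces a relay, with its
base64 identity as the third field, and the following 's' line lists
its flags); the function and variable names are illustrative, not
part of any existing tool:

```python
# Sketch: count how often each relay (keyed by its base64 identity
# from the 'r' line) carries the Guard flag across many consensuses.
import collections

def guard_flags(consensus_lines):
    """Yield (identity, has_guard) for each relay in one consensus."""
    identity = None
    for line in consensus_lines:
        if line.startswith('r '):
            identity = line.split()[2]  # base64-encoded identity digest
        elif line.startswith('s ') and identity is not None:
            # The 's' line holds the flags, e.g. "s Fast Guard Running"
            yield identity, 'Guard' in line.split()[1:]
            identity = None

# Tally guardiness over a set of consensuses.
guard_counts = collections.Counter()  # consensuses where relay was a Guard
seen_counts = collections.Counter()   # consensuses where relay appeared

def tally(consensus_lines):
    for identity, is_guard in guard_flags(consensus_lines):
        seen_counts[identity] += 1
        if is_guard:
            guard_counts[identity] += 1
```

Since this only looks at 'r' and 's' line prefixes and never builds
descriptor objects, it should be much closer to summary-file speed
than full parsing, at the cost of skipping all validation.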
