[tor-dev] Making microdescriptor tarballs available on metrics.tpo

Karsten Loesing karsten at torproject.org
Wed Jan 22 08:01:37 UTC 2014


On 1/22/14 4:32 AM, Damian Johnson wrote:
>> Damian, can you try to parse these descriptors using stem, to see if the
>> descriptor annotations are correct and if stem can parse them without
>> issues?
> 
> Hi Karsten, sorry about the delay! Yup, stem parses them just fine
> (though processing compressed tarballs still takes an unpleasantly
> long time)...
> 
> 
> % du -h microdescs-2014-01.tar.bz2
> 1.8M    microdescs-2014-01.tar.bz2
> 
> 
> % cat parse.py
> from stem.descriptor.reader import DescriptorReader
> 
> counter = 0
> 
> with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
>   for desc in reader:
>     counter += 1
> 
> print "Found %i microdescriptors" % counter
> 
> 
> % time python parse.py
> Found 14999 microdescriptors
> 
> real    67m15.022s
> user    65m50.259s
> sys    1m13.717s

Wow, that's indeed time-consuming.  Inflating the tarball before feeding
it into stem probably solves this problem.  (That's what I usually do
with metrics-lib, too.)
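
A minimal sketch of that approach, assuming the tarball is trusted and
that stem's DescriptorReader is pointed at the extracted directory rather
than at the compressed file.  The helper names extract_tarball and
count_descriptors are made up for illustration, and I haven't measured
how much time this saves:

```python
import os
import sys
import tarfile
import tempfile


def extract_tarball(tarball_path, target_dir):
    """Decompress and unpack the tarball once; return the extracted file paths."""
    # "r:*" lets tarfile detect the compression (bz2 here) on its own.
    with tarfile.open(tarball_path, "r:*") as archive:
        # Assumes a trusted archive; extractall() writes whatever paths it contains.
        archive.extractall(target_dir)

    paths = []
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


def count_descriptors(directory):
    """Count the descriptors stem finds in an already-extracted directory."""
    # Imported here so the extraction helper works without stem installed.
    from stem.descriptor.reader import DescriptorReader

    counter = 0
    with DescriptorReader([directory]) as reader:
        for _desc in reader:
            counter += 1
    return counter


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python parse_extracted.py microdescs-2014-01.tar.bz2
    target = tempfile.mkdtemp(prefix="microdescs-")
    extract_tarball(sys.argv[1], target)
    print("Found %i microdescriptors" % count_descriptors(target))
```

The temporary directory is left in place after the run, so remove it by
hand when it is no longer needed.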

Thanks for testing this!  Will deploy the metrics-db changes on yatei.

All the best,
Karsten


