On 1/22/14 4:32 AM, Damian Johnson wrote:
Damian, can you try to parse these descriptors using stem, to see if the descriptor annotations are correct and if stem can parse them without issues?
Hi Karsten, sorry about the delay! Yup, stem parses them just fine (though processing compressed tarballs still takes an unpleasantly long time)...
% du -h microdescs-2014-01.tar.bz2
1.8M    microdescs-2014-01.tar.bz2

% cat parse.py
from stem.descriptor.reader import DescriptorReader

counter = 0

with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
  for desc in reader:
    counter += 1

print "Found %i microdescriptors" % counter

% time python parse.py
Found 14999 microdescriptors

real    67m15.022s
user    65m50.259s
sys     1m13.717s
Wow, that's indeed time-consuming. Inflating the tarball before feeding it into stem probably solves this problem. (That's what I usually do with metrics-lib, too.)
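
A minimal sketch of that workaround, assuming the tarball is trusted and there is enough disk space for the extracted files. The helper names inflate_tarball and count_descriptors are hypothetical and only for illustration. The extraction uses Python's standard tarfile module, and the counting loop uses the same DescriptorReader as parse.py above, pointed at the extracted directory instead of the compressed tarball. Whether this actually removes the slowdown has not been measured here.

```python
import os
import tarfile


def inflate_tarball(tarball_path, target_dir):
    """Extract a (possibly compressed) tarball into target_dir.

    Returns the sorted paths of all regular files found under target_dir.
    Only use this on archives you trust.
    """
    # "r:*" lets tarfile detect the compression (bz2, gz, ...) on its own.
    with tarfile.open(tarball_path, "r:*") as archive:
        archive.extractall(target_dir)

    extracted = []
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            extracted.append(os.path.join(root, name))
    return sorted(extracted)


def count_descriptors(directory):
    """Count the descriptors stem finds under an already extracted directory."""
    # Imported here so the extraction helper also works without stem installed.
    from stem.descriptor.reader import DescriptorReader

    counter = 0
    with DescriptorReader([directory]) as reader:
        for _desc in reader:
            counter += 1
    return counter


# Hypothetical usage, with an output directory name chosen for illustration:
#   inflate_tarball("microdescs-2014-01.tar.bz2", "microdescs-extracted")
#   print("Found %i microdescriptors" % count_descriptors("microdescs-extracted"))
```

Running "tar xjf microdescs-2014-01.tar.bz2" in the shell before starting the script would have the same effect as the first helper.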
Thanks for testing this! Will deploy the metrics-db changes on yatei.
All the best, Karsten