Making microdescriptor tarballs available on metrics.tpo

Hi Damian, hi devs,

I'm planning to make microdescriptor tarballs available on the metrics website that contain both microdescriptor consensuses and microdescriptors.

Some background: Recent Tor clients don't download the network status consensus and full server descriptors anymore, but instead download the microdescriptor consensus and the microdescriptors referenced from it. We didn't provide these formats on the metrics website yet, because they are derived from the formats we already provide and don't contain anything novel. But having the new formats will, for example, make it easier for developers to analyze the directory protocol and for researchers to understand what information is available to clients to make path selection decisions.

If you need more background, see #2785 and search for "microdesc" in dir-spec.txt.

Here's a sample tarball:

https://people.torproject.org/~karsten/microdescs-2014-01.tar.bz2

Damian, can you try to parse these descriptors using stem, to see if the descriptor annotations are correct and if stem can parse them without issues?

If all goes well, microdescriptor tarballs will start to be available on the metrics website before the end of the month.

All the best,
Karsten

> Damian, can you try to parse these descriptors using stem, to see if the descriptor annotations are correct and if stem can parse them without issues?

Hi Karsten, sorry about the delay! Yup, stem parses them just fine (though processing compressed tarballs still takes an unpleasantly long time)...

% du -h microdescs-2014-01.tar.bz2
1.8M    microdescs-2014-01.tar.bz2

% cat parse.py
from stem.descriptor.reader import DescriptorReader

counter = 0

with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
  for desc in reader:
    counter += 1

print "Found %i microdescriptors" % counter

% time python parse.py
Found 14999 microdescriptors

real 67m15.022s
user 65m50.259s
sys 1m13.717s

Cheers! -Damian

On 1/22/14 4:32 AM, Damian Johnson wrote:
>> Damian, can you try to parse these descriptors using stem, to see if the descriptor annotations are correct and if stem can parse them without issues?
>
> Hi Karsten, sorry about the delay! Yup, stem parses them just fine (though processing compressed tarballs still takes an unpleasantly long time)...
>
> % du -h microdescs-2014-01.tar.bz2
> 1.8M    microdescs-2014-01.tar.bz2
>
> % cat parse.py
> from stem.descriptor.reader import DescriptorReader
>
> counter = 0
>
> with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
>   for desc in reader:
>     counter += 1
>
> print "Found %i microdescriptors" % counter
>
> % time python parse.py
> Found 14999 microdescriptors
>
> real 67m15.022s
> user 65m50.259s
> sys 1m13.717s

Wow, that's indeed time-consuming. Inflating the tarball before feeding it into stem probably solves this problem. (That's what I usually do with metrics-lib, too.)

Thanks for testing this! Will deploy the metrics-db changes on yatei.

All the best,
Karsten
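
The "inflate first" suggestion above can be sketched roughly as follows. This is only an illustration, not something posted in the thread: the helper names inflate_tarball and count_descriptors are hypothetical, and the second helper assumes that stem's DescriptorReader also accepts a directory path as a target (the thread only shows it being handed the tarball itself). Whether this actually removes the slowdown Damian measured is Karsten's guess and has not been verified here.

```python
import os
import shutil
import tarfile


def inflate_tarball(tarball_path, dest_dir):
    """Extract the regular files of a tar archive (e.g. .tar.bz2) into
    dest_dir and return the list of extracted file paths.

    Hypothetical helper for illustration; not part of stem or metrics-lib.
    """
    extracted = []
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)

    # "r:*" lets tarfile detect the compression (bz2 here) on its own.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and special files

            target = os.path.realpath(os.path.join(dest_root, member.name))

            # Refuse entries that would be written outside dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: %s" % member.name)

            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)

    return extracted


def count_descriptors(directory):
    """Count descriptors below an already-inflated directory using stem.

    stem is imported lazily so that inflate_tarball() works without it.
    Assumption: DescriptorReader walks directories given as targets.
    """
    from stem.descriptor.reader import DescriptorReader

    counter = 0
    with DescriptorReader([directory]) as reader:
        for _ in reader:
            counter += 1
    return counter
```

With stem installed, one might call inflate_tarball("microdescs-2014-01.tar.bz2", "microdescs-2014-01") once and then count_descriptors("microdescs-2014-01"); the directory name here is an arbitrary choice.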

On 22/01/14 09:01, Karsten Loesing wrote:
> Thanks for testing this! Will deploy the metrics-db changes on yatei.

Microdescriptor tarballs are now available on the metrics website:

https://metrics.torproject.org/data.html#relaydesc

All the best,
Karsten