On 9/18/13 2:53 AM, Damian Johnson wrote:
>> Unless maybe stem already does exactly this for us?
> Yup, stem parses the extrainfo descriptors...
> https://stem.torproject.org/api/descriptor/extrainfo_descriptor.html#stem.de...
> The only pesky bit is that you'll need to download a lot of descriptors from metrics (I assume you need the entries published over a long period of time?).
Parsing extra-info descriptors is only step one. The time periods covered by the contained byte histories can overlap quite substantially, so you'll need a database or an efficient file format to avoid over-counting. For example, assume you have an extra-info descriptor with these lines:
extra-info torrelayfishsticks 9FD2E81F27FB2628B3FEABEB2E66854984E48ABB
write-history 2013-09-03 01:35:10 (900 s) [...] 37888,37888,61440,786432
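For reference, stem exposes these fields directly on its ExtraInfoDescriptor objects. A rough sketch of reading them, assuming you have an extra-info descriptor file downloaded from metrics (the file name is just a placeholder):

from stem.descriptor import parse_file

with open('extra-infos-2013-09', 'rb') as desc_file:
    for desc in parse_file(desc_file, descriptor_type='extra-info 1.0'):
        if desc.write_history_values:
            print('%s (%s)' % (desc.fingerprint, desc.nickname))
            # End of the most recent interval, interval length in seconds
            # (usually 900), and byte counts ordered oldest to newest.
            print('%s (%d s)' % (desc.write_history_end,
                                 desc.write_history_interval))
            print(desc.write_history_values)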
A simple but expensive solution would be to write lines like this to a file:
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 01:35:10,w,786432
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 01:20:10,w,61440
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 01:05:10,w,37888
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 00:50:10,w,37888
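A sketch of how you might produce those lines from a descriptor parsed with stem, counting backwards from the history end time (the function name is made up for illustration):

import datetime

def history_lines(desc):
    lines = []
    for flag, end, interval, values in [
            ('w', desc.write_history_end, desc.write_history_interval,
             desc.write_history_values),
            ('r', desc.read_history_end, desc.read_history_interval,
             desc.read_history_values)]:
        if not values:
            continue
        # The last value covers the interval ending at 'end'; each earlier
        # value ends one interval length before the one after it.
        for i, value in enumerate(reversed(values)):
            interval_end = end - datetime.timedelta(seconds=i * interval)
            lines.append('%s,%s,%s,%d' % (desc.fingerprint, interval_end,
                                          flag, value))
    return lines

Writing those lines for every descriptor into one big intermediate file gives you the input for the next step.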
Once you have that file, you sort it, throw out duplicate lines, and sum up the values by fingerprint, date, and read/write direction.
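As a rough sketch in Python, assuming the intermediate file format above (file name is a placeholder; for really large files you'd rather run Unix sort -u and a streaming pass instead of holding everything in memory):

from collections import defaultdict

totals = defaultdict(int)
with open('history-lines.csv') as infile:
    # De-duplicating and sorting the lines drops intervals that were
    # reported in more than one descriptor.
    for line in sorted(set(infile)):
        fingerprint, timestamp, flag, value = line.strip().split(',')
        date = timestamp.split(' ')[0]
        totals[(fingerprint, date, flag)] += int(value)

for (fingerprint, date, flag), total in sorted(totals.items()):
    print('%s,%s,%s,%d' % (fingerprint, date, flag, total))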
This approach works fine if you only need to evaluate byte histories once per month or so and if it's okay for the job to run for a few hours. If you want to do this more often, you might want to use a database instead. See https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462 for a related approach. The file-based approach is much simpler, though.
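Just to illustrate the general idea of why a database helps with repeated runs (this is not the task-8462 schema; table and column names are made up): a unique key on fingerprint, interval end, and direction handles the overlap problem at insert time, so new descriptors can be added incrementally.

import sqlite3

conn = sqlite3.connect('byte-histories.db')
conn.execute("""CREATE TABLE IF NOT EXISTS history (
                  fingerprint TEXT,
                  interval_end TEXT,
                  direction TEXT,
                  bytes INTEGER,
                  PRIMARY KEY (fingerprint, interval_end, direction))""")

def insert_interval(fingerprint, interval_end, direction, num_bytes):
    # INSERT OR IGNORE silently drops intervals that an earlier descriptor
    # already reported, which replaces the sort/uniq step above.
    # (Call conn.commit() after a batch of inserts.)
    conn.execute('INSERT OR IGNORE INTO history VALUES (?, ?, ?, ?)',
                 (fingerprint, str(interval_end), direction, num_bytes))

# Daily totals per relay and direction:
for row in conn.execute("""SELECT fingerprint, date(interval_end), direction,
                                  SUM(bytes) FROM history
                           GROUP BY 1, 2, 3"""):
    print(row)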
All the best,
Karsten