[tor-relays] Reimbursement of Exit Operators

Wed Sep 18 06:40:50 UTC 2013

On 9/18/13 2:53 AM, Damian Johnson wrote:
>> Unless maybe stem already does exactly this for us?
> 
> Yup, stem parses the extrainfo descriptors...
> 
> https://stem.torproject.org/api/descriptor/extrainfo_descriptor.html#stem.descriptor.extrainfo_descriptor.ExtraInfoDescriptor
> 
> The only pesky bit is that you'll need to download a lot of
> descriptors from metrics (I assume you need the entries published over
> a long period of time?).

Parsing extra-info descriptors is only step one.  The byte histories in
successive descriptors cover time periods that can overlap quite
substantially, so you'll need a database or an efficient file format to
avoid counting the same interval twice.  For example, assume you have an
extra-info descriptor with these lines:

extra-info torrelayfishsticks 9FD2E81F27FB2628B3FEABEB2E66854984E48ABB
write-history 2013-09-03 01:35:10 (900 s) [...] 37888,37888,61440,786432

A simple but expensive solution would be to write lines like this to a file:

9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 01:35:10,w,786432
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 01:20:10,w,61440
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 01:05:10,w,37888
9FD2E81F27FB2628B3FEABEB2E66854984E48ABB,2013-09-03 00:50:10,w,37888
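
As a rough sketch of how one history line could be expanded into lines
like the ones above, here is a small Python example using only the
standard library.  The function name expand_history and the regular
expression are illustrative assumptions, not part of stem or of any
metrics code.  The sample input keeps only the four values visible in
the example and leaves out the elided ones.  It assumes the timestamp
marks the end of the most recent interval, as in the lines above.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for "read-history"/"write-history" lines:
# keyword, end timestamp, interval length in seconds, comma-separated values.
HISTORY_RE = re.compile(
    r"^(read|write)-history (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((\d+) s\) ([\d,]*)$"
)


def expand_history(fingerprint, line):
    """Return one 'fingerprint,interval end,r|w,bytes' row per interval,
    newest interval first."""
    match = HISTORY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a byte history line: %r" % line)
    kind, end_text, interval_text, values_text = match.groups()
    end = datetime.strptime(end_text, "%Y-%m-%d %H:%M:%S")
    step = timedelta(seconds=int(interval_text))
    values = [int(v) for v in values_text.split(",") if v]
    rows = []
    # The last value belongs to the interval ending at the timestamp;
    # each earlier value ends one interval length before the next one.
    for offset, value in enumerate(reversed(values)):
        interval_end = end - offset * step
        rows.append("%s,%s,%s,%d" % (
            fingerprint,
            interval_end.strftime("%Y-%m-%d %H:%M:%S"),
            kind[0],  # "r" or "w"
            value,
        ))
    return rows


if __name__ == "__main__":
    sample = "write-history 2013-09-03 01:35:10 (900 s) 37888,37888,61440,786432"
    for row in expand_history("9FD2E81F27FB2628B3FEABEB2E66854984E48ABB", sample):
        print(row)
```

Appending the rows from every descriptor to one file gives the input for
the sort step.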

Once you have that, you sort that file, throw out duplicate lines, and
sum up values by fingerprint, date, and read/write.
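
The sort, de-duplicate, and sum steps could look roughly like this in
Python.  This is only a sketch: sum_histories is a hypothetical name, and
it holds all lines in memory, which is an assumption that will not hold
for a long period of descriptors.  For large inputs you would sort and
de-duplicate on disk first and then stream through the result.

```python
from collections import defaultdict


def sum_histories(lines):
    """Sum bytes per (fingerprint, date, r|w), counting each distinct
    'fingerprint,interval end,r|w,bytes' line only once."""
    totals = defaultdict(int)
    # set() throws out duplicate lines; sorted() only makes the
    # processing order deterministic.
    unique_lines = sorted(set(l.strip() for l in lines if l.strip()))
    for line in unique_lines:
        fingerprint, interval_end, kind, value = line.split(",")
        date = interval_end.split(" ")[0]  # keep "YYYY-MM-DD" only
        totals[(fingerprint, date, kind)] += int(value)
    return dict(totals)


if __name__ == "__main__":
    fp = "9FD2E81F27FB2628B3FEABEB2E66854984E48ABB"
    lines = [
        fp + ",2013-09-03 01:35:10,w,786432",
        fp + ",2013-09-03 01:20:10,w,61440",
        # Same interval reported again by a later descriptor:
        fp + ",2013-09-03 01:20:10,w,61440",
    ]
    for key, total in sorted(sum_histories(lines).items()):
        print(key, total)
```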

This approach works fine if you need to evaluate byte histories once per
month or so and if it's okay for the job to run for a few hours.  If you
want to do this more often, you might want to use a database instead.
See https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462
for a related approach.  The file-based approach is much simpler, though.

All the best,
Karsten