[tor-dev] Stem Proc Integration Tests

Damian Johnson atagar at torproject.org
Fri Jun 29 16:27:34 UTC 2012


> Keep in mind that metrics tarballs can be huge.  stem's tests probably
> shouldn't download one or more of these tarballs in an automatic integ
> test run.

Oops yup. Should have mentioned that. We're just picking out a
descriptor that seems to exercise most of the parsing. This is just
for a sanity check that 'we can still parse something found in the
wild'. Megan, Erik: the layout should be pretty obvious when you take
a peek in test/integ/descriptor/data/*.

> The Java metrics-lib doesn't
> understand microdescriptor consensuses, because they don't contain
> anything new for statistical analysis, but I think stem will want to
> parse them.

Definitely. Microdescriptors are available via the control protocol so
we need to be able to parse them.

> It probably makes sense to have an abstract
> NetworkStatusEntry class that does most of the parsing work but that can
> be specialized in its subclasses.  Picking names like ConsensusEntry if
> the consensus class is called Consensus makes sense.

Perfect, thanks. Megan, Erik: if I were in your shoes the first thing
that I'd do to approach this is propose the following on this list...
- an object hierarchy (we already have a bit of one, ex.
ServerDescriptor vs RelayDescriptor/BridgeDescriptor)
- a description for each of the classes, preferably something meaty
that we can use for the pydocs of each class with the :var: entries
- your thoughts on which parsing logic should go where (look at the
previous descriptor classes for a pattern that you might want to
follow)
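
To make the hierarchy idea concrete, here is a rough sketch of an abstract
entry class with one specialized subclass. The class names come from the
suggestion quoted above; the attributes, the _parse_line() hook, and the
heavily simplified 'r', 's' and 'w' line handling are assumptions for
illustration only, not stem's actual design.

```python
class NetworkStatusEntry(object):
  """
  Hypothetical base class that handles fields shared by all entries.

  :var str nickname: router's nickname
  :var str identity: router's identity, as it appears in the 'r' line
  :var list flags: flags from the 's' line
  """

  def __init__(self, raw_contents):
    self.nickname = None
    self.identity = None
    self.flags = []

    for line in raw_contents.splitlines():
      if line.strip():
        keyword, _, value = line.partition(' ')
        self._parse_line(keyword, value)

  def _parse_line(self, keyword, value):
    # Simplified: only the first two 'r' line fields are read here.
    if keyword == 'r':
      fields = value.split(' ')
      self.nickname, self.identity = fields[0], fields[1]
    elif keyword == 's':
      self.flags = value.split(' ')


class ConsensusEntry(NetworkStatusEntry):
  """
  Hypothetical subclass adding a field the base class doesn't handle.

  :var int bandwidth: value of Bandwidth= in the 'w' line
  """

  def __init__(self, raw_contents):
    self.bandwidth = None  # set before the base class starts parsing
    super(ConsensusEntry, self).__init__(raw_contents)

  def _parse_line(self, keyword, value):
    if keyword == 'w':
      for item in value.split(' '):
        key, _, val = item.partition('=')
        if key == 'Bandwidth':
          self.bandwidth = int(val)
    else:
      # Anything we don't specialize falls back to the shared parsing.
      super(ConsensusEntry, self)._parse_line(keyword, value)


entry = ConsensusEntry('r caerSidi p1aag7VwarGxqctS7/fS0y5FU+s\ns Fast Running\nw Bandwidth=153')
```

The shared parsing lives in the base class and each subclass only overrides
the lines it cares about, which is one way to answer the "which parsing
logic should go where" question.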

> If there's a
> similar concept to Java's inner classes in Python, maybe using something
> like Consensus.Entry might be a good choice, too, because this class
> will only be used as part of a Consensus.

Yup, there is.

>>> class Foo:
...   class Bar:
...     def __init__(self):
...       self.my_value = 5
...   def __init__(self):
...     self.my_bar = Foo.Bar()
...
>>> f = Foo()
>>> f.my_bar.my_value
5

> A related question:  can you give us a couple of use-cases for the
> export functionality?  E.g., is filtering (we only want fields X, Y,
> and Z when Q = ...) likely to be of use?  Anything beyond just a
> straight dump of descriptor/network status/etc entries?

I'll mostly leave this question for Fabio since the csv dumping
functionality was his idea, though my thoughts on some use cases
are...

- user writes a script that has stem parse the descriptors, filters the
results (say, down to Syrian exit relays), then dumps them to a csv so
they can make pretty graphs or do other analysis of the data

- user has a python script that hourly parses their cached descriptors
to get any new exits that only allow plaintext traffic, then dumps just
the fingerprint and ip to a csv so they can later be scanned for
malicious activity
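
Both use cases boil down to "filter, pick a few fields, write rows". A
minimal sketch of that shape using only the standard csv module is below
(written for Python 3). FakeDescriptor, its attributes, and export_csv() are
made up for illustration and are not part of stem; in particular the locale
attribute is assumed to come from some separate geoip lookup.

```python
import csv
import io


class FakeDescriptor(object):
  """Hypothetical stand-in for a parsed descriptor, not a stem class."""

  def __init__(self, fingerprint, address, locale, exit_ports):
    self.fingerprint = fingerprint
    self.address = address
    self.locale = locale          # assumed result of a geoip lookup
    self.exit_ports = exit_ports  # simplified view of the exit policy


def export_csv(descriptors, fields, output, include=None):
  """Writes the named attributes of each matching descriptor as csv rows."""

  writer = csv.writer(output, lineterminator='\n')
  writer.writerow(fields)  # header row

  for desc in descriptors:
    if include is None or include(desc):
      attrs = vars(desc)
      writer.writerow([attrs[field] for field in fields])


descriptors = [
  FakeDescriptor('AAAA', '10.0.0.1', 'sy', [80, 443]),
  FakeDescriptor('BBBB', '10.0.0.2', 'de', [80]),
  FakeDescriptor('CCCC', '10.0.0.3', 'sy', []),
]

# Keep only Syrian relays that allow exiting, and dump two fields.
output = io.StringIO()
export_csv(descriptors, ['fingerprint', 'address'], output,
           include=lambda d: d.locale == 'sy' and bool(d.exit_ports))
csv_text = output.getvalue()
```

Taking the filter as a callable keeps the export function itself dumb, so
the "fields X, Y, and Z when Q = ..." case is just a different lambda.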

> Please use the built-in function vars() instead of __dict__ to retrieve
> instance attributes.

Ah ha, thanks.
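
For anyone following along, a small illustration of the suggestion. The
Example class and its attributes are made up; for an ordinary instance
vars() returns the same dictionary that __dict__ refers to.

```python
class Example(object):
  """Made-up class, only here to have some instance attributes."""

  def __init__(self):
    self.nickname = 'caerSidi'
    self.or_port = 9001


example = Example()

# vars(obj) hands back the instance's attribute dictionary, so there is no
# need to reach for the double-underscore name directly.
attributes = vars(example)
same_object = attributes is example.__dict__
```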

