[tor-dev] #6512 [Stem]: TorExport Module and Unit Testing

Damian Johnson atagar at torproject.org
Tue Aug 7 16:14:14 UTC 2012


Hi Erik. Sorry about pushing the revisions without checking first. I
didn't want to distract you and Megan from Onionoo by having a
prolonged code review though on reflection it was rude of me to not
ask first. Apologies. :(

> Looking at coverage the test provides, I was under the impression that the various cases of include_fields and exclude fields (e.g. if they overlap) should also be tested

Ahhh, I didn't realize that's what the tests were for. Indeed, that
would be a good thing to check. Should be easy to add.

> In the mocking changes that I made, I added functionality for mocking functions that take keyword arguments.  Though it is no longer necessary with the new unit tests, I was wondering if there was something I could have done better with that code as it seems such functionality may come in useful in the future.

It sounds like a good change. My issue with it was that
return_for_args() is pretty simple right now, and the kwarg_type
argument complicates its usage quite a bit. It's generally a bad sign
if takes 65 words to describe an argument. ;)

Maybe we could make a new mock function to handle this use case?

> You may have already noticed this (and your documentation is consistent with this), but just to be sure, the new export code won't work with generators

Gah! No, I didn't. Suddenly your implementation makes a lot more sense.

Should be easy to fix by calling iter() on descriptors. I'll make the
change and add a test for it - thanks for the catch!

> Working with Professor Danner this summer (and from the CS classes I have taken so far), I am under the impression that run-time type checking can be expensive and generally falls outside of the realm of 'pythonic' code.

The performance cost of type checking is negligible, though you're
right that it's not very pythonic. Frequently calling isinstance() is
a bad sign since it means you're violating the rule of thumb that
variables should only be of a single type. It's horribly confusing to
have variable 'foo' sometimes be an int and sometimes be a string, so
it's important to keep types consistent. Without this dynamic
languages like python and ruby would be unmaintainable.

That said, rules have exceptions. At the very start of functions I
sometimes use isinstance() checks to normalize a variable's type.
Doing this allows callers more flexibility in how they use us, while
keeping our code simple. In this case 'descriptors' is always a list,
but our caller can provide us with either a single Descriptor or list
of Descriptors as they see fit. Without this we either force our
callers to make wrapper lists...

my_descriptor_csv = export_csv([my_descriptor])

... or bloat our API with specially named methods that all do the same thing...

my_descriptor_csv = export_single_csv(my_descriptor)

Dynamic languages like python lack method overloading and I view that
as a weakness. So, on one hand we want a flexible API that doesn't get
in the way of our users but on the other overloaded variable types are
damn confusing.

The middle ground that I've settled on is to let users provide one or
multiple of an object type when that capability would be useful. Imho
this makes for a nicer API without treading very far into type
confusions. Others may disagree.

As for the isinstance() Descriptor check on line 83, the runtime cost
to check this is so infinitesimally small that it shouldn't be a
concern. The real performance costs of a system either come from
constrained resources (like network IO) or runtime complexity (O(n) vs
O(n^2) operations). It's well worth taking a couple nanoseconds to
provide our callers with a nicer error message when they screw up.

> Thank you in advance for answering these questions -- this summer has been (and still is) a phenomenal learning experience for us, and gaining insight on coding style and the like is crucial to developing our own coding techniques.

I'm glad this has been a good learning experience for you! You're a
great developer and I'm keeping my fingers crossed that you keep
hacking on open source stuff well after the summer has ended.

> Looking at the code, I must apologize.  I didn't mean to take so much of your time rewriting the code.

Don't worry about that. It's mostly my OCD for code to look and behave
a very specific way. Your implementation was perfectly functional.

Cheers! -Damian


More information about the tor-dev mailing list