Hi Damian,
Looking at coverage the test provides, I was under the impression that the various cases of include_fields and exclude fields (e.g. if they overlap) should also be tested, but if you find that unnecessary, that would help explain why my unit tests were difficult to follow.
In the mocking changes that I made, I added functionality for mocking functions that take keyword arguments. Though it is no longer necessary with the new unit tests, I was wondering if there was something I could have done better with that code as it seems such functionality may come in useful in the future.
You may have already noticed this (and your documentation is consistent with this), but just to be sure, the new export code won't work with generators, which I bring up as this is what server_descriptor.parse_file() returns. So a user of this module would need to covert that generator into a list or tuple, which could be rather large.
def gener(list=[1,2,3,4]):
... for item in list: ... yield item
gener()[0]
Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'generator' object has no attribute '__getitem__'
Working with Professor Danner this summer (and from the CS classes I have taken so far), I am under the impression that run-time type checking can be expensive and generally falls outside of the realm of 'pythonic' code. To help Megan and I understand this design change, could you explain why you went with this style (e.g. line 83 in stem/descriptor/export.py)? Along similar lines, why wouldn't we also check the types of other parameters passed into the function?
Thank you in advance for answering these questions -- this summer has been (and still is) a phenomenal learning experience for us, and gaining insight on coding style and the like is crucial to developing our own coding techniques.
Looking at the code, I must apologize. I didn't mean to take so much of your time rewriting the code. That said, looking at your implementation gave me some perspective in evaluating the code I submitted so I don't make similar mistakes in the future.
Best, Erik
On Sat, Aug 4, 2012 at 3:22 PM, Tor Bug Tracker & Wiki < torproject-admin@torproject.org> wrote:
#6512: TorExport Module and Unit Testing
----------------------------+----------------------------------------------- Reporter: ErikI | Owner: atagar Type: enhancement | Status: closed Priority: normal | Milestone: Component: Stem | Version: Resolution: implemented | Keywords: Parent: | Points: Actualpoints: |
----------------------------+----------------------------------------------- Changes (by atagar):
- status: needs_review => closed
- resolution: => implemented
Comment:
Thanks Erik, merged with some revisions [1]. I found the unit tests a bit hard to follow, especially since it wasn't broken down by the test cases so I rewrote those. Let me know if I missed any test cases that you think are important.
[1]
https://gitweb.torproject.org/stem.git/commitdiff/7022021c4207a0066cb9326162...
-- Ticket URL: < https://trac.torproject.org/projects/tor/ticket/6512#comment:3%3E Tor Bug Tracker & Wiki https://trac.torproject.org/ The Tor Project: anonymity online
Hi Erik. Sorry about pushing the revisions without checking first. I didn't want to distract you and Megan from Onionoo by having a prolonged code review though on reflection it was rude of me to not ask first. Apologies. :(
Looking at coverage the test provides, I was under the impression that the various cases of include_fields and exclude fields (e.g. if they overlap) should also be tested
Ahhh, I didn't realize that's what the tests were for. Indeed, that would be a good thing to check. Should be easy to add.
In the mocking changes that I made, I added functionality for mocking functions that take keyword arguments. Though it is no longer necessary with the new unit tests, I was wondering if there was something I could have done better with that code as it seems such functionality may come in useful in the future.
It sounds like a good change. My issue with it was that return_for_args() is pretty simple right now, and the kwarg_type argument complicates its usage quite a bit. It's generally a bad sign if takes 65 words to describe an argument. ;)
Maybe we could make a new mock function to handle this use case?
You may have already noticed this (and your documentation is consistent with this), but just to be sure, the new export code won't work with generators
Gah! No, I didn't. Suddenly your implementation makes a lot more sense.
Should be easy to fix by calling iter() on descriptors. I'll make the change and add a test for it - thanks for the catch!
Working with Professor Danner this summer (and from the CS classes I have taken so far), I am under the impression that run-time type checking can be expensive and generally falls outside of the realm of 'pythonic' code.
The performance cost of type checking is negligible, though you're right that it's not very pythonic. Frequently calling isinstance() is a bad sign since it means you're violating the rule of thumb that variables should only be of a single type. It's horribly confusing to have variable 'foo' sometimes be an int and sometimes be a string, so it's important to keep types consistent. Without this dynamic languages like python and ruby would be unmaintainable.
That said, rules have exceptions. At the very start of functions I sometimes use isinstance() checks to normalize a variable's type. Doing this allows callers more flexibility in how they use us, while keeping our code simple. In this case 'descriptors' is always a list, but our caller can provide us with either a single Descriptor or list of Descriptors as they see fit. Without this we either force our callers to make wrapper lists...
my_descriptor_csv = export_csv([my_descriptor])
... or bloat our API with specially named methods that all do the same thing...
my_descriptor_csv = export_single_csv(my_descriptor)
Dynamic languages like python lack method overloading and I view that as a weakness. So, on one hand we want a flexible API that doesn't get in the way of our users but on the other overloaded variable types are damn confusing.
The middle ground that I've settled on is to let users provide one or multiple of an object type when that capability would be useful. Imho this makes for a nicer API without treading very far into type confusions. Others may disagree.
As for the isinstance() Descriptor check on line 83, the runtime cost to check this is so infinitesimally small that it shouldn't be a concern. The real performance costs of a system either come from constrained resources (like network IO) or runtime complexity (O(n) vs O(n^2) operations). It's well worth taking a couple nanoseconds to provide our callers with a nicer error message when they screw up.
Thank you in advance for answering these questions -- this summer has been (and still is) a phenomenal learning experience for us, and gaining insight on coding style and the like is crucial to developing our own coding techniques.
I'm glad this has been a good learning experience for you! You're a great developer and I'm keeping my fingers crossed that you keep hacking on open source stuff well after the summer has ended.
Looking at the code, I must apologize. I didn't mean to take so much of your time rewriting the code.
Don't worry about that. It's mostly my OCD for code to look and behave a very specific way. Your implementation was perfectly functional.
Cheers! -Damian