[tor-bugs] #13125 [Tor]: Review the guardiness python script of #9321

Thu Dec 4 16:54:15 UTC 2014

#13125: Review the guardiness python script of #9321
------------------------+--------------------------------
     Reporter:  asn     |      Owner:
         Type:  defect  |     Status:  needs_review
     Priority:  normal  |  Milestone:  Tor: 0.2.6.x-final
    Component:  Tor     |    Version:
   Resolution:          |   Keywords:  tor-guard tor-auth
Actual Points:          |  Parent ID:  #9321
       Points:          |
------------------------+--------------------------------

Comment (by asn):

 Replying to [comment:12 nickm]:
 >

 Thanks for the review. I fixed some but not all stuff and pushed the
 changes to my branch.
 Some comments:

 > * Hmmmmm.  I worry slightly about using stem for parsing here
 >   ... remind me, does stem accept unrecognized elements and fields and
 >   so forth in documents, or does it reject those?  I thought
 >   it sometimes needed to get patched when we added new fields.  If
 >   that's right, this will make us fail when the format evolves.
 >
 >   * If I'm wrong about stem's behavior, great.
 >
 >   * If I'm right, we need to give it a way to be more tolerant.
 >
 >   * Regardless, we should make sure that we can recover from
 >     misformatted cnsensuses.
 >

 Good point. As Damian said, stem is filtering unknown lines and putting
 them in a special list. I can do some testing wrt soon, to make sure it
 works fine.

 > * The output format should be made extensible so that it can handle more
 than
 >   one ID format.  Soon, Ed25519+RSA identity pairs will exist.  After
 >   that, Ed25519-alone will exist.
 >
 >   * The schema should also probably handle this situation.
 >

 That's also a good point.

 Currently, I'm using the fingerprint of the guard as its unique
 identifier. The problem I guess is that in the future, each guard will
 have 2 such fingerprints. And in the future future, it will have one but
 not the one it has now.

 How should this be fixed I wonder? Should I change the schema and output
 file format to support multiple fingerprints? So that we include both RSA
 and Ed25519?

 To do so, I will have to change the output file format quite a bit
 again... Both Python and Tor code will need to change, and unittests will
 need to be rewritten. Should I change the output file format to be yaml or
 something? :/ But then how will Tor parse it...

 > * Probably this needs a LICENSE or COPYING file.  And a license.
 >

 Done. Used unlicense.

 > * Do we verify consensuses now?
 >

 No we don't. #11045 is the corresponding ticket. Damian asked me to take
 care of that ticket, which might happen but not any time very soon.

 However, I don't worry *too* much about this. The cron script for now,
 will just copy the consensus from Tor's data directory to the
 guardfraction directory every hour. So the consensus should be authentic.
 I'm also going to suggest to dirauth operators to just run the cron script
 for a month or so before enabling the code, to have a sufficiently
 populated guardfraction db.

 > * README should explain how and what to backup.
 >

 Hm, what do you mean backup?

 > * The cron script probably shouldn't be called cron.sh, right?
 >

 True. Renamed to `guardfraction_cron.sh`.

 >   * Is the cron script smart enough to exit if something fails?
 >

 Done. Now it's smarter!

 > * In guardfraction.py ... double-check that max_days is positive?  And
 not huge?
 >

 Done.

 > * Check that we're not going backwards in time?
 >

 Hm, how should I do that?
 Should I take the latest consensus from the database, and make sure that
 the current time is later than the `valid-after` of that consensus?

 > * Do you need an index on consensus.consensus_date to make delete and
 count fast enough?
 >

 I think that's a good idea since it's used in multiple `WHERE` clauses,
 but I'm not very good at databases. I added the index in any case.

 > * find_missing_hours_from_list() isn't actually right: We don't
 guarantee a one-hour timer.  Instead you need to check for gaps in the
 (valid-after, fresh-until) series of times.  But this would mean that just
 counting consensuses isn't right.  Maybe each consensus needs a duration-
 fresh field.
 >

 Yeah this was a hard problem. The current implementation is a rough hack,
 to give dirauth operators a rough idea of which consensuses are missing...

 > * How does the nested query in read_db_file() perform?  Is it even
 right?  I don't think the parentheses balance.

 I ```think``` it's right. At least its output seems to be correct, and the
 unittests test it.
 Which parentheses don't match?

 Thanks for the code review again! We can discuss this during the next tor
 meeting if you want :)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13125#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online