[tor-bugs] #11648 [Tor]: Problem parsing .z-compressed descriptors fetched via DirPort

Tue May 6 13:24:35 UTC 2014

#11648: Problem parsing .z-compressed descriptors fetched via DirPort
-------------------------+-----------------------
     Reporter:  karsten  |      Owner:
         Type:  defect   |     Status:  new
     Priority:  normal   |  Milestone:
    Component:  Tor      |    Version:
   Resolution:           |   Keywords:  tor-relay
Actual Points:           |  Parent ID:
       Points:           |
-------------------------+-----------------------

Comment (by wfn):

 Replying to [comment:10 karsten]:
 > Replying to [comment:9 wfn]:
 > > [...]
 > >  * I don't think any zlib-compressed-data caching happens (can't find
 any in `buffers` or `torgzip`, etc.)
 > >  * Nevertheless, the diffs ''should really be'' just because
 directories' view of relays and their descriptors is always changing (it
 would be nice to confirm this in a definite manner, of course.) Also, some
 caching could be happening "somewhere else" (i.e.: I don't know.)
 >
 > I'm not too worried about this, but I can't say for sure that there's
 ''no'' bug here.  But should we move this to a separate ticket?  The two
 possible issues seem unrelated.  Would you want to create that ticket?

 Could you please clarify/confirm what you had in mind? By the two issues,
 did you mean

 1. .z-compressed descriptors can't be parsed / the zlib data returned by
 tor is strange (this ticket)
 2. `tor/server/all` and `tor/server/all.z` (possibly) differ (new ticket)

 Just say "yes, this", and I'll create a ticket for "2."

 > So, I looked at `turtles-server-all.z` using `xxd` with RFC 1950 opened
 in a second window.  The first bytes (`0x78da`) look like a valid header
 (`78` = "deflate with 32k window size", `da`= "max compression, no dict"),
 but I'm not sure about the last bytes (`0x0000ffff`).  These last bytes
 are supposed to be the Adler-32 checksum, but these bytes don't look like
 a valid checksum to me.  Maybe something is not initialized or not flushed
 correctly?

 Good catch!
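
 For anyone who wants to redo the header check without `xxd`, here is a minimal sketch of decoding the two header bytes as described in RFC 1950. The helper name `describe_zlib_header` is made up for illustration and is not part of tor or of python's zlib module.

```python
def describe_zlib_header(data: bytes) -> dict:
    """Decode the two-byte zlib header (CMF, FLG) defined in RFC 1950."""
    cmf, flg = data[0], data[1]
    return {
        "method": cmf & 0x0F,                     # CM: 8 means deflate
        "window_size": 1 << ((cmf >> 4) + 8),     # CINFO 7 -> 32768 bytes
        "check_ok": (cmf * 256 + flg) % 31 == 0,  # FCHECK rule from the RFC
        "preset_dict": bool(flg & 0x20),          # FDICT bit
        "level_hint": flg >> 6,                   # FLEVEL: 3 = max compression
    }


# The first two bytes reported for turtles-server-all.z.
header = describe_zlib_header(bytes.fromhex("78da"))
print(header)
```

 This only covers the header. The last four bytes need the uncompressed data to be checked against an Adler-32 value, so they can't be validated this way.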

 Potentially directly related: there are multiple zlib flush modes. Here's
 what [http://www.zlib.net/manual.html zlib's manual] has to say about
 `Z_SYNC_FLUSH` (emphasis mine):

 > If the parameter flush is set to Z_SYNC_FLUSH, all pending output is
 flushed to the output buffer and the output is aligned on a byte boundary,
 so that the decompressor can get all input data available so far. (In
 particular avail_in is zero after the call if enough output space has been
 provided before the call.) Flushing may degrade compression for some
 compression algorithms and so it should be used only when necessary. This
 completes the current deflate block and follows it with an empty stored
 block that is three bits plus filler bits to the next byte, '''followed by
 four bytes (00 00 ff ff)'''.

 This is probably related. (To clarify for someone else who might be
 reading this (ha, ha): those same four bytes, `00 00 ff ff`, are what we
 see at the very end of the archived `.z` stream, where the Adler-32
 checksum should be.)
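
 A small sketch with python's zlib that reproduces the same symptom, assuming the stream was sync-flushed but never finished. The payload is made up, and this says nothing about what tor actually does internally. It only shows what such a stream looks like from the outside.

```python
import zlib

# Made-up payload standing in for descriptor text.
payload = b"router example 127.0.0.1 9001 0 9030\n" * 50

compressor = zlib.compressobj(9)
# Compress everything, then sync-flush WITHOUT finishing the stream.
unfinished = compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)

print(unfinished[:2].hex())   # zlib header bytes
print(unfinished[-4:].hex())  # marker left behind by the sync flush

# The one-shot API insists on a complete stream, including the trailer.
try:
    zlib.decompress(unfinished)
    one_shot_error = None
except zlib.error as exc:
    one_shot_error = exc
print(one_shot_error)

# A streaming decompressor hands back whatever has been flushed so far.
streaming = zlib.decompressobj()
recovered = streaming.decompress(unfinished)
print(recovered == payload, streaming.eof)
```

 If the archived `.z` data behaves the same way (one-shot fails, streaming works, `eof` stays false), that would fit the "flushed but not finished" theory.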

 Regarding the flush modes themselves, python's zlib mentions

 `Z_SYNC_FLUSH`, `Z_FULL_FLUSH`, `Z_FINISH`.

 tor's gzip module only uses `Z_SYNC_FLUSH` and `Z_FINISH`.

 ([http://www.zlib.net/manual.html There are more modes in zlib].)

 Could it be that the two sides are making different assumptions about
 these flush modes?

 Or it could be that tor never does a `Z_FINISH` at the end. (Maybe it
 does: the code for that exists, but I'm not sure whether it is actually
 called at the end of writing the `.z` stream.)
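
 To make the `Z_FINISH` question testable from the client side, here is a rough sketch in python. The helper `stream_is_finished` is hypothetical (my name, not from any library). It relies on the `eof` attribute of python's decompression objects, which only becomes true once the final deflate block and the Adler-32 trailer have been read.

```python
import struct
import zlib


def stream_is_finished(data: bytes) -> bool:
    """Return True if data holds a complete zlib stream, trailer included."""
    decompressor = zlib.decompressobj()
    decompressor.decompress(data)
    return decompressor.eof


# Made-up payload standing in for descriptor text.
payload = b"router example 127.0.0.1 9001 0 9030\n" * 50

compressor = zlib.compressobj(9)
flushed_only = compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)
# Z_FINISH appends the final block and the big-endian Adler-32 checksum.
finished = flushed_only + compressor.flush(zlib.Z_FINISH)

expected_trailer = struct.pack(">I", zlib.adler32(payload))

print(stream_is_finished(flushed_only))
print(stream_is_finished(finished))
print(finished[-4:] == expected_trailer)
```

 Running `stream_is_finished` over the bytes fetched from the DirPort should tell us which of the two cases we are looking at.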

 Responding inline to less important points below (i.e.: can probably skip
 them.)

 > >  * `all.z` is a zlib stream, and as far as zlib streams go, everything
 is OK with it.
 >
 > I don't buy that yet.  Why would `zlib.decompress` fail if this is a
 valid zlib stream?

 (This paragraph doesn't help much, but it explains my confused thought
 process ->) When I wrote that, I was operating under the assumption that
 there are zlib ''archives'' and ''streams'', and that python expects an
 ''archive''. fwiw, python does seem to make some kind of assumption here
 (see maybe the comments on
 http://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check/22310760#22310760
 ), but I have little clue. Maybe zlib data is a zlib stream and that's
 that, and it should just work.

 > (Also note that the `zlib.decompressobj` solution doesn't work for me,
 because I'm really looking for a way to make
 `java.util.zip.InflaterInputStream` work.  I just wrote the test cases in
 Python, because I was hoping to get more feedback on that.  And look how
 this worked just fine!  But I really suspect that the problem is in tor's
 use of zlib.)

 Aha. :)

 http://docs.oracle.com/javase/7/docs/api/java/util/zip/InflaterInputStream.html
 says that DEFLATE is expected. OK.

 Any meaningful error messages on java's part here?

 Also, by the way: the `tor_gzip_compress()` in `torgzip.c` looks like a
 rather hairy function. It doesn't operate at a "quick hack" level, but it
 does seem to make some specific assumptions. Either way, the comments
 included there do not reassure me that `tor_gzip_compress()` does
 everything right. :)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11648#comment:12>