[tor-bugs] #11648 [Tor]: Problem parsing .z-compressed descriptors fetched via DirPort

Mon May 5 17:56:50 UTC 2014

#11648: Problem parsing .z-compressed descriptors fetched via DirPort
-------------------------+-----------------------
     Reporter:  karsten  |      Owner:
         Type:  defect   |     Status:  new
     Priority:  normal   |  Milestone:
    Component:  Tor      |    Version:
   Resolution:           |   Keywords:  tor-relay
Actual Points:           |  Parent ID:
       Points:           |
-------------------------+-----------------------

Comment (by wfn):

 Replying to [comment:7 karsten]:
 > Here's a theory why compressed and non-compressed results differ: the
 > compressed result is cached on the directory, whereas the non-compressed
 > result is put together on-the-fly.  I didn't find the place in the code
 > where this is done, but it seems at least plausible to me.

 Good idea. (fwiw, I could not find such a place. Looking at
 `write_to_evbuffer_zlib()` in `or/buffers.c` (around line 2549), it would
 appear that no caching is taking place. That function is called by
 `connection_write_to_buf_impl_()`, which is the 'implementation' function
 for `connection_write_to_buf_zlib()`, noted [comment:1 above]. But I know
 nothing about bufferevents, and maybe one of those *write*() functions
 further down the call stack is doing something; I haven't looked.)

 >
 > So, assuming we don't have this particular rabbit hole, can you rephrase
 > your other insights?  Is there anything wrong with the way directories
 > compress strings, or is this just a matter of clients handling compressed
 > responses in the right way?  Thanks!

 Let's keep this open for at least another day if that's OK, but for now,
 the tl;dr would be "the clients ''should'' assume the compressed data is a
 zlib stream." This ''should'' imply that 1) all is well in the way tor
 handles zlib data (both on the directory server side and on the client
 side), and that 2) your Python solution using `decompressobj()` is the
 correct way to handle this, as far as Python is concerned (judging from
 that SO thread, other people do the same when they have to handle zlib
 streams).
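
 For reference, here is a minimal Python sketch of what "treat it as a zlib
 stream" could look like with `decompressobj()`. The helper name
 `decompress_zlib_stream()` and the sample data are made up for
 illustration. The unfinished-stream case is only an assumption about one
 way a stream can differ from a complete one-shot blob; nothing in this
 thread establishes that tor's output actually looks like that.

```python
import zlib


def decompress_zlib_stream(chunks):
    """Decompress an iterable of byte chunks as one zlib stream.

    Unlike zlib.decompress(), this does not require the stream to be
    complete: it returns whatever could be decoded from the input.
    """
    decompressor = zlib.decompressobj()
    parts = [decompressor.decompress(chunk) for chunk in chunks]
    parts.append(decompressor.flush())
    return b"".join(parts)


# Hypothetical stand-in for a descriptor document; not real tor output.
document = b"router example 127.0.0.1 9001 0 9030\n" * 50

# Case 1: a complete zlib blob, delivered in arbitrary 64-byte pieces.
blob = zlib.compress(document)
pieces = [blob[i:i + 64] for i in range(0, len(blob), 64)]
from_pieces = decompress_zlib_stream(pieces)

# Case 2 (assumption for illustration): the sender flushed its output
# with Z_SYNC_FLUSH but never finished the stream.
compressor = zlib.compressobj()
unfinished = compressor.compress(document) + compressor.flush(zlib.Z_SYNC_FLUSH)
from_unfinished = decompress_zlib_stream([unfinished])

# The one-shot helper rejects the unfinished stream.
try:
    zlib.decompress(unfinished)
    one_shot_failed = False
except zlib.error:
    one_shot_failed = True
```

 Feeding the data through a decompression object means the client does not
 depend on holding a complete, terminated blob before it starts decoding.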

 If it's not the correct way, it's (only) a matter of finding the proper
 way of handling zlib streams {in python | in general}.

 Main caveat: I don't know nearly enough about zlib to be able to make such
 claims without frowning. :) But, (assuming the diffs can be explained
 away) I ''think'' that this was just a mismatch in assumptions about zlib
 data. This is how it should be.

 I might still recommend noting this behaviour down somewhere, i.e.
 including a note that "clients should assume the compressed data is a zlib
 ''stream''", but I'm not sure what the best place for that is, if any.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11648#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online