[tor-dev] PRELIMINARY: [PATCH] Adapt to changes in 'GHC.Handle'.

Nick Mathewson nickm at alum.mit.edu
Sat Jun 22 14:18:48 UTC 2013


On Fri, Jun 21, 2013 at 4:10 PM, Nikita Karetnikov
<nikita at karetnikov.org> wrote:

Hello!  I'll try to answer some of the easier stuff now, and put off
the longer stuff till I have time after the weekend.

>> Hm.  I don't understand how the hgetLineN implementation can work.  It
>> looks like it reads N bytes unconditionally, then looks for the EOL
>> marker in them, and returns everything before the EOL marker... but
>> what does it do with everything after the EOL marker?  It appears to
>> me that if the EOL marker is not right at the end of the N bytes,
>> those bytes would get lost.
>
> Right, should it work differently?  (Again, I haven't inspected
> 'hGetLine' yet.)
>
> The current version makes the worst case easy to estimate, because
> it reads N bytes unconditionally.  But it might be better to look for
> the EOL marker first, then check the number of bytes.  What do you
> think?

The part I was most concerned about was the discarding of the extra
bytes.  Usually, when I see a function called "fooReadLine," I expect
it not to throw data away.  That is, if I run
"hGetLineN handle (B.pack "\n") 42" three times on a handle for input
"foo\nbar\nbaz\n", I would expect to get "foo", then "bar", then
"baz".

I don't know whether that's what hGetLine currently does, but it's how
I interpret its documentation.
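
To make that expectation concrete, here is a minimal sketch of line
splitting that keeps the leftover bytes instead of dropping them.  The
names splitLine and allLines are hypothetical and not part of the patch;
the sketch works on an in-memory buffer rather than a handle, and it
assumes the EOL marker is non-empty.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B

-- | Hypothetical helper (not part of the patch): split one line off the
-- front of a buffer.  Returns the line without its EOL marker plus the
-- remaining bytes, or Nothing if the buffer contains no EOL marker.
-- Assumes @eol@ is non-empty.
splitLine :: B.ByteString -> B.ByteString -> Maybe (B.ByteString, B.ByteString)
splitLine eol buf
  | BS.null rest = Nothing
  | otherwise    = Just (line, BS.drop (BS.length eol) rest)
  where
    (line, rest) = BS.breakSubstring eol buf

-- | Split off every complete line, threading the leftover bytes through
-- each step instead of discarding them.  The second component of the
-- result is the unterminated tail, if any.
allLines :: B.ByteString -> B.ByteString -> ([B.ByteString], B.ByteString)
allLines eol buf = case splitLine eol buf of
  Nothing           -> ([], buf)
  Just (line, rest) -> let (ls, tl) = allLines eol rest in (line : ls, tl)

main :: IO ()
main = print (allLines (B.pack "\n") (B.pack "foo\nbar\nbaz\n"))
-- prints (["foo","bar","baz"],"")
```

A handle-based version would have to store the "rest" value somewhere
between calls (for example in an IORef next to the handle); that part
is left out here.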


(As for whether to read up to the EOL first or whether to read up to
the byte limit first: both can be problematic when the byte limit is
high.  But reading up to the EOL first is extra problematic, since it
means that the program may keep reading forever if the user never
sends an EOL.)
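
One way to avoid reading forever is to stop at whichever comes first,
the EOL marker or the byte limit.  The following is only a sketch under
that assumption: readLineBounded and the file name bounded-demo.txt are
made up for illustration, and the limit here counts all bytes read,
including the EOL marker.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, IOMode (ReadMode), hClose, openFile)

-- | Hypothetical bounded reader (not part of the patch).  Reads chunks
-- until @eol@ appears or @limit@ bytes have been accumulated.
--   Right (line, rest): a line without its EOL, plus the bytes after it.
--   Left acc: gave up (limit reached or EOF) with the bytes read so far.
-- Assumes @eol@ is non-empty.
readLineBounded
  :: Handle -> B.ByteString -> Int
  -> IO (Either B.ByteString (B.ByteString, B.ByteString))
readLineBounded handle eol limit = go BS.empty
  where
    go acc =
      case BS.breakSubstring eol acc of
        (line, rest)
          | not (BS.null rest) ->
              return (Right (line, BS.drop (BS.length eol) rest))
          | BS.length acc >= limit -> return (Left acc)
          | otherwise -> do
              -- Never ask for more than the remaining budget.
              chunk <- BS.hGetSome handle (limit - BS.length acc)
              if BS.null chunk
                then return (Left acc)          -- EOF before any EOL
                else go (BS.append acc chunk)

main :: IO ()
main = do
  writeFile "bounded-demo.txt" "no newline here"
  h <- openFile "bounded-demo.txt" ReadMode
  result <- readLineBounded h (B.pack "\n") 8
  print result   -- prints Left "no newli"
  hClose h
```

The caller gets the unconsumed bytes back in both cases, so nothing is
silently dropped, and a peer that never sends an EOL costs at most
"limit" bytes of reading.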

>> Am I missing something there?  Do the extra bytes get put back somehow?
>
> No.  Here is an example:
>
> # cat > Main.hs
> module Main where
> import Data.ByteString
> import qualified Data.ByteString.Char8 as B
> import System.IO
>
> -- | Read @n@ bytes from @handle@; strip @eol@ (e.g., @'B.pack' "\r\n"@)
> -- and everything after it.
> hGetLineN :: Handle -> ByteString -> Int -> IO ByteString
> hGetLineN handle eol n = do
>   hSetBuffering handle LineBuffering
>   bStr <- B.hGet handle n
>   return $ fst $ B.breakSubstring eol bStr
>
> main = do
>   handle <- openFile "test.txt" ReadMode
>   hGetLineN handle (B.pack "\n") 42 >>= print
>
> # echo -e "foo\nbar\nbaz" > test.txt
> # runhaskell Main
> "foo"
>
> Does it answer your question?
>
> (FYI: a chapter about I/O [5], an introduction to bytestrings [6], and
> a search engine [7].)
>
>>> If you apply both patches (with GHC 7.6.3), the following errors will
>>> appear:
>>>
>>> [ 3 of 39] Compiling TorDNSEL.Compat.Exception ( src/TorDNSEL/Compat/Exception.hs, dist/build/tordnsel/tordnsel-tmp/TorDNSEL/Compat/Exception.o ) [dist/build/autogen/cabal_macros.h changed]
>>> [ 4 of 39] Compiling TorDNSEL.System.Timeout ( src/TorDNSEL/System/Timeout.hs, dist/build/tordnsel/tordnsel-tmp/TorDNSEL/System/Timeout.o )
>
>> I don't see the errors there... did they go to stderr or something?
>
> I was a bit sleepy and messed it up.  Here is the right output:
>
> src/TorDNSEL/System/Timeout.hs:56:47:
>     Module `TorDNSEL.Compat.Exception' does not export `throwDynTo'
>
> src/TorDNSEL/System/Timeout.hs:56:59:
>     Module `TorDNSEL.Compat.Exception' does not export `dynExceptions'
>
>> Seems reasonable.  It might also be good to have a public git
>> repository somewhere where you work on the branch, that can store all
>> of the preliminary repositories.
>
> Can I create a new branch here [8]?  If so, should I also send patches
> and comments to this list, or will it only annoy everyone?

We use the main repositories only for released versions.  For
development branches, we use personal repositories.  Most people find
that it's more convenient to use a service like github or gitorious or
something to host their own branches, and then ask for review and
merging when the branches are ready for review and merging.

It's okay to send stuff to the list, but linking to a public branch
and listing the commit IDs usually works as well as sending patches.

best wishes,
-- 
Nick

