Re: [tor-dev] PRELIMINARY: [PATCH] Adapt to changes in 'GHC.Handle'.

22 Jun 2013

On Fri, Jun 21, 2013 at 4:10 PM, Nikita Karetnikov
<nikita@karetnikov.org> wrote:

Hello!  I'll try to answer some of the easier stuff now, and put off
the longer stuff till I have time after the weekend.
...
...
Hm.  I don't understand how the hgetLineN implementation can work.  It
looks like it reads N bytes unconditionally, then looks for the EOL
marker in them, and returns everything before the EOL marker... but
what does it do with everything after the EOL marker?  It appears to
me that if the EOL marker is not right at the end of the N bytes,
those bytes would get lost.
Right, should it work differently?  (Again, I haven't inspected
'hGetLine' yet.)
The current version allows one to estimate a worst-case scenario because
it reads N bytes unconditionally.  But it might be better to look for
the EOL marker first, then check the number of bytes.  What do you
think?
The part I was most concerned about was discarding the extra bytes.
Usually, when I see a function called "fooReadLine," I expect it not
to throw data away.  That is,
if I run "hGetLineN handle (B.pack "\n") 42" three times on a handle
for input "foo\nbar\nbaz\n", I would expect to get "foo", then "bar",
then "baz".

I don't know whether that's what hGetLine currently does, but
it's how I interpret its documentation.

(As for whether to read up to the EOL first or whether to read up to
the byte limit first: both can be problematic when the byte limit is
high.  But reading up to the EOL first is extra problematic, since it
means that the program may keep reading forever if the user never
sends an EOL.)
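
A minimal sketch of the behaviour described above, under stated
assumptions: the names hGetLineBounded and firstThreeLines are
hypothetical and appear in neither the patch nor any library.  It reads
one byte at a time and stops at the EOL marker, at the byte limit, or at
EOF, so it never consumes anything past the marker and never reads more
than n bytes.  It assumes a non-empty eol, and it does not tell the
caller which of the three conditions ended the read.

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, IOMode (ReadMode), withFile)

-- | Hypothetical bounded line reader (not part of the patch).
-- Consumes bytes until @eol@ has been seen, @n@ bytes have been
-- consumed (the marker counts towards @n@), or EOF is reached.
-- The returned line has the marker stripped.  Assumes @eol@ is
-- non-empty.
hGetLineBounded :: Handle -> B.ByteString -> Int -> IO B.ByteString
hGetLineBounded handle eol n = go B.empty
  where
    go acc
      -- Marker seen: return everything before it.
      | eol `B.isSuffixOf` acc =
          return (B.take (B.length acc - B.length eol) acc)
      -- Byte limit reached without a marker: return what we have.
      | B.length acc >= n = return acc
      | otherwise = do
          byte <- B.hGet handle 1
          if B.null byte
            then return acc -- EOF
            else go (acc `B.append` byte)

-- | Usage sketch: read the first three lines of a file.
firstThreeLines :: FilePath -> IO [B.ByteString]
firstThreeLines path =
  withFile path ReadMode $ \handle ->
    replicateM 3 (hGetLineBounded handle (B.pack "\n") 42)
```

On a file containing "foo\nbar\nbaz\n", firstThreeLines should yield
"foo", "bar" and "baz" in that order.  Repeated appends make this
quadratic in the line length, which is tolerable only because n bounds
it; a real implementation would probably keep its own buffer instead.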
...
...
Am I missing something there?  Do the extra bytes get put back somehow?
No.  Here is an example:
# cat > Main.hs
module Main where
import Data.ByteString
import qualified Data.ByteString.Char8 as B
import System.IO
-- | Read @n@ bytes from @handle@; strip @eol@ (e.g., @'B.pack' "\r\n"@)
-- and everything after it.
hGetLineN :: Handle -> ByteString -> Int -> IO ByteString
hGetLineN handle eol n = do
  hSetBuffering handle LineBuffering
  bStr <- B.hGet handle n
  return $ fst $ B.breakSubstring eol bStr
main = do
  handle <- openFile "test.txt" ReadMode
  hGetLineN handle (B.pack "\n") 42
# echo -e "foo\nbar\nbaz" > test.txt
# runhaskell Main
"foo"
Does it answer your question?
(FYI: a chapter about I/O [5], an introduction to bytestrings [6], and
a search engine [7].)
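
To make the lost-bytes concern concrete, here is a small sketch that
calls hGetLineN (copied from the example above) twice on the same
handle.  The helper name twoReads is hypothetical and only serves the
illustration.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import System.IO
  ( BufferMode (LineBuffering), Handle, IOMode (ReadMode)
  , hSetBuffering, withFile )

-- hGetLineN exactly as in the example above.
hGetLineN :: Handle -> ByteString -> Int -> IO ByteString
hGetLineN handle eol n = do
  hSetBuffering handle LineBuffering
  bStr <- B.hGet handle n
  return $ fst $ B.breakSubstring eol bStr

-- | Hypothetical helper: call hGetLineN twice on one handle and
-- return both results.
twoReads :: FilePath -> IO (ByteString, ByteString)
twoReads path =
  withFile path ReadMode $ \handle -> do
    first <- hGetLineN handle (B.pack "\n") 42
    second <- hGetLineN handle (B.pack "\n") 42
    return (first, second)
```

With the 12-byte test.txt from the example, the first call returns
"foo" and the second returns an empty string: the first B.hGet already
consumed the whole file, so "bar" and "baz" are never seen.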
...
...
If you apply both patches (with GHC 7.6.3), the following errors will
appear:
[ 3 of 39] Compiling TorDNSEL.Compat.Exception ( src/TorDNSEL/Compat/Exception.hs, dist/build/tordnsel/tordnsel-tmp/TorDNSEL/Compat/Exception.o ) [dist/build/autogen/cabal_macros.h changed]
[ 4 of 39] Compiling TorDNSEL.System.Timeout ( src/TorDNSEL/System/Timeout.hs, dist/build/tordnsel/tordnsel-tmp/TorDNSEL/System/Timeout.o )
...
I don't see the errors there... did they go to stderr or something?
I was a bit sleepy and messed it up.  Here is the right output:
src/TorDNSEL/System/Timeout.hs:56:47:
    Module `TorDNSEL.Compat.Exception' does not export `throwDynTo'
src/TorDNSEL/System/Timeout.hs:56:59:
    Module `TorDNSEL.Compat.Exception' does not export `dynExceptions'
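
For context, throwDynTo and dynExceptions belong to the old
Dynamic-based exception API; with the extensible exceptions in current
base, the usual replacements are throwTo with a dedicated exception
type and a typed handler.  The sketch below shows that pattern for a
timeout.  It assumes Timeout.hs uses the two functions to deliver a
timeout exception to another thread, which the errors above do not
confirm; TimeoutReached and timeoutSketch are hypothetical names and
not taken from TorDNSEL.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Concurrent (forkIO, killThread, myThreadId, threadDelay)
import Control.Exception (Exception, bracket, handle, throwTo)
import Data.Typeable (Typeable)

-- | Hypothetical exception type standing in for the dynamically
-- typed value that would previously have been sent with throwDynTo.
data TimeoutReached = TimeoutReached
  deriving (Show, Typeable)

instance Exception TimeoutReached

-- | Run an action with a time limit given in microseconds.
-- Returns Nothing if the limit expires first.
timeoutSketch :: Int -> IO a -> IO (Maybe a)
timeoutSketch micros action = do
  caller <- myThreadId
  -- The typed handler replaces matching on dynExceptions.
  handle (\TimeoutReached -> return Nothing) $
    bracket
      -- Timer thread: sleep, then interrupt the calling thread.
      (forkIO (threadDelay micros >> throwTo caller TimeoutReached))
      -- Always stop the timer, whether or not it fired.
      killThread
      (\_ -> fmap Just action)
```

This is only a sketch: nested calls share one exception type, so an
inner handler could catch an outer timeout, and there is a small window
in which the timer can fire just as the action finishes.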
...
Seems reasonable.  It might also be good to have a public git
repository somewhere where you work on the branch, that can store all
of the preliminary repositories.
Can I create a new branch here [8]?  If so, should I also send patches
and comments to this list, or will it only annoy everyone?
We use the main repositories only for released versions.  For
development branches, we use personal repositories.  Most people find
that it's more convenient to use a service like github or gitorious or
something to host their own branches, and then ask for review and
merging when the branches are ready for review and merging.

It's okay to send stuff to the list, but linking to a public branch
and listing the commit IDs usually works as well as sending patches.

best wishes,
-- 
Nick

Nick Mathewson