On Fri, Jun 21, 2013 at 4:10 PM, Nikita Karetnikov nikita@karetnikov.org wrote:
Hello! I'll try to answer some of the easier stuff now, and put off the longer stuff till I have time after the weekend.
Hm. I don't understand how the hGetLineN implementation can work. It looks like it reads N bytes unconditionally, then looks for the EOL marker in them, and returns everything before the EOL marker... but what does it do with everything after the EOL marker? It appears to me that if the EOL marker is not right at the end of the N bytes, those bytes would get lost.
Right, should it work differently? (Again, I haven't inspected 'hGetLine' yet.)
The current version makes it possible to estimate a worst-case scenario because it reads N bytes unconditionally. But it might be better to look for the EOL marker first, then check the number of bytes. What do you think?
The part I was most concerned about was the discarding of the extra bytes. Usually, when I see a function called "fooReadLine," I expect it not to throw data away. That is, if I run "hGetLineN handle (B.pack "\n") 42" three times on a handle for input "foo\nbar\nbaz\n", I would expect to get "foo", then "bar", then "baz".
I don't know whether that's what hGetLine currently does though, but it's how I interpret its documentation.
(As for whether to read up to the EOL first or whether to read up to the byte limit first: both can be problematic when the byte limit is high. But reading up to the EOL first is extra problematic, since it means that the program may keep reading forever if the user never sends an EOL.)
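To make the behaviour I have in mind concrete, here is a rough sketch, not a proposed patch. It assumes a non-empty EOL marker and reads one byte at a time from the handle, stopping at the EOL or at the byte limit, whichever comes first, so nothing after the EOL is consumed. The name hGetLineN and its argument order are taken from your example; the byte-at-a-time loop and the test file contents are my own assumptions, and the repeated append is quadratic, so this is for illustration only.

```haskell
module Main where

import Control.Monad (replicateM)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B
import System.IO

-- | Read at most @n@ bytes from @handle@, stopping as soon as @eol@ has been
-- consumed. Bytes after the EOL marker are never read, so they stay available
-- for the next call. Assumes @eol@ is non-empty.
hGetLineN :: Handle -> B.ByteString -> Int -> IO B.ByteString
hGetLineN handle eol n = go BS.empty
  where
    go acc
      -- EOL found: return the line without the marker.
      | eol `BS.isSuffixOf` acc =
          return (BS.take (BS.length acc - BS.length eol) acc)
      -- Byte limit reached without an EOL: return what we have.
      | BS.length acc >= n = return acc
      | otherwise = do
          c <- BS.hGet handle 1
          if BS.null c
            then return acc               -- end of input
            else go (acc `BS.append` c)

main :: IO ()
main = do
  writeFile "test.txt" "foo\nbar\nbaz\n"
  handle <- openFile "test.txt" ReadMode
  ls <- replicateM 3 (hGetLineN handle (B.pack "\n") 42)
  mapM_ B.putStrLn ls
  hClose handle
```

With the three-line input above, the three calls should yield "foo", "bar", and "baz" in turn. Because the loop stops at the byte limit, a peer that never sends an EOL cannot make it read more than n bytes per call.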
Am I missing something there? Do the extra bytes get put back somehow?
No. Here is an example:
# cat > Main.hs
module Main where

import Data.ByteString
import qualified Data.ByteString.Char8 as B
import System.IO

-- | Read @n@ bytes from @handle@; strip @eol@ (e.g., @'B.pack' "\r\n"@)
-- and everything after it.
hGetLineN :: Handle -> ByteString -> Int -> IO ByteString
hGetLineN handle eol n = do
  hSetBuffering handle LineBuffering
  bStr <- B.hGet handle n
  return $ fst $ B.breakSubstring eol bStr

main = do
  handle <- openFile "test.txt" ReadMode
  hGetLineN handle (B.pack "\n") 42

# echo -e "foo\nbar\nbaz" > test.txt
# runhaskell Main
"foo"
Does it answer your question?
(FYI: a chapter about I/O [5], an introduction to bytestrings [6], and a search engine [7].)
If you apply both patches (with GHC 7.6.3), the following errors will appear:
[ 3 of 39] Compiling TorDNSEL.Compat.Exception ( src/TorDNSEL/Compat/Exception.hs, dist/build/tordnsel/tordnsel-tmp/TorDNSEL/Compat/Exception.o ) [dist/build/autogen/cabal_macros.h changed]
[ 4 of 39] Compiling TorDNSEL.System.Timeout ( src/TorDNSEL/System/Timeout.hs, dist/build/tordnsel/tordnsel-tmp/TorDNSEL/System/Timeout.o )
I don't see the errors there... did they go to stderr or something?
I was a bit sleepy and messed it up. Here is the right output:
src/TorDNSEL/System/Timeout.hs:56:47: Module `TorDNSEL.Compat.Exception' does not export `throwDynTo'
src/TorDNSEL/System/Timeout.hs:56:59: Module `TorDNSEL.Compat.Exception' does not export `dynExceptions'
Seems reasonable. It might also be good to have a public git repository somewhere where you work on the branch, that can store all of the preliminary repositories.
Can I create a new branch here [8]? If so, should I also send patches and comments to this list, or will it only annoy everyone?
We use the main repositories only for released versions. For development branches, we use personal repositories. Most people find that it's more convenient to use a service like github or gitorious or something to host their own branches, and then ask for review and merging when the branches are ready for review and merging.
It's okay to send stuff to the list, but linking to a public branch and listing the commit IDs usually works as well as sending patches.
best wishes,