Why doesn't print force entire lazy IO value?

I'm using the http-client tutorial to get a response body over a TLS connection. Since I can observe that print is called by withResponse, why doesn't print force the entire response to the output in the following fragment?
withResponse request manager $ \response -> do
  putStrLn $ "The status code was: " ++
    show (statusCode $ responseStatus response)
  body <- responseBody response
  print body
I need to write this instead:
response <- httpLbs request manager
putStrLn $ "The status code was: " ++
  show (statusCode $ responseStatus response)
print $ responseBody response
The body I want to print is a lazy ByteString. I'm still not sure whether I should expect print to print the entire value, given its Show instance:
instance Show ByteString where
  showsPrec p ps r = showsPrec p (unpackChars ps) r

This doesn't have to do with laziness, but with the difference between the Response L.ByteString you get from httpLbs (as in the Simple module) and the Response BodyReader you get from withResponse in the TLS example.
You noticed that a BodyReader is an IO ByteString. In particular, it is an action that can be repeated, each time yielding the next chunk of bytes. It follows the protocol that it never returns an empty bytestring except at the end of the input. (BodyReader might have been called ChunkGetter.) bip below is like what you wrote: after extracting the BodyReader (an IO ByteString) from the Response, it performs it to get the first chunk and prints it. But it doesn't repeat the action to get more, so in this case we just see the first couple of chapters of Genesis. What you need is a loop to exhaust the chunks, as in bop below, which causes the whole King James Bible to spill into the console.
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Client
import Network.HTTP.Client.TLS
import qualified Data.ByteString.Char8 as B
main = bip
-- main = bop
bip = do
  manager <- newManager tlsManagerSettings
  request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
  withResponse request manager $ \response -> do
    putStrLn "The status code was: "
    print (responseStatus response)
    chunk <- responseBody response   -- runs the BodyReader once: first chunk only
    B.putStrLn chunk
bop = do
  manager <- newManager tlsManagerSettings
  request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
  withResponse request manager $ \response -> do
    putStrLn "The status code was: "
    print (responseStatus response)
    let loop = do
          chunk <- responseBody response
          if B.null chunk            -- an empty chunk means end of input
            then return ()
            else B.putStr chunk >> loop
    loop
The loop keeps going back to get more chunks until it gets an empty string, which represents EOF, so in the terminal it prints through to the end of the Apocalypse.
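The chunk protocol can be tried without a network connection. The sketch below is a stand-in, not http-client code: fakeReader and readAll are hypothetical names, and the fake reader is built on an IORef. Running the reader once returns only the first chunk, as in bip; looping until an empty chunk returns everything, as in bop. (For a real BodyReader, http-client's brConsume collects all chunks into a list.)

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for http-client's BodyReader (an IO ByteString):
-- each run yields the next chunk, and an empty chunk means end of input.
fakeReader :: [B.ByteString] -> IO (IO B.ByteString)
fakeReader chunks = do
  -- Empty chunks are dropped up front, since one would be mistaken for EOF.
  ref <- newIORef (filter (not . B.null) chunks)
  pure $ do
    remaining <- readIORef ref
    case remaining of
      []       -> pure B.empty                 -- exhausted: signal EOF
      (c : cs) -> writeIORef ref cs >> pure c

-- What bop's loop does: rerun the reader until it reports EOF.
-- (bip corresponds to running the reader exactly once.)
readAll :: IO B.ByteString -> IO [B.ByteString]
readAll reader = do
  chunk <- reader
  if B.null chunk
    then pure []
    else (chunk :) <$> readAll reader
```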
This behavior is straightforward but slightly technical: you can only work with a BodyReader by hand-written recursion. But the purpose of the http-client library is to make things like http-conduit possible. There, the result of withResponse has the type Response (ConduitM i ByteString m ()). ConduitM i ByteString m () is how conduit types a byte stream; this byte stream would contain the whole file.
In the original form of the http-client/http-conduit material, the Response contained a conduit like this; the BodyReader part was later factored out into http-client so it could be used by different streaming libraries like pipes.
So, to take a simple example, in the corresponding http material for the streaming and streaming-bytestring libraries, withHTTP gives you a response of type Response (ByteString IO ()). ByteString IO () is the type of a stream of bytes arising in IO, as its name suggests; ByteString Identity () would be the equivalent of a lazy bytestring (effectively a pure list of chunks). The ByteString IO () will in this case represent the whole bytestream down to the Apocalypse. So with the imports
import qualified Data.ByteString.Streaming.HTTP as Bytes -- streaming-utils
import qualified Data.ByteString.Streaming.Char8 as Bytes -- streaming-bytestring
the program is identical to a lazy bytestring program:
bap = do
  manager <- newManager tlsManagerSettings
  request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
  Bytes.withHTTP request manager $ \response -> do
    putStrLn "The status code was: "
    print (responseStatus response)
    Bytes.putStrLn $ responseBody response
Indeed it is slightly simpler, since you don't have to "extract the bytes from IO":
lazy_bytes <- responseBody response
Lazy.putStrLn lazy_bytes
but can just write
Bytes.putStrLn $ responseBody response
and "print" them directly. If you want to view just a bit from the middle of the KJV, you can instead do what you would with a lazy bytestring, and end with:
Bytes.putStrLn $ Bytes.take 1000 $ Bytes.drop 50000 $ responseBody response
Then you will see something about Abraham.
The withHTTP for streaming-bytestring just hides the recursive looping that we needed in order to use the BodyReader from http-client directly. It's the same with, e.g., the withHTTP in pipes-http, which represents a stream of bytestring chunks as Producer ByteString IO (), and the same with http-conduit. In all of these cases, once you have your hands on the byte stream, you handle it in the ways typical of the streaming IO framework, without hand-written recursion. All of them use the BodyReader from http-client to do this, and this was the main purpose of the library.
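The looping that withHTTP hides can also be factored out once for a plain BodyReader-style action. A minimal sketch, assuming only base and bytestring: foldChunks and listReader are hypothetical names, not functions from http-client, streaming-bytestring, pipes-http or http-conduit.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)
import qualified Data.ByteString as B

-- Hypothetical combinator: fold over every chunk an IO ByteString reader
-- yields, stopping at the first empty chunk (the end-of-input marker).
-- The hand-written recursion lives here once; callers supply only a step.
foldChunks :: (a -> B.ByteString -> IO a) -> a -> IO B.ByteString -> IO a
foldChunks step z0 reader = go z0
  where
    go acc = do
      chunk <- reader
      if B.null chunk
        then pure acc
        else step acc chunk >>= go

-- Helper for trying it out: a reader that hands out the given chunks in order.
-- It assumes none of the chunks is empty, since an empty one reads as EOF.
listReader :: [B.ByteString] -> IO (IO B.ByteString)
listReader chunks = do
  ref <- newIORef chunks
  pure $ atomicModifyIORef' ref $ \cs -> case cs of
    []        -> ([], B.empty)
    (c : cs') -> (cs', c)
```

With a real response, something like foldChunks (\n c -> B.putStr c >> pure (n + B.length c)) 0 (responseBody response) inside withResponse should print the whole body and return its length, with no explicit loop at the call site.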

Related

When to call runResourceT on streaming-bytestring?

I am a Haskell beginner and still learning about monad transformers.
I am trying to use the streaming-bytestring library to read a binary file, process chunks of bytes, and print the result as each chunk is processed. I believe this is the popular streaming library that provides an alternative to lazy bytestrings. It appears the authors copy-pasted the lazy bytestring documentation and added some arbitrary examples.
The examples mention runResourceT without going into any discussion of what it is or how to use it. It appears that I should use runResourceT on any streaming-bytestring function that performs an action. That's fine, but what if I'm reading an infinite stream that processes chunks and prints them? Should I call runResourceT every time I want to process a chunk?
My code is something like this:
import qualified Data.ByteString.Streaming as BSS
import System.TimeIt
main = timeIt $ processByteChunks $ BSS.drop 100 $ BSS.readFile "filename"
and I'm unsure of how to organize processByteChunks as a recursive function that iterates through the binary file.
If I call runResourceT only once, it would read the infinite file BEFORE printing, right? That seems bad.
main = timeIt $ runResourceT $ processByteChunks $ BSS.drop 100 $ BSS.readFile "filename"

The ResourceT monad just cleans up resources in a timely fashion when you're finished with them. In this case, it will ensure the file handle opened by BSS.readFile is closed when the stream is consumed. (Unless the stream truly is infinite, in which case I guess it won't.)
In your application, you only want to call it once, since you don't want the file closed until you've read all the chunks. Don't worry -- it has nothing to do with the timing of output or anything like that.
Here's an example with a recursive processByteChunks that should work. It will read lazily and generate output as chunks are lazily read:
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString.Streaming as BSS
import qualified Data.ByteString as BS
import System.TimeIt
main :: IO ()
main = timeIt $ runResourceT $
  processByteChunks $ BSS.drop 100 $ BSS.readFile "filename"
processByteChunks :: MonadIO m => BSS.ByteString m () -> m ()
processByteChunks = go 0 0
  where go len nulls stream = do
          m <- BSS.unconsChunk stream
          case m of
            Just (bs, stream') -> do
              let len' = len + BS.length bs
                  nulls' = nulls + BS.length (BS.filter (==0) bs)
              liftIO $ print $ "cumulative length=" ++ show len'
                ++ ", nulls=" ++ show nulls'
              go len' nulls' stream'
            Nothing -> return ()
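The counting itself has nothing to do with ResourceT. As a minimal sketch, the same bookkeeping can be written as a pure function over a list of strict chunks, which is easy to test apart from the streaming code. runningStats is a hypothetical name, and it swaps in B.count for the filter-then-length used in go.

```haskell
import Data.List (scanl')
import qualified Data.ByteString as B

-- Hypothetical pure counterpart of the go loop: the running
-- (cumulative length, cumulative NUL count) after each chunk.
-- B.count 0 counts zero bytes without building a filtered copy.
runningStats :: [B.ByteString] -> [(Int, Int)]
runningStats = drop 1 . scanl' step (0, 0)   -- drop the (0, 0) seed
  where
    step (len, nulls) bs =
      let len'   = len + B.length bs
          nulls' = nulls + B.count 0 bs
      -- seq keeps both counters evaluated as the list is consumed
      in len' `seq` nulls' `seq` (len', nulls')
```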

Log all requests and responses for http-conduit

I have written this ManagerSettings to log all requests and responses for my http-conduit application. (By the way, I am importing ClassyPrelude).
tracingManagerSettings :: ManagerSettings
tracingManagerSettings =
  tlsManagerSettings { managerModifyRequest = \req -> do
                         putStr "TRACE: "
                         print req
                         putStrLn ""
                         pure req
                     , managerModifyResponse = \r -> do
                         responseChunks <- brConsume $ responseBody r
                         let fullResponse = mconcat responseChunks
                         putStr "TRACE: RESPONSE: "
                         putStrLn $ decodeUtf8 fullResponse
                         pure $ r { responseBody = pure fullResponse }
                     }
However, it's not working - when I use it, the application is hanging and trying to consume all the RAM in the machine after printing the first request and first response, which suggests some kind of infinite loop.
Also, the request is printed twice.
I made a previous attempt that was similar, but didn't modify r. That failed because after I had already read the response completely, there was no more response data to read.
If I replace this with tlsManagerSettings, http-conduit works again.
My application is using libstackexchange, which I have modified to allow the ManagerSettings to be customised. I am using http-conduit version 2.2.4.
How can I diagnose the issue? How can I fix it?

managerModifyResponse doesn't work with a Response ByteString; it works with a Response BodyReader, where type BodyReader = IO ByteString, along with the contract that each run returns the next chunk of the body and an empty ByteString only once there is no more input to read.
The problem you're running into is that pure fullResponse never returns an empty ByteString (unless the body was empty, in which case it always does), so whatever consumes the body keeps reading forever. You need to provide a somewhat more complex IO action to capture the intended behavior. Maybe something along these lines (untested):
import Data.IORef
-- The first run of the returned action yields x; every later run yields mempty.
returnOnce :: Monoid a => a -> IO (IO a)
returnOnce x = do
  ref <- newIORef x
  pure $ readIORef ref <* writeIORef ref mempty
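To see why the application hangs, here is a minimal sketch (not tested against http-client itself) that pairs returnOnce with a hypothetical consumer, readUpTo, which follows the BodyReader contract of reading until it gets mempty. The bound on readUpTo is only there so the broken case terminates in a test. In managerModifyResponse, the fix would then be to replace pure $ r { responseBody = pure fullResponse } with something like body <- returnOnce fullResponse followed by pure r { responseBody = body }.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- returnOnce as above: the first run yields x, every later run yields mempty.
returnOnce :: Monoid a => a -> IO (IO a)
returnOnce x = do
  ref <- newIORef x
  pure $ readIORef ref <* writeIORef ref mempty

-- Hypothetical consumer that follows the BodyReader contract: keep reading
-- until the reader returns mempty. The bound n stops it from looping forever
-- on a reader (like pure fullResponse) that never signals end of input.
readUpTo :: (Eq a, Monoid a) => Int -> IO a -> IO [a]
readUpTo n reader
  | n <= 0    = pure []
  | otherwise = do
      x <- reader
      if x == mempty
        then pure []
        else (x :) <$> readUpTo (n - 1) reader
```

With pure "body" the consumer sees the same chunk on every read, which is the unbounded accumulation the question describes; with returnOnce "body" it sees the body once and then end of input.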
As for how to debug this: I'm not sure about generic methods. I was just suspicious that you probably needed a solution along these lines, and the docs for BodyReader confirmed it.

How to track progress through a streaming ByteString?

I'm using streaming-utils to stream an HTTP response body. I want to track the progress similar to how bytestring-progress allows with lazy ByteStrings. I suspect something like toChunks would be necessary, then reducing some cumulative bytes read and returning the original stream unmodified. But I cannot figure it out, and the streaming documentation is very unhelpful, mostly full of grandiose comparisons to alternative libraries.
Here's some code with my best effort so far. It doesn't include the counting yet, and just tries to print the size of chunks as they stream past (and doesn't compile).
download :: ByteString -> FilePath -> IO ()
download i file = do
  req <- parseRequest . C.unpack $ i
  m <- newHttpClientManager
  runResourceT $ do
    resp <- http req m
    lift . traceIO $ "downloading " <> file
    let body = SBS.fromChunks $ mapsM step $ SBS.toChunks $ responseBody resp
    SBS.writeFile file body
step bs = do
  traceIO $ "got " <> show (C.length bs) <> " bytes"
  return bs

What we want is to traverse the Stream (Of ByteString) IO () in two ways:
One that accumulates the incoming lengths of the ByteStrings and prints updates to console.
One that writes the stream to a file.
We can do that with the help of the copy function, which has type:
copy :: Monad m => Stream (Of a) m r -> Stream (Of a) (Stream (Of a) m) r
copy takes a stream and duplicates it into two different monadic layers, where each element of the original stream is emitted by both layers of the new dissociated stream.
(Notice that we are changing the base monad, not the functor. What changing the functor to another Stream does is to delimit groups in a single stream, and we aren't interested in that here.)
The following function takes a stream, copies it, accumulates the length of incoming strings with S.scan, prints them, and returns another stream that you can still work with, for example writing it to a file:
{-# LANGUAGE OverloadedStrings #-}
import Streaming
import qualified Streaming.Prelude as S
import qualified Data.ByteString as B
track :: Stream (Of B.ByteString) IO r -> Stream (Of B.ByteString) IO r
track stream =
    S.mapM_ (liftIO . print) -- brings us back to the base monad, here another stream
  . S.scan (\s b -> s + B.length b) (0::Int) id
  $ S.copy stream
This will print the ByteStrings along with the accumulated lengths:
main :: IO ()
main = S.mapM_ B.putStr . track $ S.each ["aa","bb","c"]

Why does Haskell's main function require IO operations? [duplicate]

I wonder how I/O was done in Haskell in the days before the IO monad was invented. Does anyone know an example?
Edit: Can I/O be done without the IO Monad in modern Haskell? I'd prefer an example that works with modern GHC.

Before the IO monad was introduced, main was a function of type [Response] -> [Request]. A Request would represent an I/O action like writing to a channel or a file, reading input, or reading environment variables, etc. A Response would be the result of such an action. For example, if you performed a ReadChan or ReadFile request, the corresponding Response would be Str str, where str would be a String containing the read input. When performing an AppendChan, AppendFile or WriteFile request, the response would simply be Success. (Assuming, in all cases, that the given action was actually successful, of course.)
So a Haskell program would work by building up a list of Request values and reading the corresponding responses from the list given to main. For example a program to read a number from the user might look like this (leaving out any error handling for simplicity's sake):
main :: [Response] -> [Request]
main responses =
  [
    AppendChan "stdout" "Please enter a Number\n",
    ReadChan "stdin",
    AppendChan "stdout" . show $ enteredNumber * 2
  ]
  where (Str input) = responses !! 1
        firstLine = head . lines $ input
        enteredNumber = read firstLine
As Stephen Tetley already pointed out in a comment, a detailed specification of this model is given in chapter 7 of the 1.2 Haskell Report.
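Because main was an ordinary function, the knot between requests and responses can even be exercised with no I/O at all. The sketch below is a hypothetical pure interpreter, not the Haskell 1.2 runtime: simulate and doubleIt are illustrative names, only the "stdin" and "stdout" channels are supported, and a ReadChan of "stdin" is assumed to return the entire input string.

```haskell
-- Hypothetical pure interpreter for the pre-IO-monad model.
data Request  = ReadChan String | AppendChan String String
data Response = Success | Str String

-- Feed the program its own responses and collect what it writes to "stdout".
simulate :: String -> ([Response] -> [Request]) -> String
simulate input program = concat [msg | AppendChan "stdout" msg <- requests]
  where
    -- The knot: requests depend lazily on responses, and each response is
    -- computed from the request at the same position.
    requests  = program responses
    responses = map respond requests
    respond (ReadChan "stdin")      = Str input
    respond (AppendChan "stdout" _) = Success
    respond _                       = error "simulate: unsupported channel"

-- The number-doubling program from above, with a slightly safer parse.
doubleIt :: [Response] -> [Request]
doubleIt responses =
  [ AppendChan "stdout" "Please enter a Number\n"
  , ReadChan "stdin"
  , AppendChan "stdout" . show $ enteredNumber * 2
  ]
  where
    Str input     = responses !! 1
    enteredNumber = read (takeWhile (/= '\n') input) :: Integer
```

This works only because the third request's message is not demanded until after the second response exists; forcing it any earlier would make the knot loop.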
Can I/O be done without the IO Monad in modern Haskell?
No. Haskell no longer supports the Response/Request way of doing IO directly, and the type of main is now IO (), so you can't write a Haskell program that doesn't involve IO, and even if you could, you'd still have no alternative way of doing any I/O.
What you can do, however, is to write a function that takes an old-style main function and turns it into an IO action. You could then write everything using the old style and then only use IO in main where you'd simply invoke the conversion function on your real main function. Doing so would almost certainly be more cumbersome than using the IO monad (and would confuse the hell out of any modern Haskeller reading your code), so I definitely would not recommend it. However it is possible. Such a conversion function could look like this:
import System.IO.Unsafe
-- Since the Request and Response types no longer exist, we have to redefine
-- them here ourselves. To support more I/O operations, we'd need to expand
-- these types
data Request =
    ReadChan String
  | AppendChan String String
data Response =
    Success
  | Str String
  deriving Show
-- Execute a request using the IO monad and return the corresponding Response.
executeRequest :: Request -> IO Response
executeRequest (AppendChan "stdout" message) = do
  putStr message
  return Success
executeRequest (AppendChan chan _) =
  error ("Output channel " ++ chan ++ " not supported")
executeRequest (ReadChan "stdin") = do
  input <- getContents
  return $ Str input
executeRequest (ReadChan chan) =
  error ("Input channel " ++ chan ++ " not supported")
-- Take an old style main function and turn it into an IO action
executeOldStyleMain :: ([Response] -> [Request]) -> IO ()
executeOldStyleMain oldStyleMain = do
  -- I'm really sorry for this.
  -- I don't think it is possible to write this function without unsafePerformIO
  let responses = map (unsafePerformIO . executeRequest) . oldStyleMain $ responses
  -- Make sure that all responses are evaluated (so that the I/O actually takes
  -- place) and then return ()
  foldr seq (return ()) responses
You could then use this function like this:
-- In an old-style Haskell application to double a number, this would be the
-- main function
doubleUserInput :: [Response] -> [Request]
doubleUserInput responses =
  [
    AppendChan "stdout" "Please enter a Number\n",
    ReadChan "stdin",
    AppendChan "stdout" . show $ enteredNumber * 2
  ]
  where (Str input) = responses !! 1
        firstLine = head . lines $ input
        enteredNumber = read firstLine
main :: IO ()
main = executeOldStyleMain doubleUserInput

I'd prefer an example that works with modern GHC.
For GHC 8.6.5:
import Control.Concurrent.Chan(newChan, getChanContents, writeChan)
import Control.Monad((<=<))
type Dialogue = [Response] -> [Request]
data Request = Getq | Putq Char
data Response = Getp Char | Putp
runDialogue :: Dialogue -> IO ()
runDialogue d =
  do ch <- newChan
     l <- getChanContents ch
     mapM_ (writeChan ch <=< respond) (d l)
respond :: Request -> IO Response
respond Getq = fmap Getp getChar
respond (Putq c) = putChar c >> return Putp
where the type declarations are from page 14 of How to Declare an Imperative by Philip Wadler. Test programs are left as an exercise for curious readers :-)
If anyone is wondering:
-- from ghc-8.6.5/libraries/base/Control/Concurrent/Chan.hs, lines 132-139
getChanContents :: Chan a -> IO [a]
getChanContents ch
  = unsafeInterleaveIO (do
      x <- readChan ch
      xs <- getChanContents ch
      return (x:xs)
    )
Yes, unsafeInterleaveIO does make an appearance.

@sepp2k already clarified how this works, but I wanted to add a few words.
"I'm really sorry for this. I don't think it is possible to write this function without unsafePerformIO"
Of course you can; you should almost never use unsafePerformIO.
http://chrisdone.com/posts/haskellers
I'm using a slightly different Request type whose constructors do not take a channel (stdin/stdout, as in @sepp2k's code). Here is my solution:
(Note: getFirstReq doesn't work on an empty list; you would have to add a case for that, but it should be trivial.)
data Request = Readline
             | PutStrLn String
data Response = Success
              | Str String
type Dialog = [Response] -> [Request]
execRequest :: Request -> IO Response
execRequest Readline = getLine >>= \s -> return (Str s)
execRequest (PutStrLn s) = putStrLn s >> return Success
dialogToIOMonad :: Dialog -> IO ()
dialogToIOMonad dialog =
  let getFirstReq :: Dialog -> Request
      getFirstReq dialog = let (req:_) = dialog [] in req
      getTailReqs :: Dialog -> Response -> Dialog
      getTailReqs dialog resp =
        \resps -> let (_:reqs) = dialog (resp:resps) in reqs
  in do
    let req = getFirstReq dialog
    resp <- execRequest req
    dialogToIOMonad (getTailReqs dialog resp)
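The note about getFirstReq leaves the empty-list case open. One way to close it is sketched below, under the assumption that whether request n exists never depends on its own response. runDialog, scripted and doubler are hypothetical names; the executor is a parameter so the dialog can be checked with the pair (writer-like) monad instead of real I/O, while runDialog execRequest doubler should behave like dialogToIOMonad but return once the requests run out.

```haskell
data Request  = Readline | PutStrLn String
data Response = Success | Str String
type Dialog   = [Response] -> [Request]

-- Like dialogToIOMonad, but generic in the executor and stopping cleanly
-- when the dialog has no requests left instead of failing on [].
runDialog :: Monad m => (Request -> m Response) -> Dialog -> m ()
runDialog exec dialog =
  case dialog [] of
    []        -> pure ()
    (req : _) -> do
      resp <- exec req
      -- Feed this response back in front and drop the request just handled.
      runDialog exec (\resps -> drop 1 (dialog (resp : resps)))

-- The IO executor from the answer.
execRequest :: Request -> IO Response
execRequest Readline     = Str <$> getLine
execRequest (PutStrLn s) = putStrLn s >> pure Success

-- A pure executor in the ((,) [String]) monad: it records output and
-- answers every Readline with a fixed line.
scripted :: String -> Request -> ([String], Response)
scripted line Readline     = ([], Str line)
scripted _    (PutStrLn s) = ([s], Success)

-- A small example dialog that doubles the number it reads.
doubler :: Dialog
doubler resps =
  [ PutStrLn "Please enter a number"
  , Readline
  , PutStrLn (case resps !! 1 of
                Str s -> show (2 * read s :: Integer)
                _     -> "unexpected response")
  ]
```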

How make chunking work with amazonka, conduit and lazy bytestring

I wrote the code below to simulate an upload to S3 from a lazy ByteString (which will be received over a network socket; here, we simulate it by reading from a file of about 100MB). The problem is that the code seems to force the read of the entire file into memory instead of chunking it (cbytes). I'd appreciate pointers on why chunking is not working:
import Control.Lens
import Network.AWS
import Network.AWS.S3
import Network.AWS.Data.Body
import System.IO
import Data.Conduit (($$+-))
import Data.Conduit.Binary (sinkLbs,sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import Network.HTTP.Conduit (responseBody,RequestBody(..),newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS
example :: IO PutObjectResponse
example = do
  -- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
  -- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
  -- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
  e <- newEnv NorthVirginia Discover
  -- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
  l <- newLogger Debug stdout
  -- The payload for the S3 object is retrieved from a file that simulates lazy bytestring received over network
  inb <- LBS.readFile "out"
  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)
  -- We now run the AWS computation with the overriden logger, performing the PutObject request:
  runResourceT . runAWS (e & envLogger .~ l) $
    send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")
main = example >> return ()
Running the executable with the +RTS -s option shows that the entire thing is read into memory (~113MB maximum residency; I did see ~87MB once). On the other hand, if I use chunkedFile, it is chunked correctly (~10MB maximum residency).

It's clear that this bit
inb <- LBS.readFile "out"
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)
should be rewritten as
-- assuming: import qualified Data.Conduit.Binary as C (conduit-extra)
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
As you wrote it, the purpose of conduits is defeated. The entire file would need to be accumulated by LBS.readFile, but then broken apart chunk by chunk when fed to sourceLbs. (If lazy IO is working right, this might not happen.) sourceFile reads the file incrementally, chunk by chunk. It may be that, e.g., toBody accumulates the whole file, in which case the point of conduits is defeated at a different point. Glancing at the source for send and so on, I can't see anything that would do this, though.
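The property this answer relies on is that sourceFile keeps only one chunk in memory at a time. Without conduit or amazonka, the same idea looks roughly like the sketch below; foldFileChunks is a hypothetical helper built on B.hGetSome from the bytestring package, not an amazonka or conduit API, and the chunk size is whatever the caller passes.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: fold over a file in chunks of at most `size` bytes.
-- Only the current chunk is held in memory, which is the property
-- sourceFile/chunkedFile provide and LBS.readFile + sourceLbs may not.
foldFileChunks :: Int -> (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldFileChunks size step z0 path = withBinaryFile path ReadMode (go z0)
  where
    go acc h = do
      chunk <- B.hGetSome h size
      if B.null chunk
        then pure acc                          -- end of file
        else let acc' = step acc chunk
             in acc' `seq` go acc' h           -- keep the accumulator evaluated
```

For example, foldFileChunks (1024*128) (\n c -> n + B.length c) 0 "out" should total the question's ~100MB file while holding at most one 128KiB chunk at a time.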

I am not sure, but I think the culprit is LBS.readFile. Its documentation says:
readFile :: FilePath -> IO ByteString
Read an entire file lazily into a ByteString.
The Handle will be held open until EOF is encountered.
chunkedFile works the conduit way; alternatively, you could use
sourceFile :: MonadResource m => FilePath -> Producer m ByteString
from conduit-extra (Data.Conduit.Binary) instead of LBS.readFile, but I am not an expert.
