How does httpLBS work in Network.HTTP.Simple module? - haskell

I am confused by the lazy concept.
I am new to Haskell and I met the httpLBS function in the Network.HTTP.Simple module, which says:
Perform an HTTP request and return the body as a lazy ByteString. Note
that the entire value will be read into memory at once (no lazy I/O
will be performed). The advantage of a lazy ByteString here (versus
using httpBS) is--if needed--a better in-memory representation.
What is "the value"? Is it the entire response? If the entire response is in memory, then what is the point of a lazy ByteString here? How does this work? What is the difference between lazy I/O and a lazy ByteString?

Related

Efficient way to combine a lazy ByteString and a lazy Text

I'm writing some code that is rendering an HTML page (via servant, if that's relevant), and for various complicated reasons, I have to construct the HTML by "combining" two segments.
One segment is fetched from an internal HTTP API which returns a Data.ByteString.Lazy
The other segment is rendered using the ede library, which generates a Data.Text.Lazy
What options do I have to combine these two segments efficiently? The two segments can be reasonably large (a few hundred KB each). This servant server is going to see quite some traffic, so any inefficiency (like copying hundreds of KB of memory for every request/response) will quickly add up.
Assuming your endpoint returns a lazy ByteString, use the function encodeUtf8 from Data.Text.Lazy.Encoding to convert your lazy Text into a lazy ByteString, and then return the append of the two lazy ByteStrings.
Internally, lazy ByteStrings are basically lists of strict ByteString chunks. Concatenating them is list concatenation, and doesn't incur new allocations for the bytes themselves.
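A minimal sketch of that approach (the function and argument names here are my own, not from the question):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import Data.Text.Lazy.Encoding (encodeUtf8)

-- Convert the lazy Text segment to a lazy ByteString, then append.
-- Appending lazy ByteStrings only joins their chunk lists; the byte
-- arrays of both inputs are reused, not copied.
combineSegments :: BL.ByteString -> TL.Text -> BL.ByteString
combineSegments htmlFromApi renderedTemplate =
    htmlFromApi <> encodeUtf8 renderedTemplate

main :: IO ()
main = print (combineSegments "<div>" "</div>")
```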
A time and space-efficient implementation of lazy byte vectors using
lists of packed Word8 arrays
Some operations, such as concat, append, reverse and cons, have better
complexity than their Data.ByteString equivalents, due to
optimisations resulting from the list spine structure.
If you had a large number of lazy ByteStrings instead of two, you should take the extra step of using lazyByteString to convert them to Builders, concatenate the Builders, and then get the result lazy ByteString using toLazyByteString. This will avoid the inefficiency of left-associated list concatenation.
Builders denote sequences of bytes. They are Monoids where mempty is
the zero-length sequence and mappend is concatenation, which runs in
O(1).
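The Builder route for many segments can be sketched like this (names are my own):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- Wrap each lazy ByteString in a Builder (O(1) per segment),
-- concatenate the Builders via their Monoid instance, and
-- materialise a single lazy ByteString at the end.
combineMany :: [BL.ByteString] -> BL.ByteString
combineMany = B.toLazyByteString . foldMap B.lazyByteString

main :: IO ()
main = print (combineMany ["a", "b", "c"])
```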

a simple question about a simple http get request of a json string in haskell.. :(

I'm trying to learn haskell, and what better a way than to learn by converting an already existing program I have made over to haskell, since I know how my program works otherwise.
The first step is to make a simple HTTP GET request to a link that provides me a JSON string. I have been digging, and dumpster diving, through as much Haskell documentation as I can, but I am getting the idea that Haskell documentation is obviously... not accessible to non-Haskell programmers.
I need to take a link, let's say https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave,
and make a GET request on it. In other languages, there seemed to be an easy way to get the body of that response as a STRING. In Haskell I'm racking my brain with ByteStrings? and "you can't do this because main is IO" and etc etc.
I just can't make any sense of it and I want to break through this accessibility barrier that Haskell has, because I otherwise love functional programming; I don't wanna program another way!
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8
main :: IO ()
main = do
    httpBS "https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave" >>= B8.putStrLn . getResponseBody
Doing this gets the response and writes it to standard out, but I need to save it as a string, or convert it from a ByteString? to a String, so that I can parse it.
If I sound defeated it's because I very much am.
what better a way than to learn by converting an already existing program I have made over to haskell, since I know how my program works otherwise.
This strategy will encourage you to carry over idioms from the original language. I'm guessing your prior language didn't encourage use of monads and wasn't lazy. Perhaps it wasn't even functional, but the point is this can actually be a detrimental start.
I just can't make any sense of it and I want to breach through this accessibility barrier
Learning to read documentation is hard in any unknown language (for me at least). Understanding each character is actually rather important. For example, a ByteString - an array of bytes, or just Bytes if it were better named - isn't a hard concept but the name implies things to people.
If I sound defeated it's because I very much am.
You're so close! Think of the high level:
Get the data with an HTTP GET
Parse the data from the string (or bytes) into a structure. Python uses a dictionary, for example.
You did 1 already, nice work. Rather than do everything in a single line in point free style (composing a bunch of functions with a bunch of operators), let's name our intermediate values and have one concept per line:
#!/usr/bin/env cabal
{- cabal:
build-depends: base, http-conduit, aeson
-}
{-# LANGUAGE OverloadedStrings #-}
I'm using a shebang so I can just chmod +x file.hs and ./file.hs as I develop.
import Network.HTTP.Simple
import qualified Data.Aeson as Aeson
The most common JSON library in Haskell is Aeson; we'll use it to parse the bytes into JSON, much like Python's json.loads.
main :: IO ()
main = do
  httpRequest <- parseRequest "https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave"
  response <- httpLBS httpRequest
Your start is good. Notice httpBS is one of a family of functions, generically http<SomeTypeOfResult>, and not httpGet like you might have seen in Java or Python. The function learns whether this is a GET (vs POST etc.) and what the headers are from fields in the Request data type. To get a Request we parse the URL string and just use all of parseRequest's defaults (which include HTTP GET).
I did change to getting lazy byte strings (LBS) because I know the Aeson library uses those later on. This is something like an iterator that produces bytestrings (for intuition, not entirely accurate).
Rather than >>= moreFunctions I'm naming the intermediate value response so we can use it and look at each step separately.
  let body = getResponseBody response
Extracting the body from the response is just like what you had, except as a separate expression.
  let obj = Aeson.decode body :: Maybe Aeson.Object
The big part is decoding the bytes to JSON. This is hopefully familiar, since every language under the sun does JSON decoding to some sort of dictionary/map/object. In Haskell you'll find it less common to decode to a map and more common to define a structure that is explicit about what you expect to find in the JSON, and then write a custom decoding routine for that type using the FromJSON class. You don't have to do that; it brings in more concepts than you'll want when just getting started as a beginner.
  print obj
I know this doesn't need explaining.
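As an aside, the FromJSON approach mentioned above looks roughly like this. The "resultcount" field is an assumption about the shape of the AUR response, and the type name is my own:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson as Aeson
import Data.Aeson ((.:))

-- Decode only the fields we care about into an explicit type.
newtype SearchResult = SearchResult { resultCount :: Int }
  deriving Show

instance Aeson.FromJSON SearchResult where
  parseJSON = Aeson.withObject "SearchResult" $ \o ->
    SearchResult <$> o .: "resultcount"

main :: IO ()
main = print (Aeson.decode "{\"resultcount\": 3}" :: Maybe SearchResult)
```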
Alternative
If you saw the documentation you might have seen (or considered searching the page for) JSON. This can save you lots of time!
#!/usr/bin/env cabal
{- cabal:
build-depends: base, http-conduit, aeson
-}
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.Aeson as Aeson
main :: IO ()
main = do
  response <- httpJSON =<< parseRequest "https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave"
  let obj = getResponseBody response :: Maybe Aeson.Object
  print obj

What makes a Bytestring "lazy"?

I am learning Haskell but having some difficulty understanding how exactly lazy ByteStrings work. Hackage says that "Lazy ByteStrings use a lazy list of strict chunks which makes it suitable for I/O streaming tasks". In contrast, a strict ByteString is stored as one large array.
What are these "chunks" in lazy ByteStrings? How does the compiler know how large a chunk should be? Further, I understand that the idea behind a lazy list is that you don't have to store the entire thing, which allows for infinite lists and all of that. But how is this storage implemented? Does each chunk have a pointer to the next chunk?
Many thanks in advance for the help :)
You can find the definition of the lazy ByteString here:
data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
deriving (Typeable)
so Chunk is one data constructor: its first field is a strict (!) strict (S., i.e. Data.ByteString) ByteString, and its second field is the rest, more Chunks or Empty, via the recursive (lazy) ByteString type.
Note that the second field does not have the (!) there, so it can be a GHC thunk (the lazy stuff in Haskell) that will only be forced when you need it (for example by pattern-matching on it).
That means a lazy ByteString is either Empty, or you get a strict (you can think of it as already loaded, if you want) part or chunk of the complete string together with a lazy remaining/rest/tail ByteString.
As for the size: that depends on the code that generates the lazy ByteString; the compiler does not come into it.
You can see this for hGetContents:
hGetContents = hGetContentsN defaultChunkSize
where defaultChunkSize is defined to be 32 * 1024 - 2 * sizeOf (undefined :: Int) - so a bit less than 32kB
And yes, the rest (the second argument to Chunk) can be seen as a pointer to the next Chunk or to Empty (just like with a normal list).
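You can observe the chunk structure directly with fromChunks and toChunks; a small sketch:

```haskell
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
  -- Build a lazy ByteString from two explicit strict chunks.
  let lbs = L.fromChunks [S8.pack "hello, ", S8.pack "world"]
  print (L.toChunks lbs)   -- recovers the two strict chunks, unchanged
  print (L.length lbs)     -- walks the chunk list, summing chunk sizes
```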

How does lazy evaluation interplay with MVars?

Let's say I have multiple threads that are reading from a file and I want to make sure that only a single thread is reading from the file at any point in time.
One way to implement this is to use an mvar :: MVar () and ensure mutual exclusion as follows:
thread = do
  ...
  _ <- takeMVar mvar
  x <- readFile "somefile" -- critical section
  putMVar mvar ()
  ...
  -- do something that evaluates x.
The above should work fine in strict languages, but unless I'm missing something, I might run into problems with this approach in Haskell. In particular, since x is evaluated only after the thread exits the critical section, it seems to me that the file will only be read after the thread has executed putMVar, which defeats the point of using MVars in the first place, as multiple threads may read the file at the same time.
Is the problem that I'm describing real and, if so, how do I get around it?
Yes, it's real. You get around it by avoiding all the base functions that are implemented using unsafeInterleaveIO. I don't have a complete list, but that's at least readFile, getContents, hGetContents. IO actions that don't do lazy IO -- like hGet or hGetLine -- are fine.
If you must use lazy IO, then fully evaluate its results in an IO action inside the critical section, e.g. by combining rnf and evaluate.
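A sketch of that fix, combining rnf (from the deepseq package) with evaluate; the file name and function name here are placeholders:

```haskell
import Control.Concurrent.MVar
import Control.DeepSeq (rnf)
import Control.Exception (evaluate)

-- Read a file while holding the lock, forcing the contents before
-- releasing it, so all of the actual I/O happens inside the
-- critical section rather than later, on demand.
readFileLocked :: MVar () -> FilePath -> IO String
readFileLocked lock path = do
  takeMVar lock
  x <- readFile path
  evaluate (rnf x)   -- fully evaluate x: the whole file is read here
  putMVar lock ()
  return x

main :: IO ()
main = do
  writeFile "somefile" "contents"
  lock <- newMVar ()
  x <- readFileLocked lock "somefile"
  putStrLn x
```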
Some other commentary on related things, but that aren't directly answers to this question:
Laziness and lazy IO are really separate concepts. They happen to share a name because humans are lazy at naming. Most IO actions do not involve lazy IO and do not run into this problem.
There is a related problem about stuffing unevaluated pure computations into your MVar and accidentally evaluating them on a different thread than you were expecting, but if you avoid lazy IO then evaluating on the wrong thread is merely a performance bug rather than an actual semantics bug.
readFile should be named unsafeReadFile because it's unsafe in the same way as unsafeInterleaveIO. If you stay away from functions that have, or should have, the unsafe prefix then you won't have this problem.
Haskell isn't a lazily evaluated language. It's a language in which, as in mathematics, evaluation order doesn't matter (except that you mustn't spend an unbounded amount of time trying to evaluate a function's argument before evaluating the function body). Compilers are free to reorder computations for efficiency reasons, and GHC does, so programs compiled with GHC aren't lazily evaluated as a rule.
readFile (along with getContents and hGetContents) is one of a small number of standard Haskell functions without the unsafe prefix that violate Haskell's value semantics. GHC has to specially disable its optimizations when it encounters such functions because they make program transformations observable that aren't supposed to be observable.
These functions are convenient hacks that can make some toy programs easier to write. You shouldn't use them in threaded code, or, in my opinion, at all. I think they shouldn't even be used in introductory programming courses (which is probably what they were meant for) because they give beginners a totally wrong impression of how evaluation in Haskell is supposed to work.

How do laziness and I/O work together in Haskell?

I'm trying to get a deeper understanding of laziness in Haskell.
I was imagining the following snippet today:
data Image = Image { name :: String, pixels :: String }
image :: String -> IO Image
image path = Image path <$> readFile path
The appeal here is that I could simply create an Image instance and pass it around; if I need the image data it would be read lazily - if not, the time and memory cost of reading the file would be avoided:
main = do
  image <- image "file"
  putStrLn $ show $ length $ pixels image
But is that how it actually works? How is laziness compatible with IO? Will readFile be called regardless of whether I access pixels image or will the runtime leave that thunk unevaluated if I never refer to it?
If the image is indeed read lazily, then isn't it possible I/O actions could occur out of order? For example, what if immediately after calling image I delete the file? Now the putStrLn call will find nothing when it tries to read.
How is laziness compatible with I/O?
Short answer: It isn't.
Long answer: IO actions are strictly sequenced, for pretty much the reasons you're thinking of. Any pure computations done with the results can be lazy, of course; for instance if you read in a file, do some processing, and then print out some of the results, it's likely that any processing not needed by the output won't be evaluated. However, the entire file will be read, even parts you never use. If you want lazy I/O, you have roughly two options:
Roll your own explicit lazy-loading routines and such, like you would in any strict language. Seems annoying, granted, but on the other hand Haskell makes a fine strict, imperative language. If you want to try something new and interesting, try looking at Iteratees.
Cheat like a cheating cheater. Functions such as hGetContents will do lazy, on-demand I/O for you, no questions asked. What's the catch? It (technically) breaks referential transparency. Pure code can indirectly cause side effects, and funny things can happen involving ordering of side effects if your code is really convoluted. hGetContents and friends are implemented using unsafeInterleaveIO, which is... exactly what it says on the tin. It's nowhere near as likely to blow up in your face as using unsafePerformIO, but consider yourself warned.
Lazy I/O breaks Haskell's purity. The results from readFile are indeed produced lazily, on demand. The order in which I/O actions occur is not fixed, so yes, they could occur "out of order". The problem of deleting the file before pulling the pixels is real. In short, lazy I/O is a great convenience, but it's a tool with very sharp edges.
The book Real World Haskell has a lengthy treatment of lazy I/O and goes over some of its pitfalls.
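To make the snippet from the question safe, one sketch is to read the file strictly, so the I/O happens at a known point and later deletion of the file cannot affect pixels:

```haskell
import qualified Data.ByteString.Char8 as B

data Image = Image { name :: String, pixels :: String }

-- Strict readFile: the whole file is read before image returns,
-- while the do-block is executing, not later when pixels is forced.
image :: String -> IO Image
image path = do
  bytes <- B.readFile path
  return (Image path (B.unpack bytes))

main :: IO ()
main = do
  writeFile "file" "some pixels"
  img <- image "file"
  print (length (pixels img))
```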
