Haskell design: Struggling with IO - haskell

I am a novice haskell programmer and I am trying to write some Haskell cgi which will read from a MySQL DB and output JSON. I am able to generate the right JSON but am unable to get the data types correctly to be able to output JSON correctly. I also think that I am primarily thinking imperative still. Here is my code. Note that getTopBrands provides json output.
My problem is that I am unable to figure out how to return "[Char]" from getTopBrands and not "IO [Char]". It looks to me I am still thinking imperative. Any pointers, suggestions to fix this would be greatly appreciated. Please let me know if I need to provide the rest of the code.
RODB.hs:
{-# LANGUAGE RecordWildCards, OverloadedStrings, PackageImports #-}
module Main where
import RODB
import ROOutput
import System.Environment
import Database.HDBC
import Network.Socket(withSocketsDo)
import Network.CGI
import Text.XHtml
import qualified "bytestring" Data.ByteString.Lazy.Char8 as LBS
import Data.Aeson
page :: Html
page = body << h1 << str
main = runCGI $ handleErrors cgiMain
cgiMain :: CGI CGIResult
cgiMain =
do out <- getTopBrands 10 1
setHeader "Content-type" "application/json"
output $ renderHtml page out
getTopBrands :: Integer -> Integer -> IO [Char]
getTopBrands limit sorted =
do let temp = 0
dbh <- connect "127.0.0.1" "ReachOutPublicData" "root" "admin" "/tmp/mysql.sock"
if sorted == 1
then do brandlist <- getBrands dbh limit True
json <- convPublicBrandEntrytoJSON brandlist
return $ LBS.unpack json
else do brandlist <- getBrands dbh limit False
json <- convPublicBrandEntrytoJSON brandlist
return $ LBS.unpack json

As Niklas B said, getTopBrands being in IO is right, since it depends on I/O. I guess your problem is that you get a type error from that when you try to use it directly,
cgiMain :: CGI CGIResult
cgiMain =
do out <- getTopBrands 10 1
setHeader "Content-type" "application/json"
output $ renderHtml page out
since all statements in a do-block must belong to the same monad, and the rest of the block is in CGI. But, CGI is a MonadIO, thus you can simply liftIO it into CGI,
cgiMain :: CGI CGIResult
cgiMain =
do out <- liftIO $ getTopBrands 10 1
setHeader "Content-type" "application/json"
output $ renderHtml page out
The next point Niklas raised is also right, the second Integer argument of getTopBrands should really be a Bool. However, even with its current type, the code duplication is entirely unnecessary, the difference between the two branches is just the Bool argument to getBrands, so
getTopBrands :: Integer -> Integer -> IO [Char]
getTopBrands limit sorted =
do let temp = 0
dbh <- connect "127.0.0.1" "ReachOutPublicData" "root" "admin" "/tmp/mysql.sock"
brandlist <- getBrands dbh limit (sorted == 1)
json <- convPublicBrandEntrytoJSON brandlist
return $ LBS.unpack json
just pass it the condition on which you branched.
Niklas' third point
I also don't see why convPublicBrandEntrytoJSON would need to live in IO, but since you didn't provide its definition I cannot suggest an improvement here.
also looks very valid, a conversion would usually be a pure function. If the only reason why it is in IO is the ability to write
json <- convPublicBrandEntrytoJSON brandlist
you should be aware that you can bind results of pure functions in a do-block
let json = convPublicBrandEntrytoJSON brandlist
using let.

Related

heap memory buildup with xml-conduit parseBytes

I'm parsing some rather large XML files with xml-conduit's streaming interface https://hackage.haskell.org/package/xml-conduit-1.8.0/docs/Text-XML-Stream-Parse.html#v:parseBytes but I'm seeing this memory buildup (here on a small test file):
where the top users are:
The actual data shouldn't take up that much heap – if I serialise and re-read, the resident memory use is kilobytes vs the megabytes here.
The minimal example I've managed to reproduce this with:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Monad
import Control.Monad.IO.Class
import Data.Conduit
import Data.Conduit.Binary (sourceFile)
import qualified Data.Conduit.List as CL
import Data.Text (Text)
import Text.XML.Stream.Parse
type Y = [(Text, Text)]
main :: IO ()
main = do
res1 <- runConduitRes $
sourceFile "test.xml"
.| Text.XML.Stream.Parse.parseBytes def
.| parseMain
.| CL.foldM get []
print res1
get :: (MonadIO m, Show a) => [a] -> [a] -> m [a]
get acc !vals = do
liftIO $! print vals -- this oughta force it?
return $! take 1 vals ++ acc
parseMain = void $ tagIgnoreAttrs "Period" parseDetails
parseDetails = many parseParam >>= yield
parseParam = tag' "param" parseParamAttrs $ \idAttr -> do
value <- content
return (idAttr, value)
parseParamAttrs = do
idAttr <- requireAttr "id"
attr "name"
return idAttr
If I change get to just return ["hi"] or something, I don't get the buildup. So it seems the returned texts keep some reference to the larger text they were in (e.g. zero-copy slicing, cf. comment at https://hackage.haskell.org/package/text-0.11.2.0/docs/Data-Text.html#g:18 ), so the rest of the text can't be garbage collected even though we're using only little parts.
Our fix is to use Data.Text.copy on any attributes we want to yield:
someattr <- requireAttr "n"
yield (T.copy someattr)
which lets us parse with nearly constant memory use.
(And we might consider using https://markkarpov.com/post/short-bs-and-text.html#shorttext if we want to save even more memory.)

Haskell Conduit: having a Sink return a value based on the values from upstream

I've been trying to use the Conduit library to do some simple I/O involving files, but I'm having a hard time.
I have a text file containing nothing but a few digits such as 1234. I have a function that reads the file using readFile (no conduits), and returns Maybe Int (Nothing is returned when the file actually doesn't exist). I'm trying to write a version of this function that uses conduits, and I just can't figure it out.
Here is what I have:
import Control.Monad.Trans.Resource
import Data.Conduit
import Data.Functor
import System.Directory
import qualified Data.ByteString.Char8 as B
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.Text as CT
import qualified Data.Text as T
myFile :: FilePath
myFile = "numberFile"
withoutConduit :: IO (Maybe Int)
withoutConduit = do
doesExist <- doesFileExist myFile
if doesExist
then Just . read <$> readFile myFile
else return Nothing
withConduit :: IO (Maybe Int)
withConduit = do
doesExist <- doesFileExist myFile
if doesExist
then runResourceT $ source $$ conduit =$ sink
else return Nothing
where
source :: Source (ResourceT IO) B.ByteString
source = CB.sourceFile myFile
conduit :: Conduit B.ByteString (ResourceT IO) T.Text
conduit = CT.decodeUtf8
sink :: Sink T.Text (ResourceT IO) (Maybe Int)
sink = awaitForever $ \txt -> let num = read . T.unpack $ txt :: Int
in -- I don't know what to do here...
Could someone please help me complete the sink function?
Thanks!
This isn't really a good example for where conduit actually provides a lot of value, at least not the way you're looking at it right now. Specifically, you're trying to use the read function, which requires that the entire value be in memory. Additionally, your current error handling behavior is a bit loose. Essentially, you're just going to get an read: no parse error if there's anything unexpected in the content.
However, there is a way we can play with this in conduit and be meaningful: by parsing the ByteString byte-by-byte ourselves and avoiding the read function. Fortunately, this pattern falls into a standard left fold, which the conduit-combinators package provides a perfect function for (element-wise left fold in a conduit, aka foldlCE):
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Word8
import qualified Data.ByteString as S
sinkInt :: Monad m => Consumer S.ByteString m Int
sinkInt =
foldlCE go 0
where
go total w
| _0 <= w && w <= _9 =
total * 10 + (fromIntegral $ w - _0)
| otherwise = error $ "Invalid byte: " ++ show w
main :: IO ()
main = do
x <- yieldMany ["1234", "5678"] $$ sinkInt
print x
There are plenty of caveats that go along with this: it will simply throw an exception if there are unexpected bytes, and it doesn't handle integer overflow at all (though fixing that is just a matter of replacing Int with Integer). It's important to note that, since the in-memory string representation of a valid 32- or 64-bit int is always going to be tiny, conduit is overkill for this problem, though I hope that this code gives some guidance on how to generally write conduit code.

Replace nested pattern matching with "do" for Monad

I want to simplify this code
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Network.HTTP.Types
import Data.Text
getJSON :: String -> IO (Either String Value)
getJSON url = eitherDecode <$> simpleHttp url
--------------------------------------------------------------------
maybeJson <- getJSON "abc.com"
case maybeJson of
Right jsonValue -> case jsonValue of
(Object jsonObject) ->
case (HashMap.lookup "key123" jsonObject) of
(Just (String val)) -> Data.Text.IO.putStrLn val
_ -> error "Couldn't get the key"
_ -> error "Unexpected JSON"
Left errorMsg -> error $ "Error in parsing: " ++ errorMsg
by using do syntax for Monad
maybeJson <- getJSON "abc.com/123"
let toPrint = do
Right jsonValue <- maybeJson
Object jsonObject <- jsonValue
Just (String val) <- HashMap.lookup "key123" jsonObject
return val
case toPrint of
Just a -> Data.Text.IO.putStrLn a
_ -> error "Unexpected JSON"
And it gave me 3 errors:
src/Main.hs:86:19:
Couldn't match expected type `Value'
with actual type `Either t0 (Either String Value)'
In the pattern: Right jsonValue
In a stmt of a 'do' block: Right jsonValue <- maybeJson
src/Main.hs:88:19:
Couldn't match expected type `Value' with actual type `Maybe Value'
In the pattern: Just (String val)
In a stmt of a 'do' block:
Just (String val) <- HashMap.lookup "key123" jsonObject
src/Main.hs:88:40:
Couldn't match type `Maybe' with `Either String'
Expected type: Either String Value
Actual type: Maybe Value
Even when I replace
Just (String val) <- HashMap.lookup "key123" jsonObject
with
String val <- HashMap.lookup "key123" jsonObject
I'm getting another similar error about Either:
Couldn't match type `Maybe' with `Either String'
Expected type: Either String Value
Actual type: Maybe Value
In the return type of a call of `HashMap.lookup'
In a stmt of a 'do' block:
String val <- HashMap.lookup "key123" jsonObject
How do I fix those errors?
You can't easily simplify that into a single block of do-notation, because each case is matching over a different type. The first is unpacking an either, the second a Value and the third a Maybe. Do notation works by threading everything together through a single type, so it's not directly applicable here.
You could convert all the cases to use the same monad and then write it all out in a do-block. For example, you could have helper functions that do the second and third pattern match and produce an appropriate Either. However, this wouldn't be very different to what you have now!
In fact, if I was going for this approach, I'd just be content to extract the two inner matches into their own where variables and leave it at that. Trying to put the whole thing together into one monad just confuses the issue; it's just not the right abstraction here.
Instead, you can reach for a different sort of abstraction. In particular, consider using the lens library which has prisms for working with nested pattern matches like this. It even supports aeson nateively! Your desired function would look something like this:
decode :: String -> Maybe Value
decode json = json ^? key "key123"
You could also combine this with more specific prisms, like if you're expecting a string value:
decode :: String -> Maybe String
decode json = json ^? key "key123" . _String
This takes care of parsing the json, making sure that it's an object and getting whatever's at the specified key. The only problem is that it doesn't give you a useful error message about why it failed; unfortunately, I'm not good enough with lens to understand how to fix that (if it's possible at all).
So every line in a do expression for a Monad must return a value in that Monadic type. Monad is a typeclass here, not a type by itself. So putting everything in a do Monad is not really a sensible statement.
You can try your code with everything wrapped in a Maybe monad.
Assuming you've fetched your JSON value:
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Network.HTTP
import qualified Data.Map as M
import Control.Applicative
import qualified Data.HashMap.Strict as HM
--------------------------------------------------------------------
main = do
maybeJson <- return $ toJSON (M.fromList [("key123","value")] :: M.Map String String)
ioVal <- return $ do -- The Maybe monad do expression starts here
maybeJson <- Just maybeJson
jsonObject <- case maybeJson of
Object x -> Just x
_ -> Nothing
val <- HM.lookup "key123" jsonObject
return val
putStrLn $ show ioVal
Once we start working in the Maybe monad, every expression must return a Maybe Something value. The way the Maybe monad works is that anything that is a Just something comes out as a pure something that you can work with, but if you get a Nothing, the rest of the code will be skipped and you'll get a Nothing.
This property of falling through is unique to the Maybe monad. Different monads behave differently.
You should read up more about Monads and the IO monad here: http://www.haskell.org/haskellwiki/Introduction_to_IO
You should read more about monads and what they really help you do:
http://learnyouahaskell.com/a-fistful-of-monads
(You should work through the previous chapters and then get to this chapter. Once you do, you'll have a pretty solid understanding of what is happening).
I also think your HTTP request is screwed up. Here's an example of a POST request that you can use.
import qualified Network.HTTP as H
main = do
postData <- return $ H.urlEncodeVars [("someVariable","someValue")]
request <- return $ H.postRequestWithBody "http://www.google.com/recaptcha/api/verify" "application/x-www-form-urlencoded" postData
putStrLn $ show request
-- Make the request
(Right response) <- H.simpleHTTP request
-- Print status code
putStrLn $ show $ H.rspCode response
-- Print response
putSrLn $ show $ H.rspBody response
UPDATED:
Use the following to help you get a JSON value:
import qualified Data.ByteString.Lazy as LC
import qualified Data.ByteString.Char8 as C
import qualified Data.Aeson as DA
responseBody <- return $ H.rspBody response
responseJSON <- return (DA.decode (LC.fromChunks [C.pack responseBody]) :: Maybe DA.Value)
You'll have to make a request object to make a request. There are quite a few helpers. I meant the post request as the most generic case:
http://hackage.haskell.org/package/HTTP-4000.0.5/docs/Network-HTTP.html
Since you are in the IO monad, all the <- are going to assume that you are dealing with IO operations. When you write
do
Right jsonValue <- maybeJson
Object jsonObject <- jsonValue
you are saying that jsonValue must be an IO action just like maybeJson. But this is not the case! jsonValue is but a regular Either value. The silution here would ge to use a do-block let instead of a <-:
do
Right jsonValue <- maybeJson
let Object jsonObject = jsonValue
However, its important to note that in both versions of your code you are using an irrecoverable error to abort your program if the JSON parsing fails. If you want to be able to collect errors, the basic idea would be to convert your values to Either (and then use the monad instance for Either to avoid having lots of nested case expressions)

Happstack get new StdGen for each request?

I have a simple little Happstack application that shows a form with an email field and a random question field to help combat spam. To get a random number I use getStdGen in my main function and pass it along to my function which creates the html. The problem is that the same StdGen is used so my random value is not random unless I restart the application.
Here's what my Main.hs looks like:
{-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}
module Main where
import Happstack.Lite
import qualified Pages.Contact as Contact
import System.Random
main :: IO ()
main = do
gen <- getStdGen
serve Nothing $ pages gen
pages :: StdGen -> ServerPart Response
pages g = msum
[ dir "contact" $ Contact.page g
... Other irrelevant pages
]
And here's the function that uses the StdGen to retrive a random question id:
getRandomQID :: StdGen -> Int
getRandomQID g =
let (rpercent, _) = random g :: (Float, StdGen)
rid = rpercent * questionsAmount
in round rid
questionsAmount :: (Num a) => a
questionsAmount = (fromIntegral . length) questions
What is the most elegant way to solve this problem?
As I wrote this question I found a solution that worked in the Happstack crash course (templates).
In your route which has a return type of ServerPart Response you can use the liftIO monad transformer to be able to perform IO actions. There's this handy function called randomRIO with generates a random Int from an input of a tuple which two Ints as range, like this:
page :: ServerPart Response
page = do
randID <- liftIO $ randomRIO (0, max)
... Code to generate response ...
where max = length questions
randomRIO can be found in System.Random and liftIO can be found in Control.Monad.Trans.

How to get utf8 rss feed?

I'm trying to use the package RSS with UTF8 string with no avail. (i don't want to use HXT which works, i just want to understand where i'm wrong)
In ghci when i put "test" i just get garbage with character such as "é".
If i get the string from reading a file with UTF8.readFile and send it to parseFromString it works, but when i download and use getRespBody it doesn't.
Here is my sample code :
import Network.HTTP (simpleHTTP, getRequest, getResponseBody)
import Data.Maybe (fromJust)
import Text.Feed.Import (parseFeedString)
import Text.RSS.Syntax
import Text.Feed.Types (Feed(..))
import Prelude hiding (putStrLn)
import Data.ByteString.Char8 (putStrLn)
import Data.ByteString.UTF8 (fromString)
siteUrl = "http://radiofrance-podcast.net/podcast09/rss_11549.xml"
type Links = [(String,String,String)]
-------------------------------------------------------------------------------
-- Main function
-------------------------------------------------------------------------------
test = getLinks siteUrl >>= mapM_ (putStrLn.fromString)
-------------------------------------------------------------------------------
-- Retrieve titles
-------------------------------------------------------------------------------
getLinks:: String -> IO [String]
getLinks url = simpleHTTP (getRequest url) >>= getResponseBody >>= parseDoc
parseDoc d = do
let RSSFeed rss = (fromJust . parseFeedString ) d
items = rssItems.rssChannel $ rss
titles = map (fromJust.rssItemTitle) items
return $ titles
Update:
thanks to Roman's answer, i have modified my code. Here are the modification for anyone who may be interested.
import Codec.Binary.UTF8.String (decodeString) -- <-- added
getLinks:: String -> IO [String]
getLinks url = simpleHTTP (getRequest url) >>= getResponseBody >>= parseDoc.decodeString -- <-- modified
The fact that simpleHTTP may return String-based responses is a bit confusing. In reality they are not Unicode strings, but byte strings that contain the HTTP response as is. No automatic decoding is done.
So, you need to decode the http response before passing it to feed parsing functions (e.g. using the encoding or utf8-string package).
You probably want to extract the source encoding information from the Content-Type http header or from the RSS document itself.

Resources