Haskell. MongoDB driver or Aeson charset problem - haskell

Good day, i have mongodb database filled with some data, i ensured that data stored in correct charset, to fetch data i use following snippet:
{-# LANGUAGE OverloadedStrings #-}
import Network.Wai
import Network.Wai.Handler.Warp (run)
import Data.Enumerator (Iteratee (..))
import Data.Either (either)
import Control.Monad (join)
import Data.Maybe (fromMaybe)
import Network.HTTP.Types (statusOK, status404)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Char8 (unpack)
import Data.ByteString.Lazy.Char8 (pack)
import qualified Data.Text.Lazy as T
import Data.Text (Text(..))
import Control.Monad.IO.Class (liftIO, MonadIO)
import Data.Aeson (encode)
import qualified Data.Map as Map
import qualified Database.MongoDB as DB
application dbpipe req = do
case unpack $ rawPathInfo req of
"/items" -> itemsJSON dbpipe req
_ -> return $ responseLBS status404 [("Content-Type", "text/plain")] "404"
indexPage :: Iteratee B.ByteString IO Response
indexPage = do
page <- liftIO $ processTemplate "templates/index.html" []
return $ responseLBS statusOK [("Content-Type", "text/html; charset=utf-8")] page
processTemplate f attrs = do
page <- L.readFile f
return page
itemsJSON :: DB.Pipe -> Request -> Iteratee B.ByteString IO Response
itemsJSON dbpipe req = do
dbresult <- liftIO $ rundb dbpipe $ DB.find (DB.select [] $ tu "table") >>= DB.rest
let docs = either (const []) id dbresult
-- liftIO $ L.putStrLn $ encode $ show $ map docToMap docs
return $ responseLBS statusOK [("Content-Type", "text/plain; charset=utf-8")]
(encode $ map docToMap docs)
docToMap doc = Map.fromList $ map (\f -> (T.dropAround (== '"') $ T.pack $ show $ DB.label f, T.dropAround (== '"') $ T.pack $ show $ DB.value f)) doc
main = do
pipe <- DB.runIOE $ DB.connect $ DB.host "127.0.0.1"
run 3000 $ application pipe
rundb pipe act = DB.access pipe DB.master database act
tu :: B.ByteString -> UString
tu = DB.u . C8.unpack
Then the result is suprprising, DB.label works well, but DB.value giving me native characters as some escape codes, so the result is look like:
curl http://localhost:3000/items gives:
[{"Марка": "\1058\1080\1087 \1087\1086\1076",
"Model": "BD-W LG BP06LU10 Slim \1058\1080\1087 \1087\1086\1076\1082\1083\1102\1095\1077\1085\1080\1103"},
...
]
This happens in case i trying to print data and also in case i return data encoded as JSON
Any idea how correctly extract values from MongoDB driver ?

The following line confirms that aeson's encoding works properly (using the utf8-string library to read utf8 data off the lazy bytestring back to a haskell string:
> putStrLn $ Data.ByteString.Lazy.UTF8.toString $ encode $ ("\1058\1080\1087 \1087\1086\1076",12)
["Тип под",12]
Looking at your code more closely I see the real problem. You're calling T.pack $ show $ DB.value -- this will render out as literal codepoints, and then pack those into a text object. The fix is to switch from show to something smarter. Look at this (untested)
smartShow :: DB.Value -> Text
smartShow (String s) = Data.Text.Encoding.decodeUtf8 $ Data.CompactString.UTF8.toByteString s
smartShow x = T.pack $ show x
Obviously to handle the recursive cases, etc. you need to be smarter than that, but that's the general notion...
In fact, the "best" thing to do is to write a function of BSON -> JSON directly, rather than go through any intermediate structures at all.

Everything is working as expected -- only your expectations are wrong. =)
What you're seeing there are not raw Strings; they are String's which have been escaped to exist purely in the printable ASCII range by the show function, called by print:
print = putStrLn . show
Never fear: in memory, the string that prints as "\1058" is in fact a single Unicode codepoint long. You can observe this by printing the length of one of the Strings you're interested in and comparing that to the number of Unicode codepoints you expect.

Related

What is the best way to get data from url and parse it on Haskell?

I'm having trouble with parsing data from url.
I have url with "https://" so i think i should use import Network.HTTP.Conduit
But
simpleHttp url
returns L.ByteString
I really don't understand what shoud i do after that
So i have such code to get data
toStrict1 :: L.ByteString -> B.ByteString
toStrict1 = B.concat . L.toChunks
main :: IO ()
main = do
lbs <- simpleHttp url
let page = toStrict1 lbs
and example of parsing
let lastModifiedDateTime = fromFooter $ parseTags doc
putStrLn $ "wiki.haskell.org was last modified on " ++ lastModifiedDateTime
where fromFooter = unwords . drop 6 . words . innerText . take 2 . dropWhile (~/= "<li id=footer-info-lastmod>")
How can i combine this two parts of code?
As you've seen, the simpleHttp function returns a lazy bytestring. There are several ways to deal with this in TagSoup.
First, it turns out that you can parse it directly. The function parseTags has signature:
parseTags :: StringLike str => str -> [Tag str]
meaning that it can parse any type str with a StringLike instance, and if you look at the Text.StringLike module documentation, you'll see that lazy ByteStrings have a StringLike instance.
However, if you go this route, you need to be aware that everything's kind of "trapped" in a ByteString world, so you have to write your code using versions of functions like words and unwords that are bytestring-compatible, and even your putStrLn needs an adapter. A full working example would look like this:
import Network.HTTP.Conduit
import Text.HTML.TagSoup
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as CL
main :: IO ()
main = do
lbs <- simpleHttp "https://wiki.haskell.org"
let lastModifiedDateTime = fromFooter $ parseTags lbs
putStrLn $ "wiki.haskell.org was last modified on "
++ CL.unpack lastModifiedDateTime
where fromFooter = CL.unwords . drop 6 . CL.words
. innerText . take 2 . dropWhile (~/= "<li id=footer-info-lastmod>")
and it works fine:
> main
wiki.haskell.org was last modified on 9 September 2013, at 22:38.
>
The functions from Data.ByteString.Lazy.Char8 basically assume that the bytestring is ASCII-encoded, which is close enough for this example to work.
However, it would be more robust to decode the bytestring based on the proper character encoding to a valid text type. The two main text types in Haskell are the default String type, which is inefficient and slow, but easy to work with, and the Text type, which is highly efficient but a bit more complicated. (Like ByteString, you need to use Text-compatible versions of functions like words and so on.) Both String and Text have StringLike instances, so they both work fine with TagSoup.
If we were going to write production-quality code, we'd actually consult the response headers from the HTTP request and/or check for a <meta> tag in the HTML to determine the real encoding. But, if we just assume the coding is UTF-8 (which it is), the Text version looks like this:
import Network.HTTP.Conduit
import Text.HTML.TagSoup
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TL
import qualified Data.ByteString.Lazy as BL
main :: IO ()
main = do
lbs <- simpleHttp "https://wiki.haskell.org"
let lastModifiedDateTime = fromFooter $ parseTags (TL.decodeUtf8 lbs)
putStrLn $ "wiki.haskell.org was last modified on "
++ TL.unpack lastModifiedDateTime
where fromFooter = TL.unwords . drop 6 . TL.words
. innerText . take 2 . dropWhile (~/= "<li id=footer-info-lastmod>")
and a String version using Data.ByteString.Lazy.UTF8 from the utf8-string package looks like this:
import Network.HTTP.Conduit
import Text.HTML.TagSoup
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.UTF8 as BL
main :: IO ()
main = do
lbs <- simpleHttp "https://wiki.haskell.org"
let lastModifiedDateTime = fromFooter $ parseTags (BL.toString lbs)
putStrLn $ "wiki.haskell.org was last modified on "
++ lastModifiedDateTime
where fromFooter = unwords . drop 6 . words
. innerText . take 2 . dropWhile (~/= "<li id=footer-info-lastmod>")

how do I modify inputText to use inputCheckbox

I am trying to do something similar to this, where for an element in list of strings, I have a checkbox next to it and figure out which checkbox is checked or not. Using examples from the internet, I was able to get an example running
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
import Data.String
import Data.List
import qualified Data.Text as T
import Web.Spock.Safe
import Web.Spock.Digestive
import Text.Blaze (ToMarkup(..))
import Text.Blaze.Html5 hiding (html, param, main)
import qualified Text.Blaze.Html5 as H
import Text.Blaze.Html.Renderer.Utf8 (renderHtml)
import Text.Digestive
import Text.Digestive.Blaze.Html5
import System.Directory
import Control.Monad.IO.Class
import Control.Monad (forM_)
gen :: Html -> [Html] -> Html
gen title elts = H.html $ do
H.head $
H.title title
H.body $
H.ul $ mapM_ H.li elts
data CheckBox = CheckBox { postTitle :: T.Text }
checkboxForm = CheckBox
<$> "title" .: Text.Digestive.text Nothing
renderForm :: View Html -> Html
renderForm v = do
Text.Digestive.Blaze.Html5.form v "POST" $ do
H.p $ do
Text.Digestive.Blaze.Html5.label "title" v "Post title: "
inputText "text" v
inputSubmit "Submit Post"
main :: IO ()
main =
runSpock 8080 $ spockT Prelude.id $ do
get root $ do
listing <- liftIO $ getDirectoryContents "/home/hasenov/mydir"
let filteredListing = filter (\l -> not $ isPrefixOf "." l) listing
(view, result) <- runForm "checkboxForm" checkboxForm
case result of
Nothing -> lazyBytes $ renderHtml (renderForm view)
Just newCheckbox -> lazyBytes $ renderHtml (renderForm view)
-- lazyBytes $ renderHtml (gen "My Blog" (Data.List.map fromString filteredListing))
-- get ("hello" <//> var) $ \name ->
-- text ("Hello " <> name <> "!")
However, in the function renderForm, when I change inputText to something like inputCheckbox "True", I get the error True does not exist. I am not able to find an example where inputCheckbox is used, I was hoping someone would help me adapt filteredString so it would display checkboxes next to it, and I can run the form properly. Also, in previous link I posted, I don't know what the function inputCheckBox, since I could only find inputCheckbox. Perhaps this is an outdated function?
I'm answering my own question since I figured out how to get inputCheckbox instead of inputText. Actually, this example helped alot. It was the only one I could find which uses inputCheckbox. What I needed to do was change
data CheckBox = CheckBox { postTitle :: T.Text }
to
data CheckBox = CheckBox Bool
Then I could just initialized
checkboxForm = CheckBox
<$> "title" .: bool (Just False)
Here is the full source:
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
import Data.String
import Data.List
import qualified Data.Text as T
import Web.Spock.Safe
import Web.Spock.Digestive
import Text.Blaze (ToMarkup(..))
import Text.Blaze.Html5 hiding (html, param, main)
import qualified Text.Blaze.Html5 as H
import Text.Blaze.Html.Renderer.Utf8 (renderHtml)
import Text.Digestive
import Text.Digestive.Blaze.Html5
import System.Directory
import Control.Monad.IO.Class
import Control.Monad (forM_)
gen :: Html -> [Html] -> Html
gen title elts = H.html $ do
H.head $
H.title title
H.body $
H.ul $ mapM_ H.li elts
data CheckBox = CheckBox Bool
checkboxForm = CheckBox
<$> "title" .: bool (Just False)
renderForm :: View Html -> [Html] -> Html
renderForm v strings = do
Text.Digestive.Blaze.Html5.form v "POST" $ do
H.p $ mapM_ (\string -> do
inputCheckbox "title" v
Text.Digestive.Blaze.Html5.label "title" v string
H.br) strings
inputSubmit "Submit Post"
main :: IO ()
main =
runSpock 8080 $ spockT Prelude.id $ do
get root $ do
listing <- liftIO $ getDirectoryContents "/home/ecks/btsync-gambino"
let filteredListing = filter (\l -> not $ isPrefixOf "." l) listing
(view, result) <- runForm "checkboxForm" checkboxForm
case result of
Nothing -> lazyBytes $ renderHtml (renderForm view (Data.List.map fromString filteredListing))
Just newCheckbox -> lazyBytes $ renderHtml (renderForm view (Data.List.map fromString filteredListing))
-- lazyBytes $ renderHtml (gen "My Blog" (Data.List.map fromString filteredListing))
-- get ("hello" <//> var) $ \name ->
-- text ("Hello " <> name <> "!")

Haskell does not evaluate block

I am writing simple sitemap.xml crawler. The code is below. My question is why the code in the end of main does not print anything. I suspect it's because haskell's lazyness but don't know how to deal with it here:
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as L
import Text.XML.Light
import Control.Monad.Trans (liftIO)
import Control.Monad
import Data.String.Utils
import Control.Exception
download :: Manager -> Request -> IO (Either HttpException L.ByteString)
download manager req = do
try $
fmap responseBody (httpLbs req manager)
downloadUrl :: Manager -> String -> IO (Either HttpException L.ByteString)
downloadUrl manager url = do
request <- parseUrl url
download manager request
getPages :: Manager -> [String] -> IO [Either HttpException L.ByteString]
getPages manager urls =
sequence $ map (downloadUrl manager) urls
main = withManager $ \ manager -> do
-- I know simpleHttp is bad here
mapSource <- liftIO $ simpleHttp "http://example.com/sitemap.xml"
let elements = (parseXMLDoc mapSource) >>= Just . findElements (mapElement "loc")
Just urls = liftM (map $ (replace "/#!" "?_escaped_fragment_=") . strContent) elements
mapElement name = QName name (Just "http://www.sitemaps.org/schemas/sitemap/0.9") Nothing
return $
getPages manager urls >>= \ pages -> do
print "evaluate me!"
sequence $ map print pages
You're running into the same problem I describe here, at least as far as having incorrect code that typechecks when it should actually give a type error: Why is the type of "Main.main", "IO ()" and not "IO a"?. This is why you should always give main the type signature main :: IO () explicitly.
To fix the problem, you will want to replace return with lift (see http://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-Class.html#v:lift) and replace sequence $ map ... with mapM_. mapM_ f is equivalent to sequence_ . map f.
Substitute your last return with runResourceT (http://hackage.haskell.org/package/resourcet-1.1.1/docs/Control-Monad-Trans-Resource.html#v:runResourceT). As it's type suggests, it would turn ResourceT into IO action.

Haskell multiple action types inside do block gives error "Couldn't match expected type `ZMQ z a0'"

I am trying to write a simple program in Haskell which listens over Zero MQ socket and publishes it to websocket connection, below is my code
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isPunctuation, isSpace)
import Data.Monoid (mappend)
import Data.Text (Text)
import Control.Exception (fromException)
import Control.Monad (forM_, forever)
import Control.Concurrent (MVar, newMVar, modifyMVar_, readMVar)
import Control.Monad.IO.Class (liftIO)
import Control.Monad
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Network.WebSockets as WS
import System.ZMQ3.Monadic
import Data.ByteString.Char8 (pack, unpack)
import Control.Concurrent (threadDelay)
import Data.Text.Encoding
import Data.ByteString.Internal
main :: IO ()
main = do
liftIO $ putStrLn "starting main..."
WS.runServer "0.0.0.0" 9160 $ application
application :: WS.Request -> WS.WebSockets WS.Hybi00 ()
application rq = do
liftIO $ putStrLn "starting..."
WS.acceptRequest rq
sink <- WS.getSink
WS.getVersion >>= liftIO . putStrLn . ("Client version: " ++)
msg <- WS.receiveData
liftIO $ putStrLn $ show $ (msg:: Text)
WS.sendTextData (msg :: Text)
runZMQ $ do
repSocket<- socket Rep
s<-return $bind repSocket "tcp://*:6555"
msg2 <- receive repSocket
let quote = msg2
--msg2 <- WS.receiveData
--liftIO $ putStrLn $ quote
WS.sendTextData $ ("test"::Text)
But compiler fails at statement WS.sendTextData $ ("test"::Text) saying below error
websocket_server.hs:42:17:
Couldn't match expected type `ZMQ z a0'
with actual type `WS.WebSockets p0 ()'
In a stmt of a 'do' block: WS.sendTextData $ ("test" :: Text)
In the second argument of `($)', namely
`do { repSocket <- socket Rep;
s <- return $ bind repSocket "tcp://*:6555";
msg2 <- receive repSocket;
let quote = msg2;
.... }'
In a stmt of a 'do' block:
runZMQ
$ do { repSocket <- socket Rep;
s <- return $ bind repSocket "tcp://*:6555";
msg2 <- receive repSocket;
let quote = msg2;
.... }
I am not sure how to deal with this issue how can I make do block statements return same value when the values cannot be converted into each other?
A simple liftIO should be enough for that call, but I haven't tried.
The trouble is that both the ZMQ and Websockets libraries define a "top level monad" that is not a monad transformer. So there is no provided way to layer the monads. This is poor design on their parts.
My suggestion would be to A) write your own ZMQ transformer or B) use the nonmonadic interface provided by ZMQ at the top level.

Aeson deriveJSON combined with conduit sinkParser

Continuing my exploration of conduit and aeson, how would I go about using my own data type in stead of Value in this (slightly modified) code snippet from the Yesod book.
{-# LANGUAGE OverloadedStrings, TemplateHaskell #-}
import Network.Wai (Response, responseLBS, Application, requestBody)
import Network.HTTP.Types (status200, status400)
import Network.Wai.Handler.Warp (run)
import Data.Aeson.Parser (json)
import Data.Conduit.Attoparsec (sinkParser)
import Control.Monad.IO.Class (liftIO)
import Data.Aeson (Value(..), encode, object, (.=))
import Control.Exception (SomeException)
import Data.ByteString (ByteString)
import Data.Conduit (ResourceT, ($$))
import Control.Exception.Lifted (handle)
import qualified Data.HashMap.Strict as M
import Data.Aeson.TH (deriveJSON)
-- I ADDED THIS
data JSONRequest = JSONRequest {
command :: ByteString,
params :: M.HashMap ByteString ByteString
}
deriveJSON id ''JSONRequest
-- END OF WHAT I ADDED
main :: IO ()
main = run 3000 app
app :: Application
app req = handle invalidJson $ do
value <- requestBody req $$ sinkParser json
newValue <- liftIO $ dispatch value
return $ responseLBS
status200
[("Content-Type", "application/json")]
$ encode newValue
invalidJson :: SomeException -> ResourceT IO Response
invalidJson ex = return $ responseLBS
status400
[("Content-Type", "application/json")]
$ encode $ object
[ ("message" .= show ex)
]
-- Application-specific logic would go here.
dispatch :: Value -> IO Value
dispatch = return
Basically, I want to change the type of dispatch to JSONRequest -> IO JSONRequest. How do I tell the parser to use my own derived instance of fromJSON?
I tried just adding a type declaration, praying for polymorphic return type on json, but I realised it is strictly for Value.
Just looking at the types, don't you just need to fmap your fromJSON over the result coming from json? With a suitable signature for dispatch we just need:
-- import Data.Aeson
app :: Application
app req = handle invalidJson $ do
result <- requestBody req $$ sinkParser (fmap fromJSON json)
next_result <- liftIO $ dispatch result
return $ responseLBS status200 [("Content-Type", "application/json")]
$ encode next_result
dispatch :: Result JSONRequest -> IO JSONRequest
dispatch (Error str) = undefined
dispatch (Success jsonreq) = return jsonreq
But maybe it's a little clearer written thus:
-- import Data.Aeson
-- import qualified Data.Attoparsec as Atto
toRequest :: Value -> Result JSONRequest
toRequest = fromJSON -- specialized now to your fromJSON
jsonRequestParser :: Atto.Parser (Result JSONRequest)
jsonRequestParser = fmap toRequest json
app :: Application
app req = handle invalidJson $ do
result <- requestBody req $$ sinkParser jsonRequestParser
next_result <- liftIO $ dispatch result
return $ responseLBS status200 [("Content-Type", "application/json")]
$ encode next_result
dispatch :: Result JSONRequest -> IO JSONRequest
dispatch (Error str) = undefined
dispatch (Success jsonreq) = return jsonreq
I left the parser returning a Result JSONRequest so dispatch is handling Error cases too, which might mean you need your exception handling somehow?

Resources