simple rss downloader in haskell - haskell

Yesterday i tried to write a simple rss downloader in Haskell wtih hte help of the Network.HTTP and Feed libraries. I want to download the link from the rss item and name the downloaded file after the title of the item.
Here is my short code:
import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)
getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)
downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
file <- get link
B.writeFile title file
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
I reached a state where i got a list which contains tuples, which contains name and the corresponding link. And i have a downloadUri function which properly downloads the given link to a file which has the name of the rss item title.
I already tried to modify downloadUri to work on (Maybe String,Maybe String) with fmap- ing on get and writeFile but failed with it horribly.
How can i apply my downloadUri function to the result of the getTuples function. I want to implement the following main function
main :: IO ()
main = some magic incantation donwloadUri more incantation getTuples
The character encoding of the result of getItemTitle broken, it puts code points in the places of the accented characters. The feed is utf8 encoded, and i thought that all haskell string manipulation functions are defaulted to utf8. How can i fix this?
Edit:
Thanks for you help, i implemented successfully my main and helper functions. Here comes the code:
downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
item <- get link
B.writeFile title item
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"
main = getTuples >>= downloadAllItems
The character encoding issue has been partially solved, i put decodeString before the feed parsing, so the files get named properly. But if i want to print it out, the issue still happens. Minimal working example:
main = getTuples

It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to do pattern matching on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:
downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri (Just title, Just link) = do
file <- get link
B.writeFile title file
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = error "One of my parameters was Nothing".
Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:
downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
Now the only Maybe you need to work with is the outer one, applied to the array of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:
downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...
As for your encoding issue, my guesses are:
You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
You are reading the information correctly, but it gets messed up when you output it.
In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.

Your main could be something like the shown below. There may be some more concise way to compose these two operations though:
main :: IO ()
main = getTuples >>= process
where
process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst
process Nothing = return ()
download (Just t, Just l) = downloadUri (t,l)
download _ = return ()

Related

Haskell I can not get exception ReadFile with try

I have a function "management" that checks parameters and return a Maybe (String):
If there are not parameter -> return Nothing
If my parameter is equal to "-h" -> Return a string help
My problem arrived when I get a file and check if this file exists.
Couldn't match expected type ‘Maybe String’
with actual type ‘IO (Either e0 a2)’
managagment :: [String] -> Maybe (String)
managagment [] = Nothing
managagment ["-h"] = Just (help)
managagment [file] = try $ readFile file >>= \result -> case result of
Left e -> Nothing
Right content -> Just (content)
There are several problems
Function application ($) is lower precedence than bind (>>=)
You said:
try $ readFile file >>= \res...
Which means
try ( readFile file >>= \res... )
But you wanted:
try ( readFile file ) >>= \res...
IO (Maybe a) and Maybe a are distinct
You have a function using IO (via readFile and try) but many of the cases do not return an IO result (Nothing and Just content).
Solution: Return via return Nothing or pure Nothing to lift a value into the IO monad.
The exception type was ambiguous
The try function can catch any exception type, just look at the signature:
try :: Exception e => IO a -> IO (Either e a)
When you totally ignore the exception you leave the type checker with no information to decide what e should be. In these situations an explicit type signature is useful, such as:
Left (e::SomeException) -> pure Nothing
managagment is partial
managagment ["a","b"] is undefined as is any input list of length over one. Consider a final equational definition of:
managagment _ = managagment ["-h"]
Or more directly:
managagment _ = pure $ Just help
Style and other notes
managagment should probably management
Just (foo) is generally Just foo
help is not a function that returns a String. It is a value of type String.
The example was not complete, missing imports and help.
Fixed Code
Consider instead:
#!/usr/bin/env cabal
{- cabal:
build-depends: base
-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE LambdaCase #-}
import Control.Exception (try, SomeException)
main :: IO ()
main = print =<< management []
help :: String
help = "so helpful"
management :: [String] -> IO (Maybe String)
management [] = pure Nothing
management ["-h"] = pure $ Just help
management [file] =
try (readFile file) >>=
\case
Left (e::SomeException) -> pure Nothing
Right content -> pure $ Just content
management _ = pure $ Just help
And test as such:
% chmod +x x.hs
% ./x.hs

Change a function to support IO String instead of String

I get an IO String via:
import Data.Char
import Network.HTTP
import Text.HTML.TagSoup
openURL :: String -> IO String
openURL x = getResponseBody =<< simpleHTTP (getRequest x)
crawlType :: String -> IO String
crawlType pkm = do
src <- openURL url
return . fromBody $ parseTags src
where
fromBody = unwords . drop 6 . take 7 . words . innerText . dropWhile (~/= "<p>")
url = "http://pokemon.wikia.com/wiki/" ++ pkm
and I want to parse its data via:
getType :: String -> (String, String)
getType pkmType = (dropWhile (== '/') $ fst b, dropWhile (== '/') $ snd b)
where b = break (== '/') pkmType
But like you see, getType doesn't support the IO String yet.
I'm new to IO, so how to make it working?
I also tryed to understand the error when giving the IO String to that function, but it's too complicated for me up to now :/
First, to emphasize: an IO String is not a string. It's an IO action which, when you bind it somewhere within the main action, will yield a result of type String, but you should not think of it as some sort of “variation on the string type”. Rather, it's a special instantiation of the IO a type.
For this reason, you almost certainly do not want to “change a function to support IO String instead of String”. Instead, you want to apply this string-accepting function, as it is, to an outcome of the crawlType action. Such an outcome, as I said, has type String, so you're fine there. For instance,
main :: IO ()
main = do
pkm = "blablabla"
typeString <- crawlType pkm
let typeSpec = getType typeString
print typeSpec -- or whatever you wish to do with it.
You can omit the typeString variable by writing†
typeSpec <- getType <$> crawlType pkm
if you prefer; this corresponds to what in a procedural language might look like
var typeSpec = getType(crawlType(pkm));
Alternatively, you can of course include the parsing right in crawlType:
crawlType' :: String -> IO (String, String)
crawlType' pkm = do
src <- openURL url
return . getType . fromBody $ parseTags src
where
fromBody = unwords . drop 6 . take 7 . words . innerText . dropWhile (~/= "<p>")
url = "http://pokemon.wikia.com/wiki/" ++ pkm
†If you're curious what the <$> operator does: this is not built-in syntax like do/<- notation. Instead, it's just an infix version of fmap, which you may better know in its list-specialised version map. Both list [] and IO are functors, which means you can pull them through ordinary functions, changing only the element/outcome values but not the structure of the IO action / list spine.

Handling exceptions (ExceptT) in chain of actions

I am trying to use an exception to skip parts of the code here. Instead of getting caught by catcheE and resuming normal behavior all following actions in the mapM_ chain get skipped.
I looked at this question and it appears that catchE ~ main and checkMaybe ~ intercept.
I also checked the implementation of mapM_to be sure it does what i want it to, but i don't understand how the Left value can escape dlAsset to affect the behavior of mapM_.
I refactored this from a version where i simply used an empty string as an exception marker for the failed lookup. In that version checkMaybe just returned a Right value immediately and it worked (matching on "" to 'catch')
import Data.HashMap.Strict as HM hiding (map)
import qualified Data.ByteString.Lazy as BS
import qualified Data.ByteString.Char8 as BSC8
import qualified JSONParser as P -- my module
retrieveAssets :: (Text -> Text) -> ExceptT Text IO ()
retrieveAssets withName = withManager $ (lift ((HM.keys . P.assets)
<$> P.raw) ) >>= mapM_ f
where
f = \x -> dlAsset x "0.1246" (withName x)
dlAsset :: Text -> Text -> Text -> ReaderT Manager (ExceptT Text IO) ()
dlAsset name size dest = do
req <- lift $ (P.assetLookup name size <$> P.raw) >>= checkMaybe
name >>= parseUrl . unpack -- lookup of a url
res <- httpLbs req
lift $ (liftIO $ BS.writeFile (unpack dest) $ responseBody res)
`catchE` (\_ -> return ()) -- always a Right value?
where
checkMaybe name a = case a of
Nothing -> ExceptT $ fmap Left $ do
BSC8.appendFile "./resources/images/missingFiles.txt" $
BSC8.pack $ (unpack name) ++ "\n"
putStrLn $ "lookup of " ++ (unpack name) ++ " failed"
return name
Just x -> lift $ pure x
(had to reformat to become somewhat readable here)
edit: i'd like to understand what actually happens here, that would probably help me more than knowing which part of the code is wrong.
The problem is that your call to catchE only covered the very last line of dlAsset. It needs to be moved to the left of the do-notation indentation level to cover all of the do notation.

Haskell does not evaluate block

I am writing simple sitemap.xml crawler. The code is below. My question is why the code in the end of main does not print anything. I suspect it's because haskell's lazyness but don't know how to deal with it here:
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as L
import Text.XML.Light
import Control.Monad.Trans (liftIO)
import Control.Monad
import Data.String.Utils
import Control.Exception
download :: Manager -> Request -> IO (Either HttpException L.ByteString)
download manager req = do
try $
fmap responseBody (httpLbs req manager)
downloadUrl :: Manager -> String -> IO (Either HttpException L.ByteString)
downloadUrl manager url = do
request <- parseUrl url
download manager request
getPages :: Manager -> [String] -> IO [Either HttpException L.ByteString]
getPages manager urls =
sequence $ map (downloadUrl manager) urls
main = withManager $ \ manager -> do
-- I know simpleHttp is bad here
mapSource <- liftIO $ simpleHttp "http://example.com/sitemap.xml"
let elements = (parseXMLDoc mapSource) >>= Just . findElements (mapElement "loc")
Just urls = liftM (map $ (replace "/#!" "?_escaped_fragment_=") . strContent) elements
mapElement name = QName name (Just "http://www.sitemaps.org/schemas/sitemap/0.9") Nothing
return $
getPages manager urls >>= \ pages -> do
print "evaluate me!"
sequence $ map print pages
You're running into the same problem I describe here, at least as far as having incorrect code that typechecks when it should actually give a type error: Why is the type of "Main.main", "IO ()" and not "IO a"?. This is why you should always give main the type signature main :: IO () explicitly.
To fix the problem, you will want to replace return with lift (see http://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-Class.html#v:lift) and replace sequence $ map ... with mapM_. mapM_ f is equivalent to sequence_ . map f.
Substitute your last return with runResourceT (http://hackage.haskell.org/package/resourcet-1.1.1/docs/Control-Monad-Trans-Resource.html#v:runResourceT). As it's type suggests, it would turn ResourceT into IO action.

Function out of scope

I have a doubt of why is this happening. I have been following the "Yesod Web" ebook but with a scaffolded site. When I arrived to a position that I wanted to apply the function of "plural" inside a "messages" file, the compiler returns this error:
Foundation.hs:52:1: Not in scope: `plural'
Where plural is declared in the same hs file as the one that I am calling the "hamlet" one. However, if I move the function declaration before the line #52 in the "Foundation.hs" file, then the error vanishes and it let me compile it effectively. Why does this happen??
module Handler.UserProfile where
import Import
import Data.Maybe (fromMaybe)
import Data.Text (pack, unpack)
viewCountName :: Text
viewCountName = "UserProfileViews"
readInt :: String -> Int
readInt = read
plural :: Int -> String -> String -> String
plural 1 x _ = x
plural _ _ y = y
getUserProfileR :: Handler RepHtml
getUserProfileR = do
viewCount <- lookupSession viewCountName
>>= return . (1 +) . readInt . unpack . fromMaybe "0"
setSession viewCountName (pack $ show viewCount)
maid <- maybeAuth
--msg <- getMessageRender
let user = case maid of
Nothing -> "(Unknown User ID)" --show MsgHello --
Just (Entity _ u) -> userEmail u
defaultLayout $ do
setTitleI MsgUserProfile
$(widgetFile "nhUserProfile")
Take a look at the GHC users manual: http://www.haskell.org/ghc/docs/7.4.2/html/users_guide/template-haskell.html#id684916
The staging restriction in play is described in the second bullet point:
You can only run a function at compile time if it is imported from
another module. That is, you can't define a function in a module, and
call it from within a splice in the same module. (It would make sense
to do so, but it's hard to implement.)

Resources