Lookup values inside a Cassava-ingested CSV

I successfully read in a CSV using Cassava (http://hackage.haskell.org/package/cassava) with this:
getData = do
    csvData <- BL.readFile "data.csv"
    case decodeByName csvData of
        Left err -> putStrLn err
        Right (_, v) -> V.forM_ v $ \ p ->
            putStrLn $ col1 p ++ "," ++ col2 p ++ "," ++ (show $ col3 p) ++ "," ++ (show $ col4 p) ++ "," ++ (show $ col5 p) ++ "," ++ col6 p ++ "," ++ (show $ col7 p) ++ "," ++ (show $ col8 p) ++ "," ++ (show $ col9 p) ++ "," ++ (show $ col10 p)
What I actually need to do is use the values in col3 as keys to find values in col10.
Someone suggested that I use Map from Data.Map (https://hackage.haskell.org/package/containers-0.4.0.0/docs/Data-Map.html) for this, but I'm not sure how to approach this.
Everything I have tried so far has not worked. I assume you enter the Map inside the Right case, along the lines of:
Right (_, v) -> Map (V.forM_ v) ???
But I am stuck on how to proceed. Would appreciate any suggestions. Ideally, I would want to modify getData so that it is getData keyToFetch = ... -- and that keyToFetch would be used in the Map.

Yes, it is probably a good idea to use Data.Map to find values in col10 using values in col3 as keys.
Since the question says little about col3, col10 and the exact record type you are using, I will adapt the decodeByName example from the Cassava documentation so that it builds a map object. The example is based on a very simple {name, salary} type of record.
The two branches of the case ... of construct have to return a common type, in our case a Data.Map object instead of an IO () action. Fortunately, the result type of the error function is polymorphic, so it can stand in for a value of the required type.
This would give this sort of code:
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V
import qualified Data.Map as M
import Control.Monad (forM_)

data Person = Person
    { name   :: !String
    , salary :: !Int
    } deriving (Show, Ord, Eq)  -- Map itself only needs Ord on the key type (String here)

instance FromNamedRecord Person where
    parseNamedRecord r = Person <$> r .: "name" <*> r .: "salary"

-- build a map object:
makeMap :: V.Vector Name -> V.Vector Person -> M.Map String Int
makeMap hdr pvec =
    -- with name and salary playing the role of col3 and col10:
    let pls = V.toList pvec  -- get a list
        zls = zip (map name pls) (map salary pls)
    in  M.fromList zls

showRecord :: String -> Int -> String
showRecord n s = n ++ " earns " ++ show s ++ " dollars"

main :: IO ()
main = do
    csvData <- BL.readFile "salaries.csv"
    let ma = case decodeByName csvData of
                 Left errMsg       -> error $ "decodeByName failed: " ++ errMsg
                 Right (hdr, pvec) -> makeMap hdr pvec
    -- print out the Map object:
    putStrLn "Contents of map object:"
    putStrLn $ show ma
    putStrLn ""
    forM_ (M.toList ma) (\(n,s) -> putStrLn $ showRecord n s)
    let sal1 = M.lookup "John Doe" ma
    putStrLn $ "sal1 = " ++ show sal1
Execution:
Contents of map object:
fromList [("Jane Doe",60000),("John Doe",50000)]
Jane Doe earns 60000 dollars
John Doe earns 50000 dollars
sal1 = Just 50000
Note that I have to use plain lists extensively, as for some reason there is no direct route from vectors to maps, something discussed already in this SO question.
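The question also asked for a getData keyToFetch style of function. Here is a minimal sketch of that lookup step, assuming the rows have already been decoded and converted to a list. The Row type, its field types and the names buildIndex and lookupCol10 are hypothetical, since the question does not show the real record:

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for the asker's record; only col3 and col10 matter here.
-- The field types (Int and Double) are assumptions.
data Row = Row
    { col3  :: Int
    , col10 :: Double
    } deriving (Show, Eq)

-- Index the rows by col3. If a key occurs more than once,
-- M.fromList keeps the value from the last row with that key.
buildIndex :: [Row] -> M.Map Int Double
buildIndex rows = M.fromList [ (col3 r, col10 r) | r <- rows ]

-- Fetch the col10 value for a given col3 key, if there is one.
lookupCol10 :: Int -> [Row] -> Maybe Double
lookupCol10 keyToFetch = M.lookup keyToFetch . buildIndex
```

Inside the Right branch you would call something like lookupCol10 keyToFetch (V.toList v). If many keys are looked up, it is cheaper to build the index once and reuse it.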

Related

Concatenating scrapeURL results from multiples scrapings into one list

I am scraping https://books.toscrape.com using Haskell's Scalpel library. Here's my code so far:
import Text.HTML.Scalpel
import Data.List.Split (splitOn)
import Data.List (sortBy)
import Control.Monad (liftM2)

data Entry = Entry { entName  :: String
                   , entPrice :: Float
                   , entRate  :: Int
                   } deriving Eq

instance Show Entry where
    show (Entry n p r) = "Name: " ++ n ++ "\nPrice: " ++ show p ++ "\nRating: " ++ show r ++ "/5\n"

entries :: Maybe [Entry]
entries = Just []

scrapePage :: Int -> IO ()
scrapePage num = do
    items <- scrapeURL ("https://books.toscrape.com/catalogue/page-" ++ show num ++ ".html") allItems
    let sortedItems = items >>= Just . sortBy (\(Entry _ a _) (Entry _ b _) -> compare a b)
                            >>= Just . filter (\(Entry _ _ r) -> r == 5)
    maybe (return ()) (mapM_ print) sortedItems

allItems :: Scraper String [Entry]
allItems = chroots ("article" @: [hasClass "product_pod"]) $ do
    p <- text $ "p" @: [hasClass "price_color"]
    t <- attr "href" $ "a"
    star <- attr "class" $ "p" @: [hasClass "star-rating"]
    let fp = read $ flip (!!) 1 $ splitOn "£" p
    let fStar = drop 12 star
    return $ Entry t fp $ r fStar
  where
    r f = case f of
        "One" -> 1
        "Two" -> 2
        "Three" -> 3
        "Four" -> 4
        "Five" -> 5

main :: IO ()
main = mapM_ scrapePage [1..10]
Basically, allItems scrapes for each book's title, price and rating, does some formatting for price to get a float, and returns it as a type Entry. scrapePage takes a number corresponding to the result page number, scrapes that page to get IO (Maybe [Entry]), formats it - in this case, to filter for 5-star books and order by price - and prints each Entry. main performs scrapePage over pages 1 to 10.
The problem I've run into is that my code scrapes, filters and sorts each page, whereas I want to scrape all the pages then filter and sort.
What worked for two pages (in GHCi) was:
i <- scrapeURL ("https://books.toscrape.com/catalogue/page-1.html") allItems
j <- scrapeURL ("https://books.toscrape.com/catalogue/page-2.html") allItems
liftM2 (++) i j
This returns a list composed of page 1 and 2's results that I could then print, but I don't know how to implement this for all 50 result pages. Help would be appreciated.

Have scrapePage just return the list of entries without any processing (you could also do the filtering at this stage):
import Data.Maybe (maybeToList)
import Data.List (sortOn)
-- no error handling: a failed scrape (Nothing) becomes an empty list
scrapePage :: Int -> IO [Entry]
scrapePage num =
    concat . maybeToList <$> scrapeURL ("https://books.toscrape.com/catalogue/page-" ++ show num ++ ".html") allItems
Then you can process all the pages together:
process :: [Entry] -> [Entry]
process = filter (\e -> entRate e == 5) . sortOn entPrice
main :: IO ()
main = do
    entries <- concat <$> mapM scrapePage [1 .. 10]
    print $ process entries
You can also make the scraping concurrent with mapConcurrently from the async package:
import Control.Concurrent.Async (mapConcurrently)
main :: IO ()
main = do
    entries <- concat <$> mapConcurrently scrapePage [1 .. 20]
    print $ process entries
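To see what concat . maybeToList does without touching the network, here is a small pure sketch. PageResult, flatten and combinePages are hypothetical names, and plain Ints stand in for Entry values:

```haskell
import Data.List (sort)
import Data.Maybe (maybeToList)

-- Hypothetical stand-in for one page's scrape result;
-- Nothing models a scrape that failed.
type PageResult = Maybe [Int]

-- Collapse a Maybe list into a plain list; Nothing becomes [].
flatten :: Maybe [a] -> [a]
flatten = concat . maybeToList

-- Merge every page first, then sort once over the combined list.
combinePages :: [PageResult] -> [Int]
combinePages = sort . concatMap flatten
```

The point of the design is that sorting and filtering happen once, after all pages have been merged, rather than once per page.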

How to reuse IHP classes in a IHP script?

I built a web application with IHP (the Haskell web framework). Now I want to write an IHP script that loads some external data into my database. However, I'm getting a lot of import conflicts from the Prelude, and I'm not getting the types I expected.
#!/usr/bin/env run-script
module Application.Script.DataLoader where

import Application.Script.Prelude hiding (decode, pack, (.:))
import qualified Data.ByteString.Lazy as BL
import Data.Csv
import Data.Text (pack)
import qualified Data.Vector as V
import Control.Monad (mzero)

instance FromNamedRecord Product where
    parseNamedRecord r = Product def <$> r .: "title" <*> r .: "price" <*> r .: "category" <*> pure def

run :: Script
run = do
    csvData <- BL.readFile "~/tender/data/Boiler-en-kookkraan_Boiler.csv"
    case decodeByName csvData of
        Left err -> putStrLn $ pack err
        Right (_, v) -> V.forM_ v $ \ p ->
            putStrLn $ (get #title p) ++ ", " ++ show (get #price p) ++ " euro"
Where my Product schema looks like this:
CREATE TABLE products (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY NOT NULL,
    title TEXT NOT NULL,
    price DOUBLE PRECISION NOT NULL,
    category TEXT NOT NULL
);
Is there a way to use the types I created as data objects, e.g. as the target when reading my CSV?
[Updated output]
Application/Script/DataLoader.hs:12:26: error:
• Couldn't match type ‘MetaBag -> Product' a1’
with ‘Product' (QueryBuilder ProjectProduct)’
Expected type: Parser Product
Actual type: Parser (MetaBag -> Product' a1)
• In the expression:
Product def <$> r .: "title" <*> r .: "price" <*> r .: "category"
<*> pure def
In an equation for ‘parseNamedRecord’:
parseNamedRecord r
= Product def <$> r .: "title" <*> r .: "price" <*> r .: "category"
<*> pure def
In the instance declaration for ‘FromNamedRecord Product’
|
12 | parseNamedRecord r = Product def <$> r .: "title" <*> r .: "price" <*> r .: "category" <*> pure def
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Solved, all credits to the help of @mpscholten]
#!/usr/bin/env run-script
module Application.Script.DataLoader where

import Application.Script.Prelude hiding (decode, pack, (.:))
import qualified Data.ByteString.Lazy as BL
import Data.Csv
import Data.Text (pack)
import qualified Data.Vector as V
import Control.Monad (mzero)

parseProduct :: NamedRecord -> Parser Product
parseProduct r = do
    title <- r .: "title"
    price <- r .: "price"
    category <- r .: "category"
    newRecord @Product
        |> set #title title
        |> set #price price
        |> set #category category
        |> pure

run :: Script
run = do
    csvData <- BL.readFile "data/Boiler-en-kookkraan_Boiler.csv"
    case decodeByNameWithP parseProduct defaultDecodeOptions csvData of
        Left err -> putStrLn $ pack err
        Right (_, v) -> V.forM_ v $ \ p ->
            putStrLn $ (get #title p) ++ ", " ++ show (get #price p) ++ " euro"

Inside the FromNamedRecord instance you are missing two fields: id and meta. The id field is the first field of the record. The meta field is a hidden field used by IHP to keep track of validation errors. It's always the last field of a record.
The easiest way to solve this is to use newRecord and write out the code in a more explicit way:
instance FromNamedRecord Product where
    parseNamedRecord r = do
        title <- r .: "title"
        price <- r .: "price"
        category <- r .: "category"
        newRecord @Product
            |> set #title title
            |> set #price price
            |> set #category category
            |> pure
For the error "Ambiguous occurrence ‘title’" try to use the get function instead of using the normal haskell accessor function:
putStrLn $ (get #title p) ++ ", " ++ show (get #price p) ++ " euro"
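The type error in the question (Actual type: Parser (MetaBag -> Product' a1)) is what applicative style produces when a constructor is given fewer arguments than it has fields. Here is a minimal sketch of the same effect, using Maybe in place of Cassava's Parser and a hypothetical three-field Item record. Neither is IHP's real generated type:

```haskell
-- Hypothetical record with one more field than the two values supplied below.
data Item = Item
    { itemTitle :: String
    , itemPrice :: Double
    , itemMeta  :: [String]
    } deriving (Show, Eq)

-- Supplying only two of the three fields leaves a function inside the functor,
-- just like the Parser (MetaBag -> ...) in the error message.
partial :: Maybe ([String] -> Item)
partial = Item <$> Just "boiler" <*> Just 199.0

-- Supplying the remaining field completes the value.
complete :: Maybe Item
complete = partial <*> pure []
```

Building the record with newRecord and set, as above, avoids having to know how many fields the generated constructor has or in which order they come.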

Can you share the exact errors you get? Your Product type should be in Generated.Types, which in turn is loaded by Application.Script.Prelude.
I think you might have two models that both have a title field. In Haskell, record fields are ordinary functions, so by default the same field name cannot be defined twice in one module.

Haskell Format String Output "(x, y),"

I am trying to output a list of points as "(X,Y) \n", but I can't get it working.
Both X and Y are Floats. I tried the text-format module, but I can't make it work with Char and Float at the same time.
Does anyone have an idea how to make this work?
Best regards
UPDATED:
format_pts_string cs = [
    format ("(" % a % ", " % b % ")")
    | c <- cs]
This code is NOT working. Error -> Print Of Error

From the code in your question, I guess you want to convert a list of Double pairs to a list of Strings. As said in the comments, you may not need the Data.Text.Format package, since the basic function show already converts a pair to a String:
format_pts_string :: [(Double, Double)] -> [String]
format_pts_string cs = map (\c -> show c ++ "\n") cs
or, with a list comprehension:
format_pts_string :: [(Double, Double)] -> [String]
format_pts_string cs = [show c ++ "\n" | c <- cs]
Furthermore, if you need to control the output format, you can build the String yourself with ++. Here is an example:
format_pts_string :: [(Double, Double)] -> [String]
format_pts_string cs = map formatPair cs
  where formatPair (a, b) = "(" ++ show a ++ ", " ++ show b ++ ")" ++ "\n"
If you still want to use the text-format module, enable the OverloadedStrings language extension so that the string literal is converted to the Format type that format expects:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text.Lazy (unpack)
import Data.Text.Format (format)
format_pts_string :: [(Double, Double)] -> [String]
format_pts_string cs = [unpack $ format "({}, {})\n" (c :: (Double, Double)) | c <- cs]
Or, without the OverloadedStrings extension, use fromString from Data.String instead, which is more verbose:
...
import Data.String (fromString)
...
[unpack $ format (fromString "({}, {})\n") (c :: (Double, Double)) | c <- cs]
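If the goal is to control how the numbers themselves are printed, one more option (not mentioned above, so treat it as an alternative rather than the approach of this answer) is Text.Printf from base. A minimal sketch, where the two-decimal format and the names formatPoint and formatPoints are my own assumptions:

```haskell
import Text.Printf (printf)

-- Render one point with two decimal places per coordinate.
formatPoint :: (Double, Double) -> String
formatPoint (x, y) = printf "(%.2f, %.2f)\n" x y

-- Render every point in the list.
formatPoints :: [(Double, Double)] -> [String]
formatPoints = map formatPoint
```

Something like putStr (concat (formatPoints pts)) would then print one point per line.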

Is this what you have in mind?
main = let ps = [ (1.0,1.0)
                , (2.0,2.0)
                , (3.0,3.0)
                , (4.0,4.0)
                , (5.0,5.0)
                , (6.0,6.0)
                ]
       in mapM_ print ps
output :
(1.0,1.0)
(2.0,2.0)
(3.0,3.0)
(4.0,4.0)
(5.0,5.0)
(6.0,6.0)

zip AST with bool list

I have an AST representing a Haskell program and a bit vector (a list of Bool) that records, in order, which patterns carry a strictness annotation. For example, 1000 represents a program with 4 Pats where the first one is a BangPat. Is there any way to turn the annotations in the AST on and off according to the list?
-- EDIT: further clarify what I want editBang to do
Based on user5042's answer:
Simple.hs :=
main = do
    case args of
        [] -> error "blah"
        [!x] -> putStrLn "one"
        (!x : xs) -> putStrLn "many"
And I want editBang "Simple.hs" [True, True, True, True] to produce
main = do
    case args of
        [] -> error "blah"
        [!x] -> putStrLn "one"
        (!(!x : !xs)) -> putStrLn "many"
Given that above are the only 4 places that ! can appear

As a first step, here's how to use transformBi:
import Data.Data
import Control.Monad
import Data.Generics.Uniplate.Data
import Language.Haskell.Exts
import Text.Show.Pretty (ppShow)

changeNames x = transformBi change x
  where change (Ident str) = Ident ("foo_" ++ str)
        change x = x

test2 = do
    content <- readFile "Simple.hs"
    case parseModule content of
        ParseFailed _ e -> error e
        ParseOk a -> do
            let a' = changeNames a
            putStrLn $ ppShow a'
The changeNames function finds every occurrence of Ident s in the source tree and replaces it with Ident ("foo_" ++ s).
There is a monadic version called transformBiM which allows the replacement function to be monadic which would allow you to consume elements from your list of Bools as you found bang patterns.
Here is a complete working example:
import Control.Monad
import Data.Generics.Uniplate.Data
import Language.Haskell.Exts
import Text.Show.Pretty (ppShow)
import Control.Monad.State.Strict

parseHaskell path = do
    content <- readFile path
    let mode = ParseMode path Haskell2010 [EnableExtension BangPatterns] False False Nothing
    case parseModuleWithMode mode content of
        ParseFailed _ e -> error $ path ++ ": " ++ e
        ParseOk a -> return a

changeBangs bools x = runState (transformBiM go x) bools
  where go pp@(PBangPat p) = do
            flags <- get
            case flags of
                (b:bs) -> do
                    put bs
                    -- True drops the bang, False keeps it
                    return (if b then p else pp)
                [] -> return pp  -- no flags left: leave the pattern as it is
        go x = return x

test = do
    a <- parseHaskell "Simple.hs"
    putStrLn $ unlines . map ("before: " ++) . lines $ ppShow a
    let a' = changeBangs [True,False] a
    putStrLn $ unlines . map ("after : " ++) . lines $ ppShow a'
You might also look into using rewriteBiM.
The file Simple.hs:
main = do
    case args of
        [] -> error "blah"
        [!x] -> putStrLn "one"
        (!x : xs) -> putStrLn "many"
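The example above removes bangs that are already present. The edit in the question asks for the opposite direction as well: adding a bang wherever the flag is True. Here is a minimal sketch of that idea on a hypothetical toy pattern type, not the real haskell-src-exts Pat. It threads the flag list by hand instead of using transformBiM, and all names are my own:

```haskell
-- Hypothetical toy pattern type; the real Pat type has many more constructors.
data Pat
    = PVar String
    | PCons Pat Pat
    | PBang Pat
    deriving (Eq, Show)

-- Remove every existing bang so that the flags alone decide the result.
stripBangs :: Pat -> Pat
stripBangs (PBang p)   = stripBangs p
stripBangs (PCons a b) = PCons (stripBangs a) (stripBangs b)
stripBangs p           = p

-- Walk the pattern in pre-order; each node consumes one flag.
-- Returns the unused flags together with the re-annotated pattern.
annotate :: [Bool] -> Pat -> ([Bool], Pat)
annotate [] pat = ([], pat)  -- out of flags: leave the rest untouched
annotate (flag : rest) pat =
    let (rest', pat') = case pat of
            PCons a b ->
                let (r1, a') = annotate rest a
                    (r2, b') = annotate r1 b
                in  (r2, PCons a' b')
            _ -> (rest, pat)
    in  (rest', if flag then PBang pat' else pat')

-- Set the bangs of a pattern according to the flag list.
setBangs :: [Bool] -> Pat -> Pat
setBangs flags = snd . annotate flags . stripBangs
```

With the toy pattern for (!x : xs), the flags [True, True, True] give the shape of (!(!x : !xs)) from the question. On the real AST, the same consume-one-flag-per-pattern step would go inside the function passed to transformBiM.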

Printing the values inside a tuple in Haskell

I have a list of tuples. For example: [("A",100,1),("B",101,2)]. I need to display it in a simple way. For example: "your name is: A", "Your id is: 100".
If anyone can find a solution for this, it would be a great help. Thanks in advance.

The easiest way to do this is to create a function that works for one of the elements in your list. So you'll need something like:
showDetails :: (String, Int, Int) -> String
showDetails (name, uid, _) = "Your name is: " ++ name ++ " Your ID is: " ++ show uid
Then you would apply this function to each element in the list, which means you want to use the mapping function:
map :: (a -> b) -> [a] -> [b]
So, if your list is called xs, you would want something like:
map showDetails xs
This obviously gives you a result of type [String], so you might be interested in the unlines function:
unlines :: [String] -> String
This takes a list of strings and joins them into a single string, appending a newline after each element.
Putting this all together, then, gives you:
main :: IO ()
main = putStrLn . unlines . map showDetails $ [("A",100,1),("B",101,2)]

For a single tuple, just pattern match all the elements, and do something with them. Having a function that does that, you can use map to transform the entire list.
import Data.List (foldl')

-- Show (not Num) is the constraint needed to call show on the fields
show_tuple :: (Show a, Show b) => (String, a, b) -> String
show_tuple (name, id, something) =
    "Your name is: " ++ name ++ "\n" ++
    "Your ID is: " ++ (show id) ++ "\n" ++
    "Your something: " ++ (show something) ++ "\n\n"

-- transforms the list, and then concatenates it into a single string
show_tuple_list :: (Show a, Show b) => [(String, a, b)] -> String
show_tuple_list = (foldl' (++) "") . (map show_tuple)
The output:
*Main Data.List> putStr $ show_tuple_list [("ab", 2, 3), ("cd", 4, 5)]
Your name is: ab
Your ID is: 2
Your something: 3
Your name is: cd
Your ID is: 4
Your something: 5

Quick and dirty solution
f (x,y,z) = "your id is " ++ (show y) ++ ", your name is " ++ (show x) ++ "\n"
main = putStrLn $ foldr (++) "" (map f [("A",100,1),("B",101,2)])
OR (by @maksenov)
main = putStrLn $ concatMap f [("A",100,1),("B",101,2)]

Please try:
get1st (a,_,_) = a
get2nd (_,a,_) = a
get3rd (_,_,a) = a
showTuples [] = ""
showTuples (x:xs) = "Your name is:" ++ show(get1st(x)) ++ " Your ID is: " ++ show(get2nd(x)) ++ "\n" ++ showTuples xs
main = do
    let x = [("A",100,1),("B",101,2)]
    putStrLn . showTuples $ x
