Dealing with tabular data in Haskell

This is an excerpt of a file.csv file with some tabular data
John,23,Paris
Alban,28,London
Klaus,27,Berlin
Hans,29,Stockholm
Julian,25,Paris
Jonathan,26,Lyon
Albert,27,London
The column headers for this file would be
firstName, age, city
This file is loaded in ghci like this
𝛌> :m + Data.List Data.Function Data.List.Split
𝛌> contents <- readFile "file.csv"
𝛌> let t = map (splitOn ",") $ lines contents
𝛌> mapM print $ take 3 t
["John","23","Paris"]
["Alban","28","London"]
["Klaus","27","Berlin"]
[(),(),()]
Now, if I want to add a birthYear column to those 3 columns, I can do
𝛌> let getYear str = show $ 2016 - read str
𝛌> let withYear = map (\(x:xs) -> x : xs ++ [getYear (head xs)]) t
𝛌> mapM print $ take 3 withYear
["John","23","Paris","1993"]
["Alban","28","London","1988"]
["Klaus","27","Berlin","1989"]
[(),(),()]
This works well but what bothers me is that the getYear function has type String -> String and as such, type checking is pretty much useless here.
I could easily convert t into a list of tuples like ("John", 23, "Paris") but what about if I have not 3, but 300 features (which is not that uncommon in machine learning problems)?
What would be the best way to deal with different column types? Using tuples? Using maps?
In case of a big number of columns, is there a way to make Haskell infer the column's types? For instance, it would detect that column 2 in the above example is Int, and the others are strings?
Concerning column headers, would there be a way that one could simply access the columns by label instead of by index, so that getYear could be something like 2016 - column['age'] (Python example)?
I'm used to Python's Pandas DataFrames which perform all this stuff automagically, but Haskell looks like it could perform a ton of it natively. Not sure how to do this however as of now.
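One way to get the type checker involved, sketched here for the three columns shown above only: parse each row into a record, so that age is an Int and columns are accessed by field name. The names Person, splitOnComma, parseRow and birthYear are hypothetical and do not come from the question, and the splitter assumes plain comma-separated fields with no quoting.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record type: one field per column of file.csv.
data Person = Person
  { firstName :: String
  , age       :: Int
  , city      :: String
  } deriving (Show, Eq)

-- Split a line on commas using only the Prelude (no quoting or escaping support).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Parse one row; Nothing if the column count is wrong or age is not an Int.
parseRow :: String -> Maybe Person
parseRow line = case splitOnComma line of
  [n, a, c] -> Person n <$> readMaybe a <*> pure c
  _         -> Nothing

-- With typed fields, the computation works on an Int instead of a String.
birthYear :: Person -> Int
birthYear p = 2016 - age p
```

Here the field accessor age plays the role of column['age']. The sketch does not attempt to infer column types, and writing such a record by hand would not scale to hundreds of columns.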

Related

String Formatting columns in Haskell without Text.Printf

I am new to Haskell. I am at the last part of a school project. I have to take tuples, print them to an output file, and separate their fields with a tab. So (709,4226408), (12965,4226412) and (5,4226016) should have an output of
709 4226408
12965 4226412
5 4226016
What I have been trying to do is this:
genOutput :: (Int, Int) -> String
genOutput (a,b) = (show a) ++ "\t" ++ (show b)
And this gives outputs like:
"709\t4226408"
"12965\t4226412"
"5\t4226016"
There are 3 things wrong with this. 1) Quotes still appear in the output. 2) The \t does not actually become a tab. Whenever I try to put an actual tab inside the "" it just comes out as a " " space. 3) They are not aligned into columns like the above example. I know Text.Printf exists, but we are not allowed to import anything other than:
import System.IO
import Data.List
import System.Environment
That's the output you get from GHCi, I guess? Try using putStrLn instead:
Prelude> genOutput (1,42)
"1\t42"
Prelude> putStrLn $ genOutput (1,42)
1 42
Why is that?
If you tell GHCi to evaluate an expression, it will do so and (more or less) output the result using show. show is designed to work with read and will usually output a value the way you would type it directly into Haskell source. For a String, that includes the escape sequences and the surrounding double quotes.
Now using putStrLn it will take the string and print it to stdout as you would expect.
Using print
Another reason could be that you use print to output your value. print is show followed by putStrLn, so it shows the value first, re-introducing the escapes (as GHCi would). If you are using print on Strings, change it to putStrLn.
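For the third point in the question (alignment), here is a minimal sketch that needs nothing beyond the Prelude. padRight, alignRows and printRows are hypothetical helper names, not part of the question or of any library; alignRows pads the first column with spaces instead of relying on tab stops.

```haskell
-- Same function as in the question: one row, tab-separated.
genOutput :: (Int, Int) -> String
genOutput (a, b) = show a ++ "\t" ++ show b

-- Pad a string with spaces on the right up to the given width.
padRight :: Int -> String -> String
padRight width s = s ++ replicate (width - length s) ' '

-- Hypothetical helper: pad the first column to the width of its widest entry.
alignRows :: [(Int, Int)] -> [String]
alignRows rows = [padRight width (show a) ++ " " ++ show b | (a, b) <- rows]
  where
    width = maximum (0 : map (length . show . fst) rows)

-- Print the tab-separated rows; putStr emits real tabs and no quotes.
printRows :: [(Int, Int)] -> IO ()
printRows = putStr . unlines . map genOutput
```

To write to a file instead of stdout, the same string can be passed to writeFile or hPutStr from System.IO.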

Haskell, make single string from integer set?

I'd greatly appreciate if you could tell me how to make a single string from a range between two ints. Like [5..10] i would need to get a "5678910". And then I'd have to calculate how many (zeroes, ones ... nines) there are in a string.
For example: if i have a range from [1..10] i'd need to print out
1 2 1 1 1 1 1 1 1 1
For now i only have a function to search for a element in string.
`countOfElem elem list = length $ filter (\x -> x == elem) list`
But the part how to construct such a string is bugging me out, or maybe there is an easier way? Thank you.
I tried something like this, but it wouldn't work.
let intList = map (read::Int->String) [15..22]
I tried something like this, but it wouldn't work. let intList = map (read::Int->String) [15..22]
Well... the purpose of read is to parse strings to read-able values. Hence it has the type signature Read a => String -> a, which obviously doesn't unify with Int -> String. What you want here is the inverse [1] of read; it's called show.
Indeed map show [15..22] gives almost the result you asked for (the numbers as decimal-encoded strings), but each number is still a separate list element, i.e. the type is [String] while you want only String. Well, how about asking Hoogle? It gives the function you need as the fifth hit: concat.
If you want to get fancy you can then combine the map and concat stages: both the concatMap function and the >>= operator do that. The most compact way to achieve the result: [15..22]>>=show.
[1] To be precise, show is only the right inverse of read.
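Putting this together with the counting part of the question, a minimal sketch could look like the following. The names digitsOfRange and digitCounts are made up for illustration, and the range is assumed to contain only non-negative numbers (otherwise the string would also contain '-' characters).

```haskell
-- Concatenate the decimal representations of every number in the range.
digitsOfRange :: Int -> Int -> String
digitsOfRange lo hi = concatMap show [lo .. hi]

-- Count how often each digit '0'..'9' occurs, in that order.
digitCounts :: Int -> Int -> [Int]
digitCounts lo hi = [length (filter (== d) digits) | d <- ['0' .. '9']]
  where
    digits = digitsOfRange lo hi
```

In GHCi, putStrLn (unwords (map show (digitCounts 1 10))) prints the counts separated by spaces, in the format the question asks for.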

Finding list entry with the highest count

I have an Entry data type
data Entry = Entry {
  count :: Integer,
  name :: String }
Then I want to write a function that takes the name and a list of Entrys as arguments and gives me the Entry with the highest count. What I have so far is
searchEntry :: String -> [Entry] -> Maybe Integer
searchEntry _ [] = Nothing
searchEntry name1 (x:xs) =
  if name x == name1
    then Just (count x)
    else searchEntry name1 xs
That gives me the FIRST Entry that the function finds, but I want the Entry with the highest count. How can I implement that?
My suggestion would be to break the problem into two parts:
Find all entries matching a given name
Find the entry with the highest count
You could set it up as
entriesByName :: String -> [Entry] -> [Entry]
entriesByName name entries = undefined
-- Use Maybe since the list might be empty
entryWithHighestCount :: [Entry] -> Maybe Entry
entryWithHighestCount entries = undefined
entryByNameWithHighestCount :: String -> [Entry] -> Maybe Entry
entryByNameWithHighestCount name entries = entryWithHighestCount $ entriesByName name entries
All you have to do is implement the relatively simple functions that are used to implement entryByNameWithHighestCount.
You need to add an inner helper function that takes the current best result as a parameter and returns it, instead of Nothing, when it reaches the end of the list.
You would also need to update the logic for a match so that it compares any existing result with the newly found value.
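A minimal sketch of that idea, keeping the original signature: the local helper go and its parameter best are illustrative names, and the Entry type is repeated so the snippet stands on its own.

```haskell
data Entry = Entry
  { count :: Integer
  , name  :: String
  } deriving (Show, Eq)

-- Highest count among the entries with the given name, or Nothing if none match.
searchEntry :: String -> [Entry] -> Maybe Integer
searchEntry target = go Nothing
  where
    -- 'best' holds the highest matching count seen so far.
    go best [] = best
    go best (e : es)
      | name e == target = go (Just (maybe (count e) (max (count e)) best)) es
      | otherwise        = go best es
```

Compared with the version in the question, a match no longer returns immediately; it updates best and keeps walking the rest of the list.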
I would consider changing the signature of the function to String -> [Entry] -> Maybe Entry (or String -> [Entry] -> [Entry]) if you indeed want to return the Entry items with the highest count.
Otherwise, you can actually do what you want as a oneliner using some pretty common Haskell functions....
As Bheklilr mentioned, the name filter can be done first, and it is really easy to do this using the filter function....
filter (hasName theName) entries
Note that hasName can be written out fully as a separate function, but Haskell also offers you the following shortcut.
hasName theName = (== theName) . name
Now you just need the maximum value.... Haskell has a maximum function, but it only works on the Ord class. You can make Entry an instance of Ord, or you can just use the related maximumBy function, that takes an extra ordering function
maximumBy orderFunction entries2
Again, you can write orderFunction yourself (which you might want to do as an exercise), but Haskell again offers a shortcut.
orderFunction = compare `on` count
You will need to import some libs to get this all to work (Data.Function, Data.List). You also will need to put in some extra code to account for the Nothing case.
It might be worth it to write out the functions longhand first, but I recommend that you use Hoogle to lookup and understand compare, on, and maximumBy.... Using tricks like this can really shorten your code.
Putting it all together, you can get the entry with the maximum count like this
maxEntry = maximumBy (compare `on` count) $ filter ((theName ==) . name) $ entries
You will need to modify this to account for the Nothing case, or if you want to return all max Entries (this just chooses one), or if you really wanted to return count, and not the entry.
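One possible way to handle the empty case, as a sketch: maximumBy raises an error on an empty list, so the filtered list is checked first. maxEntryByName is a name chosen here for illustration, and the Entry type is repeated to keep the snippet self-contained.

```haskell
import Data.Function (on)
import Data.List (maximumBy)

data Entry = Entry
  { count :: Integer
  , name  :: String
  } deriving (Show, Eq)

-- The entry with the highest count among those with the given name,
-- or Nothing when no entry has that name.
maxEntryByName :: String -> [Entry] -> Maybe Entry
maxEntryByName theName entries =
  case filter ((== theName) . name) entries of
    []      -> Nothing
    matches -> Just (maximumBy (compare `on` count) matches)
```

If only the count is wanted, fmap count (maxEntryByName theName entries) gives a Maybe Integer.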

How to view data from database haskell

I have a table in a SQLite database that I query from Haskell. My 'link_des' table has two columns. I want to view both columns (data only) at the same time. My code is:
printURLs :: IO ()
printURLs = do urls <- getURLs
               mapM_ print urls
getURLs :: IO [String]
getURLs = do conn <- connectSqlite3 "database.db"
             res <- quickQuery' conn "SELECT * FROM link_des" []
             return $ map fromSql (map head res)
With this I am getting first column data like
["col_1_data_1","col_1_data_2", ...]
using 'last' in lieu of 'head' I can get
["col_2_data_1","col_2_data_2", ...]
But I want to get data like
[("col_1_data_1","col_2_data_1"),("col_1_data_2","col_2_data_2"), ...]
which is actually like the pattern [(row_1),(row_2), ...]
Can anyone please help me. Thanks.
If you look at the type signature of quickQuery', you will see that it returns type IO [[SqlValue]]. That means that you already have the data in a form very similar to what you want.... Instead of
[("col_1_data_1","col_2_data_1"),("col_1_data_2","col_2_data_2"), ...]
you have
[["col_1_data_1","col_2_data_1"],["col_1_data_2","col_2_data_2"], ...]
The function you wrote is just pulling out the first column of this using "map head".
You could always write some code to convert a table with a known number of columns and types to the corresponding tuples (using a function like "convert [first, second] = (fromSql first, fromSql second)"), but it is much harder to write something that does this for arbitrary tables with differing number of columns and types. There are two reasons this is so....
a. First, you need to turn a list into a tuple, which isn't possible in Haskell for lists of differing sizes unless you use extensions. The main problem is that each size of tuple is its own type, and a single function can't choose its output type based on the input. You can do some trickery using GHC extensions, but the result is probably more complicated than you want to get into.
b. Second, you have to convert each value in the result from SqlValue to the appropriate Haskell type. This is also hard for similar reasons.
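For the fixed two-column case, the list-to-tuple step might look like the sketch below. It is deliberately generic and library-free: toPair and rowsToPairs are hypothetical names, and in the HDBC setting the elements would be SqlValue, with fromSql applied to each component as in the convert function mentioned above.

```haskell
-- Convert a two-element list to a pair; any other length yields Nothing.
toPair :: [a] -> Maybe (a, a)
toPair [x, y] = Just (x, y)
toPair _      = Nothing

-- Convert every row, failing as a whole if any row has the wrong width.
rowsToPairs :: [[a]] -> Maybe [(a, a)]
rowsToPairs = traverse toPair
```

The Maybe makes the "wrong number of columns" case explicit instead of crashing on a failed pattern match, which is one reason a function like this has to be written per table width.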
You might want to consider another approach altogether.... Take a look at the Yesod persistent database library, which is described at http://www.yesodweb.com/book/persistent. With that you define your schema in a quasiquote, and it creates Haskell records that are completely type safe.

Generating All Possible Paths in Haskell

I am very bad at wording things, so please bear with me.
I am doing a problem that requires me to generate all possible numbers in the form of a list of lists, in Haskell.
For example if I have x = 3 and y = 2, I have to generate a list of lists like this:
[[1,1,1], [1,2,1], [2,1,1], [2,2,1], [1,1,2], [1,2,2], [2,1,2], [2,2,2]]
x and y are passed into the function and it has to work with any nonzero positive integers x and y.
I am completely lost and have no idea how to even begin.
For anyone kind enough to help me, please try to keep any math-heavy explanations as easy to understand as possible. I am really not good at math.
Assuming that this is homework, I'll give you part of the answer, and show you how I think through this sort of problem. It's helpful to experiment in GHCi, and build up the pieces we need. One thing we need is to be able to generate a list of numbers from 1 through y. Suppose y is 7. Then:
λ> [1..7]
[1,2,3,4,5,6,7]
But as you'll see in a moment, what we really need is not a simple list, but a list of lists that we can build on. Like this:
λ> map (:[]) [1..7]
[[1],[2],[3],[4],[5],[6],[7]]
This basically says to take each element in the list, and prepend it to the empty list []. So now we can write a function to do this for us.
makeListOfLists y = map (:[]) [1..y]
Next, we need a way to prepend a new element to every element in a list of lists. Something like this:
λ> map (99:) [[1],[2],[3],[4],[5],[6],[7]]
[[99,1],[99,2],[99,3],[99,4],[99,5],[99,6],[99,7]]
(I used 99 here instead of, say, 1, so that you can easily see where the numbers come from.) So we could write a function to do that:
prepend x yss = map (x:) yss
Ultimately, we want to be able to take a list and a list of lists, and invoke prepend on every element in the list to every element in the list of lists. We can do that using the map function again. But as it turns out, it will be a little easier to do that if we switch the order of the arguments to prepend, like this:
prepend2 yss x = map (x:) yss
Then we can do something like this:
λ> map (prepend2 [[1],[2],[3],[4],[5],[6],[7]]) [97,98,99]
[[[97,1],[97,2],[97,3],[97,4],[97,5],[97,6],[97,7]],[[98,1],[98,2],[98,3],[98,4],[98,5],[98,6],[98,7]],[[99,1],[99,2],[99,3],[99,4],[99,5],[99,6],[99,7]]]
So now we can write that function:
supermap xs yss = map (prepend2 yss) xs
For example, if x=2 and y=3, then the answer we need is:
λ> let yss = makeListOfLists 3
λ> supermap [1..3] yss
[[[1,1],[1,2],[1,3]],[[2,1],[2,2],[2,3]],[[3,1],[3,2],[3,3]]]
(If that was all we needed, we could have done this more easily using a list comprehension. But since we need to be able to do this for an arbitrary x, a list comprehension won't work.)
Hopefully you can take it from here, and extend it to arbitrary x.
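One possible way to continue from here, as a sketch only: flatten the result of the prepending step and repeat it x times, starting from a list that contains just the empty list. The names extend and allLists are made up for this illustration.

```haskell
-- Prepend every value in 1..y to every list built so far.
extend :: Int -> [[Int]] -> [[Int]]
extend y yss = concatMap (\v -> map (v :) yss) [1 .. y]

-- Apply 'extend' x times, starting from the single empty list.
allLists :: Int -> Int -> [[Int]]
allLists x y = iterate (extend y) [[]] !! x
```

The order of the resulting lists differs from the order shown in the question, but the same lists are produced.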
For the specific x, as already mentioned, the list comprehension would do the trick, assuming that x equals 3, one would write the following:
generate y = [[a,b,c] | a<-[1..y], b<-[1..y], c <-[1..y]]
But life gets much more complicated when x is not predetermined. I don't have much experience of programming in Haskell, I'm not acquainted with library functions and my approach is far from being the most efficient solution, so don't judge it too harshly.
My solution consists of two functions:
strip [] = []
strip (h:t) = h ++ strip t
populate y 2 = strip( map (\a-> map (:a:[]) [1..y]) [1..y])
populate y x = strip( map (\a-> map (:a) [1..y]) ( populate y ( x - 1) ))
strip is defined for the nested lists. By merging the list-items it reduces the hierarchy so to speak. For example calling
strip [[1],[2],[3]]
generates the output:
[1,2,3]
populate is the tricky one.
On the last step of the recursion, when the second argument equals to 2, the function maps each item of [1..y] with every element of the same list into a new list. For example
map (\a-> map (:a:[]) [1..2]) [1..2]
generates the output:
[[[1,1],[2,1]],[[1,2],[2,2]]]
and the strip function turns it into:
[[1,1],[2,1],[1,2],[2,2]]
As for the initial step of the recursion, when x is more than 2, populate does almost the same thing except this time it maps the items of the list with the list generated by the recursive call. And Finally:
populate 2 3
gives us the desired result:
[[1,1,1],[2,1,1],[1,2,1],[2,2,1],[1,1,2],[2,1,2],[1,2,2],[2,2,2]]
As I mentioned above, this approach is neither the most efficient nor the most readable one, but I think it solves the problem. In fact, theoretically the only way of solving this without heavy use of recursion would be to build a string containing a list comprehension and then compile that string dynamically, which, in my short experience as a programmer, is never a good solution.
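For comparison, a library function can take care of the recursion: replicateM from Control.Monad, used in the list monad, builds every list of a given length from the supplied elements. The sketch below is an alternative to populate, not part of the solution above, and generate is just an illustrative name; note that its arguments are in the order (length, maximum value).

```haskell
import Control.Monad (replicateM)

-- All lists of length x whose elements are drawn from 1..y.
generate :: Int -> Int -> [[Int]]
generate x y = replicateM x [1 .. y]
```

The lists come out in a different order than populate 2 3 produces them, but the set of results is the same.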

Resources