Haskell - Reduce list - MapReduce

I'm trying to reduce a list of tuples, where the values of a duplicate key are added together like this:
[(the, 1), (the, 1)] => [(the, 2)]
I tried this:
reduce :: [(String, Integer)] -> [(String, Integer)]
reduce [] = []
reduce [(k, v) : xs] = (+) [(k, v)] : reduce xs
I'm getting this error:
Couldn't match expected type `(String, Integer)'
with actual type `[(String, Integer)] -> [(String, Integer)]'
What am I doing wrong?
Edit
This is the full program
toTuple :: [String] -> [(String, Integer)]
toTuple [] = []
toTuple (k:xs) = (k, 1) : toTuple xs
reduce :: [(String, Integer)] -> [(String, Integer)]
reduce [] = []
reduce [(k, v) : xs] = (+) [(k, v)] : reduce xs
main_ = do list <- getWords "test.txt"
           print $ reduce $ toTuple list
-- Loads words from a text file into a list.
getWords :: FilePath -> IO [String]
getWords path = do contents <- readFile path
                   return ([Prelude.map toLower x | x <- words contents])

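You are doing the pattern matching wrong. [(k, v) : xs] matches a one-element list whose single element is itself a list, which is not the type of reduce's argument. To match a non-empty list of tuples, the pattern should be like this: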
((k,v):xs)
(k,v) represents the head of the list and xs represents the tail of the list. Similarly this is problematic:
(+) [(k, v)] : reduce xs
The type of + is this:
λ> :t (+)
(+) :: Num a => a -> a -> a
You cannot write (+) [(k, v)] : reduce xs: (+) takes two numbers, not a list of tuples. You have to compare the String keys and then add the second components of the tuples whose keys match.
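One way to act on that advice, as a minimal sketch rather than the answerer's own code: match the head pair with ((k, v) : xs), add the values of every later pair that has the same key, and recurse on the pairs with a different key. The local names same and rest are illustrative.

```haskell
-- Sum the values of duplicate keys, keeping keys in order of first appearance.
reduce :: [(String, Integer)] -> [(String, Integer)]
reduce [] = []
reduce ((k, v) : xs) = (k, v + sum (map snd same)) : reduce rest
  where
    -- pairs in the tail that share the head's key
    same = filter ((== k) . fst) xs
    -- pairs with a different key, still to be reduced
    rest = filter ((/= k) . fst) xs
```

This makes a pass over the tail for every distinct key, so it is quadratic in the worst case; that is usually fine for small inputs.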

Let me point out that your function reduce is extremely similar to function fromListWith from Data.Map:
> :m Data.Map
> let reduce = toList . fromListWith (+)
> :t reduce
reduce :: (Ord k, Num a) => [(k, a)] -> [(k, a)]
> reduce [('a', 3), ('a', 1), ('b', 2), ('a', 10), ('b', 2), ('c', 1)]
[('a',14),('b',4),('c',1)]
> reduce [(c,1) | c <- "the quick brown fox jumps over the lazy dog"]
[(' ',8),('a',1),('b',1),('c',1),('d',1),('e',3),('f',1),('g',1),('h',2),('i',1),('j',1),('k',1),('l',1),('m',1),('n',1),('o',4),('p',1),('q',1),('r',2),('s',1),('t',2),('u',2),('v',1),('w',1),('x',1),('y',1),('z',1)]
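Outside GHCi, the same idea might look like the sketch below. The qualified import is an assumption made to avoid clashes with Prelude names, and wordCounts is a hypothetical helper added only for illustration.

```haskell
import qualified Data.Map as Map

-- Combine the values of equal keys with (+); the result is sorted by key.
reduce :: (Ord k, Num a) => [(k, a)] -> [(k, a)]
reduce = Map.toList . Map.fromListWith (+)

-- Hypothetical helper: count how often each word occurs.
wordCounts :: [String] -> [(String, Integer)]
wordCounts ws = reduce [(w, 1) | w <- ws]
```

Unlike a hand-written recursion over the list, this returns the keys in ascending order rather than in order of first appearance.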

Related

How to check in a tuple if element matches?

I am trying to check whether my second part of my tuple is 1.0 and if it is then I am storing it in the list. But I can't figure out how to implement the check.
number :: [(Integer, Double)] -> [Integer]
number lst = number' lst []
  where
    number' [] a = a
    ((mat,1.0): xs) a = number'(xs) (mat:a) -- Here I am getting my error
    ((_,lst): xs) a = number'(xs) a
Maybe someone has an idea.
This doesn't work because every equation must start with the function name number':
number :: [(Integer, Double)] -> [Integer]
number lst = number' lst []
  where
    number' [] a = a
    number' ((mat,1.0): xs) a = number'(xs) (mat:a)
    number' ((_,lst): xs) a = number'(xs) a
But this will return the "keys" in reverse.
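If the original order matters, one possible fix (a sketch, not part of the answer above) is to reverse the accumulator once, when the input is exhausted:

```haskell
number :: [(Integer, Double)] -> [Integer]
number lst = number' lst []
  where
    -- the accumulator is built back to front, so flip it at the end
    number' [] acc = reverse acc
    number' ((mat, 1.0) : xs) acc = number' xs (mat : acc)
    number' (_ : xs) acc = number' xs acc
```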
You can pattern match with:
number :: (Eq b, Num b) => [(a, b)] -> [a]
number ((x, 1):xs) = x : number xs
number (_:xs) = number xs
number [] = []
You can also use a list comprehension:
number :: (Eq b, Num b) => [(a, b)] -> [a]
number xs = [ x | (x, 1) <- xs ]
or a combination of map and filter:
number :: (Eq b, Num b) => [(a, b)] -> [a]
number = map fst . filter ((1 ==) . snd)

Compression reverse Haskell

I have this function where I compress a string:
compression :: String -> [(Char,Int)]
compression str = map (\lt -> (head lt, length lt)) (group str)
I've tried to turn this into its reverse.
EXAMPLE:
revcompression [('a', 2), ('b', 3), ('c', 4), ('b', 1)] == "aabbbccccb"
Can anyone edit the first one into the reverse?
This is without recursion:
decompress :: [(Char,Int)] -> String
decompress xs = concat $ map (\(c, n) -> replicate n c) xs
Or using >>=:
decompress :: [(Char,Int)] -> String
decompress xs = xs >>= (\(c, n) -> replicate n c)
Or point-free:
decompress :: [(Char,Int)] -> String
decompress = flip (>>=) (uncurry replicate . swap)
  where swap (a, b) = (b, a)
Or with the imports suggested in the comments:
import Control.Monad ((=<<))
import Data.Tuple (swap)
decompress :: [(Char,Int)] -> String
decompress = (=<<) (uncurry replicate . swap)
the joys of haskell :)
This can be done with recursion using the replicate function, which will repeat an element a certain number of times:
decompress :: [(Char, Int)] -> String
decompress [] = ... -- what should you get when you decompress the empty list?
decompress ((c, n):xs) = replicate n c ++ ... -- hint: you'll want to have a recursive call
Where the ... is something you need to fill in.
Just play with some example data, and generalize:
compressed str = map (\lt -> (head lt, length lt)) (group str)
compressed [a,a,a,b,b,b,c,d,d,a]
  = map (head &&& length) [ [a,a,a], [b,b,b], [c], [d,d], [a] ]
  = [ (a, 3), (b, 3), (c, 1), (d, 2), (a, 1)]
uncompressed cs@[ (a, 3), (b, 3), (c, 1), (d, 2), (a, 1)]
  = concat [ [a,a,a], [b,b,b], [c], [d,d], [a] ]
  = concat [ replicate k x | (x,k) <- cs ]
  = concatMap (uncurry (flip replicate)) cs
Indeed,
> :t concatMap (uncurry (flip replicate))
concatMap (uncurry (flip replicate)) :: [(b, Int)] -> [b]
compression :: String -> [(Char,Int)]
compression str = map (\lt -> (head lt, length lt)) (group str)
So you basically need a function that reverses the operations done in compression.
The type of that function (say, decompression) will be [(Char, Int)] -> String. It goes through the list of tuples and builds a list in which the first member of each tuple is repeated the number of times given by the second member. One simple way to do it is a fold:
decompression = foldl (\acc (a, n) -> acc ++ (replicate n a)) []
A fold uses recursion implicitly, and the handy function replicate makes the work easier. There are other, perhaps more elegant, ways to do it in the other answers, but I think a fold is pretty readable and natural.
EDIT: As pointed out by @Will Ness in a comment below, using ++ in a left-associated chain is an antipattern: each append re-copies everything accumulated so far, so it is quite inefficient. It's better to use foldr instead, as it is right-associative.
decompression = foldr (\(a, n) acc -> (replicate n a) ++ acc) []
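A small sketch of one practical consequence, assuming laziness matters for your use: because the foldr version produces its output incrementally, it can yield a prefix of the result even for an infinite input, whereas the foldl version would never return. The names decompressionR and firstFive are illustrative.

```haskell
decompressionR :: [(Char, Int)] -> String
decompressionR = foldr (\(a, n) acc -> replicate n a ++ acc) []

-- The first five characters of an endless run-length stream.
firstFive :: String
firstFive = take 5 (decompressionR (cycle [('a', 2), ('b', 1)]))
```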
You've already been provided with various solutions. I'll try to provide some insight on how to go about finding one such solution. Below we will derive the solution provided by @rethab,
decompress xs = concat $ map (\(c, n) -> replicate n c) xs
or more succinctly,
decompress = concat . map (\(c, n) -> replicate n c)
In what follows we will write f' to denote a left inverse of f, i.e. a function such that f' . f = id. The two following properties hold,
f' . g' is a left inverse of g . f
map f' is a left inverse of map f
The original compression function can be written as,
headLen lt = (head lt, length lt)
compression = map headLen . group
We are looking for a function decompression = compression'. Given the two properties above, if we have group' and headLen' then a solution would be,
decompression = group' . map headLen'
So we've factored the original problem into two smaller ones. Finding a left inverse of group and a left inverse of headLen. These are rather simple,
group' = concat
headLen' (c, n) = replicate n c
and so we have,
decompression = concat . map (\(c, n) -> replicate n c)
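The left-inverse claim can be spot-checked with a short sketch. The helper roundTrips is a hypothetical name introduced here; the other definitions follow the derivation above.

```haskell
import Data.List (group)

headLen :: String -> (Char, Int)
headLen lt = (head lt, length lt)  -- safe here: group never yields empty lists

compression :: String -> [(Char, Int)]
compression = map headLen . group

decompression :: [(Char, Int)] -> String
decompression = concat . map (\(c, n) -> replicate n c)

-- Left-inverse property: decompressing a compressed string gives it back.
roundTrips :: String -> Bool
roundTrips s = decompression (compression s) == s
```

It is only a left inverse: going the other way, compression (decompression [('a',1),('a',1)]) gives [('a',2)], not the original list.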

Populating a list of tuples in a semantic way

I'm working on a piece of code where I have to process lists of tuples where both the order and names of the "keys" (fsts of the tuples) match a certain template. I'm implementing fault tolerance by validating and (if needed) generating a valid list based on the input.
Here's an example of what I mean:
Given the template of keys, ["hello", "world", "this", "is", "a", "test"], and a list [("hello", Just 1), ("world", Just 2), ("test", Just 3)], passing it to my function validate would cause it to fail validation - as the order and values of the keys do not match up with the template.
Upon failing validation, I want to generate a new list, which would look like [("hello", Just 1), ("world", Just 2), ("this", Nothing), ("is", Nothing), ("a", Nothing), ("test", Just 3)].
I tried performing this last step using an (incomplete) list comprehension:
[(x, y) | x <- template, y <- l]
(Obviously, this is missing the step where empty entries would be replaced with Nothings, and works under the assumption that the input is of type [(String, Maybe Int)]).
What would be the easiest semantic way of doing this?
You essentially want to map a function to your list of strings (which you call "template"), i.e. the function that
takes a string xs,
returns
(xs, Just n) if an integer n is associated to xs in your "list to validate",
(xs, Nothing) otherwise.
Here is one possible approach:
import Data.List ( lookup )
import Control.Monad ( join )
consolidate :: [String] -> [(String, Maybe Int)] -> [(String, Maybe Int)]
consolidate temp l = map (\xs -> (xs, join $ lookup xs l)) temp
However, you will get faster lookup if you build a Map holding the key-value pairs of your association list (the "list to validate"):
import qualified Data.Map as M
import Data.Maybe (maybe)
consolidate :: [String] -> [(String, Maybe Int)] -> [(String, Maybe Int)]
consolidate temp l = map (\cs -> (cs, M.lookup cs m)) temp
  where m = fromList' l -- build the Map once, not once per key
fromList' :: Ord a => [(a, Maybe b)] -> M.Map a b
fromList' xs = foldr insertJust M.empty xs
insertJust :: Ord a => (a, Maybe b) -> M.Map a b -> M.Map a b
insertJust (xs, maybeVal) mp = maybe mp (\n -> M.insert xs n mp) maybeVal
In GHCi:
λ> let myTemplate = ["hello", "world", "this", "is", "a", "test"]
λ> let myList = [("hello", Just 1), ("world", Just 2), ("test", Just 3)]
λ> consolidate myTemplate myList
[("hello",Just 1),("world",Just 2),("this",Nothing),("is",Nothing),("a",Nothing),("test",Just 3)]
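To connect this back to the validation step described in the question, here is a hedged sketch. It assumes that "valid" means the keys of the list equal the template, in the same order; validate follows the question's wording, while repair is a hypothetical wrapper name. The list-based lookup is used to keep the example free of extra imports.

```haskell
import Control.Monad (join)

-- Assumed meaning of "valid": the keys equal the template, in the same order.
validate :: [String] -> [(String, Maybe Int)] -> Bool
validate template l = map fst l == template

consolidate :: [String] -> [(String, Maybe Int)] -> [(String, Maybe Int)]
consolidate template l = map (\k -> (k, join (lookup k l))) template

-- Hypothetical wrapper: keep valid input as it is, rebuild anything else.
repair :: [String] -> [(String, Maybe Int)] -> [(String, Maybe Int)]
repair template l
  | validate template l = l
  | otherwise           = consolidate template l
```

Under that assumption, the output of repair always passes validate, because consolidate emits exactly one entry per template key.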

Filter by length

How can I change go (x:xs) = (x, length (x:xs)) so that the length is only included when it is greater than 1?
Currently, if input is abcaaabbb output is [('a',1),('b',1),('c',1),('a',3),('b',3)], but I'm looking for abca3b3.
My code:
import Data.List
encode :: [Char] -> [(Char, Int)]
encode s = map go (group s)
  where go (x:xs) = (x, length (x:xs))
main = do
  s <- getLine
  print (encode s)
The last line will be putStrLn (concat (map (\(x,y) -> x : [y]) (encode s))) to convert the list to a string.
As I am a newbie myself, this is probably not very Haskellian. But you can do it roughly like this (assuming xs is the list [('a', 1), ('b', 2), ('a', 3)]):
Create "a1b2a3":
concat $ map (\(c, l) -> c:(show l)) xs
Filter out 1s:
filter (\x -> x /= '1') "a1b2a3"
will give you "ab2a3". Note that this also removes the digit 1 from larger counts such as 10 or 21.
You can't have a list like this in Haskell:
[('a'),('b'),('c'),('a',3),('b',3)]
Each element of a list needs to have the same type in Haskell, and ('c') (like ('a') :: Char) and ('b',3) (like ('a',1) :: Num t => (Char, t)) are different types.
Maybe also have a look at List of different types?
I would suggest that you change your list to a (Char, Maybe num) data structure.
Edit:
From your new question, I think you have been searching for this:
import Data.List
encode :: [Char] -> [(Char, Int)]
encode s = map go (group s)
  where go (x:xs) = (x, length (x:xs))
f :: (Char, Int) -> String
f (a, b) = if b == 1 then [a] else [a] ++ show b
encode2 :: [(Char, Int)] -> String
encode2 [] = []
encode2 (x:xs) = f x ++ encode2 xs
main = do
  s <- getLine
  putStrLn $ encode2 $ encode s
Not sure if this suits your needs, but if you do not need filtering, this does the work:
encode :: String -> String
encode "" = ""
encode (x:xs) = doIt0 xs x 1
  where
    doIt0 [] ch currentPos = [ch] ++ showPos currentPos
    doIt0 (x:xs) ch currentPos
      | x == ch   = doIt0 xs ch $ currentPos + 1
      | otherwise = [ch] ++ showPos currentPos ++ doIt0 xs x 1
    showPos pos = if pos > 1 then show pos else ""
main = do
  s <- getLine
  print (encode s)
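For comparison, a more compact sketch that reuses group from the question and renders each run directly. The names encodeRuns and render are illustrative, not from any of the answers.

```haskell
import Data.List (group)

-- Run-length encode a string, omitting the count for runs of length 1.
encodeRuns :: String -> String
encodeRuns = concatMap render . group
  where
    render run@(c : _)
      | length run > 1 = c : show (length run)
      | otherwise      = [c]
    render [] = ""  -- not reached: group never produces empty runs
```

Because the count is decided per run before anything is turned into text, counts such as 10 or 11 are kept intact.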

map of all successors for each element in haskell

Given a sequence of elements, I want to find a list of all the direct successors for each element:
Example:
"AABAABAAC"
Should return something like (using Data.Map):
fromList [('A',"ABABA"), ('B',"AA"), ('C', "")]
I am aware of the fromListWith function but I can't seem to get the list comprehension right:
succs :: Ord a => [a] -> M.Map a [a]
succs xs = M.fromListWith (++) [(x, ???) | ??? ]
Does this help?
succs xs@(_:xss) = M.fromListWith (++) $ zip xs (map (:[]) xss ++ [[]])
I think it returns ('A',"CABABA")..., since fromListWith (++) puts each later value in front of the earlier ones; your example has no C.
(:[]) is a point-free version of
singleton :: a -> [a]
singleton x = [x]
How did I get to this solution? I find this definition for the Fibonacci numbers fascinating:
fibs = 0:1:zipWith (+) fibs (tail fibs)
A similar thing can pair up every element with its successor:
let x = "AABAABAAC"
zip x (tail x)
[('A','A'),('A','B'),('B','A'),('A','A'),('A','B'),('B','A'),('A','A'),('A','C')]
This type almost matches the input to
M.fromListWith :: Ord k => (a -> a -> a) -> [(k, a)] -> M.Map k a
Now turn the characters into singleton lists and add an empty list to not suppress ('C',"").
You can split the problem into two parts. First, find the edges between two elements of a list.
edges :: [a] -> [(a, a)]
edges (x:y:zs) = (x,y):edges (y:zs)
edges _ = []
Then build a map to all the items that are the immediate successors of an item with fromListWith.
succs :: Ord a => [a] -> M.Map a [a]
succs = M.fromListWith (++) . map (\(x,y) -> (x,[y])) . edges
This doesn't give quite exactly what you desire. There's no entry for 'C' since it has no immediate successors.
succs "AABAABAAC" = fromList [('A',"CABABA"),('B',"AA")]
Instead we can make a less general-purpose version of edges that includes an item for the last item in the list.
succs :: Ord a => [a] -> M.Map a [a]
succs = M.fromListWith (++) . edges
  where
    edges (x:y:zs) = (x,[y]):edges (y:zs)
    edges (x:zs) = (x,[] ):edges zs
    edges _ = []
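If the successors should appear in the order they occur in the input, one option (a sketch under that assumption, with the hypothetical name succsInOrder) is to flip the combining function so that later values are appended rather than prepended:

```haskell
import qualified Data.Map as M

-- Map each element to its direct successors, in order of occurrence.
succsInOrder :: Ord a => [a] -> M.Map a [a]
succsInOrder xs = M.fromListWith (flip (++)) (zip xs successors)
  where
    -- one singleton list per element that has a successor, then [] for the last
    successors = map (: []) (drop 1 xs) ++ [[]]
```

Appending on the right repeatedly is slower than prepending for long successor lists, so this trades some efficiency for readable ordering.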
