Haskell: Output list of tuples as string output - string

I'm trying to get this list of tuples:
[(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]
into this string output
a 1,2
b 1
c 1
dd 2
I assume I need to use the unzip and unlines functions. But I also saw some solutions using the show function which makes the integers strings. Any ideas?

Break the problem down into steps. What you really want to do first is aggregate all the tuples that have the same string in the second position, so you'll have a function like
aggregate :: [(Int, String)] -> [([Int], String)]
So for your input list you would get the output
[([1, 2], "a"), ([1], "b"), ([1], "c"), ([2], "dd")]
Your hints are
aggregate items = someFunc (map (\(num, str) -> ([num], str)) items)
And take a look at foldr. Before you ask a follow-up question about foldr, note that there are probably hundreds of Stack Overflow answers already showing how to use it. Take some time to work through them, or the question will likely be closed as a duplicate.
Then you need a function to convert a single tuple of this form into a single String for outputting:
prettyPrint :: ([Int], String) -> String
prettyPrint (nums, str) = str ++ " " ++ joinWithComma (map show nums)
Here you'll have to implement joinWithComma yourself. Then you need to compute this string and print it for each item in your aggregated list. mapM_ and putStrLn are the preferred tools for that, so your main might look like
main :: IO ()
main = do
    let inputList = [(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]
    mapM_ (putStrLn . prettyPrint) (aggregate inputList)
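One possible way to fill in the two missing pieces is sketched below. It is only an illustration, not necessarily what this answer had in mind: the helper names step and render are made up here, the fold finds an existing entry with break, and sorting the numbers is an assumption based on the desired output showing 1,2 rather than 2,1.

```haskell
import Data.List (intercalate, sort)

-- Collect the numbers that share the same string, using foldr as hinted.
-- 'step' is a hypothetical helper name; it merges one tuple into the result.
aggregate :: [(Int, String)] -> [([Int], String)]
aggregate = foldr step []
  where
    step (n, s) acc = case break ((== s) . snd) acc of
      -- An entry for this string already exists: add the number to it.
      (before, (ns, _) : after) -> before ++ (sort (n : ns), s) : after
      -- No entry yet: start a new one at the front.
      (_, [])                   -> ([n], s) : acc

-- Join already-rendered numbers with commas.
joinWithComma :: [String] -> String
joinWithComma = intercalate ","

prettyPrint :: ([Int], String) -> String
prettyPrint (nums, str) = str ++ " " ++ joinWithComma (map show nums)

-- Hypothetical convenience wrapper: the whole output as one String.
render :: [(Int, String)] -> String
render = unlines . map prettyPrint . aggregate
```

With these definitions, putStr (render inputList) prints the four lines from the question. The lookup with break is linear per element, so this is meant for small inputs.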

If you have this list:
pairs = [ ("a", [1,2]), ("b", [1]), ("c", [1]), ("dd", [2]) ]
then you can get the desired output with:
-- requires: import Data.List (intercalate)
putStr $ unlines [ x ++ " " ++ intercalate "," (map show ys) | (x, ys) <- pairs ]
but you have to figure out how to get the pairs list first.

Work step by step. You could start with the groupBy function:
groupBy (\x y-> (snd x)==(snd y)) [(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]
gives you
[[(2,"a"),(1,"a")],[(1,"b")],[(1,"c")],[(2,"dd")]]
The next step would be "totalling" the inner lists; map and foldl (and, depending on your requirements, maybe sortBy) should be helpful. Once you have this, constructing the output is trivial (using show, as you already mentioned).
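The steps above could be strung together roughly as follows. This is a sketch under a few assumptions: the name formatGroups is invented for illustration, the numbers are sorted because the desired output shows 1,2, and a sortOn snd is added up front because groupBy only merges adjacent elements.

```haskell
import Data.Function (on)
import Data.List (groupBy, intercalate, sort, sortOn)

-- Group tuples by their string, then render one line per group.
formatGroups :: [(Int, String)] -> String
formatGroups = unlines . map line . groupBy ((==) `on` snd) . sortOn snd
  where
    -- Every group produced by groupBy is non-empty, so the first
    -- pattern always matches; the second only keeps the match total.
    line grp@((_, key) : _) =
      key ++ " " ++ intercalate "," (map show (sort (map fst grp)))
    line [] = ""
```

For the list in the question, putStr (formatGroups input) prints a 1,2 then b 1, c 1 and dd 2, each on its own line.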

Related

A better way of optimizing for permutation of function compositions over an input?

I have a list of functions and their 'apply priority'.
It looks like this. Length of it is 33
listOfAllFunctions = [ (f1, 1)
, (f2, 2)
, ...
, ...
, (f33, 33)
]
What I want to do is generate a list of permutations of the above list with no duplicates and I only want 8 unique elements in the inner list.
Which I'm implementing like this
prioratizedFunctions :: [[(MyDataType -> MyDataType, Int)]]
prioratizedFunctions = nubBy removeDuplicates
                     $ sortBy (comparing snd)
                   <$> take 8
                   <$> permutations listOfAllFunctions
where removeDuplicates is defined like
removeDuplicates a b = map snd a == map snd b
Lastly I'm turning the sublists which'd be [(MyDataType -> MyDataType, Int)] to a composition of functions and a [Int]
with this function
compFunc :: [(MyDataType -> MyDataType, Int)] -> MyDataType -> (MyDataType, [Int])
compFunc listOfDataAndInts target = ( foldr ((.) . fst) id listOfDataAndInts target
                                    , map snd listOfDataAndInts )
Applying the above function like this (flip compFunc) target <$> prioratizedFunctions
All of the above is a simplified version of the actual code, but it should give the gist of it.
The problem is that this code takes practically forever to execute. From some prototyping, I think the blame falls on my use of the permutations function inside prioratizedFunctions.
So I was wondering, is there a better way of doing what I want (basically generating permutation of listOfAllFunctions where each list only contains 8 elements, every list of elements sorted by their priority with snd and containing no duplicate list)
or is the problem inherently a long process?
I was generating unnecessary permutations.
This choose function is basically a non-deterministic take function
choose 0 xs = [[]]
choose n [] = []
choose n (x:xs) = map (x:) (choose (n-1) xs) ++ choose n xs
which improved performance by a lot.
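For reference, here is the same choose with a type signature, as a self-contained sketch. The signature is an addition, not part of the original snippet. Because each result keeps the elements in their input order, a list that is already sorted by priority yields sorted, duplicate-free selections without the sortBy and nubBy passes.

```haskell
-- All ways to pick n elements from a list, keeping their original order.
choose :: Int -> [a] -> [[a]]
choose 0 _      = [[]]   -- exactly one way to pick nothing
choose _ []     = []     -- nothing left to pick from
choose n (x:xs) =
  map (x:) (choose (n - 1) xs)  -- selections that include x
    ++ choose n xs              -- selections that skip x
```

For example, choose 2 "abc" gives ["ab","ac","bc"]. Picking 8 out of 33 this way produces 13,884,156 lists, whereas permutations enumerates all 33! orderings before anything is deduplicated.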

Dynamic number of list comprehension items

I'm trying to get permutations of a variable number of strings in a list. I'm sure this is possible in Haskell; I'm just having a hard time finding a reference for this.
I'm looking to be able to do this [ [n1] ++ [n2] ++ etc | n1 <- {first string}, n2 <- {second string}, etc ]
Where my list might be ["hey", "now"]
and my output would look like this:
["hn","ho","hw","en","eo","ew","yn","yo","yw"]
How would I go about doing something like that?
> sequence ["hey", "now"]
["hn","ho","hw","en","eo","ew","yn","yo","yw"]
sequence is very general, but on lists you can think of it as if it were defined as follows:
sequence :: [[a]] -> [[a]]
sequence [] = [[]]
sequence (x:xs) = [ y:ys | y <- x, ys <- sequence xs ]
The result above is sometimes called the "cartesian product" of a list of lists, since it is similar to that operation on sets.
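If you want to experiment with the definition without clashing with the Prelude's sequence, the same idea can be written as a fold under a different name. The name cartesian is made up for this sketch.

```haskell
-- Cartesian product of a list of lists, written as a right fold.
-- Each step prepends every element of the current list to every
-- combination already built from the lists to its right.
cartesian :: [[a]] -> [[a]]
cartesian = foldr (\xs rest -> [ y : ys | y <- xs, ys <- rest ]) [[]]
```

On lists this should agree with sequence, so cartesian ["hey", "now"] yields the same nine strings as above.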
EDIT: This only works for a list of exactly two strings, but it shows the desugaring of the list comprehension (since, for lists, join is concat and fmap is map, if I recall).
Here's a brute force way of doing it (if you'd like to know a possible approach). If you'd like the clean version, please see chi's answer.
concat $ map (\char1 -> map (\char2 -> char1:[char2]) string2) string1 should do it. There might be a better way with list comprehensions, but this does the job too.
Explanation:
concat $                 -- Flatten lists
  map (\char1 ->         -- Iterate over each character of string1
    map (\char2 ->       -- Iterate over each character of string2
      char1 : [char2]    -- Prepend char1 to the one-element list [char2]
    ) string2
  ) string1

How to split a [String] in to [[String]] based on length

I'm trying to split a list of Strings in to a List of Lists of Strings
so like in the title [String] -> [[String]]
This has to be done based on length of characters, so that the Lists in the output are no longer than 10. So if input was length 20 this would be broken down in to 2 lists and if length 21 in to 3 lists.
I'm not sure what to use to do this. I don't even know how to break down a list into a list of lists, never mind based on a certain length.
For example if the limit was 5 and the input was:
["abc","cd","abcd","ab"]
The output would be:
[["abc","cd"],["abcd"],["ab"]]
I'd like to be pointed in the right direction and what methods to use, list comprehension? recursion?
Here's an intuitive solution:
import Data.List (foldl')
breakup :: Int -> [[a]] -> [[[a]]]
breakup size = foldl' accumulate [[]]
  where accumulate broken l
          | length l > size = error "Breakup size too small."
          | sum (map length (last broken ++ [l])) <= size
                            = init broken ++ [last broken ++ [l]]
          | otherwise       = broken ++ [[l]]
Now, let's go through it line-by-line:
breakup :: Int -> [[a]] -> [[[a]]]
Since you hinted that you may want to generalize the function to accept different size limits, our type signature reflects this. We also generalize beyond [String] (that is, [[Char]]), since our problem is not specific to [[Char]], and could equally apply to any [[a]].
breakup size = foldl' accumulate [[]]
We're using a left fold because we want to transform a list, left-to-right, into our target, which will be a list of sub-lists. Even though we're not concerned with efficiency, we're using Data.List.foldl' instead of Prelude's own foldl because this is standard practice: foldl' forces the accumulator at each step instead of building up a chain of thunks.
Our folding function is called accumulate. It will consider a new item and decide whether to place it in the last-created sub-list or to start a new sub-list. To make that judgment, it uses the size we passed in. We start with an initial value of [[]], that is, a list with one empty sub-list.
Now the question is, how should you accumulate your target?
where accumulate broken l
We're using broken to refer to our constructed target so far, and l (for "list") to refer to the next item to process. We'll use guards for the different cases:
| length l > size = error "Breakup size too small."
We need to raise an error if the item surpasses the size limit on its own, since there's no way to place it in a sub-list that satisfies the size limit. (Alternatively, we could build a safe function by wrapping our return value in the Maybe monad, and that's something you should definitely try out on your own.)
| sum (map length (last broken ++ [l])) <= size
= init broken ++ [last broken ++ [l]]
The guard condition is sum (map length (last broken ++ [l])) <= size, and the return value for this guard is init broken ++ [last broken ++ [l]]. Translated into plain English, we might say, "If the item can fit in the last sub-list without going over the size limit, append it there."
| otherwise = broken ++ [[l]]
On the other hand, if there isn't enough "room" in the last sub-list for this item, we start a new sub-list, containing only this item. When the accumulate helper is applied to the next item in the input list, it will decide whether to place that item in this sub-list or start yet another sub-list, following the same logic.
There you have it. Don't forget to import Data.List (foldl') up at the top. As another answer points out, this is not a performant solution if you plan to process 100,000 strings. However, I believe this solution is easier to read and understand. In many cases, readability is the more important optimization.
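As for the Maybe-based variant mentioned earlier, one way it might look is sketched here. The name breakupSafe and the use of foldM are choices made for this illustration, so treat it as one possible answer to the exercise rather than the intended one.

```haskell
import Control.Monad (foldM)

-- Like breakup, but returns Nothing instead of raising an error
-- when a single item is longer than the size limit.
breakupSafe :: Int -> [[a]] -> Maybe [[[a]]]
breakupSafe size = foldM accumulate [[]]
  where
    -- 'broken' is never empty: it starts as [[]] and only grows.
    accumulate broken l
      | length l > size = Nothing
      | sum (map length (last broken ++ [l])) <= size
                        = Just (init broken ++ [last broken ++ [l]])
      | otherwise       = Just (broken ++ [[l]])
```

Here breakupSafe 5 ["abc","cd","abcd","ab"] gives Just [["abc","cd"],["abcd"],["ab"]], and breakupSafe 3 ["abcd"] gives Nothing. foldM stops at the first Nothing, so the remaining items are not examined.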
Thanks for the fun question. Good luck with Haskell, and happy coding!
You can do something like this:
splitByLen :: Int -> [String] -> [[String]]
splitByLen n s = go (zip s $ scanl1 (+) $ map length s) 0
  where go [] _ = []
        go xs prev = let (lst, rest) = span (\ (x, c) -> c - prev <= n) xs
                     in (map fst lst) : go rest (snd $ last lst)
And then:
*Main> splitByLen 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
In case there is a string longer than n, this function will fail. Now, what you want to do in those cases depends on your requirements and that was not specified in your question.
[Update]
As requested by @amar47shah, I made a benchmark comparing his solution (breakup) with mine (splitByLen):
import Data.List
import Data.Time.Clock
import Control.DeepSeq
import System.Random
main :: IO ()
main = do
    s <- mapM (\ _ -> randomString 10) [1..10000]
    test "breakup 10000" $ breakup 10 s
    test "splitByLen 10000" $ splitByLen 10 s
    putStrLn ""
    r <- mapM (\ _ -> randomString 10) [1..100000]
    test "breakup 100000" $ breakup 10 r
    test "splitByLen 100000" $ splitByLen 10 r
test :: (NFData a) => String -> a -> IO ()
test s a = do time1 <- getCurrentTime
              time2 <- a `deepseq` getCurrentTime
              putStrLn $ s ++ ": " ++ show (diffUTCTime time2 time1)
randomString :: Int -> IO String
randomString n = do
    l <- randomRIO (1,n)
    mapM (\ _ -> randomRIO ('a', 'z')) [1..l]
Here are the results:
breakup 10000: 0.904012s
splitByLen 10000: 0.005966s
breakup 100000: 150.945322s
splitByLen 100000: 0.058658s
Here is another approach. It is clear from the problem that the result is a list of lists and we need a running length and an inner list to keep track of how much we have accumulated (We use foldl' with these two as input). We then describe what we want which is basically:
If the length of the current input string itself exceeds the input length, we ignore that string (you may change this if you want a different behavior).
If the new length after we have added the length of the current string is within our input length, we add it to the current result list.
If the new length exceeds the input length, we add the result so far to the output and start a new result list.
chunks len = reverse . map reverse . snd . foldl' f (0, [[]]) where
  f resSoFar@(lenSoFar, currRes : acc) curr
    | currLength > len = resSoFar -- ignore
    | newLen <= len    = (newLen, (curr : currRes) : acc)
    | otherwise        = (currLength, [curr] : currRes : acc)
    where
      newLen = lenSoFar + currLength
      currLength = length curr
Every time we add a result to the output list, we add it to the front hence we need reverse . map reverse at the end.
> chunks 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
> chunks 5 ["abc","cd","abcdef","ab"]
[["abc","cd"],["ab"]]
Here is an elementary approach. First, the type String doesn't matter, so we can define our function in terms of a general type a:
breakup :: [a] -> [[a]]
I'll illustrate with a limit of 3 instead of 10. It'll be obvious how to implement it with another limit.
The first pattern handles lists of size >= 3, and the second pattern handles all of the other cases:
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
It is important to have the patterns in this order. That way the second pattern will only be used when the first pattern does not match, i.e. when there are fewer than 3 elements in the list.
Examples of running this on some inputs:
breakup [1..5] -> [ [1,2,3], [4,5] ]
breakup [1..4] -> [ [1,2,3], [4] ]
breakup [1..2] -> [ [1,2] ]
breakup [1..3] -> [ [1,2,3], [] ]
We see there is an extra [] when we run the function on [1..3]. Fortunately, this is easy to fix by inserting another rule before the other two:
breakup [] = []
The complete definition is:
breakup :: [a] -> [[a]]
breakup [] = []
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
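To avoid writing one pattern per limit, the same idea can be expressed with splitAt. This is a sketch with an invented name, chunksOfN, and the error for non-positive sizes is an assumption about the desired behaviour.

```haskell
-- Split a list into consecutive chunks of at most n elements.
chunksOfN :: Int -> [a] -> [[a]]
chunksOfN n xs
  | n <= 0    = error "chunksOfN: chunk size must be positive"
  | null xs   = []   -- plays the role of the breakup [] = [] rule
  | otherwise = let (chunk, rest) = splitAt n xs
                in chunk : chunksOfN n rest
```

With a limit of 3 this matches the examples above: chunksOfN 3 [1..5] gives [[1,2,3],[4,5]] and chunksOfN 3 [1..3] gives [[1,2,3]].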

mapping over entire data set to get results

Suppose I have the arrays:
A = "ABACUS"
B = "YELLOW"
And they are zipped so: Pairing = zip A B
I also have a function Connect :: Char -> [(Char,Char)] -> [(Char,Char,Int)]
What I want to do is given a char such as A, find the indices of where it is present in the first string and return the character in the same positions in the second string, as well as the position e.g. if I did Connect 'A' Pairing I'd want (A,Y,0) and (A,L,2) as results.
I know I can do
pos = x!!map fst pairing
to retrieve the positions. And fnd = findIndices (==pos) map snd pairing to get what's in this position in the second string but in Haskell how would I do this over the whole set of data (as if I were using a for loop) and how would I get my outputs?
To do exactly as you asked (but correct the initial letter of function names to be lowercase), I could define
connect :: Char -> [(Char,Char)] -> [(Char,Char,Int)]
connect c pairs = [(a,b,n)|((a,b),n) <- zip pairs [0..], a == c]
so if
pairing = zip "ABACUS" "YELLOW"
we get
ghci> connect 'A' pairing
[('A','Y',0),('A','L',2)]
However, I think it'd be neater to zip once, not twice, using zip3:
connect3 :: Char -> String -> String -> [(Char,Char,Int)]
connect3 c xs ys = filter (\(a,_,_) -> a==c) (zip3 xs ys [0..])
which is equivalent to
connect3' c xs ys = [(a,b,n)| (a,b,n) <- zip3 xs ys [0..], a==c]
they all work as you wanted:
ghci> connect3 'A' "ABACUS" "YELLOW"
[('A','Y',0),('A','L',2)]
ghci> connect3' 'A' "ABACUS" "AQUAMARINE"
[('A','A',0),('A','U',2)]
In comments, you said you'd like to get pairs for matches the other way round.
This time, it'd be most convenient to use the monadic do notation, since lists are an example of a monad.
connectEither :: (Char,Char) -> String -> String -> [(Char,Char,Int)]
connectEither (c1,c2) xs ys = do
    (a,b,n) <- zip3 xs ys [0..]
    if a == c1 then return (a,b,n) else
     if b == c2 then return (b,a,n) else
      fail "Doesn't match - leave it out"
I've used the fail function to leave out ones that don't match. The three lines starting if, if and fail are increasingly indented because they're actually one line from Haskell's point of view.
ghci> connectEither ('a','n') "abacus" "banana"
[('a','b',0),('a','n',2),('n','u',4)]
In this case, it hasn't included ('n','a',2) because it's only checking one way.
We can allow both ways by reusing existing functions:
connectBoth :: (Char,Char) -> String -> String -> [(Char,Char,Int)]
connectBoth (c1,c2) xs ys = lefts ++ rights where
    lefts = connect3 c1 xs ys
    rights = connect3 c2 ys xs
which gives us everything we want to get:
ghci> connectBoth ('a','n') "abacus" "banana"
[('a','b',0),('a','n',2),('n','a',2),('n','u',4)]
but unfortunately it lists some things more than once:
ghci> connectBoth ('A','A') "Austria" "Antwerp"
[('A','A',0),('A','A',0)]
So we can get rid of that using nub from Data.List. (Add import Data.List at the top of your file.)
connectBothOnce (c1,c2) xs ys = nub $ connectBoth (c1,c2) xs ys
giving
ghci> connectBothOnce ('A','A') "ABACUS" "Antwerp"
[('A','A',0),('A','t',2)]
I would recommend not zipping the lists together, since that'd just make it more difficult to use the function elemIndices from Data.List. You then have a list of the indices that you can use directly to get the values out of the second list.
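A minimal sketch of that approach, assuming the two strings are passed separately. The name connectIdx is invented for illustration, and the bounds check is an added safeguard for the case where the second string is shorter.

```haskell
import Data.List (elemIndices)

-- Find every position of c in xs and pair it with the character
-- at the same position in ys.
connectIdx :: Char -> String -> String -> [(Char, Char, Int)]
connectIdx c xs ys =
  [ (c, ys !! i, i)
  | i <- elemIndices c xs
  , i < length ys          -- skip positions that ys does not have
  ]
```

For example, connectIdx 'A' "ABACUS" "YELLOW" gives [('A','Y',0),('A','L',2)]. Note that (!!) walks the list from the start on every lookup, so for long strings a zip-based version may be the better fit.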
You can add indices with another zip, then filter on the given character and convert tuples to triples. Especially because of this repackaging, a list comprehension seems appropriate:
connect c pairs = [(a, b, idx) | ((a, b), idx) <- zip pairs [0..], a == c]

How to show all the tuple data in a list (using recursion)

Imagine I have a tuple list like
[("Fazaal","Naufer",07712345678)
,("Tharanga","Chandasekara",0779876543)
,("Ruaim","Mohomad",07798454545)
,("Yasitha","Lokunarangoda",07798121212)
,("Rochana","Wimalasena",0779878787)
,("Navin","Dhananshan",077987345678)
,("Akila","Silva",07798123123)
,("Sudantha","Gunawardana",0779812456)
]
I want to show each tuple here.
I tried this code but it messes the format up.
displayDB :: [Reservation] ->String
displayDB [] = []
displayDB (x :xs) = show x ++ show( displayDB (xs))
First, you are calling "show" on the string output of "displayDB". Your last line should be
displayDB (x :xs) = show x ++ displayDB xs
Your version will cause each successive tuple to be enclosed in another layer of string escaping, so you will get progressively more complex escaping.
Second, "show x" will convert the tuple into a string in the most obvious and basic way. You probably want a better function that emits the fields in a nicer way, and you will also want to insert commas or newlines as appropriate. Without knowing what you want this output for, it's a bit difficult to say more.
Third, it's bad style to write a recursive function of your own (unless you are writing this as an exercise); a better style is to compose functions like "map". The expression
map show xs
will turn your list of tuples into a list of strings. You can then print these strings or use "intercalate" in Data.List to turn this list of strings into a single string with the right bits put in between the elements. So you probably want something like
displayDB xs = intercalate ",\n" $ map show xs
Or if you prefer it in point-free form:
displayDB = intercalate ",\n" . map show
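If the default show output is not what you want, a per-record formatter can be mapped instead. The following is only a sketch: the question does not show the definition of Reservation, so the type synonym below is an assumption, as are the name formatReservation and the chosen layout.

```haskell
import Data.List (intercalate)

-- Assumed shape of a record: first name, surname, phone number.
type Reservation = (String, String, Integer)

-- Hypothetical formatter: emit the fields without tuple syntax or quotes.
formatReservation :: Reservation -> String
formatReservation (firstName, surname, phone) =
  firstName ++ " " ++ surname ++ ": " ++ show phone

-- One record per line, with no trailing newline.
displayDB :: [Reservation] -> String
displayDB = intercalate "\n" . map formatReservation
```

putStrLn (displayDB [("A", "B", 1), ("C", "D", 2)]) then prints A B: 1 and C D: 2 on separate lines. Numeric literals do not keep leading zeros, so if the phone numbers need them, storing them as String may be a better choice.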
Consider what happens on smaller input. Each line is an expansion:
displayDB [("A", "B", 1), ("C", "D", 2)]
=> show ("A", "B", 1) ++ show (displayDB [("C", "D", 2)])
=> show ("A", "B", 1) ++ show (show ("C", "D", 2) ++ show (displayDB []))
=> show ("A", "B", 1) ++ show (show ("C", "D", 2) ++ show [])

Resources