How to process / summarise a list into a "different" list - haskell

I think I need something like a fold (or maybe a foldt), but the examples I've seen only seem to compress the list into a simple scalar value.
What I need has to remember and re-use values from previous rows in the list (essentially a "group by" operation).
If my input data looks like:
[["order1", "item1"],["", "item2"],["","item3"],["order2","item4"]]
What is the correct approach to end up with something like:
[["order1",["item1","item2","item3"]],["order2",["item4"]]]
ie data Order = Order { id :: Text, items :: [OrderItem]}
What if I wanted a slightly different structure?
[("order1",["item1","item2","item3"]),("order2",["item4"])]
ie data OrderTuple = OrderTuple { order :: Order, items :: [OrderItem]}
What if I also wanted to keep a running total of some numeric value from the OrderItem?
edit: Here's the code I'm trying to get working based on Frerich's answer
--testGroupBy :: [[String]] -> [[String]]
testGroupBy :: [[String]] -> [(String, [String])]
testGroupBy z =
    --groupBy (\(x:xs) (y:ys) -> x == y || null y) z
    groupBy testFunc z
testFunc :: [String] -> [String] -> Bool
testFunc (x:xs) (y:ys) = x == y || null y

Pattern matching is useful here:
groupData = foldl acc []
  where acc ((r, rs):rss) ("":xs) = (r, rs ++ xs): rss
        acc rss (x:xs) = (x, xs): rss
        acc _ _ = error "Bad input data"
The resulting groups are in reverse order; apply reverse if you need the original order.
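As a small sketch of that reverse step (the name groupDataInOrder is made up for this example, and the input is assumed to have the same row shape as in the question), the fold can be wrapped so that the groups come out in input order:

```haskell
-- Same left fold as above, followed by reverse so that the groups
-- appear in the order in which they occur in the input.
groupDataInOrder :: [[String]] -> [(String, [String])]
groupDataInOrder = reverse . foldl acc []
  where
    -- A row with an empty id extends the most recent group.
    acc ((r, rs) : rss) ("" : xs) = (r, rs ++ xs) : rss
    -- Any other non-empty row starts a new group.
    acc rss (x : xs)              = (x, xs) : rss
    -- An empty row is treated as malformed input.
    acc _ _                       = error "Bad input data"
```

Only the outer list is reversed; the items inside each group are already in input order because they are appended with ++.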
What if I wanted a slightly different structure?
Simply transform one into the other; you can do that inside groupData or in a separate function.
If you also want to accept leading rows that have no order id:
groupData = foldr acc []
  where acc (x:xs) [] = [(x, xs)]
        acc ("":xs) (("", rs):rss) = ("", rs ++ xs): rss
        acc (x:xs) (("", rs):rss) = (x, rs ++ xs): rss
        acc (x:xs) rss = (x, xs): rss
then
let xs = [["", "item8"],["", "item9"],["order1", "item1"],["", "item2"],["","item3"],["order2","item4"]]
print $ groupData xs
is
[("",["item9","item8"])
,("order1",["item3","item2","item1"])
,("order2",["item4"])]

Instead of looking for a fold-based solution, I'd first try to see whether you can define the function as a composition of higher-level functions (such as map). Let me fire up a ghci session and play a bit:
λ: let x = [["order1", "item1"],["", "item2"],["","item3"],["order2","item4"]]
Your "group by" operation actually has an existing name: Data.List.groupBy -- this almost gets us what we need:
λ: import Data.List
λ: let x' = groupBy (\(x:xs) (y:ys) -> x == y || null y) x
λ: x'
[[["order1","item1"],["","item2"],["","item3"]],[["order2","item4"]]]
This groupBy application collects consecutive rows into one group (i.e. list) as long as their first field is equal to that of the group's first row, or is empty. The result can then be massaged into your desired format with a map (in this case, the second format you proposed):
λ: let x'' = map (\x -> (head (head x), map (!! 1) x)) x'
λ: x''
[("order1",["item1","item2","item3"]),("order2",["item4"])]
Putting it all together:
import Data.List (groupBy)
groupData :: [[String]] -> [(String, [String])]
groupData = map (\x -> (head (head x), map (!! 1) x))
          . groupBy (\(x:xs) (y:ys) -> x == y || y == "")
I suppose that with this, building a proper data structure (i.e. something more typesafe than nested lists) should be straightforward.
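For what it's worth, here is a minimal sketch of what such a data structure could look like. The record type and its field names (orderId, orderItems) are assumptions made for this example, loosely based on the Order type in the question (id is avoided as a field name because it clashes with the Prelude function). Rows that do not have exactly two fields are silently dropped, which is a design choice of this sketch, not something the question specifies.

```haskell
import Data.List (groupBy)

-- Hypothetical record type, loosely modelled on the one in the question.
data Order = Order
  { orderId    :: String
  , orderItems :: [String]
  } deriving (Show, Eq)

-- Turn rows of the form [orderId, item] into Order values.
-- A row with an empty id belongs to the group opened by the
-- closest preceding row with a non-empty id.
toOrders :: [[String]] -> [Order]
toOrders rows =
  [ Order oid (map snd grp)
  | grp@((oid, _) : _) <- groupBy sameOrder pairs
  ]
  where
    -- Keep only well-formed two-field rows.
    pairs = [ (k, v) | [k, v] <- rows ]
    -- Same predicate as in the groupBy call above, on pairs.
    sameOrder (a, _) (b, _) = a == b || null b
```

Working on pairs instead of nested lists removes the need for head and (!! 1), so a malformed row cannot cause a runtime error.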

Related

Remove a Character Sequence From a String

Consider a function, which takes a string and returns a list of all possible cases in which three subsequent 'X's can be removed from the list.
Example:
"ABXXXDGTJXXXDGXF" should become
["ABDGTJXXXDGXF", "ABXXXDGTJDGXF"]
(The order does not matter)
here is a naive implementation:
f :: String -> [String]
f xs = go [] xs [] where
  go left (a:b:c:right) acc =
    go (left ++ [a]) (b:c:right) y where -- (1)
      y = if a == 'X' && b == 'X' && c == 'X'
            then (left ++ right) : acc
            else acc
  go _ _ acc = acc
I think the main problem here is the line marked with (1). I'm constructing the left side of the list by appending to it, which is generally expensive.
Usually something like this can be solved by this pattern:
f [] = []
f (x:xs) = x : f xs
Or more explicitly:
f [] = []
f (x:right) = x : left where
  left = f right
Now I'd have the lists right and left in each recursion. However, I need to accumulate them and I could not figure out how to do so here. Or am I on the wrong path?
A solution
Inspired by Gurkenglas' proposal, here is a slightly more generalized version of it:
import Data.Bool
removeOn :: (String -> Bool) -> Int -> String -> [String]
removeOn onF n xs = go xs where
  go xs | length xs >= n =
            bool id (right:) (onF mid) $
            map (head mid:) $
            go (tail xs)
    where
      (mid, right) = splitAt n xs
  go _ = []
removeOn (and . map (=='X')) 3 "ABXXXDGTJXXXDGXF"
--> ["ABDGTJXXXDGXF","ABXXXDGTJDGXF"]
The main idea seems to be the following:
Traverse the list recursively, so that the results are assembled on the way back from its end. A 'look-ahead' examines the next n elements of the list (so it must first be checked that the current list contains that many elements). If those elements pass the truth test, a new result is added to the accumulated list of results. In either case, the current first element must be prepended to all results coming from the recursive call, because they stem from the shorter tail. This can be done blindly, since prepending a character to a result string does not change the fact that it is a match.
f :: String -> [String]
f (a:b:c:right)
  = (if a == 'X' && b == 'X' && c == 'X' then (right:) else id)
  $ map (a:) $ f (b:c:right)
f _ = []
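For comparison, here is a sketch of a different technique from the recursion above: pair every prefix of the string with the matching suffix, using inits and tails from Data.List, and keep the positions where the suffix starts with "XXX". The function name removeXXX is made up for this example. Note that it still appends with ++, so it is meant as a readable cross-check rather than a faster solution.

```haskell
import Data.List (inits, isPrefixOf, tails)

-- For every split of the string into (prefix, suffix), keep the
-- splits whose suffix starts with "XXX" and drop those three chars.
removeXXX :: String -> [String]
removeXXX s =
  [ pre ++ drop 3 post
  | (pre, post) <- zip (inits s) (tails s)
  , "XXX" `isPrefixOf` post
  ]
```

With the example input "ABXXXDGTJXXXDGXF" this yields ["ABDGTJXXXDGXF","ABXXXDGTJDGXF"], the same two strings as the recursive version.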

Haskell creating my own filtering function

I'm new to Haskell. I'm trying to create a function that takes a list of integers and returns a list containing two sublists, the first containing the even numbers and the other containing the odd numbers. I cannot use the even, odd or filter functions. I created my own functions as follows:
myodd :: Integer -> Bool
myodd n = rem (abs(n)) 2 == 1
myeven :: Integer -> Bool
myeven n = rem (abs(n)) 2 == 0
segregate [] = ([], [])
segregate [x] = ([x], [])
segregate (x:y:xs) = (x:xp, y:yp) where (xp, yp) = segregate xs
I'm having trouble using the first two functions inside segregate. I have more experience in Racket, and the function I created there looks like this:
(define (myeven? x)
(= (modulo x 2) 0))
(define (myodd? x)
(= (modulo x 2) 1))
(define (segregate xs)
(foldr (lambda (x b)
(if (myeven? x)
(list (cons x (first b)) (second b))
(list (first b) (cons x (second b))))) '(()()) xs))
Here's one good way:
segregate [] = ?
segregate (x:xs)
  | myeven x = ?
  | otherwise = ?
  where (restEvens, restOdds) = segregate xs
You could also use
segregate = foldr go ([], []) where
  go x ~(evens, odds)
    | myeven x = ?
    | otherwise = ?
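If you get stuck on the question marks, here is one possible way to fill in the foldr skeleton. It is only a sketch: it reuses the myeven helper from the question and returns a tuple, as the skeleton does.

```haskell
-- Hand-rolled parity test from the question (no even/odd/filter).
myeven :: Integer -> Bool
myeven n = rem (abs n) 2 == 0

-- Walk the list from the right; each element is consed onto either
-- the evens or the odds collected from the rest of the list.
segregate :: [Integer] -> ([Integer], [Integer])
segregate = foldr go ([], [])
  where
    -- The lazy pattern (~) delays matching on the pair until its
    -- components are actually needed.
    go x ~(evens, odds)
      | myeven x  = (x : evens, odds)
      | otherwise = (evens, x : odds)
```

This mirrors the Racket foldr in the question: the accumulator holds both result lists and each step extends exactly one of them.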
A simple way is to run through the list twice using each of your hand-rolled functions as a guard predicate:
segregate :: [Integer] -> ([Integer], [Integer])
segregate [] = ([],[])
segregate xs = (evens, odds)
  where evens = [x | x <- xs, myeven x]
        odds = [x | x <- xs, myodd x]
Note: your question asked for a list of lists but your pattern matching segregate [] = ([],[]) indicated you wanted a tuple so I gave a tuple solution.
If you really need such a function rather than writing it for educational purposes, there is partition in Data.List, so a simple
import Data.List(partition)
will get you going.
In the educational case, you can still compare your code, once you're done, with the code of partition.

Combs on a large set doesn't compute Haskell

I'm writing a combs function in Haskell.
What it needs to do is, when I provide it with a deck of cards, give me every possible hand (combination) of size x from that deck.
This is the relevant code
combs :: Int -> [a] -> [[a]]
combs 0 _ = [[ ]]
combs i (x:xs) = (filter (isLength i) y)
  where y = subs (x:xs)
combs _ _ = [ ]
isLength :: Int -> [a] -> Bool
isLength i x
  | length x == i = True
  | otherwise = False
subs :: [a] -> [[a]]
subs [ ] = [[ ]]
subs (x : xs) = map (x:) ys ++ ys
  where ys = subs xs
However, when I ask it to compute combs 5 [1..52], i.e. a hand of 5 out of a full deck, it does not produce a result and keeps running for a really long time.
Does anyone know what the problem is and how to speed up this algorithm?
To extract i items from x:xs you can proceed in two ways:
- you keep x, and extract only i-1 elements from xs
- you discard x, and extract all i elements from xs
Hence, a solution is:
comb :: Int -> [a] -> [[a]]
comb 0 _ = [[]] -- only the empty list has 0 elements
comb _ [] = [] -- can not extract > 0 elements from []
comb i (x:xs) = [ x:ys | ys <- comb (i-1) xs ] -- keep x case
             ++ comb i xs -- discard x case
By the way, the above code also "proves" a well-known recursive formula for the binomial coefficients. You might already have met this formula if you attended a calculus class.
Letting B(k,n) = length (comb k [1..n]), we have
B(k+1,n+1) == B(k,n) + B(k+1,n)
which is just a direct consequence of the last line of the code above.
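As a quick sanity check of that identity, here is a sketch that repeats the definition of comb and tests the formula by brute force. The helper name pascalHolds is made up for this example, and it should only be run on small k and n because it enumerates all combinations.

```haskell
comb :: Int -> [a] -> [[a]]
comb 0 _      = [[]]                               -- only the empty list has 0 elements
comb _ []     = []                                 -- cannot extract > 0 elements from []
comb i (x:xs) = [ x : ys | ys <- comb (i - 1) xs ] -- keep x
             ++ comb i xs                          -- discard x

-- Checks B(k+1,n+1) == B(k,n) + B(k+1,n) for one pair (k, n),
-- where B(k,n) = length (comb k [1..n]).
pascalHolds :: Int -> Int -> Bool
pascalHolds k n = b (k + 1) (n + 1) == b k n + b (k + 1) n
  where
    b i m = length (comb i [1 .. m])
```

For example, and [pascalHolds k n | k <- [0..4], n <- [0..8]] evaluates to True.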
Right now it's a bit hard to see what you are trying to do, but I guess the problem is that you filter and map a lot.
I think a simple way to get what you need is this:
module Combinations where
import Data.List (delete)
combs :: Eq a => Int -> [a] -> [[a]]
combs 0 _ = [[]]
combs i xs = [ y:ys | y <- xs, ys <- combs (i-1) (delete y xs) ]
which uses delete from Data.List
It should be lazy enough to find the first combinations quickly; of course, computing all of them will take a while ;)
λ> take 5 $ combs 5 [1..52]
[[1,2,3,4,5],[1,2,3,4,6],[1,2,3,4,7],[1,2,3,4,8],[1,2,3,4,9]]
how does it work
It's one of those recursive combinatorial algorithms: it selects a first card y from all the cards xs, then recursively gets the rest of the hand ys from the deck without the selected card (delete y xs), and then puts them back together as y:ys inside the list monad (here using list comprehensions).
BTW: there are 311,875,200 such hands (every ordering of the same five cards is counted separately) ;)
version without list-comprehensions
here is a version without comprehensions in case your system has issues here:
combs :: Eq a => Int -> [a] -> [[a]]
combs 0 _ = [[]]
combs i xs = do
  y <- xs
  ys <- combs (i-1) (delete y xs)
  return $ y:ys
version that will remove permutations
This one uses Ord to keep the items in ascending order, and in doing so removes duplicates with respect to permutation. For this to work, xs is expected to be pre-sorted!
Note: chi's version works with fewer constraints and might be more performant too, but I thought this one is nice and readable and goes well with the version before, so maybe it's of interest to you.
I know it's not often done in Haskell/FP, where you strive for the most general and abstract cases, but I come from an environment where most strive for readability and understanding (coding for the programmer, not only for the compiler), so be gentle ;)
combs' :: Ord a => Int -> [a] -> [[a]]
combs' 0 _ = [[]]
combs' i xs = [ y:ys | y <- xs, ys <- combs' (i-1) (filter (> y) xs) ]

Splitting list into sublists in Haskell

How can I split a list into two sublists, where the first sublist contains the elements at the beginning of the initial list that are equal to its first element, and the second sublist contains the remaining elements? I have to solve this without using Prelude functions.
My base solution is:
partSameElems :: [a] -> ([a],[a])
partSameElems [] = ([],[])
partSameElems (x:xs) = fstList (x:xs) scdList (x:xs)
where
fstList (x:y:xs) = if x == y then x:y:fstList xs {- I need to do Nothing in else section? -}
scdList (x:xs) = x:scdList xs
For example:
[3,3,3,3,2,1,3,3,6,3] -> ([3,3,3,3], [2,1,3,3,6,3])
Now I can offer my own version of a solution:
partSameElems :: Eq a => [a] -> ([a],[a])
partSameElems [] = ([],[])
partSameElems (x:xs) = (fstList (x:xs) x, scdList (x:xs) x)
  where
    fstList [] _ = []
    fstList (x:xs) el = if x == el then x:fstList xs el else []
    scdList [] _ = []
    scdList (x:xs) el = if x /= el then (x:xs) else scdList xs el
This is easier if you don't try to do it in two passes.
partSameElems [] = ([], [])
partSameElems lst = (reverse revxs, ys)
  where (revxs, ys) = accum [] lst
        accum xs [y] = ((y:xs), [])
        accum xs (y1:y2:ys) | y1 == y2 = accum (y1:xs) (y2:ys)
                            | otherwise = ((y1:xs), (y2:ys))
Not sure you can use guard syntax in where clauses. You will also have to implement reverse yourself since you can't use Prelude, but that's easy.
Note: I haven't actually run this. Make sure you try and debug it.
Also, don't write the type signature yourself. Let ghci tell you. You got it wrong in your first try.
Another implementation can be
partition [] = ([],[])
partition xa@(x:xs) = (f,s)
  where
    f = takeWhile (==x) xa
    s = drop (length f) xa
should be clear what it does.
> partition [3,3,3,3,2,1,3,3,6,3]
([3,3,3,3],[2,1,3,3,6,3])
I assume the "without resorting to Prelude functions" constraint means it's educational, probably aimed at practising recursion, given that it's about manipulating list data. So let's emphasize this.
Recursive algorithms are simpler to express when input and output types are identical.
So instead of a list [3,3,3,3,2,1,3,3,6,3], let's say your input data is composed of
- the front list, which at this stage is empty
- the remainder, which at this stage equals the input [3,3,3,3,2,1,3,3,6,3]
The recursion input is then ([],[3,3,3,3,2,1,3,3,6,3])
The type of the central function will be ([a],[a]) -> ([a],[a])
Now, each recursion step will take the front element of the remainder and either put it in the front list or stop the recursion (you have reached the final state and can return the result).
module SimpleRecursion where
moveInFront :: (Eq a) => ([a],[a]) -> ([a],[a])
moveInFront (xs , [] ) = ( xs , [])
moveInFront ([] , y:ys ) = moveInFront ( y:[] , ys)
moveInFront (x:xs , y:ys ) = if x == y then moveInFront ( y:x:xs , ys)
                                       else (x:xs, y:ys)
partSameElems :: (Eq a) => [a] -> ([a],[a])
partSameElems a = moveInFront ([],a)
What we have here is a classical recursion scheme, with
- stop condition (x /= y)
- recursion clause
- coverage of trivial cases
Notes:
- writing y:x:xs actually reverses the front list, but since all values are equal the result is OK.
  Please don't use that kind of trick in the code of an actual program; it would come back to bite you eventually.
- the function only works on lists of data that can be compared for equality, (Eq a) =>, because the recursion / stop condition is the equality test ==
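For comparison, here is a sketch of a single-pass variant that neither reverses the front list nor calls any library list function. The local helper name go is arbitrary, and this is just one way to write it under the "no Prelude functions" constraint.

```haskell
-- Split off the leading run of elements equal to the first element.
partSameElems :: Eq a => [a] -> ([a], [a])
partSameElems []       = ([], [])
partSameElems (x : xs) = (x : same, rest)
  where
    (same, rest) = go xs
    -- go collects leading elements equal to x, preserving their order;
    -- when the guard fails, matching falls through to the last clause.
    go (y : ys)
      | y == x = let (s, r) = go ys in (y : s, r)
    go ys = ([], ys)
```

Because the front list is built with (:) on the way back from the recursion, the elements keep their original order, so no reverse is needed.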

Haskell - Most frequent value

How can I get the most frequent value in a list? Example:
[1,3,4,5,6,6] -> output 6
[1,3,1,5] -> output 1
I'm trying to do it with my own functions but I can't get it to work. Can you help me?
My code:
del x [] = []
del x (y:ys) = if x /= y
                 then y:del x y
                 else del x ys
obj x []= []
obj x (y:ys) = if x== y then y:obj x y else(obj x ys)
tam [] = 0
tam (x:y) = 1+tam y
fun (n1:[]) (n:[]) [] =n1
fun (n1:[]) (n:[]) (x:s) =if (tam(obj x (x:s)))>n then fun (x:[]) ((tam(obj x (x:s))):[]) (del x (x:s)) else(fun (n1:[]) (n:[]) (del x (x:s)))
rep (x:s) = fun (x:[]) ((tam(obj x (x:s))):[]) (del x (x:s))
Expanding on Satvik's last suggestion, you can use (&&&) :: (b -> c) -> (b -> c') -> (b -> (c, c')) from Control.Arrow (Note that I substituted a = (->) in that type signature for simplicity) to cleanly perform a decorate-sort-undecorate transform.
mostCommon list = fst . maximumBy (compare `on` snd) $ elemCount
  where elemCount = map (head &&& length) . group . sort $ list
The head &&& length function has type [b] -> (b, Int). It converts a list into a tuple of its first element and its length, so when it is combined with group . sort you get a list of each distinct value in the list along with the number of times it occurred.
Also, you should think about what happens when you call mostCommon []. Clearly there is no sensible value, since there is no element at all. As it stands, all the solutions proposed (including mine) just fail on an empty list, which is not good Haskell. The normal thing to do would be to return a Maybe a, where Nothing indicates an error (in this case, an empty list) and Just a represents a "real" return value. e.g.
mostCommon :: Ord a => [a] -> Maybe a
mostCommon [] = Nothing
mostCommon list = Just ... -- your implementation here
This is much nicer, as partial functions (functions that are undefined for some input values) are horrible from a code-safety point of view. You can manipulate Maybe values using pattern matching (matching on Nothing and Just x) and the functions in Data.Maybe (preferably fromMaybe and maybe rather than fromJust).
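To make that concrete, here is a minimal sketch of a total version. It fills in the elided implementation with a group . sort count, which is only one of several possible choices; how ties between equally frequent values are resolved is left unspecified here.

```haskell
import Data.List (group, maximumBy, sort)
import Data.Ord (comparing)

-- Total version: Nothing for an empty list, Just the most frequent
-- element otherwise.
mostCommon :: Ord a => [a] -> Maybe a
mostCommon [] = Nothing
mostCommon xs = Just (fst (maximumBy (comparing snd) counts))
  where
    -- One (value, occurrences) pair per distinct value. Groups made
    -- by group are never empty, so the pattern always matches.
    counts = [ (y, length g) | g@(y : _) <- group (sort xs) ]
```

A caller can then handle the empty case explicitly, for example with maybe "no elements" show (mostCommon xs).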
In case you would like to get some ideas from code that does what you wish to achieve, here is an example:
import Data.List (nub, maximumBy)
import Data.Function (on)
mostCommonElem list = fst $ maximumBy (compare `on` snd) elemCounts where
  elemCounts = nub [(element, count) | element <- list, let count = length (filter (==element) list)]
Here are a few suggestions:
del can be implemented using filter rather than writing your own recursion. Your definition also contained a mistake: you need to pass ys, not y, in the recursive call.
del x = filter (/=x)
obj is similar to del, with a different filter function. Here too, your definition needs to pass ys, not y, in the recursive call.
obj x = filter (==x)
tam is just length function
-- tam = length
You don't need to keep a list for n1 and n. I have also made your code more readable, although I have not made any changes to your algorithm.
fun n1 n [] = n1
fun n1 n xs@(x:s) | length (obj x xs) > n = fun x (length $ obj x xs) (del x xs)
                  | otherwise = fun n1 n $ del x xs
rep xs@(x:s) = fun x (length $ obj x xs) (del x xs)
Another way, not very optimal but much more readable is
import Data.List
import Data.Ord
rep :: Ord a => [a] -> a
rep = head . head . sortBy (flip $ comparing length) . group . sort
I will try to explain in short what this code is doing. You need to find the most frequent element of the list so the first idea that should come to mind is to find frequency of all the elements. Now group is a function which combines adjacent similar elements.
> group [1,2,2,3,3,3,1,2,4]
[[1],[2,2],[3,3,3],[1],[2],[4]]
So I have used sort to bring equal elements next to each other
> sort [1,2,2,3,3,3,1,2,4]
[1,1,2,2,2,3,3,3,4]
> group . sort $ [1,2,2,3,3,3,1,2,4]
[[1,1],[2,2,2],[3,3,3],[4]]
Finding the element with the maximum frequency then reduces to finding the sublist with the largest number of elements. This is where sortBy comes in, which lets you sort with a given comparison function. So basically I have sorted by the length of the sublists (the flip is just there to make the sorting descending rather than ascending).
> sortBy (flip $ comparing length) . group . sort $ [1,2,2,3,3,3,1,2,4]
[[2,2,2],[3,3,3],[1,1],[4]]
Now you can just take head two times to get the element with the largest frequency.
Let's assume you already have an argmax function. You can write your own or, even better, reuse the list-extras package. I strongly suggest you take a look at the package anyway.
Then, it's quite easy:
import Data.List.Extras.Argmax ( argmax )
-- >> mostFrequent [3,1,2,3,2,3]
-- 3
mostFrequent xs = argmax f xs
  where f x = length $ filter (==x) xs
