How to group similar items in a list using Haskell? - haskell

Given a list of tuples like this:
dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
How to group items of dic resulting in a list grp where,
grp = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]
I'm actually a newcomer to Haskell...and seems to be falling in love with it..
Using group or groupBy in Data.List will only group similar adjacent items in a list.
I wrote an inefficient function for this, but it results in memory failures as I need to process a very large coded string list. Hope you would help me find a more efficient way.

Whenever possible, reuse library code.
import Data.Map
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]
Try it out in ghci:
*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
EDIT In the comments, some folks are worried about whether (++) or flip (++) is the right choice. The documentation doesn't say which way things get associated; you can find out by experimenting, or you can sidestep the whole issue using difference lists:
sortAndGroup assocs = ($[]) <$> fromListWith (.) [(k, (v:)) | (k, v) <- assocs]
-- OR
sortAndGroup = fmap ($[]) . M.fromListWith (.) . map (fmap (:))
These alternatives are about the same length as the original, but they're a bit less readable to me.

Here's my solution:
import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)
myGroup :: (Eq a, Ord a) => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
. sortBy (comparing fst)
This works by first sorting the list with sortBy:
[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
then grouping the list elements by the associated key with groupBy:
[(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
and then transforming the grouped items to tuples with map:
[[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]`)
Testing:
> myGroup dic
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]

Also you can use TransformListComp extension, for example:
Prelude> :set -XTransformListComp
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]

If the list is not sorted on the first element, I don't think you can do better than O(nlog(n)).
One simple way would be to just sort and then use anything from the answer of second part.
You can use from Data.Map a map like Map k [a] to use first element of tuple as key and keep on adding to the values.
You can write your own complex function, which even after you all the attempts will still take O(nlog(n)).
If list is sorted on the first element as is the case in your example, then the task is trivial for something like groupBy as given in the answer by #Mikhail or use foldr and there are numerous other ways.
An example of using foldr is here:
grp :: Eq a => [(a,b)] -> [(a,[b])]
grp = foldr f []
where
f (z,s) [] = [(z,[s])]
f (z,s) a#((x,y):xs) | x == z = (x,s:y):xs
| otherwise = (z,[s]):a

{-# LANGUAGE TransformListComp #-}
import GHC.Exts
import Data.List
import Data.Function (on)
process :: [(Integer, String)] -> [(Integer, [String])]
process list = [(the a, b) | let info = [ (x, y) | (x, y) <- list, then sortWith by y ], (a, b) <- info, then group by a using groupWith]

Related

Haskell - Sorting sublist

I have a list with some sublists and I need to order it by length.
For example:
[[1,3,4,9],[2,4],[5,4,7]] would become [[2,4],[5,4,7],[1,3,4,9]].
You can make use of sortOn :: Ord b => (a -> b) -> [a] -> [a] to sort items in a list based on the result of a function called on the elements, for example:
Prelude> import Data.List(sortOn)
Prelude Data.List> sortOn length [[1,3,4,9],[2,4],[5,4,7]]
[[2,4],[5,4,7],[1,3,4,9]]
You can use comparing to generate a custom comparison function with the length function, the sort the list with the builtin sortBy function:
import Data.List
import Data.Ord
x :: [[Int]]
x = sortBy (comparing length) [[1,3,4,9],[2,4],[5,4,7]]
-- x == [[2,4],[5,4,7],[1,3,4,9]]

Finding elements and their indices in a list

I need to have both the elements of a list satisfying a predicate and the indices of these elements. I can achieve this as follows:
import Data.List (findIndices)
list :: [Int]
list = [3,2,4,1,9]
indices = findIndices (>2) list
elems = [list!!i | i <- indices]
-- same as: elems = filter (>2) list
Isn't there a package providing a function giving both the elements and their indices in "one shot" ? I'm surprised I don't find this function somewhere. Otherwise, how to do such a function, improving my above code ? I don't believe this code is optimal since it somehow accesses twice to the elements of the list. I took a quick look at the source code of findIndices but I don't understand it yet.
You can make it more efficient – avoid the !! access – by filtering a list of (index, element) tuples.
let (indices, elems) = unzip [(i, x) | (i, x) <- zip [0..] list, x > 2]
Split into an appropriate function:
findItems :: (a -> Bool) -> [a] -> [(Int, a)]
findItems predicate = filter (predicate . snd) . zip [0..]
let (indices, elems) = unzip $ findItems (>2) list
There might be a more straightforward way, and I’ll be happy to find out about it :)
I think Ry's suggestion is just fine. For a more direct, and in particular more generic one, you could use lens tooling:
Prelude> import Control.Lens as L
Prelude L> import Control.Arrow as A
Prelude L A> ifoldr (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) [3,2,4,1,9]
([0,2,4],[3,4,9])
This can immediately be used also on arrays (where the index extraction is much more useful)
Prelude L A> import qualified Data.Vector as V
Prelude L A V> ifoldr (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) $ V.fromList [3,2,4,1,9]
([0,2,4],[3,4,9])
...even on unboxed ones, though these aren't Foldable:
Prelude L A V> import qualified Data.Vector.Unboxed as VU
Prelude L A V VU> import Data.Vector.Generic.Lens as V
ifoldrOf vectorTraverse (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) $ VU.fromList [3,2,4,1,9]
([0,2,4],[3.0,4.0,9.0])
(indices, elems) = unzip [ item | item <- zip [0..] ls, (snd item) > 2 ]
Not sure that it's any more efficient, but it gets it done in "one shot".

Generating all ways of applying a function to a single element with lens

This question is based on to the 11th advent of code task. It basically is a more general version of the river crossing puzzle, you can go up and down floors while carrying one or two items each step. The goal is to bring up all items to the 4th floor.
This is fairly straightforward to solve with an A* search but finding the neighboring states is somewhat annoying.
When solving the puzzle originally I just created masks for all items on the current floor and then used the list monad to generate the combinations - slow and awkward but it works. I figured that there would be an elegant solution using lenses, though.
An easy solution could use a function that returns all options of moving a single item from floor x to floor y. Is there a way to get all combinations of applying a function to a single element using lenses? i.e. f 1 2 [(1, 0), (1, 2)] = [[(2, 0) (1, 2)], [(1, 0), (2, 2)]]
For the sake of reference, this is the best I could come up with so far, slightly simplified:
import Control.Lens
import Data.List (sort)
import Data.Set (fromList, Set)
type GenFloor = Int
type ChipFloor = Int
type State = [(GenFloor, ChipFloor)]
neighborStates :: Int -> State -> Set State
neighborStates currentFloor state = finalize $ createStatesTowards =<< [pred, succ]
where
createStatesTowards direction = traverseOf (traverse . both) (moveTowards direction) state
moveTowards direction i
| i == currentFloor = [direction i, i]
| otherwise = [i]
finalize = fromList . map sort . filter valid
valid = (&&) <$> validCarry <*> validFloors
validCarry = (`elem` [1..2]) . carryCount
carryCount = length . filter (uncurry (/=)) . zip state
validFloors = allOf (traverse . each) (`elem` [1..4])
An easy solution could use a function that returns all options of moving a single item from floor x to floor y. Is there a way to get all combinations of applying a function to a single element using lenses? i.e. f 1 2 [(1, 0), (1, 2)] = [[(2, 0) (1, 2)], [(1, 0), (2, 2)]]
holesOf can do that. Quoting the relevant simplified signature from the documentation:
holesOf :: Traversal' s a -> s -> [Pretext' (->) a s]
Given a traversal, holesOf will generate a list of contexts focused on each element targeted by the traversal. peeks from Control.Comonad.Store can then be used to, from each context, modify the focused target and recreate the surrounding structure:
import Control.Lens
import Control.Comonad.Store
-- allMoves :: Int -> Int -> State -> [State]
allMoves :: (Traversable t, Eq a) => a -> a -> t (a, b) -> [t (a, b)]
allMoves src dst its = peeks (changeFloor src dst) <$> holesOf traverse its
where
-- changeFloor :: Int -> Int -> (Int, Int) -> (Int, Int)
changeFloor src dst = over both (\x -> if x == src then dst else x)
GHCi> allMoves 1 2 [(1,0),(1,2)]
[[(2,0),(1,2)],[(1,0),(2,2)]]

How can I efficiently filter a pair of lists in Haskell?

I'm working on a simple problem on Programming Praxis: remove all duplicates from a list without changing the order. Assuming the elements are in class Ord, I came up with the following:
import Data.Set (Set)
import qualified Data.Set as Set
buildsets::Ord a => [a] -> [Set a]
buildsets = scanl (flip Set.insert) Set.empty
nub2::Ord a => [a] -> [a]
nub2 thelist = map fst $ filter (not . uncurry Set.member) (zip thelist (buildsets thelist))
As you can see, the buildsets function gets me most of the way there, but that last step (nub2) of putting everything together looks absolutely horrible. Is there a cleaner way to accomplish this?
Since we have to filter the list and we should probably use some set to keep records, we might as well use filterM with the state monad:
import qualified Data.Set as S
import Control.Monad.State.Strict
nub2 :: Ord a => [a] -> [a]
nub2 = (`evalState` S.empty) . filterM go where
go x = state $ \s -> if S.member x s
then (False, s)
else (True, S.insert x s)
If I wanted to somewhat golf the function, I'd to the following:
import Control.Arrow (&&&)
nub2 = (`evalState` S.empty) . filterM (\x -> state (S.notMember x &&& S.insert x))
Simple recursion looks ok to me.
> g xs = go xs S.empty where
> go [] _ = []
> go (x:xs) a | S.member x a = go xs a
> | otherwise = x:go xs (S.insert x a)
Based directly on Sassa NF's suggestion, but with a slight type change for cleanliness:
g x = catMaybes $ unfoldr go (Set.empty, x)
where
go (_,[]) = Nothing
go (s,(x:xs)) = Just (if Set.member x s then Nothing else Just x,
(Set.insert x s, xs))
Sometimes it really cleans up code to pull out and name subpieces. (In some ways this really is the Haskell way to comment code)
This is wordier that what you did above, but I think it is much easier to understand....
First I start with some definitions:
type Info=([Int], S.Set Int) --This is the remaining and seen items at a point in the list
item=head . fst --The current item
rest=fst --Future items
seen=snd --The items already seen
Then I add two self descriptive helper functions:
itemHasBeenSeen::Info->Bool
itemHasBeenSeen info = item info `S.member` seen info
moveItemToSet::Info->Info
moveItemToSet info = (tail $ rest info, item info `S.insert` seen info)
With this the program becomes:
nub2::[Int]->[Int]
nub2 theList =
map item
$ filter (not . itemHasBeenSeen)
$ takeWhile (not . null . rest)
$ iterate moveItemToSet start
where start = (theList, S.empty)
Reading from bottom to top (just as the data flows), you can easily see what it happening:
start=(theList, S.empty), start with the full list, and an empty set.
iterate moveItemToSet start, repeatedly move the first item of the list into the set, saving each iteration of Info in an array.
takeWhile (not . null . rest)- Stop the iteration when you run out of elements.
filter (not . itemHasBeenSeen)- Remove items that have already been seen.
map item- Throw away the helper values....

Getting Unique Paths from list of tuple

Given a tuple of lists, I need to find all unique path from that:
Example I/P: [(1,2),(2,3),(3,4),(9,11),(4,5),(5,6),(6,7),(3,9)]
O/P: [[(1,2),(2,3),(3,4),(4,5),(5,6),(6,7)],[(1,2),(2,3),(3,9),(9,11)]]
Two tuples can connect if the second element of the tuple matches with the first element of the other tuple i.e: One tuple is (_,a) and other tuple is like (a,_).
What is the most efficient implementation for this ? I need to find the best data structure suited for it. Any suggestions ? The number of tuples in which I will execute the algorithm will be like more than 400,000.
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.List (permutations, nub)
path :: Eq a => [(a, a)] -> [(a, a)]
path [] = []
path [x] = [x]
path (u#(_, a):v#(b, _):xs) = if a == b then u:path (v:xs) else [u]
allPaths = nub . map path . permutations
(you can optimize chain generation but I think this problem has exponential time complexity)
EDITED
In general, you must to define more preciselly what paths you want to return.
Ignoring cycle invariant ([(1,2),(2,3),(3,1)] == [(2,3),(3,1),(1,3)]) you can generate all paths (without using permutations)
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.List (permutations, nub, sortBy, isInfixOf)
data Tree a = Node a [Tree a] deriving Show
treeFromList :: Eq a => a -> [(a, a)] -> Tree a
treeFromList a [] = Node a []
treeFromList a xs = Node a $ map subTree $ filter ((a==).fst) xs
where subTree v#(_, b) = treeFromList b $ filter (v/=) xs
treesFromList :: Eq a => [(a, a)] -> [Tree a]
treesFromList xs = map (flip treeFromList xs) $ nub $ map fst xs ++ map snd xs
treeToList :: Tree a -> [[a]]
treeToList (Node a []) = [[a]]
treeToList (Node a xs) = [a:ws | ws <- concatMap treeToList xs]
treesToList :: [Tree a] -> [[a]]
treesToList = concatMap treeToList
uniqTrees :: Eq a => [[a]] -> [[a]]
uniqTrees = f . reverse . sortBy ((.length).compare.length)
where f [] = []
f (x:xs) = x: filter (not.flip isInfixOf x) (f xs)
allPaths = uniqTrees . treesToList . treesFromList
then
*Main> allPaths [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 1)]
[[2,4,1,2,3,4],[2,3,4,1,2,4],[1,3,4,1,2,4],[1,3,4,1,2,3],[1,2,4,1,3,4],[1,2,3,4,1,3]]
uniqTrees has poor efficiency and, in general, you can do many optimizations.
If you want to avoid cycle invariant, you can normalize a cycle selecting minimum base10 representation, in previous example ([(1,2),(2,3),(3,1)] == [(2,3),(3,1),(1,3)]) 1231 < 2313 then
normalize [(2,3),(3,1),(1,3)] == [(1,2),(2,3),(3,1)]
you can normalize a path rotating it n-times and taking "head . sortBy toBase10 . rotations".
I think your problem fits on the NP category since:
A Hamiltonian path, also called a Hamilton path, is a path between two
vertices of a graph that visits each vertex exactly once.
In general, the problem of finding a Hamiltonian path is NP-complete
(Garey and Johnson 1983, pp. 199-200), so the only known way to
determine whether a given general graph has a Hamiltonian path is to
undertake an exhaustive search (source)
You problem is even "harder" since you don't know before hand what will be the end node.
In terms of data structure you can try to simulate the hash table structure in Haskell, since this data type is commonly use in graph and you problem can be turn into a graph.

Resources