Combine equivalent items in a list


Let's say I have the following type
type Key = String
type Score = Int
data Thing = Thing Key Score
And I have a list of them like this:
[Thing "a" 7, Thing "b" 5, Thing "a" 10]
Is there a standard way to reduce this so that I don't have any duplicate keys? If two keys match, I want to keep the better (higher) score:
[Thing "b" 5, Thing "a" 10]

A very simple solution is to use Data.Map.fromListWith, which converts a list of key-value pairs to a map, given a function to combine multiple values with the same key.
Prelude Data.Map> fromListWith max [("a", 7), ("b", 5), ("a", 10)]
fromList [("a",10),("b",5)]
Note that this expects tuples, so convert as necessary. Also, it does not preserve the order of the input elements. Run time is O(n log n).
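The "convert as necessary" step could look roughly like the sketch below. It assumes the Thing type from the question; dedupeByKey is a made-up helper name, not a library function.

```haskell
import qualified Data.Map as Map

type Key = String
type Score = Int
data Thing = Thing Key Score deriving (Show, Eq)

-- dedupeByKey is a hypothetical helper name, not part of any library.
-- It keeps the highest score for each key. The result comes back ordered
-- by key, not in the order of the input list.
dedupeByKey :: [Thing] -> [Thing]
dedupeByKey things =
  [ Thing k s | (k, s) <- Map.toList best ]
  where
    -- Unpack each Thing into a (key, score) tuple and combine duplicates with max.
    best = Map.fromListWith max [ (k, s) | Thing k s <- things ]
```

With the list from the question this should give the Things for "a" (score 10) and "b" (score 5), in key order.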

First we should separate the problem itself from the implementation details. What if we sort by Score in descending order, and then keep only the first occurrence of each Key in the sorted list? That should work. Let's look at the Haskell implementation:
import Data.List
import Data.Function
type Key = String
type Score = Int
data Thing = Thing { key :: Key, score :: Score }
  deriving (Show)
myNub = nubBy ((==) `on` key)
mySort = sortBy (compare `on` (negate . score))
selectFinest = myNub . mySort
Now we try running this in GHCi:
Prelude> :load Test.hs
[1 of 1] Compiling Main ( Test.hs, interpreted )
Ok, modules loaded: Main.
*Main> selectFinest [Thing "a" 7, Thing "b" 5, Thing "a" 10]
[Thing {key = "a", score = 10},Thing {key = "b", score = 5}]
Check out Hoogle if you are uncertain about the functions I used in the solution. It does take some time to learn how to use on and similar functions.

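I'm posting an O(n log n) solution, since everyone seems fine with being O(n^2)
import Data.Function (on)
import Data.List (maximumBy, sortBy)
-- Assumed definition: Thing, getKey and getValue were not given with this answer.
data Thing a b = Thing { getKey :: a, getValue :: b }
consolidate :: (Ord a, Ord b) => [Thing a b] -> [Thing a b]
consolidate xs =
    max_from_each_group (sortBy (compare `on` getKey) xs)
  where
    max_from_each_group [] = []
    max_from_each_group (x:xs) =
      -- after sorting, all items with the same key as x are adjacent
      let (same_key, rest) = span (\t -> getKey x == getKey t) xs in
      let group_max = maximumBy (compare `on` getValue) (x:same_key) in
      group_max : max_from_each_group rest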

Here is my feeble attempt. There surely is a nicer way but I'm not
much of a Haskell programmer.
import Data.List
type Key = String
type Score = Int
data Thing = Thing Key Score
  deriving (Show, Ord)
instance Eq Thing where
  (Thing k1 _) == (Thing k2 _) = k1 == k2
  (Thing k1 _) /= (Thing k2 _) = k1 /= k2
thingSort :: [Thing] -> [Thing]
thingSort = Data.List.sortBy (flip compare)
ex = [Thing "a" 7, Thing "b" 5, Thing "a" 10]
filtered = nub (thingSort ex)

Related

Haskell text encoder

I am new to Haskell and would like some direction on solving my problem. I want a text-encoding function that returns a list in which each word of the text is represented by its index. For example, for the input
["The more I like, the more I love.","The more I love, the more I hate."]
the output might be
(["The", "more", "I", "like", "the", "love.", "love,", "hate."],
[1, 2, 3, 4, 5, 2, 3, 6, 1, 2, 3, 7, 1, 2, 3, 8])
I have done the duplicate-removal part:
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
  where rdHelper seen [] = seen
        rdHelper seen (x:xs)
          | x `elem` seen = rdHelper seen xs
          | otherwise = rdHelper (seen ++ [x]) xs

You can just iterate over the list of words and accumulate the unique words and their indexes. If the word is already in the accumulated list of words, append its index to the accumulated list of indexes. If it isn't, append the word and a new index (length of the list of words + 1).
To be honest, the Haskell code is easier to understand than my description:
import Data.List (findIndex)
build :: ([String], [Int]) -> String -> ([String], [Int])
build (words, indexes) word =
  let
    maybeIndex = findIndex (== word) words
  in
    case maybeIndex of
      Just index ->
        (words, indexes ++ [index + 1])
      Nothing ->
        (words ++ [word], indexes ++ [(+1) . length $ words])
buildIndexes =
  let
    listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
  in
    foldl build ([], []) listOfWords
Here I have a concatenated string as an input
"The more I like, the more I love. The more I love, the more I hate."
Feel free to tailor the code for your needs.
By the way, it might be more performant to insert the elements at the beginning of the lists and then reverse the resulting lists.
import Data.List (findIndex)
build :: ([String], [Int]) -> String -> ([String], [Int])
build (words, indexes) word =
  let
    maybeIndex = findIndex (== word) words
  in
    case maybeIndex of
      Just index ->
        -- words is stored newest-first here, so convert the position back to the word's 1-based index
        (words, (length words - index) : indexes)
      Nothing ->
        (word : words, ((+1) . length $ words) : indexes)
buildIndexes =
  let
    listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
    (listOfUniqueWords, listOfIndexes) = foldl build ([], []) listOfWords
  in
    (reverse listOfUniqueWords, reverse listOfIndexes)

I guess the Data.Map and Data.Set packages are ideal tools to solve this job efficiently. My implementation would be as follows;
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set
encode :: [String] -> ([String],[[Int]])
encode wss = let dict = Map.fromList . zip (Set.toList . Set.unions . map (Set.fromList . words) $ wss) $ [1..]
in (map fst $ Map.toList dict, map (map (flip (Map.findWithDefault 0) dict) . words) wss)
*Main> encode ["Are you allright", "Hey there how are you", "Hello there", "Do you like coffee"]
(["Are","Do","Hello","Hey","allright","are","coffee","how","like","there","you"],[[1,11,5],[4,10,8,6,11],[3,10],[2,11,9,7]])
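If the vocabulary should be numbered in order of first appearance (as in the question's sample) rather than in sorted order, one possible variation is sketched below. It threads a Map through the word list with mapAccumL. The name encodeWords and the flat output shape are assumptions for illustration, and punctuation is left attached to words exactly as words splits them.

```haskell
import Data.List (mapAccumL, sortOn)
import qualified Data.Map.Strict as Map

-- encodeWords is a hypothetical name. It numbers each distinct word from 1
-- in order of first appearance and returns the vocabulary together with
-- the index of every word in the input.
encodeWords :: [String] -> ([String], [Int])
encodeWords sentences = (vocabulary, indexes)
  where
    allWords = concatMap words sentences
    -- Thread the dictionary through the word list, emitting one index per word.
    (seen, indexes) = mapAccumL step Map.empty allWords
    step dict w =
      case Map.lookup w dict of
        Just i  -> (dict, i)
        Nothing -> let i = Map.size dict + 1
                   in (Map.insert w i dict, i)
    -- Recover the vocabulary in index order from the final dictionary.
    vocabulary = map fst (sortOn snd (Map.toList seen))
```

Each lookup and insert is logarithmic in the vocabulary size, so this avoids the repeated linear scans of the list-based versions.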

Generating all ways of applying a function to a single element with lens

This question is based on the day 11 Advent of Code task. It is basically a more general version of the river crossing puzzle: you can go up and down floors while carrying one or two items each step. The goal is to bring all items up to the 4th floor.
This is fairly straightforward to solve with an A* search but finding the neighboring states is somewhat annoying.
When solving the puzzle originally I just created masks for all items on the current floor and then used the list monad to generate the combinations - slow and awkward but it works. I figured that there would be an elegant solution using lenses, though.
An easy solution could use a function that returns all options of moving a single item from floor x to floor y. Is there a way to get all combinations of applying a function to a single element using lenses? i.e. f 1 2 [(1, 0), (1, 2)] = [[(2, 0), (1, 2)], [(1, 0), (2, 2)]]
For the sake of reference, this is the best I could come up with so far, slightly simplified:
import Control.Lens
import Data.List (sort)
import Data.Set (fromList, Set)
type GenFloor = Int
type ChipFloor = Int
type State = [(GenFloor, ChipFloor)]
neighborStates :: Int -> State -> Set State
neighborStates currentFloor state = finalize $ createStatesTowards =<< [pred, succ]
  where
    createStatesTowards direction = traverseOf (traverse . both) (moveTowards direction) state
    moveTowards direction i
      | i == currentFloor = [direction i, i]
      | otherwise = [i]
    finalize = fromList . map sort . filter valid
    valid = (&&) <$> validCarry <*> validFloors
    validCarry = (`elem` [1..2]) . carryCount
    carryCount = length . filter (uncurry (/=)) . zip state
    validFloors = allOf (traverse . each) (`elem` [1..4])

An easy solution could use a function that returns all options of moving a single item from floor x to floor y. Is there a way to get all combinations of applying a function to a single element using lenses? i.e. f 1 2 [(1, 0), (1, 2)] = [[(2, 0), (1, 2)], [(1, 0), (2, 2)]]
holesOf can do that. Quoting the relevant simplified signature from the documentation:
holesOf :: Traversal' s a -> s -> [Pretext' (->) a s]
Given a traversal, holesOf will generate a list of contexts focused on each element targeted by the traversal. peeks from Control.Comonad.Store can then be used to, from each context, modify the focused target and recreate the surrounding structure:
import Control.Lens
import Control.Comonad.Store
-- allMoves :: Int -> Int -> State -> [State]
allMoves :: (Traversable t, Eq a) => a -> a -> t (a, b) -> [t (a, b)]
allMoves src dst its = peeks (changeFloor src dst) <$> holesOf traverse its
  where
    -- changeFloor :: Int -> Int -> (Int, Int) -> (Int, Int)
    changeFloor src dst = over both (\x -> if x == src then dst else x)
GHCi> allMoves 1 2 [(1,0),(1,2)]
[[(2,0),(1,2)],[(1,0),(2,2)]]
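For comparison, here is a sketch of the same "apply a function to exactly one element, in every possible position" idea without lens, using only Data.List. This is a plain-list substitute for holesOf, not the lens technique itself, and applyToEach is a made-up name; changeFloor mirrors the helper in the answer above.

```haskell
import Data.List (inits, tails)

-- applyToEach is a hypothetical helper: it returns one copy of the list
-- per element, with the function applied to that element only.
applyToEach :: (a -> a) -> [a] -> [[a]]
applyToEach f xs =
  [ before ++ f x : after | (before, x : after) <- zip (inits xs) (tails xs) ]

-- Same floor-changing rule as changeFloor in the answer above:
-- both components of the pair move from src to dst if they are on src.
changeFloor :: Int -> Int -> (Int, Int) -> (Int, Int)
changeFloor src dst (g, c) = (move g, move c)
  where move x = if x == src then dst else x
```

Unlike holesOf, this only works for lists, and it rebuilds the prefix with (++) for every position.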

Get third element of tuple

I have a tuple (1, 2, 3) and want to get the third element, however, I keep getting type errors.
Please see the below code:
third (hd : tl) = snd tl
third tpl = head$tail$tail tpl
How can I fix the type errors that are occurring and get the third element correctly?

Tuples aren't lists
In your code, you're manipulating lists: (:), head and tail all work on lists. So
third tpl = head . tail . tail $ tpl
third' (_:_:x:_) = x
Will give you the third element.
a = [1, 2, 3]
>> third a
3
>> third (1, 2, 3)
Error expecting list, but got tuple
Instead you're going to have to use a function of type
thd :: (a, b, c) -> c
This function doesn't exist in the standard libs, but it's completely trivial to write:
thd (_, _, a) = a
And that's it :)

You're getting tuples confused with lists:
-- Tuples: Fixed length, mixed types, uses parenthesis
myTuple :: (Int, String)
myTuple = (1, "Hello")
-- Lists: Variable length, single type, uses square brackets
myList :: [Int]
myList = [1, 2, 3, 4]
-- Pattern matching a 3-tuple
isRightTriangle :: (Int, Int, Int) -> Bool
isRightTriangle (a, b, c) = a^2 + b^2 == c^2
-- Pattern matching a 2-tuple
name :: (String, Int) -> String
name (n, a) = n
-- or: name = fst
age :: (String, Int) -> Int
age (n, a) = a
-- or: age = snd
-- Pattern matching a list
thirdElem :: [a] -> a
thirdElem (x1:x2:x3:xs) = x3
thirdElem _ = error "List must have at least 3 elements"
-- or: thirdElem xs = head $ tail $ tail xs
If you haven't already, you should check out Learn You a Haskell For Great Good. It's a great, fun to read introduction to Haskell, starting out with the basic types like strings, tuples, numbers, and lists.

List processing in Haskell

I am teaching myself Haskell and have run into a problem and need help.
Background:
type AInfo = (Char, Int)
type AList = [AInfo] -- let's say [('a', 2), ('b', 5), ('a', 1), ('w', 21)]
type BInfo = Char
type BList = [BInfo] -- let's say ['a', 'a', 'c', 'g', 'a', 'w', 'b']
One quick edit: The above information is for illustrative purposes only. The actual elements of the lists are a bit more complex. Also, the lists are not static; they are dynamic (hence the uses of the IO monad) and I need to keep/pass/"return"/have access to and change the lists during the running of the program.
I am looking to do the following:
For all elements of AList check against all elements of BList and where the character of the AList element (pair) is equal to the character in the Blist add one to the Int value of the AList element (pair) and remove the character from BList.
So what this means is after the first element of AList is checked against all elements of BList the values of the lists should be:
AList [('a', 5), ('b', 5), ('a', 1), ('w', 21)]
BList ['c', 'g', 'w', 'b']
And in the end, the lists' values should be:
AList [('a', 5), ('b', 6), ('a', 1), ('w', 22)]
BList ['c', 'g']
Of course, all of this is happening in an IO monad.
Things I have tried:
Using mapM and a recursive helper function. I have looked at both:
Every element of AList checked against every element of bList -- mapM (myHelpF1 alist) blist and
Every element of BList checked against every element of AList – mapM (myHelpF2 alist) blist
Passing both lists to a function and using a complicated
if/then/else & helper function calls (feels like I am forcing
Haskell to be iterative; Messy convoluted code, Does not feel
right.)
I have thought about using filter, the character value of AList
element and Blist to create a third list of Bool and the count the
number of True values. Update the Int value. Then use filter on
BList to remove the BList elements that …… (again Does not feel
right, not very Haskell-like.)
Things I think I know about the problem:
The solution may be exceedingly trivial. So much so, the more experienced Haskellers will be muttering under their breath “what a noob” as they type their response.
Any pointers would be greatly appreciated. (mutter away….)

A few pointers:
Don't use [(Char, Int)] for "AList". The data structure you are looking for is a finite map: Map Char Int. Particularly look at member and insertWith. toList and fromList convert from the representation you currently have for AList, so even if you are stuck with that representation, you can convert to a Map for this algorithm and convert back at the end. (This will be more efficient than staying in a list because you are doing so many lookups, and the finite map API is easier to work with than lists)
I'd approach the problem as two phases: (1) partition out the elements of blist by whether they are in the map, (2) insertWith the elements which are already in the map. Then you can return the resulting map and the other partition.
I would also get rid of the meaningless assumptions such as that keys are Char -- you can just say they are any type k (for "key") that satisfies the necessary constraints (that you can put it in a Map, which requires that it is Orderable). You do this with lowercase type variables:
import qualified Data.Map as Map
sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
Writing algorithms in greater generality helps catch bugs, because it makes sure that you don't use any assumptions you don't need.
Oh, also this program has no business being in the IO monad. This is pure code.
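The two phases described above could be sketched as follows. This is only one way to fill in the sieveList signature; it assumes each matching element of the list should add 1 to its key's count.

```haskell
import Data.List (foldl', partition)
import qualified Data.Map as Map

-- A possible body for the sieveList signature given above (an assumption,
-- not the answerer's own code).
sieveList :: Ord k => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
sieveList counts bs = (updated, leftovers)
  where
    -- Phase 1: split the list by whether each element is a key of the map.
    (hits, leftovers) = partition (`Map.member` counts) bs
    -- Phase 2: add 1 to the count for every element that was a key.
    updated = foldl' (\m k -> Map.insertWith (+) k 1 m) counts hits
```

Note that converting the question's AList with Map.fromList collapses duplicate keys such as the second ('a', 1) entry, so the original list has to be kept around if those duplicates matter.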

import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
process :: AList -> BList -> AList
process [] _ = []
process (a:as) b = if is_in a b then (fst a,snd a + 1):(process as (delete (fst a) b)) else a:process as b where
  is_in f [] = False
  is_in f (s:ss) = if fst f == s then True else is_in f ss
*Main> process [('a',5),('b',5),('a',1),('b',21)] ['c','b','g','w','b']
[('a',5),('b',6),('a',1),('b',22)]
*Main> process [('a',5),('b',5),('a',1),('w',21)] ['c','g','w','b']
[('a',5),('b',6),('a',1),('w',22)]
Probably an important disclaimer: I'm rusty at Haskell to the point of ineptness, but as a relaxing midnight exercise I wrote this thing. It should do what you want, although it doesn't return a BList. With a bit of modification, you can get it to return an (AList,BList) tuple, but methinks you'd be better off using an imperative language if that kind of manipulation is required.
Alternately, there's an elegant solution and I'm too ignorant of Haskell to know it.

While I am by no means a Haskell expert, I have a partial attempt that returns the result of applying the operation once. Maybe you can find out how to map it over the rest to get your solution. The addWhile function is the clever part: since you only want to update the first occurrence of an element in lista, if the element appears a second time it will just add 0 to it. Code critiques are more than welcome.
import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
lista = ([('a', 2), ('b',5), ('a', 1), ('w', 21)] :: AList)
listb = ['a','a','c','g','a','w','b']
--step one, get the head, and its occurrences
items list = (eleA, eleB) where
  eleA = length $ filter (\x -> x == (head list)) list
  eleB = head list
getRidOfIt list ele = (dropWhile (\x -> x == ele) list) --drop like its hot
--add to lista
addWhile :: [(Char, Int)] -> Char -> Int -> [(Char,Int)]
addWhile [] _ _ = []
addWhile ((x,y):xs) letter times = if x == letter then (x,y+times) : addWhile xs letter times
                                   else (x,y) : addWhile xs letter 0
--first answer
firstAnswer = addWhile lista (snd $ items listb) (fst $ items listb)
--[('a',5),('b',5),('a',1),('w',21)]

The operation you describe is pure, as @luqui points out, so we just define it as a pure Haskell function. It can be used inside a monad (including IO) by means of fmap (or do).
import Data.List
combine alist blist = (reverse a, b4) where
First we sort and count the B list:
    b = map (\g->(head g,length g)) . group . sort $ blist
We need the import for group and sort to be available. Next, we roll along the alist and do our thing:
    (a,b2) = foldl g ([],b) alist
    g (acc,b) e@(x,c) = case pick x b of
                          Nothing -> (e:acc,b)
                          Just (n,b2) -> ((x,c+n):acc,b2)
    b3 = map fst b2
    b4 = [ c | c <- blist, elem c b3 ]
Now pick, as used, must be
pick x [] = Nothing
pick x ((y,n):t)
  | x==y = Just (n,t)
  | otherwise = case pick x t of Nothing -> Nothing
                                 Just (k,r) -> Just (k, (y,n):r)
Of course pick performs a linear search, so if performance (speed) becomes a problem, b should be changed to allow for binary search (tree etc, like Map). The calculation of b4 which is filter (`elem` b3) blist is another potential performance problem with its repeated checks for presence in b3. Again, checking for presence in trees is faster than in lists, in general.
Test run:
> combine [('a', 2), ('b',5), ('a', 1), ('w', 21)] "aacgawb"
([('a',5),('b',6),('a',1),('w',22)],"cg")
edit: you probably want it the other way around, rolling along the blist while updating the alist and producing (or not) the elements of blist in the result (b4 in my code). That way the algorithm will operate in a more local manner on long input streams (assuming your blist is long, though you didn't say that). As written above, it will have a space problem, consuming the input stream blist several times over. I'll keep it as is as an illustration, as food for thought.
So if you decide to go the 2nd route, first convert your alist into a Map (beware the duplicates!). Then, scanning (with scanl) over blist, make use of updateLookupWithKey to update the counts map and at the same time decide for each member of blist, one by one, whether to output it or not. The type of the accumulator will thus have to be (Maybe a, Map a Int), with a your element type (blist :: [a]):
scanl :: (acc -> a -> acc) -> acc -> [a] -> [acc]
scanning = tail $ scanl g (Nothing, fromList $ reverse alist) blist
g (_,cmap) a = case updateLookupWithKey (\_ c->Just(c+1)) a cmap of
    (Just _, m2) -> (Nothing, m2) -- seen before
    _ -> (Just a, cmap) -- not present in counts
new_b_list = [ a | (Just a,_) <- scanning ]
last_counts = snd $ last scanning
You will have to combine the toList last_counts with the original alist if you have to preserve the old duplicates there (why would you?).
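Put together as one self-contained definition, the scanning approach might look like the sketch below. The function name sieveStream and the shape of the returned pair are assumptions; the step function is the g from above with qualified Map names.

```haskell
import qualified Data.Map as Map

-- sieveStream is a hypothetical name for the scanl-based approach above.
-- It returns the elements of blist that were not keys of the map, plus
-- the final counts.
sieveStream :: Ord a => [(a, Int)] -> [a] -> ([a], Map.Map a Int)
sieveStream alist blist = (newB, lastCounts)
  where
    -- reverse so that, for duplicate keys, the first entry of alist wins
    start = (Nothing, Map.fromList (reverse alist))
    steps = scanl step start blist
    step (_, cmap) x =
      case Map.updateLookupWithKey (\_ c -> Just (c + 1)) x cmap of
        (Just _, m2) -> (Nothing, m2)   -- x is a key: count it, do not emit it
        _            -> (Just x, cmap)  -- x is not a key: emit it unchanged
    -- tail drops the initial accumulator, which carries no element
    newB = [x | (Just x, _) <- tail steps]
    lastCounts = snd (last steps)
```

Because the scan is lazy, newB can be consumed as blist is read, while lastCounts is only available once the whole input has been traversed.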

How to group similar items in a list using Haskell?

Given a list of tuples like this:
dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
How to group items of dic resulting in a list grp where,
grp = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]
I'm actually a newcomer to Haskell... and seem to be falling in love with it.
Using group or groupBy in Data.List will only group similar adjacent items in a list.
I wrote an inefficient function for this, but it results in memory failures as I need to process a very large coded string list. Hope you would help me find a more efficient way.

Whenever possible, reuse library code.
import Data.Map
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]
Try it out in ghci:
*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
EDIT In the comments, some folks are worried about whether (++) or flip (++) is the right choice. The documentation doesn't say which way things get associated; you can find out by experimenting, or you can sidestep the whole issue using difference lists:
sortAndGroup assocs = ($[]) <$> fromListWith (.) [(k, (v:)) | (k, v) <- assocs]
-- OR, point-free (this form needs: import qualified Data.Map as M)
sortAndGroup = fmap ($[]) . M.fromListWith (.) . map (fmap (:))
These alternatives are about the same length as the original, but they're a bit less readable to me.
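If the goal is exactly the grp list from the question, with each group's values in sorted order, another way to sidestep the ordering issue is to sort each group after building the map. A minimal sketch, assuming the values have an Ord instance; groupSorted is a made-up name:

```haskell
import Data.List (sort)
import qualified Data.Map as Map

-- groupSorted is a hypothetical name. It groups values by key with
-- fromListWith (++), then sorts each group so the result does not depend
-- on the order in which fromListWith combined the values.
groupSorted :: (Ord k, Ord v) => [(k, v)] -> [(k, [v])]
groupSorted assocs = Map.toList (Map.map sort grouped)
  where
    grouped = Map.fromListWith (++) [ (k, [v]) | (k, v) <- assocs ]
```

Map.toList returns the pairs in ascending key order, so no separate sort on the keys is needed.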

Here's my solution:
import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)
myGroup :: (Eq a, Ord a) => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
          . sortBy (comparing fst)
This works by first sorting the list with sortBy:
[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
then grouping the list elements by the associated key with groupBy:
[(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
and then transforming the grouped items to tuples with map:
[[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]
Testing:
> myGroup dic
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]

You can also use the TransformListComp extension, for example:
Prelude> :set -XTransformListComp
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]

If the list is not sorted on the first element, I don't think you can do better than O(n log n).
One simple way is to sort first and then use any of the approaches for sorted lists from the second part of this answer.
Alternatively, you can use a Map k [a] from Data.Map, with the first element of each tuple as the key, and keep adding to the values.
You can write your own, more complex function, but whatever you try will still take O(n log n).
If the list is already sorted on the first element, then the task is trivial with something like groupBy, as in the answer by @Mikhail, or with foldr; there are numerous other ways.
An example using foldr:
grp :: Eq a => [(a,b)] -> [(a,[b])]
grp = foldr f []
  where
    f (z,s) [] = [(z,[s])]
    f (z,s) a@((x,y):xs) | x == z = (x,s:y):xs
                         | otherwise = (z,[s]):a

{-# LANGUAGE TransformListComp #-}
import GHC.Exts
import Data.List
import Data.Function (on)
process :: [(Integer, String)] -> [(Integer, [String])]
process list = [(the a, b) | let info = [ (x, y) | (x, y) <- list, then sortWith by y ], (a, b) <- info, then group by a using groupWith]
