How can this Haskell combination algorithm be improved?


How can the solution, presented below, to the following problem be improved? Can it be more efficient in time and space? Is there a space leak?
Problem:
Given an input list of Astronauts, produce a list of pairs of Astronauts such that the pairs do not have two Astronauts from the same country. Assume the input has a list of Astronauts with unique identifiers.
data Astronaut = Astronaut { identifier :: Int, country :: String } deriving (Eq)
astronautPairs :: [Astronaut] -> [(Astronaut, Astronaut)]
astronautPairs xs = foldl accumulatePairs [] [(a, b) | a <- xs, b <- xs, country a /= country b]
  where
    accumulatePairs pairs pair = if hasPair pair pairs then pairs else pair:pairs
    hasPair pair@(a,b) ((c,d):xs) = a == d && b == c || hasPair pair xs
    hasPair _ [] = False

Instead of eliminating the flipped pairs, why not avoid generating them in the first place? After all, it is our own code that produces them.
import Data.List (tails)
astronautPairs :: [Astronaut] -> [(Astronaut, Astronaut)]
astronautPairs xs = [ (y,z) | (y:ys) <- tails xs, z <- .... , country y /= country .... ]
This assumes that the input list of astronauts has no duplicates.
Thus, we avoid duplicates by a triangular generation.
(I leave a part of code out, for you to complete).

Let's step back from implementation details and think about what you're trying to accomplish:
produce a list of pairs of Astronauts such that the pairs do not have two Astronauts from the same country
You seem to be allowed to assume that each astronaut appears only once in the list.
One efficient way to approach this problem is to start by partitioning your list by country. A natural way to do this is to build a HashMap String [Int] that holds a list of all the astronauts from each country.
import Data.List (foldl')
import qualified Data.HashMap.Strict as HMS
import Data.HashMap.Strict (HashMap)
divideAstronauts :: [Astronaut] -> HashMap String [Int]
divideAstronauts = foldl' go mempty where
  go hm (Astronaut ident cntry) = HMS.insertWith (++) cntry [ident] hm
Now you can divide up the rest of the program into two steps:
Choose all pairs of countries.
For every pair of countries, choose all pairs of astronauts such that each comes from one of those countries.
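The two steps above could be sketched roughly as follows. This is only a sketch under a few assumptions: it uses Data.Map.Strict from containers in place of the HashMap above (so it needs nothing beyond the libraries bundled with GHC), it returns pairs of identifiers rather than Astronaut values, and crossCountryPairs is a made-up name.

```haskell
import Data.List (foldl', tails)
import qualified Data.Map.Strict as Map

data Astronaut = Astronaut { identifier :: Int, country :: String }
  deriving (Eq, Show)

-- Group astronaut identifiers by country (Map instead of HashMap: an assumption).
divideAstronauts :: [Astronaut] -> Map.Map String [Int]
divideAstronauts = foldl' go Map.empty
  where
    go m (Astronaut ident cntry) = Map.insertWith (++) cntry [ident] m

-- Step 1: walk over every unordered pair of distinct countries (via tails).
-- Step 2: for each such pair, emit every cross pair of identifiers.
crossCountryPairs :: [Astronaut] -> [(Int, Int)]
crossCountryPairs xs =
  [ (a, b)
  | (here : others) <- tails (Map.elems (divideAstronauts xs))
  , there <- others
  , a <- here
  , b <- there
  ]
```

Because each pair of countries is visited once, no flipped duplicates are produced and no same-country pair is ever generated, so there is nothing to filter out afterwards.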

Related

Combine list of lists with named indices in Map-like structure

I have a program with two data structures I wish to combine. The use of Data.Map here is incidental because I'm using it elsewhere for a related purpose. If a solution never uses Data.Map, that's fine (probably better). I've simplified the problem to the below script that has all the essential elements.
My actual program is in a different domain, but in the analogy various "interviewers" are assigned to interview all the people in given households (named by index position of the "house"). I would like to determine which interviewers will need to conduct multiple interviews.
If an interviewer is assigned multiple households, she automatically must interview multiple people (in the scenario, all households are occupied). However, if she is assigned only one household, she might also need to interview the several people there.
The initial wrong approach I found (misled by my wrong assumption about the domain) produces the result below. However, I'm having trouble formulating the correct solution. For my purpose, the order in which the interviews occur in the result is not important.
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
-- Create Map from list of pairs, making dup vals into a list
fromListWithDuplicates :: Ord k => [(k, v)] -> Map k [v]
fromListWithDuplicates pairs =
    Map.fromListWith (++) [(k, [v]) | (k, v) <- pairs]
data Person = Person {
    name :: String
    } deriving (Show, Eq)
households = [[Person "Alice", Person "Bob"],
              [Person "Carlos"],
              [Person "Dabir", Person "Eashan"],
              [Person "Fatima"] ]
interviewers = [("Agent1", [0]), ("Agent2", [1,2]), ("Agent3", [3])]
multiInterviewsWRONG households interviewers =
    let assignments = [(agent, name person) |
                       (agent, houseIndices) <- interviewers,
                       index <- houseIndices,
                       person <- (households !! index),
                       length houseIndices > 1 ]
    in Map.assocs $ fromListWithDuplicates assignments
main :: IO ()
main = do
    -- Prints: [("Agent2", ["Eashan","Dabir","Carlos"])]
    putStrLn $ show (multiInterviewsWRONG households interviewers)
    -- Correct: [("Agent2", ["Eashan","Dabir","Carlos"]),
    --           ("Agent1", ["Alice","Bob"])]
Followup: this solution is just Willem Van Onsem's below, but putting it in one place:
import Util (lengthExceeds)
multiInterviews households interviewers =
    let assignments = [(agent, name person) |
                       (agent, houseIndices) <- interviewers,
                       index <- houseIndices,
                       person <- (households !! index) ]
    in filter (flip lengthExceeds 1 . snd)
              (Map.assocs $ fromListWithDuplicates assignments)

Obviously Willem's answer is great, but I think it can't hurt to also offer one without a list comprehension:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
transformSnd :: (b -> c) -> (a, b) -> (a, c)
transformSnd fun (f, s) = (f, fun s)
-- or transformSnd = second (from Control.Arrow; h/t Willem)
-- or transformSnd = fmap (from (,)'s Functor instance; h/t Will Ness)
-- or transformSnd = second (from Data.Bifunctor)
mult :: [(String, [String])]
mult = filter (atLeastTwo . snd) . map (transformSnd toInterviewees) $ interviewers
  where toInterviewees = map name . concatMap (households !!)
-- mult == [("Agent1",["Alice","Bob"]),("Agent2",["Carlos","Dabir","Eashan"])]
I'm reasonably sure the two versions run equally fast; which one is more readable depends on who's doing the reading.
There are a couple of functional differences. First, with Willem's answer, you get a map, while with this one you get a list (but the difference is mainly cosmetic, and you said you didn't care much).
Second, the two versions behave differently if two pairs in the interviewers list have the same first element. Doing it Willem's way will do what you probably want, i.e. treat them as one pair with a longer second element; doing it this way will give you two pairs in the result list which have the same first element.
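To make the second difference concrete, here is a small sketch. It assumes a simplified, hypothetical assignment list (agents already mapped to names) in which Agent1 appears twice; viaMap and viaList are illustrative names, not part of either answer.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical input: "Agent1" occurs in two separate entries.
assignments :: [(String, [String])]
assignments =
  [ ("Agent1", ["Alice"])
  , ("Agent2", ["Carlos", "Dabir"])
  , ("Agent1", ["Bob"])
  ]

atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _       = False

-- Map-based: entries for the same agent are merged before filtering.
viaMap :: [(String, [String])]
viaMap = filter (atLeastTwo . snd) (Map.assocs (Map.fromListWith (++) assignments))

-- List-based: every entry is judged on its own, so nothing is merged.
viaList :: [(String, [String])]
viaList = filter (atLeastTwo . snd) assignments
```

With this input, viaMap keeps Agent1 (the two single-person entries are merged into one two-person entry), while viaList drops both Agent1 entries because each one on its own has only one interviewee.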
Also, you probably know this, but: if you find yourself combining lists a lot, you might want sets instead.

You should remove the length houseIndices > 1 constraint, since it retains only agents who are assigned two or more households. The list comprehension should thus be:
multiInterviews households interviewers =
    let assignments = [
            (agent, name person) |
            (agent, houseIndices) <- interviewers,
            index <- houseIndices,
            person <- households !! index
          ]
    -- …
The given list comprehension will produce a list that looks like:
Prelude> :{
Prelude| [
Prelude| (agent, name person) |
Prelude| (agent, houseIndices) <- interviewers,
Prelude| index <- houseIndices,
Prelude| person <- households !! index
Prelude| ]
Prelude| :}
[("Agent1","Alice"),("Agent1","Bob"),("Agent2","Carlos"),("Agent2","Dabir"),("Agent2","Eashan"),("Agent3","Fatima")]
However, we still need to filter: we keep only the assocs whose lists contain at least two items. We can implement an efficient function that checks whether a list has at least two items:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
and apply this filter to the assocs of the Map:
multiInterviews households interviewers =
    let assignments = …
    in filter (atLeastTwo . snd) (Map.assocs (fromListWithDuplicates assignments))

Comparing indexes of a list of tuples to a list of integers

I am very new to functional programming and haskell.
Basically, I have formed a list of tuples where they are ordered and indexed, like so
listOfCharTuples = [(1,'H'),(2,'a'),(3,'s'),(4,'k'),(5,'e'),(6,'l'),(7,'l')]
and as an input I will be given a list of ordered integers that will be at maximum, the number of tuples in this list.
I want a way to make a function that prints out the respective characters given the input of the list of integers, so if the list given was, for example [1,3,5], I would want the function to print "Hse".
I may be given any list of input characters, the only progress I've made so far is to make them into a list of indexed tuples, and I am really struggling to solve this

Here is a possible definition, using a list comprehension:
f :: [Int] -> [(Int,Char)] -> String
f is ts = [c | (i,c) <- ts, i `elem` is]
and then
f [1,3,5] listOfCharTuples

As hinted in the comment, a possible solution that is more general (does not require orderedness of the input) can make use of the lookup function:
import Data.Maybe (fromMaybe)
lookupAll :: [(Int,Char)] -> [Int] -> String
lookupAll tuples = map (\x -> fromMaybe '?' $ lookup x tuples)
Examples:
putStrLn $ lookupAll listOfCharTuples [1,3,5] -- Hse
putStrLn $ lookupAll listOfCharTuples [5,3,1] -- esH

Grouping by function value into Multimap

Assuming I have a list of values like this:
["abc","abd","aab","123"]
I want to group those values into a MultiMap (conceptually, not limited to a specific data structure) in Haskell by using a function that maps any element to a key.
For this example, we shall use take 2 as a mapper.
The result I intend to get is (conceptually, as JSON):
{"ab":["abc","abd"], "aa":["aab"], "12":["123"]}
In this example I will use [(String, [String])] as a Multimap data structure.
My basic idea (conceptually):
let datalist = ["abc","abd","aab","123"]
let mapfn = take 2
let keys = nub $ map mapfn datalist
let valuesForKey key = filter ((==key).mapfn) datalist
let resultMultimap = zip keys $ map valuesForKey keys
My question:
Is there any better way (in base or external packages) to do this? I want to avoid custom code.
If 1) is not applicable, is there any guarantee that GHC will optimize this so one pass over the data list is sufficient to generate the full multimap (as opposed to one filter run per key)?
Conceptually, this question is similar to the SQL GROUP BY statement.

Using fromListWith from Data.Map:
> let xs = ["abc","abd","aab","123"]
> let f = take 2
> Data.Map.fromListWith (++) [(f x, [x]) | x <- xs]
fromList [("12",["123"]),("aa",["aab"]),("ab",["abd","abc"])]
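Note that the values for "ab" come out in reverse insertion order, because fromListWith (++) prepends each new value to the ones already stored. If the original order matters, one possible variation (a sketch; groupInOrder is a made-up name) folds from the right instead. It still makes a single traversal of the list with one map insertion per element, rather than one filter run per key.

```haskell
import qualified Data.Map as Map

-- Group values by key, keeping the values of each key in their original order.
-- foldr visits the last element first, so prepending restores the input order.
groupInOrder :: Ord k => (a -> k) -> [a] -> Map.Map k [a]
groupInOrder f = foldr (\x -> Map.insertWith (++) (f x) [x]) Map.empty
```

For the example list, groupInOrder (take 2) ["abc","abd","aab","123"] yields the key "ab" mapped to ["abc","abd"].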

Edit 2014-03-28: My functions have now been published on Hackage, see group-with
Pull requests are welcome!
Based on hammar's excellent answer I put together two reusable functions to solve this problem.
groupWith solves exactly what I asked for. groupWithMulti generalizes the concept by allowing the identifier-generating function (e.g. take 2 in my example) to return multiple identifiers for a single value (where the value is, in my example, one of ["abc","abd","aab","123"]), or none at all.
The value will be added to the Map value for any identifier generated by f.
import Data.Map (Map)
import qualified Data.Map as Map
-- | Group values in a list by their identifier, being returned
-- by a given function. The resulting map contains,
-- for each generated identifier the values (from the original list)
-- that yielded said identifier by using the function
groupWith :: (Ord b) => (a -> b) -> [a] -> (Map b [a])
groupWith f xs = Map.fromListWith (++) [(f x, [x]) | x <- xs]
-- | Like groupWith, but the identifier-generating function
-- may generate multiple outputs (or even none).
-- The corresponding value from the original list will be placed
-- in the identifier-corresponding map entry for each generated
-- identifier
groupWithMulti :: (Ord b) => (a -> [b]) -> [a] -> (Map b [a])
groupWithMulti f xs =
let identifiers x = [(val, [x]) | val <- f x]
in Map.fromListWith (++) $ concat [identifiers x | x <- xs]
Simply use Map.toList to convert the results of these functions back to a tuple list.
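As a usage sketch for groupWithMulti (the definition is repeated so the snippet stands alone; the prefixes helper is a hypothetical example, not part of the package), each string below is filed under both its one-character and its two-character prefix:

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

groupWithMulti :: (Ord b) => (a -> [b]) -> [a] -> Map b [a]
groupWithMulti f xs =
  let identifiers x = [(val, [x]) | val <- f x]
  in Map.fromListWith (++) $ concat [identifiers x | x <- xs]

-- Hypothetical identifier function: two keys per value.
prefixes :: String -> [String]
prefixes s = [take 1 s, take 2 s]

byPrefix :: Map String [String]
byPrefix = groupWithMulti prefixes ["abc", "abd", "aab", "123"]
```

Here Map.keys byPrefix is ["1","12","a","aa","ab"], and the entry for "a" holds all three strings starting with 'a'. A function that returns the empty list for some value simply leaves that value out of the map.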
When I have some spare time, I will attempt to create a generalized library on Hackage out of this approach on in-memory data grouping.

Dynamic List Comprehension in Haskell

Suppose I have a list comprehension that returns a list of sequences, where the elements chosen depend on each other (see example below). Is there a way to (conveniently) program the number of elements and their associated conditions based on an earlier computation? For example, return type [[a,b,c]] or [[a,b,c,d,e]] depending on another value in the program? Also, are there other/better ways than a list comprehension to formulate the same idea?
(I thought possible, although cumbersome and limited, to write out a larger list comprehension to start with and trim it by adding to s a parameter and helper functions that could make one or more of the elements a value that could easily be filtered later, and the associated conditions True by default.)
s = [[a, b, c, d] | a <- list, someCondition a,
                    b <- list, b /= a, not (someCondition b),
                    otherCondition a b,
                    c <- list, c /= a, c /= b, not (someCondition c),
                    otherCondition b c,
                    d <- list, d /= a, d /= b, d /= c,
                    someCondition d, someCondition (last d),
                    otherCondition c d]

The question is incredibly difficult to understand.
Is there a way to (conveniently) program the number of elements and their associated conditions based on an earlier computation?
The problem is "program" is not really an understandable verb in this sentence, because a human programs a computer, or programs a VCR, but you can't "program a number". So I don't understand what you are trying to say here.
But I can give you code review, and maybe through code review I can understand the question you are asking.
Unsolicited code review
It sounds like you are trying to solve a maze by eliminating dead ends, maybe.
What your code actually does is:
Generate a list of cells that are not dead ends or adjacent to dead ends, called filtered
Generate a sequence of adjacent cells from step 1, sequences
Concatenate four such adjacent sequences into a route.
Major problem: this only works if a correct route is exactly eight tiles long! Try to solve this maze:
[E]-[ ]-[ ]-[ ]
|
[ ]-[ ]-[ ]-[ ]
|
[ ]-[ ]-[ ]-[ ]
|
[ ]-[ ]-[ ]-[ ]
|
[ ]-[ ]-[ ]-[E]
So, working backwards from the code review, it sounds like your question is:
How do I generate a list if I don't know how long it is beforehand?
Solutions
You can solve a maze with a search (DFS, BFS, A*).
import Control.Monad
-- | Maze cells are identified by integers
type Cell = Int
-- | A maze is a map from cells to adjacent cells
type Maze = Cell -> [Cell]
maze :: Maze
maze = ([[1],     [0,2,5],     [1,3],   [2],
         [5],     [4,6,1,9],   [5,7],   [6,11],
         [12],    [5,13],      [9],     [7,15],
         [8,16],  [14,9,17],   [13,15], [14,11],
         [12,17], [13,16,18],  [17,19], [18]] !!)
-- | Find paths from the given start to the end
solve :: Maze -> Cell -> Cell -> [[Cell]]
solve maze start = solve' [] where
  solve' path end =
    let path' = end : path
    in if start == end
       then return path'
       else do neighbor <- maze end
               guard (neighbor `notElem` path)
               solve' path' neighbor
The function solve works by depth-first search. Rather than putting everything in a single list comprehension, it works recursively.
In order to find a path from start to end, if start /= end,
Look at all cells adjacent to the end, neighbor <- maze end,
Make sure that we're not backtracking over a cell, guard (neighbor `notElem` path),
Try to find a path from start to neighbor.
Don't try to understand the whole function at once, just understand the bit about recursion.
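Since BFS is mentioned as an alternative, here is a minimal breadth-first sketch for comparison. It is not the code from this answer: smallMaze is a made-up five-cell maze, shortestPath is a hypothetical name, and the visited set uses Data.Set from containers. Unlike solve, it returns only one shortest path, if any.

```haskell
import qualified Data.Set as Set

type Cell = Int
type Maze = Cell -> [Cell]

-- Hypothetical maze: 0 connects to 1 and 2, both connect to 3, and 3 leads to 4.
smallMaze :: Maze
smallMaze c = case c of
  0 -> [1, 2]
  1 -> [0, 3]
  2 -> [0, 3]
  3 -> [1, 2, 4]
  4 -> [3]
  _ -> []

-- Breadth-first search. The queue holds partial paths stored in reverse,
-- so the head of each path is the cell to expand next.
shortestPath :: Maze -> Cell -> Cell -> Maybe [Cell]
shortestPath maze start goal = go (Set.singleton start) [[start]]
  where
    go _ [] = Nothing
    go seen (path@(cell:_) : queue)
      | cell == goal = Just (reverse path)
      | otherwise =
          let fresh = [n | n <- maze cell, n `Set.notMember` seen]
              seen' = foldr Set.insert seen fresh
          in go seen' (queue ++ [n : path | n <- fresh])
    go seen ([] : queue) = go seen queue
```

In this maze, shortestPath smallMaze 0 4 gives Just [0,1,3,4]. The list-append queue is kept for brevity; a real queue structure would be preferable for large mazes.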
Summary
If you want to find the route from cell 0 to cell 19, recurse: we know that cells 18 and 19 are directly connected, so we can instead try to solve the problem of finding a route from cell 0 to cell 18.
This is recursion.
Footnotes
The guard,
someCondition a == True
Is equivalent to,
someCondition a
And therefore also equivalent to,
(someCondition a == True) == True
Or,
(someCondition a == (True == True)) == (True == (True == True))
Or,
someCondition a == (someCondition a == someCondition a)
The first one, someCondition a, is fine.
Footnote about do notation
The do notation in the above example is equivalent to list comprehension,
do neighbor <- maze end
   guard (neighbor `notElem` path)
   solve' path' neighbor
The equivalent code in list comprehension syntax is,
[result | neighbor <- maze end,
          neighbor `notElem` path,
          result <- solve' path' neighbor]
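The same correspondence can be checked on a smaller, self-contained example. The two functions below (illustrative names, not from the maze code) are written once in do notation and once as a list comprehension, and produce the same list:

```haskell
import Control.Monad (guard)

-- All pairs (a, b) with a < b, written in do notation for the list monad.
pairsDo :: [Int] -> [(Int, Int)]
pairsDo xs = do
  a <- xs
  b <- xs
  guard (a < b)
  return (a, b)

-- The same thing as a list comprehension: the guard becomes a plain condition.
pairsComp :: [Int] -> [(Int, Int)]
pairsComp xs = [(a, b) | a <- xs, b <- xs, a < b]
```

For example, both pairsDo [1,2,3] and pairsComp [1,2,3] evaluate to [(1,2),(1,3),(2,3)].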

Is there a way to (conveniently) program the number of elements and their associated conditions based on an earlier computation? For example, return type [[a,b,c]] or [[a,b,c,d,e]] depending on another value in the program?
I suppose you want to encode the length of the list (or vector) statically in the type signature. Length of the standard lists cannot be checked on type level.
One approach to do that is to use phantom types, and introduce dummy data types which will encode different sizes:
import Data.Array.Unboxed (UArray, listArray)
newtype Vector d = Vector { vecArray :: UArray Int Float }
-- using EmptyDataDecls extension too
data D1
data D2
data D3
Now you can create vectors of different length which will have distinct types:
vector2d :: Float -> Float -> Vector D2
vector2d x y = Vector $ listArray (1,2) [x,y]
vector3d :: Float -> Float -> Float -> Vector D3
vector3d x y z = Vector $ listArray (1,3) [x,y,z]
If the length of the output depends on the length of the input, then consider using type-level arithmetic to parametrize the output.
You can find more by googling for "Haskell statically sized vectors".
A simpler solution is to use tuples, which are fixed length. If your function can produce either a 3-tuple or a 5-tuple, wrap them with an Either data type: Either (a,b,c) (a,b,c,d,e).

Looks like you're trying to solve some logic puzzle by unique selection from finite domain. Consult these:
Euler 43 - is there a monad to help write this list comprehension?
Splitting list into a list of possible tuples
The way this helps us is, we carry our domain around while we're making picks from it; and the next pick is made from the narrowed domain containing what's left after the previous pick, so a chain is naturally formed. E.g.
p43 = sum [ fromDigits [v0,v1,v2,v3,v4,v5,v6,v7,v8,v9]
          | (dom5,v5) <- one_of [0,5] [0..9] -- [0..9] is the
          , (dom6,v6) <- pick_any dom5 -- initial domain
          , (dom7,v7) <- pick_any dom6
          , rem (100*v5+10*v6+v7) 11 == 0
          ....
-- all possibilities of picking one elt from a domain
pick_any :: [a] -> [([a], a)]
pick_any [] = []
pick_any (x:xs) = (xs,x) : [ (x:dom,y) | (dom,y) <- pick_any xs]
-- all possibilities of picking one of provided elts from a domain
-- (assume unique domains, i.e. no repetitions)
one_of :: (Eq a) => [a] -> [a] -> [([a], a)]
one_of ns xs = [ (ys,y) | let choices = pick_any xs, n <- ns,
                          (ys,y) <- take 1 $ filter ((==n).snd) choices ]
You can trivially check a number of elements in your answer as a part of your list comprehension:
s = [answer | a <- .... , let answer=[....] , length answer==4 ]
or just create different answers based on a condition,
s = [answer | a <- .... , let answer=if condition then [a,b,c] else [a]]
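To let the number of picked elements depend on a run-time value, the same pick-from-a-shrinking-domain idea can be wrapped in a recursive function. The following is only a sketch: chooseN is a hypothetical helper that is not part of the linked answers, and it applies no extra conditions beyond uniqueness of the picks.

```haskell
-- all possibilities of picking one elt from a domain (as defined above)
pick_any :: [a] -> [([a], a)]
pick_any []     = []
pick_any (x:xs) = (xs, x) : [ (x:dom, y) | (dom, y) <- pick_any xs ]

-- Pick n distinct elements, one after another; each pick narrows the domain
-- that the next pick is drawn from. n is an ordinary run-time value.
chooseN :: Int -> [a] -> [[a]]
chooseN 0 _   = [[]]
chooseN n dom = [ x : rest | (dom', x) <- pick_any dom, rest <- chooseN (n - 1) dom' ]
```

For instance, chooseN 2 [1,2,3] produces [[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]]. Per-position conditions could be added by passing a list of predicates alongside n.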

You have Data.List.subsequences
You can write your list comprehension in monadic form (see guards in Monad Comprehensions):
(Explanation: The monad must be an instance of MonadPlus, which supports failure.
guard False makes the monad fail, evaluating to mzero; subsequent results are appended with mplus = (++) for the List monad.)
import Control.Monad (forM_, guard, mzero)
import qualified Data.List as List
myDomain = [1..9] -- or whatever
-- propertyA, propertyB and propertyC stand for your own predicates
validCombinations :: [a] -> [[a]]
validCombinations domainList = do
    combi <- List.subsequences domainList
    case combi of
        [a,b] -> do
            guard (propertyA a && propertyB b)
            return combi
        [a,b,c] -> do
            guard (propertyA a && propertyB b && propertyC c)
            return combi
        _ -> mzero -- fail; guard False would have type [()] here
main = do
    forM_ (validCombinations myDomain) print
Update again, obtaining elements recursively, saving combinations and checks
import Control.Monad
validCombinations :: Eq a => Int -> Int -> [a] -> [(a -> Bool)] -> [a] -> [[a]]
validCombinations indx size domainList propList accum = do
    elt <- domainList -- try all domain elements
    let prop = propList!!indx
    guard $ prop elt -- some property
    guard $ elt `notElem` accum -- not repeated
    {-
    case accum of
        prevElt : _ -> guard $ some_combined_check_with_previous elt prevElt
        _ -> guard True
    -}
    if size > 1 then do
        -- append recursively subsequent positions
        other <- validCombinations (indx+1) (size-1) domainList propList (elt : accum)
        return $ elt : other
    else
        return [elt]
myDomain = [1..3] :: [Int]
myProps = repeat (>1)
main = do
    forM_ (validCombinations 0 size myDomain myProps []) print
  where
    size = 2
result for size 2 with non trivial result:
[2,3]
[3,2]

List processing in Haskell

I am teaching myself Haskell and have run into a problem and need help.
Background:
type AInfo = (Char, Int)
type AList = [AInfo] (let’s say [('a', 2), ('b',5), ('a', 1), ('w', 21)])
type BInfo = Char
type BList = [BInfo] (let’s say ['a', 'a', 'c', 'g', 'a', 'w', 'b'])
One quick edit: The above information is for illustrative purposes only. The actual elements of the lists are a bit more complex. Also, the lists are not static; they are dynamic (hence the uses of the IO monad) and I need to keep/pass/"return"/have access to and change the lists during the running of the program.
I am looking to do the following:
For all elements of AList check against all elements of BList and where the character of the AList element (pair) is equal to the character in the Blist add one to the Int value of the AList element (pair) and remove the character from BList.
So what this means is after the first element of AList is checked against all elements of BList the values of the lists should be:
AList [('a', 5), ('b',5), ('a', 1), ('w', 21)]
BList ['c', 'g', 'w', 'b']
And in the end, the lists values should be:
AList [('a', 5), ('b',6), ('a', 1), ('w', 22)]
BList ['c', 'g']
Of course, all of this is happening in an IO monad.
Things I have tried:
Using mapM and a recursive helper function. I have looked at both:
Every element of AList checked against every element of bList -- mapM (myHelpF1 alist) blist and
Every element of BList checked against every element of AList – mapM (myHelpF2 alist) blist
Passing both lists to a function and using a complicated
if/then/else & helper function calls (feels like I am forcing
Haskell to be iterative; Messy convoluted code, Does not feel
right.)
I have thought about using filter, the character value of AList
element and Blist to create a third list of Bool and the count the
number of True values. Update the Int value. Then use filter on
BList to remove the BList elements that …… (again Does not feel
right, not very Haskell-like.)
Things I think I know about the problem:
The solution may be exceedingly trivial. So much so, the more experienced Haskellers will be muttering under their breath “what a noob” as they type their response.
Any pointers would be greatly appreciated. (mutter away….)

A few pointers:
Don't use [(Char, Int)] for "AList". The data structure you are looking for is a finite map: Map Char Int. Particularly look at member and insertWith. toList and fromList convert from the representation you currently have for AList, so even if you are stuck with that representation, you can convert to a Map for this algorithm and convert back at the end. (This will be more efficient than staying in a list because you are doing so many lookups, and the finite map API is easier to work with than lists)
I'd approach the problem as two phases: (1) partition out the elements of blist by whether they are in the map, (2) insertWith the elements which are already in the map. Then you can return the resulting map and the other partition.
I would also get rid of the meaningless assumptions such as that keys are Char -- you can just say they are any type k (for "key") that satisfies the necessary constraints (that you can put it in a Map, which requires that it is Orderable). You do this with lowercase type variables:
import qualified Data.Map as Map
sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
Writing algorithms in greater generality helps catch bugs, because it makes sure that you don't use any assumptions you don't need.
Oh, also this program has no business being in the IO monad. This is pure code.
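The two phases described above might be sketched as follows. This is one possible reading of the pointers rather than a definitive implementation, and it assumes the keys are already unique (duplicate keys in the original AList would collapse when it is converted with fromList).

```haskell
import Data.List (foldl', partition)
import qualified Data.Map as Map

-- Phase 1: split the list by whether each element is a key of the map.
-- Phase 2: bump the counter once for every element that was found.
sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
sieveList counts xs = (foldl' bump counts present, absent)
  where
    (present, absent) = partition (`Map.member` counts) xs
    bump acc k = Map.insertWith (+) k 1 acc
```

For example, sieveList (Map.fromList [('a',2),('b',5),('w',21)]) "aacgawb" gives the counts a=5, b=6, w=22 together with the leftover list "cg".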

import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
process :: AList -> BList -> AList
process [] _ = []
process (a:as) b = if is_in a b then (fst a,snd a + 1):(process as (delete (fst a) b)) else a:process as b where
  is_in f [] = False
  is_in f (s:ss) = if fst f == s then True else is_in f ss
*Main> process [('a',5),('b',5),('a',1),('b',21)] ['c','b','g','w','b']
[('a',5),('b',6),('a',1),('b',22)]
*Main> process [('a',5),('b',5),('a',1),('w',21)] ['c','g','w','b']
[('a',5),('b',6),('a',1),('w',22)]
Probably an important disclaimer: I'm rusty at Haskell to the point of ineptness, but as a relaxing midnight exercise I wrote this thing. It should do what you want, although it doesn't return a BList. With a bit of modification, you can get it to return an (AList,BList) tuple, but methinks you'd be better off using an imperative language if that kind of manipulation is required.
Alternately, there's an elegant solution and I'm too ignorant of Haskell to know it.
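For what it's worth, one way to get both lists back without leaving Haskell is to thread the remaining BList through the traversal with mapAccumL. The sketch below (processBoth is a made-up name) follows the question's wording rather than process above: it adds one per matching character and removes every match, so its results can differ from process, which removes a single occurrence.

```haskell
import Data.List (mapAccumL, partition)

-- Walk the AList from left to right, carrying the not-yet-consumed BList along.
-- Each pair absorbs all remaining characters equal to its own character.
processBoth :: [(Char, Int)] -> [Char] -> ([(Char, Int)], [Char])
processBoth alist blist = (alist', blist')
  where
    (blist', alist') = mapAccumL step blist alist
    step remaining (c, n) =
      let (hits, rest) = partition (== c) remaining
      in (rest, (c, n + length hits))
```

With the question's data, processBoth [('a',2),('b',5),('a',1),('w',21)] "aacgawb" returns ([('a',5),('b',6),('a',1),('w',22)],"cg").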

While I am by no means a Haskell expert, I have a partial attempt that returns the result of the operation applied once. Maybe you can find out how to map it over the rest to get your solution. The addWhile function is the clever part: since you only want to update the first occurrence of an element in lista, if the element exists twice, it will just add 0 to the later one. Code critiques are more than welcome.
import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
lista = ([('a', 2), ('b',5), ('a', 1), ('w', 21)] :: AList)
listb = ['a','a','c','g','a','w','b']
--step one, get the head, and its occurrences
items list = (eleA, eleB) where
    eleA = length $ filter (\x -> x == (head list)) list
    eleB = head list
getRidOfIt list ele = (dropWhile (\x -> x == ele) list) --drop like its hot
--add to lista
addWhile :: [(Char, Int)] -> Char -> Int -> [(Char,Int)]
addWhile [] _ _ = []
addWhile ((x,y):xs) letter times = if x == letter then (x,y+times) : addWhile xs letter times
                                   else (x,y) : addWhile xs letter 0
--first answer
firstAnswer = addWhile lista (snd $ items listb) (fst $ items listb)
--[('a',5),('b',5),('a',1),('w',21)]

The operation you describe is pure, as @luqui points out, so we just define it as a pure Haskell function. It can be used inside a monad (including IO) by means of fmap (or do).
import Data.List
combine alist blist = (reverse a, b4) where
First we sort and count the B list:
    b = map (\g->(head g,length g)) . group . sort $ blist
We need the import for group and sort to be available. Next, we roll along the alist and do our thing:
    (a,b2) = foldl g ([],b) alist
    g (acc,b) e@(x,c) = case pick x b of
        Nothing -> (e:acc,b)
        Just (n,b2) -> ((x,c+n):acc,b2)
    b3 = map fst b2
    b4 = [ c | c <- blist, elem c b3 ]
Now pick, as used, must be
pick x [] = Nothing
pick x ((y,n):t)
  | x==y = Just (n,t)
  | otherwise = case pick x t of Nothing -> Nothing
                                 Just (k,r) -> Just (k, (y,n):r)
Of course pick performs a linear search, so if performance (speed) becomes a problem, b should be changed to allow for binary search (tree etc, like Map). The calculation of b4 which is filter (`elem` b3) blist is another potential performance problem with its repeated checks for presence in b3. Again, checking for presence in trees is faster than in lists, in general.
Test run:
> combine [('a', 2), ('b',5), ('a', 1), ('w', 21)] "aacgawb"
([('a',5),('b',6),('a',1),('w',22)],"cg")
edit: you probably want it the other way around, rolling along the blist while updating the alist and producing (or not) the elements of blist in the result (b4 in my code). That way the algorithm will operate in a more local manner on long input streams (that assuming your blist is long, though you didn't say that). As written above, it will have a space problem, consuming the input stream blist several times over. I'll keep it as is as an illustration, a food for thought.
So if you decide to go the 2nd route, first convert your alist into a Map (beware the duplicates!). Then, scanning (with scanl) over blist, make use of updateLookupWithKey to update the counts map and at the same time decide for each member of blist, one by one, whether to output it or not. The type of the accumulator will thus have to be (Maybe a, Map a Int), with a your element type (blist :: [a]):
scanl :: (acc -> a -> acc) -> acc -> [a] -> [acc]
scanning = tail $ scanl g (Nothing, fromList $ reverse alist) blist
g (_,cmap) a = case updateLookupWithKey (\_ c->Just(c+1)) a cmap of
    (Just _, m2) -> (Nothing, m2) -- seen before
    _ -> (Just a, cmap) -- not present in counts
new_b_list = [ a | (Just a,_) <- scanning ]
last_counts = snd $ last scanning
You will have to combine the toList last_counts with the original alist if you have to preserve the old duplicates there (why would you?).
