Union Function in Haskell - haskell

I have been getting my hands around coding in Haskell, but couldn't grasp the idea of implementing union function.
I have also found some function definition embedded inside the Haskell platform. but the problem is I need a neat and understandable way to make it work.
Anyone can help me with that?

Assuming you're talking about union :: Eq a => [a] -> [a] -> [a] which takes two input lists and returns a third list which contains all of the elements of each argument list, then it's defined in Data.List which is in the base package.
In the source it's divided into two functions, the generalized function unionBy which takes a custom definition of equality (a function of type equal to (==) :: a -> a -> Bool) and then defines the one that uses the Eq typeclass by passing in (==) as a concrete implementation of equality.
union :: (Eq a) => [a] -> [a] -> [a]
union = unionBy (==)
We can substitute (==) into the unionBy code, though, as Haskell lets us use equational reasoning.
union = unionBy (==)
-- so...
union :: Eq a => [a] -> [a] -> [a]
union xs ys = xs ++ foldl (flip (deleteBy (==))) (nubBy (==) ys) xs
This same pattern occurs twice more in the definition of unionBy in deleteBy and nubBy, both of which follow the same convention. delete removes an element from a list and nub returns a list of unique elements. We'll simplify the definition again to eliminate all traces of (==) and simply assume that the elements a have Eq defined.
union xs ys = xs ++ foldl (flip delete) (nub ys) xs
Now the definition is perhaps more readable. The union of xs and ys is xs appended to the unique ("nubbed") values of ys which have been processed by foldl (flip delete) _ xs. The net result of that foldl is to one by one try to delete each element of xs from (nub ys). What that ends up meaning is that union xs ys is xs appended to each unique element from ys less those in xs.
As an aside, with this source in hand we can notice some quirky behavior of union such as how it treats duplicates in the first argument differently from the second argument
union [1,1,2] [2] == [1,1,2]
union [2] [1,1,2] == [2,1]
which is a bit of a letdown, a result of using [] to represent a Set-like notion of union. However, if we view the results using Set.fromList then we're fine.
xs, ys :: Eq a => [a]
Set.fromList (xs `union` ys) == Set.fromList xs `Set.union` Set.fromList ys
which also gives us another definition of union
union xs ys = Set.toList (Set.fromList xs `Set.union` Set.fromList ys)
So how does that foldl trick work? Let's unpack the definition of foldl to see, again abusing equational reasoning.
union xs ys = xs ++ (case xs of
[] -> nub ys
(x:xs') -> foldl (flip delete) (delete x (nub ys)) xs'
)
which should make the trick more evident—it cycles over the elements of xs, deleting them one by one from (nub ys).
While hopefully this helped to make the code in union a bit more clear, the real take home should be that equational reasoning is a powerful tool for dissecting Haskell code. Don't be afraid to simplify code directly by manually inlining the definition of a function.

I am not sure if this union meets your requirements but it is rather simple.
I needed my own function to remove duplicates.
rmdups ls = [d|(z,d)<- zip [0..] ls,notElem d $ take z ls]
It does the same as any recursive function of the same purpose.
union l1 l2 = let l = l1 ++ l2 in rmdups l

I may have misinterpreted the question, but this was a post I found as I was trying to find how to write my own union function. I understand there is one built in but as someone who was trying to learn Haskell that doesn't help at all. These were the functions I wrote to make it work.
memberSet :: Int -> [Int] -> Bool
memberSet x [] = False
memberSet x (y:ys)
| x == y = True
| otherwise = memberSet x ys
unionSet :: [Int] -> [Int] -> [Int]
unionSet [] [] = []
unionSet (x:xs) [] = (x:xs)
unionSet [] (y:ys) = (y:ys)
unionSet (x:xs) (y:ys)
| memberSet y (x:xs) = unionSet (x:xs) ys
| otherwise = y : unionSet (x:xs) ys
main = do
print (unionSet [1,2,3] [2,5,3,4])
Member set checks if an element is present in a list (again I know there is a built in function to do this but I'm trying to learn). And there union set checks if the first elem of the second list is in the first list, if its not it adds it to a list and recursively calls itself. If it is in the first list, it skips that elem and recursively calls itself.

One problem I don't think anyone has addressed is that set union and intersection must be commutative and associative (Tom Apostol Calculus, p. 14).
λ> (Data.List.union [1,1,2] [2]) == (Data.List.union [2] [1,1,2])
False
λ> import Data.Set (Set, lookupMin, lookupMax)
λ> import qualified Data.Set as Set
λ> Set.union (Set.fromList [1, 3, 5, 7]) (Set.fromList [0, 2, 4, 6])
fromList [0,1,2,3,4,5,6,7]
λ> Set.union (Set.fromList [0, 2, 4, 6]) (Set.fromList [1, 3, 5, 7])
fromList [0,1,2,3,4,5,6,7]
So Data.List is not aware, but few more experiments would seem to demonstrate that Haskell Data.Set is aware. So another nail in the coffin of lists as sets, although I suppose we could kludge something further with lists.

Related

Is there a function in haskell to determine if each element in a list is in another list?

I am wondering if there is a function in haskell to determine if each element in a list is in another list. I wrote my own, but it seems like something that would be in Prelude or Data.List
each :: Eq a => [a] -> [a] -> Bool
each xs ys = foldl (\acc x -> if x `elem` ys then True && acc else False) True xs
Does something like this already exist?
The set operation you want specifically is not in the Prelude, but all makes the definition somewhat trivial (though not necessarily efficient).
-- Check if every element in xs is in ys (i.e., is xs a subset of ys)
each xs ys = all (`elem` ys) xs
Assuming your lists do not have duplicate values, you might try (\\).
import Data.List
-- Check if removing all elements in ys from xs produces an empty list.
each xs ys = null (xs \\ ys)

interleaving two strings, preserving order: functional style

In this question, the author brings up an interesting programming question: given two string, find possible 'interleaved' permutations of those that preserves order of original strings.
I generalized the problem to n strings instead of 2 in OP's case, and came up with:
-- charCandidate is a function that finds possible character from given strings.
-- input : list of strings
-- output : a list of tuple, whose first value holds a character
-- and second value holds the rest of strings with that character removed
-- i.e ["ab", "cd"] -> [('a', ["b", "cd"])] ..
charCandidate xs = charCandidate' xs []
charCandidate' :: [String] -> [String] -> [(Char, [String])]
charCandidate' [] _ = []
charCandidate' ([]:xs) prev =
charCandidate' xs prev
charCandidate' (x#(c:rest):xs) prev =
(c, prev ++ [rest] ++ xs) : charCandidate' xs (x:prev)
interleavings :: [String] -> [String]
interleavings xs = interleavings' xs []
-- interleavings is a function that repeatedly applies 'charCandidate' function, to consume
-- the tuple and build permutations.
-- stops looping if there is no more tuple from charCandidate.
interleavings' :: [String] -> String -> [String]
interleavings' xs prev =
let candidates = charCandidate xs
in case candidates of
[] -> [prev]
_ -> concat . map (\(char, ys) -> interleavings' ys (prev ++ [char])) $ candidates
-- test case
input :: [String]
input = ["ab", "cd"]
-- interleavings input == ["abcd","acbd","acdb","cabd","cadb","cdab"]
it works, however I'm quite concerned with the code:
it is ugly. no point-free!
explicit recursion and additional function argument prev to preserve states
using tuples as intermediate form
How can I rewrite the above program to be more "haskellic", concise, readable and more conforming to "functional programming"?
I think I would write it this way. The main idea is to treat creating an interleaving as a nondeterministic process which chooses one of the input strings to start the interleaving and recurses.
Before we start, it will help to have a utility function that I have used countless times. It gives a convenient way to choose an element from a list and know which element it was. This is a bit like your charCandidate', except that it operates on a single list at a time (and is consequently more widely applicable).
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
go xs [] = []
go xs (y:ys) = (xs, y, ys) : go (y:xs) ys
With that in hand, it is easy to make some non-deterministic choices using the list monad. Notionally, our interleavings function should probably have a type like [NonEmpty a] -> [[a]] which promises that each incoming string has at least one character in it, but the syntactic overhead of NonEmpty is too annoying for a simple exercise like this, so we'll just give wrong answers when this precondition is violated. You could also consider making this a helper function and filtering out empty lists from your top-level function before running this.
interleavings :: [[a]] -> [[a]]
interleavings [] = [[]]
interleavings xss = do
(xssL, h:xs, xssR) <- zippers xss
t <- interleavings ([xs | not (null xs)] ++ xssL ++ xssR)
return (h:t)
You can see it go in ghci:
> interleavings ["abc", "123"]
["abc123","ab123c","ab12c3","ab1c23","a123bc","a12bc3","a12b3c","a1bc23","a1b23c","a1b2c3","123abc","12abc3","12ab3c","12a3bc","1abc23","1ab23c","1ab2c3","1a23bc","1a2bc3","1a2b3c"]
> interleavings ["a", "b", "c"]
["abc","acb","bac","bca","cba","cab"]
> permutations "abc" -- just for fun, to compare
["abc","bac","cba","bca","cab","acb"]
This is fastest implementation I've come up with so far. It interleaves a list of lists pairwise.
interleavings :: [[a]] -> [[a]]
interleavings = foldr (concatMap . interleave2) [[]]
This horribly ugly mess is the best way I could find to interleave two lists. It's intended to be asymptotically optimal (which I believe it is); it's not very pretty. The constant factors could be improved by using a special-purpose queue (such as the one used in Data.List to implement inits) rather than sequences, but I don't feel like including that much boilerplate.
{-# LANGUAGE BangPatterns #-}
import Data.Monoid
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
interleave2 :: [a] -> [a] -> [[a]]
interleave2 xs ys = interleave2' mempty xs ys []
interleave2' :: Seq a -> [a] -> [a] -> [[a]] -> [[a]]
interleave2' !prefix xs ys rest =
(toList prefix ++ xs ++ ys)
: interleave2'' prefix xs ys rest
interleave2'' :: Seq a -> [a] -> [a] -> [[a]] -> [[a]]
interleave2'' !prefix [] _ = id
interleave2'' !prefix _ [] = id
interleave2'' !prefix xs#(x : xs') ys#(y : ys') =
interleave2' (prefix |> y) xs ys' .
interleave2'' (prefix |> x) xs' ys
Using foldr over interleave2
interleave :: [[a]] -> [[a]]
interleave = foldr ((concat .) . map . iL2) [[]] where
iL2 [] ys = [ys]
iL2 xs [] = [xs]
iL2 (x:xs) (y:ys) = map (x:) (iL2 xs (y:ys)) ++ map (y:) (iL2 (x:xs) ys)
Another approach would be to use the list monad:
interleavings xs ys = interl xs ys ++ interl ys xs where
interl [] ys = [ys]
interl xs [] = [xs]
interl xs ys = do
i <- [1..(length xs)]
let (h, t) = splitAt i xs
map (h ++) (interl ys t)
So the recursive part will alternate between the two lists, taking all from 1 to N elements from each list in turns and then produce all possible combinations of that. Fun use of the list monad.
Edit: Fixed bug causing duplicates
Edit: Answer to dfeuer. It turned out tricky to do code in the comment field. An example of solutions that do not use length could look something like:
interleavings xs ys = interl xs ys ++ interl ys xs where
interl [] ys = [ys]
interl xs [] = [xs]
interl xs ys = splits xs >>= \(h, t) -> map (h ++) (interl ys t)
splits [] = []
splits (x:xs) = ([x], xs) : map ((h, t) -> (x:h, t)) (splits xs)
The splits function feels a bit awkward. It could be replaced by use of takeWhile or break in combination with splitAt, but that solution ended up a bit awkward as well. Do you have any suggestions?
(I got rid of the do notation just to make it slightly shorter)
Combining the best ideas from the existing answers and adding some of my own:
import Control.Monad
interleave [] ys = return ys
interleave xs [] = return xs
interleave (x : xs) (y : ys) =
fmap (x :) (interleave xs (y : ys)) `mplus` fmap (y :) (interleave (x : xs) ys)
interleavings :: MonadPlus m => [[a]] -> m [a]
interleavings = foldM interleave []
This is not the fastest possible you can get, but it should be good in terms of general and simple.

How to get rid of boxing and unboxing in functional programing?

says that we want to filter out all the odd one in a list.
odd' (i,n) = odd i
unbox (i,n) = n
f :: [Int] -> [Int]
f lst = map unbox $ filter odd' $ zip [1..] lst
*Main> f [1,2,3,4]
[1,3]
it has the unpleasant boxing and unboxing.
can we change the way we think this problem and eliminate boxing and unboxing?
#user3237465 list comprehension is indeed a good way of thinking this sort of problem.
as well as function composition. well, I think we won't get rid of "wrapping the original list to [(index,value)] form and then unwrap it" without writing a special form like #Carsten König provided.
Or have a function that give out one value's index given the list and the value. like filter (odd . getindex) xs
and maybe that's why clojure made it's pattern matching strong enough to get value in complex structure.
you can always rewrite the function if you want - this comes to mind:
odds :: [a] -> [a]
odds (x:_:xs) = x : odds xs
odds [x] = [x]
odds _ = []
aside from this both you don't need odd' and unbox:
odd' is just odd . fst
unbox is just snd
You can write this as
f xs = [x | (i, x) <- zip [0..] xs, even i]
or
f = snd . foldr (\x ~(o, e) -> (e, x : o)) ([], [])
or
import Data.Either
f = lefts . zipWith ($) (cycle [Left, Right])

How to multiply two elements of each pair from list of pairs - Haskell

I want to make function which returns list of multiplied elements from each pair from list of pairs. For example:
>product [1,2] [3,4]
[3,8]
I want to do this using list comprehension. I tried something like this:
product :: Num a => [a] -> [a] -> [a]
product xs ys = [x*y | z<-zip xs ys, (x, _)<-z, (_, y)<-z]
but it's not working. What should be changed?
As a rule of thumb, you should have one <- for each nested iteration. Since you only want to iterate over one list -- namely, zip xs ys -- there should be only one <-. Thus:
scalarproduct xs ys = sum [x*y | (x,y) <- zip xs ys]
You may also like the zipWith function:
scalarproduct xs ys = sum (zipWith (*) xs ys)
scalarproduct = (sum .) . zipWith (*) -- may be readable, depending on your bent
I want to do this using list comprehension. [...]
My advice is: don't! List comprehensions are a tool, and if they're not helping you, then use a different tool.
Simple solution:
product :: Num a => [a] -> [a] -> [a]
product xs ys = zipWith (*) xs ys

implementation of unzip function in haskell

I am trying to implement the unzip function, I did the following code but I get error.
myUnzip [] =()
myUnzip ((a,b):xs) = a:fst (myUnzip xs) b:snd (myUnzip xs)
I know that problem is in the right side of the second line but I do know how to improve it .
any hint please .
the error that I am getting is
ex1.hs:190:22:
Couldn't match expected type `()' with actual type `[a0]'
In the expression: a : fst (myUnzip xs) b : snd (myUnzip xs)
In an equation for `myUnzip':
myUnzip ((a, b) : xs) = a : fst (myUnzip xs) b : snd (myUnzip xs)
ex1.hs:190:29:
Couldn't match expected type `(t0 -> a0, b0)' with actual type `()'
In the return type of a call of `myUnzip'
In the first argument of `fst', namely `(myUnzip xs)'
In the first argument of `(:)', namely `fst (myUnzip xs) b'
ex1.hs:190:49:
Couldn't match expected type `(a1, [a0])' with actual type `()'
In the return type of a call of `myUnzip'
In the first argument of `snd', namely `(myUnzip xs)'
In the second argument of `(:)', namely `snd (myUnzip xs)'
You could do it inefficiently by traversing the list twice
myUnzip [] = ([], []) -- Defaults to a pair of empty lists, not null
myUnzip xs = (map fst xs, map snd xs)
But this isn't very ideal, since it's bound to be quite slow compared to only looping once. To get around this, we have to do it recursively
myUnzip [] = ([], [])
myUnzip ((a, b):xs) = (a : ???, b : ???)
where ??? = myUnzip xs
I'll let you fill in the blanks, but it should be straightforward from here, just look at the type signature of myUnzip and figure out what you can possible put in place of the question marks at where ??? = myUnzip xs
I thought it might be interesting to display two alternative solutions. In practice you wouldn't use these, but they might open your mind to some of the possibilities of Haskell.
First, there's the direct solution using a fold -
unzip' xs = foldr f x xs
where
f (a,b) (as,bs) = (a:as, b:bs)
x = ([], [])
This uses a combinator called foldr to iterate through the list. Instead, you just define the combining function f which tells you how to combine a single pair (a,b) with a pair of lists (as, bs), and you define the initial value x.
Secondly, remember that there is the nice-looking solution
unzip'' xs = (map fst xs, map snd xs)
which looks neat, but performs two iterations of the input list. It would be nice to be able to write something as straightforward as this, but which only iterates through the input list once.
We can nearly achieve this using the Foldl library. For an explanation of why it doesn't quite work, see the note at the end - perhaps someone with more knowledge/time can explain a fix.
First, import the library and define the identity fold. You may have to run cabal install foldl first in order to install the library.
import Control.Applicative
import Control.Foldl
ident = Fold (\as a -> a:as) [] reverse
You can then define folds that extract the first and second components of a list of pairs,
fsts = map fst <$> ident
snds = map snd <$> ident
And finally you can combine these two folds into a single fold that unzips the list
unzip' = (,) <$> fsts <*> snds
The reason that this doesn't quite work is that although you only traverse the list once to extract the pairs, they will be extracted in reverse order. This is what necessitates the additional call to reverse in the definition of ident, which results in an extra traversal of the list, to put it in the right order. I'd be interested to learn of a way to fix that up (I expect it's not possible with the current Foldl library, but might be possible with an analogous Foldr library that gives up streaming in order to preserve the order of inputs).
Note that neither of these work with infinite lists. The solution using Foldl will never be able to handle infinite lists, because you can't observe the value of a left fold until the list has terminated.
However, the version using a right fold should work - but at the moment it isn't lazy enough. In the definition
unzip' xs = foldr f x xs
where
f (a,b) (as,bs) = (a:as, b:bs) -- problem is in this line!
x = ([], [])
the pattern match requires that we open up the tuple in the second argument, which requires evaluating one more step of the fold, which requires opening up another tuple, which requires evaluating one more step of the fold, etc. However, if we use an irrefutable pattern match (which always succeeds, without having to examine the pattern) we get just the right amount of laziness -
unzip'' xs = foldr f x xs
where
f (a,b) ~(as,bs) = (a:as, b:bs)
x = ([], [])
so we can now do
>> let xs = repeat (1,2)
>> take 10 . fst . unzip' $ xs
^CInterrupted
<< take 10 . fst . unzip'' $ xs
[1,1,1,1,1,1,1,1,1,1]
Here's Chris Taylor's answer written using the (somewhat new) "folds" package:
import Data.Fold (R(R), run)
import Control.Applicative ((<$>), (<*>))
ident :: R a [a]
ident = R id (:) []
fsts :: R (a, b) [a]
fsts = map fst <$> ident
snds :: R (a, b) [b]
snds = map snd <$> ident
unzip' :: R (a, b) ([a], [b])
unzip' = (,) <$> fsts <*> snds
test :: ([Int], [Int])
test = run [(1,2), (3,4), (5,6)] unzip'
*Main> test
([1,3,5],[2,4,6])
Here is what I got working after above guidances
myUnzip' [] = ([],[])
myUnzip' ((a,b):xs) = (a:(fst rest), b:(snd rest))
where rest = myUnzip' xs
myunzip :: [(a,b)] -> ([a],[b])
myunzip xs = (firstValues xs , secondValues xs)
where
firstValues :: [(a,b)] -> [a]
firstValues [] = []
firstValues (x : xs) = fst x : firstValues xs
secondValues :: [(a,b)] -> [b]
secondValues [] = []
secondValues (x : xs) = snd x : secondValues xs

Resources