Given a list of tuples, I need to find all unique paths from it:
Example I/P: [(1,2),(2,3),(3,4),(9,11),(4,5),(5,6),(6,7),(3,9)]
O/P: [[(1,2),(2,3),(3,4),(4,5),(5,6),(6,7)],[(1,2),(2,3),(3,9),(9,11)]]
Two tuples can connect if the second element of one tuple matches the first element of the other, i.e. one tuple is (_,a) and the other is (a,_).
What is the most efficient implementation for this? I need to find the best-suited data structure for it. Any suggestions? The number of tuples on which I will run the algorithm will be more than 400,000.
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.List (permutations, nub)
path :: Eq a => [(a, a)] -> [(a, a)]
path [] = []
path [x] = [x]
path (u@(_, a):v@(b, _):xs) = if a == b then u:path (v:xs) else [u]
allPaths = nub . map path . permutations
(you can optimize chain generation but I think this problem has exponential time complexity)
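As a quick sanity check on a tiny input (a ghci sketch; note that partial chains such as [(2,3)] are also returned, one per starting permutation):
*Main> allPaths [(1,2),(2,3)]
[[(1,2),(2,3)],[(2,3)]]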
EDITED
In general, you must define more precisely what paths you want to return.
Ignoring cycle invariance ([(1,2),(2,3),(3,1)] == [(2,3),(3,1),(1,2)]), you can generate all paths (without using permutations):
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.List (permutations, nub, sortBy, isInfixOf)
data Tree a = Node a [Tree a] deriving Show
treeFromList :: Eq a => a -> [(a, a)] -> Tree a
treeFromList a [] = Node a []
treeFromList a xs = Node a $ map subTree $ filter ((a==).fst) xs
where subTree v@(_, b) = treeFromList b $ filter (v/=) xs
treesFromList :: Eq a => [(a, a)] -> [Tree a]
treesFromList xs = map (flip treeFromList xs) $ nub $ map fst xs ++ map snd xs
treeToList :: Tree a -> [[a]]
treeToList (Node a []) = [[a]]
treeToList (Node a xs) = [a:ws | ws <- concatMap treeToList xs]
treesToList :: [Tree a] -> [[a]]
treesToList = concatMap treeToList
uniqTrees :: Eq a => [[a]] -> [[a]]
uniqTrees = f . reverse . sortBy ((.length).compare.length)
where f [] = []
f (x:xs) = x: filter (not.flip isInfixOf x) (f xs)
allPaths = uniqTrees . treesToList . treesFromList
then
*Main> allPaths [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 1)]
[[2,4,1,2,3,4],[2,3,4,1,2,4],[1,3,4,1,2,4],[1,3,4,1,2,3],[1,2,4,1,3,4],[1,2,3,4,1,3]]
uniqTrees has poor efficiency and, in general, you can do many optimizations.
If you want to account for cycle invariance, you can normalize a cycle by selecting its minimum base-10 representation; in the previous example ([(1,2),(2,3),(3,1)] == [(2,3),(3,1),(1,2)]), 1231 < 2312, so
normalize [(2,3),(3,1),(1,2)] == [(1,2),(2,3),(3,1)]
You can normalize a cycle by rotating it n times and taking "head . sortBy toBase10 . rotations".
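A minimal sketch of that normalization idea (rotations and normalizeCycle are names introduced here, not taken from the code above; comparing the vertex sequences lexicographically stands in for the base-10 comparison when the vertices are single digits):
import Data.List (inits, tails, minimumBy)
import Data.Ord (comparing)
-- every rotation of a cycle, e.g. rotations [1,2,3] == [[1,2,3],[2,3,1],[3,1,2]]
rotations :: [a] -> [[a]]
rotations xs = init (zipWith (++) (tails xs) (inits xs))
-- pick the rotation whose vertex sequence is smallest, e.g.
-- normalizeCycle [(2,3),(3,1),(1,2)] == [(1,2),(2,3),(3,1)]
normalizeCycle :: Ord a => [(a, a)] -> [(a, a)]
normalizeCycle = minimumBy (comparing (map fst)) . rotations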
I think your problem falls into the NP category, since:
A Hamiltonian path, also called a Hamilton path, is a path between two
vertices of a graph that visits each vertex exactly once.
In general, the problem of finding a Hamiltonian path is NP-complete
(Garey and Johnson 1983, pp. 199-200), so the only known way to
determine whether a given general graph has a Hamiltonian path is to
undertake an exhaustive search (source)
Your problem is even "harder", since you don't know beforehand what the end node will be.
In terms of data structures, you can try to simulate a hash table in Haskell, since this data type is commonly used for graphs, and your problem can be turned into a graph problem.
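A sketch of that representation, with Data.Map standing in for the hash table (buildAdjacency and successors are illustrative names, not from the question): an adjacency map from each start node to the edges leaving it, so extending a chain is a logarithmic lookup instead of a scan over 400,000 tuples.
import qualified Data.Map.Strict as M
type Adjacency a = M.Map a [(a, a)]
-- group the edges by their first component
buildAdjacency :: Ord a => [(a, a)] -> Adjacency a
buildAdjacency = M.fromListWith (++) . map (\e@(from, _) -> (from, [e]))
-- all edges (a, _) that can extend a chain ending in a
successors :: Ord a => Adjacency a -> a -> [(a, a)]
successors adj a = M.findWithDefault [] a adj
For example, successors (buildAdjacency [(1,2),(2,3),(3,4),(3,9)]) 3 returns both edges leaving node 3, (3,4) and (3,9), in some order.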
Related
I'm trying to write a function in Haskell to generate multidimensional lists.
(Technically I'm using Curry, but my understanding is that it's mostly a superset of Haskell, and the thing I'm trying to do is common to Haskell as well.)
After a fair bit of head scratching, I realized my initial desired function (m_array generating_function list_of_dimensions, giving a list nested to a depth equal to length list_of_dimensions) was probably at odds with the type system itself, since (AFAICT) the nesting depth of a list is part of its type, and my function wanted to return values whose nesting depths differed based on the value of a parameter, meaning it wanted to return values whose types varied based on the value of a parameter, which (AFAICT) isn't supported in Haskell. (If I'm wrong, and this CAN be done, please tell me.) At this point I moved on to the next paragraph, but if there's a workaround I've missed that takes very similar parameters and still outputs a nested list, let me know. Like, maybe if you can encode the indices as some data type that implicitly includes the nesting level in its type, and is instantiated with e.g. dimensions 5 2 6 ..., maybe that'd work? Not sure.
In any case, I thought that perhaps I could encode the nesting-depth by nesting the function itself, while still keeping the parameters manageable. This did work, and I ended up with the following:
ma f (l:ls) idx = [f ls (idx++[i]) | i <- [0..(l-1)]]
However, so far it's still a little clunky to use: you need to nest the calls, like
ma (ma (ma (\_ i -> 0))) [2,2,2] []
(which, btw, gives [[[0,0],[0,0]],[[0,0],[0,0]]]. If you use (\_ i -> i), it fills the array with the indices of the corresponding element, which is a result I'd like to keep available, but could be a confusing example.)
I'd prefer to minimize the boilerplate necessary. If I can't just call
ma (\_ i -> i) [2,2,2]
I'd LIKE to be able to call, at worst,
ma ma ma (\_ i -> i) [2,2,2] []
But if I try that, I get errors. Presumably the list of parameters is being divvied up in a way that doesn't make sense for the function. I've spent about half an hour googling and experimenting, trying to figure out Haskell's mechanism for parsing strings of functions like that, but I haven't found a clear explanation, and understanding eludes me. So, the formal questions:
How does Haskell parse e.g. f1 f2 f3 x y z? How are the arguments assigned? Is it dependent on the signatures of the functions, or does it e.g. just try to call f1 with 5 arguments?
Is there a way of restructuring ma to permit calling it without parentheses? (Adding at most two helper functions would be permissible, e.g. maStart ma ma maStop (\_ i -> i) [1,2,3,4] [], if necessary.)
The function you want in your head-scratching paragraph is possible directly -- though a bit noisily. With GADTs and DataKinds, values can be parameterized by numbers. You won't be able to use lists directly, because they don't mention their length in their type, but a straightforward variant that does works great. Here's how it looks.
{-# Language DataKinds #-}
{-# Language GADTs #-}
{-# Language ScopedTypeVariables #-}
{-# Language StandaloneDeriving #-}
{-# Language TypeOperators #-}
import GHC.TypeLits
infixr 5 :+
data Vec n a where
O :: Vec 0 a -- O is supposed to look a bit like a mix of 0 and []
(:+) :: a -> Vec n a -> Vec (n+1) a
data FullTree n a where
Leaf :: a -> FullTree 0 a
Branch :: [FullTree n a] -> FullTree (n+1) a
deriving instance Show a => Show (Vec n a)
deriving instance Show a => Show (FullTree n a)
ma :: forall n a. ([Int] -> a) -> Vec n Int -> FullTree n a
ma f = go [] where
go :: [Int] -> Vec n' Int -> FullTree n' a
go is O = Leaf (f is)
go is (l :+ ls) = Branch [go (i:is) ls | i <- [0..l-1]]
Try it out in ghci:
> ma (\_ -> 0) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]],Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]]]
> ma (\i -> i) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf [0,0,0],Leaf [1,0,0]],Branch [Leaf [0,1,0],Leaf [1,1,0]]],Branch [Branch [Leaf [0,0,1],Leaf [1,0,1]],Branch [Leaf [0,1,1],Leaf [1,1,1]]]]
A low-tech solution:
In Haskell, you can model multi-level lists by using the so-called free monad.
The base definition is:
data Free ft a = Pure a | Free (ft (Free ft a))
where ft can be any functor, but here we are interested in ft being [], that is the list functor.
So we define our multidimensional list like this:
import Control.Monad
import Control.Monad.Free
type Mll = Free [] -- Multi-Level List
The Mll type constructor happens to be an instance of the Functor, Foldable and Traversable classes, which can come in handy.
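For concreteness, here is what a small ragged value of this type looks like (a sketch; the instances mentioned above come from Control.Monad.Free, given that [] is itself a Functor, Foldable and Traversable):
-- roughly the nested list [[1,2],[3],4]: two sub-lists and a bare leaf at the top level
example :: Mll Int
example = Free [Free [Pure 1, Pure 2], Free [Pure 3], Pure 4]
-- fmap (*10) example multiplies every leaf, and sum example == 10 via the Foldable instance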
To make an array of arbitrary dimension, we start with:
the list of dimensions, for example [5,2,6]
the filler function, which returns a value for a given set of indices
We can start by making a “grid” object, whose item at indices, say, [x,y,z] is precisely the list [x,y,z]. As we have a Functor instance, we can complete the process by just applying fmap filler to our grid object.
This gives the following code:
makeNdArray :: ([Int] -> a) -> [Int] -> Mll a
makeNdArray filler dims =
let
addPrefix x (Pure xs) = Pure (x:xs)
addPrefix x (Free xss) = Free $ map (fmap (x:)) xss
makeGrid [] = Pure []
makeGrid (d:ds) = let base = 0
fn k = addPrefix k (makeGrid ds)
in Free $ map fn [base .. (d-1+base)]
grid = makeGrid dims
in
fmap filler grid -- because we are an instance of the Functor class
To visualize the resulting structure, it is handy to be able to remove the constructor names:
displayMll :: Show a => Mll a -> String
displayMll = filter (\ch -> not (elem ch "Pure Free")) . show
The resulting structure can easily be flattened if need be:
toListFromMll :: Mll a -> [a]
toListFromMll xs = foldr (:) [] xs
For numeric base types, we can get a multidimensional sum function “for free”, so to speak:
mllSum :: Num a => (Mll a) -> a
mllSum = sum -- because we are an instance of the Foldable class
-- or manually: foldr (+) 0
Some practice:
We use [5,2,6] as the dimension set. To visualize the structure, we associate a decimal digit with every index. We can pretend to have 1-based indexing by adding 111, because that way all the resulting numbers are 3 digits long, which makes the result easier to check. Extra newlines added manually.
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> dims = [5,2,6]
λ> filler = \[x,y,z] -> (100*x + 10*y + z + 111)
λ>
λ> mxs = makeNdArray filler dims
λ>
λ> displayMll mxs
"[[[111,112,113,114,115,116],[121,122,123,124,125,126]],
[[211,212,213,214,215,216],[221,222,223,224,225,226]],
[[311,312,313,314,315,316],[321,322,323,324,325,326]],
[[411,412,413,414,415,416],[421,422,423,424,425,426]],
[[511,512,513,514,515,516],[521,522,523,524,525,526]]]"
λ>
As mentioned above, we can flatten the structure:
λ>
λ> xs = toListFromMll mxs
λ> xs
[111,112,113,114,115,116,121,122,123,124,125,126,211,212,213,214,215,216,221,222,223,224,225,226,311,312,313,314,315,316,321,322,323,324,325,326,411,412,413,414,415,416,421,422,423,424,425,426,511,512,513,514,515,516,521,522,523,524,525,526]
λ>
or take its overall sum:
λ>
λ> sum mxs
19110
λ>
λ> sum xs
19110
λ>
λ>
λ> length mxs
60
λ>
λ> length xs
60
λ>
While working with long strings, I came across a rather big problem in creating suffix trees in Haskell.
Some constructing algorithms (as this version of Ukkonen's algorithm) require establishing links between nodes. These links "point" on a node in the tree. In imperative languages, such as Java, C#, etc. this is no problem because of reference types.
Are there ways of emulating this behaviour in Haskell? Or is there a completely different alternative?
By tying a recursive knot, you can use a value that isn't determined until a computation finishes in the construction of the data produced by that same computation.
The following computation builds a list of values that each hold the total number of items in the list even though the total is computed by the same function that's building the list. The let binding in zipCount passes one of the results of zipWithAndCount as the first argument to zipWithAndCount.
zipCount :: [a] -> [(a, Int)]
zipCount xs =
let (count, zipped) = zipWithAndCount count xs
in zipped
zipWithAndCount :: Num n => b -> [a] -> (n, [(a, b)])
zipWithAndCount y [] = (0, [])
zipWithAndCount y (x:xs) =
let (count', zipped') = zipWithAndCount y xs
in (count' + 1, (x, y):zipped')
Running this example makes a list where each item holds the total count of items in the list:
> zipCount ['a'..'e']
[('a',5),('b',5),('c',5),('d',5),('e',5)]
This idea can be applied to Ukkonen's algorithm by passing in the #s that aren't known until the entire result is known.
The general idea of recursively passing a result into a function is called a least fixed point, and is implemented in Data.Function by
fix :: (a -> a) -> a
fix f = let x = f x in x
We can write zipCount in points-free style in terms of zipWithAndCount and fix.
import Data.Function
zipCount :: [a] -> [(a, Int)]
zipCount = snd . fix . (. fst) . flip zipWithAndCount
EDIT: while I'm still interested in an answer about the problems the execution faces in this case, it appears that it was indeed related to strictness, since compiling with -O fixes the execution and the program handles the tree really quickly.
I'm currently working on the 67th problem of Project Euler.
I already solved it using simple lists and dynamic programming.
I'd like to solve it now using a tree data structure (well, one where a Node can have two parents, so it's not really a tree). I thought I'd use a simple tree but would take care to craft it so that Nodes are shared when appropriate:
data Tree a = Leaf a | Node a (Tree a) (Tree a) deriving (Show, Eq)
Solving the problem is then just a matter of going through the tree recursively:
calculate :: (Ord a, Num a) => Tree a -> a
calculate (Node v l r) = v + (max (calculate l) (calculate r))
calculate (Leaf v) = v
Obviously this has exponential time complexity though. So I tried to memoize the results with :
calculate :: (Ord a, Num a) => Tree a -> a
calculate = memo go
where go (Node v l r) = v + (max (calculate l) (calculate r))
go (Leaf v) = v
where memo comes from Stable Memo. Stable Memo is supposed to memoize based on whether or not it has seen the exact same arguments (as in, same in memory).
So I used ghc-vis to see if my tree was correctly sharing nodes to avoid recomputation of things already computed in another branch.
On a sample tree produced by my function, lists2tree [[1], [2, 3], [4, 5, 6]], it shows the following correct sharing:
[ghc-vis heap visualization omitted; source: crydee.eu]
Here we can see that the node 5 is shared.
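In textual form, the sharing that ghc-vis reveals corresponds roughly to the following (a sketch; the let bindings are just labels for the shared heap objects):
sharedExample :: Tree Int
sharedExample =
  let leaf4 = Leaf 4
      leaf5 = Leaf 5            -- this one object is referenced by both nodes below
      leaf6 = Leaf 6
      node2 = Node 2 leaf4 leaf5
      node3 = Node 3 leaf5 leaf6
  in  Node 1 node2 node3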
Yet it seems that my tree in the actual Euler Problem isn't getting memoized correctly.
The code is available on github, but I guess that apart from the calculate method above, the only other important method is the one that creates the tree. Here it is:
lists2tree :: [[a]] -> Tree a
lists2tree = head . l2t
l2t :: [[a]] -> [Tree a]
l2t (xs:ys:zss) = l2n xs ts t
where (t:ts) = l2t (ys:zss)
l2t (x:[]) = l2l x
l2t [] = undefined
l2n :: [a] -> [Tree a] -> Tree a -> [Tree a]
l2n (x:xs) (y:ys) p = Node x p y:l2n xs ys y
l2n [] [] _ = []
l2n _ _ _ = undefined
l2l :: [a] -> [Tree a]
l2l = map (\l -> Leaf l)
It basically goes through the list of lists two rows at a time and then creates nodes from bottom to top recursively.
What is wrong with this approach? I thought it might be that the program still builds the complete tree as thunks before getting to the leaves, and hence before memoizing, losing all the benefits of memoization, but I'm not sure that's the case. If it is, is there a way to fix it?
This doesn't really address the original question, but I find it is usually easier and more powerful to use explicit memoization.
I chose to store the triangle as a list indexed by a position rather than a tree:
[ ((1,1),3),
((2,1),7), ((2,2), 4),
....
Suppose that part of the result has already been memoized in a list of this format. Then computing the answer at a particular coordinate is trivial:
a # i = let Just v = lookup i a in v
compute tree result (x,y) = tree # (x,y) + max (result # (x+1,y)) (result # (x+1,y+1))
Now we must build result. This is also trivial; all we have to do is map compute over all valid indices.
euler67 :: [((Int, Int), Integer)] -> Integer
euler67 tree = result # (1,1)
where
xMax = maximum $ map (fst . fst) tree
result = [ ((x,y), compute (x,y)) | x <- [1 .. xMax], y <- [1..x] ]
++ [ ((xMax + 1,y),0) | y <- [1..xMax + 1]]
compute (x,y) = tree # (x,y) + max (result # (x+1,y)) (result # (x+1,y+1))
Computing height of the triangle (xMax) is just getting the maximum x-index. Of course we are assuming that the tree is well formed.
The only remotely complicated part is determining which indices are valid for result. Obviously we need 1 row for every row in the original tree. Row x will have x items. We also add an extra row of zeroes at the bottom - we could handle the base case in a special way in compute but it is probably easier this way.
You'll notice that this is quite slow for the hundred-row triangle. This is because each call to compute does three linear list lookups. To speed it up I used arrays:
import Data.Array

euler67' :: Array (Int, Int) Integer -> Integer
euler67' tree = result ! (1,1)
where
((xMin, yMin), (xMax, yMax)) = bounds tree
result = accumArray (+) 0 ((xMin, yMin), (xMax + 1, yMax + 1)) $
[ ((x,y), compute (x,y)) | x <- [xMin .. xMax], y <- [yMin..x] ]
++ [ ((xMax + 1,y),0) | y <- [yMin..xMax + 1]]
compute (x,y) = tree ! (x,y) + max (result ! (x+1,y)) (result ! (x+1,y+1))
Also here is the code I used for reading the files:
readTree' :: String -> IO (Array (Int, Int) Integer)
readTree' path = do
tree <- readTree path
let
xMax = maximum $ map (fst . fst) tree
yMax = maximum $ map (snd . fst) tree
return $ array ((1,1), (xMax,yMax)) tree
readTree :: String -> IO [((Int, Int), Integer)]
readTree path = do
s <- readFile path
return $ map f $ concat $ zipWith (\n xs -> zip (repeat n) xs) [1..] $ map (zip [1..] . map read . words) $ lines s
where
f (a, (b, c)) = ((a,b), c)
I have two lists. One list contains some random data and the other list contains the indices of the first list that need to be deleted.
For example, let us consider two lists:
let a = [3,4,5,6,6,7,8]
let b = [1,3]
Then the resulting output should be [3,5,6,7,8]. The numbers 4 and 6 are deleted since they are at index positions 1 and 3 respectively.
I'm new to Haskell, so I'm finding it difficult to come up with a solution.
Update: the following code makes it work:
import Data.List
dele :: Eq a => [a] -> [Int] -> [a]
dele [] _ = []
dele x [] = x
dele x (y:ys) = dele (delete (x !! y) x) ys
I was just wondering, is there a way to solve it in a map/fold way?
deleteByIndex :: (Enum a, Eq a, Num a) => [a] -> [b] -> [b]
deleteByIndex r = map snd . filter (\(i, _) -> notElem i r) . zip [0..]
[0..] produces an infinite list [0, 1, 2, 3, ...]
zip constructs a list of pairs with the values of this list and your input list in the form [(0,x), (1, y), ...]
filter takes a function a -> Bool. The lambda checks if the index (first element of the pair) is in your input list r.
map snd returns the second element of each pair of the zip list.
zip, filter, map and notElem are documented here.
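Applied to the example from the question (a quick ghci sketch):
> deleteByIndex [1,3] [3,4,5,6,6,7,8]
[3,5,6,7,8]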
Off the top of my head:
removeByIndex :: [Integer] -> [a] -> [a]
removeByIndex indices = map snd . filter notInIndices . zip [0..]
where notInIndices (i,_) = i `notElem` indices
An alternative answer using the lens library, which has received considerable attention recently:
import Control.Lens
>let a = [3,4,5,6,6,7,8]
>let b = [1,3]
>a^..elements (`notElem`b)
[3,5,6,7,8]
(^..) is just the infix form of toListOf, which can be used to traverse a structure and make a list out of its parts. The elements function just lets you choose which ones to include.
Other options are traverse to traverse a Traversable, both to traverse a (,), and they compose together with (.), so traverse.both would traverse [(1,2), (3,4)], for example.
[(1,2), (3,4)]^..traverse.both
[1,2,3,4]
I'm having a problem creating a function similar to the nub function.
I need this function to remove duplicated elements from a list.
An element is a duplicate when two elements have the same email, and the function should keep the newer one (the one closer to the end of the list).
type Regist = [name, email, ..., date]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
| otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.
Just use nubBy, and specify an equality function that compares things the way you want.
And I guess reverse the list a couple of times if you want to keep the last element instead of the first.
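A minimal sketch of that suggestion, assuming (as in the isDup above) that the e-mail is the second field of a Regist:
import Data.Function (on)
import Data.List (nubBy)
rmDup :: ListRe -> ListRe
rmDup = reverse . nubBy ((==) `on` (!! 1)) . reverse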
Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]
Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.
That said, you want a function rem which accepts a list xs and a function f :: a -> a -> Bool (on which elements are compared for removal from xs). Since the definition is recursive, you need a base case and a recursive case.
In the base case xs = [] and rem f xs = [], since the result of removing all duplicate elements from [] is []:
rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []
In the recursive case, xs = (a:as). Let as' be the list obtained by removing all elements a' such that f a a' = True from the list as. This is simply the function filter (\a' -> not $ f a a') applied to the list as. Then rem f (a:as) is the result of recursively calling rem f on as', that is, a : rem f as':
rem f (a:as) = a : rem f (filter (\a' -> not $ f a a') as)
Replace f with a function comparing your list elements for the appropriate equality (e-mail addresses).
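For the record layout in the question, the call might look like this (a hypothetical usage: it compares on the second field, the e-mail, and the double reverse keeps the newest occurrence; in a real module you would also need import Prelude hiding (rem), since the name clashes with Prelude's rem):
rmDupByEmail :: ListRe -> ListRe
rmDupByEmail = reverse . rem (\r1 r2 -> r1 !! 1 == r2 !! 1) . reverse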
While nubBy with two reverses is probably the best among the simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't ideal in terms of efficiency - after all, nubBy is O(n^2) (in the "worst case", when there are no duplicates). The two reverses will also take their toll (in the form of memory allocation).
For a more efficient implementation, Data.Set (O(log n) inserts) can be used as an intermediate "latest non-duplicate element" holder (Set.insert replaces the older element with the newer one if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
(==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
where
set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
While nubBy implementation is definitely much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
On a 10M-element list (with lots of duplication - nub should play nice here) there is a 3x difference in running time and a 700x difference in memory usage. Compiled with GHC with -O2:
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).
(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.
Why not put everything in a Map from email to Regist (of course respecting your "keep the newest" rule), and then transform the values of the map back into a list? That's the most efficient way I can think of.
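A sketch of that Map-based idea (rmDupMap is an illustrative name; the e-mail is assumed to be the second field, and Data.Map's fromList keeps the last value for a duplicate key, i.e. the newest record; note that the result comes back ordered by e-mail rather than in the original list order):
import qualified Data.Map as M
rmDupMap :: ListRe -> ListRe
rmDupMap = M.elems . M.fromList . map (\r -> (r !! 1, r))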
I used Alexei Polkhanov's answer and came to the following, so you can remove duplicates from lists whose element type is an instance of the Eq class.
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]