Linking in tree structures

While working with long strings, I ran into a rather big problem creating suffix trees in Haskell.
Some construction algorithms (such as this version of Ukkonen's algorithm) require establishing links between nodes. These links "point" to a node in the tree. In imperative languages such as Java or C#, this is no problem because of reference types.
Are there ways of emulating this behaviour in Haskell? Or is there a completely different alternative?

By tying a recursive knot, you can use a value that isn't known until a computation has finished while constructing the data produced by that same computation.
The following computation builds a list of values that each hold the total number of items in the list even though the total is computed by the same function that's building the list. The let binding in zipCount passes one of the results of zipWithAndCount as the first argument to zipWithAndCount.
zipCount :: [a] -> [(a, Int)]
zipCount xs =
    let (count, zipped) = zipWithAndCount count xs
    in zipped
zipWithAndCount :: Num n => b -> [a] -> (n, [(a, b)])
zipWithAndCount y [] = (0, [])
zipWithAndCount y (x:xs) =
    let (count', zipped') = zipWithAndCount y xs
    in (count' + 1, (x, y):zipped')
Running this example makes a list in which each item holds the total number of items in the list:
> zipCount ['a'..'e']
[('a',5),('b',5),('c',5),('d',5),('e',5)]
This idea can be applied to Ukkonen's algorithm by passing in the numbers that aren't known until the entire result is known.
The general idea of recursively passing a result into a function is called a least fixed point, and is implemented in Data.Function by
fix :: (a -> a) -> a
fix f = let x = f x in x
We can write zipCount in point-free style in terms of zipWithAndCount and fix:
import Data.Function
zipCount :: [a] -> [(a, Int)]
zipCount = snd . fix . (. fst) . flip zipWithAndCount
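To see the same knot-tying trick applied to links between tree nodes, here is a minimal sketch (not Ukkonen's algorithm itself). The types Node and Spec and the functions build and pathToRoot are hypothetical names chosen for illustration: every node is handed, as a not-yet-evaluated value, to the construction of its own children, so each child's parent field refers back to the very node being built.

```haskell
-- Hypothetical node type: each node keeps a link back to its parent.
data Node = Node
  { label    :: String
  , parent   :: Maybe Node
  , children :: [Node]
  }

-- Hypothetical plain description of the tree we want to build.
data Spec = Spec String [Spec]

-- Tie the knot: 'node' is defined in terms of itself, because it is
-- passed (lazily) as the parent of its own children.
build :: Maybe Node -> Spec -> Node
build up (Spec l subs) = node
  where
    node = Node l up (map (build (Just node)) subs)

-- Follow parent links from a node up to the root.
pathToRoot :: Node -> [String]
pathToRoot n = label n : maybe [] pathToRoot (parent n)

exampleSpec :: Spec
exampleSpec = Spec "root" [Spec "a" [Spec "a1" []], Spec "b" []]
```

With root = build Nothing exampleSpec, pathToRoot (head (children (head (children root)))) evaluates to ["a1","a","root"]. A derived Show instance for Node would never finish printing, because the structure is cyclic. The approach fits links whose targets can be described in terms of the finished structure, which is the situation described above.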

Related

Directly generating specific subsets of a powerset?

Haskell's expressiveness enables us to rather easily define a powerset function:
import Control.Monad (filterM)
powerset :: [a] -> [[a]]
powerset = filterM (const [True, False])
For my task it is crucial that the powerset be sorted by a specific function, so my implementation looks roughly like this:
import Data.List (sortBy)
import Data.Ord (comparing)
powersetBy :: Ord b => ([a] -> b) -> [a] -> [[a]]
powersetBy f = sortBy (comparing f) . powerset
Now my question is whether there is a way to generate only a subset of the powerset, given a specific start and end point where f(start) < f(end) and |start| < |end|. For example, my parameter is a list of integers ([1,2,3,4,5]) and the subsets are sorted by their sum. Now I want to extract only the subsets in a given range, let's say 3 to 7. One way to achieve this would be to filter the powerset to include only my range, but this seems (and is) inefficient when dealing with larger sets:
badFunction :: Ord b => b -> b -> ([a] -> b) -> [a] -> [[a]]
badFunction start end f = filter (\x -> f x >= start && f x <= end) . powersetBy f
badFunction 3 7 sum [1,2,3,4,5] produces [[1,2],[3],[1,3],[4],[1,4],[2,3],[5],[1,2,3],[1,5],[2,4],[1,2,4],[2,5],[3,4]].
So my question is whether there is a way to generate this list directly, without having to generate all 2^n subsets first. That would improve performance drastically, since the subsets would be generated "on the fly" instead of all of them being checked.

If you want to allow for completely general ordering functions, then there can't be a way around checking all elements of the powerset. (After all, how would you know there isn't a special clause built in that gives, say, the particular set [6,8,34,42] a completely different ranking from its neighbours?)
However, you could already make the algorithm drastically faster by
1. Only sorting after filtering: sorting is O(n · log n), so you want to keep n low here; for the O(n) filtering step it matters less. (And anyway, the number of elements doesn't change through sorting.)
2. Applying the ordering function only once to each subset.
So
import Control.Arrow ((&&&))
lessBadFunction :: Ord b => (b,b) -> ([a]->b) -> [a] -> [[a]]
lessBadFunction (start,end) f
    = map snd . sortBy (comparing fst)
    . filter (\(k,_) -> k>=start && k<=end)
    . map (f &&& id)
    . powerset
Basically, let's face it: powersets of anything but a very small basis are infeasible. The particular application “sum in a certain range” is pretty much a packing problem; there are quite efficient ways to do that kind of thing, but you'll have to give up the idea of perfect generality and of quantification over general subsets.
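For the specific “sum in a certain range” case, here is a minimal sketch of generating the subsets directly instead of filtering the whole powerset. It assumes all elements are non-negative integers, so a partial sum can only grow and any branch whose sum already exceeds the upper bound can be abandoned; the name subsetsWithSumIn is hypothetical. It still explores every subset whose sum stays at or below the upper bound, so it prunes the search rather than making it polynomial.

```haskell
import Data.List (sortOn)

-- Hypothetical helper: all subsets of xs whose sum lies in [lo, hi],
-- sorted by their sum. Assumes every element of xs is non-negative.
subsetsWithSumIn :: Int -> Int -> [Int] -> [[Int]]
subsetsWithSumIn lo hi xs = sortOn sum (go 0 xs)
  where
    -- go s ys: subsets of ys that, added to the running sum s, land in [lo, hi].
    -- Including y is tried before excluding it, the same order as
    -- filterM (const [True, False]).
    go s [] = [[] | s >= lo, s <= hi]
    go s (y:ys)
      | s > hi    = []  -- with non-negative elements the sum can only grow
      | otherwise = withY ++ go s ys
      where
        withY
          | s + y <= hi = map (y:) (go (s + y) ys)
          | otherwise   = []
```

sortOn evaluates sum only once per subset and is stable. Because the subsets are produced in the same relative order as the powerset, subsetsWithSumIn 3 7 [1,2,3,4,5] gives the same list as badFunction 3 7 sum [1,2,3,4,5] above.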

Since your problem is essentially a constraint satisfaction problem, using an external SMT solver might be the better alternative here, assuming you can afford the extra IO in the type and the need to have such a solver installed. The SBV library allows construction of such problems. Here's one encoding:
import Data.SBV
-- c is the cost type
-- e is the element type
pick :: (Num e, SymWord e, SymWord c) => c -> c -> ([SBV e] -> SBV c) -> [e] -> IO [[e]]
pick begin end cost xs = do
    solutions <- allSat constraints
    return $ map extract $ extractModels solutions
  where extract ts = [x | (t, x) <- zip ts xs, t]
        constraints = do tags <- mapM (const free_) xs
                         let tagged    = zip tags xs
                             finalCost = cost [ite t (literal x) 0 | (t, x) <- tagged]
                         solve [finalCost .>= literal begin, finalCost .<= literal end]
test :: IO [[Integer]]
test = pick 3 7 sum [1,2,3,4,5]
We get:
Main> test
[[1,2],[1,3],[1,2,3],[1,4],[1,2,4],[1,5],[2,5],[2,3],[2,4],[3,4],[3],[4],[5]]
For large lists, this technique will beat generating all subsets and filtering, assuming the cost function generates reasonable constraints. (Addition will typically be OK; if you have multiplications, the backend solver will have a harder time.)
(As a side note, you should never use filterM (const [True, False]) to generate power sets in the first place! While that expression is cute and fun, it is extremely inefficient!)
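The side note does not say what to use instead. One alternative (our suggestion, not the answerer's) is a direct recursive definition that computes the power set of the tail once and reuses it for both the "with x" and "without x" halves. It produces the subsets in the same order as filterM (const [True, False]), so it could replace powerset in powersetBy above. The name powerset' is hypothetical, and any speed-up should be measured rather than assumed. Data.List.subsequences is another option, though it lists the subsets in a different order, which only matters for ties after a stable sort.

```haskell
-- Power set by explicit recursion: 'rest' is computed once and shared by
-- the subsets that contain x and the subsets that do not.
powerset' :: [a] -> [[a]]
powerset' [] = [[]]
powerset' (x:xs) = map (x:) rest ++ rest
  where
    rest = powerset' xs
```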

Stuck - Practice exam Q for Haskell coding: Return the longest String in a list of Strings

The full practice exam question is:
Using anonymous functions and mapping functions, define Haskell functions which return the longest String in a list of Strings, e.g. for ["qw", "asd", "fghj", "kl"] the function should return "fghj".
I tried doing this and keep failing and moving on to other questions, but I would really like to know how to tackle this. It seems I have to use mapping functions and anonymous functions, but I don't know how to write code that compares each element with the others to find the longest one.
I know that using a mapping function like foldr lets you perform a repeated operation on each element and return one result, which is what we want to do with this question (check each String in the list of Strings for the longest, then return one string).
But I don't know how to use foldr to compare elements and see which is the "longest"... Any help will be gladly appreciated.
So far I've just been testing if I can even use foldr to test the length of each element but it doesn't even work:
longstr :: [String] -> String
longstr lis = foldr (\n -> length n > 3) 0 lis
I'm quite new to Haskell: this is a 3-month course, it's only been 1 month, and we have a small exam coming up.

I'd say they're looking for a simple solution:
longstr xs = foldr (\x acc -> if length x > length acc then x else acc) "" xs
foldr is like a loop that iterates over every element of the list xs. The anonymous function it is given receives 2 arguments: x is the current element, and acc (for accumulator) is in this case the longest string so far.
In the condition, if the element is longer than the longest string so far we take the element; otherwise we keep the accumulator.
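To make the explanation concrete, here is the same fold with the anonymous function given a (hypothetical) name, step, together with how foldr unfolds on the example list from the question.

```haskell
-- Same definition as above, with the lambda named for readability.
longstr :: [String] -> String
longstr xs = foldr step "" xs
  where
    -- x: current element, acc: longest string among the elements to its right
    step x acc = if length x > length acc then x else acc

-- foldr step "" ["qw", "asd", "fghj", "kl"]
--   = step "qw" (step "asd" (step "fghj" (step "kl" "")))
--   = step "qw" (step "asd" (step "fghj" "kl"))   -- "kl" is longer than ""
--   = step "qw" (step "asd" "fghj")               -- "fghj" is longer than "kl"
--   = step "qw" "fghj"                            -- "asd" is not longer than "fghj"
--   = "fghj"                                      -- "qw" is not longer than "fghj"
```

Because the comparison is a strict >, ties keep the accumulator, so among equally long strings the one nearest the end of the list wins.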

Another idea:
1. Convert to a list of tuples: (length, string).
2. Take the maximum of that list (which is some pair).
3. Return the string of the pair returned by (2).
Haskell will compare pairs (a,b) lexicographically, so the pair returned by (2) will come from the string with the largest length.
Now you just have to write a maximum function:
maximum :: Ord a => [a] -> a
and this can be written using foldr (or just plain recursion.)
To write the maximum function using recursion, fill in the blanks:
maximum [a] = ??? -- maximum of a single element
maximum (a:as) = ??? -- maximum of a value a and a list as (hint: use recursion)
The base case for maximum begins with a single element list since maximum [] doesn't make sense here.
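The three steps above can also be written with the Prelude's own maximum in place of a hand-written one (this skips the exercise of writing maximum yourself). The name longest is hypothetical.

```haskell
-- 1. pair each string with its length, 2. take the maximum pair,
-- 3. return the string part of that pair.
longest :: [String] -> String
longest = snd . maximum . map (\s -> (length s, s))
```

Since pairs are compared lexicographically, two strings of the same length are ordered by the strings themselves, so ties go to the alphabetically greatest one; like maximum, it fails on an empty list.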

You can map the list to a list of tuples, consisting of (length, string). Sort by length (largest first) and return the string of the first element.
https://stackoverflow.com/a/9157940/127059 has an answer as well.

Here's an example of building what you want from the bottom up.
maxBy :: Ord b => (a -> b) -> a -> a -> a
maxBy f x y = case compare (f x) (f y) of
    LT -> y
    _  -> x
maximumBy :: Ord b => (a -> b) -> [a] -> Maybe a
maximumBy _ [] = Nothing
maximumBy f l = Just . fst $ foldr1 (maxBy snd) pairs
  where
    pairs = map (\e -> (e, f e)) l
testData :: [String]
testData = ["qw", "asd", "fghj", "kl"]
test :: Maybe String
test = maximumBy length testData
main :: IO ()
main = print test

How can I iterate over a string without recursion?

isTogether' :: String -> Bool
isTogether' (x:xs) = isTogether (head xs) (head (tail xs))
For the above code, I want to go through every character in the string. I am not allowed to use recursion.

isTogether' (x:xs) = isTogether (head xs) (head (tail xs))
If I've got it right, you are interested in getting consecutive char pairs from some string. So, for example, for abcd you need to test (a,b), (b,c), (c,d) with some (Char,Char) -> Bool or Char -> Char -> Bool function.
Zip could be helpful here:
> let x = "abcd"
> let pairs = zip x (tail x)
it :: [(Char, Char)]
And for some f :: Char -> Char -> Bool function we can get uncurry f :: (Char, Char) -> Bool.
And then it's easy to get [Bool] value of results with map (uncurry f) pairs :: [Bool].
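Putting the pieces together, here is a small sketch. The names checkPairs and hasAdjacentDuplicate are hypothetical, and (==) stands in for whatever test isTogether performs, since the question doesn't say. zipWith f s t is the same as map (uncurry f) (zip s t), and drop 1 is used instead of tail so that the empty string doesn't cause an error.

```haskell
-- Apply a pairwise test to every pair of consecutive characters,
-- without explicit recursion.
checkPairs :: (Char -> Char -> Bool) -> String -> [Bool]
checkPairs f s = zipWith f s (drop 1 s)

-- Example use with (==) as the pairwise test (an assumption for illustration):
-- does the string contain two equal adjacent characters?
hasAdjacentDuplicate :: String -> Bool
hasAdjacentDuplicate = or . checkPairs (==)
```

For instance, checkPairs (<) "abcd" is [True,True,True], and hasAdjacentDuplicate "hello" is True.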

In Haskell, a String is just a list of characters ([Char]). Thus, all of the normal higher-order list functions like map work on strings. So you can use whichever higher-order function is most applicable to your problem.
Note that these functions themselves are defined recursively; in fact, there is no way to go through the entire list in Haskell without either recursing explicitly or using a function that directly or indirectly recurses.

To do this without recursion, you will need to use a higher order function or a list comprehension. I don't understand what you're trying to accomplish so I can only give generic advice. You probably will want one of these:
map :: (a -> b) -> [a] -> [b]
Map converts a list of one type into another. Using map lets you perform the same action on every element of the list, given a function that operates on the kinds of things you have in the list.
filter :: (a -> Bool) -> [a] -> [a]
Filter takes a list and a predicate, and gives you a new list with only the elements that satisfy the predicate. Just with these two tools, you can do some pretty interesting things:
import Data.Char
map toUpper (filter isLower "A quick test") -- => "QUICKTEST"
Then you have folds of various sorts. A fold is really a generic higher order function for doing recursion on some type, so using it takes a bit of getting used to, but you can accomplish pretty much any recursive function on a list with a fold instead. The basic type of foldr looks like this:
foldr :: (a -> b -> b) -> b -> [a] -> b
It takes three arguments: an inductive step, a base case and a value you want to fold. Or, in less mathematical terms, you could think of it as taking an initial state, a function to take the next item and the previous state to produce the next state, and the list of values. It then returns the final state it arrived at. You can do some pretty surprising things with fold, but let's say you want to detect if a list has a run of two or more of the same item. This would be hard to express with map and filter (impossible?), but it's easy with recursion:
hasTwins :: (Eq a) => [a] -> Bool
hasTwins (x:y:xs) | x == y = True
hasTwins (x:y:xs) | otherwise = hasTwins (y:xs)
hasTwins _ = False
Well, you can express this with a fold like so:
hasTwins :: (Eq a) => [a] -> Bool
hasTwins [] = False
hasTwins (x:xs) = snd $ foldl step (x, False) xs
  where
    -- foldl walks left to right, so prev is the element just before x
    step (prev, seenTwins) x = (x, prev == x || seenTwins)
So my "state" in this fold is the previous value and whether we've already seen a pair of identical values. The function has no explicit recursion, but my step function passes the current x value along to the next invocation through the state, where it becomes the previous value. You also don't have to return the final state as-is: this function takes the second value out of the state and returns that as the overall result, which is the boolean saying whether or not we've seen two identical values next to each other.

Compute Most Frequent Occurrence of Numbers of A Sorted List in Haskell

The question is to compute the mode (the value that occurs most frequently) of a sorted list of integers.
[1,1,1,1,2,2,3,3] -> 1
[2,2,3,3,3,3,4,4,8,8,8,8] -> 3 or 8
[3,3,3,3,4,4,5,5,6,6] -> 3
Only the Prelude may be used.
Are the functions filter, map, foldr in Prelude library?

Starting from the beginning.
You want to make a pass through a sequence and get the maximum frequency of an integer.
This sounds like a job for fold, as fold goes through a sequence aggregating a value along the way before giving you a final result.
foldl :: (a -> b -> a) -> a -> [b] -> a
The type of foldl is shown above. We can fill in some of that already (I find that helps me work out what types I need)
foldl :: (a -> Int -> a) -> a -> [Int] -> a
We need to fold something through that to get the value. We have to keep track of the current run and its count, as well as the best run found so far:
data BestRun = BestRun {
    currentNum      :: Int,
    occurrences     :: Int,
    bestNum         :: Int,
    bestOccurrences :: Int
  }
So now we can fill in a bit more:
foldl :: (BestRun -> Int -> BestRun) -> BestRun -> [Int] -> BestRun
So we want a function that does the aggregation
f :: BestRun -> Int -> BestRun
f (BestRun current occ best bestOcc) x
  | x == current  = (BestRun current (occ + 1) best bestOcc) -- continuing current sequence
  | occ > bestOcc = (BestRun x 1 current occ)                -- a new best sequence
  | otherwise     = (BestRun x 1 best bestOcc)               -- new sequence
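So now we can write the function using foldl. Because f only compares a run with the best one when the next run starts, the last run still has to be checked after the fold:
bestRun :: [Int] -> Int
bestRun xs = finish (foldl f (BestRun 0 0 0 0) xs)
  where
    -- the final run was never compared by f, so compare it here
    finish (BestRun current occ best bestOcc)
      | occ > bestOcc = current
      | otherwise     = best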

Are the functions filter, map, foldr in Prelude library?
Stop...Hoogle time!
Did you know Hoogle tells you which module a function is from? Hoogling map results in this information on the search page:
map :: (a -> b) -> [a] -> [b]
base Prelude, base Data.List
This means map is defined both in Prelude and in Data.List. You can hoogle the other functions and likewise see that they are indeed in Prelude.
You can also look at Haskell 2010 > Standard Prelude or the Prelude hackage docs.
So we are allowed to use map, filter, and foldr, as well as anything else in Prelude. That's good. Let's start with Landei's idea, to turn the list into a list of lists.
groupSorted :: [a] -> [[a]]
groupSorted = undefined
-- groupSorted [1,1,2,2,3,3] ==> [[1,1],[2,2],[3,3]]
How are we supposed to implement groupSorted? Well, I dunno. Let's think about that later. Pretend that we've implemented it. How would we use it to get the correct solution? I'm assuming it is OK to choose just one correct solution, in the event that there is more than one (as in your second example).
mode :: [a] -> a
mode xs = doSomething (groupSorted xs)
  where doSomething :: [[a]] -> a
        doSomething = undefined
        -- doSomething [[1],[2],[3,3]] ==> 3
-- mode [1,2,3,3] ==> 3
We need to do something after we use groupSorted on the list. But what? Well...we should find the longest list in the list of lists. Right? That would tell us which element appears the most in the original list. Then, once we find the longest sublist, we want to return the element inside it.
chooseLongest :: [[a]] -> a
chooseLongest xs = head $ chooseBy (\ys -> length ys) xs
  where chooseBy :: Ord b => ([a] -> b) -> [[a]] -> [a]
        chooseBy f zs = undefined
        -- chooseBy length [[1],[2],[3,3]] ==> [3,3]
-- chooseLongest [[1],[2],[3,3]] ==> 3
chooseLongest is the doSomething from before. The idea is that we want to choose the best list in the list of lists xs, and then take one of its elements (its head does just fine). I defined this by creating a more general function, chooseBy, which uses a function (in this case, we use the length function) to determine which choice is best.
Now we're at the "hard" part. Folds. chooseBy and groupSorted are both folds. I'll step you through groupSorted, and leave chooseBy up to you.
How to write your own folds
We know groupSorted is a fold, because it consumes the entire list, and produces something entirely new.
groupSorted :: [Int] -> [[Int]]
groupSorted xs = foldr step start xs
  where step :: Int -> [[Int]] -> [[Int]]
        step = undefined
        start :: [[Int]]
        start = undefined
We need to choose an initial value, start, and a stepping function step. We know their types because the type of foldr is (a -> b -> b) -> b -> [a] -> b, and in this case, a is Int (because xs is [Int], which lines up with [a]), and the b we want to end up with is [[Int]].
Now remember, foldr will inspect the elements of the list one by one and use step to fuse them into an accumulator. I will call the currently inspected element v, and the accumulator acc.
step v acc = undefined
Remember, in theory, foldr works its way from right to left. So suppose we have the list [1,2,3,3]. Let's step through the algorithm, starting with the rightmost 3 and working our way left.
step 3 start = [[3]]
Whatever start is, when we combine it with 3 it should end up as [[3]]. We know this because if the original input list to groupSorted were simply [3], then we would want [[3]] as a result. However, it isn't just [3]. Let's pretend now that it's just [3,3]. [[3]] is the new accumulator, and the result we would want is [[3,3]].
step 3 [[3]] = [[3,3]]
What should we do with these inputs? Well, we should tack the 3 onto that inner list. But what about the next step?
step 2 [[3,3]] = [[2],[3,3]]
In this case, we should create a new list with 2 in it.
step 1 [[2],[3,3]] = [[1],[2],[3,3]]
Just like last time, in this case we should create a new list with 1 inside of it.
At this point we have traversed the entire input list, and have our final result. So how do we define step? There appear to be two cases, depending on a comparison between v and acc.
step v acc@((x:xs):xss) | v == x    = (v:x:xs) : xss
                        | otherwise = [v] : acc
In one case, v is the same as the head of the first sublist in acc. In that case we prepend v to that same sublist. But if such is not the case, then we put v in its own list and prepend that to acc. So what should start be? Well, it needs special treatment; let's just use [] and add a special pattern match for it.
step elem [] = [[elem]]
start = []
And there you have it. All you have to do to write your own fold is determine what start and step are, and you're done. With some cleanup and eta reduction:
groupSorted :: Eq a => [a] -> [[a]]
groupSorted = foldr step []
  where step v [] = [[v]]
        step v acc@((x:xs):xss)
          | v == x    = (v:x:xs) : xss
          | otherwise = [v] : acc
This may not be the most efficient solution, but it works, and if you later need to optimize, you at least have an idea of how this function works.
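For reference, here is one way to assemble the whole thing using only the Prelude, as the question requires. chooseBy is filled in with foldr1, which is our own choice since the answer leaves it as an exercise; the other definitions follow the ones developed above, with step written so that it covers every pattern. Like the answer, it assumes a non-empty, sorted input.

```haskell
-- Group runs of equal, adjacent elements (foldr, right to left).
groupSorted :: Eq a => [a] -> [[a]]
groupSorted = foldr step []
  where
    step v (g@(x:_) : gs) | v == x = (v : g) : gs  -- extend the current run
    step v acc                     = [v] : acc      -- start a new run

-- Pick the element with the largest score; on a tie the earlier one wins.
chooseBy :: Ord b => (c -> b) -> [c] -> c
chooseBy f = foldr1 (\a b -> if f a >= f b then a else b)

-- Mode of a non-empty sorted list (foldr1 fails on an empty list).
mode :: Eq a => [a] -> a
mode = head . chooseBy length . groupSorted
```

With this tie-breaking, mode [2,2,3,3,3,3,4,4,8,8,8,8] returns 3.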

I don't want to spoil all the fun, but a group function would be helpful. Unfortunately it is defined in Data.List, so you need to write your own. One possible way would be:
-- corrected version, see comments
grp [] = []
grp (x:xs) = let a = takeWhile (==x) xs
                 b = dropWhile (==x) xs
             in (x : a) : grp b
E.g. grp [1,1,2,2,3,3,3] gives [[1,1],[2,2],[3,3,3]]. I think from there you can find the solution yourself.

I'd try the following:
import Data.List (group)
mostFrequent :: Ord a => [a] -> a
mostFrequent = snd . foldl1 max . map mark . group
  where
    mark (a:as) = (1 + length as, a)
    mark []     = error "cannot happen" -- because made by group
Note that it works for any finite list that contains orderable elements, not just integers.

Merging an unbounded number of ordered infinite sequences

I want to generate all natural numbers together with their decomposition in prime factors, up to a certain threshold.
I came up with the following function:
vGenerate :: [a]                  -- generator set for monoid B* (Kleene star of B)
          -> (a, (a -> a -> a))   -- (identity element, generating function)
          -> (a -> Bool)          -- filter
          -> [a]                  -- B* filtered
vGenerate [] (g0,_) _ = [g0]
vGenerate (e:es) (g0,g) c =
    let coEs = vGenerate es (g0,g) c
        coE  = takeWhile (c) $ iterate (g e) g0
    in concatMap (\m -> takeWhile (c) $ map (g m) coE) coEs
gen then generates all natural numbers together with their prime factors:
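gen threshold =
    let b = map (\x -> (x,[x])) $ takeWhile (<= threshold) primes
        condition = (<= threshold) . fst
        g0 = (1,[])
        g = \(n,nl) (m,ml) -> ((n*m), nl ++ ml)
    in vGenerate b (g0,g) condition
-- a simple trial-division list of primes
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
  where isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p*p <= n) primes)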
I have the following questions:
It is not always known in advance how many numbers we will need. Can we modify vGenerate such that it starts with a lazy infinite list of primes, and generates all the factorizations in increasing order? The challenge is that we have an infinite list of primes, for each prime an infinite list of powers of that prime number, and then have to take all possible combinations. The lists are naturally ordered by increasing first element, so they could be generated lazily.
I documented vGenerate in terms of monoids, with the intention of keeping it as abstract as possible, but perhaps this just obfuscates the code? I want to generalize it later (more as an exercise than for real usage), e.g. for generating raster points within certain constraints, which can also be put in the monoid context, so I thought it was a good start to get rid of all references to the problem space (in casu: primes). But I feel that the filtering function does not fit well in the abstraction: the generation must happen in an order that is monotonic for the metric tested by c, because recursion is terminated as soon as c is not satisfied. Any advice?

Have a look at mergeAll :: Ord a => [[a]] -> [a] from the data-ordlist package (module Data.List.Ordered). It merges an unbounded number of infinite sequences, as long as each sequence is ordered and the heads of the sequences are ordered. I've used it for similar problems before, for example to generate all numbers of the form 2^i*3^j.
> let numbers = mergeAll [[2^i*3^j | j <- [0..]] | i <- [0..]]
> take 20 numbers
[1,2,3,4,6,8,9,12,16,18,24,27,32,36,48,54,64,72,81,96]
You should be able to extend this to generate all numbers with their factorizations.
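Here is a minimal sketch of that extension that does not depend on data-ordlist. mergeAll' below is our own stand-in (a hypothetical name, not the library function) that relies on the same precondition: every inner list is ascending and the heads are ascending, so the first head can always be emitted straight away. For the factorizations it uses the fact that the numbers whose smallest prime factor is q are exactly q times the numbers whose prime factors are all at least q; merging those lists over all primes q, after 1, yields every natural number with its factorization in increasing order, lazily and without a threshold. The names smoothOver and factorizations are also hypothetical.

```haskell
import Data.List (tails)

-- Merge two ascending lists (duplicates are kept).
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xa@(x:xs) ya@(y:ys)
  | y < x     = y : merge xa ys
  | otherwise = x : merge xs ya

-- Merge a possibly infinite list of ascending lists whose heads are ascending.
-- The first head is no larger than anything that follows, so it can be
-- emitted before the remaining lists are inspected.
mergeAll' :: Ord a => [[a]] -> [a]
mergeAll' [] = []
mergeAll' ([] : xss) = mergeAll' xss
mergeAll' ((x:xs) : xss) = x : merge xs (mergeAll' xss)

-- Infinite list of primes by trial division (simple, not fast).
primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- All numbers whose prime factors come from the ascending list ps, paired
-- with their (ascending) factorization, in increasing order.
smoothOver :: [Integer] -> [(Integer, [Integer])]
smoothOver ps = (1, []) : mergeAll' [map (scale q) (smoothOver qs) | qs@(q:_) <- tails ps]
  where
    scale q (n, fs) = (q * n, q : fs)

-- Every natural number >= 1 with its prime factorization.
factorizations :: [(Integer, [Integer])]
factorizations = smoothOver primes
```

take 8 factorizations gives [(1,[]),(2,[2]),(3,[3]),(4,[2,2]),(5,[5]),(6,[2,3]),(7,[7]),(8,[2,2,2])]. This version recomputes the inner lists instead of sharing them, so it is meant to show the idea rather than to be fast.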
