Haskell Tree construction - string

Need some help writing a Haskell function that takes a string and creates a binary tree. Need some help from someone with a little better Haskell experience to fill in some holes for me and describe why because this is a learning experience for me.
I'm given a tree encoded in a single string for a project in Haskell (Example **B**DECA). The star denotes a node any other character denotes a Leaf. I'm trying to fill this data structure with the information read in from input.
data Trie = Leaf Char | Branch Trie Trie
I'm more of a math and imperative programming guy so I made the observation that I can define a subtree by parsing from left to right. A proper tree will have 1 more character than *. Mathematically I would think of a recursive structure.
If the first character is not a *, the solution is a Leaf holding that character. Otherwise the solution is a Branch whose subbranches are fed back into the function: the left Branch is the shortest prefix in which characters outnumber *'s, and the right Branch is everything else.
constructTrie :: String -> Trie
constructTrie x = if x !! 1 == '*' then
let leftSubtree = (first time in drop 1 x where characters out number *'s)
rightSubtree = (rest of the characters in the drop 1 x)
in Branch constructTrie(leftSubtree) constructTrie(rightSubtree)
else Leaf x !! 1
Mostly I need help defining the left and right subtree and if there is anything wrong with defining it this way.

!! (which, by the way, is 0-indexed) is usually a no-go. It's a very "imperative" thing to do, and it's especially smelly with constant indices like here. That means you really want a pattern match. Also, splitting a list (type String = [Char]) at an index and operating on the two parts separately is a bad idea, because these lists are linked and immutable, so you'll end up copying the entire first part.
The way you want to do this is as follows:
If the string is empty, fail.
If it starts with a *, then do something that somehow parses the left subtree and gets the remainder of the list in one step, and then parse the right one out of that remainder, making a Branch.
If it's another character, make a Leaf.
There's no need to figure out where to split the string, actually split the string, and then parse the halves; just parse the list until you can't anymore and then whatever's left (or should I say right?) can be parsed again.
So: define a function constructTrie' :: String -> Maybe (Trie, String) that consumes the start of a String into a Trie, and then leaves the unparsed bit behind (and gives Nothing if it just fails to parse). This function will be recursive, which is why it gets that extra output value: it needs extra plumbing to move the remainder of the list around. constructTrie itself can then be defined as a wrapper around it:
-- Maybe Trie because it's perfectly possible that the String just won't parse
constructTrie :: String -> Maybe Trie
constructTrie s = do (t, "") <- constructTrie' s
                     -- patmat failure in a monad calls fail; fail @Maybe _ = Nothing
                     return t
-- can make this local to constructTrie in a where clause
-- or leave it exposed in case it's useful
constructTrie' :: String -> Maybe (Trie, String)
constructTrie' "" = Nothing -- can't parse something from nothing!
constructTrie' ('*':leaves) = do (ln, rs) <- constructTrie' leaves
                                 -- Parse out left branch and leave the right part
                                 -- untouched. Doesn't copy the left half
                                 (rn, rest) <- constructTrie' rs
                                 -- Parse out the right branch. Since, when parsing
                                 -- "**ABC", the recursion will end up with
                                 -- constructTrie' "*ABC", we should allow the slop.
                                 return (Branch ln rn, rest)
constructTrie' (x:xs) = return (Leaf x, xs)
This is a very common pattern: defining a recursive function with extra plumbing to pass around some state and wrapping it in a nicer one. I guess it corresponds to how imperative loops usually mutate variables to keep their state.
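The parser described above can be sketched as one self-contained file. This is a minimal sketch, not the only way to write it: the `deriving (Show, Eq)` clause is an addition so results can be printed and compared, and the wrapper uses an explicit `case` instead of a failable pattern in `do`, which behaves the same way for `Maybe`.

```haskell
-- Assumption: Show and Eq are derived only so results can be printed and compared.
data Trie = Leaf Char | Branch Trie Trie
  deriving (Show, Eq)

-- Parse one tree from the front of the input and return the unconsumed rest.
constructTrie' :: String -> Maybe (Trie, String)
constructTrie' ""         = Nothing            -- nothing to parse
constructTrie' ('*':rest) = do
  (left,  afterLeft)  <- constructTrie' rest       -- left subtree, plus what follows it
  (right, afterRight) <- constructTrie' afterLeft  -- right subtree from the remainder
  return (Branch left right, afterRight)
constructTrie' (c:rest)   = Just (Leaf c, rest)  -- any other character is a leaf

-- Succeed only if the whole input is consumed by a single tree.
constructTrie :: String -> Maybe Trie
constructTrie s = case constructTrie' s of
  Just (t, "") -> Just t
  _            -> Nothing
```

In GHCi, `constructTrie "*AB"` evaluates to `Just (Branch (Leaf 'A') (Leaf 'B'))`, while `constructTrie "*A"` and `constructTrie "AB"` both give `Nothing` (too few and too many characters, respectively).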

Related

Haskell recursive program

I begin the function from here and don't know what to do next. Please help me in solving this function.
Write a Haskell recursive function noDupl which returns True if there
are no duplicates characters in the given string.
noDupl :: String -> Bool
noDupl = ?
Example Output:
noDupl "abcde"
True
noDupl "aabcdee"
False
Well, you've got the type signature right. Now like all recursion questions you can then think about the base case (where the recursion ends) and the recursive case (which will recurse with a smaller input).
For strings (and lists in general), the base case is usually the empty string (list). The recursive case usually takes the head of the list, processes it, then pushes to the front of the new result.
This probably sounds pretty confusing. It'll make sense when you look at some examples:
import Data.Char (chr, ord)
-- Increment each character by one (by ASCII).
incAll :: String -> String
incAll [] = [] -- Base case: empty string (list).
incAll (x:xs) = chr (ord x + 1) : incAll xs -- Recursive case, process head and prepend to recursed result.
There's a more concise way to write the above, but it demonstrates how recursion could be done.
Of course, you don't have to process each char individually, you could pattern match on two chars like so:
f (x0:x1:xs) = ...
(But you'll need to be careful with the base case.)
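To illustrate why the base case needs care when matching two characters at a time, here is a small sketch. The function `swapPairs` is a made-up example unrelated to `noDupl`; the point is that a two-element pattern fails on both the empty string and a one-character string, so both have to be covered.

```haskell
-- Hypothetical example: swap each adjacent pair of characters.
swapPairs :: String -> String
swapPairs (x0:x1:xs) = x1 : x0 : swapPairs xs  -- at least two characters left
swapPairs xs         = xs                      -- base cases: "" or a single character
```

With an odd-length input such as `"abcde"`, the last character is left in place, so the result is `"badce"`.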
Hopefully this provides you with enough hints to write noDupl.

Split string on multiple delimiters of any length in Haskell

I am attempting a Haskell coding challenge where, given a certain string with a prefix indicating which substrings are delimiting markers, a list needs to be built from the input.
I have already solved the problem for multiple single-length delimiters, but I am stuck with the problem where the delimiters can be any length. I use splitOneOf from Data.List.Split, but this works for character (length 1) delimiters only.
For example, given
input ";,\n1;2,3,4;10",
delimiters are ';' and ','
splitting the input on the above delivers
output [1,2,3,4,10]
The problem I'm facing has two parts:
Firstly, a single delimiter of any length, e.g.
"****\n1****2****3****4****10" should result in the list [1,2,3,4,10].
Secondly, more than one delimiter can be specified, e.g.
input "[***][||]\n1***2||3||4***10",
delimiters are "***" and "||"
splitting the input on the above delivers
output [1,2,3,4,10]
My code for retrieving the delimiter in the case of character delimiters:
--This gives the delimiters as a list of characters, i.e. a String.
getDelimiter::String->[Char]
getDelimiter text = head . splitOn "\n" $ text
--drop "[delimiters]\n" from the input
body::String->String
body text = drop ((length . getDelimiter $ text) + 1) text
--returns tuple with fst being the delimiters, snd the body of the input
doc::String->(String,String)
doc text = (getDelimiter text, body text)
--given the delimiters and the body of the input, return a list of strings
numbers::(String,String)->[String]
numbers (delim, rest) = splitOneOf delim rest
--input ",##\n1,2#3#4" gives output ["1","2","3","4"]
getList::String->[String]
getList text = numbers . doc $ text
So my question is, how do I do the processing for when the delimiters are e.g. "***" and "||"?
Any hints are welcome, especially in a functional programming context.
If you don't mind making multiple passes over the input string, you can use splitOn from Data.List.Split, and gradually split the input string using one delimiter at a time.
You can write this fairly succinctly using foldl':
import Data.List
import Data.List.Split
splitOnAnyOf :: Eq a => [[a]] -> [a] -> [[a]]
splitOnAnyOf ds xs = foldl' (\ys d -> ys >>= splitOn d) [xs] ds
Here, the accumulator for the fold operation is a list of strings, or more generally [[a]], so you have to 'lift' xs into a list, using [xs].
Then you fold over the delimiters ds - not the input string to be parsed. For each delimiter d, you split the accumulated list of strings with splitOn, and concatenate them. You could also have used concatMap, but here I arbitrarily chose to use the more general >>= (bind) operator.
This seems to do what is required in the OP:
*Q49228467> splitOnAnyOf [";", ","] "1;2,3,4;10"
["1","2","3","4","10"]
*Q49228467> splitOnAnyOf ["***", "||"] "1***2||3||4***10"
["1","2","3","4","10"]
Since this makes multiple passes over temporary lists, it's most likely not the fastest implementation you can make, but if you don't have too many delimiters, or extremely long lists, this may be good enough.
This problem has two kinds of solutions: the simple and the efficient. I will not cover the efficient one (because it is not simple), though I will hint at it.
But first, the part where you extract the delimiter and body parts of the input, may be simplified with Data.List.break:
delims = splitOn "/" . fst . break (== '\n') -- Presuming the delimiters are delimited with
-- a slash.
body = snd . break (== '\n')
In any way, we may reduce this problem to finding the positions of all the given patterns in a given string. (By saying "string", I do not mean the haskell String. Rather, I mean an arbitrarily long sequence (or even an infinite stream) of any symbols for which an Equality relation is defined, which is typed in Haskell as Eq a => [a]. I hope this is not too confusing.) As soon as we have the positions, we may slice the string to our hearts' content. If we want to deal with an infinite stream, we must obtain the positions incrementally, and yield the results as we go, which is a restriction that must be kept in mind. Haskell is equipped well enough to handle the stream case as well as the finite string.
A simple approach is to apply isPrefixOf to the string, for each of the patterns.
If one of them matches, we replace the matched delimiter with a Nothing.
Otherwise we wrap the first symbol in Just and move to the next position.
Thus, we will have replaced all the different delimiters by a single one: Nothing. We may then readily slice the string by it.
This is fairly idiomatic, and I will bring the code to your judgement shortly. The problem with this approach is that it is inefficient: in fact, if a pattern failed to match, we would rather advance by more than one symbol.
It would be more efficient to base our work on the research that has been done on finding patterns in a string; this problem is well known and there are great, intricate algorithms that solve it much faster. These algorithms are designed to work with a single pattern, so some work must be put into adapting them to our case; however, I believe they are adaptable. The simplest and oldest of such algorithms is KMP, and it is already encoded in Haskell. You may wish to take up arms and generalize it: a quick path to some amount of fame.
Here is the code:
module SplitSubstr where
-- stackoverflow.com/questions/49228467
import Data.List (unfoldr, isPrefixOf, elemIndex)
import Data.List.Split (splitWhen) -- Package `split`.
import Data.Maybe (catMaybes, isNothing)
-- | Split a (possibly infinite) string at the occurrences of any of the given delimiters.
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] "la||la***fa"
-- ["la","la","fa"]
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] (cycle "la||la***fa||")
-- ["la","la","fa","la","la","fa","la","la","fa","la"]
--
splitOnSubstrs :: [String] -> String -> [String]
splitOnSubstrs delims
    = fmap catMaybes       -- At this point, there will be only `Just` elements left.
    . splitWhen isNothing  -- Now we may split at nothings.
    . unfoldr f            -- Replace the occurrences of delimiters with a `Nothing`.
  where
    -- | This is the base case. It will terminate the `unfoldr` process.
    f [ ] = Nothing
    -- | This is the recursive case. It is divided into 2 cases:
    -- * One of the delimiters may match. We will then replace it with a Nothing.
    -- * Otherwise, we will `Just` return the current element.
    --
    -- Notice that, if there are several patterns that match at this point, we will use the first one.
    -- You may sort the patterns by length to always match the longest or the shortest. If you desire
    -- more complicated behaviour, you must plug a more involved logic here. In any way, the index
    -- should point to one of the patterns that matched.
    --
    --                   vvvvvvvvvvvvvv
    f body@(x:xs) = case elemIndex True $ (`isPrefixOf` body) <$> delims of
        Just index -> return (Nothing, drop (length $ delims !! index) body)
        Nothing    -> return (Just x, xs)
It might happen that you will not find this code straightforward. Specifically, the unfoldr part is somewhat dense, so I will add a few words about it.
unfoldr f is an embodiment of a recursion scheme. f is a function that may chip a part from the body: f :: (body -> Maybe (chip, body)).
As long as it keeps chipping, unfoldr keeps applying it to the body. This is the recursive case.
Once it fails (returning Nothing), unfoldr stops and hands you all the chips it has collected. This is the base case.
In our case, f takes symbols from the string, and fails once the string is empty.
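A smaller use of the same recursion scheme may make the shape easier to see. The following is only an illustrative sketch: `chunks` and `chip` are made-up names, and the chunk size is assumed to be positive (with a size of zero or less, the chipping would never make progress).

```haskell
import Data.List (unfoldr)

-- Hypothetical example: chip fixed-size chunks off a list.
-- Assumption: n > 0, otherwise no progress is made and the result never ends.
chunks :: Int -> [a] -> [[a]]
chunks n = unfoldr chip
  where
    chip [] = Nothing              -- base case: nothing left, so unfoldr stops
    chip xs = Just (splitAt n xs)  -- recursive case: (one chunk, remaining body)
```

Here `chunks 2 "abcde"` yields `["ab","cd","e"]`: each call to `chip` returns one chunk together with the rest of the body, exactly as `f` does in `splitOnSubstrs`.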
That's it. I hope you send me a postcard when you receive a Turing award for a fast splitting algorithm.

Capitalize Every Other Letter in a String -- take / drop versus head / tail for Lists

I have spent the past afternoon or two poking at my computer as if I had never seen one before. Today's topic: Lists.
The exercise is to take a string and capitalize every other letter. I did not get very far...
Let's take a list x = String.toList "abcde" and try to analyze it. If we add the results of take 1 and drop 1 we get back the original list
> x = String.toList "abcde"
['a','b','c','d','e'] : List Char
> (List.take 1 x) ++ (List.drop 1 x)
['a','b','c','d','e'] : List Char
I thought head and tail did the same thing, but I get a big error message:
> [List.head x] ++ (List.tail x)
==================================== ERRORS ====================================
-- TYPE MISMATCH --------------------------------------------- repl-temp-000.elm
The right argument of (++) is causing a type mismatch.
7│ [List.head x] ++ (List.tail x)
^^^^^^^^^^^
(++) is expecting the right argument to be a:
List (Maybe Char)
But the right argument is:
Maybe (List Char)
Hint: I always figure out the type of the left argument first and if it is
acceptable on its own, I assume it is "correct" in subsequent checks. So the
problem may actually be in how the left and right arguments interact.
The error message tells me a lot of what's wrong. Not 100% sure how I would fix it. The list joining operator ++ is expecting [Maybe Char] and instead got Maybe [Char]
Let's just try to capitalize the first letter in a string (which is less cool, but actually realistic).
[String.toUpper ( List.head x)] ++ (List.drop 1 x)
This is wrong since String.toUpper requires a String and List.head x is instead a Maybe Char.
[Char.toUpper ( List.head x)] ++ (List.drop 1 x)
This is also wrong since Char.toUpper requires a Char instead of a Maybe Char.
In real life a user could break a script like this by typing a non-ASCII character (like an emoji), so maybe Elm's feedback is right. This should be an easy problem: it takes "abcde" and turns it into "AbCdE" (or possibly "aBcDe"). How do I handle errors properly?
The same question in JavaScript: How do I make the first letter of a string uppercase in JavaScript?
In Elm, List.head and List.tail both return the Maybe type because either function could be passed an invalid value; specifically, the empty list. Some languages, like Haskell, throw an error when passing an empty list to head or tail, but Elm tries to eliminate as many runtime errors as possible.
Because of this, you must explicitly handle the exceptional case of the empty list if you choose to use head or tail.
Note: There are probably better ways to achieve your end goal of string mixed capitalization, but I'll focus on the head and tail issue because it's a good learning tool.
Since you're using the concatenation operator, ++, you'll need a List for both arguments, so it's safe to say you could create a couple functions that handle the Maybe return values and translate them to an empty list, which would allow you to use your concatenation operator.
myHead list =
    case List.head list of
        Just h -> [h]
        Nothing -> []

myTail list =
    case List.tail list of
        Just t -> t
        Nothing -> []
Using the case statements above, you can handle all possible outcomes and map them to something usable for your circumstances. Now you can swap myHead and myTail into your code and you should be all set.
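For comparison with the Haskell behaviour mentioned above, here is a hedged sketch of the original exercise written in Haskell rather than Elm. It uses pattern matching instead of head and tail, so the empty-list case is handled by an ordinary clause and nothing can fail at run time. The name `capEveryOther` is made up for illustration.

```haskell
import Data.Char (toUpper)

-- Hypothetical example: capitalize the 1st, 3rd, 5th, ... characters.
capEveryOther :: String -> String
capEveryOther (x:y:rest) = toUpper x : y : capEveryOther rest  -- handle two characters per step
capEveryOther [x]        = [toUpper x]                         -- odd length: one character left
capEveryOther []         = []                                  -- empty input needs no special handling
```

For example, `capEveryOther "abcde"` gives `"AbCdE"`.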

Why am I receiving this syntax error - possibly due to bad layout?

I've just started trying to learn haskell and functional programming. I'm trying to write this function that will convert a binary string into its decimal equivalent. Please could someone point out why I am constantly getting the error:
"BinToDecimal.hs":19 - Syntax error in expression (unexpected `}', possibly due to bad layout)
module BinToDecimal where
total :: [Integer]
total = []
binToDecimal :: String -> Integer
binToDecimal a = if (null a) then (sum total)
else if (head a == "0") then binToDecimal (tail a)
else if (head a == "1") then total ++ (2^((length a)-1))
binToDecimal (tail a)
So, total may not be doing what you think it is. total isn't a mutable variable that you're changing, it will always be the empty list []. I think your function should include another parameter for the list you're building up. I would implement this by having binToDecimal call a helper function with the starting case of an empty list, like so:
binToDecimal :: String -> Integer
binToDecimal s = binToDecimal' s []
binToDecimal' :: String -> [Integer] -> Integer
-- implement binToDecimal' here
In addition to what @Sibi has said, I would highly recommend using pattern matching rather than nested if-else. For example, I'd implement the base case of binToDecimal' like so:
binToDecimal' :: String -> [Integer] -> Integer
binToDecimal' "" total = sum total -- when the first argument is the empty string, just sum total. Equivalent to `if (null a) then (sum total)`
-- Include other pattern matching statements here to handle your other if/else cases
If you think it'd be helpful, I can provide the full implementation of this function instead of giving tips.
Ok, let me give you hints to get you started:
You cannot do head a == "0" because "0" is a String. Since the type of a is [Char], the type of head a is Char and you have to compare it with a Char. You can solve it using head a == '0'. Note that "0" and '0' are different.
Similarly, rectify your type error in head a == "1".
This won't typecheck: total ++ (2^((length a)-1)) because the type of total is [Integer] and the type of (2^((length a)-1)) is Integer. For the function ++ to typecheck, both arguments passed to it should be lists of the same type.
You are possibly missing a final else branch (before the code binToDecimal (tail a)).
That being said, instead of using nested if else expression, try to use guards as they will increase the readability greatly.
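As a rough illustration of the guard-based style, here is one possible sketch. It is not the asker's algorithm: instead of building a list and summing powers of two, it carries a running total and doubles it at each digit (Horner's method). The helper name `go` and the choice to call `error` on any character other than '0' or '1' are assumptions made for this example.

```haskell
-- Hypothetical rewrite using pattern matching and guards.
-- Assumption: any character other than '0' or '1' is treated as an error.
binToDecimal :: String -> Integer
binToDecimal = go 0
  where
    go acc [] = acc                       -- no digits left: the total is the answer
    go acc (c:cs)
      | c == '0'  = go (2 * acc) cs       -- shift the total left by one bit
      | c == '1'  = go (2 * acc + 1) cs   -- shift and add the new bit
      | otherwise = error "binToDecimal: not a binary digit"
```

With this definition, `binToDecimal "101010"` evaluates to `42`.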
There are many things we can improve here (but no worries, this is perfectly normal in the beginning, there is so much to learn when we start Haskell!!!).
First of all, a string is definitely not an appropriate way to represent a binary number, because nothing prevents us from writing "éaldkgjasdg" in place of a proper binary number. So, the first thing is to define our binary type:
data Binary = Zero | One deriving (Show)
We just say that it can be Zero or One. The deriving (Show) will allow us to have the result displayed when run in GHCI.
In Haskell, we tend to solve a problem by starting with a more general case and then specializing it to our particular case. What we need here is a function with an additional argument that holds the running total. Note the use of pattern matching instead of ifs, which makes the function easier to read.
binToDecimalAcc :: [Binary] -> Integer -> Integer
binToDecimalAcc [] acc = acc
binToDecimalAcc (Zero:xs) acc = binToDecimalAcc xs acc
binToDecimalAcc (One:xs) acc = binToDecimalAcc xs $ acc + 2^(length xs)
Finally, since we only want to pass a single parameter, we define our specific function, where the acc value is 0:
binToDecimal :: [Binary] -> Integer
binToDecimal binaries = binToDecimalAcc binaries 0
We can run a test in GHCI:
test1 = binToDecimal [One, Zero, One, Zero, One, Zero]
> 42
OK, all fine, but what if you really need to convert a string to a decimal? Then, we need a function able to convert this string to a binary. The problem as seen above is that not all strings are proper binaries. To handle this, we will need to report some sort of error. The solution I will use here is very common in Haskell: it is to use "Maybe". If the string is correct, it will return "Just result" else it will return "Nothing". Let's see that in practice!
The first function we will write is to convert a char to a binary. As discussed above, Nothing represents an error.
charToBinary :: Char -> Maybe Binary
charToBinary '0' = Just Zero
charToBinary '1' = Just One
charToBinary _ = Nothing
Then, we can write a function for a whole string (which is a list of Char). So [Char] is equivalent to String. I used it here to make clearer that we are dealing with a list.
stringToBinary :: [Char] -> Maybe [Binary]
stringToBinary [] = Just []
stringToBinary chars = mapM charToBinary chars
The function mapM is a kind of variation of map which acts on monads (Maybe is actually a monad). To learn about monads I recommend reading Learn You a Haskell for Great Good!
http://learnyouahaskell.com/a-fistful-of-monads
We can notice once more that if there are any errors, Nothing will be returned.
A dedicated function to convert strings holding binaries can now be written.
binStringToDecimal :: [Char] -> Maybe Integer
binStringToDecimal = fmap binToDecimal . stringToBinary
The use of the "." function allows us to define this function as an equality with another function, so we do not need to mention the parameter (point-free notation).
The fmap function allows us to run binToDecimal (which expects a [Binary] as its argument) on the result of stringToBinary (which is of type "Maybe [Binary]"). Once again, Learn You a Haskell... is a very good reference to learn more about fmap:
http://learnyouahaskell.com/functors-applicative-functors-and-monoids
Now, we can run a second test:
test2 = binStringToDecimal "101010"
> Just 42
And finally, we can test our error handling system with a mistake in the string:
test3 = binStringToDecimal "102010"
> Nothing

Haskell add value to tree

I'm trying to make a function which allows me to add a new value to a tree IF the value at the given path is equal to ND (no data). This was my first attempt.
It checks the value etc., but the problem is that I want to be able to print the modified tree with the new data. Can anyone give me any pointers? I have also tried making a second function that checks the path to see if it's OK to add data, but I'm lost as to how to print out the modified tree.
As iuliux points out, your problem is that you are treating your BTree as though it were a mutable structure. Remember functions in haskell take arguments and return a value. That is all. So when you "map over" a list, or traverse a tree your function needs to return a new tree.
The code you have is traversing the recursive tree and only returning the last leaf. Imagine for now that the leaf at the end of the path will always be ND. This is what you want:
add :: a -> Path -> Btree a -> Btree a
add da xs ND = Data da
add _ [] _ = error "You should make sure this doesn't happen or handle it"
add da (x:xs) (Branch st st2) =
    case x of
        L -> Branch (add da xs st) st2
        R -> Branch st (add da xs st2)
Notice how in your original code you discard the Branch you pattern match against, when what you need to do is return it "behind you" as it were.
Now, on to the issue of handling situations where the leaf you arrive at is not an ND constructor:
This type of problem is common in functional programming. How can you return your recursive data structure "as you go" when the final result depends on a leaf far down the tree?
One solution for the trickiest of cases is the Zipper, which is a data structure that lets you go up down and sideways as you please. For your case that would be overkill.
I would suggest you change your function to the following:
add :: a -> Path -> Btree a -> Maybe (Btree a)
which means at each level you must return a Maybe (Btree a). Then use the Functor instance of Maybe in your recursive calls. Notice:
fmap (+1) (Just 2) == Just 3
fmap (+1) (Nothing) == Nothing
You should try to puzzle out the implementation for yourself!
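For readers who want to check their attempt, one possible shape of the Maybe-returning version is sketched below. The `Btree`, `Dir`, and `Path` definitions are assumptions, since the question's own types are not shown; the constructor names follow the code above. Like the first version, it fills in an ND wherever the walk reaches one, and it returns Nothing in every other situation instead of calling error.

```haskell
-- Assumed definitions: the question's own Btree and Path types are not shown.
data Btree a = ND | Data a | Branch (Btree a) (Btree a)
  deriving (Show, Eq)

data Dir = L | R
  deriving (Show, Eq)

type Path = [Dir]

-- Insert a value where the path reaches an ND; report failure with Nothing.
add :: a -> Path -> Btree a -> Maybe (Btree a)
add da _      ND              = Just (Data da)
add da (L:xs) (Branch st st2) = fmap (\st'  -> Branch st' st2) (add da xs st)   -- rebuild around the new left subtree
add da (R:xs) (Branch st st2) = fmap (\st2' -> Branch st st2') (add da xs st2)  -- rebuild around the new right subtree
add _  _      _               = Nothing  -- path ran out, or it hit existing Data
```

The two `fmap` calls are where the Functor instance of Maybe does the work: if the recursive call fails, the Nothing propagates upward, and if it succeeds, the enclosing Branch is rebuilt around the new subtree.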
I'm no expert in Haskell, but functional programming only works with functions, so more or less everything is a function.
Now, your function takes some input and returns something, without modifying the input. You have to keep the returned tree somewhere, and that will be your new tree, the one with the inserted element in it.
We really need to see the Path and Error data types to answer your question, but you can print out your trees using the IO Monad:
main :: IO ()
main = do let b = Branch ND (Branch (Data 1) (Data 2))
          let b1 = add 10 [L] b --actual call depends on definition of Path
          (putStrLn . show) b1
