Filter out the format [n] from a string in Haskell - haskell

Hey guys I'm relatively new to Haskell and I'm trying to return the number of references of the format [n] from a string.
For example let x = "blah blah [1] blah blah [1] blah blah [2]" should return 3 since the function would see that there are three references in this string. I'm having trouble trying to match against [n] since n can be any number. I have tried things such as filter([_]) but had no luck.

You can quite easily achieve your goal with standard Haskell alone. The general name for this kind of task is parsing, but at least for such simple grammars that amounts mostly to pattern matching on string-heads. Very easy because Haskell strings are just lists of characters!
First, think about what signature you want.
extractBracketNums :: String -> [Integer]
seems a good idea. If you only want to count them, the result type could as well be a single Int, but if you actually output the list of numbers you keep the function more generally useful, and it's trivial to obtain the count afterwards with length.
Now, to implement this we need three cases:
extractBracketNums "" = []
this is just the base case, ensuring that any recursion we do will safely terminate when the string has been fully consumed. Nothing in the string means, no bracketed numbers either!
extractBracketNums ('[':r) = ...
This is the interesting case: we've found an opening bracket, it might be the beginning of a number!
Before actually implementing that case, we need to consider one more:
extractBracketNums (_:r) = extractBracketNums r
This means: a (sub-) string that doesn't begin with an opening bracket is uninteresting for us, so just discard any such character.
Now for the interesting case: we need to check that after the opening bracket comes a number, and only a number, before the bracket is closed again. So, we need to try to split off a number from the beginning of the remaining string. This can be done quite well with the span function. Its signature is (a->Bool) -> [a] -> ([a],[a]), so for us (Char->Bool) -> String -> (String,String). It takes a predicate on characters, which should determine whether a character is possibly part of a number-string. The module Data.Char has exactly such a function available: isNumber.
extractBracketNums ('[':r) = case span isNumber r of
...
so now we need to decide if we really found a number, and nothing else. If we didn't find a number right at the beginning, then span will stop immediately, i.e. give an empty string as its first result.
extractBracketNums ('[':r) = case span isNumber r of
    ([],_) -> extractBracketNums r
i.e., in this case I just forget about the opening bracket since it doesn't contain (only) a number. Same plan if we do find a number but not a closing bracket immediately after it (we'll cover that one in the end) but if we do find a closing bracket directly after the split off number, we can take that number, convert it from string to Integer, prepend it to any possible other numbers yet to be found. And be happy.
    (num,']':r') -> read num : extractBracketNums r'
Now the completing case (did find some number, but no closing bracket after it): this also means we'll simply forget about the opening bracket.
    (_,_) -> extractBracketNums r
So, all together:
import Data.Char (isNumber)
extractBracketNums :: String -> [Integer]
extractBracketNums "" = []
extractBracketNums ('[':r) = case span isNumber r of
    ([],_) -> extractBracketNums r
    (num,']':r') -> read num : extractBracketNums r'
    (_,_) -> extractBracketNums r
extractBracketNums (_:r) = extractBracketNums r
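Since the question asks for the count rather than the numbers themselves, the function can be composed with length. The following is only a minimal sketch: countRefs is a name introduced here for illustration, and the predicate is swapped from isNumber to isDigit on the assumption that only ASCII digits should count (isNumber also accepts Unicode numerics such as '½', which read cannot turn into an Integer).

```haskell
import Data.Char (isDigit)

-- Collect every number that appears as [n] in the input.
-- Assumption: only ASCII digits are valid, hence isDigit instead of isNumber.
extractBracketNums :: String -> [Integer]
extractBracketNums "" = []
extractBracketNums ('[':r) = case span isDigit r of
    ([], _)       -> extractBracketNums r   -- no digits after '['
    (num, ']':r') -> read num : extractBracketNums r'
    _             -> extractBracketNums r   -- digits, but no closing ']'
extractBracketNums (_:r) = extractBracketNums r

-- Hypothetical helper: how many [n] references a string contains.
countRefs :: String -> Int
countRefs = length . extractBracketNums
```

With the example string from the question, countRefs should give 3, and a string without any complete [n] gives 0.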

You may use Text.Regex.Posix for that:
> import Text.Regex.Posix ((=~))
> let txt = "blah blah [1] blah blah [1] blah blah [2]"
> let pat = "\\[[0-9]+\\]" -- regular expression pattern to match
> txt =~ pat :: Int
3
The function =~ is overloaded on return type; for example if you want to see what matched, you use the exact same function call with different type annotation:
> txt =~ pat :: [[String]]
[["[1]"],["[1]"],["[2]"]]

Related

Unary predicate to check for a character being in string

I'm reading Real World Haskell, and I tried to implement the splitLines code myself, and I came up with more or less the same implementation (Chapter 4, page 73):
splitLines :: String -> [String]
splitLines [] = []
splitLines ('\r':a) = splitLines a
splitLines ('\n':a) = splitLines a
splitLines a = let (l,r) = break isCRorNL a
               in l:splitLines r
  where isCRorNL e = ???
--the book defines isCRorNL c = c == '\n' || c == '\r'
However, I've been spending definitely too much time trying to write isCRorNL in the most functional and readable way I could think of, so that I can get rid of the where and turn the last definition of splitLines into an almost-English sentence (just like compare `on` length and the like), without success.
Some sparse thoughts I have been going through:
A lambda, (\c -> c == '\n' || c == '\r'), is just too much power and too little expressiveness for such a simple and specific task;
furthermore, it contains a fair amount of duplicated code and/or it is uselessly verbose.
Whatever I have to put in isCRorNL has to have type Char -> Bool,
therefore it can have any type a1 -> a2 -> ... -> an -> Char -> Bool if I provide it with the first n arguments.
The any function can help me checking if a given character is either '\n' or '\r' or, in other words, if it is in the list of Chars "\n\r".
Since I want to check for equality, I can pass (==) to my function.
Therefore isCRorNL can have type (Char -> Char -> Bool) -> [Char] -> Char -> Bool (or with the first two argument inverted), and I can pass to it (==) as the first argument and "\n\r" as the second argument.
So I was looking for some standard functions I could compose to get such a function.
Finally I gave up and defined it this way: isCRorNL e = any (== e) "\n\r"; I think this is quite good as regards extensibility, as I can add as many characters in the "…", and I can change the operator ==; sadly I cannot put the function directly where it is used, as I am not able to write it as a partially applied function.
How would you do it?
As soon as I looked for the link in the question and visited it (for the first time), I realized that the code chunks are commented by readers, and the first comment under splitLines reads:
augustss 2008-04-23
[...] If you're making a point about functional
style maybe you should use
isLineSeparator = (`elem` "\r\n")
So it turns out I was thinking too much about composition of functions, while the easiest solution lies in the partial application of such a simple function, elem. The drawback here is that the operator used to check for equality is built into elem and cannot be changed. Nonetheless I feel dumb for not having thought of elem myself.
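Plugged into the function from the question, the elem section removes the need for the where clause. The sketch below also includes memberBy, a hypothetical helper (not a standard function) that shows one way to keep the comparison operator changeable, which elem alone does not allow.

```haskell
-- splitLines with the separator test written inline as a section of elem.
-- Behaviour is that of the question's version: runs of separators are skipped.
splitLines :: String -> [String]
splitLines [] = []
splitLines ('\r':a) = splitLines a
splitLines ('\n':a) = splitLines a
splitLines a =
    let (l, r) = break (`elem` "\r\n") a
    in  l : splitLines r

-- Hypothetical generalisation: membership under a caller-supplied comparison.
-- memberBy (==) xs behaves like (`elem` xs).
memberBy :: (a -> a -> Bool) -> [a] -> a -> Bool
memberBy eq xs x = any (eq x) xs
```

With this helper, break (memberBy (==) "\r\n") reads much like the elem version, at the cost of one extra definition.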

converting a list of string into a list of tuples in Haskell

I have a list of strings:
[" ix = index"," ctr = counter"," tbl = table"]
and I want to create a tuple from it like:
[("ix","index"),("ctr","counter"),("tbl","table")]
I even tried:
genTuple [] = []
genTuples (a:as)= do
i<-splitOn '=' a
genTuples as
return i
Any help would be appreciated
Thank you.
Haskell's type system is really expressive, so I suggest thinking about the problem in terms of types. The advantage of this is that you can solve the problem 'top-down' and the whole program can be typechecked as you go, so you can catch all kinds of errors early on. The general approach is to incrementally divide the problem into smaller functions, each of which initially remains undefined but has some plausible type.
What you want is a function (let's call it convert) which takes a list of strings and generates a list of tuples, i.e.
convert :: [String] -> [(String, String)]
convert = undefined
It's clear that each string in the input list will need to be parsed into a 2-tuple of strings. However, it's possible that the parsing can fail - the sheer type String makes no guarantees that your input string is well formed. So your parse function maybe returns a tuple. We get:
parse :: String -> Maybe (String, String)
parse = undefined
We can immediately plug this into our convert function using mapMaybe:
convert :: [String] -> [(String, String)]
convert list = mapMaybe parse list
So far, so good - but parse is literally still undefined. Let's say that it should first verify that the input string is 'valid', and if it is - it splits it. So we'll need
valid :: String -> Bool
valid = undefined
split :: String -> (String, String)
split = undefined
Now we can define parse:
parse :: String -> Maybe (String, String)
parse s | valid s = Just (split s)
        | otherwise = Nothing
What makes a string valid? Let's say it has to contain a = sign:
valid :: String -> Bool
valid s = '=' `elem` s
For splitting, we'll take all the characters up to the first = for the first tuple element, and the rest for the second. However, you probably want to trim leading/trailing whitespace as well, so we'll need another function. For now, let's make it a no-op
trim :: String -> String
trim = id
Using this, we can finally define
split :: String -> (String, String)
split s = (trim a, trim (tail b))
  where
    (a, b) = span (/= '=') s
Note that we can safely call tail here because we know that b is never empty because there's always a separator (that's what valid verified). Type-wise, it would've been nice to express this guarantee using a "non-empty string" but that may be a bit overengineered. :-)
Now, there are a lot of solutions to the problem, this is just one example (and there are ways to shorten the code using eta reduction or existing libraries). The main point I'm trying to get across is that Haskell's type system allows you to approach the problem in a way which is directed by types, which means the compiler helps you fleshing out a solution from the very beginning.
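Assembled into one module, the pieces above might look like the following sketch. Two details are assumptions that go beyond the answer: trim is given a real body (dropping leading and trailing whitespace), and drop 1 stands in for tail, which behaves identically here because valid guarantees the separator is present.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Maybe (mapMaybe)

-- Keep only the strings that parse; malformed entries are silently skipped.
convert :: [String] -> [(String, String)]
convert = mapMaybe parse

parse :: String -> Maybe (String, String)
parse s
    | valid s   = Just (split s)
    | otherwise = Nothing

-- A string is considered valid when it contains a '=' separator.
valid :: String -> Bool
valid s = '=' `elem` s

-- Assumption: trimming means removing whitespace at both ends.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split at the first '='; drop 1 removes the separator itself.
split :: String -> (String, String)
split s = (trim a, trim (drop 1 b))
  where
    (a, b) = span (/= '=') s
```

Applied to the list from the question, convert should produce [("ix","index"),("ctr","counter"),("tbl","table")], and a string without a = is dropped rather than causing an error.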
You can do it like this:
import Control.Monad
import Data.List
import Data.List.Split
map ((\[a,b] -> (a,b)) . splitOn "=" . filter (/=' ')) [" ix = index"," ctr = counter"," tbl = table"]

Why am I receiving this syntax error - possibly due to bad layout?

I've just started trying to learn haskell and functional programming. I'm trying to write this function that will convert a binary string into its decimal equivalent. Please could someone point out why I am constantly getting the error:
"BinToDecimal.hs":19 - Syntax error in expression (unexpected `}', possibly due to bad layout)
module BinToDecimal where
total :: [Integer]
total = []
binToDecimal :: String -> Integer
binToDecimal a = if (null a) then (sum total)
else if (head a == "0") then binToDecimal (tail a)
else if (head a == "1") then total ++ (2^((length a)-1))
binToDecimal (tail a)
So, total may not be doing what you think it is. total isn't a mutable variable that you're changing, it will always be the empty list []. I think your function should include another parameter for the list you're building up. I would implement this by having binToDecimal call a helper function with the starting case of an empty list, like so:
binToDecimal :: String -> Integer
binToDecimal s = binToDecimal' s []
binToDecimal' :: String -> [Integer] -> Integer
-- implement binToDecimal' here
In addition to what @Sibi has said, I would highly recommend using pattern matching rather than nested if-else. For example, I'd implement the base case of binToDecimal' like so:
binToDecimal' :: String -> [Integer] -> Integer
binToDecimal' "" total = sum total -- when the first argument is the empty string, just sum total. Equivalent to `if (null a) then (sum total)`
-- Include other pattern matching statements here to handle your other if/else cases
If you think it'd be helpful, I can provide the full implementation of this function instead of giving tips.
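For reference, one possible way to fill in the remaining cases is sketched below. It is not the answerer's own implementation; in particular, treating every character other than '1' as a zero bit is an assumption made to keep the function total.

```haskell
binToDecimal :: String -> Integer
binToDecimal s = binToDecimal' s []

-- The second argument collects the value of every '1' bit seen so far.
binToDecimal' :: String -> [Integer] -> Integer
binToDecimal' ""       total = sum total
-- A '1' followed by n more characters contributes 2^n.
binToDecimal' ('1':cs) total = binToDecimal' cs (2 ^ length cs : total)
-- Assumption: anything else (including '0') contributes nothing.
binToDecimal' (_:cs)   total = binToDecimal' cs total
```

For example, binToDecimal "101010" collects 32, 8 and 2 and sums them to 42.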
Ok, let me give you hints to get you started:
You cannot do head a == "0" because "0" is a String. Since the type of a is [Char], the type of head a is Char and you have to compare it with a Char. You can solve it using head a == '0'. Note that "0" and '0' are different.
Similarly, rectify your type error in head a == "1"
This won't typecheck: total ++ (2^((length a)-1)) because the type of total is [Integer] and the type of (2^((length a)-1)) is Integer. For the function ++ to typecheck, both arguments passed to it should be lists of the same type.
You are possibly missing an else block at the end (before the code binToDecimal (tail a)).
That being said, instead of using nested if else expression, try to use guards as they will increase the readability greatly.
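Applying these hints, a guard-based version could look like the sketch below. It drops the total list entirely and adds each bit's value directly; raising an error on characters other than '0' and '1' is an assumption, since the question does not say how invalid input should be handled.

```haskell
binToDecimal :: String -> Integer
binToDecimal [] = 0
binToDecimal (c:cs)
    -- Compare against Char literals ('1'), not String literals ("1").
    | c == '1'  = 2 ^ length cs + binToDecimal cs
    | c == '0'  = binToDecimal cs
    -- Assumption: any other character is a programming error.
    | otherwise = error ("binToDecimal: not a binary digit: " ++ show c)
```

Every guard has a result and the final otherwise plays the role of the missing else, so the layout error from the question cannot occur here.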
There are many things we can improve here (but no worries, this is perfectly normal in the beginning, there is so much to learn when we start Haskell!!!).
First of all, a string is definitely not an appropriate way to represent a binary, because nothing prevents us from writing "éaldkgjasdg" in place of a proper binary. So, the first thing is to define our binary type:
data Binary = Zero | One deriving (Show)
We just say that it can be Zero or One. The deriving (Show) will allow us to have the result displayed when run in GHCi.
In Haskell, we tend to solve the more general problem first and then specialise it to our particular case. What we need here is a function with an additional argument that holds the running total. Note the use of pattern matching instead of ifs, which makes the function easier to read.
binToDecimalAcc :: [Binary] -> Integer -> Integer
binToDecimalAcc [] acc = acc
binToDecimalAcc (Zero:xs) acc = binToDecimalAcc xs acc
binToDecimalAcc (One:xs) acc = binToDecimalAcc xs $ acc + 2^(length xs)
Finally, since we only want to pass a single parameter, we define our specific function where the acc value is 0:
binToDecimal :: [Binary] -> Integer
binToDecimal binaries = binToDecimalAcc binaries 0
We can run a test in GHCi:
test1 = binToDecimal [One, Zero, One, Zero, One, Zero]
> 42
OK, all fine, but what if you really need to convert a string to a decimal? Then, we need a function able to convert this string to a binary. The problem as seen above is that not all strings are proper binaries. To handle this, we will need to report some sort of error. The solution I will use here is very common in Haskell: it is to use "Maybe". If the string is correct, it will return "Just result" else it will return "Nothing". Let's see that in practice!
The first function we will write is to convert a char to a binary. As discussed above, Nothing represents an error.
charToBinary :: Char -> Maybe Binary
charToBinary '0' = Just Zero
charToBinary '1' = Just One
charToBinary _ = Nothing
Then, we can write a function for a whole string (which is a list of Char). So [Char] is equivalent to String. I used it here to make clearer that we are dealing with a list.
stringToBinary :: [Char] -> Maybe [Binary]
stringToBinary [] = Just []
stringToBinary chars = mapM charToBinary chars
The function mapM is a kind of variation of map which acts on monads (Maybe is actually a monad). To learn about monads I recommend reading Learn You a Haskell for Great Good!
http://learnyouahaskell.com/a-fistful-of-monads
We can notice once more that if there are any errors, Nothing will be returned.
A dedicated function to convert strings holding binaries can now be written.
binStringToDecimal :: [Char] -> Maybe Integer
binStringToDecimal = fmap binToDecimal . stringToBinary
The use of the "." function allows us to define this function as an equality with another function, so we do not need to mention the parameter (point-free notation).
The fmap function allows us to run binToDecimal (which expects a [Binary] as argument) on the return of stringToBinary (which is of type "Maybe [Binary]"). Once again, Learn You a Haskell... is a very good reference to learn more about fmap:
http://learnyouahaskell.com/functors-applicative-functors-and-monoids
Now, we can run a second test:
test2 = binStringToDecimal "101010"
> Just 42
And finally, we can test our error handling system with a mistake in the string:
test3 = binStringToDecimal "102010"
> Nothing
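One design note on binToDecimalAcc: it calls length xs for every One, which walks the rest of the list each time. A single-pass alternative is sketched below as a left fold that doubles the running total at each step. The name binToDecimalFold is introduced here for illustration and is not part of the answer above.

```haskell
import Data.List (foldl')

data Binary = Zero | One deriving (Show)

-- Horner-style evaluation: each step doubles the running total and adds the bit.
binToDecimalFold :: [Binary] -> Integer
binToDecimalFold = foldl' step 0
  where
    step acc Zero = 2 * acc
    step acc One  = 2 * acc + 1
```

For [One, Zero, One, Zero, One, Zero] the running total goes 1, 2, 5, 10, 21, 42, so the result matches test1 above.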

Haskell negating functions

This is my task:
Write a function:
onlyDigits :: String -> String
that strips all non-digit characters from a string (for example, onlyDigits "ac245d62"
is "24562").
I have this:
onlyDigits :: String -> String
onlyDigits a = [ (not)isAlpha b | b <- a ]
But I can't compile it
Can anyone see where i've gone wrong?
By writing
(not) isAlpha b
you're applying not to the two arguments isAlpha and b, and that's probably why the compiler complains.
If you fix this little mistake and now write:
onlyDigits :: String -> String
onlyDigits a = [ not (isAlpha b) | b <- a ]
you'll still get an error since this creates a list of Bools!
What you probably want is:
onlyDigits :: String -> String
onlyDigits a = [ b | b <- a, not $ isAlpha b ]
This will take all the elements b of a, that fulfil the condition not (isAlpha b).
You could also use the filter function and have a point-free function:
onlyDigits :: String -> String
onlyDigits = filter (not.isAlpha)
or even better:
onlyDigits :: String -> String
onlyDigits = filter isDigit
to only keep digits!
You actually have 2 errors.
The first one is a type error, stemming from not using function application correctly.
But the second one is that your function will still not do what you want if you only filter one category of unwanted characters. For example, you also do not want space characters. Or graphic symbols.
Sometimes, it is better to say "I want this" instead of "I want anything but ... (long list)".
So, filter for the property you want, not for the negation of one of the properties you don't want.
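To make the difference concrete, here is a small sketch that puts the two filters side by side. dropLetters is a name introduced here for the negative version.

```haskell
import Data.Char (isAlpha, isDigit)

-- Negative filter: removes letters but keeps everything else,
-- including spaces and punctuation.
dropLetters :: String -> String
dropLetters = filter (not . isAlpha)

-- Positive filter: keeps digits and nothing else.
onlyDigits :: String -> String
onlyDigits = filter isDigit
```

On the input "ac 245-d62!", dropLetters gives " 245-62!" while onlyDigits gives "24562". The two only agree when the input consists solely of letters and digits, as in the task's own example.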
