Splitting string into type in Haskell

I need to create a parse function. I am new to Haskell, and I am interested in whether my approach can be implemented in Haskell using only GHC base functions.
So the problem is: I have a message in a string with coordinates and values, like (x: 01, 01, ...
y:01, 02,: v: X, Y, Z), and I need to parse it into a type like ([Char], [Int], [Int]).
In a language like C, I would write a loop that goes from the start, checks each character, and puts it into the appropriate array, but I am afraid this would not work in Haskell. Can someone give a hint on an approachable solution to this problem?

If you’re accustomed to imperative programming with loops, you can actually do a fairly literal translation of an imperative solution to Haskell using direct recursion.
Bear in mind, this isn’t the easiest or best way to arrive at a working solution, but it’s good to learn the technique so that you understand what more idiomatic solutions are abstracting away for you.
The basic principle is to replace each loop with a recursive function, and replace each mutable variable with an accumulator parameter to that function. Where you would modify the variable within an iteration of the loop, just make a new variable; where you would modify it between iterations of the loop, call the looping function with a different argument in place of that parameter.
For a simple example, consider computing the sum of a list of integers. In C, that might be written like this:
#include <stddef.h>
struct ListInt { int head; struct ListInt *tail; };
int total(struct ListInt const *list) {
    int acc = 0;
    struct ListInt const *xs = list;
    while (xs != NULL) {
        acc += xs->head;
        xs = xs->tail;
    }
    return acc;
}
We can translate that literally to low-level Haskell:
total :: [Int] -> Int
total list
  = loop
    0     -- acc = 0
    list  -- xs = list
  where
    loop
      :: Int    -- int acc;
      -> [Int]  -- ListInt const *xs;
      -> Int
    loop acc xs                     -- loop:
      | not (null xs) = let         -- if (xs != NULL) {
          acc' = acc + head xs      --   acc += xs->head;
          xs' = tail xs             --   xs = xs->tail;
        in loop acc' xs'            --   goto loop;
                                    -- } else {
      | otherwise = acc             --   return acc;
                                    -- }
The outer function total sets up the initial state, and the inner function loop handles the iteration over the input. In this case, total immediately returns after the loop, but if there were some more code after the loop to process the results, that would go in total:
total list = let
    result = loop 0 list
  in someAdditionalProcessing result
It’s extremely common in Haskell for a helper function to accumulate a list of results by prepending them to the beginning of an accumulator list with :, and then reversing this list after the loop, because appending a value to the end of a list is much more costly. You can think of this pattern as using a list as a stack, where : is the “push” operation.
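As a rough sketch of that accumulate-then-reverse pattern (the function doubledEvens and its task are made up purely for illustration), the loop pushes each result with : and the outer function reverses the accumulator once at the end:

```haskell
-- Hypothetical example: keep the even numbers, doubled, in input order.
doubledEvens :: [Int] -> [Int]
doubledEvens list = reverse (loop [] list)  -- Undo the stack order once.
  where
    loop :: [Int] -> [Int] -> [Int]
    loop acc [] = acc
    loop acc (x : xs)
      | even x    = loop (2 * x : acc) xs   -- "Push" a result.
      | otherwise = loop acc xs             -- Skip this element.
```

Each push is constant-time, and the single reverse at the end is linear, whereas appending with ++ inside the loop would copy the accumulator on every iteration.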
Also, straight away, we can make some simple improvements. First, the accessor functions head and tail may throw an error if our code is wrong and we call them on empty lists, just like accessing a head or tail member of a NULL pointer (although an exception is clearer than a segfault!), so we can simplify it and make it safer by using pattern matching instead of guards and head/tail:
loop :: Int -> [Int] -> Int
loop acc [] = acc
loop acc (h : t) = loop (acc + h) t
Finally, this pattern of recursion happens to be a fold: there’s an initial value of the accumulator, updated for each element of the input, with no complex recursion. So the whole thing can be expressed with foldl':
import Data.List (foldl')
total :: [Int] -> Int
total list = foldl' (\ acc h -> acc + h) 0 list
And then abbreviated:
total = foldl' (+) 0
So, for parsing your format, you can follow a similar approach: instead of a list of integers, you have a list of characters, and instead of a single integer result, you have a compound data type, but the overall structure is very similar:
parse :: String -> ([Char], [Int], [Int])
parse input = let
    (…, …, …) = loop ([], [], []) input
  in …
  where
    loop (…, …, …) (c : rest) = …  -- What to do for each character.
    loop (…, …, …) [] = …          -- What to do at end of input.
If there are different sub-parsers, where you would use a state machine in an imperative language, you can make the accumulator include a data type for the different states. For example, here’s a parser for numbers separated by spaces:
import Data.Char (isSpace, isDigit)
data ParseState
  = Space
  | Number [Char]  -- Digit accumulator
numbers :: String -> [Int]
numbers input = loop (Space, []) input
  where
    loop :: (ParseState, [Int]) -> [Char] -> [Int]
    loop (Space, acc) (c : rest)
      | isSpace c = loop (Space, acc) rest       -- Ignore space.
      | isDigit c = loop (Number [c], acc) rest  -- Push digit.
      | otherwise = error "expected space or digit"
    loop (Number ds, acc) (c : rest)
      | isDigit c = loop (Number (c : ds), acc) rest  -- Push digit.
      | otherwise
        = loop
          (Space, read (reverse ds) : acc)  -- Save number, expect space.
          (c : rest)                        -- Repeat loop for same char.
    loop (Number ds, acc) [] = let
        acc' = read (reverse ds) : acc  -- Save final number.
      in reverse acc'                   -- Return final result.
    loop (Space, acc) [] = reverse acc  -- Return final result.
Of course, as you may be able to tell, this approach quickly becomes very complicated! Even if you write your code very compactly, or express it as a fold, if you’re working at the level of individual characters and parser state machines, it will take a lot of code to express your meaning, and there are many opportunities for error. A better approach is to consider the data flow at work here, and put together the parser from high-level components.
For example, the intent of the above parser is to do the following:
Split the input on whitespace
For each split, read it as an integer
And that can be expressed very directly with the words and map functions:
numbers :: String -> [Int]
numbers input = map read (words input)
One readable line instead of dozens! Clearly this approach is better. Consider how you can express the format you’re trying to parse in this style. If you want to avoid libraries like split, you can still write a function to split a string on separators using base functions like break, span, or takeWhile; then you can use that to split the input into records, and split each record into fields, and parse fields as integers or textual names accordingly.
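As one possible sketch of such a base-only splitting function (the name splitBy is an assumption for illustration, not a function from base or from the split library), break can be applied repeatedly:

```haskell
-- Hypothetical helper: split a list on every occurrence of a separator.
splitBy :: Eq a => a -> [a] -> [[a]]
splitBy sep xs = case break (== sep) xs of
  (chunk, [])       -> [chunk]                   -- No separator left.
  (chunk, _ : rest) -> chunk : splitBy sep rest  -- Drop the separator, continue.
```

With this, something like map read (splitBy ',' "01,02,03") :: [Int] gives [1,2,3]. Note that an empty input yields one empty chunk, so fields may need trimming or filtering before they are parsed.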
But the preferred approach for parsing in Haskell is not to manually split up input at all, but to use parser combinator libraries like megaparsec. There are parser combinators in base too, under Text.ParserCombinators.ReadP. With those, you can express a parser in the abstract, without talking about splitting up input at all, by just combining subparsers with standard interfaces (Functor, Applicative, Alternative, and Monad), for example:
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  ( ReadP
  , endBy
  , eof
  , munch1
  , readP_to_S
  , skipSpaces
  )
numbers :: String -> [Int]
numbers = fst . head . readP_to_S onlyNumbersP
  where
    onlyNumbersP :: ReadP [Int]
    onlyNumbersP = skipSpaces *> numbersP <* eof
    numbersP :: ReadP [Int]
    numbersP = numberP `endBy` skipSpaces
    numberP :: ReadP Int
    numberP = read <$> munch1 isDigit
This is the approach I would recommend in your case. Parser combinators are also an excellent way to get comfortable using applicatives and monads in practice.
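To make that concrete, here is a minimal ReadP sketch for a message of this general shape. The exact format in the question is not fully specified, so this assumes a hypothetical layout "x: 01, 02 y: 03, 04 v: X, Y" with comma-separated fields and single uppercase letters as values; the names message, field and parseMessage are made up for the example.

```haskell
import Data.Char (isDigit, isUpper)
import Text.ParserCombinators.ReadP
  ( ReadP, char, eof, munch1, readP_to_S, satisfy, sepBy1, skipSpaces, string )

-- Assumed format (not confirmed by the question): "x: 01, 02 y: 03, 04 v: X, Y"
message :: ReadP ([Char], [Int], [Int])
message = do
  xs <- field "x:" int
  ys <- field "y:" int
  vs <- field "v:" (satisfy isUpper)
  skipSpaces
  eof
  pure (vs, xs, ys)
  where
    int :: ReadP Int
    int = read <$> munch1 isDigit
    -- A label followed by one or more comma-separated items.
    field :: String -> ReadP a -> ReadP [a]
    field label item =
      skipSpaces *> string label *> skipSpaces *> sepBy1 item comma
    comma :: ReadP ()
    comma = skipSpaces *> char ',' *> skipSpaces

-- Succeeds only if exactly one parse consumes the whole input.
parseMessage :: String -> Maybe ([Char], [Int], [Int])
parseMessage s = case readP_to_S message s of
  [(result, "")] -> Just result
  _              -> Nothing
```

Returning Maybe instead of using head avoids a crash on malformed input; the grammar itself would need adjusting to whatever the real message format turns out to be.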

Related

Removing specific elements from lists in Haskell

I'm having a hard time getting Haskell and functional programming together in my head. What I am trying to do is manipulate a string so that I am printing/returning specific characters each time based on a number given. For example:
printing "testing" 2 = "etn"
printing "testing" 3 = "sn"
I've read a lot online, and from what I understand I can achieve this with filtering and cycling, but I cannot get/understand the syntax of this language to get a working program.
I'll try to describe my thought process so you can follow. This function fits the pattern of creating an output list (here a string) from an input seed (here a string) by repeated function application (here dropping some elements). Thus I choose an implementation with Data.List.unfoldr.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
Okay so, I need to turn the seed b into (Maybe) an output a and the rest of the string. I'll call this subfunction f and pass it into unfoldr.
printing s n = unfoldr f s
  where f b = case drop n b of
          [] -> Nothing
          (x:xs) -> Just (x,xs)
It turns out that attempting to take the head off the front of the list and returning a Maybe is also a common pattern. It's Data.List.uncons, so
printing s n = unfoldr (uncons . drop n) s
Very smooth! So I test it out, and the output is wrong! For n = 2, your specified output actually selects every 2nd character, i.e. it drops (n-1) characters before each one it keeps.
printing s n = unfoldr (uncons . drop (n-1)) s
I test it again and it matches the desired output. Phew!
To demonstrate the Haskell language, here are some alternative solutions to the accepted answer.
Using list comprehension:
printing :: Int -> String -> String
printing j ls = [s | (i, s) <- zip [1 .. ] ls, mod i j == 0]
Using recursion:
printing' :: Int -> String -> String
printing' n ls
  | null ls' = []
  | otherwise = x : printing' n xs
  where
    ls' = drop (n - 1) ls
    (x : xs) = ls'
In both cases I flipped the arguments so it is easier to do partial application: printing 5 for example is a new function and will give each 5th character when applied to a string.
Note with a minor modification they will work for any list
takeEvery :: Int -> [a] -> [a]
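One way that generalization might look, as a sketch based on the list-comprehension version and assuming n is at least 1 (with n = 0 the mod would fail with a divide-by-zero error on a non-empty list):

```haskell
-- Keep every n-th element of any list (1-based positions); assumes n >= 1.
takeEvery :: Int -> [a] -> [a]
takeEvery n xs = [x | (i, x) <- zip [1 ..] xs, i `mod` n == 0]
```

Only the type signature changes; the body never relied on the elements being characters.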

How to iterate over a list of characters and manipulate the characters in Haskell?

I am trying to go through a list of characters and do something to each character. The Java equivalent of what I am trying to accomplish is:
public class MyClass {
    void repeat(String s) {
        String newString = "";
        for (int i = 0; i < s.length(); i++) {
            newString += s.charAt(i);
            newString += s.charAt(i);
        }
    }
    public static void main(String args[]) {
        MyClass test = new MyClass();
        test.repeat("abc");
    }
}
One of the nicest things about functional programming is that patterns like yours can be encapsulated in one higher-order function; if nothing fits, you can still use recursion.
Recursion
First up, a simple recursive solution. The idea behind this is that it's like a for-loop:
recursiveFunction [] = baseCase
recursiveFunction (char1:rest) = (doSomethingWith char1) : (recursiveFunction rest)
So let's write your repeat function in this form. What is the base case? Well, if you repeat an empty string, you'll get an empty string back. What is the recursion? In this case, we're doubling the first character, then recursing along the rest of the string. So here's a recursive solution:
repeat1 [] = []
repeat1 (c:cs) = c : c : (repeat1 cs)
Higher-order Functions
As you start writing more Haskell, you'll discover that these sorts of recursive solutions often fit into a few repetitive patterns. Luckily, the standard library contains several predefined recursive functions for these sorts of patterns:
fmap is used to map each element of a list to a different value using a function given as a parameter. For example, fmap (\x -> x + 1) adds 1 to each element of a list. Unfortunately, it can't change the length of a list, so we can't use fmap by itself.
concat is used to 'flatten' a nested list. For example, concat [[1,2],[3,4,5]] is [1,2,3,4,5].
foldr/foldl are two more complex and generic functions. For more details, consult Learn You a Haskell.
None of these seem to directly fit your needs. However, we can use concat and fmap together:
repeat2 list = concat $ fmap (\x -> [x,x]) list
The idea is that fmap changes e.g. [1,2,3] to a nested list [[1,1],[2,2],[3,3]], which concat then flattens. This pattern of generating multiple elements from a single one is so common that the combination even has a special name: concatMap. You use it like so:
repeat3 list = concatMap (\x -> [x,x]) list
Personally, this is how I'd write repeat in Haskell. (Well, almost: I'd use eta-reduction to simplify it slightly more. But at your level that's irrelevant.) This is why Haskell in my opinion is so much more powerful than many other languages: this 7-line Java method is one line of highly readable, idiomatic Haskell!
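For reference, the eta-reduced form mentioned above would presumably look like the following sketch (the name repeat4 just continues the numbering used here and is not from the original answer): the list argument is dropped from both sides of the definition.

```haskell
-- Same behaviour as repeat3, with the list argument left implicit.
repeat4 :: [a] -> [a]
repeat4 = concatMap (\x -> [x, x])
```

Applying repeat4 to "abc" gives "aabbcc", the same result as repeat3.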
As others have suggested, it's probably wise to start with a list comprehension:
-- | Repeat each element of a list twice.
double :: [x] -> [x]
double xs = [d | x <- xs, d <- [x, x]]
But the fact that the second list in the comprehension always has the same number of elements, regardless of the value of x, means that we don't need quite that much power: the Applicative interface is sufficient. Let's start by writing the comprehension a bit differently:
double xs = xs >>= \x -> [x, x] >>= \d -> pure d
We can simplify immediately using a monad identity law:
double xs = xs >>= \x -> [x, x]
Now we switch over to Applicative, but let's leave a hole for the hard part:
double :: [x] -> [x]
double xs = liftA2 _1 xs [False, True]
The compiler lets us know that
_1 :: x -> Bool -> x
Since the elements of the inner/second list are always the same, and always come from the current outer/first list element, we don't have to care about the Bool:
double xs = liftA2 const xs [False, True]
Indeed, we don't even need to be able to distinguish the list positions:
double xs = liftA2 const xs [(),()]
Of course, we have a special Applicative method, (<*), that corresponds to liftA2 const, so let's use it:
double xs = xs <* [(),()]
And then, if we like, we can avoid mentioning xs by switching to a "point-free" form:
-- | Repeat each element of a list twice.
double :: [x] -> [x]
double = (<* [(),()])
Now for the test:
main :: IO ()
main = print $ double [1..3]
This will print [1,1,2,2,3,3].
double admits a slight generalization of dubious value:
import Control.Applicative (Alternative, (<|>))
import Control.Monad (join)
double :: Alternative f => f x -> f x
double = (<* join (<|>) (pure ()))
This will work for sequences as well as lists:
double (Data.Sequence.fromList [1..3]) = Data.Sequence.fromList [1,1,2,2,3,3]
but it could be a bit confusing for some other Alternative instances:
double (Just 3) = Just 3

How to destructure a string into first, middle, and last?

I'm writing a function to determine whether a number is a palindrome.
What I would like to do in the first case is to destructure the string into the first character, all the characters in the middle, and the last character. What I do is check if the first character is equal to the last, and then if so, proceed to check the middle characters.
What I have is below, but it generates type errors upon compilation.
numberIsPalindrome :: Int -> Bool
numberIsPalindrome n =
  case nString of
    (x:xs:y) -> (x == y) && numberIsPalindrome xs
    (x:y) -> x == y
    x -> True
  where nString = show n
Using the String representation is cheating...
Not really, but this is more fun:
import Data.List
palindrome n = list == reverse list where
  list = unfoldr f n
  f 0 = Nothing
  f k = Just (k `mod` 10, k `div` 10)
It creates a list of the digits of the number (unfoldr is really useful for such tasks), and then checks whether the list stays the same when reversed.
Your attempt has several problems, e.g. you are missing a conversion from the number to a String (which is just a list of Char in Haskell), and lists work completely differently from what you are trying: think of them more as stacks, where you usually operate only on one end.
That said, there are init and last functions for lists, which allow you to work your way from the "outer" elements of the list to the inner ones. A naive (and inefficient) implementation could look like this:
palindrome n = f (show n) where
  f [] = True
  f [_] = True
  f (x : xs) = (x == last xs) && (f (init xs))
But this is only for demonstration purposes; don't use such code in real life...
The definition you probably want is
numberIsPalindrome :: Int -> Bool
numberIsPalindrome num = let str = show num
                         in (str == reverse str)
The (:) operator is known as cons, it prepends items to lists:
1:2:[] results in [1,2]
You are getting a type error because you are trying to compare the first element, a Char, with the rest of the list, a [Char].
If you really want to compare the first element with the last, you would use head and last.
But you are better off using the solution that taktoa proposed:
numberIsPalindrome :: Int -> Bool
numberIsPalindrome num =
  numberString == reverse numberString
  where numberString = show num

How to "pack" some strings in a list on Haskell?

I want to write a function pack such that
pack ['a','a','a','b','c','c','a','a','d','e','e','e']
= ["aaa","b","cc","aa","d","eee"]
How can I do this? I'm stuck...
Use Data.List.group:
λ> import Data.List (group)
λ> :t group
group :: Eq a => [a] -> [[a]]
λ> group ['a','a','a','b','c','c','a','a','d','e','e','e']
["aaa","b","cc","aa","d","eee"]
Unless you want to write the function yourself (see Michael Foukarakis's answer).
Here's something off the top of my head:
pack :: (Eq a) => [a] -> [[a]]
pack [] = []
-- We split elements of a list recursively into those which are equal to the first one,
-- and those that are not. Then do the same for the latter:
pack (x:xs) = let (first, rest) = span (==x) xs
              in (x:first) : pack rest
Data.List already has what you're looking for, though.
I think it's worth adding a more explicit/beginner version:
pack :: [Char] -> [String]
pack [] = []
pack (c:cs) =
  let (v, s) = findConsecutive [c] cs
  in v : pack s
  where
    findConsecutive ds [] = (ds, [])
    findConsecutive s@(d:ds) t@(e:es)
      | d /= e = (s, t)
      | otherwise = findConsecutive (e:s) es
If the input is an empty list, the outcome is also an empty list. Otherwise, we find the next consecutive Chars that are equal and group them together into a String, which is returned in the result list. In order to do that we use the findConsecutive auxiliary function. This function's behavior resembles the takeWhile function, with the difference that we know in advance the predicate to use (equality comparison) and that we return both the consumed and the remaining list.
In other words, the signature of findConsecutive could be written as:
findConsecutive :: String -> [Char] -> (String, String)
which means that it takes a string containing only repeated characters, used as an accumulator, and a list from which characters are "extracted". It returns a tuple containing the current sequence of elements and the remaining list. Its body should be intuitive to follow: while the character list is not empty and the current element is equal to the ones in the accumulator, we add the character to the accumulator and recurse into the function. The function returns when we reach the end of the list or encounter a different character.
The same rationale can be used to understand the body of pack.

Reading numbers from input Haskell

I want to have a function that reads arbitrary ints until the number '0' is entered, and then presents the numbers entered as an ordered list.
For that I wrote this function:
import Data.List
readIntegers :: IO()
readIntegers = do
  putStrLn "insert a number: "
  num<-getLine
  let list = ordList ((read num :: Int):list)
  if (read num == 0)
    then print list
    else readIntegers
  where ordList ::[Int]->[Int]
        ordList [] = []
        ordList xs = sort xs
This compiles just fine, but when I enter the number '0', it gives me this error:
*** Exception: <<loop>>
What am I doing wrong?
As @phg points out, you are essentially constructing an infinite list, and actually evaluating it causes the loop error. A simple implementation to resolve this issue is to define a helper function which takes an additional parameter - a list to store all the inputs read in from the screen, like so:
import Data.List (sort)
readInteger :: IO ()
readInteger = readInteger' []
  where
    readInteger' x = do
      putStrLn "insert a number: "
      num <- getLine
      if ((read num :: Int) == 0)
        then print $ ordList x
        else readInteger' $ (read num :: Int):x
      where ordList :: [Int] -> [Int]
            ordList [] = []
            ordList xs = sort xs
Please note that the above is essentially just an implementation of @phg's answer, but with some changes to your original logic. Firstly, since 0 is a sentinel value, we shouldn't be appending that to our list. Second, we do not need to sort the list every single time we are adding a value to it. Sorting once at the time of printing/passing to another function is sufficient.
If you want to read an unspecified number of integers without prompting for user input and cut it off the moment you encounter 0, you would probably do well to use getContents, which will read everything from the standard input as a single string, lazily.
Then, it is a simple matter of parsing it to a list of numbers and doing what you want with it, like so:
readIntegers :: IO ()
readIntegers = do
  a <- getContents
  let b = ordList $ takeWhile (/= 0) $ map (\x -> read x :: Int) $ words a
  mapM_ (putStrLn . show) b
  where ordList :: [Int] -> [Int]
        ordList [] = []
        ordList xs = sort xs
let list = ordList ((read num :: Int):list)
This is basically a recursive definition of a list of the form [x, x, ...] (like if you wrote an equation saying x = 1 + x). That is perfectly fine by itself, since Haskell is lazy; however, if you try to print list (aka "solve the equation"), it will fail, since it will try to print infinitely many numbers.
You probably have a misconception about the workings of the (:) operator. Haskell functions will never perform an assignment operation and concatenate num onto list by changing it, like in imperative languages. There are only pure functions.
If you want to accumulate all numbers, you should try to come up with a recursive definition of readIntegers, keeping its state (the list) in an additional parameter (there are also more sophisticated ways, hiding the state passing, but more complicated to use for a beginner).
For a more sophisticated solution, note that this is an unfold and you can use unfoldM from Control.Monad.Loops to implement it:
import Control.Monad.Loops (unfoldM)
readInts :: IO [Int]
readInts = unfoldM $ fmap (check . read) getLine
  where check x = if x == 0 then Nothing else Just x
This has the nice property that it returns the list in the order in which it was read.
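If you would rather stay within base and not depend on monad-loops, a plain recursive version can give the same ordering. This is only a sketch: untilZero is a hypothetical pure helper added so the stopping rule can be checked without any I/O, and read will still throw on input that is not a number.

```haskell
-- Pure core: parse each token and stop at the first 0 (the sentinel).
untilZero :: [String] -> [Int]
untilZero = takeWhile (/= 0) . map read

-- IO version: read one number per line, keeping the order of entry.
readInts :: IO [Int]
readInts = do
  n <- fmap read getLine
  if n == 0
    then pure []                  -- Sentinel reached: stop, do not keep the 0.
    else fmap (n :) readInts      -- Keep n in front of the rest of the input.
```

Because each number is placed in front of the result of the recursive call, no final reverse is needed.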
