How to iterate over a list of characters and manipulate the characters in Haskell?

I am trying to go through a list of characters in a list and do something to the current character. My java equivalent of what I am trying to accomplish is:
public class MyClass {
    void repeat(String s) {
        String newString = "";
        for (int i = 0; i < s.length(); i++) {
            newString += s.charAt(i);
            newString += s.charAt(i);
        }
    }
    public static void main(String[] args) {
        MyClass test = new MyClass();
        test.repeat("abc");
    }
}

One of the nicest things about functional programming is that patterns like yours can be encapsulated in one higher-order function; if nothing fits, you can still use recursion.
Recursion
First up, a simple recursive solution. The idea behind this is that it's like a for-loop:
recursiveFunction [] = baseCase
recursiveFunction (char1:rest) = (doSomethingWith char1) : (recursiveFunction rest)
So let's write your repeat function in this form. What is the base case? Well, if you repeat an empty string, you'll get an empty string back. What is the recursion? In this case, we're doubling the first character, then recursing along the rest of the string. So here's a recursive solution:
repeat1 [] = []
repeat1 (c:cs) = c : c : (repeat1 cs)
Higher-order Functions
As you start writing more Haskell, you'll discover that these sorts of recursive solutions often fit into a few repetitive patterns. Luckily, the standard library contains several predefined recursive functions for these sorts of patterns:
fmap is used to map each element of a list to a different value using a function given as a parameter. For example, fmap (\x -> x + 1) adds 1 to each element of a list. Unfortunately, it can't change the length of a list, so we can't use fmap by itself.
concat is used to 'flatten' a nested list. For example, concat [[1,2],[3,4,5]] is [1,2,3,4,5].
foldr/foldl are two more complex and generic functions. For more details, consult Learn You a Haskell.
None of these seem to directly fit your needs. However, we can use concat and fmap together:
repeat2 list = concat $ fmap (\x -> [x,x]) list
The idea is that fmap changes e.g. [1,2,3] to a nested list [[1,1],[2,2],[3,3]], which concat then flattens. This pattern of generating multiple elements from a single one is so common that the combination even has a special name: concatMap. You use it like so:
repeat3 list = concatMap (\x -> [x,x]) list
Personally, this is how I'd write repeat in Haskell. (Well, almost: I'd use eta-reduction to simplify it slightly more. But at your level that's irrelevant.) This is why Haskell in my opinion is so much more powerful than many other languages: this 7-line Java method is one line of highly readable, idiomatic Haskell!
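For reference, the eta-reduced form mentioned above might look like the following sketch. The name repeatEach is made up here to avoid clashing with the Prelude's repeat; it is not a standard library function.

```haskell
-- Hypothetical name, chosen to avoid clashing with Prelude's `repeat`.
-- Eta-reduced: the list argument is not mentioned on either side.
repeatEach :: [a] -> [a]
repeatEach = concatMap (\x -> [x, x])

main :: IO ()
main = putStrLn (repeatEach "abc")  -- prints aabbcc
```

Because a String is just [Char], the same function works on strings and on any other list.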

As others have suggested, it's probably wise to start with a list comprehension:
-- | Repeat each element of a list twice.
double :: [x] -> [x]
double xs = [d | x <- xs, d <- [x, x]]
But the fact that the second list in the comprehension always has the same number of elements, regardless of the value of x, means that we don't need quite that much power: the Applicative interface is sufficient. Let's start by writing the comprehension a bit differently:
double xs = xs >>= \x -> [x, x] >>= \d -> pure d
We can simplify immediately using a monad identity law:
double xs = xs >>= \x -> [x, x]
Now we switch over to Applicative, but let's leave a hole for the hard part:
double :: [x] -> [x]
double xs = liftA2 _1 xs [False, True]
The compiler lets us know that
_1 :: x -> Bool -> x
Since the elements of the inner/second list are always the same, and always come from the current outer/first list element, we don't have to care about the Bool:
double xs = liftA2 const xs [False, True]
Indeed, we don't even need to be able to distinguish the list positions:
double xs = liftA2 const xs [(),()]
Of course, we have a special Applicative method, (<*), that corresponds to liftA2 const, so let's use it:
double xs = xs <* [(),()]
And then, if we like, we can avoid mentioning xs by switching to a "point-free" form:
-- | Repeat each element of a list twice.
double :: [x] -> [x]
double = (<* [(),()])
Now for the test:
main :: IO ()
main = print $ double [1..3]
This will print [1,1,2,2,3,3].
double admits a slight generalization of dubious value:
double :: Alternative f => f x -> f x
double = (<* join (<|>) (pure ()))
This will work for sequences as well as lists:
double (Data.Sequence.fromList [1..3]) = Data.Sequence.fromList [1,1,2,2,3,3]
but it could be a bit confusing for some other Alternative instances:
double (Just 3) = Just 3

Related

Splitting string into type in Haskell

I need to create a parse function. I am new to Haskell, and I am interested in whether my approach can be implemented in Haskell using only GHC base functions.
So the problem is: I have some message in a string with coordinates and values, like (x: 01, 01, ...
y:01, 02,: v: X, Y, Z), and I need to parse it into a type like ([Char], [Int], [Int]).
In a language like C, I would write a loop that walks from the start, checks each item, and puts it into the matching array, but I am afraid this would not work in Haskell. Can someone give a hint on an approachable solution to this problem?
If you’re accustomed to imperative programming with loops, you can actually do a fairly literal translation of an imperative solution to Haskell using direct recursion.
Bear in mind, this isn’t the easiest or best way to arrive at a working solution, but it’s good to learn the technique so that you understand what more idiomatic solutions are abstracting away for you.
The basic principle is to replace each loop with a recursive function, and replace each mutable variable with an accumulator parameter to that function. Where you would modify the variable within an iteration of the loop, just make a new variable; where you would modify it between iterations of the loop, call the looping function with a different argument in place of that parameter.
For a simple example, consider computing the sum of a list of integers. In C, that might be written like this:
#include <stddef.h>
typedef struct ListInt { int head; struct ListInt *tail; } ListInt;
int total(ListInt const *list) {
    int acc = 0;
    ListInt const *xs = list;
    while (xs != NULL) {
        acc += xs->head;
        xs = xs->tail;
    }
    return acc;
}
We can translate that literally to low-level Haskell:
total :: [Int] -> Int
total list
  = loop
    0     -- acc = 0
    list  -- xs = list
  where
    loop
      :: Int    -- int acc;
      -> [Int]  -- ListInt const *xs;
      -> Int
    loop acc xs                          -- loop:
      | not (null xs) = let              -- if (xs != NULL) {
          acc' = acc + head xs           --   acc += xs->head;
          xs' = tail xs                  --   xs = xs->tail;
        in loop acc' xs'                 --   goto loop;
                                         -- } else {
      | otherwise = acc                  --   return acc;
                                         -- }
The outer function total sets up the initial state, and the inner function loop handles the iteration over the input. In this case, total immediately returns after the loop, but if there were some more code after the loop to process the results, that would go in total:
total list = let
    result = loop 0 list
  in someAdditionalProcessing result
It’s extremely common in Haskell for a helper function to accumulate a list of results by prepending them to the beginning of an accumulator list with :, and then reversing this list after the loop, because appending a value to the end of a list is much more costly. You can think of this pattern as using a list as a stack, where : is the “push” operation.
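As a rough illustration of this prepend-then-reverse pattern, here is a small sketch. The function evenSquares and its task are invented for the example; they do not come from the question.

```haskell
-- Collect the squares of the even numbers, keeping the input order.
-- `evenSquares` is an illustrative name, not part of any library.
evenSquares :: [Int] -> [Int]
evenSquares = loop []
  where
    loop :: [Int] -> [Int] -> [Int]
    loop acc []       = reverse acc            -- undo the stack order once, at the end
    loop acc (x : xs)
      | even x        = loop (x * x : acc) xs  -- "push" onto the front of the accumulator
      | otherwise     = loop acc xs            -- skip odd numbers

main :: IO ()
main = print (evenSquares [1 .. 6])  -- prints [4,16,36]
```

Each push is constant time and the single reverse at the end is linear, whereas appending with ++ inside the loop would make the whole traversal quadratic.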
Also, straight away, we can make some simple improvements. First, the accessor functions head and tail may throw an error if our code is wrong and we call them on empty lists, just like accessing a head or tail member of a NULL pointer (although an exception is clearer than a segfault!), so we can simplify it and make it safer by using pattern matching instead of guards and head/tail:
loop :: Int -> [Int] -> Int
loop acc [] = acc
loop acc (h : t) = loop (acc + h) t
Finally, this pattern of recursion happens to be a fold: there’s an initial value of the accumulator, updated for each element of the input, with no complex recursion. So the whole thing can be expressed with foldl':
import Data.List (foldl')
total :: [Int] -> Int
total list = foldl' (\ acc h -> acc + h) 0 list
And then abbreviated:
total = foldl' (+) 0
So, for parsing your format, you can follow a similar approach: instead of a list of integers, you have a list of characters, and instead of a single integer result, you have a compound data type, but the overall structure is very similar:
parse :: String -> ([Char], [Int], [Int])
parse input = let
    (…, …, …) = loop ([], [], []) input
  in …
  where
    loop (…, …, …) (c : rest) = …  -- What to do for each character.
    loop (…, …, …) []         = …  -- What to do at end of input.
If there are different sub-parsers, where you would use a state machine in an imperative language, you can make the accumulator include a data type for the different states. For example, here’s a parser for numbers separated by spaces:
import Data.Char (isSpace, isDigit)
data ParseState
  = Space
  | Number [Char]  -- Digit accumulator
numbers :: String -> [Int]
numbers input = loop (Space, []) input
  where
    loop :: (ParseState, [Int]) -> [Char] -> [Int]
    loop (Space, acc) (c : rest)
      | isSpace c = loop (Space, acc) rest       -- Ignore space.
      | isDigit c = loop (Number [c], acc) rest  -- Push digit.
      | otherwise = error "expected space or digit"
    loop (Number ds, acc) (c : rest)
      | isDigit c = loop (Number (c : ds), acc) rest  -- Push digit.
      | otherwise
        = loop
          (Space, read (reverse ds) : acc)  -- Save number, expect space.
          (c : rest)                        -- Repeat loop for same char.
    loop (Number ds, acc) [] = let
        acc' = read (reverse ds) : acc  -- Save final number.
      in reverse acc'                   -- Return final result.
    loop (Space, acc) [] = reverse acc  -- Return final result.
Of course, as you may be able to tell, this approach quickly becomes very complicated! Even if you write your code very compactly, or express it as a fold, if you’re working at the level of individual characters and parser state machines, it will take a lot of code to express your meaning, and there are many opportunities for error. A better approach is to consider the data flow at work here, and put together the parser from high-level components.
For example, the intent of the above parser is to do the following:
Split the input on whitespace
For each split, read it as an integer
And that can be expressed very directly with the words and map functions:
numbers :: String -> [Int]
numbers input = map read (words input)
One readable line instead of dozens! Clearly this approach is better. Consider how you can express the format you’re trying to parse in this style. If you want to avoid libraries like split, you can still write a function to split a string on separators using base functions like break, span, or takeWhile; then you can use that to split the input into records, and split each record into fields, and parse fields as integers or textual names accordingly.
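A separator-splitting helper built only from base functions could look roughly like this. The name splitOn is an assumption for this sketch; it is written from scratch here and is not the function of the same name from the split package.

```haskell
-- Split a string on a separator character, using only `break` from the Prelude.
-- `splitOn` is a hypothetical helper defined here, not imported from a library.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]                     -- no separator left: last field
  (field, _ : rest) -> field : splitOn sep rest    -- drop the separator and continue

main :: IO ()
main = print (splitOn ',' "x,y,z")  -- prints ["x","y","z"]
```

Applying it once with one separator gives records, and applying it again to each record with another separator gives fields, which can then be passed to read or kept as text.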
But the preferred approach for parsing in Haskell is not to manually split up input at all, but to use parser combinator libraries like megaparsec. There are parser combinators in base too, under Text.ParserCombinators.ReadP. With those, you can express a parser in the abstract, without talking about splitting up input at all, by just combining subparsers with standard interfaces (Functor, Applicative, Alternative, and Monad), for example:
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  ( ReadP
  , endBy
  , eof
  , munch1
  , readP_to_S
  , skipSpaces
  )
numbers :: String -> [Int]
numbers = fst . head . readP_to_S onlyNumbersP
  where
    onlyNumbersP :: ReadP [Int]
    onlyNumbersP = skipSpaces *> numbersP <* eof
    numbersP :: ReadP [Int]
    numbersP = numberP `endBy` skipSpaces
    numberP :: ReadP Int
    numberP = read <$> munch1 isDigit
This is the approach I would recommend in your case. Parser combinators are also an excellent way to get comfortable using applicatives and monads in practice.

How does listx2 = [x * 2 | x<- numberList] work?

So I'm watching a very basic tutorial, and I'm at list comprehensions, where this comes up:
listx2 = [x * 2 | x<- numberList]
with numberList being a list of numbers.
So this takes every number in the list and doubles it, so numberList = [1,2] results in [2,4].
But HOW does the whole syntax come together?
I know that x * 2 is the doubling, but the rest just doesn't make sense to me.
| is the "or" symbol as far as I know, so what does it do there?
x <- numberList gives x a number from the list, but why does it take just one number? And why so nicely one after the other? There is no recursion or anything that tells it to do one element at a time...
I learn stuff by understanding it, so is that even possible here, or do I just have to accept this as "that's how it goes" and memorize the pattern?
List comprehensions use their own special syntax, which is
[ e | q1, q2, ..., qn ]
The | is not an "or", it's part of the syntax, just as [ and ].
Each qi can be of the following forms.
x <- list chooses x from the list
condition is a boolean expression, which discards the xs chosen before if the condition is false
let y = expression defines variable y accordingly
Finally, e is an expression which can involve all the variables defined in the qi, and which forms the elements in the resulting list.
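To see the three kinds of qualifier side by side, here is a small sketch; the name bigProducts and the numbers are made up for illustration.

```haskell
-- Two generators, one `let` binding and one condition in a single comprehension.
-- `bigProducts` is an illustrative name only.
bigProducts :: [Int]
bigProducts =
  [ p                -- e: the expression that builds each result element
  | x <- [1 .. 3]    -- generator: x takes each value in turn
  , y <- [1 .. 3]    -- generator: for every x, y takes each value in turn
  , let p = x * y    -- local definition usable in later qualifiers and in e
  , p > 3            -- condition: keep only combinations where this holds
  ]

main :: IO ()
main = print bigProducts  -- prints [4,6,6,9]
```

The qualifiers are read left to right, like nested loops: the rightmost generator varies fastest.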
What you see is syntactical sugar. So Haskell does not interpret the pipe (|) as a guard, etc. It sees the list comprehension as a whole.
This however does not mean that the <- are picked at random. Actually list comprehension maps nicely on the list monad. What you see is syntactical sugar for:
listx2 = do
  x <- numberList
  return (x*2)
Now a list type [] is actually a monad. It means that we have written:
listx2 = numberList >>= \x -> return (x*2)
Or even shorter:
listx2 = numberList >>= return . (*2)
Now the list monad is defined as:
instance Monad [] where
  return x = [x]
  xs >>= k = concat $ fmap k xs
So this means that it is equivalent to:
listx2 = numberList >>= return . (*2)
listx2 = concat (fmap (return . (*2)) numberList)
listx2 = concat (fmap (\x -> [2*x]) numberList)
Now for a list fmap is equal to map, so:
listx2 = concat $ map (\x -> [2*x]) numberList
listx2 = concatMap (\x -> [2*x]) numberList
so that means that for every element x in the numberList we will generate a singleton list [2*x] and concatenate all these singleton lists into the result.

Are there ways to call two functions (one just after another) in purely functional language? (in non-io mode)

I'm trying to understand order of execution in purely functional language.
I know that in purely functional languages, there is no necessary execution order.
So my question is:
Suppose there are two functions.
I would like to know all ways in which I can call one function after another (except nested call of one function from another) (and except io-mode).
I would like to see examples in Haskell or pseudo-code.
There is no way to do what you describe, if the functions are totally independent and you don't use the result of one when you call the other.
This is because there is no reason to do this. In a side effect free setting, calling a function and then ignoring its result is exactly the same as doing nothing for the amount of time it takes to call that function (setting aside memory usage).
It is possible that seq x y will evaluate x and then y, and then give you y as its result, but this evaluation order isn't guaranteed.
Now, if we do have side effects, such as if we are working inside a Monad or Applicative, this could be useful, but we aren't truly ignoring the result since there is context being passed implicitly. For instance, you can do
main :: IO ()
main = putStrLn "Hello, " >> putStrLn "world"
in the IO Monad. Another example would be the list Monad (which could be thought of as representing a nondeterministic computation):
biggerThanTen :: Int -> Bool
biggerThanTen n = n > 10
example :: String
example = filter biggerThanTen [1..15] >> return 'a' -- This evaluates to "aaaaa"
Note that even here we aren't really ignoring the result. We ignore the specific values, but we use the structure of the result (in the second example, the structure would be the fact that the resulting list from filter biggerThanTen [1..15] has 5 elements).
I should point out, though, that things that are sequenced in this way aren't necessarily evaluated in the order that they are written. You can sort of see this with the list Monad example. This becomes more apparent with bigger examples though:
example2 :: [Int]
example2 =
  [1,2,3] >>=
    (\x -> [10,100,1000] >>=
      (\y -> return (x * y))) -- ==> [10,100,1000,20,200,2000,30,300,3000]
The main takeaway here is that evaluation order (in the absence of side effects like IO and ignoring bottoms) doesn't affect the ultimate meaning of code in Haskell (other than possible differences in efficiency, but that is another topic). As a result, there is never a reason to call two functions "one after another" in the fashion described in the question (that is, where the calls are totally independent from each other).
Do notation
Do notation is actually exactly equivalent to using >>= and >> (there is actually one other thing involved that takes care of pattern match failures, but that is irrelevant to the discussion at hand). The compiler actually takes things written in do notation and converts them to >>= and >> through a process called "desugaring" (since it removes the syntactic sugar). Here are the three examples from above written with do notation:
IO Example
main :: IO ()
main = do
  putStrLn "Hello, "
  putStrLn "world"
First list example
biggerThanTen :: Int -> Bool
biggerThanTen n = n > 10
example :: String -- String is a synonym for [Char], by the way
example = do
  filter biggerThanTen [1..15]
  return 'a'
Second list example
example2 :: [Int]
example2 = do
  x <- [1,2,3]
  y <- [10,100,1000]
  return (x * y)
Here is a side-by-side comparison of the conversions:
do            --
  m           -- m >> n
  n           --
do            --
  x <- m      -- m >>= (\x ->
  ...         --   ...)
The best way to understand do notation is to first understand >>= and return since, as I said, that's what the compiler transforms do notation into.
As a side note, >> is just the same as >>=, except that it ignores the "result" of its left argument (although it preserves the "context" or "structure"). So all definitions of >> must be equivalent to m >> n = m >>= (\_ -> n).
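A quick way to convince yourself of this for the list Monad is the following sketch; viaThen and viaBind are names invented for the comparison.

```haskell
-- For lists, `m >> n` repeats `n` once for every element of `m`,
-- ignoring the element values but keeping the "structure" (the length).
viaThen :: String
viaThen = [1, 2, 3 :: Int] >> "ab"

-- The same computation written with >>= and a lambda that ignores its argument.
viaBind :: String
viaBind = [1, 2, 3 :: Int] >>= (\_ -> "ab")

main :: IO ()
main = do
  putStrLn viaThen            -- prints ababab
  print (viaThen == viaBind)  -- prints True
```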
Expanding the >>= in the second list example
To help drive home the point that Monads are not usually impure, let's expand the >>= calls in the second list example, using the Monad definition for lists. The definition is:
instance Monad [] where
  return x = [x]
  xs >>= f = concatMap f xs
and we can convert example2 into:
Step 0 (what we already have)
example2 :: [Int]
example2 =
  [1,2,3] >>=
    (\x -> [10,100,1000] >>=
      (\y -> return (x * y)))
Step 1 (converting the first >>=)
example2 =
  concatMap
    (\x -> [10,100,1000] >>=
      (\y -> return (x * y)))
    [1,2,3]
Step 2
example2 =
  concatMap
    (\x -> concatMap
      (\y -> return (x * y))
      [10,100,1000])
    [1,2,3]
Step 3
example2 =
  concatMap
    (\x -> concatMap
      (\y -> [x * y])
      [10,100,1000])
    [1,2,3]
So, there is no magic going on here, just normal function calls.
You can write a function whose arguments depend on the evaluation of another function:
-- Adds the first two elements of a list together
myFunc :: [Int] -> Int
myFunc xs = (head xs) + (head $ tail xs)
If that's what you mean. In this case, you can't get the output of myFunc xs without evaluating head xs, head $ tail xs and (+). There is an order here. However, the compiler can choose which order to execute head xs and head $ tail xs in since they are not dependent on each other, but it can't do the addition without having both of the other results. It could even choose to evaluate them in parallel, or on different machines. The point is that pure functions, because they have no side effects, don't have to be evaluated in a given order until their results are interdependent.
Another way to look at the above function is as a graph:
      myFunc
        |
       (+)
      /   \
     /     \
   head    head
     \      |
      \    tail
       \   /
        xs
In order to evaluate a node, all nodes below it have to be evaluated first, but different branches can be evaluated in parallel. First xs must be evaluated, at least partially, but after that the two branches can be evaluated in parallel. There are some nuances due to lazy evaluation, but this is essentially how the compiler constructs evaluation trees.
If you really want to force one function call before the other, you can use the seq function. It takes two arguments, forces the first to be evaluated, then returns the second, e.g.
myFunc2 :: [Int] -> Int
myFunc2 xs = hxs + (hxs `seq` (head $ tail xs))
  where hxs = head xs
This will force head xs to evaluate before head $ tail xs, but this is more dealing with strictness than sequencing functions.
Here is an easy way:
case f x of
  result1 -> case g y of
    result2 -> ....
Still, unless g y uses something from result1 and the subsequent calculations something from result2, or the pattern is such that the result must be evaluated, there is no guarantee that either of f or g are actually called, nor in what order.
Still, you wanted a way to call one function after another, and this is such a way.

iterating through a list in haskell

I have a list of list of characters ::[[Char]].
I need to iterate both over the list of strings and also over each character in each string.
Say, my list is present in this variable.
let xs
Please suggest an easy way to iterate.
If you want to apply a function f to every element of a list like this:
[a, b, c, d] → [f a, f b, f c, f d]
then map f xs does the trick. map turns a function on elements to a function on lists. So, we can nest it to operate on lists of lists: if f transforms as into bs, map (map f) transforms [[a]]s into [[b]]s.
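As a concrete sketch of the nested map, here is one possible instance with f chosen as toUpper; the name shout is invented for the example.

```haskell
import Data.Char (toUpper)

-- Apply a Char -> Char function to every character of every string.
-- `shout` is an illustrative name only.
shout :: [[Char]] -> [[Char]]
shout = map (map toUpper)

main :: IO ()
main = print (shout ["hello", "world"])  -- prints ["HELLO","WORLD"]
```

The inner map works on one string, and the outer map applies that to each string in the list.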
If you instead want to perform some IO action for every element of a list (which is more like traditional iteration), then you're probably looking for forM_:1
forM_ :: [a] -> (a -> IO b) -> IO ()
You give it a function, and it calls it with each element of the list in order. For instance, forM_ xs putStrLn is an IO action that will print out every string in xs on its own line. Here's an example of a more involved use of forM_:
import Control.Monad (forM_)
main = do
  ...
  forM_ xs $ \s -> do
    putStrLn "Here's a string:"
    forM_ s print
    putStrLn "Now it's done."
If xs contains ["hello", "world"], then this will print out:
Here's a string:
'h'
'e'
'l'
'l'
'o'
Now it's done.
Here's a string:
'w'
'o'
'r'
'l'
'd'
Now it's done.
1 forM_ actually has a more general type, but the simpler version I've shown is more relevant here.
Just that:
[c | x <- xs, c <- x]
The "correct" way to iterate is actually fold. Anything you might ever want to do with a list can be done with a fold. Let's consider what you want to do. You're probably thinking of something like this:
for (row in xs):
    for (c in row):
        doSomething
The problem is, you're probably making use of mutable variables in doSomething. That's ok, we can deal with that. So suppose you have this.
def iter2d(xs):
    outerVar = outerInit
    for (row in xs):
        innerVar = innerInit(row)
        outerVar.adjust1(row)
        for (c in row):
            innerVar.adjust2(c)
            outerVar.adjust3(c, innerVar)
    return outerVar
Let's translate that to folds. And immutability.
iter2d :: [[Char]] -> Something
iter2d xs = foldl' outerStep outerInit xs
  where outerInit = ... -- same as outerInit above
        outerStep acc row = fst $ foldl' innerStep innerInit' row
          where innerInit' = ((adjust1 acc row), innerInit row)
        innerInit row = ... -- same as innerInit above
        innerStep (outAcc, inAcc) c = (outAcc', inAcc')
          where inAcc' = adjust2 inAcc c
                outAcc' = adjust3 outAcc c inAcc'
Notice that with immutability, we are forced to indicate that outAcc' depends on inAcc' rather than inAcc, that is, on the "state" of innerVar after it is updated.
Now you might say "wow that Haskell looks way ugly, why would I ever want to use Haskell". Yes, it does look ugly, but only because I tailored it to be a direct translation of imperative code. Once you get used to using folds instead of "iterating through a list", then you will find that folding is a very powerful technique that lets you do a lot of things in a more elegant way than for loops allow.
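For a less abstract picture, here is a small sketch of a nested fold over a [[Char]]. The function countWhere and its task (counting matching characters) are assumptions chosen for illustration, not something from the question.

```haskell
import Data.List (foldl')

-- Count how many characters in a list of strings satisfy a predicate.
-- `countWhere` is an illustrative name only.
countWhere :: (Char -> Bool) -> [[Char]] -> Int
countWhere p = foldl' outerStep 0
  where
    -- Outer fold: carry the running count from one row to the next.
    outerStep acc row = foldl' innerStep acc row
    -- Inner fold: update the count for a single character.
    innerStep acc c
      | p c       = acc + 1
      | otherwise = acc

main :: IO ()
main = print (countWhere (== 'l') ["hello", "world"])  -- prints 3
```

The running count plays the role of the mutable variable in the loop version: each step returns the new value instead of updating it in place.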
map (map f) l
where f :: Char -> Foo is a function to apply to each Char and l :: [[Char]]
returns l' :: [[Foo]]

When destructuring tuples in Haskell, where can the elements be used?

I am reading a tutorial that uses the following example (that I'll generalize somewhat):
f :: Foo -> (Int, Foo)
...
fList :: Foo -> [Int]
fList foo = x : fList bar
  where
    (x, bar) = f foo
My question lies in the fact that it seems you can refer to x and bar, by name, outside of the tuple where they are obtained. This would seem to act like destructuring parameter lists in other languages, if my guess is correct. (In other words, I didn't have to do the following:)
fList foo = (fst tuple) : fList (snd tuple)
  where
    tuple = f foo
Am I right about this behavior? I've never seen it mentioned yet in the tutorials/books I've been reading. Can someone point me to more info on the subject?
Edit: Can anything (lists, arrays, etc.) be destructured in a similar way, or can you only do this with tuples?
Seeing your edit, I think what you're asking about is pattern matching.
And to answer your question: Yes, anything you can construct, you can also 'deconstruct' using the constructors. For example, you're probably familiar with this form of pattern matching:
head :: [a] -> a
head (x:xs) = x
head [] = error "Can't take head of empty list"
However, there are more places where you can use pattern matching, other valid notations are:
head xs = case xs of
  (y:ys) -> y
  []     -> error "Can't take head of empty list"
head xs = let (y:ys) = xs
          in y
head xs = y
  where
    (y:ys) = xs
Note that the last two examples are a bit different from the first two because they give different error messages when you call them with an empty list.
Although these examples are specific to lists, you can do the same with other data types, like so:
first :: (a, b) -> a
first tuple = x
  where
    (x, y) = tuple
second :: (a, b) -> b
second tuple = let (x, y) = tuple
               in y
fromJust :: Maybe a -> a
fromJust ma = x
  where
    (Just x) = ma
Again, the last function will also crash if you call it with Nothing.
To sum up; if you can create something using constructors (like (:) and [] for lists, or (,) for tuples, or Nothing and Just for Maybe), you can use those same constructors to do pattern matching in a variety of ways.
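The same idea carries over to types you define yourself. The following sketch uses a made-up Shape type (not from the question) and destructures it in function equations, in a case expression, and in a where binding.

```haskell
-- `Shape`, `area` and `describe` are hypothetical names for illustration.
data Shape
  = Circle Double
  | Rect Double Double

-- Pattern matching on constructors in function equations.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

-- Pattern matching in a `where` binding: the tuple built by the
-- case expression is taken apart into `name` and `a`.
describe :: Shape -> String
describe shape = name ++ " with area " ++ show a
  where
    (name, a) = case shape of
      Circle _ -> ("circle", area shape)
      Rect _ _ -> ("rectangle", area shape)

main :: IO ()
main = putStrLn (describe (Rect 2 3))  -- prints rectangle with area 6.0
```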
Am I right about this behavior?
Yes. The names exist only in the block where you have defined them, though. In your case, this means the logical unit that your where clause is applied to, i.e. the expression inside fList.
Another way to look at it is that code like this
x where x = 3
is roughly equivalent to
let x = 3 in x
Yes, you're right. Names bound in a where clause are visible to the full declaration preceding the where clause. In your case those names are x and bar.
(One of the hard things about learning Haskell is that it is not just permitted but common to use variables in the source code in locations that precede the locations where those variables are defined.)
The place to read more about where clauses is in the Haskell 98 Report or in one of the many fine tutorials to be found at haskell.org.

Resources