Why isn't the Prelude's words function written more simply? - haskell

Consider the words Prelude function; it is really easy and one could write it in the following manner:
words' :: String -> [String]
words' [] = []
words' str = before : words' (dropWhile isSpace after) where
    (before, after) = break isSpace str
However, I noticed that its original Prelude code seems much less... natural:
words :: String -> [String]
words s = case dropWhile {-partain:Char.-}isSpace s of
    "" -> []
    s' -> w : words s''
      where (w, s'') =
              break {-partain:Char.-}isSpace s'
I assume that there are optimization-related reasons for it. The question is: am I wrong to expect that the compiler should optimize the words' function just as well as its Prelude version? I did use the same functions (break, dropWhile, isSpace).
I was once very surprised that GHC did not perform some of the simplest low-level optimizations:
C vs Haskell Collatz conjecture speed comparison
but aside from the {-partain:Char.-} bits (this hint for the compiler does not seem very helpful in this situation IMO) the words code seems unnecessarily bloated for a high-level language. What is the reason behind it in this case?

This is nearly exactly the same code. The only difference is if we're doing the dropWhile isSpace before every call or only the recursive call. Neither is more complex than the other, but rather the latter (Prelude) version seems more verbose because the pattern matching isn't directly in the function.
You can observe the difference (and why the Prelude version has better behavior) like so:
*Main> words " "
[]
*Main> words' " "
[""]
Note that you can quickly verify if your "improved" versions are the same as the originals using QuickCheck.
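As a rough illustration of that kind of check, here is a minimal sketch that does not use QuickCheck itself but a base-only exhaustive comparison over short strings. The helper names smallStrings and counterexamples are made up for this example; words' is the definition from the question.

```haskell
import Data.Char (isSpace)

-- The simplified definition from the question.
words' :: String -> [String]
words' [] = []
words' str = before : words' (dropWhile isSpace after)
  where
    (before, after) = break isSpace str

-- All strings of length <= n over the two-character alphabet {'a', ' '}
-- (hypothetical helper, not part of any library).
smallStrings :: Int -> [String]
smallStrings n = concatMap (\k -> sequence (replicate k "a ")) [0 .. n]

-- Inputs on which the simplified version disagrees with the Prelude's words.
counterexamples :: Int -> [String]
counterexamples n = [s | s <- smallStrings n, words' s /= words s]
```

In GHCi, counterexamples 2 evaluates to [" "," a","  "]: every disagreement involves leading whitespace, which is exactly the case the Prelude version handles by calling dropWhile isSpace first.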

Related

Haskell - Couldn't match type `[Char]' with `Char'

I currently have the following code in Haskell
splitStringOnDelimeter :: String -> Char -> [String]
splitStringOnDelimeter "" delimeter = return [""]
splitStringOnDelimeter string delimeter = do
    let split = splitStringOnDelimeter (tail string) delimeter
    if head string == delimeter
        then return ([""] ++ split)
        else return ( [( [(head string)] ++ (head split) )] ++ (tail split))
If I run it in a Haskell terminal (i.e. https://www.tryhaskell.org) with values for the return statement such as ( [( [(head "ZZZZ")] ++ (head ["first", "second", "third"]) )] ++ (tail ["first", "second", "third"])) or [""] ++ ["first", "second", "third"] or [""] then I receive the correct types from the terminal which is different to my local stack compiler. Furthermore, if I also change the top return statement to return "" then it doesn't complain about that statement which I'm pretty sure is incorrect.
My local compiler works fine with the rest of my Haskell codebase which is why I think it might be something wrong with my code...
One of the unfortunate things in the design of the Monad typeclass, is that they introduced a function called return. But although in many imperative programming languages return is a keyword to return content, in Haskell return has a totally different meaning, it does not really return something.
You can solve the problem by dropping the return:
splitStringOnDelimeter :: String -> Char -> [String]
splitStringOnDelimeter "" delimeter = [""]
splitStringOnDelimeter string delimeter =
    let split = splitStringOnDelimeter (tail string) delimeter in
    if head string == delimeter
        then ([""] ++ split)
        else ( [( [(head string)] ++ (head split) )] ++ (tail split))
The return :: Monad m => a -> m a is used to wrap a value (of type a) in a monad. Since your signature here hints at a list, Haskell will assume that you are looking for the list monad. That means that your return would wrap [""] into another list, so with return [""] you would implicitly have written (in this context) [[""]], and this of course does not match [String].
The same goes for do: again you then make a monadic function, but here your function has not much to do with monads.
Note that the name return is not per se bad, but since nearly all imperative languages attach an (almost) equivalent meaning to it, most people assume that it works the same way in functional languages, but it does not.
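To see the wrapping concretely, here is a small sketch with two illustrative names (wrappedOnce and wrappedTwice are not from the question). Both use return at the list monad; only the first has the type the function signature asks for.

```haskell
-- In the list monad, return puts its argument into a singleton list.
wrappedOnce :: [String]
wrappedOnce = return ""      -- the same value as [""]

-- Applying return to a value that is already a list adds another layer.
wrappedTwice :: [[String]]
wrappedTwice = return [""]   -- the same value as [[""]], not a [String]
```

Giving wrappedTwice the type [String] instead is rejected by the type checker, which is the kind of mismatch behind the error in the question title.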
Mind that you use functions like head, tail, etc. These are usually seen as anti-patterns: you can use pattern matching instead. We can rewrite this to:
splitStringOnDelimeter :: String -> Char -> [String]
splitStringOnDelimeter "" delimeter = [""]
splitStringOnDelimeter (h:t) delimeter | h == delimeter = "" : split
                                       | otherwise = (h : sh) : st
    where split@(sh:st) = splitStringOnDelimeter t delimeter
By using pattern matching, we know for sure that the string has a head h and a tail t, and we can use these directly in the expression. This makes the expression shorter as well as more readable. Although if-then-else clauses are not per se anti-patterns, personally I think guards are syntactically cleaner. We thus use a where clause here in which we call splitStringOnDelimeter t delimeter, and we pattern match the result with split (as well as with (sh:st)). We know that this will always match, since both the base case and the inductive case always produce a list with at least one element. This again allows us to write a neat expression where we can use sh and st directly, instead of calling head and tail.
If I test this function locally, I got:
Prelude> splitStringOnDelimeter "foo!bar!!qux" '!'
["foo","bar","","qux"]
As a take-away message, I think you had better avoid using return and do unless you know what this function and keyword (do is a keyword) really mean. In the context of functional programming these have a different meaning.
return has type forall m a. Monad m => a -> m a.
The output type of the function splitStringOnDelimiter is [String], so if you try to write some output value using return, the compiler will infer that you want to provide some m a, thus instantiating m to [] (which is indeed an instance of the Monad typeclass), and a to String. It follows that the compiler will now expect some String to be used as argument of return. This expectation is violated in, for example, return ([""] ++ split), because here the argument of return, namely [""] ++ split has type [String] rather than String.
do is used as a convenient notation for monadic code, so you should rely on it only if you are interested in using the monadic operations of the output type. In this case, you really just want to manipulate lists using pure functions.
I'll add my 2 cents and suggest a solution. I used a foldr, that is a simple instance of a recursion scheme. Recursion schemes like foldr capture common patterns of computation; they make recursive definitions clear, easy to reason about, and total by construction.
I also took advantage of the fact that the output list is always non-empty, so I wrote it in the type. By being more precise about my intentions, I now know that split, the result of the recursive call, is a NonEmpty String, so I can use the total functions head and tail (from Data.List.NonEmpty), because a non-empty list has always a head and a tail.
import Data.List.NonEmpty as NE (NonEmpty(..), (<|), head, tail)
splitStringOnDelimeter :: String -> Char -> NonEmpty String
splitStringOnDelimeter string delimiter = foldr f (pure "") string
  where f h split = if h == delimiter
                      then ("" <| split)
                      else (h : NE.head split) :| NE.tail split

Split string on multiple delimiters of any length in Haskell

I am attempting a Haskell coding challenge where, given a certain string with a prefix indicating which substrings are delimiting markers, a list needs to be built from the input.
I have already solved the problem for multiple single-length delimiters, but I am stuck with the problem where the delimiters can be any length. I use splitOneOf from Data.List.Split, but this works for character (length 1) delimiters only.
For example, given
input ";,\n1;2,3,4;10",
delimiters are ';' and ','
splitting the input on the above delivers
output [1,2,3,4,10]
The problem I'm facing has two parts:
Firstly, a single delimiter of any length, e.g.
"****\n1****2****3****4****10" should result in the list [1,2,3,4,10].
Secondly, more than one delimiter can be specified, e.g.
input "[***][||]\n1***2||3||4***10",
delimiters are "***" and "||"
splitting the input on the above delivers
output [1,2,3,4,10]
My code for retrieving the delimiter in the case of character delimiters:
--This gives the delimiters as a list of characters, i.e. a String.
getDelimiter::String->[Char]
getDelimiter text = head . splitOn "\n" $ text
--drop "[delimiters]\n" from the input
body::String->String
body text = drop ((length . getDelimiter $ text) + 1) $ text
--returns tuple with fst being the delimiters, snd the body of the input
doc::String->(String,String)
doc text = (getDelimiter text, body text)
--given the delimiters and the body of the input, return a list of strings
numbers::(String,String)->[String]
numbers (delim, rest) = splitOneOf delim rest
--input ",##\n1,2#3#4" gives output ["1","2","3","4"]
getList::String->[String]
getList text = numbers . doc $ text
So my question is, how do I do the processing for when the delimiters are e.g. "***" and "||"?
Any hints are welcome, especially in a functional programming context.
If you don't mind making multiple passes over the input string, you can use splitOn from Data.List.Split, and gradually split the input string using one delimiter at a time.
You can write this fairly succinctly using foldl':
import Data.List
import Data.List.Split
splitOnAnyOf :: Eq a => [[a]] -> [a] -> [[a]]
splitOnAnyOf ds xs = foldl' (\ys d -> ys >>= splitOn d) [xs] ds
Here, the accumulator for the fold operation is a list of strings, or more generally [[a]], so you have to 'lift' xs into a list, using [xs].
Then you fold over the delimiters ds - not the input string to be parsed. For each delimiter d, you split the accumulated list of strings with splitOn, and concatenate them. You could also have used concatMap, but here I arbitrarily chose to use the more general >>= (bind) operator.
This seems to do what is required in the OP:
*Q49228467> splitOnAnyOf [";", ","] "1;2,3,4;10"
["1","2","3","4","10"]
*Q49228467> splitOnAnyOf ["***", "||"] "1***2||3||4***10"
["1","2","3","4","10"]
Since this makes multiple passes over temporary lists, it's most likely not the fastest implementation you can make, but if you don't have too many delimiters, or extremely long lists, this may be good enough.
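The remark above that >>= and concatMap are interchangeable for lists can be checked without the split package. This is only a sketch: it uses the Prelude's words as a stand-in for splitOn d, and the names viaBind and viaConcatMap are invented for the example.

```haskell
-- For lists, (>>=) applies the function to every element and concatenates
-- the resulting lists.
viaBind :: [String] -> [String]
viaBind ys = ys >>= words

-- concatMap does the same thing with its arguments in the other order.
viaConcatMap :: [String] -> [String]
viaConcatMap = concatMap words
```

Both functions turn ["1 2", "3"] into ["1","2","3"], which is the flattening step that each iteration of the fold performs.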
This problem has two kinds of solutions: the simple, and the efficient. I will not cover the efficient (because it is not simple), though I will hint on it.
But first, the part where you extract the delimiter and body parts of the input, may be simplified with Data.List.break:
delims = splitOn "/" . fst . break (== '\n') -- Presuming the delimiters are delimited with
                                             -- a slash.
body = drop 1 . snd . break (== '\n')        -- drop 1 skips the newline itself.
In any way, we may reduce this problem to finding the positions of all the given patterns in a given string. (By saying "string", I do not mean the haskell String. Rather, I mean an arbitrarily long sequence (or even an infinite stream) of any symbols for which an Equality relation is defined, which is typed in Haskell as Eq a => [a]. I hope this is not too confusing.) As soon as we have the positions, we may slice the string to our hearts' content. If we want to deal with an infinite stream, we must obtain the positions incrementally, and yield the results as we go, which is a restriction that must be kept in mind. Haskell is equipped well enough to handle the stream case as well as the finite string.
A simple approach is to cast isPrefixOf on the string, for each of the patterns.
If some of them matches, we replace it with a Nothing.
Otherwise we mark the first symbol as Just and move to the next position.
Thus, we will have replaced all the different delimiters by a single one: Nothing. We may then readily slice the string by it.
This is fairly idiomatic, and I will bring the code to your judgement shortly. The problem with this approach is that it is inefficient: in fact, if a pattern failed to match, we would rather advance by more than one symbol.
It would be more efficient to base our work on the research that has been done into finding patterns in a string; this problem is well known and there are great, intricate algorithms that solve it an order of magnitude faster. These algorithms are designed to work with a single pattern, so some work must be put into adapting them to our case; however, I believe they are adaptable. The simplest and oldest of such algorithms is KMP, and it is already encoded in Haskell. You may wish to take arms and generalize it: a quick path to some amount of fame.
Here is the code:
module SplitSubstr where
-- stackoverflow.com/questions/49228467
import Data.List (unfoldr, isPrefixOf, elemIndex)
import Data.List.Split (splitWhen) -- Package `split`.
import Data.Maybe (catMaybes, isNothing)
-- | Split a (possibly infinite) string at the occurrences of any of the given delimiters.
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] "la||la***fa"
-- ["la","la","fa"]
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] (cycle "la||la***fa||")
-- ["la","la","fa","la","la","fa","la","la","fa","la"]
--
splitOnSubstrs :: [String] -> String -> [String]
splitOnSubstrs delims
    = fmap catMaybes       -- At this point, there will be only `Just` elements left.
    . splitWhen isNothing  -- Now we may split at nothings.
    . unfoldr f            -- Replace the occurrences of delimiters with a `Nothing`.
  where
    -- | This is the base case. It will terminate the `unfoldr` process.
    f [ ] = Nothing
    -- | This is the recursive case. It is divided into 2 cases:
    -- * One of the delimiters may match. We will then replace it with a Nothing.
    -- * Otherwise, we will `Just` return the current element.
    --
    -- Notice that, if there are several patterns that match at this point, we will use the first one.
    -- You may sort the patterns by length to always match the longest or the shortest. If you desire
    -- more complicated behaviour, you must plug a more involved logic here. In any way, the index
    -- should point to one of the patterns that matched.
    --
    --                   vvvvvvvvvvvvvv
    f body@(x:xs) = case elemIndex True $ (`isPrefixOf` body) <$> delims of
        Just index -> return (Nothing, drop (length $ delims !! index) body)
        Nothing    -> return (Just x, xs)
It might happen that you will not find this code straightforward. Specifically, the unfoldr part is somewhat dense, so I will add a few words about it.
unfoldr f is an embodiment of a recursion scheme. f is a function that may chip a part from the body: f :: (body -> Maybe (chip, body)).
As long as it keeps chipping, unfoldr keeps applying it to the body. This is called the recursive case.
Once it fails (returning Nothing), unfoldr stops and hands you all the chips it has collected so far. This is called the base case.
In our case, f takes symbols from the string, and fails once the string is empty.
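For a smaller picture of the same scheme, here is a sketch unrelated to delimiters: a function (the name chunksOfThree and its helper chip are invented for this example) that chips three elements at a time off a list until nothing is left.

```haskell
import Data.List (unfoldr)

-- Cut a list into pieces of (at most) three elements.
chunksOfThree :: [a] -> [[a]]
chunksOfThree = unfoldr chip
  where
    -- Base case: nothing left to chip, so unfoldr stops.
    chip []   = Nothing
    -- Recursive case: hand back one chip and the remaining body.
    chip body = Just (splitAt 3 body)
```

For example, chunksOfThree "abcdefgh" gives ["abc","def","gh"]. Because unfoldr produces its output lazily, take 2 (chunksOfThree [1 ..]) also terminates.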
That's it. I hope you send me a postcard when you receive a Turing award for a fast splitting algorithm.

Haskell: Why ++ is not allowed in pattern matching?

Suppose we want to write our own sum function in Haskell:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
Why can't we do something like:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (xs++[x]) = x + sum' xs
In other words why can't we use ++ in pattern matching ?
This is a deserving question, and it has so far received sensible answers (mutter only constructors allowed, mutter injectivity, mutter ambiguity), but there's still time to change all that.
We can say what the rules are, but most of the explanations for why the rules are what they are start by over-generalising the question, addressing why we can't pattern match against any old function (mutter Prolog). This is to ignore the fact that ++ isn't any old function: it's a (spatially) linear plugging-stuff-together function, induced by the zipper-structure of lists. Pattern matching is about taking stuff apart, and indeed, notating the process in terms of the plugger-togetherers and pattern variables standing for the components. Its motivation is clarity. So I'd like
lookup :: Eq k => k -> [(k, v)] -> Maybe v
lookup k (_ ++ [(k, v)] ++ _) = Just v
lookup _ _ = Nothing
and not only because it would remind me of the fun I had thirty years ago when I implemented a functional language whose pattern matching offered exactly that.
The objection that it's ambiguous is a legitimate one, but not a dealbreaker. Plugger-togetherers like ++ offer only finitely many decompositions of finite input (and if you're working on infinite data, that's your own lookout), so what's involved is at worst search, rather than magic (inventing arbitrary inputs that arbitrary functions might have thrown away). Search calls for some means of prioritisation, but so do our ordered matching rules. Search can also result in failure, but so, again, can matching.
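The "finitely many decompositions" point can be made concrete in today's Haskell. The following is only a sketch of what such a search might desugar to, not the hypothetical pattern syntax itself; splits and lookupBySearch are names invented for the example.

```haskell
import Data.List (inits, tails)
import Data.Maybe (listToMaybe)

-- Every way of writing xs as ys ++ zs; a list of length n has n + 1 of them.
splits :: [a] -> [([a], [a])]
splits xs = zip (inits xs) (tails xs)

-- A search-based reading of the pattern (_ ++ [(k, v)] ++ _): try each
-- decomposition in order and keep the first one whose suffix starts with key k.
lookupBySearch :: Eq k => k -> [(k, v)] -> Maybe v
lookupBySearch k xs =
  listToMaybe [v | (_, (k', v) : _) <- splits xs, k' == k]
```

Trying the splits from left to right is one possible prioritisation, and it happens to agree with the usual lookup, which returns the first match.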
We have a sensible way to manage computations offering alternatives (failure and choice) via the Alternative abstraction, but we are not used to thinking of pattern matching as a form of such computation, which is why we exploit Alternative structure only in the expression language. The noble, if quixotic, exception is match-failure in do-notation, which calls the relevant fail rather than necessarily crashing out. Pattern matching is an attempt to compute an environment suitable for the evaluation of a 'right-hand side' expression; failure to compute such an environment is already handled, so why not choice?
(Edit: I should, of course, add that you only really need search if you have more than one stretchy thing in a pattern, so the proposed xs++[x] pattern shouldn't trigger any choices. Of course, it takes time to find the end of a list.)
Imagine there was some sort of funny bracket for writing Alternative computations, e.g., with (|) meaning empty, (|a1|a2|) meaning (|a1|) <|> (|a2|), and a regular old (|f s1 .. sn|) meaning pure f <*> s1 .. <*> sn. One might very well also imagine (|case a of {p1 -> a1; .. pn->an}|) performing a sensible translation of search-patterns (e.g. involving ++) in terms of Alternative combinators. We could write
lookup :: (Eq k, Alternative a) => k -> [(k, v)] -> a v
lookup k xs = (|case xs of _ ++ [(k, v)] ++ _ -> pure v|)
We may obtain a reasonable language of search-patterns for any datatype generated by fixpoints of differentiable functors: symbolic differentiation is exactly what turns tuples of structures into choices of possible substructures. Good old ++ is just the sublists-of-lists example (which is confusing, because a list-with-a-hole-for-a-sublist looks a lot like a list, but the same is not true for other datatypes).
Hilariously, with a spot of LinearTypes, we might even keep hold of holey data by their holes as well as their root, then plug away destructively in constant time. It's scandalous behaviour only if you don't notice you're doing it.
You can only pattern match on constructors, not on general functions.
Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function
uncons :: [a] -> Maybe (a, [a])
which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.
++ is not injective though, for example
Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]
Neither of these representations is the right one, so how should the list be deconstructed again?
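The correspondence between matching on : and calling uncons, mentioned above, can be written out as follows. This is a sketch; viaPattern and viaUncons are illustrative names, while uncons is the function from Data.List.

```haskell
import Data.List (uncons)

-- Matching on the constructors directly.
viaPattern :: [Int] -> String
viaPattern []      = "empty"
viaPattern (h : t) = "head " ++ show h ++ ", tail " ++ show t

-- The same function with the deconstruction step made explicit: uncons can
-- answer in only one way for any given list, which is what makes the match
-- unambiguous.
viaUncons :: [Int] -> String
viaUncons xs = case uncons xs of
  Nothing     -> "empty"
  Just (h, t) -> "head " ++ show h ++ ", tail " ++ show t
```

No function of that shape exists for ++, because a list such as [0,1,2] has several equally valid answers.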
What you can do however is define a new, “virtual” constructor that acts like : in that it always separates exactly one element from the rest of the list (if possible), but does so on the right:
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
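pattern (:>) :: [a] -> a -> [a]
pattern xs :> ω <- (unsnoc -> Just (xs, ω))
  where xs :> ω = xs ++ [ω]
-- Split off the last element, keeping everything before it.
unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just ([], x)
unsnoc (x:xs) = fmap (\(ys, y) -> (x : ys, y)) (unsnoc xs)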
Then
sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum' xs
sum' [] = 0
Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.
A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.
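A minimal sketch of the right-end version of sum' on a Seq, assuming the containers package is available (it ships with GHC); sumFromRight is a name chosen for this example.

```haskell
import Data.Sequence (Seq (..), fromList)

-- Peel elements off the right end of the sequence, one at a time.
sumFromRight :: Num a => Seq a -> a
sumFromRight Empty      = 0
sumFromRight (xs :|> x) = x + sumFromRight xs
```

With this definition, sumFromRight (fromList [1, 2, 3, 4]) evaluates to 10, and each match on :|> does not have to walk the whole structure the way the list-based :> synonym does.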
You can only pattern-match on data constructors, and ++ is a function, not a data constructor.
Data constructors are persistent; a value like 'c':[] cannot be simplified further, because it is a fundamental value of type [Char]. An expression like "c" ++ "d", however, can be replaced with its equivalent "cd" at any time, and thus couldn't reliably be counted on to be present for pattern matching.
(You might argue that "cd" could always be replaced by "c" ++ "d", but in general there isn't a one-to-one mapping between a list and a decomposition via ++. Is "cde" equivalent to "c" ++ "de" or "cd" ++ "e" for pattern matching purposes?)
++ isn't a constructor, it's just a plain function. You can only match on constructors.
You can use ViewPatterns or PatternSynonyms to augment your ability to pattern match (thanks @luqui).

Non-exhaustive pattern while iterating matrix

So I'm very new to haskell, and I'm not very sure how to deal with this error when iterating over a matrix. I'm guessing there's a case I'm not considering, but I can't figure out what it is. I've got two functions, one that turns a list into a string and another that turns a matrix into a string. These are my two functions:
listToString :: [Int] -> String
listToString [] = "\n"
listToString (x:xs) = show x ++ " " ++ listToString xs
matToString :: [[Int]] -> String
matToString [[]] = ""
matToString (y:x:xs) = listToString y ++ matToString (x:xs)
listToString works fine but matToString does not. I was wondering if someone could help me out with this. I've been having a hard time understanding Haskell, since I've never programmed in a functional programming language before, or well at least not one that is purely functional.
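Your recursive case covers every list with at least two elements, so that's cool. The problem is your base case: it only covers a list with exactly one element, which is itself the empty list. Nothing matches the empty list [] or a one-row matrix such as [[1, 2]], and the recursion on (x:xs) eventually reaches a one-row list.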
Add this to the very top of your file: {-# OPTIONS_GHC -Wall #-}. That should give you a detailed compiler warning indicating which pattern(s) is/are missing.
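One way to close the gap described above is to recurse on a single row at a time and use the empty list as the base case. This is a sketch under one assumption that differs from the question's code: an empty row still prints its newline, so matToString [[]] is "\n" rather than "".

```haskell
listToString :: [Int] -> String
listToString []       = "\n"
listToString (x : xs) = show x ++ " " ++ listToString xs

-- Two patterns that together cover every possible list of rows.
matToString :: [[Int]] -> String
matToString []           = ""
matToString (row : rows) = listToString row ++ matToString rows
```

With these definitions, matToString [[1, 2], [3]] produces "1 2 \n3 \n", and compiling with -Wall reports no missing patterns.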

First attempt at Haskell: Converting lower case letters to upper case

I have recently started learning Haskell, and I've tried creating a function in order to convert a lower case word to an upper case word, it works, but I don't know how good it is and I have some questions.
Code:
lowerToUpperImpl element list litereMari litereMici =
    do
        if not (null list) then
            if (head list) == element then
                ['A'..'Z'] !! (length ['A'..'Z'] - length (tail list ) -1)
            else
                lowerToUpperImpl element (tail list) litereMari litereMici
        else
            '0' --never to be reached
lowerToUpper element = lowerToUpperImpl element ['a'..'z'] ['A'..'Z'] ['a'..'z']
lowerToUpperWordImpl word =
    do
        if not (null word) then
            lowerToUpper (head (word)):(lowerToUpperWordImpl (tail word))
        else
            ""
I don't like the way I have passed the upper case and lower case letters; couldn't I just declare global variables or something?
What would your approach be in filling the dead else branch?
What would your suggestions on improving this be?
Firstly, if/else is generally seen as a crutch in functional programming languages, precisely because they aren't really supposed to be used as branch operations, but as functions. Also remember that lists don't know their own lengths in Haskell, and so calculating it is an O(n) step. This is particularly bad for infinite lists.
I would write it more like this (if I didn't import any libraries):
uppercase :: String -> String
uppercase = map (\c -> if c >= 'a' && c <= 'z' then toEnum (fromEnum c - 32) else c)
Let me explain. This code makes use of the Enum and Ord typeclasses that Char satisfies. fromEnum c translates c to its ASCII code and toEnum takes ASCII codes to their equivalent characters. The function I supply to map simply checks that the character is lowercase and subtracts 32 (the difference between 'A' and 'a') if it is, and leaves it alone otherwise.
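The arithmetic behind the constant 32 can be checked directly. In this sketch caseOffset and shiftUp are names made up for illustration, and shiftUp is only meaningful for the letters 'a' to 'z'.

```haskell
-- Distance between a lowercase letter and its uppercase counterpart.
caseOffset :: Int
caseOffset = fromEnum 'a' - fromEnum 'A'

-- Move a character down by that distance; the result type Char tells
-- toEnum which type to produce.
shiftUp :: Char -> Char
shiftUp c = toEnum (fromEnum c - caseOffset)
```

Here caseOffset evaluates to 32 (97 minus 65) and shiftUp 'q' gives 'Q'.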
Of course, you could always just write:
import Data.Char
uppercase :: String -> String
uppercase = map toUpper
Hope this helps!
The things I always recommend to people in your circumstances are these:
Break the problem down into smaller pieces, and write separate functions for each piece.
Use library functions wherever you can to solve the smaller subproblems.
As an exercise after you're done, figure out how to write on your own the library functions you used.
In this case, we can apply the points as follows. First, since String in Haskell is a synonym for [Char] (list of Char), we can break your problem into these two pieces:
Turn a character into its uppercase counterpart.
Transform a list by applying a function separately to each of its members.
Second point: as Alex's answer points out, the Data.Char standard library module comes with a function toUpper that performs the first task, and the Prelude library comes with map which performs the second. So using those two together solves your problem immediately (and this is exactly the code Alex wrote earlier):
import Data.Char
uppercase :: String -> String
uppercase = map toUpper
But I'd say that this is the best solution (shortest and clearest), and as a beginner, this is the first answer you should try.
Applying my third point: after you've come up with the standard solution, it is enormously educational to try and write your own versions of the library functions you used. The point is that this way you learn three things:
How to break down problems into easier, smaller pieces, preferably reusable ones;
The contents of the standard libraries of the language;
How to write the simple "foundation" functions that the library provides.
So in this case, you can try writing your own versions of toUpper and map. I'll provide a skeleton for map:
map :: (a -> b) -> [a] -> [b]
map f [] = ???
map f (x:xs) = ???
