Haskell - Echoing all characters besides spaces


I am supposed to take a [Char] as input and return a [Char] in which every character except spaces is doubled.
I can double every character, including the spaces, but I cannot figure out how to exclude the spaces.
echo :: [Char] -> [Char]
echo x = concatMap (replicate 2) x
This will take "Hello World" and output "HHeelloo  WWoorrlldd" (2 spaces)
but I want it to output "HHeelloo WWoorrlldd" (1 space)
Any ideas would be helpful!
Edit: Thanks for all the helpful ideas! I have been able to figure out how to properly implement this!

Well, so you've observed that replicate 2 doesn't quite do what you want, because it duplicates spaces when you don't want it to. So let's write a new function that checks if it's a space before deciding what to do, hey? You can use pattern matching to check if your input Char is a space, like this:
notReplicate2 :: Char -> [Char]
notReplicate2 ' ' = {- exercise -}
notReplicate2 anythingElse = {- exercise -}
Or, if you want to handle things like newlines, tabs, vertical tabs, etc. similarly to a single space character, you could put some meat on this skeleton instead:
import Data.Char
notReplicate2 :: Char -> [Char]
notReplicate2 c | isSpace c = {- exercise -}
                | otherwise = {- exercise -}
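For readers who want to check their own solution against something, here is one possible way to fill in the exercise. It is only a sketch: it assumes that just the literal ' ' character should be left undoubled, and the helper name doubleUnlessSpace is made up for this illustration.

```haskell
-- One possible completion of the exercise (an illustrative sketch).
-- Assumption: only the literal space character ' ' is exempt from doubling.
doubleUnlessSpace :: Char -> [Char]
doubleUnlessSpace ' ' = " "      -- a space is emitted once
doubleUnlessSpace c   = [c, c]   -- every other character is emitted twice

echo :: [Char] -> [Char]
echo = concatMap doubleUnlessSpace
```

With this definition, echo "Hello World" evaluates to "HHeelloo WWoorrlldd" with a single space. Swapping the first clause for an isSpace guard would treat tabs and newlines the same way.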

Related

Foldl-like operator for Parsec

Suppose I have a function of this type:
once :: (a, b) -> Parser (a, b)
Now, I would like to repeatedly apply this parser (somewhat like using >>=) and use its last output to feed it in the next iteration.
Using something like
sequence :: (a, b) -> Parser (a, b)
sequence inp = once inp >>= sequence
with specifying the initial values for the first parser doesn't work, because it would go on until it inevitably fails. Instead, I would like it to stop when it would fail (somewhat like many).
Trying to fix it using try makes the computation too complex (adding try in each iteration).
sequence :: (a, b) -> Parser (a, b)
sequence inp = try (once inp >>= sequence) <|> pure inp
In other words, I am looking for a function somewhat similar to foldl on Parsers, which stops when the next Parser would fail.

If your once parser fails immediately without consuming input, you don't need try. As a concrete example, consider a rather silly once parser that uses a pair of delimiters to parse the next pair of delimiters:
once :: (Char, Char) -> Parser (Char, Char)
once (c1, c2) = (,) <$ char c1 <*> anyChar <*> anyChar <* char c2
You can parse a nested sequence using:
onces :: (Char, Char) -> Parser (Char, Char)
onces inp = (once inp >>= onces) <|> pure inp
which works fine:
> parseTest (onces ('(',')')) "([])[{}]{xy}xabyDONE"
('a','b')
You only need try if your once might fail after parsing input. For example, the following won't parse without try:
> parseTest (onces ('(',')')) "([])[not valid]"
parse error at (line 1, column 8):
unexpected "t"
expecting "]"
because we start parsing the opening delimiter [ before discovering not valid].
(With try, it returns the correct ('[',']').)
All that being said, I have no idea how you came to the conclusion that using try makes the computation "too complex". If you are just guessing from something you've read about try being potentially inefficient, then you've misunderstood. try can cause problems if it's used in a manner that can result in a big cascade of backtracking. That's not a problem here -- at most, you're backtracking a single once, so don't worry about it.
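The pattern above can be factored into a small reusable combinator. The following is a sketch, assuming the parsec package; the name iterateP is made up here and is not part of Parsec. It wraps only the single step in try, so at most one application of the step is ever backtracked.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper (not part of Parsec): keep feeding the result of `step`
-- back into `step` until it fails, then return the last successful value.
-- `try` undoes any input consumed by the failing final step.
iterateP :: (a -> Parser a) -> a -> Parser a
iterateP step x = (try (step x) >>= iterateP step) <|> pure x

-- The example parser from the answer above.
once :: (Char, Char) -> Parser (Char, Char)
once (c1, c2) = (,) <$ char c1 <*> anyChar <*> anyChar <* char c2
```

For instance, parse (iterateP once ('(',')')) "" "([])[not valid]" yields Right ('[',']'), because the failed second step is rolled back instead of aborting the whole parse.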

String to [Char] in haskell

In Haskell, I got this string "bbbbffff", and I want to get a list in this form:
['b','b','b','b','f','f','f','f']
I think that I can use map, but honestly I don't know how to begin, and there are a lot of things that I do not understand in Haskell.
Thanks in advance.

By default, a String is already a [Char] (see specification):
A string is a list of characters:
type String = [Char]
They simply don't print as ['b','b','b',...] because [Char] and String are the same type, and are therefore indistinguishable and must be shown the same way. Indeed, if you input your list you'll see it formatted as a string:
Prelude> 42
42
Prelude> [1,2,3]
[1,2,3]
Prelude> ['b','b','b','b','f','f','f','f']
"bbbbffff"
This means that you can immediately pass it to any list function, without having to do anything with it:
myLength :: [Char] -> Int
myLength (c:rest) = 1 + myLength rest
myLength [] = 0
-- Prints 11
main = print (myLength "hello world")
There are other textual data types, such as Text from the Data.Text module, but these are for more advanced use and must be explicitly imported and used.
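As a further small illustration (the function names below are made up for this sketch), ordinary list functions such as map and filter can be applied to a string literal with no conversion step:

```haskell
import Data.Char (toUpper)

-- A String literal and an explicit list of Chars are the same value.
sameValue :: Bool
sameValue = "bbbbffff" == ['b','b','b','b','f','f','f','f']

-- map works on a String directly, because a String is a [Char].
shout :: String -> String
shout = map toUpper

-- So does filter; the result is again a [Char], i.e. a String.
onlyBs :: String -> [Char]
onlyBs = filter (== 'b')
```

Here sameValue is True, shout "bbbbffff" is "BBBBFFFF", and onlyBs "bbbbffff" is "bbbb".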

How to generate strings drawn from every possible character?

At the moment I'm generating strings like this:
arbStr :: Gen String
arbStr = listOf $ elements (alpha ++ digits)
  where alpha = ['a'..'z']
        digits = ['0'..'9']
But obviously this only generates strings from alphanumeric chars. How can I generate strings from all possible chars?

Char is an instance of both the Enum and Bounded typeclasses, so you can make use of the arbitraryBoundedEnum :: (Bounded a, Enum a) => Gen a function:
import Test.QuickCheck(Gen, arbitraryBoundedEnum, listOf)
arbStr :: Gen String
arbStr = listOf arbitraryBoundedEnum
For example:
Prelude Test.QuickCheck> sample arbStr
""
""
"\821749"
"\433465\930384\375110\256215\894544"
"\431263\866378\313505\1069229\238290\882442"
""
"\126116\518750\861881\340014\42369\89768\1017349\590547\331782\974313\582098"
"\426281"
"\799929\592960\724287\1032975\364929\721969\560296\994687\762805\1070924\537634\492995\1079045\1079821"
"\496024\32639\969438\322614\332989\512797\447233\655608\278184\590725\102710\925060\74864\854859\312624\1087010\12444\251595"
"\682370\1089979\391815"
Or you can make use of the arbitrary in the Arbitrary Char typeclass:
import Test.QuickCheck(Gen, arbitrary, listOf)
arbStr :: Gen String
arbStr = listOf arbitrary
Note that the arbitrary for Char is implemented such that ASCII characters are (three times) more common than non-ASCII characters, so the "distribution" is different.

Since Char is an instance of Bounded as well as Enum (confirm this by asking GHCI for :i Char), you can simply write
[minBound..maxBound] :: [Char]
to get a list of all legal characters. Obviously this will not lead to efficient random access, though! So you could instead convert the bounds to Int with Data.Char.ord :: Char -> Int, and use QuickCheck's feature to select from a range of integers, then map back to a character with Data.Char.chr :: Int -> Char.
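A minimal sketch of that idea, assuming QuickCheck's choose as the "select from a range of integers" feature; the names arbChar and sampleOne are made up for this illustration:

```haskell
import Data.Char (chr, ord)
import Test.QuickCheck (Gen, choose, generate, listOf)

-- Pick a code point uniformly from the whole Char range, then map it back.
-- Note: this also produces unassigned and surrogate code points.
arbChar :: Gen Char
arbChar = chr <$> choose (ord minBound, ord maxBound)

arbStr :: Gen String
arbStr = listOf arbChar

-- Draw one random string in IO, e.g. for experimenting in GHCi.
sampleOne :: IO String
sampleOne = generate arbStr
```

Unlike indexing into the full list of characters, this never materialises the list of 1114112 characters; each character costs one random integer.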

When we evaluate
λ> length ([minBound..maxBound] :: [Char])
1114112
we get the total number of characters, which is a lot. If you think the list is too big, you can always apply something like drop x . take y to limit the range.
Accordingly, if you need n random characters, just shuffle the list with some shuffle :: [a] -> IO [a] and take n from the shuffled list.
Edit:
Well, of course, since shuffling could be expensive, it's best to choose a cleverer strategy. Ideally we would randomly limit the list of all characters. So:
1. Define limits = liftM sort . mapM randomRIO $ replicate 2 (0,1114112) :: (Ord a, Random a, Num a) => IO [a]
2. limits >>= \[min,max] -> return . drop min . take max $ ([minBound..maxBound] :: [Char])
3. Finally, take n of those characters, e.g. with liftM (take n), from the result of step 2.

Haskell code prints out a list for ints but not for chars

My code currently looks like this. It is supposed to show the possible first symbols in the regular expression definition given to us beforehand. I am supposed to print these out as a list. For example, if the answer is supposed to be [1,2], it will come out [1,2] but when the answer is supposed to be ['1','2'] it will come out "12" or when it is supposed to be ['a', 'b'] it will come out "ab". What am I doing wrong?
data RE a            -- regular expressions over an alphabet defined by 'a'
  = Empty            -- empty regular expression
  | Sym a            -- match the given symbol
  | RE a :+: RE a    -- concatenation of two regular expressions
  | RE a :|: RE a    -- choice between two regular expressions
  | Rep (RE a)       -- zero or more repetitions of a regular expression
  | Rep1 (RE a)      -- one or more repetitions of a regular expression
  deriving (Show)
firstMatches :: RE a -> [a]
firstMatches Empty = []
firstMatches (Sym a) = [a]
firstMatches (Rep a) = firstMatches a
firstMatches (Rep1 a) = firstMatches a
firstMatches (Empty :+: b)= firstMatches b
firstMatches (a :+: _) = firstMatches a
firstMatches (a :|: b)= firstMatches a ++ firstMatches b

You're not doing anything wrong.
String is a type synonym for [Char], so if you try to print a [Char] it will print as a String. This is somewhat of a special case, and it can be a little weird.
Show is the typeclass used to print things as a string. The definition of Show is something like this:
class Show a where
  showsPrec :: Int -> a -> ShowS
  show :: a -> String
  showList :: [a] -> ShowS
The showList function is optional. The documentation states:
The method showList is provided to allow the programmer to give a specialised way of showing lists of values. For example, this is used by the predefined Show instance of the Char type, where values of type String should be shown in double quotes, rather than between square brackets.
So if you define a new type and instantiate Show, you can optionally define a special way to show a list of your type, separate from the way it's normally shown and separate from the way lists are normally shown. Char takes advantage of this, in that a [Char] (or equivalently, a String), is shown with double-quotes instead of as a list of Char values.
I can't think of a way to get it to use the default show for a [Char]. I don't think there is one. A workaround might be to create a newtype wrapping Char with its own Show that uses the default showList implementation, but that doesn't seem appropriate here.
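For completeness, the newtype workaround could look roughly like this. It is a sketch only; the names PlainChar and showAsList are made up for this illustration:

```haskell
-- Hypothetical wrapper around Char; the name PlainChar is made up for this sketch.
newtype PlainChar = PlainChar Char

-- Show a PlainChar exactly like the Char it wraps, but do NOT override
-- showList, so lists fall back to the default bracket-and-comma rendering.
instance Show PlainChar where
  showsPrec d (PlainChar c) = showsPrec d c

-- Render a [Char] the way a list of any other type would be rendered.
showAsList :: [Char] -> String
showAsList = show . map PlainChar
```

With this, showAsList "ab" produces the text ['a','b'], whereas show "ab" produces "ab" in double quotes.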
If this is homework, I'd expect the grader to know about this already, and I seriously doubt you'd get marked down for it, especially since the problem doesn't appear to be about show at all.

Split string on multiple delimiters of any length in Haskell

I am attempting a Haskell coding challenge where, given a certain string with a prefix indicating which substrings are delimiting markers, a list needs to be built from the input.
I have already solved the problem for multiple single-length delimiters, but I am stuck with the problem where the delimiters can be any length. I use splitOneOf from Data.List.Split, but this works for character (length 1) delimiters only.
For example, given
input ";,\n1;2,3,4;10",
delimiters are ';' and ','
splitting the input on the above delivers
output [1,2,3,4,10]
The problem I'm facing has two parts:
Firstly, a single delimiter of any length, e.g.
"****\n1****2****3****4****10" should result in the list [1,2,3,4,10].
Secondly, more than one delimiter can be specified, e.g.
input "[***][||]\n1***2||3||4***10",
delimiters are "***" and "||"
splitting the input on the above delivers
output [1,2,3,4,10]
My code for retrieving the delimiter in the case of character delimiters:
--This gives the delimiters as a list of characters, i.e. a String.
getDelimiter::String->[Char]
getDelimiter text = head . splitOn "\n" $ text
--drop "[delimiters]\n" from the input
body::String->String
body text = drop ((length . getDelimiter $ text) + 1) text
--returns tuple with fst being the delimiters, snd the body of the input
doc::String->(String,String)
doc text = (getDelimiter text, body text)
--given the delimiters and the body of the input, return a list of strings
numbers::(String,String)->[String]
numbers (delim, rest) = splitOneOf delim rest
--input ",##\n1,2#3#4" gives output ["1","2","3","4"]
getList::String->[String]
getList text = numbers . doc $ text
So my question is, how do I do the processing for when the delimiters are e.g. "***" and "||"?
Any hints are welcome, especially in a functional programming context.

If you don't mind making multiple passes over the input string, you can use splitOn from Data.List.Split, and gradually split the input string using one delimiter at a time.
You can write this fairly succinctly using foldl':
import Data.List
import Data.List.Split
splitOnAnyOf :: Eq a => [[a]] -> [a] -> [[a]]
splitOnAnyOf ds xs = foldl' (\ys d -> ys >>= splitOn d) [xs] ds
Here, the accumulator for the fold operation is a list of strings, or more generally [[a]], so you have to 'lift' xs into a list, using [xs].
Then you fold over the delimiters ds - not the input string to be parsed. For each delimiter d, you split the accumulated list of strings with splitOn, and concatenate them. You could also have used concatMap, but here I arbitrarily chose to use the more general >>= (bind) operator.
This seems to do what is required in the OP:
*Q49228467> splitOnAnyOf [";", ","] "1;2,3,4;10"
["1","2","3","4","10"]
*Q49228467> splitOnAnyOf ["***", "||"] "1***2||3||4***10"
["1","2","3","4","10"]
Since this makes multiple passes over temporary lists, it's most likely not the fastest implementation you can make, but if you don't have too many delimiters, or extremely long lists, this may be good enough.

This problem has two kinds of solutions: the simple and the efficient. I will not cover the efficient one (because it is not simple), though I will hint at it.
But first, the part where you extract the delimiter and body parts of the input may be simplified with Data.List.break:
delims = splitOn "/" . fst . break (== '\n')  -- Presuming the delimiters are delimited with a slash.
body = drop 1 . snd . break (== '\n')         -- drop 1 discards the newline that break leaves in front.
In any case, we may reduce this problem to finding the positions of all the given patterns in a given string. (By "string", I do not mean the Haskell String. Rather, I mean an arbitrarily long sequence (or even an infinite stream) of any symbols for which an equality relation is defined, which is typed in Haskell as Eq a => [a]. I hope this is not too confusing.) As soon as we have the positions, we may slice the string to our hearts' content. If we want to deal with an infinite stream, we must obtain the positions incrementally and yield the results as we go, which is a restriction that must be kept in mind. Haskell is equipped well enough to handle the stream case as well as the finite string.
A simple approach is to apply isPrefixOf to the string at each position, once for each of the patterns.
If one of them matches, we replace it with a Nothing.
Otherwise we wrap the first symbol in Just and move to the next position.
Thus, we will have replaced all the different delimiters by a single one: Nothing. We may then readily slice the string by it.
This is fairly idiomatic, and I will bring the code to your judgement shortly. The problem with this approach is that it is inefficient: in fact, if a pattern failed to match, we would rather advance by more than one symbol.
It would be more efficient to base our work on the research that has been done on finding patterns in a string; this problem is well known and there are great, intricate algorithms that solve it an order of magnitude faster. These algorithms are designed to work with a single pattern, so some work must be put into adapting them to our case; however, I believe they are adaptable. The simplest and oldest of such algorithms is KMP (Knuth-Morris-Pratt), and it is already encoded in Haskell. You may wish to take up arms and generalize it, a quick path to some amount of fame.
Here is the code:
module SplitSubstr where
-- stackoverflow.com/questions/49228467
import Data.List (unfoldr, isPrefixOf, elemIndex)
import Data.List.Split (splitWhen) -- Package `split`.
import Data.Maybe (catMaybes, isNothing)
-- | Split a (possibly infinite) string at the occurrences of any of the given delimiters.
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] "la||la***fa"
-- ["la","la","fa"]
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] (cycle "la||la***fa||")
-- ["la","la","fa","la","la","fa","la","la","fa","la"]
--
splitOnSubstrs :: [String] -> String -> [String]
splitOnSubstrs delims
    = fmap catMaybes       -- At this point, there will be only `Just` elements left.
    . splitWhen isNothing  -- Now we may split at nothings.
    . unfoldr f            -- Replace the occurrences of delimiters with a `Nothing`.
  where
    -- | This is the base case. It will terminate the `unfoldr` process.
    f [ ] = Nothing
    -- | This is the recursive case. It is divided into 2 cases:
    -- * One of the delimiters may match. We will then replace it with a Nothing.
    -- * Otherwise, we will `Just` return the current element.
    --
    -- Notice that, if there are several patterns that match at this point, we will use the first one.
    -- You may sort the patterns by length to always match the longest or the shortest. If you desire
    -- more complicated behaviour, you must plug a more involved logic here. In any way, the index
    -- should point to one of the patterns that matched.
    --
    --                   vvvvvvvvvvvvvv
    f body@(x:xs) = case elemIndex True $ (`isPrefixOf` body) <$> delims of
        Just index -> return (Nothing, drop (length $ delims !! index) body)
        Nothing    -> return (Just x, xs)
It might happen that you will not find this code straightforward. Specifically, the unfoldr part is somewhat dense, so I will add a few words about it.
unfoldr f is an embodiment of a recursion scheme. f is a function that may chip a part off the body: f :: (body -> Maybe (chip, body)).
As long as it keeps chipping, unfoldr keeps applying it to the body. This is called the recursive case.
Once it fails (returning Nothing), unfoldr stops and hands you all the chips it has collected. This is called the base case.
In our case, f takes symbols from the string, and fails once the string is empty.
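To see the chip-and-body idea in isolation, here is a much smaller use of unfoldr. It is only an illustrative sketch; the name chunksOfN is made up, and it assumes n > 0 (with n <= 0 the unfolding would never terminate on a non-empty list).

```haskell
import Data.List (unfoldr)

-- Chip fixed-size pieces off a list until nothing is left.
-- Assumption: n > 0.
chunksOfN :: Int -> [a] -> [[a]]
chunksOfN n = unfoldr chip
  where
    chip [] = Nothing               -- base case: nothing left, stop unfolding
    chip xs = Just (splitAt n xs)   -- recursive case: (chip, remaining body)
```

For example, chunksOfN 2 "abcde" gives ["ab","cd","e"]: each call to chip returns one piece together with the rest of the input, and the empty list ends the process.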
That's it. I hope you send me a postcard when you receive a Turing award for a fast splitting algorithm.
