Haskell: Escaped character from character - haskell

I'm writing a parsec parser which reads in strings and converts escaped characters, as part of exercise 3 here.
For that exercise I am using this function:
escapedCharFromChar :: Char -> Char
escapedCharFromChar c = read $ concat ["'\\",[c],"'"]
I am not too impressed with the use of read to convert the character x into the escape character with the name x. Can anyone suggest a more elegant function of type Char -> Char to do this?

One way is to lay out the cases exhaustively:
charFromEscape :: Char -> Char
charFromEscape 'n' = '\n'
charFromEscape 't' = '\t'
--- ... --- Help!
You could also use lookup:
-- this import goes at the top of your source file
import Data.Maybe (fromJust)
charFromEscape :: Char -> Char
charFromEscape c = fromJust $ lookup c escapes
  where escapes = [('n', '\n'), ('t', '\t')] -- and so on
The fromJust bit may look strange. The type of lookup is
lookup :: (Eq a) => a -> [(a, b)] -> Maybe b
which means for a value of some type over which equality is defined and a lookup table, it wants to give you the corresponding value from the lookup table—but your key isn't guaranteed to be present in the table! That's the purpose of Maybe, whose definition is
data Maybe a = Just a | Nothing
With fromJust, it assumes you got Just something (i.e., c has an entry in escapes), but this will fall apart when that assumption is invalid:
ghci> charFromEscape 'r'
*** Exception: Maybe.fromJust: Nothing
These examples will move you along in the exercise, but it's clear that you'd like better error handling. Also, if you expect the lookup table to be large, you may want to look at Data.Map.
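As a minimal sketch of the "better error handling" mentioned above, the lookup result can be returned as a Maybe instead of being unwrapped with fromJust. The names safeCharFromEscape and escapeTable are hypothetical, and the table only lists a few escapes for illustration:

```haskell
-- Hypothetical table of the escapes we choose to support (not exhaustive).
escapeTable :: [(Char, Char)]
escapeTable = [('n', '\n'), ('t', '\t'), ('r', '\r'), ('\\', '\\'), ('"', '"')]

-- Returns Nothing for an unknown escape name instead of throwing.
safeCharFromEscape :: Char -> Maybe Char
safeCharFromEscape c = lookup c escapeTable
```

The caller then decides what an unknown escape means, for example by failing the parse. With a large table, the same shape should carry over to Data.Map by building the table with fromList and using its lookup.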

read (or rather, Text.Read.Lex.lexCharE) is how you get at GHC's internal table, which is defined as:
lexEscChar =
  do c <- get
     case c of
       'a' -> return '\a'
       'b' -> return '\b'
       'f' -> return '\f'
       'n' -> return '\n'
       'r' -> return '\r'
       't' -> return '\t'
       'v' -> return '\v'
       '\\' -> return '\\'
       '\"' -> return '\"'
       '\'' -> return '\''
       _ -> pfail
Eventually, you have to define the semantics somewhere. You can do it in your program, or you can reuse GHC's.
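If you do want to reuse GHC's table without building a quoted literal for read, one option is readLitChar from Data.Char, which reads a single (possibly escaped) character from the front of a string. The following is a sketch under that assumption; the Maybe return type is a choice made here, not part of the original function:

```haskell
import Data.Char (readLitChar)

-- Build the two-character string "\c" and let readLitChar interpret it.
-- Only a parse that consumes the whole input counts as success.
escapedCharFromChar :: Char -> Maybe Char
escapedCharFromChar c =
  case readLitChar ['\\', c] of
    [(result, "")] -> Just result
    _ -> Nothing
```

Note that this accepts whatever single-character escapes Haskell itself accepts, which may be more (or fewer) than the language being parsed allows.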

I just used pattern matching for the few escapes I cared about, i.e. 't' -> '\t' etc. The solutions other readers suggested were similar. Not very generic, but very straightforward.

You should consider implementing a proper Parser Char function instead of a Char -> Char.
(Or, if you're doing that anyway, consider using a Char -> Maybe Char instead.)
The Char -> Char approach only works if escape sequences consist of only a backslash and a single other character. Some languages have more complex character escapes that consist of a longer sequence of characters. For example, C++ supports multi-character escape sequences such as \u005C (which represents the unicode code point U+005C).
parseEscapeSequence :: Parser Char
parseEscapeSequence = do
  -- consume one character (get is ReadP's primitive; Parsec's equivalent is anyChar)
  c <- get
  case c of
    '\\' -> return '\\'
    '0' -> return '\0'
    't' -> return '\t'
    'f' -> return '\f'
    'r' -> return '\r'
    'n' -> return '\n'
    -- ...
    'u' -> parseUnicodeEscape4
    'U' -> parseUnicodeEscape8
    -- ...
    _ -> fail "Unrecognised escape sequence"
Whereby parseUnicodeEscape4 and parseUnicodeEscape8 would each parse a fixed number of hexadecimal digits and convert them into a unicode character, likely by first converting the digits into integers in the 0..15 range, then combining those 'nibbles' into a larger integer, and then converting that integer into a unicode character.
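The digit-combining step described above could be sketched as follows. This version assumes ReadP from base as the parser type (the snippet above uses get, which is a ReadP primitive); hexValue and parseHexEscape are hypothetical helper names, and rejecting values above U+10FFFF is a choice made here:

```haskell
import Data.Char (chr, digitToInt, isHexDigit)
import Text.ParserCombinators.ReadP (ReadP, count, pfail, readP_to_S, satisfy)

-- Combine hex digits (most significant first) into one integer:
-- each step shifts the accumulator by one nibble and adds the next digit.
hexValue :: String -> Int
hexValue = foldl (\acc d -> acc * 16 + digitToInt d) 0

-- Parse exactly n hexadecimal digits and convert them to a Char.
-- Values beyond the Unicode range make the parser fail instead of crashing chr.
parseHexEscape :: Int -> ReadP Char
parseHexEscape n = do
  digits <- count n (satisfy isHexDigit)
  let code = hexValue digits
  if code <= 0x10FFFF then return (chr code) else pfail

parseUnicodeEscape4, parseUnicodeEscape8 :: ReadP Char
parseUnicodeEscape4 = parseHexEscape 4
parseUnicodeEscape8 = parseHexEscape 8

-- Small runner for experimenting: first result of a parse, if any.
runEscape :: ReadP Char -> String -> Maybe Char
runEscape p input =
  case readP_to_S p input of
    ((c, _) : _) -> Just c
    [] -> Nothing
```

For example, runEscape parseUnicodeEscape4 "005C" gives Just '\\'. In a Parsec-based parser the same idea applies with Parsec's count and hexDigit in place of the ReadP combinators.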
You could alternatively offload the simple escape sequences to another function that does pattern matching, but that function should ideally have a type of Char -> Maybe Char, to allow for proper error reporting.
parseEscapeSequence :: Parser Char
parseEscapeSequence = do
  c <- get
  case c of
    -- ...
    'u' -> parseUnicodeEscape4
    'U' -> parseUnicodeEscape8
    -- ...
    _ ->
      case maybeCharFromEscape c of
        Just result -> return result
        Nothing -> fail "Unrecognised escape sequence"
maybeCharFromEscape :: Char -> Maybe Char
maybeCharFromEscape c =
  case c of
    '\\' -> Just '\\'
    '0' -> Just '\0'
    't' -> Just '\t'
    'f' -> Just '\f'
    'r' -> Just '\r'
    'n' -> Just '\n'
    _ -> Nothing
For the maybeCharFromEscape function you could alternatively implement it via lookup or a Map (as other answers point out), but even then you're still going to have to explicitly write down all the possibilities and the result might be less efficient (though you may find it more readable).

Related

Haskell function incompatible type and definition

I'm trying to understand this function (taken from here)
escape :: String -> String
escape =
  let
    escapeChar c =
      case c of
        '<' -> "&lt;"
        '>' -> "&gt;"
        _ -> [c]
  in
    concat . map escapeChar
My questions are:
According to the type, escape is a function that takes a String. But it seems that in the function definition it does not receive any argument. How does this work?
What is the relationship between escapeChar and c? How is that relationship established? Does c coming right after escapeChar have a meaning?
Would it be easier if escapeChar were a top-level definition using pattern matching:
escape :: String -> String
escape = concatMap escapeChar
escapeChar :: Char -> String
escapeChar '<' = "&lt;"
escapeChar '>' = "&gt;"
escapeChar ch = [ch]
[ch] is a singleton list: it turns ch :: Char into [ch] :: String.
In Haskell you can remove/add an argument from/to each side (eta conversion). escape is the eta reduced form of
escape :: String -> String
escape str = concatMap escapeChar str
Just like, if you wanted to define a synonym for (+) you have equivalent ways of writing it. I feel like the add = (+) is clearest, you are identifying the two functions. The arguments are the same on both sides so we don't specify them.
add :: Int -> Int -> Int
add = (+)
add a = (+) a
add a = (a +)
add a b = (+) a b
add a b = a + b
These are equivalent ways of writing escape:
escape = concat . map escapeChar
escape str = concat (map escapeChar str)
According to the type, escape is a function that takes a String. But it seems that in the function definition it does not receive any argument. How does this work?
concat . map escapeChar is itself a function. That function will take a string and process it.
What is the relationship between escapeChar and c? How is that relationship established? Does c coming right after escapeChar have a meaning?
Yes, it is the first (and only) parameter of the function. It is a Char, and the escapeChar function maps that Char to a String. The let clause thus defines a function escapeChar :: Char -> String that will then be used in concat . map escapeChar (or perhaps better concatMap escapeChar). This will map each Char of the given String to a substring, and these are then concatenated together as a result.
The map function has the signature (a -> b) -> ([a] -> [b]). This means that map takes in a function (here escapeChar) and returns a function that converts a list by applying that function to each element. So map escapeChar is a function that applies escapeChar to each character in a string, producing a list of strings.
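To see that the argument-free and argument-taking forms really are the same function, here is a small sketch that defines both and can be compared in GHCi. The names escapePointFree and escapeExplicit are hypothetical, and the "&lt;" and "&gt;" replacements are an assumption about what the original escape function produces:

```haskell
-- Maps one character to its (assumed) HTML-escaped form.
escapeChar :: Char -> String
escapeChar c =
  case c of
    '<' -> "&lt;"
    '>' -> "&gt;"
    _ -> [c]

-- No argument named: the right-hand side is already a String -> String function.
escapePointFree :: String -> String
escapePointFree = concat . map escapeChar

-- Argument named explicitly: same behaviour, written out step by step.
escapeExplicit :: String -> String
escapeExplicit str = concat (map escapeChar str)
```

Evaluating escapePointFree "a<b" goes through map escapeChar "a<b", which is ["a", "&lt;", "b"], and concat then joins those pieces into "a&lt;b".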

Haskell - Echoing all characters besides spaces

I am supposed to take an input of [Char] and output [Char], doubling every character of the input except the spaces.
I can double each character including the spaces but can not figure out how to exclude the spaces.
echo :: [Char] -> [Char]
echo x = concatMap (replicate 2) x
This will take "Hello World" and output "HHeelloo  WWoorrlldd" (2 spaces)
but I want it to output "HHeelloo WWoorrlldd" (1 space)
Any ideas would be helpful!
Edit: Thanks for all the helpful ideas! I have been able to figure out how to properly implement this!
Well, so you've observed that replicate 2 doesn't quite do what you want, because it duplicates spaces when you don't want it to. So let's write a new function that checks if it's a space before deciding what to do, hey? You can use pattern matching to check if your input Char is a space, like this:
notReplicate2 :: Char -> [Char]
notReplicate2 ' ' = {- exercise -}
notReplicate2 anythingElse = {- exercise -}
Or, if you want to handle things like newlines, tabs, vertical tabs, etc. similarly to a single space character, you could put some meat on this skeleton instead:
import Data.Char
notReplicate2 :: Char -> [Char]
notReplicate2 c | isSpace c = {- exercise -}
                | otherwise = {- exercise -}

Haskell: Parsing String to Custom Type

I have declared a type:
type Foo = (Char, Char, Char)
And want to be able to parse a 3 letter string "ABC" to produce an output Foo with each of ABC as the three attributes of the type.
My current attempt is:
parseFoo :: String → Maybe Foo
parseFoo str = f where
  f (a, _, _) = str[0]
  f (_, b, _) = str[1]
  f (_, _, c) = str[2]
This is returning an error:
Illegal operator ‘→’ in type ‘String → Maybe Foo’
Use TypeOperators to allow operators in types
My question is:
How do I prevent this error on compilation?
Am I even on the right track?
If I understand it the correct way, you want to store the first three characters of a string into a type Foo (which is an alias for a 3-tuple that contains three Chars).
The signature seems correct (it is good practice to return a Maybe if something can go wrong, and here it is possible that the string contains fewer than three characters). A problem however is that you write an arrow character → whereas signatures in Haskell use -> (two ASCII characters, a dash and a greater-than symbol).
So we can define the signature as:
parseFoo :: String -> Maybe Foo
Now the second problem is that you here define a function f that maps Foos to Strings, so the reverse. You also make use of a syntax that is frequently used for indexing in languages of the C/C++/C#/Java programming language family, but indexing in Haskell is done with the (!!) operator, and since you define the function in reverse, it will not help.
A string is a list of Chars, so:
type String = [Char]
We can thus define two patterns:
a list with three (or more) characters; and
a list with fewer than three characters.
For the former, we return a 3-tuple with these characters (wrapped in a Just), for the latter we return Nothing:
parseFoo :: String -> Maybe Foo
parseFoo (a:b:c:_) = Just (a, b, c)
parseFoo _ = Nothing
Or if we do not want to parse strings with more than three characters successfully:
parseFoo :: String -> Maybe Foo
parseFoo [a, b, c] = Just (a, b, c)
parseFoo _ = Nothing
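To compare the two variants side by side, here is a small sketch that gives them distinct (hypothetical) names, parseFooPrefix and parseFooExact, so both can live in one file:

```haskell
type Foo = (Char, Char, Char)

-- Accepts any string of at least three characters and ignores the rest.
parseFooPrefix :: String -> Maybe Foo
parseFooPrefix (a : b : c : _) = Just (a, b, c)
parseFooPrefix _ = Nothing

-- Accepts only strings of exactly three characters.
parseFooExact :: String -> Maybe Foo
parseFooExact [a, b, c] = Just (a, b, c)
parseFooExact _ = Nothing
```

Both return Just ('A','B','C') for "ABC" and Nothing for "AB". They differ on longer input: for "ABCD" the prefix version still succeeds, while the exact version returns Nothing.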

Why do I have to compose id with mapM

I have the following methods:
nucleotideComplement :: Char -> Either String Char
nucleotideComplement 'G' = Right 'C'
nucleotideComplement 'C' = Right 'G'
nucleotideComplement 'T' = Right 'A'
nucleotideComplement 'A' = Right 'U'
nucleotideComplement x = Left "Not a valid RNA nucleotide."
And would like to define another:
toRNA :: String -> String
toRNA = either error mapM nucleotideComplement
However I'm getting a type error here. However doing it this way seems to fix the issue:
toRNA :: String -> String
toRNA = either error id . mapM nucleotideComplement
I don't understand why this happens
First, id has the type a -> a. Next, when getting the type (:t) of mapM nucleotideComplement and id . mapM nucleotideComplement, they seem to be the same. Why am I getting such a different effect?
Hope someone could clarify this further.
I think you're reading this wrong...
either error id . mapM nucleotideComplement
You seem to think this means
either error (id . mapM nucleotideComplement)
when in fact it means
(either error id) . (mapM nucleotideComplement)
You aren't doing id . mapM nucleotideComplement anywhere. You're applying mapM and then passing the result to either, which will apply error or id depending on whether it sees Left or Right.
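One way to make the grouping visible is to name the two halves of the composition. In this sketch, complementAll and unwrapOrError are hypothetical names introduced only for illustration; nucleotideComplement is copied from the question:

```haskell
nucleotideComplement :: Char -> Either String Char
nucleotideComplement 'G' = Right 'C'
nucleotideComplement 'C' = Right 'G'
nucleotideComplement 'T' = Right 'A'
nucleotideComplement 'A' = Right 'U'
nucleotideComplement _ = Left "Not a valid RNA nucleotide."

-- Right half of the composition: complement every character,
-- stopping at the first Left.
complementAll :: String -> Either String String
complementAll = mapM nucleotideComplement

-- Left half: turn a Left into a runtime error, pass a Right through unchanged.
unwrapOrError :: Either String String -> String
unwrapOrError = either error id

-- Same as: either error id . mapM nucleotideComplement
toRNA :: String -> String
toRNA = unwrapOrError . complementAll
```

Here id is the second argument of either, the handler for the Right case. It is never composed with mapM.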
The type of either is (a -> c) -> (b -> c) -> Either a b -> c. So you apply it to error and you get (b -> c) -> Either String b -> c, then you apply that to mapM and you get Monad m => Either String (a -> m b) -> [a] -> m [b]. Then you apply that to nucleotideComplement and you get an error because nucleotideComplement is a function and not an Either.
In other words, you apply either to three arguments when you intended to call it with two arguments, where the second argument was the result of applying mapM to nucleotideComplement. To call the function with the arguments you intended, you can write either error (mapM nucleotideComplement), but that still won't work because the second argument to either should be a function accepting a Char (because you have an Either String Char), not one accepting a monad. To achieve what you wanted you can either write either error nucleotideComplement or use . as you already found out.
The version with . works because function application binds tighter than any operator in Haskell, so either error id . mapM nucleotideComplement is equivalent to (either error id) . (mapM nucleotideComplement), not (either error id . mapM) nucleotideComplement or either error (id . mapM nucleotideComplement). either error id is a function that turns an Either String b into a b, calling error in the Left case, and mapM nucleotideComplement is a function that turns a String into an Either String String: a Right with every character complemented, or the first Left it encounters. So by composing these two functions, you get a function that turns a String into a String, with the Right case yielding the complemented string and the Left case causing an error.
Of course either error flipNucleotide is the far simpler solution.
This doesn't exactly get at your type error, but I'd like to suggest that you reconsider your representation. Specifically, it's generally best to use types to enforce invariants, avoiding partial functions that can throw errors or exceptions, and avoiding accidentally mixing up related things that may belong to different parts of the code. There are various ways to approach this, but here's one. This approach pretends that DNA and RNA have completely different kinds of nucleotides. Chemically, this is not true, but it's probably a sensible representation for what you're doing. Actually encoding the chemical reality is probably beyond the abilities of Haskell's type system, and probably actually less useful for catching mistakes in this context.
data DNANucleotide = GD | CD | TD | AD
data RNANucleotide = GR | CR | UR | AR
toStringDNA :: [DNANucleotide] -> String
toStringDNA = map (\nucleotide -> case nucleotide of
  {GD -> 'G'; CD -> 'C'; TD -> 'T'; AD -> 'A'})
toStringRNA = ...
fromCharDNA :: Char -> Maybe DNANucleotide
fromCharDNA 'G' = Just GD
fromCharDNA 'C' = Just CD
...
fromCharDNA _ = Nothing
fromCharRNA = ...
fromStringDNA :: String -> Maybe [DNANucleotide]
fromStringDNA = mapM fromCharDNA
fromStringRNA :: String -> Maybe [RNANucleotide]
fromStringRNA = mapM fromCharRNA
Once you get into the actual mechanics of working with DNA and RNA, as opposed to reading them in from strings, there can be no more errors:
transcribeN :: DNANucleotide -> RNANucleotide
transcribeN GD = CR
transcribeN CD = GR
transcribeN TD = AR
transcribeN AD = UR
transcribe :: [DNANucleotide] -> [RNANucleotide]
transcribe = map transcribeN
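Putting the pieces together, a self-contained sketch of the reading and transcription steps might look like this. The two fromCharDNA cases that the listing above elides are filled in here as an assumption, the deriving clauses are added only so results can be compared and printed, and transcribeString is a hypothetical helper name:

```haskell
data DNANucleotide = GD | CD | TD | AD deriving (Eq, Show)
data RNANucleotide = GR | CR | UR | AR deriving (Eq, Show)

-- The only place where invalid input can occur.
fromCharDNA :: Char -> Maybe DNANucleotide
fromCharDNA 'G' = Just GD
fromCharDNA 'C' = Just CD
fromCharDNA 'T' = Just TD -- assumed: elided in the original
fromCharDNA 'A' = Just AD -- assumed: elided in the original
fromCharDNA _ = Nothing

-- Total function: every DNA nucleotide has exactly one RNA counterpart.
transcribeN :: DNANucleotide -> RNANucleotide
transcribeN GD = CR
transcribeN CD = GR
transcribeN TD = AR
transcribeN AD = UR

transcribe :: [DNANucleotide] -> [RNANucleotide]
transcribe = map transcribeN

-- Parse first (may fail), then transcribe (cannot fail).
transcribeString :: String -> Maybe [RNANucleotide]
transcribeString = fmap transcribe . mapM fromCharDNA
```

The failure case is confined to the parsing step. Once a [DNANucleotide] exists, transcribe needs no error handling at all.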

Convert String to Tuple, Special Formatting in Haskell

For a test app, I'm trying to convert a special type of string to a tuple. The string is always in the following format, with an int (n>=1) followed by a character.
Examples of Input String:
"2s"
"13f"
"1b"
Examples of Desired Output Tuples (Int, Char):
(2, 's')
(13, 'f')
(1, 'b')
Any pointers would be extremely appreciated. Thanks.
You can use reads to parse the int and get the rest of the string:
readTup :: String -> (Int, Char)
readTup s = (n, head rest)
  where [(n, rest)] = reads s
A safer version would be:
maybeReadTup :: String -> Maybe (Int, Char)
maybeReadTup s = do
  [(n, [c])] <- return $ reads s
  return (n, c)
Here's one way to do it:
import Data.Maybe (listToMaybe)
parseTuple :: String -> Maybe (Int, Char)
parseTuple s = do
  (int, (char:_)) <- listToMaybe $ reads s
  return (int, char)
This uses the Maybe monad to express the possible parse failure. Note that if the (char:_) pattern fails to match (i.e., if there is only a number with no character after it), this gets translated into a Nothing result. This is due to how do notation works in Haskell: it calls the monad's fail function if a pattern match fails, and in the case of Maybe a we have fail _ = Nothing. The function also evaluates to Nothing if reads can't read an Int at the beginning of the input. If this happens, reads gives [] which is then turned into Nothing by listToMaybe.
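For experimenting with the failure cases described above, here is a self-contained copy of the function with its import, which can be loaded into GHCi as-is:

```haskell
import Data.Maybe (listToMaybe)

-- Parse a leading Int followed by at least one character.
-- Any failure (no number, or nothing after the number) yields Nothing.
parseTuple :: String -> Maybe (Int, Char)
parseTuple s = do
  (int, char : _) <- listToMaybe (reads s)
  return (int, char)
```

With this definition, parseTuple "13f" gives Just (13,'f'), while both parseTuple "7" (no trailing character) and parseTuple "x1" (no leading number) give Nothing.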
