How to deal with a file-ending '\' in strings in Haskell

import Data.Char (isAlpha)
import Data.List (elemIndex)
import Data.Maybe (fromJust)

helper = ['a'..'z'] ++ ['a'..'z'] ++ ['A'..'Z'] ++ ['A'..'Z']

rotate :: Char -> Char
rotate x | '\' = '\'
         | isAlpha(x) = helper !! (fromJust (elemIndex x helper) + 13)
         | otherwise = x

rot13 :: String -> String
rot13 "" = ""
rot13 s = map rotate s

main = do
    print $ rot13( "Hey fellow warriors" )
    print $ rot13( "This is a test")
    print $ rot13( "This is another test" )
    print $ rot13("\604099\159558\705559&\546452\390142")
    n <- getLine
    print $ rot13( show n)
This is my code for ROT13, and I get an error when I try to pass the file ending '\' directly:
rot13.hs:8:15: error:
lexical error in string/character literal at character ' '
|
8 | rotate x | '\' = '\'
There is also an error even if I skip the replacement and just use isAlpha to filter.
How can I deal with this?

As in many languages, backslash is the escape character. It's used to introduce characters that are hard or impossible to include in strings in other ways. For example, strings can't span multiple lines*, so it's impossible to include a literal newline in a string literal; and double-quotes end the string, so it's normally impossible to include a double quote in a string literal. The \n and \" escapes, respectively, cover those:
> putStrLn "before\nmiddle\"after"
before
middle"after
>
Since \ introduces escape codes, it always expects to be followed by something. If you want a literal backslash to be included at that spot, you can use a second backslash. For example:
> putStrLn "before\\after"
before\after
>
The Haskell Report, Section 2.6, is the final word on what escapes are available and what they mean.
Character literals have a similar (though not quite identical) collection of escapes to strings. So the fix to your syntax looks like this:
rotate x | '\\' = '\\'
This will let your code parse, though there are further errors to fix once you get past that.
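For reference, here is one way the whole function could look once the remaining errors are dealt with. It is a sketch rather than a line-by-line repair of the original: it assumes only ASCII letters should rotate, and it replaces the helper lookup table with arithmetic on character codes. No special case for the backslash is needed, because the otherwise branch already leaves non-letters alone.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, ord)

-- Rotate one ASCII letter by 13 places; leave every other character alone.
rotate :: Char -> Char
rotate c
  | isAsciiLower c = shiftFrom 'a'
  | isAsciiUpper c = shiftFrom 'A'
  | otherwise      = c  -- covers '\\', digits, spaces and non-ASCII characters
  where
    -- Offset of c from the start of its alphabet, moved by 13 and wrapped at 26.
    shiftFrom base = chr (ord base + (ord c - ord base + 13) `mod` 26)

rot13 :: String -> String
rot13 = map rotate
```

In GHCi, rot13 "Hey fellow warriors" gives "Url sryybj jneevbef", and applying rot13 twice returns the original string. When reading input with getLine, pass the line straight to rot13; show n would wrap it in quotes and add escape sequences first.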
* Yes, yes, string gaps. I know. Doesn't actually change the point, since the newline in the gap isn't included in the resulting string.

Related

Haskell - Splitting a string by delimiter

I am trying to write a program in Haskell to split a string by delimiter.
I have studied different examples provided by other users. One example is the code posted below.
split :: String -> [String]
split [] = [""]
split (c:cs)
    | c == ',' = "" : rest
    | otherwise = (c : head rest) : tail rest
  where
    rest = split cs
Sample Input: "1,2,3".
Sample Output: ["1","2","3"].
I have been trying to modify the code so that the output would be something like ["1", "," , "2", "," , "3"] which includes the delimiter in the output as well , but I just cannot succeed.
For example, I changed the line:
| c == ',' = "" : rest
into:
| c == ',' = "," : rest
But the result becomes ["1,","2,","3"].
What is the problem, and which part have I misunderstood?
If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break function from Data.List. The following expression:
break (==',') str
breaks the string into a tuple (a,b) where the first part consists of the initial "comma-free" part, and the second part is either more string starting with the comma or else empty if there's no more string.
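A quick illustration of what break returns in the cases that matter here. The name examples is made up for this sketch; the values in the comments follow from break's definition in the Prelude.

```haskell
-- break p xs splits xs just before the first element that satisfies p.
examples :: [(String, String)]
examples =
  [ break (== ',') "1,2,3"  -- ("1", ",2,3")
  , break (== ',') "abc"    -- ("abc", "")
  , break (== ',') ",x"     -- ("", ",x")
  , break (== ',') ""       -- ("", "")
  ]
```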
This makes the definition of split clear and straightforward:
split str = case break (==',') str of
    (a, ',':b) -> a : split b
    (a, "") -> [a]
You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.
This version has the added benefit that the modification to include the delimiter is also easy to understand:
split2 str = case break (==',') str of
    (a, ',':b) -> a : "," : split2 b
    (a, "") -> [a]
Note that I've written the patterns in these functions in more detail than is necessary to make it absolutely clear what's going on, and this also means that Haskell does a duplicate check on each comma. For this reason, some people might prefer:
split str = case break (==',') str of
    (a, _:b) -> a : split b
    (a, _) -> [a]
or, if they still wanted to document exactly what they were expecting in each case branch:
split str = case break (==',') str of
    (a, _comma:b) -> a : split b
    (a, _empty) -> [a]
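The same idea extends to any delimiter by passing it as an argument. The sketch below does that; the names splitOn' and splitKeep are hypothetical, chosen here to avoid clashing with library functions, and are not part of the answer above.

```haskell
-- Split on an arbitrary delimiter, dropping the delimiter.
splitOn' :: Char -> String -> [String]
splitOn' d str = case break (== d) str of
  (a, _ : b) -> a : splitOn' d b
  (a, [])    -> [a]

-- Same, but keep each delimiter as its own one-character string.
splitKeep :: Char -> String -> [String]
splitKeep d str = case break (== d) str of
  (a, _ : b) -> a : [d] : splitKeep d b
  (a, [])    -> [a]
```

With these, splitOn' ',' "1,2,3" gives ["1","2","3"] and splitKeep ',' "1,2,3" gives ["1",",","2",",","3"]. Empty cells are preserved, so splitOn' ',' "a,,b" gives ["a","","b"].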
Instead of altering code in the hope that it matches the expectations, it is usually better to understand the code fragment first.
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs
First, let's analyze what split does. The first clause simply says: "The split of an empty string is a list with one element, the empty string." This seems reasonable. The first guard of the second clause states: "If the head of the string is a comma, we produce a list whose first element is an empty string, followed by the split of the rest of the string." The last guard says: "If the first character of the string is not a comma, we prepend that character to the first item of the split of the remaining string, followed by the remaining elements of that split." Mind that split returns a list of strings, so head rest is a string.
So if we want to add the delimiter to the output, we need to add it as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, because the recursion prepends characters to the head of the result, so the comma would end up glued to the end of the preceding cell. Instead, we keep the empty head and add the comma as a separate string after it. So the result is:
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs
That example code is poor style. Never use head and tail unless you know exactly what you're doing (these functions are unsafe, partial functions). Also, equality comparisons are usually better written as dedicated patterns.
With that in mind, the example becomes:
split :: String -> [String]
split "" = [""]
split (',':cs) = "" : split cs
split (c:cs) = (c:cellCompletion) : otherCells
  where cellCompletion : otherCells = split cs
(Strictly speaking, this is still unsafe because the match cellCompletion:otherCells is non-exhaustive, but at least it happens in a well-defined place which will give a clear error message if anything goes wrong.)
Now IMO, this makes it quite a bit clearer what's actually going on here: with "" : split cs, the intent is not really to add an empty cell to the result. Rather, it is to add a cell which will be filled up by calls further up in the recursion stack. This happens because those calls deconstruct the deeper result again, with the pattern match cellCompletion : otherCells = split cs, i.e. they pop off the first cell again and prepend the actual cell contents.
So, if you change that to "," : split cs, the effect is just that all cells you build will already be pre-terminated with a , character. That's not what you want.
Instead you want to add an additional cell that won't be touched anymore. That needs to be deeper in the result then:
split (',':cs) = "" : "," : split cs
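Putting the pieces together, a complete version might look like the sketch below. It swaps the partial pattern in the where clause for a case expression so that every branch is covered; the name splitKeepCommas is made up for this example.

```haskell
-- Character-by-character split that keeps each comma as its own cell.
splitKeepCommas :: String -> [String]
splitKeepCommas ""         = [""]
splitKeepCommas (',' : cs) = "" : "," : splitKeepCommas cs
splitKeepCommas (c : cs)   =
  case splitKeepCommas cs of
    cell : cells -> (c : cell) : cells  -- prepend c to the cell being filled
    []           -> [[c]]               -- unreachable: the result is never empty
```

splitKeepCommas "1,2,3" gives ["1",",","2",",","3"]. The "," cell is never touched by the outer calls, because each of them only prepends to the first cell of the list it receives.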

Backslash in string changing output

I am currently trying to implement a method that counts the number of characters and digits in a string. However if I use a string that contains the '\' character I am getting strange results. I am guessing it's because the backslash character is an escape character.
Here is the method:
import Data.Char
countLettersAndDigits :: String -> Int
countLettersAndDigits [] = 0
countLettersAndDigits (x:xs) = if isDigit x == True || isLetter x == True
                                   then 1 + countLettersAndDigits xs
                                   else countLettersAndDigits xs
Here is a set of inputs with their respective results:
"1234fd" -> 6 (Doesn't contain '\')
"1234f\d" -> lexical error in string/character literal at character 'd'
"1234\fd" -> 5
"123\4fd" -> 5
"12\34fd" -> 4
"1\234fd" -> 4
"\1234fd" -> 3
I find it strange that, for example, "1234\fd" and "123\4fd" both give 5 as a result.
Any help explaining why this may be the case, and how to get around this problem, would be great!
Cheers.
Edit
I forgot to mention that the string that I used above was just an example I was playing with. The actual string that is causing a problem is being generated by QuickCheck. The string was "\178". So I need a way to handle this case in my code when there is only one backslash and the string is being generated for me. Cheers.
You are correct that \ is Haskell's escape character. If you print out the generated strings, the answer may be more obvious:
main = mapM_ putStrLn [ "1234fd"
                      , "1234\fd"
                      , "123\4fd"
                      , "12\34fd"
                      , "1\234fd"
                      , "\1234fd"
                      ]
yields...
1234fd
1234d
123fd
12"fd
1êfd
Ӓfd
If you actually intended on including a backslash character in your string, you need to double it up: "\\" will result in a single \ being printed.
You can read up on escape sequences here.
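On the QuickCheck case from the question's edit: a string displayed as "\178" does not contain a backslash at all. The Show instance for strings prints non-ASCII characters as numeric escapes, so "\178" is a single character with code point 178. The sketch below checks this and also rewrites the counter with filter; countLettersAndDigits' and escapeFacts are names made up for this example.

```haskell
import Data.Char (isDigit, isLetter, ord)

-- Same counting rule as in the question, written with filter.
countLettersAndDigits' :: String -> Int
countLettersAndDigits' = length . filter (\c -> isDigit c || isLetter c)

-- Facts about escapes; every entry evaluates to True.
escapeFacts :: [Bool]
escapeFacts =
  [ length "\178" == 1          -- one character, written as a numeric escape
  , map ord "\178" == [178]     -- its code point is 178
  , length "\\" == 1            -- an escaped backslash is a single character
  , show "\178" == "\"\\178\""  -- show re-escapes it when printing
  , countLettersAndDigits' "1234\\fd" == 6  -- a real backslash is skipped, nothing else
  ]
```

So there is nothing to special-case for generated strings: the function already sees the character itself, and the backslash only appears when the value is shown.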

Split a string while keeping delimiters Haskell

Basically I'm trying to split a String into [[String]] and then concat the results back but keeping the delimiters in the resultant list (even repeating in a row).
Something like the below kind of works, but the delimiter gets crunched into one space instead of retaining all three spaces
unwords . map (\x -> "|" ++ x ++"|") . words $ "foo   bar"
-- "|foo| |bar|"
Ideally I could get something like:
"|foo|| ||bar|" -- or
"|foo|   |bar|"
I just can't figure out how to preserve the delimiter, all the split functions I've seen have dropped the delimiters from the resulting lists, I can write one myself but it seems like something that would be in a standardish library and at this point I'm looking to learn more than the basics which includes getting familiar with more colloquial ways of doing things.
I think I'm looking for some function like:
splitWithDelim :: Char -> String -> [String]
splitWithDelim ' ' "foo   bar" -- ["foo", " ", " ", " ", "bar"]
or maybe it's best to use regexes here?
You can split a list while keeping the delimiters using the keepDelimsL and keepDelimsR functions in the Data.List.Split module (from the split package), like here:
split (keepDelimsL $ oneOf "xyz") "aazbxyzcxd" == ["aa","zb","x","y","zc","xd"]
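If pulling in a package is not an option, the function sketched in the question can also be written with base only. This is a minimal sketch, not the split package's implementation: it assumes a single-character delimiter and drops the empty chunks between adjacent delimiters. The helper barred is a hypothetical name for the wrapping step from the question.

```haskell
-- Split on a delimiter, keeping every occurrence as its own element
-- and dropping the empty chunks between adjacent delimiters.
splitWithDelim :: Char -> String -> [String]
splitWithDelim d s =
  case break (== d) s of
    (chunk, [])       -> nonEmpty chunk
    (chunk, _ : rest) -> nonEmpty chunk ++ ([d] : splitWithDelim d rest)
  where
    -- A one-element list for a non-empty chunk, an empty list otherwise.
    nonEmpty chunk = [chunk | not (null chunk)]

-- Wrap every piece in bars, as in the question.
barred :: String -> String
barred = concatMap (\x -> "|" ++ x ++ "|") . splitWithDelim ' '
```

For an input with three spaces between foo and bar, splitWithDelim ' ' returns the five pieces listed in the question, and barred wraps each of them, spaces included, in its own pair of bars.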

Standard ML string to a list

Is there a way in ML to take in a string and output a list of the strings it contains, where the separator is a space, newline or EOF, while keeping quoted strings intact?
EX) hello world "my id" is 5555
-> [hello, world, my id, is, 5555]
I am working on then tokenizing these into:
->[word, word, string, word, int]
Sure you can! Here's the idea:
If we take a string like "Hello World, \"my id\" is 5555", we can split it at the quote marks, ignoring the spaces for now. This gives us ["Hello World, ", "my id", " is 5555"]. The important thing to notice here is that the list contains three elements - an odd number. As long as the string only contains pairs of quotes (as it will if it's properly formatted), we'll always get an odd number of elements when we split at the quote marks.
A second important thing is that all the even-numbered elements of the list will be strings that were unquoted (if we start counting from 0), and the odd-numbered ones were quoted. That means that all we need to do is tokenize the ones that were unquoted, and then we're done!
I put some code together as a starting point. It splits at the quotes with String.fields rather than String.tokens, because fields keeps empty pieces, so the even/odd positions stay reliable even when the string starts or ends with a quote:
fun foo s =
    let
        val quoteSep = String.fields (fn c => c = #"\"") s
        val spaceSep = String.tokens Char.isSpace (* splits at spaces, tabs and newlines *)
        fun sepEven [] = []
          | sepEven [x] = spaceSep x (* last piece, or no quotes at all: unquoted *)
          | sepEven (x::y::xs) = spaceSep x @ (y :: sepEven xs) (* x was unquoted, y was quoted *)
    in
        if length quoteSep mod 2 = 0
        then raise Fail "unbalanced quote marks" (* there was an uneven number of quote marks *)
        else sepEven quoteSep
    end
String.tokens brings you halfway there. But if you really want to handle quotes like you are sketching then there is no way around writing an actual lexer. MLlex, which comes with SML/NJ and MLton (but is usable with any SML) could help. Or you just write it by hand, which should be easy enough in this case as well.

Lexical analysis of string token using Parsec

I have this parser for string parsing using Haskell Parsec library.
myStringLiteral = lexeme (
    do str <- between (char '\'')
                      (char '\'' <?> "end of string")
                      (many stringChar)
       ; return (U.replace "''" "'" (foldr (maybe id (:)) "" str))
    <?> "literal string"
    )
Strings in my language are defined as alphanumeric characters inside '' (example: 'this is my string'), but these strings can also contain ' (in this case ' must be escaped by another ', e.g. 'this is my string with '' inside of it').
What I need to do is look ahead when ' appears while parsing a string and decide whether another ' follows or not (if not, return end of string). But I don't know how to do it. Any ideas? Thanks!
If the syntax is as simple as it seems, you can make a special case for the escaped single quote,
escapeOrStringChar :: Parser Char
escapeOrStringChar = try (string "''" >> return '\'') <|> stringChar
and use that in
myStringLiteral = lexeme $ do
    char '\''
    str <- many escapeOrStringChar
    char '\'' <?> "end of string"
    return str
You can use stringLiteral for that.
By default, Parsec parses LL(1) grammars: unless a parser is wrapped in try, it looks only one symbol ahead. Telling '' apart from a closing ' needs two symbols of lookahead. You can write your own FSM for parsing your language, or you can transform the text before parsing to make it LL(1).
In fact, Parsec is designed for syntactic analysis, not lexical analysis. A good approach is to do the lexical analysis with another tool and then use Parsec to parse the sequence of lexemes instead of the sequence of chars.
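A hand-written scanner for this one token is short. The sketch below is plain Haskell without Parsec and is only an illustration of the two-character lookahead, done here by pattern matching on the next two characters. It assumes any character may appear inside the literal (the question restricts it to alphanumerics), and lexStringLiteral is a made-up name.

```haskell
import Data.Bifunctor (first)

-- Read one single-quoted literal from the front of the input.
-- Returns the decoded contents and the unconsumed rest, or Nothing
-- if the input does not start with a well-formed literal.
lexStringLiteral :: String -> Maybe (String, String)
lexStringLiteral ('\'' : input) = go input
  where
    go ('\'' : '\'' : cs) = first ('\'' :) <$> go cs  -- '' is an escaped quote
    go ('\'' : cs)        = Just ("", cs)             -- a lone ' ends the literal
    go (c : cs)           = first (c :) <$> go cs     -- ordinary character
    go []                 = Nothing                   -- input ended before the closing '
lexStringLiteral _ = Nothing
```

For the input 'this is ''quoted'' text' followed by more text, the result holds the decoded string this is 'quoted' text together with whatever came after the closing quote, which the next stage can then consume.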
