How are apostrophes/character literals parsed in Haskell? - haskell

I'm trying to write something that reads lambda expressions and outputs a beta-reduced version. Lambdas will be typed as \variable -> expression, and applications will be of the form (expression) (expression). So if '\' is found at the beginning of the string, the program knows to process a lambda, and if '(' is found, it knows to process an application.
I have a type for Lambda Expressions defined:
data Expression = Variable String
                | Lambda Expression Expression
                | Apply Expression Expression
Here's my first attempt at writing a function for reading the input
processInput :: String -> Expression
processInput ('\':input) = processLambda input
processInput ('(':input) = processApply input
processInput str = Variable str
When I try to load this function I get
lexical error in string/character literal at ':'
So I tried using guards instead:
processInput input
    | head input == '\' = processLambda (tail input)
    | head input == '(' = processApply (tail input)
    | otherwise = Variable input
But got
lexical error in string/character literal at character ' '
I have no idea what's wrong with either of these functions.

The backslash is a special character in string and character literals. You use it to represent non-printable characters, line breaks, and characters that would otherwise have special meaning in a literal. For example, '\n' is a line break, '\b' is a backspace, and '\'' is a single quote (without the \, the second ' would be seen as the end of the character literal).
So when you write '\', the lexer sees the start of a character literal, followed by an escaped '. Now it expects another ' to close the character literal, but gets a colon instead, causing the error.
To represent a backslash as a character literal, you escape the backslash with another backslash like this: '\\'.
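To make the point concrete, here is a minimal sketch showing that each escaped literal is a single Char. The names codePoints and startsWithBackslash are made up for this illustration and are not part of the question's code.

```haskell
import Data.Char (ord)

-- Each of these is exactly one Char; the extra backslash exists only in
-- the source text, not in the value.
backslash, newline, singleQuote :: Char
backslash   = '\\'
newline     = '\n'
singleQuote = '\''

-- Code points of the three characters above.
codePoints :: [Int]
codePoints = map ord [backslash, newline, singleQuote]

-- Hypothetical helper: does the input start with a single backslash,
-- as a lambda written like \x -> x would?
startsWithBackslash :: String -> Bool
startsWithBackslash ('\\':_) = True
startsWithBackslash _        = False
```

Loaded in GHCi, codePoints should evaluate to [92,10,39], and startsWithBackslash "\\x -> x" to True (the string literal needs the same doubling as the character literal).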

The backslash is the escape character so it needs to be doubled up to represent a single backslash: '\\'.
processInput ('\\':input) = processLambda input
...
-- or...
processInput input
    | head input == '\\' = processLambda (tail input)
    ...

Related

Haskell - how to pattern match on backslash character?

I want to replace \n with a space in a String with a recursive function using pattern matching, but I can't figure out how to match the \ char.
This is my function:
replace :: String -> String
replace ('\\':'n':xs) = ' ' : replace xs
replace (x:xs) = x : replace xs
replace "" = ""
In ('\':'n':xs) the backslash would escape the single quote and mess up the code, so I wrote ('\\':'n':xs) expecting that the first \ would escape the escape of the second \ and would match a backslash in a String. However, it doesn't.
This is what happens when I try the function in GHCi:
*Example> replace "m\nop"
"m\nop"
*Example> replace "m\\nop"
"m op"
How can I match a single backslash?
\n is a single character. If we use \n in a string like "Hello\nWorld!", then the resulting list looks like this: ['H','e','l','l','o','\n','W','o','r','l','d','!']. \n denotes a newline character, a single ASCII byte 10. However, since a newline isn't really easy to type in many programming languages, the escape sequence \n is used instead in string literals.
If you want to pattern match on a newline, you must use the whole escape sequence:
replace :: String -> String
replace ('\n':xs) = ' ' : replace xs
replace (x:xs) = x : replace xs
replace "" = ""
Otherwise, you will only match the literal \.
Exercise: Now that replace works, try to use map instead of explicit recursion.
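One possible take on the exercise, as a sketch only: since a newline is a single character, the replacement can be done element by element. The name replaceNewlines is chosen here for illustration.

```haskell
-- Replace every newline character with a space, one Char at a time.
replaceNewlines :: String -> String
replaceNewlines = map toSpace
  where
    toSpace '\n' = ' '   -- the single newline character
    toSpace c    = c     -- everything else is left alone
```

Note that "m\nop" has four characters while "m\\nop" has five, which is why only the first one contains anything for this function to replace.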

gsubbing a string with a pattern containing a newline character in Lua

Does string.gsub recognize the newline character in a string literal? I have a scenario in which I am trying to gsub a portion of a string indicated by a given operator from the start of the operator to the newline like so:
local function removeComments(str, operator)
local new_Sc = (str):gsub(operator..".*\n", "");
return new_Sc;
end
local source = [[
int hi = 123; //a basic comment
char ok = "abc"; //another comment
]];
source = removeComments(source, "//");
print(source);
however in the output I see that it removed the rest of the string literal after the first comment:
int hi = 123;
I tried using the literal newline character by using string.char(10) like so (str):gsub(operator..".*"..string.char(10), ""); however I still got the same output; it removes the comment and the rest of the string instead of the start of the comment to the newline.
So is there any way to gsub a string literal for a pattern containing a newline character?
Thanks
The problem you are facing is akin to greedy vs. lazy matching in regular expressions (.* vs .*?).
In Lua patterns, X.*\n means "match X, then match as many as possible characters followed by a newline". gsub has no special handling for a newline, hence it will try to continue matching until the last newline, subbing as many characters as it can. You want to match as few characters as possible, which is represented by .- in Lua patterns.
Also, I am not sure if it is intended or not, but this strategy will not remove the comment from the last line, if it is not (properly) ended by a newline. I am not sure if it can be represented by a single pattern, but this function will remove comments from all lines:
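local function removeComments(str, operator)
  -- Lazy match: remove from the operator up to the nearest newline, keeping the newline.
  local new_Sc = str:gsub(operator..".-\n", "\n");
  -- Remove a trailing comment on a last line that has no terminating newline.
  new_Sc = new_Sc:gsub(operator.."[^\n]*$", "");
  return new_Sc;
end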
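For comparison, the same "stop at the end of the line" behaviour can be sketched in Haskell rather than Lua by splitting the input into lines first, so that no match can ever run past a newline. This is an illustration under assumptions, not a translation of the answer: stripComment and removeComments are names invented here, the marker is treated as plain text, and markers inside string literals are not recognised.

```haskell
import Data.List (isPrefixOf)

-- Drop everything from the first occurrence of the marker to the end of
-- the given line. An empty marker matches immediately and empties the line.
stripComment :: String -> String -> String
stripComment marker = go
  where
    go [] = []
    go s@(c:cs)
      | marker `isPrefixOf` s = []
      | otherwise             = c : go cs

-- Apply stripComment to every line. Note that unlines always ends the
-- result with a newline, even if the input did not end with one.
removeComments :: String -> String -> String
removeComments marker = unlines . map (stripComment marker) . lines
```

Because each line is handled separately, a comment on the last line is removed whether or not the input ends with a newline.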

ANTLRv4 : Read double quotes escaped with both \ and "

I'm trying to implement a parser using ANTLRv4 for a language that accepts both "" and \" as a way escaping " characters in " delimited strings.
The answers to this question show how to do it for "" escaping. However when I try to extend it to also cover the \" case, it almost works but becomes too greedy when two strings are on the same line.
Here is my grammar:
grammar strings;
strings : STRING (',' STRING )* ;
STRING
: '"' (~[\r\n"] | '""' | '\"' )* '"'
;
Here is my input of three strings:
"This is ""my string\"",
"cat","fish"
This correctly recognises "This is ""my string\"", but thinks that "cat","fish" is all one string.
If I move "fish" down on to the next line it works correctly.
Can anyone figure out how to make it work if "cat" and "fish" are on the same line?
Make your STRING rule non greedy to stop at the first quote char it encounters, instead of trying to get as much as possible:
STRING
: '"' (~[\r\n"] | '""' | '\"' )*? '"'
;
I've found what I need to do to get this to work as I wanted, though to be honest I'm still not entirely sure why Antlr was doing what it did.
Simply by adding another backslash character to the '\"' clause it works!
So my final STRING definition is : '"' (~[\r\n"] | '""' | '\\"' )* '"'
Going back to first principles, I hand drew a state transition diagram of the problem and then realised that the two escaping mechanism sequences are not the same and cannot be treated similarly. Then trying to implement the two patterns in AntlrWorks it became apparent that I needed to add the second backslash at which point it all started working.
Does a single backslash followed by some arbitrary character simply mean that character?
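The state-transition view of the two escape mechanisms can be sketched as a small hand-written scanner. This is a Haskell illustration under assumptions, not ANTLR output: scanString is a hypothetical name, and it returns the raw literal (quotes included) together with the remaining input.

```haskell
-- Scan one double-quoted literal from the front of the input.
-- Both "" and \" count as an escaped quote inside the literal.
-- Returns the raw literal and the rest of the input, or Nothing if the
-- input does not start with a quote or the literal is not closed on
-- the same line.
scanString :: String -> Maybe (String, String)
scanString ('"':rest) = go "\"" rest
  where
    -- acc holds the literal so far, in reverse order
    go acc ('"':'"':cs)  = go ('"':'"':acc) cs          -- doubled quote
    go acc ('\\':'"':cs) = go ('"':'\\':acc) cs         -- backslash-escaped quote
    go acc ('"':cs)      = Just (reverse ('"':acc), cs) -- closing quote
    go _   ('\n':_)      = Nothing
    go _   ('\r':_)      = Nothing
    go acc (c:cs)        = go (c:acc) cs
    go _   []            = Nothing
scanString _ = Nothing
```

With this sketch, the input "cat","fish" yields "cat" as the first literal and leaves ,"fish" unconsumed, because the quote after cat is followed by a comma rather than by a second quote.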

Perl Force Interpolation of Literal String [duplicate]

In perl suppose I have a string like 'hello\tworld\n', and what I want is:
'hello world
'
That is, "hello", then a literal tab character, then "world", then a literal newline. Or equivalently, "hello\tworld\n" (note the double quotes).
In other words, is there a function for taking a string with escape sequences and returning an equivalent string with all the escape sequences interpolated? I don't want to interpolate variables or anything else, just escape sequences like \x, where x is a letter.
Sounds like a problem that someone else would have solved already. I've never used the module, but it looks useful:
use String::Escape qw(unbackslash);
my $s = unbackslash('hello\tworld\n');
You can do it with 'eval':
my $string = 'hello\tworld\n';
my $decoded_string = eval "\"$string\"";
Note that there are security issues tied to that approach if you don't have 100% control of the input string.
Edit: If you want to ONLY interpolate \x substitutions (and not the general case of 'anything Perl would interpolate in a quoted string') you could do this:
my $string = 'hello\tworld\n';
$string =~ s#([^\\A-Za-z_0-9])#\\$1#gs;
my $decoded_string = eval "\"$string\"";
That does almost the same thing as quotemeta - but exempts '\' characters from being escaped.
Edit2: This still isn't 100% safe because if the last character is a '\' - it will 'leak' past the end of the string though...
Personally, if I wanted to be 100% safe I would make a hash with the subs I specifically wanted and use a regex substitution instead of an eval:
my %sub_strings = (
'\n' => "\n",
'\t' => "\t",
'\r' => "\r",
);
$string =~ s/(\\n|\\t|\\r)/$sub_strings{$1}/gs;
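The same table-driven idea, with no eval involved, can be sketched in Haskell with pattern matching. The function name unescape is an assumption made for this example, and like the hash above it only knows \n, \t and \r; every other backslash, including an escaped backslash, is passed through unchanged.

```haskell
-- Turn the two-character sequences \n, \t and \r into the characters
-- they stand for. Anything else is copied as-is.
unescape :: String -> String
unescape ('\\':'n':cs) = '\n' : unescape cs
unescape ('\\':'t':cs) = '\t' : unescape cs
unescape ('\\':'r':cs) = '\r' : unescape cs
unescape (c:cs)        = c : unescape cs
unescape []            = []
```

For example, unescape "hello\\tworld\\n" (a string holding literal backslashes) gives the same value as the literal "hello\tworld\n".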

How to use backslash escape char for new line in JavaCC?

I have an assignment to create a lexical analyser and I've got everything working except for one bit.
I need to create a string that will accept a new line, and the string is delimited by double quotes.
The string accepts any number, letter, some specified punctuation, backslashes and double quotes within the delimiters.
I can't seem to figure out how to escape a new line character.
Is there a certain way of escaping characters like new line and tab?
Here's some of my code that might help
< STRING : ( < QUOTE> (< QUOTE > | < BACKSLASH > | < ID > | < NUM > | " " )* <QUOTE>) >
< #QUOTE : "\"" >
< #BACKSLASH : "\\" >
So my string should allow for a quote, then any of the following characters like a backslash, a whitespace, a number etc, and then followed by another quote.
The newline char like "\n" is what's not working.
Thanks in advance!
For string literals, JavaCC borrows the syntax of Java. So, a single-character literal comprising a carriage return is escaped as "\r", and a single-character literal comprising a line feed is escaped as "\n".
However, the processed string value is just a single character; it is not the escape itself. So, suppose you define a token for line feed:
< LF : "\n" >
A match of the token <LF> will be a single line-feed character. When substituting the token in the definition of another token, the single character is effectively substituted. So, suppose you have the higher-level definition:
< STRING : "\"" ( <LF> ) "\"" >
A match of the token <STRING> will be three characters: a quotation mark, followed by a line feed, followed by a quotation mark. What you seem to want instead is for the escape sequence to be recognized:
< STRING : "\"" ( "\\n" ) "\"" >
Now a match of the token <STRING> will be four characters: a quotation mark, followed by an escape sequence representing a line feed, followed by a quotation mark.
In your current definition, I see that other often-escaped metacharacters like quotation mark and backslash are also being recognized literally, rather than as escape sequences.
