Perl Force Interpolation of Literal String [duplicate] - string

In Perl, suppose I have a string like 'hello\tworld\n', and what I want is:
'hello world
'
That is, "hello", then a literal tab character, then "world", then a literal newline. Or equivalently, "hello\tworld\n" (note the double quotes).
In other words, is there a function for taking a string with escape sequences and returning an equivalent string with all the escape sequences interpolated? I don't want to interpolate variables or anything else, just escape sequences like \x, where x is a letter.

Sounds like a problem that someone else would have solved already. I've never used the module, but it looks useful:
use String::Escape qw(unbackslash);
my $s = unbackslash('hello\tworld\n');

You can do it with 'eval':
my $string = 'hello\tworld\n';
my $decoded_string = eval "\"$string\"";
Note that there are security issues tied to that approach if you don't have 100% control of the input string.
Edit: If you want to ONLY interpolate \x substitutions (and not the general case of 'anything Perl would interpolate in a quoted string') you could do this:
my $string = 'hello\tworld\n';
$string =~ s#([^\\A-Za-z_0-9])#\\$1#gs;
my $decoded_string = eval "\"$string\"";
That does almost the same thing as quotemeta - but exempts '\' characters from being escaped.
Edit 2: This still isn't 100% safe: if the last character is a '\', it escapes the closing quote of the eval'd string and 'leaks' past its end.
Personally, if I wanted to be 100% safe I would make a hash with the subs I specifically wanted and use a regex substitution instead of an eval:
my %sub_strings = (
'\n' => "\n",
'\t' => "\t",
'\r' => "\r",
);
$string =~ s/(\\n|\\t|\\r)/$sub_strings{$1}/gs;
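Putting the hash-based approach together, here is a minimal, self-contained sketch. The helper name unescape_basic and the extra entry for an escaped backslash are illustrative assumptions, not part of the answer above; any sequence that is not in the hash is left untouched.

```perl
use strict;
use warnings;

# Map each supported two-character escape to its real character.
# The keys are single-quoted, so '\n' here is a backslash followed by 'n'.
my %sub_strings = (
    '\n'   => "\n",
    '\t'   => "\t",
    '\r'   => "\r",
    '\\\\' => "\\",   # assumption: also collapse an escaped backslash
);

# Hypothetical helper: replaces only the escapes listed above.
sub unescape_basic {
    my ($string) = @_;
    # Match a backslash plus any single character; substitute only known pairs.
    $string =~ s/(\\.)/exists $sub_strings{$1} ? $sub_strings{$1} : $1/ges;
    return $string;
}

my $decoded_string = unescape_basic('hello\tworld\n');
print $decoded_string;
```

Because no eval is involved, variables and code in the input are never interpreted, and a trailing lone backslash is simply left in place.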

Related

gsubbing a string with a pattern containing a newline character in Lua

Does string.gsub recognize the newline character in a string literal? I have a scenario in which I am trying to gsub a portion of a string indicated by a given operator from the start of the operator to the newline like so:
local function removeComments(str, operator)
local new_Sc = (str):gsub(operator..".*\n", "");
return new_Sc;
end
local source = [[
int hi = 123; //a basic comment
char ok = "abc"; //another comment
]];
source = removeComments(source, "//");
print(source);
however in the output I see that it removed the rest of the string literal after the first comment:
int hi = 123;
I tried using the literal newline character by using string.char(10) like so (str):gsub(operator..".*"..string.char(10), ""); however I still got the same output; it removes the comment and the rest of the string instead of the start of the comment to the newline.
So is there any way to gsub a string literal for a pattern containing a newline character?
Thanks
The problem you are facing is akin to greedy vs. lazy matching in regular expressions (.* vs .*?).
In Lua patterns, X.*\n means "match X, then match as many as possible characters followed by a newline". gsub has no special handling for a newline, hence it will try to continue matching until the last newline, subbing as many characters as it can. You want to match as few characters as possible, which is represented by .- in Lua patterns.
Also, I am not sure if it is intended or not, but this strategy will not remove the comment from the last line, if it is not (properly) ended by a newline. I am not sure if it can be represented by a single pattern, but this function will remove comments from all lines:
local function removeComments(str, operator)
local new_Sc = str:gsub(operator..".-\n", "\n");
new_Sc = new_Sc:gsub(operator.."[^\n]*$", "");
return new_Sc;
end
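For readers who know Perl regexes better than Lua patterns, the same greedy versus lazy distinction can be sketched as follows. This is only an analogy under one assumption: the /s modifier is used so that '.' matches newlines, as it does in Lua patterns. The variable names are illustrative.

```perl
use strict;
use warnings;

my $source = qq{int hi = 123; //a basic comment\nchar ok = "abc"; //another comment\n};

# Greedy: with /s, '.' also matches newlines, so '.*' runs to the LAST newline.
(my $greedy = $source) =~ s{//.*\n}{}s;

# Lazy: '.*?' stops at the FIRST newline after each '//'; the newline is kept.
(my $lazy = $source) =~ s{//.*?\n}{\n}gs;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The greedy version keeps only the text before the first comment, which mirrors the output seen in the question; the lazy version strips each comment and preserves both lines.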

get perl to process backslash escapes in string

Consider this code:
my $str = '"line 1\n\t line 2"'; # from some JSON, or something
say $str; # print literal backslashes, not what I want
say eval $str; # processes backslashes, but overkill
Is there a reasonably easy way to get the effect of the last line, but without using full-blown eval? Even leaving aside the security implications (I mostly trust this string), this interpolates variables and stuff which I don't want. This can be worked around by an extra preprocessing step where I manually escape dollar signs and such, but this still feels a bit too hacky, even for my tastes.
@mob has the right recommendation. For the general problem:
#!/usr/bin/env perl
use strict;
use warnings;
my %unescape = map +($_ => eval "qq{\\$_}"), qw(f n r t); # etc
my $special = join '|', keys %unescape;
my $str = '"line 1\n\t line 2"';
$str =~ s{ \\ ($special) }{$unescape{$1}}xg;
print "'$str\n'";
If it's JSON, then decode it with JSON.
use JSON;
my $str = '"line 1\n\t line 2"'; # from some JSON, or something
my $decoded = JSON::decode_json("[$str]");
say $decoded->[0];
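As a variation on the JSON approach, here is a small sketch that uses the core JSON::PP module and its allow_nonref option, so the string does not have to be wrapped in an array first. It assumes the input really is a valid JSON string literal, quotes included.

```perl
use strict;
use warnings;
use JSON::PP;   # core module since Perl 5.14

my $str = '"line 1\n\t line 2"';   # a bare JSON string, quotes included

# allow_nonref lets the decoder accept a top-level scalar instead of
# requiring an array or object.
my $decoded = JSON::PP->new->allow_nonref->decode($str);

print $decoded, "\n";
```

Only JSON escapes are processed this way, so Perl variables in the input are never interpolated.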

How to remove or replace brackets in a string?

my $book = Spreadsheet::Read->new();
my $book = ReadData
('D:\Profiles\jmahroof\Desktop\Scheduled_Build_Overview.xls');
my $cell = "CD7";
my $n = "1";
my $send = $book->[$n]{$cell};
The above code gets data from a spreadsheet, then prints the content of a cell that I know has text in. It has text of exactly the following format: text(text)
I need to replace the open bracket with an empty space and remove the close bracket. I have tried the code below to substitute an empty space for the open bracket, but it does not seem to work.
$send =~ s/(/ /g;
print $send;
The bracket is treated as regex syntax (the start of a capture group), so just escape it.
$send =~ s/\(/ /;
print $send;
Since you only replace one char with another, you don't want a substitution, but a transliteration. That's the tr/// function in Perl. Since the pattern is just a list of chars, and not an actual regex, you don't need to escape the open parenthesis (. There is also no /g flag. It just substitutes all occurrences.
$send =~ tr/(/ /;
The main difference from a regular expression substitution is that tr/// builds its translation table at compile time and does no pattern matching at run time. That generally makes tr/// faster than s///, especially in a loop.
See the full documentation in perlop.
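Since the question also asks for the close bracket to be removed, one transliteration can do both jobs. The sketch below uses a stand-in value for the spreadsheet cell, which is an assumption for illustration.

```perl
use strict;
use warnings;

my $send = 'text(text)';   # stand-in for the spreadsheet cell value

# '(' maps to ' '; ')' has no counterpart in the replacement list,
# so the /d modifier deletes it.
$send =~ tr/()/ /d;

print "$send\n";
```

Without /d, tr/// would reuse the last replacement character, and ')' would become a space as well.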

How to grep/split a word in middle of %% or $$

I have a variable from which I have to extract the word in the middle of %% and the word that starts with $$. I used split, and it works, but only for some scenarios.
Example:
#!/usr/bin/perl
my $lastline ="%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)";
my @lastline_temp = split(/%/,$lastline);
print @lastline_temp;
my @var=split("\\$\\$",$lastline_temp[2]);
print @var;
I get the output as expected, but can I get the same result using grep? I mean, I don't want to use array[2] or array[1], so that I can replace the values easily.
I don't really see how you can get the output you expect. Because you put your data in "busy" quotes (interpolating, double, ...), it comes out being stored as:
'%Filters_LN_RESS_DIR%ARCOptionsPegaCHF_Vega$01212_GV_DATE_LDN)'
See Quote and Quote-like Operators and perhaps read Interpolation in Perl
Notice that the backslashes are gone. In interpolating quotes, a backslash that does not start a recognized escape sequence simply means "treat the next character as literal", so you get literal 'A', literal 'O', literal 'P', ....
That '0' is the value of $( (aka $REAL_GROUP_ID) which you unwittingly asked it to interpolate. So there is no sequence '$$' to split on.
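The effect of the quoting style can be seen in a short sketch. The shortened path %DIR%\ARC\Options is a placeholder chosen for illustration, and the warning for unrecognized escapes is silenced only to keep the demo quiet.

```perl
use strict;
use warnings;

my $single = '%DIR%\ARC\Options';   # single quotes: backslashes are kept

my $double;
{
    # Silence "Unrecognized escape \A passed through" for this demo.
    no warnings 'misc';
    $double = "%DIR%\ARC\Options";  # double quotes: \A and \O lose the backslash
}

print "$single\n";
print "$double\n";
```

With warnings left on, Perl reports each unrecognized escape, which is a useful hint that a Windows-style path was written in double quotes.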
Can you get the same using a grep command? It depends on what "the same" is. You save the results in arrays, the purpose of grep is to exclude things from the arrays. You will neither have the arrays, nor the output of the arrays if you use a non-trivial grep: grep {; 1 } @data.
Actually you can get the exact same result with this regular expression, assuming that the single string in @var is the "result".
m/%([^%]*)$/
Of course, that's no more than
substr( $lastline, rindex( $lastline, '%' ) + 1 );
which can run 8-10 times faster.
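A quick sketch to check that the two forms agree. It assumes the string is single-quoted so that nothing is interpolated; no timing is measured here.

```perl
use strict;
use warnings;

# Single-quoted so that nothing is interpolated.
my $lastline = '%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)';

# Regex: capture everything after the last '%'.
my ($by_regex) = $lastline =~ m/%([^%]*)$/;

# substr/rindex: slice from just past the last '%'.
my $by_substr = substr( $lastline, rindex( $lastline, '%' ) + 1 );

print "$by_regex\n";
print "same\n" if $by_regex eq $by_substr;
```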
First, be very careful in your use of quotes, I'm not sure if you don't mean
'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'
instead of
"%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)"
which might be a different string. For example, if evaluated, "$$" means the variable $PROCESS_ID.
After trying to solve riddles (not sure about that), and quoting your string
my $lastline =
'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'
differently, I'd use:
my ($w1, $w2) = $lastline =~ m{ % # the % char at the start
([^%]+) # CAPTURE everything until next %
[^(]+ # scan to the first brace
\( # hit the brace
([^)]+) # CAPTURE everything up to closing brace
}x;
print "$w1\n$w2";
to extract your words. Result:
Filters_LN_RESS_DIR
1212_GV_DATE_LDN
But what do you mean by replace the values easily. Which values?
Addendum
Now let's extract the "words" delimited by '\'. Using a simple split:
my @words = split /\\/, # use substr to start split after the first '\\'
substr $lastline, index($lastline,'\\');
you'll get the words between the backslashes if you drop the last entry (which is the $$(..) string):
pop @words; # remove the last element '$$(..)'
print join "\n", @words; # print the other elements
Result:
ARC
Options
Pega
CHF_Vega
Does this work better with grep? Seems to:
my @words = grep /^[^\$%]+$/, split /\\/, $lastline;
and
print join "\n", @words;
also results in:
ARC
Options
Pega
CHF_Vega
Maybe that is what you are after? What do you want to do with these?
Regards
rbo

Printing string in Perl

Is there an easy way, using a subroutine maybe, to print a string in Perl without escaping every special character?
This is what I want to do:
print DELIMITER <I don't care what is here> DELIMITER
So obviously it would be great if I could use a string as a delimiter instead of special characters.
perldoc perlop, under "Quote and Quote-like Operators", contains everything you need.
While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds of interpolating and pattern matching capabilities. Perl provides customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them. In the following table, a "{}" represents any pair of delimiters you choose.
    Customary  Generic   Meaning          Interpolates
    ''         q{}       Literal          no
    ""         qq{}      Literal          yes
    ``         qx{}      Command          yes*
               qw{}      Word list        no
    //         m{}       Pattern match    yes*
               qr{}      Pattern          yes*
               s{}{}     Substitution     yes*
               tr{}{}    Transliteration  no (but see below)
    <<EOF                here-doc         yes*
    * unless the delimiter is ''.
$str = q(this is a "string");
print $str;
That works if by 'special characters' you mean quotes and apostrophes.
You can use the __DATA__ directive which will treat all of the following lines as a file that can be accessed from the DATA handle:
while (<DATA>) {
print # or do something else with the lines
}
__DATA__
#!/usr/bin/perl -w
use Some::Module;
....
or you can use a heredoc:
my $string = <<'END'; #single quotes prevent any interpolation
#!/usr/bin/perl -b
use Some::Module;
....
END
The printing is not doing special things to the escapes, double quoted strings are doing it. You may want to try single quoted strings:
print 'this is \n', "\n";
In a single quoted string the only characters that must be escaped are single quotes and a backslash that occurs immediately before the end of the string (i.e. 'foo\\').
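The rules for single-quoted strings can be checked with a short sketch. The variable names are illustrative; each string ends up four characters long.

```perl
use strict;
use warnings;

my $quote     = 'it\'s';   # escaped single quote
my $trailing  = 'foo\\';   # escaped backslash before the closing quote
my $untouched = 'a\nb';    # \n is NOT an escape here: backslash, then 'n'

print length($quote), "\n";      # 4 characters: i t ' s
print length($trailing), "\n";   # 4 characters: f o o \
print length($untouched), "\n";  # 4 characters: a \ n b
```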
It is important to note that interpolation does not work with single quoted strings, so
print 'foo is $foo', "\n";
will not print the contents of $foo.
You can pretty much use any character you want with q or qq. For example:
#!/usr/bin/perl
use utf8;
use strict; use warnings;
print q∞This is a test∞;
print qq☼\nThis is another test\n☼;
print q»But, what is the point?»;
print qq\nYou are just making life hard on yourself!\n;
print qq¿That last one is tricky\n¿;
You cannot use qq DELIMITER foo DELIMITER. However, you could use heredocs for a similar effect:
print <<DELIMITER
...
DELIMITER
;
or
print <<'DELIMITER'
...
DELIMITER
;
but your source code would be really ugly.
If you want to print a string literally and you have Perl 5.10 or later then
say 'This is a string with "quotes"' ;
will print the string with a newline. The important thing is to use single quotes ' ' rather than double ones " ". Note that say must first be enabled, for example with use feature 'say'; or use 5.010;.
