Is there a way to represent a long string that doesn't have any whitespace on multiple lines in a YAML document?

Say I have the following string:
"abcdefghijklmnopqrstuvwxyz"
And I think it's too long for one line in my YAML file. Is there some way to split it over several lines?
>-
abcdefghi
jklmnopqr
stuvwxyz
This results in "abcdefghi jklmnopqr stuvwxyz", which is close, but it shouldn't contain any spaces.

Use double-quotes, and escape the newline:
"abcdefghi\
jklmnopqr\
stuvwxyz"

There are some subtleties that Jesse's answer will miss.
YAML (like many programming languages) treats single and double quotes differently. Consider this document:
regexp: "\d{4}"
This will fail to parse with an error such as:
found unknown escape character while parsing a quoted scalar at line 1 column 9
Compare that to:
regexp: '\d{4}'
Which will parse correctly. To use a backslash character inside a double-quoted string, you need to escape it, as in:
regexp: "\\d{4}"
I'd also like to highlight Steve's comment about single-quoted strings. Consider this document:
s1: "this\
is\
a\
test"
s2: 'this\
is\
a\
test'
When parsed, you will find that it is equivalent to:
s1: thisisatest
s2: "this\\ is\\ a\\ test"
This is a direct result of the fact that YAML treats single-quoted strings as literals, while double-quoted strings are subject to escape character expansion.

Related

Text containing special characters in command line cannot be well read

I have a function analyze_text: string -> unit that analyzes a text. Most of the time, ./analyze aText calls the function with the text as its argument.
let usage_msg = "./analyze [options] TEXT" in
Arg.parse options analyze_text usage_msg;
However, I realize that when the text contains special characters like ", ' or !, it is not read correctly. Does anyone know of a way to wrap the text properly and pass it to the function?
The shell gives many characters a special meaning. You can suppress that meaning by enclosing your input in single quotes.
$ echo 'a*$b"$c"!d'
a*$b"$c"!d
If your input itself contains a single quote, enclose that quote in double quotes and concatenate it with the rest of the input, which stays in single quotes.
For example, say you want to print: He$l!o Wo$r'ld
You can do it like this:
$ echo 'He$l!o Wo$r'"'"'ld'
He$l!o Wo$r'ld
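The same quoting can be generated instead of written by hand. As a minimal sketch in Python (not part of the original answer), the standard library's shlex.quote wraps a string in single quotes and handles an embedded single quote with the same '"'"' concatenation shown above. The wrapper name quote_for_shell is only illustrative.

```python
import shlex


def quote_for_shell(text: str) -> str:
    """Return `text` quoted so that a POSIX shell passes it through unchanged."""
    # shlex.quote single-quotes the string and rewrites each embedded
    # single quote as '"'"' (close quote, double-quoted quote, reopen).
    return shlex.quote(text)


if __name__ == "__main__":
    # Prints the command-line-safe form of the example string.
    print(quote_for_shell("He$l!o Wo$r'ld"))
```

This is mainly useful when another program builds the command line; when typing at the prompt, the manual quoting above is all that is needed.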
In your case, the culprit is not your OCaml code but the behavior of your shell, e.g., bash. When you enter text at the bash prompt, many characters have a special meaning, e.g., ", ', $, \ and so on. To suppress the special meaning of a character in bash, you can either escape it with a backslash, e.g., \$, \\, \', or delimit the text with single quotes (a single quote itself cannot appear inside single-quoted text, so you have to close the quotes, add an escaped \', and reopen them).
The general approach is that when your input is actual text or data, not a sequence of commands and options, you should read the input from a file or from the standard input channel. This also helps when the input is large, as most systems limit (sometimes significantly) the total number of characters that can be passed on the command line. In vanilla OCaml, you can read a whole file into a single string using the following simple code:
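let read_file filename =
  let buf = Buffer.create 4096 in
  let chan = open_in filename in
  begin
    (* add_channel raises End_of_file once fewer than 4096 characters remain *)
    try while true do Buffer.add_channel buf chan 4096 done
    with End_of_file -> ()
  end;
  (* close the channel so the file descriptor is not leaked *)
  close_in chan;
  Buffer.contents buf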
Then you don't need to deal with any special characters, as your input will be the file and no shell in between will do any interpretations. You can even analyze binary data with that.
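For comparison, here is a hedged sketch of the same idea in Python: read the data from a file, or from standard input when no file is given, so that no shell ever interprets the text. The function name read_input and the "no path means stdin" convention are assumptions made for this illustration, not part of the answer above.

```python
import sys


def read_input(path=None) -> bytes:
    """Read the whole input as bytes from `path`, or from stdin if path is None."""
    if path is None:
        # sys.stdin.buffer is the underlying binary stream of standard input.
        return sys.stdin.buffer.read()
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    data = read_input(sys.argv[1] if len(sys.argv) > 1 else None)
    print(len(data), "bytes read")
```

Reading in binary mode means quotes, dollar signs, and even non-text bytes arrive unchanged.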

replace sub-string with last special character, being (3rd part) of comma separated string

I have a string with comma separated values, like:
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
As you can see, the 3rd comma-separated value sometimes ends with a special character, such as a dash (-). I want to use sed, or preferably a perl command (with the -i option, to edit the existing file in place), to replace this value with the same string in the same position (i.e. the 3rd comma-separated value) but without the trailing special character. So the result for the example above should be:
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
Since the file contains many lines like the one above, I use a while loop in a shell/bash script to process every line. I have assigned the string values to variables so that I can replace them using perl. My while loop is:
while read mystr
do
myNEWstr=$(echo $mystr | sed s/[_.-]$// | sed s/[__]$// | sed s/[_.-]$//)
perl -pi -e "s/\b$mystr\b/$myNEWstr/g" myFinalFile.txt
done < myInputFile.txt
where:
$mystr is the "SOME-STRING_A_-BLAHBLAH_1-4MP0-"
$myNEWstr result is the "SOME-STRING_A_-BLAHBLAH_1-4MP0"
Note that myInputFile.txt contains the 3rd comma-separated values from myFinalFile.txt. Each of those exact strings ($mystr) is checked for a trailing special character (underscore, dash, dot, or double underscore); if one is present, it is removed to form the new string ($myNEWstr). That new string then replaces the old one in myFinalFile.txt, so that the 3rd comma-separated value ends up without the trailing special character (the dash in the example above).
Thank you.
You could use the following regex:
s/^([^,]*,[^,]*,[^,]*)-,/$1,/
This defines a CSV field as a series of characters other than a comma (empty fields are allowed). We are looking for a dash at the very end of the third CSV field. The regex captures everything up to that point and then replaces the match with the captured text, omitting the dash.
$ cat t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,
$ perl -p -e 's/^([^,]*,[^,]*,[^,]*)-,/$1,/' t.txt
742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,SOME-STRING_A_-BLAHBLAH_1-4MP0,,,
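The regex above only handles a single trailing dash. As a rough sketch in Python (an illustration, not the answerer's code), the same capture-and-replace idea can be widened to the characters the question's sed command targets, namely underscore, dot, and dash. The name strip_third_field is hypothetical, and the function assumes it receives one line without its trailing newline.

```python
import re

# Capture the first two fields plus the third field (lazily), then match
# one or more trailing '_', '.' or '-' that sit right before the next
# comma or the end of the line.
THIRD_FIELD_TAIL = re.compile(r"^([^,]*,[^,]*,[^,]*?)[_.-]+(?=,|$)")


def strip_third_field(line: str) -> str:
    """Remove trailing '_', '.' or '-' from the third comma-separated field."""
    return THIRD_FIELD_TAIL.sub(r"\1", line, count=1)


if __name__ == "__main__":
    sample = ("742108,SOME-STRING_A_-BLAHBLAH_1-4MP0RTTYE,"
              "SOME-STRING_A_-BLAHBLAH_1-4MP0-,,,")
    print(strip_third_field(sample))
```

Because the pattern is anchored to the third field, dashes inside or at the end of other fields are left alone, and no per-string loop over a second file is needed.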

gsubbing a string with a pattern containing a newline character in Lua

Does string.gsub recognize the newline character in a string literal? I have a scenario in which I am trying to gsub a portion of a string indicated by a given operator from the start of the operator to the newline like so:
local function removeComments(str, operator)
local new_Sc = (str):gsub(operator..".*\n", "");
return new_Sc;
end
local source = [[
int hi = 123; //a basic comment
char ok = "abc"; //another comment
]];
source = removeComments(source, "//");
print(source);
However, in the output I see that it removed the rest of the string literal after the first comment:
int hi = 123;
I also tried building the newline with string.char(10), as in (str):gsub(operator..".*"..string.char(10), ""); but I got the same output: it removes everything from the first comment to the end of the string instead of just from the start of the comment to the newline.
So is there any way to gsub a string literal for a pattern containing a newline character?
Thanks
The problem you are facing is akin to greedy vs. lazy matching in regular expressions (.* vs .*?).
In Lua patterns, X.*\n means "match X, then match as many as possible characters followed by a newline". gsub has no special handling for a newline, hence it will try to continue matching until the last newline, subbing as many characters as it can. You want to match as few characters as possible, which is represented by .- in Lua patterns.
Also, I am not sure if it is intended or not, but this strategy will not remove the comment from the last line, if it is not (properly) ended by a newline. I am not sure if it can be represented by a single pattern, but this function will remove comments from all lines:
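local function removeComments(str, operator)
  -- ".-" matches as few characters as possible, so each match stops at the first newline
  local new_Sc = str:gsub(operator..".-\n", "\n");
  -- remove a comment on a last line that is not ended by a newline
  new_Sc = new_Sc:gsub(operator.."[^\n]*$", "");
  return new_Sc;
end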
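The greedy versus lazy comparison made above can be seen directly with regular expressions. The following is a small Python sketch added for illustration; re.DOTALL is used so that "." also matches a newline, which mirrors how "." behaves in Lua patterns. The function names are hypothetical.

```python
import re

SOURCE = ('int hi = 123; //a basic comment\n'
          'char ok = "abc"; //another comment\n')


def strip_greedy(text: str) -> str:
    # ".*" is greedy: it runs to the LAST newline, so everything after
    # the first "//" is swallowed by a single match.
    return re.sub(r"//.*\n", "\n", text, flags=re.DOTALL)


def strip_lazy(text: str) -> str:
    # ".*?" is lazy (like ".-" in Lua): each match stops at the FIRST
    # newline after "//", so every line is handled separately.
    return re.sub(r"//.*?\n", "\n", text, flags=re.DOTALL)


if __name__ == "__main__":
    print(repr(strip_greedy(SOURCE)))
    print(repr(strip_lazy(SOURCE)))
```

With the sample text, the greedy version keeps only the first statement, while the lazy version keeps both statements and drops only the comments.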

Unescape a string with escaped sequences in Delphi

I use Delphi 5 and have a String like this from a http-connection:
str :='content=bell=7'#$8'size=20'#$8'other1'#$D#$A#$8'other2'
This string contains some sequences with escape characters and I want to unescape these characters. If I use the Trim function, the escape sequences are still there. Maybe this is because '#$8' is not a printable character?
How can I replace '#$8' separately? For example with '&', so that I get the string:
str1 :='content=bell=7&size=20&other1'#$D#$A'&other2'
After this I can use trim to unescape the other sequences.
str2 :='content=bell=7&size=20&other1#13#10&other2'
Those are Delphi character sequences. The compiler interprets them as it processes your source file. It converts #$8 into a backspace character in the string. If you want to replace that character with something else, you could call StringReplace. (If that's your real code, then you could just skip the extra function call and use the desired characters in the string literal directly in your code.)
str2 := StringReplace(str1, #8, '&', [rfReplaceAll]);
Trim removes whitespace from the start and end of a string, but your characters aren't at either end.
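The two points above (replace the control character explicitly, and trimming only touches the ends) can be illustrated outside Delphi as well. This is a hedged Python sketch in which "\x08" stands for the backspace character that #$8 produces; the function name replace_backspaces is made up for the example.

```python
def replace_backspaces(text: str, replacement: str = "&") -> str:
    """Replace every backspace character (code 8) with `replacement`."""
    return text.replace("\x08", replacement)


# The string from the question, with #$8 written as \x08 and #$D#$A as \r\n.
RAW = "content=bell=7\x08size=20\x08other1\r\n\x08other2"

if __name__ == "__main__":
    # strip() only removes whitespace at the two ends, so nothing changes here.
    print(repr(RAW.strip()))
    print(repr(replace_backspaces(RAW)))
```

The carriage return and line feed in the middle of the string are untouched by both operations, just as Trim leaves them alone in Delphi.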

Printing string in Perl

Is there an easy way, using a subroutine maybe, to print a string in Perl without escaping every special character?
This is what I want to do:
print DELIMITER <I don't care what is here> DELIMITER
So obviously it would be great if I could use a string as the delimiter instead of special characters.
perldoc perlop, under "Quote and Quote-like Operators", contains everything you need.
While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds of interpolating and pattern matching
capabilities. Perl provides customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of
them. In the following table, a "{}" represents any pair of delimiters you choose.
Customary  Generic   Meaning          Interpolates
''         q{}       Literal          no
""         qq{}      Literal          yes
``         qx{}      Command          yes*
           qw{}      Word list        no
//         m{}       Pattern match    yes*
           qr{}      Pattern          yes*
           s{}{}     Substitution     yes*
           tr{}{}    Transliteration  no (but see below)
<<EOF                here-doc         yes*

* unless the delimiter is ''.
$str = q(this is a "string");
print $str;
This works if by 'special characters' you mean quotes and apostrophes.
You can use the __DATA__ directive which will treat all of the following lines as a file that can be accessed from the DATA handle:
while (<DATA>) {
print # or do something else with the lines
}
__DATA__
#!/usr/bin/perl -w
use Some::Module;
....
or you can use a heredoc:
my $string = <<'END'; #single quotes prevent any interpolation
#!/usr/bin/perl -b
use Some::Module;
....
END
Printing does nothing special to the escapes; double-quoted strings do. You may want to try single-quoted strings:
print 'this is \n', "\n";
In a single quoted string the only characters that must be escaped are single quotes and a backslash that occurs immediately before the end of the string (i.e. 'foo\\').
It is important to note that interpolation does not work with single quoted strings, so
print 'foo is $foo', "\n";
will not print the contents of $foo.
You can pretty much use any character you want with q or qq. For example:
#!/usr/bin/perl
use utf8;
use strict; use warnings;
print q∞This is a test∞;
print qq☼\nThis is another test\n☼;
print q»But, what is the point?»;
print qq\nYou are just making life hard on yourself!\n;
print qq¿That last one is tricky\n¿;
You cannot use qq DELIMITER foo DELIMITER. However, you could use heredocs for a similar effect:
print <<DELIMITER
...
DELIMITER
;
or
print <<'DELIMITER'
...
DELIMITER
;
but your source code would be really ugly.
If you want to print a string literally and you have Perl 5.10 or later then
say 'This is a string with "quotes"' ;
will print the string followed by a newline (say has to be enabled first, e.g. with use feature 'say'; or use 5.010;). The important thing is to use single quotes ' ' rather than double quotes " ".
