Why use '\\n' rather than '\n'? - python-3.x

I saw this in a Python 3 tutorial about how to download a file and this is what it kinda looks like.
from urllib import request
import requests
goog="http://realchart.finance.yahoo.com/table.csvs=GOOG&d=8&e=7&f=2016&g=d&a=7&b=19&c=2004&ignore=.csv"
rp=request.urlopen(goog)
s=rp.read()
cp=str(s)
m=cp.split('\\n')
dest='goog.csv'
fw=open(dest,'w')
for c in m:
    fw.write(c+ '\n')
fw.close()
fr=open('goog.csv','r')
k=fr.read()
print(k)
Why was this used?
split('\\n')
It's true that the code only works properly when you use the double backslash, but why?

The backslash is a special character inside string literals: it introduces characters that cannot otherwise be typed on a keyboard in a natural way, if at all. The most common is the newline, '\n'.
However, since the backslash is special, how does one make a string contain an actual backslash? Simple: use the backslash to escape itself. A double backslash is translated into a single literal backslash.
In the context of this question, the text being searched contains a literal backslash: rp.read() returns bytes, and calling str() on a bytes object produces its printable representation, in which each newline appears as the two characters \ and n. To find that literal backslash, one must write the double backslash.
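A minimal sketch of what happens in the question's code, assuming a small made-up bytes payload in place of the real download (the sample CSV content is hypothetical):

```python
# Hypothetical stand-in for what rp.read() returns: a bytes object.
raw = b"Date,Open\n2016-09-07,780.0\n"

# str() on bytes gives the printable representation, so each newline
# becomes the two characters backslash + n.
as_text = str(raw)

# '\\n' is backslash + n, which is what as_text actually contains.
repr_parts = as_text.split('\\n')

# Decoding the bytes instead yields real newlines, so '\n' works.
decoded_lines = raw.decode('utf-8').split('\n')

print(repr_parts)
print(decoded_lines)
```

Decoding first also avoids the stray b' prefix and the trailing quote that str() leaves in the data.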

<button onclick='window.alert("\n")'>alert not escaped</button>
<button onclick='window.alert("\\n")'>alert escaped</button>
In a string a single backslash is a so-called 'escape' character. This is used to include special characters like tab (\t) or a new line (\n).

Related

how to replace string (with special characters) to normal string in vim

Hi, I am trying to replace a string that has a special character at the end with a new string. For example, I want to replace
qwerty_CRS_abc\
to
qwerty_CRS_abc
I tried with this:
:%s/qwerty_CRS_abc\/qwerty_CRS_abc/g
but I'm getting this error:
Pattern not found: padring_CRS_CAN\/padring_CRS_CAN\g
Basically, I just want to remove that backslash in whole file. It should be just
qwerty_CRS_abc
Use:
:%s/qwerty_CRS_abc\\/qwerty_CRS_abc/g
Certain characters such as /&!.^*$\? carry a special significance to the search process and must be escaped using the \ character when they are used in a search. Hence the \\ used to escape the backslash in your example.
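The same rule shows up in other regex engines. As a rough analogy only (this is Python's re module, not vim, and the sample text is made up), the backslash in the pattern has to be escaped there too:

```python
import re

# Hypothetical sample text; '\\' in a Python literal is one backslash.
text = "qwerty_CRS_abc\\ and more\n"

# re.escape() adds the escaping for us, so the pattern matches the
# literal string including its trailing backslash.
pattern = re.escape("qwerty_CRS_abc\\")

cleaned = re.sub(pattern, "qwerty_CRS_abc", text)
print(cleaned)
```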

String lexical rule in ANTLR with greedy wildcard and escape character

From the book "The Definitive ANTLR 4 Reference":
Our STRING rule isn’t quite good enough yet because it doesn’t allow
double quotes inside strings. To support that, most languages define
escape sequences starting with a backslash. To get a double quote
inside a double-quoted string, we use \". To support the common escape
characters, we need something like the following:
STRING : '"' ( ESC | . )*? '"' ;
fragment
ESC : '\\"' | '\\\\' ; // 2-char sequences \" and \\
ANTLR itself needs to escape the escape character, so that’s why we need \\ to
specify the backslash character. The loop in STRING now matches either
an escape character sequence, by calling fragment rule ESC, or any
single character via the dot wildcard. The *? subrule operator
terminates the (ESC |.)*?
That sounds fine, but when I read that I noticed a certain ambiguity in the choice between ESC and .. As far as STRING is concerned, it is possible to match an input "Hi\"" by matching the escape character \ to the ., and to consider the following escaped double-quote as closing the string. This would even be less greedy and so would conform better to the use of ?.
The problem, of course, is that if we do that, then we have an extra double-quote at the end that does not get matched to anything.
So I wrote the following grammar:
grammar String;
anything: STRING '"'? '\r\n';
STRING: '"' (ESC|.)*? '"';
fragment
ESC: '\\"' | '\\\\';
which accepts an optional lonely double-quote character right after the string. This grammar still parses "Orange\"" as a full string.
So my question is: why is this the accepted parse, as opposed to the one taking "Orange\" as the STRING, followed by an isolated double-quote "? Note that the latter would be less greedy, which would seem to conform better to the use of ?, so one could think it would be preferable.
After some more experimentation, I realize the explanation is that the choice operator | is order-dependent (but only under the non-greedy operator ?): ESC is tried before the . wildcard. If I invert the two and write (.|ESC)*?, I do get the other parse: "Orange\" as the STRING, followed by an isolated double-quote.
This is not really surprising, but an interesting reminder that ANTLR is not as declarative as we may sometimes expect (in the sense that logic-or is order-independent but | is not). It is also a good reminder that the non-greedy operator ? does not extend its minimization capabilities to all choices, but just to the first one that matches the input (@sepp2k adds that order dependency only applies to the non-greedy case).
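The effect of alternative order inside a non-greedy loop can be reproduced outside ANTLR. The sketch below uses Python's re module as an analogy only: it is a backtracking regex engine, not ANTLR's lexer (and in Python alternation is always ordered), and the two pattern names are made up for illustration.

```python
import re

# The input "Orange\"" written as a Python literal (10 characters).
text = '"Orange\\""'

# Escape sequence tried before the wildcard, like (ESC|.)*?
esc_first = re.compile(r'"(?:\\["\\]|.)*?"')
# Wildcard tried before the escape sequence, like (.|ESC)*?
dot_first = re.compile(r'"(?:.|\\["\\])*?"')

m1 = esc_first.match(text).group(0)
m2 = dot_first.match(text).group(0)

print(m1)  # the whole input, escaped quote kept inside the string
print(m2)  # stops at the escaped quote, leaving one quote unmatched
```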

How do I put a single backslash into an ES6 template literal's output?

I'm struggling to get an ES6 template literal to produce a single backslash in its result.
> `\s`
's'
> `\\s`
'\\s'
> `\\\s`
'\\s'
> `\\\\s`
'\\\\s'
> `\u005Cs`
'\\s'
Tested with Node 8.9.1 and 10.0.0 by inspecting the value at a Node REPL (rather than printing it using console.log)
If I get your question right, how about \\?
I tried using $ node -i and ran
console.log(`\\`);
Which successfully output a backslash. Keep in mind that the output might be escaped as well, so the only way to know you are successfully getting a backslash is getting the character code:
const myBackslash = `\\`;
console.log(myBackslash.charCodeAt(0)); // 92
And to make sure you are not actually getting \\ (i.e. a double-backslash), check the length:
console.log(myBackslash.length); // 1
It is a known issue that unknown string escape sequences lose their escaping backslash in JavaScript normal and template string literals:
When a character in a string literal or regular expression literal is
preceded by a backslash, it is interpreted as part of an escape
sequence. For example, the escape sequence \n in a string literal
corresponds to a single newline character, and not the \ and n
characters. However, not all characters change meaning when used in an
escape sequence. In this case, the backslash just makes the character
appear to mean something else, and the backslash actually has no
effect. For example, the escape sequence \k in a string literal just
means k. Such superfluous escape sequences are usually benign, and do
not change the behavior of the program.
In regular string literals, one needs to double the backslash in order to introduce a literal backslash char:
console.log("\s \\s"); // => s \s
console.log('\s \\s'); // => s \s
console.log(`\s \\s`); // => s \s
There is a better idea: use String.raw:
The static String.raw() method is a tag function of template
literals. This is similar to the r prefix in Python, or the @
prefix in C# for string literals. (But it is not identical; see
explanations in this issue.) It's used to get the raw string form
of template strings, that is, substitutions (e.g. ${foo}) are
processed, but escapes (e.g. \n) are not.
So, you may simply use String.raw`\s` to define a \s text:
console.log(String.raw`s \s \\s`); // => s \s \\s
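Since the quote above compares String.raw to Python's r prefix, here is a small Python sketch of the same two points: an escaped backslash is a single character even though its inspected form shows two, and a raw string keeps the backslash as written. The variable names are just for illustration.

```python
one_backslash = '\\'   # escaped backslash: a single character
raw_s = r'\s'          # raw string: the backslash is kept as written

print(len(one_backslash))    # 1
print(ord(one_backslash))    # 92
print(repr(one_backslash))   # the repr shows the backslash doubled
print(raw_s == '\\s')        # True
```

As with the Node REPL, looking at the repr can be misleading; checking the length or the character code is the reliable test.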

Characters to separate value

I need to create a string to store key/value pairs, for example:
key1::value1||key2::value2||key3::value3
When deserializing it, I may encounter an error if a key or a value happens to contain || or ::
What are common techniques to deal with such a situation? Thanks.
A common way to deal with this is called an escape character or qualifier. Consider this Comma-Separated line:
Name,City,State
John Doe, Jr.,Anytown,CA
Because the name field contains a comma, the line is split into four fields instead of three.
If you enclose each data value by qualifiers, the parser knows when to ignore the delimiter, as in this example:
Name,City,State
"John Doe, Jr.",Anytown,CA
Qualifiers can be optional, used only on data fields that need it. Many implementations will use qualifiers on every field, needed or not.
You may want to implement something similar for your data encoding.
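As an illustration of optional qualifiers, Python's standard csv module quotes only the fields that need it by default. This is a sketch using the sample rows above, not a recommendation for any particular format:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Name", "City", "State"])
writer.writerow(["John Doe, Jr.", "Anytown", "CA"])
encoded = buf.getvalue()

# Reading it back: the quoted comma is not treated as a delimiter.
rows = list(csv.reader(io.StringIO(encoded)))

print(encoded)
print(rows)
```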
Escape || when serializing, and unescape it when deserializing. A common C-like way to escape is to prepend \. For example:
{ "a:b:c": "foo||bar", "asdf": "\\|||x||||:" }
serialize => "a\:b\:c:foo\|\|bar||asdf:\\\\\|\|\|x\|\|\|\|\:"
Note that \ needs to be escaped (and double escaped due to being placed in a C-style string).
If we assume that you have total control over the input string, then the common way of dealing with this problem is to use an escape character.
Typically, the backslash character (\) is used as an escape to say that "the next character is a special character", so in this case it should not be treated as a delimiter. So the parser would see || and :: as delimiters, but would see \|\| as two pipe characters || in either the key or the value.
The next problem is that we have overloaded the backslash: how do I represent a backslash itself? This is solved by saying that the backslash is also escaped, so to represent a \, you would have to write \\. The parser would then see \\ as \.
Note that if you use escape characters, you can use a single character for the delimiters, which might make things simpler.
Alternatively, you may have to restrict the input and say that || and :: are simply banned, and either fail or remove them when the string is encoded.
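A minimal sketch of the escape-character approach, assuming the || and :: delimiters from the question and a backslash as the escape. The function names are hypothetical and error handling for malformed input is left out:

```python
ESC = "\\"
SPECIAL = {ESC, "|", ":"}


def escape(s):
    """Prefix every special character with the escape character."""
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in s)


def serialize(pairs):
    return "||".join(escape(k) + "::" + escape(v) for k, v in pairs.items())


def deserialize(text):
    pairs = {}
    key, buf = None, []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ESC and i + 1 < len(text):
            buf.append(text[i + 1])       # escaped: take next char literally
            i += 2
        elif key is None and text.startswith("::", i):
            key, buf = "".join(buf), []   # unescaped :: ends the key
            i += 2
        elif text.startswith("||", i):
            pairs[key] = "".join(buf)     # unescaped || ends the pair
            key, buf = None, []
            i += 2
        else:
            buf.append(ch)
            i += 1
    if key is not None:
        pairs[key] = "".join(buf)
    return pairs


data = {"a:b:c": "foo||bar", "asdf": "\\|||x||||:"}
encoded = serialize(data)
print(encoded)
print(deserialize(encoded) == data)
```

Because every : and | inside the data is escaped, any unescaped one the parser meets must be part of a delimiter.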
A simple solution is to escape a separator (with a backslash, for instance) any time it occurs in data:
Name,City,State
John Doe\, Jr.,Anytown,CA
Of course, the escape character itself will need to be escaped when it occurs in data as well; in this case, a backslash would become \\.
You can use a non-printable character as the separator (e.g. vertical tab :-) ).
You can escape separator character in your data during serialization. For example: if you use one character as separator (key1:value1|key2:value2|...) and your data is:
this:is:key1 this|is|data1
this:is:key2 this|is|data2
you double every colon and pipe character in your data when you serialize it. So you will get:
this::is::key1:this||is||data1|this::is::key2:this||is||data2|...
During deserialization, whenever you come across two colons or two pipe characters, you know that this is not your separator but part of your data, and that you have to change it to one character. On the other hand, every single colon or pipe character is your separator.
Use a prefix (say "a") for your special characters (say "b") present in the key and values to store them. This is called escaping.
Then decode the key and values by simply replacing any "ab" sequence with "b". Bear in mind that the prefix is also a special character. An example:
Prefix: \
Special characters: :, |, \
Encoded:
title:Slashdot\: News for Nerds. Stuff that Matters.|shortTitle:\\.
Decoded:
title=Slashdot: News for Nerds. Stuff that Matters.
shortTitle=\.
The common technique is escaping reserved characters, for example:
In URLs you escape some characters using %HEX representation:
http://example.com?aa=a%20b
In programming languages you escape some characters with a backslash prefix:
"\"hello\""

Are quotes a type of string delimiter? Or does 'delimiter' mean other types of characters not including quotes?

When people talk about string delimiters, does that include quotes or does that mean everything except quotes?
It means any character used to define the beginning and end of a string (e.g. quotes but, in other contexts, other characters).
There's a subtle difference. If you're talking about string delimiters, that nearly always means quotes, either " or '.
If you're talking about a delimited string, then you're normally talking about a string of tokens with delimiters between them, i.e.
"this,is,a,delimited,string"
It's very common to use a comma as the delimiter, but that leads to issues when a token already contains a comma, for instance
"one,million,dollars,$1,000,000"
In this instance it's common to further delimit the token so we get
"one,million,dollars,"$1,000,000""
Another common alternative is to use an unusual character as the delimiter, and there's a minor convention to use the pipe symbol |
"one|million|dollars|$1,000,000"

Resources