EOL error while concatenating strings when backslash (\) is used in Python - python-3.x

Why doesn't this work? My gut feeling is that it has something to do with the backslashes (\):
savepath = ("C:\\Python\" + date4filename + ".txt")
Error is
File "C:\python\temp.py", line 2
savepath=("C:\\Python\" + date4filename)
^
SyntaxError: EOL while scanning string literal
[Finished in 0.191s]

The backslash is the escape character in Python string literals: placed before a character, it changes how that character is read. Here the final \ escapes the closing double quote ("), so the literal never ends and Python reports EOL.
Raw strings, written r'...', leave backslashes as ordinary characters, which is why they are common for Windows paths. Note that even a raw string literal cannot end with a single backslash, so r"C:\Python\" fails in the same way.
To get an actual backslash in a normal string literal, write it doubled (\\):
savepath = ("C:\\Python\\" + date4filename + ".txt")
To keep it simple, use the os.path module and let it insert the separator:
import os.path
savepath = os.path.join("C:\\Python", date4filename + ".txt")
To avoid these path problems altogether, you can also use *nix-style forward slashes (/) in Python paths, including on Windows.
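As a minimal sketch of the options above, the three spellings below build the same Windows path. The value of date4filename is made up for illustration. ntpath (the Windows flavour of os.path) and PureWindowsPath are used instead of os.path so that the result is the same on any platform.

```python
import ntpath
from pathlib import PureWindowsPath

date4filename = "2024-01-31"  # hypothetical value, not from the question

# 1. Doubled backslashes in a normal string literal.
manual = "C:\\Python\\" + date4filename + ".txt"

# 2. ntpath.join inserts the separator; on Windows, os.path is this same module.
joined = ntpath.join("C:\\Python", date4filename + ".txt")

# 3. Forward slashes are accepted and normalised by PureWindowsPath.
pure = str(PureWindowsPath("C:/Python") / (date4filename + ".txt"))

print(manual)
print(manual == joined == pure)
```

Note that the file extension is appended to the file name before joining; passing ".txt" as a separate argument to join would treat it as another path component.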

Related

Pathlib how to deal with folders that start with a number

Using Python 3 pathlib on Windows, is there a way to deal with folders that start with a number, other than adding an extra slash?
For example:
import pathlib
op = pathlib.Path("D:\Documents\01")
fn = "test.txt"
fp = op / fn
with fp.open("w", encoding="utf-8") as f:
    f.write(result)
Returns error: [Errno 22] Invalid argument: 'D:\\Documents\x01\\test.txt'
I would have thought that PureWindowsPath would have taken care of this. If I manually escape it with op = pathlib.Path("D:\Documents\\01"), then it is fine. Do I always have to manually add a backslash to avoid the escape?
"\01" is an octal escape: it produces a single character with code point 1, not "backslash, zero, one".
You can do, for example:
op = pathlib.Path(r"D:\Documents") / "01"
The extra backslash in "D:\Documents\\01" tells Python not to interpret \01 as an escape sequence.
From the comments chain:
It's the Python interpreter that's doing the escaping: \01 will always
be treated as an escape sequence (unless it's in a raw string literal
like r"\01"). pathlib has nothing to do with escaping in this case
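A small sketch of what the answer and the comment describe, assuming only the standard library. PureWindowsPath is used so that the check behaves the same on any platform; the folder names come from the question.

```python
from pathlib import PureWindowsPath

# In a normal literal, "\01" is an octal escape: one character, code point 1.
assert len("\01") == 1 and ord("\01") == 1
# In a raw literal it stays as three characters: backslash, zero, one.
assert len(r"\01") == 3

broken = PureWindowsPath("D:\\Documents\01")   # "\01" is still parsed as an escape
fixed = PureWindowsPath(r"D:\Documents\01")    # raw literal keeps the backslash
joined = PureWindowsPath(r"D:\Documents") / "01"

print(repr(broken.name))  # the last component has absorbed the \x01 character
print(fixed.name)
print(fixed == joined)
```

The escape is resolved when the literal is parsed, before pathlib ever sees the string, which is why no path class can undo it.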

python Using variable in re.search source.error("bad escape %s" % escape, len(escape)) [duplicate]

I want to use input from a user as a regex pattern for a search over some text. It works, but how can I handle cases where the user enters characters that have a meaning in regex?
For example, the user wants to search for Word (s): the regex engine will take the (s) as a group. I want it to treat it as the string "(s)". I could run replace on the user input and replace ( with \( and ) with \), but then I would need a replace for every possible regex symbol.
Do you know a better way?
Use the re.escape() function for this:
4.2.3 re Module Contents
escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
A simplistic example: search for any occurrence of the provided string, optionally followed by 's', and return the match object.
import re
def simplistic_plural(word, text):
    word_or_plural = re.escape(word) + 's?'
    # re.search scans the whole text; re.match would only try the start
    return re.search(word_or_plural, text)
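To make the difference concrete, here is a short sketch using the "Word (s)" input from the question. The helper name find_literal and the sample text are made up for illustration.

```python
import re

def find_literal(user_input, text):
    """Search text for user_input, treating every character literally."""
    return re.search(re.escape(user_input), text)

text = "Enter Word (s) here"  # hypothetical sample text

# Unescaped, (s) is a capture group, so the pattern means "Word s".
unescaped = re.search("Word (s)", text)

# Escaped, the parentheses are matched as ordinary characters.
escaped = find_literal("Word (s)", text)

print(unescaped)
print(escaped.group(0))
```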
You can use re.escape():
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'
If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.
If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).
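Unfortunately, re.escape() is not suited for the replacement string, because re.sub() leaves an unrecognized escape such as \_ in the output (the result below comes from a Python version older than 3.3, where re.escape() still escaped the underscore):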
>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'
A solution is to put the replacement in a lambda:
>>> re.sub('a', lambda _: '_', 'aa')
'__'
because the return value of the lambda is treated by re.sub() as a literal string.
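The effect is easier to see with a replacement that contains a backslash. The sketch below uses a made-up replacement text. It compares a template string, a callable, and a template with its backslashes doubled by hand, which is an alternative to the lambda.

```python
import re

replacement = r"C:\new"  # hypothetical replacement text containing a backslash

# As a template string, re.sub processes "\n" in the replacement as a newline.
as_template = re.sub("X", replacement, "path=X")

# The return value of a callable is inserted verbatim.
as_callable = re.sub("X", lambda m: replacement, "path=X")

# Doubling the backslashes also makes a template string come out literally.
as_doubled = re.sub("X", replacement.replace("\\", "\\\\"), "path=X")

print(repr(as_template))
print(repr(as_callable))
print(as_callable == as_doubled)
```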
Escaping a string before feeding it to a regex makes the regex treat its characters literally. There are two layers of interpretation involved. The first is Python's own string-literal syntax: in a normal literal, "\n" typed in source code becomes a single newline character when the code is parsed, while in a raw literal r"\n" it stays as the two characters you typed, a backslash followed by n (as far as I understand).
The second layer is the regex engine, which has its own grammar and interprets the string it receives by its own rules. I believe this is why we are advised to pass raw strings such as r"(\n+)": the regex then receives exactly what you typed. Even so, the regex will not match a parenthesis literally unless you say so using the regex's own syntax. For example, in r"(fun \( x : nat \) :)" the outer parentheses form a capture group because they are not backslashed, while the inner ones are matched as literal parentheses.
So we usually call re.escape() on the text we want matched literally: characters that the regex parser would otherwise treat as syntax, such as parentheses and spaces, get a backslash in front. For example, code I have in my app:
# Escape the non-alphanumeric characters of the literal text, so that it can be told apart from the regex syntax inserted on the next line.
__ppt = re.escape(_ppt)  # e.g. a parenthesis ( is then matched literally instead of opening a group
For example, see these strings:
_ppt
Out[4]: '(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
__ppt
Out[5]: '\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
print(rf'{_ppt=}')
_ppt='(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
print(rf'{__ppt=}')
__ppt='\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
The double backslashes appear because both the Out[] display and the {__ppt=} form of an f-string show the repr() of the string, in which every backslash is written doubled; the string itself contains single backslashes, which is what the regex receives.
To match a literal backslash, the regex pattern needs two backslash characters: r"\\" as a raw literal, or "\\\\" as a normal one.
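A short check of the doubled backslashes seen in the output above, using a smaller made-up input. It shows that the doubling comes from repr() and that a pattern of two backslash characters matches one literal backslash.

```python
import re

escaped = re.escape("(x)")

print(escaped)         # the actual characters of the string
print(repr(escaped))   # repr() writes each backslash doubled
print(f"{escaped=}")   # the = specifier shows the repr as well

# Five characters: backslash, (, x, backslash, )
assert len(escaped) == 5

# Two spellings of the same two-character pattern.
pattern_raw = r"\\"
pattern_plain = "\\\\"
assert pattern_raw == pattern_plain and len(pattern_raw) == 2

match = re.search(pattern_raw, r"a\b")
print(match.group(0) == "\\")
```

Four backslashes are only needed when the pattern is written as a normal literal; with the r prefix, two are enough.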

Golang trimPrefix from string "\"

I've seen that Go has the function func TrimPrefix(s, prefix string) string, which returns s without the provided leading prefix string.
My problem is that I have a string that starts with the character "\" (for example "\foo"). When I try to use TrimPrefix, I get an error.
golang code:
var s = "\foo"
s = strings.TrimPrefix(s, "\")
fmt.Print(s)
error:
./prog.go:10:32: newline in string
./prog.go:10:32: syntax error: unexpected newline, expecting comma or )
I have read that this happens because Go treats "\" as the escape character. Is there any option I can use to make Go not treat "\" as the escape character?
"\" is not a valid Go string literal. What you get is a compile-time error. In interpreted string literals backslash \ is a special character.
If you want the string to contain a backslash character, you have to use the sequence \\:
var s = "\\foo"
s = strings.TrimPrefix(s, "\\")
Which will output (try it on the Go Playground):
foo
Another option is to use raw string literals where the backslash is not special:
var s = `\foo`
s = strings.TrimPrefix(s, `\`)
Try this one on the Go Playground.
If you only need to remove a specific prefix whose length is known (for example "\", which has length 1), and the string is known to start with it, you can also slice the string: str[len(prefix):].
This works only because the prefix sits at the head of the string and has a known length.
Ignore this post if you only want to know how to use TrimPrefix. :D

Multiple checks but still SyntaxError: EOL while scanning string literal

I have checked on this string multiple times to ensure that the (".") are in place, but the message
File "<ipython-input-13-ef09f7b4583b>", line 48
    plt.savefig("C:\scratch\\data\"+str(angle).zfill(3)+".")
SyntaxError: EOL while scanning string literal
still comes up.
Any suggestions?
if save is not False:
    plt.savefig("C:\scratch\\data\"+str(angle).zfill(3)+".png")
    plt.close("all")
else:
    plt.show()
return
A Python string literal cannot end with a single \, because the backslash escapes the closing " (or ').
You have several options:
Use doubled backslashes consistently:
plt.savefig("C:\\scratch\\data\\" + str(angle).zfill(3) + ".png")
Use .format, preferably in combination with a raw string, to avoid problems when a directory name starts with t, n or any other character that becomes an escape sequence when prefixed with \:
plt.savefig(r"C:\scratch\data\{}.png".format(str(angle).zfill(3)))
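The options can be compared without matplotlib by building only the file name. This is a sketch with a made-up angle value. The compile() call is there only to show that a literal ending in a single backslash is rejected before any code runs.

```python
angle = 7  # hypothetical value

# Doubled backslashes, raw string with .format, and zero-padding via a format spec.
doubled = "C:\\scratch\\data\\" + str(angle).zfill(3) + ".png"
raw_format = r"C:\scratch\data\{}.png".format(str(angle).zfill(3))
raw_spec = r"C:\scratch\data\{:03d}.png".format(angle)

# Without the r prefix or doubling, "\t" silently becomes a tab character.
has_tab = "\t" in "C:\temp"

# Source text of a literal whose last backslash escapes the closing quote.
source = r'"C:\\scratch\"'
try:
    compile(source, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(doubled)
print(doubled == raw_format == raw_spec, has_tab, rejected)
```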

Add 'r' prefix to a python variable

I have string variable which is
temp = '1\2\3\4'
I would like to add a prefix 'r' to the string variable and get
r'1\2\3\4'
so that I can split the string based on '\'. I tried the following:
r'temp'
'r' + temp
r + temp
But none of the above works. Is there a simple way to do it? I'm using Python 3. I also tried to encode the string, using
temp.encode('string-escape')
But it returns the following error
LookupError: unknown encoding: string-escape
r is a prefix for string literals. This means, r"1\2\3\4" will not interpret \ as an escape when creating the string value, but keep \ as an actual character in the string. Thus, r"1\2\3\4" will have seven characters.
You already have the string value: there is nothing to interpret. You cannot have the r prefix affect a variable, only a literal.
Your temp = "1\2\3\4" will interpret backslashes as escapes, create the string '1\x02\x03\x04' (a four-character string), then assign this string to the variable temp. There is no way to retroactively reinterpret the original literal.
EDIT: In view of the more recent comments, you do not seem to, in fact, have a string "1\2\3\4". If you have a valid path, you can split it using
path.split('\\')
Note that path.split(r'\') is a syntax error, because even a raw string literal cannot end with a single backslash.
You probably also don't need that, though; rather, you may want to split a path into directory and file name, which is best done with the os.path functions.
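The character counts and the split can be checked directly. This is a sketch. The path in the last step is made up, and ntpath (the Windows flavour of os.path) is used so that the result does not depend on the platform.

```python
import ntpath

temp = '1\2\3\4'   # escapes are processed: '1', \x02, \x03, \x04
raw = r'1\2\3\4'   # raw literal: the backslashes are real characters

print(len(temp), len(raw))

# Splitting on a backslash only works when the string really contains one.
parts = raw.split('\\')
print(parts)

# For real paths, split into directory and file name instead.
directory, filename = ntpath.split(r"C:\scratch\data\007.png")  # hypothetical path
print(directory, filename)
```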
Wouldn't it just be re.escape(temp)?
Take for example the use case of trying to generate a pattern on the fly involving word boundaries. Then you can do this
r'\b' + re.escape(temp) + r'\b'
The r prefix only has an effect on a literal, so put it where the pattern is written; r"" + temp produces the same string as temp.
e.g.-
import re
email_address = 'Please contact us at: support#datacamp.com'
searchString = r"([\w\.-]+)#([\w\.-]+)"
re.search(searchString, email_address)
