How to convert backslash string to forward slash string in python3? - python-3.x

I am using Python 3 on my Ubuntu machine.
I have a string variable that contains a path with backslashes, and I need to convert it to a string with forward slashes. So I tried:
import pathlib
s = '\dir\wnotherdir\joodir\more'
x = repr(s)
p = pathlib.PureWindowsPath(x)
print(p.as_posix())
This prints correctly as
/dir/wnotherdir/joodir/more
but for other path strings it behaves strangely. For example, for the string
'\dir\aotherdir\oodir\more'
the backslashes are replaced correctly, but the value is wrong because of the character 'a' in the original string:
/dir/x07otherdir/oodir/more
What is the reason for this behavior?

This has nothing to do with paths per se. The problem is that \a in a normal string literal is interpreted as an escape sequence for the ASCII BEL control character (\x07). As a rule of thumb, whenever you want to disable the special interpretation of escape sequences in a string literal, use a raw string:
>>> import pathlib
>>> r = r'\dir\aotherdir\oodir\more'
>>> pathlib.PureWindowsPath(r)
PureWindowsPath('/dir/aotherdir/oodir/more')
>>>
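To make the difference concrete, here is a minimal sketch comparing a normal literal with a raw literal and then converting the raw one to forward slashes. The variable names are illustrative only and do not come from the question.

```python
from pathlib import PureWindowsPath

plain = 'x\ay'   # \a is the BEL escape, so this string has 3 characters
raw = r'x\ay'    # raw literal: backslash and 'a' are kept, 4 characters

print(len(plain), len(raw))   # 3 4
print(plain == 'x\x07y')      # True

# With a raw literal the path survives intact and converts cleanly.
win_path = r'\dir\aotherdir\oodir\more'
posix_text = PureWindowsPath(win_path).as_posix()
print(posix_text)             # /dir/aotherdir/oodir/more
```

Note that repr() is not needed for the conversion; passing the raw string directly to PureWindowsPath is enough.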

Related

Pathlib how to deal with folders that start with a number

Using Python 3 pathlib on Windows, is there a way to deal with folders that start with a number, other than adding an extra slash?
For example:
import pathlib
op = pathlib.Path("D:\Documents\01")  # "\01" is read as an escape sequence here
fn = "test.txt"
fp = op / fn
with fp.open("w", encoding="utf-8") as f:
    f.write(result)  # result is defined elsewhere
Returns error: [Errno 22] Invalid argument: 'D:\\Documents\x01\\test.txt'
I would have thought that PureWindowsPath would take care of this. If I manually escape it with op = pathlib.Path("D:\Documents\\01"), then it is fine. Do I always have to manually add a backslash to avoid the escape?
"\01" is an octal escape sequence: it produces a single character with code point 1 (the same as "\x01"), not "backslash, zero, one".
You can do, for example:
op = pathlib.Path("D:\Documents") / "01"
The extra backslash in "D:\Documents\\01" is there to tell Python that you don't want it to interpret \01 as an escape sequence.
From the comments chain:
It's the Python interpreter that's doing the escaping: \01 will always
be treated as an escape sequence (unless it's in a raw string literal
like r"\01"). pathlib has nothing to do with escaping in this case

Convert string with "\x" character to float

I'm converting strings to floats using float(x). However for some reason, one of the strings is "71.2\x0060". I've tried following this answer, but it does not remove the bytes character
>>> s = "71.2\x0060"
>>> "".join([x for x in s if ord(x) < 127])
'71.2\x0060'
Other methods I've tried are:
>>> s.split("\\x")
['71.2\x0060']
>>> s.split("\x")
ValueError: invalid \x escape
I'm not sure why this string is not formatted correctly, but I'd like to get as much precision from this string and move on.
Going off of wim's comment, the answer might be this:
>>> s.split("\x00")
['71.2', '60']
So I should do:
>>> float(s.split("\x00")[0])
71.2
Unfortunately the POSIX group \p{XDigit} does not exist in the re module. To remove the hex control characters with regular expressions anyway, you can try the following.
import re
re.sub(r'[\x00-\x1F]', r'', '71.2\x0060') # or:
re.sub(r'\\x[0-9a-fA-F]{2}', r'', r'71.2\x0060')
Output:
'71.260'
'71.260'
r means raw. Take a look at the control characters up to hex 1F in the ASCII table: https://www.torsten-horn.de/techdocs/ascii.htm
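The two approaches give different numbers: splitting at the NUL character keeps 71.2, while stripping control characters yields 71.260. The sketch below shows the first option, which matches the accepted reasoning. It also shows why the ord(x) < 127 filter had no effect: the NUL character has code point 0, which passes that test. The helper name leading_float is hypothetical.

```python
def leading_float(text):
    """Parse the number that precedes the first NUL character, if any."""
    head = text.split("\x00", 1)[0]
    return float(head)

s = "71.2\x0060"   # "\x00" is a single NUL character followed by "60"

print(len(s))                   # 7
print(ord(s[4]))                # 0, so a filter on ord(x) < 127 keeps it
print(leading_float(s))         # 71.2
print(leading_float("3.5"))     # 3.5, strings without NUL are unchanged
```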

Replace two backslashes with a single backslash

I want to replace the double backslashes in a string with single backslashes. However, replace doesn't seem to accept '\\' as the replacement string. Here's the interpreter output:
>>> import tempfile
>>> temp_folder = tempfile.gettempdir()
>>> temp_folder
'C:\\Users\\User\\AppData\\Local\\Temp'
>>> temp_folder.replace('\\\\', '\\')
'C:\\Users\\User\\AppData\\Local\\Temp'
BTW, I know that Windows paths need to contain either double backslashes or single forward slashes. I want to replace them anyway for display purposes.
Your output doesn't have double backslashes. What you are looking at is the repr() of the string, which displays backslashes in escaped form. If your temp_folder really did contain double backslashes, you would instead use:
print(temp_folder.replace('\\\\', '\\'))
and that will show you:
C:\Users\User\AppData\Local\Temp
which also drops the quotes.
But your temp_folder is unlikely to contain double backslashes; the difference in display probably led you to think that the return value of tempfile.gettempdir() contains them. As @Jean-Francois indicated, it should not (at least not on Windows). So you don't need .replace() at all; just print:
print(temp_folder)
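A small sketch of the repr-versus-print distinction described above, using a shortened, made-up path rather than the real temp directory:

```python
path = 'C:\\Users\\User'   # the literal uses \\, the string holds single backslashes

print(repr(path))          # 'C:\\Users\\User'  (what the interactive prompt echoes)
print(path)                # C:\Users\User      (the actual characters)
print(path.count('\\'))    # 2
print('\\\\' in path)      # False: there is no doubled backslash to replace
```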
This works for me
text = input('insert text')
parts = text.split('\\')  # split on single backslashes
print(parts)
text2 = ''
for item in parts:
    if item != '':  # empty items come from consecutive backslashes
        text2 += item + '\\'
print(text2)

Add 'r' prefix to a python variable

I have a string variable:
temp = '1\2\3\4'
I would like to add a prefix 'r' to the string variable and get
r'1\2\3\4'
so that I can split the string based on '\'. I tried the following:
r'temp'
'r' + temp
r + temp
But none of the above works. Is there a simple way to do it? I'm using Python 3. I also tried to encode the string, using
temp.encode('string-escape')
But it returns the following error
LookupError: unknown encoding: string-escape
r is a prefix for string literals. This means, r"1\2\3\4" will not interpret \ as an escape when creating the string value, but keep \ as an actual character in the string. Thus, r"1\2\3\4" will have seven characters.
You already have the string value: there is nothing to interpret. You cannot have the r prefix affect a variable, only a literal.
Your temp = "1\2\3\4" will interpret backslashes as escapes, create the string '1\x02\x03\x04' (a four-character string), then assign this string to the variable temp. There is no way to retroactively reinterpret the original literal.
EDIT: In view of the more recent comments, it seems you do not, in fact, have the string "1\2\3\4". If you have a valid path, you can split it using
path.split('\\')
Note that r'\' is not a valid literal, because a raw string cannot end in a single backslash.
But you probably don't need that either; rather, you may want to split a path into directory and file name, which is best done with the os.path functions.
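Here is one way this could look, as a sketch. The path is hypothetical, and ntpath is used instead of os.path only so that the Windows rules apply on any platform (on Windows, os.path is ntpath).

```python
import ntpath
from pathlib import PureWindowsPath

path = r'C:\Users\User\report.txt'   # hypothetical path, written as a raw literal

# Plain string split on a single backslash.
parts = path.split('\\')
print(parts)                 # ['C:', 'Users', 'User', 'report.txt']

# Splitting into directory and file name with the os.path-style API.
directory, filename = ntpath.split(path)
print(directory)             # C:\Users\User
print(filename)              # report.txt

# pathlib offers the same information as attributes.
print(PureWindowsPath(path).name)   # report.txt
```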
Wouldn't it just be re.escape(temp)?
Take for example the use case of trying to generate a pattern on the fly involving word boundaries. Then you can do this
r'\b' + re.escape(temp) + r'\b'
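A short sketch of that idea, assuming a made-up search term that contains a regex metacharacter. re.escape does not undo escapes already interpreted in a literal; it only makes the string's current characters match literally in a pattern.

```python
import re

term = "a.b"   # hypothetical term; '.' would otherwise match any character
pattern = r"\b" + re.escape(term) + r"\b"

print(bool(re.search(pattern, "see a.b here")))   # True
print(bool(re.search(pattern, "see axb here")))   # False
```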
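To use a variable as the search pattern, you can write r"" + temp. Note that this only concatenates an empty string; it does not change the variable's contents, because any escapes in the original literal have already been interpreted.
For example:
import re
email_address = 'Please contact us at: support@datacamp.com'
searchString = "([\w\.-]+)@([\w\.-]+)"
re.search(r""+searchString, email_address)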

Remove double escape characters from a string and make it a binary

I have a string looking like this:
"\\xd6\\x83\\x8dd!VT\\x92\\xaaA\\x05\\xe0\\x9b\\x8b\\xf1"
and I want to remove the double escape characters in order to make it a proper binary. Is that even possible?
The source string looks pretty much like a bytes string, so you could do:
>>> import ast
>>> s = "\\xd6\\x83\\x8dd!VT\\x92\\xaaA\\x05\\xe0\\x9b\\x8b\\xf1"
>>> print(ast.literal_eval("b'''%s'''" % s))
b'\xd6\x83\x8dd!VT\x92\xaaA\x05\xe0\x9b\x8b\xf1'
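An alternative that avoids building a literal for ast.literal_eval is to go through the unicode_escape codec. This is a sketch that assumes the text contains only \xNN escapes and Latin-1 characters, as in the question; other input may need different handling.

```python
import ast

s = "\\xd6\\x83\\x8dd!VT\\x92\\xaaA\\x05\\xe0\\x9b\\x8b\\xf1"

# str -> bytes holding the escape text -> str with the escapes decoded -> raw bytes
via_codec = s.encode("latin-1").decode("unicode_escape").encode("latin-1")

# The approach from the answer, for comparison.
via_ast = ast.literal_eval("b'''%s'''" % s)

print(via_codec == via_ast)   # True
print(len(via_codec))         # 15
```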
