Using Python 3 pathlib on Windows, is there a way to deal with folders that start with a number, other than adding an extra slash?
For example:
from pathlib import Path, PureWindowsPath
op = Path("D:\Documents\01")
fn = "test.txt"
fp = op / fn
with fp.open("w", encoding="utf-8") as f:
    f.write(result)
Returns error: [Errno 22] Invalid argument: 'D:\\Documents\x01\\test.txt'
I would have thought that PureWindowsPath would take care of this. If I escape it manually with op = pathlib.Path("D:\Documents\\01"), then it is fine. Do I always have to add a backslash manually to avoid the escape?
"\01" is an octal escape sequence: it produces a single character whose value is 1, not the three characters "backslash, zero, one".
You can do, for example:
op = pathlib.Path("D:\Documents") / "01"
The extra backslash in "D:\Documents\\01" is there to tell Python that you don't want it to interpret \01 as an escape sequence.
From the comments chain:
It's the Python interpreter that's doing the escaping: \01 will always
be treated as an escape sequence (unless it's in a raw string literal
like r"\01"). pathlib has nothing to do with escaping in this case
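The options mentioned above can be sketched as follows. This is only an illustration: it uses PureWindowsPath so that it behaves the same on any operating system, and the variable names are made up for the example.

```python
from pathlib import PureWindowsPath

# Ordinary literal: "\01" is an octal escape, so the last character is chr(1).
# ("\\D" is escaped here only to keep the example free of warnings.)
broken = "D:\\Documents\01"

# Three ways to get a real backslash followed by "01":
raw = PureWindowsPath(r"D:\Documents\01")        # raw string literal
forward = PureWindowsPath("D:/Documents/01")     # forward slashes
joined = PureWindowsPath("D:/Documents") / "01"  # join the parts

print(repr(broken))
print(raw, forward, joined)
```

All three path objects compare equal, while the ordinary literal ends in the control character \x01.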
Related
I am using Python 3 on my Ubuntu machine.
I have a string variable that contains a path with backslashes, and I need to convert it to a string with forward slashes. So I tried:
import pathlib
s = '\dir\wnotherdir\joodir\more'
x = repr(s)
p = pathlib.PureWindowsPath(x)
print(p.as_posix())
This prints correctly as
/dir/wnotherdir/joodir/more
But for some other paths it behaves strangely. For example, for the string
'\dir\aotherdir\oodir\more'
the backslashes are replaced correctly, but the value is wrong because of the character 'a' in the original string:
/dir/x07otherdir/oodir/more
What is the reason for this behavior?
This has nothing to do with paths per se. The problem here is that the \a is being interpreted as an ASCII BELL. As a rule of thumb, whenever you want to disable the special interpretation of escape sequences in string literals, you should use raw strings:
>>> import pathlib
>>> r = r'\dir\aotherdir\oodir\more'
>>> pathlib.PureWindowsPath(r)
PureWindowsPath('/dir/aotherdir/oodir/more')
>>>
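A small sketch of the difference, for illustration only. It assumes the path is written as a literal in the source code; the variable names are invented for the example.

```python
from pathlib import PureWindowsPath

# Ordinary literal: "\a" collapses into the single BEL character (0x07).
plain = "\\dir\aotherdir\\oodir\\more"

# Raw literal: every backslash is kept as a real backslash.
raw = r"\dir\aotherdir\oodir\more"

print(len(plain), len(raw))
print(PureWindowsPath(raw).as_posix())
```

The ordinary literal is one character shorter because backslash plus "a" became a single character. Calling repr() afterwards cannot bring the original text back, since the information is already lost when the literal is parsed.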
Why doesn't it work? My gut feeling is that it has something to do with the backslashes (\):
savepath = ("C:\\Python\" + date4filename + ".txt")
Error is
File "C:\python\temp.py", line 2
savepath=("C:\\Python\" + date4filename)
^
SyntaxError: EOL while scanning string literal
[Finished in 0.191s]
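A backslash has a special meaning inside a string literal: it escapes the character that follows it. Here it escapes the closing double quote ("), so the string is never terminated. This is one reason why Python has raw strings.
Raw strings are written with an r prefix, as in r' '. Inside a raw string, a backslash is kept as an ordinary character and does not start an escape sequence. Note that a raw string still cannot end in a single backslash, so r"C:\Python\" is also a syntax error.
Since the backslash has a special meaning in ordinary string literals, write a double backslash (\\) to get one actual backslash: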
savepath = ("C:\\Python\\" + date4filename + ".txt")
To keep things simple, use the os.path module:
import os.path
os.path.join("C:/Python", date4filename + ".txt")
To avoid these path problems, you can also use *nix-style forward slashes (/) in Python regardless of platform.
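The three approaches can be compared in a short sketch. The value of date4filename is a made-up placeholder, and ntpath (the Windows flavour of os.path) is used here only so that the result is the same on every platform.

```python
import ntpath  # Windows version of os.path, importable on any platform

date4filename = "2024-01-31"  # hypothetical value, for illustration only

# 1. Doubled backslashes in an ordinary string literal.
escaped = "C:\\Python\\" + date4filename + ".txt"

# 2. Let the path module insert the separator; the extension stays
#    attached to the file name instead of being a separate component.
joined = ntpath.join(r"C:\Python", date4filename + ".txt")

# 3. Forward slashes, which Windows file APIs also accept.
forward = "C:/Python/" + date4filename + ".txt"

print(escaped)
print(joined)
print(forward)
```

The first two produce the same string. The raw string in the second variant works because it does not end in a backslash.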
I'm converting strings to floats using float(x). However for some reason, one of the strings is "71.2\x0060". I've tried following this answer, but it does not remove the bytes character
>>> s = "71.2\x0060"
>>> "".join([x for x in s if ord(x) < 127])
'71.2\x0060'
Other methods I've tried are:
>>> s.split("\\x")
['71.2\x0060']
>>> s.split("\x")
ValueError: invalid \x escape
I'm not sure why this string is not formatted correctly, but I'd like to get as much precision from this string and move on.
Going off of wim's comment, the answer might be this:
>>> s.split("\x00")
['71.2', '60']
So I should do:
>>> float(s.split("\x00")[0])
71.2
Unfortunately the POSIX group \p{XDigit} does not exist in the re module. To remove the hex control characters with regular expressions anyway, you can try the following.
import re
re.sub(r'[\x00-\x1F]', r'', '71.2\x0060') # or:
re.sub(r'\\x[0-9a-fA-F]{2}', r'', r'71.2\x0060')
Output:
'71.260'
'71.260'
r means raw. Take a look at the control characters up to hex 1F in the ASCII table: https://www.torsten-horn.de/techdocs/ascii.htm
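The two approaches from this thread give different results, which a short sketch makes visible. Which one is appropriate depends on what the stray NUL character means in the data, and that is not known here.

```python
import re

s = "71.2\x0060"  # "71.2", then a NUL character, then "60"

# Option 1: keep only the text before the first NUL character.
before_nul = float(s.split("\x00")[0])

# Option 2: delete every ASCII control character, which joins both parts.
stripped = re.sub(r"[\x00-\x1f]", "", s)

print(before_nul)
print(stripped)
```

Splitting yields 71.2, while stripping yields the string '71.260', so the second option silently changes the digits after the decimal point.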
While importing data from a flat file, I noticed some embedded hex-values in the string (<0x00>, <0x01>).
I want to replace them with specific characters, but am unable to do so. Removing them won't work either.
What it looks like in the exported flat file: https://i.imgur.com/7MQpoMH.png
Another example: https://i.imgur.com/3ZUSGIr.png
This is what I've tried:
(Note that <0x01> represents a non-editable entity; it is not recognized here.)
import io
with io.open('1.txt', 'r+', encoding="utf-8") as p:
    s = p.read()
# included in case it bears any significance
import re
import binascii
s = "Some string with hex: <0x01>"
s = s.encode('latin1').decode('utf-8')
# throws e.g.: >>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 114: invalid start byte
s = re.sub(r'<0x01>', r'.', s)
s = re.sub(r'\\0x01', r'.', s)
s = re.sub(r'\\\\0x01', r'.', s)
s = s.replace('\0x01', '.')
s = s.replace('<0x01>', '.')
s = s.replace('0x01', '.')
or something along these lines in hopes to get a grasp of it while iterating through the whole string:
for x in s:
    try:
        base64.encodebytes(x)
        base64.decodebytes(x)
        s.strip(binascii.unhexlify(x))
        s.decode('utf-8')
        s.encode('latin1').decode('utf-8')
    except:
        pass
Nothing seems to get the job done.
I'd expect the characters to be replaceable with the methods I've dug up, but they are not. What am I missing?
NB: I have to preserve umlauts (äöüÄÖÜ)
-- edit:
Could I introduce the hex-values in the first place when exporting? If so, is there a way to avoid that?
with io.open('out.txt', 'w', encoding="utf-8") as temp:
    temp.write(s)
Judging from the images, these are actually control characters.
Your editor displays them in this greyed-out way showing you the value of the bytes using hex notation.
You don't have the characters "0x01" in your data, but really a single byte with the value 1, so unhexlify and friends won't help.
In Python, these characters can be produced in string literals with escape sequences using the notation \xHH, with two hexadecimal digits.
The fragment from the first image is probably equal to the following string:
"sich z\x01 B. irgendeine"
Your attempts to remove them were close.
s = s.replace('\x01', '.') should work.
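A minimal sketch of that replacement, plus a more general variant. The sample string is an assumption based on the fragment above, with a word containing an umlaut added to show that such characters are left alone. The regular expression is one possible choice and not part of the original answer.

```python
import re

# Hypothetical sample: the fragment from the image plus a word with an umlaut.
s = "sich z\x01 B. irgendeine Übung"

# Replace one specific control character.
cleaned = s.replace("\x01", ".")

# Replace any ASCII control character except tab, newline and carriage return.
generic = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", ".", s)

print(cleaned)
print(generic)
```

Both calls work on the decoded text, so umlauts such as ä, ö and ü are not touched and no encode/decode round trip is needed.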
I want to get the first two digits of the string "21N":
import re
str = "21N"
number = re.find(r\'d{1,2}', str)
but I get an error. How do I get the first two digits from the string?
Thank you very much.
Move the backslash between the apostrophe and d:
number = re.findall(r'\d{1,2}', str)
You have a couple of errors. re.find doesn't exist, you can use re.search instead. And your backslash needs to be inside rather than outside your opening quote.
So the following would work:
number = re.search(r'\d{1,2}', str)
But the {1,2} is actually unnecessary if you know you're looking for exactly 2 digits. Just use:
number = re.search(r'\d{2}', str)
And as an aside, don't use the variable name str, as it is a built-in type in Python.
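Putting the advice together, a short sketch might look like this. The variable name text is chosen here only to avoid shadowing the built-in str.

```python
import re

text = "21N"

# re.search returns a match object, or None if nothing matches.
match = re.search(r"\d{2}", text)
number = int(match.group()) if match else None

# re.findall returns the matched substrings directly, as a list.
all_numbers = re.findall(r"\d{1,2}", text)

print(number)
print(all_numbers)
```

Note that re.search gives back a match object, so .group() is needed to get the matched text.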