How to use Python to convert backslashes into forward slashes in file paths on Windows? - python-3.x

I have a problem converting all the backslashes into forward slashes using Python.
I tried using os.sep together with the str.replace() method to accomplish my task, but it wasn't 100% successful.
import os
pathA = 'V:\Gowtham\2019\Python\DailyStandup.txt'
newpathA = pathA.replace(os.sep,'/')
print(newpathA)
Expected Output:
'V:/Gowtham/2019/Python/DailyStandup.txt'
Actual Output:
'V:/Gowtham\x819/Python/DailyStandup.txt'
I don't understand why 2019 is converted into \x819. Could someone help me with this?

Your issue is already in pathA: if you print it out, you'll see that it already has this \x81, since \201 means the character defined by the octal number 201, which is 81 in hexadecimal (\x81). For more information, you can take a look at the definition of string literals.
The quick solution is to use raw strings (r'V:\....'). But you should take a look at the pathlib module.
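As a minimal sketch of the pathlib route (the variable names here are illustrative, not from the original post): PureWindowsPath parses the string using Windows rules on any operating system, and as_posix() returns it with forward slashes. The raw-string prefix is still needed, because escape processing happens when Python reads the literal, before pathlib ever sees it.

```python
from pathlib import PureWindowsPath

# Raw string: the backslashes reach pathlib untouched.
path_a = PureWindowsPath(r'V:\Gowtham\2019\Python\DailyStandup.txt')

# as_posix() renders the same path with forward slashes.
new_path_a = path_a.as_posix()
print(new_path_a)
```

Unlike pathA.replace(os.sep, '/'), this does not depend on os.sep, which is '/' on Linux and macOS, so the replace() version would leave the backslashes untouched there.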

Using the raw string leads to the correct answer for me.
import os
pathA = r'V:\Gowtham\2019\Python\DailyStandup.txt'
newpathA = pathA.replace(os.sep,'/')
print(newpathA)
Output:
V:/Gowtham/2019/Python/DailyStandup.txt

Try this, using the raw string format r'your-string'.
>>> import os
>>> pathA = r'V:\Gowtham\2019\Python\DailyStandup.txt' # raw string format
>>> newpathA = pathA.replace(os.sep,'/')
Output:
>>> print(newpathA)
V:/Gowtham/2019/Python/DailyStandup.txt

Related

extract data between single quotes

trying to extract the data between single quotes
import re
a = 'USA-APA HA-WBS-10.152.08.0/24'
print(re.findall(r'()', a))
expecting the output: USA-APA HA-WBS-10.152.08.0/24
What is wrong with printing it directly? It is just a string.
a = 'USA-APA HA-WBS-10.152.08.0/24'
print(a)
Output:
% python3 test.py
USA-APA HA-WBS-10.152.08.0/24
You might want to look at this also regarding quotes and strings:
Single and Double Quotes | Python
I am not very familiar with Python, but with some quick searching around I've found that this works:
import re
a = 'USA-APA HA-WBS-10.152.08.0/24'
result = re.findall(r'(.*?)', a)
print("".join(result))
I'm pretty sure there are better ways of solving this but I'm not familiar with the language
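The patterns in this thread appear to have lost their quote characters. Assuming the goal is to pull out text enclosed in literal single quotes inside a larger string, a sketch could look like this (the sample value of line is made up for illustration):

```python
import re

# Hypothetical input: the outer double quotes delimit the Python literal,
# the inner single quotes are part of the data itself.
line = "site='USA-APA HA-WBS-10.152.08.0/24' status='up'"

# Match a quote, lazily capture everything up to the next quote.
values = re.findall(r"'(.*?)'", line)
print(values)
```

If the string contains no literal quote characters, as in a = 'USA-APA HA-WBS-10.152.08.0/24', there is nothing to extract and print(a) is enough.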

How to use f'string bytes'string together? [duplicate]

I'm looking for a formatted byte string literal. Specifically, something equivalent to
name = "Hello"
bytes(f"Some format string {name}")
Possibly something like fb"Some format string {name}".
Does such a thing exist?
No. The idea is explicitly dismissed in PEP 498:
For the same reason that we don't support bytes.format(), you may
not combine 'f' with 'b' string literals. The primary problem
is that an object's __format__() method may return Unicode data
that is not compatible with a bytes string.
Binary f-strings would first require a solution for
bytes.format(). This idea has been proposed in the past, most
recently in PEP 461. The discussions of such a feature usually
suggest either
adding a method such as __bformat__() so an object can control how it is converted to bytes, or
having bytes.format() not be as general purpose or extensible as str.format().
Both of these remain as options in the future, if such functionality
is desired.
In 3.6+ you can do:
>>> a = 123
>>> f'{a}'.encode()
b'123'
You were actually super close in your suggestion; if you add an encoding kwarg to your bytes() call, then you get the desired behavior:
>>> name = "Hello"
>>> bytes(f"Some format string {name}", encoding="utf-8")
b'Some format string Hello'
Caveat: This works in 3.8 for me, but the note at the bottom of the Bytes Objects section in the docs seems to suggest that this should work with any method of string formatting in all of 3.x (using str.format() for versions <3.6, since f-strings were added in 3.6, but the OP specifically asks about 3.6+).
From Python 3.6.2 (bytes %-formatting itself was added in Python 3.5 by PEP 461), this percent formatting for bytes works for some use cases:
print(b"Some stuff %a. Some other stuff" % my_byte_or_unicode_string)
But as AXO commented:
This is not the same. %a (or %r) will give the representation of the string, not the string itself. For example b'%a' % b'bytes' will give b"b'bytes'", not b'bytes'.
Which may or may not matter depending on if you need to just present the formatted byte_or_unicode_string in a UI or if you potentially need to do further manipulation.
As noted here, you can format this way:
>>> name = b"Hello"
>>> b"Some format string %b World" % name
b'Some format string Hello World'
You can see more details in PEP 461
Note that in your example you could simply do something like:
>>> name = b"Hello"
>>> b"Some format string " + name
b'Some format string Hello'
This was one of the bigger changes made from Python 2 to Python 3: they handle Unicode text and byte strings differently.
This is how you'd convert to bytes.
string = "some string format"
encoded = string.encode()
print(encoded)
This is how you'd decode back to a string.
encoded.decode()
I gained a better appreciation of how Unicode handling changed between Python 2 and 3 through this Coursera lecture by Charles Severance. You can watch the entire 17-minute video, or skip to around 10:30 if you want to get to the differences between Python 2 and 3 and how they handle characters, specifically Unicode.
I understand your actual question is how you could format a string that has both strings and bytes.
inBytes = b"testing"
inString = 'Hello'
type(inString) #This will yield <class 'str'>
type(inBytes) #this will yield <class 'bytes'>
Here you can see that I have a string variable and a bytes variable.
This is how you would combine the bytes and the string into one string (the bytes must be decoded, not encoded).
formattedString = inString + ' ' + inBytes.decode()
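Mixing str and bytes always means converting one side first. A small sketch of both directions, assuming the data is UTF-8 (the snake_case names are illustrative):

```python
in_bytes = b"testing"
in_string = "Hello"

# Option 1: decode the bytes and build a str.
as_text = f"{in_string} {in_bytes.decode('utf-8')}"

# Option 2: encode the str and build bytes with %b
# (bytes %-formatting, described in PEP 461).
as_bytes = b"%b %b" % (in_string.encode("utf-8"), in_bytes)

print(as_text)
print(as_bytes)
```

Which option fits depends on whether the result is meant for display (str) or for writing to a socket or binary file (bytes).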

python replace \\ with \ in stringpath automatically

How can I replace "\" in a path string with "\\" in Python? You know \ is the escape character, and r'\' and r"\" also don't work, neither in str.replace() nor in re.sub().
If your objective is to get the correct path you can use the raw string:
r"C:\Users"
# will return
Out[2]: 'C:\\Users'
# in the console
#however if you print it, it will print this:
print(r"C:\Users")
C:\Users
If you want to combine parts of the path dynamically, I recommend the os module (standard library).
Use it like this:
import os
path = os.path.join(r"first_part_of_path", r"other_part_of_path", "filename.xlsx")
From Python's documentation: "The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation."
https://docs.python.org/3/library/re.html
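A short sketch of what the quoted passage means in practice, and of doubling backslashes with plain str.replace() (the example path is made up):

```python
raw = r"C:\Users\new"      # backslashes kept literally
cooked = "C:\\Users\\new"  # the same string, written with escaped backslashes

print(raw == cooked)
print(len(r"\n"), len("\n"))

# "\\" is a one-character string holding a single backslash,
# so this doubles every backslash in the path.
doubled = raw.replace("\\", "\\\\")
print(doubled)
```

The literal r'\' from the question is a syntax error because a raw string cannot end in an odd number of backslashes; writing "\\" avoids that.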
The following may be what you are looking for (it replaces each backslash with two):
import re
x = r'this, is a \test'
re.subn(r'\\', r'\\\\', x)
from the standard library, you could use os.path.normpath
Example:
import os
myDir = r"path\to\dir"
normalized = os.path.normpath(myDir)
This gives the following:
>>> normalized
'path\\to\\dir'
>>> print(normalized)
path\to\dir
>>> str(normalized)
'path\\to\\dir'
>>> repr(normalized)
"'path\\\\to\\\\dir'"
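I just realized that a path written without the raw prefix, e.g.
path_str="E:\neural network\Pytorch"
can be repaired with
path_str=path_str.encode('unicode-escape').decode().replace('\\\\', '\\')
which does it automatically, without having to edit the string literal by hand to
path_str=r"E:\neural network\Pytorch"
Note that this only round-trips escapes such as \n and \t; octal or hex escapes (for example \201) come back in a different spelling.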

Problem with multivariables in string formatting

I have several files in a folder named t_000.png, t_001.png, t_002.png and so on.
I have made a for-loop to import them using string formatting. But when I use the for-loop I got the error
No such file or directory: '/file/t_0.png'
This is the code that I have used. I think I should use multiple %s, but I do not understand how.
for i in range(file.shape[0]):
    im = Image.open(dir + 't_%s.png' % str(i))
    file[i] = im
You need to pad the string with leading zeroes. With the type of formatting you're currently using, this should work:
im = Image.open(dir + 't_%03d.png' % i)
where the format specifier %03d means "format this integer with a width of at least 3 characters, padding with leading zeroes".
You can also use python's other (more recent) string formatting syntax, which is somewhat more succinct:
im = Image.open(f"{dir}t_{i:03d}.png")
You are not padding the number with zeros, thus you get t_0.png instead of t_000.png.
The recommended way of doing this in Python 3 is via the str.format function:
for i in range(file.shape[0]):
    im = Image.open(dir + 't_{:03d}.png'.format(i))
    file[i] = im
You can see more examples in the documentation.
Formatted string literals are also an option if you are using Python 3.6 or a more recent version, see Green Cloak Guy's answer for that.
Try this:
import os
for i in range(file.shape[0]):
    im = Image.open(os.path.join(dir, f't_{i:03d}.png'))
    file[i] = im
(change: f't_{i:03d}.png' to 't_{:03d}.png'.format(i) or 't_%03d.png' % i for versions of Python prior to 3.6).
The trick was to specify a certain number of leading zeros, take a look at the official docs for more info.
Also, you should replace dir + file with the more robust os.path.join(dir, file), which works whether or not dir ends with a directory separator (i.e. '/' on your platform).
Note also that dir is the name of a Python built-in function (and file was a built-in in Python 2), so you may want to rename your variables to avoid shadowing them.
Also, if file is a NumPy array, note that file[i] = im may not work.
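To compare the three padding styles from the answers side by side without needing the image files, here is a small sketch that only builds the file names (image_dir is a hypothetical folder name standing in for the dir variable in the question):

```python
import os

# Hypothetical folder; the question uses a variable called dir instead.
image_dir = "file"

for i in range(3):
    percent_style = "t_%03d.png" % i          # printf-style
    format_style = "t_{:03d}.png".format(i)   # str.format
    fstring_style = f"t_{i:03d}.png"          # f-string (Python 3.6+)
    print(percent_style, format_style, fstring_style)

# os.path.join inserts the separator only where it is needed.
names = [os.path.join(image_dir, f"t_{i:03d}.png") for i in range(3)]
print(names)
```

All three styles produce the same name for a given i; the choice is mostly a matter of Python version and taste.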

How to read Greek characters in pandas?

I am dealing with a dataframe that has Greek characters. They appear like that:
The data are here:
toy.to_json()
'{"a_a":{"0":49.0,"1":50.0,"2":52.0,"3":53.0,"4":54.0},"grade":{"0":3.0,"1":5.0,"2":4.0,"3":5.0,"4":4.0},"sex":{"0":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","1":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","2":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","3":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","4":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9"},"age":{"0":122.0,"1":125.0,"2":119.0,"3":122.0,"4":127.0},"fath_job":{"0":2.0,"1":2.0,"2":2.0,"3":2.0,"4":2.0},"phscs":{"0":49.0,"1":73.0,"2":61.0,"3":75.0,"4":59.0},"pcc":{"0":10.0,"1":26.0,"2":19.0,"3":28.0,"4":23.0},"pcg":{"0":21.0,"1":28.0,"2":20.0,"3":25.0,"4":19.0},"tasc":{"0":17.0,"1":5.0,"2":17.0,"3":8.0,"4":11.0},"class":{"0":0.0,"1":0.0,"2":0.0,"3":0.0,"4":0.0},"grade3":{"0":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00ef\\u00f2","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2","2":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2","4":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2"},"pcc3":{"0":"\\u00f7\\u00e1\\u00ec\\u00e7\\u00eb\\u00de","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","2":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","4":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1"},"tasc3":{"0":3.0,"1":1.0,"2":3.0,"3":2.0,"4":2.0},"pcg3":{"0":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","2":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","4":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1"},"phscs3":{"0":"\\u00f7\\u00e1\\u00ec\\u00e7\\u00eb\\u00de","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","2":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","4":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1"}}'
I tried to import the file with encoding = 'utf_8' but it did not work.
Here are some other approaches I tried:
toy.to_csv('toy.csv', index = False)
import chardet
rawdata = open('toy.csv', 'rb').read()
result = chardet.detect(rawdata)
charenc = result['encoding']
pd.read_csv('toy.csv', encoding = charenc)
pd.read_csv('toy.csv', encoding = 'cp737')
Try something like this:
pd.read_csv('toy.csv', encoding = 'iso8859_7')
It solved the problem for me, I hope it's the same for you.
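If the data is already loaded with the wrong encoding, it may be possible to repair the strings in place rather than re-reading the file. This sketch assumes the text was decoded as Latin-1 somewhere along the way, which is what escapes such as \u00c1\u00e3 in the JSON dump suggest, and that the real encoding is ISO 8859-7; fix_greek is a hypothetical helper name, not a pandas function.

```python
def fix_greek(text, wrong="latin-1", right="iso8859_7"):
    """Undo a wrong decode: turn text back into bytes, then decode properly."""
    return text.encode(wrong).decode(right)

# First value of the "sex" column from the JSON dump in the question.
garbled = "\u00c1\u00e3\u00fc\u00f1\u00e9"
repaired = fix_greek(garbled)
print(repaired)
```

Under those assumptions the value comes out as readable Greek, and the same function could be applied to a whole column with something like toy['sex'].map(fix_greek). If the source encoding is different (for example cp1253 or mac_greek), pass it as the right argument.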
On Mac (macOS Catalina 10.15.7) this works:
gdf = gpd.read_file("./../data/processed/landmarks.shp", encoding="mac_greek")
Apparently, you have to find the right encoding for your system. A list of Python encodings can be found in the documentation. However, it is still not perfect: I still get some random characters. I do not know if that is due to the data source or because the list refers to Python 2.

Resources