how to fix the unicode problem on configparser - python-3.x

I use Python 3.7 and
configparser 3.7.4.
I have a rank.ini:
[example]
placeholder : \U0001F882
And i have a main.py file:
import configparser
config = configparser.ConfigParser()
config.read('ranks.ini')
print('🢂')
test = '\U0001F882'
print(type(test))
print(test)
test2 = config.get('example', 'placeholder')
print(type(test2))
print(test2)
The result of the code is:
🢂
<class 'str'>
🢂
<class 'str'>
\U0001F882
Why is the var test2 not "🢂" and how i can fix it.

It took me a while to figure this one out since python3 sees everything as unicode explained here
If my understanding is correct the original print is being seen like this u'\U0001F882', so it converts it into the character.
However, when you pass the variable in using the configparser as a string the unicode escape character is essentially getting lost such as '\\U0001F882'.
You can see this difference if you print test and test2's repr
print(repr(test))
print(repr(test2))
To get the output you want you will have to unicode escape the string value
print(test2.encode('utf8').decode('unicode-escape')
Hope this works for you.

Related

How to convert iso 8859-1 in simple letter using python

I m trying to clean my sqlite database using python. At first I loaded using this code:
import sqlite3, pandas as pd
con = sqlite3.connect("DATABASE.db")
import sqlite3, pandas as pd
df = pd.read_sql_query("SELECT TITLE from DOCUMENT", con)
So I got the dirty words. for example this "Conciliaci\363n" I want to get "Conciliacion". I used this code:
df['TITLE']=df['TITle'].apply(lambda x: x.decode('iso-8859-1').encode('utf8'))
I got b'' in blank cells. and got 'Conciliaci\\363n' too. So maybe I'm doing wrong. how can I solve this problem. Thanks in advance.
It's unclear, but if your string contains a literal backslash and numbers like this:
>>> s= r"Conciliaci\363n" # A raw string to make a literal escape code
>>> s
'Conciliaci\\363n' # debug display of string shows an escaped backslash
>>> print(s)
Conciliaci\363n # printing prints the escape
Then this will decode it correctly:
>>> s.encode('ascii').decode('unicode-escape') # convert to byte string, then decode
'Conciliación'
If you want to lose the accent mark as your question shows, then decomposing the Unicode string, converting to ASCII ignoring errors, then converting back to a Unicode string will do it:
>>> s2 = s.encode('ascii').decode('unicode-escape')
>>> s2
'Conciliación'
>>> import unicodedata as ud
>>> ud.normalize('NFD',s2) # Make Unicode decomposed form
'Conciliación' # The ó is now an ASCII 'o' and a combining accent
>>> ud.normalize('NFD',s2).encode('ascii',errors='ignore').decode('ascii')
'Conciliacion' # accent isn't ASCII, so is removed

Python getting a url request to work with special characters [duplicate]

If I do
url = "http://example.com?p=" + urllib.quote(query)
It doesn't encode / to %2F (breaks OAuth normalization)
It doesn't handle Unicode (it throws an exception)
Is there a better library?
Python 2
From the documentation:
urllib.quote(string[, safe])
Replace special characters in string
using the %xx escape. Letters, digits,
and the characters '_.-' are never
quoted. By default, this function is
intended for quoting the path section
of the URL.The optional safe parameter
specifies additional characters that
should not be quoted — its default
value is '/'
That means passing '' for safe will solve your first issue:
>>> urllib.quote('/test')
'/test'
>>> urllib.quote('/test', safe='')
'%2Ftest'
About the second issue, there is a bug report about it. Apparently it was fixed in Python 3. You can workaround it by encoding as UTF-8 like this:
>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller
By the way, have a look at urlencode.
Python 3
In Python 3, the function quote has been moved to urllib.parse:
>>> import urllib.parse
>>> print(urllib.parse.quote("Müller".encode('utf8')))
M%C3%BCller
>>> print(urllib.parse.unquote("M%C3%BCller"))
Müller
In Python 3, urllib.quote has been moved to urllib.parse.quote, and it does handle Unicode by default.
>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'
I think module requests is much better. It's based on urllib3.
You can try this:
>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
My answer is similar to Paolo's answer.
If you're using Django, you can use urlquote:
>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'
Note that changes to Python mean that this is now a legacy wrapper. From the Django 2.1 source code for django.utils.http:
A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)
It is better to use urlencode here. There isn't much difference for a single parameter, but, IMHO, it makes the code clearer. (It looks confusing to see a function quote_plus! - especially those coming from other languages.)
In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'
In [22]: val=34
In [23]: from urllib.parse import urlencode
In [24]: encoded = urlencode(dict(p=query,val=val))
In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34
Documentation
urlencode
quote_plus
An alternative method using furl:
import furl
url = "https://httpbin.org/get?hello,world"
print(url)
url = furl.furl(url).url
print(url)
Output:
https://httpbin.org/get?hello,world
https://httpbin.org/get?hello%2Cworld

How to use f'string bytes'string together? [duplicate]

I'm looking for a formatted byte string literal. Specifically, something equivalent to
name = "Hello"
bytes(f"Some format string {name}")
Possibly something like fb"Some format string {name}".
Does such a thing exist?
No. The idea is explicitly dismissed in the PEP:
For the same reason that we don't support bytes.format(), you may
not combine 'f' with 'b' string literals. The primary problem
is that an object's __format__() method may return Unicode data
that is not compatible with a bytes string.
Binary f-strings would first require a solution for
bytes.format(). This idea has been proposed in the past, most
recently in PEP 461. The discussions of such a feature usually
suggest either
adding a method such as __bformat__() so an object can control how it is converted to bytes, or
having bytes.format() not be as general purpose or extensible as str.format().
Both of these remain as options in the future, if such functionality
is desired.
In 3.6+ you can do:
>>> a = 123
>>> f'{a}'.encode()
b'123'
You were actually super close in your suggestion; if you add an encoding kwarg to your bytes() call, then you get the desired behavior:
>>> name = "Hello"
>>> bytes(f"Some format string {name}", encoding="utf-8")
b'Some format string Hello'
Caveat: This works in 3.8 for me, but note at the bottom of the Bytes Object headline in the docs seem to suggest that this should work with any method of string formatting in all of 3.x (using str.format() for versions <3.6 since that's when f-strings were added, but the OP specifically asks about 3.6+).
From python 3.6.2 this percent formatting for bytes works for some use cases:
print(b"Some stuff %a. Some other stuff" % my_byte_or_unicode_string)
But as AXO commented:
This is not the same. %a (or %r) will give the representation of the string, not the string iteself. For example b'%a' % b'bytes' will give b"b'bytes'", not b'bytes'.
Which may or may not matter depending on if you need to just present the formatted byte_or_unicode_string in a UI or if you potentially need to do further manipulation.
As noted here, you can format this way:
>>> name = b"Hello"
>>> b"Some format string %b World" % name
b'Some format string Hello World'
You can see more details in PEP 461
Note that in your example you could simply do something like:
>>> name = b"Hello"
>>> b"Some format string " + name
b'Some format string Hello'
This was one of the bigger changes made from python 2 to python3. They handle unicode and strings differently.
This s how you'd convert to bytes.
string = "some string format"
string.encode()
print(string)
This is how you'd decode to string.
string.decode()
I had a better appreciation for the difference between Python 2 versus 3 change to unicode through this coursera lecture by Charles Severence. You can watch the entire 17 minute video or fast forward to somewhere around 10:30 if you want to get to the differences between python 2 and 3 and how they handle characters and specifically unicode.
I understand your actual question is how you could format a string that has both strings and bytes.
inBytes = b"testing"
inString = 'Hello'
type(inString) #This will yield <class 'str'>
type(inBytes) #this will yield <class 'bytes'>
Here you could see that I have a string a variable and a bytes variable.
This is how you would combine a byte and string into one string.
formattedString=(inString + ' ' + inBytes.encode())

Converting string to dictionary from a opened file

A text file contains dictionary as below
{
"A":"AB","B":"BA"
}
Below are code of python file
with open('devices_file') as d:
print (d["A"])
Result should print AB.
As #rassar and #Ivrf suggested in comments you can use ast.literal_eval() as well as json.loads() to achieve this. Both code snippets outputs AB.
Solution with ast.literal_eval():
import ast
with open("devices_file", "r") as d:
content = d.read()
result = ast.literal_eval(content)
print(result["A"])
Solution with json.loads():
import json
with open("devices_file") as d:
content = json.load(d)
print(content["A"])
Python documentation about ast.eval_literal() and json.load().
Also: I noticed that you're not using the correct syntax in the code snippet in your question. Indented lines should be indented with 4 spaces, and between the print keyword and the associated parentheses there's no whitespace allowed.

How to use python to convert a backslash in to forward slash for naming the filepaths in windows OS?

I have a problem in converting all the back slashes into forward slashes using Python.
I tried using the os.sep function as well as the string.replace() function to accomplish my task. It wasn't 100% successful in doing that
import os
pathA = 'V:\Gowtham\2019\Python\DailyStandup.txt'
newpathA = pathA.replace(os.sep,'/')
print(newpathA)
Expected Output:
'V:/Gowtham/2019/Python/DailyStandup.txt'
Actual Output:
'V:/Gowtham\x819/Python/DailyStandup.txt'
I am not able to get why the number 2019 is converted in to x819. Could someone help me on this?
Your issue is already in pathA: if you print it out, you'll see that it already as this \x81 since \201 means a character defined by the octal number 201 which is 81 in hexadecimal (\x81). For more information, you can take a look at the definition of string literals.
The quick solution is to use raw strings (r'V:\....'). But you should take a look at the pathlib module.
Using the raw string leads to the correct answer for me.
import os
pathA = r'V:\Gowtham\2019\Python\DailyStandup.txt'
newpathA = pathA.replace(os.sep,'/')
print(newpathA)
OutPut:
V:/Gowtham/2019/Python/DailyStandup.txt
Try this, Using raw r'your-string' string format.
>>> import os
>>> pathA = r'V:\Gowtham\2019\Python\DailyStandup.txt' # raw string format
>>> newpathA = pathA.replace(os.sep,'/')
Output:
>>> print(newpathA)
V:/Gowtham/2019/Python/DailyStandup.txt

Resources