Request() with scandinavian letters - python-3.x

When I try to request the following URL, which contains the special letter ä:
req = Request(https://www.booli.se/slutpriser/frosunda/874692/?objectType=Lägenhet&page=1)
I get the following error when I run it:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in
position 45: ordinal not in range(128)
Is there a special encoding I should import in order to solve this error?

That's not valid code (the URL has to be a quoted string), and it's not a terribly clear use of the Requests class (assuming you're using requests).
Perhaps this will do what you want:
#!/usr/bin/python3
import requests
req = requests.get("https://www.booli.se/slutpriser/frosunda/874692/?objectType=Lägenhet&page=1")
print(req.text)
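If the question is actually about urllib.request.Request from the standard library (an assumption, since the question does not say), the URL has to be ASCII-only before it is sent. A rough sketch of percent-encoding the non-ASCII parts first; the helper name to_ascii_url is made up for illustration, and it does not handle non-ASCII host names:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # Keep URL delimiters and existing %XX escapes; encode the rest as UTF-8.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url = "https://www.booli.se/slutpriser/frosunda/874692/?objectType=Lägenhet&page=1"
print(to_ascii_url(url))
```

The result can then be passed to Request() or urlopen() as an ordinary string.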

Related

Seaborn Error: 'ascii' codec can't encode character '\xda' in position 4710: ordinal not in range(128)

I'm trying to create a bar graph similar to this example https://seaborn.pydata.org/examples/grouped_barplot.html
My code is as follows:
sns.set(style="whitegrid")
df_noshow = sns.load_dataset(df)
g = sns.catplot(x="Noshow", y="SMS_received", hue="Gender",
                data=df_noshow, height=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_ylabels("Text Message Received")
I'm getting this error: UnicodeEncodeError: 'ascii' codec can't encode character '\xda' in position 4710: ordinal not in range(128)
Additionally, I'm not 100% sure I wrote the following code correctly:
df_noshow = sns.load_dataset(df)
I did create df earlier with
import pandas as pd
df= pd.read_csv('noshow2016.csv')
All the previous code has been working, and I can't imagine the Unicode error has anything to do with the CSV file not loading correctly, but I wanted to include it just in case. Thank you.

'charmap' codec can't encode characters in position XX

I have a simple script that attempts to extract multiple JSON objects from a single file and store them as a list:
import json
URL = r"C:\Users\Kenneth\Youtube_comment_parser\Testing.txt"
with open(URL, 'r', encoding="utf-8") as handle:
    json_data = [json.loads(line) for line in handle]
print(json_data) # Can't .encode() because it's a list
Even after specifying utf-8 encoding, I'm still running into a codec error. If possible, I would also like to change this object into a dictionary, but this is as far as I've got.
The exact error reads:
UnicodeEncodeError: 'charmap' codec can't encode characters in position
394-395: character maps to <undefined>
Thanks in advance.
I was able to solve this issue by removing one Unicode character that was producing the "<undefined>" error, the string '\ufeff' (a byte order mark); after that the rest displayed nicely. This required me to iterate over the list of dictionaries and replace the character in each value as necessary.
import json
URL = r"C:\Users\Kenneth\Youtube_comment_parser\Testing.txt"
# Read the whole file as UTF-8 and close it again when done
with open(URL, encoding='utf-8') as json1_file:
    json1_str = json1_file.read()
json1_str = [d.strip() for d in json1_str.splitlines()]
json1_data = [json.loads(i) for i in json1_str]
# Strip the '\ufeff' character from every value in every record
json1_data = [{key: value.replace('\ufeff', '') for key, value in record.items()}
              for record in json1_data]
print(json1_data[1]['text'].encode('utf-8'))
Still not sure why I have to open with utf-8 and then encode again with my print statement, but it produced the string nicely.
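On the last point: open(..., encoding='utf-8') only controls how the file's bytes are decoded into a str. print() then has to encode that str again for the console, and the 'charmap' in the error suggests the console uses a legacy Windows code page. A small sketch for checking which characters a given codec cannot represent; the helper name unencodable_chars and the choice of cp1252 are assumptions for illustration:

```python
def unencodable_chars(text, encoding):
    """Return the characters in text that the given codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


sample = "caf\u00e9 \ufeff \u2715"
# ascii() keeps the output printable on any console.
print(ascii(unencodable_chars(sample, "cp1252")))
print(ascii(unencodable_chars(sample, "utf-8")))
```

Calling .encode('utf-8') before printing avoids the error only because print() then receives bytes and shows their repr, which is plain ASCII.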

Remove non utf-8 characters from string in python

I am attempting to read in tweets and write these tweets to a file. However, I am getting UnicodeEncodeErrors when I try to write some of these tweets to a file. Is there a way to remove these non utf-8 characters so I can write out the rest of the tweet?
For example, a problem tweet may look like this:
Camera? 🎥
This is the code I am using:
with open("Tweets.txt",'w') as f:
    for user_tws in twitter.get_user_timeline(screen_name='camera',
                                              count = 200):
        try:
            f.write(user_tws["text"] + '\n')
        except UnicodeEncodeError:
            print("skipped: " + user_tws["text"])
            mod_tw = user_tws["text"]
            mod_tw=mod_tw.encode('utf-8','replace').decode('utf-8')
            print(mod_tw)
            f.write(mod_tw)
The error is this:
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3a5' in position 56: character maps to <undefined>
You are not writing a UTF-8 encoded file. Add the encoding parameter to the open function:
with open("Tweets.txt",'w', encoding='utf8') as f:
    ...
Have fun 🎥
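A minimal, self-contained sketch of the same idea, using a temporary file instead of Tweets.txt and a hard-coded string instead of the Twitter API (both are stand-ins, and write_lines_utf8 is a made-up helper name). It also shows why the encode/decode round trip in the question cannot help: it returns the same string.

```python
import os
import tempfile


def write_lines_utf8(path, lines):
    """Write each string on its own line, encoding the file as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


tweet = "Camera? \U0001f3a5"
# Round-tripping through UTF-8 changes nothing, so it cannot "clean" the text.
assert tweet.encode("utf-8", "replace").decode("utf-8") == tweet

path = os.path.join(tempfile.mkdtemp(), "Tweets_demo.txt")
write_lines_utf8(path, [tweet])
```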

Python 3, UnicodeEncodeError with decode set to ignore

This code makes an HTTP call to a Solr index.
query_uri = prop.solr_base_uri + "?q=" + query + "&wt=json&indent=true"
with urllib.request.urlopen(query_uri) as response:
    data = response.read()
    #data is bytes
    data_str=data.decode('utf-8', 'ignore')
    print(data_str)
The print statement throws:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2715' in position 149273: character maps to <undefined>
I thought decode('utf-8', 'ignore') was supposed to ignore non-UTF-8 characters and leave them out of the result. How is it that I get a UnicodeEncodeError in the print statement? How do I handle characters that can't be encoded in Unicode? Thanks!
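The error comes from print (and any file.write()): the output stream encodes text with its own encoding, here a legacy code page (the 'charmap' codec) rather than UTF-8, and that code page has no mapping for '\u2715'. The decode('utf-8', 'ignore') call only affects how the bytes you read are decoded, not how the text is encoded when printed.
The recommended approach is to set PYTHONIOENCODING=UTF-8 in your environment, or to encode each string before printing (this prints the bytes representation, b'...'):
print(data_str.encode("utf-8"))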
For file writing, set the encoding for the file when you open it:
file = open("/temp/test.txt", "w", encoding="UTF-8")
file.write('\u2715')
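If changing the environment is not an option, another approach is to escape whatever the output stream cannot represent instead of letting print() raise. A sketch under that assumption; print_safely is a hypothetical helper, not part of the standard library:

```python
import sys


def print_safely(text, stream=None):
    """Print text, backslash-escaping characters the stream cannot encode."""
    if stream is None:
        stream = sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(safe, file=stream)


print_safely("cancel \u2715")
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") once at startup is another option, provided the terminal can actually display UTF-8.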

urlopen() throwing error in python 3.3

from urllib.request import urlopen
def ShowResponse(param):
    uri = str("mysite.com/?param="+param+"&submit=submit")
    print(urlopen(uri).read())
file = open("myfile.txt","r")
if file.mode == "r":
    filelines = file.readlines()
    for line in filelines:
        line = line.strip()
        ShowResponse(line)
This is my Python code, but when I run it I get the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 47-49: ordinal not in range(128)".
I don't know how to fix this. I'm new to Python.
I'm going to assume that the stack trace shows that line 4 (uri = str(...)) is throwing the given error and that myfile.txt contains non-ASCII characters encoded as UTF-8.
The error occurs because you're trying to convert a Unicode object (decoded from assumed UTF-8) to an ASCII string object. ASCII simply cannot represent your character.
URIs (including the Query String) must encode non-ASCII chars as percent-encoded UTF-8 bytes. Example:
€ (EURO SIGN) encoded in UTF-8 is:
0xE2 0x82 0xAC
Percent-encoded, it's:
%E2%82%AC
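The two steps can be checked directly in Python 3 with urllib.parse.quote, which encodes str input as UTF-8 by default:

```python
from urllib.parse import quote, unquote

euro = "\u20ac"  # EURO SIGN
utf8_bytes = euro.encode("utf-8")
print(utf8_bytes)      # the three UTF-8 bytes

encoded = quote(euro)  # str input is encoded as UTF-8, then percent-encoded
print(encoded)

# Passing the UTF-8 bytes gives the same result, and unquote() reverses it.
assert quote(utf8_bytes) == encoded
assert unquote(encoded) == euro
```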
Therefore, your code needs to re-encode your parameter to UTF-8 then percent-encode it:
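from urllib.request import urlopen
from urllib.parse import quote
def ShowResponse(param):
    # Encode to UTF-8 bytes, then percent-encode those bytes
    param_utf8 = param.encode("utf-8")
    param_perc_encoded = quote(param_utf8)
    # urlopen() needs a full URL, so a scheme is included (http:// is assumed here)
    # or uri = str("http://mysite.com/?param="+param_perc_encoded+"&submit=submit")
    uri = str("http://mysite.com/?param={0}&submit=submit".format(param_perc_encoded) )
    print(urlopen(uri).read())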
You'll also see I've changed your uri = definition slightly to use str.format() (https://docs.python.org/2/library/string.html#format-string-syntax), which I find easier for building complex strings than concatenation with +. In this example, {0} is replaced with the first argument to .format().
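For query strings with several parameters, urllib.parse.urlencode can do the UTF-8 and percent-encoding steps for every value at once. A short sketch; build_uri is a made-up helper name and the http:// scheme is an assumption, since the original URL does not show one:

```python
from urllib.parse import urlencode


def build_uri(param):
    """Build the request URI with all query values percent-encoded as UTF-8."""
    query = urlencode({"param": param, "submit": "submit"})
    return "http://mysite.com/?" + query


print(build_uri("ä b"))
```

Note that urlencode encodes spaces as + rather than %20, which is the usual form for query strings.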
