Python requests: UnicodeEncodeError: 'charmap' codec can't encode character - python-3.x

I scraped a webpage (name changed in code here) as follows:
import requests
r = requests.get('https://www.samplewebpage.com')
Then I tried to write r.text to a file as follows:
f = open ('filename', 'w')
f.write(r.text)
f.close()
I get an error as:
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 158691: character maps to <undefined>
r.encoding reports UTF-8. How can I resolve this?
I have also tried the following:
- A few other random webpages; the code runs without any error for most of them.
- Using r.content.decode('utf-8', 'ignore') instead of r.text, but I get the same error as above.
My environment/system specifications:
Python 3.6.4
Windows 8.1 Pro, 64 bit
Default IDLE as installed from https://www.python.org.
Tried with a script in Atom as well, but same error.
Suspecting a console encoding mismatch, as I read in a similar question on this forum, I confirmed that the Atom console is set to UTF-8. However, I believe console encoding is not the problem here, since I want to write to a file.
Thanks

Try explicitly specifying the file's encoding:
f = open ('filename', 'w', encoding='utf8')
f.write(r.text)
f.close()
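Why this helps: when no encoding is given, open() falls back to the locale's preferred encoding, which on Windows is usually a legacy code page rather than UTF-8. The sketch below assumes that code page is cp1252 (the 'charmap' in the error message suggests a code page of this kind, but the exact one on the asker's machine is a guess). The file name page.html and the sample text are made up for illustration.

```python
import os
import tempfile

text = "Price: \u20b9 500"  # sample text containing the rupee sign from the error

# A legacy Windows code page such as cp1252 (assumed here) has no mapping for U+20B9.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# With an explicit UTF-8 encoding the write succeeds regardless of the platform default.
path = os.path.join(tempfile.mkdtemp(), "page.html")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp1252_ok, round_trip == text)
```

Using a with block also closes the file even if the write raises.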

Related

Reading file using python is not working properly in Linux

I'm running Python code that reads a fixed-width file extracted from an FTP server. The code works on Windows without any issue, but when I run the same code on a Linux EC2 instance it fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 1819: invalid start byte". Since I don't know the encoding of the source file, I pass the encoding as None. This also works fine on Windows, but on Linux it gives an error saying that "encoding type None is not recognize".
I am using the codecs library to read the file, and my Python version is 3.7.3.
with codecs.open("recode.dat",encoding=None,errors='replace') as open_src:
    with open("target_file.dat", 'w+',encoding=None) as open_tgt:
        for src_rec in open_src:
            new_rec = ''
            for f_length in data_type_length:
                f_length = int(f_length)
                field = '"' + src_rec[:f_length].strip() + '"|'
                new_rec += (field)
                src_rec = src_rec[f_length:]
            open_tgt.write(new_rec[:-1] + '\n')
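One possible approach is to name the encoding explicitly so that the code behaves the same on every OS. The sketch below assumes the file was produced on Windows in cp1252, where byte 0x99 is the trademark sign; the real encoding is unknown and should be confirmed with whoever produces the file. The helper split_fixed_width, the sample record, and the field widths are hypothetical.

```python
import os
import tempfile

def split_fixed_width(record, widths):
    """Cut one fixed-width record into stripped fields."""
    fields = []
    for width in widths:
        fields.append(record[:width].strip())
        record = record[width:]
    return fields

# Build a small sample file containing byte 0x99, which is not a valid UTF-8 start byte.
src_path = os.path.join(tempfile.mkdtemp(), "recode.dat")
with open(src_path, "wb") as f:
    f.write(b"ACME\x99     0042\n")

# Decoding as UTF-8 fails in the same way as reported on the Linux instance.
try:
    with open(src_path, encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Naming the encoding (cp1252 is an assumption) removes the dependence on the OS default.
widths = [10, 4]  # hypothetical field widths
with open(src_path, encoding="cp1252", errors="replace") as f:
    rows = [split_fixed_width(line.rstrip("\n"), widths) for line in f]

print(utf8_ok, rows)
```

If the true encoding cannot be determined, errors='replace' at least keeps the read from failing, at the cost of substituting undecodable bytes.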

Getting UnicodeDecodeError: 'utf-8' when robot framework testcases are run in command prompt

Whenever I try to run test cases in Robot Framework through cmd, I get the error below:
Parsing <filename with path> failed UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 31: invalid start byte
The above error is thrown for some files, and the error below for others:
Parsing <filename with path> failed UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 47: invalid start byte
And then my test fails saying there is no such tag in the suite I am referring to, but I have the tag in my file.
Initially I thought it was caused by a setting in the editor (STS) I am using, so I changed Window -> Preferences -> General -> Workspace -> Text file encoding to 'Other' and selected UTF-8, rebuilt the workspace, and restarted STS, but still no luck.
I have been searching for a solution for weeks. Can someone help me with this?
What resolved the issue was checking every test case by removing it and re-adding it in a new file, and removing all the special characters used in keyword definitions and test case definitions in the robot files.
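Rather than checking each test case by hand, the offending files and byte positions can be located with a short script. This is only a sketch: the function name find_non_utf8, the file suffixes, and the demo files are assumptions made for illustration. Byte 0x92 is a curly apostrophe in Windows-1252, which suggests (but does not prove) that the text was pasted from a word processor or saved in that encoding.

```python
import os
import tempfile

def find_non_utf8(root, suffixes=(".robot", ".txt")):
    """Return (path, byte_offset, byte_value) for each file that is not valid UTF-8."""
    problems = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(folder, name)
            with open(path, "rb") as f:
                raw = f.read()
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first undecodable byte
                problems.append((path, exc.start, raw[exc.start]))
    return problems

# Demo with two made-up suite files: one valid, one containing byte 0x92.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "good.robot"), "wb") as f:
    f.write("*** Test Cases ***\n".encode("utf-8"))
with open(os.path.join(demo_dir, "bad.robot"), "wb") as f:
    f.write(b"Log    It\x92s broken\n")

problems = find_non_utf8(demo_dir)
for path, offset, value in problems:
    print(path, offset, hex(value))
```

Each reported file can then be re-saved as UTF-8 in the editor, or the flagged character replaced with a plain ASCII one.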

open large gzip file (~1gb) in python

I am a beginner trying to learn Python. I have written a few lines of code to open a large gzip file (~1 GB) and extract some content, but I am getting a memory-related error. Could somebody please explain how to open the gzip file with limited memory? Below is the part of the code that throws the error.
import os
import gzip
with gzip.open("test.gz","rb") as peak:
    for line in peak:
        file_content = line.read().decode("utf-8")
        print(file_content)
Error: File "/software/anaconda3/lib/python3.7/gzip.py", line 276, in read
return self._buffer.read(size)
I tried to recreate your issue but was unable to. Using fallocate I created a big file, then gzipped it, but hit no error in Python:
$ fallocate -l 2G tempfile.img
$ gzip tempfile.img
$ ipython
>>> import gzip
>>> with gzip.open('tempfile.img.gz', 'rb') as fIn:
...     content = fIn.read()
If you hit an exception, it should have some name like OSError or something more specific. My guess is that you have a 32-bit installation of Python which would impose memory limits in the gigabyte range. This SO thread covers a way to check if you're running 32- or 64-bit.
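A common way to check whether the interpreter is a 32-bit or 64-bit build, using only the standard library, is sketched here:

```python
import struct
import sys

# A 64-bit build uses 8-byte pointers, and sys.maxsize exceeds 2**32.
pointer_bits = struct.calcsize("P") * 8
is_64bit = sys.maxsize > 2**32

print(pointer_bits, is_64bit)
```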
If you post the name of the exception or a reproducible example, then I can update this answer.
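Independently of the cause, memory use can be kept low by streaming the file instead of reading it in one call. The sketch below assumes the goal is to process the content line by line and that the data is UTF-8 text; the small sample file merely stands in for the large one.

```python
import gzip
import os
import tempfile

# Create a small gzip file to stand in for the large one.
path = os.path.join(tempfile.mkdtemp(), "test.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for i in range(5):
        f.write("peak {}\n".format(i))

# Text mode ("rt") decodes while iterating, so each line is already a str
# and the whole file is never loaded into memory at once.
line_count = 0
with gzip.open(path, "rt", encoding="utf-8") as peak:
    for line in peak:
        line_count += 1
        print(line.rstrip("\n"))
```

Note that when iterating over a file opened in "rb" mode, each line is a bytes object, so it would be decoded directly with line.decode("utf-8") rather than read again.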

Python Script got ERROR when switch from Windows to Linux

I wrote a Python script on Windows and it worked well. Now I have installed elementary OS, an Ubuntu-based distro, but when I start the script it crashes, and I don't know how to fix it.
Here is the part of the script that causes the problem:
memos=open(str(os.getcwd())+'\\LOG\\tres.txt','w')
menf='3)PRESION LATERAL DEL SUELO DE RELLENO\n Ka = '+str(round(Ka,2))+'\nP = '+str(round(P,2))+'\nY = '+str(round(Y,2))+'''
\nHm = '''+str(round(Hm,2))+'\nPm = '+str(round(Pm,2))+'\nYm = '+str(round(Ym,2))
memos.write(menf)
memos.close()
So the problem seems to be this line:
memos=open(str(os.getcwd())+'\\LOG\\tres.txt','w')
because it gives me this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xba' in position 199: ordinal not in range(128)
Now, when I change it to:
memos=open(str(os.getcwd())+'/LOG/tres.txt','w')
I get a different error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jojaror/Documentos/Scripts de Python/LOG/tres.txt'
I tried to solve it on my own but couldn't, so any help would be appreciated.
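The two errors probably have separate causes. On Linux a backslash is an ordinary file name character, so the first version opens a file literally named \LOG\tres.txt in the current directory and then fails while encoding the text; the second version fails earlier, most likely because the LOG directory does not exist on the new system. A minimal sketch that handles both, assuming the directory is missing and the default encoding on that machine is ASCII (the temporary base directory and the sample values are made up):

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()  # stands in for os.getcwd() in the original script

# os.path.join picks the right separator for the current OS.
log_dir = os.path.join(base_dir, "LOG")
os.makedirs(log_dir, exist_ok=True)  # create the directory if it is missing
log_path = os.path.join(log_dir, "tres.txt")

Ka = 0.3333  # made-up value
menf = "3)PRESION LATERAL DEL SUELO DE RELLENO\n Ka = " + str(round(Ka, 2)) + "\n Angulo = 30\xba"

# An explicit encoding avoids depending on the system locale.
with open(log_path, "w", encoding="utf-8") as memos:
    memos.write(menf)

with open(log_path, encoding="utf-8") as memos:
    written = memos.read()

print(written == menf)
```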

Encoding error Python3.5

So I'm encountering a strange encoding error in Python 3.5. I'm reading a string containing HTML data, and I'm handling the string like this:
def parseHtml(self,url):
    r = requests.get(self.makeUrl())
    data = r.text.encode('utf-8').decode('ascii', 'ignore')
    self.soup = BeautifulSoup(data,'lxml')
The error happens when I'm trying to print the following:
def extractTable(self):
    table = self.soup.findAll("table", { "class" : "messageTable" })
    print(table)
I have checked my locale and tried several variations of encode/decode, as suggested in similar posts on SO. The strangest thing (for me) is that the script works flawlessly on a different machine and on my laptop. But on my Windows machine (using Cygwin to connect to a remote server) and on my Ubuntu install it simply won't run and gives me:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 1273: ordinal not in range(128)
Okay, so I moved the file from the remote server to my local-machine and it executed perfectly. I then checked my sys.stdout.encoding :
>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
Clearly something was wrong, so I ended up exporting :
export PYTHONIOENCODING=utf-8
And voilà!
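'ANSI_X3.4-1968' is another name for ASCII, so print() was trying to encode the non-breaking space '\xa0' as ASCII. The effect of the stream encoding can be reproduced without touching the real stdout. This is an illustrative sketch; write_with and the sample text are hypothetical names and data.

```python
import io

text = "cell\xa0value"  # contains a non-breaking space, as in the error

def write_with(encoding):
    """Write text through a stream using the given encoding, much as print() does."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

# An ASCII stream cannot represent '\xa0'.
try:
    write_with("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A UTF-8 stream encodes it as two bytes.
utf8_bytes = write_with("utf-8")

print(ascii_ok, utf8_bytes)
```

Setting PYTHONIOENCODING=utf-8 makes sys.stdout behave like the UTF-8 stream in this sketch.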
