How can I download an image from my local directory? - python-3.x

I'm crawling files on my own PC.
I want to extract an image from an HTML document stored locally.
I tried this:
n=0
for i in soup.find_all('div', class_='c_img'):
    with open('FILE DIRECTORY', 'r', encoding='utf-8') as f:
        r=f.read()
    with open(str(n)+'.jpg', 'wb', encoding='utf-8') as f:
        f.write(r)
    n+=1
And I got:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 5: invalid continuation byte
So I tried encoding='utf-16', but that raised:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 44-45: illegal encoding
How can I make this work? Thanks.

I believe the problem is that you are treating a .jpg as UTF-8 text. A JPEG is binary data, so decoding it as text fails.
You've posted only a small portion of your code, so I'm not sure what the rest does, but you should open the output .jpg with 'wb' and no encoding argument (binary mode does not accept one).
If your "FILE DIRECTORY" file is the .jpg itself, open it with 'rb', again with no encoding.
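A minimal sketch of that advice, assuming the image already exists on disk. The helper name `copy_image`, the temporary paths, and the fake JPEG bytes are made up for illustration and are not from the question. Both files are opened in binary mode with no encoding, so the bytes are copied unchanged.

```python
import os
import tempfile


def copy_image(src_path, dst_path):
    """Copy a file byte for byte and return the number of bytes written."""
    # 'rb' reads raw bytes; no text decoding happens, so no UnicodeDecodeError.
    with open(src_path, 'rb') as src:
        data = src.read()
    # 'wb' writes raw bytes; passing encoding= here would raise ValueError.
    with open(dst_path, 'wb') as dst:
        return dst.write(data)


# Demo with a stand-in file (hypothetical content, not a real image).
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, 'source.jpg')
dst = os.path.join(demo_dir, '0.jpg')
fake_jpeg = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\xea'
with open(src, 'wb') as f:
    f.write(fake_jpeg)

written = copy_image(src, dst)
```

Inside the original loop, the same helper could be called once per image path, with `str(n)+'.jpg'` as the destination.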

Related

AWS reading utf-8 file pycaption.detect_format returns None

Python version: 3.5-slim-buster
Module: pycaption
When reading a us-ascii encoded .srt caption file from an S3 bucket:
body = obj.get()['Body'].read()
print(pycaption.detect_format(body.decode()))
I get the desired response:
<class 'pycaption.srt.SRTReader'>
But when reading a utf-8 encoded .srt file from S3, pycaption can't detect the format and returns:
None
I have tried:
body = obj.get()['Body'].read().decode('utf-8')
print(pycaption.detect_format(body))
But with no luck.
In the end, the issue was the newlines rather than the encoding: it was resolved once I converted the file to DOS newlines (CR/LF).
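A rough sketch of normalizing newlines before format detection. The helper name `prepare_caption_text` and the sample bytes are hypothetical, and decoding with `utf-8-sig` (which drops a leading byte order mark if one is present) is an extra precaution of mine, not something the post mentions. The target newline defaults to CR/LF, as described above.

```python
def prepare_caption_text(raw_bytes, newline='\r\n'):
    """Decode caption bytes and rewrite every line ending as `newline`."""
    # 'utf-8-sig' behaves like 'utf-8' but strips a leading BOM if present.
    text = raw_bytes.decode('utf-8-sig')
    # Collapse CR/LF and bare CR to LF first, then expand to the target style.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    return text.replace('\n', newline)


# Made-up sample: UTF-8 with BOM, Unix newlines, non-ASCII caption text.
sample = b'\xef\xbb\xbf1\n00:00:01,000 --> 00:00:02,000\nCze\xc5\x9b\xc4\x87\n'
prepared = prepare_caption_text(sample)
# `prepared` could then be handed to pycaption.detect_format(prepared).
```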

Reading from a file with printer commands and printing in Python3

I'm reading a file and I'm trying to send it to a PCL printer HP LJ2035 with
os.startfile("C:/tmp/tmp.txt", "print")
The file I read is encoded in IBM-852 (from DOS) and contains printer commands such as ^[E^[(17U^[(s10.2H or ^[(s3B.
My main issue is that os.startfile prints from Windows on the default printer, and it prints the printer commands literally instead of interpreting them. The file prints fine when I send it from a program written in Clarion (in DOS) directly to the LPT port, bypassing Windows printing.
On Linux, running the file command on tmp.txt (encoded in IBM-852) returns:
tmp.txt: HP PCL printer data
How should I read and print that file in Python 3 so that my printer interprets the printer commands? Do I need to open the file in binary mode? If I use with open(my_file, 'rb') as f: I get an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 62: invalid start byte
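One observation that may help narrow this down: a file opened with 'rb' is never decoded, so a UnicodeDecodeError normally comes from a text-mode open (or an explicit .decode() call) somewhere else in the script. The sketch below uses a hypothetical helper `read_printer_data` and made-up sample bytes. It reads the PCL data as raw bytes and decodes with `cp852` only to preview it on screen. How the bytes are then delivered to the printer without Windows reinterpreting them is outside this sketch.

```python
import os
import tempfile


def read_printer_data(path):
    """Return the file content as raw bytes, escape sequences included."""
    with open(path, 'rb') as f:
        return f.read()


# Made-up stand-in for tmp.txt: ESC sequences plus one IBM-852 byte (0x86).
demo_path = os.path.join(tempfile.mkdtemp(), 'tmp.txt')
sample = b'\x1bE\x1b(17U\x1b(s10.2H Ra\x86unek \x1b(s3B'
with open(demo_path, 'wb') as f:
    f.write(sample)

data = read_printer_data(demo_path)   # bytes, exactly as stored on disk
preview = data.decode('cp852')        # text for inspection only
```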

python error while reading large files from a folder to copy to another file

I'm trying to read the files in a folder and copy a specific part of each file to a new file using the Python code below, but I'm getting the error shown further down.
import glob
file=glob.glob("C:/Users/prasanth/Desktop/project/prgms/rank_free1/*.txt")
fp=[]
for b in file:
    fp.append(open(b,'r'))
s1=''
for f in fp:
    d=f.read().split('\t')
    rank=d[0]
    appname=d[1]
    appid=d[2]
    s1=appid+'\n'
    file=open('C:/Users/prasanth/Desktop/project/prgms/appids_file.txt','a',encoding="utf-8")
    file.write(s1)
    file.close()
I'm getting the following error message:
Traceback (most recent call last):
  File "appids.py", line 8, in <module>
    d=f.read().split('\t')
  File "C:\Users\prasanth\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 12307: character maps to <undefined>
From what I can see, one of the files you are opening contains bytes that are not valid in the encoding Python uses to read it. Because open(b,'r') gives no encoding, Python falls back to the platform default, which the traceback shows is cp1252 here, and byte 0x8f has no character assigned in cp1252.
If you know the files' real encoding, pass it to open() via the encoding argument. Otherwise, open the file for reading in binary mode and deal with the problem bytes in your script.
You can put d=f.read().split('\t') in a try: except: construct and reopen the file in binary mode in the except: branch, then handle the undecodable bytes there.
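The try/except idea can be sketched roughly as follows. The helper name `read_text_lenient`, the choice of utf-8 as the first encoding to try, and the sample record are all assumptions for illustration. On failure, the file is reopened in binary mode and decoded with errors='replace', so undecodable bytes become U+FFFD instead of raising.

```python
import os
import tempfile


def read_text_lenient(path, encoding='utf-8'):
    """Read text strictly first; fall back to a lossy binary-mode decode."""
    try:
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        # Reopen in binary mode and replace bytes that cannot be decoded.
        with open(path, 'rb') as f:
            raw = f.read()
        return raw.decode(encoding, errors='replace')


# Made-up tab-separated record containing the problem byte 0x8f.
demo_path = os.path.join(tempfile.mkdtemp(), 'rank.txt')
with open(demo_path, 'wb') as f:
    f.write(b'1\tSome App\x8f\tcom.example.app\n')

fields = read_text_lenient(demo_path).split('\t')
app_id = fields[2].strip()
```

The fallback loses the original character, so it suits cases like this one where the field of interest (the app id) is plain ASCII.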

Python reading from non ascii file

I have a text file which contains the following character:
ÿ
I've tried opening the file with both:
with open(file, "r") as myfile:
and
with codecs.open(file, encoding='utf-8') as myfile:
Both succeed. However, when I try to read the file in as a string using:
file_string=myfile.read()
or
file_string=myfile.readline()
I keep getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 11889: invalid start byte
Ideally I want it to ignore the character or substitute it with '' or whitespace.
I've come up with a workaround: just use python2 instead of python3. I still can't seem to get it to work in python3, though.
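For what it's worth, Python 3's built-in open() takes an errors argument that covers both wishes: errors='ignore' drops undecodable bytes and errors='replace' substitutes U+FFFD. The sketch below uses a made-up sample file. The third variant assumes the file is really Latin-1 (where byte 0xff is ÿ), which is a guess and not confirmed by the question.

```python
import os
import tempfile

# Made-up sample containing the problem byte 0xff.
demo_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(demo_path, 'wb') as f:
    f.write(b'caf\xff ok\n')

# Drop bytes that are not valid UTF-8.
with open(demo_path, 'r', encoding='utf-8', errors='ignore') as myfile:
    ignored = myfile.read()

# Replace invalid bytes with U+FFFD (the replacement character).
with open(demo_path, 'r', encoding='utf-8', errors='replace') as myfile:
    replaced = myfile.read()

# Assumption: if the file is actually Latin-1, every byte decodes cleanly.
with open(demo_path, 'r', encoding='latin-1') as myfile:
    as_latin1 = myfile.read()
```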

Python opening a txt file converted from pdf

I downloaded the PDF file from http://icdept.cgaux.org/pdf_files/English-Italian-Glossary-Nautical-Terms.pdf and converted it to a txt file using pdf2txt (downloaded from iTunes). I am trying to convert the contents of the file into a searchable Python dictionary (I am studying for an Italian sailing licence).
To test whether I can get the text into a format that I can parse, I am simply using:
with open('English-Italian-Glossary-Nautical-Terms1.txt', 'r') as out_file:
    with open("nautical_glossary.txt", 'w') as in_file:
        for line in out_file:
            in_file.write(line)
but constantly get an error:
Traceback (most recent call last):
  File "/Users/admin/Desktop/untitled folder/nautical.py", line 4, in <module>
    for line in out_file:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)
I would appreciate some help understanding the error and a suggestion to resolve the problem.
I am not sure whether someone can suggest an obvious way to parse this particular file into a dictionary format?
This error tells you that the file's encoding is not the one Python expected (Wikipedia has background on character encodings). In other words, the ascii codec doesn't know what byte 0xfe means.
You should find the correct encoding of the file and open it with that. A 0xfe byte at position 0 is typical of a UTF-16 byte order mark, so I suspect utf-16, but I could be wrong. Did you try opening the file in an editor to see what it looks like?
Try this:
with open('English-Italian-Glossary-Nautical-Terms1.txt', 'r', encoding='utf-16') as in_file:
    with open("nautical_glossary.txt", 'w', encoding='utf-8') as out_file:
        for line in in_file:
            out_file.write(line)
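A byte such as 0xfe at position 0, as in the traceback above, often belongs to a byte order mark (BOM). The sketch below guesses an encoding from the first bytes of a file using the BOM constants in the standard codecs module. The helper name `guess_encoding_from_bom` and the sample text are hypothetical, and a file without a BOM simply falls through to the default, so this is a heuristic and not a full encoding detector.

```python
import codecs


def guess_encoding_from_bom(raw_bytes, default='utf-8'):
    """Return a codec name based on a leading BOM, or `default` if none."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, 'utf-32'),
        (codecs.BOM_UTF32_BE, 'utf-32'),
        (codecs.BOM_UTF8, 'utf-8-sig'),
        (codecs.BOM_UTF16_LE, 'utf-16'),
        (codecs.BOM_UTF16_BE, 'utf-16'),
    ]
    for bom, name in boms:
        if raw_bytes.startswith(bom):
            return name
    return default


# Made-up sample: big-endian UTF-16 text, which starts with bytes FE FF.
sample = codecs.BOM_UTF16_BE + 'ancora = anchor\n'.encode('utf-16-be')
encoding = guess_encoding_from_bom(sample)
text = sample.decode(encoding)
```

In practice the first few bytes could be read with open(path, 'rb') and the returned name passed as the encoding argument when reopening the file in text mode.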

Resources