Google Maps API using urllib.request cannot save jpg file - python-3.x

I am using the Google Maps Static API and would like to save the map image as a JPG file.
Saving a PNG with urllib.request.urlretrieve(url, 'map_46_6.png') works fine. However, urllib.request.urlretrieve(url, 'map_46_6.jpg') does not: opening the file gives the error « Not a JPG file: starts with 0x89 0x50 ». Manually changing the extension to .png resolves it.
Here is the code:
import urllib.request
url = 'http://maps.googleapis.com/maps/api/staticmap?scale=2&center=46.257632,6.108669&zoom=12&size=400x400&maptype=satellite&key=xxxxx'
urllib.request.urlretrieve(url, 'map_46_6.jpg')
As this code is part of a previously built pipeline, I need JPG files for the next steps.
Is there a setting in urllib, Google Maps, or anything else that could cause this error? Thank you very much in advance!

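I have found a solution. To get a JPG, the format must be requested explicitly by adding &format=jpg to the URL. Without it the API returns a PNG (hence the 0x89 0x50 signature), whatever extension the local filename has: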
import urllib.request
url = 'https://maps.googleapis.com/maps/api/staticmap?scale=2&center=46.257632,6.108669&zoom=16&size=400x400&maptype=satellite&format=jpg&key=xxxx'
urllib.request.urlretrieve(url, 'map_46_6.jpg')
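
The same request can also be assembled with urllib.parse.urlencode, so that the format parameter is always present. This is only a minimal sketch: build_static_map_url and its argument names are hypothetical, the key is a placeholder, and the list of accepted format values is my reading of the Static Maps documentation rather than something stated in this thread.

```python
from urllib.parse import urlencode

# Format values as I understand them from the Static Maps docs (assumption).
KNOWN_FORMATS = {"png", "png8", "png32", "gif", "jpg", "jpg-baseline"}
BASE_URL = "https://maps.googleapis.com/maps/api/staticmap"


def build_static_map_url(center, zoom, key, image_format="jpg",
                         size="400x400", maptype="satellite", scale=2):
    """Hypothetical helper: return a Static Maps URL with an explicit format."""
    if image_format not in KNOWN_FORMATS:
        raise ValueError(f"unexpected image format: {image_format!r}")
    params = {
        "scale": scale,
        "center": center,
        "zoom": zoom,
        "size": size,
        "maptype": maptype,
        "format": image_format,  # the parameter that was missing in the question
        "key": key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_static_map_url("46.257632,6.108669", 16, key="xxxx")
# urllib.request.urlretrieve(url, "map_46_6.jpg")  # needs a real API key
```

Rejecting unknown format values early keeps a typo such as "jpeg" from silently producing a file whose content does not match its extension.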

Related

How to upload downloaded telegram media directly on google drive?

I'm using Telethon's download_media method to download images and videos, and it works as expected. Now I want to upload the downloaded media directly to my Google Drive folder.
Sample code looks something like:
from telethon import TelegramClient, events, sync
from telethon.tl.types import PeerUser, PeerChat, PeerChannel
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}]})
api_id = #####
api_hash = ##########
c = client.get_entity(PeerChannel(1234567)) # some random channel id
for m in client.iter_messages(c):
    if m.photo:
        # below is the one way and it works
        # m.download_media("Media/")
        # I want to try something like this - below code
        gfile.SetContentFile(m.media)
        gfile.Upload()
This code is not working. How can I pass the output of download_media to the Google Drive file object?
Thanks in advance. Kindly assist!
The main problem is that, according to PyDrive's documentation, SetContentFile() expects a string with the file's local path and simply calls open() on it, so it is meant for local files. Your code passes it the media object instead, so it won't work.
To upload a bytes file with PyDrive you'll need to convert it to BytesIO and send it as the content. An example with a local file would look like this:
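import io

drive = GoogleDrive(gauth)
file = drive.CreateFile({'mimeType':'image/jpeg', 'title':'example.jpg'})
# read the local file as raw bytes
with open('example.jpg', 'rb') as f:
    filebytes = f.read()
# wrap the bytes in a file-like object and use it as the upload content
file.content = io.BytesIO(filebytes)
file.Upload()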
Normally you don't need to do it this way because SetContentFile() does the opening and conversion for you, but it shows that once you have the media as bytes, you can wrap it in BytesIO, assign it to file.content, and upload it.
Now, if you look at the Telethon documentation, you will see that download_media() takes a file argument which you can set to bytes:
file (str | file, optional):
The output file path, directory, or stream-like object. If the path exists and is a file, it will be overwritten. If file is the type bytes, it will be downloaded in-memory as a bytestring (e.g. file=bytes).
So you should be able to call m.download_media(file=bytes) to get a bytes object. Looking deeper at the Telethon source code, it appears that the download is written to an in-memory BytesIO buffer whose contents are then returned as bytes, so the result still has to be wrapped in BytesIO for PyDrive. With this in mind, you can try the following change in your loop:
for m in client.iter_messages(c):
    if m.photo:
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()
Note that I only tested the PyDrive side since I currently don't have access to the Telegram API, but looking at the docs I believe this should work. Let me know what happens.
Sources:
PyDrive docs and source
Telethon docs and source
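
As a rough sketch of how the pieces could fit together, the helper below creates a new Drive file for each payload, so repeated uploads do not reuse one file object. upload_bytes and its parameters are hypothetical names. It relies only on the PyDrive calls already used in this answer (CreateFile, the content attribute, Upload), and like the answer itself it has not been run against the Telegram API.

```python
import io


def upload_bytes(drive, data, title, folder_id, mime_type="image/jpeg"):
    """Hypothetical helper: upload an in-memory bytes payload through PyDrive.

    `drive` is expected to be a pydrive.drive.GoogleDrive instance.
    """
    gfile = drive.CreateFile({
        "title": title,
        "mimeType": mime_type,
        "parents": [{"id": folder_id}],
    })
    # Hand PyDrive a file-like object instead of a local path.
    gfile.content = io.BytesIO(data)
    gfile.Upload()
    return gfile


# Possible use inside the Telethon loop (untested assumption):
# for m in client.iter_messages(c):
#     if m.photo:
#         upload_bytes(drive, m.download_media(file=bytes),
#                      f"{m.id}.jpg", "drive_folder_id")
```

Taking the drive object as an argument keeps the helper independent of how authentication was set up.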

Downloading an image from the web and saving

I am trying to download an image from Wikipedia and save it to a file locally (using Python 3.9.x). Following this link I tried:
import urllib.request
http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
However, when I try to open this file (Mac OS) I get an error: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.
I did some more search and came across this article which suggests modifying the User-Agent. Following that I modified the above code as follows:
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
However, modifying the User-Agent did NOT help and I still get the same error while trying to open the file: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.
Another piece of information: the downloaded file (that does not open) is 235 KB. But if I download the image manually (Right Click -> Save Image As...) it is 455 KB.
I was wondering what else am I missing? Thank you!
The problem is that you are downloading a web page and saving it with a .jpg extension.
The link you used is not a direct image link, but a web page that contains the photograph.
That's why the photo is 455 KB while the file you downloaded is 235 KB.
Instead of this:
http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
Use this:
http = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')
To get the direct image URL, open the photo with your browser's "open image in new tab" option first and then copy the URL from there.
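
A quick way to catch this kind of mistake in code is to check the Content-Type header before saving anything. The following is a minimal sketch, not a definitive implementation: fetch_image is a hypothetical helper name, and the check assumes the server labels its responses correctly.

```python
import urllib.request


def fetch_image(url, user_agent="Mozilla/5.0"):
    """Return the response body, but only if the server says it is an image."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get_content_type()
        if not content_type.startswith("image/"):
            # A wiki page URL would typically end up here with text/html.
            raise ValueError(f"expected an image, got {content_type} from {url}")
        return response.read()


# Example (requires network access):
# data = fetch_image("https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg")
# with open("test.jpg", "wb") as f:
#     f.write(data)
```

With the page URL from the question, this would raise an error instead of silently writing HTML into test.jpg.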

Using PDF file in Keras OCR or PyTesseract - Python, is it possible?

I am using Keras OCR and PyTesseract and was wondering if it is possible to use PDF files as the image input.
If not, does anyone have a suggestion as to how to convert a very massive PDF file into PNG or another acceptable format?
Thank you!
No, as far as I know PyTesseract works only with images. You'll need to convert your PDF to images first.
By "very massive PDF" I assume you mean a PDF with lots of pages. This is not an issue. You can use the pdf2image library (see the docs here). Its convert_from_path function has an output_folder argument that lets you specify the folder where the generated images are saved:
Output directory for the generated files, should be seen more as a
“working directory” than an output folder. The converted images will
be written there to save system memory.
You can later use them one by one instead of your pdf to work with PyTesseract. If you don't assign the returned list of images from convert_from_path you don't risk filling up your memory.
Otherwise, if you are willing to keep everything in memory you can use the returned pages directly, like so:
pages = convert_from_path(pdf_path)
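
For a PDF with many pages, a middle ground is to convert a few pages at a time. The sketch below is an assumption-laden illustration: page_ranges and ocr_pdf_in_batches are hypothetical helpers, and it relies on the first_page and last_page arguments of convert_from_path and on pdfinfo_from_path reporting a "Pages" count, as I understand the pdf2image API.

```python
def page_ranges(total_pages, batch_size):
    """Yield 1-based, inclusive (first_page, last_page) tuples covering the document."""
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def ocr_pdf_in_batches(pdf_path, batch_size=10, dpi=300):
    """Yield the recognized text of each page, converting a few pages at a time."""
    # Imported lazily so page_ranges can be used without these libraries installed.
    from pdf2image import convert_from_path, pdfinfo_from_path
    import pytesseract

    total_pages = pdfinfo_from_path(pdf_path)["Pages"]
    for first, last in page_ranges(total_pages, batch_size):
        # Only this batch of pages is held in memory at once.
        pages = convert_from_path(pdf_path, dpi=dpi,
                                  first_page=first, last_page=last)
        for page in pages:
            yield pytesseract.image_to_string(page)


# Example (requires pdf2image, poppler, pytesseract and a real file):
# for text in ocr_pdf_in_batches("big.pdf", batch_size=5):
#     print(text[:80])
```

The batch size trades memory for the overhead of invoking the converter more often, so it is worth tuning for the document at hand.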
For example, here is my code (Python 3.9, macOS Big Sur):
import cv2
import pytesseract
from PIL import Image
from fonctions_images import *  # the author's own helper module (presumably provides del_lines)
from pdf2image import convert_from_path

path = '/Users/yves/documents_1/'
fichier = path + 'TOUTOU.pdf'
# convert every page of the PDF to a grayscale image at 500 dpi
images = convert_from_path(fichier, 500, transparent=True, grayscale=True, poppler_path='/usr/local/Cellar/poppler/21.12.0/bin')
for v in range(0, len(images)):
    image = images[v]
    image.save(path + "image.png", format="png")
    test = path + "image.png"
    img = cv2.imread(test)  # load the page image into memory
    img = del_lines(path, img)  # remove the lines
    # presumably the file written by del_lines
    img = cv2.imread(path + "img_final_bin_1.png")
    pytesseract.pytesseract.tesseract_cmd = "/usr/local/bin/tesseract"
    # custom_config is defined elsewhere in the author's code (not shown)
    d = pytesseract.image_to_data(img[3820:4050, 2340:4000], lang='fra', config=custom_config, output_type='data.frame')

Image type Python: loaded a jpg, showing a png

I have been playing around with images in Python, just trying to understand how things work basically. I have noticed something odd and was wondering if anyone else could explain it.
I have an image 'duck.jpg' -
If I look at the properties I can see that it is a jpg image.
However, after loading it into Python in the following convoluted way:
from PIL import Image
import io
with open('duck.jpg', 'rb') as f:
    im = Image.open(io.BytesIO(f.read()))
f.close()  # redundant: the with block already closed the file
I get the following output after calling
im.format
'PNG'
Is there some sort of automatic conversion going on?
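
One plausible explanation, offered as an assumption since the file itself is not available here, is that duck.jpg contains PNG data under a .jpg name: Pillow identifies the format from the file's content, not from its extension. The sketch below checks the leading bytes directly using the well-known PNG and JPEG signatures; sniff_image_format and sniff_file are hypothetical helper names.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 0x89 0x50 0x4E 0x47 ...
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data):
    """Return "PNG", "JPEG", or None based on the leading bytes of `data`."""
    if data.startswith(PNG_SIGNATURE):
        return "PNG"
    if data.startswith(JPEG_SIGNATURE):
        return "JPEG"
    return None


def sniff_file(path):
    """Read just enough of a file to identify its signature."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(len(PNG_SIGNATURE)))


# Example: sniff_file("duck.jpg") would return "PNG" if the file is a renamed PNG.
```

If this reports PNG for the file, no conversion is happening on load; the extension was simply misleading from the start.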

In python 3, requests.get().content works to download images, but not for this type of url

I've been using different versions of a web scraper, built with beautifulsoup, urllib, and requests, to download anime images from a number of websites I like.
When I have the image link, I use requests.get(name_of_url).content and write the file to a directory on my computer. This has worked for other sites but not for this new one: the program runs without errors, but the file is not written correctly, and no image viewer can open it. Here is my code without the HTML parsing, just the URL-to-image download section:
import requests
import os
img_data = requests.get("https://cs.sankakucomplex.com/data/ba/bc/babc83a0361198bb43a9b367273b3ef7.jpg?e=1510027320&m=euskBFzOAk-YJJjfbP-26A").content
completename = os.path.join('C:\\', 'Users', 'jesse', '.spyder-py3', 'Image_scraper','sankaku', 'testtesttest.jpg')
with open(completename, 'wb') as handler:
    handler.write(img_data)
I'm fairly certain that the issue comes from the different URL structure this site uses. Notice that after the ".jpg?" there is more URL information, which the other sites I was scraping did not have. I'm open to using urllib2 or another library; I've only been learning to use Python to work with HTML for the last 2-3 weeks. Any ideas or suggestions are appreciated.
Thank you
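
The query string after ".jpg" is sent to the server as part of the request and has no effect on how the bytes are written locally, so a reasonable first step is to inspect what the server actually returned. The helpers below are a hedged sketch with hypothetical names: one derives a clean filename from such a URL, the other makes a rough check for an HTML payload. An HTML result would suggest an error or block page, which is a guess to verify, not a diagnosis.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of a URL, ignoring query string and fragment."""
    return posixpath.basename(urlsplit(url).path)


def looks_like_html(data):
    """Rough check: does the payload start like an HTML document rather than an image?"""
    head = data[:256].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Possible use with requests (assumes the variables from the question's code):
# response = requests.get(url)
# print(response.status_code, response.headers.get("Content-Type"))
# if looks_like_html(response.content):
#     print("the server did not return image data")
```

Printing the status code and Content-Type alongside this check usually makes it clear whether the problem lies in the request or in the file writing.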
