How to upload downloaded Telegram media directly to Google Drive? - python-3.x

I'm using Telethon's download_media method to download images and videos, and it works as expected. Now I want to upload the downloaded media directly to my Google Drive folder.
My sample code looks something like this:
from telethon import TelegramClient, events, sync
from telethon.tl.types import PeerUser, PeerChat, PeerChannel
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}]})

api_id = #####
api_hash = ##########
client = TelegramClient('session_name', api_id, api_hash)
client.start()

c = client.get_entity(PeerChannel(1234567))  # some random channel id
for m in client.iter_messages(c):
    if m.photo:
        # below is the one way and it works
        # m.download_media("Media/")
        # I want to try something like this - below code
        gfile.SetContentFile(m.media)
        gfile.Upload()
This code is not working. How can I define the Google Drive object for download_media?
Thanks in advance. Kindly assist!

The main problem is that, according to PyDrive's documentation, SetContentFile() expects a string with the file's local path, which it then simply passes to open(), so it is meant for local files. In your code you're feeding it the media object itself, so it won't work.
To upload a bytes file with PyDrive you'll need to convert it to BytesIO and send it as the content. An example with a local file would look like this:
import io

drive = GoogleDrive(gauth)
file = drive.CreateFile({'mimeType': 'image/jpeg', 'title': 'example.jpg'})
filebytes = open('example.jpg', 'rb').read()
file.content = io.BytesIO(filebytes)
file.Upload()
Normally you don't need to do it this way, because SetContentFile() does the opening and conversion for you, but it should give you the idea: if you have the media file as bytes, you can wrap it in a BytesIO, assign it to file.content, and upload it.
Now, if you look at the Telethon documentation, you will see that download_media() takes a file argument which you can set to bytes:
file (str | file, optional):
The output file path, directory, or stream-like object. If the path exists and is a file, it will be overwritten. If file is the type bytes, it will be downloaded in-memory as a bytestring (e.g. file=bytes).
So you should be able to call m.download_media(file=bytes) to get the media as a bytes object, which you can then wrap in io.BytesIO. With this in mind, you can try the following change in your loop:
for m in client.iter_messages(c):
    if m.photo:
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()
Note that I only tested the PyDrive side since I currently don't have access to the Telegram API, but looking at the docs I believe this should work. Let me know what happens.
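Putting both halves together, here is a minimal sketch of the whole flow, untested end to end as noted above; the 'session_name' string, the per-message CreateFile() call, and the photo_<id>.jpg titles are my own assumptions:

import io

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from telethon import TelegramClient, sync  # importing sync patches the client for synchronous use
from telethon.tl.types import PeerChannel

gauth = GoogleAuth()
drive = GoogleDrive(gauth)

api_id = 12345            # placeholder
api_hash = '0123abcd...'  # placeholder

with TelegramClient('session_name', api_id, api_hash) as client:
    c = client.get_entity(PeerChannel(1234567))
    for m in client.iter_messages(c):
        if m.photo:
            # file=bytes downloads the media in-memory as a bytestring
            media_bytes = m.download_media(file=bytes)
            # create a fresh Drive file per photo; reusing one gfile object
            # would overwrite the same Drive file on every iteration
            gfile = drive.CreateFile({
                'title': f'photo_{m.id}.jpg',
                'parents': [{'id': 'drive_directory_path'}],
            })
            gfile.content = io.BytesIO(media_bytes)
            gfile.Upload()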
Sources:
PyDrive docs and source
Telethon docs and source

Related

Google Maps API using urllib.request cannot save jpg file

I am using the Google Maps Static API and would like to save the image as a JPG file.
Saving a PNG with urllib.request.urlretrieve(url, 'map_46_6.png') works fine. However, urllib.request.urlretrieve(url, 'map_46_6.jpg') does not: opening the file gives the error « Not a JPG file: starts with 0x89 0x50 ». Manually changing the extension to PNG resolves it.
The following is the code:
import urllib.request
url = 'http://maps.googleapis.com/maps/api/staticmap?scale=2&center=46.257632,6.108669&zoom=12&size=400x400&maptype=satellite&key=xxxxx'
urllib.request.urlretrieve(url, 'map_46_6.jpg')
As this code is part of a previously built pipeline, I need the JPG files for the next steps.
My question is: is there a setting in urllib, Google Maps, or anything else that could cause this error? Thank you very much in advance!
I have found a solution: if you want JPG, you need to request the format explicitly with &format=jpg, like the following:
import urllib.request
url = 'https://maps.googleapis.com/maps/api/staticmap?scale=2&center=46.257632,6.108669&zoom=16&size=400x400&maptype=satellite&format=jpg&key=xxxx'
urllib.request.urlretrieve(url, 'map_46_6.jpg')
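Incidentally, the error message makes sense: 0x89 0x50 are the first two bytes of the PNG signature (0x50 is the letter P), and urlretrieve() simply saves whatever bytes the server returns, regardless of the extension you choose; the Static Maps API returns PNG unless told otherwise. A quick sketch to check what was actually downloaded (the file name is the one from the question):

# inspect the magic bytes to see which format the file really is
with open('map_46_6.jpg', 'rb') as f:
    header = f.read(3)

if header.startswith(b'\x89\x50'):        # \x89 'P' -> PNG signature
    print('This is actually a PNG file.')
elif header.startswith(b'\xff\xd8\xff'):  # JPEG SOI marker
    print('This is a JPEG file.')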

Writing BytesIO objects to in-memory Zipfile

I have a Flask-based webapp in which I'm trying to do everything in memory, without touching the disk at all.
I have created an in-memory Word doc (using the python-docx library) and an in-memory Excel file (using openpyxl). They are both of type BytesIO. I want to return them both with Flask, so I want to zip them up and return the zipfile to the user's browser.
My code is as follows:
import io
import zipfile

inMemory = io.BytesIO()
zipfileObj = zipfile.ZipFile(inMemory, mode='w', compression=zipfile.ZIP_DEFLATED)
try:
    print('adding files to zip archive')
    zipfileObj.write(virtualWorkbook)
    zipfileObj.write(virtualWordDoc)
When the zipfile tries to write the virtualWorkbook I get the following error:
{TypeError}stat: path should be string, bytes, os.PathLike or integer, not BytesIO
I have skimmed the entirety of the internet but have come up empty-handed, so if someone could explain what I'm doing wrong that would be amazing.
It may be easier to mount tmpfs/a ramdisk/something similar to a specific directory, like here, and just use tempfile.NamedTemporaryFile() as usual.
You could use the writestr method, which accepts both strings and bytes:
zipfileObj.writestr(zipfile.ZipInfo('folder/name.docx'),
                    virtualWorkbook.read())
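For completeness, a sketch of the whole in-memory flow, assuming virtualWorkbook and virtualWordDoc are BytesIO objects that have already been written to (the file names inside the archive are my own choice); getvalue() is used instead of read() so you don't have to seek(0) back to the start first:

import io
import zipfile

# stand-ins for the buffers produced by openpyxl / python-docx
virtualWorkbook = io.BytesIO(b'xlsx bytes here')
virtualWordDoc = io.BytesIO(b'docx bytes here')

inMemory = io.BytesIO()
with zipfile.ZipFile(inMemory, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
    # writestr() accepts bytes, so hand it the buffers' contents directly
    zf.writestr('workbook.xlsx', virtualWorkbook.getvalue())
    zf.writestr('document.docx', virtualWordDoc.getvalue())

# rewind before handing the archive to Flask's send_file()
inMemory.seek(0)

From a Flask view you can then return send_file(inMemory, mimetype='application/zip', as_attachment=True, download_name='files.zip') (the parameter was named attachment_filename before Flask 2.0).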

Read a utf-16LE file directly in a cloud function - python/GCP

I have a CSV file with UTF-16LE encoding that I tried to open in a Cloud Function using:
import pandas as pd
from io import StringIO as sio

with open("gs://bucket_name/my_file.csv", "r", encoding="utf16") as f:
    read_all_once = f.read()
read_all_once = read_all_once.replace('"', "")
file_like = sio(read_all_once)
df = pd.read_csv(file_like, sep=";", skiprows=5)
I get an error that the file is not found at that location. What is the issue? When I run the same code locally with a local path, it works.
Also, when the file is in UTF-8 encoding, I can read it directly with:
df = pd.read_csv("gs://bucket_name/my_file.csv", delimiter=";", encoding="utf-8", skiprows=0, low_memory=False)
Can I read the UTF-16 file directly with pd.read_csv()? If not, how do I make open() recognize the gs:// path?
Thanks in advance!
Yes, you can read the UTF-16 CSV file directly with the pd.read_csv() method.
For the method to work, make sure that the service account attached to your function has read access to the CSV file in the Cloud Storage bucket. The built-in open() only understands local filesystem paths, which is why your first snippet fails; pandas accepts gs:// URLs because it delegates them to gcsfs.
Also check whether the encoding of your CSV file is "utf-16", "utf-16le", or "utf-16be", and pass the appropriate one to the method.
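If you are not sure which UTF-16 variant you have, the byte-order mark at the start of the file will tell you. A small sketch using gcsfs, which pandas uses under the hood for gs:// URLs (bucket and file names are the ones from the question):

import gcsfs

# UTF-16LE files typically start with the BOM FF FE, UTF-16BE with FE FF
fs = gcsfs.GCSFileSystem()
with fs.open('bucket_name/my_file.csv', 'rb') as f:
    bom = f.read(2)

if bom == b'\xff\xfe':
    print('UTF-16LE; encoding="utf-16" also works and strips the BOM')
elif bom == b'\xfe\xff':
    print('UTF-16BE')
else:
    print('No BOM; try "utf-16le" or "utf-16be" explicitly')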
I used the Python 3.7 runtime. My main.py and requirements.txt files look as below; you can modify main.py according to your use case.
main.py
import pandas as pd

def hello_world(request):
    # please change the file's URI
    data = pd.read_csv('gs://bucket_name/file.csv', encoding='utf-16le')
    print(data)
    return 'check the results in the logs'
requirements.txt
pandas==1.1.0
gcsfs==0.6.2
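If you also need the separator and the skipped rows from your original snippet, they can go in the same call (values copied from the question):

import pandas as pd

# pandas hands gs:// URLs to gcsfs, so no explicit open() is needed
df = pd.read_csv(
    'gs://bucket_name/my_file.csv',
    encoding='utf-16le',
    sep=';',
    skiprows=5,
)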

Importing scripts into a notebook in IBM WATSON STUDIO

I am doing PCA on CIFAR-10 images in the IBM Watson Studio free version, so I uploaded the Python file for downloading CIFAR-10 to the studio (screenshot omitted).
But when I try to import cache, the following import error is shown (screenshot omitted).
After spending some time on Google I found a solution, but I can't understand it:
https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/add-script-to-notebook.html
The solution is as follows:
Click the Add Data icon, then browse to the script file or drag it into your notebook sidebar.
Click in an empty code cell in your notebook and then click the Insert to code link below the file. Take the returned string, and write to a file in the file system that comes with the runtime session.
To import the classes to access the methods in a script in your notebook, use the following command:
For Python:
from <python file name> import <class name>
I can't understand this line:
"and write to a file in the file system that comes with the runtime session."
Where can I find the file that comes with the runtime session? Where is that file system located?
Can anyone please help me with the details of where to find that file?
You get the import error because the script you are trying to import is not available in your Python runtime's local filesystem. The files you uploaded (cache.py, cifar10.py, etc.) are stored in the object storage bucket associated with the Watson Studio project. To use them, you need to make them available to the Python runtime, for example by downloading the script to the runtime's local filesystem.
UPDATE: In the meantime there is an option to insert the StreamingBody objects directly, with all the required credentials included. If you use that option, you can skip ahead to the "write it to a file in the local runtime filesystem" part of this answer.
Or,
You can use the code snippet below to read the script in a StreamingBody object:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

os_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<IBM_API_KEY_ID>',
    ibm_auth_endpoint="<IBM_AUTH_ENDPOINT>",
    config=Config(signature_version='oauth'),
    endpoint_url='<ENDPOINT>')

# Your data file was loaded into a botocore.response.StreamingBody object.
# Please read the documentation of ibm_boto3 and pandas to learn more about the possibilities to load the data.
# ibm_boto3 documentation: https://ibm.github.io/ibm-cos-sdk-python/
# pandas documentation: http://pandas.pydata.org/
streaming_body_1 = os_client.get_object(Bucket='<BUCKET>', Key='cifar.py')['Body']

# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(streaming_body_1, "__iter__"):
    streaming_body_1.__iter__ = types.MethodType(__iter__, streaming_body_1)
And then write it to a file in the local runtime filesystem.
f = open('cifar.py', 'wb')
f.write(streaming_body_1.read())
This opens a file with write access and calls the write method to write to the file. You should then be able to simply import the script.
import cifar
Note: You can get the credentials like IBM_API_KEY_ID for the file by clicking on the Insert credentials option on the drop-down menu for your file.
The instructions that the OP found miss one crucial line of code. I followed them and was able to import modules, but I wasn't able to use any functions or classes in those modules. This was fixed by closing the file after writing. This part of the instructions:
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
should instead be (at least this works in my case):
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
f.close()
Hopefully this helps someone.
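A small variation that sidesteps the problem entirely: open the file in a with block so it is always closed, even on errors (same streaming_body_1 object as in the generated code above):

# the context manager flushes and closes the file automatically
with open('cifar.py', 'wb') as f:
    f.write(streaming_body_1.read())

import cifar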

Watson Data Platform how to unzip the zip file in the data assets

How to unzip the zip file in the data assets of the Watson Data Platform?
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
zip_ref.extractall(WHICH DIRECTORY FOR THE DATA ASSETS)
zip_ref.close()
streaming_body_1 is the streaming-body object for the zip file, which I uploaded to the DATA ASSETS section.
How can I unzip the zip file in the Data Assets, given that I don't know the exact key path within the DATA ASSETS section?
I am trying to do this in the project's Jupyter notebook.
Thank you!
When you upload a file to your project it is stored in the project's assigned cloud storage, which should now be Cloud Object Storage by default. (Check your project settings.) To work with uploaded files (which are just one type of data asset, there are others) in a notebook you'll have to first download it from the cloud storage to make it accessible in the kernel's file system and then perform the desired file operation (e.g. read, extract, ...)
Assuming you've uploaded your ZIP file you should be able to generate code that reads the ZIP file using the tooling:
click the 1010 (Data) icon on the upper right-hand side
select "Insert to code" > "Insert StreamingBody object"
consume the StreamingBody as desired
I ran a quick test and it worked like a charm:
...
# "Insert StreamingBody object" generated code
...
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
print(zip_ref.namelist())
zip_ref.close()
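To actually extract the archive instead of just listing it, point extractall() at a directory in the runtime's local filesystem; the data target below is my own choice (the working directory would also do):

from io import BytesIO
import zipfile

# streaming_body_1 comes from the generated "Insert StreamingBody object" code
with zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r') as zip_ref:
    zip_ref.extractall('data')  # created in the local filesystem if missing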
Edit 1: If your archive is a compressed tar file use the following code instead:
...
# "Insert StreamingBody object" generated code
...
import tarfile
from io import BytesIO
tf = tarfile.open(fileobj=BytesIO(streaming_body_1.read()), mode="r:gz")
tf.getnames()
Edit 2: To avoid the read timeout you'll have to change the generated code from
config=Config(signature_version='oauth'),
to
config=Config(signature_version='oauth',connect_timeout=50, read_timeout=70),
With those changes in place I was able to download and extract training_data.tar.gz from the repo you've mentioned.
