Watson Data Platform: how to unzip a zip file in the data assets - python-3.x

How do I unzip a zip file in the data assets of the Watson Data Platform?
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
zip_ref.extractall(WHICH DIRECTORY FOR THE DATA ASSETS)
zip_ref.close()
streaming_body_1 is the streaming-body object for the zip file I uploaded to the DATA ASSETS section.
How can I unzip the zip file in the Data Assets, given that I don't know the exact key path of the DATA ASSETS section?
I am trying to do this in a Jupyter notebook of the project.
Thank you!

When you upload a file to your project, it is stored in the project's assigned cloud storage, which should now be Cloud Object Storage by default (check your project settings). To work with uploaded files (which are just one type of data asset; there are others) in a notebook, you first have to download them from the cloud storage to make them accessible in the kernel's file system, and then perform the desired file operation (e.g. read, extract, ...).
Assuming you've uploaded your ZIP file, you should be able to generate code that reads it using the tooling:
- click the 1010 (Data) icon on the upper right-hand side
- select "Insert to code" > "Insert StreamingBody object"
- consume the StreamingBody as desired
I ran a quick test and it worked like a charm:
...
# "Insert StreamingBody object" generated code
...
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
print(zip_ref.namelist())
zip_ref.close()
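To answer the directory part of the original question: the notebook runtime has its own local file system, and any relative path passed to extractall lands in the kernel's working directory. A minimal sketch (the target directory name is just an example; note that a StreamingBody can only be read once, so re-insert it if it has already been consumed):
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
zip_ref.extractall('extracted')  # creates ./extracted in the runtime's local file system
zip_ref.close()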
Edit 1: If your archive is a compressed tar file, use the following code instead:
...
# "Insert StreamingBody object" generated code
...
import tarfile
from io import BytesIO
tf = tarfile.open(fileobj=BytesIO(streaming_body_1.read()), mode="r:gz")
tf.getnames()
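To actually extract the archive rather than just list its members (a small addition; the target directory is again just an example):
tf.extractall('extracted')  # writes the members into the kernel's local file system
tf.close()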
Edit 2: To avoid the read timeout, you'll have to change the generated code from
config=Config(signature_version='oauth'),
to
config=Config(signature_version='oauth',connect_timeout=50, read_timeout=70),
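In context, the client construction in the generated code then looks roughly like this (a sketch; the angle-bracket values are placeholders that the tooling fills in for you):
os_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<IBM_API_KEY_ID>',
    ibm_auth_endpoint='<IBM_AUTH_ENDPOINT>',
    config=Config(signature_version='oauth', connect_timeout=50, read_timeout=70),
    endpoint_url='<ENDPOINT>')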
With those changes in place I was able to download and extract training_data.tar.gz from the repo you've mentioned.

Related

How to upload downloaded Telegram media directly to Google Drive?

I'm working with Telethon's download_media method for downloading images and videos, and it is working fine (as expected). Now I want to upload the downloaded media directly to my Google Drive folder.
Sample code looks something like:
from telethon import TelegramClient, events, sync
from telethon.tl.types import PeerUser, PeerChat, PeerChannel
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}]})
api_id = #####
api_hash = ##########
client = TelegramClient('session_name', api_id, api_hash)  # create the Telegram client (session name is arbitrary)
c = client.get_entity(PeerChannel(1234567))  # some random channel id
for m in client.iter_messages(c):
    if m.photo:
        # below is the one way and it works
        # m.download_media("Media/")
        # I want to try something like this - below code
        gfile.SetContentFile(m.media)
        gfile.Upload()
This code is not working. How can I define the Google Drive object for download_media?
Thanks in advance. Kindly assist!
The main problem is that, according to PyDrive's documentation, SetContentFile() expects a string with the file's local path, and then it just uses open(), so it's meant to be used with local files. In your code you're trying to feed it the media file directly, so it won't work.
To upload a bytes file with PyDrive you'll need to convert it to BytesIO and send it as the content. An example with a local file would look like this:
import io  # needed for BytesIO
drive = GoogleDrive(gauth)
file = drive.CreateFile({'mimeType': 'image/jpeg', 'title': 'example.jpg'})
filebytes = open('example.jpg', 'rb').read()
file.content = io.BytesIO(filebytes)
file.Upload()
Normally you don't need to do it this way, because SetContentFile() does the opening and conversion for you, but this should give you the idea: if you get the media file as bytes, you can convert it, assign it to file.content, and then upload it.
Now, if you look at the Telethon documentation, you will see that download_media() takes a file argument which you can set to bytes:
file (str | file, optional):
The output file path, directory, or stream-like object. If the path exists and is a file, it will be overwritten. If file is the type bytes, it will be downloaded in-memory as a bytestring (e.g. file=bytes).
So you should be able to call m.download_media(file=bytes) to get a bytes object. Looking even deeper at the Telethon source code it appears that this does return a BytesIO object. With this in mind, you can try the following change in your loop:
for m in client.iter_messages(c):
    if m.photo:
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()
Note that I only tested the PyDrive side since I currently don't have access to the Telegram API, but looking at the docs I believe this should work. Let me know what happens.
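One caveat: reusing a single gfile object means every Upload() updates the same Drive file, so each photo would overwrite the previous one. A sketch that creates a fresh file per message instead (the title scheme here is hypothetical):
import io
for m in client.iter_messages(c):
    if m.photo:
        # one new Drive file per photo, so uploads don't overwrite each other
        gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}],
                                  'title': 'photo_%d.jpg' % m.id})
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()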
Sources:
PyDrive docs and source
Telethon docs and source

Python: Access a zipped XL file without extracting it

Is there a way I can open and process the Excel file within a zip file without first extracting it? I am not interested in modifying it.
from zipfile import ZipFile
from openpyxl import load_workbook
procFile ="C:\\Temp2\\XLFile-Demo-PW123.zip"
xl_file = "XLFile-Demo.xlsx"
myzip = ZipFile(procFile)
myzip.setpassword(bytes('123', 'utf-8'))
# line below returns an error
with load_workbook(myzip.open(xl_file)) as wb_obj:
    print(wb_obj.sheetnames)
Most of the examples that perform this only directly open text files.
I would like to simulate the behaviour of archiving programs such as WinRar and 7zip.
Thanks
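For reference, one approach that should work is to read the member into memory first (a sketch, with caveats: load_workbook does not support the with statement, openpyxl needs a seekable file-like object, and Python's zipfile only handles legacy ZipCrypto encryption, not the AES encryption that 7zip and WinRar can produce):
from io import BytesIO
from zipfile import ZipFile
from openpyxl import load_workbook
procFile = "C:\\Temp2\\XLFile-Demo-PW123.zip"
xl_file = "XLFile-Demo.xlsx"
with ZipFile(procFile) as myzip:
    # read the member fully into memory and wrap it so openpyxl can seek
    data = BytesIO(myzip.read(xl_file, pwd=b'123'))
wb_obj = load_workbook(data)
print(wb_obj.sheetnames)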

Download multiple Dropbox zip files from csv file

I have a .csv file that contains ~100 links to Dropbox files. The current method I have downloads the files missing the ?dl=0 suffix, which seems to be critical.
#import packages
import pandas as pd
import wget
#read the .csv file, iterate through each row and download it
data = pd.read_csv("BRAIN_IMAGING_SUMSTATS.csv")
for index, row in data.iterrows():
    print(row['Links'])
    filename = row['Links']
    wget.download(filename)
Output:
https://www.dropbox.com/s/xjtu071g7o6gimg/metal_roi_volume_dec12_2018_pheno1.txt.zip?dl=0
https://www.dropbox.com/s/9oc9j8zhd4mn113/metal_roi_volume_dec12_2018_pheno2.txt.zip?dl=0
https://www.dropbox.com/s/0jkdrb76i7rixa5/metal_roi_volume_dec12_2018_pheno3.txt.zip?dl=0
https://www.dropbox.com/s/gu5p46bakgvozs5/metal_roi_volume_dec12_2018_pheno4.txt.zip?dl=0
https://www.dropbox.com/s/8zfpfscp8kdwu3h/metal_roi_volume_dec12_2018_pheno5.txt.zip?dl=0
These look like the correct links, but the downloaded files are in the format metal_roi_volume_dec12_2018_pheno1.txt.zip instead of metal_roi_volume_dec12_2018_pheno1.txt.zip?dl=0, so I cannot unzip them. Any ideas how to download the actual Dropbox files?
By default (without extra URL parameters, or with dl=0 like in your example), Dropbox shared links point to an HTML preview page for the linked file, not the file data itself. Your code as-is will download the HTML, not the actual zip file data.
You can modify these links for direct file access though, as documented in this Dropbox help center article.
So, you should modify the link, e.g., to use raw=1 instead of dl=0, before calling wget.download on it.
Quick fix would be something like:
#import packages
import pandas as pd
import wget
import os
from urllib.parse import urlparse
#read the .csv file, iterate through each row and download it
data = pd.read_csv("BRAIN_IMAGING_SUMSTATS.csv")
for index, row in data.iterrows():
    print(row['Links'])
    # switch dl=0 to raw=1 so the link returns the file data, not the HTML preview
    filename = row['Links'].replace("dl=0", "raw=1")
    # name the local file after the URL's path component, dropping the query string
    parsed = urlparse(filename)
    fname = os.path.basename(parsed.path)
    wget.download(filename, fname)
Basically, you switch dl=0 for raw=1 (or dl=1, per the Dropbox help center article) so the link serves the file data rather than the preview page, then extract the filename from the URL and use it as the output param of the wget.download call.

Importing scripts into a notebook in IBM WATSON STUDIO

I am doing PCA on the CIFAR-10 images in the free version of IBM Watson Studio, so I uploaded the Python file for downloading CIFAR-10 to the studio (pic below). But when I try to import cache, the following error shows up (pic below). After spending some time on Google I found a solution, but I can't understand it.
Link: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/add-script-to-notebook.html
The solution is as follows:
Click the Add Data icon, and then browse to the script file or drag it into your notebook sidebar.
Click in an empty code cell in your notebook and then click the Insert to code link below the file. Take the returned string, and write to a file in the file system that comes with the runtime session.
To import the classes to access the methods in a script in your notebook, use the following command:
For Python:
from <python file name> import <class name>
I can't understand this line:
"and write to a file in the file system that comes with the runtime session."
Where can I find the file that comes with the runtime session? Where is the file system located?
Can anyone please help me with the details of where to find that file?
You get the import error because the script you are trying to import is not available in your Python runtime's local file system. The files (cache.py, cifar10.py, etc.) that you uploaded went to the object storage bucket associated with the Watson Studio project. To use those files, you need to make them available to the Python runtime, for example by downloading the script to the runtime's local file system.
UPDATE: In the meantime there is an option to insert the StreamingBody objects directly, with all the required credentials included. If you use the Insert StreamingBody object option, you can skip ahead to the part of this answer about writing it to a file in the local runtime file system.
Or,
You can use the code snippet below to read the script in a StreamingBody object:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
def __iter__(self): return 0
os_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<IBM_API_KEY_ID>',
    ibm_auth_endpoint='<IBM_AUTH_ENDPOINT>',
    config=Config(signature_version='oauth'),
    endpoint_url='<ENDPOINT>')
# Your data file was loaded into a botocore.response.StreamingBody object.
# Please read the documentation of ibm_boto3 and pandas to learn more about the possibilities to load the data.
# ibm_boto3 documentation: https://ibm.github.io/ibm-cos-sdk-python/
# pandas documentation: http://pandas.pydata.org/
streaming_body_1 = os_client.get_object(Bucket='<BUCKET>', Key='cifar.py')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(streaming_body_1, "__iter__"): streaming_body_1.__iter__ = types.MethodType( __iter__, streaming_body_1 )
And then write it to a file in the local runtime filesystem.
f = open('cifar.py', 'wb')
f.write(streaming_body_1.read())
This opens a file with write access and calls the write method to write to the file. You should then be able to simply import the script.
import cifar
Note: You can get the credentials like IBM_API_KEY_ID for the file by clicking on the Insert credentials option on the drop-down menu for your file.
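If the import still fails, it can help to confirm that the script actually landed in the working directory (a quick sanity check, assuming the write above succeeded):
import os
print(os.listdir('.'))  # cifar.py should show up in this listing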
The instructions that the OP found miss one crucial line of code. I followed them and was able to import modules, but wasn't able to use any functions or classes in those modules. This was fixed by closing the file after writing. This part of the instructions:
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
should instead be (at least this works in my case):
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
f.close()
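Equivalently, a with block closes the file automatically, so the close() can't be forgotten (same placeholder as above):
with open('<myScript>.py', 'wb') as f:
    f.write(streaming_body_1.read())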
Hopefully this helps someone.

Magento: "Image does not exist"

I'm importing a CSV file in Magento (version 1.9).
I receive the error: 'Image does not exist'.
I've tried to do everything I could find on the internet.
The template I'm using for upload is the default template taken from my export folder.
I've added the / before the image name and I've also saved the file as UTF-8 format.
Any advice would help.
Use advanced profiler
System > Import/Export > Dataflow – Profiles
You only need to include the attributes that are required, which is just the SKU, plus the appropriate image attributes, plus labels if you want to go all out.
When you are creating your new profile, enter the following settings:
Now you can hit save! With our Profile now complete, we just need to create the folder media/import. This is where you will be storing all your images awaiting import.
When uploading images, they need to be within a folder called media/import. Once saved to that folder you can then reference them relatively. By that I mean if your image is in media/import/test.jpg in your csv reference it as /test.jpg. It’s as easy as that.
Please check this link for more information
Import products using csv
In the Default Import:
First move all the images into the media/import folder, then use '/imagename' in the CSV, and then import.
Also give 777 permission to the import folder.
Let me know if you have any queries.
Check these three points before uploading a CSV file in Magento:
1. create the media > import folder and place all images inside the import folder
2. the import folder should have 777 permission
3. the path of the images should be like /desert-002.jpg
It may be an issue with the image path in the CSV: if an image path in the CSV is abg/test.jpg, then its path on disk must be media/import/abg/test.jpg. Also check the letter case of the image extension: if your image's extension is JPG but you wrote jpg in the CSV, it will show 'image does not exist'.
Your file template must look like this:
sku,image
product-001,/product_image.jpg
This file must exist: yourdocroot/media/import/product_image.jpg
For more detail, please read this method:
Mage_Catalog_Model_Convert_Adapter_Product::saveImageDataRow
You will see these lines:
$imageFile = trim($importData['_media_image']);
$imageFile = ltrim($imageFile, DS);
$imageFilePath = Mage::getBaseDir('media') . DS . 'import' . DS . $imageFile;
I hope this helps!
