Save my file on a shared drive with Google Colab - python-3.x

We are working as a team on a shared drive, and we are using Google Colab to run our code.
Here is the path: /content/drive/Shared drives/Projet_IE/Technique/BDD
We want to save a .json file to our drive, but it fails because of the space in "Shared drives".
How can we make this path understandable to the code, given that "Shared drives" is imposed by Google Drive and the code does not handle the space in the path?
Thanks!

from google.colab import drive
import json

# Mount Google Drive at /content/gdrive before using any paths under it
drive.mount('/content/gdrive')

# Escape the space for the shell; Python itself does not need the backslash
!mkdir -p /content/gdrive/My\ Drive/test

a = {'a': 1, 'b': 2, 'c': 3}
with open("/content/gdrive/My Drive/test/your_json_file", "w") as fp:
    json.dump(a, fp)
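For the shared-drive path from the question, here is a minimal sketch, assuming the drive is mounted at /content/drive as in the path above: the space in "Shared drives" only needs escaping or quoting in shell (!) commands, while Python's open() and the os functions accept it unchanged (the file name below is just an example).

import json
import os

BDD_DIR = "/content/drive/Shared drives/Projet_IE/Technique/BDD"

# os.makedirs handles the space without any escaping
os.makedirs(BDD_DIR, exist_ok=True)

a = {'a': 1, 'b': 2, 'c': 3}
with open(os.path.join(BDD_DIR, "your_json_file.json"), "w") as fp:
    json.dump(a, fp)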

Related

How to search for Tensorflow Files in Google Drive?

I'm following the docs here: https://colab.research.google.com/github/google/earthengine-api/blob/master/python/examples/ipynb/TF_demo1_keras.ipynb#scrollTo=43-c0JNFI_m6 to learn how to use Tensorflow with GEE. One part of this tutorial is checking the existence of exported files. In the docs, the example code is:
fileNameSuffix = '.tfrecord.gz'
trainFilePath = 'gs://' + outputBucket + '/' + trainFilePrefix + fileNameSuffix
testFilePath = 'gs://' + outputBucket + '/' + testFilePrefix + fileNameSuffix
print('Found training file.' if tf.gfile.Exists(trainFilePath)
      else 'No training file found.')
print('Found testing file.' if tf.gfile.Exists(testFilePath)
      else 'No testing file found.')
In my case, I'm just exporting the files to Google Drive instead of Google Cloud bucket. How would I change trainFilePath and testFilePath to point to the Google Drive folder? FWIW, when I go into the Google Drive Folder, I do actually see files.
I would say you could use the Google Drive API to list the files in your Google Drive instead of a GCS bucket. You can find the documentation here.
You can also use PyDrive, which is pretty easy to understand. This is an example; you only have to adjust the query "q" to your needs:
from pydrive.drive import GoogleDrive
from pydrive.auth import GoogleAuth
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in file_list:
    print(f"title: {file['title']}, id: {file['id']}")
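If the exported files sit in a specific Drive folder rather than in the root, the query can be narrowed to that folder. A minimal sketch, assuming you know the folder's ID (the placeholder below is hypothetical; copy the real ID from the folder's URL in Drive):

folder_id = '<EXPORT_FOLDER_ID>'  # hypothetical placeholder
file_list = drive.ListFile(
    {'q': f"'{folder_id}' in parents and trashed=false"}).GetList()
for file in file_list:
    print(f"title: {file['title']}, id: {file['id']}")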
Solution
You can use the great PyDrive library to access your Drive files easily from Google Colab and thus check which files you have or which have been exported, etc.
The following piece of code is an example that lists all the files in the root directory of your Google Drive. It was found in this answer (yes, I am making this answer a community wiki post):
# Install the library
!pip install -U -q PyDrive

# Import the rest of the services/libraries needed
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Choose a local (Colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
try:
    os.makedirs(local_download_path)
except OSError:
    pass

# 2. Auto-iterate using the query syntax; since this targets the main directory of Drive, the query is 'root'.
# https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'root' in parents"}).GetList()
for f in file_list:
    # 3. Print the name and id of the files
    print('title: %s, id: %s' % (f['title'], f['id']))
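The snippet above creates local_download_path but never uses it; if the goal is also to copy the listed files into the Colab filesystem, a hedged addition using PyDrive's GetContentFile could look like this (it assumes the listed items are plain files, not folders):

for f in file_list:
    target = os.path.join(local_download_path, f['title'])
    print('downloading %s to %s' % (f['title'], target))
    f.GetContentFile(target)  # downloads the file's content to the local path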
NOTE: when you run this, Colab will take you to another page to authenticate and ask you to paste a secret key. Just follow what the service tells you to do; it is pretty straightforward.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)

Importing scripts into a notebook in IBM WATSON STUDIO

I am doing PCA on the CIFAR-10 images in the free version of IBM Watson Studio, so I uploaded the Python file for downloading CIFAR-10 to the studio (screenshot below).
But when I try to import cache, the following error shows up (screenshot below).
After spending some time on Google I found a solution, but I can't understand it.
Link:
https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/add-script-to-notebook.html
The solution is as follows:
Click the Add Data icon, and then browse to the script file or drag it into your notebook sidebar.
Click in an empty code cell in your notebook and then click the Insert to code link below the file. Take the returned string, and write to a file in the file system that comes with the runtime session.
To import the classes to access the methods in a script in your notebook, use the following command:
For Python:
from <python file name> import <class name>
I can't understand this part:
"and write to a file in the file system that comes with the runtime session."
Where can I find the file that comes with the runtime session? Where is that file system located?
Can anyone please help me with this, with details on where to find that file?
You have the import error because the script that you are trying to import is not available in your Python runtime's local filesystem. The files (cache.py, cifar10.py, etc.) that you uploaded are uploaded to the object storage bucket associated with the Watson Studio project. To use those files you need to make them available to the Python runtime, for example by downloading the script to the runtime's local filesystem.
UPDATE: In the meantime there is an option to directly insert the StreamingBody objects, which already include all the required credentials. If you use that option, you can skip straight to the part of this answer about writing the body to a file in the local runtime filesystem.
Or,
You can use the code snippet below to read the script in a StreamingBody object:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

os_client = ibm_boto3.client(service_name='s3',
                             ibm_api_key_id='<IBM_API_KEY_ID>',
                             ibm_auth_endpoint="<IBM_AUTH_ENDPOINT>",
                             config=Config(signature_version='oauth'),
                             endpoint_url='<ENDPOINT>')

# Your data file was loaded into a botocore.response.StreamingBody object.
# Please read the documentation of ibm_boto3 and pandas to learn more about the possibilities to load the data.
# ibm_boto3 documentation: https://ibm.github.io/ibm-cos-sdk-python/
# pandas documentation: http://pandas.pydata.org/
streaming_body_1 = os_client.get_object(Bucket='<BUCKET>', Key='cifar.py')['Body']

# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(streaming_body_1, "__iter__"):
    streaming_body_1.__iter__ = types.MethodType(__iter__, streaming_body_1)
And then write it to a file in the local runtime filesystem.
f = open('cifar.py', 'wb')
f.write(streaming_body_1.read())
This opens a file with write access and calls the write method to write to the file. You should then be able to simply import the script.
import cifar
Note: You can get the credentials like IBM_API_KEY_ID for the file by clicking on the Insert credentials option on the drop-down menu for your file.
The instructions that the OP found miss one crucial line of code. I followed them and was able to import modules, but I wasn't able to use any functions or classes in those modules. This was fixed by closing the file after writing. This part of the instructions:
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
should instead be (at least this works in my case):
f = open('<myScript>.py', 'wb')
f.write(streaming_body_1.read())
f.close()
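An alternative sketch that sidesteps the issue entirely is a with block, which closes the file automatically when the block ends:

with open('<myScript>.py', 'wb') as f:
    f.write(streaming_body_1.read())
# the file is closed here, so the subsequent import sees the complete script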
Hopefully this helps someone.

How to access an Excel file on Onedrive with Openpyxl

I want to open an Excel file (on OneDrive) with Openpyxl (Python). I received an error trying this:
from openpyxl import load_workbook
file = r"https://d.docs.live.net/dd10xxxxxxxxxx"
wb = load_workbook(filename = file)
self.fp = io.open(file, filemode)
OSError: [Errno 22] Invalid argument: 'https://d.docs.live.net/dd10...
OpenPyXL cannot read or write files over HTTP. It expects a file on a traditional filesystem, whether it's local, on a network share, etc.
If you're using OneDrive for Business you could try mapping it to a drive letter, or investigate using Google Sheets and the gspread library instead.
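A minimal sketch of the mapped-drive approach, assuming OneDrive for Business has been mapped to a drive letter (Z: and the path below are hypothetical):

from openpyxl import load_workbook

# Z: is a hypothetical mapped drive letter pointing at the OneDrive library
wb = load_workbook(filename=r"Z:\Projects\report.xlsx")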
One alternative is to use Google Drive rather than OneDrive: open Google Colab, mount Google Drive, and open the file from there.
from google.colab import drive
drive.mount('/content/gdrive/')
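From there, a minimal sketch, assuming the workbook sits somewhere under My Drive (the file name below is a hypothetical example):

from openpyxl import load_workbook

wb = load_workbook(filename='/content/gdrive/My Drive/your_workbook.xlsx')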

Loading local data google colab

I have an .npy file (largeFIle.npy) saved in the same "Colab Notebooks" folder on my Google Drive as my Colab notebook. I'm trying to load the data into my notebook with the code below, but I'm getting the error below. This code works fine when I run it locally on my laptop with the notebook in the same folder as the file. Is there something different I need to do when loading data with notebooks in Google Colab? I'm very new to Colab.
code:
dataset_name = 'largeFIle.npy'
dataset = np.load(dataset_name, encoding='bytes')
Error:
FileNotFoundError Traceback (most recent call last)
<ipython-input-6-db02a0bfcf1d> in <module>()
----> 1 dataset = np.load(dataset_name, encoding='bytes')
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
370 own_fid = False
371 if isinstance(file, basestring):
--> 372 fid = open(file, "rb")
373 own_fid = True
374 elif is_pathlib_path(file):
FileNotFoundError: [Errno 2] No such file or directory: 'largeFIle.npy'
When you launch a new notebook on Colab, it connects you to a remote machine for 12 hours, and all you have there is the notebook and the preloaded functions. To access your folders on Drive, you need to connect the remote instance to your Drive and authenticate it.
This bugged me for some time when I was beginning too, so I'm creating a gist and I'll update it as I learn more. For your case, check out section 2 (Connecting with Drive). You don't have to edit or understand anything; just copy the cell and run it. It will run a bunch of functions and then give you an authentication link. Go to that link, sign in with Google, and you'll get an access token there. Paste it back into the input box and press Enter. If it doesn't work or there's some error, run the cell again.
In the next part I mount my drive to the folder '/drive'. So now everything that's on your Drive exists in this folder, including your notebook. Next, you can change your working directory; I keep all my notebooks in the '/Colab' folder, so edit it accordingly.
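A minimal sketch of those two steps (mount, then change the working directory), following the '/drive' mount point and folder names described above (both are assumptions; adjust them to your setup):

from google.colab import drive
import os
import numpy as np

drive.mount('/drive')  # prompts for the authentication link and access token

# work from the Drive folder that holds the notebook and the .npy file
os.chdir('/drive/My Drive/Colab Notebooks')

dataset = np.load('largeFIle.npy', encoding='bytes')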
Hope it helps you. Feel free to suggest edits to the gist as you learn more. :)
Have you set up your Google Drive with Google Colab using this method? After mounting Google Drive, use the command below for your problem (assuming you have stored largeFIle.npy in the Colab Notebooks folder):
dataset = np.load('drive/Colab Notebooks/largeFIle.npy', encoding='bytes')

Watson Data Platform how to unzip the zip file in the data assets

How to unzip the zip file in the data assets of the Watson Data Platform?
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
zip_ref.extractall(WHICH DIRECTORY FOR THE DATA ASSETS)
zip_ref.close()
streaming_body_1 is the zip file streaming body object in the DATA ASSETS section. I uploaded the zip file to the DATA ASSETS.
How can I unzip the zip file in the Data Assets? I don't know the exact key path of the DATA ASSETS section.
I am trying to do this in the Jupyter notebook of the project.
Thank you!
When you upload a file to your project, it is stored in the project's assigned cloud storage, which should now be Cloud Object Storage by default (check your project settings). To work with uploaded files (which are just one type of data asset; there are others) in a notebook, you first have to download the file from cloud storage to make it accessible in the kernel's file system, and then perform the desired file operation (e.g. read, extract, ...).
Assuming you've uploaded your ZIP file, you should be able to generate code that reads it using the tooling:
click the 1010 (Data) icon on the upper right-hand side
select "Insert to code" > "Insert StreamingBody object"
consume the StreamingBody as desired
I ran a quick test and it worked like a charm:
...
# "Insert StreamingBody object" generated code
...
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
print(zip_ref.namelist())
zip_ref.close()
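To actually extract the archive rather than just list it, pass a directory on the kernel's local filesystem to extractall; a minimal sketch (the 'extracted' directory name is an arbitrary choice):

zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
zip_ref.extractall('extracted')  # lands in ./extracted on the notebook kernel's local filesystem
zip_ref.close()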
Edit 1: If your archive is a compressed tar file use the following code instead:
...
# "Insert StreamingBody object" generated code
...
import tarfile
from io import BytesIO
tf = tarfile.open(fileobj=BytesIO(streaming_body_1.read()), mode="r:gz")
tf.getnames()
Edit 2: To avoid the read timeout you'll have to change the generated code from
config=Config(signature_version='oauth'),
to
config=Config(signature_version='oauth',connect_timeout=50, read_timeout=70),
With those changes in place I was able to download and extract training_data.tar.gz from the repo you've mentioned.
