I am using Google Colaboratory to run my machine learning project. I wanted to import a .csv file into pandas for further processing, but I am getting an error saying the file is not found. Do I need to provide any authorization to access the file, or is it mandatory to upload the file into Google Colab? The file already exists in the same Google Drive folder as the .ipynb notebook.
Code: a pandas read_csv call to read the file
Error: the file could not be located
Do I need to provide any authentication or something like that?
See the I/O sample notebook for examples showing how to work with local files and those stored on Drive.
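As a minimal sketch (the file name and path below are illustrative), you can mount your Drive in the notebook and then read the .csv with an absolute path under the mount point:

# Mount Google Drive into the Colab runtime, then read the CSV with pandas.
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')  # will prompt for authorization the first time
df = pd.read_csv('/content/drive/MyDrive/my_data.csv')  # hypothetical file name/path
df.head()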
I connected Google Colab with my Google Drive
from google.colab import drive
drive.mount('/content/drive/')
and I can already see my local Google Drive folders and files
pwd
>> /content
ls drive/MyDrive/
>> MyFolder
But now, how can I import local modules I have within /MyFolder/SubFolder/Source? Will I have to mark all the directories in between as Python packages by adding __init__.py to each of them?
Currently, my notebook is located within /MyFolder, so I can easily import my modules with
from SubFolder.source.mypersonalmodule import *
ALTERNATIVELY
Is there a way to run my notebook from /content/drive/MyDrive/MyFolder/?
Thank you for your help!
Just specify the path to your file starting from drive.MyDrive.
For example, if I had a file test_function.py in the root of my Google Drive with the function square in it, I can import it by
from drive.MyDrive.test_function import square
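A minimal sketch of the whole flow, assuming the same mount point and a hypothetical test_function.py in the root of your Drive:

# Mount Drive, then import a module stored on it. This works because the
# notebook's working directory /content is on sys.path, so drive/MyDrive/...
# resolves as a (namespace) package path without needing __init__.py files.
from google.colab import drive
drive.mount('/content/drive')

from drive.MyDrive.test_function import square  # hypothetical module and function
print(square(4))  # -> 16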
And yes, you are able to run your Jupyter Notebooks from anywhere in your Google Drive. Just find the file, click on it, and click on "Open with Google Colaboratory" at the top when Google says there is "No preview available".
I am doing some work on Covid-19 and I had to access .csv files on GitHub (to be precise, the URL is https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series).
So I went to this page and downloaded the .csv files that interested me directly to my hard drive: C:\Users\... .csv
Then I import these files as pandas dataframes into a Jupyter notebook to work with them in Python, for example: dataD = pd.read_csv('C:/Users/path_of_my_file_on_my_computer...').
It all works very well.
To make it easier to collaborate with other people, I was told that I should put the .csv files not on my C: drive but on Google Drive (https://drive.google.com/drive/my-drive), put the .ipynb files I created in Jupyter notebook there as well, and then give access to the people concerned.
So I created a folder on my Drive (say, Covid-19) to put these .csv files in, but I don't understand what kind of Python code I am supposed to write at the beginning of my file to replace the simple previous instruction dataD = pd.read_csv('C:/Users/path_of_my_file_on_my_computer...'), so that the program reads the data directly from my Google Drive and no longer from my C: drive.
I have looked at various posts that seem to speak more or less about this issue, but I don't really understand what to do.
I hope my question is clear enough (I am attaching a picture of the situation in my Google Drive, assuming it provides useful information; it's in French).
Given that your files are already hosted in the cloud and you are planning a collaborative scenario, I think the idea proposed by @Eric is actually smarter.
Approach 1:
Otherwise, if you can't rely on that data source, you will have to build an authorization flow for your script to access Google Drive resources. You can find here complete documentation on how to build a Python script that interacts with the Google Drive API.
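As a rough, non-authoritative sketch of such a flow (assuming the google-api-python-client and google-auth-oauthlib packages, an OAuth client file credentials.json downloaded from the Google Cloud console, and a placeholder file ID):

# Authorize against the Drive API, download the CSV into memory, and load it
# with pandas. credentials.json and YOUR_FILE_ID are placeholders.
import io
import pandas as pd
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
creds = flow.run_local_server(port=0)
service = build('drive', 'v3', credentials=creds)

request = service.files().get_media(fileId='YOUR_FILE_ID')
buf = io.BytesIO()
downloader = MediaIoBaseDownload(buf, request)
done = False
while not done:
    _, done = downloader.next_chunk()

buf.seek(0)
dataD = pd.read_csv(buf)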
Approach 2:
Although the Google Drive API requires authorization to access file URLs, you can build a workaround. Google Drive generates export links that, if your file is publicly available, are accessible without authorization. In this Stack Overflow answer you can find more details about it.
In your Python script you can then request that URL directly, without touching the file system or going through the Google Drive authorization flow.
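A minimal sketch of that workaround, assuming the file has been shared as "Anyone with the link" and FILE_ID is copied from its sharing link:

# Read a publicly shared Google Drive file through its export/download URL.
import pandas as pd

FILE_ID = 'your_file_id_here'  # placeholder
url = f'https://drive.google.com/uc?export=download&id={FILE_ID}'
dataD = pd.read_csv(url)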
We have a few .py files on my local machine that need to be stored/saved on a FileStore path in DBFS. How can I achieve this?
I tried the copy actions of the dbutils.fs module.
I tried the code below but it did not work; I know something is not right with my source path. Or is there a better way of doing this? Please advise.
'''
dbUtils.fs.cp ("c:\\file.py", "dbfs/filestore/file.py")
'''
It sounds like you want to copy a file from your local machine to a DBFS path on the Azure Databricks servers. However, because the Azure Databricks notebook is a browser-based interface to machines running in the cloud, code in the notebook cannot directly operate on files on your local machine.
So here are some solutions you can try.
As @Jon said in the comment, you can follow the official Databricks CLI document to install the Databricks CLI locally via pip install databricks-cli and then copy a file to DBFS.
Follow the official document Accessing Data to import data via the "Drop files into or browse to files in the Import & Explore Data" box on the landing page, although using the CLI is still recommended.
Upload your files to Azure Blob Storage, then follow the official document Data sources / Azure Blob Storage to do operations such as dbutils.fs.cp, as sketched below.
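A hedged sketch of that last option, run inside a Databricks notebook (spark and dbutils are provided by the notebook runtime; the angle-bracketed names are placeholders):

# Give the cluster the storage account key, then copy the uploaded file from
# Blob Storage into the FileStore folder of DBFS.
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    "<storage-account-access-key>")

dbutils.fs.cp(
    "wasbs://<container>@<storage-account>.blob.core.windows.net/file.py",
    "dbfs:/FileStore/file.py")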
Hope it helps.
I downloaded images from a URL using urlretrieve (urllib) in Google Colab. However, after downloading them, I am not able to locate the images.
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'my-images/'
Now you can access your Google Drive as a file system and use standard Python commands to read and write files. Don't forget to prefix any paths you use with base_dir. Here the base_dir variable expects a folder named my-images at the location pointed to by root_dir.
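A small sketch tying this back to the question (the URL and file names are just examples):

# Download an image with urlretrieve directly into the Drive-backed folder.
import os
from urllib.request import urlretrieve

os.makedirs(base_dir, exist_ok=True)  # create my-images/ on Drive if it does not exist
urlretrieve("https://example.com/sample.jpg",       # hypothetical source URL
            os.path.join(base_dir, "sample.jpg"))   # lands in Google Drive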
Google Colab lets you access and use your Google Drive as storage for a time-constrained runtime session. It is not disk storage on your local machine or physical computer. Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours.
Notebooks will also disconnect from VMs when left idle for too long, which in turn also disconnects your Google Drive session. The code above only shows one of many ways to mount your Google Drive in your Colab runtime's virtual machine.
For more documentation, refer to this link and the FAQ.
Suppose your working Google Drive folder is
Colab Notebooks/STUDY/
Its actual path is drive/My Drive/Colab Notebooks/STUDY
1) First, mount and authenticate yourself with the following code:
from google.colab import drive
drive.mount('/content/drive')
2) Second, change your current folder to the working folder STUDY:
import os
os.chdir("drive/My Drive/Colab Notebooks/STUDY")
os.listdir()
Done!
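With the working directory changed, files in STUDY can now be read with relative paths; for example (data.csv is a hypothetical file in that folder):

import pandas as pd
df = pd.read_csv("data.csv")  # resolved relative to drive/My Drive/Colab Notebooks/STUDY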
I'm using Databricks on Azure and am using a library called OpenPyXl.
I'm running the sample code shown here, and the last line of the code is:
wb.save('document.xlsx', as_template=False)
The code seems to run, so I'm guessing it's storing the file somewhere on the cluster. Does anyone know where, so that I can then transfer it to Blob storage?
To save a file to the FileStore, put it in the /FileStore directory in DBFS:
dbutils.fs.put("/FileStore/my-stuff/my-file.txt", "Contents of my
file")
Note: The FileStore is a special folder within the Databricks File System (DBFS) where you can save files and have them accessible from your web browser.
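To get the workbook from the question into the FileStore, one hedged option (paths are illustrative) is to save it to the driver's local disk first and then copy it into DBFS:

# Save the openpyxl workbook locally on the driver, then copy it into FileStore.
wb.save("/tmp/document.xlsx")
dbutils.fs.cp("file:/tmp/document.xlsx", "dbfs:/FileStore/document.xlsx")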
For more details, refer to "Databricks - The FileStore".
Hope this helps.