Accessing files on Google Colab

I'm using Google Colaboratory (IPython) to do style transfer. After mounting my drive by running:
from google.colab import drive
drive.mount('/drive')
The drive mounted successfully, so I tried to cd into a directory and check pwd and ls, but it doesn't display the correct working directory:
!cd "/content/drive/My Drive/"
!pwd
!ls
but it won't cd into the given directory; it stays in /content/.
I also tried accessing some images using a load_image() function in my code, shown below:
from PIL import Image
from torchvision import transforms

def load_image(img_path, max_size=400, shape=None):
    image = Image.open(img_path).convert('RGB')
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)
    if shape is not None:
        size = shape
    in_transform = transforms.Compose([transforms.Resize(size),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.485, 0.456, 0.406),
                                                            (0.229, 0.224, 0.225))])
    image = in_transform(image)[:3, :, :].unsqueeze(0)
    return image
# load the content and style images
content = load_image('content/drive/My Drive/uche.jpg')
style = load_image('content/drive/My Drive/uche.jpg')
But this code throws an error when I try to load an image from the directory:
FileNotFoundError: [Errno 2] No such file or directory: 'content/drive/My Drive/uche.jpg'

Short answer: To change the working directory, use %cd or os.chdir rather than !cd.
The back-story is that ! commands are executed in a subshell, with its own working directory, independent of the Python process running your code. What you want is to change the working directory of the Python process itself. That's what os.chdir does, and %cd is a convenient alias that works in notebooks.
Putting it together, I think you want to write:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My\ Drive
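Equivalently, a minimal sketch using os.chdir directly and verifying the result (this assumes the standard /content/drive mount point used above):
import os

os.chdir('/content/drive/My Drive')   # change the working directory of the Python process
print(os.getcwd())                    # should print /content/drive/My Drive
print(os.listdir('.'))                # files here are now reachable via relative paths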

Related

Accessing files in the sub-folder of a folder in Python

What I am trying to do is access the images saved under my path, E:/project/plane, but I am unable to get to them.
I tried using glob.glob, but all I get is the subfolder itself, not the images inside it.
I also tried taking the name of the subfolder as input and combining it with the path, but I still can't access the files.
Can anyone help me with how to achieve this?
Here is my Python code:
import os
import glob
import cv2
path= "E:\project\%s"% input("filename")
print(path)
for folder in glob.glob(path + '*' , recursive=True):
print(folder)
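For reference, a minimal sketch of one common fix, assuming the images are .png files inside a subfolder such as E:/project/plane: glob for an image pattern inside the joined path, rather than for the folder itself.
import os
import glob

folder_name = input("filename: ")                      # e.g. "plane"
folder_path = os.path.join("E:\\project", folder_name)

# match the image files inside the subfolder, not the subfolder itself
for img_path in glob.glob(os.path.join(folder_path, "*.png")):
    print(img_path)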

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\DELL\\coil-20-unproc\\obj1_0.png'

I am trying to divide an image dataset into train and test sets. To do this I am copying the images from one folder to another in Python, giving the addresses of both source and destination. But the code fails with the above error: it cannot find the image files to copy, even though I have given the correct image address, which is "C:\Users\DELL\coil-20-unproc\imagename". It still can't copy the images.
import os
import shutil

original_dataset_dir = r"C:\Users\DELL\coil-20-unproc"

# Copy object1 images to train_obj1_dir
fnames = ['obj1_{}.png'.format(i) for i in range(0, 72)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_obj1_dir, fname)
    shutil.copyfile(src, dst)
Jupyter Notebook and JupyterLab resolve relative paths from the directory they were started in. You can try these:
Copy the file to your startup directory.
(You can run !pwd in a cell to find out what your startup directory is.)
Create a link from a file in your startup directory to that file.
Alternatively, verify the absolute path from Python before copying, as sketched below.
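A minimal sketch of that check, reusing the path from the question; os.path.exists and os.getcwd are standard library:
import os

src = os.path.join(r"C:\Users\DELL\coil-20-unproc", "obj1_0.png")

print(os.getcwd())           # where relative paths currently resolve
print(os.path.exists(src))   # False means the file is not at that absolute path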

Colab: No such file or directory

I just ran into a problem with creating directories on Colab.
First, I checked the current working directory:
import os
!pwd
/content/
Then, I created a sub-directory and checked it as follows.
data_path = '/content/kaggle_original_data_cats_dogs'
!mkdir data_path
!ls /content/
adc.json datalab data_path sample_data sampleSubmission.csv train.zip
Here, we can see that data_path is the third member of /content/.
However, when I tried to change the working directory to data_path, I got:
os.chdir(data_path)
FileNotFoundError: [Errno 2] No such file or directory: '/content/kaggle_original_data_cats_dogs'
So far I can't figure out what happened. Is there anything wrong with the lines above?
The mistake is here:
!mkdir data_path
This creates a directory literally named data_path, because the shell does not see the Python variable. To expand the variable into the shell command, prefix it with $:
!mkdir $data_path
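Alternatively, a sketch that stays in Python and avoids shell variable expansion altogether (os.makedirs is standard library):
import os

data_path = '/content/kaggle_original_data_cats_dogs'
os.makedirs(data_path, exist_ok=True)   # create the directory from Python directly
os.chdir(data_path)                     # now this succeeds
print(os.getcwd())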

Creating a Spark RDD from a file located in Google Drive using Python on Colab.Research.Google

I have been successful in running a Python 3 / Spark 2.2.1 program on Google's colab.research platform:
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
!tar xf spark-2.2.1-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.2.1-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
This works perfectly when I upload text files from my local computer to the Unix VM using
from google.colab import files
datafile = files.upload()
and read them as follows:
textRDD = spark.read.text('hobbit.txt').rdd
So far so good.
My problem starts when I try to read a file that lives in my Google Drive Colab directory.
Following the instructions, I authenticated the user and created a drive service:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
after which I was able to access the file in the drive as follows:
file_id = '1RELUMtExjMTSfoWF765Hr8JwNCSL7AgH'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
Downloaded file contents are: b'The king beneath the mountain\r\nThe king of ......
Even this works perfectly:
downloaded.seek(0)
print(downloaded.read().decode('utf-8'))
and gets the data
The king beneath the mountain
The king of carven stone
The lord of silver fountain ...
Where things finally go wrong is when I try to grab this data and put it into a Spark RDD:
downloaded.seek(0)
tRDD = spark.read.text(downloaded.read().decode('utf-8'))
and I get the error:
AnalysisException: 'Path does not exist: file:/content/The king beneath the mountain\ ....
Evidently, I am not using the correct method/parameters to read the file into Spark, and I have tried quite a few of the methods described.
I would be very grateful if someone could help me figure out how to read this file for subsequent processing.
A complete solution to this problem is available in another Stack Overflow question, along with a notebook where the solution is demonstrated. I have tested it and it works!
It seems that spark.read.text expects a file name, but you are giving it the file contents instead. You can try either of these:
save the contents to a file, then pass the file name (see the sketch below)
use just downloaded instead of downloaded.read().decode('utf-8')
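A minimal sketch of the first option, reusing the downloaded BytesIO buffer from the question:
downloaded.seek(0)
with open('hobbit.txt', 'wb') as f:
    f.write(downloaded.read())                  # persist the buffer to a local file

textRDD = spark.read.text('hobbit.txt').rdd     # pass the file name, not the contents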
You can also simplify downloading from Google Drive with pydrive. I gave an example here:
https://gist.github.com/korakot/d56c925ff3eccb86ea5a16726a70b224
Downloading is just:
fid = drive.ListFile({'q':"title='hobbit.txt'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
f.GetContentFile('hobbit.txt')
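Once GetContentFile has written the local copy, it can be read the same way as the uploaded file earlier in the question:
textRDD = spark.read.text('hobbit.txt').rdd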

How to add a module folder / tar.gz to nodes in PySpark

I am running PySpark in IPython Notebook after doing the following configuration:
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
export PYSPARK_PYTHON=/usr/bin/python
I have a custom UDF, which makes use of a module called mzgeohash. But I am getting a module-not-found error; I guess this module is missing on the workers/nodes. I tried sc.addPyFile and so on. What is an effective way to add a cloned folder or tar.gz Python module in this case, from IPython?
Here is how I do it; the basic idea is to create a zip of all the files in your module and pass it to sc.addPyFile():
import os
import zipfile
import uuid

def ziplib():
    libpath = os.path.dirname(__file__)                      # this should point to your package's directory
    zippath = '/tmp/mylib-' + uuid.uuid4().hex[:6] + '.zip'  # random filename in a writable directory
    zf = zipfile.PyZipFile(zippath, mode='w')
    try:
        zf.debug = 3         # make it verbose, good for debugging
        zf.writepy(libpath)  # compile and add all .py files under libpath
        return zippath       # return path to the generated zip archive
    finally:
        zf.close()

...
zip_path = ziplib()      # generate a zip archive containing your lib
sc.addPyFile(zip_path)   # add the entire archive to the SparkContext
...
os.remove(zip_path)      # don't forget to remove the temporary file, preferably in a "finally" clause
