Add a folder with ~20K of images into Google Colaboratory - keras

I am working on cat breed recognition with Keras and am trying to use Google Colaboratory to train on a GPU. When I worked in PyCharm, I used a path to the folder with images:
data_dir = '//home//kate//Рабочий стол//барахло линух минт//more_breeds_all_new'
I cannot figure out how to upload a folder with 19500 images to Colab, instead of loading pictures there one by one as Google suggests in its notebook.
I also have a folder with these images on Google Drive, but I do not know how to use it as a whole folder via its path either.

First: zip the image folder into a .zip or .tar archive, for example folder_data.zip,
and sync or upload it (folder_data.zip) to Google Drive.
Get the Google Drive file ID of the zip file (folder_data.zip), e.g. 1iytA1n2z4go3uVCwE_vIKouTKyIDjEq
Second:
I recommend using PyDrive to download your file from Google Drive to the Colab notebook VM. Downloading a 500 MB dataset took me about 5 s.
1. Install PyDrive
!pip install PyDrive
2. OAuth
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Code for downloading the file from Google Drive:
fileId = drive.CreateFile({'id': 'DRIVE_FILE_ID'})  # DRIVE_FILE_ID is the file id, e.g. 1iytA1n2z4go3uVCwE_vIKouTKyIDjEq
print(fileId['title'])  # folder_data.zip
fileId.GetContentFile('folder_data.zip')  # save the Drive file as a local file
Finally: unzip it into a folder, for example:
!unzip folder_data.zip -d ./
The file listing then looks like:
folder_data.zip
folder_data/
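If you prefer to stay in Python, the archive can also be extracted with the standard-library zipfile module. This is a minimal sketch; the in-memory archive here only stands in for the folder_data.zip downloaded from Drive above:

```python
import io
import os
import zipfile

# Build a tiny stand-in archive in memory (in Colab this would be the
# folder_data.zip fetched from Drive above).
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('folder_data/cat_0001.jpg', b'not really a jpeg')
buf.seek(0)

# Members keep their folder structure, so folder_data/ appears here.
with zipfile.ZipFile(buf) as zf:
    zf.extractall('.')

print(os.path.exists('folder_data/cat_0001.jpg'))  # → True
```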
Cheers, Mo Jihad

Related

accessing in the sub-folder of a folder in python

What I am trying to do is access the images saved in my path, that is E:/project/plane, but I am unable to get access.
I tried using glob.glob, but all I get is the subfolder itself, not the images inside it.
I also tried taking the name of the subfolder as an input and combining it with the path, but I still cannot access the folder.
Can anyone help me with how to achieve this task?
Here is my Python code:
import os
import glob
import cv2
path = "E:\project\%s" % input("filename")
print(path)
for folder in glob.glob(path + '*', recursive=True):
    print(folder)
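For what it's worth, a recursive pattern with '**' does reach files inside subfolders. A sketch on a throwaway directory tree (the temp dir only mimics the E:/project layout from the question):

```python
import glob
import os
import tempfile

# Mimic E:/project/plane/<images> with a throwaway tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'plane'))
open(os.path.join(root, 'plane', 'img1.jpg'), 'w').close()

# '**' together with recursive=True descends into subfolders.
hits = glob.glob(os.path.join(root, '**', '*.jpg'), recursive=True)
print([os.path.basename(h) for h in hits])  # → ['img1.jpg']
```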

How to open a file in Google App Engine using python 3.5?

I am able to load my txt file using the line below on my local machine:
lines = open(args['train_file1'], mode='r').read().split('\n')
args is a dict which holds the path of the training file.
Now I have changed the working Python version to 3.5 and I am getting the error below. I am clueless why this error occurs; the file is present in that directory.
FileNotFoundError: [Errno 2] No such file or directory: 'gs://bot_chat-227711/data/movie_lines.txt'
If I understood your question correctly, you are trying to read a file from Cloud Storage in App Engine.
You cannot do so directly with the open function, as files in Cloud Storage are located in buckets in the cloud. Since you are using Python 3.5, you can use the Python client library for GCS in order to work with files located in GCS.
This is a small example, that reads your file located in your Bucket, in a handler on an App Engine application:
from flask import Flask
from google.cloud import storage

app = Flask(__name__)

@app.route('/openFile')
def openFile():
    client = storage.Client()
    bucket = client.get_bucket('bot_chat-227711')
    blob = bucket.get_blob('data/movie_lines.txt')
    your_file_contents = blob.download_as_string()
    return your_file_contents

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)
Note that you will need to add the line google-cloud-storage to your requirements.txt file in order to import and use this library.

Accessing files on google colab

I'm using Google Colaboratory IPython to do style transfer. After mounting my drive by running:
from google.colab import drive
drive.mount('/drive')
it was mounted, so I tried to cd into a directory, show the pwd and ls, but it doesn't display the correct pwd:
!cd "/content/drive/My Drive/"
!pwd
!ls
It won't cd into the given directory; it only stays in content/.
Also, when I tried accessing some images using a load_image() function in my code like below:
def load_image(img_path, max_size=400, shape=None):
    image = Image.open(img_path).convert('RGB')
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)
    if shape is not None:
        size = shape
    in_transform = transforms.Compose([transforms.Resize(size),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.485, 0.456, 0.406),
                                                            (0.229, 0.224, 0.225))])
    image = in_transform(image)[:3, :, :].unsqueeze(0)
    return image

# load image content
content = load_image('content/drive/My Drive/uche.jpg')
style = load_image('content/drive/My Drive/uche.jpg')
But this code throws an error when I try to load an image from the directory, saying:
FileNotFoundError: [Errno 2] No such file or directory: 'content/drive/My Drive/uche.jpg'
Short answer: To change the working directory, use %cd or os.chdir rather than !cd.
The back-story is that ! commands are executed in a subshell, with its own independent working directory from the Python process running your code. But, what you want is to change the working directory of the Python process. That's what os.chdir will do, and %cd is a convenient alias that works in notebooks.
Putting it together, I think you want to write:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My\ Drive
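The subshell behaviour is easy to reproduce outside Colab as well. A small sketch: a shell cd run in a child process leaves the Python process's working directory untouched, while os.chdir actually moves it:

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()

# Like `!cd ...`: the cd happens in a child shell, which then exits.
subprocess.run('cd "%s"' % target, shell=True)
print(os.getcwd() == start)  # → True: this process never moved

# Like `%cd ...` (or os.chdir): changes this process's directory.
os.chdir(target)
print(os.getcwd() == start)  # → False
```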

How to run PyPdf2 fileMerger within a shared folder on a network drive

I am trying to merge multiple files using PyPDF2 within a folder on my office's shared drive. However, my program never finishes running, and I believe it does not have permission to access the folder. Is there a way to allow access to it?
from PyPDF2 import PdfFileMerger
import os

path = "H:\\Accounting\\ME\\Attachments\\Pdf"
pdf_files = ["\\file1.pdf", "\\file2.pdf"]
merger = PdfFileMerger()
for files in pdf_files:
    merger.append(path + files)
if not os.path.exists(path + "\\newMerged.pdf"):
    merger.write(path + "\\newMerged.pdf")
merger.close()
print("done")
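One way to test the permission theory before merging is to ask whether the Python process can see and write the folder at all. A diagnostic sketch (the H: path is the one from the question; on a machine without that drive mapping it simply reports (False, False), which is itself a useful signal):

```python
import os

def check_access(path):
    """Return (exists, readable_and_writable) for a directory."""
    return os.path.exists(path), os.access(path, os.R_OK | os.W_OK)

# The share from the question; replace with your own path to diagnose.
print(check_access("H:\\Accounting\\ME\\Attachments\\Pdf"))
```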

Creating a Spark RDD from a file located in Google Drive using Python on Colab.Research.Google

I have been successful in running a Python 3 / Spark 2.2.1 program on Google's Colab.Research platform:
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
!tar xf spark-2.2.1-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.2.1-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
This works perfectly when I upload text files from my local computer to the Unix VM using
from google.colab import files
datafile = files.upload()
and read them as follows :
textRDD = spark.read.text('hobbit.txt').rdd
so far so good ..
My problem starts when I try to read a file that is lying in my Google Drive Colab directory.
Following the instructions, I have authenticated the user and created a drive service:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
after which I have been able to access the file lying in the drive as follows :
file_id = '1RELUMtExjMTSfoWF765Hr8JwNCSL7AgH'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
Downloaded file contents are: b'The king beneath the mountain\r\nThe king of ......
even this works perfectly ..
downloaded.seek(0)
print(downloaded.read().decode('utf-8'))
and gets the data
The king beneath the mountain
The king of carven stone
The lord of silver fountain ...
Where things FINALLY GO WRONG is when I try to grab this data and put it into a Spark RDD:
downloaded.seek(0)
tRDD = spark.read.text(downloaded.read().decode('utf-8'))
and I get the error:
AnalysisException: 'Path does not exist: file:/content/The king beneath the mountain\ ....
Evidently, I am not using the correct method/parameters to read the file into Spark. I have tried quite a few of the methods described.
I would be very grateful if someone could help me figure out how to read this file for subsequent processing.
A complete solution to this problem is available in another StackOverflow question that is available at this URL.
Here is the notebook where this solution is demonstrated.
I have tested it and it works!
It seems that spark.read.text expects a file name, but you are giving it the file content instead. You can try either of these:
save it to a file, then pass the file name
use just downloaded instead of downloaded.read().decode('utf-8')
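The first option can be sketched like this; the BytesIO below only stands in for the downloaded object built earlier, and the Spark call is left commented because it needs the live session:

```python
import io

# Stand-in for the io.BytesIO filled by MediaIoBaseDownload above.
downloaded = io.BytesIO(b'The king beneath the mountain\nThe king of carven stone\n')

# Persist the bytes to a local file so Spark can be handed a *path*.
downloaded.seek(0)
with open('hobbit.txt', 'wb') as f:
    f.write(downloaded.read())

# tRDD = spark.read.text('hobbit.txt').rdd  # now the argument is a file name
print(open('hobbit.txt').readline().strip())  # → The king beneath the mountain
```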
You can also simplify downloading from Google Drive with pydrive. I gave an example here.
https://gist.github.com/korakot/d56c925ff3eccb86ea5a16726a70b224
Downloading is just
fid = drive.ListFile({'q':"title='hobbit.txt'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
f.GetContentFile('hobbit.txt')
