How can I load local images to train a model in TensorFlow - python-3.x

I am trying to build a CNN to differentiate between a car and a bicycle, following the horse-or-human example in Laurence's video here. Instead of loading the data from a library, I downloaded close to 5000 images of cars and bicycles and arranged them in folders as suggested in the video. But how do I load these local files to train my model? I am trying to use the code below, but it gives me a file-not-found exception. Here is the link to the colab I am working in.
import os
# Directory with our training cycle pictures
train_cycle_dir = os.path.join('C:/Users/User/Desktop/Tensorflow/PrivateProject/Images/training/cycle')
# Directory with our training car pictures
train_car_dir = os.path.join('C:/Users/User/Desktop/Tensorflow/PrivateProject/Images/training/cars')
# Directory with our validation cycle pictures
validation_cycle_dir = os.path.join('C:/Users/User/Desktop/Tensorflow/PrivateProject/Images/validation/cycle')
# Directory with our validation car pictures
validation_car_dir = os.path.join('C:/Users/User/Desktop/Tensorflow/PrivateProject/Images/validation/cars')

You can't access files that are on your own computer directly from Colab. If you have enough space on your Google Drive, upload the images there and mount the drive, either like here or with the "Mount Drive" button on the files sidebar. The sidebar also has a button to upload files from your computer, but anything uploaded straight to Colab is lost after about 12 hours when the runtime resets, so you would have to upload it again. (You can read about it here.)
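A minimal sketch of that workflow, assuming the images were uploaded to an Images folder in the root of My Drive (an assumed layout, not something stated in the question):
import os
from google.colab import drive

# Mount Google Drive into the Colab runtime
drive.mount('/content/drive')

# Assumed Drive layout: My Drive/Images/{training,validation}/{cycle,cars}
base_dir = '/content/drive/My Drive/Images'

train_cycle_dir = os.path.join(base_dir, 'training', 'cycle')
train_car_dir = os.path.join(base_dir, 'training', 'cars')
validation_cycle_dir = os.path.join(base_dir, 'validation', 'cycle')
validation_car_dir = os.path.join(base_dir, 'validation', 'cars')

print(os.listdir(train_cycle_dir)[:5])  # quick sanity check that the paths resolve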

Related

Is there a way to use sklearn.datasets.load_files for image files

Trying to use custom folders with images instead of X, y = sklearn.datasets.load_digits(return_X_y=True) for sklearn image classification tasks.
load_files does what I need, but it seems to be designed for text files. Any tips for working with image files would be appreciated.
I have the image files stored in following structure
DataSet/label1/image1.png
DataSet/label1/image2.png
DataSet/label1/image3.png
DataSet/label2/image1.png
DataSet/label2/image2.png
I had the same task and found this thread: Using sklearn load_files() to load images from png as data
Hopefully, this helps you too.
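For reference, here is a minimal sketch of one way to do this without load_files, using PIL directly; the DataSet path and label folders follow the example structure above, and the resize size is an arbitrary choice:
import os
import numpy as np
from PIL import Image

def load_image_dataset(root='DataSet', size=(64, 64)):
    # Treat each subdirectory of root as one class label
    X, y = [], []
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue
        for fname in os.listdir(label_dir):
            img = Image.open(os.path.join(label_dir, fname)).convert('RGB').resize(size)
            X.append(np.asarray(img, dtype=np.float32) / 255.0)
            y.append(label)
    return np.stack(X), np.array(y)

X, y = load_image_dataset('DataSet')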

Google COLAB free version saving Keras trained model

I saved a trained Keras model in the free version of Google Colab:
model.save("my_model.h5")
I tried to load the model back using the method below:
from keras.models import load_model
model = load_model('my_model.h5')
But it throws this error:
OSError: Unable to open file (unable to open file: name = 'my_model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Will I be able to retrieve the saved model from the free Colab version? Can anyone help with this?
I checked similar questions on Stack Overflow, but I think those answers apply to the Colab Pro version.
Otherwise, do I have to save the model to a specific path on a local drive while training?
What is the problem
You are storing your model in the runtime's storage, not in your Google Drive. After about 12 hours the runtime is automatically deleted together with its data, so you have to save the model to Google Drive.
How to store the model in Google Drive
First, mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Now you will find a file explorer on the left side that contains a drive directory; going inside that directory takes you to your Google Drive.
Suppose I want to put my model directly in My Drive; then:
from keras.models import load_model
MODEL_PATH = './drive/My Drive/model.h5'
# Now save model in drive
model.save(MODEL_PATH)
# Load Model
model = load_model(MODEL_PATH)
When you open your Drive, you will find the file model.h5 there.

using keras's .flow_from_directory() on mounted s3 bucket in databricks

I am trying to build a convolutional neural network in Databricks, in Python, using Spark 2.4.4 with a Scala 2.11 backend. I have built CNNs before, but this is my first time using Spark (Databricks) and AWS S3.
The files in AWS are ordered like this:
train_test_small/(train or test)/(0,1,2 or 3)/
and then a list of images in every directory corresponding to their category (0, 1, 2 or 3).
To access the files stored in the S3 bucket, I mounted the bucket in Databricks like this:
# load in the image files
AWS_BUCKET_NAME = "sensored_bucket_name/video_topic_modelling/data/train_test_small"
MOUNT_NAME = "train_test_small"
dbutils.fs.mount("s3a://%s" % AWS_BUCKET_NAME, "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s" % MOUNT_NAME))
Upon using: display(dbutils.fs.mounts()) I can see the bucket mounted to:
MountInfo(mountPoint='/mnt/train_test_small', source='sensored_bucket_name/video_topic_modelling/data/train_test_small', encryptionType='')
I then try to access this mounted directory through Keras's flow_from_directory() method using the following piece of code:
# create extra partition of the training data as a validation set
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input, validation_split=0)  # included in our dependencies
# set scaling to most common shapes
train_generator = train_datagen.flow_from_directory('/mnt/train_test_small',
                                                    target_size=(320, 240),
                                                    color_mode='rgb',
                                                    batch_size=96,
                                                    class_mode='categorical',
                                                    subset='training')
                                                    # shuffle=True)
validation_generator = train_datagen.flow_from_directory('/mnt/train_test_small',
                                                         target_size=(320, 240),
                                                         color_mode='rgb',
                                                         batch_size=96,
                                                         class_mode='categorical',
                                                         subset='validation')
However this gives me the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/train_test_small/train/'
I tried to figure this out using the Keras and Databricks documentation but got no further. My best guess right now is that Keras's flow_from_directory() cannot see mounted directories, but I am not sure.
Does anyone know how to apply .flow_from_directory() to an S3-mounted directory in Databricks, or know a good alternative? Help would be much appreciated!
I think you may be missing one more directory level in the path you pass to flow_from_directory. From the Keras documentation:
directory: string, path to the target directory. It should contain one subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside each of the subdirectories directory tree will be included in the generator.
# set scaling to most common shapes
train_generator = train_datagen.flow_from_directory(
    '/mnt/train_test_small/train',  # <== add "train" folder
    target_size=(320, 240),
    ...
validation_generator = train_datagen.flow_from_directory(
    '/mnt/train_test_small/test',  # <== add "test" folder
    target_size=(320, 240),
    ....
Answer found. To access the folder via a direct filesystem path, prefix the mount point with /dbfs, e.g. /dbfs/mnt/train_test_small/train/.
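Putting both fixes together, a minimal sketch of the corrected generator calls (the settings are copied from the question; the paths assume the mount layout described above):
train_generator = train_datagen.flow_from_directory(
    '/dbfs/mnt/train_test_small/train',  # /dbfs prefix plus the "train" level
    target_size=(320, 240),
    color_mode='rgb',
    batch_size=96,
    class_mode='categorical')
validation_generator = train_datagen.flow_from_directory(
    '/dbfs/mnt/train_test_small/test',  # same idea for the "test" folder
    target_size=(320, 240),
    color_mode='rgb',
    batch_size=96,
    class_mode='categorical')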

Google Colab is so slow while reading images from Google Drive

I have my own dataset for a deep learning project. I uploaded it to Google Drive and linked it to a Colab notebook, but Colab can read only 2-3 images per second, whereas my computer can read dozens. (I used imread to read the images.)
There is no speed problem with Keras's model-compiling step, only with reading images from Google Drive. Does anybody know a solution? Someone else suffered from this problem too, but it is still unsolved: Google Colab very slow reading data (images) from Google Drive. (I know this is somewhat a duplicate of the linked question, but I reposted it because it remains unsolved; I hope this is not a violation of Stack Overflow rules.)
Edit: The code piece that I use for reading images:
import os
import numpy as np
from PIL import Image
from skimage import color

def getDataset(path, classes, pixel=32, rate=0.8):
    X = []
    Y = []
    i = 0
    # getting images:
    for root, _, files in os.walk(path):
        for file in files:
            imagePath = os.path.join(root, file)
            className = os.path.basename(root)
            try:
                image = Image.open(imagePath)
                image = np.asarray(image)
                image = np.array(Image.fromarray(image.astype('uint8')).resize((pixel, pixel)))
                image = image if len(image.shape) == 3 else color.gray2rgb(image)
                X.append(image)
                Y.append(classes[className])
            except:
                print(file, "could not be opened")
    X = np.asarray(X, dtype=np.float32)
    Y = np.asarray(Y, dtype=np.int16).reshape(1, -1)
    return shuffleDataset(X, Y, rate)
I'd like to provide a more detailed answer about what unzipping the files actually looks like. This is the best way to speed up reading data because unzipping the file into the VM disk is SO much faster than reading each file individually from Drive.
Let's say you have the desired images or data in your local machine in a folder Data. Compress Data to get Data.zip and upload it to Drive.
Now, mount your drive and run the following command:
!unzip "/content/drive/My Drive/path/to/Data.zip" -d "/content"
Simply amend all your image paths to go through /content/Data, and reading your images will be much, much faster.
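As a sketch of the full workflow in one place (the Data folder name and the path/to/ location are the placeholder names from this answer, and getDataset is the reading function from the question above):
from google.colab import drive
drive.mount('/content/drive')

# Unzip once onto the fast local VM disk
!unzip -q "/content/drive/My Drive/path/to/Data.zip" -d "/content"

# Read from the local copy instead of from Drive
X, Y = getDataset('/content/Data', classes)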
I recommend uploading your files to GitHub and then cloning the repository into Colab. It reduced my training time from 1 hour to 3 minutes.
Upload zip files to Drive and unzip them after transferring to Colab. Per-file copy overhead is what makes this slow, so don't copy masses of individual files; copy a single zip and unzip it.

How to save python code (part of the notebook) to file in GDrive from code

I am using Google Colab for my research in machine learning.
I run many variations of a network and save the results of each run.
Part of my notebook used to be a separate file (network.py). At the start of a training session I used to copy this file into the directory that holds the results, logs, etc. Now that this code lives in the notebook it is easier to edit, BUT I no longer have a file describing the model to copy to the output directory. How do I take a section of a Google Colab notebook and save its raw code as a Python file?
Things I have tried:
%%writefile "my_file.py" - writes the file, but the classes defined in that cell are then not available to the runtime (the cell's code is written out instead of being executed).
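One workaround, sketched here with placeholder names (network.py, MyModel and the output directory are all hypothetical): write the cell out with %%writefile, bring its definitions back into the runtime with %run or an import, and copy the file to the run's output directory on Drive.
%%writefile network.py
# ... the model-definition code from this cell goes here ...
Then, in a following cell:
%run network.py                     # executes the file so its classes exist in the runtime
# or: from network import MyModel   # MyModel is a hypothetical class name

import shutil
shutil.copy('network.py', '/content/drive/My Drive/experiments/run_001/')  # hypothetical output directory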
