Issue with shuffling image to a hdf5 file - python-3.x

I want to shuffle my images before writing them to an HDF5 file, but I get an error during the computation. As a recent learner, I can't figure this out even after reading the HDF5 documentation. Kindly guide me.
from random import shuffle
import glob
shuffle_data = True # shuffle the addresses before saving
hdf5_path = 'Cat vs Dog/dataset.hdf5' # address to where you want to save the hdf5 file
cat_dog_train_path = 'Cat vs Dog/train/*.jpg'
# read addresses and labels from the 'train' folder
addrs = glob.glob(cat_dog_train_path)
labels = [0 if 'cat' in addr else 1 for addr in addrs] # 0 = Cat, 1 = Dog
# to shuffle data
if shuffle_data:
    c = list(zip(addrs, labels))
    shuffle(c)
    addrs, labels = zip(*c)
Error:
> ValueError Traceback (most recent call
> last) <ipython-input-19-4408536403db> in <module>()
> 2 c = list(zip(address, labels))
> 3 shuffle(c)
> ----> 4 addrs, labels = zip(*c)
>
> ValueError: not enough values to unpack (expected 2, got 0)
Reference: http://machinelearninguru.com/deep_learning/data_preparation/hdf5/hdf5.html#list

The website gives Python 2 code. Judging by your tag, you are using Python 3. You can convert the script with:
2to3 -n filename.py
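Separately from the Python 2/3 question, the message "expected 2, got 0" means that zip(*c) produced nothing, which happens when c is empty, i.e. when glob.glob matched no files for the given pattern. A minimal sketch of a guard for that case follows; shuffle_pairs is a hypothetical helper name, not part of the original script, and the choice of FileNotFoundError is an assumption:

```python
from random import shuffle


def shuffle_pairs(addrs, labels):
    """Shuffle addresses and labels together, keeping each pair aligned.

    Hypothetical helper, not part of the original script.
    """
    pairs = list(zip(addrs, labels))
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would fail
        # with "not enough values to unpack (expected 2, got 0)".
        raise FileNotFoundError("no image paths to shuffle; check the glob pattern")
    shuffle(pairs)
    shuffled_addrs, shuffled_labels = zip(*pairs)
    return list(shuffled_addrs), list(shuffled_labels)
```

If this raises, printing len(addrs) right after the glob.glob call should confirm whether the path 'Cat vs Dog/train/*.jpg' resolves from the notebook's working directory.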

Related

Getting Error too many indices for tensor of dimension 3

I am trying to read an image using GeneralizedRCNN. The input shape is given as a comment in the code. The problem is that I get an error while tracing the model with that input. The line trace = torch.jit.trace(model, input_batch) produces the following error:
> "/usr/local/lib/python3.7/dist-packages/torch/tensor.py:467:
> RuntimeWarning: Iterating over a tensor might cause the trace to be
> incorrect. Passing a tensor of different shape won't change the number
> of iterations executed (and might lead to errors or silently give
> incorrect results). 'incorrect results).', category=RuntimeWarning)
> --------------------------------------------------------------------------- IndexError Traceback (most recent call
> last) <ipython-input-25-52ff7ef794de> in <module>()
> 1 #First attempt at tracing
> ----> 2 trace = torch.jit.trace(model, input_batch)
>
> 7 frames
> /usr/local/lib/python3.7/dist-packages/detectron2/modeling/meta_arch/rcnn.py
> in <listcomp>(.0)
> 182 Normalize, pad and batch the input images.
> 183 """
> --> 184 images = [x["image"].to(self.device) for x in batched_inputs]
> 185 images = [(x - self.pixel_mean) / self.pixel_std for x in images]
> 186 images = ImageList.from_tensors(images, self.backbone.size_divisibility)
>
> IndexError: too many indices for tensor of dimension 3
model = build_model(cfg)
model.eval()
# print(model)
input_image = Image.open("model/xxx.jpg")
display(input_image)
to_tensor = transforms.ToTensor()
input_tensor = to_tensor(input_image)
# input_tensor.size = torch.Size([3, 519, 1038])
input_batch = input_tensor.unsqueeze(0)
# input_batch.size = torch.Size([1, 3, 519, 1038])
trace = torch.jit.trace(model, input_batch)
This error occurred because input_batch has size torch.Size([1, 3, 519, 1038]), i.e. 4 dimensions, while the model called in trace = torch.jit.trace(model, input_batch) expects a 3-dimensional input.
You don't need input_batch = input_tensor.unsqueeze(0). Delete or comment out that line.
By default, the torch.jit.trace function cannot be used directly on this model. However, detectron2 does provide a wrapper called TracingAdapter that lets the model take a tensor or a tuple of tensors as input, which is what makes tracing possible.
The code for tracing the Mask RCNN model looks like this:
import torch
import torchvision
from detectron2.export.flatten import TracingAdapter
def inference_func(model, image):
    inputs = [{"image": image}]
    return model.inference(inputs, do_postprocess=False)[0]
print("cfg.MODEL.WEIGHTS: ",cfg.MODEL.WEIGHTS) ## RETURNS : cfg.MODEL.WEIGHTS: drive/Detectron2/model_final.pth
model= build_model(cfg)
example= torch.rand(1, 3, 224, 224)
wrapper= TracingAdapter(model, example, inference_func)
wrapper.eval()
traced_script_module= torch.jit.trace(wrapper, (example,))
traced_script_module.save("drive/Detectron2/model-final.pt")

Hide RandomizedSearchCV output

I'm executing pylint in the terminal to clean up my python script a bit. In my script, I also use RandomizedSearchCV. What can I do so that pylint's results do not show the different combinations of the RandomizedSearchCV results? Or how can I suppress the output from RandomizedSearchCV?
Here's a snippet of the code in my .py script that causes this issue, and the screenshots of the start/end of what I see when I execute in the terminal.
LOGGER.info("Fine tune model and fit it (Model 2)")
# with warnings.catch_warnings():
# warnings.filterwarnings("ignore")
new_model = RandomizedSearchCV(lr_alt, parameters, cv=4, n_iter=15)
# with warnings.catch_warnings():
# warnings.filterwarnings("ignore")
new_model.fit(train_features_x, train_y)
Can't load images yet, but here's a snippet of the start code in terminal:
(env-stats404-w20-HW5) Franciscos-MacBook-Pro:FRANCISCO-AVALOS franciscoavalosjr$ pylint main.py
************* Module main
main.py:69:0: C0103: Argument name "Product" doesn't conform to snake_case naming style (invalid-name)
main.py:80:4: R1705: Unnecessary "elif" after "return" (no-else-return)
main.py:89:75: W0108: Lambda may not be necessary (unnecessary-lambda)
Traceback (most recent call last):
File "/Users/franciscoavalosjr/opt/anaconda3/envs/env-stats404-w20-HW5/bin/pylint", line 8, in <module>
sys.exit(run_pylint())
File "/Users/franciscoavalosjr/opt/anaconda3/envs/env-stats404-w20-HW5/lib/python3.7/site-packages/pylint/__init__.py", line 23, in run_pylint
PylintRun(sys.argv[1:])
And here's a snippet of the end of that pylint run:
File "/Users/franciscoavalosjr/opt/anaconda3/envs/env-stats404-w20-HW5/lib/python3.7/site-packages/astroid/decorators.py", line 131, in raise_if_nothing_inferred
yield next(generator)
File "/Users/franciscoavalosjr/opt/anaconda3/envs/env-stats404-w20-HW5/lib/python3.7/site-packages/astroid/decorators.py", line 88, in wrapped
if context.push(node):
File "/Users/franciscoavalosjr/opt/anaconda3/envs/env-stats404-w20-HW5/lib/python3.7/site-packages/astroid/context.py", line 92, in push
self.path.add((node, name))
RecursionError: maximum recursion depth exceeded while calling a Python object
(env-stats404-w20-HW5) Franciscos-MacBook-Pro:FRANCISCO-AVALOS franciscoavalosjr$
It turns out Python didn't like how I assigned my new column. The fix was to create a new variable holding the newly formed column instead of adding it to my DataFrame. Before and after code below:
Original code:
# DF_MAJORITY = SUB_DATA[SUB_DATA['balance'] == 0]
# DF_MINORITY = SUB_DATA[SUB_DATA['balance'] == 1]
# NEW_MAJORITY_NUMBER = ((DF_MINORITY.shape[0]/0.075) - DF_MINORITY.shape[0])
# NEW_MAJORITY_NUMBER = int(round(NEW_MAJORITY_NUMBER))
# DF_MAJORITY_DOWNSAMPLED = resample(DF_MAJORITY, replace=False, n_samples=NEW_MAJORITY_NUMBER,
# random_state=29)
# DF_DOWNSAMPLED = pd.concat([DF_MAJORITY_DOWNSAMPLED, DF_MINORITY])
New Code:
BALANCE = SUB_DATA.loc[:, 'Delivery Status'].apply(classify_shipping)
BALANCE = pd.DataFrame(BALANCE)
DF_MAJORITY = BALANCE[BALANCE['Delivery Status'] == 0]
DF_MINORITY = BALANCE[BALANCE['Delivery Status'] == 1]
NEW_MAJORITY_NUMBER = ((DF_MINORITY.shape[0]/0.075) - DF_MINORITY.shape[0])
NEW_MAJORITY_NUMBER = int(round(NEW_MAJORITY_NUMBER))
DF_MAJORITY_DOWNSAMPLED = resample(DF_MAJORITY, replace=False,
                                   n_samples=NEW_MAJORITY_NUMBER, random_state=29)
DF_DOWNSAMPLED = pd.concat([DF_MAJORITY_DOWNSAMPLED, DF_MINORITY])
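For readers wondering about the NEW_MAJORITY_NUMBER arithmetic above: it picks the number of majority-class rows to keep so that the minority class ends up as roughly 7.5% of the downsampled data. A small sketch of just that calculation in plain Python; majority_target and the example counts are hypothetical, not taken from the original script:

```python
def majority_target(n_minority, minority_share):
    """Rows to keep from the majority class so that the minority class
    makes up about `minority_share` of the combined data."""
    total = n_minority / minority_share  # desired size of the combined set
    return int(round(total - n_minority))


# Made-up example: 75 minority rows and a 7.5% target share.
n_major = majority_target(75, 0.075)
share = 75 / (75 + n_major)
print(n_major, round(share, 3))
```

The result is what gets passed as n_samples to resample, so it has to be no larger than the number of majority rows when replace=False.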

Confusion Matrix : RecursionError

I have been trying to replicate an online tutorial for plotting a confusion matrix but got a recursion error. I tried raising the recursion limit, but the error persists. The code is below:
log = LogisticRegression()
log.fit(x_train,y_train)
pred_log = log.predict(x_train)
confusion_matrix(y_train,pred_log)
The error I got is :
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-57-4b8fbe47e72d> in <module>
----> 1 (confusion_matrix(y_train,pred_log))
<ipython-input-48-92d5242f8580> in confusion_matrix(test_data, pred_data)
1 def confusion_matrix(test_data,pred_data):
----> 2 c_mat = confusion_matrix(test_data,pred_data)
3 return pd.DataFrame(c_mat)
... last 1 frames repeated, from the frame below ...
<ipython-input-48-92d5242f8580> in confusion_matrix(test_data, pred_data)
1 def confusion_matrix(test_data,pred_data):
----> 2 c_mat = confusion_matrix(test_data,pred_data)
3 return pd.DataFrame(c_mat)
RecursionError: maximum recursion depth exceeded
The shape of the train and test data is as below
x_train.shape,y_train.shape,x_test.shape,y_test.shape
# ((712, 7), (712,), (179, 7), (179,))
Tried with: sys.setrecursionlimit(1500)
But still no resolution.
Looks like you are recursively calling the same function: your wrapper is itself named confusion_matrix, so the call inside it refers to the wrapper rather than to the library function. Try changing the outer function name, from
def confusion_matrix(test_data,pred_data):
    c_mat = confusion_matrix(test_data,pred_data)
    return pd.DataFrame(c_mat)
To
def confusion_matrix_pd_convertor(test_data,pred_data):
    c_mat = confusion_matrix(test_data,pred_data)
    return pd.DataFrame(c_mat)
log = LogisticRegression()
log.fit(x_train,y_train)
pred_log = log.predict(x_train)
confusion_matrix_pd_convertor(y_train,pred_log)
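The same shadowing effect can be reproduced with nothing but the standard library, which also shows why sys.setrecursionlimit could not help: the wrapper never reaches the library function at all, so any limit is eventually exceeded. A minimal sketch, with statistics.mean standing in for the imported confusion_matrix (an assumption made so the example runs without extra packages) and rounded_mean as a hypothetical wrapper name:

```python
import statistics
from statistics import mean  # stands in for the imported library function


def mean(values):  # same name as the import, so this definition replaces it
    # The name `mean` now refers to this function, so it calls itself forever.
    return round(mean(values), 2)


def rounded_mean(values):  # distinct name, so nothing is shadowed
    # Calling through the module makes the target unambiguous.
    return round(statistics.mean(values), 2)


try:
    mean([1, 2, 4])
except RecursionError as exc:
    print("shadowed wrapper:", exc)

print("renamed wrapper:", rounded_mean([1, 2, 4]))
```

In a notebook, note that the shadowing definition stays in memory after it has been run once, so the original import may need to be re-executed after renaming the wrapper.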

Google AI Adventures code for simple Estimator for iris flower ValueError: invalid literal for int() with base 10: 'Sepal length'

This is about an issue with the code shown in this video. I tried to run the code in TensorFlow (versions 1.12 and 1.3) with Python (versions 3.7 and 3.6.4), but I get the error below:
"ValueError: invalid literal for int() with base 10: 'Sepal length'".
When running the code in TensorFlow 1.12, I also noticed an additional warning/error whose traceback went through several other code files before reporting the mistake.
#Code
import tensorflow as tf
import numpy as np
print (tf.__version__)
from tensorflow.contrib.learn.python.learn.datasets import base
# Data files
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"
# Load datasets.
training_set = base.load_csv_with_header(filename=IRIS_TRAINING,
                                         features_dtype=np.float32,
                                         target_dtype=np.int)
test_set = base.load_csv_with_header(filename=IRIS_TEST,
                                     features_dtype=np.float32,
                                     target_dtype=np.int)
print(training_set.data)
print(training_set.target)
Traceback
# 1.3.0
# ValueError Traceback (most recent call last)
# <ipython-input-2-065d21e0a8b0> in <module>
# 13 training_set = base.load_csv_with_header(filename=IRIS_TRAINING,
# 14 features_dtype=np.float32,
# ---> 15 target_dtype=np.int)
# 16 test_set = base.load_csv_with_header(filename=IRIS_TEST,
# 17 features_dtype=np.float32,
# c:\users\sanjay\appdata\local\programs\python\python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py in load_csv_with_header(filename, target_dtype, features_dtype, target_column)
# 46 data_file = csv.reader(csv_file)
# 47 header = next(data_file)
# ---> 48 n_samples = int(header[0])
# 49 n_features = int(header[1])
# 50 data = np.zeros((n_samples, n_features), dtype=features_dtype)
# ValueError: invalid literal for int() with base 10: 'Sepal length'
Figured it out: the data file has to be formatted differently. "Sepal length" was the first column header in my file, which I had prepared from https://en.wikipedia.org/wiki/Iris_flower_data_set, whereas the loader expects the first row to start with the sample count and the feature count (see int(header[0]) and int(header[1]) in the traceback).
With these files instead, the code works:
http://download.tensorflow.org/data/iris_training.csv, http://download.tensorflow.org/data/iris_test.csv
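Since the traceback shows the loader reading n_samples = int(header[0]) and n_features = int(header[1]) from the first row, a file can be checked for that layout before it is handed to load_csv_with_header. A rough sketch using only the standard library; read_counted_header is a hypothetical helper, and the two sample CSV strings are made up for illustration rather than copied from the real files:

```python
import csv
import io


def read_counted_header(csv_text):
    """Return (n_samples, n_features) from the first row of a CSV string,
    mirroring the int(header[0]) / int(header[1]) calls in the traceback."""
    header = next(csv.reader(io.StringIO(csv_text)))
    try:
        return int(header[0]), int(header[1])
    except (ValueError, IndexError) as exc:
        raise ValueError(
            f"first row must start with sample and feature counts, got {header[:2]!r}"
        ) from exc


# Made-up file with counts in the first row: parses cleanly.
counted = "2,4,setosa,versicolor\n5.1,3.5,1.4,0.2,0\n7.0,3.2,4.7,1.4,1\n"
print(read_counted_header(counted))

# Made-up file with column names in the first row: rejected up front.
named = "Sepal length,Sepal width,Petal length,Petal width,Species\n5.1,3.5,1.4,0.2,0\n"
try:
    read_counted_header(named)
except ValueError as exc:
    print(exc)
```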

Numpy printing with iterator .format throws an error

I am using the Anaconda distribution, Python 3.61, and a Jupyter notebook for SciPy/NumPy. I can use print(' blah {} '.format(x)) to format numbers, but if I iterate over an ndarray I get an error.
# test of formatting
'{:+f}; {:+f}'.format(3.14, -3.14) # show it always
This example is taken from the Python 3.6 manual, section 6.1.3.2, and I get the expected response. So I know it isn't that I've forgotten to import something, i.e. it is built in.
If I do this:
C_sense = C_pixel + C_stray
print('Capacitance of node')
for x, y in np.nditer([Names,C_sense]):
    print('The {} has C ={} [F]'.format(x,y))
I get output
Capacitance of node
The 551/751 has C =8.339999999999999e-14 [F]
The 554 has C =3.036e-13 [F]
The 511 has C =1.0376e-12 [F]
But if I do this:
# now with formatting
C_sense = C_pixel + C_stray
print('Capacitance of node')
for x, y in np.nditer([Names,C_sense]):
    print('The {} has C ={:.3f} [F]'.format(x,y))
I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-9-321e0b5edb03> in <module>()
3 print('Capacitance of node')
4 for x, y in np.nditer([Names,C_sense]):
----> 5 print('The {} has C ={:.3f} [F]'.format(x,y))
TypeError: unsupported format string passed to numpy.ndarray.__format__
I've attached a screen shot of my Jupyter notebook to show context of this code.
The error is clearly coming from the formatter, not knowing what to do with the numpy iterable you get from np.nditer.
Does the following work?
for x,y in zip(Names,C_sense):
    print('The {} has C ={:.3f} [F]'.format(x,y))
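One further point, independent of the iteration question: the capacitances printed in the question are of order 1e-13, so a fixed-point spec such as {:.3f} would show them as 0.000 even once the TypeError is gone, while an exponent spec such as {:.3e} keeps the significant digits. A small sketch with plain Python lists standing in for the Names and C_sense arrays (an assumption, so that it runs without NumPy), using the values from the output in the question:

```python
# Plain lists stand in for the Names and C_sense arrays (assumption).
names = ['551/751', '554', '511']
c_sense = [8.339999999999999e-14, 3.036e-13, 1.0376e-12]

print('Capacitance of node')
for name, cap in zip(names, c_sense):
    # float() is a no-op for these Python floats; it is kept on the
    # assumption that array elements may need converting before formatting.
    print('The {} has C = {:.3e} [F]'.format(name, float(cap)))

fixed = '{:.3f}'.format(c_sense[0])       # fixed-point: the value rounds away
scientific = '{:.3e}'.format(c_sense[0])  # exponent form keeps the digits
```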
