ValueError: cannot reshape array of size 0 - python-3.x

If I call this function on my dataset:
def reconstruct_flight(data, sequence_lenght, flight_len, param_len):
    stack_factor = int(flight_len/sequence_lenght)
    data_reconstructed = []
    for i in range(0, len(data_clean), stack_factor):
        if i<len(data_clean):
            data_reconstructed.append(
                data[i:i+stack_factor].reshape(
                    [flight_len, param_len])
            )
    return np.array(data_reconstructed)
I get the following error:
ValueError: cannot reshape array of size 0 into shape (1500,77)
But if I run the for loop in the console without passing it as a function:
data_reconstructed = []
for i in range(0, len(data_clean), stack_factor):
    if i<len(data_clean):
        data_reconstructed.append(
            data[i:i+stack_factor].reshape(
                [flight_len, param_len])
        )
It works as expected. Why is that?

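The function takes its loop bounds from len(data_clean), a global, but slices the data argument. If data is shorter than data_clean, data[i:i+stack_factor] is empty once i runs past the end of data, and reshaping an empty array raises this error. Looping over len(data) inside the function avoids the mismatch.
If you keep the same data contiguity and only change the shape of the box, you can also reshape the whole array in one step:
data_reconstructed = data_clean.reshape((10,1500,77))
If you are changing the contiguity from one axis to another, you will need to permute the axes beforehand with numpy.transpose: https://numpy.org/doc/stable/reference/generated/numpy.transpose.html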
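As a minimal sketch of how a slice can end up empty, and of the one-step reshape, the following uses made-up sizes (much smaller than the 1500 x 77 flights in the question) and assumes the input is a stack of equally sized sequences:

```python
import numpy as np

# Hypothetical sizes for illustration only.
sequence_length, flight_len, param_len = 5, 10, 3
stack_factor = flight_len // sequence_length  # 2 sequences per flight

# 4 sequences of shape (5, 3), i.e. 2 complete flights.
data = np.arange(4 * sequence_length * param_len).reshape(
    4, sequence_length, param_len)

# Slicing past the end of an array does not raise; it returns an empty array.
empty = data[6:6 + stack_factor]
print(empty.size)  # 0

try:
    empty.reshape(flight_len, param_len)
except ValueError as exc:
    print(exc)  # the same kind of ValueError as in the question

# Looping over len(data) itself keeps every slice non-empty.
flights = np.array([
    data[i:i + stack_factor].reshape(flight_len, param_len)
    for i in range(0, len(data), stack_factor)
])

# When the data is contiguous, a single reshape gives the same result.
direct = data.reshape(-1, flight_len, param_len)
print(flights.shape, np.array_equal(flights, direct))  # (2, 10, 3) True
```

Passing -1 for the first dimension lets NumPy infer the number of flights instead of hard-coding it.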

Related

Transforms.Normalize returns values higher than 255 Pytorch

I am working on a video dataset. I read the frames as integers and convert them to a float32 numpy array.
After being loaded, they appear in a range between 0 and 255:
[165., 193., 148.],
[166., 193., 149.],
[167., 193., 149.],
...
Finally, to feed them to my model and stack the frames, I apply ToTensor() followed by my transformations: [transforms.Resize(224), transforms.Normalize([0.454, 0.390, 0.331], [0.164, 0.187, 0.152])]
Here is the code that transforms and stacks the frames:
res_vframes = []
for i in range(len(v_frames)):
    res_vframes.append(self.transforms((v_frames[i])))
res_vframes = torch.stack(res_vframes, 0)
The problem is that after the transformation the values look like this, with values higher than 255:
[tensor([[[1003.3293, 1009.4268, 1015.5244, ..., 1039.9147, 1039.9147,
1039.9147],...
Any idea on what I am missing or doing wrong?
The behavior of torchvision.transforms.Normalize is:
output[channel] = (input[channel] - mean[channel]) / std[channel]
Since the numerator on the right-hand side of the above equation is much greater than 1 (the inputs are in the range 0-255) and the denominator is smaller than 1, the computed value gets larger.
ToTensor() maps values to [0, 1] only if a certain condition is satisfied. Check this code from the official PyTorch (torchvision) source:
if isinstance(pic, np.ndarray):
    # handle numpy array
    if pic.ndim == 2:
        pic = pic[:, :, None]
    img = torch.from_numpy(pic.transpose((2, 0, 1))).contiguous()
    # backward compatibility
    if isinstance(img, torch.ByteTensor):
        return img.to(dtype=default_float_dtype).div(255)
    else:
        return img
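Therefore you need to divide the tensors by 255 explicitly, or make the input match the condition above (a uint8 array becomes a ByteTensor and is divided by 255; a float32 array is returned unscaled).
Your normalization parameters assume values in the range 0-1, not 0-255.
You need to either rescale your input frames to 0-1 or rescale the normalization vectors to 0-255.
You can divide the frames by 255 before applying the transform: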
res_vframes = []
for i in range(len(v_frames)):
    res_vframes.append(self.transforms((v_frames[i]/255)))
res_vframes = torch.stack(res_vframes, 0)
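To see where the value 1003.3293 in the question comes from, here is a small NumPy sketch of the formula above. The normalize helper is a hypothetical stand-in for the per-channel arithmetic (it is not the torchvision implementation), applied to the first pixel shown in the question:

```python
import numpy as np

mean = np.array([0.454, 0.390, 0.331])
std = np.array([0.164, 0.187, 0.152])


def normalize(pixel, mean, std):
    """Hypothetical stand-in for the per-channel Normalize formula."""
    return (pixel - mean) / std


# One pixel from the question, still in the range 0-255.
pixel = np.array([165.0, 193.0, 148.0])

# The statistics expect 0-1 input, so unscaled input blows up.
unscaled = normalize(pixel, mean, std)

# Rescaling to 0-1 first gives values of ordinary magnitude.
scaled = normalize(pixel / 255.0, mean, std)

print(unscaled[0])  # about 1003.33, matching the question's output
print(scaled)
```

The first channel alone reproduces the number from the question, which suggests the frames reached Normalize without being divided by 255.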

Error trying to use numpy frombuffer function on a large bytes object

I am trying to pass a very, very long bytes object to numpy frombuffer, and it gives me the following error:
ValueError: buffer size must be a multiple of element size
Is there a flag I am missing? How can I specify a larger buffer size?
Edit: The format is like:
x = b'\xdc\x08....\x01'
y = np.frombuffer(x)
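You need to tell it what type of data it is and, if it's an array, what the array shape is. Without a dtype, np.frombuffer assumes float (8 bytes per element), so a buffer whose length is not a multiple of 8 raises this error. For example: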
import numpy as np
a = [[1,2,3],[2,4,6]]
npa = np.array(a)
x = npa.tobytes()
y = np.frombuffer(x, dtype = npa.dtype.name).reshape(npa.shape)
# check to see that y is the same as npa
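The error itself can be reproduced with a buffer of just a few bytes. This is a minimal sketch; the three-byte buffer is made up for illustration and is not the data from the question:

```python
import numpy as np

raw = b"\xdc\x08\x01"  # 3 bytes, hypothetical sample data

# With no dtype, frombuffer assumes float64 (8 bytes per element),
# and 3 is not a multiple of 8.
try:
    np.frombuffer(raw)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# One byte per element always divides the buffer evenly.
as_uint8 = np.frombuffer(raw, dtype=np.uint8)
print(as_uint8.tolist())  # [220, 8, 1]

# For wider types, the byte count must be a multiple of dtype.itemsize.
remainder = len(raw) % np.dtype(np.uint16).itemsize
print(remainder)  # 1, so uint16 would fail as well
```

The buffer size is never the problem; what matters is whether the byte count divides evenly by the element size of the dtype that actually produced the bytes.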

How to select specific columns in a table by using np.r_ in dataset.loc and deal with string data

I am working on a classification problem.
To split the data into training and test sets:
x_train, x_test, y_train, y_test = train_test_split(X, y,test_size = 0.25, random_state = 0)
Method 1:
X = dataset.loc[np.r_[0:5, 7:26]].values
y = dataset.loc[np.r_[6]].values
Method 2:
X = dataset.loc[:, ['x1', 'x2','x3','x4','x5','x6','x7','x8','x9','x10','x11','x12','x13','x14','x15','x16','x17','x18','x19','x20','x21','x22','x23','x24','x25','x26']].values
y = dataset.loc[:, ['y']].values
The first method encounters this problem:
ValueError: Found input variables with inconsistent numbers of samples: [24, 1]
while the second one is OK. I would rather not write out all of the columns, but I do not know how to fix the first method.
Also, since the data contains strings, I encounter this error:
ValueError: could not convert string to float: 'id8053'
I tried to solve with:
X = X.apply(lambda x: pd.factorize(x)[1])
y = y.apply(lambda x: pd.factorize(x)[0])
but I encounter this error:
AttributeError: 'numpy.ndarray' object has no attribute 'apply'
What is wrong?
np.r_ should work fine in your case. Method 1 selected rows instead of columns: dataset.loc[...] with a single indexer selects rows. You are slicing columns by integer position, so you need to use .iloc with np.r_ for the columns and specify : for the rows.
Try this (note that the right end of each slice in np.r_ is increased by 1, because the right end of an np.r_ slice is excluded, just as with .iloc slices):
Method 1:
X = dataset.iloc[:, np.r_[0:6, 7:27]].values
y = dataset.iloc[:, np.r_[7]].values
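A quick way to check what np.r_ produces is to expand the slices on their own. In this sketch the 4 x 27 integer table is a hypothetical stand-in for dataset.values; the same index array could be passed to dataset.iloc[:, ...]:

```python
import numpy as np

# np.r_ expands each slice like range(): the right end is excluded.
cols_question = np.r_[0:5, 7:26]
print(len(cols_question))  # 24, the first number in the error message

cols_x = np.r_[0:6, 7:27]
print(len(cols_x))  # 26

# Hypothetical table: 4 rows, 27 columns.
table = np.arange(4 * 27).reshape(4, 27)

# ":" keeps every row; the index array picks the columns.
X = table[:, cols_x]
print(X.shape)  # (4, 26)
```

The 24 in "inconsistent numbers of samples: [24, 1]" is consistent with Method 1 having selected 24 rows rather than 24 columns.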

Index 150 out of bounds in axis0 with size 1

I was making a histogram using a numpy array in Python with OpenCV. The code is as follows:
#finding histogram of an image
import numpy as np
import cv2
img = cv2.imread("cr7.jpg")
gry_img=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
a=np.zeros((1,256),dtype=np.uint8)
#finding how many times a particular pixel intensity repeats
for x in range (0,183): #size of gray_img is (184,275)
    for y in range (0,274):
        g=gry_img[x,y]
        a[g]=a[g]+1
print(a)
Error is as follows:
IndexError: index 150 is out of bounds for axis 0 with size 1
Since you haven't supplied the image, I can only guess, but it seems you've made a mistake with the dimensions of the image. Alternatively, the issue lies entirely with the shape of your results array a.
The code you have is rather fragile; here is a cleaner way to work with images. I use an image from OpenCV's data directory: aero1.jpg.
The code here resolves both potential issues identified above, whichever one it was:
import numpy as np
import cv2

fname = 'aero1.jpg'
im = cv2.imread(fname)
gry_img = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
gry_img.shape
>>> (480, 640)
# note that the image is 640 pixels wide by 480 tall;
# the numpy array shows the number of rows first.
# rows are in y / columns are in x
# NOTE the results array `a` need only be 1-dimensional, not 2d (1x256).
# A wide integer dtype is used because uint8 counts would wrap around at 256.
a = np.zeros((256,), dtype=np.int64)
# iterating over all pixels, whatever the shape of the image.
height, width = gry_img.shape
for x in range(width):
    for y in range(height):
        g = gry_img[y, x]  # NOTE y, x not x, y
        a[g] += 1
But note that you could also achieve this easily with the numpy function np.histogram, with slightly careful handling of the bin edges.
histb, bin_edges = np.histogram(gry_img.reshape(-1), bins=np.arange(257))
# check that we arrived at the same result as iterating manually:
(a == histb).all()
>>> True
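As another option that the answer above does not mention, np.bincount counts integer values directly and needs no bin edges. This is a sketch on a synthetic 8-bit image (an assumption, standing in for gry_img so that no image file is needed):

```python
import numpy as np

# Synthetic 8-bit grayscale image; the size is arbitrary.
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(48, 64), dtype=np.uint8)

# bincount counts occurrences of each integer value;
# minlength=256 guarantees one bin per possible intensity.
counts = np.bincount(gray.ravel(), minlength=256)

# Same histogram via np.histogram with explicit integer bin edges.
hist, _ = np.histogram(gray.ravel(), bins=np.arange(257))

print(counts.shape, counts.sum() == gray.size)  # (256,) True
print(np.array_equal(counts, hist))             # True
```

Both functions return counts in a wide integer dtype, so the totals cannot wrap around the way uint8 counters would.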

slicing error in numpy array

I am trying to run the following code
fs = 1000
data = np.loadtxt("trainingdataset.txt", delimiter=",")
data1 = data[:,2]
data2 = data1.astype(int)
X,Y = data2['521']
but it gets me the following error
Traceback (most recent call last):
File "C:\Users\hadeer.elziaat\Desktop\testspec.py", line 58, in <module>
X,Y = data2['521']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
My dataset:
1,4,6,10
2,100,125,10
3,100,7216,254
4,100,527,263
5,100,954,13
6,100,954,23
You're using the string '521' rather than the number 521 for indexing. Try X,Y = data2[521] instead.
If you are only given the string, you could cast it to an int first: X,Y = data2[int('521')], but this might result in errors or unexpected behaviour if the string is not a valid integer.
The next problem is that you are asking for two variables, one for X and one for Y, yet the data2[521] selection provides only a single value (the number in the 3rd column, 522nd row).
You say you want all the data in the 3rd column.
I assume you also want some kind of x-axis, since you are attempting to do X, Y = .... How about using the first column for that? Then your code would be:
import numpy as np
data = np.loadtxt("trainingdataset.txt", delimiter=',', dtype='int')
x = data[:, 0]
y = data[:, 2]
What remains unclear from your question is why you tried to index your data with 521 - which failed because you cannot use strings as indices on plain arrays.
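The column selection can be tried without the file. In this sketch, io.StringIO stands in for trainingdataset.txt and holds only the six sample rows shown in the question:

```python
import io

import numpy as np

# The six sample rows from the question; StringIO stands in for the file.
sample = io.StringIO(
    "1,4,6,10\n"
    "2,100,125,10\n"
    "3,100,7216,254\n"
    "4,100,527,263\n"
    "5,100,954,13\n"
    "6,100,954,23\n"
)
data = np.loadtxt(sample, delimiter=",", dtype=int)

x = data[:, 0]  # first column
y = data[:, 2]  # third column
print(x.tolist())  # [1, 2, 3, 4, 5, 6]
print(y.tolist())  # [6, 125, 7216, 527, 954, 954]

# A row is selected with an integer, so a string must be cast first.
row = data[int("2")]
print(row.tolist())  # [3, 100, 7216, 254]

# Indexing a plain array with a string raises IndexError.
try:
    data["2"]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```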
