Can't reshape my numpy array for training a KNN model - python-3.x

I am trying to train a KNN model using Local Binary Pattern (LBP) descriptors.
My data should be a numpy.array of shape (67, 26), but myarray.shape returns (67,).
I tried to reshape the array like this:
myarray.reshape(-1, 26)
but it resulted in the following error:
ValueError: cannot reshape array of size 67 into shape (26)
Thank you so much.

As I'm not sure I've clearly understood your question, first I'm going to try to mock up your data:
In [101]: import numpy as np
In [102]: myarray = np.empty(shape=67, dtype=object)
In [103]: for i in range(len(myarray)):
     ...:     myarray[i] = np.random.rand(26)
Please, run the following code:
In [104]: type(myarray)
Out[104]: numpy.ndarray
In [105]: myarray.shape
Out[105]: (67,)
In [106]: myarray.dtype
Out[106]: dtype('O')
In [107]: type(myarray[0])
Out[107]: numpy.ndarray
In [108]: myarray[0].shape
Out[108]: (26,)
If you get the same results as above, numpy.stack should do the trick, as pointed out by @hpaulj in the comments:
In [109]: x = np.stack(myarray)
In [110]: type(x)
Out[110]: numpy.ndarray
In [111]: x.shape
Out[111]: (67, 26)
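A minimal, self-contained sketch of the same fix, assuming the data really is a 1-D object array holding 67 vectors of length 26 (this is the mock-up above, not the asker's actual LBP features). It also shows why reshape fails: reshape only sees 67 elements, each of which happens to be an array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock-up: a 1-D object array whose 67 elements are each a length-26 vector.
myarray = np.empty(67, dtype=object)
for i in range(len(myarray)):
    myarray[i] = rng.random(26)

# reshape counts 67 elements (the inner arrays are opaque objects),
# and 67 is not divisible by 26, so it raises ValueError.
try:
    myarray.reshape(-1, 26)
    reshape_failed = False
except ValueError:
    reshape_failed = True

# np.stack joins the 67 inner vectors along a new first axis.
x = np.stack(myarray)
print(x.shape, x.dtype)
```

The resulting x is an ordinary 2-D float array of shape (n_samples, n_features), which is the layout scikit-learn estimators such as KNeighborsClassifier expect.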

Related

How to use map with tuples in a TensorFlow 2 dataset?

I am trying to map a tuple to a tuple in a dataset in TF 2 (please see the code below). My output (also below) shows that the map function is only called once, and I cannot seem to get at the tuple.
How do I get at the "a", "b", "c" from the input parameter, which is:
tuple Tensor("args_0:0", shape=(3,), dtype=string)
type <class 'tensorflow.python.framework.ops.Tensor'>
Edit: it seems that using Dataset.from_tensor_slices produces the data all at once. This explains why map is only called once, so I probably need to make the dataset in some other way.
from __future__ import absolute_import, division, print_function, unicode_literals
from timeit import default_timer as timer
print('import tensorflow')
start = timer()
import tensorflow as tf
end = timer()
print('Elapsed time: ' + str(end - start),"for",tf.__version__)
import numpy as np
def map1(tuple):
    print("<<<")
    print("tuple",tuple)
    print("type",type(tuple))
    print("shape",tuple.shape)
    print("tuple 0",tuple[0])
    print("type 0",type(tuple[0]))
    print("shape 0",tuple.shape[0])
    # how do i get "a","b","c" from the input parameter?
    print(">>>")
    return ("1","2","3")
l=[]
l.append(("a","b","c"))
l.append(("d","e","f"))
print(l)
ds=tf.data.Dataset.from_tensor_slices(l)
print("ds",ds)
print("start mapping")
result = ds.map(map1)
print("end mapping")
$ py mapds.py
import tensorflow
Elapsed time: 12.002168990751619 for 2.0.0
[('a', 'b', 'c'), ('d', 'e', 'f')]
ds <TensorSliceDataset shapes: (3,), types: tf.string>
start mapping
<<<
tuple Tensor("args_0:0", shape=(3,), dtype=string)
type <class 'tensorflow.python.framework.ops.Tensor'>
shape (3,)
tuple 0 Tensor("strided_slice:0", shape=(), dtype=string)
type 0 <class 'tensorflow.python.framework.ops.Tensor'>
shape 0 3
>>>
end mapping
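The value or values returned by the map function (map1) determine the structure of each element in the returned dataset. [Ref]
In your case, result is a tf.data dataset, and there is nothing wrong with your code. Dataset.map traces map1 once to build a graph, so its print calls run only once and see symbolic tensors rather than values.
To check that every tuple is mapped correctly, you can wrap map1 in tf.py_function and iterate over every sample of the dataset as follows: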
[Updated Code]
def map1(tuple):
    print(tuple[0].numpy().decode("utf-8")) # Print first element of tuple
    return ("1","2","3")
l=[]
l.append(("a","b","c"))
l.append(("d","e","f"))
ds=tf.data.Dataset.from_tensor_slices(l)
ds = ds.map(lambda tpl: tf.py_function(map1, [tpl], [tf.string, tf.string, tf.string]))
for sample in ds:
    print(sample[0].numpy().decode(), sample[1].numpy().decode(), sample[2].numpy().decode())
Output:
a
1 2 3
d
1 2 3
Hope it will help.

Why do a DeprecationWarning and a ValueError show up even when the shapes and lengths of the input arrays are the same?

I am partitioning my data using train_test_split. I have two columns to fit, 'horsepower' and 'price' of the car, each containing 199 elements. So I tried out the following code:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
lm=LinearRegression()
x_train,x_test,y_train,y_test =train_test_split(df['horsepower'],df['price'],test_size=0.3,random_state=0)
model = lm.fit(x_train, y_train)
predictions = lm.predict(x_test)
#Now, just to recheck:
print(x_train.shape == y_train.shape)
>>>True
#And
len(x_train)
>>>139
len(y_train)
>>>139
However all I am getting is a DeprecationWarning and ValueError:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
and will raise ValueError in 0.19. Reshape your data either using
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.
and
ValueError: Found input variables with inconsistent numbers of samples: [1, 139]
Sklearn requires your X data to have shape (n_row, n_column).
When you select a column in a DataFrame with df['horsepower'], what you get is a pandas.Series, so the shape is (n_row,).
To avoid this, you have two options:
1. Select your column this way: df[['horsepower']]. This gives you a new DataFrame, so the shape is (n_row, n_column).
2. Reshape before fitting your model: x_train = x_train.values.reshape(-1, 1) and x_test = x_test.values.reshape(-1, 1). The .values is needed because a pandas.Series has no reshape method in current pandas versions.
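The difference between the two selections can be checked directly. This is a small sketch on a made-up DataFrame (the values are hypothetical stand-ins for the asker's df); it uses to_numpy(), which is equivalent to .values here.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df (values are made up).
df = pd.DataFrame({
    "horsepower": [111.0, 154.0, 102.0, 115.0],
    "price": [13495.0, 16500.0, 13950.0, 17450.0],
})

as_series = df["horsepower"]      # single brackets: 1-D pandas.Series
as_frame = df[["horsepower"]]     # double brackets: 2-D DataFrame
as_column = df["horsepower"].to_numpy().reshape(-1, 1)  # 2-D NumPy array

print(as_series.shape, as_frame.shape, as_column.shape)
```

Either of the two 2-D forms can be passed as X to train_test_split and then to fit; the target (df['price']) can stay 1-D.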

Apply MinMaxScaler() on a pandas column

I am trying to use the sklearn MinMaxScaler to rescale a pandas column like below:
scaler = MinMaxScaler()
y = scaler.fit(df['total_amount'])
But I got the following error:
Traceback (most recent call last):
File "/Users/edamame/workspace/git/my-analysis/experiments/my_seq.py", line 54, in <module>
y = scaler.fit(df['total_amount'])
File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 308, in fit
return self.partial_fit(X, y)
File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 334, in partial_fit
estimator=self, dtype=FLOAT_DTYPES)
File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[3.180000e+00 2.937450e+03 6.023850e+03 2.216292e+04 1.074589e+04
:
0.000000e+00 0.000000e+00 9.000000e+01 1.260000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Any idea what was wrong?
The input to MinMaxScaler needs to be array-like, with shape [n_samples, n_features]. So you can apply it on the column as a dataframe rather than a series (using double square brackets instead of single):
y = scaler.fit(df[['total_amount']])
Though from your description, it sounds like you want fit_transform rather than just fit (but I could be wrong):
y = scaler.fit_transform(df[['total_amount']])
A little more explanation:
If your dataframe had 100 rows, consider the difference in shape when you transform a column to an array:
>>> np.array(df[['total_amount']]).shape
(100, 1)
>>> np.array(df['total_amount']).shape
(100,)
The first returns a shape that matches [n_samples, n_features] (as required by MinMaxScaler), whereas the second does not.
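Putting this together, here is a minimal sketch of scaling one column and storing the result back in the DataFrame. The three values are made up, and the total_amount_scaled column name is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the asker's df.
df = pd.DataFrame({"total_amount": [0.0, 50.0, 100.0]})

scaler = MinMaxScaler()
# Double brackets keep the input 2-D: shape (n_samples, 1).
scaled = scaler.fit_transform(df[["total_amount"]])

# The result is 2-D as well; flatten it to store it as a single column.
df["total_amount_scaled"] = np.asarray(scaled).ravel()
print(df)
```

With the default feature_range of (0, 1), the minimum of the column maps to 0 and the maximum to 1.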
Try doing it this way, which scales every column of the DataFrame:
import pandas as pd
from sklearn import preprocessing
x = df.values  # returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df = pd.DataFrame(x_scaled)

Tensorflow reset or clear collection

In TensorFlow, I found the API tf.add_to_collection, which adds a value to a collection, as in the code below.
def accuracy_rate(logits, labels):
    correct = tf.nn.in_top_k(logits, labels, 1)
    # Return the accuracy of true entries.
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    return accuracy
with tf.Session() as sess:
    logits, labels = ...
    accuracy = accuracy_rate(logits, labels)
    tf.add_to_collection('total_accuracy', sess.run(accuracy))
What I can't find in the API is how to clear all the values that I've already stored in a collection.
You can use tf.get_collection_ref to get a mutable reference to the collection which you can clear (it's just a python list).
I think this might be what you are looking for?
In [2]: import tensorflow as tf
In [3]: w = tf.Variable([[1,2,3], [4,5,6], [7,8,9], [3,1,5], [4,1,7]], collections=[tf.GraphKeys.WEIGHTS, tf.GraphKeys.GLOBAL_VARIABLES], dtype=tf.float32)
In [4]: params = tf.get_collection_ref(tf.GraphKeys.WEIGHTS)
In [5]: del params[:]
In [6]: tf.get_collection_ref(tf.GraphKeys.WEIGHTS)
Out[6]: []
In [10]: params = tf.get_collection_ref(tf.GraphKeys.GLOBAL_VARIABLES)
In [11]: params
Out[11]: [<tf.Variable 'Variable:0' shape=(5, 3) dtype=float32_ref>]
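Why the mutable reference matters can be illustrated without TensorFlow. The following is a plain-Python sketch, not TensorFlow code: the collections dict and the two helper functions are hypothetical stand-ins that mimic the documented difference between tf.get_collection (returns a new list each time) and tf.get_collection_ref (returns the stored list itself).

```python
# Hypothetical stand-in for a graph's collections: name -> list of values.
collections = {"total_accuracy": [0.91, 0.93, 0.95]}

def get_collection(name):
    """Return a copy of the collection; mutating it leaves the store untouched."""
    return list(collections.get(name, []))

def get_collection_ref(name):
    """Return the stored list itself, so in-place edits change the store."""
    return collections.setdefault(name, [])

# Clearing a copy has no effect on the stored collection.
copy_of_values = get_collection("total_accuracy")
del copy_of_values[:]
after_clearing_copy = len(collections["total_accuracy"])

# Clearing through the reference empties the stored collection.
ref = get_collection_ref("total_accuracy")
del ref[:]
after_clearing_ref = len(collections["total_accuracy"])

print(after_clearing_copy, after_clearing_ref)
```

Note that del ref[:] empties the list in place, whereas rebinding the name (ref = []) would only change the local variable.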
An alternative solution is to use a different tf.Graph().

How to plot accuracy bars for each feature of an array

I have a data set "x" and its label vector "y". I want to plot the accuracy for each attribute (for each column of "x") after applying NaiveBayes and cross-validation. I want a bar graph.
So in the end I need 3 bars, because "x" has 3 columns: the classification has to run 3 times, giving one accuracy per feature.
Whenever I execute my code it shows:
ValueError: Found arrays with inconsistent numbers of samples: [1 3]
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
What am I doing wrong?
import matplotlib.pyplot as plt
import numpy as np
from sklearn import cross_validation
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
x = np.array([[0, 0.51, 0.00101], [3, 0.54, 0.00105], [6, 0.57, 0.00108], [9, 0.60, 0.00111], [1, 0.73, 0.00114], [5, 0.76, 0.00117], [8, 0.89, 120]])
y = np.array([1, 0, 0, 1, 1, 1, 0])
scores = list()
scores_std = list()
for i in range(x.shape[1]):
    xA=x[:, i]
    scoresKF2 = cross_validation.cross_val_score(clf, xA, y, cv=2)
    scores.append(np.mean(scoresKF2))
    scores_std.append(np.std(scoresKF2))
    plt.bar(x[:,i], scores)
plt.show()
Checking the shape of your input data, xA, shows that it is 1-dimensional; specifically, its shape is (7,). As the warning tells you, you are not allowed to pass in a 1-D array here. The key to solving this is in the warning itself: "Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample." Since each xA is a single feature, use xA = x[:,i].reshape(-1, 1) instead of xA = x[:,i].
I think there is another issue with the plotting. I'm not completely sure what you are expecting to see, but you should probably replace plt.bar(x[:,i], scores) with plt.bar(i, np.mean(scoresKF2)).
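Both fixes can be combined into one script. This is a sketch under some assumptions: it uses sklearn.model_selection.cross_val_score, since the cross_validation module from the question was removed in later scikit-learn releases; it draws all bars once after the loop rather than one per iteration; and the axis labels and error bars are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

x = np.array([[0, 0.51, 0.00101], [3, 0.54, 0.00105], [6, 0.57, 0.00108],
              [9, 0.60, 0.00111], [1, 0.73, 0.00114], [5, 0.76, 0.00117],
              [8, 0.89, 120]])
y = np.array([1, 0, 0, 1, 1, 1, 0])

clf = GaussianNB()
scores, scores_std = [], []
for i in range(x.shape[1]):
    # reshape(-1, 1) turns the (7,) column into a (7, 1) single-feature matrix.
    xA = x[:, i].reshape(-1, 1)
    fold_scores = cross_val_score(clf, xA, y, cv=2)
    scores.append(fold_scores.mean())
    scores_std.append(fold_scores.std())

# One bar per feature, with the spread across folds as an error bar.
positions = np.arange(x.shape[1])
fig, ax = plt.subplots()
ax.bar(positions, scores, yerr=scores_std)
ax.set_xticks(positions)
ax.set_xticklabels([f"feature {i}" for i in positions])
ax.set_ylabel("mean accuracy (cv=2)")
```

With only 7 samples and 2 folds the accuracies are very noisy, so the bars are best read as a demonstration of the mechanics rather than a meaningful comparison of features.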
