How to implement MultiLabelBinarizer on this dataframe?

I have a dataframe like this:
ID   mid  value   label
192  3    176.6   [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192  4    73.6    [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192  5    15.8    [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
194  3    9603.2  [0, 0, 0, 0, 0, 9, 6, 1, 8, ...
I want to apply MultiLabelBinarizer after removing the duplicate values in each list of the label column.
I have tried looping over the frame and removing duplicates, but the MultiLabelBinarizer doesn't work and throws an exception:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit(y_train.data)
X_train includes the mid and value columns.
y_train includes the label values.
ID is the index.
I expect to get a prediction from the above values once the duplicate values have been removed from each list in the label column.

Let's assume your dataframe is named df:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
df2 = pd.DataFrame(df.groupby(['ID', 'mid', 'value'])['label'].apply(lambda x: tuple(x.values)))
df2.reset_index(inplace=True)
mlb = MultiLabelBinarizer()
mlb.fit(df2['label'])
mlb.transform(df2['label'])
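For the de-duplication step the question asks about, here is a minimal sketch. It assumes each cell of the label column holds a plain Python list of hashable labels; the small frame below is a made-up stand-in for the real data, not the asker's actual values.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy frame standing in for the one in the question.
df = pd.DataFrame(
    {
        "mid": [3, 4, 3],
        "value": [176.6, 73.6, 9603.2],
        "label": [[9, 6, 8, 0, 8], [9, 6, 8, 0, 8], [0, 0, 9, 6, 1]],
    },
    index=pd.Index([192, 192, 194], name="ID"),
)

# Drop repeated labels within each row; sorting is only for readability.
labels = df["label"].apply(lambda xs: sorted(set(xs)))

mlb = MultiLabelBinarizer()
# One row per sample, one column per distinct label found in the data.
y = mlb.fit_transform(labels)

print(mlb.classes_)
print(y)
```

The columns of y follow the order of mlb.classes_, so the two can be combined into a labelled frame if that is easier to inspect.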

Related

Simplify numpy expression [duplicate]

How can I simplify this:
import numpy as np
ex = np.arange(27).reshape(3, 3, 3)
def get_plane(axe, index):
    return ex.swapaxes(axe, 0)[index]  # is there a better way ?
I cannot find a numpy function to get a plane in a higher dimensional array, is there one?
EDIT
The ex.take(index, axis=axe) method is great, but it copies the array instead of returning a view, which is what I originally wanted.
So what is the shortest way to index (without copying) an n-dimensional array to get a 2D slice of it, given an index and an axis?
Inspired by this answer, you can do something like this:
def get_plane(axe, index):
    slices = [slice(None)] * len(ex.shape)
    slices[axe] = index
    return ex[tuple(slices)]
get_plane(1,1)
output:
array([[ 3, 4, 5],
[12, 13, 14],
[21, 22, 23]])
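Since the point of the edit was to avoid a copy, it may be worth checking that directly. The sketch below passes the array in as an argument (the parameter names arr and axis are my own, not from the question) and uses np.shares_memory to compare the slice-tuple approach with take.

```python
import numpy as np

ex = np.arange(27).reshape(3, 3, 3)

def get_plane(arr, axis, index):
    # Full slice on every axis except the one being indexed.
    slices = [slice(None)] * arr.ndim
    slices[axis] = index
    return arr[tuple(slices)]

plane = get_plane(ex, 1, 1)   # basic indexing
copied = ex.take(1, axis=1)   # take builds a new array

# Basic indexing is expected to share memory with ex; take is not.
print(np.shares_memory(plane, ex))
print(np.shares_memory(copied, ex))
```

Because plane is a view, writing to it also changes ex, which is the behaviour to keep in mind when choosing between the two.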
What do you mean by a 'plane'?
In [16]: ex = np.arange(27).reshape(3, 3, 3)
Names like plane, row, and column, are arbitrary conventions, not formally defined in numpy. The default display of this array looks like 3 'planes' or 'blocks', each with rows and columns:
In [17]: ex
Out[17]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
Standard indexing lets us view any 2d block, in any dimension:
In [18]: ex[0]
Out[18]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [19]: ex[0,:,:]
Out[19]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [20]: ex[:,0,:]
Out[20]:
array([[ 0, 1, 2],
[ 9, 10, 11],
[18, 19, 20]])
In [21]: ex[:,:,0]
Out[21]:
array([[ 0, 3, 6],
[ 9, 12, 15],
[18, 21, 24]])
There are ways of saying "I want block 0 in dimension 1", etc., but first make sure you understand this indexing. This is core numpy functionality.
In [23]: np.take(ex, 0, 1)
Out[23]:
array([[ 0, 1, 2],
[ 9, 10, 11],
[18, 19, 20]])
In [24]: idx = (slice(None), 0, slice(None)) # also np.s_[:,0,:]
In [25]: ex[idx]
Out[25]:
array([[ 0, 1, 2],
[ 9, 10, 11],
[18, 19, 20]])
And yes, you can swap axes (or transpose), if that suits your needs.

Array conforming shape of a given variable

I need to do some calculations with a NetCDF file.
So I have two variables with following dimensions and sizes:
A [time | 1] x [lev | 12] x [lat | 84] x [lon | 228]
B [lev | 12]
What I need is to produce a new array, C, that is shaped as (1,12,84,228) where B contents are propagated to all dimensions of A.
Usually, this is easily done in NCL with the conform function. I am not sure what is the equivalent of this in Python.
Thank you.
The numpy.broadcast_to function can do something like this, although in this case it requires B to have a couple of extra trailing size-1 dimensions added to satisfy the numpy broadcasting rules:
>>> import numpy
>>> B = numpy.arange(12)
>>> B
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> B = B.reshape(12, 1, 1)
>>> B.shape
(12, 1, 1)
>>> C = numpy.broadcast_to(B, (1, 12, 84, 228))
>>> C.shape
(1, 12, 84, 228)
>>> C[0, :, 0, 0]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> C[-1, :, -1, -1]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
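As a variation on the same idea, here is a sketch that takes the target shape from A itself instead of typing it out. A and B below are placeholder arrays with the shapes from the question, not real NetCDF data.

```python
import numpy as np

A = np.ones((1, 12, 84, 228))   # placeholder for the 4-D variable A
B = np.arange(12.0)             # placeholder for the lev-sized variable B

# Align B with the lev axis (axis 1) of A, then expand to A's shape.
C = np.broadcast_to(B[:, np.newaxis, np.newaxis], A.shape)

# broadcast_to returns a read-only view; copy it if C must be writable.
C_writable = C.copy()

# For plain arithmetic the expansion can stay implicit.
product = A * B[:, np.newaxis, np.newaxis]

print(C.shape, product.shape)
```

If C is only needed for arithmetic with A, the last form avoids materialising the full array at all.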

How to run this function for a larger dataset(9 features)

I am new to Python. In my code I was trying to implement a Support Vector Machine from scratch. The code previously had 2 features and 2 classes (1 and -1) with 6 instances for each class, and it was working fine.
I am trying to run the same code with 9 features and 2 classes (1 and -1) with 6 instances for each class, and it gives me a ValueError that I can't seem to fix.
I am using Python version 3.6.3
Thank you for your help.
# This is my dictionary/dataset
data_dict = {-1: np.array([[1, 7, 4, 1, 9, 1, 5, 6, 7],
                           [2, 8, 6, 0, 8, 6, 8, 5, 2],
                           [3, 8, 7, 3, 2, 5, 4, 4, 8]]),
             1: np.array([[5, 1, 8, 2, 6, 4, 0, 2, -3],
                          [6, -1, 5, -2, 6, -3, 0, 5, 3],
                          [7, 3, 0, 4, 10, -6, 9, 8, 2]])}
# Call to the function
svm = Support_Vector_Machine()
svm.fit(data=data_dict)
# Function fit
def fit(self, data):
    self.data = data
    # Some more code here
    # w_t and b initialized here
    for i in self.data:
        for xi in self.data[i]:
            yi = i
            if not yi * (np.dot(w_t, xi) + b) >= 1:
                found_option = False
                # print(xi,':',yi*(np.dot(w_t,xi)+b))
    if found_option:
        opt_dict[np.linalg.norm(w_t)] = [w_t, b]
Error Message:
  in <module>
    svm.fit(data=data_dict)
  in fit
    if not yi * (np.dot(w_t, xi) + b) >= 1:
ValueError: shapes (2,) and (9,) not aligned: 2 (dim 0) != 9 (dim 0)
Thank You.
I figured it out. The problem was with the size and shape of the array w_t.
w_t had 2 elements and I was trying to take its dot product with a 9-element array.
I fixed it and now it's working fine.
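To illustrate the kind of fix described, here is a minimal sketch that sizes w_t from the data instead of hard-coding two components. The starting values for w_t and b are placeholders of my own; the original initialization code was not shown.

```python
import numpy as np

data_dict = {
    -1: np.array([[1, 7, 4, 1, 9, 1, 5, 6, 7]]),
    1: np.array([[5, 1, 8, 2, 6, 4, 0, 2, -3]]),
}

# Number of features, read from the data rather than fixed at 2.
n_features = next(iter(data_dict.values())).shape[1]

# Hypothetical starting point: one weight per feature.
w_t = np.full(n_features, 1.0)
b = 0.0

for yi, samples in data_dict.items():
    for xi in samples:
        # Shapes (9,) and (9,) align, so np.dot returns a scalar.
        margin = yi * (np.dot(w_t, xi) + b)
        print(yi, margin)
```

Any other place that builds or transforms w_t (for example a list of sign flips applied to it) would need the same number of components.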

Adding 2 numpy nd.array

I have two numpy.ndarray objects, A and B, with the following shapes:
A = (500000, 784), B = (500000,). I need to combine these two arrays so that B, which holds the labels, is added as the 785th column of A without changing the order of the rows.
That is, the result has shape (500000, 785).
np.append(A.T,[B.T], axis=0).T
For example:
A = np.array([[1,2,3],[4,5,6],[7,8,9],[10,9,11]])
B = np.array([4,5,3,6])
np.append(A.T,[B.T], axis=0).T
Output:
array([[ 1, 2, 3, 4],
[ 4, 5, 6, 5],
[ 7, 8, 9, 3],
[10, 9, 11, 6]])
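An alternative that avoids the double transpose, as a sketch on the same small example: np.column_stack treats a 1-D array as a column, and np.hstack does the same once B is given an explicit second axis.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 9, 11]])
B = np.array([4, 5, 3, 6])

# column_stack turns the 1-D B into a column and appends it to A.
C = np.column_stack((A, B))

# Equivalent form: make B explicitly 2-D, then stack horizontally.
D = np.hstack((A, B[:, np.newaxis]))

print(C)
print(C.shape)
```

Both forms return a new array, so A itself keeps its original shape.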

How to get indices of a specific number in an array?

I want to pick the indices of number 8 without knowing its position in the array.
a = np.arange(10)
You can use np.where like this:
>>> import numpy as np
>>> a = np.array([1,4,8,2,6,7,9,8,7,8,8,9,1,0])
>>> a
array([1, 4, 8, 2, 6, 7, 9, 8, 7, 8, 8, 9, 1, 0])
>>> np.where(a==8)[0]
array([ 2, 7, 9, 10], dtype=int64)
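For a 1-D array, np.flatnonzero gives the same indices without the trailing [0]. A short sketch on the same array:

```python
import numpy as np

a = np.array([1, 4, 8, 2, 6, 7, 9, 8, 7, 8, 8, 9, 1, 0])

# Indices where the boolean mask a == 8 is True.
idx = np.flatnonzero(a == 8)

print(idx)
print(a[idx])  # every selected element equals 8
```

The integer dtype shown in the printed result can differ between platforms, so it is safer not to rely on it.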
