Does convolution in Theano rotate the filters?

I have a 3-channel 5-by-5 image like this:
Channel 1      Channel 2      Channel 3
1 1 1 1 1      2 2 2 2 2      3 3 3 3 3
1 1 1 1 1      2 2 2 2 2      3 3 3 3 3
1 1 1 1 1      2 2 2 2 2      3 3 3 3 3
1 1 1 1 1      2 2 2 2 2      3 3 3 3 3
1 1 1 1 1      2 2 2 2 2      3 3 3 3 3
And a 3-channel 3-by-3 filter like this:
Channel 1      Channel 2         Channel 3
10 20 30       0.1 0.2 0.3       1 2 3
40 50 60       0.4 0.5 0.6       4 5 6
70 80 90       0.7 0.8 0.9       7 8 9
When I convolve the image with the filter, I expect this output:
369.6 514.8 316.8
435.6 594. 356.4
211.2 277.2 158.4
However, Theano (using keras) gives me this output:
158.4 277.2 211.2
356.4 594. 435.6
316.8 514.8 369.6
It seems the output is rotated by 180 degrees. I wonder why this happens and how I can get the expected answer. Here is my test code:
import numpy as np
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D
from keras.optimizers import SGD

def SimpleNet(weight_array, biases_array):
    model = Sequential()
    model.add(ZeroPadding2D(padding=(1, 1), input_shape=(3, 5, 5)))
    model.add(Convolution2D(1, 3, 3, weights=[weight_array, biases_array], border_mode='valid', subsample=(2, 2)))
    return model

# 3-channel 5x5 image: channel k is filled with the value k
im = np.asarray([
    1,1,1,1,1,
    1,1,1,1,1,
    1,1,1,1,1,
    1,1,1,1,1,
    1,1,1,1,1,
    2,2,2,2,2,
    2,2,2,2,2,
    2,2,2,2,2,
    2,2,2,2,2,
    2,2,2,2,2,
    3,3,3,3,3,
    3,3,3,3,3,
    3,3,3,3,3,
    3,3,3,3,3,
    3,3,3,3,3])

# one 3-channel 3x3 filter
weight_array = np.asarray([
    10,20,30,
    40,50,60,
    70,80,90,
    0.1,0.2,0.3,
    0.4,0.5,0.6,
    0.7,0.8,0.9,
    1,2,3,
    4,5,6,
    7,8,9])

im = np.reshape(im, [1, 3, 5, 5])
weight_array = np.reshape(weight_array, [1, 3, 3, 3])
biases_array = np.zeros(1)

model = SimpleNet(weight_array, biases_array)
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')
out = model.predict(im)
print(out.shape)
print(out)

Flipping the filters is part of the mathematical definition of convolution. It has the advantage that if you convolve an image consisting of all zeros except for a single 1 somewhere, the convolution places a copy of the filter at that position.
Theano performs exactly this mathematically defined convolution, which implies flipping the filters (the operation is filter[:, :, ::-1, ::-1]) before taking dot products with the image patches. Flipping both spatial axes this way amounts to rotating each 2D filter slice by 180 degrees, which is why, for your example, the result comes out rotated.
It appears that what you are looking for is cross-correlation, which is taking dot products with the non-flipped versions of the filters at each point of the image.
See also this answer in which theano.tensor.nnet.conv2d is shown to do exactly the same thing as the scipy counterpart.
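If you want the layer to compute plain cross-correlation with your original weights, one simple workaround (a sketch based on the flip operation described above) is to pre-flip the spatial axes of the kernel yourself, so that Theano's internal flip cancels out:
# Pre-flip the two spatial axes so Theano's own flip cancels out and the layer
# effectively cross-correlates with the original (unflipped) weight_array.
flipped_weights = weight_array[:, :, ::-1, ::-1]
model = SimpleNet(flipped_weights, biases_array)
out = model.predict(im)   # should now match the expected (non-rotated) output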

Related

Create Multiple rows for each value in given column in pandas df

I have a dataframe with points given in two columns x and y.
Thing x y length_x length_y
0 A 1 3 1 2
1 B 2 3 2 1
These (x, y) points are situated in the middle of one of the sides of a rectangle with side lengths length_x and length_y. What I wish to do is, for each of these points, give the coordinates of the corners of the rectangle they are on. That is, the coordinates for Thing A would be:
(1+1*0.5, 3), (1-1*0.5,3), (1+1*0.5,3-2*0.5), (1-1*0.5, 3-2*0.5)
The factor of one half comes from the fact that the given point is the midpoint of a side, so half the length is the distance from that point to a corner of the rectangle.
Hence my desired output is:
Thing x y Corner_x Corner_y length_x length_y
0 A 1 3 1.5 2.0 1 2
1 A 1 3 1.5 1.0 1 2
2 A 1 3 0.5 2.0 1 2
3 A 1 3 0.5 1.0 1 2
4 A 1 3 1.5 2.0 1 2
5 B 2 3 3.0 3.0 2 1
6 B 2 3 3.0 2.5 2 1
7 B 2 3 1.0 3.0 2 1
8 B 2 3 1.0 2.5 2 1
9 B 2 3 3.0 3.0 2 1
I tried to do this by defining a lambda that returns two values but failed. I even tried to create multiple columns and then stack them, but it gets really messy.
bb = []
for thing in list_of_things:
    new_df = df[df['Thing'] == '{}'.format(thing)]
    df = df.sort_values('x', ascending=False)
    df['corner 1_x'] = df['x'] + df['length_x']/2
    df['corner 1_y'] = df['y']
    df['corner 2_x'] = df['x'] + df['length_x']/2
    df['corner 2_y'] = df['y'] - df['length_y']/2
    .........
Note also that the first corner's coordinates need to be repeated, as I later want to use geopandas to transform each of these sets of coordinates into a POLYGON.
What I am looking for is a way to generate these rows in a fast and clean way.
You can use apply to create your corners as lists and explode them to get the four rows per group.
Finally, join the output to the original dataframe:
df.join(df.apply(lambda r: pd.Series({'corner_x': [r['x'] + r['length_x']/2, r['x'] - r['length_x']/2],
                                      'corner_y': [r['y'] + r['length_y']/2, r['y'] - r['length_y']/2],
                                      }), axis=1)
          .explode('corner_x').explode('corner_y'),
        how='right')
output:
Thing x y length_x length_y corner_x corner_y
0 A 1 3 1 2 1.5 4
0 A 1 3 1 2 1.5 2
0 A 1 3 1 2 0.5 4
0 A 1 3 1 2 0.5 2
1 B 2 3 2 1 3 3.5
1 B 2 3 2 1 3 2.5
1 B 2 3 2 1 1 3.5
1 B 2 3 2 1 1 2.5
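If the end goal is the geopandas POLYGON mentioned in the question, a minimal sketch of that last step (the helper name is hypothetical, and it uses the same centred y ± length_y/2 convention as the output above) can build one shapely Polygon per Thing directly from x, y and the half-lengths; shapely closes the ring automatically, so the repeated first corner is not needed:
import geopandas as gpd
from shapely.geometry import Polygon

def rect_polygon(row):
    # Visit the four corners in ring order around the centre point (x, y).
    hx, hy = row['length_x'] / 2, row['length_y'] / 2
    return Polygon([(row['x'] - hx, row['y'] - hy),
                    (row['x'] + hx, row['y'] - hy),
                    (row['x'] + hx, row['y'] + hy),
                    (row['x'] - hx, row['y'] + hy)])

gdf = gpd.GeoDataFrame(df, geometry=df.apply(rect_polygon, axis=1))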

Conditional Cumulative Sum Based Multiple Pandas Columns

I have a dataframe that contains multiple "stacks" and their corresponding "lengths".
df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})
stack-1-material stack-2-material stack-1-length stack-2-length
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
I am trying to create a separate column for each material that tracks the cumulative sum of the length, regardless of which "stack" it is in. I've tried using groupby but am only able to get the cumulative sum into a single column. Here is what I'm looking for:
stack-1-material stack-2-material stack-1-length stack-2-length rock_cumsum paper_cumsum scissors_cumsum
0 rock rock 3 3 6 0 0
1 paper paper 1 1 6 2 0
2 paper rock 1 3 9 3 0
3 scissors paper 2 1 9 4 2
4 rock scissors 3 2 12 4 4
You can use the material columns as a mask on the length columns, then sum along the columns and take the cumulative sum, for each material.
import numpy as np

# separate material and length columns
material = df.filter(like='material').to_numpy()
length = df.filter(like='length')
# get all unique materials
l_mat = np.unique(material)
# iterate over unique materials
for mat in l_mat:
    df[f'{mat}_cumsum'] = length.where(material == mat).sum(axis=1).cumsum()
print(df)
stack-1-material stack-2-material stack-1-length stack-2-length \
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
rock_cumsum paper_cumsum scissors_cumsum
0 6.0 0.0 0.0
1 6.0 2.0 0.0
2 9.0 3.0 0.0
3 9.0 4.0 2.0
4 12.0 4.0 4.0
First, reverse your column names so that we can use wide_to_long to reshape the DataFrame.
Then take the cumsum within each material and determine the max value per material per row. We can then reshape this, forward-fill, replace the remaining NaN with 0s, and join back to the original.
df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]
res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'],
                       i='index', j='whatever', suffix='.*')
         .sort_index(level=0))
# material length
#index whatever
#0 -stack-1 rock 3
# -stack-2 rock 3
#1 -stack-1 paper 1
# -stack-2 paper 1
#2 -stack-1 paper 1
# -stack-2 rock 3
#3 -stack-1 scissors 2
# -stack-2 paper 1
#4 -stack-1 rock 3
# -stack-2 scissors 2
res['csum'] = res.groupby('material')['length'].cumsum()
res = (res.groupby(['index', 'material'])['csum'].max()
          .unstack(-1).ffill().fillna(0, downcast='infer')
          .add_suffix('_cumsum'))
df = pd.concat([df, res], axis=1)
material-stack-1 material-stack-2 length-stack-1 length-stack-2 paper_cumsum rock_cumsum scissors_cumsum
0 rock rock 3 3 0 6 0
1 paper paper 1 1 2 6 0
2 paper rock 1 3 3 9 0
3 scissors paper 2 1 4 9 2
4 rock scissors 3 2 4 12 4

How to change a value of a cell that contains NaN to another specific value?

I have a dataframe that contains NaN values in a particular column. While iterating through the rows, if I come across a NaN (using the isnan() method), I need to change it to some other value (based on some conditions). I tried using replace() and fillna() with the limit parameter, but they modify the whole column as soon as they come across the first NaN value. Is there any method to assign a value to a specific NaN rather than changing all the values of the column?
Example: the dataframe looks like this:
points sundar cate king varun vicky john charlie target_class
1 x2 5 'cat' 4 10 3 2 1 NaN
2 x3 3 'cat' 1 2 3 1 1 NaN
3 x4 6 'lion' 8 4 3 7 1 NaN
4 x5 4 'lion' 1 1 3 1 1 NaN
5 x6 8 'cat' 10 10 9 7 1 0.0
and I have a list like
a = [1.0, 0.0]
and I expect the result to be like this:
points sundar cate king varun vicky john charlie target_class
1 x2 5 'cat' 4 10 3 2 1 1.0
2 x3 3 'cat' 1 2 3 1 1 1.0
3 x4 6 'lion' 8 4 3 7 1 1.0
4 x5 4 'lion' 1 1 3 1 1 0.0
5 x6 8 'cat' 10 10 9 7 1 0.0
I want to change the target_class values based on some conditions and assign the values from the above list.
I believe you need to replace the NaN values with 1 only for the indexes specified in the list idx:
mask = df['target_class'].isnull()
idx = [1,2,3]
df.loc[mask, 'target_class'] = df[mask].index.isin(idx).astype(int)
print (df)
points sundar cate king varun vicky john charlie target_class
1 x2 5 'cat' 4 10 3 2 1 1.0
2 x3 3 'cat' 1 2 3 1 1 1.0
3 x4 6 'lion' 8 4 3 7 1 1.0
4 x5 4 'lion' 1 1 3 1 1 0.0
5 x6 8 'cat' 10 10 9 7 1 0.0
Or:
idx = [1,2,3]
s = pd.Series(df.index.isin(idx).astype(int), index=df.index)
df['target_class'] = df['target_class'].fillna(s)
EDIT:
From the comments, the solution is to assign values by index and column with DataFrame.loc:
df2.loc['x2', 'target_class'] = list1[0]
I suppose your conditions for imputing the NaN values do not depend on the number of NaNs in the column. In the code below I stored all the imputation rules in one function that receives as parameters the entire row (containing the NaN) and the column you are investigating. If you also need the whole dataframe for the imputation rules, just pass it through the replace_nan function. In the example I impute the col element with the mean value of the other columns.
import pandas as pd
import numpy as np

def replace_nan(row, col):
    # impute the missing value with the mean of the other columns in the row
    row[col] = row.drop(col).mean()
    return row

df = pd.DataFrame(np.random.rand(5, 3), columns=['col1', 'col2', 'col3'])
col_to_impute = 'col1'
df.loc[[1, 3], col_to_impute] = np.nan
df = df.apply(lambda x: replace_nan(x, col_to_impute) if np.isnan(x[col_to_impute]) else x, axis=1)
The only thing you need to do is make the right assignment, that is, assign only to the rows that contain nulls.
Example dataset:
,event_id,type,timestamp,label
0,asd12e,click,12322232,0.0
1,asj123,click,212312312,0.0
2,asd321,touch,12312323,0.0
3,asdas3,click,33332233,
4,sdsaa3,touch,33211333,
Note: the last two rows contain nulls in the column 'label'. Then, we load the dataset:
df = pd.read_csv('dataset.csv')
Now, we build the appropriate condition:
cond = df['label'].isnull()
Now, we make the assignment over these rows (I don't know the logic of your assignment, so I simply assign the value 1 to the NaNs):
df.loc[cond, 'label'] = 1
There are other, more precise approaches as well; for example, the fillna() method can be used once you provide the fill logic.
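For instance, a minimal sketch (the index-to-value mapping below is hypothetical; adapt it to your own conditions) that fills only selected NaN rows by aligning a Series on the index:
# Hypothetical mapping: row index -> label to use where 'label' is NaN.
fill_values = pd.Series({3: 1.0, 4: 0.0})
df['label'] = df['label'].fillna(fill_values)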

Conditional cumulative sum in Python/Pandas

Consider my dataframe, df:
data data_binary sum_data
2 1 1
5 0 0
1 1 1
4 1 2
3 1 3
10 0 0
7 0 0
3 1 1
How can I calculate the cumulative sum of data_binary within groups of contiguous 1 values?
The first group of 1's has a single 1, so sum_data has only a 1. However, the second group of 1's has three 1's, so sum_data is [1, 2, 3].
I've tried using np.where(df['data_binary'] == 1, df['data_binary'].cumsum(), 0), but that returns
array([1, 0, 2, 3, 4, 0, 0, 5])
Which is not what I want.
You want to take the cumulative sum of data_binary and subtract the most recent cumulative sum where data_binary was zero.
b = df.data_binary
c = b.cumsum()
c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
Output
0 1
1 0
2 1
3 2
4 3
5 0
6 0
7 1
Name: data_binary, dtype: int64
Explanation
Let's start by looking at each step side by side
cols = ['data_binary', 'cumulative_sum', 'nan_non_zero', 'forward_fill', 'final_result']
print(pd.concat([
b, c,
c.mask(b != 0),
c.mask(b != 0).ffill(),
c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
], axis=1, keys=cols))
Output
data_binary cumulative_sum nan_non_zero forward_fill final_result
0 1 1 NaN NaN 1
1 0 1 1.0 1.0 0
2 1 2 NaN 1.0 1
3 1 3 NaN 1.0 2
4 1 4 NaN 1.0 3
5 0 4 4.0 4.0 0
6 0 4 4.0 4.0 0
7 1 5 NaN 4.0 1
The problem with cumulative_sum is that the rows where data_binary is zero, do not reset the sum. And that is the motivation for this solution. How do we "reset" the sum when data_binary is zero? Easy! I slice the cumulative sum where data_binary is zero and forward fill the values. When I take the difference between this and the cumulative sum, I've effectively reset the sum.
I think you can use groupby with DataFrameGroupBy.cumsum on a Series of group labels: first compare each value with the shifted column for inequality (!=), then create the groups with cumsum. Last, replace the rows where data_binary is 0 using mask:
print (df.data_binary.ne(df.data_binary.shift()).cumsum())
0 1
1 2
2 3
3 3
4 3
5 4
6 4
7 5
Name: data_binary, dtype: int32
df['sum_data1'] = (df.data_binary.groupby(df.data_binary.ne(df.data_binary.shift()).cumsum())
                                 .cumsum())
df['sum_data1'] = df['sum_data1'].mask(df.data_binary == 0, 0)
print (df)
data data_binary sum_data sum_data1
0 2 1 1 1
1 5 0 0 0
2 1 1 1 1
3 4 1 2 2
4 3 1 3 3
5 10 0 0 0
6 7 0 0 0
7 3 1 1 1
If you want piRSquared's excellent answer as one single command:
df['sum_data'] = df[['data_binary']].apply(
    lambda x: x.cumsum().sub(x.cumsum().mask(x != 0).ffill(), fill_value=0).astype(int),
    axis=0)
Note that the double square brackets on the right-hand side are necessary to make a one-column DataFrame instead of a Series, in order to use apply with the axis argument (which is not available when apply is used on a Series).
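To see why the single brackets fail, note that Series.apply has no axis parameter, and any axis= keyword is simply forwarded to the function; a minimal illustration, reusing the question's df:
df['data_binary'].apply(lambda x: x.cumsum(), axis=0)    # TypeError: unexpected keyword argument 'axis'
df[['data_binary']].apply(lambda x: x.cumsum(), axis=0)  # works: apply passes the column as a Series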

How to get Equation of a decision boundary in matlab svm plot?

My data:
y n Rh y2
1 1 1.166666667 1
-1 2 0.5 1
-1 3 0.333333333 1
-1 4 0.166666667 1
1 5 1.666666667 2
1 6 1.333333333 1
-1 7 0.333333333 1
-1 8 0.333333333 1
1 9 0.833333333 1
1 10 2.333333333 2
1 11 1 1
-1 12 0.166666667 1
1 13 0.666666667 1
1 14 0.833333333 1
1 15 0.833333333 1
-1 16 0.333333333 1
-1 17 0.166666667 1
1 18 2 2
1 19 0.833333333 1
1 20 1.333333333 1
1 21 1.333333333 1
-1 22 0.166666667 1
-1 23 0.166666667 1
-1 24 0.333333333 1
-1 25 0.166666667 1
-1 26 0.166666667 1
-1 27 0.333333333 1
-1 28 0.166666667 1
-1 29 0.166666667 1
-1 30 0.5 1
1 31 0.833333333 1
-1 32 0.166666667 1
-1 33 0.333333333 1
-1 34 0.166666667 1
-1 35 0.166666667 1
My code is:
data = xlsread('btpdata.xlsx', 1)
A = data(1:end, 2:3)
B = data(1:end, 1)
svmStruct = svmtrain(A, B, 'showplot', true)
hold on
C = data(1:end, 2:3)
D = data(1:end, 4)
svmStruct = svmtrain(C, D, 'showplot', true)
hold off
How can I get the approximate equations of these black lines in the given MATLAB plot?
It depends on which package you used, but as it is a linear Support Vector Machine there are more or less two options:
Your trained svm contains the equation of the line in a property coefs (sometimes called w or weights) and b (or intercept), so your line is <coefs, X> + b = 0.
Your svm contains alphas (dual coefficients, Lagrange multipliers), and then coefs = SUM_i alphas_i * y_i * SV_i, where SV_i is the i-th support vector (the ones in circles on your plot) and y_i is its label (-1 or +1). Sometimes the alphas are already multiplied by y_i, in which case coefs = SUM_i alphas_i * SV_i.
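Whichever way you obtain them, once you know coefs = (w1, w2) and b, the plotted boundary follows from solving <coefs, X> + b = 0 for the second coordinate: x2 = -(w1 / w2) * x1 - b / w2.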
If you are trying to get the equation from the actual plot (image), then you can only read it off (it is more or less y = 0.6, meaning that coefs = [0 1] and b = -0.6). An image-analysis-based approach (for an arbitrary such plot) would require:
detecting the plot area (object detection)
reading the ticks/scale (OCR + object detection) <- this would actually be the hardest part
filtering out everything non-black and performing linear regression on the points left, then transforming them through the scale detected earlier.
I have had the same problem. To build the linear equation (y = mx + b) of the decision boundary you need the gradient (m) and the y-intercept (b). SVMStruct.Bias gives you the bias term. The gradient is determined by the SVM beta weights, which SVMStruct does not contain, so you need to calculate them from the alphas (which are included in SVMStruct):
alphas = SVMStruct.Alpha;          % dual coefficients stored by svmtrain
SV = SVMStruct.SupportVectors;     % support vectors, one per row
betas = sum(alphas.*SV);           % beta (weight) vector of the boundary
m = betas(1)/betas(2)              % gradient of the decision boundary
By the way, if your SVM has scaled the data, then I think you will need to unscale it.
