I can't seem to figure out how to vectorize this Python 3 loop:
import numpy as np
a = np.array([-72, -10, -70, 37, 68, 9, 1, -3, 2, 3, -6, -4, ], np.int16)
result = np.array([-72, -10, -111, -23, 1, -2, 1, -3, 1, 2, -5, -5, ], np.int16)
b = np.copy(a)
for i in range(2, len(b)):
    b[i] += int((b[i-1] + b[i-2]) / 2)
assert (b == result).all()
I tried playing with np.convolve and pandas.rolling_apply but couldn't get either working. Maybe this is the time to learn about C extensions?
It would be great to get the time for this down to something like 50..100ms for input arrays of ~500k elements.
@hpaulj asked in his answer for a closed expression of b[k] in terms of a[:k]. I didn't think it existed, but I worked on it a bit and indeed found that the closed form contains a bunch of Jacobsthal numbers, as @Divakar pointed out.
Here is one closed form (for n >= 2):

b[n] = a[n] + sum_{k=1..n-1} (J_{n-k+1} / 2^{n-k}) a[k] + (J_{n-1} / 2^{n-1}) a[0]

J_n here is the Jacobsthal number; when expanding it like this:

J_n = (2^n - (-1)^n) / 3

one ends up with an expression that looks amenable to a vectorized implementation ...
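To sketch that idea (only for the float version of the recurrence, i.e. without the int() truncation, so it will drift from the exact integer result): substituting the expansion above collapses the a[k] coefficients to c_m = (2 + (-0.5)^m) / 3 and the a[0] coefficient to g_n = (1 - (-0.5)^(n-1)) / 3, which turns the whole thing into a single convolution:

import numpy as np

def closed_form_float(a):
    # float version of b[i] = a[i] + (b[i-1] + b[i-2]) / 2, fully vectorized
    n = len(a)
    m = np.arange(n)
    c = (2.0 + (-0.5) ** m) / 3.0  # c_m = J_{m+1} / 2**m -> 1, 0.5, 0.75, 0.625, ...
    g = (1.0 - (-0.5) ** m) / 3.0  # g[n-1] is the coefficient of a[0] in b[n]
    b = np.empty(n)
    b[0] = a[0]
    # b[n] = sum_{k=1..n} c[n-k] * a[k] + g[n-1] * a[0]
    b[1:] = np.convolve(a[1:], c)[:n - 1] + g[:n - 1] * a[0]
    return b

a = np.array([-72, -10, -70, 37, 68, 9, 1, -3, 2, 3, -6, -4], np.float64)
print(closed_form_float(a))  # [-72. -10. -111. -23.5 0.75 ...]

np.convolve here is O(n^2); for ~500k elements one would switch to scipy.signal.fftconvolve, or simply truncate the kernel, since (-0.5)^m drops below double precision after roughly 55 terms.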
Most numpy code operates on the whole array at once. Sure, it iterates in C code, but the iteration is buffered in a way that makes it irrelevant which element is used first.
Here changes to b[2] affect the value calculated for b[3] and on down the line.
add.at and other such ufunc methods do unbuffered calculations. This allows you to add some value repeatedly to one element. I played a bit with it in this case, but no luck so far.
cumsum and cumprod are also handy for problems where values depend on earlier ones.
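For instance, a first-order dependence like b[i] = b[i-1] + a[i] collapses to one call:

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.cumsum(a)  # each b[i] equals b[i-1] + a[i], giving [1, 3, 6, 10]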
Is it possible to generalize the calculation, so as to define b[i] in terms of all the a[:i]? We know b[2] as a function of a[:2], but what of b[3]?
Even if we got this working for floats, it might be off because of the integer divisions.
I think you already have the sane solution. Any other vectorization would rely on floating point calculations, and it would be really difficult to keep track of the error accumulation. For example, say you want to use a matrix-vector multiplication: for the first seven terms the matrix would look like
array([[ 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0.5 , 0.5 , 1. , 0. , 0. , 0. , 0. ],
[ 0.25 , 0.75 , 0.5 , 1. , 0. , 0. , 0. ],
[ 0.375 , 0.625 , 0.75 , 0.5 , 1. , 0. , 0. ],
[ 0.3125 , 0.6875 , 0.625 , 0.75 , 0.5 , 1. , 0. ],
[ 0.34375, 0.65625, 0.6875 , 0.625 , 0.75 , 0.5 , 1. ]])
The relationship can be described by the iterative formula

                     [ b[i-2] ]
b[i] = [0.5  0.5  1] [ b[i-1] ]
                     [ a[i]   ]
That defines a series of elementary matrices of the form of an identity matrix with
[0 ... 0.5 0.5 1 0 ... 0]
on the ith row. Successive multiplication gives the matrix above for the first seven terms. There is indeed a subdiagonal structure, but the terms get very small very quickly. As you have shown, 2 to the power 500k is not fun.
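A quick sketch of that construction (just illustrating the float recurrence, not the integer one):

import numpy as np

n = 7
T = np.eye(n)
for i in range(2, n):
    E = np.eye(n)                    # identity matrix ...
    E[i, i - 2] = E[i, i - 1] = 0.5  # ... with [.. 0.5 0.5 1 ..] on the ith row
    T = E @ T                        # accumulate E_n ... E_2
print(T)  # reproduces the 7x7 matrix above; the float b is then T @ a[:n]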
In order to keep floating point noise under control, an iterative solution is required, which is what you have anyway.
I want to recode the values in my label array so that the labels 0, 1, 2 correspond to the center values 1.00162877, 0.74014188, 1.16120161:
import numpy as np
label=np.array([0, 2, 1, 1, 2, 1, 0, 0, 1, 2])
center=np.array([[1.00162877],
[0.74014188],
[1.16120161]])
Using np.where does not overwrite all the values in a single pass; each iteration of the loop returns a separate array in which only a single label is replaced, not all of them.
for i in range(len(center)):
    result=np.where(label==[i], center[i], label)
    print(result)
[1.00162877 2. 1. 1. 2. 1.
1.00162877 1.00162877 1. 2. ]
[0. 2. 0.74014188 0.74014188 2. 0.74014188
0. 0. 0.74014188 2. ]
[0. 1.16120161 1. 1. 1.16120161 1.
0. 0. 1. 1.16120161]
How can I modify the np.where call, or use any other function, so that the outcome looks like this?
Expected=([1.00162877, 1.16120161, 0.74014188, 0.74014188, 1.16120161, 0.74014188,
1.00162877, 1.00162877, 0.74014188, 1.16120161])
This is not a loop, but I think it works: center has shape (3, 1), so the fancy index center[label] picks the matching row for each label, and ravel() flattens the resulting (10, 1) array:
center[label].ravel()
Output:
array([1.00162877, 1.16120161, 0.74014188, 0.74014188, 1.16120161,
0.74014188, 1.00162877, 1.00162877, 0.74014188, 1.16120161])
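The same lookup can be written in a couple of equivalent ways:

import numpy as np

label = np.array([0, 2, 1, 1, 2, 1, 0, 0, 1, 2])
center = np.array([[1.00162877], [0.74014188], [1.16120161]])

result = center.ravel()[label]   # flatten the centers first, then index
result = np.take(center, label)  # take() indexes the flattened array by default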
For whatever reason this only returns 0 or 1 instead of floats between them.
from sklearn import preprocessing
X = [[1.3, 1.6, 1.4, 1.45, 12.3, 63.01,],
[1.9, 0.01, 4.3, 45.4, 3.01, 63.01]]
minmaxscaler = preprocessing.MinMaxScaler()
X_scale = minmaxscaler.fit_transform(X)
print(X_scale) # returns [[0. 1. 0. 0. 1. 0.] [1. 0. 1. 1. 0. 0.]]
MinMaxScaler cannot work with a list of lists; it needs, for example, a numpy array (or a DataFrame).
You can convert to a numpy array. That gives 2 samples with 6 features each, which I guess is not what you mean, so you also need a reshape.
import numpy
X = numpy.array([[1.3, 1.6, 1.4, 1.45, 12.3, 63.01,],
[1.9, 0.01, 4.3, 45.4, 3.01, 63.01]]).reshape(-1,1)
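For completeness, the scaler is then applied to the reshaped array as before:

from sklearn import preprocessing

minmaxscaler = preprocessing.MinMaxScaler()
X_scale = minmaxscaler.fit_transform(X)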
Results after MinMax Scaler:
[[0.02047619]
[0.0252381 ]
[0.02206349]
[0.02285714]
[0.19507937]
[1. ]
[0.03 ]
[0. ]
[0.06809524]
[0.72047619]
[0.04761905]
[1. ]]
I'm not exactly sure if you want to min-max scale each list separately or all together.
The answer you got from MinMaxScaler is the expected one.
When you have only two datapoints, you will get only 0s and 1s. See the example here for a three-datapoint scenario.
You need to understand that it converts the lowest value in each column to 0 and the highest to 1. When you have more datapoints, the remaining ones are calculated based on the range (max - min); see the formula here.
Also, MinMaxScaler accepts 2D data, which means a list of lists is acceptable. That's why you did not get any error.
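A minimal illustration of that column-wise formula, (x - min) / (max - min), with a third sample added so that values between 0 and 1 show up:

import numpy as np
from sklearn import preprocessing

X = np.array([[1.3, 1.6],
              [1.9, 0.01],
              [1.6, 0.8]])
print(preprocessing.MinMaxScaler().fit_transform(X))
# columns scale independently, giving roughly:
# [[0.   1.   ]
#  [1.   0.   ]
#  [0.5  0.497]]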
I've been playing around with numpy's linalg module and wanted to get the eigenvectors for the following matrix:
import numpy as np
matrix = np.array([[4,0,-1],[0,3,0],[1,0,2]])
w,v = np.linalg.eig(matrix)
print(v)
array([[0.70710678, 0.70710678, 0. ],
[0. , 0. , 1. ],
[0.70710678, 0.70710678, 0. ]])
Calculating the eigenvectors by hand gives me only two vectors, which are [1,0,1] and [0,1,0]. I know that numpy normalizes the vectors, which is fine, but the problem arises when I try to check if the first and second columns are equal:
v[:,0] == v[:,1]
array([False, True, False])
This gives me the impression that these are two different vectors (so I now have a total of 3 eigenvectors) when I already know I'll only get two.
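A quick check (assuming the mismatch is just rounding noise) is to compare with a tolerance instead of exact equality:

print(np.allclose(v[:, 0], v[:, 1]))  # presumably True: equal up to floating-point noise
print(v[:, 0] - v[:, 1])              # presumably tiny values, on the order of 1e-16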
Can someone please explain what's going on here?
I am working on a clustering problem. I have 3 cluster centers, shown below, and I want to calculate the Euclidean distance to these 3 cluster centers from another m*n dimensional matrix. It would be very helpful if anyone could guide me through this.
kmeans.cluster_centers_
Out[99]:
array([[-2.23020213, 0.35654288],
[ 7.69370352, 1.72991757],
[ 0.92519202, -0.29218753]])
matrix
Out[100]:
array([[ 0.11650485, 0.11650485, 0.11650485, 0.11650485, 0.11650485,
0.11650485],
[ 0.11650485, 0.18446602, 0.18446602, 0.2815534 , 0.37864078,
0.37864078],
[ 0.21359223, 0.21359223, 0.21359223, 0.21359223, 0.29708738,
0.35533981],
...,
[ 0.2640625 , 0.2734375 , 0.30546875, 0.31953125, 0.31953125,
0.31953125],
[ 1. , 1. , 1. , 1. , 1. ,
1. ],
[ 0.5 , 0.5 , 0.5 , 0.5 , 0.5 ,
0.5 ]])
I want to do it in Python. I have used sklearn for my clustering.
Euclidean distance is defined on vectors of a fixed length d, i.e. it is a function R^d x R^d -> R.
So whatever you are trying to do, it is not the usual Euclidean distance. You seem to have k=3 cluster centers with d=2 coordinates, but your matrix has an incompatible shape that cannot be interpreted in an obvious way as 2-d vectors.
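For reference, if the data rows were also 2-d points, the distance from every point to every center could be computed in one call; a sketch with a hypothetical 5x2 data array:

import numpy as np
from scipy.spatial.distance import cdist

centers = np.array([[-2.23020213,  0.35654288],
                    [ 7.69370352,  1.72991757],
                    [ 0.92519202, -0.29218753]])
points = np.random.rand(5, 2)   # hypothetical data: 5 points in R^2
dists = cdist(points, centers)  # shape (5, 3): distance from each point to each center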
I am working on semantic segmentation using CNNs. I have an imbalanced number of pixels for each class.
Based on this link, I am trying to create the weight matrix H in order to define an Infogain loss layer for my imbalanced classes.
My data has five classes. I wrote the following code in Python.
Read a sample image:

from scipy.misc import imread  # assumption: any reader that returns the label image as an array works

im=imread(sample_img_path)
Count the number of pixels in each class:
cl0=np.count_nonzero(im == 0) #0=background class
.
.
cl4=np.count_nonzero(im == 4) #4=class 4
output:
39817 13751 1091 10460 417
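All five counts can also be obtained in one call (assuming the labels are exactly 0..4):

counts = np.bincount(im.ravel(), minlength=5)  # pixel count per class label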
#Inverse class weights
#FORMULA=(total number of samples)/((number of classes)*(number of samples in class i))
no_classes=5
sum_=float(cl0+cl1+cl2+cl3+cl4)  #total pixel count: 65536 for a 256x256 image
w0=round(sum_/(no_classes*cl0),3)
w1=round(sum_/(no_classes*cl1),3)
w2=round(sum_/(no_classes*cl2),3)
w3=round(sum_/(no_classes*cl3),3)
w4=round(sum_/(no_classes*cl4),3)
print w0,w1,w2,w3,w4
L_1=[w0,w1,w2,w3,w4]
#weighting based on the number of pixel
print L_1
L=[round(i/sum(L_1),2) for i in L_1] #normalizing the weights
print L
print sum(L)
#creating the H matrix
H=np.eye(5)
print H
#H = np.eye( L, dtype = 'f4' )
d=np.diag_indices_from(H)
H[d]=L
print H
blob = caffe.io.array_to_blobproto(H.reshape((1,1,L,L)))
with open( 'infogainH.binaryproto', 'wb' ) as f :
f.write( blob.SerializeToString() )
print f
The output, after removing some unimportant lines, is as follows:
(256, 256)
39817 13751 1091 10460 417
0.329 0.953 12.014 1.253 31.432
<type 'list'>
[0.329, 0.953, 12.014, 1.253, 31.432]
[0.01, 0.02, 0.26, 0.03, 0.68]
1.0
[[ 1. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 1.]]
[[ 0.01 0. 0. 0. 0. ]
[ 0. 0.02 0. 0. 0. ]
[ 0. 0. 0.26 0. 0. ]
[ 0. 0. 0. 0.03 0. ]
[ 0. 0. 0. 0. 0.68]]
Traceback (most recent call last):
File "create_class_prob.py", line 59, in <module>
blob = caffe.io.array_to_blobproto(H.reshape((1,1,L,L)))
TypeError: an integer is required
As can be seen, it gives an error. My question has two parts:
How to solve this error?
I replaced L with 5 as follows:
blob = caffe.io.array_to_blobproto(H.reshape((1,1,5,5)))
Now it does not give an error, and the last line shows this:
<closed file 'infogainH.binaryproto', mode 'wb' at 0x7f94b5775b70>
It created the file infogainH.binaryproto. Is this correct?
Should this matrix H be constant for all the images in the database?
I really appreciate any help.
Thanks
You have a simple "copy-paste" bug. You copied your code from this answer, where L was an integer representing the number of classes. In your code, on the other hand, L is a list with the class weights. Replacing L with 5 in your code does indeed solve the problem.
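For example, a slightly safer variant of that fix derives the size from the list itself (a sketch, producing the same H as above):

n = len(L)                   # number of classes, 5 here
H = np.diag(L).astype('f4')  # build the diagonal weight matrix in one call
blob = caffe.io.array_to_blobproto(H.reshape((1, 1, n, n)))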
Should H be constant? This is really up to you to decide.
BTW, AFAIK, the current caffe version does not support pixel-wise infogain loss; you might need to use the code in PR #3855.