How to compute the Jacobian of BertForMaskedLM using jacrev - nlp

I tried the plan below to compute the Jacobian of BertForMaskedLM using jacrev:
import numpy as np
from transformers import BertTokenizer, BertForMaskedLM
import torch
import torch.nn as nn
from functorch import make_functional, make_functional_with_buffers, vmap, vjp, jvp, jacrev

device = 'cuda:2'
torch.cuda.empty_cache()
model_name = 'bert-base-chinese'
tokenizer = BertTokenizer.from_pretrained(model_name)
bert_model = BertForMaskedLM.from_pretrained(model_name)
net = bert_model.to(device)
fnet, params, buffers = make_functional_with_buffers(net)

def fnet_single(params, x, y):
    result = fnet(params, buffers, x.unsqueeze(0).unsqueeze(0), y.unsqueeze(0).unsqueeze(0))['logits']
    return result.squeeze(0).squeeze(0)

text = u'大肠杆菌是人和许多动物肠道中最主要的一种细菌'
inputs = tokenizer.encode_plus(text)
segment_ids = inputs['token_type_ids']
token_ids = inputs['input_ids']
length = len(token_ids) - 2
batch_token_ids = torch.tensor([token_ids] * (2 * length - 1), requires_grad=True).to(device)
batch_segment_ids = torch.zeros_like(batch_token_ids).to(device)
for i in range(length):
    if i > 0:
        batch_token_ids[2 * i - 1, i] = 103
        batch_token_ids[2 * i - 1, i + 1] = 103
    batch_token_ids[2 * i, i + 1] = 103
threshold = 100
word_token_ids = [[token_ids[1]]]
for i in range(1, length):
    x, y = batch_token_ids[2 * i], batch_segment_ids[2 * i]
    jacobian1 = jacrev(fnet_single, argnums=1)(params, x, y)
    x, y = batch_token_ids[2 * i - 1], batch_segment_ids[2 * i - 1]
    jacobian2 = jacrev(fnet_single, argnums=1)(params, x, y)
However, an error appeared:
Traceback (most recent call last):
File "study_jacrev.py", line 49, in
batch_token_ids = torch.tensor([token_ids] * (2 * length - 1), requires_grad=True).to(device)
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
Can anyone help me?

This is because you are trying to take the Jacobian with respect to data on which gradients are not enabled.
If you want the Jacobian with respect to the parameters: jacrev(fnet_single, argnums=0)(params, x, y)
If you want the Jacobian with respect to the data: x = x.to(torch.float32).requires_grad_(True) (note that converting x to a floating-point dtype is mandatory before gradients can be enabled on it)
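A minimal sketch of both options, assuming the imports, fnet_single, params, x and y from the question. Note that the reported RuntimeError itself comes from passing requires_grad=True when building the integer batch_token_ids tensor, so dropping that flag avoids it; also, BERT's embedding lookup still needs integer ids, so input-space Jacobians are usually taken with respect to the embeddings rather than the raw ids.

# Jacobian with respect to the model parameters; integer token ids are fine here.
jac_params = jacrev(fnet_single, argnums=0)(params, x, y)

# Jacobian with respect to the input only works on a floating-point tensor,
# e.g. if fnet_single were rewritten to take embedding vectors instead of ids.
x_float = x.to(torch.float32).requires_grad_(True)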

Related

3D plotting Lorenz python

I'm pretty new to Python, but I've been working on this program to graph the solutions of the Lorenz differential equations in 3D, and I keep getting this error:
Traceback (most recent call last):
File "lorenz_attractor3D.py", line 3, in <module>
from mpl_toolkits.mplot3d.axes3d import Axes3D
ImportError: No module named mpl_toolkits.mplot3d.axes3d
It's not clear why. I don't know if it's because I don't have matplotlib installed correctly.
Code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

def lorenz(x, y, z, s=10, r=28, b=2.667):
    """
    Given:
        x, y, z: a point of interest in three dimensional space
        s, r, b: parameters defining the lorenz attractor
    Returns:
        x_dot, y_dot, z_dot: values of the lorenz attractor's
        derivatives at the point x, y, z
    """
    x_dot = s*(y - x)
    y_dot = r*x - y - x*z
    z_dot = x*y - b*z
    return x_dot, y_dot, z_dot

dt = 0.01
num_steps = 10000

# Need one more for the initial values
xs = np.empty(num_steps + 1)
ys = np.empty(num_steps + 1)
zs = np.empty(num_steps + 1)

# Set initial values
xs[0], ys[0], zs[0] = (0., 1., 1.05)

# Step through "time", calculating the partial derivatives at the current point
# and using them to estimate the next point
for i in range(num_steps):
    x_dot, y_dot, z_dot = lorenz(xs[i], ys[i], zs[i])
    xs[i + 1] = xs[i] + (x_dot * dt)
    ys[i + 1] = ys[i] + (y_dot * dt)
    zs[i + 1] = zs[i] + (z_dot * dt)

# Plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot(xs, ys, zs, lw=0.5)
ax.set_xlabel("X Axis")
ax.set_ylabel("Y Axis")
ax.set_zlabel("Z Axis")
ax.set_title("Lorenz Attractor")
plt.plot(xs, ys)
plt.show()
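As a quick diagnostic (a minimal sketch, not part of the original post): check which matplotlib installation the script is actually importing and whether the 3D toolkit is present, since the ImportError above points at the environment rather than the plotting code.

import matplotlib
print(matplotlib.__version__, matplotlib.__file__)   # confirms which matplotlib is being used
from mpl_toolkits.mplot3d import Axes3D              # raises ImportError if the 3D toolkit is missing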

Argument must be a string or a number issue, Not 'Type' - Pyspark

Update:
So I have been looking into the issue; the problem is with the scikit-multiflow DataStream. In the last quarter of the code, stream_clf.partial_fit(X, y, classes=stream.target_values) is called; the class values in stream.target_values should be numbers or strings, but the method is returning (dtype). When I print or loop over stream.target_values I get this:
I have tried to do conversions etc. but still to no avail. Can someone please help here?
Initial Problem
I am running a code (took inspiration from here). It works perfectly fine in a vanilla Python environment.
But if I run this code, after certain modifications, in Apache Spark using PySpark, I get the following error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'type'
I have tried every possible way to trace the issue, but everything looks alright. The error arises from the last line of the code, where the Hoeffding tree is called for prediction. It expects an ndarray, and the type of the X variable is also ndarray. I am not sure what is triggering the issue. Can someone please help or direct me to the right trace?
Complete error stack:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-1310132c88db> in <module>
30 D3_win.addInstance(X,y)
31 xx = np.array(X,dtype='float64')
---> 32 y_hat = stream_clf.predict(xx)
33
34
~/conceptDrift/projectTest/lib/python3.5/site-packages/skmultiflow/trees/hoeffding_tree.py in predict(self, X)
1068 r, _ = get_dimensions(X)
1069 predictions = []
-> 1070 y_proba = self.predict_proba(X)
1071 for i in range(r):
1072 index = np.argmax(y_proba[i])
~/conceptDrift/projectTest/lib/python3.5/site-packages/skmultiflow/trees/hoeffding_tree.py in predict_proba(self, X)
1099 votes = normalize_values_in_dict(votes, inplace=False)
1100 if self.classes is not None:
-> 1101 y_proba = np.zeros(int(max(self.classes)) + 1)
1102 else:
1103 y_proba = np.zeros(int(max(votes.keys())) + 1)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'type'
Code
import findspark
findspark.init()
import pyspark as ps
import warnings
from pyspark.sql import functions as fn
import sys
from pyspark import SparkContext, SparkConf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score as AUC
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
from skmultiflow.trees.hoeffding_tree import HoeffdingTree
from skmultiflow.data.data_stream import DataStream
import time

def drift_detector(S, T, threshold=0.75):
    T = pd.DataFrame(T)
    #print(T)
    S = pd.DataFrame(S)
    # Give slack variable in_target which is 1 for old and 0 for new
    T['in_target'] = 0  # in target set
    S['in_target'] = 1  # in source set
    # Combine source and target with new slack variable
    ST = pd.concat([T, S], ignore_index=True, axis=0)
    labels = ST['in_target'].values
    ST = ST.drop('in_target', axis=1).values
    # You can use any classifier for this step. We advise it to be a simple one as we want to see whether source
    # and target differ not to classify them.
    clf = LogisticRegression(solver='liblinear')
    predictions = np.zeros(labels.shape)
    # Divide ST into two equal chunks
    # Train LR on a chunk and classify the other chunk
    # Calculate AUC for original labels (in_target) and predicted ones
    skf = StratifiedKFold(n_splits=2, shuffle=True)
    for train_idx, test_idx in skf.split(ST, labels):
        X_train, X_test = ST[train_idx], ST[test_idx]
        y_train, y_test = labels[train_idx], labels[test_idx]
        clf.fit(X_train, y_train)
        probs = clf.predict_proba(X_test)[:, 1]
        predictions[test_idx] = probs
    auc_score = AUC(labels, predictions)
    print(auc_score)
    # Signal drift if AUC is larger than the threshold
    if auc_score > threshold:
        return True
    else:
        return False

class D3():
    def __init__(self, w, rho, dim, auc):
        self.size = int(w * (1 + rho))
        self.win_data = np.zeros((self.size, dim))
        self.win_label = np.zeros(self.size)
        self.w = w
        self.rho = rho
        self.dim = dim
        self.auc = auc
        self.drift_count = 0
        self.window_index = 0

    def addInstance(self, X, y):
        if (self.isEmpty()):
            self.win_data[self.window_index] = X
            self.win_label[self.window_index] = y
            self.window_index = self.window_index + 1
        else:
            print("Error: Buffer is full!")

    def isEmpty(self):
        return self.window_index < self.size

    def driftCheck(self):
        if drift_detector(self.win_data[:self.w], self.win_data[self.w:self.size], auc):  # returns true if drift is detected
            self.window_index = int(self.w * self.rho)
            self.win_data = np.roll(self.win_data, -1 * self.w, axis=0)
            self.win_label = np.roll(self.win_label, -1 * self.w, axis=0)
            self.drift_count = self.drift_count + 1
            return True
        else:
            self.window_index = self.w
            self.win_data = np.roll(self.win_data, -1 * (int(self.w * self.rho)), axis=0)
            self.win_label = np.roll(self.win_label, -1 * (int(self.w * self.rho)), axis=0)
            return False

    def getCurrentData(self):
        return self.win_data[:self.window_index]

    def getCurrentLabels(self):
        return self.win_label[:self.window_index]

def select_data(x):
    x = "/user/hadoop1/tellus/sea_1.csv"
    peopleDF = spark.read.csv(x, header=True)
    df = peopleDF.toPandas()
    scaler = MinMaxScaler()
    df.iloc[:, 0:df.shape[1] - 1] = scaler.fit_transform(df.iloc[:, 0:df.shape[1] - 1])
    return df

def check_true(y, y_hat):
    if (y == y_hat):
        return 1
    else:
        return 0

df = select_data("/user/hadoop1/tellus/sea_1.csv")
stream = DataStream(df)
stream.prepare_for_use()
stream_clf = HoeffdingTree()
w = int(2000)
rho = float(0.4)
auc = float(0.60)
# In[ ]:
D3_win = D3(w, rho, stream.n_features, auc)
stream_acc = []
stream_record = []
stream_true = 0
i = 0
start = time.time()
X, y = stream.next_sample(int(w * rho))
stream_clf.partial_fit(X, y, classes=stream.target_values)
while (stream.has_more_samples()):
    X, y = stream.next_sample()
    if D3_win.isEmpty():
        D3_win.addInstance(X, y)
        y_hat = stream_clf.predict(X)
The problem was with the select_data() function: the data types of the variables were being changed during execution. This issue is fixed now.
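For anyone hitting the same thing, here is a minimal sketch of the kind of fix implied above (assuming the same sea_1.csv layout, with spark, pd and MinMaxScaler as in the code above): reading the CSV with inferSchema=True, or coercing the pandas columns to numeric, keeps the label column numeric, so stream.target_values ends up holding plain numbers.

def select_data(path):
    peopleDF = spark.read.csv(path, header=True, inferSchema=True)   # infer numeric column types
    df = peopleDF.toPandas()
    df = df.apply(pd.to_numeric, errors='coerce')                    # make sure nothing stays as object dtype
    scaler = MinMaxScaler()
    df.iloc[:, :-1] = scaler.fit_transform(df.iloc[:, :-1])          # scale features, leave the label column alone
    return df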

Solving simple chemical network odes in pymc3 with theano

I'm trying to solve a simple chemical network, A->B (reaction rate k1) and A1->B (reaction rate k2), with Bayesian inference. My hope is to get a sensitivity analysis of k1 and k2. If A, A1 and B are my constant variables, the only logical behaviour would be that if, for example, k1 decreases, k2 should increase by some proportional amount, and vice versa. But I am having some trouble with ODEs in pymc3. So here is my attempt:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint, solve_ivp
import seaborn
import pymc3 as pm
import theano.tensor as T
from theano.compile.ops import as_op
from sys import exit

time = 10
Nt = 11
tt = np.linspace(0, time, Nt)
y0 = [1, 2, 0]
k1, k2 = 1, 1

# Actual solution of the differential equation (used to generate data)
def real(t, c):
    da_dt = -k1*c[0]
    da1_dt = -k2*c[1]
    db_dt = k1*c[0] + k2*c[1]
    return da_dt, da1_dt, db_dt

c_est = solve_ivp(real, t_span=[0, time], t_eval=tt, y0=y0)

# Method for solving the ODE
def lv(xdata, k1=1, k2=1):
    def equat(c, t):
        da_dt = -k1*c[0]
        da1_dt = -k2*c[1]
        db_dt = k1*c[0] + k2*c[1]
        return da_dt, da1_dt, db_dt
    Y, dict = odeint(equat, y0, xdata, full_output=True)
    return Y

# Generating data for Bayesian inference
k1, k2 = 1, 1
ydata = c_est.y
# Adding some error to the ydata points
yerror = 10*np.random.rand(Nt)
ydata += np.random.normal(0.0, np.sqrt(yerror))
ydata = np.ravel(ydata)

@as_op(itypes=[T.dscalar, T.dscalar], otypes=[T.dvector])
def func(al, be):
    Q = lv(tt, k1=al, k2=be)
    return np.ravel(Q)

# Number of samples and initial conditions
nsample = 5000
y0 = 1.0
sd = 0.2

# Model for Bayesian inference
model = pm.Model()
with model:
    # Priors for unknown model parameters
    k1 = pm.HalfNormal('k1', sd=sd)
    k2 = pm.HalfNormal('k2', sd=sd)
    # Expected value of outcome
    mu = func(k1, k2)
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=yerror, observed=ydata)
    trace = pm.sample(nsample, nchains=1)

pm.traceplot(trace)
plt.show()
But it doesn't "loop" through the equat function. Output error:
Traceback (most recent call last):
File "<ipython-input-16-14ca425a8735>", line 1, in <module>
runfile('/folder/code.py', wdir='/folder')
File "/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/code.py", line 77, in <module>
mu = func(k1,k2)
File "/anaconda3/lib/python3.7/site-packages/theano/gof/op.py", line 674, in __call__
required = thunk()
File "/anaconda3/lib/python3.7/site-packages/theano/gof/op.py", line 892, in rval
r = p(n, [x[0] for x in i], o)
File "/anaconda3/lib/python3.7/site-packages/theano/compile/ops.py", line 555, in perform
outs = self.__fn(*inputs)
File "/code.py", line 60, in func
Q = lv(tt, k1=al, k2=be)
File "/code.py", line 42, in lv
Y, dict = odeint(equat,y0,xdata,full_output=True)
File "/anaconda3/lib/python3.7/site-packages/scipy/integrate/odepack.py", line 233, in odeint
int(bool(tfirst)))
File "/code.py", line 39, in equat
da1_dt = -k2*c[1]
IndexError: index 1 is out of bounds for axis 0 with size 1
I'm going nuts here. :( I don't even know if I am on the right path.
Edit: I corrected that, but now it shows another error.
If anyone else has difficulty here, I solved it!
from scipy.integrate import odeint, solve_ivp
import numpy as np
import matplotlib.pyplot as plt
from theano.compile.ops import as_op
import theano.tensor as T
import pymc3 as pm
import copy
from sys import exit

time = 10
Nt = 11
tt = np.linspace(0, time, Nt+1)
y0 = [1, 2, 0]
k1, k2 = 1, 1

def real_equat(t, c):
    da_dt = -k1*c[0]
    da1_dt = -k2*c[1]
    db_dt = k1*c[0] + k2*c[1]
    return da_dt, da1_dt, db_dt

z = solve_ivp(real_equat, t_span=[0, time], t_eval=tt, y0=y0)

def lv(xdata, k1=k1, k2=k2):
    def equat(c, t):
        da_dt = -k1*c[0]
        da1_dt = -k2*c[1]
        db_dt = k1*c[0] + k2*c[1]
        return da_dt, da1_dt, db_dt
    Y, dict = odeint(equat, y0, tt, full_output=True)
    return Y

a = z.y
ydata = copy.copy(a)
yerror = 10*np.random.rand(Nt+1)
ydata += np.random.normal(0.0, np.sqrt(yerror))
ydata = np.ravel(ydata)

@as_op(itypes=[T.dscalar, T.dscalar], otypes=[T.dvector])
def func(al, be):
    Q = lv(tt, k1=al, k2=be)
    return np.ravel(Q)

niter = 10
model = pm.Model()
with model:
    # Priors for unknown model parameters
    k1 = pm.Uniform('k1', upper=1.2, lower=0.8)
    k2 = pm.Uniform('k2', upper=1.2, lower=0.8)
    # Expected value of outcome
    mu = func(k1, k2)
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=0.2, observed=ydata)
    trace = pm.sample(niter=niter, nchains=4)
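As a short follow-up sketch (my addition, assuming the sampling above produced a trace), the posterior of k1 and k2 can be inspected directly; the k1/k2 trade-off the question asks about would show up as a negative correlation between the two chains of samples.

import numpy as np
import pymc3 as pm

print(pm.summary(trace))                       # posterior means and credible intervals for k1 and k2
print(np.corrcoef(trace['k1'], trace['k2']))   # negative off-diagonal entries indicate the k1/k2 trade-off
pm.traceplot(trace)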

Linear regression with gradient descent not giving correct results

Linear regression with gradient descent is giving a different result on the same dataset compared to sklearn.
I want to know why that is. Is it a problem of local minima?
The dataset is as follows
ht wt
63 127
64 121
66 142
69 157
69 162
71 156
71 169
72 165
73 181
75 208
Sklearn computes the intercept as -266.53439537 and the coefficient as 6.13758146,
whereas gradient descent gives the intercept as -1.49087014 and the coefficient as 2.3239637.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

def cost(m, b, data_size):
    x = IN
    y = OUT
    totalError = 0
    for i in range(data_size):
        x = IN[i]
        y = OUT[i]
        totalError += ((m*x + b) - y) ** 2
    return totalError / float(data_size)

def compute_gradient(X, Y, theta_1, theta_0, N, learning_rate):
    gradient_theta_0 = 0
    gradient_theta_1 = 0
    #print (X.shape, Y.shape, N)
    Y_pred = theta_1*X + theta_0
    gradient_theta_1 = ((-2/N) * sum(X * (Y - Y_pred)))
    gradient_theta_0 = ((-2/N) * sum(Y - Y_pred))
    #print (gradient_theta_0, gradient_theta_1, gradient_theta_0 * learning_rate, gradient_theta_1 * learning_rate)
    new_theta_0 = theta_0 - (gradient_theta_0 * learning_rate)
    new_theta_1 = theta_1 - (gradient_theta_1 * learning_rate)
    return (new_theta_1, new_theta_0)

IN = np.array([63, 64, 66, 69, 69, 71, 71, 72, 73, 75])
OUT = np.array([127, 121, 142, 157, 162, 156, 169, 165, 181, 208])
X = IN[:, np.newaxis]
Y = OUT[:, np.newaxis]
data_size = len(IN)  # number of samples
iterations = 10000
initial_theta_0 = 0
initial_theta_1 = 0
learning_rate = 0.00001
theta_0 = initial_theta_0
theta_1 = initial_theta_1
fig, ax = plt.subplots(figsize=(12, 8))
cost_history = []
for i in range(iterations):
    #print ("iteration {} m {} b {}".format(i, theta_1, theta_0))
    [theta_1, theta_0] = compute_gradient(X, Y, theta_1, theta_0, data_size, learning_rate)
    totalError = cost(theta_1, theta_0, data_size)
    #print (totalError)
    cost_history.append(totalError)
ax.plot(range(iterations), cost_history, 'b.')
print("iteration {} m {} b {}".format(i, theta_1, theta_0))
reg_line = [(theta_1 * x) + theta_0 for x in IN]
lm = LinearRegression()
lm.fit(X, Y)
print("SKLEARN coeff {}".format(lm.coef_))
print("SKLEARN intercept {}".format(lm.intercept_))
#reg_line = [(lm.coef_[0] * x) + lm.intercept_ for x in IN]
ax.plot(IN, reg_line, color='red')
plt.show()
print("SKLEARN coeff {}".format(lm.coef_))
print("SKLEARN intercept {}".format(lm.intercept_))
RESULTS
iteration 99999 m [2.3239637] b [-1.49087014]
SKLEARN coeff [[6.13758146]]
SKLEARN intercept [-266.53439537]
You have taken bad initial conditions (0, 0), and with such a small learning rate gradient descent barely moves away from that point (the least-squares cost is convex, so this is very slow convergence rather than a true local minimum). More intuitive initial conditions are based on the maxima and minima of ht and wt, i.e.
initial_theta_0 = np.min(Y)+np.min(X)*(np.min(Y)-np.max(Y))/(np.max(X)-np.min(X)) #-335.75
initial_theta_1 = (np.max(Y)-np.min(Y))/(np.max(X)-np.min(X)) # 7.25
#initial_theta_0 = 121+63*(121-208)/(75-63) # -335.75
#initial_theta_1 = (208-121)/(75-63) # 7.25
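A short usage sketch (my addition, reusing compute_gradient, X, Y, data_size, learning_rate and iterations from the question) showing the suggested starting point plugged into the same loop:

theta_0 = initial_theta_0   # -335.75, from the min/max heuristic above
theta_1 = initial_theta_1   # 7.25, from the min/max heuristic above
for i in range(iterations):
    theta_1, theta_0 = compute_gradient(X, Y, theta_1, theta_0, data_size, learning_rate)
print("m {} b {}".format(theta_1, theta_0))   # stays near the least-squares line instead of crawling up from (0, 0)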

Trying to use Caffe classifier causes "sequence argument must have length equal to input rank" error

I am trying to use the Caffe.Classifier class and its predict() method on my ImageNet-trained caffemodel.
Images were resized to 256x256, and crops of 227x227 were used to train the net.
Everything is simple and straightforward, yet I keep getting weird errors such as the following:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-3b440ebf1f6e> in <module>()
17 image_dims=(256, 256))
18
---> 19 out = net.predict([image_caffe], oversample=True)
20 print(labels[out[0].argmax()].strip(),' (', out[0][out[0].argmax()] , ')')
21 plabel = int(labels[out[0].argmax()].strip())
<ipython-input-5-e6ae1810b820> in predict(self, inputs, oversample)
65 for ix, in_ in enumerate(inputs):
66 print('image dims = ',self.image_dims[0],',',self.image_dims[1] ,'_in = ',in_.shape)
---> 67 input_[ix] = caffe.io.resize_image(in_, self.image_dims)
68
69 if oversample:
C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\caffe\io.py in resize_image(im, new_dims, interp_order)
335 # ndimage interpolates anything but more slowly.
336 scale = tuple(np.array(new_dims, dtype=float) / np.array(im.shape[:2]))
--> 337 resized_im = zoom(im, scale + (1,), order=interp_order)
338 return resized_im.astype(np.float32)
339
C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\scipy\ndimage\interpolation.py in zoom(input, zoom, output, order, mode, cval, prefilter)
588 else:
589 filtered = input
--> 590 zoom = _ni_support._normalize_sequence(zoom, input.ndim)
591 output_shape = tuple(
592 [int(round(ii * jj)) for ii, jj in zip(input.shape, zoom)])
C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\scipy\ndimage\_ni_support.py in _normalize_sequence(input, rank, array_type)
63 if len(normalized) != rank:
64 err = "sequence argument must have length equal to input rank"
---> 65 raise RuntimeError(err)
66 else:
67 normalized = [input] * rank
RuntimeError: sequence argument must have length equal to input rank
And here are the snippets of code I'm using:
import sys
import caffe
import numpy as np
import lmdb
import matplotlib.pyplot as plt
import itertools

def flat_shape(x):
    "Returns x without singleton dimension, eg: (1,28,28) -> (28,28)"
    return x.reshape(x.shape)

def db_reader(fpath, type='lmdb'):
    if type == 'lmdb':
        return lmdb_reader(fpath)
    else:
        return leveldb_reader(fpath)

def lmdb_reader(fpath):
    import lmdb
    lmdb_env = lmdb.open(fpath)
    lmdb_txn = lmdb_env.begin()
    lmdb_cursor = lmdb_txn.cursor()
    for key, value in lmdb_cursor:
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        label = int(datum.label)
        image = caffe.io.datum_to_array(datum).astype(np.uint8)
        yield (key, flat_shape(image), label)

def leveldb_reader(fpath):
    import leveldb
    db = leveldb.LevelDB(fpath)
    for key, value in db.RangeIter():
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        label = int(datum.label)
        image = caffe.io.datum_to_array(datum).astype(np.uint8)
        yield (key, flat_shape(image), label)
Classifier class (copied from Caffe's python directory):
import numpy as np
import caffe

class Classifier(caffe.Net):
    """
    Classifier extends Net for image class prediction
    by scaling, center cropping, or oversampling.
    Parameters
    ----------
    image_dims : dimensions to scale input for cropping/sampling.
        Default is to scale to net input size for whole-image crop.
    mean, input_scale, raw_scale, channel_swap: params for
        preprocessing options.
    """
    def __init__(self, model_file, pretrained_file, image_dims=None,
                 mean=None, input_scale=None, raw_scale=None,
                 channel_swap=None):
        caffe.Net.__init__(self, model_file, pretrained_file, caffe.TEST)
        # configure pre-processing
        in_ = self.inputs[0]
        print('inputs[0]', self.inputs[0])
        self.transformer = caffe.io.Transformer(
            {in_: self.blobs[in_].data.shape})
        self.transformer.set_transpose(in_, (2, 0, 1))
        if mean is not None:
            self.transformer.set_mean(in_, mean)
        if input_scale is not None:
            self.transformer.set_input_scale(in_, input_scale)
        if raw_scale is not None:
            self.transformer.set_raw_scale(in_, raw_scale)
        if channel_swap is not None:
            self.transformer.set_channel_swap(in_, channel_swap)
        print('crops: ', self.blobs[in_].data.shape[2:])
        self.crop_dims = np.array(self.blobs[in_].data.shape[2:])
        if not image_dims:
            image_dims = self.crop_dims
        self.image_dims = image_dims

    def predict(self, inputs, oversample=True):
        """
        Predict classification probabilities of inputs.
        Parameters
        ----------
        inputs : iterable of (H x W x K) input ndarrays.
        oversample : boolean
            average predictions across center, corners, and mirrors
            when True (default). Center-only prediction when False.
        Returns
        -------
        predictions: (N x C) ndarray of class probabilities for N images and C
            classes.
        """
        # Scale to standardize input dimensions.
        input_ = np.zeros((len(inputs),
                           self.image_dims[0],
                           self.image_dims[1],
                           inputs[0].shape[2]),
                          dtype=np.float32)
        for ix, in_ in enumerate(inputs):
            print('image dims = ', self.image_dims[0], ',', self.image_dims[1], '_in = ', in_.shape)
            input_[ix] = caffe.io.resize_image(in_, self.image_dims)
        if oversample:
            # Generate center, corner, and mirrored crops.
            input_ = caffe.io.oversample(input_, self.crop_dims)
        else:
            # Take center crop.
            center = np.array(self.image_dims) / 2.0
            crop = np.tile(center, (1, 2))[0] + np.concatenate([
                -self.crop_dims / 2.0,
                self.crop_dims / 2.0
            ])
            input_ = input_[:, crop[0]:crop[2], crop[1]:crop[3], :]
        # Classify
        caffe_in = np.zeros(np.array(input_.shape)[[0, 3, 1, 2]],
                            dtype=np.float32)
        for ix, in_ in enumerate(input_):
            caffe_in[ix] = self.transformer.preprocess(self.inputs[0], in_)
        out = self.forward_all(**{self.inputs[0]: caffe_in})
        predictions = out[self.outputs[0]]
        # For oversampling, average predictions across crops.
        if oversample:
            predictions = predictions.reshape((len(predictions) / 10, 10, -1))
            predictions = predictions.mean(1)
        return predictions
Main section :
proto = 'deploy.prototxt'
model = 'snap1.caffemodel'
mean = 'imagenet_mean.binaryproto'
db_path = 'G:/imagenet/ilsvrc12_val_lmdb'

# Extract mean from the mean image file
#mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
#f = open(mean, 'rb')
#mean_blobproto_new.ParseFromString(f.read())
#mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
#f.close()
mu = np.load('mean.npy').mean(1).mean(1)

caffe.set_mode_gpu()
reader = lmdb_reader(db_path)

i = 0
for i, image, label in reader:
    image_caffe = image.reshape(1, *image.shape)
    print(image_caffe.shape, mu.shape)
    net = Classifier(proto, model,
                     mean=mu,
                     channel_swap=(2, 1, 0),
                     raw_scale=255,
                     image_dims=(256, 256))
    out = net.predict([image_caffe], oversample=True)
    print(i, labels[out[0].argmax()].strip(), ' (', out[0][out[0].argmax()], ')')
    i += 1
What is wrong here?
I found the cause: I had to feed the image in the form of a 3D tensor, not a 4D one!
So our 4D tensor:
image_caffe = image.reshape(1, *image.shape)
needed to be changed to a 3D one:
image_caffe = image.transpose(2,1,0)
As a side note, try using Python 2 for running anything Caffe related. Python 3 might work at first but will definitely cause a lot of headaches; for instance, the predict method with oversample set to True will crash under Python 3 but works just fine under Python 2!
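For clarity, a rough sketch of how the main loop changes under that fix (same reader, net and labels as in the question; the exact transpose order depends on how caffe.io.datum_to_array laid out the channels):

for i, image, label in reader:
    image_caffe = image.transpose(2, 1, 0)              # 3D (H x W x K)-style array instead of the 4D reshape
    out = net.predict([image_caffe], oversample=True)
    print(i, labels[out[0].argmax()].strip(), '(', out[0][out[0].argmax()], ')')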
