I want to make predictions from a simple time series. The observations are y = [11, 22, 33, 44, 55, 66, 77, 88, 99, 110] at times x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. I am using epsilon-SVR from the libsvm toolbox. My code is as follows:
x1 = (1:7)';                          % training set
y1 = [11, 22, 33, 44, 55, 66, 77]';   % observations from the time series
options = ' -s 3 -t 2 -c 100 -g 0.05 -p 0.0003 ';
model = svmtrain(y1, x1, options)
x2 = (8:10)';                         % test set
y2 = [88, 99, 110]';                  % hidden values that are not used for training
[y2_predicted, accuracy] = svmpredict(y2, x2, model)
But svmpredict is giving me empty output, as shown below:
y2_predicted =
[]
accuracy =
[]
The reason you're not getting output predictions is that you are calling svmpredict incorrectly. There are two ways to call it:
[predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options')
[predicted_label] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options')
That is, with either one output argument or three, but not two. So to fix your problem, you can do:
[y2_pred, accuracy, ~] = svmpredict(y2, x2, model)
if you don't care about the decision values. If you do, then
[y2_pred, accuracy, decision_values] = svmpredict(y2, x2, model)
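Side note: if you ever call libsvm from Python instead of MATLAB, its svmutil bindings always return all three outputs, so this pitfall doesn't arise. A minimal sketch with the same data, assuming the libsvm pip package is installed:

from libsvm.svmutil import svm_train, svm_predict

x1 = [[t] for t in range(1, 8)]    # training inputs, one feature per sample
y1 = [11, 22, 33, 44, 55, 66, 77]  # training targets
model = svm_train(y1, x1, '-s 3 -t 2 -c 100 -g 0.05 -p 0.0003')

x2 = [[t] for t in range(8, 11)]   # test inputs
y2 = [88, 99, 110]                 # held-out targets
y2_pred, accuracy, decision_values = svm_predict(y2, x2, model)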
I'm trying to run a forecast on data that appears to have two components: a curve that looks like exponential decay, with a seasonal overlay on top. The attached image shows a sample of the simulated data.
What I've done so far is the following.
I can do a Gaussian process regression using the model below. It tries to find a switch point and fits a linear and a seasonal component before/after that point. This would have worked by itself with the available data, but in my real data I suspect there is an exponential-like trend.
switchpoint = pm.DiscreteUniform("switchpoint", lower=40, upper=60, testval=50)
ls_1 = pm.Gamma(name="ls_1", alpha=1.0, beta=0.5)
ls_2 = pm.Gamma(name="ls_2", alpha=1.0, beta=0.5)
period_1 = pm.Gamma(name="period_1", alpha=12, beta=2)
period_2 = pm.Gamma(name="period_2", alpha=12, beta=2)
ls_switched = pm.math.switch(switchpoint < x_switch, ls_1, ls_2)
period_switched = pm.math.switch(switchpoint < x_switch, period_1, period_2)
gp_1 = pm.gp.Marginal(
    cov_func=pm.gp.cov.Periodic(input_dim=1, period=period_switched, ls=ls_switched)
)
# Linear trend.
c_31 = pm.Normal(name="c_31", mu=0, sigma=2)
c_32 = pm.Normal(name="c_32", mu=0, sigma=2)
c_switched = pm.math.switch(switchpoint < x_switch, c_31, c_32)
gp_3 = pm.gp.Marginal(cov_func=pm.gp.cov.Linear(1, c=c_switched))
# Define the Gaussian process as the sum of the two components.
gp = gp_1 + gp_3
# Noise.
sigma = pm.HalfNormal("sigma", sigma=2)
# Likelihood.
y_pred = gp.marginal_likelihood(
    "y_pred",
    X=x_train.reshape(n_train, 1),
    y=y_train.reshape(n_train, 1).flatten(),
    noise=sigma,
)
I want to overlay that with the following model, which fits an exponential curve to the data.
amp = pm.Uniform("amp", 0.05, 0.4)
size = pm.Uniform("size", 0.5, 2.5)
ps = pm.Normal("ps", 0.13, 40)
x_pred = np.linspace(0, 70, 1)
z = pm.Deterministic(
    "z",
    amp
    * np.exp(
        -1
        * (np.pi**2 * size * x_pred / (3600.0 * 180.0)) ** 2
        / (4.0 * np.log(2.0))
    )
    + ps,
)
y = pm.Normal("y", mu=z + gp, tau=1.0, observed=y_act)
Basically, my generative process is something that decays like an exponential function but has seasonality overlaid on it. This is the point where I'm stuck: how do I tell pymc3 that I want to sample from an overlay of the two processes?
It gives the following error:
Traceback (most recent call last):
File "./t.py", line 60, in <module>
y = pm.Normal("y", mu=z + gp, tau=1.0, observed=y_act)
TypeError: unsupported operand type(s) for +: 'DeterministicWrapper' and 'Marginal'
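(The TypeError is raised because gp is a pm.gp.Marginal object, not a tensor, so it cannot be added to the Deterministic z. As a purely illustrative sketch, not a verified answer to the question: pymc3 lets a deterministic trend enter a GP through a custom mean function, by subclassing pm.gp.mean.Mean; the class name ExpTrend below is made up.)

import numpy as np
import pymc3 as pm

# Hypothetical sketch: wrap the exponential trend from the question as a GP
# mean function, so it enters pm.gp.Marginal via mean_func instead of being
# added to the GP object itself.
class ExpTrend(pm.gp.mean.Mean):
    def __init__(self, amp, size, ps):
        self.amp = amp
        self.size = size
        self.ps = ps

    def __call__(self, X):
        x = X[:, 0]
        return (self.amp
                * pm.math.exp(-1 * (np.pi**2 * self.size * x / (3600.0 * 180.0)) ** 2
                              / (4.0 * np.log(2.0)))
                + self.ps)

# e.g., inside the model block:
# gp_1 = pm.gp.Marginal(mean_func=ExpTrend(amp, size, ps),
#                       cov_func=pm.gp.cov.Periodic(input_dim=1,
#                                                   period=period_switched,
#                                                   ls=ls_switched))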
I'm using MulticlassClassificationEvaluator to retrieve metrics like F1 score or accuracy in a cross-validation in PySpark:
cv = CrossValidator(estimator=RandomForestClassifier(),
                    estimatorParamMaps=ParamGridBuilder().build(),
                    evaluator=MulticlassClassificationEvaluator(metricName='f1'),
                    numFolds=5,
                    parallelism=-1)
cross_result = cv.fit(df)  # df: the training DataFrame (not shown here)
f1_score = cross_result.avgMetrics[0]
Now, my question is: why is avgMetrics a list if it only has one value? Shouldn't it be a scalar? Am I missing something about this attribute?
Following the source code, I realized that avgMetrics is a list containing, for each parameter combination defined in the ParamGrid, the metric averaged over all the cross-validation folds. So:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
dataset = spark.createDataFrame(
    [(Vectors.dense([0.0]), 0.0),
     (Vectors.dense([0.6]), 1.0),
     (Vectors.dense([1.0]), 1.0)] * 10,
    ["features", "label"])
lr = LogisticRegression()
# Note that there are three values for maxIter: 0, 1 and 5
grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
evaluator = MulticlassClassificationEvaluator(metricName='accuracy')
cv = CrossValidator(
    estimator=lr,
    estimatorParamMaps=grid,
    evaluator=evaluator,
    parallelism=2
)
cvModel = cv.fit(dataset)
cvModel.avgMetrics[0]  # Average accuracy for maxIter = 0
cvModel.avgMetrics[1]  # Average accuracy for maxIter = 1
cvModel.avgMetrics[2]  # Average accuracy for maxIter = 5
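In your case, ParamGridBuilder().build() produces a single (empty) parameter combination, hence the one-element list. With a non-trivial grid you can pair each combination with its score; a short sketch (grid and cvModel as defined above):

# Pair every grid combination with its cross-validated average metric
for params, metric in zip(grid, cvModel.avgMetrics):
    print({p.name: v for p, v in params.items()}, "->", metric)

# cvModel.bestModel is the estimator refit on the whole dataset
# with the best-scoring parameter combination
print(cvModel.bestModel)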
I've been building a Siamese neural network using PyTorch, but so far I've only tested it by feeding in 2 pictures and calculating the similarity score, where 0 means the pictures are different and 1 means they are the same.
import numpy as np
import os, sys
import torch
from PIL import Image

dir_name = "/Users/tania/Desktop/Aksara/Compare"  # this should contain 26 images only
X = []
for i in os.listdir(dir_name):
    if ".PNG" in i:
        X.append(torch.from_numpy(np.array(Image.open("./Compare/" + i))))

x1 = np.array(Image.open("/Users/tania/Desktop/Aksara/TEST/Ba/B/B.PNG"))
x1 = transforms(x1)  # transforms: the preprocessing pipeline defined elsewhere
x1 = torch.from_numpy(x1)
#x1 = torch.stack([x1])

closest = 0.0           # highest similarity
closest_letter_idx = 0  # index of closest letter 0=A, 1=B, ...
cnt = 0
for i in X:
    output = model(x1, i)  # assuming x1 is your input image
    output = torch.sigmoid(output)
    if output > closest:
        closest_letter_idx = cnt
        closest = output
    cnt += 1
Both pictures are different. Running the script, however, gives the following error:
File "test.py", line 83, in <module>
X.append(torch.from_numpy(Image.open("./Compare/" + i)))
TypeError: expected np.ndarray (got PngImageFile)
This is the directory.
Yes, there is a way: you could use the softmax function:
output = torch.softmax(output, dim=-1)
This returns a tensor of 26 values, each corresponding to the probability that the image corresponds to each of the 26 classes. Hence, the tensor sums to 1 (100%).
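For instance, a quick sketch of that property with made-up logits for 3 classes:

import torch

logits = torch.tensor([2.0, 1.0, 0.5])   # hypothetical raw scores
probs = torch.softmax(logits, dim=-1)
print(probs)        # approximately tensor([0.6285, 0.2312, 0.1402])
print(probs.sum())  # tensor(1.)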
However, this method is suited to classification tasks, as opposed to Siamese networks. Siamese networks compare between inputs instead of sorting inputs into classes. From your question, it seems you're trying to compare 1 picture with 26 others. You can loop over all 26 samples, compute and save the similarity score for each, and output the index of the maximum value (that is, if you don't want to modify your model):
dir_name = '/Aksara/Compare'  # this should contain 26 images only
X = []
for i in os.listdir(dir_name):
    if ".PNG" in i:
        X.append(torch.from_numpy(np.array(Image.open("./Compare/" + i))))

x1 = np.array(Image.open("test.PNG"))
# do your transformations on x1
x1 = torch.from_numpy(x1)

closest = 0.0           # highest similarity
closest_letter_idx = 0  # index of closest letter 0=A, 1=B, ...
cnt = 0
for i in X:
    output = model(x1, i)  # assuming x1 is your input image
    output = torch.sigmoid(output)
    if output > closest:
        closest_letter_idx = cnt
        closest = output
    cnt += 1
print(closest_letter_idx)
I recently tried to implement a vanilla RNN from scratch. I implemented everything and even ran a seemingly OK example, yet the gradient check does not pass: only some parts (specifically the weight and bias for the output) pass it, while the other weights (Whh, Whx) don't.
I followed Karpathy/Coursera's implementation and made sure everything was in place. Yet the Karpathy/Coursera code passes the gradient check and mine doesn't. I have no clue at this point what is causing this.
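(For context, the check in question compares analytic gradients against centered-difference estimates. A generic sketch of such a checker, my own version rather than the Coursera helper:)

import numpy as np

def numeric_grad(f, w, eps=1e-5):
    """Centered-difference estimate of the gradient of the scalar loss f() w.r.t. w."""
    grad = np.zeros_like(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = w[idx]
        w[idx] = orig + eps
        f_plus = f()
        w[idx] = orig - eps
        f_minus = f()
        w[idx] = orig  # restore
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

def rel_error(analytic, numeric):
    # ~1e-7 or better is a pass; ~1e-2 or worse usually means a buggy backward pass
    return np.max(np.abs(analytic - numeric)
                  / np.maximum(1e-8, np.abs(analytic) + np.abs(numeric)))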
Here are the snippets responsible for the backward pass in the original code:
def rnn_step_backward(dy, gradients, parameters, x, a, a_prev):
    gradients['dWya'] += np.dot(dy, a.T)
    gradients['dby'] += dy
    da = np.dot(parameters['Wya'].T, dy) + gradients['da_next']  # backprop into h
    daraw = (1 - a * a) * da  # backprop through tanh nonlinearity
    gradients['db'] += daraw
    gradients['dWax'] += np.dot(daraw, x.T)
    gradients['dWaa'] += np.dot(daraw, a_prev.T)
    gradients['da_next'] = np.dot(parameters['Waa'].T, daraw)
    return gradients
def rnn_backward(X, Y, parameters, cache):
    # Initialize gradients as an empty dictionary
    gradients = {}
    # Retrieve from cache and parameters
    (y_hat, a, x) = cache
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    # each one should be initialized to zeros of the same dimension as its corresponding parameter
    gradients['dWax'], gradients['dWaa'], gradients['dWya'] = np.zeros_like(Wax), np.zeros_like(Waa), np.zeros_like(Wya)
    gradients['db'], gradients['dby'] = np.zeros_like(b), np.zeros_like(by)
    gradients['da_next'] = np.zeros_like(a[0])
    ### START CODE HERE ###
    # Backpropagate through time
    for t in reversed(range(len(X))):
        dy = np.copy(y_hat[t])
        # this means: subtract the correct answer from the predicted value
        # (1 is subtracted at the index specified by Y[t])
        dy[Y[t]] -= 1
        gradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t-1])
    ### END CODE HERE ###
    return gradients, a
And this is my implementation:
def rnn_cell_backward(self, xt, h, h_prev, output, true_label, dh_next):
    """
    Runs a single backward pass once.
    Inputs:
    - xt: The input data of shape (Batch_size, input_dim_size)
    - h: The next hidden state at timestep t (which comes from the forward pass)
    - h_prev: The previous hidden state at timestep t-1
    - output: The output at the current timestep
    - true_label: The label for the current timestep, used for calculating loss
    - dh_next: The gradient of hidden state h (dh), which in the beginning
        is zero and is updated as we go backward in the backpropagation.
        The dh for the next round comes from 'dh_prev', as we will see shortly!
        Just remember the backward pass is essentially a loop! We start at the
        end and traverse back to the beginning!
    Returns:
    - dW1: The gradient for W1
    - dW2: The gradient for W2
    - dW3: The gradient for W3
    - dbh: The gradient for bh
    - dbo: The gradient for bo
    - dh_prev: The gradient for the previous hidden state at timestep t-1. This
        will be used as the next dh for the next round of backpropagation.
    - per_ts_loss: The loss for the current timestep.
    """
    e = np.copy(output)
    # correct idx for each row (sample)!
    idxs = np.argmax(true_label, axis=1)
    # number of rows (samples) in our batch
    rows = np.arange(e.shape[0])
    # This is the vectorized version of error_t = output_t - label_t, or simply
    # e = output[t] - 1, where t refers to the index at which the label is 1.
    e[rows, idxs] -= 1
    # This is used for our loss, to see how well we are doing during training.
    per_ts_loss = output[rows, idxs].sum()
    # must have the shape of W3, which is (vocabsize_or_output_dim_size, hidden_state_size)
    dW3 = np.dot(e.T, h)
    # dbo = e.1; since we have a batch, we use np.sum
    dbo = np.sum(e, axis=0)
    # when calculating dh, we also add the dh from the next timestep;
    # at the last timestep, dh_next is initially zero.
    dh = np.dot(e, self.W3) + dh_next  # from the later cell
    # the input part
    dtanh = (1 - h * h) * dh
    # dbh = dtanh.1; we use sum, since we have a batch
    dbh = np.sum(dtanh, axis=0)
    # the gradient with respect to the input xt is actually not needed! We only
    # care about tunable parameters, so we are only after W1, W2, W3, dbh and dbo
    # dxt = np.dot(dtanh, W1.T)
    # must have the shape of (vocab_size, hidden_state_size)
    dW1 = np.dot(xt.T, dtanh)
    # compute the gradient with respect to W2
    dh_prev = np.dot(dtanh, self.W2)
    # shape must be (HiddenSize, HiddenSize)
    dW2 = np.dot(h_prev.T, dtanh)
    return dW1, dW2, dW3, dbh, dbo, dh_prev, per_ts_loss
def rnn_layer_backward(self, Xt, labels, H, O):
    """
    Runs a full backward pass on the given data and returns the gradients.
    Inputs:
    - Xt: The input data of shape (Batch_size, timesteps, input_dim_size)
    - labels: The labels for the input data
    - H: The hidden states for the current layer produced in the forward pass,
        of shape (Batch_size, timesteps, HiddenStateSize)
    - O: The output for the current layer of shape (Batch_size, timesteps, outputsize)
    Returns:
    - dW1: The gradient for W1
    - dW2: The gradient for W2
    - dW3: The gradient for W3
    - dbh: The gradient for bh
    - dbo: The gradient for bo
    - dh: The gradient for the hidden state at timestep t
    - loss: The current loss
    """
    dW1 = np.zeros_like(self.W1)
    dW2 = np.zeros_like(self.W2)
    dW3 = np.zeros_like(self.W3)
    dbh = np.zeros_like(self.bh)
    dbo = np.zeros_like(self.bo)
    dh_next = np.zeros_like(H[:, 0, :])
    hprev = None
    _, T_x, _ = Xt.shape
    loss = 0
    for t in reversed(range(T_x)):
        # this if-else block can be removed; for hprev, we can simply use
        # H[:, t-1, :] instead, but I also add this in case it makes a
        # difference! So far I have not seen any difference though!
        if t > 0:
            hprev = H[:, t - 1, :]
        else:
            hprev = np.zeros_like(H[:, 0, :])
        dw_1, dw_2, dw_3, db_h, db_o, dh_prev, e = self.rnn_cell_backward(Xt[:, t, :],
                                                                          H[:, t, :],
                                                                          hprev,
                                                                          O[:, t, :],
                                                                          labels[:, t, :],
                                                                          dh_next)
        dh_next = dh_prev
        dW1 += dw_1
        dW2 += dw_2
        dW3 += dw_3
        dbh += db_h
        dbo += db_o
        # Update the loss by subtracting the cross-entropy term of this timestep from it.
        loss -= np.log(e)
    return dW1, dW2, dW3, dbh, dbo, dh_next, loss
I have commented everything and provided a minimal example to demonstrate this here:
My code (doesn't pass gradient check)
And here is the implementation that I used as my guide. It is from Karpathy/Coursera and passes all the gradient checks: original code
At this point I have no idea why this is not working. I'm a beginner in Python, so this could be why I can't find the issue.
Two months later: I think I found the culprit! I should have changed the following line:
# compute the gradient with respect to W2
dh_prev = np.dot(dtanh, self.W2)
to
# compute the gradient with respect to W2
# note the transpose here!
dh_prev = np.dot(dtanh, self.W2.T)
When I was initially writing the backward pass, I only paid attention to the dimensions, and that made me make this mistake. This is actually an example of the feature-mixing that can happen with mindless/blind reshaping/transposing (or failing to transpose when needed!).
To see what has gone wrong here, let me give an example.
Suppose we have a matrix of people's features, where each row is dedicated to one person; our matrix would then look like this:
Features  | Age | height(cm) | weight(kg) |
matrix =  | 20  | 185        | 75         |
          | 85  | 155        | 95         |
          | 40  | 205        | 120        |
Now if we make this into a numpy array we will have the following :
m = np.array([[20, 185, 75],
[85, 155, 95],
[40, 205, 120]])
A simple 3x3 array, right?
Now, the way we interpret our matrix is very important: each row and each column has a specific meaning. Each person is described by a row, and each column holds a specific feature.
So you see there is a "structure" in the matrix we use to represent our data.
In other words, each data item is represented as a row, and each column specifies a single feature. When multiplying by another matrix, this semantics must be respected: when two matrices are multiplied, each data row must keep this semantics.
Let's look at an example to make this clearer:
suppose we have two matrices :
m1 = np.array([[20, 185, 75],
[85, 155, 95],
[40, 205, 120]])
m2 = np.array([[0.9, 0.8, 0.85],
[0.1, 0.5, 0.4],
[0.6, 0.9, 0.8]])
These two matrices contain data arranged in rows, so multiplying them produces the correct answer. However, altering the order of the data, with a transpose for example, destroys the semantics, and we end up multiplying unrelated data!
In my case I needed to transpose the second matrix to make the order right for the operation at hand, and that fixed the gradient check.
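To make the trap concrete, here is a minimal numpy sketch (made-up shapes; the forward pass is assumed to be h = tanh(xt·W1 + h_prev·W2 + bh), matching the code above). Because W2 is square, both products have the right shape, but only one is the correct gradient:

import numpy as np

batch, hidden = 4, 5
rng = np.random.default_rng(0)
W2 = rng.standard_normal((hidden, hidden))    # square: shapes can't catch the bug
dtanh = rng.standard_normal((batch, hidden))  # upstream gradient dL/d(pre-activation)

# forward contribution is h_prev @ W2, so by the chain rule dL/dh_prev = dtanh @ W2.T
dh_prev_right = np.dot(dtanh, W2.T)
dh_prev_wrong = np.dot(dtanh, W2)   # same shape, wrong values

print(dh_prev_right.shape == dh_prev_wrong.shape)  # True
print(np.allclose(dh_prev_right, dh_prev_wrong))   # False (almost surely, for random W2)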
When I run the following script, I run into a couple of errors:
import tensorflow as tf
import numpy as np
import seaborn as sns
import random

# set random seed:
random.seed(42)

def potential(N):
    points = np.random.rand(N, 2) * 10
    values = np.array([np.exp((points[i][0] - 5.0)**2 + (points[i][1] - 5.0)**2) for i in range(N)])
    return points, values

def init_weights(shape, var_name):
    """
    Xavier initialisation of neural networks
    """
    init = tf.contrib.layers.xavier_initializer()
    return tf.get_variable(initializer=init, name=var_name, shape=shape)

def neural_net(X):
    with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        w_h = init_weights([2, 10], "w_h")
        w_h2 = init_weights([10, 10], "w_h2")
        w_o = init_weights([10, 1], "w_o")
        ### bias terms:
        bias_1 = init_weights([10], "bias_1")
        bias_2 = init_weights([10], "bias_2")
        bias_3 = init_weights([1], "bias_3")
        h = tf.nn.relu(tf.add(tf.matmul(X, w_h), bias_1))
        h2 = tf.nn.relu(tf.add(tf.matmul(h, w_h2), bias_2))
        return tf.nn.relu(tf.add(tf.matmul(h2, w_o), bias_3))

X = tf.placeholder(tf.float32, [None, 2])

with tf.Session() as sess:
    model = neural_net(X)
    ## define optimizer:
    opt = tf.train.AdagradOptimizer(0.0001)
    values = tf.placeholder(tf.float32, [None, 1])
    squared_loss = tf.reduce_mean(tf.square(model - values))
    ## define model variables:
    model_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "model")
    train_model = opt.minimize(squared_loss, var_list=model_vars)
    sess.run(tf.global_variables_initializer())
    for i in range(10):
        points, val = potential(100)
        train_feed = {X: points, values: val.reshape((100, 1))}
        sess.run(train_model, feed_dict=train_feed)
        print(sess.run(model, feed_dict={X: points}))
    ### plot the approximating model:
    res = 0.1
    xy = np.mgrid[0:10:res, 0:10:res].reshape(2, -1).T
    values = sess.run(model, feed_dict={X: xy})
    sns.heatmap(values.reshape((int(10/res), int(10/res))), xticklabels=False, yticklabels=False)
On the first run I get:
[nan] [nan] [nan] [nan] [nan] [nan] [nan]]
Traceback (most recent call last):
  ...
  File "/Users/aidanrockea/anaconda/lib/python3.6/site-packages/seaborn/matrix.py", line 485, in heatmap
    yticklabels, mask)
  File "/Users/aidanrockea/anaconda/lib/python3.6/site-packages/seaborn/matrix.py", line 167, in __init__
    cmap, center, robust)
  File "/Users/aidanrockea/anaconda/lib/python3.6/site-packages/seaborn/matrix.py", line 206, in _determine_cmap_params
    vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
  File "/Users/aidanrockea/anaconda/lib/python3.6/site-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity
On the second run I have:
ValueError: Variable model/w_h/Adagrad/ already exists, disallowed.
Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
It's not clear to me why I get either of these errors. Furthermore, when I use:
for i in range(10):
    points, val = potential(10)
    train_feed = {X: points, values: val.reshape((10, 1))}
    sess.run(train_model, feed_dict=train_feed)
    print(sess.run(model, feed_dict={X: points}))
I find that on the first run, I sometimes get a network that has collapsed to the constant function with output 0. Right now my hunch is that this might simply be a numerics problem, but I might be wrong.
If so, it's a serious problem, as the model I have used here is very simple.
Right now my hunch is that this might simply be a numerics problem
Indeed: when running potential(100) I sometimes get values as large as 1e21. The largest points will dominate your loss function and will drive the network parameters.
Even when normalizing your target values, e.g. to unit variance, the problem of the largest values dominating the loss would still remain (look e.g. at plt.hist(np.log(potential(100)[1]), bins=100)).
If you can, try learning the log of val instead of val itself. Note, however, that you are then changing the assumption of the loss function from 'predictions follow a normal distribution around the target values' to 'log predictions follow a normal distribution around the log of the target values'.
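For what it's worth, a minimal sketch of that change in the question's own training loop (np.log is safe here because potential() returns values ≥ 1):

for i in range(10):
    points, val = potential(100)
    log_val = np.log(val)  # compresses the 1e21-scale range to roughly [0, 50]
    train_feed = {X: points, values: log_val.reshape((100, 1))}
    sess.run(train_model, feed_dict=train_feed)

# predictions are then estimates of log(val); undo with np.exp afterwards
log_pred = sess.run(model, feed_dict={X: points})
pred = np.exp(log_pred)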