I have data of the form
y1, x1
y2, x2
...
I am trying to train a neural network f such that, over a set of pairs (xi, xj), f(xi) - f(xj) is close to yi - yj, i.e. a pairwise loss.
You need to restructure the data: for each pair (xi, xj), the label is yi - yj. Since this is a regression problem, use MSE between f(xi) - f(xj) and that label as the loss function; that should get you what you want.
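To make the pair construction concrete, here is a minimal NumPy sketch. It is not a neural network: a linear map `f(x) = x @ w` stands in for the model, and the toy data, `true_w`, the learning rate and the iteration count are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): y depends linearly on x, plus a constant offset.
x = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w + 4.0

# Build every ordered pair (i, j) with i != j.
i_idx, j_idx = np.where(~np.eye(len(x), dtype=bool))
diff_x = x[i_idx] - x[j_idx]
pair_target = y[i_idx] - y[j_idx]        # label for the pair: yi - yj

# Linear stand-in for the network, trained with MSE on the differences.
w = np.zeros(3)
lr = 0.05
for _ in range(500):
    pred = diff_x @ w                    # f(xi) - f(xj) for a linear f
    grad = 2.0 * diff_x.T @ (pred - pair_target) / len(pred)
    w -= lr * grad

pairwise_mse = float(np.mean((diff_x @ w - pair_target) ** 2))
```

Note that the constant offset (4.0 here) cancels in every difference, so a pairwise loss can only pin down f up to an additive constant.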
Fitting a single polynomial to a bunch of data is pretty easy in Pytorch using an nn.Linear layer. I've included a trivial example at the end of this post. But suppose I have tons of data split into groups, and I want to fit a different polynomial to each group. As an example, find the particular quadratic coefficients that fit each column in this image:
In other words, I want to simultaneously find the coefficients for N polynomials of order n, given m data per set to be fit:
In the image above, there are m=80 points per dataset, and N=100 sets to fit.
This perfectly lends itself to tensor manipulation and Pytorch on a gpu should make this blindingly fast by fitting all N at once. Problem is, I'm having a terrible brain fart, and haven't been able to wrap my head around the right layer configuration. Basically I need N nn.Linear layers, each operating on its own dataset. If this were convolution, I'd use a depthwise layer...
Example network to fit one polynomial where X are the m x p abscissa data, y are the m ordinate data, and we want to find the p coefficients.
import torch

class polyfit(torch.nn.Module):
    def __init__(self, n=2):
        super(polyfit, self).__init__()
        # n input features (the polynomial terms), one output, no bias
        self.poly = torch.nn.Linear(n, 1, bias=False)
    def forward(self, x):
        print(x.shape, self.poly)
        return self.poly(x)

# X, y and n are assumed to be defined already; y should have shape [m, 1]
# so that it matches the model output in the loss.
model = polyfit(n)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for epoch in range(100): # or however I want to run the loops
    output = model(X)
    mse = loss(output, y)
    optimizer.zero_grad()
    mse.backward()
    optimizer.step()
Figured it out after thinking about my depthwise convolution comment. A Conv1d with a 3-tap kernel applied to a tensor with values [1,x,x**2] is a quadratic, the same as a Linear layer with 3 inputs. So, with n the polynomial order, the layer needs to be:
self.poly = torch.nn.Conv1d(N,N,n+1,bias=False,groups=N)
Just make sure the X and y tensors have dimensions [m, N, n+1] and [m, N, 1] respectively, so that the kernel of size n+1 spans the whole last axis and leaves one output per set.
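For anyone who wants to see what the grouped layer computes without running PyTorch, here is a NumPy stand-in (a sketch, not the Conv1d layer itself). Each of the N sets gets its own row of n+1 weights, which is the same parameter layout as a Conv1d with groups=N and kernel size n+1. The synthetic data, `true_coef`, the learning rate and the step count are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, order = 80, 100, 2                      # points per set, number of sets, polynomial order

x = rng.uniform(-1.0, 1.0, size=(m, N))       # abscissae, one column per data set
true_coef = rng.normal(size=(N, order + 1))   # hypothetical ground-truth coefficients
X = np.stack([x**k for k in range(order + 1)], axis=-1)   # [m, N, order+1] = [1, x, x^2]
y = np.einsum("mnk,nk->mn", X, true_coef)     # [m, N]

# One weight row per data set; all N fits are updated in the same step.
W = np.zeros((N, order + 1))
lr = 0.3
for _ in range(2000):
    resid = np.einsum("mnk,nk->mn", X, W) - y
    grad = 2.0 * np.einsum("mn,mnk->nk", resid, X) / m
    W -= lr * grad
```

Because the sets never share weights, the gradient for each set depends only on that set's own data, which is what makes the single batched update equivalent to N independent fits.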
I created three Convolutional Autoencoders with the same architecture to extract features from some images related to different types of trees.
My code is something like:
model1 = myAutoencoder()
model2 = myAutoencoder()
model3 = myAutoencoder()
opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.MeanSquaredError()
model1.compile(optimizer=opt, loss=loss)
model2.compile(optimizer=opt, loss=loss)
model3.compile(optimizer=opt, loss=loss)
Then I train:
#X1, X2, X3 are tensors of 64x64 RGB images: for example(100, 64,64,3)
model1.fit(X1, X1)
model2.fit(X2, X2)
model3.fit(X3, X3)
However, only the first model is learning, while the second and third are stuck at the same loss (the original post showed this in a figure).
Interestingly, if I swap the positions of let's say model1 and model2, like this:
model2.fit(X2, X2)
model1.fit(X1, X1)
model3.fit(X3, X3)
then only model 2 is learning and models 1 and 3 are stuck. I cannot figure out why...
edit: The actual training that I am doing is this:
import math

def scheduler(epoch, lr):
    if epoch < 50:
        return lr
    else:
        return lr * math.exp(-0.1)
model2.fit(X2, X2, epochs=100, callbacks=[LearningRateScheduler(scheduler)])
model1.fit(X1, X1, epochs=100, callbacks=[LearningRateScheduler(scheduler)])
model3.fit(X3, X3, epochs=100, callbacks=[LearningRateScheduler(scheduler)])
I figured out that if I delete the callbacks, the learning process is "normal". Is there a reason why the callbacks are interfering between models?
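One plausible explanation, assuming all three models really share the single `opt` instance as in the snippet above: a learning-rate scheduler callback reads the optimizer's current rate at the start of each epoch and writes the scheduled rate back, so the decayed rate left behind by one fit becomes the starting rate of the next. The pure-Python sketch below only imitates that bookkeeping; `SharedOptimizer` and `fit` are hypothetical stand-ins, not Keras classes.

```python
import math

def scheduler(epoch, lr):
    # Same rule as in the question: hold for 50 epochs, then decay every epoch.
    if epoch < 50:
        return lr
    return lr * math.exp(-0.1)

class SharedOptimizer:
    """Hypothetical stand-in for an optimizer object that stores the learning rate."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

def fit(optimizer, epochs):
    """Imitates the callback: read the current rate each epoch, write the new one back."""
    start_lr = optimizer.learning_rate
    for epoch in range(epochs):
        optimizer.learning_rate = scheduler(epoch, optimizer.learning_rate)
    return start_lr

opt = SharedOptimizer(learning_rate=0.001)
# Starting learning rate seen by each of three consecutive fits.
starts = [fit(opt, epochs=100) for _ in range(3)]
```

In this sketch the second fit starts at roughly 0.001 * exp(-5), about 6.7e-6, and the third far lower still, which would look like a stuck loss. If that is what is happening, giving each model its own optimizer instance (one `keras.optimizers.Adam(learning_rate=0.001)` per model) should avoid it.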
Suppose the following model :
import torch.nn as nn
class PGN(nn.Module):
    def __init__(self, input_size):
        super(PGN, self).__init__()
        self.linear = nn.Sequential(
            nn.Linear(in_features=input_size, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=1)
        )
    def forward(self, x):
        return self.linear(x)
I figure I have to modify the model to fit a 2-dimensional curve.
Is there a way to fit a Gaussian curve with mu=0 and sigma=0 using Pytorch? If so, can you show me?
In principle, a sufficiently large neural network can approximate any continuous function from inputs of any dimension to outputs of any dimension.
To fit a 2-dimensional curve, your network should be fed vectors of size 2, that is, the x and y coordinates. The output is a single value (size 1).
For training you must generate ground truth data, that is, a mapping between coordinates (x and y) and the value (z). The loss function should compare this ground truth value with the estimate of your network.
If it is just a tutorial to learn PyTorch and not a real application, you can define a function that, for a given x and y, outputs the Gaussian value according to your parameters.
Then during training you randomly choose an x and y, feed them to the network, and do backprop against the true value.
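The data-generation half of that recipe could look like the sketch below. It uses NumPy only; `gaussian_2d`, `sample_batch`, the sampling range and the batch size are illustrative names and values. It also assumes sigma=1, since a Gaussian with sigma=0 is degenerate.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_2d(xy, mu=(0.0, 0.0), sigma=1.0):
    """Unnormalized isotropic 2D Gaussian; mu and sigma are illustrative choices."""
    sq_dist = np.sum((xy - np.asarray(mu)) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def sample_batch(batch_size=64, low=-3.0, high=3.0):
    """Draw random (x, y) coordinates and the matching target values z."""
    xy = rng.uniform(low, high, size=(batch_size, 2))   # network input, shape [batch, 2]
    z = gaussian_2d(xy)[:, None]                        # ground truth, shape [batch, 1]
    return xy, z

inputs, targets = sample_batch()
```

Each training step would then draw a fresh batch, run the network on `inputs`, and compare its output with `targets` using a regression loss such as MSE.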
For a function y = a * exp(-(x - b)^2 / (2 * c^2)):
Evaluate this equation for some values of x (and a, b, c) to get the outputs y. This will be your training set, with the x values as inputs and the y values as output labels. Since this is not a linear equation, you will have to experiment with the number of layers and neurons and other settings, but it will give you a good enough approximation. For different values of a, b, c, generate the corresponding data, and maybe try different things such as adding a, b, c as inputs alongside x.
I am building a basic neural network (also my first) for handwritten digit recognition without any framework (such as TensorFlow or PyTorch), using the backpropagation algorithm.
My NN has 784 inputs and 10 outputs. So for the last layer, I have to use Softmax.
Because of some memory errors, I have right now my images in shape (300, 784) and my labels in shape (300, 10)
After that I am calculating loss from Categorical Cross-entropy.
Now we are getting to my problem. In backpropagation, I need to manually compute the first derivative of an activation function. I am doing it like this:
dAl = -(np.divide(Y, Al) - np.divide(1 - Y, 1 - Al))
#Y = test labels
#Al - Activation value from my last layer
And after that my Backpropagation can start, so the last layer is softmax.
def SoftmaxDerivative(dA, Z):
    #Z is an output from np.dot(A_prev, W) + b
    #Where A_prev is an activation value from previous layer
    #W is weight and b is bias
    #dA is the derivative of an activation function value
    x = activation_functions.softmax(dA)
    s = x.reshape(-1,1)
    dZ = np.diagflat(s) - np.dot(s, s.T)
    return dZ
1. Is this function working properly?
In the end, I would like to compute derivatives of weights and biases, So I am using this:
dW = (1/m)*np.dot(dZ, A_prev.T)
#m is A_prev.shape[1] -> 10
db = (1/m)*np.sum(dZ, axis = 1, keepdims = True)
BUT it fails on dW, because dZ.shape is (3000, 3000) (compare to A_prev.shape, which is (300,10))
So from this I assume that there are only 3 possible outcomes:
1. My Softmax backward is wrong
2. dW is wrong
3. I have some other bug completely somewhere else
Any help would be really appreciated!
I faced the same problem recently. I'm not sure but maybe this question will help you: Softmax derivative in NumPy approaches 0 (implementation)
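In case it helps, a common way around the full softmax Jacobian is to combine softmax with categorical cross-entropy, because the gradient with respect to Z then reduces to (A - Y) / m. Below is a minimal NumPy sketch of that shortcut with a finite-difference check. It assumes rows are samples (Z = A_prev @ W + b, as in the question's comment) and one-hot labels; the function names and the small sizes are made up for the check.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(shifted)
    return expZ / expZ.sum(axis=1, keepdims=True)

def cross_entropy(A, Y):
    """Mean categorical cross-entropy for one-hot labels Y."""
    return -np.mean(np.sum(Y * np.log(A), axis=1))

def output_layer_backward(A_prev, Z, Y):
    """Gradients for Z = A_prev @ W + b followed by softmax and cross-entropy."""
    m = A_prev.shape[0]               # batch size (rows are samples)
    dZ = (softmax(Z) - Y) / m         # combined softmax + cross-entropy gradient
    dW = A_prev.T @ dZ                # shape [n_prev, n_classes], same as W
    db = dZ.sum(axis=0, keepdims=True)
    return dW, db

rng = np.random.default_rng(0)
m, n_prev, n_classes = 5, 4, 3        # small hypothetical sizes for a quick check
A_prev = rng.normal(size=(m, n_prev))
W = rng.normal(size=(n_prev, n_classes))
b = np.zeros((1, n_classes))
Y = np.eye(n_classes)[rng.integers(0, n_classes, size=m)]

dW, db = output_layer_backward(A_prev, A_prev @ W + b, Y)

# Finite-difference check of a single weight entry.
eps = 1e-6
W_plus = W.copy()
W_plus[0, 0] += eps
W_minus = W.copy()
W_minus[0, 0] -= eps
numeric = (cross_entropy(softmax(A_prev @ W_plus + b), Y)
           - cross_entropy(softmax(A_prev @ W_minus + b), Y)) / (2 * eps)
```

With this formulation dZ has the same shape as the labels (300 x 10 for the batch in the question), so dW comes out with the same shape as W rather than 3000 x 3000.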
I have a linear regression model that seems to work. I first load the data into X and the target column into Y, after that I implement the following...
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data,
    Y_data,
    test_size=0.2
)
rng = np.random
n_rows = X_train.shape[0]
X = tf.placeholder("float")
Y = tf.placeholder("float")
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
pred = tf.add(tf.multiply(X, W), b)
cost = tf.reduce_sum(tf.pow(pred-Y, 2)/(2*n_rows))
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    for epoch in range(FLAGS.training_epochs):
        avg_cost = 0
        for (x, y) in zip(X_train, Y_train):
            sess.run(optimizer, feed_dict={X:x, Y:y})
        # display logs per epoch step
        if (epoch + 1) % FLAGS.display_step == 0:
            c = sess.run(
                cost,
                feed_dict={X:X_train, Y:Y_train}
            )
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")
    accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
    print(sess.run(accuracy))
I cannot figure out how to print out the model's accuracy. For example, in sklearn, it is simple, if you have a model you just print model.score(X_test, Y_test). But I do not know how to do this in tensorflow or if it is even possible.
I think I'd be able to calculate the Mean Squared Error. Does this help in any way?
EDIT
I tried implementing tf.metrics.accuracy as suggested in the comments but I'm having an issue implementing it. The documentation says it takes 2 arguments, labels and predictions, so I tried the following...
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
But this gives me an error...
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value accuracy/count
[[Node: accuracy/count/read = Identity[T=DT_FLOAT, _class=["loc:@accuracy/count"], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
How exactly does one implement this?
It turns out that, since this is a multi-class linear regression problem and not a classification problem, tf.metrics.accuracy is not the right approach.
Instead of reporting the accuracy of my model as a percentage, I focused on reducing the mean squared error (MSE).
From looking at other examples, tf.metrics.accuracy is never used for linear regression, only for classification. Normally tf.metrics.mean_squared_error is the right approach.
I implemented two ways of calculating the total MSE of my predictions to my testing data...
pred = tf.add(tf.matmul(X, W), b)
...
...
Y_pred = sess.run(pred, feed_dict={X:X_test})
mse = tf.reduce_mean(tf.square(Y_pred - Y_test))
OR
mse = tf.metrics.mean_squared_error(labels=Y_test, predictions=Y_pred)
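Both compute the same quantity, and the second is more concise. Note, however, that tf.metrics.mean_squared_error returns a (value, update_op) pair backed by local variables, so those variables must be initialized and the update op run before the value is meaningful.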
There's a good explanation of how to measure the accuracy of a Linear Regression model here.
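For comparison with sklearn: for regressors, model.score(X_test, Y_test) returns the coefficient of determination (R^2), not a classification accuracy. Both MSE and R^2 can be computed directly from the predictions once they have been fetched as arrays. A minimal NumPy sketch follows; the numbers are made up and the helper names are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up test targets and predictions.
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

test_mse = mse(y_test, y_pred)       # 0.1875
test_r2 = r2_score(y_test, y_pred)   # approximately 0.9625
```

An R^2 of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.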
I didn't think this was clear at all from the Tensorflow documentation, but you have to declare the accuracy operation, and then initialize all global and local variables, before you run the accuracy calculation:
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
# ...
init_global = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
sess.run([init_global, init_local])
# ...
# run accuracy_op to update the running totals, then read accuracy
I read something on Stack Overflow about the accuracy calculation using local variables, which is why the local variable initializer is necessary.
After reading the complete code you posted, I noticed a couple other things:
In your calculation of pred, you use pred = tf.add(tf.multiply(X, W), b). tf.multiply performs element-wise multiplication, and will not give you the fully connected layers you need for a neural network (which I am assuming is what you are ultimately working toward, since you're using TensorFlow). To implement fully connected layers, where each layer i (including input and output layers) has n_i nodes, you need separate weight and bias matrices for each pair of successive layers. The dimensions of the i-th weight matrix (the weights between layer i and layer i+1) should be (n_i, n_{i+1}), and the i-th bias should have dimensions (1, n_{i+1}) so that it broadcasts across the rows of tf.matmul(X, W). Then, going back to the multiplication operation: replace tf.multiply with tf.matmul, and you're good to go. I assume that what you have is probably fine for a single-class linear regression problem, but this is definitely the way you want to go if you plan to solve a multiclass regression problem or implement a deeper network.
Your weight and bias variables are scalars. You give the variables the initial value of np.random.randn(), which according to the documentation generates a single floating point number when no arguments are given. The dimensions of your weight and bias tensors need to be supplied as arguments to np.random.randn(). Better yet, you can initialize these to random values directly in TensorFlow: W = tf.Variable(tf.random_normal([dim0, dim1], seed=seed)) (I always initialize random variables with a seed value for reproducibility).
Just a note in case you don't know this already: non-linear activation functions are required for neural networks to be effective. If all your activations are linear, then no matter how many layers you have, the network reduces to a simple linear regression in the end. Many people use relu activation for hidden layers. For the output layer, use softmax activation for multiclass classification problems where the output classes are exclusive (i.e., where only one class can be correct for any given input), and sigmoid activation for classification problems where the output classes are not exclusive (multilabel problems).
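The point about linear activations can be checked numerically. The NumPy sketch below uses arbitrary layer sizes chosen for illustration. It stacks two layers with no activation, shows that a single weight matrix and bias reproduce the same output, and then shows that inserting a ReLU breaks the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                   # 6 samples, 4 features

# Two stacked layers with no activation in between.
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=(1, 5))
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=(1, 2))
two_layer = (X @ W1 + b1) @ W2 + b2

# The same map written as one linear layer.
W_single = W1 @ W2
b_single = b1 @ W2 + b2
one_layer = X @ W_single + b_single

# With a ReLU between the layers, the collapse no longer holds in general.
with_relu = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2

collapses = np.allclose(two_layer, one_layer)          # linear stack equals one layer
relu_differs = not np.allclose(with_relu, one_layer)   # non-linearity changes the map
```

The matrix products here are the NumPy counterpart of tf.matmul, with weights shaped (inputs, outputs) and biases shaped (1, outputs) so that they broadcast over the rows.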