Now I'm trying to build a predictor from several explanatory variables.
I want to ask whether my method is wrong.
I have one score (which will be Y here) and 4 features (X1, X2, X3, and X4) with 100 observations.
I first investigated the relationship between Y and each of X1, X2, X3, and X4 individually, using Pearson's correlation coefficient and linear regression (beta).
Using these relationships, I built a predictor of Y as a weighted sum of the features.
I fit a stepwise linear regression model and used its weights (this gave higher correlation coefficients and betas).
Because my knowledge here is limited, I would like to know whether this is a valid approach, and I would appreciate any tips (e.g., other approaches when the X's have different units).
Thank you.
KHW.
Thank you for your advice.
Actually, I know that the measures X probably have linear relationships with Y. My approach was to investigate Pearson's correlation coefficient and the regression beta for X1, X2, X3, and X4 individually. I found linear relationships between the features and Y, but I wanted to increase predictability (r and beta here) by combining the features. Hence I built Y' = B(0) + B(1) * X1 + B(2) * X2 + ... + B(4) * X4, where Y' is the estimated Y and B(0) is an intercept.
First, I made Y' = B(0) + B(1) * X1 + B(4) * X4 using stepwise linear regression.
Second, I made Y' = B(0) + B(1) * X1 + B(2) * X2 + ... + B(4) * X4 using multiple linear regression.
They fit quite well, but here is my worry: I want to claim that the features can predict Y, yet choosing the coefficients by regression requires Y itself, which suggests the model is not really a predictor.
Could k-fold cross-validation be a way to validate this?
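For concreteness, here is a minimal sketch of the kind of validation I mean (scikit-learn; the arrays are placeholders for my 100 observations, and the StandardScaler step is just one assumption for handling the different units):
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.randn(100, 4)  # placeholder for X1..X4 (100 observations)
y = np.random.randn(100)     # placeholder for the score Y

# standardizing puts features with different units on a common scale
model = make_pipeline(StandardScaler(), LinearRegression())

# 5-fold CV: the model is always scored on data it was not fitted on,
# so Y is never used to choose coefficients for the fold being evaluated
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())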
Thank you
I found in the official docs that CAddTable should now be written as
x = x1 + x2  # instead of CAddTable(x1, x2) in older versions
and PyTorch will take care of the rest, e.g. autograd.
But what if I have multiple tensors, i.e. the input above changes from two tensors to a list of tensors? Can PyTorch still handle this the same way?
Just for a clean display of the code snippet from the comment:
x = torch.stack((x1, x2, x3, x4), dim=0)
y = torch.sum(x, dim=0, keepdim=False) # same shape as x1, x2...
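If the input is a Python list of tensors rather than two named tensors, the same pattern should work; a minimal sketch (assuming all tensors have the same shape):
import torch

tensors = [torch.randn(3, 4) for _ in range(5)]  # hypothetical list of same-shaped tensors

# stack along a new leading dim, then sum it away; autograd tracks this like any other op
y = torch.stack(tensors, dim=0).sum(dim=0)  # same shape as each list element
print(y.shape)  # torch.Size([3, 4])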
I created three Convolutional Autoencoders with the same architecture to extract features from some images related to different types of trees.
My code is something like:
model1 = myAutoencoder()
model2 = myAutoencoder()
model3 = myAutoencoder()
opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.MeanSquaredError()
model1.compile(optimizer=opt, loss=loss)
model2.compile(optimizer=opt, loss=loss)
model3.compile(optimizer=opt, loss=loss)
Then I train:
# X1, X2, X3 are tensors of 64x64 RGB images, e.g. of shape (100, 64, 64, 3)
model1.fit(X1, X1)
model2.fit(X2, X2)
model3.fit(X3, X3)
However, only the first model is learning, while the second and third are stuck at the same loss, as shown in the figure:
[figure: training loss curves; only the first model's loss decreases]
Interestingly, if I swap the positions of let's say model1 and model2, like this:
model2.fit(X2, X2)
model1.fit(X1, X1)
model3.fit(X3, X3)
then only model 2 is learning and models 1 and 3 are stuck. I cannot figure out why...
Edit: the actual training I am doing is this:
def scheduler(epoch, lr):
    if epoch < 50:
        return lr
    else:
        return lr * np.exp(-0.1)

model2.fit(X2, X2, epochs=100, callbacks=[LearningRateScheduler(scheduler)])
model1.fit(X1, X1, epochs=100, callbacks=[LearningRateScheduler(scheduler)])
model3.fit(X3, X3, epochs=100, callbacks=[LearningRateScheduler(scheduler)])
I figured out that if I delete the callbacks, the learning process is "normal". Is there a reason why the callbacks are interfering between the models?
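One thing worth noting (this is an assumption about the cause, not something confirmed above): all three models are compiled with the same Adam optimizer object, so they share optimizer state, and the LearningRateScheduler keeps decaying that shared learning rate across all three fits. A minimal sketch that rules this out by giving each model its own optimizer and its own scheduler callback:
from tensorflow import keras
from tensorflow.keras.callbacks import LearningRateScheduler
import numpy as np

def make_scheduler():
    # same decay rule as above, wrapped so each model gets a fresh callback
    def scheduler(epoch, lr):
        return lr if epoch < 50 else lr * np.exp(-0.1)
    return LearningRateScheduler(scheduler)

# myAutoencoder, X1, X2, X3 are the objects from the question above
models = [myAutoencoder() for _ in range(3)]
for m in models:
    # separate optimizer instance per model, so no shared state
    m.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss=keras.losses.MeanSquaredError())

for m, X in zip(models, [X1, X2, X3]):
    m.fit(X, X, epochs=100, callbacks=[make_scheduler()])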
I have one linear regression model: y ~ x1 + x2 (1)
and now let x3 = x1+x2, x4=x1-x2, to form a new regression y ~ x3 + x4 (2),
would the prediction of (1) and (2) be the same?
If I add L1 regularization to both models, would the prediction of (1) and (2) be the same?
That's not a coding question, it's a substitution problem. Writing model (2) as y = b0 + b3*(x1 + x2) + b4*(x1 - x2) = b0 + (b3 + b4)*x1 + (b3 - b4)*x2 shows it is just a reparametrization of model (1): the two sets of predictors span the same space, so the ordinary least-squares predictions are identical. With L1 regularization the predictions need not coincide, because penalizing |b3| + |b4| is not equivalent to penalizing |b1| + |b2| after the change of variables.
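A quick numerical check of the OLS claim (a sketch using scikit-learn; the data is simulated only for illustration):
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=100)

X1_design = np.column_stack([x1, x2])            # model (1): y ~ x1 + x2
X2_design = np.column_stack([x1 + x2, x1 - x2])  # model (2): y ~ x3 + x4

pred1 = LinearRegression().fit(X1_design, y).predict(X1_design)
pred2 = LinearRegression().fit(X2_design, y).predict(X2_design)
print(np.allclose(pred1, pred2))  # True: identical predictions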
I have data of the form
y1, x1
y2, x2
...
I am trying to train a neural network, say f, such that over a set of pairs (xi, xj), the difference f(xi) - f(xj) is trained to be close to (yi - yj), i.e. a pairwise loss.
You need to restructure the data: for each pair (Xi, Xj), the label becomes Yi - Yj. Since this is a regression problem, use MSE as the loss function, and that should hopefully lead to what you want.
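A minimal sketch of that idea (Keras; the shapes, layer sizes, and random data are assumptions for illustration):
import numpy as np
from tensorflow import keras

# toy data: N targets y and d-dimensional inputs x
N, d = 200, 8
x = np.random.randn(N, d).astype("float32")
y = np.random.randn(N).astype("float32")

# build all (i, j) pairs with label y_i - y_j
i_idx, j_idx = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
i_idx, j_idx = i_idx.ravel(), j_idx.ravel()
x_i, x_j = x[i_idx], x[j_idx]
labels = (y[i_idx] - y[j_idx]).reshape(-1, 1)

# shared scoring network f applied to both inputs (Siamese style)
f = keras.Sequential([keras.layers.Dense(32, activation="relu"),
                      keras.layers.Dense(1)])
in_i, in_j = keras.Input(shape=(d,)), keras.Input(shape=(d,))
diff = keras.layers.Subtract()([f(in_i), f(in_j)])  # f(x_i) - f(x_j)

model = keras.Model([in_i, in_j], diff)
model.compile(optimizer="adam", loss="mse")  # MSE between f(x_i)-f(x_j) and y_i-y_j
model.fit([x_i, x_j], labels, epochs=5, batch_size=64)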
# mean squared error over the batch
mean_sqr = tf.reduce_mean(tf.pow(y_ - y, 2))
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
# compute_gradients returns (gradient, variable) pairs for the loss
gradients, variables = zip(*optimizer.compute_gradients(mean_sqr))
opt = optimizer.apply_gradients(list(zip(gradients, variables)))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for j in range(TRAINING_EPOCHS):
    sess.run(opt, feed_dict={x: batch_xs, y_: batch_xs})
I don't clearly understand what compute_gradients returns. Does it return sum(dy/dx) for the x values given by batch_xs, with apply_gradients then performing an update such as
theta <- theta - LEARNING_RATE * (1/m) * gradients?
Or does it already return the average of the gradients summed over the x values in the batch, i.e. sum(dy/dx) * 1/m, where m is the batch size?
compute_gradients(a, b) returns d[ sum a ]/db. So in your case this returns d mean_sqr / d theta, where theta is the set of all variables. There is no "dx" in this equation; you are not computing gradients with respect to the inputs. So what happens to the batch dimension? You remove it yourself in the definition of mean_sqr:
mean_sqr = tf.reduce_mean(tf.pow(y_ - y, 2))
thus (I am assuming y is 1D for simplicity)
d[ mean_sqr ] / d theta = d[ 1/M * SUM_{i=1}^M (pred(x_i) - y_i)^2 ] / d theta
                        = 1/M * SUM_{i=1}^M d[ (pred(x_i) - y_i)^2 ] / d theta
so you are in control of whether it sums over the batch or takes the mean: if you had defined mean_sqr with reduce_sum instead of reduce_mean, the gradients would be the sum over the batch, and so on.
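A small sketch of that point (TF1-style graph code, matching the snippet above; the toy data and shapes are illustrative):
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable([[1.0]])
y = tf.matmul(x, w)

g_mean = tf.gradients(tf.reduce_mean(tf.pow(y_ - y, 2)), [w])[0]
g_sum = tf.gradients(tf.reduce_sum(tf.pow(y_ - y, 2)), [w])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: np.ones((4, 1), np.float32), y_: np.zeros((4, 1), np.float32)}
    gm, gs = sess.run([g_mean, g_sum], feed_dict=feed)
    print(gm, gs)  # the summed gradient is batch_size (4) times the mean gradient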
On the other hand, apply_gradients simply "applies the gradients"; the exact update rule is optimizer dependent. For GradientDescentOptimizer it would be
theta <- theta - learning_rate * gradients(theta)
For the Adam optimizer that you are using, the equation is more complex, of course.
Note, however, that tf.gradients is more like "backprop" than a true gradient in the mathematical sense, meaning that it follows the graph dependencies and does not recognize dependencies that run in the "opposite" direction.