import numpy as np
import statsmodels.api as sm
from sklearn import linear_model
x1 = np.array([3,4,6,8,9])
x2 = np.array([1,1,3,6,4])
x3 = np.array([2,2,5,6,9])
y = np.array([2,3,6,5,8])
xn1 = x1.reshape(-1,1)
xn2 = x2.reshape(-1,1)
xn3 = x3.reshape(-1,1)
X = np.concatenate((xn1,xn2,xn3),axis=1)
yn1 = y.reshape(-1,1)
reg = linear_model.LinearRegression()
reg.fit(X,y)
print(reg.coef_)
print(reg.intercept_)
model = sm.OLS(y,X).fit()
print(model.summary())
The output from the scikit-learn linear regression is [ 0.66424419 -0.48982558  0.48401163], whereas the output from statsmodels is as below:
x1: coeff = 0.6455, std err = 0.425
x2: coeff = -0.4823, std err = 0.388
x3: coeff = 0.4948, std err = 0.412
The figures do not match. What is the reason?
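A likely cause, not stated above: scikit-learn's LinearRegression fits an intercept by default, while sm.OLS only estimates one if a constant column is added explicitly. A minimal sketch of the statsmodels call with a constant added:

X_const = sm.add_constant(X)     # prepend a column of ones for the intercept
model = sm.OLS(y, X_const).fit()
print(model.params)              # intercept followed by the three slope coefficients
print(model.summary())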
I have 3 parallel MLPs and want to obtain the following in Keras:
Out = W1 * Out_MLP1 + W2 * Out_MLP2 + W3 * Out_MLP3
where the Out_MLPs are the output layers of each MLP and have dimension (10,), and W1, W2 and W3 are three trainable weights (floats) that satisfy the following condition:
W1 + W2 + W3 = 1
What is the best way to implement this with Keras functional API? What if we had N parallel layers?
What you need is to apply a softmax to a set of learnable weights, in order to guarantee that they sum to 1.
We initialize our learnable weights in a custom layer. This layer receives the outputs of our MLPs and combines them following our logic W1 * Out_MLP1 + W2 * Out_MLP2 + W3 * Out_MLP3. The output will be a tensor of shape (10,).
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate, Layer
from tensorflow.keras.models import Model

class W_ADD(Layer):

    def __init__(self, n_output):
        super(W_ADD, self).__init__()
        self.W = tf.Variable(
            initial_value=tf.random.uniform(shape=[1, 1, n_output], minval=0, maxval=1),
            trainable=True)  # (1, 1, n_inputs)

    def call(self, inputs):
        # inputs is a list of tensors of shape [(n_batch, n_feat), ..., (n_batch, n_feat)]
        # expand the last dim of each input: [(n_batch, n_feat, 1), ..., (n_batch, n_feat, 1)]
        inputs = [tf.expand_dims(i, -1) for i in inputs]
        inputs = Concatenate(axis=-1)(inputs)  # (n_batch, n_feat, n_inputs)
        weights = tf.nn.softmax(self.W, axis=-1)  # (1, 1, n_inputs)
        # the weights sum to one on the last dim
        return tf.reduce_sum(weights * inputs, axis=-1)  # (n_batch, n_feat)
In this dummy example, I create a network that has 3 parallel MLPs:
inp1 = Input((100,))
inp2 = Input((100,))
inp3 = Input((100,))
x1 = Dense(32, activation='relu')(inp1)
x2 = Dense(32, activation='relu')(inp2)
x3 = Dense(32, activation='relu')(inp3)
x1 = Dense(10, activation='linear')(x1)
x2 = Dense(10, activation='linear')(x2)
x3 = Dense(10, activation='linear')(x3)
mlp_outputs = [x1,x2,x3]
out = W_ADD(n_output=len(mlp_outputs))(mlp_outputs)
m = Model([inp1,inp2,inp3], out)
m.compile('adam','mse')
X1 = np.random.uniform(0,1, (1000,100))
X2 = np.random.uniform(0,1, (1000,100))
X3 = np.random.uniform(0,1, (1000,100))
y = np.random.uniform(0,1, (1000,10))
m.fit([X1,X2,X3], y, epochs=10)
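After fitting, the effective combination weights can be inspected (a quick sketch, assuming the model m defined above; the softmax of W is non-negative and sums to 1):

wadd = next(layer for layer in m.layers if isinstance(layer, W_ADD))
w = tf.nn.softmax(wadd.W, axis=-1).numpy().ravel()
print(w, w.sum())   # three non-negative weights summing to ~1.0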
As you can see, this is easily generalizable to the case of N parallel layers, for example:
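A minimal sketch of the N-layer case (N is hypothetical here; it reuses the W_ADD layer defined above):

N = 5
inps = [Input((100,)) for _ in range(N)]
hidden = [Dense(32, activation='relu')(inp) for inp in inps]
outs = [Dense(10, activation='linear')(h) for h in hidden]
out_n = W_ADD(n_output=N)(outs)
m_n = Model(inps, out_n)
m_n.compile('adam', 'mse')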
This question is about TensorFlow (and TensorBoard) version 2.2rc3, but I have experienced the same issue with 2.1.
Consider the following weird code:
from datetime import datetime
import tensorflow as tf
from tensorflow import keras
inputs = keras.layers.Input(shape=(784, ))
x1 = keras.layers.Dense(32, activation='relu', name='Model/Block1/relu')(inputs)
x1 = keras.layers.Dropout(0.2, name='Model/Block1/dropout')(x1)
x1 = keras.layers.Dense(10, activation='softmax', name='Model/Block1/softmax')(x1)
x2 = keras.layers.Dense(32, activation='relu', name='Model/Block2/relu')(inputs)
x2 = keras.layers.Dropout(0.2, name='Model/Block2/dropout')(x2)
x2 = keras.layers.Dense(10, activation='softmax', name='Model/Block2/softmax')(x2)
x3 = keras.layers.Dense(32, activation='relu', name='Model/Block3/relu')(inputs)
x3 = keras.layers.Dropout(0.2, name='Model/Block3/dropout')(x3)
x3 = keras.layers.Dense(10, activation='softmax', name='Model/Block3/softmax')(x3)
x4 = keras.layers.Dense(32, activation='relu', name='Model/Block4/relu')(inputs)
x4 = keras.layers.Dropout(0.2, name='Model/Block4/dropout')(x4)
x4 = keras.layers.Dense(10, activation='softmax', name='Model/Block4/softmax')(x4)
outputs = x1 + x2 + x3 + x4
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=keras.optimizers.RMSprop(),
              metrics=['accuracy'])
logdir = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
model.fit(x_train, y_train,
          batch_size=64,
          epochs=5,
          validation_split=0.2,
          callbacks=[tensorboard_callback])
When running it and looking at the graph created in TensorBoard, you will see the following.
As can be seen, the addition operations are really ugly.
When replacing the line
outputs = x1 + x2 + x3 + x4
with the lines:
outputs = keras.layers.add([x1, x2], name='Model/add/add1')
outputs = keras.layers.add([outputs, x3], name='Model/add/add2')
outputs = keras.layers.add([outputs, x4], name='Model/add/add3')
a much nicer graph is created by TensorBoard (in this second screenshot, the Model as well as one of the inner blocks are shown in detail).
The difference between the two representations of the model is that in the second one, we could name the addition operations and group them.
I could not find any way to name these operations, except by using keras.layers.add(). In this model the problem does not look that critical, as the model is simple and it is easy to replace + with keras.layers.add(). However, in more complex models it can become a real pain. For example, operations such as t[:, start:end] are translated into complex calls to tf.strided_slice(), so my models' representations are quite messy, with plenty of cryptic gather, stride and concat operations.
I wonder if there is a way to wrap / group such operations to allow nicer graphs in TensorBoard.
outputs = keras.layers.Add()([x1, x2, x3, x4])
Following the hint from Marco Cerliani, a Lambda layer is indeed very useful here. The following code groups the + operations nicely:
outputs = keras.layers.Lambda(lambda x: x[0] + x[1], name='Model/add/add1')([x1, x2])
outputs = keras.layers.Lambda(lambda x: x[0] + x[1], name='Model/add/add2')([outputs, x3])
outputs = keras.layers.Lambda(lambda x: x[0] + x[1], name='Model/add/add3')([outputs, x4])
Or, if strided slices need to be wrapped, the following code groups the t[:, start:end] operations nicely:
x1 = keras.layers.Lambda(lambda x: x[:, 0:5], name='Model/stride_concat/stride1')(x1) # instead of x1 = x1[:, 0:5]
x2 = keras.layers.Lambda(lambda x: x[:, 5:10], name='Model/stride_concat/stride2')(x2) # instead of x2 = x2[:, 5:10]
outputs = keras.layers.concatenate([x1, x2], name='Model/stride_concat/concat')
This answers the question asked. But there is still an open issue, described in another question: 'TensorFlowOpLayer messes up the TensorBoard graphs'.
This is code to predict stock price movements using TensorFlow and the ReLU activation function. I run the following code:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import pandas_datareader as web
dataset = web.DataReader('AAPL', data_source = 'yahoo', start = '1989-01-01', end = '2019-12-25')
import math
close_price = dataset.filter(['Close']).values
data_train_len = math.ceil(len(close_price) * .8)
sc = MinMaxScaler(feature_range = (0, 1))
sc_data = sc.fit_transform(close_price)
data_train = sc_data[0 : data_train_len, : ]
xtrain = []
ytrain = []
for i in range(60, len(data_train)):
    xtrain.append(data_train[i - 60 : i, 0])
    ytrain.append(data_train[i, 0])
xtrain, ytrain = np.array(xtrain), np.array(ytrain)
xtrain = np.reshape(xtrain, (xtrain.shape[0], xtrain.shape[1], 1))
print(xtrain.shape, ytrain.shape)
data_test = sc_data[data_train_len - 60 : , :]
xtest = []
ytest = close_price[data_train_len :, :]
for i in range(60, len(data_test)):
    xtest.append(data_test[i - 60 : i, 0])
xtest = np.array(xtest)
xtest = np.reshape(xtest, (xtest.shape[0], xtest.shape[1], 1))
print(xtest.shape, ytest.shape)
# Number of stock in training data
n_stocks = xtrain.shape[1]
#Model architecture parameters
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
# Session
sesh = tf.InteractiveSession()
# Define two variables as placeholders
a = tf.placeholder(dtype = tf.float32, shape = [None, n_stocks])
b = tf.placeholder(dtype = tf.float32, shape = [1, None])
# Initializers
sig = 1
weight_init = tf.variance_scaling_initializer(mode = "fan_avg", distribution = "uniform", scale = sig)
bias_init = tf.zeros_initializer()
# Hidden weights
w_hid_1 = tf.Variable(weight_init([n_stocks, n_neurons_1]))
bias_hid_1 = tf.Variable(bias_init([n_neurons_1]))
w_hid_2 = tf.Variable(weight_init([n_neurons_1, n_neurons_2]))
bias_hid_2 = tf.Variable(bias_init([n_neurons_2]))
w_hid_3 = tf.Variable(weight_init([n_neurons_2, n_neurons_3]))
bias_hid_3 = tf.Variable(bias_init([n_neurons_3]))
w_hid_4 = tf.Variable(weight_init([n_neurons_3, n_neurons_4]))
bias_hid_4 = tf.Variable(bias_init([n_neurons_4]))
# Output weights
w_out = tf.Variable(weight_init([n_neurons_4, 1]))
bias_out = tf.Variable(bias_init([1]))
# Hidden layers
hid_1 = tf.nn.relu(tf.add(tf.matmul(a, w_hid_1), bias_hid_1))
hid_2 = tf.nn.relu(tf.add(tf.matmul(hid_1, w_hid_2), bias_hid_2))
hid_3 = tf.nn.relu(tf.add(tf.matmul(hid_2, w_hid_3), bias_hid_3))
hid_4 = tf.nn.relu(tf.add(tf.matmul(hid_3, w_hid_4), bias_hid_4))
# Transposed Output layer
out = tf.transpose(tf.add(tf.matmul(hid_4, w_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, b))
rmse = tf.sqrt(tf.reduce_mean(tf.squared_difference(out, b)))
opt1 = tf.train.AdamOptimizer().minimize(mse)
opt2 = tf.train.AdamOptimizer().minimize(rmse)
sesh.run(tf.global_variables_initializer())
# Setup plot
plt.ion()
fig = plt.figure()
ax1 = fig.add_subplot(111)
line1, = ax1.plot(ytest)
line2, = ax1.plot(ytest * 0.5)
plt.show()
# Fitting neural network
batch_size = 256
mse_train = []
rmse_train = []
mse_test = []
rmse_test = []
# Run tensorflow
epochs = 10
for epoch in range(epochs):
    # Training data is shuffled
    shuffle_ind = np.random.permutation(np.arange(len(ytrain)))
    xtrain = xtrain[shuffle_ind]
    ytrain = ytrain[shuffle_ind]
    # Minibatch training
    for i in range(0, len(ytrain) // batch_size):
        start = i * batch_size
        batch_x = xtrain[start : start + batch_size]
        batch_y = ytrain[start : start + batch_size]
        # Run optimizer with batch
        sesh.run(opt1, feed_dict = {a : batch_x, b : batch_y})
        sesh.run(opt2, feed_dict = {a : batch_x, b : batch_y})
I get the following error:
ValueError: Cannot feed value of shape (256, 60, 1) for Tensor 'Placeholder_30:0', which has shape '(?, 60)'
This error appears for both of the last two lines under 'Run Optimizer with Batch'. How do I fix this?
It seems like you are trying to feed data that doesn't fit the placeholder (I think placeholder a). A simple fix is to change your placeholder to a = tf.placeholder(dtype = tf.float32, shape = [None, n_stocks, 1]), or to change the dimensions of your xtrain and xtest arrays (the lines where you use reshape) by dropping the last dimension with np.squeeze(), as sketched below.
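A minimal sketch of the squeeze-based option, assuming the arrays built in the question's code; note that the target batches may also need reshaping to match placeholder b of shape [1, None]:

xtrain = np.squeeze(xtrain, axis = -1)   # (samples, 60, 1) -> (samples, 60), matches a of shape [None, n_stocks]
xtest = np.squeeze(xtest, axis = -1)     # (samples, 60, 1) -> (samples, 60)
# Inside the training loop, the targets can be reshaped before feeding:
# batch_y = batch_y.reshape(1, -1)       # matches b of shape [1, None]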
For a concrete example, here is some code:
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Flatten, Dense, concatenate
from tensorflow.keras.models import Model

input1 = Input(shape = (3,2))
x1 = LSTM(8)(input1)
x1 = Flatten()(x1)
x1 = Dense(10)(x1)
input2 = Input(shape = (3,2))
x2 = LSTM(8)(input2)
x2 = Flatten()(x2)
x2 = Dense(10)(x2)
x = concatenate([x1, x2])
y = Dense(1)(x)
model = Model([input1, input2], y)
model.compile(optimizer='Adam',
              loss='mean_squared_error')
Is it possible to use a Keras generator to fit this model with two numpy arrays?
E.g.
X1 = np.random.randn(100, 2)
X2 = np.random.randn(100, 2)
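One common approach is to wrap the arrays in a Python generator that yields ([x1_batch, x2_batch], y_batch) pairs. This is a sketch, not from the question: batch_generator is a hypothetical helper, and the arrays are assumed to be shaped to match the model's inputs, e.g. (100, 3, 2).

def batch_generator(X1, X2, y, batch_size=32):
    # yield ([x1_batch, x2_batch], y_batch) pairs indefinitely, as Keras expects from a generator
    n = len(y)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            yield [X1[b], X2[b]], y[b]

X1 = np.random.randn(100, 3, 2)
X2 = np.random.randn(100, 3, 2)
y = np.random.randn(100, 1)
# Recent tf.keras accepts a generator in model.fit directly; older versions use fit_generator.
model.fit(batch_generator(X1, X2, y, batch_size=16),
          steps_per_epoch=100 // 16, epochs=2)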
I am using a Siamese architecture in my model for a classification task: deciding whether the two inputs are similar.
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

in1 = Input(shape=(None,), dtype='int32', name='in1')
x1 = Embedding(output_dim=dim, input_dim=n_symbols, input_length=None,
               weights=[embedding_weights], name='x1')(in1)
in2 = Input(shape=(None,), dtype='int32', name='in2')
x2 = Embedding(output_dim=dim, input_dim=n_symbols, input_length=None,
               weights=[embedding_weights], name='x2')(in2)
l = Bidirectional(LSTM(units=100, return_sequences=False))
y1 = l(x1)
y2 = l(x2)
y = concatenate([y1, y2])
out = Dense(1, activation='sigmoid')(y)
model = Model(inputs=[in1, in2], outputs=[out])
It works correctly, as the number of weights to be trained remains the same even when I use a single input. The thing that confused me, though, was the TensorBoard visualization of the model.
(screenshot: TensorBoard graph)
Shouldn't both x1 and x2 map to the same bidirectional node?
Also, what do the 18 and 32 tensors signify?
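A quick way to check the weight sharing described above (a sketch using the model from this question): both y1 and y2 come from the same Bidirectional layer instance l, so the model contains only one set of its weights.

shared = [w.name for w in l.weights]
print(shared)     # a single set of forward/backward LSTM kernels and biases
model.summary()   # the Bidirectional layer appears once, connected to both branches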