How to train a parameter outside the model? - keras

I am implementing the following architecture, a Dual Encoder LSTM, in TensorFlow 2.0.
C and R are sentences encoded into fixed-size vectors by the two LSTMs. They are then passed through the function sigmoid(C^T M R). We can assume that C and R are both 256-dimensional and M is a 256 × 256 matrix. The matrix M is learned during training.
Since I want to train M, I declared M = tf.Variable(shape,trainable = True).
But after fitting the model, the values of M still do not change. How do I tell TensorFlow to compute the gradients for M automatically? Below is my code.
Code
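The code itself is not shown here, but a frequent cause of this symptom is that a tf.Variable created outside any Keras layer (for example at module level, or inside a Lambda layer) is not tracked by the model, so it never appears in model.trainable_weights and the optimizer never updates it. A minimal sketch of one fix, assuming a Keras functional model: create M with add_weight inside a small custom layer. The names BilinearScore and build_dual_encoder, and the sequence/feature sizes, are hypothetical.

```python
import numpy as np
import tensorflow as tf


class BilinearScore(tf.keras.layers.Layer):
    """Computes sigmoid(c^T M r) for each (c, r) pair in a batch.

    M is created with add_weight, so Keras tracks it as one of the
    model's trainable weights and fit() updates it with the LSTM weights.
    """

    def __init__(self, dim, **kwargs):
        super().__init__(**kwargs)
        self.dim = dim

    def build(self, input_shape):
        self.M = self.add_weight(name="M", shape=(self.dim, self.dim),
                                 initializer="glorot_uniform", trainable=True)

    def call(self, inputs):
        c, r = inputs                                   # each (batch, dim)
        cm = tf.matmul(c, self.M)                       # row i is c_i^T M
        score = tf.reduce_sum(cm * r, axis=-1, keepdims=True)  # c_i^T M r_i
        return tf.sigmoid(score)                        # (batch, 1)


def build_dual_encoder(seq_len, feat_dim, hidden=256):
    """Hypothetical dual encoder: two LSTMs followed by the bilinear score."""
    context = tf.keras.Input(shape=(seq_len, feat_dim))
    response = tf.keras.Input(shape=(seq_len, feat_dim))
    c = tf.keras.layers.LSTM(hidden)(context)
    r = tf.keras.layers.LSTM(hidden)(response)
    prob = BilinearScore(hidden, name="bilinear")([c, r])
    model = tf.keras.Model([context, response], prob)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

A quick way to check whether M is being trained is to look for it in model.trainable_weights before calling fit; if it is missing there, the optimizer will not touch it.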

Related

How to do constrained minimization in Pytorch?

I want to minimize an equation. The equation consists of elements which are all tensors.
f=alpha + (vnorm/2) #Equation to minimize
where, vnorm=norm(v)*norm(v)
v is a tensor vector of n*1 and alpha is a tensor of 1*1
Now I need to minimize f subject to a constraint, that is:
(A @ v)+alpha<=0 #Constraint involved in the minimization (A @ v is a matrix-vector product)
where A is a tensor of 2*n.
How should I formulate the above equation and the constraint to minimize it in PyTorch? I succeeded in doing this with 'scipy', but I want to do it in PyTorch so that the minimization runs faster with the help of tensors.
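torch.optim only provides unconstrained optimizers, so one option is to fold the constraint into the objective. The sketch below uses a quadratic penalty method, which is a swapped-in technique and not something from the question: it repeatedly minimizes objective + mu * sum(relu(g)^2) with L-BFGS while increasing mu. The helper name minimize_with_penalty and its defaults are hypothetical. One caution before plugging in the question's problem: as literally written, f = alpha + ||v||^2/2 is unbounded below on the feasible set (v = 0 with alpha → -∞ satisfies the constraint), so the usage below uses a toy problem instead. For the question's variables the calls would look like objective = lambda: alpha.sum() + 0.5 * (v ** 2).sum() and constraints = lambda: A @ v + alpha, once the formulation is bounded.

```python
import torch


def minimize_with_penalty(objective, constraints, params,
                          mu0=1.0, growth=10.0, outer_steps=6, inner_steps=50):
    """Quadratic-penalty sketch: minimize objective() s.t. constraints() <= 0.

    objective() returns a scalar tensor; constraints() returns a tensor whose
    entries must all be <= 0 at a feasible point.
    """
    mu = mu0
    for _ in range(outer_steps):
        opt = torch.optim.LBFGS(params, max_iter=inner_steps,
                                line_search_fn="strong_wolfe")

        def closure():
            opt.zero_grad()
            violation = torch.relu(constraints())   # only positive parts violate
            loss = objective() + mu * (violation ** 2).sum()
            loss.backward()
            return loss

        opt.step(closure)
        mu *= growth                                 # tighten the penalty
    return params


# Toy usage: minimize (x - 2)^2 subject to x - 1 <= 0; the solution is x = 1.
x = torch.zeros(1, requires_grad=True)
minimize_with_penalty(lambda: ((x - 2) ** 2).sum(), lambda: x - 1, [x])
print(x.item())  # close to 1.0
```

The penalty method only satisfies the constraint approximately (here x ends slightly above 1); if exact feasibility matters, a projection step or a dedicated QP solver may be a better fit.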

Retrieve elements from a 3D tensor with a 2D index tensor

I am playing around with GPT2 and I have 2 tensors:
O: An output tensor of shape (B, S-1, V), where B is the batch size, S is the number of timesteps and V is the vocabulary size. This is the output of a generative model and is softmaxed along dimension 2 (the vocabulary axis).
L: A 2D tensor of shape (B, S-1) where each element is the index of the correct token for each timestep of each sample. These are essentially the labels.
I want to extract the predicted probability of the corresponding correct token from tensor O based on tensor L, so that I end up with a 2D tensor of shape (B, S-1). Is there an efficient way of doing this without using loops?
For reference, I based my answer on this Medium article.
Essentially, your answer lies in torch.gather, assuming that both of your tensors are just regular torch.Tensors (or can be converted to one).
import torch
# Specify some arbitrary dimensions for now
B = 3
V = 6
S = 4
# Make example reproducible
torch.manual_seed(42)
# L necessarily has to be a torch.LongTensor, otherwise indexing will fail.
L = torch.randint(0, V, size=[B, S])
O = torch.rand([B, S, V])
# Now collect the results. The index needs the same number of dimensions
# as O, so L gets a trailing axis of size 1 (one index per position).
X = torch.gather(O, dim=2, index=L.unsqueeze(dim=2))
# Make sure X has no "unnecessary" dimension
X = X.squeeze(dim=2)
It is a bit difficult to see whether this produces exactly the correct results, which is why I included a random seed that makes the example deterministic, so you can easily verify that it gives the desired results. For clarification, one could also use a lower-dimensional tensor, for which it becomes clearer what exactly torch.gather does.
Note that torch.gather also allows you to collect multiple indices from the same row. That means if you instead had a multiclass example in which multiple values are correct, you could similarly use a tensor L of shape [B, S, number_of_correct_samples].
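As suggested above, a lower-dimensional example makes it easier to see what torch.gather does. The sketch below uses small tensors whose values can be checked by hand (the values are chosen for illustration only), compares gather with the equivalent advanced-indexing form, and shows the multi-index case from the previous paragraph.

```python
import torch

# Small tensors whose values are easy to read: O[b, s, v] = 12*b + 4*s + v
B, S, V = 2, 3, 4
O = torch.arange(B * S * V, dtype=torch.float32).reshape(B, S, V)
L = torch.tensor([[0, 3, 1],
                  [2, 2, 0]])          # (B, S) token indices

# X[b, s] = O[b, s, L[b, s]]
X = torch.gather(O, dim=2, index=L.unsqueeze(2)).squeeze(2)
# X holds [[0, 7, 9], [14, 18, 20]]

# Equivalent advanced indexing: broadcast batch and time indices against L
b_idx = torch.arange(B).unsqueeze(1)   # (B, 1)
s_idx = torch.arange(S).unsqueeze(0)   # (1, S)
Y = O[b_idx, s_idx, L]                 # (B, S), same values as X

# Several indices per position: an index of shape (B, S, k) gives (B, S, k)
L_multi = torch.stack([L, (L + 1) % V], dim=2)
X_multi = torch.gather(O, dim=2, index=L_multi)
```

Both forms avoid Python loops; gather is convenient when the index tensor already has the target shape, while advanced indexing can read more naturally when only one index per position is needed.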

Get Keras LSTM output inside Tensorflow code

I'm working with time-variant graph embedding, where at each time step the adjacency matrix of the graph changes. The main idea is to perform the node embedding of each timestep of the graph by looking at a set of node features and the adjacency matrix. The node embedding step is long and complicated and is not part of the core of the problem, so I will skip this part. Suffice it to say that I use a Graph Convolutional Network to embed the nodes.
Consider that I have a stack of B adjacency matrices A of size NxN, where B = batch size and N = number of nodes in the graph. The matrices are stacked according to a time series, where the matrix at index i comes before the matrix at index i+1. I have already embedded the nodes of the graph, which results in a matrix of dimensions B x N x E, where E = size of the embedding (a parameter). Note that the model has to deal with any graph; therefore, N is not a parameter. Another important point is that each batch contains adjacency matrices from the same graph, so all matrices of a batch have the same number of nodes, but the matrices of other batches may have a different number of nodes.
I now need to pass these embeddings through an LSTM cell. I have never used Keras before, so I'm having a hard time making the Keras LSTM blend into my TensorFlow code. What I want to do is: pass each node embedding through an LSTM such that the number of timesteps = B and the LSTM batch size = N, that is, the input to my LSTM has the shape [N, B, E], where N and B are only known at execution time. I want the output of my LSTM to have the shape [B, E*E]. The embedding matrix is called self.embed_mat here. Here is my code:
def _LSTM_layer(self):
    with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE), tf.device(self.device):
        # Swap the first two axes: [B, N, E] -> [N, B, E]
        # (tf.reshape would keep the memory order and mix up nodes and timesteps)
        lstm_input = tf.transpose(self.embed_mat, perm=[1, 0, 2])  # lstm = [N, B, E]
        input_plh = K.placeholder(name="lstm_input", shape=(None, None, EMBED_SIZE))
        lstm = LSTM(EMBED_SIZE*EMBED_SIZE, input_shape=(None, None, EMBED_SIZE))
        get_output = K.function(inputs=[input_plh], outputs=[lstm(input_plh)])
        h = get_output([lstm_input])
I am a bit lost with the K.function part. All I want is the output tensor of the LSTM cell. I've seen that to get it with Keras we need to use K.function, but I don't quite understand what it does. When I call get_output([lstm_input]), I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'worker_global/A/shape' with dtype int64 and shape [?]
Here, A is the stack of adjacency matrices with dimension BxNxN. What is going on here? Does the value of N need to be known during the graph-building step? I think I made some dumb mistake with the LSTM cell, but I can't figure out what it is.
Thanks in advance!
If you want to get the output of an LSTM layer "out" for an input "inp" in a Keras Sequential() model called "model", where "inp" is the input of the first layer and "out" is an LSTM layer that, for the sake of this example, sits in the 4th position of the model, you can obtain that layer's output for the data you call "lstm_input" above with the following code:
inp = model.layers[0].input
out = model.layers[3].output
inp_to_out = K.function([inp], [out])
output = inp_to_out([lstm_input])
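K.function builds a callable that expects concrete values (such as NumPy arrays) for its inputs. Passing the symbolic lstm_input tensor instead probably forces TensorFlow to evaluate everything that tensor depends on, including the placeholder A, which would explain the error. Keras layers can also be called directly on tensors, so inside TensorFlow code K.function is usually unnecessary. A minimal TF 2.x (eager) sketch, assuming EMBED_SIZE = 8 (a hypothetical value) and a hypothetical helper lstm_over_time:

```python
import tensorflow as tf

EMBED_SIZE = 8  # hypothetical embedding size E

# Built once on first call; it only fixes the feature size E, so the
# batch size (N) and number of timesteps (B) may change between calls.
lstm = tf.keras.layers.LSTM(EMBED_SIZE * EMBED_SIZE)


def lstm_over_time(embed_mat):
    """embed_mat: (B, N, E) node embeddings, with B treated as time."""
    lstm_input = tf.transpose(embed_mat, perm=[1, 0, 2])  # (N, B, E)
    return lstm(lstm_input)  # (N, E*E): last hidden state for each node
```

Note that this returns one vector per node, shape (N, E*E), not (B, E*E). With return_sequences=True the layer returns (N, B, E*E) instead, which can be transposed back to put time first before reducing over nodes in whatever way the model needs.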

Keras: triplet loss with positive and negative sample within batch

I am trying to refactor my Keras code to use 'Batch Hard' sampling for the triplets, as proposed in https://arxiv.org/pdf/1703.07737.pdf.
"the core idea is to form batches by randomly sampling P classes (person identities), and then randomly sampling K images of each class (person), thus resulting in a batch of PK images. Now, for each sample a in the batch, we can select the hardest positive and the hardest negative samples within the batch when forming the triplets for computing the loss, which we call Batch Hard."
So at the moment I have a Python generator (for use with model.fit_generator in Keras) which produces batches on the CPU. Then the actual forward and backward passes through the model could be done on the GPU.
However, how can I make this work with the 'Batch Hard' method? The generator samples 64 images, for which 64 triplets should be formed. First, a forward pass is required to obtain the 64 embeddings with the current model.
embedding_model = Model(inputs = input_image, outputs = embedding)
But then the hardest positive and hardest negative have to be selected from the 64 embeddings to form triplets. Then the loss can be computed:
anchor = Input(input_shape, name='anchor')
positive = Input(input_shape, name='positive')
negative = Input(input_shape, name='negative')
f_anchor = embedding_model(anchor)
f_pos = embedding_model(positive)
f_neg = embedding_model(negative)
triplet_model = Model(inputs = [anchor, positive, negative], outputs=[f_anchor, f_pos, f_neg])
And this triplet_model can be trained by defining a triplet loss function. However, is it possible in Keras to combine fit_generator with the 'Batch Hard' method? Or, how can I access the embeddings of the other samples in the batch?
Edit: With keras.layers.Lambda I can define my own layer that creates triplets with input (batch_size, height, width, 3) and output (batch_size, 3, height, width, 3), but I also need access to the IDs somewhere. Is this possible within the layer?
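One way around the ID problem is to skip the three-input triplet_model and put the mining in the loss: Keras passes the targets to the loss as y_true, so if the generator yields (images, ids) batches built with P×K sampling, the loss sees both the IDs and all embeddings of the batch. A sketch under those assumptions, with the hypothetical name batch_hard_triplet_loss and Euclidean distance between embeddings:

```python
import tensorflow as tf


def batch_hard_triplet_loss(margin=0.2):
    """Keras-style loss: y_true holds identity labels, y_pred holds embeddings.

    For every anchor in the batch it picks the farthest embedding with the
    same label (hardest positive) and the closest one with a different label
    (hardest negative), following the Batch Hard idea quoted above.
    """
    def loss(y_true, y_pred):
        labels = tf.reshape(tf.cast(y_true, tf.int32), [-1])          # (batch,)
        sq_norms = tf.reduce_sum(tf.square(y_pred), axis=1)           # (batch,)
        sq_dist = (sq_norms[:, None]
                   - 2.0 * tf.matmul(y_pred, y_pred, transpose_b=True)
                   + sq_norms[None, :])
        dist = tf.sqrt(tf.maximum(sq_dist, 1e-12))                    # pairwise Euclidean

        same_id = tf.equal(labels[:, None], labels[None, :])
        idx = tf.range(tf.shape(labels)[0])
        not_self = tf.not_equal(idx[:, None], idx[None, :])
        pos_mask = tf.logical_and(same_id, not_self)

        # Hardest positive: largest distance among same-label pairs (self excluded)
        hardest_pos = tf.reduce_max(
            tf.where(pos_mask, dist, tf.zeros_like(dist)), axis=1)
        # Hardest negative: smallest distance among different-label pairs
        big = tf.fill(tf.shape(dist), 1e9)
        hardest_neg = tf.reduce_min(tf.where(same_id, big, dist), axis=1)

        return tf.reduce_mean(tf.nn.relu(hardest_pos - hardest_neg + margin))
    return loss
```

With this loss, embedding_model itself can be compiled, e.g. embedding_model.compile(optimizer="adam", loss=batch_hard_triplet_loss(0.2)), and trained with the existing generator as long as it yields the IDs as targets and every batch contains K > 1 images per identity.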

Defining new kernels to use in approximate_kernel.py

I am trying to test a new kernel method in Kernel Ridge Regression and want to do this by implementing the Fastfood transformation (https://arxiv.org/abs/1408.3060). I can write a function that computes this transform, but it does not work well with sklearn's kernel ridge regression. As a result, I have gone to the source code for sklearn's kernel ridge regression (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py) and approximate_kernel.py (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_approximation.py) to try to define this new kernel as a class in approximate_kernel.py. The problem is that I have no idea how to convert my construction into something that will work with approximate_kernel.py and KernelRidge. Could anybody advise how best to do this?
My construction for the fastfood transform is:
import numpy as np
from scipy.linalg import hadamard

def fastfood_product(d):
    '''
    Constructs the fastfood matrix composition V = const*S*H*G*Pi*H*B where
    S is a scaling matrix
    H is the Hadamard transform
    G is a diagonal random Gaussian
    Pi is a permutation matrix
    B is a diagonal Rademacher matrix.
    Inputs: d - dimensionality of the feature vectors for the kernel.
        Must be a power of two (required by hadamard). If not, the data can
        be padded with zeros, but for simplicity assume this condition
        is always met.
    Output: V'''
    S = np.zeros(shape=(d, d))
    G = np.zeros_like(S)
    B = np.zeros_like(S)
    H = hadamard(d)
    Pi = np.eye(d)
    np.random.shuffle(Pi)  # Shuffling the rows of the identity gives a permutation matrix
    # Construct the simple diagonal matrices
    np.fill_diagonal(B, 2*np.random.randint(low=0, high=2, size=d) - 1)  # random +/-1 entries
    np.fill_diagonal(G, np.random.randn(d))  # May want to change standard normal to arbitrary which will affect the scaling for V
    np.fill_diagonal(S, np.linalg.norm(G, 'fro')**(-0.5))
    #print('Shapes of B {}, S {}, G {}, H{}, Pi {}'.format(B.shape, S.shape, G.shape, H.shape, Pi.shape))
    V = d**(-0.5)*S.dot(H).dot(G).dot(Pi).dot(H).dot(B)
    return V
def fastfood_feature_map(X, n):
    '''Given a matrix X of data compute the fastfood transformation and feature mapping.
    Input: X data of dimension d by m, n = the number of nonlinear basis functions to choose (power of 2)
    Outputs: Phi - matrix of random features for fastfood kernel approximation.
    Usage: Phi must be transposed for computation in the kernel ridge regression.
        i.e. solve ||Phi.T * w - b || + regulariser
    Comments: This only uses a standard normal distribution but this could
        be altered with different hyperparameters.'''
    d, m = X.shape  # d features, m samples
    V = fastfood_product(d)
    Phi = n**(-0.5)*np.exp(1j*np.dot(V, X))
    return Phi
I think the imports import numpy as np and from scipy.linalg import hadamard are needed for the above.
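Rather than adding a class to scikit-learn's own source, a common route is to follow the pattern of RBFSampler in kernel_approximation.py: write the feature map as a transformer with fit/transform, then fit a linear Ridge on the resulting features, since linear ridge regression on random features approximates kernel ridge regression with the corresponding kernel. The sketch below is one way to do that, under several assumptions: the class name FastfoodSampler is hypothetical; it targets the Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2)); it returns real cos/sin features instead of the complex exp(1j * V x), because scikit-learn estimators expect real input; S uses the chi-distributed scaling from the Fastfood paper rather than the norm(G)**(-0.5) above; and the matrices are dense, so it reproduces Fastfood's distribution but not its O(d log d) speed, which would need a fast Walsh-Hadamard transform.

```python
import numpy as np
from scipy.linalg import hadamard
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class FastfoodSampler(BaseEstimator, TransformerMixin):
    """Hypothetical Fastfood feature map in scikit-learn's transformer style.

    fit() draws the random matrices; transform() maps X (n_samples, d) to
    real random Fourier features. Assumes d is a power of two; n_blocks
    stacked d x d blocks give n_blocks * d frequencies.
    """

    def __init__(self, n_blocks=1, sigma=1.0, random_state=None):
        self.n_blocks = n_blocks
        self.sigma = sigma
        self.random_state = random_state

    def _block(self, d, rng):
        # One block V = 1/(sigma*sqrt(d)) S H G Pi H B, built densely for clarity
        H = hadamard(d)
        B = np.diag(rng.choice([-1.0, 1.0], size=d))
        Pi = np.eye(d)[rng.permutation(d)]
        g = rng.standard_normal(d)
        G = np.diag(g)
        # Chi-distributed row lengths, so rows mimic Gaussian vectors' norms
        s = np.sqrt(rng.chisquare(d, size=d))
        S = np.diag(s / np.linalg.norm(g))
        return S @ H @ G @ Pi @ H @ B / (self.sigma * np.sqrt(d))

    def fit(self, X, y=None):
        X = np.asarray(X)
        d = X.shape[1]
        rng = np.random.RandomState(self.random_state)
        self.V_ = np.vstack([self._block(d, rng) for _ in range(self.n_blocks)])
        return self

    def transform(self, X):
        Z = np.asarray(X) @ self.V_.T                      # (n_samples, n_freq)
        scale = np.sqrt(1.0 / self.V_.shape[0])
        return scale * np.hstack([np.cos(Z), np.sin(Z)])   # real features
```

Usage follows the usual pipeline pattern, e.g. make_pipeline(FastfoodSampler(n_blocks=8, sigma=1.0), Ridge(alpha=1e-3)).fit(X, y). Because FastfoodSampler is an ordinary estimator, sigma and the Ridge alpha can also be tuned together with GridSearchCV.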
