How does SimCSE do dropout twice independently - nlp

How can I pass an input sentence into BERT with dropout applied twice independently?
Here is what I have tried so far; the outputs are identical.
from transformers import AutoModel, AutoTokenizer
bert = AutoModel.from_pretrained('bert-base-cased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
sent_dict = tokenizer('Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel', return_tensors='pt')
# compare the pooled outputs of two forward passes on the same input
bert(**sent_dict).pooler_output == bert(**sent_dict).pooler_output

I forgot to call model.train() (here, bert.train()) :(
Dropout is only active in training mode, and from_pretrained returns the model in evaluation mode.
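A minimal sketch of the effect, using a plain torch.nn.Dropout layer as a stand-in for the dropout layers inside BERT (the stand-in layer, the dropout rate and the tensor size are assumptions for illustration, not SimCSE's actual code). In evaluation mode two passes over the same input agree; in training mode each pass samples its own dropout mask.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a model that contains dropout (hypothetical; BERT has many such layers)
layer = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

layer.eval()   # evaluation mode: dropout is disabled
eval_same = torch.equal(layer(x), layer(x))

layer.train()  # training mode: each call samples a fresh dropout mask
train_same = torch.equal(layer(x), layer(x))

print(eval_same, train_same)
```

With the model from the question, the equivalent switch is calling bert.train() before the two forward passes.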

Related

Keras get the output of the last layer during training

The goal is to recover the output of the last layer of the variational autoencoder during the training phase, for use as training data for another algorithm.
Here is the code for the variational autoencoder model:
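from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import RMSprop
from keras.callbacks import TensorBoard
encoding_dim=58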
input_dim=xtrain.shape[1]
inputArray=Input(shape=(input_dim,))
encoded= Dense(units=encoding_dim,activation="tanh")(inputArray)
encoded= Dense(units=29,activation="tanh")(encoded)
encoded= Dense(units=15,activation="tanh")(encoded)
encoded= Dense(units=10,activation="tanh")(encoded)
encoded= Dense(units=3,activation="tanh")(encoded)
encoded= Dense(units=10,activation="tanh")(encoded)
decoded= Dense(units=15,activation="tanh")(encoded)
decoded= Dense(units=29,activation="tanh")(decoded)
decoded= Dense(units=encoding_dim,activation="tanh")(decoded)
decoded= Dense(units=input_dim,activation="sigmoid")(decoded)
autoecoder=Model(inputArray,decoded)
autoecoder.summary()
autoecoder.compile(optimizer=RMSprop(),loss="mean_squared_error",metrics=["mae"])
# hyperparameters:
batchsize=100
epoch=10
history = autoecoder.fit(xtrain_noise,xtrain,
                         batch_size=batchsize,
                         epochs=epoch,
                         verbose=1,
                         shuffle=True,
                         validation_data=(xtest_noise,xtest),
                         callbacks=[TensorBoard(log_dir="../logs/DenoiseautoencoderHoussem")])
I have found that I can retrieve the desired layer as follows:
autoecoder.layers[10].output
but how do I store its output during training in a list? Thanks.
Edit:
I can do this by using the predict method of the model on the xtrain data, but I think this is not the best way to do it.
You can train a new model on the predictions of a previously trained model by simply stacking new layers on the desired output and setting trainable = False on the old layers. Here is a dummy example:
# after autoencoder fitting
# freeze every layer of the trained autoencoder
for i,l in enumerate(autoecoder.layers):
    autoecoder.layers[i].trainable = False
    print(l.name, l.trainable)
output_autoecoder = autoecoder.layers[10].output
x_new = Dense(32, activation='relu')(output_autoecoder) # add a new layer, for example
new_model = Model(autoecoder.input, x_new)
new_model.compile('adam', 'mse')
new_model.summary()
I use the output of the last autoencoder layer as the input of the new blocks. Everything can be merged by compiling a new model whose inputs are the same as those of autoecoder. In this way we can use the training data for another algorithm without calling the predict method.
To solve this problem, the only solution I could use was the .predict method of the model. Thank you #marrco
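The .predict route mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the toy autoencoder, its layer sizes, the random data and the name extractor are all made up, and the layer index 1 refers to this toy model rather than to layers[10] in the question.

```python
import numpy as np
from tensorflow import keras

# Toy autoencoder (layer sizes and data are made up for illustration)
inputs = keras.Input(shape=(8,))
hidden = keras.layers.Dense(3, activation="tanh")(inputs)
outputs = keras.layers.Dense(8, activation="sigmoid")(hidden)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="rmsprop", loss="mean_squared_error")

x = np.random.rand(16, 8).astype("float32")
autoencoder.fit(x, x, epochs=1, batch_size=4, verbose=0)

# Sub-model that shares the trained layers and stops at the chosen layer
# (layers[0] is the input layer, layers[1] is the first Dense layer)
extractor = keras.Model(inputs=autoencoder.inputs,
                        outputs=autoencoder.layers[1].output)
features = extractor.predict(x, verbose=0)  # NumPy array, one row per sample
print(features.shape)
```

The resulting array can then be stored or passed to another algorithm as its training data.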

How to load BertForSequenceClassification model weights into a BertForTokenClassification model?

Initially, I fine-tuned a BERT base cased model on a text classification dataset, using the BertForSequenceClassification class.
from transformers import BertForSequenceClassification, AdamW, BertConfig
# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", # Use the 12-layer BERT model, with an uncased vocab.
    num_labels = 2, # The number of output labels--2 for binary classification.
                    # You can increase this for multi-class tasks.
    output_attentions = False, # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)
Now I want to use the weights of this fine-tuned BERT model for Named Entity Recognition, for which I have to use the BertForTokenClassification class. I'm unable to figure out how to load the fine-tuned BERT weights into a new model created with BertForTokenClassification.
Thanks in advance.
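You can get the weights from the bert module inside the first model and load them into the bert module inside the second:
from transformers import BertForTokenClassification
# config must describe the same architecture as the fine-tuned model
new_model = BertForTokenClassification(config=config)
new_model.bert.load_state_dict(model.bert.state_dict())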
This worked for me:
new_model = BertForTokenClassification.from_pretrained('/config path')
new_model.bert.load_state_dict(model.bert.state_dict())
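A minimal sketch of the weight transfer, assuming tiny randomly initialised models so that nothing has to be downloaded (all sizes and label counts below are made up for illustration; in practice model would be the fine-tuned classifier). Depending on the transformers version, the token-classification model may be built without the pooler layer that the sequence-classification model has, in which case a strict load complains about unexpected keys; strict=False is used here as a hedge against that.

```python
import torch
from transformers import (BertConfig, BertForSequenceClassification,
                          BertForTokenClassification)

# Tiny config so the sketch runs without pretrained weights (sizes are made up)
sizes = dict(vocab_size=100, hidden_size=32, num_hidden_layers=2,
             num_attention_heads=2, intermediate_size=64)

# Stand-in for the fine-tuned sequence classifier
model = BertForSequenceClassification(BertConfig(num_labels=2, **sizes))

# Token classifier with its own number of labels (5 is an arbitrary choice)
new_model = BertForTokenClassification(BertConfig(num_labels=5, **sizes))

# Copy the encoder weights; strict=False tolerates keys present on one side only
result = new_model.bert.load_state_dict(model.bert.state_dict(), strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

Only the shared encoder is copied; the token-classification head of new_model keeps its fresh initialisation and still has to be trained on the NER data.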

Keras - Issues using pre-trained word embeddings

I'm following Keras tutorials on word embeddings and replicated the code (with a few modifications) from this particular one:
Using pre-trained word embeddings in a Keras model
It's a topic classification problem in which they are loading pre-trained word vectors and use them via a fixed embedding layer.
When using the pre-trained embedding vectors I can, in fact, achieve their 95% accuracy. This is the code:
embedding_layer = Embedding(len(embed_matrix), len(embed_matrix.columns), weights=[embed_matrix],
input_length=data.shape[1:], trainable=False)
sequence_input = Input(shape=(MAXLEN,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Dropout(0.2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(35)(x) # global max pooling
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
output = Dense(target.shape[1], activation='softmax')(x)
model = Model(sequence_input, output)
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2,
batch_size=128)
The issue appears when I remove the embedding vectors and use completely random vectors instead, which surprisingly achieves higher accuracy: 96.5%.
The code is the same, with one modification: weights=[random_matrix]. That is a matrix with the same shape as embed_matrix, but filled with random values. So this is the embedding layer now:
embedding_layer = Embedding(len(embed_matrix),
len(embed_matrix.columns), weights=[random_matrix],
input_length=data.shape[1:], trainable=False)
I experimented many times with random weights and the result is always similar. Note that even though those weights are random, the trainable parameter is still False, so the network is not updating them.
After that, I removed the embedding layer entirely and used word sequences as the input, expecting that those weights were not contributing to the model's accuracy. With that, I got no more than 16% accuracy.
So, what is going on? How can random embeddings achieve the same or better performance than pre-trained ones?
And why does using word indexes (normalized, of course) as inputs result in such poor accuracy?

Errors while fine tuning InceptionV3 in Keras

I am trying to fine-tune the InceptionV3 model on my own dataset. Unfortunately, when I call model.fit to train, I get the error below:
ValueError: Error when checking target: expected dense_6 to have shape (4,) but got array with shape (1,)
First, I load my own dataset as training_data, which contains pairs of images and corresponding labels. Then I use the code below to convert them into arrays (img_new and label_new) that are compatible with the data and label inputs Keras expects.
i = 0
for img, label in training_data:
    img_new[i,:,:,:] = img
    label_new[i,:] = label
    i=i+1
Second, I fine tune the Inception Model below.
InceptionV3_model=keras.applications.inception_v3.InceptionV3(include_top=False,
weights='imagenet',
input_tensor=None,
input_shape=None,
pooling=None,
classes=1000)
#InceptionV3_model.summary()
# add a global spatial average pooling layer
x = InceptionV3_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 4 classes
predictions = Dense(4, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=InceptionV3_model.input, outputs=predictions)
# Transfer Learning
for layer in model.layers[:311]:
    layer.trainable = False
for layer in model.layers[311:]:
    layer.trainable = True
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss='categorical_crossentropy')
model.fit(x=X_train, y=y_train, batch_size=3, epochs=3, validation_split=0.2)
model.save_weights('first_try.h5')
Does anyone have an idea of what goes wrong when training with model.fit?
Thank you sincerely for your help.
The error occurs because my labels are integers. I have to compile the model with sparse_categorical_crossentropy, which is meant for integer labels, instead of categorical_crossentropy, which is meant for one-hot encoded labels.
Many thanks to #Amir for the help. :-)
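To illustrate the difference between the two label formats, here is a small NumPy sketch (the labels and predicted probabilities are made up). It builds the one-hot form that categorical_crossentropy expects from integer labels of shape (N, 1), and checks that the cross-entropy computed from either form is the same number.

```python
import numpy as np

num_classes = 4
# Integer labels with shape (N, 1), the shape reported in the error message
labels = np.array([[0], [2], [3]])

# One-hot version with shape (N, 4), the form categorical_crossentropy expects
one_hot = np.eye(num_classes)[labels.ravel()]

# Made-up softmax outputs for three samples
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.5, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])

# Per-sample cross-entropy computed from each label format
categorical = -(one_hot * np.log(probs)).sum(axis=1)
sparse = -np.log(probs[np.arange(len(labels)), labels.ravel()])

print(one_hot.shape, np.allclose(categorical, sparse))
```

So there are two equivalent fixes: keep the integer labels and switch the loss, or keep categorical_crossentropy and one-hot encode the labels first (keras.utils.to_categorical does this conversion).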

Training only one output of a network in Keras

I have a network in Keras with many outputs, however, my training data only provides information for a single output at a time.
At the moment my method for training is to run a prediction on the input in question, change the value of the particular output that I am training, and then do a single batch update. If I'm right, this is the same as setting the loss for all outputs to zero except the one that I'm trying to train.
Is there a better way? I've tried class weights, setting a zero weight for all but the output I'm training, but it doesn't give me the results I expect.
I'm using the Theano backend.
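The equivalence claimed above can be checked on a toy example, assuming a squared-error loss (the prediction values, the chosen output index and the target are made up; other losses would need their own check). Using the network's own prediction as the target for every untrained output gives the same per-output error as masking those outputs out.

```python
import numpy as np

# Made-up predictions for one sample from a network with 4 outputs
pred = np.array([0.2, 0.9, 0.4, 0.7])
trained_output = 1   # index of the only output with a known target
known_target = 0.5

# Approach from the question: copy the predictions, overwrite one entry
target = pred.copy()
target[trained_output] = known_target
per_output_error = (pred - target) ** 2

# Masked variant: squared error against the known target, zero elsewhere
mask = np.zeros_like(pred)
mask[trained_output] = 1.0
masked_error = mask * (pred - known_target) ** 2

print(per_output_error, masked_error)
```

The masked form avoids the extra prediction pass, at the cost of having to pass the mask to the loss.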
Outputting multiple results and optimizing only one of them
Let's say you want to return output from multiple layers, maybe from some intermediate layers, but you need to optimize only one target output. Here's how you can do it:
Let's start with this model:
inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
# you want to extract these values
useful_info = Dense(32, activation='relu', name='useful_info')(x)
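# final output. used for loss calculation and optimization
# (note: softmax over a single unit always outputs 1.0; use one unit per class, or sigmoid for a binary target)
result = Dense(1, activation='softmax', name='result')(useful_info)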
Compile with multiple outputs, set loss as None for extra outputs:
# give None for outputs that you don't want to use for loss calculation and optimization
model = Model(inputs=inputs, outputs=[result, useful_info])
model.compile(optimizer='rmsprop',
              loss=['categorical_crossentropy', None],
              metrics=['accuracy'])
Provide only target outputs when training. Skipping extra outputs:
model.fit(my_inputs, {'result': train_labels}, epochs=.., batch_size=...)
# this also works:
#model.fit(my_inputs, [train_labels], epochs=.., batch_size=...)
One predict to get them all
Having one model you can run predict only once to get all outputs you need:
predicted_labels, useful_info = model.predict(new_x)
To achieve this I ended up using the 'Functional API'. You basically create multiple models that share the same input and hidden layers but have different output layers.
For example:
https://keras.io/getting-started/functional-api-guide/
from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
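# (note: softmax over a single unit always outputs 1.0; use one unit per class, or sigmoid for a binary target)
predictions_A = Dense(1, activation='softmax')(x)
predictions_B = Dense(1, activation='softmax')(x)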
# This creates a model that includes
# the Input layer and three Dense layers
modelA = Model(inputs=inputs, outputs=predictions_A)
modelA.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])
modelB = Model(inputs=inputs, outputs=predictions_B)
modelB.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])
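A small sketch of what sharing layers between two models means in practice. The layer sizes, layer names, random data and the choice of sigmoid heads with binary_crossentropy are assumptions made for this illustration, not part of the answer above. Training through one model updates the shared hidden layer that the other model also uses, while the other model's own output layer is left untouched.

```python
import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(20,))
shared = keras.layers.Dense(16, activation="relu", name="shared")(inputs)
# Sigmoid heads for two binary targets (an assumption for this sketch)
out_a = keras.layers.Dense(1, activation="sigmoid", name="out_a")(shared)
out_b = keras.layers.Dense(1, activation="sigmoid", name="out_b")(shared)

model_a = keras.Model(inputs, out_a)
model_b = keras.Model(inputs, out_b)
model_a.compile(optimizer="rmsprop", loss="binary_crossentropy")
model_b.compile(optimizer="rmsprop", loss="binary_crossentropy")

# Made-up batch that only has labels for output A
x = np.random.rand(32, 20).astype("float32")
y_a = np.random.randint(0, 2, size=(32, 1)).astype("float32")

shared_before = model_b.get_layer("shared").get_weights()[0].copy()
head_b_before = model_b.get_layer("out_b").get_weights()[0].copy()

# Train only through model A
model_a.fit(x, y_a, epochs=1, batch_size=8, verbose=0)

shared_after = model_b.get_layer("shared").get_weights()[0]
head_b_after = model_b.get_layer("out_b").get_weights()[0]
print(np.allclose(shared_before, shared_after),
      np.allclose(head_b_before, head_b_after))
```

Each batch can therefore be routed to whichever model matches the output it has labels for.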
