What is the appropriate way to use BCE loss with ResNet outputs?

What is the appropriate way to use BCE loss with ResNet outputs? - pytorch

The title explains the overall problem, but for some elaboration:
I'm using torchvision.models.resnet18() to run an anomaly detection scheme. I initialize the model by doing:
net = torchvision.models.resnet18(num_classes=2)
since in my particular setting 0 equals normal samples and 1 equals anomalous samples.
The output from my model is of shape (16, 2) (batch size is 16) and labels are of size (16, 1). This gives me the error that the two input tensors are of inappropriate shape.
In order to solve this, I tried something like:
>>> new_output = torch.argmax(output, dim=1)
Which gives me the appropriate shape, but running loss = nn.BCELoss(new_output, labels) gives me the error:
RuntimeError: bool value of Tensor with more than one value is ambiguous
What is the appropriate way for me to approach this issue? Thanks.
Edit
I've also tried using nn.CrossEntropyLoss as well, but am getting the same RuntimeError.
More specifically, I tried nn.CrossEntropyLoss(output, label) and nn.CrossEntropyLoss(output, label.flatten()).

If you want to use BCELoss, the output shape should be (16, 1) instead of (16, 2) even though you have two classes. You may consider reading this excellent writing to under binary cross-entropy loss.
Since you are getting output from resnet18 with shape (16, 2), you should rather use CrossEntropyLoss where you can give (16, 2) output and label of shape (16).
You should use CrossEntropyLoss as follows.
loss_crit = nn.CrossEntropyLoss()
loss = loss_crit(output, label)
where output = (16, 2) and label = (16). Please note, the label should contain either 0 or 1.
Please see the example provided (copied below) in the official documentation.
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()

Related

torch crossentropy loss calculation difference between 2D input and 3D input

i am running a test on torch.nn.CrossEntropyLoss. I am using the example shown on the official page.
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=False)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
the output is 2.05.
in the example, both the input and the target are 2D tensors. Since in most NLP case, the input should be 3D tensor and correspondingly the output should be 3D tensor as well. Therefore, i wrote the a couple lines of testing code, and found a weird issue.
input = torch.stack([input])
target = torch.stack([target])
output = loss(ins, ts)
the output is 0.9492
This result really confuse me, except the dimensions, the numbers inside the tensors are totally the same. Does anyone know the reason why the difference is?
the reason why i am testing on the method is i am working on project with Transformers.BartForConditionalGeneration. the loss result is given in the output, which is always in (1,) shape. the output is confusing. If my batch size is greater than 1, i am supposed to get batch size number of loss instead of just one. I took a look at the code, it just simply use nn.CrossEntropyLoss(), so i am considering that the issue may be in the nn.CrossEntropyLoss() method. However, it is stucked in the method.

In the second case, you are adding an extra dimension which means that ultimately, the softmax on the logits tensor (input) won't be applied on a different dimension.
Here we compute the two quantities separately:
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=False)
>>> target = torch.randn(3, 5).softmax(dim=1)
First you have loss(input, target) which is identical to:
>>> o = -target*F.log_softmax(input, 1)
>>> o.sum(1).mean()
And your second scenario, loss(input[None], target[None]), identical to:
>>> o = -target[None]*F.log_softmax(input[None], 1)
>>> o.sum(1).mean()

Can't use combination of gradiants for multiple losses functions of a multi-output keras model

I am doing a time-series forecasting in Keras with a CNN and the EHR dataset. The goal is to predict both what molecule to give to the patient and the time until the next patient visit. I have to implement a bi-objective gradient descent based on this paper. The algorithm to implements is here (end of page 7, the beginning of page 8):
The model I choose is this one :
With time-series of length 3 as input (correspondings to 3 consecutive visits for a client)
And 2 outputs:
the atc code (the code of the molecule to predict)
the time to wait until the next visit (in categories of months: 0,1,2,3,4 for >=4)
both outputs use the SparseCategoricalCorssentropy loss function.
when I start to implement the first operation: gs - gl I have this error :
Some values in my gradients are at None and I don't know why. My optimizer is defined as follow: optimizer=tf.Keras.optimizers.Adam(learning_rate=1e-3 when compiling my model.
Also, when I try some operations on gradients to see how things work, I have another problem: only one input is taken into account which will pose a problem later because I have to consider each loss function separately:
With this code, I have this output message : WARNING:tensorflow:Gradients do not exist for variables ['outputWaitTime/kernel:0', 'outputWaitTime/bias:0'] when minimizing the loss.
EPOCHS = 1
for epoch in range(EPOCHS):
with tf.GradientTape() as ATCTape, tf.GradientTape() as WTTape:
predictions = model(xTrain,training=False)
ATCLoss = loss(yTrain[:,:,0],predictions[ATC_CODE])
WTLoss = loss(yTrain[:,:,1],predictions[WAIT_TIME])
ATCGrads = ATCTape.gradient(ATCLoss, model.trainable_variables)
WTGrads = WTTape.gradient(WTLoss,model.trainable_variables)
grads = ATCGrads + WTGrads
model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
With this code, it's okay, but both losses are combined into one, whereas I need to consider both losses separately
EPOCHS = 1
for epoch in range(EPOCHS):
with tf.GradientTape() as tape:
predictions = model(xTrain,training=False)
ATCLoss = loss(yTrain[:,:,0],predictions[ATC_CODE])
WTLoss = loss(yTrain[:,:,1],predictions[WAIT_TIME])
lossValue = ATCLoss + WTLoss
grads = tape.gradient(lossValue, model.trainable_variables)
model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
I need help to understand why I have all of those problems.
The notebook containing all the code is here: https://colab.research.google.com/drive/1b6UorAAEddNKFQCxaK1Wsuj09U645KhU?usp=sharing
The implementation begins in the part Model Creation

The reason you get None in ATCGrads and WTGrads is because two gradients corresponding loss is wrt different outputs outputATC and outputWaitTime, if
outputs value is not using to calculate the loss then there will be no gradients wrt that outputs hence you get None gradients for that output layer. That is also the reason why you get WARNING:tensorflow:Gradients do not exist for variables ['outputWaitTime/kernel:0', 'outputWaitTime/bias:0'] when minimizing the loss, because you don't have those gradients wrt each loss. If you combine losses into one then both outputs are using to calculate the loss, thus no WARNING.
So if you want do a list element wise subtraction, you could first convert None to 0. before subtraction, and you cannot using tf.math.subtract(gs, gl) because it require shapes of all inputs must match, so:
import tensorflow as tf
gs = [tf.constant([1., 2.]), tf.constant(3.), None]
gl = [tf.constant([3., 4.]), None, tf.constant(4.)]
to_zero = lambda i : 0. if i is None else i
gs = list(map(to_zero, gs))
gl = list(map(to_zero, gl))
sub = [s_i - l_i for s_i, l_i in zip(gs, gl)]
print(sub)
Outpts:
[<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2., -2.], dtype=float32)>,
<tf.Tensor: shape=(), dtype=float32, numpy=3.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=-4.0>]
Also beware the tape.gradient() will return a list or nested structure of Tensors (or IndexedSlices, or None), one for each element in sources. Returned structure is the same as the structure of sources; Add two list [1, 2] + [3, 4] in python will not give you [4, 6] like you do in numpy array, instead it will combine two list and give you [1, 2, 3, 4].

Is my Keras Convolutional Model returning any values?

For my project I'm mostly following this simple GAN tutorial except that my data is in a time series of 3 values between {-1,1}. I striped away a lot of its complexity to try to understand where the discrepancy is coming from. However, after lots of trail & error and Stack Overflow searches it's time I raise my hand and ask for help. I'm running Python 3.6 / Conda 4.8.3 in a VSCode Jupyter notebook on OSX with TensorFlow 2.0.0. My simplified discriminator does not return any errors in my notebook.
def build_discriminator():
discriminator_input = Input(shape=(4000,3), name='discriminator_input')
x = discriminator_input
x = Conv1D(32, 3, strides=1, padding="same", input_shape=(4000,3)) (x)
x = LeakyReLU()(x)
x = Dropout(0.3)(x)
x = Flatten()(x)
discriminator_output = Dense(1, activation='sigmoid')(x)
return Model(discriminator_input, discriminator_output)
#Test it with some random noise of the same shape as the training data
d = build_discriminator()
noise = tf.random.uniform(
(1,4000,3), minval=-1, maxval=1, dtype=tf.dtypes.float32
)
decision = d(noise)
Output I'm getting:
print(decision)
<tf.Tensor 'model_1/dense_6/Sigmoid:0' shape=(1, 1) dtype=float32>
I was expecting to put random noise in the untrained discriminator the same size as a training sample and at least get a value between [0,1] to test that the network is processing data.
Expected output:
<tf.Tensor [[0.014325]] shape=(1, 1) dtype=float32>
I need a bit of help interpreting this discrepancy. Does that mean my model isn't processing at all? Or am I missing something more subtle? What do I need to change so that my discriminator returns a tensor of values?

Against recommendations I spend some time removing Keras & Tensorflow from Conda and installing it with pip so that tf.__version__ correctly returned 2.2.0 in the notebook. To my surprise it worked and returned the expected result.
<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.49497133]], dtype=float32)>
Posting here in case anyone else stumbles across this question with the same problem.

Input shape for 1D convolution network in keras

I am quite new to keras and I have a problem in understanding shapes.
I wanted to create 1D Conv Keras model as follows, I don't know this is correct or not:
TIME_PERIODS = 511
num_sensors = 2
num_classes = 4
BATCH_SIZE = 400
EPOCHS = 50
model_m = Sequential()
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
model_m.add(Conv1D(100, 10, activation='relu'))
model_m.add(MaxPooling1D(3))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(GlobalAveragePooling1D())
model_m.add(Dropout(0.5))
model_m.add(Dense(num_classes, activation='softmax'))
The input data I have is 888 different panda data frame where each frame is of shape (511, 3) where 511 is numbers of signal points and 0th column is sensor1 values, 1st column is sensor2 values and 2nd column is labelled signals.
Now how I should combine all my 888 different panda data frame so I have x_train and y_train from X and Y using Sklearn train_test_split.
Also, I think the input shape I am defining for the model is wrong and I don't think I actually have TIME_PERIODS because, for 1-time point, I have 2 sensor inputs (orange, blue line) value and 1 output label (green line).
The context of the problem I am trying to solve e.g.
input: time-based 2 sensors values say for 1 AM-2 AM hour from a user, output: the range of times e.g where the user was doing activity 1, activity 2, activity X on 1:10-1:15, 1:15-1:30, 1:30-2:00, The above plot show a sample training input and output.
The problem is inspired from here but in my case, I don't have any time period, my 1-time point has 1 output label.
Update 1:
I am almost certain that my TIME_PERIODS=1 as for the prediction I will give 511 inputs and expects to get 511 output values.

Each dataframe is an independent sequence?
fileNames = get a list of filenames here, you can maybe os.listdir for that
allFrames = [pandas.read_csv(filename,... other_things...).values for filename in fileNames]
allData = np.stack(allFrames, axis=0)
inputData = allData[:,:num_sensors]
outputData = allData[:, -1:]
You can now use train test split the way you want.
Your input shape is correct.
If you want to predict the whole sequence, then you have to remove the poolings. Every convolution should use padding='same'.
And maybe you should use a Biridectional(LSTM(units, return_sequences=True)) layer somewhere to make your model stronger.
A simple model as an example. (Notice that models are totally open to creativity)
from keras.layers import *
inputs = Input((TIME_PERIODS,num_sensors)) #Should be called "time_steps" to be precise
outputs = Conv1D(any, 3, padding='same', activation = 'tanh')(inputs)
outputs = Bidirectional(LSTM(any, return_sequences=True))(outputs)
outputs = Conv1D(num_classes, activation='softmax', padding='same')(outputs)
model = keras.models.Model(inputs, outputs)

To say the least, you're in the correct path. The full solution for this would be like,
df = pd.concat([pd.read_csv(fname, index_col=<int>, header=<int>) for f filenames], ignore_index=True, axis=0)
inputs = df.loc[:,:-1]
labels = df.loc[:,0]
X_train, X_test, y_train, y_test = train_test_split(inputs, labels, test_size=<float>)
To add a bit more information, note how you are doing,
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
and not
model_m.add(Conv1D(100, 10, activation='relu', padding='SAME', input_shape=(TIME_PERIODS, num_sensors)))
So, as you're not setting padding="Same" for the convolution layers this might have the undesirable effect of input becoming smaller and smaller as you go deeper to the model. If that's what you need, that's okay. Otherwise, set `padding="SAME".
For example, without same-padding you'll get, a width around 144 when you get to the GlobalPooling layer, where if you use same-padding it would be roughly 170. It's not a major problem here, but can easily lead to negative sizes in your input for deeper layers.

Input dimension mismatch binary crossentropy Lasagne and Theano

I read all posts in the net adressing the issue where people forgot to change the target vector to a matrix, and as a problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems show and I am thankful for suggestions!
Using a convolution network setup and binary crossentropy with sigmoid activation function, I get a dimension mismatch problem, but not during the training data, only during validation / test data evaluation. For some strange reason, of of my validation set vectors get his dimension switched and I have no idea, why. Training, as mentioned above, works fine. Code follows below, thanks a lot for help (and sorry for hijacking the thread, but I saw no reason for creating a new one), most of it copied from the lasagne tutorial example.
Workarounds and new problems:
Removing "axis=1" in the valAcc definition helps, but validation accuracy remains zero and test classification always returns the same result, no matter how many nodes, layers, filters etc. I have. Even changing training set size (I have around 350 samples for each class with 48x64 grayscale images) does not change this. So something seems off
Network creation:
def build_cnn(imgSet, input_var=None):
# As a third model, we'll create a CNN of two convolution + pooling stages
# and a fully-connected hidden layer in front of the output layer.
# Input layer using shape information from training
network = lasagne.layers.InputLayer(shape=(None, \
imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]), input_var=input_var)
# This time we do not apply input dropout, as it tends to work less well
# for convolutional layers.
# Convolutional layer with 32 kernels of size 5x5. Strided and padded
# convolutions are supported as well; see the docstring.
network = lasagne.layers.Conv2DLayer(
network, num_filters=32, filter_size=(5, 5),
nonlinearity=lasagne.nonlinearities.rectify,
W=lasagne.init.GlorotUniform())
# Max-pooling layer of factor 2 in both dimensions:
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
# Another convolution with 16 5x5 kernels, and another 2x2 pooling:
network = lasagne.layers.Conv2DLayer(
network, num_filters=16, filter_size=(5, 5),
nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
# A fully-connected layer of 64 units with 25% dropout on its inputs:
network = lasagne.layers.DenseLayer(
lasagne.layers.dropout(network, p=.25),
num_units=64,
nonlinearity=lasagne.nonlinearities.rectify)
# And, finally, the 2-unit output layer with 50% dropout on its inputs:
network = lasagne.layers.DenseLayer(
lasagne.layers.dropout(network, p=.5),
num_units=1,
nonlinearity=lasagne.nonlinearities.sigmoid)
return network
Target matrices for all sets are created like this (training target vector as an example)
targetsTrain = np.vstack( (targetsTrain, [[targetClass], ]*numTr) );
...and the theano variables as such
inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])
Finally, here the two loops, training and test. The first is fine, the second throws the error, excerpts below
# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
train_err = 0
train_batches = 0
for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
inputs, targets = batch
print (inputs.shape)
print(targets.shape)
train_err += train_fn(inputs, targets)
train_batches += 1
# And a full pass over the validation data:
val_err = 0
val_acc = 0
val_batches = 0
for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
[inputs, targets] = batch
[err, acc] = val_fn(inputs, targets)
val_err += err
val_acc += acc
val_batches += 1
Erorr (excerpts)
Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']
Again, thanks for help!

so it seems the error is in the evaluation of the validation accuracy.
When you remove the "axis=1" in your calculation, the argmax goes on everything, returning only a number.
Then, broadcasting steps in and this is why you would see the same value for the whole set.
But from the error you have posted, the "T.eq" op throws the error because it has to compare a 52 x 1 with a 1 x 52 vector (matrix for theano/numpy).
So, I suggest you try to replace the line with:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
I hope this should fix the error, but I haven't tested it myself.
EDIT:
The error lies in the argmax op that is called.
Normally, the argmax is there to determine which of the output units is activated the most.
However, in your setting you only have one output neuron which means that the argmax over all output neurons will always return 0 (for first arg).
This is why you have the impression your network gives you always 0 as output.
By replacing:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
with:
binaryPrediction = valPrediction > .5
valAcc = T.mean(T.eq(binaryPrediction, targetVar.T)
you should get the desired result.
I'm just not sure, if the transpose is still necessary or not.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

What is the appropriate way to use BCE loss with ResNet outputs? - pytorch

Related

torch crossentropy loss calculation difference between 2D input and 3D input

Can't use combination of gradiants for multiple losses functions of a multi-output keras model

Is my Keras Convolutional Model returning any values?

Input shape for 1D convolution network in keras

Input dimension mismatch binary crossentropy Lasagne and Theano

Categories

Resources