I am currently implementing a function to compute a custom cross-entropy loss.
The definition of the function is shown in the following image.
My code is as follows:
output = output.permute(0, 2, 3, 1)
target = target.permute(0, 2, 3, 1)
batch, height, width, channel = output.size()
total_loss = 0.
for b in range(batch):  # for each batch
    o = output[b]
    t = target[b]
    loss = 0.
    for w in range(width):
        for h in range(height):  # for every pixel ([h, w]) in the image
            sid_t = t[h][w][0]
            sid_o_candi = o[h][w]
            part1 = 0.  # to store the first sigma
            part2 = 0.  # to store the second sigma

            for k in range(0, sid_t):
                p = torch.sum(sid_o_candi[k:])  # to get Pk(w, h)
                part1 += torch.log(p + 1e-12).item()

            for k in range(sid_t, intervals):
                p = torch.sum(sid_o_candi[k:])  # to get Pk(w, h)
                part2 += torch.log(1 - p + 1e-12).item()

            loss += part1 + part2

    loss /= width * height * (-1)
    total_loss += loss

total_loss /= batch
return torch.tensor(total_loss, dtype=torch.float32)
I am wondering whether there is any optimization that could be done to this code.
I'm not sure whether sid_t = t[h][w][0] is the same for every pixel or not. If it is, you can get rid of all the for loops, which will greatly speed up the loss computation (see the sketch at the end of this answer).
Don't use .item(), because it returns a plain Python value and therefore loses the grad_fn tracking. You then can't call loss.backward() to compute the gradients.
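A tiny illustration (my own example) of what goes wrong:
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
kept = torch.log(x).sum()             # still a tensor, keeps its grad_fn
kept.backward()                       # works: x.grad is populated
dropped = torch.log(x).sum().item()   # plain Python float, the graph is cut here
# dropped.backward()                  # AttributeError: 'float' object has no attribute 'backward'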
If sid_t = t[h][w][0] is not the same for every pixel, here is a modification that gets rid of at least one for-loop:
batch, height, width, channel = output.size()
total_loss = 0.
for b in range(batch):  # for each batch
    o = output[b]
    t = target[b]
    loss = 0.
    for w in range(width):
        for h in range(height):  # for every pixel ([h, w]) in the image
            sid_t = t[h][w][0]
            sid_o_candi = o[h][w]

            # reversed cumulative sum gives Pk(w, h) = sum of sid_o_candi[k:] for every k at once
            sid1_cumsum = sid_o_candi[:sid_t].flip(dims=(0,)).cumsum(dim=0).flip(dims=(0,))
            part1 = torch.sum(torch.log(sid1_cumsum + 1e-12))

            sid2_cumsum = sid_o_candi[sid_t:intervals].flip(dims=(0,)).cumsum(dim=0).flip(dims=(0,))
            part2 = torch.sum(torch.log(1 - sid2_cumsum + 1e-12))

            loss += part1 + part2

    loss /= width * height * (-1)
    total_loss += loss

total_loss /= batch
return total_loss  # already a tensor, so it keeps its grad_fn for backward()
How it works:
x = torch.arange(10);
print(x)
x_flip = x.flip(dims=(0,));
print(x_flip)
x_inverse_cumsum = x_flip.cumsum(dim=0).flip(dims=(0,))
print(x_inverse_cumsum)
# output
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
tensor([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])
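And if sid_t really is constant across pixels, a fully vectorized sketch (my assumption about the tensor shapes, so adjust as needed) can drop all the Python loops by doing the reversed cumsum over the channel dimension directly:
import torch

def custom_ce_loss_vectorized(output, target, intervals):
    # assumes output is (B, C, H, W) probabilities and target is (B, 1, H, W)
    # with the same bin index sid_t for every pixel
    output = output.permute(0, 2, 3, 1)                        # (B, H, W, C)
    sid_t = int(target[0, 0, 0, 0])                            # constant bin index
    p = output.flip(dims=(3,)).cumsum(dim=3).flip(dims=(3,))   # Pk(w, h) for every k
    part1 = torch.log(p[..., :sid_t] + 1e-12).sum(dim=3)
    part2 = torch.log(1 - p[..., sid_t:intervals] + 1e-12).sum(dim=3)
    return -(part1 + part2).mean()                             # mean over batch and pixels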
Hope it helps.
I am training a model to segment an image to predict the degree of damage (ranging from 0: no damage, to 5: severe damage) for each pixel of an image. I have approached it this way:
import torch
import torch.nn.functional as F

def simple_loss(pred, mask):  # regression case
    pred = torch.sigmoid(pred)
    return F.mse_loss(pred, mask, reduction='none').mean()

def structure_loss(pred, mask):  # binary case: damaged vs undamaged
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()
The binary case yields IoU > 0.6, but the regression model is inaccurate. My dataset is imbalanced (100:1), with the majority of the pixels belonging to the undamaged class, so the optimization is driven towards accurate prediction of undamaged pixels.
The confusion matrix in the (1..5) region shows no correlation between the label and the predicted value.
I cannot balance the set, because the undamaged region next to the damaged area is informative to humans trained to examine the damage.
How can I modify the loss function to assign higher cost to regression errors regarding the degree of damage?
We can encode irrelevant pixels with -1. Then modify the loss function to ignore irrelevant classes this way:
from keras import backend as K

def masked_mse(mask_value):
    def f(y_true, y_pred):
        mask_true = K.cast(K.not_equal(y_true, mask_value), K.floatx())
        masked_squared_error = K.square(mask_true * (y_true - y_pred))
        masked_mse = K.sum(masked_squared_error, axis=-1) / K.sum(mask_true, axis=-1)
        return masked_mse
    f.__name__ = 'Masked MSE (mask_value={})'.format(mask_value)
    return f
y_pred = K.constant([[ 1,  1,  1,  1],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3]])
y_true = K.constant([[ 1,  1,  1,  1],
                     [ 1,  1,  1,  1],
                     [-1,  1,  1,  1],
                     [-1, -1,  1,  1],
                     [-1, -1, -1,  1],
                     [-1, -1, -1, -1]])
true = K.eval(y_true)
pred = K.eval(y_pred)
loss = K.eval(masked_mse(-1)(y_true, y_pred))
for i in range(true.shape[0]):
    print(true[i], pred[i], loss[i], sep='\t')
# e.g. the row for i == 3:  [-1. -1.  1.  1.]   [1. 1. 1. 3.]   2.0
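Since your model code is in PyTorch, here is a rough sketch of the same masking idea in PyTorch (my own translation, so treat it as a starting point rather than a drop-in replacement):
import torch

def masked_mse_torch(pred, target, mask_value=-1.0):
    # ignore every pixel whose label equals mask_value
    mask = (target != mask_value).float()
    sq_err = (mask * (target - pred)) ** 2
    # average only over the pixels that are not masked out
    return sq_err.sum() / mask.sum().clamp_min(1.0)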
I want a softmax that is applied over blocks (sections) of a tensor. For example, given logits, dim, and boundary,
boundary = torch.tensor([[0, 3, 4, 8, 0],
                         [1, 3, 5, 7, 9]])
# representing sections that look like:
# [[00012222_],
#  [_00112233]]
# in shape: (2, 9)
# (sections cannot be sliced)
logits = torch.rand(2, 9, 100)
result = blocky_softmax(logits, dim = 1, boundary = boundary)
# result[:, :, 0] may look like:
# [[0.33, 0.33, 0.33, 1.00, 0.25, 0.25, 0.25, 0.25, 0.0 ],
#  [0.0,  0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50]]
# The other 99 slices look similar, with each block summing to 1.
We want the softmax to be applied along dim = 1, but the sections also partition that dimension.
My current PyTorch implementation uses a for loop. It is slow and uses too much memory, and looks like this:
def blocky_softmax(logits, splits, map_inf_to=None):
    _, batch_len, _ = logits.shape
    exp_logits = logits.exp()  # [2, 9, 100]
    batch_seq_idx = torch.arange(batch_len, device=logits.device)[None, :]
    base = torch.zeros_like(logits)
    _, n_blocks = splits.shape

    for nid in range(1, n_blocks):
        start = splits[:, nid - 1, None]
        end = splits[:, nid, None]
        area = batch_seq_idx >= start
        area &= batch_seq_idx < end
        area.unsqueeze_(dim=2)
        # sum exp_logits inside this block and spread the sum back over the block
        blocky_z = (area * exp_logits).sum(dim=1, keepdim=True)
        base = base + area * blocky_z

    if map_inf_to is not None:
        good_base = base > 0
        ones = torch.ones_like(base)
        base = torch.where(good_base, base, ones)
        exp_logits = torch.where(good_base, exp_logits, ones * map_inf_to)
    return exp_logits / base
This implementation is slowed down, and its memory use inflated, by a factor of n_blocks, even though each section could be processed in parallel.
If there is no off-the-shelf function for this, should I write a CUDA/C++ extension? I hope you can help with my issue.
For further generalization, I would also like to allow discontinuous sections in boundary/sections:
sections = torch.tensor([[ 0, 0, 0, -1, 2, 3, 2, 3,  0, 3],
                         [-1, 0, 0,  1, 2, 1, 2, 1, -1, 1]])
# [[000_232303],
#  [_0012121_1]]
Thank you for reading:)
I realized that scatter_add and gather solve the problem perfectly.
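For anyone landing here later, this is roughly how the scatter_add/gather version can look (a sketch, under the assumption that sections holds a block id per position as in the generalized example above, with -1 marking positions outside every block):
import torch

def blocky_softmax_scatter(logits, sections):
    # logits: (B, L, C); sections: (B, L) integer block ids, -1 means "no block"
    exp_logits = logits.exp()
    valid = (sections >= 0).unsqueeze(-1).to(logits.dtype)            # (B, L, 1)
    idx = sections.clamp(min=0).unsqueeze(-1).expand_as(exp_logits)   # (B, L, C)
    n_blocks = int(sections.max()) + 1
    sums = torch.zeros(logits.size(0), n_blocks, logits.size(2),
                       dtype=logits.dtype, device=logits.device)
    sums.scatter_add_(1, idx, exp_logits * valid)   # per-block sums of exp(logits)
    denom = sums.gather(1, idx).clamp_min(1e-12)    # spread each block sum back out
    return (exp_logits * valid) / denom             # masked positions stay at 0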
I am building a CNN where the input is a grayscale image (256x256x1) and I want to add a Fourier transform layer which should output a shape (256x256x2), with the 2 channels for real and imaginary. I found tf.signal.fft2d on https://www.tensorflow.org/api_docs/python/tf/signal/fft2d . Unfortunately it is hard to find any example or explanation of how to use it concretely... I have tried:
X_input = Input(input_shape,)
X_input_fft = Lambda(lambda v: tf.cast(tf.compat.v1.spectral.rfft2d(v), dtype=tf.float32))(X_input)
l1Conv1 = Conv2D(filters=16, kernel_size=(5, 5), strides=1, padding='same',
                 data_format='channels_last',
                 kernel_initializer=initializers.he_normal(seed=None),
                 bias_initializer='zeros')(X_input_fft)
but honestly I don't know what I am doing ...
Also, for the last layer, I would like to do an inverse fft, something like:
myLastLayer= Lambda(lambda v: tf.cast(tf.compat.v1.spectral.irfft2d(tf.cast(v, dtype=tf.complex64)),dtype=tf.float32))(myBeforeLastLayer)
I'm sorry that this answer comes 2 years later, but I think it will help a lot of people dealing with TensorFlow's fft2d.
The first thing you should know is that the documentation says TensorFlow performs fft2d over "the inner-most 2 dimensions of input", which simply means the 2-D FFT is taken over the last two dimensions. So you have to permute the input tensor to work with that.
A function that does what you need would be this one:
def fft2d_function(x, dtype="complex64"):
    x = tf.transpose(x, perm=[2, 0, 1])
    x = tf.cast(x, dtype)
    x_f = tf.signal.fft2d(x)
    x_f = tf.transpose(x_f, perm=[1, 2, 0])
    real_x_f, imag_x_f = tf.math.real(x_f), tf.math.imag(x_f)
    return real_x_f, imag_x_f
or, if you are sure that the input is a real signal, you can use rfft2d instead:
def rfft2d_function(x):
    x = tf.transpose(x, perm=[2, 0, 1])
    x_f = tf.signal.rfft2d(x)
    x_f = tf.transpose(x_f, perm=[1, 2, 0])
    real_x_f, imag_x_f = tf.math.real(x_f), tf.math.imag(x_f)
    return real_x_f, imag_x_f
And if you want to perform the inverse of these functions, it would be like this:
def ifft2d_function(x_r_i_tuple):
    real_x_f, imag_x_f = x_r_i_tuple
    x_f = tf.complex(real_x_f, imag_x_f)
    x_f = tf.transpose(x_f, perm=[2, 0, 1])
    x_hat = tf.signal.ifft2d(x_f)
    x_hat = tf.transpose(x_hat, perm=[1, 2, 0])
    return x_hat

def irfft2d_function(x_r_i_tuple):
    real_x_f, imag_x_f = x_r_i_tuple
    x_f = tf.complex(real_x_f, imag_x_f)
    x_f = tf.transpose(x_f, perm=[2, 0, 1])
    x_hat = tf.signal.irfft2d(x_f)
    x_hat = tf.transpose(x_hat, perm=[1, 2, 0])
    return x_hat
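As a rough sketch (my own addition, not part of the answer above), the same idea can be wired into the model from the question with a Lambda layer. Note the extra batch axis, so the permutation becomes [0, 3, 1, 2] instead of [2, 0, 1]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda

def batched_fft2d(v):
    v = tf.transpose(v, perm=[0, 3, 1, 2])            # (B, C, H, W)
    v_f = tf.signal.fft2d(tf.cast(v, tf.complex64))   # 2-D FFT over the last two dims
    v_f = tf.transpose(v_f, perm=[0, 2, 3, 1])        # back to channels-last
    return tf.concat([tf.math.real(v_f), tf.math.imag(v_f)], axis=-1)

X_input = Input((256, 256, 1))
X_fft = Lambda(batched_fft2d)(X_input)                # shape (None, 256, 256, 2)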
Finally, an important thing in Fourier analysis is the fftshift. TensorFlow also has one:
fourier_x = tf.signal.fftshift(fourier_x)
I hope this answer helps someone dealing with the Fourier transform in TensorFlow.
Program that finds the maximal rectangle containing only 1's in a binary matrix, using the maximal-histogram approach.
I am trying to run some tests on this code:
def maximalRectangle(self, matrix):
    if not matrix or not matrix[0]:
        return 0
    n = len(matrix[0])
    height = [0] * (n + 1)
    ans = 0
    for row in matrix:
        for i in range(n):
            height[i] = height[i] + 1 if row[i] == '1' else 0
        stack = [-1]
        for i in range(n + 1):
            while height[i] < height[stack[-1]]:
                h = height[stack.pop()]
                w = i - 1 - stack[-1]
                ans = max(ans, h * w)
            stack.append(i)
    return ans
# Driver Code
if __name__ == '__main__':
    matrix = [[0, 1, 0, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1]]
    print(maximalRectangle(matrix))
I get a TypeError: maximalRectangle() missing 1 required positional argument: 'matrix' error.
Solved by removing self and changing the print statement to:
print(maximalRectangle([
    ["1", "0", "1", "0", "0"],
    ["1", "1", "1", "1", "1"],
    ["1", "1", "1", "1", "1"],
    ["1", "0", "0", "1", "0"]]))
I was following the sklearn documentation and was able to figure out MinMaxScaler(),
but what does sklearn.preprocessing.normalize do? Can anyone explain it to me with a simple example? Thanks in advance.
The Normalizer processes each row and rescales it to unit norm. With the default L2 norm, the sum of the squared values in each row will be equal to 1.
So,
from sklearn.preprocessing import Normalizer

X = [[4, 1, 2, 2]]
transformer = Normalizer().fit(X)
# Returns
Normalizer(copy=True, norm='l2')
# Then when you transform:
transformer.transform(X)
# Returns
array([[0.8, 0.2, 0.4, 0.4]])
To verify this, you can check that the sum of squares is equal to one:
0.8^2 + 0.2^2 + 0.4^2 + 0.4^2 = 1
The MinMaxScaler uses the max and min of a column to scale data between 0 and 1 with the following formula:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min, max = feature_range
Taking the same example (treating the four values as a single feature column):
# feature_range = 0, 1 if you want to scale it between 0 and 1
X_std = [1, 0, 0.333, 0.333]
X_scaled = X_std * (1 - 0) + 0
# So X_scaled = X_std for this range
So the MinMax-scaled result is X_scaled = [1, 0, 0.333, 0.333].
Taking another example, you can check the maths:
from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit(data))
# MinMaxScaler(copy=True, feature_range=(0, 1))
print(scaler.data_max_)
# [ 1. 18.]
print(scaler.transform(data))
# [[0.   0.  ]
#  [0.25 0.25]
#  [0.5  0.5 ]
#  [1.   1.  ]]
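To close the loop on the original question: sklearn.preprocessing.normalize is the functional form of the same row-wise scaling that the Normalizer does, so this quick check should print the same values:
from sklearn.preprocessing import normalize

print(normalize([[4, 1, 2, 2]], norm='l2'))
# [[0.8 0.2 0.4 0.4]]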