x = torch.randn(1, 1, 0)
y = torch.randn(4, 1, 1)
(x+y)
tensor([], size=(4, 1, 0))
(x + y).shape
torch.Size([4, 1, 0])
Shouldn't it have been (4, 1, 1), i.e. just y?
Your x has its 3rd dimension equal to 0 - it's an empty tensor. The third dimension of x determines the third dimension of the result: torch.Size([4, 1, 0]), which is also empty.
try
x = torch.randn(1,1,1)
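With that change broadcasting behaves as you expected; a quick check:

import torch

x = torch.randn(1, 1, 1)  # no zero-sized dimension this time
y = torch.randn(4, 1, 1)
print((x + y).shape)  # torch.Size([4, 1, 1])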
It is so by design. Tensor x has one dimension of size 0.
import torch
x = torch.randn(1, 1, 0)
print(x) # tensor([], size=(1, 1, 0))
These tensors are limited, and I think the design is bad, but that is just my opinion. For instance, such tensors cannot be concatenated:
static void check_cat_no_zero_dim(TensorList tensors) {
  for (size_t i = 0; i < tensors.size(); ++i) {
    auto& t = tensors[i];
    TORCH_CHECK(t.dim() > 0,
      "zero-dimensional tensor (at position ", i, ") cannot be concatenated");
  }
}
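For illustration, a minimal Python-side sketch of what that check rejects - note it guards against 0-dimensional (scalar) tensors:

import torch

a = torch.tensor(1.0)  # 0-dimensional (scalar) tensor: a.dim() == 0
b = torch.tensor(2.0)
try:
    torch.cat([a, b])
except RuntimeError as e:
    print(e)  # zero-dimensional tensor (at position 0) cannot be concatenated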
Also, what you noticed is the inability to work with the + operator. So, just as PyTorch tensors have a check that each dimension is non-negative, it should probably actually be a > 0 check.
inline void check_size_nonnegative(IntArrayRef size) {
  for (auto x : size) {
    TORCH_CHECK(x >= 0, "Trying to create tensor with negative dimension ", x, ": ", size);
  }
}
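A sketch of the error that this check produces when given a negative size:

import torch

try:
    torch.empty(1, -1)
except RuntimeError as e:
    print(e)  # Trying to create tensor with negative dimension -1: [1, -1]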
Again this is just my point of view.
I am training a model to segment an image, predicting the degree of damage (ranging from 0: no damage, to 5: severe damage) for each pixel. I have approached it this way:
import torch
import torch.nn.functional as F

def simple_loss(pred, mask):  # regression case
    pred = torch.sigmoid(pred)
    return F.mse_loss(pred, mask, reduction='none').mean()
def structure_loss(pred, mask):  # binary case: damaged vs undamaged
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()
The binary case yields IoU > 0.6, but the regression model is inaccurate. My dataset is imbalanced (100:1), with the majority of the pixels belonging to the undamaged class. Hence, the optimization is driven towards accurate prediction of undamaged pixels.
The confusion matrix in the (1..5) region shows no correlation between the label and the predicted value.
I cannot balance the set, because the undamaged region next to the damaged area is informative to humans trained to examine the damage.
How can I modify the loss function to assign higher cost to regression errors regarding the degree of damage?
We can encode irrelevant pixels with -1. Then modify the loss function to ignore irrelevant classes this way:
from keras import backend as K

def masked_mse(mask_value):
    def f(y_true, y_pred):
        mask_true = K.cast(K.not_equal(y_true, mask_value), K.floatx())
        masked_squared_error = K.square(mask_true * (y_true - y_pred))
        masked_mse = K.sum(masked_squared_error, axis=-1) / K.sum(mask_true, axis=-1)
        return masked_mse
    f.__name__ = 'Masked MSE (mask_value={})'.format(mask_value)
    return f
y_pred = K.constant([[ 1,  1,  1,  1],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3],
                     [ 1,  1,  1,  3]])
y_true = K.constant([[ 1,  1,  1,  1],
                     [ 1,  1,  1,  1],
                     [-1,  1,  1,  1],
                     [-1, -1,  1,  1],
                     [-1, -1, -1,  1],
                     [-1, -1, -1, -1]])
true = K.eval(y_true)
pred = K.eval(y_pred)
loss = K.eval(masked_mse(-1)(y_true, y_pred))
for i in range(true.shape[0]):
    print(true[i], pred[i], loss[i], sep='\t')
# e.g. row 3: [-1. -1.  1.  1.]    [1. 1. 1. 3.]    2.0
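Since the question's training code is PyTorch rather than Keras, here is a minimal sketch of the same masking idea in PyTorch (the clamp guarding against all-masked rows is my addition):

import torch

def masked_mse(pred, target, mask_value=-1.0):
    # Ignore every position whose target equals mask_value.
    mask = (target != mask_value).float()
    masked_squared_error = mask * (target - pred) ** 2
    # Average only over the unmasked positions; clamp avoids division by zero.
    return masked_squared_error.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)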
After taking audio data from a stream of length x, the data is then convolved with an impulse response of length 256.
This gives the output vector a length of (x + 256 - 1).
When the data is then fed back into a stream of length x there are 255 samples of overshoot that then causes popping and clicking.
Is there a workaround for this? I'm not 100% sure how to merge the larger-than-original buffer back into the output without losing random samples or causing this issue.
I left out the larger irrelevant parts of the code; it all works, it's just this issue I need fixed. The code is just here to give insight into the problem.
Code:
void ConvolveEffect(int chan, void* stream, int len, void* udata)
{
    ////...A bunch of settings etc

    // Pointer to stream
    short* p = (short*)stream; // Using short to represent 16-bit ints, as the stream is in 16 bits.
    int length = len / sizeof(short);

    // Processing buffer (float)
    float* audioData[2];
    audioData[0] = new float[length / 2];
    audioData[1] = new float[length / 2];

    // Demux to L and R
    for (int i = 0; i < length; i++)
    {
        bool even = (i % 2 == 0);
        audioData[!even][(i - !even) / 2] = map(p[i], -32767, 32767, -1.0, 1.0);
    }

    ////....Convolution occurs outputting OUT
    std::vector<fftconvolver::Sample> outL = Convolve(audioData[0], IRL, length / 2, 256, 128, 256, 256);
    std::vector<fftconvolver::Sample> outR = Convolve(audioData[1], IRR, length / 2, 256, 128, 256, 256);

    // Remux (Out comes from the elided code; presumably it holds outL/outR per channel)
    for (int i = 0; i < length; i++)
    {
        bool even = (i % 2 == 0);
        p[i] = map(Out[!even][(i - !even) / 2], -1.0, 1.0, -32768, 32767);
    }
}
You remember the 255 extra samples and add their values to the 255 samples at the beginning of the next output block.
For example:
[1, 2, 1, 3] produces [2, 3, 4, 3, 2, 1]; you output [2, 3, 4, 3] and remember [2, 1].
Next block:
[3, 2, 1, 3] produces [4, 3, 4, 5, 5, 2]
you output:
    [4, 3, 4, 5]
  + [2, 1]
  --------------
    [6, 4, 4, 5]
and remember [5, 2].
This is referred to as the "overlap-add" method of convolution. It's usually used with FFT convolution in blocks.
The Wikipedia page is here, but it's not awesome: https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method
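For instance, a minimal NumPy sketch of overlap-add over a stream of fixed-size blocks (the block size and impulse response here are illustrative, not your actual values):

import numpy as np

def overlap_add(blocks, ir):
    # Convolve each block with ir, carrying the (len(ir) - 1)-sample tail
    # into the next block so nothing is dropped at block boundaries.
    tail = np.zeros(len(ir) - 1)
    for block in blocks:
        full = np.convolve(block, ir)    # length: len(block) + len(ir) - 1
        full[:len(tail)] += tail         # add the tail remembered from last time
        yield full[:len(block)]          # emit exactly one block's worth of samples
        tail = full[len(block):].copy()  # remember the overshoot for the next block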
Hi, I'm a student who just started deep learning.
For example, I have a 1-D tensor x = [1, 2]. From this, I want to make a 2D tensor y whose (i, j)-th element has the value (x[i] - x[j]), i.e. y[0, :] = [0, 1], y[1, :] = [-1, 0].
Is there a built-in function like this in the pytorch library?
Thanks.
Here you need the right tensor dims to get the expected result, which you can get using torch.unsqueeze:
x = torch.tensor([1 , 2])
y = x - x.unsqueeze(1)
y
tensor([[ 0,  1],
        [-1,  0]])
There are a few ways you could get this result, the cleanest I can think of is using broadcasting semantics.
x = torch.tensor([1, 2])
y = x.view(-1, 1) - x.view(1, -1)
which produces
y = tensor([[ 0, -1],
            [ 1,  0]])
Note: I'll try to edit this answer and remove this note if the original question is clarified.
In your question you ask for y[i, j] = x[i] - x[j], which the above code produces.
You also say that you expect y to have values
y = tensor([[ 0,  1],
            [-1,  0]])
which is actually y[i, j] = x[j] - x[i] as was posted in Dishin's answer. If you instead wanted the latter then you can use
y = x.view(1, -1) - x.view(-1, 1)
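A quick check of both variants:

import torch

x = torch.tensor([1, 2])
print(x.view(-1, 1) - x.view(1, -1))  # y[i, j] = x[i] - x[j]: [[0, -1], [1, 0]]
print(x.view(1, -1) - x.view(-1, 1))  # y[i, j] = x[j] - x[i]: [[0, 1], [-1, 0]]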
A program that finds the maximal rectangle containing only 1's in a binary matrix, using the maximal-histogram approach.
I am trying to run some tests on this code:
def maximalRectangle(self, matrix):
    if not matrix or not matrix[0]:
        return 0
    n = len(matrix[0])
    height = [0] * (n + 1)
    ans = 0
    for row in matrix:
        for i in range(n):
            height[i] = height[i] + 1 if row[i] == '1' else 0
        stack = [-1]
        for i in range(n + 1):
            while height[i] < height[stack[-1]]:
                h = height[stack.pop()]
                w = i - 1 - stack[-1]
                ans = max(ans, h * w)
            stack.append(i)
    return ans

# Driver Code
if __name__ == '__main__':
    matrix = [[0, 1, 0, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1]]
    print(maximalRectangle(matrix))
I get a TypeError: maximalRectangle() missing 1 required positional argument: 'matrix'.
Solved by removing self (the function is written like a class method, so when called directly the matrix argument binds to the self parameter) and changing the print statement to pass the rows as strings, matching the row[i] == '1' comparison:

print(maximalRectangle([
    ["1", "0", "1", "0", "0"],
    ["1", "1", "1", "1", "1"],
    ["1", "1", "1", "1", "1"],
    ["1", "0", "0", "1", "0"]]))
I am currently implementing a function to compute a custom cross-entropy loss.
The definition of the function is given in the following image.
My code is as follows:
output = output.permute(0, 2, 3, 1)
target = target.permute(0, 2, 3, 1)
batch, height, width, channel = output.size()
total_loss = 0.
for b in range(batch):  # for each batch
    o = output[b]
    t = target[b]
    loss = 0.
    for w in range(width):
        for h in range(height):  # for every pixel ([h, w]) in the image
            sid_t = t[h][w][0]
            sid_o_candi = o[h][w]
            part1 = 0.  # to store the first sigma
            part2 = 0.  # to store the second sigma
            for k in range(0, sid_t):
                p = torch.sum(sid_o_candi[k:])  # to get Pk(w, h)
                part1 += torch.log(p + 1e-12).item()
            for k in range(sid_t, intervals):
                p = torch.sum(sid_o_candi[k:])  # to get Pk(w, h)
                part2 += torch.log(1 - p + 1e-12).item()
            loss += part1 + part2
    loss /= width * height * (-1)
    total_loss += loss
total_loss /= batch
return torch.tensor(total_loss, dtype=torch.float32)
I am wondering whether any optimization can be done with this code.
I'm not sure whether sid_t = t[h][w][0] is the same for every pixel or not. If it is, you can get rid of all the for-loops, which boosts the speed of computing the loss.
Don't use .item(), because it returns a plain Python value, which loses the grad_fn track; then you can't use loss.backward() to compute the gradients.
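A quick illustration of the difference:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
loss = torch.log(x.sum())
print(loss.grad_fn)  # e.g. <LogBackward0 ...> - backward() works from here
value = loss.item()  # a plain Python float: the grad_fn track is gone
print(type(value))   # <class 'float'>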
If sid_t = t[h][w][0] is not the same everywhere, here is a modification to help you get rid of at least one for-loop:
batch, height, width, channel = output.size()
total_loss = 0.
for b in range(batch):  # for each batch
    o = output[b]
    t = target[b]
    loss = 0.
    for w in range(width):
        for h in range(height):  # for every pixel ([h, w]) in the image
            sid_t = t[h][w][0]
            sid_o_candi = o[h][w]
            # reversed cumulative sum: p[k] = sum(sid_o_candi[k:]) = Pk(w, h) for all k at once
            p = sid_o_candi.flip(dims=(0,)).cumsum(dim=0).flip(dims=(0,))
            part1 = torch.sum(torch.log(p[:sid_t] + 1e-12))               # the first sigma
            part2 = torch.sum(torch.log(1 - p[sid_t:intervals] + 1e-12))  # the second sigma
            loss += part1 + part2
    loss /= width * height * (-1)
    total_loss += loss
total_loss /= batch
return total_loss  # already a tensor that tracks grad_fn; don't re-wrap it in torch.tensor()
How it works:
x = torch.arange(10)
print(x)
x_flip = x.flip(dims=(0,))
print(x_flip)
x_inverse_cumsum = x_flip.cumsum(dim=0).flip(dims=(0,))
print(x_inverse_cumsum)
# output
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
tensor([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])
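And if sid_t does turn out to be the same for every pixel, a fully vectorized sketch (assuming sid_t is a plain int shared by all pixels and output has shape (batch, channel, height, width) with channel == intervals) could look like:

import torch

def custom_ce_loss(output, sid_t, intervals):
    # p[:, k] = sum of output[:, k:] along the channel dim, i.e. Pk(w, h) for every pixel at once
    p = output.flip(dims=(1,)).cumsum(dim=1).flip(dims=(1,))
    part1 = torch.log(p[:, :sid_t] + 1e-12).sum(dim=1)               # the first sigma
    part2 = torch.log(1 - p[:, sid_t:intervals] + 1e-12).sum(dim=1)  # the second sigma
    # mean over all pixels and the batch, with the overall minus sign
    return -(part1 + part2).mean()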
Hope it helps.