requires_grad = False seems not working in my case

requires_grad = False seems not working in my case - pytorch

I received a Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient error with tensor W.
W has the size of (10,10) and grad_fn=<DivBackward0>. The error happens at the second line
def muy(self, x):
V = torch.tensor(self.W - self.lambda_ * torch.eye(self.ENCODING_DIM), requires_grad=False)
return -0.5 * V.inverse().mm(self.b + self.lambda_ * x[:, None])
Other vars, values taken at the time of the error
self.lambda_: 1.0
self.ENCODING_DIM: 10
self.b: torch.Size([10, 1]), requires_grad=True
x: torch.Size([3, 1, 10]), grad_fn=<MulBackward0>
How could I set the result of muy as just an ingredient of the leaf node, so grad through V is required?
I tried this monstrosity, to no avail
def muy(self, x):
V_inv = np.linalg.inv(self.V.detach().numpy())
x_numpy = x[:, None].detach().numpy()
temp= -0.5 * np.matmul(V_inv, self.b.detach().numpy() + self.lambda_ * x_numpy)
return temp
Why I cared about this JIT:
I wanted to use tensorboard to visualize my model, if I understand the error messages right, the visualzing models use Tracer
EDIT
This still gives the same error, W or W.detach()
with torch.no_grad():
V = self.W - self.lambda_ * torch.eye(self.ENCODING_DIM)
return -0.5 * V.inverse().mm(self.b + self.lambda_ * x[:, None])

V = torch.tensor(self.W - self.lambda_ * torch.eye(self.ENCODING_DIM), requires_grad=False)
What you are trying to do here doesn't make much sense. torch.tensor(value) can only be created if the value is scalar (e.g. Python's 5), while you are trying to fit torch.Tensor there.
What you should do is simply this:
V = self.W - self.lambda_ * torch.eye(self.ENCODING_DIM)
If you want to detach self.W for some reason you can do this:
V = self.W.detach() - self.lambda_ * torch.eye(self.ENCODING_DIM)
(this will make a copy of self.W with requires_grad set to False).
You could also use torch.no_grad() context manager so this operation will not be recorded on the graph which will have the same effect on the graph (but only in this case, not in general and you won't make copy of self.W so it is advised to do that):
with torch.no_grad():
V = self.W - self.lambda_ * torch.eye(self.ENCODING_DIM)
Code to reproduce
Can't reproduce this exact issue based on your code description, see below:
import torch
lambda_ = 1.0
W = torch.randn(10, 10, requires_grad=True)
ENCODING_DIM = 10
b = torch.randn(10, 1, requires_grad=True)
x = torch.randn(3, 1, 10, requires_grad=True)
with torch.no_grad():
V = W - lambda_ * torch.eye(ENCODING_DIM)
result = -0.5 * V.inverse().mm(b + lambda_ * x[:, None])
print(result)
This code gives the following (different!) error:
Traceback (most recent call last): File "foo.py", line 13, in
result = -0.5 * V.inverse().mm(b + lambda_ * x[:, None]) RuntimeError: matrices expected, got 2D, 4D tensors at
/pytorch/aten/src/TH/generic/THTensorMath.cpp:36

I think the problem is about shape of matrixes.
in this line return -0.5 * V.inverse().mm(self.b + self.lambda_ * x[:, None]),
V.inverse()shape is [10, 10], b shape is [10, 1] and x[:, None] shape is[3, 1, 1, 10]. These shapes are not good for calculating what you want. especially that of x[:, None].

Related

"An operation has `None` for gradient" problem with custom loss function

I seem to have a specific code because I couldn't find what I want on the web, so here's my problem:
I have coded a NN that takes an array of a specific length and should give me a singe value as output:
model = tf.keras.Sequential()
model.add(layers.Embedding(input_dim=int(input_len_array), output_dim=8 * int(input_len_array)))
model.add(layers.GRU(32 * int(input_len_array), return_sequences=True))
# Last layer...
model.add(layers.Dense(1, activation='tanh'))
After that I create a Custom_loss function:
custom_loss(x_, y_):
sess = tf.Session()
Sortino = self.__backtest(x_, y_)
def loss(y_true, y_pred):
print('Sortino: ', Sortino)
# The Optimizer will MAXIMIZE the Sortino so we compute -Sortino
return tf.convert_to_tensor(-Sortino)
return loss
After that I compile my model and I give it the whole batch of values in the tensor X and Y:
self.model.compile(optimizer='adam', loss=custom_loss(x, y))
Inside the Custom loss I call the function self.__backtest which is defined as below:
def __backtest(self, x_: tf.Tensor, y_r: tf.Tensor, timesteps=40):
my_list = []
sess = tf.Session()
# Defining the Encoder
# enc = OneHotEncoder(handle_unknown='ignore')
# X = [[-1, 0], [0, 1], [1, 2]]
# enc.fit(X)
# sess.run(x_)[i, :] is <class 'numpy.ndarray'>
print('in backest: int(x_.get_shape())', x_.get_shape())
for i in range(int(x_.get_shape()[0])):
output_of_nn = self.model.predict(sess.run(x_)[i, :] / np.linalg.norm(sess.run(x_)[i, :]))
# categorical_output = tf.keras.utils.to_categorical(output_of_nn)
my_list.append(scaled_output * sess.run(y_r)[i])
if i < 10:
print('First 10 scaled output: ', scaled_output)
if i > 0:
capital_evolution.append(capital_evolution[-1] * (my_list[-1] + 1))
my_array = np.array(my_list)
if len(my_array) < 10:
return -10
try:
Sortino = my_array.mean() / my_array.std()
except:
Sortino = -10
return Sortino
The computer is'nt able to run the code and gives me this error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I would be more that grateful if someone could give the solution!! MANY THANKS!!

How to properly implement data reorganization using PyTorch?

It's going to be a long post, sorry in advance...
I'm working on a denoising algorithm and my goal is to:
Use PyTorch to design / train the model
Convert the PyTorch model into a CoreML model
The denoising algorithm consists in the following 3 parts:
A "down-sampling" + noise level map
A regular convnet
An "up-sampling"
The first part is quite simple in its idea, but not so easy to explain. Given for instance an input color image and a input value "sigma" that represents the standard deviation of the image noise.
The "down-sampling" part is in fact a space-to-depth. In short, for a given channel and for a subset of 2x2 pixels, the space-to-depth creates a single pixel composed of 4 channels. The number of channels is multiplied by 4 while the height and width are divided by 2. The data is simply reorganized.
The noise level map consists in creating 3 channels containing the standard deviation value so that the convnet knows how to properly denoise the input image.
This will be maybe more clear with some code:
def downsample_and_noise_map(input, sigma):
# Input tensor size (batch, channels, height, width)
in_n, in_c, in_h, in_w = input.size()
# Output tensor size
out_h = in_h // 2
out_w = in_w // 2
sigma_c = in_c # nb of channels of the standard deviation tensor
image_c = in_c * 4 # nb of channels of the image tensor
# Standard deviation tensor
output_sigma = sigma.view(1, 1, 1, 1).repeat(in_n, sigma_c, out_h, out_w)
# Image tensor
output_image = torch.zeros((in_n, image_c, out_h, out_w))
output_image[:, 0::4, :, :] = input[:, :, 0::2, 0::2]
output_image[:, 1::4, :, :] = input[:, :, 0::2, 1::2]
output_image[:, 2::4, :, :] = input[:, :, 1::2, 0::2]
output_image[:, 3::4, :, :] = input[:, :, 1::2, 1::2]
# Concatenate standard deviation and image tensors
return torch.cat((output_sigma, output_image), dim=1)
This function is then called as the first step in the model's forward function:
def forward(self, x, sigma):
x = downsample_and_noise_map(x, sigma)
x = self.convnet(x)
x = upsample(x)
return x
Let's consider an input tensor of size 1x3x100x100 (PyTorch standard: batch, channels, height, width) and a sigma value of 0.1. The output tensor has the following properties:
Tensor's shape is 1x15x50x50
Tensor's values for channels 0, 1 and 2 are all equal to sigma = 0.1
Tensor's values for channels 3, 4, 5, 6 are composed of the input image values of channel 0
Tensor's values for channels 7, 8, 9, 10 are composed of the input image values of channel 1
Tensor's values for channels 11, 12, 13, 14 are composed of the input image values of channel 2
If this code is not clear enough, I can post an even more naive version.
The up-sampling part is the reciprocal function of the downsampling one.
I was able to use this function for training and testing in PyTorch.
Then, I tried to convert the model to CoreML with ONNX as an intermediate step.
The conversion to ONNX generated "TracerWarning". Conversion from ONNX to CoreML failed (TypeError: 1.0 has type numpy.float64, but expected one of: int, long). The problem came from the down-sampling + noise level map (and from up-sampling too).
When I removed the down-sampling + noise level map and up-sampling layers, I was able to convert to ONNX and to CoreML very easily since only a simple convnet remained. This means I have a solution to my problem: implement these 2 layers using 2 shaders on the mobile side. But I'm not satisfied with this solution as I want my model to contain all layers ^^
Before considering writing a post here, I crawled Internet to find an answer and I was able to write a better version of the previous function using reshape and permute. This version removed all ONNX warning, but the CoreML conversion still failed...
def downsample_and_noise_map(input, sigma):
# Input image size
in_n, in_c, in_h, in_w = input.size()
# Output tensor size
out_n = in_n
out_h = in_h // 2
out_w = in_w // 2
# Create standard deviation tensor
output_sigma = sigma.view(out_n, 1, 1, 1).repeat(out_n, in_c, out_h, out_w)
# Split RGB channels
channels_rgb = torch.split(input, 1, dim=1)
# Reshape (space-to-depth) each image channel
channels_reshaped = []
for channel in channels_rgb:
channel = channel.reshape(1, out_h, 2, out_w, 2)
channel = channel.permute(2, 4, 0, 1, 3)
channel = channel.reshape(1, 4, out_h, out_w)
channels_reshaped.append(channel)
# Concatenate all reshaped image channels together
output_image = torch.cat(channels_reshaped, dim=1)
# Concatenate standard deviation and image tensors
output = torch.cat([output_sigma, output_image], dim=1)
return output
So here are (some of) my questions:
What is the preferred PyTorch way to implement a function such as downsample_and_noise_map function within a model?
Same question but when the conversion to ONNX and then to CoreML is part of the equation?
Is the PyTorch -> ONNX -> CoreML still best path to deploy the model for iOS production?
Thanks for your help (and your patience) ^^

Disclaimer I'm not familiar with CoreML or deploying to iOS but I do have experience deploying PyTorch models in TensorRT and OpenVINO via ONNX.
The main issues I've faced when deploying to other frameworks is that operations like slicing and repeating tensors tend to have limited support in other frameworks. Often we can construct equivalent conv or transpose-conv operations which achieve the desired behavior.
In order to ensure we don't export the logic used to construct the conv weights I've separated the weight initialization from the application of the weights. This makes the ONNX export much more straightforward since all it sees is some constant tensors being applied.
class DownsampleAndNoiseMap():
def __init__(self):
self.initialized = False
self.weight = None
self.zeros = None
def init_weights(self, input):
with torch.no_grad():
in_n, in_c, in_h, in_w = input.size()
out_h = int(in_h // 2)
out_w = int(in_w // 2)
sigma_c = in_c
image_c = in_c * 4
# conv weights used for downsampling
self.weight = torch.zeros(image_c, in_c, 2, 2).to(input)
for c in range(in_c):
self.weight[4 * c, c, 0, 0] = 1
self.weight[4 * c + 1, c, 0, 1] = 1
self.weight[4 * c + 2, c, 1, 0] = 1
self.weight[4 * c + 3, c, 1, 1] = 1
# zeros used to replace repeat
self.zeros = torch.zeros(in_n, sigma_c, out_h, out_w).to(input)
self.initialized = True
def __call__(self, input, sigma):
assert self.initialized
output_sigma = self.zeros + sigma
output_image = torch.nn.functional.conv2d(input, self.weight, stride=2)
return torch.cat((output_sigma, output_image), dim=1)
class Upsample():
def __init__(self):
self.initialized = False
self.weight = None
def init_weights(self, input):
with torch.no_grad():
in_n, in_c, in_h, in_w = input.size()
image_c = in_c * 4
self.weight = torch.zeros(in_c + image_c, in_c, 2, 2).to(input)
for c in range(in_c):
self.weight[in_c + 4 * c, c, 0, 0] = 1
self.weight[in_c + 4 * c + 1, c, 0, 1] = 1
self.weight[in_c + 4 * c + 2, c, 1, 0] = 1
self.weight[in_c + 4 * c + 3, c, 1, 1] = 1
self.initialized = True
def __call__(self, input):
assert self.initialized
return torch.nn.functional.conv_transpose2d(input, self.weight, stride=2)
I made the assumption that upsample was the reciprocal of downsample in the sense that x == upsample(downsample_and_noise_map(x, sigma)) (correct me if I'm wrong in this assumption). I also verified that my version of downsample agrees with yours.
# consistency checking code
x = torch.randn(1, 3, 100, 100)
sigma = torch.randn(1)
# OP downsampling
y1 = downsample_and_noise_map(x, sigma)
ds = DownsampleAndNoiseMap()
ds.init_weights(x)
y2 = ds(x, sigma)
print('downsample diff:', torch.sum(torch.abs(y1 - y2)).item())
us = Upsample()
us.init_weights(x)
x_recov = us(ds(x, sigma))
print('recovery error:', torch.sum(torch.abs(x - x_recov)).item())
which results in
downsample diff: 0.0
recovery error: 0.0
Exporting to ONNX
When exporting we need to invoke init_weights for the new classes before using torch.onnx.export. For example
class Model(torch.nn.Module):
def __init__(self):
super().__init__()
self.downsample = DownsampleAndNoiseMap()
self.upsample = Upsample()
self.convnet = lambda x: x # placeholder
def init_weights(self, x):
self.downsample.init_weights(x)
self.upsample.init_weights(x)
def forward(self, x, sigma):
x = self.downsample(x, sigma)
x = self.convnet(x)
x = self.upsample(x)
return x
x = torch.randn(1, 3, 100, 100)
sigma = torch.randn(1)
model = Model()
# ... load state dict here
model.init_weights(x)
torch.onnx.export(model, (x, sigma), 'deploy.onnx', verbose=True, input_names=["input", "sigma"], output_names=["output"])
which gives the ONNX graph
graph(%input : Float(1, 3, 100, 100)
%sigma : Float(1)) {
%2 : Float(1, 3, 50, 50) = onnx::Constant[value=<Tensor>](), scope: Model
%3 : Float(1, 3, 50, 50) = onnx::Add(%2, %sigma), scope: Model
%4 : Float(12, 3, 2, 2) = onnx::Constant[value=<Tensor>](), scope: Model
%5 : Float(1, 12, 50, 50) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%input, %4), scope: Model
%6 : Float(1, 15, 50, 50) = onnx::Concat[axis=1](%3, %5), scope: Model
%7 : Float(15, 3, 2, 2) = onnx::Constant[value=<Tensor>](), scope: Model
%output : Float(1, 3, 100, 100) = onnx::ConvTranspose[dilations=[1, 1], group=1, kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%6, %7), scope: Model
return (%output);
}
As for the last question about the recommended way to deploy on iOS I can't answer that since I don't have experience in that area.

.grad() returns None in pytorch

I am trying to write a simple script for parameter estimation (where parameters are weights here). I am facing problem when .grad() returns None. I have gone through this and this link also and understood the concept both theoretically and practically. For me following script should work but unfortunately, it is not working.
My 1st attempt: Following script is my first attempt
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
loss_1 = 0
for j in range(x_train.size(0)):
input = x_train[j:j+1]
target = y_train[j:j+1]
input = input.to(device,non_blocking=True)
target = target.to(device,non_blocking=True)
x_dt = gamma*input[0][0] + \
alpha_xy*input[0][0]*input[0][2] + \
alpha1*input[0][0]
y0_dt = beta_y*input[0][0] + \
alpha2*input[0][1]
y_dt = alpha0*input[0][1] + \
alpha_y*input[0][2] + \
alpha3*input[0][0]*input[0][2]
pred = torch.tensor([[x_dt],
[y0_dt],
[y_dt]],device=device
)
loss = (pred - target).pow(2).sum()
loss_1 += loss
loss.backward()
print(pred.grad, x_dt.grad, gamma.grad)
Above code throws an error message
element 0 of tensors does not require grad and does not have a grad_fn
at line loss.backward()
My Attempt 2: Improvement in 1st attempt is as follows:
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
loss_1 = 0
for j in range(x_train.size(0)):
input = x_train[j:j+1]
target = y_train[j:j+1]
input = input.to(device,non_blocking=True)
target = target.to(device,non_blocking=True)
x_dt = gamma*input[0][0] + \
alpha_xy*input[0][0]*input[0][2] + \
alpha1*input[0][0]
y0_dt = beta_y*input[0][0] + \
alpha2*input[0][1]
y_dt = alpha0*input[0][1] + \
alpha_y*input[0][2] + \
alpha3*input[0][0]*input[0][2]
pred = torch.tensor([[x_dt],
[y0_dt],
[y_dt]],device=device,
dtype=torch.float,
requires_grad=True)
loss = (pred - target).pow(2).sum()
loss_1 += loss
loss.backward()
print(pred.grad, x_dt.grad, gamma.grad)
# with torch.no_grad():
# gamma -= leraning_rate * gamma.grad
Now the script is working but except pred.gred other two return None.
I want to update all the parameters after computing loss.backward() and update them but it is not happening due to None. Can anyone suggest me how to improve this script? Thanks.

You're breaking the computation graph by declaring a new tensor for pred. Instead you can use torch.stack. Also, x_dt and pred are non-leaf tensors so the gradients aren't retained by default. You can override this behavior by using .retain_grad().
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
loss_1 = 0
for j in range(x_train.size(0)):
input = x_train[j:j+1]
target = y_train[j:j+1]
input = input.to(device,non_blocking=True)
target = target.to(device,non_blocking=True)
x_dt = gamma*input[0][0] + \
alpha_xy*input[0][0]*input[0][2] + \
alpha1*input[0][0]
# retain the gradient for non-leaf tensors
x_dt.retain_grad()
y0_dt = beta_y*input[0][0] + \
alpha2*input[0][1]
y_dt = alpha0*input[0][1] + \
alpha_y*input[0][2] + \
alpha3*input[0][0]*input[0][2]
# use stack instead of declaring a new tensor
pred = torch.stack([x_dt, y0_dt, y_dt], dim=0).unsqueeze(1)
# pred is also a non-leaf tensor so we need to tell pytorch to retain its grad
pred.retain_grad()
loss = (pred - target).pow(2).sum()
loss_1 += loss
loss.backward()
print(pred.grad, x_dt.grad, gamma.grad)
with torch.no_grad():
gamma -= learning_rate * gamma.grad
Closed form solution
Assuming you want to optimize for the parameters defined at the top of the function gamma, alpha_xy, beta_y, etc... Then what you have here is an example of ordinary least squares. See least squares for a slightly friendlier introduction to the topic. Take a look at the components of pred and you'll notice that x_dt, y0_dt, and y_dt are actually independent of each other with respect to the parameters (in this case it's obvious because they each use totally different parameters). This makes the problem much easier because it means we can actually optimize the terms (x_dt - target[0])**2, (y0_dt - target[1])**2 and (y_dt - target[2])**2 separately!
Without getting into the details the solution (without back-propagation or gradient descent) ends up being
# supposing x_train is [N,3] and y_train is [N,3]
x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
y1 = y_train[:, 0].unsqueeze(1)
# avoid inverses using solve to get p1 = inv(x1 . x1^T) . x1 . y1
p1, _ = torch.solve(x1 # y1, x1 # x1.transpose(1, 0))
# gamma and alpha1 are redundant. As long as gamma + alpha1 = p1[0] we get the same optimal value for loss
gamma = p1[0] / 2
alpha_xy = p1[1]
alpha1 = p1[0] / 2
x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
y2 = y_train[:, 1].unsqueeze(1)
p2, _ = torch.solve(x2 # y2, x2 # x2.transpose(1, 0))
beta_y = p2[0]
alpha2 = p2[1]
x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
y3 = y_train[:, 2].unsqueeze(1)
p3, _ = torch.solve(x3 # y3, x3 # x3.transpose(1, 0))
alpha0 = p3[0]
alpha_y = p3[1]
alpha3 = p3[2]
loss_1 = torch.sum((x1.transpose(1, 0) # p1 - y1)**2 + (x2.transpose(1, 0) # p2 - y2)**2 + (x3.transpose(1, 0) # p3 - y3)**2)
mse = loss_1 / x_train.size(0)
To test this code is working I generated some fake data which I knew the underlying model coefficients (there's some noise added so the final result won't exactly match the expected).
def gen_fake_data(samples=50000):
x_train = torch.randn(samples, 3)
# define fake data with known minimal solutions
x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
y1 = x1.transpose(1, 0) # torch.tensor([[1.0], [2.0]]) # gamma + alpha1 = 1.0
y2 = x2.transpose(1, 0) # torch.tensor([[3.0], [4.0]])
y3 = x3.transpose(1, 0) # torch.tensor([[5.0], [6.0], [7.0]])
y_train = torch.cat((y1, y2, y3), dim=1) + 0.1 * torch.randn(samples, 3)
return x_train, y_train
x_train, y_train = gen_fake_data()
# optimization code from above
...
print('loss_1:', loss_1.item())
print('MSE:', mse.item())
print('Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0')
print('Actual', gamma.item(), alpha_xy.item(), alpha1.item(), beta_y.item(), alpha2.item(), alpha0.item(), alpha_y.item(), alpha3.item())
which results in
loss_1: 1491.731201171875
MSE: 0.029834624379873276
Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0
Actual 0.50002 2.0011 0.50002 3.0009 3.9997 5.0000 6.0002 6.9994

What means T.alloc

I'm learning keras and I'm using a code about music generation to learn. I have checked the code and I have showed this, I think this is theano:
start_note_values = T.alloc(np.array(0, dtype=np.int8), 5 * 128, 1, 2)
The function is:
def y_labels(y):
start_note_values = T.alloc(np.array(0, dtype=np.int8), BATCH_SIZE * NUM_TIMESTEPS, 1, OUTPUT_LAYER)
correct_choices = y[:, :, :-1, :].reshape((BATCH_SIZE * NUM_TIMESTEPS, NUM_NOTES - 1, OUTPUT_LAYER))
features = T.concatenate([start_note_values, correct_choices], axis=1)
return features.reshape((BATCH_SIZE, NUM_TIMESTEPS, NUM_NOTES, OUTPUT_LAYER)).transpose((0, 2, 1, 3)).reshape((BATCH_SIZE * NUM_NOTES, NUM_TIMESTEPS, OUTPUT_LAYER))
get_labels_shape = lambda shape: [BATCH_SIZE * NUM_NOTES, NUM_TIMESTEPS, OUTPUT_LAYER]
previous_notes = Sequential([
Lambda(y_labels, output_shape=get_labels_shape, batch_input_shape=(BATCH_SIZE, NUM_TIMESTEPS, NUM_NOTES, OUTPUT_LAYER), name='y_labels')
])
I don't understand what it means, can someone explain me? The input has this format:
X = (440, 128, 300)
There is a way to right this in keras?
Thank you guys!

InvalidArgumentError logits and labels must be same size: logits_size=[3215,25] labels_size=[10,25]

I was having quite a few errors (OOM, shape problems, etc) which I had managed to fix somehow.
But I'm unable to get my head around this error. I have searched quite a bit and I have also tried the sparse cross entropy with logits method in tensorflow and the tf.squeeze function also but that also didn't help me in resolving this error. Here is the link of the code (it's a github gist with the entire stacktrace and errors).
Code Link
Here is the link for the data set(It's around 500 Mb)
Dataset Link
Here is the Code (just in Case):
from PIL import Image
import numpy as np
import glob
from numpy import array
import pandas as pd
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
import h5py
import tensorflow as tf
def loading_saving_image_as_grayscale_train(img):
##combined_path='M:/PycharmProjects/AI+DL+CP/test_img'+img
loading=Image.open(img)
loading=loading.resize((28,28),Image.ANTIALIAS)
loading=loading.convert('L')
#loading.show()
conversion_to_array=np.asarray(loading,dtype=float)
train_data.append(conversion_to_array)
def loading_saving_image_as_grayscale_test(img):
#combined_path = 'M:/PycharmProjects/AI+DL+CP/train_img/' + img
#print(combined_path)
loading=Image.open(img,'r')
loading=loading.resize((28,28),Image.ANTIALIAS)
loading=loading.convert('L')
conversion_to_array=np.asarray(loading,dtype=float)
test_data.append(conversion_to_array)
import os
import requests, zipfile, io
import pandas as pd
#url = requests.get('https://he-s3.s3.amazonaws.com/media/hackathon/deep-learning-challenge-1/identify-the-objects/a0409a00-8-dataset_dp.zip')
#data = zipfile.ZipFile(io.BytesIO(url.content))
#data.extractall()
#os.listdir()
dataframe1=pd.read_csv('test.csv')
dataframe1.index=dataframe1.index+1
only_index=dataframe['image_id']
test_data=[]
train_data=[]
train=glob.glob('train_img/*.png')
test=glob.glob('test_img/*.png')
#other=loading_saving_image_as_grayscale('M:/PycharmProjects/AI+DL+CP/test_img/test_1000b.png')
#print(Image.open('M:/PycharmProjects/AI+DL+CP/test_img/test_1000b.png'))
#print(test)
#loading_sample=Image.open('M:/PycharmProjects/AI+DL+CP/test_img/test_1000b.png')
#loading_sample.show()
#print(train)
#print(test)
for data in train:
#print(data)
loading_saving_image_as_grayscale_train(data)
for item in test:
#print(item)
loading_saving_image_as_grayscale_test(item)
#print(train_data)
#print(test_data)
'''with Image.fromarray(train_data[1]) as img:
width,height=img.size
print(width,height)
'''
def OneHot(label,n_classes):
label=np.array(label).reshape(-1)
label=np.eye(n_classes)[label]
return label
dataframe=pd.read_csv('train.csv')
train_data=np.asarray(train_data)
test_data=np.asarray(test_data)
uni=dataframe['label']
dataframe1=pd.read_csv('test.csv')
dataframe1.index=dataframe1.index+1
only_index=dataframe['image_id']
label=LabelEncoder()
integer_encoding=label.fit_transform(uni)
#del uni
#del dataframe
#print(integer_encoding)
binary=OneHotEncoder(sparse=False)
integer_encoding=integer_encoding.reshape(len(integer_encoding),1)
onehot=binary.fit_transform(integer_encoding)
train_data=np.reshape(train_data,[-1,28,28,1])
test_data=np.reshape(test_data,[-1,28,28,1])
#onehot=np.reshape(onehot,[-1,10])
train_data=np.transpose(train_data,(0,2,1,3))
test_data=np.transpose(test_data,(0,2,1,3))
train_data=train_data.astype(np.float32)
test_data=test_data.astype(np.float32)
print(train_data.shape,test_data.shape,onehot.shape)
graph = tf.Graph()
with graph.as_default():
# placeholders for input data batch_size x 32 x 32 x 3 and labels batch_size x 10
data_placeholder = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
label_placeholder = tf.placeholder(tf.int32, shape=[None, 25])
# defining decaying learning rate
global_step = tf.Variable(0)
decay_rate = tf.train.exponential_decay(1e-4, global_step=global_step, decay_steps=10000, decay_rate=0.97)
layer1_weights = tf.Variable(tf.truncated_normal([3, 3, 1, 64],stddev=0.1))
layer1_biases = tf.Variable(tf.constant(0.1, shape=[64]))
layer2_weights = tf.Variable(tf.truncated_normal([3, 3, 64,32],stddev=0.1))
layer2_biases = tf.Variable(tf.constant(0.1,shape=[32]))
layer3_weights = tf.Variable(tf.truncated_normal([2, 2, 32, 20],stddev=0.1))
layer3_biases = tf.Variable(tf.constant(0.1,shape=[20]))
layer4_weights = tf.Variable(tf.truncated_normal([20,25],stddev=0.1))
layer4_biases = tf.Variable(tf.constant(0.1,shape=[25]))
layer5_weights = tf.Variable(tf.truncated_normal([25, 25], stddev=0.1))
layer5_biases = tf.Variable(tf.constant(0.1, shape=[25]))
def layer_multiplication(data_input_given):
#Convolutional Layer 1
#data_input_given=np.reshape(data_input_given,[-1,64,64,1])
CNN1=tf.nn.relu(tf.nn.conv2d(data_input_given,layer1_weights,strides=[1,1,1,1],padding='SAME')+layer1_biases)
print('CNN1 Done!!')
#Pooling Layer
Pool1=tf.nn.max_pool(CNN1,ksize=[1,4,4,1],strides=[1,4,4,1],padding='SAME')
print('Pool1 DOne')
#second Convolution layer
CNN2=tf.nn.relu(tf.nn.conv2d(Pool1,layer2_weights,strides=[1,1,1,1],padding='SAME'))+layer2_biases
print('CNN2 Done')
#Second Pooling
Pool2 = tf.nn.max_pool(CNN2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
print('pool2 Done')
#Third Convolutional Layer
CNN3 = tf.nn.relu(tf.nn.conv2d(Pool2, layer3_weights, strides=[1, 1, 1, 1], padding='SAME')) + layer3_biases
print('CNN3 Done')
#Third Pooling Layer
Pool3 = tf.nn.max_pool(CNN3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print('Pool3 DOne')
#Fully Connected Layer
Pool4=tf.nn.max_pool(Pool3,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
FullyCon=tf.reshape(Pool4,[-1,20])
FullyCon=tf.nn.relu(tf.matmul(FullyCon,layer4_weights)+layer4_biases)
print('Fullyconnected Done')
dropout = tf.nn.dropout(FullyCon, 0.4)
dropout=tf.reshape(dropout,[-1,25])
dropout=tf.matmul(dropout,layer5_weights)+layer5_biases
#print(dropout.shape)
return dropout
train_input = layer_multiplication(train_data)
print(train_input.shape)
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label_placeholder,logits=train_input))
+ 0.01 * tf.nn.l2_loss(layer1_weights)
+ 0.01 * tf.nn.l2_loss(layer2_weights)
+ 0.01 * tf.nn.l2_loss(layer3_weights)
+ 0.01 * tf.nn.l2_loss(layer4_weights)
)
#other=(tf.squeeze(label_placeholder))
#print(tf.shape())
optimizer = tf.train.GradientDescentOptimizer(name='Stochastic', learning_rate=decay_rate).minimize(loss,global_step=global_step)
#print(train_input.shape)
batch_size = 10
num_steps=10000
prediction=[]
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
print('Initialized')
for i in range(num_steps):
print("in loop")
offset = (i * batch_size) % (onehot.shape[0] - batch_size)
batch_data = train_data[offset:(offset + batch_size), :, :]
batch_labels = onehot[offset:(offset + batch_size), :]
print("training")
feed_dict = {data_placeholder: batch_data, label_placeholder: batch_labels}
_, l, predictions = session.run(
[optimizer, loss, train_input], feed_dict=feed_dict)
print(sess.run(tf.argmax(label_placeholder, 1), feed_dict={x:test_data}))
prediction.append(sess.run(tf.argmax(label_placeholder,1),feed_dict={x:test_data}))
print('Finished')
submit=pd.Dataframe({'image_id':only_index, 'label':prediction})
submit.to_csv('submit.csv',index=False)
I also had a doubt regarding predicting class labels. Can someone tell me whether the method I'm using for storing the predicted class labels will work or not?

The reshape operations do not make sense:
FullyCon=tf.reshape(Pool4,[-1,20])
this will collapse batch dimension and feature dimensions.
Why would output of Pool4 have 20 dimensions? The fact it has 20 kernels does not mean it has 20 dimensions. Dimensionality is 20 * size of the image on this level of convolutions, which will be much bigger (my guess is it will be 6430).
It should be something among the lines of
output_shape = Pool4.shape[1] * Pool4.shape[2] * Pool4.shape[3]
FullyCon=tf.reshape(Pool4, [-1, output_shape])
and then you will have to change final layer accordingly (to match shapes).

The error has been fixed after reshaping everything properly and also in the softmax with logits part,i had to send the data_placeholder for logits.After doing this the issue got cleared.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

requires_grad = False seems not working in my case - pytorch

Related

"An operation has `None` for gradient" problem with custom loss function

How to properly implement data reorganization using PyTorch?

.grad() returns None in pytorch

What means T.alloc

InvalidArgumentError logits and labels must be same size: logits_size=[3215,25] labels_size=[10,25]

Categories

Resources