I'm new to PyTorch and neural network programming, and I've run into an issue I can't solve on my own. My data are numpy arrays of 1s and 0s. When I try to train my net, I get this error:
RuntimeError: Expected object of type torch.ByteTensor but found type torch.FloatTensor for argument #2 'mat2'
The line where the error comes from is in the forward method of my net:
x = self.fc1(x)
I've tried the following to convert my tensors, but I still get the error:
x = x.type('torch.ByteTensor')
and
x.byte()
x.byte() returns what you need, but it's not an "inplace" method. Try doing:
x = x.byte()
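For illustration, here is a minimal sketch of my own (not from the question) showing that conversion methods like .byte() and .float() return new tensors rather than modifying the original, which is why the result has to be assigned back:
import torch

x = torch.tensor([0.0, 1.0, 1.0])  # float tensor built from 0/1 data
x.byte()                           # returns a new uint8 tensor; x itself is unchanged
x = x.byte()                       # reassign to keep the converted tensor
print(x.dtype)                     # torch.uint8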
I am failing to run torch.jit.trace despite my best efforts, encountering RuntimeError: Input, output and indices must be on the current device
I have a (fairly complex) model which I have already put on GPU, along with a set of inputs, also on GPU. I can verify that all input tensors and model parameters & buffers are on the same device:
(Pdb) {p.device for p in self.parameters()}
{device(type='cuda', index=0)}
(Pdb) {p.device for p in self.buffers()}
{device(type='cuda', index=0)}
(Pdb) in_ = (<several tensors here>)
(Pdb) {p.device for p in in_}
{device(type='cuda', index=0)}
(Pdb) torch.cuda.current_device()
0
I can certify the model runs and the output is on the correct device:
(Pdb) self(*in_).device
device(type='cuda', index=0)
Despite all this, tracing fails:
(Pdb) generator_script = torch.jit.trace(self, example_inputs=in_)
*** RuntimeError: Input, output and indices must be on the current device
I understand about inputs and outputs, but what are these "indices" that must also be on the same device? What other elements am I not accounting for that could be causing trace to fail?
If you're not yet mapping the device during the loading process, doing so could be the solution.[1] That is, mapping the device should happen during jit.load, not as a simple call of .to(device) after jit.load has already finished. See this page for more info.
As an example of what to do:
model = jit.load("your_traced_model.pt", map_location=torch.device("cuda"))
This is different from how it works for typical/non-JIT models, where you can simply do:
model = some_model_creation_function()
_ = model.to(torch.device("cuda"))
[1] This does not currently work for the MPS device.
After hard-coding the trace command into my code, I was able to get a more precise stack trace, which led me to this piece of code, which I simplified for ease of reading:
B, L, C, H, W = inp_seq.shape
ref_seq = torch.repeat_interleave(
    ref_seq.squeeze(dim=1),
    repeats=L,
    dim=0,
)
During normal execution, L evaluates to a Python int, but using pdb I was able to determine that under tracing L became a Tensor. That should be OK, except that this tensor was on the CPU, and that was causing the error.
Forcibly converting L to int was sufficient to overcome this error:
B, L, C, H, W = inp_seq.shape
ref_seq = torch.repeat_interleave(
    ref_seq.squeeze(dim=1),
    repeats=int(L),
    dim=0,
)
However, this feels like a bug, or at least a missing feature, in PyTorch: why does inp_seq.shape produce CPU tensors when inp_seq is on the GPU?
I am currently using torch 1.8.1+cu101
I'm trying to make my own neural network "library" (if you can call it that) for myself to use, since I am hobby-learning about them.
I wrote this code that makes a propagatable neural network by feeding it a structure of the desired network, and it worked pretty well.
But when I tried giving the model a different number of nodes, the code broke.
I've already tried editing the number of nodes in each layer to see where that takes me, and I've found that I only get this error when the first and second layers have the same number of nodes but the output layer has a different number. I've also done the matrix multiplication for the structure that triggers the bug on paper, and it gave me an actual result (which I've double-checked for correctness many times). So now I know the problem is practical rather than theoretical.
There's clearly something wrong with the matrix multiplication, I think.
The script's functions
I had to include these functions in the question, so you can have better insight into how this code works.
is_iterable()
This function returns a boolean value describing whether the input is iterable (it simply tries to index it).
def is_iterable(x):
    # indexing succeeds for list-like inputs and raises for scalars
    try:
        x[0]
        return True
    except Exception:
        return False
blueprint()
This function returns a copy of the input array but changes the elements that aren't iterable to 0's
def blueprint(x):
    return [blueprint(e) if is_iterable(e) else 0 for e in x]
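As a quick usage example of my own (not from the question), blueprint() keeps the nesting but zeros out the leaves:
print(is_iterable(3))            # False: 3[0] raises TypeError
print(is_iterable([3]))          # True
print(blueprint([[1, 2], [3]]))  # [[0, 0], [0]]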
build()
This function takes a model of your desired neural network structure as input and outputs suitably randomized biases and weights, separated into two different arrays.
The randomize() function returns a copy of the input array but changes the elements that aren't iterable to random floats between -1 and 1.
The build_weighs() function returns randomized weights based on a model of a neural network.
import random
import numpy as np

def build(x):
    def randomize(x):
        return np.array([randomize(n) if type(n) is list else random.uniform(-1, 1) for n in x])

    def build_weighs(x):
        y = []
        for i, l in enumerate(x):
            if i == len(x) - 1:
                break
            y.append([randomize(x[i + 1]) for n in l])
        return np.array(y)

    return (randomize(x), build_weighs(x))
apply_funcs()
This function applies a list of functions element-wise to a list of values and returns the result. If the function list contains a 0 at some position, the value at the same position is passed through unchanged.
def apply_funcs(x, f):
    y = x
    i = 0
    for xj, fj in zip(x, f):
        if fj == 0:
            y[i] = xj
        else:
            y[i] = fj(xj)
        i += 1
    return y
nn()
This is the class for making a neural network.
You can see that it has a method named 'prop' for the forward propagation of the network.
class nn:
    def __init__(self, structure, a_funcs=None):
        self.structure = structure
        self.b = np.array(structure[0])
        self.w = np.array(structure[1])
        if a_funcs is None:
            a_funcs = blueprint(self.b)
        self.a_funcs = np.array(a_funcs)

    def prop(self, x):
        y = np.array(x)
        if y.shape != self.b[0].shape:
            raise ValueError("The input needs to be intact with the Input Nodes\nInput: {} != Input Nodes: {}".format(blueprint(y), blueprint(self.b[0])))
        wi = 0
        # A loop through the layers of the neural network
        for i in range(len(self.b)):
            # This if statement is here so that the weights get applied in the right order
            if i != 0:
                y = np.matmul(y, self.w[wi])
                wi += 1
            # Applying the biases of layer i to the current information
            y = np.add(y, self.b[i])
            # Applying the activation functions to the current information
            y = apply_funcs(y, self.a_funcs[i])
        return y
Defining a neural network structure and propagating it
n contains the structure, which is a 3-layer network with 2, 2 and 3 nodes respectively.
n = [[0] * 2, [0] * 2, [0] * 3]
bot = nn(build(n))
print(bot.prop([1] * 2))
When I do this I expect the code to output an array of three semi-random numbers like this:
[-0.55889818 0.62762604 0.59222784]
but instead I get an error from numpy saying this:
File "C:\Users\Black\git\Changbot\oper.py.py", line 78, in prop
y = np.matmul(y, self.w[wi])
TypeError: Object arrays are not currently supported
And the weirdest thing about this is that (as I said earlier) I only get this error when the first and second layers have the same number of nodes but the output layer has a different number. All the other times I get the expected output...
I have now checked the values that cause this error again, and I don't see any objects other than lists. They look the same as when the error doesn't occur...
So I added this try-except statement:
try:
    y = np.matmul(np.array(y), self.w[wi])
except TypeError:
    print("y:{}\nself.w[wi]:{}".format(y, self.w[wi]))
It then outputs this:
y:[1.6888437]
self.w[wi]:[array([-0.19013173])]
which should be possible to multiply with each other.
I have even tried copy pasting the values into an interpreter and multiplying them there, and it works there...
NOTE: this is a very bad test, as the copy-pasted arrays don't have the same dtypes as the actual arrays.
np.matmul([1.6888437], [np.array([-0.19013173])])
Output for the above:
[-0.32110277]
After looking at the answers
Okay, I have now found out that the object dtype arrays lie in the structure of the neural network, by doing this at the end of the script:
print("STRUCTURE:{}".format(n))
It then outputs this:
STRUCTURE:(array([array([0.6888437]), array([ 0.51590881, -0.15885684]),
array([-0.4821665 , 0.02254944, -0.19013173])], dtype=object), array([list([array([ 0.56759718, -0.39337455])]),
list([array([-0.04680609, 0.16676408, 0.81622577]), array([ 0.00937371, -0.43632431, 0.51160841])])],
dtype=object))
Solving the bug
I can understand from one of the answers to this post that np.array() tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
The object dtype gets created in the build() function, so I tried to remove all the np.array() calls in it. Actually, I removed them all from the whole script. And guess what? It worked! Thanks a thousand times to the contributors!
Btw Happy New Year
Regarding your copy-paste testing:
In [55]: np.matmul([1.6888437], [np.array([-0.19013173])])
Out[55]: array([-0.32110277])
But this isn't what your code is using. Instead we have to make arrays that match in dtype.
In [59]: x = np.array([1.6888437]); y = np.array([np.array([-0.19013173]),None])[:1]
In [60]: x
Out[60]: array([1.6888437])
In [61]: y
Out[61]: array([array([-0.19013173])], dtype=object)
I used the None funny business to force it to create an object dtype containing an array, which will print as [array([-0.19013173])].
Now I get your error:
In [62]: np.matmul(x,y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-b6212b061655> in <module>()
----> 1 np.matmul(x,y)
TypeError: Object arrays are not currently supported
Even if it did work, as it does with dot,
In [66]: np.dot(x,y)
Out[66]: array([-0.32110277])
the calculations with object dtype arrays are slower.
I won't try to figure out why you have an object dtype array at this point. But I think you should avoid those in code where speed matters.
If you construct an array from arrays or lists that differ in size, the result is likely to be object dtype with a lower number of dimensions. np.array tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
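As a minimal sketch of that fallback (my own example, not the questioner's arrays; exact behavior depends on the NumPy version, and recent versions warn or raise for ragged input unless dtype=object is given explicitly):
import numpy as np

same = np.array([np.zeros(2), np.zeros(2)])                  # matching lengths -> ordinary 2x2 float array
print(same.shape, same.dtype)                                # (2, 2) float64

ragged = np.array([np.zeros(2), np.zeros(3)], dtype=object)  # differing lengths -> 1-D object array
print(ragged.shape, ragged.dtype)                            # (2,) object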
I'm trying to randomly subsample the prediction and target array for my loss calculation.
idx = torch.randperm(target.shape[0])
target = target.index_select(0, idx[0:sample_size])
However I'm getting this error message.
index_select(): argument 'index' (position 2) must be Variable, not torch.LongTensor
Does anyone know how to fix this?
Edit:
I got one step closer. It seems like torch.randperm does not return a torch variable, so one has to explicitly convert the output:
idx = torch.randperm(target.shape[0])
idx = Variable(idx).cuda()
target = target.index_select(0, idx[0:sample_size])
The only problem now is that backpropagation fails. It seems like the random subsampling operation is causing an issue with the dimensions.
However the dimensions seem to be fine when calculating the loss:
loss = F.nll_loss(prediction, target.view(-1)) # prediction shape is [Nx12] and target shape is N
Unfortunately when calling loss.backward() I get this error message:
RuntimeError: The expanded size of the tensor (12) must match the existing size (217456) at non-singleton dimension 1
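For what it's worth, here is a minimal sketch of my own (using current PyTorch, where Variable is merged into Tensor and the output of torch.randperm can be used as an index directly) of subsampling prediction and target with the same index tensor, so the shapes passed to nll_loss stay consistent:
import torch
import torch.nn.functional as F

N, C, sample_size = 1000, 12, 64
prediction = torch.randn(N, C, requires_grad=True)   # stand-in for the model output
target = torch.randint(0, C, (N,))                   # stand-in for the labels

idx = torch.randperm(N, device=prediction.device)[:sample_size]
# prediction[idx] and target[idx] (or target.index_select(0, idx)) now share the same first dimension
loss = F.nll_loss(F.log_softmax(prediction[idx], dim=1), target[idx])
loss.backward()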
I am using the keras R package in RStudio.
I want to fit a model that uses a customized loss function; specifically, (-1) * the log-likelihood for a Poisson model. I am smoothing the logarithm as ln(0.0001 + x^2)/2. Following the example in this article, I write:
K <- backend()
poisson <- function(y_true, y_pred){
  K$sum(y_pred - y_true * K$log(y_pred^2 + 1e-4)/2 + lgamma(y_true + 1))
}
Here I am mixing functions used by Keras from the "backend" source, such as K$log, with R functions, such as lgamma, which I used because K$lgamma threw an error.
The rest of the commands in the example did run and produced some output.
Questions:
1) Can one mix and match functions in keras running in RStudio?
2) How can I test what the function is doing? I tried
poisson(1:5,3:7)
And I get the error
Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.
I think it's a bad idea to mix R and Keras functions in a custom loss function, and here's why: y_true and y_pred are not R vectors but Keras tensors, and that is the source of your error.
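To illustrate the point, here is a sketch of my own (written in Python against tf.keras rather than the R interface, and assuming a TensorFlow backend): keep every operation on tensors and cast the ground truth to float, so no R/NumPy functions or integer types sneak in.
import tensorflow as tf
from tensorflow.keras import backend as K

def poisson_nll(y_true, y_pred):
    y_true = K.cast(y_true, "float32")            # avoids the int32 vs float32 'Mul' error
    log_mu = K.log(K.square(y_pred) + 1e-4) / 2   # smoothed log, as in the question
    return K.sum(y_pred - y_true * log_mu + tf.math.lgamma(y_true + 1.0))
In TF 2.x you can sanity-check it eagerly with small constant tensors, e.g. poisson_nll(tf.constant([1.0, 2.0]), tf.constant([1.5, 2.5])).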
I've been trying to implement a custom objective function in Keras (the negative log likelihood of the normal distribution)
Keras expects one argument for the ground truth tensor and one for the predictions tensor; for y_pred, I'm passing a tensor that should represent an nx2 matrix where the first column is the mean of the distribution and the second the precision.
My problem is that I haven't been able to get a clear idea of how to properly slice y_pred before passing it into the likelihood function without getting the error
'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?'
While I understand that I'm feeding l_func arguments of the variable type when it expects an array, I don't seem to be able to grok how to properly split the input y_pred variable into its mean and precision components to plug into the likelihood function. Here are some attempts; if someone could enlighten me about how to proceed, I would greatly appreciate it.
# Imports assumed by these snippets (Theano backend)
import numpy as np
import theano.tensor as T
from theano import function

def log_likelihood(y_true, y_pred):
    mu = T.vector('mu')
    beta = T.vector('beta')
    x = T.vector('x')
    likelihood = .5*(beta*(x-mu)**2) - T.log(beta/(2*np.pi))
    l_func = function([mu, beta, x], likelihood)
    return l_func(y_pred[:, 0], y_pred[:, 1], y_true)
def log_likelihood(y_true, y_pred):
    likelihood = .5*(y_pred[:, 1]*(y_true-y_pred[:, 0])**2) - T.log(y_pred[:, 1]/(2*np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)
def log_likelihood(y_true, y_pred):
    mu = y_pred[:, 0]
    beta = y_pred[:, 1]
    x = y_true
    mu_function = function([y_pred], mu)
    beta_function = function([y_pred], beta)
    id_function = function([y_true], x)
    likelihood = .5*(beta_function(y_pred)*(id_function(y_true)-mu_function(y_pred))**2) - T.log(beta_function(y_pred)/(2*np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)
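For comparison, here is a minimal sketch of my own (assuming a Theano backend) of the pattern Keras loss functions generally expect: slice the symbolic tensors directly and return a symbolic expression, without compiling anything with theano.function inside the loss.
import numpy as np
import theano.tensor as T

def log_likelihood(y_true, y_pred):
    mu = y_pred[:, 0]              # first column: mean
    beta = y_pred[:, 1]            # second column: precision
    x = T.flatten(y_true)          # assumes y_true is (n, 1) or (n,)
    # same expression as in the question, kept symbolic and averaged over the batch
    likelihood = .5 * (beta * (x - mu) ** 2) - T.log(beta / (2 * np.pi))
    return T.mean(likelihood)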