Does anybody know how to deal with Tensorflow 'work_element_count' errors?
F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)
Aborted (core dumped)
Here is part of my source code:
class DiscriminatorModel:
def __init__(self, session, some_parameters):
self.sess = session
self.parameters = some_parameters
def build_feed_dict(self, input_frames, gt_output_frames, generator):
feed_dict = {}
batch_size = np.shape(gt_output_frames)[0]
print(batch_size) # 1
print(np.shape(generator.input_frames_train)) # (?,7,32,32,32,1)
print(np.shape(input_frames)) # (1,7,32,32,32,1)
print(np.shape(generator.gt_frames_train)) # (?,7,32,32,32,1)
print(np.shape(gt_output_frames)) # (1,7,32,32,32,1)
g_feed_dict={generator.input_frames_train:input_frames,
generator.gt_frames_train:gt_output_frames}
def getshape(d):
if isinstance(d, dict):
return {k:getshape(d[k]) for k in d}
else:
return None
print("g_feed_dict shape :", getshape(g_feed_dict),"\n")
# {<tf.Tensor 'generator/data/Placeholder:0' shape=(?, 32, 32, 32, 1) dtype=float32>: None, <tf.Tensor 'generator/data/Placeholder_1:0' shape=(?, 32, 32, 32, 1) dtype=float32>: None}
print(sys.getsizeof(generator.scale_preds_train)) # 96
print(sys.getsizeof(g_feed_dict)) # 288
# error occurs here.
g_scale_preds = self.sess.run(generator.scale_preds_train, feed_dict=g_feed_dict)
# F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)
# Aborted (core dumped)
def train_step(self, batch, generator):
print(np.shape(batch)) # [1, 7, 32, 32, 32, 2]
input_frames = batch[:, :, :, :, :, :-1]
gt_output_frames = batch[:, :, :, :, :, -1:]
feed_dict = self.build_feed_dict(input_frames, gt_output_frames, generator)
class GeneratorModel:
def __init__(self, session, some_parameters):
self.sess = session
self.parameters = some_parameters
self.input_frames_train = tf.placeholder(
tf.float32, shape=[None, 7, 32, 32, 32, 1])
self.gt_frames_train = tf.placeholder(
tf.float32, shape=[None, 7, 32, 32, 32, 1])
self.input_frames_test = tf.placeholder(
tf.float32, shape=[None, 7, 32, 32, 32, 1])
self.gt_frames_test = tf.placeholder(
tf.float32, shape=[None, 7, 32, 32, 32, 1])
self.scale_preds_train = []
for p in range(4):
# scale size, 4 --> 8 --> 16 --> 32
sc = 4*(2**p)
# this passes tf.Tensor array of shape (1,7,sc,sc,sc,1)
train_preds = calculate(self.width_train,
self.height_train,
self.depth_train,
...)
self.scale_preds_train.append(train_preds)
# [ <..Tensor shape=(1,7,4,4,4,1) ....>,
# <..Tensor shape=(1,7,8,8,8,1) ....>,
# <..Tensor shape=(1,7,16,16,16,1)..>,
# <..Tensor shape=(1,7,32,32,32,1)..> ]
print(self.scale_preds_train)
sess = tf.Session()
d_model = DiscriminatorModel(sess, some_parameters)
g_model = GeneratorModel(sess, some_parameters)
sess.run(tf.global_variables_initializer())
# this returns numpy array of shape [1,7,32,32,32,2]
batch = get_batch()
# trouble here.
d_model.train_step(batch, g_model)
I've seen some recommendations about:
use CUDA 9.0 / cuDNN 7.0 / tensorflow-gpu 1.7.0 (--> I'm already using these)
check if batch has size greater than 0 (--> it seems they are.)
do not use more gpus than the number of samples in a batch (--> I do not)
I use a single 11 GB GPU out of the 5 available, specified as
~$ CUDA_VISIBLE_DEVICES=2 python3 foo.py
and the batch size is 1.
Can anyone point out what I'm missing or what I've done wrong?
Edit 1.
I found a case that gets past this error. If I modify the input like this:
# ... previous code does not change
print(sys.getsizeof(g_feed_dict)) # 288
temp_index = 0
temp_input = [generator.scale_preds_train[temp_index],
generator.scale_preds_train[temp_index],
generator.scale_preds_train[temp_index],
generator.scale_preds_train[temp_index]]
# this <temp_input> does not raise error here.
# however temp_index > 0 doesn't work.
g_scale_preds = self.sess.run(temp_input, feed_dict=g_feed_dict)
This passes an input to sess.run whose shape is something like
[(1,7,4,4,4,1), (1,7,4,4,4,1), (1,7,4,4,4,1), (1,7,4,4,4,1)]
whereas it should (originally) be a list of scaled shapes like [(1,7,4,4,4,1), (1,7,8,8,8,1), (1,7,16,16,16,1), (1,7,32,32,32,1)].
Also, the arrays in the feed_dict dictionary are of shape
(1,7,32,32,32,1).
It seems like the error comes from tensorflow-gpu trying to reach wrong indices of an array (where the memory is not actually allocated), so that the "work element count" is 0 (but I'm not sure yet).
I cannot understand why temp_index > 0 (e.g. 1, 2, 3) throws the same
Check failed error, while 0 is the only index that does not.
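A quick sanity check I could add right before the failing sess.run (just a debugging sketch; it only catches statically-known zero-sized dimensions, so it may well miss the real cause):
def assert_no_zero_dims(tensors):
    # raise early if any fetched tensor has a statically-known zero-sized dimension
    for t in tensors:
        dims = t.get_shape().as_list()          # e.g. [1, 7, 4, 4, 4, 1]
        if any(d == 0 for d in dims if d is not None):
            raise ValueError("zero-sized dimension in %s: %s" % (t.name, dims))
assert_no_zero_dims(generator.scale_preds_train)
g_scale_preds = self.sess.run(generator.scale_preds_train, feed_dict=g_feed_dict)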
Edit 2.
After I changed my GPU from a TITAN Xp to a GeForce GTX, the error log said
Floating point exception (core dumped)
at the same code (sess.run).
In my case, one of the conv layers had 0 output feature maps, which caused this problem.
Now I've solved it.
Just as the GTX error log told me, something was becoming zero, and it was actually a denominator (thus unrelated to all of the code above). The configuration during the final debugging was:
CUDA 8.0 / Tensorflow 1.8.0
with the GeForce GTX, of course. I think the log was different (and slightly more detailed) because of the version change rather than the actual GPU, even though the version change alone did not solve the problem.
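For reference, the kind of guard that fixes this class of problem looks roughly like the sketch below (numerator and denominator are placeholder names, not the actual tensors from my model):
import tensorflow as tf
eps = 1e-8
# replace exact zeros in the divisor with a small epsilon before dividing
safe_den = tf.where(tf.equal(denominator, 0.0),
                    tf.fill(tf.shape(denominator), eps),
                    denominator)
result = numerator / safe_den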
I was training the model on Colab and got the same problem. The issue was num_classes: in the config file it was set to 2 while my model had 36 classes.
You should pay attention to num_classes in your config file.
Related
Because I need to speed up my code, I have rewritten it as pure NumPy code to evaluate its runtime, and now I want to accelerate it with JAX in Python. I don't know whether my code is suitable for acceleration by JAX, but my brief previous studies and experience with JAX encourage me to try vectorizing or parallelizing the prepared NumPy code with JAX. As an initial test, I put the jax.jit decorator on the function, but it got stuck at the first line of my code. It raised the following error in Colab:
<__array_function__ internals> in take(*args, **kwargs)
UnfilteredStackTrace: NotImplementedError: The 'raise' mode to jnp.take is not supported.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
NotImplementedError Traceback (most recent call last)
<__array_function__ internals> in take(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _take(a, indices, axis, out, mode)
5437 elif mode == "raise":
5438 # TODO(phawkins): we have no way to report out of bounds errors yet.
-> 5439 raise NotImplementedError("The 'raise' mode to jnp.take is not supported.")
5440 elif mode == "wrap":
5441 indices = mod(indices, _constant_like(indices, a.shape[axis_idx]))
NotImplementedError: The 'raise' mode to jnp.take is not supported.
I don't know how to handle this code with JAX. The error is related to np.take, although I guess it will get stuck again at some other lines, e.g. the ones that contain reduce.
The sample code is:
import numpy as np
import jax
pp_ = np.array([[0.75, 0.5, 0.5], [15, 10, 15], [0.5, 3., 0.35], [15, 17, 15]])
rr_ = np.array([1, 3, 2, 5], dtype=np.float64)
gg_ = np.array([-0.48305741, -1])
ee_ = np.array([[0, 2], [1, 3]], dtype=np.int64)
@jax.jit
def JAX_acc(pp_, rr_, gg_, ee_):
rr_act = np.take(rr_, ee_)
r_add = np.add.reduce(rr_act, axis=1)
pc_dis = np.sum((r_add, gg_), axis=0)
ang_ = np.arccos((rr_act ** 5 + pc_dis[:, None] ** 2) / 1e5)
pl_rad = rr_act * np.cos(ang_)
pp_act = np.take(pp_, ee_, axis=0)
pc_vec = -np.subtract.reduce(pp_act, axis=1)
pc_ = pp_act[:, 0, :] + pc_vec / np.linalg.norm(pc_vec, axis=1)[:, None] * np.abs(pl_rad[:, 0][:, None])
return print(pc_dis, pc_, pl_rad)
JAX_acc(pp_, rr_, gg_, ee_)
Main question: Could the JAX library be utilized for this example? How?
Shall I use other modules instead of np.take?
I would appreciate any help getting this code to work with JAX.
---------------- solved by the update ----------------
I would be grateful for any other explanations of the following extra questions (not strictly needed):
Which will be faster under JAX: the plain math operators (-, +, *, ...) or their NumPy equivalents (np.power, np.sum, ...)? Will the NumPy ones be handled by JAX in a better way (in terms of speed) than the plain operators?
Does JAX's CPU mode need a different coding style than TPU mode? I haven't used TPU mode so far.
Updates:
I have changed the code to use the corresponding jnp functions, based on @jakedvp's comment, and the problem with np.take is gone:
def JAX_acc_jnp(pp_, rr_, gg_, ee_):
rr_act = jnp.take(rr_, ee_)
r_add = jnp.sum(rr_act, axis=1)  # .squeeze()
pc_dis = jnp.add(r_add, gg_)
ang_ = jnp.arccos((rr_act ** 5 + pc_dis[:, None] ** 2) / 1e5)
pl_rad = rr_act * jnp.cos(ang_)
pp_act = jnp.take(pp_, ee_, axis=0)
pc_vec = jnp.diff(pp_act, axis=1).squeeze()
pc_ = pp_act[:, 0, :] + pc_vec / jnp.linalg.norm(pc_vec, axis=1)[:, None] * jnp.abs(pl_rad[:, 0][:, None])
return pc_dis, pc_, pl_rad
For pc_dis and pc_ the results are correct, but pl_rad is different because ang_ comes out with different values, all equal to -1.0927847e-10; perhaps because the true values are on the order of 1e-13 and JAX changed the dtype to float32, I don't know. If so, how could I specify which dtype JAX should use?
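If the float32 default is indeed the culprit, it looks like JAX's 64-bit mode can be enabled explicitly; a sketch of what I would try (the flag has to be set before any JAX arrays are created, and JAX_acc_jnp is the updated function above):
import jax
jax.config.update("jax_enable_x64", True)    # enable float64 / int64 in JAX
import jax.numpy as jnp
rr64 = jnp.asarray(rr_)                      # stays float64 instead of being cast to float32
print(rr64.dtype)                            # float64
JAX_acc_jit = jax.jit(JAX_acc_jnp)           # the updated function can then be jit-compiled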
larger data sizes: pp_, rr_, gg_, ee_
I have two tensors
t1=torch.Size([400, 32, 400])
t2= torch.Size([400, 32, 32])
When I execute this
torch.matmul(t1,t2)
I got this error:
RuntimeError:
Expected tensor to have size 400 at dimension 1, but got size 32 for
argument #2 'batch2' (while checking arguments for bmm)
Any help will be much appreciated
You get the error because the order of matrix multiplication is wrong.
It should be:
a = torch.randn(400, 32, 400)
b = torch.randn(400, 32, 32)
out = torch.matmul(b, a) # You performed torch.matmul(a, b)
# You can also do a simpler version of the matrix multiplication using the below code
out = b @ a
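For context, with 3-D inputs torch.matmul falls back to a batched matrix multiply (bmm), which needs shapes (B, n, m) and (B, m, p). A quick sketch of the shape check, using the sizes from the question:
import torch
a = torch.randn(400, 32, 400)   # (B, n, m) = (400, 32, 400)
b = torch.randn(400, 32, 32)    # (B, n, m) = (400, 32, 32)
out = torch.matmul(b, a)        # works: b's last dim (32) matches a's middle dim (32)
print(out.shape)                # torch.Size([400, 32, 400])
# torch.matmul(a, b) fails: a's last dim (400) != b's middle dim (32)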
I started learning machine learning and came across neural networks. While implementing a program I got this error. I have tried every solution I could find, but no luck. Here's my code:
from numpy import exp, array, random, dot
class neural_network:
def _init_(self):
random.seed(1)
self.weights = 2 * random.random((2, 1)) - 1
def train(self, inputs, outputs, num):
for iteration in range(num):
output = self.think(inputs)
error = outputs - output
adjustment = 0.01*dot(inputs.T, error)
self.weights += adjustment
def think(self, inputs):
return (dot(inputs, self.weights))
neural = neural_network()
# The training set
inputs = array([[2, 3], [1, 1], [5, 2], [12, 3]])
outputs = array([[10, 4, 14, 30]]).T
# Training the neural network using the training set.
neural.train(inputs, outputs, 10000)
# Ask the neural network the output
print(neural.think(array([15, 2])))
This is the error I'm getting when running neural.train:
Traceback (most recent call last):
File "neural.py", line 27, in <module>
neural.train(inputs, outputs, 10000)
File "neural.py", line 10, in train
output = self.think(inputs)
File "neural.py", line 16, in think
return (dot(inputs, self.weights))
AttributeError: 'neural_network' object has no attribute 'weights'
Though it has a self attribute self.weights, it still says there is no such attribute.
Well, it turns out that your initialization method should be named __init__ (two underscores), not _init_...
So, changing the method to
def __init__(self):
random.seed(1)
self.weights = 2 * random.random((2, 1)) - 1
your code works OK:
neural.train(inputs, outputs, 10000)
print(neural.think(array([15, 2])))
# [ 34.]
Your initializing method is written wrong: it's two underscores, __init__(self):, not one underscore, _init_(self):
Otherwise, nice code!
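To illustrate why the single-underscore version fails silently, here is a small standalone example (not from the original code): Python only calls __init__ automatically, while a method named _init_ is just an ordinary method that never runs unless you call it yourself.
class Demo:
    def _init_(self):            # single underscores: NOT a constructor
        self.weights = 123
d = Demo()                       # _init_ is never called here
print(hasattr(d, "weights"))     # False -> hence the AttributeError
d._init_()                       # only runs when called explicitly
print(d.weights)                 # 123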
I want to add word dropout to my network so that I can have sufficient training examples for training the embedding of the "unk" token. As far as I'm aware, this is standard practice. Let's assume the index of the unk token is 0, and the index for padding is 1 (we can switch them if that's more convenient).
This is a simple CNN network which implements word dropout the way I would have expected it to work:
class Classifier(nn.Module):
def __init__(self, params):
super(Classifier, self).__init__()
self.params = params
self.word_dropout = nn.Dropout(params["word_dropout"])
self.pad = torch.nn.ConstantPad1d(max(params["window_sizes"])-1, 1)
self.embedding = nn.Embedding(params["vocab_size"], params["word_dim"], padding_idx=1)
self.convs = nn.ModuleList([nn.Conv1d(1, params["feature_num"], params["word_dim"] * window_size, stride=params["word_dim"], bias=False) for window_size in params["window_sizes"]])
self.dropout = nn.Dropout(params["dropout"])
self.fc = nn.Linear(params["feature_num"] * len(params["window_sizes"]), params["num_classes"])
def forward(self, x, l):
x = self.word_dropout(x)
x = self.pad(x)
embedded_x = self.embedding(x)
embedded_x = embedded_x.view(-1, 1, x.size()[1] * self.params["word_dim"]) # [batch_size, 1, seq_len * word_dim]
features = [F.relu(conv(embedded_x)) for conv in self.convs]
pooled = [F.max_pool1d(feat, feat.size()[2]).view(-1, params["feature_num"]) for feat in features]
pooled = torch.cat(pooled, 1)
pooled = self.dropout(pooled)
logit = self.fc(pooled)
return logit
Don't mind the padding - PyTorch doesn't have an easy way of using non-zero padding in CNNs, much less trainable non-zero padding, so I'm doing it manually. Dropout also doesn't allow me to drop to a non-zero value, and I want to separate the padding token from the unk token. I'm keeping it in my example because it's the reason for this question's existence.
This doesn't work because dropout wants Float Tensors so that it can scale them properly, while my input is Long Tensors that don't need to be scaled.
Is there an easy way of doing this in pytorch? I essentially want to use LongTensor-friendly dropout (bonus: better if it will let me specify a dropout constant that isn't 0, so that I could use zero padding).
Actually I would do it outside of your model, before converting your input into a LongTensor.
This would look like this:
import random
def add_unk(input_token_id, p):
#random.random() gives you a value between 0 and 1
#to avoid switching your padding to 0 we add 'input_token_id > 1'
if random.random() < p and input_token_id > 1:
return 0
else:
return input_token_id
#then you have your input token_id
#for this example I take just a random number, let's say 127
input_token_id = 127
#let p be your probability for UNK
p = 0.01
your_input_tensor = torch.LongTensor([add_unk(input_token_id, p)])
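For a whole sequence you can simply map the same helper over the token ids before building the tensor (the ids below are made up for illustration):
token_ids = [5, 1, 127, 88, 1]                     # 1 is the padding id and is kept as-is
dropped = [add_unk(t, p=0.01) for t in token_ids]
your_input_tensor = torch.LongTensor(dropped)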
Edit:
So there are two options that come to mind which are actually GPU-friendly. In general, both solutions should be much more efficient.
Option one - Doing computation directly in forward():
If you're not using torch.utils and don't plan to use it later, this is probably the way to go.
Instead of doing the computation beforehand, we just do it in the forward() method of the main PyTorch class. However, I see no (simple) way of doing this in torch 0.3.1, so you would need to upgrade to version 0.4.0:
So imagine x is your input vector:
>>> x = torch.tensor(range(10))
>>> x
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
probs is a vector containing uniform random values, so we can check later against our dropout probability:
>>> probs = torch.empty(10).uniform_(0, 1)
>>> probs
tensor([ 0.9793, 0.1742, 0.0904, 0.8735, 0.4774, 0.2329, 0.0074,
0.5398, 0.4681, 0.5314])
Now we apply the dropout probabilities probs on our input x:
>>> torch.where(probs > 0.2, x, torch.zeros(10, dtype=torch.int64))
tensor([ 0, 0, 0, 3, 4, 5, 0, 7, 8, 9])
Note: To see some effect I chose a dropout probability of 0.2 here. In reality you probably want it to be smaller.
You can pick any token / id you like for this; here is an example with 42 as the unknown token id:
>>> unk_token = 42
>>> torch.where(probs > 0.2, x, torch.empty(10, dtype=torch.int64).fill_(unk_token))
tensor([ 0, 42, 42, 3, 4, 5, 42, 7, 8, 9])
torch.where comes with PyTorch 0.4.0:
https://pytorch.org/docs/master/torch.html#torch.where
I don't know about the shapes of your network, but your forward() should look something like this then (when using mini-batching you need to flatten the input before applying dropout):
def forward_train(self, x, l):
# probabilities
probs = torch.empty(x.size(0)).uniform_(0, 1)
# applying word dropout
x = torch.where(probs > 0.02, x, torch.zeros(x.size(0), dtype=torch.int64))
# continue like before ...
x = self.pad(x)
embedded_x = self.embedding(x)
embedded_x = embedded_x.view(-1, 1, x.size()[1] * self.params["word_dim"]) # [batch_size, 1, seq_len * word_dim]
features = [F.relu(conv(embedded_x)) for conv in self.convs]
pooled = [F.max_pool1d(feat, feat.size()[2]).view(-1, params["feature_num"]) for feat in features]
pooled = torch.cat(pooled, 1)
pooled = self.dropout(pooled)
logit = self.fc(pooled)
return logit
Note: I named the function forward_train(), so you should use another forward() without dropout for evaluation / prediction. But you could also branch on the module's training mode (toggled by train() / eval()), as in the sketch below.
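A minimal sketch of that variant, assuming hypothetical attributes self.word_dropout_p and self.unk_token are set in __init__ (they are not in the original code):
def forward(self, x, l):
    if self.training:                             # True after model.train(), False after model.eval()
        probs = torch.empty(x.size(0)).uniform_(0, 1)
        x = torch.where(probs > self.word_dropout_p, x,
                        torch.full_like(x, self.unk_token))
    x = self.pad(x)
    # ... continue exactly like forward_train() above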
Option two: using torch.utils.data.Dataset:
If you're using the Dataset provided by torch.utils it is very easy to do this kind of pre-processing efficiently. When combined with a DataLoader, the loading can run in multiple worker processes, so the code sample above just has to be executed in the __getitem__ method of your Dataset class.
This could look like this:
def __getitem__(self, index):
'Generates one sample of data'
# Select sample
ID = self.input_tokens[index]
# Load data and get label
# using the add_unk function from the code above
X = torch.LongTensor(add_unk(ID, p=0.01))
y = self.targets[index]
return X, y
This is a bit out of context and doesn't look very elegant, but I think you get the idea. According to this blog post by Shervine Amidi at Stanford, it should be no problem to do more complex pre-processing steps in this function:
Since our code [Dataset is meant] is designed to be multicore-friendly, note that you
can do more complex operations instead (e.g. computations from source
files) without worrying that data generation becomes a bottleneck in
the training process.
The linked blog post - "A detailed example of how to generate your data in parallel with PyTorch" - provides also a good guide for implementing the data generation with Dataset and DataLoader.
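For completeness, a minimal usage sketch (token_dataset stands for an instance of your Dataset subclass; batch size and worker count are placeholder values):
from torch.utils.data import DataLoader
loader = DataLoader(token_dataset, batch_size=32, shuffle=True, num_workers=4)
for X, y in loader:
    pass  # training step goes here; word dropout was already applied in __getitem__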
I guess you'll prefer option one - only two lines and it should be very efficient. :)
Good luck!
I have a dataset of 21 subjects with a different number of samples for each one.
I made a curve (check the figure). I remove [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40] samples from each subject. I am using StratifiedShuffleSplit with a 90% train_size and 10% test_size. This means:
when I remove 10 samples, 9 will be used for training and 1 for testing
when I remove 20 samples, 18 will be used for training and 2 for testing
when I remove 30 samples, 27 will be used for training and 3 for testing
when I remove 40 samples, 36 will be used for training and 4 for testing
This curve shows the accuracy(test_score) but NOT the train_score.
How can I plot the train_score without using the learning_curve function of scikit-learn? http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
The code:
result_list = []
#LOADING .mat FILE
x=sio.loadmat('/home/curve.mat')['x']
s_y=sio.loadmat('/home/rocio/curve.mat')['y']
y=np.ravel(s_y)
#SENDING THE FILE TO PANDAS
df = pd.DataFrame(x)
df['label']=y
#SPECIFYING THE # OF SAMPLES TO BE REMOVED
for j in [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,32,34,36,38,40]:
df1 = pd.concat(g.sample(j) for idx, g in df.groupby('label'))
#TURNING THE DATAFRAME TO ARRAY
X = df1[[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]].values
y = df1.label.values
#Cross-validation
clf = make_pipeline(preprocessing.RobustScaler(), neighbors.KNeighborsClassifier())
####################10x2 SSS####################
print("Cross-validation:10x10")
xSSSmean10 = []
for i in range(10):
sss = StratifiedShuffleSplit(2, test_size=0.1, random_state=i)
scoresSSS = model_selection.cross_val_score(clf, X, y, cv=sss.split(X, y))
xSSSmean10.append(scoresSSS.mean())
result_list.append(xSSSmean10)
print("")
StratifiedShuffleSplit.split returns two values: the train and the test indices. You can assign the value resulting from sss.split(X, y) to a tuple, say testtuple. Then you create a new tuple which is made only of train sets, traintuple, constructed as follows:
traintuple = (testtuple[0],testtuple[0])
then you calculate the accuracy on just the training set:
scoreSSS_train = model_selection.cross_val_score(clf, X, y, cv=traintuple)
In this way both training and testing are performed on the same set.
Append the mean of scoreSSS_train to a new empty list, just like you do with xSSSmean10, and it should work (I could not test it, sorry).
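A different route, just as a sketch (not what I described above): scikit-learn's cross_validate can report training scores directly via return_train_score, so you could collect both means inside your loop (train_mean10 below is a new list, analogous to xSSSmean10):
from sklearn import model_selection
scores = model_selection.cross_validate(clf, X, y, cv=sss.split(X, y),
                                        return_train_score=True)
xSSSmean10.append(scores['test_score'].mean())
train_mean10.append(scores['train_score'].mean())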