I am trying to move my model onto a GPU. After running the check for an available GPU, I determined that there is one (see below):
> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
> device
device(type='cuda', index=0)
When I call model.to(device), I see no change in the model's device attribute:
> model.to(device)
S2SModel(
(encoder): Encoder(
(lstm): LSTM(5, 32, batch_first=True)
)
(decoder): Decoder(
(lstm): LSTM(4, 32, batch_first=True)
)
(output_layer): Linear(in_features=32, out_features=1, bias=True)
)
> model.device
'cpu'
Though I have read that you do not need to assign the model.to() call back to the object, I have tried that too:
> model = model.to(device)
> model.device
'cpu'
device here is most likely a user-defined attribute that is different from the device the model's parameters actually sit on (nn.Module itself does not define a device attribute). That is why model.device still returns 'cpu'. To check whether your model is on the CPU or the GPU, look at the device of its first parameter:
>>> next(model.parameters()).device
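For example, here is a minimal sketch (the TinyModel class and its self.device attribute are illustrative assumptions, not your actual code) showing why a hand-set attribute goes stale and how to check the real device:

import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(5, 32, batch_first=True)
        self.device = 'cpu'  # plain Python attribute; .to() never touches it

model = TinyModel()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

print(model.device)                      # still 'cpu' -- just a stored string
print(next(model.parameters()).device)   # the device the weights actually live on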
I am trying to code a GNN example problem as shown in the given link: https://towardsdatascience.com/hands-on-graph-neural-networks-with-pytorch-pytorch-geometric-359487e221a8
I am using a 2016 MacBook Pro without an Nvidia graphics card.
The example uses the CUDA toolkit. Can I modify the code so that it runs on my current laptop? I have made the dataset small enough that it does not require heavy computation and can run on my machine.
The part of the code that gives the error is as follows:
def train():
    model.train()
    loss_all = 0
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        output = model(data)
        label = data.y.to(device)
        loss = crit(output, label)
        loss.backward()
        loss_all += data.num_graphs * loss.item()
        optimizer.step()
    return loss_all / len(train_dataset)

device = torch.device('cuda')
model = Net().to(device)  # Net = a class inherited from torch.nn.Module
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
crit = torch.nn.BCELoss()
train_loader = DataLoader(train_dataset, batch_size=batch_size)

for epoch in range(num_epochs):
    train()
The error is as follows
AssertionError: Torch not compiled with CUDA enabled
You are using:
device = torch.device('cuda')
If you would like to use the CPU instead, change it to:
device = torch.device('cpu')
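A common pattern, sketched below, is to pick CUDA only when it is actually available, so the same script runs on GPU machines and on your MacBook without edits:

import torch

# fall back to the CPU when no CUDA-capable GPU (or CUDA build of PyTorch) is present
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Net().to(device)  # Net is the model class from the question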
I am using ann_visualizer to show my Keras neural network model graphically. The model works properly, but it gives this error whenever I try to visualize it via ann_viz():
"ValueError: ANN Visualizer: Layer not supported for visualizing"
I searched the internet but couldn't find a valid solution.
This is the neural network model code:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28)))
model.add(keras.layers.Dense(128, activation=keras.activations.relu))
model.add(keras.layers.Dense(10, activation=keras.activations.softmax))
model.compile(
    optimizer="adam",
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=["accuracy"]
)
model.fit(train_data, train_lables, epochs=10)
test_loss, test_acc = model.evaluate(test_data, test_lables)
And this is the ann_viz() function call
from ann_visualizer.visualize import ann_viz
ann_viz(model, title="Model")
Any idea how to make it work?
I also got the same error but was able to resolve it by removing the Flatten() layer.
# flatten the input
X = X.reshape(X.shape[0], 28*28)

model = keras.Sequential()
# added the flat input shape on the first Dense layer
model.add(keras.layers.Dense(128, activation=keras.activations.relu, input_shape=(28*28,)))
model.add(keras.layers.Dense(10, activation=keras.activations.softmax))
model.compile(
    optimizer="adam",
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=["accuracy"]
)

# now you can call ann_viz
from ann_visualizer.visualize import ann_viz
ann_viz(model, title="Model")
Basically, I flattened the input myself, removed the Flatten layer, and passed the flat input shape to the first Dense layer. I don't know the exact reason it works, though.
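One caveat with this workaround, shown as a short sketch below (train_data, test_data and the *lables variables are the ones from the question): every array fed to the model must be flattened the same way, since the first Dense layer now expects vectors of length 784.

# reshape both training and test images from (N, 28, 28) to (N, 784)
train_data = train_data.reshape(train_data.shape[0], 28*28)
test_data = test_data.reshape(test_data.shape[0], 28*28)

model.fit(train_data, train_lables, epochs=10)
test_loss, test_acc = model.evaluate(test_data, test_lables)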
I am trying to train a PyTorch neural network on a GPU. To do so, I load my inputs and network onto the default CUDA-enabled GPU device. However, when I load my inputs, the model's weights do not stay CUDA tensors. Here is my train function:
def train(network: nn.Module, name: str, learning_cycles: dict, num_epochs):
    # check we have a working gpu to train on
    assert(torch.cuda.is_available())
    # load model onto gpu
    network = network.cuda()
    # load train and test data with a transform
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                               shuffle=True, num_workers=2)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(network.parameters(), lr=0.001, momentum=0.9)
    for epoch in range(num_epochs):
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            # load inputs and labels onto gpu
            inputs, labels = inputs.cuda(), labels.cuda()
            optimizer.zero_grad()
            outputs = network(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
When calling train, I get the following error.
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
Interestingly, when I delete the line inputs, labels = inputs.cuda(), labels.cuda() I get the error RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
I would very much like to train my network, and I have searched the internet to no avail. Any good ideas?
Given that a device mismatch crops up regardless of which device the inputs are on, it's likely that some of your model's parameters are not being moved to the GPU when you call network = network.cuda(): you have model parameters on both the CPU and the GPU.
Post your model code. It's likely you have a PyTorch module in an incorrect container.
Lists of modules should be kept in an nn.ModuleList; modules stored in a plain Python list will not be transferred. Compare:
layers1 = [nn.Linear(256, 256), nn.Linear(256, 256), nn.Linear(256, 256)]
layers2 = nn.ModuleList([nn.Linear(256, 256), nn.Linear(256, 256), nn.Linear(256, 256)])
If you called model.cuda() on a model containing the above two lines, the layers in layers1 would remain on the CPU, while the layers in layers2 would be moved to the GPU.
Similarly, a list of nn.Parameter objects should be contained in an nn.ParameterList.
There are also nn.ModuleDict and nn.ParameterDict for dictionary containers.
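Here is a small, self-contained sketch (the BrokenNet class is hypothetical) demonstrating the difference: only layers registered through nn.ModuleList show up in parameters() and therefore get moved by .cuda():

import torch
import torch.nn as nn

class BrokenNet(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: NOT registered, ignored by .cuda()/.to() and parameters()
        self.plain_layers = [nn.Linear(256, 256) for _ in range(3)]
        # nn.ModuleList: registered, moves with the model
        self.listed_layers = nn.ModuleList([nn.Linear(256, 256) for _ in range(3)])

net = BrokenNet()
if torch.cuda.is_available():
    net.cuda()
    print(net.plain_layers[0].weight.device)   # cpu  -- left behind
    print(net.listed_layers[0].weight.device)  # cuda:0 -- moved with the model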
I am trying to use Google Colab's free TPU for training, and I have created a dataset with tf.data. My y label is label-encoded data with 7 classes, and I get this error:
InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 7 for 'tpu_140081340653808/metrics/metrics/sparse_categorical_accuracy/remove_squeezable_dimensions/Squeeze' (op: 'Squeeze') with input shapes: [1024,7].
How I load my data
def preprocess_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize_images(image, [135, 180])
    image /= 255.0
    return image

def load_and_preprocess_image(path, label):
    image = tf.read_file(path)
    return preprocess_image(image), label

def label_encode(dataset):
    le = LabelEncoder()
    dataset['encoded'] = le.fit_transform(dataset['dx'])
    return dataset

def load_dataset(image_paths, image_labels):
    label_dataset = tf.cast(image_labels, tf.int32)
    path_ds = tf.data.Dataset.from_tensor_slices((image_paths, label_dataset))
    ds = path_ds.map(load_and_preprocess_image, tf.data.experimental.AUTOTUNE)
    return ds

def get_training_dataset(image_file, label_file, batch_size):
    dataset = load_dataset(image_file, label_file)
    #dataset = dataset.cache()  # this small dataset can be entirely cached in RAM; for TPU this is important to get good performance from such a small dataset
    #dataset = dataset.shuffle(buffer_size=image_count)
    dataset = dataset.repeat()  # mandatory for Keras for now
    dataset = dataset.batch(batch_size, drop_remainder=True)  # drop_remainder is important on TPU, batch size must be fixed
    dataset = dataset.prefetch(buffer_size=AUTOTUNE)  # fetch next batches while training on the current one
    return dataset

training_dataset = get_training_dataset(train_image_paths, train_image_labels, BATCH_SIZE)
# For TPU, we will need a function that returns the dataset with batches
training_input_fn = lambda: get_training_dataset(train_image_paths, train_image_labels, BATCH_SIZE)
my model
def create_res(input_sp):
    resnet = ResNet50(input_shape=input_sp, include_top=False, weights='imagenet')
    resnet.trainable = False
    return resnet

def create_seq_model(input_shape):
    tf.keras.backend.clear_session()
    resnet = create_res(input_shape)
    model = Sequential()
    model.add(resnet)
    model.add(GlobalAveragePooling2D())
    model.add(Dense(1024, activation='relu'))
    model.add(Dense(7, activation='softmax'))
    return model
This is where I create my TPU model and compile it for training. After I run it, I get the error mentioned above right after epoch 1 starts:
strategy = tf.contrib.tpu.TPUDistributionStrategy(tpu)
trained_model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)
trained_model.compile(optimizer=tf.train.AdagradOptimizer(learning_rate=0.1),
                      loss='sparse_categorical_crossentropy',
                      metrics=['sparse_categorical_accuracy'])

# Work in progress: reading directly from dataset object not yet implemented
# for Keras/TPU. Keras/TPU needs a function that returns a dataset.
history = trained_model.fit(training_input_fn, steps_per_epoch=10, epochs=EPOCHS)
INFO:tensorflow:Querying Tensorflow master (grpc://10.34.91.42:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 5096825871840033721)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 4168719798427690218)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 12924042521108751459)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 2745039220817617241)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 3340897553582653661)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 5742351359072887449)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 8474216619759453218)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 10296052414400763019)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 5559949278042991869)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 13163336187739408258)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 4869688774298217560)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
Epoch 1/5
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name=None), TensorSpec(shape=(128, 135, 180, 3), dtype=tf.float32, name=None), TensorSpec(shape=(128,), dtype=tf.int32, name=None)]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for resnet50_input
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:Default: input_1
ERROR:tensorflow:Operation of type Placeholder (tpu_140081340653808/input_1) is not supported on the TPU. Execution will fail if this op is used in the graph.
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
1658 try:
-> 1659 c_op = c_api.TF_FinishOperation(op_desc)
1660 except errors.InvalidArgumentError as e:
InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 7 for 'tpu_140081340653808/metrics/metrics/sparse_categorical_accuracy/remove_squeezable_dimensions/Squeeze' (op: 'Squeeze') with input shapes: [1024,7].
For sparse_categorical_accuracy, your labels need to be integers, i.e. the shape of your labels must be (batch_size, 1). From your error message, it seems that your labels reaching sparse_categorical_accuracy are one-hot encoded, i.e. of shape (batch_size, 7) instead.
You can see this from the implementation:
# If the shape of y_true is (num_samples, 1), squeeze to (num_samples,)
if (len(K.int_shape(y_true)) == len(K.int_shape(y_pred))):
    y_true = array_ops.squeeze(y_true, [-1])
It's hard to see from your code how exactly your dataset reaches the training stage, but it seems that the label-encoded version stored in dataset['encoded'] is not used during accuracy calculation.
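If the labels reaching the model really are one-hot vectors of length 7, two possible fixes are sketched below under that assumption: map them back to integer class ids before batching, or keep them one-hot and switch to the matching categorical loss and metric.

# Option 1: convert one-hot labels of shape (7,) back to integer class ids
def to_sparse(image, label):
    return image, tf.argmax(label, axis=-1)

dataset = dataset.map(to_sparse)

# Option 2: keep one-hot labels and use the categorical variants instead
trained_model.compile(optimizer=tf.train.AdagradOptimizer(learning_rate=0.1),
                      loss='categorical_crossentropy',
                      metrics=['categorical_accuracy'])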
I have unsuccessfully attempted to implement an Estimator-based TensorFlow model using the TPUEstimator API. It hits an error during training:
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CrossReplicaSum' with these attrs. Registered devices: [CPU], Registered kernels: <no registered kernels>
[[Node: CrossReplicaSum_5 = CrossReplicaSum[T=DT_FLOAT](gradients/dense_2/BiasAdd_grad/tuple/control_dependency_1)]]
There's also a warning at the beginning, though I'm not certain it's relevant:
WARNING:tensorflow:CrossShardOptimizer should be used within a tpu_shard_context, but got unset number_of_shards. Assuming 1.
Here's the relevant part of the model function:
def model_fn(features, labels, mode, params):
    """A simple NN with two hidden layers of 10 nodes each."""
    input_layer = tf.feature_column.input_layer(features, params['feature_columns'])
    dense1 = tf.layers.dense(inputs=input_layer, units=10, activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer())
    dense2 = tf.layers.dense(inputs=dense1, units=10, activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer())
    logits = tf.layers.dense(inputs=dense2, units=4)
    reshaped_logits = tf.reshape(logits, [-1, 1, 4])

    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=4)
    loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=reshaped_logits)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.contrib.tpu.CrossShardOptimizer(tf.train.AdagradOptimizer(learning_rate=0.05))
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
I'm attempting local CPU execution with TPUEstimator by setting the --use_tpu flag to False. The TPUEstimator is instantiated and train is called as follows:
estimator_classifier = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,
    model_dir="/tmp/estimator_classifier_logs",
    config=tf.contrib.tpu.RunConfig(
        session_config=tf.ConfigProto(
            allow_soft_placement=True, log_device_placement=True),
        tpu_config=tf.contrib.tpu.TPUConfig()
    ),
    train_batch_size=DEFAULT_BATCH_SIZE,
    use_tpu=False,
    params={
        'feature_columns': feature_columns
    }
)

tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)

estimator_classifier.train(
    input_fn=data_factory.make_tpu_train_input_fn(train_x, train_y, DEFAULT_BATCH_SIZE),
    steps=DEFAULT_STEPS,
    hooks=[logging_hook]
)
What is the meaning of this error, and how can I troubleshoot it?
The context is not clear. Are you running your job in a Cloud TPU environment, or in some other environment with TPU hardware?
If no, this is expected. TPUEstimator is designed mainly for the Cloud TPU environment, where the backend worker has all kernels linked into the TensorFlow server correctly. CrossReplicaSum is one of the kernels registered for the TPU device (not the CPU).
If yes, did you set your master address correctly? According to the log, it seems your TensorFlow session master does not have the TPU device in it. If you are running the job on a Cloud TPU, you can do:
with tf.Session('<replace_with_your_worker_address>') as sess:
    print(sess.list_devices())
You should see at least one device like "/<some_thing_varies_in_your_env>/device:TPU:0".
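As a rough sketch of that check (assuming a Colab TPU runtime, where the worker address is exposed through the COLAB_TPU_ADDR environment variable; other environments provide the address differently):

import os
import tensorflow as tf

# COLAB_TPU_ADDR is set by the Colab TPU runtime, e.g. "10.34.91.42:8470"
tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']

with tf.Session(tpu_address) as sess:
    print(sess.list_devices())  # should list /job:worker/.../device:TPU:0 through TPU:7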
As per the TensorFlow "Using TPUs" guide:
The CrossShardOptimizer is not compatible with local training. So, to have the same code run both locally and on a Cloud TPU, add lines like the following:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
if FLAGS.use_tpu:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
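Applied to the model_fn from the question, that conditional wrapping might look like the sketch below (FLAGS.use_tpu is assumed to mirror the --use_tpu command-line flag mentioned in the question):

# inside model_fn
if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.05)
    if FLAGS.use_tpu:
        # only wrap with CrossShardOptimizer when actually running on a TPU
        optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())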