PyTorch: Different output on different CUDA devices for FasterRCNN/MaskRCNN
I am trying out a pretrained Faster R-CNN model for object detection in PyTorch and observed strange behavior when running the following code on different CUDA devices.
import io

import torch
import torchvision.transforms as transforms
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from PIL import Image
from torch.autograd import Variable

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Build Faster R-CNN with a ResNet-50 FPN backbone and load the COCO weights
backbone = resnet_fpn_backbone('resnet50', True)
model = FasterRCNN(backbone, num_classes=91)
state_dict = torch.load('/home/ubuntu/state_dicts/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth')
model.load_state_dict(state_dict)
model.to(device)
model.eval()

def pre_process(image_bytes):
    # Decode the raw bytes and convert the image to a float tensor
    my_preprocess = transforms.Compose([transforms.ToTensor()])
    image = Image.open(io.BytesIO(image_bytes))
    image = my_preprocess(image)
    return image

def get_prediction(image_bytes, threshold=0.5):
    # Note: threshold is accepted but not used in this snippet
    tensor = pre_process(image_bytes=image_bytes)
    tensor = Variable(tensor).to(device)
    pred = model([tensor])
    print(pred)

with open("/home/ubuntu/persons.jpg", 'rb') as f:
    image_bytes = f.read()
get_prediction(image_bytes=image_bytes)
The code above returns different output for cuda:0 and cuda:1. Can someone help me understand why the same model gives different output on different CUDA devices on the same machine?
On cuda:0 it returns the same response as it does on the CPU.
I observed the same behavior with the MaskRCNN model.
Output on cuda:0
[{'boxes': tensor([[167.4223, 57.0383, 301.3054, 436.6868],
[ 89.6149, 64.8980, 191.4021, 446.6606],
[362.3454, 161.9877, 515.5366, 385.2343],
[ 67.3742, 277.6379, 111.6810, 400.2647],
[228.7159, 145.8775, 303.5066, 231.1051],
[379.4247, 259.9776, 419.0149, 317.9510],
[517.9014, 149.5500, 636.5953, 365.5251],
[268.9992, 217.2433, 423.9517, 390.4785],
[539.6832, 157.8171, 616.1689, 253.0961],
[477.1378, 147.9255, 611.0255, 297.9276],
[286.6689, 216.3575, 550.4538, 383.1956],
[627.4468, 177.1990, 640.0000, 247.3514],
[ 88.3993, 226.4796, 560.9189, 421.6618],
[406.9602, 261.8285, 453.7620, 357.5365],
[451.3659, 207.4905, 504.6570, 287.6619],
[454.3897, 207.9612, 487.7692, 270.3133],
[451.8828, 208.3855, 631.0622, 355.3239],
[497.1180, 289.9157, 581.5941, 356.1050],
[600.6650, 183.4176, 621.5589, 250.3380],
[559.7050, 202.6747, 608.1462, 250.1502],
[375.3307, 245.6641, 444.8958, 333.0625],
[453.1024, 210.8463, 553.8406, 296.7747],
[555.2745, 199.9524, 611.2347, 250.5636],
[359.7946, 219.5903, 425.5572, 316.5619],
[476.7842, 249.0592, 583.8101, 354.6469],
[ 71.4854, 333.2897, 108.0255, 399.1010],
[207.6522, 121.4260, 301.1808, 251.5350],
[550.4424, 175.4845, 621.4010, 317.4897],
[445.1313, 209.7148, 519.7682, 331.3234],
[523.6974, 193.5186, 548.5457, 234.6627],
[449.0608, 229.3627, 572.3047, 293.8238],
[348.8312, 185.0679, 620.9442, 368.1201],
[578.4594, 232.6871, 586.2761, 246.6013],
[359.9344, 166.1812, 502.6697, 287.2637],
[ 43.1700, 244.8350, 407.5768, 394.7983],
[115.0793, 126.5799, 177.2827, 198.4358],
[476.8102, 147.0127, 566.3655, 260.0383],
[410.9664, 258.0466, 514.5250, 357.0403],
[450.8164, 277.2901, 521.0891, 359.8105],
[ 63.9356, 221.3673, 126.4192, 409.7991],
[625.5704, 189.2636, 640.0000, 256.4739],
[ 1.7555, 174.2491, 86.2912, 436.6681],
[ 65.3964, 274.4007, 106.8389, 349.2521],
[558.3841, 197.9385, 639.8632, 368.0412],
[193.0894, 164.9078, 599.5771, 384.6865],
[269.0641, 126.7004, 324.2201, 146.3630],
[359.1832, 201.2081, 484.3798, 276.5368],
[580.0465, 231.4633, 593.2866, 247.9024],
[454.5699, 142.0131, 634.2507, 258.4456],
[616.1375, 246.1040, 639.7282, 255.8053],
[309.7035, 151.7276, 518.3733, 249.3150],
[615.1505, 246.0356, 639.2537, 255.4936],
[452.0419, 199.0634, 584.8884, 357.6918],
[270.1078, 216.1271, 408.6000, 395.1962],
[564.9176, 199.7667, 606.9827, 245.9028],
[ 1.7000, 279.6961, 92.9089, 393.7010],
[495.4763, 253.3147, 640.0000, 361.1835],
[452.0239, 208.3828, 502.1486, 285.4540],
[554.9769, 214.0762, 601.4109, 248.5285],
[473.0355, 251.5581, 575.2361, 298.9354],
[383.1731, 259.1596, 418.4447, 312.5125],
[265.9569, 143.7254, 640.0000, 311.1364],
[353.1688, 200.4693, 494.6974, 272.1262],
[229.8953, 142.8851, 254.5031, 226.0164]], device='cuda:0',
grad_fn=<StackBackward>), 'labels': tensor([ 1, 1, 1, 31, 31, 31, 1, 15, 1, 1, 15, 1, 15, 31, 62, 62, 15, 15,
1, 18, 31, 62, 1, 31, 15, 31, 31, 1, 31, 32, 15, 15, 77, 1, 15, 27,
1, 31, 31, 31, 62, 64, 31, 1, 15, 15, 62, 77, 15, 15, 15, 67, 62, 62,
27, 64, 15, 15, 31, 15, 44, 15, 15, 31], device='cuda:0'), 'scores': tensor([0.9995, 0.9995, 0.9978, 0.9925, 0.9922, 0.9896, 0.9828, 0.9582, 0.8994,
0.8727, 0.8438, 0.8364, 0.7470, 0.7322, 0.6674, 0.5940, 0.4650, 0.3875,
0.3826, 0.3792, 0.3722, 0.3720, 0.3480, 0.3407, 0.2381, 0.2210, 0.2163,
0.2060, 0.1994, 0.1939, 0.1769, 0.1652, 0.1589, 0.1521, 0.1516, 0.1499,
0.1495, 0.1419, 0.1248, 0.1184, 0.1124, 0.1098, 0.1077, 0.1059, 0.1035,
0.0986, 0.0975, 0.0910, 0.0909, 0.0882, 0.0863, 0.0802, 0.0733, 0.0709,
0.0699, 0.0668, 0.0662, 0.0651, 0.0600, 0.0586, 0.0578, 0.0578, 0.0577,
0.0540], device='cuda:0', grad_fn=<IndexBackward>)}]
Output on cuda:1 (or any GPU other than cuda:0)
[{'boxes': tensor([[218.7705, 0.0000, 640.0000, 491.0000]], device='cuda:1',
grad_fn=<StackBackward>), 'labels': tensor([77], device='cuda:1'), 'scores': tensor([0.0646], device='cuda:1', grad_fn=<IndexBackward>)}]
Related
HuggingFace-Transformers --- NER single sentence/sample prediction
I am trying to predict with the NER model, as in the tutorial from huggingface (it contains only the training+evaluation part). I am following this exact tutorial here: https://github.com/huggingface/notebooks/blob/master/examples/token_classification.ipynb
The training works flawlessly, but the problems that I have begin when I try to predict on a simple sample.
model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
loaded_model = AutoModel.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
input_sentence = "John Nash is a great mathematician, he lives in France"
tokenized_input_sentence = tokenizer([input_sentence], truncation=True, is_split_into_words=False, return_tensors='pt')
predictions = loaded_model(tokenized_input_sentence["input_ids"])[0]
Predictions is of shape (1,13,768).
How can I arrive at the final result of the form [JOHN <-> 'B-PER', … France <-> 'B-LOC'], where B-PER and B-LOC are two ground truth labels, representing the tag for a person and location respectively?
The result of the prediction is:
torch.Size([1, 13, 768])
If I write:
print(predictions.argmax(axis=2))
tensor([613, 705, 244, 620, 206, 206, 206, 620, 620, 620, 477, 693, 308])
I get the tensor above. However, I would have expected to get the tensor representing the ground truth [0…8] labels from the ground truth annotations.
Summary when loading the model:
loading configuration file ./my_model_own_custom_training.pth/config.json
Model config DistilBertConfig {
  "name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForTokenClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights": true,
  "transformers_version": "4.8.1",
  "vocab_size": 30522
}
The answer is a bit trickier than expected [huge credits to Niels Rogge]. Firstly, loading models in huggingface-transformers can be done in (at least) two ways:
AutoModel.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
AutoModelForTokenClassification.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
It seems that, depending on the task at hand, different AutoModel subclasses need to be used. AutoModel loads only the base model without the token-classification head, so its output is the 768-dimensional hidden state per token rather than one logit per label, which is why the argmax above returns values far outside 0 to 8. In the scenario I posted, it is AutoModelForTokenClassification that has to be used. After that, a solution to obtain the predictions would be to do the following:
# forward pass
outputs = model(**encoding)
logits = outputs.logits
predictions = logits.argmax(-1)
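The last step, turning per-token label ids into readable tags, can be sketched in plain Python. This is only an illustration under stated assumptions: the tokens, the logits and the label names in id2label are made up here (the config shown in the question only contains the generic names LABEL_0 to LABEL_8). With a real model, the ids would come from logits.argmax(-1) and the mapping from model.config.id2label.

```python
# Hypothetical label map and per-token logits (made-up numbers, 3 labels only).
id2label = {0: "O", 1: "B-PER", 2: "B-LOC"}
tokens = ["john", "lives", "in", "france"]
logits = [
    [0.1, 2.3, 0.2],
    [3.0, 0.1, 0.4],
    [2.5, 0.3, 0.2],
    [0.2, 0.1, 1.9],
]


def argmax(row):
    """Index of the largest value in a list of numbers."""
    return max(range(len(row)), key=row.__getitem__)


# Pair every token with the label whose logit is highest.
predicted = [(token, id2label[argmax(row)]) for token, row in zip(tokens, logits)]
print(predicted)
```

Note that the tokenizer also emits special tokens and may split a word into several sub-tokens, so a real pipeline would still need to align tokens with words.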
convert_to_generator_like num_samples Attribute Error: 'int' object has no attribute 'shape'
I've written a custom generator using Keras Sequence, but at the end of the first epoch I got:
Attribute Error: Custom Generator object has no attribute 'shape'
Ubuntu 18.04, Cuda 10. Tried Tensorflow 1.13 & 1.14.
Seeing this page: https://github.com/keras-team/keras/issues/12586 I tried changing
from keras.utils import Sequence
to
from tensorflow.python.keras.utils.data_utils import Sequence
but no luck!
class CustomGenerator(Sequence):
    def __init__(self, ....):
        ...
        # Preallocate memory
        if mode == 'train' and self.crop_shape:
            self.X = np.zeros((batch_size, crop_shape[0], crop_shape[1], 4), dtype='float32')  # edge
            # self.X2 = np.zeros((batch_size, crop_shape[1], crop_shape[0], 3), dtype='float32')
            self.Y1 = np.zeros((batch_size, crop_shape[0] // 4, crop_shape[1] // 4, self.n_classes), dtype='float32')

    def on_epoch_end(self):
        # Shuffle dataset for next epoch
        c = list(zip(self.image_path_list, self.label_path_list, self.edge_path_list))
        random.shuffle(c)
        self.image_path_list, self.label_path_list, self.edge_path_list = zip(*c)
        # Fix memory leak (tensorflow.python.keras bug)
        gc.collect()

    def __getitem__(self, index):
        for n, (image_path, label_path, edge_path) in enumerate(
                zip(self.image_path_list[index * self.batch_size:(index + 1) * self.batch_size],
                    self.label_path_list[index * self.batch_size:(index + 1) * self.batch_size],
                    self.edge_path_list[index * self.batch_size:(index + 1) * self.batch_size])):
            image = cv2.imread(image_path, 1)
            label = cv2.imread(label_path, 0)
            edge = cv2.imread(edge_path, 0)
            ....
            self.X[n] = image
            self.Y1[n] = to_categorical(cv2.resize(label, (label.shape[1] // 4, label.shape[0] // 4)), self.n_classes).reshape((label.shape[0] // 4, label.shape[1] // 4, -1))
            self.Y2[n] = to_categorical(cv2.resize(label, (label.shape[1] // 8, label.shape[0] // 8)), self.n_classes).reshape((label.shape[0] // 8, label.shape[1] // 8, -1))
            self.Y3[n] = to_categorical(cv2.resize(label, (label.shape[1] // 16, label.shape[0] // 16)), self.n_classes).reshape((label.shape[0] // 16, label.shape[1] // 16, -1))
        return self.X, [self.Y1, self.Y2, self.Y3]

    def __len__(self):
        return math.floor(len(self.image_path_list) / self.batch_size)

def random_crop(image, edge, label, random_crop_size=(800, 1600)):
    ....
    return image, label
The error is:
742/743 [============================>.] - ETA: 0s - loss: 1.8465 - conv6_cls_loss: 1.1261 - sub24_out_loss: 1.2478 - sub4_out_loss: 1.3827 - conv6_cls_categorical_accuracy: 0.6705 - sub24_out_categorical_accuracy: 0.6250 - sub4_out_categorical_accuracy: 0.5963
Traceback (most recent call last):
  File "/home/user/Desktop/Keras-ICNet/train1.py", line 75, in <module>
    use_multiprocessing=True, shuffle=True, max_queue_size=10, initial_epoch=opt.epoch)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 322, in model_iteration
    steps_name='validation_steps')
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 144, in model_iteration
    shuffle=shuffle)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 480, in convert_to_generator_like
    num_samples = int(nest.flatten(data)[0].shape[0])
AttributeError: 'int' object has no attribute 'shape'
Looking at the stack trace:
num_samples = int(nest.flatten(data)[0].shape[0])
AttributeError: 'int' object has no attribute 'shape'
The data here actually refers to the validation_data parameter passed to fit_generator. This is supposed to be a generator or a tuple. My guess is that it is passed as an array, as a result of which nest.flatten(data)[0] returns an int, hence the error.
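The failure mode described above can be reproduced in isolation. The sketch below is only an illustration: flatten is a simplified, hypothetical stand-in for TensorFlow's nest.flatten (not the real implementation), and FakeArray is a made-up class that carries nothing but a shape attribute.

```python
def flatten(structure):
    """Simplified, hypothetical stand-in for nest.flatten (lists/tuples only)."""
    if isinstance(structure, (list, tuple)):
        flat = []
        for item in structure:
            flat.extend(flatten(item))
        return flat
    return [structure]


class FakeArray:
    """Made-up array-like object that only exposes a shape."""

    def __init__(self, shape):
        self.shape = shape


def num_samples(data):
    # Mirrors the failing line in the traceback: the first leaf must have .shape
    return int(flatten(data)[0].shape[0])


# A tuple of array-like objects works.
print(num_samples((FakeArray((8, 4)), FakeArray((8, 1)))))

# If the first flattened leaf is an int, the same AttributeError appears.
try:
    num_samples((743, FakeArray((8, 1))))
except AttributeError as err:
    print(err)
```

If this guess is right, checking what is actually passed as validation_data (and that it is recognised as a Sequence or generator) would be the first thing to verify.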
Training a Random Forest on Tensorflow
I am trying to train a TensorFlow-based random forest regression on numerical and continuous data. When I try to fit my estimator, it begins with the message below:
INFO:tensorflow:Constructing forest with params =
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'feature_columns': [_NumericColumn(key='Average_Score', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lat', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lng', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)], 'num_classes': 1, 'num_features': 2, 'regression': True, 'bagged_num_features': 2, 'bagged_features': None, 'num_outputs': 1, 'num_output_columns': 2, 'base_random_seed': 0, 'leaf_model_type': 2, 'stats_model_type': 2, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}
Then the process breaks down and I get the ValueError below:
ValueError: Shape must be at least rank 2 but is rank 1 for 'concat' (op: 'ConcatV2') with input shapes: [?], [?], [?], [] and with computed input tensors: input[3] = <1>.
This is the code I am using:
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources
import pandas as pd
from tensorflow.contrib.tensor_forest.client import random_forest
from tensorflow.python.estimator.inputs import numpy_io
import numpy as np

def getFeatures():
    Average_Score = tf.feature_column.numeric_column('Average_Score')
    lat = tf.feature_column.numeric_column('lat')
    lng = tf.feature_column.numeric_column('lng')
    return [Average_Score, lat, lng]

# Import hotel data
Hotel_Reviews = pd.read_csv("./DataMining/Hotel_Reviews.csv")
Hotel_Reviews_Filtered = Hotel_Reviews[(Hotel_Reviews.lat.notnull() | Hotel_Reviews.lng.notnull())]
Hotel_Reviews_Filtered_Target = Hotel_Reviews_Filtered[["Reviewer_Score"]]
Hotel_Reviews_Filtered_Features = Hotel_Reviews_Filtered[["Average_Score", "lat", "lng"]]

# Preprocess the data
x = Hotel_Reviews_Filtered_Features.to_dict('list')
for key in x:
    x[key] = np.array(x[key])
y = Hotel_Reviews_Filtered_Target.values

# Specify params
params = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
    feature_columns=getFeatures(), num_classes=1, num_features=2,
    regression=True, num_trees=10, max_nodes=1000)

# Build the graph
graph_builder_class = tensor_forest.RandomForestGraphs
est = random_forest.TensorForestEstimator(params, graph_builder_class=graph_builder_class)

# Define input function
train_input_fn = numpy_io.numpy_input_fn(x=x, y=y, batch_size=1000, num_epochs=1, shuffle=True)
est.fit(input_fn=train_input_fn, steps=500)
The variable x is a dict of numpy arrays, each of shape (512470,):
{'Average_Score': array([ 7.7, 7.7, 7.7, ..., 8.1, 8.1, 8.1]), 'lat': array([ 52.3605759, 52.3605759, 52.3605759, ..., 48.2037451, 48.2037451, 48.2037451]), 'lng': array([ 4.9159683, 4.9159683, 4.9159683, ..., 16.3356767, 16.3356767, 16.3356767])}
The variable y is a numpy array of shape (512470,1):
array([[ 2.9], [ 7.5], [ 7.1], ..., [ 2.5], [ 8.8], [ 8.3]])
Force each array in x to be two-dimensional by passing ndmin=2 to np.array. Then the shapes should match and concat should be able to operate.
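A small sketch of what ndmin=2 does, using made-up values in place of the hotel data. One detail worth knowing: np.array(values, ndmin=2) prepends the new axis, giving shape (1, N). The transpose to (N, 1) below is an assumption on my part, chosen so that each feature has the same layout as y; whether this resolves the concat error in tensor_forest has not been verified here.

```python
import numpy as np

# Made-up stand-in for the dict built from the DataFrame in the question.
x = {
    "Average_Score": [7.7, 7.7, 8.1],
    "lat": [52.36, 52.36, 48.20],
    "lng": [4.92, 4.92, 16.34],
}

# ndmin=2 alone yields a row vector of shape (1, N).
row = np.array(x["lat"], ndmin=2)
print(row.shape)

# Transposing gives one column per feature, shape (N, 1), like y in the question.
x_2d = {key: np.array(values, ndmin=2).T for key, values in x.items()}
for key, column in x_2d.items():
    print(key, column.shape)
```

An equivalent spelling for the column form is np.asarray(values).reshape(-1, 1).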
How to get all parameters of estimator in PySpark
I have a RandomForestRegressor and a GBTRegressor, and I'd like to get all of their parameters. The only way I found is to call several get methods, like:
from pyspark.ml.regression import RandomForestRegressor, GBTRegressor
est = RandomForestRegressor()
est.getMaxDepth()
est.getSeed()
But RandomForestRegressor and GBTRegressor have different parameters, so it's not a good idea to hardcode all those methods. A workaround could be something like this:
get_methods = [method for method in dir(est) if method.startswith('get')]
params_est = {}
for method in get_methods:
    try:
        key = method[3:]
        params_est[key] = getattr(est, method)()
    except TypeError:
        pass
Then the output will be like this:
params_est
{'CacheNodeIds': False, 'CheckpointInterval': 10, 'FeatureSubsetStrategy': 'auto', 'FeaturesCol': 'features', 'Impurity': 'variance', 'LabelCol': 'label', 'MaxBins': 32, 'MaxDepth': 5, 'MaxMemoryInMB': 256, 'MinInfoGain': 0.0, 'MinInstancesPerNode': 1, 'NumTrees': 20, 'PredictionCol': 'prediction', 'Seed': None, 'SubsamplingRate': 1.0}
But I think there should be a better way to do that.
extractParamMap can be used to get all params from every estimator, for example:
>>> est = RandomForestRegressor()
>>> {param[0].name: param[1] for param in est.extractParamMap().items()}
{'numTrees': 20, 'cacheNodeIds': False, 'impurity': 'variance', 'predictionCol': 'prediction', 'labelCol': 'label', 'featuresCol': 'features', 'minInstancesPerNode': 1, 'seed': -5851613654371098793, 'maxDepth': 5, 'featureSubsetStrategy': 'auto', 'minInfoGain': 0.0, 'checkpointInterval': 10, 'subsamplingRate': 1.0, 'maxMemoryInMB': 256, 'maxBins': 32}
>>> est = GBTRegressor()
>>> {param[0].name: param[1] for param in est.extractParamMap().items()}
{'cacheNodeIds': False, 'impurity': 'variance', 'predictionCol': 'prediction', 'labelCol': 'label', 'featuresCol': 'features', 'stepSize': 0.1, 'minInstancesPerNode': 1, 'seed': -6363326153609583521, 'maxDepth': 5, 'maxIter': 20, 'minInfoGain': 0.0, 'checkpointInterval': 10, 'subsamplingRate': 1.0, 'maxMemoryInMB': 256, 'lossType': 'squared', 'maxBins': 32}
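The dictionary comprehension above can be wrapped in a small helper so the same call works for any estimator. This is a minimal sketch: params_as_dict is a name introduced here, and FakeParam and FakeEstimator are hypothetical stand-ins that only mimic the extractParamMap() interface, so the example runs without a Spark session.

```python
def params_as_dict(estimator):
    """Return {param name: value} for any object exposing extractParamMap()."""
    return {param.name: value for param, value in estimator.extractParamMap().items()}


class FakeParam:
    """Hypothetical stand-in for a Spark ML Param (only the name is modelled)."""

    def __init__(self, name):
        self.name = name


class FakeEstimator:
    """Hypothetical stand-in for an estimator such as RandomForestRegressor."""

    def extractParamMap(self):
        return {FakeParam("maxDepth"): 5, FakeParam("numTrees"): 20}


print(params_as_dict(FakeEstimator()))
```

With PySpark available, the same helper would be called as params_as_dict(RandomForestRegressor()) or params_as_dict(GBTRegressor()).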
As described in "How to print best model params in pyspark pipeline", you can get any model parameter that is available in the original JVM object of any model using the following structure:
<yourModel>.stages[<yourModelStage>]._java_obj.<getYourParameter>()
All get-parameters are available here: https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/classification/RandomForestClassificationModel.html
For example, if you want to get MaxDepth of your RandomForest after cross-validation (getMaxDepth is not available in PySpark), you use:
cvModel.bestModel.stages[-1]._java_obj.getMaxDepth()
How to get the train_scores to plot a learning curve without using the learning_curve function of scikit-learn?
I have a dataset of 21 subjects with a different number of samples each. I made a curve (check the figure). I remove [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40] samples from each subject. I am using StratifiedShuffleSplit with a 90% train_size and 10% test_size. This means:
when I remove 10 samples, 9 will be used for training and 1 for testing
when I remove 20 samples, 18 will be used for training and 2 for testing
when I remove 30 samples, 27 will be used for training and 3 for testing
when I remove 40 samples, 36 will be used for training and 4 for testing
This curve shows the accuracy (test_score) but NOT the train_score. How can I plot the train_score without using the learning_curve function of scikit-learn? http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
The code:
result_list = []
# LOADING .mat FILE
x = sio.loadmat('/home/curve.mat')['x']
s_y = sio.loadmat('/home/rocio/curve.mat')['y']
y = np.ravel(s_y)
# SENDING THE FILE TO PANDAS
df = pd.DataFrame(x)
df['label'] = y
# SPECIFYING THE # OF SAMPLES TO BE REMOVED
for j in [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40]:
    df1 = pd.concat(g.sample(j) for idx, g in df.groupby('label'))
    # TURNING THE DATAFRAME TO ARRAY
    X = df1[[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]].values
    y = df1.label.values
    # Cross-validation
    clf = make_pipeline(preprocessing.RobustScaler(), neighbors.KNeighborsClassifier())
    ####################10x2 SSS####################
    print("Cross-validation:10x10")
    xSSSmean10 = []
    for i in range(10):
        sss = StratifiedShuffleSplit(2, test_size=0.1, random_state=i)
        scoresSSS = model_selection.cross_val_score(clf, X, y, cv=sss.split(X, y))
        xSSSmean10.append(scoresSSS.mean())
    result_list.append(xSSSmean10)
    print("")
StratifiedShuffleSplit.split yields one (train, test) pair of index arrays per split. You can collect the result of sss.split(X, y) into a list, say testtuple. Then you create a new list made only of train sets, traintuple, constructed as follows:
traintuple = [(train, train) for train, test in testtuple]
Then you calculate the accuracy on just the training set:
scoreSSS_train = model_selection.cross_val_score(clf, X, y, cv=traintuple)
In this way both training and testing are performed on the same set. Append the mean of scoreSSS_train to a new empty list just like you do with xSSSmean10 and it should work (I could not test it, sorry).
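A runnable sketch of this idea, under stated assumptions: the data is synthetic (three classes, 40 samples each, 20 features) because the .mat file from the question is not available, and the variable names train_only, train_means and test_means are introduced here. Since split yields (train, test) index pairs, each pair is rebuilt as (train, train) so that the score is computed on the rows the model was fitted on.

```python
import numpy as np
from sklearn import model_selection, neighbors, preprocessing
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (an assumption, not the data from the question).
rng = np.random.RandomState(0)
y = np.repeat([0, 1, 2], 40)
X = rng.normal(size=(120, 20)) + y[:, None]  # shift each class so they differ

clf = make_pipeline(preprocessing.RobustScaler(), neighbors.KNeighborsClassifier())

train_means, test_means = [], []
for i in range(10):
    sss = StratifiedShuffleSplit(n_splits=2, test_size=0.1, random_state=i)
    splits = list(sss.split(X, y))  # [(train_idx, test_idx), ...]
    # Reuse the training indices as the "test" indices to get the train score.
    train_only = [(train, train) for train, _ in splits]
    test_scores = model_selection.cross_val_score(clf, X, y, cv=splits)
    train_scores = model_selection.cross_val_score(clf, X, y, cv=train_only)
    test_means.append(test_scores.mean())
    train_means.append(train_scores.mean())

print(np.mean(train_means), np.mean(test_means))
```

scikit-learn's cross_validate function also accepts return_train_score=True, which returns train and test scores from a single call and may be simpler than building the index pairs by hand.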