PyTorch: Different output on different CUDA devices for FasterRCNN/MaskRCNN

I am trying out a pretrained Faster R-CNN model for object detection in PyTorch and observed weird behavior when executing the following code on different CUDA devices.
import io
import torch
import torchvision.transforms as transforms
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from PIL import Image
from torch.autograd import Variable

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Build the model and load the pretrained COCO weights
backbone = resnet_fpn_backbone('resnet50', True)
model = FasterRCNN(backbone, num_classes=91)
state_dict = torch.load('/home/ubuntu/state_dicts/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth')
model.load_state_dict(state_dict)
model.to(device)
model.eval()

def pre_process(image_bytes):
    my_preprocess = transforms.Compose([transforms.ToTensor()])
    image = Image.open(io.BytesIO(image_bytes))
    image = my_preprocess(image)
    return image

def get_prediction(image_bytes, threshold=0.5):
    tensor = pre_process(image_bytes=image_bytes)
    tensor = Variable(tensor).to(device)
    pred = model([tensor])
    print(pred)

with open("/home/ubuntu/persons.jpg", 'rb') as f:
    image_bytes = f.read()

get_prediction(image_bytes=image_bytes)
The above code returns different output on cuda:0 than on cuda:1. Can someone help me understand this difference in output for the same model on different CUDA devices on the same machine?
On cuda:0 it returns the same response as it does on the CPU.
I observed the same behavior when using a MaskRCNN model.
Output on cuda:0
[{'boxes': tensor([[167.4223, 57.0383, 301.3054, 436.6868],
[ 89.6149, 64.8980, 191.4021, 446.6606],
[362.3454, 161.9877, 515.5366, 385.2343],
[ 67.3742, 277.6379, 111.6810, 400.2647],
[228.7159, 145.8775, 303.5066, 231.1051],
[379.4247, 259.9776, 419.0149, 317.9510],
[517.9014, 149.5500, 636.5953, 365.5251],
[268.9992, 217.2433, 423.9517, 390.4785],
[539.6832, 157.8171, 616.1689, 253.0961],
[477.1378, 147.9255, 611.0255, 297.9276],
[286.6689, 216.3575, 550.4538, 383.1956],
[627.4468, 177.1990, 640.0000, 247.3514],
[ 88.3993, 226.4796, 560.9189, 421.6618],
[406.9602, 261.8285, 453.7620, 357.5365],
[451.3659, 207.4905, 504.6570, 287.6619],
[454.3897, 207.9612, 487.7692, 270.3133],
[451.8828, 208.3855, 631.0622, 355.3239],
[497.1180, 289.9157, 581.5941, 356.1050],
[600.6650, 183.4176, 621.5589, 250.3380],
[559.7050, 202.6747, 608.1462, 250.1502],
[375.3307, 245.6641, 444.8958, 333.0625],
[453.1024, 210.8463, 553.8406, 296.7747],
[555.2745, 199.9524, 611.2347, 250.5636],
[359.7946, 219.5903, 425.5572, 316.5619],
[476.7842, 249.0592, 583.8101, 354.6469],
[ 71.4854, 333.2897, 108.0255, 399.1010],
[207.6522, 121.4260, 301.1808, 251.5350],
[550.4424, 175.4845, 621.4010, 317.4897],
[445.1313, 209.7148, 519.7682, 331.3234],
[523.6974, 193.5186, 548.5457, 234.6627],
[449.0608, 229.3627, 572.3047, 293.8238],
[348.8312, 185.0679, 620.9442, 368.1201],
[578.4594, 232.6871, 586.2761, 246.6013],
[359.9344, 166.1812, 502.6697, 287.2637],
[ 43.1700, 244.8350, 407.5768, 394.7983],
[115.0793, 126.5799, 177.2827, 198.4358],
[476.8102, 147.0127, 566.3655, 260.0383],
[410.9664, 258.0466, 514.5250, 357.0403],
[450.8164, 277.2901, 521.0891, 359.8105],
[ 63.9356, 221.3673, 126.4192, 409.7991],
[625.5704, 189.2636, 640.0000, 256.4739],
[ 1.7555, 174.2491, 86.2912, 436.6681],
[ 65.3964, 274.4007, 106.8389, 349.2521],
[558.3841, 197.9385, 639.8632, 368.0412],
[193.0894, 164.9078, 599.5771, 384.6865],
[269.0641, 126.7004, 324.2201, 146.3630],
[359.1832, 201.2081, 484.3798, 276.5368],
[580.0465, 231.4633, 593.2866, 247.9024],
[454.5699, 142.0131, 634.2507, 258.4456],
[616.1375, 246.1040, 639.7282, 255.8053],
[309.7035, 151.7276, 518.3733, 249.3150],
[615.1505, 246.0356, 639.2537, 255.4936],
[452.0419, 199.0634, 584.8884, 357.6918],
[270.1078, 216.1271, 408.6000, 395.1962],
[564.9176, 199.7667, 606.9827, 245.9028],
[ 1.7000, 279.6961, 92.9089, 393.7010],
[495.4763, 253.3147, 640.0000, 361.1835],
[452.0239, 208.3828, 502.1486, 285.4540],
[554.9769, 214.0762, 601.4109, 248.5285],
[473.0355, 251.5581, 575.2361, 298.9354],
[383.1731, 259.1596, 418.4447, 312.5125],
[265.9569, 143.7254, 640.0000, 311.1364],
[353.1688, 200.4693, 494.6974, 272.1262],
[229.8953, 142.8851, 254.5031, 226.0164]], device='cuda:0',
grad_fn=<StackBackward>), 'labels': tensor([ 1, 1, 1, 31, 31, 31, 1, 15, 1, 1, 15, 1, 15, 31, 62, 62, 15, 15,
1, 18, 31, 62, 1, 31, 15, 31, 31, 1, 31, 32, 15, 15, 77, 1, 15, 27,
1, 31, 31, 31, 62, 64, 31, 1, 15, 15, 62, 77, 15, 15, 15, 67, 62, 62,
27, 64, 15, 15, 31, 15, 44, 15, 15, 31], device='cuda:0'), 'scores': tensor([0.9995, 0.9995, 0.9978, 0.9925, 0.9922, 0.9896, 0.9828, 0.9582, 0.8994,
0.8727, 0.8438, 0.8364, 0.7470, 0.7322, 0.6674, 0.5940, 0.4650, 0.3875,
0.3826, 0.3792, 0.3722, 0.3720, 0.3480, 0.3407, 0.2381, 0.2210, 0.2163,
0.2060, 0.1994, 0.1939, 0.1769, 0.1652, 0.1589, 0.1521, 0.1516, 0.1499,
0.1495, 0.1419, 0.1248, 0.1184, 0.1124, 0.1098, 0.1077, 0.1059, 0.1035,
0.0986, 0.0975, 0.0910, 0.0909, 0.0882, 0.0863, 0.0802, 0.0733, 0.0709,
0.0699, 0.0668, 0.0662, 0.0651, 0.0600, 0.0586, 0.0578, 0.0578, 0.0577,
0.0540], device='cuda:0', grad_fn=<IndexBackward>)}]
Output on cuda:1 (or any GPU other than cuda:0)
[{'boxes': tensor([[218.7705, 0.0000, 640.0000, 491.0000]], device='cuda:1',
grad_fn=<StackBackward>), 'labels': tensor([77], device='cuda:1'), 'scores': tensor([0.0646], device='cuda:1', grad_fn=<IndexBackward>)}]
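A minimal way to check whether the second GPU computes correctly at all, independent of the model, is to compare a plain deterministic computation against a CPU reference; a sketch, assuming both GPUs are visible to the process:

# Sanity check: does each GPU reproduce a CPU matmul?
# Large deviations here point at the device/driver setup, not the model.
import torch
x = torch.randn(512, 512)
ref = x @ x.t()  # CPU reference result
for dev in ("cuda:0", "cuda:1"):
    xd = x.to(dev)
    out = (xd @ xd.t()).cpu()
    print(dev, torch.allclose(ref, out, atol=1e-4))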

Related

HuggingFace-Transformers --- NER single sentence/sample prediction

I am trying to predict with the NER model, as in the tutorial from huggingface (it contains only the training + evaluation part).
I am following this exact tutorial: https://github.com/huggingface/notebooks/blob/master/examples/token_classification.ipynb
The training works flawlessly, but my problems begin when I try to predict on a simple sample.
model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
loaded_model = AutoModel.from_pretrained('./my_model_own_custom_training.pth',
                                         from_tf=False)
input_sentence = "John Nash is a great mathematician, he lives in France"
tokenized_input_sentence = tokenizer([input_sentence],
                                     truncation=True,
                                     is_split_into_words=False,
                                     return_tensors='pt')
predictions = loaded_model(tokenized_input_sentence["input_ids"])[0]
predictions is of shape (1, 13, 768).
How can I arrive at the final result of the form [JOHN <-> 'B-PER', … France <-> 'B-LOC'], where B-PER and B-LOC are two ground-truth labels, representing the tags for a person and a location respectively?
The result of the prediction is:
torch.Size([1, 13, 768])
If I write:
print(predictions.argmax(axis=2))
tensor([613, 705, 244, 620, 206, 206, 206, 620, 620, 620, 477, 693, 308])
I get the tensor above.
However, I would have expected a tensor of the ground-truth labels (in the [0…8] range) from the annotations.
Summary when loading the model:
loading configuration file ./my_model_own_custom_training.pth/config.json
Model config DistilBertConfig {
  "name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForTokenClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights": true,
  "transformers_version": "4.8.1",
  "vocab_size": 30522
}
The answer is a bit trickier than expected [huge credits to Niels Rogge].
Firstly, loading models in huggingface-transformers can be done in (at least) two ways:
AutoModel.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
AutoModelForTokenClassification.from_pretrained('./my_model_own_custom_training.pth', from_tf=False)
It seems that, depending on the task at hand, different AutoModel subclasses need to be used. In the scenario I posted, it is AutoModelForTokenClassification that has to be used.
After that, a solution to obtain the predictions would be to do the following:
# forward pass
outputs = model(**encoding)
logits = outputs.logits
predictions = logits.argmax(-1)
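From there, the predicted ids can be read off as tags per token; a sketch building on the answer's encoding and predictions (note that with the config shown above, id2label only yields the generic LABEL_k names unless real tag names were set on the config before training):

# Map each token to its predicted tag string.
tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
for token, label in zip(tokens, labels):
    print(token, label)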

convert_to_generator_like num_samples Attribute Error: 'int' object has no attribute 'shape'

I've written a custom generator using the Keras Sequence class, but at the end of the first epoch I got:
AttributeError: CustomGenerator object has no attribute 'shape'
Ubuntu 18.04
CUDA 10
Tried TensorFlow 1.13 & 1.14
Seeing this page:
https://github.com/keras-team/keras/issues/12586
I tried changing
from keras.utils import Sequence
to
from tensorflow.python.keras.utils.data_utils import Sequence
but no luck!
class CustomGenerator(Sequence):
    def __init__(self, ....):
        ...
        # Preallocate memory
        if mode == 'train' and self.crop_shape:
            self.X = np.zeros((batch_size, crop_shape[0], crop_shape[1], 4), dtype='float32')
            # edge
            # self.X2 = np.zeros((batch_size, crop_shape[1], crop_shape[0], 3), dtype='float32')
            self.Y1 = np.zeros((batch_size, crop_shape[0] // 4, crop_shape[1] // 4, self.n_classes), dtype='float32')

    def on_epoch_end(self):
        # Shuffle dataset for next epoch
        c = list(zip(self.image_path_list, self.label_path_list, self.edge_path_list))
        random.shuffle(c)
        self.image_path_list, self.label_path_list, self.edge_path_list = zip(*c)
        # Fix memory leak (tensorflow.python.keras bug)
        gc.collect()

    def __getitem__(self, index):
        for n, (image_path, label_path, edge_path) in enumerate(
                zip(self.image_path_list[index * self.batch_size:(index + 1) * self.batch_size],
                    self.label_path_list[index * self.batch_size:(index + 1) * self.batch_size],
                    self.edge_path_list[index * self.batch_size:(index + 1) * self.batch_size])):
            image = cv2.imread(image_path, 1)
            label = cv2.imread(label_path, 0)
            edge = cv2.imread(edge_path, 0)
            ....
            self.X[n] = image
            self.Y1[n] = to_categorical(cv2.resize(label, (label.shape[1] // 4, label.shape[0] // 4)),
                                        self.n_classes).reshape((label.shape[0] // 4, label.shape[1] // 4, -1))
            self.Y2[n] = to_categorical(cv2.resize(label, (label.shape[1] // 8, label.shape[0] // 8)),
                                        self.n_classes).reshape((label.shape[0] // 8, label.shape[1] // 8, -1))
            self.Y3[n] = to_categorical(cv2.resize(label, (label.shape[1] // 16, label.shape[0] // 16)),
                                        self.n_classes).reshape((label.shape[0] // 16, label.shape[1] // 16, -1))
        return self.X, [self.Y1, self.Y2, self.Y3]

    def __len__(self):
        return math.floor(len(self.image_path_list) / self.batch_size)

def random_crop(image, edge, label, random_crop_size=(800, 1600)):
    ....
    return image, label
The error is:
742/743 [============================>.] - ETA: 0s - loss: 1.8465 - conv6_cls_loss: 1.1261 - sub24_out_loss: 1.2478 - sub4_out_loss: 1.3827 - conv6_cls_categorical_accuracy: 0.6705 - sub24_out_categorical_accuracy: 0.6250 - sub4_out_categorical_accuracy: 0.5963Traceback (most recent call last):
File "/home/user/Desktop/Keras-ICNet/train1.py", line 75, in <module>
use_multiprocessing=True, shuffle=True, max_queue_size=10, initial_epoch=opt.epoch)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
steps_name='steps_per_epoch')
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 322, in model_iteration
steps_name='validation_steps')
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 144, in model_iteration
shuffle=shuffle)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 480, in convert_to_generator_like
num_samples = int(nest.flatten(data)[0].shape[0])
AttributeError: 'int' object has no attribute 'shape'
Looking at the stack trace,
num_samples = int(nest.flatten(data)[0].shape[0])
AttributeError: 'int' object has no attribute 'shape'
The data here actually refers to the validation_data parameter passed to fit_generator. It is supposed to be a generator or a tuple of arrays. My guess is that it is being passed in some other form, as a result of which nest.flatten(data)[0] returns an int, hence the error.
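If that guess is right, the fix is to hand fit_generator one of the forms it accepts; a hedged sketch (val_gen and its constructor arguments are hypothetical, mirroring the training generator):

# Pass validation data as a Sequence/generator, or as an (x_val, y_val) tuple.
val_gen = CustomGenerator(...)  # hypothetical: same setup, pointed at validation files
model.fit_generator(train_gen,
                    validation_data=val_gen,
                    validation_steps=len(val_gen),
                    epochs=10,
                    use_multiprocessing=True)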

Training a Random Forest on Tensorflow

I am trying to train a TensorFlow-based random forest regression on numerical and continuous data.
When I try to fit my estimator it begins with the message below:
INFO:tensorflow:Constructing forest with params =
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'feature_columns': [_NumericColumn(key='Average_Score', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lat', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lng', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)], 'num_classes': 1, 'num_features': 2, 'regression': True, 'bagged_num_features': 2, 'bagged_features': None, 'num_outputs': 1, 'num_output_columns': 2, 'base_random_seed': 0, 'leaf_model_type': 2, 'stats_model_type': 2, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}
Then the process breaks down and I get the ValueError below:
ValueError: Shape must be at least rank 2 but is rank 1 for 'concat' (op: 'ConcatV2') with input shapes: [?], [?], [?], [] and with computed input tensors: input[3] = <1>.
This is the code I am using:
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources
import pandas as pd
from tensorflow.contrib.tensor_forest.client import random_forest
from tensorflow.python.estimator.inputs import numpy_io
import numpy as np

def getFeatures():
    Average_Score = tf.feature_column.numeric_column('Average_Score')
    lat = tf.feature_column.numeric_column('lat')
    lng = tf.feature_column.numeric_column('lng')
    return [Average_Score, lat, lng]

# Import hotel data
Hotel_Reviews = pd.read_csv("./DataMining/Hotel_Reviews.csv")
Hotel_Reviews_Filtered = Hotel_Reviews[(Hotel_Reviews.lat.notnull() |
                                        Hotel_Reviews.lng.notnull())]
Hotel_Reviews_Filtered_Target = Hotel_Reviews_Filtered[["Reviewer_Score"]]
Hotel_Reviews_Filtered_Features = Hotel_Reviews_Filtered[["Average_Score", "lat", "lng"]]

# Preprocess the data
x = Hotel_Reviews_Filtered_Features.to_dict('list')
for key in x:
    x[key] = np.array(x[key])
y = Hotel_Reviews_Filtered_Target.values

# Specify params
params = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
    feature_columns=getFeatures(),
    num_classes=1,
    num_features=2,
    regression=True,
    num_trees=10,
    max_nodes=1000)

# Build the graph
graph_builder_class = tensor_forest.RandomForestGraphs
est = random_forest.TensorForestEstimator(
    params, graph_builder_class=graph_builder_class)

# Define input function
train_input_fn = numpy_io.numpy_input_fn(
    x=x,
    y=y,
    batch_size=1000,
    num_epochs=1,
    shuffle=True)

est.fit(input_fn=train_input_fn, steps=500)
The variable x is a dict of numpy arrays, each of shape (512470,):
{'Average_Score': array([ 7.7, 7.7, 7.7, ..., 8.1, 8.1, 8.1]),
'lat': array([ 52.3605759, 52.3605759, 52.3605759, ..., 48.2037451,
48.2037451, 48.2037451]),
'lng': array([ 4.9159683, 4.9159683, 4.9159683, ..., 16.3356767,
16.3356767, 16.3356767])}
The variable y is a numpy array of shape (512470, 1):
array([[ 2.9],
[ 7.5],
[ 7.1],
...,
[ 2.5],
[ 8.8],
[ 8.3]])
Force each array in x to be 2-dimensional using ndmin=2. Then the shapes should match and concat should be able to operate.
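For example (a sketch; the transpose is my addition so the sample axis stays first, since ndmin=2 prepends the new dimension):

# ndmin=2 turns a (512470,) array into (1, 512470); .T makes it (512470, 1),
# matching y's shape so the internal concat sees rank-2 inputs.
for key in x:
    x[key] = np.array(x[key], ndmin=2).T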

How to get all parameters of estimator in PySpark

I have a RandomForestRegressor and a GBTRegressor and I'd like to get all of their parameters. The only way I found is with several get methods, like:
from pyspark.ml.regression import RandomForestRegressor, GBTRegressor
est = RandomForestRegressor()
est.getMaxDepth()
est.getSeed()
But RandomForestRegressor and GBTRegressor have different parameters, so it's not a good idea to hardcode all those methods.
A workaround could be something like this:
get_methods = [method for method in dir(est) if method.startswith('get')]
params_est = {}
for method in get_methods:
    try:
        key = method[3:]
        params_est[key] = getattr(est, method)()
    except TypeError:
        pass
Then the output will look like this:
params_est
{'CacheNodeIds': False,
'CheckpointInterval': 10,
'FeatureSubsetStrategy': 'auto',
'FeaturesCol': 'features',
'Impurity': 'variance',
'LabelCol': 'label',
'MaxBins': 32,
'MaxDepth': 5,
'MaxMemoryInMB': 256,
'MinInfoGain': 0.0,
'MinInstancesPerNode': 1,
'NumTrees': 20,
'PredictionCol': 'prediction',
'Seed': None,
'SubsamplingRate': 1.0}
But I think there should be a better way to do that.
extractParamMap can be used to get all params from every estimator, for example:
>>> est = RandomForestRegressor()
>>> {param[0].name: param[1] for param in est.extractParamMap().items()}
{'numTrees': 20, 'cacheNodeIds': False, 'impurity': 'variance', 'predictionCol': 'prediction', 'labelCol': 'label', 'featuresCol': 'features', 'minInstancesPerNode': 1, 'seed': -5851613654371098793, 'maxDepth': 5, 'featureSubsetStrategy': 'auto', 'minInfoGain': 0.0, 'checkpointInterval': 10, 'subsamplingRate': 1.0, 'maxMemoryInMB': 256, 'maxBins': 32}
>>> est = GBTRegressor()
>>> {param[0].name: param[1] for param in est.extractParamMap().items()}
{'cacheNodeIds': False, 'impurity': 'variance', 'predictionCol': 'prediction', 'labelCol': 'label', 'featuresCol': 'features', 'stepSize': 0.1, 'minInstancesPerNode': 1, 'seed': -6363326153609583521, 'maxDepth': 5, 'maxIter': 20, 'minInfoGain': 0.0, 'checkpointInterval': 10, 'subsamplingRate': 1.0, 'maxMemoryInMB': 256, 'lossType': 'squared', 'maxBins': 32}
As described in How to print best model params in pyspark pipeline, you can get any model parameter that is available in the original JVM object of any model using the following structure:
<yourModel>.stages[<yourModelStage>]._java_obj.<getYourParameter>()
All getter methods are listed here:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/classification/RandomForestClassificationModel.html
For example, if you want to get the MaxDepth of your RandomForest after cross-validation (getMaxDepth is not available in PySpark), you use:
cvModel.bestModel.stages[-1]._java_obj.getMaxDepth()

How to get the train_scores to plot a learning curve without using the learning_curve function of scikit-learn?

I have a dataset of 21 subjects, each with a different number of samples.
I made a curve (check the figure). I remove [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40] samples from each subject. I am using StratifiedShuffleSplit with a 90% train_size and 10% test_size. This means:
when I remove 10 samples, 9 will be used for training and 1 for testing
when I remove 20 samples, 18 will be used for training and 2 for testing
when I remove 30 samples, 27 will be used for training and 3 for testing
when I remove 40 samples, 36 will be used for training and 4 for testing
This curve shows the accuracy (test_score) but NOT the train_score.
How can I plot the train_score without using the learning_curve function of scikit-learn? http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
The code:
result_list = []

# LOADING .mat FILE
x = sio.loadmat('/home/curve.mat')['x']
s_y = sio.loadmat('/home/rocio/curve.mat')['y']
y = np.ravel(s_y)

# SENDING THE FILE TO PANDAS
df = pd.DataFrame(x)
df['label'] = y

# SPECIFYING THE # OF SAMPLES TO BE REMOVED
for j in [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40]:
    df1 = pd.concat(g.sample(j) for idx, g in df.groupby('label'))

    # TURNING THE DATAFRAME TO ARRAY
    X = df1[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]].values
    y = df1.label.values

    # Cross-validation
    clf = make_pipeline(preprocessing.RobustScaler(), neighbors.KNeighborsClassifier())

    #################### 10x2 SSS ####################
    print("Cross-validation: 10x10")
    xSSSmean10 = []
    for i in range(10):
        sss = StratifiedShuffleSplit(2, test_size=0.1, random_state=i)
        scoresSSS = model_selection.cross_val_score(clf, X, y, cv=sss.split(X, y))
        xSSSmean10.append(scoresSSS.mean())
    result_list.append(xSSSmean10)
    print("")
StratifiedShuffleSplit.split returns two values: train and test. You can assign the value resulting from sss.split(X, y) to a tuple, say testtuple. Then you create a new tuple made only of train sets, traintuple, constructed as follows:
traintuple = (testtuple[0], testtuple[0])
Then you calculate the accuracy on just the training set:
scoreSSS_train = model_selection.cross_val_score(clf, X, y, cv=traintuple)
This way both training and testing are performed on the same set.
Append the mean of scoreSSS_train to a new empty list, just like you do with xSSSmean10, and it should work (I could not test it, sorry).
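Alternatively, an explicit sketch of the same idea that fits each split and scores it on both the training and the test indices (equally untested):

# Manual loop: collect train and test accuracy per random_state.
xSSSmean10_train, xSSSmean10_test = [], []
for i in range(10):
    sss = StratifiedShuffleSplit(2, test_size=0.1, random_state=i)
    train_scores, test_scores = [], []
    for train_idx, test_idx in sss.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        train_scores.append(clf.score(X[train_idx], y[train_idx]))
        test_scores.append(clf.score(X[test_idx], y[test_idx]))
    xSSSmean10_train.append(np.mean(train_scores))
    xSSSmean10_test.append(np.mean(test_scores))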
