Optimising FOR LOOP or alternative to it - python-3.x

I am using a FOR LOOP to calculate a simple probability on a dataset with approximately 500K rows of data.
For loop
class_ = 4
class_freq = Counter(list_[-1] for list_ in train_list) # Counter({5: 1476, 1: 1531, 4: 1562, 3: 1430, 2: 1498, 7: 1517, 6: 1486})
def cp(x, class_, class_freq):  # x is a column index passed from another function
    pos = 0
    neg = 0
    for row in train_list:
        if row[x] == 1 and row[54] == class_:
            pos += 1
        else:
            neg += 1
    cal_0 = (neg + 0.1) / (class_freq[class_] + 0.2)
    cal_1 = (pos + 0.1) / (class_freq[class_] + 0.2)
    if cal_1 > cal_0:
        return cal_1
    else:
        return cal_0
Train_list sample
[3050, 180, 4, 277, -3, 5782, 221, 242, 156, 2721, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[2818, 119, 19, 30, 10, 5213, 248, 220, 92, 4497, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
[3182, 115, 10, 553, 10, 4605, 237, 231, 124, 1768, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5]
[3024, 312, 18, 474, 177, 5785, 169, 224, 194, 4961, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
[3067, 32, 4, 30, -2, 6679, 219, 230, 147, 2947, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]
[2716, 1, 10, 234, 27, 2100, 206, 222, 153, 5581, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]
...
The FOR LOOP works well on a small dataset (a few hundred rows), as expected. Unfortunately, when I try to use it on 20K rows of data, the processing takes ages. I cannot imagine how long it would take to run on 500K rows of data.
A FOR LOOP performs poorly on a large dataset. What is an alternative to it? Will a lambda improve processing speed? I appreciate advice and assistance here, thanks.
Edited:
Thanks to everyone's comments, I have worked on another algorithm to replace the FOR LOOP.
def cp(x, class_, class_freq):
    filtered_list = [t for t in train_list if t[54] == class_]
    count_binary = Counter(binary[x] for binary in filtered_list)
    binary_1 = count_binary[1]
    binary_0 = count_binary[0]
    cal_0 = (binary_0 + 0.1) / (class_freq[class_] + 0.2)
    cal_1 = (binary_1 + 0.1) / (class_freq[class_] + 0.2)
    if cal_1 > cal_0:
        return cal_1
    else:
        return cal_0
I am still running the above code in my program and the process is not done yet, so I can't tell whether it is much more efficient. I would appreciate opinions on this new block of code.
FYI, if this is indeed better and more efficient code, then the processing-speed issue most likely lies in other parts of my code.
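For reference, a minimal vectorized sketch of the same calculation, assuming NumPy is available and that the binary columns only ever hold 0 or 1; it mirrors the logic of the edited version above (cp_vec is a placeholder name):
import numpy as np

# One-time conversion; every row has the same 55 columns as the sample above.
train_arr = np.asarray(train_list)

def cp_vec(x, class_, class_freq):
    # Boolean mask of rows belonging to the requested class (label in column 54).
    in_class = train_arr[:, 54] == class_
    # Within that class, count rows whose column x is 1 versus 0.
    binary_1 = int(np.count_nonzero(train_arr[in_class, x] == 1))
    binary_0 = int(np.count_nonzero(in_class)) - binary_1
    cal_0 = (binary_0 + 0.1) / (class_freq[class_] + 0.2)
    cal_1 = (binary_1 + 0.1) / (class_freq[class_] + 0.2)
    return max(cal_0, cal_1)
This replaces the per-row Python loop with two array comparisons, which is typically orders of magnitude faster; a lambda by itself is just another function and would not change the running time.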

Related

How can I draw the confusion matrix when using image_dataset_from_directory in TensorFlow 2.x?

My TF version is 2.9 and Python 3.8.
I have built an image binary classification CNN model and I am trying to get a confusion matrix.
The dataset structure is as follows.
train/
│------ benign/
│------ normal/
test/
│------ benign/
│------ normal/
The dataset configuration is as follows.
train_ds = tf.keras.utils.image_dataset_from_directory(
    directory=train_data_dir,
    labels="inferred",
    validation_split=0.2,
    subset="training",
    seed=1337,
    color_mode='grayscale',
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    directory=train_data_dir,
    labels="inferred",
    validation_split=0.2,
    subset="validation",
    seed=1337,
    color_mode='grayscale',
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    directory=test_data_dir,
    color_mode='grayscale',
    seed=1337,
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
I wrote the code below, referring to the following link, to get the confusion matrix.
Reference Page
And this is my code for the confusion matrix.
predictions = model.predict(test_ds)
y_pred = []
y_true = []
# iterate over the dataset
for image_batch, label_batch in test_ds:  # use dataset.unbatch() with repeat
    # append true labels
    y_true.append(label_batch)
    # compute predictions
    preds = model.predict(image_batch)
    # append predicted labels
    y_pred.append(np.argmax(preds, axis=-1))
# convert the true and predicted labels into tensors
true_labels = tf.concat([item for item in y_true], axis=0)
predicted_labels = tf.concat([item for item in y_pred], axis=0)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(true_labels, predicted_labels)
print(cm)
y_pred and y_true were obtained from test_ds as above, and the resulting confusion matrix was as follows.
[[200   0]
 [200   0]]
So I printed true_labels and predicted_labels, and confirmed that the predicted labels are all 0, as follows.
print(true_labels)
<tf.Tensor: shape=(400,), dtype=int32, numpy=
array([0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,
0, 0, 1, 1])>
print(predicted_labels)
<tf.Tensor: shape=(400,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0], dtype=int64)>
I'm not sure why the predicted labels are all zero.
But this is wrong. I think the following result would be correct:
[[200   0]
 [  0 200]]
What is wrong? I've been struggling with this for a few days. Please help me.
Thanks a lot.
In the case of image binary classification, a threshold should be used to obtain the predicted label after model.predict(test_ds). I found that changing y_pred.append(np.argmax(preds, axis=-1)) from my question to y_pred.append(np.where(preds > threshold, 1, 0)) solved the problem. I hope this helps someone.
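A minimal sketch of the corrected loop under that answer's assumption of a single sigmoid output unit (the 0.5 cut-off is an assumed value for threshold, not something the answer fixes):
import numpy as np
from sklearn.metrics import confusion_matrix

threshold = 0.5  # assumed cut-off for the sigmoid probability
y_true, y_pred = [], []
for image_batch, label_batch in test_ds:
    y_true.append(label_batch.numpy())
    preds = model.predict(image_batch)
    # With one sigmoid unit, np.argmax(preds, axis=-1) is always 0;
    # thresholding the probability yields the actual predicted class.
    y_pred.append(np.where(preds > threshold, 1, 0).flatten())

cm = confusion_matrix(np.concatenate(y_true), np.concatenate(y_pred))
print(cm)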

How can I declare constraints in Xpress IVE?

I am trying to write a model in Xpress IVE; however, I get the following error:
error101: Incompatible types for operator ('mpvar' * 'mpvar' not defined).
I tried to write this constraint but couldn't make it work:
The two consecutive characters of the string must be positioned on neighboring nodes of the grid.
I think my model is correct and all of my decision variables are correct.
Can anyone help me with this issue?
Here is my code:
grid := 16
length := 8
!sample declarations section
declarations
  ! Declaring S and N arrays for the input
  S: array(1..length) of integer
  N: array(1..grid,1..grid) of integer
  ! Declaring decision variables
  X: array(1..length, 1..grid) of mpvar
  V: array(1..grid) of mpvar
  C: array(1..grid,1..grid) of mpvar
  W: real
  constraint1, constraint2, constraint3: linctr
end-declarations
! Decision Variable Declaration
forall(i in 1..length, k in 1..grid) X(i,k) is_binary
forall(k in 1..grid) V(k) is_binary
forall(l in 1..grid) V(l) is_binary
forall(k in 1..grid, l in 1..grid) C(k,l) is_binary
!Input String
S:: [ 1, 0, 0, 1, 0, 1, 1, 0 ]
! Neighbours in the grid.
N:: [ 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
! Finding consecutive 1's in the string
forall(i in 1..length-1) do
  if S(i) = 1 and S(i+1) = 1 then
    W := W + 1
  end-if
end-do
! Declaring Constraints
! Constraint 1
forall(k in 1..grid) constraint1 := sum(i in 1..length) X(i,k) <= 1
! Constraint 2
forall(i in 1..length) constraint2 := sum(k in 1..grid) X(i,k) = 1
!Constraint 3
forall( i in 1..length - 1 ) constraint3 := (sum(j in 1..grid)(sum(k in 1..grid) N(k,j) * X(i,k) * X(i + 1,j))) = 1
Since you are creating the product of two decision variables in Constraint 3, your problem is no longer linear but quadratic (and thus non-linear). This means you have to use the mmnl (non-linear) Mosel module. Putting
uses "mmnl"
at the top of your model should do that; it enables multiplication of decision variables.
Note that, due to the quadratic terms in it, your Constraint 3 will no longer be of type linctr. It will now be nlctr, and you have to adjust this in the declarations.

Unable to use custom dataset: AttributeError: 'list' object has no attribute 'keys'

I am trying to train a classification model on a custom dataset using Huggingface Transformers, but I keep getting errors. The last error seems solvable, but somehow I do not understand how.
What am I doing wrong?
I encode data with
model_name = "dbmdz/bert-base-italian-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case = True)
def encode_data(texts):
    return tokenizer.batch_encode_plus(
        texts,
        add_special_tokens=True,
        return_attention_mask=True,
        padding=True,
        truncation=True,
        max_length=200,
        return_tensors='pt'
    )
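As a quick sanity check (the sample texts below are placeholders), batch_encode_plus with return_tensors='pt' returns a dict-like BatchEncoding of stacked tensors:
enc = encode_data(["un esempio", "un altro esempio"])
print(list(enc.keys()))        # ['input_ids', 'token_type_ids', 'attention_mask']
print(enc['input_ids'].shape)  # torch.Size([2, <padded length>])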
Then I create my datasets with
import torch
class my_Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = torch.tensor(labels)

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item['labels'] = self.labels[idx]
        print(item)
        return item

    def __len__(self):
        return len(self.labels)
So I have
encoded_data_train = encode_data(df_train['text'].tolist())
encoded_data_val = encode_data(df_val['text'].tolist())
encoded_data_test = encode_data(df_test['text'].tolist())
dataset_train = my_Dataset(encoded_data_train, df_train['labels'].tolist())
dataset_val = my_Dataset(encoded_data_val, df_val['labels'].tolist())
dataset_test = my_Dataset(encoded_data_test, df_test['labels'].tolist())
Then I initiate my Trainer with
from transformers import AutoConfig, TrainingArguments, DataCollatorWithPadding, Trainer
training_args = TrainingArguments(
    output_dir='/trial',
    learning_rate=1e-6,
    do_train=True,
    do_eval=True,
    evaluation_strategy='epoch',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=0,
    weight_decay=0.2,
    logging_dir="./logs",
)
num_labels = len(label_dict)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=DataCollatorWithPadding(tokenizer),
    tokenizer=tokenizer,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
)
and finally I train
trainer.train()
Here is the error I get
AttributeErrorTraceback (most recent call last)
<ipython-input-22-5d018b4b061d> in <module>
----> 1 trainer.train()
/opt/conda/lib/python3.8/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
1032 self.control = self.callback_handler.on_epoch_begin(self.args, self.state, self.control)
1033
-> 1034 for step, inputs in enumerate(epoch_iterator):
1035
1036 # Skip past any already trained steps if resuming training
/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
433 if self._sampler_iter is None:
434 self._reset()
--> 435 data = self._next_data()
436 self._num_yielded += 1
437 if self._dataset_kind == _DatasetKind.Iterable and \
/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
473 def _next_data(self):
474 index = self._next_index() # may raise StopIteration
--> 475 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
476 if self._pin_memory:
477 data = _utils.pin_memory.pin_memory(data)
/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
45 else:
46 data = self.dataset[possibly_batched_index]
---> 47 return self.collate_fn(data)
/opt/conda/lib/python3.8/site-packages/transformers/data/data_collator.py in __call__(self, features)
116
117 def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
--> 118 batch = self.tokenizer.pad(
119 features,
120 padding=self.padding,
/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in pad(self, encoded_inputs, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, verbose)
2558 if self.model_input_names[0] not in encoded_inputs:
2559 raise ValueError(
-> 2560 "You should supply an encoding or a list of encodings to this method"
2561 f"that includes {self.model_input_names[0]}, but you provided {list(encoded_inputs.keys())}"
2562 )
AttributeError: 'list' object has no attribute 'keys'
What am I doing wrong?
I also tried using
import torch
from torch.utils.data import TensorDataset
dataset_train = TensorDataset(encoded_data_train['input_ids'], encoded_data_train['attention_mask'], torch.tensor(df_train['labels'].tolist()))
dataset_test = TensorDataset(encoded_data_test['input_ids'], encoded_data_test['attention_mask'], torch.tensor(df_test['labels'].tolist()))
dataset_val = TensorDataset(encoded_data_val['input_ids'], encoded_data_val['attention_mask'], torch.tensor(df_val['labels'].tolist()))
but I get the same error. I am using torch == 1.7.1 and transformers == 4.4.2.
EDIT FOLLOWING FIRST COMMENT.
Here is an example of encoded_data_train for five samples:
{'input_ids': tensor([[ 102, 927, 9534, 30936, 2729, 29505, 123, 11805, 7427, 10587,
9703, 927, 9534, 30936, 2719, 10118, 2321, 784, 366, 113,
3627, 7763, 9433, 223, 148, 30937, 4051, 3400, 4011, 20005,
6079, 784, 366, 7809, 11967, 192, 3497, 784, 366, 7809,
11967, 192, 3497, 784, 366, 7809, 11967, 192, 3497, 784,
366, 7809, 11967, 192, 3497, 714, 927, 9534, 30936, 2729,
29505, 123, 11805, 7427, 260, 480, 1556, 152, 7113, 20734,
151, 143, 784, 366, 113, 3627, 7763, 19638, 159, 1233,
1674, 5442, 119, 9433, 223, 148, 30937, 135, 642, 829,
2250, 223, 743, 151, 143, 14572, 13799, 1767, 28915, 12057,
12342, 784, 366, 113, 9703, 927, 9534, 30936, 9480, 10125,
8418, 3726, 8379, 2955, 119, 1006, 30946, 8897, 123, 6423,
115, 1601, 544, 30938, 3013, 160, 30941, 137, 124, 14118,
30936, 193, 2701, 19214, 1457, 2701, 1864, 409, 19727, 13305,
6423, 115, 10389, 13908, 127, 4092, 14079, 1601, 2009, 24286,
23419, 103],
[ 102, 10587, 2130, 182, 8022, 2719, 10118, 132, 30976, 30943,
17961, 5123, 3292, 3627, 11532, 2719, 10118, 132, 30976, 30943,
17961, 5123, 3292, 3627, 11532, 2719, 10118, 201, 17961, 5123,
3292, 3627, 11532, 6354, 480, 1556, 28951, 17586, 113, 12699,
135, 480, 1556, 7347, 677, 135, 3110, 103, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 102, 2719, 10118, 6729, 6530, 10754, 11752, 10272, 11752, 119,
4200, 209, 30944, 19919, 2201, 5754, 642, 838, 15657, 6156,
30941, 148, 30937, 2201, 7305, 642, 6331, 3348, 30937, 170,
148, 30937, 2463, 103, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 102, 780, 30938, 18834, 2336, 2719, 10118, 8823, 784, 366,
113, 135, 1543, 2080, 1233, 20734, 316, 1767, 1542, 2771,
152, 25899, 119, 8823, 119, 4472, 784, 366, 113, 137,
1031, 510, 7763, 123, 21478, 3200, 111, 985, 119, 1670,
4999, 290, 30941, 119, 6951, 12042, 106, 1542, 135, 245,
30942, 26609, 199, 983, 119, 261, 28040, 8142, 148, 30937,
150, 143, 917, 1621, 7161, 111, 26609, 8217, 3723, 12510,
290, 30941, 119, 8886, 30934, 9798, 106, 204, 30942, 5807,
155, 1176, 213, 12057, 189, 387, 4953, 214, 2643, 4429,
123, 11224, 3096, 193, 143, 8823, 387, 2353, 2009, 193,
982, 176, 18789, 299, 8292, 553, 9798, 8886, 30934, 20853,
490, 4802, 19222, 642, 3829, 1455, 26321, 167, 148, 30937,
11498, 123, 103, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 102, 10587, 491, 5462, 7664, 22790, 2719, 10118, 8498, 408,
24484, 112, 491, 5462, 7664, 22790, 3671, 135, 341, 1011,
299, 18239, 113, 143, 575, 8498, 265, 669, 113, 3850,
16465, 480, 283, 28951, 810, 21223, 103, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]])}
and accordingly the result of dataset_train.__getitem__(0)
{'input_ids': tensor([ 102, 927, 9534, 30936, 2729, 29505, 123, 11805, 7427, 10587,
9703, 927, 9534, 30936, 2719, 10118, 2321, 784, 366, 113,
3627, 7763, 9433, 223, 148, 30937, 4051, 3400, 4011, 20005,
6079, 784, 366, 7809, 11967, 192, 3497, 784, 366, 7809,
11967, 192, 3497, 784, 366, 7809, 11967, 192, 3497, 784,
366, 7809, 11967, 192, 3497, 714, 927, 9534, 30936, 2729,
29505, 123, 11805, 7427, 260, 480, 1556, 152, 7113, 20734,
151, 143, 784, 366, 113, 3627, 7763, 19638, 159, 1233,
1674, 5442, 119, 9433, 223, 148, 30937, 135, 642, 829,
2250, 223, 743, 151, 143, 14572, 13799, 1767, 28915, 12057,
12342, 784, 366, 113, 9703, 927, 9534, 30936, 9480, 10125,
8418, 3726, 8379, 2955, 119, 1006, 30946, 8897, 123, 6423,
115, 1601, 544, 30938, 3013, 160, 30941, 137, 124, 14118,
30936, 193, 2701, 19214, 1457, 2701, 1864, 409, 19727, 13305,
6423, 115, 10389, 13908, 127, 4092, 14079, 1601, 2009, 24286,
23419, 103]), 'token_type_ids': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1]), 'labels': tensor(5)}
The error is in the __getitem__ function. As you can see from dataset_train.__getitem__(0), we get a dictionary with input_ids and all the other keys. The fix below worked for me:
def __getitem__(self, idx):
    # Index with idx so a single example is returned rather than the whole batch;
    # the encoded values are already tensors, so no extra torch.tensor() is needed.
    input_ids = self.encodings['input_ids'][idx]
    target_ids = self.labels[idx]
    return {"input_ids": input_ids, "labels": target_ids}

How to delete values from a list using a for loop in my code below?

b=[2, 3, 0, 5, 0, 7, 0, 0, 0, 11, 0, 13, 0, 0, 0, 17, 0, 19, 0, 0, 0, 23, 0, 0, 0, 0, 0, 29, 0, 31, 0, 0, 0, 0, 0, 37, 0, 0, 0, 41, 0, 43, 0, 0, 0, 47, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 59, 0, 61, 0, 0, 0, 0, 0, 67, 0, 0, 0, 71, 0, 73, 0, 0, 0, 0, 0, 79, 0, 0, 0, 83, 0, 0, 0, 0, 0, 89, 0, 0, 0, 0, 0, 0, 0, 97, 0, 0, 0]
#b is a list of primes with composite numbers turned into zero!
#code to remove zeroes
for k in b:
    if k == 0:
        del k
print(b)
I'm open to any suggestions that are reasonably simple, since I'm a beginner and self-taught. If there's anything I'm doing horribly wrong, please do point that out as well. Thank you.
del k only unbinds the loop variable k, which is rebound on the next iteration anyway; it never touches the list itself, so b is left unchanged. I would use a list comprehension to remove all occurrences of 0:
b=[2, 3, 0, 5, 0, 7, 0, 0, 0, 11, 0, 13, 0, 0, 0, 17, 0, 19, 0, 0, 0, 23, 0, 0, 0, 0, 0, 29, 0, 31, 0, 0, 0, 0, 0, 37, 0, 0, 0, 41, 0, 43, 0, 0, 0, 47, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 59, 0, 61, 0, 0, 0, 0, 0, 67, 0, 0, 0, 71, 0, 73, 0, 0, 0, 0, 0, 79, 0, 0, 0, 83, 0, 0, 0, 0, 0, 89, 0, 0, 0, 0, 0, 0, 0, 97, 0, 0, 0]
#b is a list of primes with composite numbers turned into zero!
# Iterates through all the values and keeps them unless they are 0
b = [x for x in b if x != 0]
print(b)
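If b has to be modified in place instead of rebound (for example, when other names reference the same list object), a slice assignment achieves the same result:
# Same filtering, but the original list object is mutated in place.
b[:] = [x for x in b if x != 0]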

Where should I put the csv test file in PyTorch dataloader?

Let's say I have test.csv:
  filename
1 a.jpg
2 b.jpg
Then I have a test image folder:
/test
test_dataset = torchvision.datasets.ImageFolder(root=path + 'test/', transform=trans)
This will bring in all the test files.
If I want to make a submission file after training is done, how should I link the test folder's file names to the submission.csv file names?
%%time
from torch.autograd import Variable
results = []
with torch.no_grad():
    model.eval()
    print('start')
    for num, data in enumerate(test_loader):
        #print(num)
        imgs, label = data
        imgs, labels = imgs.to(device), label.to(device)
        test = Variable(imgs)
        output = model(test)
        ps = torch.exp(output)
        top_p, top_class = ps.topk(1, dim=1)
        #print(top_class)
        results += top_class.cpu().numpy().tolist()
predictions = np.array(results).flatten()
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
How should I know which result is from which file?
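One possible way to pair them, sketched under the assumptions that test_loader was built with shuffle=False and that the submission columns are named filename and label (adjust to whatever the competition expects): torchvision's ImageFolder keeps every (path, class_index) pair it discovered in its samples attribute, in the same order the DataLoader visits them.
import os
import pandas as pd

# ImageFolder stores (path, class_index) pairs in discovery order;
# with DataLoader(..., shuffle=False) predictions come back in that order.
filenames = [os.path.basename(path) for path, _ in test_dataset.samples]

submission = pd.DataFrame({'filename': filenames, 'label': predictions})
submission.to_csv('submission.csv', index=False)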
