Running out of RAM in Google Colab - keras

I run the following code in Colab. My array is not very long and the length of the data is limited, yet after about 50,000 iterations the session runs out of RAM: memory usage climbs steadily until it hits the limit. Could you please help?
import numpy as np

datamigrantsentiment = np.empty(len(data), dtype=int)

for i in range(0, len(data)):
    if i % 1000 == 0:
        print(i)
        !rm -rf model
        model = build_model(base_layers.PREDICT)
        model.load_weights(path + 'goemotions/model_checkpoint')
    if datamigrantslistnort[i]:
        results = model.predict(x=[data[i]], verbose=0)
        # print('')
        # print('{}:'.format(text))
        labels = np.flip(np.argsort(results[0]))
        label = LABELS[labels[0]]
        label = SENTIMENT_MAP[label]
        # print('{}: {}'.format(label, results[0][labels[0]]))
        datasentiment[i] = label
What's the problem with the code?
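A likely culprit, sketched below rather than a verified fix: every call to build_model creates a new Keras model without releasing the previous one, so rebuilding and reloading inside the loop accumulates memory. Building and loading once before the loop (or clearing the Keras session before each rebuild) usually keeps RAM flat. The names build_model, base_layers, path, LABELS, SENTIMENT_MAP and the data arrays are taken from the snippet above.

import numpy as np
import tensorflow as tf  # only needed here for clear_session

# Build and load the model once, outside the loop.
model = build_model(base_layers.PREDICT)
model.load_weights(path + 'goemotions/model_checkpoint')

datasentiment = np.empty(len(data), dtype=int)

for i in range(len(data)):
    if i % 1000 == 0:
        print(i)
        # If the model really must be reloaded, free the old graph first:
        # tf.keras.backend.clear_session()
        # model = build_model(base_layers.PREDICT)
        # model.load_weights(path + 'goemotions/model_checkpoint')
    if datamigrantslistnort[i]:
        results = model.predict(x=[data[i]], verbose=0)
        best = np.flip(np.argsort(results[0]))[0]
        datasentiment[i] = SENTIMENT_MAP[LABELS[best]]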


How long does load_dataset take in Hugging Face?

I want to pre-train a T5 model using huggingface. The first step is training the tokenizer with this code:
import datasets
from t5_tokenizer_model import SentencePieceUnigramTokenizer

vocab_size = 32_000
input_sentence_size = None

# Initialize a dataset
dataset = datasets.load_dataset("oscar", name="unshuffled_deduplicated_fa", split="train")
tokenizer = SentencePieceUnigramTokenizer(unk_token="<unk>", eos_token="</s>", pad_token="<pad>")

# Build an iterator over this dataset
def batch_iterator(input_sentence_size=None):
    if input_sentence_size is None:
        input_sentence_size = len(dataset)
    batch_length = 100
    for i in range(0, input_sentence_size, batch_length):
        yield dataset[i: i + batch_length]["text"]

# Train tokenizer
tokenizer.train_from_iterator(
    iterator=batch_iterator(input_sentence_size=input_sentence_size),
    vocab_size=vocab_size,
    show_progress=True,
)

# Save files to disk
tokenizer.save("./persian-t5-base/tokenizer.json")
For the downloading part the message is:
Downloading and preparing dataset oscar/unshuffled_deduplicated_fa (download: 9.74 GiB, generated: 37.24 GiB, post-processed: Unknown size, total: 46.98 GiB) to /root/.cache/huggingface/datasets/oscar/unshuffled_deduplicated_fa/1.0.0/...
I am running it on Google Colab Pro (with the High-RAM setting and on a TPU). However, after about 2 hours the execution is still stuck on load_dataset.
What is it doing? Is it normal for load_dataset to take this long? Should I interrupt it and run it again?
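A sketch of one alternative, assuming the same OSCAR config as above: the datasets library's streaming mode (streaming=True is a real load_dataset parameter) yields examples on the fly instead of downloading and generating the full ~47 GiB on disk first, so the tokenizer training can start almost immediately. With a streamed IterableDataset there is no len(), so the batching has to be done manually; the resulting iterator can be passed to train_from_iterator as before.

import datasets

# Stream the corpus instead of materializing it fully on disk.
stream = datasets.load_dataset(
    "oscar", name="unshuffled_deduplicated_fa", split="train", streaming=True
)

def batch_iterator(batch_length=100):
    batch = []
    for example in stream:
        batch.append(example["text"])
        if len(batch) == batch_length:
            yield batch
            batch = []
    if batch:
        yield batch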

Receiving coordinates from inference Pytorch

I'm trying to get the coordinates of the pixels inside a mask generated by Detectron2's DefaultPredictor, so that I can later derive the polygon corners and use them in my application.
However, DefaultPredictor produces a tensor of pred_masks in the following format: [[False, False, ..., False], ..., [False, False, ..., False]]
The length of each individual list is the width of the image, and the number of lists is the height of the image.
Since I need the pixel coordinates that lie inside the mask, the simple solution seemed to be looping through pred_masks, checking each value, and if it is True creating a tuple of the coordinates and appending it to a list. However, for images of about 3200 x 1600 pixels this is relatively slow (~4 seconds to loop through a single 3200 x 1600 mask), and since there are quite a few objects I need inference for in the end, this becomes incredibly slow overall.
What would be a smarter way to get the coordinates (mask) of a detected object using the PyTorch (Detectron2) model?
Please find my code below for reference:
from __future__ import print_function
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances
import cv2
import time

# get image
start = time.time()
im = cv2.imread("inputImage.jpg")

# Create config
cfg = get_cfg()
cfg.merge_from_file("detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # Set threshold for this model
cfg.MODEL.WEIGHTS = "model_final.pth"  # Set path to model .pth
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.DEVICE = 'cpu'

register_coco_instances("dataset_test", {}, "testval.json", "Images_path")
test_metadata = MetadataCatalog.get("dataset_test")

# Create predictor
predictor = DefaultPredictor(cfg)

# Make prediction
outputs = predictor(im)

# Loop through pred_masks and check which entries are True; if so, add the pixel coordinates to true_cords_list
outputnump = outputs["instances"].pred_masks.numpy()
true_cords_list = []
x_length = range(len(outputnump[0][0]))
# the y coordinate is the range index
for y_cord in range(len(outputnump[0])):
    # x coordinate
    for x_cord in x_length:
        if str(outputnump[0][y_cord][x_cord]) == "True":
            inputcoords = (x_cord, y_cord)
            true_cords_list.append(inputcoords)
print(str(true_cords_list))

end = time.time()
print(f"Runtime of the program is {end - start}")  # 14.29468035697937
EDIT:
After partially changing the for loop to use itertools.compress, I've managed to reduce the runtime of the loop by ~3x. However, ideally I would like to get this from the predictor itself, if possible.
from itertools import compress

y_length = len(outputnump[0])
x_length = len(outputnump[0][0])
true_cords_list = []
for y_cord in range(y_length):
    x_cords = list(compress(range(x_length), outputnump[0][y_cord]))
    if x_cords:
        for x_cord in x_cords:
            inputcoords = (x_cord, y_cord)
            true_cords_list.append(inputcoords)
The problem is easily solvable with NumPy's or PyTorch's native array handling, which allows ~100x speedups compared to Python loops. It is worth studying the NumPy library; PyTorch tensors are very similar to NumPy arrays in behaviour.
How to get indices of values in NumPy:
import numpy as np
arr = np.random.rand(3,4) > 0.5
ind = np.argwhere(arr)[:, ::-1]
print(arr)
print(ind)
In your particular case this will be
ind = np.argwhere(outputnump[0])[:, ::-1]
How to get indices of values in PyTorch:
import torch
arr = torch.rand(3, 4) > 0.5
ind = arr.nonzero()
ind = torch.flip(ind, [1])
print(arr)
print(ind)
[::-1] and .flip are used to reverse the order of coordinates from (y, x) to (x, y).
NumPy and PyTorch also allow checking simple conditions and getting the indices of the values that meet them; for further understanding, see the corresponding NumPy docs article.
When asking, you should provide links for your problem's context. This question is actually about Facebook's Detectron2 object detector, for which they provide a nice demo Colab notebook.
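Putting this together for the code in the question, a minimal sketch that reuses outputs and true_cords_list from above (timings will vary with image size):

import numpy as np

# Boolean mask of the first detected instance, shape (H, W)
mask = outputs["instances"].pred_masks[0].cpu().numpy()

# (x, y) coordinates of all True pixels, with no Python loop
ind = np.argwhere(mask)[:, ::-1]
true_cords_list = [tuple(p) for p in ind]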

GAN Model Summary Pytorch using TensorBoard?

Is there a way I can visualize the complete training loop for a GAN architecture in TensorBoard using PyTorch? I think it's possible with TensorFlow, but I am having a hard time figuring out how to do it with PyTorch.
You can use TensorboardX for this.
You can make use of SummaryWriter from TensorboardX to create an event file in a given directory and add summaries and events to it.
The code below is an example you can use, but you have to add the loss values, the ground-truth images and the generated images yourself; I have commented where they would go.
from tensorboardX import SummaryWriter
import torchvision.utils as vutils
import numpy as np

REPORT_EVERY_ITER = 100
SAVE_IMAGE_EVERY_ITER = 1000

if __name__ == "__main__":
    writer = SummaryWriter()
    gen_losses = []
    dis_losses = []
    iter_no = 0

    # loop over the batches in the environment
    for batch_v in iterate_batches(envs):
        # get the outputs
        # get the generator's loss
        # get the discriminator's loss
        iter_no += 1

        # log the loss values for both the generator and the discriminator every 100 steps
        if iter_no % REPORT_EVERY_ITER == 0:
            log.info(
                "Iter %d: gen_loss=%.3e, dis_loss=%.3e",
                iter_no,
                np.mean(gen_losses),
                np.mean(dis_losses),
            )
            writer.add_scalar("gen_loss", np.mean(gen_losses), iter_no)
            writer.add_scalar("dis_loss", np.mean(dis_losses), iter_no)
            gen_losses = []
            dis_losses = []

        # save the images produced by the generator and the ground-truth images
        # every 1000 iterations
        if iter_no % SAVE_IMAGE_EVERY_ITER == 0:
            # generated images from the generator
            writer.add_image(
                "fake",
                vutils.make_grid(gen_output_v.data[:64], normalize=True),
                iter_no,
            )
            # ground-truth images (these stay the same throughout the cycle)
            writer.add_image(
                "real",
                vutils.make_grid(batch_v.data[:64], normalize=True),
                iter_no,
            )
To view the results, just run the command tensorboard --logdir runs in the same directory where you ran the model training (runs contains the results from the training). A link will be shown, which you can open to view plots such as the ones below. If you want to run TensorBoard on a remote server, add the --bind_all flag to the command so you can access it from outside.
(Screenshots: viewing the generated images and viewing the loss values in TensorBoard.)
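A minimal sketch, as a side note and assuming a recent PyTorch install: PyTorch also bundles its own SummaryWriter with the same add_scalar/add_image interface, so the logging code above can run without tensorboardX by swapping the import (the log_dir and values below are illustrative):

# PyTorch's built-in TensorBoard writer; the rest of the logging code is unchanged.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/gan_experiment")
writer.add_scalar("gen_loss", 0.123, 1)  # (tag, value, global_step)
writer.close()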

Faster K-Means Clustering in TensorFlow

Dear TensorFlow Community,
I'm training a classifier with tf.contrib.factorization.KMeansClustering, but the training goes really slowly and only uses 1% of my GPU, while my 4 CPU cores sit at about 35% use constantly.
Is K-Means written more for the CPU than the GPU? Is there a way I can shift more of the computation to the GPU, or some other approach to speed up training?
Below is my script for training (Python3).
Thank you for your time.
import tensorflow as tf

def parser(record):
    features = {
        'feats': tf.FixedLenFeature([], tf.string),
    }
    parsed = tf.parse_single_example(record, features)
    feats = tf.convert_to_tensor(tf.decode_raw(parsed['feats'], tf.float64))
    return {'feats': feats}

def my_input_fn(tfrecords_path):
    dataset = (
        tf.data.TFRecordDataset(tfrecords_path)
        .map(parser)
        .batch(1024)
    )
    iterator = dataset.make_one_shot_iterator()
    batch_feats = iterator.get_next()
    return batch_feats

### SPEC FUNCTIONS ###
train_spec_kmeans = tf.estimator.TrainSpec(input_fn=lambda: my_input_fn('/home/ubuntu/train.tfrecords'), max_steps=10000)
eval_spec_kmeans = tf.estimator.EvalSpec(input_fn=lambda: my_input_fn('/home/ubuntu/eval.tfrecords'))

### INIT ESTIMATOR ###
KMeansEstimator = tf.contrib.factorization.KMeansClustering(
    num_clusters=500,
    feature_columns=[tf.feature_column.numeric_column(
        key='feats',
        dtype=tf.float64,
        shape=(377,),
    )],
    use_mini_batch=True)

### TRAIN & EVAL ###
tf.estimator.train_and_evaluate(KMeansEstimator, train_spec_kmeans, eval_spec_kmeans)
Best,
Josh
Here's my best answer so far, with timing information, building off of Eliethesaiyan's answer and the link to the docs.
My original Dataset codeblock and performance:
dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn)
    .batch(1024)
)
real 1m36.171s
user 2m57.756s
sys 0m42.304s
Eliethesaiyan's answer (prefetch + num_parallel_calls)
dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
    .prefetch(1024)
)
real 0m41.450s
user 1m33.120s
sys 0m18.772s
From the docs using map_and_batch + num_parallel_batches + prefetch:
dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .apply(
        tf.contrib.data.map_and_batch(
            map_func=parse_fn,
            batch_size=1024,
            num_parallel_batches=multiprocessing.cpu_count()
        )
    )
    .prefetch(1024)
)
real 0m32.855s
user 1m11.412s
sys 0m10.408s
One of the things I saw that increases GPU and CPU usage is using prefetch on the dataset. It keeps the dataset producer fetching data while the model is consuming the previous batch, thereby maximizing resource usage. Also, specifying the maximum number of CPU cores speeds up the process.
I would restructure it this way:
import multiprocessing

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parser, num_parallel_calls=multiprocessing.cpu_count())
    .batch(1024)
)
dataset = dataset.prefetch(1024)
There is a nice guide on best practices for using TFRecords here.
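A side sketch, not part of the original answers: in TensorFlow 2.x the same idea is usually expressed with tf.data.AUTOTUNE, which lets the runtime choose the parallelism and prefetch depth instead of hard-coding the CPU count. It reuses tfrecords_path and parse_fn from the question:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # tf.data.experimental.AUTOTUNE on older 2.x releases

dataset = (
    tf.data.TFRecordDataset(tfrecords_path)
    .map(parse_fn, num_parallel_calls=AUTOTUNE)
    .batch(1024)
    .prefetch(AUTOTUNE)
)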

Plot clusters from LDA Gensim with Bokeh

I apologise in advance, as I cannot reproduce the dataset I'm working with, so I am just going to describe the steps and hope someone is familiar with the whole process.
I'm trying to use LDA Gensim to extract topics from a list of text documents.
from gensim.models import LdaModel
from gensim.corpora import Dictionary
I build the dictionary and the corpus:
dictionary = Dictionary(final_docs)
corpus = [dictionary.doc2bow(doc) for doc in final_docs]
where final_docs is a list of lists with the cleaned tokens for each text, like this:
final_docs = [['cat','dog','animal'],['school','university','education'],...['music','dj','pop']]
Then I initialise the model like this:
# Set training parameters:
num_topics = 60
chunksize = 100
passes = 20
iterations = 400
eval_every = None

# Make an index-to-word dictionary
temp = dictionary[0]  # This is only to "load" the dictionary.
id2word = dictionary.id2token

model = LdaModel(corpus=corpus, id2word=id2word, chunksize=chunksize,
                 alpha='auto', eta='auto',
                 iterations=iterations, num_topics=num_topics,
                 passes=passes, eval_every=eval_every)
I can print the topics and their terms (the 10 most important), and they make sense, so the model seems to be working fine:
for idx in range(num_topics):
    print("Topic #%s:" % idx, model.print_topic(idx, 10))
BUT I struggle to plot all the documents as clusters using Bokeh (and I really need Bokeh because I compare the same plot across different models). I know I have to reduce the dimensionality to 2, and I try to do it using CountVectorizer and then t-SNE:
from sklearn.feature_extraction.text import CountVectorizer
docs_vect = [' '.join(txt) for txt in final_docs]
cvectorizer = CountVectorizer(min_df=6, max_df=0.50, max_features=10000, stop_words=stop)
cvz = cvectorizer.fit_transform(docs_vect)
X_lda = model.fit_transform(cvz)
But I get this error: AttributeError: 'LdaModel' object has no attribute 'fit_transform'
I'm definitely doing something wrong with CountVectorizer. Could anyone help me out?
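A minimal sketch of one way around this, assuming the model, corpus and num_topics defined above: gensim's LdaModel is not a scikit-learn estimator and has no fit_transform, but it can return each document's topic distribution via get_document_topics; stacking those distributions into a matrix gives a suitable input for t-SNE (and then Bokeh):

import numpy as np
from sklearn.manifold import TSNE

# Dense document-topic matrix: one row per document, one column per topic.
doc_topics = np.zeros((len(corpus), num_topics))
for d, bow in enumerate(corpus):
    for topic_id, prob in model.get_document_topics(bow, minimum_probability=0.0):
        doc_topics[d, topic_id] = prob

# Reduce to 2 dimensions for plotting.
tsne = TSNE(n_components=2, random_state=0)
coords = tsne.fit_transform(doc_topics)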
