SO has an answer about how to check the total # of params from the model:
pytorch_total_params = sum(p.numel() for p in model.parameters())
However, how does one check the total # of params from the state_dict, e.g. one loaded with state_dict = torch.load(model_path, map_location='cpu')?
You can sum the number of elements over all entries saved in the state_dict:
sum(p.numel() for p in state_dict.values())
However, there's a snag here: a state_dict stores both parameters and persistent buffers (e.g., BatchNorm's running mean and var). There's no way (AFAIK) to tell them apart from the state_dict itself; you'll need to load them into the model and use sum(p.numel() for p in model.parameters()) to count only the parameters.
For instance, if you check out resnet50:
from torchvision.models import resnet50
model = resnet50(pretrained=True)
state_dict = torch.load('~/.torch/models/resnet50-19c8e357.pth')
num_parameters = sum(p.numel() for p in model.parameters())
num_state_dict = sum(p.numel() for p in state_dict.values())
print('num parameters = {}, stored in state_dict = {}, diff = {}'.format(num_parameters, num_state_dict, num_state_dict - num_parameters))
Resulting in:
num parameters = 25557032, stored in state_dict = 25610152, diff = 53120
As you can see there can be quite a gap between the two values.
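If you load the weights into a model, you can also verify that this gap is exactly the persistent buffers. A minimal illustrative check, reusing the resnet50 model from above:
num_parameters = sum(p.numel() for p in model.parameters())
num_buffers = sum(b.numel() for b in model.buffers())  # e.g. BatchNorm running_mean, running_var, num_batches_tracked
num_state_dict = sum(p.numel() for p in model.state_dict().values())
print(num_parameters + num_buffers == num_state_dict)  # True for resnet50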
I'm trying to train Detectron2 on a custom dataset that I annotated with coco-annotator. After training, I wanted to predict instances in my image, but none are shown.
Training:
from detectron2.engine import DefaultTrainer
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("TrashTron_train",)
cfg.DATASETS.TEST = ("TrashTron_val",)
# cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR
cfg.SOLVER.MAX_ITER = 300 # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512 # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 24 # number of classes in the custom dataset (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config is the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
Prediction:
test_data = [{'1191.jpg': '/content/datasets/val/1191.jpg',
'image_id': 1308}]
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7 # set a custom testing threshold
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth") # path to the model we just trained
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 24
predictor = DefaultPredictor(cfg)
im = cv2.imread(test_data[0]["1191.jpg"])
outputs = predictor(im)
# print(outputs["instances"].pred_densepose)
v = Visualizer(im[:, :, ::-1],
metadata=MetadataCatalog.get(cfg.DATASETS.TRAIN[0]),
scale=0.5,
instance_mode=ColorMode.IMAGE_BW)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
img = cv2.cvtColor(out.get_image()[:, :, ::-1], cv2.COLOR_RGBA2RGB)
plt.imshow(img)
The corresponding image is shown, but no instances are drawn.
Any suggestions? The overall evaluation scores aren't that great, but I picked the best class and even there I don't get any predictions...
I would try lowering the threshold, since you said the overall training scores were not great.
In this answer in the official repo, the following code is suggested to change the threshold:
cfg.MODEL.TENSOR_MASK.SCORE_THRESH_TEST = 0.5
In another answer in the same thread, other thresholds are modified as well:
cfg.MODEL.RETINANET.SCORE_THRESH_TEST = args.confidence_threshold
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = args.confidence_threshold
cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = args.confidence_threshold
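For example, a minimal check (illustrative; 0.3 is just a placeholder value) is to rebuild the predictor with a lower threshold and see whether any detections survive the cutoff:
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3  # lower than the 0.7 used above
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
print(len(outputs["instances"]))  # number of detections that pass the threshold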
There are already two posts about this topic, but they have not been updated for the recent TF 2.1 release...
In brief, I've got a lot of tif images to read and parse with a specific pipeline.
import functools

import tensorflow as tf
import numpy as np
files = # a list of str
labels = # a list of int
n_unique_label = len(np.unique(labels))
gen = functools.partial(generator, file_list=files, label_list=labels, param1=x1, param2=x2)
dataset = tf.data.Dataset.from_generator(gen, output_types=(tf.float32, tf.int32))
dataset = dataset.map(lambda b, c: (b, tf.one_hot(c, depth=n_unique_label)))
This pipeline works well. Nevertheless, I need to parallelize the file-parsing part, so I tried the following solution:
files = # a list of str
files = tf.data.Dataset.from_tensor_slices(files)

def wrapper(file_path):
    parser = functools.partial(tif_parser, param1=x1, param2=x2)
    return tf.py_function(parser, inp=[file_path], Tout=[tf.float32])
dataset = files.map(wrapper, num_parallel_calls=2)
The difference is that here I parse one file at a time with the parser function. However, it does not work:
File "loader.py", line 643, in tif_parser
image = numpy.array(Image.open(file_path)).astype(float)
File "python3.7/site-packages/PIL/Image.py", line 2815, in open
fp = io.BytesIO(fp.read())
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'read'
[[{{node EagerPyFunc}}]] [Op:IteratorGetNextSync]
As far as I understand, the tif_parser function does not receive a string but an (unevaluated) tensor. For now, this function is fairly simple:
from PIL import Image
import numpy

def tif_parser(file_path, param1=1, param2=2):
    image = numpy.array(Image.open(file_path)).astype(float)
    image /= 255.0
    return image
Here is how I proceeded:
dataset = tf.data.Dataset.from_tensor_slices((files, labels))

def wrapper(file_path, label):
    import functools
    parser = functools.partial(tif_parser, param1=x1, param2=x2)
    return tf.data.Dataset.from_generator(parser, (tf.float32, tf.int32), args=(file_path, label))

dataset = dataset.interleave(wrapper, cycle_length=tf.data.experimental.AUTOTUNE)

# The labels are converted to 1-hot vectors; this could be integrated into tif_parser
dataset = dataset.map(lambda i, l: (i, tf.one_hot(l, depth=unique_label_count)))
dataset = dataset.shuffle(buffer_size=file_count, reshuffle_each_iteration=True)
dataset = dataset.batch(batch_size=batch_size, drop_remainder=False)
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
Concretely, I generate a dataset every time the parser is called. The parser is run cycle_length times at each call, meaning that cycle_length images are read at once. This suits my specific case, because I cannot load all the images in memory. I am unsure whether the prefetch is used correctly here.
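As an aside (not part of the original answer): the AttributeError from the tf.py_function attempt can also be avoided by decoding the tensor inside the parser, since tf.py_function passes EagerTensors rather than Python strings. A minimal sketch, assuming the same tif_parser and file list (tif_parser_eager is a hypothetical helper name):
def tif_parser_eager(file_path_tensor, param1=1, param2=2):
    # Inside tf.py_function the argument is an EagerTensor; convert it to a Python string for PIL.
    file_path = file_path_tensor.numpy().decode('utf-8')
    return tif_parser(file_path, param1=param1, param2=param2)

def wrapper(file_path):
    return tf.py_function(tif_parser_eager, inp=[file_path], Tout=tf.float32)

dataset = tf.data.Dataset.from_tensor_slices(files).map(wrapper, num_parallel_calls=2)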
I'd like to create a few random forest models within a for loop, varying the number of estimators, then train each of them on the same data sample and measure the accuracy of each.
This is my initial code:
r = range(0, 100)
for i in r:
    RF_model_%i = RandomForestClassifier(criterion="entropy", n_estimators=i, oob_score=True)
    RF_model_%i.fit(X_train, y_train)
    y_predict = RF_model_%i.predict(X_test)
    accuracy_%i = accuracy_score(y_test, y_predict)
What I'd like to understand is:
how can I put the i parameter in the name of each model (in order to recognise them)?
after training each of the i models and calculating its accuracy score, how can I put the result in a list within the for loop?
You can do something like this:
results = []  # accuracy of each model, in order
models = {}   # keep the fitted models, keyed by their n_estimators value
r = range(1, 101)  # n_estimators must be >= 1
for i in r:
    RF_model = RandomForestClassifier(criterion="entropy", n_estimators=i, oob_score=True)
    RF_model.id = i  # dynamically add fields to objects so each model can be recognised
    RF_model.fit(X_train, y_train)
    y_predict = RF_model.predict(X_test)
    models[i] = RF_model
    results.append(accuracy_score(y_test, y_predict))  # put the result in a list within the for loop
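Afterwards you can pick out the best setting, e.g. (a small illustrative follow-up using the models dict and results list above):
best_i = max(models, key=lambda i: results[i - 1])
print("best n_estimators:", best_i, "accuracy:", results[best_i - 1])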
I want to retrain pre-trained word embeddings in Python using Gensim. The pre-trained embeddings I want to use is Google's Word2Vec in the file GoogleNews-vectors-negative300.bin.
Following Gensim's word2vec tutorial, "it’s not possible to resume training with models generated by the C tool, load_word2vec_format(). You can still use them for querying/similarity, but information vital for training (the vocab tree) is missing there."
Therefore I can't use the KeyedVectors, and for training a model the tutorial suggests using:
model = gensim.models.Word2Vec.load('/tmp/mymodel')
model.train(more_sentences)
(https://rare-technologies.com/word2vec-tutorial/)
However, when I try this:
from gensim.models import Word2Vec
model = Word2Vec.load('data/GoogleNews-vectors-negative300.bin')
I get an error message:
1330 # Because of loading from S3 load can't be used (missing readline in smart_open)
1331 if sys.version_info > (3, 0):
-> 1332 return _pickle.load(f, encoding='latin1')
1333 else:
1334 return _pickle.loads(f.read())
UnpicklingError: invalid load key, '3'.
I didn't find a way to convert the binary Google News file into a text file properly, and even if I did, I'm not sure whether that would solve my problem.
Does anyone have a solution to this problem or knows about a different way to retrain pre-trained word embeddings?
The Word2Vec.load() method can only load full models in gensim's native format (based on Python object-pickling) – not any other binary/text formats.
And, as per the documentation's note that "it’s not possible to resume training with models generated by the C tool", there's simply not enough information in the GoogleNews raw-vectors files to reconstruct the full working model that was used to train them. (That would require both some internal model-weights, not saved in that file, and word-frequency-information for controlling sampling, also not saved in that file.)
The best you could do is create a new Word2Vec model, then patch some/all of the GoogleNews vectors into it before doing your own training. This is an error-prone process with no real best-practices and many caveats about the interpretation of final results. (For example, if you bring in all the vectors, but then only re-train a subset using only your own corpus & word-frequencies, the more training you do – making the word-vectors better fit your corpus – the less such re-trained words will have any useful comparability to retained untrained words.)
Essentially, if you can look at the gensim Word2Vec source & work-out how to patch-together such a frankenstein-model, it may be appropriate. But there's no built-in support or handy off-the-shelf recipes that make it easy, because it's an inherently murky process.
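One hedged sketch of such patching (not an official recipe; it uses gensim 3.x's intersect_word2vec_format(), which overwrites vectors for words already in your model's vocabulary and leaves the rest untouched; my_sentences is a placeholder for your own tokenised corpus):
import gensim

# Build a fresh model and its vocabulary from your own corpus first.
model = gensim.models.Word2Vec(size=300, min_count=5)
model.build_vocab(my_sentences)

# Patch in the GoogleNews vectors for the words the two vocabularies share.
# lockf=1.0 leaves the imported vectors free to be updated during training; lockf=0.0 would freeze them.
model.intersect_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True, lockf=1.0)

model.train(my_sentences, total_examples=model.corpus_count, epochs=model.epochs)
Note that intersect_word2vec_format() only affects words that already appear in your own corpus vocabulary; GoogleNews words outside that vocabulary are not added, and the comparability caveats above still apply.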
I have already answered it here.
Save the Google News model as a text file in word2vec format using gensim.
Refer to this answer to save it as a text file.
Then try this code:
import os
import pickle
import numpy as np
import gensim
from gensim.models import Word2Vec, KeyedVectors
from gensim.models.callbacks import CallbackAny2Vec
import operator
os.mkdir("model_dir")
# class EpochSaver(CallbackAny2Vec):
# '''Callback to save model after each epoch.'''
# def __init__(self, path_prefix):
# self.path_prefix = path_prefix
# self.epoch = 0
# def on_epoch_end(self, model):
# list_of_existing_files = os.listdir(".")
# output_path = 'model_dir/{}_epoch{}.model'.format(self.path_prefix, self.epoch)
# try:
# model.save(output_path)
# except:
# model.wv.save_word2vec_format('model_dir/model_{}.bin'.format(self.epoch), binary=True)
# print("number of epochs completed = {}".format(self.epoch))
# self.epoch += 1
# list_of_total_files = os.listdir(".")
# saver = EpochSaver("my_finetuned")
# function to load vectors from existing model.
# I am loading glove vectors from a text file; the benefit of doing this is that I get the complete glove vocab as well.
# If you are using a previous word2vec model I would recommend saving it in txt format.
# In case you decide not to do it, you can tweak the function to get vectors for words in your vocab only.
def load_vectors(token2id, path, limit=None):
    embed_shape = (len(token2id), 300)
    freqs = np.zeros((len(token2id)), dtype='f')
    vectors = np.zeros(embed_shape, dtype='f')
    i = 0
    with open(path, encoding="utf8", errors='ignore') as f:
        for o in f:
            token, *vector = o.split(' ')
            token = str.lower(token)
            if len(o) <= 100:
                continue
            if limit is not None and i > limit:
                break
            vectors[token2id[token]] = np.array(vector, 'f')
            i += 1
    return vectors
# path of text file of your word vectors.
embedding_name = "word2vec.txt"
data = "<training data(new line separated tect file)>"
# Dictionary to store a unique id for each token in the vocab (in my case the vocab contains both my vocab and the glove vocab)
token2id = {}
# This dictionary will contain all the words and their frequencies.
vocab_freq_dict = {}
# Populating vocab_freq_dict and token2id from my data.
id_ = 0
training_examples = []
file = open("{}".format(data),'r', encoding="utf-8")
for line in file.readlines():
words = line.strip().split(" ")
training_examples.append(words)
for word in words:
if word not in vocab_freq_dict:
vocab_freq_dict.update({word:0})
vocab_freq_dict[word] += 1
if word not in token2id:
token2id.update({word:id_})
id_ += 1
# Populating vocab_freq_dict and token2id from glove vocab.
max_id = max(token2id.items(), key=operator.itemgetter(1))[0]
max_token_id = token2id[max_id]
with open(embedding_name, encoding="utf8", errors='ignore') as f:
    for o in f:
        token, *vector = o.split(' ')
        token = str.lower(token)
        if len(o) <= 100:
            continue
        if token not in token2id:
            max_token_id += 1
            token2id.update({token: max_token_id})
            vocab_freq_dict.update({token: 1})

with open("vocab_freq_dict", "wb") as vocab_file:
    pickle.dump(vocab_freq_dict, vocab_file)
with open("token2id", "wb") as token2id_file:
    pickle.dump(token2id, token2id_file)
# converting vectors to keyedvectors format for gensim
vectors = load_vectors(token2id, embedding_name)
vec = KeyedVectors(300)
vec.add(list(token2id.keys()), vectors, replace=True)
# setting vectors(numpy_array) to None to release memory
vectors = None
params = dict(min_count=1,workers=14,iter=6,size=300)
model = Word2Vec(**params)
# using build from vocab to build the vocab
model.build_vocab_from_freq(vocab_freq_dict)
# using token2id to create idxmap
idxmap = np.array([token2id[w] for w in model.wv.index2entity])
# Setting the input-to-hidden weights (syn0) to your vectors, arranged according to the ids
model.wv.vectors[:] = vec.vectors[idxmap]
# Setting the hidden-to-output weights (syn1neg) to your vectors, arranged according to the ids
model.trainables.syn1neg[:] = vec.vectors[idxmap]
model.train(training_examples, total_examples=len(training_examples), epochs=model.epochs)
output_path = 'model_dir/final_model.model'
model.save(output_path)
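As a quick sanity check (illustrative; 'king' is just an example token), you can reload the saved model and query it:
from gensim.models import Word2Vec

model = Word2Vec.load('model_dir/final_model.model')
print(model.wv.most_similar('king', topn=5))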
I created a single node with 3 inputs and one output, with bias 0 and no activation function.
As far as I understand, the only thing that happens here is a matrix multiplication between the input vector and the randomly initialized weights, but when I do the multiplication myself with the same inputs and weights I get a different outcome. What am I missing / doing wrong?
Thanks in advance!
I base my calculation on some code provided here.
Here is the code:
def example_code(self):
    import tensorflow as tf
    data = [[1.0, 2.0, 3.0]]
    x = tf.placeholder(tf.float32, shape=[1, 3], name="mydata")
    node = tf.layers.Dense(units=1)
    y = node(x)
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    print("input: " + str(data))
    outcome = sess.run(y, feed_dict={x: data})
    # print("outcome from tensorflow: " + str(outcome))
    weights = node.get_weights()[0]
    bias = node.get_weights()[1]
    print("weights: " + str(weights))
    print("bias: " + str(bias))
    print("outcome from tensorflow: " + str(outcome))
    outcome = tf.matmul(data, weights)
    print("manually calculated outcome: " + str(sess.run(outcome)))
output from code:
input: [[1.0, 2.0, 3.0]]
weights: [[ 0.72705185] [-0.70188504] [ 0.5336163 ]]
bias: [0.]
outcome from tensorflow: [[-1.3463312]]
manually calculated outcome: [[0.9241307]]
The problem is that tf.layers is not using your session sess. This in turn results in different initializations for the weights, hence the two different values. tf.layers ends up using tf.keras.backend.get_session() to retrieve the session used for initialization and retrieval of weights (node.get_weights()). tf.keras.backend.get_session() tries to use the default session if there is one, and if there is not then it creates its own session. In this case, sess is not configured as the default session (only tf.InteractiveSession gets automatically configured as the default session on construction). The simplest fix is to use tf.Session in the recommended way, as a context manager:
def example_code(self):
    import tensorflow as tf
    with tf.Session() as sess:
        data = [[1.0, 2.0, 3.0]]
        x = tf.placeholder(tf.float32, shape=[1, 3], name="mydata")
        node = tf.layers.Dense(units=1)
        y = node(x)
        init = tf.global_variables_initializer()
        sess.run(init)
        print("input: " + str(data))
        outcome = sess.run(y, feed_dict={x: data})
        # print("outcome from tensorflow: " + str(outcome))
        weights = node.get_weights()[0]
        bias = node.get_weights()[1]
        print("weights: " + str(weights))
        print("bias: " + str(bias))
        print("outcome from tensorflow: " + str(outcome))
        outcome = tf.matmul(data, weights)
        print("manually calculated outcome: " + str(sess.run(outcome)))
This will set sess as default session, and also it will make sure its resources are freed when the function is finished (which was another issue in your code). If for whatever reason you want to use some session as default but do not want to close it with the context manager, you can just use as_default():
def example_code(self):
    import tensorflow as tf
    sess = tf.Session()
    with sess.as_default():
        data = [[1.0, 2.0, 3.0]]
        x = tf.placeholder(tf.float32, shape=[1, 3], name="mydata")
        node = tf.layers.Dense(units=1)
        y = node(x)
        init = tf.global_variables_initializer()
        sess.run(init)
        print("input: " + str(data))
        outcome = sess.run(y, feed_dict={x: data})
        # print("outcome from tensorflow: " + str(outcome))
        weights = node.get_weights()[0]
        bias = node.get_weights()[1]
        print("weights: " + str(weights))
        print("bias: " + str(bias))
        print("outcome from tensorflow: " + str(outcome))
        outcome = tf.matmul(data, weights)
        print("manually calculated outcome: " + str(sess.run(outcome)))
    # You need to manually ensure that the session gets closed afterwards
    sess.close()
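With either variant, node.get_weights() now reads the variables from the same session that evaluated y, so the manual tf.matmul(data, weights) (plus the bias, which is initialized to zero here) matches the TensorFlow output.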