How to update PyTorch model parameters (tensors) after averaging them? - pytorch

I'm currently working on a distributed federated learning infrastructure and am trying to implement it in PyTorch. For this I also need federated averaging, which averages the parameters retrieved from all the nodes and then passes them on to the next training round.
The gathering of the parameters looks like this:
def RPC_get_parameters(data, model):
    """
    Get parameters from nodes
    """
    with torch.no_grad():
        for parameters in model.parameters():
            # store parameters in dict
            return {"parameters": parameters}
The averaging function which happens at the central server looks like this:
# stores results from RPC_get_parameters() in results
results = client.get_results(task_id=task.get("id"))

# averaging of returned parameters
global_sum = 0
global_count = 0

for output in results:
    global_sum += output["parameters"]
    global_count += len(global_sum)

averaged_parameters = global_sum / global_count

new_params = {'averaged_parameters': averaged_parameters}
Now my question is: how do you update all the parameters (tensors) in PyTorch from this? I tried a few things and they usually returned errors like ValueError: can't optimize a non-leaf Tensor when inserting new_params into the optimizer where model.parameters() would normally go, e.g. optimizerD = optim.SGD(new_params, lr=0.01, momentum=0.5). So how do I actually update the model so it uses the averaged parameters?
Thank you!
https://github.com/simontkl/torch-vantage6/blob/fed_avg-w/local_dp/v6-ppsdg-py/master.py

I think the most convenient way to work with parameters (outside the SGD context) is using the state_dict of the model.
from collections import OrderedDict

new_params = OrderedDict()
n = len(clients)  # number of clients
for client_model in clients:
    sd = client_model.state_dict()  # get current parameters of one client
    for k, v in sd.items():
        new_params[k] = new_params.get(k, 0) + v / n
After that, new_params is a state_dict containing the average weights of the clients, and you can load it into a model with .load_state_dict.
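For completeness, a minimal usage sketch (global_model is an illustrative name for whatever nn.Module holds the global weights, not something from the linked repository): load the averaged state_dict into the global model, then build the optimizer from model.parameters() as usual rather than from the averaged dict, which avoids the "can't optimize a non-leaf Tensor" error.

import torch.optim as optim

global_model.load_state_dict(new_params)  # apply the averaged weights
optimizerD = optim.SGD(global_model.parameters(), lr=0.01, momentum=0.5)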

Related

Why do genetic algorithms converge to end up with a population that is identical?

I was implementing a genetic algorithm with tf.keras, where I manually modify the weights, perform the gene crossover, and so on. I've found that after a few dozen generations the predictions of all the networks are essentially identical, and after a few more generations the predictions are exactly the same. Trying to google the problem, I found this page,
which mentions the problem on a conceptual level, but I can't understand how this would happen if I'm manually creating genetic diversity every generation.
def model_mutate(weights, var):
    for i in range(len(weights)):
        for j in range(len(weights[i])):
            if random.uniform(0, 1) < 0.2:  # mutation probability of 20%
                change = np.random.uniform(-var, var, weights[i][j].shape)
                weights[i][j] += change
    return weights
def crossover_brains(parent1, parent2):
    global brains
    weight1 = parent1.get_weights()
    weight2 = parent2.get_weights()
    new_weight1 = weight1
    new_weight2 = weight2
    gene = random.randint(0, len(new_weight1) - 1)  # we change a random weight or set of weights
    new_weight1[gene] = weight2[gene]
    new_weight2[gene] = weight1[gene]
    q = np.asarray([new_weight1, new_weight2], dtype=object)
    return q
def evolve(best_fit1, best_fit2):
    global generation
    global best_brain
    global best_brain2
    mutations = []
    for i in range(total_brains // 2):
        cross_weights = model_crossover(best_fit1, best_fit2)
        mutation1 = model_mutate(cross_weights[0], 0.5)
        mutation2 = model_mutate(cross_weights[1], 0.5)
        mutations.append(mutation1)
        mutations.append(mutation2)
    for i in range(total_brains):
        brains[i].set_weights(mutations[i])
    generation += 1
def find_best_fit():
    fitness = np.loadtxt("fitness.txt")
    print(f"fitness average {np.mean(fitness)} in generation {generation}")
    print(f"fitness max is {np.max(fitness)} in generation {generation}")
    fitness_t.append(np.mean(fitness))
    maxfit1 = np.max(fitness)
    best_fit1 = np.where(fitness == maxfit1)[0]
    fitness[best_fit1] = 0
    maxfit2 = np.max(fitness)
    best_fit2 = np.where(fitness == maxfit2)[0]
    # this is a band-aid for when several individuals are the same,
    # which would lead to best_fit(1,2) being an array of indices
    if len(best_fit1) > 1:
        best_fit1 = best_fit1[0]
    if len(best_fit2) > 1:
        best_fit2 = best_fit2[0]
    return int(best_fit1), int(best_fit2)

bf1, bf2 = find_best_fit()
evolve(bf1, bf2)
This is the code I'm using to set the modified weights on the existing Keras models (mostly not mine; I don't understand it well enough to have written it myself).
If Keras is working the way I think it is, then I don't see how this would converge to anything that does not maximize fitness; furthermore, fitness seems to be decreasing over time.
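As an illustration only (not from the original post): a crossover variant that deep-copies the parent weights before swapping a gene. Because the lists returned by get_weights() are reused directly in the code above, the two offspring can otherwise end up sharing the very same array for the swapped gene.

import copy
import random

def crossover_copy(parent1, parent2):
    # copy first so the offspring never alias the same numpy arrays
    w1 = copy.deepcopy(parent1.get_weights())
    w2 = copy.deepcopy(parent2.get_weights())
    gene = random.randint(0, len(w1) - 1)  # swap one weight array
    w1[gene], w2[gene] = w2[gene].copy(), w1[gene].copy()
    return [w1, w2]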

How to use Keras Conv2D layers with OpenAI gym?

Using OpenAI's gym environment, I've created my own environment in which the observation space is of Box type with shape (21, 21, 1).
The intention is to use a Keras Conv2D layer as the model's input. Ideally, the shape going into this model would be (None, 21, 21, 1), with None representing the batch size. The Keras documentation is here: https://keras.io/api/layers/convolution_layers/convolution2d/
The issue I'm having is that an extra dimension is required when the shape is checked. Because of this, the shape it expects is (None, 1, 21, 21, 1). This is prohibiting me from using MaxPooling layers in the model. After investigating the keras-rl library, this is due to two functions that add this dimensionality.
The first function is found in memory.py, where a current observation is put into a list and returned as such. Here:
def get_recent_state(self, current_observation):
    """Return list of last observations

    # Argument
        current_observation (object): Last observation

    # Returns
        A list of the last observations
    """
    # This code is slightly complicated by the fact that subsequent observations might be
    # from different episodes. We ensure that an experience never spans multiple episodes.
    # This is probably not that important in practice but it seems cleaner.
    state = [current_observation]
    idx = len(self.recent_observations) - 1
    for offset in range(0, self.window_length - 1):
        current_idx = idx - offset
        current_terminal = self.recent_terminals[current_idx - 1] if current_idx - 1 >= 0 else False
        if current_idx < 0 or (not self.ignore_episode_boundaries and current_terminal):
            # The previously handled observation was terminal, don't add the current one.
            # Otherwise we would leak into a different episode.
            break
        state.insert(0, self.recent_observations[current_idx])
    while len(state) < self.window_length:
        state.insert(0, zeroed_observation(state[0]))
    return state
The second function is called just afterwards and computes the Q values based on the recent observation. It wraps the state in a list when passing it on to compute_batch_q_values.
def compute_q_values(self, state):
    q_values = self.compute_batch_q_values([state]).flatten()
    assert q_values.shape == (self.nb_actions,)
    return q_values
I understand that one extra dimension should be added to represent the batch size, but why is it added twice? Can anyone explain why this happens, or how to use Conv2D layers with OpenAI gym?
Thanks.
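A minimal sketch of one common workaround (not from the question; layer sizes and nb_actions are illustrative): with keras-rl's default window_length of 1, batched states arrive as (batch, 1, 21, 21, 1), and a leading Reshape layer drops the window axis so Conv2D and MaxPooling2D see (batch, 21, 21, 1).

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape, Conv2D, MaxPooling2D, Flatten, Dense

nb_actions = 4  # hypothetical number of discrete actions

model = Sequential([
    Reshape((21, 21, 1), input_shape=(1, 21, 21, 1)),  # (window, H, W, C) -> (H, W, C)
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(nb_actions, activation="linear"),
])
model.summary()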

Proper way to log things when using Pytorch Lightning DDP

I was wondering what is the proper way of logging metrics when using DDP. I noticed that if I want to print something inside validation_epoch_end it will be printed twice when using 2 GPUs. I was expecting validation_epoch_end to be called only on rank 0 and to receive the outputs from all GPUs, but I am not sure this is correct anymore. Therefore I have several questions:
validation_epoch_end(self, outputs) - When using DDP, does every subprocess receive the data processed from the current GPU or the data processed from all GPUs, i.e. does the input parameter outputs contain the outputs of the entire validation set, from all GPUs?
If outputs is GPU/process specific what is the proper way to calculate any metric on the entire validation set in validation_epoch_end when using DDP?
I understand that I can solve the printing by checking self.global_rank == 0 and printing/logging only in that case, however I am trying to get a deeper understanding of what I am printing/logging in this case.
Here is a code snippet from my use case. I would like to be able to report f1, precision and recall on the entire validation dataset and I am wondering what is the correct way of doing it when using DDP.
def _process_epoch_outputs(self,
                           outputs: List[Dict[str, Any]]
                           ) -> Tuple[torch.Tensor, torch.Tensor]:
    """Creates and returns tensors containing all labels and predictions

    Goes over the outputs accumulated from every batch, detaches the
    necessary tensors and stacks them together.

    Args:
        outputs (List[Dict])
    """
    all_labels = []
    all_predictions = []

    for output in outputs:
        for labels in output['labels'].detach():
            all_labels.append(labels)
        for predictions in output['predictions'].detach():
            all_predictions.append(predictions)

    all_labels = torch.stack(all_labels).long().cpu()
    all_predictions = torch.stack(all_predictions).cpu()

    return all_predictions, all_labels
def validation_epoch_end(self, outputs: List[Dict[str, Any]]) -> None:
    """Logs f1, precision and recall on the validation set."""

    if self.global_rank == 0:
        print(f'Validation Epoch: {self.current_epoch}')

    predictions, labels = self._process_epoch_outputs(outputs)
    for i, name in enumerate(self.label_columns):
        f1, prec, recall, t = metrics.get_f1_prec_recall(predictions[:, i],
                                                         labels[:, i],
                                                         threshold=None)
        self.logger.experiment.add_scalar(f'{name}_f1/Val',
                                          f1,
                                          self.current_epoch)
        self.logger.experiment.add_scalar(f'{name}_Precision/Val',
                                          prec,
                                          self.current_epoch)
        self.logger.experiment.add_scalar(f'{name}_Recall/Val',
                                          recall,
                                          self.current_epoch)

        if self.global_rank == 0:
            print((f'F1: {f1}, Precision: {prec}, '
                   f'Recall: {recall}, Threshold {t}'))
Questions
validation_epoch_end(self, outputs) - When using DDP does every subprocess receive the data processed from the current GPU or data processed from all GPUs, i.e. does the input parameter outputs contain the outputs of the entire validation set, from all GPUs?
Each subprocess receives only the data processed on its own GPU. The outputs are not synchronized; there is only backward synchronization (gradients are synchronized during training and distributed to the replicas of the model residing on each GPU).
Imagine that all of the outputs were passed from 1000 GPUs to the poor master process; it could easily run out of memory (OOM).
If outputs is GPU/process specific what is the proper way to calculate
any metric on the entire validation set in validation_epoch_end when
using DDP?
According to documentation (emphasis mine):
When validating using a accelerator that splits data from each batch
across GPUs, sometimes you might need to aggregate them on the master
GPU for processing (dp, or ddp2).
And here is the accompanying code (validation_epoch_end would receive accumulated data across multiple GPUs from a single step in this case; also see the comments):
# Done per-process (GPU)
def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.model(x)
    loss = F.cross_entropy(y_hat, y)
    pred = ...
    return {'loss': loss, 'pred': pred}

# Gathered data from all processes (per single step)
# Allows for accumulation so the whole data at the end of epoch
# takes less memory
def validation_step_end(self, batch_parts):
    gpu_0_prediction = batch_parts.pred[0]['pred']
    gpu_1_prediction = batch_parts.pred[1]['pred']

    # do something with both outputs
    return (batch_parts[0]['loss'] + batch_parts[1]['loss']) / 2

def validation_epoch_end(self, validation_step_outputs):
    for out in validation_step_outputs:
        # do something with preds
        ...
Tips
Focus on per-device calculations and on keeping the number of between-GPU transfers as small as possible.
Inside validation_step (or training_step if that's what you want, this is general), calculate f1, precision, recall and whatever else on a per-batch basis.
Return those values (say, as a dict). Now you will return 3 numbers from each device instead of (batch, outputs) (which could be significantly larger).
Inside validation_step_end get those 3 values (actually (2, 3) if you have 2 GPUs), sum/take the mean of them, and return 3 values again.
Now validation_epoch_end will get (steps, 3) values that you can use to accumulate.
It would be even better if, instead of operating on a list of values during validation_epoch_end, you could accumulate them into another 3 values (say you have a lot of validation steps, so the list could grow too large), but this should be enough.
AFAIK PyTorch-Lightning doesn't do this (e.g. instead of adding to list, apply some accumulator directly), but I might be mistaken, so any correction would be great.
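A rough sketch of the tips above, assuming a PyTorch Lightning LightningModule; compute_f1_prec_recall is a placeholder for whatever per-batch metric function you use, not the asker's metrics module.

import torch
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    # __init__, forward, training_step, etc. omitted

    def validation_step(self, batch, batch_idx):
        x, y = batch
        preds = self(x)
        # placeholder metric function returning three scalar tensors per batch
        f1, prec, recall = compute_f1_prec_recall(preds, y)
        return {'f1': f1, 'precision': prec, 'recall': recall}

    def validation_step_end(self, batch_parts):
        # with dp/ddp2 each value holds one entry per GPU; reduce to 3 scalars
        return {k: v.mean() for k, v in batch_parts.items()}

    def validation_epoch_end(self, step_outputs):
        # one small dict per step, so stacking and averaging is cheap
        f1 = torch.stack([o['f1'] for o in step_outputs]).mean()
        prec = torch.stack([o['precision'] for o in step_outputs]).mean()
        recall = torch.stack([o['recall'] for o in step_outputs]).mean()
        self.log('val_f1', f1, sync_dist=True)
        self.log('val_precision', prec, sync_dist=True)
        self.log('val_recall', recall, sync_dist=True)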

Best practices for high performance input pipeline using only tf.data API (no feed_dict)

The official TensorFlow Performance Guide states the following:
While feeding data using a feed_dict offers a high level of
flexibility, in general feed_dict does not provide a scalable
solution. If only a single GPU is used, the difference between the
tf.data API and feed_dict performance may be negligible. Our
recommendation is to avoid using feed_dict for all but trivial
examples. In particular, avoid using feed_dict with large inputs.
However, avoiding the use of feed_dict entirely appears to be impossible. Consider the following setup with train, validation, and test datasets.
ds = tf.data.Dataset
n_files = 1000 # total number of tfrecord files
split = int(.67 * n_files)
files = ds.zip((ds.range(n_files),ds.list_files("train/part-r-*")))
train_files = files.filter(lambda a, b: a < split).map(lambda a,b: b)
validation_files = files.filter(lambda a, b: a >= split).map(lambda a,b: b)
test_files = ds.list_files("test/part-r-*")
A common method to parse the datasets might look like the following:
def setup_dataset(self, file_ds, mode="train"):
    data = file_ds.apply(tf.contrib.data.parallel_interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,
        sloppy=True,
        buffer_output_elements=self.batch_size * 8,
        prefetch_input_elements=self.batch_size * 8
    ))
    if mode == "train":
        data = data.map(self.train_data_parser)
    else:
        data = data.map(self.test_data_parser)
    return data
Then instead of feeding the individual features through a feed_dict in session.run(), you would create a reusable iterator with either Iterator.from_structure() or Iterator.from_string_handle(). I will show an example with the former, but you run into the same problem either way.
train = self.setup_dataset(train_files)
self.ops["template_iterator"] = tf.data.Iterator.from_structure(train.output_types, train.output_shapes)
self.ops["next_batch"] = self.ops["template_iterator"].get_next(name="next_batch")
self.ops["train_init"] = self.ops["template_iterator"].make_initializer(train)
validation = self.setup_dataset(validation_files)
self.ops["validation_init"] = self.ops["template_iterator"].make_initializer(validation)
This all works great, but what am I supposed to do with the test dataset? The test dataset will not contain the label feature(s) and will therefore not conform to the same output_types and output_shapes as the train and validation datasets.
I would ideally like to restore from a SavedModel and initialize the test dataset rather than serve the model over an API.
What is the trick that I am missing to incorporate test dataset during inference?
I have my datasets and iterators set up for training and inference like this:
# Train dataset
images_train = tf.placeholder(tf.float32, train_images.shape)
labels_train = tf.placeholder(tf.float32, train_masks.shape)
dataset_train = tf.data.Dataset.from_tensor_slices({"images": images_train, "masks": labels_train})
dataset_train = dataset_train.batch(MINIBATCH)
dataset_train = dataset_train.map(lambda x: map_helper(x, augmentation), num_parallel_calls=8)
dataset_train = dataset_train.shuffle(buffer_size=10000)
iterator_train = tf.data.Iterator.from_structure(dataset_train.output_types, dataset_train.output_shapes)
training_init_op = iterator_train.make_initializer(dataset_train)
batch_train = iterator_train.get_next()
# Inference dataset
images_infer = tf.placeholder(tf.float32, shape=[None] + list(valid_images.shape[1:]))
labels_infer = tf.placeholder(tf.float32, shape=[None] + list(valid_masks.shape[1:]))
dataset_infer = tf.data.Dataset.from_tensor_slices({"images": images_infer, "masks": labels_infer})
dataset_infer = dataset_infer.batch(MINIBATCH)
iterator_infer = tf.data.Iterator.from_structure(dataset_infer.output_types, dataset_infer.output_shapes)
infer_init_op = iterator_infer.make_initializer(dataset_infer)
batch_infer = iterator_infer.get_next()
Training
Initialise the iterator for training using training_init_op
sess.run(training_init_op, feed_dict={images_train: train_images, labels_train: train_masks})
Validation
Initialise the inference iterator for validation using infer_init_op
sess.run(infer_init_op, feed_dict={images_infer: images_val, labels_infer: masks_val})
Test
Initialise the inference iterator for testing using infer_init_op. This is a bit hacky, but I create an array with zeros where the labels would go and use the same iterator I used for validation
sess.run(infer_init_op, feed_dict={images_infer: images_test, labels_infer: np.zeros(images_test.shape)})
Alternatively, you could create three different datasets/iterators for train/validation/test.
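A minimal sketch of that alternative, assuming the same TF 1.x session-based setup (the shapes and placeholder names here are illustrative, not from the code above): the test dataset simply omits the labels, so no dummy zero array is needed.

import numpy as np
import tensorflow as tf

images_ph = tf.placeholder(tf.float32, shape=[None, 64, 64, 1])
masks_ph = tf.placeholder(tf.float32, shape=[None, 64, 64, 1])

dataset_train = tf.data.Dataset.from_tensor_slices(
    {"images": images_ph, "masks": masks_ph}).batch(8)
dataset_test = tf.data.Dataset.from_tensor_slices(
    {"images": images_ph}).batch(8)  # no labels at inference time

iterator_train = dataset_train.make_initializable_iterator()
iterator_test = dataset_test.make_initializable_iterator()
batch_train = iterator_train.get_next()
batch_test = iterator_test.get_next()

with tf.Session() as sess:
    test_images = np.zeros((16, 64, 64, 1), np.float32)
    sess.run(iterator_test.initializer, feed_dict={images_ph: test_images})
    print(sess.run(batch_test)["images"].shape)  # (8, 64, 64, 1)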

How to merge NaiveBayesClassifier objects in NLTK

I am working on a project using the NLTK toolkit. With the hardware I have, I am only able to run the classifier on a small data set. So I divided the data into smaller chunks, ran the classifier on each of them, and stored all of these individual objects in a pickle file.
Now for testing I need the whole thing as one object to get a better result. So my question is: how can I combine these objects into one?
objs = []
while True:
    try:
        f = open(picklename, "rb")
        objs.extend(pickle.load(f))
        f.close()
    except EOFError:
        break
Doing this does not work; it gives the error TypeError: 'NaiveBayesClassifier' object is not iterable.
NaiveBayesClassifier code:
classifier = nltk.NaiveBayesClassifier.train(training_set)
I am not sure about the exact format of your data, but you cannot simply merge different classifiers. The Naive Bayes classifier stores a probability distribution based on the data it was trained on, and you cannot merge probability distributions without access to the original data.
If you look at the source code here: http://www.nltk.org/_modules/nltk/classify/naivebayes.html
an instance of the classifier stores:
self._label_probdist = label_probdist
self._feature_probdist = feature_probdist
these are calculated in the train method using relative frequency counts (e.g. P(L1) = (# of L1 in training set) / (# of labels in training set)). To combine the two, you would want (# of L1 in Train 1 + Train 2) / (# of labels in Train 1 + Train 2).
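As a toy illustration of why merging the raw counts works (the numbers are made up): NLTK's FreqDist is a Counter subclass, so counts from different chunks simply add, and relative frequencies computed afterwards match those of the combined data.

from nltk import FreqDist

fd1 = FreqDist({"pos": 3, "neg": 1})   # label counts from chunk 1
fd2 = FreqDist({"pos": 1, "neg": 3})   # label counts from chunk 2

merged = FreqDist()
merged += fd1
merged += fd2

print(merged["pos"] / merged.N())      # 4 / 8 = 0.5, i.e. P(pos) over both chunks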
However, the Naive Bayes procedure isn't too hard to implement from scratch, especially if you follow the train source code in the link above. Here is an outline, using the NaiveBayes source code:
Store 'FreqDist' objects for each subset of the data for the labels and features.
from collections import defaultdict
from nltk.probability import FreqDist

label_freqdist = FreqDist()
feature_freqdist = defaultdict(FreqDist)
feature_values = defaultdict(set)
fnames = set()

# Count up how many times each feature value occurred, given
# the label and featurename.
for featureset, label in labeled_featuresets:
    label_freqdist[label] += 1
    for fname, fval in featureset.items():
        # Increment freq(fval|label, fname)
        feature_freqdist[label, fname][fval] += 1
        # Record that fname can take the value fval.
        feature_values[fname].add(fval)
        # Keep a list of all feature names.
        fnames.add(fname)

# If a feature didn't have a value given for an instance, then
# we assume that it gets the implicit value 'None.' This loop
# counts up the number of 'missing' feature values for each
# (label,fname) pair, and increments the count of the fval
# 'None' by that amount.
for label in label_freqdist:
    num_samples = label_freqdist[label]
    for fname in fnames:
        count = feature_freqdist[label, fname].N()
        # Only add a None key when necessary, i.e. if there are
        # any samples with feature 'fname' missing.
        if num_samples - count > 0:
            feature_freqdist[label, fname][None] += num_samples - count
            feature_values[fname].add(None)

# Use pickle to store label_freqdist, feature_freqdist, feature_values
Combine those using their built-in 'add' method. This will allow you to get the relative frequency across all the data.
all_label_freqdist = FreqDist()
all_feature_freqdist = defaultdict(FreqDist)
all_feature_values = defaultdict(set)

for file in train_labels:
    f = open(file, "rb")
    all_label_freqdist += pickle.load(f)
    f.close()

# Combine the default dicts for features similarly
Use the 'estimator' to create a probability distribution.
from nltk.probability import ELEProbDist
from nltk.classify import NaiveBayesClassifier

estimator = ELEProbDist  # used as a factory, as in NaiveBayesClassifier.train
label_probdist = estimator(all_label_freqdist)

# Create the P(fval|label, fname) distribution
feature_probdist = {}
for ((label, fname), freqdist) in all_feature_freqdist.items():
    probdist = estimator(freqdist, bins=len(all_feature_values[fname]))
    feature_probdist[label, fname] = probdist

classifier = NaiveBayesClassifier(label_probdist, feature_probdist)
The classifier will now combine the counts across all the data and produce what you need.
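A hypothetical usage of the rebuilt classifier (the feature names here are made up; in practice they must match the feature names used during training):

test_features = {"contains(great)": True, "contains(boring)": False}

print(classifier.classify(test_features))

dist = classifier.prob_classify(test_features)
for label in dist.samples():
    print(label, dist.prob(label))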
