I am planning to use Opacus to implement differential privacy in my federated learning model, but I have a basic question that I would like to clear up first.
So, as far as my understanding goes, with Opacus we use an optimizer like DP-SGD, which clips per-sample gradients and adds calibrated noise to every batch of each client's dataset during "local training". And in federated learning, we train client models for a few "local epochs" before sending their weights to a central server for aggregation, and we add differential noise before sending out the model weights.
So my question is: why do we use DP-SGD to add noise to every single batch of every single client dataset during local training, when we could just add noise to the local weights before they are sent out? Why not let the local training epochs happen as is and simply add noise to the outbound weights at the time of departure? What am I missing?
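For context, here is a minimal sketch of what DP-SGD-style local training looks like with Opacus (assuming the Opacus 1.x `make_private` API, with a toy model and dataset standing in for one client). The key mechanical point is that the noise goes into each batch's clipped per-sample gradients inside `optimizer.step()`, not into the weights themselves:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-ins for one client's model and local data.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # scale of Gaussian noise added per batch
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:        # one "local epoch" on this client
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()       # clips per-sample gradients, adds noise, steps
```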
I fine-tuned two bert-base models, initialized with different weights, on the same dataset. I then attempted to combine my pretrained models via a shared linear layer. Supposing there is no problem in my code, is it possible that this combination performs worse during training, and hence on a test set, than the individual models? This is my situation.
No. The shared linear layer is essentially an ensemble method, which combines two "weak" models into a single stronger model. The parameters of this combination are learned to optimize performance on the training set, so unless the shared layer is designed in such a way that it cannot actually use the input features, its training performance should be at least as good as the better of the two ensembled models: at a minimum, the shared layer can learn to output exactly the result of the better model and ignore the other one. Of course, worse test performance is still possible, since the test distribution may differ from the training distribution. (A minimal sketch of such a combination follows the list below.)
Some causes of your issue may be:
Initialization that lands in a poor local optimum
A different activation function in the shared layer
Other hyperparameter settings
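To make the "shared linear layer" concrete, here is a minimal sketch; the encoders are hypothetical stand-ins for your two fine-tuned bert-base models, each assumed to emit a pooled vector:

```python
import torch
import torch.nn as nn

hidden = 768  # pooled output size of bert-base

# Hypothetical stand-ins for the two fine-tuned encoders.
encoder_a = nn.Linear(128, hidden)
encoder_b = nn.Linear(128, hidden)

class SharedLinearEnsemble(nn.Module):
    def __init__(self, enc_a, enc_b, hidden, num_classes):
        super().__init__()
        self.enc_a, self.enc_b = enc_a, enc_b
        # One learned layer over both models' outputs. With weights that
        # zero out one half, it can fall back to a linear head over a
        # single model's features, so the training optimum should not be
        # worse than the better individual model.
        self.shared = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        za, zb = self.enc_a(x), self.enc_b(x)
        return self.shared(torch.cat([za, zb], dim=-1))

model = SharedLinearEnsemble(encoder_a, encoder_b, hidden, num_classes=2)
print(model(torch.randn(4, 128)).shape)  # torch.Size([4, 2])
```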
I built a network based on LSTM and tuned its parameters. The results are shown in the figure and are not impressive.
How do I understand what is going wrong? Is the dataset bad, or is the network not well built?
Since the validation loss decreased initially and increased later, what you're experiencing is overfitting.
Since the training loss kept decreasing, your model has learnt the training set excessively and is no longer generalizing well; this is why the validation loss increased.
To avoid overfitting, you need to regularize your model. You can use L1 or L2 regularization; additionally, you can try dropout in your model.
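As an illustration, here is a minimal sketch of both ideas in PyTorch (sizes are hypothetical; `weight_decay` gives an L2 penalty, and the Dropout layer randomly zeroes activations during training):

```python
import torch
import torch.nn as nn

class RegularizedLSTM(nn.Module):
    def __init__(self, n_features=10, hidden=64, n_out=1, p_drop=0.3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(p_drop)           # dropout regularization
        self.head = nn.Linear(hidden, n_out)

    def forward(self, x):
        out, _ = self.lstm(x)                    # (batch, seq, hidden)
        return self.head(self.drop(out[:, -1]))  # use the last time step

model = RegularizedLSTM()
# weight_decay adds an L2 penalty on all parameters at each update.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```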
Now coming to your question:
If the dataset is of good quality, i.e. it is annotated well and actually contains features that are predictive of the target, then the dataset and the model together determine the quality of the predictions.
Since you're using RNNs, which have a large number of parameters, make sure the dataset is also large enough to avoid the RNN overfitting on a small dataset. If the available dataset is small, start with a small network with fewer parameters and gradually scale up the model until you're satisfied with the prediction scores.
You can also refer to this: https://towardsdatascience.com/rnn-training-tips-and-tricks-2bf687e67527
I have a neural network with five inputs for a classification task. Two of those five inputs are very important and have a direct relationship to the classification task. Therefore, I want to prioritize those two inputs within the network and give less priority to the other three. Is there a way to facilitate this in a neural network?
If training works well, the NN should automatically pick up what's most important for your classification. That's the entire point of a NN (or ML in general); so that you don't have to manually tell it what's more important and what's not. After learning, you can verify that the model indeed does learn the correct order of importance between the features.
You can use any model explanation technique for this; ELI5, SHAP, or LIME are some examples. All of these will tell you whether your model did indeed learn that the features you know are important are actually important to the network.
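ELI5, SHAP, and LIME each have their own APIs; as a dependency-light illustration of the same idea, here is a sketch using scikit-learn's permutation importance on a toy problem (five features, only two informative, mirroring your setup):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

# Toy stand-in: five inputs, only two of which are informative.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in score:
# truly important features cause a large drop.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: {imp:.3f}")
```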
You probably shouldn't try to manually incorporate such biases into the network (unless you have a very good reason for doing so, like incorporating spatial information of images via CNNs). Trust the learning xD
I've been working with the MLPClassifier for a while, and I think I had a wrong interpretation of what the function is doing the whole time. I think I've got it right now, but I'm not sure about that. So I will summarize my understanding, and it would be great if you could add your thoughts on the right understanding.
So with the MLPClassifier we are building a neural network based on a training dataset. Setting early_stopping = True, it is possible to use a validation dataset within the training process in order to check whether the network is working on a new set as well. If early_stopping = False, no validation within the process is done. After one has finished building, we can use the fitted model in order to predict on a third dataset if we wish to.
What I was thinking before is that during the whole training process a validation dataset is set aside anyway, with validation after every epoch.
I'm not sure if my question is understandable, but it would be great if you could help me to clear my thoughts.
The sklearn.neural_network.MLPClassifier uses (a variant of) Stochastic Gradient Descent (SGD) by default. Your question could be framed more generally as how SGD is used to optimize the parameter values in a supervised learning context. There is nothing specific to Multi-layer Perceptrons (MLP) here.
So with the MLPClassifier we are building a neural network based on a training dataset. Setting early_stopping = True, it is possible to use a validation dataset within the training process
Correct, although it should be noted that this validation set is taken away from the original training set.
in order to check whether the network is working on a new set as well.
Not quite. The point of early stopping is to track the validation score during training and stop training as soon as the validation score stops improving significantly.
If early_stopping = False, no validation within the process is done. After one has finished building, we can use the fitted model in order to predict on a third dataset if we wish to.
Correct.
What I was thinking before is that during the whole training process a validation dataset is set aside anyway, with validation after every epoch.
As you probably know by now, this is not so. The division of the learning process into epochs is somewhat arbitrary and has nothing to do with validation.
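A minimal sketch of the behaviour described above (assuming scikit-learn's current MLPClassifier parameters):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, random_state=0)

# early_stopping=True carves validation_fraction (default 0.1) out of the
# training data and stops once the validation score has not improved by
# at least tol for n_iter_no_change consecutive epochs.
clf = MLPClassifier(early_stopping=True, validation_fraction=0.1,
                    n_iter_no_change=10, max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.n_iter_, clf.best_validation_score_)
```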
I have built a machine learning model that tries to predict weather data; in this case I am predicting whether or not it will rain tomorrow (a binary Yes/No prediction).
In the dataset there are about 50 input variables, and I have 65,000 entries in the dataset.
I am currently running a RNN with a single hidden layer, with 35 nodes in the hidden layer. I am using PyTorch's NLLLoss as my loss function, and Adaboost for the optimization function. I've tried many different learning rates, and 0.01 seems to be working fairly well.
After running for 150 epochs, I notice that I converge to around 0.80 accuracy on my test data. I would like this to be even higher, but the model seems to be stuck oscillating around some sort of saddle point or local minimum. (A graph of this is below.)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
I'm not sure why exactly you are using only one hidden layer, or what the shape of your history data is, but here are some things you can try (a rough sketch follows the list):
Try more than one hidden layer.
Experiment with LSTM and GRU layers, and with combinations of these together with plain RNN layers.
Reconsider the shape of your data, i.e. how much history you look at to predict the weather.
Make sure your features are scaled properly, since you have about 50 input variables.
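Here is a rough sketch of the last two points combined (dummy data in place of your 65,000 x 50 dataset; the window length and layer sizes are guesses to adjust for your problem):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

# Dummy stand-in for the real data: rows of 50 weather features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))

# Scale all 50 inputs to zero mean / unit variance.
X = StandardScaler().fit_transform(X).astype("float32")

# Slice into 7-step windows so the network sees some history.
win = 7
windows = np.stack([X[i:i + win] for i in range(len(X) - win)])
xb = torch.from_numpy(windows[:64])        # one mini-batch: (64, 7, 50)

class RainClassifier(nn.Module):
    def __init__(self, n_features=50, hidden=35, layers=2):
        super().__init__()
        # Two stacked LSTM layers instead of a single plain RNN layer.
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, 2)   # rain / no rain

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # classify from last time step

print(RainClassifier()(xb).shape)          # torch.Size([64, 2])
```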
Your question is a little ambiguous, as you mention an RNN with a single hidden layer; without knowing the entire network architecture, it is hard to say how you can improve it. So I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function, but PyTorch doesn't have any such optimizer. Did you try the SGD or Adam optimizers, which are very commonly used? See the sketch below.
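For instance, a minimal training step with Adam and your NLLLoss might look like this (toy model and random batch; the 50 features and 35 hidden units are taken from your description):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(50, 35), nn.ReLU(),
                      nn.Linear(35, 2), nn.LogSoftmax(dim=1))
loss_fn = nn.NLLLoss()                     # expects log-probabilities
# PyTorch has no "Adaboost" optimizer; Adam is a common choice.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 50)                    # dummy batch of 50 features
y = torch.randint(0, 2, (32,))             # rain / no rain labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```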
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm so that you can compare whether 80% accuracy is good or not?
150 epochs for a binary classification task seems like a lot. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.