I am currently working with PyTorch (more precisely with LSTMs using CUDA) on Ubuntu 18.04. As mentioned here, I have set CUBLAS_WORKSPACE_CONFIG=:4096:2.
However, if I train my LSTM using the same hyperparameters as before, its performance decreases a lot, so I would like to reset the setting. Does anyone know the default value, or how I could restore the settings I used before?
To reset the environment variable, I think you just need to unset it.
In your terminal, simply run:
unset CUBLAS_WORKSPACE_CONFIG
For more info, see:
How do I delete an exported environment variable?
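If the variable was set from inside a Python script rather than in the shell, roughly the same effect can be had by removing it from os.environ. The following is only a minimal sketch: the helper name clear_cublas_workspace_config is hypothetical, and it assumes the variable is read when CUDA/cuBLAS is first initialised, so it should run before any CUDA work is done in the process.

```python
import os

VAR_NAME = "CUBLAS_WORKSPACE_CONFIG"


def clear_cublas_workspace_config(environ=os.environ):
    """Remove the variable if it is present (hypothetical helper).

    Returns the previous value, or None if it was not set.
    """
    return environ.pop(VAR_NAME, None)


# Assumption: this runs before torch initialises CUDA in this process.
previous = clear_cublas_workspace_config()
print(f"{VAR_NAME} was {previous!r}; still set: {VAR_NAME in os.environ}")
```

This only affects the current process and its children; a value exported in the shell still has to be removed with unset there.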
Related
I'm trying to train AllenNLP on custom data instead of using the pre-trained model for coreference resolution. The instructions are here, but they are very vague and I am not sure how to proceed. In particular, I don't know how to modify the JSONNET file to point to my train, test and dev CoNLL-2012 files. Has anyone done this before? Thank you very much.
You can specify the path to your data in these lines in the jsonnet config:
"train_data_path": std.extVar("COREF_TRAIN_DATA_PATH"),
"validation_data_path": std.extVar("COREF_DEV_DATA_PATH"),
"test_data_path": std.extVar("COREF_TEST_DATA_PATH"),
You can either replace these with explicit paths in the config, or set these environment variables before running the config with the allennlp train command.
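The environment-variable route could look roughly like the sketch below. It assumes the allennlp command is on your PATH and that the config reads the three COREF_* variables shown above; the function name launch_training, the config file name, the output directory and the data paths are all hypothetical placeholders.

```python
import os
import subprocess


def launch_training(config_path, serialization_dir,
                    train_path, dev_path, test_path, dry_run=True):
    """Build the environment and command for `allennlp train` (hypothetical helper).

    With dry_run=True nothing is executed; the command and env are only returned.
    """
    env = dict(os.environ)
    env.update({
        "COREF_TRAIN_DATA_PATH": train_path,
        "COREF_DEV_DATA_PATH": dev_path,
        "COREF_TEST_DATA_PATH": test_path,
    })
    # -s names the directory where the model and logs are written.
    cmd = ["allennlp", "train", config_path, "-s", serialization_dir]
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env


# Placeholder file names; substitute your own CoNLL-2012 files and config.
cmd, env = launch_training("coref.jsonnet", "coref_output",
                           "data/train.conll", "data/dev.conll", "data/test.conll")
print(" ".join(cmd))
```

Passing dry_run=False would actually start training; the default only prints the command so the paths can be checked first.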
I am trying to train my model with distributed data parallelism. I have already trained it using torch.nn.DataParallel, and now I want to see how much training speed I can gain with torch.nn.parallel.DistributedDataParallel, since I have read on numerous pages that DistributedDataParallel is the better choice. I followed one of the examples, but I am not sure how to set the environment variables os.environ['MASTER_ADDR'] and os.environ['MASTER_PORT']: I am using a cloud service, so I don't know which specific node my code gets allocated to for training. Can anyone help me set these variables?
I have some saved models that contain dropout layers. Unfortunately, the dropout_keep_dim value was not passed in as a placeholder, so when I restore a model for testing, it gives a different random output on each run. My question is: is it possible to change the dropout_keep_dim of a saved model? The dropout layer is added the following way:
tf.nn.dropout(layer_no, dropout_keep_dim)
I have already spent hours on Google and didn't find any working solution. Is there a solution at all, or are my saved models of no use now? tf.assign doesn't work because, in my case, dropout_keep_dim is not a variable. Any kind of help is appreciated.
NB: I can restore the dropout_keep_dim value and print it. I want to change it, if that's possible, and then test with the saved weights.
I am using PyTorch on Windows 10, and I am having trouble understanding the correct use of tensorboardX with PyTorch.
After instantiating a writer (writer = SummaryWriter()) and adding the value of the loss function to it in every iteration (writer.add_scalar('data/loss_func', loss.data[0].item(), iteration)), I have a folder which contains the saved run.
My questions are:
1) As far as I understand, after the training is complete, I need to write in the terminal (which corresponds to the command line prompt in Windows):
tensorboard --logdir=C:\Users\DrJohn\Documents\runs
where this is the folder which contains the file created by tensorboardX. What is the valid syntax in the Windows command prompt? I couldn't work this out from the online tutorials.
2) Is it possible to watch the learning during training by using tensorboardX (i.e. to plot the learning curve as the iterations run)? Or is the only option to see everything once the training ends?
Thanks in advance
I am using sklearn to train a model. The training dataset is about 3000k, so I use SGDClassifier. The features are not very good, so I know it may not converge. But I want SGDClassifier to stop early according to my own setting, such as max_iter = 1000. As far as I can tell, SGDClassifier has no parameter like max_iter. How can I do this?
This is the code.
This is the print information.
Any help will be appreciated...
This is weird: by default in scikit-learn 0.18.2, n_iter is set to 5 epochs. Can you please update your question with a script that reproduces the behavior on a toy dataset (for instance, one generated with numpy.random.randn or similar)?
Note that in scikit-learn master (and in 0.19 once released), n_iter will be deprecated and replaced by max_iter and tol (for instance, tol set to 1e-3), which stop training automatically when the objective function is no longer making progress.
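A minimal sketch of that stopping behaviour on a toy dataset, assuming a scikit-learn version that already accepts max_iter and tol (0.19 or later). The data, the labelling rule and the random seed are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy data: 200 samples, 5 features, labels from a simple linear rule.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# max_iter caps the number of epochs; tol allows stopping earlier
# once the loss stops improving by at least that amount.
clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X, y)

# n_iter_ reports how many epochs were actually run.
print("epochs run:", clf.n_iter_)
print("training accuracy:", clf.score(X, y))
```

On older versions that only have n_iter, the constructor call above would fail, and n_iter would have to be used instead.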
A 20-hour run may not be so strange, since you have a dataset of 3000k and you are using SGDClassifier, which is slow. What processor do you have?
Try stopping it with Ctrl+C if you are on Windows. Then use n_iter to control the number of iterations that you want. Note, however, that the default is 5.
Finally, if you want to save a model see here:
Save and Load Machine Learning Models in Python with scikit-learn
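As a rough illustration of saving and reloading a fitted model, here is a sketch using Python's standard pickle module. The toy data and the file name model.pkl are placeholders, and the SGDClassifier arguments assume scikit-learn 0.19 or later.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import SGDClassifier

# Fit a small model on made-up data.
rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = (X[:, 0] > 0).astype(int)
model = SGDClassifier(max_iter=1000, tol=1e-3, random_state=0).fit(X, y)

# Save to a file in a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")  # placeholder file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same predictions as the original.
print((restored.predict(X) == model.predict(X)).all())
```

Pickled models are generally only safe to load with the same scikit-learn version that wrote them, and only from sources you trust.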