I am running a deep learning program in PyTorch using nn.DataParallel. Since I have eight GPUs available, I pass device_ids=[0, 1, 2, 3, 4, 5, 6, 7].
My program runs on the first seven GPUs ([0, 1, 2, 3, 4, 5, 6]) but not on the last GPU, index 7.
I have no clue why. What could cause this GPU to go unused even though it is free?
I found the solution: I was using a batch size of 7, so only 7 GPUs were used. If I change the batch size to 8, all eight GPUs are used.
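This follows from how nn.DataParallel distributes work: it scatters the input batch along dimension 0, at most one chunk per listed device, so a batch smaller than the device count leaves some GPUs with no chunk at all. A minimal CPU-only sketch of the same splitting rule (torch.chunk here stands in for DataParallel's internal scatter):

```python
import torch

# DataParallel splits the input along dim 0, one chunk per device.
# A batch of 7 split across 8 devices yields only 7 chunks, so the
# eighth GPU receives no input and sits idle.
batch7 = torch.randn(7, 10)
print(len(torch.chunk(batch7, 8, dim=0)))  # 7 -- one device gets nothing

batch8 = torch.randn(8, 10)
print(len(torch.chunk(batch8, 8, dim=0)))  # 8 -- every device gets one sample
```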
I'm running the LunarLander-v2 gym environment and have successfully trained a policy using PPO. I saw in the gym API that there is a function to save videos to a file. I need this because I am running my code on a server without a GUI. I've also looked into the source code of save_video and noticed that it saves videos on a "cubic" schedule: videos of episode numbers [0, 1, 8, 27, 64, etc.] get saved. I followed the example in the API but left out step_starting_index, as I don't think I need it. My code is below:
frames = env.render()
save_video(
    frames,
    "videos",
    fps=240,
    episode_index=n,
)
Episodes 0 and 1 are saved successfully, but after that (episode 2, 3, or 4) the Python process gets Killed, which suggests an out-of-memory condition. If I comment out save_video and let the evaluation run, episodes up to 10 and beyond all succeed. This suggests there is a problem with save_video, but I can't identify it.
I'm currently reading about page replacement algorithms and found a question that I find complex.
Question is:
A page replacement algorithm should minimize the number of page faults.
Description:
We can achieve this minimization by distributing heavily used pages evenly over all of memory, rather than having them compete for a small number of page frames. We can associate with each page frame a counter of the number of pages associated with that frame. Then, to replace a page, we can search for the page frame with the smallest counter.
b) How many page faults occur for your algorithm for the following reference string with four page frames?
1, 2, 3, 4, 5, 3, 4, 1, 6, 7, 8, 7, 8, 9, 7, 8, 9, 5, 4, 5, 4, 2
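Rather than trace part (b) by hand, the counter-per-frame scheme described above can be simulated directly. This sketch makes one explicit assumption: ties between frames with equal counters are broken by lowest frame index (the exercise leaves tie-breaking open, and a different rule can change the count):

```python
def count_faults(refs, nframes):
    """Simulate the counter-based replacement described above.

    Each frame keeps a counter of how many pages have ever been loaded
    into it; on a fault, the frame with the smallest counter (lowest
    index on ties) is replaced.  Returns the number of page faults.
    """
    frames = [None] * nframes
    counters = [0] * nframes
    faults = 0
    for page in refs:
        if page in frames:
            continue  # page hit, nothing to do
        faults += 1
        if None in frames:
            victim = frames.index(None)            # fill empty frames first
        else:
            victim = counters.index(min(counters))  # smallest counter loses
        frames[victim] = page
        counters[victim] += 1
    return faults

refs = [1, 2, 3, 4, 5, 3, 4, 1, 6, 7, 8, 7, 8, 9, 7, 8, 9, 5, 4, 5, 4, 2]
print(count_faults(refs, 4))  # 13
```

With this tie-breaking rule the reference string produces 13 page faults; it is still worth checking the trace by hand against whatever tie-breaking convention your course uses.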
There is a command, chcpu. I know how to use it with a single CPU number. How do I use it with a list or set of CPU numbers?
chcpu -e cpu-list
How do I write this cpu-list?
From man chcpu
Some options have a cpu-list argument. Use this argument to specify a comma-separated list of CPUs. The list can contain individual CPU addresses or ranges of addresses. For example, 0,5,7,9-11 makes the command applicable to the CPUs with the addresses 0, 5, 7, 9, 10, and 11.
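Putting the man-page syntax into the question's command (the CPU numbers here are just the man page's example; substitute your own, and note that hot-plugging CPUs requires root):

```shell
# Enable CPUs 0, 5, 7 and the range 9-11 in a single invocation
chcpu -e 0,5,7,9-11

# Individual addresses and ranges can be mixed freely, e.g. disable 2-3
chcpu -d 2-3
```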
My computer has two GPUs, and this is my first time using two. When I had one GPU, I simply ran my CUDA program and it ran on the only GPU. Now, however, I don't know how to control which GPU a program uses, or how to run a program on one particular GPU. I searched the Internet, and a post says
export CUDA_VISIBLE_DEVICES=0
This must be set before running the program. I have two programs to run: one is a Torch script and the other is a CUDA script. I opened two terminals; in the first, I used the command above and ran the Torch program. Then, in the second terminal, I used the same command with the number changed from 0 to 1 and ran the CUDA program.
However, the nvidia-smi output shows both programs assigned to GPU 0. I wanted to assign the Torch program (PID 19520) to GPU 0 and the CUDA program (PID 20351) to GPU 1.
How can I assign the two programs to different GPU devices?
The following are the settings of the Torch script. (Ubuntu 14.04, NVIDIA GTX Titan X, CUDA 7.5)
--[[command line arguments]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text('Train a Recurrent Model for Visual Attention')
cmd:text('Example:')
cmd:text('$> th rnn-visual-attention.lua > results.txt')
cmd:text('Options:')
cmd:option('--learningRate', 0.01, 'learning rate at t=0')
cmd:option('--minLR', 0.00001, 'minimum learning rate')
cmd:option('--saturateEpoch', 800, 'epoch at which linear decayed LR will reach minLR')
cmd:option('--momentum', 0.9, 'momentum')
cmd:option('--maxOutNorm', -1, 'max norm each layers output neuron weights')
cmd:option('--cutoffNorm', -1, 'max l2-norm of concatenation of all gradParam tensors')
cmd:option('--batchSize', 20, 'number of examples per batch')
cmd:option('--cuda', true, 'use CUDA')
cmd:option('--useDevice', 1, 'sets the device (GPU) to use')
cmd:option('--maxEpoch', 2000, 'maximum number of epochs to run')
cmd:option('--maxTries', 100, 'maximum number of epochs to try to find a better local minima for early-stopping')
cmd:option('--transfer', 'ReLU', 'activation function')
cmd:option('--uniform', 0.1, 'initialize parameters using uniform distribution between -uniform and uniform. -1 means default initialization')
cmd:option('--xpPath', '', 'path to a previously saved model')
cmd:option('--progress', false, 'print progress bar')
cmd:option('--silent', false, 'dont print anything to stdout')
Set the variable inline, on the same command line, so each process gets its own value:
CUDA_VISIBLE_DEVICES=0 th [torch script]
CUDA_VISIBLE_DEVICES=1 [CUDA script]
Since os.loadavg() returns [0, 0, 0] on Windows, is there any way of getting the CPU's average load on Windows-based systems over 1-, 5-, and 15-minute intervals without having to sample every few seconds and save the results yourself?
Digging up this topic.
Use loadavg-windows from npm to enable os.loadavg() on Windows.
Wait a few seconds to see the first result different from 0.
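Under the hood, packages like this sample CPU usage periodically and feed it into the standard Unix exponentially damped moving average; that is also why the first readings start at 0. A self-contained sketch of the formula itself (updateLoad is a made-up helper name, and the 5-second sampling interval is an assumption):

```javascript
// Exponentially damped moving average, the formula behind Unix load averages:
// load = load * exp(-interval/window) + current * (1 - exp(-interval/window))
function updateLoad(prevLoad, currentLoad, intervalSec, windowSec) {
  const decay = Math.exp(-intervalSec / windowSec);
  return prevLoad * decay + currentLoad * (1 - decay);
}

// Feed a constant load of 1.0 into an initially idle 1-minute average,
// sampling every 5 seconds: the average climbs toward 1.0 but lags behind.
let oneMin = 0;
for (let i = 0; i < 12; i++) {   // 12 samples * 5 s = one minute
  oneMin = updateLoad(oneMin, 1.0, 5, 60);
}
console.log(oneMin.toFixed(3));  // 0.632, i.e. 1 - e^-1 after one window
```

This lag is exactly why the answer above says to wait a few seconds before expecting a nonzero result.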
You can use the windows-cpu npm package to get similar results.