PyTorch with CUDA on Ubuntu 20.04

I'm trying to get PyTorch with CUDA 10 compatibility via:
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
(from https://discuss.pytorch.org/t/pytorch-with-cuda-11-compatibility/89254),
but I get a connection timeout error:
Proceed ([y]/n)? y
Downloading and Extracting Packages
pytorch-mutex-1.0 | 3 KB | | 0%
torchvision-0.12.0 | 8.8 MB | | 0%
ffmpeg-4.3 | 9.9 MB | | 0%
pytorch-1.11.0 | 622.9 MB | | 0%
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/pytorch/noarch/pytorch-mutex-1.0-cuda.tar.bz2>
Elapsed: -

It turned out I was running under WSL2 and hadn't shut it down for many days. A reboot fixed the issue.
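If a full machine reboot is inconvenient, restarting just the WSL2 VM from the Windows side may be enough (a minimal sketch; run the first command from a PowerShell prompt, then reopen the WSL2 shell and retry the install):
wsl --shutdown          # from Windows PowerShell: stops all running WSL2 distros
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch   # retry inside the reopened WSL2 shell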

Related

nvidia-docker: Got permission denied

Docker newbie question here, so please be nice.
I know this might be asked before but I could not find anything related to nvidia-docker.
I completed the installation instructions on the official guide.
When I wanted to test Nvidia-docker:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I got this error:
(base) user#adminme:~$ docker run --gpus all --rm nvidia/cuda nvidia-smi
docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create: dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.
I found this answer here, but it felt a bit different for my case. I am very new to Docker and still learning; let me know what you think.
here is some information about my remote Linux machine:
(base) user#adminme:~$ lspci | grep -i nvidia
02:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
nvidia-smi command:
(base) user#adminme:~$ nvidia-smi
Sun May 31 01:12:25 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:02:00.0 Off | N/A |
| 0% 33C P8 9W / 215W | 17MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2545 G /usr/lib/xorg/Xorg 15MiB |
+-----------------------------------------------------------------------------+
docker version:
(base) user#adminme:~$ docker --version
Docker version 19.03.10, build 9424aeaee9
The quick fix would be to run the container using sudo:
sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
If you want to run Docker as a non-root user, then you need to add your user to the docker group.
Create the docker group if it does not exist
sudo groupadd docker
Add your user to the docker group.
sudo usermod -aG docker $USER
Run the following command, or log out and log back in. (If that doesn't work, you may need to reboot your machine first.)
newgrp docker
Check if docker can be run without root
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
Ref: https://docs.docker.com/engine/install/linux-postinstall/
In addition to what nischay goyal answered, sometimes after adding the user to the docker group you have to do
su - ${USER}
in order to log out and log back in.
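As a quick sanity check (a minimal sketch; output varies by system), you can confirm the group change has taken effect before retrying the GPU container:
id -nG          # "docker" should appear in the list after re-login or newgrp
docker info     # a permission error here means the change hasn't taken effect in this shell yet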

The TensorFlow Docker GPU image doesn't detect my GPU

Running the latest Docker image with:
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter jupyter notebook --notebook-dir=/tf --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.allow_origin='https://colab.research.google.com'
code:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
gives me:
2020-07-27 19:44:03.826149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-27 19:44:03.826179: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (-1)
2020-07-27 19:44:03.826201: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
I'm on Pop!_OS 20.04 and have tried installing the CUDA drivers from the Pop repository as well as from NVIDIA. No dice. Any help appreciated.
Running
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
gives me:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 On | 00000000:09:00.0 On | N/A |
| 0% 52C P5 15W / 225W | 513MiB / 7959MiB | 17% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
As per the docs here and here, you have to add a "gpus" argument when creating the Docker container to have GPU support.
So you should start your container something like this. The "--gpus all" flag makes all the GPUs on the host visible to the container.
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter jupyter notebook --notebook-dir=/tf --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.allow_origin='https://colab.research.google.com'
You can also try running nvidia-smi with the TensorFlow image to quickly check whether the GPU is accessible from the container.
docker run -it --rm --gpus all tensorflow/tensorflow:latest-gpu-jupyter nvidia-smi
In my case it returns this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:07:00.0 On | N/A |
| 0% 45C P8 8W / 166W | 387MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
As you can see, I'm running an older NVIDIA driver (440.100), so I cannot confirm that this will solve your problem. I'm also on Pop!_OS 20.04 and didn't install anything other than Docker, its dependencies, and nvidia-container-toolkit.
I would also highly suggest avoiding the latest tag when creating containers, as it might cause you to unknowingly upgrade to a newer image. Go with version-numbered images instead.
For example tensorflow/tensorflow:2.3.0-gpu-jupyter.
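Putting both suggestions together, a version-pinned, GPU-enabled launch might look like this (a sketch; adjust the port mapping and notebook flags to your setup):
docker run -it --rm --gpus all -p 8888:8888 tensorflow/tensorflow:2.3.0-gpu-jupyter jupyter notebook --notebook-dir=/tf --ip 0.0.0.0 --no-browser --allow-root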

Hyperledger Fabric Samples - downloading platform-specific Fabric binaries on Windows 10

I am installing the Fabric samples from https://hyperledger-fabric.readthedocs.io/en/release-2.0/install.html on Windows 10.
When I try to run the command curl -sSL https://raw.githubusercontent.com/hyperledger/fabric/master/scripts/bootstrap.sh | bash -s, I get an error downloading the binaries. Please find the terminal dump below. I am running this from the fabric-samples folder where the cloning was done.
Clone hyperledger/fabric-samples repo
===> Checking out v2.0.0 of hyperledger/fabric-samples
error: pathspec 'v2.0.0' did not match any file(s) known to git
Pull Hyperledger Fabric binaries
===> Downloading version 2.0.0 platform specific fabric binaries
===> Downloading: https://github.com/hyperledger/fabric/releases/download/v2.0.0/hyperledger-fabric-msys_nt-10.0-18362-amd64-2.0.0.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
tar.exe: Error opening archive: Failed to open '\\.\tape0'
100 9 100 9 0 0 9 0 0:00:01 0:00:01 --:--:-- 4
(23) Failed writing body
==> There was an error downloading the binary file.
------> 2.0.0 platform specific fabric binary is not available to download <----
But when I run this in Git CMD (as suggested in HyperLedger - downloading platform-specific binaries on Windows 10), I get the following:
Clone hyperledger/fabric-samples repo
===> Checking out v2.0.0 of hyperledger/fabric-samples
error: pathspec 'v2.0.0' did not match any file(s) known to git
Pull Hyperledger Fabric binaries
===> Downloading version 2.0.0 platform specific fabric binaries
===> Downloading: https://github.com/hyperledger/fabric/releases/download/v2.0.0/hyperledger-fabric-windows-amd64-2.0.0.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0tar.exe: Error opening archive: Failed to open '\\.\tape0'
100 656 100 656 0 0 328 0 0:00:02 0:00:02 --:--:-- 244
0 64.3M 0 16943 0 0 2420 0 7:44:40 0:00:07 7:44:33 3929
curl: (23) Failed writing body (0 != 16384)
==> There was an error downloading the binary file.
------> 2.0.0 platform specific fabric binary is not available to download <----
I created /bin and /config folders in the fabric-samples folder. Please let me know what I am doing wrong here.
Thanks in advance.
Try specifying the latest Fabric version explicitly:
curl -sSL https://raw.githubusercontent.com/hyperledger/fabric/master/scripts/bootstrap.sh | bash -s 2.1.0
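If piping straight into bash keeps misbehaving in the Windows shell, a variation worth trying (a sketch, assuming Git Bash) is to save the bootstrap script locally and then run it with the explicit version:
curl -sSL -o bootstrap.sh https://raw.githubusercontent.com/hyperledger/fabric/master/scripts/bootstrap.sh
bash bootstrap.sh 2.1.0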

User data is not running at launch on AWS EC2

I am trying to launch an EC2 Linux instance (Amazon Linux 2 AMI) and, in the user data, install Node.js and, at the same time, Git.
Then I am trying to clone my GitHub repo and start the Node.js server, all of this in the user data.
I have checked everywhere, including the cloud-init log file, to find some error explaining why my user data is not working.
Here is the script:
#!/bin/bash
sudo yum update -y
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.32.0/install.sh | bash
. ~/.nvm/nvm.sh
nvm install 4.4.5
sudo yum upgrade
sudo yum install git -y
git clone https://github.com/myname/one_user.git
cd one_user
dnsaddress=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
export dns_name=${dnsaddress}
npm install -y
node server.js
The code below is from the cloud init log file.
Cloud-init v. 18.2-72.amzn2.0.6 running 'init-local' at Sun, 10 Feb 2019 15:49:35 +0000. Up 4.93 seconds.
Cloud-init v. 18.2-72.amzn2.0.6 running 'init' at Sun, 10 Feb 2019 15:49:38 +0000. Up 7.42 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: | eth0 | True | 10.0.1.72 | 255.255.255.0 | global | 0e:1f:76:6a:3c:6c |
ci-info: | eth0 | True | fe80::c1f:76ff:fe6a:3c6c/64 | . | link | 0e:1f:76:6a:3c:6c |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
ci-info: | lo | True | ::1/128 | . | host | . |
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: ++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.0.1.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.1.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: | 2 | 169.254.169.254 | 0.0.0.0 | 255.255.255.255 | eth0 | UH |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | 9 | fe80::/64 | :: | eth0 | U |
ci-info: | 11 | local | :: | eth0 | U |
ci-info: | 12 | ff00::/8 | :: | eth0 | U |
ci-info: +-------+-------------+---------+-----------+-------+
Cloud-init v. 18.2-72.amzn2.0.6 running 'modules:config' at Sun, 10 Feb 2019 15:49:39 +0000. Up 8.99 seconds.
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Existing lock /var/run/yum.pid: another copy is running as pid 3265.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 31 M RSS (321 MB VSZ)
Started: Sun Feb 10 15:49:38 2019 - 00:02 ago
State : Sleeping, pid: 3265
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 70 M RSS (361 MB VSZ)
Started: Sun Feb 10 15:49:38 2019 - 00:04 ago
State : Running, pid: 3265
--> 1:openssl-libs-1.0.2k-16.amzn2.0.1.x86_64 from installed removed (updateinfo)
--> 1:openssl-1.0.2k-16.amzn2.0.1.x86_64 from installed removed (updateinfo)
--> 1:openssl-libs-1.0.2k-16.amzn2.0.2.x86_64 from amzn2-core removed (updateinfo)
--> 1:openssl-1.0.2k-16.amzn2.0.2.x86_64 from amzn2-core removed (updateinfo)
1 package(s) needed (+0 related) for security, out of 3 available
Resolving Dependencies
--> Running transaction check
---> Package kernel-tools.x86_64 0:4.14.88-88.76.amzn2 will be updated
---> Package kernel-tools.x86_64 0:4.14.94-89.73.amzn2 will be an update
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Updating:
kernel-tools x86_64 4.14.94-89.73.amzn2 amzn2-core 111 k
Transaction Summary
================================================================================
Upgrade 1 Package
Total download size: 111 k
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Updating : kernel-tools-4.14.94-89.73.amzn2.x86_64 1/2
Cleanup : kernel-tools-4.14.88-88.76.amzn2.x86_64 2/2
Verifying : kernel-tools-4.14.94-89.73.amzn2.x86_64 1/2
Verifying : kernel-tools-4.14.88-88.76.amzn2.x86_64 2/2
Updated:
kernel-tools.x86_64 0:4.14.94-89.73.amzn2
Complete!
Cloud-init v. 18.2-72.amzn2.0.6 running 'modules:final' at Sun, 10 Feb 2019 15:49:47 +0000. Up 16.22 seconds.
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Existing lock /var/run/yum.pid: another copy is running as pid 3324.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 54 M RSS (270 MB VSZ)
Started: Sun Feb 10 15:49:45 2019 - 00:03 ago
State : Running, pid: 3324
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:4.14.94-89.73.amzn2 will be installed
---> Package openssl.x86_64 1:1.0.2k-16.amzn2.0.1 will be updated
---> Package openssl.x86_64 1:1.0.2k-16.amzn2.0.2 will be an update
---> Package openssl-libs.x86_64 1:1.0.2k-16.amzn2.0.1 will be updated
---> Package openssl-libs.x86_64 1:1.0.2k-16.amzn2.0.2 will be an update
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
kernel x86_64 4.14.94-89.73.amzn2 amzn2-core 19 M
Updating:
openssl x86_64 1:1.0.2k-16.amzn2.0.2 amzn2-core 496 k
openssl-libs x86_64 1:1.0.2k-16.amzn2.0.2 amzn2-core 1.2 M
Transaction Summary
================================================================================
Install 1 Package
Upgrade 2 Packages
Total download size: 21 M
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
--------------------------------------------------------------------------------
Total 33 MB/s | 21 MB 00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Updating : 1:openssl-libs-1.0.2k-16.amzn2.0.2.x86_64 1/5
Updating : 1:openssl-1.0.2k-16.amzn2.0.2.x86_64 2/5
Installing : kernel-4.14.94-89.73.amzn2.x86_64 3/5
Cleanup : 1:openssl-1.0.2k-16.amzn2.0.1.x86_64 4/5
Cleanup : 1:openssl-libs-1.0.2k-16.amzn2.0.1.x86_64 5/5
Verifying : 1:openssl-libs-1.0.2k-16.amzn2.0.2.x86_64 1/5
Verifying : kernel-4.14.94-89.73.amzn2.x86_64 2/5
Verifying : 1:openssl-1.0.2k-16.amzn2.0.2.x86_64 3/5
Verifying : 1:openssl-libs-1.0.2k-16.amzn2.0.1.x86_64 4/5
Verifying : 1:openssl-1.0.2k-16.amzn2.0.1.x86_64 5/5
Installed:
kernel.x86_64 0:4.14.94-89.73.amzn2
Updated:
openssl.x86_64 1:1.0.2k-16.amzn2.0.2
openssl-libs.x86_64 1:1.0.2k-16.amzn2.0.2
Complete!
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10007 100 10007 0 0 10007 0 0:00:01 --:--:-- 0:00:01 99079
=> Downloading nvm as script to '/.nvm'
=> Profile not found. Tried (as defined in $PROFILE), ~/.bashrc, ~/.bash_profile, ~/.zshrc, and ~/.profile.
=> Create one of them and run this script again
=> Create it (touch ) and run this script again
OR
=> Append the following lines to the correct file yourself:
export NVM_DIR="/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" # This loads nvm
=> Close and reopen your terminal to start using nvm or run the following to use it now:
export NVM_DIR="/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" # This loads nvm
/var/lib/cloud/instance/scripts/part-001: line 4: /root/.nvm/nvm.sh: No such file or directory
/var/lib/cloud/instance/scripts/part-001: line 5: nvm: command not found
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Existing lock /var/run/yum.pid: another copy is running as pid 11772.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 52 M RSS (268 MB VSZ)
Started: Sun Feb 10 15:50:08 2019 - 00:02 ago
State : Running, pid: 11772
No packages marked for update
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Resolving Dependencies
--> Running transaction check
---> Package git.x86_64 0:2.17.2-2.amzn2 will be installed
--> Processing Dependency: perl-Git = 2.17.2-2.amzn2 for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: git-core-doc = 2.17.2-2.amzn2 for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: git-core = 2.17.2-2.amzn2 for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: emacs-filesystem >= 25.3 for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: perl(Term::ReadKey) for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: perl(Git::I18N) for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: perl(Git) for package: git-2.17.2-2.amzn2.x86_64
--> Processing Dependency: libsecret-1.so.0()(64bit) for package: git-2.17.2-2.amzn2.x86_64
--> Running transaction check
---> Package emacs-filesystem.noarch 1:25.3-3.amzn2.0.1 will be installed
---> Package git-core.x86_64 0:2.17.2-2.amzn2 will be installed
---> Package git-core-doc.noarch 0:2.17.2-2.amzn2 will be installed
---> Package libsecret.x86_64 0:0.18.5-2.amzn2.0.2 will be installed
---> Package perl-Git.noarch 0:2.17.2-2.amzn2 will be installed
--> Processing Dependency: perl(Error) for package: perl-Git-2.17.2-2.amzn2.noarch
---> Package perl-TermReadKey.x86_64 0:2.30-20.amzn2.0.2 will be installed
--> Running transaction check
---> Package perl-Error.noarch 1:0.17020-2.amzn2 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
git x86_64 2.17.2-2.amzn2 amzn2-core 217 k
Installing for dependencies:
emacs-filesystem noarch 1:25.3-3.amzn2.0.1 amzn2-core 64 k
git-core x86_64 2.17.2-2.amzn2 amzn2-core 4.0 M
git-core-doc noarch 2.17.2-2.amzn2 amzn2-core 2.3 M
libsecret x86_64 0.18.5-2.amzn2.0.2 amzn2-core 153 k
perl-Error noarch 1:0.17020-2.amzn2 amzn2-core 32 k
perl-Git noarch 2.17.2-2.amzn2 amzn2-core 70 k
perl-TermReadKey x86_64 2.30-20.amzn2.0.2 amzn2-core 31 k
Transaction Summary
================================================================================
Install 1 Package (+7 Dependent packages)
Total download size: 6.8 M
Installed size: 36 M
Downloading packages:
--------------------------------------------------------------------------------
Total 18 MB/s | 6.8 MB 00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : git-core-2.17.2-2.amzn2.x86_64 1/8
Installing : git-core-doc-2.17.2-2.amzn2.noarch 2/8
Installing : libsecret-0.18.5-2.amzn2.0.2.x86_64 3/8
Installing : 1:perl-Error-0.17020-2.amzn2.noarch 4/8
Installing : perl-TermReadKey-2.30-20.amzn2.0.2.x86_64 5/8
Installing : 1:emacs-filesystem-25.3-3.amzn2.0.1.noarch 6/8
Installing : perl-Git-2.17.2-2.amzn2.noarch 7/8
Installing : git-2.17.2-2.amzn2.x86_64 8/8
Verifying : 1:emacs-filesystem-25.3-3.amzn2.0.1.noarch 1/8
Verifying : perl-TermReadKey-2.30-20.amzn2.0.2.x86_64 2/8
Verifying : 1:perl-Error-0.17020-2.amzn2.noarch 3/8
Verifying : libsecret-0.18.5-2.amzn2.0.2.x86_64 4/8
Verifying : git-core-2.17.2-2.amzn2.x86_64 5/8
Verifying : git-2.17.2-2.amzn2.x86_64 6/8
Verifying : perl-Git-2.17.2-2.amzn2.noarch 7/8
Verifying : git-core-doc-2.17.2-2.amzn2.noarch 8/8
Installed:
git.x86_64 0:2.17.2-2.amzn2
Dependency Installed:
emacs-filesystem.noarch 1:25.3-3.amzn2.0.1
git-core.x86_64 0:2.17.2-2.amzn2
git-core-doc.noarch 0:2.17.2-2.amzn2
libsecret.x86_64 0:0.18.5-2.amzn2.0.2
perl-Error.noarch 1:0.17020-2.amzn2
perl-Git.noarch 0:2.17.2-2.amzn2
perl-TermReadKey.x86_64 0:2.30-20.amzn2.0.2
Complete!
Cloning into 'zero2architect'...
/var/lib/cloud/instance/scripts/part-001: line 16: npm: command not found
/var/lib/cloud/instance/scripts/part-001: line 17: node: command not found
Feb 10 15:50:16 cloud-init[3314]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Feb 10 15:50:16 cloud-init[3314]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Feb 10 15:50:16 cloud-init[3314]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 18.2-72.amzn2.0.6 finished at Sun, 10 Feb 2019 15:50:16 +0000. Datasource DataSourceEc2. Up 45.55 seconds
Have you checked your SG and NACL configurations? You need to allow port 80 (HTTP) and port 443 (HTTPS) in outbound SG rules with destination 0.0.0.0/0. You also need to allow the same in the outbound NACL rules, plus ephemeral ports on the inbound rules. Make sure that your instance is sitting in a public subnet (with a route to the IGW) and that it has a public/Elastic IPv4 attached to it.
If any one of those conditions is not met, your user data script will fail, since it needs a connection to the Internet (in your case).
If your instance is sitting in a private subnet, make sure that you are running a NAT Gateway or NAT instance and that your route tables are properly configured, as well as SGs and NACLs. Also make sure that source/destination check is disabled on the NAT instance if you are using one.
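For example (a sketch with hypothetical resource IDs; substitute your own), the relevant pieces can be inspected from the AWS CLI:
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0                                     # outbound SG rules
aws ec2 describe-network-acls --filters Name=association.subnet-id,Values=subnet-0123456789abcdef0    # NACL rules
aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=subnet-0123456789abcdef0    # IGW/NAT route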
UPDATE
Your log clearly says that you don't have npm and node installed, so you need to pass installation instructions to the script as well.
npm install -y
is not the right way to install npm. You can follow these steps to install both node and npm.
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.32.0/install.sh | bash
. ~/.nvm/nvm.sh
nvm install 4.4.5
This will install Node version 4.4.5. If you want some other version of Node to be installed, change the number to whichever supported version you need.
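For completeness, here is a minimal user-data sketch along those lines (assumptions: the Amazon Linux 2 AMI, and that nvm landed in /.nvm in the log above because HOME is not set when the script runs as root):
#!/bin/bash
# Sketch only: set HOME/NVM_DIR explicitly, since user data runs as root without a login shell
export HOME=/root
export NVM_DIR="$HOME/.nvm"
yum update -y
yum install -y git
# Install nvm, then Node 4.4.5 (which also provides npm)
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.32.0/install.sh | bash
. "$NVM_DIR/nvm.sh"
nvm install 4.4.5
git clone https://github.com/myname/one_user.git
cd one_user
dnsaddress=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
export dns_name=${dnsaddress}
npm install
node server.js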

CNTK on Azure Data Science VM

I have an N-Series Azure VM (the Data Science VM) with a Tesla K80 GPU. According to the NVIDIA scanner my GPU driver is up to date.
When I run my CNTK BrainScript job it says "No GPUs found" and runs in CPU mode. What can I do to troubleshoot?
requestnodes [MPIWrapper]: using 1 out of 1 MPI nodes on a single host (1 requested); we (0) are in (participating)
-------------------------------------------------------------------
Build info:
Built time: Dec 22 2016 01:43:24
Last modified date: Thu Dec 22 01:35:04 2016
Build type: Release
Build target: GPU
With 1bit-SGD: yes
With ASGD: yes
Math lib: mkl
CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
CUB_PATH: c:\src\cub-1.4.1
CUDNN_PATH: C:\local\cudnn-8.0-windows10-x64-v5.1
Build Branch: HEAD
Build SHA1: 8e8b5ff92eff4647be5d41a5a515956907567126
Built by svcphil on DPHAIM-24
Build Path: C:\jenkins\workspace\CNTK-Build-Windows\Source\CNTK\
-------------------------------------------------------------------
No GPUs found
Edit: here is the output from nvidia-smi.exe:
C:\Program Files\NVIDIA Corporation\NVSMI>.\nvidia-smi.exe
Fri Jan 13 19:00:43 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 369.30 Driver Version: 369.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 TCC | 0BD1:00:00.0 Off | Off |
| N/A 43C P8 27W / 149W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 TCC | 5871:00:00.0 Off | Off |
| N/A 35C P8 34W / 149W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The Windows Data Science VM by default does not come with the GPU drivers, CUDA, etc. We do have an extension called "Deep Learning toolkit for DSVM" that adds drivers, CUDA, and GPU editions of deep learning software like CNTK, TensorFlow, and MXNet.
More info: http://aka.ms/dsvm/deeplearning
We also recently released an Ubuntu version of the DSVM with built-in CUDA, GPU drivers, and several more deep learning tools; it can be deployed on either GPU or CPU-only VMs on Azure.
Would it be possible for you to run the Python notebooks and see if you can run them with the device set to gpu(id)? Or, from the activated CNTK Python environment, you could try setting the device explicitly:
import cntk as C
from cntk.device import set_default_device, gpu
set_default_device(gpu(0))
This might give you some clues as to whether it is a BrainScript-specific issue.
Well, the Python script and BrainScript work now, after installing CUDA (I installed it to run nvidia-smi). I should not have assumed that the Azure Data Science image (which only works with an N-Series VM) has the necessary NVIDIA libraries pre-installed. :-)
