Sagemaker Studio Lab GPU not working with pytorch

I can't get the GPU to work even though I chose the GPU option (4 hours) at the start. Is there something I need to do to activate it?
Edit:
I installed PyTorch with CUDA:
%conda install pytorch torchvision torchaudio cudatoolkit=11.4 -c pytorch

Could you try building a custom environment via the YAML file option, using something like the file below? It is just an example that works for me; add or remove packages to suit your use case.
name: torch
channels:
- pytorch
dependencies:
- python=3.9
- pip
- pip:
  - ipywidgets
- conda
- conda:
  - ipykernel
  - pytorch
  - torchvision
  - torchaudio
  - torchserve
  - cudatoolkit=11.3
  - sklearn
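
Once the environment is built and selected as the notebook kernel, a quick check along these lines can confirm whether the installed PyTorch build actually sees the GPU. This is only a sketch: cuda_report is a made-up helper name, and it assumes nothing beyond the standard torch.cuda.is_available(), torch.version.cuda and torch.cuda.get_device_name() calls.

```python
def cuda_report():
    """Summarize whether the installed PyTorch build can see a GPU."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # torch is missing or its native libraries failed to load
        return {"torch_installed": False, "error": str(exc)}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build was installed
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

If built_with_cuda is None, the environment resolved to a CPU-only build, and no runtime setting in the notebook will enable the GPU.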

Related

Installing cudatoolkit works with conda install but not with conda create -f

I have a PyTorch environment file:
name: torch
channels:
- defaults
- conda-forge
dependencies:
- python=3.7
- pytorch::pytorch
- pytorch::torchvision
- pytorch::torchaudio
- pytorch::cudatoolkit
- numpy
- scipy
- scikit-learn
- matplotlib
- pillow
- tqdm
- joblib
- visdom
- jsonpatch
- pip
- pip:
  - torchsummary
  - opencv-python==4.1.1.26
Trying to create a conda environment from this file with conda env create -f fails:
(base) prompt#PC:~$ conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- pytorch::cudatoolkit
The environment is created without issues if I remove cudatoolkit from the list of dependencies.
However, conda install cudatoolkit -c pytorch finds and installs the package without issues. The same happens if I replace cudatoolkit with cudatoolkit=11.3 (the current most recent version listed on the PyTorch website) in both cases.

You get that error because conda cannot find a package called pytorch::cudatoolkit.
Your YAML environment file has to look like this:
name: torch
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- python=3.7
- pytorch
- torchvision
- torchaudio
- cudatoolkit=11.3
- numpy
- scipy
- scikit-learn
- matplotlib
- pillow
- tqdm
- joblib
- visdom
- jsonpatch
- pip
- pip:
  - torchsummary
  - opencv-python==4.1.1.26

I managed to resolve the issue by installing cudatoolkit from the nvidia channel, rather than pytorch. I'm still not sure why cudatoolkit is available from pytorch with one method but not the other, but this solves my issue (although the nvidia version seems to be larger, so it's probably a superset package of pytorch's cudatoolkit). My YAML file now looks like this:
name: ritnet
channels:
- defaults
- conda-forge
dependencies:
- python=3.7
- pytorch::pytorch
- pytorch::torchvision
- pytorch::torchaudio
- nvidia::cudatoolkit=11.3
- numpy
- scipy
- scikit-learn
- matplotlib
- pillow
- tqdm
- joblib
- visdom
- jsonpatch
- pip
- pip:
  - torchsummary
  - opencv-python==4.1.1.26
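
For readers unsure what the channel::package entries in these files mean: the part before the double colon pins that one dependency to a specific channel, independently of the channels list. The sketch below splits such entries so you can see at a glance which channel each package is requested from. CondaSpec and parse_conda_spec are illustrative names, not part of conda, and the parsing only covers the simple name=version form used in this thread.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    channel: Optional[str]   # e.g. "nvidia" in "nvidia::cudatoolkit=11.3"
    name: str
    version: Optional[str]


def parse_conda_spec(spec: str) -> CondaSpec:
    """Split a 'channel::name=version' string into its parts."""
    channel, sep, rest = spec.strip().partition("::")
    if not sep:
        # No "channel::" prefix, so the whole string is name[=version]
        channel, rest = None, spec.strip()
    name, _, version = rest.partition("=")
    return CondaSpec(channel, name, version.lstrip("=") or None)


deps = ["python=3.7", "pytorch::pytorch", "nvidia::cudatoolkit=11.3", "numpy"]
for dep in deps:
    print(parse_conda_spec(dep))
```

Any entry that reports a channel can then be checked by hand, for example by searching that channel for the package before creating the environment.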

AssertionError: Torch not compiled with CUDA enabled (despite several reinstallations)

Whenever I try to move a variable to CUDA in PyTorch (e.g. torch.zeros(1).cuda()), I get the error message "AssertionError: Torch not compiled with CUDA enabled". In addition, torch.cuda.is_available() returns False.
I have read several answers addressing this error, but several attempts to reinstall CUDA and PyTorch didn't change anything. Here are some of the commands I used:
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
conda install pytorch torchvision cudatoolkit=11 -c pytorch-nightly
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
Yet the same error remains. What could be the issue?
Some settings:
I'm using Ubuntu 20.04, GPU is RTX 2080, nvidia-smi works fine (NVIDIA-SMI 460.91.03, Driver Version: 460.91.03, (max possible) CUDA Version: 11.2)

Try installing with pip:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
See the thread "Pytorch for cuda 11.2" for a detailed explanation.
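
The reasoning behind picking the +cu111 wheel is that the CUDA version shown by nvidia-smi (11.2 in the question) is the highest runtime the driver supports, so a wheel built for CUDA 11.1 fits under it. A rough sketch of that comparison follows; wheel_cuda_version and driver_supports are hypothetical helper names, and the rule is deliberately conservative (CUDA 11.x minor-version compatibility may allow more than this check admits).

```python
def wheel_cuda_version(tag: str) -> tuple:
    """Turn a wheel suffix such as 'cu111' into (11, 1)."""
    if len(tag) < 4 or not tag.startswith("cu") or not tag[2:].isdigit():
        raise ValueError(f"unexpected CUDA tag: {tag!r}")
    digits = tag[2:]
    # The last digit is the minor version, everything before it the major
    return int(digits[:-1]), int(digits[-1])


def driver_supports(tag: str, driver_cuda: str) -> bool:
    """Conservative check: wheel CUDA version <= driver's maximum CUDA version."""
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    return wheel_cuda_version(tag) <= (major, minor)


# 11.2 is the value reported by nvidia-smi in the question
for tag in ("cu102", "cu111", "cu113"):
    print(tag, driver_supports(tag, "11.2"))
```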

Pytorch Import Error: <pathname>: object file has no loadable segments

I've been trying to install the Pytorch module for my Ubuntu 16.04 LTS through conda. I used conda install pytorch torchvision cpuonly -c pytorch to install it (non CUDA version). However when I type import torch on the Python shell, this is what I see -
ImportError: /home/student/anaconda2/lib/python2.7/site-packages/torch/_C.so: object file has no loadable segments
I have verified that Pytorch was installed using conda list

I had the same issue on Ubuntu 18.04 in a conda env with Python 3.8. I think the cause was an incomplete torch installation, so I installed from a pip wheel instead of using conda install. You can follow the steps below (assuming you have CUDA 11 installed):
Create the conda env:
conda create --name=myenv python=3.8
conda activate myenv
Install torch from the wheel:
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
Please note I had to install torchvision==0.8.1+cu110 as reported here
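
Before reinstalling, it can help to confirm which interpreter is running and which copy of torch it would pick up, since the traceback in the question points at an anaconda2 / python2.7 site-packages directory. The sketch below uses only the standard library; locate_module is an illustrative helper name, and it looks the module up without importing it, so it works even when import torch itself fails.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("torch found at:", locate_module("torch"))
```

If the reported path belongs to a different environment than the one you installed into, the install itself may be fine and the shell is simply starting another Python.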

trouble importing Pytorch in Jupyter notebook

I am new to deep learning and I am trying to import PyTorch in a Jupyter Notebook.
I installed PyTorch with the following commands in Anaconda Prompt:
conda create -n pytorch_p37 python=3.7
conda activate pytorch_p37
conda install pytorch torchvision -c pytorch
conda install jupyter
conda list
These all ran without errors, but importing PyTorch fails:
import torch
with the error below:
OSError: [WinError 126] The specified module could not be found

The problem was that I was installing a CUDA build.
I installed the CPU-only version instead and it worked fine (CUDA: None).
conda install pytorch torchvision cpuonly -c pytorch

!pip install torch
This worked for me in an Anaconda Jupyter notebook.

I lost an hour on this and found that activating the conda environment in stacked mode avoids the error.
In your example:
conda activate pytorch_p37
and from jupyter
import torch # error
from terminal:
conda activate --stack pytorch_p37
and from jupyter:
import torch # success
I could not figure out why :/
https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment
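
To compare the two activation modes, a cell like the following can be run in the notebook to see which interpreter the kernel uses and which conda environment it thinks is active. It is only a diagnostic sketch: kernel_env_info and kernel_matches_active_env are made-up names, and it relies on the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda sets on activation. It does not by itself explain why stacking helps.

```python
import os
import sys


def kernel_env_info():
    """Collect the interpreter and conda details visible to this kernel."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "conda_default_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def kernel_matches_active_env(info):
    """True when the running interpreter lives in the activated conda env."""
    conda_prefix = info["conda_prefix"]
    if conda_prefix is None:
        return False
    return os.path.realpath(info["prefix"]) == os.path.realpath(conda_prefix)


info = kernel_env_info()
print(info)
print("kernel runs inside the active env:", kernel_matches_active_env(info))
```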

pip search finds tensorflow, but pip install does not

I am trying to build a Django app that would use Keras models to make recommendations. Right now I'm trying to use one custom container that would hold both Django and Keras. Here's the Dockerfile I've written.
# myproject/docker/app/Dockerfile
# I've tried 3.5, 3.6 and 3.7
FROM python:3.7-alpine
RUN apk add --no-cache postgresql-libs && \
apk add --no-cache --virtual .build-deps \
gfortran \
build-base \
freetype-dev \
libpng-dev \
openblas-dev \
postgresql-dev \
python3-dev \
wget
WORKDIR /app
COPY ./misc/requirements.txt /app/
RUN pip search tensorflow
RUN pip install tensorflow
RUN pip install -r /app/requirements.txt
COPY . /app
EXPOSE 8000
ENTRYPOINT ["exec /start.sh"]
Problem is, when I try to build app image, pip can't install tensorflow, even though pip search tensorflow lists tensorflow (1.12) in results.
$ docker-compose -f "docker/docker-compose.yml" --project-directory /path/to/myproject build
psql uses an image, skipping
redis uses an image, skipping
Building app
Step 1/12 : FROM python:3.7-alpine
3.7-alpine: Pulling from library/python
cd784148e348: Already exists
a5ca736b15eb: Already exists
f320f547ff02: Pull complete
2edd8ff8cb8f: Pull complete
9381128744b2: Pull complete
Digest: sha256:f708ad35a86f079e860ecdd05e1da7844fd877b58238e7a9a588b2ca3b1534d8
Status: Downloaded newer image for python:3.7-alpine
---> 1a8edcb29ce4
Step 2/12 : ENV PYTHONBUFFERED 1
---> Running in 5178b24df888
Removing intermediate container 5178b24df888
---> 0f928fbf30f1
Step 3/12 : RUN apk add --no-cache postgresql-libs && apk add --no-cache --virtual .build-deps gfortran build-base freetype-dev libpng-dev openblas-dev postgresql-dev python3-dev wget
---> Running in 2a8f4653e3f9
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/5) Installing db (5.3.28-r0)
(2/5) Installing libsasl (2.1.26-r14)
(3/5) Installing libldap (2.4.46-r0)
(4/5) Installing libpq (10.5-r0)
(5/5) Installing postgresql-libs (10.5-r0)
OK: 19 MiB in 39 packages
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/36) Installing binutils (2.30-r5)
(2/36) Installing gmp (6.1.2-r1)
(3/36) Installing isl (0.18-r0)
(4/36) Installing libgomp (6.4.0-r9)
(5/36) Installing libatomic (6.4.0-r9)
(6/36) Installing pkgconf (1.5.3-r0)
(7/36) Installing libgcc (6.4.0-r9)
(8/36) Installing mpfr3 (3.1.5-r1)
(9/36) Installing mpc1 (1.0.3-r1)
(10/36) Installing libstdc++ (6.4.0-r9)
(11/36) Installing gcc (6.4.0-r9)
(12/36) Installing libquadmath (6.4.0-r9)
(13/36) Installing libgfortran (6.4.0-r9)
(14/36) Installing gfortran (6.4.0-r9)
(15/36) Installing libmagic (5.32-r0)
(16/36) Installing file (5.32-r0)
(17/36) Installing musl-dev (1.1.19-r10)
(18/36) Installing libc-dev (0.7.1-r0)
(19/36) Installing g++ (6.4.0-r9)
(20/36) Installing make (4.2.1-r2)
(21/36) Installing fortify-headers (0.9-r0)
(22/36) Installing build-base (0.5-r1)
(23/36) Installing libpng (1.6.34-r1)
(24/36) Installing freetype (2.9.1-r1)
(25/36) Installing zlib-dev (1.2.11-r1)
(26/36) Installing libpng-dev (1.6.34-r1)
(27/36) Installing freetype-dev (2.9.1-r1)
(28/36) Installing openblas-ilp64 (0.3.0-r0)
(29/36) Installing openblas (0.3.0-r0)
(30/36) Installing openblas-dev (0.3.0-r0)
(31/36) Installing libressl-dev (2.7.4-r0)
(32/36) Installing postgresql-dev (10.5-r0)
(33/36) Installing python3 (3.6.6-r0)
(34/36) Installing python3-dev (3.6.6-r0)
(35/36) Installing wget (1.19.5-r0)
(36/36) Installing .build-deps (0)
Executing busybox-1.28.4-r2.trigger
OK: 488 MiB in 75 packages
Removing intermediate container 2a8f4653e3f9
---> 0a6733c0891e
Step 4/12 : WORKDIR /app
---> Running in e99a4dadbd78
Removing intermediate container e99a4dadbd78
---> 11d698c20e86
Step 5/12 : COPY ./misc/requirements.txt /app/
---> aa6b85587b84
Step 6/12 : RUN pip search tensorflow
---> Running in a4434a87e740
tensorflow (1.12.0) - TensorFlow is an open source machine learning framework for everyone.
tensorflow-qndex (0.0.22) - tensorflow-qnd x tensorflow-extenteten
tensorflow-estimator (1.10.12) - TensorFlow Estimator.
mesh-tensorflow (0.0.5) - Mesh TensorFlow
tensorflow-io (0.1.0) - TensorFlow IO
tensorflow-plot (0.2.0) - TensorFlow Plot
tensorflow-lattice (0.9.8) - TensorFlow Lattice provides lattice models in TensorFlow
tensorflow-datasets (0.0.2) - tensorflow/datasets is a library of datasets ready to use with TensorFlow.
tensorflow-extenteten (0.0.22) - TensorFlow extention library
cxflow-tensorflow (0.5.0) - TensorFlow extension for cxflow.
emloop-tensorflow (0.1.0) - TensorFlow extension for emloop.
tensorflow-k8s (0.0.2) - Tensorflow serving extension
tensorflow-transform (0.11.0) - A library for data preprocessing with TensorFlow
dask-tensorflow (0.0.2) - Interactions between Dask and Tensorflow
tensorflow-tracer (1.1.0) - Runtime Tracing Library for TensorFlow
sagemaker-tensorflow (1.12.0.1.0.0.post1) - Amazon Sagemaker specific TensorFlow extensions.
tensorflow-qnd (0.1.11) - Quick and Dirty TensorFlow command framework
tensorflow-probability (0.5.0) - Probabilistic modeling and statistical inference in TensorFlow
tensorflow-utils (0.1.0) - Classes and methods to make using TensorFlow easier
tensorflow-model (0.1.1) - Command-line tool to inspect TensorFlow models
tensorflow-lattice-gpu (0.9.8) - TensorFlow Lattice provides lattice models in TensorFlow
tensorflow-template (0.2) - A tensorflow template for quick starting a deep learning project.
tensorflow-rocm (1.12.0) - TensorFlow is an open source machine learning framework for everyone.
intel-tensorflow (1.12.0) - TensorFlow is an open source machine learning framework for everyone.
tensorflow-font2char2word2sent2doc (0.0.12) - TensorFlow implementation of Hierarchical Attention Networks for Document Classification
tensorflow-gpu (1.12.0) - TensorFlow is an open source machine learning framework for everyone.
tensorflow-aarch64 (1.2) - Tensorflow r1.2 for aarch64[arm64,pine64] CPU only.
tensorflow-fedora28 (1.9.0rc0) - TensorFlow is an open source machine learning framework for everyone.
tensorflow-model-analysis (0.11.0) - A library for analyzing TensorFlow models
tensorflow-transform-canary (0.9.0) - A library for data preprocessing with TensorFlow
rav-tensorflow-transform (0.7.0.910) - A library for data preprocessing with TensorFlow
tensorflow-serving-api (1.12.0) - TensorFlow Serving Python API.
tensorflow-serving-client (0.0.10) - Python client for tensorflow serving
tensorflow-hub (0.2.0) - TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models.
tensorflow-estimator-2.0-preview (1.13.0.dev2019010100) - TensorFlow Estimator.
ngraph-tensorflow-bridge (0.8.0) - Intel nGraph compiler and runtime for TensorFlow
tensorflow-probability-gpu (0.4.0) - Probabilistic modeling and statistical inference in TensorFlow
simple-tensorflow-serving (0.6.6) - The simpler and easy-to-use serving service for TensorFlow models
tensorflow-auto-detect (1.11.0) - Automatically install CPU or GPU tensorflow determined by looking for a CUDA installation.
tensorflow-serving-api-python3 (1.8.0) - *UNOFFICIAL* TensorFlow Serving API libraries for Python3
tensorflow-exercise-hx (1.0.1) - tensorflow练习：鸢尾花种类预测，加州房价预测
tensorflow-metadata (0.9.0) - Library and standards for schema and statistics.
tensorflow-tensorboard (1.5.1) - TensorBoard lets you watch Tensors Flow
resnet-tensorflow (0.0.1) - Deep Residual Neural Network
mlops-tensorflow (0.1.0) -
tensorflow-gpu-macosx (1.8.1) - Unoffcial NVIDIA CUDA GPU support version of Google Tensorflow for MAC OSX 10.13. For more info, please check out my github page. I highly recommend you directly download and install it from my github's release. If you insist on compiling it, you'd do it on a shell to debug.
syntaxnet-with-tensorflow (0.2) - SyntaxNet: Neural Models of Syntax
tensorflow-data-validation (0.11.0) - A library for exploring and validating machine learning data.
ogres (0.0.2) - Thin tensorflow wrapper. Requires tensorflow
tfmesos (0.0.10) - Tensorflow on Mesos
tf-estimator-nightly (1.12.0.dev20181217) - TensorFlow Estimator.
TFBOYS (0.0.1) - TensorFlow BOYS
TFTree (0.1.0) - Tree to tensorflow
tfdebugger (0.1.1) - TensorFlow Debugger
tfextras (0.0.8) - Tensorflow extras
tfu (0.0.1.dev0) - tensorflow utils
tnt (0.12.0.7) - tnt is not tensorflow
easytf (13.9) - Tensorflow CS
tftf (0.0.29) - TensorFlow TransFormer
tf-datasets (0.0.1) - tensorflow/datasets
tfds-nightly (0.0.2.dev201901020014) - tensorflow/datasets is a library of datasets ready to use with TensorFlow.
tf-common (1.0.0) - A common liberary of tensorflow
ParticleFlow (0.0.1) - Particle simulations with tensorflow
tf_decompose (0.1) - Tensor decomposition with TensorFlow
tensorbase (0.3) - Minimalistic TensorFlow Framework
miniflow (0.2.9) - Minimal implementation of TensorFlow
bob.learn.tensorflow (1.0.3) - Bob bindings for tensorflow
quantile-transformer-tf (1.2) - An implementation of QuantileTransformer in tensorflow
tf-env (0.1.0) - RL environments for TensorFlow.
tfseqestimator (2.2.0) - Sequence estimators for Tensorflow
wavenet (0.1.2) - An implementation of WaveNet for TensorFlow.
tf_kaldi_io (0.3.0) - kaldi-io for Tensorflow
ptfutils (0.0.29) - Useful modules for tensorflow
iceflow (0.0.1a2) - tensorflow meta-framework
tf2onnx (0.3.2) - Tensorflow to ONNX converter
tensorforce (0.4.3) - Reinforcement learning for TensorFlow
simnets (0.0.1) - SimNets implementation in tensorflow
saliency (0.0.2) - Saliency methods for TensorFlow
tensor-lib (1.8.19) - Simplified tensorflow library
tensorsets (0.1.0) - Standard datasets for TensorFlow.
tfstage (0.1.7) - TensorFlow project scaffolding
top-hat (0.0.2) - Recommendation system in TensorFlow
TensorMol (0.1) - TensorFlow+Molecules = TensorMol
tfshop (0.0.1) - common tensorflow paradigms
kfac (0.1.0) - K-FAC for TensorFlow
transferflow (0.1.8) - Transfer learning for Tensorflow
train (0.0.3) - Training utilities for TensorFlow.
vibranium (0.1.0) - Opinionated Tensorflow projects
tf-data (0.0.4) - Easy datasets for tensorflow
tensorfunk (0.0.0) - tensorflow model converter to create tensorflow-independent prediction functions.
tf1 (1.1.0) - F1-score metric for TensorFlow
tensorboard-easy (0.2.3) - A tensorflow-independent tensorboard logger
gpflow (1.3.0) - Gaussian process methods in tensorflow
layer (0.1.14) - tensorflow custom comfort wrapper
tflab (0.1.3) - A laboratory for experimenting with Tensorflow abstraction
tensorpack (0.9.0.1) - Neural Network Toolbox on TensorFlow
tensorflowservingclient (0.5.1.post2) - Prebuilt tensorflow serving client
EasyFlow (0.1.dev3) - Modular Distributed TensorFlow Framework
tfgraph (0.2) - Python's Tensorflow Graph Library
serving-utils (0.6.0) - Some utilities for tensorflow serving
Removing intermediate container a4434a87e740
---> a14248285cb2
Step 7/12 : RUN pip install tensorflow
---> Running in 2c14fe29c431
Collecting tensorflow
Could not find a version that satisfies the requirement tensorflow (from versions: )
No matching distribution found for tensorflow
ERROR: Service 'app' failed to build: The command '/bin/sh -c pip install tensorflow' returned a non-zero code: 1
Do I have to resort to building tensorflow from source?
EDIT
Writing this question made me realize I could use two separate containers: one for my Django app and a prebuilt tensorflow with gpu. I would still like to learn how to resolve issues like this, but any pointers to docs how to make two separate docker containers talk, would be appreciated.

It looks like tensorflow only publishes wheels (and only up to 3.6), and Alpine linux is not manylinux1-compatible due to its use of musl instead of glibc. Because of this, pip cannot find a suitable installation candidate and fails. Your best options are probably to build from source or change your base image.
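
To see the two facts this answer depends on (the Python version and the C library) from inside a container, a short script like this can be run there. It is a sketch using only the standard library; wheel_compat_hints is an illustrative name, and the assumption that an empty libc name points to musl is a heuristic, not something platform.libc_ver() guarantees.

```python
import platform
import sys


def wheel_compat_hints():
    """Report the interpreter version and C library that pip has to match."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        # An empty name is typical on musl-based images such as Alpine
        "libc": libc_name or "unknown (possibly musl)",
        "libc_version": libc_version,
        "glibc": libc_name == "glibc",
    }


print(wheel_compat_hints())
```

When glibc is False, manylinux wheels are not candidates for installation, which matches the empty "from versions:" list in the build log above.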

Depending on what you're doing, consider if you need alpine at all. If you're like me and only using apk add to install things that allow you to install certain Python packages, you might be fine if you remove all your apk add commands and replace FROM python:3.6-alpine with FROM python:3.6. Tensorflow installs without any issue from that build.
Edit: Never mind, FROM python:3.6 simply pulls the full Debian-based image anyway. If you're looking for a smaller image without some of the drawbacks of Alpine, consider using FROM python:3.7-slim-buster instead.
