Training won't start and I don't know why - PyTorch

https://www.kaggle.com/code/vladimirsydor/two-head-inference-best
I'm using this code.
No error message appears, but the run never starts.
I think the problem is the PyTorch DataLoader, but I don't know how to solve it. The progress bar stays at 0%:
predicted_df = inf_model(test_loader)
0%| | 0/282 [00:00<?, ?it/s]
My torch version is 1.10.1.
My torchvision version is 0.11.2.
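One way to test the DataLoader suspicion, sketched under assumptions: rebuild the loader with num_workers=0 so batches are produced in the main process, then time a single batch. If that returns quickly, the stall is more likely in the worker processes than in the dataset itself. The helper name time_first_batch is hypothetical and accepts any iterable; the DataLoader call in the comment assumes test_loader wraps a regular map-style dataset, which the question does not confirm.

```python
import time


def time_first_batch(loader):
    """Fetch one batch from any iterable loader and report how long it took."""
    start = time.perf_counter()
    batch = next(iter(loader))
    elapsed = time.perf_counter() - start
    return batch, elapsed


# Assumed usage with PyTorch (not run here):
#   from torch.utils.data import DataLoader
#   debug_loader = DataLoader(test_loader.dataset, batch_size=1, num_workers=0)
#   batch, seconds = time_first_batch(debug_loader)

# Stand-in loader so the sketch runs without PyTorch installed.
batch, seconds = time_first_batch([[1, 2], [3, 4]])
print(batch, round(seconds, 4))
```

With num_workers=0, an exception raised inside the dataset also surfaces directly in the main process instead of being hidden behind a stalled progress bar.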

Related

Pytorch Training; "Runtime Error:PyTorch and torchvision versions are incompatible ..."

SOLUTION at the bottom!
I want to do Object Detection with this tutorial:
https://towardsdatascience.com/building-your-own-object-detector-pytorch-vs-tensorflow-and-how-to-even-get-started-1d314691d4ae
I have compatible versions of PyTorch, torchvision and CUDA:
conda list torch gives me:
Nevertheless, I get the following RuntimeError at the bottom of the traceback:
RuntimeError: Couldn't load custom C++ ops. This can happen if your
PyTorch and torchvision versions are incompatible, or if you had
errors while compiling torchvision from source. For further
information on the compatible versions, check
https://github.com/pytorch/vision#installation for the compatibility
matrix. Please check your PyTorch version with torch.__version__ and
your torchvision version with torchvision.__version__ and verify if
they are compatible, and if not please reinstall torchvision so that
it matches your PyTorch install.
when running:
num_epochs = 10
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)#.to_fp16()
    lr_scheduler.step()
    evaluate(model, data_loader_test, device=device)
Is it really an error resulting from incompatibility of pytorch and torchvision?
Thank you very much.
SOLUTION:
I imported torchvision from the wrong directory. I found this out by printing the module's path:
import torchvision
print(torchvision.__path__)
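The same check can be made without importing the package at all, which helps when the import itself is what fails. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name module_location is hypothetical, and "json" is only a stand-in so the example runs anywhere.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# "json" is a stand-in; pass "torchvision" or "torch" in the affected environment.
print(module_location("json"))
```

If the printed path points outside the active conda environment, a second copy of the package is shadowing the intended one.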

After downgrading TensorFlow 2.0 to 1.15, results changed and are no longer reproducible

Would you help me achieve reproducible results with TensorFlow 1.15 without restarting the Python kernel? And why do the output results in TF 2.0 and TF 1.15 differ with absolutely identical parameters and dataset? Is it possible to achieve identical output?
More details:
I tried to interpret model results in TF 2.0 by:
import shap
background = df3.iloc[np.random.choice(df3.shape[0], 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
I received an error:
`get_session` is not available when using TensorFlow 2.0.
Following an SO topic, I tried to set up TF 1 compatibility in TF 2.0 by putting this at the top of my code:
import tensorflow.compat.v1 as tf
But the error appeared again.
Following advice from many users, I downgraded TF 2 to TF 1.15. That solved the problem, and the shap module now interprets the results, but:
1) To make results reproducible, I now have to change tf.random.set_seed(7) to tf.random.set_random_seed(7) and restart the Python kernel every time! In TF 2 I didn't have to restart the kernel.
2) Prediction results have changed, especially Economical efficiency (that is, TF 1.15 misclassifies more of the important samples than TF 2.0).
TF 2:
Accuracy: 94.95%, Economical efficiency = 64%
TF 1:
Accuracy: 94.85%, Economical efficiency = 56%
The code of the model is here
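A minimal sketch of seeding in a way that works under both TF versions mentioned above. The helper name seed_everything is hypothetical; NumPy and TensorFlow are seeded only when they are installed. In TF 1.x the seed is attached to the graph, so clearing the session (tf.keras.backend.clear_session()) and rebuilding the model before re-seeding may also be needed; whether that fully removes the need for a kernel restart is an assumption, not something confirmed in this thread.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and TensorFlow when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
    except ImportError:
        return
    # TF 2.x exposes tf.random.set_seed; TF 1.x exposes tf.random.set_random_seed.
    set_seed = getattr(tf.random, "set_seed", None) or tf.random.set_random_seed
    set_seed(seed)


seed_everything(7)
first = random.random()
seed_everything(7)
second = random.random()
print(first == second)
```

Calling the helper once, before the model is built and before the background sample for shap is drawn, keeps np.random.choice and the weight initialisation on the same seed.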
First, results differ not only between TF 1 and TF 2, but also between TF 2.0 and TF 2.2. This probably depends on different internal parameters in the packages.
Second, TensorFlow 2 works with DeepExplainer with the following versions:
import tensorflow
import pandas as pd
import keras
import xgboost
import numpy
import shap
print(tensorflow.__version__)
print(pd.__version__)
print(keras.__version__)
print(xgboost.__version__)
print(numpy.__version__)
print(shap.__version__)
output:
2.2.0
0.24.2
2.3.1
0.90
1.17.5
0.35.0
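The same version report can be gathered without importing each library, which is useful when one of the imports is the thing that crashes. This is a sketch assuming Python 3.8 or newer, where importlib.metadata is in the standard library; the helper name installed_versions is hypothetical, and the names passed in are distribution names as pip or conda would list them.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["tensorflow", "pandas", "keras", "xgboost", "numpy", "shap"])
for name, version in report.items():
    print(name, version)
```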
But you will face some difficulties when updating the libraries.
In Python 3.5, running TF 2.2, you will get the error 'DLL load failed: The specified module could not be found'.
It can be solved by installing a newer Visual C++ package. See this: https://github.com/tensorflow/tensorflow/issues/22794#issuecomment-573297027
Link to download the package: https://support.microsoft.com/ru-ru/help/2977003/the-latest-supported-visual-c-downloads
For Python 3.7 you will not find shap 0.35.0 as a .whl file, only as a .tar.gz, which fails with the error "Install visual c++ package". Installing that package doesn't help.
Instead, download shap 0.35.0 for Python 3.7 here: https://anaconda.org/conda-forge/shap/files. Run the Anaconda shell and type: conda install -c conda-forge C:\shap-0.35.0-py37h3bbf574_0.tar.bz2.

How to use Lazy Adam optimizer in tensorflow 2.0.0

This code doesn't work: it has a problem with tf.contrib
model.compile(optimizer=TFOptimizer(tf.contrib.opt.LazyAdamOptimizer()), loss='categorical_crossentropy')
I have tried something with tensorflow_addons.optimizers.LazyAdam() but that does not work either.
Any ideas how to run LazyAdam in TensorFlow 2.0.0?
PS: plain Adam works fine, as follows:
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='categorical_crossentropy')
import tensorflow_addons as tfa
optimizer = tfa.optimizers.LazyAdam()
tensorflow_addons provides extra functionality for TensorFlow 2.x. TensorFlow 2.x is still not very stable, so if you run into module 'tensorflow_core.keras.utils' has no attribute 'register_keras_serializable', try updating TensorFlow to the latest stable version.

python3 keras import error with both tensorflow and theano

With python3 (version 3.6.8) and keras
the simple script:
import keras
gives an error:
Using TensorFlow backend.
Ungültiger Maschinenbefehl (Speicherabzug geschrieben)
(In English: "Illegal instruction (core dumped)".)
So I tried to use theano instead:
import os
os.environ['KERAS_BACKEND'] = 'theano'
from keras import backend as K
With python3 it shows this output:
Using Theano backend.
Ungültiger Maschinenbefehl (Speicherabzug geschrieben)
How could I get further information about the problem?
Try
from tensorflow import keras
If the problem persists, try going through the documentation on how to install and how to use it.
Keras - Tensorflow
Keras Overview - Tensorflow
Keras.io
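On the question of how to get further information: "Illegal instruction" usually means the binary uses CPU instructions the processor does not support, and prebuilt TensorFlow wheels from 1.6 onward are compiled with AVX. Running python3 -X faulthandler -c "import keras" makes the interpreter print the Python traceback when the crash happens. The sketch below checks the CPU flags on Linux; the helper name cpu_flags is hypothetical, and it returns an empty set on systems without a flags line in /proc/cpuinfo.

```python
def cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by the Linux kernel, or an empty set."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    # Line looks like "flags : fpu vme ... avx avx2 ..."
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


flags = cpu_flags()
print("avx" in flags, "avx2" in flags)
```

If "avx" is missing, an older TensorFlow release or a build compiled for the local CPU would be the things to try; that this is the cause here is an assumption, since the thread never reports the CPU model.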

keras -> mlmodel: coreml object has no attribute 'convert'

I am trying to convert my Keras model into an mlmodel using coremltools. However, it says that the coremltools module has no attribute 'convert'.
AttributeError: 'module' object has no attribute 'convert'
My coremltools, keras and tensorflow (tensorflow-gpu) modules are all up to date.
I am also using Python 2.7.10.
I've tried on both Windows and Mac, and neither worked. However, caffe.convert works with a Caffe model.
Code:
coreml_model = coremltools.converters.keras.convert(MODEL_PATH)
As per the documentation, I expected the converters.keras.convert method to be available in coremltools.
Documentation: https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html
Please help, thanks in advance!
Edit:
import coremltools
# from keras.models import load_model
import keras
import sys
from keras.applications import MobileNet
from keras.utils.generic_utils import CustomObjectScope
with CustomObjectScope({'relu6': keras.applications.MobileNet.relu6, 'DepthwiseConv2D': keras.applications.mobilenet.DepthwiseConv2D}):
    model = load_model('weights.hdf5')
MODEL_PATH = "data/model_wide_cifar-10_fruits_model.h5"
def main():
    """ Takes in keras model and convert to .mlmodel"""
    print(sys.version)
    # Load in keras model.
    # model = load_model(MODEL_PATH)
    # load labels
    labels=[]
    label_handler = open("fruit-labels.txt", 'r')
    for label in label_handler:
        labels.append(label.rstrip())
    label_handler.close()
    print("[INFO] Labels: {0}".format(labels))
    # Convert to .mlmodel
    coreml_model = coremltools.converters.keras.convert(
        model=MODEL_PATH,
        input_names="image",
        output_names="image",
        class_labels=labels)
    labels = 'fruit-labels.txt'
    # Save .mlmodel
    coreml_model.utils.save_spec('fruitclassifier.mlmodel')
The solution is to use virtualenv. Follow the instructions from the coremltools README:
Installation
We recommend using virtualenv to use, install, or build coremltools. Be
sure to install virtualenv using your system pip.
pip install virtualenv
The method for installing coremltools follows the
standard python package installation steps.
To create a Python virtual environment called pythonenv follow these steps:
# Create a folder for virtualenv
mkdir virtualenvs
cd virtualenvs
# Create a Python virtual environment for your Core ML project
virtualenv coremltools
To activate your new virtual environment and install coremltools in this environment, follow these steps:
# Activate your virtual environment
source coremltools/bin/activate
# Install coremltools in the new virtual environment, pythonenv
pip install --upgrade pip
pip install -U coremltools==3.0b5
Install keras and tensorflow
pip install keras tensorflow
Now make sure it works. With the coremltools environment activated, run
>>> python
Python 3.7.4 (v3.7.4:e09359112e, Sep 5 2019, 14:54:52)
>>> import coremltools
>>> coremltools.converters.keras.convert()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: convert() missing 1 required positional argument: 'model'
coremltools documentation
Credit to this GitHub issue: https://github.com/apple/coremltools/issues/440
Communist Hacker's answer does not work for my current setup:
tensorflow 2.4.1
coremltools 4.1
Python 3.8.7
However, after reviewing the coremltools documentation, I was able to fix it by removing keras from the call path; the call now works:
import coremltools
coreml_model = coremltools.converters.convert(model,
input_names="inputname",
output_names="outputname")
Running the above command now produces this in my Jupyter notebook:
Running TensorFlow Graph Passes: 100%|██████████| 5/5 [00:00<00:00, 37.53 passes/s]
Converting Frontend ==> MIL Ops: 100%|██████████| 6/6 [00:00<00:00, 5764.05 ops/s]
Running MIL optimization passes: 100%|██████████| 17/17 [00:00<00:00, 5633.05 passes/s]
Translating MIL ==> MLModel Ops: 100%|██████████| 3/3 [00:00<00:00, 6864.65 ops/s]
Note - I am a complete noob with this, so I probably described things
incorrectly.

Resources