Torchvision 0.3.0 for training a model on AML service - azure-machine-learning-service

I'm building an image to train on AML service and trying to get torchvision==0.3.0 onboard that image. The notebook VM I'm using has torchvision 0.3.0 and pytorch 1.1.0, and that combination lets me do what I'm trying to do... but only on the notebook VM. When I submit the job to AML, I get an error:
Error occurred: module 'torchvision.models' has no attribute 'googlenet'
I've managed to capture the logs at image creation. Here is an extract that shows part of what's going on:
Created wheel for dill: filename=dill-0.3.0-cp36-none-any.whl size=77512 sha256=b39463bd613a2337f86181d449e55c84446bb76c2fad462b0ff7ed721872f817
Stored in directory: /root/.cache/pip/wheels/c9/de/a4/a91eec4eea652104d8c81b633f32ead5eb57d1b294eab24167
Successfully built horovod future json-logging-py psutil absl-py pathspec liac-arff dill
Installing collected packages: tqdm, ptvsd, gunicorn, applicationinsights, urllib3, idna, chardet, requests, asn1crypto, cryptography, pyopenssl, isodate, oauthlib, requests-oauthlib, msrest, jsonpickle, azure-common, PyJWT, python-dateutil, adal, msrestazure, azure-mgmt-authorization, azure-mgmt-containerregistry, pyasn1, ndg-httpsclient, pathspec, azure-mgmt-keyvault, websocket-client, docker, contextlib2, azure-mgmt-resource, backports.weakref, backports.tempfile, jeepney, SecretStorage, pytz, azure-mgmt-storage, ruamel.yaml, azure-graphrbac, jmespath, azureml-core, configparser, json-logging-py, werkzeug, click, MarkupSafe, Jinja2, itsdangerous, flask,liac-arff, pandas, dill, azureml-model-management-sdk, azureml-defaults, torchvision, cloudpickle, psutil, horovod, markdown, protobuf, grpcio, absl-py, tensorboard, future
Found existing installation: torchvision 0.3.0
Uninstalling torchvision-0.3.0:
Successfully uninstalled torchvision-0.3.0
Successfully installed Jinja2-2.10.1 MarkupSafe-1.1.1 PyJWT-1.7.1 SecretStorage-3.1.1 absl-py-0.7.1 adal-1.2.2 applicationinsights-0.11.9 asn1crypto-0.24.0 azure-common-1.1.23 azure-graphrbac-0.61.1 azure-mgmt-authorization-0.60.0 azure-mgmt-containerregistry-2.8.0 azure-mgmt-keyvault-2.0.0 azure-mgmt-resource-3.1.0 azure-mgmt-storage-4.0.0 azureml-core-1.0.55 azureml-defaults-1.0.55 azureml-model-management-sdk-1.0.1b6.post1 backports.tempfile-1.0 backports.weakref-1.0.post1 chardet-3.0.4 click-7.0 cloudpickle-1.2.1 configparser-3.7.4 contextlib2-0.5.5 cryptography-2.7 dill-0.3.0 docker-4.0.2 flask-1.0.3 future-0.17.1 grpcio-1.22.0 gunicorn-19.9.0 horovod-0.16.1 idna-2.8 isodate-0.6.0 itsdangerous-1.1.0 jeepney-0.4.1 jmespath-0.9.4 json-logging-py-0.2 jsonpickle-1.2 liac-arff-2.4.0 markdown-3.1.1 msrest-0.6.9 msrestazure-0.6.1 ndg-httpsclient-0.5.1 oauthlib-3.1.0 pandas-0.25.0 pathspec-0.5.9 protobuf-3.9.1 psutil-5.6.3 ptvsd-4.3.2 pyasn1-0.4.6 pyopenssl-19.0.0 python-dateutil-2.8.0 pytz-2019.2 requests-2.22.0 requests-oauthlib-1.2.0 ruamel.yaml-0.15.89 tensorboard-1.14.0 torchvision-0.2.1 tqdm-4.33.0 urllib3-1.25.3 websocket-client-0.56.0 werkzeug-0.15.5
Without going into too much detail, here's the code I use to create the estimator and then submit the job. Nothing particularly fancy.
I tried debugging the image creation process (looking into the logs), which is where I captured what's shown above. I've also tried attaching a Python debugger to the running processes, and logging into a bash shell inside the running Docker container to investigate interactively with Python. The underlying problem is that I can't use torchvision.models.googlenet, as it doesn't exist in the version actually in use.
conda_packages=['pytorch', 'scikit-learn', 'torchvision==0.3.0']
pip_packages=['tqdm', 'ptvsd']
and I create my estimator with this:
pyTorchEstimator = PyTorch(source_directory='./aml-image-models',
                           compute_target=ct,
                           entry_script='train_network.py',
                           script_params=script_params,
                           node_count=1,
                           process_count_per_node=1,
                           conda_packages=conda_packages,
                           pip_packages=pip_packages,
                           use_gpu=True,
                           framework_version='1.1')
and submit with typical code.
Given that I'm specifying 0.3.0 in the dependencies, I'd expect it to just work.
Thoughts?

torchvision 0.2.1 is pre-configured in the PyTorch estimator for torch versions 1.0/1.1.
https://learn.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py#remarks
However, you can still override torchvision after estimator initialization:
estimator.conda_dependencies.add_pip_package('torchvision==0.3.0')
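Independently of how the dependency is pinned, it can help to fail fast inside the entry script when the wrong torchvision lands in the image. A minimal sketch of such a guard; the dotted-version comparison below is stdlib-only, and the commented-out torchvision usage is the assumed call site, not code from the question:

```python
def version_at_least(version_str, minimum):
    """Compare dotted version strings numerically, so '0.2.1' < '0.3.0'
    (plain string comparison would get e.g. '0.10.0' vs '0.9.0' wrong)."""
    parse = lambda v: tuple(int(part) for part in v.split('.')[:3])
    return parse(version_str) >= parse(minimum)

# e.g. at the top of train_network.py:
# import torchvision
# if not version_at_least(torchvision.__version__, '0.3.0'):
#     raise RuntimeError('models.googlenet needs torchvision >= 0.3.0, '
#                        'found %s' % torchvision.__version__)
```

This turns the opaque "no attribute 'googlenet'" failure into an explicit version mismatch message in the run logs.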
Another option is to use the generic Estimator, if you are sure about the dependencies you need:
conda_packages=['pytorch', 'scikit-learn', 'torchvision==0.3.0']
pip_packages=['tqdm', 'ptvsd']
estimator = Estimator(source_directory='./aml-image-models',
                      compute_target=ct,
                      entry_script='train_network.py',
                      script_params=script_params,
                      conda_packages=conda_packages,
                      pip_packages=pip_packages,
                      use_gpu=True)

Related

serverless python error: No such file or directory: '/tmp/_temp-sls-py-req' -> '/tmp/sls-py-req'

I am working on a project using the serverless framework and Python. I've used the serverless-python-requirements plugin, but it still gives me an error.
The deployment is fine, but every time I trigger the function it gives me this error:
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/_temp-sls-py-req' -> '/tmp/sls-py-req'
Here's a piece of my serverless.yml file:
custom:
  pythonRequirements:
    dockerizePip: true
    zip: true
plugins:
  - serverless-offline
  - serverless-python-requirements
and here's the piece of my code that causes the error:
try:
    import unzip_requirements
except ImportError:
    pass

import json
import boto3
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from boto3.dynamodb.types import TypeDeserializer
All I know is that the error occurs when importing unzip_requirements (line 2). I followed the documentation, which requires me to do that import.
The cause of the error seems to be that it can't find something in the Lambda /tmp folder.
We ran into a similar issue with serverless-python-requirements recently. In our case it was caused by a deployment where serverless didn't recognize the requirements correctly for one function and, as a result, tried to unzip an empty .requirements.zip file. The failing line is unzip_requirements.py#L22:
# ...
zipfile.ZipFile(zip_requirements, 'r').extractall(tempdir)
# the following line fails because the extract step unzips nothing
# and doesn't create any directory that could be renamed
os.rename(tempdir, pkgdir)  # Atomic
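The failure mode is easy to reproduce and guard against with the stdlib alone. A hypothetical defensive variant of the extract-and-rename step above; safe_unzip and the empty-archive check are my own illustration, not the plugin's code:

```python
import os
import tempfile
import zipfile

def safe_unzip(zip_path, pkgdir):
    """Extract zip_path and atomically rename the temp dir into place,
    refusing up front if the archive is empty (the case that makes the
    plugin's os.rename() fail, since nothing gets created to rename)."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        if not zf.namelist():
            raise RuntimeError('%s is empty; requirements were not packaged'
                               % zip_path)
        tempdir = tempfile.mkdtemp()
        zf.extractall(tempdir)
    os.rename(tempdir, pkgdir)  # atomic swap, mirroring unzip_requirements
```

An empty archive then produces a clear RuntimeError instead of the confusing FileNotFoundError on the rename.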
Therefore I'd recommend trying out the following:
Use pandas as a layer instead, e.g. by using a public pandas layer or creating your own by following the plugin's instructions for layers:
# in serverless.yml
custom:
  pythonRequirements:
    layer: true
# ...
functions:
  hello:
    handler: handler.hello
    layers:
      - Ref: PythonRequirementsLambdaLayer
Or, if the requirements are small enough, try the unzipped version. This solved it for us with serverless version 2.69.0.
I've also heard that downgrading to serverless 1.83 addressed a similar issue in the past, but I couldn't verify that so far. Good luck!

Issues Connecting to Impala Kerberos Hadoop - Windows/Python 3.6

I have searched widely, but nothing is working for me. The code goes something like this:
from impala.dbapi import connect

conn = connect(host='myhost', port=21050, auth_mechanism='GSSAPI',
               kerberos_service_name='impala')
cursor = conn.cursor()
TTransportException: TTransportException(type=1, message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'")
I have tried many different versions of the following; here are the possibly relevant libraries currently installed:
Python 3.6.9
impyla 0.14.0
pure-sasl 0.6.2
pysasl 0.5.0
sasl 0.2.1
thrift 0.13.0
thrift-sasl 0.3.0
thriftpy 0.3.9
thriftpy2 0.4.8
Any help would be greatly appreciated.
I have tried multiple Python libraries and failed when trying to authenticate from a Windows machine. There is no easy way: the Kerberos libraries mainly work on Linux, and the workarounds for Windows do not work. So what can the solution be?
Well... be a Roman while in Rome: try the Windows-native libraries from Python.
import clr
clr.AddReference('System.Net.Http')  # load the .NET assembly before importing from it
from System.Net.Http import HttpClient, HttpClientHandler

myClienthandler = HttpClientHandler()
myClienthandler.UseDefaultCredentials = True
myClient = HttpClient(myClienthandler)
x = myClient.GetStringAsync("putyourURLwithinthequoteshere")
myresult = x.Result
print(myresult)
Note that this Python script has to run as the user who has access to the URL you are trying to reach. By setting the UseDefaultCredentials property to True, you pass along the Kerberos tickets of the logged-in user.

Pyppeteer crashes after 20 seconds with pyppeteer.errors.NetworkError

While using pyppeteer to control Chromium, I kept receiving an error after approximately 20 seconds of work:
pyppeteer.errors.NetworkError: Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.
As described here, the issue is probably caused by the implementation of the Python websockets>=7 package and its usage within pyppeteer.
There are three ways to prevent the disconnection from Chromium:
- Patch the code as described here (preferred):
Run this snippet before running any other pyppeteer commands:
def patch_pyppeteer():
    import pyppeteer.connection
    original_method = pyppeteer.connection.websockets.client.connect

    def new_method(*args, **kwargs):
        kwargs['ping_interval'] = None
        kwargs['ping_timeout'] = None
        return original_method(*args, **kwargs)

    pyppeteer.connection.websockets.client.connect = new_method

patch_pyppeteer()
- Downgrade the troublesome library:
Downgrade the websockets package to 6.0, e.g. via
pip3 install websockets==6.0 --force-reinstall (in your virtual environment)
- Change the code base,
as described in this pull request, which will hopefully be merged soon.
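The patch in option 1 is an instance of a general monkey-patching pattern: swap a module-level callable for a wrapper that forces certain keyword arguments. A stdlib-only illustration of the same idea; inject_kwargs is a hypothetical helper, not part of pyppeteer:

```python
import functools

def inject_kwargs(func, **forced):
    """Wrap func so the given keyword arguments are always applied,
    just as patch_pyppeteer() forces ping_interval/ping_timeout to None."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)
        return func(*args, **kwargs)
    return wrapper

# the pyppeteer patch above is then equivalent to:
# import pyppeteer.connection
# pyppeteer.connection.websockets.client.connect = inject_kwargs(
#     pyppeteer.connection.websockets.client.connect,
#     ping_interval=None, ping_timeout=None)
```

functools.wraps keeps the wrapper's name and docstring identical to the original, which matters when other code introspects the patched callable.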

How to get WKHTMLTOPDF working on Heroku?

I created a website which generates PDFs using pdfkit, and I know how to install it and set up the environment variable path on Windows. I managed to deploy my first website on Heroku, but now I'm getting the error "No wkhtmltopdf executable found: "b''"" when trying to generate the PDF.
I have no idea how to install and set up wkhtmltopdf on Heroku, because this is my first time dealing with Linux.
I really tried everything before asking this, but even following this didn't work for me:
Python 3 flask install wkhtmltopdf on heroku
If possible, please guide me step by step through installing and setting this up.
I followed all the resources but couldn't make it work; every time I get the same error.
I'm using Django version 2 and Python version 3.7.
This is what I get if I run heroku stack:
Available Stacks
cedar-14
container
heroku-16
* heroku-18
The error I'm getting when generating the PDF:
No wkhtmltopdf executable found: "b''"
If this file exists please check that this process can read it. Otherwise please install wkhtmltopdf - https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
My website works very well on localhost without any problem, so as far as I can tell I must have done something wrong installing wkhtmltopdf.
Thank you
It's non-trivial. If you want to avoid the headache below, you can just use my service, api2pdf: https://github.com/api2pdf/api2pdf.python. Otherwise, if you want to work through it, see below.
1) Add this to your requirements.txt to install a special wkhtmltopdf pack for Heroku, as well as pdfkit:
git+git://github.com/johnfraney/wkhtmltopdf-pack.git
pdfkit==0.6.1
2) I created a pdf_manager.py in my Flask app. In pdf_manager.py I have a method:
import os
import platform
import subprocess

import pdfkit

def _get_pdfkit_config():
    """wkhtmltopdf lives and functions differently depending on Windows or Linux.
    We need to support both since we develop on Windows but deploy on Heroku.

    Returns:
        A pdfkit configuration
    """
    if platform.system() == 'Windows':
        return pdfkit.configuration(
            wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY',
                                       'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
    else:
        WKHTMLTOPDF_CMD = subprocess.Popen(
            ['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')],
            stdout=subprocess.PIPE).communicate()[0].strip()
        return pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
The reason for the platform check is that I develop on a Windows machine with a local wkhtmltopdf binary, but when I deploy to Heroku the app runs in their Linux containers, so I need to detect which platform we're on before locating the binary.
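On Python 3 the same detection can be done without spawning `which` at all: shutil.which performs the PATH lookup portably on both platforms. A hedged sketch of that variant; find_wkhtmltopdf is my own name, and the Windows default is just the conventional installer location, not guaranteed:

```python
import os
import platform
import shutil

def find_wkhtmltopdf():
    """Locate the wkhtmltopdf binary: honour WKHTMLTOPDF_BINARY if set,
    otherwise search PATH; on Windows fall back to the usual install dir.
    Returns the path as a string, or None if nothing was found."""
    name = os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')
    found = shutil.which(name)  # portable stdlib replacement for `which`
    if found is None and platform.system() == 'Windows':
        default = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
        if os.path.exists(default):
            found = default
    return found
```

The result could then be handed to pdfkit.configuration(wkhtmltopdf=...) exactly as in _get_pdfkit_config() above, with an explicit error when it returns None.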
3) Then I created two more methods: one to convert a URL to a PDF, and another to convert raw HTML to a PDF.
def make_pdf_from_url(url, options=None):
    """Produces a pdf from a website's url.

    Args:
        url (str): A valid url
        options (dict, optional): for specifying pdf parameters like landscape
            mode and margins

    Returns:
        pdf of the website
    """
    return pdfkit.from_url(url, False, configuration=_get_pdfkit_config(),
                           options=options)

def make_pdf_from_raw_html(html, options=None):
    """Produces a pdf from raw html.

    Args:
        html (str): Valid html
        options (dict, optional): for specifying pdf parameters like landscape
            mode and margins

    Returns:
        pdf of the supplied html
    """
    return pdfkit.from_string(html, False, configuration=_get_pdfkit_config(),
                              options=options)
I use these methods to convert to PDF.
Just follow these steps to deploy a Django app using pdfkit on Heroku.
Step 1: Add the following packages to your requirements.txt file:
wkhtmltopdf-pack==0.12.3.0
pdfkit==0.6.0
Step 2: Add the lines below in views.py to set the path of the binary file:
import os, sys, subprocess, platform

import pdfkit

if platform.system() == "Windows":
    pdfkit_config = pdfkit.configuration(
        wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY',
                                   'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
else:
    os.environ['PATH'] += os.pathsep + os.path.dirname(sys.executable)
    WKHTMLTOPDF_CMD = subprocess.Popen(
        ['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')],
        stdout=subprocess.PIPE).communicate()[0].strip()
    pdfkit_config = pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
Step 3: Then pass pdfkit_config as an argument, as below:
pdf = pdfkit.from_string(html, False, options, configuration=pdfkit_config)

usbip not working with OpenWRT

I am using an MT7688 module with OpenWrt, version 15.05. I installed usbip on the device with:
#opkg install http://downloads.lede-project.org/releases/17.01.1/targets/ramips/mt7688/packages/kmod-usbip-client_4.4.61-1_mipsel_24kc.ipk
#opkg install http://downloads.lede-project.org/releases/17.01.1/targets/ramips/mt7688/packages/kmod-usbip-server_4.4.61-1_mipsel_24kc.ipk
#opkg install http://downloads.lede-project.org/releases/17.01.1/targets/ramips/mt7688/packages/kmod-usbip_4.4.61-1_mipsel_24kc.ipk
Failure scenario:
root#mylinkit:/# usbip
-ash: usbip: not found
So it looks like something is broken in user space. Does anyone know the solution?
Below are the logs showing that the kernel modules are installed:
root#mylinkit:/# lsmod|grep usbip
usbip_core 4768 2 vhci_hcd
usbip_host 11256 0
root#mylinkit:/# find -name *usbip*
./etc/modules.d/usbip-server
./etc/modules.d/usbip
./etc/modules.d/usbip-client
./lib/modules/3.18.23/usbip-core.ko
./lib/modules/3.18.23/usbip-host.ko
./overlay/upper/etc/modules.d/usbip-server
./overlay/upper/etc/modules.d/usbip
./overlay/upper/etc/modules.d/usbip-client
./overlay/upper/lib/modules/3.18.23/usbip-core.ko
./overlay/upper/lib/modules/3.18.23/usbip-host.ko
./overlay/upper/usr/lib/opkg/info/kmod-usbip-server.postinst-pkg
./overlay/upper/usr/lib/opkg/info/kmod-usbip.control
./overlay/upper/usr/lib/opkg/info/kmod-usbip-server.prerm
./overlay/upper/usr/lib/opkg/info/kmod-usbip-client.postinst
./overlay/upper/usr/lib/opkg/info/kmod-usbip.list
./overlay/upper/usr/lib/opkg/info/kmod-usbip-client.prerm
./overlay/upper/usr/lib/opkg/info/kmod-usbip-server.list
./overlay/upper/usr/lib/opkg/info/kmod-usbip-server.postinst
./overlay/upper/usr/lib/opkg/info/kmod-usbip-client.control
./overlay/upper/usr/lib/opkg/info/kmod-usbip.postinst
./overlay/upper/usr/lib/opkg/info/kmod-usbip.prerm
./overlay/upper/usr/lib/opkg/info/kmod-usbip-server.control
./overlay/upper/usr/lib/opkg/info/kmod-usbip.postinst-pkg
./overlay/upper/usr/lib/opkg/info/kmod-usbip-client.postinst-pkg
./overlay/upper/usr/lib/opkg/info/kmod-usbip-client.list
./sys/bus/usb/drivers/usbip-host
./sys/devices/platform/vhci_hcd/usbip_debug
./sys/module/usbip_core
./sys/module/usbip_core/parameters/usbip_debug_flag
./sys/module/usbip_core/holders/usbip_host
./sys/module/usbcore/holders/usbip_host
./sys/module/usbip_host
./sys/module/usbip_host/drivers/usb:usbip-host
./usr/lib/opkg/info/kmod-usbip-server.postinst-pkg
./usr/lib/opkg/info/kmod-usbip.control
./usr/lib/opkg/info/kmod-usbip-server.prerm
./usr/lib/opkg/info/kmod-usbip-client.postinst
./usr/lib/opkg/info/kmod-usbip.list
./usr/lib/opkg/info/kmod-usbip-client.prerm
./usr/lib/opkg/info/kmod-usbip-server.list
./usr/lib/opkg/info/kmod-usbip-server.postinst
./usr/lib/opkg/info/kmod-usbip-client.control
./usr/lib/opkg/info/kmod-usbip.postinst
./usr/lib/opkg/info/kmod-usbip.prerm
./usr/lib/opkg/info/kmod-usbip-server.control
./usr/lib/opkg/info/kmod-usbip.postinst-pkg
./usr/lib/opkg/info/kmod-usbip-client.postinst-pkg
./usr/lib/opkg/info/kmod-usbip-client.list
I spent a lot of time figuring out the solution, and in the end my suspicion was correct: the installer ipk from the release branch, as mentioned in the question, does not contain the user-space binaries.
Solution: to get the binaries, I built from the complete official OpenWrt source:
- `git clone https://github.com/openwrt/openwrt`
- `make menuconfig`
- Enable `networking->usbip`, `networking->usbip-client` and `networking->usbip-server` from menuconfig
After compiling, I got two binaries in sbin:
/usr/sbin/usbip
/usr/sbin/usbipd
These are exactly what was needed and what I was looking for. It works perfectly now.