TensorFlow Optimization in DSVM - Azure

Problem statement first: How does one properly set up TensorFlow for running on a DSVM using a remote Docker environment? Can this be done in aml_config/*.runconfig?
I receive the following message, and I would like to take advantage of the faster extended FMA operations.
tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Background: I use a local Docker environment managed through Azure ML Workbench for initial testing and code validation, so that I'm not running an expensive DSVM constantly. Once my code is to my liking, I then run it on a remote Docker instance on an Azure DSVM.
I want a consistent conda environment across my compute environments, so this works out extremely well. However, I cannot figure out how to control the TensorFlow build to optimize for the hardware at hand (i.e., my local Docker on macOS vs. remote Docker on an Ubuntu DSVM).

The notification indicates that TensorFlow was not compiled to use these CPU instructions, and that compiling it from source for this particular CPU architecture would make it run faster. You can safely ignore it. If you choose to compile, though, you can build and install TensorFlow from source, and then use the native VM execution mode (vs. using Docker) to run it from Azure Machine Learning.
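If you stay on the stock binaries, the notice can simply be filtered out with TensorFlow's standard TF_CPP_MIN_LOG_LEVEL environment variable; this only hides the log line and does not change performance:

    import os

    # Filter INFO-level startup notices such as the AVX2/FMA message.
    # This must be set before TensorFlow is imported.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

    import tensorflow as tf

If you do rebuild from source, a quick sanity check that the optimized build is actually faster is a matmul micro-benchmark run under both wheels on the same DSVM. A rough sketch, assuming the TF 1.x session API of that era:

    import time
    import tensorflow as tf

    # Time a batch of large matrix multiplications; AVX2/FMA mainly
    # speeds up dense float math like this.
    a = tf.random_normal([2048, 2048])
    b = tf.random_normal([2048, 2048])
    c = tf.matmul(a, b)

    with tf.Session() as sess:
        sess.run(c)  # warm-up
        start = time.time()
        for _ in range(10):
            sess.run(c)
        print("10 matmuls: %.2f s" % (time.time() - start))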
Hope this helps,
Serina

Related

Can I use Spyder IDE with Azure Machine Learning?

I am learning to use Azure Machine Learning. It has its own Notebooks (which are OK!) and also allows me to use Jupyter Notebook and VSCode.
However, I am wondering if there is a way to efficiently use Spyder with Azure Machine Learning.
E.g., I was able to install RStudio as a custom application using a Docker image, using the steps provided here: Stackoverflow link
Spyder supports connecting to a remote Python kernel; it does, however, require SSH.
You can enable SSH on your Compute Instance, but only when you first create it. Also, many companies have policies against enabling SSH, so this might not work for you. If it doesn't, I can highly recommend VSCode.
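If SSH is enabled, the usual pattern is to start a Spyder-compatible kernel on the compute instance, copy its connection file back to your machine, and use Spyder's "Connect to an existing kernel" dialog with the SSH option. A sketch of the remote side, assuming the spyder-kernels package is installed in the instance's Python environment (the connection-file name here is arbitrary):

    # Run this on the compute instance (over SSH) to start a kernel
    # that Spyder can attach to.
    import subprocess

    subprocess.run([
        "python", "-m", "spyder_kernels.console",
        "-f", "remote_kernel.json",  # copy this file back to your local machine
    ])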

Diesel compilation hangs on Lightsail

I'm trying to deploy a database-backed Rust app on Amazon Lightsail. It uses the ORM crate Diesel. It compiles without trouble on my local (Arch) Linux machine.
To compile the app remotely, I SSH into a Lightsail Debian VM. After installing Rust, cloning the repo, and specifying the toolchain, I run cargo build. This does compile a bunch of crates, but on Diesel it appears to hang: ps shows the cargo and rustc processes still running after 30 minutes.
I've tried Diesel versions 1.4.5 and 2.0.0, stable and nightly Rust toolchains, and Ubuntu as well as Debian VMs.
[Edit: the app also compiles without trouble on a Linode VM.]
What could be the problem? (How can I collect further information for diagnosis?)
What is the CPU graph showing?
Lightsail uses burstable instances that have a CPU baseline and can handle occasional traffic spikes, but if you keep the CPU pegged for too long, it gets throttled.
If you check the instance's Metrics tab you can see whether it's running out of burst capacity (choose burst capacity percentage or minutes from the drop-down).
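To gather more data from inside the VM while the build runs, a small probe like this (assuming the third-party psutil package) shows whether rustc is pinning the vCPU, which on a burstable instance would drain burst capacity and look exactly like a hang:

    # Log overall CPU utilisation every 5 seconds while `cargo build`
    # runs in another shell; sustained near-100% readings followed by a
    # sharp drop suggest burst-capacity throttling rather than a hang.
    import psutil  # pip install psutil

    while True:
        print("cpu: %5.1f%%" % psutil.cpu_percent(interval=5))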

Inferencing with a tf.lite model on 32-bit Linux

So I am 400 hours into a project at work building an automated image classification pipeline. I have overcome many hurdles and am about finished with the first alpha. Everything runs in Docker containers on my workstation. The only thing left is to build the inference service. So I set up one more Docker container, pull in my libraries, set up the Flask endpoints, and copy the .tflite file to the shared volume; everything seems to be in order, and I can hit the API from Chrome and get the right responses.
So I very happily report that the project is ready for testing 5 weeks early! I explain that all we have to do is install docker, build and run the docker file, and we are ready to go. To this my coworker responds "the target machines are 32bit! no docker!"
Upgrading to 64 bit is off the table.
I tried to compile TensorFlow for 32-bit...
I want to add a single-board PC (x64) to the machine network and run the Docker container from there, but management wants a solution that does not require retrofitting.
The target machines have very unstable internet connections managed by other companies in just about every country on earth, so a cloud solution is not going to work (plus I need sub-50 ms latency).
Does anyone have an idea of how to tackle this challenge? At this point I think I am stuck recompiling TF for 32-bit, but I don't know how!
The target machines are running a custom in-house distro of 32-bit Debian 6.
The target machines are old and have outdated software but were very high end at the time they were built.
It's not clear which 32-bit architecture you want to use. I guess it's ARM32.
If that's the case, you can build TF or TFLite for ARM32.
Check the following links.
https://www.tensorflow.org/install/source_rpi
https://www.tensorflow.org/lite/guide/build_rpi
Though they're about the Raspberry Pi, they should give you an idea of how to build it for ARM32.
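Whichever architecture it is, note that full TensorFlow isn't required at inference time: if you can produce (or find) a 32-bit build of the slim TFLite runtime, the serving container stays small. A minimal inference sketch (the model path and zero-filled input are placeholders):

    import numpy as np

    # Prefer the slim tflite_runtime wheel if one exists for the target
    # architecture; fall back to the interpreter bundled with full TF.
    try:
        from tflite_runtime.interpreter import Interpreter
    except ImportError:
        from tensorflow.lite.python.interpreter import Interpreter

    interpreter = Interpreter(model_path="model.tflite")  # placeholder path
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Stand-in input; replace with a real preprocessed image.
    frame = np.zeros(inp["shape"], dtype=inp["dtype"])
    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]))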

Selenium WebDriver performance for parallel tests with ChromeDriver on Windows 10 vs Ubuntu Linux, and local vs cloud

I have some parallel Selenium WebDriver tests with long series of steps, running on several threads on Windows 10 via ChromeDriver (headless).
Any pointers on whether the tests would run faster on Ubuntu Linux?
To make things even faster, would you suggest deploying on Google Cloud or AWS?
If yes, which OS/cloud platform combination performs best versus a local server?
My local server has Intel Core i5/Windows 10/16 GB RAM/SSD
According to the article "4 Reasons to use Linux Agents vs Windows Agents", it seems to be a good idea to use Linux and Amazon EC2.
The major advantage of using a cloud solution like EC2 is the availability of multiple instances. You can set up your own Selenium Grid that will outperform your local server. Set up a grid hub and begin with two instances. Add more instances until your runtime improvement is marginal, then remove the last instance and use that configuration.
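Pointing existing tests at a grid is mostly a one-line change: swap the local ChromeDriver for a Remote driver aimed at the hub (the hub URL below is a placeholder for your own EC2 hub):

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")

    # Connect to the grid hub instead of starting a local ChromeDriver.
    driver = webdriver.Remote(
        command_executor="http://<hub-host>:4444/wd/hub",  # placeholder
        options=options,
    )
    driver.get("https://example.com")
    print(driver.title)
    driver.quit()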

Should I containerize a standalone command-line or terminal application which requires 16 vCPU?

We are using an application which is currently compiled for Windows (it is a standalone .exe, not hooked into the registry) and which can also be cross-compiled for *nix if needed. This application runs optimally using about 16 threads in parallel.
Deploying an entire Windows (or Linux) stack seems burdensome and heavy, but I don't understand whether containers make sense. Where I am confused is that I THOUGHT containers would run on Azure or AWS basically on a shim of some sort. What it looks like instead is that I need to spin up a host virtual machine to hold the containers. If that is true, then I can only put two such containers on a 32-vCPU machine, and containers don't make sense (I think).
Hopefully I am just misunderstanding this. Is there anything lightweight out there which can let me run a process that does heavy computation and file I/O (result files are 16 GB+ each) but doesn't rely on a GUI, etc.?
With all the advertising out there for Docker/Swarm, CoreOS, Kubernetes, and Mesos/Mesosphere, I am really confused.
Your application is similar to work we've done to support parallel execution of Office file conversion, using Microsoft's converter. We run the converter to support PPT-to-MP4 conversion, with multiple containers, each with a fairly large CPU allocation. The container design on WinDocks is lighter-weight, as it doesn't include OS files. You can give it a try using the free Community Edition at WinDocks.com.
Disclosure: I am the Co-Founder of WinDocks
