Azure ML: Include additional files during model deployment

In my AML pipeline, I've got a model built and deployed to an AciWebservice. I now need to include some additional data that would be used by score.py. This data is in JSON format (~1 MB) and is specific to the model that's built. To accomplish this, I was thinking of sticking this file in blob store and updating some "placeholder" vars in score.py during deployment, but that seems hacky.
Here are some options I was contemplating, but I wasn't sure about their practicality.
Option 1:
Is it possible to include this file during the model deployment itself, so that it's part of the Docker image?
Option 2:
Another possibility I was contemplating: would it be possible to include this JSON data as part of the model artifacts?
Option 3:
How about registering it as a dataset and pulling that in from the score file?
What is the recommended way to deploy dependent files in a model deployment scenario?

There are a few ways to accomplish this:
Put the additional file in the same folder as your model file, and register the whole folder as the model. In this approach the file is stored alongside the model (a sketch of this option follows the list below).
Put the file in a local folder, and specify that folder as source_directory in InferenceConfig. In this approach the file is re-uploaded every time you deploy a new endpoint.
Use a custom base image in InferenceConfig to bake the file into the Docker image itself.
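A minimal sketch of the first approach, assuming an existing Workspace object ws and a local folder model_folder that holds both the model file and data.json (these names are assumptions, not part of the original answer):
from azureml.core.model import Model

# Register the whole folder; the extra JSON file travels with the model.
model = Model.register(
    workspace=ws,                   # assumed existing Workspace
    model_path='model_folder',      # local folder containing the model file and data.json
    model_name='model-with-data'
)

# In score.py the registered folder is available under the AZUREML_MODEL_DIR
# environment variable (verify the exact layout by listing that directory):
# data_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model_folder', 'data.json')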

To extend the answer by @Roope Astala - MSFT, this is how you can implement it using the second approach:
Put the file in a local folder, and specify that folder as source_directory in InferenceConfig. In this approach the file is re-uploaded every time you deploy a new endpoint.
Let's say this is your file structure.
.
└── deployment
    ├── entry.py
    ├── env.yml
    └── files
        └── data.txt
And you want to read files/data.txt in the entry.py script.
This is how you would read it in entry.py:
file_path = 'deployment/files/data.txt'
with open(file_path, 'r') as f:
    ...
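For context, here is a hedged sketch of a fuller entry.py that loads the file once in init() and keeps it around for run(); the init()/run() split is the usual AML entry-script convention, and everything beyond the path above is an assumption:
data = None

def init():
    global data
    # Path mirrors the snippet above; adjust if your runtime layout differs.
    with open('deployment/files/data.txt', 'r') as f:
        data = f.read()

def run(raw_data):
    # Use `data` alongside the model here and return the prediction.
    ...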
And this is how you would set up your deployment configuration.
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
inference_config = InferenceConfig(
    runtime='python',
    source_directory='deployment',
    entry_script='entry.py',
    conda_file='env.yml'
)
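And, as a hedged follow-up, roughly how these two configs would be passed to Model.deploy (the workspace ws, the registered model object, and the service name are assumptions):
from azureml.core.model import Model

service = Model.deploy(
    workspace=ws,                        # assumed existing Workspace
    name='my-aci-service',               # hypothetical service name
    models=[model],                      # previously registered Model object
    inference_config=inference_config,
    deployment_config=deployment_config
)
service.wait_for_deployment(show_output=True)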

Related

Nodejs in Azure functions, importing function in a folder containing index.ts not working as expected

My dev team has a project that consists of a Node.js/TypeScript backend with GraphQL. It's hosted using Azure Functions, and we use the Azure Functions Core Tools' func start command to run our project locally.
We recently had a bug where everything seemed to run as normal; however, when trying to make requests (from the frontend, or through the local GraphQL sandbox mode), we got a null response.
We found this a bit strange, as all the tests passed without issues. One of our tests makes an entry in the database and then gets the data back using the GraphQL/Apollo server.
After a lot of troubleshooting we found that the reason was how we imported our resolvers.
This is a simplified version of our folder structure:
.
└── src
    ├── __tests__       # Folder containing all the tests
    ├── resolvers       # Folder containing all the resolvers
    │   ├── index.ts    # Collects all the resolvers into a single file
    │   ├── eg. user.ts
    │   ...
    ├── index.ts        # Imports resolvers and defines the graphql/apollo client
    ...
It turns out it all came down to how we imported the resolvers.
This did not work in src/index.ts but DID work in the tests (by changing the path so that it was relative to the test folder):
import resolvers from './resolvers'
This, however, worked in src/index.ts (and also for the tests):
import resolvers from './resolvers/index'
I'm just wondering if anyone has any insight into why I have to explicitly include /index when importing when the code is run using Azure Functions, but not when using e.g. Jest as we do for our tests.
Normally, when you deploy an Azure Functions project of any runtime stack, you get an individual folder for each function in the Function project.
According to this MS Doc on the Azure Functions folder structure, custom folders are not published; folders are created automatically based on the name of each individual function deployed.
When you create a folder for the function files, the relative path import resolvers from './resolvers/index' we specified works locally, but not in Azure, because custom folders are not published. You would need to create the folder manually in the Kudu site of the Azure Function App and move the related files there for that relative import path to resolve.

How to deploy a detectron2 model using file in azureML

I have a detectron2 detection model trained and saved (by someone else) as a file model.pth. I also have a cfg.yaml file that specifies the weights path as the path to model.pth, as follows.
Inside cfg.yaml we have this line:
WEIGHTS: /var/azureml-app/azureml-models/detect_containers/1/outputs/model.pth
I noticed that the person before me (who trained the model) used the cfg.yaml file to create the predictor configuration as follows:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("cfg.yaml")
predictor = DefaultPredictor(cfg)
He created and used his predictor inside the run function of the scoring.py script used to deploy the model as a web service in Azure ML.
The problem is that I have no access to this person's Azure ML account or workspace, so when I try deploying the model on my own I get errors, mainly path errors indicating that the file /var/azureml-app/azureml-models/detect_containers/1/outputs/model.pth is not found. I uploaded the file to my Azure ML workspace in the same folder as my notebooks, but apparently that doesn't work either, because the path I provide is then treated as a local path, and when the model is deployed it ends up on a remote Microsoft server (whose file layout I don't know, so I can't work out the path).
I also tried creating a new model with the same name and uploading the file to that model, in order to use this line during deployment:
model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pth')
This way I managed to avoid path errors, but once the model got deployed I couldn't test it because of scoring URI errors that are not explained, not even in the logs.
Normally the model is functional (as it was deployed and used before by the person who trained it, lol).
So do you have any idea how I should approach the problem, or how to solve it?
And does anyone have any kind of resource explaining how to get paths to files after they've been deployed? Personally, once a model is deployed I feel like it's in a black magic box; I can't see how the files are organized on the remote Microsoft server.
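One pattern that follows from the AZUREML_MODEL_DIR line above, offered only as a hedged sketch rather than a verified fix, is to load cfg.yaml in the scoring script's init() and override the hard-coded WEIGHTS path with the mounted model path (the file names are the ones from the question; everything else is assumed):
import os
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

predictor = None

def init():
    global predictor
    # Folder where Azure ML mounts the registered model inside the container.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pth')
    cfg = get_cfg()
    cfg.merge_from_file('cfg.yaml')   # assumes cfg.yaml ships alongside the scoring script
    cfg.MODEL.WEIGHTS = model_path    # override the absolute path baked into cfg.yaml
    predictor = DefaultPredictor(cfg)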

What does IP_NETWORK and IP_DEVICE in the Decouple Python library mean?

I was reading through the Decouple Python library, but I don't understand what the following code does:
IP_NETWORK = config("IP_NETWORK")
IP_DEVICE = config("IP_DEVICE")
I know that there has to be a .env file set up, where IP_NETWORK and IP_DEVICE have to be declared. But I'm not sure how this module works.
Also, how do I find the IP_NETWORK and the IP_DEVICE?
I'm not too sure what I'm talking about and may not make sense, but any explanation is appreciated!
Python Decouple library: Strict separation of settings from code
Install:
pip install python-decouple
This library comes in handy for separating your settings parameters from your source code. It's always a good idea to keep your secret key, database URL, passwords, etc. in a separate place (an environment file - a .ini/.env file) and not in your source code git repository, for security reasons.
It also comes in handy if you want different project settings in different environments (e.g. you might want debug mode on in your development environment but not in production).
How do we decide whether a parameter should go into the source code git repository or into an environment file?
It's a simple trick - parameters related to project settings go straight into the source code, and parameters related to instance settings go into an environment file.
Of the items below, the first two are project settings and the last three are instance settings.
Locale and i18n;
Middlewares and Installed Apps;
Resource handles to the database, Memcached, and other backing services;
Credentials to external services such as Amazon S3 or Twitter;
Per-deploy values such as the canonical hostname for the instance.
Let's understand how to use it with Django (a Python framework).
First, create a file named .env or .ini in the root of your project, and say the content of that file is as below.
DEBUG=True
SECRET_KEY=ARANDOMSECRETKEY
DB_NAME=Test
DB_USER=Test
DB_PASSWORD=some_strong_password
Now let's see how we can use it with Django. A sample snippet of settings.py:
# other import statements...
from decouple import config

SECRET_KEY = config('SECRET_KEY')
DEBUG = config('DEBUG', cast=bool)

DATABASES = {
    'default': {
        'NAME': config('DB_NAME'),
        'USER': config('DB_USER'),
        'PASSWORD': config('DB_PASSWORD'),
        # other parameters
    }
}
# remaining code...
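As for IP_NETWORK and IP_DEVICE specifically: they are not special to decouple; they are just keys that the project's author expects to find in the .env file. A minimal sketch of how they would be declared and read (the values below are made-up placeholders):
IP_NETWORK=192.168.1.0/24
IP_DEVICE=192.168.1.42
and in the code:
from decouple import config

IP_NETWORK = config('IP_NETWORK')                      # raises UndefinedValueError if the key is missing
IP_DEVICE = config('IP_DEVICE', default='127.0.0.1')   # or fall back to a default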
Hope this answers your question.

config prod, config stage and keys files are good practice?

I am using config files to store defaults and passwords/tokens/keys.
The defaults are no problem to have public.
Obviously I want the passwords to remain secret.
I mean - not to push them to GitHub.
I thought about make a configs directory contains the following files:
common.js - everybody can see it. keys.js - passwords/tokens/keys; shouldn't be pushed to GitHub (a .gitignore entry prevents this). keys-placeholder.js - should contain just placeholders, so whoever clones the project understands they need to create a keys.js file and put their real passwords in it.
Is this a good practice? How do you keep passwords from being pushed to GitHub while still making it comfortable to build the project for the first time?
Personally, I use config for public app configuration/constants, and a .env file with the dotenv package for secrets.
Then add .env to .gitignore.
So an example project would be:
config     // app configuration/constants
  - prod.json
  - dev.json
  - test.json
.env       // secrets
src/
  - models
  - app.js
...
----- added -----
Why don't you put the config in the src dir?
A: Of course it's totally up to you where to put your config folder.
It's just a matter of preference.
What about staging config?
A: As with question #1, you can add staging.json under config.
If you don't provide any placeholder file for .env, how do I know which values I should put in this file?
A: A typical .env file looks like the one below.
API_CREDENTIAL=your api credentials
DB_PASSWORD=your db password
How do you lazy-load the prod/dev config files into the node app?
A: I don't see much benefit in lazy-loading small JSON files.
If you're asking for a specific how-to guide for the config and dotenv libraries,
please refer to their GitHub repositories (config, dotenv).

IFileProvider Azure File storage

I am thinking about implementing IFileProvider interface with Azure File Storage.
What I am trying to find in the docs is whether there is a way to send the whole path to the file to the Azure API, like rootDirectory/sub1/sub2/example.file, or whether that actually has to be mapped to some recursive function that takes the path and traverses the directory structure in file storage.
I just want to make sure I am not missing something and reinventing the wheel for something that already exists.
[UPDATE]
I'm using the Azure Storage Client for .NET. I would not like to mount anything.
My intention is to have several IFileProviders which I could switch between based on the environment and other conditions.
So, for example, if my environment is Cloud, then I would use an IFileProvider implementation that uses Azure File Services through the Azure Storage Client. If my environment is MyServer, then I would use the server's local file system. A third option would be an environment someOther with its own implementation.
Now, for all of them, IFileProvider operates with a path like root/sub1/sub2/sub3. For Azure File Storage, is there a way to send the whole path at once to get the sub3 info/content, or should the path be broken into individual directories, getting a reference/content for each step?
I hope that clears the question.
Now, for all of them, IFileProvider operates with a path like root/sub1/sub2/sub3. For Azure File Storage, is there a way to send the whole path at once to get sub3 info/content or should the path be broken into individual directories and get reference/content for each step?
To access a specific subdirectory nested under multiple subdirectories, you can pass the whole relative path to the GetDirectoryReference method to construct the CloudFileDirectory, as follows:
var fileshare = storageAccount.CreateCloudFileClient().GetShareReference("myshare");
var rootDir = fileshare.GetRootDirectoryReference();
var dir = rootDir.GetDirectoryReference("2017-10-24/15/52");
var items = dir.ListFilesAndDirectories();
To access a specific file under a nested subdirectory, you can pass the whole relative path to the GetFileReference method to get the CloudFile instance, as follows:
var file = rootDir.GetFileReference("2017-10-24/15/52/2017-10-13-2.png");
