I am planning to use Azure Digital Twins to represent a factory model. I plan to load a 3D model in glTF or GLB format and attach properties to each machine or asset in the 3D model. The machines in the model are named properly. Is there any way I can interact with the 3D model in Azure programmatically? I am expecting an API to create properties for each element. I already have a database with the machine IDs and properties; I just have to write a program to identify the asset in the 3D model using the ID and attach the properties to it. If an API is exposed for this purpose, please let me know.
You can edit the .gltf file in the blob store just like any dataset.
Using gltflib:
from gltflib import GLTF
import pandas as pd

# Load the glTF model from disk (use a raw string for the Windows-style path)
gltf = GLTF.load(r'..\data\OutdoorTanks.gltf')
gltf.model.nodes[0]  # inspect the first node
gltf.model.nodes gives the full list of nodes, which can be loaded into a DataFrame:
colnames = ["extensions", "extras", "name", "camera", "children", "skin", "matrix", "mesh", "rotation", "scale", "translation", "weights"]
pd.DataFrame(
    [[n.extensions, n.extras, n.name, n.camera, n.children, n.skin, n.matrix, n.mesh,
      n.rotation, n.scale, n.translation, n.weights] for n in gltf.model.nodes],
    columns=colnames,
)
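To attach your database properties to the named machines, one option is to write them into each node's extras, which glTF reserves for application-specific data. A minimal sketch, assuming a hypothetical machine_props lookup built from your database, that node names match your machine IDs, and a placeholder output path:

# Hypothetical lookup built from your database: machine ID -> properties
machine_props = {
    'Tank01': {'capacity_l': 5000, 'status': 'active'},
    'Tank02': {'capacity_l': 3000, 'status': 'maintenance'},
}

for node in gltf.model.nodes:
    if node.name in machine_props:
        # 'extras' is the glTF-sanctioned place for arbitrary application data
        node.extras = machine_props[node.name]

# Write the updated model back out (placeholder path)
gltf.export(r'..\data\OutdoorTanks_with_props.gltf')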
How do you productionalize that?
- An Azure Function that runs on a trigger of your choosing
- Fetch the glTF from the blob store
- Apply whatever automated changes you need (see the sketch after this list)
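As a rough sketch of what the function body might do (the connection string, container, blob names, and function name are placeholders, and this assumes the azure-storage-blob v12 SDK):

import os
import tempfile
from azure.storage.blob import BlobServiceClient
from gltflib import GLTF

def update_model():
    # Placeholder connection details; replace with your own storage account
    service = BlobServiceClient.from_connection_string(os.environ['STORAGE_CONNECTION_STRING'])
    blob = service.get_blob_client(container='models', blob='OutdoorTanks.gltf')

    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, 'model.gltf')

        # Download the current model from the blob store
        with open(local_path, 'wb') as f:
            f.write(blob.download_blob().readall())

        # Edit the nodes as in the earlier snippet, then export.
        # If the model references external .bin buffers, download those alongside it.
        gltf = GLTF.load(local_path)
        # ... attach properties to gltf.model.nodes here ...
        gltf.export(local_path)

        # Upload the modified model back to the blob store
        with open(local_path, 'rb') as f:
            blob.upload_blob(f, overwrite=True)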
Alternatively, you could use Databricks or another large-scale processing step. Anything that can a) access the blob store and b) run some Python code can do this.
Additionally, if you get into manipulating glTF files, this doc is a good read.
How can I combine multiple datasets into one using Azure Machine Learning Studio?
(The following graph doesn't work)
Same question: https://learn.microsoft.com/en-us/answers/questions/666021/unable-to-use-34join-data34-to-combine-multiple-da.html
As per the official documentation, the Join Data module does not support a right outer join, so if you want to ensure that rows from a particular dataset are included in the output, that dataset must be on the left-hand input.
For more information, follow this link: How to configure Join Data.
I have created an ML model and I want to publish the predictions for the test set onto a web page, so that non-technical team members can visualize them more easily.
I have combined the predictions into a data frame together with the case numbers of the test set and the original data.
Predictions = pd.DataFrame({'Case.Number': CN_test, 'Org_Data': y_test, 'Predictions': y_pred})
As I am new to this, my experience with APIs is limited to creating a basic "hello world" API.
Requesting guidance on how to do this using an API, or any other way to get this done.
Regards
Sudhir
Since a DataFrame can't be rendered directly, it has to be converted into a list; below is the code for the same.
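A minimal sketch of that idea with Flask (assuming the Predictions DataFrame from the question has already been built; the route name and sample values here are placeholders):

from flask import Flask, jsonify
import pandas as pd

app = Flask(__name__)

# Placeholder data standing in for CN_test, y_test and y_pred from the question
Predictions = pd.DataFrame({'Case.Number': [101, 102], 'Org_Data': [0, 1], 'Predictions': [0, 1]})

@app.route('/predictions')
def get_predictions():
    # Convert the DataFrame to a list of records so it can be serialized to JSON
    return jsonify(records=Predictions.to_dict(orient='records'))

if __name__ == '__main__':
    app.run(debug=True)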
I got the solution in another query:
Return Pandas dataframe as JSONP response in Python Flask
I had been wondering if it were possible to apply "data preparation" (.dprep) files to incoming data in score.py, similar to how Pipeline objects may be applied. This would be very useful for model deployment. To find out, I asked this question on the MSDN forums and received a response confirming that it is possible, but with little explanation of how to actually do it. The response was:
In your score.py file, you can invoke the dprep package from the Python
SDK to apply the same transformation to the incoming scoring data.
Make sure you bundle your .dprep file in the image you are building.
So my questions are:
What function do I apply to invoke this dprep package?
Is it: run_on_data(user_config, package_path, dataflow_idx=0, secrets=None, spark=None) ?
How do I bundle it into the image when creating a web-service from the CLI?
Is there a switch to -f for score files?
I have scanned through the entire documentation and Workbench Repo but cannot seem to find any examples.
Any suggestions would be much appreciated!
Thanks!
EDIT:
Scenario:
I import my data from a live database and let's say this data set has 10 columns.
I then feature-engineer this (.dsource) data set using the Workbench, resulting in a .dprep file which may have 13 columns.
This .dprep data set is then imported as a pandas DataFrame and used to train and test my model.
Now I have a model ready for deployment.
This model is deployed via Model Management to a Container Service and will be fed data from a live database which once again will be of the original format (10 columns).
Obviously this model has been trained on the transformed data (13 columns) and will not be able to make a prediction on the 10 column data set.
What function may I use in the 'score.py' file to apply the same transformation I created in workbench?
I believe I may have found what you need.
From this documentation you would import from the azureml.dataprep package.
There aren't any examples there, but searching on GitHub, I found this file, which has the following code to run data preparation:
from azureml.dataprep import package

# Runs the data flow defined in the .dprep package and returns a pandas DataFrame
df = package.run('Data analysis.dprep', dataflow_idx=0)
Hope that helps!
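For context, here is a rough sketch of how that call might sit inside a score.py file. The model file name, the .dprep name, and the init()/run() layout are assumptions based on the Workbench-era scoring contract; note that package.run re-executes the data flow against the data source configured in the package, while the next answer covers applying the preparation to in-memory input.

import pickle
from azureml.dataprep import package

def init():
    # Load the trained model bundled into the image (placeholder file name)
    global model
    with open('model.pkl', 'rb') as f:
        model = pickle.load(f)

def run(input_df):
    # Re-run the bundled data flow ('Data preparation.dprep' is a placeholder name);
    # this pulls from the data source configured in the package, not from input_df
    prepared = package.run('Data preparation.dprep', dataflow_idx=0)
    return model.predict(prepared).tolist()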
To me, it looks like this can be achieved by using the run_on_data(user_config, package_path, dataflow_idx=0, secrets=None, spark=None) method from the azureml.dataprep.package module.
From the documentation:
run_on_data(user_config, package_path, dataflow_idx=0, secrets=None, spark=None) runs the specified data flow based on an in-memory data source and returns the results as a dataframe. The user_config argument is a dictionary that maps the absolute path of a data source (.dsource file) to an in-memory data source represented as a list of lists.
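A minimal sketch of that call, assuming it runs inside score.py's run() function (so input_df and model come from the surrounding scoring code); the .dsource path is a placeholder, and the incoming rows are passed as a list of lists as the documentation describes:

from azureml.dataprep.package import run_on_data

# Map the absolute path of the original .dsource file (placeholder path) to the
# incoming scoring rows (the 10 raw columns), represented as a list of lists
user_config = {'/azureml-share/MyProject/live_database.dsource': input_df.values.tolist()}

# Apply the same 10 -> 13 column transformation that was used at training time
prepared_df = run_on_data(user_config, 'Data preparation.dprep', dataflow_idx=0)
prediction = model.predict(prepared_df)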
In Azure Data Factory v2 I've created a number of pipelines. I noticed that for each pipeline I create, a source and a destination dataset are created.
According to the ADF documentation: a dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs.
These datasets are visible within my data factory. I'm curious why I would care about these? These almost seem like 'under the hood' objects ADF creates to move data around. What value are these to me and why would I care about them?
These datasets are entities that can be reused. For example, dataset A can be referenced by many pipelines if those pipelines need the same data (same table or same file).
Linked services can be reused too. I think that's why ADF has these concepts.
You may be seeing those show up in your Factory if you create pipelines via the Copy Wizard Tool. That will create Datasets for your Source & Sink. The Copy Activity is the primary consumer of Datasets in ADF Pipelines.
If you are using ADFv2 to transform data, no dataset is required. But if you are using the ADF copy activity to copy data, a dataset is used to let ADF know the path and name of the object to copy from/to. Once you have a dataset created, it can be used in many pipelines. Could you please help me understand why creating a dataset is a point of friction for you in your projects?
I have a dynamic fleet of devices self-registering with IoT Hub and feeding data into Azure Stream Analytics - each device has a uniquely generated ID. I would like to be able to randomly pick 10 of them and output this filtered dataset to Power BI for visualisation purposes. I'm using streaming datasets.
How do I go about constructing this subset...? WHERE deviceId LIKE isn't the right approach since the device ID is uniquely generated.
Thanks!
The easiest thing would be to use Stream Analytics and keep the list of devices you want to output as reference data somewhere, so you can augment the stream with it.
You could then flag that data from the reference set and use a second Stream Analytics output with a where clause on it.
What benefit will this activity have though? Maybe something like an average of all devices would be better? I don't know what the business driver is :-)
RAND is not directly supported in the ASA Query Language, but it can be implemented using a JavaScript UDF (user-defined function).
However, we don't recommend using a random generator in ASA, since it affects repeatability in recovery scenarios.
Anthony's suggestion to use reference data or an aggregate function may be the best option.
Thanks!
JS (Azure Stream Analytics team)