Hazelcast Jet 0.6.1 - DAG Definition

Hazelcast Jet prints the DAG definition on the console once it starts; this is the result of converting the Pipeline definition into a DAG.
Here is a Pipeline definition:
private Pipeline buildPipeline() {
    Pipeline p = Pipeline.create();
    p.drawFrom(Sources.<String, Record>remoteMapJournal("record", getClientConfig(), START_FROM_OLDEST))
     .addTimestamps((v) -> getTimeStamp(v), 3000)
     .peek()
     .groupingKey((v) -> Tuple2.tuple2(getUserID(v), getTranType(v)))
     .window(WindowDefinition.sliding(SLIDING_WINDOW_LENGTH_MILLIS, SLIDE_STEP_MILLIS))
     .aggregate(counting())
     .map((v) -> getMapKey(v))
     .drainTo(Sinks.remoteMap("Test", getClientConfig()));
    return p;
}
and here is the DAG definition printed on the console:
.vertex("remoteMapJournalSource(record)").localParallelism(1)
.vertex("sliding-window-step1").localParallelism(4)
.vertex("sliding-window-step2").localParallelism(4)
.vertex("map").localParallelism(4)
.vertex("remoteMapSink(Test)").localParallelism(1)
.edge(between("remoteMapJournalSource(record)", "sliding-window-step1").partitioned(?))
.edge(between("sliding-window-step1", "sliding-window-step2").partitioned(?).distributed())
.edge(between("sliding-window-step2", "map"))
.edge(between("map", "remoteMapSink(Test)"))
Is there any way to get the DAG definition with all the details, like the sliding window settings, aggregation APIs, etc.?

No, it's technically impossible. If you write a lambda (for example for a key extractor), there's no way to display the code that defined the lambda. The only way for you to get more information is to embed that information into the vertex name.
In Jet 0.7, this printout will be changed to the GraphViz format so that you can copy-paste it into a tool and see the DAG as an image.

Related

How to create and read createOrReplaceGlobalTempView when using static clusters

In my deployment.yaml file I have defined a static cluster as such:
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "11.2.x-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "Standard_DS3_v2"
I use this for all of my tasks. In one of the tasks, I save a DataFrame using:
transactions.createOrReplaceGlobalTempView("transactions")
And in another task (which depends on the previous task), I try to read the temporary view as such:
global_temp_db = session.conf.get("spark.sql.globalTempDatabase")
# Load wallet features
transactions = session.sql(f"""SELECT *
FROM """ + global_temp_db + """.transactions""")
But I get the error:
AnalysisException: Table or view not found: global_temp.transactions; line 2 pos 43;
'Project [*]
+- 'UnresolvedRelation [global_temp, transactions], [], false
Both tasks run within the same SparkSession, so why can it not find my global temp view?
Unfortunately this won't work unless you're using a cluster-reuse feature; otherwise you get a new cluster each time, so you won't be able to cross-reference this view.
A more pythonic approach would be to add the code that initializes the view in every task, e.g. if you're using the pre-defined Task class:
class TaskWithPreInitializedView(Task):
    def _add_transactions_view(self):
        transactions = ...  # some code to define the view
        transactions.createOrReplaceGlobalTempView(...)

    def launch(self):
        self._add_transactions_view()


class RealTask(TaskWithPreInitializedView):
    def launch(self):
        super().launch()  # creates the view before the rest of the task runs
        ...  # your code
Since view creation is a cheap operation that doesn't take much time, this is quite an efficient approach.
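As a usage sketch (ReadTransactionsTask is a made-up name, and self.spark stands in for however your Task class exposes the SparkSession; the question uses session), the downstream task would then look roughly like this:
class ReadTransactionsTask(TaskWithPreInitializedView):
    def launch(self):
        super().launch()  # (re)creates the global temp view on this task's cluster
        global_temp_db = self.spark.conf.get("spark.sql.globalTempDatabase")
        # Read back the view that was just registered
        transactions = self.spark.sql(f"SELECT * FROM {global_temp_db}.transactions")
        ...  # rest of the task logic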

Chainlink node CRON job. How does it get paid?

These are the docs:
https://docs.chain.link/docs/jobs/types/cron/
type = "cron"
schemaVersion = 1
schedule = "CRON_TZ=UTC * */20 * * * *"
externalJobID = "0EEC7E1D-D0D2-476C-A1A8-72DFB6633F01"
observationSource = """
fetch [type="http" method=GET url="https://chain.link/ETH-USD"]
parse [type="jsonparse" path="data,price"]
multiply [type="multiply" times=100]
fetch -> parse -> multiply
"""
But what I am wondering is how the job connects to the Oracle contract. How does it connect with the user contract to get paid? Where and how do we send the data once the job completes at the specified increment?
Does the job start when the job is posted on the node side, or does it start the clock once a user contract calls it?
Any help would be much appreciated. I'm trying to run through the types of jobs to familiarize myself with the capabilities of a Chainlink node.
A cron job executes a job based on a cron-defined schedule. This means it's triggered by a condition that the Chainlink node evaluates, not externally via a smart contract. Because of this, the node isn't paid in LINK tokens for processing the request the way it is for API calls, since it initiates the request itself rather than receiving a request (and payment) from on-chain. A node can't initiate a job on its own and then expect to receive payment from a consuming contract; if you require such functionality, you can try to have some logic in the called function that withdraws some LINK. But be careful about who can call that function.
If you want to send data on-chain to a smart contract once the job is completed, you need to manually define an ethtx task at the end of the cron job.
Here's an extended version of your job that sends the result back to a function called someFunction on the contract deployed at address 0xa36085F69e2889c224210F603D836748e7dC0088. The data can be in any format, as long as the ABI of the function matches what's being encoded in the job, i.e. if the function expects a bytes param, you need to ensure you're encoding a bytes param; if it expects a uint param, you need to encode a uint param. In this example, a bytes parameter is used.
type = "cron"
schemaVersion = 1
name = "GET > bytes32 (cron)"
schedule = "CRON_TZ=UTC #every 1m"
observationSource = """
fetch [type="http" method=GET url="https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=USD"]
parse [type="jsonparse" path="USD"]
multiply [type="multiply" times=100]
encode_response [type="ethabiencode"
abi="(uint256 data)"
data="{\\"data\\": $(multiply) }"]
encode_tx [type="ethabiencode"
abi="someFunction(bytes32 data)"
data="{ \\"data\\": $(encode_response) }"]
submit_tx [type="ethtx"
to="0x6495C9684Cc5702522A87adFd29517857FC99f45"
data="$(encode_tx)"]
fetch -> parse -> multiply -> encode_response -> encode_tx -> submit_tx
"""
And here's the consuming contract for the cron job above:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

contract Cron {
    bytes32 public currentPrice;

    function someFunction(bytes32 _price) public {
        currentPrice = _price;
    }
}
To answer your other question: the job is active as soon as it's created on the node, and it will start evaluating the triggering conditions for activating a run based on the schedule defined.

Azure Stream Analytics: ML Service function call in cloud job results in no output events

I've got a problem with an Azure Stream Analytics (ASA) job that should call an Azure ML Service function to score the provided input data.
The query was developed and tested in Visual Studio (VS) 2019 with the "Azure Data Lake and Stream Analytics Tools" extension.
As input the job uses an Azure IoT Hub, and as output the VS local output for testing purposes (and later Blob storage).
Within this environment everything works fine; the call to the ML Service function is successful and it returns the desired response.
Using the same query, user-defined functions, and aggregates as in VS, the cloud job generates no output events (with neither Blob storage nor Power BI as output).
In the ML web service it can be seen that ASA successfully calls the function, but somehow it does not return any response data.
Deleting the ML function call from the query results in a successful run of the job with output events.
For the deployment of the ML web service I tried the following (working for VS, no output in the cloud):
ACI (1 CPU, 1 GB RAM)
AKS dev/test (Standard_B2s VM)
AKS production (Standard_D3_v2 VM)
The inference script function schema:
input: array
output: record
Inference script input schema looks like:
@input_schema('data', NumpyParameterType(input_sample, enforce_shape=False))
@output_schema(NumpyParameterType(output_sample))  # other parameter type for record caused error in ASA
def run(data):
    response = {'score1': 0,
                'score2': 0,
                'score3': 0,
                'score4': 0,
                'score5': 0,
                'highest_score': None}
And the return value:
return [response]
The ASA job subquery with ML function call:
with raw_scores as (
    select
        time, udf.HMMscore(udf.numpyfySeq(Sequence)) as score
    from Sequence
)
and the UDF "numpyfySeq" like:
// creates a N x 18 size array
function numpyfySeq(Sequence) {
'use strict';
var transpose = m => m[0].map((x, i) => m.map(x => x[i]));
var array = [];
for (var feature in Sequence) {
if (feature != "time") {
array.push(Sequence[feature])
}
}
return transpose(array);
}
"Sequence" is a subquery that aggregates the data into sequences (arrays) with an user-defined aggregate.
In VS the data comes from the IoT-Hub (cloud input selected).
The "function signature" is recognized correctly in the portal as seen in the image: Function signature
I hope the provided information is sufficient and you can help me.
Edit:
The authentication for the Azure ML webservice is key-based.
In ASA, when selecting to use an "Azure ML Service" function, it will automatically detect and use the keys from the deployed ML model within the subscription and ML workspace.
Deployment code used (in this example for ACI, but looks nearly the same for AKS deployment):
from azureml.core import Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

env = Environment(name='scoring_env')
deps = CondaDependencies(conda_dependencies_file_path='./deps')
env.python.conda_dependencies = deps

inference_config = InferenceConfig(source_directory='./prediction/',
                                   entry_script='score.py',
                                   environment=env)
deployment_config = AciWebservice.deploy_configuration(auth_enabled=True, cpu_cores=1,
                                                       memory_gb=1)

model = Model(ws, 'HMM')
service = Model.deploy(ws, 'hmm-scoring', [model],
                       inference_config,
                       deployment_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)
with conda_dependencies:
name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.7.5
  - pip:
    - sklearn
    - azureml-core
    - azureml-defaults
    - inference-schema[numpy-support]
    - hmmlearn
    - numpy
  - pip
channels:
  - anaconda
  - conda-forge
The code used in the score.py is just a regular score operation with the loaded models and formatting like so:
score1 = model1.score(data)
score2 = model2.score(data)
score3 = model3.score(data)
# Same scoring with model4 and model5
# scaling of the scores to a defined interval and determination of model that delivered highest score
response['score1'] = score1
response['score2'] = score2
# and so on

How to export metrics from a containerized component in kubeflow pipelines 0.2.5

I have a pipeline made up out of 3 containerized components. In the last component I write the metrics I want to a file named /mlpipeline-metrics.json, just like it's explained here.
This is the Python code I used.
metrics = {
    'metrics': [
        {
            'name': 'accuracy',
            'numberValue': accuracy,
            'format': 'PERCENTAGE',
        },
        {
            'name': 'average-f1-score',
            'numberValue': average_f1_score,
            'format': 'PERCENTAGE'
        },
    ]
}

with open('/mlpipeline-metrics.json', 'w') as f:
    json.dump(metrics, f)
I also tried writing the file with the following code, just like in the example linked above.
with file_io.FileIO('/mlpipeline-metrics.json', 'w') as f:
    json.dump(metrics, f)
The pipeline runs just fine without any errors. But it won't show the metrics in the front-end UI.
I'm thinking it has something to do with the following codeblock.
def metric_op(accuracy, f1_scores):
    return dsl.ContainerOp(
        name='visualize_metrics',
        image='gcr.io/mgcp-1190085-asml-lpd-dev/kfp/jonas/container_tests/image_metric_comp',
        arguments=[
            '--accuracy', accuracy,
            '--f1_scores', f1_scores,
        ]
    )
This is the code I use to create a ContainerOp from the containerized component. Notice I have not specified any file_outputs.
In other ContainerOps I have to specify file_outputs to be able to pass variables to the next steps in the pipeline. Should I do something similar here to map /mlpipeline-metrics.json onto something so that Kubeflow Pipelines detects it?
I'm using a managed AI platform pipelines deployment running Kubeflow Pipelines 0.2.5 with Python 3.6.8.
Any help is appreciated.
So after some trial and error I finally came to a solution, and I'm happy to say that my intuition was right: it did have something to do with the file_outputs I didn't specify.
To be able to export your metrics you will have to set file_outputs as follows.
def metric_op(accuracy, f1_scores):
    return dsl.ContainerOp(
        name='visualize_metrics',
        image='gcr.io/mgcp-1190085-asml-lpd-dev/kfp/jonas/container_tests/image_metric_comp',
        arguments=[
            '--accuracy', accuracy,
            '--f1_scores', f1_scores,
        ],
        file_outputs={
            'mlpipeline-metrics': '/mlpipeline-metrics.json'
        }
    )
Here is another way of showing metrics when you use the Python-function-based method:
# Define your component's code as a standalone python function: ======================
from typing import NamedTuple

def add(a: float, b: float) -> NamedTuple(
    'AddOutput',
    [
        ('sum', float),
        ('mlpipeline_metrics', 'Metrics')
    ]
):
    '''Calculates the sum of two arguments'''
    sum = a + b

    metrics = {
        'metrics': [  # the top-level key must be 'metrics' for the UI to pick it up
            {
                'name': 'sum',
                'numberValue': float(sum),
            }
        ]
    }

    print("Add Result: ", sum)  # this will be printed in the 'main-logs' of each task

    from collections import namedtuple
    addOutput = namedtuple(
        'AddOutput',
        ['sum', 'mlpipeline_metrics'])

    return addOutput(sum, metrics)  # the metrics will be uploaded to the cloud
Note: I am just using a basic function here, not your function.
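For completeness, here is a minimal sketch of wiring such a function into a pipeline. It assumes the KFP SDK's func_to_container_op helper and made-up pipeline/component names, so treat it as an outline rather than a drop-in:
import kfp
from kfp import dsl
from kfp.components import func_to_container_op

# Turn the standalone python function into a pipeline component
add_op = func_to_container_op(add)

@dsl.pipeline(name='add-pipeline', description='Adds two numbers and reports the sum as a metric')
def add_pipeline(a: float = 1.0, b: float = 2.0):
    add_op(a, b)  # the mlpipeline_metrics output is picked up by the UI automatically

# Compile to a package that can be uploaded or submitted to Kubeflow Pipelines
kfp.compiler.Compiler().compile(add_pipeline, 'add_pipeline.zip')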

Revit Python Wrapper

I'm getting into Revit Python Wrapper / RevitPythonShell and am having trouble with a very simple task.
I have one wall in my project and I'm just trying to change the Top Offset from 0'-0" to 4'-0". I've been able to change the Comments in the properties, but that's about it.
Here's my code:
import rpw
from rpw import revit, db, ui, DB, UI

element = db.Element.from_int(352690)

with db.Transaction('Change height'):
    element.parameters['Top Offset'].value = 10
Here's my error:
[ERROR] Error in Transaction Context: has rolled back.
Exception : System.Exception: Parameter is Read Only: Top Offset
at Microsoft.Scripting.Interpreter.ThrowInstruction.Run(InterpretedFrame frame)
at Microsoft.Scripting.Interpreter.Interpreter.HandleException(InterpretedFrame frame, Exception exception)
at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)
at Microsoft.Scripting.Interpreter.LightLambda.Run2[T0,T1,TRet](T0 arg0, T1 arg1)
at IronPython.Compiler.PythonScriptCode.RunWorker(CodeContext ctx)
at Microsoft.Scripting.Hosting.ScriptSource.Execute(ScriptScope scope)
at Microsoft.Scripting.Hosting.ScriptSource.ExecuteAndWrap(ScriptScope scope, ObjectHandle& exception)
Any and all help is appreciated. I've read the docs, however they don't seem to cover read-only items.
I'm in Revit 2019. RPS is using Python 2.7.7.
I think this is a "Revit Python Wrapper" (RPW) question more than a "RevitPythonShell" (RPS) one; I'm familiar with the way transactions are handled in RPS, but the documentation for RPW seems quite different.
This is what your code would look like in RevitPythonShell:
import clr
clr.AddReference('RevitAPI')
clr.AddReference('RevitAPIUI')
from Autodesk.Revit.DB import *
from Autodesk.Revit.UI import *

app = __revit__.Application
doc = __revit__.ActiveUIDocument.Document
ui = __revit__.ActiveUIDocument

element = doc.GetElement(ElementId(352690))

t = Transaction(doc, 'Change Height')
t.Start()
parameter = element.GetParameters('Top Offset')[0]
parameter.Set(10)
t.Commit()
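As a small variant (not from the original answer): if looking the parameter up by its display name ever proves fragile, for example on localized Revit installs, you can fetch it via its BuiltInParameter instead. WALL_TOP_OFFSET is assumed to be the right built-in id for a wall's Top Offset, so verify it (e.g. with RevitLookup) before relying on it:
t = Transaction(doc, 'Change Height')
t.Start()
# Look the parameter up by its built-in id instead of its display name (assumed: WALL_TOP_OFFSET)
parameter = element.get_Parameter(BuiltInParameter.WALL_TOP_OFFSET)
# Length parameters take Revit's internal units (decimal feet), so 4.0 here means 4'-0"
parameter.Set(4.0)
t.Commit()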
