How to test read and write performance in Caliper - hyperledger-fabric

My smart contract includes the following functions:
three functions have query ability: function1, function2, function3
three functions have update ability: function4, function5, function6
How can I test query ability and update ability (throughput) overall?
I tried invoking all three query functions in submitTransaction(), but it ends up calling function1, function2, and function3 serially... something doesn't feel quite right.

Related

Can we set task wise parameters using Databricks Jobs API "run-now"

I have a job with multiple tasks like Task1 -> Task2. I am trying to call the job using the API "run-now". Task details are below:
Task1 - It executes a notebook with some input parameters
Task2 - It executes a notebook with some input parameters
So, how can I provide parameters to the job API using the "run-now" command for Task1 and Task2?
I have a parameter "lib" which needs to have the values 'pandas' and 'spark' task-wise.
I know that we can give unique parameter names like Task1_lib, Task2_lib and read them that way.
current way:
json = {"job_id" : 3234234, "notebook_params":{Task1_lib: a, Task2_lib: b}}
Is there a way to send task wise parameters?
It's not supported right now - parameters are defined at the job level. You can ask your Databricks representative (if you have one) to communicate this ask to the product team that works on Databricks Workflows.
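As a rough sketch of that prefixed-parameter workaround (the workspace URL, token, and job ID below are placeholders), the run-now call carries both prefixed parameters, and each notebook reads only its own prefix via dbutils.widgets:

import requests

# Placeholder workspace URL, token, and job ID -- replace with your own.
resp = requests.post(
    "https://<databricks-instance>/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "job_id": 3234234,
        "notebook_params": {"Task1_lib": "pandas", "Task2_lib": "spark"},
    },
)
resp.raise_for_status()
print(resp.json()["run_id"])

# Inside the Task1 notebook:
# lib = dbutils.widgets.get("Task1_lib")   # -> "pandas"
# Inside the Task2 notebook:
# lib = dbutils.widgets.get("Task2_lib")   # -> "spark"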

Python boto3 step function list executions filter

I am trying to retrieve a specific step function execution input in the past using the list_executions and describe_execution functions in boto3, first to retrieve all the calls and then to get the execution input (I can't use describe_execution directly as I do not know the full state machine ARN). However, list_executions does not accept a filter argument (such as "name"), so there is no way to return partial matches, but rather it returns all (successful) executions.
The solution for now has been to list all the executions and then loop over the list and select the right one. The issue is that this function can return a max 1000 newest records (as per the documentation), which will soon be an issue as there will be more than 1000 executions and I will need to get old executions.
Is there a way to specify a filter in the list_executions/describe_execution functions to retrieve executions partially filtered, for example by a name prefix?
import boto3

sf = boto3.client("stepfunctions").list_executions(
    stateMachineArn="arn:aws:states:something-something",
    statusFilter="SUCCEEDED",
    maxResults=1000
)
You are right that the SFN APIs like ListExecutions do not expose other filtering options. Nonetheless, here are two ideas to make your task of searching execution inputs easier:
Use the ListExecutions Paginator to help with looping through the response items (see the sketch below).
If you know in advance which inputs are of interest, add a Task to the State Machine to persist execution inputs and ARNs to, say, a DynamoDB table, in a manner that makes subsequent searches easier.
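A minimal sketch of the paginator approach (the state machine ARN and name prefix are placeholders): page through every execution, match on a name prefix, and only then call describe_execution for the matches:

import boto3

sfn = boto3.client("stepfunctions")
paginator = sfn.get_paginator("list_executions")

# Hypothetical ARN and prefix -- substitute your own values.
pages = paginator.paginate(
    stateMachineArn="arn:aws:states:something-something",
    statusFilter="SUCCEEDED",
)

matches = [
    ex for page in pages
    for ex in page["executions"]
    if ex["name"].startswith("my-prefix-")
]

for ex in matches:
    details = sfn.describe_execution(executionArn=ex["executionArn"])
    print(ex["name"], details["input"])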

How to access output folder from a PythonScriptStep?

I'm new to azure-ml and have been tasked with writing some integration tests for a couple of pipeline steps. I have prepared some input test data and some expected output data, which I store on a 'test_datastore'. The following example code is a simplified version of what I want to do:
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config('blabla/config.json')
ds = Datastore.get(ws, datastore_name='test_datastore')

main_ref = DataReference(datastore=ds,
                         data_reference_name='main_ref'
                         )
data_ref = DataReference(datastore=ds,
                         data_reference_name='main_ref',
                         path_on_datastore='/data'
                         )
data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=['--main_path', main_ref,
               '--data_ref_folder', data_ref
               ],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
I would like:
my data_prep_step to run,
have it store some data on the path of my data_ref, and
I would then like to access this stored data afterwards outside of the pipeline
But, I can't find a useful function in the documentation. Any guidance would be much appreciated.
two big ideas here -- let's start with the main one.
main ask
With an Azure ML Pipeline, how can I access the output data of a PythonScriptStep outside of the context of the pipeline?
short answer
Consider using OutputFileDatasetConfig (docs example), instead of DataReference.
To your example above, I would just change your last two definitions.
from azureml.data import OutputFileDatasetConfig

data_ref = OutputFileDatasetConfig(
    name='data_ref',
    destination=(ds, '/data')
).as_upload()

data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=[
        '--main_path', main_ref,
        '--data_ref_folder', data_ref
    ],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
some notes:
be sure to check out how DataPaths work. Can be tricky at first glance.
set overwrite=False in the .as_upload() method if you don't want future runs to overwrite the first run's data.
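And to the "access it afterwards" part of the question: once the run finishes, the uploaded files live at /data on test_datastore, so one way to read them back outside the pipeline (a sketch, reusing the workspace and datastore names from above) is a FileDataset:

from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config('blabla/config.json')
ds = Datastore.get(ws, datastore_name='test_datastore')

# Point a FileDataset at the destination folder the step uploaded to.
output_ds = Dataset.File.from_files(path=(ds, '/data'))

# Download locally, e.g. to compare against the expected output data.
local_paths = output_ds.download(target_path='./pipeline_output', overwrite=True)
print(local_paths)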
more context
PipelineData used to be the de facto object for passing data ephemerally between pipeline steps. The idea was to make it easy to:
stitch steps together
get the data after the pipeline runs if need be (datastore/azureml/{run_id}/data_ref)
The downside was that you had no control over where the pipeline saved the data. If you wanted the data for more than just a baton passed between steps, you could add a DataTransferStep to land the PipelineData wherever you please after the PythonScriptStep finishes.
This downside is what motivated OutputFileDatasetConfig.
auxiliary ask
how might I programmatically test the functionality of my Azure ML pipeline?
there are not enough people talking about data pipeline testing, IMHO.
There are three areas of data pipeline testing:
unit testing (does the code in the step work?)
integration testing (does the code work when submitted to the Azure ML service?)
data expectation testing (does the data coming out of the step meet my expectations?)
For #1, I think it should be done outside of the pipeline, perhaps as part of a package of helper functions.
For #2, why not just see if the whole pipeline completes? I think you get more information that way. That's how we run our CI.
#3 is the juiciest, and we do this in our pipelines with the Great Expectations (GE) Python library. The GE community calls these "expectation tests". To me you have two options for including expectation tests in your Azure ML pipeline:
within the PythonScriptStep itself, i.e.
run whatever code you have
test the outputs with GE before writing them out; or,
for each functional PythonScriptStep, hang a downstream PythonScriptStep off of it in which you run your expectations against the output data.
Our team does #1, but either strategy should work. What's great about this approach is that you can run your expectation tests by just running your pipeline (which also makes integration testing easy).
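As a rough illustration of the first option (the column names and expectations here are invented), the check can run inside the step script just before the outputs are written:

import great_expectations as ge
import pandas as pd

def validate_before_write(df: pd.DataFrame) -> None:
    # Wrap the step's output dataframe so expectation methods are available.
    gdf = ge.from_pandas(df)

    # Hypothetical expectations -- adapt to your own schema.
    checks = [
        gdf.expect_column_values_to_not_be_null("id"),
        gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000),
    ]

    failed = [c for c in checks if not c.success]
    if failed:
        raise ValueError(f"Expectation tests failed: {failed}")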

How to call cloud function inside another cloud function and pass some input parameters or argument using python?

I have 3 cloud functions in total. The 1st cloud function has some conditions; if a condition is true, it should trigger the 2nd and 3rd cloud functions and pass the required arguments or parameters to them when it triggers them.
I have tried executing all three separately, but I need help with the scenario above.
Cloud function-1 code in python:
from google.cloud import bigquery

def main(request):
    dest_table_name = 'my_dest_table'
    myquery = "select count(*) size from `myproject.mydataset.mytable`"
    client = bigquery.Client()
    job = client.query(myquery)
    results = job.result()
    for row in results:
        print("Total rows available: ", row.size)
        if row.size != 0:
            # pass "dest_table_name" to 2nd cloud function and execute it.
            # pass "dest_table_name" to 3rd cloud function and execute it.
            pass
        else:
            print("query result is empty")
Your options are:
call the function via an HTTP trigger, e.g. with requests.get(<your function URL>), and pass parameters as URL parameters (see the first sketch below);
create a PubSub topic and have your functions produce/consume PubSub messages; or
just factor all your shared functions into the same file and call them as regular Python functions.
The latter will have significantly less overhead, but may contribute to a longer overall function runtime.
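A minimal sketch of the first option, assuming hypothetical function URLs and that functions #2 and #3 are deployed as publicly invocable HTTP-triggered functions:

import requests

def trigger_downstream(dest_table_name):
    # Hypothetical URLs of the 2nd and 3rd HTTP-triggered functions.
    urls = [
        "https://REGION-PROJECT.cloudfunctions.net/function-2",
        "https://REGION-PROJECT.cloudfunctions.net/function-3",
    ]
    for url in urls:
        requests.get(url, params={"dest_table_name": dest_table_name}, timeout=60)

# In function-2 / function-3, read the parameter from the request:
# def main(request):
#     dest_table_name = request.args.get("dest_table_name")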
With your details and requirements, I recommend that you use PubSub and do the following:
Create a PubSub topic
Deploy function #2 with a trigger on PubSub events from the previously created topic
Deploy function #3 with a trigger on PubSub events from the previously created topic
Function #1 creates a PubSub message with the right parameters and publishes it
Functions #2 and #3 are triggered in parallel by the message published to PubSub. They extract the parameters from it and do their processing.
This way, the functions are decoupled, scalable, and run in parallel.
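A rough sketch of that setup (the project and topic names are made up): function #1 publishes the parameters, while functions #2 and #3 are deployed with a Pub/Sub trigger on the same topic and unpack the message:

import base64
import json
from google.cloud import pubsub_v1

# Function #1: publish the parameters instead of calling the other functions directly.
def publish_params(dest_table_name):
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "dest-table-topic")  # hypothetical names
    payload = json.dumps({"dest_table_name": dest_table_name}).encode("utf-8")
    publisher.publish(topic_path, data=payload).result()

# Functions #2 and #3: background functions triggered by that topic.
def main(event, context):
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    dest_table_name = message["dest_table_name"]
    print("Received destination table:", dest_table_name)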

How to implement SUM with @QuerySqlFunction?

The examples seen so far that cover @QuerySqlFunction are trivial. I put one below. However, I'm looking for an example / solution / hint for providing a cross-row calculation, e.g. average, sum, ... Is this possible?
In the example, the function returns value 0 from an array, basically an implementation of ARRAY_GET(x, 0). All other examples I've seen are similar: 1 row, get a value, do something with it. But I need to be able to calculate the sum of a grouped result, or possibly a lot more business logic. If somebody could provide me with the QuerySqlFunction for SUM, I assume it would allow me to do much more than just SUM.
Step 1: Write a function
public class MyIgniteFunctions {
    @QuerySqlFunction
    public static double value1(double[] values) {
        return values[0];
    }
}
Step 2: Register the function
CacheConfiguration<Long, MyFact> factResultCacheCfg = ...
factResultCacheCfg.setSqlFunctionClasses(new Class[] { MyIgniteFunctions.class });
Step 3: Use it in a query
SELECT
    MyDimension.groupBy1,
    MyDimension.groupBy2,
    SUM(VALUE1(MyFact.values))
FROM
    "dimensionCacheName".DimDimension,
    "factCacheName".FactResult
WHERE
    MyDimension.uid = MyFact.dimensionUid
GROUP BY
    MyDimension.groupBy1,
    MyDimension.groupBy2
I don't believe Ignite currently has clean API support for custom user-defined QuerySqlFunction that spans multiple rows.
If you need something like this, I would suggest that you make use of IgniteCompute APIs and distribute your computations, lambdas, or closures to the participating Ignite nodes. Then from inside of your closure, you can either execute local SQL queries, or perform any other cache operations, including predicate-based scans over locally cached data.
This approach will be executed across multiple Ignite nodes in parallel and should perform well.
