How to implement split_part() in an Azure Data Factory expression

I'm trying to implement a join in an Azure Data Factory pipeline. The join condition needs split_part(), which I believe is a function from Snowflake or other DBMSs.
I can see there are two built-in functions in Azure Data Factory, split() and regexSplit().
However, neither returns a specific element after the split.
For example, my join condition value should be split_part(Stack_Overflow, '_', 2), which should return flow as the join value.
Does anyone know how I can implement this with the built-in Azure functions?
Thanks

The code should be:
split('Stack_Overflow', '_')[2]
Note that array indexing in the Data Factory expression language starts at 1, not 0.
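For instance, if the incoming columns were named key_full on one stream and flow_key on the other (hypothetical names, not from the question), a custom join condition in a mapping data flow could then be sketched as:
split(key_full, '_')[2] == flow_key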
Reference
https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-expression-builder

Related

Azure Pipeline Trigger Parameters

I have two YAML pipelines, A and B, where A triggers B. They both have the same parameter P. Pipeline B has a resource trigger set (roughly as sketched below) and runs after pipeline A finishes; this works. However, it seems that pipeline B is not run with the same value of parameter P as pipeline A; B always uses the default (first) value.
I have tried to find a solution for passing parameters from A to B, without success. I found a similar older (2020) question, where it is stated that this is not possible.
Is this something that cannot be done (using resource triggers), or am I missing something?
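A minimal sketch of the resource trigger side in pipeline B (the pipeline name and alias are illustrative):
resources:
  pipelines:
    - pipeline: pipelineA   # alias used inside pipeline B
      source: A             # name of the triggering pipeline definition
      trigger: true         # run B whenever A completes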
As per the SO thread answer given by MSFT, you can't pass different parameter values through pipeline resource triggers.
You can follow the workaround provided in the same answer, which uses DevOps counters:
variables:
  internalVersion: 1
  semanticVersion: $[counter(variables['internalVersion'], 1)]
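A minimal sketch of consuming that variable in a step (the step itself is illustrative):
steps:
  - script: echo "Computed version: $(semanticVersion)"
    displayName: Use the counter-based version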
For more information on DevOps counters, check this document.
You can even raise a feature request.

How to prevent the next step when no results from "Get Entities" in Logic Apps

Get Entities (from Azure Table Storage) is the first action in my Logic App. Based on its results, the rest of the steps proceed. Could anyone let me know how to stop the execution of the following actions when there are no results from Get Entities?
Add a Condition in your Azure Logic App to determine whether the number of entities returned by Get Entities is greater than 0.
You can use length to get the number of results of Get Entities.
If it is greater than 0, execute the original workflow in the True branch; if it is equal to 0, do nothing.
The expression for the condition is:
length(body('Get_entities')?['value'])
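Put together, the full comparison in the Condition (using the action name from the question) would be roughly:
greater(length(body('Get_entities')?['value']), 0)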

OData query for all rows from the last 10 minutes

I need to filter rows from an Azure Table Store that are less than 10 minutes old. I'm using an Azure Function App integration to query the table, so a coded solution is not viable in this case.
I'm aware of the datetime type, but for this I have to specify an explicit datetime, for example -
Timestamp gt datetime'2018-07-10T12:00:00.1234567Z'
However, this is insufficient as I need the query to run on a timer every 10 minutes.
According to the OData docs, there are built-in functions such as totaloffsetminutes() and now(), but using these causes the function to fail:
[Error] Exception while executing function: Functions.FailedEventsCount. Microsoft.WindowsAzure.Storage: The remote server returned an error: (400) Bad Request.
Is there a way to query a Table Store dynamically in this way?
Turns out that this was easier than expected.
I added the following query filter to the Azure Table Store input integration -
Timestamp gt datetime'{afterDateTime}'
In conjunction with a parameter in the Function trigger route, and Bob's your uncle -
FailedEventsCount/after/{afterDateTime}
I appreciate that for other use cases it may not be viable to pass in the datetime, but for me that is perfectly acceptable.
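As a rough illustration (binding names, the table name, and the connection setting are assumptions, not from the original), the trigger route and the parameterised table filter can sit together in function.json like this:
{
  "bindings": [
    {
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [ "get" ],
      "route": "FailedEventsCount/after/{afterDateTime}"
    },
    {
      "type": "table",
      "direction": "in",
      "name": "failedEvents",
      "tableName": "FailedEvents",
      "filter": "Timestamp gt datetime'{afterDateTime}'",
      "connection": "AzureWebJobsStorage"
    }
  ]
}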

How to use the evaluate operator with the app or workspace scope function in Azure Log Analytics?

I've been looking at the evaluate operator when doing queries using Azure Log Analytics, in particular with the autocluster plugin (but I seem to have the same problem even with preview and diffpatterns).
If I have a query accessing the resource directly (including all tables or just one), it works fine. But if I do the same query across several apps or workspaces, I get an error message:
One or more pattern references were not declared. Detected pattern references: Support
The use of the app() or workspace() scope functions seems to be the problem, not the union used to query across several resources.
This doesn't work:
workspace("vmPROD").Perf
| evaluate autocluster()
Neither does this:
app("someService").traces
| evaluate autocluster()
This works:
Perf
| evaluate autocluster()
The problem is that I want to evaluate across resources. At first I thought it might be a scope-function limitation, but table(), which is also a scope function, works.
This works:
table("Perf")
| evaluate autocluster()
How can I work around this limitation? Is this a bug? There is nothing in the documentation that mentions this limitation.
After trying different ways to solve this, I came across the materialize() function. This function allows you to cache the result of a subquery, and it seems I can use the machine learning functions against the cached result when using app() or workspace() to reference the resource. This also works when doing joins, which is what I wanted to do across resources. There are two main limitations to keep in mind: you can cache at most a 5 GB result, and you have to use the let statement.
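For the single-workspace case from the question, a minimal sketch of the workaround looks like this (workspace name as in the question):
let cachedPerf = materialize(workspace("vmPROD").Perf);
cachedPerf
| evaluate autocluster()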
Here is an example with a join:
let joinResult = union app('Konstrukt.SL.CalculationEngine').requests, app('Konstrukt.SL.AggregationEngine').requests;
let cachedJoinResult = materialize(joinResult);
cachedJoinResult
| where success == false
| project session_Id, user_Id, appName, operation_Id, itemCount
| evaluate autocluster();

How to implement SUM with @QuerySqlFunction?

The examples seen so far that cover @QuerySqlFunction are trivial. I put one below. However, I'm looking for an example / solution / hint for providing a cross-row calculation, e.g. average, sum, ... Is this possible?
In the example, the function returns value 0 from an array, basically an implementation of ARRAY_GET(x, 0). All other examples I've seen are similar: 1 row, get a value, do something with it. But I need to be able to calculate the sum of a grouped result, or possibly a lot more business logic. If somebody could provide me with the QuerySqlFunction for SUM, I assume it would allow me to do much more than just SUM.
Step 1: Write a function
import org.apache.ignite.cache.query.annotations.QuerySqlFunction;

public class MyIgniteFunctions {
    // Returns the first element of the array, i.e. ARRAY_GET(x, 0).
    @QuerySqlFunction
    public static double value1(double[] values) {
        return values[0];
    }
}
Step 2: Register the function
CacheConfiguration<Long, MyFact> factResultCacheCfg = ...
factResultCacheCfg.setSqlFunctionClasses(new Class[] { MyIgniteFunctions.class });
Step 3: Use it in a query
SELECT
MyDimension.groupBy1,
MyDimension.groupBy2,
SUM(VALUE1(MyFact.values))
FROM
"dimensionCacheName".DimDimension,
"factCacheName".FactResult
WHERE
MyDimension.uid=MyFact.dimensionUid
GROUP BY
MyDimension.groupBy1,
MyDimension.groupBy2
I don't believe Ignite currently has clean API support for a custom user-defined QuerySqlFunction that spans multiple rows.
If you need something like this, I would suggest that you make use of IgniteCompute APIs and distribute your computations, lambdas, or closures to the participating Ignite nodes. Then from inside of your closure, you can either execute local SQL queries, or perform any other cache operations, including predicate-based scans over locally cached data.
This approach will be executed across multiple Ignite nodes in parallel and should perform well.
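A rough sketch of that compute-based approach, assuming a cache named "factCacheName" and a numeric amount field on FactResult (names are hypothetical, error handling omitted):
import java.util.Collection;
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DistributedSum {
    public static double sumAcrossNodes(Ignite ignite) {
        // Broadcast a closure to every node; each node runs a local-only SQL aggregate.
        Collection<Double> partials = ignite.compute().broadcast(() -> {
            IgniteCache<Object, Object> cache = Ignition.localIgnite().cache("factCacheName");
            List<List<?>> rows = cache.query(
                new SqlFieldsQuery("SELECT SUM(amount) FROM FactResult").setLocal(true)).getAll();
            Object partial = rows.get(0).get(0);
            return partial == null ? 0d : ((Number) partial).doubleValue();
        });
        // Combine the per-node partial sums on the caller.
        return partials.stream().mapToDouble(Double::doubleValue).sum();
    }
}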
