Get an incrementing number in Logic App Select - azure

I am using a Logic App to transform some data for an integration. I am trying to avoid using For Each loops as the amount of data I am working with is high, and these incur a cost for each action and iteration of the for each loop.
However the integration I am working with requires a unique incrementing number for each line. They don't have to be sequential, or even starting with 1 but the order should be kept the same.
So with the above, the first one would get LineNumber 1, the second LineNumber 2, etc.. (or like I said, it could be 67829, 67835, etc..)
I tried to set a variable with ticks(utcNow()) before the start of the mapping, and then use sub(ticks(utcNow()), variables('startTicks')) but this is evaluated once and the same number is applied to all.
My next thought is to use an azure function/inline javascript to go through afterward and assign them, but just wondering if there is a way to accomplish this in the select.

or like I said, it could be 67829, 67835, etc..
Answering to this requirement,
Inside the Select Option :
indexOf(string(variables('<DATA Variable>')),string(item()))
Explanation :
item() - current item (of all items) in the select - stringified the same & tried to find the same in stringified version of the entire data - the index number will be returned.
OUTPUT
Please note :
Did not get a chance to check on a very large dataset.
This may fail, if a specific row(all values in the row) repetitive in nature - I assume this may not
be your case (order number might unique )

Related

Python boto3 step function list executions filter

I am trying to retrieve a specific step function execution input in the past using the list_executions and describe_execution functions in boto3, first to retrieve all the calls and then to get the execution input (I can't use describe_execution directly as I do not know the full state machine ARN). However, list_executions does not accept a filter argument (such as "name"), so there is no way to return partial matches, but rather it returns all (successful) executions.
The solution for now has been to list all the executions and then loop over the list and select the right one. The issue is that this function can return a max 1000 newest records (as per the documentation), which will soon be an issue as there will be more than 1000 executions and I will need to get old executions.
Is there a way to specify a filter in the list_executions/describe_execution function to retrieve execution partially filtered, for ex. using prefix?
import boto3
sf=boto3.client("stepfunctions").list_executions(
stateMachineArn="arn:aws:states:something-something",
statusFilter="SUCCEEDED",
maxResults=1000
)
You are right that the SFN APIs like ListExecutions do not expose other filtering options. Nonetheless, here are two ideas to make your task of searching execution inputs easier:
Use the ListExecutions Paginator to help with looping through the response items.
If you know in advance which inputs are of interest, add a Task to the State Machine to persist execution inputs and ARNs to, say, a DynamoDB table, in a manner that makes subsequent searches easier.

How to use synchronous messages on rabbit queue?

I have a node.js function that needs to be executed for each order on my application. In this function my app gets an order number from a oracle database, process the order and then adds + 1 to that number on the database (needs to be the last thing on the function because order can fail and therefore the number will not be used).
If all recieved orders at time T are processed at the same time (asynchronously) then the same order number will be used for multiple orders and I don't want that.
So I used rabbit to try to remedy this situation since it was a queue. It seems that the processes finishes in the order they should, but a second process does NOT wait for the first one to finish (ack) to begin, so in the end I'm having the same problem of using the same order number multiple times.
Is there anyway I can configure my queue to process one message at a time? To only start process n+1 when process n has been acknowledged?
This would be a life saver to me!
If the problem is to avoid duplicate order numbers, then use an Oracle sequence, or use an identity column when you insert into a table to generate the order number:
CREATE TABLE mytab (
id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY(START WITH 1),
data VARCHAR2(20));
INSERT INTO mytab (data) VALUES ('abc');
INSERT INTO mytab (data) VALUES ('def');
SELECT * FROM mytab;
This will give:
ID DATA
---------- --------------------
1 abc
2 def
If the problem is that you want orders to be processed sequentially, then don't pull an order from the queue until the previous one is finished. This will limit your throughput, so you need to understand your requirements and make some architectural decisions.
Overall, it sounds Oracle Advanced Queuing would be a good fit. See the node-oracledb documentation on AQ.

Task server on ML

I have a query that may return up to 2000 documents.
Within these documents I need six pcdata items return as string values.
There is a possiblity, since the documents size range from small to very large,
exp tree cache error.
I am looking at spawn-function to break up my result set.
I will pass wildcard values, based on known "unique key structure", and will know the max number of results possible;each wildcard values will return 100 documents max.
Note: The pcdata for the unique key structure does have a range index on it.
Am I on the right track with below?
The task server will create three tasks.
The task server will allow multiple queries to run, but what stops them all running simultaneously and blowing out the exp tree cache?
i.e. What, if anything, forces one thread to wait for another? Or one task to wait for another so they all do not blow out the exp tree cache together?
xquery version "1.0-ml";
let $messages :=
(:each wildcard values will return 100 documents max:)
for $message in ("WILDCARDVAL1","WILDCARDVAL2", "WILDCARDVAL3")
let $_ := xdmp:log("Starting")
return
xdmp:spawn-function(function() {
let $_ := xdmp:sleep(5000)
let $_ := xdmp:log(concat("Searching on wildcard val=", $message))
return concat("100 pcdata items from the matched documents for ", $message) },
<options xmlns="xdmp:eval">
<result>true</result>
<transaction-mode>update-auto-commit</transaction-mode>
</options>)
return $messages
The Task Server configuration listed in the Admin UI defines the maximum number of simultaneous threads. If more tasks are spawned than there are threads, they are queued (FIFO I think, although ML9 has task priority options that modify that behavior), and the first queued task takes the next available thread.
The <result>true</result> option will force the spawning query to block until the tasks return. The tasks themselves are run independently and in parallel, and they don't wait on each other to finish. You may still run into problems with the expanded tree cache, but by splitting up the query into smaller ones, it could be less likely.
For a better understanding of why you are blowing out the cache, take a look at the functions xdmp:query-trace() and xdmp:query-meters(). Using the Task Server is more of a brute force solution, and you will probably get better results by optimizing your queries using information from those functions.
If you can't make your query more selective than 2000 documents, but you only need a few string values, consider creating range indexes on those values and using cts:values to select only those values directly from the index, filtered by the query. That method would avoid forcing the database to load documents into the cache.
It might be more efficient to use MarkLogic's capability to return co-occurrences, or even 3+ tuples of value combinations from within documents using functions like cts:values. You can blend in a (cts:uri-reference](http://docs.marklogic.com/cts:uri-reference) to get the document uri returned as part of the tuples.
It requires having range indexes on all those values though..
HTH!

Excel Get & Transform (Power Query) M Code Style and Performance

I've created a few reasonably complex M queries and have started running into some severe performance issues. I'm wondering if has to do with how I sometime organize my code.
The issues I've been having are:
1) Power Query constantly uses all of several CPU cores, calculating something, even if I'm not waiting for a result.
2) In task manager I can sometimes see that the Power Query threads ("Microsoft.mashup.Container.NetFX40.exe") are nearly idle, while Excel.exe is using 100% of one core for tens of minutes - even though at most I'm looking values in a few parameter tables that don't contain more than a couple dozen cells.
3) Some steps take extremely long to calculate, even though the operations involved are trivial. For example, I have a list of 10 text values taken from an Excel table. This list appears as one of my query steps when I 'preview' it. Then I want to remove a single value, so the next step = List.RemoveItems(myList, {"val"}). It didn't compute after 30 minutes, even though I could see the list was correctly loaded in a previous step.
4) UI sometimes becomes unresponsive for several minutes after changing code. Can still right-click on Queries at left hand side to enter advanced editor, and click the red X at top right and choose to keep changes, but all the rest is unresponsive. Not greyed out, just unresponsive.
Anyway, I just wanted to ask if anyone's had similar trouble, and if anyone knows what triggers particularly bad performance in PQ.
I'll often use something like the following pattern to keep the total number of queries down while still being able to easily inspect individual steps:
let
ThisWB = Excel.CurrentWorkbook(),
CfgTbl = ThisWB{[Name="myCfgTbl"]}[Content],
x = aFn(CfgTbl),
y = bFn(CfgTbl),
output = [ThisWB=ThisWB, CfgTbl=CfgTbl, x=x, y=y]
in
output
Is this likely to lead to any issues? Just thought it might because at one point after waiting a very long time for a simple function result, I created a new query = Excel.CurrentWorkbook(){[Name="myCfgTbl"]}[Content], referenced it from the other query, and my result calculated immediately. No idea why.
It calculates previews. Turn off auto preview generation.
I messed with something like this in cases with formula-heavy tables.
The rest probably requires code examples, especially your last case.
BTW, is your version of power query (or Excel 2016) up-to-date?

Cassandra - Write doesn't fail, but values aren't inserted

I have a cluster of 3 Cassandra 2.0 nodes. My application I wrote a test which tries to write and read some data into/from Cassandra. In general this works fine.
The curiosity is that after I restarted my computer, this test will fail, because after writting I read the same value I´ve write before and there I get null instead of the value, but the was no exception while writing.
If I manually truncate the used column family, the test will pass. After that I can execute this test how often I want, it passes again and again. Furthermore it doesn´t matter if there are values in the Cassandra or not. The result is alwalys the same.
If I look at the CLI and the CQL-shell there are two different views:
Does anyone have an ideas what is going wrong? The timestamp in the CLI is updated after re-execution, so it seems to be a read-problem?
A part of my code:
For inserts I tried
Insert.Options insert = QueryBuilder.insertInto(KEYSPACE_NAME,TABLENAME)
.value(ID, id)
.value(JAHR, zonedDateTime.getYear())
.value(MONAT, zonedDateTime.getMonthValue())
.value(ZEITPUNKT, date)
.value(WERT, entry.getValue())
.using(timestamp(System.nanoTime() / 1000));
and
Insert insert = QueryBuilder.insertInto(KEYSPACE_NAME,TABLENAME)
.value(ID, id)
.value(JAHR, zonedDateTime.getYear())
.value(MONAT, zonedDateTime.getMonthValue())
.value(ZEITPUNKT, date)
.value(WERT, entry.getValue());
My select looks like
Select.Where select = QueryBuilder.select(WERT)
.from(KEYSPACE_NAME,TABLENAME)
.where(eq(ID, id))
.and(eq(JAHR, zonedDateTime.getYear()))
.and(eq(MONAT, zonedDateTime.getMonthValue()))
.and(eq(ZEITPUNKT, Date.from(instant)));
Consistencylevel is QUORUM (for both) and replicationfactor 3
I'd say this seems to be a problem with timestamps since a truncate solves the problem. In Cassandra last write wins and this could be a problem caused by the use of System.nanoTime() since
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
...
The values returned by this method become meaningful only when the difference between two such values, obtained within the same instance of a Java virtual machine, is computed.
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#nanoTime()
This means that the write that occured before the restart could have been performed "in the future" compared to the write after the restart. This would not fail the query, but the written value would simply not be visible due to the fact that there is a "newer" value available.
Do you have a requirement to use sub-millisecond precision for the insert timestamps? If possible I would recommend using System.currentTimeMillis() instead of nanoTime().
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis()
If you have a requirement to use sub-millisecond precision it would be possible to use System.currentTimeMillis() with some kind of atomic counter that ranged between 0-999 and then use that as a timestamp. This would however break if multiple clients insert the same row at the same time.

Resources