Updating a table "Y" in file "B" with new added rows from table "X" in file "A"


I am trying to create an "instant cloud flow" in Power Automate.
Context:
Excel file "A" contains a table "X" to which new rows of information are regularly added at the bottom.
Excel file "B" contains a table "Y" with the same characteristics (number of columns, headers, etc.), except for the number of rows, because table "X" is regularly updated with new rows of data.
Both files are stored on OneDrive and may later move to SharePoint file storage, so they will be in the cloud, not stored locally on any device.
Challenge/need:
I need table "Y" in file "B" to mirror the changes happening in table "X" in file "A", specifically the new rows of data being added to table "X":
Internet/world > new rows of data at the bottom of table "X" in file "A" > the same new rows get copied to the bottom of table "Y" in file "B". Basically, tables "X" and "Y" need to stay identical, with a maximum lag of 3 minutes.
Solution tried:
I tried a flow that is triggered every minute. In this flow, I tried creating an array containing the new rows of data added to table "X". Then, using an Apply to each control over this new array, I used the actions Add a row into a table followed by Update a row for each item in the array, to keep table "Y" updated to match table "X". This part works: rows are added and updated in table "Y".
My problem:
The Condition that compares the data from the 2 tables decides that all rows from table "X" are new data, even though some are already present in table "Y". This is a problem because too many rows are added to table "Y", and the tables fall out of sync due to the difference in the number of rows/body length. As I understand it, this happens because List rows present in a table adds an extra item/object to each row, called ItemInternalId.
This ItemInternalId has different id values for the same rows that were already updated previously, so the condition identifies all rows in table "X" as new data to be added to table "Y".
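A minimal Python sketch of the suspected effect, assuming each row object carries an ItemInternalId that differs between the two tables. The column names ("Date", "Item", "Amount"), the id values and the helper functions are all hypothetical. Comparing whole objects never finds a match, while comparing only the data columns (and keeping them as named keys, so the headers survive) finds just the genuinely new row.

```python
# Hypothetical rows shaped like "List rows present in a table" output.
# Column names and ItemInternalId values are made up for illustration.
table_x = [
    {"ItemInternalId": "a1", "Date": "2023-01-01", "Item": "Pen", "Amount": "2"},
    {"ItemInternalId": "a2", "Date": "2023-01-02", "Item": "Ink", "Amount": "5"},
    {"ItemInternalId": "a3", "Date": "2023-01-03", "Item": "Pad", "Amount": "3"},
]
table_y = [
    {"ItemInternalId": "b7", "Date": "2023-01-01", "Item": "Pen", "Amount": "2"},
    {"ItemInternalId": "b8", "Date": "2023-01-02", "Item": "Ink", "Amount": "5"},
]

COLUMNS = ["Date", "Item", "Amount"]  # data columns only, no ItemInternalId


def new_rows_naive(x, y):
    """Whole-object comparison: the differing ItemInternalId makes every row look new."""
    return [row for row in x if row not in y]


def new_rows_by_columns(x, y, columns=COLUMNS):
    """Compare only the data columns; return rows keyed by header name."""
    seen = {tuple(row[c] for c in columns) for row in y}
    return [
        {c: row[c] for c in columns}
        for row in x
        if tuple(row[c] for c in columns) not in seen
    ]


print(len(new_rows_naive(table_x, table_y)))  # 3
print(new_rows_by_columns(table_x, table_y))
```

The second call returns only the "Pad" row, with its headers intact as dictionary keys.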
Questions:
Could someone confirm that ItemInternalId is the problem here? I have doubts because I tried removing it by creating another array with the Select action, keeping only the columns/headers I need and thereby excluding ItemInternalId. The problem is that the "header" (which I need) is then excluded and only the value remains, and the condition still identifies all rows in "X" as new data anyway...
Maybe I am doing it wrong and there is another simple or better way to get an array with the new items from table "X"? Here is the condition that I use to try to feed a new array with the new rows from table "X":
Thank you

I found a workaround. I will not accept this as the right answer because it is just a workaround, not a definitive solution to the problem.
Basically, file "A" needs to have a table "X" with just 1 blank row. The Power Automate flow will "add new rows" with the information to this table.
Then in file "B", table "Y" needs to be created with a certain number of rows depending on how much data comes in per day, for example 100. Then create a Power Automate flow that "updates the table"; this adds the information from table "X" to table "Y".
Please be aware that you will need a Key column in both tables so that Power Automate knows which rows to update. You can just use basic numerical order (1, 2, 3, ...) for each row in the Key column.
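A rough Python model of why the workaround avoids duplicates, assuming table "Y" is pre-filled with blank rows numbered 1..N in the Key column. The helpers `make_placeholder_table` and `update_by_key` are hypothetical and only approximate what an "Update a row" step keyed on the Key column does: matching rows are overwritten in place, so rerunning the sync never adds rows.

```python
def make_placeholder_table(n, columns):
    """Hypothetical table "Y": n blank rows, numbered 1..n in the Key column."""
    return [{"Key": k, **{c: "" for c in columns}} for k in range(1, n + 1)]


def update_by_key(target, source):
    """Overwrite each target row whose Key matches a source row (no appends)."""
    index = {row["Key"]: row for row in target}
    for row in source:
        match = index.get(row["Key"])
        if match is None:
            # No placeholder left: table "Y" was created with too few rows.
            raise KeyError(f"no placeholder row for Key {row['Key']}")
        match.update(row)
    return target


columns = ["Item", "Amount"]  # hypothetical data columns
table_y = make_placeholder_table(5, columns)
table_x = [
    {"Key": 1, "Item": "Pen", "Amount": "2"},
    {"Key": 2, "Item": "Ink", "Amount": "5"},
]

update_by_key(table_y, table_x)
update_by_key(table_y, table_x)  # rerun: same keys, so nothing is duplicated
print(len(table_y))  # 5
```

The trade-off, as the workaround notes, is that table "Y" must be created with enough placeholder rows in advance.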

Related

Excel data tables: Multiple outputs with only one input column

I am trying to create a data table with multiple outputs across periods, but for the same scenarios.
Is it possible to create that without inserting an extra column between each output column to provide the input for the data table (i.e. input column = index 50-110)?
See the picture of what I would usually select to create the data table (this only covers one period/output, though). If I were to make the scenario for FY23, I would need to insert a column between FY22 and FY23 where I copy the index 50-110 again. I would like to avoid having to do that.

Upsert Option in ADF Copy Activity

With the "upsert" option, should I expect to see "0" as "Rows Written" in a copy activity result summary?
My situation is this: the source and sink table columns are not exactly the same, but the key columns that determine the write behavior are correct.
I have tested and made sure that it does actually insert or update based on the data I give it, BUT what I don't understand is: if I make ZERO changes and just keep rerunning the pipeline, why doesn't it show "0" in the Rows Written summary?

The main reason rowsWritten is not shown as 0, even when the source and destination have the same data, is this:
Upsert inserts a row when its key column value is absent from the target table, and updates the other columns of the row whenever the key column value is found in the target table.
Hence, it writes every source record irrespective of whether the data changed. As with a basic SQL MERGE, there is no way to tell the copy activity to ignore a row when the entire row already exists in the target table.
So, even when key_column matches, it updates the values of the rest of the columns, and the row is counted as written. The following examples show 2 cases.
Case 1, the rows of the source and sink are the same.
The rows present in both:
id,gname
1,Ana
2,Ceb
3,Topias
4,Jerax
6,Miracle
Case 2, inserting completely new rows.
The rows present in the source (the sink data is as above):
id,gname
8,Sumail
9,ATF
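A toy Python model of the behavior described above, using the two example data sets. It is not ADF's implementation and the `upsert` helper is hypothetical: it simply encodes the claim that every source row is either inserted or updated and both count as written. Under that model, case 1 writes all five rows even though nothing changed, and case 2 writes the two new rows.

```python
def upsert(sink, source, key="id"):
    """Upsert source rows into sink (a dict keyed by the key column); return rows written."""
    rows_written = 0
    for row in source:
        # Matched key: the other columns are overwritten even if unchanged.
        # Missing key: the row is inserted. Either way it counts as written.
        sink[row[key]] = dict(row)
        rows_written += 1
    return rows_written


sink_rows = [(1, "Ana"), (2, "Ceb"), (3, "Topias"), (4, "Jerax"), (6, "Miracle")]
sink = {i: {"id": i, "gname": g} for i, g in sink_rows}

# Case 1: the source holds exactly the same rows as the sink.
same_source = [dict(row) for row in sink.values()]
first = upsert(sink, same_source)
print(first)  # 5

# Case 2: the source holds only completely new rows.
new_source = [{"id": 8, "gname": "Sumail"}, {"id": 9, "gname": "ATF"}]
second = upsert(sink, new_source)
print(second)  # 2
```

Getting 0 for unchanged data would require a comparison of the non-key columns before writing, which is the step the answer says the copy activity does not offer.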

Delete bottom two rows in Azure Data Flow

I would like to delete the bottom two rows of an Excel file in ADF, but I don't know how to do it.
The flow I am thinking of is this.
I intend to use a filter to remove the rows highlighted in yellow.
The file has over 40,000 rows of data and is updated once a month. (The number of rows changes with each update, so the condition must be specified with a function.)
The contents of the file are also shown here.
The bottom two lines contain spaces and asterisks.
Any help would be appreciated.
I'm new to Azure and having trouble.
I need your help.

Add a surrogate key transformation to put a row number on each row. Add a new branch to duplicate the stream and in that new branch, add an aggregate.
Use the aggregate transformation to find the max() value of the surrogate key counter.
Then subtract 2 from that max number and filter for just the rows up to that max-2.

Let me provide a more detailed answer here ... I think I can get it in here without writing a separate blog.
The simplest way to filter out the final 2 rows is the pattern shown in the screenshot here. Instead of a new branch, I created 2 sources, both pointing to the same data source. The 2nd stream exists only to get a row count and store it in a cached sink. For the aggregate expression I used count(1) as the row count aggregator.
In the first stream, that is the primary data processing stream, I add a Surrogate Key transformation so that I can have a row number for each row. I called my key column "sk".
Finally, set the Filter transformation to only allow rows with a row number <= the max row count from the cached sink minus 2.
The Filter expression looks like this: sk <= cachedSink#output().rowcount-2
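A plain-Python model of the data flow pattern above, not ADF code. The function `drop_last_two` and the sample rows are hypothetical; it assumes the Surrogate Key column "sk" starts at 1 and increments by 1. Under that assumption max(sk) equals count(1), so the max()-based variant from the first answer gives the same cutoff.

```python
def drop_last_two(rows):
    # Surrogate Key transformation: a 1-based row number in "sk".
    numbered = [{"sk": i, **row} for i, row in enumerate(rows, start=1)]
    # Second stream + cached sink: the total row count, like count(1).
    rowcount = len(rows)
    # Filter: sk <= cachedSink#output().rowcount - 2
    return [row for row in numbered if row["sk"] <= rowcount - 2]


# Hypothetical file contents: three data rows, then a blank line and an asterisk line.
rows = [{"value": "a"}, {"value": "b"}, {"value": "c"}, {"value": " "}, {"value": "*"}]
kept = drop_last_two(rows)
print([row["value"] for row in kept])
```

Because the cutoff is computed from the row count on every run, it adapts when the monthly file grows or shrinks.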

Default date aggregation for Excel

What is the default behavior when adding a date, time, or datetime field to an Excel pivot row/column? I have seen it sometimes add the "raw value", sometimes add it as Year > Quarter > Value, and other times (?) perhaps something in between. For example:
When does Excel add it without aggregating it, and when does Excel aggregate it? Does it have to do with value cardinality, date range, or something else?

First, every entry in the column has to be a date/time, or you won't be able to group them. If any entry isn't, the default is, obviously, not grouped.
Assuming everything is groupable, the default is no grouping. Each date will show individually.
The exception is if a pivot cache already exists. In that case it will group based on what the pivot cache says - the last way that field was grouped. This happens when you have more than one pivot table on the same data. The first pivot table creates the cache and all subsequent pivot tables use that existing cache.
In a new workbook (Excel 2010), I add a date field to the Row Labels, and they are initially ungrouped by default.
I group them by month.
Now I go back to the original data and make a new pivot table. I add the date field to the Column Labels.
Because it uses the same cache, it automatically has them grouped the same way. Finally, I go back to the source data and replace one of the dates with a string. If I create another pivot table, it will look like the others. But when I refresh, it ungroups them because I have a non-date in there.
And if I try to Group now, it says "Cannot group that selection"
That's why it works the way it does: shared pivot cache. There are ways to give each pivot table its own cache, but that uses more memory. However, if you want to group the same data differently, that's what you have to do.

Extracting specific rows from one table to another on Excel

I have a three-column table in Excel to keep track of my expenditures: the first column shows the purpose of the expenditure, the second shows the amount, and in the third I put a "P" for transactions that are still pending on my credit card. I wish to extract the rows of the pending transactions to another table (on the same sheet) so that I can view what is still pending separately. It should update automatically, i.e. new entries should be added as I add new transactions to my table, and older entries should be removed when I remove the P from the main table (once the transaction is posted). Thank you in advance.

I would just use filtering. Select the column headers (presumably A1 to C1), then click Home -> Editing (farthest section on right by default) -> Sort & Filter -> Filter.
The column headers then have a drop-down menu that allows you to sort your fields alphabetically, from lowest to highest (or vice versa), or from oldest to newest. Below that, you can select which values to show, in this case "P" for pending, and only those rows will be shown. The bottom left of Excel also tells you how many of the total rows are showing, so you can quickly see how many transactions are pending.
