Excel Power Query - Incremental Load from Query and adding date - excel

I am trying to do something quite simple which I am failing to understand.
Take the output from a query, date time stamp and write it into a Excel table.
Iterate the logic again and you get the same output but the generated date time has progressed in time.
Query 1 -- From SQL which yields 2 columns category, count.
I am taking this and adding a generated date to it using DateTime.LocalNow().
Query 2 -- Target table
How can i construct a query which adds to an existing table and doesnt require me to load the result into a new table.
I have seen this blog.oraylis.de and i cant make it work since the DateTime.LocalNow() call runs for source and target and i end up with the same datetime throughout the query.
I think i am missing something obvious.
EDIT:-
= Table.Combine({SOURCE_DATA, TARGET_DATA})
This loads into a 3rd new table and doesnt take into account that 3rd table when loading - so you just end up with a new version of just the first two tables with new timestamp

These steps should work
create a query Q1 based on the SQL Statement, add your timestamp using DateTime.LocalNow() and load this into an Excel table (execute the query)
create a new query Q2 based on this Excel new table (just like that, no transforms)
Modify the first query Q1 by adding the Table.Combine with Q2 as the last step.
So, in other words, Q2 loads the existing data from the Excel table into which Q1 writes. The Excel table is always written completely but since the existing data is preserved you will get the result of new data being loaded to the table. Hope this helps.
Good luck, Hilmar

Related

Output updated rows following Spark DML statements

When updating data in a SQL Server database, the updated records of update, insert, delete and merge statements can out retrieved by adding an output clause.
This is particularly useful when there is a merge statement that retains some parts of the old record within the new, merged version (such as a PreviousVersion or PreviousDate type column).
output allows that data to be carried forwards into another process, as it returns the merged version of the record without having to query the target table again. This facilitates further processing only the newly arrived data, including the updates produced by the merge, without having to execute a subsequent select on the target table (e.g. filtering on an UpdatedDate type column) or as a join from the new data into the updated target table.
Having looked through the documentation for Spark, I can't see any way of replicating this output clause behaviour without an additional read of or join onto the target table. Is there a way of outputting only the updated records from a merge statement and if not, what is the best way to achieve something similar?
An example of this logic would be something like:
New Data
ID
Start
End
1
2022-01-01
2022-08-01
Target Table
ID
Start
End
PreviousEnd
1
2022-01-01
2022-07-01
2022-06-01
MANY
MORE
DATA
ROWS
Merge Logic (pseudo)
when matched
target.End = source.End
and target.PreviousEnd = target.End
output updated
Merge Output (Just one data row)
ID
Start
End
PreviousEnd
1
2022-01-01
2022-08-01
2022-07-01
And then from this point the output row can be used to go and (as an easy example) add an additional month of time (End - PreviousEnd) to a summary held somewhere else, without having to query into the larger target table a second time.

Remove Duplicates based on latest date in power query

I got a dataset that I am loading into my sheet via power query and wish to transform the data a little bit according to my liking before loading it in.
To give a little more context, I have some ID's and I would like the older rows to be removed and the rows which have the newer date to be loaded in.
Solution is described at https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/
"Remove Duplicates and Keep the Last Record with Power Query"
In short, sort per date in a buffered table and then remove duplicate id
Another way I think would be to group by id and get MAX date but it depends of the data size

Spotfire- limiting Information link colum expression

I have a column of data [Sales ID] that bringing in duplicate data for an analysis. My goal is to try and limit the data to pull unique sales ID's for the max day of every month in the analysis only (instead of daily). Im basically trying to get it to only pull in unique sales ID values for the last the day of every month in the analysis ,and if the current day is the last day so far then it should pull that in. So it should pull in the MAX date in any given month. Please how do i write an expresion with the [Sales ID] column and [Date ] column to acieve this?
Probably the two easiest options are to
1) Adjust the SQL as niko mentioned
2) Limit the visualization with the "Limit Data Using Expression" option, using the following:
Rank(Day([DATE]), "desc", Month([DATE]), Year([DATE])) = 1
If you had to do it in the Data on Demand section (maybe the IL itself is a usp or you don't have permission to edit it), my preference would be to create another data table that only has the max dates for each month, and then filter your first data table by that.
However, if you really need to do it in the Data on Demand section, then I'm guessing you don't have the ability to create your own information links. This would mean you can't key off additional data tables, and you're probably going to have to get creative.
Constraints of creativity include needing to know the "rules" of your data -- are you pulling the data in daily? Once a week? Do you have today's data, or today - 2? You could probably write a python script to grab the last day of every month for the last 10 years, and then whatever yesterday's date was, and throw all those values into a document property. This would allow you to do a "Values from Property".
(Side Note: I want to say you could also do it directly in the expression portion with something like an extremely long
Date(DateTimeNow()),DateAdd("dd",-1,Date(Year(DateTimeNow()), Month(DateTimeNow()), 1))
But Spotfire is refusing to accept that as multiple values. Interestingly, when I pull the logic for a StringList property, it gives this: $map("${udDates}", ","), which suggests commas are an accurate methodology, but I get an error reading "Expected 'End of expression' but found ','" . Uncertain if this is a Spotfire issue, or related to my database connection)
tl;dr -- Doing it in the Data on Demand section is probably convoluted. Recommend adjusting in SQL if possible, and otherwise limiting in the visualization

Is there a workaround for the maximum length of an ODBCConnection.CommandText string in VBA?

I have a VBA script that generates a query string for a SAP HANA ODBC Connection in Excel. The query is determined by user inputs and can vary greatly in length. The query itself uses many versions of a similar query appended to one another using UNION ALL syntax.
The script sometimes throws a runtime error when trying to refresh. From my research, it has become clear that the reason for this is that the CommandText string exceeds a maximum allowed length of 32,767 (https://ask.sqlservercentral.com/questions/50819/too-long-sql-in-excel-vba.html).
I wondered whether there is a workaround for this, other than using a stored procedure (I am not against this if there is a way to create a stored procedure at runtime then execute it, but I cannot use a predefined stored procedure as my query is always different hence the need for VBA to create it)
Some more info about the dynamic query in VBA:
Column names, as well as parameters, are created dynamically and can be different every time
The query uses groups of lists of product numbers to generate an IN statement for each product group, then sums the sales for those products under the name of the group. These are then all UNION'd together to create one table with grouped records
Example of user input:
Example of resulting query:
WITH SOME_CTE (SOME_FIELDS) AS
(SELECT SOME_STUFF
FROM SOME_TABLE
WHERE SOME_STUFF_IS_GOING_ON)
SELECT GEND "Gender", 'Attribute 1' "Attribute", SUM(UNITS) "Units", SUM(VAL) "Value", SUM(MARGIN) "Margin"
FROM SOME_CTE
WHERE PRODUCT IN ('12345', '23456', '34567', '45678')
GROUP BY GEND
UNION ALL
SELECT GEND, 'Attribute 2' ATTR_NAME, SUM(UNITS), SUM(VAL), SUM(MARGIN)
FROM SOME_CTE
WHERE PRODUCT IN ('01234', '02345', '03456', '03567')
GROUP BY GEND
ORDER BY "Gender", "Attribute"
...and so on.
As you can see, with 2 attribute groups containing 4 products each there is no problem, but when we get to about 30 with several hundred each, it could be too long.
Note: I have tried things like shortening field references in the repeated parts of the query string to 1 character etc. which helps but does not solve the problem.
Any help would be greatly appreciated.
One workaround is to send multiple queries. Since you are using union all, you could execute every time single select statement, i.e.
create table in (for example) master database (don't create temporary tables! as they will be dropped after every query) - but before that, make sure you create new table, so delete old one if exists (also drop the table after you are done with it). Now every single select statement you'll change to insert statement, which will insert records to your so-called temporary table.
This way, you'll avoid lengthy queries, you'll just send single insert .. into.. select statements.
At the end, to get all results, you just need simple select query. After getting this data, you should drop that table, as it's no longer needed.

Performance tuning in Cognos Report Studio

Working in Cognos Report Studio 10.2.1. I have two query items. First query item is the base table which results in some million records. Second query item is coming from a different table. I need to LEFT OUTER JOIN the first query item with other. In the third query item post the join, I am filtering on a date column which is in formatYYYYMM to give me records falling under 201406 i.e the current Month and Year. This is the common column in both the table apart from AcctNo which is used to join both the tables. The problem is, when I try to view Tabular datathe report takes forever to run. After waiting patiently for 30 mins, I just have to cancel the report. When I add the same filter criteria to the 1st query item on the date column and then view the third query item, it gives me the output. But in the long run, I have to join multiple tables with this base table and in one of the table the filter criteria needs to give output for two months. I am converting a SAS code to Cognos, In SAS code, there is no filter on the base table and even then the join query takes few seconds to run.
My question is: Is there any way to improve the performance of the query so that it runs and more importantly runs in less time? Pl note: Modelling my query in FM is not an option in this case.
I was able to get this resolved myself after many trial and errors.
What I did is created a copy of 1st Query item, and filtered 1st query item with current month and year and the for the copy of 1st query item added a filter for two months. That way I was able to run my query and get the desired results.
Though this is a rare case scenario, hope it helps someone else.

Resources