Remove duplicates based on the latest date in Power Query - Excel

I have a dataset that I load into my sheet via Power Query, and I want to transform the data a little before loading it in.
To give a little more context, I have some IDs, and for each ID I would like the older rows removed and only the row with the newest date loaded in.

A solution is described at https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/
"Remove Duplicates and Keep the Last Record with Power Query"
In short: sort by date (newest first) in a buffered table (Table.Buffer), then remove duplicate IDs.
Another way, I think, would be to group by ID and take the MAX date, but which works better depends on the data size.
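To make the two approaches concrete, here is a minimal Python sketch of the logic only (it is not Power Query M). It assumes each row has hypothetical `id` and `date` fields. The first function follows the sort-newest-first-then-remove-duplicates idea: the newest row for an ID is seen first, so it is the one kept (the linked article buffers the sorted table so that the sort order is respected when duplicates are removed). The second function keeps the row with the latest date per ID, which is what grouping by ID with MAX date achieves.

```python
from datetime import date


def keep_latest_sort_dedupe(rows):
    """Sort newest first, then keep only the first row seen for each ID."""
    newest_first = sorted(rows, key=lambda r: r["date"], reverse=True)
    seen = set()
    kept = []
    for row in newest_first:
        if row["id"] not in seen:  # first occurrence = newest row for this ID
            seen.add(row["id"])
            kept.append(row)
    return kept


def keep_latest_group_max(rows):
    """Group by ID and keep the whole row that has the MAX date."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        if current is None or row["date"] > current["date"]:
            latest[row["id"]] = row
    return list(latest.values())


# Hypothetical sample data: ID "A" appears twice with different dates.
rows = [
    {"id": "A", "date": date(2023, 1, 5), "value": 10},
    {"id": "B", "date": date(2023, 2, 1), "value": 20},
    {"id": "A", "date": date(2023, 3, 9), "value": 30},
]

for row in keep_latest_group_max(rows):
    print(row["id"], row["date"], row["value"])
```

One design note: a plain Group By with a MAX aggregation returns only the ID and the date, so in Power Query you would typically need to merge back to the original table to recover the other columns. The second function avoids that by keeping the whole row.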

Related

Default date aggregation for Excel

What is the default behavior when adding a date, time, or datetime field to an Excel pivot table row/column? I have seen it sometimes added as the "raw value", sometimes as Year > Quarter > Value, and other times (?) perhaps something in between.
When does Excel add it without aggregating it, and when does Excel aggregate it? Does it have to do with value cardinality, date range, or something else?
First, every entry in the column has to be a date/time, or you won't be able to group them. If that's not the case, the default is obviously no grouping.
Assuming everything is groupable, the default is no grouping. Each date will show individually.
The exception is if a pivot cache already exists. In that case it will group based on what the pivot cache says - the last way that field was grouped. This happens when you have more than one pivot table on the same data. The first pivot table creates the cache and all subsequent pivot tables use that existing cache.
In a new workbook (Excel 2010), I add a date field to the Row Labels, and the dates are initially ungrouped by default.
I group them by month.
Now I go back to the original data and make a new pivot table. I add the date field to the Column Labels.
Because it uses the same cache, it automatically has them grouped the same way. Finally, I go back to the source data and replace one of the dates with a string. If I create another pivot table, it will look like the others. But when I refresh it, the dates become ungrouped because there is now a non-date value in the column.
And if I try to group now, it says "Cannot group that selection".
That's why it works the way it does: a shared pivot cache. There are ways to give each pivot table its own cache, but that uses more memory. However, if you want to group the same data differently, that's what you have to do.

Excel XP for inventory management: simple in theory, but I can't achieve it

I need to create a simple and easy to use inventory management sheet or database.
As I see it, it may be better to use Access, but people are more familiar with Excel.
Imagine a warehouse where we store goods. Goods are often delivered, so I have to reduce the stock count for a particular item. When the warehouse runs short of some goods, more of them are bought.
The thing is, I need to store a history of the deliveries we make, but also store the current count per item.
I thought of having a column for the initial item count, then adding ins and subtracting outs.
I tried using database functions, dynamic (pivot) tables, etc., but the problem is that when I add new records for item outs and ins, the pivot table won't resize its source range, and the same happens with the ranges used in functions.
What would be the best way to achieve what I want?
The thing is that we do everything manually: counting, summing, and subtracting each time we have ins and outs.
I didn't want to make something overcomplicated to use, but rather to save time by automating the ins and outs calculations and making it easier to search for particular records.
You can do this with an Excel Table and a pivot table (or some formulas). The columns should include date, item code, transaction type (coming in or going out), number of units. You can add columns with more information.
Next, enter a starting stock for each item code. Then enter new lines for each transaction. If you have bought new stock, put a positive number into Units. If you have sold or delivered stock, put a negative number.
Then you can build a pivot table that calculates the totals per item code (or use formulas). You can build other pivot tables to calculate values per month or using other data you may want to include in the data entry table.
An Excel Table will automatically extend formulas and formatting to new rows. If you base the pivot table on the Excel Table, you only need to refresh the pivot table after you have entered new data. If you prefer formulas, you can use SUMIFS(), but you need to keep the list of items for the stock totals up to date manually.
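The ledger idea above (one row per movement, signed units, totals per item code) can be sketched in a few lines of Python. This is only an illustration of what the pivot table or SUMIFS() computes, not an Excel feature; the item codes and dates are hypothetical, and the starting stock is entered as an ordinary positive transaction.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log, one row per movement: (date, item code, type, units).
# Positive units = starting stock or goods bought; negative units = goods delivered.
transactions = [
    (date(2024, 1, 1), "BOLT-10", "start", 100),
    (date(2024, 1, 1), "NUT-10", "start", 50),
    (date(2024, 1, 15), "BOLT-10", "out", -30),
    (date(2024, 2, 3), "NUT-10", "out", -20),
    (date(2024, 2, 10), "BOLT-10", "in", 40),
]


def stock_per_item(rows):
    """Current stock per item code: the sum of signed units (like SUMIFS per item)."""
    totals = defaultdict(int)
    for _day, item, _kind, units in rows:
        totals[item] += units
    return dict(totals)


def movement_per_month(rows):
    """Net movement per item code and (year, month), like a pivot grouped by month."""
    totals = defaultdict(int)
    for day, item, _kind, units in rows:
        totals[(item, day.year, day.month)] += units
    return dict(totals)


print(stock_per_item(transactions))
```

Because the history is never overwritten, the same log answers both questions in the original post: the current count per item and the record of past deliveries.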

Spotfire: limiting an information link column expression

I have a column of data, [Sales ID], that is bringing duplicate data into an analysis. My goal is to limit the data to unique sales IDs for the max day of every month in the analysis only (instead of daily). Basically, I'm trying to pull in only unique [Sales ID] values for the last day of every month in the analysis, and if the current day is the latest day so far in the current month, it should pull that in. In other words, it should pull in the MAX date in any given month. How do I write an expression with the [Sales ID] and [Date] columns to achieve this?
Probably the two easiest options are to:
1) Adjust the SQL, as niko mentioned
2) Limit the visualization with the "Limit Data Using Expression" option, using the following:
Rank(Day([DATE]), "desc", Month([DATE]), Year([DATE])) = 1
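To show what the Rank expression keeps, here is a small Python sketch of the same rule: within each (year, month), keep the rows whose day equals the highest day present, so a partial current month keeps its latest day so far. It assumes that rows tied for the top day all get rank 1 (so every Sales ID on that day is kept), and it additionally drops exact duplicate (Sales ID, date) pairs. The sample data is hypothetical.

```python
from datetime import date


def rows_on_latest_day_of_month(rows):
    """Keep unique (sales_id, date) rows whose date is the latest day in its month."""
    max_day = {}
    for _sales_id, d in rows:
        key = (d.year, d.month)
        max_day[key] = max(max_day.get(key, 0), d.day)

    kept = []
    seen = set()
    for sales_id, d in rows:
        is_latest = d.day == max_day[(d.year, d.month)]  # "rank 1" within the month
        if is_latest and (sales_id, d) not in seen:      # drop duplicate rows
            seen.add((sales_id, d))
            kept.append((sales_id, d))
    return kept


# Hypothetical data: January is complete, February is still in progress.
sales = [
    ("S1", date(2024, 1, 30)),
    ("S2", date(2024, 1, 31)),
    ("S2", date(2024, 1, 31)),
    ("S3", date(2024, 1, 31)),
    ("S4", date(2024, 2, 12)),
    ("S5", date(2024, 2, 14)),
]

for sales_id, d in rows_on_latest_day_of_month(sales):
    print(sales_id, d)
```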
If you had to do it in the Data on Demand section (maybe the IL itself is a usp or you don't have permission to edit it), my preference would be to create another data table that only has the max dates for each month, and then filter your first data table by that.
However, if you really need to do it in the Data on Demand section, then I'm guessing you don't have the ability to create your own information links. This would mean you can't key off additional data tables, and you're probably going to have to get creative.
Being creative here requires knowing the "rules" of your data: are you pulling the data in daily? Once a week? Do you have today's data, or today - 2? You could probably write a Python script to grab the last day of every month for the last 10 years, plus whatever yesterday's date was, and put all those values into a document property. This would allow you to do a "Values from Property".
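As a minimal sketch of the date-generating part of that script, the Python below builds the last day of every completed month for the last N years, followed by yesterday's date, and joins them with commas. It assumes yesterday is the latest day loaded; the function names are hypothetical, and the step that writes the string into a Spotfire document property is not shown.

```python
import calendar
from datetime import date, timedelta


def month_end_dates(today, years_back=10):
    """Last day of every completed month in the last `years_back` years,
    followed by yesterday's date (assumed to be the latest day loaded)."""
    dates = []
    year, month = today.year - years_back, today.month
    while (year, month) < (today.year, today.month):
        days_in_month = calendar.monthrange(year, month)[1]  # returns (weekday, days)
        dates.append(date(year, month, days_in_month))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    yesterday = today - timedelta(days=1)
    if yesterday not in dates:  # on the 1st, yesterday is already a month end
        dates.append(yesterday)
    return dates


def to_property_string(dates):
    """Comma-separated ISO dates, e.g. for storing in a document property."""
    return ",".join(d.isoformat() for d in dates)


print(to_property_string(month_end_dates(date.today(), years_back=1)))
```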
(Side Note: I want to say you could also do it directly in the expression portion with something like an extremely long
Date(DateTimeNow()),DateAdd("dd",-1,Date(Year(DateTimeNow()), Month(DateTimeNow()), 1))
But Spotfire refuses to accept that as multiple values. Interestingly, when I pull the logic for a StringList property, it gives this: $map("${udDates}", ","), which suggests commas are the right approach, but I get an error reading "Expected 'End of expression' but found ','". I'm not sure whether this is a Spotfire issue or related to my database connection.)
tl;dr -- Doing it in the Data on Demand section is probably convoluted. I recommend adjusting the SQL if possible, and otherwise limiting the data in the visualization.

What is the best way to move out-of-order Access records into the proper order by using a locked ID field?

I have roughly 1500 records in an Access database. I have a field ID that acts as the primary key, and as such cannot be manually changed. After looking through the original Excel sheet these records were kept in, I noticed that a few records in Excel were missing from the Access database. After going through all of them, I added the three missing records into Access.
This database stores records in date order, grouped by manufacturer. For example, records from Manufacturer1 collected during week 1 of June '16 are all located together, and records from Manufacturer2 collected during week 2 of June '16 are stored directly afterwards. This is important for us because the data in this database often needs to be looked at visually, so keeping things in date order is essential. There is also a macro that exports the data to an Excel sheet and formats it to be easier to read; it exports the records in the order in which they are stored (by the ID field). This is a problem because the three missing records are from years past, yet they now sit in the middle of records from 2018. The IDs they were assigned on entry keep them in that location.
Is there a way to reliably insert these records into the database at the location where they belong, for example by shifting the ID values of the other records down by 3 to make room for the missing records? I know I could probably have the macro that exports to Excel manually move those three records to the desired location, but I'd rather have a less hacky solution that would still work if a similar problem happens again.
The order of data in a database is of no interest to the database - it's the relation between data that matters.
To always view your data in the order you want, use the ORDER BY clause in an SQL statement. Generally, you can add data to the underlying table directly through the query, unless you've got many-to-one type queries where your update would need to affect more than one record.
SELECT FieldName1, FieldName2, . . . .
FROM MyDataTable
ORDER BY Manufacturer, [Date]
Edit: Even here you'll be adding new records to the bottom of the dataset, but refreshing the query will move the records to the correct order.
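The point that the stored or ID order doesn't matter can be checked with a small self-contained Python sketch using SQLite in memory, standing in for Access. The table and column names (`readings`, `manufacturer`, `collected_on`) are hypothetical. The late-entered 2016 record gets the highest ID, yet ORDER BY still returns it in the right place, so the export macro could read from such a query instead of the raw table.

```python
import sqlite3


def ordered_records(rows):
    """Insert rows in any order, then read them back sorted by manufacturer and date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # like an Access AutoNumber ID
        " manufacturer TEXT NOT NULL,"
        " collected_on TEXT NOT NULL)"  # ISO-format dates sort correctly as text
    )
    conn.executemany(
        "INSERT INTO readings (manufacturer, collected_on) VALUES (?, ?)", rows
    )
    result = conn.execute(
        "SELECT id, manufacturer, collected_on FROM readings"
        " ORDER BY manufacturer, collected_on"
    ).fetchall()
    conn.close()
    return result


# The 2016 record is entered last, so it receives the highest ID (3).
rows = [
    ("Manufacturer1", "2018-05-01"),
    ("Manufacturer2", "2018-05-08"),
    ("Manufacturer1", "2016-06-01"),
]
for record in ordered_records(rows):
    print(record)
```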

How do I create report-like data tables in Excel?

In the past I have created websites that extract data from a database and format it using tables.
Now, I am trying to do the same thing but with Excel, and I'm lost. I am used to using SQL commands to extract data from given fields and then sort/manipulate it.
Currently, I am able to print a report that provides me with an Excel spreadsheet full of raw data, but I would like to make my life easier and organize it into a report.
The column that I would like to reference contains duplicates, but the data in the adjacent columns is different.
To give an example, assume I had a spreadsheet of sales transactions. One column would be the Customer ID, and the adjacent columns would contain the quantity, the cost per unit, total cost, order ID, etc.
What I would want to do in this case would be to select all the transactions with the same Customer ID and add them together based on their Order ID. Then, I would want to print the result to a second sheet.
I realize that I can use built-in functions to accomplish this, but I would also like to format this report eventually using VBA. Also, since the number of rows will vary from one report to the next, and I haven't encountered a function that will let you add rows,
I'm assuming this must be done with VBA.
Well, you can do it manually, but it would take ages. So VBA would be good, particularly as you would be able to generate future reports quickly.
My interpretation of what you're saying is that each row in your report will be the total for one customer ID. If it's something else, I imagine the below will still be mostly relevant.
I think it would be a bit much to give you the full answer, particularly as you haven't provided full detail, but here is a stab at what you'd do:
1. Create your empty report page, whether it be a new worksheet or a new workbook
2. Loop through the table (probably using a While loop that runs until the next row is empty)
a. Identify whether a row is for a customer ID you haven't covered yet
i. If so, then add a new entry in your report
ii. Else, add it to the existing customer ID record (loop through until you find it)
3. Format your report so it looks pretty, e.g.:
a. Fill the background in white
b. Throw in some filled bars
c. Put in good titles and totals etc.
For part 1, it might be better to build an array first and then dump its contents into the report. It depends on how processing-intensive it will be; if very intensive, an array should save time.
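The loop-and-aggregate step, combined with the build-an-array-first advice, could look roughly like this Python sketch (it shows the logic only, not VBA). It assumes rows with hypothetical `customer_id`, `order_id`, `quantity` and `total_cost` fields. Instead of looping through the report to find an existing customer, it keeps a lookup from customer ID to report position; in VBA, a `Scripting.Dictionary` is one option for the same role. The finished list is the "array" that would be written to the report sheet in one go.

```python
def build_report(transactions):
    """Return one (customer_id, order_count, quantity, total_cost) row per customer."""
    position = {}  # customer ID -> index of its entry in `report`
    report = []    # in-memory "array", written to the report sheet afterwards
    for row in transactions:
        cid = row["customer_id"]
        if cid not in position:  # a customer ID we haven't covered yet
            position[cid] = len(report)
            report.append({"customer_id": cid, "orders": set(),
                           "quantity": 0, "total_cost": 0.0})
        entry = report[position[cid]]  # otherwise, add to the existing record
        entry["orders"].add(row["order_id"])
        entry["quantity"] += row["quantity"]
        entry["total_cost"] += row["total_cost"]
    return [
        (e["customer_id"], len(e["orders"]), e["quantity"], e["total_cost"])
        for e in report
    ]


# Hypothetical raw rows shaped like the sales-transactions example.
raw = [
    {"customer_id": "C1", "order_id": "O1", "quantity": 2, "total_cost": 10.0},
    {"customer_id": "C2", "order_id": "O2", "quantity": 1, "total_cost": 5.0},
    {"customer_id": "C1", "order_id": "O3", "quantity": 3, "total_cost": 15.0},
    {"customer_id": "C1", "order_id": "O3", "quantity": 1, "total_cost": 5.0},
]

for line in build_report(raw):
    print(line)
```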
