Get duplicates from two columns - excel

I'm having trouble filtering an Excel table. It is a set of two columns from two tables, in which I need to find duplicates.
2 rows with duplicates
Some idents are repeated; they are present in both the current and previous months. In the example below, with the help of this formula, =IFERROR(MATCH(A2;B:B;0); "NO"), I obtained information about which data from last month is repeated in the current month and exactly which row it is in. The formula for determining whether a value is repeated is =COUNTIFS($A$2:$B$13;A2)>1
duplicates and if repeated
I would like to retrieve only the duplicates from the list. I tried the formula =IFERROR(INDEX(A:A;SMALL(IF(NOT(D$2:D$104=TRUE);ROW(B2)-ROW(INDEX(B2;1;1))+1);ROW(G:G)));" ERROR") to get the values that are repeated and skip those that aren't, but the result is not as desired. In column G, you can see what Excel returns for that formula. Column H shows how I would like the new column to look, with only duplicates.
Current vs. desired display
In this example the columns are short, but in reality there could be at least a thousand rows, so I would need help filtering those.

You implied these columns were present in two different tables. So I used Tables with structured references. You can convert to normal addressing if you require that instead.
If you have Windows Excel 2021 or later, you can use:
=FILTERXML("<t><s>" &TEXTJOIN("</s><s>",,UNIQUE(LastMonth[Last month marks],FALSE,TRUE),UNIQUE(CurrentMonth[Current Month],FALSE,TRUE))& "</s></t>","//s[following::*=.]")
Create a list of distinct items for each row
Create an XML by concatenating the items into an array using Textjoin
Extract only those items that are followed by an identical item
With an earlier version of Excel, I would still use Tables and structured references, but I would also use a helper column:
D2: =IFERROR(MATCH(lastMonth[@[last month]],currentMonth[current month],0),"NO") (and fill down)
E2: =IFERROR(INDEX(currentMonth[current month], AGGREGATE(15,6,[Duplicates in Which Row],ROWS($1:1))),"")
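The helper-column approach above can be sketched outside Excel to check what output to expect. This is a minimal Python illustration, not part of the original answer; the function names and sample values are hypothetical. It mimics MATCH (1-based position of the first hit, or "NO") and then the AGGREGATE/INDEX step, which lists the matched items in ascending order of their position in the current month.

```python
def match_positions(last_month, current_month):
    """Mimic the helper column: 1-based position of each last-month
    item in the current-month list, or "NO" when it is absent."""
    first_pos = {}
    for i, value in enumerate(current_month, start=1):
        first_pos.setdefault(value, i)  # keep the first occurrence, like MATCH(..., 0)
    return [first_pos.get(value, "NO") for value in last_month]


def duplicates_in_order(last_month, current_month):
    """Mimic the INDEX/AGGREGATE(15, 6, ...) step: take the numeric
    positions in ascending order and look up the matching items."""
    positions = sorted(
        p for p in match_positions(last_month, current_month) if p != "NO"
    )
    return [current_month[p - 1] for p in positions]


# Hypothetical sample data
last_month = ["a", "b", "c", "d"]
current_month = ["x", "c", "a", "y"]
print(match_positions(last_month, current_month))
print(duplicates_in_order(last_month, current_month))
```

As with the formula, an item that appears more than once in last month's list would be returned more than once.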

Related

Excel automatically sort column of numbers

I am looking to automatically sort a column of numbers in descending order without touching the sort button, or VBA.
Unfortunately I am trying to achieve this in a work environment where I have no access to VBA and excel is not one of the latest versions that contains the new SORT function in 365.
It is quite literally a column of numbers, and the numbers are automatically updated as they are totals of rows of smaller numbers elsewhere, and these change based on something else - so this specific column will always change numbers, but I need the column to automatically keep on top of sorting numbers by descending order.
I tried using RANK.EQ and some other bits, adding a 1 to each row to avoid duplicates, but this buggered it up if more than two duplicates were found. Is there anything I can do at all? Even if it involves going a very long way round it and building extra tables and things etc.
Grateful for any help.
It'd be easier to see your data and without being able to use a spill range, it's impossible to know how many rows. I also think you're intending to use LARGE function rather than RANK.
If you had your numbers in column A, you could drag the following formula down the appropriate number of rows to get the numbers sorted... (starting in cell B1)
=LARGE(A$1:INDEX(A:A,COUNTA(A:A),1),ROW())
If you can get your numbers in a table, you could use a similar formula but the table would ensure the appropriate rows exist (assume table name is Table1 and note the column names of RawNumbers and Sorted). Put this in Sorted Column:
=LARGE([RawNumbers],ROW()-ROW(Table1[[#Headers],[Sorted]]))
I presume using a pivot table is not a viable option... but these are how you could accomplish your objective of sorting by formula.
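To see why LARGE copes with repeated values where the ranking approach did not, here is a small Python sketch of the same idea. It is only an illustration under assumed sample data; `large` is a hypothetical stand-in for Excel's LARGE, and `k` plays the role of ROW().

```python
def large(values, k):
    """Return the k-th largest value (k is 1-based), like LARGE(range, k)."""
    return sorted(values, reverse=True)[k - 1]


# Hypothetical column of totals containing a repeated value
raw_numbers = [5, 3, 5, 1]

# One formula per output row: row 1 asks for the largest, row 2 for the
# second largest, and so on. Repeated values simply occupy adjacent rows.
sorted_column = [large(raw_numbers, row) for row in range(1, len(raw_numbers) + 1)]
print(sorted_column)
```

Because each row asks for a position in the sorted order rather than a rank, ties need no special handling.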

How can I make a drop down list in Excel 2013 based on several conditions?

What I would like to achieve is that sellers can choose the STORE in the blue cell (either with a drop down list or by hard-typing the STORE name) and, based on the selection in the blue cell, the available POSITIONS for that particular PRODUCT and that particular STORE are shown in the green cell as a drop down list.
Let's say I have an Excel workbook, which contains a worksheet with this table with products data, which is automatically imported daily from our Nav server with this layout. It has 4 columns including PRODUCT CODE, DESCRIPTION, STORE IN WHICH IT CAN BE LOCATED and POSITION INSIDE THE STORE (please, check screenshot). It contains 1.5k rows and it changes dynamically, for example, new items are added or positions are exchanged.
As you can see, the same product (PRODUCT 2) can be located in several stores (STORES 1, 2 and 3), and it can be in several locations on each store (POSITIONS 2, 3, 1 and 4).
Now I need sellers to report which of these items they pick and from where, not only the STORE but its POSITION inside the store too. They do it with another worksheet inside the same Excel workbook. It looks more or less like this (please, check screenshot).
I know the drop down list is achieved via Data Validation but I can't figure out the formula for this. I have tried several approaches like:
Array formula to return all POSITIONS in the same ROW, following this (Formula 2.): https://www.ablebits.com/office-addins-blog/2017/02/22/vlookup-multiple-values-excel/. It is quite slow to calculate on the 1.5k items and, once done, I can't figure out how to make Data Validation to look for the 4 or 5 or 10 POSITIONS returned by the array formula, which also need to be filtered by STORE (please, check screenshot for the closest that I have been, array formula returning POSITIONS from column E).
Same formula as above directly on the Data Validation list box, which returns only the first POSITION found.
VBA custom functions, which are not allowed in the Data Validation box.
I feel comfortable with both Power Query and VBA, and formulas as well, and can adapt most of the code I see, but I just can't figure out how to achieve this. Maybe I am just blocked or something, but every path I start to follow ends up in a dead end.
Does anyone have an idea on how to approach this? It doesn't really seem that complicated but it is becoming impossible for me.
Thank you very much for your time!!
This is what I have finally done, just in case someone else is facing this situation.
Instead of a plain-text table for the POSITIONS, I created a PowerQuery importing that CSV. Named that worksheet _LOCATIONS.
Added a custom column (Column E) combining the PRODUCT and the STORE so I had something like a unique identifier, resulting in something like this but in PowerQuery.
Combined column:
Sorted column E and sub-sorted column D, so I make sure the list will always be ordered as I need, and saved the query.
Then, in worksheet REPORT, I entered this formula to create the drop down list in Data Validation in cell D2:
OFFSET(_LOCATIONS!$D$1,MATCH($A2&"-"&$C2,_LOCATIONS!$E:$E,0)-1,0,COUNTIF(_LOCATIONS!$E:$E,$A2&"-"&$C2))
And I am able to choose from the available POSITIONS for the selected PRODUCT in the selected STORE.
Brief explanation:
I set the reference for the OFFSET function at the very first cell of the POSITIONS column (D1), and then I move it down by the row number returned by the MATCH function (which searches for the "PRODUCT 2-STORE 2" string in the newly created combined column) minus 1 (the PowerQuery table has headers), and by 0 columns. This leaves me on the first occurrence of my string (but in the POSITIONS column). Then I make the offset as high as the number of rows returned by the COUNTIF function (which counts all occurrences of my PRODUCT-STORE pair), returning an array of all the positions (column D) matching the PRODUCT-STORE pair.
Ask for formula in Spanish if you need it.
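The OFFSET/MATCH/COUNTIF combination described above amounts to "find the first row of the key, then take as many rows as the key occurs". A rough Python sketch of that logic follows; the function name and the sample rows are hypothetical, and it relies on the same assumption as the formula, namely that the combined column is sorted so identical keys sit in one contiguous block.

```python
def positions_for(keys, positions, product, store):
    """Return the POSITIONS for one PRODUCT-STORE pair.

    keys      -- the combined PRODUCT-STORE column (sorted, no header)
    positions -- the POSITIONS column, aligned row by row with keys
    """
    key = f"{product}-{store}"
    count = keys.count(key)          # like COUNTIF on the combined column
    if count == 0:
        return []                    # the formula would return an error here
    first = keys.index(key)          # like MATCH(..., 0), but 0-based
    return positions[first:first + count]  # like OFFSET with a height of count


# Hypothetical rows, sorted by combined key and then by position
keys = [
    "PRODUCT 1-STORE 1",
    "PRODUCT 2-STORE 1",
    "PRODUCT 2-STORE 2",
    "PRODUCT 2-STORE 2",
    "PRODUCT 2-STORE 3",
]
positions = ["POSITION 1", "POSITION 2", "POSITION 1", "POSITION 4", "POSITION 3"]

print(positions_for(keys, positions, "PRODUCT 2", "STORE 2"))
```

If the query were left unsorted, the slice would pick up rows belonging to other pairs, which is why the sort step matters.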

Combining Non-Zero Values from Multiple Columns into One in Excel

Every month I get given a budget from one of our clients in a Google sheet, which I need to convert into a SQL query so it can be uploaded into our database. As the number of rows and columns changes, I want to write some formula to semi-automate the process for time saving and mistake elimination.
This budget has spends in multiple columns, which I've managed to write formulas to combine into one column, with the correct details in the columns next to it (see example links below).
How I've transformed the data so far
The issue is this budget per country and partner, then has to be split again across multiple options. This leaves me with three columns worth of spend values, that I'd really like to combine back into one column, and ideally skip out all the zero values.
I've found an array formula on this site that will skip the zeroes, but I can't get it to work on more than one column.
=IFERROR(INDEX($U:$U,SMALL(ROW(myRange)*(myRange<>0),SUMPRODUCT(N(myRange=0))+ROWS($1:1))),"")
From this Question's Answer
Is it possible to write a formula, that skips the zero values down one column, and then starts at the next? And that will also allow me to keep the correct matching details from the other columns alongside it, as well as bring in the column headers for the options as entries in a new column?
Thanks
Edit:
Here is the final format I'm looking for:
There is a concatenated field off the end that combines all the columns. Most of the values are populated by various Vlookups, to transform from the text version, into the database IDs, needed to fill the table.
It's also worth saying, that not being able to skip the zeros, is OK, as I can manually delete them fairly easily.
But as the number of countries and partners can and will change, I want the formula to be able to move column at the end of the dataset.

formula that matches against multiple dates and grabs the most recent

I have a data table with two columns. The first column has a list of Project IDs, and the second column has a bunch of dates associated with those projects. A project can have multiple dates associated with it.
I would like to create a separate summary table of two columns. The first column will be a list of unique Project IDs (I've been able to do this with an index/match function). I want the second column to search the dates column and identify the most recent date associated with each project.
Is it possible to create this second column of my summary table using standard excel formulas and without using any VBA? After an hour or two, I'm not convinced that this is possible.
I was hoping that, for a given project ID, there might be a way to do the following:
--> identify the row numbers for all rows that contain a given project ID;
--> use this row number information to grab the corresponding cell values from the dates column (presumably by first constructing a list of cell references)
--> display the max date out of those that are returned.
What my spreadsheet looks like
The AGGREGATE¹ function can quickly calculate a pseudo-MAXIF function.
In E2 as a standard formula,
=AGGREGATE(14, 6, (B$1:INDEX(B:B, MATCH(1E+99,B:B)))/(A$1:INDEX(A:A, MATCH(1E+99,B:B ))=D2), 1)
Fill down as necessary.
Like the SUMPRODUCT function, AGGREGATE benefits from referencing the minimum number of rows necessary. The MATCH(1E+99,B:B) truncates each column referenced by the INDEX function at the extents of the dates in column B.
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
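For checking results, the pseudo-MAXIF can be mirrored in a few lines of Python. This is a sketch with hypothetical names and sample data, not part of the original answer: rows whose Project ID does not match are dropped (in the formula they become division errors that AGGREGATE ignores), and the largest remaining date is returned.

```python
from datetime import date


def latest_date(project_ids, dates, wanted):
    """Return the most recent date recorded for one project, or None."""
    matching = [d for pid, d in zip(project_ids, dates) if pid == wanted]
    return max(matching) if matching else None


# Hypothetical data table: column A (Project IDs) and column B (dates)
project_ids = ["P1", "P2", "P1", "P3", "P2"]
dates = [
    date(2015, 3, 1),
    date(2015, 1, 15),
    date(2015, 6, 30),
    date(2014, 12, 5),
    date(2015, 2, 20),
]

print(latest_date(project_ids, dates, "P1"))
```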

Optimizing multiple-criteria IFs

I'm performing array calculations that are taking a long time to complete. I'd like to optimize my formulas some more. All of the formulas are of the same nature - they perform some high-level function (Average, Slope, Min, Max) across a column of values. However, not all cells in a column are included in the array. I use multiple IF criteria to choose which cells get included. All comparisons are made to the current row. Here's an example of the data:
    A        B           C         D      E
1   Company  Generation  Date      Value  ToCalculate
2   Abc      1           1/1/2010  5.6
3   ...      ...         ...       ...    ...
E would look something like this
{=Average(If(A2=A2:A1000, If(B2=B2:B1000, If(C2 > C2:C1000, D2:D1000))))}
So once E2 is calculated, I have to autofill down column E. Columns F, G, H, ... use the same approach, either selecting different values to operate on or performing a different function. My dataset is quite large, and with only a few of these the spreadsheet takes an hour or more to compute. Every so often I'll add a fourth criterion, all other criteria being the same.
Is there a more efficient way? Some thoughts:
Can I use a single array per column instead of thousands per column?
Can I condense the first three criteria so that the output is row numbers? Perhaps then subsequent formulas won't have to search for multiple criteria but can just perform the function?
Or can I somehow build the criteria up? So a new column returns all rows where the company is the same, another column returns all rows from the first column where generation is the same... and so on.
For the Average you can do without arrays:
=AVERAGEIFS(D2:D$1000,A2:A$1000,A2,B2:B$1000,B2,C2:C$1000,"<="&C2)
As there is also a COUNTIFS and a SUMIFS, I think your slopes could be calculated the same way.
For the rest of the functions (max, min, etc), we should analyze case by case.
I did a slight performance test, and this is apparently better, but of course my datasets are just mocked.
HTH!
Note: Excel 2007 and up only!
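To sanity-check what the AVERAGEIFS formula returns on a small sample, its logic can be sketched in Python. The function name and rows below are hypothetical. The sketch follows the formula as written: the ranges start at the current row (D2:D$1000 shifts its start as it is filled down), and the date test is "<=", so the current row itself is always included.

```python
from datetime import date


def average_ifs(rows, i):
    """Average the values from row i downward that share row i's company
    and generation and whose date is on or before row i's date."""
    company, generation, when, _ = rows[i]
    matching = [
        value
        for c, g, d, value in rows[i:]          # range starts at the current row
        if c == company and g == generation and d <= when
    ]
    return sum(matching) / len(matching)        # never empty: row i matches itself


# Hypothetical rows: (Company, Generation, Date, Value)
rows = [
    ("Abc", 1, date(2010, 1, 1), 5.6),
    ("Abc", 1, date(2009, 6, 1), 4.4),
    ("Abc", 2, date(2009, 1, 1), 9.0),
    ("Xyz", 1, date(2009, 1, 1), 1.0),
]

print(average_ifs(rows, 0))
```

The array formula in the question compares with a strict "greater than", so the two can differ on rows that share the current row's date.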
Edit - Answering your comment.
Without knowing the dimensions of the problem it is difficult to give advice, but I'll risk one anyway:
You could write a VBA function that:
1) Generates a new sheet for each company-generation pair
2) Sorts the data in those sheets by date
3) Adds the formulas to those sheets (no conditionals needed in this context)
4) Recalculates and Gets the results from those formulas and populates the original sheet
5) Deletes the auxiliary sheets
To capture the rows and re-use try this approach:
Sort the data by Company & Generation.
Make a unique list of Companies & generations (use Advanced Filter, Unique Only, Copy)
For each Company-Generation pair in the list, build 2 columns of formulae. The first column gives the count of rows in the data for this pair (use COUNTIFS); the second column gives the first row in the data for this pair (= first row for previous pair + count of rows for previous pair). Then you can use a function like OFFSET to return only the rows of data for the Company-Generation pair and embed this inside the final function/array formula (AVERAGEIFS etc.). You could extend this sort-and-count approach to include dates if you wanted. There is a drawback: if the list of companies and generations changes, you have to change the list of uniques and associated formulas. There are examples of this approach on my website at http://www.decisionmodels.com/optspeedk.htm and http://www.decisionmodels.com/optspeedj.htm
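The sort-and-count steps above can be sketched in Python to show how the two helper columns fit together. This is an illustrative sketch with hypothetical names and data, and it assumes the data is already sorted so that each pair forms one contiguous block.

```python
def block_index(sorted_pairs):
    """Map each (company, generation) pair to (first_row, row_count).

    first_row is built cumulatively, as in the helper columns:
    first row of this pair = first row of previous pair + its count.
    """
    unique_pairs = list(dict.fromkeys(sorted_pairs))  # unique list, order kept
    index = {}
    first = 0
    for pair in unique_pairs:
        count = sorted_pairs.count(pair)   # like COUNTIFS for this pair
        index[pair] = (first, count)
        first += count
    return index


def block_values(values, index, pair):
    """Return only the values for one pair, like OFFSET(first, count)."""
    first, count = index[pair]
    return values[first:first + count]


# Hypothetical data, sorted by company and then generation
pairs = [("Abc", 1), ("Abc", 1), ("Abc", 2), ("Xyz", 1), ("Xyz", 1), ("Xyz", 1)]
values = [5.6, 4.4, 9.0, 1.0, 2.0, 3.0]

index = block_index(pairs)
print(index)
print(block_values(values, index, ("Xyz", 1)))
```

The gain is that each final calculation only touches the rows of its own block instead of testing every row against every criterion.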

Resources