Deleting duplicate data in one of three column in Excel - excel-formula

I tried to use some functions that I found whilst searching to solve my problem, they are slightly modified it to remove duplicate data in a field.
File
Rather than a count of 4, I would like the count of 2 from column J. The information below are my attempts for 4 different sections on the attached document as I always thought the next one would give me the result that I wanted.
H ====I==========J
P13C Body Exterior 4943
P13C Body Exterior 4943
P13C Body Exterior 5122
P13C Body Exterior 5122
=IFERROR(INDEX($K$7:$K$142,MATCH(0,COUNTIFS($H$7:$H$142,B14,$K$7:$K$142,$E$14),0)),"")
as does this
=IFERROR(INDEX($J$7:$J$142,MATCH(,IF(H$7:H$142="P13C",COUNTIF(I7:I142,$J$7:$J$142)),)),"")
and this
=IFERROR(INDEX($K$7:$K$142,MATCH(0,COUNTIF($H$7:$H$142,$K$7:$K$142),0)),"")
This, gives me a 0
=IF($J$7:$J$142>1,IF($K$7:$K$142="20",SUM(IF(FREQUENCY($H$7:$H$142,$H$7:$H$142)>1,1))))
This gives me a DIV error
=SUMPRODUCT(((H7:H142="P13C")*(I7:I142="Body Exterior"))/(COUNTIFS(J7:J142, J7:J142, H7:H142, "P13C", I7:I142,"Body Exterior")+((H7:H142<>"P13C")+(I7:I142="Body Exterior"))))
There are duplicates in $J$7:$J$142, but I only want the one count.

Sort the column J in smallest to largest order
concatenate all three values by using '&' on next column K ---> H&I&J
then use IF and COUNTIF formula on next column L,
Column L:
=IF(K1=K2,COUNTIF(K1,K1:K142),"")
this a easy method to get count 1 for J column

Related

Textjoin values of column B if duplicates are present in column A

I want to consolidate the data of column B into a single cell ONLY IF the index (ie., Column A) is duplicated.
For example:
Currently, I'm doing manually for each duplicated index by using the following formula:
=TEXTJOIN(", ",TRUE,B4:B6)
Is there a better way to do this all at once?
Any help is appreciated.
There may easier way but you can try this formula-
=BYROW(A2:A17,LAMBDA(p,IF(INDEX(MAP(A2:A17,LAMBDA(x,SUM(--(A2:INDEX(A2:A17,ROW(x)-1)=x)))),ROW(p)-1,1)=1,TEXTJOIN(", ",1,FILTER(B2:B17,A2:A17=p)),"")))
Using REDUCE might be possible for a more succinct solution, though try this for now:
=BYROW(A2:A17,LAMBDA(ζ,LET(α,A2:A17,IF((COUNTIF(α,ζ)>1)*(COUNTIF(INDEX(α,1):ζ,ζ)=1),TEXTJOIN(", ",,FILTER(B2:B17,α=ζ)),""))))
For the sake of alternatives about how to solve it:
Using XMATCH/UNIQUE
=LET(A, A2:A17, ux, UNIQUE(A),idx, FILTER(XMATCH(ux, A), COUNTIF(A, ux)>1),
MAP(SEQUENCE(ROWS(A)), LAMBDA(s, IF(ISNA(XMATCH(s, idx)), "", TEXTJOIN(",",,
FILTER(B2:B17, A=INDEX(A,s)))))))
or using SMALL/INDEX to identify the first element of the repetition:
=LET(A, A2:A17, n, ROWS(A), s, SEQUENCE(n),
MAP(A, s, LAMBDA(aa,ss, LET(f, FILTER(B2:B17, A=aa), IF((ROWS(f)>1)
* (INDEX(s, SMALL(IF(A=aa, s, n+1),1))=ss), TEXTJOIN(",",, f), "")))))
Here is the output:
Explanation
XMATCH and UNIQUE
The main idea here is to identify the first unique elements of column A via ux, and find their corresponding index position in A via XMATCH(ux, A). It is an array of the same size as ux. Then COUNTIF(A, ux)>1) returns an array of the same size as XMATCH output indicating where we have a repetition.
Here is the intermediate result:
XMATCH(ux, A) COUNTIF(A, ux)>1)
1 FALSE
2 FALSE
3 TRUE
6 FALSE
7 TRUE
9 TRUE
11 FALSE
12 TRUE
15 FALSE
16 FALSE
so FILTER takes only the rows form the first column where the second column is TRUE, i.e the index position (idx) where the repetition starts. For our sample it will be: {3;7;9;12}.
Now we iterate over the sequence of index positions (s) via MAP . If s is found in idx via XMATCH (also XLOOKUP(s, idx, TRUE, FALSE) can be used for the same purpose) then we join the values of column B filtered by column A equal to INDEX(A,s).
SMALL and INDEX
This is a more flexible approach because in the case we want to do the concatenation in another position of the repetition you just need to specify the order and the formula doesn't change.
We iterate via MAP through elements of column A and index position (s). The name f has the filtered values from column B where column A is equal to a given value of the iteration aa. We need to identify only filtered rows with repetition, so the first condition ROWS(f) > 1 ensures it.
The second condition identifies only the first element of the repetition:
INDEX(s, SMALL(IF(A=aa, s, n+1),1))=ss
The second argument of SMALL indicates we want the first smallest value, but it could be the second, third, etc.
Where A is equal to aa, IF assigns the corresponding value of the sequence (remember IF works as an array formula), if not then it assigns a value that will never be the smallest one, for example, n+1, where n represents the number of rows of column B. SMALL returns the smallest index position. If the current index position ss is not the smallest one, the conditions FALSE.
Finally, we do a TEXTJOIN only when both conditions are met (we multiply them to ensure an AND condition).

2D vector lookup [x;y]

I've got a matrix, with two coordinates [i;j]
I'm trying to automatize a lookup:
As an example, this would have the coordinates of [1;2]
Here's a table of all the coordinates:
So here, obviously [1;2] would equate to 143,33
To simplify the issue:
I'll try to go step by step over what I'm trying to do to make the question bit less confusing.
Think of what I'm trying to do as a function, lookup(i, j) => value
Now, refer to the second picture (table)
I find all rows containing index [i] (inside column C) and then
only for those rows find row containing index [j] (inside column D ∩ for rows from previous step)
Return [i;j] value
So if u invoked lookup(2, 4)
Find all rows matching i = 2
Row 5: i = 2 ; j = 3
Row 6: i = 2 ; j = 4
Row 7: i = 2 ; j = 5
Lookup column j for j=4 from found rows
Found row 6: i = 2 ; j = 4.
Return value (offset for yij column = 143,33)
Now this isn't an issue algorhitmically speaking, but I have no idea how to go about doing this with excel formulas.
PS: I know this is reltively simple vba issue but I would prefer formulas
PSS: I removed what I tried to make the question more readable.
You can use SUMPRODUCT, which return 0 for not found values:
=SUMPRODUCT(($C$4:$C$18=$I4)*($D$4:$D$18=J$3)*$E$4:$E$18)
or AGGREGATE, which returns an error that can be hidden by the IFERROR function:
=IFERROR(AGGREGATE(15,6,(1/(($C$4:$C$18=$I12)*($D$4:$D$18=J$3)))*$E$4:$E$18,1),"")
You can use SUMIFS here assuming you will not have exact duplicate combinations of [i, j]. If you did have a duplicate combination, the amounts will be summed and placed in the corresponding cell
In cell B2 place this equation: =SUMIFS($Q$2:$Q$16,$P$2:$P$16,B$1,$O$2:$O$16,$A2) and drag across and over as needed
IF you want to convert the 0's to blanks you can nest the above formula inside a text formatter like so:
=TEXT([formula], "0;-0;;#")

How do I sum data based on a PART of the headers name?

Say I have columns
/670 - White | /650 - black | /680 - Red | /800 - Whitest
These have data in their rows. Basically, I want to SUM their values together if their headers contain my desired string.
For modularity's sake, I wanted to merely specify to sum /670, /650, and /680 without having to mention the rest of the header text.
So, something like =SUMIF(a1:c1; "/NUM & /NUM & /NUM"; a2:c2)
That doesn't work, and honestly I don't know what i should be looking for.
Additional stuff:
I'm trying to think of the answer myself, is it possible to mention the header text as condition for ifs? Like: if A2="/650 - Black" then proceed to sum the next header. Is this possible?
Possibility it would not involve VBA, a draggable formula would be preferable!
At this point, I may as well request a version which handles the complete header name rather than just a part of it as I believe it to be difficult for formula code alone.
Thanks for having a look!
Let me know if I need to elaborate.
EDIT: In regards to data samples, any positive number will do actually, damn shame stack overflow doesn't support table markdown. Anyway, for example then..:
+-------------+-------------+-------------+-------------+-------------+
| A | B | C | D | E |
+---+-------------+-------------+-------------+-------------+-------------+
| 1 |/650 - Black |/670 - White |/800 - White |/680 - Red |/650 - Black |
+---+-------------+-------------+-------------+-------------+-------------+
| 2 | 250 | 400 | 100 | 300 | 125 |
+---+-------------+-------------+-------------+-------------+-------------+
I should have clarified:
The number range for these headers would go from /100 - /9999 and no more than that.
EDIT:
Progress so far:
https://docs.google.com/spreadsheets/d/1GiJKFcPWzG5bDsNt93eG7WS_M5uuVk9cvkt2VGSbpxY/edit?usp=sharing
Formula:
=SUMPRODUCT((A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($H$1)=4,$H$1&"",$H$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($I$1)=4,$I$1&"",$I$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))))
Apparently, each MID function is returning false with each F9 calculation.
EDIT EDIT:
Okay! I found my issue, it's the /being read when you ALSO mentioned that it wasn't required. Man, I should stop skimming!
Final Edit:
=SUMPRODUCT((RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match5)=4,Match5&"",Match5&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match6)=4,Match6&"",Match6&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match7)=4,Match7&"",Match7&" ")))
The idea is that Header and RETURNSUM will become match criteria like the matches written above, that way it would be easier to punch new criterion into the search table. As of the moment, it doesn't support multiple rows/dragging.
I have knocked up a couple of formulas that will achieve what you are looking for. For ease I have made the search input require the number only as pressing / does not automatically type into the formula bar. I apologise for the length of the answer, I got a little carried away with the explanation.
I have set this up for 3 criteria located in J1, K1 and L1.
Here is the output I achieved:
Formula 1 - SUMPRODUCT():
=SUMPRODUCT((A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($K$1)=4,$K$1&"",$K$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($L$1)=4,$L$1&"",$L$1&" "))))
Sumproduct(array1,[array2]) behaves as an array formula without needed to be entered as one. Array formulas break down ranges and calculate them cell by cell (in this example we are using single rows so the formula will assess columns seperately).
(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))
Essentially I have broken the Sumproduct() formula into 3 identical parts - 1 for each search condition. (A4:G4*: Now, as the formula behaves like an array, we will multiply each individual cell by either 1 or 0 and add the results together.
1 is produced when the next part of the formula is true and 0 for when it is false (default numeric values for TRUE/FALSE).
(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))
MID(text,start_num,num_chars) is being used here to assess the 4 digits after the "/" and see whether they match with the number in the 3 cells that we are searching from (in this case the first one: J1). Again, as SUMPRODUCT() works very much like an array formula, each cell in the range will be assessed individually.
I have then used the IF(logical_test,[value_if_true],[value_if_false]) to check the length of the number that I am searching. As we are searching for a 4 digit text string, if the number is 4 digits then add nothing ("") to force it to a text string and if it is not (as it will have to be 3 digits) add 1 space to the end (" ") again forcing it to become a text string.
The formula will then perform the calculation like so:
The MID() formula produces the array: {"650 ","670 ","800 ","680 ","977 ","9999","143 "}. This combined with the first search produces {TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} which when multiplied by A4:G4
(remember 0 for false and 1 for true) produces this array: {250,0,0,0,0,0,0} essentially pulling the desired result ready to be summed together.
Formula 2: =SUM(IF(Array)): [This formula does not work for 3 digit numbers as they will exist within the 4 digit numbers! I have included it for educational purposes only]
=SUM(IF(ISNUMBER(SEARCH($J$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($K$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($L$1,$A$1:$G$1)),A8:G8))
The formula will need to be entered as an array (once copy and pasted while still in the formula bar hit CTRL+SHIFT+ENTER)
This formula works in a similar way, SUM() will add together the array values produced where IF(ISNUMBER(SEARCH() columns match the result column.
SEARCH() will return a number when it finds the exact characters in a cell which represents it's position in number of characters. By using ISNUMBER() I am avoiding having to do the whole MID() and IF(LEN()=4,""," ") I used in the previous formula as TRUE/FALSE will be produced when a match is found regardless of it's position or cell formatting.
As previously mentioned, this poses a problem as 999 can be found within 9999 etc.
The resulting array for the first part is: {250,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} (if you would like to see the array you can highlight that part of the formula and calculate with F9 but be sure to highlight the exact brackets for that part of the formula).
I hope I have explained this well, feel free to ask any questions about stuff that you don't understand. It is good to see people keen to learn and not just fishing for a fast answer. I would be more than happy to help and explain in more depth.
I start this solution with the names in an array, you can read the header names into an array with not too much difficulty.
Sub test()
Dim myArray(1 To 4) As String
myArray(1) = "/670 - White"
myArray(2) = "/650 - black"
myArray(3) = "/680 - Red"
myArray(4) = "/800 - Whitest"
For Each ArrayValue In myArray
'Find position of last character
endposition = InStr(1, ArrayValue, " - ", vbTextCompare)
'Grab the number section from the string, based on starting and ending positions
stringvalue = Mid(ArrayValue, 2, endposition - 2)
'Convert to number
NumberValue = CLng(stringvalue)
'Add to total
Total = Total + NumberValue
Next ArrayValue
'Print total
Debug.Print Total
End Sub
This will print the answer to the debug window.

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")

how to conditionally match in excel

I've got two data sets: Data-A and Data-B.
Data-A
A B C D Start_Date End_Date
N C P 1 23-05-2015 27-05-2015
N C K 1 30-05-2015 07-06-2015
N C Ke 1 09-06-2015 28-06-2015
N C Ch 1 14-07-2015 25-07-2015
N C Th 1 29-06-2015 13-07-2015
N C Po 2 23-05-2015 27-05-2015
N C Kan 2 30-05-2015 08-06-2015
Data-B
X D Date A B C
444 1 09-07-2015
455 1 20-07-2015
1542 1 28-06-2015
2321 1 21-07-2015
2744 1 01-07-2015
7455 2 25-05-2015
12454 2 02-06-2015
18568 2 24-05-2015
28329 2 03-06-2015
28661 2 31-05-2015
Values is data-Bare missing and I need to fill them using conditional index matching/vlookup such that column D(Data-B) is matched along with Date(Data-B) such that Start Date<= Date <=End Date.
Desired Output:
X D Date A B C
444 1 09-07-2015 N C Th
455 1 20-07-2015 N C Ch
1542 1 28-06-2015 N C Ke
2321 1 21-07-2015 N C Ch
2744 1 01-07-2015 N C Th
7455 2 25-05-2015 N C Po
12454 2 02-06-2015 N C Kan
18568 2 24-05-2015 N C Po
28329 2 03-06-2015 N C Kan
28661 2 31-05-2015 N C Kan
Proof of Concept
In order to achieve the above I used the AGGREGATE function. It is a normal formula that performs array like calculations. The following formula will return the results from the first row that matches your criteria.
=INDEX(A$2:A$8,AGGREGATE(15,6,ROW($D$2:$D$8)/(($J2=$D$2:$D$8)*($E$2:$E$8<=$K2)*($K2<=$F$2:$F$8)),1)-1)
This assumed your table Data-A Started in A1 and included 1 row as a header row. The formula can be place in the first cell under A in Data-B and copied down and to the right as needed.
UPDATE Formula explained
The aggregate function performs array calculations within its brackets for certain sub function. There are about 19 different subfunctions. Subfunction 14 and 15 are both array calculations. This is a nice feature since it does array like calculations while being a regular formula.
Since I wanted the first row that met your criteria, I opted to use the small function or subfunction 15 for the first argument. Basically I am telling the aggregate function to generate a list and sort it in ascending order.
The second argument has a value of 6 which tell the aggregate to ignore any results from the array that generate errors. This will come in very handy if we can make results we do not want turn in to errors.
Now we are getting into the array portion of the formula. You can take this next part of the equation and highlight the appropriate rows in a neighbouring column and enter it as a CONTROL+SHIFT+ENTER (CSE) formula. As long as you do this in the top cell the array formula will propagate to the remainder of the selected cells and show you the results of the array. Also check the formula bar to see if { } appeared around your formula. You cannot add the { } manually.
{=ROW($D$2:$D$8)/(($J2=$D$2:$D$8)*($E$2:$E$8<=$K2)*($K2<=$F$2:$F$8))}
What this will do is determine the current row and then will divide it by the results of our conditions. You can also try each of the following conditions in a separate column as CSE formulas in the same manner described above to see their results.
($J2=$D$2:$D$8)
($E$2:$E$8<=$K2)
($K2<=$F$2:$F$8)
These on their own will provide you with either TRUE or FALSE as it checks each row. Now the interesting thing is, and this applies to excel formulas, when you perform a math operation on a Boolean, it will treat 0 as false and anything other number as TRUE. It will actually convert TRUE to 1. You will also note that each of the logic checks was separated by *. In this case * is acting like an AND operator as only when all results are true will you get an answer of 1. (+ will act like an OR operator)
Now if you remember from earlier 6 said to ignore all errors. So any row that does not meet our logic check will result in a division by 0 since not all logic checks results in TRUE or 1. All the checks that wound up false wind up getting ignored. So now after doing that, a list of only row numbers that met our criteria is left inside the aggregates array.
After the logic check there is a ,1 for the next argument. In this case we are telling the aggregate to return the 1st number in the list which is the first row number that met our criteria. If we wanted the third number, this would be ,3 instead.
So aggregate is returning the first row number of the results we want. When this is paired with an INDEX function, when can use the result to tell us what row of the INDEX function to look in. In this case we said we wanted to look in the index A$2:A$8. The aggregate function is telling us how many rows to go down in the index. If the index had start in row 1 we would not have to do anything. But since there is a header row, we need to adjust the results from the aggregate function by subtracting 1 for the head row (in reality you need to subtract the row number above the start of your data). This is why you see the -1 after the aggregate function.
Now if you pay attention to the lock on the range you will notice I did not lock the A in A$2:A$8. I did this so that I could copy the formula to the right and the column A address would update as I did. This only works because you were keeping the columns in the same order. If the order has changed I would have changed the index from a 1D array to a 2D array and used a MATCH function to line up the column headers.

Resources