Excel - identify duplicates consisting of two columns, in changing order - excel

I am trying to identify the second duplicate combinations of cells across two (or more) columns. Complicating matters, the order of combinations varies.
To illustrate - my data looks something like this:
aaa 111
222 aaa
111 aaa
111 bbb
bbb 111
I'm looking for a formula that for this example would tell me that rows 1, 2 and 4 are the first instances of the combined values, whereas rows 3 and 5 are the second instances.

If you don't mind adding another column to the sheet, you can create an "index" for each row, making sure you always list the columns in the same order. (You can always copy values and delete the intermediary column.)
For example the following formula:
=IF($A1<$B1,CONCATENATE($A1,"!",$B1),CONCATENATE($B1,"!",$A1))
Will create a unique identifier like the following, using a ! character for a delimiter:
aaa 111 111!aaa
222 aaa 222!aaa
111 aaa 111!aaa
111 bbb 111!bbb
bbb 111 111!bbb
Then it's just a matter of using COUNTIF, for example with the index in column C:
=COUNTIF($C$1:$C1,$C1)
This will produce the following sheet, the number being the ordinal 1-based count of how many times that pair has appeared.
aaa 111 111!aaa 1
222 aaa 222!aaa 1
111 aaa 111!aaa 2
111 bbb 111!bbb 1
bbb 111 111!bbb 2
Someone with more time on their hands can probably come up with an array formula to do the same thing in memory.
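For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same idea: normalize each pair so that column order no longer matters, then keep a running count per normalized pair. The function name `pair_occurrences` is hypothetical and used only for illustration.

```python
from collections import Counter

def pair_occurrences(rows):
    """For each row, return how many times its unordered pair has appeared so far."""
    seen = Counter()
    result = []
    for a, b in rows:
        key = tuple(sorted((a, b)))  # same key regardless of column order
        seen[key] += 1
        result.append(seen[key])     # 1 = first instance, 2 = second instance, ...
    return result

rows = [("aaa", "111"), ("222", "aaa"), ("111", "aaa"), ("111", "bbb"), ("bbb", "111")]
counts = pair_occurrences(rows)
```

Sorting the pair plays the role of the IF/CONCATENATE helper column, and the counter plays the role of the expanding COUNTIF range.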

Related

Excel qty IN/qty OUT formula

I need help with a formula calculating qty in and out based on need date where oldest date pulls first if the PN matches an item on the inventory list.
I have a master list of total on-hand inventory.
PN | Total QTY on-Hand
aaa | 10
bbb | 15
ccc | 0
I need to compare the total on-hand inventory list with individual order lines, but I cannot figure out how to get the remaining balance to be the starting balance (QTY OH column) of the next line. Below is what I would like it to look like.
Date Due | PN | QTY OH | QTY Demand | QTY Balance
1/15/2023 | aaa | 10 | 2 | 8
1/17/2023 | aaa | 8 | 3 | 5
1/20/2023 | aaa | 5 | 4 | 1
1/19/2023 | bbb | 15 | 11 | 4
1/25/2023 | bbb | 4 | 6 | -2
1/30/2023 | bbb | -2 | 3 | -5
1/11/2023 | ccc | 0 | 8 | -8
1/16/2023 | ccc | -8 | 7 | -15
1/21/2023 | ccc | -15 | 4 | -19
I have tried helper columns with VLOOKUP (I can't get away from circular references) and multiple IF/AND/OR statements. I can do SUMPRODUCT and SUMIF, but I need to know the qty for each individual demand line, not just the total. Is this something I can do with a formula, or do I need to move to VBA?
Subtract the sum of QTY Demand (via SUMIFS) for the same PN in the rows above from the original qty on hand.
=VLOOKUP(B2,H:I,2,0)-SUMIFS($D$1:D1,$B$1:B1,B2)
Note which references are absolute and which are relative. As the formula is dragged down, the SUMIFS ranges grow so that they only ever refer to the rows above the current one.
Because SUMIFS is conditional on the PN, it does not matter whether the data is sorted by date alone or by PN and date.
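The same running-balance logic can be sketched in Python. This is an illustration only: the function name `running_balances` and the tuple layout are assumptions, and the rows are assumed to already be in the order in which they should draw from stock (the formula above relies on row order in the same way).

```python
from collections import defaultdict

def running_balances(on_hand, orders):
    """orders: (date_due, pn, qty_demand) tuples, in the order they consume stock.

    Returns (date_due, pn, qty_oh, qty_demand, qty_balance) tuples.
    """
    used = defaultdict(int)  # demand already consumed per PN (the SUMIFS part)
    out = []
    for date_due, pn, demand in orders:
        qty_oh = on_hand.get(pn, 0) - used[pn]  # the VLOOKUP minus SUMIFS step
        used[pn] += demand
        out.append((date_due, pn, qty_oh, demand, qty_oh - demand))
    return out

on_hand = {"aaa": 10, "bbb": 15, "ccc": 0}
orders = [
    ("1/15/2023", "aaa", 2),
    ("1/19/2023", "bbb", 11),
    ("1/17/2023", "aaa", 3),
    ("1/25/2023", "bbb", 6),
]
result = running_balances(on_hand, orders)
```

The PNs are deliberately interleaved here to show that the per-PN balance does not depend on the rows being grouped by PN.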

Multiple Unique Lists from Master List in Excel

I have a list of companies with multiple contacts for each company, but I would like to create separate lists with 1 unique contact per company. E.g.
Original list
Name Company Email
John AAA john@aaa.com
David AAA david@aaa.com
Jane BBB jane@bbb.com
Julia CCC julia@ccc.com
Craigh CCC craig@ccc.com
John CCC john@ccc.com
In this case, to have 1 unique record from each company in a separate list, I want to end up with 3 separate lists:
List 1
Name Company Email
John AAA john@aaa.com
Jane BBB jane@bbb.com
Julia CCC julia@ccc.com
List 2
Name Company Email
David AAA david@aaa.com
Craigh CCC craig@ccc.com
List 3
Name Company Email
John CCC john@ccc.com
As you can see in each list there is only 1 record for each company.
Any help on how to do this would be most appreciated.
I have tried the Advanced Filter with "unique records only", selecting the entire original list as the range and setting the company column as the criteria. The plan was to see the unique records, select the visible cells, cut them to a different list, and then re-run the filter to create the next set of unique records, and so on. However, the Advanced Filter doesn't give the expected results.
This might be a silly idea but still worth trying... Add a helper column with Count formula:
=COUNTIF($B$1:B2,B2)
Apply a filter and sort your data set by the fourth (helper) column, smallest to largest.
The resulting groups seem to match your List 1, 2 & 3 output.
Alternatively, if you still want to create separate tables, you can use the helper column and array formulas (Ctrl+Shift+Enter) as per example below:
=IFERROR(INDEX($A$1:$C$7,SMALL(IF($D$1:$D$7=1,ROW($D$1:$D$7)),ROW()-1),COLUMN()-5),"")
Change $D$1:$D$7=1 to =2 and =3 to replicate the 2nd and the 3rd table.
Edit: with additional "Location" column
=COUNTIFS($B$1:B2,B2,$D$1:D2,D2)
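As a cross-check of the helper-column approach, here is a small Python sketch that assigns each record the running count of its company and then groups records by that count. The function name `split_by_occurrence` and the `(name, company)` tuple layout are assumptions made for this illustration.

```python
from collections import defaultdict

def split_by_occurrence(records, key_index=1):
    """Group records by how many times their key (company) has been seen so far."""
    seen = defaultdict(int)    # running count per company, like the COUNTIF helper
    lists = defaultdict(list)  # occurrence number -> records
    for rec in records:
        key = rec[key_index]
        seen[key] += 1
        lists[seen[key]].append(rec)
    return [lists[n] for n in sorted(lists)]

records = [
    ("John", "AAA"), ("David", "AAA"), ("Jane", "BBB"),
    ("Julia", "CCC"), ("Craigh", "CCC"), ("John", "CCC"),
]
list1, list2, list3 = split_by_occurrence(records)
```

Each resulting list holds at most one record per company, which is the property the question asks for.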

Spark file cleansing and transformation

I have a tab-separated (TSV) file with 5 columns. I need to do data scrubbing and transformation.
case 1) Replace the special characters (\001 and \x0D) with ""
case 2) Filter rows that have fewer than 5 columns into a Bad_Row RDD
case 3) Iterate over the Bad_Row RDD and, if the last character of a row is "\n", remove it and append the next row until the column count reaches 5
Sample File Format
------------------------------
one two 12345 four five
aaa ppp 12345 ttt
bbb
ccc rrr 12355
yyy
ddd
eee iii 12845 rrr two
Good_Rows RDD
-------------------------------
one two 12345 four five
eee iii 12845 rrr two
BAD_Row RDD
-------------------------------
aaa ppp 12345 ttt
bbb
ccc rrr 12355
yyy
ddd
Remove "\n" from the second line, append the third line to it, and recalculate the column count. If the column count is 5, treat the merged line as a good row and add it to the Good_Rows RDD.
Sample Code Snippet
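def FilterData(line):
    # A row is "bad" when it does not have exactly 5 tab-separated columns
    return len(line.split("\t")) != 5

textFile1 = sc.textFile("hdfs://localhost:9000/A/test.tsv")
# case 1: str.replace matches literal text, not a regex, so strip each character separately
Clean_RDD = textFile1.map(lambda x: x.replace("\001", "").replace("\x0D", ""))
# case 2: filter (rather than map) keeps only the rows for which FilterData is true
Badrow_RDD = Clean_RDD.filter(FilterData)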
Please help to implement case 3
Thanks
Vishal
Below is the code for your 3rd problem in Scala.
val data=sc.textFile("file:/home/rieter/Test_Streaming/ab.txt").map(x=>x.split(" +").map(x=>x.trim)).filter(x=>(x.size<5)).flatMap(x=>x).collect
val data1=data.take(data.length-(data.length%5))
data1.splitAt(5)
Output
scala> data1.splitAt(5)
res59: (Array[String], Array[String]) = (Array(aaa, ppp, 12345, ttt, bbb),Array(ccc, rrr, 12355, yyy, ddd))
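The same flatten-and-regroup idea can be sketched in plain Python on a list of bad rows that has already been collected to the driver. This is an assumption-laden illustration, not a Spark job: the name `regroup_bad_rows` is hypothetical, the rows are assumed to be in file order and small enough to collect, and any leftover tokens that do not fill a complete row are dropped, as in the Scala snippet. Unlike `splitAt(5)`, it returns every complete group of five rather than only two halves.

```python
def regroup_bad_rows(bad_rows, width=5, sep="\t"):
    """Flatten the fields of consecutive bad rows and regroup them into full rows."""
    tokens = [tok for row in bad_rows for tok in row.rstrip("\n").split(sep)]
    usable = len(tokens) - len(tokens) % width  # ignore an incomplete trailing row
    return [tokens[i:i + width] for i in range(0, usable, width)]

bad_rows = ["aaa\tppp\t12345\tttt", "bbb", "ccc\trrr\t12355", "yyy", "ddd"]
repaired = regroup_bad_rows(bad_rows)
```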

Sum the value in one column based on the value in another column - excel 2013

I have 3 columns: B, C and D.
Columns B, C and D will be updated periodically, and the number of rows will increase day by day.
Column B contains the Name, C contains the Quantity and D holds a Date value.
Name Qt Date
SSS 20
SSS 30
NNN 50
PPP 40 13-Jul-15
PPP 20 13-Jul-15
AAA 20
CCC 100
GGG 300
FFF 200
BBB 50 28-Aug-15
AAA 20
GGG 100
BBB 30
JJJ 50
BBB 30
FFF 50
FFF 25 24-Aug-15
CCC 75
JJJ 30
FFF 50
JJJ 36 24-Aug-15
FFF 50 24-Aug-15
I need the result as below.
Name Qt
AAA 40
BBB 60
CCC 175
FFF 300
GGG 400
JJJ 80
NNN 50
SSS 50
Rows with date field(column D) having a value should be omitted from adding.
Name field can contain any name and future names can vary from current names.
The Result can be in the same sheet or in the different sheet.
This may not meet your exact needs but it should get you the data you want in the first instance.
In Excel select the Insert tab and select Pivot Table.
Set the table range to be B1 to D60000 (or whatever the max number of rows is), so that it covers the Name, Qt and Date columns.
Select New Worksheet and click OK.
Add the Name and Qt fields to the report. It should automatically sum Qt.
Add Date to the Filters. This should add a bar above the pivot table that says something like:
Date (All)
Click on (All) and select (blank), and that should exclude the rows with dates specified.
Use a helper column next to the date column: =ISBLANK(D2) in E2 will return TRUE when the date cell is empty.
Put AAA in G6, and in H6 put the following formula: =SUMIFS(C:C,E:E,"TRUE",B:B,G6)
If there is trash data in the blank cells, like random spaces and whatnot, use =TRIM(D2)="" instead, because ISBLANK returns FALSE for any cell that contains text, even if it is only spaces.
I love pivot tables, but sometimes they just make file sizes too big when a bunch of them are used in a single workbook, and I also get tired of refreshing them.
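For comparison, the rule "sum Qt per Name, skipping rows that have a date" can be sketched in Python as below. The function name `sum_undated` and the `(name, qty, date)` tuple layout are assumptions for illustration, and only a handful of rows from the question are used.

```python
from collections import defaultdict

def sum_undated(rows):
    """Sum quantities per name, ignoring rows whose date field is not blank."""
    totals = defaultdict(int)
    for name, qty, date in rows:
        if not (date or "").strip():  # treat None, "" and stray spaces as blank
            totals[name] += qty
    return dict(totals)

rows = [
    ("SSS", 20, ""), ("SSS", 30, ""),
    ("PPP", 40, "13-Jul-15"),
    ("FFF", 200, ""), ("FFF", 25, "24-Aug-15"), ("FFF", 50, " "),
]
totals = sum_undated(rows)
```

A name whose rows all carry a date, such as PPP here, simply does not appear in the result.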

using Excel functions to count unique values in one column based on match in other column?

I have an Excel question - is there a method of using functions to count all of the unique values of a column based on matching another column's contents with a particular data interest? For example, suppose I have two columns of data.
A | B
toothbrush | AAA
shampoo | AAA
toothbrush | AAA
toothbrush | BBB
conditioner | CCC
toothbrush | BBB
shampoo | CCC
toothbrush | CCC
toothpaste | CCC
toothpaste | AAA
toothbrush | AAA
shampoo | BBB
I would like to generate, on a separate tab, two columns that display the following information, for example:
C | D
toothbrush | 3
toothpaste | 2
shampoo | 3
conditioner | 1
Column D would be the number of unique customers for each of the 4 products.
Is there a way to do this in Column D using a formula composed of SUM, IF, COUNTIF, etc.?
Thanks in advance and please let me know if I can clarify anything further.
Since others may not be so particular about what readily available tools they decide to avoid, create a PivotTable with "Add this data to the Data Model" with A for ROWS and B for VALUES, then change Count of B to Distinct Count of B.
For the list of unique items, this will do the trick (it will produce blank cells as well):
=IF(COUNTIF(Sheet1!$A$1:A1,Sheet1!A1)>1,"",Sheet1!A1)
Put =IF(COUNTIF(Sheet1!$B$1:B1,Sheet1!B1)>1,"",Sheet1!B1) into column E to produce unique customers.
Finally, put this into column D, making sure to extend it to accommodate the number of rows on sheet1 and pressing CTRL+SHIFT+ENTER to make it an array formula:
{=IF(A1="","",
SUM(IF(COUNTIFS(Sheet1!$A$1:$A$12,A1,Sheet1!$B$1:$B$12,$C$1:$C$12)>0,1,0)))}
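To sanity-check the expected numbers, here is a short Python sketch that counts distinct customers per product using sets. The function name `distinct_customers` is hypothetical; the rows are the ones from the question.

```python
from collections import defaultdict

def distinct_customers(rows):
    """Return {product: number of distinct customers who bought it}."""
    customers = defaultdict(set)
    for product, customer in rows:
        customers[product].add(customer)  # a set ignores repeat purchases
    return {product: len(c) for product, c in customers.items()}

rows = [
    ("toothbrush", "AAA"), ("shampoo", "AAA"), ("toothbrush", "AAA"),
    ("toothbrush", "BBB"), ("conditioner", "CCC"), ("toothbrush", "BBB"),
    ("shampoo", "CCC"), ("toothbrush", "CCC"), ("toothpaste", "CCC"),
    ("toothpaste", "AAA"), ("toothbrush", "AAA"), ("shampoo", "BBB"),
]
result = distinct_customers(rows)
```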
