Counting duplicates across multiple columns in excel

I have a spreadsheet that looks like this:
State   City
WA      Seattle
WA      Seattle
WA      Yakama
OR      Portland
OR      Albany
NY      Albany
OR      Portland
I want to count the duplicates, but only where BOTH columns have the same value. I would like the output to look like this:
State   City      Count
WA      Seattle   3
WA      Yakama    1
OR      Portland  2
OR      Albany    1
NY      Albany    1
I know this should be simple, but I am having trouble finding this exact question elsewhere. Thanks!

You have a couple options.
Solution 1: Formulas
First copy your State and City columns to two new columns (D and E, to match the formula), then remove duplicates from them with Data > Remove Duplicates. Next, enter this formula in cell F3 and fill it down:
=COUNTIFS(A:A,D3,B:B,E3)
Solution 2: Pivot Table
Create a PivotTable from your data. Put State and City in Rows, and a count of any field (City, for example) in Values. Then, on the Design tab, set Report Layout to Show in Tabular Form and Repeat All Item Labels, and turn off Grand Totals and Subtotals.
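Outside Excel, the same "count only when both columns match" logic is easy to check. The following is a minimal Python sketch, not part of either Excel solution (the function name `count_pairs` is hypothetical), that counts each (State, City) pair, which is what COUNTIFS does when both criteria must match:

```python
from collections import Counter


def count_pairs(rows):
    """Count each (State, City) pair, like COUNTIFS(A:A,D3,B:B,E3).

    Counter keeps keys in first-seen order, similar to how UNIQUE()
    lists values in order of first appearance.
    """
    return Counter((state, city) for state, city in rows)


rows = [("WA", "Seattle"), ("WA", "Seattle"), ("WA", "Yakama"),
        ("OR", "Portland"), ("OR", "Albany"), ("NY", "Albany"),
        ("OR", "Portland")]
for (state, city), n in count_pairs(rows).items():
    print(state, city, n)
```

With the sample rows exactly as posted, WA/Seattle counts 2 (not 3), and OR/Albany and NY/Albany stay separate because both columns are compared.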

Just for fun, a Microsoft 365 solution (assuming you made a typo in your sample data):
=CHOOSE({1,2,3},UNIQUE(A2:B8),INDEX(UNIQUE(A2:B8),0,2),COUNTIFS(A2:A8,INDEX(UNIQUE(A2:B8),0,1),B2:B8,INDEX(UNIQUE(A2:B8),0,2)))

Related

How do I use ETL when columns and rows are mismatched in a flat file

I have 2 large files, an Excel spreadsheet and a CSV file, which are messed up but still need to be loaded into a table. I'm in the process of learning how to use SSIS. Assume the columns and rows look something like this...
1st Excel spreadsheet (file extension .xlsx)...
ID Name GroupName City Time Price Date
A1 South Group1 London 10/06/2018 $4.50 13.30
A2 North Group2 New York $60 10/07/2018 09:00 AM
Fig 1
2nd file (file extension .csv)...
ID Name GroupName City Date Time Price
A3 East Group3 Paris 09/09/2017 $5.00 03:00 AM
A4 West Group4 Berlin 01/05/2018 $12.50 18:00
Fig 2
If you look at ID A2 in Fig 1, you will see 09:00 under Date and AM in a different column. How do you solve a problem like that? This is only an example; the Time data randomly lands in a different column in each row. Also note A4 in Fig 2, where the Time and Price values are in each other's columns.
I am familiar to a degree with the Script Task and Foreach Loop Container.
I searched online and found this website....
It's sort of what I am looking for.
For now, a table has been created with these column names:
ID, Name, GroupName, City, Date, Time and Price.
So ideally when data is loaded into the table it should look like this...
ID Name GroupName City Date Time Price
A1 South Group1 London 10/06/2018 13.30 $4.50
A2 North Group2 New York 10/07/2018 09:00AM $60
A3 East Group3 Paris 09/09/2017 03:00AM $5.00
A4 West Group4 Berlin 01/05/2018 18:00 $12.50
I am not sure how to approach this.
Please note: I just want to know which SSIS Toolbox components I need to use. Once I know, I will attempt to solve the problem myself. That's why there is no code example.
Thanks in advance.
Update
Thanks, Hadi. If nobody minds, I will keep this thread open and update it once SSIS is fully available in VS 2019 and I have had the chance to find a solution.

I don't think there is an easy solution for that, but I will try to give some suggestions:
Convert the Excel file into a CSV file
In the Flat File Connection Manager, define only one column, of type DT_STR with length = 4000
In the Data Flow Task, add a Script Component to split each line, validate each column value, and assign it to the relevant output column
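SSIS Script Components are written in C# or VB.NET, so the following is only an illustration in Python of the per-row logic such a component would need. It assumes each line has already been split into cells, that the first four cells are always ID, Name, GroupName and City, and that Date, Time and Price follow in any order and can be recognized by format (nn/nn/nnnn dates, `$` prices, HH:MM or HH.MM times, with AM/PM possibly spilled into its own cell). These formats are guesses from the sample rows, and the function name `normalize_row` is hypothetical.

```python
import re

# Formats inferred from the sample rows (assumption): nn/nn/nnnn dates,
# $-prefixed prices, and HH:MM or HH.MM times.
DATE_RE = re.compile(r"^\d{2}/\d{2}/\d{4}$")
PRICE_RE = re.compile(r"^\$\d+(\.\d{2})?$")
TIME_RE = re.compile(r"^\d{1,2}[:.]\d{2}$")
MERIDIEM = {"AM", "PM"}


def normalize_row(cells):
    """Map one row of cells to ID, Name, GroupName, City, Date, Time, Price."""
    # The first four fields are assumed to always be in a fixed position.
    row = dict(zip(["ID", "Name", "GroupName", "City"], cells[:4]))
    rest = [c.strip() for c in cells[4:] if c.strip()]
    i = 0
    while i < len(rest):
        cell = rest[i]
        if DATE_RE.match(cell):
            row["Date"] = cell
        elif PRICE_RE.match(cell):
            row["Price"] = cell
        elif TIME_RE.match(cell):
            # Glue an AM/PM that spilled into the next cell back onto the time.
            if i + 1 < len(rest) and rest[i + 1].upper() in MERIDIEM:
                cell += rest[i + 1].upper()
                i += 1
            row["Time"] = cell
        else:
            # Unrecognized value: fail loudly instead of loading bad data.
            raise ValueError(f"Unrecognized value {cell!r} in row {row.get('ID')}")
        i += 1
    return row


print(normalize_row(["A2", "North", "Group2", "New York", "$60", "10/07/2018", "09:00", "AM"]))
```

Classifying by format rather than by position is what lets the same code handle both figures; rows that match none of the patterns raise an error so they can be set aside for review rather than loaded silently.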
You can refer to the following answers to learn more. They contain helpful information on reading data from a flat file when the data is not well structured (even if it is not exactly the same case):
SSIS ragged file not recognized CRLF
How to load unstructured flat file with uneven space as delimeter? And also file contain two header
SSIS reading LF as terminator when its set as CRLF

Unsure which function would work in Excel (IF-THEN, VLOOKUP, etc)

I am coding something for a yearly tournament I do. Scores need to be listed in 2 spots on the form, but to avoid mistakes I don't want to enter them manually in both spots.
Buffalo 1 Detroit 2
Carolina 4 Los Angeles 6
Chicago 2 Nashville 0
Colorado 3 New York 1
Is there a way to code another cell to find "Buffalo" (for example) in either column A or column C and return the value directly to its right (in column B or D)? Because the teams listed above may switch around when I make the game schedule, I need the second set of scores to be "smart": they should find "Buffalo" in either of those columns and return the correct value.
I've been doing some trial and error using different functions and haven't been able to figure it out yet.
Thanks in advance for your help!

Use SUMIF():
=SUMIF($A$1:$C$4,G2,$B$1:$D$4)
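Note that the two ranges are the same size but offset by one column, so a match in column A returns the value in column B, and a match in column C returns the value in column D.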
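To see why the offset ranges work, here is a minimal Python sketch (the function name `sumif_offset` is hypothetical, and the rows are assumed to hold team, score, team, score in columns A-D as in the sample) that mimics =SUMIF($A$1:$C$4,G2,$B$1:$D$4): every cell in the criteria range that equals the team name contributes the cell one column to its right.

```python
def sumif_offset(grid, team):
    """Mimic =SUMIF($A$1:$C$4, team, $B$1:$D$4) on rows of [A, B, C, D]."""
    total = 0
    for row in grid:
        for col in range(3):           # criteria range: columns A..C
            if row[col] == team:
                total += row[col + 1]  # same position in the range shifted one column right
    return total


scores = [
    ["Buffalo", 1, "Detroit", 2],
    ["Carolina", 4, "Los Angeles", 6],
    ["Chicago", 2, "Nashville", 0],
    ["Colorado", 3, "New York", 1],
]
print(sumif_offset(scores, "Buffalo"))
```

If a team appears more than once in the range, SUMIF (and this sketch) adds all of its scores together, so each team should appear only once per schedule block.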

Excel: Matching 1500 names (Column A) with Supervisors (Column B) and placing Identifier in Corresponding Column C,D,E,etc

I have a list of names with each person's direct supervisor, and I am trying to expand it to show the employees several layers down who work under the people on that list.
Column C of the linked image returns a result of 1-7 by matching the names in column A against the list of 7 names. This shows that "HB" works for "SW", but "SW" works for "ZJ", so "HB" is technically under "ZJ" too. What I am hoping to accomplish is a result similar to the one shown below (or something else that shows the employees under each supervisor). As shown below, the data contains many names that are not being searched for but still need to be mapped to the ones that are. Currently there are 1,500 employee names and a list of 143 supervisors who repeat across employees; 7 of those supervisors are the ones being searched for.
Names to look for (fictional names used for this scenario):
Sam W. (SW) 1
Robert R. (RR) 2
Kegan G. (KG) 3
Isiah B. (IB) 4
Orville E. (OE) 5
Robert J. (RJ) 6
Zach J. (ZJ) 7
Column A (Superv.)  Column B (Employ.)  Column C  Column D  Column E
HB PJ 7
SW HB 1 7
BE JR 2
HB IL 1 7
IL AP 1
BE WP 2
RR BE 2
KG JW 3
JW JH 3
ZJ SW 7
These results would then be used to create lists of employees under a certain person.
One thing I'm not sure affects this is how the names are constructed in the workbook. For example, Sam W. is listed as "Wilson, Sam" in the workbook.
Of course, if there is an easier way to achieve the final result, I wouldn't mind a different format from what I currently have. If anyone has an idea how to achieve this, please respond. If there are more specific details about the workbook I could supply to help resolve this more quickly, please let me know. Thank you.

Well, if I understand what you're asking, you need to normalize your data. In this case, that means creating two tables linked by an employee code that you'll need to create.
Once you have these two tables, you can easily perform any query and summary report you need.

Give a range that contains the 7 names, with their 7 numbers next to them, the name LUarray. Then in C3, copied down to suit:
=IFERROR(VLOOKUP(A3,LUarray,2,0),"")
and in D3, copied down to suit:
=IFERROR(VLOOKUP(INDEX(A:A,MATCH(A3,B:B,0)),LUarray,2,0),"")
I think something similar (but maybe a lot longer!) would work for Column E, but I don't have time for that at present.
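Each chained VLOOKUP goes up one level per column. As a rough sketch of the general idea in Python rather than Excel (the names `target_chain`, `supervisor_of` and `targets` are hypothetical), walking up the supervisor chain from each employee finds every searched-for supervisor at any depth. It assumes each employee has exactly one direct supervisor and that names are written the same way everywhere (e.g. always "Wilson, Sam"), and it uses the Superv./Employ. pairs from the sample.

```python
def target_chain(employee, supervisor_of, targets):
    """Return the numbers of every searched-for supervisor above `employee`,
    nearest first, by walking up the supervisor chain."""
    found = []
    seen = set()
    boss = supervisor_of.get(employee)
    while boss is not None and boss not in seen:  # `seen` stops a loop on bad data
        seen.add(boss)
        if boss in targets:
            found.append(targets[boss])
        boss = supervisor_of.get(boss)
    return found


# (supervisor, employee) pairs from the sample table
pairs = [("HB", "PJ"), ("SW", "HB"), ("BE", "JR"), ("HB", "IL"), ("IL", "AP"),
         ("BE", "WP"), ("RR", "BE"), ("KG", "JW"), ("JW", "JH"), ("ZJ", "SW")]
supervisor_of = {emp: sup for sup, emp in pairs}
targets = {"SW": 1, "RR": 2, "KG": 3, "IB": 4, "OE": 5, "RJ": 6, "ZJ": 7}

for emp in sorted(supervisor_of):
    print(emp, target_chain(emp, supervisor_of, targets))
```

Grouping employees by each number in their result then gives the list of everyone under each of the 7 supervisors, however many layers down.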

Display text as values in pivot table

I have a dataset sample shown below:
Name Area Term Score
AA SW Summer A
AB NW Spring B
AC SW Winter D
AD NW Spring C
I need a pivot table that displays text as values, laid out like this:
Grade  Summer  Spring  Winter
A      AA
B              AB
C              AD
D                      AC
The grid should display the name(s) of each person who achieved the respective grade in a specific term. For example, name "AA" would appear in the cell where Grade A meets the Summer term.
I have found many examples online, but nothing specifically for what I need. I am using Excel 2010.
I would be very grateful if anyone can help me
Thanks

The easiest way to do it is:
Set up a pivot table on a new tab with
rows - Score, Name
columns - Term
values - Count of Name
To the right of the pivot table, set H5 to =IF(C5>0,$B5,"")
Copy that formula across your range.
Hide whichever pivot table columns you don't want displayed.
There are more complicated ways, but that's the easiest.
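For comparison, here is a minimal Python sketch of the same text-valued pivot (the function name `text_pivot` is hypothetical): instead of counting names, it places them in a Score x Term grid. Unlike the pivot-plus-IF approach, where each name gets its own row, this sketch joins several names in one cell with commas, which is a design choice rather than what Excel does.

```python
from collections import defaultdict


def text_pivot(records, terms):
    """Place names in a Score x Term grid, joining several names with ", ".

    Assumes every term in `records` appears in `terms`.
    """
    grid = defaultdict(lambda: {term: "" for term in terms})
    for name, _area, term, score in records:
        cell = grid[score][term]
        grid[score][term] = f"{cell}, {name}" if cell else name
    return {score: grid[score] for score in sorted(grid)}


data = [("AA", "SW", "Summer", "A"), ("AB", "NW", "Spring", "B"),
        ("AC", "SW", "Winter", "D"), ("AD", "NW", "Spring", "C")]
for score, row in text_pivot(data, ["Summer", "Spring", "Winter"]).items():
    print(score, row)
```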

excel group data on one page allowing for sorting but also updating (prefer no VBA)

I would like to create a spreadsheet that has 3 types of data (list of students, list of teachers, list of parking), for example:
Students Grade Parking_Lot
Joe A 1
Carl B 2
Teachers Class Parking_Lot
Mr. Bob Science 1
Ms. Ann Math 2
Name Parking_Lot Position
Joe 1 Student
Carl 2 Student
Mr. Bob 1 Teacher
Ms. Ann 2 Teacher
I don't want to enter names more than once. I figured I could edit the students and teachers tables and create the parking table automatically. But then I can't sort the parking table, because the entries are all formulas. I'd like to be able to sort by parking lot or by name. I would also like to be able to change the parking lot from any of the tables.
Is there any way to accomplish this without VBA? Am I thinking about this wrong?
Thanks,
Nachum

Example with a PivotTable (PT), using Count of Name for Σ Values:
Unfortunately, referencing the parking lots with just 1 and 2 makes the PT more difficult to interpret.
Position, Name and Grade are Row Labels (in that order from the top), Parking Lot and Class are Column Labels, with Count of Name for Σ Values. I chose Type for the Report Filter and removed all Field Subtotals (click on each Field if necessary and select None) and in PivotTable Options, Totals & Filters removed the Grand Totals by unchecking Show grand totals for rows and Show grand totals for columns. Note that the PT is driven by the source data – not vice versa. Changes to the source data will be reflected in the PT (after Refresh) but you can’t edit data cells for fields in the row, column, or page area of the PT.
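The parking list in the question is really a view derived from the two source tables, which is also what the PT is. As a minimal Python sketch of that idea (the function name `parking_view` is hypothetical, and the table layouts are assumed from the question), the combined list is rebuilt from the sources and sorted on demand, so each name is entered only once:

```python
def parking_view(students, teachers, sort_by="Parking_Lot"):
    """Build the combined Name / Parking_Lot / Position list from the source tables.

    students: list of (name, grade, lot); teachers: list of (name, class_name, lot).
    """
    rows = [{"Name": n, "Parking_Lot": lot, "Position": "Student"} for n, _grade, lot in students]
    rows += [{"Name": n, "Parking_Lot": lot, "Position": "Teacher"} for n, _cls, lot in teachers]
    # sorted() is stable, so ties keep their original students-then-teachers order.
    return sorted(rows, key=lambda r: r[sort_by])


students = [("Joe", "A", 1), ("Carl", "B", 2)]
teachers = [("Mr. Bob", "Science", 1), ("Ms. Ann", "Math", 2)]
for row in parking_view(students, teachers, sort_by="Name"):
    print(row)
```

Like the PT, this view is read-only: to change someone's parking lot, edit the Students or Teachers source and rebuild the view.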
