Hi, I have a really large data set in Excel (~43,000 rows) where the first two columns hold the employee ID and name. The following 17 columns hold monthly sales from Jan 13 to May 14. The problem is that the data is pretty disorganized and repeats the same employee in different rows (e.g. employee Mary has several rows, each with different data). Is there a quick Excel function I can use to sum the 17 columns after the employee ID and name wherever the ID number and name match, essentially combining the data belonging to one employee into a single row? I think the SUMIF function might work, but I don't know how to make the criteria general enough that Excel will go through the entire spreadsheet and find the two matching indicators in each row. I appreciate any input/help!
Sort by ID (which I assume is a unique identifier) then apply Subtotal for each change in ID, to each of the seventeen columns.
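If you would rather use the SUMIF the question mentions than Subtotal, here is one way to sketch it. It assumes the raw data sits on a sheet named Data (a hypothetical name) with IDs in column A, names in column B and the 17 months in columns C to S. Paste a de-duplicated copy of the ID and name columns onto a second sheet (Data > Remove Duplicates will do that), then enter this in C2 of that sheet:

```excel
=SUMIF(Data!$A:$A,$A2,Data!C:C)
```

Filled right to column S and down the list, it totals each month for each ID. If the ID alone is not a reliable key, a SUMIFS variant matches on both ID and name:

```excel
=SUMIFS(Data!C:C,Data!$A:$A,$A2,Data!$B:$B,$B2)
```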
I have a table containing some football data, such as country, league, teams and standings-table information: total matches played, wins, draws, losses, goals scored and conceded, and so on.
Here's a file download link
It contains two sheets.
First sheet is STANDINGS_EXTENDED:
I need to fill these 3 tables with the data contained in another STANDINGS worksheet.
Here's a screenshot of the STANDINGS sheet:
My aim is that once I fill in the LeagueId and GroupId (optional) fields, all three tables are populated with data as in this sample.
I wonder if it is possible to achieve this without VBA, but I have no clue where to start. I have tried several different ways, but I only get the first result from the STANDINGS worksheet for any league I enter.
Looking forward to your help.
Thank you!
UPDATE:
So far I could get the count of rows related to Overall, Home and Away using these formulas:
=COUNTIFS(STANDINGS!E:E;STANDINGS_EXTENDED!E1;STANDINGS!F:F;"StandingsOverall")
=COUNTIFS(STANDINGS!E:E;STANDINGS_EXTENDED!$E$1;STANDINGS!F:F;"StandingsHome")
=COUNTIFS(STANDINGS!E:E;STANDINGS_EXTENDED!$E$1;STANDINGS!F:F;"StandingsAway")
Also, what I can get is the first row of these results using this formula:
=VLOOKUP($E$1;STANDINGS!$E:$V;4;FALSE)
What I need to figure out is how to modify the above formulas so that I can fill the tables with the remaining rows.
In order to do this you need a formula in every single field of your 3 tables that link it to data on the Standings tab. That would be 13 x 3 x 20 formulas. Therefore one would try to create formulas that can be copied, in the best case less than 13 original ones, but definitely one formula for each field.
Each formula would look for a unique identifier in the Standings list. I can't see any unique identifiers there but you might create them by concatenation, such as "League" + "Country" + "Position". The more detail you need the larger the formula. The key is: without a unique identifier for each row you can't retrieve data. But once a row has been identified you can get the value from any of its columns.
If your tables sometimes have 12 rows, sometimes 20, and sometimes 25 you must provide space for the possible maximum and then design your formulas to return a blank if there is nothing to display.
In conclusion, the core of your system is in the Standings table. It must be set up so that data can be retrieved from it. Ideally, your selection on the Standings Extended sheet would generate a concatenated unique identifier for a list to which you can add the fixed number in the Pos column to identify individual rows in the Standing table. As long as you can't identify rows no data can be retrieved.
Using VBA gives you more flexibility but doesn't relieve you of the task to create uniquely identifiable rows.
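As a rough sketch of the concatenated-key idea, suppose a spare column W on the STANDINGS sheet is used as the helper. Suppose also that column G holds the position (this is an assumption, since the layout is not shown in the question), alongside the LeagueId in E and the table type in F that the question's COUNTIFS formulas already use. The helper in W2, filled down, would be:

```excel
=E2&"|"&F2&"|"&G2
```

A cell in the Overall table on STANDINGS_EXTENDED could then look up its row by key, with the Pos value assumed to be in $A4 and the wanted field assumed to be in STANDINGS column H:

```excel
=IFERROR(INDEX(STANDINGS!H:H;MATCH($E$1&"|StandingsOverall|"&$A4;STANDINGS!$W:$W;0));"")
```

IFERROR returns a blank when no row exists for that position, which covers leagues with fewer teams than the table has rows. The semicolons follow the list separator used in the question; with other regional settings they would be commas.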
I am trying to figure out how to best do this. I have a sheet which has about 44 columns and around 64,000 rows. The columns have different customer data points such as name, date of birth, phone number, and e-mail (these are the most relevant columns for my purposes). I was wondering how I could sort by or highlight the rows in which at least three column data points match, to show a duplicate record for a customer. To explain clearly, I only want to highlight the rows that are duplicates based on at least 3 columns (the name column (the constant) and either phone number or DOB or e-mail.)
For example:
In the above, John Smith matched based on DOB alone, Lisa Winters based on email, and Stephanie Wright based on both DOB and email.
Now that I am looking at it more I will combine first and last name into one column so it will only have to match 2 or more columns instead of three.
I posted on Super User and all I got was COUNTIFS, which seems like a start, but I seem to need to incorporate AND/OR logic as well?
Any help with specific formulas is greatly appreciated!
Just for comparison, this would be the array-type approach. As @Luuklag rightly says, it could be slow with 64K rows of data, although it does give complete results:
=SUMPRODUCT(($A2<>"")*($A2=$A$2:$A$10)*($B2=$B$2:$B$10)*SIGN((($C2=$C$2:$C$10)+($D2=$D$2:$D$10)+($E2=$E$2:$E$10))))>1
So this tests all rows to see if there is more than one which agrees with the current row on last name, first name, and one of DOB, phone and email, assuming your data is in the first five columns and omitting any rows where last name is blank. Adjust ranges to suit.
That is too slow on 64K rows. COUNTIFS is somewhat faster:
=(COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$C$2:$C$64000,$C2)
+COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$D$2:$D$64000,$D2)
+COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$E$2:$E$64000,$E2))>3
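One caveat with summing the three counts: COUNTIFS treats a criterion that points at an empty cell as the value 0. A row with a blank DOB, phone or email therefore contributes 0 rather than 1 for that field, so a genuine duplicate can end up with a total of exactly 3 and be missed. A variant that tests each count separately avoids this. It is a sketch under the same assumed layout (last name in A, first name in B, and DOB, phone and email in C to E):

```excel
=OR(COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$C$2:$C$64000,$C2)>1,
    COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$D$2:$D$64000,$D2)>1,
    COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$E$2:$E$64000,$E2)>1)
```

Each count includes the current row, so any count above 1 means at least one other row shares the name and that field.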
Sort your data by name, then add a helper column that flags each row as a duplicate (1) or not (0).
You could simply use a formula in F2 like:
=IF(AND($A2=$A1,$B2=$B1,OR($C2=$C1,$D2=$D1,$E2=$E1)),1,0)
This will give you 1's in column F for rows that duplicate the row above on both first and last name and on at least one other column. This isn't a completely ideal solution, of course, as it doesn't always catch a duplicate. For example:
Suppose there are three entries with the same name. The first has all other fields populated. The second has only name and email, so it is flagged as a match to the first. The third has only name and DOB, so it is not flagged as a match to the second, because only the names match, even though it duplicates the first entry.
To get around this you would need INDEX(MATCH()), but that is quite a burden on your PC, especially if you are going to use it recursively on 64K entries.
This sounds simple, but I'm getting a real headache trying to figure it out:
I have two tables in excel in the same workbook but on different sheets. I want to count unique items in column B on the first table that meet a criteria (based on the data that's in column A of that table) and appear on the second table (on a different worksheet).
Because the data I'm working with is confidential, I've made up the two tables below (I've just clipped .jpgs). They are similarly formatted, but in reality I have much more data.
I need a formula that counts the number of unique people in Column B of Table 1 who also appear in Column B of Table 2 and whose date (in Column A of Table 1) is on or before 4/2/2016.
In this example it should come out with the answer three (for Bob, Jim, and Sue).
Table 1
Table 2
Any help you can provide would be hugely appreciated!!
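Put this in C2 of the first table, fill it down, and then sum column C at the bottom:
=IF(AND(COUNTIF('Second Table'!$B:$B,'First Table'!$B2)>0,$A2<=DATE(2016,4,2),SUMIF($B$1:$B1,$B2,$C$1:$C1)=0),1,"")
The SUMIF part checks whether the same name has already been counted in a row above, so each person is counted only once.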
I apologize if the question sounds vague, but let me explain. I have a spreadsheet with 20 rows that each consist of a column for employee name, ID, Calls taken MTD, calls dropped MTD, and satisfaction rate respectively. I'm trying to create separate cells which will list the top 3 employees with the highest satisfaction rate. Since this spreadsheet is updated daily, I'm trying to create a formula which will list the top three for me as opposed to manually typing each time.
So basically instead of returning the cell containing the second highest value of satisfaction rate (which I did with =LARGE()), I am trying to return the name of the employee corresponding to that cell. That is where I am stuck. Any help would be appreciated. I'm just trying to do this in Excel only, not with VBA.
You can do that with LARGE, INDEX and MATCH (assuming you don't have two employees with the same rate):
=index(employees,match(satisfaction_rate,satisfaction_rates,0))
i.e.
=index(employees,match(large(satisfaction_rates,2),satisfaction_rates,0))
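For the top three specifically, here is a sketch assuming the 20 employees occupy rows 2 to 21, with names in column A and satisfaction rates in column E (the question lists the columns in that order but does not give cell addresses):

```excel
=INDEX($A$2:$A$21,MATCH(LARGE($E$2:$E$21,ROWS($1:1)),$E$2:$E$21,0))
```

Entered in one cell and filled down two more, ROWS($1:1) evaluates to 1, 2 and 3 in turn, so the three cells return the names with the highest, second-highest and third-highest rate. If two employees share a rate, MATCH finds the first of them both times, so the same name would appear twice.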
Good day to anyone who can help!
I have two long columns in excel of employee names stretching over 1000 in each one. They are not in any order.
One column shows a list of employees who worked for the company 10th January and the other column shows who works for the company 10th February. Now there will be leavers and there will be new starts so the names and amount of employees will have changed.
Is there a way in an excel spreadsheet to highlight this difference? Whether it highlights all the names that match or it highlights the names that don't. I need a way of finding out the difference between the two columns to show essentially who has left the company as keeping this record of who has left isn't available. All I can get is a list of current employees and when they have started. I need to find who has left in between the two dates.
I hope this makes sense.
Many Thanks
If each list contains only uniques, then Conditional Formatting with two rules may suit:
ColumnA: =COUNTIF(B:B,A1)<>1
ColumnB: =COUNTIF(A:A,B1)<>1
each applied to the occupied range of the respective column.
I would recommend using VLOOKUP twice. It searches a given list for a value and returns something if the value is found. If it does not find the value, it returns #N/A (shown as #NV in German-language Excel).
So you create two new columns: the first contains a VLOOKUP against the January list, the second one against the February list.
The rest is conditional formatting.
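A minimal sketch of those two helper columns, assuming the January names are in column A and the February names in column B, both starting in row 2. In C2, filled down the length of the January list:

```excel
=IF(ISNA(VLOOKUP(A2,$B:$B,1,FALSE)),"left","")
```

In D2, filled down the length of the February list:

```excel
=IF(ISNA(VLOOKUP(B2,$A:$A,1,FALSE)),"new","")
```

The first flags names present in January but missing in February, and the second flags new starters. Exact-match lookups are sensitive to stray spaces and spelling differences, so the names may need cleaning first.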