Most Common Value within Sublists - excel

I apologize in advance if this is unclear; I will try to explain everything as best I can! I am working with a data set in Google Sheets in which column A is a list of student IDs and column B is a list of student behaviors. It looks something like this:
A (ID) | B (Behaviors)
12345  | Talking
54321  | Out of Seat
98765  | Lying
12345  | Talking
12345  | Lying
98765  | Lying
The data set is quite large because it contains recorded data from the entire school population over the course of the year, and, as you can see, the entire student population is pooled in one list. I am looking for a way to find each student's (identified by their ID) most commonly assigned behavior. For example, for the data above, student 12345 would have 'Talking' listed as their most common behavior and student 98765 would have 'Lying' listed as their most common behavior.
Ideally, I want to create a separate spreadsheet that looks something like this:
A (ID) | B (Most Common Behavior)
12345  | Talking
98765  | Lying
54321  | Out of Seat
Column A would list all the students' IDs and column B their most common behavior.
I found that I could use this formula:
=INDEX(Behaviors,MODE(MATCH(Behaviors,Behaviors,0)))
To pull out the most common value from the column containing scholar behaviors. However, this formula gives me the most common behavior across the entire student population, so I would like to modify it so that it first looks at the student ID and then finds the most common behavior within that sublist.
Please let me know if you require any further information. Thanks in advance for your help!

Are you familiar with PivotTables? You could create a PivotTable with ID as a row label, Behavior as a column label, and a count of Behavior as the value. Then it would just be a matter of copying/pasting those values and using a MAX formula to find each student's greatest behavior count.
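The PivotTable idea (count each behavior per ID, then take the maximum) can also be sketched outside the spreadsheet to check the logic. The Python below is a minimal illustration, not a Sheets formula; the name `most_common_by_id` and the `SAMPLE` list are hypothetical, with the rows copied from the question's example. When two behaviors tie for a student, this sketch keeps the one recorded first.

```python
from collections import Counter, defaultdict

# Sample (ID, behavior) rows copied from the question's example.
SAMPLE = [
    ("12345", "Talking"),
    ("54321", "Out of Seat"),
    ("98765", "Lying"),
    ("12345", "Talking"),
    ("12345", "Lying"),
    ("98765", "Lying"),
]

def most_common_by_id(rows):
    """Map each student ID to their most frequently recorded behavior."""
    # One Counter per ID plays the role of one PivotTable row of counts.
    counts = defaultdict(Counter)
    for student_id, behavior in rows:
        counts[student_id][behavior] += 1
    # most_common(1) is the "MAX" step; equal counts keep first-seen order.
    return {sid: c.most_common(1)[0][0] for sid, c in counts.items()}

for student_id, behavior in most_common_by_id(SAMPLE).items():
    print(student_id, behavior)
```

For the sample rows this prints 12345 Talking, 54321 Out of Seat and 98765 Lying, matching the desired output table.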

Related

Is there a logical function in excel to extract unique text values from a range of similar texts?

I am working on a dataset with thousands of rows of text entries captured in different styles, like in the table below:
**School Name**
Abirem school
Abirem sec School
Abirem Secondary school
Abirem second. School
Metropolitan elementary
Metropolitan Element.
Metropolitan ele
I need help extracting the unique values within each group of similar entries, regardless of the style in which they were entered. The output I want should look like this:
**School Name**
Abirem school
Metropolitan elementary
I have tried the functions EXACT, UNIQUE, and MATCH, and even XLOOKUP (with the wildcard option), but none of them gives me the output I want.
Is there a logical function that can be used?
This will prove to be tricky. Excel would not know whether or not two different names that look similar are actually meant to be the same. Even for us humans it isn't trivial. I mean, would School1 ABC be the same as School1 DEF or not? Without knowing the geographical locations of these two schools, they could well be two different schools that happen to share the first word of their names.
Either way, if you are willing to accept this ambiguity, you could match on the first word of each entry and return only the first entry that matches:
Formula in C1:
=LET(a,A1:A7,UNIQUE(XLOOKUP(TEXTSPLIT(a," ")&" *",a&" ",a,,2)))
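The first-word heuristic behind that formula can be sketched in Python for illustration. This is a sketch under stated assumptions rather than a translation of XLOOKUP: `unique_by_first_word` is a hypothetical name, it compares first words case-insensitively (Excel's wildcard lookups are also case-insensitive), and it keeps the first entry seen for each first word. It inherits the same ambiguity described above: any two schools sharing a first word collapse into one.

```python
def unique_by_first_word(names):
    """Keep the first entry for each distinct first word, ignoring case."""
    first_seen = {}
    for name in names:
        words = name.split()
        if not words:
            continue  # skip blank cells
        # Only the first word decides which group an entry belongs to.
        first_seen.setdefault(words[0].casefold(), name)
    return list(first_seen.values())

schools = [
    "Abirem school",
    "Abirem sec School",
    "Abirem Secondary school",
    "Abirem second. School",
    "Metropolitan elementary",
    "Metropolitan Element.",
    "Metropolitan ele",
]
print(unique_by_first_word(schools))
```

With the sample list this prints ['Abirem school', 'Metropolitan elementary'], the output the question asks for.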

Sum of Questionnaire Scores Based on a Domain Table

I have created a questionnaire of around 100 questions. Participants fill it in online, and the items are shuffled each time. The items are divided into 6 domains; for the sake of easier understanding, let's just call them Domain 1 to 6.
I have them listed in a table called "Correspondence", in a format like the one below:
(An example)
Question No.|Domain
   1  |Domain A
   2  |Domain C
   3  |Domain A
   4  |Domain B
   5  |Domain A
   6  |Domain C
I used Google Forms to generate a spreadsheet of the respondents' raw data, which records the raw score for each item in a separate column:
(An example)
Submission ID|Question 1|Question 2|Question 3|Question 4|Question 5|Question 6
Participant 1 |  2  |  3  |  5   |  1   |  2   |  4   |
Participant 2 |  5  |  4  |  5   |  3   |  5   |  1   |
Participant 3 |  1  |  1  |  1   |  2   |  2   |  2   |
The next thing I need to do is generate another table that sums up the domain totals for each participant. From the example above, I need to sum questions 1, 3 and 5 as Domain A, question 4 as Domain B, and questions 2 and 6 as Domain C:
(An example)
Participant 1
    |Domain A|Domain B|Domain C|
Total |  9  |  1  |  7   |
The hardest part is finding a proper method to kick-start this process. Can anyone point me in the right direction? Either formulas or VBA would be fine. Thanks!
This can be done if you are able to create a helper row.
First, I created a table linking each question to its domain, named "Correspondence" in my example. This table is essentially the answer key: from your description of the problem, you need a table like this to establish which question belongs to which domain/category/point system you want to use.
I then created a helper row for the survey results, shown on row 9. Cell B9 contains =INDEX($B$3:$C$8,MATCH(B$10,$B$3:$B$8,0),2), which looks up the domain for the question in B10. This row sits immediately above the questions in the example, but you can put it on a separate sheet if needed.
Then you can just sum them up.
=SUMPRODUCT(SUMIFS(INDIRECT(MATCH($E3,$A:$A,0)&":"&MATCH($E3,$A:$A,0)),$9:$9,F$2))
This formula uses MATCH, which returns an integer, inside INDIRECT as a dynamic row reference. If the participant names are not unique, MATCH only finds the first one, so the totals for any duplicates will be wrong. The SUMIFS inside the SUMPRODUCT allows the row to be treated like an array without entering an array formula. So you can recreate the example I have and copy/paste or drag the formulas as you wish.
A different approach would be to sum up the points per question first and then do the conversion from question to domain. That way you never have to manipulate the raw data, just the reports. That may actually be the better approach for you.
Edit: Added information about the formulas and the example.
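The helper-row idea (tag each question with its domain, then sum by tag) is easy to check against the totals in the question. The Python sketch below is illustrative only; `CORRESPONDENCE`, `RAW`, and `domain_totals` are hypothetical names, the scores are the sample values from the question, and it assumes each participant's scores are already stored in question order starting at question 1 (so shuffled form items must have been mapped back to their question numbers).

```python
from collections import defaultdict

# Question number -> domain (the "Correspondence" table from the question).
CORRESPONDENCE = {1: "Domain A", 2: "Domain C", 3: "Domain A",
                  4: "Domain B", 5: "Domain A", 6: "Domain C"}

# Raw scores per participant, listed in question order (sample data).
RAW = {
    "Participant 1": [2, 3, 5, 1, 2, 4],
    "Participant 2": [5, 4, 5, 3, 5, 1],
    "Participant 3": [1, 1, 1, 2, 2, 2],
}

def domain_totals(scores, correspondence):
    """Sum one participant's scores per domain (helper row + SUMIFS)."""
    totals = defaultdict(int)
    for question_no, score in enumerate(scores, start=1):
        # Look up the question's domain, as the INDEX/MATCH helper row does.
        totals[correspondence[question_no]] += score
    return dict(totals)

for participant, scores in RAW.items():
    print(participant, domain_totals(scores, CORRESPONDENCE))
```

For Participant 1 this gives Domain A = 9, Domain B = 1 and Domain C = 7, the same totals as the example table in the question.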

INDEX function to locate customer ID when entering Name

I've recently started working with Excel because I now run my own business. As with anything I do, I want my logs to be practical, efficient and, most of all, working correctly. I'm almost satisfied with what I have so far, but I can't seem to figure out how to let Excel look up a customer ID.
Basically what I want is:
In the first sheet, I add a customer by name in column B; their assigned customer ID is in column A.
In sheet 2, I type the customer's name in column E, and I want Excel to look up that name in sheet 1 and then put the related customer ID in column A of sheet 2.
The reason for this is that I have returning customers and I don't want them to have a new customer ID, I want them to have the same ID as they had previously without going through all my customers to look if they are a returning customer and if so what their customer ID is.
I've been playing around with the INDEX function as that seemed to be the function used for this kinda stuff, but I just can't figure it out.
I look forward to hearing your tips and tricks in regards to this issue, thanks in advance!
Marc
In Sheet 2, cell A2:
=INDEX(Sheet1!A:A,MATCH(E2,Sheet1!B:B,0),1)
This answer would work for you assuming Sheet1 stores ID in A, name in B, and your second sheet has you type the name in E.
One caveat: matching on typed names is extremely error-prone, since it has to be an exact match. Consider using data validation or a more robust solution in the medium term.
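The INDEX/MATCH lookup above behaves like a dictionary lookup keyed on the name: the first exact match wins, and a missing name gives #N/A. A minimal Python sketch of that behavior follows; `build_index`, `lookup_id`, and the customer IDs and names are hypothetical. Like MATCH, it ignores letter case; trimming surrounding spaces is an extra step MATCH does not do, added here because stray spaces are a common reason typed names fail to match.

```python
def build_index(customers):
    """Index (customer_id, name) rows, like Sheet1 columns A:B, by name."""
    index = {}
    for customer_id, name in customers:
        # setdefault keeps the first ID for a name, as MATCH(..., 0) does.
        index.setdefault(name.strip().casefold(), customer_id)
    return index

def lookup_id(index, name):
    """Return the customer ID for a typed name, or None (Excel shows #N/A)."""
    return index.get(name.strip().casefold())

# Hypothetical customer list.
customers = [(1001, "Jansen"), (1002, "de Vries"), (1003, "Bakker")]
index = build_index(customers)
print(lookup_id(index, "Bakker"))
print(lookup_id(index, "Bakkerr"))
```

The second lookup returns None because of a single typo, which is exactly the fragility the caveat describes.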

How to retrieve a matched substring where a string matches a list of potential substrings

Long-time user of Stack Overflow, though I couldn't find a clear answer to this.
OBJECTIVE: Next to each product, I need to find and retrieve the matching brand, where the brand name is a substring of the product and the brand list is in another worksheet.
- Worksheet 1 (Products): a single column with 76,000 rows
- Worksheet 2 (Brands): a single column with 2,000 brands and growing
Sample Data:
Products
COLUMN A
ANGEL BRAND BLACK OLIVES 2KG
ANMOL SALT TABLE 750GR
76,000 others
Brands
COLUMN A
ANTONELLI
AH
AHG
So the result should fetch a brand from a growing list and dynamically place it in the Products Worksheet:
A1: PRODUCT | B1: BRAND
A2: ANGEL BRAND BLACK OLIVES 2KG | B2: ANGEL
A3: ANMOL SALT TABLE 750GR | B3: ANMOL
I have searched a number of forums and am convinced that an INDEX/MATCH array formula is what I need, but I can't seem to get the syntax right... I might need to include a SEARCH in there somewhere.
This was the closest thing I found to what I need though couldn't exactly make it work for me: How to find if substring exists in a list of strings (and return full value in list if so)
Thank you for your patience in my explanation... I'll get better at this!
UPDATE: This kinda does what I am after, though it takes quite a while to refresh and it only picks up the first couple of letters rather than the entire word:
=IFERROR(INDEX(BRANDS!A:A,MATCH(TRUE,INDEX(ISNUMBER(SEARCH(BRANDS!A:A,A2)),,),0)),"No Match Found")
If the brand name is always included in the product name, you can try:
=INDEX(Brands!$A:$A,MATCH(1,IF(ISERROR(SEARCH(Brands!$A:$A,A2)),0,1),0))
Enter this formula in B2 as an array formula, using Ctrl+Shift+Enter.
With a set-up like the one in the question, it will give you what you want. Take note, though, that recalculation may take a while since you are dealing with a lot of data.
Another option would be to do this with a macro, which you'll need to run every time you open the workbook.
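IMPORTANT: This will fail, though, if you have brand names like BLACK or OLIVES that also appear elsewhere in product names, because the formula returns the first brand in the list that is found anywhere in the product.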
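The MATCH(1,IF(ISERROR(SEARCH(...)),0,1),0) pattern returns the first brand, in list order, that appears anywhere in the product text. The Python sketch below mirrors that for illustration; `find_brand` and the extended brand list are hypothetical (ANGEL and ANMOL are added so the sample products have a match). The optional `whole_word` flag is not part of the formula answer: it is an alternative that only accepts a brand surrounded by word boundaries, which stops short brands such as AH from matching inside longer words, though it still cannot tell a real brand from a word like BLACK.

```python
import re

def find_brand(product, brands, whole_word=False):
    """Return the first brand (in list order) found in product, else None."""
    text = product.casefold()  # SEARCH is case-insensitive too
    for brand in brands:
        needle = brand.strip().casefold()
        if not needle:
            continue  # an empty string is "in" every product, so skip it
        if whole_word:
            # Alternative, not in the formula: require word boundaries.
            if re.search(r"\b" + re.escape(needle) + r"\b", text):
                return brand
        elif needle in text:
            return brand
    return None

# Hypothetical brand list: the question's sample plus two matching brands.
brands = ["ANTONELLI", "AH", "AHG", "ANGEL", "ANMOL"]
print(find_brand("ANGEL BRAND BLACK OLIVES 2KG", brands))
print(find_brand("ANMOL SALT TABLE 750GR", brands))
```

Because the loop stops at the first hit, the order of the brand list matters in the same way it does for MATCH.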

Can't get INDEX/MATCH functions to do what I need?

It's quite an in-depth Excel sheet (to me), so here is a link to it: https://dl.dropboxusercontent.com/u/19122839/Movies.xlsm
On the Filters sheet, I have a search feature. This allows you to put in different genres, years, etc. and will pull up results.
The genre part does not seem to be working correctly for some reason.
In the movie_genres sheet, the Genre Equals and Genre Count columns seem to be marking the information correctly, but on the Movies sheet, the Matches Genre column does not. I use this function:
=INDEX(Genres[Genre Count],MATCH(Movies[[#This Row],[ID]],Genres[ID],0))
To me, this should pull the Genre Count, but when a movie has more than one genre (I used Blank Check as an example), it doesn't mark it as a 1. How can I correct this?
For example, if you add Comedy as a second genre, it pulls up more results than if you only have Family. I think I just need a fresh pair of eyes on this and it is probably something dumb, but any help would be great.
I believe I need to make it so that the index/match function I use in Movies[Matches Genre] will work as long as there is a 1 in Genres[Genre Count] for that ID. It only seems to work if there is a 1 in the first instance of the ID.
EDIT: I have added in a COUNT feature to better explain what I am talking about. With only Family as a genre, it shows there are 10 results, but when you add Comedy as a second genre, you get 40 results. This number should never go up as you add genres.
Perhaps try using SUMIF like this ([@ID] is shorthand for Movies[[#This Row],[ID]] when the formula is inside the Movies table):
=SUMIF(Genres[ID],[@ID],Genres[Genre Count])
If one movie might have several 1s but you only want a maximum of 1, change it to:
=IF(SUMIF(Genres[ID],[@ID],Genres[Genre Count])>0,1,0)
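The difference between the two approaches shows up once a movie has several rows in the Genres table: INDEX/MATCH only ever reads the first row for an ID, while SUMIF adds up all of them. A small Python sketch of both behaviors follows; the `GENRES` rows and the function names are hypothetical, with each row being a (movie ID, Genre Count) pair.

```python
# Hypothetical Genres rows: (movie ID, Genre Count), one row per genre.
# Movie 1 matches on its second genre only; movie 3 matches on none.
GENRES = [(1, 0), (1, 1), (2, 1), (3, 0)]

def index_match(rows, movie_id):
    """INDEX/MATCH: return the Genre Count of the first row for the ID."""
    for row_id, count in rows:
        if row_id == movie_id:
            return count
    return None  # Excel would show #N/A

def sum_if(rows, movie_id):
    """SUMIF(Genres[ID], id, Genres[Genre Count]): add up every row."""
    return sum(count for row_id, count in rows if row_id == movie_id)

def matches_genre(rows, movie_id):
    """IF(SUMIF(...)>0,1,0): 1 if any genre row for the ID matches."""
    return 1 if sum_if(rows, movie_id) > 0 else 0

for movie_id in (1, 2, 3):
    print(movie_id, index_match(GENRES, movie_id), matches_genre(GENRES, movie_id))
```

Movie 1 is the Blank Check situation from the question: the first-match lookup reports 0 even though its second genre matches, while the SUMIF-based version reports 1.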
