Consolidate table using Soundex

I have a large table of about 500 similar-sounding entries, with the order numbers next to them, e.g.:
Benzin 3000
Ben 209
benziloate 800
...
Cret 20
cretis 333
etc.
Please note that the first three letters are not always the same, but the names sound similar.
I have been trying to figure out a way to find all the similar-sounding entries, collate them into one entry, and add up their order values.
I have tried Soundex, since it matches letters that sound alike, but I have not been able to adapt it so that it groups more than one entry together and shows the sum of their orders.
Does anyone have suggestions for an easier way of doing this?
Thank you very much in advance.
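A minimal Python sketch of the grouping step, assuming the table can be read as (name, orders) pairs. It implements standard American Soundex and sums the orders per code; `soundex`, `consolidate` and the `key_len` parameter are illustrative names, not from any library. Truncating the code with `key_len` is one hedged way to loosen the match, since the full 4-character code is fairly strict.

```python
from collections import defaultdict

# Standard American Soundex digit groups; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of `name`, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it is kept
    return code.ljust(4, "0")


def consolidate(rows, key_len=4):
    """Group (name, orders) pairs by a Soundex prefix and sum their orders.

    key_len=4 uses the full code; smaller values make the grouping looser.
    Returns {key: ([names in the group], total orders)}.
    """
    names = defaultdict(list)
    totals = defaultdict(int)
    for name, orders in rows:
        key = soundex(name)[:key_len]
        names[key].append(name)
        totals[key] += orders
    return {key: (names[key], totals[key]) for key in names}


# Sample rows taken from the question.
SAMPLE = [("Benzin", 3000), ("Ben", 209), ("benziloate", 800),
          ("Cret", 20), ("cretis", 333)]

if __name__ == "__main__":
    for key, (members, total) in consolidate(SAMPLE, key_len=2).items():
        print(key, members, total)
```

On the sample rows, the full code keeps all five names apart (Benzin is B525, Ben is B500, benziloate is B524, Cret is C630, cretis is C632), while `key_len=2` merges them into B5 with 4009 orders and C6 with 353. A shorter key also merges more unrelated names, and Soundex always keeps the first letter, so entries whose first letters differ never share a group; both limits are worth checking against the real 500 rows.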

Related

How do I get the top N results based on wildcard criteria via a formula in Excel 2019?

Since I've exhausted just about every resource I could find on this question, I figured it was finally time to ask this community.
I have a very large (15k+ row) dataset, and I'm looking to generate a report of the 25 largest values in one of the columns. However, there are additional criteria to consider beyond the values in that one column. I have done this already with fewer criteria, but adding more is giving me trouble.
My (working) formula for Top N with some criteria:
{=LARGE(IF('IMPORTED DATA'!$X$4:$X$1048576 = IF('Data Cleanup'!$AX$3 = 1, "Gaming Designed", "Not Gaming Designed"), 'IMPORTED DATA'!$BH$4:$BH$1048576), ROW(A2) - ROW(A$1))}
The issue comes when I need to add another criterion that uses wildcard characters to pick out the 'correct' rows. Here is what I've come up with so far, but with it the COUNTIF portion always evaluates to TRUE, so the added criterion is never actually applied:
{=LARGE(IF(COUNTIF('IMPORTED DATA'!$P$6:$P$1048576, IF('Data Cleanup'!$AX$3 = 1, "?????", "????")) * ('IMPORTED DATA'!$X$6:$X$1048576 = IF('Data Cleanup'!$AX$3 = 1, "Gaming Designed", "Not Gaming Designed")) * ('IMPORTED DATA'!$E$6:$E$1048576 <> "All Other (Suppressed)"), 'IMPORTED DATA'!$BH$6:$BH$1048576), ROW(A2) - ROW(A$1))}
I tried to work around IF not accepting wildcard characters by using COUNTIF, but to no avail.
I understand that this is a bit of a rough question, but I'll do my best to respond to as many questions as I can to help clarify.
A couple more bits of information that may be helpful:
This is entirely based in Excel 2019. I know that FILTER would be an easy solution, but I don't have access to it in this version of Excel.
The reason for using wildcards is that they were the easiest way to distinguish between the two categories to sort: above or below 100Hz. Anything under 100Hz is 4 characters long, while anything above is 5.
I also need other data from the same row as the results, so any method must also work with MATCH, so that I can look up the rest of the row with the same search parameters.
It's very hard to understand without seeing the data.
What I understood is that making a helper column in the dataset for the criteria you want would solve your problem.
At least, that's how I do it.
You need to create a ranking column in the data sheet.
Ranking formula: =COUNTIFS($M$3:$M$233,">="&M3,$K$3:$K$233,K3)
With this formula you can add as many criteria as you want.
Index formula (an array formula, so confirm it with Ctrl+Shift+Enter in Excel 2019): =INDEX($K$3:$K$233,MATCH(1,($K$3:$K$233=$B$1)*($N$3:$N$233=A3),0))
Change the column references to match your data.
There's no need for ROW() functions; a simple sequence (1, 2, 3, ...) will work.
Good luck
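To make the helper-column answer concrete, here is a Python sketch of what the COUNTIFS ranking and the INDEX/MATCH lookup compute, assuming the values (column M), the groups (column K) and the labels to return are plain lists; `group_rank` and `lookup` are hypothetical names. It compares every pair of rows, just as COUNTIFS does, so it illustrates the logic rather than being fast.

```python
from typing import Hashable, List, Optional, Sequence


def group_rank(values: Sequence[float], groups: Sequence[Hashable]) -> List[int]:
    """Mimic =COUNTIFS(M:M, ">="&M3, K:K, K3) for every row.

    A row's rank is the number of rows in the same group whose value is
    greater than or equal to its own, so the largest value gets 1.
    """
    return [
        sum(1 for v2, g2 in zip(values, groups) if g2 == g and v2 >= v)
        for v, g in zip(values, groups)
    ]


def lookup(labels: Sequence[str], groups: Sequence[Hashable],
           ranks: Sequence[int], group: Hashable, k: int) -> Optional[str]:
    """Like INDEX(labels, MATCH(1, (groups=group)*(ranks=k), 0)): first match."""
    for label, g, r in zip(labels, groups, ranks):
        if g == group and r == k:
            return label
    return None  # MATCH would return #N/A here
```

One consequence of the `>=` test: tied values share the larger count, so if two rows tie for the top of a group, both get 2 and no row gets 1. A lookup for rank 1 then fails, which in the sheet shows up as #N/A.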
I ended up solving this with a very simple method, thanks to Scott Craner's comment.
Since wildcards don't work in IF statements, using LEN did the trick. The final formula ended up being:
{=LARGE(IF((LEN('IMPORTED DATA'!$P$6:$P$30000)=IF('Data Cleanup'!$AX$3 =1,5,4)) * ('IMPORTED DATA'!$X$6:$X$30000 = IF('Data Cleanup'!$AX$3 = 1, "Gaming Designed", "Not Gaming Designed")) * ('IMPORTED DATA'!$E$6:$E$30000 <> "All Other (Suppressed)"), 'IMPORTED DATA'!$BH$6:$BH$30000), ROW(A2) - ROW(A$1))}
The LEN test is wrapped in its own parentheses because Excel evaluates * before =; without them, LEN(...) is compared against the whole product, which lets blank rows through.
Thank you to everyone for your help!
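The LEN-based filter can be sketched in Python to make the logic explicit: keep rows whose column P has the right length, whose column X matches, and whose column E is not suppressed, then take the largest BH values. The `Row` fields and `top_n` are hypothetical names for the sheet columns, and the 4-versus-5 character rule is taken from the question. Returning whole rows instead of bare values also covers the need to pull other columns from the same row.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    # Hypothetical field names standing in for the sheet's columns.
    p: str     # column P: frequency label (4 chars below 100Hz, 5 above)
    x: str     # column X: "Gaming Designed" / "Not Gaming Designed"
    e: str     # column E: "All Other (Suppressed)" rows are excluded
    bh: float  # column BH: the value being ranked


def top_n(rows: List[Row], above_100hz: bool, n: int = 25) -> List[Row]:
    """Return the n rows with the largest BH that pass all three criteria."""
    want_len = 5 if above_100hz else 4
    want_x = "Gaming Designed" if above_100hz else "Not Gaming Designed"
    matches = [
        r for r in rows
        if len(r.p) == want_len          # LEN(P) test instead of a wildcard
        and r.x == want_x
        and r.e != "All Other (Suppressed)"
    ]
    # Sorting whole rows keeps the other columns attached to each result.
    return sorted(matches, key=lambda r: r.bh, reverse=True)[:n]
```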

How to count multiple occurrences in a set list?

I am trying to build a simpler query that combines VLOOKUP/INDEX MATCH and COUNTIF. So far I have used COUNTIF, but it requires a lot of manual work, especially when the data is split up.
I have attached sample data below. I want to know how many times "Elizabeth" has done chats, phones, and/or cases over a time period.
As you can see, Elizabeth appears twice in C4:C45. I want to know how many Chats she has done across the two rows she appears in (the list goes on forever). The answer should be something like: Elizabeth: Chat 3, Phone 6, Cases 1
Any help would be very much appreciated!
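The sample data is not included here, so this Python sketch assumes each row holds an agent name followed by Chat, Phone and Cases counts; `activity_totals` is a hypothetical helper. It sums those columns over every row for one name, which in the sheet corresponds to one SUMIF per activity column with C4:C45 as the criteria range.

```python
from collections import Counter
from typing import Iterable, Tuple


def activity_totals(rows: Iterable[Tuple[str, int, int, int]], name: str) -> Counter:
    """Sum the Chat, Phone and Cases counts over every row for `name`."""
    total = Counter()
    for agent, chat, phone, cases in rows:
        if agent == name:
            # Counter.update adds to existing counts instead of replacing them.
            total.update({"Chat": chat, "Phone": phone, "Cases": cases})
    return total
```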

Ranking and then displaying results

I'm trying to display names from top to bottom based on a ranking, but I can't find a way to pull the info I need. I looked into VLOOKUP, but the order of the cells doesn't work for me.
Pretty simple: I have rep, number of calls, and a rank based on most calls, and I want to display the names in order of rank. I want it to look like this:
I know it's a simple solution, but I am stuck.
Thinking ahead, how would I handle ties, for example when RANK.EQ gives me two people in 4th position?
Thank you!
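In V2 and W2 respectively:
=SMALL(T$2:T$8, ROW(1:1))
=INDEX(Q:Q, AGGREGATE(15, 6, ROW($2:$8)/(T$2:T$8=V2), COUNTIF(V$2:V2, V2)))
Fill down.
SMALL lists the ranks in ascending order. In W2, ROW($2:$8)/(T$2:T$8=V2) gives a row number wherever the rank matches and a #DIV/0! error elsewhere; AGGREGATE(15, 6, ...) is SMALL with errors ignored, and COUNTIF(V$2:V2, V2) counts how often the current rank has appeared so far, so two people tied for 4th are returned one after the other in sheet order.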
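The same ordering can be sketched in Python to show how ties are handled: `rank_eq` follows RANK.EQ's descending rule (1 plus the number of strictly larger values), and `names_by_rank` sorts by rank with a stable sort, so tied reps stay in sheet order, much like the COUNTIF occurrence counter does. Both helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_eq(calls: Sequence[int]) -> List[int]:
    """Descending RANK.EQ: 1 + number of reps with strictly more calls."""
    return [1 + sum(1 for other in calls if other > c) for c in calls]


def names_by_rank(names: Sequence[str], ranks: Sequence[int]) -> List[Tuple[int, str]]:
    """List (rank, name) pairs in rank order; ties keep their sheet order."""
    order = sorted(range(len(names)), key=lambda i: ranks[i])  # sorted() is stable
    return [(ranks[i], names[i]) for i in order]
```

With hypothetical reps Ann, Bo, Cy and Di on 5, 9, 5 and 7 calls, the ranks are 3, 1, 3, 2 and the listing is Bo, Di, Ann, Cy: Ann and Cy share 3rd place and appear in their original order.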

Excel data set with non-numeric data - filtering? Counting? What?

I'm dealing with a massive data set of academic book publications from a research e-repository, with over 100 000 rows. I've been asked to pull the number of publications per Publisher and create a Top 20 from the data set. We're only interested in counting titles from the Social Sciences classification. But the data set allows up to 4 different classifications, so a Social Sciences publication could be listed in the Classification 1, 2, 3, or 4 column. For example:
Example of Publishers and the Classification columns
So I want the final product to count rows where there's a value of "Social Sciences" in any of Columns B:E, and then group those rows by the value in Column A. I feel like the answer is right in front of my face, but I just can't see it. I've tried pivot tables, filters, COUNTIF, COUNTA, and nothing seems to give me what I want in a clean way. I know a distant colleague did it with the same type of data set 2 years ago, but I haven't heard back from them yet.
Any help is greatly appreciated. Thanks.
EDIT: My attempts at advanced filtering seem to get rid of everything. Here's an example:
Advanced Filter attempt 1
So it turns out the reason that my initial filtering wasn't working was because I was forgetting the '=' alongside the text requirement. Silly, I know, but it's fixed now and works beautifully. Thanks for helping me talk it out.
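For reference, the count the question describes can be sketched in Python, assuming each row is (publisher, classification 1, 2, 3, 4): a row counts once if any classification equals the subject exactly, the counts are grouped by publisher, and the result is trimmed to the top 20. `top_publishers` is a hypothetical name. Counting each row once also avoids double counting a title that lists Social Sciences in two columns, which summing one count per column would do.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def top_publishers(rows: Iterable[Sequence[str]], subject: str = "Social Sciences",
                   n: int = 20) -> List[Tuple[str, int]]:
    """Count rows per publisher whose Classification 1-4 includes `subject`.

    Each row is (publisher, class1, class2, class3, class4); the test is an
    exact match, and a row counts once however many columns match.
    """
    counts = Counter(publisher for publisher, *classes in rows if subject in classes)
    return counts.most_common(n)
```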

Google Drive (Excel): Using SUMIF for each unique string value without writing a separate SUMIF for each

Okay, I looked for about 30 minutes before realizing that I don't know the proper terms to use for this question... please help. If it matters, I'm using Google Docs, which as far as I can tell is almost always compatible with Excel.
I have a spreadsheet that has two columns that I care about, Column B contains names, and Column T contains numeric values that I need to add together depending on the name in Column B.
="Reon: " & SUMIF('Form Responses'!B2:B10000,"Reon",'Form Responses'!T2:T10000)
I could write a few hundred SUMIF formulas like the one above, and add a new one every time a new name submits the form for the first time, but that is manual and would take forever. Is there a way to get the unique values of Column B and feed them into this formula to produce a list resembling:
Name1: 247, Name2: 698, Name3: 420
The exact formatting doesn't really matter, as long as each name is displayed with its number rather than as a long string of indecipherable numbers. The generated list will be read by a real live person.
Thanks for any and all help you can provide.
Pnuts, thank you for the wonderful links!
Here's the code I used, based on them:
Column One
=UNIQUE('Form Responses'!B2:B1000)
Column Two
=SUMIF('Form Responses'!B2:B1000,UNIQUE('Form Responses'!B2:B1000),'Form Responses'!T2:T1000)
This gave me two columns with the names and their corresponding totals. Thanks so much!
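The UNIQUE plus SUMIF pattern can be sketched in Python, assuming columns B and T are read into two parallel lists; `sum_by_name` and `format_totals` are hypothetical helpers. Names are kept in first-seen order, as UNIQUE returns them, and blank cells are skipped. The second function produces the "Name1: 247, Name2: 698" style list from the question.

```python
from typing import Dict, Sequence


def sum_by_name(names: Sequence[str], values: Sequence[float]) -> Dict[str, float]:
    """Total `values` per distinct name, in order of first appearance."""
    totals: Dict[str, float] = {}
    for name, value in zip(names, values):
        if name == "":
            continue  # skip blank rows, like empty cells after the last response
        totals[name] = totals.get(name, 0) + value
    return totals


def format_totals(totals: Dict[str, float]) -> str:
    """Render totals as 'Name1: 247, Name2: 698' for a human reader."""
    return ", ".join(f"{name}: {total}" for name, total in totals.items())
```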
