Looking up a value in a range and checking a second value in the row above - excel

I have a calendar that uses dynamic, dependent drop-down menus.
Each column represents a date.
In each column, there can be up to 30 evaluations written over 3 rows:
- Subject
- Group - Teacher
- Type of evaluation
At the very bottom, there is a second table that counts how many evaluations each group will have that day. However, groups are sometimes split (e.g., group 301 will split into Music and Dance classes). So in that second table, group 301 will have one total for the Dance group and another for the Music group.
If they all followed the same schedule, it'd be easy to simply have a formula like: =COUNTIF(E$4:E$113,"301").
I could use a REALLY long formula along the lines of =COUNTIF(E$4:E$113,"301")-COUNTIFS(E4:E4,"301",E3:E3,"musique")-COUNTIFS(E7:E7,"301",E6:E6,"musique")-....., but with 30 entries, this doesn't seem efficient at all.
I'm looking into INDEX, MATCH, OFFSET, etc., but can't seem to pin down the right formula. The goal would be to look for the group number (e.g., 301) and check whether the previous row holds the keyword I want to subtract from that total (e.g., remove 301 music's evaluation from the total for group 301 taking dance class).
I can't use VBA, because teachers will be using the online version of this form, so I have to stick to formulas.
Any help would be appreciated (file can be found here).
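The check described above (match the group, then test the cell one row up) can be sketched in Python to make the logic concrete. This is only an illustration of the idea, not a formula from the workbook: the sample column and the function name are made up. In Excel, the same pairing would presumably be a single COUNTIFS over two equally sized ranges offset by one row, such as COUNTIFS(E$4:E$113,"301",E$3:E$112,"musique"), though that is untested against the actual file.

```python
def count_group_with_label_above(column, group, label):
    """Count cells equal to `group` whose cell directly above equals `label`."""
    # zip(column, column[1:]) pairs each cell with the cell one row below it,
    # i.e. (cell above, current cell), like two ranges offset by one row.
    return sum(
        1
        for above, cell in zip(column, column[1:])
        if cell == group and above == label
    )


# Hypothetical column for one date: subject, group, evaluation type, repeated.
column = [
    "musique", "301", "test",
    "danse", "301", "quiz",
    "math", "302", "test",
    "musique", "301", "exam",
]

total_301 = column.count("301")
music_301 = count_group_with_label_above(column, "301", "musique")
dance_total = total_301 - music_301  # evaluations left for the non-music half
```

Because the two ranges are compared position by position, one expression covers all 30 entries instead of one COUNTIFS term per entry.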

Related

Lookup several keywords in a column, if found, return 4 columns

I've got two databases, one with 11,000 entries, and another that I've narrowed down to around 600. They are databases with Company name, and contact information for people at that company. So column A - Company Name. Column B - last name, column C - first name, column D - position, and column E - email address. What I'd like to do is search column D - position for several keywords - vice, benefit, resource, and if found, return the four columns - last name, first name, position and email address. So for company name X, we might have 10 contact names, and a couple may have contact info for people that fall into those keywords. I'd like to return those specific people.
I've managed to get the formatting between the company names normalized between the two lists, using brute force, and some index matching formulas (that was fun!), so those are the same, and I could probably do something like add 5 or 6 rows after each unique company name to accommodate the potential number of contacts we might have for each company, but I have no idea how to return multiple specific cells for a keyword search.
I think something like this might work -
=index(columntoreturn, small(if(isnumber(search(keywords, columntosearch)), match(row(column), row(column))), rows(array)))
But that will only return an individual cell, rather than the four I would need.
Here's an example of the two databases I'm working with.
As requested, this question has been closed as answered in the comments; the answer below is posted on behalf of @Scott Craner.
Answer
As I stated in my first comment, Advanced Filter is designed for this. You can add code that runs it automatically whenever certain cells on the page change value. A formula is not ideal: it would have to be an array formula, and the more array formulas you have and the larger the dataset, the more they bog down calculation times. See here for an example of how to set up Advanced Filter using VBA.
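To make the intended result concrete, here is a minimal Python sketch of the row filter itself. It is not Advanced Filter and not VBA; the sample rows and the function name are invented for illustration. It keeps every row whose position contains any of the keywords, ignoring case, and returns the four requested columns.

```python
KEYWORDS = ("vice", "benefit", "resource")


def matching_contacts(rows, keywords=KEYWORDS):
    """Return (last, first, position, email) for rows whose position has a keyword."""
    result = []
    for company, last, first, position, email in rows:
        title = position.lower()  # case-insensitive substring match
        if any(word in title for word in keywords):
            result.append((last, first, position, email))
    return result


# Hypothetical rows: company, last name, first name, position, email.
rows = [
    ("X Corp", "Doe", "Jane", "Vice President", "jane@example.com"),
    ("X Corp", "Roe", "Rick", "Engineer", "rick@example.com"),
    ("Y Inc", "Poe", "Pat", "Human Resources Manager", "pat@example.com"),
]
matches = matching_contacts(rows)
```

Each match comes back as a whole row, which is the part the single-cell INDEX/SMALL formula in the question cannot do on its own.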

Spotfire DenseRank by category, do I use OVER?

I'm trying to rank some data in Spotfire, and I'm having a bit of trouble writing a formula to calculate it. Here's a breakdown of what I am working with.
Group: the test group
SNP: what SNP I am looking at
Count: how many counts I get for the specific SNP
What I'd like to do is rank the average # of counts that are present for each SNP, within the group. Thus, I could then see, within a group, which SNP ranks #1, #2, etc.
Thanks!
TL;DR Disclaimer: You can do this, though if you are changing your cross table frequently, it may become a giant hassle. Make sure to double-check that the logic is what you'd expect after any modification. Proceed with caution.
The basis of the Custom Expression you seem to be looking for is as follows:
Max(DenseRank(Count() OVER (Intersect([Group],[SNP])),"desc",[Group]))
This gives the total count of rows instead of the average; I was uncertain if "Count" was supposed to be a column or not. If you really do want to turn it into an average, make sure to adjust accordingly.
If all you have is the Group and the SNP nested on the left, you're done and good to go.
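For readers who want to check the numbers outside Spotfire, here is a small Python sketch of a dense rank per group. It is an assumption-laden illustration, not a Spotfire expression: it ranks by the average of a Count column (the variant the question asked for), and the sample rows and function name are made up.

```python
from collections import defaultdict


def dense_rank_avg_count(rows):
    """rows: iterable of (group, snp, count). Returns {(group, snp): rank}."""
    totals = defaultdict(lambda: [0.0, 0])  # (group, snp) -> [sum, n]
    for group, snp, count in rows:
        totals[(group, snp)][0] += count
        totals[(group, snp)][1] += 1

    by_group = defaultdict(dict)  # group -> {snp: average count}
    for (group, snp), (total, n) in totals.items():
        by_group[group][snp] = total / n

    ranks = {}
    for group, averages in by_group.items():
        # Distinct averages, highest first: ties share a rank and no rank is skipped.
        distinct = sorted(set(averages.values()), reverse=True)
        position = {value: i + 1 for i, value in enumerate(distinct)}
        for snp, average in averages.items():
            ranks[(group, snp)] = position[average]
    return ranks


# Hypothetical data: (Group, SNP, Count).
rows = [
    ("A", "rs1", 10), ("A", "rs1", 20),  # average 15
    ("A", "rs2", 15),                    # average 15, ties with rs1
    ("A", "rs3", 5),                     # average 5
    ("B", "rs1", 1), ("B", "rs2", 3),
]
ranks = dense_rank_avg_count(rows)
```

The ranking restarts inside each group, which is the role the trailing [Group] argument plays in the expression above.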
First issue: when you filter the data, it gives you the dense rank of only the rows in the filtered set. In some cases this is good, and what you're looking for; in others, it isn't. If you want it to hold fast to its value, regardless of filtering, you can use the same logic, but put it in a Calculated Column instead of in the custom expression. Then, in your CrossTable aggregation, take the max of the Calculated Column value.
Calculated Column:
DenseRank(Count() OVER (Intersect([Group],[SNP])),"desc",[Group])
Second Issue: You want to pivot by something other than Group and SNP. Perhaps, for example, by date? If you throw the Date across the top, it's going to show the same numbers for every month -- the overall numbers. This is not particularly helpful.
To a certain extent, Spotfire's Custom Expressions can handle this modification. If you only ever put a single column across the top, you could use the following:
Max(DenseRank(Count() OVER (Intersect([${Axis.Columns.ShortDisplayName}],[Group],[SNP])),"desc",[Group],[${Axis.Columns.ShortDisplayName}]))
That would automatically pull in the column from the top, and show you the ranking for each individual process date.
However, if you start nesting, using hierarchies, renaming your columns, or having multiple aggregations and throwing (Column Names) across the top, you're going to have to pay a great deal of attention to your custom expression. You'll need to do some form of string replacement around the Axis.Column, or use the expression instead of Short Names, and get rid of nests, etc.
Any layer of complexity will require this sort of analysis, so if your end-users have access to modify the pivot table... honestly, I probably wouldn't give them this column.
Third Issue: I don't know if this is an issue, exactly, but you said "Average Counts" -- Average per day? Per Month? When averaging, you will need to decide if, for example, a month is the total number of days in month or the number of days that particular payor had data. However you decide to aggregate it, make sure you're doing it on the right level.
For the record, I liked the premise of this question; it's something I'd thought would be useful before, but never took the time to try to implement, since sorting a column or limiting a table to only show the top 10 values is much simpler.

How to find 1st, 2nd, 3rd most occurring text string in Excel using functions?

I am really having trouble with this one:
In one column, I have a long list of company names. The company names appear several times (based on how many tickets they have raised, but that is another story).
I am now looking for a function that would give me the company name that occurs the most often. In the cell below, I would like to get the company name that occurs the second most often, in the cell below that the company that occurs the third most often, and so on.
I thank everyone who is spending some time to help me figure this out.
Stephan
Make a PivotTable out of the data. Put the field of interest in it as both a Row Field and as a Values field, and make sure that the aggregation being applied is a Count. (It will be by default if your values are text.)
See my answer at Excel, i have a big list (44000 records), i would like to sepperate them into 2 lists, 1 with unique values and 1 with duplicate value's
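The PivotTable count-and-sort can be mirrored in a few lines of Python, which may help when checking the result. This is only a sketch of the same counting logic; the company names are made up.

```python
from collections import Counter

# Hypothetical column of company names, one per ticket.
companies = ["Acme", "Globex", "Acme", "Initech", "Globex", "Acme"]

# most_common() returns (name, count) pairs sorted by count, highest first.
ranking = Counter(companies).most_common()
most_frequent = ranking[0][0]
second_most_frequent = ranking[1][0]
```

Sorting the PivotTable by the count column in descending order gives the same first, second, third ordering.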

List of items find almost duplicates

Within excel I have a list of artists, songs, edition.
This list contains over 15000 records.
The problem is that the list contains some "duplicate" records. I say "duplicate" because they aren't a complete match. Some might have a few typos, and I'd like to fix this up and remove those records.
So for example some records:
ABBA - Mamma Mia - Party
ABBA - Mama Mia! - Official
Each dash indicates a separate column (so 3 columns A, B, C are filled in)
How would I mark them as duplicates within Excel?
I've found out about the Fuzzy Lookup tool. However, I'm working on a Mac, and since it's not available on Mac, I'm stuck.
Is there any regex magic or VBA script that can help me out?
It would also be fine to see how similar the rows are (say, 80% similar).
One of the common methods for fuzzy text matching is the Levenshtein (distance) algorithm. Several nice implementations of this exist here:
https://stackoverflow.com/a/4243652/1278553
From there, you can use the function directly in your spreadsheet to find similarities between instances.
You didn't ask, but a database would be really nice here. The reason is you can do a Cartesian join (one of the very few valid uses for this) and compare every single record against every other record. For example:
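For reference, the Levenshtein distance itself is short enough to sketch. The version below is a plain Python illustration of the standard dynamic-programming algorithm, not the VBA code from the linked answer; the `similarity` helper and its normalisation by the longer string are my own assumption about how to turn a distance into an "80% similar" style score.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical score in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


distance = levenshtein("Mamma Mia", "Mama Mia!")
score = similarity("Mamma Mia", "Mama Mia!")
```

With the two song titles from the question, the distance is 2 (one dropped "m", one added "!"), so the score lands a little under 0.8.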
select
    s1.group, s2.group, s1.song, s2.song,
    levenshtein(s1.group, s2.group) as group_match,
    levenshtein(s1.song, s2.song) as song_match
from
    songs s1
    cross join songs s2
order by
    group_match, song_match
Yes, this would be a very costly query, depending on the number of records (in your example 225,000,000 rows), but it would bubble the most likely duplicates / matches to the top. Not only that, but you can incorporate "reasonable" joins to eliminate obvious mismatches, for example limiting it to cases where the group matches, nearly matches, or begins with the same letter, or pre-filtering out pairs where the Levenshtein distance is greater than x.
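The compare-everything-with-a-pre-filter idea can also be sketched without a database. The Python below is a rough illustration under several assumptions: it uses the standard library's difflib.SequenceMatcher ratio as the similarity score rather than Levenshtein, it uses "same first letter of the artist" as the cheap pre-filter, and the sample records and the 0.8 threshold are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_duplicates(records, threshold=0.8):
    """records: list of (artist, song). Returns index pairs judged similar."""
    pairs = []
    # combinations() visits each unordered pair once, so a record is never
    # compared with itself and (i, j) is not repeated as (j, i).
    for (i, (artist_a, song_a)), (j, (artist_b, song_b)) in combinations(enumerate(records), 2):
        # Cheap pre-filter: skip pairs whose artists start with different letters.
        if artist_a[:1].lower() != artist_b[:1].lower():
            continue
        artist_score = SequenceMatcher(None, artist_a.lower(), artist_b.lower()).ratio()
        song_score = SequenceMatcher(None, song_a.lower(), song_b.lower()).ratio()
        if artist_score >= threshold and song_score >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical records: (artist, song).
records = [
    ("ABBA", "Mamma Mia"),
    ("ABBA", "Mama Mia!"),
    ("ABBA", "Waterloo"),
    ("Queen", "Bohemian Rhapsody"),
]
duplicates = likely_duplicates(records)
```

Visiting unordered pairs roughly halves the work compared with a full cross join, and the pre-filter skips the expensive scoring for obviously unrelated artists.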
You could use an array formula to flag the duplicates, and you could modify the one below to show the row numbers. It checks the rows beneath each entry for possible 80% duplicates, where 80% means the first 80% of the text, read left to right, not a comparison of the whole string. My data is in A1:A15000.
=IF(NOT(ISERROR(FIND(MID($A1,1,INT(LEN($A1)*0.8)),$A2:$A$15000))),1,0)
This way will also look back up the list, to indicate the ones found
=SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A3:$A$15000,1)),0,1))+SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A$1:$A1,1)),0,1))
For the first entry (row 1), use only the first part of the formula; for the last row, use only the part after the +.
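To see what the prefix test does and does not catch, here is a minimal Python sketch of the same logic: take the first 80% of each entry and count the other entries that contain it. The function name and sample titles are made up, and unlike FIND with an empty search string, the sketch ignores empty prefixes.

```python
def prefix_matches(values, fraction=0.8):
    """For each value, count the other entries containing its leading `fraction`."""
    counts = []
    for i, value in enumerate(values):
        # Same cut as MID(A, 1, INT(LEN(A) * 0.8)) in the formulas above.
        prefix = value[: int(len(value) * fraction)]
        counts.append(sum(
            1
            for j, other in enumerate(values)
            if j != i and prefix and prefix in other  # case-sensitive, like FIND
        ))
    return counts


# Hypothetical titles.
titles = ["Mamma Mia", "Mamma Mia (Live)", "Waterloo"]
counts = prefix_matches(titles)
```

The match is one-directional: the shorter title is flagged because its prefix appears inside the longer one, but not the other way around, and a typo near the start of a title defeats the test entirely.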
Try this worksheet function in your loop:
=COUNTIF(Range,"*yourtexttofind*")

Nested list in excel

I'm not even sure how to ask this.
I have a database, where each row is a person. Columns are contact info, phone, etc. One column is 'date visited'. There can be multiple dates visited for each person. I don't want to use a comma or stack them all in one field.
Is there a way to have a 'nested' list (not a drop-down menu - just a list of visited dates for each person), such that one person still only consumes one single row?
Yes.
To accomplish this, give each person a unique ID that won't change.
Then on a separate sheet, store the ID and date.
main sheet (ID, Name, Contact Info, phone, etc.)
second sheet (ID, date visited)
In database theory this is called a 'one to many' relationship, and what I'm describing is called 'normalizing your dataset'.
In Excel you can now use formulas to manipulate the data however you need to or can imagine after you split this apart.
As you mentioned in a comment, you want to count all visited dates for a user.
On the main sheet to the right you could use:
=countif(Sheet2!A:A,Sheet1!A1)
This would count all of the IDs in the second sheet that match the current row's ID on your main sheet.
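The two-sheet layout and the COUNTIF lookup can be pictured with a short Python sketch. The names, IDs and dates below are invented; the point is only to show that each person stays on one row while the visits live in a separate list keyed by ID.

```python
from collections import Counter

# Hypothetical main sheet: (ID, name). One row per person.
people = [(1, "Ann"), (2, "Bob"), (3, "Cy")]

# Hypothetical second sheet: (ID, date visited). One row per visit.
visits = [(1, "2020-01-05"), (1, "2020-02-11"), (2, "2020-01-20")]

# Equivalent of COUNTIF over the ID column of the second sheet.
visit_counts = Counter(person_id for person_id, _ in visits)
report = [(name, visit_counts[person_id]) for person_id, name in people]
```

A person with no visits simply gets a count of 0, and adding a visit never touches the main sheet.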
Notes about using one cell:
Storing all the dates in one cell will eventually max it out, and will make it hard to view/search as it grows, so I highly advise against this approach.
If, however, you insist on keeping the dates in one cell, you could count the visits by counting the total number of commas + 1, like this:
=(LEN(G1) - LEN(SUBSTITUTE(G1,",","")))+1
This formula subtracts the length of the cell with the commas removed from its full length, which gives the number of commas; adding 1 gives the number of dates.
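The comma-counting trick is easy to check in Python. This sketch mirrors the LEN and SUBSTITUTE arithmetic exactly; the function name and sample dates are made up.

```python
def count_visits(cell):
    """Number of commas in the cell plus one, as in the LEN/SUBSTITUTE formula."""
    return len(cell) - len(cell.replace(",", "")) + 1


three_dates = count_visits("2020-01-05,2020-02-11,2020-03-01")
empty_cell = count_visits("")  # note: also yields 1, not 0
```

One caveat the arithmetic makes visible: an empty cell still counts as one visit, so blank cells would need a separate check.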
Notes about using multiple columns:
This approach has the same idea as the one I suggested, where we are associating a number of dates with the row's identity of a person. However, there are a few key limitations and drawbacks.
The main difference is that when we move the dates out and list them vertically, we can manipulate them more easily, and a list of 20 dates for one person becomes much easier to read. By listing the dates vertically in the second sheet instead of using this approach, we also gain the ability to use Excel's built-in filter. Just storing large amounts of data is useless by itself, while storing it in a way that you can view and manipulate easily makes everything much more powerful.
