Excel: Function to replace variable name - excel

Before I begin, I would like to state that I have read a few articles and tried a couple of functions (=REPLACE, =SUBSTITUTE), but I'm not able to get the required results. I'm new to Excel.
The following is a homework question.
Question: Use appropriate text functions to shorten the variable names to something like Arizona Females Young, Arizona Females Old, and Arizona Females All. Also, is there a way to do it automatically for all variables with one function? The screenshot is attached.

If you need to change the headers:
Licensed Drivers Alabama Female 19 and under # -> Female 19 and under #
Licensed Drivers Alabama Female 85 and over # -> Female 85 and over #
then the "Find and Replace" feature is the best solution.
You can also extract the shortened values to another location with formulas:
=SUBSTITUTE(A1,"Licensed Drivers Alabama ","")
=SUBSTITUTE(B1,"Licensed Drivers Alabama ","")
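The question also asks whether every variable can be shortened in one step. As a rough illustration outside Excel, the prefix removal that SUBSTITUTE performs can be sketched in Python. The header strings are modeled on the example above, and the helper name `shorten_header` is hypothetical.

```python
def shorten_header(header: str, prefix: str = "Licensed Drivers Alabama ") -> str:
    """Remove every occurrence of prefix, like SUBSTITUTE(header, prefix, "")."""
    # split()/join() collapses any leftover double spaces.
    return " ".join(header.replace(prefix, "").split())


# Illustrative headers modeled on the example above (assumed, not the full dataset).
headers = [
    "Licensed Drivers Alabama Female 19 and under #",
    "Licensed Drivers Alabama Female 85 and over #",
]

short = [shorten_header(h) for h in headers]
print(short)
```

Applying the helper across the whole list is the counterpart of filling the SUBSTITUTE formula across the header row.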

Related

Calculate the average sales of every state using the Subtotal function on excel

I'm new to Excel functions and I've been asked to calculate the average sales for every state using the "Subtotal" function.
The dataset is like this:
State Sales
California 22,5
Utah 75
Utah 122,4
California 99
Texas 101,3
Indiana 47
Texas 136
Can anyone help me?
I'm not sure Subtotal is the best fit here. My first instinct would be to use a pivot table. You can also get the same answers with dynamic array formulas.
Formula in D3:
=SORT(UNIQUE(StatesTable[State]))
Formula in E3:
=SUMIFS(StatesTable[Sales],StatesTable[State],D3#)/COUNTIFS(StatesTable[State],D3#)
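For readers who want to check the arithmetic, the same "sum divided by count per state" logic can be sketched in plain Python. This is only an illustration of what the SUMIFS/COUNTIFS pair computes, using the sample rows from the question with decimal commas rewritten as decimal points.

```python
from collections import defaultdict

# Sample rows from the question; 22,5 is written as 22.5 here.
rows = [
    ("California", 22.5), ("Utah", 75.0), ("Utah", 122.4), ("California", 99.0),
    ("Texas", 101.3), ("Indiana", 47.0), ("Texas", 136.0),
]

totals = defaultdict(float)  # running sum per state (SUMIFS)
counts = defaultdict(int)    # number of rows per state (COUNTIFS)
for state, sales in rows:
    totals[state] += sales
    counts[state] += 1

# Sorted unique states, mirroring SORT(UNIQUE(...)).
averages = {state: totals[state] / counts[state] for state in sorted(totals)}
for state, avg in averages.items():
    print(f"{state}: {avg:.2f}")
```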

EXCEL Finding a cell text when calculating a Maximum value with INDEX and MAX

I have been trying to get this to work for days but I am not getting anywhere.
I have a sheet with locations and temperature recordings during different days by different people.
I would like to find the latest date a measure was taken by location and the name of who took the recording only if he/she is a supervisor:
Locat. Name Title Date Latest measure I Want this?? and this??
CA23 Tom Supervisor 8/5/2018 2/24/2020 1/15/2019 Tom
CA23 Tom Supervisor 1/15/2019 2/24/2020 1/15/2019 Tom
CA23 John Contractor 2/24/2020 2/24/2020 1/15/2019 Tom
AZ58 Tina Supervisor 6/25/2019 12/21/2019 6/25/2019 Tina
AZ58 Jose Contractor 7/28/2018 12/21/2019 6/25/2019 Tina
AZ58 Karl Contractor 12/21/2019 12/21/2019 6/25/2019 Tina
FL61 Tony Contractor 3/26/2019 3/15/2020 3/15/2020 Linda
FL61 Emma Supervisor 8/28/2019 3/15/2020 3/15/2020 Linda
FL61 Linda Supervisor 3/15/2020 3/15/2020 3/15/2020 Linda
To get the latest date by location I used =MAXIFS(D3:D11,A3:A11,A3), but I have not been able to add a condition so that a date counts only if the title is Supervisor, let alone return the name of the supervisor who took the latest measure at each location.
Can anyone point me in the right direction?
MAXIFS allows multiple criteria:
=MAXIFS(D:D,C:C,"Supervisor",A:A,A2)
Then if one has the Dynamic Array formula FILTER, for the name:
=FILTER(B:B,(D:D=F2)*(A:A=A2))
If one does not have FILTER then look here for how to do INDEX with multiple criteria: Vlookup using 2 columns to reference another
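The two-step logic (latest Supervisor date per location, then the name on that row) can be sketched in plain Python to make the expected results easy to verify. This uses the rows from the question and is an illustration of the logic, not a replacement for the formulas.

```python
from datetime import datetime

# Rows from the question: (location, name, title, date as m/d/yyyy).
records = [
    ("CA23", "Tom", "Supervisor", "8/5/2018"),
    ("CA23", "Tom", "Supervisor", "1/15/2019"),
    ("CA23", "John", "Contractor", "2/24/2020"),
    ("AZ58", "Tina", "Supervisor", "6/25/2019"),
    ("AZ58", "Jose", "Contractor", "7/28/2018"),
    ("AZ58", "Karl", "Contractor", "12/21/2019"),
    ("FL61", "Tony", "Contractor", "3/26/2019"),
    ("FL61", "Emma", "Supervisor", "8/28/2019"),
    ("FL61", "Linda", "Supervisor", "3/15/2020"),
]

latest = {}  # location -> (latest supervisor date, supervisor name)
for location, name, title, date_text in records:
    if title != "Supervisor":
        continue  # same role as the "Supervisor" criterion in MAXIFS
    day = datetime.strptime(date_text, "%m/%d/%Y").date()
    if location not in latest or day > latest[location][0]:
        latest[location] = (day, name)

for location, (day, name) in latest.items():
    print(location, day.isoformat(), name)
```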

get new list from pivot_table?

I'm using python 3 and pandas. I have this pivot table below.
>>> print(p.head())
question_id 1 2 3 4 ... 26 27 28 29
assessment_attempt_id
...
243908 21-24 Female 4th year undergraduate White ... Disagree Disagree Agree Agree
290934 25-29 Male Prefer Not to Answer Black or African American ... Neutral Neutral Neutral Neutral
312457 18-20 Female 1st year undergraduate White ... Strongly Agree Strongly Agree Strongly Agree Strongly Agree
312766 18-20 Female 2nd year undergraduate Hispanic or Latina/o ... Agree Agree Agree Agree
312786 21-24 Female 4th year undergraduate Black or African American ... Strongly Disagree Agree Agree Agree
It is produced from this command:
p= pandas.pivot_table(df, index=["assessment_attempt_id"], columns=["question_id"], values="text", aggfunc='first')
The table is basically exactly what I want. Now I just want columns 1,2,3,4 and the assessment_attempt_id column in a Datatable, so I can join that data by assessment_attempt_id with another existing data table.
Normally I would subset the data by doing something like this:
df1 = df[['a','b']]
but that produces an error:
KeyError: "['1' '2'] not in index"
It seems like this should be a simple, solved problem, but I cannot find the answer. I also tried a groupby variation, which produced the same output, and I still could not extract the columns I wanted. I assume I have to get at the multi-index, but I don't know how. Thank you.
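One possible reading of the KeyError is that the pivoted column labels are integers (the question_id values) rather than the strings '1' and '2'. Under that assumption, a minimal sketch would select the columns by integer label and call reset_index() to turn assessment_attempt_id back into an ordinary column. The small DataFrame below is a hypothetical stand-in for the asker's df.

```python
import pandas as pd

# Hypothetical long-format data standing in for the asker's df.
df = pd.DataFrame({
    "assessment_attempt_id": [243908] * 5 + [290934] * 5,
    "question_id": [1, 2, 3, 4, 26] * 2,
    "text": [
        "21-24", "Female", "4th year undergraduate", "White", "Disagree",
        "25-29", "Male", "Prefer Not to Answer", "Black or African American", "Neutral",
    ],
})

p = pd.pivot_table(df, index=["assessment_attempt_id"], columns=["question_id"],
                   values="text", aggfunc="first")

# Assumption: question_id is an integer column, so the labels are 1, 2, ... not '1', '2'.
subset = p[[1, 2, 3, 4]].reset_index()
subset.columns.name = None  # drop the leftover "question_id" axis label
print(subset)
```

If the labels turn out to be strings after all, inspecting `p.columns` shows their actual type, and the selection list can be adjusted to match.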

Lookup with multiple criteria, one a MAX value

I am trying to lookup the LOCATION of an employee (NAME) and their MANAGER from the most recent month (largest month number) in a particular QUARTER in data like this:
NAME LOCATION MANAGER QUARTER MONTH
Ryan Smith Sioux Falls Rick James 3 7
Jane Doe Tampa Bobby Brown 3 7
John Rogers Tampa Tracy Lane 3 7
Ryan Smith Sioux Falls Rick James 3 8
Jane Doe Denver Thomas Craig 3 8
John Rogers Tampa Cody Davis 3 8
So if I know the name of the employee and the quarter I'm looking up, the results should display who their last manager was and the location they were in, as these may change month to month.
I have used an INDEX and MATCH array formula:
{=INDEX($B$2:$B$7,MATCH(A12,IF($D$2:$D$7=D12,$A$2:$A$7),0))}
but this just provides the first match and not necessarily the most recent month in that quarter. I attempted to include a MAX function which looked something like this:
{=INDEX($B$2:$B$7,MAX($E2:$E7,MATCH(A12,IF($D$2:$D$7=D12,$A$2:$A$7),0)))}
but that didn't quite get me there either.
What formula do I need to get this to work?
I think I'd choose a PivotTable for its versatility and speed:
I think a pivot table is probably the best option and can easily be modified with the filters when new entries are added to the underlying data. I was working on a solution with a formula, but it requires you to add a lookup column.
The formula for the lookup column is: =E6&" "&H6&" "&I6
I wasn't clear on how the OP was going to be "entering" the employee name and quarter, so I had to assume that they would be in a separate column.
And the formula in column B (which is cumbersome) is:
=VLOOKUP(A6&" "&MAX(IF(H1:H100=NUMBERVALUE(RIGHT(A6,1)),I1:I100)),$D$6:$G$11,3,FALSE)&", managed by "&VLOOKUP(A6&" "&MAX(IF(H1:H100=NUMBERVALUE(RIGHT(A6,1)),I1:I100)),$D$6:$G$11,4,FALSE)
But it works and, as long as the lookup range is adjusted, it is scalable.
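The lookup the question describes (filter by name and quarter, then take the row with the largest month) can be sketched in plain Python to show the expected results for the sample data. The function name `latest_assignment` is hypothetical.

```python
# Rows from the question: (name, location, manager, quarter, month).
rows = [
    ("Ryan Smith", "Sioux Falls", "Rick James", 3, 7),
    ("Jane Doe", "Tampa", "Bobby Brown", 3, 7),
    ("John Rogers", "Tampa", "Tracy Lane", 3, 7),
    ("Ryan Smith", "Sioux Falls", "Rick James", 3, 8),
    ("Jane Doe", "Denver", "Thomas Craig", 3, 8),
    ("John Rogers", "Tampa", "Cody Davis", 3, 8),
]


def latest_assignment(name, quarter):
    """Return (location, manager) from the highest month for name in quarter, or None."""
    matches = [r for r in rows if r[0] == name and r[3] == quarter]
    if not matches:
        return None
    best = max(matches, key=lambda r: r[4])  # row with the largest month number
    return best[1], best[2]


print(latest_assignment("Jane Doe", 3))
```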

Rapidminer / Excel Missing Value Replacement

I'm learning how to use Rapidminer for a project and I'm stuck. I have a dataset as follows: there are countries, and for each country I'm keeping track of some values (medals, let's say) for the years 1990-2012. As an example:
Country Year Gold Silver Bronze
USA 1990 10 5 7
.....
USA 2012 12 3 8
Spain 1990 8 12 9
...
Spain 1992 7 ? 8
....
Spain 2012 4 11 12
...GOES ON...
What I want to do is replace the missing values. For example, Spain has a missing value for Silver medals in 1992. I want to find the average of the Silver data available for Spain and replace the missing value with it. How can I do this? If the existing modules in Rapidminer are not able to do this, is there some kind of macro? I can also use Excel to preprocess the data, but how?
Use the Replace Missing Values operator. Its default settings fill any missing data with the average of that column - exactly what you want.
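Note that a column-wide average pools all countries together, whereas the question asks for the average of the values available for the same country. As a rough sketch of that per-country variant outside Rapidminer, the following plain Python uses only the Silver values visible in the question, with None standing in for the "?" entry.

```python
from statistics import mean

# Rows visible in the question; None marks the value shown as "?".
rows = [
    {"country": "USA", "year": 1990, "silver": 5},
    {"country": "USA", "year": 2012, "silver": 3},
    {"country": "Spain", "year": 1990, "silver": 12},
    {"country": "Spain", "year": 1992, "silver": None},
    {"country": "Spain", "year": 2012, "silver": 11},
]

# Collect the available values per country.
available = {}
for row in rows:
    if row["silver"] is not None:
        available.setdefault(row["country"], []).append(row["silver"])

# Fill each gap with that country's own mean; leave it empty if nothing is available.
for row in rows:
    if row["silver"] is None and row["country"] in available:
        row["silver"] = mean(available[row["country"]])

print([r for r in rows if r["country"] == "Spain"])
```

With only these rows, Spain's gap is filled from 12 and 11; with the full dataset the mean would be taken over all of Spain's available years.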
