Conditionally assign unique values - excel

I have data, for example:
John Doe MD 1
Ben Doe PA
Cal Doe MD 1
Drum Doe PA
Egg Doe NP
Fun Doe MD 1
Everywhere there is an MD, an IF condition assigns the number 1. I want to isolate the names that have a 1 and list them consecutively. Example:
John Doe
Cal Doe
Fun Doe
I know I need sequential numeric values instead, such as John Doe: 1, Cal Doe: 2, Fun Doe: 3.
I'm having trouble with the logic; any help would be appreciated.

If your data is in three columns (starting at A2), with the title in column B, then maybe enter this in C2 and copy it down:
=IF(B2="MD",MAX(C$1:C1)+1,"")
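For comparison, the same running-counter idea can be sketched in pandas. This only illustrates the logic and is not part of the Excel answer; the column names `name`, `title` and `seq` are assumptions made for the example.

```python
import pandas as pd

# Sample data from the question; the column names are assumed for illustration.
df = pd.DataFrame({
    "name": ["John Doe", "Ben Doe", "Cal Doe", "Drum Doe", "Egg Doe", "Fun Doe"],
    "title": ["MD", "PA", "MD", "PA", "NP", "MD"],
})

is_md = df["title"].eq("MD")

# cumsum() over the boolean mask numbers the MD rows 1, 2, 3, ...;
# where() blanks out the non-MD rows, like the "" branch of the IF formula.
df["seq"] = is_md.cumsum().where(is_md)

# Pull the MD names in the order of their sequence number.
md_names = df.loc[is_md].sort_values("seq")["name"].tolist()
print(md_names)
```

Because each MD row gets its own number, the names can then be looked up one by one (1, 2, 3, ...) instead of all sharing the value 1.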

Related

How to update a dataframe column from a second dataframe, matching on two columns whose values can repeat in the first dataframe?

I have two dataframes with different information about a person. In the first dataframe, a person's name may repeat across rows. I want to add to (or update) the first dataframe with data from the second, wherever the two columns identifying the person match in both. Here is an example of what I need to accomplish:
df1:
name surname
0 john doe
1 mary doe
2 peter someone
3 mary doe
4 john another
5 paul another
df2:
name surname account_id
0 peter someone 100
1 john doe 200
2 mary doe 300
3 john another 400
I need to accomplish this:
df1:
name surname account_id
0 john doe 200
1 mary doe 300
2 peter someone 100
3 mary doe 300
4 john another 400
5 paul another <empty>
Thanks!
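One possible approach, sketched with the sample data above, is a left merge on both key columns. The `drop_duplicates` step is an assumption about what "first match" should mean if df2 ever holds several rows for the same person; with the sample data it changes nothing.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["john", "mary", "peter", "mary", "john", "paul"],
    "surname": ["doe", "doe", "someone", "doe", "another", "another"],
})
df2 = pd.DataFrame({
    "name": ["peter", "john", "mary", "john"],
    "surname": ["someone", "doe", "doe", "another"],
    "account_id": [100, 200, 300, 400],
})

# Assumption: if df2 had several rows per person, keep only the first one.
lookup = df2.drop_duplicates(subset=["name", "surname"], keep="first")

# A left merge keeps every row of df1 in its original order;
# rows without a match (paul another) get NaN in account_id.
result = df1.merge(lookup, on=["name", "surname"], how="left")
print(result)
```

Note that the NaN for the unmatched row turns `account_id` into a float column.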

How do you fill uneven pandas dataframe column with first value in column

import pandas as pd
dict = {'Name' : ['John'], 'Last Name': ['Smith'], 'Activity':['Run', 'Jump', 'Hide', 'Swim', 'Eat', 'Sleep']}
df = pd.DataFrame(dict)
How do I make it so 'John' & 'Smith' are populated in each 'Activity' that he does in a dataframe?
Let us try json_normalize
out = pd.json_normalize(d,'Activity',['Name','Last Name'])
Out[160]:
0 Name Last Name
0 Run John Smith
1 Jump John Smith
2 Hide John Smith
3 Swim John Smith
4 Eat John Smith
5 Sleep John Smith
Input
d = {'Name' : ['John'], 'Last Name': ['Smith'], 'Activity':['Run', 'Jump', 'Hide', 'Swim', 'Eat', 'Sleep']}
If you strictly have one pair of Name/Last Name, you can modify the dictionary so that pandas reads activity as a list
d = {k: [v] if len(v) > 1 else v for k, v in d.items()}
df = pd.DataFrame(d)
df.explode('Activity')
Name Last Name Activity
0 John Smith Run
0 John Smith Jump
0 John Smith Hide
0 John Smith Swim
0 John Smith Eat
0 John Smith Sleep

Separate a name into first and last name using Pandas

I have a DataFrame that looks like this:
name birth
John Henry Smith 1980
Hannah Gonzalez 1900
Michael Thomas Ford 1950
Michelle Lee 1984
And I want to create two new columns, "middle" and "last" for the middle and last names of each person, respectively. People who have no middle name should have None in that data frame.
This would be my ideal result:
name middle last birth
John Henry Smith 1980
Hannah None Gonzalez 1900
Michael Thomas Ford 1950
Michelle None Lee 1984
I have tried different approaches, such as this:
df['middle'] = df['name'].map(lambda x: x.split(" ")[1] if x.count(" ")== 2 else None)
df['last'] = df['name'].map(lambda x: x.split(" ")[1] if x.count(" ")== 1 else x.split(" ")[2])
I even made some functions that try to do the same thing more carefully, but I always get the same error: "List Index out of range". This is weird because if I go about printing df.iloc[i,0].split(" ") for i in range(len(df)), I do get lists with length 2 or length 3 only.
I also printed x.count(" ") for all x in the "name" column and I always got either 1 or 2 as a result. There are no single names.
This is my first question so thank you so much!
Use Series.str.split with expand = True.
df2 = (df['name'].str
.split(' ',expand = True)
.rename(columns = {0:'name',1:'middle',2:'last'}))
new_df = df2.assign(middle = df2['middle'].where(df2['last'].notnull()),
last = df2['last'].fillna(df2['middle']),
birth = df['birth'])
print(new_df)
name middle last birth
0 John Henry Smith 1980
1 Hannah NaN Gonzalez 1900
2 Michael Thomas Ford 1950
3 Michelle NaN Lee 1984
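As an alternative closer to the lambdas in the question, here is a minimal sketch using a plain helper function. The helper name `split_name` is hypothetical, and treating everything between the first and last word as the middle name is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John Henry Smith", "Hannah Gonzalez", "Michael Thomas Ford", "Michelle Lee"],
    "birth": [1980, 1900, 1950, 1984],
})

def split_name(full_name):
    """Return (first, middle, last); middle is None when nothing lies in between."""
    parts = full_name.split()  # no argument: runs of whitespace count as one separator
    first, last = parts[0], parts[-1]
    middle = " ".join(parts[1:-1]) or None
    return first, middle, last

# Build the three name columns from the tuples, aligned on the original index.
parts = pd.DataFrame(
    df["name"].map(split_name).tolist(),
    columns=["name", "middle", "last"],
    index=df.index,
)
new_df = parts.assign(birth=df["birth"])
print(new_df)
```

Indexing with `parts[0]` and `parts[-1]` never asks for a position that might not exist, as long as the name contains at least one word.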

How do I create single cell arrays based on unique contiguous groups?

Column A identifies unique families using multiple other columns of data.
Column B is a list of individuals.
I would like Column C to contain cell arrays of these families (Shown Below).
For some reason, the MATCH formula in my attempted solution is returning the last occurrence of the match, so it does not work.
I have tried this formula (the output of this is shown in Column D in the picture):
{=OFFSET(INDEX(A:A, MATCH(A1,A:A)),0,1,COUNTIF(A:A,A1))}
A   B        C                               D
1   Tom 1    {Tom One, Sue One}              Sue 1
1   Sue 1    {Tom One, Sue One}              Sue 1
2   Bob 2    {Bob Two, Joan Two, John Two}   John 2
2   Joan 2   {Bob Two, Joan Two, John Two}   John 2
2   John 2   {Bob Two, Joan Two, John Two}   John 2
3   Tom 3    {Tom Three}                     Tom 3
4   Joe 4    {Joe Four}                      Joe 4
You can use the following formula, provided the data is sorted by column A:
="{"&TEXTJOIN(",",TRUE,INDEX(B:B,MATCH(A1,A:A,0)):INDEX(B:B,MATCH(A1,A:A,1)))&"}"
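The idea behind the two MATCH calls (first occurrence with match type 0, last occurrence with match type 1 on sorted data) can be sketched in Python with `bisect`. This is only an illustration of the logic, using the sample values from the question; the variable and function names are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Column A (family id, must be sorted) and column B (individuals).
family = [1, 1, 2, 2, 2, 3, 4]
person = ["Tom 1", "Sue 1", "Bob 2", "Joan 2", "John 2", "Tom 3", "Joe 4"]

def family_label(key):
    # First row of the family, comparable to MATCH(key, A:A, 0).
    first = bisect_left(family, key)
    # One past the last row of the family, comparable to MATCH(key, A:A, 1).
    end = bisect_right(family, key)
    # Join the contiguous slice, like TEXTJOIN over the INDEX:INDEX range.
    return "{" + ",".join(person[first:end]) + "}"

column_c = [family_label(key) for key in family]
for label in column_c:
    print(label)
```

As with the formula, the slice is only correct when equal family ids sit next to each other, which is why the sort on column A matters.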

ms excel turn columns into rows

I am looking for a way to take an Excel spreadsheet and change the columns to rows. The sheet is currently laid out like this:
Employee Name Expiration 1 Expiration 2 Expiration 3 Expiration 4
John Doe
Jane Doe
What I need to do is convert it to this:
Employee Name   Expiration   Date
John Doe        1            12-12-12
John Doe        2            12-12-12
John Doe        3            12-12-12
John Doe        4            12-12-12
Jane Doe        1            12-1-12
Jane Doe        2            12-1-12
Jane Doe        3            12-1-12
Jane Doe        4            12-1-12
I am not even sure if this is possible.
You can do this very simply with an INDEX(MATCH(),MATCH()) formula.
This screenshot should be enough to get you started:
[Screenshot(1): image not available]
Let me know if you need any more info.
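The reshaping itself (one output row per employee and expiration column) can be sketched in plain Python. This illustrates the target layout only, not the INDEX/MATCH formula from the answer; the dates are the sample values from the question, and the variable names are assumptions.

```python
# Wide layout: one row per employee, one column per expiration.
header = ["Employee Name", "Expiration 1", "Expiration 2", "Expiration 3", "Expiration 4"]
wide = [
    ["John Doe", "12-12-12", "12-12-12", "12-12-12", "12-12-12"],
    ["Jane Doe", "12-1-12", "12-1-12", "12-1-12", "12-1-12"],
]

# Long layout: one row per (employee, expiration number, date).
long_rows = []
for row in wide:
    name = row[0]
    # enumerate from 1 so the number matches the "Expiration N" column.
    for number, date in enumerate(row[1:], start=1):
        long_rows.append((name, number, date))

for record in long_rows:
    print(record)
```

Each output row r (counting from 0) comes from employee `r // 4` and expiration column `r % 4 + 1`, which is the same arithmetic an index-based worksheet formula would need.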
