How to find max and return adjacent cell in Excel

Imagine a table:
Red 8 Black 1
Red 2 Black 3
Red 1 Black 0
Red 7 Black 8
Red 4 Black 5
How do I return "Red" or "Black" in a third column for each row depending on which has a larger value?
It would be:
Red 8 Black 1 Red
Red 2 Black 3 Black
Red 1 Black 0 Red
Red 7 Black 8 Black
Red 4 Black 5 Black

Use:
=INDEX(A2:D2,MATCH(MAX(A2:D2),A2:D2,0)-1)
MATCH finds the position of the row's maximum value (MAX ignores the text cells), subtracting 1 steps back to the label cell immediately to its left, and INDEX returns it.
Edit:
Since there are only two options, a simple IF will work:
=IF(B2>D2,A2,C2)
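For readers doing this in pandas rather than Excel, the same row-wise comparison can be sketched with numpy.where (the column names here are hypothetical, mirroring the table above):

```python
import pandas as pd
import numpy as np

# Mirror the example table: a label/value pair for each colour
df = pd.DataFrame({
    'Label1': ['Red'] * 5,
    'Value1': [8, 2, 1, 7, 4],
    'Label2': ['Black'] * 5,
    'Value2': [1, 3, 0, 8, 5],
})

# Equivalent of =IF(B2>D2,A2,C2): pick the label with the larger value
df['Winner'] = np.where(df['Value1'] > df['Value2'], df['Label1'], df['Label2'])
print(df['Winner'].tolist())  # ['Red', 'Black', 'Red', 'Black', 'Black']
```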

Related

How can I display data from 3 tables into a single table?

I have 3 datasets (a, b, c) and 3 tables that give values based on the strength of their pairwise relationships: (a,b), (a,c), and (b,c). a has 5 items, b has 7 items, and c has 4 items. I am trying to find a way to display all the data from these tables in a single graphic. The axes would form a Y shape, with each leg representing a different dataset and the cells representing relationship strength.
I have looked through Excel's charts and haven't found anything that would represent this. Is there another program that would be better? I'm looking for something simple to use.
Below is an example of the 3 tables and the type of data they contain. I want a way to show the scores for each relationship in a grid.
Table 1
A B Score
1 x 3
2 x 5
3 x 0
1 y 2
2 y 6
3 y 0
1 z 5
2 z 8
3 z 0
Table 2
A C Score
1 blue 3
2 blue 8
3 blue 2
1 red 0
2 red 4
3 red 1
1 yellow 3
2 yellow 3
3 yellow 9
Table 3
B C Score
x blue 2
x red 1
x yellow 5
y blue 0
y red 3
y yellow 7
z blue 0
z red 1
z yellow 3
Here is an example of what I am trying to do with the data: I manually created this type of visual in AutoCAD.
This works as a one-off, but it doesn't scale and is very tedious. I'm hoping there is a programmatic way to create something similar.
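No answer was recorded for this question, but as a sketch of one programmatic direction (assuming the long tables above are loaded into pandas): each pairwise table can be pivoted into a grid, and most plotting libraries can then render each grid as a heatmap.

```python
import pandas as pd

# Table 1 from the question, in long form
t1 = pd.DataFrame({
    'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'B': ['x'] * 3 + ['y'] * 3 + ['z'] * 3,
    'Score': [3, 5, 0, 2, 6, 0, 5, 8, 0],
})

# Pivot the long table into an A-by-B grid of scores
grid = t1.pivot(index='A', columns='B', values='Score')
print(grid)
```

Doing the same for Tables 2 and 3 yields three grids; rendering each with, say, matplotlib's imshow and arranging them around a common corner would approximate the Y-shaped layout described above.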

Grouping By 2 Columns In Pandas Ignoring Order

I have a DataFrame in pandas with 2 columns that are almost identical but not quite, and hence I sometimes want to group by both columns while ignoring their order.
As an example:
mydf = pd.DataFrame({'Colour1': ['Red', 'Red', 'Blue', 'Green', 'Blue'], 'Colour2': ['Red', 'Blue', 'Red', 'Blue', 'Green'], 'Rating': [4, 5, 7, 8, 2]})
Colour1 Colour2 Rating
0 Red Red 4
1 Red Blue 5
2 Blue Red 7
3 Green Blue 8
4 Blue Green 2
I would like to group by Colour1 and Colour2 whilst ignoring the order, then transform the DataFrame by taking the group mean, to produce the following DataFrame:
Colour1 Colour2 Rating MeanRating
0 Red Red 4 4
1 Red Blue 5 6
2 Blue Red 7 6
3 Green Blue 8 5
4 Blue Green 2 5
Is there a good way of doing this? Thanks in advance.
You can first sort Colour1 and Colour2 within each row using np.sort, then group by the sorted pairs:
import numpy as np
import pandas as pd

# Sort each row's pair so ('Blue', 'Red') and ('Red', 'Blue') share one key
s = pd.Series(map(tuple, np.sort(mydf[['Colour1', 'Colour2']], axis=1)), index=mydf.index)
mydf['MeanRating'] = mydf['Rating'].groupby(s).transform('mean')
print(mydf)
Colour1 Colour2 Rating MeanRating
0 Red Red 4 4
1 Red Blue 5 6
2 Blue Red 7 6
3 Green Blue 8 5
4 Blue Green 2 5
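An equivalent sketch groups on a frozenset of the two colour values, which is hashable and order-insensitive:

```python
import pandas as pd

mydf = pd.DataFrame({
    'Colour1': ['Red', 'Red', 'Blue', 'Green', 'Blue'],
    'Colour2': ['Red', 'Blue', 'Red', 'Blue', 'Green'],
    'Rating': [4, 5, 7, 8, 2],
})

# frozenset({'Red', 'Blue'}) == frozenset({'Blue', 'Red'}), so order is ignored
key = mydf[['Colour1', 'Colour2']].apply(frozenset, axis=1)
mydf['MeanRating'] = mydf['Rating'].groupby(key).transform('mean')
print(mydf['MeanRating'].tolist())  # [4.0, 6.0, 6.0, 5.0, 5.0]
```

Each distinct unordered pair maps to a distinct frozenset (a same-colour pair like ('Red', 'Red') collapses to the one-element set {'Red'}), so the grouping matches the np.sort approach.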

Pandas Identify duplicate records, create a new column and add the ID of first occurrence

I am a newbie in Python, so please bear with me :)
Let's say there is a DataFrame like this:
ID B C D E isDuplicated
1 Blue Green Blue Pink false
2 Red Green Red Green false
3 Red Orange Yellow Green false
4 Blue Pink Blue Pink false
5 Blue Orange Pink Green false
6 Blue Orange Pink Green true
7 Red Orange Yellow Green true
8 Red Orange Yellow Green true
I have flagged duplicates in the rows using the subset B, C, D, E. Now I would like to add another column, 'firstOccurred', which should hold the ID of the first occurrence.
My desired dataframe should look like this:
ID B C D E isDuplicated firstOccurred
1 Blue Green Blue Pink false
2 Red Green Red Green false
3 Red Orange Yellow Green false
4 Blue Pink Blue Pink false
5 Blue Orange Pink Green false
6 Blue Orange Pink Green true 5
7 Red Orange Yellow Green true 3
8 Red Orange Yellow Green true 3
I would be grateful for any help!
Thank you in advance!
Use GroupBy.transform with 'first', applied only to the rows with True via numpy.where:
import numpy as np

df['firstOccurred'] = np.where(df['isDuplicated'],
                               df.groupby(['B','C','D','E'])['ID'].transform('first'),
                               np.nan)
print (df)
ID B C D E isDuplicated firstOccurred
0 1 Blue Green Blue Pink False NaN
1 2 Red Green Red Green False NaN
2 3 Red Orange Yellow Green False NaN
3 4 Blue Pink Blue Pink False NaN
4 5 Blue Orange Pink Green False NaN
5 6 Blue Orange Pink Green True 5.0
6 7 Red Orange Yellow Green True 3.0
7 8 Red Orange Yellow Green True 3.0
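If the isDuplicated column does not exist yet, here is a sketch that derives it with DataFrame.duplicated, and keeps the first-occurrence IDs as nullable integers (pandas' Int64 dtype) rather than floats like 5.0:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': ['Blue', 'Red', 'Red', 'Blue', 'Blue', 'Blue', 'Red', 'Red'],
    'C': ['Green', 'Green', 'Orange', 'Pink', 'Orange', 'Orange', 'Orange', 'Orange'],
    'D': ['Blue', 'Red', 'Yellow', 'Blue', 'Pink', 'Pink', 'Yellow', 'Yellow'],
    'E': ['Pink', 'Green', 'Green', 'Pink', 'Green', 'Green', 'Green', 'Green'],
})

# Flag every occurrence after the first within the B/C/D/E subset
df['isDuplicated'] = df.duplicated(subset=['B', 'C', 'D', 'E'])

# ID of the first occurrence, kept only on duplicated rows
first = df.groupby(['B', 'C', 'D', 'E'])['ID'].transform('first')
df['firstOccurred'] = first.where(df['isDuplicated']).astype('Int64')
print(df[['ID', 'isDuplicated', 'firstOccurred']])
```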

Pandas create a new data frame from counting rows into columns

I have something like this data frame:
item color
0 A red
1 A red
2 A green
3 B red
4 B green
5 B green
6 C red
7 C green
And I want to count how many times each color repeats for each item, grouping the colors into columns like this:
item red green
0 A 2 1
1 B 1 2
2 C 1 1
Any thoughts? Thanks in advance.
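No answer was recorded here either; a minimal sketch with pd.crosstab, which counts the (item, color) pairs and spreads the colors into columns (note it orders the columns alphabetically, so green comes before red):

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'color': ['red', 'red', 'green', 'red', 'green', 'green', 'red', 'green'],
})

# Cross-tabulate: one row per item, one column per color, cells are counts
out = pd.crosstab(df['item'], df['color']).reset_index()
out.columns.name = None
print(out)
```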

How to count matching words from 2 csv files

I have 2 csv files, dictionary.csv and story.csv. I want to count how many words in each row of story.csv match the words in dictionary.csv.
Below are truncated examples
Story.csv
id STORY
0 Jennie have 2 shoes, a red heels and a blue sneakers
1 The skies are pretty today
2 One of aesthetic color is grey
Dictionary.csv
red
green
grey
blue
black
The output I expect is:
output.csv
id STORY Found
0 Jennie have 2 shoes, a red heels and a blue sneakers 2
1 The skies are pretty today 0
2 One of aesthetic color is grey 1
This is the code I have so far, but I only get NaN (empty cells):
import pandas as pd
import csv
news=pd.read_csv("Story.csv")
dictionary=pd.read_csv("Dictionary.csv")
news['STORY'].value_counts()
news['How many found in 1'] = dictionary['Lists'].map(news['STORY'].value_counts())
news.to_csv("output.csv")
I tried using .str.count as well, but I kept getting zeros.
Try this:
import pandas as pd
#create the sample data frame
data = {'id':[0,1,2],'STORY':['Jennie have 2 shoes, a red heels and a blue sneakers',\
'The skies are pretty today',\
'One of aesthetic color is grey']}
word_list = ['red', 'green', 'grey', 'blue', 'black']
df = pd.DataFrame(data)
#start counting
df['Found'] = df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}).sum())
#alternatively, can use this
#df['Found'] = df['STORY'].astype(str).apply(lambda t: sum([t.count(word) for word in word_list]))
Output
df
# id STORY Found
#0 0 Jennie have 2 shoes, a red heels and a blue sneakers 2
#1 1 The skies are pretty today 0
#2 2 One of aesthetic color is grey 1
Bonus edit: if you want to see the detailed breakdown of word counts by word, run this:
df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}))
# red green grey blue black
#0 1 0 0 1 0
#1 0 0 0 0 0
#2 0 0 1 0 0
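One caveat with plain t.count(word): it counts substrings, so 'red' would also match inside a word like 'bored'. A sketch using Series.str.count with a word-boundary regex avoids that (same sample data as above):

```python
import pandas as pd

stories = pd.Series([
    'Jennie have 2 shoes, a red heels and a blue sneakers',
    'The skies are pretty today',
    'One of aesthetic color is grey',
])
word_list = ['red', 'green', 'grey', 'blue', 'black']

# \b anchors the match at word boundaries, so only whole words count
pattern = r'\b(?:' + '|'.join(word_list) + r')\b'
found = stories.str.count(pattern)
print(found.tolist())  # [2, 0, 1]
```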
