Excel formula to find specific data based on max row result - excel

Good day beautiful people of Scotl... Stackoverflow.
I have run into an issue in Excel that I have no idea how to solve. I have tried many formulas, but I believe the problem is in my own thinking: I struggle to picture the logic the formula should follow.
I have attached a screenshot to clarify my problem:
Excel screenshot
Description of the screenshot
Column B - data name,
Row 3 (C3:H3) - product names,
Table C4:H15 - some data (description, dates, etc.).
Column I is a helper column I added; it is not required.
Desired result
I want to pull data from the table above into the table below. If a "DataX" name appears in more than one row, I want Excel to pick the row for that "DataX" with the most filled-in cells (I have marked these rows blue for each DataX).
For example, for:
Data 1 - row 4,
Data 2 - row 7,
Data 3 - (obviously) row 9,
Data 4 - row 11,
Data 5 - row 13.
If two or more rows tie (for example, all cells are empty or all are filled), I don't care which row is returned.
What I have tried
I added a calculation (column I) that counts how many cells in each row are filled, and I tried to combine VLOOKUP/HLOOKUP with MAX, but it did not work correctly.
I also wrote VBA code for it, which worked almost well, but then I was told that macros are a no-go for this project.
Logic
I strongly believe the logic should be as follows:
Find the rows matching DataX,
Find the max value in column I among them (or compute it inside the formula),
Return the corresponding cells of that row.
    A  B       C      D      E      F      G      H      I
2                  CAT 1  CAT 2  CAT 3  CAT 4  CAT 5  CAT 6  Count not blank
3                  1      2      3      4      5      6
4      Data 1  AAA    BBB    CCC           EEE    FFF    =$H$3-COUNTBLANK(C4:H4)
5      Data 1         BBB    CCC    DDD                  =$H$3-COUNTBLANK(C5:H5)
6      Data 1  AAA    BBB                  EEE    FFF    =$H$3-COUNTBLANK(C6:H6)
7      Data 2  AAA    BBB    CCC    DDD    EEE    FFF    =$H$3-COUNTBLANK(C7:H7)
8      Data 2  AAA    BBB    CCC    DDD           FFF    =$H$3-COUNTBLANK(C8:H8)
9      Data 3  AAA    BBB    CCC           EEE    FFF    =$H$3-COUNTBLANK(C9:H9)
10     Data 4                CCC    DDD    EEE    FFF    =$H$3-COUNTBLANK(C10:H10)
11     Data 4  AAA    BBB    CCC    DDD           FFF    =$H$3-COUNTBLANK(C11:H11)
12     Data 4  AAA    BBB    CCC           EEE    FFF    =$H$3-COUNTBLANK(C12:H12)
13     Data 5  AAA    BBB    CCC    DDD    EEE    FFF    =$H$3-COUNTBLANK(C13:H13)
14     Data 5         BBB    CCC    DDD    EEE    FFF    =$H$3-COUNTBLANK(C14:H14)
15     Data 5  AAA    BBB           DDD    EEE    FFF    =$H$3-COUNTBLANK(C15:H15)

Hello dear son of Scotl.. overflow!
Please add this helper formula to column J (range J4:J15):
=CONCATENATE(B4,I4)
and then paste this into C19:
=INDEX(C$4:C$15,MATCH(CONCATENATE($B19,MAX(IF($B$4:$B$15=$B19,$I$4:$I$15,0))), $J$4:$J$15,0))
Enter it as an array formula, i.e. confirm it with Ctrl+Shift+Enter. Then fill it across the rest of the desired range.
The numbers in my example table do not mean anything; it is the count in column I that matters.
Regards!!
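For anyone who wants to check the selection logic outside Excel, here is a minimal Python sketch of the same idea: for each data name, keep the row with the highest count of non-blank cells, and keep the first such row on a tie. The table literal is a hand-typed subset of the question's data, the positions of the blank cells are assumed, and `pick_fullest_rows` is a hypothetical helper name, not anything from Excel.

```python
# Each entry: (sheet row number, data name, six product cells); "" marks a blank cell.
# Blank positions are an assumption made for illustration.
table = [
    (4, "Data 1", ["AAA", "BBB", "CCC", "", "EEE", "FFF"]),
    (5, "Data 1", ["", "BBB", "CCC", "DDD", "", ""]),
    (6, "Data 1", ["AAA", "BBB", "", "", "EEE", "FFF"]),
    (7, "Data 2", ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]),
    (8, "Data 2", ["AAA", "BBB", "CCC", "DDD", "", "FFF"]),
    (10, "Data 4", ["", "", "CCC", "DDD", "EEE", "FFF"]),
    (11, "Data 4", ["AAA", "BBB", "CCC", "DDD", "", "FFF"]),
    (12, "Data 4", ["AAA", "BBB", "CCC", "", "EEE", "FFF"]),
]


def pick_fullest_rows(rows):
    """Return {name: (row_number, cells)} for the row with the most filled cells."""
    best = {}
    for row_number, name, cells in rows:
        filled = sum(1 for cell in cells if cell != "")
        # Strict ">" keeps the first row when several rows tie.
        if name not in best or filled > best[name][0]:
            best[name] = (filled, row_number, cells)
    return {name: (row_number, cells) for name, (_, row_number, cells) in best.items()}


result = pick_fullest_rows(table)
for name, (row_number, cells) in result.items():
    print(name, "-> row", row_number, cells)
```

The helper column J in the formula above plays the role of the `(name, filled)` comparison here: it lets MATCH find the row whose name and count both agree with the maximum.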

Related

Algorithm / Code to find dependency and build row column wise hierarchy model using VBA

Suppose I have two columns, ColA = calling programs and ColB = called programs. I want to build the hierarchy between calling and called programs and print it with a dependency level (LVL) column, as below.
Note:
A calling program whose called program is blank (SPACES) is the initial program of a new branch.
The output representation can differ, but it must use rows and columns only.
Input columns:
COLA COLB
AAA
AAA BBB
AAA CCC
BBB
BBB CCC
CCC DDD
CCC GGG
CCC HHH
DDD
DDD III
DDD MMM
EEE
EEE BBB
EEE FFF
EEE JJJ
EEE KKK
FFF
FFF LLL
FFF MMM
FFF NNN
MMM OOO
Output:
COLA(Initial) LVL COLB(Calling) COLC(Called)
AAA 1
AAA 2 BBB
AAA 3 CCC
AAA 4 DDD
AAA 5 III
AAA 5 MMM
AAA 6 OOO
AAA 4 GGG
AAA 4 HHH
AAA 2 CCC
AAA 3 DDD
AAA 4 III
AAA 4 MMM
AAA 5 OOO
AAA 3 GGG
AAA 3 HHH
BBB 1
BBB 2 CCC
BBB 3 DDD
BBB 4 III
BBB 4 MMM
BBB 5 OOO
BBB 3 GGG
BBB 3 HHH
DDD 1
DDD 2 III
DDD 2 MMM
DDD 3 OOO
EEE 1
EEE 2 FFF
EEE 3 LLL
EEE 3 MMM
EEE 4 OOO
EEE 3 NNN
EEE 2 JJJ
EEE 2 KKK
FFF 1
FFF 2 LLL
FFF 2 MMM
FFF 3 OOO
FFF 2 NNN
I tried, but I am stuck at LVL 4 and at the recursive loop. Please advise.
for i = 1 to i <= last row
    lvl_no = 0
    if CCi == SPACES
        OBJECT_NAME = CAi
        lvl_no = 1
        copy row i to new excel
        for j = 1 to j <= last row
            if CAj = OBJECT_NAME && CCj != SPACES
                lvl_no = 1 + 1
                copy row j to new excel
                dep_obj = CCj
                ROW = 1 BBB
                function_dep(dep_obj,lvl_no,ROW)
                j++
            ELSE J++
function_dep (object_name, lvl, row)
{
    for k=row to k<= last_row
        if CAk = object_name && CCk !=spaces
            lvl = lvl + 1
            dep_obj = CCk
            row = 1
            print line k, lvl
            call function_dep(dep_obj, lvl, row)
        else k++
}
Following the suggestion in the comment below, I updated my input with some new rows (DDD, EEE BBB and MMM OOO), and updated the output with the new levels that follow from those dependencies.
The solution suggested below is not working for me: for the EEE->BBB dependency it shows only the single row EEE->BBB and misses all the forward dependencies (EEE->BBB->CCC->DDD and so on), treating them as duplicates.
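The recursion the question is reaching for can be sketched as a depth-first walk. The version below is in Python rather than VBA, purely to show the shape of the algorithm: collect the children of every calling program, treat each program that has a blank called entry as an initial program, and recurse with the level passed as an argument so that it is not shared between sibling branches. The name `build_hierarchy` and the `(initial, level, program)` output tuples are assumptions for illustration, and the sample input is a subset of the question's rows. Shared subtrees are deliberately expanded again under each parent rather than treated as duplicates.

```python
def build_hierarchy(pairs):
    """pairs: list of (calling, called); called == "" marks an initial program."""
    children = {}
    roots = []
    for calling, called in pairs:
        if called:
            children.setdefault(calling, []).append(called)
        elif calling not in roots:
            roots.append(calling)

    rows = []

    def walk(root, program, level, path):
        rows.append((root, level, program))
        for child in children.get(program, []):
            if child in path:
                continue  # skip cycles so the recursion always terminates
            walk(root, child, level + 1, path | {child})

    for root in roots:
        walk(root, root, 1, {root})
    return rows


pairs = [
    ("AAA", ""), ("AAA", "BBB"), ("AAA", "CCC"),
    ("BBB", ""), ("BBB", "CCC"),
    ("CCC", "DDD"), ("CCC", "GGG"), ("CCC", "HHH"),
    ("DDD", ""), ("DDD", "III"), ("DDD", "MMM"),
    ("MMM", "OOO"),
]
rows = build_hierarchy(pairs)
for initial, level, program in rows:
    print(initial, level, program)
```

Passing `level + 1` into the call, instead of incrementing a shared variable inside the loop, is what keeps siblings such as DDD, GGG and HHH on the same level.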

Pandas - Groupby Company and drop rows according to criteria based off the Dates of values being out of order

I have a history data log and want to calculate the number of days between progress stages for each Company (the timestamp of an earlier stage must be earlier than that of a later stage).
Company Progress Time
AAA 3. Contract 07/10/2020
AAA 2. Discuss 03/09/2020
AAA 1. Start 02/02/2020
BBB 3. Contract 11/13/2019
BBB 3. Contract 07/01/2019
BBB 1. Start 06/22/2019
BBB 2. Discuss 04/15/2019
CCC 3. Contract 05/19/2020
CCC 2. Discuss 04/08/2020
CCC 2. Discuss 03/12/2020
CCC 1. Start 01/01/2020
Expected outputs:
Progress (1. Start --> 2. Discuss)
Company Progress Time
AAA 1. Start 02/02/2020
AAA 2. Discuss 03/09/2020
CCC 1. Start 01/01/2020
CCC 2. Discuss 03/12/2020
Progress (2. Discuss --> 3. Contract)
Company Progress Time
AAA 2. Discuss 03/09/2020
AAA 3. Contract 07/10/2020
CCC 2. Discuss 03/12/2020
CCC 3. Contract 05/19/2020
I tried some clumsy approaches, but I still need to filter manually in Excel. My code is below:
df_stage1_stage2 = df[(df['Progress']=='1. Start')|(df['Progress']=='2. Discuss')]
pd.pivot_table(df_stage1_stage2 ,index=['Company','Progress'],aggfunc={'Time':min})
Can anyone help with the problem? thanks
Create some masks to filter out the relevant rows. m1 and m2 filter out groups where 1. Start is not the "first" datetime when read in reverse order (since your dates are sorted by Company ascending and date descending). You can create more masks if you also need to check that 2. Discuss and 3. Contract are in order; the current logic only checks that 1. Start is in order. With the data you provided, though, it returns the correct output:
import numpy as np
# The last row of each company should be '1. Start'; label such groups 'keep'
m1 = df.groupby('Company')['Progress'].transform('last')
m2 = np.where((m1 == '1. Start'), 'keep', 'drop')
df = df[m2=='keep']
df
intermediate output:
Company Progress Time
0 AAA 3. Contract 07/10/2020
1 AAA 2. Discuss 03/09/2020
2 AAA 1. Start 02/02/2020
7 CCC 3. Contract 05/19/2020
8 CCC 2. Discuss 04/08/2020
9 CCC 2. Discuss 03/12/2020
10 CCC 1. Start 01/01/2020
From there, filter as you indicated: sort, then drop duplicates on the first two columns and keep the first occurrence. This gives the final df1 and df2:
df1
df1 = df[df['Progress'] != '3. Contract'] \
.sort_values(['Company', 'Time'], ascending=[True,True]) \
.drop_duplicates(subset=['Company', 'Progress'], keep='first')
df1 output:
Company Progress Time
2 AAA 1. Start 02/02/2020
1 AAA 2. Discuss 03/09/2020
10 CCC 1. Start 01/01/2020
9 CCC 2. Discuss 03/12/2020
df2
df2 = df[df['Progress'] != '1. Start'] \
.sort_values(['Company', 'Time'], ascending=[True,True]) \
.drop_duplicates(subset=['Company', 'Progress'], keep='first')
df2 output:
Company Progress Time
1 AAA 2. Discuss 03/09/2020
0 AAA 3. Contract 07/10/2020
9 CCC 2. Discuss 03/12/2020
7 CCC 3. Contract 05/19/2020
Something like this could work, assuming an already sorted df:
(full example)
import numpy as np
import pandas as pd
data = {
'Company':['AAA', 'AAA', 'AAA', 'BBB','BBB','BBB','BBB','CCC','CCC','CCC','CCC',],
'Progress':['3. Contract', '2. Discuss', '1. Start', '3. Contract', '3. Contract', '2. Discuss', '1. Start', '3. Contract', '2. Discuss', '2. Discuss', '1. Start', ],
'Time':['07-10-2020','03-09-2020','02-02-2020','11-13-2019','07-01-2019','06-22-2019','04-15-2019','05-19-2020','04-08-2020','03-12-2020','01-01-2020',],
}
df = pd.DataFrame(data)
df['Time'] = pd.to_datetime(df['Time'])
# We want to measure from the first occurrence (last date) if duplicated:
df.drop_duplicates(subset=['Company', 'Progress'], keep='first', inplace=True)
# Except for the rows of 'start', calculate the difference in days
df['days_delta'] = np.where((df['Progress'] != '1. Start'), df.Time.diff(-1), 0)
Output:
Company Progress Time days_delta
0 AAA 3. Contract 2020-07-10 123 days
1 AAA 2. Discuss 2020-03-09 36 days
2 AAA 1. Start 2020-02-02 0 days
3 BBB 3. Contract 2019-11-13 144 days
5 BBB 2. Discuss 2019-06-22 68 days
6 BBB 1. Start 2019-04-15 0 days
7 CCC 3. Contract 2020-05-19 41 days
8 CCC 2. Discuss 2020-04-08 98 days
10 CCC 1. Start 2020-01-01 0 days
If you do not want the 'days' word in output use:
df['days_delta'] = df['days_delta'].dt.days
First Problem
# Coerce Time to datetime
df['Time']=pd.to_datetime(df['Time'])
# Use groupby().nth() to slice the two consecutive stages
df2=(df.merge(df.groupby(['Company'])['Time'].nth([-2,-1]))).sort_values(by=['Company','Time'], ascending=[True, True])
# General rule for this problem: after groupby().nth(), drop any group whose rows all share one stage
df2=df2[~df2.Company.isin(df2[df2.groupby('Company').Progress.transform('nunique')==1].Company.values)]
# Calculate the diff() in Time within each group
df2['diff'] = df2.sort_values(by='Progress').groupby('Company')['Time'].diff().dt.days.fillna(0)#.groupby('Company')['Time'].diff() / np.timedelta64(1, 'D')
# Filter out the groups where the Start and Discuss times are in conflict
df2[~df2.Company.isin(df2.loc[df2['diff']<0, 'Company'].unique())]
Company Progress Time diff
1 AAA 1.Start 2020-02-02 0.0
0 AAA 2.Discuss 2020-03-09 36.0
5 CCC 1.Start 2020-01-01 0.0
4 CCC 2.Discuss 2020-03-12 71.0
Second Problem
# Use groupby().nth() to slice the right consecutive stages
df2=(df.merge(df.groupby(['Company'])['Time'].nth([0,1]))).sort_values(by=['Company','Time'], ascending=[True, True])
# Drop any group whose rows all share one stage
df2[~df2.Company.isin(df2[df2.groupby('Company').Progress.transform('nunique')==1].Company.values)]
Company Progress Time
1 AAA 2.Discuss 2020-03-09
0 AAA 3.Contract 2020-07-10
5 CCC 2.Discuss 2020-04-08
4 CCC 3.Contract 2020-05-19
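As a cross-check on the answers above, here is a minimal sketch of one more way to get the day counts: take the earliest timestamp per company and stage, pivot the stages into columns, keep only the companies whose stages are strictly in order, and subtract. It assumes that the earliest timestamp is the one that counts for a repeated stage, and that a company with any out-of-order stage is dropped entirely, as in the expected output. The variable names `first`, `valid` and `days` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB",
                "CCC", "CCC", "CCC", "CCC"],
    "Progress": ["3. Contract", "2. Discuss", "1. Start",
                 "3. Contract", "3. Contract", "1. Start", "2. Discuss",
                 "3. Contract", "2. Discuss", "2. Discuss", "1. Start"],
    "Time": ["07/10/2020", "03/09/2020", "02/02/2020",
             "11/13/2019", "07/01/2019", "06/22/2019", "04/15/2019",
             "05/19/2020", "04/08/2020", "03/12/2020", "01/01/2020"],
})
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y")

# Earliest timestamp per company and stage, one column per stage.
first = df.groupby(["Company", "Progress"])["Time"].min().unstack("Progress")

# Keep only companies whose stage timestamps are strictly increasing.
in_order = (first["1. Start"] < first["2. Discuss"]) & (
    first["2. Discuss"] < first["3. Contract"]
)
valid = first[in_order]

days = pd.DataFrame({
    "start_to_discuss": (valid["2. Discuss"] - valid["1. Start"]).dt.days,
    "discuss_to_contract": (valid["3. Contract"] - valid["2. Discuss"]).dt.days,
})
print(days)
```

With the question's data, BBB is excluded because its 1. Start date (06/22/2019) falls after its 2. Discuss date (04/15/2019).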

Excel : Getting first values out of a table

(First of all, sorry that the pictures are linked and not displayed, but I still don't have 10 reputation to do so :'( )
Hello everyone !
I need your advice: despite searching the Internet, I couldn't find anything that addresses my problem, so I'm coming here in the hope that you can enlighten me.
I have a data table that looks something like this (this is an example):
For copy / paste:
Name Index Val 1 Val 2 Val 3
AAA 1 121 12 81921
BBB 2 651 9491 1
CCC 3 11 90121 210
DDD 4 612 18 29
EEE 5 1441 12 123
FFF 6 12 1921 51
GGG 7 210 120 1245
… … … … …
I'm looking for formulas that will get the 5 highest values of a given type and display them in the following format (another handmade example):
Name Val 1 Name Val 2 Name Val 3
EEE 1441 CCC 90121 AAA 81921
BBB 651 BBB 9491 GGG 1245
DDD 612 FFF 1921 CCC 210
GGG 210 GGG 120 EEE 123
AAA 121 DDD 18 FFF 51
In each newly created table, I should have the 5 highest entries for a given value.
Ideally, these new tables should update automatically when new data is entered in the main table, so that nothing needs to be rechecked.
Thanks a lot for your answers! If you need more details to understand my problem and what I need, I'll be happy to provide them.
After creating appropriate column header labels, put these two formulas in G2 and H2, respectively.
=INDEX($A:$A, AGGREGATE(15, 7, ROW($B$2:INDEX($B:$B, MATCH(1E+99, $B:$B)))/(INDEX($A:$E, 2, MATCH(H$1, $A$1:$E$1, 0)):INDEX($A:$E, MATCH(1E+99, $B:$B), MATCH(H$1, $A$1:$E$1, 0))=H2), COUNTIF(H$2:H2, H2)))
=AGGREGATE(14, 7, INDEX($A:$E, 2, MATCH(H$1, $A$1:$E$1, 0)):INDEX($A:$E, MATCH(1E+99, $B:$B), MATCH(H$1, $A$1:$E$1, 0)), ROW(1:1))
Fill down four additional rows then copy G2:H6 to J2:K6 and M2:N6.
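To make clear what the two formulas compute together, here is a small Python sketch of the same "top 5 with names" lookup: sort the rows by one value column in descending order and keep the first five (name, value) pairs. The data is copied from the question's sample table; `top_n` is a hypothetical helper name, and the sketch does not mirror how AGGREGATE resolves ties.

```python
# (Name, Index, Val 1, Val 2, Val 3), copied from the sample table in the question.
rows = [
    ("AAA", 1, 121, 12, 81921),
    ("BBB", 2, 651, 9491, 1),
    ("CCC", 3, 11, 90121, 210),
    ("DDD", 4, 612, 18, 29),
    ("EEE", 5, 1441, 12, 123),
    ("FFF", 6, 12, 1921, 51),
    ("GGG", 7, 210, 120, 1245),
]


def top_n(rows, column, n=5):
    """Return the n (name, value) pairs with the largest value in the given column."""
    ranked = sorted(rows, key=lambda row: row[column], reverse=True)
    return [(row[0], row[column]) for row in ranked[:n]]


top_val1 = top_n(rows, 2)
top_val2 = top_n(rows, 3)
top_val3 = top_n(rows, 4)
print(top_val1)
print(top_val2)
print(top_val3)
```

Re-running `top_n` after appending rows gives the updated ranking, which corresponds to the formulas recalculating when the main table grows.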

Split a column's values by a special character and group by pandas

I have a df like this,
Owner Messages
AAA (YY) Duplicates
AAA Missing Number; (VV) Corrected Value; (YY) Duplicates
AAA (YY) Duplicates
BBB (YY) Duplicates
BBB Missing Measure; Missing Number
When I do a normal groupby like this,
df_grouped = df.groupby(['Owner', 'Messages']).size().reset_index(name='count')
df_grouped
I get this as expected,
Owner Messages count
0 AAA (YY) Duplicates 2
1 AAA Missing Number; (VV) Corrected Value; (YY) Duplicates 1
2 BBB (YY) Duplicates 1
3 BBB Missing Measure; Missing Number 1
However, I need something like this (desired output), with the Messages column split on ;.
Owner Messages count
0 AAA (YY) Duplicates 3
1 AAA Missing Number 1
2 AAA (VV) Corrected Value 1
3 BBB (YY) Duplicates 1
4 BBB Missing Measure 1
5 BBB Missing Number 1
So far, based on this post (@LeoRochael's answer), I can split the Messages column's values on ; and put them into a list. However, I cannot get the individual counts after splitting.
Any ideas on how to get my desired output?
You need to unnest your original dataframe; then a group size gives the counts:
s=df.set_index('Owner').Messages.str.split('; ',expand=True).stack().to_frame('Messages').reset_index()
s.groupby(['Owner','Messages']).size()
Out[1213]:
Owner Messages
AAA (VV) Corrected Value 1
(YY) Duplicates 3
Missing Number 1
BBB (YY) Duplicates 1
Missing Measure 1
Missing Number 1
dtype: int64
from collections import Counter
import pandas as pd
pd.Series(
Counter([(o, m) for o, M in df.values for m in M.split('; ')])
).rename_axis(['Owner', 'Message']).reset_index(name='Count')
Owner Message Count
0 AAA (VV) Corrected Value 1
1 AAA (YY) Duplicates 3
2 AAA Missing Number 1
3 BBB (YY) Duplicates 1
4 BBB Missing Measure 1
5 BBB Missing Number 1
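A third variant, sketched here on the assumption that the installed pandas provides `DataFrame.explode` (pandas 0.25 or later): split each message string into a list, explode the lists into one row per message, and count. The frame is rebuilt from the question's sample so that the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Owner": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Messages": [
        "(YY) Duplicates",
        "Missing Number; (VV) Corrected Value; (YY) Duplicates",
        "(YY) Duplicates",
        "(YY) Duplicates",
        "Missing Measure; Missing Number",
    ],
})

counts = (
    df.assign(Messages=df["Messages"].str.split("; "))  # string -> list of messages
      .explode("Messages")                              # one row per single message
      .groupby(["Owner", "Messages"])
      .size()
      .reset_index(name="count")
)
print(counts)
```

The split uses "; " (semicolon plus space), matching the sample data; input with irregular spacing would need a strip step first.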

Delete entire row if Consequent Data is Matching in Excel 2007

I have a problem: if consecutive rows in my columns contain matching data, the duplicate row should be deleted.
For example:
Before
column A column B
aaa 10
aaa 10
aaa 5
bbb 6
aaa 10
bbb 5
After
column A column B
aaa 10
aaa 5
bbb 6
bbb 5
Select all the data in columns A and B, then on the Data ribbon choose Remove Duplicates.
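What Remove Duplicates does to the sample can be sketched in a few lines of Python: keep the first occurrence of every (column A, column B) pair and drop later repeats, whether or not they are adjacent. This is an illustration of the effect on the question's "Before" table, not a description of Excel's internals.

```python
before = [
    ("aaa", 10),
    ("aaa", 10),
    ("aaa", 5),
    ("bbb", 6),
    ("aaa", 10),
    ("bbb", 5),
]

# dict keys keep insertion order, so this preserves the first occurrence of each row.
after = list(dict.fromkeys(before))
print(after)
```

The result matches the "After" table in the question, including the removal of the non-adjacent repeat of aaa 10 in the fifth row.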
