Finding the Max of Excel Matrix Data based on Criteria from a Matrix - excel

I have data in a matrix, and I also have criteria data in a second matrix of the same shape. See below.
Data from the Matrix
Period  0.0    30     45     60     75     90     105    120    135    150    180
6.0     0.356  0.443  0.469  0.505  0.579  0.525  0.516  0.475  0.342  0.271  0.171
7.0     0.439  0.541  0.558  0.678  0.802  0.642  0.747  0.499  0.436  0.336  0.232
8.0     0.505  0.544  0.591  0.694  0.759  0.747  0.736  0.584  0.560  0.467  0.269
9.0     0.489  0.614  0.618  0.630  0.791  0.687  0.631  0.577  0.507  0.562  0.340
10.0    0.538  0.603  0.572  0.580  0.703  0.643  0.619  0.556  0.489  0.459  0.399
11.0    0.503  0.491  0.513  0.578  0.585  0.630  0.587  0.542  0.439  0.459  0.345
12.0    0.517  0.446  0.539  0.588  0.546  0.564  0.552  0.497  0.411  0.412  0.355
13.0    0.470  0.439  0.545  0.534  0.530  0.482  0.510  0.470  0.422  0.404  0.329
14.0    0.399  0.427  0.469  0.442  0.462  0.434  0.409  0.425  0.382  0.395  0.340
15.0    0.370  0.390  0.388  0.397  0.421  0.393  0.355  0.387  0.355  0.341  0.331
Criteria for the matrix
Period  0.0  30  45  60  75  90  105  120  135  150  180
6.0     3    5   5   6   7   6   6    5    3    2    0
7.0     5    6   7   9   10  8   10   6    5    3    1
8.0     6    6   7   9   10  10  9    7    7    5    2
9.0     6    8   8   8   10  9   8    7    6    7    3
10.0    6    7   7   7   9   8   8    7    6    5    4
11.0    6    6   6   7   7   8   7    6    5    5    3
12.0    6    5   6   7   6   7   7    6    4    4    3
13.0    5    5   6   6   6   5   6    5    4    4    3
14.0    4    5   5   5   5   5   4    5    4    4    3
15.0    4    4   4   4   4   4   3    4    3    3    3
Is there any way to pick a number from the criteria matrix, e.g. 3 or 10, and find the maximum of the data-matrix values that sit at the same locations as that number in the criteria matrix?
For example, 10 appears in the criteria matrix at [7,75], [7,105], [8,75], [8,90] and [9,75], so the result for 10 should be the maximum of the data values at those five locations.
I am expecting an Excel function or VBA to find the maximum for such numbers.
Thanks a lot for your help and thoughts.
Excel Function or Excel VBA

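Assume both tables start (with header row and header column) in cell A1 of two sheets named Criteria and Data. MAX has to sit inside SUMPRODUCT: MAX(SUMPRODUCT(...)) would return the sum of the matching values, because SUMPRODUCT collapses the array to a single number before MAX sees it. Non-matching cells count as 0, so this assumes the data are non-negative.
=SUMPRODUCT(MAX((Criteria!B2:L11=10)*(Data!B2:L11)))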

Max in Matrix Using Criteria Matrix
If you have Microsoft 365 and the criteria values are listed in the range N2:N12 of sheet Criteria, you could use this in cell O2 of that sheet:
=MAX(TOCOL(($B$2:$L$11=N2)*Data!$B$2:$L$11))
or the same calculation written with LET (useful for getting familiar with the LET function):
=LET(tCriteria,$B$2:$L$11,tData,Data!$B$2:$L$11,Criteria,N2,
MAX(TOCOL((tCriteria=Criteria)*tData)))
which was used in cell P2 of the screenshot. Copy either formula down.
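
To check the expected result outside Excel, the same "max where the criteria match" idea can be sketched in Python with NumPy. This is only an illustration, not part of either answer: the arrays below are a hand-copied excerpt of the two matrices (periods 7.0 to 9.0, columns 60 to 105), and max_where is a hypothetical helper name. It selects with a boolean mask instead of multiplying, so non-matching cells are excluded rather than treated as 0.

```python
import numpy as np

# Excerpt of the criteria matrix: rows 7.0, 8.0, 9.0; columns 60, 75, 90, 105
criteria = np.array([
    [9, 10, 8, 10],
    [9, 10, 10, 9],
    [8, 10, 9, 8],
])

# Matching excerpt of the data matrix
data = np.array([
    [0.678, 0.802, 0.642, 0.747],
    [0.694, 0.759, 0.747, 0.736],
    [0.630, 0.791, 0.687, 0.631],
])


def max_where(data, criteria, wanted):
    """Return the largest data value whose criteria cell equals `wanted`.

    Returns None when the wanted value does not occur at all.
    """
    mask = criteria == wanted
    if not mask.any():
        return None
    return data[mask].max()


max_for_10 = max_where(data, criteria, 10)
max_for_9 = max_where(data, criteria, 9)
print(max_for_10, max_for_9)
```

On this excerpt the value for 10 comes from cell [7,75], which is one of the five locations listed in the question.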

Related

Merge 'column attributes' of a single column into separate columns, to lower the number of dummy variables of that single column

A column has, for example, 14 unique values (as shown by value_counts()), and some of them have something in common.
In my example, when I group by 'Loan.Purpose' and compute the mean of 'Interest.Rate' for each unique value, several values share the same rounded mean. For example, 'car', 'educational' and 'major_purchase' all have a mean of 11.0. I want to merge those three values into one category named "LP_cem" because they have the same mean, and do the same for the other values,
so that I can reduce the number of dummy variables from 14 to 4.
Basically, I want to merge the 14 unique values into 3 or 4 categories based on their mean() and then create dummies from those categories,
like this:
   LP_cem  LP_chos  LP_dm  LP_hmvw  LP_renewable_energy
0       0        0      1        0                    0
1       0        0      1        0                    0
2       0        0      1        0                    0
3       0        0      1        0                    0
4       0        1      0        0                    0
raw_data['Loan.Purpose'].value_counts()
debt_consolidation 1306
credit_card 443
other 200
home_improvement 151
major_purchase 101
small_business 86
car 50
wedding 39
medical 30
moving 28
vacation 21
house 20
educational 15
renewable_energy 4
Name: Loan.Purpose, dtype: int64
I have clubbed the data from Loan.Purpose based on the mean of Interest.Rate:
raw_data_8 = round(raw_data_5.groupby('Loan.Purpose')['Interest.Rate'].mean())
raw_data_8
Loan.Purpose
CHOS 15.0
DM 12.0
car 11.0
credit_card 13.0
debt_consolidation 14.0
educational 11.0
home_improvement 12.0
house 13.0
major_purchase 11.0
medical 12.0
moving 14.0
other 13.0
renewable_energy 10.0
small_business 13.0
vacation 12.0
wedding 12.0
Name: Interest.Rate, dtype: float64
Now I want to club together the values that have the same mean. I tried the code below, but it gives an error:
for i in range(len(raw_data_5.index)):
    if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
    if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'
    if raw_data_5['Loan.Purpose'][i] in ['credit_care','house','other','small_business']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'chos'
    if raw_data_5['Loan.Purpose'][i] in ['debt_consolidation','moving']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'dcm'
The error:
TypeError Traceback (most recent call last)
<ipython-input-51-cf7ef2ae1efd> in <module>
----> 1 for i in range(raw_data_5.index):
2 if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
3 raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
4 if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
5 raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'
TypeError: 'Int64Index' object cannot be interpreted as an integer
Interest.Rate Loan.Length Loan.Purpose
0 8.90 36.0 debt_consolidation
1 12.12 36.0 debt_consolidation
2 21.98 60.0 debt_consolidation
3 9.99 36.0 debt_consolidation
4 11.71 36.0 credit_card
5 15.31 36.0 other
6 7.90 36.0 debt_consolidation
7 17.14 60.0 credit_card
8 14.33 36.0 credit_card
10 19.72 36.0 moving
11 14.27 36.0 debt_consolidation
12 21.67 60.0 debt_consolidation
13 8.90 36.0 debt_consolidation
14 7.62 36.0 debt_consolidation
15 15.65 60.0 debt_consolidation
16 12.12 36.0 debt_consolidation
17 10.37 60.0 debt_consolidation
18 9.76 36.0 credit_card
19 9.99 60.0 debt_consolidation
20 21.98 36.0 debt_consolidation
21 19.05 60.0 credit_card
22 17.99 60.0 car
23 11.99 36.0 credit_card
24 16.82 60.0 vacation
25 7.90 36.0 debt_consolidation
26 14.42 36.0 debt_consolidation
27 15.31 36.0 debt_consolidation
28 8.59 36.0 other
29 7.90 36.0 debt_consolidation
30 21.00 60.0 debt_consolidation
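
The traceback comes from passing the index itself to range(); in addition, .iloc only accepts integer positions, so .iloc[i, 'Loan.Purpose'] would fail as well. One way to avoid the loop entirely is to map each purpose to its group and then build the dummies. The following is a minimal sketch, not a confirmed answer from the thread: the small raw_data_5 frame is a stand-in built from a few of the sample rows, the group names follow the question, and 'credit_card' is assumed to be what 'credit_care' in the loop was meant to be.

```python
import pandas as pd

# Stand-in for raw_data_5, using a few rows from the sample above
raw_data_5 = pd.DataFrame(
    {
        'Interest.Rate': [8.90, 11.71, 15.31, 19.72, 17.99, 16.82],
        'Loan.Purpose': ['debt_consolidation', 'credit_card', 'other',
                         'moving', 'car', 'vacation'],
    },
    index=[0, 4, 5, 10, 22, 24],
)

# Group name -> purposes that share the same rounded mean interest rate
groups = {
    'cem': ['car', 'educational', 'major_purchase'],
    'hmvw': ['home_improvement', 'medical', 'vacation', 'wedding'],
    'chos': ['credit_card', 'house', 'other', 'small_business'],
    'dm': ['debt_consolidation', 'moving'],
}

# Invert to purpose -> group name so it can be used for replacement
purpose_to_group = {
    purpose: group for group, purposes in groups.items() for purpose in purposes
}

# Values missing from the mapping (e.g. 'renewable_energy') are left unchanged
raw_data_5['Loan.Purpose'] = raw_data_5['Loan.Purpose'].replace(purpose_to_group)

# One indicator column per merged group
dummies = pd.get_dummies(raw_data_5['Loan.Purpose'], prefix='LP', dtype=int)
print(dummies)
```

Because the replacement is vectorised, it does not depend on the index being contiguous (the sample data skips label 9, which would also break label-based lookups inside a positional loop).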

Interpolate above and below a range of values in a column - Pandas

I am looking for a way to extend the range of values in a Pandas column by interpolation, but I don't know how to set the 'limits' of the interpolation. I have something like:
[Distance] [Radiation]
12 120
13 130
14 140
15 150
16 160
17 170
What I'm trying to get is the full range of column [Radiation] for the complete sequence of column [Distance], by interpolation:
[Distance] [Radiation]
1 10
2 20
. .
. .
12 120
13 130
14 140
15 150
16 160
. .
. .
20 200
I looked through the documentation for pandas and scipy methods but couldn't find it.
Thanks for your insights.
One idea is to use DataFrame.reindex to add all the missing Distance values and then DataFrame.interpolate with the barycentric method (which requires SciPy):
df = (df.set_index('Distance')
        .reindex(range(1, 21))
        .interpolate(method='barycentric', limit_direction='both')
        .reset_index())
print(df)
Distance Radiation
0 1 10.0
1 2 20.0
2 3 30.0
3 4 40.0
4 5 50.0
5 6 60.0
6 7 70.0
7 8 80.0
8 9 90.0
9 10 100.0
10 11 110.0
11 12 120.0
12 13 130.0
13 14 140.0
14 15 150.0
15 16 160.0
16 17 170.0
17 18 180.0
18 19 190.0
19 20 200.0
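
If the relationship is known to be linear, as it is in this sample, a different technique gives the same numbers without SciPy: fit a straight line to the known rows and evaluate it over the full Distance range. This is a sketch of that alternative, not the method from the answer above; the name full is only illustrative, and the fitted values replace the measured ones rather than passing through them exactly.

```python
import numpy as np
import pandas as pd

# The known rows from the question
df = pd.DataFrame({
    'Distance': range(12, 18),
    'Radiation': range(120, 180, 10),
})

# Least-squares fit of Radiation = slope * Distance + intercept
slope, intercept = np.polyfit(df['Distance'], df['Radiation'], deg=1)

# Evaluate the fitted line over the complete Distance sequence 1..20
full = pd.DataFrame({'Distance': np.arange(1, 21)})
full['Radiation'] = slope * full['Distance'] + intercept
print(full.round(1))
```

For data that are not close to linear, this would smooth over real structure, so the reindex and interpolate approach is the safer default.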

Comparing 4 graphs in one graph

I have 4 dataframes, each with the value counts of the number of occurrences per month.
I want to compare all 4 value counts in one graph, so I can see the difference between months across these four years.
I would like output like the example image, showing years and months.
newdf2018.Month.value_counts()
output
1 3451
2 3895
3 3408
4 3365
5 3833
6 3543
7 3333
8 3219
9 3447
10 2943
11 3296
12 2909
newdf2017.Month.value_counts()
1 2801
2 3048
3 3620
4 3014
5 3226
6 3962
7 3500
8 3707
9 3601
10 3349
11 3743
12 2002
newdf2016.Month.value_counts()
1 3201
2 2034
3 2405
4 3805
5 3308
6 3212
7 3049
8 3777
9 3275
10 3099
11 3775
12 2115
newdf2015.Month.value_counts()
1 2817
2 2604
3 2711
4 2817
5 2670
6 2507
7 3256
8 2195
9 3304
10 3238
11 2005
12 2008
Create a dictionary of DataFrames, concat their value counts together, then use plot:
dfs = {2015:newdf2015, 2016:newdf2016, 2017:newdf2017, 2018:newdf2018}
df = pd.concat({k:v['Month'].value_counts() for k, v in dfs.items()}, axis=1)
df.plot.bar()
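
The shape of the combined frame is easier to see on a small case. The sketch below uses two tiny made-up frames in place of the real newdf2015 and newdf2016, and adds a sort_index() call (an assumption on my part, not in the answer) so the months appear in calendar order rather than in order of frequency.

```python
import pandas as pd

# Tiny made-up stand-ins for the real yearly frames
newdf2015 = pd.DataFrame({'Month': [1, 1, 2, 3, 3, 3]})
newdf2016 = pd.DataFrame({'Month': [1, 2, 2, 3]})

dfs = {2015: newdf2015, 2016: newdf2016}

# One column per year, one row per month
counts = pd.concat(
    {year: frame['Month'].value_counts() for year, frame in dfs.items()},
    axis=1,
)

# value_counts sorts by frequency, so restore calendar order
counts = counts.sort_index()
print(counts)
```

Calling counts.plot.bar() on this frame then draws one group of bars per month with one bar per year.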

Why am I getting a "too many indexers" error?

cars_df = pd.DataFrame((car.iloc[:[1,3,4,6]].values), columns = ['mpg', 'dip', 'hp', 'wt'])
car_t = car.iloc[:9].values
target_names = [0,1]
car_df['group'] = pd.series(car_t, dtypre='category')
sb.pairplot(cars_df)
I have tried using .iloc(axis=0)[xxxx] and turning the slice into a list and a tuple, with no luck. Any thoughts? I am trying to make a scatter plot from a lynda.com video, but the host uses .ix, which is deprecated, so I am using .iloc[] instead.
car is a DataFrame; here are a few lines of the data:
"Car_name","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
"Merc 280",19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
"Merc 280C",17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
"Merc 450SE",16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
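I think you want to select multiple columns by position with iloc. Note the comma after the colon: the first indexer selects rows, the second selects columns.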
cars_df = car.iloc[:, [1,3,4,6]]
print (cars_df)
mpg disp hp wt
0 21.0 160.0 110 2.620
1 21.0 160.0 110 2.875
2 22.8 108.0 93 2.320
3 21.4 258.0 110 3.215
4 18.7 360.0 175 3.440
5 18.1 225.0 105 3.460
6 14.3 360.0 245 3.570
7 24.4 146.7 62 3.190
8 22.8 140.8 95 3.150
9 19.2 167.6 123 3.440
10 17.8 167.6 123 3.440
11 16.4 275.8 180 4.070
sb.pairplot(cars_df)
I am not 100% sure about the rest of the code, but it seems you need:
# also select the column at position 9 ('am')
cars_df = car.iloc[:, [1,3,4,6,9]]
# rename that column
cars_df = cars_df.rename(columns={'am':'group'})
# convert it to categorical
cars_df['group'] = pd.Categorical(cars_df['group'])
print (cars_df)
mpg disp hp wt group
0 21.0 160.0 110 2.620 1
1 21.0 160.0 110 2.875 1
2 22.8 108.0 93 2.320 1
3 21.4 258.0 110 3.215 0
4 18.7 360.0 175 3.440 0
5 18.1 225.0 105 3.460 0
6 14.3 360.0 245 3.570 0
7 24.4 146.7 62 3.190 0
8 22.8 140.8 95 3.150 0
9 19.2 167.6 123 3.440 0
10 17.8 167.6 123 3.440 0
11 16.4 275.8 180 4.070 0
# add the hue parameter to color by the levels of a categorical variable
sb.pairplot(cars_df, hue='group')
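
The steps above can be tried on a self-contained frame. This sketch rebuilds three rows of the sample data by hand and leaves the plotting call out, so it runs without seaborn; the .copy() is my addition to make it explicit that cars_df is an independent frame before a column is added to it.

```python
import pandas as pd

# Three rows copied from the sample data in the question
car = pd.DataFrame(
    [
        ["Mazda RX4", 21.0, 6, 160.0, 110, 3.90, 2.620, 16.46, 0, 1, 4, 4],
        ["Datsun 710", 22.8, 4, 108.0, 93, 3.85, 2.320, 18.61, 1, 1, 4, 1],
        ["Hornet 4 Drive", 21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1],
    ],
    columns=["Car_name", "mpg", "cyl", "disp", "hp", "drat",
             "wt", "qsec", "vs", "am", "gear", "carb"],
)

# iloc takes "rows, columns": ':' keeps every row,
# and the list picks columns by integer position
cars_df = car.iloc[:, [1, 3, 4, 6]].copy()

# Add the 'am' column as a categorical grouping variable
cars_df["group"] = pd.Categorical(car["am"])
print(cars_df.dtypes)
```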

Transposing multi index dataframe in pandas

HID gen views
1 1 20
1 2 2532
1 3 276
1 4 1684
1 5 779
1 6 200
1 7 545
2 1 20
2 2 7478
2 3 750
2 4 7742
2 5 2643
2 6 208
2 7 585
3 1 21
3 2 4012
3 3 2019
3 4 1073
3 5 3372
3 6 8
3 7 1823
3 8 22
This is a sample section of a DataFrame, where HID and gen are index levels.
How can it be transformed like this?
HID 1 2 3 4 5 6 7 8
1 20 2532 276 1684 779 200 545 nan
2 20 7478 750 7742 2643 208 585 nan
3 21 4012 2019 1073 3372 8 1823 22
It's called pivoting. Recent pandas versions require keyword arguments here:
df.reset_index().pivot(index='HID', columns='gen', values='views')
gen 1 2 3 4 5 6 7 8
HID
1 20.0 2532.0 276.0 1684.0 779.0 200.0 545.0 NaN
2 20.0 7478.0 750.0 7742.0 2643.0 208.0 585.0 NaN
3 21.0 4012.0 2019.0 1073.0 3372.0 8.0 1823.0 22.0
Use unstack:
df = df['views'].unstack()
If you also need HID as a column, add reset_index + rename_axis:
df = df['views'].unstack().reset_index().rename_axis(None, axis=1)
print (df)
HID 1 2 3 4 5 6 7 8
0 1 20.0 2532.0 276.0 1684.0 779.0 200.0 545.0 NaN
1 2 20.0 7478.0 750.0 7742.0 2643.0 208.0 585.0 NaN
2 3 21.0 4012.0 2019.0 1073.0 3372.0 8.0 1823.0 22.0
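
A smaller self-contained version of the unstack approach, using a handful of rows from the question, shows where the NaN values come from: any (HID, gen) pair missing from the long data becomes NaN in the wide result. The names wide and flat are only illustrative.

```python
import pandas as pd

# A few rows from the question, with HID and gen as index levels
df = pd.DataFrame(
    {
        'HID': [1, 1, 2, 2, 2],
        'gen': [1, 2, 1, 2, 3],
        'views': [20, 2532, 20, 7478, 750],
    }
).set_index(['HID', 'gen'])

# Move the inner index level (gen) into the columns
wide = df['views'].unstack()

# Turn HID back into a regular column and drop the 'gen' columns label
flat = wide.reset_index().rename_axis(None, axis=1)
print(flat)
```

HID 1 has no gen 3 row, so that cell is NaN and the value columns become float.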

Resources