I have 2 pandas dataframes:
data1=
sample ID  name  sex
0          a     male
1          b     male
2          c     male
3          d     male
4          e     male
data2=
samples  Diabetic  age
0        yes       43
1        yes       50
2        no        63
3        no        21
4        yes       44
I want to merge both data frames to end up with the following data frame:
samples  Diabetic  age  name  sex
0        yes       43   a     male
1        yes       50   b     male
2        no        63   c     male
3        no        21   d     male
4        yes       44   e     male
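One way this could be approached is a key-based merge. The sketch below rebuilds the two sample frames from the question and assumes that "sample ID" in data1 and "samples" in data2 are ordinary columns holding the same identifiers (if they are actually the index, the call would need adjusting).

```python
import pandas as pd

# Sample frames rebuilt from the question
data1 = pd.DataFrame({
    "sample ID": [0, 1, 2, 3, 4],
    "name": ["a", "b", "c", "d", "e"],
    "sex": ["male"] * 5,
})
data2 = pd.DataFrame({
    "samples": [0, 1, 2, 3, 4],
    "Diabetic": ["yes", "yes", "no", "no", "yes"],
    "age": [43, 50, 63, 21, 44],
})

# Match rows on the differently named key columns, then drop the duplicate key
merged = (
    data2.merge(data1, left_on="samples", right_on="sample ID", how="left")
         .drop(columns="sample ID")
)
print(merged)
```

Starting from data2 keeps its columns first, which matches the column order of the desired result.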
I've combined 10 Excel files, each with one year of NFL passing stats. I have summed certain columns (games played, completions, attempts, etc.), but there are others (passer rating and QBR) that I'd like to see the average for instead.
df3 = df3.groupby(['Player'],as_index=False).agg({'GS':'sum' ,'Cmp': 'sum', 'Att': 'sum','Cmp%': 'sum','Yds': 'sum','TD': 'sum','TD%': 'sum', 'Int': 'sum', 'Int%': 'sum','Y/A': 'sum', 'AY/A': 'sum','Y/C': 'sum','Y/G':'sum','Rate':'sum','QBR':'sum','Sk':'sum','Yds.1':'sum','NY/A': 'sum','ANY/A': 'sum','Sk%':'sum','4QC':'sum','GWD': 'sum'})
Quick note: don't attach photos of your code, dataset, errors, etc. Provide the actual code and the actual dataset (or a sample of it) so that users can reproduce the issue. Hardly anyone is going to take the time to recreate your dataset from a photo (I did here, because I love working with sports data and could grab it relatively quickly).
But to get averages instead of sums, use 'mean'. Also, why are you summing percentages in your code?
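import pandas as pd
# Read one passing table per season (2010-2019) and stack them.
# DataFrame.append was removed in pandas 2.0, so collect the frames and use pd.concat.
frames = []
for season in range(2010, 2020):
    url = 'https://www.pro-football-reference.com/years/{season}/passing.htm'.format(season=season)
    frames.append(pd.read_html(url)[0])
df = pd.concat(frames, sort=False)
# Drop the header rows that are repeated inside each table
df = df[df['Rk'] != 'Rk']
df = df.reset_index(drop=True)
# Remove characters such as '*' and '+' from player names (regex=True is needed in pandas 2.0+)
df['Player'] = df['Player'].str.replace('[^a-zA-Z .]', '', regex=True)
df['Player'] = df['Player'].str.strip()
strCols = ['Player','Tm', 'Pos', 'QBrec']
numCols = [ x for x in df.columns if x not in strCols ]
# Split the 'W-L-T' record into separate columns
df[['QB_wins','QB_loss', 'QB_ties']] = df['QBrec'].str.split('-', expand=True)
df[numCols] = df[numCols].apply(pd.to_numeric)
# Sum the counting stats, average QBR
df3 = df.groupby(['Player'],as_index=False).agg({'GS':'sum', 'TD':'sum', 'QBR':'mean'})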
Output:
print (df3)
Player GS TD QBR
0 A.J. Feeley 3 1 27.300000
1 A.J. McCarron 4 6 NaN
2 Aaron Rodgers 142 305 68.522222
3 Ace Sanders 4 1 100.000000
4 Adam Podlesh 0 0 0.000000
5 Albert Wilson 7 1 99.700000
6 Alex Erickson 6 0 NaN
7 Alex Smith 121 156 55.122222
8 Alex Tanney 0 1 42.900000
9 Alvin Kamara 9 0 NaN
10 Andrew Beck 6 0 NaN
11 Andrew Luck 86 171 62.766667
12 Andy Dalton 133 204 53.375000
13 Andy Lee 0 0 0.000000
14 Anquan Boldin 32 0 11.600000
15 Anthony Miller 4 0 81.200000
16 Antonio Andrews 10 1 100.000000
17 Antonio Brown 55 1 29.300000
18 Antonio Morrison 4 0 NaN
19 Antwaan Randle El 0 2 100.000000
20 Arian Foster 13 1 100.000000
21 Armanti Edwards 0 0 41.466667
22 Austin Davis 10 13 38.150000
23 B.J. Daniels 0 0 NaN
24 Baker Mayfield 29 49 53.200000
25 Ben Roethlisberger 130 236 66.833333
26 Bernard Scott 1 0 5.600000
27 Bilal Powell 12 0 17.700000
28 Billy Volek 0 0 89.400000
29 Blaine Gabbert 48 48 37.687500
.. ... ... ... ...
329 Tim Boyle 0 0 NaN
330 Tim Masthay 0 1 5.700000
331 Tim Tebow 16 17 42.733333
332 Todd Bouman 1 2 57.400000
333 Todd Collins 1 0 0.800000
334 Tom Brady 156 316 72.755556
335 Tom Brandstater 0 0 0.000000
336 Tom Savage 9 5 38.733333
337 Tony Pike 0 0 2.500000
338 Tony Romo 72 141 71.185714
339 Travaris Cadet 1 0 NaN
340 Travis Benjamin 8 0 1.700000
341 Travis Kelce 15 0 1.800000
342 Trent Edwards 3 2 98.100000
343 Tress Way 0 0 NaN
344 Trevone Boykin 0 1 66.400000
345 Trevor Siemian 25 30 40.750000
346 Troy Smith 6 5 38.500000
347 Tyler Boyd 14 0 2.800000
348 Tyler Bray 0 0 0.000000
349 Tyler Palko 4 2 56.600000
350 Tyler Thigpen 1 2 30.233333
351 Tyreek Hill 13 0 0.000000
352 Tyrod Taylor 46 54 51.242857
353 Vince Young 11 14 50.850000
354 Will Grier 2 0 NaN
355 Willie Snead 4 1 100.000000
356 Zach Mettenberger 10 12 24.600000
357 Zach Pascal 13 0 NaN
358 Zay Jones 15 0 0.000000
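On the point about percentages: a rate such as Cmp% is usually better recomputed from the summed totals than summed or averaged across seasons. Here is a minimal offline sketch of that idea. The players and numbers are made up for illustration, and only the column names (Cmp, Att, QBR, Cmp%) come from the question.

```python
import pandas as pd

# Hypothetical two-season sample; the values are invented for illustration
df = pd.DataFrame({
    "Player": ["QB One", "QB One", "QB Two", "QB Two"],
    "Cmp": [300, 100, 200, 240],
    "Att": [500, 200, 400, 400],
    "QBR": [60.0, 50.0, 55.0, 65.0],
})

# Sum the counting stats and average QBR in one pass
totals = df.groupby("Player", as_index=False).agg(
    {"Cmp": "sum", "Att": "sum", "QBR": "mean"}
)

# Rebuild the completion percentage from the summed totals
totals["Cmp%"] = (100 * totals["Cmp"] / totals["Att"]).round(1)
print(totals)
```

Note that the plain mean of QBR weights every season equally, regardless of how many games were played in it.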
I have the table below:
Month LoB Score Rank
Jan A 1
Jan B 2
Feb B 1
Feb B 2
Jan A 2
Mar C 1
Feb A 3
Jan A 3
Mar C 2
Mar A 1
Mar C 3
I want to rank the scores by Month and LoB. For example, in Jan for LoB A, the highest score gets rank 1. Similarly, in Jan for LoB B, the highest score gets rank 1.
I understand that INDEX and ROW are to be used in conjunction with RANK.EQ, but I am unable to put it together. I would appreciate any help on this.
Thank you
Assuming row 1 is the header row and the data lies in the range A2:C12 (the sample has 11 rows), try this...
In D2
=SUMPRODUCT(($A$2:$A$12=A2)*($B$2:$B$12=B2)*($C$2:$C$12>C2))+1
and copy it down.
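For anyone doing the same thing outside Excel, here is a rough pandas sketch of the logic in that formula: within each Month and LoB pair, count the scores that are strictly greater and add 1. It uses the sample rows from the question. Using `rank(method="min", ascending=False)` is my choice of equivalent, not something from the original answer.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Feb", "Jan", "Mar", "Feb", "Jan", "Mar", "Mar", "Mar"],
    "LoB":   ["A", "B", "B", "B", "A", "C", "A", "A", "C", "A", "C"],
    "Score": [1, 2, 1, 2, 2, 1, 3, 3, 2, 1, 3],
})

# Highest score in each (Month, LoB) group gets rank 1; ties share the lowest rank
df["Rank"] = (
    df.groupby(["Month", "LoB"])["Score"]
      .rank(method="min", ascending=False)
      .astype(int)
)

# Cross-check: the SUMPRODUCT logic written out row by row
manual = [
    int(((df["Month"] == m) & (df["LoB"] == l) & (df["Score"] > s)).sum()) + 1
    for m, l, s in zip(df["Month"], df["LoB"], df["Score"])
]
print(df)
```

Both approaches give tied scores the same rank and skip the following position.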
AoA / Good morning
I am having trouble getting the correct position with the RANK formula: =RANK(K5,K5:K34)
Total marks: 350
Marks obtained  Position
290 29
346 9 (this student obtained 346 marks, so why is he in 9th position? He should be in 4th position)
250 30
343 20
345 13
342 21
334 26
346 9
345 13
346 9
346 9
348 5
350 1
349 3
335 24
345 13
335 24
348 5
339 22
295 28
350 1
345 13
348 5
344 18
345 13
338 23
347 2
349 3
297 27
Looking to merge some data and summarize the results. I've been poking around Google but haven't found anything that will match up duplicates and summarize them.
The left side of the table is what I'm starting with, I would like the output on the right side.
Street Name Widgets Sprockets Nuts Bolts Street Name Widgets Sprockets Nuts Bolts
123 Any street ACB Co 10 248 2 50 123 Any street ACB Co 10 846 10 78
123 Any street Bob's plumbing 25 22 2 7 123 Any street Bob's plumbing 25 22 2 7
456 Another st Bill's cars 55 5 456 456 Another st Bill's cars 62 878 13 55
123 Any street ACB Co 54 4 6 789 789 Ave Shelley and co 5 2 2 78
456 Another st Bill's cars 7 878 8 55 789 Ave Divers down 7 90 10 11
789 Ave Shelley and co 5 2 2 78 456 Another st ACB Co 6 50 5
123 Any street ACB Co 544 4 22
456 Another st ACB Co 6 50 5
789 Ave Divers down 6 90 9 4
789 Ave Divers down 1 1 7
Use a Pivot Table and set the layout to tabular.
Details can be found here: https://www.youtube.com/watch?v=LkFPBn7sgEc
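If the data ever ends up in pandas rather than Excel, the same kind of summary can be sketched with a group-by. The rows below are the ones from the question whose blank cells can be recovered from the expected output. The column labels follow the question's header, and treating blanks as NaN is an assumption.

```python
import pandas as pd

nan = float("nan")  # stands in for a blank cell

# A few rows from the left-hand table in the question
df = pd.DataFrame({
    "Street": ["123 Any street", "789 Ave", "789 Ave", "789 Ave"],
    "Name": ["Bob's plumbing", "Shelley and co", "Divers down", "Divers down"],
    "Widgets": [25, 5, 6, 1],
    "Sprockets": [22, 2, 90, nan],
    "Nuts": [2, 2, 9, 1],
    "Bolts": [7, 78, 4, 7],
})

# One row per Street and Name pair, with the numeric columns summed (blanks are skipped)
summary = df.groupby(["Street", "Name"], as_index=False).sum()
print(summary)
```

The "Divers down" rows collapse into a single row with 7, 90, 10 and 11, matching the right-hand side of the table in the question.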
I have the Excel table below and I would like to calculate the total per company per department per year. I used:
=SUMPRODUCT(--($A$2:$A$9=A12),--($B$2:$B$9=B12)*$C$2:$F$9)
but it doesn't seem to work.
A B C D E F
1 COMPANY DEPART. QUARTER 1 QUARTER 2 QUARTER 3 QUARTER 4
2 AB PRO 123 223 3354 556
3 CD PIV 222 235 223 568
4 CD PRO 236 254 184 223
5 AB STA 254 221 96 265
6 EF PIV 254 112 485 256
7 CD STA 558 185 996 231
8 GH PRO 548 696 698 895
9 AB PRO 148 254 318 229
10
11 TOTAL PER COMPANY PER DEPARTMENT PER YEAR:
12 AB PRO =
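Assuming that A12 contains AB and B12 contains PRO, multiply everything inside a single argument (SUMPRODUCT returns #VALUE! when its separate array arguments have different dimensions, as the 8x1 and 8x4 arrays in your formula do):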
=SUMPRODUCT((A2:A9=A12)*(B2:B9=B12) *C2:F9)
Example:
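As a cross-check on what the formula should return for the sample data, here is a small pandas sketch of the same calculation: keep the rows where both conditions hold and add up all four quarter columns. The short column names Q1 to Q4 are my own labels for QUARTER 1 to QUARTER 4.

```python
import pandas as pd

# Rows 2-9 of the sheet in the question
df = pd.DataFrame({
    "COMPANY": ["AB", "CD", "CD", "AB", "EF", "CD", "GH", "AB"],
    "DEPART":  ["PRO", "PIV", "PRO", "STA", "PIV", "STA", "PRO", "PRO"],
    "Q1": [123, 222, 236, 254, 254, 558, 548, 148],
    "Q2": [223, 235, 254, 221, 112, 185, 696, 254],
    "Q3": [3354, 223, 184, 96, 485, 996, 698, 318],
    "Q4": [556, 568, 223, 265, 256, 231, 895, 229],
})

# Same conditions as (A2:A9=A12)*(B2:B9=B12)
mask = (df["COMPANY"] == "AB") & (df["DEPART"] == "PRO")

# Sum every quarter for the matching rows, like multiplying by C2:F9
total = int(df.loc[mask, ["Q1", "Q2", "Q3", "Q4"]].to_numpy().sum())
print(total)  # 5205
```

Two rows match AB / PRO (sheet rows 2 and 9), contributing 4256 and 949 respectively.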