I have a database that contains 60,000+ rows of college football recruit data. From there, I want to create seperate dataframes where each one contains just one value. This is what a sample of the dataframe looks like:
,Primary Rank,Other Rank,Name,Link,Highschool,Position,Height,weight,Rating,National Rank,Position Rank,State Rank,Team,Class
0,1,,D.J. Williams,https://247sports.com/Player/DJ-Williams-49931,"De La Salle (Concord, CA)",ILB,6-2,235,0.9998,1,1,1,Miami,2000
1,2,,Brock Berlin,https://247sports.com/Player/Brock-Berlin-49926,"Evangel Christian Academy (Shreveport, LA)",PRO,6-2,190,0.9998,2,1,1,Florida,2000
2,3,,Charles Rogers,https://247sports.com/Player/Charles-Rogers-49984,"Saginaw (Saginaw, MI)",WR,6-4,195,0.9988,3,1,1,Michigan State,2000
3,4,,Travis Johnson,https://247sports.com/Player/Travis-Johnson-50043,"Notre Dame (Sherman Oaks, CA)",SDE,6-4,265,0.9982,4,1,2,Florida State,2000
4,5,,Marcus Houston,https://247sports.com/Player/Marcus-Houston-50139,"Thomas Jefferson (Denver, CO)",RB,6-0,208,0.9980,5,1,1,Colorado,2000
5,6,,Kwame Harris,https://247sports.com/Player/Kwame-Harris-49999,"Newark (Newark, DE)",OT,6-7,320,0.9978,6,1,1,Stanford,2000
6,7,,B.J. Johnson,https://247sports.com/Player/BJ-Johnson-50154,"South Grand Prairie (Grand Prairie, TX)",WR,6-1,190,0.9976,7,2,1,Texas,2000
7,8,,Bryant McFadden,https://247sports.com/Player/Bryant-McFadden-50094,"McArthur (Hollywood, FL)",CB,6-1,182,0.9968,8,1,1,Florida State,2000
8,9,,Sam Maldonado,https://247sports.com/Player/Sam-Maldonado-50071,"Harrison (Harrison, NY)",RB,6-2,215,0.9964,9,2,1,Ohio State,2000
9,10,,Mike Munoz,https://247sports.com/Player/Mike-Munoz-50150,"Archbishop Moeller (Cincinnati, OH)",OT,6-7,290,0.9960,10,2,1,Tennessee,2000
10,11,,Willis McGahee,https://247sports.com/Player/Willis-McGahee-50179,"Miami Central (Miami, FL)",RB,6-1,215,0.9948,11,3,2,Miami,2000
11,12,,Antonio Hall,https://247sports.com/Player/Antonio-Hall-50175,"McKinley (Canton, OH)",OT,6-5,295,0.9946,12,3,2,Kentucky,2000
12,13,,Darrell Lee,https://247sports.com/Player/Darrell-Lee-50580,"Kirkwood (Saint Louis, MO)",WDE,6-5,230,0.9940,13,1,1,Florida,2000
13,14,,O.J. Owens,https://247sports.com/Player/OJ-Owens-50176,"North Stanly (New London, NC)",S,6-1,195,0.9932,14,1,1,Tennessee,2000
14,15,,Jeff Smoker,https://247sports.com/Player/Jeff-Smoker-50582,"Manheim Central (Manheim, PA)",PRO,6-3,190,0.9922,15,2,1,Michigan State,2000
15,16,,Marco Cooper,https://247sports.com/Player/Marco-Cooper-50171,"Cass Technical (Detroit, MI)",OLB,6-2,235,0.9918,16,1,2,Ohio State,2000
16,17,,Chance Mock,https://247sports.com/Player/Chance-Mock-50163,"The Woodlands (The Woodlands, TX)",PRO,6-2,190,0.9918,17,3,2,Texas,2000
17,18,,Roy Williams,https://247sports.com/Player/Roy-Williams-55566,"Permian (Odessa, TX)",WR,6-4,202,0.9916,18,3,3,Texas,2000
18,19,,Matt Grootegoed,https://247sports.com/Player/Matt-Grootegoed-50591,"Mater Dei (Santa Ana, CA)",OLB,5-11,205,0.9914,19,2,3,USC,2000
19,20,,Yohance Buchanan,https://247sports.com/Player/Yohance-Buchanan-50182,"Douglass (Atlanta, GA)",S,6-1,210,0.9912,20,2,1,Florida State,2000
20,21,,Mac Tyler,https://247sports.com/Player/Mac-Tyler-50572,"Jess Lanier (Hueytown, AL)",DT,6-6,320,0.9912,21,1,1,Alabama,2000
21,22,,Jason Respert,https://247sports.com/Player/Jason-Respert-55623,"Northside (Warner Robins, GA)",OC,6-3,300,0.9902,22,1,2,Tennessee,2000
22,23,,Casey Clausen,https://247sports.com/Player/Casey-Clausen-50183,"Bishop Alemany (Mission Hills, CA)",PRO,6-4,215,0.9896,23,4,4,Tennessee,2000
23,24,,Albert Means,https://247sports.com/Player/Albert-Means-55968,"Trezevant (Memphis, TN)",SDE,6-6,310,0.9890,24,2,1,Alabama,2000
24,25,,Albert Hollis,https://247sports.com/Player/Albert-Hollis-55958,"Christian Brothers (Sacramento, CA)",RB,6-0,190,0.9890,25,4,5,Georgia,2000
25,26,,Eric Moore,https://247sports.com/Player/Eric-Moore-55973,"Pahokee (Pahokee, FL)",OLB,6-4,226,0.9884,26,3,3,Florida State,2000
26,27,,Willie Dixon,https://247sports.com/Player/Willie-Dixon-55626,"Stockton Christian School (Stockton, CA)",WR,5-11,182,0.9884,27,4,6,Miami,2000
27,28,,Cory Bailey,https://247sports.com/Player/Cory-Bailey-50586,"American (Hialeah, FL)",S,5-10,175,0.9880,28,3,4,Florida,2000
28,29,,Sean Young,https://247sports.com/Player/Sean-Young-55972,"Northwest Whitfield County (Tunnel Hill, GA)",OG,6-6,293,0.9878,29,1,3,Tennessee,2000
29,30,,Johnnie Morant,https://247sports.com/Player/Johnnie-Morant-60412,"Parsippany Hills (Morris Plains, NJ)",WR,6-5,225,0.9871,30,5,1,Syracuse,2000
30,31,,Wes Sims,https://247sports.com/Player/Wes-Sims-60243,"Weatherford (Weatherford, OK)",OG,6-5,310,0.9869,31,2,1,Oklahoma,2000
31,33,,Jason Campbell,https://247sports.com/Player/Jason-Campbell-55976,"Taylorsville (Taylorsville, MS)",PRO,6-5,190,0.9853,33,5,1,Auburn,2000
32,34,,Antwan Odom,https://247sports.com/Player/Antwan-Odom-50168,"Alma Bryant (Irvington, AL)",SDE,6-7,260,0.9851,34,3,2,Alabama,2000
33,35,,Sloan Thomas,https://247sports.com/Player/Sloan-Thomas-55630,"Klein (Spring, TX)",WR,6-2,188,0.9847,35,6,5,Texas,2000
34,36,,Raymond Mann,https://247sports.com/Player/Raymond-Mann-60804,"Hampton (Hampton, VA)",ILB,6-1,233,0.9847,36,2,1,Virginia,2000
35,37,,Alphonso Townsend,https://247sports.com/Player/Alphonso-Townsend-55975,"Lima Central Catholic (Lima, OH)",DT,6-6,280,0.9847,37,2,3,Ohio State,2000
36,38,,Greg Jones,https://247sports.com/Player/Greg-Jones-50158,"Battery Creek (Beaufort, SC)",RB,6-2,245,0.9837,38,6,1,Florida State,2000
37,39,,Paul Mociler,https://247sports.com/Player/Paul-Mociler-60319,"St. John Bosco (Bellflower, CA)",OG,6-5,300,0.9833,39,3,7,UCLA,2000
38,40,,Chris Septak,https://247sports.com/Player/Chris-Septak-57555,"Millard West (Omaha, NE)",TE,6-3,245,0.9833,40,1,1,Nebraska,2000
39,41,,Eric Knott,https://247sports.com/Player/Eric-Knott-60823,"Henry Ford II (Sterling Heights, MI)",TE,6-4,235,0.9831,41,2,3,Michigan State,2000
40,42,,Harold James,https://247sports.com/Player/Harold-James-57524,"Osceola (Osceola, AR)",S,6-1,220,0.9827,42,4,1,Alabama,2000
For example, if I don't use a for loop, this line of code is what I use if I just want to create one dataframe:
recruits2022 = recruits_final[recruits_final['Class'] == 2022]
However, I want to have a named dataframe for each recruiting class.
In other words, recruits2000 would be a dataframe for all rows that have a class value equal to 2000, recruits2001 would be a dataframe for all rows that have a class value to 2001, and so forth.
This is what I tried recently, but have no luck saving the dataframe outside of the for loop.
databases = ['recruits2000', 'recruits2001', 'recruits2002', 'recruits2003', 'recruits2004',
'recruits2005', 'recruits2006', 'recruits2007', 'recruits2008', 'recruits2009',
'recruits2010', 'recruits2011', 'recruits2012', 'recruits2013', 'recruits2014',
'recruits2015', 'recruits2016', 'recruits2017', 'recruits2018', 'recruits2019',
'recruits2020', 'recruits2021', 'recruits2022', 'recruits2023']
for i in range(len(databases)):
year = pd.to_numeric(databases[i][-4:], errors = 'coerce')
db = recruits_final[recruits_final['Class'] == year]
db.name = databases[i]
print(db)
print(db.name)
print(year)
recruits2023
I would get this error instead of what I wanted
NameError Traceback (most recent call last)
<ipython-input-49-7cb5d12ab92f> in <module>()
29
30 # print(db.name)
---> 31 recruits2023
32
33
NameError: name 'recruits2023' is not defined
Is there something that I am missing to get this for loop to work? Any assistance is truly appreciated. Thanks in advance.
List use a dictionary of dataframes using groupby:
dict_dfs = dict(tuple(df.groupby('Class')))
Access you individual dataframes using
dict_dfs[2022]
You override variable db at each iteration and recruits2023 is not a variable so you can't use it like that:
You can use a dict to store your data:
recruits = {}
for year in recruits_final['Class'].unique():
recruits[year] = recruits_final[recruits_final['Class'] == year]
>>> recruits[2000]
Primary Rank Other Rank Name Link ... Position Rank State Rank Team Class
0 1 NaN D.J. Williams https://247sports.com/Player/DJ-Williams-49931 ... 1 1 Miami 2000
1 2 NaN Brock Berlin https://247sports.com/Player/Brock-Berlin-49926 ... 1 1 Florida 2000
2 3 NaN Charles Rogers https://247sports.com/Player/Charles-Rogers-49984 ... 1 1 Michigan State 2000
3 4 NaN Travis Johnson https://247sports.com/Player/Travis-Johnson-50043 ... 1 2 Florida State 2000
...
38 40 NaN Chris Septak https://247sports.com/Player/Chris-Septak-57555 ... 1 1 Nebraska 2000
39 41 NaN Eric Knott https://247sports.com/Player/Eric-Knott-60823 ... 2 3 Michigan State 2000
40 42 NaN Harold James https://247sports.com/Player/Harold-James-57524 ... 4 1 Alabama 2000
>>> recruits.keys()
dict_keys([2000])