Group and represent pie chart - python-3.x

I have a Dataframe like this:
Hours Person
10 Jack
20 Louis
10 Jack
30 Anne
10 Anne
And I want to represent this data as a pie chart where 50% of the hours belong to Anne, 25% to Jack and 25% to Louis. I have tried with groupby but it doesn't produce what I want.

Try:
df.groupby('Person')['Hours'].sum().plot.pie(autopct='%.2f')
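To check that the grouping really yields the 50/25/25 split before plotting, here is a minimal sketch that rebuilds the sample frame from the question and computes each person's share. The variable names `totals` and `shares` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Hours": [10, 20, 10, 30, 10],
    "Person": ["Jack", "Louis", "Jack", "Anne", "Anne"],
})

# Sum the hours per person; this is the Series that gets plotted.
totals = df.groupby("Person")["Hours"].sum()

# Percentage of all hours that belongs to each person.
shares = totals / totals.sum() * 100
print(shares)
```

Calling `totals.plot.pie(autopct='%.2f')` on this Series (matplotlib must be installed) should then draw one slice per person, labelled with these percentages.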


Creating multiple named dataframes by a for loop

I have a database that contains 60,000+ rows of college football recruit data. From there, I want to create separate dataframes where each one contains the rows for just one Class value. This is what a sample of the dataframe looks like:
,Primary Rank,Other Rank,Name,Link,Highschool,Position,Height,weight,Rating,National Rank,Position Rank,State Rank,Team,Class
0,1,,D.J. Williams,https://247sports.com/Player/DJ-Williams-49931,"De La Salle (Concord, CA)",ILB,6-2,235,0.9998,1,1,1,Miami,2000
1,2,,Brock Berlin,https://247sports.com/Player/Brock-Berlin-49926,"Evangel Christian Academy (Shreveport, LA)",PRO,6-2,190,0.9998,2,1,1,Florida,2000
2,3,,Charles Rogers,https://247sports.com/Player/Charles-Rogers-49984,"Saginaw (Saginaw, MI)",WR,6-4,195,0.9988,3,1,1,Michigan State,2000
3,4,,Travis Johnson,https://247sports.com/Player/Travis-Johnson-50043,"Notre Dame (Sherman Oaks, CA)",SDE,6-4,265,0.9982,4,1,2,Florida State,2000
4,5,,Marcus Houston,https://247sports.com/Player/Marcus-Houston-50139,"Thomas Jefferson (Denver, CO)",RB,6-0,208,0.9980,5,1,1,Colorado,2000
5,6,,Kwame Harris,https://247sports.com/Player/Kwame-Harris-49999,"Newark (Newark, DE)",OT,6-7,320,0.9978,6,1,1,Stanford,2000
6,7,,B.J. Johnson,https://247sports.com/Player/BJ-Johnson-50154,"South Grand Prairie (Grand Prairie, TX)",WR,6-1,190,0.9976,7,2,1,Texas,2000
7,8,,Bryant McFadden,https://247sports.com/Player/Bryant-McFadden-50094,"McArthur (Hollywood, FL)",CB,6-1,182,0.9968,8,1,1,Florida State,2000
8,9,,Sam Maldonado,https://247sports.com/Player/Sam-Maldonado-50071,"Harrison (Harrison, NY)",RB,6-2,215,0.9964,9,2,1,Ohio State,2000
9,10,,Mike Munoz,https://247sports.com/Player/Mike-Munoz-50150,"Archbishop Moeller (Cincinnati, OH)",OT,6-7,290,0.9960,10,2,1,Tennessee,2000
10,11,,Willis McGahee,https://247sports.com/Player/Willis-McGahee-50179,"Miami Central (Miami, FL)",RB,6-1,215,0.9948,11,3,2,Miami,2000
11,12,,Antonio Hall,https://247sports.com/Player/Antonio-Hall-50175,"McKinley (Canton, OH)",OT,6-5,295,0.9946,12,3,2,Kentucky,2000
12,13,,Darrell Lee,https://247sports.com/Player/Darrell-Lee-50580,"Kirkwood (Saint Louis, MO)",WDE,6-5,230,0.9940,13,1,1,Florida,2000
13,14,,O.J. Owens,https://247sports.com/Player/OJ-Owens-50176,"North Stanly (New London, NC)",S,6-1,195,0.9932,14,1,1,Tennessee,2000
14,15,,Jeff Smoker,https://247sports.com/Player/Jeff-Smoker-50582,"Manheim Central (Manheim, PA)",PRO,6-3,190,0.9922,15,2,1,Michigan State,2000
15,16,,Marco Cooper,https://247sports.com/Player/Marco-Cooper-50171,"Cass Technical (Detroit, MI)",OLB,6-2,235,0.9918,16,1,2,Ohio State,2000
16,17,,Chance Mock,https://247sports.com/Player/Chance-Mock-50163,"The Woodlands (The Woodlands, TX)",PRO,6-2,190,0.9918,17,3,2,Texas,2000
17,18,,Roy Williams,https://247sports.com/Player/Roy-Williams-55566,"Permian (Odessa, TX)",WR,6-4,202,0.9916,18,3,3,Texas,2000
18,19,,Matt Grootegoed,https://247sports.com/Player/Matt-Grootegoed-50591,"Mater Dei (Santa Ana, CA)",OLB,5-11,205,0.9914,19,2,3,USC,2000
19,20,,Yohance Buchanan,https://247sports.com/Player/Yohance-Buchanan-50182,"Douglass (Atlanta, GA)",S,6-1,210,0.9912,20,2,1,Florida State,2000
20,21,,Mac Tyler,https://247sports.com/Player/Mac-Tyler-50572,"Jess Lanier (Hueytown, AL)",DT,6-6,320,0.9912,21,1,1,Alabama,2000
21,22,,Jason Respert,https://247sports.com/Player/Jason-Respert-55623,"Northside (Warner Robins, GA)",OC,6-3,300,0.9902,22,1,2,Tennessee,2000
22,23,,Casey Clausen,https://247sports.com/Player/Casey-Clausen-50183,"Bishop Alemany (Mission Hills, CA)",PRO,6-4,215,0.9896,23,4,4,Tennessee,2000
23,24,,Albert Means,https://247sports.com/Player/Albert-Means-55968,"Trezevant (Memphis, TN)",SDE,6-6,310,0.9890,24,2,1,Alabama,2000
24,25,,Albert Hollis,https://247sports.com/Player/Albert-Hollis-55958,"Christian Brothers (Sacramento, CA)",RB,6-0,190,0.9890,25,4,5,Georgia,2000
25,26,,Eric Moore,https://247sports.com/Player/Eric-Moore-55973,"Pahokee (Pahokee, FL)",OLB,6-4,226,0.9884,26,3,3,Florida State,2000
26,27,,Willie Dixon,https://247sports.com/Player/Willie-Dixon-55626,"Stockton Christian School (Stockton, CA)",WR,5-11,182,0.9884,27,4,6,Miami,2000
27,28,,Cory Bailey,https://247sports.com/Player/Cory-Bailey-50586,"American (Hialeah, FL)",S,5-10,175,0.9880,28,3,4,Florida,2000
28,29,,Sean Young,https://247sports.com/Player/Sean-Young-55972,"Northwest Whitfield County (Tunnel Hill, GA)",OG,6-6,293,0.9878,29,1,3,Tennessee,2000
29,30,,Johnnie Morant,https://247sports.com/Player/Johnnie-Morant-60412,"Parsippany Hills (Morris Plains, NJ)",WR,6-5,225,0.9871,30,5,1,Syracuse,2000
30,31,,Wes Sims,https://247sports.com/Player/Wes-Sims-60243,"Weatherford (Weatherford, OK)",OG,6-5,310,0.9869,31,2,1,Oklahoma,2000
31,33,,Jason Campbell,https://247sports.com/Player/Jason-Campbell-55976,"Taylorsville (Taylorsville, MS)",PRO,6-5,190,0.9853,33,5,1,Auburn,2000
32,34,,Antwan Odom,https://247sports.com/Player/Antwan-Odom-50168,"Alma Bryant (Irvington, AL)",SDE,6-7,260,0.9851,34,3,2,Alabama,2000
33,35,,Sloan Thomas,https://247sports.com/Player/Sloan-Thomas-55630,"Klein (Spring, TX)",WR,6-2,188,0.9847,35,6,5,Texas,2000
34,36,,Raymond Mann,https://247sports.com/Player/Raymond-Mann-60804,"Hampton (Hampton, VA)",ILB,6-1,233,0.9847,36,2,1,Virginia,2000
35,37,,Alphonso Townsend,https://247sports.com/Player/Alphonso-Townsend-55975,"Lima Central Catholic (Lima, OH)",DT,6-6,280,0.9847,37,2,3,Ohio State,2000
36,38,,Greg Jones,https://247sports.com/Player/Greg-Jones-50158,"Battery Creek (Beaufort, SC)",RB,6-2,245,0.9837,38,6,1,Florida State,2000
37,39,,Paul Mociler,https://247sports.com/Player/Paul-Mociler-60319,"St. John Bosco (Bellflower, CA)",OG,6-5,300,0.9833,39,3,7,UCLA,2000
38,40,,Chris Septak,https://247sports.com/Player/Chris-Septak-57555,"Millard West (Omaha, NE)",TE,6-3,245,0.9833,40,1,1,Nebraska,2000
39,41,,Eric Knott,https://247sports.com/Player/Eric-Knott-60823,"Henry Ford II (Sterling Heights, MI)",TE,6-4,235,0.9831,41,2,3,Michigan State,2000
40,42,,Harold James,https://247sports.com/Player/Harold-James-57524,"Osceola (Osceola, AR)",S,6-1,220,0.9827,42,4,1,Alabama,2000
For example, if I don't use a for loop, this line of code is what I use if I just want to create one dataframe:
recruits2022 = recruits_final[recruits_final['Class'] == 2022]
However, I want to have a named dataframe for each recruiting class.
In other words, recruits2000 would be a dataframe for all rows that have a class value equal to 2000, recruits2001 would be a dataframe for all rows that have a class value equal to 2001, and so forth.
This is what I tried recently, but I had no luck saving the dataframe outside of the for loop.
databases = ['recruits2000', 'recruits2001', 'recruits2002', 'recruits2003', 'recruits2004',
'recruits2005', 'recruits2006', 'recruits2007', 'recruits2008', 'recruits2009',
'recruits2010', 'recruits2011', 'recruits2012', 'recruits2013', 'recruits2014',
'recruits2015', 'recruits2016', 'recruits2017', 'recruits2018', 'recruits2019',
'recruits2020', 'recruits2021', 'recruits2022', 'recruits2023']
for i in range(len(databases)):
    year = pd.to_numeric(databases[i][-4:], errors = 'coerce')
    db = recruits_final[recruits_final['Class'] == year]
    db.name = databases[i]
    print(db)
    print(db.name)
    print(year)
recruits2023
I get this error instead of what I wanted:
NameError Traceback (most recent call last)
<ipython-input-49-7cb5d12ab92f> in <module>()
29
30 # print(db.name)
---> 31 recruits2023
32
33
NameError: name 'recruits2023' is not defined
Is there something that I am missing to get this for loop to work? Any assistance is truly appreciated. Thanks in advance.
Use a dictionary of dataframes built with groupby:
dict_dfs = dict(tuple(df.groupby('Class')))
Access your individual dataframes using
dict_dfs[2022]
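As a minimal sketch of this approach, the snippet below applies the same one-liner to a tiny stand-in frame. The two 2000 rows come from the question's sample; the 2001 row is hypothetical and only there so that two groups exist.

```python
import pandas as pd

# Tiny stand-in for the recruits data ("Example Player" is made up).
df = pd.DataFrame({
    "Name": ["D.J. Williams", "Brock Berlin", "Example Player"],
    "Team": ["Miami", "Florida", "Texas"],
    "Class": [2000, 2000, 2001],
})

# Iterating a groupby yields (key, sub-frame) pairs, which dict() accepts.
dict_dfs = dict(tuple(df.groupby("Class")))

print(sorted(dict_dfs))   # the class years that are present
print(dict_dfs[2000])     # all rows with Class == 2000
```

The keys are the values found in the Class column, so a year that never occurs in the data has no entry.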
You overwrite the variable db at each iteration, and recruits2023 is never defined as a variable (setting db.name does not create one), so you can't use it like that.
You can use a dict to store your data instead:
recruits = {}
for year in recruits_final['Class'].unique():
    recruits[year] = recruits_final[recruits_final['Class'] == year]
>>> recruits[2000]
Primary Rank Other Rank Name Link ... Position Rank State Rank Team Class
0 1 NaN D.J. Williams https://247sports.com/Player/DJ-Williams-49931 ... 1 1 Miami 2000
1 2 NaN Brock Berlin https://247sports.com/Player/Brock-Berlin-49926 ... 1 1 Florida 2000
2 3 NaN Charles Rogers https://247sports.com/Player/Charles-Rogers-49984 ... 1 1 Michigan State 2000
3 4 NaN Travis Johnson https://247sports.com/Player/Travis-Johnson-50043 ... 1 2 Florida State 2000
...
38 40 NaN Chris Septak https://247sports.com/Player/Chris-Septak-57555 ... 1 1 Nebraska 2000
39 41 NaN Eric Knott https://247sports.com/Player/Eric-Knott-60823 ... 2 3 Michigan State 2000
40 42 NaN Harold James https://247sports.com/Player/Harold-James-57524 ... 4 1 Alabama 2000
>>> recruits.keys()
dict_keys([2000])
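If the names from the question (recruits2000, recruits2001, ...) matter, one option is to use them as dictionary keys rather than as variable names. This is only a sketch: the frame below is a small stand-in, and "Example Player A" and "Example Player B" are hypothetical rows.

```python
import pandas as pd

# Small stand-in for recruits_final; the last two rows are made up.
recruits_final = pd.DataFrame({
    "Name": ["D.J. Williams", "Brock Berlin",
             "Example Player A", "Example Player B"],
    "Class": [2000, 2000, 2001, 2002],
})

# One entry per class, keyed by the name the question wanted to use.
recruits = {
    f"recruits{year}": group
    for year, group in recruits_final.groupby("Class")
}

print(list(recruits))            # ['recruits2000', 'recruits2001', 'recruits2002']
print(recruits["recruits2000"])  # rows with Class == 2000
```

A dictionary keeps all the frames in one place and makes it easy to loop over them, which dynamically created variable names do not.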

Formula to calculate total duty hours

How do I find the total working hours of the below driver?
Duty code Dep.Time Arri.Time
A001 03:35 04:20
A001 04:35 05:20
A001 05:51 06:20
A001 06:40 07:20
A001 09:40 10:20
Total Working Hour: 10:20-03:35 = 06:45hrs
Is there a formula to find the total working hours of a single person or a single duty card?
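If you only have one Duty Code as in the example, you can use the MAX and MIN functions to calculate the total hours: subtract the earliest departure from the latest arrival. Assuming the duty codes are in A2:A6, the departure times in B2:B6 and the arrival times in C2:C6, that is =MAX(C2:C6)-MIN(B2:B6), with the result cell formatted as time.
If you have more than one Duty Code, you can use MAXIFS and MINIFS, for example =MAXIFS(C:C,A:A,"A001")-MINIFS(B:B,A:A,"A001") under the same layout assumption.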
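For readers who hold the same table in pandas rather than in a spreadsheet, the latest-arrival-minus-earliest-departure rule can be sketched as follows. This is an assumed Python equivalent, not part of the Excel answer; the frame is rebuilt from the question's sample.

```python
import pandas as pd

# Sample rows from the question.
duties = pd.DataFrame({
    "Duty code": ["A001"] * 5,
    "Dep.Time": ["03:35", "04:35", "05:51", "06:40", "09:40"],
    "Arri.Time": ["04:20", "05:20", "06:20", "07:20", "10:20"],
})

# Turn "HH:MM" strings into timedeltas (to_timedelta expects "HH:MM:SS").
dep = pd.to_timedelta(duties["Dep.Time"] + ":00")
arr = pd.to_timedelta(duties["Arri.Time"] + ":00")

# Per duty code: latest arrival minus earliest departure.
code = duties["Duty code"]
total = arr.groupby(code).max() - dep.groupby(code).min()
print(total)
```

For A001 this gives 6 hours 45 minutes, matching the 06:45 worked out by hand in the question.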

Convert .txt file into multi index dataframe pandas

I have a very unorganized dataset located in a text file, say file.txt.
The sample looks something like this:
TYPE Invoice C AC DATE TIME Total Invoice Qty1 ITEMVG By Total 3,000.00
Piece Item
5696 01/03/2018 09:21 32,501.35 1 Golden Plate ÞÔÞæÇä ÈÞÑ 6,517.52
1 áÈä ÑæÇÈí ÊÚäÇíá 2 ßÛ 4,261.45
1 Magic chef pop corn 907g 3,509.43
1 áÈäÉ ÊÚäÇíá ÔÝÇÝÉ 1 ßíáæ 9,525.60
1 KHOURY UHT 1 L 2,506.74
1 ÎÈÒ ÔãÓíä ÕÛíÑ 1,002.69
2 Almera 200Tiss 2,506.74
1.55 VG Potato 1,550.17
0.41 VG Eggplant 619.67
1 Delivery Charge 501.35
5697 01/03/2018 09:31 15,751.35 0.5 Halloum 1K. 4,476.03
0.59 Cheese double Cream 3,253.75
3 ãæáÇä ÏæÑ ÎÈÒ æÓØ 32 3,760.11
3 ãæáÇä ÏæÑ ÎÈÒ æÓØ 32 3,760.11
1 Delivery Charge 501.35
I want to import it into a data frame pandas using multi-index. Can someone help me with this?
In fact, I cannot even read it properly as a txt file:
# Obtain the Unorganized data from txt
file1=open('file.txt','r')
UnOrgan=file1.read()
You should be able to just read it in using read_table.
import pandas as pd
df = pd.read_table(<your file>, sep="\t", header=[rows with column info])
I'm guessing that the separator is a tab.
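If the separator guess holds, a multi-index could be built roughly as sketched below. Everything about the layout here is an assumption: the column names, the single header row, and the idea that the invoice fields are blank on continuation lines are invented for illustration, and the real file may need different handling.

```python
import io
import pandas as pd

# Hypothetical tab-separated excerpt standing in for file.txt.
raw = (
    "Invoice\tDate\tTime\tTotal\tQty\tItem\tAmount\n"
    "5696\t01/03/2018\t09:21\t32,501.35\t1\tGolden Plate\t6,517.52\n"
    "\t\t\t\t1\tKHOURY UHT 1 L\t2,506.74\n"
    "5697\t01/03/2018\t09:31\t15,751.35\t0.5\tHalloum 1K.\t4,476.03\n"
    "\t\t\t\t1\tDelivery Charge\t501.35\n"
)

# thousands="," lets pandas parse values such as 32,501.35 as numbers.
df = pd.read_table(io.StringIO(raw), sep="\t", thousands=",")

# Invoice-level fields appear only on the first line of each invoice,
# so carry them down to the item lines that follow.
invoice_cols = ["Invoice", "Date", "Time", "Total"]
df[invoice_cols] = df[invoice_cols].ffill()
df["Invoice"] = df["Invoice"].astype(int)

# Two-level index: invoice number, then item.
df = df.set_index(["Invoice", "Item"]).sort_index()
print(df)
```

With the real file, `io.StringIO(raw)` would be replaced by the file path, and an `encoding` argument may be needed for the non-Latin item names.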

Multi Criterion Max If Statement

My dataset looks like this...
State Close Date Probability Highest Prob/State
WA 12/31/2016 50% FALSE
WA 12/19/2016 80% FALSE
WA 10/15/2016 80% TRUE
My objective is to build a formula to populate the right-most column. The formula should assess Close Dates and Probabilities within each state. First, it should select the highest probability, then it should select the nearest close date if there is a tie on probability (as in the example). For that record, it should read "TRUE".
I assume this would include a MAX IF statement but haven't been able to get it to work.
Here is a more robust set of data I'm working with. It may actually be easier to first find the highest probability within each Region then select the minimum (oldest) date if there is a tie on probability. This too will serve my purposes.
Region Forecast Close Date Probability (%)
Okeechobee FL 6/27/2016 90
Okeechobee West FL 7/1/2016 40
Albany GA 3/11/2016 100
Emerald Coast FL 6/30/2016 60
Emerald Coast FL 10/1/2016 40
Cullman_Hartselle TN 4/30/2016 10
North MS 10/1/2016 25
Roanoke VA 8/31/2016 25
Roanoke VA 8/1/2016 40
Gardena CA 6/1/2016 80
Gardena CA 6/1/2016 80
Lomita-Harbor City 6/30/2016 60
Lomita-Harbor City 6/30/2016 0
Lomita-Harbor City 6/30/2016 40
Eastern NC 6/30/2016 60
Northwest NC 9/16/2016 10
Fort Collins_Greeley CO 3/1/2016 100
Northwest OK 6/30/2016 100
Southwest MO 7/29/2016 90
Northern NH-VT 3/1/2016 20
South DE 12/1/2016 0
South DE 12/1/2016 20
Kingston NY 12/30/2016 5
Longview WA 11/30/2016 5
North DE 12/1/2016 20
North DE 12/1/2016 0
Salt Lake City UT 8/31/2016 20
Idaho Panhandle 8/26/2016 0
Bridgeton_Salem NJ 7/1/2016 25
Bridgeton_Salem NJ 7/1/2016 65
Layton_Ogden UT 3/25/2016 5
Central OR 6/30/2016 10
The following Array formula should work:
=(ABS(B2-$F$2)=MIN(IF(($A$2:$A$33=A2)*(C2=MAX(IF($A$2:$A$33=A2,$C$2:$C$33))),ABS($B$2:$B$33-$F$2))))*(C2=MAX(IF($A$2:$A$33=A2,$C$2:$C$33)))>0
Because this is an array formula, confirm it with Ctrl-Shift-Enter instead of Enter when exiting Edit mode. If done properly, Excel will put {} around the formula.
Edit
Added @tigeravatar's suggestion to avoid volatile functions.
I think this is OK now but needs to be checked against the more complete set of data provided by OP.
It counts:-
(1) Any rows with same state but higher probability
(2) Any rows with same state and probability, in the future (or present) and nearer to today's date
(3) Any rows with same state and probability, in the past and nearer to today's date.
If all these are zero, you should have the right one.
=COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,">"&$C2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,"<"&$G$2+IF($B2>=$G$2,DATEDIF($G$2,$B2,"d"),DATEDIF($B2,$G$2,"d")),$B$2:$B$100,">="&$G$2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,">"&$G$2-IF($B2>=$G$2,DATEDIF($G$2,$B2,"d"),DATEDIF($B2,$G$2,"d")),$B$2:$B$100,"<"&$G$2)
=0
If the dates are all in the future, it can be simplified a lot:-
=COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,">"&$C2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,"<"&$G$2+DATEDIF($G$2,$B2,"d"))
=0
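The selection rule can also be cross-checked outside Excel. The sketch below is an assumed pandas equivalent, not one of the formulas above, and it uses the simpler tie-break the question also accepts: highest probability per state, then the earliest close date. The three WA rows come from the question.

```python
import pandas as pd

# The three WA rows from the question.
df = pd.DataFrame({
    "State": ["WA", "WA", "WA"],
    "Close Date": pd.to_datetime(
        ["12/31/2016", "12/19/2016", "10/15/2016"], format="%m/%d/%Y"
    ),
    "Probability": [50, 80, 80],
})

# Highest probability first; among ties, earliest close date first.
ordered = df.sort_values(
    ["Probability", "Close Date"], ascending=[False, True]
)

# The first row of each state in that order is the one to flag.
best_idx = ordered.groupby("State").head(1).index
df["Highest Prob/State"] = df.index.isin(best_idx)
print(df)
```

Only the 10/15/2016 row at 80% is flagged TRUE, which agrees with the expected column in the question.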

Can I use excel to work out how something is being calculated by giving it an example output?

I have 2 columns in Excel.
Column 1 indicates 'pieces' (of delivery) and the other indicates 'processing time'.
I typed these in by hand because I was given them on a sheet of paper, so there is no formula visible.
Is there a way to get Excel to tell me how 'processing time' is being calculated? I really can't figure it out.
--- Example of situation ---
Total pieces | Pro Time (MM:SS)
40 | 00:21
3 | 00:01
12 | 00:04
43 | 00:22
