Multi-Criterion MAX IF Statement - Excel

My dataset looks like this...
State Close Date Probability Highest Prob/State
WA 12/31/2016 50% FALSE
WA 12/19/2016 80% FALSE
WA 10/15/2016 80% TRUE
My objective is to build a formula to populate the right-most column. The formula should assess Close Dates and Probabilities within each state. First, it should select the highest probability, then it should select the nearest close date if there is a tie on probability (as in the example). For that record, it should read "TRUE".
I assume this would include a MAX IF statement but haven't been able to get it to work.
Here is a more robust set of data I'm working with. It may actually be easier to first find the highest probability within each Region and then select the minimum (earliest) date if there is a tie on probability. This too will serve my purposes.
Region Forecast Close Date Probability (%)
Okeechobee FL 6/27/2016 90
Okeechobee West FL 7/1/2016 40
Albany GA 3/11/2016 100
Emerald Coast FL 6/30/2016 60
Emerald Coast FL 10/1/2016 40
Cullman_Hartselle TN 4/30/2016 10
North MS 10/1/2016 25
Roanoke VA 8/31/2016 25
Roanoke VA 8/1/2016 40
Gardena CA 6/1/2016 80
Gardena CA 6/1/2016 80
Lomita-Harbor City 6/30/2016 60
Lomita-Harbor City 6/30/2016 0
Lomita-Harbor City 6/30/2016 40
Eastern NC 6/30/2016 60
Northwest NC 9/16/2016 10
Fort Collins_Greeley CO 3/1/2016 100
Northwest OK 6/30/2016 100
Southwest MO 7/29/2016 90
Northern NH-VT 3/1/2016 20
South DE 12/1/2016 0
South DE 12/1/2016 20
Kingston NY 12/30/2016 5
Longview WA 11/30/2016 5
North DE 12/1/2016 20
North DE 12/1/2016 0
Salt Lake City UT 8/31/2016 20
Idaho Panhandle 8/26/2016 0
Bridgeton_Salem NJ 7/1/2016 25
Bridgeton_Salem NJ 7/1/2016 65
Layton_Ogden UT 3/25/2016 5
Central OR 6/30/2016 10

The following array formula should work:
=(ABS(B2-$F$2)=MIN(IF(($A$2:$A$33=A2)*(C2=MAX(IF($A$2:$A$33=A2,$C$2:$C$33))),ABS($B$2:$B$33-$F$2))))*(C2=MAX(IF($A$2:$A$33=A2,$C$2:$C$33)))>0
Since it is an array formula, confirm it with Ctrl+Shift+Enter when exiting edit mode. If done properly, Excel will put {} around the formula. ($F$2 is presumed to hold today's date as a static value.)
Edit
Added @tigeravatar's suggestion to avoid volatile functions.

I think this is OK now but needs to be checked against the more complete set of data provided by the OP.
It counts:-
(1) Any rows with the same state but a higher probability
(2) Any rows with the same state and probability, in the future (or present) and nearer to today's date
(3) Any rows with the same state and probability, in the past and nearer to today's date.
If all three counts are zero, you have the right row. (Here $G$2 is presumed to hold today's date as a static value, in keeping with avoiding volatile functions.)
=COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,">"&$C2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,"<"&$G$2+IF ($B2>=$G$2,DATEDIF($G$2,$B2,"d"),DATEDIF($B2,$G$2,"d")),$B$2:$B$100,">="&$G$2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,">"&$G$2-IF($B2>=$G$2,DATEDIF($G$2,$B2,"d"),DATEDIF($B2,$G$2,"d")),$B$2:$B$100,"<"&$G$2)
=0
If the dates are all in the future, it can be simplified a lot:-
=COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,">"&$C2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,"<"&$G$2+DATEDIF($G$2,$B2,"d"))
=0
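
For readers doing this outside Excel, here is a minimal pandas sketch of the same ranking logic; the column names Region, CloseDate, and Probability, and the fixed "today" value, are assumptions for illustration:

import pandas as pd

# Sample rows mirroring the WA example above
df = pd.DataFrame({
    "Region": ["WA", "WA", "WA"],
    "CloseDate": pd.to_datetime(["2016-12-31", "2016-12-19", "2016-10-15"]),
    "Probability": [50, 80, 80],
})

today = pd.Timestamp("2016-09-01")  # stand-in for the date held in $F$2/$G$2
df["DaysOut"] = (df["CloseDate"] - today).abs()

# Per region: highest probability first, then the close date nearest today
best = (df.sort_values(["Probability", "DaysOut"], ascending=[False, True])
          .groupby("Region")
          .head(1)
          .index)
df["Highest Prob/State"] = df.index.isin(best)
print(df)

Running this flags the 10/15/2016 row as TRUE, matching the example in the question.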

Related

Creating multiple named dataframes by a for loop

I have a database that contains 60,000+ rows of college football recruit data. From there, I want to create separate dataframes where each one contains just one class value. This is what a sample of the dataframe looks like:
,Primary Rank,Other Rank,Name,Link,Highschool,Position,Height,weight,Rating,National Rank,Position Rank,State Rank,Team,Class
0,1,,D.J. Williams,https://247sports.com/Player/DJ-Williams-49931,"De La Salle (Concord, CA)",ILB,6-2,235,0.9998,1,1,1,Miami,2000
1,2,,Brock Berlin,https://247sports.com/Player/Brock-Berlin-49926,"Evangel Christian Academy (Shreveport, LA)",PRO,6-2,190,0.9998,2,1,1,Florida,2000
2,3,,Charles Rogers,https://247sports.com/Player/Charles-Rogers-49984,"Saginaw (Saginaw, MI)",WR,6-4,195,0.9988,3,1,1,Michigan State,2000
3,4,,Travis Johnson,https://247sports.com/Player/Travis-Johnson-50043,"Notre Dame (Sherman Oaks, CA)",SDE,6-4,265,0.9982,4,1,2,Florida State,2000
4,5,,Marcus Houston,https://247sports.com/Player/Marcus-Houston-50139,"Thomas Jefferson (Denver, CO)",RB,6-0,208,0.9980,5,1,1,Colorado,2000
5,6,,Kwame Harris,https://247sports.com/Player/Kwame-Harris-49999,"Newark (Newark, DE)",OT,6-7,320,0.9978,6,1,1,Stanford,2000
6,7,,B.J. Johnson,https://247sports.com/Player/BJ-Johnson-50154,"South Grand Prairie (Grand Prairie, TX)",WR,6-1,190,0.9976,7,2,1,Texas,2000
7,8,,Bryant McFadden,https://247sports.com/Player/Bryant-McFadden-50094,"McArthur (Hollywood, FL)",CB,6-1,182,0.9968,8,1,1,Florida State,2000
8,9,,Sam Maldonado,https://247sports.com/Player/Sam-Maldonado-50071,"Harrison (Harrison, NY)",RB,6-2,215,0.9964,9,2,1,Ohio State,2000
9,10,,Mike Munoz,https://247sports.com/Player/Mike-Munoz-50150,"Archbishop Moeller (Cincinnati, OH)",OT,6-7,290,0.9960,10,2,1,Tennessee,2000
10,11,,Willis McGahee,https://247sports.com/Player/Willis-McGahee-50179,"Miami Central (Miami, FL)",RB,6-1,215,0.9948,11,3,2,Miami,2000
11,12,,Antonio Hall,https://247sports.com/Player/Antonio-Hall-50175,"McKinley (Canton, OH)",OT,6-5,295,0.9946,12,3,2,Kentucky,2000
12,13,,Darrell Lee,https://247sports.com/Player/Darrell-Lee-50580,"Kirkwood (Saint Louis, MO)",WDE,6-5,230,0.9940,13,1,1,Florida,2000
13,14,,O.J. Owens,https://247sports.com/Player/OJ-Owens-50176,"North Stanly (New London, NC)",S,6-1,195,0.9932,14,1,1,Tennessee,2000
14,15,,Jeff Smoker,https://247sports.com/Player/Jeff-Smoker-50582,"Manheim Central (Manheim, PA)",PRO,6-3,190,0.9922,15,2,1,Michigan State,2000
15,16,,Marco Cooper,https://247sports.com/Player/Marco-Cooper-50171,"Cass Technical (Detroit, MI)",OLB,6-2,235,0.9918,16,1,2,Ohio State,2000
16,17,,Chance Mock,https://247sports.com/Player/Chance-Mock-50163,"The Woodlands (The Woodlands, TX)",PRO,6-2,190,0.9918,17,3,2,Texas,2000
17,18,,Roy Williams,https://247sports.com/Player/Roy-Williams-55566,"Permian (Odessa, TX)",WR,6-4,202,0.9916,18,3,3,Texas,2000
18,19,,Matt Grootegoed,https://247sports.com/Player/Matt-Grootegoed-50591,"Mater Dei (Santa Ana, CA)",OLB,5-11,205,0.9914,19,2,3,USC,2000
19,20,,Yohance Buchanan,https://247sports.com/Player/Yohance-Buchanan-50182,"Douglass (Atlanta, GA)",S,6-1,210,0.9912,20,2,1,Florida State,2000
20,21,,Mac Tyler,https://247sports.com/Player/Mac-Tyler-50572,"Jess Lanier (Hueytown, AL)",DT,6-6,320,0.9912,21,1,1,Alabama,2000
21,22,,Jason Respert,https://247sports.com/Player/Jason-Respert-55623,"Northside (Warner Robins, GA)",OC,6-3,300,0.9902,22,1,2,Tennessee,2000
22,23,,Casey Clausen,https://247sports.com/Player/Casey-Clausen-50183,"Bishop Alemany (Mission Hills, CA)",PRO,6-4,215,0.9896,23,4,4,Tennessee,2000
23,24,,Albert Means,https://247sports.com/Player/Albert-Means-55968,"Trezevant (Memphis, TN)",SDE,6-6,310,0.9890,24,2,1,Alabama,2000
24,25,,Albert Hollis,https://247sports.com/Player/Albert-Hollis-55958,"Christian Brothers (Sacramento, CA)",RB,6-0,190,0.9890,25,4,5,Georgia,2000
25,26,,Eric Moore,https://247sports.com/Player/Eric-Moore-55973,"Pahokee (Pahokee, FL)",OLB,6-4,226,0.9884,26,3,3,Florida State,2000
26,27,,Willie Dixon,https://247sports.com/Player/Willie-Dixon-55626,"Stockton Christian School (Stockton, CA)",WR,5-11,182,0.9884,27,4,6,Miami,2000
27,28,,Cory Bailey,https://247sports.com/Player/Cory-Bailey-50586,"American (Hialeah, FL)",S,5-10,175,0.9880,28,3,4,Florida,2000
28,29,,Sean Young,https://247sports.com/Player/Sean-Young-55972,"Northwest Whitfield County (Tunnel Hill, GA)",OG,6-6,293,0.9878,29,1,3,Tennessee,2000
29,30,,Johnnie Morant,https://247sports.com/Player/Johnnie-Morant-60412,"Parsippany Hills (Morris Plains, NJ)",WR,6-5,225,0.9871,30,5,1,Syracuse,2000
30,31,,Wes Sims,https://247sports.com/Player/Wes-Sims-60243,"Weatherford (Weatherford, OK)",OG,6-5,310,0.9869,31,2,1,Oklahoma,2000
31,33,,Jason Campbell,https://247sports.com/Player/Jason-Campbell-55976,"Taylorsville (Taylorsville, MS)",PRO,6-5,190,0.9853,33,5,1,Auburn,2000
32,34,,Antwan Odom,https://247sports.com/Player/Antwan-Odom-50168,"Alma Bryant (Irvington, AL)",SDE,6-7,260,0.9851,34,3,2,Alabama,2000
33,35,,Sloan Thomas,https://247sports.com/Player/Sloan-Thomas-55630,"Klein (Spring, TX)",WR,6-2,188,0.9847,35,6,5,Texas,2000
34,36,,Raymond Mann,https://247sports.com/Player/Raymond-Mann-60804,"Hampton (Hampton, VA)",ILB,6-1,233,0.9847,36,2,1,Virginia,2000
35,37,,Alphonso Townsend,https://247sports.com/Player/Alphonso-Townsend-55975,"Lima Central Catholic (Lima, OH)",DT,6-6,280,0.9847,37,2,3,Ohio State,2000
36,38,,Greg Jones,https://247sports.com/Player/Greg-Jones-50158,"Battery Creek (Beaufort, SC)",RB,6-2,245,0.9837,38,6,1,Florida State,2000
37,39,,Paul Mociler,https://247sports.com/Player/Paul-Mociler-60319,"St. John Bosco (Bellflower, CA)",OG,6-5,300,0.9833,39,3,7,UCLA,2000
38,40,,Chris Septak,https://247sports.com/Player/Chris-Septak-57555,"Millard West (Omaha, NE)",TE,6-3,245,0.9833,40,1,1,Nebraska,2000
39,41,,Eric Knott,https://247sports.com/Player/Eric-Knott-60823,"Henry Ford II (Sterling Heights, MI)",TE,6-4,235,0.9831,41,2,3,Michigan State,2000
40,42,,Harold James,https://247sports.com/Player/Harold-James-57524,"Osceola (Osceola, AR)",S,6-1,220,0.9827,42,4,1,Alabama,2000
For example, if I don't use a for loop, this line of code is what I use if I just want to create one dataframe:
recruits2022 = recruits_final[recruits_final['Class'] == 2022]
However, I want to have a named dataframe for each recruiting class.
In other words, recruits2000 would be a dataframe for all rows that have a class value equal to 2000, recruits2001 would be a dataframe for all rows that have a class value to 2001, and so forth.
This is what I tried recently, but I have had no luck saving the dataframes outside of the for loop.
databases = ['recruits2000', 'recruits2001', 'recruits2002', 'recruits2003', 'recruits2004',
'recruits2005', 'recruits2006', 'recruits2007', 'recruits2008', 'recruits2009',
'recruits2010', 'recruits2011', 'recruits2012', 'recruits2013', 'recruits2014',
'recruits2015', 'recruits2016', 'recruits2017', 'recruits2018', 'recruits2019',
'recruits2020', 'recruits2021', 'recruits2022', 'recruits2023']
for i in range(len(databases)):
    year = pd.to_numeric(databases[i][-4:], errors='coerce')
    db = recruits_final[recruits_final['Class'] == year]
    db.name = databases[i]
    print(db)
    print(db.name)
    print(year)

recruits2023
I get this error instead of what I wanted:
NameError Traceback (most recent call last)
<ipython-input-49-7cb5d12ab92f> in <module>()
29
30 # print(db.name)
---> 31 recruits2023
32
33
NameError: name 'recruits2023' is not defined
Is there something that I am missing to get this for loop to work? Any assistance is truly appreciated. Thanks in advance.
Just use a dictionary of dataframes created with groupby:
dict_dfs = dict(tuple(df.groupby('Class')))
Access your individual dataframes using
dict_dfs[2022]
You overwrite the variable db at each iteration, and recruits2023 is never defined as a variable, so you can't use it like that.
You can use a dict to store your data:
recruits = {}
for year in recruits_final['Class'].unique():
    recruits[year] = recruits_final[recruits_final['Class'] == year]
>>> recruits[2000]
Primary Rank Other Rank Name Link ... Position Rank State Rank Team Class
0 1 NaN D.J. Williams https://247sports.com/Player/DJ-Williams-49931 ... 1 1 Miami 2000
1 2 NaN Brock Berlin https://247sports.com/Player/Brock-Berlin-49926 ... 1 1 Florida 2000
2 3 NaN Charles Rogers https://247sports.com/Player/Charles-Rogers-49984 ... 1 1 Michigan State 2000
3 4 NaN Travis Johnson https://247sports.com/Player/Travis-Johnson-50043 ... 1 2 Florida State 2000
...
38 40 NaN Chris Septak https://247sports.com/Player/Chris-Septak-57555 ... 1 1 Nebraska 2000
39 41 NaN Eric Knott https://247sports.com/Player/Eric-Knott-60823 ... 2 3 Michigan State 2000
40 42 NaN Harold James https://247sports.com/Player/Harold-James-57524 ... 4 1 Alabama 2000
>>> recruits.keys()
dict_keys([2000])
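
If you really do want top-level names like recruits2000 (generally discouraged; the dict above is cleaner), one possible sketch injects them with globals():

# Generally discouraged: prefer the dict, but this creates the
# recruits2000 ... recruits2023 names the original loop was aiming for
for year, frame in recruits.items():
    globals()['recruits{}'.format(year)] = frame

recruits2000.head()  # now defined at top level, so no NameError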

Remove custom stop words from pandas dataframe not working

I am trying to remove a custom list of stop words, but it's not working.
desc = pd.DataFrame(description, columns =['description'])
print(desc)
Which gives the following results
description
188693 The Kentucky Cannabis Company and Bluegrass He...
181535 Ohio County Sheriff
11443 According to new reports from federal authorit...
213919 KANSAS CITY, Mo. (AP)The Chiefs will be withou...
171509 The crew of Insight, WCNY's weekly public affa...
... ...
2732 The Arkansas Supreme Court on Thursday cleared...
183367 Larry Pegram, co-owner of Pure Ohio Wellness, ...
134291 Joe Biden will spend the next five months pres...
239270 Find out where your Texas representatives stan...
246070 SAN TAN VALLEY — Two men have been charged wit...
[9875 rows x 1 columns]
I found the following code here, but it doesn't seem to work
remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
pat = '|'.join([r'\b{}\b'.format(w) for w in remove_words])
desc.assign(new_desc=desc.replace(dict(string={pat: ''}), regex=True))
Which produces the following results
description new_desc
188693 The Kentucky Cannabis Company and Bluegrass He... The Kentucky Cannabis Company and Bluegrass He...
181535 Ohio County Sheriff Ohio County Sheriff
11443 According to new reports from federal authorit... According to new reports from federal authorit...
213919 KANSAS CITY, Mo. (AP)The Chiefs will be withou... KANSAS CITY, Mo. (AP)The Chiefs will be withou...
171509 The crew of Insight, WCNY's weekly public affa... The crew of Insight, WCNY's weekly public affa...
... ... ...
2732 The Arkansas Supreme Court on Thursday cleared... The Arkansas Supreme Court on Thursday cleared...
183367 Larry Pegram, co-owner of Pure Ohio Wellness, ... Larry Pegram, co-owner of Pure Ohio Wellness, ...
134291 Joe Biden will spend the next five months pres... Joe Biden will spend the next five months pres...
239270 Find out where your Texas representatives stan... Find out where your Texas representatives stan...
246070 SAN TAN VALLEY — Two men have been charged wit... SAN TAN VALLEY — Two men have been charged wit...
9875 rows × 2 columns
As you can see, the stop words weren't removed. Any help you can provide would be greatly appreciated.
Handle the letter case and simplify the pattern. (The original attempt fails because desc.replace(dict(string={pat: ''})) targets a column named string, which doesn't exist in desc, and the result of assign is never stored back.)
remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
pat = '|'.join(remove_words)
desc['new_desc'] = desc.description.str.lower().replace(pat,'', regex=True)
description new_desc
0 The Kentucky Cannabis Company and Bluegrass He... the kentucky company and bluegrass he...
1 Ohio County Sheriff ohio county sheriff
2 According to new reports from federal authorit... according to new reports from federal authorit...
3 KANSAS CITY, Mo. (AP)The Chiefs will be mariju... kansas city, mo. (ap)the chiefs will be witho...
4 The crew of Insight, WCNY's weekly public affa... the crew of insight, wcny's weekly public affa...
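
If you would rather keep the original casing instead of lowercasing everything, a sketch using word boundaries plus a case-insensitive flag (same desc frame assumed) might look like this:

import re

remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
# \b stops 'thc' matching inside longer words; IGNORECASE catches 'Cannabis' etc.
pat = '|'.join(r'\b{}\b'.format(w) for w in remove_words)
desc['new_desc'] = desc['description'].str.replace(pat, '', regex=True, flags=re.IGNORECASE)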

How to know whether there are enough treadmills?

I'm doing an analysis of a gym and want to know whether we should get more treadmills.
The running history record contains the user's name, start time, end time, and duration.
My idea is to use a pivot table in Excel: put the start date and time into the Rows area and the sum of duration into Values, then right-click on the time, click Group, and group by hour; the duration is then summed for each hour from morning to evening.
If the summed duration for an hour is close to 120 minutes for 2 treadmills, it means they are fully used.
But if someone runs over the hour, for example from 1:30 pm to 2:30 pm, the whole duration is counted against 1-2 pm, so the result is not correct.
Does anyone have a good method?
Thanks
Here is some sample data
name sex device id device type start date start time end date end time duration(mins) distance(km) Calorie
Emmie Aguila Female a001 Treadmill 2020-7-25 9:34:18 2020-7-25 10:20:20 46 4.5 338.6
Dusty Gorham Female a002 Treadmill 2020-7-25 9:13:45 2020-7-25 9:49:02 35 3.1 192.2
Diann Lafreniere Female a001 Treadmill 2020-7-25 9:12:06 2020-7-25 9:33:27 21 2.1 142.6
Rima Hoop Male a001 Treadmill 2020-7-25 7:10:10 2020-7-25 7:30:14 20 2.4 230.8
(screenshot: my pivot result)
A friend of mine found a solution in Excel:
Convert the start/end times to seconds, for example 0-3600 for 12-1 am, 3600-7200 for 1-2 am.
Get the overlap length by comparing the duration with each hour.
The sheet gains Start-sec and End-sec columns plus one column per hour boundary (3600, 7200, ..., 86400). For example, Emmie's 9:34:18-10:20:20 session becomes start-sec 34458 and end-sec 37220, with 1542 seconds falling in the 9-10 am column and 1220 seconds in the 10-11 am column (every other hour column is 0).
1. Convert time to seconds:
starttime (H3) -> start-sec (M3)
=HOUR(H3)*3600+MINUTE(H3)*60+SECOND(H3)
endtime (I3) -> end-sec (N3)
=HOUR(I3)*3600+MINUTE(I3)*60+SECOND(I3)
2. Compare the duration with each hour in seconds:
=IF(OR(O$1>$N3,$M3>O$2),0,MIN($N3,O$2)-MAX($M3,O$1))
Apply the formula to the whole sheet, and the data is ready.
Create a pivot table, drag the date into Rows and all the hour columns (1-24) into Values; a bar chart then shows the result clearly.
(chart: treadmill usage for each day)
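
The same hour-splitting idea is straightforward outside Excel too. Here is a pandas sketch (the column names start and end are assumptions) that apportions each session to the clock hours it overlaps:

import pandas as pd

# Two sample sessions, one of which crosses an hour boundary
sessions = pd.DataFrame({
    "start": pd.to_datetime(["2020-07-25 09:34:18", "2020-07-25 13:30:00"]),
    "end":   pd.to_datetime(["2020-07-25 10:20:20", "2020-07-25 14:30:00"]),
})

rows = []
for _, s in sessions.iterrows():
    t = s["start"]
    while t < s["end"]:
        hour_end = (t + pd.Timedelta(hours=1)).floor("h")  # next clock-hour boundary
        chunk_end = min(hour_end, s["end"])
        rows.append({"date": t.date(), "hour": t.hour,
                     "minutes": (chunk_end - t).total_seconds() / 60})
        t = chunk_end

# Minutes of treadmill use per calendar hour; compare against 60 * number of treadmills
usage = pd.DataFrame(rows).groupby(["date", "hour"])["minutes"].sum()
print(usage)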

How to separate Unit/Suite/APT/# from an address in Excel

I have a database of 272,000 addresses, but some addresses have unit/suite/STE/APT; see the example below
16 BRIARWOOD COURT UNIT B MONTVALE, NJ 07645
100 CROWN COURT #471 EDGEWATER, NJ 07020
23-05 HIGH ST APT A FAIR LAWN, NJ 07410
15-01 BROADWAY STE 6 FAIR LAWN, NJ 07410
80 BROADWAY, SUITE 1A CRESSKILL, N.J. 07626
300 GORGE ROAD APT 11 CLIFFSIDE PARK, N.J. 07010
I would like to split the text into the next column when it comes across UNIT/SUITE/STE/APT.
I want to separate these so I can use Advanced Filter with unique records and create a master find-and-replace to clean the list....
Any formulas I can use for this would be helpful....
You can batch geocode your file on geocoder.ca
This is the result I got:
rawlocation Latitude Longitude Score StandardCivicNumber StandardAddress StandardCity StandardStateorProvinceAbbrv PostalZip Confidence
16 BRIARWOOD COURT UNIT B MONTVALE NJ 07645 41.035587 -74.06744 1 16 Briarwood Crt Montvale NJ 7677 0.7
100 CROWN COURT #471 EDGEWATER NJ 07020 40.822893 -73.978375 1 100 Crown Crt Edgewater NJ 07020-1137 0.8
23-05 HIGH ST APT A FAIR LAWN NJ 07410 40.940276 -74.120329 1 23 High St Fair Lawn NJ 07410-3574 0.8
15-01 BROADWAY STE 6 FAIR LAWN NJ 07410 40.920501 -74.091153 1 1 S Broadway Fair Lawn NJ 07410-5529 0.8
80 BROADWAY - 0
300 GORGE ROAD APT 11 CLIFFSIDE PARK N.J. 07010 40.814151 -73.990015 1 300 Gorge Rd Cliffside Park NJ 07010-2759 0.8
From the cleaned-up version you can then do a street-level comparison to extract additional entities.
Since not all addresses have a secondary number (such as APT C, or STE 312), I would recommend separating every time you come across a ZIP (5 digits) or a ZIP+4 (like 07010-2759). This will help you break that string into discrete addresses.
If you then want to clean up the list by correcting small typos, standardizing abbreviations, etc., I recommend using an address validation and standardization service like Melissa Data or SmartyStreets. SmartyStreets has tools for validating/cleansing large lists of addresses and even extracting addresses out of text. (Full disclosure: I'm a software developer for SmartyStreets.)
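
If you'd rather script the split than geocode, a rough Python sketch (the designator list is an assumption and will need tuning against your data) that splits at the first unit marker:

import re

# Split an address at the first UNIT/SUITE/STE/APT or '#' marker
UNIT_RE = re.compile(r'\b(?:UNIT|SUITE|STE|APT)\b|#', re.IGNORECASE)

def split_unit(address):
    m = UNIT_RE.search(address)
    if not m:
        return address, ""  # no secondary designator present
    return address[:m.start()].rstrip(" ,"), address[m.start():]

print(split_unit("16 BRIARWOOD COURT UNIT B MONTVALE, NJ 07645"))
# ('16 BRIARWOOD COURT', 'UNIT B MONTVALE, NJ 07645')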

Daily and Hourly Averages from (m/d/yyyy h:mm) timestamps in Excel

I have an Excel 2007 spreadsheet with date entries in this format m/d/yyyy h:mm (one cell). I would like find the hourly and daily average of all the columns of this spreadsheet and save each time aggregation to a new worksheet.
The data is recorded every ~10 minutes, but throughout the dates of data collection there were some time slips. Not every hour has the same number of rows. Also, the ending minute is either 0 or 6, depending on the time correction.
What would be a good way to approach this task within Excel 2007? It seems like this might be possible with a pivot table if I can create a formula that will select the correct range for the timestamps. Thanks.
For example, a date-time entry in TIMESTAMP is 10/31/2012 0:06, which is in one cell.
TIMESTAMP Month Day Year Hour Min Rain_mm Rain_mm_2 AirTC AirTC_2 FuelM FuelM_2 VW ... there are ~16 variables (total) after the data time
10/31/2012 0:06 10 31 2012 0 06 0 0 26.11 26.08 2.545 6.4 0.049
10/31/2012 0:16 10 31 2012 0 16 0 0 25.98 25.97 2.624 6.6 0.049
10/31/2012 0:26 10 31 2012 0 26 0 0 24.32 23.33 2.543 6.5 0.048
10/31/2012 0:36 10 31 2012 0 36 0 0 24.32 23.33 2.543 6.5 0.048
10/31/2012 0:46 10 31 2012 0 46 0 0 24.32 23.33 2.543 6.5 0.048
10/31/2012 0:56 10 31 2012 0 56 0 0 25.87 25.87 2.753 7.3 0.049
10/31/2012 1:06 10 31 2012 0 06 0 0 25.74 25.74 2.879 8.1 0.051
## The above is just over one hour of collection on one day ##
...
## Different Day ### Notice Missing Time Stamp
11/30/2012 0:00 11 30 2012 0 06 0 0.1 26.12 26.18 2.535 6.4 0.049
11/30/2012 0:10 11 30 2012 0 16 0 0.1 25.90 25.77 2.424 6.6 0.049
11/30/2012 0:20 11 30 2012 0 26 0.1 0.2 24.12 24.43 2.542 6.4 0.046
11/30/2012 0:30 11 30 2012 0 36 0.1 0 24.22 22.32 2.543 6.5 0.048
11/30/2012 0:50 11 30 2012 0 56 0.1 0.2 26.77 25.87 2.743 6.3 0.049
11/30/2012 1:00 11 30 2012 0 06 0 0 24.34 24.77 2.459 5.1 0.050
## so forth on so on ##
After clarification of the requirement for daily averages, this has been edited to cover both daily and hourly averages:
Add a column (here B) for 'H' (i.e. hour) with =HOUR(A2) copied down.
(Note: though formatted to show only m/d/y, the content of ColumnA is, in line with the title, assumed to be the full mm/dd/yyyy hh:mm. This makes the existing Month, Day, Year, and Hour columns [whose names are jumbled] redundant.)
Select data range.
Data, Subtotal, At each change in: TIMESTAMP, Use function: Average, Add subtotal to: check only columns G and to the right, OK.
Uncheck Replace current subtotals in Subtotal and apply At each change in: H, Use function: Average, and Add subtotal to: as before, OK.
In the Min column, replace =SUBTOTAL(1, with =MIN( .
Delete ‘spare’ Grand Average row.
Reformat as required.
Hopefully this achieves what is required:
Note that midnight 'tonight' is counted as falling within the first hour of tomorrow.
I had a similar need and worked it out this way:
Add a column for Date (assuming your dd/mm/yyyy hh:mm:ss data is in cell A2)
=DATE(YEAR(A2),MONTH(A2),DAY(A2))
Add a column for Year. If all your weeks fall within a single year, the year column can be omitted.
=YEAR(A2)
Add a column for Week Number
=WEEKNUM(A2)
Add 2 pivot tables, 1 for daily and 1 for weekly analysis.
Choose the "Date" field and the quantities you want. Put "Date" in the Rows section and the sum/average of values in the Values section. You will get a date-wise sum/average of the values you need.
In the weekly pivot table, do the same as above, but put "Year" and "Week no" in the Rows section instead of "Date".
Hope this helps
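
For comparison, the same two aggregations can be done outside Excel with pandas resample. A sketch (the CSV export, file names, and numeric-only columns are assumptions):

import pandas as pd

# Assumes the sheet was exported to CSV with the TIMESTAMP column intact
df = pd.read_csv("logger.csv", parse_dates=["TIMESTAMP"], index_col="TIMESTAMP")

hourly = df.resample("h").mean(numeric_only=True)  # one row per hour; uneven row counts per hour are handled
daily = df.resample("D").mean(numeric_only=True)   # one row per day

# Write each aggregation to its own worksheet
with pd.ExcelWriter("averages.xlsx") as xl:
    hourly.to_excel(xl, sheet_name="hourly")
    daily.to_excel(xl, sheet_name="daily")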
