What type of analysis would fit for this dataset problem? - statistics

I have a data set that shows how many cars passed through a car wash on each day of September in 2019, 2020, and 2021. For Sept 2019, for example, it shows 37 cars on Monday the 2nd, 38 cars on Tuesday the 3rd, etc. Using the three years of data, I would like to compare the day before and the day after Labor Day with the rest of the month, and see whether those two days stand out from the month's overall trend. I'm looking for some kind of statistical analysis that shows whether the change on those two days, versus the rest of the month, is statistically significant. How could I go about doing this? Would that include something like linear regression? I'm not a statistics guy, so I'm just throwing this out there for someone more knowledgeable.
Here's a snippet of one of those data tables:
Date Weekday CarCount
01/07/2019 Mon 37
01/08/2019 Tue 38
01/09/2019 Wed 26
01/10/2019 Thu 46
01/11/2019 Fri 40
01/12/2019 Sat 45

Related

Excel Plotting two Date Ranges on the same Pivot Table

I am trying to trend the occurrence of a system event in Excel. I've attached a screen shot showing the pretty simple data that I'm working with.
The spreadsheet lists the times of the event (Column A) for a two week period and I've used a ceiling formula to group those events into 15 minute increments (Column B). Using a pivot table and chart, it's pretty simple to then take that data and graph the events into a line chart that shows the 15 minute increment time, and the count of the events in that 15 minute period.
Now I'd like to take subsets of that data and compare them. For example, I'd like to compare Friday, Saturday and Sunday of last week with Friday, Saturday and Sunday of this week. So far, the only way I've found to achieve this is to create duplicate pivot tables (Columns E, F and H, I), filter each table by the days I want to compare (25, 26, and 27 in one table and 1, 2 and 3 in the second), and then visually compare them (the two charts).
But what I'd really like to do is to combine both series into one chart as a stacked line graph. I thought that would be pretty simple, but since the dates are so very dissimilar, I always wind up getting a single line or duplicate data, or totally off the wall results.
I've gotten kinda close by converting the row labels into a custom DDD HH:MM format. That lists the dates as (for Oct 25th)
Fri 00:15 11
Fri 00:30 1
Fri 00:45 4
Fri 01:00 3
Fri 01:15 6
Etc.
And (For Nov 1st)
Fri 00:15 7
Fri 00:30 1
Fri 00:45 2
Fri 01:00 1
Fri 01:30 2
Etc
Now I have something I can join on (Friday 00:15, 00:30, 00:45, etc is in both charts)
But the problem is, the two series often don't have the same Labels. For example the data for Nov 1 is missing 01:15,01:45 and 02:00 because no events occurred at those times. Same problem for the Oct 25th's data which is also missing 02:00 in addition to all increments between 02:30 and 03:45.
I tried just manually inserting the missing times in both charts but that quickly became a very complex and error prone chore.
So the question is: Is there a way to get Excel to automatically fill in the missing time slots, or am I completely off base in the way I'm trying to do this?
OK.. that was two questions :)
Thanks so much for your help!
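The missing-slot problem described above can be illustrated outside Excel. What follows is only a minimal Python sketch, not an Excel feature: it assumes each series is held as a mapping from an HH:MM label to an event count, and the helper name `full_slots` is hypothetical. It builds every quarter-hour label once and looks each series up against that list, using 0 where no events occurred, so both series end up with identical labels.

```python
def full_slots():
    """All 96 quarter-hour labels in a day: '00:00', '00:15', ..., '23:45'."""
    return [f"{h:02d}:{m:02d}" for h in range(24) for m in (0, 15, 30, 45)]

# Friday counts copied from the question (only the slots listed there)
oct_25 = {"00:15": 11, "00:30": 1, "00:45": 4, "01:00": 3, "01:15": 6}
nov_01 = {"00:15": 7, "00:30": 1, "00:45": 2, "01:00": 1, "01:30": 2}

# One row per slot; a slot missing from a series counts as 0 events
aligned = [(slot, oct_25.get(slot, 0), nov_01.get(slot, 0)) for slot in full_slots()]

for row in aligned[:7]:
    print(row)
```

The same idea carries over to a sheet: a helper column listing every slot, with each series' count looked up against it, gives both series a common axis to chart on.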

Filter dates by month in pivot table that doesn't start with the 1st of every month

Does anyone know how to filter dates in an Excel pivot table by month-long periods that don't start on the 1st, such as Jan 15 to Feb 15, Feb 15 to Mar 15, etc.?
My data runs from January 15 2017 to January 15 2018, so I'd like to filter it by those specific dates, but so far it only lets me group by a number of days, which is not accurate since Feb has only 28 days. Thank you!

Finding the difference in minutes or seconds for two timestamps in excel?

I have two columns in an Excel sheet containing values like: Mon Dec 01 09:27:04 2014.
Let's say column A and column B, where each row contains values in the above format. How can we find the difference between the two dates?
Mon Dec 01 09:27:04 2014 - Sun Nov 30 11:08:36 2014 = 25 hrs 51 minutes
Please see this for how to do that..
A more direct answer: you can format the difference as text (the square brackets keep durations of 24 hours or more from wrapping around):
=TEXT(B2-A2,"[h]:mm")
I suspect the data is text strings for which I suggest:
=VALUE(MID(A1,9,2)&"/"&MID(A1,5,3)&"/"&RIGHT(A1,4))-VALUE(MID(B1,9,2)&"/"&MID(B1,5,3)&"/"&RIGHT(B1,4))+(MID(A1,12,8)-MID(B1,12,8))
However this relies on Excel's ability to interpret the likes of Dec as the month of December which may not be the case for all language versions. If so, I suggest a lookup table and extracting the dates with DATE along these lines:
=DATE(RIGHT(A1,4),VLOOKUP(MID(A1,5,3),Table,2,0),MID(A1,9,2))
However by such calculation the difference of the timestamps given in the question is:
22 hrs 18 minutes 28 seconds
when formatted:
[hh] "hrs" mm "minutes" ss "seconds"
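As a cross-check outside Excel, the difference can be recomputed with a short Python sketch. It assumes the timestamps always follow the "Mon Dec 01 09:27:04 2014" layout, that English day and month abbreviations are in effect, and that the first argument is the later timestamp; `timestamp_diff` is a hypothetical helper name.

```python
from datetime import datetime

# Layout of strings such as "Mon Dec 01 09:27:04 2014"
FMT = "%a %b %d %H:%M:%S %Y"

def timestamp_diff(later: str, earlier: str) -> str:
    """Elapsed time between two timestamps as 'H hrs M minutes S seconds'."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours} hrs {minutes} minutes {seconds} seconds"

print(timestamp_diff("Mon Dec 01 09:27:04 2014", "Sun Nov 30 11:08:36 2014"))
# prints "22 hrs 18 minutes 28 seconds"
```

This agrees with the 22 hrs 18 minutes 28 seconds figure above rather than the 25 hrs 51 minutes stated in the question.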

How to produce average for calendar months when spreadsheet quantity spans months?

I am documenting my historic home energy consumption. I am entering in to a Google drive spreadsheet the kWh figure found in gas bills from the last few years.
I have come far - https://docs.google.com/spreadsheet/pub?key=0AuQU5u-2PP8NdC1iNFJVNFVxeDE2WHhVdTUtbGNDWnc&output=html (here is the Google Doc - https://docs.google.com/spreadsheet/ccc?key=0AuQU5u-2PP8NdC1iNFJVNFVxeDE2WHhVdTUtbGNDWnc&usp=sharing)
Now I want to analyse this data in interesting ways, to be aware of my changing consumption over time - principally, kWh by calendar month. The problem is, the issued gas bills containing kWh figures span multiple and partial months. eg (Feb 1 to May 11, then 12 May to 6 Aug)...
All data in the sheet is logged on a row containing two key identifiers - period start and end dates - which are formatted as dates.
My question: How can I rationalise this stuff to traverse those awkward multi-month bill figures, to produce some kind of average or mean for kWh used on a calendar-month basis (ie. Feb 2007, Mar 2007)? Is that even mathematically possible or reliable?
Thanks in advance.
Try =YEARFRAC(StartDate, EndDate, [convention]) which will give you the fractional number of years between the dates using a reasonable convention for day count.
See http://office.microsoft.com/en-gb/excel-help/yearfrac-HP005209344.aspx for more details on various day count conventions available.
The first problem is that per-month information doesn't fit your current table structure; to help explain, if you worked out different monthly rates for Feb, May and Jun 2007 (they are different rates) where would you put these numbers in your table?
There are many options, but I believe the best solution is to:
Create a new table with consistent frequency (i.e. consecutive months down column A), then create formulae to interpolate the relevant values from the source table.
I would actually recommend that this 'pure' table use one line per day (rather than per month) because:
- the maths is easier when you read a per-day rate out of the source table
- you can always aggregate daily data back up to monthly
- you are not in danger of running out of lines in your sheet
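The per-day idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: usage is spread evenly across the days of each bill, bill dates are inclusive, and the kWh amounts below are made up (only the date ranges come from the question, with 2007 assumed as the year). `monthly_usage` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_usage(bills):
    """Spread each bill evenly over its days, then sum per calendar month.

    bills: iterable of (start_date, end_date, kwh), both dates inclusive.
    Returns {(year, month): kwh}.
    """
    totals = defaultdict(float)
    for start, end, kwh in bills:
        n_days = (end - start).days + 1
        per_day = kwh / n_days
        day = start
        while day <= end:
            totals[(day.year, day.month)] += per_day
            day += timedelta(days=1)
    return dict(totals)

# Hypothetical kWh amounts; date ranges as in the question (year assumed)
bills = [
    (date(2007, 2, 1), date(2007, 5, 11), 1000.0),
    (date(2007, 5, 12), date(2007, 8, 6), 435.0),
]

for (year, month), kwh in sorted(monthly_usage(bills).items()):
    print(year, month, round(kwh, 1))
```

A month that straddles two bills, such as May here, simply collects daily amounts from both.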
Yes it is doable.
Calculate your cumulative usage (since your bills started) as of each gas bill. Interpolate the cumulative usage for the 1st of each month. The usage for Feb 2007 is then (Mar_1_2007_cumulative - Feb_1_2007_cumulative).
Goal "... consumption over time - kWh by calendar month."
Even if you had daily consumption figures, as months like January (31 days) are longer than February (28/29), charting what you request would show a + bias in long months and - bias in short months. So let's change the goal to
Goal "... daily consumption over time - kWh/day by calendar month."
Say you have figures like the ones below, where you list the date and the usage since the previous date, and you calculate the cumulative usage since the beginning of your records.
Date            kWh *1   Total *2
Jan 1, 2012     -        0
Mar 3, 2012     100      100
Apr 4, 2012     30       130
May 2, 2012     35       165
Aug 9, 2012     75       240
Dec 25, 2012    100      340
Jun 7, 2013     200      540
*1 Energy used since the previous date
*2 Cumulative total usage
(Ignore the "kWH *1" column for the following)
Now make a table for the first of the month for a year, say 2012, and find in the above table an entry <= the first of the month, and the next entry.
Jan 1, 2012 (Jan 1, 2012 0) (Mar 3, 2012 100)
Feb 1, 2012 (Jan 1, 2012 0) (Mar 3, 2012 100)
Mar 1, 2012 (Jan 1, 2012 0) (Mar 3, 2012 100)
Apr 1, 2012 (Mar 3, 2012 100) (Apr 4, 2012 130)
May 1, 2012 (Apr 4, 2012 130) (May 2, 2012 165)
...
Dec 1, 2012 ....
As dates have serial numbers, you linearly interpolate the serial number of each first of the month between the two date/cumulative_usage pairs. This provides the cumulative usage as of the 1st of the month, which becomes the column "Interpolation" in the table below. "Days/Month" is straightforward (days from the first of the month to the first of the next). The Usage/Day for a given month is then (change in "Interpolation") / "Days/Month". E.g. 1-Feb-12 --> (96.8-50.0)/29 = 1.61. Jan and Feb come out at the same rate because both lie between the same two readings.
Date        Interpolation   Days/Month   Usage/Day
1-Jan-12    0.0             31           1.61
1-Feb-12    50.0            29           1.61
1-Mar-12    96.8            31           0.98
1-Apr-12    127.2           30           1.22
1-May-12    163.8           31
All that's left is to chart Usage/Day vs. Date.
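For anyone who wants to check the arithmetic, the interpolation can be sketched in Python. The readings are copied from the example table; the function names `cumulative_at` and `usage_per_day` are hypothetical, and the sketch assumes the readings are sorted by date.

```python
from datetime import date

# (reading date, cumulative kWh) pairs from the example table
readings = [
    (date(2012, 1, 1), 0.0),
    (date(2012, 3, 3), 100.0),
    (date(2012, 4, 4), 130.0),
    (date(2012, 5, 2), 165.0),
    (date(2012, 8, 9), 240.0),
    (date(2012, 12, 25), 340.0),
    (date(2013, 6, 7), 540.0),
]

def cumulative_at(day, readings):
    """Linearly interpolate cumulative usage at `day` between surrounding readings."""
    for (d0, c0), (d1, c1) in zip(readings, readings[1:]):
        if d0 <= day <= d1:
            frac = (day - d0).days / (d1 - d0).days
            return c0 + frac * (c1 - c0)
    raise ValueError(f"{day} is outside the range of the readings")

def usage_per_day(year, month, readings):
    """Average kWh/day for a month: change in cumulative usage / days in month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    change = cumulative_at(end, readings) - cumulative_at(start, readings)
    return change / (end - start).days

for m in range(1, 5):
    first = date(2012, m, 1)
    print(first, round(cumulative_at(first, readings), 1),
          round(usage_per_day(2012, m, readings), 2))
```

Each month's rate is the difference between two consecutive interpolated totals divided by the number of days in that month, which is the formula described above.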

MS Excel: Fraction of date span within another date span

In Microsoft Excel 2007, given a date span, I want to find out what fraction of that date span lies within each season, which is obviously another date span.
For example:
Given a span of 26 Nov 10—28 Feb 11 (95 days):
26 Nov—30 Nov is in Spring (or Autumn for you Northerners) (5 days)
1 Dec—28 Feb in Summer (or Winter) (90 days)
Thus, 5.3% is in Spring/Autumn and 94.7% is in Summer/Winter.
Any Excel formula to work this out? Preferably not macro-dependent, but not a deal-breaker.
It would be difficult to explain in a post. I have uploaded the solution at http://www.2shared.com/file/WHU5v_h1/Book1.html Let me know if you have questions...
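Since the uploaded workbook may not be available to every reader, here is a minimal Python sketch of the same calculation. It is not the workbook's method: it assumes seasons are defined by whole calendar months with southern-hemisphere names (as in the question's example), counts both end dates as part of the span, and uses the hypothetical helper names `season_of` and `season_fractions`.

```python
from datetime import date, timedelta

# Assumed month-to-season mapping (southern hemisphere names)
SEASONS = {12: "Summer", 1: "Summer", 2: "Summer",
           3: "Autumn", 4: "Autumn", 5: "Autumn",
           6: "Winter", 7: "Winter", 8: "Winter",
           9: "Spring", 10: "Spring", 11: "Spring"}

def season_of(day):
    """Season of a date, decided purely by its month."""
    return SEASONS[day.month]

def season_fractions(start, end):
    """Fraction of the inclusive span start..end that falls in each season."""
    counts = {}
    day = start
    while day <= end:
        name = season_of(day)
        counts[name] = counts.get(name, 0) + 1
        day += timedelta(days=1)
    total = (end - start).days + 1
    return {name: n / total for name, n in counts.items()}

fractions = season_fractions(date(2010, 11, 26), date(2011, 2, 28))
for name, frac in fractions.items():
    print(name, f"{frac:.1%}")
```

For the span in the question this gives 5 of 95 days in Spring and 90 of 95 in Summer, matching the 5.3% and 94.7% quoted there.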
