How would you check whether two years of the same time series are statistically different from one another?

I'm wondering the best way to check whether there is a statistical difference between two different years of time series data from the same data set, i.e. 2019 data vs. 2020 data (expecting there should be a difference due to COVID). Would you just t-test the means of each year, or is there something more specific?
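For what it's worth, a minimal sketch of that t-test idea in Python (the daily values below are made up; note that a plain t-test treats the observations as independent and ignores the autocorrelation and seasonality within each year, so it is only a rough first check):

```python
import numpy as np
from scipy import stats

# Hypothetical daily observations for each year; replace with your own series.
rng = np.random.default_rng(0)
values_2019 = rng.normal(loc=100, scale=10, size=365)
values_2020 = rng.normal(loc=90, scale=15, size=366)

# Welch's t-test on the two yearly samples (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(values_2019, values_2020, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```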

Related

How to convert BC dates in NodaTime Instant to a regular date

Instant.FromUnixTimeSeconds(-100100000000).ToDateTimeUtc()
Once the date gets too ancient this doesn't work anymore, for example, BC dates.
Is there any easy way to convert NodaTime Instant values to years, months and days that works for the entire range of supported Instant values (i.e. 27000 BCE to 31000 CE)?
I don't mind what data type, I am just looking to easily extract the regular time periods from the Instant values.
It's been a year or more since I've used NodaTime, but [this page in the user guide][1] says:
Additionally, all calendars are restricted to four digit formats, even in year-of-era representations, which avoids ever having to parse 5-digit years. This leads to a Gregorian calendar from 9999 BCE to 9999 CE inclusive, or -9998 to 9999 in "absolute" years.
Your question could be read to mean you didn't think BC dates worked at all. When you get more than a few thousand years from the present, strange things start to happen: the changing rotation rate of the Earth means there are different kinds of day, those counted by sunrises versus the uniform time used for radioactive decay or for calculating planetary positions. It might be helpful if you mentioned your application.
[1]: https://nodatime.org/3.1.x/userguide/range

Calculate most common time of day from spreadsheet values

Preliminary
This question applies to any spreadsheet system. I would like help in breaking down the problem, as opposed to an answer to the problem. (Although the latter would be most useful.)
I understand Stack Overflow is good for specific programming problems, and I understand it may take me a few attempts to get my question right, so please help me clarify my question by providing suggestions and I will update it.
Like many data novices I have good experience with discrete data (e.g. how many enquiries last month), but I struggle to understand how to deal with continuous data (e.g. how to discover patterns, and where the criteria for a query are not yet known).
The question
I have a spreadsheet where each row represents a "website enquiry". There is a datetime column, and I'd like to discover patterns in this data, to answer questions like:
what is the most common time of day to receive an enquiry
what is the most common day of the week to receive an enquiry
other useful information I can glean from the data, to allow me to target possible customers
This would be similar to the functions you often see in Social Media analytics, such as "best time to tweet".
I understand that calculating the most common day of the week is very simple, as days are discrete objects. So I don't need help with this!
I would like to avoid simply splitting up the day into four arbitrary time periods (e.g. breakfast, lunch, dinner, nighttime) and counting the number of rows that fall into these bounds. What if these time periods are not best to use to segment the data?
Is there another way, other than quantizing my data using arbitrary bounds?
You could use clustering to find out what the most common times are. Basically, you compare the time separation of enquiries and cluster them just like a discrete 1D set of numbers, using, for example, the average linkage clustering criterion. As you reach a reasonably small number of clusters, you will start to see the most dominant times of day (and if you want to evaluate those, you can take the time values which are the weighted centres of the biggest clusters).
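A minimal sketch of that idea, assuming the enquiry times have been converted to minutes since midnight (the values below are made up) and using SciPy's average-linkage hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical enquiry times, expressed as minutes since midnight.
minutes = np.array([545, 560, 610, 615, 830, 845, 850, 1300, 1310, 1320], dtype=float)

# Average-linkage hierarchical clustering on the 1D time values.
Z = linkage(minutes.reshape(-1, 1), method="average")

# Cut the tree into a small number of clusters (3 here, purely as an example).
labels = fcluster(Z, t=3, criterion="maxclust")

# Report each cluster's size and its mean time of day.
for k in np.unique(labels):
    members = minutes[labels == k]
    hours, mins = divmod(int(members.mean()), 60)
    print(f"cluster {k}: {len(members)} enquiries, centre ~ {hours:02d}:{mins:02d}")
```

The biggest clusters then point at the dominant times of day, and their centres give you a concrete "best time" estimate. One caveat: time of day is circular, so enquiries around midnight may end up split across two clusters.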

Statistical method for time-course data comparison

I have a question about a statistical method which I can't find in my textbook. I want to compare data from two groups. For example, both groups have data for day 0, but one group has data for day 2 and the other for day 6. How can I analyse the outcome with the data and the dates? i.e. I want to show that if data taken on day XX are YY, it has an impact on the outcome.
Thanks in advance.
I'd use a repeated measures ANOVA in this case. However, since you don't have a complete dataset, days X and Y would just be operationalized as the endpoint of your dependent variable. If you had measures for all days, I'd include all of them in the analysis in order to fully compare the two timelines. You could then also compare the days of interest directly by using post-hoc tests (e.g. Bonferroni).
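A minimal sketch of a basic repeated-measures ANOVA in Python with statsmodels, assuming long-format data with one row per subject per measurement day (the column names and values are made up; a full mixed design including your group factor would need a different tool, since AnovaRM only handles within-subject factors):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject per measurement day.
df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3, 4, 4],
    "day":     ["d0", "dX", "d0", "dX", "d0", "dX", "d0", "dX"],
    "outcome": [5.1, 7.3, 4.8, 6.9, 5.5, 8.0, 4.9, 7.1],
})

# Repeated-measures ANOVA with 'day' as the within-subject factor.
result = AnovaRM(df, depvar="outcome", subject="subject", within=["day"]).fit()
print(result)
```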

Trying to use countif with multiple criteria involving a time range and a set amount [duplicate]

I need to automate results of alerts within certain time frames based on a 24-hour clock (00:00:00). I believe a COUNTIF formula may be a solution, but I'm not really sure how to use or set it up.
Here are my columns that I am trying to use...
E11:E61 - represents my times. I am looking to pick out the values that fall within 02:00:00 - 06:59:59 from this column.
Then, after filtering that down, I need to see which of the cells in that time frame equal a win or a loss. My column for wins or losses is Q11:Q61. That will be one separate field that I will calculate this in.
Then, in another field, I need to calculate a win amount based on whether a cell falls within the time frame and shows a win. My win amount column is U11:U61.
So, this is a little more complicated than my experience allows me to solve, and I would appreciate any suggestions or recommendations for solving it.
This has been a great resource for me, and I appreciate everyone's input.
As guitarthrower suggested in the comments, you could use the COUNTIFS function to achieve what you are looking for:
WINS =COUNTIFS(E11:E61,">=2:00:00",E11:E61,"<=6:59:59",Q11:Q61, "WIN")
LOSSES =COUNTIFS(E11:E61,">=2:00:00",E11:E61,"<=6:59:59",Q11:Q61, "LOSS")
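If you also want the total win amount for that window in the third field, a SUMIFS built on the same conditions should work, assuming U11:U61 holds numeric amounts:
WIN AMOUNT =SUMIFS(U11:U61,E11:E61,">=2:00:00",E11:E61,"<=6:59:59",Q11:Q61,"WIN")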

How Can I Model Many Short Time Series Samples?

How Can I Model Multiple Short Time Series Samples?
For example, let's say I have a new subject each month, and I measure each subject every day for the entire month. I then want to model these multiple strings of independent time series because I assume that there is an underlying pattern that applies to all 12 subjects. However, a time series with an n of 30 is too short to model, so is there some way to group these 12 time series together for a parallel analysis?
I imagine the way to handle this is similar to how one might handle a time series with multiple breaks of unknown length. Unfortunately, I am unaware of how to deal with this type of data structure.
Any thoughts on where to even begin? What terms I should research?
Well, it depends on what you're interested in. It makes it a lot easier if we know what kind of data you have and what you're trying to analyse.
Trying to answer your question: if you assume that there is some underlying structure which is homogeneous for, say, 6 of the subjects, and different for the other half, you can just pool the two data sets and do some kind of group-mean analysis. If you're interested in a temporal change over the 12 months, then you need to assume that the subjects are homogeneous across whatever variable you're measuring.
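A minimal sketch of that pooling idea in Python (the subjects, days and values below are simulated stand-ins for your data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical pooled data set: 12 subjects, each observed on 30 days.
frames = []
for subject in range(1, 13):
    frames.append(pd.DataFrame({
        "subject": subject,
        "day": np.arange(1, 31),
        "value": rng.normal(loc=10 + 0.1 * np.arange(30), scale=1.0),
    }))
pooled = pd.concat(frames, ignore_index=True)

# Group-mean analysis: the average trajectory across subjects, day by day.
mean_by_day = pooled.groupby("day")["value"].agg(["mean", "std", "count"])
print(mean_by_day.head())
```

The same long-format table also splits naturally into the two hypothesised groups of 6 subjects, so you can compare their day-by-day means directly.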
Normally, for time series in economics for example, what you're describing is called "censored" or "truncated" data.
If we want to measure the income of everyone in a country, we do this by checking electronic paychecks or something. But some people at the end of each tail may not have a visible income: poor people may be earning income in other ways, and rich people may want to hide some of theirs. This is censored data, and any advanced time series statistics book will have something on that.
Truncated data is similar. Just imagine income again. If we truncate everyone who makes less than $10,000 a year, then this will "cut off the end" of your distribution. There are also remedies for this; again, check an advanced time series book.
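To make the censored-versus-truncated distinction concrete, here is a minimal sketch with simulated incomes (the distribution and the $10,000 threshold are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical incomes with a long right tail.
income = rng.lognormal(mean=10.3, sigma=0.6, size=100_000)

# Truncation: observations below the threshold are dropped entirely.
truncated = income[income >= 10_000]

# Censoring: observations below the threshold are kept, but only recorded as being at the threshold.
censored = np.maximum(income, 10_000)

print(f"true mean:      {income.mean():,.0f}")
print(f"truncated mean: {truncated.mean():,.0f}")  # biased upwards
print(f"censored mean:  {censored.mean():,.0f}")   # biased upwards, but less so
```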
Hope this helped a bit.
