I have a question about a statistical method that I can't find in my textbook. I want to compare data from two groups. For example, both groups have data for day 0, but one group has data for day 2 and the other for day 6. How can I analyse the outcome using both the data and the day? That is, I want to show that if the data taken on day XX are YY, this has an impact on the outcome.
Thanks in advance.
I'd use a repeated measures ANOVA in this case. However, since you don't have a complete dataset, days X and Y would simply be operationalized as the endpoint of your dependent variable. If you had measures for all days, I'd include all of them in the analysis in order to fully compare the two timelines. You could then also compare the days of interest directly by using post-hoc tests (e.g. Bonferroni).
As an example, I want to take the time someone left home and the time someone called 9-1-1, and use these points to predict, ideally, the time of the incident in Excel. I can put a time in column A and column B, but all it does is give me the halfway point between the two. For example, if column A says 12:00 and column B says 1:00, the result is 12:30. If I can get something more predictive using this approach, that is ideally what I'm looking for.
I used some of the standard functions in Excel to predict time-based series.
We were looking at predicting data points for 1mis, 3mis and 6mis (mis = Months In Service).
We found that the forecast() function with some "fiddle" factors - sorry finely tuned polynomial assumptions - gave a reasonable prediction for our needs. We fed it steps of historical data to see the performance until it was suitable for what we needed.
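For anyone who wants to see what the basic calculation looks like outside Excel, here is a minimal sketch in Python. It assumes the plain straight-line fit that Excel's FORECAST(x, known_y's, known_x's) performs and does not reproduce the polynomial tuning mentioned above. The function name `linear_forecast` and the sample figures are made up for illustration.

```python
def linear_forecast(x, known_ys, known_xs):
    """Predict y at x from a least-squares straight line through the known points."""
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    if sxx == 0:
        raise ValueError("known_xs must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x


# Hypothetical values observed at 1, 2 and 3 months in service (not real data).
months = [1, 2, 3]
values = [10.0, 20.0, 30.0]
prediction_6mis = linear_forecast(6, values, months)
print(prediction_6mis)
```

Feeding it successively longer slices of the history and comparing each prediction with what actually happened is one way to repeat the "steps of historical data" check described above.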
I am required to do some summary statistics on the attached table as an example.
Some of the questions to answer include:
1) How many countries with valid time series (countries that have at least one value/number for a given indicator name over the time period of 2010-2015)
e.g: Count how many countries have valid times series for the indicator: "Number of completed applications"
2) For a given country and indicator what is the number of year(s) with valid time series.
e.g: For the indicator number of completed applications and the country Canada? (Answer: 2 --> 2014, 2015)
Alternatively, if the table looks like this instead (which is a typical CSV format), what approach could be taken to answer the two summary statistics questions above?
I have tried a SUMPRODUCT formula for the pivoted table. Is there a better way than this?
=SUMPRODUCT(N((B2:B14>0)+(C2:C14>0)+(D2:D14>0)+(E2:E14>0)+(F2:F14>0)+(G2:G14>0)+(H2:H14>0)+(I2:I14>0)+(J2:J14>0)>0))
But what about when it is a flat table?
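If stepping outside Excel is an option, the two questions can be sketched for a flat table in a few lines of Python. This is only an illustration: the column names (Country, Indicator, Year, Value), the sample rows, and the rule that a "valid" entry is any non-empty value between 2010 and 2015 are all assumptions, since the actual table is not shown here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat table; column names and all values are made up for illustration.
FLAT_CSV = """Country,Indicator,Year,Value
Canada,Number of completed applications,2013,
Canada,Number of completed applications,2014,120
Canada,Number of completed applications,2015,135
France,Number of completed applications,2012,80
Spain,Number of completed applications,2011,
Spain,Number of registered users,2011,40
"""


def valid_years(rows, first_year=2010, last_year=2015):
    """Map (indicator, country) to the sorted years that have a non-empty value."""
    years = defaultdict(set)
    for row in rows:
        year = int(row["Year"])
        if row["Value"].strip() and first_year <= year <= last_year:
            years[(row["Indicator"], row["Country"])].add(year)
    return {key: sorted(vals) for key, vals in years.items()}


rows = list(csv.DictReader(io.StringIO(FLAT_CSV)))
years_by_pair = valid_years(rows)

indicator = "Number of completed applications"
# Question 1: countries with at least one value for the indicator.
countries = sorted(c for (ind, c) in years_by_pair if ind == indicator)
# Question 2: years with a value for one country and one indicator.
canada_years = years_by_pair.get((indicator, "Canada"), [])

print(len(countries), countries)
print(len(canada_years), canada_years)
```

Grouping once by (indicator, country) answers both questions from the same dictionary, which avoids writing one formula per year column.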
So, an example of countifs() and also sumifs():
From Nevsky -- Thanks a lot for the example! I took the liberty to modify it a bit as follows :
I am averaging a number, grouped by week, inside Google Data Studio, and I am averaging the same numbers, grouped by week, inside BigQuery; however, the output is slightly different.
Overall Score
AVG(table.score) OVER (PARTITION BY Weeknum) as OverallScore
The datasource is a list of scores, along with a date. I am averaging this inside DS using the aggregate function within the metric, and using the Time dimension ISO Year Week.
The purpose of this is to have one set of numbers hard coded, whilst the other line is used to filter to different departments, keeping the original "overall" score present to be used as a benchmark.
Exporting my table into Excel, I can average it filtered by week 3 (see below) and it returns 19.59 as well. This means the AVG aggregate function inside Data Studio gives the same result as Excel. I can also query the table using the statement below, which rules out an averaging difference inside BigQuery. However, when I place the overall score into the graph below, I get slightly different numbers for the overall score.
SELECT avg(overallscore) FROM `dbo.table` where weeknum = '2018 3'
Output = 19.59
Does anyone have an idea what may be causing this?
When you open the report, you should be able to see the query it runs in your query history in BigQuery. Check that it's using the same formula, as it sometimes uses approximate aggregates.
I have a calculated field a/b which makes sense at week level, where a is a last-of-period metric and b is a sum-of-period metric. I need to find avg(a/b) for the weeks that fall within a month, and not end_of_month(a)/sum(b) for the month. I made my a/b metric with the regular aggregate set to calculated, and then a monthly average metric with the regular aggregate set to average, but it doesn't work. The report is a crosstab report. How can I solve this?
Edit: a is end on-hand inventory, b is sales, a/b is weeks of supply. Both a and b are spread across the product/location/time dimensions. For a, I've set its regular aggregate for time to last.
Your form of the expression gives me an error, so I tried average((total([a] for [week]))/(total([b] for [week]))), which is error-free but doesn't give the correct result. I used total[a] because it still has to sum along the other dimensions except time. Any ideas?
I was also trying an alternative way: get the individual weeks of supply and then derive a new metric as (first week wos + ... + fifth week wos)/5. But when I try to put in a case statement I get the warning "Relational query objects are being used in conjunction with Dimensionally-modeled relational objects", and the metric gives garbage values. How can I apply a case expression involving a relational item in the query items of a measure dimension?
You have to be clearer about what you are trying to achieve.
Also, the header says framework manager, while you are talking about report.
My best guess is that you need to use the FOR expression when you aggregate the values:
avg((max([a] for [week]))/(sum([b] for [week])))
You might need to use another (more sophisticated) summary function.
For more details about the FOR and AT options, look here:
Using the AT and FOR Options with Relational Summary Functions
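To make the difference between the two monthly figures in the question concrete, here is a small sketch in Python with invented weekly numbers for a single product and location. It only illustrates the arithmetic; it says nothing about how the report tool evaluates the expression, and the field names `on_hand` and `sales` are placeholders for a and b.

```python
# Hypothetical weekly figures for one product/location in one month (not real data).
# on_hand = end-of-week inventory (a), sales = units sold in the week (b).
weeks = [
    {"week": 1, "on_hand": 100.0, "sales": 50.0},
    {"week": 2, "on_hand": 80.0, "sales": 40.0},
    {"week": 3, "on_hand": 90.0, "sales": 30.0},
    {"week": 4, "on_hand": 60.0, "sales": 60.0},
]

# Wanted: weeks of supply per week, then averaged over the month, i.e. avg(a/b).
weekly_wos = [w["on_hand"] / w["sales"] for w in weeks]
avg_of_weekly_wos = sum(weekly_wos) / len(weekly_wos)

# Not wanted: what a month-level rollup computes, i.e. last(a) / sum(b).
month_level_wos = weeks[-1]["on_hand"] / sum(w["sales"] for w in weeks)

print(avg_of_weekly_wos, month_level_wos)
```

With these numbers the average of the weekly ratios is 2.0, while the month-level ratio is about 0.33, so the ratio has to be evaluated at week level before the average is taken.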
I have a data matrix depicting the number of telephone calls from one telephone to another; all calls are unidirectional. The data is not a sample: it is the full population. Rows are days of the week and columns are one-hour blocks of a 24-hour clock. Values in the cells represent the number of telephone calls from telephone A to telephone B for that specific hour.
I would like to have a repeatable measure that enables me to tell my audience that the likelihood of this distribution occurring randomly is <x.
I'd like the formula for Excel 2007 or, as a last resort, VBA code.
I've searched and found answers that tell me how to statistically determine the significance of differences between two different data sets but not how to measure for just one data set against a random outcome.
Thanks in advance.
If the total number of calls in a given hour is T, and the total calling population is P, then the number of calls from A to B should be about T/P if "random". To test whether this is really the case you'd use the Chi-squared test. I'm afraid I don't have time to give you the full answer, but the test value would be testvalue = sum((observed_i - T/P)^2 / (T/P)), that is, the sum over all cells of (observed - expected)^2 / expected. You check the test value against the chi-squared table for the appropriate degrees of freedom, and you can read off the probability too. Excel can calculate these values (Excel 2007 has CHIDIST and CHITEST for this). Refer to Chi-Squared Test for more details.
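As a rough illustration of the calculation, here is a minimal Python sketch of the standard Pearson chi-squared statistic, the sum of (observed - expected)^2 / expected. It assumes "random" means the calls are spread evenly across the cells, which is a simplification of the T/P expectation described above, and the counts are invented. The function name `chi_squared_statistic` is just a label for this example.

```python
def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical call counts for one day, grouped into four 6-hour blocks (not real data).
observed = [2, 10, 20, 8]
total = sum(observed)
# Assumption: under "random" calling, every block is equally likely.
expected = [total / len(observed)] * len(observed)

statistic = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

print(statistic, degrees_of_freedom)
```

For these made-up counts the statistic is 16.8 with 3 degrees of freedom, which is above the 5% critical value of 7.815 from the chi-squared table. In Excel 2007, CHIDIST(statistic, degrees_of_freedom) returns the corresponding probability.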