How can I obtain hourly readings from 24 hour moving average data? - excel

I have an Excel dataset of 24-hour moving averages for PM10 air pollution concentration levels, and need to obtain the individual hourly readings from them. The moving average data is updated every hour, so at hour t, the reading is the average of the 24 readings from hour t-23 to hour t, and at hour t+1, the reading is the average of the readings from t-22 to t+1, etc. I do not have any known data points to extrapolate from, just the 24-hour moving averages.
Is there any way I can obtain the individual hourly readings for time t, t+1, etc, from the moving average?
The dataset contains data over 3 years, so with 24 readings a day (at every hour), the dataset has thousands of readings.
I have tried searching for a way to do this with simple Excel VBA code, but have come up empty. Most of the posts I have seen on Stack Overflow, Stack Exchange, or other forums involve calculating moving averages from discrete data, which is the reverse of what I need to do here.
The few relevant ones I have seen involve matrices, which I am not sure how to implement.
(https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average)
(https://stats.stackexchange.com/questions/112502/estimating-original-series-from-their-moving-average)
Any suggestions would be greatly appreciated!

Short answer: you can't.
Consider a moving average over 3 points, and multiply each MA term by 3, so that we really have sums of consecutive values:
Data: a b c d e f g
MA (times 3):
a+b+c
b+c+d
c+d+e
d+e+f
e+f+g
With initial values, you can do something. To find the value of d, you would need to know b+c, hence to know a (since a+b+c is known). Then to find e, you know c+d+e and d, so you must find c, and since a is already needed, you will also need b.
More generally, for a MA of length n, if you know the first n-1 values (hence also the nth, since you know the sum), then you can find all subsequent values. You can also start from the end and work backwards. But if you don't have enough original data, you are stuck: for a given MA series, each choice of the first n-1 values gives exactly one data series that reproduces it. Without that information there are infinitely many possibilities, and you can't decide which one is right.
Here I consider the simplest MA where the coefficient of each variable is 1/n (hence you compute the sum and divide by n). But this would apply to any MA, with slightly more complexity to account for different coefficients for each term in the sum.
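To make the argument concrete, here is a minimal Python sketch. It is not part of the original answer: the function names `moving_average` and `reconstruct` and the sample numbers are made up for illustration. It assumes the simple equal-weight MA and that the first n-1 readings are somehow known, and then shows that a different guess for those readings reproduces exactly the same averages.

```python
def moving_average(x, n):
    """Trailing n-point moving average, one value per full window."""
    return [sum(x[i:i + n]) / n for i in range(len(x) - n + 1)]


def reconstruct(ma, first_values, n):
    """Rebuild a series from its moving average, given the first n-1 readings.

    `first_values` is an assumption: the moving average alone does not fix it.
    """
    x = list(first_values)
    # The first window's sum determines the n-th reading.
    x.append(n * ma[0] - sum(x))
    # Consecutive averages differ by (newest reading - dropped reading) / n.
    for k in range(1, len(ma)):
        x.append(x[k - 1] + n * (ma[k] - ma[k - 1]))
    return x


# Hypothetical data with a 3-point window, as in the a..g example above.
original = [3, 1, 4, 1, 5, 9, 2]
ma = moving_average(original, 3)

recovered = reconstruct(ma, original[:2], 3)  # true first two readings
other = reconstruct(ma, [0, 0], 3)            # a wrong guess for them

print(recovered)
print(other)
print(moving_average(other, 3))
```

Both `recovered` and `other` have the same moving average, which is why the averages alone cannot tell them apart. For the 24-hour case, the same logic would need 23 consecutive hourly readings as a starting point.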

Related

Calculating number of hours spent per product, with diminishing effort

I want to calculate the number of work hours it takes to produce X. The first X takes 20 hours, but each subsequent X takes 20% less time than the one before it. However, it will always take a minimum of 2 hours.
Any help is appreciated.
In Excel, this is really easy: 20% less means you are calculating 80% of the value, which means multiplying the value by 0.8.
As the value can't go below 2, you can simply take the maximum of the calculated value and 2. For example, with 20 in A1, enter this formula in A2 and fill it down:
=MAX(2,0.8*A1)
Have fun!
In order to calculate the running total, you can use the simple formula =SUM(A$1:A2) and fill it down to the end.
An individual term of a geometric series is given by
a_n = a*r^(n-1)
The sum of the first n terms of a geometric series is given by
S_n = a*(1-r^n)/(1-r)
where in your case a=20 and r=0.8
You can show by taking logs or by trial and error that in your particular case you have
0.8^10 = 0.107374
so when n=11 the time has diminished to 20*0.107374 = 2.147 hours, just over 2 hours. After that each rep takes 2 hours. So you have
=a*(1-r_^MIN(C2,11))/(1-r_)+MAX(0,C2-11)*2
for the total.
If you just want the time per item, it's
=IF(C2<=11,a*r_^(C2-1),2)
where a and r_ are named ranges for a and r, and the values of N are in column C.
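As a cross-check of the two formulas above, here is a small Python sketch (the function names are made up for illustration). It compares a plain item-by-item loop with the closed form. Note that the cutoff of 11 is specific to a=20, r=0.8 and a 2-hour minimum, and would need recomputing for other values.

```python
def time_for_item(n, a=20.0, r=0.8, floor=2.0):
    """Hours for the n-th item (n starts at 1), never below `floor`."""
    return max(floor, a * r ** (n - 1))


def total_time_loop(n, a=20.0, r=0.8, floor=2.0):
    """Total hours for n items, added up item by item."""
    return sum(time_for_item(k, a, r, floor) for k in range(1, n + 1))


def total_time_closed(n, a=20.0, r=0.8, floor=2.0, cutoff=11):
    """Closed form mirroring the spreadsheet formula.

    `cutoff` is the last item that still takes more than `floor` hours;
    11 is only valid for a=20, r=0.8, floor=2.
    """
    m = min(n, cutoff)
    return a * (1 - r ** m) / (1 - r) + max(0, n - cutoff) * floor


for n in (1, 5, 11, 12, 30):
    print(n, time_for_item(n), total_time_loop(n), total_time_closed(n))
```

The loop and the closed form should agree for every n, which is a quick way to confirm that the cutoff was chosen correctly.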

Fast token overlap between strings

I have two sets of tokenised sentences A and B and I want to calculate the overlap between them in terms of common tokens. For example, the overlap between two individual sentences a1 "today is a good day" and b1 "today I went to a park" is 2 ("today" and "a"). I need a simple string matching method, without fuzzy or advanced methods. So the result is a matrix between all sentences in A and B with an overlap count for each pair.
The problem is that, while trivial, it is a quadratic operation (size of A x size of B pair-wise comparisons). With large data, the computation gets very slow very quickly. What would be a smart way of computing this avoiding pair-wise comparisons or doing them very fast? Are there packages/data structures particularly good for this?
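One possible approach, sketched here in Python as an illustration rather than a definitive answer, is an inverted index: map each token to the sentences of B that contain it, then for each sentence in A only touch the B sentences that share at least one token. The function name `overlap_counts` is made up, matching is assumed to be exact and case-sensitive, and repeated tokens within a sentence are assumed to count once.

```python
from collections import defaultdict


def overlap_counts(A, B):
    """Return {(i, j): number of distinct tokens shared by A[i] and B[j]}.

    Pairs with no shared token are simply absent (their overlap is 0).
    """
    # Inverted index: token -> indices of the B sentences containing it.
    index = defaultdict(list)
    for j, sentence in enumerate(B):
        for token in set(sentence):
            index[token].append(j)

    counts = defaultdict(int)
    for i, sentence in enumerate(A):
        for token in set(sentence):
            for j in index.get(token, ()):
                counts[(i, j)] += 1
    return dict(counts)


# The example from the question, plus one sentence with no common tokens.
A = [["today", "is", "a", "good", "day"]]
B = [["today", "I", "went", "to", "a", "park"], ["no", "match", "here"]]
counts = overlap_counts(A, B)
print(counts)
```

This avoids work for pairs that share nothing, but if a few tokens appear in almost every sentence the cost is still close to quadratic.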

Aligning columns and associated data based on similar times/dates in excel

I have two sets of data.
Dataset 1 has a time and date associated with a dive depth (column B). These times are (mostly) at 90s intervals.
Dataset 2 has a time and date associated with a seafloor depth (column E), but these times and dates are much less frequent and rarely match the dive depth time exactly.
I need each dive depth to be associated with a seafloor depth based on the time it was recorded.
For example, in the table below, I would want all of the dive depths seen here to be associated with the value from E2, as this is the closest data point in time, and I would like this to be displayed in column C.
Both sets of data cover a time period of over a month and I have many thousands of rows worth of dive data.
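To pin down what "closest in time" means, here is a minimal Python sketch of the matching described above. It only illustrates the logic and is not an Excel solution. All timestamps and depths are invented, the function name `nearest_value` is hypothetical, the seafloor times are assumed to be sorted, and ties are assumed to go to the earlier record.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_value(times, values, t):
    """Return the value whose timestamp in the sorted list `times` is closest to t."""
    pos = bisect_left(times, t)
    if pos == 0:
        return values[0]
    if pos == len(times):
        return values[-1]
    before, after = times[pos - 1], times[pos]
    # Ties go to the earlier record (an arbitrary choice).
    return values[pos] if after - t < t - before else values[pos - 1]


# Hypothetical seafloor records (sparse) and dive records (every 90 s or so).
seafloor_times = [datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 11, 0)]
seafloor_depths = [250.0, 310.0]
dive_times = [
    datetime(2020, 1, 1, 10, 1, 30),
    datetime(2020, 1, 1, 10, 3, 0),
    datetime(2020, 1, 1, 10, 45, 0),
]

matched = [nearest_value(seafloor_times, seafloor_depths, t) for t in dive_times]
print(matched)
```

Because each lookup is a binary search, this scales comfortably to many thousands of dive rows.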

Comparing count data with lots of zeroes

I'm not one to search for the most tenuous significant difference I can find, but hear me out.
I have some count data with four groups (3 of these can be combined to one, if necessary), groups A, B, C, and X.
Looking at the means and interval plots, X is clearly greater than the others (in terms of mean value), yet I cannot find any statistical test to back this up. This is, I believe, somewhat due to a high variability within groups and the large number of zero values.
I have tried normalizing the data, removing zeroes, parametric and non-parametric tests, and more, with no success!
Any advice would be greatly appreciated as to how to approach this.
Many thanks.
The link below has the raw data. Groups A, B, and C can be combined into one group if it is relevant.
https://drive.google.com/open?id=0B6iQ6-J6e2TeU25Rd2hsd0Uxd2c

Excel Finding average speed

I have got 1500 rows of travels. In column A I have the total time of each travel, and in column B the total km driven. In column C I calculated the average speed of each travel. What's the best way to calculate the average speed of all travels? The lengths are from 0 to 20 km approx., and the time is always shorter than one hour.
First I eliminated all travels shorter than 2 km. Then I made a frequency table with the frequencies of speeds in 0-5, 5-10, ... km/h. Now I can do a histogram, but should I eliminate more data, or how should I approach this problem?
In another cell enter:
=SUM(B:B)/SUM(A:A)
A common error would be to try to average the values in column C.
It depends on your data. If it is for statistics, don't throw data away; use it.
You have column A for travel time and column B for travel distance. Using these two columns you can find the overall average speed as Gary's student suggests, i.e. SUM(B:B)/SUM(A:A).
You also have column C, the average speed for each travel, which you can use to cross-check. Simply compute SUMPRODUCT(A:A,C:C); the result should equal SUM(B:B). If the results match, then I'd say "OK, I'm satisfied with my calculation".
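The difference between the two ways of averaging, and the SUMPRODUCT cross-check, can be seen in a small Python sketch. The three trips below are invented, and the time column is assumed to be in hours.

```python
# Hypothetical trips: column A (time in hours) and column B (distance in km).
times_h = [0.5, 0.25, 0.125]
dist_km = [20.0, 5.0, 5.0]

# Column C: average speed of each individual trip, in km/h.
speeds = [d / t for d, t in zip(dist_km, times_h)]

# Overall average speed, equivalent to =SUM(B:B)/SUM(A:A).
overall = sum(dist_km) / sum(times_h)

# The common error: a plain average of column C ignores trip duration.
naive = sum(speeds) / len(speeds)

# Cross-check, equivalent to =SUMPRODUCT(A:A,C:C); should equal SUM(B:B).
check = sum(t * s for t, s in zip(times_h, speeds))

print(overall, naive, check, sum(dist_km))
```

The plain average of column C weights a short trip as heavily as a long one, which is why it generally differs from total distance divided by total time.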

Resources