I have two sets of data.
Dataset 1 has a time and date associated with a dive depth (column B). These times are (mostly) at 90s intervals.
Dataset 2 has a time and date associated with a seafloor depth (column E), but these times and dates are much less frequent and rarely match the dive depth time exactly.
I need each dive depth to be associated with a seafloor depth based on the time it was recorded.
For example, in the table below, I would want all of the dive depths shown to be associated with the value from E2, as this is the closest reading in time, and I would like the result to be displayed in column C.
Both sets of data cover a time period of over a month and I have many thousands of rows worth of dive data.
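The nearest-in-time matching described above can be sketched outside Excel as follows. This is a minimal illustration and not a worksheet formula: the function name, the timestamps and the depth values are all made up, and it assumes the seafloor readings are sorted by time and that there is at least one of them.

```python
from bisect import bisect_left
from datetime import datetime

def nearest_seafloor_depth(dive_times, seafloor_times, seafloor_depths):
    """For each dive time, return the seafloor depth recorded closest in time.

    seafloor_times must be sorted ascending and aligned with seafloor_depths.
    """
    matched = []
    for t in dive_times:
        i = bisect_left(seafloor_times, t)
        # Only the reading just before t and the one at/after t can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(seafloor_times)]
        best = min(candidates, key=lambda j: abs(seafloor_times[j] - t))
        matched.append(seafloor_depths[best])
    return matched

# Hypothetical sample data (not taken from the question).
dive_times = [
    datetime(2020, 1, 1, 12, 0, 0),
    datetime(2020, 1, 1, 12, 1, 30),
    datetime(2020, 1, 1, 12, 45, 0),
]
seafloor_times = [datetime(2020, 1, 1, 12, 1, 0), datetime(2020, 1, 1, 13, 0, 0)]
seafloor_depths = [250.0, 310.0]

matched = nearest_seafloor_depth(dive_times, seafloor_times, seafloor_depths)
```

Because each lookup is a binary search, this stays fast with many thousands of dive rows. When a dive time falls exactly halfway between two readings, the earlier reading is used.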
(I use the term "teams" generically here because the entirety of this question rests on ranking, and it seemed to be the most intuitive language to describe my problem.)
In a league of 30 teams, each day only 8 teams play. The results for those teams are ranked ordinally from 1 to 8 for the day. This continues "forever", so that additional results must be recorded every day.
Example after 4 days:
I want to calculate a single number to describe the relationship between two teams. For instance, given the example, the value (in a 2d table) that describes the relationship of Ace to Get is 1. Ace beat Get twice and Get beat Ace once (2-1).
I have been experimenting with SUMPRODUCT, MATCH, and INDEX to get the values, which I could calculate using many extra tables, but I may need to add "teams" on the fly, and I do not know how large the pool of teams will become. Because of this, I was hoping to be able to use a single formula in the 2D relationship table. The results of that table, looking at just day 1 and day 2 of the previous example, are:
Is there a direct formula I can use to calculate the results to populate that table?
You can try the following formula:
=IF($A11<>B$10;
SUMPRODUCT(
IF(MMULT(($B$1:$I$1)*($B$2:$I$3=$A11);ROW($1:$8)^0)
<MMULT(($B$1:$I$1)*($B$2:$I$3=B$10);ROW($1:$8)^0);
1;
-1)
*(((MMULT(--($B$2:$I$3<>$A11);ROW($1:$8)^0)=8)
+(MMULT(--($B$2:$I$3<>B$10);ROW($1:$8)^0)=8))
=0));
"")
Copy right and down.
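For readers who want to check the logic away from the worksheet, the same head-to-head count can be sketched in Python. This is an illustration under assumptions, not a translation of the formula. The function name is made up, every team name other than Ace and Get is invented, and the sample days list only three teams each to keep it short.

```python
def head_to_head(days, team_a, team_b):
    """Net record of team_a against team_b.

    days is a list of daily results, each a list of team names ordered
    from best rank to worst. A day counts only if both teams played:
    +1 if team_a ranked better than team_b, -1 otherwise.
    Returns None for a team against itself (the formula shows "").
    """
    if team_a == team_b:
        return None
    net = 0
    for ranking in days:
        if team_a in ranking and team_b in ranking:
            net += 1 if ranking.index(team_a) < ranking.index(team_b) else -1
    return net

# Hypothetical results: Ace beats Get twice, Get beats Ace once.
days = [
    ["Ace", "Get", "Bat"],
    ["Get", "Cab", "Ace"],
    ["Ace", "Dog", "Get"],
    ["Bat", "Cab", "Dog"],
]

ace_vs_get = head_to_head(days, "Ace", "Get")
```

Here ace_vs_get is 1, matching the 2-1 record described in the question, and the reverse lookup gives -1.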
I have an excel dataset of 24-hour moving averages for PM10 air pollution concentration levels, and need to obtain the individual hourly readings from them. The moving average data is updated every hour, so at hour t, the reading is the average of the 24 readings from t-23 to t hours, and at hour t+1, the reading is the average of t-22 to t+1, etc. I do not have any known data points to extrapolate from, just the 24-hour moving averages.
Is there any way I can obtain the individual hourly readings for time t, t+1, etc, from the moving average?
The dataset contains data over 3 years, so with 24 readings a day (at every hour), the dataset has thousands of readings.
I have tried searching for a simple Excel VBA implementation of this, but have come up empty. Most of the posts I have seen on Stack Overflow, Stack Exchange, or other forums involve calculating moving averages from discrete data, which is the reverse of what I need to do here.
The few I have seen involve using matrices, which I am not very sure how to implement.
(https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average)
(https://stats.stackexchange.com/questions/112502/estimating-original-series-from-their-moving-average)
Any suggestions would be greatly appreciated!
Short answer: you can't.
Consider a moving average on 3 points. To simplify, suppose we multiply each MA term by 3, so that we really have sums of consecutive values:
Data:  a  b  c  d  e  f  g
MA:    a+b+c
       b+c+d
       c+d+e
       d+e+f
       e+f+g
With initial values, you can do something. To find the value of d, you would need to know b+c, hence to know a (since a+b+c is known). Then to find e, you know c+d+e and d, so you must find c, and since a is already needed, you will also need b.
More generally, for an MA of length n, if you know the first n-1 values (hence also the nth, since you know the sum), then you can find all subsequent values. You can also start from the end. But basically, if you don't have enough original data, you are lost: for a given MA series, there is a one-to-one relation between the first n-1 values of your data and the possible original series. If you don't have enough information, there are infinitely many possibilities, and you can't decide which one is right.
Here I consider the simplest MA where the coefficient of each variable is 1/n (hence you compute the sum and divide by n). But this would apply to any MA, with slightly more complexity to account for different coefficients for each term in the sum.
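The recurrence described above can be sketched in a few lines of Python. This is only an illustration: the function name and the numbers are made up, and it assumes an equal-weight MA whose first value averages the first n data points. Each new value is n times the current average minus the sum of the previous n-1 values.

```python
def reconstruct_from_moving_average(ma, first_values):
    """Rebuild a series from its equal-weight moving averages.

    ma[k] is assumed to be the mean of x[k], ..., x[k + n - 1], and
    first_values holds the first n - 1 data points, so n = len(first_values) + 1.
    """
    n = len(first_values) + 1
    x = list(first_values)
    for avg in ma:
        # Sum of the n - 1 most recent values (zero when the window is 1 wide).
        previous_sum = sum(x[-(n - 1):]) if n > 1 else 0
        x.append(n * avg - previous_sum)
    return x

# Hypothetical 3-point moving averages of the series 1, 2, ..., 7.
ma = [2, 3, 4, 5, 6]

true_series = reconstruct_from_moving_average(ma, [1, 2])
other_series = reconstruct_from_moving_average(ma, [0, 0])
```

Both results reproduce exactly the same moving averages: the first is 1 to 7, and the second is 0, 0, 6, 3, 3, 9, 6. This is why the averages alone cannot tell you which original series is the right one.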
I am new to Spotfire and need help in getting the right expression for a calculated column.
My Data contains different subjects grouped in column ID. For every ID, Bodyweight was measured on different days. Days are given in column Day and stated as 1,2,3...
The last day is denoted by Last and Bodyweight measurements given in another column. Another column is present which is called Baseline. The Body weight measured is considered as baseline if the column contains a Y for that row.
I need to insert a calculated column containing the difference between the body weight measured on the day denoted Last and the body weight marked with Y in the Baseline column.
This should be done for every new ID. I am not able to figure this out. Could someone advise me on how to go about it?
Here is an example attached
So, the calculated column for Rita will give -4 (body weight at Last = 52 and body weight at baseline = 56, so 52 - 56 = -4).
the sample data you provided is a little weird, particularly the [Day] column. if it's within your control, I suggest using actual dates rather than a number/string here.
barring that, I was able to get your desired results, but it required two calculated columns: the first one will consolidate the [Day] and [Baseline] columns into a single column, and the second one contains your desired info.
column 1, which I called Day (int):
CASE
WHEN [Day]="Last" THEN 1000000
WHEN [Baseline]="Y" THEN -1000000
WHEN [Day]!="Last" THEN Integer([Day])
END
I picked arbitrary high and low values to establish a chronological order. this will put 1000000 in place of "Last" (if you have any programs that are longer than one million days, you'll need to increase this number). the same goes for the [Baseline] column, but that value will be -1000000, which is presumably lower than any value you will ever see in this column. both of these are assumptions and may not work for your implementation. finally, in all other cases, the day number will be used.
column 2, which I called Diff:
Last([Weight]) OVER (Intersect([Name],LastNode([Day (int)]))) -
First([Weight]) OVER (Intersect([Name],FirstNode([Day (int)])))
the first line uses what's called an OVER expression to retrieve the last value for [Weight], ordered by [Day (int)], per [Name]. the second line retrieves the first value in the same way (the baseline), and so the difference is calculated as -4 (or whatever is appropriate).
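to sanity-check the numbers outside Spotfire, the same "weight at Last minus weight at baseline" calculation can be sketched in Python. this is an illustration only and not Spotfire syntax: the function name and row layout are made up, and the middle weight for Rita is invented. only the 56 and 52 come from the question.

```python
def last_minus_baseline(rows):
    """Return {name: weight at Day "Last" minus weight at the baseline row}.

    rows is an iterable of (name, day, baseline_flag, weight) tuples.
    Subjects missing either a "Last" row or a "Y" baseline row are skipped.
    """
    baseline, last = {}, {}
    for name, day, baseline_flag, weight in rows:
        if baseline_flag == "Y":
            baseline[name] = weight
        if day == "Last":
            last[name] = weight
    return {name: last[name] - baseline[name] for name in last if name in baseline}

# Hypothetical rows; only Rita's baseline (56) and Last (52) weights are from the question.
rows = [
    ("Rita", "1", "Y", 56),
    ("Rita", "2", "", 54),
    ("Rita", "Last", "", 52),
]

diffs = last_minus_baseline(rows)
```

for Rita this gives -4, the value expected in the question.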
I'm analyzing a set of clinical data in Excel that includes
the level of care that a variety of individuals have received
the date of the receipt of that care
the date that the care ended
The question I'm trying to answer in a report is how many individuals who received a higher level of care were connected to a lower level of care within 14 days.
I've organized the data in such a way that by creating a pivot table, it organizes the data nicely. However, in this dataset individuals have had multiple instances of each level of care that may or may not be connected within 14 days to a lower level of care.
Granted, the dataset is small enough to count this out by hand, but I foresee having to do this many times in the future with possibly much larger datasets.
As such, I'm wondering if there's a way to automate this process. I can almost conceptualize a nested if statement to flag the instances prior to developing the pivot table, and then count these flags as follows:
As everything is already sorted two ways, by individual and then by date, I might be able to use something like if(levelofcare <> levelofcare of the cell above it, if(date of admission - date of discharge of the cell above it <= 14, 1, 0), 0), and then generate the pivot table and sum that column.
However, I feel this would be rather inaccurate considering the data and that sometimes, the "level of care" field isn't a standardized string.
I would add columns to the data that calculate what you need to show in the pivot table.
Probably a column that shows individuals that received a higher level of care (an if statement that gives a yes or no answer). A column that shows if an individual received a lower level within 14 days if the previous column is yes.
These can then be added to the pivot.
Hypothetical answer as no data was provided in the question - happy to edit if data provided.
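The flagging logic could look something like the Python sketch below. It is a rough illustration under several assumptions: the level names, their ranking and the dates are invented, the rows are assumed to be sorted by individual and then by admission date, and every level string is assumed to have been standardized well enough to appear in the ranking table.

```python
from datetime import date

def count_step_downs(stays, level_rank, max_gap_days=14):
    """Count individuals who moved to a lower level of care within max_gap_days.

    stays: list of (person, level, admitted, discharged), sorted by person
    and then by admission date. level_rank maps each level name to a
    number, where a higher number means a higher level of care.
    """
    people = set()
    for prev, cur in zip(stays, stays[1:]):
        same_person = prev[0] == cur[0]
        stepped_down = level_rank[cur[1]] < level_rank[prev[1]]
        gap_days = (cur[2] - prev[3]).days  # admission minus previous discharge
        if same_person and stepped_down and 0 <= gap_days <= max_gap_days:
            people.add(cur[0])
    return len(people)

# Hypothetical levels and stays (not from the question).
level_rank = {"outpatient": 1, "inpatient": 2}
stays = [
    ("P1", "inpatient", date(2020, 1, 1), date(2020, 1, 10)),
    ("P1", "outpatient", date(2020, 1, 15), date(2020, 2, 1)),
    ("P2", "inpatient", date(2020, 1, 1), date(2020, 1, 5)),
    ("P2", "outpatient", date(2020, 2, 1), date(2020, 2, 10)),
]

connected = count_step_downs(stays, level_rank)
```

Here only P1 is counted, because P2's lower-level stay began 27 days after discharge. Mapping the free-text level names to ranks first also addresses the worry about non-standardized strings.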
I have a data matrix depicting the number of telephone calls from one telephone to another; all calls are unidirectional. The data is not a sample - it is the full population. Rows are days of the week and columns are one-hour blocks of a 24-hour clock. Values in the cells represent the number of telephone calls from telephone A to telephone B for that specific hour.
I would like to have a repeatable measure that enables me to tell my audience that the likelihood of this distribution occurring randomly is <x.
I'd like the formula for Excel 2007 or, as a last resort, VBA code.
I've searched and found answers that tell me how to statistically determine the significance of differences between two different data sets but not how to measure for just one data set against a random outcome.
Thanks in advance.
If the total number of calls in a given hour is T and the total calling population is P, then the number of calls from A to B should be about T/P if the calls are "random". To test whether this is really the case, you'd use the chi-squared test. I'm afraid I don't have time to give you the full answer, but the test value would be testvalue = sum((observed_i - T_i/P)^2 / (T_i/P)), where observed_i is the number of calls from A to B in hour i and T_i is the total number of calls in that hour. You then check the test value against the chi-squared table, and you can read off the probability too. Excel can calculate these values. Refer to Chi-Squared Test for more details.
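As a rough illustration of the arithmetic, the standard Pearson chi-squared statistic sums (observed - expected)^2 / expected over all cells. The Python sketch below uses a made-up function name and made-up counts. How the expected counts are obtained (for example T/P per hour) is left to the caller.

```python
def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells.

    observed and expected are equal-length sequences of counts, and
    every expected count must be greater than zero.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three hourly blocks (not from the question).
observed = [10, 20, 30]
expected = [20, 20, 20]

statistic = chi_squared_statistic(observed, expected)
```

With these numbers the statistic is 10.0. In Excel 2007, CHIDIST(x, deg_freedom) should convert such a statistic into a probability, and CHITEST(actual_range, expected_range) should give the probability directly from the two ranges. The right degrees of freedom depend on how the expected counts were derived.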