Is there a way to operate and do calculations with the numbers between data gaps? - excel

I would like to do certain operations between groups of data in the same column. I have imported data from a device into Excel. The data give the position of an object on a screen at each point in time, but some fragments are missing due to instrument noise, among other causes.
Is there a way to operate and do calculations with the numbers between these data gaps?
For example:
Cell  Time    Position
----  ------  --------
1     0.05 s  12.3 mm
2     0.10 s  12 mm
3     0.15 s  13 mm
4     0.20 s  13.33 mm
5     0.28 s  0 mm      (data missing)
6     0.30 s  0 mm
7     0.36 s  14.21 mm
So, in the next column I want the final position minus the initial position (the overall object movement) for each group of data. I want Excel to find the 0 and subtract the first cell of the group from the cell above the 0. In the example that would be 13.33 - 12.30, and I want to do the same for the next set of data, starting at 14.21 mm, until it reaches another 0.
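To make the requested calculation concrete, here is a minimal sketch of it outside Excel, in Python. It assumes, as in the question, that a position of 0 marks a missing reading; a genuine position of 0 would also be treated as a gap. The function name `movement_per_group` is made up for this illustration.

```python
from itertools import groupby


def movement_per_group(positions):
    """Return (last - first) for every run of consecutive non-zero readings."""
    movements = []
    # groupby splits the list into alternating runs of data and gaps (zeros).
    for is_data, run in groupby(positions, key=lambda p: p != 0):
        if is_data:
            values = list(run)
            movements.append(values[-1] - values[0])
    return movements


# Positions in mm, taken from the example table in the question.
positions = [12.3, 12, 13, 13.33, 0, 0, 14.21]
movements = movement_per_group(positions)
print(movements)
```

The first group gives 13.33 - 12.3, and the second group contains a single reading, so its movement is zero.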

Related

How to count items in Excel within a date range or without an end date

Note: date formats are DD/MM/YYYY
I have a list of records, each with one column for a start date, and one for an end date.
Every record has a start date, but if an item is current it has no end date and the end date cell is blank.
I want to write a/some formulas to determine how many records were a given age at a given date, rounded down to the nearest whole year.
So for example, how many records were 0-1 years old at the date at (cell reference R1), and then how many were 1-2 years, 2-3 years etc.
I want this to be reusable so that I can update the date at R1 each month and it recalculates automatically. This is easy enough for R1=TODAY, as I can assume all end dates are in the past, but for R1=EDATE(TODAY,-12) it becomes trickier.
As an example, in the yellow highlighted cell I want to calculate how many records were between 1&2 years old as of 30/06/21 (S1), AND were current at the time (i.e. exclude from the count any records that have an end date before 30/06/21).
The blue highlighted area is my data, the green area is what I'm trying to calculate. I don't mind adding an extra data column or two if it assists in the calculation, but I don't want to have to add an extra column for every year that I'm trying to calculate, if it can be avoided.
Start Date   End Date     Years (as of 30/06/2022)   Age   30/06/2022
----------   ----------   ------------------------   ---   ----------
20/09/2021                0.77                       13    0
7/09/2020    4/12/2020    0.24                       12    0
6/08/2019                 2.90                       11    0
17/02/2020                2.37                       10    0
1/04/2019                 3.25                       9     0
16/03/2020   18/11/2020   0.68                       8     0
17/08/2021   19/11/2021   0.26                       7     0
23/08/2022                -0.15                      6     0
16/11/2020   1/04/2022    1.37                       5     0
20/04/2020   21/10/2021   1.50                       4     0
7/05/2019    26/02/2021   1.81                       3     2
29/06/2020   7/01/2021    0.53                       2     5
16/08/2021   20/04/2022   0.68                       1     5
                                                     0     13
Further columns (headers only, still empty): 30/06/2021, 30/06/2020, 30/06/2019, 30/06/2018, 30/06/2017, 30/06/2016, 30/06/2015, 30/06/2014, 30/06/2013
I converted the data to a table (Insert > Table) and named it tblData.
=LET(calculatedAge,MAP(tblData[Start Date],tblData[End Date],
LAMBDA(startdate,enddate,
ROUNDDOWN((MIN(IF(ISBLANK(enddate),E$1,enddate),E$1)-startdate)/365,0))
),
filteredAges,FILTER(calculatedAge,calculatedAge=$D2),
IFERROR(ROWS(filteredAges),0))
The MAP-function returns the calculated age per target date (E$1) - rounded down.
It simulates a helper column - that then can be filtered.
Thanks to @Robert Mearns for the FILTER hack, as COUNTIF doesn't work in this scenario (see "Using LET with COUNTIF and Array, e.g. MAP").
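For readers who want to check the logic of the formula outside Excel, here is a rough Python sketch of the same steps: cap the end date at the reference date, divide the day difference by 365, truncate, then count the matches. The function names and the three sample records are chosen for this illustration only. Like the formula, it does not exclude records that ended before the reference date.

```python
from datetime import date


def age_in_years(start, end, ref):
    """Whole years between start and min(end, ref); a blank end counts as ref."""
    effective_end = ref if end is None else min(end, ref)
    # int() truncates toward zero, like ROUNDDOWN(..., 0) in the formula.
    return int((effective_end - start).days / 365)


def count_by_age(records, ref, target_age):
    """Count the records whose truncated age at ref equals target_age."""
    return sum(1 for start, end in records
               if age_in_years(start, end, ref) == target_age)


# (start date, end date or None) pairs, taken from the first three data rows.
records = [
    (date(2021, 9, 20), None),
    (date(2020, 9, 7), date(2020, 12, 4)),
    (date(2019, 8, 6), None),
]
ref = date(2022, 6, 30)
print(count_by_age(records, ref, 0), count_by_age(records, ref, 2))
```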

Function for finding duration where wave height <3m where time is between 5:00am and 6:00PM

I am trying to find the total time during which the wave height is under 3 m and the time of day is between 5:00 am and 6:00 pm, over a month of tidal data.
I have raw data for the wave height, with timestamps for each high and low.
eg.
Timestamp Wave_Height
1/01/2022 3:16 0.68
1/01/2022 9:37 6.62
1/01/2022 16:14 1.07
1/01/2022 21:54 5.37
2/01/2022 4:06 0.59
etc…
So far I have used linear interpolation to find the points where the wave height = 3. I am struggling to write a function that finds the durations within my time limits.
[Figure: graph of wave data over time]
The timestamps span different days in the month, so the difference between times must account for the change of date in some cases (see the "rev 2" errors, shown as #######, which occur where the date changes).
[Figure: rev 2 error]
The following should work. I have added some helper columns to avoid complicated formulas.
Interpolate the time at which wave_height = 3 (column G).
Add column H, which is TRUE when wave_height increases and FALSE when it decreases (at the time in column G):
so cell H6 = F7<3 gives TRUE
Add column E to limit the time window to 5:00-18:00.
E7 is =IF(D7<$G$2;$G$2;IF(D7>$H$2;$H$2;D7))
Add column I to calculate the time during which wave_height < 3. The sum of that column is what you need.
I8 is =H8*(G8-E7)+NOT(H8)*(D8-G8)
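As a cross-check on the spreadsheet columns, the same idea (interpolate the crossings of 3 m, then clip to the 5:00-18:00 window) can be sketched in Python. This is an independent sketch, not a translation of the columns above. It assumes the height varies linearly between consecutive readings; all function names and the sample points are invented for the illustration.

```python
from datetime import datetime, time, timedelta


def below_interval(t0, h0, t1, h1, limit):
    """Part of the segment (t0, t1) where the interpolated height is below limit."""
    if h0 >= limit and h1 >= limit:
        return None
    if h0 < limit and h1 < limit:
        return (t0, t1)
    # Exactly one end is below the limit: interpolate the crossing time.
    cross = t0 + (t1 - t0) * ((limit - h0) / (h1 - h0))
    return (t0, cross) if h0 < limit else (cross, t1)


def window_overlap(start, end, open_t=time(5), close_t=time(18)):
    """Time within (start, end) that falls inside the daily window, across dates."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        lo = max(start, datetime.combine(day, open_t))
        hi = min(end, datetime.combine(day, close_t))
        if hi > lo:
            total += hi - lo
        day += timedelta(days=1)
    return total


def duration_below(points, limit=3.0):
    """Total time below limit inside the daily window for sorted (time, height) points."""
    total = timedelta(0)
    for (t0, h0), (t1, h1) in zip(points, points[1:]):
        interval = below_interval(t0, h0, t1, h1, limit)
        if interval is not None:
            total += window_overlap(*interval)
    return total


# Made-up triangular wave: below 3 m before 06:00 and after 18:00.
points = [
    (datetime(2022, 1, 1, 0, 0), 0.0),
    (datetime(2022, 1, 1, 12, 0), 6.0),
    (datetime(2022, 1, 2, 0, 0), 0.0),
]
total = duration_below(points)
print(total)
```

Because the window is rebuilt for every calendar date that an interval touches, intervals that run past midnight need no special handling.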

Filtering discrepancies in duplicate measurements

I have a dataset with the following problem.
Sometimes, a temperature sensor would return duplicate readings at the exact same minute, where sometimes 1 of 2 of the duplicates is "reasonable" and the other is slightly off.
For example:
TEMP TIME
1 24.5 4/1/18 2:00
2 24.7 4/1/18 2:00
3 24.6 4/1/18 2:05
4 28.3 4/1/18 2:05
5 24.3 4/1/18 2:10
6 24.5 4/1/18 2:10
7 26.5 4/1/18 2:15
8 24.4 4/1/18 2:15
9 24.7 4/1/18 2:20
10 22.0 4/1/18 2:20
Lines 4, 7 & 10 are readings that should be removed, as they are too high or too low (it doesn't make sense for the temperature to rise and drop by more than a degree within 5 minutes in a relatively stable environment).
The end goal with this dataset is to "average" the similar values (such as lines 1 & 2) and to remove the lines that are too extreme (such as lines 4 & 7) from the dataset entirely.
Currently my idea to formulate this is to look at a previously obtained row, and if one of the 2 duplicates is +/- 0.5 degree, to mark in a 3rd column with TRUE so I can filter out all the TRUE values in the end. I'm not sure how to communicate within the if statement that I'm looking for a + OR - 0.5 of a previous number however. Does anyone know?
Here is a google sheet example that does what you want:
https://docs.google.com/spreadsheets/d/1Va9RjSeulOfVTd-0b4EM4azbUkYUb22jXNc_EcafUO8/edit?usp=sharing
What I did:
Calculate a column of a 3-item running average of the data using "=AVERAGE(B3:B1)"
Filter the list using "=IF(ABS(B2-C2) < 1, B2, )"
Calculate the average of the filtered list
The use of Absolute Value is what provides "+ OR -" that you were looking for. It is saying if the distance between two numbers is too much, then don't include the term.
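The running-average filter described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the window is the previous, current and next reading (shorter at the ends of the list), and the function name, tolerance and sample values are invented for the example.

```python
def filter_by_running_average(values, tolerance=1.0):
    """Keep the values that lie within tolerance of their 3-item running average."""
    kept = []
    for i, value in enumerate(values):
        # Previous, current and next reading; the window shrinks at the list ends.
        window = values[max(0, i - 1): i + 2]
        average = sum(window) / len(window)
        # abs() covers both the "+" and the "-" direction in a single test.
        if abs(value - average) < tolerance:
            kept.append(value)
    return kept


# Made-up readings with a single spike in the middle.
readings = [10.0, 10.0, 10.0, 13.0, 10.0, 10.0, 10.0]
kept = filter_by_running_average(readings, tolerance=1.5)
print(kept)
```

Note that a spike also pulls up the running average of its neighbours, so a tolerance that is too tight can remove good readings next to an outlier.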
A simple solution came to mind. Follow these steps:
Convert the data to a table
Add a 4th column at the end
Enter the formula "Current Value - Previous Value"
Filter the column for high difference values
Delete the filtered rows and you'll be left with the normal values
Or, if you only want to compare readings taken at the same time, do the following:
Convert your data to a table
Add a 4th column at the end of the table
Write the following formula in the 4th column
IF(Current_Time = Previous_Time, Current_Temp-Previous_Temp,"")
Filter and delete the rows with a high difference
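The same-time comparison can be sketched in Python like this. It is a minimal illustration, assuming the rows are sorted by time and stored as (temperature, time) pairs; the function name is hypothetical. A large difference only shows that two duplicates disagree, not which of the two is the outlier.

```python
def same_time_differences(rows):
    """For each row, current minus previous temperature if the times match, else None."""
    differences = [None]  # the first row has no previous row to compare with
    for (prev_temp, prev_time), (temp, when) in zip(rows, rows[1:]):
        differences.append(temp - prev_temp if when == prev_time else None)
    return differences


# First four rows of the example data: (temperature, timestamp as text).
rows = [
    (24.5, "4/1/18 2:00"),
    (24.7, "4/1/18 2:00"),
    (24.6, "4/1/18 2:05"),
    (28.3, "4/1/18 2:05"),
]
differences = same_time_differences(rows)
print(differences)
```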

Conditional Formatting based on date and value in Excel

I am trying to return the color for a score based on the date for the score and the score itself. Scoring has used different cut-offs over time:
Table 1
Date1 Score Color
Sep-16 24 [should be red]
Jul-16 6 [should be green]
Apr-14 12 [should be yellow]
... ... ...
Table 2
Date2 Red Orange Yellow Green
Aug-16 20 15 9.5 0
Jul-16 20 15.5 9.5 0
Apr-16 20 15 9.5 0
Mar-15 19 14 7 0
Feb-15 20 13 8.5 0
Jan-15 19 14 7 0
Apr-14 19 14 7 0
I want to place a formula in the "Color" cell that evaluates Table 2 and returns the column name. It should use the most recent row whose Date2 is not later than Date1 and, within that row, return the color with the highest cut-off that the score in Table 1 equals or exceeds.
Thanks,
You need nested approximate lookups. This would be easier if your data was sorted the other way around. At least table 2 should have the columns in ascending order, instead of descending, so the match function can return the correct position of the number with an approximate match.
If you can arrange the columns in Table2 in the order Date2, Green, Yellow, Orange, Red, then the following formula will be possible.
=INDEX(Table3[[#Headers],[Green]:[Red]],MATCH([@Score],INDEX(Table3[Green],IFERROR(MATCH([@Date1],Table3[Date2],-1),1)):INDEX(Table3[Red],IFERROR(MATCH([@Date1],Table3[Date2],-1),1)),1))
This uses structured references, which accommodates rows being inserted into the tables without breaking the formulas.
Now you can use conditional formatting based on the cell values in column C.
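To illustrate what a nested approximate lookup does, here is a Python sketch of the requirement as stated in the question, not a translation of the formula above. It assumes the cut-off rows are sorted by ascending date and the colors by ascending cut-off. The names `CUTOFFS`, `COLORS` and `color_for` are invented, and dates are simplified to (year, month) tuples.

```python
from bisect import bisect_right

COLORS = ("Green", "Yellow", "Orange", "Red")

# (year, month) -> cut-offs for Green, Yellow, Orange, Red, sorted by ascending date.
CUTOFFS = [
    ((2014, 4), (0, 7, 14, 19)),
    ((2015, 1), (0, 7, 14, 19)),
    ((2015, 2), (0, 8.5, 13, 20)),
    ((2015, 3), (0, 7, 14, 19)),
    ((2016, 4), (0, 9.5, 15, 20)),
    ((2016, 7), (0, 9.5, 15.5, 20)),
    ((2016, 8), (0, 9.5, 15, 20)),
]


def color_for(score_date, score):
    """Color for a score, using the latest cut-off row dated on or before score_date."""
    dates = [d for d, _ in CUTOFFS]
    row = bisect_right(dates, score_date) - 1  # first approximate lookup: by date
    if row < 0:
        raise ValueError("no cut-offs defined on or before this date")
    limits = CUTOFFS[row][1]
    col = bisect_right(limits, score) - 1  # second approximate lookup: by score
    if col < 0:
        raise ValueError("score is below the lowest cut-off")
    return COLORS[col]


print(color_for((2016, 9), 24), color_for((2016, 7), 6), color_for((2014, 4), 12))
```

With the sample data from the question, the three scores come out as Red, Green and Yellow, matching the expected colors listed in Table 1.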
Just for comparison, I have chosen to keep the lookup table (in Sheet2 rather than in an actual table) the same as in the question i.e. both tables are sorted from largest to smallest or most recent to least recent and the MATCHes both have -1 as the third argument:-

Implementing the offset function in Excel to calculate simple stats (Formula)

I have a massive amount of data for which I would like to calculate some simple statistics (sum, mean, max). The data is arranged in columns, and I would like to calculate these statistics for groups of sixteen columns. It is possible to select the columns manually, but given the amount of data (365 columns x 601552 rows) I'm very likely to make mistakes. What I've been trying to figure out is how to get Excel to shift the selected cells each time the calculation is done. I know this involves the OFFSET function, but I can't figure out how to make it work. Any pointers will be highly appreciated. Thanks!
EDIT:
Essentially the data looks as follows:
LAT LONG 1 2 3 4 … 365
-40 -20 10.50 0.00 1.70 0.00 … 0.00
-40 -19.9 19.00 5.00 0.00 0.00 … 9.30
-40 -19.8 0.00 0.00 0.00 5.60 … 0.00
-40 -19.7 12.00 3.40 0.00 0.00 … 0.00
… … … … … … … …
40 55 0.00 0.00 7.60 7.00 … 0.00
It is basically 365 days worth of rainfall for a large group of coordinates. What I want to do is collate basic stats (sum, mean and maximum rainfall) for each coordinate in 16-day aggregates (which comes to 22 full 16 day aggregates plus one with 13 or 14 days depending on whether it is a leap year). The formula I'm using right now is =SUM(OFFSET(C2,,,1,16)) which works fine for the first column (reference cell C2) but I want to copy this across the entire sheet. I think there is a way to get it to increment the reference cell by 16 each time but I can't seem to figure that out.
Here is a proof of concept of what you might be after. I've set it up to calculate statistics on a four-day cycle, but you should be able to extend it to a 16-day cycle:
I copied the "labels" (LAT/LONG values) to duplicate the index.
The formula in cell D17 is
=IF(MOD(COLUMN()-2,4)=0,AVERAGE(OFFSET(D17,-15,-3,1,4)),
IF(MOD(COLUMN()-2,4)=3,MEDIAN(OFFSET(D17,-15,-2,1,4)),
IF(MOD(COLUMN()-2,4)=2,MAX(OFFSET(D17,-15,-1,1,4)),"")))
where the condition ensures that only one specific statistic is shown, depending on the column you're in. The syntax OFFSET(<ref>,<rows>,<cols>,<height>,<width>) selects a range whose top-left corner lies <rows> rows and <cols> columns away from <ref>, with a height of <height> and a width of <width>. So, for cell D17, where MOD(COLUMN()-2,4)=2, OFFSET(D17,-15,-1,1,4) selects the 1x4 range with its top-left corner in C2.
You can use the same formula to obtain the column labels MAX, MEDIAN, AVG, ...
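The block-wise aggregation itself is easy to check outside Excel. Below is a minimal Python sketch that splits one row of 365 daily values into 16-day blocks and computes the sum, mean and maximum of each. The function name and the sample row are made up, and the last block is simply shorter (13 days for a 365-day year).

```python
def block_stats(row, size=16):
    """Return (sum, mean, max) for each consecutive block of `size` values."""
    stats = []
    for start in range(0, len(row), size):
        block = row[start:start + size]  # the final block may be shorter
        stats.append((sum(block), sum(block) / len(block), max(block)))
    return stats


# Made-up row: day numbers 1..365 standing in for daily rainfall values.
row = list(range(1, 366))
stats = block_stats(row)
print(len(stats), stats[0], stats[-1])
```

In the sheet itself, one possible way to step the reference by 16 columns when copying to the right is =SUM(OFFSET($C2,0,(COLUMN(A1)-1)*16,1,16)), where COLUMN(A1) serves only as a counter that increases by 1 per column. Treat this as a suggestion to test on your own layout rather than a verified solution.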

Resources