Count concurrent occurrences of overlapping times, down to the second, across multiple dates - excel

I have a table with the following structure
Start                  End
18/11/2022 15:09:39    18/11/2022 15:09:54
18/11/2022 15:09:10    18/11/2022 15:09:24
I am looking for a formula that counts the concurrent occurrences that overlap, down to the second. It should give a count for each set of overlaps.
For example, take the data below:
Start                  End
18/11/2022 14:57:09    18/11/2022 15:07:15
18/11/2022 15:06:02    18/11/2022 15:07:07
18/11/2022 15:03:10    18/11/2022 15:03:26
18/11/2022 15:02:52    18/11/2022 15:03:05
18/11/2022 14:58:31    18/11/2022 14:58:44
18/11/2022 14:27:50    18/11/2022 14:56:38
18/11/2022 14:52:21    18/11/2022 14:54:11
The first 5 rows overlap with each other, so that is 5 concurrent overlaps.
The last 2 rows overlap with each other, but not with the first 5, so that would be a new count of concurrent overlaps.
The data spans multiple days, so the solution needs to take the date into account as well as the time.
Multiple days data
Start                  End
18/11/2022 9:59:27     18/11/2022 10:00:07
18/11/2022 9:49:51     18/11/2022 9:53:21
18/11/2022 9:38:16     18/11/2022 9:46:59
18/11/2022 9:45:37     18/11/2022 9:45:45
18/11/2022 9:41:44     18/11/2022 9:42:14
18/11/2022 8:34:01     18/11/2022 8:34:44
18/11/2022 8:11:46     18/11/2022 8:13:58
18/11/2022 8:08:46     18/11/2022 8:09:41
17/11/2022 19:18:47    17/11/2022 19:18:54
17/11/2022 18:50:49    17/11/2022 18:51:11
17/11/2022 17:40:20    17/11/2022 17:40:45
17/11/2022 17:00:04    17/11/2022 17:03:48
17/11/2022 16:58:35    17/11/2022 16:58:50
17/11/2022 16:54:31    17/11/2022 16:57:55
17/11/2022 16:34:01    17/11/2022 16:34:29
17/11/2022 16:32:30    17/11/2022 16:33:31
17/11/2022 16:28:23    17/11/2022 16:32:59
17/11/2022 16:30:38    17/11/2022 16:30:57
17/11/2022 16:22:10    17/11/2022 16:22:27
17/11/2022 15:51:36    17/11/2022 15:51:48
17/11/2022 15:48:10    17/11/2022 15:48:49
17/11/2022 15:40:22    17/11/2022 15:40:46
17/11/2022 15:30:32    17/11/2022 15:36:44
17/11/2022 15:33:11    17/11/2022 15:34:30
17/11/2022 15:32:05    17/11/2022 15:33:14
17/11/2022 15:23:27    17/11/2022 15:32:31
I have tried a few examples that seem to only deal with days but don't go to the seconds level.

Depending on how you want to display the result, there are different options. I am assuming the following one; if it is not what you need, please update your question with the expected output for the sample data you provided. I am also assuming there are no Excel version constraints, based on the tags listed on the question.
=LET(starts, A2:A27, ends, B2:B27, MAP(starts, ends, LAMBDA(s,e,
SUM((starts <= e) * (ends >= s))-1)))
For each interval, it returns the total number of other intervals that overlap it.
The general logical condition to check whether two intervals A and B overlap is the following:
AND(startA <= endB, endA >= startB)
In our case, on each iteration of MAP we compare a given start (s) and end (e) with the entire starts and ends columns. Since each interval also overlaps with itself, we need to subtract 1.
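The same overlap test can be sketched outside Excel. The following is only an illustration in Python, not part of the original answer; the function name overlap_counts is made up for the example, and the rows are the seven-row sample from the question.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"

def overlap_counts(intervals):
    """For each (start, end) pair, count how many other intervals overlap it."""
    counts = []
    for s, e in intervals:
        # Two intervals overlap when each one starts no later than the other ends.
        hits = sum(1 for s2, e2 in intervals if s2 <= e and e2 >= s)
        counts.append(hits - 1)  # subtract the interval's match with itself
    return counts

rows = [
    ("18/11/2022 14:57:09", "18/11/2022 15:07:15"),
    ("18/11/2022 15:06:02", "18/11/2022 15:07:07"),
    ("18/11/2022 15:03:10", "18/11/2022 15:03:26"),
    ("18/11/2022 15:02:52", "18/11/2022 15:03:05"),
    ("18/11/2022 14:58:31", "18/11/2022 14:58:44"),
    ("18/11/2022 14:27:50", "18/11/2022 14:56:38"),
    ("18/11/2022 14:52:21", "18/11/2022 14:54:11"),
]
intervals = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in rows]
print(overlap_counts(intervals))  # [4, 1, 1, 1, 1, 1, 1]
```

Because full date-times are compared, rows on different days never match each other by accident.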

Related

How to extract months with data and find n-th value as starting point and n-th value as ending point in Excel Power Query, maybe VBA

I have a data set which consists of Date/Time, Pressure and a custom column. It represents pressure over time, and within each month I want to find my starting point (the value after 5 minutes) and my ending point (the value in the row before the last one). For context, the measurements usually take roughly 30-40 minutes, as you can see in the example below, so the amount of data can vary.
The Time column is calculated using:
=([#[Date/Time]]-I5)*1440+L5
This data set contains all the data, for every month with values, and I need the months separated (filtered) with these starting/ending points, as in the screenshot. I have used Power Query a lot to work with the data, but maybe there is another method to obtain those values and, where possible, make them dynamic for future data.
I am also uploading my dummy workbook with the whole data set (all the months) and a filter table with the months, in case you need it for reference and testing.
https://docs.google.com/spreadsheets/d/1LGl-eri6ewCni2NJ2wGeoYIf-40KO2Lr/edit?usp=sharing&ouid=101738555398870704584&rtpof=true&sd=true
In Power Query:
Based on your shared workbook and what you have written, it seems that for any given month, you start the minute count after excluding the first entry in the month. If that is a typo/error, just remove the function that removes that first line.
With that second entry = minute 0, return the first entry in or after minute 5 as well as the next to last entry in the table.
Edit: minor change in algorithm.
Note that I started with just the Date and Pressure columns.
Algorithm
Add a column of monthYear
GroupBy monthYear
Custom aggregation to:
  Remove the first and last rows of the table
  Create a list of durations in minutes of each time compared with the first time in month. This will be a minute + fraction of a minute
  Add that list as a column to the original table
  Determine the first entry in or after the fifth minute
  Determine the last entry
  Filter the month subtable to return those two entries.
If you want to see the result for just a given month, you can filter the result in the resultant Excel table.
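To make the steps above concrete, here is a rough sketch of the same algorithm in plain Python. It is an illustration only, not a translation of the query that follows; the function name month_endpoints, the threshold_minutes parameter and the sample readings are all invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def month_endpoints(readings, threshold_minutes=5):
    """readings: list of (datetime, pressure) tuples sorted by time.

    Returns {(year, month): (start_row, end_row)} for each month.
    """
    by_month = defaultdict(list)
    for stamp, pressure in readings:
        by_month[(stamp.year, stamp.month)].append((stamp, pressure))

    result = {}
    for key, rows in by_month.items():
        trimmed = rows[1:-1]  # remove the first and last rows of the month
        if not trimmed:
            continue
        t0 = trimmed[0][0]  # the first remaining row counts as minute 0
        elapsed = [(stamp - t0).total_seconds() / 60 for stamp, _ in trimmed]
        # first row at or after the threshold minute, if any
        start = next((row for row, m in zip(trimmed, elapsed) if m >= threshold_minutes), None)
        if start is None:
            continue
        result[key] = (start, trimmed[-1])  # trimmed[-1] is next to last in the month
    return result

# Hypothetical data: one reading every 2 minutes from 10:00 to 10:20.
base = datetime(2020, 1, 15, 10, 0, 0)
readings = [(base + timedelta(minutes=2 * i), 1000 - i) for i in range(11)]
endpoints = month_endpoints(readings)
print(endpoints[(2020, 1)])
```

With this sample, minute 0 is the 10:02 reading, so the start row is the 10:08 reading (6 minutes elapsed) and the end row is the 10:18 reading.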
M Code
Please read the comments and examine the Applied Steps to better understand the algorithm.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date/Time", type datetime}, {"P7 [mbar]", Int64.Type}}),
//add month/year column for grouping
#"Added Custom" = Table.AddColumn(#"Changed Type", "month Year",
each Number.ToText(Date.Month([#"Date/Time"]),"00") & Number.ToText(Date.Year([#"Date/Time"]),"0000")),
#"Grouped Rows" = Table.Group(#"Added Custom", {"month Year"}, {
//elapsed minutes column
{"Elapsed Minutes", (x)=> let
//remove first and last rows from table
t=Table.RemoveColumns(Table.RemoveFirstN(Table.RemoveLastN(x)),"month Year"),
//add a column with the elapsed minutes
TableToFilter = Table.FromColumns(
Table.ToColumns(t)
& {List.Generate(
()=>[em=null, idx=0],
each [idx]< Table.RowCount(t),
each [em=Duration.TotalMinutes(t[#"Date/Time"]{[idx]+1} - t[#"Date/Time"]{0}), idx=[idx]+1],
each [em])}, type table[#"Date/Time"=datetime, #"P7 [mbar]"=number, elapsed=number]),
//filter for last entry (which would be next to last in the month)
maxMinute = List.Max(TableToFilter[elapsed]),
//filter for first entry in the 5th minute
fifthMinute = List.Select(TableToFilter[elapsed], each Number.IntegerDivide(_,1)>=5){0},
//select the 5th minute and the last row
FilteredTable = Table.SelectRows(TableToFilter, each [elapsed]=fifthMinute or [elapsed]=maxMinute)
in FilteredTable,type table[#"Date/Time"=datetime, #"P7 [mbar]"=number, elapsed=number]}
}),
//remove unneeded column and expand the others
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"month Year"}),
#"Expanded Elapsed Minutes" = Table.ExpandTableColumn(#"Removed Columns", "Elapsed Minutes", {"Date/Time", "P7 [mbar]"}, {"Date/Time", "P7 [mbar]"})
in
#"Expanded Elapsed Minutes"
Results from your shared workbook data
In Office/Excel 365
Filter Column (eg for January 2020)
E4: 1/1/2020
E5: 1/1/2020
Results
F4 (date/time 5th minute): =IF(COUNTIFS(Table1[Date/Time],">="&E4,Table1[Date/Time],"<" & EDATE(E4,1))=0,"",
LET(x,FILTER(Table1[Date/Time],(Table1[Date/Time]>=E4)*(Table1[Date/Time]<EDATE(E4,1))),
y, (x-INDEX(x,2))*1440,
z, XMATCH(5,y,1),
INDEX(x,z,1)))
G4: (Pressure 5th minute): =IF(F4="","",
LET(x,FILTER(Table1,(Table1[Date/Time]>=E4)*(Table1[Date/Time]<EDATE(E4,1))),
y, (INDEX(x,0,1)-INDEX(x,2,1))*1440,
z, XMATCH(5,y,1),
INDEX(x,z,2)))
F5: (Date next to last): =IF(COUNTIFS(Table1[Date/Time],">="&E5,Table1[Date/Time],"<" & EDATE(E5,1))=0,"",
LET(x,FILTER(Table1[Date/Time],(Table1[Date/Time]>=E5)*(Table1[Date/Time]<EDATE(E5,1))),
INDEX(x,COUNT(x)-1)))
G5: (Pressure next to last):=IF(F5="","",
LET(x,FILTER(Table1,(Table1[Date/Time]>=E5)*(Table1[Date/Time]<EDATE(E5,1))),
INDEX(x,COUNT(INDEX(x,0,1))-1,2)))

How to remove HH:MM:SS from Date column in dataframe?

I have a dataframe called twitter_df with two columns, tweets and Date. For every tweet there is a datetime entry like below:
Date
2019-11-29 12:50:54
2019-11-29 12:46:53
2019-11-29 12:46:10
2019-11-29 12:33:36
2019-11-29 12:17:43
I would like the entries for Date column to be this:
Date
2019-11-29
2019-11-29
2019-11-29
2019-11-29
2019-11-29
The reason I want the HH:MM:SS to be removed is because I am going to group it by the date.
I have tried the approaches from several links but had no luck. Can anyone assist? Some say I should make use of pd.to_datetime, but I am not sure how to go about it.
This will create a new column, onlyDate, with only the date from the Date column.
If you want to replace the existing column change 'onlyDate' to 'Date' on the left hand side.
import pandas as pd
twitter_df['onlyDate'] = pd.to_datetime(twitter_df['Date']).dt.date
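Since the stated goal is to group by date, a small end-to-end sketch may help. The sample rows below are invented, and the Date column is assumed to start out as strings, which is why it goes through pd.to_datetime before the .dt accessor is used.

```python
import pandas as pd

# Hypothetical sample data standing in for the real twitter_df
twitter_df = pd.DataFrame({
    "tweets": ["a", "b", "c"],
    "Date": ["2019-11-29 12:50:54", "2019-11-29 12:46:53", "2019-11-30 08:00:00"],
})

# Parse the strings into datetimes, then keep only the calendar date.
twitter_df["Date"] = pd.to_datetime(twitter_df["Date"]).dt.date

# Group by the date-only column and count tweets per day.
tweets_per_day = twitter_df.groupby("Date")["tweets"].count()
print(tweets_per_day)
```

Note that .dt.date leaves the column holding Python date objects (dtype object), which is fine for grouping.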

Pyspark date format from multiple columns

I have four string columns 'hour', 'day', 'month', 'year' in my data frame. I would like to create new column fulldate in format 'dd/MM/yyyy HH:mm'.
df2 = df1.withColumn("fulldate", to_date(concat(col('day'), lit('/'), col('month'), lit('/'), col('year'), lit(' '), col('hour'), lit(':'), lit('0'), lit('0')), 'dd/MM/yyyy HH:mm'))
but it doesn't seem to work. I'm getting format "yyyy-mm-dd".
Am I missing something?
For Spark 3+, you can use the make_timestamp function to create a timestamp column from those columns, and then use date_format to convert it to the desired date pattern:
from pyspark.sql import functions as F
df2 = df1.withColumn(
"fulldate",
F.date_format(
F.expr("make_timestamp(year, month, day, hour, 0, 0)"),
"dd/MM/yyyy HH:mm"
)
)
Use date_format instead of to_date.
to_date converts a column to date type from the given format, while date_format converts a date type column to the given format.
from pyspark.sql.functions import date_format, concat, col, lit
df2 = df1.withColumn(
    "fulldate",
    date_format(
        # build a 'yyyy-M-d H:00:00' string, a layout Spark can cast to a timestamp
        concat(col('year'), lit('-'), col('month'), lit('-'), col('day'), lit(' '), col('hour'), lit(':00:00')),
        'dd/MM/yyyy HH:mm'
    )
)
For better readability, you can use format_string:
from pyspark.sql.functions import date_format, format_string, col
df2 = df1.withColumn(
    "fulldate",
    date_format(
        # %s because the four source columns are strings
        format_string('%s-%s-%s %s:00:00', col('year'), col('month'), col('day'), col('hour')),
        'dd/MM/yyyy HH:mm'
    )
)
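The parse-versus-format distinction behind to_date and date_format can be illustrated without Spark. The sketch below uses only Python's standard datetime module as an analogy; it is not Spark code, and the helper name full_date is made up for the example.

```python
from datetime import datetime

def full_date(year, month, day, hour):
    """Build 'dd/MM/yyyy HH:mm' text from four string parts."""
    # Parsing step (text -> datetime), comparable to building the timestamp
    stamp = datetime.strptime(f"{year}-{month}-{day} {hour}", "%Y-%m-%d %H")
    # Formatting step (datetime -> text), comparable to date_format
    return stamp.strftime("%d/%m/%Y %H:%M")

print(full_date("2021", "3", "7", "9"))  # 07/03/2021 09:00
```

Parsing always produces a date/time value whose default display is fixed; only the formatting step controls the pattern of the final text.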

How to convert HHMM into time in minutes in excel

I have a column in excel in hhmm format. Note that there is no ':' in the middle. Also they are not zero padded. For example 620 is 6:20 AM.
1820 is 18:20 or 6:20 PM.
How do I convert this column into minutes?
My intention is to subtract two such columns to obtain the time difference in minutes.
Formula:
=INT(A1/100)*60+MOD(A1,100)
converts your time value into minutes since midnight.
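The arithmetic in that formula can be checked with a few lines of Python. This is just a sketch of the same calculation; the function name hhmm_to_minutes is made up for the example.

```python
def hhmm_to_minutes(value):
    """Convert an unpadded hhmm number such as 620 or 1820 to minutes since midnight."""
    hours, minutes = divmod(int(value), 100)  # same as INT(A1/100) and MOD(A1,100)
    return hours * 60 + minutes

print(hhmm_to_minutes(620))                          # 380
print(hhmm_to_minutes(1820))                         # 1100
print(hhmm_to_minutes(1820) - hhmm_to_minutes(620))  # 720
```

Subtracting two converted values gives the difference in minutes, which is what the question is ultimately after.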
You can also do this with a few transformation steps in the Power Query Editor.
Assume your data has a column named time_in_number that holds the hhmm values.
Here is the Advanced Editor code for the new steps:
let
//Your existing code,
#"Split Column by Position" = Table.SplitColumn(#"previous_step_name", "time_in_number", Splitter.SplitTextByPositions({0, 2}, true), {"time_in_number.1", "time_in_number.2"}),
#"Merged Columns" = Table.CombineColumns(#"Split Column by Position",{"time_in_number.1", "time_in_number.2"},Combiner.CombineTextByDelimiter(":", QuoteStyle.None),"Merged"),
#"Changed Type2" = Table.TransformColumnTypes(#"Merged Columns",{{"Merged", type time}})
in
#"Changed Type2"
The final output is a Merged column of type time.

Create a timestamp Column in Spark Dataframe from other column having timestamp value

I have a spark dataframe having a timestamp Column.
I want to get the previous day's date from the column, then add the time (3,59,59) to that date.
Ex- value in current column(x1) : 2018-07-11 21:40:00
previous day date : 2018-07-10
after adding time(3,59,59) to the previous day date ,it should be like :
2018-07-10 03:59:59 (x2)
I want to add a column in the dataframe with "x2" values corresponding to "x1" values in all records.
I also want one more column holding the difference (x1-x2) in total days, as an exact double value.
Subtracting a day, adding the time, and converting to timestamp type:
from pyspark.sql.functions import col, concat, date_sub, datediff, lit, unix_timestamp
>>> df1 = df.withColumn('x2', concat(date_sub(col("x1"), 1), lit(" 03:59:59")).cast("timestamp"))
Calculating time and date differences:
Date difference:
Using the datediff function we can calculate the date difference in whole days
>>> df1.withColumn("x3", datediff(col("x1"), col("x2")))
Time difference:
To calculate the time difference, convert both columns to Unix time, then subtract x2 from x1. The result is in seconds; dividing it by 86400 gives the difference in days as a double
>>> df1.withColumn("x3", unix_timestamp(col("x1")) - unix_timestamp(col("x2")))
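The arithmetic can be checked on the example value from the question with plain Python. This is only a sketch of the calculation using the standard datetime module, not Spark code; the helper name previous_day_at is invented for the example.

```python
from datetime import datetime, time, timedelta

def previous_day_at(x1, at=time(3, 59, 59)):
    """Return the day before x1, with the clock set to the given time."""
    return datetime.combine(x1.date() - timedelta(days=1), at)

x1 = datetime(2018, 7, 11, 21, 40, 0)
x2 = previous_day_at(x1)

# Difference in seconds, then in fractional days (86400 seconds per day)
diff_seconds = (x1 - x2).total_seconds()
diff_days = diff_seconds / 86400

print(x2)            # 2018-07-10 03:59:59
print(diff_seconds)  # 150001.0
print(diff_days)
```

The whole-day difference that datediff reports for this pair would be 1, while the seconds-based calculation keeps the fractional part.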
