How to count the number of cells in a row that have a number greater than 7 as part of the text such as "8 days" - excel

I have a spreadsheet that has several columns. I'm only going to show data from 2 of them here, because they're the 2 that I'm dealing with in this problem.
The first column is IP Addresses. The second column is how long ago the last response was or the last response date:
Address      Last Response
10.1.1.109   10/17/2022
10.1.1.113   10/17/2022
10.1.1.137   10/17/2022
10.1.1.188   4 days
10.1.1.199   10/17/2022
10.1.21.5    10/17/2022
10.1.21.50   45 days
10.1.50.41   10/17/2022
10.1.50.71   10/17/2022
10.1.88.10   10/17/2022
10.1.88.249  6 days
10.16.6.190  4 days
10.64.0.76   28 days
10.64.3.48   45 days
What I need to do is to get a few counts worked out. I want to know how many addresses from each IP subnet have
A response older than 1 week
A response older than 1 month
In the sample data you can see 3 IP subnets: 10.1, 10.16, and 10.64. I am expecting to get results like:
IP Subnet  > Week  > Month
10.1       9       1
10.16      0       0
10.64      2       1
I have a formula for the "> Week", but I don't like it. I am not able to figure out how to count based on the number at the beginning of the text in that column. I tried a formula like this:
=COUNTIFS(AllIPAddresses,"10.1.*",AllResponses, NUMBERVALUE(LEFT(AllResponses, FIND(" ",AllResponses)))&">7")
Obviously this doesn't work. It gives me a column full of 0's.
What I have working for the "> Week" column:
=COUNTIFS(AllIPAddresses,CONCAT(A2,"*"),AllResponses,"<>7 days",AllResponses,"<>6 days",AllResponses,"<>5 days",AllResponses,"<>4 days",AllResponses,"<>3 days",AllResponses,"<>2 days",AllResponses,"<>Today",AllResponses,"<>Yesterday")
But like I said, I don't like it as it is just looking at the column and not counting 8 of the options. I would prefer if I could have a way to get it to look at the column and count those whose number of days is > 7. Something simple would be great, but something that is shorter and/or simpler than what I have I'll take. And I cannot reuse that effectively for the "> month" result because then I'd have to list some 30 different options that I don't want to count.
It would be better to have it look for the 1 option that I do want.
I'm hoping for something like:
First COUNTIFS counts all the text that have a number > 7
Second COUNTIFS counts all the dates that are more than 7 days before today
=COUNTIFS(AllIPAddresses, CONCAT(A2,"*"),AllResponses, LEFT(AllResponse,2)&">7")+COUNTIFS(AllIPAddresses, CONCAT(A2,"*"),AllResponses,"<"&today()-7)
And then I can reuse this for the "> month" by changing the 7 to a 30.
Though I know this formula doesn't work.
Any assistance with this problem would be appreciated!
Some Notes about my formulas
For ease of use I have named ranges:
AllIPAddresses = A2:A700
AllResponses = B2:B700
(in my formula for > week) A2 is referring to the "10.1." so that the CONCAT will give the result of "10.1.*" to the COUNTIFS
EDIT
I have added an answer that explains why I chose the solution that I did and how I had to tweak the answers I received to make them work for my specific scenario.

This could be accomplished using:
=LET(data,A2:B15,
_d1,INDEX(data,,1),
_d2,INDEX(data,,2),
lr,TODAY()-IF(ISNUMBER(_d2),_d2,TODAY()-(LEFT(_d2,LEN(_d2)-LEN(" days")))),
lft,TEXTBEFORE(_d1,".",2),
unq,UNIQUE(lft),
sq,SEQUENCE(COUNTA(lft),,1,0),
mm,--(TRANSPOSE(unq)=lft),
wk,MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>7),sq),
mn,MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>30),sq),
stack,HSTACK(unq,wk,mn),
VSTACK({"IP Subnet","> Week","> Month"},stack))
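lft uses TEXTBEFORE to get the first 2 sections of the IP address (the subnet).
lr is the number of days since the last response: for a date it is TODAY() minus the date, and for text such as "45 days" it is the number in front of " days".
unq is the unique values of lft (IP subnet).
sq is a column of 1s, one per data row, used as the second argument of MMULT.
wk uses MMULT to count, for each unique IP subnet, the rows where lr is greater than 7.
mn is the same as wk, but where lr is greater than 30.
mm is not used in the final result and can be removed.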
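The MMULT step can be easier to follow outside Excel. Below is a minimal Python sketch of the same idea, not a translation of the formula: it builds a 0/1 matrix with one row per unique subnet and one column per data row, then multiplies it by a column of 1s to get the counts. The helper names (subnet, unique, mmult, count_older) and the ages in days are made up for illustration.

```python
def subnet(address):
    """First two octets of an address, like TEXTBEFORE(address, ".", 2)."""
    return ".".join(address.split(".")[:2])


def unique(values):
    """Distinct values in first-seen order, like UNIQUE."""
    return list(dict.fromkeys(values))


def mmult(matrix, vector):
    """Multiply a matrix (a list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def count_older(addresses, days, limit):
    """Count, per subnet, the rows whose age in days exceeds limit."""
    lft = [subnet(a) for a in addresses]
    unq = unique(lft)
    # One row per unique subnet, one column per data row: 1 where the
    # subnet matches and the age exceeds the limit, otherwise 0.
    flags = [[int(s == u and d > limit) for s, d in zip(lft, days)]
             for u in unq]
    ones = [1] * len(lft)  # plays the role of sq
    return dict(zip(unq, mmult(flags, ones)))


# Hypothetical ages in days (the role of lr), not taken from the question.
addresses = ["10.1.1.109", "10.1.1.188", "10.1.21.50",
             "10.16.6.190", "10.64.0.76", "10.64.3.48"]
days = [15, 4, 45, 4, 28, 45]

week = count_older(addresses, days, 7)
month = count_older(addresses, days, 30)
```

Multiplying by a vector of 1s is just a row sum, which is why the same counts can also be produced with a per-subnet SUM.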

Perhaps you could try:
• Formula used in cell D2
=LET(_ipaddress,TEXTBEFORE(Address,".",2),
_days,IFERROR(TODAY()-ISNUMBER(Last_Response)*Last_Response,
SUBSTITUTE(Last_Response," days","")+0),
_uip,UNIQUE(_ipaddress),
_week,BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>7)))),
_month,BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>30)))),
VSTACK({"IP Subnet","> Week","> Month"},HSTACK(_uip,_week,_month)))
Explanation of each named variable used in the calculation:
• Address --> is a defined name range and refers to
=$A$2:$A$15
• Last_Response --> is a defined name range and refers to
=$B$2:$B$15
• _ipaddress --> extracts the IP Subnet using TEXTBEFORE()
TEXTBEFORE(Address,".",2)
• _days --> converts every entry to a number of days. Excel stores dates as numbers, so ISNUMBER() returns TRUE for the dates and FALSE for the text values.
For a date, the first argument of IFERROR() returns the number of days since that date:
TODAY()-ISNUMBER(Last_Response)*Last_Response
For a text value that expression returns an error, so IFERROR() falls back to its second argument, which replaces " days" with an empty string and converts the rest to a number:
SUBSTITUTE(Last_Response," days","")+0
=IFERROR(TODAY()-ISNUMBER(Last_Response)*Last_Response,
SUBSTITUTE(Last_Response," days","")+0)
• _uip --> This gives us the Unique IP SUBNET
UNIQUE(_ipaddress)
• _week --> this gives us the count for each unique values row wise and returns as an array of output, for those days which are greater than 7 days.
BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>7))))
• _month --> while this gives us the count for each unique values row wise and returns as an array of output, for those days which are greater than 30 days.
BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>30))))
• Last but not least, we are packing all the variables that are needed to show as an output with in an HSTACK()
HSTACK(_uip,_week,_month)
To give the output a proper header, we wrap it in a VSTACK() along with the [1x3] array of headers.
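As a cross-check of the logic described above, here is a small Python sketch that runs the same steps on the sample data from the question: convert each entry to an age in days, take the first two octets as the subnet, and sum the matches per subnet. The function names and the fixed "today" of 1 November 2022 are assumptions chosen so that the dated rows are more than a week but less than a month old.

```python
from datetime import date, datetime


def to_days(value, today):
    """Age in days from either an 'm/d/yyyy' date or 'N days' text."""
    if value.endswith(" days"):
        return int(value[: -len(" days")])
    return (today - datetime.strptime(value, "%m/%d/%Y").date()).days


def summarise(rows, today):
    """Return {subnet: [count older than 7 days, count older than 30 days]}."""
    result = {}
    for address, last_response in rows:
        key = ".".join(address.split(".")[:2])
        age = to_days(last_response, today)
        counts = result.setdefault(key, [0, 0])
        counts[0] += age > 7   # True counts as 1
        counts[1] += age > 30
    return result


rows = [
    ("10.1.1.109", "10/17/2022"), ("10.1.1.113", "10/17/2022"),
    ("10.1.1.137", "10/17/2022"), ("10.1.1.188", "4 days"),
    ("10.1.1.199", "10/17/2022"), ("10.1.21.5", "10/17/2022"),
    ("10.1.21.50", "45 days"), ("10.1.50.41", "10/17/2022"),
    ("10.1.50.71", "10/17/2022"), ("10.1.88.10", "10/17/2022"),
    ("10.1.88.249", "6 days"), ("10.16.6.190", "4 days"),
    ("10.64.0.76", "28 days"), ("10.64.3.48", "45 days"),
]

TODAY = date(2022, 11, 1)  # assumed date, not from the original post
summary = summarise(rows, TODAY)
```

With that assumed date the result matches the table the question asks for: 9 and 1 for 10.1, 0 and 0 for 10.16, 2 and 1 for 10.64.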
You can also do this with Power Query.
To accomplish this task using Power Query, follow these steps:
• Select some cell in your Data Table,
• Data Tab => Get&Transform => From Table/Range,
• When the PQ Editor opens: Home => Advanced Editor,
• Make a note of the table names,
• Paste the M Code below in place of what you see,
• And refer to the comments in the code.
let
//IPAddresstbl Uploaded in PQ Editor,
Source = Excel.CurrentWorkbook(){[Name="IPAddresstbl"]}[Content],
//Date Type Changed
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Last Response", type text}}),
//Extracting the IP SUBNET
#"Extracted Text Before Delimiter" = Table.TransformColumns(#"Changed Type", {{"Address", each Text.BeforeDelimiter(_, ".", 1), type text}}),
//Replacing " days" in last response column
#"Replaced Value" = Table.ReplaceValue(#"Extracted Text Before Delimiter"," days","",Replacer.ReplaceText,{"Last Response"}),
//Removing " 12:00:00 AM" from Date Time since we changed the data type of lastresponse as text
#"Extracted Text Before Delimiter1" = Table.TransformColumns(#"Replaced Value", {{"Last Response", each Text.BeforeDelimiter(_, " 12:00:00 AM"), type text}}),
//Adding custom column return the numbers of days
#"Added Custom" = Table.AddColumn(#"Extracted Text Before Delimiter1", "Custom", each if Value.Is(Value.FromText([Last Response]), type number) then [Last Response] else Date.From(DateTime.LocalNow()) - Value.FromText([Last Response])),
//Changing the data type of the custom column to ensure they are numbers
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Int64.Type}}),
//Removing unwanted columns
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Last Response"}),
//Returning 1 for those days which are more than 7 else returning as 0
#"Added Conditional Column" = Table.AddColumn(#"Removed Columns", "> Week", each if [Custom] > 7 then 1 else 0),
//Returning 1 for those days which are more than 30 else returning as 0
#"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "> Month", each if [Custom] > 30 then 1 else 0),
//Grouping by each IP Address
#"Grouped Rows" = Table.Group(#"Added Conditional Column1", {"Address"}, {{"> Week", each List.Sum([#"> Week"]), type nullable number}, {"> Month", each List.Sum([#"> Month"]), type nullable number}}),
//Renamed the IP Address Column
#"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"Address", "IP SUBNET"}})
in
#"Renamed Columns"
• Change the table name to SUBNETtbl before importing it back into Excel.
• When importing, you can either select Existing Sheet with the cell reference where you want to place the table, or simply click on New Sheet.

Answer chosen information
I chose the answer by P.b because I was able to get it to work for me first. I did have to tweak the code he provided like this:
=LET(data,AllData, _d1,INDEX(data,,1)
, _d2, INDEX(data,,4)
, lr, TODAY()-IF(ISNUMBER(_d2),_d2,TODAY()-(IF(_d2="Today","0",IF(_d2="Yesterday","1",LEFT(_d2,LEN(_d2)-LEN(" days"))))))
, lft, TEXTBEFORE(_d1,".",2)
, unq, UNIQUE(lft)
, sq, SEQUENCE(COUNTA(lft),,1,0)
, mm, --(TRANSPOSE(unq)=lft)
, wk, MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>7),sq)
, mn, MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>30),sq)
, stack, HSTACK(unq, wk, mn)
, VSTACK({"IP Subnet", "> Week", "> Month"}, stack))
This is what I was able to get working for me. It gives me the list of IP Subnets and the counts of how many of the IPs' "Last Response" was longer than 7 days or 30 days.
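The tweak above maps "Today" and "Yesterday" to 0 and 1 before the number of days is read from the text. A minimal Python sketch of that normalisation, assuming the text values are only ever "Today", "Yesterday" or a number followed by "day"/"days" (the function name and the singular "day" case are my assumptions):

```python
# Words that stand for a fixed number of days.
WORDS = {"Today": 0, "Yesterday": 1}


def text_to_days(value):
    """Convert 'Today', 'Yesterday' or 'N days' to a number of days."""
    if value in WORDS:
        return WORDS[value]
    number, _, unit = value.partition(" ")
    if unit not in ("day", "days"):
        raise ValueError(f"unrecognised value: {value!r}")
    return int(number)


examples = [text_to_days(v) for v in ("Today", "Yesterday", "4 days", "45 days")]
```

Raising on anything unexpected makes a new text variant show up as an error instead of silently being counted as 0.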
I have also used the Power Query example provided by Mayukh Bhattacharya. I was able to get this working as well. I did not test the "let" formula that he provided, as I already had a "let" formula working. I didn't choose that answer as the solution only because I was able to get the other "let" formula working first. I did have to tweak this answer as well in Power Query to look like this:
let
//AllData Uploaded in PQ Editor,
Source = Excel.CurrentWorkbook(){[Name="AllData"]}[Content],
//Date Type Changed
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Last Response", type text}}),
//Extracting the IP SUBNET
#"Extracted Text Before Delimiter" = Table.TransformColumns(#"Changed Type", {{"Address", each Text.BeforeDelimiter(_, ".", 1), type text}}),
//Replacing " days" in last response column
#"Replaced Value 1" = Table.ReplaceValue(#"Extracted Text Before Delimiter"," days","",Replacer.ReplaceText,{"Last Response"}),
//Replacing the "Today" and "Yesterday" with numbers
#"Replaced Value 2" = Table.ReplaceValue(#"Replaced Value 1", "Yesterday", "1", Replacer.ReplaceText, {"Last Response"}),
#"Replaced Value 3" = Table.ReplaceValue(#"Replaced Value 2", "Today", "0", Replacer.ReplaceText, {"Last Response"}),
//Removing " 12:00:00 AM" from Date Time since we changed the data type of lastresponse as text
#"Extracted Text Before Delimiter1" = Table.TransformColumns(#"Replaced Value 3", {{"Last Response", each Text.BeforeDelimiter(_, " 12:00:00 AM"), type text}}),
//Adding custom column return the numbers of days
#"Added Custom" = Table.AddColumn(#"Extracted Text Before Delimiter1", "Custom", each if Value.Is(Value.FromText([Last Response]), type number) then [Last Response] else Date.From(DateTime.LocalNow()) - Value.FromText([Last Response])),
//Changing the data type of the custom column to ensure they are numbers
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Int64.Type}}),
//Removing unwanted columns
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Last Response"}),
//Returning 1 for those days which are more than 7 else returning as 0
#"Added Conditional Column" = Table.AddColumn(#"Removed Columns", "> Week", each if [Custom] > 7 then 1 else 0),
//Returning 1 for those days which are more than 30 else returning as 0
#"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "> Month", each if [Custom] > 30 then 1 else 0),
//Grouping by each IP Address
#"Grouped Rows" = Table.Group(#"Added Conditional Column1", {"Address"}, {{"> Week", each List.Sum([#"> Week"]), type nullable number}, {"> Month", each List.Sum([#"> Month"]), type nullable number}}),
//Renamed the IP Address Column
#"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"Address", "IP Subnet"}})
in
#"Renamed Columns"
Now that I have started using Power Query I will probably continue with it as I have more transformations on this data to work on.

Using the newer Excel 365 functions, generate the unique IP subnets and count the rows older than 7 and 30 days.
If the response is a date, take its difference from today; otherwise extract the number of days from the text:
=LET(IP,TEXTBEFORE(AllIPAddresses,".",2),U,UNIQUE(IP),S,SEQUENCE(COUNTA(AllIPAddresses),,1,0),D,IFERROR(TODAY()-AllResponses,TEXTBEFORE(AllResponses," ")*1),W,MMULT((TRANSPOSE(IP)=U)*(TRANSPOSE(D)>7),S),M,MMULT((TRANSPOSE(IP)=U)*(TRANSPOSE(D)>30),S),HSTACK(U,W,M))
HSTACK combines the result columns.

Related

Comparing Cells in a row to see if adjacent cells have same values

Problem: My maximum Range is around 10000 Rows x 365 columns, I want to compare cell values across a row .
Conditions:
It has to return how many times a name is repeated in each row for every primary key.
If a name appears only once in a row, it need not be shown; anything appearing 2 or more times should be displayed.
It has to exclude blank cells, and once it encounters "Dispatched" it need not count further.
Requirement: Any solution either excel or macro would do.
Sample Excel File
Bag Number    8th July      9th July      10th July      11th July     12th July     13th July
20/F/43352/1  FILING        FILING        FILING         FINAL POLISH  FINAL POLISH  FINAL POLISH
20/F/43352/2  FILING        FILING        FILING         FINAL POLISH  FINAL POLISH  FINAL POLISH
20/F/43352/3  FINAL POLISH  QC            Dispatched     Dispatched    Dispatched    Dispatched
20/F/43352/4  Casting       Casting       Laser Cutting  Filing        Filing        FINAL POLISH
20/F/43352/5  Casting
20/F/43352/6  Casting       Casting       FINAL POLISH   Dispatched
20/F/43352/7  FILING        FILING        FILING         FINAL POLISH  FINAL POLISH  FINAL POLISH
The Output for the same should be
Bags
Casting
Filing
Final Polish
Dispatched
20/F/43347/1
3days
3 days
Yes
20/F/43347/2
3days
3 days
Yes
20/F/43347/3
2 days
3days
3 days
Yes
Background
Until very recently this process was manual: once the spreadsheet was made, it was divided among 3 people, who would manually scan it, highlight and proceed.
I tried a row-wise COUNTIF, but that only reduces 365 columns to 12 columns and leaves behind lots of unnecessary values (if a bag is in a station for only 1 day, it need not be highlighted).
I tried a pivot table, but it did not give a summary that makes sense.
VBA is not my strong suit, so I haven't tried anything there.
I am looking for something that will help make sense of this and highlight if any product is stuck anywhere.
Hi all, to answer all queries:
@braX I have tried COUNTIF with the department names, but the resulting table is unwieldy for my requirement. I am looking for ideas to solve this.
@DavidWooley-AST There are a total of 12 departments, and the data is kept for an entire year; a primary key can go through each department in 45 days or more.
Also, in case of any rework there is a revisit to the department, so that data has to be captured as well. Sorry, I should have mentioned this before.
You can create the output you show using Power Query, available in Windows Excel 2010+ and Office 365.
The below should get you started.
You will have to add some lines in the Table.Group Aggregation list for other tasks.
You may also need to add code to exclude non-repeats and after "Dispatched" but you showed no examples of that in your data or results, so I did not code anything for that.
I also don't know what you mean by "highlight if any product is stuck anywhere".
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Group by Bag Number
// then extract the Count for each type
// Add " days" to each count
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Bag Number"}, {
{"Filing", (t)=> "Filing " & Text.From(List.Count(List.Select(t[Value],each _ = "FILING"))) & " days"},
{"Final Polish", (t)=> "Final Polish " & Text.From(List.Count(List.Select(t[Value],each _ = "FINAL POLISH"))) & " days"}
}),
//Merge columns with commas (and hyphen for the first to the rest) to get final format
#"Merged Columns" = Table.CombineColumns(#"Grouped Rows",{"Filing", "Final Polish"},
Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(#"Merged Columns",{"Bag Number", "Merged"},
Combiner.CombineTextByDelimiter(" - ", QuoteStyle.None),"A")
in
#"Merged Columns1"
Edit based on your new example of data and desired output
Given your new example, you can get the output from PQ as shown below.
Note that you can add the other departments using the same syntax as shown for those done (except for Dispatched which is treated differently).
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Change to proper case for consistency and text matching
properCase = Table.TransformColumns(#"Removed Columns",{{"Value", Text.Proper, type text}}),
//Group by Bag Number
// then extract the Count for each type
// Show null if count < 2
// Add " days" to each count
// Show only `Dispatched` if it occurs one or more times
#"Grouped Rows" = Table.Group(properCase, {"Bag Number"}, {
{"Casting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Casting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Laser Cutting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Laser Cutting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Filing", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Filing"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Final Polish", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Final Polish"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"QC", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Qc"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Dispatched", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Dispatched"))
in
if x = 0 then null else "Dispatched", type text}
})
in
#"Grouped Rows"
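For readers who want to check the counting rules outside Power Query, here is a rough Python sketch of the per-bag summary. It is not a translation of the M code: unlike the query above, it stops counting at the first "Dispatched", as the question asks. The function name, the minimum parameter and the dictionary output are assumptions for illustration.

```python
def summarise_bag(stations, minimum=2):
    """Count days per station for one bag, stopping at 'Dispatched'.

    Blank cells are skipped, names are compared in proper case, and
    stations seen fewer than `minimum` times are left out.
    """
    counts = {}
    dispatched = False
    for name in stations:
        label = (name or "").strip().title()
        if not label:
            continue  # blank cell
        if label == "Dispatched":
            dispatched = True
            break  # nothing after dispatch is counted
        counts[label] = counts.get(label, 0) + 1
    summary = {k: f"{v} days" for k, v in counts.items() if v >= minimum}
    if dispatched:
        summary["Dispatched"] = "Dispatched"
    return summary


row1 = summarise_bag(["FILING", "FILING", "FILING",
                      "FINAL POLISH", "FINAL POLISH", "FINAL POLISH"])
row3 = summarise_bag(["FINAL POLISH", "QC", "Dispatched",
                      "Dispatched", "Dispatched", "Dispatched"])
row4 = summarise_bag(["Casting", "Casting", "Laser Cutting",
                      "Filing", "Filing", "FINAL POLISH"])
row5 = summarise_bag(["Casting", "", None, "", "", ""])
```

Because a rework can send a bag back to a department, the counts here are totals per department, not lengths of consecutive stays.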

I have 3 time periods in excel - I need to know the duration of the longest continuous period

Please help!
Ideally, I would really like to solve this using formulas only - not VBA or anything I consider 'fancy'.
I work for a program that awards bonuses for continuous engagement. We have three (sometimes more) engagement time periods that could overlap and/or could have spaces of no engagement. The magic figure is 84 days of continuous engagement. We have been manually reviewing each line (hundreds of lines) to see if the time periods add up to 84 days of continuous engagement, with no periods of inactivity.
In the link there is a pic of a summary of what we work with. Row 3 for example, doesn't have 84 days in any of the 3 time periods, but the first 2 time periods combined includes 120 consecutive days. The dates will not appear in date order - e.g. early engagements may be listed in period 3.
Really looking forward to your advice.
Annie
@TomSharpe has shown you a method of solving this with formulas. You would have to modify it if you had more than three time periods.
Not sure if you would consider a Power Query solution to be "too fancy", but it does allow for an unlimited number of time periods, laid out as you show in the sample.
With PQ, we
construct lists of all the consecutive dates for each pair of start/end
combine the lists for each row, removing the duplicates
apply a gap and island technique to the resulting date lists for each row
count the number of entries for each "island" and return the maximum
Please note: I counted both the start and the end date. In your days columns, you did not (except for one instance). If you want to count both, leave the code as is; if you don't we can make a minor modification
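The four steps listed above can be sketched in a few lines of Python, which may help when reading the M code. This is only an illustration under the same convention (both start and end dates are counted); the function name and the sample periods are made up.

```python
from datetime import date, timedelta
from itertools import groupby


def longest_island(periods):
    """Longest run of consecutive engaged days over all periods.

    periods is a list of (start, end) dates in any order; overlapping
    periods are merged because each date is kept only once.
    """
    days = set()
    for start, end in periods:
        span = (end - start).days + 1  # count both start and end date
        days.update(start + timedelta(days=n) for n in range(span))
    ordered = sorted(days)
    # Gaps and islands: for consecutive dates, (day number - position)
    # stays constant, and it jumps whenever there is a gap.
    ids = [d.toordinal() - i for i, d in enumerate(ordered)]
    return max((len(list(group)) for _, group in groupby(ids)), default=0)


# Hypothetical periods, deliberately listed out of date order.
periods = [
    (date(2021, 6, 1), date(2021, 6, 10)),
    (date(2021, 1, 20), date(2021, 3, 1)),
    (date(2021, 1, 1), date(2021, 1, 31)),
]
longest = longest_island(periods)
```

Here the first island runs from 1 January to 1 March 2021 (60 days) and the second covers 10 days, so the result is below the 84-day threshold.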
To use Power Query
Create a table which excludes that first row of merged cells
Rename the table columns in the format I show in the screenshot, since each column header in a table must have a different name.
Select some cell in that Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to better understand the algorithm
M Code
code edited to Sort the date lists to handle certain cases
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Start P1", type datetime}, {"Comment1", type text}, {"End P1", type datetime}, {"Days 1", Int64.Type}, {"Start P2", type datetime}, {"Comment2", type text}, {"End P2", type datetime}, {"Days 2", Int64.Type}, {"Start P3", type datetime}, {"Comment3", type text}, {"End P3", type datetime}, {"Days 3", Int64.Type}}),
//set data types for columns 1/5/9... and 3/7/11/... as date
dtTypes = List.Transform(List.Alternate(Table.ColumnNames(#"Changed Type"),1,1,1), each {_,Date.Type}),
typed = Table.TransformColumnTypes(#"Changed Type",dtTypes),
//add Index column to define row numbers
rowNums = Table.AddIndexColumn(typed,"rowNum",0,1),
//Unpivot except for rowNum column
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(rowNums, {"rowNum"}, "Attribute", "Value"),
//split the attribute column to filter on Start/End => just the dates
//then filter and remove the attributes columns
#"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Attribute.1", "Attribute.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Attribute.2"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Attribute.1] = "End" or [Attribute.1] = "Start")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Attribute.1"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns1",{{"Value", type date}, {"rowNum", Int64.Type}}),
//group by row number
//generate date list from each pair of dates
//combine into a single list of dates with no overlapped date ranges for each row
#"Grouped Rows" = Table.Group(#"Changed Type2", {"rowNum"}, {
{"dateList", (t)=> List.Sort(
List.Distinct(
List.Combine(
List.Generate(
()=>[dtList=List.Dates(
t[Value]{0},
Duration.TotalDays(t[Value]{1}-t[Value]{0})+1 ,
#duration(1,0,0,0)),idx=0],
each [idx] < Table.RowCount(t),
each [dtList=List.Dates(
t[Value]{[idx]+2},
Duration.TotalDays(t[Value]{[idx]+3}-t[Value]{[idx]+2})+1,
#duration(1,0,0,0)),
idx=[idx]+2],
each [dtList]))))}
}),
//determine Islands and Gaps
#"Expanded dateList" = Table.ExpandListColumn(#"Grouped Rows", "dateList"),
//Duplicate the date column and turn it into integers
#"Duplicated Column" = Table.DuplicateColumn(#"Expanded dateList", "dateList", "dateList - Copy"),
#"Changed Type3" = Table.TransformColumnTypes(#"Duplicated Column",{{"dateList - Copy", Int64.Type}}),
//add an Index column
//Then subtract the index from the integer date
// if the dates are consecutive the resultant ID column will => the same value, else it will jump
#"Added Index" = Table.AddIndexColumn(#"Changed Type3", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "ID", each [#"dateList - Copy"]-[Index]),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"dateList - Copy", "Index"}),
//Group by the date ID column and a Count will => the consecutive days
#"Grouped Rows1" = Table.Group(#"Removed Columns2", {"rowNum", "ID"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Removed Columns3" = Table.RemoveColumns(#"Grouped Rows1",{"ID"}),
//Group by the Row number and return the Maximum Consecutive days
#"Grouped Rows2" = Table.Group(#"Removed Columns3", {"rowNum"}, {{"Max Consecutive Days", each List.Max([Count]), type number}}),
//combine the Consecutive Days column with original table
result = Table.Join(rowNums,"rowNum",#"Grouped Rows2","rowNum"),
#"Removed Columns4" = Table.RemoveColumns(result,{"rowNum"})
in
#"Removed Columns4"
Unfortunately Gap and Island seems to be a non-starter with formulas, because I don't think you can use it without either VBA or a lot of helper columns, plus the start dates need to be in order. It's a pity, because the longest continuous time on task (AKA the largest island) drops out of the VBA version very easily, and arguably it's easier to understand than the array formula versions below.
Moving on to option 2: if you have Excel 365, you can use SEQUENCE to generate a list of dates in a certain range, then check whether each of them falls in one of the periods of engagement, like this:
=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
SUM(--(period1+period2+period3>0)))
assuming that Z1 and Z2 contain the start and end of the range of dates that you're interested in (I've used 1/1/21 and 31/7/21).
If you don't have Excel 365, you can use the ROW function to generate the list of dates instead. I suggest using the Name Manager to create a named range Dates:
=INDEX(Sheet1!$A:$A,Sheet1!$Z$1):INDEX(Sheet1!$A:$A,Sheet1!$Z$2)
Then the formula is:
= SUM(--(((ROW(Dates)>=A3) * (ROW(Dates)<=C3) +( ROW(Dates)>=E3) * (ROW(Dates)<=G3) + (ROW(Dates)>=I3) * (ROW(Dates)<=K3))>0))
You will probably have to enter this using Ctrl+Shift+Enter, or use SUMPRODUCT instead of SUM.
EDIT
As @Qualia has perceptively noted, you want the longest time of continuous engagement. This can be found by applying FREQUENCY to the first formula:
=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
onDays,period1+period2+period3>0,
MAX(FREQUENCY(IF(onDays,array),IF(NOT(onDays),array)))
)
and the non-365 version becomes:
=MAX(FREQUENCY(IF((ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3),ROW(Dates)),
IF( NOT( (ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3) ),ROW(Dates))))
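The FREQUENCY trick counts the "on" days that fall between consecutive "off" days. The same result can be reached by walking through the date range one day at a time and tracking the current run, as in this Python sketch (the function name and sample dates are assumptions; the window plays the role of Z1 and Z2):

```python
from datetime import date, timedelta


def longest_run(periods, window_start, window_end):
    """Longest run of engaged days inside the window, end dates included."""
    best = current = 0
    day = window_start
    while day <= window_end:
        if any(start <= day <= end for start, end in periods):
            current += 1              # still inside an "on" stretch
            best = max(best, current)
        else:
            current = 0               # an "off" day ends the run
        day += timedelta(days=1)
    return best


# Hypothetical periods; two of them overlap.
periods = [
    (date(2021, 1, 1), date(2021, 1, 31)),
    (date(2021, 1, 20), date(2021, 3, 1)),
    (date(2021, 6, 1), date(2021, 6, 10)),
]
full = longest_run(periods, date(2021, 1, 1), date(2021, 7, 31))
clipped = longest_run(periods, date(2021, 2, 1), date(2021, 7, 31))
```

As with the formulas, engagement outside the chosen window is ignored, so the window needs to cover every period you care about.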

How to get multiple substrings in Microsoft Excel Cell

I'm trying to get from a cell just the values of the 'id' tag, separated by ';'.
The data is as follows:
Cell:
A1: {"id":1145585,"label":"1145585: Project Z"}
A2: {"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}
A3: {"id":1149240,"label":"1149240: Analysis of Technical Options"}|{"id":1149258,"label":"1149258: Check and Report"}
A4: {"id":1148925,"label":"1148925: Change Management Review"}|{"id":1148920,"label":"1148920: Follow-Up Meetings"}|{"id":1148923,"label":"1148923: Launch Date Definition"}
I have tried to use the LEFT, MID and FIND functions, but the number of 'IDs' can vary from 1 to 1000. I'm also trying to avoid using VBA, but it seems to be the only option. So any solution is great!
The result should be:
Cell:
A1: 1145585
A2: 1150322;1150365
A3: 1149240;1149258
A4: 1148925;1148920;1148923
Any ideas?
Thanks!
Sounds like a task for #powerquery. Please refer to this article to find out how to use Power Query on your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration is using Excel 2016.
The steps are:
Load the source data to the Power Query editor;
Use the Index Column function under the Add Column tab to add an Index column;
Use the Split Column function under the Transform tab to split the column by the custom delimiter "id": and put the results into Rows;
Use the Extract function under the Transform tab to extract the first 7 characters of the column;
Change the Data Type to Whole Number, remove Errors, and then change the Data Type back to Text;
Use the Group By function under the Transform tab to group Column1 by Index. Don't panic if the result is an error, as that is expected;
Go back to the last step and replace the original formula in the formula bar with the following one, because Text.Combine is not one of the operations offered in the Group By dialog:
= Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
Close & Load the output to a new worksheet (the default).
Here is the Power Query M code behind the scenes. Most of the steps are performed using built-in functions, except the last step of manually replacing the formula with the correct one. Let me know if you have any questions. Cheers :)
let
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Column1", Splitter.SplitTextByDelimiter("""id"":", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1", type text}}),
#"Extracted First Characters" = Table.TransformColumns(#"Changed Type1", {{"Column1", each Text.Start(_, 7), type text}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Extracted First Characters",{{"Column1", Int64.Type}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type2", {"Column1"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Removed Errors",{{"Column1", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
in
#"Grouped Rows"
Based on @TerryW's comment, here is a solution using the FILTERXML function available in Excel 2013+. But it also requires TEXTJOIN, which did not appear until later versions of Excel 2016 (and Office 365).
It relies on the fact that the id string is always followed by a comma.
A disadvantage is that FILTERXML will return the numeric ids as numeric values, so leading zeros will be dropped. If there is always a fixed number of digits in the id and leading zeros need to be present, this can be mitigated by using the TEXT function.
We construct an XML string by splitting on both "id": and the comma.
We then use an XPath to return the node which follows the node that contains id.
=TEXTJOIN(";",TRUE,FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A1,"""id"":",",id,"),",","</s><s>")&"</s></t>","//s[text()='id']/following-sibling::*[1]"))
Since this is an array formula, you need to "confirm" it by holding down ctrl + shift while hitting enter. If you do this correctly, Excel will place braces {...} around the formula as observed in the formula bar
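To check the expected output independently of Excel, the extraction can be sketched in Python. This assumes every cell is a list of JSON objects separated by "|", as in the sample data, and that no label contains a "|" character; the function name is made up.

```python
import json


def extract_ids(cell):
    """Return the 'id' values of a '|'-separated list of JSON objects,
    joined with ';'."""
    records = [json.loads(part) for part in cell.split("|") if part.strip()]
    return ";".join(str(record["id"]) for record in records)


a1 = '{"id":1145585,"label":"1145585: Project Z"}'
a2 = ('{"id":1150322,"label":"1150322: Project Waka 1"}'
      '|{"id":1150365,"label":"1150365: Project Waka 2"}')

result_a1 = extract_ids(a1)
result_a2 = extract_ids(a2)
```

Parsing the JSON avoids depending on the ids having exactly 7 digits, which the "first 7 characters" step in the Power Query answer relies on.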

Power Query Adding a row that sums up previous columns

I'm trying to create a query that sums up a column of values and puts the sum as a new row in the same table. I know I can do this using the group function but it doesn't do it exactly as I need it to do. I'm trying to create an accounting Journal Entry and I need to calculate the offsetting for a long list of debits. I know this is accountant talk. Here's a sample of the table I am using.
Date   GL Num   GL Name   Location   Amount
1/31   8000     Payroll   Office     7000.00
1/31   8000     Payroll   Remote     1750.00
1/31   8000     Payroll   City       1800.00
1/31   8010     Taxes     Office     600.00
1/31   8010     Taxes     Remote     225.00
1/31   8010     Taxes     City       240.00
1/31   3000     Accrual   All        (This needs to be the negative sum of all other rows)
I have been using the Group By functions and grouping by Date with the result being the sum of Amount but that eliminates the previous rows and the four columns except Date. I need to keep all rows and columns, putting the sum in the same Amount column if possible. If the sum has to be in a new column, I can work with that as long as the other columns and rows remain. I also need to enter the GL Num, GL Name, and Location values for this sum row. These three values will not change. They will always be 3000, Accrual, All. The date will change based upon the date used in the actual data. I would prefer to do this all in Power Query (Get & Transform) if possible. I can do it via VBA but I'm trying to make this effortless for others to use.
What you can do is calculate the accrual rows in a separate query and then append them.
Duplicate your query.
Group by Date and sum over Amount. This should return the following:
Date Amount
1/31 11615
Multiply your Amount column by -1. (Transform > Standard > Multiply)
Add custom columns for GL Num, GL Name and Location with the fixed values you choose.
Date Amount GL Num GL Name Location
1/31 -11615 3000 Accrual All
Append this table to your original. (Home > Append Queries.)
You can also roll this all up into a single query like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
OriginalTable = Table.TransformColumnTypes(Source,{{"Date", type date}, {"GL Num", Int64.Type}, {"GL Name", type text}, {"Location", type text}, {"Amount", type number}}),
#"Grouped Rows" = Table.Group(OriginalTable, {"Date"}, {{"Amount", each List.Sum([Amount]), type number}}),
#"Multiplied Column" = Table.TransformColumns(#"Grouped Rows", {{"Amount", each _ * -1, type number}}),
#"Added Custom" = Table.AddColumn(#"Multiplied Column", "GL Num", each 3000),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "GL Name", each "Accrual"),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Location", each "All"),
#"Appended Query" = Table.Combine({OriginalTable, #"Added Custom2"})
in
#"Appended Query"
Note that we are appending the last step to an earlier step of the same query (OriginalTable) instead of referencing a different query.

Row totals based on column name in PowerQuery

I have a data file with around 400 columns in it. I need to import this data into PowerPivot. In order to reduce my file size, I would like to use PowerQuery to create 2 different row totals, and then delete all my unneeded columns upon load.
While my first row total column (RowTotal1) would sum all 400 columns, I would also like a second row total (RowTotal2) that subtracts from RowTotal1 any column whose name contains the text "click".
Secondly, I would like to use the value in my Country column as a variable, to also subtract any column whose name contains that value, e.g.
Site   Country   Col1   Col2   ClickCol1   Col3   Germany   RowTotal1   RowTotal2
1a     USA       2      4      8           16     24        54          46
2a     Germany   2      4      8           16     24        54          22
RowTotal1 = 2 + 4 + 8 + 16 + 24
RowTotal2 (first row) = 54 - 8 (ClickCol1)
RowTotal2 (second row) = 54 - 24 (Germany) - 8 (ClickCol1)
Is this possible? (EDIT: Yes. See answer below)
REVISED QUESTION: Is there a more memory-efficient way to do this than trying to group 300+ million rows at once?
Code would look something like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Site", type text}, {"Country", type text}, {"Col1", Int64.Type}, {"Col2", Int64.Type}, {"ClickCol1", Int64.Type}, {"Col3", Int64.Type}, {"Germany", Int64.Type}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Country", "Site"}, "Attribute", "Value"),
#"Added Conditional Column" = Table.AddColumn(#"Unpivoted Other Columns", "Value2", each if [Country] = [Attribute] or [Attribute] = "ClickCol1" then 0 else [Value] ),
#"Grouped Rows" = Table.Group(#"Added Conditional Column", {"Site", "Country"}, {{"RowTotal1", each List.Sum([Value]), type number},{"RowTotal2", each List.Sum([Value2]), type number}})
in
#"Grouped Rows"
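The conditional step above only excludes the single column named "ClickCol1". If the goal is to exclude every column whose name contains "click", one possible variant is sketched below. It is an untested-at-scale sketch, not a definitive implementation: the inline #table sample and the step names Unpivoted, AddedValue2 and Grouped are made up for illustration, and the Country test is kept as an exact match on the column name, as in the query above.

```powerquery
let
    // Hypothetical inline sample mirroring the layout in the question
    Source = #table(
        {"Site", "Country", "Col1", "Col2", "ClickCol1", "Col3", "Germany"},
        {
            {"1a", "USA", 2, 4, 8, 16, 24},
            {"2a", "Germany", 2, 4, 8, 16, 24}
        }),
    // Turn every column except Site and Country into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Country", "Site"}, "Attribute", "Value"),
    // Zero the value when the column name equals the row's Country
    // or contains "click" in any letter case
    AddedValue2 = Table.AddColumn(Unpivoted, "Value2",
        each if [Attribute] = [Country]
                or Text.Contains([Attribute], "click", Comparer.OrdinalIgnoreCase)
             then 0
             else [Value],
        type number),
    // RowTotal1 sums everything, RowTotal2 sums only the values that were kept
    Grouped = Table.Group(AddedValue2, {"Site", "Country"},
        {
            {"RowTotal1", each List.Sum([Value]), type number},
            {"RowTotal2", each List.Sum([Value2]), type number}
        })
in
    Grouped
```

With the two sample rows from the question, this should give RowTotal1 = 54 for both sites, and RowTotal2 = 46 for site 1a and 22 for site 2a, matching the expected values in the question.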
But since you have a lot of columns, I should explain the steps:
(Assuming you have the data in an Excel file) Import it into Power Query
Select "Site" and "Country" columns (with Ctrl), right click > Unpivot Other Columns
Add Column with this formula (you might need to use Advanced Editor): Table.AddColumn(#"Unpivoted Other Columns", "Value2", each if [Country] = [Attribute] or [Attribute] = "ClickCol1" then 0 else [Value])
Select Site and Country columns, Right Click > Group By
In the Group By dialog, add two aggregations: RowTotal1 as the Sum of Value, and RowTotal2 as the Sum of Value2.

Resources