I have a spreadsheet that has several columns. I'm only going to show data from 2 of them here, because they're the 2 that I'm dealing with in this problem.
The first column is IP Addresses. The second column is how long ago the last response was or the last response date:
Address | Last Response
10.1.1.109 | 10/17/2022
10.1.1.113 | 10/17/2022
10.1.1.137 | 10/17/2022
10.1.1.188 | 4 days
10.1.1.199 | 10/17/2022
10.1.21.5 | 10/17/2022
10.1.21.50 | 45 days
10.1.50.41 | 10/17/2022
10.1.50.71 | 10/17/2022
10.1.88.10 | 10/17/2022
10.1.88.249 | 6 days
10.16.6.190 | 4 days
10.64.0.76 | 28 days
10.64.3.48 | 45 days
What I need is a few counts. For each IP subnet, I want to know how many addresses have:
A response older than 1 week
A response older than 1 month
In the sample data you can see 3 IP subnets: 10.1, 10.16, and 10.64. I am expecting results like:
IP Subnet | > Week | > Month
10.1 | 9 | 1
10.16 | 0 | 0
10.64 | 2 | 1
I have a formula for the "> Week", but I don't like it. I am not able to figure out how to count based on the number at the beginning of the text in that column. I tried a formula like this:
=COUNTIFS(AllIPAddresses,"10.1.*",AllResponses, NUMBERVALUE(LEFT(AllResponses, FIND(" ",AllResponses)))&">7")
Obviously this doesn't work; it gives me a column full of 0s.
What I have working for the "> Week" column:
=COUNTIFS(AllIPAddresses,CONCAT(A2,"*"),AllResponses,"<>7 days",AllResponses,"<>6 days",AllResponses,"<>5 days",AllResponses,"<>4 days",AllResponses,"<>3 days",AllResponses,"<>2 days",AllResponses,"<>Today",AllResponses,"<>Yesterday")
But as I said, I don't like it: it counts everything except 8 specific values instead of counting the entries I actually want. I would prefer a formula that counts the entries whose number of days is greater than 7. Something simple would be great, but anything shorter or simpler than what I have would help. I also cannot reuse this approach for the "> Month" result, because I would have to list some 30 values to exclude.
It would be better for the formula to match the one condition I do want.
I'm hoping for something like the formula below, where the first COUNTIFS counts the text entries whose number is > 7 and the second COUNTIFS counts the dates that are more than 7 days before today:
=COUNTIFS(AllIPAddresses, CONCAT(A2,"*"),AllResponses, LEFT(AllResponses,2)&">7")+COUNTIFS(AllIPAddresses, CONCAT(A2,"*"),AllResponses,"<"&TODAY()-7)
I could then reuse it for "> Month" by changing the 7 to a 30, though I know this formula doesn't work.
Any assistance with this problem would be appreciated!
Some Notes about my formulas
For ease of use I have named ranges:
AllIPAddresses = A2:A700
AllResponses = B2:B700
In my "> Week" formula, A2 contains "10.1.", so CONCAT(A2,"*") passes "10.1.*" to COUNTIFS.
EDIT
I have added an answer that explains why I chose the solution that I did and how I had to tweak the answers I received to make them work for my specific scenario.
This could be accomplished using:
=LET(data,A2:B15,
_d1,INDEX(data,,1),
_d2,INDEX(data,,2),
lr,TODAY()-IF(ISNUMBER(_d2),_d2,TODAY()-(LEFT(_d2,LEN(_d2)-LEN(" days")))),
lft,TEXTBEFORE(_d1,".",2),
unq,UNIQUE(lft),
sq,SEQUENCE(COUNTA(lft),,1,0),
mm,--(TRANSPOSE(unq)=lft),
wk,MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>7),sq),
mn,MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>30),sq),
stack,HSTACK(unq,wk,mn),
VSTACK({"IP Subnet","> Week","> Month"},stack))
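lft uses TEXTBEFORE to get the first 2 sections of the IP address (the subnet).
lr calculates how many days ago the last response was: for a date it subtracts the date from today, and for text such as "4 days" it returns the number.
unq is the unique values of lft (the IP subnets).
wk builds a matrix with one row per unique subnet and one column per address, containing 1 where the address belongs to that subnet and lr is greater than 7. MMULT with sq (a column of ones) sums each row, giving the count per subnet.
mn is the same as wk, but where lr is greater than 30.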
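To see why multiplying by a column of ones produces counts, here is a small pure-Python sketch of the same matrix trick. It is an illustration only: it assumes the subnet and day values (lft and lr in the formula) are already computed, the function names are hypothetical, and the 10/17/2022 dates are assumed to be 20 days old so the result matches the expected table.

```python
def mmult(a, b):
    """Plain matrix product, like Excel's MMULT."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def count_by_subnet(subnets, days, threshold):
    """Count, per unique subnet, the entries whose age in days exceeds threshold."""
    unq = list(dict.fromkeys(subnets))  # like UNIQUE(lft): first-seen order
    # One row per unique subnet, one column per address:
    # 1 if the address is in that subnet AND its age is above the threshold.
    matrix = [[int(s == u and d > threshold) for s, d in zip(subnets, days)]
              for u in unq]
    sq = [[1] for _ in subnets]  # column of ones, like SEQUENCE(n,,1,0)
    counts = mmult(matrix, sq)   # multiplying by ones sums each row
    return {u: row[0] for u, row in zip(unq, counts)}


# Sample data from the question; the 10/17/2022 dates are assumed to be 20 days old.
subnets = ["10.1"] * 11 + ["10.16", "10.64", "10.64"]
days = [20, 20, 20, 4, 20, 20, 45, 20, 20, 20, 6, 4, 28, 45]
print(count_by_subnet(subnets, days, 7))   # {'10.1': 9, '10.16': 0, '10.64': 2}
print(count_by_subnet(subnets, days, 30))  # {'10.1': 1, '10.16': 0, '10.64': 1}
```

The same matrix is reused with a different threshold, which is why the formula only needs to change 7 to 30 for the "> Month" column.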
Perhaps you could try:
• Formula used in cell D2
=LET(_ipaddress,TEXTBEFORE(Address,".",2),
_days,IFERROR(TODAY()-ISNUMBER(Last_Response)*Last_Response,
SUBSTITUTE(Last_Response," days","")+0),
_uip,UNIQUE(_ipaddress),
_week,BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>7)))),
_month,BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>30)))),
VSTACK({"IP Subnet","> Week","> Month"},HSTACK(_uip,_week,_month)))
Explanation of each named variable used in the calculation:
• Address --> a defined name that refers to
=$A$2:$A$15
• Last_Response --> a defined name that refers to
=$B$2:$B$15
• _ipaddress --> extracts the IP Subnet using TEXTBEFORE()
TEXTBEFORE(Address,".",2)
• _days --> converts each Last Response into a number of days. Excel stores dates as numbers, so ISNUMBER() returns TRUE for dates and FALSE for text values.
For a date, the first argument of IFERROR() returns the number of days between that date and today:
TODAY()-ISNUMBER(Last_Response)*Last_Response
For a text value such as "4 days", multiplying FALSE by the text produces a #VALUE! error, so IFERROR() falls back to its second argument, which removes " days" and converts the rest to a number:
SUBSTITUTE(Last_Response," days","")+0
Together:
=IFERROR(TODAY()-ISNUMBER(Last_Response)*Last_Response,
SUBSTITUTE(Last_Response," days","")+0)
• _uip --> This gives us the Unique IP SUBNET
UNIQUE(_ipaddress)
• _week --> for each unique subnet (BYROW() passes them one at a time as x), counts the addresses in that subnet whose _days is greater than 7, returning an array of counts.
BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>7))))
• _month --> the same count, but for _days greater than 30.
BYROW(_uip,LAMBDA(x,SUM(--(x=_ipaddress)*(_days>30))))
• Finally, HSTACK() places the output columns side by side:
HSTACK(_uip,_week,_month)
To add a proper header row, the result is wrapped in VSTACK() together with the 1x3 header array {"IP Subnet","> Week","> Month"}.
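A minimal Python sketch of the _days conversion and the BYROW() count, for readers who want to check the logic outside Excel. It assumes dates arrive as datetime.date objects and all other values are text ending in " days"; the function names and the fixed "today" are hypothetical.

```python
from datetime import date


def days_since(value, today):
    """Mirror _days: a date becomes 'days before today', text like '4 days' becomes 4.

    Assumption: dates arrive as datetime.date objects and all other values are
    text ending in " days".
    """
    if isinstance(value, date):  # the ISNUMBER() branch: dates are numbers in Excel
        return (today - value).days
    # The IFERROR() fallback: strip " days" and convert the rest to a number.
    return int(value.replace(" days", ""))


def byrow_counts(subnets, days, threshold):
    """Mirror BYROW(_uip, LAMBDA(x, SUM(--(x=_ipaddress)*(_days>threshold))))."""
    unique_subnets = list(dict.fromkeys(subnets))
    return [sum((x == s) * (d > threshold) for s, d in zip(subnets, days))
            for x in unique_subnets]


today = date(2022, 11, 6)  # hypothetical "today" so the result is reproducible
print(days_since(date(2022, 10, 17), today))  # 20
print(days_since("45 days", today))           # 45
```

Note that, like the formula, this sketch does not handle values such as "Today" or "Yesterday".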
You can also do this quite easily with Power Query.
To do this in Power Query, follow these steps:
• Select any cell in your data table.
• Data tab => Get & Transform => From Table/Range.
• When the Power Query Editor opens: Home => Advanced Editor.
• Make a note of the table name (the code below assumes it is IPAddresstbl).
• Paste the M code below in place of what you see.
• Refer to the comments in the code.
let
//IPAddresstbl Uploaded in PQ Editor,
Source = Excel.CurrentWorkbook(){[Name="IPAddresstbl"]}[Content],
//Data Type Changed
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Last Response", type text}}),
//Extracting the IP SUBNET
#"Extracted Text Before Delimiter" = Table.TransformColumns(#"Changed Type", {{"Address", each Text.BeforeDelimiter(_, ".", 1), type text}}),
//Replacing " days" in last response column
#"Replaced Value" = Table.ReplaceValue(#"Extracted Text Before Delimiter"," days","",Replacer.ReplaceText,{"Last Response"}),
//Removing " 12:00:00 AM" from Date Time since we changed the data type of lastresponse as text
#"Extracted Text Before Delimiter1" = Table.TransformColumns(#"Replaced Value", {{"Last Response", each Text.BeforeDelimiter(_, " 12:00:00 AM"), type text}}),
//Adding custom column return the numbers of days
#"Added Custom" = Table.AddColumn(#"Extracted Text Before Delimiter1", "Custom", each if Value.Is(Value.FromText([Last Response]), type number) then [Last Response] else Date.From(DateTime.LocalNow()) - Value.FromText([Last Response])),
//Changing the data type of the custom column to ensure they are numbers
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Int64.Type}}),
//Removing unwanted columns
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Last Response"}),
//Returning 1 for those days which are more than 7 else returning as 0
#"Added Conditional Column" = Table.AddColumn(#"Removed Columns", "> Week", each if [Custom] > 7 then 1 else 0),
//Returning 1 for those days which are more than 30 else returning as 0
#"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "> Month", each if [Custom] > 30 then 1 else 0),
//Grouping by each IP Address
#"Grouped Rows" = Table.Group(#"Added Conditional Column1", {"Address"}, {{"> Week", each List.Sum([#"> Week"]), type nullable number}, {"> Month", each List.Sum([#"> Month"]), type nullable number}}),
//Renamed the IP Address Column
#"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"Address", "IP SUBNET"}})
in
#"Renamed Columns"
• Rename the query to SUBNETtbl before loading it back into Excel.
• When loading, you can either choose Existing worksheet and pick the cell where the table should go, or simply choose New worksheet.
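Every answer extracts the subnet as the text before the second ".". Excel's TEXTBEFORE counts delimiter instances from 1, while Power Query's Text.BeforeDelimiter uses a 0-based index, which is why the formulas pass 2 and the M code passes 1. A small Python sketch of the 1-based behaviour (the function name is hypothetical):

```python
def text_before(text, delimiter, instance):
    """Return the text before the instance-th delimiter (1-based, like TEXTBEFORE)."""
    parts = text.split(delimiter)
    if instance < 1 or instance >= len(parts):
        # Excel's TEXTBEFORE returns an error here (#N/A when the delimiter is not found).
        raise ValueError("delimiter instance not found")
    return delimiter.join(parts[:instance])


print(text_before("10.1.1.109", ".", 2))  # 10.1
print(text_before("10.64.3.48", ".", 2))  # 10.64
```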
Answer chosen information
I chose the answer by P.b because I was able to get it working for me first. I did have to tweak the code he provided, like this:
=LET(data,AllData, _d1,INDEX(data,,1)
, _d2, INDEX(data,,4)
, lr, TODAY()-IF(ISNUMBER(_d2),_d2,TODAY()-(IF(_d2="Today","0",IF(_d2="Yesterday","1",LEFT(_d2,LEN(_d2)-LEN(" days"))))))
, lft, TEXTBEFORE(_d1,".",2)
, unq, UNIQUE(lft)
, sq, SEQUENCE(COUNTA(lft),,1,0)
, mm, --(TRANSPOSE(unq)=lft)
, wk, MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>7),sq)
, mn, MMULT(--(TRANSPOSE(lft)=unq)*TRANSPOSE(lr>30),sq)
, stack, HSTACK(unq, wk, mn)
, VSTACK({"IP Subnet", "> Week", "> Month"}, stack))
This is what I got working for me. It gives me the list of IP subnets and, for each one, how many IPs have a Last Response older than 7 days or 30 days.
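The tweak adds "Today" and "Yesterday" handling to lr. A minimal Python sketch of that conversion, assuming dates arrive as datetime.date objects and all other text is "Today", "Yesterday" or "N days"; the names and the fixed date are hypothetical:

```python
from datetime import date

# "Today"/"Yesterday" handling, mirroring the nested IF in the tweaked formula.
WORD_DAYS = {"Today": 0, "Yesterday": 1}


def response_age(value, today):
    """Days since the last response, for a date, "Today", "Yesterday" or "N days"."""
    if isinstance(value, date):
        return (today - value).days
    if value in WORD_DAYS:
        return WORD_DAYS[value]
    # LEFT(_d2, LEN(_d2) - LEN(" days")): keep everything before " days".
    return int(value[:len(value) - len(" days")])


today = date(2022, 11, 6)  # hypothetical fixed date
print([response_age(v, today) for v in ["Today", "Yesterday", "12 days", date(2022, 10, 17)]])
# [0, 1, 12, 20]
```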
I also used the Power Query example provided by Mayukh Bhattacharya and got it working as well. I did not test his LET formula, since I already had a LET formula working; I chose the other answer only because I got it working first. I did have to tweak his Power Query code too, so that it looks like this:
let
//AllData Uploaded in PQ Editor,
Source = Excel.CurrentWorkbook(){[Name="AllData"]}[Content],
//Date Type Changed
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Last Response", type text}}),
//Extracting the IP SUBNET
#"Extracted Text Before Delimiter" = Table.TransformColumns(#"Changed Type", {{"Address", each Text.BeforeDelimiter(_, ".", 1), type text}}),
//Replacing " days" in last response column
#"Replaced Value 1" = Table.ReplaceValue(#"Extracted Text Before Delimiter"," days","",Replacer.ReplaceText,{"Last Response"}),
//Replacing the "Today" and "Yesterday" with numbers
#"Replaced Value 2" = Table.ReplaceValue(#"Replaced Value 1", "Yesterday", "1", Replacer.ReplaceText, {"Last Response"}),
#"Replaced Value 3" = Table.ReplaceValue(#"Replaced Value 2", "Today", "0", Replacer.ReplaceText, {"Last Response"}),
//Removing " 12:00:00 AM" from Date Time since we changed the data type of lastresponse as text
#"Extracted Text Before Delimiter1" = Table.TransformColumns(#"Replaced Value 3", {{"Last Response", each Text.BeforeDelimiter(_, " 12:00:00 AM"), type text}}),
//Adding custom column return the numbers of days
#"Added Custom" = Table.AddColumn(#"Extracted Text Before Delimiter1", "Custom", each if Value.Is(Value.FromText([Last Response]), type number) then [Last Response] else Date.From(DateTime.LocalNow()) - Value.FromText([Last Response])),
//Changing the data type of the custom column to ensure they are numbers
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Int64.Type}}),
//Removing unwanted columns
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Last Response"}),
//Returning 1 for those days which are more than 7 else returning as 0
#"Added Conditional Column" = Table.AddColumn(#"Removed Columns", "> Week", each if [Custom] > 7 then 1 else 0),
//Returning 1 for those days which are more than 30 else returning as 0
#"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "> Month", each if [Custom] > 30 then 1 else 0),
//Grouping by each IP Address
#"Grouped Rows" = Table.Group(#"Added Conditional Column1", {"Address"}, {{"> Week", each List.Sum([#"> Week"]), type nullable number}, {"> Month", each List.Sum([#"> Month"]), type nullable number}}),
//Renamed the IP Address Column
#"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"Address", "IP Subnet"}})
in
#"Renamed Columns"
Now that I have started using Power Query I will probably continue with it as I have more transformations on this data to work on.
Using the new Excel 365 functions, generate the unique subnets and count the entries older than 7 and 30 days. If the response is a date, take its difference from today; otherwise, extract the number of days:
=LET(IP,TEXTBEFORE(AllIPAddresses,".",2),
U,UNIQUE(IP),
S,SEQUENCE(COUNTA(AllIPAddresses),,1,0),
D,IFERROR(TODAY()-AllResponses,TEXTBEFORE(AllResponses," ")*1),
W,MMULT((TRANSPOSE(IP)=U)*(TRANSPOSE(D)>7),S),
M,MMULT((TRANSPOSE(IP)=U)*(TRANSPOSE(D)>30),S),
HSTACK(U,W,M))
HSTACK combines the result columns.
I keep trying ways to fix a dataset, but keep running into problems because of how inconsistent it is.
Here's what the data looks like:
Entry1
Age | 45
Occupation | Scientist
Phone Number | 408-283-3721
User I.D. | 390842
Housing Type | Condo
Square Footage | 1073.29
Floors | 2
Bathrooms | 2.5
Budget Max | $289,287
Household Size | 3
Pets? | Yes
Entry2
Floors | 2
Square Footage | 1974.19
User I.D. | 379733
Phone Number | 312-246-9121
Pets? | No
Budget Max | $481,621
Household Size | 4
Bathrooms | 3
Housing Type | Apartment
Occupation | Pilot
Age | 32
Entry3
User I.D. | 379621
Floors | 1
Square Footage | 1223.12
Pets? | No
Occupation | Managing Director
Budget Max | $402,342
Phone Number | 714-343-1358
Household Size | 2
Age | 31
Bathrooms | 2
Housing Type | House
I want to create a new, cleaned dataset with headers along the top (e.g. "Age", "Occupation", etc) and the values associated (to the right of each variable name cell) as the row, underneath each column.
The variable names are all mixed up and are not always in the same column or relative row, so it's not just a matter of transposing into a clean new dataset; I also have to find the right values depending on where each variable is. I'm thinking of something like .Cells.Find(What:="the variable name") for each one, and somehow returning the value next to it in a loop. There's also the issue that some entries have 3 rows and 8 columns and others 4 rows and 6 columns (and not all rows are full). I also struggle with placing the values under the appropriate column header without replacing the previous value (i.e. writing to the next cell down each time rather than overwriting the same cell).
There are over 400 records like this, so doing it manually would be super tedious. I'm fairly certain these are all the variations though.
Loop through the data row by row.
If only the first column has data, the row is the header of an entry. Write that to column A of a new workbook:
Entry Name
Entry1
Then go to the next row. If two or more columns have data, it is a data row belonging to the previous entry. Data rows contain data in blocks of 2 cells, where the first cell is the data description and the second cell is the data value.
So you need to loop through the columns of the data rows in blocks of 2:
Take the first block, which is Age | 45.
Check if the column Age exists. Here it does not, so we name the next free column Age and fill in the value for the last entry:
Entry Name | Age
Entry1 | 45
Then we move on to the next block, Occupation | Scientist, and do the same. Does a column Occupation exist? No, so insert it in the next free column:
Entry Name | Age | Occupation
Entry1 | 45 | Scientist
We do this until the entire row is done, then we move over to the next one and if this is a data row too, we keep going until we find a new entry header.
So after the first entry your data would look like this:
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max | Household Size | Pets?
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | 3 | Yes
Then you move over to the next entry
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max | Household Size | Pets?
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | 3 | Yes
Entry2
The first data pair here is Floors | 2, so search the first row for Floors. It is found in column 8, so write 2 into column 8 of the Entry2 row:
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max | Household Size | Pets?
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | 3 | Yes
Entry2 |  |  |  |  |  |  | 2
If you keep that going you have cleaned up data in the end.
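The row-by-row algorithm described above could be sketched in Python as follows. It illustrates the logic only and is not a VBA implementation; it assumes each sheet row is a list of cell values with None for empty cells and that the first row is an entry header. The function name is hypothetical.

```python
def reshape(rows):
    """Turn 'EntryN' header rows followed by label/value rows into one table.

    Assumption: each row is a list of cells with None for empty cells; a row
    whose only filled cell is the first one is an entry header.
    """
    headers = ["Entry Name"]
    records = []
    for row in rows:
        filled = [c for c in row if c not in (None, "")]
        if len(filled) == 1 and row[0] not in (None, ""):
            records.append({"Entry Name": row[0]})  # start a new entry row
            continue
        # Data row: walk the cells in blocks of 2 (label, value).
        for i in range(0, len(row) - 1, 2):
            label, value = row[i], row[i + 1]
            if label in (None, ""):
                continue
            if label not in headers:      # first time we see this label: next free column
                headers.append(label)
            records[-1][label] = value    # write into the current entry's row
    # Output grid: header row, then one row per entry (None where a value is missing).
    return [headers] + [[rec.get(h) for h in headers] for rec in records]


rows = [
    ["Entry1", None, None, None],
    ["Age", 45, "Occupation", "Scientist"],
    ["Entry2", None, None, None],
    ["Occupation", "Pilot", "Age", 32],
    ["Floors", 2, None, None],
]
for line in reshape(rows):
    print(line)
# ['Entry Name', 'Age', 'Occupation', 'Floors']
# ['Entry1', 45, 'Scientist', None]
# ['Entry2', 32, 'Pilot', 2]
```

Because values are looked up by label rather than by position, the order of pairs within an entry does not matter.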
If your real data corresponds to your example, where all the parameters are spelled identically, you can do this using Power Query.
If there are variations in your data that this table doesn't show, examples of these variations would be needed to craft a better solution.
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
Select My data does NOT have headers
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code (Modified to deal with missing Parameter Values)
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
//Add grouping column Entries and Unpivot
#"Added Custom" = Table.AddColumn(Source, "Entry", each
if Text.StartsWith([Column1],"entry",Comparer.OrdinalIgnoreCase) then [Column1] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Entry"}),
//Remove extra entry rows
remRows = Table.SelectRows(#"Filled Down", each [Entry] <> [Column1]),
//Replace nulls with space so we don't lose one item of a "pair"
#"Replaced Value" = Table.ReplaceValue(remRows,null," ",Replacer.ReplaceValue,Table.ColumnNames(remRows)),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"Entry"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Columns", "Index", 0, 1, Int64.Type),
#"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
#"Removed Columns1" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
//Group in pairs
//Mark blank subTables
//Extract Entry, Parameter and Value
#"Grouped Rows" = Table.Group(#"Removed Columns1", {"Integer-Division"}, {
{"Empties", each List.NonNullCount(List.ReplaceValue(_[Value]," ", null,Replacer.ReplaceValue))},
{"Entry", each _[Entry]{0}},
{"Parameter", each _[Value]{0}},
{"Value", each _[Value]{1}}
}),
#"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Empties] <> 0)),
#"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Integer-Division", "Empties"}),
//Group by Entry, Pivot and expand
#"Grouped Rows1" = Table.Group(#"Removed Columns2", {"Entry"}, {
{"Pivot", each Table.Pivot(_, _[Parameter], "Parameter","Value")}
}),
//Column name list for the Pivot table.
//Could sort or order them any way you want
colNames = List.Distinct(#"Removed Columns2"[Parameter]),
#"Expanded Pivot" = Table.ExpandTableColumn(#"Grouped Rows1", "Pivot", colNames,colNames)
in
#"Expanded Pivot"
Original Data
Transformed
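The M query pairs cells by position instead of searching for labels: it fills the entry name down, unpivots all cells into one column, pairs them with Index // 2, drops all-blank pairs and pivots each entry. A rough Python sketch of that idea, assuming each row is a list of cells with None for blanks and every row has an even number of cells so that pairs never straddle rows (the function name is hypothetical):

```python
def reshape_by_pairs(rows):
    """Fill down the entry name, pair cells by position, drop blank pairs, pivot.

    Assumptions: rows are lists of cells with None for blanks, every row has an
    even number of cells, and entry header rows start with "entry" (any case).
    """
    flat = []    # (entry, cell) in reading order, like the unpivoted column
    entry = None
    for row in rows:
        first = row[0]
        if isinstance(first, str) and first.lower().startswith("entry"):
            entry = first        # like Table.FillDown on the Entry column
            continue             # header rows are removed, as in remRows
        flat.extend((entry, cell) for cell in row)

    pivoted = {}
    # Index 0 and 1 form pair 0, index 2 and 3 form pair 1, ... (Index // 2).
    for i in range(0, len(flat) - 1, 2):
        (ent, parameter), (_, value) = flat[i], flat[i + 1]
        if parameter is None and value is None:  # the "Empties" filter
            continue
        pivoted.setdefault(ent, {})[parameter] = value  # one record per entry
    return pivoted


rows = [
    ["Entry1", None, None, None],
    ["Age", 45, "Occupation", "Scientist"],
    ["Pets?", "Yes", None, None],
    ["Entry2", None, None, None],
    ["Age", 32, None, None],
]
print(reshape_by_pairs(rows))
# {'Entry1': {'Age': 45, 'Occupation': 'Scientist', 'Pets?': 'Yes'}, 'Entry2': {'Age': 32}}
```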
I have a table in Power query, which besides other fields has the following key fields:
SKU | Year | Week | Customer | Transaction | Type | Value
As an example, some rows would be:
AB587 | 2019 | 12 | Tom | Purchase | Forecast | 200
AB587 | 2019 | 12 | Tom | Sale | Forecast | 15
AB587 | 2019 | 11 | Tom | Stock | Actual | 1455
This table has about 300,000 rows covering all the SKUs and a couple of years' worth of transactions for all customers, and it feeds a very useful pivot table that is used extensively. I now need to add something to the data to make the table even more useful.
I have the forecast for purchases and sales for the whole year, along with the actuals, and they follow the pattern above. I also have the stock for past weeks only, i.e. actuals only. I don't have the stock forecast, which is what I want to add. The calculation is as simple as:
Stock from previous week + Purchase forecast from this week - Sale forecast from this week
The end result I expect is that new rows will be added, for example:
AB587 | 2019 | 12 | Tom | Stock | Forecast | 1640
(calculated from the example rows above)
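Written out with the example rows, where "fc" marks forecast values and Stock for week 11 is the actual stock (for later weeks it would have to be the previous week's forecast), the calculation is:

```latex
\begin{aligned}
\text{Stock}^{\text{fc}}_{w} &= \text{Stock}_{w-1} + \text{Purchase}^{\text{fc}}_{w} - \text{Sale}^{\text{fc}}_{w} \\
\text{Stock}^{\text{fc}}_{12} &= 1455 + 200 - 15 = 1640
\end{aligned}
```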
This will now enable me not only to pivot Purchase and Sales but also stock levels which will be game changing.
I would love for anyone to help me with this in Power Query (I have tried a number of methods over weeks but have not cracked it)
To try to solve it myself:
I appended extra rows containing the Week-1 data for all actual weeks in my source, which may also reduce calculation time. Then I pivoted my Transaction column into new columns (Purchase, Sale, Stock and Stock-1), which made the stock forecast calculation easy, or so it appeared.
What I did not think about is that this only works for the first forecast week: I know of no way to use that calculated stock forecast to calculate the following week's forecast.
In other words, there is no way to store the stock forecast I just calculated and use it in the next week's calculation.
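The chained calculation the question asks for can be sketched in Python for a single SKU/customer pair. This is an illustration under stated assumptions, not Power Query code: the function name is hypothetical, and the week 13 values (0 purchases, 40 sales) are invented for the example.

```python
def chain_stock_forecast(last_actual_stock, weekly_forecasts):
    """Carry stock forward: each week's forecast becomes the next week's opening stock.

    weekly_forecasts: list of (week, purchase_forecast, sale_forecast) in week order,
    starting with the first week that has no actual stock (one SKU/customer pair).
    """
    stock = last_actual_stock
    result = []
    for week, purchase, sale in weekly_forecasts:
        stock = stock + purchase - sale  # previous week's stock + purchases - sales
        result.append((week, stock))
    return result


print(chain_stock_forecast(1455, [(12, 200, 15), (13, 0, 40)]))
# [(12, 1640), (13, 1600)]
```

Because each step only adds (purchase - sale), week n's forecast equals the last actual stock plus a running total of (purchase - sale) since then. So in Power Query it could likely be expressed as a running total per SKU and customer (List.Accumulate is one option) rather than a true week-by-week lookup.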
I'm not clear on what you are asking when you say you want to use the "calculated stock forecast to calculate the next week's stock forecast". If you just want to generate the result you gave as an example as part of your dataset, though, that is pretty simple.
Starting from a sample table of your data loaded into Power Query as a query called DataTable, I create two reference queries based on it, called StockForecast and CombinedDataTable.
In the "StockForecast" query we will add three custom columns. Two are the CalcYear and CalcWeek columns that take "Stock Actual" records and increase the week by one. The third is a CalcValue column that takes "Sale Forecast" records and makes the value in those negative. The code in the editor looks like this:
let
Source = DataTable,
#"Added Custom" = Table.AddColumn(Source, "CalcYear", each
if [Transaction] = "Stock" and [Type] = "Actual" then
(if [Week] = 52 then [Year] + 1 else [Year])
else [Year]),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "CalcWeek", each
if [Transaction] = "Stock" and [Type] = "Actual" then
(if [Week] = 52 then 1 else [Week] + 1)
else [Week]
),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "CalcValue", each
if [Transaction] = "Sale" and [Type] = "Forecast" then
[Value] * -1
else [Value]
),
Then use Group By to aggregate by SKU, Customer, CalcYear and CalcWeek, with a Sum of the CalcValue column. This gives the Stock Forecast value you are looking for. This assumes the query contains only the Stock Actual, Purchase Forecast and Sale Forecast rows; if other transaction types are present, filter them out before grouping, or they will be summed in as well. After that it's just a matter of adding a couple of columns for identification and some cleanup.
#"Grouped Rows" = Table.Group(#"Added Custom2", {"SKU", "Customer", "CalcYear", "CalcWeek"}, {{"Value", each List.Sum([CalcValue]), type number}}),
#"Added Custom3" = Table.AddColumn(#"Grouped Rows", "Transaction", each "Stock"),
#"Added Custom4" = Table.AddColumn(#"Added Custom3", "Type", each "Forecast"),
#"Renamed Columns" = Table.RenameColumns(#"Added Custom4",{{"CalcYear", "Year"}, {"CalcWeek", "Week"}})
in
#"Renamed Columns"
The end result of the data looks like this:
Then just go to the CombinedDataTable query, append the StockForecast query, and you have Stock Forecast values in your dataset.
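The StockForecast query's logic (shift Stock Actual forward one week, negate Sale Forecast, then sum per SKU, customer, year and week) can be sketched in Python as below. This is an illustration only; it assumes rows are dicts keyed by the column names, 52 weeks per year as in the M code, and that only the Stock Actual, Purchase Forecast and Sale Forecast rows are passed in. The function name is hypothetical.

```python
from collections import defaultdict


def one_step_stock_forecast(rows):
    """Sum last week's Stock Actual + this week's Purchase Forecast - Sale Forecast."""
    totals = defaultdict(int)
    for r in rows:
        year, week, value = r["Year"], r["Week"], r["Value"]
        if r["Transaction"] == "Stock" and r["Type"] == "Actual":
            # CalcYear/CalcWeek: move actual stock forward one week (52-week years assumed).
            year, week = (year + 1, 1) if week == 52 else (year, week + 1)
        if r["Transaction"] == "Sale" and r["Type"] == "Forecast":
            value = -value  # CalcValue: sales reduce stock
        totals[(r["SKU"], r["Customer"], year, week)] += value
    # Group By result plus the two identification columns.
    return [{"SKU": sku, "Customer": cust, "Year": y, "Week": w,
             "Transaction": "Stock", "Type": "Forecast", "Value": v}
            for (sku, cust, y, w), v in totals.items()]


rows = [
    {"SKU": "AB587", "Year": 2019, "Week": 12, "Customer": "Tom",
     "Transaction": "Purchase", "Type": "Forecast", "Value": 200},
    {"SKU": "AB587", "Year": 2019, "Week": 12, "Customer": "Tom",
     "Transaction": "Sale", "Type": "Forecast", "Value": 15},
    {"SKU": "AB587", "Year": 2019, "Week": 11, "Customer": "Tom",
     "Transaction": "Stock", "Type": "Actual", "Value": 1455},
]
print(one_step_stock_forecast(rows))
```

With the example rows this returns a single Stock/Forecast row for week 12 with a Value of 1640. As the question points out, this only looks one week ahead, because each forecast is built from actual stock rather than from the previous week's forecast.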