Categorical column into a dummy array in Excel - excel

I'm trying to translate a machine-learning model into Excel so that data analysts can play with it interactively.
I'd like to transform a categorical variable into a dummy representation:
WeekDay
Monday
Thursday
to
WeekDay
{1,0,0,0,0,0,0}
{0,0,0,1,0,0,0}
using Excel arrays.
I tried this:
={INT(A1="Monday"),INT(A1="Tuesday"),INT(A1="Wednesday"), ...}
However, for some reason Excel doesn't accept formulas inside array constants.
This approach does work, but it is problematic, since it does not allow combining multiple arrays into one:
=IF(A1="Monday", {1,0,0,0,0,0,0}, IF(A1="Tuesday", {0,1,0,0,0,0,0}, ....))
Also, it's super ugly.
Any ideas?

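To get your array, you can use INDEX like this:
INDEX(IF(TEXT(ROW($2:$8),"dddd")=A1,1,0),0)
This returns a vertical array. It works because TEXT(n,"dddd") formats the date serial numbers 2 to 8 as the day names Monday to Sunday, so each position is compared with A1.
To return a horizontal array, use: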
INDEX(IF(TEXT(COLUMN($B:$H),"dddd")=A2,1,0),0)
I have spilled the results of the array in the photo below:
If you have the dynamic-array function SEQUENCE, the ROW and COLUMN calls can be replaced with:
SEQUENCE(7,,2)
and
SEQUENCE(,7,2)
respectively.
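The comparison performed by the INDEX formula can be mirrored in Python to check which position gets the 1 for each day. This is a hedged illustration, not Excel code: it assumes an English locale (so "dddd" yields English day names), `excel_day_name` and `dummy_row` are hypothetical helpers, and, like Excel's `=`, the comparison ignores case.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# In Excel's 1900 date system, serial n has the same weekday as
# date(1899, 12, 30) + n days: serial 1 -> Sunday, serial 2 -> Monday.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_day_name(serial: int) -> str:
    """Hypothetical helper mimicking TEXT(serial, "dddd") in an English locale."""
    return DAY_NAMES[(EXCEL_EPOCH + timedelta(days=serial)).weekday()]


def dummy_row(value: str) -> list[int]:
    """Hypothetical helper mimicking IF(TEXT(ROW($2:$8),"dddd")=A1,1,0)."""
    # Serials 2..8 map to Monday..Sunday; compare case-insensitively like Excel
    return [1 if excel_day_name(n).casefold() == value.casefold() else 0
            for n in range(2, 9)]


print(dummy_row("Monday"))    # [1, 0, 0, 0, 0, 0, 0]
print(dummy_row("Thursday"))  # [0, 0, 0, 1, 0, 0, 0]
```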

I am not sure what the end goal is, but instead of an array I would suggest the following (again assuming you are not using VBA; even if you are, this is very possible with Select Case statements):
Use the formula =WEEKDAY(Cell, return_type), choosing the return type according to which day should be 1.
If you use the actual date (e.g. 5/1/2020), you can display only the day of the week through custom formatting (typically the data being analysed will already contain the full date). For example:
Cell A20 contains 5/1/2020, displayed with the Long Date format as "Friday, May 1, 2020".
A cell with the formula =WEEKDAY(A20,1) returns 6.
A cell with the formula =WEEKDAY(A20,2) returns 5. (Is this close enough to {0,0,0,0,1,0,0} for you to use?)
If using VBA, you could define a range with logic to turn this into {0,0,0,0,1,0,0}.
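The step from a WEEKDAY number to a dummy vector is just "put a 1 at that position". A rough Python sketch of both pieces: `excel_weekday` is a hypothetical re-implementation covering only return types 1 and 2, and `one_hot` stands in for whatever VBA or formula logic you would use; neither is an Excel API.

```python
from datetime import date


def excel_weekday(d: date, return_type: int = 1) -> int:
    """Hypothetical WEEKDAY sketch: type 1 is Sun=1..Sat=7, type 2 is Mon=1..Sun=7."""
    iso = d.isoweekday()  # Python: Monday=1 .. Sunday=7
    if return_type == 1:
        return iso % 7 + 1  # Sunday (7) -> 1, Monday (1) -> 2, ..., Saturday (6) -> 7
    if return_type == 2:
        return iso
    raise ValueError("only return types 1 and 2 are sketched here")


def one_hot(position: int, width: int = 7) -> list[int]:
    """Hypothetical helper: a 1-based position becomes a dummy vector."""
    return [1 if i == position else 0 for i in range(1, width + 1)]


d = date(2020, 5, 1)  # Friday, May 1, 2020 (cell A20 in the example)
print(excel_weekday(d, 1))           # 6
print(excel_weekday(d, 2))           # 5
print(one_hot(excel_weekday(d, 2)))  # [0, 0, 0, 0, 1, 0, 0]
```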
Hope this helps.

Related

Sumif row values if year matches

I'd like to sum all the yearly totals together from the data below:
I have the following formula in I16:
=SUMIFS(I3:CS3,I2:CS2, ">="&DATE(H16,1,1), I2:CS2, "<="&DATE(H16,12,31))
Unfortunately this returns 0. I thought it might have something to do with the date format, so I created row 1, which has the formula =YEAR(I2) in I1, copied across to the right. If I change the SUMIFS to use the date range I1:CS1, I get the same result.
Is there something I should be doing with the formats to make this work or is there something fundamentally wrong with my formula?
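Refer to the screenshots here:
As discussed in the comments, this is likely due to the number formatting of the currency/projected-value field (row 3): SUMIFS ignores values stored as text, so they contribute nothing to the total.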
Typical case
The typical case requires only minimal adjustment (multiplying by 1 should be enough to convert text to a number in such cases):
=SUM((--$E$3#)*(1*C8=--$E$1#))
Note: modify the following ranges as required:
E3#: E3:ER3 (projected values [£ values, row 3])
E1#: E1:ER1 (time horizon [years, row 1])
This type of format discrepancy can arise in numerous ways. A common example is values copied from somewhere else (HTML, Notepad, etc.) where the source values included an extra space before or after. Another example is Excel applying formatting (e.g. delimiters) from a previous import within the active workbook or session. These are easily trimmed and can be handled using double negation (e.g. --C3, which equals 1*C3 for numerical values).
However, if there are extra spaces (or other whitespace) within the projected text value, more cleaning may be required.
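Assuming, as the answer suggests, that row 3 holds numbers stored as text, the difference between SUMIFS and the typical-case formula can be mimicked in Python. The data and the helper names `to_number`, `sumifs_like` and `sum_with_coercion` are hypothetical, and the sketch matches on the year directly, as the answer's SUM formula does, rather than on a date range.

```python
def to_number(cell):
    """Hypothetical stand-in for --cell / 1*cell: numeric text with spaces -> number."""
    if isinstance(cell, (int, float)):
        return cell
    return float(str(cell).strip())  # raises ValueError where Excel would give #VALUE!


def sumifs_like(values, years, target_year):
    """Hypothetical SUMIFS analogue: text entries in the sum range are skipped."""
    return sum(v for v, y in zip(values, years)
               if y == target_year and isinstance(v, (int, float)))


def sum_with_coercion(values, years, target_year):
    """Hypothetical analogue of =SUM((--$E$3#)*(1*C8=--$E$1#))."""
    # Coerce both rows, compare years (True/False acts as 1/0), multiply, add
    return sum(to_number(v) * (to_number(y) == target_year)
               for v, y in zip(values, years))


values = [" 100", "250 ", 300, "50"]  # row 3: mostly numbers stored as text
years = [2021, 2021, 2022, 2021]      # row 1
print(sumifs_like(values, years, 2021))        # 0
print(sum_with_coercion(values, years, 2021))  # 400.0
```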
Final resort
As a thought experiment, I include a final-resort case that strips every non-numeric character out of the cell (wherever these occur, printable or otherwise), as shown below.
This is arguably overkill, but may be of interest:
=--TEXTJOIN("",1,LET(x_,MID(CLEAN("'"&E3&"'"),SEQUENCE(1,LEN(CLEAN("'"&E3&"'")),1,1),1),y_,UNICODE(x_),FILTER(x_,--(y_<=57)*(y_>=48))))
The typical-case formula can then be applied to row 4 instead of row 3.
This requires an Office 365-compatible version of Excel (for earlier versions, modify as required regarding LET, and remove the CLEAN function/outer wrapper).
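Character by character, the final-resort formula keeps only code points 48 to 57 (the ASCII digits 0 to 9) and joins them back into a number. A Python sketch of the same filter follows; `digits_only` is a hypothetical name. Exactly like the formula, it also drops decimal points and minus signs, so "12.50" becomes 1250; it only suits whole-number values.

```python
def digits_only(cell) -> int:
    """Hypothetical analogue of the LET/FILTER/TEXTJOIN formula: keep ASCII digits."""
    kept = "".join(ch for ch in str(cell) if 48 <= ord(ch) <= 57)
    if not kept:
        # FILTER would have nothing to return here, so the Excel formula errors too
        raise ValueError(f"no digits in {cell!r}")
    return int(kept)  # plays the role of the leading -- in the formula


print(digits_only("£ 1,250\u00a0"))  # 1250
print(digits_only("12.50"))          # 1250 (the decimal point is stripped as well)
```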

Can Excel arrays be used/indexed like traditional array/list structures?

I'm tracking season results of the Premier League in an Excel spreadsheet and I would like to implement a way to show each team's form (the result of their last 5 games). This seems to be very difficult in Excel for some reason or another.
My idea is to create an array of all results from games involving the team, then display the last 5 items in the array elsewhere in the sheet. In Python, for example, I could index the array with array[-1], but I can't find any equivalent in Excel (it doesn't help that INDEX is a completely different function in Excel, which makes Googling this topic very difficult).
TL;DR is there an Excel equivalent (on Excel's arrays) for Python's array[-1]?
EDIT: Say I come up with a formula that results in the array {"Win", "Win", "Win", "Lose", "Draw", "Lose", "Win", "Draw"} (let's call this array 'form'). Is there a way to get the last ("Draw"), second-last ("Win"), third-last ("Lose") etc. value from this array in Excel? Again using Python as an example, this would be done with form[-1], form[-2], form[-3] etc.
EDIT:
Say I want to look at Spurs' form for their last 2 games (let's pretend these 3 games are all they have played). My approach would be to use an array formula to store the 'Result' value in column I in an array from every row containing the value 'Spurs' (found in rows 10, 20 and 33 for reference; I'm still working on the specifics of this part too). That array formula would then evaluate to {"Spurs", "Draw", "Newcastle United"}. In a different cell I would then like to take the 3rd value of this array ("Newcastle United"), in the next cell the 2nd value ("Draw"), and so on.
Given Form as you described it,
=INDEX(Form,0,IF(1,N(ROW(INDIRECT(COUNTA(Form)-4&":"&COUNTA(Form))))))
will return an array of the last 5 entries in Form.
If you want the last entry:
=INDEX(Form,0,COUNTA(Form))
next to last:
=INDEX(Form,0,COUNTA(Form)-1)
and so forth.
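The correspondence with Python's negative indexing is that form[-k] is the (COUNTA(Form) - k + 1)-th element when counting from 1. A small Python sketch makes this explicit; `index_1based` is a hypothetical helper, and it assumes Form has no blank entries, since COUNTA counts only non-empty cells.

```python
form = ["Win", "Win", "Win", "Lose", "Draw", "Lose", "Win", "Draw"]


def index_1based(arr, n):
    """Hypothetical helper, like INDEX(Form, 0, n) on a single-row array."""
    return arr[n - 1]


count = len(form)  # plays the role of COUNTA(Form) when there are no blanks
last = index_1based(form, count)             # INDEX(Form,0,COUNTA(Form))   ~ form[-1]
second_last = index_1based(form, count - 1)  # INDEX(Form,0,COUNTA(Form)-1) ~ form[-2]
# ROW(INDIRECT(COUNTA(Form)-4&":"&COUNTA(Form))) yields positions count-4 .. count
last_five = [index_1based(form, k) for k in range(count - 4, count + 1)]

print(last, second_last)  # Draw Win
print(last_five)          # ['Lose', 'Draw', 'Lose', 'Win', 'Draw']
```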

Excel - Return the column which contains three specific values in specific rows

I'm pulling my hair out trying to achieve this without VBA.
I do not want any VBA, although I know it's easy to do with barely more than two lines of VBA code.
Here is a simplified template of the type of table I'm working with.
The objective: with one formula, return the column in which all three specified values are matched. The values I'm searching for will be somewhere on the same sheet as the formula.
If you have a formula that can do this, just put them in it as hard-coded values.
Best regards, and thanks in advance to anyone who can help me restore my sanity.
Assuming a data setup like this:
This formula is in cell F1:
=SUMPRODUCT((B1:C1&B2:C2&B3:C3=F2&F3&F4)*(COLUMN(B1:C1)))
Adjust the ranges to suit your actual data.
Explanation:
It combines values of the columns' rows into a single string (so in this example, it would be {"KPI ADATEDATA TYPE","KPI BDATEDATA TYPE"})
It then compares those results to the combined string of what you're looking for ("KPI ADATEDATA TYPE"). This converts the results to TRUE/FALSE, so you end up with {TRUE,FALSE} (because the first combined string, the one with KPI A, matches).
Then it gets all possible column numbers of the results: {2,3} in this case, for columns B and C.
The multiplication then converts the TRUEs and FALSEs into 1s and 0s respectively, so you end up with {1,0}*{2,3}.
Because there can presumably be only a single match, the correct column number is the only value multiplied by 1, so the results are {2,0}.
The SUMPRODUCT then sums the results, and since there is only a single non-zero number, it must be the column index.
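The steps above can be traced in a short Python sketch. The table contents are hypothetical (chosen to reproduce the combined strings in the explanation), `matching_column` is an illustrative name, and `casefold` stands in for Excel's case-insensitive text comparison.

```python
def matching_column(rows, targets, first_col=2):
    """Hypothetical analogue of =SUMPRODUCT((B1:C1&B2:C2&B3:C3=F2&F3&F4)*(COLUMN(B1:C1)))."""
    key = "".join(targets).casefold()                           # F2&F3&F4
    combined = ["".join(col).casefold() for col in zip(*rows)]  # one string per column
    flags = [1 if c == key else 0 for c in combined]            # TRUE/FALSE -> 1/0
    columns = range(first_col, first_col + len(combined))       # COLUMN(B1:C1) -> 2, 3
    return sum(f * col for f, col in zip(flags, columns))       # SUMPRODUCT


rows = [["KPI A", "KPI B"],          # row 1 (hypothetical data)
        ["DATE", "DATE"],            # row 2
        ["DATA TYPE", "DATA TYPE"]]  # row 3
print(matching_column(rows, ["KPI A", "DATE", "DATA TYPE"]))  # 2
```

As with the formula, a result of 0 means no column matched, and duplicate matches would add their column numbers together. Plain concatenation can also produce false matches ("AB"&"C" equals "A"&"BC"); putting a separator such as "|" between the parts, in the formula and in the sketch, avoids that.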

How do I use INDIRECT inside an Excel array formula?

The situation
In the sheet "Planning" I have an area that contains pairs of sessions (strings) and hours (numbers) in adjacent cells (e.g. D11 and E11, I12 and J12, etc.). One session can occur multiple times.
D11:E11 is | Foo | 8 |
I12:J12 is | Foo | 4 |
In another sheet, I want to find a session in the Planning sheet and return an array with all the hours booked on that session (to calculate a total)
I use an array formula with a conditional and intend to use the SMALL function to retrieve the results from the array.
The problem
The following formula returns all the correct references to hours booked on "Foo", so far so good.
=IF(Planning!$D$11:$CV$18="Foo";ADDRESS(ROW(Planning!$D$11:$CV$18);COLUMN(Planning!$D$11:$CV$18)+1;;;"Planning"))
{"Planning!$E$11"\FALSE\FALSE\FALSE\FALSE\"Planning!$J$12"}
However, if I use the INDIRECT function to retrieve the values of those references, it always returns the value of the first reference in the array ("Planning!$E$11").
=IF(Planning!$D$11:$CV$18="Foo";INDIRECT(ADDRESS(ROW(Planning!$D$11:$CV$18);COLUMN(Planning!$D$11:$CV$18)+1;;;"Planning")))
{8\FALSE\FALSE\FALSE\FALSE\8}
How do I retrieve the correct values? Or should I tackle the problem in a whole different way?
Screenshots
The planning sheet
The overview I want
Since I was mainly interested in the total of planned hours, I eventually used the following formula:
=SUM(SUM(INDIRECT(IF(Planning!$D$11:$CV$18="Foo";(ADDRESS(ROW(Planning!$D$11:$CV$18);COLUMN(Planning!$D$11:$CV$18)+1;;;"Planning"));"$U$19"))))
IF: Create the array with references to the Planning sheet if the string is found. If it's not found, add the reference $U$19.
Using INDIRECT, replace all references with the values in the Planning sheet. $U$19 contains the value 0.
Then use SUM twice to sum up all the values. I don't know why two SUMs are needed, but see:
Is it possible to have array as an argument to INDIRECT(), so INDIRECT() returns array?
https://superuser.com/questions/1196243/simplify-a-sum-of-indirect-cell-values
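The idea behind the workaround (point every non-matching position at a cell that holds 0, then dereference and add everything up) can be sketched in Python. The sheet contents are hypothetical, cells are keyed by (row, column) numbers so that D11:CV18 becomes rows 11 to 18 and columns 4 to 100, `total_hours` is an illustrative helper, and the dictionary lookup plays the role of INDIRECT.

```python
# Hypothetical Planning sheet: (row, column) -> value; D=4, E=5, I=9, J=10, U=21.
planning = {
    (11, 4): "Foo", (11, 5): 8,
    (12, 9): "Foo", (12, 10): 4,
    (13, 4): "Bar", (13, 5): 6,
}
ZERO_REF = (19, 21)  # stands in for $U$19, which holds 0


def total_hours(sheet, session, rows=range(11, 19), cols=range(4, 101)):
    """Hypothetical helper: sum the hours to the right of each matching session."""
    cells = {**sheet, ZERO_REF: 0}
    # IF(range="Foo", ADDRESS(ROW(range), COLUMN(range)+1), "$U$19")
    refs = [(r, c + 1) if cells.get((r, c)) == session else ZERO_REF
            for r in rows for c in cols]
    # INDIRECT + SUM: look up each reference (empty cells count as 0) and add
    return sum(cells.get(ref, 0) for ref in refs)


print(total_hours(planning, "Foo"))  # 12
print(total_hours(planning, "Bar"))  # 6
```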
INDIRECT doesn't work in most array formulas. If you give it a string that refers to a range, like "A1:A10", it returns those cells as expected, but that's about it. You can use that array as the input to another function, but you can't send an array output from another function to INDIRECT() (or at least I have not figured out a way).
Try using the INDEX function with the ROW function.
INDIRECT("A1:A10") is similar to
INDEX(A:A,ROW(A1:A10))
However, the former is less flexible.
Consider:
INDEX(A:A,FILTER(ROW(A1:A10),NOT(ISBLANK(A1:A10))*ISNUMBER(A1:A10)))
This returns an array containing the numerical values in the range, but does not treat an empty cell as zero. Watch your order of operations and parentheses.
The product NOT(ISBLANK(A1:A10))*ISNUMBER(A1:A10) is the element-wise product of two vectors of Boolean values.
ROW(A1:A10) creates a vector of the row numbers of the elements in that range. FILTER then discards any row where the corresponding element of the Boolean vector is 0, and INDEX returns an array of the values of the cells in its range corresponding to those rows. The range given to INDEX could in fact be any column, not just the one you are selecting on. Using the entire column (for example A:A) lets Excel update the references automatically if you move the source data, for instance if you insert a header row. If you use a specific range, you will need to add an offset to the row value, and it will not update references automatically (without a far more complex formula).
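The same pipeline, with the Boolean vectors multiplied element by element, can be sketched in Python. The column contents are hypothetical and `is_number` is an illustrative stand-in for ISNUMBER. Since ISNUMBER already returns FALSE for an empty cell, the NOT(ISBLANK(...)) factor is redundant here, though harmless.

```python
# A1:A10 as a Python list; None stands for an empty cell (hypothetical data).
column_a = [5, None, "x", 7.5, None, 0, "", 3, True, None]


def is_number(v):
    """Hypothetical ISNUMBER stand-in: Excel's TRUE/FALSE are not numbers."""
    return isinstance(v, (int, float)) and not isinstance(v, bool)


# NOT(ISBLANK(A1:A10))*ISNUMBER(A1:A10): element-wise product of two Boolean vectors
keep = [(v is not None) * is_number(v) for v in column_a]
# FILTER(ROW(A1:A10), keep): row numbers whose flag is 1
rows = [r for r, k in zip(range(1, 11), keep) if k]
# INDEX(A:A, rows): fetch the values in those rows of the whole column
values = [column_a[r - 1] for r in rows]

print(rows)    # [1, 4, 6, 8]
print(values)  # [5, 7.5, 0, 3]
```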

Excel - roundup then subtotal?

A colleague has an array of values in "X4:X38". Since these are in a table which may be filtered, she wants to use the subtotal function to sum them - but wants all of the values to be rounded up first.
{=SUM(ROUNDUP(X4:X38,0))}
works perfectly well. However,
{=SUBTOTAL(9,ROUNDUP(X4:X38,0))}
generates a generic "The formula you typed contains an error" message. I have tried various obvious things, like putting additional brackets around the ROUNDUP section.
Any help would be appreciated.
You can do this without a helper column by using this formula:
=SUMPRODUCT(SUBTOTAL(2,OFFSET(X4:X38,ROW(X4:X38)-MIN(ROW(X4:X38)),0,1)),ROUNDUP(X4:X38,0))
OFFSET effectively breaks the range down into individual cells, which are passed to the SUBTOTAL function. SUBTOTAL returns an array of 1 or 0 values according to whether each cell is visible after filtering, and this array is multiplied by the rounded values to give the overall sum of the rounded visible values.
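In Python terms, the formula is a dot product between a 0/1 visibility mask and the rounded values. The values and the mask below are hypothetical (in Excel the mask comes from SUBTOTAL(2, OFFSET(...)), which returns 1 for a visible numeric cell), and `roundup0` is an illustrative helper that reproduces ROUNDUP's round-away-from-zero behaviour for negative numbers.

```python
import math


def roundup0(x):
    """Hypothetical ROUNDUP(x, 0): round away from zero, so -0.4 -> -1 (math.ceil gives 0)."""
    magnitude = math.ceil(abs(x))
    return magnitude if x >= 0 else -magnitude


values = [1.2, 3.0, 2.5, -0.4]  # X4:X7 (hypothetical data)
visible = [1, 0, 1, 1]          # per-cell SUBTOTAL(2, ...): 1 = shown, 0 = filtered out
rounded = [roundup0(x) for x in values]  # ROUNDUP(X4:X7, 0)

total = sum(v * r for v, r in zip(visible, rounded))  # SUMPRODUCT
print(rounded)  # [2, 3, 3, -1]
print(total)    # 4
```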
Another way is to use the AGGREGATE function, like this:
=SUMPRODUCT(ROUNDUP(AGGREGATE(15,7,X4:X38,ROW(INDIRECT("1:"&SUBTOTAL(2,X4:X38)))),0))
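This variant pulls the visible values out one at a time in ascending order (function 15 is SMALL, option 7 ignores hidden rows and error values), then rounds and sums them. Below is a Python sketch with hypothetical data; `kth_smallest_visible` is an illustrative stand-in for the AGGREGATE call, not an Excel API. The order does not matter for the total, which is why sorting is harmless.

```python
import math


def roundup0(x):
    """Hypothetical ROUNDUP(x, 0): round away from zero."""
    magnitude = math.ceil(abs(x))
    return magnitude if x >= 0 else -magnitude


def kth_smallest_visible(values, visible, k):
    """Hypothetical stand-in for AGGREGATE(15, 7, range, k): SMALL over visible cells."""
    return sorted(x for x, shown in zip(values, visible) if shown)[k - 1]


values = [1.2, 3.0, 2.5, -0.4]  # X4:X7 (hypothetical data)
visible = [1, 0, 1, 1]          # 1 = row shown after filtering
count = sum(visible)            # SUBTOTAL(2, X4:X38): number of visible numbers
# ROW(INDIRECT("1:"&count)) supplies k = 1..count
total = sum(roundup0(kth_smallest_visible(values, visible, k))
            for k in range(1, count + 1))
print(total)  # 4
```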
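Given the complexity, a helper column might be the preferable approach.
After investigating, it looks like this is not possible without a helper column, because SUBTOTAL only accepts range references, not arrays computed by another function such as ROUNDUP.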
Add a helper column which rounds the individual values in column X, e.g. type the following formula into cell Y4 and drag down to Y38:
= ROUNDUP(X4,0)
And then instead of
= SUBTOTAL(9,ROUNDUP(X4:X38,0))
use:
= SUBTOTAL(9,Y4:Y38)
Then if necessary you can just hide the helper column. Of course the helper column doesn't have to be column Y, it could be any column, e.g. a column far to the right of where the data ends.
