How to find parent in an indented hierarchy? - excel

I currently have a sheet in Excel with an indented hierarchy of items as shown below. Each item is indented (four spaces per indent) to show how it fits into the overall hierarchy. I have been able to create a "Level" column that translates the indentation level into a number.
+------------+-------+--------+
| Item | Level | Parent |
+------------+-------+--------+
| P1 | 1 | N/A |
| P2 | 2 | P1 |
| P3 | 2 | P1 |
| P4 | 3 | P3 |
| P5 | 2 | P1 |
| P6 | 3 | P5 |
+------------+-------+--------+
What I want to do is generate the "Parent" column above, which uses the "Level" information to display each item's parent. I think this would need to be done with a loop that does the following for each item X:
-Find level info for X
-Find (levelx-1) which would equal the parent item's level
-Search upward for the first row with a level equal to (levelx-1)
-Find the item number in that row
-Write item number in adjacent cell to X
Unfortunately, I'm not sure how to translate this idea into VBA. Thanks in advance for any assistance.

OK, assuming the above table starts in cell A1, useful data starts in row 2. This formula will do the trick:
=INDEX($A$1:$A$7,MAX(IF($B$2:$B2=B2-1,ROW($B$2:$B2),"")))
Enter this in cell C2 as an array formula (Ctrl+Shift+Enter), then pull it down. The first one will obviously be an error (not #NA but #VALUE).
How it works:
IF($B$2:$B2=B2-1,ROW($B$2:$B2),"")
This creates an array with the row numbers of the rows whose level is one lower than the level of the current row (that is, the candidate parents). To examine only the rows above the current row, you need an expanding range, hence the $B$2:$B2 style references.
The MAX function gets the maximum of these rows, which is the closest to our current cell. Now we have the row number. All we need now is a formula to extract the data from column A from the indicated row. This is what INDEX does.
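For readers who prefer to see the logic as a loop (as the question describes it), here is a minimal sketch in Python rather than VBA. The function name find_parents and the use of None for "no parent" are my own choices for illustration and are not part of the formula above.

```python
def find_parents(items, levels):
    """Return the parent of each item, or None for top-level items."""
    parents = []
    for i, level in enumerate(levels):
        parent = None
        # Walk upward from the row just above the current one and stop
        # at the first row whose level is exactly one lower.
        for j in range(i - 1, -1, -1):
            if levels[j] == level - 1:
                parent = items[j]
                break
        parents.append(parent)
    return parents


items = ["P1", "P2", "P3", "P4", "P5", "P6"]
levels = [1, 2, 2, 3, 2, 3]
parents = find_parents(items, levels)
print(parents)  # [None, 'P1', 'P1', 'P3', 'P1', 'P5']
```

Stopping at the first match while walking upward plays the same role as MAX in the formula: both pick the closest row above with the parent's level.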

It took me a while to understand how this formula works, so after figuring it out (ok, my wife helped me a bit) I'd like to share an idiot-proof explanation for other Excel dummies like me. Here we go:
=INDEX($A$1:$A$7,MAX(IF($B$2:$B2=B2-1,ROW($B$2:$B2),"")))
means:
Among the values in range $B$2:$B2, find all values equal to B2-1. (IF)
List the row numbers of those values. (ROW)
From that list of row numbers, pick the highest one (let's call it X). (MAX)
Return the value in line X of the range $A$1:$A$7. (INDEX)
(Warning! Your range has to start in row 1, so that the row number is the same as the line number within your range. Otherwise you have to adapt the formula.)

Related

Displaying the contents of a cell based on the max function

I am running a MAX function of a row of data such as:
+------+-----+------+------+-------+------+------+--------+------+------+--------+------+
| John | Doe | 4323 | Eric | Smith | 1235 | Sean | Wilson | 4567 | Jeff | French | 3212 |
+------+-----+------+------+-------+------+------+--------+------+------+--------+------+
(with each item being in a different cell)
Naturally the MAX function running on this whole row will return 4567. I would like the cell in front of the MAX result to return the first name that directly precedes the result, such as:
Sean 4567
Keep in mind that the first name, last name and the number are in separate columns but on the same row, but always located in a cell a constant number of cells before the result. (I don't need nor want the last name for this result)
Suppose you have your values at row 4, from Column D to O.
The following will get you the name (Sean) instead of the MAX value (4567):
=INDEX(D4:O4,MATCH(MAX(D4:O4),D4:O4,0)-2)
The MATCH formula returns position 9, and INDEX would then return the value at position 9. We want the value at position 7, so we subtract 2 and INDEX returns the "correct" result. The -2 is how many positions the returned value is offset (2 cells before the cell where the MAX value is found).
If you want Sean 4567, you can combine the two formulas:
=INDEX(D4:O4,MATCH(MAX(D4:O4),D4:O4,0)-2) &" "& INDEX(D4:O4,MATCH(MAX(D4:O4),D4:O4,0))
Or
=INDEX(D4:O4,MATCH(MAX(D4:O4),D4:O4,0)-2) &" "& MAX(D4:O4)
As you already mentioned, this only works if the offset is constant across the row (the name is always located 2 cells before the value cell).
Similar to what Wizhi said, but slightly shorter. Assuming the info is on row 1:
=INDIRECT(ADDRESS(1;MATCH(MAX(1:1);1:1;0) - 2))
Here the row number is given (row 1), MATCH finds the column of the maximum, and subtracting 2 applies the given offset. INDIRECT(ADDRESS()) turns that row and column into a reference and returns the name stored there ('Sean', in this case). The final 0 in MATCH requests an exact match, which is needed because the row is not sorted.
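To make the "find the max, then step back a fixed number of cells" idea concrete, here is a small sketch in Python. The function name name_before_max and the default offset of 2 are assumptions made for this illustration; the row is the sample data from the question.

```python
def name_before_max(row, offset=2):
    """Return (value `offset` cells before the max, the max itself)."""
    # Keep only numeric cells, remembering their positions in the row.
    numbers = [(value, pos) for pos, value in enumerate(row)
               if isinstance(value, (int, float))]
    # max() returns the first maximum on ties, like MATCH(..., 0).
    max_value, pos = max(numbers, key=lambda pair: pair[0])
    return row[pos - offset], max_value


row = ["John", "Doe", 4323, "Eric", "Smith", 1235,
       "Sean", "Wilson", 4567, "Jeff", "French", 3212]
name, value = name_before_max(row)
print(name, value)  # Sean 4567
```

As with the formulas, this only gives a sensible answer when the name always sits the same number of cells before its number.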

Structured Reference in an Index/Match Calculated Column Formula for Header Cell Range in a Table

I have this formula in a calculated column that is working great:
=IFERROR(INDEX(Allocation_of_Funds[[#Headers],[End.Nursing]:[Unassigned14]],MATCH(TRUE,INDEX(Allocation_of_Funds[@[End.Nursing]:[Unassigned14]]>0,0),0)),"")
But this formula is giving me trouble and represents what I want in the next calculated column, (based on the value in the previous column above) but it returns a #REF! error:
=INDEX(INDIRECT("Allocation_of_Funds[[#Headers]"&"["&[@[SOURCE 1]]&"]"):[@Unassigned14],MATCH(TRUE,INDIRECT("Allocation_of_Funds[[#Headers]"&"["&[@[SOURCE 1]]&"]"):[@Unassigned14]>0,0),0)
The details of the table setup are as follows, in case it's helpful:
I have a table with a range of columns and each column represents a different type of account. For each row, any combination of these columns could contain values or blanks, so I've got another set of columns that I want to identify the table column headers for the non-blank columns for each record.
SOURCE1 | SOURCE2 | SOURCE3 | ACCT1 | ACCT2 | ACCT3 | ACCT4 | ACCT5
ACCT1 | ACCT2 | ACCT4 | 500 | 300 | | 100 |
ACCT2 | ACCT3 | | | 200 | 100 | |
ACCT3 | | | | | 500 | |
| | | | | | |
ACCT3 | ACCT4 | ACCT5 | | | 200 | 300 | 50
ACCT1 | ACCT3 | ACCT4 | 123 | | 332 | 100 |
So I need the SOURCE2 column to use the value in the SOURCE1 column to identify the start of the range where I am looking for the next cell with a value, whereby the column header above that value will be returned for the SOURCE2 row value. The same formula will apply to the SOURCE3 column, using the value of the SOURCE2 column to identify the start of the next range.
Thanks in advance for letting me pick your brain!
-Lindsay
I used the following formula to pull the headers and place them under the source numbers:
=IFERROR(INDEX($D$1:$H$1,AGGREGATE(15,6,COLUMN($D2:$H2)/ISNUMBER($D2:$H2)-COLUMN($D$1)+1,RIGHT(A$1,1)*1)),"")
I assumed your table's top left corner was in A1 with 1 being the header row and A-C being your source columns and D to H being account columns. The above formula can be placed in cell A2 and copied to the right and down as need be.
You seem to have a grasp of the IFERROR and INDEX function so I will explain the AGGREGATE function:
=AGGREGATE(15,6,COLUMN($D2:$H2)/ISNUMBER($D2:$H2)-COLUMN($D$1)+1,RIGHT(A$1,1)*1)
The AGGREGATE function is a mixture of a bunch of different functions rolled into one, with the ability to ignore some calculations. Another added feature is that some of the built-in functions perform array calculations without the need to enter the formula as an array formula.
In this particular case I chose aggregate function 15 which is the same as the SMALL function. I have also told aggregate to ignore calculations which generate errors by using the "6". For the array calculation I have asked it to divide the column number it is working with by the True or False result of that column being a number:
COLUMN($D2:$H2)/ISNUMBER($D2:$H2)-COLUMN($D$1)+1
True in Excel math is the same as 1 and False is the same as 0. Any time the cell is not a number, it will try to divide by zero, generate an error, and be ignored by the AGGREGATE function. This basically generates a list of column numbers that meet the criterion of having a number in their column. The subtraction of the column number of D1 followed by a +1 converts the column number that is determined into a relative position under your account headers.
The next part of the aggregate function is telling the SMALL operation which number in sorted order needs to be returned. I used the last character in your source header to determine which column number to return. For SOURCE1 the last character is 1 so I want the smallest column number returned. For SOURCE2, the second smallest number is returned. The *1 at the end converts the character to a number instead of 1 as text.
RIGHT(A$1,1)*1
Ergo, if you want to use up to 9 sources you can. You can do more sources as well but you would need to revise this formula or come up with a different way of providing which number of the small list you want returned. And you can expand the D2:H2 reference to be all your accounts, and adjust the D1:H1 reference to cover all your account headers.
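If it helps to see what the AGGREGATE/SMALL step is doing, here is a rough Python equivalent. It is only a sketch: the function name kth_nonblank_header is made up for this example, and blank cells are represented as None.

```python
def kth_nonblank_header(headers, values, k):
    """Return the header above the k-th numeric cell in a row, or ""."""
    # Positions of the cells that hold a number (the ISNUMBER test).
    positions = [pos for pos, value in enumerate(values)
                 if isinstance(value, (int, float))]
    # Fewer than k numeric cells: behave like the IFERROR(..., "") wrapper.
    if k < 1 or k > len(positions):
        return ""
    # positions is already in ascending order, so the k-th entry is the
    # k-th smallest column, which is what SMALL (function 15) returns.
    return headers[positions[k - 1]]


headers = ["ACCT1", "ACCT2", "ACCT3", "ACCT4", "ACCT5"]
first_row = [500, 300, None, 100, None]
sources = [kth_nonblank_header(headers, first_row, k) for k in (1, 2, 3)]
print(sources)  # ['ACCT1', 'ACCT2', 'ACCT4']
```

Here k plays the role of RIGHT(A$1,1)*1, that is, the digit at the end of the SOURCE header.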

Compare two excel columns which the most frequently occur in specific date

I would like to compare a few columns to find out which were the top 5 most popular products in 2015.
I have this kind of data flow to work with:
Client | Product | Date of buy
------------------------------
client1 | A | 15.06.2015
client3 | A | 04.12.2015
client5 | F | 15.06.2015
client9 | G | 15.01.2015
client2 | G | 15.01.2015
client1 | R | 05.07.2015
client3 | G | 15.06.2015
client1 | F | 05.07.2015
client3 | F | 15.06.2016
Results: the top 5 combinations of products that clients bought together most often (on the same date). E.g.:
1. Product A + Product H 222 times
2. Product A + Product E 77 times
3. Product B + Product O 70 times
4. etc
5. ...
Greetz,
Making these assumptions:
You can use helper columns.
Your columns above are A, B and C.
You have two header rows and data starts in row 3.
Your dates are stored in an Excel date format and not as string values.
Starting in E3, I generated a list of unique product items using the following formula:
=INDEX($B$3:$B$11,MATCH(0,INDEX(COUNTIF($E$2:E2,$B$3:$B$11),0,0),0))
I copied it down to match the number of rows in the initial list. It starts spitting #N/A when all the unique items in the list have been listed. If you want to avoid this you could put the formula inside of:
=IFERROR(insert formula,"")
Now in column F I did a count based on your criteria of each item and within the year 2015. I used a multiple count if function called COUNTIFS:
=COUNTIFS($C$3:$C$11,"<"&DATE(2016,1,1),
$C$3:$C$11,">"&DATE(2014,12,31),
$B$3:$B$11,E3)
I just reformatted that for easier reading. You will have to edit that slightly if you want to copy and paste. If you don't like seeing 0 when there is no product in the adjacent column you could wrap the equation in:
=IF(E3="","", insert formula )
I then skipped a column and sorted the list of counted items from largest to smallest and had it return the numbers in sequence. I only went down two rows, but you could technically do the whole list. The LARGE function does this, and the formula in H3 looks like:
=LARGE($F$3:$F$11,ROWS($1:1))
I then went back one column and put the product name that corresponds to the count, taking the next name in the list when products have an equal count. I put that in column G because when I read, I normally want to see the product name first and then the quantity. If you want it the other way around, just swap the columns. The formula in G3 is:
=INDEX($E$3:$E$11,MATCH(H3,$F$3:$F$11,0)+COUNTIF($H$3:$H3,H3)-1)
Copy E3 and F3 down as far as you need. Copy G3 and H3 down one row and you will have the top two; down two rows and you have the top three, etc.
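For comparison, the same "count each product within 2015, then rank" steps can be sketched in a few lines of Python. The function name top_products is hypothetical, and the rows are the sample data from the question typed in as dates.

```python
from collections import Counter
from datetime import date

purchases = [
    ("client1", "A", date(2015, 6, 15)),
    ("client3", "A", date(2015, 12, 4)),
    ("client5", "F", date(2015, 6, 15)),
    ("client9", "G", date(2015, 1, 15)),
    ("client2", "G", date(2015, 1, 15)),
    ("client1", "R", date(2015, 7, 5)),
    ("client3", "G", date(2015, 6, 15)),
    ("client1", "F", date(2015, 7, 5)),
    ("client3", "F", date(2016, 6, 15)),
]


def top_products(rows, year, n):
    """Return the n most frequent products bought in the given year."""
    # Same filter as the COUNTIFS date criteria: keep one year only.
    counts = Counter(product for _, product, bought in rows
                     if bought.year == year)
    # most_common sorts by count, largest first (the LARGE step).
    return counts.most_common(n)


print(top_products(purchases, 2015, 1))  # [('G', 3)]
```

Note that this, like the helper-column approach, ranks single products; it does not look for pairs of products bought together.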
This is how it looks. The dates are displayed according to my computer's date format.

Counting the number of older siblings in an Excel spreadsheet

I have a longitudinal spreadsheet of adolescent growth.
ID | CollectionDate | DOB | MOTHER ID | Sex
1 | 1Aug03 | 3Apr90 | 12 | 1
1 | 4Sept04 | 3Apr90 | 12 | 1
1 | 1Sept05 | 3Apr90 | 12 | 1
2 | 1Aug03 | 21Dec91 | 12 | 0
2 | 4Sept04 | 21Dec91 | 12 | 0
2 | 1Sept05 | 21Dec91 | 12 | 0
3 | 1Aug03 | 30Jan89 | 23 | 0
3 | 4Sept04 | 30Jan89 | 23 | 0
This is a sample of how my data is formatted and some of the variables that I have. As you can see, since it is longitudinal, each individual has multiple measurements. In the actual database there are over 10 measurements per individual and over 250 individuals.
What I am wanting to do is input a value signifying the number of older brothers and older sisters each individual has. That is why I have included the Mother ID (because it represents genetic relatedness) and sex. These new variable columns would just say how many older siblings of each sex each individual has. Is there a formula that I could use to do this quickly?
=COUNTIFS($B:$B,"<>"&$B2,$H:$H,$H2,$AI:$AI,$AI2,$J:$J,"<"&$J2)
Create a column named Distinct with this formula
=1/COUNTIF([ID],[@ID])
Then you can find all the older 0-sexed siblings like this
=SUMPRODUCT(([DOB]<[@DOB])*([MOTHERID]=[@MOTHERID])*([Sex]=0)*([Distinct]))
Note that I made the data a Table and used table notation. If you're not familiar with it, [COLUMNNAME] refers to the whole column and [@COLUMNNAME] refers to the value in that column on the current row. It's similar to saying $A:$A and A2 if you're dealing with column A.
The first formula gives you a value to count that will always add up to 1 for a particular ID. So ID=1 has three lines and Distinct will result in 0.33333 for each line. When you add up the three lines you get 1. This is similar to a SELECT DISTINCT in SQL parlance.
The SUMPRODUCT formula sums [Distinct] for every row where the DOB is earlier than the current DOB (that is, an older sibling), the Mother is the same as the current Mother, and the Sex is zero.
I have a possible solution. It involves adding two columns -- One for "# older siblings" and one for "unique?". So here are all the headings I have currently:
A -- ID
B -- CollectionDate
C -- DOB
D -- MOTHER ID
E -- Sex
F -- # older siblings
G -- unique?
In G2, I added the following formula:
=IF(A2=A1,0,1)
And dragged down. As long as the data is sorted by ID, this will only display "1" once for each unique person.
In F2, I added the following formula:
=COUNTIFS(G:G,"=1",D:D,"="&D2,C:C,"<"&C2)
And dragged down. It seemed to work correctly for the sample data you provided.
The stipulations are:
You would need the two columns.
The data would need to be sorted by ID
I hope this helps.
You need a formula like this (for example, for row 2):
=COUNTIFS($A:$A,"<>"&$A2,$E:$E,$E2,$D:$D,$D2,$C:$C,"<"&$C2)
Assuming E:E is column for sex, D:D is column for mother ID and C:C is column for DOB.
Write this formula in H2 cell for example and drag it down.
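The common idea in the answers above is: reduce the repeated measurements to one record per individual, then count the individuals with the same mother and an earlier date of birth, split by sex. Here is a minimal Python sketch of that idea. The function name older_siblings is my own, the collection date is left out for brevity, and the sex codes 0 and 1 are kept exactly as they appear in the question.

```python
from datetime import date


def older_siblings(rows):
    """Map each ID to {sex_code: number of older siblings of that sex}."""
    # Collapse repeated measurements into one record per individual.
    people = {}
    for person_id, dob, mother_id, sex in rows:
        people[person_id] = (dob, mother_id, sex)

    result = {}
    for person_id, (dob, mother_id, _) in people.items():
        counts = {0: 0, 1: 0}
        for other_id, (other_dob, other_mother, other_sex) in people.items():
            # A sibling shares the mother; "older" means an earlier DOB.
            if (other_id != person_id and other_mother == mother_id
                    and other_dob < dob):
                counts[other_sex] += 1
        result[person_id] = counts
    return result


rows = [
    (1, date(1990, 4, 3), 12, 1),
    (1, date(1990, 4, 3), 12, 1),
    (2, date(1991, 12, 21), 12, 0),
    (2, date(1991, 12, 21), 12, 0),
    (3, date(1989, 1, 30), 23, 0),
]
print(older_siblings(rows)[2])  # {0: 0, 1: 1}
```

With over 250 individuals the nested loop is still small, so I did not try to make it any cleverer than that.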

How to aggregate data 'grouped by' its parent and use that for further calculation (normalization) in Excel?

I have some data in an Excel Workbook. This data is hierarchical i.e., there is a parent-child relationship between the data across work-sheets. Here's how it looks in a particular worksheet: (There are other rows (above it) and columns to the right, but aren't important for this problem)
| Parent | Item | Score |
| P1 | I1 | 3 |
| P2 | I2 | 1 |
| P1 | I3 | 6 |
| P3 | I4 | 1.5 |
| P4 | I5 | 4 |
We need to have a sum total of all Items belonging to a particular parent i.e., total by parent to get 'sum of items' for each parent. (The root worksheet won't have any parents (i.e., blank column), but the structure is the same across worksheets). The need is to 'normalize' the scores of the children of a parent on a scale of 0-1 (EDIT: i.e. sum of the scores of the children must sum to 1)
I've been playing around with pivot tables and I see that you can aggregate the data by parent. But I'm not sure how exactly can I use that data to normalize the item scores. More so, the data across the excel sheets is quite dynamic and from my minimal experience with pivot tables it seems data isn't being refreshed automatically.
More so, each 'child-level worksheet' is generated from the current level worksheet (using macros). So we need a way to be able to aggregate scores by parent so that we can easily have it propagated to the next worksheet when copied (even if it's to be done manually).
I'm just at a corner with being able to do a 'Group By' (from SQL) in Excel. Any ideas?
If you need to find the sum of all values matching a given criterion, use the SUMIF function. I'm assuming "Parent" is in column A, "Item" in column B and "Score" in column C. In D2, you would have to put the following formula: =SUMIF(A:A;A2;C:C), and copy it down.
However, you don't need to know the sum of the scores of the children of a parent if you want to put the scores on a scale of 0 to 1: you only need to know the maximum score of the children of a parent. Because the MAXIF function doesn't exist, we will use an array formula combining MAX and IF. Type this in D2 and press Ctrl + Shift + Enter: =MAX(IF(A:A=A2;C:C)). Excel then adds curly braces to show that it's an array formula: {=MAX(IF(A:A=A2;C:C))}. Now you need to divide the score of the child by the maximum score of its group.
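As a rough illustration of the "group by parent, then normalize" step, here is a short Python sketch. It divides by the per-parent sum (the SUMIF variant), since the question's edit asks for the children's scores to sum to 1. The function name normalize_by_parent is hypothetical, and the sketch assumes every parent's total is non-zero.

```python
from collections import defaultdict

rows = [
    ("P1", "I1", 3.0),
    ("P2", "I2", 1.0),
    ("P1", "I3", 6.0),
    ("P3", "I4", 1.5),
    ("P4", "I5", 4.0),
]


def normalize_by_parent(rows):
    """Return {item: score / total score of that item's parent}."""
    # First pass: total per parent, the equivalent of SUMIF(A:A;A2;C:C).
    totals = defaultdict(float)
    for parent, _, score in rows:
        totals[parent] += score
    # Second pass: divide each score by its parent's total.
    # Assumes no parent total is zero (otherwise this raises an error).
    return {item: score / totals[parent] for parent, item, score in rows}


normalized = normalize_by_parent(rows)
print(round(normalized["I1"], 3), round(normalized["I3"], 3))  # 0.333 0.667
```

To scale by the group maximum instead, as suggested above, you would track max(score) per parent in the first pass rather than the sum.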
