Is there an easy way to reformat a poorly-formatted tree to a two-column table? - excel

I have a table representing a series of components and their subcomponents, and the subcomponents' respective subcomponents, and so on. It currently looks like a tree (one-to-many relations), but it could change at some point to resemble a graph (many-to-many relations) instead. Unfortunately, it was poorly formatted by its author, and looks something like this:
The above format is poor because there is a lot of data duplication and it is limited to a set number (4) of tiers. I would instead prefer if it looked something like this:
The above format is nice because there is very little data duplication, and it is not limited to a set number of tiers.
In case there is any confusion about what the tables represent, here is a graphical representation of the data:
It is simple enough to convert from the poor format to the nice format, but there are hundreds of root components, and manual data entry would be far too time-consuming and tedious.
I suspect this problem is unique and I am prepared to write some VBA code myself to parse the table into the nice format, but I thought I'd make sure that this wasn't a common problem with a pre-rolled solution before I rolled my own.
Is there a technical term to describe the poor formatting in the first table? Is there an easier way to reformat the data than to write a VBA macro?

This may be a complete aberration but it works for your sample (and at the moment I don’t have time to break it!)
Add an index column (and a label for it), then reverse pivot the data (e.g., see the question "An excel formula to find a row/column index in array").
Instead of drilling down on the Grand Totals intercept, drill down on each of the totals for the Tiers.
Reassemble the tables side by side, delete all columns except the Value ones, and copy the table to another area with Paste Special > Values. Run Remove Duplicates on the range. Every time the value in the column immediately to the right does not change, delete the cell and shift the remaining cells to the left. Finally, reorder the columns right to left.

I copied each pair of adjacent columns in the Tier table (Tier 1 & Tier 2, Tier 2 & Tier 3, Tier 3 & Tier 4) and pasted them stacked vertically into a single pair of columns (Subcomponent & Component).
Next, I removed duplicates by selecting both of my new columns and clicking Remove Duplicates in the Data ribbon tab.
Next, I had to remove all rows that contained a blank cell in the Subcomponent column. To do this, I selected both columns again and filtered the data by clicking Filter in the Data ribbon tab. I selected (Blanks) in the filter menu on the Subcomponent column and deleted all visible rows. I then cleared the filter by selecting (Select All) in the filter menu.
The resulting table contained many blank rows, so again I removed duplicates, and then manually shifted the data up one row to displace the one remaining blank row.
In the end, it took about a half hour, which is probably less time than it would have taken me to code a macro, and definitely less time than manual data entry.
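For anyone who would rather script the same steps, here is a minimal Python sketch of the stack-adjacent-columns approach. It assumes the tier table has been exported as a list of rows with Tier 1 (the top-level component) first and blank cells as empty strings; the function name `tiers_to_pairs` and the sample data are hypothetical.

```python
def tiers_to_pairs(rows):
    """Turn tier rows (Tier 1, Tier 2, ...) into unique (subcomponent, component) pairs.

    Mirrors the manual steps: stack each pair of adjacent tier columns,
    drop pairs whose subcomponent cell is blank, then remove duplicates.
    """
    n_tiers = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of tier cells.
    padded = [list(row) + [""] * (n_tiers - len(row)) for row in rows]

    pairs, seen = [], set()
    # Tier 1 & Tier 2 first, then Tier 2 & Tier 3, and so on (stacked vertically).
    for t in range(n_tiers - 1):
        for row in padded:
            component, subcomponent = row[t], row[t + 1]
            if not subcomponent:          # the "(Blanks)" filter step
                continue
            pair = (subcomponent, component)
            if pair not in seen:          # the Remove Duplicates step
                seen.add(pair)
                pairs.append(pair)
    return pairs


# Hypothetical sample: A has subcomponents B and C; B has D and E.
sample = [
    ["A", "B", "D", ""],
    ["A", "B", "E", ""],
    ["A", "C", "", ""],
]
for sub, comp in tiers_to_pairs(sample):
    print(sub, comp)
```

Because the output is a flat list of pairs rather than a fixed set of tier columns, a subcomponent with several parents (the many-to-many case) simply appears in several rows.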

Related

How can I make a drop down list in Excel 2013 based on several conditions?

What I would like to achieve is that sellers can choose the STORE in the blue cell (either with a drop-down list or by typing the STORE name) and, based on the selection in the blue cell, the available POSITIONS for that particular PRODUCT and that particular STORE are shown in the green cell as a drop-down list.
Let's say I have an Excel workbook containing a worksheet with this table of product data, which is imported automatically every day from our Nav server with this layout. It has 4 columns: PRODUCT CODE, DESCRIPTION, STORE IN WHICH IT CAN BE LOCATED, and POSITION INSIDE THE STORE (please check the screenshot). It contains 1.5k rows and changes dynamically; for example, new items are added or positions are exchanged.
As you can see, the same product (PRODUCT 2) can be located in several stores (STORES 1, 2 and 3), and it can be in several positions in each store (POSITIONS 2, 3, 1 and 4).
Now I need sellers to report which of these items they pick and from where: not only the STORE but also the POSITION inside the store. They do it in another worksheet in the same Excel workbook, which looks more or less like this (please check the screenshot).
I know the drop-down list is achieved via Data Validation, but I can't figure out the formula for it. I have tried several approaches:
Array formula to return all POSITIONS in the same ROW, following this (Formula 2.): https://www.ablebits.com/office-addins-blog/2017/02/22/vlookup-multiple-values-excel/. It is quite slow to calculate on the 1.5k items and, once done, I can't figure out how to make Data Validation look at the 4, 5 or 10 POSITIONS returned by the array formula, which also need to be filtered by STORE (please check the screenshot for the closest I have gotten: an array formula returning POSITIONS from column E).
The same formula as above directly in the Data Validation list box, which returns only the first POSITION found.
VBA custom functions, which are not allowed in the Data Validation box.
I feel comfortable with Power Query, VBA and formulas, and can adapt most of the code I see, but I just can't figure out how to achieve this. Maybe I am just blocked, but every path I start to follow ends in a dead end.
Does anyone have an idea on how to approach this? It doesn't really seem that complicated but it is becoming impossible for me.
Thank you very much for your time!!
This is what I have finally done, just in case someone else is facing this situation.
Instead of a plain-text table for the POSITIONS, I created a Power Query query importing that CSV, and named that worksheet _LOCATIONS.
I added a custom column (column E) combining the PRODUCT and the STORE, so I had something like a unique identifier, resulting in something like this (but in Power Query):
Combined column:
I sorted by column E and then by column D, so that the list is always ordered as I need (each PRODUCT-STORE pair must occupy one contiguous block of rows for the formula below to work), and saved the query.
Then, in worksheet REPORT, I entered this formula to create the drop down list in Data Validation in cell D2:
OFFSET(_LOCATIONS!$D$1,MATCH($A2&"-"&$C2,_LOCATIONS!$E:$E,0)-1,0,COUNTIF(_LOCATIONS!$E:$E,$A2&"-"&$C2))
And I am able to choose from the available POSITIONS for the selected PRODUCT in the selected STORE.
Brief explanation:
I set the reference for the OFFSET function at the very first POSITION cell (D1), and then move it down by the row number returned by the MATCH function (which searches for the "PRODUCT 2-STORE 2" string in the newly created combined column) minus 1, because MATCH returns a 1-based position counted from row 1 while OFFSET counts rows from D1 starting at 0; the column offset is 0. This leaves me on the first occurrence of my string, but in the POSITIONS column. Then I make the OFFSET range as tall as the number of rows counted by the COUNTIF function (which counts all occurrences of my PRODUCT-STORE pair), returning all the positions (column D) that match the PRODUCT-STORE pair.
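The OFFSET/MATCH/COUNTIF combination only works because the sorted query keeps every PRODUCT-STORE pair in one contiguous block. A rough Python equivalent of that lookup is sketched below to make the logic explicit; the `positions_for` helper and the sample rows are hypothetical and not part of the workbook.

```python
def positions_for(rows, product, store):
    """Return the POSITIONS for one PRODUCT-STORE pair.

    rows: (product, store, position) tuples, sorted by the combined key
    and then by position, like the sorted Power Query output.
    """
    keys = [f"{p}-{s}" for p, s, _ in rows]   # the combined column E
    target = f"{product}-{store}"
    if target not in keys:
        return []                             # MATCH would return #N/A here
    first = keys.index(target)                # MATCH(...,0) - 1, as a 0-based offset
    height = keys.count(target)               # COUNTIF(...) gives the OFFSET height
    # OFFSET(D1, first, 0, height): a contiguous slice of the POSITION column
    return [pos for _, _, pos in rows[first:first + height]]


raw = [
    ("PRODUCT 2", "STORE 1", "POSITION 2"),
    ("PRODUCT 2", "STORE 2", "POSITION 3"),
    ("PRODUCT 1", "STORE 1", "POSITION 1"),
    ("PRODUCT 2", "STORE 2", "POSITION 1"),
    ("PRODUCT 2", "STORE 3", "POSITION 4"),
]
# Sort by the combined key, then by position ("sorted column E, sub-sorted column D").
table = sorted(raw, key=lambda r: (f"{r[0]}-{r[1]}", r[2]))
print(positions_for(table, "PRODUCT 2", "STORE 2"))
```

If the rows were not sorted, COUNTIF would still return the right count, but the slice starting at the first match would pick up neighbouring rows that belong to other pairs.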
Ask for formula in Spanish if you need it.

Generate summary gantt chart from detailed activities

I want to create a Gantt chart summary that shows a person's whole "busy" and "free" schedule by day, in a single row, from a detailed Gantt chart that lists the activities of different people in multiple rows.
Basically, go from this:
To this (which I created manually):
The goal is to give a summary of people's shifts and their free time between activities.
Right now I'm just using this formula to compare the start and end date in each row and produce a "1" if the condition is true; then I applied conditional formatting to all the Gantt cells.
=IF(AND(Q$8>=$N12,DAY($K12)<>DAY($J12)),1,IF(AND(Q$8>=$N12-0.00001,Q$8<$O12-0.00001),1,""))
I have no idea how to start. I was thinking of doing the following:
Create a table of the names of all the possible people to be added to the Gantt chart.
Program the macro to create a new sheet with the same template.
Program a loop that iterates over each person's name:
For each person's name that exists in the Gantt chart to be summarized, start creating new rows for each day they have activities scheduled (I can't figure out yet how I'd iterate through this).
Within each person's loop, iterate over each row on the original sheet, evaluating each start and end date, and write a "1" into the current person's current-day row on the new sheet in the hours where the condition is true.
Loop until all individual activities of each person are finished.
Continue with the next person.
I'd like to know if this is the logical way to go, and whether you have any pointers or similar code to reuse; I am not proficient in VBA and Excel macros.
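The per-person, per-day loop outlined above can be sketched in Python as follows. This is an illustration of the logic only, not VBA; it assumes activities are available as (person, start, end) tuples with start times on a 15-minute grid, and the names `summarize`, `summary_row` and `SLOT` are hypothetical.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)  # assumed Gantt resolution


def summarize(activities):
    """Map (person, day) -> set of busy slot start datetimes.

    activities: (person, start, end) tuples; start is assumed to lie on
    the 15-minute grid. A slot is busy if it starts in [start, end).
    """
    busy = {}
    for person, start, end in activities:   # one pass over every activity row
        t = start
        while t < end:                       # walk the activity slot by slot
            busy.setdefault((person, t.date()), set()).add(t)
            t += SLOT                        # crossing midnight moves to the next day's row
    return busy


def summary_row(busy, person, day_start, n_slots):
    """One output row: 1 where the person is busy, "" where free."""
    slots = busy.get((person, day_start.date()), set())
    return [1 if day_start + i * SLOT in slots else "" for i in range(n_slots)]


acts = [
    ("Ann", datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0)),
    ("Ann", datetime(2024, 1, 1, 8, 30), datetime(2024, 1, 1, 9, 30)),
    ("Bob", datetime(2024, 1, 1, 23, 30), datetime(2024, 1, 2, 0, 30)),
]
busy = summarize(acts)
print(summary_row(busy, "Ann", datetime(2024, 1, 1, 8, 0), 8))
```

Grouping by (person, day) in a single pass removes the need to pre-build the list of days per person: overlapping activities simply merge into the same set of busy slots.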
Not sure if I understand properly, but it looks like you have a set of records storing the In and Out times of each worker, with several rows per worker.
Based on that, you would like to summarize the data, one row per worker, highlighting the start and end times of each worker, all in a single row.
I made a fake dataset like this:
I added 2 extra columns (you can even hide them if you don't want to see them)
Field START TIME has this formula: =B2-INT(B2)
Field END TIME has this formula: =C2-INT(C2)
In Excel, dates are stored as whole numbers and times as the decimal part. I used both formulas to keep only the decimal part of each start and end.
All this data is in a table object named T_WORKERTIMES. I used a table object so that if you add new records, the Gantt chart will update automatically.
Then I made a simple (kind of horrible) Gantt chart:
The formula I've used in H2 and dragged is:
=COUNTIFS(T_WORKERTIMES[Worker];$G2;T_WORKERTIMES[start time];"<="&H$1;T_WORKERTIMES[end time];">="&H$1)
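To make the COUNTIFS logic concrete, here is a small Python sketch of what each Gantt cell computes. It assumes records are (worker, start fraction, end fraction) tuples; `time_fraction`, `gantt_row` and the sample data are hypothetical names chosen for illustration.

```python
from datetime import datetime


def time_fraction(dt):
    """Time of day as a fraction of 24 h, like =B2-INT(B2) on an Excel date-time."""
    return (dt.hour * 3600 + dt.minute * 60 + dt.second) / 86400


def gantt_row(records, worker, headers):
    """One Gantt row, mirroring
    =COUNTIFS(T_WORKERTIMES[Worker];$G2;T_WORKERTIMES[start time];"<="&H$1;
              T_WORKERTIMES[end time];">="&H$1)
    """
    return [
        sum(1 for w, start, end in records if w == worker and start <= h and end >= h)
        for h in headers
    ]


records = [
    ("Worker 1",
     time_fraction(datetime(2024, 1, 1, 8, 0)),
     time_fraction(datetime(2024, 1, 1, 9, 0))),
]
headers = [h / 24 for h in (7, 8, 8.5, 9, 9.5)]  # 07:00, 08:00, 08:30, 09:00, 09:30
print(gantt_row(records, "Worker 1", headers))
```

Because the end time is compared with `>=`, the header that equals an activity's end time is also counted as busy; these exact boundaries are where tiny decimal differences decide whether a cell shows 1 or 0.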
Actually, all my data is on the same sheet:
I added 2 conditional formatting rules to highlight cells in green/white if the result of the formula is 1/0.
Also, working with times can sometimes be hard because of the decimals: 0,677083333335759 means 16:15, but so does 0,6770833333333333. So in the Gantt chart I rounded the headers to 6 decimals.
My formula in H1 is =ROUND(7/24;6)
My formula in I1, dragged to the right, is =ROUND(H1+1/24/4;6)
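The rounding trick can be checked quickly in Python, which uses the same IEEE double-precision numbers as Excel. The sketch below compares the two values quoted above and rebuilds a header row the same way as the ROUND formulas; the `headers` helper is hypothetical.

```python
# Two slightly different binary values that both display as 16:15.
a = 0.677083333335759
b = 0.6770833333333333
print(a == b)                      # False: exact comparison fails
print(round(a, 6) == round(b, 6))  # True: both round to 0.677083


def headers(first_hour=7, n=8):
    """Header row built like =ROUND(7/24;6), then =ROUND(previous+1/24/4;6)."""
    row = [round(first_hour / 24, 6)]
    for _ in range(n - 1):
        row.append(round(row[-1] + 1 / 24 / 4, 6))  # add 15 minutes, then re-round
    return row


print(headers())
```

One side effect worth knowing: because each header is rounded from the previous, already-rounded header, the 07:15 header comes out as 0.302084, while rounding the exact 07:15 value gives 0.302083. Computing each header directly from its own time avoids that small drift.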
So now everything works fine. Notice that worker 1 has no activities from 07:00 to 08:00, so I add a new row with that data and everything updates:
I've uploaded a sample to Google Drive where you can see the formulas; I hope you can adapt this to your needs.
https://drive.google.com/file/d/1KOuCAYsmlY9gfNUCUhIrihXu-tJz-K7t/view?usp=sharing
The biggest issue here is the decimal part of the times and making sure they fit the Gantt chart. An easy solution would be subtracting 1 minute from the start time column (calculated; you can hide it) and adding 1 minute to the end time column (calculated; you can hide it).
Hope this can guide you in your project.
It looks like you are trying to extract unique records per person and day to get a person/day summary of time availability but also want it to be automated as you add more people and days.
I was able to do this with a combination of Power Query and a pivot table. When new persons/dates are added or changed, the report will update, but you will need to refresh it using CTRL+ALT+F5.
1. Highlight your entire report, or an area as big as you think it will get. While it is highlighted, use the named range feature under FORMULAS tab -> DEFINED NAMES group -> DEFINE NAME. We could name it REPORTAREA or something like that.
Make sure you change the IF formula in the report that drives the conditional formatting so it returns 0 instead of "", so this can work properly.
2. Go to DATA tab -> GET AND TRANSFORM group -> From Other Sources -> Blank Query.
This will open the Power Query Editor with a blank query.
3. In the formula bar, type =Excel.CurrentWorkbook() (it is case-sensitive).
4. From there you will see a CONTENT column and a NAME column.
In the NAME column, open the drop-down and go to TEXT FILTERS -> Equals... Type in the name of your named range so the query does not pick up anything else by accident.
5. Remove the NAME column by right-clicking it and selecting Remove.
6. You will notice the CONTENT column header has two curved arrows pointing left and right instead of straight down like you are used to in Excel. Click these arrows, make sure you uncheck the "Use original column name as prefix" option, and ensure that the Expand option is selected. Then click OK.
7. At this point it looks a lot like your report. Go to the HOME tab -> TRANSFORM group -> Use First Row as Headers.
8. Select only the columns that are NOT the 24-hour-style time labels of your report, then right-click -> Unpivot Other Columns.
9. At this point you can start removing the columns you don't want by right-clicking and choosing Remove. You can also double-click a column header to rename it, and right-click the top of a column to change its type.
Don't worry about the 24-hour-style time labels not looking correct, as this will be fixed later; this column should be changed to a decimal type, not a time type.
10. Select a column that has the date information you need, right-click -> Duplicate Column, and change the type of the duplicate to date.
11. At the top left of the screen there is a CLOSE & LOAD drop-down, where you can choose to load to a new worksheet.
12. That will produce a green table. Select the table and press ALT+D+P to create a pivot table linked to the green table produced by the query.
You may need to close the Queries & Connections pane that opened in order to see the pivot table field list that will appear on your right.
Drag the 24-hour-style time column to the Columns area.
Drag the people column to the Rows area, then drag the column you made in step 10 to the Rows area as well.
Drag the conditional format column to the Values area.
Your pivot table won't look exactly like what you want yet. While the pivot table is selected, go to the DESIGN tab -> REPORT LAYOUT -> Show in Tabular Form, and in the same tab choose SUBTOTALS -> Do Not Show Subtotals.
13. Highlight all of the 24-hour-style time labels and format them, then highlight the inside of the pivot table where all the 1s and 0s are and apply the conditional formatting you applied previously. Don't forget that you changed the formula originally, so your IF statement ends with 0 instead of "".
If you like, I think the report is easier to read if you swap the ROWS and COLUMNS areas of the pivot table fields; I have chosen to do so in the pictures. If you want to keep the report the way you are used to, follow the previous instructions.
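The Power Query and pivot table steps boil down to "unpivot the time columns, then pivot back with one row per person and day". A minimal Python sketch of that reshaping is below; the column names ("Person", "Date", "07:00", ...) and the helper names are hypothetical, and it assumes the pivot's Values area uses the default Sum aggregation, so any value above 0 means busy.

```python
from collections import defaultdict


def unpivot(report, id_cols):
    """Like "Unpivot Other Columns": every non-id column becomes its own row."""
    long_rows = []
    for row in report:
        ids = {k: row[k] for k in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_rows.append({**ids, "Time": col, "Busy": value})
    return long_rows


def pivot_sum(long_rows, row_keys):
    """Pivot with row_keys in Rows, Time in Columns and Sum of Busy in Values."""
    table = defaultdict(lambda: defaultdict(int))
    for r in long_rows:
        table[tuple(r[k] for k in row_keys)][r["Time"]] += r["Busy"]
    return {key: dict(cols) for key, cols in table.items()}


report = [
    {"Person": "Ann", "Date": "2024-01-01", "07:00": 1, "07:15": 0},
    {"Person": "Ann", "Date": "2024-01-01", "07:00": 0, "07:15": 1},
    {"Person": "Bob", "Date": "2024-01-01", "07:00": 0, "07:15": 0},
]
summary = pivot_sum(unpivot(report, ["Person", "Date"]), ["Person", "Date"])
for key, cols in summary.items():
    print(key, cols)
```

Overlapping activities sum to 2 or more in the same cell, which is why a "greater than 0" rule is a safer conditional format than "equals 1".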
I put the above comment down here as a complete answer.
I call the cells after the "Finish date" column "chart cells". To extract a unique list of names, please refer to: here
For each name, you can use the following formula and format cells with a value >0 to show the bars.
=SUMIF([name range], "[each name]", [for each column of the chart cells])
If you further need to filter by dates, use SUMIFS() instead:
=SUMIFS([each column of chart cells], [name range], [unique name obtained from above], [Finish date range],"<=" & DATEVALUE("[target date]")+1,[start date range],"<=" & DATEVALUE("[target date]"))
This is the Excel formula solution, which is good if your table is not huge.
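The SUMIF approach can be illustrated in Python: collect the unique names, then sum each chart column over the rows that belong to each name. This is a sketch only; `unique_names`, `sumif_row` and the 0/1 sample chart cells are hypothetical.

```python
def unique_names(names):
    """Unique names in first-seen order (the "unique list of names" step)."""
    return list(dict.fromkeys(names))


def sumif_row(names, chart_cells, target):
    """=SUMIF(name_range, target, chart_column) evaluated for every chart column."""
    n_cols = len(chart_cells[0])
    return [
        sum(row[c] for name, row in zip(names, chart_cells) if name == target)
        for c in range(n_cols)
    ]


names = ["Ann", "Bob", "Ann"]
chart_cells = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
for person in unique_names(names):
    print(person, sumif_row(names, chart_cells, person))
```

Any summed cell above 0 gets a bar, so two activities covering the same slot still display as a single busy block.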

Making excel search using one letter in cell

So basically, I have been trying to make excel scan one column and print out the row for cells which have specific letters.
For example I have the following data in my sheet.
What is required is to have Excel search for the values that contain T and those that contain I, and then print the matching rows, in effect separating them into two different tables so that I can do further analysis on them.
So far I have been trying to use the VLOOKUP() function, but the problem with VLOOKUP() is that Excel requires an exact match, not just a letter in the cell. I tried both FALSE and TRUE. Then I tried =INDEX($B$4:$K$9;MATCH($A$17;$A$5:$A$9;0);COLUMN(A4)) instead, but that also does not work, since it also requires a full match. Another problem I didn't realise before: how can Excel recognise each cell, given that the number after the letter will be different every time, and how can I make Excel not return the same row twice?
I have used another approach where I copy the data into a separate sheet, filter for the Ts, and then copy/paste the Is into another sheet, and vice versa. It is time-consuming, so it would be much better if I could simply paste in my new data and have it generate the split on its own.
Any suggestions or links would be really helpful.
UPDATE
I had a new idea on how to approach this problem: is it possible to have VBA code filter the data? Is there a way in VBA to filter the data by "Starts With" and have the results printed in another cell block?
Looks simple enough. The first step is to make sure you have headers over your data and that it is in proper table format, similar to my picture. Then select the data set and press CTRL+T. That should turn your data into a table object with stripes. Use the formula =LEFT(C2,1) in a helper column to take out the first letter, which will be L or T.
Select the table and press ALT+D+P which will generate a pivot table based off original data set.
Drag the helper column with the formula I suggested to the FILTERS area of the pivot table, the ID column to ROWS, and all the others to VALUES. Simply refresh as new data is added, and the pivot tables will pick it up. Do not put the pivot tables on top of each other as I did; that is only for the picture, so you can see it. If you have many filters to apply, you can right-click the helper column in the PivotTable Fields area to add a slicer, which is a button that helps you change the report quickly. Any other questions, do ask.
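The helper-column idea (take the first letter, then filter on it) can be sketched in Python to show why it avoids the exact-match problem and never returns a row twice: every row lands in exactly one group keyed by its first character. The function name and sample rows below are hypothetical.

```python
def split_by_first_letter(rows, column):
    """Group rows by the first character of one column, like a =LEFT(C2,1)
    helper column used as a pivot-table filter ("Starts With")."""
    groups = {}
    for row in rows:
        key = str(row[column])[:1]
        groups.setdefault(key, []).append(row)
    return groups


rows = [
    (1, "A", "T100"),
    (2, "B", "I200"),
    (3, "C", "T310"),
]
groups = split_by_first_letter(rows, 2)
print(groups["T"])
print(groups["I"])
```

Since the grouping key is only the first character, the varying numbers after the letter no longer matter.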

Excel - Speed Up Index|Match Array Calculations

I'm having some serious performance issues with an excel workbook I created. I need to pull data from another worksheet in the book that has 7 columns of data and about 300 rows.
The amount of data should be no problem - I think the issue I'm having comes down to an index|match array that has multiple match conditions. I'm wondering if there's another approach I can take, because the workbook is becoming aggravating to work with.
Here's some made-up data:
This data is aggregated in a separate program from a database, and I output it to an Excel file.
Here's a sample of a made-up segment of a report:
where the formula for the rows "Active Accounts" and "Online Enabled Accounts" is:
{=IFERROR(INDEX($D:$G,MATCH($K$2&M$1,$C:$C&$A:$A,0),MATCH($L2,$D$1:$G$1,0)),0)}
and the formula for the rows "Both", "Online", and "Paper" are as follows:
{=IFERROR(INDEX($G:$G,MATCH($K2&M$1&$L6,$C:$C&$A:$A&$F:$F,0)),0)}
I have about 5 other "segments" that show similar data broken down by different criteria in this format, across 13 months. Even with only 300 data records, this workbook is painfully slow, even when just applying formatting, so I'm hoping there's a better approach than just using these arrays with INDEX/MATCH.
Select any cell in your dataset and press CTRL+A, then CTRL+T. This will create a table you can reference as a named range.
Write the INDEX/MATCH formula as you normally would, except make sure to select only the data you're looking for. Do not select entire columns: that is what's weighing down your system, because in an array formula an expression such as $C:$C&$A:$A builds a 1,048,576-element array in every cell that uses it. Simply choose the ranges that contain the data you're looking at.
As you highlight your data, you'll notice it says something like Table1[Accounts]; this means the formula is referencing that named column's data range. This allows your formula to scale as the table grows (or shrinks) without calculating any farther than needed, which saves a great deal of computation.
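To show where the cost comes from, here is a Python sketch of what the concatenated multi-condition MATCH does on every recalculation, next to an alternative that builds a lookup index once. The index is a swapped-in technique for illustration, not something the answer proposes for Excel; all names and sample rows are hypothetical.

```python
def lookup_concat(rows, *criteria):
    """Mimic {=IFERROR(INDEX(value, MATCH(k1&k2&k3, c1&c2&c3, 0)), 0)}.

    Every call rebuilds one concatenated key per referenced row, so the cost
    grows with the size of the referenced range (a whole column in Excel is
    1,048,576 rows, versus the ~300 rows that actually hold data).
    """
    target = "".join(map(str, criteria))
    keys = ["".join(map(str, row[:-1])) for row in rows]
    return rows[keys.index(target)][-1] if target in keys else 0  # IFERROR(..., 0)


def build_index(rows):
    """Alternative, not from the answer: build a tuple-keyed index once and reuse it.

    Tuple keys also avoid false matches such as "AB" & "C" == "A" & "BC".
    """
    index = {}
    for row in rows:
        index.setdefault(tuple(row[:-1]), row[-1])  # keep the first match, like MATCH(...,0)
    return index


rows = [
    ("Branch 1", "Jan", "Online", 10),
    ("Branch 1", "Jan", "Paper", 5),
]
print(lookup_concat(rows, "Branch 1", "Jan", "Paper"))
print(build_index(rows).get(("Branch 1", "Jan", "Paper"), 0))
```

The concatenation false-match case is also worth keeping in mind for the Excel formulas: inserting a separator such as "|" between the concatenated parts avoids it.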

Calculate average of values in a column based on matches of two variables in two other columns

I'm having an issue with some data that I'm working on that has had me stuck for a while.
I'm working on some patient data for a clinical practice that has each patient encounter logged on a separate line with an account ID, date of service, the height and weight measurements for that date, and other variables.
Aside from VLOOKUP and the usual formulae/functions I've got a pretty rudimentary understanding of Excel but I can pick up on things fairly quickly.
In the data I've got each line tied to a patient account ID as well as what quarter the DOS was in. For patients with multiple visits, they will be identifiable by repetitions of the account ID number on other lines.
For some patients, there will also be repetitions in the quarter if the same patient was seen twice in the same quarter. This is where I need help.
I'd simply like to average the value of a variable for each patient in each quarter. I'm not sure if AVERAGEIFS is the right function to use but I need an operation that checks for matches in a line of both account ID and quarter (Q1, Q2, Q3, or Q4) with the other lines in the sheet and comes up with a quarterly average for the variables in question.
What I have
What I need
If I'm understanding your question, you could use AVERAGEIFS to accomplish what you are asking. With Excel, how successful an approach will be is largely determined by how your data is structured and whether (and how often) you plan on updating your work.
It would be easier to give a concrete answer with an example of the data you are looking at.
If your source data is in columns A:D something like:
and you are looking to summarize the weight data in I by account ID and Quarter:
you could use =AVERAGEIFS(C:C,A:A,G2,B:B,H2). This finds the cells in column A that match the value in G2 and the cells in column B that match the value in H2, and reports the mean of the values in column C for the rows where both match.
An alternative is to use a pivot table, which automates a lot of what you are trying to do. For that approach, select your data block and click Insert > PivotTable (at least in my version). That should bring up a wizard; accepting the defaults will create a new sheet. Then look on the right side of your window: you should see a list of your column names near the top and four boxes called Filters, Columns, Rows, and Values. You can drag and drop your columns into these boxes to get summaries of your data. If you add your Account ID and Quarter fields as columns and your height and weight as Values, then right-click each of the value columns, select "Value Field Settings", and choose Average from the menu that pops up, you should get something that looks like:
At that point, you can change the formatting to make it fit your needs, or copy the data somewhere else.
The AVERAGEIFS approach will update automatically if you add more data, but it will only summarize things that match the values you list. If there is an account ID/quarter pair that isn't in the summary column, you won't have any idea it's there. If you summarize an ID/quarter pair that isn't in your data, you'll end up with a #DIV/0! error, like in the example.
The Pivot Table option only updates when you manually click refresh (right click and choose refresh pivot table from the menu), but will summarize all the data based on the columns you've selected. It's also a little more robust as you avoid having to type out the formulas and make sure you are pointing to the right column. This option also by default provides nested summaries; you can turn the subtotals and grand totals off if you want.
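The difference between the two approaches can be sketched in Python: AVERAGEIFS averages one (ID, quarter) pair you ask for, while the pivot table averages every pair present in the data. The function names and sample values below are hypothetical.

```python
from collections import defaultdict


def averageifs(average_range, criteria_range1, criteria1, criteria_range2, criteria2):
    """Two-criteria AVERAGEIFS: mean of average_range where both criteria match."""
    matches = [
        v for v, a, b in zip(average_range, criteria_range1, criteria_range2)
        if a == criteria1 and b == criteria2
    ]
    if not matches:
        raise ZeroDivisionError("#DIV/0!")  # Excel returns #DIV/0! when nothing matches
    return sum(matches) / len(matches)


def average_by_pair(ids, quarters, values):
    """Pivot-table style: the average for every (account ID, quarter) pair present."""
    groups = defaultdict(list)
    for i, q, v in zip(ids, quarters, values):
        groups[(i, q)].append(v)
    return {pair: sum(vs) / len(vs) for pair, vs in groups.items()}


ids = ["1001", "1001", "1002", "1001"]
quarters = ["Q1", "Q1", "Q1", "Q2"]
weights = [180, 184, 150, 178]
print(averageifs(weights, ids, "1001", quarters, "Q1"))
print(average_by_pair(ids, quarters, weights))
```

The grouped version can never miss a pair that exists in the data, which mirrors the robustness argument made for the pivot table above.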
I used the function:
=SUMPRODUCT(($B$2:$B$13="Customer 1")*($C$2:$C$13="Q1"))
Where Customer 1 could be a user id and Q1 you can change to which quarter you want.
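As written, this SUMPRODUCT counts the rows where both conditions hold: each comparison yields TRUE/FALSE, and multiplying the two arrays gives 1 only where both are TRUE. The Python sketch below mirrors that, and adds a hypothetical `sumproduct_average` showing how multiplying by a value column and dividing by the count would turn it into an average; the sample data is made up.

```python
def sumproduct_count(customers, quarters, customer, quarter):
    """=SUMPRODUCT(($B$2:$B$13=customer)*($C$2:$C$13=quarter)).

    True * True is 1 and anything multiplied by False is 0,
    so the sum is the number of matching rows.
    """
    return sum((c == customer) * (q == quarter) for c, q in zip(customers, quarters))


def sumproduct_average(values, customers, quarters, customer, quarter):
    """Hypothetical extension: sum of matching values divided by the match count."""
    total = sum(
        (c == customer) * (q == quarter) * v
        for v, c, q in zip(values, customers, quarters)
    )
    return total / sumproduct_count(customers, quarters, customer, quarter)


customers = ["Customer 1", "Customer 1", "Customer 2", "Customer 1"]
quarters = ["Q1", "Q1", "Q1", "Q2"]
values = [10, 20, 30, 40]
print(sumproduct_count(customers, quarters, "Customer 1", "Q1"))
print(sumproduct_average(values, customers, quarters, "Customer 1", "Q1"))
```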
