Excel - Create dataframe out of raw data with multiple columns

In this question I will be writing some text, but also referring to a picture linked below.
I have raw data, with indicators (can be more than 4) always in column C.
In the raw data, I also have Company IDs (can be more than 2) always in row 5.
Where these cross, there is a value. For instance, Company ID 1's value on indicator1 is 10.
I also have a list which takes the unique Company IDs on row 5.
I would like to create a dataframe which puts these values in table form, with each row representing a combination of indicator, company and the actual value, as shown in "desired outcome".
I would also like that table to be dynamic with how many indicators are in column C and how many companies are in row 5.
I need help with which formula to put in the three columns in desired outcome.
I have tried some combinations of XLOOKUP, INDEX/MATCH and BYROW, but I can't seem to find a solution.
In advance, thank you :)

Related

How can I make a drop down list in Excel 2013 based on several conditions?

What I would like to achieve is that sellers can choose the STORE in the blue cell (either with a drop down list or by hard-typing the STORE name) and, based on the selection in the blue cell, the available POSITIONS for that particular PRODUCT and that particular STORE are shown in the green cell as a drop down list.
Let's say I have an Excel workbook, which contains a worksheet with this table of product data, which is automatically imported daily from our Nav server with this layout. It has 4 columns: PRODUCT CODE, DESCRIPTION, STORE IN WHICH IT CAN BE LOCATED and POSITION INSIDE THE STORE (please, check screenshot). It contains 1.5k rows and it changes dynamically; for example, new items are added or positions are exchanged.
As you can see, the same product (PRODUCT 2) can be located in several stores (STORES 1, 2 and 3), and it can be in several locations on each store (POSITIONS 2, 3, 1 and 4).
Now I need sellers to report which of these items they pick and from where, not only the STORE but its POSITION inside the store too. They do it with another worksheet inside the same Excel workbook. It looks more or less like this (please, check screenshot).
I know the drop down list is achieved via Data Validation but I can't figure out the formula for this. I have tried several approaches like:
Array formula to return all POSITIONS in the same ROW, following this (Formula 2.): https://www.ablebits.com/office-addins-blog/2017/02/22/vlookup-multiple-values-excel/. It is quite slow to calculate on the 1.5k items and, once done, I can't figure out how to make Data Validation to look for the 4 or 5 or 10 POSITIONS returned by the array formula, which also need to be filtered by STORE (please, check screenshot for the closest that I have been, array formula returning POSITIONS from column E).
Same formula as above directly on the Data Validation list box, which returns only the first POSITION found.
VBA custom functions, which are not allowed in the Data Validation box.
I feel comfortable with Power Query, VBA and formulas, and can adapt most of the code I see, but I just can't figure out how to achieve this. Maybe I am simply blocked, but every path I start to follow ends up in a dead end.
Does anyone have an idea on how to approach this? It doesn't really seem that complicated but it is becoming impossible for me.
Thank you very much for your time!!

This is what I have finally done, just in case someone else is facing this situation.
Instead of a plain-text table for the POSITIONS, I created a Power Query query importing that CSV and named the worksheet _LOCATIONS.
I added a custom column (column E) combining the PRODUCT and the STORE, so I had something like a unique identifier (for example "PRODUCT 2-STORE 2").
I sorted column E and sub-sorted column D, so I can be sure the list will always be ordered as I need, and saved the query.
Then, in worksheet REPORT, I entered this formula to create the drop down list in Data Validation in cell D2:
OFFSET(_LOCATIONS!$D$1,MATCH($A2&"-"&$C2,_LOCATIONS!$E:$E,0)-1,0,COUNTIF(_LOCATIONS!$E:$E,$A2&"-"&$C2))
And I am able to choose from the available POSITIONS for the selected PRODUCT in the selected STORE.
Brief explanation:
I set the reference for the OFFSET function to the very top of the POSITIONS column (D1), and then I move it down by the row number returned by the MATCH function (which searches for the "PRODUCT 2-STORE 2" string in the newly created combined column) minus 1 (the Power Query table has headers), and by 0 columns. This leaves me on the first occurrence of my string (but in the POSITIONS column). Then I make the offset range as tall as the number of rows returned by the COUNTIF function (which counts all occurrences of my PRODUCT-STORE pair), returning an array of all the positions (column D) matching the PRODUCT-STORE pair.
Ask for formula in Spanish if you need it.
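For anyone who finds the OFFSET arithmetic hard to follow, here is a rough Python model of the same lookup. It is only a sketch of the logic, not something Excel runs: the sample rows and the `positions_for` name are made up for illustration, and it assumes, as the answer does, that the table is sorted so identical PRODUCT-STORE keys form one contiguous block.

```python
# Hypothetical stand-in for the _LOCATIONS sheet: (POSITION, combined key) rows,
# already sorted by key so that identical keys are contiguous.
rows = [
    ("POSITION 1", "PRODUCT 1-STORE 1"),
    ("POSITION 2", "PRODUCT 2-STORE 1"),
    ("POSITION 3", "PRODUCT 2-STORE 1"),
    ("POSITION 1", "PRODUCT 2-STORE 2"),
    ("POSITION 4", "PRODUCT 2-STORE 2"),
]

def positions_for(product, store):
    key = f"{product}-{store}"
    keys = [k for _, k in rows]
    count = keys.count(key)   # COUNTIF: how many rows carry this key
    if count == 0:
        return []             # no match: nothing to offer in the list
    first = keys.index(key)   # MATCH(..., 0): first row carrying this key
    # OFFSET: a block of `count` rows starting at the first match
    return [pos for pos, _ in rows[first:first + count]]

print(positions_for("PRODUCT 2", "STORE 2"))
```

If the keys were not sorted, the slice would pick up rows belonging to other PRODUCT-STORE pairs, which is why the sort step matters.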

How to Isolate Rows of a Table Given One Column Using Two Variables

So I want to isolate all of the rows that are labeled "good" from all the rows that are labeled "bad".
I've tried to use the "Sort & Filter" tool in Excel, but this hasn't worked, I think due to the presence of the index table, which I've used to generate my formulas.
Here are the formulas being used to obtain a unique number for each row, which I then use to determine whether a value is "good" or "bad".
For reference, not all the boxes in the spreadsheet that are green are labeled 'good'.

This can be done by using an IF function and matching to the relevant number in the table, then returning the corresponding column data. This will need to be repeated for each column of data you want to use. An example of the formula is given below:
=IF(ISNUMBER(MATCH(A2,AB:AB,0)),AG:AG,"No Data")

Count values for each row with a unique ID

I have a bunch of rows in a table. Each row reflects an event in a patient. However, one patient can have experienced multiple events, so it's possible for there to be multiple rows with the same patient number. Now I'd like to count the number of male patients in my database, without counting the ones that had multiple events multiple times. Each patient is identified by a unique patient ID that could be used for this.
This shouldn't be all that complicated if not for the fact that I'm using a table that also has several filters, so I need to use SUBTOTAL for any counting functions.
I literally have no idea where to start, so I can't really provide any code...
Any function that could point me in the right direction would be greatly appreciated.
Thanks for the help.
~Laurens

Use a Pivot Table to filter and count the patients in your database. Select your data and choose Insert -> Tables -> PivotTable. Put your filters in the Filters section of the table and the Patient ID in the Rows section. Then you can use COUNT to get the number of patients.
For more information about Pivot Tables, you can check this: https://support.office.com/en-us/article/Create-a-PivotTable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576

To get the number of unique IDs in the same column, if the IDs are numeric, you can use SUM with FREQUENCY:
=SUM(IF(FREQUENCY($A$1:$A$1000,$A$1:$A$1000)>0,1))
If they're text and numbers mixed, you can get unique IDs with this one:
=SUM(IF(FREQUENCY(MATCH($A$1:$A$1000,$A$1:$A$1000,0),MATCH($A$1:$A$1000,$A$1:$A$1000,0))>0,1))
(From here)
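As a rough illustration of why the MATCH/FREQUENCY combination counts distinct values, here is a small Python model of the same idea. The sample IDs and the `count_unique` name are hypothetical; note also that Excel's MATCH ignores case while this sketch does not.

```python
def count_unique(ids):
    # MATCH(range, range, 0): for every row, the position of the first
    # occurrence of that row's value.
    first_positions = [ids.index(value) for value in ids]
    # FREQUENCY(...) > 0: each first-occurrence position is counted once,
    # no matter how many rows point at it.
    return len(set(first_positions))

patient_ids = ["P1", "P2", "P1", "P3", "P2"]  # hypothetical sample data
print(count_unique(patient_ids))
```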

Here you go
You haven't mentioned whether an event is optional.
You might want to add an extra column H with a formula like H2: =IF(C2="",0,1), giving 1 or 0, and multiply it into G as well.
Basically, if column G contains a 1, you include that row.
Here's what the results of the formula look like:
Revision
The table is sorted by patient ID.
On a change of patient ID, column H contains a 1; otherwise it is 0.
So H2 is hard-coded to 1, and H3, H4 and H6 will evaluate to 1.
So now G2=H2*E2, etc. You can filter by column H.
The beauty of mapping things into binary zeros and ones is that you can use multiplication to achieve a logical AND, whilst at the same time breaking a complex task into a series of steps. You can then apply a filter to the data to get the rows where column G is not zero, and see the total count. Normally I'd insert a row between the header and the data on row 2 and then have G2=SUM(G3:G9).
Sum column H for the number of patients.
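The helper-column idea can be sketched in Python as follows. This is only an illustration under assumptions: the sample rows are invented, and column E is assumed here to be a 0/1 flag for the criterion of interest (for instance "is male"), which the original screenshots would have shown.

```python
# Hypothetical event rows, sorted by patient ID: (patient_id, flag_from_column_E)
events = [(101, 1), (101, 1), (102, 0), (103, 1), (103, 1), (103, 1), (104, 0)]

previous_id = None
column_h = []  # 1 on the first row of each patient, 0 on repeats
column_g = []  # H multiplied by the 0/1 flag, i.e. a logical AND

for patient_id, flag in events:
    is_first_row = 1 if patient_id != previous_id else 0
    column_h.append(is_first_row)
    column_g.append(is_first_row * flag)
    previous_id = patient_id

print(sum(column_h))  # distinct patients
print(sum(column_g))  # distinct patients that also satisfy the flag
```

The approach relies on the rows being sorted by patient ID; on unsorted data a patient would be counted once per separate run of rows.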

Count unique items on a list that meet multiple criteria

This sounds simple, but I'm getting a real headache trying to figure it out:
I have two tables in Excel in the same workbook but on different sheets. I want to count unique items in column B of the first table that meet a criterion (based on the data in column A of that table) and also appear in the second table (on a different worksheet).
Because the data I'm working with is confidential, I've made up the two tables below (I've just clipped .jpgs). They are similarly formatted, but in reality I have much more data.
I need a formula that counts the number of unique people in Column B of Table 1 who also appear in Column B of Table 2 and whose date (in Column A of Table 1) is on or before 4/2/2016.
In this example it should come out with the answer three (for Bob, Jim, and Sue).
Table 1
Table 2
Any help you can provide would be hugely appreciated!!

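Put this formula in C2 of the first table: =IF(AND(COUNTIF('Second Table'!$B:$B,'First Table'!$B2)>0,$A2<=DATE(2016,4,2),SUMIF($B$1:$B1,B2,$C$1:$C1)=0),1,""). Then auto-fill it down and sum column C at the bottom to get the count. The <= keeps rows dated exactly 4/2/2016, as the question asks for "on or before", and the SUMIF part stops a name from being counted again once a row above has already flagged it.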
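To make the three conditions in that formula easier to see, here is a small Python model of the flag column. The tables are invented for illustration (the originals were images), and the cutoff is read as 2 April 2016 with "on or before" as the question states.

```python
from datetime import date

# Hypothetical data: Table 1 as (date, name) rows, Table 2 as a set of names.
table1 = [
    (date(2016, 3, 28), "Bob"),
    (date(2016, 3, 30), "Jim"),
    (date(2016, 4, 1), "Bob"),   # repeat of Bob, must not be counted again
    (date(2016, 4, 2), "Sue"),   # exactly on the cutoff
    (date(2016, 4, 5), "Ann"),   # after the cutoff
    (date(2016, 3, 29), "Tom"),  # not present in Table 2
]
table2_names = {"Bob", "Jim", "Sue", "Ann"}
cutoff = date(2016, 4, 2)

already_counted = set()
flags = []
for row_date, name in table1:
    in_table2 = name in table2_names        # COUNTIF(...) > 0
    in_time = row_date <= cutoff            # date on or before the cutoff
    is_new = name not in already_counted    # SUMIF over the rows above = 0
    flag = 1 if (in_table2 and in_time and is_new) else 0
    flags.append(flag)
    if flag:
        already_counted.add(name)

print(sum(flags), sorted(already_counted))
```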

how to calculate the means of 100s of subgroups in excel

I have a spreadsheet with ~8000 records. There are ~400 unique identifiers (i.e. element 101, 102, 103, ..., 500) that I need to calculate means for. Is there a simple way to calculate means on large datasets like this? Or will I have to do =AVERAGE('select column block') for each subgroup/unique identifier?
Many Thanks

Use the following formula
=AVERAGEIF($A$1:$A$8000,"=IDNUMBER",$B$1:$B$8000)
Where
Column A is your column of ID numbers
Column B is your list that you need the mean from.
If your ID numbers are sequential, you can set up something like:
=AVERAGEIF($A$1:$A$8000,"="&100+ROW(A1),$B$1:$B$8000)
And copy that down from, say, C1 to C400, which covers IDs 101 to 500.
Alternatively you could make a list of the unique identifiers with another formula and place that unique list in C1 to C500 and then in column D use the following:
=AVERAGEIF($A$1:$A$8000,C1,$B$1:$B$8000)
If you have a header row you will need to adjust your ranges accordingly
The formula to generate a unique list of IDs is:
=INDEX($A$2:$A$8001,MATCH(0,INDEX(COUNTIF($C$1:C1,$A$2:$A$8001),0,0),0))
Use that in column C but in row 2 and copy down. So if your data starts in row 1 you will want to bump it down 1 row.
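For comparison, the same "one mean per identifier" result can be sketched in a few lines of Python. The records below are made-up sample data, not the asker's; the point is only to show what the column of AVERAGEIF formulas computes.

```python
from collections import defaultdict

# Hypothetical (identifier, value) records standing in for columns A and B.
records = [(101, 2.0), (102, 5.0), (101, 4.0), (103, 1.0), (102, 7.0)]

totals = defaultdict(float)
counts = defaultdict(int)
for identifier, value in records:
    totals[identifier] += value
    counts[identifier] += 1

# One mean per unique identifier, like AVERAGEIF evaluated once per ID.
means = {identifier: totals[identifier] / counts[identifier] for identifier in totals}
print(means)
```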

Create a pivot table with the unique identifiers in the rows and calculate the average of the values.

For data that is already grouped nicely and ready to be handed off for a visual review of the averages, try creating a Subtotal:
Select your data
Go to Data > Subtotal (far right on the tab)
On the menu popup in the At each change in field, select the column header name that corresponds to your unique identifier.
Select Average for Use function. Select the checkbox of the column for which you want to find the group's mean.
Select other formatting features if desired (defaults typically work best)
Click OK.
Take a sip of coffee and let the magic happen.
