How to calculate the means of 100s of subgroups in Excel

I have a spreadsheet with ~8000 records and ~400 unique identifiers (i.e. element 101, 102, 103, ..., 500) that I need to calculate means for. Is there a simple way to calculate means on large datasets like this? Or will I have to do =average('select column block') for each subgroup/unique identifier?
Many Thanks

Use the following formula
=AVERAGEIF($A$1:$A$8000,"=IDNUMBER",$B$1:$B$8000)
Where
Column A is your column of ID numbers
Column B is your list that you need the mean from.
If your ID numbers are sequential, you can set up something like:
=AVERAGEIF($A$1:$A$8000,"="&100+ROW(A1),$B$1:$B$8000)
And copy that down from, say, C1 to C400, which covers IDs 101 through 500.
Alternatively you could make a list of the unique identifiers with another formula and place that unique list in C1 to C500 and then in column D use the following:
=AVERAGEIF($A$1:$A$8000,C1,$B$1:$B$8000)
If you have a header row you will need to adjust your ranges accordingly
The formula to generate a unique list of IDs is:
=INDEX($A$2:$A$8001,MATCH(0,INDEX(COUNTIF($C$1:C1,$A$2:$A$8001),0,0),0))
Enter that in C2 and copy down; the COUNTIF range $C$1:C1 needs a cell above the first result. The formula expects the data in A2:A8001, so if your data starts in row 1 you will want to bump it down one row.
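If you want to sanity-check the AVERAGEIF results outside Excel, the same per-ID mean can be sketched in Python. This is only an illustration, not part of the Excel solution: the function name group_means and the sample rows are made up, and it assumes the ID/value pairs have been exported from columns A and B.

```python
from collections import defaultdict


def group_means(rows):
    """Return {id: mean of values} for an iterable of (id, value) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ident, value in rows:
        totals[ident] += value
        counts[ident] += 1
    return {ident: totals[ident] / counts[ident] for ident in totals}


# Hypothetical sample standing in for columns A (ID) and B (value).
sample = [(101, 2.0), (102, 5.0), (101, 4.0), (102, 7.0), (103, 1.0)]
means = group_means(sample)
```

Each entry of means corresponds to one AVERAGEIF cell: one pass over the data, one result per unique ID.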

Create a pivot table with the unique identifiers in the rows and calculate the average of the values.

For data that is already sorted so that each identifier's rows sit together, and that you want ready for a visual review of the averages, try creating a Subtotal:
Select your data
Go to Data > Subtotal (far right on the tab)
In the dialog that pops up, under "At each change in", select the column header name that corresponds to your unique identifier.
Under "Use function", select Average. Under "Add subtotal to", tick the checkbox of the column for which you want the group's mean.
Select other formatting features if desired (defaults typically work best)
Click OK.
Take a sip of coffee and let the magic happen.
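To see why the data needs to be grouped first, here is a rough Python sketch of the "at each change in" behaviour. It is an analogy rather than what Excel runs internally; subtotal_averages and the sample rows are hypothetical names and data.

```python
from itertools import groupby
from statistics import mean


def subtotal_averages(rows):
    """Average the values of each consecutive run of identical IDs."""
    return [
        (ident, mean(value for _, value in run))
        for ident, run in groupby(rows, key=lambda row: row[0])
    ]


# Hypothetical (ID, value) rows; ID 101 is split across two runs.
rows = [(101, 2.0), (101, 4.0), (102, 5.0), (101, 6.0)]

# Unsorted: a new subtotal starts at every change, so 101 shows up twice.
unsorted_result = subtotal_averages(rows)

# Sorted by ID first: one subtotal per identifier.
sorted_result = subtotal_averages(sorted(rows, key=lambda row: row[0]))
```

With unsorted input the same ID gets several partial averages, which is the situation sorting before Subtotal avoids.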

Related

Excel: Is it possible to have a running two column list on one page and have it transfer to a secondary tab?

I have a database tab which holds all the information for each item in my inventory. It is always changing as I add new inventory. I want to take certain information from that tab (columns A & B) and use it in columns A & B of a separate tab called stock inventory. When it transfers, I would like blank cells in columns C, D, E, and F. I did try VLOOKUP; however, because values in column A on the database tab are repeated, the stock inventory tab ended up with duplicated information instead of the actual information. For example, column A may say "Scale" and column B may say "abalone"; there may be a second entry with the same information in column A but different information in column B. When it transfers to the stock inventory tab, any time it recognizes "Scale" in column A it says "abalone". This is not what I want. I want to bring over the actual information for column B even if column A is the same. Any help is appreciated.
To solve the problem of the repeated value in the VLOOKUP, you could create a helper column to the left of the first column of the VLOOKUP table, with the formula =ROW()&A2, and use this value as the key to be searched with VLOOKUP instead of the name alone. There will be no duplicates, as the value is the concatenation of the row number and the name of the article.
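The helper-column idea can be sketched in Python to show why the keys stop colliding. The function name add_row_keys and the second and third sample rows are invented for illustration; the key is built the same way as =ROW()&A2, assuming the data starts in row 2.

```python
def add_row_keys(rows, first_row=2):
    """Map a 'row number + name' key to each detail, like =ROW()&A2."""
    return {
        f"{row_number}{name}": detail
        for row_number, (name, detail) in enumerate(rows, start=first_row)
    }


# Hypothetical inventory: "Scale" appears twice with different details.
inventory = [("Scale", "abalone"), ("Scale", "pearl"), ("Knife", "steel")]
keyed = add_row_keys(inventory)
```

Looking up "2Scale" and "3Scale" now returns two different details, where a lookup on "Scale" alone would always stop at the first match.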

Restructuring data in excel

I am trying to condense data in a specific way. I want any occurrences of the number 1 in each column to show up as 1 (regardless of how many times it occurs) with the corresponding site, in the corresponding column. Some sites occur multiple times in the original data, and I want only one row for each unique site in the resulting data table, with a 1 in the corresponding column if there are any 1's in that column in the original data.
I would think it would be a vlookup function, but I have tried many different things and I am really stuck on this.
Image of original data and what I am trying to do:
Thank you
This assumes that your data set only contains 1 or blank and this approach uses a Pivot Table with MAX function. Below are details in case anyone doesn't know Pivot Tables.
Select a cell in your data and insert Pivot Table. Note, I added a title for column A, as you need that in the Pivot Table.
Click in the created Pivot Table and the PivotTable Fields dialog should pop up. If not, right click in Pivot Table and select Show Field List.
Drag the field names (Code, a, b, & c) down to the appropriate blocks below. (Values under Columns will be created for you.)
Click on the drop-down arrow next to each field name and select Max. That will rename it to "Max of ...". If that bothers you, then you can type the name you want into the Custom Name field. Note, it will not let you type the same name as the field name, e.g. a, but it will work if you put a space in front of it.
Given that the Pivot Table would be a lot of work for a large number of columns, here is a formula based approach. Put this formula in cell G2, then drag it down and across to fill your new table.
Note, you will have to populate all codes that you have in column F. And if any new codes are added later you will have to keep this updated. One of the advantages of a Pivot Table is that it will do this for you.
I know that you won't be putting this in these cells, so adjust accordingly. In fact, I would recommend this be in another sheet.
=IF(COUNTIFS($A:$A,$F2,B:B,1)>0,1,0)
COUNTIFS($A:$A,$F2,B:B,1)
This will count each occurrence when the value in column A matches your code $F2 AND the value in column B equals 1.
If that count is >0, then you know that at least one match was found and the IF will return 1, otherwise 0.
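For readers who think more easily in code, the same "any 1 for this code in this column" logic can be sketched in Python. The function name condense, the site codes and the three-column layout are assumptions made for the example, with None standing in for blank cells.

```python
def condense(rows, n_columns):
    """rows: iterable of (code, flags); return {code: [0 or 1 per column]}."""
    result = {}
    for code, flags in rows:
        merged = result.setdefault(code, [0] * n_columns)
        for index, flag in enumerate(flags):
            if flag == 1:  # same test as COUNTIFS(..., 1) > 0
                merged[index] = 1
    return result


# Hypothetical data: site codes with columns a, b, c (None means blank).
data = [
    ("S1", [1, None, None]),
    ("S2", [None, None, 1]),
    ("S1", [1, 1, None]),
]
table = condense(data, 3)
```

Unlike the formula approach, the list of unique codes is built on the fly here, which is also what the Pivot Table does for you.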

Count values for each row with a unique ID

I have a bunch of rows in a table. Each row reflects an event in a patient. However, one patient can have experienced multiple events, so it's possible for there to be multiple rows with the same patient number. Now I'd like to count the number of male patients in my database, counting each patient only once even if they had multiple events. Each patient is identified by a unique patient ID that could be used for this.
This shouldn't be all that complicated if not for the fact that I'm using a table that also has several filters, so I need to use SUBTOTAL for any counting functions.
I literally have no idea where to start, so I can't really provide any code...
Any function that could point me in the right direction would be greatly appreciated.
Thanks for the help.
~Laurens
Use a Pivot Table to filter and count the patients in your database. Select your data and select Insert -> Tables -> Pivot Tables. Put your filters in the Filter section of the table and the Patient ID in the Rows section. Then, you can use COUNT to get the number of patients.
For more information about Pivot Tables, you can check this: https://support.office.com/en-us/article/Create-a-PivotTable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576
To get the number of unique IDs in the same column, if the IDs are numeric, you can use SUM with FREQUENCY:
=SUM(IF(FREQUENCY($A$1:$A$1000,$A$1:$A$1000)>0,1))
If they're text and numbers mixed, you can get unique IDs with this one:
=SUM(IF(FREQUENCY(MATCH($A$1:$A$1000,$A$1:$A$1000,0),MATCH($A$1:$A$1000,$A$1:$A$1000,0))>0,1))
(From here)
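As a cross-check on what those two formulas should return, a distinct count is a one-liner in Python. This is just a sketch with a made-up function name and made-up IDs; it skips blanks, as the FREQUENCY version does.

```python
def count_unique(ids):
    """Count distinct non-blank IDs (numbers and text may be mixed)."""
    return len({ident for ident in ids if ident is not None and ident != ""})


# Hypothetical patient IDs with repeats and one blank cell.
patient_ids = [7, 7, 8, 9, 8, "", "P10", "P10"]
unique_count = count_unique(patient_ids)
```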
Here you go
You've not mentioned whether an event is optional.
You might want to add extra column H with formula like h2=if(c2="",0,1) with 1/0 and multiply it as well in G.
Basically if column G contains a 1 you include it
Here's what the results of the formula look like:
Revision
The table is sorted by patient ID.
On each change of patient ID, column H contains a 1; otherwise it is 0.
So H2 is hard coded to 1, H3,H4,H6 will evaluate to 1.
So now G2=H2*E2 etc. You can filter by column H.
The beauty of mapping things into binary zeros and ones is you can do multiplication to achieve a logical AND result, whilst at the same time breaking a complex task into a series of steps. You can then apply a filter to the data to get the rows where column G is not zero, and see the total count. Normally I'd insert a row between the header and the data on row 2 and then have G2=SUM(G3:G9)
Sum column H for number of patients.
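Here is a rough Python sketch of the flag-and-multiply steps, assuming the rows are already sorted by patient ID. The names first_occurrence_flags and criterion are hypothetical; criterion stands in for whatever 1/0 column (column E in the formula above) marks the rows you care about.

```python
def first_occurrence_flags(sorted_ids):
    """Return 1 where the ID differs from the row above, else 0 (column H)."""
    flags = []
    previous = object()  # sentinel that never equals a real ID
    for ident in sorted_ids:
        flags.append(1 if ident != previous else 0)
        previous = ident
    return flags


# Hypothetical sorted patient IDs and a 1/0 criterion column.
ids = [11, 11, 12, 13, 13, 13]
criterion = [1, 1, 0, 1, 1, 1]

h = first_occurrence_flags(ids)
g = [flag * value for flag, value in zip(h, criterion)]  # logical AND
total_patients = sum(h)
matching_patients = sum(g)
```

Multiplying the two 1/0 columns keeps only rows that are both the first row for a patient and meet the criterion, so summing the product counts each matching patient once.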

Trying to add up values but have multiple entries

I am trying to look up the value in one column and pull the number from another column.
Of course, I could use the simple V-lookup or Match.
However, the first column of data has multiple entries that are the same. If I Vlookup it is just going to pull the first number in the second column.
I need to pull each number from the second column and somehow add them together. Despite the fact I have multiple entries.
If there is a way to consolidate the multiple entries in 1st column while also summing up the numbers in the 2nd, that would be great.
I would recommend a Pivot Table. To create one, select a cell in your data range (which needs to have column names in the first row). Choose Insert / Pivot Table from the Ribbon and select the New Worksheet option for the location.
In the Pivot Table list on the new worksheet, drag the name of the first column to the Row Labels box and the name of the second column to the Values box. The name in the Values box should turn to Sum of <2nd column name>.
The Pivot Table will now show a sorted list of the column 1 values and the summed values of column 2. In the example, you'll see that
Does SUMIF do what you are looking for?
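Whichever route you take, the result you are after is one total per distinct entry. A small Python sketch of that consolidation, with an invented function name and invented sample rows, looks like this:

```python
from collections import defaultdict


def sum_by_key(rows):
    """Return sorted (key, total) pairs for an iterable of (key, amount)."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return sorted(totals.items())


# Hypothetical two-column data with repeated entries in the first column.
entries = [("pears", 3.0), ("apples", 2.0), ("pears", 4.5), ("apples", 1.0)]
summary = sum_by_key(entries)
```

The sorted output mirrors the Pivot Table layout: each first-column value appears once, next to the sum of its second-column values.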

Excel function advanced filter

I have a list of sales people and a list of their sale revenues in two separate columns. How do I use an advanced filter or other sorting means to find the max of the sale revenue column and then have the formula output be the corresponding sales person?
Referencing this page:
http://www.techonthenet.com/excel/formulas/max.php
In this example, assuming column A was your list of salespeople and column B was your list of sales revenues...
=Max(B2:B6)
in any empty cell would return the highest sales revenue from column B.
I figured it out.
You first need to set a criterion. In the link's example, you would first (off to the side) make a small column with a title of Value, then right under that put the function =Max(B2:B6). Then click the "Advanced Filter" button. The data range would be the entire database, A1:B6. The criteria range is the new two-row column you just made, containing Value and the Max formula. Then select an output range big enough to hold your filtered data; in this case, a 2x2 grid will be enough. (Make sure to select the option to copy the results to another location at the top of the dialog.)
The resulting filter will be the Date of the Max Value.
This is my first answer post on this site, so please let me know if I formatted it wrong.
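For comparison, the "find the max, then return the matching row" logic can be sketched in Python. The function name top_sellers and the sample names and figures are made up; like the filter, it returns every row that ties for the maximum.

```python
def top_sellers(rows):
    """Return the names whose revenue equals the maximum revenue."""
    best = max(revenue for _, revenue in rows)
    return [name for name, revenue in rows if revenue == best]


# Hypothetical (salesperson, revenue) rows standing in for columns A and B.
sales = [("Ann", 1200), ("Bob", 3400), ("Cy", 3400), ("Dee", 900)]
winners = top_sellers(sales)
```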
