Spreadsheet aggregation/manipulation - excel

I have a spreadsheet structured like
2005 Alameda total HS graduates 1234
2005 Alameda UC enrollees 112
2006 Alameda total HS graduates 892
2006 Alameda UC enrollees 84
...
2009 Yolo total HS graduates 1300
2009 Yolo UC enrollees 93
and so on for every CA county for several years.
I want to generate a spreadsheet like this:
county 2005 2006 ...
Alameda 11.1% 9%
Alpine 7% 8%
...
Yolo 5.5% 4%
i.e. I want to project the years from rows to columns and have a row for each county, then divide the number of graduates (the data from each odd-numbered row in the original sheet) by the number of UC enrollees (even-row data) for each year, and insert it in the appropriate cell.
This would be easy enough for me to do in Java, but I want to get a feel for what's possible just using excel/Google sheets alone - how might I go about accomplishing this?

Assuming the counties are sorted, and they start in cell B2, enter =B2 in cell F2, and enter the following in F3:
=INDIRECT("B"&COUNTIF(B3:B$9999,"<="&F2)+ROW())
You can change 9999 based on the number of records, but it's fine as-is.
Copy F3 down as many rows as are needed.
You can then calculate percentages using SUMPRODUCT:
=IFERROR(
SUMPRODUCT(($A$2:$A$100=G$1)*
($B$2:$B$100=$F2)*
($C$2:$C$100="UC enrollees")*
$D$2:$D$100
)
/
SUMPRODUCT(($A$2:$A$100=G$1)*
($B$2:$B$100=$F2)*
($C$2:$C$100="total HS graduates")*
$D$2:$D$100
),
"")
The first SUMPRODUCT totals UC enrollees that match the year and county. The second SUMPRODUCT does the same for HS graduates. The results are divided, and IFERROR handles divide-by-zero errors for missing data.
Since your example shows percentages, I assume you want to divide UC enrollees by HS graduates, and not the other way around. Either way, I don't get the same totals as you, so let me know if I misunderstood.
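Since the question mentions doing this in code, here is a rough Python sketch of the same calculation for comparison. It is only an illustration under assumptions: the function name and the tuple layout (year, county, measure, value) are made up here, and it divides UC enrollees by HS graduates as the formula above does.

```python
from collections import defaultdict

# Sample rows in the question's layout: (year, county, measure, value).
rows = [
    (2005, "Alameda", "total HS graduates", 1234),
    (2005, "Alameda", "UC enrollees", 112),
    (2006, "Alameda", "total HS graduates", 892),
    (2006, "Alameda", "UC enrollees", 84),
    (2009, "Yolo", "total HS graduates", 1300),
    (2009, "Yolo", "UC enrollees", 93),
]

def enrollment_rates(rows):
    """Return {county: {year: enrollees / graduates}}, skipping missing data."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, county, measure, value in rows:
        # Same role as the two SUMPRODUCT calls: total each measure per county/year.
        totals[(county, year)][measure] += value
    table = defaultdict(dict)
    for (county, year), measures in totals.items():
        graduates = measures.get("total HS graduates", 0)
        if graduates:  # same role as IFERROR: no cell when there are no graduates
            table[county][year] = measures.get("UC enrollees", 0) / graduates
    return table

table = enrollment_rates(rows)
for county in sorted(table):
    print(county, {year: f"{rate:.1%}" for year, rate in sorted(table[county].items())})
```

Each county becomes a row and each year a column, which is the same projection the formula grid performs.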

Here is the pivot table way of doing it for comparison.
There are many ways of doing this, but I've added column headers and used this formula to put percentages in the even rows of column E and zeroes in the odd rows in sheet 1:
=IF(ISEVEN(ROW()),D3/D2*100,0)
Then I've inserted a pivot table in sheet 2 referring to my data in sheet 1 and set up the fields as shown, and it's pretty automatic.


Is there a way to distribute data according to a logic in Excel vba?

I have an Excel sheet with the below data.
There are 10,000 Data rows.
9000 are of "USA" & 1000 are of "Other" country.
I want to distribute the data evenly so that every 9 "USA" rows are followed by 1 "Other" row throughout the sheet.
Name Country
Alice USA
Brook Other
Cathy USA
David USA
Esther Other
Freddy USA
Galin USA
Henry Other
Indigo USA
Jenny USA
Kalin Other
Linda USA
How do I accomplish this manually and with Excel VBA? I'd appreciate both solutions. Thanks
This can be achieved with a formula if you have the newest version of Excel.
Try something like (adapt ranges and what you are filtering on as necessary):
=LET(x, FILTER($B$1:$C$12, $C$1:$C$12="a"),
y, FILTER($B$1:$C$12, $C$1:$C$12="b"),
z, ROW(D1:D12), myrows, MAX(z),
ratio, MAX((COUNTA(x)/2)/(COUNTA(y)/2), (COUNTA(y)/2)/(COUNTA(x)/2))+1,
IF(MOD(z,ratio)<>0,
INDEX(x, IF(MOD(SEQUENCE(myrows),ratio)=0, 0, SEQUENCE(myrows)-CEILING(ROW(G1:G12)/ratio-1,1)), SEQUENCE(1,2)),
INDEX(y, IF(MOD(SEQUENCE(myrows),ratio)<>0,0,SEQUENCE(myrows)/ratio), SEQUENCE(1,2))))
The trick is to create the "correct" sequence for each result. For the first array you want to skip every nth row (in your case the 10th) and have row n+1 pick up item n rather than item n+1. For the second array you want to skip every row that isn't a multiple of n and have the nth rows count sequentially.
One caveat: as is, I don't believe the formula will work with a repetition other than 1, i.e. if you want something like 8 rows followed by 2 rows, this won't work.
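The index arithmetic behind that trick may be easier to follow outside the formula. Below is a small Python sketch of the same idea, not a translation of the LET formula: the function name is made up, `ratio` plays the role of n (10 in the question), and it assumes the larger list holds exactly `ratio - 1` items for every item in the smaller one.

```python
def interleave(major, minor, ratio):
    """Put one `minor` item at every `ratio`-th position and `major` items elsewhere.

    Assumes len(major) == (ratio - 1) * len(minor).
    """
    out = []
    for pos in range(1, len(major) + len(minor) + 1):  # 1-based, like ROW()
        if pos % ratio == 0:
            # Every ratio-th row comes from the smaller list, counted sequentially.
            out.append(minor[pos // ratio - 1])
        else:
            # Subtract the rows already given to the smaller list so no item is skipped.
            skipped = (pos - 1) // ratio
            out.append(major[pos - skipped - 1])
    return out

usa = [f"U{i}" for i in range(1, 19)]   # 18 "USA" rows
other = ["O1", "O2"]                     # 2 "Other" rows
result = interleave(usa, other, 10)
print(result)
```

Like the formula, this sketch only handles one row from the smaller group per cycle.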
This works even with older Excel versions:
Assuming your data is laid out as in the question (names in column A, countries in column B, headers in row 1), add a Sort column with the following formula in C2 and pull it down:
=IF(B2="USA",COUNTIF($B$2:B2,"USA")+INT((COUNTIF($B$2:B2,"USA")-1)/ROUNDUP(COUNTIF(B:B,"USA")/(COUNTA(B:B)-COUNTIF(B:B,"USA")),0)),COUNTIF($B$2:B2,"Other")*(ROUNDUP(COUNTIF(B:B,"USA")/(COUNTA(B:B)-COUNTIF(B:B,"USA")),0)+1))
Then sort by column C, and the USA and Other rows are evenly spread.
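To see what the Sort column is doing, here is a hedged Python sketch of the same key calculation. The function names are illustrative only, and it derives the group size from the exact counts rather than reproducing the COUNTA(B:B) arithmetic on whole columns. It also assumes at least one non-USA row exists.

```python
import math

def spread_keys(countries, major="USA"):
    """Compute a sort key per row, mirroring the idea of the Sort-column formula."""
    n_major = sum(1 for c in countries if c == major)
    n_minor = len(countries) - n_major
    ratio = math.ceil(n_major / n_minor)  # "USA" rows per "Other" row, rounded up
    keys = []
    seen_major = seen_minor = 0
    for country in countries:
        if country == major:
            seen_major += 1
            # Leave a gap after every `ratio` USA rows for an Other row to slot into.
            keys.append(seen_major + (seen_major - 1) // ratio)
        else:
            seen_minor += 1
            # Other rows land exactly in those gaps.
            keys.append(seen_minor * (ratio + 1))
    return keys

def spread(rows, major="USA"):
    """Sort (name, country) rows by the computed key."""
    keys = spread_keys([country for _, country in rows], major)
    return [row for _, row in sorted(zip(keys, rows), key=lambda pair: pair[0])]

rows = [(f"u{i}", "USA") for i in range(18)] + [("o0", "Other"), ("o1", "Other")]
ordered = spread(rows)
print([country for _, country in ordered])
```

With 18 USA rows and 2 Other rows, the Other rows end up in positions 10 and 20.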

Merge multiple rows based on column & sum time values in Excel for MAC

I have a report pulled from KRONOS daily and emailed to me which has 4 columns. The names are down the first column, employee ID, type of pay, hours to pay in that order. I just need to combine the hours so that there is one value per row and just one name. I tried using an easy pivot table but that failed for 2 reasons.
I couldn't figure out how to sum duration of hours in 00:00 format (tried every single option in field settings); I either got N/A or the count of instances.
Even if it worked, I would like to figure out how to exclude the 1.00 hour penalty from the type of pay column indicated by "LP CA Meal Penalty"
I tried to paste the actual tables here but it wouldn't let me, so below is my best attempt to replicate the issue for one employee.
Curt, Cathy 90066408 LP CA Meal Penalty 1.00
Curt, Cathy 90066408 LP Overtime 1.77
Curt, Cathy 90066408 LP Regular 8.00
Result desired:
Curt, Cathy 90066408 = 9.77
Do Not want: Curt, Cathy 90066408 = 10.77
Assuming your data is in Column A:D and result needs to be displayed from Column G,
To get all the unique Names from Column A enter the following formula in Cell G2
=IFERROR(INDEX($A$2:$A$9,MATCH(0,INDEX(COUNTIF($G$1:G1,$A$2:$A$9),0,0),0)),"")
Drag/Copy down as required. Change range $A$2:$A$9 as per data in Column A.
Now, to get corresponding Employee ID, enter the following formula in Cell H2
=VLOOKUP(G2,$A$2:$D$9,2,FALSE)
or
=INDEX($B$2:$B$9,MATCH(G2,$A$2:$A$9,0))
Finally, to get Total Hours, enter the below formula in Cell I2
=SUMIFS($D$2:$D$9,$A$2:$A$9,G2,$C$2:$C$9,"<>LP CA Meal Penalty")
or
=SUMPRODUCT(($D$2:$D$9)*($A$2:$A$9=G2)*($C$2:$C$9<>"LP CA Meal Penalty"))
See image for reference.
EDIT: To also exclude "LP CA Rest Break Penalty", add a second condition:
=SUMIFS($D$2:$D$9,$A$2:$A$9,G2,$C$2:$C$9,"<>LP CA Meal Penalty",$C$2:$C$9,"<>LP CA Rest Break Penalty")
or
=SUMPRODUCT(($D$2:$D$9)*($A$2:$A$9=G2)*($C$2:$C$9<>"LP CA Meal Penalty")*($C$2:$C$9<>"LP CA Rest Break Penalty"))
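If the report is ever processed outside Excel, the same grouping can be sketched in a few lines of Python. This is only an illustration: the tuple layout and the function name are assumptions, and the excluded pay types are the two named in the formulas above.

```python
from collections import defaultdict

# Pay types left out of the total, as in the SUMIFS criteria above.
EXCLUDED = {"LP CA Meal Penalty", "LP CA Rest Break Penalty"}

# Sample rows in the report's column order: name, employee ID, type of pay, hours.
report = [
    ("Curt, Cathy", "90066408", "LP CA Meal Penalty", 1.00),
    ("Curt, Cathy", "90066408", "LP Overtime", 1.77),
    ("Curt, Cathy", "90066408", "LP Regular", 8.00),
]

def payable_hours(report, excluded=EXCLUDED):
    """Sum hours per (name, employee ID), ignoring the excluded pay types."""
    totals = defaultdict(float)
    for name, employee_id, pay_type, hours in report:
        if pay_type not in excluded:
            totals[(name, employee_id)] += hours
    return dict(totals)

for (name, employee_id), hours in payable_hours(report).items():
    print(f"{name} {employee_id} = {hours:.2f}")
```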
Seems like all you need to do is filter by your third column and exclude in the filter the "LP CA Meal Penalty".
Attached are examples of my result and my filter selection on my pivot table.
Result gotten
Filter excluding the unwanted work hour type from the sum
Let me know if this works for you.
You can also use SUMIFS. The formula to use is:
=SUMIFS($D$2:$D$9,$A$2:$A$9,F2,$C$2:$C$9,"<>"&"LP CA Meal Penalty")
This will exclude LP CA Meal Penalty from the computation.
Also, the requirement for the time format is not very clear, but I understand that you want hh:mm. The formula you can use is:
=TEXT(H2/24,"hh:mm")
Please tweak the range that fits your data structure and let me know if this is what you are asking for.
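For reference, the decimal-hours to hh:mm conversion amounts to the arithmetic below. This is a Python sketch with a made-up function name. It drops leftover seconds instead of rounding, and unlike the "hh:mm" format it does not wrap at 24 hours, so treat it as an approximation of the display rather than an exact replica.

```python
def to_hhmm(decimal_hours):
    """Convert decimal hours (e.g. 9.77) to an "hh:mm" string."""
    total_minutes = int(decimal_hours * 60)  # drop any leftover seconds
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

print(to_hhmm(9.77))
print(to_hhmm(8.0))
```

So 9.77 hours is 9 hours and 46 minutes (plus 12 seconds that the format does not show).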

Excel: how to NOT sum based on text values?

I am trying to sum values based on another text column.
Let's say my data look like this:
date item amount
4/3/03 book 100
8/3/05 rent 1090
5/6/06 food 5
2/7/09 repair 390
8/3/10 rent 1090
So I want to sum all the spending (amount) where the "item" column is NOT equal to "rent". I don't want a grand total, though, just a running sum up to that date.
So the desired output (last column, subtotal) should look something like this:
date item amount subtotal
4/3/03 book 100 100
8/3/05 rent 1090 100
5/6/06 food 5 105
2/7/09 repair 390 495
8/3/10 rent 1090 495
I've tried to sum it up while filtering the rows to show anything but rent, but when I clear the filter, all the numbers are summed INCLUDING rent.
I've also tried using SUMIF (I named the top cell in "Amount" column as "first_amount"):
=SUMIF(C2,"rent",first_amount:E2)
But I don't think I'm using it correctly, or maybe it just doesn't work. It shows no value whatsoever.
I found this post and read through the function pages, but still am not being able to do what I wanted to do:
Excel summing values based on multiple conditions
BONUS:
What if I want to exclude both "rent" and "food"?
I'm sure there's a very simple solution out there that I am just too dumb to think of. Any hint/help is truly appreciated!
Assuming A1 is the first cell and the headers are in row 1, enter this in D2 and copy it down:
=SUMIF($B$2:B2, "<>rent", $C$2:C2)
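The expanding range ($B$2:B2) is what makes this a running subtotal. The same logic can be sketched in Python, using the question's sample data. The function name is made up for illustration, and the bonus case (excluding both "rent" and "food") is the same idea with a larger exclusion set.

```python
from itertools import accumulate

# Sample data from the question.
items = ["book", "rent", "food", "repair", "rent"]
amounts = [100, 1090, 5, 390, 1090]

def running_subtotal(items, amounts, excluded):
    """Running sum of amounts, counting excluded items as zero."""
    counted = (0 if item in excluded else amount
               for item, amount in zip(items, amounts))
    return list(accumulate(counted))

print(running_subtotal(items, amounts, {"rent"}))          # [100, 100, 105, 495, 495]
print(running_subtotal(items, amounts, {"rent", "food"}))  # [100, 100, 100, 490, 490]
```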

Lookup and replace specific values with reference in pivot table

If I'm trying to find specific names in a list given from my pivot table such as -
Row Labels Revenues Order #
Panera 25 0
Pasta 15
Salad 10
Olive Garden 40 0
Sandwich 20
Pasta 20
Panda Express 30 0
Rice 15
Chicken 15
And I want to search through my document, find Olive Garden and Panda Express and I wanted to replace the 0 in the order # column with 10 for Olive Garden and 20 for Panda Express. Currently, someone here helped me out with
=IF(IFERROR(VLOOKUP(A9,worksheet!K:K,1,FALSE),"")="","",0)
which inserts 0's for the headers and blanks for the orders in the 'Order #' column, can I add a second formula that would find the names and replace the value in that column? Or adjust the current formula?
Quick note - order # column is not from the pivot table.
To make it more clear, - I am getting data from an external source (i.e. paper invoices), as opposed to making a manual entry to adjust the 0's in the order # column, I would like to tell VBA/Excel - "hey Olive Garden's order number is 10 and Panda Express's order number changed to 20, adjust".
this is my end goal -
Row Labels Revenues Order #
Panera 25 0
Pasta 15
Salad 10
Olive Garden 40 10
Sandwich 20
Pasta 20
Panda Express 30 20
Rice 15
Chicken 15
If you have a range with the restaurant names in one column and the order numbers in the next column (say columns X and Y of the sheet called "worksheet"), you could change your formula to be
=IF(IFERROR(MATCH(A9,worksheet!K:K,0),"")="","",IFERROR(VLOOKUP(A9,worksheet!X:Y,2,FALSE),0))
(P.S. Changed the original VLOOKUP to MATCH based on useful feedback from teylyn.)
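The decision the formula makes for each row can be written out as a short Python sketch. The names here are hypothetical: `restaurants` stands in for the list in worksheet!K:K and `order_numbers` for the X:Y lookup range.

```python
def order_column(row_labels, restaurants, order_numbers):
    """Build the 'Order #' column for a list of pivot-table row labels."""
    column = []
    for label in row_labels:
        if label not in restaurants:
            # MATCH fails: this is an item row, so the cell stays blank.
            column.append("")
        else:
            # Restaurant header row: use the looked-up number, or 0 if none is listed.
            column.append(order_numbers.get(label, 0))
    return column

labels = ["Panera", "Pasta", "Salad", "Olive Garden", "Sandwich",
          "Pasta", "Panda Express", "Rice", "Chicken"]
restaurants = {"Panera", "Olive Garden", "Panda Express"}
order_numbers = {"Olive Garden": 10, "Panda Express": 20}
print(order_column(labels, restaurants, order_numbers))
```

Updating an order number then means editing the lookup range, not the formula.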
FWIW, that formula would be better using MATCH, not VLookup, since it's returning the value from the first column.
But back on track: what are you trying to achieve? Change the values in a pivot table?
First, a formula cannot change values in another cell.
Second, a pivot table reports on existing data. You can't change the numbers that a pivot table reports.
You will need to re-think your approach. If you don't like the numbers the pivot table returns, you'll need to change the underlying source data.

Conditional averages (AVERAGEIF, AVERAGEIFS, or other option?)

Perhaps it's just been a long week, but I can't think of how to get a pretty simple average.
Here's my data (two columns):
1/3/1994 1165
1/4/1994 1162
1/5/1994 1133
1/6/1994 1133
1/7/1994 1138
1/10/1994 1143
1/11/1994 1118
1/12/1994 1150
1/13/1994 1171
1/14/1994 1177
1/17/1994 1161
1/18/1994 1162
1/19/1994 1121
1/20/1994 1112
1/21/1994 1129
1/24/1994 1136
1/25/1994 1124
1/26/1994 1118
1/27/1994 1127
1/28/1994 1133
1/31/1994 1088
2/1/1994 1055
2/2/1994 1051
2/3/1994 1071
2/4/1994 1079
2/7/1994 1054
2/8/1994 1079
2/9/1994 1079
2/10/1994 1089
2/11/1994 1074
2/14/1994 1083
2/15/1994 1068
2/16/1994 1075
2/17/1994 1071
As you can see, it's a column of dates (that continue until Sept. 9 2015, so it's long), and another of prices. I am just trying to get the average for each month of each year (i.e. Jan 1994, 1995, 1996 ... 2015, then Feb 1994, etc).
Here's the table I plan on using the formula in:
2007 2008 2009 2010 2011
January
February
March
April
So, in the cell right of "January" and below "2007", I want the average of prices that are in Jan, 2007.
I tried using this (again, my data starts in A1 and B1):
=AverageIfs(B:B,year(A:A),1994,month(A:A),1) (regular and as array), but it doesn't work - I keep getting the error "The formula you typed contains an error." (I'd really prefer this to be a formula, rather than a VB solution)
Thanks for any ideas!
Edit: In the meantime, I have created two helper columns that are just the Month() and Year() of each row of data. Then I can use =AverageIfs(B:B,[month helper range],1,[year helper range],2007). Is there a way to do this without a helper column though?
Try this
=AVERAGE(IF(YEAR(A:A)=1994,IF(MONTH(A:A)=1,B:B,""),""))
entered as an array formula (CTRL-SHIFT-ENTER). If you want to use the month as text you could use
=AVERAGE(IF(YEAR(A:A)=1994,IF(TEXT(A:A,"mmmm")="January",B:B,""),""))
Hope that helps
Assuming your data has the headers "Date" and "Price" in cells A1 and B1,
and assuming your data begins in A2 = "1/3/1994" and B2 = 1165:
C1 = "Month"
D1 = "Year"
C2 = =TEXT(A2,"Mmmm")
D2 = =YEAR(A2)
Copy Cells C2+D2 down ...
I place your new table in:
H2 = "January"
H3 = "February"
... etc.
I1 = 1994
J1 = 1995
... etc.
I2 = =AVERAGEIFS($B:$B,$C:$C,$H2,$D:$D,I$1)
and copy that formula throughout the table.
Cheers!
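The helper columns effectively give every row a (month name, year) key, and AVERAGEIFS averages the rows that share a key. A rough Python sketch of that grouping, with a made-up function name and a few of the sample rows from the question:

```python
import calendar
from collections import defaultdict
from datetime import date

# A few rows from the question's data: (date, price).
prices = [
    (date(1994, 1, 3), 1165),
    (date(1994, 1, 4), 1162),
    (date(1994, 2, 1), 1055),
    (date(1994, 2, 2), 1051),
]

def monthly_averages(prices):
    """Average price per (month name, year), like the two helper columns."""
    buckets = defaultdict(list)
    for day, price in prices:
        # calendar.month_name gives English names unless the locale has been changed.
        buckets[(calendar.month_name[day.month], day.year)].append(price)
    return {key: sum(values) / len(values) for key, values in buckets.items()}

averages = monthly_averages(prices)
print(averages[("January", 1994)])
```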
Yes, you can use AVERAGEIFS() and you should. This is about a thousand times faster than the accepted answer:
=AVERAGEIFS(B:B,A:A,">="&DATE(1994,1,1),A:A,"<"&DATE(1994,2,1))
You can even do it this way for a more concise formula, but I believe it raises problems for non-USA users because of their date format settings:
=AVERAGEIFS(B:B,A:A,">=1/1/94",A:A,"<2/1/94")
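The two criteria define a half-open date range: on or after the first of the month and strictly before the first of the next month. A minimal Python sketch of that test follows. The function names are illustrative, and the December case is handled by hand because Python's date() does not roll month 13 over the way Excel's DATE does.

```python
from datetime import date

# A few rows from the question's data: (date, price).
prices = [
    (date(1994, 1, 3), 1165),
    (date(1994, 1, 4), 1162),
    (date(1994, 2, 1), 1055),
]

def month_bounds(year, month):
    """Return (first day of the month, first day of the following month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def average_in_month(prices, year, month):
    """Average the prices whose date falls in [start, end); None if there are none."""
    start, end = month_bounds(year, month)
    matched = [price for day, price in prices if start <= day < end]
    return sum(matched) / len(matched) if matched else None

print(average_in_month(prices, 1994, 1))
```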
I don't know if this is the most concise solution, but it works. You can use SUMPRODUCT as follows:
=SUMPRODUCT((MONTH($A:$A)=1)*(YEAR($A:$A)=1994)*$B:$B)/SUMPRODUCT((MONTH($A:$A)=1)*(YEAR($A:$A)=1994))
What this is essentially doing is summing the values in column B based on the two criteria, and then counting the number of rows that matched the criteria and dividing by that number.
For each row, the MONTH and YEAR conditions evaluate to either 1 (true) or 0 (false) and then those two values are multiplied with the value in column B, resulting in column B's value if both conditions are true, or 0 if one or both conditions are false.
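Written out in Python, the 1/0 multiplication looks like this. It is a sketch with an illustrative function name, using three of the sample rows. As with the formula, a month with no matching rows ends in a division by zero.

```python
from datetime import date

dates = [date(1994, 1, 3), date(1994, 1, 4), date(1994, 2, 1)]
prices = [1165, 1162, 1055]

def sumproduct_average(dates, prices, year, month):
    """Average via 1/0 masks, the way the two SUMPRODUCT calls do it."""
    month_mask = [int(d.month == month) for d in dates]  # 1 if the month matches, else 0
    year_mask = [int(d.year == year) for d in dates]     # 1 if the year matches, else 0
    # Numerator: each price survives only when both masks are 1.
    total = sum(m * y * p for m, y, p in zip(month_mask, year_mask, prices))
    # Denominator: the number of rows where both masks are 1.
    count = sum(m * y for m, y in zip(month_mask, year_mask))
    return total / count

print(sumproduct_average(dates, prices, 1994, 1))
```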
This solution requires the results table to use month numbers instead of month names.
It also assumes that the results table starts at F2 (see picture), with the years across row 2 and the month numbers down column F.
Then use this formula:
=IFERROR(AVERAGEIFS($B:$B,$A:$A,">="&DATE(G$2,$F3,1),$A:$A,"<="&EOMONTH(DATE(G$2,$F3,1),0)),"N/A")
The formula shows "N/A" if there are no prices for the period (Year/Month). If you want a blank instead, replace it with "".
Small changes done to your sample data to work with several periods
