change variable after collapse - statistics

I am working with repeated cross-sectional data and I want to create a variable recording the annual rate of change of marriages for 5 different age groups. My starting dataset contains a binary variable taking the value 0 for "non married" and 1 for "married", as well as each individual's age group and year. I collapsed my data to get the proportion married for each age group and year:
collapse (mean) married, by(year age)
year prop.married age
1992 .442865 27-31
1992 .75411 32-36
1992 .864637 37-41
1992 .900926 42-45
1993 .456798 27-31
etc.
How do I compute the annual rate of change for each age group and, if possible, plot it (with one line per age group over time)? I tried the following, but it does not seem to work:
bys age (year): gen cng= (prop.married[_n]-prop.married[_n-1]/prop.married[_n-1])
Thank you in advance.
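One thing worth checking in the attempt above is operator precedence: as written, the division is evaluated before the subtraction, so the expression reduces to the current proportion minus 1 rather than a relative change. The intended calculation is presumably (current - previous) / previous within each age group. A minimal sketch of that arithmetic, in Python rather than Stata and using only the five rows shown above (the function name rate_of_change is made up for illustration, and consecutive observations are assumed to be one year apart):

```python
from collections import defaultdict

# (year, proportion married, age group) rows copied from the collapsed data above
rows = [
    (1992, 0.442865, "27-31"),
    (1992, 0.75411, "32-36"),
    (1992, 0.864637, "37-41"),
    (1992, 0.900926, "42-45"),
    (1993, 0.456798, "27-31"),
]


def rate_of_change(rows):
    """Return {(age group, year): (current - previous) / previous}."""
    by_age = defaultdict(list)
    for year, prop, age in rows:
        by_age[age].append((year, prop))

    changes = {}
    for age, series in by_age.items():
        series.sort()  # order each age group by year
        # Pair every observation with the one before it in the same group
        for (_, prev), (year, cur) in zip(series, series[1:]):
            changes[(age, year)] = (cur - prev) / prev
    return changes


changes = rate_of_change(rows)
print(changes)
```

With only these rows, the single computable change is for the 27-31 group in 1993, roughly 3.1%; the first year of every group has no previous value and is left out.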

Related

How to do calculated values in Excel Pivot Table

I have a table like this:
Year Num Freq. Exam Grade Course
2014 102846 SM SM Astronomy 3
2015 102846 12,6 1,7 NC Astronomy 2
2017 102846 20 11,8 17 Astronomy 2
2015 102846 SM NC Defence Against the Dark Arts 4
2015 102846 11 4,5 NC Herbology 2
2015 102846 15 13,99 14 Herbology 2
I am trying to get the percentage of approved students (Grade >= 10) for each course by year and global average.
I've been trying for nearly 3 hours to do a calculated field but so far the only thing I could get was the sum of each student per year:
I have tried to do a calculated field with = Grade >= 10 hoping that it would give me a list of approved students but it gives me 1.
What am I doing wrong in here? It's my first time working with pivot tables.
I would strongly recommend not mixing text strings and numbers in the same column. It will cause a lot of headaches when the data is used for calculations (this applies to both Freq. and Grade). I would rather use 0 or some other numeric value to represent the text.
Not recommended, but yes it's doable =)
You need a dummy variable that marks which rows hold a number and which hold text, so I created Grade Type. We can now count only the rows that have a number in the Grade column by filtering on Grade Type = Number.
I create a table of the data and add the column Grade Type. I use this formula to get Grade Type:
=IF(ISNUMBER([#Grade]),"Number","Text")
I then create the following measures:
Nr of Approved Students
=COUNTX(FILTER(Table1, Table1[Grade Type]="Number"),
IF((VALUE(Table1[Grade])>=10),VALUE(Table1[Grade]),BLANK()))
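First, FILTER restricts the table to the rows that should be evaluated, i.e. those whose grade is a number (the <table> argument in COUNTX(<table>,...)). The <expression> argument in COUNTX(...,<expression>) then returns the grade only for rows that fulfill >=10, where VALUE() converts a number stored as text to a numeric value, and BLANK() otherwise, so only approved rows are counted.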
Nr of Student (w/ Grade Number)
=COUNTX(FILTER(Table1, Table1[Grade Type]="Number"), VALUE(Table1[Grade]))
Count all rows that have a number
Approved (% of Total)
=[Nr of Approved Students]/[Count of Grade]
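The logic of the three measures (keep only numeric grades, count those >= 10, divide by the number of numeric grades) can also be sketched outside Excel. The following Python sketch is an illustration, not part of the PowerPivot setup; it assumes the Grade field is the value just before the course name in the sample table, and the names parse_grade and approved_share are made up here.

```python
# (year, course, grade) read from the sample table; which field holds the
# grade is an assumption about the flattened table layout
records = [
    (2014, "Astronomy", "SM"),
    (2015, "Astronomy", "NC"),
    (2017, "Astronomy", "17"),
    (2015, "Defence Against the Dark Arts", "NC"),
    (2015, "Herbology", "NC"),
    (2015, "Herbology", "14"),
]


def parse_grade(text):
    """Return the grade as a float, or None for text codes such as SM or NC."""
    try:
        return float(text.replace(",", "."))  # the table uses decimal commas
    except ValueError:
        return None


def approved_share(records, threshold=10):
    """Map (course, year) to approved / graded, counting only numeric grades."""
    graded, approved = {}, {}
    for year, course, grade_text in records:
        grade = parse_grade(grade_text)
        if grade is None:
            continue  # same role as filtering on Grade Type = "Number"
        key = (course, year)
        graded[key] = graded.get(key, 0) + 1
        approved[key] = approved.get(key, 0) + (grade >= threshold)
    return {key: approved[key] / graded[key] for key in graded}


shares = approved_share(records)
print(shares)
```

Course and year combinations with no numeric grade at all do not appear in the result, which mirrors a blank cell in the pivot table.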
Setup the PowerPivot Table
Create the PowerPivot and add the data to the data Model
Then create a new measure by clicking your pivot table and then "Measures" -> "New Measure..."
Fill in all the relevant data.
Result should be something like:

Why I am getting StatisticsError: no unique mode; found 2 equally common values while creating a pivot table?

Suppose I have this (random) df_bnb:
neighbourhood room_type price minimum_nights
0 Allen Pvt room 38 5
1 Arder Entire home/apt 90 2
2 Arrochar Entire home/apt 90 2
3 Belmont Shared Room 15 1
4 City Island Entire home/apt 100 3
Each row represents an Airbnb booking.
I hope to generate a pivot_table in which the index is the column neighbourhood and the columns are the other data frame columns ['room_type', 'price', 'minimum_nights'].
I want the entries of the abovementioned columns to be means, except for room_type, where I wish to have the mode. Like the following example data frame:
room_type price minimum_nights
Allen room type mode for Allen price mean for Allen mean min nights for Allen
Arder room type mode for Arder price mean for Arder mean min nights for Arder
Arrochar room type mode for Arrochar price mean for Arrochar mean of min nights for Arrochar
Belmont room type mode for Belmont price mean for Belmont mean of min nights for Belmont
City Island room type mode for City Island price mean for City Is. mean of min nights for City Island
This is the code I have tried so far:
bnb_pivot = pd.pivot_table(bnb,
                           index = ['neighborhood'],
                           values = ['room_type', 'price',
                                     'minimum_nights','number_of_reviews'],
                           aggfunc = {'room_type': statistics.mode,
                                      'price' : np.mean,
                                      'minimum_nights': np.mean,
                                      'number_of_reviews': np.mean})
This is the error that I am getting:
StatisticsError: no unique mode; found 2 equally common values
I tried searching other sources, but I don't know how to handle statistics.mode() while creating a pivot_table.
Many thanks in advance for any helpful indication!
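The error message comes from statistics.mode itself: on Python versions before 3.8 it raises StatisticsError when several values are equally common, while from 3.8 onwards it returns the first one encountered. A minimal sketch of a tie-tolerant replacement built on statistics.multimode (available from Python 3.8); the name first_mode and the tie-breaking rule (first value seen) are choices made here for illustration:

```python
from statistics import multimode


def first_mode(values):
    """Return one mode even when several values tie (the first one seen)."""
    modes = multimode(list(values))
    return modes[0] if modes else None


# Room types for one neighbourhood. The second entry is a hypothetical
# extra booking, added only to produce a tie between two room types.
allen_room_types = ["Pvt room", "Entire home/apt"]

print(first_mode(allen_room_types))
```

A callable like this could take the place of statistics.mode in the aggfunc dictionary ('room_type': first_mode), although that substitution is not tested here. Within pandas, Series.mode() returns all tied modes, so picking one of them explicitly is another option.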

How to Calculate Loan Balance at Any Given Point In Time Without Use of a Table in Excel

I'm trying to calculate the remaining balance of a home loan at any point in time for multiple home loans.
It looks like it is not possible to find the home loan balance without creating one of those long tables (example). Finding the future balance for multiple home loans would require setting up a table for each home (in this case, 25).
With a table, when you want to look at the balance after a certain amount of payments have been made for the home loan, you would just visually scan the table for that period...
But is there any single formula which shows the remaining loan balance by just changing the "time" variable? (# of years/mths in the future)...
An example of the information I'm trying to find is "what would be the remaining balance on a home loan with the following criteria after 10 years":
original loan amt: $100K
term: 30-yr
rate: 5%
mthly pmts: $536.82
pmts per yr: 12
I'd hate to have to create 25 different amortization schedules - a lot of copy-paste-dragging...
Thanks in advance!
You're looking for =FV(), or "future value".
The function needs 5 inputs, as follows:
=FV(rate, nper, pmt, pv, type)
Where:
rate = interest rate for the period of interest. In this case, you are making payments and compounding interest monthly, so your interest rate would be 0.05/12 = 0.00417
nper = the number of periods elapsed. This is your 'time' variable, in this case, number of months elapsed.
pmt = the payment in each period. In your case, $536.82.
pv = the 'present value', in this case the principal of the loan at the start, or -100,000. Note that for a debt example, you can use a negative value here.
type = Whether payments are made at the beginning (1) or end (0) of the period.
In your example, to calculate the principal after 10 years, you could use:
=FV(0.05/12,10*12,536.82,-100000,0)
Which produces:
=81,342.32
For a loan this size, you would have $81,342.32 left to pay off after 10 years.
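For readers who want to see what the formula does, here is a minimal Python sketch of the same closed-form calculation. It assumes a non-zero rate and payments at the end of each period (type 0); the function name remaining_balance and its parameter names are made up for illustration.

```python
def remaining_balance(principal, annual_rate, payment, periods_elapsed,
                      periods_per_year=12):
    """Balance left after a number of end-of-period payments.

    The principal grows with compound interest, and the compounded
    value of the payments made so far is subtracted from it.
    """
    r = annual_rate / periods_per_year        # rate per period
    growth = (1 + r) ** periods_elapsed       # compounding factor
    return principal * growth - payment * (growth - 1) / r


# The loan from the question: $100K, 5%, $536.82 per month, after 10 years
balance = remaining_balance(100_000, 0.05, 536.82, 10 * 12)
print(round(balance, 2))
```

Because the time variable is just an argument, the same function can be called in a loop over 25 loans without building an amortization table for any of them.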
I don't like to post an answer when a brilliant answer already exists, but I want to add some perspective on why the formula works and why you should use FV, as P.J correctly states.
The linked example uses PV, and you can always cross-check Present Value (PV) against Future Value (FV). Why?
Because they are linked to each other.
FV is the compounded value of PV.
PV is the discounted value at interest rate of FV.
Which can be illustrated in this graph, source link:
In the example below, I replicated the way the excel-easy example (Loan Amortization Schedule) calculates PV in Column E, and in Column F I use Excel's built-in PV function. You want to go the other way, hence FV in Column J.
Since they are linked, they need to give the same cash flows over time (it gets a bit trickier if the period or interest rate is not constant over time).
And they indeed do:
Payment number is the number of periods you want to look at (10 years * 12 payments per year = 120, yellow cells).
The PV function takes:
rate: discount rate per period
nper: the number of periods left (total periods - current period), here (12*30)-120
pmt: the fixed amount paid every month
FV: the value of the loan at the end, after 360 periods (30 years * 12 payments per year). The future value of a loan at the end of its term is always 0.
Type: whether payments occur at the beginning or the end of each period; usually the end.
PV: 0.05/12, (12*30)-120, 536.82 ,0 , 0 = 81 342.06
=
FV: 0.05/12, 120, 536.82 , 100 000.00 , 0 = -81 342.06
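The equivalence can be checked numerically. The sketch below is an illustration in Python, not the spreadsheet itself; it assumes end-of-period payments and uses the unrounded level payment, because the two routes only agree exactly when the payment fully amortizes the loan. All function names are made up here.

```python
def level_payment(principal, r, n_periods):
    """Payment per period that repays the loan exactly over n_periods."""
    return principal * r / (1 - (1 + r) ** -n_periods)


def balance_forward(principal, r, payment, periods_elapsed):
    """FV route: compound the principal, subtract the compounded payments."""
    growth = (1 + r) ** periods_elapsed
    return principal * growth - payment * (growth - 1) / r


def balance_discounted(r, payment, periods_left):
    """PV route: discount the payments that are still to come."""
    return payment * (1 - (1 + r) ** -periods_left) / r


r = 0.05 / 12                  # monthly rate
n, k = 30 * 12, 10 * 12        # total periods, periods elapsed
pmt = level_payment(100_000, r, n)

forward = balance_forward(100_000, r, pmt, k)
discounted = balance_discounted(r, pmt, n - k)
print(round(pmt, 2), round(forward, 2), round(discounted, 2))
```

Both routes land on the same balance, close to the 81 342.06 shown above. With the payment rounded to 536.82 the two results drift apart by a few cents, which is why the first answer reports a slightly different figure.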

SUM column multiplied by x, but only if y is less than x, if else multiply z

I have quite the tricky pickle; I have a piece of sales data which has Grand Value, Monthly Fee and Contract Term.
It looks like this.
Grand Value Monthly Fee Contract Term (months)
$100.00 $20.00 5
$120.00 $10.00 12
$120.00 $10.00 24
The first thing you might notice is that the last entry's value looks wrong; it isn't. It is the annual value of that sale, not the total value. It is calculated elsewhere as "est revenue", but that's irrelevant to the question.
What I need to do is get an accurate view of the current year's value, not the total value over x number of years.
In layman's terms, the query I'd like to write is "give me the Monthly_Fee multiplied by 12, but if Contract_Term is less than 12, multiply by Contract_Term instead".
Currently the best query I have is
=SUM(Data!Monthly_Fee:Monthly_Fee)*12
Which just lazily multiplies the monthly fee by 12.
Any Excel masters care to help?
If I understand you correctly, this should work:
=SUMPRODUCT(Data!B2:B4,--(Data!C2:C4<12),Data!C2:C4)+SUMPRODUCT(Data!B2:B4,--(Data!C2:C4>=12)*12)
(obviously change the cell references)
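The two SUMPRODUCT terms split the rows into contracts shorter than 12 months (fee times term) and contracts of 12 months or more (fee times 12), which amounts to multiplying each fee by the smaller of the term and 12. A minimal Python sketch of that rule on the three sample rows; the name current_year_value is made up for illustration:

```python
# (monthly fee, contract term in months) from the sample table
sales = [(20.00, 5), (10.00, 12), (10.00, 24)]


def current_year_value(sales, months_in_year=12):
    """Sum each fee times the contract term, capped at one year."""
    return sum(fee * min(term, months_in_year) for fee, term in sales)


total = current_year_value(sales)
print(total)  # 340.0
```

The rows contribute 100, 120 and 120 respectively, the same values the formula above adds up.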

Averaging aggregated(SUM) values in Spotfire

I'm trying to Average aggregated(SUM) values, but my expression keeps doing weighted averages over the whole data set.
Table Structure
REGION SITE_ID MONTH QUANTITY
A 1 01 5
A 1 02 6
A 2 01 4
B 3 01 10
B 3 02 12
Expression
Avg(
Sum([quantity]) over (All([region]))/
UniqueCount([site_id]) over (All([region]))/
UniqueCount([month]) over (All([region]))
) over (All([region]))
To Clarify, I want to average A and B's Monthly Qty per Site
But I keep getting total qty divided by total no of site_ids divided by months
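The difference between the two calculations can be seen on the five sample rows. The sketch below is plain Python rather than a Spotfire expression, and it assumes that "monthly qty per site" means a region's total quantity divided by its number of distinct sites and then by its number of distinct months, as in the expression above. The function name qty_per_site_month is made up for illustration.

```python
# (region, site_id, month, quantity) rows from the table above
rows = [
    ("A", 1, "01", 5),
    ("A", 1, "02", 6),
    ("A", 2, "01", 4),
    ("B", 3, "01", 10),
    ("B", 3, "02", 12),
]


def qty_per_site_month(rows):
    """Total quantity / distinct sites / distinct months for the given rows."""
    total = sum(qty for _, _, _, qty in rows)
    sites = len({site for _, site, _, _ in rows})
    months = len({month for _, _, month, _ in rows})
    return total / sites / months


# Wanted: compute the figure per region first, then average the regions
regions = sorted({row[0] for row in rows})
per_region = {
    region: qty_per_site_month([row for row in rows if row[0] == region])
    for region in regions
}
average_of_regions = sum(per_region.values()) / len(per_region)

# Observed: one figure computed over the whole data set at once
pooled = qty_per_site_month(rows)

print(per_region, average_of_regions, pooled)
```

Region A gives 3.75 and region B gives 11.0, so the average of the regions is 7.375, whereas the pooled calculation gives 37 / 3 / 2, about 6.17.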
This really depends on where you are going to use it and what the REAL data looks like. This should get you started. Insert this calculated column.
SUM([QUANTITY]) OVER (Intersect([REGION],[MONTH])) / UniqueCount([REGION]) AS [AvgOverRegionByMonth]
This could be inaccurate depending on what the rest of your data looks like. You can also accomplish this in a cross table. The expressions for the Sum and Avg in the example below are as follows:
Sum([QUANTITY]) as [Sum], Sum([QUANTITY]) / Count([REGION]) as [Average]
EDIT
In order to ONLY get the average over the months, use this formula:
AVG([QUANTITY]) OVER ([MONTH]) as [AvgOverMonth]
Here is what your data will look like:
