VBA solution for VIF factors [EXCEL]

I have several multiple linear regressions to carry out, and I am wondering whether there is a VBA solution for getting the VIF (variance inflation factor) of the regression outputs for the different equations.
My current data format:
i=1
Year DependantVariable Variable2 Variable3 Variable4 Variable5 ....
2009 100 10 20 -
2010 110 15 25 -
2011 115 20 30 -
2012 125 25 35 -
2013 130 25 40 -
I have the above table, where the value of i determines the values of the variables (essentially, there is a different regression input table for every value of i).
I am looking for a VBA routine that will go through every value of i (stored in a column), calculate the VIFs for each one, and output something like the table below:
ivalue variable1VIF variable2VIF ...
1 1.1 1.3
2 1.2 10.1
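The calculation itself can be sketched outside VBA. The following is a minimal Python sketch, not a VBA solution, assuming the data sits in one long table with a column holding i. It relies on the identity that the VIF of predictor j, 1/(1 - R_j^2), equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The function name vif_by_group and the sample numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

def vif_by_group(df, group_col, predictors):
    """Return one row per group value with the VIF of each predictor column."""
    keys, rows = [], []
    for key, grp in df.groupby(group_col):
        # Correlation matrix of the predictors only (the dependent variable is not used).
        corr = np.corrcoef(grp[predictors].to_numpy(dtype=float), rowvar=False)
        # VIF_j = 1 / (1 - R_j^2) is the j-th diagonal entry of the inverse.
        rows.append(np.diag(np.linalg.inv(corr)))
        keys.append(key)
    return pd.DataFrame(rows, index=keys, columns=[f"{p}VIF" for p in predictors])

# Illustrative data only: two values of i, two predictors each.
data = pd.DataFrame({
    "i": [1] * 5 + [2] * 5,
    "Variable2": [1, 2, 3, 4, 5, -2, -1, 0, 1, 2],
    "Variable3": [2, 1, 4, 3, 5, 2, -1, -2, -1, 2],
})
vif = vif_by_group(data, "i", ["Variable2", "Variable3"])
print(vif)
```

Perfectly collinear predictors make the correlation matrix singular, so a real routine would need to handle that case separately.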

Related

Remove n rows and iterate it n times in dataframe

I have 31 million values in a txt file. I need to remove the values between 21600 and 61200, which I did with the code below, and now I have to apply the same logic to every block of 86400 values. This means removing the values between 21600+86400 and 61200+86400, then between 21600+86400+86400 and 61200+86400+86400, and so on until the end of the data. I tried many options, even a linked list, but I could not apply any of them to my large dataset. How should this be done?
Visual example for values 1 to 24, removing the values from 6 to 17:
1 2 3 4 5 6 - - - - - - - - - - 17 18 19 20 21 22 23 24
then apply the same to the next set of rows, which follows the same structure (start 6+24=30 and stop 17+24=41):
25 26 27 28 29 30 - - - - - - - - - - 41 42 43 44 45 46 47 48
and so on until the end of data (remove between 30+24 and 41+24 for the next set).
I limited the code below to the first 250000 values for simplicity.
import numpy as np
import pandas as pd
sample = np.arange(0, 259201, 1).tolist()
df = pd.DataFrame(sample)
df = df.drop(df.index[21601:61200])
Basically, I need to apply something like this below, but I am not sure how to do it for my case.
for day in reversed(range(366)):
    # Work backwards so that earlier positions are not shifted by later drops.
    df = df.drop(df.index[21601 + day*86400:61200 + day*86400])
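You can use the modulo operator to do so (the % symbol in Python and pandas).
Here is how your last piece of code can be rewritten:
df[~(df.index.to_series() % 86400).between(21601, 61200)]
I used to_series() because between() is not defined for Index objects. Note that between() is inclusive at both ends by default, so this also drops position 61200 of each block, which the slice 21601:61200 does not; use between(21601, 61199) to match the slice exactly.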
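To check the modulo idea on the small 1 to 24 example from the question, here is a minimal sketch. It works on row positions rather than index labels, and the helper name drop_periodic_window and its half-open window convention are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

def drop_periodic_window(df, start, stop, period):
    """Drop rows whose position p satisfies start <= p % period < stop."""
    phase = np.arange(len(df)) % period
    keep = ~((phase >= start) & (phase < stop))
    return df[keep]

# Values 1..48, i.e. two blocks of 24 as in the visual example.
df = pd.DataFrame({"value": np.arange(1, 49)})

# Positions 6..15 of each block hold the values 7..16 (and 31..40).
out = drop_periodic_window(df, start=6, stop=16, period=24)
print(out["value"].tolist())
```

For the real data the call would use period=86400 with the window bounds from the question. The boolean mask is built in one vectorised pass, so no loop over days is needed.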

Parsing Data Output in Python

So I have this code:
si.get_stats("aapl")
which returns this junk:
0 Market Cap (intraday) 5 877.04B
1 Enterprise Value 3 966.56B
2 Trailing P/E 15.52
3 Forward P/E 1 12.46
4 PEG Ratio (5 yr expected) 1 1.03
5 Price/Sales (ttm) 3.30
6 Price/Book (mrq) 8.20
7 Enterprise Value/Revenue 3 3.64
8 Enterprise Value/EBITDA 6 11.82
9 Fiscal Year Ends Sep 29, 2018
10 Most Recent Quarter (mrq) Sep 29, 2018
11 Profit Margin 22.41%
12 Operating Margin (ttm) 26.69%
13 Return on Assets (ttm) 11.96%
14 Return on Equity (ttm) 49.36%
15 Revenue (ttm) 265.59B
16 Revenue Per Share (ttm) 53.60
17 Quarterly Revenue Growth (yoy) 19.60%
18 Gross Profit (ttm) 101.84B
19 EBITDA 81.8B
20 Net Income Avi to Common (ttm) 59.53B
21 Diluted EPS (ttm) 11.91
22 Quarterly Earnings Growth (yoy) 31.80%
23 Total Cash (mrq) 66.3B
24 Total Cash Per Share (mrq) 13.97
25 Total Debt (mrq) 114.48B
26 Total Debt/Equity (mrq) 106.85
27 Current Ratio (mrq) 1.12
28 Book Value Per Share (mrq) 22.53
29 Operating Cash Flow (ttm) 77.43B
30 Levered Free Cash Flow (ttm) 48.42B
31 Beta (3Y Monthly) 1.21
32 52-Week Change 3 5.27%
33 S&P500 52-Week Change 3 4.97%
34 52 Week High 3 233.47
35 52 Week Low 3 150.24
36 50-Day Moving Average 3 201.02
37 200-Day Moving Average 3 203.28
38 Avg Vol (3 month) 3 38.6M
39 Avg Vol (10 day) 3 42.36M
40 Shares Outstanding 5 4.75B
41 Float 4.62B
42 % Held by Insiders 1 0.07%
43 % Held by Institutions 1 61.16%
44 Shares Short (Oct 31, 2018) 4 36.47M
45 Short Ratio (Oct 31, 2018) 4 1.06
46 Short % of Float (Oct 31, 2018) 4 0.72%
47 Short % of Shares Outstanding (Oct 31, 2018) 4 0.77%
48 Shares Short (prior month Sep 28, 2018) 4 40.2M
49 Forward Annual Dividend Rate 4 2.92
50 Forward Annual Dividend Yield 4 1.51%
51 Trailing Annual Dividend Rate 3 2.72
52 Trailing Annual Dividend Yield 3 1.52%
53 5 Year Average Dividend Yield 4 1.73
54 Payout Ratio 4 22.84%
55 Dividend Date 3 Nov 15, 2018
56 Ex-Dividend Date 4 Nov 8, 2018
57 Last Split Factor (new per old) 2 1/7
58 Last Split Date 3 Jun 9, 2014
This is a third-party function that scrapes data from Yahoo Finance. I need something like this:
def func( si.get_stats("aapl") ):
    **magic**
    return Beta (3Y Monthly)
Specifically, I want it to return the number associated with Beta, not the actual text.
I'm assuming that the function call returns either a single string or a list of strings, one for each line in the table, and does not write to stdout.
To get the value associated with Beta (3Y Monthly) or any of the other parameter names:
1) If the return value is a single string formatted to print as the table above, it should have \n at the end of each line. You can split this string into a list, iterate over it to find the parameter name, and split the matching line again to fetch the number associated with it.
def beta_from_string():
    # Wrapped in a function so that `return` is valid.
    # Split the single formatted string into a list of elements; each element
    # is one line of the table.
    str_lst = si.get_stats("aapl").split('\n')
    for line in str_lst:
        # Change 'Beta (3Y Monthly)' to any other parameter required.
        if 'Beta (3Y Monthly)' in line:
            # Split this line on whitespace (the default for split()), which
            # gives e.g. ['31', 'Beta', '(3Y', 'Monthly)', '1.21']. The numeric
            # value is the last element. Strip to remove any trailing space/newline.
            num_value_asStr = line.split()[-1].strip()
            return num_value_asStr
    return None  # parameter not found
2) If the return value is already a list, just iterate over the list items, use the same if condition as above, and split the matching element to get the number associated with the parameter.
def beta_from_list():
    # Wrapped in a function so that `return` is valid.
    str_lst = si.get_stats("aapl")
    for line in str_lst:
        # Change 'Beta (3Y Monthly)' to any other parameter required.
        if 'Beta (3Y Monthly)' in line:
            # Split this line on whitespace (the default for split()), which
            # gives e.g. ['31', 'Beta', '(3Y', 'Monthly)', '1.21']. The numeric
            # value is the last element. Strip to remove any trailing space/newline.
            num_value_asStr = line.split()[-1].strip()
            return num_value_asStr
    return None  # parameter not found
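The numbered rows in the printed output suggest a third possibility: that the call returns a pandas DataFrame rather than text. This is only a guess, and the column names "Attribute" and "Value" below are assumptions made for the sketch, as are the hand-built sample frame and the helper name lookup_stat. Under those assumptions, the lookup could be sketched as:

```python
import pandas as pd

# Hypothetical stand-in for the scraper's return value; column names are assumed.
stats = pd.DataFrame({
    "Attribute": ["Levered Free Cash Flow (ttm)", "Beta (3Y Monthly)", "52-Week Change 3"],
    "Value": ["48.42B", "1.21", "5.27%"],
})

def lookup_stat(frame, name):
    """Return the Value of the first row whose Attribute contains name, else None."""
    mask = frame["Attribute"].str.contains(name, regex=False)
    matches = frame.loc[mask, "Value"]
    return matches.iloc[0] if not matches.empty else None

# Convert to a number only for attributes that are plain numerics.
beta = float(lookup_stat(stats, "Beta (3Y Monthly)"))
print(beta)
```

Passing regex=False makes the parentheses in the attribute name match literally instead of being treated as a regular-expression group.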

Aging Bucket DAX formula issue

I have a table as stated below,
MEM_ID dateDiff
4522 10
111 1
1112 -1
1232 5
121135 20
145 30
12254 60
I want a DAX formula that gives the output shown below in the Measure column:
MEM_ID dateDiff Measure
4522 10 0-15 Days
111 1 0-15 Days
1112 -1 <0 Days
1232 5 0-15 Days
121135 20 15-30 Days
145 30 15-30 Days
12254 60 >60 Days
I have used this formula, which didn't work; any help is much appreciated:
=IF(MAX([DateDiff]) <= 1, "0", IF(MAX([DateDiff])>=1 && MAX([DateDiff])<15,"1-15 Days",IF(MAX([DateDiff])>=15 && MAX([DateDiff])<30,"15-30 Days",IF(MAX([DateDiff])>=30 && MAX([DateDiff])<60,"30-60 Days",IF(MAX([DateDiff])>=60 && MAX([DateDiff])<90,"60-90",BLANK())))))
Using a lookup table, as Ian Ash suggests, is better, but if you must use an IF formula, try the formula you have with the MAX functions deleted.
=IF([DateDiff]< 1, "0",
IF([DateDiff]>=1 && [DateDiff]<15,"1-15 Days",
IF([DateDiff]>=15 && [DateDiff]<30,"15-30 Days",
IF([DateDiff]>=30 && [DateDiff]<60,"30-60 Days",
IF([DateDiff]>=60 && [DateDiff]<90,"60-90",
BLANK())))))
I would solve this problem by creating a look up table as follows.
Create a new worksheet called Lookup, then starting from A1 add the following data:
Min Max Bucket Description
-1000 0 1 <0 Days
1 15 2 0 - 15 Days
16 30 3 16 - 30 Days
You can add additional rows if you need to add more buckets. For example, to create a bucket for 30 to 60, you would add the row:
31 60 4 30 - 60 Days
Once you have the lookup table defined, you can reference into it using the following formula from your main worksheet:
=OFFSET(Lookup!$A$1,SUMPRODUCT((B2>=Lookup!$A$2:$A$4)*(B2<=Lookup!$B$2:$B$4)*(Lookup!$C$2:$C$4)),3)
In the above formula, the value being looked up is in B2.
If you have added rows to your lookup table, you will need to extend the look up range in the formula, i.e. Lookup!$A$2:$A$4 changes to Lookup!$A$2:$A$5 and so on.
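The lookup-table idea is independent of Excel. As a minimal sketch of the same logic in Python, assuming the Min/Max/Description rows listed above (including the optional 31 to 60 row), and with the names BUCKETS and bucket_for invented for illustration:

```python
# (min, max, description) rows copied from the lookup table; bounds are inclusive.
BUCKETS = [
    (-1000, 0, "<0 Days"),
    (1, 15, "0 - 15 Days"),
    (16, 30, "16 - 30 Days"),
    (31, 60, "30 - 60 Days"),
]

def bucket_for(date_diff):
    """Return the description of the first bucket containing date_diff, else None."""
    for low, high, description in BUCKETS:
        if low <= date_diff <= high:
            return description
    return None

for value in (10, 1, -1, 5, 20, 30, 60):
    print(value, bucket_for(value))
```

Adding a bucket means adding one row to the list, which mirrors why the lookup table is easier to maintain than a chain of nested IFs.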

Sum of diagonal products

I am looking to get the sumproduct but only for specific diagonals in an array. My setup is like below and the yellow highlighting should give an idea of how the formula should calculate
As text:
Years Rates 0 1 2 3
25 0.16 25 24 23 22
26 0.11 26 25 24 23
27 0.12 27 26 25 24
28 0.13 28 27 26 25
29 0.17 29 28 27 26
30 0.16 30 29 28 27
Years Sum of products
25
26
27
28
29
30
Note that the table on the right dictates how many years to include, so if the table were extended to include 4 years, then 0.17*4 would need to be included in the sum product for 25.
What is the best way to do this? Ideally not a CSE formula or VBA. The actual table is much bigger, so I might need to be conscious of speed too.
I intend to edit this with what I came up with but I hope to see some different ways of doing this so I hope it's okay that I hold off for now.
Simply:
=MMULT(G4:J4,B7:B10)
Regards
You could give this CSE a try, maybe it's not as bad (even though you don't want one)
=SUMPRODUCT(B7:B10,TRANSPOSE(G4:J4))
I think a 'CSE' formula will be best even though you'd prefer not to.
With the first formula in B11 and the setup as in your image, (with the 0, 1, 2, 3 in D1:G1, the word "Rates" in B1, and the array in D2:G7 etc)
{=SUM(IF($D$2:$G$7=A11, $D$1:$G$1*$B$2:$B$7, 0))}
and drag down
This is the best way I can find without using a CSE formula:
=SUMPRODUCT(--($C$2:$F$7=$A11),$B$2:$B$7*$C$1:$F$1)
The first array is n x m in size, and the second array is the product of an n x 1 and a 1 x m array, which is converted to an n x m array. This provides SUMPRODUCT with two identically sized arrays, as required.
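The same broadcasting idea can be sketched in NumPy using the numbers from the question's table: a boolean mask marks the cells equal to the target year, and the outer product of rates and column headers supplies the weights. The variable and function names are chosen for this illustration only.

```python
import numpy as np

years = np.array([25, 26, 27, 28, 29, 30])
rates = np.array([0.16, 0.11, 0.12, 0.13, 0.17, 0.16])
offsets = np.array([0, 1, 2, 3])  # the column headers 0..3

# grid[r, c] reproduces the table body: year of row r minus the column header.
grid = years[:, None] - offsets[None, :]
# weights[r, c] = rate of row r times column header c (n x 1 times 1 x m gives n x m).
weights = rates[:, None] * offsets[None, :]

def diagonal_sumproduct(target):
    """Sum rate * header over every cell of the grid equal to target."""
    return float(np.sum(np.where(grid == target, weights, 0.0)))

results = {int(y): diagonal_sumproduct(y) for y in years}
print(results)
```

For year 25 the matching cells lie on the main diagonal, giving 0.16*0 + 0.11*1 + 0.12*2 + 0.13*3.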

Daily and Hourly Averages from (m/d/yyyy h:mm) timestamps in Excel

I have an Excel 2007 spreadsheet with date entries in the format m/d/yyyy h:mm (in one cell). I would like to find the hourly and daily averages of all the columns of this spreadsheet and save each time aggregation to a new worksheet.
The data is recorded every ~10 minutes, but there were some time slips during data collection, so not every hour has the same number of rows. Also, the ending minute is either 0 or 6, depending on the time correction.
What would be a good way to approach this task within Excel 2007? It seems like this might be possible with a pivot table if I can create a formula that selects the correct range for the timestamps. Thanks.
For example, a date-time entry in TIMESTAMP is 10/31/2012 0:06, which is in one cell.
TIMESTAMP Month Day Year Hour Min Rain_mm Rain_mm_2 AirTC AirTC_2 FuelM FuelM_2 VW ... there are ~16 variables (total) after the data time
10/31/2012 0:06 10 31 2012 0 06 0 0 26.11 26.08 2.545 6.4 0.049
10/31/2012 0:16 10 31 2012 0 16 0 0 25.98 25.97 2.624 6.6 0.049
10/31/2012 0:26 10 31 2012 0 26 0 0 24.32 23.33 2.543 6.5 0.048
10/31/2012 0:36 10 31 2012 0 36 0 0 24.32 23.33 2.543 6.5 0.048
10/31/2012 0:46 10 31 2012 0 46 0 0 24.32 23.33 2.543 6.5 0.048
10/31/2012 0:56 10 31 2012 0 56 0 0 25.87 25.87 2.753 7.3 0.049
10/31/2012 1:06 10 31 2012 0 06 0 0 25.74 25.74 2.879 8.1 0.051
## The above is just over one hour of collection on one day ##
...
## Different Day ### Notice Missing Time Stamp
11/30/2012 0:00 11 30 2012 0 06 0 0.1 26.12 26.18 2.535 6.4 0.049
11/30/2012 0:10 11 30 2012 0 16 0 0.1 25.90 25.77 2.424 6.6 0.049
11/30/2012 0:20 11 30 2012 0 26 0.1 0.2 24.12 24.43 2.542 6.4 0.046
11/30/2012 0:30 11 30 2012 0 36 0.1 0 24.22 22.32 2.543 6.5 0.048
11/30/2012 0:50 11 30 2012 0 56 0.1 0.2 26.77 25.87 2.743 6.3 0.049
11/30/2012 1:00 11 30 2012 0 06 0 0 24.34 24.77 2.459 5.1 0.050
## so forth on so on ##
Edited after clarification of the requirement, to cover both daily and hourly averages:
Add a column (here B) for ‘H’ (i.e. hour) with =HOUR(A2) copied down.
(Note: Though formatted to show only m/d/y, the content of ColumnA is, in line with the title, assumed to be all of mm/dd/yyyy hh:mm. This makes the existing columns [with names jumbled] Month, Day, Year, Hour redundant.)
Select data range.
Data, Subtotal, At each change in: TIMESTAMP, Use function: Average, Add subtotal to: check only columns G and to the right, OK.
Uncheck Replace current subtotals in Subtotal and apply At each change in: H, Use function: Average, and Add subtotal to: as before, OK.
Replace =SUBTOTAL(1, in Min column with =MIN( .
Delete ‘spare’ Grand Average row.
Reformat as required.
Hopefully the result is what is required.
Note midnight 'tonight' is counted as within first hour of tomorrow.
I had a similar need and worked it out this way:
Add a column for Date (assuming your dd/mm/yyyy hh:mm:ss data is in cell A2)
=DATE(YEAR(A2),MONTH(A2),DAY(A2))
Add a column for Year. If all your weeks fall within a single year, the Year column can be omitted.
=YEAR(A2)
Add a column for Week Number
=WEEKNUM(A2)
Add 2 pivot tables, 1 for daily and 1 for weekly analysis.
In the daily pivot table, choose the field "Date" and the quantities you want. Put "Date" in the Rows section and the sum/average of the values in the Values section. You will get a date-wise sum/average of the values you need.
In the weekly pivot table, do the same as above, but put "Year" and "Week no" in the Rows section instead of "Date".
Hope this helps
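The grouping behind both approaches (one group per day, and one per day and hour) can also be sketched outside Excel. The following is a minimal pandas sketch, assuming the sheet has been loaded into a DataFrame with the TIMESTAMP text column from the question; only one measurement column and four sample rows are shown.

```python
import pandas as pd

# A few rows copied from the question; a real sheet would have many more columns.
df = pd.DataFrame({
    "TIMESTAMP": ["10/31/2012 0:06", "10/31/2012 0:16", "10/31/2012 1:06", "11/30/2012 0:00"],
    "AirTC": [26.11, 25.98, 25.74, 26.12],
})
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%m/%d/%Y %H:%M")

# Derive the grouping keys directly from the timestamp.
day = df["TIMESTAMP"].dt.normalize().rename("day")
hour = df["TIMESTAMP"].dt.hour.rename("hour")

# Uneven numbers of rows per hour are fine: mean() averages whatever is present.
hourly = df.groupby([day, hour])["AirTC"].mean()
daily = df.groupby(day)["AirTC"].mean()
print(hourly)
print(daily)
```

Because the keys come from the timestamp itself, the separate Month, Day, Year and Hour columns are not needed.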
