Parsing Data Output in Python - python-3.x

So I have this code:
si.get_stats("aapl")
which returns this junk:
0 Market Cap (intraday) 5 877.04B
1 Enterprise Value 3 966.56B
2 Trailing P/E 15.52
3 Forward P/E 1 12.46
4 PEG Ratio (5 yr expected) 1 1.03
5 Price/Sales (ttm) 3.30
6 Price/Book (mrq) 8.20
7 Enterprise Value/Revenue 3 3.64
8 Enterprise Value/EBITDA 6 11.82
9 Fiscal Year Ends Sep 29, 2018
10 Most Recent Quarter (mrq) Sep 29, 2018
11 Profit Margin 22.41%
12 Operating Margin (ttm) 26.69%
13 Return on Assets (ttm) 11.96%
14 Return on Equity (ttm) 49.36%
15 Revenue (ttm) 265.59B
16 Revenue Per Share (ttm) 53.60
17 Quarterly Revenue Growth (yoy) 19.60%
18 Gross Profit (ttm) 101.84B
19 EBITDA 81.8B
20 Net Income Avi to Common (ttm) 59.53B
21 Diluted EPS (ttm) 11.91
22 Quarterly Earnings Growth (yoy) 31.80%
23 Total Cash (mrq) 66.3B
24 Total Cash Per Share (mrq) 13.97
25 Total Debt (mrq) 114.48B
26 Total Debt/Equity (mrq) 106.85
27 Current Ratio (mrq) 1.12
28 Book Value Per Share (mrq) 22.53
29 Operating Cash Flow (ttm) 77.43B
30 Levered Free Cash Flow (ttm) 48.42B
31 Beta (3Y Monthly) 1.21
32 52-Week Change 3 5.27%
33 S&P500 52-Week Change 3 4.97%
34 52 Week High 3 233.47
35 52 Week Low 3 150.24
36 50-Day Moving Average 3 201.02
37 200-Day Moving Average 3 203.28
38 Avg Vol (3 month) 3 38.6M
39 Avg Vol (10 day) 3 42.36M
40 Shares Outstanding 5 4.75B
41 Float 4.62B
42 % Held by Insiders 1 0.07%
43 % Held by Institutions 1 61.16%
44 Shares Short (Oct 31, 2018) 4 36.47M
45 Short Ratio (Oct 31, 2018) 4 1.06
46 Short % of Float (Oct 31, 2018) 4 0.72%
47 Short % of Shares Outstanding (Oct 31, 2018) 4 0.77%
48 Shares Short (prior month Sep 28, 2018) 4 40.2M
49 Forward Annual Dividend Rate 4 2.92
50 Forward Annual Dividend Yield 4 1.51%
51 Trailing Annual Dividend Rate 3 2.72
52 Trailing Annual Dividend Yield 3 1.52%
53 5 Year Average Dividend Yield 4 1.73
54 Payout Ratio 4 22.84%
55 Dividend Date 3 Nov 15, 2018
56 Ex-Dividend Date 4 Nov 8, 2018
57 Last Split Factor (new per old) 2 1/7
58 Last Split Date 3 Jun 9, 2014
This is a third party function, scraping data off of Yahoo Finance. I need something like this
def func( si.get_stats("aapl") ):
**magic**
return Beta (3Y Monthly)
Specifically, I want it to return the number associated with Beta, not the actual text.

I'm assuming that the function call returns either a single string or a list of strings (one per line of the table) and does not write to stdout.
To get the value associated with Beta (3Y Monthly) or any of the other parameter names:
1) If the return value is a single string formatted to print as the table above, each line should end with \n. Split the string into a list of lines, iterate over it to find the parameter name, then split the matching line again to fetch the numeric value associated with it:
def get_beta():
    # Split the single formatted string into a list of elements; each
    # element is one line of the table.
    str_lst = si.get_stats("aapl").split('\n')
    for line in str_lst:
        # Change 'Beta (3Y Monthly)' to any other parameter required.
        if 'Beta (3Y Monthly)' in line:
            # Split this line on whitespace (the default for split()).
            # e.g. ['31', 'Beta', '(3Y', 'Monthly)', '1.21']; the numeric
            # value is the last element, already free of trailing
            # space/newline.
            num_value_asStr = line.split()[-1]
            return num_value_asStr
2) If a list is already returned, just iterate over its items, apply the same if condition, and split the matching element to get the numeric value associated with the parameter:
def get_beta():
    str_lst = si.get_stats("aapl")
    for line in str_lst:
        # Change 'Beta (3Y Monthly)' to any other parameter required.
        if 'Beta (3Y Monthly)' in line:
            # Split this line on whitespace (the default for split()).
            # e.g. ['31', 'Beta', '(3Y', 'Monthly)', '1.21']; the numeric
            # value is the last element, already free of trailing
            # space/newline.
            num_value_asStr = line.split()[-1]
            return num_value_asStr
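Both cases can be folded into one helper. This is only a minimal sketch, assuming the table is available as text with one "index name value" row per line. `find_stat` and the `sample` string are hypothetical names introduced for illustration, and the real return type of si.get_stats may differ.

```python
def find_stat(table, name):
    """Return the last whitespace-separated token on the row containing name.

    table may be one newline-separated string or a list of row strings.
    Returns None when no row contains name.
    """
    lines = table.splitlines() if isinstance(table, str) else table
    for line in lines:
        if name in line:
            return line.split()[-1]
    return None


# Hypothetical sample: three rows copied from the table in the question.
sample = (
    "30 Levered Free Cash Flow (ttm) 48.42B\n"
    "31 Beta (3Y Monthly) 1.21\n"
    "32 52-Week Change 3 5.27%"
)

beta = float(find_stat(sample, "Beta (3Y Monthly)"))
print(beta)
```

Taking the last token works for plain numbers such as Beta. A value with a suffix (48.42B, 5.27%) needs that suffix handled before calling float(), and a date such as "Sep 29, 2018" would come back as only "2018".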

Related

Create date ranges from an array of dates

Let's say I have the array of dates below (not necessarily sorted):
import numpy as np
np.array(["2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1", "2001Q2", "2001Q3", "2001Q4", "2002Q1",
"2002Q2", "2002Q3", "2002Q4", "2003Q1", "2003Q2", "2003Q3", "2003Q4", "2004Q1", "2004Q2", "2004Q3",
"2004Q4", "2005Q1", "2005Q2", "2005Q3", "2005Q4", "2006Q1", "2006Q2", "2006Q3", "2006Q4", "2007Q1",
"2007Q2", "2007Q3", "2007Q4", "2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2", "2009Q3",
"2009Q4"])
From this I want to create a DataFrame with 2 columns, start-date and end-date, where these dates correspond to the start and the end of a date range spanning 4 years. This continues for each element of the above array until the last element. For example, the first 3 rows of this new DataFrame would look like below
Is there any direct function/method to achieve the above in Python?
Here's one way using PeriodIndex and DateOffset from pandas. Note that I named your array arr below:
import pandas as pd

df = pd.DataFrame({'start-date': arr,
                   'end-date': (pd.PeriodIndex(arr, freq='Q').to_timestamp() +
                                pd.DateOffset(years=4, months=10)).to_period('Q')})
Output:
start-date end-date
0 2000Q1 2004Q4
1 2000Q2 2005Q1
2 2000Q3 2005Q2
3 2000Q4 2005Q3
4 2001Q1 2005Q4
5 2001Q2 2006Q1
6 2001Q3 2006Q2
7 2001Q4 2006Q3
8 2002Q1 2006Q4
9 2002Q2 2007Q1
10 2002Q3 2007Q2
11 2002Q4 2007Q3
12 2003Q1 2007Q4
13 2003Q2 2008Q1
14 2003Q3 2008Q2
15 2003Q4 2008Q3
16 2004Q1 2008Q4
17 2004Q2 2009Q1
18 2004Q3 2009Q2
19 2004Q4 2009Q3
20 2005Q1 2009Q4
21 2005Q2 2010Q1
22 2005Q3 2010Q2
23 2005Q4 2010Q3
24 2006Q1 2010Q4
25 2006Q2 2011Q1
26 2006Q3 2011Q2
27 2006Q4 2011Q3
28 2007Q1 2011Q4
29 2007Q2 2012Q1
30 2007Q3 2012Q2
31 2007Q4 2012Q3
32 2008Q1 2012Q4
33 2008Q2 2013Q1
34 2008Q3 2013Q2
35 2008Q4 2013Q3
36 2009Q1 2013Q4
37 2009Q2 2014Q1
38 2009Q3 2014Q2
39 2009Q4 2014Q3
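In the output above, each end-date is 19 quarters after its start-date. Adding 4 years and 10 months to the first day of a quarter lands inside the quarter 19 steps later. The same arithmetic can be checked without pandas. This is a minimal sketch, and `add_quarters` is a hypothetical helper, not a pandas function.

```python
def add_quarters(label, n):
    """Shift a 'YYYYQn' label forward by n quarters."""
    year, quarter = label.split("Q")
    # Number the quarters consecutively so the offset is plain integer addition.
    index = int(year) * 4 + (int(quarter) - 1) + n
    new_year, new_quarter = divmod(index, 4)
    return f"{new_year}Q{new_quarter + 1}"


starts = ["2000Q1", "2000Q2", "2009Q4"]
ends = [add_quarters(s, 19) for s in starts]
print(list(zip(starts, ends)))
```

Changing the offset (for example, 15 quarters for a range that covers exactly 16 quarters) changes the window length without touching the rest.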

VBA solution of VIF factors [EXCEL]

I have several multiple linear regressions to carry out, and I am wondering if there is a VBA solution for getting the VIF of the regression outputs for different equations.
My current data format:
i=1
Year DependantVariable Variable2 Variable3 Variable4 Variable5 ....
2009 100 10 20 -
2010 110 15 25 -
2011 115 20 30 -
2012 125 25 35 -
2013 130 25 40 -
I have the above table, with the value of i determining the value of the variables (essentially, different regression input tables in place for every value of i)
I am looking for a VBA routine that will check every value of i (stored in a column), calculate the VIF for every value of i, and output something like below
ivalue variable1VIF variable2VIF ...
1 1.1 1.3
2 1.2 10.1

Extractall dates - how to separate single years with RegEx in python?

I have got some dates included within the text in one of the columns in my dataframe.
for example,
sr = pd.Series(['04/20/2009', '04/20/09', '4/20/09', '4/3/09', '6/2008','12/2009','2010'])
I want to extract these dates, but half of my year ends up in the 'month' and 'day' columns.
result = sr.str.extractall(r'(?P<month>\d{,2})[/]?(?P<day>\d{,2})[/]?(?P<year>\d{2,4})')
result
        month  day  year
  match
0 0        04   20  2009
1 0        04   20    09
2 0         4   20    09
3 0         4    3    09
4 0         6   20    08
5 0        12   20    09
6 0        20  NaN    10
how can I fix this?
I can only think of processing "'6/2008','12/2009','2010'" separately from "'04/20/2009', '04/20/09', '4/20/09'", and then appending them.
You could make the match a bit more specific for the months and days.
As there is always a year, you can make the whole group for the month and day optional.
In that optional group, you can match a month with an optional day.
(?<!\S)(?:(?P<month>1[0-2]|0?[1-9])/(?:(?P<day>3[01]|[12][0-9]|0?[1-9])/)?)?(?P<year>(?:20|19)?\d{2})(?!\S)
In parts
(?<!\S) Negative lookbehind, assert what is directly to the left is not a non whitespace char (whitespace boundary to the left)
(?: Non capture group
(?P<month>1[0-2]|0?[1-9])/ Group month followed by /
(?: Non capture group
(?P<day>3[01]|[12][0-9]|0?[1-9])/ Group day followed by /
)? Close group and make it optional
)? Close group and make it optional
(?P<year>(?:20|19)?\d{2}) Group year, optionally match either 20 or 19 and 2 digits
(?!\S) Negative lookahead, assert not a non whitespace char directly to the right (whitespace boundary to the right)
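The pattern can be tried on the sample strings from the question with Python's standard `re` module. This is a minimal sketch; `DATE_RE`, `samples` and `parsed` are names introduced here for illustration.

```python
import re

# Same pattern as above, split over several lines for readability.
DATE_RE = re.compile(
    r"(?<!\S)"
    r"(?:(?P<month>1[0-2]|0?[1-9])/"
    r"(?:(?P<day>3[01]|[12][0-9]|0?[1-9])/)?)?"
    r"(?P<year>(?:20|19)?\d{2})"
    r"(?!\S)"
)

samples = ['04/20/2009', '04/20/09', '4/20/09', '4/3/09',
           '6/2008', '12/2009', '2010']

parsed = []
for text in samples:
    m = DATE_RE.search(text)
    # Groups that did not take part in the match come back as None.
    parsed.append((m.group('month'), m.group('day'), m.group('year')))

for text, parts in zip(samples, parsed):
    print(text, parts)
```

The same pattern string should also be usable with `sr.str.extractall(...)`, where the missing month and day would show up as NaN instead of None.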

DAX help: % monthly share of another table

I have a DAX formula for my Power Pivot model that I cannot get to work and was hoping for help.
I have two pivot tables that are already connected:
1) A cohort of actions taken within Month 1, ..., X of the sign-up month
2) Total sign-ups on a monthly basis
I tried to attach the sheet here but could not, so I have added a screenshot of the sheet.
What I have so far is:
=DIVIDE(
SUM(Range[conversion to KYC completed]),
SUM('Range 1'[Sum of signups]))
But this does not give me what I want as I think I’m missing the monthly grouping somehow.
Question 1:
What I want is to get the share of actions completed within 1,...,X months out of the total sign up that given month (e.g. Jan) (so the data from Table 2)
Question 2:
Ideally I would also like to show the total sign-ups at the beginning of the cohort to make it easier to understand, i.e. the monthly total sign-ups that the cohort is calculated from. But so far I cannot get just the totals month by month. Is there any way to add a monthly total column to the pivot without applying these numbers as a value across all columns?
Something like this is the ultimate outcome for me
UPDATED WITH SAMPLE DATA
Signup month, KYC completed month, Age by month, signups, conversion to KYC completed
Jan-17 Jul-18 18 97 75
Jan-17 Jul-18 18 99 79
Jan-17 Dec-18 23 95 80
Feb-17 May-18 15 99 74
Feb-17 Jul-18 17 90 75
Feb-17 Jul-18 17 95 76
Feb-17 Aug-18 18 92 71
Mar-17 May-18 14 94 73
Apr-17 Jul-18 15 93 75
May-17 Sep-18 16 94 70
May-17 Oct-18 17 98 72
Jun-17 May-18 11 95 79
Jul-17 Oct-18 15 97 74
Jul-17 Jul-18 12 94 78
Aug-17 Sep-18 13 96 74
Sep-17 Nov-18 14 95 80
Sep-17 Oct-18 13 94 79
DESIRED OUTCOME
The % for Month 1....X is calculated KYC Completed / Monthly Sign up
OUTPUT WITH THIS CODE
=VAR SignUpMonth = IF(HASONEVALUE('Range 1'[Row Labels]), BLANK())
RETURN
DIVIDE(CALCULATE(SUM([conversion to KYC completed])),
CALCULATE(SUM('Range 1'[Sum of signups]),
FILTER(ALL(Range), Range[Signup month (Month Index)] = SignUpMonth)))
Thanks for the sample data Franzi. Still not too clear what you're asking for, but perhaps this will help a little.
Signed Up to Signed In Ratio =
VAR SignUpMonth = SELECTEDVALUE(Table1[Signup month], BLANK())
RETURN
DIVIDE(CALCULATE(SUM([conversion to KYC completed])),
CALCULATE(SUM(Table1[ signups]),
FILTER(ALL(Table1), Table1[Signup month] = SignUpMonth)))
So. Let's break it down.
If I understand correctly, you want the number of sign-ins for a given month (x axis) and signup month (y axis) combination, divided by the total signups per signup month (y axis).
number of signins for a given month ( x axis ) signup combo ( y axis ):
CALCULATE(SUM([conversion to KYC completed]))
TOTAL signups ( y axis ) per signup month
CALCULATE(SUM(Table1[ signups]),
FILTER(ALL(Table1), Table1[Signup month] = SignUpMonth))
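The arithmetic behind the measure can be checked by hand on the sample data. The following plain Python sketch mirrors the measure as I read it and is not DAX. It assumes the pivot has the signup month on rows and "Age by month" on columns, and it uses only the Jan-17 and Feb-17 sample rows. The variable names are hypothetical.

```python
from collections import defaultdict

# (signup month, age in months, signups, conversion to KYC completed)
rows = [
    ("Jan-17", 18, 97, 75),
    ("Jan-17", 18, 99, 79),
    ("Jan-17", 23, 95, 80),
    ("Feb-17", 15, 99, 74),
    ("Feb-17", 17, 90, 75),
    ("Feb-17", 17, 95, 76),
    ("Feb-17", 18, 92, 71),
]

total_signups = defaultdict(int)  # denominator: all signups of a signup month
kyc_by_cell = defaultdict(int)    # numerator: KYC completed per (month, age) cell
for month, age, signups, kyc in rows:
    total_signups[month] += signups
    kyc_by_cell[(month, age)] += kyc

# Each pivot cell divides its own KYC count by the whole month's signups.
ratio = {cell: kyc / total_signups[cell[0]] for cell, kyc in kyc_by_cell.items()}

for (month, age), value in sorted(ratio.items()):
    print(month, age, f"{value:.1%}")
```

The denominator ignores the age column, which is what FILTER(ALL(Table1), ...) does in the measure: it removes the column filter and keeps only the signup month.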

Excel formula to get the count of certain value based on odd/even line

I have this data in Excel.
A B C
--------------------------------------
Line Number Value #1 Value #2
1 21 35
2 21 27
3 21 18
4 10 47
5 50 5
6 37 68
7 10 21
8 75 21
I tried to count the occurrences of "21" on odd line numbers. In this situation, the answer should be 3. However, neither IF(MOD(A1:A8,2)=1,COUNTIF(B1:C8,21)) nor the array form {IF(MOD(A1:A8,2)=1,COUNTIF(B1:C8,21))} worked, and Google didn't yield anything helpful. Could anyone help me? Thanks!
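This works for odd lines (it tests the sheet row number with ROW, and the ranges assume the values are in columns A:B, so adjust them to your layout):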
=SUM(COUNTIF(A:B,21)-SUMPRODUCT((A:B=21)*(MOD(ROW(A:B),2)=0)))
There may be a better way of writing this formula.
Use this to count even lines:
=SUMPRODUCT((A:B=21)*(MOD(ROW(A:B),2)=0))
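The counting logic can be cross-checked outside Excel against the table from the question. This is a minimal Python sketch that uses the Line Number column for parity, where the formulas use ROW. `count_value` is a hypothetical helper name.

```python
# (line number, value #1, value #2) rows from the question
rows = [
    (1, 21, 35),
    (2, 21, 27),
    (3, 21, 18),
    (4, 10, 47),
    (5, 50, 5),
    (6, 37, 68),
    (7, 10, 21),
    (8, 75, 21),
]


def count_value(rows, target, parity):
    """Count cells equal to target on lines whose number % 2 == parity."""
    return sum(
        values.count(target)
        for line, *values in rows
        if line % 2 == parity
    )


odd_count = count_value(rows, 21, 1)
even_count = count_value(rows, 21, 0)
print(odd_count, even_count)
```

The odd and even counts together equal the total number of 21s, which is why the odd-line formula can subtract the even-line SUMPRODUCT from a plain COUNTIF.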
