Counting the number of times the values are more than the mean for a specific column in Dataframe

Counting the number of times the values are more than the mean for a specific column in Dataframe - python-3.x

I'm trying to find the number of times the value in a certain column (in this case under "AveragePrice") is more than its mean & median. I calculated the mean using the below:
mean_AveragePrice = avocadodf["AveragePrice"].mean(axis = 0)
median_AveragePrice = avocadodf["AveragePrice"].median(axis = 0)
how do I count the number of times the values were more than the mean?
Sample of the Dataframe:
Date AveragePrice Total Volume PLU4046 PLU4225 PLU4770 Total Bags
0 27/12/2015 1.33 64236.62 1036.74 54454.85 48.16 8696.87
1 20/12/2015 1.35 54876.98 674.28 44638.81 58.33 9505.56
2 13/12/2015 0.93 118220.22 794.70 109149.67 130.50 8145.35
3 06/12/2015 1.08 78992.15 1132.00 71976.41 72.58 5811.16
4 29/11/2015 1.28 51039.60 941.48 43838.39 75.78 6183.95
5 22/11/2015 1.26 55979.78 1184.27 48067.99 43.61 6683.91
6 15/11/2015 0.99 83453.76 1368.92 73672.72 93.26 8318.86
7 08/11/2015 0.98 109428.33 703.75 101815.36 80.00 6829.22
8 01/11/2015 1.02 99811.42 1022.15 87315.57 85.34 11388.36

import numpy as np
mean_AveragePrice = avocadodf["AveragePrice"].mean(axis = 0)
median_AveragePrice = avocadodf["AveragePrice"].median(axis = 0)
where_bigger = np.where((avocadodf["AveragePrice"] > mean_AveragePrice) & (avocadodf["AveragePrice"] > median_AveragePrice), 1, 0 )
where_bigger.sum()
So you got the data you need and now you need the test. np.where will help you out

Related

Add suffix to a specific row in pandas dataframe

Im trying to add suffix to % Paid row in the dataframe, but im stuck with only adding suffix to the column names.
is there a way i can add suffix to a specific row values,
Any suggestions are highly appreciated.
d={
("Payments","Jan","NOS"):[],
("Payments","Feb","NOS"):[],
("Payments","Mar","NOS"):[],
}
d = pd.DataFrame(d)
d.loc["Total",("Payments","Jan","NOS")] = 9991
d.loc["Total",("Payments","Feb","NOS")] = 3638
d.loc["Total",("Payments","Mar","NOS")] = 5433
d.loc["Paid",("Payments","Jan","NOS")] = 139
d.loc["Paid",("Payments","Feb","NOS")] = 123
d.loc["Paid",("Payments","Mar","NOS")] = 20
d.loc["% Paid",("Payments","Jan","NOS")] = round((d.loc["Paid",("Payments","Jan","NOS")] / d.loc["Total",("Payments","Jan","NOS")])*100)
d.loc["% Paid",("Payments","Feb","NOS")] = round((d.loc["Paid",("Payments","Feb","NOS")] / d.loc["Total",("Payments","Feb","NOS")])*100)
d.loc["% Paid",("Payments","Mar","NOS")] = round((d.loc["Paid",("Payments","Mar","NOS")] / d.loc["Total",("Payments","Mar","NOS")])*100)
without suffix
I tried this way, it works but.. im looking for adding suffix for an entire row..
d.loc["% Paid",("Payments","Jan","NOS")] = str(round((d.loc["Paid",("Payments","Jan","NOS")] / d.loc["Total",("Payments","Jan","NOS")])*100)) + '%'
d.loc["% Paid",("Payments","Feb","NOS")] = str(round((d.loc["Paid",("Payments","Feb","NOS")] / d.loc["Total",("Payments","Feb","NOS")])*100)) + '%
d.loc["% Paid",("Payments","Mar","NOS")] = str(round((d.loc["Paid",("Payments","Mar","NOS")] / d.loc["Total",("Payments","Mar","NOS")])*100)) + '%'
with suffix

Select row separately by first index value, round and convert to integers, last to strings and add %:
d.loc["% Paid"] = d.loc["% Paid"].round().astype(int).astype(str).add(' %')
print (d)
Payments
Jan Feb Mar
NOS NOS NOS
Total 9991.0 3638.0 5433.0
Paid 139.0 123.0 20.0
% Paid 1 % 3 % 0 %

Panda returns 50x1 matrix instead of 50x7? (read_csv gone wrong)

I'm quite new to Python. I'm trying to load a .csv file with Panda but it returns a 50x1 matrix instead of expected 50x7. I'm a bit uncertain whether it is becaue my data contains numbers with "," (although I thought the quotechar attribute would solve that problem).
EDIT: Should perhaps mention that including the attribute sep=',' doesn't solve the issue)
My code looks like this
df = pd.read_csv('data.csv', header=None, quotechar='"')
print(df.head)
print(len(df.columns))
print(len(df.index))
Any ideas? Thanks in advance
Here is a subset of the data as text
10-01-2021,813,116927,"2,01",-,-,-
11-01-2021,657,117584,"2,02",-,-,-
12-01-2021,462,118046,"2,03",-,-,-
13-01-2021,12728,130774,"2,24",-,-,-
14-01-2021,17895,148669,"2,55",-,-,-
15-01-2021,15206,163875,"2,81",5,5,"0,0001"
16-01-2021,4612,168487,"2,89",7,12,"0,0002"
17-01-2021,2536,171023,"2,93",717,729,"0,01"
18-01-2021,3883,174906,"3,00",2147,2876,"0,05"
Here is the output of the head-function
0
0 27-12-2020,6492,6492,"0,11",-,-,-
1 28-12-2020,1987,8479,"0,15",-,-,-
2 29-12-2020,8961,17440,"0,30",-,-,-
3 30-12-2020,11477,28917,"0,50",-,-,-
4 31-12-2020,6197,35114,"0,60",-,-,-
5 01-01-2021,2344,37458,"0,64",-,-,-
6 02-01-2021,8895,46353,"0,80",-,-,-
7 03-01-2021,6024,52377,"0,90",-,-,-
8 04-01-2021,2403,54780,"0,94",-,-,-

Using your data I got the expected result. (even without quotechar='"')
Could you maybe show us your output?
import pandas as pd
df = pd.read_csv('data.csv', header=None)
print(df)
> 0 1 2 3 4 5 6
> 0 10-01-2021 813 116927 2,01 - - -
> 1 11-01-2021 657 117584 2,02 - - -
> 2 12-01-2021 462 118046 2,03 - - -
> 3 13-01-2021 12728 130774 2,24 - - -
> 4 14-01-2021 17895 148669 2,55 - - -
> 5 15-01-2021 15206 163875 2,81 5 5 0,0001
> 6 16-01-2021 4612 168487 2,89 7 12 0,0002
> 7 17-01-2021 2536 171023 2,93 717 729 0,01
> 8 18-01-2021 3883 174906 3,00 2147 2876 0,05

You need to define the seperator and delimiter, like this:
df = pd.read_csv('data.csv', header=None, sep = ',', delimiter=',' , quotechar='"')

Calculation of stock values with yfinance and python

I would like to make some calculations on stock prices in Python 3 and I have installed the module yfinance.
I try to get an individual value like this:
import yfinance as yf
#define the ticker symbol
tickerSymbol = 'MSFT'
#get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
#get the historical prices for this ticker
tickerDf = tickerData.history(period='1d', start='2015-1-1', end='2020-12-30')
row_date = tickerDf[tickerDf['Date']=='2020-12-30']
value = row_date.Open.item()
#see your data
print (value)
But when I run this, it says:
KeyError: 'Date'
Which is strange because when I do this, it works well and I have the column Date:
import yfinance as yf
#define the ticker symbol
tickerSymbol = 'MSFT'
#get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
#get the historical prices for this ticker
tickerDf = tickerData.history(period='1d', start='2015-1-1', end='2020-12-30')
#row_date = tickerDf[tickerDf['Date']=='2020-12-30']
#value = row_date.Open.item()
#see your data
print (tickerDf)
I get the following result:
G:\python> python test.py
Open High Low Close Volume Dividends Stock Splits
Date
2014-12-31 41.512481 42.143207 41.263744 41.263744 21552500 0.0 0
2015-01-02 41.450302 42.125444 41.343701 41.539135 27913900 0.0 0
2015-01-05 41.192689 41.512495 41.086088 41.157158 39673900 0.0 0
2015-01-06 41.201567 41.530255 40.455355 40.553074 36447900 0.0 0
2015-01-07 40.846223 41.272629 40.410934 41.068310 29114100 0.0 0
... ... ... ... ... ... ... ...
2020-12-22 222.690002 225.630005 221.850006 223.940002 22612200 0.0 0
2020-12-23 223.110001 223.559998 220.800003 221.020004 18699600 0.0 0
2020-12-24 221.419998 223.610001 221.199997 222.750000 10550600 0.0 0
2020-12-28 224.449997 226.029999 223.020004 224.960007 17933500 0.0 0
2020-12-29 226.309998 227.179993 223.580002 224.149994 17403200 0.0 0
[1510 rows x 7 columns]

Under the hood, yfinance uses a Pandas data frame to create a Ticker. In this dataframe, Date isn't an ordinary column, but is instead a name given to the index (see line 240 in base.py of yfinance). The index column behaves differently than other columns and actually can't be referenced by name. You can access it using TickerDf.index=='2020-12-30' or by turning it into a regular column using reset_index as explained in another question. Searching through an index is faster than searching a regular column, so if you are looking through a lot of data, it will be to your advantage to leave it as an index.

How to create a dataframe with different random numbers on each column?

I'm trying to make different random numbers but it keeps being the same on every column, how to fix it, using 1 line?
CODE:
yuju= pd.DataFrame()
column_price_x = [random.uniform(65.5,140.5) for i in range(20)]
for i in range(1990,2020):
yuju[i] = column_price_x
yuju
RESULT
EXPECTED:
Different numbers value for each column
How can I deal with it?

Its much easier than you think
In [12]: import numpy as np
In [13]: df = pd.DataFrame(np.random.rand(5,5))
In [14]: df
Out[14]:
0 1 2 3 4
0 0.463645 0.818606 0.520964 0.016413 0.286529
1 0.701693 0.556813 0.352911 0.738017 0.148805
2 0.899378 0.626350 0.821576 0.917648 0.404706
3 0.985617 0.336138 0.443910 0.690457 0.627859
4 0.121281 0.784853 0.799065 0.102332 0.156317
np.random.rand samples from standard uniform distribution (over [0,1])
Edit
if you want uniform distribution over given numbers, use np.random.uniform
In [16]: pd.DataFrame(np.random.uniform(low=65.5,high=140.5,size=(5,5))
...: )
Out[16]:
0 1 2 3 4
0 124.356069 96.718934 100.587485 136.670313 124.134073
1 68.109675 105.677037 86.084935 109.284336 108.393333
2 120.445978 125.036895 92.557137 105.864824 95.297450
3 91.027931 140.040051 94.362951 80.870850 70.106912
4 107.404708 92.472469 84.748544 82.116756 129.313166

here the solution
each iteration you should random again to assign new value for each column
yuju= pd.DataFrame()
for i in range(1990,2020):
yuju[i]= [random.uniform(65.5,140.5) for i in range(20)]
yuju
output
1990 1991 1992 1993 1994 1995 1996 1997 ...
0 73.117785 104.158470 76.704672 136.295814 106.008801 88.129275 96.843800 118.172649 ... 106.08
1 77.146977 131.584449 112.781430 113.071448 118.806880 140.301281 132.196554 136.222878 ... 74.85
2 67.976294 90.571586 137.313729 126.388545 134.941530 119.544528 119.692859 124.883332 ... 82.48
3 76.577618 102.765745 137.014399 84.696234 70.087628 86.180974 121.070030 87.991356 ... 71.67
4 104.675987 134.869611 120.221701 69.652423 105.650834 107.308007 122.372708 80.037225 ... 90.58
5 107.093326 124.649323 138.961846 84.312784 98.964176 87.691698 120.426266 79.888018 ... 97.46
6 97.375159 97.607740 119.027947 77.545403 81.365235 119.204719 75.426836 132.545121 ... 120.15
7 81.099338 94.315767 123.389789 85.734648 134.746295 99.196135 65.963834 72.895016 ... 135.63
8 129.577824 118.482358 137.838454 83.338883 68.603851 138.657750 85.155046 73.311065 ... 91.12
9 129.321333 134.598491 138.810883 119.487502 75.794849 125.314185 118.499014 126.969947 ... 74.86
10 122.704160 118.282868 114.196318 69.668442 112.237553 68.953530 115.395672 114.560736 ... 88.21
11 112.653109 109.635751 78.470715 81.973892 111.413094 76.918852 76.318205 129.423737 ... 103.06
12 80.984595 136.170595 83.258407 112.248942 96.730922 84.922575 104.984614 127.646325 ... 103.24
13 82.658896 97.066191 95.096705 107.757428 93.767250 93.958438 115.113325 98.931509 ... 105.32
14 85.173060 77.257117 72.668875 87.061919 130.088992 80.001858 104.526423 85.237558 ... 87.86
15 68.428850 79.948204 107.060400 92.962859 133.393354 93.806838 99.258857 138.314982 ... 86.80
16 115.105281 110.567551 119.868457 139.482290 103.235046 128.805920 140.131489 107.568099 ... 98.16
17 71.318147 119.965667 97.135972 90.174975 125.738171 115.655945 86.333461 114.574965 ... 134.80
18 134.000260 121.417473 104.832999 129.277671 139.932955 122.623911 92.369881 109.523118 ... 137.47
19 104.444951 111.712214 130.602922 119.446700 88.256841 110.316280 74.611164 88.364896 ... 115.32

Find two consecutive quarters of GDP decline, and ending with two consecutive quarters of GDP growth

I have the following df with data about the American quarterly GDP in billions of chained 2009 dollars, from 1947q1 to 2016q2:
df = pd.DataFrame(data = [1934.5, 1932.3, 1930.3, 1960.7, 1989.5, 2021.9, 2033.2, 2035.3, 2007.5, 2000.8, 2022.8, 2004.7, 2084.6, 2147.6, 2230.4, 2273.4, 2304.5, 2344.5, 2392.8, 2398.1, 2423.5, 2428.5, 2446.1, 2526.4, 2573.4, 2593.5, 2578.9, 2539.8, 2528.0, 2530.7, 2559.4, 2609.3, 2683.8, 2727.5, 2764.1, 2780.8, 2770.0, 2792.9, 2790.6, 2836.2, 2854.5, 2848.2, 2875.9, 2846.4, 2772.7, 2790.9, 2855.5, 2922.3, 2976.6, 3049.0, 3043.1, 3055.1, 3123.2, 3111.3, 3119.1, 3081.3, 3102.3, 3159.9, 3212.6, 3277.7, 3336.8, 3372.7, 3404.8, 3418.0, 3456.1, 3501.1, 3569.5, 3595.0, 3672.7, 3716.4, 3766.9, 3780.2, 3873.5, 3926.4, 4006.2, 4100.6, 4201.9, 4219.1, 4249.2, 4285.6, 4324.9, 4328.7, 4366.1, 4401.2, 4490.6, 4566.4, 4599.3, 4619.8, 4691.6, 4706.7, 4736.1, 4715.5, 4707.1, 4715.4, 4757.2, 4708.3, 4834.3, 4861.9, 4900.0, 4914.3, 5002.4, 5118.3, 5165.4, 5251.2, 5380.5, 5441.5, 5411.9, 5462.4, 5417.0, 5431.3, 5378.7, 5357.2, 5292.4, 5333.2, 5421.4, 5494.4, 5618.5, 5661.0, 5689.8, 5732.5, 5799.2, 5913.0, 6017.6, 6018.2, 6039.2, 6274.0, 6335.3, 6420.3, 6433.0, 6440.8, 6487.1, 6503.9, 6524.9, 6392.6, 6382.9, 6501.2, 6635.7, 6587.3, 6662.9, 6585.1, 6475.0, 6510.2, 6486.8, 6493.1, 6578.2, 6728.3, 6860.0, 7001.5, 7140.6, 7266.0, 7337.5, 7396.0, 7469.5, 7537.9, 7655.2, 7712.6, 7784.1, 7819.8, 7898.6, 7939.5, 7995.0, 8084.7, 8158.0, 8292.7, 8339.3, 8449.5, 8498.3, 8610.9, 8697.7, 8766.1, 8831.5, 8850.2, 8947.1, 8981.7, 8983.9, 8907.4, 8865.6, 8934.4, 8977.3, 9016.4, 9123.0, 9223.5, 9313.2, 9406.5, 9424.1, 9480.1, 9526.3, 9653.5, 9748.2, 9881.4, 9939.7, 10052.5, 10086.9, 10122.1, 10208.8, 10281.2, 10348.7, 10529.4, 10626.8, 10739.1, 10820.9, 10984.2, 11124.0, 11210.3, 11321.2, 11431.0, 11580.6, 11770.7, 11864.7, 11962.5, 12113.1, 12323.3, 12359.1, 12592.5, 12607.7, 12679.3, 12643.3, 12710.3, 12670.1, 12705.3, 12822.3, 12893.0, 12955.8, 12964.0, 13031.2, 13152.1, 13372.4, 13528.7, 13606.5, 13706.2, 13830.8, 13950.4, 14099.1, 14172.7, 14291.8, 14373.4, 14546.1, 14589.6, 14602.6, 14716.9, 14726.0, 14838.7, 14938.5, 14991.8, 14889.5, 14963.4, 14891.6, 14577.0, 14375.0, 14355.6, 14402.5, 14541.9, 14604.8, 14745.9, 14845.5, 14939.0, 14881.3, 14989.6, 15021.1, 15190.3, 15291.0, 15362.4, 15380.8, 15384.3, 15491.9, 15521.6, 15641.3, 15793.9, 15747.0, 15900.8, 16094.5, 16186.7, 16269.0, 16374.2, 16454.9, 16490.7, 16525.0, 16583.1],
index = ['1947q1', '1947q2', '1947q3', '1947q4', '1948q1', '1948q2', '1948q3',
'1948q4', '1949q1', '1949q2', '1949q3', '1949q4', '1950q1', '1950q2',
'1950q3', '1950q4', '1951q1', '1951q2', '1951q3', '1951q4', '1952q1',
'1952q2', '1952q3', '1952q4', '1953q1', '1953q2', '1953q3', '1953q4',
'1954q1', '1954q2', '1954q3', '1954q4', '1955q1', '1955q2', '1955q3',
'1955q4', '1956q1', '1956q2', '1956q3', '1956q4', '1957q1', '1957q2',
'1957q3', '1957q4', '1958q1', '1958q2', '1958q3', '1958q4', '1959q1',
'1959q2', '1959q3', '1959q4', '1960q1', '1960q2', '1960q3', '1960q4',
'1961q1', '1961q2', '1961q3', '1961q4', '1962q1', '1962q2', '1962q3',
'1962q4', '1963q1', '1963q2', '1963q3', '1963q4', '1964q1', '1964q2',
'1964q3', '1964q4', '1965q1', '1965q2', '1965q3', '1965q4', '1966q1',
'1966q2', '1966q3', '1966q4', '1967q1', '1967q2', '1967q3', '1967q4',
'1968q1', '1968q2', '1968q3', '1968q4', '1969q1', '1969q2', '1969q3',
'1969q4', '1970q1', '1970q2', '1970q3', '1970q4', '1971q1', '1971q2',
'1971q3', '1971q4', '1972q1', '1972q2', '1972q3', '1972q4', '1973q1',
'1973q2', '1973q3', '1973q4', '1974q1', '1974q2', '1974q3', '1974q4',
'1975q1', '1975q2', '1975q3', '1975q4', '1976q1', '1976q2', '1976q3',
'1976q4', '1977q1', '1977q2', '1977q3', '1977q4', '1978q1', '1978q2',
'1978q3', '1978q4', '1979q1', '1979q2', '1979q3', '1979q4', '1980q1',
'1980q2', '1980q3', '1980q4', '1981q1', '1981q2', '1981q3', '1981q4',
'1982q1', '1982q2', '1982q3', '1982q4', '1983q1', '1983q2', '1983q3',
'1983q4', '1984q1', '1984q2', '1984q3', '1984q4', '1985q1', '1985q2',
'1985q3', '1985q4', '1986q1', '1986q2', '1986q3', '1986q4', '1987q1',
'1987q2', '1987q3', '1987q4', '1988q1', '1988q2', '1988q3', '1988q4',
'1989q1', '1989q2', '1989q3', '1989q4', '1990q1', '1990q2', '1990q3',
'1990q4', '1991q1', '1991q2', '1991q3', '1991q4', '1992q1', '1992q2',
'1992q3', '1992q4', '1993q1', '1993q2', '1993q3', '1993q4', '1994q1',
'1994q2', '1994q3', '1994q4', '1995q1', '1995q2', '1995q3', '1995q4',
'1996q1', '1996q2', '1996q3', '1996q4', '1997q1', '1997q2', '1997q3',
'1997q4', '1998q1', '1998q2', '1998q3', '1998q4', '1999q1', '1999q2',
'1999q3', '1999q4', '2000q1', '2000q2', '2000q3', '2000q4', '2001q1',
'2001q2', '2001q3', '2001q4', '2002q1', '2002q2', '2002q3', '2002q4',
'2003q1', '2003q2', '2003q3', '2003q4', '2004q1', '2004q2', '2004q3',
'2004q4', '2005q1', '2005q2', '2005q3', '2005q4', '2006q1', '2006q2',
'2006q3', '2006q4', '2007q1', '2007q2', '2007q3', '2007q4', '2008q1',
'2008q2', '2008q3', '2008q4', '2009q1', '2009q2', '2009q3', '2009q4',
'2010q1', '2010q2', '2010q3', '2010q4', '2011q1', '2011q2', '2011q3',
'2011q4', '2012q1', '2012q2', '2012q3', '2012q4', '2013q1', '2013q2',
'2013q3', '2013q4', '2014q1', '2014q2', '2014q3', '2014q4', '2015q1',
'2015q2', '2015q3', '2015q4', '2016q1', '2016q2'])
df.columns = ['GDP in billions of chained 2009 dollars']
df.index.rename('quarter', inplace = True)
A recession period is defined as starting with two consecutive quarters of GDP decline, and ending with two consecutive quarters of GDP growth. The goal is to create a function 'get_recession_periods()' that returns all of the recession periods between 1947q1 and 2016q2. The output could a dataframe with two columns (start and end) or a list of tuples [(start and end), ...] with all the recession periods found.
Here is my try:
get_recession_periods()
lst_start = []
for i in range(0,len(df['GDP in billions of chained 2009 dollars'])-2):
if df['GDP in billions of chained 2009 dollars'][i] < df['GDP in billions of chained 2009 dollars'][i-1] and df['GDP in billions of chained 2009 dollars'][i+1] < df['GDP in billions of chained 2009 dollars'][i]:
lst_start.append(df.index[i])
start = lst_start[0]
lst_end = []
for j in range(df.index.get_loc(start),len(df['GDP in billions of chained 2009 dollars'])-2):
if df['GDP in billions of chained 2009 dollars'][j] > df['GDP in billions of chained 2009 dollars'][j-1] and df['GDP in billions of chained 2009 dollars'][j+1] > df['GDP in billions of chained 2009 dollars'][j]:
lst_end.append(df.index[j])
return (lst_start[0], lst_end[0])
But with the function above, I am only able to get the start and end quarter of the first recession in 1947.
Any idea?

This is probably overkill for this particular example... In a nutshell this is a bit more complicated than #zaq's answer but also much faster (about 9x here, and the difference would be much bigger on larger datasets) because it's vectorized instead of looped. But for this very small dataset here, clearly you would go with the simpler answer since even the slower way is fast enough. Finally, it stores the data in the dataframe itself rather than as a tuple (which could be an advantage or disadvantage, depending on the situation).
Thanks to #zaq for pointing out that I misread the question initially. I believe this now gives the same answer as zaq's except we have different implicit assumptions about the initial state of the world (beginning in recession or not) which is indeterminate in the data provided.
df['change'] = df.diff() # change in GDP from prior quarter
start = (df.change<0) & (df.change.shift(-1)<0) # potential start
end = (df.change>0) & (df.change.shift(-1)>0) # potential end
df['recess' ] = np.nan
df.loc[ start, 'recess' ] = -1
df.loc[ end, 'recess' ] = 1
df['recess'] = df.recess.ffill() # if the current row doesn't fit the
# definition of a start or end, then
# fill it with the prior row value
df['startend'] = np.nan
df.loc[ (df.recess==-1) & (df.recess.shift()== 1), 'startend'] = -1 # start
df.loc[ (df.recess== 1) & (df.recess.shift()==-1), 'startend'] = 1 # end
df[df.startend.notnull()]
GDP change recess startend
quarter
1947q4 1960.7 30.4 1.0 1.0
1949q1 2007.5 -27.8 -1.0 -1.0
1950q1 2084.6 79.9 1.0 1.0
1953q3 2578.9 -14.6 -1.0 -1.0
1954q2 2530.7 2.7 1.0 1.0
1957q4 2846.4 -29.5 -1.0 -1.0
1958q2 2790.9 18.2 1.0 1.0
1969q4 4715.5 -20.6 -1.0 -1.0
1970q2 4715.4 8.3 1.0 1.0
1974q3 5378.7 -52.6 -1.0 -1.0
1975q2 5333.2 40.8 1.0 1.0
1980q2 6392.6 -132.3 -1.0 -1.0
1980q4 6501.2 118.3 1.0 1.0
1981q4 6585.1 -77.8 -1.0 -1.0
1982q4 6493.1 6.3 1.0 1.0
1990q4 8907.4 -76.5 -1.0 -1.0
1991q2 8934.4 68.8 1.0 1.0
2008q3 14891.6 -71.8 -1.0 -1.0
2009q3 14402.5 46.9 1.0 1.0

One issue with your code is that you are not tracking the current status of the economy. If there are 20 consecutive quarters of GDP decline, your code will report 18 recessions beginning. And if there are 20 quarters of growth, it will report 18 recessions ending even if there wasn't one to begin with.
So, I introduce a Boolean recession to indicate whether we are in recession currently. Other changes: chained inequalities like a < b < c work in Python as expected, and improve readability; also, your column name is so verbose that I used positional indexing iloc instead, to have readable conditions in if-statements.
lst_start = []
lst_end = []
recession = False
for i in range(1, len(df)-1):
if not recession and (df.iloc[i-1, 0] > df.iloc[i, 0] > df.iloc[i+1, 0]):
recession = True
lst_start.append(df.index[i])
elif recession and (df.iloc[i-1, 0] < df.iloc[i, 0] < df.iloc[i+1, 0]):
recession = False
lst_end.append(df.index[i])
print(list(zip(lst_start, lst_end)))

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Counting the number of times the values are more than the mean for a specific column in Dataframe - python-3.x

Related

Add suffix to a specific row in pandas dataframe

Panda returns 50x1 matrix instead of 50x7? (read_csv gone wrong)

Calculation of stock values with yfinance and python

How to create a dataframe with different random numbers on each column?

Find two consecutive quarters of GDP decline, and ending with two consecutive quarters of GDP growth

Categories

Resources