How to plot this matplotlib code efficiently - python-3.x

I am new to Python and doing a time series analysis of stocks. I created a DataFrame of the rolling average of 5 stocks based on the percentage change in their close price, so this df has 5 columns. I also have another DataFrame with the rolling average of the index's percentage change in closing price. I want to plot each stock column of the df against the index df. I wrote this code:
import matplotlib.pyplot as plt

fig = plt.figure()
fig.add_subplot(5,1,1)
plt.plot(pctchange_RA['HUL'])
plt.plot(N50_RA)
fig.add_subplot(5,1,2)
plt.plot(pctchange_RA['IRCON'])
plt.plot(N50_RA)
fig.add_subplot(5,1,3)
plt.plot(pctchange_RA['JUBLFOOD'])
plt.plot(N50_RA)
fig.add_subplot(5,1,4)
plt.plot(pctchange_RA['PVR'])
plt.plot(N50_RA)
fig.add_subplot(5,1,5)
plt.plot(pctchange_RA['VOLTAS'])
plt.plot(N50_RA)
NOTE: pctchange_RA is a pandas df of 5 stocks and N50_RA is an index df with one column.

You can put your column names in a list and then loop over it, creating the subplots dynamically. The code would look like the following:
import matplotlib.pyplot as plt

fig = plt.figure()
cols = ['HUL', 'IRCON', 'JUBLFOOD', 'PVR', 'VOLTAS']
for i, col in enumerate(cols):
    ax = fig.add_subplot(5, 1, i + 1)
    ax.plot(pctchange_RA[col])  # stock rolling average
    ax.plot(N50_RA)             # index rolling average
plt.show()
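As a variant (a sketch, assuming the same pctchange_RA and N50_RA frames from the question), plt.subplots with sharex=True keeps the date axis aligned across the five panels and labels each stock:
import matplotlib.pyplot as plt

cols = ['HUL', 'IRCON', 'JUBLFOOD', 'PVR', 'VOLTAS']
fig, axes = plt.subplots(len(cols), 1, sharex=True, figsize=(8, 12))
for ax, col in zip(axes, cols):
    ax.plot(pctchange_RA[col], label=col)  # stock rolling average
    ax.plot(N50_RA, label='index')         # index rolling average
    ax.legend(loc='upper left')
plt.show()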

Related

How to transpose a Pandas DataFrame and name the new columns?

I have a simple Pandas DataFrame with 3 columns. I am trying to transpose it and then rename the columns of the new dataframe, and I am having a bit of trouble.
df = pd.DataFrame({'TotalInvoicedPrice': [123],
                   'TotalProductCost': [18],
                   'ShippingCost': [5]})
I tried using
df = df.T
which transposes the DataFrame into:
TotalInvoicedPrice,123
TotalProductCost,18
ShippingCost,5
So now I have to add the column names "Metrics" and "Values" to this data frame.
I tried using
df.columns["Metrics","Values"]
but I'm getting errors.
What I need to get is a DataFrame that looks like:
Metrics Values
0 TotalInvoicedPrice 123
1 TotalProductCost 18
2 ShippingCost 5
Let's reset the index, then set the column labels:
df.T.reset_index().set_axis(['Metrics', 'Values'], axis=1)
Metrics Values
0 TotalInvoicedPrice 123
1 TotalProductCost 18
2 ShippingCost 5
Maybe you can avoid the transpose operation entirely (it carries a little performance overhead):
# YOUR DATAFRAME
df = pd.DataFrame({'TotalInvoicedPrice': [123],
                   'TotalProductCost': [18],
                   'ShippingCost': [5]})

# FORM THE LISTS FROM YOUR COLUMNS AND FIRST-ROW VALUES
l1 = df.columns.values.tolist()
l2 = df.iloc[0].tolist()

# CREATE A DATAFRAME FROM THE TWO LISTS
df2 = pd.DataFrame(list(zip(l1, l2)), columns=['Metrics', 'Values'])
print(df2)
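For a single-row frame like this one, df.melt produces the same Metrics/Values layout in one call, without the transpose or the zip; a minimal sketch:
import pandas as pd

df = pd.DataFrame({'TotalInvoicedPrice': [123],
                   'TotalProductCost': [18],
                   'ShippingCost': [5]})

# melt turns the column names into a 'Metrics' column and the row values into 'Values'
df2 = df.melt(var_name='Metrics', value_name='Values')
print(df2)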

Pandas combining rows as header info

This is how I am reading the file and creating the dataframe with pandas:
def get_sheet_data(sheet_name='SomeName'):
    df = pd.read_excel(f'{full_q_name}',
                       sheet_name=sheet_name,
                       header=[0, 1],
                       index_col=0)  # .fillna(method='ffill')
    df = df.swapaxes(axis1="index", axis2="columns")
    return df.set_index('Product Code')
Printing this tabularized (it will potentially have hundreds of columns) shows that the first two rows still hold header information rather than data.
I can't seem to add those first two rows into the header. I've tried:
python: pandas - How to combine the first two rows of a pandas dataframe into the dataframe header? (https://stackoverflow.com/questions/59837241/combine-first-row-and-header-with-pandas)
and I'm failing at each point. I think it's because of the MultiIndex, not necessarily the axis swap? But https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html is kind of going over my head right now. Please help me add those two rows into the header.
The output of df.columns is massive, so I've cut it down a lot:
Index(['Product Code','Product Narrative\nHigh-level service description','Product Name','Huawei Product ID','Type','Bill Cycle Alignment',nan,'Stackable',nan,
and ends with:
nan], dtype='object')
We create new column names and assign them to df.columns; the new names are generated by joining the two MultiIndex header levels with the first row of the DataFrame.
import numpy as np  # needed for np.nan

df.columns = ['_'.join(i) for i in zip(df.columns.get_level_values(0).tolist(),
                                       df.columns.get_level_values(1).tolist(),
                                       df.iloc[0, :].replace(np.nan, '').tolist())]
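A minimal, self-contained sketch of the same idea on made-up data (the column names and the extra first row are invented for illustration only):
import numpy as np
import pandas as pd

# two header levels plus one extra descriptive row, mimicking the Excel layout
columns = pd.MultiIndex.from_tuples([('Product', 'Code'), ('Product', 'Name')])
df = pd.DataFrame([['ID', 'Description'],   # extra header row that was read in as data
                   ['P-001', 'Broadband'],
                   ['P-002', 'Voice']], columns=columns)

# join level 0, level 1 and the first data row into flat column names
df.columns = ['_'.join(i) for i in zip(df.columns.get_level_values(0),
                                       df.columns.get_level_values(1),
                                       df.iloc[0, :].replace(np.nan, ''))]
df = df.iloc[1:].reset_index(drop=True)  # drop the row that is now part of the header
print(df.columns.tolist())  # ['Product_Code_ID', 'Product_Name_Description']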

How to calculate the diff between 2 adjacent values in every column of a pandas dataframe?

I made a dataset of shape (252, 60) by concatenating the ['Close'] column of every stock in the Sensex-30 index and adding columns made by shifting each ['Close'] column down by one row. I wanted to compute the difference between the shifted price and the current price for every day and every stock. I tried to do this in a Colab notebook, but I get an error: IndexError: single positional indexer is out-of-bounds.
The dataset and code are too long to show here, so you can look at them in this Colab notebook.
Reducing your code, I find the below works:
import io

import pandas as pd
import requests

df = pd.DataFrame()
for stock in ['RELIANCE', 'INFY', 'HCLTECH', 'TCS', 'BAJAJ-AUTO',
              'TITAN', 'LT', 'NESTLEIND', 'TECHM', 'ASIANPAINT',
              'M&M', 'ICICIBANK', 'POWERGRID', 'HINDUNILVR', 'SUNPHARMA',
              'TATASTEEL', 'AXISBANK', 'SBIN', 'ULTRACEMCO', 'BAJAJFINSV',
              'ITC', 'NTPC', 'BAJFINANCE', 'BHARTIARTL', 'MARUTI',
              'KOTAKBANK', 'HDFC', 'HDFCBANK', 'ONGC', 'INDUSINDBK']:
    url = "https://query1.finance.yahoo.com/v7/finance/download/" + stock + ".BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
    # keep only the Close column, renamed to the ticker, and append it as a new column
    df = pd.concat([df, pd.read_csv(io.BytesIO(requests.get(url).content), index_col="Date")
                          .loc[:, "Close"]
                          .to_frame().rename(columns={"Close": stock})], axis=1)

# one profit column per stock: today's close minus yesterday's close
# (c=c pins the loop variable so each lambda refers to its own column)
profit = {f"{c}_profit": lambda dfa, c=c: dfa[c] - dfa[c].shift(periods=1) for c in df.columns}
df = df.assign(**profit)
df.shape
output
(252, 60)
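As a side note, since the goal is just the difference between adjacent rows, pandas' built-in diff() does the same calculation without hand-written lambdas; a minimal sketch, applied to the (252, 30) frame of Close prices built in the loop above:
# df is the frame of Close prices (one column per stock) built above
profits = df.diff(periods=1).add_suffix('_profit')  # row-to-row change per column
df = pd.concat([df, profits], axis=1)               # appends 30 profit columns -> (252, 60)
df.shape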

Python time series plotting problem with holidays (no rows for some dates and times)

I am trying to plot a column of a pandas dataframe with a datetime index (a time series). Some dates and times have no rows in the dataframe, and when I plot it with a simple df['column_name'].plot(), the x axis (datetime) still shows the dates and times that have no rows, and the line connects the data before these empty days directly to the data after them.
How should I deal with these missing periods when plotting?
When making a line plot, the plotting library doesn't automatically know between which data points a line should be drawn and between which points there should be a gap.
The most straightforward way to tell the library this, I think, is to create NaN rows so that the index reflects what you think it should reflect. That is, if you think the data should be per minute, make sure that the dataframe index is per minute.
The plotting library then understands that where there is NaN data, no line should be drawn.
Code example:
import pandas as pd

# generate a dataframe with one value column and a gap in the timestamps
df = pd.DataFrame(
    [
        ['2020-04-03 12:10:00', 23.2],
        ['2020-04-03 12:12:00', 23.1],
        ['2020-04-03 12:13:00', 14.1],  # notice the gap after this row!
        ['2020-04-03 12:24:00', 23.1],
        ['2020-04-03 12:25:00', 23.3],
    ],
    columns=['timestamp', 'value']
)
df['timestamp'] = pd.to_datetime(df.timestamp)  # make sure the timestamp data is stored as timestamps
Then we reindex the data, which creates new NaN rows where needed:
df = df.set_index('timestamp')
df = df.reindex(pd.date_range(start=df.index.min(),end=df.index.max(),freq='1min'))
Finally plot it!
df['value'].plot(figsize=(10,6))
The result is a line plot with a visible gap where the missing minutes were filled with NaN.

How to select multiple rows and take the mean value based on the name of the row

From this data frame I would like to select rows with the same concentration and also almost the same name. For example, the first three rows have the same concentration and the same name except for the ending Dig_I, Dig_II, Dig_III. I would like to somehow select these three rows, take the mean value of each column, and then create a new data frame from the result.
Here is the whole data frame:
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new_df = df.groupby('concentration').mean()
Note: this will only average columns with dtype float or int; the img_name column will be dropped, and the mean is taken over all numeric columns.
This may be faster...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js").groupby('concentration').mean()
If you would like to preserve the img_name...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new = df.groupby('concentration').mean()
pd.merge(df, new, left_on = 'concentration', right_on = 'concentration', how = 'inner')
Does that help?
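If the grouping should also respect the shared part of the name (the question mentions names that differ only in a Dig_I / Dig_II / Dig_III ending), one option is to derive a base-name key first. A sketch, assuming the name column is img_name and that the replicate suffix is a trailing underscore plus a Roman numeral:
import pandas as pd

df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")

# strip a trailing _I/_II/_III style suffix to get the shared base name (naming-scheme assumption)
df['base_name'] = df['img_name'].str.replace(r'_I{1,3}$', '', regex=True)

# average every numeric column within each (base_name, concentration) group
new_df = df.groupby(['base_name', 'concentration'], as_index=False).mean(numeric_only=True)
print(new_df.head())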
