Convert Dataframe to display in matplotlib line chart with dates - python-3.x

I am having trouble plotting a graph in matplotlib because I cannot get my data into a form that matplotlib accepts.
Here is my data:
date,GOOG,AAPL,FB,BABA,AMZN,GE,AMD,WMT,BAC,GM,T,UAA,SHLD,XOM,RRC,BBY,MA,PFE,JPM,SBUX
1989-12-29,,0.117203,,,,0.352438,3.9375,3.48607,1.752478,,2.365775,,,1.766756,,0.166287,,0.110818,1.827968,
1990-01-02,,0.123853,,,,0.364733,4.125,3.660858,1.766686,,2.398184,,,1.766756,,0.173216,,0.113209,1.835617,
1990-01-03,,0.124684,,,,0.36405,4.0,3.660858,1.780897,,2.356516,,,1.749088,,0.194001,,0.113608,1.896803,
1990-01-04,,0.1251,,,,0.362001,3.9375,3.641439,1.743005,,2.403821,,,1.731422,,0.190537,,0.115402,1.904452,
1990-01-05,,0.125516,,,,0.358586,3.8125,3.602595,1.705114,,2.287973,,,1.722587,,0.190537,,0.114405,1.9121,
1990-01-08,,0.126347,,,,0.360635,3.8125,3.651146,1.714586,,2.326588,,,1.749088,,0.17668,,0.113409,1.9121,
1990-01-09,,0.1251,,,,0.353122,3.875,3.55404,1.714586,,2.273493,,,1.713754,,0.17668,,0.111017,1.850914,
1990-01-10,,0.119697,,,,0.353805,3.8125,3.55404,1.681432,,2.210742,,,1.722587,,0.173216,,0.11301,1.843264,
1990-01-11,,0.11471,,,,0.353805,3.875,3.592883,1.667222,,2.23005,,,1.731422,,0.169751,,0.111814,1.82032,
I converted it to the following dataframe:
AAPL
2016 0.333945
2017 0.330923
2018 0.321857
2019 0.312790
<class 'pandas.core.frame.DataFrame'>
using the following code:
import pandas as pd
import matplotlib.pyplot as plt  # required by the plt calls below
df = pd.read_csv("portfolio.txt")
companyname = "AAPL"
frames = df.loc[:, df.columns.str.startswith(companyname)]
l1 = frames.loc['2015-6-1':'2019-6-10']
print(l1)
print(type(l1))
plt.plot(l1, label="Company Past Information")
plt.xlabel('Risk Aversion')
plt.ylabel('Optimal Investment Portfolio')
plt.title('Optimal Investment Portfolio For Low, Medium & High')
plt.legend()
plt.show()
After plotting with matplotlib, I get the correct output for columns where data exists.
But for columns where no data is available, the graph is plotted incorrectly:
GOOG
2016 NaN
2017 NaN
2018 NaN
2019 NaN
Because of this I am unable to plot the graph correctly.
Please help me resolve this.
Thanks in advance.

If you're reading your data in from a .csv using pandas, you can:
import pandas as pd
df = pd.read_csv(your_csv, parse_dates=[0]) # 0 means your dates are in the first column
Otherwise you can convert your date column to datetime using:
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
Then, with matplotlib, you can:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df.iloc[:, 0], df.loc[:, some_column])
plt.show()
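As a minimal sketch of the steps above applied to the question's data: the snippet below parses the dates while reading, slices a date range, and drops columns that are entirely NaN in that range (such as GOOG) before plotting. The inline CSV text, the three-column subset, and the names `window` and `plottable` are assumptions made for illustration; dropping the empty columns is one possible way to handle the missing data, not the only one.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot in a window
import matplotlib.pyplot as plt
import pandas as pd

# A few rows and columns copied from the question's data; GOOG has no values here.
csv_text = """date,GOOG,AAPL,GE
1989-12-29,,0.117203,0.352438
1990-01-02,,0.123853,0.364733
1990-01-03,,0.124684,0.36405
1990-01-04,,0.1251,0.362001
"""

# Parse the date column while reading and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Label-based slicing by date works once the index is a DatetimeIndex.
window = df.loc["1990-01-01":"1990-01-31"]

# Keep only the columns that have at least one value in the selected window.
plottable = window.dropna(axis="columns", how="all")

fig, ax = plt.subplots()
for name in plottable.columns:
    ax.plot(plottable.index, plottable[name], label=name)
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

With the dates on the index, matplotlib places the points on a real time axis, and a company with no data in the chosen range is simply left out rather than drawn as an empty line.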

Related

Difficulty grouping barchart using Python, Pandas and Matplotlib

I am having difficulty getting plot.bar to group the bars together the way I have them grouped in the dataframe. The dataframe returns the grouped data correctly; however, the bar graph shows a separate bar for every line in the dataframe. Ideally, the code below should group 3-6 bars together for each department (Dept X should have its bars grouped by type, with the count of true/false on the Y axis).
Dataframe:
dname   Type  purchased
Dept X  0     False         141
              True          270
        1     False        2020
              True         2604
        2     False        2023
              True         1047
Code:
import psycopg2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
##connection and query data removed
df = pd.merge(df_departments[["id", "dname"]], df_widgets[["department", "widgetid", "purchased","Type"]], how='inner', left_on='id', right_on='department')
df.set_index(['dname'], inplace=True)
dx=df.groupby(['dname', 'Type','purchased'])['widgetid'].size()
dx.plot.bar(x='dname', y='widgetid', rot=90)
I can't be sure without a more reproducible example, but try unstacking the innermost level of the MultiIndex of dx before plotting:
# x= and y= are omitted: after unstack(), neither 'dname' nor 'widgetid' is a column
dx.unstack().plot.bar(rot=90)
I expect this to work because when plotting a DataFrame, each column becomes a legend entry and each row becomes a category on the horizontal axis.
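To make the effect of unstacking concrete, here is a small self-contained sketch. The sample rows and the names `counts` and `wide` are invented for illustration (the question's real data comes from a database query that was removed), so treat it as an assumption about the data's shape rather than a reproduction of it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot in a window
import pandas as pd

# Hypothetical stand-in for the merged departments/widgets table.
df = pd.DataFrame({
    "dname": ["Dept X"] * 6 + ["Dept Y"] * 4,
    "Type": [0, 0, 1, 1, 1, 2, 0, 1, 1, 2],
    "purchased": [True, False, True, True, False, True, False, True, False, False],
    "widgetid": range(10),
})

# Same grouping as in the question: a Series with a three-level MultiIndex.
counts = df.groupby(["dname", "Type", "purchased"])["widgetid"].size()

# Move the innermost level (purchased) into columns: one column for False, one for True.
# fill_value=0 covers combinations that never occur in the data.
wide = counts.unstack(fill_value=0)

# Each row (dname, Type) becomes one group on the x-axis; each column becomes one bar per group.
ax = wide.plot.bar(rot=90)
```

Calling `unstack` again on `Type` as well would give one x-axis group per department, if that is the grouping wanted.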

Unable to create charts using bokeh

So I'm using the following code in Spyder to create a chart which will be displayed in a web browser:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
car = pd.read_csv('car_sales.csv')
Month = car['Month']
Sales = car['Sales']
output_file("bokeh_scatter_example.html", title="Bokeh Scatter Plot Example")
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
y_axis_label='Sales')
fig2.circle(Month, Sales, size=5, alpha=0.5)
show(fig2)
What I've realized is that this code works if the x-axis values are numeric. But my Month column holds strings (Jan, Feb, etc.), and that is when the code stops working. Any help would be appreciated. Thanks.
Edit: output of car.head()
Month Sales
0 Jan 1808
1 Feb 1251
2 Mar 3023
and so on.
Your X-axis is categorical in nature, so it needs special handling. You have to create a figure like this:
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
y_axis_label='Sales',
x_range=Month)
More details can be found here:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html

Scatter plots from averaging a columns values

I'm working with 2 columns from a csv file: Month_num and Freight_In_(tonnes).
I'm attempting to plot the average freight value for each month from 1987 to 2016. I can currently show each month's average inbound freight in table format, but I'm struggling to get it to show in the scatter plot.
Here is my current code.
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv('CityPairs.csv')
Month = df.groupby(['Month_num'])['Freight_In_(tonnes)']
Month.mean()
plt.scatter(df['Month_num'], df['Freight_In_(tonnes)'])
plt.show()
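Try this (selecting the freight column before taking the mean avoids errors from non-numeric columns):
df.groupby('Month_num')['Freight_In_(tonnes)'].mean().reset_index().plot(kind='scatter',x='Month_num',y='Freight_In_(tonnes)')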
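A small sketch of the same idea on made-up numbers follows. The sample values, the extra `City` column, and the name `monthly` are assumptions for illustration, since the contents of CityPairs.csv are not shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot in a window
import pandas as pd

# Hypothetical sample standing in for CityPairs.csv.
df = pd.DataFrame({
    "Month_num": [1, 1, 2, 2, 3],
    "Freight_In_(tonnes)": [10.0, 20.0, 30.0, 50.0, 5.0],
    "City": ["A", "B", "A", "B", "A"],  # a non-numeric column that must not be averaged
})

# Average the freight column per month; as_index=False keeps Month_num as a regular column.
monthly = df.groupby("Month_num", as_index=False)["Freight_In_(tonnes)"].mean()

# Plot the aggregated frame, not the original one.
ax = monthly.plot(kind="scatter", x="Month_num", y="Freight_In_(tonnes)")
```

The question's code computed the mean but then plotted the original `df`, which is why every individual record appeared instead of one point per month.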

Applying a Month End Trading Calendar to Yahoo API data

This is my first post, and I am new to Python and Pandas. I have been piecing together the code below based on many questions and answers I have viewed on this website. My next challenge is how to apply a month-end trading calendar to the code below so that the output consists of month-end "Adj Close" values for the two ETFs listed, "VTI" and "BND". The "100ma" 100-day moving average must still be calculated based on the previous 100 trading days.
Ryan Sheftel appears to have something on this site that would work ("Create trading holiday calendar with Pandas"), but I can't seem to implement it with my code to give me what I want.
Code I have put together so far:
import datetime as dt #set start and end dates for data we are using
import pandas as pd
import numpy as np
import pandas_datareader.data as web # how I grab data from Yahoo Finance API. Pandas is popular data analysis library.
start = dt.datetime(2007,1,1)
end = dt.datetime(2017,2,18)
vti = web.DataReader('vti', 'yahoo',start, end)# data frame, stock ticker symbol, where getting from, start time, end time
bnd = web.DataReader('bnd', 'yahoo', start, end)
vti["100ma"] = vti["Adj Close"].rolling(window=100).mean()
bnd["100ma"] = bnd["Adj Close"].rolling(window=100).mean()
# Below I create a DataFrame from a dict of the adjusted closing prices and their 100-day moving averages
stocks = pd.DataFrame({'VTI': vti["Adj Close"],
'VTI 100ma': vti["100ma"],
'BND': bnd["Adj Close"],
'BND 100ma': bnd["100ma"],
})
print (stocks.head())
stocks.to_csv('Stock ETFs.csv')
I'd use asfreq to sample down to business month end:
import datetime as dt #set start and end dates for data we are using
import pandas as pd
import numpy as np
import pandas_datareader.data as web # how I grab data from Yahoo Finance API. Pandas is popular data analysis library.
start = dt.datetime(2007,1,1)
end = dt.datetime(2017,2,18)
ids = ['vti', 'bnd']
data = web.DataReader(ids, 'yahoo', start, end)
ac = data['Adj Close']
ac.join(ac.rolling(100).mean(), rsuffix=' 100ma').asfreq('BM')
                  bnd        vti  bnd 100ma  vti 100ma
Date
2007-01-31        NaN  58.453726        NaN        NaN
2007-02-28        NaN  57.504188        NaN        NaN
2007-03-30        NaN  58.148760        NaN        NaN
2007-04-30  54.632232  60.487535        NaN        NaN
2007-05-31  54.202353  62.739991        NaN  59.207899
2007-06-29  54.033591  61.634027        NaN  60.057136
2007-07-31  54.531996  59.455505        NaN  60.902113
2007-08-31  55.340892  60.330213  54.335640  61.227386
2007-09-28  55.674840  62.650936  54.542452  61.363872
2007-10-31  56.186500  63.773849  54.942038  61.675567
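The same pattern can be tried without a data download. The sketch below uses synthetic prices (an assumption, not real VTI/BND data) and passes a `pd.offsets.BMonthEnd()` offset object to `asfreq` instead of the `'BM'` string, because newer pandas releases have renamed that alias; the offset object should behave the same way across versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily "prices" on business days for one year (illustrative values only).
idx = pd.bdate_range("2020-01-01", "2020-12-31")
prices = pd.DataFrame(
    {
        "vti": np.linspace(100.0, 150.0, len(idx)),
        "bnd": np.linspace(80.0, 85.0, len(idx)),
    },
    index=idx,
)

# Compute the 100-day moving average on the full daily data first...
moving_avg = prices.rolling(100).mean()
combined = prices.join(moving_avg, rsuffix=" 100ma")

# ...then keep only the last business day of each month.
month_end = combined.asfreq(pd.offsets.BMonthEnd())
```

Because the rolling mean is taken before downsampling, each month-end value still averages the previous 100 trading days. One caveat: `asfreq` only picks rows that already exist, so a month whose last business day is a market holiday would show NaN for that date.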

Pandas Matplotlib Line Graph

Given the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df
Count Group YYYYMM
0 5 A 201603
1 6 A 201503
2 2 A 201403
3 7 A 201303
4 4 B 201603
5 7 B 201503
6 8 B 201403
7 9 B 201303
I need to generate a line graph with one line per group with a summary table at the bottom. Something like this:
I need each instance of 'YYYYMM' to be treated like a year by Pandas/Matplotlib.
So far, this seems to help, but I'm not sure if it will do the trick:
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2].astype(np.int64)
Then, I did this to pivot the data:
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
       Count
Group      A  B
YYYYMM
2013       7  9
2014       2  8
2015       6  7
2016       5  4
Then, I tried to plot it:
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1,1)
t.plot(table=t,ax=ax)
...and this happened:
I'd like to do the following:
1. remove all lines (borders) from the table at the bottom
2. remove the jumbled text in the table
3. remove the x axis tick labels (it should just show the years for tick labels)
I can clean up the rest myself (remove legend and borders, etc..).
Thanks in advance!
I may not have fully understood what you mean by 1., since you are showing the table lines in your reference. I have also not understood whether you want to transpose the table.
What you may be looking for is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2].astype(int)
t=pd.pivot_table(df, values='Count', index='YYYYMM',columns='Group',aggfunc='sum')
t.index.name = None
fig, ax = plt.subplots(1,1)
t.plot(table=t,ax=ax)
ax.xaxis.set_major_formatter(plt.NullFormatter())
plt.tick_params(
axis='x', # changes apply to the x-axis
which='both', # both major and minor ticks are affected
bottom=False, # ticks along the bottom edge are off
top=False, # ticks along the top edge are off
labelbottom=False) # labels along the bottom edge are off
plt.show()
