Scatter plots from averaging a column's values - python-3.x

I'm working with two columns from a CSV file: Month_num and Freight_In_(tonnes).
I'm trying to plot the average freight value for each month from 1987 to 2016. I can already show each month's average freight in table form, but I can't get it to show in a scatter plot.
Here is my current code:
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv('CityPairs.csv')
Month = df.groupby(['Month_num'])['Freight_In_(tonnes)']
Month.mean()
plt.scatter(df['Month_num'], df['Freight_In_(tonnes)'])
plt.show()

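Try this, which plots the grouped means instead of the raw rows (selecting the freight column first avoids averaging any non-numeric columns):
df.groupby('Month_num')['Freight_In_(tonnes)'].mean().reset_index().plot(kind='scatter', x='Month_num', y='Freight_In_(tonnes)')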
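For reference, here is a minimal sketch of the same idea using plain matplotlib. The small DataFrame is a made-up stand-in for CityPairs.csv (only the two column names come from the question), and the Agg backend is selected just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for CityPairs.csv: two rows per month
df = pd.DataFrame({
    "Month_num": [1, 1, 2, 2, 3, 3],
    "Freight_In_(tonnes)": [10.0, 20.0, 30.0, 50.0, 5.0, 15.0],
})

# Keep the result of the groupby: one mean value per month
monthly_mean = df.groupby("Month_num")["Freight_In_(tonnes)"].mean()

# Plot the aggregated values, not the original columns
fig, ax = plt.subplots()
ax.scatter(monthly_mean.index, monthly_mean.values)
ax.set_xlabel("Month_num")
ax.set_ylabel("Mean Freight_In_(tonnes)")
fig.savefig("monthly_mean_freight.png")
```

The original code computed Month.mean() but never used the result, so plt.scatter still received the unaggregated columns.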

Related

Convert Dataframe to display in matplotlib line chart with dates

I'm having trouble plotting a graph in matplotlib because I can't get my data into the form matplotlib expects.
Here is my data:
date,GOOG,AAPL,FB,BABA,AMZN,GE,AMD,WMT,BAC,GM,T,UAA,SHLD,XOM,RRC,BBY,MA,PFE,JPM,SBUX
1989-12-29,,0.117203,,,,0.352438,3.9375,3.48607,1.752478,,2.365775,,,1.766756,,0.166287,,0.110818,1.827968,
1990-01-02,,0.123853,,,,0.364733,4.125,3.660858,1.766686,,2.398184,,,1.766756,,0.173216,,0.113209,1.835617,
1990-01-03,,0.124684,,,,0.36405,4.0,3.660858,1.780897,,2.356516,,,1.749088,,0.194001,,0.113608,1.896803,
1990-01-04,,0.1251,,,,0.362001,3.9375,3.641439,1.743005,,2.403821,,,1.731422,,0.190537,,0.115402,1.904452,
1990-01-05,,0.125516,,,,0.358586,3.8125,3.602595,1.705114,,2.287973,,,1.722587,,0.190537,,0.114405,1.9121,
1990-01-08,,0.126347,,,,0.360635,3.8125,3.651146,1.714586,,2.326588,,,1.749088,,0.17668,,0.113409,1.9121,
1990-01-09,,0.1251,,,,0.353122,3.875,3.55404,1.714586,,2.273493,,,1.713754,,0.17668,,0.111017,1.850914,
1990-01-10,,0.119697,,,,0.353805,3.8125,3.55404,1.681432,,2.210742,,,1.722587,,0.173216,,0.11301,1.843264,
1990-01-11,,0.11471,,,,0.353805,3.875,3.592883,1.667222,,2.23005,,,1.731422,,0.169751,,0.111814,1.82032,
I have converted it to the following dataframe:
AAPL
2016 0.333945
2017 0.330923
2018 0.321857
2019 0.312790
<class 'pandas.core.frame.DataFrame'>
using the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("portfolio.txt")
companyname = "AAPL"
frames = df.loc[:, df.columns.str.startswith(companyname)]
l1 = frames.loc['2015-6-1':'2019-6-10']
print(l1)
print(type(l1))
plt.plot(l1, label="Company Past Information")
plt.xlabel('Risk Aversion')
plt.ylabel('Optimal Investment Portfolio')
plt.title('Optimal Investment Portfolio For Low, Medium & High')
plt.legend()
plt.show()
The plot comes out correctly for companies that have data.
For companies with no data available, the graph is plotted wrongly. For example:
GOOG
2016 NaN
2017 NaN
2018 NaN
2019 NaN
Because of this I can't plot the graph correctly.
Please help me sort this out.
Thanks in advance.
If you're reading your data in from a .csv using pandas, you can parse the dates on load:
import pandas as pd
df = pd.read_csv(your_csv, parse_dates=[0])  # 0 means your dates are in the first column
Otherwise, you can convert your date column to datetime using:
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
Then, with matplotlib, you can plot any column against the dates:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df.iloc[:, 0], df.loc[:, some_column])
plt.show()
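Putting those pieces together, here is a rough sketch that also covers the all-NaN case from the question. The inline CSV is a trimmed copy of the posted data, plot_company is a hypothetical helper name, and skipping empty columns is just one possible way to handle missing data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Trimmed copy of the data in the question (GOOG has no values in these rows)
csv_text = """date,GOOG,AAPL
1989-12-29,,0.117203
1990-01-02,,0.123853
1990-01-03,,0.124684
"""

# Parse the date column on load and use it as the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

def plot_company(frame, name, ax):
    """Plot one company's column; return False if it has no data at all."""
    series = frame[name].dropna()  # drop missing values before plotting
    if series.empty:
        return False
    ax.plot(series.index, series.values, label=name)
    return True

fig, ax = plt.subplots()
plotted = {name: plot_company(df, name, ax) for name in ["AAPL", "GOOG"]}
ax.legend()
fig.savefig("company_history.png")
```

Here plotted records which companies actually had data, so the caller can report the empty ones instead of drawing a misleading line.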

Difficulty grouping barchart using Python, Pandas and Matplotlib

I am having difficulty getting plot.bar to group the bars together the way I have them grouped in the dataframe. The dataframe returns the grouped data correctly; however, the bar graph shows a separate bar for every line in the dataframe. Ideally, my code below should group 3-6 bars together for each department (Dept X should have its bars grouped by type, with the count of true/false on the Y axis).
Dataframe:
dname   Type  purchased
Dept X  0     False         141
              True          270
        1     False        2020
              True         2604
        2     False        2023
              True         1047
Code:
import psycopg2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
##connection and query data removed
df = pd.merge(df_departments[["id", "dname"]], df_widgets[["department", "widgetid", "purchased","Type"]], how='inner', left_on='id', right_on='department')
df.set_index(['dname'], inplace=True)
dx=df.groupby(['dname', 'Type','purchased'])['widgetid'].size()
dx.plot.bar(x='dname', y='widgetid', rot=90)
I can't be sure without a more reproducible example, but try unstacking the innermost level of the MultiIndex of dx before plotting:
dx.unstack().plot.bar(rot=90)
I expect this to work because when plotting a DataFrame, each column becomes a legend entry and each row becomes a category on the horizontal axis. The x and y arguments are dropped because, after unstacking, dname and widgetid are not columns of the result.
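Since the question has no reproducible data, here is a small sketch that rebuilds the posted Series by hand and applies the unstack step. The counts are copied from the question; the variable name wide is only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the grouped counts shown in the question
index = pd.MultiIndex.from_product(
    [["Dept X"], [0, 1, 2], [False, True]],
    names=["dname", "Type", "purchased"],
)
dx = pd.Series([141, 270, 2020, 2604, 2023, 1047], index=index)

# Move the innermost level ("purchased") into columns:
# rows become (dname, Type) groups, columns become False / True
wide = dx.unstack("purchased")

# Each row is now one group on the x axis, each column one bar in that group
ax = wide.plot.bar(rot=90)
ax.set_ylabel("count")
ax.figure.savefig("grouped_bars.png", bbox_inches="tight")
```

With more than one department, each (dname, Type) pair still gets its own group of two bars.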

Is it possible to explicitly set the order of the stacks in a matplotlib stackplot?

I want to explicitly set the order of the stacks in a Matplotlib stackplot. Here is an example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,100,size=(100,4)),columns=list('ABCD'))
df.plot(kind='area',stacked=True,figsize=(20,10));
This produces the following image:
The last row of the dataframe from:
df.tail(1)
is:
     A   B   C   D
99  16  30  84  57
Here is what I want to achieve:
I want to re-order the stacks so that they are plotted from the bottom up as A, B, D, C, i.e. the columns ordered from the bottom up by their increasing values in the last row of the df.
So far, I have tried re-ordering explicitly the columns in the df before plotting:
df[['A','B','D','C']].plot(kind='area',stacked=True,figsize=(20,10))
but this produces exactly the same chart as above.
Thank you for any help here!
The graphs are not the same. Look at the areas beneath the red graph for a particular x: the shapes of the green and blue shaded areas differ between the two plots.
And now,
df[['A','B','D','C']].plot(kind='area',stacked=True,figsize=(20,10))
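If the order should follow the last row automatically rather than being typed by hand, something like the sketch below may help. It assumes random data with the last row pinned to the values quoted in the question, so it will not reproduce the exact chart from the post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data; only the last row matches the question
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df.iloc[-1] = [16, 30, 84, 57]

# Column names sorted by their value in the last row, smallest first
order = df.iloc[-1].sort_values().index.tolist()

# Stacks are drawn bottom-up in column order, so reorder before plotting
ax = df[order].plot(kind="area", stacked=True, figsize=(20, 10))
ax.figure.savefig("stack_order.png")
```

The stacking follows the column order of the DataFrame passed to plot, which is why selecting the columns in a different order changes the chart.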

Keeping only year and month in axis of a matplotlib plot

When using datetime objects as axis ticks in a matplotlib plot, I would like to show only the year and month. Instead, with my code below, the x axis also includes days, hours, minutes and seconds. Is there a simple way to remove these, so that only the month and year remain?
The formatting is not crucial. That is, it could be either like 2015-12 or like December 2015.
import matplotlib.pyplot as plt
import datetime
fig, ax = plt.subplots()
arr = [1,2,3,4]
ax.scatter(arr, arr)
firstDate = datetime.datetime(2010, 12, 18)
ticks = ax.get_xticks()
ax.xaxis.set_ticklabels([ firstDate+datetime.timedelta(7*tick) for tick in ticks])
plt.xticks(fontsize=12, rotation=90)
plt.show()
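One possible approach, sketched here as an assumption rather than a confirmed answer, is to format each datetime with strftime before passing it as a tick label. The format string "%Y-%m" gives labels like 2015-12; "%B %Y" would give December 2015 instead.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
arr = [1, 2, 3, 4]
ax.scatter(arr, arr)

first_date = datetime.datetime(2010, 12, 18)
ticks = ax.get_xticks()

# Convert each tick (in weeks from first_date) to a "YYYY-MM" string
labels = [
    (first_date + datetime.timedelta(weeks=float(tick))).strftime("%Y-%m")
    for tick in ticks
]

# Fix the tick positions first, then attach the formatted labels
ax.set_xticks(ticks)
ax.set_xticklabels(labels, fontsize=12, rotation=90)
fig.savefig("year_month_ticks.png", bbox_inches="tight")
```

Because the labels are plain strings, matplotlib shows them as written and no longer prints the time part.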

Unable to create charts using bokeh

So I'm using the following code in Spyder to create a chart which will be displayed in a web browser:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
car = pd.read_csv('car_sales.csv')
Month = car['Month']
Sales = car['Sales']
output_file("bokeh_scatter_example.html", title="Bokeh Scatter Plot Example")
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
              y_axis_label='Sales')
fig2.circle(Month, Sales, size=5, alpha=0.5)
show(fig2)
What I've realized is that this code works if the x-axis values are numeric. But my Month column holds strings (Jan, Feb, etc.), and that is when the code stops working. Any help would be appreciated. Thanks.
Edit: output of car.head()
Month Sales
0 Jan 1808
1 Feb 1251
2 Mar 3023
and so on.
Your X-axis is categorical in nature, so it needs special handling. You have to create a figure like this:
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
              y_axis_label='Sales',
              x_range=Month)
More details can be found here:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
