Keeping only year and month in axis of a matplotlib plot

When using datetime objects as axis ticks in a matplotlib plot, I would like to report only the year and month. Instead, with my code below, the x axis also includes days, hours, minutes and seconds. Is there a simple way to remove these, so that only the month and year remain?
The formatting is not crucial. That is, it could be either like 2015-12 or like December 2015.
import matplotlib.pyplot as plt
import datetime
fig, ax = plt.subplots()
arr = [1,2,3,4]
ax.scatter(arr, arr)
firstDate = datetime.datetime(2010, 12, 18)
ticks = ax.get_xticks()
ax.xaxis.set_ticklabels([ firstDate+datetime.timedelta(7*tick) for tick in ticks])
plt.xticks(fontsize=12, rotation=90)
plt.show()
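A minimal sketch of one way to get year-and-month labels, assuming (as in the code above) that each tick unit stands for one week after firstDate. It formats every date with strftime before handing the labels to the axis; the names first_date and labels are illustrative, and the Agg backend is chosen only so the sketch runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
arr = [1, 2, 3, 4]
ax.scatter(arr, arr)

first_date = datetime.datetime(2010, 12, 18)
ticks = ax.get_xticks()

# Turn each tick position (in weeks) into a date, then keep only year and month.
# Use "%B %Y" instead of "%Y-%m" for labels such as "December 2010".
labels = [
    (first_date + datetime.timedelta(days=7 * float(tick))).strftime("%Y-%m")
    for tick in ticks
]

# Pin the tick positions so the fixed labels stay aligned with them,
# and restore the limits that pinning the ticks may have widened.
xlim = ax.get_xlim()
ax.set_xticks(ticks)
ax.set_xlim(xlim)
ax.set_xticklabels(labels, fontsize=12, rotation=90)
```

In an interactive session, plt.show() would follow as in the original code.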

Related

How to change color between two different columns in a scatterplot?

I have the following code:
plt.scatter(moving_avg_temp.chicagoMA, moving_avg_temp.globalMA)
plt.title('Correlation between Chicago & Global 5 Year MA')
plt.xlabel('Chicago 5 Year MA')
plt.ylabel('Global 5 Year MA')
plt.show()
Which results in a scatterplot where every data point is the same color. I am trying to have moving_avg_temp.chicagoMA be a different color than moving_avg_temp.globalMA to visualize the correlation between the two variables.
I am using pandas and Matplotlib.pyplot.
Here is a code snippet that might help you:
import matplotlib.pyplot as plt
%matplotlib inline
x = [1,2,3,4,7,8,7]
y = [4,1,3,6,3,6,8]
plt.scatter(x, y, c='red')
x2 = [5,6,7,8]
y2 = [1,3,5,2]
plt.scatter(x2, y2, c='lightblue')
plt.title('Correlation between Chicago & Global 5 Year MA')
plt.xlabel('Chicago 5 Year MA')
plt.ylabel('Global 5 Year MA')
plt.show()
Output:
Here is a list of all the named colors in Matplotlib: https://matplotlib.org/examples/color/named_colors.html
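One possible reading of the question is that each column should get its own color. In a scatter of one column against the other, every point combines both values, so the columns cannot be colored separately. The sketch below instead plots each column against a shared index with its own color. It assumes that moving_avg_temp is indexed by year; the index and all sample values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for moving_avg_temp (the real data is not shown in the question).
moving_avg_temp = pd.DataFrame(
    {
        "chicagoMA": [10.1, 10.3, 10.2, 10.6, 10.9],
        "globalMA": [8.2, 8.3, 8.3, 8.5, 8.7],
    },
    index=[2000, 2001, 2002, 2003, 2004],  # assumed: one row per year
)

fig, ax = plt.subplots()
# One scatter call per column, each with its own color and legend entry.
ax.scatter(moving_avg_temp.index, moving_avg_temp["chicagoMA"],
           c="red", label="Chicago 5 Year MA")
ax.scatter(moving_avg_temp.index, moving_avg_temp["globalMA"],
           c="lightblue", label="Global 5 Year MA")
ax.set_title("Chicago & Global 5 Year MA")
ax.set_xlabel("Year")
ax.set_ylabel("5 Year MA")
ax.legend()
```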

pandas DatetimeIndex to matplotlib x-ticks

I have a pandas DataFrame with a date index that looks like this:
Date
2020-09-03
2020-09-04
2020-09-07
2020-09-08
A few dates are missing, since the data only covers weekdays.
What I want to do is plot the figure and set an x tick on every Monday.
So far I've tried:
date_form = DateFormatter("%d. %b %Y")
ax4.xaxis.set_major_formatter(date_form)
ax4.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=MO))
But the axis then starts at 1970 instead of at the actual dates in the index.
Then I tried to:
mdates.set_epoch('First day of my data')
But this doesn't help, since Saturdays and Sundays are skipped in my original index.
Any ideas what I could do?
If you draw a line plot with one axis of datetime type,
the most natural solution is to use plot_date.
I created an example DataFrame like below:
Amount
Date
2020-08-24 210
2020-08-25 220
2020-08-26 240
2020-08-27 215
2020-08-28 243
...
Date (the index) is of datetime type.
The code to draw is:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
plt.xticks(rotation=30)
ax.plot_date(df.index, df.Amount, linestyle='solid')
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=0))
plt.grid()
plt.show()
The picture I got is:
As you can see, there is absolutely no problem with x ticks and they
are just on Mondays, as you want.
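The same idea can be sketched with ax.plot, which may be preferable because recent Matplotlib releases (3.9 and later, as far as I know) deprecate plot_date. The business-day index and the Amount values after the first five are hypothetical, and the date format is the one from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekday-only data: three business weeks starting on Monday 2020-08-24.
idx = pd.bdate_range("2020-08-24", periods=15)
df = pd.DataFrame(
    {"Amount": [210, 220, 240, 215, 243, 250, 238, 260, 255, 247,
                262, 270, 265, 258, 275]},
    index=idx,
)

fig, ax = plt.subplots()
ax.plot(df.index, df["Amount"], linestyle="solid", marker="o")

# Place a major tick on every Monday and format it as in the question.
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d. %b %Y"))
ax.tick_params(axis="x", labelrotation=30)
ax.grid(True)

# Convert the tick positions back to dates to check which days they fall on.
monday_ticks = [mdates.num2date(t).date() for t in ax.get_xticks()]
```

Because the x values are real dates, weekends still occupy space on the axis; they just carry no data points.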

Unable to create charts using bokeh

So I'm using the following code in Spyder to create a chart which will be displayed in a web browser:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
car = pd.read_csv('car_sales.csv')
Month = car['Month']
Sales = car['Sales']
output_file("bokeh_scatter_example.html", title="Bokeh Scatter Plot Example")
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
y_axis_label='Sales')
fig2.circle(Month, Sales, size=5, alpha=0.5)
show(fig2)
What I've realized is that this code works if the x-axis values are numeric. But my Month column holds strings (Jan, Feb, etc.), and that is when the code stops working. Any help would be appreciated. Thanks.
Edit: output of car.head()
Month Sales
0 Jan 1808
1 Feb 1251
2 Mar 3023
and so on.
Your X-axis is categorical in nature, so it needs special handling. You have to create a figure like this:
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
y_axis_label='Sales',
x_range=Month)
More details can be found here:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html

Scatter plots from averaging a columns values

I'm working with 2 columns from a csv file: Month_num and Freight_In_(tonnes)
I'm attempting to plot the average freight value for each month from 1987 to 2016. I can currently show each month's average freight in table format, but I'm struggling to get it to show in the scatter plot.
Here is my current code.
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv('CityPairs.csv')
Month = df.groupby(['Month_num'])['Freight_In_(tonnes)']
Month.mean()
plt.scatter(df['Month_num'], df['Freight_In_(tonnes)'])
plt.show()
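Try this (selecting the freight column before averaging, so that any other columns in the file are left out of the mean):
df.groupby('Month_num')['Freight_In_(tonnes)'].mean().reset_index().plot(kind='scatter', x='Month_num', y='Freight_In_(tonnes)')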
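To make the difference from the question's code concrete: the original scatter plots every raw row, while the averaged result has one point per month. A small sketch with made-up rows standing in for CityPairs.csv (the City column and all values are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; the real file is not shown in the question.
df = pd.DataFrame(
    {
        "Month_num": [1, 1, 2, 2, 3, 3],
        "Freight_In_(tonnes)": [10.0, 20.0, 30.0, 50.0, 5.0, 15.0],
        "City": ["A", "B", "A", "B", "A", "B"],  # a non-numeric column that should not be averaged
    }
)

# One mean per month: the index holds Month_num, the values hold the averages.
monthly_mean = df.groupby("Month_num")["Freight_In_(tonnes)"].mean()

fig, ax = plt.subplots()
ax.scatter(monthly_mean.index, monthly_mean.values)
ax.set_xlabel("Month_num")
ax.set_ylabel("Mean Freight_In_(tonnes)")
```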

Pandas Matplotlib Line Graph

Given the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df
Count Group YYYYMM
0 5 A 201603
1 6 A 201503
2 2 A 201403
3 7 A 201303
4 4 B 201603
5 7 B 201503
6 8 B 201403
7 9 B 201303
I need to generate a line graph with one line per group with a summary table at the bottom. Something like this:
I need each instance of 'YYYYMM' to be treated like a year by Pandas/Matplotlib.
So far, this seems to help, but I'm not sure if it will do the trick:
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2].astype(np.int64)
Then, I did this to pivot the data:
t=df.pivot_table(df,index=['YYYYMM'],columns=['Group'],aggfunc=np.sum)
Count
Group A B
YYYYMM
2013 7 9
2014 2 8
2015 6 7
2016 5 4
Then, I tried to plot it:
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1,1)
t.plot(table=t,ax=ax)
...and this happened:
I'd like to do the following:
1. remove all lines (borders) from the table at the bottom
2. remove the jumbled text in the table
3. remove the x axis tick labels (it should just show the years for tick labels)
I can clean up the rest myself (remove legend and borders, etc..).
Thanks in advance!
I may not have fully understood what you mean by 1., since you are showing the table lines in your reference. I have also not understood whether you want to transpose the table.
What you may be looking for is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'YYYYMM':[201603,201503,201403,201303,201603,201503,201403,201303],
'Count':[5,6,2,7,4,7,8,9],
'Group':['A','A','A','A','B','B','B','B']})
df['YYYYMM']=df['YYYYMM'].astype(str).str[:-2].astype(int)
t=pd.pivot_table(df, values='Count', index='YYYYMM',columns='Group',aggfunc='sum')
t.index.name = None
fig, ax = plt.subplots(1,1)
t.plot(table=t,ax=ax)
ax.xaxis.set_major_formatter(plt.NullFormatter())
plt.tick_params(
    axis='x',           # changes apply to the x-axis
    which='both',       # both major and minor ticks are affected
    bottom=False,       # ticks along the bottom edge are off
    top=False,          # ticks along the top edge are off
    labelbottom=False)  # labels along the bottom edge are off
plt.show()
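As a side note on the year conversion, integer division is a possible alternative to the string slicing used above, since it drops the two month digits directly. A short sketch on the same data; the column name Year is introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame(
    {"YYYYMM": [201603, 201503, 201403, 201303, 201603, 201503, 201403, 201303],
     "Count": [5, 6, 2, 7, 4, 7, 8, 9],
     "Group": ["A", "A", "A", "A", "B", "B", "B", "B"]})

# 201603 // 100 == 2016: the last two digits (the month) are discarded.
df["Year"] = df["YYYYMM"] // 100  # "Year" is a hypothetical column name

# One row per year, one column per group, summing Count within each cell.
t = pd.pivot_table(df, values="Count", index="Year", columns="Group", aggfunc="sum")
print(t)
```

The resulting table matches the pivot shown in the question and can be passed to t.plot(table=t, ax=ax) in the same way.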
