pandas DatetimeIndex to matplotlib x-ticks - python-3.x

I have a pandas DataFrame with a date index that looks like this:
Date
2020-09-03
2020-09-04
2020-09-07
2020-09-08
Some dates are missing because the data only covers weekdays.
What I want to do is plot the figure and set an x tick on every Monday.
So far I've tried:
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter, MO
date_form = DateFormatter("%d. %b %Y")
ax4.xaxis.set_major_formatter(date_form)
ax4.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=MO))
But the axis then starts at 1970 instead of at the first date of my index.
Then I tried to:
mdates.set_epoch('First day of my data')
But this doesn't help, since Saturdays and Sundays are skipped in my original index.
Any ideas what I could do?

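If you draw a line plot with one axis of datetime type,
a straightforward solution is to use plot_date (newer Matplotlib
versions deprecate it in favour of plain plot, which handles datetime data the same way).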
I created an example DataFrame like below:
Amount
Date
2020-08-24 210
2020-08-25 220
2020-08-26 240
2020-08-27 215
2020-08-28 243
...
Date (the index) is of datetime type.
The code to draw is:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
plt.xticks(rotation=30)
ax.plot_date(df.index, df.Amount, linestyle='solid')
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=0))
plt.grid()
plt.show()
The picture I got (not reproduced here) shows that there is no problem
with the x ticks: they fall only on Mondays, as you want.
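A self-contained variant of the code above may help to check the tick positions without looking at the picture. It is only a sketch: the business-day index, the Amount values and the names idx and monday_ticks are made up for illustration, and it uses plain plot instead of plot_date. The question does not show its plotting call, so the cause of the 1970 axis is a guess; one common cause is mixing pandas' own df.plot date handling with matplotlib locators, which this sketch avoids by passing the index to matplotlib directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Hypothetical weekday-only data, similar in shape to the question's index
idx = pd.bdate_range("2020-08-24", "2020-09-30")
df = pd.DataFrame({"Amount": range(200, 200 + len(idx))}, index=idx)

fig, ax = plt.subplots()
ax.plot(df.index, df["Amount"])  # pass the DatetimeIndex straight to matplotlib

# One major tick per Monday, labelled in the format from the question
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d. %b %Y"))
fig.autofmt_xdate(rotation=30)
fig.canvas.draw()

# Convert the tick positions back to datetimes to inspect them
monday_ticks = mdates.num2date(ax.get_xticks())
```

Because the ticks come from the locator and not from the data, the missing weekends in the index do not affect where the Mondays land.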

Related

Convert Dataframe to display in matplotlib line chart with dates

I am having trouble plotting a graph in matplotlib because I cannot get my data into a form that matplotlib accepts.
Here is my data:
date,GOOG,AAPL,FB,BABA,AMZN,GE,AMD,WMT,BAC,GM,T,UAA,SHLD,XOM,RRC,BBY,MA,PFE,JPM,SBUX
1989-12-29,,0.117203,,,,0.352438,3.9375,3.48607,1.752478,,2.365775,,,1.766756,,0.166287,,0.110818,1.827968,
1990-01-02,,0.123853,,,,0.364733,4.125,3.660858,1.766686,,2.398184,,,1.766756,,0.173216,,0.113209,1.835617,
1990-01-03,,0.124684,,,,0.36405,4.0,3.660858,1.780897,,2.356516,,,1.749088,,0.194001,,0.113608,1.896803,
1990-01-04,,0.1251,,,,0.362001,3.9375,3.641439,1.743005,,2.403821,,,1.731422,,0.190537,,0.115402,1.904452,
1990-01-05,,0.125516,,,,0.358586,3.8125,3.602595,1.705114,,2.287973,,,1.722587,,0.190537,,0.114405,1.9121,
1990-01-08,,0.126347,,,,0.360635,3.8125,3.651146,1.714586,,2.326588,,,1.749088,,0.17668,,0.113409,1.9121,
1990-01-09,,0.1251,,,,0.353122,3.875,3.55404,1.714586,,2.273493,,,1.713754,,0.17668,,0.111017,1.850914,
1990-01-10,,0.119697,,,,0.353805,3.8125,3.55404,1.681432,,2.210742,,,1.722587,,0.173216,,0.11301,1.843264,
1990-01-11,,0.11471,,,,0.353805,3.875,3.592883,1.667222,,2.23005,,,1.731422,,0.169751,,0.111814,1.82032,
I have converted it to the following dataframe:
AAPL
2016 0.333945
2017 0.330923
2018 0.321857
2019 0.312790
<class 'pandas.core.frame.DataFrame'>
using the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("portfolio.txt")
companyname = "AAPL"
frames = df.loc[:, df.columns.str.startswith(companyname)]
l1 = frames.loc['2015-6-1':'2019-6-10']
print(l1)
print(type(l1))
plt.plot(l1, label="Company Past Information")  # was li1, an undefined name
plt.xlabel('Risk Aversion')
plt.ylabel('Optimal Investment Portfolio')
plt.title('Optimal Investment Portfolio For Low, Medium & High')
plt.legend()
plt.show()
When I plot with matplotlib, the output is correct for companies that have data.
But for companies with no data available, the graph is plotted incorrectly.
GOOG
2016 NaN
2017 NaN
2018 NaN
2019 NaN
Because of this I am unable to plot the graph correctly.
Please help me out with this.
Thanks in advance.
If you're reading your data in from a .csv using pandas, you can parse the dates on load:
import pandas as pd
df = pd.read_csv(your_csv, parse_dates=[0]) # 0 means your dates are in the first column
Otherwise, you can convert your date column to datetime using:
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
With matplotlib you can then plot:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df.iloc[:, 0], df.loc[:, some_column])
plt.show()
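The steps above can be combined into one runnable sketch. The inline CSV is a three-row excerpt of the question's data with only two columns, and skipping columns that contain nothing but NaN is an assumption about what the asker wants for GOOG, not something the answer states.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure
import matplotlib.pyplot as plt
import pandas as pd

# Small excerpt of the question's data; GOOG has no values in this period
csv_text = """date,GOOG,AAPL
1989-12-29,,0.117203
1990-01-02,,0.123853
1990-01-03,,0.124684
"""

# Parse the date column while reading and use it as the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

fig, ax = plt.subplots()
plotted = []
for name in ["GOOG", "AAPL"]:
    series = df[name].dropna()  # remove missing values before plotting
    if series.empty:
        continue                # nothing to draw for this company
    ax.plot(series.index, series.values, label=name)
    plotted.append(name)
ax.legend()
```

With a datetime index in place, label-based slices such as df.loc['1990-01-01':'1990-01-03'] also work as the question intended.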

How to plot pandas dataframe in 24 hour intervals? (multiple plots)

I have a pandas dataframe of about 3 years with the resolution of 6 seconds and I want to group the data into 24-hour bins and plot each day using matplotlib in a loop.
This is my dataframe's head:
timestamp consumption
0 2012-11-11 12:00:03 468
1 2012-11-11 12:00:09 476
2 2012-11-11 12:00:16 463
3 2012-11-11 12:00:22 449
4 2012-11-11 12:00:28 449
It contains the power consumption of a house from 2012 to 2015. After pre-processing, the dataframe starts at about 12 pm on the first day. I need to plot the whole dataframe in 24-hour intervals, where each plot represents a single day that starts at about 12 pm and ends at about 12 pm the next day.
So, I need about 1500 plots that show the power consumption of each day starting from 12 pm, for about 1500 days of my dataframe.
Thanks in advance.
Update: The reason I want to plot 1500 days separately, is I want to check each night's power consumption and label the occupant's sleep pattern. And I considered each day from 12 pm to 12 pm to have a complete sleep cycle in one plot. And after preparing the labels I'll be able to use them as train and test data for classification
Consider this not only an answer but also a suggestion. First, convert the column 'timestamp' into the index (DatetimeIndex)
df.set_index(df['timestamp'], inplace=True, drop=True)
Then, get all the unique days that happen in your DataFrame
unique_days = sorted(set(df.index.to_period('D').strftime('%Y-%m-%d')))  # sorted, since a set has no order
We then squeeze the DataFrame into a Series
del df['timestamp']
df = df.squeeze()
Now, just plot unique days in your series in separate subplots.
import matplotlib.pyplot as plt
unique_days = sorted(set(df.index.to_period('D').strftime('%Y-%m-%d')))
fig, axes = plt.subplots(nrows=len(unique_days), ncols=1)
row = 0
for day in unique_days:
    # .loc with a date string selects every row that falls on that day
    df.loc[day].plot(ax=axes[row], figsize=(50,10))
    row += 1
plt.show()
Now, it's time for you to play around with the parameters of plots so that you can customize them to your needs.
This is kind of a strange request. If we knew what your end objective is, it might be easier to understand, but I'm going to assume you want to plot and then save figures for each of the days.
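import pandas as pd
import matplotlib.pyplot as plt

# Shift by 12 hours so that each noon-to-noon window maps to one calendar date
df['day'] = (df['timestamp'] + pd.Timedelta('12h')).dt.date
for day in df['day'].unique():
    mask = (df['day'] == day)
    # <the code for the plot that you want>
    # plt.plot takes x and y positionally, not as x=/y= keywords
    plt.plot(df.loc[mask, 'timestamp'], df.loc[mask, 'consumption'])
    plt.savefig('filename' + str(day) + '.png')
    plt.close()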
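The noon-to-noon grouping can be checked on a few rows before running it over three years of data. The sketch below uses four made-up timestamps and a hypothetical column name, cycle. It shifts backwards by 12 hours, so each window is labelled with the date it starts on, whereas the answer above shifts forwards and labels each window with the date it ends on.

```python
import pandas as pd

# Four hypothetical readings spanning two noon-to-noon windows
ts = pd.to_datetime([
    "2012-11-11 12:00:03",
    "2012-11-11 23:59:57",
    "2012-11-12 11:59:59",
    "2012-11-12 12:00:05",
])
df = pd.DataFrame({"timestamp": ts, "consumption": [468, 476, 463, 449]})

# Subtracting 12 hours moves noon to midnight, so the calendar date of the
# shifted timestamp identifies the noon-to-noon window a reading belongs to
df["cycle"] = (df["timestamp"] - pd.Timedelta(hours=12)).dt.date

# Number of readings per window; the first three fall in the window starting 11 Nov
counts = df.groupby("cycle").size()

# Each group can then be plotted separately, e.g. inside this loop
window_lengths = [len(group) for _, group in df.groupby("cycle")]
```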

Keeping only year and month in axis of a matplotlib plot

When using datetime objects as axis ticks in a matplotlib plot, I would like to report only the year and month. Instead, with my code below, the x axis also includes days, hours, minutes and seconds. Is there a simple way to remove these, so that only the month and year remain?
The formatting is not crucial. That is, it could be either like 2015-12 or like December 2015.
import matplotlib.pyplot as plt
import datetime
fig, ax = plt.subplots()
arr = [1,2,3,4]
ax.scatter(arr, arr)
firstDate = datetime.datetime(2010, 12, 18)
ticks = ax.get_xticks()
ax.xaxis.set_ticklabels([ firstDate+datetime.timedelta(7*tick) for tick in ticks])
plt.xticks(fontsize=12, rotation=90)
plt.show()
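One possible approach, offered as a sketch and not as an accepted answer, is to plot real datetime values and let a DateFormatter produce the labels instead of setting tick labels by hand. The names dates, formatter and labels are illustrative, and placing one tick per month with MonthLocator is an assumption about the desired spacing.

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

first_date = datetime.datetime(2010, 12, 18)
arr = [1, 2, 3, 4]
# Turn the week offsets from the question into actual datetimes
dates = [first_date + datetime.timedelta(weeks=v) for v in arr]

fig, ax = plt.subplots()
ax.scatter(dates, arr)

# One tick per month, labelled as year-month only
formatter = mdates.DateFormatter("%Y-%m")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(formatter)
plt.xticks(fontsize=12, rotation=90)
fig.canvas.draw()

labels = [t.get_text() for t in ax.get_xticklabels()]
```

Using "%B %Y" as the format string would give labels like December 2015 instead.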

Unable to create charts using bokeh

So I'm using the following code in Spyder to create a chart which will be displayed in a web browser:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
car = pd.read_csv('car_sales.csv')
Month = car['Month']
Sales = car['Sales']
output_file("bokeh_scatter_example.html", title="Bokeh Scatter Plot Example")
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
y_axis_label='Sales')
fig2.circle(Month, Sales, size=5, alpha=0.5)
show(fig2)
What I've realized is that this code works if the x-axis values are numeric. But my Month column is in string format, i.e. Jan, Feb etc., and that is when the code stops working. Any help would be appreciated. Thanks.
Edit: output of car.head()
Month Sales
0 Jan 1808
1 Feb 1251
2 Mar 3023
and so on.
Your X-axis is categorical in nature, so it needs special handling. You have to create a figure like this:
fig2 = figure(title="Bokeh Scatter Plot Example", x_axis_label='Month',
y_axis_label='Sales',
x_range=Month)
More details can be found here:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html

Scatter plots from averaging a columns values

I'm working with 2 columns from a csv file: Month_num and Freight_In_(tonnes)
I'm attempting to plot the average value of freight for each month from 1987 to 2016. I can currently show each month's average freight in table format, but I'm struggling to get it to show in the scatter plot.
Here is my current code.
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv('CityPairs.csv')
Month = df.groupby(['Month_num'])['Freight_In_(tonnes)']
Month.mean()
plt.scatter(df['Month_num'], df['Freight_In_(tonnes)'])
plt.show()
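Try this (selecting the freight column before taking the mean, so that non-numeric columns in the CSV are not averaged):
df.groupby('Month_num')['Freight_In_(tonnes)'].mean().reset_index().plot(kind='scatter', x='Month_num', y='Freight_In_(tonnes)')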
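The reason the question's plot shows every row is that Month.mean() is computed but never used, so plt.scatter still receives the raw columns. A small sketch with made-up numbers (the real CityPairs.csv is not available here, and monthly_mean is an illustrative name) shows the averaged frame that ends up being plotted:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure
import pandas as pd

# Hypothetical stand-in for CityPairs.csv with the two columns from the question
df = pd.DataFrame({
    "Month_num": [1, 1, 2, 2, 3],
    "Freight_In_(tonnes)": [10.0, 20.0, 30.0, 50.0, 7.0],
})

# Average freight per month, with Month_num restored as an ordinary column
monthly_mean = (
    df.groupby("Month_num")["Freight_In_(tonnes)"].mean().reset_index()
)

# One point per month instead of one point per row
ax = monthly_mean.plot(kind="scatter", x="Month_num", y="Freight_In_(tonnes)")
```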
