matplotlib auto generates dates in my x-axis - python-3.x

I have a DataFrame that consists of 2 columns:
Transaction Week | Completed
2021-01-10 | 63
2021-01-17 | 76
2021-01-24 | 63
2021-01-31 | 20
I cannot understand why, after I plot the graph, my x-axis has more than 4 dates (my DataFrame only has 4 entries). How can I remove those extra dates?
x=Weekly_Settled_Trans_Status['Transaction Week']
y=Weekly_Settled_Trans_Status['Completed']
plt.plot(x,y)
plt.tick_params('x', labelrotation=45)
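This question has no answer in this excerpt. The extra labels usually come from matplotlib's automatic date locator choosing its own tick positions; a minimal sketch, assuming Transaction Week is (or is converted to) a datetime column, pins the ticks to the four dates that actually exist:
import pandas as pd
import matplotlib.pyplot as plt
# sample frame matching the question's data
Weekly_Settled_Trans_Status = pd.DataFrame({
    'Transaction Week': pd.to_datetime(['2021-01-10', '2021-01-17',
                                        '2021-01-24', '2021-01-31']),
    'Completed': [63, 76, 63, 20]})
x = Weekly_Settled_Trans_Status['Transaction Week']
y = Weekly_Settled_Trans_Status['Completed']
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks(x)  # tick only the dates present in the data
ax.tick_params('x', labelrotation=45)
plt.show()
Alternatively, plotting the dates as plain strings makes matplotlib treat the x-axis as categorical, which also yields exactly four ticks.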

Related

How to ignore plotting the feature which have all zeros in subplot using groupby

I am trying to use groupby to plot all features in a dataframe in different subplots, one per serial number, while ignoring any feature 'Ft' of a serial number (in its subplot) where all the data are zeros; for example, we should ignore 'Ft1' for S/N 'KLM10015' because all data in this feature are zeros. The dataframe has 5514 rows and 565 columns, but the code should work with dataframes of different sizes.
The x-axis of each subplot represents the date, the y-axis represents the feature values (Ft), and the title is the serial number (S/N).
This is an example of the dataframe which I have:
df =
S/N Ft1 Ft12 Ft17 ---- Ft1130 Ft1140 Ft1150
DATE
2021-01-06 KLM10015 0 12 14 ---- 17 52 47
2021-01-07 KLM10015 0 10 48 ---- 19 20 21
2021-01-11 KLM10015 0 0 45 ---- 0 19 0
2021-01-17 KLM10015 0 1 0 ---- 16 44 66
...
2021-02-09 KLM10018 1 11 0 ---- 25 27 19
2021-12-13 KLM10018 12 0 19 ---- 78 77 18
2021-12-16 kLM10018 14 17 14 ---- 63 19 0
2021-07-09 KLM10018 18 0 77 ---- 65 34 98
2021-07-15 KLM10018 0 88 82 ---- 63 31 22
Code:
list_ID = ["ft1", "ft12", "ft17", ..., "ft1130", "ft1140", "ft1150"]
def plot_fun(dataframe):
    for item in list_ID:
        fig = plt.figure(figsize=(35, 20))
        ax1 = fig.add_subplot(221)
        ax2 = fig.add_subplot(222)
        ax3 = fig.add_subplot(223)
        ax4 = fig.add_subplot(224)
        dataframe.groupby('S/N')[item].plot(legend=True, ax=ax1)
        dataframe.groupby('S/N')[item].plot(legend=True, ax=ax2)
        dataframe.groupby('S/N')[item].plot(legend=True, ax=ax3)
        dataframe.groupby('S/N')[item].plot(legend=True, ax=ax4)
        plt.show()
plot_fun(df)
I really need your help. Thanks a lot.
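This question is likewise unanswered in the excerpt. A minimal sketch of the usual approach, checking each serial number's features for all-zero data before plotting (the helper name plot_nonzero_features and the one-column-per-row layout are illustrative assumptions, not the asker's code):
import matplotlib.pyplot as plt
def plot_nonzero_features(dataframe, feature_cols):
    # one figure per serial number, skipping features that are all zeros
    for sn, group in dataframe.groupby('S/N'):
        cols = [c for c in feature_cols if (group[c] != 0).any()]
        if not cols:
            continue
        fig, axes = plt.subplots(len(cols), 1, figsize=(35, 20),
                                 sharex=True, squeeze=False)
        fig.suptitle(sn)
        for ax, col in zip(axes.ravel(), cols):
            ax.plot(group.index, group[col], label=col)
            ax.legend()
        plt.show()
# use every column except the serial number as a feature column
plot_nonzero_features(df, [c for c in df.columns if c != 'S/N'])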

Extract weekly data from daily and reshape it from long to wide format using Pandas

Given the sample data below, I hope to extract one entry for each week; if a week has multiple entries, I will use the data from the largest (latest) weekday of that week:
date variable value
0 2020-11-4 quantity 564.0000
1 2020-11-11 quantity 565.0000
2 2020-11-18 quantity 566.0000
3 2020-11-25 quantity 566.0000
4 2020-11-2 price 1829.1039
5 2020-11-3 price 1789.5883
6 2020-11-4 price 1755.4307
7 2020-11-5 price 1750.0727
8 2020-11-6 price 1746.7239
9 2020-11-9 price 1756.1005
10 2020-11-10 price 1752.0820
11 2020-11-11 price 1814.3693
12 2020-11-12 price 1833.7922
13 2020-11-13 price 1833.7922
14 2020-11-16 price 1784.2302
15 2020-11-17 price 1764.1376
16 2020-11-18 price 1770.1654
17 2020-11-19 price 1757.4400
18 2020-11-20 price 1770.1654
To get the week number of each date, I use df['week_number'] = pd.to_datetime(df['date']).dt.week.
date variable value week_number
0 2020-11-4 quantity 564.0000 45 --> to keep
1 2020-11-11 quantity 565.0000 46 --> to keep
2 2020-11-18 quantity 566.0000 47 --> to keep
3 2020-11-25 quantity 566.0000 48 --> to keep
4 2020-11-2 price 1829.1039 45
5 2020-11-3 price 1789.5883 45
6 2020-11-4 price 1755.4307 45
7 2020-11-5 price 1750.0727 45
8 2020-11-6 price 1746.7239 45 --> to keep, since it's the largest weekday for this week
9 2020-11-9 price 1756.1005 46
10 2020-11-10 price 1752.0820 46
11 2020-11-11 price 1814.3693 46
12 2020-11-12 price 1833.7922 46
13 2020-11-13 price 1833.7922 46 --> to keep, since it's the largest weekday for this week
14 2020-11-16 price 1784.2302 47
15 2020-11-17 price 1764.1376 47
16 2020-11-18 price 1770.1654 47
17 2020-11-19 price 1757.4400 47
18 2020-11-20 price 1770.1654 47 --> to keep, since it's the largest weekday for this week
Finally, I will reshape the rows marked "to keep" into the expected result as follows:
variable the_45th_week the_46th_week the_47th_week the_48th_week
0 quantity 564.0000 565.0000 566.0000 566.0
1 price 1756.1005 1833.7922 1770.1654 NaN
How could I manipulate the data to get the expected result? Sincere thanks.
EDIT:
df = df.sort_values(by=['variable','date'], ascending=False)
df.drop_duplicates(['variable', 'week_number'], keep='last')
Out:
date variable value week_number
0 2020-11-4 quantity 564.0000 45
3 2020-11-25 quantity 566.0000 48
2 2020-11-18 quantity 566.0000 47
1 2020-11-11 quantity 565.0000 46
4 2020-11-2 price 1829.1039 45
14 2020-11-16 price 1784.2302 47
10 2020-11-10 price 1752.0820 46
In your solution it is possible to add pivot with rename:
df['week_number'] = pd.to_datetime(df['date']).dt.week
df = df.sort_values(by=['variable','date'], ascending=False)
df = df.drop_duplicates(['variable', 'week_number'], keep='last')
f = lambda x: f'the_{x}th_week'
out = df.pivot('variable','week_number','value').rename(columns=f)
print(out)
week_number the_45th_week the_46th_week the_47th_week the_48th_week
variable
price 1829.1039 1752.082 1784.2302 NaN
quantity 564.0000 565.000 566.0000 566.0
Or remove DataFrame.drop_duplicates, so it is possible to use DataFrame.pivot_table with the aggregate function last:
df['week_number'] = pd.to_datetime(df['date']).dt.week
df = df.sort_values(by=['variable','date'], ascending=False)
f = lambda x: f'the_{x}th_week'
out = df.pivot_table(index='variable',columns='week_number',values='value', aggfunc='last').rename(columns=f)
EDIT: to get exactly the same result as the expected one:
out.reset_index().rename_axis(None, axis=1)
Out:
variable the_45th_week the_46th_week the_47th_week the_48th_week
0 price 1829.1039 1752.082 1784.2302 NaN
1 quantity 564.0000 565.000 566.0000 566.0
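Note that Series.dt.week used above is deprecated in newer pandas releases; assuming pandas 1.1 or later, the same ISO week number can be obtained via isocalendar():
df['week_number'] = pd.to_datetime(df['date']).dt.isocalendar().week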

How to create this year_month sales and previous year_month sales in two different columns?

I need to create two different columns from transaction-level data: one for this year_month's sales and one for the previous year_month's sales.
Data format:-
Date | bill amount
2019-07-22 | 500
2019-07-25 | 200
2020-11-15 | 100
2020-11-06 | 900
2020-12-09 | 50
2020-12-21 | 600
Required format:-
Year_month |This month Sales | Prev month sales
2019_07 | 700 | -
2020_11 | 1000 | -
2020_12 | 650 | 1000
The relatively tricky bit is to figure out what the previous month is. We do it by computing the beginning of the month for each date and then rolling back by 1 month. Note that this takes care of the January -> December-of-the-previous-year case.
We start by creating a sample dataframe and importing some useful modules
import pandas as pd
from io import StringIO
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
data = StringIO(
"""
date|amount
2019-07-22|500
2019-07-25|200
2020-11-15|100
2020-11-06|900
2020-12-09|50
2020-12-21|600
""")
df = pd.read_csv(data,sep='|')
df['date'] = pd.to_datetime(df['date'])
df
we get
date amount
0 2019-07-22 500
1 2019-07-25 200
2 2020-11-15 100
3 2020-11-06 900
4 2020-12-09 50
5 2020-12-21 600
Then we figure out the month start and the previous month start using datetime utilities
df['month_start'] = df['date'].apply(lambda d:datetime(year = d.year, month = d.month, day = 1))
df['prev_month_start'] = df['month_start'].apply(lambda d:d+relativedelta(months = -1))
Then we summarize monthly sales using groupby on month start
ms_df = df.drop(columns = 'date').groupby('month_start').agg({'prev_month_start':'first','amount':sum}).reset_index()
ms_df
so we get
month_start prev_month_start amount
0 2019-07-01 2019-06-01 700
1 2020-11-01 2020-10-01 1000
2 2020-12-01 2020-11-01 650
Then we join (merge) ms_df on itself by mapping 'prev_month_start' to 'month_start'
ms_df2 = ms_df.merge(ms_df, left_on='prev_month_start', right_on='month_start', how = 'left', suffixes = ('','_prev'))
We are more or less there, but now we make it pretty by getting rid of superfluous columns, adding labels, etc.
ms_df2['label'] = ms_df2['month_start'].dt.strftime('%Y_%m')
ms_df2 = ms_df2.drop(columns = ['month_start','prev_month_start','month_start_prev','prev_month_start_prev'])
columns = ['label','amount','amount_prev']
ms_df2 = ms_df2[columns]
and we get
| | label | amount | amount_prev |
|---:|--------:|---------:|--------------:|
| 0 | 2019_07 | 700 | nan |
| 1 | 2020_11 | 1000 | nan |
| 2 | 2020_12 | 650 | 1000 |
Using @piterbarg's data, we can use resample combined with shift and concat to get your desired data:
import pandas as pd
from io import StringIO
data = StringIO(
"""
date|amount
2019-07-22|500
2019-07-25|200
2020-11-15|100
2020-11-06|900
2020-12-09|50
2020-12-21|600
"""
)
df = pd.read_csv(data, sep="|", parse_dates=["date"])
df
date amount
0 2019-07-22 500
1 2019-07-25 200
2 2020-11-15 100
3 2020-11-06 900
4 2020-12-09 50
5 2020-12-21 600
Get the sum for current sales:
data = df.resample(on="date", rule="1M").amount.sum().rename("This_month")
data
date
2019-07-31 700
2019-08-31 0
2019-09-30 0
2019-10-31 0
2019-11-30 0
2019-12-31 0
2020-01-31 0
2020-02-29 0
2020-03-31 0
2020-04-30 0
2020-05-31 0
2020-06-30 0
2020-07-31 0
2020-08-31 0
2020-09-30 0
2020-10-31 0
2020-11-30 1000
2020-12-31 650
Freq: M, Name: This_month, dtype: int64
Now we can shift by one month to get the previous month's values, and drop rows that have 0 as total sales to get your final output:
(pd.concat([data, data.shift().rename("previous_month")], axis=1)
.query("This_month!=0")
.fillna(0))
This_month previous_month
date
2019-07-31 700 0.0
2020-11-30 1000 0.0
2020-12-31 650 1000.0
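Another compact option, not taken from either answer but sketched on the same sample data, is to group by a monthly Period and look up each month's predecessor explicitly, so gaps between months come out as NaN rather than picking up the previous row:
import pandas as pd
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-07-22", "2019-07-25", "2020-11-15",
                            "2020-11-06", "2020-12-09", "2020-12-21"]),
    "amount": [500, 200, 100, 900, 50, 600]})
monthly = df.groupby(df["date"].dt.to_period("M"))["amount"].sum()
out = monthly.to_frame("This_month")
# shift by calendar month, not by row position, so 2020-11 gets NaN
out["Prev_month"] = monthly.reindex(monthly.index - 1).to_numpy()
out.index = out.index.strftime("%Y_%m")
print(out)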

Python - Plot Multiple Dataframes by Month and Day (Ignore Year)

I have multiple dataframes containing data from different years.
The data in dataframes are:
>>> its[0].head(5)
Crocs
date
2017-01-01 46
2017-01-08 45
2017-01-15 43
2017-01-22 43
2017-01-29 41
>>> its[1].head(5)
Crocs
date
2018-01-07 23
2018-01-14 21
2018-01-21 23
2018-01-28 21
2018-02-04 25
>>> its[2].head(5)
Crocs
date
2019-01-06 90
2019-01-13 79
2019-01-20 82
2019-01-27 82
2019-02-03 81
I tried to plot all these dataframes in a single figure (graph); I did accomplish that, but it was not what I wanted.
I plotted the dataframes using the following code
>>> for p in its:
...     plt.plot(p.index, p.values)
>>> plt.show()
and I got the following graph, but this is not what I wanted. I want the graph to look like this: simply, I want the graph to ignore years and plot by month and day.
You can try converting the datetime index to integers based on month and day, and plotting against those:
df3 = pd.concat(its, axis=1)
xindex = df3.index.month*30 + df3.index.day
plt.plot(xindex, df3)
plt.show()
If you want date-like labels rather than integers, you can add xticks to the figure:
labels = df3.index.month.astype(str) + "-" + df3.index.day.astype(str)
plt.xticks(df3.index.month*30 + df3.index.day, labels)
plt.show()
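An alternative sketch, not part of the answer above: assuming each frame in its has a DatetimeIndex, every date can be mapped onto a common dummy year (2000, a leap year) so the x-axis keeps real date formatting while the years still overlap:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for p in its:
    common = p.index.map(lambda d: d.replace(year=2000))  # overlay all years
    ax.plot(common, p.values, label=str(p.index[0].year))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b-%d'))
ax.legend()
plt.show()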

Plot Shaded Error Bars from Pandas Agg

I have data in the following format:
| | Measurement 1 | | Measurement 2 | |
|------|---------------|------|---------------|------|
| | Mean | Std | Mean | Std |
| Time | | | | |
| 0 | 17 | 1.10 | 21 | 1.33 |
| 1 | 16 | 1.08 | 21 | 1.34 |
| 2 | 14 | 0.87 | 21 | 1.35 |
| 3 | 11 | 0.86 | 21 | 1.33 |
I am using the following code to generate a matplotlib line graph from this data, which shows the standard deviation as a filled in area, see below:
from matplotlib.ticker import FuncFormatter
def seconds_to_minutes(x, pos):
    minutes = f'{round(x/60, 0)}'
    return minutes
fig, ax = plt.subplots()
mean_temperature_over_time['Measurement 1']['mean'].plot(kind='line', yerr=mean_temperature_over_time['Measurement 1']['std'], alpha=0.15, ax=ax)
mean_temperature_over_time['Measurement 2']['mean'].plot(kind='line', yerr=mean_temperature_over_time['Measurement 2']['std'], alpha=0.15, ax=ax)
ax.set(title="A Line Graph with Shaded Error Regions", xlabel="x", ylabel="y")
formatter = FuncFormatter(seconds_to_minutes)
ax.xaxis.set_major_formatter(formatter)
ax.grid()
ax.legend(['Mean 1', 'Mean 2'])
Output:
This seems like a very messy solution, and only actually produces shaded output because I have so much data. What is the correct way to produce a line graph from the dataframe I have with shaded error regions? I've looked at Plot yerr/xerr as shaded region rather than error bars, but am unable to adapt it for my case.
What's wrong with the linked solution? It seems pretty straightforward.
Allow me to rearrange your dataset so it's easier to load into a Pandas DataFrame:
Time Measurement Mean Std
0 0 1 17 1.10
1 1 1 16 1.08
2 2 1 14 0.87
3 3 1 11 0.86
4 0 2 21 1.33
5 1 2 21 1.34
6 2 2 21 1.35
7 3 2 21 1.33
fig, ax = plt.subplots()
for i, m in df.groupby("Measurement"):
    ax.plot(m.Time, m.Mean)
    ax.fill_between(m.Time, m.Mean - m.Std, m.Mean + m.Std, alpha=0.35)
And here's the result with some randomly generated data:
EDIT
Since the issue is apparently iterating over your particular dataframe format, let me show how you could do it (I'm new to pandas, so there may be better ways). If I understood your screenshot correctly, you should have something like:
Measurement 1 2
Mean Std Mean Std
Time
0 17 1.10 21 1.33
1 16 1.08 21 1.34
2 14 0.87 21 1.35
3 11 0.86 21 1.33
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
(1, Mean) 4 non-null int64
(1, Std) 4 non-null float64
(2, Mean) 4 non-null int64
(2, Std) 4 non-null float64
dtypes: float64(2), int64(2)
memory usage: 160.0 bytes
df.columns
MultiIndex(levels=[[1, 2], [u'Mean', u'Std']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'Measurement', None])
And you should be able to iterate over it and obtain the same plot:
for i, m in df.groupby("Measurement"):
    ax.plot(m["Time"], m['Mean'])
    ax.fill_between(m["Time"],
                    m['Mean'] - m['Std'],
                    m['Mean'] + m['Std'], alpha=0.35)
Or you could restack it to the format above with
(df.stack("Measurement") # stack "Measurement" columns row by row
.reset_index() # make "Time" a normal column, add a new index
.sort_values("Measurement") # group values from the same Measurement
.reset_index(drop=True)) # drop sorted index and make a new one
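For completeness, a self-contained sketch of the shaded-band plot built directly from the MultiIndex frame (the values are copied from the question's table; looping over df.columns.levels[0] is an assumption about the column layout):
import pandas as pd
import matplotlib.pyplot as plt
# rebuild the question's wide frame: (measurement, statistic) columns
cols = pd.MultiIndex.from_product([[1, 2], ["Mean", "Std"]],
                                  names=["Measurement", None])
df = pd.DataFrame([[17, 1.10, 21, 1.33],
                   [16, 1.08, 21, 1.34],
                   [14, 0.87, 21, 1.35],
                   [11, 0.86, 21, 1.33]], columns=cols)
df.index.name = "Time"
fig, ax = plt.subplots()
for meas in df.columns.levels[0]:
    mean, std = df[(meas, "Mean")], df[(meas, "Std")]
    ax.plot(df.index, mean, label=f"Mean {meas}")
    ax.fill_between(df.index, mean - std, mean + std, alpha=0.35)
ax.legend()
plt.show()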
