Matplotlib - bar chart does not start with 0 - python-3.x

I have some data that I want to present as a bar chart.
data:
col1 = ['2018 01 01', '2018 01 02', '2018 12 27'] #dates
col2 = ['4554', '14120', '1422'] #usage of the user in seconds for that data in col1
my code:
I have imported all of the modules
import openpyxl as ol
import numpy as np
import matplotlib.pyplot as plt
plt.bar(col1, col2, label="Usage of the user")
plt.xlabel("Date")
plt.ylabel("Usage in seconds")
plt.title('Usage report of ' + str(args.user))
plt.legend()
plt.savefig("data.png")
When I open data.png, the graph looks all over the place. I want it to start at zero.
I am new to matplotlib and openpyxl.
Any help is appreciated.

It appears that the issue is that the values in col2 plotted on the y-axis are strings rather than integers. Matplotlib treats strings as categories and places them on the axis in the order they appear, not by numeric value. Converting these values to integers gives a numeric y-axis that starts at 0 and is in sequential order.
col1 = ['2018 01 01', '2018 01 02', '2018 12 27'] #dates
col2 = ['4554', '14120', '1422']
plt.bar(col1, [int(x) for x in col2], label="Usage of the user")
plt.xlabel("Date")
plt.ylabel("Usage in seconds")
plt.title('Usage report')
plt.legend()
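For reference, here is a self-contained sketch of the same fix that can be run as a script. The Agg backend and the heights and y_bottom variables are my additions for checking the result; they are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

col1 = ['2018 01 01', '2018 01 02', '2018 12 27']  # dates
col2 = ['4554', '14120', '1422']                   # usage in seconds, stored as strings

# Convert the string values to integers before plotting
usage = [int(x) for x in col2]

fig, ax = plt.subplots()
bars = ax.bar(col1, usage, label="Usage of the user")
ax.set_xlabel("Date")
ax.set_ylabel("Usage in seconds")
ax.legend()

# Inspect the result: numeric bar heights and the lower y-axis limit
heights = [bar.get_height() for bar in bars]
y_bottom = ax.get_ylim()[0]
```

With numeric heights the bars are anchored at zero, so the lower y-limit should come out as 0.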

Related

Plotting datetimes in matplotlib producing many colors

I am new to Python and am trying to plot datetime data in matplotlib, but I am getting a strange result: I can only plot points, and they are a myriad of different colors. I am using plot_date().
I tried generating a workable example but the problem wouldn't show up there (see below). So here is a sample of the database that is giving problems.
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
#get a sense of what the data looks like:
data.head()
out:
date variable value unit
0 2020-04-17 10:30:02.309433 Temperature 20.799999 C
2 2020-04-17 10:45:12.089008 Temperature 20.799999 C
4 2020-04-17 11:00:07.033692 Temperature 20.799999 C
6 2020-04-17 11:15:04.457991 Temperature 20.799999 C
8 2020-04-17 11:30:04.996910 Temperature 20.799999 C
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 99 entries, 0 to 196
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 99 non-null object
1 variable 99 non-null object
2 value 98 non-null float64
3 unit 99 non-null object
dtypes: float64(1), object(3)
memory usage: 3.9+ KB
#convert date variable to datetime
data['date'] = pd.to_datetime(data['date'])
#plot with plot_date, calling date2num on date variable
plt.plot_date([mdates.date2num(data['date'])], [data['value']])
Gives:
Why am I getting all these colored points? When I build a small data set of three time periods I don't see this behavior. Instead I get three blue points:
#create dataframe
df = pd.DataFrame({'time': ['2020-04-17 10:30:02.309433', '2020-04-17 10:30:02.309455', '2020-04-17 10:45:12.089008'],
'value': [20.799999, 41.099998, 47.599998]})
#change time variable to datetime object
df['time'] = pd.to_datetime(df['time'])
#plot
plt.plot_date(mdates.date2num(df['time']), df['value'])
Gives three blue dots as expected:
Finally, how can I produce a line plot using plot_date()? The only way I have seen to do this is by using datetime.datetime.now() date formats and calling pyplot.plot(); see the second answer here: Plotting time in Python with Matplotlib
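The difference between plt.plot_date([mdates.date2num(data['date'])], [data['value']]) and plt.plot_date(mdates.date2num(df['time']), df['value']) is that you have an extra set of square brackets. The brackets turn each input into a 2D array with a single row, so every point is drawn as its own series with its own color.
As for the line, pass the fmt='-' option to plot_date.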
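A small sketch of why the extra brackets matter. It uses Axes.plot as a stand-in for plot_date (my substitution, since plot handles 2D input the same way), with made-up timestamps and values; the Agg backend is also an assumption so that it runs without a display.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up sample data, not taken from the question's database
times = [datetime(2020, 4, 17, 10, 30),
         datetime(2020, 4, 17, 10, 45),
         datetime(2020, 4, 17, 11, 0)]
values = [20.8, 41.1, 47.6]
x = mdates.date2num(times)

fig, ax = plt.subplots()

# Extra brackets: both inputs become 1 x 3 arrays, and each column is
# treated as a separate series, so there is one line object per point
nested = ax.plot([x], [values], 'o')

# No extra brackets: a single series, drawn here as a connected line
flat = ax.plot(x, values, '-')

ax.xaxis_date()  # format the x-axis ticks as dates
```

Here nested holds one line object per data point, while flat holds a single line object for the whole series.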

How to split a dataframe and plot some columns

I have a dataframe with 990 rows and 7 columns. I want to make an X vs Y line graph, breaking the line every 22 rows.
I think that splitting the dataframe and then plotting each piece would be a good way to do it, but I am not getting good results.
max_rows = 22
dataframes = []
while len(Co1new) > max_rows:
    top = Co1new[:max_rows]
    dataframes.append(top)
    Co1new = Co1new[max_rows:]
else:
    dataframes.append(Co1new)
for grafico in dataframes:
    AC = plt.plot(grafico)
    AC = plt.xlabel('Frequency (Hz)')
    AC = plt.ylabel("Temperature (K)")
plt.show()
The code runs, but it is not plotting the right columns.
Here is some reduced data; in this case it should be split every four rows:
df = pd.DataFrame({
'col1':[2.17073,2.14109,2.16052,2.81882,2.29713,2.26273,2.26479,2.7643,2.5444,2.5027,2.52532,2.6778],
'col2':[10,100,1000,10000,10,100,1000,10000,10,100,1000,10000],
'col3':[2.17169E-4,2.15889E-4,2.10526E-4,1.53785E-4,2.09867E-4,2.07583E-4,2.01699E-4,1.56658E-4,1.94864E-4,1.92924E-4,1.87634E-4,1.58252E-4]})
One way I can think of is to add a new column with labels for every 22 records. See below
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')
"""
Assuming the index is numeric and is from [0-990)
this will return an integer for every 22 records
"""
Co1new['subset'] = 'S' + np.floor_divide(Co1new.index, 22).astype(str)
Out:
col1 col2 col3 subset
0 2.17073 10 0.000217 S0
1 2.14109 100 0.000216 S0
2 2.16052 1000 0.000211 S0
3 2.81882 10000 0.000154 S0
4 2.29713 10 0.000210 S1
5 2.26273 100 0.000208 S1
6 2.26479 1000 0.000202 S1
7 2.76434 10000 0.000157 S1
8 2.54445 10 0.000195 S2
9 2.50270 100 0.000193 S2
10 2.52532 1000 0.000188 S2
11 2.67780 10000 0.000158 S2
You can then use seaborn.pairplot to plot your data pairwise and use Co1new['subset'] as legend.
seaborn.pairplot(Co1new, hue='subset')
Or if you absolutely need line charts, you can make line charts of your data, each pair at a time separately, here is col1 vs. col3
seaborn.lineplot(x='col1', y='col3', hue='subset', data=Co1new)
Using @SIA's answer
df['groups'] = np.floor_divide(df.index, 3).astype(str)
import plotly.express as px
fig = px.line(df, x="col1", y="col2", color='groups')
fig.show()
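The same idea can also be sketched with plain matplotlib by grouping on the row position and drawing one line per group. Which columns go on which axis is my assumption (col2 on a log-scaled x axis, col1 on y), and rows_per_line is a hypothetical name; it is 4 for the reduced data and would be 22 for the full dataframe.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Reduced data from the question
df = pd.DataFrame({
    'col1': [2.17073, 2.14109, 2.16052, 2.81882, 2.29713, 2.26273,
             2.26479, 2.7643, 2.5444, 2.5027, 2.52532, 2.6778],
    'col2': [10, 100, 1000, 10000, 10, 100, 1000, 10000, 10, 100, 1000, 10000],
    'col3': [2.17169E-4, 2.15889E-4, 2.10526E-4, 1.53785E-4, 2.09867E-4, 2.07583E-4,
             2.01699E-4, 1.56658E-4, 1.94864E-4, 1.92924E-4, 1.87634E-4, 1.58252E-4]})

rows_per_line = 4  # hypothetical name; use 22 for the full dataframe

# Integer group id per row, based on position rather than on the index values
group_ids = np.arange(len(df)) // rows_per_line

fig, ax = plt.subplots()
for group_id, chunk in df.groupby(group_ids):
    # One separate line per chunk, so consecutive chunks are not connected
    ax.plot(chunk['col2'], chunk['col1'], marker='o', label=f"group {group_id}")

ax.set_xscale('log')
ax.set_xlabel('col2')
ax.set_ylabel('col1')
ax.legend()

n_lines = len(ax.get_lines())
```

Grouping on position instead of df.index keeps this working even if the index is not a clean 0..N-1 range.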

Change the bar item name in Pandas

I have a test excel file like:
df = pd.DataFrame({'name':list('abcdefg'),
'age':[10,20,5,23,58,4,6]})
print (df)
name age
0 a 10
1 b 20
2 c 5
3 d 23
4 e 58
5 f 4
6 g 6
I use Pandas and matplotlib to read and plot it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
excel_file = 'test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0)
df.plot(kind="bar")
plt.show()
the result shows:
It uses the index number as the item label. How can I change it to the names stored in the name column?
You can specify columns for x and y values in plot.bar:
df.plot(x='name', y='age', kind="bar")
Or create Series first by DataFrame.set_index and select age column:
df.set_index('name')['age'].plot(kind="bar")
#if multiple columns
#df.set_index('name').plot(kind="bar")

Plot values for multiple months and years in Plotly/Dash

I have a Dash dashboard and I need to plot months from 0-12 on the x axis, with multiple lines on the same figure for the years that have been selected, e.g. 1991-2040. The plotted value is a column, say 'Total', in a dataframe. Each line should be labelled with its year, and the total value goes on the y axis. My data looks like this:
Month Year Total
0 0 1991 31.4
1 0 1992 31.4
2 0 1993 31.4
3 0 1994 20
4 0 1995 300
.. ... ... ...
33 0 2024 31.4
34 1 2035 567
35 1 2035 10
36 1 2035 3
....
Do I need to group it and how to achieve that in Dash/Plotly?
It seems to me that you should have a look at pd.pivot_table.
%matplotlib inline
import pandas as pd
import numpy as np
import plotly.offline as py
import plotly.graph_objs as go
# create a df
N = 100
df = pd.DataFrame({"Date":pd.date_range(start='1991-01-01',
periods=N,
freq='M'),
"Total":np.random.randn(N)})
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year
# use pivot_table to have years as columns
pv = pd.pivot_table(df,
index=["Month"],
columns=["Year"],
values=["Total"])
# remove multiindex in columns
pv.columns = [col[1] for col in pv.columns]
data = [go.Scatter(x = pv.index,
y = pv[col],
name = col)
for col in pv.columns]
py.iplot(data)
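To see the reshaping step on its own, here is a minimal sketch that swaps Plotly for pandas' matplotlib-based plotting (my substitution, not the approach above) and uses made-up values. Note that pivot_table averages duplicate Month/Year pairs by default; passing aggfunc="sum" would add them up instead, which may matter if the data has several rows for the same month and year.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per (Month, Year) pair
records = [(month, year, float(year - 1990) * month)
           for year in (1991, 1992, 1993)
           for month in range(1, 13)]
df = pd.DataFrame(records, columns=["Month", "Year", "Total"])

# Months become the index and each year becomes its own column
pv = df.pivot_table(index="Month", columns="Year", values="Total")

# One line per year column, with months on the x axis
ax = pv.plot()
ax.set_xlabel("Month")
ax.set_ylabel("Total")
```

Passing values as a plain string rather than a list keeps the columns flat, so no MultiIndex cleanup is needed afterwards.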

How to stop months being ordered alphabetically in pandas pivot table

How can I stop pandas from converting my chronologically-ordered data in a csv into alphabetical order (like in my current plot)? This is the code I am using:
import pandas as pd
import seaborn as sns
df = pd.read_csv("C:/Users/Paul/Desktop/calendar.csv")
df2 = df.pivot(index="Month", columns="Year", values="hPM2.5")
ax = sns.heatmap(df2, annot=True, fmt="d")
I think you can use ordered categorical:
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'Month':['January','February','September'],
'Year':[2015,2015,2016],
'hPM2.5':[7,8,9]})
print (df)
Month Year hPM2.5
0 January 2015 7
1 February 2015 8
2 September 2016 9
cats = ['January','February','March','April','May','June',
'July','August','September','October','November','December']
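# astype('category', ordered=True, categories=cats) is no longer accepted
# by current pandas; pd.Categorical takes the same information
df['Month'] = pd.Categorical(df['Month'], categories=cats, ordered=True)
df2 = df.pivot(index="Month", columns="Year", values="hPM2.5")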
sns.heatmap(df2, annot=True)
