I have created the following dataframe df:
Setup:
import pandas as pd
import numpy as np
import random
import copy
import feather
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
import plotly.graph_objects as go
import plotly.express as px
import panel as pn
import holoviews as hv
import geoviews as gv
import geoviews.feature as gf
import cartopy
import cartopy.feature as cf
from geoviews import opts
from cartopy import crs as ccrs
import hvplot.pandas
import colorcet as cc
from colorcet.plotting import swatch
#pn.extension() # commented out as this causes an intermittent javascript error
gv.extension("bokeh")
cols = {"name":["Jim","Alice","Bob","Julia","Fern","Bill","Jordan","Pip","Shelly","Mimi"],
"age":[19,26,37,45,56,71,20,36,37,55],
"sex":["Male","Female","Male","Female","Female","Male","Male","Male","Female","Female"],
"age_band":["18-24","25-34","35-44","45-54","55-64","65-74","18-24","35-44","35-44","55-64"],
"insurance_renew_month":[1,2,3,3,3,4,5,5,6,7],
"postcode_prefix":["EH","M","G","EH","EH","M","G","EH","M","EH"],
"postcode_order":[3,2,1,3,3,2,1,3,2,3],
"local_authority_district":["S12000036","E08000003","S12000049","S12000036","S12000036","E08000003","S12000036","E08000003","S12000049","S12000036"],
"blah1":[3,None,None,8,8,None,1,None,None,None],
"blah2":[None,None,None,33,5,None,66,3,22,3],
"blah3":["A",None,"A",None,"C",None,None,None,None,None],
"blah4":[None,None,None,None,None,None,None,None,None,1]}
df = pd.DataFrame.from_dict(cols)
df
Out[2]:
name age sex age_band ... blah1 blah2 blah3 blah4
0 Jim 19 Male 18-24 ... 3.0 NaN A NaN
1 Alice 26 Female 25-34 ... NaN NaN None NaN
2 Bob 37 Male 35-44 ... NaN NaN A NaN
3 Julia 45 Female 45-54 ... 8.0 33.0 None NaN
4 Fern 56 Female 55-64 ... 8.0 5.0 C NaN
5 Bill 71 Male 65-74 ... NaN NaN None NaN
6 Jordan 20 Male 18-24 ... 1.0 66.0 None NaN
7 Pip 36 Male 35-44 ... NaN 3.0 None NaN
8 Shelly 37 Female 35-44 ... NaN 22.0 None NaN
9 Mimi 55 Female 55-64 ... NaN 3.0 None 1.0
[10 rows x 12 columns]
df[["sex","age_band","postcode_prefix"]] = df[["sex","age_band","postcode_prefix"]].astype("category")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
name 10 non-null object
age 10 non-null int64
sex 10 non-null category
age_band 10 non-null category
insurance_renew_month 10 non-null int64
postcode_prefix 10 non-null category
postcode_order 10 non-null int64
local_authority_district 10 non-null object
blah1 4 non-null float64
blah2 6 non-null float64
blah3 3 non-null object
blah4 1 non-null float64
dtypes: category(3), float64(3), int64(3), object(3)
memory usage: 1.3+ KB
The Problem:
I can successfully create a mosaic plot with the following code:
fig,ax = plt.subplots(figsize=(15,10))
mosaic(df,["sex", "age_band"],ax=ax);
However, I am having issues when I try to create a corresponding app using pn.interact:
categoric_cols = df.select_dtypes(include="category")
cat_atts = categoric_cols.columns.tolist()
cat_atts
Out[4]: ['sex', 'age_band', 'postcode_prefix']
def bivar_cat(x="sex",y="age_band"):
    if x in cat_atts and y in cat_atts:
        fig,ax = plt.subplots(figsize=(15,10))
        return mosaic(df,[x,y],ax=ax);
app_df_cat = pn.interact(bivar_cat,x=cat_atts,y=cat_atts)
app_df_cat
Which results in the following:
The rendered mosaic plot seems to correspond to the default values of x and y (i.e. sex and age_band). When you select a new attribute for x or y from the dropdowns, the text above the mosaic plot changes (this text seems to be a string representation of the plot); however, the mosaic plot itself does not.
Is my issue possibly related to having to comment out pn.extension()? When pn.extension() is not commented out, I get an intermittent javascript error: sometimes no error is raised, sometimes there is an error but my panel app still loads, and sometimes there is an error and it crashes my browser. (I have omitted the javascript error here as it can be very large; I can add it to my post if that is helpful.) The error is raised significantly more often than not.
Strangely enough, I haven't observed any difference in other apps that I have created where I have omitted pn.extension() vs including it.
However, as the documentation always specifies that you include it, I would have expected to have to set the appropriate extensions for all my plots to work correctly. (I have plotly, hvplot, holoviews and geoviews plots successfully plotting in these other apps with and without pn.extension() and pn.extension("plotly") included.)
Is it possible to produce panel apps based on mosaic plots?
Thanks
Software Info:
os x Catalina
browser Firefox
python 3.7.5
notebook 6.0.2
pandas 0.25.3
panel 0.7.0
plotly 4.3.0
plotly_express 0.4.1
holoviews 1.12.6
geoviews 1.6.5
hvplot 0.5.2
The statsmodels function mosaic() returns a tuple containing a figure and rects.
What you're seeing via interact is that tuple. The tuple does get updated when you use the dropdowns.
The figure you see below it is the one that jupyter automatically plots once. That one doesn't get updated.
The solution is two-fold:
1) only return the figure, not the tuple
2) prevent jupyter from automatically plotting your figure once by calling plt.close()
In code:
def bivar_cat(x='sex', y='age_band'):
    fig, ax = plt.subplots(figsize=(15,10))
    mosaic(df, [x,y], ax=ax)
    plt.close()
    return fig
app_df_cat = pn.interact(
bivar_cat,
x=cat_atts,
y=cat_atts,
)
app_df_cat
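Why the plt.close() step works can be checked outside of panel. The sketch below assumes only matplotlib; make_closed_figure is a hypothetical helper, and a plain bar chart stands in for the mosaic() call so the example does not depend on statsmodels. Passing the figure explicitly, as in plt.close(fig), closes that specific figure rather than whichever one happens to be current.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
from matplotlib.figure import Figure


def make_closed_figure(heights):
    """Hypothetical helper: draw on a new figure, unregister it from pyplot, return it."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # Stand-in for mosaic(df, [x, y], ax=ax); any call that draws on ax fits here.
    ax.bar(range(len(heights)), heights)
    # Remove the figure from pyplot's list of open figures so jupyter
    # does not display it a second time on its own.
    plt.close(fig)
    return fig


fig = make_closed_figure([3, 1, 2])
# The Figure object is still usable, but pyplot no longer tracks it.
still_open = fig.number in plt.get_fignums()
```

The returned object is an ordinary matplotlib Figure, which is what the interact function should hand back instead of the (figure, rects) tuple.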
I have a dataframe with 990 rows and 7 columns. I want to make an X vs Y line graph, breaking the line every 22 rows.
I think that dividing the dataframe and then plotting the pieces would be a good way, but I don't get good results.
max_rows = 22
dataframes = []
while len(Co1new) > max_rows:
    top = Co1new[:max_rows]
    dataframes.append(top)
    Co1new = Co1new[max_rows:]
else:
    dataframes.append(Co1new)
for grafico in dataframes:
    AC = plt.plot(grafico)
    AC = plt.xlabel('Frequency (Hz)')
    AC = plt.ylabel("Temperature (K)")
plt.show()
The code runs, but it does not plot the right columns.
Here is some reduced data; in this case it should be divided every four rows:
df = pd.DataFrame({
'col1':[2.17073,2.14109,2.16052,2.81882,2.29713,2.26273,2.26479,2.7643,2.5444,2.5027,2.52532,2.6778],
'col2':[10,100,1000,10000,10,100,1000,10000,10,100,1000,10000],
'col3':[2.17169E-4,2.15889E-4,2.10526E-4,1.53785E-4,2.09867E-4,2.07583E-4,2.01699E-4,1.56658E-4,1.94864E-4,1.92924E-4,1.87634E-4,1.58252E-4]})
One way I can think of is to add a new column with labels for every 22 records. See below
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')
"""
Assuming the index is numeric and is from [0-990)
this will return an integer for every 22 records
"""
Co1new['subset'] = 'S' + np.floor_divide(Co1new.index, 22).astype(str)
Out (shown for the reduced data, grouped every 4 rows rather than 22):
col1 col2 col3 subset
0 2.17073 10 0.000217 S0
1 2.14109 100 0.000216 S0
2 2.16052 1000 0.000211 S0
3 2.81882 10000 0.000154 S0
4 2.29713 10 0.000210 S1
5 2.26273 100 0.000208 S1
6 2.26479 1000 0.000202 S1
7 2.76434 10000 0.000157 S1
8 2.54445 10 0.000195 S2
9 2.50270 100 0.000193 S2
10 2.52532 1000 0.000188 S2
11 2.67780 10000 0.000158 S2
You can then use seaborn.pairplot to plot your data pairwise and use Co1new['subset'] as legend.
seaborn.pairplot(Co1new, hue='subset')
Or, if you absolutely need line charts, you can plot one pair of columns at a time. Here is col1 vs. col3:
seaborn.lineplot(x='col1', y='col3', hue='subset', data=Co1new)
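If the goal is simply one line per block of rows, plain pandas and matplotlib are enough. A minimal sketch on the reduced data from the question, assuming a block size of 4 (22 for the full data) and col2 on the x axis against col1 on the y axis; which column holds frequency and which holds temperature is a guess:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [2.17073, 2.14109, 2.16052, 2.81882,
             2.29713, 2.26273, 2.26479, 2.7643,
             2.5444, 2.5027, 2.52532, 2.6778],
    "col2": [10, 100, 1000, 10000] * 3,
})

block_size = 4  # assumption: 22 for the full 990-row data
# Row position divided by the block size gives one group id per block.
block_ids = np.arange(len(df)) // block_size

fig, ax = plt.subplots()
for block_id, block in df.groupby(block_ids):
    # Each block is drawn as its own line, so the line breaks between blocks.
    ax.plot(block["col2"], block["col1"], marker="o", label=f"S{block_id}")
ax.set_xscale("log")
ax.set_xlabel("col2")
ax.set_ylabel("col1")
ax.legend()
```

Grouping by row position rather than by index value means this also works when the index is not a clean 0..n-1 range.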
Using @SIA's answer, with groups of four rows to match the reduced data:
df['groups'] = np.floor_divide(df.index, 4).astype(str)
import plotly.express as px
fig = px.line(df, x="col1", y="col2", color='groups')
fig.show()
I have the following dataframe and I am trying to create a stacked bar plot
import os
from pprint import pprint
import matplotlib.pyplot as plt
import pandas as pd
def classify_data():
    race = ['race1','race1','race1','race1','race2','race2','race2', 'race2']
    qualifier = ['last','first','first','first','last','last','first','first']
    participant = ['rat','rat','cat','cat','rat','dog','dog','dog']
    df = pd.DataFrame(
        {'race':race,
         'qualifier':qualifier,
         'participant':participant
        }
    )
    pprint(df)
    df2 = df.groupby(['race','qualifier'])['race'].count().unstack('qualifier').fillna(0)
    df2[['first','last']].plot(kind='bar', stacked=True)
    plt.show()
classify_data()
I managed to obtain the following plot, but I want to create two plots out of my dataframe.
One plot containing the following data for the qualifier 'last':
Race1 rat 1
Race1 cat 0
Race1 dog 0
Race2 rat 1
Race2 dog 1
Race2 cat 0
So the first bar plot would have 2 bars, each stacked and color-coded by the count per participant.
Likewise, a second plot for qualifier 'first':
EDIT:
Race1 rat 1
Race1 cat 2
Race1 dog 0
Race2 rat 0
Race2 dog 2
Race2 cat 0
From the original dataframe, I have to create the above two dataframes for the stacked plots.
I am not sure how to use the groupby function to get the count of 'participant' for each 'qualifier' for a given 'race'.
EDIT 2: For qualifier 'last' the desired plot would look like this (blue for rat, red for dog).
For qualifier 'first'
Could someone suggest how to proceed from here?
IIUC, this is what you want:
df2 = (df.groupby(['race','qualifier','participant'])
         .size()
         .unstack(level=-1)
         .reset_index()
      )
fig,axes = plt.subplots(1,2,figsize=(12,6),sharey=True)
for ax,q in zip(axes.ravel(),['first','last']):
    tmp_df = df2[df2.qualifier.eq(q)]
    tmp_df.plot.bar(x='race', ax=ax, stacked=True)
Output:
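To get the two count tables from the question directly, including the zero entries, one option is pd.crosstab per qualifier. This is a sketch alongside the answer above, not a replacement for it; counts_for is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "race": ["race1"] * 4 + ["race2"] * 4,
    "qualifier": ["last", "first", "first", "first", "last", "last", "first", "first"],
    "participant": ["rat", "rat", "cat", "cat", "rat", "dog", "dog", "dog"],
})

races = sorted(df["race"].unique())
participants = sorted(df["participant"].unique())


def counts_for(qualifier):
    """Hypothetical helper: race x participant counts for one qualifier."""
    subset = df[df["qualifier"] == qualifier]
    table = pd.crosstab(subset["race"], subset["participant"])
    # Reindex so races/participants with no rows show up as 0 instead of vanishing.
    return table.reindex(index=races, columns=participants, fill_value=0)


last_counts = counts_for("last")
first_counts = counts_for("first")
```

Each table can then be drawn with its own call such as last_counts.plot.bar(stacked=True), giving one stacked bar per race.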
I have a Dash dashboard and I need to plot months 0-12 on the x axis, with multiple lines on the same figure for the different years that have been selected, e.g. 1991-2040. The plotted value is a column, say 'Total', in a dataframe. Each line should be labelled with its year, and the total value goes on the y axis. My data looks like this:
Month Year Total
0 0 1991 31.4
1 0 1992 31.4
2 0 1993 31.4
3 0 1994 20
4 0 1995 300
.. ... ... ...
33 0 2024 31.4
34 1 2035 567
35 1 2035 10
36 1 2035 3
....
Do I need to group it, and how can I achieve that in Dash/Plotly?
It seems to me that you should have a look at pd.pivot_table.
%matplotlib inline
import pandas as pd
import numpy as np
import plotly.offline as py
import plotly.graph_objs as go
# create a df
N = 100
df = pd.DataFrame({"Date":pd.date_range(start='1991-01-01',
                                        periods=N,
                                        freq='M'),
                   "Total":np.random.randn(N)})
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year
# use pivot_table to have years as columns
pv = pd.pivot_table(df,
                    index=["Month"],
                    columns=["Year"],
                    values=["Total"])
# remove multiindex in columns
pv.columns = [col[1] for col in pv.columns]
data = [go.Scatter(x = pv.index,
                   y = pv[col],
                   name = col)
        for col in pv.columns]
py.iplot(data)
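Note that pivot_table averages duplicate Month/Year pairs by default (aggfunc="mean"). The sample in the question has several rows for month 1 of 2035, so if those should be added up, aggfunc="sum" is needed. A small sketch on rows copied from the question; the choice of sum is an assumption about what is wanted. Passing values as a plain string rather than a list also avoids the MultiIndex columns in the first place.

```python
import pandas as pd

# A few rows copied from the question; month 1 of 2035 appears three times.
df = pd.DataFrame({
    "Month": [0, 0, 1, 1, 1],
    "Year": [1991, 1992, 2035, 2035, 2035],
    "Total": [31.4, 31.4, 567.0, 10.0, 3.0],
})

# values="Total" (a string, not a list) keeps the columns a flat index of years.
pv = pd.pivot_table(df, index="Month", columns="Year", values="Total", aggfunc="sum")

# Each column of pv is now one year's series over the months.
years = list(pv.columns)
total_2035_month_1 = pv.loc[1, 2035]
```

Month/Year combinations with no rows come out as NaN, which the plotting libraries generally render as gaps in the line.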
I am trying to create a line graph in matplotlib from a dataframe with 10488 rows and 3 columns. My dataframe looks like the following:
col_A col_B col_C
target_id
KYQ35740 22.67 19.7 26.0
KYQ35675 9.21 3.2 3.1
KYQ35736 73.93 42.8 24.6
KYQ35737 349.94 602.6 212.4
KYQ35685 16.10 19.5 29.1
Here, target_id is the index. My attempt was:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
df = pd.read_csv("Data.txt", sep='\t', index_col=['target_id'])
df.plot()
I get a bar graph with the target ids on the x axis and three colored bars representing the columns. However, I need the transpose of it, i.e. the col_A, col_B, col_C labels on the x axis, with 10488 lines, one per target_id. I don't need the target_ids in the legend.
I tried transposing the df with df.T followed by df.plot(), but the system hangs, which I believe is due to the 10488 labels that need to be put in the legend.
Thanks in advance for your help.
AP
If you want to get rid of the legend, you can use legend=False.
import pandas as pd
import io
import matplotlib.pyplot as plt
u = u"""target_id col_A col_B col_C
KYQ35740 22.67 19.7 26.0
KYQ35675 9.21 3.2 3.1
KYQ35736 73.93 42.8 24.6
KYQ35737 349.94 602.6 212.4
KYQ35685 16.10 19.5 29.1"""
df = pd.read_csv(io.StringIO(u), sep=r"\s+", index_col=['target_id'])
df=df.T
df.plot(legend=False)
plt.show()
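With 10488 rows, df.T.plot() still builds one labelled series per target_id. A possible alternative, sketched here on the five sample rows, is to hand the whole array to matplotlib in a single call; the color, alpha and linewidth values are arbitrary choices, and whether this is fast enough for the full data is untested.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

text = """target_id col_A col_B col_C
KYQ35740 22.67 19.7 26.0
KYQ35675 9.21 3.2 3.1
KYQ35736 73.93 42.8 24.6
KYQ35737 349.94 602.6 212.4
KYQ35685 16.10 19.5 29.1"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+", index_col="target_id")

fig, ax = plt.subplots()
positions = list(range(len(df.columns)))
# A 2D y array is plotted column by column, so transposing the values
# gives one line per row (target_id) of df, with no legend entries.
ax.plot(positions, df.to_numpy().T, color="tab:blue", alpha=0.3, linewidth=0.8)
ax.set_xticks(positions)
ax.set_xticklabels(df.columns)
```

A low alpha helps when thousands of lines overlap, since dense regions then show up darker.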