How to Successfully Produce Mosaic Plots in Pyviz Panel Apps?
I have created the following dataframe df:
Setup:
import pandas as pd
import numpy as np
import random
import copy
import feather
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
import plotly.graph_objects as go
import plotly.express as px
import panel as pn
import holoviews as hv
import geoviews as gv
import geoviews.feature as gf
import cartopy
import cartopy.feature as cf
from geoviews import opts
from cartopy import crs as ccrs
import hvplot.pandas
import colorcet as cc
from colorcet.plotting import swatch
#pn.extension() # commented out as this causes an intermittent javascript error
gv.extension("bokeh")
cols = {"name":["Jim","Alice","Bob","Julia","Fern","Bill","Jordan","Pip","Shelly","Mimi"],
"age":[19,26,37,45,56,71,20,36,37,55],
"sex":["Male","Female","Male","Female","Female","Male","Male","Male","Female","Female"],
"age_band":["18-24","25-34","35-44","45-54","55-64","65-74","18-24","35-44","35-44","55-64"],
"insurance_renew_month":[1,2,3,3,3,4,5,5,6,7],
"postcode_prefix":["EH","M","G","EH","EH","M","G","EH","M","EH"],
"postcode_order":[3,2,1,3,3,2,1,3,2,3],
"local_authority_district":["S12000036","E08000003","S12000049","S12000036","S12000036","E08000003","S12000036","E08000003","S12000049","S12000036"],
"blah1":[3,None,None,8,8,None,1,None,None,None],
"blah2":[None,None,None,33,5,None,66,3,22,3],
"blah3":["A",None,"A",None,"C",None,None,None,None,None],
"blah4":[None,None,None,None,None,None,None,None,None,1]}
df = pd.DataFrame.from_dict(cols)
df
Out[2]:
name age sex age_band ... blah1 blah2 blah3 blah4
0 Jim 19 Male 18-24 ... 3.0 NaN A NaN
1 Alice 26 Female 25-34 ... NaN NaN None NaN
2 Bob 37 Male 35-44 ... NaN NaN A NaN
3 Julia 45 Female 45-54 ... 8.0 33.0 None NaN
4 Fern 56 Female 55-64 ... 8.0 5.0 C NaN
5 Bill 71 Male 65-74 ... NaN NaN None NaN
6 Jordan 20 Male 18-24 ... 1.0 66.0 None NaN
7 Pip 36 Male 35-44 ... NaN 3.0 None NaN
8 Shelly 37 Female 35-44 ... NaN 22.0 None NaN
9 Mimi 55 Female 55-64 ... NaN 3.0 None 1.0
[10 rows x 12 columns]
df[["sex","age_band","postcode_prefix"]] = df[["sex","age_band","postcode_prefix"]].astype("category")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
name 10 non-null object
age 10 non-null int64
sex 10 non-null category
age_band 10 non-null category
insurance_renew_month 10 non-null int64
postcode_prefix 10 non-null category
postcode_order 10 non-null int64
local_authority_district 10 non-null object
blah1 4 non-null float64
blah2 6 non-null float64
blah3 3 non-null object
blah4 1 non-null float64
dtypes: category(3), float64(3), int64(3), object(3)
memory usage: 1.3+ KB
The Problem:
I can successfully create a mosaic plot with the following code:
fig,ax = plt.subplots(figsize=(15,10))
mosaic(df,["sex", "age_band"],ax=ax);
However, I am having issues when I try to create a corresponding app using pn.interact:
categoric_cols = df.select_dtypes(include="category")
cat_atts = categoric_cols.columns.tolist()
cat_atts
Out[4]: ['sex', 'age_band', 'postcode_prefix']
def bivar_cat(x="sex",y="age_band"):
    if x in cat_atts and y in cat_atts:
        fig,ax = plt.subplots(figsize=(15,10))
        return mosaic(df,[x,y],ax=ax);
app_df_cat = pn.interact(bivar_cat,x=cat_atts,y=cat_atts)
app_df_cat
Which results in the following:
The rendered mosaic plot above seems to correspond to the default values of x and y (i.e. sex and age_band). When you select a new attribute for x or y from the dropdowns, the text above the mosaic plot changes (this text seems to be a string representation of the plot), but the mosaic plot itself does not.
Is my issue possibly related to having to comment out pn.extension()? When pn.extension() is not commented out, I get an intermittent JavaScript error: sometimes no error is raised, sometimes there is an error but my panel app still loads, and sometimes there is an error and it crashes my browser. (I have omitted the JavaScript error here as it can be very large; I can add it to my post if that is helpful.) The error is raised significantly more often than not.
Strangely enough, I haven't observed any difference in other apps that I have created where I have omitted pn.extension() vs including it.
However, as the documentation always specifies that you include it, I would have expected to have to set the appropriate extensions for all my plots to work correctly. (I have plotly, hvplot, holoviews and geoviews plots successfully plotting in these other apps with and without pn.extension() and pn.extension("plotly") included.)
Is it possible to produce panel apps based on mosaic plots?
Thanks
Software Info:
os x Catalina
browser Firefox
python 3.7.5
notebook 6.0.2
pandas 0.25.3
panel 0.7.0
plotly 4.3.0
plotly_express 0.4.1
holoviews 1.12.6
geoviews 1.6.5
hvplot 0.5.2
The statsmodels function mosaic() returns a tuple of (figure, rects), not just a figure.
What you're seeing via interact is the string representation of that tuple. The tuple does get updated when you use the dropdowns, which is why the text changes.
The figure you see below it is the one that Jupyter automatically displays once, when the figure is first created. That figure is not connected to interact, so it never gets updated.
The solution is two-fold:
1) only return the figure, not the tuple
2) prevent jupyter from automatically plotting your figure once with plt.close()
In code:
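def bivar_cat(x='sex', y='age_band'):
    fig, ax = plt.subplots(figsize=(15,10))
    mosaic(df, [x,y], ax=ax)  # draws onto ax; the returned (fig, rects) tuple is discarded
    plt.close()  # stops Jupyter from displaying the figure a second time
    return fig
app_df_cat = pn.interact(
    bivar_cat,
    x=cat_atts,
    y=cat_atts,
)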
app_df_cat
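The close-then-return pattern can be checked outside a notebook. The sketch below is a minimal illustration, not the answerer's code: it assumes the non-interactive Agg backend, uses a plain bar chart as a stand-in for mosaic(), and the names make_figure, open_figures and png_bytes are made up for the example. It suggests that a figure closed through pyplot is no longer tracked by pyplot (so there is nothing left for Jupyter to auto-display), while whoever holds the returned Figure can still render it.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt


def make_figure():
    """Build a figure, detach it from pyplot, and return only the Figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # Stand-in for mosaic(df, [x, y], ax=ax): any drawing call that targets ax
    ax.bar(["Male", "Female"], [5, 5])
    plt.close(fig)  # remove the figure from pyplot's registry; the object survives
    return fig


fig = make_figure()
open_figures = plt.get_fignums()  # figure numbers pyplot still tracks

# A closed figure can still be rendered by the code that holds it
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

Returning only fig matters because the object handed back to pn.interact is what gets displayed; a tuple would be shown as text, as in the question.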
Related
Calculation of percentile and mean
I want to find the 3rd percentile of the following data and then average the data. The data structure is shown below.
0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
...     ...
96927   NaN
96928   NaN
96929   NaN
96930   NaN
96931   NaN
The data of interest lies in rows 13240:61156. Here is my code:
import pandas as pd
import numpy as np
load_var = pd.read_excel(r'path\file name.xlsx')
load_var
a = pd.DataFrame(load_var['column whose percentile is to be found'])
print(a)
b = np.nanpercentile(a, 3)
print(b)
Please suggest changes to the code. Thank you.
Use Series.quantile together with mean in Series.agg:
df = pd.DataFrame({
    'col': [7, 8, 9, 4, 2, 3, np.nan],
})
f = lambda x: x.quantile(0.03)
f.__name__ = 'q'
s = df['col'].agg(['mean', f])
print(s)
mean    5.50
q       2.15
Name: col, dtype: float64
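To see where the 2.15 comes from, here is a dependency-free sketch of a quantile with linear interpolation, which is the default interpolation of Series.quantile. The helper linear_quantile is a hypothetical name written for this illustration and is not part of pandas; it skips NaN values, as pandas does.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks; NaN is skipped."""
    data = sorted(v for v in values if not math.isnan(v))
    position = (len(data) - 1) * q          # fractional rank of the quantile
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)   # clamp so q=1.0 stays in range
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


col = [7, 8, 9, 4, 2, 3, float("nan")]
valid = [v for v in col if not math.isnan(v)]

q03 = linear_quantile(col, 0.03)   # rank 0.15 lies between 2 and 3
mean = sum(valid) / len(valid)     # mean of the six non-NaN values
```

With six valid values the 3% quantile sits at rank 0.15, so it is 15% of the way from the smallest value (2) to the next one (3).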
How to plot Histogram on specific data
I am reading a CSV file:
Notation  Level  RFResult  PRIResult  PDResult  Total Result
AAA       1      1.23      0          2         3.23
AAA       1      3.4       1          0         4.4
BBB       2      0.26      1          1.42      2.68
BBB       2      0.73      1          1.3       3.03
CCC       3      0.30      0          2.73      3.03
DDD       4      0.25      1          1.50      2.75
...       ...
Here is the code:
import pandas as pd
df = pd.read_csv(r'home\NewFiles\Files.csv')
Notation = df['Notation']
Level = df['Level']
RFResult = df['RFResult']
PRIResult = df['PRIResult']
PDResult = df['PDResult']
df.groupby('Level').plot(kind='bar')
The code above gives me four different figures. I want to change a few things:
I don't want to show the Level and Total Result bars in the graph. How should I remove them?
Also, how should I label the x-axis and y-axis and set the title of each plot? I want the title of each plot to be its level number.
To plot, use the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r'home\NewFiles\Files.csv')  # raw string, so \N is not read as an escape sequence
plt.hist((df['RFResult'], df['PRIResult'], df['PDResult']), bins=10)
plt.title('Level Number')
plt.xlabel('Label name')
plt.ylabel('Label name')
plt.show()
You can do:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r'home\NewFiles\Files.csv')  # raw string, so \N is not read as an escape sequence
df.plot(kind='hist', y=['RFResult', 'PRIResult', 'PDResult'], bins=20)
plt.title('level numbers')
plt.xlabel('X-Label')
plt.ylabel('Y-Label')
Remember that the plot is called through pandas but is built on matplotlib, so you can pass additional matplotlib arguments.
Plotting datetimes in matplotlib producing many colors
I am new to Python and am trying to plot datetime data in matplotlib, but I am getting a strange result: I can only plot points, and they are a myriad of different colors. I am using plot_date(). I tried generating a workable example but the problem wouldn't show up there (see below). So here is a sample of the data that is giving problems.
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
#get a sense of what the data looks like:
data.head()
out:
   date                        variable     value      unit
0  2020-04-17 10:30:02.309433  Temperature  20.799999  C
2  2020-04-17 10:45:12.089008  Temperature  20.799999  C
4  2020-04-17 11:00:07.033692  Temperature  20.799999  C
6  2020-04-17 11:15:04.457991  Temperature  20.799999  C
8  2020-04-17 11:30:04.996910  Temperature  20.799999  C
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 99 entries, 0 to 196
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   date      99 non-null     object
 1   variable  99 non-null     object
 2   value     98 non-null     float64
 3   unit      99 non-null     object
dtypes: float64(1), object(3)
memory usage: 3.9+ KB
#convert date variable to datetime
data['date'] = pd.to_datetime(data['date'])
#plot with plot_date, calling date2num on date variable
plt.plot_date([mdates.date2num(data['date'])], [data['value']])
This gives points in many different colors. Why am I getting all these colored points? When I build a small data set of three time periods I don't see this behavior. Instead I get three blue points:
#create dataframe
df = pd.DataFrame({'time': ['2020-04-17 10:30:02.309433', '2020-04-17 10:30:02.309455', '2020-04-17 10:45:12.089008'],
                   'value': [20.799999, 41.099998, 47.599998]})
#change time variable to datetime object
df['time'] = pd.to_datetime(df['time'])
#plot
plt.plot_date(mdates.date2num(df['time']), df['value'])
This gives three blue dots, as expected.
Finally, how can I produce a line plot using plot_date()? The only way I have seen to do this is using datetime.datetime.now() date formats and calling pyplot.plot(); see the second answer here: Plotting time in Python with Matplotlib
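The difference between
plt.plot_date([mdates.date2num(data['date'])], [data['value']])
and
plt.plot_date(mdates.date2num(df['time']), df['value'])
is that the first call has an extra set of square brackets. Wrapping each argument in a list makes it two-dimensional with a single row, so matplotlib treats every column as a separate series and draws each point as its own one-point series in the next color of the color cycle. Remove the brackets to get a single series.
As for the line, add the fmt='-' option to plot_date.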
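The effect of the extra brackets can be reproduced with a few points. This is a minimal sketch under some assumptions: it uses Axes.plot rather than plot_date (plot_date is deprecated in recent matplotlib releases), the Agg backend so it runs without a display, and made-up sample values; nested and flat are illustrative names.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

times = [
    dt.datetime(2020, 4, 17, 10, 30),
    dt.datetime(2020, 4, 17, 10, 45),
    dt.datetime(2020, 4, 17, 11, 0),
]
values = [20.8, 41.1, 47.6]
nums = mdates.date2num(times)  # dates as floats, as in the question

fig, ax = plt.subplots()

# Extra brackets: both inputs become shape (1, 3), so each column is its own series
nested = ax.plot([nums], [values], "o")

# No extra brackets: one series with three points, drawn as a line
flat = ax.plot(nums, values, "-")

ax.xaxis_date()  # show the float x values as dates on the axis
plt.close(fig)
```

Here nested holds one Line2D per data point, each taking the next color from the color cycle, while flat holds a single Line2D for the whole series.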
How to Animate multiple columns as dots with matplotlib from pandas dataframe with NaN in python
I have a question that may have been asked before, but I couldn't find any post describing my problem.
I have two pandas dataframes with the same index, one holding x coordinates and the other y coordinates. Each column represents a car that started at a specific timestep, was logged at every step until it arrived, and then stopped logging. Every time a car starts its route, a column is added to each dataframe, and the coordinates of each step are added to each frame (at every step it moves through space and therefore has new x,y coordinates) (see the example x-coordinate dataframe).
I am trying to animate the tracks of each car by plotting the coordinates in an animated graph, but I cannot get it to work. My code:
%matplotlib notebook
from matplotlib import animation
from JSAnimation import IPython_display
from IPython.display import HTML
fig = plt.figure(figsize=(10,10))
#ax = plt.axes()
#nx.draw_networkx_edges(w_G,nx.get_node_attributes(w_G, 'pos'))
n_steps = simulation.x_df.index
def init():
    graph, = plt.plot([],[],'o')
    return graph,
def get_data_x(i):
    return simulation.x_df.loc[i]
def get_data_y(i):
    return simulation.y_df.loc[i]
def animate(i):
    x = get_data_x(i)
    y = get_data_y(i)
    graph.set_data(x,y)
    return graph,
animation.FuncAnimation(fig, animate, frames=100, init_func=init, repeat=True)
plt.show()
It does not plot anything, so any help would be very much appreciated.
EDIT: Minimal, Complete, and Verifiable example!
Here are two simple examples of the x and y dataframes that I have. Each has the same index.
import random
import geopandas as gpd
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import math
import pandas as pd
from shapely.geometry import Point
from matplotlib import animation
from JSAnimation import IPython_display
%matplotlib inline
[IN]:
df_x = pd.DataFrame(data=np.array([[np.nan, np.nan, np.nan, np.nan],
                                   [4, np.nan, np.nan, np.nan],
                                   [7, 12, np.nan, np.nan],
                                   [6, 18, 12, 9]]),
                    index=[1, 2, 3, 4], columns=[1, 2, 3, 4])
gives:
[OUT]
     1     2     3    4
1  NaN   NaN   NaN  NaN
2  4.0   NaN   NaN  NaN
3  7.0  12.0   NaN  NaN
4  6.0  18.0  12.0  9.0
And the y coordinate dataframe:
[IN]
df_y = pd.DataFrame(data=np.array([[np.nan, np.nan, np.nan, np.nan],
                                   [6, np.nan, np.nan, np.nan],
                                   [19, 2, np.nan, np.nan],
                                   [4, 3, 1, 12]]),
                    index=[1, 2, 3, 4], columns=[1, 2, 3, 4])
gives:
[OUT]
      1    2    3     4
1   NaN  NaN  NaN   NaN
2   6.0  NaN  NaN   NaN
3  19.0  2.0  NaN   NaN
4   4.0  3.0  1.0  12.0
Now I want to create an animation in which each frame plots the x and y coordinates of every column for one row of both dataframes. In this example, frame 1 should not contain any plot. Frame 2 should plot point (4.0, 6.0) (column 1). Frame 3 should plot point (7.0, 19.0) (column 1) and point (12.0, 2.0) (column 2). Frame 4 should plot point (6.0, 4.0) (column 1), point (18.0, 3.0) (column 2), point (12.0, 1.0) (column 3) and point (9.0, 12.0) (column 4).
I wrote the following code to animate this:
[IN]
%matplotlib notebook
from matplotlib import animation
from JSAnimation import IPython_display
from IPython.display import HTML
fig = plt.figure(figsize=(10,10))
#ax = plt.axes()
graph, = plt.plot([],[],'o')
def get_data_x(i):
    return df_x.loc[i]
def get_data_y(i):
    return df_y.loc[i]
def animate(i):
    x = get_data_x(i)
    y = get_data_y(i)
    graph.set_data(x,y)
    return graph,
animation.FuncAnimation(fig, animate, frames=4, repeat=True)
plt.show()
But this does not give any output. Any suggestions?
I've reformatted your code, but I think your main issue was that your dataframes start with an index of 1, while calling the animation with frames=4 calls update() with i=[0,1,2,3]. Therefore get_data_x(0) raises KeyError: 'the label [0] is not in the [index]'.
As per the documentation, frames= can be passed an iterable instead of an int. Here, I simply pass the index of your dataframe, and the function will iterate and call update() with each value. Actually, I decided to pass the intersection of your two dataframe indexes; that way, if an index value is present in one dataframe but not the other, it will not raise an error. If you are guaranteed that your two indexes are the same, then you could just do frames=df_x.index
from io import StringIO
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import animation
x_ = """
     1     2     3    4
1  NaN   NaN   NaN  NaN
2  4.0   NaN   NaN  NaN
3  7.0  12.0   NaN  NaN
4  6.0  18.0  12.0  9.0
"""
y_ = """
      1    2    3     4
1   NaN  NaN  NaN   NaN
2   6.0  NaN  NaN   NaN
3  19.0  2.0  NaN   NaN
4   4.0  3.0  1.0  12.0
"""
df_x = pd.read_table(StringIO(x_), sep=r'\s+')
df_y = pd.read_table(StringIO(y_), sep=r'\s+')
fig, ax = plt.subplots(figsize=(5, 5))
graph, = ax.plot([],[], 'o')
# either set up sensible limits here that won't change during the animation
# or see the comment in function `update()`
ax.set_xlim(0,20)
ax.set_ylim(0,20)
def get_data_x(i):
    return df_x.loc[i]
def get_data_y(i):
    return df_y.loc[i]
def update(i):
    x = get_data_x(i)
    y = get_data_y(i)
    graph.set_data(x,y)
    # if you don't know the range of your data, you can use the following
    # instructions to rescale the axes.
    #ax.relim()
    #ax.autoscale_view()
    return graph,
# Creating the Animation object
ani = animation.FuncAnimation(fig, update, frames=df_x.index.intersection(df_y.index), interval=500, blit=False)
plt.show()
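The index mismatch can be seen without running an animation. The following sketch uses a cut-down, made-up pair of dataframes (three rows, two cars) rather than the asker's data, and the names missing_label_raises, frames and points_per_frame are illustrative. It checks that .loc[0] fails on a 1-based index and that the index intersection gives frame values that exist in both frames.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down data: rows are timesteps (1-based), columns are cars
df_x = pd.DataFrame(
    [[np.nan, np.nan], [4.0, np.nan], [7.0, 12.0]],
    index=[1, 2, 3], columns=[1, 2],
)
df_y = pd.DataFrame(
    [[np.nan, np.nan], [6.0, np.nan], [19.0, 2.0]],
    index=[1, 2, 3], columns=[1, 2],
)

# frames=3 would hand 0, 1, 2 to the update function; label 0 does not exist
try:
    df_x.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

# Labels present in both dataframes are safe to pass as frames=
frames = df_x.index.intersection(df_y.index)

# Number of cars with a logged position at each frame (NaN means not yet started)
points_per_frame = {int(i): int(df_x.loc[i].notna().sum()) for i in frames}
```

Passing frames as an iterable of index labels keeps the animation callback and the dataframe lookups in the same coordinate system, whatever the index starts at.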
matplotlib line graph from dataframe
I am trying to create a line graph in matplotlib from a dataframe with 10488 rows and 3 columns. My dataframe looks like the following:
            col_A  col_B  col_C
target_id
KYQ35740    22.67   19.7   26.0
KYQ35675     9.21    3.2    3.1
KYQ35736    73.93   42.8   24.6
KYQ35737   349.94  602.6  212.4
KYQ35685    16.10   19.5   29.1
Here, target_id is the index. The attempt I made was:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
df = pd.read_csv("Data.txt", sep='\t', index_col=['target_id'])
df.plot()
I get a bar graph with target ids on the x-axis and three colored bars representing each column. However, I need to produce the transpose of it, i.e. col_A, col_B, col_C labels on the x-axis, with 10488 lines running across them, one per row. I don't need the target_ids in the legend. I tried transposing the df with df.T followed by df.plot(), but the system hangs, which I believe is due to the 10488 labels that need to be put in the legend.
Thanks in advance for your help.
AP
If you want to get rid of the legend, you can use legend=False.
import pandas as pd
import io
import matplotlib.pyplot as plt
u = u"""target_id col_A col_B col_C
KYQ35740 22.67 19.7 26.0
KYQ35675 9.21 3.2 3.1
KYQ35736 73.93 42.8 24.6
KYQ35737 349.94 602.6 212.4
KYQ35685 16.10 19.5 29.1"""
# sep=r'\s+' splits on whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(u), sep=r'\s+', index_col=['target_id'])
df = df.T
df.plot(legend=False)
plt.show()