How to Animate multiple columns as dots with matplotlib from pandas dataframe with NaN in python - python-3.x

I have a question that may have been asked before, but I couldn't find any post describing my problem.
I have two pandas dataframes, each with the same index, representing x coordinates in one dataframe and y coordinates in the other. Each column represents a car that started at a specific timestep, logged its position every step until it arrived, and then stopped logging.
Every time a car starts on its route, a column is added to each dataframe, and the coordinates of each step are added to each frame (at every step it moves through space and therefore has new x, y coordinates); see the example for the x-coordinate dataframe.
I am trying to animate the tracks of each car by plotting the coordinates in an animated graph, but I cannot get it to work. My code:
%matplotlib notebook
from matplotlib import animation
from JSAnimation import IPython_display
from IPython.display import HTML
fig = plt.figure(figsize=(10,10))
#ax = plt.axes()
#nx.draw_networkx_edges(w_G,nx.get_node_attributes(w_G, 'pos'))
n_steps = simulation.x_df.index
def init():
    graph, = plt.plot([],[],'o')
    return graph,
def get_data_x(i):
    return simulation.x_df.loc[i]
def get_data_y(i):
    return simulation.y_df.loc[i]
def animate(i):
    x = get_data_x(i)
    y = get_data_y(i)
    graph.set_data(x,y)
    return graph,
animation.FuncAnimation(fig, animate, frames=100, init_func = init, repeat=True)
plt.show()
It does not plot anything, so any help would be very much appreciated.
EDIT: Minimal, Complete, and Verifiable example!
So two simple examples of the x and y dataframes that I have. Each has the same index.
import random
import geopandas as gpd
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import math
import pandas as pd
from shapely.geometry import Point
from matplotlib import animation
from JSAnimation import IPython_display
%matplotlib inline
[IN]: df_x = pd.DataFrame(data=np.array([[np.NaN, np.NaN, np.NaN, np.NaN], [4, np.nan, np.NaN,np.NaN], [7, 12, np.NaN,np.NaN], [6, 18, 12,9]]), index= [1, 2, 3, 4], columns=[1, 2, 3, 4])
gives:
[OUT]
     1     2     3    4
1  NaN   NaN   NaN  NaN
2  4.0   NaN   NaN  NaN
3  7.0  12.0   NaN  NaN
4  6.0  18.0  12.0  9.0
And the y coordinate dataframe:
[IN]: df_y = pd.DataFrame(data=np.array([[np.NaN, np.NaN, np.NaN, np.NaN], [6, np.nan, np.NaN,np.NaN], [19, 2, np.NaN,np.NaN], [4, 3, 1,12]]), index= [1, 2, 3, 4], columns=[1, 2, 3, 4])
gives:
[OUT]
      1    2    3     4
1   NaN  NaN  NaN   NaN
2   6.0  NaN  NaN   NaN
3  19.0  2.0  NaN   NaN
4   4.0  3.0  1.0  12.0
Now I want to create an animation in which each frame plots, for one row of both dataframes, the (x, y) point of every column. In this example, frame 1 should not contain any points. Frame 2 should plot point (4.0, 6.0) (column 1). Frame 3 should plot point (7.0, 19.0) (column 1) and point (12.0, 2.0) (column 2). Frame 4 should plot point (6.0, 4.0) (column 1), point (18.0, 3.0) (column 2), point (12.0, 1.0) (column 3) and point (9.0, 12.0) (column 4).
I tried writing the following code to animate this:
[IN] %matplotlib notebook
from matplotlib import animation
from JSAnimation import IPython_display
from IPython.display import HTML
fig = plt.figure(figsize=(10,10))
#ax = plt.axes()
graph, = plt.plot([],[],'o')
def get_data_x(i):
    return df_x.loc[i]
def get_data_y(i):
    return df_y.loc[i]
def animate(i):
    x = get_data_x(i)
    y = get_data_y(i)
    graph.set_data(x,y)
    return graph,
animation.FuncAnimation(fig, animate, frames=4, repeat=True)
plt.show()
But this does not give any output. Any suggestions?

I've reformatted your code, but I think your main issue is that your dataframes start with an index of 1, while calling the animation with frames=4 calls update() with i = 0, 1, 2, 3. So get_data_x(0) raises KeyError: 'the label [0] is not in the [index]'.
As per the documentation, frames= can be passed an iterable instead of an int. Here I pass the index of your dataframe, and the function iterates over it and calls update() with each value. In fact, I pass the intersection of your two dataframe indexes, so that a label present in one dataframe but not the other does not raise an error. If you are guaranteed that the two indexes are the same, you could just use frames=df_x.index.
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import animation

x_ = """     1     2     3    4
1  NaN   NaN   NaN  NaN
2  4.0   NaN   NaN  NaN
3  7.0  12.0   NaN  NaN
4  6.0  18.0  12.0  9.0
"""
y_ = """      1    2    3     4
1   NaN  NaN  NaN   NaN
2   6.0  NaN  NaN   NaN
3  19.0  2.0  NaN   NaN
4   4.0  3.0  1.0  12.0"""
# the header row has one field fewer than the data rows,
# so pandas uses the first column as the index
df_x = pd.read_table(StringIO(x_), sep=r'\s+')
df_y = pd.read_table(StringIO(y_), sep=r'\s+')

fig, ax = plt.subplots(figsize=(5, 5))
graph, = ax.plot([],[], 'o')
# either set up sensible limits here that won't change during the animation
# or see the comment in function `update()`
ax.set_xlim(0,20)
ax.set_ylim(0,20)

def get_data_x(i):
    return df_x.loc[i]

def get_data_y(i):
    return df_y.loc[i]

def update(i):
    x = get_data_x(i)
    y = get_data_y(i)
    graph.set_data(x,y)
    # if you don't know the range of your data, you can use the following
    # instructions to rescale the axes.
    #ax.relim()
    #ax.autoscale_view()
    return graph,

# Creating the Animation object; keep a reference to it (ani)
# so that it is not garbage-collected before it is shown
ani = animation.FuncAnimation(fig, update,
                              frames=df_x.index.intersection(df_y.index),
                              interval=500, blit=False)
plt.show()
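The index mismatch described in this answer can be checked without any plotting. The sketch below rebuilds df_x from the question and compares the integers that frames=4 would generate (0 to 3) with the labels that actually exist in the index. The names int_frames, label_frames, missing and rows are only illustrative.

```python
import numpy as np
import pandas as pd

df_x = pd.DataFrame(
    [[np.nan, np.nan, np.nan, np.nan],
     [4, np.nan, np.nan, np.nan],
     [7, 12, np.nan, np.nan],
     [6, 18, 12, 9]],
    index=[1, 2, 3, 4], columns=[1, 2, 3, 4])

# frames=4 is equivalent to frames=range(4): the callback receives 0, 1, 2, 3.
int_frames = list(range(4))
missing = [i for i in int_frames if i not in df_x.index]
print(missing)  # frame values that have no matching row label

try:
    df_x.loc[0]
except KeyError as exc:
    print("KeyError:", exc)

# Passing the index itself yields only labels that exist.
label_frames = list(df_x.index)
rows = [df_x.loc[i].dropna().tolist() for i in label_frames]
print(rows)  # x coordinates that would be drawn in each frame
```

The first frame has no points because every value in row 1 is NaN, which matches the behaviour the question asks for.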

Related

Pandas groupby first element in tuple and according to bins

I have a series in a dataframe that contains lists of tuples of various lengths after zipping two series together. Eg.
df['lists']
[(0.0, 0), (1.7, 0.28095163296378495), (7.4, 1.12693228043272953), (18.1, 3.053019684863041594), (1.4, 0.053019684863041594), (1.5, 0.01985536)]
[(7.2, 0.14417851715463678), (0.0, 0), (1.5, 0.013), (6.1, 5.15851278579066022)]
I also have created bins.
bins = [0.1,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3.0,3.2,3.4,3.6,3.8,4.0,4.5,5.0,5.5,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0,20.0,21.0,22.0,23.0,24.0,25.0,30.0,35.0,40.0,50.0,60.0,70.0,80.0,90.0,100.0,110.0,125.0,150.0,175.0,200.0,250.0,500.0]
I want to group by the first element of each tuple according to the bins, excluding any tuple where that element is zero. This is so I can find the mean, or do some other calculation, on the second element grouped into these bins. Eg.
1.3 NaN
1.4 0.053019684863041594
1.5 0.01642768
1.6 NaN
...
7.0 0.6355553987936832
I can use the explode() method to separate out the lists but cannot figure it out from there.
Help is greatly appreciated.
Managed to solve this with a little help from @mozway. Needed a small tweak but it was my fault.
For posterity:
df2 = pd.DataFrame(df['lists'].explode().to_list(), columns=['col1', 'col2'])
out = (df2.loc[df2['col2'].ne(0)].assign(bin=lambda d: pd.cut(d['col2'], bins=bins))).groupby('bin')['col2'].mean()
Assuming s is the input Series, you can use:
bins = [0.1,1.3,1.4,1.5,1.6,1.7]
df = pd.DataFrame(s.explode().to_list(),
                  columns=['col1', 'col2'])
out = (df
   .loc[df['col1'].ne(0)]
   .assign(bin=lambda d: pd.cut(d['col2'], bins=bins))
)
output:
   col1      col2         bin
1   1.8  0.280952  (0.1, 1.3]
2   7.4  1.126932  (0.1, 1.3]
3  18.1  3.053020         NaN
4   7.2  0.144179  (0.1, 1.3]
6   6.1  5.158513         NaN
With aggregation:
out = (df
   .loc[df['col1'].ne(0)]
   .assign(bin=lambda d: pd.cut(d['col2'], bins=bins))
   .groupby('bin')['col1'].mean()
)
output:
bin
(0.1, 1.3] 5.466667
(1.3, 1.4] NaN
(1.4, 1.5] NaN
(1.5, 1.6] NaN
(1.6, 1.7] NaN
Name: col1, dtype: float64
Used input:
s = pd.Series([[(0.0, 0), (1.8, 0.28095163296378495), (7.4, 1.1269322804327295), (18.1, 3.0530196848630418)],
[(7.2, 0.14417851715463678), (0.0, 0), (6.1, 5.15851278579066)]])
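The question asks for the bins to be applied to the first tuple element and the mean to be taken over the second, so the sketch below adapts the same explode/cut/groupby approach in that direction. It rests on assumptions not stated in the thread. The bins are taken to be left-closed and labelled by their left edge, which is inferred from the expected output in the question (1.5 and 7.0 appear as labels). The bin list is a shortened, hypothetical subset of the question's edges.

```python
import pandas as pd

# Data from the question (with the missing comma restored).
s = pd.Series([
    [(0.0, 0), (1.7, 0.28095163296378495), (7.4, 1.1269322804327295),
     (18.1, 3.0530196848630418), (1.4, 0.053019684863041594), (1.5, 0.01985536)],
    [(7.2, 0.14417851715463678), (0.0, 0), (1.5, 0.013), (6.1, 5.15851278579066)],
])

# Shortened, hypothetical subset of the question's bin edges.
bins = [0.1, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 6.0, 7.0, 8.0, 20.0]

df = pd.DataFrame(s.explode().to_list(), columns=['col1', 'col2'])
out = (df
       .loc[df['col1'].ne(0)]                       # drop tuples whose first element is 0
       .assign(bin=lambda d: pd.cut(d['col1'], bins=bins,
                                    right=False,     # intervals are [left, right)
                                    labels=bins[:-1]))
       .groupby('bin', observed=False)['col2'].mean()  # keep empty bins as NaN
)
print(out)
```

Passing observed=False explicitly keeps the empty bins in the result, which is what produces the NaN rows shown in the question's desired output.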

Calculation of percentile and mean

I want to find the 3rd percentile of the following data and then average the data.
Given below is the data structure.
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ...
96927 NaN
96928 NaN
96929 NaN
96930 NaN
96931 NaN
Here the data of interest lies in rows 13240:61156.
Given below is my code:
import pandas as pd
import numpy as np
load_var=pd.read_excel(r'path\file name.xlsx')
load_var
a=pd.DataFrame(load_var['column whose percentile is to be found'])
print(a)
b=np.nanpercentile(a,3)
print(b)
Please suggest the changes in the code.
Thank you.
Use Series.quantile with mean in Series.agg:
df = pd.DataFrame({
    'col':[7,8,9,4,2,3, np.nan],
})
f = lambda x: x.quantile(0.03)
f.__name__ = 'q'
s = df['col'].agg(['mean', f])
print (s)
mean 5.50
q 2.15
Name: col, dtype: float64
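As a cross-check, the same numbers can be computed separately with the NumPy call from the question. This is a minimal sketch on the sample column above. Note that Series.quantile takes a fraction between 0 and 1, while np.nanpercentile takes a percentage between 0 and 100. Both skip NaN here and use linear interpolation by default.

```python
import numpy as np
import pandas as pd

col = pd.Series([7, 8, 9, 4, 2, 3, np.nan])

q_pandas = col.quantile(0.03)                  # fraction in [0, 1], NaN skipped
q_numpy = np.nanpercentile(col.to_numpy(), 3)  # percentage in [0, 100], NaN skipped
mean = col.mean()                              # NaN skipped by default

print(q_pandas, q_numpy, mean)
```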

Replacing NaN value with None fill the value from the previous row in a DataFrame (pandas 1.0.3)

Not sure if it is a bug, but I am unable to replace NaN value with None value using latest Pandas library. When I use DataFrame.replace() method to replace NaN with None, dataframe is taking value from previous row instead of None value. For example,
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [10, 20, np.nan], 'y': [30, 40, 50]})
print(df)
Outputs
x y
0 10.0 30
1 20.0 40
2 NaN 50
And if I apply replace method
print(df.replace(np.NaN, None))
Outputs. Cell(X, 2) should be None instead of 20.0.
x y
0 10.0 30
1 20.0 40
2 20.0 50
Any help is appreciated.
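A possible explanation, going by the replace() documentation for that release: when value is None and to_replace is a scalar, pandas treats the value as "not given" and falls back to method='pad', which forward-fills from the previous row. The sketch below is one workaround under that assumption, not a confirmed fix from this thread. It casts the frame to object dtype first, since a float column cannot hold None, and then keeps only the non-missing cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [10, 20, np.nan], 'y': [30, 40, 50]})

# Cast to object so the columns can hold None, then replace missing cells.
out = df.astype(object).where(df.notna(), None)
print(out)
```

The result has object dtype in every column, so numeric operations on it will be slower; this is usually only worthwhile just before exporting the data (for example to a database driver that expects None).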

How to Successfully Produce Mosaic Plots in Pyviz Panel Apps?

I have created the following dataframe df:
Setup:
import pandas as pd
import numpy as np
import random
import copy
import feather
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
import plotly.graph_objects as go
import plotly.express as px
import panel as pn
import holoviews as hv
import geoviews as gv
import geoviews.feature as gf
import cartopy
import cartopy.feature as cf
from geoviews import opts
from cartopy import crs as ccrs
import hvplot.pandas
import colorcet as cc
from colorcet.plotting import swatch
#pn.extension() # commented out as this causes an intermittent javascript error
gv.extension("bokeh")
cols = {"name":["Jim","Alice","Bob","Julia","Fern","Bill","Jordan","Pip","Shelly","Mimi"],
"age":[19,26,37,45,56,71,20,36,37,55],
"sex":["Male","Female","Male","Female","Female","Male","Male","Male","Female","Female"],
"age_band":["18-24","25-34","35-44","45-54","55-64","65-74","18-24","35-44","35-44","55-64"],
"insurance_renew_month":[1,2,3,3,3,4,5,5,6,7],
"postcode_prefix":["EH","M","G","EH","EH","M","G","EH","M","EH"],
"postcode_order":[3,2,1,3,3,2,1,3,2,3],
"local_authority_district":["S12000036","E08000003","S12000049","S12000036","S12000036","E08000003","S12000036","E08000003","S12000049","S12000036"],
"blah1":[3,None,None,8,8,None,1,None,None,None],
"blah2":[None,None,None,33,5,None,66,3,22,3],
"blah3":["A",None,"A",None,"C",None,None,None,None,None],
"blah4":[None,None,None,None,None,None,None,None,None,1]}
df = pd.DataFrame.from_dict(cols)
df
Out[2]:
name age sex age_band ... blah1 blah2 blah3 blah4
0 Jim 19 Male 18-24 ... 3.0 NaN A NaN
1 Alice 26 Female 25-34 ... NaN NaN None NaN
2 Bob 37 Male 35-44 ... NaN NaN A NaN
3 Julia 45 Female 45-54 ... 8.0 33.0 None NaN
4 Fern 56 Female 55-64 ... 8.0 5.0 C NaN
5 Bill 71 Male 65-74 ... NaN NaN None NaN
6 Jordan 20 Male 18-24 ... 1.0 66.0 None NaN
7 Pip 36 Male 35-44 ... NaN 3.0 None NaN
8 Shelly 37 Female 35-44 ... NaN 22.0 None NaN
9 Mimi 55 Female 55-64 ... NaN 3.0 None 1.0
[10 rows x 12 columns]
df[["sex","age_band","postcode_prefix"]] = df[["sex","age_band","postcode_prefix"]].astype("category")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 12 columns):
name 10 non-null object
age 10 non-null int64
sex 10 non-null category
age_band 10 non-null category
insurance_renew_month 10 non-null int64
postcode_prefix 10 non-null category
postcode_order 10 non-null int64
local_authority_district 10 non-null object
blah1 4 non-null float64
blah2 6 non-null float64
blah3 3 non-null object
blah4 1 non-null float64
dtypes: category(3), float64(3), int64(3), object(3)
memory usage: 1.3+ KB
The Problem:
I can successfully create a mosaic plot with the following code:
fig,ax = plt.subplots(figsize=(15,10))
mosaic(df,["sex", "age_band"],ax=ax);
However, I am having issues when I try to create a corresponding app using pn.interact:
categoric_cols = df.select_dtypes(include="category")
cat_atts = categoric_cols.columns.tolist()
cat_atts
Out[4]: ['sex', 'age_band', 'postcode_prefix']
def bivar_cat(x="sex",y="age_band"):
    if x in cat_atts and y in cat_atts:
        fig,ax = plt.subplots(figsize=(15,10))
        return mosaic(df,[x,y],ax=ax);
app_df_cat = pn.interact(bivar_cat,x=cat_atts,y=cat_atts)
app_df_cat
Which results in the following:
The above rendered mosaic plot seems to correspond to the default values of x & y (ie sex & age_band). When you select a new attribute for x or y from the dropdowns, the text above the mosaic plot changes (this text seems to be a string representation of the plot) however the mosaic plot itself does not.
Is my issue possibly related to having to comment out pn.extension()? I have found that when pn.extension() is not commented out, it results in an intermittent javascript error whereby sometimes there is no error raised, sometimes there is an error but my panel app still loads and sometimes there is an error and it crashes my browser. (I have omitted the javascript error here as it can be very large - if it is helpful I can add this to my post.) I would say that the error is raised significantly more often than it is not.
Strangely enough, I haven't observed any difference in other apps that I have created where I have omitted pn.extension() vs including it.
However as the documentation always specifies that you include it, I would have expected that I would have to set my appropriate extensions for all my plots to work correctly? (I have plotly, hvplot, holoviews and geoviews plots successfully plotting in these other apps with and without pn.extension() and pn.extension("plotly") included).
Is it possible to produce panel apps based on mosaic plots?
Thanks
Software Info:
os x Catalina
browser Firefox
python 3.7.5
notebook 6.0.2
pandas 0.25.3
panel 0.7.0
plotly 4.3.0
plotly_express 0.4.1
holoviews 1.12.6
geoviews 1.6.5
hvplot 0.5.2
Statsmodels function mosaic() returns a tuple with a figure and rects.
What you're seeing now via interact is that tuple. This tuple also gets updated in your code when you use the dropdowns.
The figure you see below that is the figure that jupyter automatically plots one time. This one doesn't get updated.
The solution is two-fold:
1) only return the figure, not the tuple
2) prevent jupyter from automatically plotting your figure once with plt.close()
In code:
def bivar_cat(x='sex', y='age_band'):
    fig, ax = plt.subplots(figsize=(15,10))
    mosaic(df, [x,y], ax=ax)
    plt.close()
    return fig
app_df_cat = pn.interact(
    bivar_cat,
    x=cat_atts,
    y=cat_atts,
)
app_df_cat

Filling in nans for numbers in a column-specific way

Given a DataFrame and a list of indexes, is there an efficient pandas function that puts a NaN value in place of all values vertically preceding each of the entries of the list?
For example, suppose we have the list [4,8] and the following DataFrame:
index    0    1
5        1    2
2        9    3
4      3.2    3
8        9  8.7
The desired output is simply:
index    0    1
5      nan  nan
2      nan  nan
4      3.2  nan
8        9  8.7
Any suggestions for such a function that does this fast?
Here's one NumPy approach based on np.searchsorted -
s = [4,8]
a = df.values
idx = df.index.values
sidx = np.argsort(idx)
matching_row_indx = sidx[np.searchsorted(idx, s, sorter = sidx)]
mask = np.arange(a.shape[0])[:,None] < matching_row_indx
a[mask] = np.nan
Sample run -
In [107]: df
Out[107]:
         0    1
index
5      1.0  2.0
2      9.0  3.0
4      3.2  3.0
8      9.0  8.7
In [108]: s = [4,8]
In [109]: a = df.values
...: idx = df.index.values
...: sidx = np.argsort(idx)
...: matching_row_indx = sidx[np.searchsorted(idx, s, sorter = sidx)]
...: mask = np.arange(a.shape[0])[:,None] < matching_row_indx
...: a[mask] = np.nan
...:
In [110]: df
Out[110]:
         0    1
index
5      NaN  NaN
2      NaN  NaN
4      3.2  NaN
8      9.0  8.7
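The approach above writes into df.values, which only changes df when pandas hands back a view of the underlying data. A variant that avoids relying on that is sketched below, assuming unique index labels and one list entry per column: it builds the same boolean mask from row positions and lets DataFrame.mask return a new frame. The names pos and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]},
                  index=pd.Index([5, 2, 4, 8], name="index"))
s = [4, 8]  # one index label per column

# Row position of each label (assumes the index labels are unique).
pos = df.index.get_indexer(s)

# mask[r, c] is True when row r lies above the target row of column c.
mask = np.arange(len(df))[:, None] < pos

# DataFrame.mask replaces the True cells with NaN and returns a new frame.
out = df.mask(mask)
print(out)
```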
It was a bit tricky to recreate your example but this should do it:
import pandas as pd
import numpy as np
df = pd.DataFrame({'index': [5, 2, 4, 8], 0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]})
df.set_index('index', inplace=True)
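# for column i, blank every row that comes before the row labelled `item`
for i, item in enumerate([4,8]):
    for index in df.index:
        if index == item:
            break
        # write through df.loc: rows yielded by iterrows() may be copies,
        # so assigning to them is not guaranteed to change df
        df.loc[index, i] = np.nan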

Resources