I am trying to create a seaborn displot and overlay a fitted normal PDF on it.
Sample code:
import pandas as pd
import seaborn as sns
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
# data
x1 = np.random.normal(10, 3.4, size=1000)  # mean of 10
df = pd.DataFrame({'x1': x1})
def map_pdf(x, **kwargs):
    # fit a normal distribution to the data and overlay its PDF
    mu, std = scipy.stats.norm.fit(x)
    x0, x1 = p1.axes[0][0].get_xlim()  # axes for p1 is required to determine x_pdf
    x_pdf = np.linspace(x0, x1, 100)
    y_pdf = scipy.stats.norm.pdf(x_pdf, mu, std)
    plt.plot(x_pdf, y_pdf, c='r')
p1 = sns.displot(data=df, x='x1', kind='hist', bins=40, stat='density')
p1.map(map_pdf, 'x1')
I am not sure why no plot appears when I run the code above.
All I get is this:
<seaborn.axisgrid.FacetGrid at 0x7f6a6fa0f820>
Any help on this will be highly appreciated.
Thank you in advance for the support!

Call plt.show() to display your plot; without it, only the repr of the returned FacetGrid object is printed.

Can I generate a contour plot from three columns of data in python without using meshgrid?

I have three columns of data that are too large to generate meshgrids from. To generate a surface plot from the data, for example, I use the following approach:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
x, y, z = np.loadtxt('data_file', unpack=True)
df = pd.DataFrame({'x':x, 'y':y, 'z':z})
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent matplotlib
surf = ax.plot_trisurf(df.x, df.y, df.z, cmap=cm.jet, linewidth=0.05)
fig.colorbar(surf, shrink=0.5, aspect=5)
Is there a similar alternative to plot_trisurf for contours?
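One possible answer, offered as a sketch: matplotlib has tricontour and tricontourf, which triangulate scattered (x, y) points in the same way plot_trisurf does, so no meshgrid is needed. The data below is synthetic and stands in for the three columns loaded from the file; the number of levels and the colormap are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic scattered data standing in for the three columns x, y, z
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = rng.uniform(-2, 2, 500)
z = np.exp(-(x**2 + y**2))

fig, ax = plt.subplots()
# Filled contours built on a Delaunay triangulation of the (x, y) points
filled = ax.tricontourf(x, y, z, levels=14, cmap='viridis')
# Thin contour lines drawn on top of the filled regions
ax.tricontour(x, y, z, levels=14, colors='k', linewidths=0.5)
fig.colorbar(filled, ax=ax)
plt.show()
```

With the question's DataFrame the call would be ax.tricontourf(df.x, df.y, df.z). For very large inputs the triangulation itself can still be slow, so subsampling the rows first may be worth trying.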

Recovering feature names after StandardScaler().fit_transform() with sklearn

I am trying to run the code below, adapted from a Kaggle tutorial (the data is available to download from here):
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # for plotting facilities
from datetime import datetime, date
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
import xgboost as xgb
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
from sklearn.preprocessing import StandardScaler
df = pd.read_csv("./data/Aquifer_Petrignano.csv")
df['Date'] = pd.to_datetime(df.Date, format = '%d/%m/%Y')
df = df[df.Rainfall_Bastia_Umbra.notna()].reset_index(drop=True)
df = df.interpolate(method ='ffill')
df = df[['Date', 'Rainfall_Bastia_Umbra', 'Depth_to_Groundwater_P24', 'Depth_to_Groundwater_P25', 'Temperature_Bastia_Umbra', 'Temperature_Petrignano', 'Volume_C10_Petrignano', 'Hydrometry_Fiume_Chiascio_Petrignano']].resample('7D', on='Date').mean().reset_index(drop=False)
X = df.drop(['Depth_to_Groundwater_P24','Depth_to_Groundwater_P25','Date'], axis=1)
y1 = df.Depth_to_Groundwater_P24
y2 = df.Depth_to_Groundwater_P25
scaler = StandardScaler()
X = scaler.fit_transform(X)
model = xgb.XGBRegressor()
param_search = {'max_depth': range(1, 2, 2),
                'min_child_weight': range(1, 2, 2),
                'n_estimators' : [1000],
                'learning_rate' : [0.1]}
tscv = TimeSeriesSplit(n_splits=2)
gsearch = GridSearchCV(estimator=model, cv=tscv,
                       param_grid=param_search)
gsearch.fit(X, y1)
# refit a fresh model with the best parameters found by the grid search
xgb_grid = xgb.XGBRegressor(**gsearch.best_params_)
xgb_grid.fit(X, y1)
ax = xgb.plot_importance(xgb_grid)
y_val = y1[-80:]
X_val = X[-80:]
y_pred = xgb_grid.predict(X_val)
print(mean_absolute_error(y_val, y_pred))
print(math.sqrt(mean_squared_error(y_val, y_pred)))
I plotted a feature importance figure, but the original feature names are hidden:
If I comment out these two lines:
scaler = StandardScaler()
X = scaler.fit_transform(X)
I get the output:
How could I use scaler.fit_transform() for X and get a feature importance plot with the original feature names?
The reason is that StandardScaler.fit_transform() returns a numpy.ndarray of the scaled feature values (the same shape as X.values, but standardized and without column names), so you need to convert it back to a pandas.DataFrame with the original column names.
Here's the part of your code that needs changing:
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
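A small self-contained sketch of the same idea, using a made-up two-column frame (the column names rainfall and temperature are placeholders, not the aquifer data). It also passes index=X.index, an addition here that keeps the row labels aligned with the target series.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Placeholder data; any numeric DataFrame behaves the same way
X = pd.DataFrame({'rainfall': [0.0, 2.5, 1.0, 4.0],
                  'temperature': [12.0, 15.5, 9.0, 20.0]})

scaler = StandardScaler()
scaled = scaler.fit_transform(X)  # plain numpy.ndarray, column names are gone

# Wrap the array back into a DataFrame with the original labels
X_scaled = pd.DataFrame(scaled, columns=X.columns, index=X.index)
print(type(scaled).__name__, X_scaled.columns.tolist())
```

scikit-learn 1.2 and later also provide scaler.set_output(transform='pandas'), which makes the scaler return a DataFrame directly; it is not used above so that the sketch also runs on older versions.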

How do I plot vertical strips in matplotlib?

I want to show the values of a 0/1 array on a plot alongside other time series.
How can I achieve something like the grey strips below? Mine will oscillate a lot more, though.
For example, how would I add osc here:
import numpy as np
import matplotlib.pyplot as plt
import datetime
import pandas as pd
n = 100
x = range(n)
y = np.random.rand(100)
osc = np.random.randint(2, size=n)
You can loop through the values and call axvspan(x0, x1, color=..., alpha=...) for every interval where osc is 1:
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = range(n)
y = np.random.randn(100).cumsum()
osc = np.random.randint(2, size=n)
plt.plot(x, y, color='crimson')
for x0, x1, os in zip(x[:-1], x[1:], osc):
    if os:
        plt.axvspan(x0, x1, color='blue', alpha=0.2, lw=0)
Note that only the first 99 values of osc are used, as there are only 99 intervals.
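If osc flips often, one span per sample adds up to many patches. A possible variation, sketched below, merges consecutive ones into runs and draws one axvspan per run. The helper runs_of_ones is a hypothetical name introduced here, and the sketch assumes x is simply 0..n-1 so that array indices and x-values coincide.

```python
import numpy as np
import matplotlib.pyplot as plt

def runs_of_ones(flags):
    """Return (start, stop) index pairs for each run of consecutive ones; stop is exclusive."""
    flags = np.asarray(flags, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], flags, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(changes[::2], changes[1::2])]

n = 100
rng = np.random.default_rng(0)
x = np.arange(n)
y = rng.standard_normal(n).cumsum()
osc = rng.integers(2, size=n)

fig, ax = plt.subplots()
ax.plot(x, y, color='crimson')
for start, stop in runs_of_ones(osc):
    # One span per run; a run that includes the last sample extends one step past x[-1]
    ax.axvspan(start, stop, color='blue', alpha=0.2, lw=0)
plt.show()
```

Unlike the loop above, this version uses all n values of osc, which is why the final run can reach x = n.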
See code below:
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = range(n)
y = np.random.rand(100)
osc = np.random.randint(2, size=n)
fig,ax = plt.subplots()
ax.axvspan(0,5,facecolor='grey', alpha=0.4)
axvspan is documented in the matplotlib API reference.
Similarly, you can use axvline if you only need vertical lines.

Getting rid of extra lines in Python shapefile plot?

I am trying to do a basic plot of the world map using Python and the Matplotlib library. However, when I plot the polygons the plot shows many straight lines that do not seem to be part of the polygon. I am relatively new at working with shapefiles but the code I'm using has worked for a previous shapefile I used, so I'm confused and wondering what might be missing in the code.
The code I'm using is:
import numpy as np
import pandas as pd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns
import os
sns.set(style='whitegrid', palette='ocean', color_codes=True)
sns.mpl.rc('figure', figsize=(10,6))
sf = shp.Reader(shp_path)
def plot_map(sf, x_lim = None, y_lim = None, figsize = (11,9)):
    '''
    Plot map with lim coordinates
    '''
    plt.figure(figsize = figsize)
    id = 0  # running index used to label each shape
    for shape in sf.shapeRecords():
        x = [i[0] for i in shape.shape.points[:]]
        y = [i[1] for i in shape.shape.points[:]]
        plt.plot(x, y, 'k')
        if (x_lim is None) and (y_lim is None):
            x0 = np.mean(x)
            y0 = np.mean(y)
            plt.text(x0, y0, id, fontsize=10)
        id = id+1
    if (x_lim is not None) and (y_lim is not None):
        plt.xlim(x_lim)
        plt.ylim(y_lim)
The following link shows the resulting graph (I'm not allowed to post pictures yet):
Any help is appreciated, thank you all!
Use 'k.' as the format string, or use scatter instead of plot, so that the points are drawn without lines connecting them:
import numpy as np
import pandas as pd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns
import os
sns.set(style='whitegrid', palette='ocean', color_codes=True)
sns.mpl.rc('figure', figsize=(10,6))
sf = shp.Reader(shp_path)
def plot_map(sf, x_lim = None, y_lim = None, figsize = (11,9)):
    '''
    Plot map with lim coordinates
    '''
    plt.figure(figsize = figsize)
    id = 0  # running index used to label each shape
    for shape in sf.shapeRecords():
        x = [i[0] for i in shape.shape.points[:]]
        y = [i[1] for i in shape.shape.points[:]]
        ## change here
        plt.plot(x, y, 'k.')
        if (x_lim is None) and (y_lim is None):
            x0 = np.mean(x)
            y0 = np.mean(y)
            plt.text(x0, y0, id, fontsize=10)
        id = id+1
    if (x_lim is not None) and (y_lim is not None):
        plt.xlim(x_lim)
        plt.ylim(y_lim)
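Another thing worth checking, assuming the stray lines come from multi-part shapes (a country with islands, for example): pyshp keeps all rings of a shape in one points list and records where each ring starts in shape.parts. Plotting the whole list as a single line joins the rings together. The sketch below splits the points at those offsets and draws each part separately; split_parts and plot_outline are hypothetical helper names, and the coordinates are made up for illustration.

```python
import matplotlib.pyplot as plt

def split_parts(points, parts):
    """Split a flat list of points into one list per part (ring)."""
    bounds = list(parts) + [len(points)]
    return [points[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

def plot_outline(ax, points, parts, fmt='k'):
    """Draw each part as its own line so that separate parts are not joined."""
    lines = []
    for ring in split_parts(points, parts):
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        lines.extend(ax.plot(xs, ys, fmt))
    return lines

# Made-up shape with two separate rings: a square and a triangle
points = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0),
          (3, 3), (4, 3), (3.5, 4), (3, 3)]
parts = [0, 5]

fig, ax = plt.subplots()
lines = plot_outline(ax, points, parts)
plt.show()
```

With a real shapefile, the call inside the loop would be plot_outline(ax, shape.shape.points, shape.shape.parts), which keeps solid outlines instead of switching to dots.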

How to loop through items in pandas col and run and plot a scikit model?

I have some interesting user data from races. I know when the respective athletes planned to finish a race and when they actually finished (along with some other information). The goal is to find out when the athletes come in late. I want to run a support vector machine for each athlete and plot the decision boundaries.
Here is what I do:
import numpy as np
import pandas as pd
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
# Create arbitrary dataset for example
df = pd.DataFrame({'User': np.random.randint(low=1, high=5, size=50),  # high is exclusive
                   'Planned_End': np.random.uniform(low=-5, high=5, size=50),
                   'Actual_End': np.random.uniform(low=-1, high=1, size=50),
                   'Late': np.random.randint(low=0, high=3, size=50)})
# Fit Support Vector Machine Classifier
X = df[['Planned_End', 'Actual_End']]
y = df['Late']
clf = svm.SVC(decision_function_shape='ovo')
for i, y in df['User']:
    clf.fit(X, y)
    ax = plt.subplot()
    fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
I get the following error: TypeError: 'numpy.int64' object is not iterable - that is, I somehow can't loop through the column.
I guess it comes down to the numpy data format? How can I solve that?
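Try items() (named iteritems() before pandas 2.0, where it was removed):
for i, y in df['User'].items():
Iterating over the User Series directly yields single numpy.int64 values, which cannot be unpacked into i and y, so without items() you can only use:
for y in df['User']:
You also don't use i anywhere.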
As for the rest of the code, the following version runs; please adapt it to your needs:
import numpy as np
import pandas as pd
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
# Create arbitrary dataset for example
df = pd.DataFrame({'User': np.random.randint(low=1, high=5, size=50),  # high is exclusive
                   'Planned_End': np.random.uniform(low=-5, high=5, size=50),
                   'Actual_End': np.random.uniform(low=-1, high=1, size=50),
                   'Late': np.random.randint(low=0, high=3, size=50)})
# Fit Support Vector Machine Classifier
X = df[['Planned_End', 'Actual_End']].to_numpy()  # as_matrix() was removed in pandas 1.0
y = df['Late']
clf = svm.SVC(decision_function_shape='ovo')
y = df['User'].values
clf.fit(X, y)
ax = plt.subplot()
fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
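If the aim is really one classifier per athlete, as the question describes, a groupby over User may be closer to the intent than iterating over the column values. The sketch below is one way to do that under the same random-data assumption; it stores the fitted models in a dict (models is a name introduced here), uses a larger sample so every athlete has enough rows, and leaves the plotting step out.

```python
import numpy as np
import pandas as pd
from sklearn import svm

# Arbitrary example data, seeded so the sketch is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'User': rng.integers(1, 5, size=200),
    'Planned_End': rng.uniform(-5, 5, size=200),
    'Actual_End': rng.uniform(-1, 1, size=200),
    'Late': rng.integers(0, 3, size=200),
})

models = {}
for user, group in df.groupby('User'):
    X = group[['Planned_End', 'Actual_End']].to_numpy()
    y = group['Late'].to_numpy()
    # SVC needs at least two classes; skip athletes whose labels are all the same
    if np.unique(y).size < 2:
        continue
    models[user] = svm.SVC(decision_function_shape='ovo').fit(X, y)
```

Each fitted model can then be passed to plot_decision_regions together with that athlete's X and y, for example on a separate subplot per user.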
