Bar graph with standard errors from Dataframe? - python-3.x

I have a DataFrame that stores results from a regression, like this:
import pandas as pd

feats = ['X1', 'X2', 'X3']
betas = [0.5, 0.7, 0.9]
ses = [0.05, 0.03, 0.02]
data = {
    "Feature": feats,
    "Beta": betas,
    "Error": ses,
}
data = pd.DataFrame(data)
It looks like this:
   Beta  Error Feature
0   0.5   0.05      X1
1   0.7   0.03      X2
2   0.9   0.02      X3
I want to make a bar graph of the coefficients for each feature, with the bar height given by "Beta" and the error bar given by "Error".
Is there a way to get this working in Matplotlib?
I have tried an error plot, but I may have done it wrong.

You can use plt.errorbar as follows (tested with matplotlib 2.2.2):
import matplotlib.pyplot as plt
plt.errorbar(data.Feature, data.Beta, yerr=data.Error, capthick=2, capsize=2)
If the line above doesn't work for you (for example, if your matplotlib version does not accept strings as x values), you can use this workaround:
plt.errorbar(range(len(data.Feature)), data.Beta, yerr=data.Error, capthick=2, capsize=2)
plt.xticks(range(len(data.Feature)), data.Feature)
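Since the question asks for bars rather than a connected line, a bar chart with error bars may be closer to what is wanted. The following is a minimal sketch, assuming a matplotlib version that accepts string categories on the x-axis; the axis labels and the capsize value are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same regression results as in the question
data = pd.DataFrame({
    "Feature": ["X1", "X2", "X3"],
    "Beta": [0.5, 0.7, 0.9],
    "Error": [0.05, 0.03, 0.02],
})

fig, ax = plt.subplots()
# Bar height is the coefficient; yerr draws a symmetric error bar on each bar
bars = ax.bar(data["Feature"], data["Beta"], yerr=data["Error"], capsize=4)
ax.set_xlabel("Feature")  # illustrative label
ax.set_ylabel("Beta")     # illustrative label
# plt.show()  # uncomment to display the figure interactively
```

pandas also exposes the same idea through data.plot.bar(x="Feature", y="Beta", yerr="Error"), which may be more convenient when the values already live in a DataFrame.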

Related

How to add many values in error-bars chart in Python?

Can anyone help me, please?
x = np.array([['A','B','C','D','E'],['A','B','C','D','E']])
y = np.array([[2.60, 3.04, 2.98, 3.76, 3.00],[2.68, 2.96, 2.94, 3.75,3.03]])
yerr = np.array([[1.26, 1.37, 1.33 , 1.27, 1.38],[ 1.25, 1.38, 1.31, 1.27,1.38 ]])
plt.errorbar(x, y, yerr=yerr, fmt='o')
I am trying to plot error bars for several series of values, but Python raises this error:
TypeError: unhashable type: 'numpy.ndarray'
Doesn't matplotlib.pyplot allow drawing error bars from array data?
Many thanks,
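The x and y parameters should be one-dimensional arrays. Try modifying the code like this (a yerr of shape (2, N) is then read as separate lower and upper errors for each point):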
x = np.array(['A','B','C','D','E'])
y = np.array([2.60, 3.04, 2.98, 3.76, 3.00])
yerr = np.array([[1.26, 1.37, 1.33 , 1.27, 1.38],[ 1.25, 1.38, 1.31, 1.27,1.38 ]])
plt.errorbar(x, y, yerr=yerr, fmt='o')
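If the two rows in the question are meant to be two separate series rather than lower and upper errors, one option is to call errorbar once per row. This is only a sketch under that assumption; the small horizontal offset and the series labels are illustrative additions so the points do not overlap.

```python
import matplotlib.pyplot as plt
import numpy as np

labels = ['A', 'B', 'C', 'D', 'E']
y = np.array([[2.60, 3.04, 2.98, 3.76, 3.00],
              [2.68, 2.96, 2.94, 3.75, 3.03]])
yerr = np.array([[1.26, 1.37, 1.33, 1.27, 1.38],
                 [1.25, 1.38, 1.31, 1.27, 1.38]])

positions = np.arange(len(labels))  # numeric x positions, one per category
fig, ax = plt.subplots()
containers = []
for i, (row, err) in enumerate(zip(y, yerr)):
    # Shift each series slightly so its markers do not sit on top of the others
    offset = (i - (len(y) - 1) / 2) * 0.1
    containers.append(
        ax.errorbar(positions + offset, row, yerr=err, fmt='o',
                    capsize=3, label=f'series {i + 1}')
    )
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.legend()
# plt.show()  # uncomment to display the figure interactively
```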

It's related to ROC curve

I have no problem plotting the ROC curve, and it is plotted as I require. The problem is the y-axis: it runs from 0.0 to 1.05 but shows ticks only at every other tenth (0.0, 0.2, 0.4, ...). I want ticks at every tenth (e.g. 0.0, 0.1, 0.2, 0.3, ..., 1.05), so I need code that shows both the even and the odd tenths when plotting the ROC curve.
I searched the matplotlib documentation but didn't find anything related to my problem.
lw = 2
plt.figure()
plt.plot(fpr11, tpr11, 'o-', ms=2, label='ROC_curve_APOE(AUC11 = %0.4f)' % roc_auc11,
         color='deeppink', linestyle=':', linewidth=2)
plt.plot(fpr51, tpr51, 'o-', ms=2, label='ROC_curve_Combined AUC5 = %0.4f)' % roc_auc51,
         color='cornflowerblue', linestyle=':', linewidth=2)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0, 1])
plt.ylim([0, 1.05])
plt.xlabel('1-Specificity(False Positive Rate)')
plt.ylabel('Sensitivity(True Positive Rate)')
# plt.title('ROC curve for MCIc vs MCIs')
plt.title('ROC curve for AD vs NC')
plt.legend(loc="lower right")
plt.show()
# plt.savefig('roc_auc.png')
plt.close()
My expected output should look like the figure here: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#roc_curve_for_binary_svm
In that figure the y-axis has a tick at every tenth (0.0, 0.1, ... up to 1).
Please help me solve it.
I'm not sure whether you can set a step in ylim/xlim, but you can use xticks/yticks instead:
def frange(x, y, jump):
    # Yield x, x + jump, ... while below y, then y itself as the last tick
    while x < y:
        yield x
        x += jump
    yield y

plt.yticks(list(frange(0, 1.05, 0.1)))
How you choose to replace the frange is up to you, but you can also do something like
plt.yticks([0, 0.1, 0.2, 0.3, 0.4,....,1.0, 1.05])
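As an alternative to a hand-written generator, the tick positions can be built with np.arange. The sketch below uses a placeholder diagonal line instead of the asker's ROC data, which is not available here.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], linestyle='--')  # placeholder for the real ROC curves
ax.set_xlim(0, 1)
ax.set_ylim(0, 1.05)

# Ticks at every tenth from 0.0 to 1.0; the stop value 1.01 makes sure 1.0 is included
ticks = np.arange(0, 1.01, 0.1)
ax.set_yticks(ticks)
# plt.show()  # uncomment to display the figure interactively
```

matplotlib.ticker.MultipleLocator(0.1), set through ax.yaxis.set_major_locator, is another way to get evenly spaced ticks without listing them.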

Metpy HRRR Cross Section

I am working on creating cross sections of HRRR model output. I read in the GRIB files using xarray with PyNIO as the engine and then converted the files to netCDF so I can work with them on my Windows machine, so I am wondering whether that conversion is causing these issues.
Here is what my dataset looks like after reading in the netCDF file with xarray: (image on Imgur)
After reading in the data, I try to follow the MetPy cross section and xarray tutorials by parsing the data:
data = ds.metpy.parse_cf()
This yields a new dataset: (image on Imgur)
It created the crs coordinate, so I assumed it worked more or less correctly.
Next I created a contour map of 700 mb RH, winds, and elevation (from a different dataset), parsing the RH variable from the data dataset and pulling out x and y:
RH = data.metpy.parse_cf('RH_P0_L100_GLC0')
x, y = RH.metpy.coordinates('x', 'y')
This all worked and I could produce a nice looking plot no problem. So next I wanted to make a cross section. Following the example in the documentation:
from metpy.interpolate import cross_section

start = (40.3847, -120.5676)
end = (39.2692, -122.3784)
cross = cross_section(data, start, end)
which gave these errors: (image on Imgur)
So I instead tried using the RH variable from above, since
RH.metpy.x
gave the x-dimension. But running
cross = cross_section(RH, start, end)
gave this error instead: (image on Imgur)
So I'm wondering whether I missed a step in parsing the original dataset, whether the GRIB to netCDF conversion messed something up, or whether this is even possible using MetPy.
In general I am working towards creating a cross section like the one in the example: https://unidata.github.io/MetPy/latest/examples/cross_section.html#sphx-glr-examples-cross-section-py
As a bonus question, would it be possible to fill terrain under the plots?
Currently, MetPy's cross section interpolation relies on the x and y dimensions being present in the Dataset/DataArray as dimension coordinates (see the description of dimension coordinates in xarray's documentation). In your dataset, the dimensions ygrid_0 and xgrid_0 (y and x) are listed as dimensions without coordinates, hence the problem.
However, since this situation is commonly encountered in meteorological data files, MetPy's current implementation may be too stringent. I would suggest opening an issue on MetPy's issue tracker.
Regarding your bonus question: as long as you have terrain-level data in the same vertical coordinate as your data, you can use the fill_between() method in matplotlib to fill in terrain under the plots.
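The fill_between() suggestion could look roughly like the sketch below. All of the data here is synthetic and hypothetical (distance, pressure, rh, and terrain_pressure are made-up stand-ins for an interpolated cross section and its surface pressure), so it only illustrates the plotting step, not the MetPy interpolation.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical cross-section grid: distance along the section and pressure levels
distance = np.linspace(0, 200, 50)     # km along the section (made up)
pressure = np.linspace(1000, 500, 20)  # hPa (made up)
rh = np.random.default_rng(0).uniform(0, 100, (pressure.size, distance.size))

# Hypothetical surface pressure along the section: a single ridge in the middle
terrain_pressure = 950 - 150 * np.exp(-((distance - 100) / 30) ** 2)

fig, ax = plt.subplots()
ax.contourf(distance, pressure, rh, levels=np.arange(0, 101, 10))
# Fill everything between the surface pressure and the bottom of the plot
terrain = ax.fill_between(distance, terrain_pressure, pressure.max(),
                          color='saddlebrown')
ax.set_ylim(pressure.max(), pressure.min())  # pressure decreases upward
# plt.show()  # uncomment to display the figure interactively
```

The key requirement from the answer still applies: the terrain values must be expressed in the same vertical coordinate (here pressure) as the plotted field.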
I have nearly the same problem. I get
ValueError: Data missing required coordinate information. Verify that your data have been parsed by MetPy with proper x and y dimension coordinates and added crs coordinate of the correct projection for each variable.
when I try this:
cross = cross_section(data, start, end)
the xarray looks like this:
<xarray.Dataset>
Dimensions: (bnds: 2, height: 61, height_2: 1, height_3: 60, height_4: 61, height_5: 1, lat: 101, lev: 1, lev_2: 1, lev_3: 1, lon: 121, time: 24)
Coordinates:
* height (height) float64 1.0 2.0 3.0 4.0 ... 58.0 59.0 60.0 61.0
* height_3 (height_3) float64 1.0 2.0 3.0 4.0 ... 57.0 58.0 59.0 60.0
* lev (lev) float64 0.0
* lev_2 (lev_2) float64 400.0
* lev_3 (lev_3) float64 800.0
* lon (lon) float64 -30.0 -29.5 -29.0 -28.5 ... 29.0 29.5 30.0
* lat (lat) float64 -10.0 -9.5 -9.0 -8.5 ... 38.5 39.0 39.5 40.0
crs object Projection: latitude_longitude
* height_2 (height_2) float64 10.0
* time (time) float64 2.017e+07 2.017e+07 ... 2.017e+07 2.017e+07
* height_4 (height_4) float64 1.0 2.0 3.0 4.0 ... 58.0 59.0 60.0 61.0
* height_5 (height_5) float64 2.0
Dimensions without coordinates: bnds
Data variables:
height_bnds (height, bnds) float64 ...
height_3_bnds (height_3, bnds) float64 ...
lev_bnds (lev, bnds) float64 ...
lev_2_bnds (lev_2, bnds) float64 ...
lev_3_bnds (lev_3, bnds) float64 ...
z_ifc (height, lat, lon) float32 ...
topography_c (lat, lon) float32 ...
fis (lat, lon) float32 ...
con_gust (time, height_2, lat, lon) float32 ...
gust10 (time, height_2, lat, lon) float32 ...
u (time, height_3, lat, lon) float32 ...
There is a lat/lon grid, so is there a workaround to use cross_section with a lat/lon grid?
Or can I rename lat and lon to x and y?
Best

Plotting a barplot with a vertical line in pyplot-seaborn-pandas

I am having trouble doing something that seems to me straightforward.
My data is:
ROE_SP500_Q2_2018_quantile.to_json()
'{"index":{"0":0.0,"1":0.05,"2":0.1,"3":0.15,"4":0.2,"5":0.25,"6":0.3,"7":0.35,"8":0.4,"9":0.45,"10":0.5,"11":0.55,"12":0.6,"13":0.65,"14":0.7,"15":0.75,"16":0.8,"17":0.85,"18":0.9,"19":0.95},"ROE_Quantiles":{"0":-0.8931,"1":-0.0393,"2":0.00569,"3":0.03956,"4":0.05826,"5":0.075825,"6":0.09077,"7":0.10551,"8":0.12044,"9":0.14033,"10":0.15355,"11":0.17335,"12":0.1878,"13":0.209175,"14":0.2357,"15":0.27005,"16":0.3045,"17":0.3745,"18":0.46776,"19":0.73119}}'
My code for the plot is:
plt.close()
plt.figure(figsize=(14,8))
sns.barplot(x = 'Quantile', y = 'ROE', data = ROE_SP500_Q2_2018_quantile)
plt.vlines(x = 0.73, ymin = 0, ymax = 0.6, color = 'blue', size = 2)
plt.show()
which returns the following image:
I would like to correct the following problems:
a) The ticklabels which are overly crowded in a strange way I do not understand
b) The vline which appears in the wrong place. I am using the wrong argument to set the thickness of the line and I get an error.
Pass the DataFrame to the data parameter and use its actual column names for x and y; see the seaborn.barplot documentation:
data : DataFrame, array, or list of arrays, optional
Dataset for plotting. If x and y are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.
sns.barplot(x = 'index', y = 'ROE_Quantiles', data = ROE_SP500_Q2_2018_quantile)
# vlines() requires ymin and ymax; use linewidth (not size) to set the line thickness
plt.vlines(x = 0.73, ymin = 0, ymax = 0.6, color = 'blue', linewidth=5)
j = '{"index":{"0":0.0,"1":0.05,"2":0.1,"3":0.15,"4":0.2,"5":0.25,"6":0.3,"7":0.35,"8":0.4,"9":0.45,"10":0.5,"11":0.55,"12":0.6,"13":0.65,"14":0.7,"15":0.75,"16":0.8,"17":0.85,"18":0.9,"19":0.95},"ROE_Quantiles":{"0":-0.8931,"1":-0.0393,"2":0.00569,"3":0.03956,"4":0.05826,"5":0.075825,"6":0.09077,"7":0.10551,"8":0.12044,"9":0.14033,"10":0.15355,"11":0.17335,"12":0.1878,"13":0.209175,"14":0.2357,"15":0.27005,"16":0.3045,"17":0.3745,"18":0.46776,"19":0.73119}}'
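import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# j holds JSON text, so parse it with the json module
df = pd.DataFrame(json.loads(j))
print(df)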
index ROE_Quantiles
0 0.00 -0.893100
1 0.05 -0.039300
10 0.50 0.153550
11 0.55 0.173350
12 0.60 0.187800
13 0.65 0.209175
14 0.70 0.235700
15 0.75 0.270050
16 0.80 0.304500
17 0.85 0.374500
18 0.90 0.467760
19 0.95 0.731190
2 0.10 0.005690
3 0.15 0.039560
4 0.20 0.058260
5 0.25 0.075825
6 0.30 0.090770
7 0.35 0.105510
8 0.40 0.120440
9 0.45 0.140330
plt.close()
plt.figure(figsize=(14,8))
sns.barplot(x = 'index', y = 'ROE_Quantiles', data = df)
plt.vlines(x = 0.73, ymin = 0, ymax = 0.6, color = 'blue', linewidth=5)
plt.show()
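On why the line may look misplaced: seaborn's barplot treats x as categorical by default and draws the bars at integer positions 0, 1, 2, ..., so x = 0.73 falls between the first and second bars rather than at quantile 0.73. The sketch below is one possible workaround, assuming the line is meant to mark a position on the quantile axis; it swaps in plain matplotlib bars on a numeric axis, and the bar width of 0.04 is an illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Quantile levels and ROE values taken from the question's JSON
quantiles = np.linspace(0, 0.95, 20)
roe = np.array([-0.8931, -0.0393, 0.00569, 0.03956, 0.05826, 0.075825,
                0.09077, 0.10551, 0.12044, 0.14033, 0.15355, 0.17335,
                0.1878, 0.209175, 0.2357, 0.27005, 0.3045, 0.3745,
                0.46776, 0.73119])

fig, ax = plt.subplots(figsize=(14, 8))
# Numeric x-axis: each bar sits at its real quantile value
ax.bar(quantiles, roe, width=0.04)
# axvline spans the full height of the axes; linewidth sets the thickness
line = ax.axvline(x=0.73, color='blue', linewidth=2)
ax.set_xticks(quantiles)
# plt.show()  # uncomment to display the figure interactively
```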

log x axis on matplotlib histogram with imshow()

I have some x data following a lognormal distribution, and I would like to plot a histogram of it. I have this data for different parameters, so I would like those parameters on my y-axis and the histogram bins on my x-axis (in log scale with reasonable ticks, like 10^0 to 10^3). It should look exactly like an ax.hist(), only using imshow() to get a more beautiful and compact result.
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 3., 2.
img = []
for i in range(20):
    dat = np.random.lognormal(mu, sigma + 1/10, 10000)
    hist = np.histogram(dat, bins=10**(np.arange(0, 3, step=0.1)))[0]
    img.append(hist)
plt.imshow(img)
plt.show()
The result looks like this (image), but I would like the x-axis to be logarithmic and to match the bins.
I also have data for y, but that is not so much of a problem.
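One possible approach, offered as a sketch rather than a definitive answer: imshow() draws equally sized pixels, so it cannot place columns at the log-spaced bin edges, whereas pcolormesh() accepts the edges directly and works with a log-scaled axis. The code below assumes that sigma is meant to vary with the row index i (the loop above adds a constant 1/10), and the axis labels are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
bin_edges = 10 ** np.arange(0, 3, step=0.1)  # same log-spaced edges as in the question

# Assumption: sigma grows with the row index i; one histogram per parameter value
img = np.array([
    np.histogram(rng.lognormal(mu, sigma + i / 10, 10000), bins=bin_edges)[0]
    for i in range(20)
])
row_edges = np.arange(img.shape[0] + 1)  # one cell per parameter row

fig, ax = plt.subplots()
# Cell boundaries come from the bin edges, so the columns match the histogram bins
mesh = ax.pcolormesh(bin_edges, row_edges, img)
ax.set_xscale('log')
ax.set_xlabel('value')            # illustrative label
ax.set_ylabel('parameter index')  # illustrative label
fig.colorbar(mesh, ax=ax, label='count')
# plt.show()  # uncomment to display the figure interactively
```

If real parameter values are available for the y-axis, their cell edges can be passed in place of row_edges.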
