Plotting a barplot with a vertical line in pyplot-seaborn-pandas - python-3.x

I am having trouble doing something that seems to me straightforward.
My data is:
ROE_SP500_Q2_2018_quantile.to_json()
'{"index":{"0":0.0,"1":0.05,"2":0.1,"3":0.15,"4":0.2,"5":0.25,"6":0.3,"7":0.35,"8":0.4,"9":0.45,"10":0.5,"11":0.55,"12":0.6,"13":0.65,"14":0.7,"15":0.75,"16":0.8,"17":0.85,"18":0.9,"19":0.95},"ROE_Quantiles":{"0":-0.8931,"1":-0.0393,"2":0.00569,"3":0.03956,"4":0.05826,"5":0.075825,"6":0.09077,"7":0.10551,"8":0.12044,"9":0.14033,"10":0.15355,"11":0.17335,"12":0.1878,"13":0.209175,"14":0.2357,"15":0.27005,"16":0.3045,"17":0.3745,"18":0.46776,"19":0.73119}}'
My code for the plot is:
plt.close()
plt.figure(figsize=(14,8))
sns.barplot(x = 'Quantile', y = 'ROE', data = ROE_SP500_Q2_2018_quantile)
plt.vlines(x = 0.73, ymin = 0, ymax = 0.6, color = 'blue', size = 2)
plt.show()
which returns the following image:
I would like to correct the following problems:
a) The tick labels, which are crowded in a strange way that I do not understand.
b) The vertical line, which appears in the wrong place; I am also using the wrong argument to set the thickness of the line and get an error.

Pass a DataFrame to the data parameter; check seaborn.barplot:
data : DataFrame, array, or list of arrays, optional
Dataset for plotting. If x and y are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.
sns.barplot(x = 'index', y = 'ROE_Quantiles', data = ROE_SP500_Q2_2018_quantile)
#TypeError: vlines() missing 2 required positional arguments: 'ymin' and 'ymax'
plt.vlines(x = 0.73, ymin = 0, ymax = 0.6, color = 'blue', linewidth=5)
j = '{"index":{"0":0.0,"1":0.05,"2":0.1,"3":0.15,"4":0.2,"5":0.25,"6":0.3,"7":0.35,"8":0.4,"9":0.45,"10":0.5,"11":0.55,"12":0.6,"13":0.65,"14":0.7,"15":0.75,"16":0.8,"17":0.85,"18":0.9,"19":0.95},"ROE_Quantiles":{"0":-0.8931,"1":-0.0393,"2":0.00569,"3":0.03956,"4":0.05826,"5":0.075825,"6":0.09077,"7":0.10551,"8":0.12044,"9":0.14033,"10":0.15355,"11":0.17335,"12":0.1878,"13":0.209175,"14":0.2357,"15":0.27005,"16":0.3045,"17":0.3745,"18":0.46776,"19":0.73119}}'
import ast
df = pd.DataFrame(ast.literal_eval(j))
print (df)
    index  ROE_Quantiles
0    0.00      -0.893100
1    0.05      -0.039300
10   0.50       0.153550
11   0.55       0.173350
12   0.60       0.187800
13   0.65       0.209175
14   0.70       0.235700
15   0.75       0.270050
16   0.80       0.304500
17   0.85       0.374500
18   0.90       0.467760
19   0.95       0.731190
2    0.10       0.005690
3    0.15       0.039560
4    0.20       0.058260
5    0.25       0.075825
6    0.30       0.090770
7    0.35       0.105510
8    0.40       0.120440
9    0.45       0.140330
plt.close()
plt.figure(figsize=(14,8))
sns.barplot(x = 'index', y = 'ROE_Quantiles', data = df)
plt.vlines(x = 0.73, ymin = 0, ymax = 0.6, color = 'blue', linewidth=5)
plt.show()
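Note that seaborn draws the bars of a categorical barplot at integer positions 0, 1, 2, ..., so plt.vlines(x = 0.73, ...) places the line between the first two bars rather than at a quantile of 0.73. If the goal is to mark a particular quantile's bar, a minimal sketch (assuming the 0.75 quantile is the intended target, which the question does not state):
# fix the bar order explicitly, then place the line at that bar's integer position
order = sorted(df['index'].unique())
sns.barplot(x = 'index', y = 'ROE_Quantiles', data = df, order = order)
plt.vlines(x = order.index(0.75), ymin = 0, ymax = 0.6, color = 'blue', linewidth = 5)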

Related

How to calculate sensitivity, specificity and positive predictivity for each class in multi-class classification

I have checked all the SO questions that generate a confusion matrix and calculate TP, TN, FP, FN.
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative
Mainly they use
from sklearn.metrics import confusion_matrix
For two classes it's easy:
from sklearn.metrics import confusion_matrix
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
For the multi-class case there is one solution, but it only does this for the first class, not for all classes:
def perf_measure(y_actual, y_pred):
    class_id = set(y_actual).union(set(y_pred))
    TP = []
    FP = []
    TN = []
    FN = []
    for index, _id in enumerate(class_id):
        TP.append(0)
        FP.append(0)
        TN.append(0)
        FN.append(0)
        for i in range(len(y_pred)):
            if y_actual[i] == y_pred[i] == _id:
                TP[index] += 1
            if y_pred[i] == _id and y_actual[i] != y_pred[i]:
                FP[index] += 1
            if y_actual[i] == y_pred[i] != _id:
                TN[index] += 1
            if y_pred[i] != _id and y_actual[i] != y_pred[i]:
                FN[index] += 1
    return class_id, TP, FP, TN, FN
But this calculates the values for only one class by default.
I want to calculate the values for each class, given 4 classes, for the data at https://extendsclass.com/csv-editor.html#0697f61
I have done it using Excel like this, and then calculated the results for each class.
I have automated it in an Excel sheet, but is there any programmatic solution in Python or sklearn to do this?
This is way easier with multilabel_confusion_matrix. For your example, you can also pass labels=["A", "N", "O", "~"] as an argument to get the labels in the preferred order.
from sklearn.metrics import multilabel_confusion_matrix
import numpy as np
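# y_true and y_pred are the lists of class labels ("A", "N", "O", "~") from the linked dataset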
mcm = multilabel_confusion_matrix(y_true, y_pred)
tps = mcm[:, 1, 1]
tns = mcm[:, 0, 0]
recall = tps / (tps + mcm[:, 1, 0]) # Sensitivity
specificity = tns / (tns + mcm[:, 0, 1]) # Specificity
precision = tps / (tps + mcm[:, 0, 1]) # PPV
Which results in an array for each metric:
[[0.83333333 0.94285714 0.64 0.25 ] # Sensitivity / Recall
[0.99029126 0.74509804 0.91666667 1. ] # Specificity
[0.9375 0.83544304 0.66666667 1. ]] # Precision / PPV
Alternatively, you can view the class-dependent precision and recall in classification_report, and with output_dict=True you can pull the same numbers out per class label (a short sketch follows the report below).
>>> print(classification_report(y_true, y_pred))
              precision    recall  f1-score   support

           A       0.94      0.83      0.88        18
           N       0.84      0.94      0.89        70
           O       0.67      0.64      0.65        25
           ~       1.00      0.25      0.40         8

    accuracy                           0.82       121
   macro avg       0.86      0.67      0.71       121
weighted avg       0.83      0.82      0.81       121
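With output_dict=True the report becomes a nested dictionary keyed by class label, so the per-class numbers can be read off directly. A small sketch; y_true and y_pred are the same label lists as above:
from sklearn.metrics import classification_report
report = classification_report(y_true, y_pred, output_dict=True)
sensitivity_A = report["A"]["recall"]     # recall is the sensitivity for class "A"
ppv_A = report["A"]["precision"]          # precision is the positive predictive value
support_A = report["A"]["support"]        # number of true samples of class "A"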

How to plot network by gnuplot

I have a list of more than 100 points. I'd like to plot a figure like this picture. The lines connect any two points whose distance is less than 3.
1.53 2.40
5.39 3.02
4.35 1.29
9.58 8.34
6.59 1.45
3.44 3.45
7.22 0.43
0.23 8.09
4.38 3.49
https://www.codeproject.com/Articles/1237026/Simple-MLP-Backpropagation-Artificial-Neural-Netwo
You probably have to check every point against every other point to see whether the distance is less than your threshold. So create a table with these point pairs and the vectors between them, and plot it with vectors. The following example creates some random points with random sizes and random colors.
Code:
### Plot connections between points which are closer than a threshold
reset session
set size square
# create some random test data
set print $Data
myColors = "0xe71840 0x4d76c3 0xf04092 0x47c0ad 0xf58b1e 0xe6eb18 0x59268e 0x36b64c"
myColor(n) = int(word(myColors,n))
do for [i=1:100] {
    print sprintf("%g %g %g %d", rand(0), rand(0), rand(0)*2+1, myColor(int(rand(0)*8)+1))
}
set print
d(x1,y1,x2,y2) = sqrt((x2-x1)**2 + (y2-y1)**2)
myDist = 0.2
set print $Connect
do for [i=1:|$Data|-1] {
    x1=real(word($Data[i],1))
    y1=real(word($Data[i],2))
    do for [j=i+1:|$Data|] {
        x2=real(word($Data[j],1))
        y2=real(word($Data[j],2))
        if (d(x1,y1,x2,y2)<myDist) { print sprintf("%g %g %g %g", x1, y1, x2-x1, y2-y1) }
    }
}
set print
set key noautotitle
plot $Connect u 1:2:3:4 w vec lc "grey" nohead, \
$Data u 1:2:3:4 w p pt 7 ps var lc rgb var
### end of code
Result:
You do not specify how to choose the node size or color. I show an example using a constant pointsize and taking the color from sequential linetypes.
$DATA << EOD
1.53 2.40
5.39 3.02
4.35 1.29
9.58 8.34
6.59 1.45
3.44 3.45
7.22 0.43
0.23 8.09
4.38 3.49
EOD
N = |$DATA|
do for [i=1:N] {
    do for [j=i+1:N] {
        x0 = real(word($DATA[i],1))
        y0 = real(word($DATA[i],2))
        x1 = real(word($DATA[j],1))
        y1 = real(word($DATA[j],2))
        if ((x1-x0)**2 + (y1-y0)**2 <= 9) {
            set arrow from x0,y0 to x1,y1 nohead
        }
    }
}
unset border
unset tics
unset key
set pointsize 3
plot $DATA using 1:2:0 with points pt 7 lc variable

Bar graph with standard errors from Dataframe?

I have a DataFrame that stores results from a regression, like this:
feats = ['X1', 'X2', 'X3']
betas = [0.5, 0.7, 0.9]
ses = [0.05, 0.03, 0.02]
data = {
    "Feature": feats,
    "Beta": betas,
    "Error": ses
}
data = pd.DataFrame(data)
It looks like this:
   Beta  Error Feature
0   0.5   0.05      X1
1   0.7   0.03      X2
2   0.9   0.02      X3
I want to make a graph of the coefficients for each feature, with the bar height given by "Beta" and the error bar given by "Error".
Is there a way to get this working in Matplotlib?
I have tried an error plot but maybe did it wrong or something.
You can use plt.errorbar as follows (matplotlib 2.2.2):
plt.errorbar(data.Feature, data.Beta, yerr=data.Error, capthick=2, capsize=2)
If somehow the above line doesn't work for you, you can use this workaround:
plt.errorbar(range(len(data.Feature)), data.Beta, yerr=data.Error, capthick=2, capsize=2)
plt.xticks(range(len(data.Feature)), data.Feature)
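If you specifically want bars rather than points, plt.bar also accepts yerr. A minimal sketch, assuming the data DataFrame built above (string x values need matplotlib 2.1 or newer):
plt.bar(data.Feature, data.Beta, yerr=data.Error, capsize=4)  # bar height = Beta, error bar = Error
plt.ylabel('Beta')
plt.show()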

Gnuplot fitting

I want to fit the following data:
70 0.0429065
100 0.041212
150 0.040117
200 0.035018
250 0.024366
300 0.02017
350 0.018255
400 0.015368
to the following function, which is a combination of an exponential and a Gaussian function:
$ f(x)= a1*(a2* exp(-x/T2e)+ exp(-(x/T2g)**2))
$ fit f(x) 'data' via a1,a2,T2e,T2g
But it keeps giving me the following results:
a1 = 0.0720021 +/- 0.04453 (61.84%)
a2 = 0.310022 +/- 0.9041 (291.6%)
T2e = 63291.7 +/- 2.029e+07 (3.206e+04%)
T2g = 252.79 +/- 32.36 (12.8%)
But when I try to fit it separately to
$ g(x)=b* exp(-(x/T2g)**2)
$ fit g(x) 'data' via b,T2g
I get
b = 0.0451053 +/- 0.001598 (3.542%)
T2g = 359.359 +/- 16.89 (4.701%)
and
$ S(x)=S0* exp(-x/T2e)
$ fit S(x) 'data' via S0,T2e
gives:
S0 = 0.057199 +/- 0.003954 (6.913%)
T2e = 319.257 +/- 38.17 (11.96%)
I already tried to set the initial values but it didn't change the results.
Does anybody know what is wrong?
Thank you,
OK, you can see an exponential decay with a hump, which could be a Gaussian.
The approach I used to get a fit: first exclude the data points at 100 and 150 and fit the exponential, then add a Gaussian centered at approximately 170.
You probably don't get a good fit because the Gaussian peak is shifted by some value x1.
With the code:
### fitting
reset session
$Data <<EOD
70 0.0429065
100 0.041212
150 0.040117
200 0.035018
250 0.024366
300 0.02017
350 0.018255
400 0.015368
EOD
a = 0.055
T2e = 310
b = 0.008
x1 = 170
T2g = 54
Exponential(x) = a*exp(-x/T2e)
Gaussian(x) = b*exp(-((x-x1)/T2g)**2)
f(x) = Exponential(x) + Gaussian(x)
fit f(x) $Data u 1:2 via a,b,x1,T2e,T2g
plot $Data u 1:2 w lp pt 7, f(x) lc rgb "red"
### end of code
You'll get:
a = 0.0535048 +/- 0.00183 (3.42%)
b = 0.00833589 +/- 0.001006 (12.06%)
x1 = 170.356 +/- 5.664 (3.325%)
T2e = 315.114 +/- 12.94 (4.106%)
T2g = 54.823 +/- 12.13 (22.12%)

Plot transparent 3D boxes using gnuplot

I have a data file "data.txt" which contains the coordinates of the borders of several boxes in three dimensions. Each line represents a single box. The file contains over 100 boxes.
x_Min x_Max y_Min y_Max z_Min z_Max
-0.2 0.2 -0.2 0.2 -0.2 0.2
0.2 0.4 -0.2 0.2 -0.2 0.2
....
...
..
Now I want to plot that. In two dimensions it is very easy by using
plot "boxes.txt" u 1:2:3:4 w boxxyerrorbars
With (x-Value):(y-Value):(Half Width):(Half Height).
Then I get this:
But how can I achieve this in three dimensions? I didn't find any solution for this problem.
In case you are still interested in a gnuplot solution...
If it is sufficient to just draw the edges of the boxes you can use the plotting style with vectors. You simply need to select the necessary columns and plot all edges in 3 loops. Here gnuplot's integer division (e.g. 1/2=0) is helpful.
However, if you want to plot surfaces and hide surfaces if they are covered by another box you'd better use with pm3d (check help pm3d). Then, however, you have to re-shape your input data.
Script:
### plot edges of boxes in 3D
reset session
$Data <<EOD
x_Min x_Max y_Min y_Max z_Min z_Max
-0.2 0.2 -0.2 0.2 -0.2 0.2
0.3 0.4 -0.1 0.2 -0.1 0.2
-1.5 -0.5 -1.2 -0.4 -0.9 0.0
0.5 1.0 -1.0 -0.5 -0.5 -0.1
0.0 0.3 -1.4 -1.1 -1.0 -0.7
EOD
set xyplane relative 0
set view equal xyz
set view 60,30,1.7
set xtics 0.5
set ytics 0.5
set ztics 0.5
set key noautotitle
splot for [i=0:3] $Data u 1:i/2+3:i%2+5:($2-$1):(0):(0):0 w vec lc var nohead, \
for [i=0:3] '' u i/2+1:3:i%2+5:(0):($4-$3):(0):0 w vec lc var nohead, \
for [i=0:3] '' u i/2+1:i%2+3:5:(0):(0):($6-$5):0 w vec lc var nohead
### end of script
Result:
I actually found a solution using Python and Matplotlib.
import numpy as np
import matplotlib.pyplot as plt
import random
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.gca(projection='3d')
DIM = 3
# Unit cube
cube = [[[0.0,1.0],[0.0,0.0],[0.0,0.0]],\
[[0.0,0.0],[0.0,1.0],[0.0,0.0]],\
[[0.0,0.0],[0.0,0.0],[0.0,1.0]],\
[[1.0,1.0],[0.0,1.0],[0.0,0.0]],\
[[1.0,0.0],[1.0,1.0],[0.0,0.0]],\
[[1.0,1.0],[0.0,0.0],[0.0,1.0]],\
[[1.0,1.0],[1.0,1.0],[0.0,1.0]],\
[[0.0,0.0],[1.0,1.0],[0.0,1.0]],\
[[0.0,0.0],[0.0,1.0],[1.0,1.0]],\
[[0.0,1.0],[0.0,0.0],[1.0,1.0]],\
[[1.0,1.0],[0.0,1.0],[1.0,1.0]],\
[[0.0,1.0],[1.0,1.0],[1.0,1.0]]]
# Number of Cubes
numb_Cubes = 5
# Array with positions [x, y, z]
pos = [[0 for x in range(DIM)] for y in range(numb_Cubes)]
for k in range(numb_Cubes):
    for d in range(DIM):
        pos[k][d] = random.uniform(-1, 1)
# Size of cubes
size_of_cubes = [0 for y in range(numb_Cubes)]
for k in range(numb_Cubes):
    size_of_cubes[k] = random.random()
# Limits
xmin, xmax = -1, 1
ymin, ymax = -1, 1
zmin, zmax = -1, 1
for n in range(numb_Cubes):
    for k in range(len(cube)):
        x = np.linspace(cube[k][0][0]*size_of_cubes[n]+pos[n][0], cube[k][0][1]*size_of_cubes[n]+pos[n][0], 2)
        y = np.linspace(cube[k][1][0]*size_of_cubes[n]+pos[n][1], cube[k][1][1]*size_of_cubes[n]+pos[n][1], 2)
        z = np.linspace(cube[k][2][0]*size_of_cubes[n]+pos[n][2], cube[k][2][1]*size_of_cubes[n]+pos[n][2], 2)
        ax.plot(x, y, z, 'black', lw=1)

ax.set_xlim([xmin, xmax])
ax.set_ylim([ymin, ymax])
ax.set_zlim([zmin, zmax])
plt.show()
The result I get:
I am still interested in a solution for gnuplot or a faster solution for Python.
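Regarding a faster Python solution: instead of one ax.plot call per edge, all edges can be collected into a single Line3DCollection and added to the axes at once. A sketch reusing cube, numb_Cubes, size_of_cubes, pos, DIM and ax from the code above:
from mpl_toolkits.mplot3d.art3d import Line3DCollection
segments = []
for n in range(numb_Cubes):
    for k in range(len(cube)):
        # endpoints of edge k of cube n, scaled and shifted like the np.linspace calls above
        p0 = [cube[k][d][0]*size_of_cubes[n] + pos[n][d] for d in range(DIM)]
        p1 = [cube[k][d][1]*size_of_cubes[n] + pos[n][d] for d in range(DIM)]
        segments.append([p0, p1])
ax.add_collection3d(Line3DCollection(segments, colors='black', linewidths=1))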
