I am a beginner in Keras and need help to understand keras.argmax(a, axis=-1) and keras.max(a, axis=-1). What is the meaning of axis=-1 when a.shape = (19, 19, 5, 80)? And also what will be the output of keras.argmax(a, axis=-1) and keras.max(a, axis=-1)?
This means that the index that will be returned by argmax will be taken from the last axis.
Your data has some shape (19,19,5,80). This means:
Axis 0 = 19 elements
Axis 1 = 19 elements
Axis 2 = 5 elements
Axis 3 = 80 elements
Now, negative numbers work exactly like in python lists, in numpy arrays, etc. Negative numbers represent the inverse order:
Axis -1 = 80 elements
Axis -2 = 5 elements
Axis -3 = 19 elements
Axis -4 = 19 elements
When you pass the axis parameter to the argmax function, the indices returned will be based on this axis. Your result will lose this specific axis, but keep the others.
See what shape argmax will return for each axis:
K.argmax(a,axis= 0 or -4) returns (19,5,80) with values from 0 to 18
K.argmax(a,axis= 1 or -3) returns (19,5,80) with values from 0 to 18
K.argmax(a,axis= 2 or -2) returns (19,19,80) with values from 0 to 4
K.argmax(a,axis= 3 or -1) returns (19,19,5) with values from 0 to 79
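The shapes listed above can be checked with NumPy, which uses the same axis numbering. This is only a sketch: it uses np.argmax and np.max on a random array as a stand-in, on the assumption that K.argmax and K.max follow the same axis convention, as the answer describes. It also shows the max case from the question, which returns the largest values themselves instead of their positions.

```python
import numpy as np

# Random stand-in for the tensor in the question (the values are an assumption).
rng = np.random.default_rng(0)
a = rng.random((19, 19, 5, 80))

idx = np.argmax(a, axis=-1)  # position of the largest value along the last axis
val = np.max(a, axis=-1)     # the largest value itself along the last axis

# Both results drop the last axis and keep the other three.
print(idx.shape, val.shape)

# Picking the values at the argmax positions reproduces the max.
picked = np.take_along_axis(a, idx[..., None], axis=-1)[..., 0]
print(np.array_equal(picked, val))
```

Each entry of idx is an integer between 0 and 79, and the matching entry of val is the value found at that position.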
I'm trying to apply the ranksums function over all rows of two arrays
groupA = subset.iloc[:,4:7].values
groupB = subset.iloc[:,7:10].values
np.apply_along_axis(ranksums, 1, (zip(groupA,groupB)))
I get: AxisError: axis 1 is out of bounds for array of dimension 0
I have one dataframe with two columns, A and B. First I need to make empty bins with step 1 from 1 to 11: (1,2), (2,3), ..., (10,11). Then, for every row of the original dataframe where column B is greater than 3, I need the value of column 'A' from 2 rows before.
Here is example dataframe :
df=pd.DataFrame({'A':[1,8.5,5.2,7,8,9,0,4,5,6],'B':[1,2,2,2,3.1,3.2,3,2,1,2]})
Required output 1:
df_out1=pd.DataFrame({'Value_A':[8.5,5.2]})
Required output 2:
df_output2:
Bins count
(1,2) 0
(2,3) 0
(3,4) 0
(4,5) 0
(5,6) 1
(6,7) 0
(7,8) 0
(8,9) 1
(9,10) 0
(10,11) 0
You can index a shifted series to get the value of 'A' from a few rows before 'B' satisfies the condition. A shift of 3 reproduces the required output (8.5 and 5.2):
out1 = df['A'].shift(3)[df['B'] > 3]
What you want to do with the bins is known as a histogram. You can easily do this with numpy:
count, bin_edges = np.histogram(out1, bins=np.arange(1, 12))
out2 = pd.DataFrame({'bin_lo': bin_edges[:-1], 'bin_hi': bin_edges[1:], 'count': count})
Here 'bin_lo' and 'bin_hi' are the lower and upper bounds of the bins.
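Put together, the two steps can be run end to end on the example dataframe from the question. This is a minimal sketch: the dropna() call is an addition that is not in the answer, included on the assumption that the condition could be met within the first three rows, where the shift produces NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 8.5, 5.2, 7, 8, 9, 0, 4, 5, 6],
                   'B': [1, 2, 2, 2, 3.1, 3.2, 3, 2, 1, 2]})

# Value of 'A' three positions above each row where 'B' exceeds 3.
# dropna() is an added guard for matches in the first three rows.
out1 = df['A'].shift(3)[df['B'] > 3].dropna()

# Bin edges 1, 2, ..., 11 give the ten unit-wide bins from the question.
count, bin_edges = np.histogram(out1, bins=np.arange(1, 12))
out2 = pd.DataFrame({'bin_lo': bin_edges[:-1],
                     'bin_hi': bin_edges[1:],
                     'count': count})

print(out1.tolist())
print(out2)
```

With this data the selected values are 8.5 and 5.2, so the only non-empty bins are (5,6) and (8,9), matching the required output.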
I have 3 questions:
1)
The confusion matrix for sklearn is as follows:
TN | FP
FN | TP
While when I'm looking at online resources, I find it like this:
TP | FP
FN | TN
Which one should I consider?
2)
Since the scikit-learn confusion matrix above is different from the one I find in other resources, what will the structure be for a multiclass confusion matrix? I'm looking at this post here:
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative
In that post, #lucidv01d posted a graph to explain the categories for multiclass. Are those categories the same in scikit-learn?
3)
How do you calculate the accuracy of a multiclass? for example, I have this confusion matrix:
[[27 6 0 16]
[ 5 18 0 21]
[ 1 3 6 9]
[ 0 0 0 48]]
In that same post I referred to in question 2, he has written this equation:
Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
But isn't that just for binary? I mean, for which class do I take TP?
sklearn shows its confusion matrix as
TN | FP
FN | TP
because it treats 0 as the negative class and 1 as the positive class. By default sklearn puts the labels in sorted order, so the smaller class value (0) comes first and is treated as negative, and the larger one (1) as positive. The order therefore depends on the class values in your dataset.
The accuracy is the sum of the diagonal elements divided by the sum of all the elements. The diagonal elements are the number of correct predictions.
As the sklearn guide says: "(Wikipedia and other references may use a different convention for axes)"
What does it mean? When building the confusion matrix, the first step is to decide where to put predictions and real values (true labels). There are two possibilities:
put predictions in the columns and true labels in the rows
put predictions in the rows and true labels in the columns
Which way you go is purely a matter of convention. From this picture, explained in here, it is clear that scikit-learn's convention is to put predictions in columns and true labels in rows.
Thus, according to scikit-learn's convention:
the first column contains negative predictions (TN and FN)
the second column contains positive predictions (TP and FP)
the first row contains negative labels (TN and FP)
the second row contains positive labels (TP and FN)
the diagonal contains the number of correctly predicted labels.
Based on this information I think you will be able to solve part 1 and part 2 of your questions.
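The layout described above can be reproduced by hand for a small binary case. This is a sketch with made-up labels; the confusion helper is a hypothetical stand-in written for illustration, not scikit-learn code. It follows the same convention (true labels in rows, predictions in columns, classes in sorted order), so sklearn.metrics.confusion_matrix is expected to give the same matrix for these inputs.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    """Count matrix with true labels as rows and predictions as columns."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

# Hypothetical binary labels: 0 = negative, 1 = positive.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 1]

cm = confusion(y_true, y_pred, 2)

# Flattening row by row gives TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)
```

Reading the matrix row by row gives TN, FP, FN, TP, which is the first layout in the question.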
For part 3, you just sum the values in the diagonal and divide by the sum of all elements, which will be
(27 + 18 + 6 + 48) / (27 + 18 + 6 + 48 + 6 + 16 + 5 + 21 + 1 + 3 + 9)
or you can just use the score() method of a fitted classifier, which returns the mean accuracy on the data you pass to it.
The scikit-learn convention is to place predictions in columns and real values in rows.
By default, scikit-learn puts 0 first as the negative class (top) and 1 second as the positive class (bottom). The order can be changed using labels=[1, 0].
You can calculate the overall accuracy in this way:
import numpy as np
M = np.array([[27, 6, 0, 16], [5, 18, 0, 21], [1, 3, 6, 9], [0, 0, 0, 48]])
# sum of the diagonal (the correct predictions)
w = M.diagonal()
w.sum()
99
# sum of all elements of the matrix
M.sum()
160
ACC = w.sum() / M.sum()
ACC
0.61875
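On the question of which class TP refers to in a multiclass matrix: one common reading is one-vs-rest, where each class in turn is treated as positive and all others as negative. The sketch below assumes that reading and the scikit-learn layout (true labels in rows, predictions in columns), and derives the four counts per class from the matrix in the question.

```python
import numpy as np

M = np.array([[27, 6, 0, 16],
              [5, 18, 0, 21],
              [1, 3, 6, 9],
              [0, 0, 0, 48]])

tp = np.diag(M)               # predicted as class k and truly class k
fp = M.sum(axis=0) - tp       # predicted as class k but truly another class
fn = M.sum(axis=1) - tp       # truly class k but predicted as another class
tn = M.sum() - tp - fp - fn   # everything that involves neither

# One-vs-rest accuracy per class, and the overall accuracy.
per_class_acc = (tp + tn) / M.sum()
overall_acc = tp.sum() / M.sum()

print(tp, fp, fn, tn)
print(per_class_acc, overall_acc)
```

The formula ACC = (TP+TN)/(TP+FP+FN+TN) applied per class gives per_class_acc, while the overall accuracy uses only the diagonal and equals the 0.61875 computed above.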
I have a closed contour in the form of a polyline. I am accessing the point
through vtkPolyData.GetLines() and iterating through the cells in
vtkCellArray.
I want to calculate the angle bisector at each vertex of the line. Therefore
I need to know the coordinate of V_{i-1}, V_i and V_{i+1}.
In the vtkCellArray, [n0, p_1, p_2,... , p_n0, ... ] , if p_2 comes after
p_1 in the cell , does it mean that p_1 and p_2 are connected together?
Yes, it does. To test your case with vtkPolyLine, let's create a vtkPolyData with a single vtkPolyLine where the last point of the line is the same as the first point. We will see that the resulting cell array has the same sequence (i.e. the last and first point ids are the same).
import vtk as v
pts = v.vtkPoints()
pts.InsertNextPoint(0,0,0)
pts.InsertNextPoint(1,0,0)
pts.InsertNextPoint(2,0,0)
pts.InsertNextPoint(3,0,0)
polyLine = v.vtkPolyLine()
polyLine.GetPointIds().SetNumberOfIds(5)
polyLine.GetPointIds().SetId(0,0)
polyLine.GetPointIds().SetId(1,1)
polyLine.GetPointIds().SetId(2,2)
polyLine.GetPointIds().SetId(3,3)
polyLine.GetPointIds().SetId(4,0)
lines = v.vtkCellArray()
lines.InsertNextCell(polyLine)
pd = v.vtkPolyData()
pd.SetPoints(pts)
pd.SetLines(lines)
wr = v.vtkPolyDataWriter()
wr.SetFileName('Lines.vtk')
wr.SetInputData(pd)
wr.Write()
The file Lines.vtk contains the following:
# vtk DataFile Version 4.2
vtk output
ASCII
DATASET POLYDATA
POINTS 4 float
0 0 0 1 0 0 2 0 0
3 0 0
LINES 1 6
5 0 1 2 3 0 # This line has 5 points and last and first point are the same (0)
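Because consecutive ids in the cell are connected and the last id repeats the first, the neighbours V_{i-1} and V_{i+1} of each vertex can be found by wrapping around the id list. The sketch below works on plain arrays and does not call VTK; vertex_bisectors is a hypothetical helper, and the square is made-up data, since the four points in the example above are collinear. It assumes that no two consecutive points coincide.

```python
import numpy as np

def vertex_bisectors(points, ids):
    """Unit bisector direction at each vertex of a closed polyline.

    points: (n, 3) coordinates; ids: point ids in cell order,
    with ids[-1] == ids[0] closing the loop.
    """
    pts = np.asarray(points, dtype=float)
    ring = list(ids[:-1])               # drop the repeated closing id
    n = len(ring)
    out = []
    for i in range(n):
        prev_pt = pts[ring[i - 1]]      # wraps to the last vertex when i == 0
        cur_pt = pts[ring[i]]
        next_pt = pts[ring[(i + 1) % n]]
        u = (prev_pt - cur_pt) / np.linalg.norm(prev_pt - cur_pt)
        v = (next_pt - cur_pt) / np.linalg.norm(next_pt - cur_pt)
        b = u + v
        norm = np.linalg.norm(b)
        # At a straight angle u + v vanishes, so the direction is left as NaN.
        out.append(b / norm if norm > 1e-12 else np.full(3, np.nan))
    return np.array(out)

# Hypothetical unit square in the z = 0 plane, closed by repeating id 0.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
bis = vertex_bisectors(square, [0, 1, 2, 3, 0])
print(bis)
```

The bisector here is the normalized sum of the two unit vectors pointing from V_i to its neighbours, so it points into the angle at V_i.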
I'm plotting a heatmap in gnuplot from a text file that is in matrix format:
z11 z12 z13
z21 z22 z23
z31 z32 z33
and so forth, using the following command (not including axis labelling, etc, for brevity):
plot '~/some_text_file.txt' matrix notitle with image
The matrix is quite large, in excess of 50 000 elements in the majority of cases, and it's mostly due to the size of my y-dimension (#rows). I would like to know if there's a way to change the limits in the y-dimension for a set number of values around a maximum, while keeping the x and z dimensions the same. E.g. if a maximum in the matrix is at [4000, 33], I want my y range to be centred at 4000 +- let's say 20% of length of the y-dimension.
Thanks.
Edit:
The solution below is basically the correct idea. However, it works in my example but not in general, because of a bug in how gnuplot uses the stats command with matrix files. See the comments after the answer for further info.
You can do this using stats to get the indices that correspond to the maximum value dynamically.
Consider the following file which I named data:
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 5 3 4
0 1 2 3 4
If I run stats I get:
gnuplot> stats "data" matrix
* FILE:
Records: 25
Out of range: 0
Invalid: 0
Blank: 0
Data Blocks: 1
* MATRIX: [5 X 5]
Mean: 2.1200
Std Dev: 1.5315
Sum: 53.0000
Sum Sq.: 171.0000
Minimum: 0.0000 [ 0 0 ]
Maximum: 5.0000 [ 3 2 ]
COG: 2.9434 2.0566
The maximum value is in position [ 3 2 ] meaning row 3+1 and column 2+1 (in gnuplot the first row/column would be number 0). After running stats some variables are created automatically (help stats for more info), with STATS_index_max_x and STATS_index_max_y among them, which store the position of the maximum:
gnuplot> print STATS_index_max_x
3.0
gnuplot> print STATS_index_max_y
2.0
You can use these to set the ranges automatically. Be careful, though: STATS_index_max_x actually gives you the y (instead of x) position. The total number of rows, needed to compute the range, can be obtained with a system call (there might be a better built-in function, which I do not know):
gnuplot> range = system("awk 'END{print NR}' data")
gnuplot> print range
5
So basically you'll do:
stats "data" matrix
range = system("awk 'END{print NR}' data")
range_center = STATS_index_max_x
d = 0.2 * range
set yrange [range_center - d : range_center + d]
which will center the yrange at the position of your maximum value and will stretch it by +-20% of its total range.
The result of plot "data" matrix w image is now restricted to the y range around the maximum, instead of showing the full matrix. (Figures not available.)
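Given the stats issue mentioned in the edit, the row of the maximum can also be computed outside gnuplot and the resulting set yrange line passed to it. This is a sketch of that alternative, not part of the original answer: it uses NumPy on an in-memory copy of the small example file, and the variable names are illustrative.

```python
import io
import numpy as np

# In-memory stand-in for the example file "data" from the answer.
text = """0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 5 3 4
0 1 2 3 4
"""
m = np.loadtxt(io.StringIO(text))

# Zero-based row and column of the largest element.
row, col = (int(i) for i in np.unravel_index(np.argmax(m), m.shape))

# Half-width of the window: 20% of the number of rows, as in the answer.
d = 0.2 * m.shape[0]

# gnuplot command that centres the y range on the row of the maximum.
command = f"set yrange [{row - d}:{row + d}]"
print(row, col)
print(command)
```

For a real file, np.loadtxt can be given the file path instead of the StringIO object.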