I work with NetworkX to generate some classes of graphs.
Now I would like to permute the nodes and rotate the graph by a given angle (80°, 90°, 120°).
How can I apply permutation and rotation to graphs with NetworkX?
Edit_1:
Given an adjacency matrix of a graph, I would like to rotate the graph in a way that preserves the edges and the links between vertices. The only thing that changes is the position of the nodes.
What I would like to do is rotate my graph by 90 degrees.
Input:
Adjacency matrix of graph G
Process:
Apply a rotation of 90 degrees to G
Output:
Rotated adjacency matrix
That is, the graph preserves its topology; only the indices of the adjacency matrix change position.
For example, node 1 at index 0 might be at index 4 after rotation.
What have I tried?
1) I looked at numpy.random.permutation(), but it doesn't seem to accept a rotation parameter.
2) In NetworkX I didn't find any function that performs a rotation.
EDIT2
Given a 5*5 adjacency matrix (5 nodes):
adj=[[0,1,0,0,1],
[1,0,1,1,0],
[0,0,0,1,1],
[0,0,1,0,1],
[1,1,1,1,0]
]
I would like to permute the indices.
Say that node 1 takes the place of node 3, node 3 takes the place of node 4, and node 4 takes the place of node 1.
It's just the permutation of nodes (preserving their edges).
I would like to keep in a dictionary the mapping between the original index and the new index after permutation.
Secondly, I would like to apply a rotation to this adjacency matrix with an angle of 90° (like applying a rotation to an image). I'm not sure how it can be done.
Take a look at the networkx command relabel_nodes.
Given a graph G, if we want to relabel node 0 as 1, 1 as 3, and 3 as 0 [so a permutation of the nodes, leaving 2 in place], we create the dict mapping = {0:1, 1:3, 3:0}. Then we do
H = nx.relabel_nodes(G, mapping)
And H is now the permuted graph.
import networkx as nx
G = nx.path_graph(4) #0-1-2-3
mapping = {0:1, 1:3, 3:0}
H = nx.relabel_nodes(G, mapping) #1-3-2-0
# check G's adjacency matrix
# (to_numpy_array replaces to_numpy_matrix, which was removed in NetworkX 3.0)
print(nx.to_numpy_array(G, nodelist=[0, 1, 2, 3]))
> [[ 0. 1. 0. 0.]
[ 1. 0. 1. 0.]
[ 0. 1. 0. 1.]
[ 0. 0. 1. 0.]]
# check H's adjacency matrix
print(nx.to_numpy_array(H, nodelist=[0, 1, 2, 3]))
> [[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]
[ 1. 0. 0. 1.]
[ 0. 1. 1. 0.]]
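The same permutation can also be applied directly to the adjacency matrix from the question while recording the index mapping in a dictionary. The following is a minimal sketch in plain Python, assuming 0-based node indices; the names `permute_adjacency`, `new_index` and `index_map` are illustrative and not part of NetworkX.

```python
def permute_adjacency(adj, mapping):
    """Move node `old` to index `new` for every `old: new` pair in `mapping`.

    Nodes that do not appear in `mapping` keep their index.
    """
    n = len(adj)
    new_index = list(range(n))
    for old, new in mapping.items():
        new_index[old] = new
    # a valid permutation must use every index exactly once
    if sorted(new_index) != list(range(n)):
        raise ValueError("mapping is not a permutation of the node indices")
    permuted = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # the edge i -> j becomes the edge new_index[i] -> new_index[j]
            permuted[new_index[i]][new_index[j]] = adj[i][j]
    return permuted, new_index


adj = [[0, 1, 0, 0, 1],
       [1, 0, 1, 1, 0],
       [0, 0, 0, 1, 1],
       [0, 0, 1, 0, 1],
       [1, 1, 1, 1, 0]]

# node 1 takes the place of node 3, node 3 of node 4, node 4 of node 1
mapping = {1: 3, 3: 4, 4: 1}
permuted, new_index = permute_adjacency(adj, mapping)
index_map = dict(enumerate(new_index))  # original index -> new index
```

Every edge survives the relabelling, so the number of non-zero entries is the same before and after; only their row and column positions change.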
I am implementing a machine learning model in Python which predicts success or failure. I have created a dummy variable which is 1 when there is success and 0 when there is a failure. I understand the concept of confusion matrix but I have found some online where the TPs and TNs are on opposite sides of the matrix. I would like to know how to interpret the results for my variables. Is the top-left corner of the matrix predicting True Positive? If so would that translate to the amount of successes being predicted correctly or the amount of failures being predicted correctly?
Does the matrix match the diagram below and if so how?
Ideally, please describe each corner of the confusion matrix in the context where I have success as 1 and failure as 0.
Refer to the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
Since you haven't specified the third parameter for labels in confusion_matrix, the labels in y_test_res will be used in sorted order, i.e. in this case 0 then 1. The row labels represent actual y, and column labels represent predicted y.
So the top-left corner shows the number of failure observations that were predicted correctly, i.e. the actual y was 0 and was predicted 0: true negatives. The bottom-right corner shows true positives, i.e. the actual y was 1 and was predicted 1.
The top-right corner would be actual y = 0 and predicted y = 1, i.e. false positive.
Using the confusion matrix plot would prettify things a little.
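import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
# plot_confusion_matrix was removed in scikit-learn 1.2; this is its replacement
ConfusionMatrixDisplay.from_estimator(forest, X_test, y_test)
plt.show()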
In the case of binary classification where the classes are 0 and 1, according to the docs:
1st row is for actual class 0
2nd row is for actual class 1
1st column is for predicted class 0
2nd column is for predicted class 1
Coefficient (0, 0) is the True Negative count (TN).
Coefficient (0, 1) is the False Positive count (FP).
Coefficient (1, 0) is the False Negative count (FN).
Coefficient (1, 1) is the True Positive count (TP).
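The layout above can be checked by counting the four cells by hand. This is a small sketch in plain Python, not scikit-learn's implementation; the function name `binary_confusion` and the sample labels are made up for illustration, with 1 meaning success and 0 meaning failure.

```python
def binary_confusion(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns predicted."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


y_true = [0, 0, 0, 1, 1, 1, 1]  # actual outcomes
y_pred = [0, 1, 0, 1, 1, 0, 1]  # model predictions
cm = binary_confusion(y_true, y_pred)
(tn, fp), (fn, tp) = cm
```

Here `tp` counts successes predicted as successes and `tn` counts failures predicted as failures, matching the row and column order described above.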
Given a corpus of relevant documents (CORPUS) and a corpus of random documents (ran_CORPUS) I want to compute TF-IDF scores for all words in CORPUS, using ran_CORPUS as a base line. In my project, the ran_CORPUS has approximately 10 times as many documents as CORPUS.
CORPUS = ['this is a relevant document',
'this one is a relevant text too']
ran_CORPUS = ['the sky is blue',
'my cat has a furry tail']
My plan is to normalize the documents and merge all documents in CORPUS into one document (CORPUS then being a list with one long string element). To CORPUS I append all ran_CORPUS documents. Using sklearn's TfidfTransformer I would then compute the TF-IDF matrix for the combined corpus (consisting now of CORPUS and ran_CORPUS), and finally select the first row of that matrix to get the TF-IDF scores for my initial relevant CORPUS.
Does anybody know whether this approach could work and if there is a simple way to code it?
When you say "whether this approach could work", I presume you mean does merging all the relevant documents into one and vectorising present a valid model. I would guess it depends what you are going to try to do with that model.
I'm not much of a mathematician, but I imagine that this is like averaging the scores for all your documents into one vector, so you lose some of the shape of the space occupied by the individual relevant documents. So you are trying to make a "master" or "prototype" document which is meant to represent a topic?
If you are then going to do something like similarity matching with test documents, or classification by distance comparison, then you may have lost some of the subtlety of the original documents' vectorisation. There may be more facets to the overall topic than the averages represent.
More specifically, imagine your original "relevant corpus" has two clusters of documents because there are actually two main sub-topics represented by different groups of important features. Later, while doing classification, test documents could match either of those clusters individually, again because they are close to one of the two sub-topics. By averaging the whole "relevant corpus" in this case you would end up with a single document that was half-way between both of these clusters, but not accurately representing either. Therefore the test documents might not match at all, depending on the classification technique.
I think it's hard to say without trialling it on proper, specific corpora.
Regardless of the validity, below is how it could be implemented.
Note you can also use TfidfVectorizer to combine the vectorising and TF-IDF steps in one. The results are not always exactly the same, but they are in this case.
Also, you say normalise the documents. Typically you might normalise a vector representation before feeding it into a classification algorithm which requires a normalised distribution (like SVM). However, scikit-learn's TF-IDF classes already L2-normalise each row by default (norm='l2'), so a further normalisation has no effect here.
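import logging
from sklearn import preprocessing
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer, TfidfTransformer
# without this, debug messages are suppressed at the default WARNING level
logging.basicConfig(level=logging.DEBUG, format="%(message)s")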
CORPUS = ['this is a relevant document',
'this one is a relevant text too']
ran_CORPUS = ['the sky is blue',
'my cat has a furry tail']
doc_CORPUS = ' '.join([str(x) for x in CORPUS])
ran_CORPUS.append(doc_CORPUS)
count_vect = CountVectorizer()
X_counts = count_vect.fit_transform(ran_CORPUS)
tfidf_transformer = TfidfTransformer()
X_tfidf = tfidf_transformer.fit_transform(X_counts)
logging.debug("\nCount + TdidfTransform \n%s" % X_tfidf.toarray())
# or do it in one pass with TfidfVectorizer
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(ran_CORPUS)
logging.debug("\nTdidfVectoriser \n%s" % X_tfidf.toarray())
# normalising doesn't achieve much as tfidf is already normalised.
normalizer = preprocessing.Normalizer()
X_tfidf = normalizer.transform(X_tfidf)
logging.debug("\nNormalised:\n%s" % X_tfidf.toarray())
Count + TdidfTransform
[[0.52863461 0. 0. 0. 0. 0.40204024
0. 0. 0. 0.52863461 0. 0.
0.52863461 0. 0. ]
[0. 0.4472136 0. 0.4472136 0.4472136 0.
0.4472136 0. 0. 0. 0.4472136 0.
0. 0. 0. ]
[0. 0. 0.2643173 0. 0. 0.40204024
0. 0.2643173 0.52863461 0. 0. 0.2643173
0. 0.52863461 0.2643173 ]]
TdidfVectoriser
[[0.52863461 0. 0. 0. 0. 0.40204024
0. 0. 0. 0.52863461 0. 0.
0.52863461 0. 0. ]
[0. 0.4472136 0. 0.4472136 0.4472136 0.
0.4472136 0. 0. 0. 0.4472136 0.
0. 0. 0. ]
[0. 0. 0.2643173 0. 0. 0.40204024
0. 0.2643173 0.52863461 0. 0. 0.2643173
0. 0.52863461 0.2643173 ]]
Normalised:
[[0.52863461 0. 0. 0. 0. 0.40204024
0. 0. 0. 0.52863461 0. 0.
0.52863461 0. 0. ]
[0. 0.4472136 0. 0.4472136 0.4472136 0.
0.4472136 0. 0. 0. 0.4472136 0.
0. 0. 0. ]
[0. 0. 0.2643173 0. 0. 0.40204024
0. 0.2643173 0.52863461 0. 0. 0.2643173
0. 0.52863461 0.2643173 ]]
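The "Normalised" output is identical to the previous two because each TF-IDF row already has unit length. The sketch below recomputes rows by hand in plain Python, assuming scikit-learn's default settings (smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation) and a tokenizer that drops single-character tokens; the helper names `tokenize` and `tfidf_row` are illustrative.

```python
import math
from collections import Counter

docs = ['the sky is blue',
        'my cat has a furry tail',
        'this is a relevant document this one is a relevant text too']


def tokenize(text):
    # assumption: mimic the default analyzer by dropping one-character tokens
    return [tok for tok in text.lower().split() if len(tok) > 1]


counts = [Counter(tokenize(d)) for d in docs]
n_docs = len(docs)
# number of documents each term appears in
doc_freq = Counter(term for c in counts for term in c)


def tfidf_row(term_counts):
    """TF * smoothed IDF for one document, scaled to unit L2 norm."""
    raw = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
           for t, tf in term_counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}


row0 = tfidf_row(counts[0])  # 'the sky is blue'
row2 = tfidf_row(counts[2])  # the merged relevant document
squared_length = sum(w * w for w in row0.values())
```

Under these assumptions `row0['blue']` and `row0['is']` agree with the 0.52863461 and 0.40204024 entries printed above, and `squared_length` is 1 up to rounding, which is why the extra Normalizer step changes nothing.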
I have a scipy.sparse.csc_matrix with dtype = np.int32. I want to efficiently divide each column (or row, whichever is faster for csc_matrix) of the matrix by the diagonal element in that column, so mnew[:,i] = m[:,i]/m[i,i]. Note that I need to convert my matrix to np.double (since the mnew elements will be in [0,1]), and since the matrix is massive and very sparse I wonder if I can do it in an efficient way that avoids for loops and never goes dense.
Make a sparse matrix:
In [379]: M = sparse.random(5,5,.2, format='csr')
In [380]: M
Out[380]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [381]: M.diagonal()
Out[381]: array([ 0., 0., 0., 0., 0.])
Too many 0s in the diagonal; let's add a nonzero diagonal:
In [382]: D=sparse.dia_matrix((np.random.rand(5),0),shape=(5,5))
In [383]: D
Out[383]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements (1 diagonals) in DIAgonal format>
In [384]: M1 = M+D
In [385]: M1
Out[385]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
In [387]: M1.A
Out[387]:
array([[ 0.35786668, 0.81754484, 0. , 0. , 0. ],
[ 0. , 0.41928992, 0. , 0.01371273, 0. ],
[ 0. , 0. , 0.4685924 , 0. , 0.35724102],
[ 0. , 0. , 0.77591294, 0.95008721, 0.16917791],
[ 0. , 0. , 0. , 0. , 0.16659141]])
Now it's trivial to divide each column by its diagonal (this is a matrix 'product')
In [388]: M1/M1.diagonal()
Out[388]:
matrix([[ 1. , 1.94983185, 0. , 0. , 0. ],
[ 0. , 1. , 0. , 0.01443313, 0. ],
[ 0. , 0. , 1. , 0. , 2.1444144 ],
[ 0. , 0. , 1.65583764, 1. , 1.01552603],
[ 0. , 0. , 0. , 0. , 1. ]])
Or divide the rows - (multiply by a column vector)
In [391]: M1/M1.diagonal()[:,None]
oops, these are dense; let's make the diagonal sparse
In [408]: md = sparse.csr_matrix(1/M1.diagonal()) # do the inverse here
In [409]: md
Out[409]:
<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [410]: M.multiply(md)
Out[410]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [411]: M.multiply(md).A
Out[411]:
array([[ 0. , 1.94983185, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0.01443313, 0. ],
[ 0. , 0. , 0. , 0. , 2.1444144 ],
[ 0. , 0. , 1.65583764, 0. , 1.01552603],
[ 0. , 0. , 0. , 0. , 0. ]])
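M.multiply(md) broadcasts the 1x5 row md down the rows, so it scales each column j by 1/diagonal[j] (md.multiply(M) gives the same result). For the row version, multiply by the transposed 5x1 vector instead: M.multiply(md.T). Note that In [410] multiplies M rather than M1, which is why the diagonal of the result is 0; M1.multiply(md) gives the 1s on the diagonal.
See also the related question "Division of sparse matrix", which is similar except that it uses the sum of the rows instead of the diagonal and deals a bit more with the potential 'divide-by-zero' issue.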
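For the CSC matrix in the question, the column scaling only needs to touch the stored values, because each column's entries are contiguous in that format. The sketch below works on plain Python lists named after the `data`, `indices` and `indptr` attributes of `scipy.sparse.csc_matrix`; the function name and the choice to leave a column untouched when its diagonal entry is missing or zero are assumptions for illustration.

```python
def divide_columns_by_diagonal(data, indices, indptr):
    """Return new CSC values with every column divided by its diagonal entry."""
    n_cols = len(indptr) - 1
    out = [float(v) for v in data]  # int -> float, like casting to np.double
    for j in range(n_cols):
        start, end = indptr[j], indptr[j + 1]
        # look for the stored entry whose row index equals the column index
        diag = 0.0
        for k in range(start, end):
            if indices[k] == j:
                diag = float(data[k])
                break
        if diag == 0.0:
            continue  # assumption: skip columns without a usable diagonal
        for k in range(start, end):
            out[k] = data[k] / diag
    return out


# CSC storage of [[2, 0, 1],
#                 [4, 5, 0],
#                 [0, 0, 4]]
data = [2, 4, 5, 1, 4]
indices = [0, 1, 1, 0, 2]
indptr = [0, 2, 3, 5]
scaled = divide_columns_by_diagonal(data, indices, indptr)
```

The sparsity pattern (`indices`, `indptr`) is reused unchanged, so nothing is ever expanded to a dense array.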
I would like to implement a Kalman filter in Python for some tracking software I'm working on. It uses a 2D coordinate system with a single state vector x holding the position, velocity and acceleration of the x and y coordinates.
I am using the following update and predict method:
# UPDATE
y = Z - (H * x)
S = H * P * H.T + R # residual covariance
K = P * H.T * S.I # Kalman gain
x = x + K*y
I = np.matrix(np.eye(F.shape[0])) # identity matrix
P = (I - K*H)*P
# PREDICT x, P based on motion
x = F*x
P = F*P*F.T + Q
return x, P
Using other resources I have come up with the following state transition matrix and measurement matrix but am not sure how correct they are:
# no trailing comma after the call: `F = np.matrix(...),` would create a 1-tuple
F = np.matrix('''
1. 0. 1. 0. 0.5 0.;
0. 1. 0. 1. 0. 0.5;
0. 0. 1. 0. 1. 0.;
0. 0. 0. 1. 0. 1.;
0. 0. 0. 0. 1. 0.;
0. 0. 0. 0. 0. 1.
''')
H = np.matrix('''
1. 0. 0. 0. 0. 0.;
0. 1. 0. 0. 0. 0.;
0. 0. 0. 0. 1. 0.;
0. 0. 0. 0. 0. 1.''')
Basically it doesn't work and makes my tracked path more jittery and I have no clue where I'm going wrong. Can anyone help?
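For reference, the F in the question has the form of a constant-acceleration model with a time step of 1 and the state ordered as (x, y, vx, vy, ax, ay). The sketch below builds the same matrix for an arbitrary time step in plain Python; the state ordering and the name `transition_matrix` are assumptions, since the question does not state them.

```python
def transition_matrix(dt):
    """Constant-acceleration transition for the state (x, y, vx, vy, ax, ay)."""
    h = 0.5 * dt * dt
    return [
        [1.0, 0.0, dt,  0.0, h,   0.0],  # x  += vx*dt + 0.5*ax*dt^2
        [0.0, 1.0, 0.0, dt,  0.0, h],    # y  += vy*dt + 0.5*ay*dt^2
        [0.0, 0.0, 1.0, 0.0, dt,  0.0],  # vx += ax*dt
        [0.0, 0.0, 0.0, 1.0, 0.0, dt],   # vy += ay*dt
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # ax unchanged
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # ay unchanged
    ]


F_unit_step = transition_matrix(1.0)
```

With `dt = 1.0` this reproduces the entries of F above, so F is only appropriate if the measurements really arrive one time unit apart.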
I am doing my graduation project in the field of computer vision, and I have only taken one statistics course, which covered very basic concepts. I am now struggling with more advanced topics, so I need help (book, tutorial, course, etc.) to review the basic ideas and concepts in statistics and then dive into the statistical details used in computer vision.
You can calculate False Positives/False negatives, etc with this Confusion Matrix PyTorch example:
import torch
def confusion(prediction, truth):
    """ Returns the confusion matrix for the values in the `prediction` and `truth`
    tensors, i.e. the amount of positions where the values of `prediction`
    and `truth` are
    - 1 and 1 (True Positive)
    - 1 and 0 (False Positive)
    - 0 and 0 (True Negative)
    - 0 and 1 (False Negative)
    """
    confusion_vector = prediction / truth
    # Element-wise division of the 2 tensors returns a new tensor which holds a
    # unique value for each case:
    # 1 where prediction and truth are 1 (True Positive)
    # inf where prediction is 1 and truth is 0 (False Positive)
    # nan where prediction and truth are 0 (True Negative)
    # 0 where prediction is 0 and truth is 1 (False Negative)
    true_positives = torch.sum(confusion_vector == 1).item()
    false_positives = torch.sum(confusion_vector == float('inf')).item()
    true_negatives = torch.sum(torch.isnan(confusion_vector)).item()
    false_negatives = torch.sum(confusion_vector == 0).item()
    return true_positives, false_positives, true_negatives, false_negatives
You could use nn.BCEWithLogitsLoss (and therefore remove the sigmoid from the model output) and set pos_weight > 1 to increase recall. Or optimize further using the Dice coefficient to penalize the model for false positives, with something like:
import numpy as np
def Dice(y_true, y_pred):
    """Returns Dice Similarity Coefficient for ground truth and predicted masks."""
    # scale 0-255 masks to 0-1, then treat any non-zero pixel as foreground
    # (the astype result has to be assigned, otherwise it is discarded)
    y_true = (np.squeeze(y_true) / 255).astype(bool)
    y_pred = (np.squeeze(y_pred) / 255).astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    # the +1 terms avoid a division by zero on empty masks
    return ((2. * intersection) + 1.) / (y_true.sum() + y_pred.sum() + 1.)
IOU Calculations Explained
Count true positives (TP)
Count false positives (FP)
Count false negatives (FN)
Intersection = TP
Union = TP + FP + FN
IOU = Intersection/Union
The left side is our ground truth, while the right side contains our predictions. The highlighted cells on the left side note which class we are looking at for statistics on the right side. The highlights on the right side note true positives in a cream color, false positives in orange, and false negatives in yellow (note that all others are true negatives: they are not predicted as this individual class, and should not be, based on the ground truth).
For Class 0, only the top row of the 4x4 matrix should be predicted as zeros. This is a rather simplified version of a real ground truth. In reality, the zeros could be anywhere in the matrix. On the right side, we see 1,0,0,0, meaning the first is a false negative, but the other three are true positives (aka 3 for Intersection as well). From there, we need to find anywhere else where zero was falsely predicted, and we note that happens once on the second row, and twice on the fourth row, for a total of three false positives.
To get the union, we add up TP (3), FP (3) and FN (1) to get seven. The IOU for this class, therefore, is 3/7.
If we do this for all the classes and average the IOUs, we get:
Mean IOU = [(3/7) + (2/6) + (3/4) + (1/6)] / 4 = 0.420
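The arithmetic above can be reproduced in a few lines. This is a small sketch in plain Python; the function name `iou` is illustrative, and since the text only gives the TP/FP/FN counts for Class 0, the other three classes are entered directly as the fractions quoted above.

```python
def iou(tp, fp, fn):
    """Intersection over union from per-class counts."""
    union = tp + fp + fn
    return tp / union if union else 0.0


class0_iou = iou(tp=3, fp=3, fn=1)  # 3 / 7

# remaining per-class IOUs as quoted in the text
per_class_iou = [class0_iou, 2 / 6, 3 / 4, 1 / 6]
mean_iou = sum(per_class_iou) / len(per_class_iou)
```

True negatives never enter the calculation, which is why they are left unhighlighted in the walkthrough.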
You will also want to learn how to pull the statistics for mAP (Mean Avg Precision):
https://www.youtube.com/watch?v=pM6DJ0ZZee0
https://towardsdatascience.com/breaking-down-mean-average-precision-map-ae462f623a52#1a59
https://medium.com/#hfdtsinghua/calculate-mean-average-precision-map-for-multi-label-classification-b082679d31be
Compute Covariance Matrices
The variance of a variable describes how much the values are spread. The covariance is a measure that tells the amount of dependency between two variables.
A positive covariance means that the values of the first variable are large when values of the second variables are also large. A negative covariance means the opposite: large values from one variable are associated with small values of the other.
The covariance value depends on the scale of the variable so it is hard to analyze it. It is possible to use the correlation coefficient that is easier to interpret. The correlation coefficient is just the normalized covariance.
A positive covariance means that large values of one variable are associated with big values from the other (left). A negative covariance means that large values of one variable are associated with small values of the other one (right).
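To make the scale dependence concrete, the sketch below computes the covariance and the correlation coefficient of two short vectors in plain Python, then rescales one of them. The helper names and the sample values are illustrative, and the covariance divides by n (the population form).

```python
import math


def covariance(x, y):
    """Population covariance: mean of the products of the deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def correlation(x, y):
    """Covariance normalized by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


x = [1, 5, 3]
y = [5, 1, 6]
x_scaled = [100 * v for v in x]  # same data in different units

cov_original = covariance(x, y)
cov_scaled = covariance(x_scaled, y)
corr_original = correlation(x, y)
corr_scaled = correlation(x_scaled, y)
```

Rescaling x multiplies the covariance by 100 but leaves the correlation coefficient unchanged, which is what makes the latter easier to interpret.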
The covariance matrix is a matrix that summarises the variances and covariances of a set of vectors and it can tell a lot of things about your variables. The diagonal corresponds to the variance of each vector:
A matrix A and its matrix of covariance. The diagonal corresponds to the variance of each column vector. Let’s check with the formula of the variance:
$$V(X) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
With n the length of the vector, and x̄ the mean of the vector. For instance, the variance of the first column vector of A, (1, 5, 3), is:
$$V(A_{:,1}) = \frac{(1-3)^2 + (5-3)^2 + (3-3)^2}{3} = \frac{8}{3} \approx 2.67$$
This is the first cell of our covariance matrix. The second element on the diagonal corresponds to the variance of the second column vector from A and so on.
Note: the vectors extracted from the matrix A correspond to the columns of A.
The other cells correspond to the covariance between two column vectors from A. For instance, the covariance between the first and the third column is located in the covariance matrix as the column 1 and the row 3 (or the column 3 and the row 1):
The position in the covariance matrix. Column corresponds to the first variable and row to the second (or the opposite). The covariance between the first and the third column vector of A is the element in column 1 and row 3 (or the opposite = same value).
Let’s check that the covariance between the first and the third column vector of A is equal to -2.67. The formula of the covariance between two variables X and Y is:
$$cov(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
The variables X and Y are the first and the third column vectors in the last example. Let’s split this formula to be sure that it is crystal clear:
The sum symbol (Σ) means that we will iterate on the elements of the vectors. We will start with the first element (i=1) and calculate the first element of X minus the mean of the vector X:
$$x_1 - \bar{x}$$
Multiply the result with the first element of Y minus the mean of the vector Y:
$$(x_1 - \bar{x})(y_1 - \bar{y})$$
Reiterate the process for each element of the vectors and calculate the sum of all results:
$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Divide by the number of elements in the vector:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
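EXAMPLE - Let’s start with the matrix A:
$$A = \begin{bmatrix} 1 & 3 & 5 \\ 5 & 4 & 1 \\ 3 & 8 & 6 \end{bmatrix}$$
We will calculate the covariance between the first and the third column vectors:
$$X = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}$$
and
$$Y = \begin{bmatrix} 5 \\ 1 \\ 6 \end{bmatrix}$$
Which is x̄=3, ȳ=4, and n=3 so we have:
$$cov(X,Y) = \frac{(1-3)(5-4) + (5-3)(1-4) + (3-3)(6-4)}{3} = \frac{-8}{3} \approx -2.67$$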
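The four steps listed earlier can be followed one at a time in plain Python. This is a small sketch; X and Y are the first and third columns of the array `A = np.array([[1, 3, 5], [5, 4, 1], [3, 8, 6]])` used in the NumPy example, and the variable names are illustrative.

```python
X = [1, 5, 3]  # first column of A
Y = [5, 1, 6]  # third column of A
n = len(X)

x_mean = sum(X) / n  # 3.0
y_mean = sum(Y) / n  # 4.0

# steps 1 and 2: deviation of each element from its mean, multiplied pairwise
products = [(x - x_mean) * (y - y_mean) for x, y in zip(X, Y)]

# steps 3 and 4: sum the products and divide by the number of elements
cov_xy = sum(products) / n
```

Rounded to two decimals, `cov_xy` is the -2.67 quoted above.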
Code example -
Using NumPy, the covariance matrix can be calculated with the function np.cov.
It is worth noting that if you want NumPy to use the columns as vectors, the parameter rowvar=False has to be used. Also, bias=True divides by n and not by n-1.
Let’s create the array first:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
A = np.array([[1, 3, 5], [5, 4, 1], [3, 8, 6]])
Now we will calculate the covariance with the NumPy function:
np.cov(A, rowvar=False, bias=True)
Finding the covariance matrix with the dot product
There is another way to compute the covariance matrix of A. You can center A around 0. The mean of the vector is subtracted from each element of the vector to have a vector with mean equal to 0. It is multiplied with its own transpose, and divided by the number of observations.
Let’s start with an implementation and then we’ll try to understand the link with the previous equation:
def calculateCovariance(X):
    meanX = np.mean(X, axis = 0)
    lenX = X.shape[0]
    X = X - meanX
    covariance = X.T.dot(X)/lenX
    return covariance
print(calculateCovariance(A))
Output:
array([[ 2.66666667, 0.66666667, -2.66666667],
[ 0.66666667, 4.66666667, 2.33333333],
[-2.66666667, 2.33333333, 4.66666667]])
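The dot product between two vectors can be expressed:
$$X^{T}Y = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
It is the sum of the products of each element of the vectors:
$$X^{T}Y = \sum_{i=1}^{n} x_i y_i$$
If we have a matrix A, the dot product between the transpose of A and A will give you a new matrix whose cell (j, k) is the dot product of columns j and k of A:
$$(A^{T}A)_{jk} = \sum_{i=1}^{n} a_{ij} a_{ik}$$
When the columns of A have been centered, dividing this matrix by n gives exactly the covariance formula for each pair of columns.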
Visualize data and covariance matrices
In order to get more insights about the covariance matrix and how it can be useful, we will create a function to visualize it along with 2D data. You will be able to see the link between the covariance matrix and the data.
This function will calculate the covariance matrix as we have seen above. It will create two subplots — one for the covariance matrix and one for the data. The heatmap() function from Seaborn is used to create gradients of colour — small values will be coloured in light green and large values in dark blue. We chose one of our palette colours, but you may prefer other colours. The data is represented as a scatterplot.
def plotDataAndCov(data):
    ACov = np.cov(data, rowvar=False, bias=True)
    # print() call instead of the Python 2 print statement
    print('Covariance matrix:\n', ACov)
    fig, ax = plt.subplots(nrows=1, ncols=2)
    fig.set_size_inches(10, 10)
    ax0 = plt.subplot(2, 2, 1)
    # Choosing the colors
    cmap = sns.color_palette("GnBu", 10)
    sns.heatmap(ACov, cmap=cmap, vmin=0)
    ax1 = plt.subplot(2, 2, 2)
    # data can include the colors
    if data.shape[1]==3:
        c=data[:,2]
    else:
        c="#0A98BE"
    ax1.scatter(data[:,0], data[:,1], c=c, s=40)
    # Remove the top and right axes from the data plot
    ax1.spines['right'].set_visible(False)
    ax1.spines['top'].set_visible(False)
Uncorrelated data
Now that we have the plot function, we will generate some random data to visualize what the covariance matrix can tell us. We will start with some data drawn from a normal distribution with the NumPy function np.random.normal().
This function needs the mean, the standard deviation and the number of observations of the distribution as input. We will create two random variables of 300 observations with a standard deviation of 1. The first will have a mean of 2 and the second a mean of 1. If we randomly draw two sets of 300 observations from a normal distribution, both vectors will be uncorrelated.
np.random.seed(1234)
a1 = np.random.normal(2, 1, 300)
a2 = np.random.normal(1, 1, 300)
A = np.array([a1, a2]).T
A.shape
Note 1: We transpose the data with .T because the original shape is (2, 300) and we want the number of observations as rows (so with shape (300, 2)).
Note 2: We use the np.random.seed function for reproducibility. The same random numbers will be generated the next time we run the cell. Let’s check what the data looks like:
A[:10,:]
array([[ 2.47143516, 1.52704645],
[ 0.80902431, 1.7111124 ],
[ 3.43270697, 0.78245452],
[ 1.6873481 , 3.63779121],
[ 1.27941127, -0.74213763],
[ 2.88716294, 0.90556519],
[ 2.85958841, 2.43118375],
[ 1.3634765 , 1.59275845],
[ 2.01569637, 1.1702969 ],
[-0.24268495, -0.75170595]])
Nice, we have two column vectors. Now, we can check that the distributions are normal:
sns.distplot(A[:,0], color="#53BB04")
sns.distplot(A[:,1], color="#0A98BE")
plt.show()
plt.close()
We can see that the distributions have equivalent standard deviations but different means (1 and 2). So that’s exactly what we have asked for.
Now we can plot our dataset and its covariance matrix with our function:
plotDataAndCov(A)
plt.show()
plt.close()
Covariance matrix:
[[ 0.95171641 -0.0447816 ]
[-0.0447816 0.87959853]]
We can see on the scatterplot that the two dimensions are uncorrelated. Note that we have one dimension with a mean of 1 (y-axis) and the other with the mean of 2 (x-axis).
Also, the covariance matrix shows that the variance of each variable is close to 1 and the covariance of columns 1 and 2 is close to 0. Since we ensured that the two vectors are independent this is coherent. The opposite is not necessarily true: a covariance of 0 doesn’t guarantee independence.
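A tiny counterexample illustrates that last point. In the sketch below (plain Python, with made-up values), y is computed entirely from x, so the two are as dependent as they can be, yet their covariance is exactly zero because the relationship is symmetric rather than linear.

```python
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y is fully determined by x

n = len(x)
x_mean = sum(x) / n  # 0.0
y_mean = sum(y) / n  # 2.0

# population covariance, dividing by n
cov_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / n
```

Covariance only captures linear association, so the positive and negative halves of x cancel each other out here.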
Correlated data
Now, let’s construct dependent data by deriving one column from the other one.
np.random.seed(1234)
b1 = np.random.normal(3, 1, 300)
b2 = b1 + np.random.normal(7, 1, 300)/2.
B = np.array([b1, b2]).T
plotDataAndCov(B)
plt.show()
plt.close()
Covariance matrix:
[[ 0.95171641 0.92932561]
[ 0.92932561 1.12683445]]
The correlation between the two dimensions is visible on the scatter plot. We can see that a line could be drawn and used to predict y from x and vice versa. The covariance matrix is not diagonal (there are non-zero cells outside of the diagonal). That means that the covariance between dimensions is non-zero.
From this point with covariance matrices, you can research further on the following:
Mean normalization
Standardization or normalization
Whitening
Zero-centering
Decorrelate
Rescaling