How to read a .dat file and grep specific values - Linux

I am trying to read a .dat file using a bash script.
The values are file sizes, and I want to grep values bigger than 0 in every column except the first. A greater-than-zero value can appear in any row.
I have an awk script that reads the file line by line.
1349848860, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349848920, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349848980, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849040, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
1349849160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 227.736, 2, 0, 29378, 0, 0, 0, 0, 0
1349849220, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849280, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849340, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851260, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851320, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
1349851380, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 227.736, 2, 0, 29620, 0, 0, 0, 0, 0
1349851440, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#!/bin/bash
FILENAME=$1
# number each line and report the total line count
awk '{kount++; print kount, $0} END{print "\nTotal " kount " lines read"}' "$FILENAME"
# print the 13th whitespace-separated field of each line
awk '{print $13}' "$FILENAME"
Desired Output -
227.736, 2, 29378
227.736, 2, 29620
Thanks for help.
Naveen

If I understand your needs:
awk -F"," '{for (i=2; i<=NF; i++) if ($i > 1) {print; next}}' file.dat
The next makes sure each matching line is printed only once, even if several of its fields are greater than 1.
OUTPUT
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0

From what I understand, you would like to grep all the lines that contain a number other than 0 or 1, not counting the first number at the beginning of the line. How about
egrep ',.*([2-9]|[0-9.]{4,})' file.dat
This will print, from your example, the lines
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0

If you want to take the contents of the 13th (i.e. 14353), 16th (i.e. 450.03) and 19th (i.e. 69832) column of each line (based on your input), then try the following:
awk -F ', ' '{print $13, $16, $19}' test.dat
where test.dat is your data file.
This will output:
14353 450.03 69832
0 0 0
0 0 0
14353 450.03 68552
0 0 0
14353 450.03 73826
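If the goal is exactly the desired output from the question (only the values greater than 1, nothing printed for all-zero lines), a small variation could work. This is a sketch assuming the same comma-plus-space separated format as the sample data:
awk -F', ' '{
    out = ""
    for (i = 2; i <= NF; i++)
        if ($i > 1)
            out = (out == "" ? $i : out ", " $i)
    if (out != "") print out
}' file.dat
On the sample data from the question this prints:
227.736, 2, 29378
227.736, 2, 29620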

Related

How can I draw the Confusion Matrix when using image_dataset_from_directory in Tensorflow2.x?

My TF version is 2.9 and Python 3.8.
I have built an image binary classification CNN model and I am trying to get a confusion matrix.
The dataset structure is as follows.
train/
│------ benign/
│------ normal/
test/
│------ benign/
│------ normal/
The dataset configuration is as follows.
train_ds = tf.keras.utils.image_dataset_from_directory(
    directory=train_data_dir,
    labels="inferred",
    validation_split=0.2,
    subset="training",
    seed=1337,
    color_mode='grayscale',
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    directory=train_data_dir,
    labels="inferred",
    validation_split=0.2,
    subset="validation",
    seed=1337,
    color_mode='grayscale',
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    directory=test_data_dir,
    color_mode='grayscale',
    seed=1337,
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
I wrote the code referring to the following link to get the confusion matrix.
Reference Page
And this is my code for the confusion matrix.
predictions = model.predict(test_ds)

y_pred = []
y_true = []
# iterate over the dataset
for image_batch, label_batch in test_ds:   # use dataset.unbatch() with repeat
    # append true labels
    y_true.append(label_batch)
    # compute predictions
    preds = model.predict(image_batch)
    # append predicted labels
    y_pred.append(np.argmax(preds, axis=-1))

# convert the true and predicted labels into tensors
true_labels = tf.concat([item for item in y_true], axis=0)
predicted_labels = tf.concat([item for item in y_pred], axis=0)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(true_labels, predicted_labels)
print(cm)
y_pred and y_true were obtained from test_ds as above, and the resulting confusion matrix was as follows.
[[200 0]
[200 0]]
So I tried printing true_labels and predicted_labels, and confirmed that the predicted_labels are all 0, as follows.
print(true_labels)
<tf.Tensor: shape=(400,), dtype=int32, numpy=
array([0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,
0, 0, 1, 1])>
print(predicted_labels)
<tf.Tensor: shape=(400,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0], dtype=int64)>
I'm not sure why predicted_labels are all zero, but this is clearly wrong. I think the correct result should be:
[[200 0]
[0 200]]
What is wrong? I've been struggling for a few days. Please please help me.
Thanks a lot.
In the case of image binary classification, a threshold should be used to obtain the predicted label after model.predict(test_ds). I found that changing the line in my question from y_pred.append(np.argmax(preds, axis=-1)) to y_pred.append(np.where(preds > threshold, 1, 0)) solved the problem. Hope it is helpful to someone.
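For reference, a minimal sketch of that fix (assuming the model ends in a single sigmoid output unit, so model.predict returns one probability per image, and using 0.5 as the assumed threshold):
import numpy as np
import tensorflow as tf
from sklearn.metrics import confusion_matrix

threshold = 0.5                      # assumed operating point, tune as needed
y_pred = []
y_true = []
for image_batch, label_batch in test_ds:
    y_true.append(label_batch)
    preds = model.predict(image_batch)        # shape (batch, 1), sigmoid probabilities
    # argmax over a single column is always 0, so threshold instead
    y_pred.append((preds > threshold).astype(int).ravel())

true_labels = tf.concat(y_true, axis=0)
predicted_labels = np.concatenate(y_pred, axis=0)
print(confusion_matrix(true_labels, predicted_labels))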

How Can I declare constraints in Xpress IVE?

I am trying to write a model in Xpress IVE, however I get the following error:
error101: Incompatible types for operator ('mpvar' * 'mpvar' not defined).
I tried to write this constraint but couldn't make it work:
Two consecutive characters of the string must be positioned on neighboring nodes of the grid.
I think my model is correct and all of my decision variables are correct.
Can anyone help me with this issue?
Here is my code:
grid := 16
length := 8
!sample declarations section
declarations
! Declaring S and N array for the input
S: array(1..length) of integer
N: array(1..grid,1..grid) of integer
! Declaring decision variables
X: array(1..length, 1..grid) of mpvar
V: array(1..grid) of mpvar
C: array(1..grid,1..grid) of mpvar
W: real
constraint1, constraint2,constraint3: linctr
end-declarations
! Decision Variable Declaration
forall(i in 1..length, k in 1..grid) X(i,k) is_binary
forall(k in 1..grid) V(k) is_binary
forall(l in 1..grid) V(l) is_binary
forall(k in 1..grid, l in 1..grid) C(k,l) is_binary
!Input String
S:: [ 1, 0, 0, 1, 0, 1, 1, 0 ]
! Neighbours in the grid.
N:: [ 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
! Finding consecutive 1's in the string
forall(i in 1..length-1) do
    if S(i) = 1 and S(i+1) = 1 then
        W := W + 1
    end-if
end-do
! Declaring Constraints
! Constraint 1
forall(k in 1..grid) constraint1 := sum(i in 1..length) X(i,k) <= 1
! Constraint 2
forall(i in 1..length) constraint2 := sum(k in 1..grid) X(i,k) = 1
!Constraint 3
forall( i in 1..length - 1 ) constraint3 := (sum(j in 1..grid)(sum(k in 1..grid) N(k,j) * X(i,k) * X(i + 1,j))) = 1
Since you are creating the product of two variables in Constraint3, your problem is no longer linear but now quadratic (thus non-linear). This means you have to use the mmnl (non-linear) Mosel module. Putting
uses "mmnl"
at the top of your model should do that. It enables multiplication of decision variables.
Note that due to the quadratic terms in it, your Constraint3 will no longer be of type linctr. It will now be nlctr and you have to adjust this in the declaration.
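For reference, the two changes might look roughly like this (a fragment sketched against the declarations in the question, not a complete model):
uses "mmnl"                          ! enables products of decision variables (quadratic terms)

declarations
    ! S, N, X, V, C and W exactly as declared in the question
    constraint1, constraint2: linctr
    constraint3: nlctr               ! quadratic, so nlctr instead of linctr
end-declarations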

Where should I put the csv test file in PyTorch dataloader?

Let's say I have test.csv:
filename
1 a.jpg
2 b.jpg
and a test image folder:
/test
test_dataset = torchvision.datasets.ImageFolder(root=path + 'test/', transform=trans)
This will load all the test files.
If I want to make a submission file after training is done, how should I link the test folder's file names to the submission.csv file names?
%%time
from torch.autograd import Variable

results = []
with torch.no_grad():
    model.eval()
    print('start')
    for num, data in enumerate(test_loader):
        # print(num)
        imgs, label = data
        imgs, labels = imgs.to(device), label.to(device)
        test = Variable(imgs)
        output = model(test)
        ps = torch.exp(output)
        top_p, top_class = ps.topk(1, dim=1)
        # print(top_class)
        results += top_class.cpu().numpy().tolist()
predictions = np.array(results).flatten()
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
How should I know which result is from which file?
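One way to line the results up with file names (a sketch, assuming the test DataLoader was built with shuffle=False, so the prediction order matches the dataset order): torchvision's ImageFolder keeps its file list in dataset.samples, which you can zip with the predictions.
import csv
import os

# assumes test_loader was created from test_dataset with shuffle=False,
# so predictions[i] corresponds to test_dataset.samples[i]
filenames = [os.path.basename(path) for path, _ in test_dataset.samples]

with open('submission.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['filename', 'label'])
    for name, pred in zip(filenames, predictions):
        writer.writerow([name, pred])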

How do I reformat my heatmap using matplotlib?

Below I have the following code that generates a heatmap that plots each point as a block, but I want to switch the appearance to a more traditional heatmap. It currently looks like the first attached screenshot, but I want to make it appear like the second one. Since the dataset is all 0 it would be one color, but this is for future data. Below I have attached the code that generates the first heatmap; I need to rewrite it to change its appearance into the second one. I couldn't find the code in the matplotlib examples.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
try:
    temp = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0]]
    temp = np.array(temp)
    column = ["2-12","2-12","2-12","2-12", "2-13", "2-13","2-13","2-13","2-14","2-14","2-14","2-14", "2-15", "2-15", "2-15", "2-15", "2-16","2-16","2-16","2-16", "2-17", "2-17", "2-17", "2-17", "2-18","2-18","2-18","2-18","2-19","2-19","2-19","2-19", "2-20","2-20","2-20","2-20", "2-21", "2-21", "2-21", "2-21","2-22","2-22","2-22","2-22"]
    nodes = ["0-3", "4-7", "8-11", "22-15", "26-19", "20-23", "24-27", "28-31", "32-35", "36-39"]
    fig, ax = plt.subplots()
    im = ax.imshow(temp)

    # We want to show all ticks...
    ax.set_xticks(np.arange(len(column)))
    ax.set_yticks(np.arange(len(nodes)))
    # ... and label them with the respective list entries
    ax.set_xticklabels(column)
    ax.set_yticklabels(nodes)

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    for i in range(len(nodes)):
        for j in range(len(column)):
            text = ax.text(j, i, temp[i, j],
                           ha="center", va="center", color="w")

    fig.tight_layout()
    plt.show()
except ValueError:
    pass
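If the second screenshot is the familiar annotated-grid heatmap with a colorbar (an assumption, since the screenshots are not reproduced here), seaborn's heatmap gives that look directly from the same temp, column and nodes arrays:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(14, 4))
# annot=True writes each cell value; fmt="d" because temp holds integers
sns.heatmap(temp, xticklabels=column, yticklabels=nodes,
            annot=True, fmt="d", cmap="viridis",
            linewidths=0.5, linecolor="white", ax=ax)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
fig.tight_layout()
plt.show()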

What does Cassandra's JMX metric TombstoneScannedHistogram RecentValues mean?

What does the RecentValues attribute of Cassandra's JMX metric TombstoneScannedHistogram mean?
$ nodetool sjk mxdump -q "org.apache.cassandra.metrics:type=Table,keyspace=my_keyspace,scope=Person,name=TombstoneScannedHistogram"
"RecentValues" : [ 641, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
For example, does the 45 in the above example, mean for that time duration, there were 45 tombstones that were scanned for a single query?
The histogram describes the number of occurrences at each level, where the levels form a series in which each value is 1.2 times greater than the previous one: 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, ... up to 25109160 (see this slide, and the rest of the deck, if you are interested in the internals).
In this case it means that there were 641 cases of 1 tombstone scanned, then 23 cases of M tombstones, and 45 cases of N tombstones (you calculate M and N from the formula above). Other histograms, for latencies etc., are calculated the same way.
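As a rough illustration of that bucket scheme (a sketch of the ~1.2x growth rule described above, not Cassandra's actual EstimatedHistogram code):
# reproduce the approximate bucket boundaries: each boundary is the previous
# one times 1.2 (rounded), and grows by at least 1
boundaries = [1]
while boundaries[-1] < 25109160:
    last = boundaries[-1]
    boundaries.append(max(last + 1, round(last * 1.2)))

print(boundaries[:12])  # [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17]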
