The code below generates a heatmap that plots each point as a block, but I want to switch its appearance to that of a more traditional heatmap. It currently looks like this: [image: current heatmap]
but I want it to look like this: [image: desired heatmap]
though, since the dataset is all 0, it would be a single color; this is for future data. The code below generates the first heatmap, and I need to rewrite it so that the output looks like the second one. I couldn't find a matching example among the matplotlib examples.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
try:
    temp = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0]]
    temp = np.array(temp)
    column = ["2-12","2-12","2-12","2-12", "2-13", "2-13","2-13","2-13","2-14","2-14","2-14","2-14", "2-15", "2-15", "2-15", "2-15", "2-16","2-16","2-16","2-16", "2-17", "2-17", "2-17", "2-17", "2-18","2-18","2-18","2-18","2-19","2-19","2-19","2-19", "2-20","2-20","2-20","2-20", "2-21", "2-21", "2-21", "2-21","2-22","2-22","2-22","2-22"]
    nodes = ["0-3", "4-7", "8-11", "22-15", "26-19", "20-23", "24-27", "28-31", "32-35", "36-39"]
    fig, ax = plt.subplots()
    im = ax.imshow(temp)
    # We want to show all ticks...
    ax.set_xticks(np.arange(len(column)))
    ax.set_yticks(np.arange(len(nodes)))
    # ... and label them with the respective list entries
    ax.set_xticklabels(column)
    ax.set_yticklabels(nodes)
    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")
    # Loop over data dimensions and create text annotations.
    for i in range(len(nodes)):
        for j in range(len(column)):
            text = ax.text(j, i, temp[i, j],
                           ha="center", va="center", color="w")
    fig.tight_layout()
    plt.show()
except ValueError:
    pass
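One possible direction, as a rough sketch only: the target image is not available here, so this assumes "traditional" means a smooth colour gradient with a colorbar and no per-cell numbers. The random placeholder data, the "hot" colormap, the bilinear interpolation, and the regenerated tick labels (regular 4-wide ranges, which differ slightly from the nodes list above) are all assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Placeholder values standing in for future (non-zero) data: 10 nodes x 44 columns.
rng = np.random.default_rng(0)
data = rng.random((10, 44))

# Illustrative labels: four columns per day, ten node ranges of width 4.
column = [f"2-{day}" for day in range(12, 23) for _ in range(4)]
nodes = [f"{start}-{start + 3}" for start in range(0, 40, 4)]

fig, ax = plt.subplots(figsize=(10, 4))
# Interpolation smooths the blocks into a gradient; the colormap is a free choice.
im = ax.imshow(data, cmap="hot", interpolation="bilinear", aspect="auto")
fig.colorbar(im, ax=ax, label="value")

# Label every fourth column so each day appears once on the x axis.
ax.set_xticks(np.arange(0, len(column), 4))
ax.set_xticklabels(column[::4], rotation=45, ha="right")
ax.set_yticks(np.arange(len(nodes)))
ax.set_yticklabels(nodes)
fig.tight_layout()
# Call fig.savefig("heatmap.png") or plt.show() to view the result.
```

With all-zero data the image would still be a single colour. Passing explicit vmin and vmax to imshow keeps the colour scale fixed once real values arrive.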
My TF version is 2.9 and Python 3.8.
I have built an image binary classification CNN model and I am trying to get a confusion matrix.
The dataset structure is as follows.
train/
│------ benign/
│------ normal/
test/
│------ benign/
│------ normal/
The dataset configuration is as follows.
train_ds = tf.keras.utils.image_dataset_from_directory(
directory = train_data_dir,
labels="inferred",
validation_split=0.2,
subset="training",
seed=1337,
color_mode='grayscale',
image_size=(img_height, img_width),
batch_size=batch_size,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
directory = train_data_dir,
labels="inferred",
validation_split=0.2,
subset="validation",
seed=1337,
color_mode='grayscale',
image_size=(img_height, img_width),
batch_size=batch_size,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
directory = test_data_dir,
color_mode='grayscale',
seed=1337,
image_size=(img_height, img_width),
batch_size=batch_size,
)
I wrote the code referring to the following link to get the confusion matrix.
Reference Page
And this is my code about the confusion matrix.
predictions = model.predict(test_ds)
y_pred = []
y_true = []
# iterate over the dataset
for image_batch, label_batch in test_ds: # use dataset.unbatch() with repeat
    # append true labels
    y_true.append(label_batch)
    # compute predictions
    preds = model.predict(image_batch)
    # append predicted labels
    y_pred.append(np.argmax(preds, axis = - 1))
# convert the true and predicted labels into tensors
true_labels = tf.concat([item for item in y_true], axis = 0)
predicted_labels = tf.concat([item for item in y_pred], axis = 0)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(true_labels, predicted_labels)
print(cm)
y_pred and y_true were obtained from test_ds as above, and the resulting confusion matrix was as follows.
[[200 0]
[200 0]]
So I printed true_labels and predicted_labels, and confirmed that the predicted_labels are all 0, as follows.
print(true_labels)
<tf.Tensor: shape=(400,), dtype=int32, numpy=
array([0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,
0, 0, 1, 1])>
print(predicted_labels)
<tf.Tensor: shape=(400,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0], dtype=int64)>
I'm not sure why predicted_labels are all zero.
But this is wrong. I think the following results are correct.
[[200 0]
[0 200]]
What is wrong? I've been struggling for a few days. Please please help me.
Thanks a lot.
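For binary image classification, a threshold should be applied to the output of model.predict to obtain the predicted label. If the model has a single output unit (which is presumably the case here), np.argmax over that last axis can only ever return 0, which would explain why every prediction was 0. Changing y_pred.append(np.argmax(preds, axis = - 1)) in the code from my question to y_pred.append(np.where(preds > threshold, 1, 0)) solved the problem. I hope this helps someone.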
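A small NumPy-only sketch of the difference, assuming the model ends in a single sigmoid unit so that preds has shape (batch, 1). The probabilities and the 0.5 threshold below are made up for illustration.

```python
import numpy as np

# Hypothetical sigmoid outputs for five images: shape (5, 1), one probability each.
preds = np.array([[0.1], [0.8], [0.4], [0.95], [0.6]])

# argmax over an axis of length 1 can only ever return index 0.
argmax_labels = np.argmax(preds, axis=-1)

# Thresholding turns each probability into a 0/1 label; ravel() flattens (5, 1) to (5,).
threshold = 0.5  # assumed cut-off
threshold_labels = np.where(preds > threshold, 1, 0).ravel()

print(argmax_labels)     # [0 0 0 0 0]
print(threshold_labels)  # [0 1 0 1 1]
```

np.argmax remains the right choice when the model has one output unit per class (for example a two-unit softmax layer).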
I am playing around with the scikit-image restoration package and successfully ran the unsupervised_wiener algorithm on some made-up data. In this simple example it does what I expect, but on my more complicated dataset it returns a striped pattern with extreme values of -1 and 1.
I would like to adjust the parameters to better understand what is going on, but I get the error stated in the question. I tried scikit-image 0.19.3 and downgraded to 0.19.2, but the error remains.
The same goes for the "other parameters": https://scikit-image.org/docs/0.19.x/api/skimage.restoration.html#skimage.restoration.unsupervised_wiener
Can someone explain why I can't pass these parameters?
The example below contains a "scan" and a "point-spread-function". I convolve the scan with the point spread function and then reverse the process using the unsupervised wiener deconvolution.
import numpy as np
import matplotlib.pyplot as plt
from skimage import color, data, restoration
import pickle
rng = np.random.default_rng()
from scipy.signal import convolve2d as conv2
scan = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
])
print(scan.shape)
psf = np.array([
[1, 1, 1, 1, 1],#1
[1, 0, 0, 0, 1],#2
[1, 0, 0, 0, 1],#3
[1, 0, 0, 0, 1],#4
[1, 1, 1, 1, 1]#5
])
psf = psf/(np.sum(psf))
print(psf)
scan_conv = conv2(scan, psf, 'same')
deconvolved1, _ = restoration.unsupervised_wiener(scan_conv, psf, max_num_iter=10)
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(8, 5),
sharex=True, sharey=True)
ax[0].imshow(scan, vmin=scan.min(), vmax=1)
ax[0].axis('off')
ax[0].set_title('Data')
ax[1].imshow(scan_conv)
ax[1].axis('off')
ax[1].set_title('Data_distorted')
ax[2].imshow(deconvolved1)
ax[2].axis('off')
ax[2].set_title('restoration1')
fig.tight_layout()
plt.show()
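A possible explanation, offered as a sketch rather than a confirmed answer: in the 0.19.x documentation linked above, the "other parameters" appear to be keys of the user_params dictionary rather than keyword arguments of unsupervised_wiener itself. The snippet below rebuilds a similar scan and PSF compactly and passes the sampler settings through user_params. The specific values (100, 30, 15) are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage import restoration

# Synthetic 17x19 image with a small bright block, similar to the example above.
scan = np.zeros((17, 19))
scan[6:11, 8:11] = 0.5
scan[7:10, 9] = 1.0

# Normalised ring-shaped point spread function.
psf = np.ones((5, 5))
psf[1:4, 1:4] = 0
psf /= psf.sum()

scan_conv = convolve2d(scan, psf, "same")

# Sampler settings go into one dict instead of separate keyword arguments.
# Key names follow the 0.19.x docs; the values are illustrative only.
sampler_params = {"max_num_iter": 100, "min_num_iter": 30, "burnin": 15}
deconvolved, chains = restoration.unsupervised_wiener(
    scan_conv, psf, user_params=sampler_params
)
print(deconvolved.shape)
```

Keeping max_num_iter comfortably above burnin seems advisable, since the posterior mean is averaged only over the iterations after the burn-in period.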
Let's say I have test.csv
filename
1 a.jpg
2 b.jpg
Then I have a test image folder
/test
test_dataset = torchvision.datasets.ImageFolder(root= path + 'test/',transform=trans)
This loads all the test files.
After training is done, I want to make a submission file. How should I link the file names in the test folder to the file names in submission.csv?
%%time
from torch.autograd import Variable
results = []
with torch.no_grad():
    model.eval()
    print('start')
    for num, data in enumerate(test_loader):
        #print(num)
        imgs, label = data
        imgs,labels = imgs.to(device), label.to(device)
        test = Variable(imgs)
        output = model(test)
        ps = torch.exp(output)
        top_p, top_class = ps.topk(1, dim = 1)
        #print(top_class)
        results += top_class.cpu().numpy().tolist()
predictions = np.array(results).flatten()
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
How should I know which result is from which file?
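One way this could be approached, sketched under assumptions: ImageFolder keeps its files as a list of (path, class_index) pairs in its samples attribute, and a DataLoader created with shuffle=False yields items in that same order, so position i in predictions would belong to samples[i]. The sketch below uses plain lists as hypothetical stand-ins for test_dataset.samples and predictions; the "unknown" subfolder and the "label" column name are made up.

```python
import csv
import io
import os

# Hypothetical stand-ins: in practice `samples` would be test_dataset.samples and
# `predictions` the flattened array from the loop above (loader built with shuffle=False).
samples = [
    ("test/unknown/a.jpg", 0),
    ("test/unknown/b.jpg", 0),
    ("test/unknown/c.jpg", 0),
]
predictions = [0, 1, 0]

# Pair each file name with the prediction at the same position.
rows = [
    (os.path.basename(path), int(pred))
    for (path, _), pred in zip(samples, predictions)
]

# Written to an in-memory buffer here; swap in open("submission.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["filename", "label"])
writer.writerows(rows)
print(buffer.getvalue())
```

If the loader shuffles, the positions no longer line up, in which case a small custom Dataset that returns the path along with each image would be a safer design.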
What do the RecentValues of Cassandra's JMX metric TombstoneScannedHistogram mean?
$ nodetool sjk mxdump -q "org.apache.cassandra.metrics:type=Table,keyspace=my_keyspace,scope=Person,name=TombstoneScannedHistogram"
"RecentValues" : [ 641, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
For example, does the 45 in the output above mean that, during that time period, 45 tombstones were scanned by a single query?
The histogram describes the number of occurrences at each level, where the levels form a series in which each value is about 1.2 times the previous one (rounded, and always at least 1 larger), so it will be 1,2,3,4,5,6,7,8,10,12,14, .... until 25109160 (see this slide and all the other slides if you are interested in the internals).
In this case, it means that there were 641 cases of 1 tombstone, then 23 cases of M tombstones, and 45 cases of N tombstones (you calculate M and N as per the formula). Other histograms (for latencies, etc.) are calculated the same way.
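The level series can be reproduced with a few lines of Python. This is a sketch of the rule as described above, not Cassandra's own code: the function names are made up, the default of 90 levels is an assumption chosen because it ends at 25109160, and the short counts list is hypothetical. It also assumes that position i in RecentValues corresponds to level i, which may not hold for every Cassandra version.

```python
def bucket_offsets(size=90):
    """Build the level boundaries: each is ~1.2x the previous, at least 1 larger."""
    offsets = [1]
    while len(offsets) < size:
        last = offsets[-1]
        nxt = (12 * last + 5) // 10  # last * 1.2 rounded to the nearest integer
        if nxt == last:              # force progress while 1.2x still rounds down
            nxt += 1
        offsets.append(nxt)
    return offsets


def nonzero_levels(recent_values, offsets):
    """Return (level, count) pairs for every non-empty histogram position."""
    return [
        (offsets[i], count)
        for i, count in enumerate(recent_values)
        if count and i < len(offsets)
    ]


offsets = bucket_offsets()
print(offsets[:11])  # [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14]
print(offsets[-1])   # 25109160

# Hypothetical short histogram: 641 hits at the first level, 7 at the fourth.
nonzero = nonzero_levels([641, 0, 0, 7], offsets)
print(nonzero)       # [(1, 641), (4, 7)]
```

Applied to the real output, M and N would be the levels at the positions where 23 and 45 appear.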
I am trying to read a .dat file using a bash script.
The values are file sizes. I want to grep the values bigger than 0, ignoring the first column. A value greater than zero can appear in any row.
I have an awk script that reads the file line by line.
1349848860, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349848920, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349848980, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849040, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
1349849160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 227.736, 2, 0, 29378, 0, 0, 0, 0, 0
1349849220, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849280, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849340, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851260, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851320, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
1349851380, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 227.736, 2, 0, 29620, 0, 0, 0, 0, 0
1349851440, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#!/bin/bash
FILENAME=$1
awk '{kount++;print kount, $0} END{print "\nTotal " kount " lines read"}' "$FILENAME"
awk '{print $13}' "$FILENAME"
Desired Output -
227.736, 2, 29378
227.736, 2, 29620
Thanks for help.
Naveen
If I understand your needs:
awk -F"," '{for (i=2;i<=NF;i++){if ($i > 1) {print}}}' file.dat
OUTPUT
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
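For comparison, here is a rough Python sketch of the same filter that prints only the matching values, once per line, in the shape of the desired output. It uses shortened copies of rows from the question as sample data, and it assumes the cut-off is "greater than 1" (as in the awk condition above), since the desired output leaves out the 1 values. The function name values_above is made up.

```python
# Shortened sample rows from the question (most zero columns dropped for brevity).
lines = [
    "1349848860, 0, 0, 0, 0, 0",
    "1349849100, 0, 1, 1, 0, 0",
    "1349849160, 1, 1, 0, 227.736, 2, 0, 29378, 0",
    "1349851380, 1, 1, 0, 227.736, 2, 0, 29620, 0",
]


def values_above(line, limit=1.0):
    """Return the fields after the first column whose numeric value exceeds limit."""
    fields = [field.strip() for field in line.split(",")]
    return [field for field in fields[1:] if float(field) > limit]


for line in lines:
    hits = values_above(line)
    if hits:  # skip rows with nothing above the limit
        print(", ".join(hits))
# prints:
# 227.736, 2, 29378
# 227.736, 2, 29620
```

To read the real file, the list could be replaced by iterating over open("file.dat") and skipping blank lines.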
From what I understand, you would like to grep all the lines that contain a number other than 0 or 1, not counting the first number at the beginning of the line. How about
egrep ',.*([2-9]|[0-9.]{4,})' tmp.txt
This will print, from your example, the lines
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
If you want to take the contents of the 13th (e.g. 14353), 16th (e.g. 450.03) and 19th (e.g. 69832) columns of each line (based on your input) in this file, then try the following:
awk < test.dat -F ', ' '{print $13,$16,$19}'
where test.dat is your data file.
This will output:
14353 450.03 69832
0 0 0
0 0 0
14353 450.03 68552
0 0 0
14353 450.03 73826