How can I declare constraints in Xpress IVE? - modeling

I am trying to write a model in Xpress IVE, but I get the following error:
error101: Incompatible types for operator ('mpvar' * 'mpvar' not defined).
I tried to write the constraint below but couldn't make it work: two consecutive characters of the string must be positioned on neighboring nodes of the grid.
I think my model is correct and all of my decision variables are correct.
Can anyone help me with this issue?
Here is my code:
grid := 16
length := 8
!sample declarations section
declarations
! Declaring S and N array for the input
S: array(1..length) of integer
N: array(1..grid,1..grid) of integer
! Declaring decision variables
X: array(1..length, 1..grid) of mpvar
V: array(1..grid) of mpvar
C: array(1..grid,1..grid) of mpvar
W: real
constraint1, constraint2,constraint3: linctr
end-declarations
! Decision Variable Declaration
forall(i in 1..length, k in 1..grid) X(i,k) is_binary
forall(k in 1..grid) V(k) is_binary
forall(k in 1..grid, l in 1..grid) C(k,l) is_binary
!Input String
S:: [ 1, 0, 0, 1, 0, 1, 1, 0 ]
! Neighbours in the grid.
N:: [ 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
! Finding consecutive 1's in the string
forall(i in 1..length-1) do
if S(i) = 1 and S(i+1) = 1
then W := W + 1
end-if
end-do
! Declaring Constraints
! Constraint 1
forall(k in 1..grid) constraint1 := sum(i in 1..length) X(i,k) <= 1
! Constraint 2
forall(i in 1..length) constraint2 := sum(k in 1..grid) X(i,k) = 1
!Constraint 3
forall( i in 1..length - 1 ) constraint3 := (sum(j in 1..grid)(sum(k in 1..grid) N(k,j) * X(i,k) * X(i + 1,j))) = 1

Since you are creating the product of two decision variables in Constraint3, your problem is no longer linear but quadratic (and thus non-linear). This means you have to use the mmnl (non-linear) Mosel module. Putting
uses "mmnl"
at the top of your model should do that; it enables multiplication of decision variables.
Note that, due to its quadratic terms, your Constraint3 will no longer be of type linctr. It will now be nlctr, so you have to adjust this in the declarations (constraint3: nlctr).

Related

How can I draw the Confusion Matrix when using image_dataset_from_directory in TensorFlow 2.x?

My TF version is 2.9 and my Python version is 3.8.
I have built a binary image classification CNN model and I am trying to get a confusion matrix.
The dataset structure is as follows.
train/
│------ benign/
│------ normal/
test/
│------ benign/
│------ normal/
The dataset configuration is as follows.
train_ds = tf.keras.utils.image_dataset_from_directory(
    directory=train_data_dir,
    labels="inferred",
    validation_split=0.2,
    subset="training",
    seed=1337,
    color_mode='grayscale',
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    directory=train_data_dir,
    labels="inferred",
    validation_split=0.2,
    subset="validation",
    seed=1337,
    color_mode='grayscale',
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    directory=test_data_dir,
    color_mode='grayscale',
    seed=1337,
    image_size=(img_height, img_width),
    batch_size=batch_size,
)
I wrote the code referring to the following link to get the confusion matrix:
Reference Page
And this is my code for the confusion matrix.
predictions = model.predict(test_ds)
y_pred = []
y_true = []
# iterate over the dataset
for image_batch, label_batch in test_ds:  # use dataset.unbatch() with repeat
    # append true labels
    y_true.append(label_batch)
    # compute predictions
    preds = model.predict(image_batch)
    # append predicted labels
    y_pred.append(np.argmax(preds, axis=-1))
# convert the true and predicted labels into tensors
true_labels = tf.concat([item for item in y_true], axis=0)
predicted_labels = tf.concat([item for item in y_pred], axis=0)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(true_labels, predicted_labels)
print(cm)
y_pred and y_true were obtained from test_ds as above, and the resulting confusion matrix was:
[[200   0]
 [200   0]]
So I printed true_labels and predicted_labels, and confirmed that the predicted labels are all 0:
print(true_labels)
<tf.Tensor: shape=(400,), dtype=int32, numpy=
array([0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,
0, 0, 1, 1])>
print(predicted_labels)
<tf.Tensor: shape=(400,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0], dtype=int64)>
I'm not sure why the predicted labels are all zero.
But this is wrong; I would expect a result like the following:
[[200   0]
 [  0 200]]
What is wrong? I've been struggling with this for a few days. Please help me.
Thanks a lot.
In the case of binary image classification, a threshold should be used to obtain the predicted label after model.predict(test_ds): with a single sigmoid output column, np.argmax(preds, axis=-1) always returns 0. I found that changing y_pred.append(np.argmax(preds, axis = - 1)) in my question to y_pred.append(np.where(preds > threshold, 1, 0)) solved the problem. Hope it is helpful to someone.
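For reference, a minimal sketch of the corrected loop, assuming the model ends in a single sigmoid unit (the 0.5 threshold is a hypothetical choice):

import numpy as np
import tensorflow as tf
from sklearn.metrics import confusion_matrix

threshold = 0.5  # hypothetical cut-off for the sigmoid output
y_pred = []
y_true = []
for image_batch, label_batch in test_ds:
    y_true.append(label_batch)
    preds = model.predict(image_batch)  # shape (batch, 1), values in [0, 1]
    # binarize the probabilities instead of taking argmax over a single column
    y_pred.append(np.where(preds > threshold, 1, 0).flatten())

true_labels = tf.concat(y_true, axis=0)
predicted_labels = np.concatenate(y_pred)
print(confusion_matrix(true_labels, predicted_labels))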

Where should I put the csv test file in PyTorch dataloader?

Let's say I have test.csv:
filename
1 a.jpg
2 b.jpg
and I have a test image folder:
/test
test_dataset = torchvision.datasets.ImageFolder(root=path + 'test/', transform=trans)
This will load all the test files.
After training is done, if I want to make a submission file, how should I link the test folder's file names to the submission.csv file names?
%%time
results = []
with torch.no_grad():
    model.eval()
    print('start')
    for num, data in enumerate(test_loader):
        imgs, label = data
        imgs, label = imgs.to(device), label.to(device)
        output = model(imgs)
        ps = torch.exp(output)
        top_p, top_class = ps.topk(1, dim=1)
        results += top_class.cpu().numpy().tolist()
predictions = np.array(results).flatten()
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
How should I know which result is from which file?
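One possible sketch (not the only approach): torchvision's ImageFolder keeps the file list in test_dataset.samples, in the same sorted order a DataLoader visits them when shuffle=False, so you can zip the file names with the flattened predictions. The column names below are assumptions based on your test.csv:

import os
import pandas as pd

# test_dataset.samples is a list of (filepath, class_index) tuples,
# in the order the DataLoader iterates when shuffle=False
filenames = [os.path.basename(path) for path, _ in test_dataset.samples]

submission = pd.DataFrame({
    'filename': filenames,   # assumed column name from test.csv
    'label': predictions,    # flattened predictions from the loop above
})
submission.to_csv('submission.csv', index=False)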

What does Cassandra's JMX metric TombstoneScannedHistogram RecentValues mean?

What does the RecentValues attribute of Cassandra's JMX metric TombstoneScannedHistogram mean?
$ nodetool sjk mxdump -q "org.apache.cassandra.metrics:type=Table,keyspace=my_keyspace,scope=Person,name=TombstoneScannedHistogram"
"RecentValues" : [ 641, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
For example, does the 45 in the example above mean that, for that time window, there were 45 tombstones scanned for a single query?
The histogram describes the number of occurrences at each level, where the levels form a series in which each value is 1.2 times the previous one (rounded), i.e. 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, ... up to 25109160 (see this slide, and the other slides if you are interested in the internals).
In this case, it means there were 641 cases of 1 tombstone scanned, then 23 cases of M tombstones, and 45 cases of N tombstones (you calculate M and N from the formula above). Other histograms, e.g. for latencies, are calculated the same way.
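If it helps, here is a small Python sketch of how the bucket boundaries can be generated (mirroring, to my understanding, Cassandra's EstimatedHistogram logic), so you can map an index in RecentValues back to a tombstone count:

def bucket_offsets(limit=25109160):
    # each boundary is the previous one times 1.2, rounded;
    # bump by 1 when rounding would stall the series
    offsets = [1]
    while offsets[-1] < limit:
        nxt = round(offsets[-1] * 1.2)
        if nxt == offsets[-1]:
            nxt += 1
        offsets.append(nxt)
    return offsets

offsets = bucket_offsets()
print(offsets[:12])  # [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17]
# RecentValues[i] is the number of occurrences whose value falls in the
# bucket bounded by offsets[i], so look up the indices of the 23 and the
# 45 in this series to get M and N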

Convert connected components to adjacency matrix

I have a 16-by-16 adjacency matrix:
Adjacency = [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
From this adjacency matrix I applied the scipy algorithm to determine the connected components, as follows:
from scipy.sparse.csgraph import connected_components
supernodes = connected_components(Adjacency)
which returns 4 components:
(4, array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 0], dtype=int32))
So the algorithm returns 4 components (4 new nodes, or supernodes: 0, 1, 2, 3), and the associated adjacency matrix has dim=(4,4).
My question is as follows:
Given the initial 16-by-16 adjacency matrix and the connected components, how can I efficiently compute the new adjacency matrix?
In other words, I need to merge all the nodes that are assigned to the same connected component.
EDIT 1:
Here is a concrete example. Given the following adjacency matrix of 6 nodes, dim=(6,6):
Adjacency_matrix=[[0,1,1,0,0,1],
[1,0,0,1,0,0],
[1,0,0,0,1,1],
[0,1,0,0,1,0],
[0,0,1,1,0,0],
[1,0,1,0,0,0]]
Given three supernodes as follows:
supernodes[0] = [0, 2]  # supernode 0 merges nodes 0 and 2
supernodes[1] = [1, 4]  # supernode 1 merges nodes 1 and 4
supernodes[2] = [3, 5]  # supernode 2 merges nodes 3 and 5
The expected output is the adjacency matrix of the 3 supernodes, dim=(3,3):
reduced_adjacency_matrix=[[0,1,1],
[1,0,1],
[1,1,0]]
What does this mean?
For instance, consider supernodes[0] = [0, 2]. The idea is as follows:
A) if nodes i and j belong to the same supernode, the merged entry stays 0 (no self-loop);
B) if nodes i and j belong to different supernodes and adjacency[i,j] = 1, set the entry between their two supernodes to 1.
Thank you for your help.
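Not an authoritative answer, but here is a minimal numpy sketch of rules A and B above, using a one-hot membership matrix built from the labels array that connected_components returns:

import numpy as np

A = np.array([[0, 1, 1, 0, 0, 1],
              [1, 0, 0, 1, 0, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0],
              [0, 0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 0]])
labels = np.array([0, 1, 0, 2, 1, 2])  # supernode of each node

k = labels.max() + 1
M = np.eye(k, dtype=int)[labels]   # M[i, c] = 1 iff node i is in supernode c
R = (M.T @ A @ M > 0).astype(int)  # rule B: any edge between members -> 1
np.fill_diagonal(R, 0)             # rule A: no self-loops inside a supernode
print(R)
# [[0 1 1]
#  [1 0 1]
#  [1 1 0]]

For a large graph, the same triple product works with scipy.sparse matrices, so it stays efficient.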

How to read a .dat file and grep specific values

I am trying to read a .dat file using a bash script.
The values contain file sizes; I want to grep the values bigger than 0, excluding the first column. A greater-than-zero value can appear in any row.
I have an awk script to read the file line by line.
1349848860, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349848920, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349848980, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849040, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
1349849160, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 227.736, 2, 0, 29378, 0, 0, 0, 0, 0
1349849220, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849280, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349849340, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851260, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1349851320, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
1349851380, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 227.736, 2, 0, 29620, 0, 0, 0, 0, 0
1349851440, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#!/bin/bash
FILENAME=$1
awk '{kount++; print kount, $0} END{print "\nTotal " kount " lines read"}' "$FILENAME"
awk '{print $13}' "$FILENAME"
Desired Output -
227.736, 2, 29378
227.736, 2, 29620
Thanks for the help.
Naveen
If I understand your needs:
awk -F"," '{for (i=2;i<=NF;i++) if ($i > 1) {print; next}}' file.dat
(the next moves on after the first match, so each line is printed only once)
OUTPUT
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
From what I understand, you would like to grep all the lines that contain a number other than 0 or 1, not counting the first number at the beginning of the line. How about
egrep ',.*([2-9]|[0-9.]{4,})' tmp.txt
This will print, from your example, the lines
1349939700, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 2, 0, 69832, 0, 0, 0, 0, 0
1349939880, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 68552, 0, 0, 0, 0, 0
1349940000, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 14353, 1, 0, 450.03, 1, 0, 73826, 0, 0, 0, 0, 0
If you want to take the contents of the 13th (i.e. 14353), 16th (i.e. 450.03) and 19th (i.e. 69832) columns of each line (based on your input), then try the following:
awk -F ', ' '{print $13, $16, $19}' test.dat
where test.dat is your data file.
This will output:
14353 450.03 69832
0 0 0
0 0 0
14353 450.03 68552
0 0 0
14353 450.03 73826
