Cutting Element in Unix Based on Column Value

Without a shell script, in a single line: what command can help you remove a row based on a column value?
For example:
Input:
118 Balboni,Steve 23
11 Baker,Doug 0
120 Armas,Tony 13
133 Allanson,Andy 5
158 Baines,Harold 13
33 Bando,Chris 1
44 Adduci,James 1
50 Aguayo,Luis 3
5 Allen,Rod 0
94 Anderson,Brady 1
If the 3rd row is not zero, how do I remove the row entirely in one statement? Is this possible in Unix?

Assuming that the question is really asking about 'if the third column is non-zero, do not print it' or (equivalently) 'only print the row if the third column is 0':
Using awk:
awk '$3 == 0' data
(If the third column is zero, print the input; otherwise, ignore it. You could add { print } after the 0 to make the action explicit.)
Using perl:
perl -nae 'print if $F[2] == 0' data
Using sed:
sed -n '/ 0$/p' data
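For comparison, the same filter can be sketched in Python. This is a minimal sketch, not part of the original answers: it assumes whitespace-separated fields (awk's default splitting), the function name `keep_zero_third_column` is hypothetical, and lines with fewer than three fields are simply dropped.

```python
def keep_zero_third_column(lines):
    """Yield only the lines whose third whitespace-separated field is numerically zero."""
    for line in lines:
        fields = line.split()  # split on runs of whitespace, like awk's default FS
        if len(fields) < 3:
            continue  # assumption: lines without a third field are dropped
        try:
            value = float(fields[2])
        except ValueError:
            continue  # a non-numeric third field is never equal to zero
        if value == 0:  # "0", "0.0" and "-0" all count as zero
            yield line


if __name__ == "__main__":
    data = [
        "118 Balboni,Steve 23",
        "11 Baker,Doug 0",
        "5 Allen,Rod 0",
        "94 Anderson,Brady 1",
    ]
    for row in keep_zero_third_column(data):
        print(row)  # prints "11 Baker,Doug 0" then "5 Allen,Rod 0"
```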

Using grep:
grep '[^0-9]0$' input

This edits the file in place:
perl -i -F -pane 'undef $_ if($F[2]!=0)' your_file
Tested:
> cat temp
118 Balboni,Steve 23
11 Baker,Doug 0
120 Armas,Tony 13
133 Allanson,Andy 5
158 Baines,Harold 13
33 Bando,Chris 1
44 Adduci,James 1
50 Aguayo,Luis 3
5 Allen,Rod 0
94 Anderson,Brady 1
>
>
> perl -i -F -pane 'undef $_ if($F[2]!=0)' temp
> cat temp
11 Baker,Doug 0
5 Allen,Rod 0
>

If you wish to print lines that have no third column as well as those in which the 3rd column is explicitly 0 (i.e., if you consider a blank field to be zero), try:
awk '!$3'
If you do not want to print lines with only 2 columns, try:
awk 'NF>2 && !$3'
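To make the difference between these two filters concrete, here is a rough Python model of awk's truth test on a field: a numeric-looking field is false when it equals 0, and any other field is false only when it is empty (which is how a missing $3 reads). The helper names are hypothetical, and using `float()` to decide what "looks numeric" is only an approximation of awk's rules.

```python
def awk_field_is_false(field):
    """Approximate awk's truth test for a field value."""
    try:
        # a numeric-looking field is false only when its value is 0 ("0", "0.0", "-0", ...)
        return float(field) == 0
    except ValueError:
        # any other string (including the empty string of a missing field) is false only when empty
        return field == ""


def select_falsy_third(lines, require_three_fields=False):
    """Model awk '!$3' and, with require_three_fields=True, awk 'NF>2 && !$3'."""
    for line in lines:
        fields = line.split()
        if require_three_fields and len(fields) <= 2:
            continue  # the NF>2 guard
        third = fields[2] if len(fields) > 2 else ""  # a missing $3 reads as empty
        if awk_field_is_false(third):
            yield line


if __name__ == "__main__":
    rows = ["11 Baker,Doug 0", "12 Smith,Ann", "13 Jones,Bob 4"]  # illustrative input
    print(list(select_falsy_third(rows)))  # ['11 Baker,Doug 0', '12 Smith,Ann']
    print(list(select_falsy_third(rows, require_three_fields=True)))  # ['11 Baker,Doug 0']
```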

Related

Adding a number to column [line by line]

I have a text file named text with the following rows and columns:
1 A 18 -180
2 B 19 -180
3 C 20 -150
50 D 21 -100
128 E 22 -130
10 F 23 -0
10 G 23 -0
What I want to do is print the 4th column after adding a constant number to each line (except where the value is 0). This is what I have done so far:
#!/bin/bash
FILE="/dir/text"
while IFS= read -r line
do
echo "$line"
done <"$FILE"
I can read the fourth column, but I also want to pass an argument $1 that adds a constant number to the fourth column of every line, except lines where the fourth column is 0.
UPDATE:
The desired output would be (lines with zeros are ignored):
-160
-160
-130
-80
-110
For example, if the program is named example.sh, I want to add a number to the fourth column using an argument, so the call would be:
example.sh $1
where $1 could be any number I want to add to the 4th column.

You should use awk here; it will be faster than bash.
awk -v number="100" '$4!=0{$4+=number} 1' Input_file
number is an awk variable; set its value as needed.
Explanation of the above code:
awk -v number="100" ' ##Starting awk program from here and creating a variable number whose value is 100.
$4!=0{ ##Checking condition if 4th column is NOT zero then do following.
$4+=number ##Adding variable number to 4th column here.
}
1 ##Mentioning 1 will print edited/non-edited lines.
' Input_file ##mentioning Input_file name here.
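A Python sketch of what this awk one-liner does, assuming the 4th field is always an integer (as in the sample data); `add_to_fourth_column` is a hypothetical name. Assigning to `$4` makes awk rebuild the whole line with single spaces (OFS), which the sketch imitates with `" ".join(...)`.

```python
def add_to_fourth_column(lines, number):
    """Mimic awk '$4!=0{$4+=number} 1': add number to field 4 unless it is 0."""
    result = []
    for line in lines:
        fields = line.split()
        # assumption: field 4 is an integer; int("-0") == 0, so "-0" lines are left alone
        if len(fields) >= 4 and int(fields[3]) != 0:
            fields[3] = str(int(fields[3]) + number)
            line = " ".join(fields)  # awk rebuilds $0 with single spaces after $4 changes
        result.append(line)  # the trailing "1" in awk prints every line, changed or not
    return result


if __name__ == "__main__":
    sample = ["1 A 18 -180", "3 C 20 -150", "10 F 23 -0"]
    for row in add_to_fourth_column(sample, 100):
        print(row)  # 1 A 18 -80, then 3 C 20 -50, then 10 F 23 -0
```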

To preserve your formatting while using awk to add values to the 4th field, you can calculate the new value of the 4th field and then use sub to replace the old value in the line. This avoids assigning to $4, which would force awk to rebuild the record from its fields and collapse the original whitespace.
For example, with your file stored as text and adding a value of 180 to the 4th field (except where it is 0), you could do:
awk -v n=180 '$4!=0 {newval=$4+n; sub(/-?[0-9]+$/,newval)}1' text
Doing so would produce the following output:
$ awk -v n=180 '$4!=0 {newval=$4+n; sub(/-?[0-9]+$/,newval)}1' text
1 A 18 0
2 B 19 0
3 C 20 30
50 D 21 80
128 E 22 50
10 F 23 -0
10 G 23 -0
If called within a shell script, you could pass your $1 parameter as:
awk -v n="$1" '$4!=0 {newval=$4+n; sub(/-?[0-9]+$/,newval)}1' text
I would also suggest checking that an argument has been provided to the script with:
[ -z "$1" ] && {
echo "error: value required as argument" >&2
exit 1
}
Alternatively, you can provide a default value; it's up to you.
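For contrast, a Python sketch of the whitespace-preserving approach: compute the new value, then substitute only the number at the end of the line. Like the awk version, it assumes the 4th field is the last one on the line and holds an integer; `add_preserving_spacing` is a hypothetical name. The optional `-?` in the pattern matters, since without it only the digits would be replaced and a leading minus sign would be left behind.

```python
import re

# the number at the very end of the line, including an optional leading minus sign
TRAILING_NUMBER = re.compile(r"-?[0-9]+$")


def add_preserving_spacing(line, n):
    """Add n to the last (4th) field unless it is 0, leaving the original spacing intact."""
    fields = line.split()
    if len(fields) < 4 or int(fields[3]) == 0:
        return line  # nothing to change; "-0" counts as zero
    new_value = int(fields[3]) + n
    # like awk's sub(), replace only the first match; the $ anchor pins it to the line's end
    return TRAILING_NUMBER.sub(str(new_value), line, count=1)


if __name__ == "__main__":
    for row in ["1  A  18   -180", "3 C 20 -150", "10 F 23 -0"]:
        print(add_preserving_spacing(row, 180))
```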

With bash:
while read -ra a; do [[ ${a[3]} != -0 ]] && ((a[3]+=42)); echo "${a[@]}"; done < file
Output:
1 A 18 -138
2 B 19 -138
3 C 20 -108
50 D 21 -58
128 E 22 -88
10 F 23 -0
10 G 23 -0

Select rows in one file based on specific values in the second file (Linux)

I have two files:
One is "total.txt". It has two columns: the first column contains natural numbers (an indicator) ranging from 1 to 20; the second column contains random numbers.
1 321
1 423
1 2342
1 7542
2 789
2 809
2 5332
2 6762
2 8976
3 42
3 545
... ...
20 432
20 758
The other one is "index.txt". It has three columns (1: indicator, 2: low value, 3: high value):
1 400 5000
2 600 800
11 300 4000
I want to output the rows of "total.txt" whose first column matches the first column of "index.txt" and whose second column is at the same time larger than (>) the second column of "index.txt" and smaller than (<) the third column of "index.txt".
The expected result is as follows:
1 423
1 2342
2 809
2 5332
2 6762
11 ...
11 ...
I have tried this:
awk '$1==(awk 'print($1)' index.txt) && $2 > (awk 'print($2)' index.txt) && $1 < (awk 'print($2)' index.txt)' total.txt > result.txt
But it failed!
Can you help me with this? Thank you!

You need to read both files in the same awk script. When you read index.txt, store the other columns in arrays.
awk 'FNR == NR { low[$1] = $2; high[$1] = $3; next }
($1 in low) && $2 > low[$1] && $2 < high[$1] { print }' index.txt total.txt
FNR == NR is the common awk idiom to detect when you're processing the first file.
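The same two-pass idea (load index.txt into a lookup table, then filter total.txt) can be sketched in Python. `select_in_range` is a hypothetical name; keys are kept as strings, just as awk array subscripts are strings, and both bounds are compared strictly, as in the question.

```python
def select_in_range(index_lines, total_lines):
    """Keep rows of total whose key appears in index and whose value lies strictly between low and high."""
    bounds = {}
    for line in index_lines:  # first pass: the FNR == NR block
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed index lines
        key, low, high = parts[:3]
        bounds[key] = (float(low), float(high))
    result = []
    for line in total_lines:  # second pass: filter the data file
        fields = line.split()
        if len(fields) < 2 or fields[0] not in bounds:
            continue  # same effect as the ($1 in low) test
        low, high = bounds[fields[0]]
        if low < float(fields[1]) < high:
            result.append(line)
    return result


if __name__ == "__main__":
    index = ["1 400 5000", "2 600 800"]
    total = ["1 321", "1 423", "1 2342", "1 7542", "2 789", "2 809", "3 42"]
    for row in select_in_range(index, total):
        print(row)  # 1 423, then 1 2342, then 2 789
```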

Use join like Barmar said:
# To join on the first columns
join -11 -21 total.txt index.txt
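And if the files aren't sorted in lexical order by the first column, then:
join -11 -21 <(sort -k1,1 total.txt) <(sort -k1,1 index.txt)
Either command prints lines of the form "key value low high"; to keep only the rows inside the range, pipe the result through awk '$2 > $3 && $2 < $4'.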

AWK--Comparing the value of two variables in two different files

I have two text files, A.txt and B.txt. Each line of A.txt holds a single value:
A.txt
100
222
398
B.txt
1 2 103 2
4 5 1026 74
7 8 209 55
10 11 122 78
What I am looking for is something like this:
for each line of A
search B;
if (the value of third column in a line of B - the value of the variable in A > 10)
print that line of B;
Is there an awk way of doing that?

How about something like this? I had some trouble understanding your question, but maybe this will give you some pointers:
#!/bin/bash
# Read interesting values (column 3) from file2 into an array
for line in $(awk '{print $3}' 2.txt)
do
arr+=($line)
done
# Linecounter,
linenr=0
# Loop through every line in file 1,
for val in $(cat 1.txt)
do
# Increment linecounter,
((linenr++))
# Loop through every element in the array (containing values from column 3 of file2)
for el in "${!arr[@]}";
do
# If that value - the value from file 1 is bigger than 10, print values
if [[ $((${arr[$el]} - $val )) -gt 10 ]]
then
sed -n "$(($el+1))p" 2.txt
# echo "Value ${arr[$el]} (on line $(($el+1)) from 2.txt) - $val (on line $linenr from 1.txt) equals $((${arr[$el]} - $val )) and is hence bigger than 10"
fi
done
done
Note: this is a quick and dirty solution and there is room for improvement, but I think it will do the job.

Use awk like this:
cat f1
1
4
9
16
cat f2
2 4 10 8
3 9 20 8
5 1 15 8
7 0 30 8
awk 'FNR==NR{a[NR]=$1;next} $3-a[FNR] < 10' f1 f2
2 4 10 8
5 1 15 8
UPDATE: Based on OP's edited question:
awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) print}' A.txt B.txt
Note how much simpler the awk-based solution is compared with nested for loops.
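A Python sketch of the updated awk loop, assuming A.txt holds one number per line and the value to compare is the 3rd field of B.txt. `lines_exceeding` and its `unique` flag are hypothetical: with `unique=False` a line of B is emitted once per matching value of A, as the awk `for (i in a)` loop does, while `unique=True` stops at the first match.

```python
def lines_exceeding(a_values, b_lines, threshold=10, unique=False):
    """Return lines of B whose 3rd field minus some value of A is greater than threshold."""
    matches = []
    for line in b_lines:
        third = float(line.split()[2])
        for a in a_values:
            if third - a > threshold:
                matches.append(line)
                if unique:
                    break  # report each line of B at most once
    return matches


if __name__ == "__main__":
    a_values = [100, 222, 398]
    b_lines = ["1 2 103 2", "4 5 1026 74", "7 8 209 55", "10 11 122 78"]
    for line in lines_exceeding(a_values, b_lines, unique=True):
        print(line)
```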

How to extract one column from multiple files, and paste those columns into one file?

I want to extract the 5th column from multiple files, named in a numerical order, and paste those columns in sequence, side by side, into one output file.
The file names look like:
sample_problem1_part1.txt
sample_problem1_part2.txt
sample_problem2_part1.txt
sample_problem2_part2.txt
sample_problem3_part1.txt
sample_problem3_part2.txt
......
Each problem file (1,2,3...) has two parts (part1, part2). Each file has the same number of lines.
The content looks like:
sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7
sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14
sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g
The output should look like this (in the sequence problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2, etc.):
1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...
I was using:
paste sample_problem1_part1.txt sample_problem1_part2.txt > \
sample_problem1_partall.txt
paste sample_problem2_part1.txt sample_problem2_part2.txt > \
sample_problem2_partall.txt
paste sample_problem3_part1.txt sample_problem3_part2.txt > \
sample_problem3_partall.txt
And then:
for i in `find . -name "sample_problem*_partall.txt"`
do
l=`echo $i | sed 's/sample/extracted_col_/'`
`awk '{print $5, $10}' $i > $l`
done
And:
paste extracted_col_problem1_partall.txt \
extracted_col_problem2_partall.txt \
extracted_col_problem3_partall.txt > \
extracted_col_problemall_partall.txt
It works fine with a few files, but it's a crazy method when the number of files is large (over 4000).
Could anyone help me with simpler solutions that are capable of dealing with multiple files, please?
Thanks!

Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Use the file's line number (FNR) as the array index and append the value of column 5.
(a[FNR] ? a[FNR] FS : "") is a ternary operation that builds up the array's value as a record. It checks whether the array already holds a non-empty value for that line number. If so, it appends the field separator (FS, a single space by default) to the existing value before adding the fifth column; otherwise it prepends nothing, so the value is just the fifth column.
At the end of the script:
Use a C-style loop to iterate through the array, printing each of the array's values.
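The same accumulate-by-line-number idea, sketched in Python. `ls -v` sorts names containing numbers in natural order; the `natural_key` helper below approximates that for names like sample_problem10_part1.txt. All function names are hypothetical. As in the question, every file is assumed to have the same number of lines; a missing 5th field becomes an empty string.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares digit runs as numbers, so problem2 sorts before problem10."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def paste_fifth_columns(named_files, sep=" "):
    """named_files: iterable of (file name, lines). Returns one output line per input line number."""
    rows = []  # rows[i] collects column 5 of line i from every file, like a[FNR] in the awk answer
    for _, lines in sorted(named_files, key=lambda item: natural_key(item[0])):
        for lineno, line in enumerate(lines):
            fields = line.split()
            value = fields[4] if len(fields) > 4 else ""
            if lineno == len(rows):
                rows.append([])
            rows[lineno].append(value)
    return [sep.join(row) for row in rows]


def paste_fifth_columns_from_disk(pattern="sample_problem*_part*.txt", directory="."):
    """Read every matching file in the directory and paste their 5th columns side by side."""
    named = [(p.name, p.read_text().splitlines()) for p in Path(directory).glob(pattern)]
    return paste_fifth_columns(named)


if __name__ == "__main__":
    for out_line in paste_fifth_columns_from_disk():
        print(out_line)
```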

For only ~4000 files, you should be able to do:
find . -name 'sample_problem*_part*.txt' | xargs paste
If find is giving names in the wrong order, pipe it to sort:
find . -name 'sample_problem*_part*.txt' | sort ... | xargs paste

# print filenames in sorted order
find -name sample\*.txt | sort |
# extract 5-th column from each file and print it on a single line
xargs -I{} sh -c '{ cut -s -d " " -f 5 "$0" | tr "\n" " "; echo; }' {} |
# transpose
python3 transpose.py '?'
where transpose.py:
#!/usr/bin/env python3
"""Write lines from stdin as columns to stdout."""
import sys
from itertools import zip_longest
missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in zip_longest(*[column.split() for column in sys.stdin],
                       fillvalue=missing_value):
    print(" ".join(row))
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
This assumes the first and second files have fewer lines than the third one (missing values are replaced by '?').

Try this one. My script assumes that every file has the same number of lines.
# get number of lines
lines=$(wc -l < sample_problem1_part1.txt)
for ((i=1; i<=$lines; i++)); do
for file in sample_problem*; do
# get line number $i and delete everything except the last column
# and then print it
# echo -n means that no newline is appended
echo -n $(sed -n ${i}'s%.*\ %%p' $file)" "
done
echo
done
This works. For 4800 files, each 7 lines long, it took 2 minutes 57.865 seconds on an AMD Athlon(tm) X2 Dual Core Processor BE-2400.
PS: The running time of my script increases linearly with the number of lines, so merging files with 1000 lines would take a very long time. You should consider learning awk and using the script from steve. I tested it: for 4800 files, each with 1000 lines, it took only 65 seconds!

You can pass awk output to paste and redirect it to a new file as follows:
paste <(awk '{print $3}' file1) <(awk '{print $3}' file2) <(awk '{print $3}' file3) > file.txt

summing the values of a column based on the id of another column using Linux tools

I have a space-delimited file with a few fields. I know how to select a specific field and sum it by itself, but I was wondering if there is a clean way of doing this using the Linux utilities; otherwise I will do it in C.
An example of what I am talking about:
FILE (there are more fields, but these are the only ones that matter for this case):
1 36
2 96
5 84422
2 2
1 655
So, for this small example I would want:
1 691
2 98
5 84422
I am not sure if it is really worth doing this with Linux utilities, but since I am trying to expand my knowledge of those tools, I figured I would ask whether it is 1) possible and 2) practical.

$ perl -ne '/ /; $x{$`}+=$'\''; END { print "$_ $x{$_}\n" foreach keys %x; }' <<__END__
> 1 36
> 2 96
> 5 84422
> 2 2
> 1 655
> __END__
1 691
2 98
5 84422

awk '{ a[$1] += $2 } END { for (i in a) { print i " " a[i] } }' input.txt | sort -n
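Both answers build a table keyed by the first column and add the second column to it. A Python sketch of the same group-by sum, assuming integer keys and values; `sum_by_key` is a hypothetical name, and sorting the keys numerically plays the role of `sort -n`.

```python
from collections import defaultdict


def sum_by_key(lines):
    """Sum column 2 for each distinct value of column 1, like awk '{ a[$1] += $2 }'."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or short lines
        totals[fields[0]] += int(fields[1])
    # sort keys numerically, like piping the awk output through sort -n
    return sorted(totals.items(), key=lambda kv: int(kv[0]))


if __name__ == "__main__":
    for key, total in sum_by_key(["1 36", "2 96", "5 84422", "2 2", "1 655"]):
        print(key, total)  # 1 691, then 2 98, then 5 84422
```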
