Bash simple search script with individual results [duplicate] - linux

This question already has answers here:
How to grep for the whole word
(7 answers)
Closed 3 years ago.
I have a script that searches through a file and displays the results.
However, there is a problem: for example, when I search for 1, the following results are returned:
1 B C
11 D E
12 B C
13 D E
When I search for 1, I want it to show only the line for 1, not also
11 D E
12 B C
13 D E
Is this possible?
echo "$#" | sed 's/[[:space:]]/.*/g' | xargs -Ifile grep -Ei 'file' text.txt

You can use:
grep -w "1" <filename>
Explanation:
-w, --word-regexp
The expression is searched for as a word
Output without -w:
grep "1" abc.txt
1
12
13
111
123
312
412
Output with -w:
grep -w "1" abc.txt
1
when the content of abc.txt is:
1
12
13
111
123
312
412
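Applied to the sample data from the question, a minimal sketch might look like the following. The helper name search_word and the temporary file are made up for illustration; only the -w option comes from the answer above. Note that grep still treats the search term as a regular expression.

```shell
#!/bin/sh
# search_word is a hypothetical helper: print the lines of a file that
# contain the search term as a whole word.
search_word() {
    # -w: the match must be bounded by non-word characters or line edges
    # --: end of options, so a term starting with "-" is not read as a flag
    grep -w -- "$1" "$2"
}

# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' '1 B C' '11 D E' '12 B C' '13 D E' > "$tmp"

search_word 1 "$tmp"    # prints: 1 B C
rm -f "$tmp"
```

Unlike grep -o, this prints the whole matching line rather than only the matched text.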

Try this in the script:
grep -o "\b$1\b" <file_name>
Example
vals=1
cat word.txt
1 B C
11 D E
12 B C
13 D E
grep -o "\b${vals}\b" word.txt
1

You can use the -w flag with the grep command.

Related

I have two huge sequence files and want to extract from file2 the same line numbers as in file1

I have two sequence files and a list of rows/lines of interest from file1. I want to extract the lines with the same line numbers from file2. The list is just one column of numbers.
I tried using awk in a loop, but all I get is an empty file as output file.
My code looks like this:
for i in <listfile>;
do awk -F lnr="$i" 'NR==lnr' <file2> > outputfile
The output file is created but is just empty.
I could not find this question asked before; if it has been, sorry for wasting your time.
If I understand the question, file1 holds a list of line numbers and you want to print those lines from file2:
awk 'FNR==NR{line[$1]=1;next}{if(line[FNR]==1)print FNR, $0}' file1 file2
Given the input...
for i in {a..z}; do echo $i; done > /tmp/list-1
for i in {z..a}; do echo $i; done > /tmp/list-2
The line number within the current file is stored in FNR (NR keeps counting across files), so you can use that.
$ awk -v a=4 -v b=9 'FNR >= a && FNR <= b { print FILENAME, NR, FNR, $0 }' /tmp/list-*
Sample output:
/tmp/list-1 4 4 d
/tmp/list-1 5 5 e
/tmp/list-1 6 6 f
/tmp/list-1 7 7 g
/tmp/list-1 8 8 h
/tmp/list-1 9 9 i
/tmp/list-2 30 4 w
/tmp/list-2 31 5 v
/tmp/list-2 32 6 u
/tmp/list-2 33 7 t
/tmp/list-2 34 8 s
/tmp/list-2 35 9 r
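The two-file FNR==NR idiom from the first answer can be tried on a small scale with the sketch below. The helper name pick_lines and the file names list and data are invented for illustration. It assumes the list file is not empty, since FNR==NR would otherwise also be true for the data file.

```shell
#!/bin/sh
# pick_lines is a hypothetical helper: print the lines of a data file whose
# line numbers are listed, one per line, in a list file.
pick_lines() {
    # First file (FNR == NR): remember each wanted line number.
    # Second file: print the line if its number was remembered.
    awk 'FNR == NR { want[$1]; next } FNR in want' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' 2 4 > "$dir/list"
printf '%s\n' a b c d e > "$dir/data"

pick_lines "$dir/list" "$dir/data"   # prints b and d, one per line
rm -rf "$dir"
```

Because the list is read once and the data file is scanned once, there is no need for a shell loop that starts awk for every line number.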

unix join command to return all columns in one file

I have two files that I am joining on one column. After the join, I just want the output to be all of the columns, in the original order, from only one of the files. For example:
cat file1.tsv
1 a ant
2 b bat
3 c cat
8 d dog
9 e eel
cat file2.tsv
1 I
2 II
3 III
4 IV
5 V
join -1 1 -2 1 file1.tsv file2.tsv -t $'\t' -o 1.1,1.2,1.3
1 a ant
2 b bat
3 c cat
I know I can use the -o 1.1,1.2,... notation, but my file has over two dozen columns. Is there some wildcard that I can use to say -o 1.* or something similar?
I'm not aware of wildcards in the format string.
From your desired output I think that what you want may be achievable like so without having to specify all the enumerations:
grep -f <(awk '{print $1}' file2.tsv ) file1.tsv
1 a ant
2 b bat
3 c cat
Or as an awk-only solution:
awk '{if(NR==FNR){a[$1]++}else{if($1 in a){print}}}' file2.tsv file1.tsv
1 a ant
2 b bat
3 c cat
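One caveat worth checking: grep -f treats every key as a pattern that may match anywhere on a line, so a key of 1 would also select a row whose key is 11. A sketch of an exact comparison on the first tab-separated column follows. The helper name semijoin, the file names, and the extra row with key 11 are invented for illustration.

```shell
#!/bin/sh
# semijoin is a hypothetical helper: print the rows of the second file whose
# first tab-separated field also occurs as a first field in the first file.
semijoin() {
    awk -F '\t' 'NR == FNR { keys[$1]; next } $1 in keys' "$1" "$2"
}

dir=$(mktemp -d)
printf '1\tI\n2\tII\n' > "$dir/keys.tsv"
printf '1\ta\tant\n2\tb\tbat\n11\tk\tkiwi\n' > "$dir/data.tsv"

semijoin "$dir/keys.tsv" "$dir/data.tsv"   # rows with keys 1 and 2 only
rm -rf "$dir"
```

Unlike join, this does not require either file to be sorted, and every column of the matching rows is printed without listing them.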

Extracting two ranges of lines from a file and putting them side by side as a data block with shell commands

I have two blocks of data in a file, say foo.txt like the following:
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
I'd like to extract rows 2:4 and 6:8 and put them as the following:
b 2 f 6
c 3 g 7
d 4 h 8
I could try using auxiliary files:
sed -n '2,4p' foo.txt > tmp1; sed -n '6,8p' foo.txt > tmp2; paste tmp1 tmp2 > output; rm tmp1 tmp2
But is there a better way to do it without auxiliary files? Thanks!
Using process substitution:
$ paste <(sed -n '2,4p' foo.txt) <(sed -n '6,8p' foo.txt) > output
$ cat output
b 2 f 6
c 3 g 7
d 4 h 8
$
In AWK:
$ awk 'NR==2,NR==4{a[++i]=$0} NR==6,NR==8{b[++j]=$0} END {for(i=1;i<=j;i++) print a[i],b[i]}' file
b 2 f 6
c 3 g 7
d 4 h 8
While the record number (NR) is within one of the given ranges, the line is stored in array a or b. In the END block, the two arrays are printed side by side.
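The same idea can be wrapped so that the ranges are not hard-coded. This is only a sketch: the helper name side_by_side and its argument order are assumptions, not part of the answers above.

```shell
#!/bin/sh
# side_by_side is a hypothetical helper: print lines s1..e1 of a file next to
# lines s2..e2 of the same file, reading the file only once.
# Usage: side_by_side FILE s1 e1 s2 e2
side_by_side() {
    awk -v s1="$2" -v e1="$3" -v s2="$4" -v e2="$5" '
        NR >= s1 && NR <= e1 { left[++n] = $0 }
        NR >= s2 && NR <= e2 { right[++m] = $0 }
        END {
            rows = (n > m) ? n : m          # cover the longer block
            for (k = 1; k <= rows; k++) print left[k], right[k]
        }
    ' "$1"
}

tmp=$(mktemp)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 5' 'f 6' 'g 7' 'h 8' 'i 9' > "$tmp"
side_by_side "$tmp" 2 4 6 8
rm -f "$tmp"
```

If the two ranges differ in length, the shorter side is simply left empty on the extra rows.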

Why does wc count one extra character in my file? [duplicate]

This question already has an answer here:
wc -m in unix adds one character
(1 answer)
Closed 6 years ago.
I am using Debian 8.4 on VirtualBox. I ran the command wc sample.txt on a file sample.txt containing:
Hello
The output to the command was
1 1 6 sample.txt
Is the extra character EOF? If it is, then how come when I ran the same command on an empty file the output was:
0 0 0 sample.txt
You have a trailing newline, and that is the extra character wc counts.
See for example if we create a file with printf:
$ printf "hello" > a
$ cat a | hexdump -c
0000000 h e l l o
0000005
$ wc a
0 1 5 a
However, if we write the file with something like echo, a trailing newline is appended:
$ echo "hello" > a
$ cat a | hexdump -c
0000000 h e l l o \n
0000006
$ wc a
1 1 6 a
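The same comparison can be made without creating files. EOF is a condition reported when reading ends, not a byte stored in the file, which is why an empty file counts as 0. In this sketch, count_bytes and count_lines are hypothetical helpers that only strip the padding some wc implementations print in front of the number.

```shell
#!/bin/sh
# Hypothetical helpers: run wc on standard input and remove any padding
# spaces so that only the number remains.
count_bytes() { wc -c | tr -d ' '; }
count_lines() { wc -l | tr -d ' '; }

printf 'hello'   | count_bytes   # 5: no trailing newline
printf 'hello\n' | count_bytes   # 6: the newline is counted as a byte
printf 'hello'   | count_lines   # 0: wc -l counts newline characters
printf 'hello\n' | count_lines   # 1
printf ''        | count_bytes   # 0: empty input, nothing to count
```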

How to extract one column from multiple files, and paste those columns into one file?

I want to extract the 5th column from multiple files, named in a numerical order, and paste those columns in sequence, side by side, into one output file.
The file names look like:
sample_problem1_part1.txt
sample_problem1_part2.txt
sample_problem2_part1.txt
sample_problem2_part2.txt
sample_problem3_part1.txt
sample_problem3_part2.txt
......
Each problem file (1,2,3...) has two parts (part1, part2). Each file has the same number of lines.
The content looks like:
sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7
sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14
sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g
The output should look like this (in the sequence problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2, etc.):
1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...
I was using:
paste sample_problem1_part1.txt sample_problem1_part2.txt > \
sample_problem1_partall.txt
paste sample_problem2_part1.txt sample_problem2_part2.txt > \
sample_problem2_partall.txt
paste sample_problem3_part1.txt sample_problem3_part2.txt > \
sample_problem3_partall.txt
And then:
for i in `find . -name "sample_problem*_partall.txt"`
do
l=`echo $i | sed 's/sample/extracted_col_/'`
`awk '{print $5, $10}' $i > $l`
done
And:
paste extracted_col_problem1_partall.txt \
extracted_col_problem2_partall.txt \
extracted_col_problem3_partall.txt > \
extracted_col_problemall_partall.txt
It works fine with a few files, but it's a crazy method when the number of files is large (over 4000).
Could anyone help me with simpler solutions that are capable of dealing with multiple files, please?
Thanks!
Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Use the file's line number (FNR) as the array index and append column 5 to the value stored there.
(a[FNR] ? a[FNR] FS : "") is a ternary operation that builds up the array's value as a record. It asks whether a value has already been stored for this line number. If so, it takes that value followed by the default field separator (FS) before appending the fifth column. Otherwise nothing is prepended and the value is just the fifth column.
At the end of the script:
Use a C-style loop to iterate through the array, printing each of the array's values.
For only ~4000 files, you should be able to do:
find . -name 'sample_problem*_part*.txt' | xargs paste
If find is giving names in the wrong order, pipe it to sort:
find . -name 'sample_problem*_part*.txt' | sort ... | xargs paste
# print filenames in sorted order
find -name sample\*.txt | sort |
# extract 5-th column from each file and print it on a single line
xargs -n1 -I{} sh -c '{ cut -s -d " " -f 5 $0 | tr "\n" " "; echo; }' {} |
# transpose
python transpose.py ?
where transpose.py:
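#!/usr/bin/env python
"""Write lines from stdin as columns to stdout."""
import sys
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

# value used to pad columns that are shorter than the longest one
missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in zip_longest(*[column.split() for column in sys.stdin],
                       fillvalue=missing_value):
    print(" ".join(row))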
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
This assumes the first and second files have fewer lines than the third one (missing values are replaced by '?').
Try this one. My script assumes that every file has the same number of lines.
# get number of lines
lines=$(wc -l sample_problem1_part1.txt | cut -d' ' -f1)
for ((i=1; i<=$lines; i++)); do
    for file in sample_problem*; do
        # get line number $i and delete everything except the last column
        # and then print it
        # echo -n means that no newline is appended
        echo -n $(sed -n ${i}'s%.*\ %%p' $file)" "
    done
    echo
done
This works. For 4800 files, each 7 lines long, it took 2 minutes 57.865 seconds on an AMD Athlon(tm) X2 Dual Core Processor BE-2400.
PS: The running time of my script increases linearly with the number of lines. It would take a very long time to merge files with 1000 lines. You should consider learning awk and using the script from steve. I tested it: for 4800 files, each with 1000 lines, it took only 65 seconds!
You can pass awk output to paste and redirect it to a new file as follows:
paste <(awk '{print $3}' file1) <(awk '{print $3}' file2) <(awk '{print $3}' file3) > file.txt
