Unable to redirect data using awk or cat - Linux

I am using AIX and running the following code:
#!/bin/sh
cat ip.txt | awk -F ' ' '{print $2,$1}' >op.txt
or
awk -F ' ' '{print $2,$1}' ip.txt > op2.txt
It is generating an unknown file named "oxb1du".
Also, I can see the file op2.txt in the ls -ltr output, but it does not contain any data.
Input file:
name 1
info 21
city 28
pin 31
state 34

Maybe you are looking for:
cat ip.txt | awk '{print $2,$1}' > op.txt
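With the input file shown in the question, that simply swaps the two columns, so op.txt should contain:
1 name
21 info
28 city
31 pin
34 state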

You probably have binary characters in your file. Try cleaning it first.
tr -cd '[:graph:]\n\t ' <"$file" >$TEMP_FILE && mv $TEMP_FILE "$file"
dos2unix and other programs may work, but I've had issues with dos2unix only removing carriage returns and not other garbage, so I've given you the above (obviously assign or replace the variables). Then just use:
awk -F" " '{print $2,$1}' ip.txt > op2.txt
I only changed the quotes for readability: having them hanging away from the -F and before other single quotes looks wonky. This way is quicker to read.
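If you want to confirm that the file really does contain stray bytes before cleaning it, a quick look at the raw contents helps; this is just a sanity check on the question's ip.txt, not part of the fix:
od -c ip.txt | head
cat -v ip.txt
od -c prints every byte, including non-printing ones, and cat -v marks control characters visibly.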

Related

Split a column on the space into two columns

I have a large file (~9 GB) with each row in this format:
12345,6789,Jim Bob
My desired output is this:
12345,6789,Jim,Bob
How would I do this using awk? It seems to be the fastest way to process this, and I am new when it comes to using the terminal for such things. Thanks!
Using awk and regex to replace the first space with a comma:
$ awk '{sub(/ /,",")}1' file
12345,6789,Jim,Bob
or using awk and regex to replace space with a comma in the third field ($3):
$ awk 'BEGIN{FS=OFS=","}{sub(/ /,",",$3)}1' file
12345,6789,Jim,Bob
Another solution using awk:
awk '$1=$1' OFS=, file
The assignment $1=$1 forces awk to rebuild the record, joining the fields with OFS, and you get:
12345,6789,Jim,Bob
I have a feeling sed would be a lot faster for your requirement, given the huge size of the input file:
sed -E 's/ ([^ ]+)$/,\1/' file > file.modified
or, for in-place editing:
sed -i.bak -E 's/ ([^ ]+)$/,\1/' file
Benchmarking with a 36 MB file, dummy.txt:
$ time awk 'BEGIN{FS=OFS=","}{sub(/ /,",",$3)}1' dummy.txt >/dev/null
real 0m3.357s
user 0m3.337s
sys 0m0.016s
$ time awk '{sub(/ /,",")}1' dummy.txt >/dev/null
real 0m3.182s
user 0m3.166s
sys 0m0.014s
$ time awk '$1=$1' OFS=, dummy.txt >/dev/null
real 0m3.150s
user 0m3.130s
sys 0m0.018s
$ time sed -E 's/ ([^ ]+)$/,\1/' dummy.txt >/dev/null
real 0m1.646s
user 0m1.633s
sys 0m0.013s
sed is 2x faster than awk! For a 9G file, this difference could be even more dramatic.
Well, you can also use tr if that suits you:
tr -s ' ' ',' < file.txt > tr.txt
where file.txt is your input file and tr.txt is the output file.
Well, if you want to use awk only, you could choose space as the field separator and have awk print a ',' between the two columns:
awk -F' ' '{print $1","$2}' file.txt
Benchmarking done with a 283 MB file:
Using tr
time tr -s ' ' ',' < file.txt >tr.txt
real 0m10.976s
user 0m1.042s
sys 0m0.966s
Using awk
time awk -F' ' '{print $1","$2}' file.txt > /dev/null
real 0m14.141s
user 0m13.909s
sys 0m0.199s
Using @codeforester's method
time sed -E 's/ ([^ ]+)$/,\1/' file.txt >/dev/null
real 0m42.183s
user 0m41.659s
sys 0m0.435s
tr works even faster than sed and awk

Print the name of the file in front of every line of the file

I have a lot of text files and I want to make a bash script on Linux that prints the name of the file at the start of every line of that file. For example, I have a file lenovo.txt and I want every line in the file to start with lenovo.txt.
I tried to write a for loop for this, but it didn't work.
for i in *.txt
do
awk '{print '$i' $0}' /var/SambaShare/$i > /var/SambaShare/new_$i
done
Thanks!
It doesn't work because you need to pass $i to awk with the -v option. But you can also use the FILENAME built-in variable in awk:
ls *txt
file.txt file2.txt
cat *txt
A
B
C
A2
B2
C2
for i in *txt; do
awk '{print FILENAME,$0}' $i;
done
file.txt A
file.txt B
file.txt C
file2.txt A2
file2.txt B2
file2.txt C2
And to redirect into a new file:
for i in *txt; do
awk '{print FILENAME,$0}' $i > ${i%.txt}_new.txt;
done
As for your corrected version:
for i in *.txt
do
awk -v i=$i '{print i,$0}' $i > new_$i
done
Hope this helps.
Using grep you can make use of the --with-filename (alias -H) option and use an empty pattern that always matches:
for i in *.txt
do
grep -H "" $i > new_$i
done
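With the file.txt from the earlier answer, new_file.txt ends up containing the following (note that grep separates the file name with a colon rather than a space):
file.txt:A
file.txt:B
file.txt:C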
Awk and Bash don't share the same variables as they are different languages with separate interpreters. You should pass Bash variables to Awk with the -v option.
You should also quote your file name variables to ensure they don't get expanded as separate arguments if they contain whitespace.
for i in *.txt
do
awk -v i="$i" '{print i,$0}' "$i" > "new_$i"
done

Bash script: Read text after characters

I'd like to read the text after characters in a file.
For example:
MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||
I'd like to read the version of the program (in the third field):
2013-08-30-i486
How I can do this in my bash script?
This is pretty easily done with cut:
echo 'MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||' | cut -d '|' -f 3
2013-08-30-i486
which will split on | and choose the 3rd field.
Using BASH regex:
s='MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||'
[[ "$s" =~ MPlayer-([^|]+) ]] && echo "${BASH_REMATCH[1]}"
2013-08-30-i486
Using awk:
awk -F 'MPlayer-|\\|' '{print $2}' <<< "$s"
2013-08-30-i486
To grab 3rd field using awk:
awk -F '\\|' '{print $3}' <<< "$s"
2013-08-30-i486
This is simple to do in AWK:
$ awk -F'|' '{print $3}' file
2013-08-30-i486
It seems that the same data is repeated in several places, so I assume that they are all OK to use... In the above line, the input is being split into fields on the | character and the third field is being printed. The same thing will happen for every line of input.
Through grep,
$ grep -oP 'MPlayer-\K[^|.]*(?=\|)' file
2013-08-30-i486
Through sed,
$ echo 'MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||' | sed -r 's/^[^|]+\|[^|]+\|([^|]+).*$/\1/'
2013-08-30-i486
Using read (all shells):
IFS='|' read __ __ VERSION __ < file
echo "$VERSION"
Another using read -a and Bash arrays:
IFS='|' read -a FIELDS < file
echo "${FIELDS[2]}"
Output:
2013-08-30-i486
The read built-in will be most efficient for a single line:
IFS="|" read __ __ version __ <<< "$line"
although if you are processing a file full of such lines with
while IFS="|" read __ __ version __; do
# do something with $version
done < file
it might be more efficient to use cut:
while read version; do
# do something with $version
done < <(cut -d'|' -f3 file)
or awk:
awk -F'|' '{
  # do something with $3
}' file

How to run grep inside awk?

Suppose I have a file input.txt with a few columns and a few rows, where the first column is the key, and a directory dir with files that contain some of these keys. I want to find all lines in the files in dir which contain these keywords. At first I tried to run the command
cat input.txt | awk '{print $1}' | xargs grep dir
This doesn't work because it thinks the keys are paths on my file system. Next I tried something like
cat input.txt | awk '{system("grep -rn dir $1")}'
But this didn't work either; eventually I had to admit that even this doesn't work:
cat input.txt | awk '{system("echo $1")}'
After I tried to use \ to escape the whitespace and the $ sign, I came here to ask for your advice. Any ideas?
Of course I can do something like
for x in `cat input.txt` ; do grep -rn $x dir ; done
This is not good enough, because it takes two commands, but I want only one. This also shows why xargs doesn't work: the parameter is not the last argument.
You don't need grep with awk, and you don't need cat to open files:
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
Nor do you need xargs, or shell loops or anything else - just one simple awk command does it all.
If input.txt is not a file, then tweak the above to:
real_input_generating_command |
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' - dir/*
All it's doing is creating an array of keys from the first file (or input stream) and then looking for each key from that array in every file in the dir directory.
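For readability, here is the same awk program split across lines with comments; the logic is identical to the one-liner above:
awk '
    # First pass: NR==FNR is only true while reading input.txt,
    # so remember each key from column 1 and skip to the next line.
    NR==FNR { keys[$1]; next }
    # Second pass (every file under dir/): if the line matches any
    # stored key, print the file name and the line, then move on.
    {
        for (key in keys)
            if ($0 ~ key) { print FILENAME, $0; next }
    }
' input.txt dir/*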
Try the following:
awk '{print $1}' input.txt | xargs -n 1 -I pattern grep -rn pattern dir
First thing you should do is research this.
Next ... you don't need to grep inside awk. That's completely redundant. It's like ... stuffing your turkey with ... a turkey.
Awk can process input and do "grep" like things itself, without the need to launch the grep command. But you don't even need to do this. Adapting your first example:
awk '{print $1}' input.txt | xargs -n 1 -I % grep % dir
This uses xargs' -I option to put xargs' input into a different place on the command line it runs. In FreeBSD or OSX, you would use a -J option instead.
But I prefer your for loop idea, converted into a while loop:
while read key junk; do grep -rn "$key" dir ; done < input.txt
Use process substitution to create a keyword "file" that you can pass to grep via the -f option:
grep -f <(awk '{print $1}' input.txt) dir/*
This will search each file in dir for lines containing keywords printed by the awk command. It's equivalent to
awk '{print $1}' input.txt > tmp.txt
grep -f tmp.txt dir/*
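One caveat: grep -f treats each key as a regular expression. If the keys are plain strings, adding -F (and -w for whole-word matching) avoids surprises with regex metacharacters; a possible variant:
grep -Fwf <(awk '{print $1}' input.txt) dir/*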
grep requires parameters in order: [what to search] [where to search]. You need to merge keys received from awk and pass them to grep using the \| regexp operator.
For example:
arturcz@szczaw:/tmp/s$ cat words.txt
foo
bar
fubar
foobaz
arturcz@szczaw:/tmp/s$ grep 'foo\|baz' words.txt
foo
foobaz
Finally, you will finish with:
grep `commands|to|prepare|a|keywords|list` directory
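For example, one concrete way to build that keyword list from the question's input.txt, assuming the keys contain no regex metacharacters, is to join the first column with | and use extended regular expressions:
grep -rnE "$(awk '{print $1}' input.txt | paste -s -d'|' -)" dir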
In case you still want to use grep inside awk, make sure $1, $2, etc. are outside the quotes.
e.g. this works perfectly:
cat file_having_query | awk '{system("grep " $1 " file_to_be_greped")}'
(notice the space after "grep " and before the file name)

Grep - returning both the line number and the name of the file

I have a number of log files in a directory. I am trying to write a script to search all the log files for a string and echo the name of the files and the line number that the string is found.
I figure I will probably have to use two greps, piping the output of one into the other, since the -l option only returns the name of the file and nothing about the line numbers. Any insight into how I can achieve this would be much appreciated.
Many thanks,
Alex
$ grep -Hn root /etc/passwd
/etc/passwd:1:root:x:0:0:root:/root:/bin/bash
Combining -H and -n does what you expect.
If you want to echo the required information without the matching line itself:
$ grep -Hn root /etc/passwd | cut -d: -f1,2
/etc/passwd:1
or with awk :
$ awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd
file=/etc/passwd
line=1
If you want to create shell variables, evaluate the awk output in the current shell (piping it into bash would only set the variables in a subshell):
$ eval "$(awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd)"
$ echo $line
1
$ echo $file
/etc/passwd
Use -H. If you are using a grep that does not have -H, specify two filenames. For example:
grep -n pattern file /dev/null
My version of grep kept returning text from the matching line, which I wasn't sure you were after... You can also pipe the output to an awk command to have it print ONLY the file name and line number:
grep -rHn "text" . | awk -F: '{print $1 ":" $2}'
