Bash script: Read text after characters - linux

I'd like to read the text after characters in a file.
For example:
MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||
I'd like to read the version of the program (the third field):
2013-08-30-i486
How can I do this in my bash script?

This is pretty easily done with cut:
echo 'MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||' | cut -d '|' -f 3
2013-08-30-i486
which will split on | and choose the 3rd field.
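Since the question asks about a bash script, here is a minimal sketch that captures the field in a variable (the file name packages.txt and the variable name version are illustrative assumptions):
#!/bin/bash
# Print the 3rd |-separated field (the version) of every record in packages.txt
while IFS= read -r line; do
    version=$(cut -d '|' -f 3 <<< "$line")
    echo "$version"
done < packages.txt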

Using BASH regex:
s='MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||'
[[ "$s" =~ MPlayer-([^|]+) ]] && echo "${BASH_REMATCH[1]}"
2013-08-30-i486
Using awk:
awk -F 'MPlayer-|\\|' '{print $2}' <<< "$s"
2013-08-30-i486
To grab the 3rd field using awk:
awk -F '\\|' '{print $3}' <<< "$s"
2013-08-30-i486

This is simple to do in AWK:
$ awk -F'|' '{print $3}' file
2013-08-30-i486
It seems that the same data is repeated in several places, so I assume any of them is OK to use. In the above line, the input is split into fields on the | character and the third field is printed. The same thing happens for every line of input.
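For example, with two records (made-up data), each line of input yields its own third field:
$ printf 'a|b|c||\nx|y|z||\n' | awk -F'|' '{print $3}'
c
z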

Through grep,
$ grep -oP 'MPlayer-\K[^|.]*(?=\|)' file
2013-08-30-i486
Here \K discards the MPlayer- prefix from the reported match, the lookahead (?=\|) requires the match to end at a |, and excluding . in the character class keeps the second occurrence (MPlayer-2013-08-30-i486.pet) from matching as well.
Through sed,
$ echo 'MPlayer-2013-08-30-i486|MPlayer|2013-08-30-i486||Multimedia;video|4508K||MPlayer-2013-08-30-i486.pet|+ffmpeg|mplayer video player|slackware|14.0||' | sed -r 's/^[^|]+\|[^|]+\|([^|]+).*$/\1/'
2013-08-30-i486

Using read (all shells):
IFS='|' read -r __ __ VERSION __ < file
echo "$VERSION"
Another, using read -a and Bash arrays:
IFS='|' read -r -a FIELDS < file
echo "${FIELDS[2]}"
Output:
2013-08-30-i486

The read built-in will be most efficient for a single line:
IFS="|" read -r __ __ version __ <<< "$line"
although if you are processing a file full of such lines with
while IFS="|" read -r __ __ version __; do
    # do something with $version
done < file
it might be more efficient to use cut:
while read -r version; do
    # do something with $version
done < <(cut -d'|' -f3 file)
or awk:
awk -F'|' '{ print $3 }' file    # replace print with whatever you need to do with $3

Related

How do you change column names to lowercase with linux and store the file as it is?

I am trying to change the column names to lowercase in a csv file. I found code online to do that, but I don't know how to replace the old column names (uppercase) with the new column names (lowercase) in the original file. I did something like this:
$ head -n1 xxx.csv | tr "[A-Z]" "[a-z]"
But it simply prints out the column names in lowercase, which is not enough for me.
I tried to add sed -i but it did not do any good. Thanks!!
Using awk (readability winner):
concise way:
awk 'NR==1{print tolower($0);next}1' file.csv
or using ternary operator:
awk '{print (NR==1) ? tolower($0): $0}' file.csv
or using if/else statements:
awk '{if (NR==1) {print tolower($0)} else {print $0}}' file.csv
To change the file for real:
awk 'NR==1{print tolower($0);next}1' file.csv | tee /tmp/temp
mv /tmp/temp file.csv
For your information, sed's in-place edit switch -i does the same thing: it uses a temporary file under the hood.
You can check this by using:
strace -f -s 800 sed -i'' '...' file
Using perl:
perl -i -pe '$_=lc() if $.==1' file.csv
It replaces the file in place via the -i switch.
You can use sed to tell it to replace the first line with all lower-case and then print the rest as-is:
sed '1s/.*/\L&/' ./xxx.csv
Redirect the output or use -i to do an in-place edit.
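For example, to lower-case the header of xxx.csv in place (GNU sed, combining the command above with -i):
sed -i '1s/.*/\L&/' ./xxx.csv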
Proof of Concept
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | sed '1s/.*/\L&/'
col1,col2,col3
FoO,bAr,baZ

unable to redirect data by using awk or cat

I am using AIX with the following code:
#!/bin/sh
cat ip.txt | awk -F ' ' '{print $2,$1}' >op.txt
or
awk -F ' ' '{print $2,$1}' ip.txt > op2.txt
It is generating an unknown file named "oxb1du".
Also, I can see the file op2.txt in the ls -ltr output, but it does not contain any data.
Input file:
name 1
info 21
city 28
pin 31
state 34
Maybe you are looking for:
cat ip.txt | awk '{print $2,$1}' > op.txt
You probably have binary characters in your file. Try cleaning it first.
tr -cd '[:graph:]\n\t ' <"$file" >"$TEMP_FILE" && mv "$TEMP_FILE" "$file"
dos2unix and other programs may work, but I've had issues with dos2unix only removing carriage returns and not other garbage, so I've given you the above (obviously assign or replace the variables). Then just use:
awk -F" " '{print $2,$1}' ip.txt > op2.txt
I only changed the quotes for readability: having them hanging away from the -F, before the other single quotes, looks wonky. This way is quicker to read.

Bash tries to execute commands in heredoc

I am trying to write a simple bash script that will print a multiline output to another file. I am doing it through heredoc format:
#!/bin/sh
echo "Hello!"
cat <<EOF > ~/Desktop/what.txt
a=`echo $1 | awk -F. '{print $NF}'`
b=`echo $2 | tr '[:upper:]' '[:lower:]'`
EOF
I was expecting to see a file on my desktop with these contents:
a=`echo $1 | awk -F. '{print $NF}'`
b=`echo $2 | tr '[:upper:]' '[:lower:]'`
But instead, I am seeing these as the contents of my what.txt file:
a=
b=
Somehow, even though it is part of a heredoc, bash is trying to execute it line by line. How do I prevent this, and print the contents to the file as it is?
Quote EOF so that bash takes the input literally:
cat <<'EOF' > what.txt
a=`echo $1 | awk -F. '{print $NF}'`
b=`echo $2 | tr '[:upper:]' '[:lower:]'`
EOF
Also, start using $() for command substitution instead of the old and problematic backticks (``).
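For example, the two assignments from the script rewritten with $() (a sketch; $1 and $2 are the script's positional parameters, as in the question):
a=$(echo "$1" | awk -F. '{print $NF}')
b=$(echo "$2" | tr '[:upper:]' '[:lower:]')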

How to run grep inside awk?

Suppose I have a file input.txt with a few columns and a few rows, where the first column is the key, and a directory dir with files that contain some of these keys. I want to find all lines in the files in dir which contain these key words. At first I tried to run the command
cat input.txt | awk '{print $1}' | xargs grep dir
This doesn't work because it thinks the keys are paths on my file system. Next I tried something like
cat input.txt | awk '{system("grep -rn dir $1")}'
But this didn't work either. Eventually I had to admit that even this doesn't work:
cat input.txt | awk '{system("echo $1")}'
After I tried to use \ to escape the white space and the $ sign, I came here to ask for your advice. Any ideas?
Of course I can do something like
for x in `cat input.txt` ; do grep -rn $x dir ; done
This is not good enough because it takes two commands, but I want only one. This also shows why xargs doesn't work: the parameter is not the last argument.
You don't need grep with awk, and you don't need cat to open files:
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
Nor do you need xargs, or shell loops or anything else - just one simple awk command does it all.
If input.txt is not a file, then tweak the above to:
real_input_generating_command |
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' - dir/*
All it's doing is creating an array of keys from the first file (or input stream) and then looking for each key from that array in every file in the dir directory.
Try the following:
awk '{print $1}' input.txt | xargs -n 1 -I pattern grep -rn pattern dir
First thing you should do is research this.
Next... you don't need to grep inside awk. That's completely redundant. It's like stuffing your turkey with... a turkey.
Awk can process input and do "grep" like things itself, without the need to launch the grep command. But you don't even need to do this. Adapting your first example:
awk '{print $1}' input.txt | xargs -n 1 -I % grep % dir
This uses xargs' -I option to put xargs' input into a different place on the command line it runs. In FreeBSD or OSX, you would use a -J option instead.
But I prefer your for loop idea, converted into a while loop:
while read -r key junk; do grep -rn "$key" dir; done < input.txt
Use process substitution to create a keyword "file" that you can pass to grep via the -f option:
grep -f <(awk '{print $1}' input.txt) dir/*
This will search each file in dir for lines containing keywords printed by the awk command. It's equivalent to
awk '{print $1}' input.txt > tmp.txt
grep -f tmp.txt dir/*
grep requires parameters in order: [what to search] [where to search]. You need to merge the keys received from awk and pass them to grep using the \| regexp operator.
For example:
arturcz@szczaw:/tmp/s$ cat words.txt
foo
bar
fubar
foobaz
arturcz@szczaw:/tmp/s$ grep 'foo\|baz' words.txt
foo
foobaz
Finally, you will finish with:
grep `commands|to|prepare|a|keywords|list` directory
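A minimal sketch of that pipeline, assuming input.txt holds one key per line in its first column: awk prints the keys, paste joins them with |, and grep -E treats the unescaped | as alternation.
grep -rE "$(awk '{print $1}' input.txt | paste -sd'|' -)" dir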
In case you still want to use grep inside awk, make sure $1, $2, etc. are outside the quotes.
e.g. this works perfectly:
cat file_having_query | awk '{system("grep " $1 " file_to_be_greped")}'
# notice the space after grep and before the file name

Escaping backslash in AWK

I'm trying to understand why the command below doesn't work (output is empty):
echo 'aaa\tbbb' | awk -F '\\t' '{print $2}'
I would expect the output to be 'bbb'.
Interestingly this works (output is 'bbb'):
echo 'aaa\tbbb' | awk -F 't' '{print $2}'
And this works as well (output is 'tbbb'):
echo 'aaa\tbbb' | awk -F '\\' '{print $2}'
It looks as if '\\t' is read as a backslash followed by a tab instead of an escaped backslash followed by a t.
Is there a proper way to write this command?
You need to tell echo to interpret backslash escapes. Try:
$ echo -e 'aaa\tbbb' | awk -F '\t' '{print $2}'
bbb
man echo says:
-e enable interpretation of backslash escapes
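A more portable alternative is printf, which interprets backslash escapes such as \t without needing any flag:
$ printf 'aaa\tbbb\n' | awk -F '\t' '{print $2}'
bbb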
