How to combine files and search the result? - linux

I have two sample files as follows:
File1
item name="NY" block="A" idnum="12345"
item name="NJ" block="B" idnum="123456"
item name="CT" block="C" idnum="1234567"
File2
ID_B|ID_C|NY|4|8198|10|2374|127
ID_C|ID_D|NJ|4|8198|10|2374|127
ID_D|ID_E|CT|4|8198|10|2374|127
I would like to pass one or more IDs as arguments and generate output that looks like this.
If I am looking for info for ID_B, then the output should be
ID_B|ID_C|NY|4|8198|10|2374|127 => "NY" block="A" idnum="12345"
If I am looking for the two IDs ID_C and ID_D together, it should be
ID_C|ID_D|NJ|4|8198|10|2374|127 => "NJ" block="B" idnum="123456"
ID_D|ID_E|CT|4|8198|10|2374|127 => "CT" block="C" idnum="1234567"

With bash, join, sort and awk.
script.sh:
#!/bin/bash
file1="File1"
file2="File2"
grep -f <(printf '^%s|\n' "$@") <(
  join -t '|' -1 1 -2 3 \
    <(awk -F'"' '{OFS="|"; print $2,$4,$6}' "$file1" | sort -t '|' -k 1,1) \
    <(sort -t '|' -k 3,3 "$file2") \
    -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.3,1.2,1.3 |
  awk -F'|' '{ printf "%s|%s|%s|%s|%s|%s|%s|%s => \"%s\" block=\"%s\" idnum=\"%s\"\n",
               $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11 }'
)
Example: script.sh ID_C ID_D
Output:
ID_D|ID_E|CT|4|8198|10|2374|127 => "CT" block="C" idnum="1234567"
ID_C|ID_D|NJ|4|8198|10|2374|127 => "NJ" block="B" idnum="123456"

Something like
echo "Please enter the key"
read key
grep "item name=\"$(grep "^${key}|" file2 | cut -d"|" -f3)\"" file1
Edit:
I did not reproduce the literal output; I just wanted to show that you could grep for the fields. A complete solution could look like:
#!/bin/bash
if [ $# -eq 0 ]; then
echo "Usage $0 id [id2 ..]"
exit 1
fi
for key in "$@"; do
f2=$(grep "^${key}|" file2)
f1=$(grep "item name=\"$(echo "${f2}" | cut -d"|" -f3)\"" file1)
echo "${f2} => ${f1#item name=}"
done

awk, wrapped in a thin shell script
#!/bin/sh
awk -F '[|"]' -v OFS='"' -v keys="$*" '
BEGIN {
n = split(keys, a, " ")
for (i=1; i <= n; i++) key[a[i]] = 1
}
NR == FNR {
if ($1 in key) line[$3] = $0
next
}
$2 in line {
$1 = line[$2] " => "
print
}
' file2 file1
Then
$ sh script.sh ID_B ID_D
ID_B|ID_C|NY|4|8198|10|2374|127 => "NY" block="A" idnum="12345"
ID_D|ID_E|CT|4|8198|10|2374|127 => "CT" block="C" idnum="1234567"

What you want is very similar to a JOIN in SQL.
Implementing this in bash or sed requires dictionaries (string-indexed arrays), which they lack; awk does have associative arrays, but even with them the result is more complex than a one-line SQL query.
In your place I would use perl or python for this problem: in those languages the goal is simple enough that, even with the extra learning effort, the result would be about as effective as a bash workaround.
But you want a bash/awk/sed solution, so that is what I give you here.
Don't use these tools alone; use them together. The simplest thing you can do from bash is to call the external command grep.
For example, an integrated awk/bash core which partially does your task (I have not tested it; note the pipe-delimited records are in File2 and the item lines in File1):
awk 'BEGIN {FS="[_|]"} { print $2 " " $0 }' File2 | while read l
do
set $l
echo $2 $(grep 'block="'$1'"' File1)
done
But to understand, debug and extend this code, you need solid experience with regexps, the exact details of escaping, and the fastest ways to integrate the two languages. In my opinion that is a must-have before you start learning perl/python, but in everyday work the higher-level solutions are better.

You can also write a short bash script that will retrieve the information in a brute-force manner:
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && {
printf "error: insufficient input. Usage: %s file1 file2 query\n" "${0##*/}"
exit 1
}
file1="$1"
file2="$2"
query="$3"
## line from file1
f1=$(grep "$query" "$file1")
[ -z "$f1" ] && {
printf "error: '%s' not found in '%s'.\n" "$query" "$file1"
exit 1
}
## line from file2
f2=$(grep "$(awk -F '|' '{ print $3 }' <<<"$f1")" "$file2")
[ -z "$f2" ] && {
printf "error: no related row found in '%s'.\n" "$file2"
exit 1
}
printf "%s => %s\n" "$f1" "$f2"
Examples
$ bash joinfiles.sh join1.txt join2.txt ID_B
ID_B|ID_C|NY|4|8198|10|2374|127 => item name="NY" block="A" idnum="12345"
$ bash joinfiles.sh join1.txt join2.txt ID_E
ID_D|ID_E|CT|4|8198|10|2374|127 => item name="CT" block="C" idnum="1234567"
$ bash joinfiles.sh join1.txt join2.txt ID_F
error: 'ID_F' not found in 'join1.txt'
Processing Multiple Input Tags
With small adjustments, you can easily process as many tags as you desire. Below, the code will match all of the tags provided as arguments 3 and beyond. If you want to provide more than a half-dozen tags at a time, then you should adjust the script to read tags from stdin instead of taking them as arguments. Also note, the code matches the tag against the first field of the first file. If you want to match tags in all fields of file one, then f1 will become an array and the remaining part of the code will operate on each element of f1. Let me know what you think. Here is the example for multiple input tags as arguments:
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && {
printf "error: insufficient input. Usage: %s file1 file2 query [query ...]\n" "${0##*/}"
exit 1
}
[ -f "$1" -a -f "$2" ] || {
printf "error: file not found. '%s' or '%s'\n" "$1" "$2"
exit 1
}
file1="$1"
file2="$2"
for i in "${@:3}"; do
query="$i"
## line from file1
f1=$(grep "^$query" "$file1")
[ -z "$f1" ] && {
printf "error: '%s' not found in '%s'.\n" "$query" "$file1"
exit 1
}
## line from file2
f2=$(grep "$(awk -F '|' '{ print $3 }' <<<"$f1")" "$file2")
[ -z "$f2" ] && {
printf "error: no related row found in '%s'.\n" "$file2"
exit 1
}
printf "%s => %s\n" "$f1" "$f2"
done
Use/Output
$ bash joinfiles2.sh join1.txt join2.txt ID_B ID_D ID_F
ID_B|ID_C|NY|4|8198|10|2374|127 => item name="NY" block="A" idnum="12345"
ID_D|ID_E|CT|4|8198|10|2374|127 => item name="CT" block="C" idnum="1234567"
error: 'ID_F' not found in 'join1.txt'
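As mentioned above, with more than a handful of tags you would read them from stdin instead of passing them as arguments. A sketch of that variant (the function name is mine, and this is only exercised against the sample data; file layout as in joinfiles.sh, with join1.txt holding the pipe-delimited rows):

```shell
#!/bin/bash
# Hypothetical stdin variant: tags arrive one per line on standard input
# instead of as arguments 3 and beyond.
lookup_from_stdin() {
    local file1=$1 file2=$2 query f1 state f2
    while IFS= read -r query; do
        [ -n "$query" ] || continue
        if ! f1=$(grep "^${query}|" "$file1"); then
            printf "error: '%s' not found in '%s'.\n" "$query" "$file1"
            continue
        fi
        # the third pipe-delimited field names the state quoted in file2
        state=$(awk -F'|' '{ print $3; exit }' <<<"$f1")
        f2=$(grep "\"${state}\"" "$file2")
        printf "%s => %s\n" "$f1" "$f2"
    done
}
```

Use it as, e.g., `printf 'ID_B\nID_D\n' | lookup_from_stdin join1.txt join2.txt`.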

Related

Difficulty to create .txt file from loop in bash

I've this data :
cat >data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF
cat >data2.txt <<'EOF'
2020-02-27-06-00;/dev/hd1;120;/
2020-02-27-12-00;/dev/hd1;120;/
2020-02-27-18-00;/dev/hd1;120;/
2020-02-27-06-00;/dev/hd2;230;/usr
2020-02-27-12-00;/dev/hd2;230;/usr
2020-02-27-18-00;/dev/hd2;230;/usr
EOF
cat >data3.txt <<'EOF'
2020-03-27-06-00;/dev/hd1;130;/
2020-03-27-12-00;/dev/hd1;130;/
2020-03-27-18-00;/dev/hd1;130;/
2020-03-27-06-00;/dev/hd2;240;/usr
2020-03-27-12-00;/dev/hd2;240;/usr
2020-03-27-18-00;/dev/hd2;240;/usr
EOF
I would like to create a .txt file for each filesystem (so hd1.txt, hd2.txt, hd3.txt and hd4.txt) and put in each .txt file the sum of the values for that FS from each dataX.txt. I have some difficulty explaining in English what I want, so here is an example of the desired result.
Expected content for the output file hd1.txt:
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
Expected content for the file hd2.txt:
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
The implementation I've currently tried:
for i in $(cat *.txt | awk -F';' '{print $2}' | cut -d '/' -f3| uniq)
do
cat *.txt | grep -w $i | awk -F';' -v date="$(cat *.txt | awk -F';' '{print $1}' | cut -d'-' -f-2 | uniq )" '{sum+=$3} END {print date";"$2";"sum}' >> $i
done
But it doesn't work...
Can you show me how to do that ?
Because the format appears to be constant, you can delimit the input with multiple separators and parse it easily in awk:
awk -v FS='[-;/]' '
prev != $8 {
if (length(output)) {
print output >> fileoutput
}
prev = $8
sum = 0
}
{
sum += $9
output = sprintf("%s-%s;/%s/%s;%d;/%s", $1, $2, $7, $8, sum, $11)
fileoutput = $8 ".txt"
}
END {
print output >> fileoutput
}
' *.txt
Tested in a repl, this generates:
+ cat hd1.txt
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
+ cat hd2.txt
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
Alternatively, you could use -v FS=';' and split() to break the first and second columns apart, extracting the year and month and the hdX number.
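That split-based alternative might look like the following sketch (sample data inlined from the question so it runs standalone; note that for (k in sum) iterates in unspecified order, so with several months per device the lines inside each output file may need sorting):

```shell
# Recreate a one-month sample in a scratch directory so the sketch is runnable.
cd "$(mktemp -d)" || exit 1
cat > data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF

awk -F';' '
{
    split($1, d, "-")            # d[1]=year, d[2]=month
    split($2, p, "/")            # p[3] = hd1, hd2, ...
    key = d[1] "-" d[2] ";" $2   # group per month and device
    sum[key] += $3
    tail[key] = $4
    out[key]  = p[3] ".txt"
}
END {
    for (k in sum)
        print k ";" sum[k] ";" tail[k] >> out[k]
}
' data*.txt

cat hd1.txt    # 2020-01;/dev/hd1;300;/
```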
If you seek a bash solution, I suggest you invert the loops: first iterate over files, then over the identifiers in the second column.
for file in *.txt; do
prev=
output=
while IFS=';' read -r date dev num path; do
hd=$(basename "$dev")
if [[ "$hd" != "${prev:-}" ]]; then
if ((${#output})); then
printf "%s\n" "$output" >> "$fileoutput"
fi
sum=0
prev="$hd"
fi
sum=$((sum + num))
output=$(
printf "%s;%s;%d;%s" \
"$(cut -d'-' -f1-2 <<<"$date")" \
"$dev" "$sum" "$path"
)
fileoutput="${hd}.txt"
done < "$file"
printf "%s\n" "$output" >> "$fileoutput"
done
You could also translate the awk almost 1:1 to bash by using IFS='-;/' in the while read loop.
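A rough 1:1 translation along those lines might look like this sketch (sample data inlined so it runs standalone; the two underscore variables absorb the empty fields that the adjacent `;/` delimiters create):

```shell
# Inline a small sample so the sketch is runnable on its own.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
    '2020-01-27-06-00;/dev/hd1;100;/' \
    '2020-01-27-12-00;/dev/hd1;100;/' \
    '2020-01-27-18-00;/dev/hd1;100;/' \
    '2020-01-27-06-00;/dev/hd2;200;/usr' > data1.txt

for file in data*.txt; do
    prev= output= fileoutput=
    sum=0
    # IFS='-;/' splits date parts, device path and value in one read
    while IFS='-;/' read -r year month day hh mm _ dev hd num _ path; do
        if [[ $hd != "$prev" ]]; then
            [[ -n $output ]] && printf '%s\n' "$output" >> "$fileoutput"
            sum=0
            prev=$hd
        fi
        sum=$((sum + num))
        output=$(printf '%s-%s;/%s/%s;%d;/%s' \
                        "$year" "$month" "$dev" "$hd" "$sum" "$path")
        fileoutput=${hd}.txt
    done < "$file"
    [[ -n $output ]] && printf '%s\n' "$output" >> "$fileoutput"
done
```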

Printing awk output in same line after grep

I have a very crude script getinfo.sh that gets me information from all files named FILENAME1 and FILENAME2 in all subfolders, along with the path of each subfolder. The awk part should only pick the nth line from FILENAME2 when the script is called as "getinfo.sh n". I want all the info printed on one line!
The problem is that if I use print instead of printf, the info is written to a new line, but the script works. If I use printf, I can see the last bit of the awk output in the command prompt after the script is done, but it is not pasted after the grep output on the same line. All in all the complete line will be pretty long, but that is intentional. Would you be willing to tell me what I am doing wrong?
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
printf '%q' "${PWD##*/}"
grep 'Search_term ' FILENAME1 | tail -1
awk '{ if(NR==n) printf "%s",$0 }' n=$1 $2 FILENAME2
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
I would also be happy to grep the nth line if this is easier?
SOLUTION:
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
{
printf '%q' "${PWD##*/}"
grep 'Search_term' FILENAME1 | tail -1
} | tr -d '\n'
if [ "$1" -eq "$1" ] 2>/dev/null
then
awk '{ if(NR==n) printf "%s",$0 }' n="$1" FILENAME2
fi
printf "\n"
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
You made it clearer in the comments.
I want the output of printf '%q' "${PWD##*/}" and grep 'Search_term ' FILENAME1 | tail -1 and awk '{ if(NR==n) printf "%s",$0 }' n=$1 $2 FILENAME2 to be printed in one line
So first, we have three commands that each print a single line of output. As the commands themselves do not matter, let's wrap them in functions to simplify the answer:
cmd1() { printf '%q\n' "${PWD##*/}"; }
cmd2() { grep .... ; }
cmd3() { awk ....; }
To print them without newlines between them, we can:
Use command substitution, which removes trailing newlines. With printf:
printf "%s%s%s\n" "$(cmd1)" "$(cmd2)" "$(cmd3)"
or some echo:
echo "$(cmd1) $(cmd2) $(cmd3)"
or append to a variable:
str="$(cmd1)"
str+=" $(cmd2)"
str+=" $(cmd3)"
printf "%s\n" "$str"
and so on.
We can remove newlines from the stream, using tr -d '\n':
{
cmd1
cmd2
cmd3
} | tr -d '\n'
echo # newlines were removed, so add one to the end.
or we can also remove the newlines only from the first n-1 commands, but I think this is less readable:
{
cmd1
cmd2
} | tr -d '\n'
cmd3 # the trailing newline will be added by cmd3
If I do not pass a number, the awk command should be omitted.
I see that your awk command references both $1 and $2, but only $1 is passed into awk, as the n=$1 assignment. I don't know what $2 is. You can branch on the value of $#, the number of arguments:
if (($# == 2)); then
awk '{ if(NR==n) printf "%s",$0 }' n="$1" "$2" FILENAME2
fi
and similar for each case you want to handle. Remember about proper quoting.
Your command shows the unused parameter $2; I deleted it.
You can add a newline at the end of the awk using the END block, but you also want an extra newline when you call your script without a line number. echo will do.
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd "$(dirname "${fname}")" > /dev/null
# Add result of grep in same printf statement
printf '%s %s' "${PWD##*/}" "$(grep 'Search_term ' FILENAME1 | tail -1)"
if (( $# == 1 )); then
# use $1 as the awk variable n
awk -v n="$1" '{ if(NR==n) printf "%s ",$0 }' FILENAME2
fi
# Add line-ending
echo
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')

get user input in awk script and update it in file

I have a students.txt file (RollNo, Name, IDU, CGPA). If the roll number exists, prompt the user to change the IDU and CGPA, and update them in the file "students.txt".
I made the following script:
#! /bin/bash
display(){
awk -F ":" -v roll="$1" '{ if ( $1 == roll) {name = $2; print name; } }
END {if (name == "") print "not found" }' students.txt
}
echo "Enter the roll no."
read rno
if [ "$rno" -ge 1000 ] && [ "$rno" -le 9999 ]
then
display "$rno"
# Now I have a valid $rno and want to update that line
else
echo Enter no between 1000 and 9999
fi
Now I need help taking user input for the IDU and CGPA values and updating students.txt with those values in the record found.
In general, "-" is used for standard input in awk, e.g.
awk '{print($1)}' -
It's not clear to me exactly what you want here. Can't you use additional 'read' statements in the bash part of the script for input of the other 2 values?
First, grep for the roll number:
grep "^$rno" students.txt
If found, then use awk to replace the record:
awk -F : -v OFS=: -v rno="$rno" -v idu="$idu" -v cgpa="$cgpa" '$1==rno { $3=idu; $4=cgpa } 1' students.txt > tmp.txt && mv tmp.txt students.txt
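Putting the pieces together, here is a hedged end-to-end sketch of the read-then-rewrite flow (the update_student function name is mine; it assumes colon-delimited RollNo:Name:IDU:CGPA records, as the script above does):

```shell
#!/bin/bash
# Sketch: rewrite the matching record with new IDU and CGPA values.
# Assumes students.txt fields are RollNo:Name:IDU:CGPA.
update_student() {
    local file=$1 rno=$2 idu=$3 cgpa=$4
    awk -F: -v OFS=: -v rno="$rno" -v idu="$idu" -v cgpa="$cgpa" '
        $1 == rno { $3 = idu; $4 = cgpa }   # update the matching record
        { print }                           # pass every line through
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Interactive use would look like (not run here):
#   read -rp "Enter new IDU: " idu
#   read -rp "Enter new CGPA: " cgpa
#   update_student students.txt "$rno" "$idu" "$cgpa"
```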

Move a file if all 6th fields are 0

I want to be able to check a file to see if all records are 0 within a file and if they are to then move the file.
I have written the script, ran it, no errors, but it does not move the file, can anyone please suggest why?
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ result = "1" ]; then
mv conc_upld_atp.11002.20141204151900.dat home/stephenb/scripttest
fi
In Bash, = compares strings; to compare integers you need -eq:
if [ "$result" -eq 1 ]; then
Note that it is preferred to say var=$(command). Also, your command cat file | awk '...' can be simplified to just awk '...' file. And depending on what exactly you want to do, probably awk can handle all of it.
For example, if you just want to check whether any of the 6th fields is not 0, use:
awk -F, '$6 != 0 {v=1} END {print v+0}' file
and then the rest of your code.
However, you can do it in an extremely fast way by using what was suggested in the comments:
awk -F, '$6!=0{exit 1}' file && mv file newfile
This loops through the file and exits with a non-zero code if any line contains a 6th field different from 0. If that never happens, awk's exit code is 0, so the && command runs and mv file newfile happens. You can even report the other condition by saying:
awk -F, '$6!=0{exit 1}' file && mv file newfile || echo "bad data"
You want to check whether all records are 0, but you specifically check result = "1".
I would recommend two things: use numerical comparison and compare against the correct value:
if (( result == 0 )); then
If you want to check whether all fields are 0, then try this script:
result=$(cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq)
if [ "$result" == "0" ]; then
mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi
In the end I used the script below. Note that quoting "$result" in the echo is essential; unquoted, the newlines collapse and wc -l always reports 1.
#!/bin/bash
result=$(cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq)
resultcount=$(echo "$result" | wc -l)
echo "$resultcount"
if [ "$resultcount" == "1" ]; then
echo match
mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi

Retrieve string between characters and assign on new variable using awk in bash

I'm new to bash scripting and learning how commands work, and I stumbled on this problem.
I have a file /home/fedora/file.txt
Inside of the file is like this:
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
What I wanted is to retrieve words between "[" and "]".
What I tried so far is:
while IFS='' read -r line || [[ -n "$line" ]];
do
echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}'
done < /home/fedora/file.txt
I can print the words between "[" and "]".
Then I wanted to put the echoed word into a variable, but I don't know how.
Any help I will appreciate.
Try this:
variable="$(echo "$line" | awk -F"[" '{print $2}' | awk -F"]" '{print $1}')"
or
variable="$(awk -F'[][]' '{print $2}' <<< "$line")"
or complete
while IFS='[]' read -r foo fruit rest; do echo $fruit; done < file
or with an array:
while IFS='[]' read -ra var; do echo "${var[1]}"; done < file
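One further option, not shown above, is bash's built-in regex matching with [[ =~ ]] and BASH_REMATCH, which avoids spawning awk for each line (a sketch, assuming the same bracketed-label format):

```shell
#!/bin/bash
# Sketch: extract the bracketed label with bash's own regex engine.
line='[apple] This is a fruit.'
if [[ $line =~ ^\[([^]]*)\][[:space:]]*(.*)$ ]]; then
    label=${BASH_REMATCH[1]}   # text between [ and ]
    value=${BASH_REMATCH[2]}   # remainder of the line
fi
echo "$label"    # apple
```

Inside a while read loop this gives you both the label and the description per line, with no external processes.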
In addition to using awk, you can use the native parameter expansion/substring extraction provided by bash. Below # indicates a trim from the left, while % is used to trim from the right. (note: a single # or % indicates removal up to the first occurrence, while ## or %% indicates removal of all occurrences):
#!/bin/bash
[ -r "$1" ] || { ## validate input is readable
printf "error: insufficient input. usage: %s filename\n" "${0##*/}"
exit 1
}
## read each line and separate label and value
while read -r line || [ -n "$line" ]; do
label=${line#[} # trim initial [ from left
label=${label%%]*} # trim through ] from right
value=${line##*] } # trim from left through '] '
printf " %-8s -> '%s'\n" "$label" "$value"
done <"$1"
exit 0
Input
$ cat dat/labels.txt
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
Output
$ bash readlabel.sh dat/labels.txt
apple -> 'This is a fruit.'
ball -> 'This is a sport's equipment.'
cat -> 'This is an animal.'
