How to search a file for the full string passed as an argument to a shell script?

I am passing an argument to a script, and I have to match that argument against a file and extract the corresponding information. How can I do this?
Example:
I have the details below in a file:
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
If I pass 'hv' as the argument, it should pick 'iMedical_HV_Refined_Load' and give the result '988836'.
If I pass 'dhs', it should pick 'iMedical_DHS_Refined_Load' and give the result '988834'.
I tried the logic below, but it does not give the correct result. What changes do I need to make?
echo $1 | tr a-z A-Z
g=${1^^}
echo $g
echo $1
val=$(awk -F= -v s="$g" '$g ~ s{print $2}' /medaff/Scripts/Aggrify/sltconfig.cfg)
echo "TASK ID is $val"

Assuming your matching criterion is the first string after the delimiter _ and the output needed is the number after the = character, you can try this sed (the I flag for case-insensitive matching is a GNU sed extension):
$ sed -n "/_$1/I{s/[^=]*=\(.*\)/\1/p}" input_file
$ read -r input
hv
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988836
$ read -r input
dhs
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988834

If I'm reading it right, 2 quick versions -
$: cat 1
awk -F= -v s="_${1^^}_" '$1~s{print $2}' file
$: cat 2
sed -En "/_${1^^}_/{s/^.*=//;p;}" file
Both basically the same logic.
In pure bash -
$: cat 3
while IFS='=' read key val; do [[ "$key" =~ "_${1^^}_" ]] && echo "$val"; done < file
That's a lot less efficient, though.
If you know for sure there will be only one hit, all these could be improved a bit by short-circuit exits, but on such a small sample it won't matter at all. If you have a larger dataset to read, then I strongly suggest you formalize your specs better than "in this set I should get...".
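The short-circuit exit mentioned above could look like the following sketch. The function name lookup_task_id and the temporary file are made up for illustration; it assumes bash 4 or later for ${1^^} and that only the first matching key is wanted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first key containing _<ARG>_
# (argument upper-cased), then stop reading the file.
lookup_task_id() {
    local needle="_${1^^}_" file=$2
    # index() is a plain substring test, so the argument is not used as a regex.
    awk -F= -v s="$needle" 'index($1, s) { print $2; exit }' "$file"
}

# Demo with a throwaway copy of the sample data from the question.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
EOF
lookup_task_id hv "$cfg"    # prints 988836
lookup_task_id dhs "$cfg"   # prints 988834
rm -f "$cfg"
```

The exit makes awk stop at the first hit instead of scanning the rest of the file, which only matters for large inputs.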

Related

Print second last line from variable in bash

VAR="1\n2\n3"
I'm trying to print out the second last line. One liner in bash!
I've gotten so far: printf -- "$VAR" | head -2
It however prints out too much.
I can do this with a file no problem: tail -2 ~/file | head -1
You have almost done this task yourself. Try:
VAR="1\n2\n3"; printf -- "$VAR"|tail -2|head -1
Here is one pure bash way of doing this:
readarray -t arr < <(printf -- "$VAR") && echo "${arr[-2]}"
2
You may also use this awk as a single command:
VAR="1\n2\n3"
awk -F '\\\\n' '{print $(NF-1)}' <<< "$VAR"
2
It may be more efficient to use a temporary variable and parameter expansions:
var=$'1\n2\n3' ; tmpvar=${var%$'\n'*} ; echo "${tmpvar##*$'\n'}"
Use echo -e to interpret the backslash escapes, so that \n becomes a newline, and select the line you want by its number with NR (here line 2, which is the second-last of three lines).
$ echo -e "${VAR}" | awk 'NR==2'
2
With more lines, tail and head can be combined to print any particular line.
$ echo -e "$VAR" | tail -2 | head -1
2
Or use a fancier sed, where each line is swapped into the hold space (x) and deleted until the last line is reached; the final swap then brings back the previous line, which gets printed:
$ echo -e "$VAR" | sed 'x;$!d'
2
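For input of any length, another option is to remember the previous line while reading. This is only a sketch; the function name second_last_line is hypothetical, and the variable is assumed to hold real newlines (as with $'...') rather than literal \n sequences.

```shell
# Hypothetical helper: print the second-to-last line of standard input.
second_last_line() {
    # `cur` is the line just read and `prev` the one before it, so at END
    # `prev` holds the line before the last one.
    awk '{ prev = cur; cur = $0 } END { if (NR > 1) print prev }'
}

var=$'1\n2\n3'
printf '%s\n' "$var" | second_last_line   # prints 2
```

If the input has fewer than two lines, nothing is printed.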

How to use uniq after printf

I have a lot of files, and I need to concatenate those that share the same prefix. I have an idea, but I do not know how to solve this problem:
files:
NAME1_C001_xxx.tsv
NAME1_C001_yyy.tsv
NAME2_C001_xxx.tsv
NAME2_C001_yyy.tsv
I want to print just the unique prefixes, NAME1 and NAME2. The lengths of the prefix and suffix vary, but the prefix is always followed by _C001.
My solution is:
for i in *.tsv
do prefix=$(printf "%s\n" "${i%_C001*}")
cat "${prefix}_C001_xxx.tsv" "${prefix}_C001_yyy.tsv" > "${i%_C001*}.merged.tsv"
done;
But this solution is not very good, because I process each prefix twice.
Thank you for any help.
EDITED:
One solution thanks to anubhava:
for i in $(printf "%s\n" *.tsv | awk -F '_C001' '!seen[$1]++{print $1}')
do
cat "${i}_C001_xxx.tsv" "${i}_C001_yyy.tsv" > "${i}.merged.tsv"
done;
You don't need printf at all here; it's just an unnecessary wrapper around the parameter substitution you are already using.
for i in *.tsv
do prefix=${i%_C001*}
[[ -f $prefix.merged.tsv ]] && continue # Avoid doing the same prefix twice
cat "${prefix}"_* > "$prefix.merged.tsv"
done
As your filenames don't contain any newlines, you can pipe your list to an awk command that prints the unique prefixes, using _C001 as the field separator:
printf "%s\n" *.tsv | awk -F '_C001' '!seen[$1]++{print $1}'
NAME1
NAME2
You can also use _ as FS in awk:
printf "%s\n" *.tsv | awk -F _ '!seen[$1]++{print $1}'
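The two ideas above (deriving the prefix with parameter expansion and skipping prefixes already seen) can be combined in one loop. The following is a rough sketch, not a drop-in solution: the function name merge_by_prefix is invented here, and it assumes bash 4 or later for associative arrays and file names of the form PREFIX_C001_something.tsv.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each distinct prefix before "_C001", concatenate
# all matching files in directory $1 into <prefix>.merged.tsv.
merge_by_prefix() {
    local dir=$1 f prefix
    local -A seen=()                      # prefixes already merged (bash 4+)
    for f in "$dir"/*_C001_*.tsv; do
        [[ -e $f ]] || continue           # the glob matched nothing
        prefix=${f%_C001*}                # strip "_C001" and everything after it
        [[ -n ${seen[$prefix]:-} ]] && continue
        seen[$prefix]=1
        cat "$prefix"_C001_*.tsv > "$prefix.merged.tsv"
    done
}
```

Calling merge_by_prefix . would process the current directory. Because the loop only globs names containing _C001_, the generated .merged.tsv files are not picked up again on a second run.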

Extract substring after a character

I'm trying to extract the substring after the last period (dot).
Examples are below.
echo "filename..txt" should return "txt"
echo "filename.txt." should return ""
echo "filename" should return ""
echo "filename.xml" should return "xml"
I tried the command below, but it works only if the dot occurs exactly once. My filename may contain zero or more dots.
echo "filename.txt" | cut -d "." -f2
Let's use awk!
awk -F"." '{print (NF>1)? $NF : ""}' file
This sets the field separator to . and prints the last field. If the line contains no dot, it prints an empty string instead.
Test
$ cat file
filename..txt
filename.txt.
filename
filename.xml
$ awk -F"." '{print (NF>1)? $NF : ""}' file
txt
xml
One can make this portable (so it's not Linux-only), avoiding an ERE dependency, with the following:
$ sed -ne 's/.*\.//p' <<< "file..txt"
txt
$ sed -ne 's/.*\.//p' <<< "file.txt."
$ sed -ne 's/.*\.//p' <<< "file"
$ sed -ne 's/.*\.//p' <<< "file.xml"
xml
Note that for testing purposes, I'm using a "here-string" in bash. If your shell is not bash, use whatever your shell uses to feed data to sed.
The important bit here is the use of sed's -n option, which tells it not to print anything by default, combined with the substitute command's explicit p flag, which tells sed to print only upon a successful substitution, which obviously requires a dot to be included in the pattern.
With this solution, the difference between "file.txt." and "file" is that the former returns the input line replaced with null (so you may still get a newline depending on your usage), whereas the latter returns nothing, as sed is not instructed to print, as no . is included in the input. The end result may well be the same, of course:
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file.txt.")
##
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file")
##
Simple to do with awk:
awk -F"." '{ print $NF }'
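What this does: with dot as the delimiter, extract the last field from the input. Note that for input without any dot, the last field is the whole line, so "filename" is printed unchanged.
Use sed in 2 steps: first blank out a string without a dot, and then remove everything up to the last dot: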
sed -e 's/^[^.]*$//' -e 's/.*\.//'
Test:
for s in file.txt.. file.txt. file.txt filename file.xml; do
echo "$s -> $(echo "$s" | sed -e 's/^[^.]*$//' -e 's/.*\.//')"
done
Test result:
file.txt.. ->
file.txt. ->
file.txt -> txt
filename ->
file.xml -> xml
Actually the answer of @ghoti is roughly the same, just a bit shorter (better).
This solution can be used by other readers who want to do something like this in another language.
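When the name is already in a shell variable, no external tool is needed at all. A minimal sketch, using a made-up function name ext_of, relies on the ${var##pattern} expansion and on a case pattern to handle the no-dot input:

```shell
# Hypothetical helper: print the text after the last dot, or an empty line
# when the name has no dot at all.
ext_of() {
    case $1 in
        *.*) printf '%s\n' "${1##*.}" ;;   # strip everything up to the last dot
        *)   printf '\n' ;;                # no dot: empty result
    esac
}

for name in filename..txt filename.txt. filename filename.xml; do
    printf '%s -> "%s"\n' "$name" "$(ext_of "$name")"
done
```

The case test is what separates "filename" from the other inputs; without it, ${1##*.} would return the whole name when there is no dot.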

concatenate the result of echo and a command output

I have the following code:
names=$(ls *$1*.txt)
head -q -n 1 $names | cut -d "_" -f 2
where the first line finds and stores all names matching the command line input into a variable called names, and the second grabs the first line in each file (element of the variable names) and outputs the second part of the line based on the "_" delim.
This is all good, however I would like to prepend the filename (stored as lines in the variable names) to the output of cut. I have tried:
names=$(ls *$1*.txt)
head -q -n 1 $names | echo -n "$names" cut -d "_" -f 2
however this only prints out the filenames
I have tried
names=$(ls *$1*.txt)
head -q -n 1 $names | echo -n "$names"; cut -d "_" -f 2
and again I only print out the filenames.
The desired output is:
$
filename1.txt <second character>
where there is a single whitespace between the filename and the result of cut.
Thank you.
Best approach, using awk
You can do this all in one invocation of awk:
awk -F_ 'NR==1{print FILENAME, $2; exit}' *"$1"*.txt
On the first line of the first file, this prints the filename and the value of the second column, then exits.
Pure bash solution
I would always recommend against parsing ls; use a loop over a glob instead.
You can also avoid awk for reading the first line of the file by using bash built-in functionality:
for i in *"$1"*.txt; do
IFS=_ read -ra arr <"$i"
echo "$i ${arr[1]}"
break
done
Here we read the first line of the file into an array, splitting it into pieces on the _.
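If every matching file should be reported, rather than only the first, the same loop can run without the break. This is a sketch under that assumption; the function name first_line_field is hypothetical, and it works on the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every file matching *<pattern>*.txt in the current
# directory, print the file name and the second "_"-separated field of its
# first line.
first_line_field() {
    local pattern=$1 f
    local -a arr
    for f in *"$pattern"*.txt; do
        [[ -f $f ]] || continue            # the glob matched nothing
        IFS=_ read -ra arr < "$f"          # split the first line on "_"
        echo "$f ${arr[1]:-}"
    done
}
```

Called as first_line_field "$1" inside a script, it prints one "filename field" pair per file, separated by a single space.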
Maybe something like that will satisfy your need BUT THIS IS BAD CODING (see comments):
#!/bin/bash
names=$(ls *$1*.txt)
for f in $names
do
pattern=`head -q -n 1 $f | cut -d "_" -f 2`
echo "$f $pattern"
done
If I didn't misunderstand your goal, this also works.
I've always done it this way, but I just found out that it is a deprecated way to do it.
#!/bin/bash
names=$(ls *"$1"*.txt)
for e in $names;
do echo $e `echo "$e" | cut -c2-2`;
done

grep -o: Keep input line format

$ echo "abca\ndeaf" | grep -o a
a
a
a
I am looking for the output:
aa
a
Or perhaps
a a
a
or even
a<TAB>a
a
(this is a very very simplified example)
I just want it not to throw away the line grouping.
You can do it with sed by removing any character that isn't a:
echo "abca\ndeaf" | sed 's/[^a]//g'
aa
a
It can't be done with grep alone.
@sudo_O's answer shows how to do this with single-character strings. The difficulty level is raised if you want to match longer strings.
One way to do it is by parsing the output of grep -n -o, like so:
$ cat mgrep
#!/bin/bash
# Print each match along with its line number.
grep -no "$@" | {
matches=() # An array of matches to be printed when the line number changes.
lastLine= # Keep track of the current and previous line numbers.
# Read the matches, with `:' as the separator.
while IFS=: read line match; do
# If this is the same line number as the previous match, add this one to
# the list.
if [[ $line = $lastLine ]]; then
matches+=("$match")
# Otherwise, print out the list of matches we've accumulated and start
# over.
else
(( ${#matches[@]} )) && echo "${matches[@]}"
matches=("$match")
fi
lastLine=$line
done
# Print any remaining matches.
(( ${#matches[@]} )) && echo "${matches[@]}"
}
Example usage:
$ echo $'abca\ndeaf' | ./mgrep a
a a
a
$ echo $'foo bar foo\nbaz\ni like food' | ./mgrep foo
foo foo
foo
Based off John Kugelman's solution, this one works with one input file and gawk
grep -on abc file.txt | awk -v RS='[[:digit:]]+:' 'NF{$1=$1; print}'
If you're willing to use perl:
$ echo $'abca\ndeaf' | perl -ne '@m = /a/g; print "@m\n"'
a a
a
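A similar per-line grouping can be sketched with awk alone, using match() in a loop. The function name matches_per_line is invented for this example. As written, lines without a match produce no output, and a pattern anchored with ^ would not behave like grep -o, because each search restarts on the remainder of the line.

```shell
# Hypothetical helper: print the matches of regex $1 found on each input line,
# space-separated, one output line per input line that has a match.
matches_per_line() {
    awk -v re="$1" '{
        out = ""; rest = $0
        # Repeatedly take the leftmost match and continue after it;
        # stop on an empty match to avoid looping forever.
        while (match(rest, re) && RLENGTH > 0) {
            out = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (out != "") print out
    }'
}

printf 'foo bar foo\nbaz\ni like food\n' | matches_per_line foo
```

Note that awk processes backslash escapes in values passed with -v, so a pattern containing backslashes may need them doubled.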
