Shell script to extract 2 values from all files? - linux

I have a directory full of files like this:
[Location]
state=California
city=Palo Alto
[Outlet]
id=23
manager=John Doe
I want to write a small script that outputs one line for each file, like this:
John Doe,Palo Alto
How do I do that? I suspect some grep and looping. So far I have:
#!/bin/bash
echo Manager,City > result.txt
for f in *.config
do
cat "$f" | grep manager= >> result.txt
cat "$f" | grep city= >> result.txt
done
but that's of course incomplete since grep returns the whole line on its own line and I only want the part after the first = sign.

echo Manager,City > result.txt
for f in *.config; do
manager=$(awk -F= '$1=="manager" {print $2}' "$f")
city=$( awk -F= '$1=="city" {print $2}' "$f")
echo "$manager,$city"
done >> result.txt
awk -F= uses an equal sign as the field separator, then checks for the desired key ($1) and prints its value ($2). $(cmd) captures the output of a command so it can be assigned to the variables manager and city.
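One caveat with -F= and print $2: a value containing a second = sign would be truncated at it. A small variant (a sketch, not the original answer's code) sidesteps that by letting sub() strip just the key; sub() returns 1 when it substitutes, which triggers awk's default print of the whole remaining line:
manager=$(awk 'sub(/^manager=/, "")' "$f")   # strips the key, prints the rest of the value intact
city=$(awk 'sub(/^city=/, "")' "$f")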

Similar to John Kugelman's answer but using grep.
echo Manager,City > result.txt
for file in *.config; do
name=$(grep -oP '(?<=manager\=).*' "$file")
location=$(grep -oP '(?<=city\=).*' "$file")
echo "$name,$location"
done >> result.txt
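grep -P is a GNU extension and is not available everywhere (e.g. BSD/macOS grep lacks it). A portable variant of the same loop (a sketch under that assumption) uses cut to take everything after the first = sign:
echo Manager,City > result.txt
for file in *.config; do
    name=$(grep '^manager=' "$file" | cut -d= -f2-)       # -f2- keeps everything after the first =
    location=$(grep '^city=' "$file" | cut -d= -f2-)
    echo "$name,$location"
done >> result.txt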

You can do this with a single awk command, as per the following transcript:
pax> cat 1.config
[Location]
state=California
city=Palo Alto
[Outlet]
id=23
manager=John Doe
pax> cat 2.config
[Location]
state=Western Australia
city=Perth
[Outlet]
id=24
manager=Pax Diablo
pax> awk '
/^city=/ {gsub (/^city=/, "", $0); city=$0}
/^manager=/{gsub(/^manager=/, "", $0); print $0 "," city}
' *.config
John Doe,Palo Alto
Pax Diablo,Perth
Note that this assumes the city comes before the manager, and that all files have both city and manager. If those assumptions are incorrect, the awk script becomes a little more complex but it's still doable.
In that case, it becomes something like:
awk '
FNR==1 {city = ""; mgr = ""}
/^city=/ {gsub (/^city=/, "", $0); city = $0}
/^manager=/ {gsub (/^manager=/, "", $0); mgr = $0}
{if (city!="" && mgr!=""){
print mgr "," city; city = ""; mgr = "";
}}
' *.config
This makes the order irrelevant. It resets the city and manager variables to empty strings at the start of each file and stores them when it finds the relevant lines. After every line, if both are set, it prints them and clears them.
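If you also want the Manager,City header from the earlier answers, a BEGIN block lets the whole report come from a single awk run (a sketch combining the two approaches):
awk '
BEGIN       { print "Manager,City" }                 # CSV header, printed once
FNR==1      { city = ""; mgr = "" }                  # reset at the start of each file
/^city=/    { sub(/^city=/, "");    city = $0 }
/^manager=/ { sub(/^manager=/, ""); mgr = $0 }
city != "" && mgr != "" { print mgr "," city; city = ""; mgr = "" }
' *.config > result.txt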


How to get 1st field of a file only when 2nd field matches a given string?
#cat temp.txt
Ankit pass
amit pass
aman fail
abhay pass
asha fail
ashu fail
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'*
gives no output
Another syntax with awk:
awk '$2 ~ /^fail$/{print $1}' input_file
The useless cat is removed, and the pattern is anchored:
^ matches the start of the string
$ matches the end of the string
Anchoring gives an exact match rather than a substring match.
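A quick demonstration of why the anchors matter (a hypothetical one-word input):
$ echo "failed" | awk '$1 ~ /fail/ {print "substring match"}'
substring match
$ echo "failed" | awk '$1 ~ /^fail$/ {print "exact match"}'
(no output, because "failed" is not exactly "fail")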
Either:
your fields are not tab-separated, or
you have blanks at the end of the relevant lines, or
you have DOS line endings, so there is a CR at the end of every line and therefore at the end of every $2 (see Why does my tool output overwrite itself and how do I fix it?).
With GNU cat you can run cat -Tev temp.txt to see tabs (^I), CRs (^M) and line endings ($).
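If CRs from DOS line endings turn out to be the culprit, stripping them before the comparison fixes the test (a sketch, assuming tab-separated fields as in the question):
tr -d '\r' < temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'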
Your code seems to work fine when I remove the * at the end
cat temp.txt | awk -F"\t" '$2 == "fail" { print $1 }'
The other thing to check is whether your file uses tabs or spaces. My copy/paste of your data file copied spaces, so I needed this line:
cat temp.txt | awk '$2 == "fail" { print $1 }'
The other way of doing this is with grep:
grep 'fail$' temp.txt | awk '{ print $1 }'

Get the last two letters of each line in a file using script shell

I have a .txt file with 20 lines and would like to get the last two letters of each line. If they equal AA on every line, print GOOD; if not, print BAD.
line11111111111111111 AA
line22222222222222222 AA
line33333333333333333 AA
.....................
line20202020202020202 AA
This is GOOD.
===========================
line11111111111111111 AB
line22222222222222222 AC
line33333333333333333 WD
.....................
line20202020202020202 ZZ
This is BAD.
Did this, but it needs improvement: sed 's/^.*\(.\{2\}\)/\1/'
Based on your file layout:
$ awk '$NF!="AA"{f=1; exit} END{print (f?"BAD":"GOOD")}' file
Note that you don't have to check the rest of the file after the first non-"AA" token; the script exits immediately.
You may use a single command awk:
awk 'substr($0, length()-1) != "AA"{exit 1}' file && echo "GOOD" || echo "BAD"
substr($0, length()-1) extracts the last 2 characters of every line. The awk command exits with status 1 as soon as it finds a line that does not end in "AA".
Use a grep invert-match to identify lines not ending with "AA":
if grep -q -v 'AA$' input.txt; then echo "bad"; else echo "good"; fi
This script should work with awk. The file name here is .test; change it to your own file name.
if [ "$(awk '{ print $2 }' .test | uniq)" = "AA" ]; then
echo "This is GOOD";
else echo "This is BAD";
fi
How it works:
First, awk extracts the second column with awk '{ print $2 }', then uniq collapses runs of identical entries. If every line ends in AA, uniq reduces the column to a single "AA" line. Finally we check whether the result is exactly "AA" (one line containing the two characters).
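A slightly more defensive variant (an alternative sketch, not the answer's exact code) uses sort -u, so the column is reduced to its distinct values regardless of their order:
if [ "$(awk '{ print $2 }' .test | sort -u)" = "AA" ]; then
    echo "This is GOOD"
else
    echo "This is BAD"
fi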
Solution with grep and wc:
if [ "`grep -v 'AA$' your-file-here | wc -l`" == "0" ] ; then echo 'GOOD' ; else echo 'BAD' ; fi
grep -v selects the lines not ending in AA and wc -l counts them; zero such lines means every line ends in AA.
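The wc call can be folded into grep itself, since -c prints the number of (here inverted) matches directly:
if [ "$(grep -vc 'AA$' your-file-here)" -eq 0 ]; then echo 'GOOD'; else echo 'BAD'; fi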

unix concatenate list of files into one line

In a directory, there are several files such as:
file1
file2
file3
Is there a simple way to concatenate those file names to get one line (joined by "OR") in bash, as follows:
file1 OR file2 OR file3
Or do I need to write a script for it?
You can use this function to print all filenames (including ones with space, newline or special characters) with " OR " as separator (assuming your filename doesn't contain ASCII code 4):
orfiles() {
local IFS=$'\4'
local out="$*"
echo "${out//$'\4'/ OR }"
}
Then call it as:
orfiles *
How it works:
We set IFS (the Internal Field Separator) to ASCII 4 locally inside the function.
We store the expansion of "$*" in the local variable out. This joins the filenames with \4 between them.
Finally, using bash string substitution, we globally replace \4 with " OR " while printing $out.
When expanding "$*", bash joins the parameters with only the first character of IFS, a single character; it cannot insert the multi-character string " OR " directly, which is why it takes two steps.
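For example, with the three files from the question in the current directory:
$ orfiles *
file1 OR file2 OR file3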
You can simply do that with
printf '%s OR ' $(ls -1 *) | sed 's/ OR $//'; echo
where ls -1 * lists the files in the directory.
One caveat to consider: filenames containing whitespace will be split by the unquoted command substitution.
Use the following ls + awk solution:
ls -1 * | awk '{ r=(r)? r" OR "$0 : $0 }END{ print r }'
Workaround for filenames with newline(s):
echo -e $(ls -1b hello* | awk -v RS= '{gsub(/\n/," OR ",$0); gsub(/\\ /," ",$0); print $0}')
-b - ls option to print C-style escapes for nongraphic characters
ls -1|awk -v q='"' '{printf "%s%s", NR==1?"":" OR ", q $0 q}END{print ""}'
The ls + awk way to do it, with an example where a filename contains spaces:
kent$ ls -1
file1
file2
'file with OR and space'
kent$ ls -1|awk -v q='"' '{printf "%s%s", NR==1?"":" OR ", q $0 q}END{print ""}'
"file1" OR "file2" OR "file with OR and space"
$ for f in *; do printf '%s%s' "$s" "$f"; s=" OR "; done; printf '\n'
file1 OR file2 OR file3
This pure-shell loop handles arbitrary filenames: the separator variable s is empty on the first iteration and " OR " on every later one.

How to extract words between two characters in linux?

I have the following stored in a file named tmp.txt
user/config/jars/content-config-factory-3.2.0.0.jar
I need to store this word in a variable:
$variable=content-config-factory
I have written the following
while read line
do
var=$(echo $line | awk 'BEGIN{FS="\/"; OFS=" "} {print $NF}' )
var=$(echo $var | awk 'BEGIN{FS="-"; OFS=" "} {print $(1)}' )
echo $var
done < tmp.txt
This returns the result "content" instead of "content-config-factory".
Can anyone please tell me how to extract a word between two characters from a string efficiently.
An awk solution would be:
awk -F/ '{sub("-[^-]+$", "", $NF); print $NF}' tmp.txt
Test
$ echo "user/config/jars/content-config-factory-3.2.0.0.jar" | awk -F/ '{sub("-[^-]+$", "", $NF); print $NF}'
content-config-factory
You can also try this way to get your expected result:
variable=$(sed 's:.*/\(.*\)-.*:\1:' FileName)
echo $variable
Output:
content-config-factory
You could use grep:
grep -oP '(?<=/)[^/]*(?=-\d+\.)' file
Example:
$ var=$(echo 'user/config/jars/content-config-factory-3.2.0.0.jar' | grep -oP '(?<=/)[^/]*(?=-\d+\.)')
$ echo "$var"
content-config-factory
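Pure bash parameter expansion can do the same without any external process (a sketch, assuming the path always ends in a dash-separated version suffix):
line='user/config/jars/content-config-factory-3.2.0.0.jar'
base=${line##*/}       # strip the longest prefix ending in "/": content-config-factory-3.2.0.0.jar
variable=${base%-*}    # strip the shortest suffix starting at "-": content-config-factory
echo "$variable"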

How do I find the count of multiple words in a text file?

I am able to find the number of times a word occurs in a text file, like in Linux we can use:
cat filename|grep -c tom
My question is, how do I find the count of multiple words like "tom" and "joe" in a text file.
Since you have a couple of names, regular expressions are the way to go here. At first I thought it was as simple as a grep count on the regular expression for joe or tom, but I found that this did not account for the scenario where tom and joe are on the same line (or tom and tom, for that matter).
test.txt:
tom is really really cool! joe for the win!
tom is actually lame.
$ grep -c '\<\(tom\|joe\)\>' test.txt
2
As you can see from the test.txt file, 2 is the wrong answer, so we needed to account for names being on the same line.
I then used grep -o to print only the part of each matching line that matches the pattern, which gives each match of tom or joe on its own line, and piped the results into wc -l for the line count.
$ grep -o '\(joe\|tom\)' test.txt|wc -l
3
3...the correct answer! Hope this helps
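If you want a per-name breakdown rather than one total, the same grep -o output can feed sort | uniq -c:
$ grep -o '\<\(tom\|joe\)\>' test.txt | sort | uniq -c
      1 joe
      2 tom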
Ok, so first split the file into words, then sort and uniq:
tr -cs '[:alnum:]' '\n' < testdata | sort | uniq -c
You can use uniq:
sort filename | uniq -c
Note that this counts identical lines, not individual words.
Use awk:
{ for (i = 1; i <= NF; i++)   # every whitespace-separated field is a word
      count[$i]++             # tally it
}
END {
    for (i in count)
        print count[i], i     # frequency, then the word
}
This will produce a complete word frequency count for the input.
Pipe the output to grep to get the desired words:
awk -f w.awk input | grep -E 'tom|joe'
BTW, you do not need cat in your example; most programs that act as filters can take the filename as a parameter, so it's better to use
grep -c tom filename
Otherwise there is a strong possibility that people will start throwing the Useless Use of Cat Award at you ;-)
The sample you gave does not search for the word "tom"; it will also count "atom" and "bottom" and many more.
Grep searches for regular expressions. Regular expression that matches word "tom" or "joe" is
\<\(tom\|joe\)\>
You could do it with a regexp:
tr ' ' '\n' < filename | grep -c -e '\(joe\|tom\)'
Here is one:
cat txt | tr -s '[:punct:][:space:][:blank:]'| tr '[:punct:][:space:][:blank:]' '\n\n\n' | tr -s '\n' | sort | uniq -c
UPDATE
A shell script solution:
#!/bin/bash
file_name="$2"
string="$1"
if [ $# -ne 2 ]
then
echo "Usage: $0 <pattern to search> <file_name>"
exit 1
fi
if [ ! -f "$file_name" ]
then
echo "file \"$file_name\" does not exist, or is not a regular file"
exit 2
fi
line_no_list=("")
curr_line_indx=1
line_no_indx=0
total_occurance=0
# line_no_list holds, at index k, the line number and, at index k+1,
# the number of times the string occurs on that line
while read line
do
flag=0
while [[ "$line" == *$string* ]]
do
flag=1
line_no_list[line_no_indx]=$curr_line_indx
line_no_list[line_no_indx+1]=$((line_no_list[line_no_indx+1]+1))
total_occurance=$((total_occurance+1))
# replace the first occurrence of "$string" with nothing and recheck
line=${line/"$string"/}
done
# if we have entered the while loop then increment the
# line index to access the next array pos in the next
# iteration
if (( flag == 1 ))
then
line_no_indx=$((line_no_indx+2))
fi
curr_line_indx=$((curr_line_indx+1))
done < "$file_name"
echo -e "\nThe string \"$string\" occurs \"$total_occurance\" times"
echo -e "The string \"$string\" occurs in \"$((line_no_indx/2))\" lines"
echo "[Occurence # : Line Number : Nos of Occurance in this line]: "
for ((i=0; i<line_no_indx; i=i+2))
do
echo "$((i/2+1)) : ${line_no_list[i]} : ${line_no_list[i+1]} "
done
echo
I completely forgot about grep -f:
grep -cf names filename
AWK solution:
Assuming the names are in a file called names:
awk 'NR==FNR {h[NR]=$1; ct[NR]=0; cnt=NR} NR!=FNR {for (i=1; i<=cnt; ++i) if (match($0, h[i])) ++ct[i]} END {for (i in h) print h[i], ct[i]}' names filename
(This counts at most one match per line for each name.)
Note that your original grep doesn't search for words. e.g.
$ echo tomorrow | grep -c tom
1
You need grep -w
gawk -vRS='[^[:alpha:]]+' '{print}' filename | grep -Ec '^(tom|joe|bob|sue)$'
The gawk program sets the record separator to anything non-alphabetic, so every word will end up on a separate line. Then grep counts lines that match one of the words you want exactly.
We use gawk because the POSIX awk doesn't allow regex record separator.
For brevity, you can replace '{print}' with 1; either way it's an awk program that simply prints every input record (1 is always true, so the default action, {print}, runs).
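The counting can also stay entirely inside gawk, using the same regex record separator to tally each target word (a sketch, assuming the input file is called filename):
gawk -vRS='[^[:alpha:]]+' '/^(tom|joe)$/ { n[$0]++ } END { for (w in n) print w, n[w] }' filename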
To find all hits in all lines
echo "tom is really really cool! joe for the win!
tom is actually lame." | akw '{i+=gsub(/tom|joe/,"")} END {print i}'
3
This will count "tomtom" as 2 hits.
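If whole-word matching matters, gawk's \< and \> word-boundary operators avoid that double count (a hedged variant of the same gsub idea):
echo "tomtom is not tom" | gawk '{i+=gsub(/\<(tom|joe)\>/,"")} END {print i}'
1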
