bash - Diff a command with a file (specific) - linux

It's pretty hard for me to describe what I want to do, but I'll try:
(Because of some private information I changed the names.)
I want to "diff" a command output with a text file created by me.
The command output looks like:
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
And the txtfile:
Blabla1
Blabla2
If it finds Blabla1 anywhere in the command output, that's fine. But as you see, it will not find Blabla2 anywhere in the command output, and this difference is what I want as output.
I hope you understand what I mean and can possibly help me.
Greetings,
Can
UPDATE:
@hek2mgl
So my command is:
./factory.sh listapplications | grep -i running
This command shows this:
'ftp' '1' (7 objects + dependencies), STATUS: 'RUNNING' - 'XSD Da
'abc' '5.1.0' (14 objects + dependencies), STATUS: 'RUNNING' - '2017-10-13: Fix fuer Bug 2150'
'name' '1.0.2' (5 objects + dependencies), STATUS: 'RUNNING'
And I want to compare that output with my textfile:
ftp
abc
name
missing
alsomissing
So if I compare these two now, it should check whether it finds the words from my textfile ANYWHERE in the command output. If it does find a word anywhere -> no output.
And as you see, it will not find "missing" and "alsomissing". I want these two as the output at the end.

What you might be interested in is grep in combination with 'process substitution'. If your file with patterns is file.txt and your command to execute is cmd, then you can use
grep -o -F -f file.txt <(cmd) | grep -v -F -f - file.txt
This will output the patterns in file.txt which are not matched in the output of cmd.
In case of the Blabla example, the above line will output
Blabla2
It works as follows. The first part searches for all patterns listed in file.txt in the output of cmd and outputs only the matched parts. This means that
% grep -o -F -f file.txt <(cmd)
Blabla1
This output is now piped to another command that will try to find all lines in file.txt which do not match any of the patterns coming from the pipe (-f -):
% grep -o -F -f file.txt <(cmd) | grep -v -F -f - file.txt
Blabla2
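Applied to your updated question, that looks like this (a sketch, assuming the five words from your textfile are in file.txt):
grep -o -F -f file.txt <(./factory.sh listapplications | grep -i running) | grep -v -F -f - file.txt
which should print missing and alsomissing.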

So ... this seems to do it, using bash process substitution:
$ cat file1
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
$ cat file2
Blabla1
Blabla2
$ grep -vFf <(awk '{gsub(/[^[:alnum:]]/,"",$1);print $1}' file1) file2
Blabla2
The awk script takes the first field, strips non-alphanumeric characters from it (i.e. the single quotes) and outputs just that first field. The grep option -f uses the "virtual" file created by the aforementioned process substitution as a list of fixed strings to search for within the input file (file2), and the -v reverses the search, showing you only what was not found.
If the regex in the gsub() is too greedy, you might replace it with something like $1=substr($1,2,length($1)-2).
You could alternatively do this in (POSIX) awk alone, without relying on bash process substitution:
$ awk 'NR==FNR{a[substr($1,2,length($1)-2)];next} $1 in a{next} 1' file1 file2
Blabla2
This reads the stripped first field of file1 into the keys of an array, then for each line of file2 checks for the existence of that key in the array, skipping lines that match and printing any left over. (The 1 at the end of the script is short-hand for "print this line".)
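The same one-liner also works without the intermediate file1 by reading the command output through process substitution (a sketch using the cmd from the question):
$ awk 'NR==FNR{a[substr($1,2,length($1)-2)];next} $1 in a{next} 1' <(cmd) file2
Blabla2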

You can also use awk only:
awk '
# Store patterns of text.file in an array (p)atterns.
# Initialize their count of occurrence with 0.
NR==FNR{
    p[$0]=0
    next
}
# Strip the quotes around Blabla... in the cmd output.
# Increase the count of occurrence of the pattern.
{
    gsub("'\''", "")
    p[$1]++
}
# At the end of the input print those patterns which
# did not appear in the cmd output, meaning their count
# of occurrence is zero.
END{
    for(i in p){
        if(p[i]==0){
            print i
        }
    }
}' text.file cmd.txt
PS: Alternatively, you can use process substitution instead of storing the command output in a file; just replace cmd.txt with <(cmd).
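For example, if you save the script body to a file (a hypothetical count.awk; note that in a file the quote no longer needs the '\'' shell escape, plain gsub("'", "") works), the invocation with process substitution becomes:
awk -f count.awk text.file <(./factory.sh listapplications | grep -i running)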

Related

Pass command-line arguments to grep as search patterns and print lines which match them all

I'm learning about grep commands.
I want to make a program that, when a user enters more than one word, outputs the lines in a data file that contain those words.
So I connected the words that the user typed with '|' and put them in the grep command to create the program I intended.
But this is an OR operation. I want an AND operation.
So I learned how to do an AND operation with grep commands, as follows:
cat <file> | grep 'pattern1' | grep 'pattern2' | grep 'pattern3'
But I don't know how to put the user input in the 'pattern1', 'pattern2', 'pattern3' positions, because the number of words the user inputs is not fixed.
As user input increases, grep must be executed with more and more pipes, but I don't know how to build this part.
The user input is as follows:
$ [the name of my program] 'pattern1' 'pattern2' 'pattern3' ...
I'd really appreciate your help.
With grep -f you can grep for multiple items, when each of them is on its own line in a file.
With <(command) you can let Bash think that the result of command is a file.
With printf "%s\n" and a list of arguments, each argument is printed on a new line.
Together:
grep -f <(printf "%s\n" "$@") datafile
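You can check what the process substitution feeds to grep by running the printf part on its own (hypothetical arguments shown):
$ printf "%s\n" pattern1 pattern2 pattern3
pattern1
pattern2
pattern3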
Suggesting to use awk pattern logic:
awk '/RegExp-pattern-1/ && /RegExp-pattern-2/ && /RegExp-pattern-3/' input.txt
The advantages: you can combine RegExp patterns with the logic operators && and ||, and you scan the whole file only once.
The disadvantages: you must provide a file list (it can't traverse subdirectories), and the RegExp syntax is limited compared to grep -E or grep -P.
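Since the number of patterns is not known in advance, the awk program itself can be built from the script's arguments (a minimal sketch, assuming the patterns contain no / or awk RegExp metacharacters):
# assemble "/p1/ && /p2/ && ..." from the positional parameters
prog=""
for p in "$@"; do
    prog="${prog:+$prog && }/$p/"
done
awk "$prog" input.txt   # pattern with no action: print matching lines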
In principle, what you are asking could be done with a loop with output to a temporary file.
file=inputfile
temp=$(mktemp -d -t multigrep.XXXXXXXXX) || exit
trap 'rm -rf "$temp"' ERR EXIT
for regex in "$@"; do
grep "$regex" "$file" >"$temp"/output
mv "$temp"/output "$temp"/input
file="$temp"/input
done
cat "$temp"/input
However, a better solution is probably to arrange for Awk to check for all the patterns in one go, and avoid reading the same lines over and over again.
Passing the arguments to Awk with quoting intact is not entirely trivial. Here, we simply pass them as command-line arguments and process those into an array within the Awk script itself.
awk 'BEGIN { for(i=1; i<ARGC; ++i) a[i]=ARGV[i];
ARGV[1]="-"; ARGC=1 }
{ for(n=1; n<i; ++n) if ($0 !~ a[n]) next; }1' "$@" <file
In brief, in the BEGIN block, we copy the command-line arguments from ARGV to a, then replace ARGV and ARGC to pass Awk a new array of (apparent) command-line arguments which consists of just - which means to read standard input. Then, we simply iterate over a and skip to the next line if the current input line from standard input does not match. Any remaining lines have matched all the patterns we passed in, and are thus printed.
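A hypothetical invocation, assuming the one-liner is saved in a script named multigrep.sh that reads its data from standard input:
$ ./multigrep.sh pattern1 pattern2 pattern3 < datafile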

Fetching the value of variable stored in a file

I am trying to fetch the value of a variable stored in a file from another shell script.
Example:
cat abc.log
var1=2
var2=2
var3=25
I am writing a script to fetch the value of var3.
Thank you in advance.
awk -F= '$1 ~ /^[[:space:]]*var3/ { print $2 }' abc.log
Set the field delimiter to = and then, where the first field starts with (optionally whitespace-padded) var3, print the second field.
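With the abc.log from the question, this prints:
$ awk -F= '$1 ~ /^[[:space:]]*var3/ { print $2 }' abc.log
25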
Alternatively, you could:
source abc.log
and then:
echo $var3
Using sed you can isolate 25 specifically with:
sed -n '/^[[:space:]]*var3=/s/^[^=]*=//p' file
Explanation
This is the general substitution form s/find/replace/ with a matching expression preceding it. The total form is /match/s/find/replace/. The option -n suppresses the normal printing of pattern-space and the p at the end tells sed to print the line where the match and substitution took place. Specifically,
/match/ locates a line with any number of preceding whitespace characters followed by var3=. The POSIX [:space:] character class matches any whitespace,
the /find/ matches all characters from the '^' beginning that are not the '=' character ([^=]), followed by the literal '=' character, and finally
the /replace/ is the empty string, leaving just the 25, which is printed.
Example Use/Output
$ sed -n '/^[[:space:]]*var3=/s/^[^=]*=//p' file
25
A grep one-liner, if your grep supports Perl-compatible regular expressions (the -P option; not all greps support that):
grep -Po '^\s*var3=\K.*' abc.log
or,
grep -Po '^\s*var3=\K.*' abc.log | tail -n1
in order to get the last value of var3, if multiple var3 lines are a possibility.
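The \K in the pattern discards everything matched before it from the reported match, so only the part after var3= is printed:
$ grep -Po '^\s*var3=\K.*' abc.log
25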

Grep a word out of a file and save the file as that word

I am using Ubuntu Linux and grepping info out of a file (let's say filename.log) and want to save the file using some of the info inside filename.log.
example:
The info in filename.log has a version_name and a date.
When displaying this info on screen using cat it will display:
version_name=NAME
date=TODAY
I then want to save the file as NAME-TODAY.log and have no idea how to do this.
Any help will be appreciated
You can chain a bunch of basic Linux commands with the pipe character |. Combined with a thing called command substitution (taking the output of a complex command to use in another command; syntax: $(your command)), you can achieve what you want to do.
This is what I came up with, based on your question:
cp filename.log $(grep -E "(version_name=)|(date=)" filename.log | cut -f 2 -d = | tr '\n' '-' | rev | cut -c 2- | rev).log
So here I used cp, $(), grep, cut, tr and finally rev.
Since you said you had no idea where to start, let me walk you through this one-liner:
cp is used to copy the filename.log file to a new file, with the name based on the values of version_name and date (steps 2 and up).
command substitution $(): the entire command between the round brackets is 'resolved' before finishing the cp command in step 1. In your example it would be NAME-TODAY. Notice the .log at the end, outside of the round brackets, to give it a proper file extension. The output of this command in your example will be NAME-TODAY.log.
grep -E "(version_name=)|(date=)" grep with regexp flag -E to be able to do what we are doing. Matches any lines that contain version_name= OR date=. The expected output is:
version_name=NAME
date=TODAY
cut -f 2 -d =: because I am not interested in version_name, but instead in the value associated with that field, I use cut to split the line at the equals character = with the flag -d =. I then select the value behind the equals character (the second field) with the flag -f 2. The expected output is:
NAME
TODAY
tr '\n' '-': because grep outputs on multiple lines, I want to remove all newlines and replace them with a dash. Expected output:
NAME-TODAY-
rev | cut -c 2- | rev: I am grouping these. rev reverses the word I have created, cut -c 2- keeps everything from the second character of the reversed word onward (dropping the first character, i.e. the trailing dash), and the second rev restores the original order. This is required because I replaced newlines with dashes, which means I now have NAME-TODAY-. Basically this is just an extra step to remove the last dash. See the expected outputs of each step:
-YADOT-EMAN
YADOT-EMAN
NAME-TODAY
Remember this value is in the command substitution of step 2, so the end result will be:
cp filename.log NAME-TODAY.log
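You can verify the command substitution part on its own before running the full cp (given the example contents of filename.log):
$ grep -E "(version_name=)|(date=)" filename.log | cut -f 2 -d = | tr '\n' '-' | rev | cut -c 2- | rev
NAME-TODAY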
I managed to solve this by doing the following: grep filename.log > /tmp/filename.info && filename=$(echo $(grep "version_name" /tmp/filename.info | cut -d " " -f 3)-$(grep "date" /tmp/filename.info | cut -d " " -f 3)) && cp filename.log $filename.log

Delete lines from a file matching first 2 fields from a second file in shell script

Suppose I have setA.txt:
a|b|0.1
c|d|0.2
b|a|0.3
and I also have setB.txt:
c|d|200
a|b|100
Now I want to delete from setA.txt the lines whose first 2 fields match a line in setB.txt, so the output should be:
b|a|0.3
I tried:
comm -23 <(sort setA.txt) <(sort setB.txt)
But equality there is defined over the whole line, so it won't work. How can I do this?
$ awk -F\| 'FNR==NR{seen[$1,$2]=1;next;} !seen[$1,$2]' setB.txt setA.txt
b|a|0.3
This reads through setB.txt just once, extracts the needed information from it, and then reads through setA.txt while deciding which lines to print.
How it works
-F\|
This sets the field separator to a vertical bar, |.
FNR==NR{seen[$1,$2]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read. Thus, when FNR==NR, we are reading the first file, setB.txt. If so, set the value of associative array seen to true, 1, for the key consisting of fields one and two. Lastly, skip the rest of the commands and start over on the next line.
!seen[$1,$2]
If we get to this command, we are working on the second file, setA.txt. Since ! means negation, the condition is true if seen[$1,$2] is false which means that this combination of fields one and two was not in setB.txt. If so, then the default action is performed which is to print the line.
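If you want setA.txt itself modified rather than the surviving lines printed, a common pattern is to redirect to a temporary file and move it back (a sketch):
awk -F\| 'FNR==NR{seen[$1,$2]=1;next;} !seen[$1,$2]' setB.txt setA.txt > setA.tmp && mv setA.tmp setA.txt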
This should work:
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p' setB.txt |sed -f- setA.txt
How this works:
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p'
generates an output:
/^c|d/d
/^a|b/d
which is then used as a sed script for the next sed after the pipe and outputs:
b|a|0.3
(IFS=$'|'; cat setA.txt | while read x y z; do grep -q -P "\Q$x|$y|\E" setB.txt || echo "$x|$y|$z"; done; )
Explanation: grep -q means only test whether grep can find the regexp, but do not output anything; -P means use Perl syntax, so that the | is matched as-is thanks to the \Q...\E construct.
IFS=$'|' makes bash use | instead of whitespace (SPC, TAB, etc.) as the token separator.

Error with a script in bash

I have a little error with a script I wrote in bash and I can't figure out what I'm doing wrong.
Note that I'm using this script for thousands of calculations and this error happened only a few times (like 20 or so), but it still happened.
What the script does is this: basically it takes as input a web page that I got from a site with the utility w3m, and it counts all the occurrences of the words in it... Afterwards it orders them from the most common to the ones that occur only once.
This is the code:
#!/bin/bash
# counts the numbers of words from specific sites #
# writes in a file the occurrences ordered from the most common #
touch check # file used to analyze the occurrences
touch distribution # final file ordered
page=$1 # the web page that needs to be analyzed
occurrences=$2 # temporary file for the occurrences
dictionary=$3 # dictionary used for another purpose (ignore this)
# write the words one by column
cat $page | tr -c [:alnum:] "\n" | sed '/^$/d' > check
# lopp to analyze the words
cat check | while read words
do
    word=${words}
    strlen=${#word}
    # ignores blacklisted words or small ones
    if ! grep -Fxq $word .blacklist && [ $strlen -gt 2 ]
    then
        # if the word isn't in the file
        if [ `egrep -c -i "^$word: " $occurrences` -eq 0 ]
        then
            echo "$word: 1" | cat >> $occurrences
        # else if it is already in the file, it calculates the occurrences
        else
            old=`awk -v words=$word -F": " '$1==words { print $2 }' $occurrences`
            ### HERE IS THE ERROR, EITHER THE LET OR THE SED ###
            let "new=old+1"
            sed -i "s/^$word: $old$/$word: $new/g" $occurrences
        fi
    fi
done
# orders the words
awk -F": " '{print $2" "$1}' $occurrences | sort -rn | awk -F" " '{print $2": "$1}' > distribution
# ignore this, not important
grep -w "1" distribution | awk -F ":" '{print $1}' > temp_dictionary
for line in `cat temp_dictionary`
do
    if ! grep -Fxq $line $dictionary
    then
        echo $line >> $dictionary
    fi
done
rm check
rm temp_dictionary
This is the error (I'm translating it, so it could be different in English):
./wordOccurrences line:30 let:x // where x is a number, usually 9 or 10 (but also 11, 13, etc.)
1: syntax error in the expression (the error token is 1)
sed: expression -e #1, character y: command 's' not terminated // where y is another number (this one is also usually 9 or 10), with y being different from x
EDIT:
Talking with kev, it looks like it's a newline problem.
I added an echo between let and sed to print the sed and it worked perfectly for like 5 to 10 minutes until that error. Usually the sed without error looked like this:
s/^CONSULENTI: 6$/CONSULENTI: 7/g
but when I got the error it was like this:
s/^00145: 1
1$/00145: 4/g
How can I fix this?
If you get a newline in $old, it means awk prints two lines, so there is a duplicate in $occurrences.
The script seems overly complicated for counting words, and it is not efficient because it launches many processes and processes the file in a loop; maybe you can do something similar with
sort | uniq -c
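A minimal sketch of that idea, reusing the tr step from your script and producing the same "word: count" format (the blacklist and the length filter are left out):
tr -c '[:alnum:]' '\n' < "$page" | sed '/^$/d' | sort | uniq -c | sort -rn | awk '{print $2": "$1}' > distribution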
You should also consider that your case-insensitivity is not consistent throughout the program. I created a page with just "foooo" in it and ran the program, then created one with "Foooo" in it and ran the program again. The 'old=`awk...' line sets 'old' to the empty string because awk is matching case sensitively. This results in the occurrences file not being updated. The subsequent sed and possibly some of the greps are also case sensitive.
This may not be the only error since it doesn't explain the error message you saw, but it is an indication that the same word with different capitalization will be handled erroneously by your script.
The following would separate the words, lowercase them, and then remove the ones smaller than three characters:
tr -cs '[:alnum:]' '\n' <foo | tr '[:upper:]' '[:lower:]' | egrep -v '^.{0,2}$'
Using this at the front of your script would mean that the rest of the script would not have to be case insensitive to be correct.
