Egrep acts strange with -f option - linux

I've got a strangely acting egrep -f.
Example:
$ egrep -f ~/tmp/tmpgrep2 orig_20_L_A_20090228.txt | wc -l
3
$ for lines in `cat ~/tmp/tmpgrep2` ; do egrep $lines orig_20_L_A_20090228.txt ; done | wc -l
12
Could someone give me a hint what could be the problem?
No, the files did not changed between executions. The expected answer for the egrep line count is 12.
UPDATE on file contents: the searched file contains cca 13000 lines, each of them are 500 char long, the pattern file contains 12 lines, each of them are 24 char long. The pattern always (and only) occurs on a fixed position in the seached file (26-49).
UPDATE on pattern contents: every pattern from tmpgrep2 are a 24 char long number.

If the search patterns are found on the same lines, then you can get the result you see:
Suppose you look for:
abc
def
ghi
jkl
and the data file is:
abcdefghijklmnoprstuvwxzy
then the one-time command will print 1 and the loop will print 4.

Could it be that the lines read contain something that the shell is expanding/substituting for you, in the second version? Then that doesn't get done by grep when it reads the patterns itself, thus leading to a different sent of patterns being matched.
I'm not totally sure if the shell is doing any expansion on the variable value in an invocation like that, but it's an idea at least.
EDIT: Nope, it doesn't seem to do any substitutions. But it could be quoting issue, if your patterns contain whitespace the for loop will step through each token, not through each line. Take a look at the read bash builtin.

Do you have any duplicates in ~/tmp/tmpgrep2? Egrep will only use the dupes one time, but your loop will use each occurrence.
Get rid of dupes by doing something like this:
$ for lines in `sort < ~/tmp/tmpgrep2 | uniq` ; do egrep $lines orig_20_L_A_20090228.txt ; done | wc -l

I second #unwind.
Why don't you run without wc -l and see what each search is finding?
And maybe:
for lines in `cat ~/tmp/tmpgrep2` ; do echo $lines ; done
Just to see now the shell is handling $lines?

The others have already come up with most of the things I would look at. The next thing I would check is the environment variable GREP_OPTIONS, or whatever it is called on your machine. I've gotten the strangest error messages or behaviors when using a command line argument that interfered with the environment settings.

Related

Get last line from grep search on multiple files, and write them in a output file

I have multiple files located in multiple directories. From them I search a keyword 'ENERGY' by grep. In each file I get multiple match cases. I want to take the last line from each file and save the results in the output.txt file. I wrote the following code:
labl=SubDir
ENERGY=`grep 'ENERGY' MyDir*${labl}*/*.txt`
cat > output.txt << EOF
${ENERGY}
EOF
This code saves all match cases from each file. But as mentioned, I need the last match case from each file. For that I modified the grep command as:
ENERGY=`grep 'ENERGY' MyDir*${labl}*/*.txt|taile -l`
Unfortunately this doesn't do the job either. Instead, it saves all the match cases from the last file only.
How to solve it?
Please don't run multiple processes/pipes to achieve this.
gawk '/ENERGY/{last=$0} ENDFILE{if(last!="") print last; last=""}' MyDir*"$labl"*/*.txt
/ENERGY/{last=$0}: On lines which match the regex ENERGY, set variable last to the contents of the entire line $0
ENDFILE{...} Run this {action} at the end of every input file supplied by the glob.
if(last!="") print last: print last if it's not null
last="": reset this variable to null, avoiding duplication
MyDir*"${labl}"*/*.txt: Quoted variable in glob will match directory names that include spaces
Use a for loop:
for f in MyDir*"$lab1"*/*.txt; do
grep ENERGY "$f" | tail -1 >> output.txt
done
Yet one but probably not last possible approach is to use parallel like this. Probably you can achieve the same with xargs, but I personally prefer parallel as simpler and giving the possibility to scale your process.
ls -1 file* | parallel -j1 "grep ENERGY {} | tail -n 1" > output.txt

Line from bash command output stored in variable as string

I'm trying to find a solution to a problem analog to this one:
#command_A
A_output_Line_1
A_output_Line_2
A_output_Line_3
#command_B
B_output_Line_1
B_output_Line_2
Now I need to compare A_output_Line_2 and B_output_Line_1 and echo "Correct" if they are equal and "Not Correct" otherwise.
I guess the easiest way to do this is to copy a line of output in some variable and then after executing the two commands, simply compare the variables and echo something.
This I need to implement in a bash script and any information on how to get certain line of output stored in a variable would help me put the pieces together.
Also, it would be cool if anyone can tell me not only how to copy/store a line, but probably just a word or sequence like : line 1, bytes 4-12, stored like string in a variable.
I am not a complete beginner but also not anywhere near advanced linux bash user. Thanks to any help in advance and sorry for bad english!
An easier way might be to use diff, no?
Something like:
command_A > command_A.output
command_B > command_B.output
diff command_A.output command_B.output
This will work for comparing multiple strings.
But, since you want to know about single lines (and words in the lines) here are some pointers:
# first line of output of command_A
command_A | head -n 1
The -n 1 option says only to use the first line (default is 10 I think)
# second line of output of command_A
command_A | head -n 2 | tail -n 1
that will take the first two lines of the output of command_A and then the last of those two lines. Happy times :)
You can now store this information in a variable:
export output_A=`command_A | head -n 2 | tail -n 1`
export output_B=`command_B | head -n 1`
And then compare it:
if [ "$output_A" == "$output_B" ]; then echo 'Correct'; else echo 'Not Correct'; fi
To just get parts of a string, try looking into cut or (for more powerful stuff) sed and awk.
Also, just learing a good general purpose scripting language like python or ruby (even perl) can go a long way with this kind of problem.
Use the IFS (internal field separator) to separate on newlines and store the outputs in an array.
#!/bin/bash
IFS='
'
array_a=( $(./a.sh) )
array_b=( $(./b.sh) )
if [ "${array_a[1]}" = "${array_b[0]}" ]; then
echo "CORRECT"
else
echo "INCORRECT"
fi

Tail inverse / printing everything except the last n lines?

Is there a (POSIX command line) way to print all of a file EXCEPT the last n lines? Use case being, I will have multiple files of unknown size, all of which contain a boilerplate footer of a known size, which I want to remove. I was wondering if there is already a utility that does this before writing it myself.
Most versions of head(1) - GNU derived, in particular, but not BSD derived - have a feature to do this. It will show the top of the file except the end if you use a negative number for the number of lines to print.
Like so:
head -n -10 textfile
Probably less efficient than the "wc" + "do the math" + "tail" method, but easier to look at:
tail -r file.txt | tail +NUM | tail -r
Where NUM is one more than the number of ending lines you want to remove, e.g. +11 will print all but the last 10 lines. This works on BSD which does not support the head -n -NUM syntax.
The head utility is your friend.
From the man page of head:
-n, --lines=[-]K
print the first K lines instead of the first 10;
with the leading `-', print all but the last K lines of each file
There's no standard commands to do that, but you can use awk or sed to fill a buffer of N lines, and print from the head once it's full. E.g. with awk:
awk -v n=5 '{if(NR>n) print a[NR%n]; a[NR%n]=$0}' file
cat <filename> | head -n -10 # Everything except last 10 lines of a file
cat <filename> | tail -n +10 # Everything except 1st 10 lines of a file
If the footer starts with a consistent line that doesn't appear elsewhere, you can use sed:
sed '/FIRST_LINE_OF_FOOTER/q' filename
That prints the first line of the footer; if you want to avoid that:
sed -n '/FIRST_LINE_OF_FOOTER/q;p' filename
This could be more robust than counting lines if the size of the footer changes in the future. (Or it could be less robust if the first line changes.)
Another option, if your system's head command doesn't support head -n -10, is to precompute the number of lines you want to show. The following depends on bash-specific syntax:
lines=$(wc -l < filename) ; (( lines -= 10 )) ; head -$lines filename
Note that the head -NUMBER syntax is supported by some versions of head for backward compatibility; POSIX only permits the head -n NUMBER form. POSIX also only permits the argument to -n to be a positive decimal integer; head -n 0 isn't necessarily a no-op.
A POSIX-compliant solution is:
lines=$(wc -l < filename) ; lines=$(($lines - 10)) ; head -n $lines filename
If you need to deal with ancient pre-POSIX shells, you might consider this:
lines=`wc -l < filename` ; lines=`expr $lines - 10` ; head -n $lines filename
Any of these might do odd things if a file is 10 or fewer lines long.
tac file.txt | tail +[n+1] | tac
This answer is similar to user9645's, but it avoids the tail -r command, which is also not a valid option many systems. See, e.g., https://ubuntuforums.org/showthread.php?t=1346596&s=4246c451162feff4e519ef2f5cb1a45f&p=8444785#post8444785 for an example.
Note that the +1 (in the brackets) was needed on the system I tried it on to test, but it may not be required on your system. So, to remove the last line, I had to put 2 in the brackets. This is probably related to the fact that you need to have the last line ending with regular line feed character(s). This, arguably, makes the last line a blank line. If you don't do that, then the tac command will combine the last two lines, so removing the "last" line (or the first to the tail command) will actually remove the last two lines.
My answer should also be the fastest solution of those listed to date for systems lacking the improved version of head. So, I think it is both the most robust and the fastest of all the answers listed.
head -n $((`(wc -l < Windows_Terminal.json)`)) Windows_Terminal.json
This will work on Linux and on MacOs, keep in mind Mac does not support a negative value. so This is quite handy.
n.b : replace Windows_Terminal.json with your file name
It is simple. You have to add + to the number of lines that you wanted to avoid.
This example gives to you all the lines except the first 9
tail -n +10 inputfile
(yes, not the first 10...because it counts different...if you want 10, just type
tail -n 11 inputfile)

Count the number of occurrences in a string. Linux

Okay so what I am trying to figure out is how do I count the number of periods in a string and then cut everything up to that point but minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just greping for the number of times like this:
grep -c "." infile
The reason I don't want to do that is because I want to avoid creating another text file for I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just remove the penultimate dot and everything afterwards, you can use Bash's built-in string manuipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by helpful comment from #steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5

Grep filtering output from a process after it has already started?

Normally when one wants to look at specific output lines from running something, one can do something like:
./a.out | grep IHaveThisString
but what if IHaveThisString is something which changes every time so you need to first run it, watch the output to catch what IHaveThisString is on that particular run, and then grep it out? I can just dump to file and later grep but is it possible to do something like background it and then bring it to foreground and bringing it back but now piped to some grep? Something akin to:
./a.out
Ctrl-Z
fg | grep NowIKnowThisString
just wondering..
No, it is only in your screen buffer if you didn't save it in some other way.
Short form: You can do this, but you need to know that you need to do it ahead-of-time; it's not something that can be put into place interactively after-the-fact.
Write your script to determine what the string is. We'd need a more detailed example of the output format to give a better example of usage, but here's one for the trivial case where the entire first line is the filter target:
run_my_command | { read string_to_filter_for; fgrep -e "$string_to_filter_for" }
Replace the read string_to_filter_for with as many commands as necessary to read enough input to determine what the target string is; this could be a loop if necessary.
For instance, let's say that the output contains the following:
Session id: foobar
and thereafter, you want to grep for lines containing foobar.
...then you can pipe through the following script:
re='Session id: (.*)'
while read; do
if [[ $REPLY =~ $re ]] ; then
target=${BASH_REMATCH[1]}
break
else
# if you want to print the preamble; leave this out otherwise
printf '%s\n' "$REPLY"
fi
done
[[ $target ]] && grep -F -e "$target"
If you want to manually specify the filter target, this can be done by having the loop check for a file being created with filter contents, and using that when starting up grep afterwards.
That is a little bit strange what you need, but you can do it tis way:
you must go into script session first;
then you use shell how usually;
then you start and interrupt you program;
then run grep over typescript file.
Example:
$ script
$ ./a.out
Ctrl-Z
$ fg
$ grep NowIKnowThisString typescript
You could use a stream editor such as sed instead of grep. Here's an example of what I mean:
$ cat list
Name to look for: Mike
Dora 1
John 2
Mike 3
Helen 4
Here we find the name to look for in the fist line and want to grep for it. Now piping the command to sed:
$ cat list | sed -ne '1{s/Name to look for: //;h}' \
> -e ':r;n;G;/^.*\(.\+\).*\n\1$/P;s/\n.*//;br'
Mike 3
Note: sed itself can take file as a parameter, but you're not working with text files, so that's how you'd use it.
Of course, you'd need to modify the command for your case.

Resources