How does grep know it is writing to the input file?


If I try to redirect the output of grep to the same file that it is reading from, like so:
$ grep stuff file.txt > file.txt
I get the error message grep: input file 'file.txt' is also the output. How does grep determine this?

According to the GNU grep source code, grep compares the device and i-node numbers of the input file with those of standard output (that is what the SAME_INODE macro checks):
if (!out_quiet && list_files == 0 && 1 < max_count
    && S_ISREG (out_stat.st_mode) && out_stat.st_ino
    && SAME_INODE (st, out_stat))   /* <------------------ */
  {
    if (! suppress_errors)
      error (0, 0, _("input file %s is also the output"), quote (filename));
    errseen = 1;
    goto closeout;
  }
The out_stat is filled by calling fstat against STDOUT_FILENO.
if (fstat (STDOUT_FILENO, &tmp_stat) == 0 && S_ISREG (tmp_stat.st_mode))
  out_stat = tmp_stat;
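For illustration, roughly the same check can be made from bash. This is a minimal sketch, not how grep itself does it: safe_grep is a hypothetical wrapper, and it relies on two bash behaviours of the [ ... ] test, namely that /dev/stdout stands for file descriptor 1 and that -ef is true when two names have the same device and inode numbers.

```shell
# safe_grep PATTERN FILE: hypothetical wrapper that refuses to run grep when
# FILE is the same regular file as standard output.
safe_grep() {
    # In bash's [ ... ] test, /dev/stdout stands for file descriptor 1.
    # -f:  is standard output a regular file (compare S_ISREG in grep)?
    # -ef: do both names share device and inode (compare SAME_INODE)?
    if [ -f /dev/stdout ] && [ "$2" -ef /dev/stdout ]; then
        echo "safe_grep: input file '$2' is also the output" >&2
        return 2
    fi
    grep -e "$1" -- "$2"
}
```

For example, safe_grep stuff file.txt >> file.txt reports the problem instead of appending matches to the file it is still reading. As with grep itself, with a plain > file.txt the shell has already truncated the file before the command runs, so the check prevents the runaway loop but cannot bring the data back.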

Looking at the source code, you can see that grep checks for this case (the output file is already open for reading by grep) and reports it; see the SAME_INODE check below:
  /* If there is a regular file on stdout and the current file refers
     to the same i-node, we have to report the problem and skip it.
     Otherwise when matching lines from some other input reach the
     disk before we open this file, we can end up reading and matching
     those lines and appending them to the file from which we're reading.
     Then we'd have what appears to be an infinite loop that'd terminate
     only upon filling the output file system or reaching a quota.
     However, there is no risk of an infinite loop if grep is generating
     no output, i.e., with --silent, --quiet, -q.
     Similarly, with any of these:
       --max-count=N (-m) (for N >= 2)
       --files-with-matches (-l)
       --files-without-match (-L)
     there is no risk of trouble.
     For --max-count=1, grep stops after printing the first match,
     so there is no risk of malfunction.  But even --max-count=2, with
     input==output, while there is no risk of infloop, there is a race
     condition that could result in "alternate" output.  */
  if (!out_quiet && list_files == 0 && 1 < max_count
      && S_ISREG (out_stat.st_mode) && out_stat.st_ino
      && SAME_INODE (st, out_stat))
    {
      if (! suppress_errors)
        error (0, 0, _("input file %s is also the output"), quote (filename));
      errseen = true;
      goto closeout;
    }

To write the result back to the original file, write it to a temporary file first and then move that over the original:
grep stuff file.txt > tmp && mv tmp file.txt
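A slightly more defensive version of the same temporary-file approach, sketched as a bash function (grep_in_place is a hypothetical name). It assumes mktemp is available and creates the temporary file next to the original, so the final mv is a rename within one file system. It also treats grep's exit status 1 (no lines matched) as success, whereas with && the original file is left untouched when nothing matches.

```shell
# grep_in_place PATTERN FILE: keep only the lines of FILE that match PATTERN.
grep_in_place() {
    local pattern=$1 file=$2 tmp status
    # Temporary file in the same directory as the target, e.g. file.txt.Ab12Cd
    tmp=$(mktemp "${file}.XXXXXX") || return 2
    grep -e "$pattern" -- "$file" > "$tmp"
    status=$?
    # grep exits with 0 (lines selected), 1 (none selected) or 2 (error)
    if [ "$status" -le 1 ]; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
    fi
    return "$status"
}
```

One side effect to be aware of: the file created by mktemp normally has mode 0600, so the replaced file can end up with more restrictive permissions than the original had.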

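Try a pipeline with cat or tac (tac outputs the lines in reverse order):
cat file | grep 'searchpattern' > newfile
It's short and easy to type. The output still has to go to a different file, though: with > file, the shell truncates file before cat gets to read it.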

Related

What to do in order to create continuously numbered .txt files without replacing the already existing .txt files using bash

I am trying to write a bash script to create multiple .txt files.
With the code below I created the files, but when I run the script again I get the same files instead of more files with increasing numbers.
#! /bin/bash
for z in $(seq -w 1 10);
do
    [[ ! -f "${z}_name.txt" ]] && { touch "${z}_name.txt"; }
done

Based in part on work by Raman Sailopal in a now-deleted answer (and on comments I made about that answer, as well as comments I made about the question), you could use:
shopt -s nullglob
touch $(seq -f '%.0f_name.txt' \
            $(printf '%s\n' [0-9]*_name.txt |
              awk 'BEGIN { max = 0 }
                   { val = $0 + 0; if (val > max) max = val; }
                   END { print max + 1, max + 10 }'
             )
       )
The shopt -s nullglob command means that if there are no names that match the glob expression [0-9]*_name.txt, nothing will be generated in the arguments to the printf command.
The touch command is given a list of file names. The seq command formats a range of numbers using zero decimal places (so it formats them as integers) plus the rest of the name (_name.txt). The range is given by the output of printf … | awk …. The printf command lists file names that start with a digit and end with _name.txt, one per line. The awk command keeps track of the current maximum number; it coerces each name into a number (awk ignores everything after the leading digits) and checks whether the number is larger than before. At the end, it prints two values, the largest value plus 1 and the largest value plus 10 (defaulting to 1 and 10 if there were no files). Adding the -w option to seq is irrelevant when you specify -f and a format; the file names won't be generated with leading zeros. There are ways to deal with this if leading zeros are crucial; probably the simplest is to drop the -f option to seq, add the -w option, and pipe the output through sed 's/$/_name.txt/'.
You can squish the awk script onto a single line; you can squish the whole command onto a single line. However, it is arguably easier to see the organization of the command when they are spread over multiple lines.
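The zero-padded variant mentioned above (drop -f, add -w, pipe through sed) might look like this. It is a sketch that assumes bash (for shopt) and a seq that supports -w, as GNU and BSD seq do; next_padded_names is a hypothetical name, and it only prints the names so you can inspect them before handing them to touch.

```shell
# next_padded_names: print the next ten zero-padded NN_name.txt names.
# The body runs in a subshell, so shopt -s nullglob does not leak out.
next_padded_names() (
    shopt -s nullglob
    # "first last": one more than the highest existing number, and nine more
    range=$(printf '%s\n' [0-9]*_name.txt |
            awk 'BEGIN { max = 0 }
                 { val = $0 + 0; if (val > max) max = val }
                 END { print max + 1, max + 10 }')
    # $range is left unquoted on purpose so it splits into two seq arguments
    seq -w $range | sed 's/$/_name.txt/'
)
```

Usage: touch $(next_padded_names). The padding width follows the range, so seq -w 96 105 pads to three digits (096 … 105); padded names only stay uniform while the numbers keep the same width.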
Note that (apart from a possible TOCTOU — Time of Check, Time of Use — issue), there is no need to check whether the files exist. They don't; they'd have been listed by the glob [0-9]*_name.txt if they did, and the number would have been accounted for. If you want to ensure no damage to existing files, you'd need to use set -C or set -o noclobber and then create the files one by one using shell I/O redirection.
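Here is a rough sketch of that noclobber approach, assuming bash. create_next_files is a hypothetical name; its body runs in a subshell so that shopt -s nullglob and set -C do not leak into the calling shell. With set -C in effect, each : > name redirection fails instead of truncating a file that appeared after the glob was expanded.

```shell
# create_next_files [COUNT]: create COUNT (default 10) new N_name.txt files,
# numbered after the highest existing one, never overwriting anything.
create_next_files() (
    count=${1:-10}
    max=0
    shopt -s nullglob
    for f in [0-9]*_name.txt; do
        n=${f%%[!0-9]*}       # leading digits, e.g. 12 from 12_name.txt
        n=$((10#$n))          # base 10, so 08 is not rejected as a bad octal number
        if (( n > max )); then max=$n; fi
    done
    set -C                    # noclobber: > now refuses to overwrite an existing file
    for (( i = max + 1; i <= max + count; i++ )); do
        : > "${i}_name.txt" || exit 1
    done
)
```

Running create_next_files 10 twice in an empty directory leaves 1_name.txt through 20_name.txt behind.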
[…time passes…]
Actually, you can have awk do the file name generation instead of using seq at all:
touch $(printf '%s\n' [0-9]*_name.txt |
        awk 'BEGIN { max = 0 }
             { val = $0 + 0; if (val > max) max = val; }
             END { for (i = max + 1; i <= max + 10; i++)
                       printf "%d_name.txt\n", i
                 }'
       )
And, if you try a bit harder, you can get rid of the printf command too:
touch $(awk 'BEGIN { max = 0
                     for (i = 1; i < ARGC; i++)
                     {
                         val = ARGV[i] + 0;
                         if (val > max)
                             max = val
                     }
                     for (i = max + 1; i <= max + 10; i++)
                         printf "%d_name.txt\n", i
                   }' [0-9]*_name.txt
      )
Don't forget the shopt -s nullglob — that's still needed for maximum resiliency.
You might even choose to get rid of the separate touch command by having awk write to the files:
awk 'BEGIN { max = 0
             for (i = 0; i < ARGC; i++)
             {
                 val = ARGV[i] + 0;
                 if (val > max)
                     max = val
             }
             for (i = max + 1; i <= max + 10; i++)
             {
                 name = sprintf("%d_name.txt", i)
                 printf "" > name
             }
             exit
           }' [0-9]*_name.txt
Note the use of exit. Note that the POSIX specification for awk says that ARGC is the number of arguments in ARGV and that the elements in ARGV are indexed from 0 to ARGC - 1 — as in C programs.
There are few shell scripts that cannot be improved. The first version shown runs 4 commands; the last runs just one. That difference could be quite significant if there were many files to be processed.
Beware: eventually, the argument list generated by the glob will get too big; then you have to do more work. You might be obliged to filter the output from ls (with its attendant risks and dangers) and feed the output (the list of file names) into the awk script and process the lines of input once more. While your lists remain a few thousand files long, it probably won't be a problem.
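One way to do that extra work, sketched under the assumption that no file name contains a newline: let ls write the directory listing to a pipe (one name per line when its output is not a terminal), have awk pick out the numbered names and generate the new ones, and let xargs run touch in as many batches as the system's argument-length limit requires. next_names_from_ls is a hypothetical name.

```shell
# next_names_from_ls: print the next ten N_name.txt names without putting
# the existing names on any command line.
next_names_from_ls() {
    ls | awk 'BEGIN { max = 0 }
              /^[0-9]+_name\.txt$/ { val = $0 + 0; if (val > max) max = val }
              END { for (i = max + 1; i <= max + 10; i++) printf "%d_name.txt\n", i }'
}
```

Usage: next_names_from_ls | xargs touch. The regular expression anchors both ends, so names such as notes.txt or 3_name.txt.bak are ignored.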

How can I use bash to gather all NFS mount points with multiple configuration files to check that each mount is writeable?

I am trying to create a script that will dynamically find all NFS mount points that should be writable and check that they are still writable; however, I can't seem to get my head around connecting the mounts to their share directories.
So for example I have a server's /etc/auto.master like this (I've sanitized some of the data):
/etc/auto.master
/nfs1 /etc/auto.nfs1 --ghost
/nfs2 /etc/auto.nfs2 --ghost
And each of those files has:
/etc/auto.nfs1
home -rw,soft-intr -fstype=nfs server1:/shared/home
store -rw,soft-intr -fstype=nfs server2:/shared/store
/etc/auto.nfs2
data -rw,soft-intr -fstype=nfs oracleserver1:/shared/data
rman -rw,soft-intr -fstype=nfs oracleserver1:/shared/rman
What I'm trying to get out of that is
/nfs1/home
/nfs1/store
/nfs2/data
/nfs2/rman
without getting any erroneous or commented entries caught in the net.
My code attempt is this:
#!/bin/bash
for automst in `grep '^/' /etc/auto.master|awk -F" " '{for(i=1;i<=NF;i++){if ($i ~ /etc/){print $i}}}'`;
do echo $automst > /tmp/auto.mst
done
AUTOMST=`cat /tmp/auto.mst`
for mastermount in `grep '^/' /etc/auto.master|awk -F" " '{for(i=1;i<=NF;i++){if ($i ~ /etc/){print $i}}}'`;
do grep . $mastermount|grep -v '#'|awk {'print $1'};
done > /tmp/nfsmounteddirs
for dir in `cat /tmp/nfsmounteddirs`;
do
if [ -w "$dir" ]; then echo "$dir is writeable"; else echo "$dir is not writeable!";fi
done
I have 600 Linux servers, many with their own individual NFS setups, and we don't have an alerting solution in place that can check this. While having all those individual scripts would be "a" solution, it would be a nightmare to manage and a lot of work, so the dynamic aspect would be very useful.

awk '/^\// {                                  # Process lines that begin with a /
    fil = $2                                  # Track the map file name
    nfs = $1                                  # Track the NFS mount point
    while ((getline line < fil) > 0) {        # Read the file named by fil until the end
        split(line, map, ",")                 # Split the line into array map with , as the delimiter
        split(map[1], map1, /[[:space:]]+/)   # Further split map[1] into map1 with spaces as the delimiter
        if (map1[2] ~ /w/ && line !~ /^#/) {
            print nfs " " map1[1]             # If w is in the permissions and the line is not a comment, print the mount point and share
        }
    }
    close(fil)                                # Close the file after we have finished reading
}' /etc/auto.master
One liner:
awk '/^\// { fil=$2;nfs=$1;while ((getline line < fil) > 0) { split(line,map,",");split(map[1],map1,/[[:space:]]+/);if(map1[2]~/w/ && line !~ /^#/) { print nfs" "map1[1] } } close(fil) }' /etc/auto.master
Output:
/nfs1 home
/nfs1 store
/nfs2 data
/nfs2 rman
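The answer prints the mount point and key separated by a space, while the question asked for /nfs1/home and so on, followed by a writability check. Here is a sketch that joins the two, assuming the auto.master and map-file layout shown in the question (list_rw_automounts and check_rw_automounts are hypothetical names). It matches rw as a whole option rather than any w, so an entry such as -ro,wsize=8192 is not mistaken for a writable one.

```shell
# list_rw_automounts [MASTER]: print /mountpoint/key for each map entry
# whose options include rw. MASTER defaults to /etc/auto.master.
list_rw_automounts() {
    awk '/^\// {
        mnt = $1                                   # mount point, e.g. /nfs1
        map = $2                                   # map file, e.g. /etc/auto.nfs1
        while ((getline line < map) > 0) {
            if (line ~ /^[ \t]*#/) continue        # comment line
            n = split(line, f)                     # split on runs of blanks
            if (n < 2) continue                    # blank or malformed line
            # f[1] is the key, f[2] the option list, e.g. -rw,soft,intr
            if (f[2] ~ /(^|[-,])rw(,|$)/) print mnt "/" f[1]
        }
        close(map)
    }' "${1:-/etc/auto.master}"
}

# check_rw_automounts [MASTER]: report whether each directory is writable.
check_rw_automounts() {
    list_rw_automounts "$@" | while IFS= read -r dir; do
        if [ -w "$dir" ]; then echo "$dir is writeable"; else echo "$dir is not writeable!"; fi
    done
}
```

Keep in mind that [ -w ] only asks whether a write would be permitted; it does not write anything. Also, checking a path under autofs may trigger a mount, so an unresponsive NFS server can make the check hang.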

sed command issue with string replacement

I'm having a weird problem with the sed command.
I have a script that takes a C file, copies it X times, and then renames the functions inside each copy by adding a number to the name.
For example:
originalFile.c contains these functions: check0, check1, check2
The script will generate these files:
originalFile1.c: check0 check1 check2
originalFile2.c: check3 check4 check5
originalFile3.c: check6 check7 check8
... and so on.
Now the problem: if I generate enough files that the numbers go up to 10, 20 or more, some function names come out wrong. The first function in the file is renamed incorrectly, but the others are correct. For example:
originalFileX.c: check165 check16 check17 (the first name should be check15)
...
originalFileZ.c: check297 check298 check29 (in this file two names are incorrect; they should be check27 and check28)
Also, if I print the names with echo, everything is correct. Do you have any idea what could be wrong?
Here is my script (I run it under OSX):
#!/bin/bash
NUMCHECK=3
# $1: filename
# $2: number of functions in the file
# $3: number of functions I want to generate
# $4: function basename
function replace_name() {
    FILE_NUM=$((($3+($2-1))/$2))
    TMP=0
    for (( i=1; i<$FILE_NUM+1; i++ ))
    do
        cp $1.mm test/$1$i.mm
        for (( j=0; j<$2; j++ ))
        do
            OLDNAME="$4$j"
            NEWNAME="$4$TMP"
            echo $OLDNAME:$NEWNAME
            sed -i "" "s/$OLDNAME/$NEWNAME/g" test/$1$i.mm
            TMP=$(($TMP+1))
        done
    done
}
replace_name check $NUMCHECK 60 check

You're running sed three times on each file. Just imagine the following:
sed -i s/check0/check150/g test/check51.mm
sed -i s/check1/check151/g test/check51.mm
sed -i s/check2/check152/g test/check51.mm
The first command, s/check0/check150/g, changes check0 to check150, which is fine.
The second, s/check1/check151/g, changes the previous check150 to check15150, because it also finds the string check1 inside check150 from the previous step.
And so on.
You need to define your regex more precisely, so that it matches only whole names. Because there isn't any example input, I can't help more.

grep lines before and after in aix/ksh shell

I want to extract lines before and after a matched pattern.
For example, if the file contents are as follows:
absbasdakjkglksagjgj
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
klgdsgklsdgkldskgdsg
I need to find hello and display the line before and the line after 'hello'.
The output should be:
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
This is possible with GNU grep, but I need a method that works on AIX in the ksh shell, where no GNU tools are installed.

sed -n '/hello/{x;G;N;p;};h' filename
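If you would rather avoid the hold-space tricks, the same output (one line before and one line after each match) can be produced with nothing but POSIX awk, which AIX provides. This is a sketch; context_grep is a hypothetical name, the pattern is treated as an awk extended regular expression, and it is passed through the environment (ENVIRON) because awk -v would interpret backslash escapes in it.

```shell
# context_grep PATTERN [FILE...]: print each line matching PATTERN together
# with the line before it and the line after it.
context_grep() {
    # Side effect: CONTEXT_PAT stays exported in the calling shell.
    CONTEXT_PAT=$1
    export CONTEXT_PAT
    shift
    awk '
        BEGIN { last = 0 }                            # NR of the last printed line
        FNR == 1 { pending = 0 }                      # never carry context across files
        $0 ~ ENVIRON["CONTEXT_PAT"] {
            if (FNR > 1 && NR - 1 > last) print prev  # line before, unless already printed
            print
            last = NR; pending = 1; prev = $0
            next
        }
        pending { print; last = NR; pending = 0 }     # line after a match
        { prev = $0 }
    ' "$@"
}
```

For example, context_grep hello filename prints the three lines shown in the question, and adjacent matches do not produce duplicated lines.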

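I've found it is generally less frustrating to build the GNU tools once and benefit from their many extra features: http://www.gnu.org/software/coreutils/ (note that GNU grep, which has the -A and -B context options, is a separate package from coreutils).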

Since you'll have Perl on the machine, you could use the following code, but you'd probably do better to install the GNU utilities. It has options -b n1 for lines before and -f n2 for lines following the match. It works with Perl regular expressions, so if you want case-insensitive matching, put (?i) at the start of the pattern instead of using a -i option. I haven't implemented -v or -l; I didn't need those.
#!/usr/bin/env perl
#
# @(#)$Id: sgrep.pl,v 1.7 2013/01/28 02:07:18 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1  Print n1 lines before match
# -f n2  Print n2 lines following match
# -n     Print line numbers
# -h     Do not print file names
# -H     Do print file names
use warnings;
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
    print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
    exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless @ARGV >= 1;
if ($opts{h} && $opts{H})
{
    print STDERR "$0: mutually exclusive options -h and -H specified\n";
    exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar @ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after  = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my @lines = (); # Accumulated lines
my $tail  = 0;  # Line number of last line in list
my $tbp_1 = 0;  # First line to be printed
my $tbp_2 = 0;  # Last line to be printed
# Print lines from @lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
    my ($leave) = @_;
    while (scalar(@lines) > $leave)
    {
        my $line = shift @lines;
        my $curr = $tail - scalar(@lines);
        if ($tbp_1 <= $curr && $curr <= $tbp_2)
        {
            print "$ARGV:" if $opts{F};
            print "$curr:" if $opts{n};
            print $line;
        }
    }
}
# General logic:
# Accumulate each line at end of @lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
#    if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
    # Add this line to the accumulated lines
    push @lines, $_;
    $tail = $.;
    printf "# array: N = %d, last = $tail: %s", scalar(@lines), $_ if debug > 1;
    if (m/$op/o)
    {
        # This line matches - set range to be printed
        my $lo = $. - $before;
        $tbp_1 = $lo if ($lo > $tbp_2);
        $tbp_2 = $. + $after;
        print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
    }
    # Print out any accumulated lines that need printing
    # Leave $before lines in array.
    print_leaving($before);
}
continue
{
    if (eof)
    {
        # Print out any accumulated lines that need printing
        print_leaving(0);
        # Reset for next file
        close ARGV;
        $tbp_1 = 0;
        $tbp_2 = 0;
        $tail = 0;
        @lines = ();
    }
}

I had a situation where I was stuck with a slow telnet session on a tablet, believe it or not, and I couldn't write a Perl script very easily with that keyboard. I came up with this hacky maneuver that worked in a pinch for me with AIX's limited grep. This won't work well if your grep returns hundreds of lines, but if you just need one line and one or two above/below it, this could do it. First I ran this:
cat -n filename |grep criteria
By including the -n flag, I see the line number of the data I'm seeking, like this:
2543 my crucial data
Since cat gives the line number 2 spaces before and 1 space after, I could grep for the line number right before it like this:
cat -n filename |grep " 2542 "
I ran this a couple of times to get lines 2542 and 2544, which bookend line 2543. Like I said, it's definitely fallible (for example, if you have reams of data that might contain " 2542 " all over the place), but for grabbing a couple of quick lines it worked well.
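The same line-number idea can be made exact with standard tools, again as a sketch: grep -n reports the number of the matching line, and sed -n 'START,ENDp' prints just that range, so there is no need to search for the number inside the cat -n output. show_around is a hypothetical name; it handles only the first match and sticks to plain POSIX sh features (its variables n and start are global).

```shell
# show_around PATTERN FILE: print the first line matching PATTERN together
# with one line on each side of it.
show_around() {
    # Number of the first matching line (empty if there is none)
    n=$(grep -n -e "$1" "$2" | head -n 1 | cut -d: -f1)
    [ -n "$n" ] || return 1
    start=$((n - 1))
    [ "$start" -ge 1 ] || start=1    # there is no line before line 1
    # Print from the line before the match to the line after it
    sed -n "${start},$((n + 1))p" "$2"
}
```

For example, show_around 'my crucial data' filename prints lines 2542 to 2544 in the scenario above.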

SED command inside a loop

Hello. I have a lot of files called test-MR3000-1.nt to test-MR4000-1.nt, where the number in the name changes in steps of 100 (i.e. I have 11 files):
$ ls test-MR*
test-MR3000-1.nt test-MR3300-1.nt test-MR3600-1.nt test-MR3900-1.nt
test-MR3100-1.nt test-MR3400-1.nt test-MR3700-1.nt test-MR4000-1.nt
test-MR3200-1.nt test-MR3500-1.nt test-MR3800-1.nt
and also a file called resonancia.kumac, which contains the string XXXX on a couple of lines.
$ head resonancia.kumac
close 0
hist/delete 0
vect/delete *
h/file 1 test-MRXXXX-1.nt
sigma MR=XXXX
I want to run a bash script that substitutes the string XXXX in a file with each of a set of numbers obtained from the command ls *MR* | cut -b 8-11.
I found a post with some suggestions and tried my own code:
for i in `ls *MR* | cut -b 8-11`; do
sed -e "s/XXXX/$i/" resonancia.kumac >> proof.kumac
done
However, in the substitution the numbers are surrounded by single quotes (e.g. '3000').
Q: What should I do to avoid the single quote in the set of numbers? Thank you.

This is a reproducer for the environment described:
for ((i=3000; i<=4000; i+=100)); do
    touch test-MR${i}-1.nt
done
cat >resonancia.kumac <<'EOF'
close 0
hist/delete 0
vect/delete *
h/file 1 test-MRXXXX-1.nt
sigma MR=XXXX
EOF
This is a script which will run inside that environment:
content="$(<resonancia.kumac)"
for f in *MR*; do
    substring=${f:7:4}
    echo "${content//XXXX/$substring}"
done >proof.kumac
...and the start of proof.kumac looks like this:
close 0
hist/delete 0
vect/delete *
h/file 1 test-MR3000-1.nt
sigma MR=3000
There are no quotes anywhere in this output; the problem described is not reproduced.
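For comparison, here is a sed-based version of the loop in the question, sketched under the assumption that the file names follow the test-MRNNNN-1.nt pattern shown above. It takes each number from the file name with shell parameter expansion instead of parsing ls output with cut -b, so it does not depend on exact byte positions. make_proof is a hypothetical name; it writes one substituted copy of resonancia.kumac per data file into proof.kumac.

```shell
# make_proof: for every test-MRNNNN-1.nt file, append a copy of
# resonancia.kumac with XXXX replaced by NNNN to proof.kumac.
make_proof() {
    for f in test-MR*-1.nt; do
        [ -e "$f" ] || continue    # no match: the glob is left as literal text
        n=${f#test-MR}             # strip the prefix: 3000-1.nt
        n=${n%%-*}                 # strip from the first -: 3000
        sed "s/XXXX/$n/g" resonancia.kumac
    done > proof.kumac
}
```

The /g flag replaces every XXXX on a line, not just the first; with the sample file each line has only one, so the result is the same either way.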

Or, if Perl is an option:
#!/usr/bin/perl
@ls = glob('*MR*');
open (FILE, 'resonancia.kumac') || die("not good\n");
@cont = <FILE>;
$f = shift(@ls);
$f =~ /test-MR([0-9]*)-1\.nt/;
$nr = $1;
@out = ();
foreach $l (@cont){
    if($l =~ s/XXXX/$nr/){
        $f = shift(@ls);
        $f =~ /test-MR([0-9]*)-1\.nt/;
        $nr = $1;
    }
    push @out, $l;
}
close FILE;
open (FILE, '>resonancia.kumac') || die("not good\n");
print FILE @out;
close FILE;
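This replaces each successive XXXX (at most one per line) with the number from the next file name in turn, which seemed to be what the question asked before it was changed. Note that it overwrites resonancia.kumac in place.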
