How to print the result of the first part of the pipe? - linux

I have the following grep:
grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh
Which displays the string between PROGRAM( and ):
RECTONTER
Then, I need to know whether this extracted string is contained in a file, so:
grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh | xargs -I % grep -e % /home/leherad/pgm_currentdate
File content:
RECTONTER
CORASFE
RENTOASD
UBICARP
If it's found, this returns the matching line of /home/leherad/pgm_currentdate, but I want to print the line extracted by the first grep (RECTONTER). If it's not found, nothing should be returned.
Is there a simple way to do this, or should I not complicate things and instead write a script that saves the result of the first grep in a variable?

You can store it in a variable first:
read -r FIRST < <(exec grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh) && grep -e "$FIRST" /home/leherad/pgm_currentdate
Update 01
#!/bin/bash
shopt -s nullglob
for FILE in /home/programs/*; do
read -r FIRST < <(exec grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' "$FILE") && grep -e "$FIRST" /home/leherad/pgm_currentdate && echo "$FIRST"
done

I think a straightforward way to solve this is to use a function.
Also, your grep pattern will match shell comments, which could cause unexpected behavior in your xargs command when there is more than one match; you might want to take steps to only grab the first match. It's hard to say without actually seeing the input files, so I'm guessing this is either ok or comments are actually the expected place for your target pattern.
Anyway, here's my best guess at a function that would work for you.
get_program() {
local filename="$1"
local program="$( grep -m1 -Po '(?<=PROGRAM\()[^\)]+(?=\))' "$filename" )"
if grep -q -e "$program" /home/leherad/pgm_currentdate; then
echo "$program"
grep -e "$program" /home/leherad/pgm_currentdate
fi
}
get_program /home/programs/hello_word.sh
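A variation on the answers above, as a minimal sketch: it assumes the list file holds one program name per line and that GNU grep with -P support is available. The function name print_if_listed and the temporary sample files are made up for illustration. It prints the extracted name itself, and uses -F and -x so the name is compared as a literal, whole line rather than as a regular expression.

```shell
#!/bin/bash
# Sketch: print the extracted program name only when the list file
# contains it as a complete line. Paths below are temporary sample data.
print_if_listed() {
    local script=$1 list=$2 program
    # -m1 stops reading after the first matching line of the script
    program=$(grep -m1 -Po '(?<=PROGRAM\()[^)]+(?=\))' "$script") || return 1
    # -F: literal string, -x: whole line, -q: no output, exit status only
    if grep -qxF -e "$program" "$list"; then
        printf '%s\n' "$program"
    fi
}

tmp=$(mktemp -d)
printf 'PROGRAM(RECTONTER)\n' > "$tmp/hello_word.sh"
printf '%s\n' RECTONTER CORASFE RENTOASD UBICARP > "$tmp/pgm_currentdate"
found=$(print_if_listed "$tmp/hello_word.sh" "$tmp/pgm_currentdate")
echo "$found"
rm -rf "$tmp"
```

With the whole-line match, a list entry such as RECTONTER2 would not count as a hit for RECTONTER, which may or may not be what you want.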

Related

How to move files where the first line contains a string?

I am currently using the following command:
grep -l -Z -E '.*?FindMyRegex' /home/user/folder/*.csv | xargs -0 -I{} mv {} /home/destination/folder
This works fine. The problem is it uses grep on the entire file.
I would like to use the grep command on the FIRST line of the file only.
I have tried to use head -1 file | at the beginning, but it did not work.
A change I would add to your script is -
for file in *.csv; do
# grep -l on a pipe would print "(standard input)", so test the exit status instead
head -n 1 "$file" | grep -q -E '.*?FindMyRegex' && mv -- "$file" /home/destination/folder
done
You can maybe try sed '1q' file.csv | grep ... to search for the regexp only in the first line.
You don't need grep or find, as long as your files don't have embedded newlines.
I don't know an easy way off the top of my head to get sed to delimit with nulls.
mv $( for f in /home/user/folder/*.csv;
do sed -ns '1 { /yourPattern/F; q; }' "$f";
done ) /home/destination/folder/
EDIT
Rewrote with a loop. This will run a separate instance of sed to check each file, but at least it shouldn't read beyond the first line. It will fail syntactically if there are no hits.
You might need -E depending on your regex.
-n says don't print records from the files.
-s says treat each file as a distinct input - this is so the filenames aren't always the first one.
This does require GNU sed for the F.
gawk 'FNR==1{if($0~/PATTERN/)
printf "mv %s %s\n",FILENAME, "/target";nextfile}' /path/*.csv
First of all, in your regex .*?FindMyRegex the leading .*? doesn't accomplish anything and can be removed.
The above awk (gawk) one-liner builds mv file target command lines for you. Check them, and if you are satisfied with them, pipe the output to sh to execute the commands.
Replace PATTERN with your regex pattern, and /target with the real target directory.
The one-liner assumes that the filenames don't contain special characters (spaces, for example); if they do, add double quotes around the names in the mv command.
Using GNU awk to find the filenames, pipe the filenames into xargs:
gawk -v pattern="myRegex" '
FNR == 1 {if ($0 ~ pattern) printf "%s\0", FILENAME; nextfile}
' *.csv | xargs -0 echo mv -t destination
If it looks OK, remove "echo"
Try this Shellcheck-clean Bash code:
#! /bin/bash
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob # Globs match files whose names start with '.'
dest=/home/destination/folder
for file in *.csv ; do
head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" "$dest"
done
shopt -s nullglob prevents an error if there are no .csv files in the directory.
shopt -s dotglob ensures that files whose name starts with '.' are handled.
The -- in the options for head and mv ensures that files whose names begin with - are handled correctly.
The quotes in "$file" and "$dest" ensure that names that contain whitespace (actually $IFS) characters (including newlines) or glob metacharacters are handled correctly.
Note that the .*? in the regular expression is probably redundant, and may not do what you think it does (grep -E doesn't do non-greedy matching).
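The same idea can also be sketched with bash builtins only, reading the first line with read and testing it with [[ =~ ]] instead of running head and grep per file. This is an illustration under assumptions: the function name move_by_first_line and the temporary sample directories are hypothetical, and the regex is interpreted as a POSIX extended regular expression.

```shell
#!/bin/bash
# Sketch: move CSV files whose first line matches a regex, using only
# bash builtins to inspect that line. All paths are temporary sample data.
move_by_first_line() {
    local src=$1 dest=$2 regex=$3 file first
    for file in "$src"/*.csv; do
        [[ -f $file ]] || continue      # skips the literal pattern when nothing matches
        # read fails on a file without a trailing newline but still fills $first
        IFS= read -r first < "$file" || [[ -n $first ]] || continue
        if [[ $first =~ $regex ]]; then
            mv -- "$file" "$dest"/
        fi
    done
}

tmp=$(mktemp -d)
mkdir "$tmp/in" "$tmp/out"
printf 'FindMyRegex header\nrow\n' > "$tmp/in/a b.csv"
printf 'other header\nFindMyRegex\n' > "$tmp/in/c.csv"
move_by_first_line "$tmp/in" "$tmp/out" 'FindMyRegex'
moved=$(ls "$tmp/out")
left=$(ls "$tmp/in")
echo "moved: $moved; left: $left"
rm -rf "$tmp"
```

Only the file whose first line matches is moved; the second sample file contains the pattern on line 2 and stays in place.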

Reformatting name / content pairs from grep in a bash script

I'm attempting to create a bash script that will grep a single file for two separate pieces of data, and print them to stdout.
So far this is what I have:
#!/bin/sh
cd /my/filePath/to/directory
APP=`grep -r --include "inputs.conf" "\[" | grep -oP '^[^\/]+'`
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
for i in $APP
do
{cd /opt/splunk/etc/deployment-apps
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
echo -n "$i | $INPUT"}
done
echo "";
exit
Which gives me an output printing the entire output of the first command (which is about 200 lines), then a |, then the other results from the second command. I was thinking I could create an array to do this, however I'm still learning bash.
This is an output example from the command without piping to grep:
TA-XA6x-Server/local/inputs.conf:[perfmon://Processor]
There are 200+ of these in a single execution, and I was looking to have the format be printed as something like this
app="TA-XA6x-Server/local/inputs.conf:" | input="[perfmon://Processor]"
There are essentially two pieces of information I'm attempting to stitch together:
the file path to the file
the contents of the file itself (the input)
Here is an example of the file path:
/opt/splunk/etc/deployment-apps/TA-XA6x-Server/local/inputs.conf
and this is an example of the inputs.conf file contents:
[perfmon://TCPv4]
The easy, mostly-working-ish approach is something like this:
#!/bin/bash
while IFS=: read -r name content; do
printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --include "inputs.conf" "\[")
If you need to work reliably with all possible filenames (including names with colons or newlines) and have GNU grep available, consider the --null argument to grep and adjusting the read usage appropriately:
#!/bin/bash
while IFS= read -r -d '' name && IFS= read -r content; do
printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --null --include "inputs.conf" "\[")
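To see why the first loop above copes with the colons inside the input stanza, here is a small sketch using the sample line from the question. With two variable names, read puts the first colon-separated field in name and the entire remainder, colons included, in content.

```shell
#!/bin/bash
# Sketch: split one grep output line at the first colon only.
line='TA-XA6x-Server/local/inputs.conf:[perfmon://Processor]'
IFS=: read -r name content <<< "$line"
formatted=$(printf 'app="%s" | input="%s"' "$name" "$content")
echo "$formatted"
```

This is also why a colon in the file path itself would break the simple version: the split would then happen inside the path.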

How to get only filenames without Path by using grep

I have the following problem.
I'm doing a grep like:
$command = grep -r -i --include=*.cfg 'host{' /omd/sites/mesh/etc/icinga/conf.d/objects
I got the following output:
/omd/sites/mesh/etc/icinga/conf.d/objects/testsystem/test1.cfg:define host{
/omd/sites/mesh/etc/icinga/conf.d/objects/testsystem/test2.cfg:define host{
/omd/sites/mesh/etc/icinga/conf.d/objects/testsystem/test3.cfg:define host{
...
for all *.cfg files.
With exec($command,$array)
I passed the result in an array.
Is it possible to get only the filenames as the result of the grep command?
I have tried the following:
$Command= grep -l -H -r -i --include=*.cfg 'host{' /omd/sites/mesh/etc/icinga/conf.d/objects
but I got the same result.
I know that a similar topic exists on the forum (How can I use grep to show just filenames (no in-line matches) on linux?), but the solution doesn't work.
With "exec($Command,$result_array)" I try to get an array with the results.
The mentioned solutions all work, but I can't get a result array with exec().
Can anyone help me?
Yet another simpler solution:
grep -l whatever-you-want | xargs -L 1 basename
or you can avoid xargs and use command substitution instead, if you are not using an ancient version of the GNU coreutils:
basename -a $(grep -l whatever-you-want)
basename is the straightforward way to get a file name without its path (it is a standalone utility, not a bash feature). You may also be interested in dirname to get the path only.
GNU Coreutils basename documentation
Is it possible to get only the filenames as result of the grep command.
With grep you need the -l option to display only file names.
Using find ... -execdir grep ... \{} + you might prevent displaying the full path of the file (is this what you need?)
find /omd/sites/mesh/etc/icinga/conf.d/objects -name '*.cfg' \
-execdir grep -r -i -l 'host{' \{} +
In addition, concerning the second part of your question, to read the result of a command into an array, you have to use the syntax: IFS=$'\n' MYVAR=( $(cmd ...) )
In that particular case (I formatted as multiline statement in order to clearly show the structure of that expression -- of course you could write as a "one-liner"):
IFS=$'\n' MYVAR=(
$(
find objects -name '*.cfg' \
-execdir grep -r -i -l 'host{' \{} +
)
)
You then have access to the result in the array MYVAR as usual. While I was testing (3 matches in that particular case):
sh$ echo ${#MYVAR[@]}
3
sh$ echo ${MYVAR[0]}
./x y.cfg
sh$ echo ${MYVAR[1]}
./d.cfg
sh$ echo ${MYVAR[2]}
./e.cfg
# ...
This should work:
grep -r -i --include=*.cfg 'host{' /omd/sites/mesh/etc/icinga/conf.d/objects | \
awk '{print $1}' | sed -e 's|[^/]*/||g' -e 's|:define$||'
The awk portion finds the first field in it and the sed command trims off the path and the :define.
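As a further sketch on the shell side, assuming GNU grep and GNU sort (for -Z and -z): the matching paths are read NUL-separated into a bash array and the directory part is stripped with parameter expansion, so names containing spaces survive. The sample directory tree is temporary and made up for the demonstration.

```shell
#!/bin/bash
# Sketch: collect only the file names (no path) of matching files in an array.
tmp=$(mktemp -d)
mkdir -p "$tmp/objects/testsystem"
printf 'define host{\n}\n' > "$tmp/objects/testsystem/test1.cfg"
printf 'define host{\n}\n' > "$tmp/objects/testsystem/test 2.cfg"
printf 'define service{\n}\n' > "$tmp/objects/testsystem/test3.cfg"

names=()
while IFS= read -r -d '' path; do
    names+=("${path##*/}")      # strip everything up to the last slash
done < <(grep -rliZ --include='*.cfg' 'host{' "$tmp/objects" | LC_ALL=C sort -z)
printf '%s\n' "${names[@]}"
rm -rf "$tmp"
```

If the names are needed in PHP via exec(), the same grep -l output piped through basename should give one name per array element, as long as the names contain no newlines.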

linux copy files with first row containing genome to other directory

I have many files in a directory, and I want to copy the ones whose first line contains the word "genome" to a new folder. How should I do that? I can match those lines, but I do not know how to act on the file afterwards.
I built a bash structure like
for i in *
do
if [ sed -ne 'genome' $i ]
then cp $i OTHERDIR
fi
done
But it seems the if statement can do only very limited things and cannot have sed in it, unlike in other programming languages.
Try this:
for i in *
do
head -n 1 "$i" | grep -q "genome" && cp "$i" OTHERDIR
done
You can use head -1 to peek at the first line:
for i in *
do
fline=$(head -n 1 "$i")
# match "genome" anywhere in the first line, not only a line that equals it
case $fline in
*genome*) cp "$i" OTHERDIR ;;
esac
done
First of all, you seem to make no attempt to check only the first line. Your expression should use head to isolate the first line as mentioned by the other posters.
Anyway, you can accomplish this task with sed or grep inside the if statement. Inside [ ], the pipeline has to be wrapped in command substitution $( ) so that its output can be tested:
With sed: You don't need the -e; a single script argument is taken as the script by default.
[ -n "$(head -n 1 "$i" | sed -n '/genome/=')" ]
With grep: Probably the easier solution. sed is overpowered for this job.
[ -n "$(head -n 1 "$i" | grep genome)" ]
http://www.unix.com/unix-dummies-questions-answers/65705-how-grep-only-1st-line.html
Amazingly, I tried the following script myself and it works. I haven't checked the posted answers yet.
for i in *
do
grep -l "genome" $i | xargs -I one cp one DESTDIR
done
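On the point raised in the question about what if can contain: if runs any command or pipeline and branches on its exit status, so the brackets are not needed at all. A minimal sketch, using temporary sample files in place of the real directory:

```shell
#!/bin/bash
# Sketch: copy files whose first line contains "genome".
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/OTHERDIR"
printf 'genome build 38\nACGT\n' > "$tmp/src/one.txt"
printf 'header\ngenome\n' > "$tmp/src/two.txt"

for i in "$tmp/src"/*; do
    [ -f "$i" ] || continue
    # if runs the pipeline and branches on the exit status of grep
    if head -n 1 "$i" | grep -q 'genome'; then
        cp -- "$i" "$tmp/OTHERDIR"/
    fi
done
copied=$(ls "$tmp/OTHERDIR")
echo "$copied"
rm -rf "$tmp"
```

The second sample file has "genome" only on its second line, so it is not copied.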

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it is something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script....
However no one seems to have found a solution, so basically I'm asking you guys if you have ever come across this, and any idea/tips/solutions to this problem...
Regards
Paul
The `problem' is the standard I/O library. When it is writing to a terminal
it is line-buffered, but if it is writing to a pipe or a file it is fully buffered, so output can show up later than expected.
Try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
Are there any directories with spaces in their names in $FILE_LOCTN_LIST? Because if there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
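One way to rule out word splitting as the cause is to quote every expansion and read the list with while read instead of extracting line N with sed on each pass. The following is only a sketch under assumptions: the two sample files stand in for the basename list and the location list, and the lowercase variable names are illustrative rather than taken from the original script.

```shell
#!/bin/bash
# Sketch: count location-list matches for each duplicated basename.
tmp=$(mktemp -d)
printf '%s\n' 'my file.txt' 'my file.txt' 'other.txt' > "$tmp/basenames"
printf '%s\n' '/a/my file.txt' '/b/my file.txt' '/c/other.txt' > "$tmp/locations"

# uniq -d prints one copy of each repeated line (input must be sorted)
while IFS= read -r current_line; do
    [ -n "$current_line" ] || continue
    # -F: literal match; quoting keeps names with spaces as one argument
    matches=$(grep -cF -- "$current_line" "$tmp/locations")
    report="$current_line matched $matches times"
    echo "$report"
done < <(sort "$tmp/basenames" | uniq -d)
rm -rf "$tmp"
```

Unquoted, a name such as "my file.txt" would be split into a pattern "my" and an extra file argument "file.txt", which matches the symptom of grep finding nothing inside the script.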
A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT=""
md5sum * |
sort |
while read -r md5sum filename;
do
# identical files have identical checksums and end up adjacent after sort
[[ $CURRENT == "$md5sum" ]] && echo "$filename is duplicate";
CURRENT=$md5sum;
done
You tagged linux, so I assume you have tools like GNU find, md5sum, uniq, sort, etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; |sort -n | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1
