Reformatting name / content pairs from grep in a bash script - linux

I'm attempting to create a bash script that will grep a single file for two separate pieces of data, and print them to stdout.
So far this is what I have:
#!/bin/sh
cd /my/filePath/to/directory
APP=`grep -r --include "inputs.conf" "\[" | grep -oP '^[^\/]+'`
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
for i in $APP
do
{cd /opt/splunk/etc/deployment-apps
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
echo -n "$i | $INPUT"}
done
echo "";
exit
This prints the entire output of the first command (about 200 lines), then a single |, then the entire output of the second command. I was thinking I could create an array to pair them up, but I'm still learning bash.
This is an example of the output from the first command, without piping to the second grep:
TA-XA6x-Server/local/inputs.conf:[perfmon://Processor]
There are 200+ of these in a single execution, and I'd like the output printed in a format like this:
app="TA-XA6x-Server/local/inputs.conf:" | input="[perfmon://Processor]"
There are essentially two pieces of information I'm attempting to stitch together:
the file path to the file
the contents of the file itself (the input)
Here is an example of the file path:
/opt/splunk/etc/deployment-apps/TA-XA6x-Server/local/inputs.conf
and this is an example of the inputs.conf file contents:
[perfmon://TCPv4]

The easy, mostly-working approach is to let read split grep's filename:content output on the first colon:
#!/bin/bash
# grep -r prints "path:matched-line"; IFS=: splits on the first colon
while IFS=: read -r name content; do
    printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --include "inputs.conf" "\[")
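Given the sample line above, this should print something like (the separating colon itself is consumed by IFS=:):
app="TA-XA6x-Server/local/inputs.conf" | input="[perfmon://Processor]"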
If you need to work reliably with all possible filenames (including names containing colons or newlines) and have GNU grep available, use grep's --null option and adjust the read usage accordingly:
#!/bin/bash
# with --null, grep emits "filename NUL matched-line NEWLINE",
# so read the name up to the NUL, then the content up to the newline
while IFS= read -r -d '' name && IFS= read -r content; do
    printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --null --include "inputs.conf" "\[")

Related

Search, match and copy directories into another based on names in a txt file

My goal is to copy a bulk of specific directories whose names are listed in a text file, as follows:
$ cat names.txt
raw1
raw2
raw3
raw4
raw5
These directories have subdirectories, so it is important to copy all of their contents. When I list the current directory in my terminal, it looks like this:
$ ls -l
raw3
raw7
raw1
raw8
raw5
raw6
raw2
raw4
To perform this task, I have tried the following:
cat names.txt | while read line; do grep -l '$line' | xargs -r0 cp -t <desired_destination>; done
But I get this error:
cp: cannot stat No such file or directory
I suppose it's because the names in the list file (names.txt) aren't in the same order as the ones in the terminal. Notice that they are unsorted, and the while read line approach doesn't work. Thank you for taking the time to help me.
I have problems following the logic of the current code, so in the name of K.I.S.S. I propose:
tgtdir=/my/target/directory
while read -r srcdir
do
    # copy only entries that actually exist as directories
    [[ -d "${srcdir}" ]] && cp -rp "${srcdir}" "${tgtdir}"
done < <(tr -d '\r' < names.txt)
NOTES:
the < <(tr -d '\r' < names.txt) is used to remove Windows/DOS line endings from names.txt (per comments from the OP); if names.txt is updated to remove the \r characters, then the tr -d will be a no-op (i.e., a bit of overhead to spawn the subprocess, but the script will still read names.txt correctly)
assumes the script is run from the directory where the source directories reside; otherwise the code can be modified to either cd to that directory or preface the ${srcdir} references with it (see the sketch after these notes)
OP can add/modify the cp flags as needed, but I'm assuming at a minimum -r will be needed in order to recursively copy the directories
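For the latter case, a minimal variation (assuming a hypothetical source path /my/source/directory) would preface each name with that path:
srcbase=/my/source/directory   # hypothetical source path
tgtdir=/my/target/directory
while read -r srcdir
do
    [[ -d "${srcbase}/${srcdir}" ]] && cp -rp "${srcbase}/${srcdir}" "${tgtdir}"
done < <(tr -d '\r' < names.txt)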
UUoC (Useless Use of Cat).
cat names.txt | while read line; do ...; done
is better written
while read line; do ...; done < names.txt
The grep -l '$line' | in the middle of your loop is eating your input:
printf "%s\n" 1 2 3 |while read line; do echo "Read: [$line]"; grep . | cat; done
Read: [1]
2
3
In your case, grep is likely finding no lines that match the literal string $line, which you have embedded in single-quote marks; single quotes prevent the variable from being expanded. Use double quotes ("$line") instead, though even a match wouldn't be helpful here:
$: printf "%s\n" 1 2 3 | grep -l .
(standard input)
You didn't tell grep what to read from, so -l is pointless: it's reading the same stdin stream that read is.
I think what you want is a little simpler -
xargs cp -Rt /your/desired/target/directory/ < names.txt
Assuming you wanted to leave the originals where they were.
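If any of the directory names contain whitespace, GNU xargs can be told to split on newlines only; a minimal sketch, assuming GNU xargs:
xargs -d '\n' cp -Rt /your/desired/target/directory/ < names.txt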

Find files in different directories and operate on the filenames

$ ls /tmp/foo/
file1.txt file2.txt
$ ls /tmp/bar/
file20.txt
$ ls /tmp/foo/file*.txt | grep -o -E '[0-9]+' | sort -n | paste -s -d,
1,2
How do I fetch the numbers in the filenames from both directories? In the above example, I need to get 1,2,20. This is in the bash shell.
UPDATE:
$ ls /tmp/foo/file*.txt /tmp/bar/file*.txt /tmp/jaz99/file*.txt /tmp/nah/file*.txt | grep -o -E '[0-9]+' | sort -n | paste -s -d,
ls: cannot access /tmp/nah/file*.txt: No such file or directory
1,2,20,30,99
In this case, it should not print 99 (since it comes from the directory name jaz99, not from a filename matched by *), and it should not print the error when no file is found.
You can get this done using a loop with output of find:
s=
# run a loop using the find command in a process substitution
while IFS= read -d '' -r file; do
    file="${file##*/}"      # strip off all leading directory components
    s+="${file//[!0-9]/},"  # remove all non-numeric characters and append a comma
done < <(find /tmp/{foo,bar,nah,jaz99} -name '*.txt' -print0 2>/dev/null)
echo "${s%,}" # remove the trailing comma from the string
Output
1,2,20,30
Here's my take on this. Use arrays. No need to use external tools like sed or awk or find.
#!/usr/bin/env bash
shopt -s nullglob   # drop patterns with no matches (e.g. a missing /tmp/nah)
declare -a a=()
for f in /tmp/{foo,bar,nah}/file*.txt; do
    [[ $f =~ .*file([0-9]+).* ]]
    a+=( "${BASH_REMATCH[1]}" )
done
IFS=,
echo "${a[*]}"
The [[...]] expression populates the $BASH_REMATCH array with regex components. You can use that to extract the numbers and place them in a new temporary array, which you can express with comma separators using $IFS.
Results:
$ mkdir /tmp/foo /tmp/bar
$ touch /tmp/foo/file{1,2}.txt /tmp/bar/file20.txt
$ ./doit
1,2,20

How to print the result of the first part of the pipe?

I have the following grep:
grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh
Which displays the string between PROGRAM( and ):
RECTONTER
Then I need to know whether this extracted string is contained in a file, so:
grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh | xargs -I % grep -e % /home/leherad/pgm_currentdate
File content:
RECTONTER
CORASFE
RENTOASD
UBICARP
If it's found, this returns the matching line of /home/leherad/pgm_currentdate, but I want to print the string extracted by the first grep (RECTONTER) instead. If it's not found, it should print nothing.
Is there a simple way to do this, or should I not overcomplicate it and instead build a script that saves the first grep's result in a variable?
You can store it in a variable first:
read -r FIRST < <(exec grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' /home/programs/hello_word.sh) && grep -e "$FIRST" /home/leherad/pgm_currentdate
Update 01
#!/bin/bash
shopt -s nullglob
for FILE in /home/programs/*; do
    read -r FIRST < <(exec grep -Po '(?<=PROGRAM\()[^\)]+(?=\))' "$FILE") && grep -e "$FIRST" /home/leherad/pgm_currentdate && echo "$FIRST"
done
I think a straightforward way to solve this is to use a function.
Also, your grep pattern will match shell comments, which could cause unexpected behavior in your xargs command when there is more than one match; you might want to take steps to grab only the first match. It's hard to say without actually seeing the input files, so I'm guessing this is either OK or comments are actually the expected place for your target pattern.
Anyway, here's my best guess at a function that would work for you.
get_program() {
    local filename="$1"
    local program="$( grep -m1 -Po '(?<=PROGRAM\()[^\)]+(?=\))' "$filename" )"
    # print the extracted string only if it appears in the date file
    if grep -q -e "$program" /home/leherad/pgm_currentdate; then
        echo "$program"
        grep -e "$program" /home/leherad/pgm_currentdate
    fi
}
get_program /home/programs/hello_word.sh
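Given the sample files above, this should print the extracted string followed by the matching line from pgm_currentdate, i.e.:
RECTONTER
RECTONTER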

Ordering a loop in bash

I've a bash script like this:
for d in /home/test/*
do
echo $d
done
Which outputs this:
/home/test/newer dir
/home/test/oldest dir
I'd like to order the folders by creation time so that the 'oldest dir' directory appears first in the list. I've tried ls and tree variations to no avail.
For example,
for d in `ls -d -c -1 $PWD/*`
Returns:
/home/test/oldest
dir
/home/test/newer
dir
Very close, but it does not respect the space in the directory names. My question: how would I get oldest dir on top while supporting the whitespace?
ls -d -c "$PWD"/* | while IFS= read -r line
do echo "$line"
done
Another technique, kind of a Schwartzian transform:
stat -c $'%Z\t%n' /home/test/* | sort -n | cut -f2- |
while IFS= read -r filename; do
    # ... process "$filename" here, oldest first
    echo "$filename"
done
This solution is fragile with filenames containing newlines.
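A newline-proof variant is possible with GNU coreutils by carrying NUL-delimited records through the whole pipeline; a minimal sketch, assuming GNU stat, sort -z, and cut -z are available:
stat --printf '%Z\t%n\0' /home/test/* | sort -zn | cut -z -f2- |
while IFS= read -r -d '' filename; do
    echo "$filename"
done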

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it writes no output to the current dupechk file even though there are definitely matches for the current line in the file location list (running the command in a terminal works without issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and so far all I have found is that it has something to do with buffered and unbuffered output from other commands running before the grep command in the bash script...
However, no one seems to have found a solution, so I'm basically asking if you have ever come across this, and for any ideas/tips/solutions to this problem.
Regards
Paul
The 'problem' is the standard I/O library. When it is writing to a terminal it is unbuffered, but when it is writing to a pipe it sets up buffering.
try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
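If an upstream command's stdio buffering ever is the real culprit, GNU coreutils ships stdbuf to override it; a minimal sketch (the command names are placeholders, not the poster's code):
# force line-buffered stdout even when writing into a pipe
stdbuf -oL producer_command | consumer_command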
Are there any directories with spaces in their names in $FILE_LOCTN_LIST? Because if there are, those spaces will need to be escaped or quoted somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
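A minimal sketch of that find/xargs approach (the path is a placeholder):
# NUL-delimited pipeline, so paths containing spaces survive intact
find /some/dir -type f -print0 | xargs -0 grep "$CURRENT_LINE"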
A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT="" md5sum * |
sort |
while read md5sum filename;
do
[[ $CURRENT == $md5sum ]] && echo $filename is duplicate;
CURRENT=$md5sum;
done
You tagged linux, so I assume you have tools like GNU find, md5sum, uniq, sort, etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; | sort -n | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1
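To print just the duplicate file paths, the 32-character hash plus the two separator spaces can be stripped (assuming md5sum's standard output format):
$ find . -type f -exec md5sum "{}" \; | sort -n | uniq -w32 -D | cut -c35-
./file
./file1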
