Using a bash script, I'm trying to iterate through a text file that only has around 700 words, line-by-line, and run a case-insensitive grep search in the current directory using that word on particular files. To break it down, I'm trying to output the following to a file:
Append a newline to a file, then the searched word, then another newline
Append the results of the grep command using that search
Repeat steps 1 and 2 until all words in the list are exhausted
So for example, if I had this list.txt:
search1
search2
I'd want the results.txt to be:
search1:
grep result here
search2:
grep result here
I've found some answers throughout the stack exchanges on how to do this and have come up with the following implementation:
#!/usr/bin/bash
while IFS = read -r line;
do
"\n$line:\n" >> "results.txt";
grep -i "$line" *.in >> "results.txt";
done < "list.txt"
For some reason, however, this (and the numerous variants I've tried) isn't working. Seems trivial, but it's been frustrating me beyond belief. Any help is appreciated.
Your script would work if you changed it to:
while IFS= read -r line; do
printf '\n%s:\n' "$line"
grep -i "$line" *.in
done < list.txt > results.txt
but it'd be extremely slow. See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why you should think long and hard before writing a shell loop just to manipulate text. The standard UNIX tool for manipulating text is awk:
awk '
NR==FNR { words2matches[$0]; next }
{
for (word in words2matches) {
if ( index(tolower($0),tolower(word)) ) {
words2matches[word] = words2matches[word] $0 ORS
}
}
}
END {
for (word in words2matches) {
print word ":" ORS words2matches[word]
}
}
' list.txt *.in > results.txt
The above is untested of course since you didn't provide sample input/output we could test against.
Possible problems:
bash path - use /bin/bash in the shebang; /usr/bin/bash does not exist on every system
blank spaces - remove the spaces around '=' so that it reads IFS= (with spaces, bash tries to run IFS as a command)
missing echo - the line "\n$line:\n" has no command in front of it, so bash tries to execute the string itself
echo - use the -e option so that escape sequences (here: '\n') are interpreted
semicolons - not required at the end of a line
Try following script:
#!/bin/bash
while IFS= read -r line; do
echo -e "$line:\n" >> "results.txt"
grep -i "$line" *.in >> "results.txt"
done < "list.txt"
You do not even need to write a bash script for this purpose:
INPUT FILES:
$ more file?.in
::::::::::::::
file1.in
::::::::::::::
abc
search1
def
search3
::::::::::::::
file2.in
::::::::::::::
search2
search1
abc
def
::::::::::::::
file3.in
::::::::::::::
abc
search1
search2
def
search3
PATTERN FILE:
$ more patterns
search1
search2
search3
CMD:
$ grep -inf patterns file*.in | sort -t':' -k3 | awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}'
OUTPUT:
search1
file1.in:2
file2.in:2
file3.in:2
search2
file2.in:1
file3.in:3
search3
file1.in:4
file3.in:5
EXPLANATIONS:
grep -inf patterns file*.in greps all the file*.in files for all the patterns in the patterns file (-f); -i makes the matching case-insensitive and -n prefixes each match with its line number
sort -t':' -k3 sorts the output on the 3rd column so that matches for the same pattern are grouped together
awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}' then prints the display that you want, using : as both field separator and output field separator; the buffer variable saves the pattern (3rd field), so the pattern header is printed only when it changes ($3!=buffer)
Related
Kindly help me with a Unix script to modify the filenames into the required format, as shown below:
AN_555a_orange_20190513.txt
AN_555b_apple_20190513.txt
Required format: the fruit name's first character should be capitalized, and the fruit name should be moved to the second position:
AN_Orange_555a_20190513.txt
AN_Apple_555a_20190513.txt
And it should apply to all files present in the directory.
Below is the command I'm trying, which is not working:
for in in aaal*
do
out=${in#*_}
out=${out%_*_*_*}
out=${out%[0-9]}
out1=${out#*_}
out2=${out%_*}
AAAI_$out1$out2.txt
done
This script is simple, but worked with your sample:
#!/bin/bash
for i in AN*; do
NAME=$(echo $i | awk -F_ '{printf "%s_%s%s_%s_%s", $1,toupper( substr( $3,1,1)),(substr($3,2,100)),$2,$4,$5}')
echo "--> $NAME"
done
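Once the printed names look right, you would presumably add the actual rename; a minimal, hedged extension of the same loop (assuming the files really follow the AN_<id>_<fruit>_<date>.txt pattern from the question):
#!/bin/bash
for i in AN*; do
    NAME=$(echo "$i" | awk -F_ '{printf "%s_%s%s_%s_%s", $1, toupper(substr($3,1,1)), substr($3,2), $2, $4}')
    echo "--> $NAME"
    mv -- "$i" "$NAME"    # perform the rename once the printed name looks correct
done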
An interesting solution for this case is to use sed, just like this:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/e'
Note the final e at the end of the sed command. It tells sed (this flag is a GNU sed extension) to execute the result of the substitution as a shell command.
So if you remove the e (which you could do at first, to check the substitution works as expected), you would get in the console:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/'
mv "AN_555a_orange_20190513.txt" "AN_Orange_555a_20190513.txt"
mv "AN_555b_apple_20190513.txt" "AN_Apple_555b_20190513.txt"
(The sed substitution matches the several groups of characters, reorders them and creates the mv ... ... line. Note that & in the replacement pattern denotes the whole pattern matched, and \u tells sed to put the next character as upper case.)
Then add back that final e, and instead of printing these lines sed will execute them, effectively renaming the files.
This one-liner could give you more ideas:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt)
This will print something like:
mv AN_555a_orange_20190513.txt AN_Orange_555a_20190513.txt
mv AN_555b_apple_20190513.txt AN_Apple_555b_20190513.txt
Then if you are happy with the results, pipe it to sh, for example:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt) | sh
We are trying to execute the script below to find the occurrences of particular words in a log file.
Need suggestions to optimize the script.
Test.log size - approx. 500 to 600 MB
$ wc -l Test.log
16609852 Test.log
po_numbers - 11k to 12k POs to search
$more po_numbers
xxx1335
AB1085
SSS6205
UY3347
OP9111
....and so on
Current Execution Time - 2.45 hrs
while IFS= read -r po
do
check=$(grep -c "PO_NUMBER=$po" Test.log)
echo $po "-->" $check >>list3
if [ "$check" = "0" ]
then
echo $po >>po_to_server
#else break
fi
done < po_numbers
You are reading your big file too many times when you execute
grep -c "PO_NUMBER=$po" Test.log
You can try to split your big file into smaller ones, or write all your patterns to a file and make grep use it:
echo "PO_NUMBER=$po" >> patterns.txt
(Do not append an extra blank line here: an empty line in a pattern file matches every line of Test.log.)
then
grep -f patterns.txt Test.log
$ grep -Fwf <(sed 's/.*/PO_NUMBER=&/' po_numbers) Test.log
This creates the lookup patterns from po_numbers on the fly (process substitution) and checks for literal, whole-word matches in the log file. It assumes the searched PO_NUMBER=xxx is a separate word; if not, remove -w. It also assumes the POs are literal strings rather than regexes; if not, remove -F. Note that removing either option will slow the search down.
Using grep:
sed -e 's|^|PO_NUMBER=|' po_numbers | grep -o -F -f - Test.log | sed -e 's|^PO_NUMBER=||' | sort | uniq -c > list3
grep -o -F -f po_numbers list3 | grep -v -o -F -f - po_numbers > po_to_server
Using awk:
This awk program might work faster
awk '(NR==FNR){ po[$0]=0; next }
{ for(key in po) {
str=$0
po[key]+=gsub("PO_NUMBER="key,"",str)
}
}
END {
for(key in po) {
if (po[key]==0) {print key >> "po_to_server" }
else {print key"-->"po[key] >> "list3" }
}
}' po_numbers Test.log
This does the following:
The first line loads the PO keys from the file po_numbers.
The second block parses Test.log for occurrences of PO_NUMBER=key on each line (gsub is a function which performs a substitution and returns the substitution count).
In the END block we print the requested output to the requested files.
The assumption here is that multiple patterns might occur, possibly multiple times, on a single line of Test.log.
Comment: the original order of po_numbers will not be preserved.
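If that order matters, one possible variation (an untested sketch along the same lines) is to also record the order in which the POs were read and iterate over that array in the END block instead of using for (key in po):
awk '(NR==FNR){ po[$0]=0; order[++n]=$0; next }   # also remember the input order
{ for(key in po) {
    str=$0
    po[key]+=gsub("PO_NUMBER="key,"",str)
  }
}
END {
  for(i=1; i<=n; i++) {                           # iterate in the original order
    key=order[i]
    if (po[key]==0) {print key >> "po_to_server" }
    else {print key"-->"po[key] >> "list3" }
  }
}' po_numbers Test.log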
"finding out the occurrence"
Not sure if you mean to count the number of occurrences for each searched word or to output the lines in the log that contain at least one of the searched words. This is how you could solve it in the latter case:
(cat po_numbers; echo GO; cat Test.log) | \
perl -nle'$r?/$r/&&print:/GO/?($r=qr/@{[join"|",@s]}/):push@s,$_'
This question already has answers here:
Difference between single and double quotes in Bash
I need help with replacing strings in a file, where the "from" and "to" strings come from a given file.
fromto.txt:
"TRAVEL","TRAVEL_CHANNEL"
"TRAVEL HD","TRAVEL_HD_CHANNEL"
"FROM","TO"
The first column is what I'm searching for, and it is to be replaced with the second column.
So far I wrote this small script:
while read p; do
var1=`echo "$p" | awk -F',' '{print $1}'`
var2=`echo "$p" | awk -F',' '{print $2}'`
echo "$var1" "AND" "$var2"
sed -i -e 's/$var1/$var2/g' test.txt
done <fromto.txt
Output looks good (x AND y), but for some reason it does not replace the first column ($var1) with the second ($var2).
test.txt:
"TRAVEL"
Output:
"TRAVEL" AND "TRAVEL_CHANNEL"
sed -i -e 's/"TRAVEL"/"TRAVEL_CHANNEL"/g' test.txt
"TRAVEL HD" AND "TRAVEL_HD_CHANNEL"
sed -i -e 's/"TRAVEL HD"/"TRAVEL_HD_CHANNEL"/g' test.txt
"FROM" AND "TO"
sed -i -e 's/"FROM"/"TO"/g' test.txt
$ cat test.txt
"TRAVEL"
input:
➜ cat fromto
TRAVEL TRAVEL_CHANNEL
TRAVELHD TRAVEL_HD
➜ cat inputFile
TRAVEL
TRAVELHD
The work:
➜ awk 'BEGIN{while(getline < "fromto") {from[$1] = $2}} {for (key in from) {gsub(key,from[key])} print}' inputFile > output
and output:
➜ cat output
TRAVEL_CHANNEL
TRAVEL_CHANNEL_HD
➜
This first (BEGIN{}) block loads your input file into an associative array: from["TRAVEL"] = "TRAVEL_CHANNEL", then rather inefficiently performs search and replace line by line for each array element in the input file, outputting the results, which I piped to a separate output file.
The caveat, you'll notice, is that the searches and replaces can interfere with each other; the 2nd line of output is a perfect example: after TRAVELHD was replaced with TRAVEL_HD, the TRAVEL rule also matched the result, giving TRAVEL_CHANNEL_HD. You can try ordering your replacements differently, or use a regex instead of a gsub. I'm not certain that awk arrays are guaranteed to have a particular order, though. Something to get you started, anyway.
2nd caveat. There's a way to do the gsub for the whole file as the 2nd step of your BEGIN and probably make this much faster, but I'm not sure what it is.
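One way that idea could look (an untested sketch: read both files inside BEGIN with getline, then apply each gsub once to the whole text instead of once per line; the same ordering/interference caveat still applies):
awk 'BEGIN{
  while ((getline line < "fromto") > 0) { split(line, a, " "); from[a[1]] = a[2] }
  while ((getline line < "inputFile") > 0) { text = text line ORS }   # slurp the whole file
  for (key in from) { gsub(key, from[key], text) }                    # one gsub per mapping
  printf "%s", text
}' > output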
You can't do this as a one-shot command; you have to use variables within a script.
Maybe something like the sed command below for a full replacement:
-bash-4.4$ cat > toto.txt
1
2
3
-bash-4.4$ cat > titi.txt
a
b
c
-bash-4.4$ sed 's|^\s*\(\S*\)\s*\(.*\)$|/^\2\\>/s//\1/|' toto.txt | sed -f - titi.txt > toto.txt
-bash-4.4$ cat toto.txt
a
b
c
-bash-4.4$
I have a file which contains file list. The file looks like this
$ cat filelist
D src/layouts/PersonAccount-Person Account Layout.layout
D src/objects/Case Account-Record List.object
I want to cut the first two columns and print only the file names along with their directory path. This list is dynamic. The file names have spaces in them, so I can't use space as a delimiter. How can I do this with an AWK command?
The output should be like this
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Can you try this once:
bash-4.4$ cat filelist |awk '{$1="";print $0}'
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Or, if you want to remove 2 columns, it would be:
awk '{$1=$2="";print $0}'
This will produce the below output:
bash-4.4$ cat filelist |awk '{$1=$2="";print $0}'
Account Layout.layout
Account-Record List.object
Try this out:
awk -F" " '{$1=""; print $0}' filelist | sed 's/^ //c'
Here sed is used to remove the first space of the output line.
print only file names with along directory path
awk approach:
awk '{ sub(/^[[:space:]]*[^[:space:]][[:space:]]+/,"",$0) }1' filelist
The output:
src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
----------
To extract only basename of the file:
awk -F'/' '{print $NF}' filelist
The output:
PersonAccount-Person Account Layout.layout
Case Account-Record List.object
This will do exactly what you want for your example :
sed -E 's/(.*)([ ][a-zA-Z0-9]+\/[a-zA-Z0-9]+\/[a-zA-Z0-9. -]+)/\2/g' filelist
Explanation :
It's matching your path (including spaces, if there are any) and then replacing the whole line with that one match. Easy peasy lemon squeezy :)
Regards!
A simple grep
grep -o '[^[:blank:]]*/.*' filelist
That's zero or more non-blank characters followed by a slash followed by the rest of the string.
This will not match any lines that don't have a slash
Here is a portable POSIX shell solution:
#!/bin/sh
cat "$#" |while read line; do
echo "${line#* * }"
done
This loops over each line of the given input file(s) (or else standard input) and prints the line without the first two spaces or the text that exists before them. It is not greedy.
Unlike some of the other answers here, this will preserve spacing (if any) in the rest of the line.
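For illustration, here is how the non-greedy # form compares with the greedy ## form on a hypothetical line with two leading columns (which is what the expansion assumes):
$ line='D M src/objects/Case Account-Record List.object'
$ echo "${line#* * }"      # shortest prefix matching '* * ' is removed
src/objects/Case Account-Record List.object
$ echo "${line##* * }"     # the greedy form would strip up to the last space
List.object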
If you want that as a one-liner:
while read L; do echo "${L#* * }"; done < filelist
This will fail if the uppermost directory's name starts with a space. To work around that, you need to peel away the leading ten characters (which I assume are static):
#!/bin/sh
cat "$#" |while read line; do
echo "${line#??????????}"
done
As a one-liner, in bash, this can be simplified by using substrings:
while read L; do echo "${L:10}"; done < filelist
I'm using the following shell script to find the contents of one file into another:
#!/bin/ksh
file="/home/nimish/contents.txt"
while read -r line; do
grep $line /home/nimish/another_file.csv
done < "$file"
I'm executing the script, but it is not displaying the contents from the CSV file. My contents.txt file contains numbers such as "08915673" or "123223", which are present in the CSV file as well. Is there anything wrong with what I'm doing?
grep itself is able to do so. Simply use the flag -f:
grep -f <patterns> <file>
<patterns> is a file containing one pattern in each line; and <file> is the file in which you want to search things.
Note that, to force grep to consider each line a pattern, even if the contents of each line look like a regular expression, you should use the flag -F, --fixed-strings.
grep -F -f <patterns> <file>
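A small, hypothetical illustration of why -F matters when a pattern line contains regex metacharacters:
$ printf '1.23\n1x23\n' > data.txt          # hypothetical sample files
$ printf '1.23\n' > patterns.txt
$ grep -f patterns.txt data.txt             # '.' acts as a regex wildcard: both lines match
1.23
1x23
$ grep -F -f patterns.txt data.txt          # fixed strings: only the literal '1.23' matches
1.23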
If your file is a CSV, as you said, you may do:
grep -f <(tr ',' '\n' < data.csv) <file>
As an example, consider the file "a.txt", with the following lines:
alpha
0891234
beta
Now, the file "b.txt", with the lines:
Alpha
0808080
0891234
bEtA
The output of the following command is:
grep -f "a.txt" "b.txt"
0891234
You don't need a for-loop here at all; grep itself offers this feature.
Now using your file names:
#!/bin/bash
patterns="/home/nimish/contents.txt"
search="/home/nimish/another_file.csv"
grep -f <(tr ',' '\n' < "${patterns}") "${search}"
You may change ',' to the separator you have in your file.
Another solution:
Use awk and create your own hash (here named hash), all controlled by yourself.
Replace $0 with $i and you can match any field you want.
awk -F"," '
{
if (nowfile==""){ nowfile = FILENAME; }
if(FILENAME == nowfile)
{
hash[$0]=$0;
}
else
{
if($0 ~ hash[$0])
{
print $0
}
}
} ' xx yy
I don't think you really need a script to perform what you're trying to do.
One command is enough. In my case, I needed an identification number in column 11 in a CSV file (with ";" as separator):
grep -f <(awk -F";" '{print $11}' FILE_TO_EXTRACT_PATTERNS_FROM.csv) TARGET_FILE.csv