How to move files using the result of a grep command as a condition - linux

I have 2 files whose names I need to grep for in a separate file.
The two files are in the directory /var/list:
TB.1234.txt
TB.135325.txt
I have to grep for them in another file in another directory, /var/sup/. I used the command below:
for i in TB.*; do grep "$i" /var/sup/logs.txt; done
What I want to do is: if the result of the grep command contains the word "ERROR", the file found in /var/list should be moved to another directory, /var/last.
For example, I grep for the file name TB.1234.txt in /var/sup/logs.txt and the result is like this:
ERROR: TB.1234.txt
Then TB.1234.txt should be moved to /var/last.
Please help. I don't know how to construct the logic for moving the files; I'm stuck with what I have provided. I also tried using two greps in a for loop, but I encountered an error.
I am new to coding and really appreciate any help and suggestions. Thank you so much.

If you are asking how to move files which contain "ERROR", this should be extremely straightforward.
for file in TB.*; do
    grep -q 'ERROR' "$file" &&
        mv "$file" /var/last/
done
The notation this && that is a convenient shorthand for
if this; then
    that
fi
The -q option to grep says to not print the matches, and to quit as soon as it finds one. Like all well-defined commands, grep sets its exit code to reflect whether it succeeded. (The status is visible in $?, but usually you would not examine it directly; perhaps see also Why is testing "$?" to see if a command succeeded or not, an anti-pattern?)
Your question is rather unclear, but if you want to find either of the matching files in a third file, perhaps something like
awk 'FNR==1 && (n+1 < ARGC-1) { a[++n] = FILENAME; nextfile }
    /ERROR/ { for(j=1; j<=n; ++j) if ($0 ~ a[j]) b[a[j]]++ }
    END { for(f in b) print f }' TB*.txt /var/sup/logs.txt |
xargs -r mv -t /var/last/
This is somewhat inefficient in that it will read all the lines in the log file, and brittle in that it will only handle file names which do not contain newlines. (The latter restriction is probably unimportant here, as you are looking for file names which occur on the same line as the string "ERROR" in the first place.)
In some more detail, the Awk script collects the wildcard matches into the array a, then processes all lines in the last file, looking for ones with "ERROR" in them. On these lines, it checks whether any of the file names in a also occur, and if so, records them in b. When all lines have been processed, the entries in b are printed and piped to a simple shell command that moves the files.
xargs is a neat command to read some arguments from standard input, and run another command with those arguments added to its command line. The -r option says to not run the other command if there are no arguments.
(mv -t is a GNU extension; it's convenient, but not crucial to have here. If you need portable code, you could replace xargs with a simple while read -r loop.)
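For instance, a portable variant of the same pipeline might look like this (a sketch, assuming as noted that the file names contain no newlines or surrounding whitespace):
awk 'FNR==1 && (n+1 < ARGC-1) { a[++n] = FILENAME; nextfile }
    /ERROR/ { for(j=1; j<=n; ++j) if ($0 ~ a[j]) b[a[j]]++ }
    END { for(f in b) print f }' TB*.txt /var/sup/logs.txt |
while read -r f; do
    mv "$f" /var/last/
done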
The FNR==1 condition requires that the input files are non-empty.
If the text file is small, or you expect a match near its beginning most of the time, perhaps just live with grepping it multiple times:
for file in TB.*; do
    grep -Eq "ERROR.*$file|$file.*ERROR" /var/sup/logs.txt &&
        mv "$file" /var/last/
done
Notice how we now need double quotes, not single, around the regular expression so that the variable $file gets substituted in the string.

grep has an -l switch, which shows only the name of each file that contains a match. It should not be too difficult to write something like the following (this is pseudocode; it won't work, it's just to give you an idea):
if $(grep -l "ERROR" <directory> | wc -l) > 0
then foreach (f in $(grep -l "ERROR")
do cp f <destination>
end if
The wc -l is there to check whether any files contain the word "ERROR" at all. If not, nothing needs to be done.
Edit after Tripleee's comment:
My proposal can be simplified as:
if grep -lq "ERROR" TB.*;
then foreach (f in $(grep -l "ERROR")
do cp f <destination>
end if
Edit after Tripleee's second comment:
This is even shorter:
for f in $(grep -l "ERROR" TB.*);
do cp "$f" destination;
done
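If the matching file names could contain whitespace, the unquoted command substitution above would split them apart; GNU grep and xargs can pass the names NUL-separated instead (a sketch; -Z, -0 and cp -t are GNU extensions):
grep -lZ "ERROR" TB.* | xargs -r -0 cp -t destination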

Related

Bash script that counts and prints out the files that start with a specific letter

How do I print out all the files in the current directory that start with the letter "k"? I also need to count these files.
I tried some methods but I only got errors or wrong outputs. I'm really stuck on this as a newbie in bash.
Try this Shellcheck-clean pure POSIX shell code:
count=0
for file in k*; do
    if [ -f "$file" ]; then
        printf '%s\n' "$file"
        count=$((count+1))
    fi
done
printf 'count=%d\n' "$count"
It works correctly (just prints count=0) when run in a directory that contains nothing starting with 'k'.
It doesn't count directories or other non-files (e.g. fifos).
It counts symlinks to files, but not broken symlinks or symlinks to non-files.
It works with 'bash' and 'dash', and should work with any POSIX-compliant shell.
Here is a pure Bash solution.
files=(k*)
printf "%s\n" "${files[@]}"
echo "${#files[@]} files total"
The shell expands the wildcard k* into the array, thus populating it with a list of matching files. We then print out the array's elements, and their count.
The use of an array avoids the various problems with metacharacters in file names (see e.g. https://mywiki.wooledge.org/BashFAQ/020), though the syntax is slightly hard on the eyes.
As remarked by pjh, this will include any matching directories in the count, and fail in odd ways if there are no matches (unless you set nullglob to true). If avoiding directories is important, you basically have to get the directories into a separate array and exclude those.
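A sketch of that idea, assuming Bash with nullglob set (k*/ also matches symlinks to directories, so treat the count as an approximation):
shopt -s nullglob
files=(k*)    # all matches, including directories
dirs=(k*/)    # a trailing slash restricts the glob to directories
printf '%s\n' "${files[@]}"
echo "$(( ${#files[@]} - ${#dirs[@]} )) non-directories total"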
To repeat what Dominique also said, avoid parsing ls output.
Demo of this and various other candidate solutions:
https://ideone.com/XxwTxB
To start with: never parse the output of the ls command, but use find instead.
As find by default descends into all subdirectories, you might need to limit that, using the -maxdepth switch with a value of 1.
In order to count the number of results, you just count the number of lines in the output, which works because find prints one result per line. Counting lines is done using the wc -l command.
So, this comes down to the following command:
find ./ -maxdepth 1 -type f -name "k*" | wc -l
Have fun!
This should work as well:
VAR="k"
COUNT=$(ls -p ${VAR}* | grep -v ":" | wc -w)
echo -e "Total number of files: ${COUNT}\n" 1>&2
echo -e "Files,that begin with ${VAR} are:\n$(ls -p ${VAR}* | grep -v ":" )" 1>&2

Pass command-line arguments to grep as search patterns and print lines which match them all

I'm learning about grep commands.
I want to make a program that, when a user enters more than one word, outputs the lines in a data file that contain all of those words.
So I joined the words the user typed with '|' and put them into the grep command, which gave me a working program.
But that is an OR operation. I want an AND operation.
I learned how to do an AND operation with grep commands, as follows:
cat <file> | grep 'pattern1' | grep 'pattern2' | grep 'pattern3'
But I don't know how to put the user input into the 'pattern1', 'pattern2', 'pattern3' positions, because the number of words the user enters is not fixed.
As the user input grows, grep must be executed through more and more pipes, and I don't know how to build this part.
The user input looks like this:
$ [the name of my program] 'pattern1' 'pattern2' 'pattern3' ...
I'd really appreciate your help.
With grep -f you can grep for multiple patterns, when each of them is on its own line in a file.
With <(command) you can let Bash treat the output of command as a file.
With printf "%s\n" and a list of arguments, each argument is printed on its own line.
Together:
grep -f <(printf "%s\n" "$@") datafile
(Note that this prints lines matching any of the given patterns; for lines that must match all of them, see the approaches below.)
Suggesting to use awk pattern logic:
awk '/RegExp-pattern-1/ && /RegExp-pattern-2/ && /RegExp-pattern-3/' input.txt
The advantages: you can combine RegExp patterns with the logical operators && and ||, and you scan the whole file only once.
The disadvantages: you must provide the list of files yourself (awk can't traverse subdirectories), and the RegExp syntax is limited compared to grep -E or grep -P.
In principle, what you are asking could be done with a loop with output to a temporary file.
file=inputfile
temp=$(mktemp -d -t multigrep.XXXXXXXXX) || exit
trap 'rm -rf "$temp"' ERR EXIT
for regex in "$@"; do
    grep "$regex" "$file" >"$temp"/output
    mv "$temp"/output "$temp"/input
    file="$temp"/input
done
cat "$temp"/input
However, a better solution is probably to arrange for Awk to check for all the patterns in one go, and avoid reading the same lines over and over again.
Passing the arguments to Awk with quoting intact is not entirely trivial. Here, we simply pass them as command-line arguments and process those into an array within the Awk script itself.
awk 'BEGIN { for(i=1; i<ARGC; ++i) a[i]=ARGV[i];
        ARGV[1]="-"; ARGC=2 }
    { for(n=1; n<i; ++n) if ($0 !~ a[n]) next; }1' "$@" <file
In brief, in the BEGIN block, we copy the command-line arguments from ARGV to a, then replace ARGV and ARGC to pass Awk a new array of (apparent) command-line arguments which consists of just - which means to read standard input. Then, we simply iterate over a and skip to the next line if the current input line from standard input does not match. Any remaining lines have matched all the patterns we passed in, and are thus printed.
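For example, the one-liner can be wrapped in a small script; the names multigrep.sh and datafile here are hypothetical stand-ins:
#!/bin/sh
# multigrep.sh: print only the lines of datafile which match ALL given patterns
awk 'BEGIN { for(i=1; i<ARGC; ++i) a[i]=ARGV[i];
        ARGV[1]="-"; ARGC=2 }
    { for(n=1; n<i; ++n) if ($0 !~ a[n]) next; }1' "$@" <datafile
It would then be invoked as in the question: ./multigrep.sh 'pattern1' 'pattern2' 'pattern3' ...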

Can I get the name of the file currently being read in a for loop?

I want to write a script that takes a word as an argument and searches the files in the current directory and its subdirectories for that word. If it is found in any of the files, the script should echo a message containing the file name and the line the word is found on.
This is what I have so far, but I can't find a way to actually store the file name of the file being read, or the line number:
word=$1
for var in $(grep -R "$word *")
do
filename=$(find . -type f -name "*") ------- //this doesnt work
linenmbr=$(grep -n "$ord" file) ----------- //this doesnt work
echo found $word in $filename on line number $linenmbr
done
In bash, any time you are looping, you want to avoid calling utilities (e.g. grep and find) within the loop. That is horribly inefficient, because it spawns a separate subshell for every utility on every iteration (for 10 iterations, that is 20 additional subshells; it adds up quickly). In your case, you call grep to feed the loop, and then spawn a separate subshell calling grep again within the loop, as well as another for find.
You should think of a way to call grep (or whatever utility provides the needed information) only once, and then parse its output.
If you did want to use grep, then calling grep -rn within a process substitution which is used to feed a while loop is probably as good as you are going to get. You can then use the bash builtin parameter expansions to isolate the filename and line-numbers which will be about as efficient as bash could get, e.g.
#!/bin/bash
[ -z "$1" ] && {    ## validate at least 1 input given
    printf "error: insufficient input.\nusage: %s srch_term\n" "${0##*/}"
    exit 1
}
while read -r line; do        ## read each line of grep output
    fn="${line%%:*}"          ## isolate filename
    no="${line#*:}"           ## remove filename
    no="${no%%:*}"            ## isolate line number
    printf "found %s in %s on line number %d\n" "$1" "$fn" "$no"
done < <(grep -rn "$1")       ## grep in process substitution
Choosing A More Efficient Method
If you can accomplish what you are attempting with one of the stream editing tools, e.g. awk or sed, you are likely to be able to isolate the wanted information an order of magnitude faster. For example, using awk and setting globstar you could do something similar to the following:
#!/bin/bash
shopt -s globstar    ## set globstar
[ -z "$1" ] && {     ## validate at least 1 input given
    printf "error: insufficient input.\nusage: %s srch_term\n" "${0##*/}"
    exit 1
}
## find all matching files and line numbers
awk -v word="$1" '$0 ~ word {
    print "found", word, "in", FILENAME, "on line number", FNR; next
}' **/* 2>/dev/null
Give both a try and let me know if you have further questions.
If you want to compare and ensure both are producing the same output, you can use diff to confirm, e.g.
$ diff <(grepscript.sh | sort) <(awkscript.sh | sort)
(if no difference is reported, the output is the same)

Removing 10 Characters of Filename in Linux

I just downloaded about 600 files from my server and need to remove the last 11 characters from the filename (not including the extension). I use Ubuntu and I am searching for a command to achieve this.
Some examples are as follows:
aarondyne_kh2_13thstruggle_or_1250556383.mus should be renamed to aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 should be renamed to aarondyne_kh2_darknessofunknow.mp3
It seems that some duplicates might exist after I do this, but if the command fails to complete and tells me what the duplicates would be, I can always remove those manually.
Try using the rename command. It allows you to rename files based on a regular expression:
The following line should work out for you:
rename 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
The following changes will occur:
aarondyne_kh2_13thstruggle_or_1250556383.mus renamed as aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 renamed as aarondyne_kh2_darknessofunknow.mp3
You can check the actions rename will do via specifying the -n flag, like this:
rename -n 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
For more information on how to use rename simply open the manpage via: man rename
Not the prettiest, but very simple:
echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!'
You'll still need to write the rest of the script, but it's pretty simple.
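A sketch of that surrounding script (the existence check is an assumption, prompted by the duplicate concern in the question):
for filename in *.mus *.mp3; do
    newname=$(echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!')
    if [ -e "$newname" ]; then
        echo "will not rename $filename: $newname already exists" >&2
    else
        mv "$filename" "$newname"
    fi
done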
find . -type f -exec sh -c 'mv "$1" "$(printf "%s" "$1" | sed -E -e "s/[^/]{11}(\.[^.]+)?$/\1/")"' sh {} \;
One way to go:
Get a list of your files, one per line (by ls, maybe), then:
ls ... | awk '{o=$0; sub(/_[^_.]*\./, ".", $0); print "mv "o" "$0}'
This will print the mv a b command for each file.
e.g.
kent$ echo "aarondyne_kh2_13thstruggle_or_1250556383.mus" | awk '{o=$0; sub(/_[^_.]*\./, ".", $0); print "mv "o" "$0}'
mv aarondyne_kh2_13thstruggle_or_1250556383.mus aarondyne_kh2_13thstruggle_or.mus
To execute, just pipe it to | sh.
I assume there are no spaces in your filenames.
This script assumes each file has just one extension. It would, for instance, rename "foo.something.mus" to "foo.mus". To keep all extensions, remove one hash mark (#) from the first line of the loop body. It also assumes that the base of each filename has at least 12 characters, so that removing 11 doesn't leave you with an empty name.
for f in *; do
    ext=${f##*.}
    base=${f%.$ext}
    new_f=${base%???????????}.$ext
    if [ -f "$new_f" ]; then
        echo "Will not rename $f, $new_f already exists" >&2
    else
        mv "$f" "$new_f"
    fi
done

recursively "normalize" filenames

I mean getting rid of special chars in filenames, etc.
I have made a script that can recursively rename files [http://pastebin.com/raw.php?i=kXeHbDQw]:
e.g., before:
THIS i.s my file (1).txt
after running the script:
This-i-s-my-file-1.txt
OK, here it is:
But when I wanted to test it "fully", with filenames like these:
¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÂÃÄÅÆÇÈÊËÌÎÏÐÑÒÔÕ×ØÙUÛUÝÞßàâãäåæçèêëìîïðñòôõ÷øùûýþÿ.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?#[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£.txt
it fails [http://pastebin.com/raw.php?i=iu8Pwrnr]:
$ sh renamer.sh directorythathasthefiles
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
(the same error is repeated for every file, and so on)
$
so "mv" can't handle special chars.. :\
i worked on it for many hours..
does anyone has a working one? [that can handle chars [filenames] in that 2 lines too?]
mv handles special characters just fine. Your script doesn't.
In no particular order:
You are using find to find all directories, and running ls on each directory separately.
Why use for DEPTH in... if you can do exactly the same with one command?
find -maxdepth 100 -type d
Which makes the arbitrary depth limit unnecessary
find -type d
Don't ever parse the output of ls, especially if you can let find handle that, too
find -not -type d
Make sure it works in the worst possible case:
find -not -type d -print0 | while IFS= read -r -d '' FILENAME; do
This stops read from mangling backslashes, stripping surrounding whitespace, and choking on filenames with newline characters.
You are repeating the entire ls | replace cycle for every single character. Don't - it kills performance. Loop over all the files in each directory once, and just use multiple sed expressions, or multiple replacements in one sed command.
sed 's/á/a/g; s/í/i/g; ...'
(I was going to suggest sed 'y/áí/ai/', but unfortunately that doesn't seem to work with Unicode. Perhaps perl -CS -Mutf8 -pe 'y/áí/ai/' would.)
You're still thinking in ASCII: "other special chars - ASCII Codes 33.. ..255". Don't.
These days, most systems use Unicode in UTF-8 encoding, which has a much wider range of "special" characters - so big that listing them out one by one becomes pointless. (It is even multibyte - "e" is one byte, "ė" is two or three bytes, depending on normalization.)
True ASCII has 128 characters. What you currently have in mind are the ISO 8859 character sets (sometimes called "ANSI") - in particular, ISO 8859-1. But they go all the way up to 8859-16, and only the "ASCII" part stays the same.
echo -n $(command) is rather useless.
There are much easier ways to find the directory and basename given a path. For example, you can do
directory=$(dirname "$path")
oldname=$(basename "$path")
# filter $oldname
mv "$path" "$directory/$newname"
Do not use egrep to check for errors. Check the program's return code. (Like you already do with cd.)
And instead of filtering out other errors, do...
if [[ -e $directory/$newname ]]; then
    echo "target already exists, skipping: $oldname -> $newname"
    continue
else
    mv "$path" "$directory/$newname"
fi
The ton of sed 's/------------/-/g' calls can be changed to a single regexp:
sed -r 's/-{2,}/-/g'
The [ ]s in tr [foo] [bar] are unnecessary. They just cause tr to replace [ to [, and ] to ].
Seriously?
echo "$FOLDERNAME" | sed "s/$/\//g"
How about this instead?
echo "$FOLDERNAME/"
And finally, use detox.
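A typical detox invocation might look like this (a sketch; check your local man page for the exact options):
detox -n -r /path/to/files    # -n: dry run, show what would be renamed
detox -r /path/to/files       # recursively rename for real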
Try something like:
find . -type f -print0 | awk 'BEGIN {RS="\x00"} { printf "%s\x00", $0; gsub("[^[:alnum:]./]", "-"); printf "%s\x00", $0 }' | xargs -0 -n 2 mv
Use of xargs(1) ensures that each filename is passed as exactly one parameter. awk(1) is used to emit the new filename right after the old one. (Note that "/" and "." are excluded from the replacement so that the directory part and the extension survive.)
One more trick: sed -E -e 's/-+/-/g' will replace groups of more than one "-" with exactly one.
Assuming the rest of your script is right, your problem is that you are using read but you should use read -r. Notice how the backslash disappeared:
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?#[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£
Ugh...
Some tips to clean up your script:
** Use sed to do translation on multiple characters at once, that'll clean things up and make it easier to manage:
dev:~$ echo 'áàaieeé!.txt' | sed -e 's/[áàã]/a/g; s/[éè]/e/g'
aaaieee!.txt
** rather than renaming the file for each change, run all your filters then do one move
$ NEWNAME='áàaieeé!.txt'
$ NEWNAME="$(echo "$NEWNAME" | sed -e 's/[áàã]/a/g; s/[éè]/e/g')"
$ NEWNAME="$(echo "$NEWNAME" | sed -e 's/aa*/a/g')"
$ echo $NEWNAME
aieee!.txt
** rather than doing an ls | read ... loop, use:
for OLDNAME in "$DIR"/*; do
blah
blah
blah
done
** separate out your path traversal and renaming logic into two scripts. One script finds the files which need to be renamed, one script handles the normalization of a single file. Once you learn the 'find' command, you'll realize you can toss the first script :)
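For instance (a sketch; the helper script name normalize-one.sh is hypothetical, and find replaces the manual directory traversal):
find "$DIR" -not -type d -exec ./normalize-one.sh {} \;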
