Why does echo command interpret variable for base directory? - linux

I would like to find some file types in my pictures folder, and I have created the following bash script in the /home/user/pictures folder:
for i in *.pdf *.sh *.txt;
do
echo 'all file types with extension' $i;
find /home/user/pictures -type f -iname $i;
done
But when I execute the bash script, it does not work as expected for files that are located in the base directory /home/user/pictures. Instead of echoing 'all file types with extension *.sh', the command expands the variable against the base directory:
all file types with extension file1.sh
/home/user/pictures/file1.sh
all file types with extension file2.sh
/home/user/pictures/file2.sh
all file types with extension file3.sh
/home/user/pictures/file3.sh
I would like to know why the echo command does not print "all file types with extension *.sh".

Revised code:
for i in '*.pdf' '*.sh' '*.txt'
do
echo "all file types with extension $i"
find /home/user/pictures -type f -iname "$i"
done
Explanation:
In bash, a string containing *, or a variable which expands to such a string, may be expanded as a glob pattern unless that string is protected from glob expansion by putting it inside quotes (although if the glob pattern does not match any files, then the original glob pattern will remain after attempted expansion).
In this case, it is not wanted for the glob expansion to happen - the string containing the * needs to be passed as a literal to each of the echo and the find commands. So the $i should be enclosed in double quotes - these will allow the variable expansion from $i, but the subsequent wildcard expansion will not occur. (If single quotes, i.e. '$i' were used instead, then a literal $i would be passed to echo and to find, which is not wanted either.)
In addition to this, the initial for line needs to use quotes to protect against wildcard expansion in the event that any files matching any of the glob patterns exist in the current directory. Here, it does not matter whether single or double quotes are used.
Separately, the revised code here also removes some unnecessary semicolons. Semicolons in bash are a command separator and are not needed merely to terminate a statement (as in C etc).
Observed behaviour with original code
What seems to be happening here is that one of the patterns used in the initial for statement is matching files in the current directory (specifically, *.sh is matching file1.sh, file2.sh, and file3.sh). It is therefore replaced by a list of these filenames (file1.sh file2.sh file3.sh) in the expression, and the for statement iterates over these values. (Note that the current directory might not be the same as either where the script is located or the top-level directory used for the find.)
It would also still be expected that the *.pdf and *.txt would be used in the expression -- either substituted or not, depending on whether any matches are found. Therefore the output shown in the question is probably not the whole output of the script.
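A quick way to see the expansion for yourself (run from /home/user/pictures, with the question's three .sh files present):
echo *.sh     # the shell expands the glob: file1.sh file2.sh file3.sh
echo '*.sh'   # the quotes protect the pattern: *.sh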

Such expressions (*.blabla) change the value of $i in the loop. Here is the trick I would use:
for i in pdf sh txt
do
echo "all file types with extension *.$i"
find /home/user/pictures -type f -iname "*.$i"
done
(Note that the pattern passed to find still needs to be quoted; an unquoted '*.'$i would be glob-expanded by the shell before find ever sees it, reintroducing the original problem.)

Related

Delete files in a variable - bash

I have a variable of filenames that end with a vowel. I need to delete all of these files at once. I have tried using
rm "$vowels"
but that only seems to return the files within the variable and state that there is "No such file or directory".
It's your use of quotes: they tell rm that your variable's contents are to be interpreted as a single argument (filename). Without quotes, the contents will be broken into multiple arguments using the shell rules in effect.
Be aware that this can be risky if your filenames contain spaces, as there's no way to tell the difference between spaces between filenames and spaces in filenames.
You can get around this by using an array instead and using quoted array expansion: rm "${array[@]}", where each element in the array will be passed as its own quoted argument.
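A minimal sketch of that array approach, assuming the files are in the current directory (the original find was recursive) and that everything matching is a regular file:
shopt -s nullglob                 # expand to nothing (not the literal pattern) if no match
vowel_files=( *[aeiou] )          # glob the names ending in a vowel into an array
if [ ${#vowel_files[@]} -gt 0 ]; then
rm -v -- "${vowel_files[@]}"      # quoted expansion: one argument per filename, spaces survive
fi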
SOLUTION
assigning the variable
vowel=$(find . -type f | grep "[aeiou]$")
removing all files within variable
echo $vowel | xargs rm -v
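Be aware that echo $vowel | xargs rm -v relies on word splitting, so it will misbehave on filenames containing spaces or newlines. A variant that avoids the intermediate variable entirely (a sketch of the same requirement, names ending in a vowel):
find . -type f -name '*[aeiou]' -exec rm -v -- '{}' +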

How to add sequential numbers say 1,2,3 etc. to each file name and also for each line of the file content in a directory?

I want to add a sequential number to each file and its contents in a directory. The sequential number should be prefixed to the filename, and each line of the file's contents should be prefixed with the same number. In this manner, the sequential numbers should be generated for all the files (names and contents) in the sub-folders of the directory.
I have tried using maxdepth, rename, and print as part of the command, but it throws an error saying that "-maxdepth" is not a valid option.
I already have part of the code (to print the names and contents of text files in a directory), and this logic should be appended to it.
#!bin/bash
cd home/TESTING
for file in home/TESTING;
do
find home/TESTING/ -type f -name *.txt -exec basename {} ';' -exec cat {} \;
done
P.S. print, rename, and maxdepth are not working.
If the name of the first file is File1.txt and its content is "Louis", then the output for the filename should be 1File1.txt and the content should be "1Louis". The same should be done with 2 for the second file. In this manner, it has to traverse through all the subfolders in the directory and print accordingly. I already have part of the code, and this logic should be appended to it.
There should be a fail-safe when you execute cd in a script; otherwise you may end up executing commands in the wrong directory.
In your attempt, the output would be the same even without the for loop, as for file in home/TESTING passes only the single word home/TESTING to for, so the body runs once. With for file in home/TESTING/* it would behave differently.
I used find without -maxdepth, so it will look into all subdirectories for *.txt files as well. If you want only the current directory, $(find /home/TESTING/* -type f -name "*.txt") could be replaced with $(ls *.txt); as long as you do not have a directory whose name ends in .txt, there will be no problem.
#!/bin/bash
# try cd to directory, do things upon success.
if cd /home/TESTING ;then
# set sequence number
let "x = 1"
# pass every file to for that find matching, sub directories will be also as there is no maxdeapth.
for file in $(find /home/TESTING/* -type f -name "*.txt") ; do
# print sequence number, and base file name, processed by variable substitution.
# basename can be used as well but this is bash built in.
echo "${x}${file##*/}"
# print file content, and put sequence number before each line with stream editor.
sed 's#^#'"${x}"'#g' ${file}
# increase sequence number with one.
let "x++"
done
# unset sequence number
unset 'x'
else
# print error on stderr
echo 'cd to /home/TESTING directory is failed' >&2
fi
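Given the example from the question, where File1.txt contains "Louis" (the name and content of the second file below are assumptions for illustration), a run would print something like:
1File1.txt
1Louis
2File2.txt
2Maria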
Variable Substitution:
There are more; I only picked these four for now, as they are similar.
${var#pattern} - Use the value of var after removing text that matches pattern from the left
${var##pattern} - Same as above, but remove the longest matching piece instead of the shortest
${var%pattern} - Use the value of var after removing text that matches pattern from the right
${var%%pattern} - Same as above, but remove the longest matching piece instead of the shortest
So ${file##*/} takes the value of $file and drops everything (*) up to the last (##) slash (/). The value of the $file variable is not modified by this, so it still contains the path and filename.
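A quick illustration (the path is hypothetical):
file="/home/TESTING/sub/File1.txt"
echo "${file##*/}"    # prints: File1.txt
echo "$file"          # unchanged: still prints /home/TESTING/sub/File1.txt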
sed 's#^#'"${x}"'#g' ${file}: sed is a stream editor; there are whole books about its usage, so this covers just this particular call. The script is usually placed in single quotes, so 's#^#1#g' will add 1 to the beginning of every line in a file: s is substitution, ^ matches the beginning of each line, 1 is the text to insert, and g means global (without the g, only the first match would be affected).
# is the separator; it could be something else as well, like / for example. I break out of the single quotes to let the variable be expanded, then reopen the single quotes.
If you would like to replace a text, say .txt with .php, you can use sed 's#\.txt#\.php#g' file. The . has a special meaning (it can match any single character), so it needs to be escaped with \ to be used as literal text; otherwise not only file.txt but also file1txt would be matched.
sed can also read from a pipe, in which case you do not need to specify a filename; otherwise you have to provide at least one. In our case that was the ${file} variable, which contains the filename. As I mentioned, variable substitution does not modify the variable's value, so it still contains the filename with its path.
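A minimal demonstration of the prefixing step, using the filename and content from the question's example:
printf 'Louis\n' > File1.txt
x=1
sed 's#^#'"${x}"'#g' File1.txt    # prints: 1Louis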

Using a glob expression passed as a bash script argument

TL;DR:
Why isn't invoking ./myscript foo* when myscript has var=$1 the same as invoking ./myscript with var=foo* hardcoded?
Longer form
I've come across a weird issue in a bash script I'm writing. I am sure there is a simple explanation, but I can't figure it out.
I am trying to pass a command line argument to be assigned as a variable in the script.
I want the script to allow 2 command line arguments as follows:
$ bash my_bash_script.bash args1 args2
In my script, I assigned variables like this:
ARGS1=$1
ARGS2=$2
Args 1 is a string descriptor to add to the output file.
Args 2 is a group of directories: "dir1, dir2, dir3", which I am passing as dir*
When I assign dir* to ARGS2 in the script it works fine, but when I pass dir* as the second command line argument, it only includes dir1 in the wildcard expansion of dir*.
I assume this has something to do with how the shell handles wildcards (even when passed as args), but I don't really understand it.
Any help would be appreciated.
Environment / Usage
I have a group of directories:
dir_1_y_map, dir_1_x_map, dir_2_y_map, dir_2_x_map,
... dir_10_y_map, dir_10_x_map...
Inside these directories I am trying to access a file with extension ".status" via *.status, and ".report.txt" via *report.txt.
I want to pass dir_*_map as the second argument to the script and store it in the variable ARGS2, then use it to search within each of the directories for the ".status" and ".report" files.
The issue is that passing dir_*_map from the command line doesn't give the list of directories, but rather just the first item in the list. If I assign the variable ARGS2=dir_*_map within the script, it works as I intend.
Workaround: Quoting
It turns out that passing the second argument in quotes allowed the wildcard expansion to work appropriately for "dir_*_map"
#!/usr/bin/env bash
ARGS1=$1
ARGS2=$2
touch $ARGS1".extension"
for i in /$ARGS2/*.status
do
grep -e "string" $i >> $ARGS1".extension"
done
Here is an example invocation of the script:
sh ~/path/to/script descriptor "dir_*_map"
I don't fully understand when/why some arguments must be passed in quotes, but I assume it has to do with the wildcard expansion in the for loop.
Addressing the "why"
Assignments, as in var=foo*, don't expand globs -- that is, when you run var=foo*, the literal string foo* is put into the variable var, not the list of files matching foo*.
By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.
Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.
The reason running ./yourscript "foo*" functions as a workaround is the unquoted expansion inside the script allowing the glob to be expanded at that later time. However, this is bad practice: glob expansion happens concurrent with string-splitting (meaning that relying on this behavior removes your ability to pass filenames containing characters found in IFS, typically whitespace), and also means that you can't pass literal filenames when they could also be interpreted as globs (if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1).
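A concrete illustration of that last pitfall (the filenames here are hypothetical):
$ touch 1 '[1]'
$ ./yourscript '[1]'
# inside the script, the unquoted expansion treats the literal [1] as a glob,
# which matches the file named 1, never the file named [1]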
Idiomatic Usage
The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:
#!/bin/bash
out_base=$1; shift
shopt -s nullglob # avoid generating an error if a directory has no .status
for dir; do # iterate over directories passed in $2, $3, etc
for file in "$dir"/*.status; do # iterate over files ending in .status within those
grep -e "string" "$file" # match a single file
done
done >"${out_base}.extension"
If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:
#!/bin/bash
out_base=$1; shift
find "$#" -maxdepth 1 -type f -name '*.status' \
-exec grep -h -- /dev/null '{}' + \
>"${out_base}.extension"
Both scripts above expect the globs passed not to be quoted on the invoking shell. Thus, usage is of the form:
# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map
This is considerably better practice than passing globs to your script (which then is required to expand them to retrieve the actual files to use); it works correctly with filenames containing whitespace (which the other practice doesn't), and files whose names are themselves glob expressions.
Some other points of note:
Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
for dir; do is precisely equivalent to for dir in "$#"; do, which iterates over arguments. Don't make the mistake of using for dir in $*; do or for dir in $#; do instead! These latter invocations combine each element of the list with the first character of IFS (which, by default, contains the space, the tab and the newline in that order), then splits the resulting string on any IFS characters found within, then expands each component of the resulting list as a glob.
Passing /dev/null as an argument to grep is a safety measure: It ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can).
Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
Note that while quoting your variables is almost always a good practice, the for FILENAME in *.bak is still dangerous if any of your filenames might contain spaces. Read David W.'s answer for a more-robust solution, and this document for alternative solutions.
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use environment variable filtering. Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of %.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo ${file#*/} #Matches "this/" and will print out "is/my/file/name.txt"
echo ${file##*/} #Matches "this/is/my/file/" and will print out "name.txt"
echo ${file%/*} #Matches "/name.txt" and will print out "this/is/my/file"
echo ${file%%/*} #Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match! If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match on the period and all characters after it. Since it is a single %, it will match the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
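For example (the filename is hypothetical):
file="archive.tar.bak"
file_sans_ext=${file%.*}
echo "$file_sans_ext"    # prints: archive.tar (only the last suffix is removed)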
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that this will remove the period of the file extension.
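Putting those two lines together with a hypothetical filename:
file="notes.bak"
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"    # prints: My file is a file '.bak'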
Next, we will loop:
find . -name "*.bak" -print0 | while read -d $'\0' file
do
echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates the names of the files with a NUL symbol, which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separators are NUL symbols. See how nicely the find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) method. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
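For instance, with hypothetical files report.bak and old data.bak under the current directory, find.out would contain:
mv './report.bak' './report'
mv './old data.bak' './old data'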
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
You can always use the find command to get all the subdirectories
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done

bash - run script based on substring of filename (perhaps using wildcard)

I've got the below simple script that calls an external script with a number of filenames and arguments of either a delimiter or a set of cut positions. My question: is there a way to make the filenames dynamic using wildcards, in the sense that the directory will always contain those filenames but with extra text on either end, and the script can do some sort of match-up to get the full filename based on a 'contains'?
current /release/ext/ directory contents:
2011storesblah.dat
hrlatest.dat
emp_new12.txt
i.e. the directory contains these files today (but next week the filenames in this directory could have a slightly different prefix), e.g.:
stores_newer.dat
finandhr.dat
emps.txt
Script:
#!/bin/bash
FILES='/release/ext/stores.dat "|"
/release/ext/emp.txt 1-3 4-11 15-40
/release/ext/hr.dat "|" 2'
for f in $FILES
do
echo `sh myexternalscript.sh $f`;
done
Note: there is no need to handle a scenario where the file in my script matches more than 2 files in the direc (it will always only match one).
Also it only can match the file types that are specified in the script.
Also, I don't need to search recursively, just needs to look in the /release/ext/ directory only.
I'm running SunOS 5.10.
FILES=$(find /release/ext -name '*stores*.dat')
for FILE in $FILES; do
# need to test for empty, in case $FILES is empty
test -n "$FILE" && /do/whatever/you/want
done
It is unclear what the pipe characters and numbers are for in your $FILES variable. However, here is something you might find useful:
#!/bin/bash
filespecs='*stores*.dat *hr*.dat *emp*.txt'
dir='/release/ext'
cd "$dir"
for file in $filespecs
do
sh myexternalscript.sh "$dir/$file"
done
Note that your question is tagged "bash" and you use "bash" in your shebang, but for some reason, you use "sh" when you call your other script. On some systems, sh is symlinked to Bash, but it will behave differently than Bash when called directly. On many systems, sh is completely separate from Bash.
In order to expand the globs and incorporate other arguments, you will need to violate the Bash rule of always quoting variables (this is an example of one of the exceptions).
filespecs='*stores*.dat | 3
*hr*.dat 4 5
*emp*.txt 6 7 8'
while read -r spec arg1 arg2 arg3 arg4
do
sh myexternalscript.sh "$dir"/$spec "$arg1" "$arg2" "$arg3" "$arg4"
done < <(echo "$filespecs")
Use as many "arg" arguments as you think you'll need. Extras will be passed as empty, but set arguments. If there are more arguments than variables to accept them, then the last variable will contain all the remainders in addition to the one that corresponds to it. This version doesn't need the cd since the glob isn't expanded until the directory has been prepended, while in the first version the glob is expanded before the directory is prepended.
If you quote the pipes in the manner shown in your question, then the double quotes will be included in the argument. In the way I show it, only the pipe character gets passed but it's protected since the variable is quoted at the time it's referenced.
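To see how each line of the list is split across the read variables, here is a quick check using just the first line of the example above:
while read -r spec arg1 arg2; do
echo "spec=$spec arg1=$arg1 arg2=$arg2"
done <<'EOF'
*stores*.dat | 3
EOF
# prints: spec=*stores*.dat arg1=| arg2=3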
