Shell script how to list file name in ascending order - linux

I am new to Linux and am writing the bash script below.
The files in the current folder are stored as 1.jpg, 2.jpg, and so on. I have to process the files sequentially according to their names, but in the loop below I get the file names in a different order.
for i in ./*.jpg
do
filename=$(basename "$i")
echo "filename is ./$filename"
done
output I get is like this
filename is ./10.jpg
filename is ./11.jpg
filename is ./12.jpg
filename is ./13.jpg
filename is ./14.jpg
filename is ./15.jpg
filename is ./16.jpg
filename is ./17.jpg
filename is ./18.jpg
filename is ./19.jpg
filename is ./1.jpg
filename is ./20.jpg
filename is ./21.jpg
filename is ./22.jpg
filename is ./27.jpg
filename is ./28.jpg
filename is ./29.jpg
filename is ./2.jpg
filename is ./3.jpg
filename is ./4.jpg
filename is ./6.jpg
filename is ./7.jpg
filename is ./8.jpg
filename is ./9.jpg
How can I process them in the sequence of their names: 1.jpg, 2.jpg, etc.?

Pathname expansion (glob expansion) returns a list of filenames which is sorted alphabetically according to the collation rules of your current locale. In the C locale that is plain byte (ASCII) order, but in many UTF-8 locales (en_US.UTF-8 on a number of systems, for example) punctuation is ignored on the first comparison pass. This is visible in the result of the OP. The file named 19.jpg is sorted before 1.jpg because, with the <dot>-character ignored, 19jpg is compared with 1jpg, and the character 9 sorts before the character j.
If you want to traverse your files in a different sorting order, then a different approach needs to be taken.
Under the bold assumption that the OP requests to traverse the files in a numeric sorted way, i.e. order the names according to a number at the beginning of the file-name, you can do the following:
while IFS= read -r -d '' file; do
    echo "filename: $file"
done < <(find . -maxdepth 1 -type f -name '*.jpg' -printf '%f\0' | sort -z -n)
Here we use find to list all files in the current directory (-maxdepth 1). The -printf '%f\0' action prints each bare file name, followed by a \0 as a separator; without it every name would start with ./ and sort -n would not see the leading number. We then use sort to ask for the requested sorting, with -z indicating that records are separated by \0. Instead of using a for-loop, we use a while-loop to read the information.
See BashPitfalls #1 for some details.
Note: sort -z and find -printf are GNU extensions.
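The same idea can be wrapped in a small function. This is only a sketch: list_jpgs_numeric is a hypothetical helper name, and it assumes GNU find and sort plus bash 4.4 or newer (for mapfile -d).

```shell
#!/usr/bin/env bash
# Sketch: print the *.jpg names of a directory in numeric order, one per line.
# list_jpgs_numeric is a hypothetical name; needs GNU find/sort and bash >= 4.4.
list_jpgs_numeric() {
    local dir=${1:-.}
    local sorted=()
    # %f prints the bare file name, so sort -n sees "10.jpg" rather than "./10.jpg".
    mapfile -d '' -t sorted < <(
        find "$dir" -maxdepth 1 -type f -name '*.jpg' -printf '%f\0' | sort -z -n
    )
    if (( ${#sorted[@]} > 0 )); then
        printf '%s\n' "${sorted[@]}"
    fi
}
```

Calling list_jpgs_numeric . in the OP's folder should list 1.jpg, 2.jpg, 3.jpg and so on before 10.jpg.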

Not quite sure if this is what you're asking, but you have the echo inside your loop, which causes each name to be printed on a separate line.
You can do:
list=""
for i in ./*.jpg
do
filename=$(basename "$i")
list="$list $filename"
done
echo "files: $list"
which would output
files: 1.jpg 2.jpg
Nevertheless, you should clarify your question.

From your requirement to "process them in the sequence of names 1.jpg, 2.jpg etc", this will accomplish that. The sort specifies a numeric key obtained by defining the first field as the string before a "." delimiter.
#!/usr/bin/env bash
shopt -s nullglob
allfiles=(*.jpg)
for f in "${allfiles[@]}"
do
    echo "$f"
done | sort -t"." -k1n

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple fasta file named as followed:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The fasta header look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header, since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially but it removed the ">" from the header which is essential for the fasta file. I tried to modify the "${file%%.*}" part to keep the carrot but it always called me out on bad substitutions.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the earlier header, or add to it. Both scenarios are easy to do. Don't replace text you don't want to replace.
for file in *.faa
do
    sed -i "s/^>.*/>${file%%.*}/" "$file"
done
will replace the header, but include a leading > in the replacement, effectively preserving it; and
for file in *.faa
do
    sed -i "s/^>.*/&${file%%.*}/" "$file"
done
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says on lines which match the regex ^>, replace the empty string at the end of the line $ with the file name.
Of course, your Awk script could easily be fixed, too. Standard Awk does not have an option to parallel the -i "in-place" option of sed, but you can easily use a temporary file:
for file in *.faa
do
    awk '/^>/{ $0 = $0 " " FILENAME; sub(/\.faa$/, "") }1' "$file" >"$file.tmp" &&
    mv "$file.tmp" "$file"
done
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
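As a portable variation on the temporary-file approach, the name can be passed to Awk as a variable instead of editing FILENAME afterwards. This is a minimal sketch, not the answer's own method: tag_fasta_headers is a hypothetical helper name, and it assumes the files end in .faa and that their names contain no backslashes (awk -v interprets escape sequences).

```shell
#!/usr/bin/env bash
# Sketch: append the file's base name (without .faa) to every FASTA header line.
# tag_fasta_headers is a hypothetical helper name.
tag_fasta_headers() {
    local file name tmp
    for file in "$@"; do
        name=${file##*/}      # drop any leading directory
        name=${name%.faa}     # drop the .faa extension
        tmp=$(mktemp) || return 1
        # Only lines starting with ">" are headers; sequence lines pass through.
        if awk -v tag="$name" '/^>/ { $0 = $0 " " tag } 1' "$file" > "$tmp"; then
            cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
        fi
        rm -f "$tmp"
    done
}
```

Usage would be tag_fasta_headers *.faa in the directory holding the files; try it on a copy first.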
You just need to detect the pattern to replace and use a regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
    full_filename=${file##*/}
    filename="${full_filename%.*}"
    # escape special chars (slashes) for use in the sed replacement
    filename=$(echo "$filename" | sed 's_/_\\/_g')
    echo "adding file name: $filename to: $full_filename"
    # only touch header lines (those starting with ">"), up to the first "#"
    sed -i -E "s/^>[^#]+/>$filename /" "$location/$full_filename"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
lectures
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
I suggest first identifying your files with the find command or the ls command.
find . -type f -name "*.faa" -printf "%f\n"
A find command that prints only the names of files with the extension .faa, including those in subdirectories of the current directory.
ls -1 *.faa
An ls command that lists entries with the extension .faa in the current directory only. The pattern must not be quoted, or the shell will not expand it.
Processing your files
Once you have the correct file list, iterate over the list and apply the sed command.
# %f prints bare names, so the search is limited to the current directory;
# this loop also assumes file names without whitespace
for fileName in $(find . -maxdepth 1 -type f -name "*.faa" -printf "%f\n"); do
    strippedFileName=${fileName/.*/} # strip extension .faa
    sed -i "1s|\$| $strippedFileName|" "$fileName" # append value of strippedFileName at end of line 1
done

How to rename string in multiple filename in a folder using shell script without mv command since it will move the files to different folder? [duplicate]

Write a simple script that will automatically rename a number of files. As an example we want the file *001.jpg renamed to user defined string + 001.jpg (ex: MyVacation20110725_001.jpg) The usage for this script is to get the digital camera photos to have file names that make some sense.
I need to write a shell script for this. Can someone suggest how to begin?
An example to help you get off the ground.
for f in *.jpg; do mv "$f" "$(echo "$f" | sed s/IMG/VACATION/)"; done
In this example, I am assuming that all your image files contain the string IMG and you want to replace IMG with VACATION.
The shell automatically evaluates *.jpg to all the matching files.
The second argument of mv (the new name of the file) is the output of the sed command that replaces IMG with VACATION.
If your filenames include whitespace pay careful attention to the "$f" notation. You need the double-quotes to preserve the whitespace.
You can use the rename utility to rename multiple files by a pattern. For example, with the Perl-based rename, the following command prepends the string MyVacation2011_ to all the files with the jpg extension.
rename 's/^/MyVacation2011_/g' *.jpg
or, with the util-linux version of rename:
rename <pattern> <replacement> <file-list>
In this example, I am assuming that all your image files begin with "IMG" and you want to replace "IMG" with "VACATION".
Solution: first identify all jpg files, then replace the keyword. The echo makes this a dry run; remove it to actually rename the files.
find . -name '*jpg' -exec bash -c 'echo mv "$0" "${0/IMG/VACATION}"' {} \;
for file in *.jpg ; do mv "$file" "${file//IMG/myVacation}" ; done
Again assuming that all your image files have the string "IMG" and you want to replace "IMG" with "myVacation".
With bash you can directly convert the string with parameter expansion.
Example: if the file is IMG_327.jpg, the mv command will be executed as if you do mv IMG_327.jpg myVacation_327.jpg. And this will be done for each file found in the directory matching *.jpg.
IMG_001.jpg -> myVacation_001.jpg
IMG_002.jpg -> myVacation_002.jpg
IMG_1023.jpg -> myVacation_1023.jpg
etcetera...
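The parameter-expansion approach can be combined with a dry run so that nothing is renamed until the output looks right. A minimal sketch, assuming the files are in the current directory; rename_substring and the DRY_RUN variable are hypothetical names, not part of any answer here.

```shell
#!/usr/bin/env bash
# Sketch: rename files by replacing one substring, printing the mv commands
# by default. rename_substring and DRY_RUN are hypothetical names.
rename_substring() {
    local from=$1 to=$2 f new
    shift 2
    for f in "$@"; do
        new=${f//"$from"/"$to"}          # quoted pattern is matched literally
        [[ $new == "$f" ]] && continue   # name does not contain the substring
        if [[ ${DRY_RUN:-1} == 1 ]]; then
            printf 'mv -- %q %q\n' "$f" "$new"   # show what would be done
        else
            mv -n -- "$f" "$new"                 # -n: never overwrite an existing file
        fi
    done
}
```

Running rename_substring IMG myVacation *.jpg prints the planned commands; prefixing the call with DRY_RUN=0 performs the renames.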
find . -type f |
sed -n "s/\(.*\)factory\.py$/& \1service\.py/p" |
xargs -p -n 2 mv
For example, this renames all files under the current directory (find also descends into subdirectories) whose names end in "factory.py" so that they end in "service.py" instead.
explanation:
In the sed cmd, the -n flag will suppress normal behavior of echoing input to output after the s/// command is applied, and the p option on s/// will force writing to output if a substitution is made. Since a sub will only be made on match, sed will only have output for files ending in "factory.py"
In the s/// replacement string, we use "& " to interpolate the entire matching string, followed by a space character, into the replacement. Because of this, it's vital that our RE matches the entire filename. After the space char, we use "\1service.py" to interpolate the string we gulped before "factory.py", followed by "service.py", replacing it. So for more complex transformations you'll have to change the args to s/// (with an RE still matching the entire filename).
Example output:
foo_factory.py foo_service.py
bar_factory.py bar_service.py
We use xargs with -n 2 to consume the output of sed 2 delimited strings at a time, passing these to mv (I also put the -p option in there, which asks for confirmation before each command, so you can feel safe when running this). Voila.
NOTE: If you are facing more complicated file and folder scenarios, this post explains find (and some alternatives) in greater detail.
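Another option, assuming names such as IMG_001.jpg where an underscore precedes the number:
for i in *_[0-9][0-9][0-9].jpg
do
    echo mv "$i" "yourstring_${i##*_}"
done
Remove echo after you have it right.
Parameter expansion with ## strips the longest prefix matching *_, so only the last part (e.g. 001.jpg) is kept and you can put a new name in front of it.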
Can't comment on Susam Pal's answer but if you're dealing with spaces, I'd surround with quotes:
for f in *.jpg; do mv "$f" "$(echo "$f" | sed 's/ /-/g')"; done
You can try this:
for file in *.jpg
do
    # ${somestring}_ needs the braces, otherwise bash looks for a variable named somestring_
    mv "$file" "${somestring}_${file:(-7)}"
done
You can see "parameter expansion" in man bash to understand the above better.

How to list all the folder in a folder and exclude a specific one

Let's say I have a folder like this:
my_folder
====my_sub_folder_1
====my_sub_folder_2
====my_sub_folder_3
====exclude
I would like a command that returns a string like this:
["my_sub_folder_1", "my_sub_folder_2", "my_sub_folder_3"]
(Notice the exclusion of the exclude folder)
The best I could do is:
ls -dxm */
That returns the following:
my_sub_folder_1/, my_sub_folder_2/, my_sub_folder_3/
So I'm still trying to remove the / at the end of each folder, and add the [] and the "".
If possible, I would like to do that in one line so I can directly put it in a shell variable; otherwise I will put it in a .sh file that returns the string I'm trying to build.
(I don't know if the last part is really possible).
Assuming you are executing the script in the directory where my_folder
belongs, how about:
while IFS= read -r -d "" f; do
ary+=("$f")
done < <(find "my_folder" -maxdepth 1 -mindepth 1 -type d -not -name "exclude" -printf "\"%f\"\0")
(IFS=","; echo "[${ary[*]}]")
[Explanations]
-printf option to find command specifies the output format. The format "\"%f\"\0"
prints the filename (excluding leading directory name) wrapped by
double quotes and followed by a NUL character \0.
The NUL character is used as a filename delimiter and the filenames
are split again in the read builtin by specifying the delimiter
to the NUL character with -d "".
Then the filenames (with double quotes) are stored in the array ary
one by one.
Finally echo "[${ary[*]}]" command prints out the elements of ary
separated by IFS. The whole output are surrounded by the square brackets [].
The last line is surrounded by parens () to be executed in the subprocess.
The purpose is just not to overwrite the current IFS.
If you save the script in my answer as my_script.sh, then you can assign
a variable MY_VAR to the output by saying:
MY_VAR=$(./my_script.sh)
echo "$MY_VAR"
# or another_command "$MY_VAR" or whatever
Alternatively you can assign the variable within the script by modifying
the last line as:
MY_VAR=$(IFS=","; echo "[${ary[*]}]")
echo "$MY_VAR"
Hope this helps.
In bash this can be done as follows. It's close, but it doesn't fit in one line.
Change the Internal Field Separator to a newline rather than a space, so that spaces in directory names do not split them.
Then perform the following:
List the directories, one per line
Use grep to remove the directory to be excluded
Iterate over the results:
Output the directory name with the last character removed
Pipe everything to xargs to recombine into a single line and store in $var
Trim the last , from ${var} and wrap in '[]'
IFS=$'\n'
var=`for d in \`ls -d1 */ | grep -v exclude_dir \`; do echo '\"'${d::-1}'\",' ; done | xargs`
echo '['${var::-1}']'
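For comparison, the same string can be built with a glob and parameter expansion alone, without find, ls, grep or xargs. This is a sketch under stated assumptions: list_dirs_except is a hypothetical helper name, hidden directories are not matched by the glob, and directory names are assumed not to contain double quotes (nothing is escaped).

```shell
#!/usr/bin/env bash
# Sketch: print the sub-directories of a folder as ["a", "b"], skipping one.
# list_dirs_except is a hypothetical helper name.
list_dirs_except() {
    local parent=$1 skip=$2 d name out=""
    for d in "$parent"/*/; do
        [[ -d $d ]] || continue        # the glob matched nothing
        name=${d%/}                    # drop the trailing slash
        name=${name##*/}               # drop the parent path
        [[ $name == "$skip" ]] && continue
        out+="${out:+, }\"$name\""     # add ", " only between items
    done
    printf '[%s]\n' "$out"
}
```

The result can be captured directly, for example MY_VAR=$(list_dirs_except my_folder exclude).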

List filenames up to a limit per each text file in Bash

How can I create a number of text files, which include the filenames of the files in a specific directory, up to a maximum of 999 rows per text file?
I started from this:
find ./J0902-405/*.evt -maxdepth 1 -type f -fprintf files_xselect.list %f\\n
And it writes the filenames properly in the textfile.
But afterwards, I need to apply the 999-row limit: after that limit, create another text file with the following 999 names, and so on, until all the *.evt files are listed.
find ./J0902-405/*.evt -maxdepth 1 -type f | split -l999 -
From the manual page:
NAME
split - split a file into pieces
SYNOPSIS
split [OPTION]... [INPUT [PREFIX]]
DESCRIPTION
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000
lines, and default PREFIX is `x'. With no INPUT, or when INPUT is -, read standard
input.
-l, --lines=NUMBER
put NUMBER lines per output file
@DopeGhoti's answer is the right approach, but let me flesh it out a bit, for those new to split (like me):
find ./J0902-405 -maxdepth 1 -name '*.evt' -type f -printf '%f\n' | \
split -l 999 -d - files_xselect.list.
-name ... with a quoted filename pattern lets find do the pathname expansion (as opposed to the shell - no point in letting both the shell and find do the work).
-printf '%f\n' ensures that only filenames (no path components) are output, as in the OP.
-l 999 specifies the split size in lines; default is 1000.
-d causes numerical suffixes to be used for the output files (00, 01, ...) rather than the default letters (aa, ab, ...) [note: won't work on OSX]; default suffix length is 2; to control the number of digits/chars. in the suffix, use -a {length}.
- causes split to read from stdin - in this case, the output from find.
files_xselect.list. is the output-file prefix; thus, we get files files_xselect.list.00, files_xselect.list.01, ...
If you want more control over the output filename - e.g., to move the suffix data to a different part of the filename - you can use the --filter option (note: won't work on OS X), which accepts a shell command to which the output data for each file is piped, along with variable $FILE containing the name of the respective output filename; this gives you the chance to modify the output filename based on it:
For instance, to create output files named files_xselect.00.list, ... - i.e., to place the suffix data before the filename extension, you'd use:
... | split -l 999 -d --filter='cat > "${FILE}.list"' - 'files_xselect.'
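The find and split pipeline can be parameterized so that the chunk size is easy to test with a small number. A sketch only: chunk_names and its arguments are hypothetical names, and it relies on GNU find (-printf) and GNU split (-d). The sort is an addition that makes the order of names deterministic.

```shell
#!/usr/bin/env bash
# Sketch: write the *.evt names found in a directory into list files holding at
# most `max` lines each. chunk_names and its arguments are hypothetical names.
chunk_names() {
    local dir=$1 prefix=$2 max=${3:-999}
    find "$dir" -maxdepth 1 -type f -name '*.evt' -printf '%f\n' |
        sort |
        split -l "$max" -d - "$prefix"   # creates ${prefix}00, ${prefix}01, ...
}
```

For the OP's case this would be chunk_names ./J0902-405 files_xselect.list. with the default limit of 999.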
Something like
#!/bin/bash
i=0 j=0
for file in ./J0902-405/*.evt; do
    # start a new list file once the current one holds 999 names
    (( i >= 999 )) && i=0 && j=$((j+1))
    [[ -f $file ]] && i=$((i+1)) && echo "${file##*/}" >> "fileofnames$j.txt"
done

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
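Note that quoting your variables is almost always good practice, and here it is what keeps the mv safe for filenames containing spaces; the glob in for FILENAME in *.bak is not word-split, so spaces are not a problem there. The loop does misbehave when nothing matches: it then runs once with the literal string *.bak (unless nullglob is set). Read David W.'s answer for a more robust solution, and this document for alternative solutions.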
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use variable filtering (parameter expansion). Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of % on the keyboard.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo ${file#*/}    # Matches "this/" and will print out "is/my/file/name.txt"
echo ${file##*/}   # Matches "this/is/my/file/" and will print out "name.txt"
echo ${file%/*}    # Matches "/name.txt" and will print out "this/is/my/file"
echo ${file%%/*}   # Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match!. If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match on the period and all characters after it. Since it is a single %, it will match on the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
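The four filters can be combined to take a path apart. A small sketch; split_path is a hypothetical helper name, and the two fallback lines cover the "filter can't match" case described above.

```shell
#!/usr/bin/env bash
# Sketch: split a path into directory, file name, stem and extension using only
# parameter expansion. split_path is a hypothetical helper name.
split_path() {
    local path=$1
    local dir=${path%/*}       # shortest "/*" suffix removed: directory part
    local base=${path##*/}     # longest "*/" prefix removed: file name
    local stem=${base%.*}      # shortest ".*" suffix removed: name without extension
    local ext=${base##*.}      # longest "*." prefix removed: extension
    if [[ $dir == "$path" ]]; then dir=.; fi     # no slash in the path at all
    if [[ $ext == "$base" ]]; then ext=""; fi    # no dot in the file name
    printf 'dir=%s base=%s stem=%s ext=%s\n' "$dir" "$base" "$stem" "$ext"
}

split_path "this/is/my/file/name.txt"
# prints: dir=this/is/my/file base=name.txt stem=name ext=txt
```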
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that this will remove the period of the file extension.
Next, we will loop:
find . -name "*.bak" -print0 | while IFS= read -r -d $'\0' file
do
    echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates out the names of the files with a NUL symbol -- which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separators are NUL symbols. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) method. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
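A variant with basic error checking might look like the following. It is a sketch rather than a drop-in replacement: strip_bak is a hypothetical helper name, it works on one folder without descending into subdirectories, and it skips a file instead of overwriting an existing target.

```shell
#!/usr/bin/env bash
# Sketch: strip the .bak extension from every file in a given folder, with basic
# error checking. strip_bak is a hypothetical helper name.
strip_bak() {
    local dir=$1 f target
    if [[ -z $dir || ! -d $dir ]]; then
        printf 'usage: strip_bak DIRECTORY\n' >&2
        return 1
    fi
    for f in "$dir"/*.bak; do
        [[ -f $f ]] || continue          # also skips the unexpanded pattern
        target=${f%.bak}
        if [[ -e $target ]]; then
            printf 'skipping %s: %s already exists\n' "$f" "$target" >&2
            continue
        fi
        mv -- "$f" "$target"
    done
    return 0
}
```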
You can always use the find command to get all the subdirectories
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done

Resources