How to list all the folders in a folder and exclude a specific one - linux

Let's say I have a folder like this:
my_folder
====my_sub_folder_1
====my_sub_folder_2
====my_sub_folder_3
====exclude
I would like a command that returns a string like this:
["my_sub_folder_1", "my_sub_folder_2", "my_sub_folder_3"]
(Notice the exclusion of the exclude folder)
The best I could do is:
ls -dxm */
That returns the following:
my_sub_folder_1/, my_sub_folder_2/, my_sub_folder_3/
So I'm still trying to remove the / at the end of each folder, and add the [] and the "".
If possible, I would like to do that in one line so I could directly put it in a shell variable; otherwise I will put it in a .sh file that returns the string I'm trying to build.
(I don't know if the last part is really possible).

Assuming you are executing the script in the directory where my_folder
belongs, how about:
while IFS= read -r -d "" f; do
ary+=("$f")
done < <(find "my_folder" -maxdepth 1 -mindepth 1 -type d -not -name "exclude" -printf "\"%f\"\0")
(IFS=","; echo "[${ary[*]}]")
[Explanations]
The -printf option of the find command (a GNU extension) specifies the output format. The format "\"%f\"\0"
prints the filename (excluding the leading directory name) wrapped in
double quotes and followed by a NUL character \0.
The NUL character is used as a filename delimiter, and the filenames
are split again in the read builtin by setting the delimiter
to the NUL character with -d "".
Then the filenames (with double quotes) are stored in the array ary
one by one.
Finally the echo "[${ary[*]}]" command prints out the elements of ary
separated by the first character of IFS. The whole output is surrounded by the square brackets [].
The last line is surrounded by parentheses () so that it is executed in a subshell.
The purpose is just not to overwrite the current IFS.
If you save the script in my answer as my_script.sh, then you can assign
a variable MY_VAR to the output by saying:
MY_VAR=$(./my_script.sh)
echo "$MY_VAR"
# or another_command "$MY_VAR" or whatever
Alternatively you can assign the variable within the script by modifying
the last line as:
MY_VAR=$(IFS=","; echo "[${ary[*]}]")
echo "$MY_VAR"
Hope this helps.
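If GNU find is not available, roughly the same string can be built with shell globbing alone. The following is a minimal sketch, not part of the answer above: list_subdirs is a hypothetical helper name, hidden directories are skipped by the glob, and it assumes that no directory name contains a double quote or a backslash (those would need escaping).

```shell
# Hypothetical helper: print the subdirectories of $1 as ["a", "b", ...],
# skipping the one named $2. The body runs in a subshell, so no variables leak.
list_subdirs() (
    out=
    for d in "$1"/*/; do
        [ -d "$d" ] || continue      # the glob matched nothing
        name=${d%/}                  # drop the trailing slash
        name=${name##*/}             # drop the leading path
        [ "$name" = "$2" ] && continue
        out="$out${out:+, }\"$name\""
    done
    printf '[%s]\n' "$out"
)
```

It can be used in one line, for example MY_VAR=$(list_subdirs my_folder exclude).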

In bash this can be done as follows. It's close, but it doesn't fit in one line.
Change the Internal Field Separator to a newline rather than a space, so that spaces in directory names do not split a name into several words.
Then perform the following:
List the directories, one per line
Use grep to remove the directory to be excluded
Iterate over the results:
Output the directory name with the last character removed
Pipe everything to xargs to recombine into a single line and store in $var
Trim the last , from ${var} and wrap in '[]'
IFS=$'\n'
var=`for d in \`ls -d1 */ | grep -v exclude_dir \`; do echo '\"'${d::-1}'\",' ; done | xargs`
echo '['${var::-1}']'

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple fasta file named as followed:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The fasta header look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially but it removed the ">" from the header which is essential for the fasta file. I tried to modify the "${file%%.*}" part to keep the carrot but it always called me out on bad substitutions.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the earlier header, or add to it. Both scenarios are easy to do. Don't replace text you don't want to replace.
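for file in ./*.faa;
do
name=${file##*/} # strip the leading ./ first, otherwise %%.* removes everything from the first dot
sed -i "s/^>.*/>${name%%.*}/" "$file"
done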
will replace the header, but include a leading > in the replacement, effectively preserving it; and
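for file in ./*.faa;
do
name=${file##*/} # strip the leading ./ first, otherwise %%.* removes everything from the first dot
sed -i "s/^>.*/&${name%%.*}/" "$file"
done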
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says on lines which match the regex ^>, replace the empty string at the end of the line $ with the file name.
Of course, your Awk script could easily be fixed, too. Standard Awk does not have an option to parallel the -i "in-place" option of sed, but you can easily use a temporary file:
for file in ./*.faa;
do
awk '/^>/{ name = FILENAME; sub(/^\.\//, "", name); sub(/\.faa$/, "", name); $0 = $0 " " name }1' "$file" >"$file.tmp" &&
mv "$file.tmp" "$file"
done
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
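As a sketch of the append variant wrapped into a reusable function (tag_fasta_headers is a hypothetical name, not from the answers here): the file name is passed to awk with -v rather than interpolated into a sed expression, so characters such as / or & in a file name cannot break the substitution. This assumes file names contain no backslashes, which awk -v would interpret as escape sequences.

```shell
# Hypothetical helper: append each file's base name (without .faa) to every
# header line (lines starting with ">") of the .faa files in directory $1.
tag_fasta_headers() (
    cd "$1" || exit 1
    for file in *.faa; do
        [ -f "$file" ] || continue      # no .faa files: skip the literal pattern
        name=${file%.faa}
        # Sequence lines pass through untouched; only headers get the suffix.
        awk -v name="$name" '/^>/ { $0 = $0 " " name } 1' "$file" >"$file.tmp" &&
            mv "$file.tmp" "$file"
    done
)
```

Calling tag_fasta_headers . in the directory holding the files rewrites them in place, so it is worth trying on a copy first.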
You just need to detect the pattern to replace and use regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
full_filename=${file##*/}
filename="${full_filename%.*}"
# escape special chars
filename=$(echo "$filename" | sed 's_/_\\/_g')
echo "adding file name: $filename to: $full_filename"
# only touch header lines (starting with >); sequence lines contain no # and would otherwise be overwritten entirely
sed -i -E "/^>/s/^[^#]+/>$filename /" "$location/$full_filename"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
further reading:
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
I suggest first identifying your files with the find command or the ls command.
find . -type f -name "*.faa" -printf "%f\n"
A find command that prints only regular files whose names end in .faa, including those in subdirectories of the current directory.
ls -1 *.faa
An ls command that prints the files and directories in the current directory whose names end in .faa. The pattern must not be quoted, otherwise the shell does not expand it.
Processing your files
Once you have the correct files list, iterate over the list and apply sed command.
for filePath in $(find . -type f -name "*.faa"); do # note: breaks on file names containing whitespace
fileName=${filePath##*/} # drop the directory part
strippedFileName=${fileName%.faa} # strip extension .faa
sed -i "1s|\$| $strippedFileName|" "$filePath" # append value of strippedFileName at end of line 1
done

How to rename string in multiple filename in a folder using shell script without mv command since it will move the files to different folder? [duplicate]

Write a simple script that will automatically rename a number of files. As an example we want the file *001.jpg renamed to user defined string + 001.jpg (ex: MyVacation20110725_001.jpg) The usage for this script is to get the digital camera photos to have file names that make some sense.
I need to write a shell script for this. Can someone suggest how to begin?
An example to help you get off the ground.
for f in *.jpg; do mv "$f" "$(echo "$f" | sed s/IMG/VACATION/)"; done
In this example, I am assuming that all your image files contain the string IMG and you want to replace IMG with VACATION.
The shell automatically evaluates *.jpg to all the matching files.
The second argument of mv (the new name of the file) is the output of the sed command that replaces IMG with VACATION.
If your filenames include whitespace pay careful attention to the "$f" notation. You need the double-quotes to preserve the whitespace.
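You can use the rename utility to rename multiple files by a pattern. Two different programs go by that name. With the Perl-based rename (common on Debian and Ubuntu), the following command prepends the string MyVacation2011_ to all the files with the jpg extension:
rename 's/^/MyVacation2011_/g' *.jpg
or, with the util-linux rename, which takes a plain string and its replacement instead of a Perl expression:
rename <pattern> <replacement> <file-list>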
In this example, I am assuming that all your image files begin with "IMG" and you want to replace "IMG" with "VACATION".
Solution: first identify all jpg files, then replace the keyword. The echo makes this a dry run that only prints the mv commands; remove it to actually rename.
find . -name '*jpg' -exec bash -c 'echo mv "$0" "${0/IMG/VACATION}"' {} \;
for file in *.jpg ; do mv "$file" "${file//IMG/myVacation}" ; done
Again assuming that all your image files have the string "IMG" and you want to replace "IMG" with "myVacation".
With bash you can directly convert the string with parameter expansion.
Example: if the file is IMG_327.jpg, the mv command will be executed as if you do mv IMG_327.jpg myVacation_327.jpg. And this will be done for each file found in the directory matching *.jpg.
IMG_001.jpg -> myVacation_001.jpg
IMG_002.jpg -> myVacation_002.jpg
IMG_1023.jpg -> myVacation_1023.jpg
etcetera...
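The same parameter expansion idea can be wrapped in a function with a dry-run switch. This is only a sketch under assumptions: rename_prefix and the DRY_RUN variable are hypothetical names, and the files are assumed to start with the old prefix. Because source and target are in the same directory, mv only renames the files and does not move them anywhere else.

```shell
# Hypothetical helper: in directory $1, rename every <old>*.jpg to <new>*.jpg.
# With DRY_RUN=1 the mv commands are printed instead of executed.
rename_prefix() (
    dir=$1 old=$2 new=$3
    cd "$dir" || exit 1
    for f in "$old"*.jpg; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        target=$new${f#"$old"}           # swap the leading prefix only
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'mv %s %s\n' "$f" "$target"
        else
            mv -- "$f" "$target"
        fi
    done
)
```

For example, DRY_RUN=1 rename_prefix . IMG myVacation previews the renames, and the same call without DRY_RUN=1 performs them.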
find . -type f |
sed -n "s/\(.*\)factory\.py$/& \1service\.py/p" |
xargs -p -n 2 mv
For example, this renames every file under the current directory whose name ends in "factory.py" so that it ends in "service.py" instead.
explanation:
In the sed command, the -n flag suppresses the normal behavior of echoing input to output after the s/// command is applied, and the p option on s/// forces writing to output if a substitution is made. Since a substitution is only made on a match, sed only produces output for files ending in "factory.py".
In the s/// replacement string, we use "& " to interpolate the entire matching string, followed by a space character, into the replacement. Because of this, it's vital that our regular expression matches the entire filename. After the space character, we use "\1service.py" to interpolate the string we captured before "factory.py", followed by "service.py", replacing it. So for more complex transformations you'll have to change the arguments to s/// (with a regular expression still matching the entire filename).
Example output:
foo_factory.py foo_service.py
bar_factory.py bar_service.py
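We use xargs with -n 2 to consume the output of sed two whitespace-delimited strings at a time, passing these to mv (I also put the -p option in there, which asks for confirmation before each command, so you can feel safe when running this). Note that this breaks on file names containing whitespace. Voilà.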
NOTE: If you are facing more complicated file and folder scenarios, this post explains find (and some alternatives) in greater detail.
Another option is:
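for i in *_001.jpg
do
echo "mv $i yourstring_${i##*_}"
done
Remove the echo (and the outer quotes) once the output looks right.
Parameter expansion with ## removes the longest prefix matching *_, so only the last part (001.jpg) is kept and you can put a new name in front of it. This assumes names such as IMG_001.jpg, with an underscore before the number.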
Can't comment on Susam Pal's answer but if you're dealing with spaces, I'd surround with quotes:
for f in *.jpg; do mv "$f" "$(echo "$f" | sed 's/ /-/g')"; done
You can try this:
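for file in *.jpg;
do
mv "$file" "${somestring}_${file: -7}"
done
See "Parameter Expansion" in man bash to understand the above better. ${file: -7} yields the last seven characters of the name (for example 001.jpg), and the braces in ${somestring} keep the following underscore from being read as part of the variable name.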

How to add sequential numbers say 1,2,3 etc. to each file name and also for each line of the file content in a directory?

I want to add sequential number for each file and its contents in a directory. The sequential number should be prefixed with the filename and for each line of its contents should have the same number prefixed. In this manner, the sequential numbers should be generated for all the files(for names and its contents) in the sub-folders of the directory.
I have tried using maxdepth, rename, print function as a part. but it throws error saying that "-maxdepth" - not a valid option.
I have already a part of code(to print the names and contents of text files in a directory) and this logic should be appended with it.
#!bin/bash
cd home/TESTING
for file in home/TESTING;
do
find home/TESTING/ -type f -name *.txt -exec basename {} ';' -exec cat {} \;
done
P.s - print, rename, maxdepth are not working
If the name of the first file is File1.txt and its contents is mentioned as "Louis" then the output for the filename should be 1File1.txt and the content should be as "1Louis".The same should be replaced with 2 for second file. In this manner, it has to traverse through all the subfolders in the directory and print accordingly. I have already a part of code and this logic should be appended with it.
There should be a fail-safe whenever you execute cd in a script. If you don't check that it succeeded, the following commands can run in the wrong directory.
In your attempt, the output would be the same even without the for loop: for file in home/TESTING passes only the single word home/TESTING to for, so the body runs once. With
for file in home/TESTING/* it would instead run once per entry in that directory.
I used find without -maxdepth, so it looks into all subdirectories for *.txt files as well. If you want only the current directory, $(find /home/TESTING/* -type f -name "*.txt") could be replaced with $(ls *.txt); as long as you have no directory whose name ends in .txt, there will be no problem.
#!/bin/bash
# try cd to directory, do things upon success.
if cd /home/TESTING ;then
# set sequence number
let "x = 1"
# loop over every file that find reports; subdirectories are searched too, as there is no -maxdepth.
# note: $(find ...) is split on whitespace, so file names containing spaces will break this loop.
for file in $(find /home/TESTING/* -type f -name "*.txt") ; do
# print sequence number, and base file name, processed by variable substitution.
# basename can be used as well but this is bash built in.
echo "${x}${file##*/}"
# print file content, and put sequence number before each line with stream editor.
sed 's#^#'"${x}"'#g' "${file}"
# increase sequence number with one.
let "x++"
done
# unset sequence number
unset 'x'
else
# print error on stderr
echo 'cd to /home/TESTING directory is failed' >&2
fi
Variable Substitution:
There are more; I only picked these four for now as they are similar.
${var#pattern} - Use the value of var after removing the shortest text that matches pattern from the left
${var##pattern} - Same as above, but remove the longest matching piece instead of the shortest
${var%pattern} - Use the value of var after removing the shortest text that matches pattern from the right
${var%%pattern} - Same as above, but remove the longest matching piece instead of the shortest
So ${file##*/} takes the value of $file and drops everything up to and including the last slash: * matches any characters, and ## selects the longest match. The value of $file itself is not modified by this, so it still contains the path and the file name.
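To see the four forms side by side, here is a small sketch; show_expansions and the sample path are made up for illustration.

```shell
# Print the four prefix/suffix removals for the path given as $1.
show_expansions() {
    file=$1
    echo "${file#*/}"    # shortest prefix matching */ removed
    echo "${file##*/}"   # longest prefix matching */ removed (the base name)
    echo "${file%.*}"    # shortest suffix matching .* removed (last extension)
    echo "${file%%.*}"   # longest suffix matching .* removed (from the first dot on)
}

show_expansions /home/TESTING/sub/archive.tar.gz
# home/TESTING/sub/archive.tar.gz
# archive.tar.gz
# /home/TESTING/sub/archive.tar
# /home/TESTING/sub/archive
```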
sed 's#^#'"${x}"'#g' ${file}: sed is a stream editor, and there are whole books about its usage. The script is usually placed in single quotes, so 's#^#1#g' adds 1 to the beginning of every line in a file. s is substitution, ^ is the beginning of the line, 1 is literal text, and g is global: without g, only the first match on each line is affected.
# is the separator; it can be another character as well, / for example. I closed the single quote to let the shell expand the variable, then reopened it.
If you would like to replace text, say .txt with .php, you can use sed 's#\.txt#\.php#g' file. In the pattern, . has a special meaning: it matches any single character, so it needs to be escaped with \ to be used as literal text. Otherwise not only file.txt will be matched but file1txt as well.
sed can also read from a pipe, in which case you do not need to specify a file name; otherwise you have to provide at least one file name. In our case that was the ${file} variable. As I mentioned, variable substitution does not modify the variable's value, so it still contains the file name with its path.
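The loop over $(find ...) in the script above splits on whitespace. Below is a sketch of the same numbering that reads the output of find line by line instead. number_files is a hypothetical name, it assumes file names contain no newlines, and sort is added only to make the numbering order predictable.

```shell
# Hypothetical helper: number every *.txt file below directory $1 and prefix
# each of its lines with the same number.
number_files() {
    find "$1" -type f -name '*.txt' | sort | {
        n=1
        while IFS= read -r file; do
            printf '%s%s\n' "$n" "${file##*/}"   # numbered file name
            sed "s/^/$n/" "$file"                # same number before every line
            n=$((n + 1))
        done
    }
}
```

This only prints the numbered names and contents, as the original script does; it does not rename or modify any file.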

Listing directories with spaces using Bash in linux

I would like to create a bash script to list all the directories in a directory provided by the user via input, or all the directories in the current directory (given no input).
Here's what I have thus far, but when I execute it I encounter two problems.
1) The script completely ignores my input. The file is located on my desktop but when I type in "home" as the input, the script simply prints the directories of the Desktop (current directory).
2) The directories are printed on their own lines (intended) but it treats each word in a folder name as its own folder. i.e. "this folder" is printed as:
this
folder
Here's the code I have so far:
#!/bin/bash
echo -n "Enter a directory to load files: "
read d
if [ $d="" ]; #if input is blank, assume d = current directory
then d=${PWD##*/}
for i in $(ls -d */);
do echo ${i%%/};
done
else #otherwise, print sub-directories of given directory
for i in $(ls -d */);
do echo ${i%%/};
done
fi
Also in your response please explain your answer as I'm very new to bash.
Thanks for looking, I appreciate your time.
EDIT: Thanks to John1024's answer, I came up with the following:
#!/bin/bash
echo -n "Enter a directory to load files: "
IFS= read d
ls -1 -d "${d:-.}"/*/
And it does everything I need. Much appreciated!
I believe that this script accomplishes what you want:
#!/bin/sh
ls -1 -d "${1:-.}"/*/
Usage example:
$ bash ./script.sh /usr/X11R6
/usr/X11R6/bin
/usr/X11R6/man
Explanation:
-1 tells ls to print each file/directory on a separate line
-d tells ls to list directories by name instead of their contents
The shell will expand ${1:-.} to the first argument to the script if there is one, or to . (which means the current directory) if there isn't.
Enhancement
The above script displays a / at the end of each directory name. If you don't want that, we can use sed to remove trailing slashes from the output:
#!/bin/sh
ls -1d "${1:-.}"/*/ | sed 's|/$||'
Revised Version of Your Script
Starting with your script, some simplifications can be made:
#!/bin/bash
echo -n "Enter a directory to load files: "
IFS= read d
d=${d:-$PWD}
for i in "$d"/*/
do
echo "${i%/}"
done
Notes:
IFS= read d
Normally leading and trailing white space are stripped before the input is assigned to d. By setting IFS to an empty value, however, leading and trailing white space will be preserved. Thus this will work even in the pathologically strange case where the user specifies a directory whose name begins or ends with white space.
If the user enters a backslash, the shell will try to process it as an escape. If you don't like that, use IFS= read -r d and backslashes will be treated as normal characters, not escapes.
d=${d:-$PWD}
If the user supplied a value for d, this leaves it unchanged. Otherwise, this sets d to $PWD.
for i in "$d"/*/
This will loop over every subdirectory of $d and will correctly handle subdirectory names with spaces, tabs, or any other odd character.
By contrast, consider:
for i in $(ls -d */)
After ls executes here, the shell will split up the output into individual words. This is called "word splitting" and is why this form of the for loop should be avoided.
Notice the double-quotes in for i in "$d"/*/. They are there to prevent word splitting on $d.
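The difference is easy to observe by counting loop iterations. The sketch below (count_iterations is a made-up name) works in a scratch directory created with mktemp and assumes the default IFS.

```shell
# Count loop iterations for both loop styles in a scratch directory that
# holds two subdirectories, one of them with a space in its name.
count_iterations() (
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    mkdir "this folder" plain
    split=0
    for i in $(ls -d */); do split=$((split + 1)); done   # ls output is word-split
    glob=0
    for i in */; do glob=$((glob + 1)); done              # one word per match
    cd / && rm -rf "$tmp"
    echo "command substitution: $split, glob: $glob"
)

count_iterations   # prints "command substitution: 3, glob: 2"
```

The command substitution loop sees "this" and "folder/" as separate words, which is the behaviour described in the question.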

Get and print directories from $PATH in bash

The script that I have to write must find the directories from the $PATH variable and print only the ones that end with an i.
How am I thinking about doing it
Get each directory from the variable with a for loop.
Find the length of each directory and get the last character from each using a substring
Use an If condition to print the directories that end with an i
Problems
The directories are not separated with a new line and I can't read them using a for loop.
Any ideas on how to get over this problem, or can you think of something more appropriate?
You can use this BASH one-liner for that job:
(IFS=':'; for i in $PATH; do [[ -d "$i" && $i =~ i$ ]] && echo "$i"; done)
IFS=':' sets the internal field separator to :
$PATH is iterated in a for loop
Each path element is tested to see whether it is a directory and whether it ends with i, using a BASH regex
If the test passes, it is printed
Use bash's parameter expansion to replace all delimiters.
${parameter//pat/string}
For example,
mypaths="${PATH//:/ }"
will split the path by directory, so then you can run:
for directory in $mypaths
do
...
done
You can change the Internal Field Separator (IFS) to a colon, then the path is dissected auto_magically. ;-)
IFS=:
for i in $PATH
do
echo "$i" | grep -E 'i$'
done
grep 'i$' <<<"${PATH//:/$'\n'}"
The $PATH entries are split into individual lines by replacing : instances with newlines ($'\n') in a parameter expansion; $'\n' is an ANSI C-quoted string.
The resulting string is passed to the stdin of grep as a here-string
(<<<...).
grep is then used to match only those lines that end in ($) the letter i.
To match case-insensitively, use grep -i 'i$'.
A demonstration:
$ (PATH='/ends/in_i:/usr/bin:/also/ends_in_i'; grep 'i$' <<<"${PATH//:/$'\n'}")
/ends/in_i
/also/ends_in_i
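A sketch of the same filter as a function that uses only POSIX shell features; dirs_ending_in is a hypothetical name. Unlike the [[ -d ... ]] one-liner above, it does not check that each entry exists as a directory.

```shell
# Hypothetical helper: print the entries of a colon-separated list ($1, for
# example "$PATH") that end in the string $2. Runs in a subshell, so the
# changes to IFS and globbing do not affect the caller.
dirs_ending_in() (
    IFS=:
    set -f                       # no pathname expansion while the list is split
    for dir in $1; do
        case $dir in
            *"$2") printf '%s\n' "$dir" ;;
        esac
    done
)

dirs_ending_in '/ends/in_i:/usr/bin:/also/ends_in_i' i
# /ends/in_i
# /also/ends_in_i
```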
