Shell Script With sed and Random number - linux

How to make a shell script that receives one or more text files and removes from them whitespaces and blanklines. After that new files will have a random 2-digit number in front of them.
For example File1.txt generates File1_56.txt
Tried this:
#!/bin/bash
for file in "$*"; do
sed -e '/^$/d;s/[[:blank:]]//g' $* >> "$*_$$.txt"
done
But when I give 2 files as input script merges them into one single file, when I want for each file a separate one.

Try:
#!/bin/bash
for file in "$@"; do
sed -e '/^$/d;s/[[:blank:]]//g' "$file" >> "${file%.txt}_$$.txt"
done
Notes
To loop over each argument without word splitting or other hazards, use for file in "$@", not for file in "$*".
To run the sed command on one file instead of all, specify "$file" as the file, not $*.
To save the output to the correct file, use "${file%.txt}_$$.txt" where ${file%.txt} is an example of suffix removal: it removes the final .txt from the file name.
$$ is the process ID, so it is the same for every file in one run. The title mentions a "random" number. If you want a random number, replace $$ with $RANDOM (in bash, an integer from 0 to 32767).
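Since $RANDOM is not always two digits, here is a minimal sketch of how the 2-digit requirement could be met. The function name clean_files is made up for illustration, and the arithmetic $((RANDOM % 90 + 10)) is just one way to land in the range 10 to 99. It writes with > instead of >>, on the assumption that each output file should start empty; two runs can still pick the same number by chance.

```shell
#!/bin/bash
# Sketch: strip blank lines and all blanks from each input file and
# write the result to NAME_NN.txt, where NN is a random number 10..99.
# clean_files is a hypothetical helper name, not from the answer above.
clean_files() {
    local file num
    for file in "$@"; do
        num=$((RANDOM % 90 + 10))   # always two digits
        sed -e '/^$/d;s/[[:blank:]]//g' "$file" > "${file%.txt}_${num}.txt"
    done
}

clean_files "$@"
```

Called as ./script.sh File1.txt File2.txt, this would produce one output file per input, for example File1_56.txt and File2_31.txt.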

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple fasta files named as follows:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The fasta headers look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header, since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially but it removed the ">" from the header which is essential for the fasta file. I tried to modify the "${file%%.*}" part to keep the carrot but it always called me out on bad substitutions.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the earlier header, or add to it. Both scenarios are easy to do. Don't replace text you don't want to replace.
for file in ./*.faa;
do
name=${file##*/} # drop the leading ./ first, otherwise %%.* would strip the whole string
sed -i "s/^>.*/>${name%%.*}/" "$file"
done
will replace the header, but include a leading > in the replacement, effectively preserving it; and
for file in ./*.faa;
do
name=${file##*/} # drop the leading ./ first, otherwise %%.* would strip the whole string
sed -i "s/^>.*/&${name%%.*}/" "$file"
done
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says on lines which match the regex ^>, replace the empty string at the end of the line $ with the file name.
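To make the last variation easier to try out, here is a small sketch wrapped in a function. The name tag_headers and the directory argument are my own additions, and it assumes GNU sed (for -i) and file names that contain no & or other characters special to sed's replacement text.

```shell
# Sketch: append each file's base name to every FASTA header line.
# tag_headers is a hypothetical helper; assumes GNU sed for -i.
tag_headers() {
    local dir=$1 file name
    for file in "$dir"/*.faa; do
        [ -e "$file" ] || continue   # the glob matched nothing
        name=${file##*/}             # drop the directory part
        name=${name%%.*}             # drop everything from the first dot
        sed -i "/^>/s/\$/ $name/" "$file"
    done
}
```

Running tag_headers . in the directory with the .faa files would turn a header such as >BC-1_k127_3926653_6 # 4457 ... into the same line followed by a space and BC-1_bin_1_genes, while leaving sequence lines untouched.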
Of course, your Awk script could easily be fixed, too. Standard Awk does not have an option to parallel the -i "in-place" option of sed, but you can easily use a temporary file:
for file in ./*.faa;
do
awk '/^>/{ $0 = $0 " " FILENAME; sub(/\.faa$/, "") }1' "$file" >"$file.tmp" &&
mv "$file.tmp" "$file"
done
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
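If you would rather keep what your original Awk attempt did (put the file name right after the >), a sketch along these lines might work. The function name prefix_headers is hypothetical, and passing the name with -v assumes it contains no backslashes, which Awk would interpret as escapes.

```shell
# Sketch: prefix every FASTA header with the file's base name,
# going through a temporary file because standard Awk has no in-place mode.
# prefix_headers is a hypothetical helper name.
prefix_headers() {
    local file name
    for file in "$@"; do
        name=${file##*/}     # drop any directory part
        name=${name%.faa}    # drop the .faa extension
        awk -v name="$name" '/^>/ { $0 = ">" name "_" substr($0, 2) } 1' \
            "$file" > "$file.tmp" &&
            mv -- "$file.tmp" "$file"
    done
}
```

Usage would be prefix_headers ./*.faa. The header is rebuilt with substr instead of sub so that an & in the file name is not treated specially.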
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
You just need to detect the pattern to replace and use regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
full_filename=${file##*/}
filename="${full_filename%.*}"
# escape special chars
filename=$(echo "$filename" | sed 's_/_\\/_g')
echo "adding file name: $filename to: $full_filename"
# only touch header lines; sequence lines contain no # and would otherwise be overwritten
sed -i -E "/^>/s/^[^#]+/>$filename /" "$location/$full_filename"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
Further reading:
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
First identify your files with the find command or the ls command.
find . -type f -name "*.faa" -printf "%f\n"
This find command prints only the names of regular files with the extension .faa, including those in subdirectories of the current directory.
ls -1 *.faa
This ls command lists entries with the extension .faa in the current directory only. The glob must not be quoted, otherwise ls looks for a file literally named *.faa.
Processing your files
Once you have the correct list of files, iterate over it and apply the sed command.
find . -type f -name "*.faa" | while IFS= read -r path; do
fileName=${path##*/} # file name without the directory part
stripedFileName=${fileName%%.*} # strip extension .faa
sed -i "1s|\$| $stripedFileName|" "$path" # append value of stripedFileName at end of line 1
done
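If file or directory names may contain spaces, a null-delimited variant of the same idea is safer. This is only a sketch: tag_first_line is a made-up name, it needs bash for read -d '', and it assumes GNU sed for -i. Like the loop above, it only changes line 1 of each file.

```shell
# Sketch: append the base name to line 1 of every .faa file under a
# root directory, coping with spaces in paths.
# tag_first_line is a hypothetical helper; assumes bash and GNU sed.
tag_first_line() {
    local root=$1 path name
    find "$root" -type f -name '*.faa' -print0 |
    while IFS= read -r -d '' path; do
        name=${path##*/}    # file name without directories
        name=${name%%.*}    # strip from the first dot onwards
        sed -i "1s|\$| $name|" "$path"
    done
}
```

Usage: tag_first_line . from the top of the directory tree.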

How to auto insert a string in filename by bash?

I have the output file day by day:
linux-202105200900-foo.direct.tar.gz
The date and time string, ex: 202105200900 will change every day.
I need to manually rename these files to
linux-202105200900x86-foo.direct.tar.gz
( insert a short string x86 after date/time )
any bash script can help to do this?
If you're always inserting the string "x86" after the 18th character of the string, you can use this:
var="linux-202105200900-foo.direct.tar.gz"
var2=${var:0:18}"x86"${var:18}
echo $var2
The 2nd line means: "assign to variable var2 the first 18 characters of var, followed by x86 followed by the rest of the variable var"
If you want to insert "x86" just before the last hyphen in the string, you may write it like this:
var="linux-202105200900-foo.direct.tar.gz"
var2=${var%-*}"x86-"${var##*-}
echo $var2
The 2nd line means: "assign to variable var2:
the content of the variable var after removing the shortest matching pattern "-*" at the end,
the string "x86-",
the content of the variable var after removing the longest matching pattern "*-" at the beginning".
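Since the question is about renaming files rather than just building a string, the second form could be wrapped in a loop roughly like this. The function name add_arch is hypothetical, and the check for x86- is my own addition so that running it twice does not insert the string twice.

```shell
# Sketch: insert "x86" before the last hyphen of each given file name.
# add_arch is a hypothetical helper name.
add_arch() {
    local f
    for f in "$@"; do
        case $f in
            *x86-*) continue ;;                        # already renamed, skip
            *-*) mv -- "$f" "${f%-*}x86-${f##*-}" ;;   # rename
        esac
    done
}
```

Run from the directory that holds the files, for example add_arch linux-*-foo.direct.tar.gz.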
In addition to the very good answer by @Jean-Loup Sabatier, another, perhaps more general, way is to replace the second occurrence of '-' with x86-, which you can do with sed. Let's say you have:
fname=linux-202105200900-foo.direct.tar.gz
You can update that with:
fname="$(sed 's/-/x86-/2' <<< "$fname")"
This uses a command substitution with sed and a here-string to modify fname, assigning the modified result back to fname.
Example Use/Output
$ fname=linux-202105200900-foo.direct.tar.gz
$ fname="$(sed 's/-/x86-/2' <<< "$fname")"
$ echo "$fname"
linux-202105200900x86-foo.direct.tar.gz
Do you need this?
❯ dat=$(date '+%Y%m%d%H%M%S'); echo ${dat}
20210520170336
❯ filename="linux-${dat}x86-foo.direct.tar.gz"; echo ${filename}
linux-20210520170336x86-foo.direct.tar.gz
I wanted to keep it as simple as possible. Since only the timestamp is going to change, this script should do it. Just run it inside the folder where the files are located and all of them will be renamed with x86.
#!/bin/bash
for file in *; do
replaced=$(echo "$file" | sed 's|-foo|x86-foo|g')
[ "$file" = "$replaced" ] || mv "$file" "$replaced"
done
This is my output
filip@filip-ThinkPad-T14-Gen-1:~/test$ ls
linux-202105200900-foo.direct.tar.gz linux-202105201000-foo.direct.tar.gz linux-202105201100-foo.direct.tar.gz
filip@filip-ThinkPad-T14-Gen-1:~/test$ ./../development/bash-utils/bulk-rename.sh
filip@filip-ThinkPad-T14-Gen-1:~/test$ ls
linux-202105200900x86-foo.direct.tar.gz linux-202105201000x86-foo.direct.tar.gz linux-202105201100x86-foo.direct.tar.gz
Simply iterate through all the files in the current folder, pipe each name to sed to replace -foo with x86-foo, then rename the file with the mv command.
As David mentioned in a comment, if you're worried that there could be multiple occurrences of -foo, replace the g (global) flag with 1 so that only the first occurrence is changed.
There is also the rename utility (https://man7.org/linux/man-pages/man1/rename.1.html), you could use:
rename -v 0-foo.direct.tar.gz 0x86-foo.direct.tar.gz *
which results in
`linux-202105200900-foo.direct.tar.gz' -> `linux-202105200900x86-foo.direct.tar.gz'
`linux-202205200900-foo.direct.tar.gz' -> `linux-202205200900x86-foo.direct.tar.gz'
`linux-202305200900-foo.direct.tar.gz' -> `linux-202305200900x86-foo.direct.tar.gz'
In addition to the very good answer by @David C. Rankin, here it is in a loop that renames the files:
#!/usr/bin/bash
for file in linux* # All files starting with linux
do
[ -e "$file" ] || continue # Nothing matched the glob
fname="$(sed 's/-/x86-/2' <<< "$file")"
mv "$file" "$fname" # Rename file
echo "$fname"
done
Output received:
linux-202105200900x86-foo.direct.tar.gz

rename all files after extension

Is it possible to write a script to rename all files after the extension?
Example in the folder, there are :
hello.txt-123ahr
bye.txt-56athe
test.txt-98hg12
I want the output:
hello.txt
bye.txt
test.txt
If you just want to remove everything from the dash forwards, you can use Parameter expansion:
#!/bin/bash
for file in *.txt-* ; do
mv "$file" "${file%-*}"
done
Where ${file%-*} means "remove from $file everything from the last dash onwards". If you want to start from the first dash, use %% instead of %.
Note that you might overwrite some files if their leading parts are equivalent, e.g. hello.txt-123abc and hello.txt-456xyz.
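One way to guard against that, as a sketch: check whether the target already exists and skip the rename if it does. The function name strip_suffix is made up, and it works on the current directory only.

```shell
# Sketch: rename NAME.txt-SUFFIX to NAME.txt, refusing to overwrite
# an existing target. strip_suffix is a hypothetical helper name.
strip_suffix() {
    local file target
    for file in *.txt-*; do
        [ -e "$file" ] || continue   # the glob matched nothing
        target=${file%-*}
        if [ -e "$target" ]; then
            echo "skipping $file: $target already exists" >&2
        else
            mv -- "$file" "$target"
        fi
    done
}
```

With hello.txt-123abc and hello.txt-456xyz both present, the first becomes hello.txt and the second is left alone with a message on stderr.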

For loop in command line runs bash script reading from text file line by line

I have a bash script which asks for two arguments with a space between them. Now I would like to automate filling out the prompt in the command line with reading from a text file. The text file contains a list with the argument combinations.
So something like this in the command line I think;
for line in 'cat text.file' ; do script.sh ; done
Can this be done? What am I missing/doing wrong?
Thanks for the help.
A while loop is probably what you need. Put the space separated strings in the file text.file :
cat text.file
bingo yankee
bravo delta
Then write the script in question like below.
#!/bin/bash
while read -r arg1 arg2
do
/path/to/your/script.sh "$arg1" "$arg2"
done<text.file
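One thing to watch for: if your script itself reads from standard input (the question mentions a prompt), it will swallow the remaining lines of text.file. A sketch that avoids this reads the list on file descriptor 3 instead. The function name run_pairs is hypothetical, and the command to run is passed in as arguments.

```shell
# Sketch: run a command once per line of a list file, passing the two
# whitespace-separated fields as arguments. The list is read on fd 3 so
# the command keeps its normal stdin. run_pairs is a hypothetical name.
run_pairs() {
    local list=$1 arg1 arg2
    shift
    while read -r arg1 arg2 <&3; do
        [ -n "$arg1" ] || continue   # skip empty lines
        "$@" "$arg1" "$arg2"
    done 3< "$list"
}
```

Usage: run_pairs text.file /path/to/your/script.sh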
Don't use for to read files line by line
Try something like this:
#!/bin/bash
ARGS=
while IFS= read -r line; do
ARGS="${ARGS} ${line}"
done < ./text.file
script.sh "$ARGS"
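This appends each line to a variable, which is then passed to your script. Note that because "$ARGS" is quoted, the script receives everything as one single argument; leave it unquoted if the words should be split into separate arguments.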
'cat text.file' is a string literal; $(cat text.file) would expand to the output of the command. However, cat is unnecessary because bash can read a file using redirection. Also, with quotes the output is treated as a single argument, and without quotes it is split at spaces, tabs and newlines.
Bash syntax to read a file line by line (this will be slow for big files):
while IFS= read -r line; do ... "$line"; done < text.file
Setting IFS to the empty string for the read command preserves leading and trailing whitespace.
The -r option keeps backslashes literal.
Another way to read a whole file is content=$(<file); note the < inside the command substitution. So a creative way to read a file into an array, with each element a non-empty line:
read_to_array () {
local oldsetf=${-//[^f]} oldifs=$IFS
set -f
IFS=$'\n' array_content=($(<"$1")) IFS=$oldifs
[[ $oldsetf ]]||set +f
}
read_to_array "file"
for element in "${array_content[@]}"; do ...; done
oldsetf used to store current set -f or set +f setting
oldifs used to store current IFS
IFS=$'\n' to split on newlines (multiple newlines will be treated as one)
set -f avoid glob expansion for example in case line contains single *
note () around $() to store the result of splitting to an array
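If you have bash 4 or newer, the mapfile builtin (also called readarray) does much the same with less ceremony. This is a sketch, with read_lines as a made-up wrapper name. Unlike the function above, it keeps empty lines as empty elements, and it never performs glob expansion, so no set -f juggling is needed.

```shell
# Sketch: read a file into the global array array_content, one element
# per line. Requires bash 4+ for mapfile. read_lines is a hypothetical name.
read_lines() {
    mapfile -t array_content < "$1"   # -t strips the trailing newline
}

# Example use:
#   read_lines text.file
#   for element in "${array_content[@]}"; do printf '%s\n' "$element"; done
```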
If I were to build a solution from the literal wording of your question (a for loop that parses lines from a file), I would iterate over the number of lines in the file (if it isn't too large).
Assuming each line has two strings separated by a single space (to be used as positional parameters in your script):
file="$1"
f_count="$(wc -l < "$file")"
for line in $(seq 1 "$f_count")
do
script.sh $(head -n "$line" "$file" | tail -n1)
done
You may have a much better time using sjsam's solution however.

Linux bash output directory files to a text file with xargs and add new line

I want to generate a text file with the list of files present in the folder
ls | xargs echo > text.txt
I want to prepend the IP address to each file so that I can run parallel wget as per this post : Parallel wget in Bash
So my text.txt file content will have these lines :
123.123.123.123/file1
123.123.123.123/file2
123.123.123.123/file3
How can I append a string as the ls feeds xargs? (and also add line break at the end.)
Thank you
Simply use printf and globbing to get the filenames:
printf '123.123.123.123/%s\n' * >file.txt
Or, a longer approach, use a for loop with globbing:
for f in *; do echo "123.123.123.123/$f"; done >file.txt
This assumes no filename contains a newline.
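If the address or the directory changes from run to run, the same idea can be parameterised. A sketch, with list_urls as a hypothetical function name and both arguments being my own additions:

```shell
# Sketch: print PREFIX/NAME for every entry in a directory.
# list_urls is a hypothetical helper; the directory defaults to the current one.
list_urls() {
    local prefix=$1 dir=${2:-.} f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue               # empty directory
        printf '%s/%s\n' "$prefix" "${f##*/}"
    done
}
```

Usage: list_urls 123.123.123.123 . > ../text.txt. Writing the list outside the listed directory keeps the list file from showing up in its own output.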

Resources