Linux bash output directory files to a text file with xargs and add new line - linux

I want to generate a text file with the list of files present in the folder
ls | xargs echo > text.txt
I want to prepend the IP address to each file so that I can run parallel wget as per this post : Parallel wget in Bash
So my text.txt file content will have these lines :
123.123.123.123/file1
123.123.123.123/file2
123.123.123.123/file3
How can I append a string as the ls feeds xargs? (and also add line break at the end.)
Thank you

Simply use printf and globbing to get the filenames:
printf '123.123.123.123/%s\n' * >file.txt
Or longer approach, leverage a for construct with help from globbing:
for f in *; do echo "123.123.123.123/$f"; done >file.txt
Assuming no filename with newline exists.
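A minimal sketch of the printf approach, run in a scratch directory so it is safe to try. The scratch directory, the three file names, and the out variable are made up for illustration. The list is written outside the listed directory because once file.txt exists from an earlier run, * will match it too.

```shell
#!/usr/bin/env bash
# Hypothetical demo: build "IP/filename" lines for every file in a directory.
demo_dir=$(mktemp -d)            # scratch directory (assumption, not from the question)
touch "$demo_dir"/file1 "$demo_dir"/file2 "$demo_dir"/file3
out=$(mktemp)                    # the list is written outside demo_dir

cd "$demo_dir" || exit 1
shopt -s nullglob                # an empty directory yields no words instead of a literal '*'
files=(*)
if (( ${#files[@]} > 0 )); then
    printf '123.123.123.123/%s\n' "${files[@]}" > "$out"
fi
cat "$out"
```

The nullglob guard is optional; without it, an empty directory would produce the single line 123.123.123.123/*.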

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple fasta files named as follows:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The fasta headers look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially but it removed the ">" from the header which is essential for the fasta file. I tried to modify the "${file%%.*}" part to keep the carrot but it always called me out on bad substitutions.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the earlier header, or add to it. Both scenarios are easy to do. Don't replace text you don't want to replace.
for file in ./*.faa;
do
name=${file##*/} # strip the leading ./ so that %%.* does not remove the whole string
sed -i "s/^>.*/>${name%%.*}/" "$file"
done
will replace the header, but include a leading > in the replacement, effectively preserving it; and
for file in ./*.faa;
do
name=${file##*/} # strip the leading ./ so that %%.* does not remove the whole string
sed -i "s/^>.*/&${name%%.*}/" "$file"
done
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says on lines which match the regex ^>, replace the empty string at the end of the line $ with the file name.
Of course, your Awk script could easily be fixed, too. Standard Awk does not have an option to parallel the -i "in-place" option of sed, but you can easily use a temporary file:
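for file in ./*.faa;
do
# name is FILENAME without the leading ./ and the trailing .faa
awk '/^>/{ name = FILENAME; sub(/^\.\//, "", name); sub(/\.faa$/, "", name); $0 = $0 " " name }1' "$file" >"$file.tmp" &&
mv "$file.tmp" "$file"
done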
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
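As a hedged illustration of the "append to the header" variant, the sketch below runs it on a throwaway file. The sequence lines and the shortened headers are invented sample data, and the temporary-file-plus-mv step is used here only so the example does not depend on GNU sed's -i.

```shell
#!/usr/bin/env bash
# Hypothetical demo: append the file's base name to every FASTA header line.
demo_dir=$(mktemp -d)
cat > "$demo_dir/BC-1_bin_1_genes.faa" <<'EOF'
>BC-1_k127_3926653_6 # 4457 # 5341
MKTAYIAK
>BC-1_k127_3926653_7 # 5400 # 6000
MSLLTEVE
EOF

for file in "$demo_dir"/*.faa; do
    name=${file##*/}             # drop the directory part
    name=${name%%.*}             # drop everything from the first dot
    # Only lines starting with '>' are changed; '$' anchors the end of the line.
    sed "/^>/s/\$/ $name/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done
cat "$demo_dir/BC-1_bin_1_genes.faa"
```

Both header lines gain the suffix BC-1_bin_1_genes while the sequence lines stay untouched.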
You just need to detect the pattern to replace and use regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
full_filename=${file##*/}
filename="${full_filename%.*}"
# escape special chars
filename=$(printf '%s\n' "$filename" | sed 's_/_\\/_g')
echo "adding file name: $filename to: $full_filename"
# match only header lines (starting with >), so sequence lines are left alone
sed -i -E "s/^>[^#]+/>$filename /" "$location/$full_filename"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
Further reading:
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
I suggest first identifying your files with the find command or the ls command.
find . -type f -name "*.faa" -printf "%f\n"
This find command prints only the names of files with the extension .faa, including those in subdirectories of the current directory.
ls -1 *.faa
This ls command prints files and directories with the extension .faa in the current directory.
Processing your files
Once you have the correct files list, iterate over the list and apply sed command.
for fileName in $(find . -type f -name "*.faa" -printf "%f\n"); do
stripedFileName=${fileName/.*/} # strip extension .faa
sed -i "1s|\$| $stripedFileName|" "$fileName" # append value of stripedFileName at end of line 1
done

Shell Script With sed and Random number

How to make a shell script that receives one or more text files and removes from them whitespaces and blanklines. After that new files will have a random 2-digit number in front of them.
For example File1.txt generates File1_56.txt
Tried this:
#!/bin/bash
for file in "$*"; do
sed -e '/^$/d;s/[[:blank:]]//g' $* >> "$*_$$.txt"
done
But when I give 2 files as input script merges them into one single file, when I want for each file a separate one.
Try:
#!/bin/bash
for file in "$@"; do
sed -e '/^$/d;s/[[:blank:]]//g' "$file" >> "${file%.txt}_$$.txt"
done
Notes
To loop over each argument without word splitting or other hazards, use for file in "$@" not for file in "$*"
To run the sed command on one file instead of all, specify "$file" as the file, not $*.
To save the output to the correct file, use "${file%.txt}_$$.txt" where ${file%.txt} is an example of suffix removal: it removes the final .txt from the file name.
$$ is the process ID. The title mentions a "random" number. If you want a random number, replace $$ with $RANDOM.
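The difference between "$*" and "$@" is easy to see with a small counter. This is only a sketch: count_args is a hypothetical helper, the two file names are made up, and the modulo arithmetic is one assumed way to squeeze $RANDOM into two digits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

set -- "File 1.txt" "File2.txt"      # pretend the script was called with two files

star_count=$(count_args "$*")        # "$*" joins everything into one word
at_count=$(count_args "$@")          # "$@" keeps each argument as its own word

# $RANDOM ranges over 0..32767; this maps it onto 10..99 (two digits).
suffix=$(( RANDOM % 90 + 10 ))
new_name="${1%.txt}_${suffix}.txt"

echo "\"\$*\" -> $star_count word(s), \"\$@\" -> $at_count word(s)"
echo "$new_name"
```

With "$*" the loop body would run once on the joined string, which is why the original script merged both inputs into a single output file.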

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append (copy) the content of each file in path A to the end of the file with the same name and format in path B, without erasing the existing content of the file in path B.
Example: content of $HOME/initial/*.desktop appended to $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
Firstly, I would backup $HOME/initial and $HOME/scripts, because there is lots of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd $HOME/initial/baby.desktop
cat SomeFile >> $HOME/scripts/baby.desktop/SomeFile
where SomeFile is the name of any file you choose to test with. I would test that has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd $HOME/initial/baby.desktop
for SOURCE in *
do
DESTINATION="$HOME/scripts/baby.desktop/$SOURCE"
echo Appending "$SOURCE" to "$DESTINATION"
#cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
I solved it. For anyone who wants to know how, it is very simple:
using sed
I needed only the matching line (for example Icon=/usr/share/some_picture.png) copied from $HOME/initial/example.desktop to the file with the same name and format, $HOME/scripts/example.desktop, but I had a lot of .desktop files (2500 files).
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do sed -ne '/Icon=/ p' $i >> $HOME/scripts/$i ; done
_________
If you only need to copy everything to the other file with the same name and format
using cat
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do cat $i >> $HOME/scripts/$i ; done
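A glob-based sketch of the same append, without parsing the output of ls or grep -l. The two scratch directories stand in for $HOME/initial and $HOME/scripts, and the file contents are invented for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical layout: two scratch directories standing in for
# $HOME/initial and $HOME/scripts.
root=$(mktemp -d)
mkdir "$root/initial" "$root/scripts"
printf 'Name=Baby\nIcon=/usr/share/some_picture.png\n' > "$root/initial/baby.desktop"
printf '[Desktop Entry]\nExec=baby\n' > "$root/scripts/baby.desktop"

for src in "$root"/initial/*.desktop; do
    dest="$root/scripts/${src##*/}"      # same file name, other directory
    [ -f "$dest" ] || continue           # skip files without a counterpart
    grep 'Icon=' "$src" >> "$dest"       # '>>' appends; '>' would overwrite
done
cat "$root/scripts/baby.desktop"
```

Replacing the grep line with cat "$src" >> "$dest" appends the whole file instead of only the Icon= line.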

Pass string to script

I have a script, download, that takes a string and checks if a file has the filename of the string. If it doesn't, it then downloads it. All the filenames are in a file.
This command is not working:
cat filenames | ./download
Download source:
filename=$1
if [ ! -f $1 ];
then
wget -q http://www.example.com/nature/life/${filename}.rdf
fi
Sample filename file:
file1
file2
file3
file4
How do I pass the command output from the cat to the download script?
In your script $1 is the positional arg on the command line. ./download somefile would work, but cat filename | ./download streams the data into download, which you ignore.
You should read the advanced bash scripting guide, which will give you a good base for how bash scripting works. To fix this, change your command to:
cat filename | xargs -n 1 ./download
This will run ./download for each filename in your list. However, the filenames may have spaces or other special characters in them, which would break your script. You should look into alternative ways of doing this, to avoid these problems.
Specifically, use a while loop to read your file. This properly escapes your filenames on each line, if they were input into the file correctly. That way, you avoid the problems cat would have with filenames like: fi/\nle.
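A sketch of such a while loop. To keep it runnable without network access, download_one is a hypothetical stand-in for ./download that only prints what it would fetch, and the list file is a temporary file with made-up names.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./download: records the name instead of calling wget.
download_one() {
    printf 'would fetch: %s.rdf\n' "$1"
}

list=$(mktemp)
printf '%s\n' 'file1' 'file 2' 'file3' > "$list"   # one name per line, one with a space

results=()
# IFS= keeps leading/trailing blanks, -r keeps backslashes literal.
while IFS= read -r name; do
    [ -n "$name" ] || continue                      # ignore empty lines
    results+=("$(download_one "$name")")
done < "$list"

printf '%s\n' "${results[@]}"
```

Each line of the list reaches the function as exactly one argument, including the name that contains a space.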
You can pass your script the name of a file that contains the file names:
./download filenames
And then loop through file names from the file name in $1:
#!/bin/bash
# Do sanity check
fname=$1
for f in $(<$fname); do
if [ ! -f "$f.rdf" ]; then
wget -q http://www.example.com/nature/life/${f}.rdf
fi
done

How to insert a text at the beginning of a file?

So far I've been able to find out how to add a line at the beginning of a file but that's not exactly what I want. I'll show it with an example:
File content
some text at the beginning
Result
<added text> some text at the beginning
It's similar but I don't want to create any new line with it...
I would like to do this with sed if possible.
sed can operate on an address:
$ sed -i '1s/^/<added text> /' file
What is this magical 1s you see on every answer here? Line addressing!
Want to add <added text> on the first 10 lines?
$ sed -i '1,10s/^/<added text> /' file
Or you can use Command Grouping:
$ { echo -n '<added text> '; cat file; } >file.new
$ mv file{.new,}
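To see line addressing in action without touching a real file, the sketch below runs the substitution on a temporary three-line sample (the sample content beyond the first line is made up) and captures the result instead of editing in place.

```shell
#!/usr/bin/env bash
# Hypothetical sample file with three lines.
sample=$(mktemp)
printf 'some text at the beginning\nsecond line\nthird line\n' > "$sample"

# Address "1" limits the substitution to the first line.
first_only=$(sed '1s/^/<added text> /' "$sample")

# Address "1,2" applies it to lines 1 through 2.
first_two=$(sed '1,2s/^/<added text> /' "$sample")

printf '%s\n---\n%s\n' "$first_only" "$first_two"
```

Without an address, the same s/^/.../ command would prefix every line of the file.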
If you want to add a whole line at the beginning of a file, you need to add \n at the end of the string in the best solution above.
That solution inserts the string, but it does not add a line break after it.
sed -i '1s/^/your text\n/' file
If the file is only one line, you can use:
sed 's/^/insert this /' oldfile > newfile
If it's more than one line, use one of:
sed '1s/^/insert this /' oldfile > newfile
sed '1,1s/^/insert this /' oldfile > newfile
I've included the latter so that you know how to do ranges of lines. Both of these "replace" the start line marker on their affected lines with the text you want to insert. You can also (assuming your sed is modern enough) use:
sed -i 'whatever command you choose' filename
to do in-place editing.
Use subshell:
echo "$(echo -n 'hello'; cat filename)" > filename
Unfortunately, command substitution will remove newlines at the end of file. So as to keep them one can use:
echo -n "hello" | cat - filename > /tmp/filename.tmp
mv /tmp/filename.tmp filename
Neither grouping nor command substitution is needed.
To insert just a newline:
sed '1i\\'
You can use cat -
printf '%s' "some text at the beginning" | cat - filename
To add a line to the top of the file:
sed -i '1iText to add\'
my two cents:
sed -i '1i /path/of/file.sh' filename
This will work even if the string contains a forward slash "/"
Hi with carriage return:
sed -i '1s/^/your text\n/' file
Note that on OS X, sed -i <pattern> file, fails. However, if you provide a backup extension, sed -i .old <pattern> file, then file is modified in place while file.old is created. You can then delete file.old in your script.
There is a very easy way:
echo "your header" > headerFile.txt
cat yourFile >> headerFile.txt
PROBLEM: tag a file, at the top of the file, with the base name of the parent directory.
I.e., for
/mnt/Vancouver/Programming/file1
tag the top of file1 with Programming.
SOLUTION 1 -- non-empty files:
bn=${PWD##*/} ## bn: basename
sed -i '1s/^/'"$bn"'\n/' <file>
1s places the text at line 1 of the file.
SOLUTION 2 -- empty or non-empty files:
The sed command, above, fails on empty files. Here is a solution, based on https://superuser.com/questions/246837/how-do-i-add-text-to-the-beginning-of-a-file-in-bash/246841#246841
printf "${PWD##*/}\n" | cat - <file> > temp && mv -f temp <file>
Note that the - in the cat command is required (reads standard input: see man cat for more information). Here, I believe, it's needed to take the output of the printf statement (to STDIN), and cat that and the file to temp ... See also the explanation at the bottom of http://www.linfo.org/cat.html.
I also added -f to the mv command, to avoid being asked for confirmations when overwriting files.
To recurse over a directory:
for file in *; do printf "${PWD##*/}\n" | cat - $file > temp && mv -f temp $file; done
Note also that this will break over paths with spaces; there are solutions, elsewhere (e.g. file globbing, or find . -type f ... -type solutions) for those.
ADDENDUM: Re: my last comment, this script will allow you to recurse over directories with spaces in the paths:
#!/bin/bash
## https://stackoverflow.com/questions/4638874/how-to-loop-through-a-directory-recursively-to-delete-files-with-certain-extensi
## To allow spaces in filenames,
## at the top of the script include: IFS=$'\n'; set -f
## at the end of the script include: unset IFS; set +f
IFS=$'\n'; set -f
# ----------------------------------------------------------------------------
# SET PATHS:
IN="/mnt/Vancouver/Programming/data/claws-test/corpus test/"
# https://superuser.com/questions/716001/how-can-i-get-files-with-numeric-names-using-ls-command
# FILES=$(find $IN -type f -regex ".*/[0-9]*") ## recursive; numeric filenames only
FILES=$(find $IN -type f -regex ".*/[0-9 ]*") ## recursive; numeric filenames only (may include spaces)
# echo '$FILES:' ## single-quoted, (literally) prints: $FILES:
# echo "$FILES" ## double-quoted, prints path/, filename (one per line)
# ----------------------------------------------------------------------------
# MAIN LOOP:
for f in $FILES
do
# Tag top of file with basename of current dir:
printf "[top] Tag: ${PWD##*/}\n\n" | cat - $f > temp && mv -f temp $f
# Tag bottom of file with basename of current dir:
printf "\n[bottom] Tag: ${PWD##*/}\n" >> $f
done
unset IFS; set +f
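An alternative sketch for paths with spaces that leaves IFS and globbing alone by passing NUL-terminated paths from find. It assumes a find that supports -print0 and bash's read -d ''. The scratch tree and the prepend_tag helper are hypothetical, and each file is tagged with the name of its own parent directory rather than $PWD.

```shell
#!/usr/bin/env bash
# Hypothetical scratch tree containing a path with spaces and an empty file.
root=$(mktemp -d)
mkdir -p "$root/corpus test"
printf 'body\n' > "$root/corpus test/12 34"
: > "$root/corpus test/56"                     # empty file

# prepend_tag is an illustrative helper, not part of the answer above.
prepend_tag() {
    local file=$1 tag=$2 tmp
    tmp=$(mktemp) || return 1
    # printf '%s\n' avoids treating the tag as a format string.
    printf '%s\n' "$tag" | cat - "$file" > "$tmp" && mv -f "$tmp" "$file"
}

# Collect the paths first so files are not replaced while find is still walking.
files=()
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(find "$root" -type f -print0)

for f in "${files[@]}"; do
    dir=${f%/*}                                # parent directory of the file
    prepend_tag "$f" "${dir##*/}"              # tag with the directory's base name
done

head -n 1 "$root/corpus test/12 34"
```

Because the tag goes through cat rather than sed, the empty file is handled the same way as the non-empty one.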
Just for fun, here is a solution using ed which does not have the problem of not working on an empty file. You can put it into a shell script just like any other answer to this question.
ed Test <<EOF
a
.
0i
<added text>
.
1,+1 j
$ g/^$/d
wq
EOF
The above script adds the text to insert to the first line, and then joins the first and second line. To avoid ed exiting on error with an invalid join, it first creates a blank line at the end of the file and removes it later if it still exists.
Limitations: This script does not work if <added text> is exactly equal to a single period.
{ tac filename.txt; echo -n "text to insert "; } | tac > newfilename.txt
The first tac pipes the file backwards (last line first) so the "text to insert" appears last. The 2nd tac wraps it once again so the inserted line is at the beginning and the original file is in its original order.
The simplest solution I found is:
echo -n "<text to add>" | cat - myFile.txt | tee myFile.txt
Notes:
Remove | tee myFile.txt if you don't want to change the file contents.
Remove the -n parameter if you want to append a full line.
Add &> /dev/null to the end if you don't want to see the output (the generated file).
This can be used to prepend a shebang to the file. Example:
# make it executable (use u+x to allow only current user)
chmod +x cropImage.ts
# prepend the shebang
echo '#''!'/usr/bin/env ts-node | cat - cropImage.ts | tee cropImage.ts &> /dev/null
# execute it
./cropImage.ts myImage.png
Another solution with aliases. Add to your init rc/ env file:
addtail () { find . -type f ! -path "./.git/*" -exec sh -c "echo $@ >> {}" \; ; }
addhead () { find . -type f ! -path "./.git/*" -exec sh -c "sed -i '1s/^/$@\n/' {}" \; ; }
Usage:
addhead "string to add at the beginning of file"
addtail "string to add at the end of file"
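Since the text is spliced into the sh -c string, quotes or $ in it get re-parsed by the inner shell. A hedged variant is sketched below: addtail_safe is a hypothetical name, and it hands the text and the path to the inner shell as positional arguments instead. The scratch directory and file are made up for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical variant: the text travels as "$1" and the path as "$2",
# so neither is re-parsed by the inner shell.
addtail_safe() {
    find . -type f ! -path './.git/*' \
        -exec sh -c 'printf "%s\n" "$1" >> "$2"' sh "$*" {} \;
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'first\n' > 'notes file.txt'

addtail_safe 'it'\''s "quoted" $HOME'
cat 'notes file.txt'
```

The appended line arrives literally, including the quotes and the unexpanded $HOME.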
With the echo approach, if you are on macOS/BSD like me, lose the -n switch that other people suggest. And I like to define a variable for the text.
So it would be like this:
Header="my complex header that may have difficult chars \"like these quotes\" and line breaks \n\n "
{ echo "$Header"; cat "old.txt"; } > "new.txt"
mv new.txt old.txt
TL;dr -
Consider using ex. Since you want the front of a given line, then the syntax is basically the same as what you might find for sed but the option of "in place editing" is built-in.
I cannot imagine an environment where you have sed but not ex/vi, unless it is a MS Windows box with some special "sed.exe", maybe.
sed & grep sort of evolved from ex / vi, so it might be better to say sed syntax is the same as ex.
You can change the line number to something besides #1 or search for a line and change that one.
source=myFile.txt
Front="This goes IN FRONT "
man true > $source
ex -s ${source} <<EOF
1s/^/$Front/
wq
EOF
$ head -n 3 $source
This goes IN FRONT TRUE(1) User Commands TRUE(1)
NAME
Long version, I recommend ex (or ed if you are one of the cool kids).
I like ex because it is portable, extremely powerful, allows me to write in-place, and/or make backups all without needing GNU (or even BSD) extensions.
Additionally, if you know the ex way, then you know how to do it in vi - and probably vim if that is your jam.
Notice that EOF is not quoted when we use "i"nsert and using echo:
str="+++ TOP +++" && ex -s <<EOF
r!man true
1i
`echo "$str"`
.
"0r!echo "${str}"
wq! true.txt
EOF
0r!echo "${str}" might also be used as shorthand for :0read! or :0r! that you have likely used in vi mode (it is literally the same thing) but the : is optional here and some implementations do not support "r"ead address of zero.
"r"eading directly to the special line #0 (or from line 1) would automatically push everything "down", and then you just :wq to save your changes.
$ head -n 3 true.txt | nl -ba
1 +++ TOP +++
2 TRUE(1) User Commands TRUE(1)
3
Also, most classic sed implementations do not have extensions (like \U&) that ex should have by default.
cat concatenates multiple files. <() sends output of a command as a file. Combining these two, we can insert lines at the beginning and end of a file by,
cat <(echo "line before the file") file.txt <(echo "line after the file")
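The command above only prints to the terminal. A small sketch of saving the result back, assuming bash (process substitution is not available in plain sh) and using temporary files invented for the demo:

```shell
#!/usr/bin/env bash
# Hypothetical sample file.
sample=$(mktemp)
printf 'original line 1\noriginal line 2\n' > "$sample"

# Write to a second file first; redirecting straight onto "$sample"
# would truncate it before cat reads it.
wrapped=$(mktemp)
cat <(echo "line before the file") "$sample" <(echo "line after the file") > "$wrapped"
mv "$wrapped" "$sample"

cat "$sample"
```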
