Store files to a list with ls, while removing parts of the dirname with sed - linux

I have a folder, let's say located here:
/Users/spotter/Downloads
and within the root folder there are two files:
test1.txt and test2.txt.
I want to write a shell script to save all the files to a list with a line like this:
file_list="$(ls /Users/spotter/Downloads)"
and echo $file_list will return:
/Users/spotter/Downloads/test1.txt
/Users/spotter/Downloads/test2.txt
However I want to change part of the dirname. Particularly I want to remove the /Users/spotter part.
I tried this like so:
file_list="$(ls /Users/spotter/Downloads |
while read path; do dirname "$path" | sed 's/users/spotter///'; done)"
which returns:
sed: 1: "s/users/spotter/Downloa ...": bad flag in substitute command: 'D'
sed: 1: "s/users/spotter/Downloa ...": bad flag in substitute command: 'D'
when I do echo $file_list I want this to be the output:
Downloads/test1.txt
Downloads/test2.txt

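The problem is that sed treats every '/' as a delimiter between the regex, the replacement and the flags, so the slashes inside your path end the pattern early and the rest is read as flags (hence "bad flag"). You can use another character as the delimiter, for instance 's~/Users/spotter/~~'. Note also that the match is case-sensitive, so the pattern needs /Users, not users.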
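As a minimal sketch of the alternate-delimiter form, the snippet below feeds the two example paths through sed using '~' as the delimiter. The paths are typed in literally rather than read from a real directory, and strip_prefix is a hypothetical helper name used only for this illustration.

```shell
# The directory from the question; the two paths are generated with printf
# so the example does not depend on the directory actually existing.
dir=/Users/spotter/Downloads

# Hypothetical helper: '~' is the delimiter, so the slashes in the path
# need no escaping. '^' anchors the match to the start of each line.
strip_prefix() {
    sed 's~^/Users/spotter/~~'
}

file_list=$(printf '%s\n' "$dir/test1.txt" "$dir/test2.txt" | strip_prefix)
printf '%s\n' "$file_list"
```

Quote the variable when printing it (echo "$file_list" rather than echo $file_list); otherwise the shell collapses the newlines into spaces.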

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple fasta file named as followed:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The fasta header look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header, since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially but it removed the ">" from the header which is essential for the fasta file. I tried to modify the "${file%%.*}" part to keep the carrot but it always called me out on bad substitutions.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the earlier header, or add to it. Both scenarios are easy to do. Don't replace text you don't want to replace.
for file in ./*.faa;
do
name=${file##*/}   # drop the leading ./ so that %%.* does not strip everything
sed -i "s/^>.*/>${name%%.*}/" "$file"
done
will replace the header, but include a leading > in the replacement, effectively preserving it; and
for file in ./*.faa;
do
name=${file##*/}   # drop the leading ./ so that %%.* does not strip everything
sed -i "s/^>.*/&${name%%.*}/" "$file"
done
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says on lines which match the regex ^>, replace the empty string at the end of the line $ with the file name.
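To see the difference between replacing and appending without touching any files, here is a small sketch that runs two of the substitutions on a made-up two-line record. The shortened header and the ACGT sequence line are sample data invented for the illustration.

```shell
# Made-up sample data: a shortened header plus one sequence line.
header='>BC-1_k127_3926653_6 # 4457'
file='BC-1_bin_1_genes.faa'
name=${file%%.*}          # strip everything from the first dot onwards

# Variant 1: replace the whole header, re-adding the leading '>'.
replaced=$(printf '%s\nACGT\n' "$header" | sed "s/^>.*/>$name/")

# Variant 2: only on header lines, append the name at the end of the line.
appended=$(printf '%s\nACGT\n' "$header" | sed "/^>/s/\$/ $name/")

printf '%s\n---\n%s\n' "$replaced" "$appended"
```

In both cases the sequence line passes through unchanged, because the pattern or the address only matches lines starting with '>'.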
Of course, your Awk script could easily be fixed, too. Standard Awk does not have an option to parallel the -i "in-place" option of sed, but you can easily use a temporary file:
for file in ./*.faa;
do
awk '/^>/{ name = FILENAME; sub(/^\.\//, "", name); sub(/\.faa$/, "", name); $0 = $0 " " name }1' "$file" >"$file.tmp" &&
mv "$file.tmp" "$file"
done
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
You just need to detect the pattern to replace and use regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
full_filename=${file##*/}
filename="${full_filename%.*}"
# escape special chars
filename=$(echo "$filename" | sed 's_/_\\/_g')
echo "adding file name: $filename to: $full_filename"
# only rewrite header lines (those starting with ">"), up to the first "#"
sed -i -E "s/^>[^#]+/>$filename /" "$location/$full_filename"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
lectures
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
I suggest first identifying your files with the find command or the ls command.
find . -type f -name "*.faa" -printf "%f\n"
A find command that prints only the names of files with the extension .faa, including those in subdirectories of the current directory. (-printf is a GNU find extension.)
ls -1 -d *.faa
An ls command that prints files and directories with the extension .faa in the current directory. The glob must be left unquoted so that the shell expands it.
Processing your files
Once you have the correct file list, iterate over it and apply the sed command.
# -maxdepth 1: %f prints bare file names, which only resolve in the current directory
for fileName in $(find . -maxdepth 1 -type f -name "*.faa" -printf "%f\n"); do
strippedFileName=${fileName%%.*} # strip extension .faa
sed -i "1s|\$| $strippedFileName|" "$fileName" # append value of strippedFileName at end of line 1
done
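A for loop over $(find ...) splits on whitespace, so it breaks on file names containing spaces, and bare names stop working once subdirectories are involved. One way around both, sketched here under the assumption that only line 1 of each file is a header, is to let find pass full paths to a small inline script. The temporary directory and sample files are made up for the demonstration.

```shell
# Throwaway sample tree; all names here are invented for the demo.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf '>seq1\nACGT\n' > "$workdir/sample_1.faa"
printf '>seq2\nTTGA\n' > "$workdir/sub/sample 2.faa"

# find hands full paths to the inline script, so names with spaces and
# files in subdirectories are handled correctly.
find "$workdir" -type f -name '*.faa' -exec sh -c '
    for path do
        base=${path##*/}      # file name without the directory
        name=${base%.faa}     # file name without the extension
        # Append the name to line 1 via a temporary file (no sed -i needed).
        # Assumes the name contains no "|" or "&", which sed would interpret.
        sed "1s|\$| $name|" "$path" > "$path.tmp" && mv "$path.tmp" "$path"
    done
' sh {} +

first=$(head -n 1 "$workdir/sample_1.faa")
second=$(head -n 1 "$workdir/sub/sample 2.faa")
printf '%s\n%s\n' "$first" "$second"
rm -rf "$workdir"
```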

How to replace string in files recursively via sed or awk?

I would like to know how to search from the command line for a string in various files of type .rb.
And replace:
.delay([ANY OPTIONAL TEXT FOR DELETION]).
with
.delay.
Besides sed and awk, are there any other command line tools included in the OS that are better for the task?
Status
So far I have the following regular expression:
.delay\(*.*\)\.
I would like to know how to match only the expression ending on the first closing parenthesis? And avoid replacing:
.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)
Thanks in advance!
If you need to find and replace text in files, sed is probably the best command line solution.
Search for a string in the text file and replace:
sed -i 's/PATTERN/REPLACEMENT/' file.name
Or, if you need to replace multiple occurrences of PATTERN on a line, add the g flag
sed -i 's/PATTERN/REPLACEMENT/g' file.name
To process multiple files, pipe the list of files to sed through xargs:
echo "${filesList}" | xargs sed -i ...
You can use find to generate your list of files, and xargs to run sed over the result:
find . -type f -print | xargs sed -i 's/\.delay.*/.delay./'
find will generate a list of files contained in your current directory (., although you can of course pass a different directory), xargs will read that list and then run sed with the list of files as an argument.
Instead of find, which here generates a list of all files, you could use something like grep to generate a list of files that contain a specific term. E.g.:
grep -rl '\.delay' | xargs sed -i ...
For the part of the question where you want to only match and replace until the first ) and not include a second pair of (), here is how to change your regex:
.delay\(*.*\)\.
->
\.delay\([^\)]*\)
I.e. match "literal dot, delay, opening parenthesis, everything except a closing parenthesis, closing parenthesis".
E.g. using sed:
$ echo '.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)' | sed -E 's/\.delay\([^)]*\)/.delay/'
.delay.sometext(param)
I recommend using grep to find the right files:
grep -rl --include "*.rb" '\.delay' .
Then feed the list into xargs, as recommended by other answers.
Credits to the other answers for providing a solution for feeding multiple files into sed.
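Putting the pieces from these answers together, a rough end-to-end sketch could look like the following. It uses a while loop with a temporary file instead of xargs and sed -i, purely so that it does not depend on a particular sed flavour. The directory layout and Ruby lines are invented sample data, and file names containing newlines are not handled.

```shell
# Throwaway sample tree; the files and their contents are made up.
workdir=$(mktemp -d)
mkdir "$workdir/app"
printf '%s\n' 'user.delay(run_at: 5.minutes.from_now).notify(user.id)' > "$workdir/app/job.rb"
printf '%s\n' 'x.delay(1).y' > "$workdir/notes.txt"

# List only .rb files that contain ".delay(" and rewrite each one.
grep -rl --include='*.rb' '\.delay(' "$workdir" |
while IFS= read -r f; do
    # [^)]* stops at the first closing parenthesis, so a later
    # .sometext(param) call on the same line is left alone.
    sed -E 's/\.delay\([^)]*\)/.delay/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

rb_result=$(cat "$workdir/app/job.rb")
txt_result=$(cat "$workdir/notes.txt")
printf '%s\n%s\n' "$rb_result" "$txt_result"
rm -rf "$workdir"
```

The .txt file is left untouched because --include restricts the search to *.rb.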

Need help editing multiple files using sed in linux terminal

I am trying to do a simple operation here: cut a few characters from one file (style.css) and do a find and replace on another file (client_custom.css), for more than 100 directories with different names.
When I use the following command
for d in */; do sed -n 73p ~/assets/*/style.css | cut -c 29-35 | xargs -I :hex: sed -i 's/!BGCOLOR!/:hex:/' ~/assets/*/client_custom.css $d; done
It keeps giving me the following error for all the directories
sed: couldn't edit dirname/: not a regular file
I am confused about why it's giving me that error message when I explicitly gave the full path to the file. It works perfectly fine without a for loop.
Can anyone please help me out with this issue?
sed doesn't accept directories as input.
for d in */;
puts directory names into $d. If you write sed ... $d, Bash passes the directory name to sed as one more file to edit, and sed fails because it is not a regular file.
Also, ~/assets/*/client_custom.css expands to every file that matches the pattern, so sed is called once with all the file names on each pass of the loop. You probably want to invoke sed once per file.
Try
for f in ~/assets/*/client_custom.css; do
... | sed -i 's/!BGCOLOR!/:hex:/' $f
done
or, even better:
for f in ~/assets/*/client_custom.css; do
... | sed 's/!BGCOLOR!/:hex:/' "${f}.in" > "${f}"
done
(which reads from a separate "*.in" template, so the file containing the !BGCOLOR! placeholder is never overwritten). This way, you can keep the "*.in" files, edit them with the patterns and then use sed to "expand" all the variables.
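A per-directory version of the template idea might look roughly like this. The temporary directory stands in for ~/assets, the directory name site_a is invented, and the line number and column range of the colour value are assumptions for the sample data (the question used line 73, columns 29-35).

```shell
# Stand-in for ~/assets with one made-up site directory.
assets=$(mktemp -d)
mkdir "$assets/site_a"
printf 'body { background: #1a2b3c; }\n' > "$assets/site_a/style.css"
printf 'h1 { color: !BGCOLOR!; }\n' > "$assets/site_a/client_custom.css.in"

for dir in "$assets"/*/; do
    # Assumption for this sample: the colour is on line 1, columns 20-26.
    hex=$(sed -n 1p "${dir}style.css" | cut -c 20-26)
    # Expand the placeholder from the .in template into the real file,
    # pairing each directory's colour with that directory's stylesheet.
    sed "s/!BGCOLOR!/$hex/" "${dir}client_custom.css.in" > "${dir}client_custom.css"
done

css_result=$(cat "$assets/site_a/client_custom.css")
printf '%s\n' "$css_result"
rm -rf "$assets"
```

Looping over directories rather than over a glob of all stylesheets keeps each colour matched to its own client_custom.css.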

Quickest way to remove 70+ strings from a file?

I have 70+ strings I need to find and delete in a file. I need to remove the entire line in the file that the string appears in.
I know I can use sed -i '/string to remove/d' fileA.txt to remove them one at a time. However, considering I have 70+, it will take some time doing it this way.
Is there a way I can put these 70+ strings in a file and have sed go through them one by one? Or if I create a file containing the strings, is there a way to compare the two files so it removes any line from fileA that contains one of the strings?
You could use grep:
grep -vf file_with_words.txt file.txt
where file_with_words.txt would be the file containing the list of words, each word being on a different line and file.txt is the file that you want to remove the lines from.
If your list of words contains regex metacharacters, then tell grep to consider those as fixed strings (if that is what you want):
grep -F -vf file_with_words.txt file.txt
Using sed, you'd need to say:
sed '/word1\|word2\|word3/d' file.txt
or
sed -E '/word1|word2|word3/d' file.txt
You could use command substitution to construct the pattern too:
sed -E "/$(paste -sd'|' file_with_words.txt)/d" file.txt
but grep is clearly the tool to use in this case.
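Since grep writes to standard output rather than editing the file, a small sketch of the full round trip may help. The file names and contents below are invented sample data; -F makes the strings literal, so a.b does not match axb.

```shell
# Throwaway sample files; names and contents are made up.
workdir=$(mktemp -d)
printf '%s\n' 'keep this' 'drop a.b here' 'axb stays' 'remove [me]' > "$workdir/fileA.txt"
printf '%s\n' 'a.b' '[me]' > "$workdir/strings.txt"

# -F: fixed strings, -v: invert the match, -f: read patterns from a file.
# Write to a temporary file first, then replace the original.
grep -F -v -f "$workdir/strings.txt" "$workdir/fileA.txt" > "$workdir/fileA.tmp"
mv "$workdir/fileA.tmp" "$workdir/fileA.txt"

remaining=$(cat "$workdir/fileA.txt")
printf '%s\n' "$remaining"
rm -rf "$workdir"
```

Watch out for blank lines in the strings file: an empty pattern matches every line, so with -v nothing would be left.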
If you want to do the job in bash, here's how:
search=fileA.txt
queries=queries.txt
while IFS= read -r query
do
# -i '' is the BSD/macOS form; with GNU sed (Linux) write: sed -i "/$query/d" "$search"
sed -i '' "/$query/d" "$search"
done < "$queries"
where queries.txt looks like
I
want
to
delete
these
lines

Find and replace string containing forward slash in ksh using a variable

In a file (file1.txt) I have /path1/|value1 (a path, followed by a value). I need to find the line containing that (unique) path and then change the value. So the line should end up as: /path1/|value2.
The challenge is that the /path1/, value1 and value2 parts are all contained within variables.
When I don't use a variable, I can use (thanks to this page):
sed '/path1/s/value1/value2/g' file1.txt > copyfile1.txt
(This creates a modified copy of the original file, which I can later move over the original using mv.)
This is just searching for path1. To search for /path1/ I can use:
sed '/\/path1\//s/value1/value2/g' file1.txt > copyfile1.txt
Using the answers to this question about extracting a substring I can put the /path1/, value1 and value2 parts into variables.
So my current code is:
sed '/'"${PATH}"'/s/'"${PREVIOUS_VALUE}"'/'"${NEW_VALUE}"'/g' file1.txt > copyfile1.txt
But this does not work because the PATH variable contains forward slashes. Using information from here I have tried first doing a substitution like this:
FORMATTED_PATH=$(echo "${PATH}" | sed 's/\//\/\//g')
first, and then used FORMATTED_PATH instead of PATH but then the find and replace does not work (no error messages, new file is empty). And in the logging FORMATTED_PATH = //path1// (which I think is correct).
How can I do this find and replace using variables containing forward slashes?
(I found out via this answer that I needed to close the single quote, use double quotes around the variable and then open the single quote again. But this does not help with the forward slashes.)
The code was so nearly right. Instead of:
FORMATTED_PATH=$(echo "${PATH}" | sed 's/\//\/\//g')
I should have had:
FORMATTED_PATH=$(echo "${PATH}" | sed 's/\//\\\//g')
This then produces the correct logging of: FORMATTED_PATH = \/path1\/
awk will work too:
awk -F '|' -v path="$path" -v new="$new_value" '{
if ($1 == path) {print path FS new}
else {print}
}' file1.txt > copyfile1.txt
Also, don't use all-caps for your shell variables: you have wiped out your shell's PATH variable, which is used to find programs.
sed's s command (as in s///) usually supports separators other than /. For example:
$ echo '/path1/|value1' | sed 's,\(/path1/|\).*,\1value2,'
/path1/|value2
$
This is very convenient when dealing with file pathnames which include / chars.
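Both approaches can be combined with variables. The sketch below shows the escaped-slash form from the accepted fix and, as an alternative, a custom delimiter for the address as well as the s command (in sed, an address regex with a different delimiter is introduced by a backslash, as in \,regex,). The lower-case variable names are illustrative, and the values are assumed to contain no commas or regex metacharacters.

```shell
# Illustrative lower-case names, so the shell's own PATH is left alone.
target_path='/path1/'
old_value='value1'
new_value='value2'
input=$(printf '%s\n' '/path1/|value1' '/path2/|value1')

# Option 1: escape each slash so the path can sit inside /.../.
escaped_path=$(printf '%s\n' "$target_path" | sed 's/\//\\\//g')
with_escape=$(printf '%s\n' "$input" | sed "/$escaped_path/s/$old_value/$new_value/")

# Option 2: use ',' as the delimiter everywhere. Inside double quotes,
# \\ becomes a single backslash, so sed sees \,/path1/,s,value1,value2,
with_comma=$(printf '%s\n' "$input" | sed "\\,$target_path,s,$old_value,$new_value,")

printf '%s\n---\n%s\n' "$with_escape" "$with_comma"
```

Either way, only the line whose path matches is changed; the /path2/ line keeps value1.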
