Replace spaces in all files in a directory with underscores - linux

I have found some similar questions here but not this specific one and I do not want to break all my files. I have a list of files and I simply need to replace all spaces with underscores. I know this is a sed command but I am not sure how to generically apply this to every file.
I do not want to rename the files, just modify them in place.
Edit: To clarify, just in case it's not clear, I only want to replace whitespace within the files, file names should not be changed.

find . -type f -exec sed -i -e 's/ /_/g' {} \;
find selects every regular file in the current directory and its subdirectories and passes each file name to sed in place of {}; the \; marks the end of the -exec command. The sed command itself you appear to understand already.
If you only want to process the current directory and ignore subdirectories, you can use
find . -maxdepth 1 -type f -exec sed -i -e 's/ /_/g' {} \;
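Before running this against real data, it can help to try it in a throwaway directory. The following is a minimal sketch, assuming GNU find and GNU sed (the usual tools on Linux); the directory and file names are made up for the demonstration. It uses -exec ... {} + instead of \;, which hands many file names to one sed process rather than starting one sed per file.

```shell
# Scratch directory with hypothetical sample files (names are illustrative only).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'hello world\n' > "$demo/a.txt"
printf 'one two three\n' > "$demo/sub/b c.txt"   # the file NAME contains a space

# Replace spaces inside every regular file under $demo.
# "{} +" batches the file names; the names themselves are not changed.
find "$demo" -type f -exec sed -i -e 's/ /_/g' {} +

cat "$demo/a.txt"        # hello_world
cat "$demo/sub/b c.txt"  # one_two_three
```

Only the contents change; the file "b c.txt" keeps the space in its name.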

This is a two-part problem. Part 1 is the proper sed command; part 2 is applying that command to every file in a given directory.
Substitution in sed commands follows the form s/ItemToReplace/ItemToReplaceWith/flags, where s stands for substitute and the flags control how the operation is carried out. According to this super user post, to match whitespace you can use a literal space, \s, or [[:space:]] in your sed command. The difference is that the last two match any whitespace (tabs as well as spaces); \s is a GNU extension, while [[:space:]] is the POSIX-compliant form. Lastly, you need the g (global) flag at the end so that every match on a line is replaced, not just the first.
Therefore, your sed command is
sed 's/ /_/g' FileNameHere
However, this only accomplishes half of your task. You also need to do this for every file within a directory. Unfortunately, wildcards with redirection won't save us here, as * > * would be ambiguous. The solution is to iterate through the files and overwrite each one individually. A shell for loop combined with a wildcard expands to every entry in the directory. However, redirecting sed's output back onto the file it is reading empties the file, because the shell truncates it before sed starts. To correct this, run sed with the -i flag so that it edits the file in place. The -i flag accepts an optional backup suffix. With GNU sed (the usual sed on Linux) the suffix is attached directly (-i.bak), and a bare -i creates no backup; with BSD/macOS sed the suffix is a separate argument, and -i '' means no backup.
Therefore the final command on Linux should simply be
for i in *; do sed -i 's/ /_/g' "$i"; done
This loops over every entry in your current directory and rewrites each file in place (directories are matched by * as well; sed refuses to edit them and reports an error, so they are left alone).
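As a hedged variation on the loop, the sketch below skips anything that is not a regular file and keeps a backup of each original. It assumes a scratch directory created just for the demonstration; the attached-suffix form -i.bak is commonly used because both GNU and BSD sed accept it.

```shell
# Hypothetical scratch directory; substitute your own path when using this for real.
demo=$(mktemp -d)
printf 'a b\tc d\n' > "$demo/notes.txt"   # contains spaces and a tab
mkdir "$demo/subdir"

for f in "$demo"/*; do
    [ -f "$f" ] || continue               # skip directories and other non-regular files
    # Edit in place, keeping the original as <name>.bak.
    # [[:space:]] matches tabs as well as spaces.
    sed -i.bak 's/[[:space:]]/_/g' "$f"
done

cat "$demo/notes.txt"       # a_b_c_d
cat "$demo/notes.txt.bak"   # the untouched original
```

Once the result looks right, the .bak files can be deleted.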

Well... since I was trying to get something running I found a method that worked for me:
for file in *; do sed -i 's/ /_/g' "$file"; done

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple fasta file named as followed:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The fasta header look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially but it removed the ">" from the header which is essential for the fasta file. I tried to modify the "${file%%.*}" part to keep the carrot but it always called me out on bad substitutions.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the earlier header, or add to it. Both scenarios are easy to do. Don't replace text you don't want to replace.
for file in ./*.faa;
do
name=${file##*/}  # drop the leading ./ so that %%.* does not strip everything
sed -i "s/^>.*/>${name%%.*}/" "$file"
done
will replace the header, but include a leading > in the replacement, effectively preserving it; and
for file in ./*.faa;
do
name=${file##*/}  # drop the leading ./ so that %%.* does not strip everything
sed -i "s/^>.*/&${name%%.*}/" "$file"
done
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says on lines which match the regex ^>, replace the empty string at the end of the line $ with the file name.
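To see the effect without touching real data, here is a small sketch that runs this variation on a made-up FASTA file in a scratch directory. It assumes GNU sed; the sequence names and contents are invented for illustration. The directory part is stripped with ${file##*/} first, because a dot anywhere in the path would otherwise confuse ${...%%.*}.

```shell
# Scratch directory with a tiny, invented FASTA file.
demo=$(mktemp -d)
printf '>seq1 # 1 # 90\nMKV\n>seq2 # 100 # 200\nMLA\n' > "$demo/BC-1_bin_1_genes.faa"

for file in "$demo"/*.faa; do
    name=${file##*/}      # BC-1_bin_1_genes.faa (directory removed)
    name=${name%%.*}      # BC-1_bin_1_genes     (extension removed)
    # Only lines starting with ">" are touched; the name is appended at end of line.
    sed -i "/^>/s/\$/ $name/" "$file"
done

cat "$demo/BC-1_bin_1_genes.faa"
```

The sequence lines (MKV, MLA) are left alone. If a file name could contain / or &, it would need escaping before being placed in the sed replacement.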
Of course, your Awk script could easily be fixed, too. Standard Awk does not have an option to parallel the -i "in-place" option of sed, but you can easily use a temporary file:
for file in ./*.faa;
do
awk '/^>/{ name = FILENAME; sub(/^.*\//, "", name); sub(/\.faa$/, "", name); $0 = $0 " " name }1' "$file" >"$file.tmp" &&
mv "$file.tmp" "$file"
done
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
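The temporary-file pattern can be tried on a scratch file first. This is only a sketch under the assumption that passing the name in with -v is acceptable (it is a variation, not the script from the question); the file name sample.faa and its contents are made up.

```shell
# Scratch directory with an invented two-record FASTA file.
demo=$(mktemp -d)
printf '>seq1\nMKV\n>seq2\nMLA\n' > "$demo/sample.faa"

for file in "$demo"/*.faa; do
    name=${file##*/}      # sample.faa
    name=${name%.faa}     # sample
    # -v hands the shell variable to awk; header lines get " <name>" appended,
    # and the trailing 1 prints every line. Write to a temp file, then replace.
    awk -v name="$name" '/^>/ { $0 = $0 " " name } 1' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
done

cat "$demo/sample.faa"
```

Because of the &&, the original is only replaced when awk exits successfully.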
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
You just need to detect the pattern to replace and use regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
full_filename=${file##*/}
filename="${full_filename%.*}"
# escape slashes so the name is safe inside the sed replacement
filename=$(printf '%s\n' "$filename" | sed 's_/_\\/_g')
echo "adding file name: $filename to: $full_filename"
# only rewrite header lines (those starting with ">"), up to the first "#"
sed -i -E "s/^>[^#]+/>$filename /" "$location/$full_filename"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
further reading:
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
I suggest first identifying your files with the find command or the ls command.
find . -type f -name "*.faa" -printf "%f\n"
A find command that prints only the names of files with the extension .faa, including those in subdirectories of the current directory.
ls -1d *.faa
An ls command that lists files and directories with the extension .faa in the current directory only. (The pattern must not be quoted, or the shell will not expand it.)
Processing your files
Once you have the correct list of files, iterate over it and apply the sed command.
find . -type f -name "*.faa" -print0 | while IFS= read -r -d '' filePath; do
fileName=${filePath##*/} # drop the directory part
strippedFileName=${fileName%%.*} # strip extension .faa
sed -i "1s|\$| $strippedFileName|" "$filePath" # append value of strippedFileName at end of line 1
done

How to replace string in files recursively via sed or awk?

I would like to know how to search from the command line for a string in various files of type .rb.
And replace:
.delay([ANY OPTIONAL TEXT FOR DELETION]).
with
.delay.
Besides sed and awk, are there any other command line tools included in the OS that are better for the task?
Status
So far I have the following regular expression:
.delay\(*.*\)\.
I would like to know how to match only the expression ending on the first closing parenthesis? And avoid replacing:
.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)
Thanks in advance!
If you need to find and replace text in files - sed seems to be the best command line solution.
Search for a string in the text file and replace:
sed -i 's/PATTERN/REPLACEMENT/' file.name
Or, if you need to process multiple occurrences of PATTERN on a line, add the g flag
sed -i 's/PATTERN/REPLACEMENT/g' file.name
To process multiple files, pipe the list of files to sed through xargs:
echo "${filesList}" | xargs sed -i ...
You can use find to generate your list of files, and xargs to run sed over the result:
find . -type f -print | xargs sed -i 's/\.delay([^)]*)\./.delay./'
find will generate a list of files contained in your current directory (., although you can of course pass a different directory), xargs will read that list and then run sed with the list of files as an argument.
Instead of find, which here generates a list of all files, you could use something like grep to generate a list of files that contain a specific term. E.g.:
grep -rl '\.delay' | xargs sed -i ...
For the part of the question where you want to only match and replace until the first ) and not include a second pair of (), here is how to change your regex:
.delay\(*.*\)\.
->
\.delay\([^)]*\)
I.e. match "a literal dot, delay, an opening parenthesis, anything except a closing parenthesis, and then a closing parenthesis".
E.g. using sed:
$ echo '.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)' | sed -E 's/\.delay\([^)]*\)/.delay/'
.delay.sometext(param)
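The difference between the greedy pattern and the negated character class can be checked side by side. A minimal sketch, using the sample line from the question; the variable names are only for illustration.

```shell
line='.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)'

# Greedy ".*" runs to the LAST ")" on the line, so .sometext(param) is swallowed too.
greedy=$(printf '%s\n' "$line" | sed -E 's/\.delay\(.*\)/.delay/')

# "[^)]*" cannot cross a ")", so the match ends at the FIRST closing parenthesis.
bounded=$(printf '%s\n' "$line" | sed -E 's/\.delay\([^)]*\)/.delay/')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
# greedy:  .delay
# bounded: .delay.sometext(param)
```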
I recommend using grep to find the right files:
grep -rl --include "*.rb" '\.delay' .
Then feed the list into xargs, as recommended by other answers.
Credits to the other answers for providing a solution for feeding multiple files into sed.
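Put together, the whole pipeline might look like the sketch below. It assumes GNU grep, xargs and sed: -Z and -0 pass file names NUL-separated so that names with spaces survive, and -r stops xargs from running sed when grep finds nothing. The sample files are invented.

```shell
# Scratch directory with made-up Ruby and non-Ruby files.
demo=$(mktemp -d)
printf 'job.delay(run_at: 5).perform\n' > "$demo/a.rb"
printf 'puts "no match here"\n'          > "$demo/b.rb"
printf 'x.delay(1).y\n'                   > "$demo/c.txt"   # not .rb, must stay untouched

# grep: -r recurse, -l print file names only, -Z terminate each name with NUL.
# sed:  rewrite ".delay(...)." to ".delay.", stopping at the first ")".
grep -rlZ --include='*.rb' '\.delay(' "$demo" |
    xargs -0 -r sed -i -E 's/\.delay\([^)]*\)\./.delay./g'

cat "$demo/a.rb"    # job.delay.perform
cat "$demo/c.txt"   # x.delay(1).y
```

Running the grep part on its own first shows which files would be edited.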

sed for a string in only 1 line

What I want to do here is locate any file that contains a specific string in a specific line, and remove said line, not just the string.
What I have is something along the lines of this:
find / -type f -name '*.foo' -exec sed '1/stringtodetect/d' {} \;
However this will remove everything BETWEEN line 1 and the string. given that sed argument. (sed '1,/stringtodetect/d' "$file")
Lets say I have a .php file, and I'm looking for the string 'gotcha'.
I only want to edit the file if it has the string in the FIRST line of the file, like so:
gotcha with this.
gotcha
useful text
more text
dont delete me
If I ran the script, I'd want the contents of the same file to appear as such:
List item
List item
dont delete me
Any tips?
You are using the following range address for the delete command:
1,/stringtodelete/
This means all lines from line 1 until the first occurrence of stringtodelete.
Furthermore, you need not (and should not!) iterate over the results from find. find has the -exec option for that. It executes a command for each file which has been found, passing the filename as an argument.
It should be:
find / -type f -name '*.foo' -exec sed '1{/stringtodetect/d;}' {} \;
Test the command first. Once you are sure it works, use sed -i to modify the files in place. If you want a backup you can use sed -i.backup (for example). To remove the backups once you are sure you can use find again:
find / -type f -name '*.foo.backup' -delete
You need a sed script that will skip any line by number that is not the one you are interested in, and only for the line you are interested in delete the line if it matches.
sed -e1bt -eb -e:t -e/string/d < "$file"
-e1bt = for line 1, branch to label "t"
-eb = branch unconditionally to the end of the script (at which point it will print the line).
-e:t = define label "t"
-e/string/d = delete the line if it contains "string" - this instruction will only be reached if the unconditional branch to the end of the script was NOT taken, i.e. if the line number branch WAS taken.
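A quick way to convince yourself that the branching script only ever deletes line 1 is to feed it input where the string also appears later. This is a small sketch with invented input, assuming GNU sed; the awk line expresses the same rule as a condition on the line number, for comparison.

```shell
# Made-up input: "gotcha" appears on line 1 and again on line 2.
input=$(printf 'gotcha first\nkeep gotcha\nlast\n')

# Branching form: line 1 jumps to label t; every other line hits "b" and is printed.
with_branch=$(printf '%s\n' "$input" | sed -e 1bt -e b -e :t -e /gotcha/d)

# The same rule as a single awk condition.
with_awk=$(printf '%s\n' "$input" | awk '!(NR==1 && /gotcha/)')

printf '%s\n' "$with_branch"
# keep gotcha
# last
```

Both variables end up with the same two lines: only the first line was removed.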
Could it be that it is matching part of a longer string?
Trying an exact (whole-word) match might help.
Also, remove the 1, at the beginning or replace it with 0,
sed '/\<stringtodetect\>/d' "$file"
sed is for simple substitutions on individual lines, that is all. For anything else just use awk for simplicity, clarity, robustness, portability and all of the other desirable attributes of software:
awk '!(NR==1 && /stringtodetect/)' file
You were close. I think what you're looking for is: sed '1{/gotcha/d;}'
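Combined with find, that form can be tried on scratch files before it is pointed at a real tree. A sketch assuming GNU find and sed; the .foo files and their contents are invented, and -i.backup keeps a copy of each original.

```shell
# Scratch directory: one file with the string on line 1, one with it only on line 2.
demo=$(mktemp -d)
printf 'gotcha with this.\ngotcha\nuseful text\n' > "$demo/hit.foo"
printf 'useful text\ngotcha\n'                    > "$demo/miss.foo"

# 1{...} restricts the block to line 1; inside it, delete the line if it matches.
find "$demo" -type f -name '*.foo' -exec sed -i.backup '1{/gotcha/d;}' {} \;

cat "$demo/hit.foo"    # line 1 is gone, the later "gotcha" line is kept
cat "$demo/miss.foo"   # unchanged
ls "$demo"             # hit.foo, hit.foo.backup, miss.foo, miss.foo.backup
```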

Need help editing multiple files using sed in linux terminal

I am trying to do a simple operation here: cut a few characters from one file (style.css) and do a find and replace on another file (client_custom.css), for more than 100 directories with different names.
When I use the following command
for d in */; do sed -n 73p ~/assets/*/style.css | cut -c 29-35 | xargs -I :hex: sed -i 's/!BGCOLOR!/:hex:/' ~/assets/*/client_custom.css $d; done
It keeps giving me the following error for all the directories
sed: couldn't edit dirname/: not a regular file
I am confused about why it's giving me that error message when I explicitly gave the full path to the file. It works perfectly fine without a for loop.
Can anyone please help me out with this issue?
sed doesn't support folders as input.
for d in */;
puts folders into $d. If you write sed ... $d, then Bash passes the folder name to sed as an argument, and the poor tool gets confused.
The same goes for ~/assets/*/client_custom.css: it expands to all the files that match the pattern, so sed is called once with every file name. You probably want to invoke sed once per file name.
Try
for f in ~/assets/*/client_custom.css; do
... | sed -i 's/!BGCOLOR!/:hex:/' "$f"
done
or, even better:
for f in ~/assets/*/client_custom.css; do
... | sed 's/!BGCOLOR!/:hex:/' "${f}.in" > "${f}"
done
(which doesn't overwrite the input file). This way, you can keep the "*.in" files as templates, edit them with the patterns, and then use sed to "expand" all the variables.
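One way to fill in the elided part is to loop over the client_custom.css files and read the colour from the style.css in the same directory. This is only a sketch: it assumes both files sit side by side in each directory (the question suggests this but does not say so), it uses the in-place variant, and the scratch tree and file contents are invented.

```shell
# Scratch stand-in for ~/assets with one hypothetical site directory.
assets=$(mktemp -d)
mkdir "$assets/site1"
# Fake style.css: 72 filler lines, then line 73 with the colour in columns 29-35.
{ yes '/* filler */' | head -n 72
  printf '%-28s#a1b2c3;\n' '  background-color:'; } > "$assets/site1/style.css"
printf 'a { color: !BGCOLOR!; }\n' > "$assets/site1/client_custom.css"

for css in "$assets"/*/client_custom.css; do
    dir=${css%/*}                                        # directory holding this file
    hex=$(sed -n 73p "$dir/style.css" | cut -c 29-35)    # e.g. #a1b2c3
    [ -n "$hex" ] || continue                            # nothing found: leave file alone
    # Single quotes around !BGCOLOR! avoid history expansion in interactive shells.
    sed -i 's/!BGCOLOR!/'"$hex"'/' "$css"
done

cat "$assets/site1/client_custom.css"   # a { color: #a1b2c3; }
```

Each file is handled by its own sed call, so no directory name ever reaches sed as an argument.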

How to remove multiple lines in multiple files on Linux using bash

I am trying to remove 2 lines from all my Javascript files on my Linux shared hosting. I wanted to do this without writing a script as I know this should be possible with sed. My current attempt looks like this:
find . -name "*.js" | xargs sed -i ";var
O0l='=sTKpUG"
The second line is actually longer than this but is malicious code so I have not included it here. As you guessed my server has been hacked so I need to clean up all these JavaScript files.
I forgot to mention that the output I am getting at the moment is:
sed: -e expression #1, char 4: expected newer version of sed
The 2 lines are just as follows consecutively:
;var
O0l='=sTKpUG
except that the second line is longer, but the rest of the second line should not influence the command.
He meant removing two adjacent lines.
You can do something like this; remember to back up your files first.
find . -name "*.js" | xargs sed -i -e "/^;var/N;/^;var\nO0l='=sTKpUG/d"
Since sed processes its input line by line, the pattern space normally holds a single line with no newline in it. The N command appends the next input line to the pattern space, separated by a newline character, so that a pattern containing \n can match across both lines.
/^;var/N;
Then we do our pattern searching and deleting.
/^;var\nO0l='=sTKpUG/d
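The two steps can be tried on a harmless sample before touching the real files. A sketch assuming GNU sed, find and xargs; the JavaScript file is invented and the "malicious" line is shortened to a placeholder.

```shell
# Scratch directory with a made-up file containing the two adjacent lines.
demo=$(mktemp -d)
cat > "$demo/x.js" <<'EOF'
var ok = 1;
;var
O0l='=sTKpUGxyz
alert(ok);
EOF

# On a line starting with ";var", pull in the next line (N); if the joined
# two-line text matches, delete both lines (d).
find "$demo" -name '*.js' -print0 |
    xargs -0 sed -i -e "/^;var/N;/^;var\nO0l='=sTKpUG/d"

cat "$demo/x.js"
# var ok = 1;
# alert(ok);
```

A ";var" line that is followed by something else is joined with its neighbour but then printed unchanged, since the second pattern does not match.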
It really isn't clear yet what the two lines look like, and it isn't clear if they are adjacent to each other in the JavaScript, so we'll assume not. However, the answer is likely to be:
find . -name "*.js" |
xargs sed -i -e '/^distinctive-pattern1$/d' -e '/^alternative-pattern-2a$/d'
There are other ways of writing the sed script using a single command string; I prefer to use separate arguments for separate operations (it makes the script clearer).
Clearly, if you need to keep some of the information on one of the lines, you can use a search pattern adjusted as appropriate, and then do a substitute s/short-pattern// instead of d to remove the short section that must be removed. Similarly with the long line if that's relevant.
