Delete non-numerical values - Unix - Linux

I have a file where I only need numbers [0-9]. I have this command, sed 's/[^0-9]*//g', which deletes anything that's not [0-9], but I need to delete only the things to the left of a ",".
I have this now, but it isn't working the way I'd expect:
sed -ri "s/[^0-9]+\,/,/g"

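The substitution 's/[^0-9]\(.*,\)/\1/g' doesn't work as you might expect: g resumes searching after the end of the previous match, and because \(.*,\) has already consumed everything up to the last comma, only the first non-digit on each line is removed. There are alternatives, though.
We can use t (test) to branch back and repeat the substitution until it no longer matches, which is what I expected g to do:
sed -e:a -e 's/[^0-9]\(.*,\)/\1/;ta'
Or use the hold space (overkill, but it works):
sed 'h;s/.*,/,/;x;s/,.*//;s/[^0-9]//g;G;s/\n//'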
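A minimal sketch, not taken from the answer, that wraps both commands in shell functions (the names digits_before_comma_loop and digits_before_comma_hold are hypothetical) and runs them on a made-up line. It assumes each line contains exactly one comma; with several commas, the loop version cleans everything before the last comma, while the hold-space version drops the text between the first and last comma.

```shell
#!/bin/sh
# Loop version: delete the first non-digit that still has a comma after it,
# then branch back to :a (t) as long as a substitution was made.
digits_before_comma_loop() {
  sed -e :a -e 's/[^0-9]\(.*,\)/\1/;ta'
}

# Hold-space version: copy the line to hold (h), keep ",tail" (s/.*,/,/),
# swap back to the full line (x), keep "head" (s/,.*//), strip non-digits,
# append the saved tail (G) and remove the newline that G inserted.
digits_before_comma_hold() {
  sed 'h;s/.*,/,/;x;s/,.*//;s/[^0-9]//g;G;s/\n//'
}

# Made-up sample input with a single comma.
printf 'ab12c3,foo bar\n' | digits_before_comma_loop   # 123,foo bar
printf 'ab12c3,foo bar\n' | digits_before_comma_hold   # 123,foo bar
```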

Related

Conditional replace using sed

My question is probably rather simple. I'm trying to replace sequences of strings that are at the beginning of lines in a file. For example, I would like to replace any instance of the pattern "GN" with "N" or "WR" with "R", but only if they are the first 2 characters of that line. For example, if I had a file with the following content:
WRONG
RIGHT
GNOME
I would like to transform this file to give
RONG
RIGHT
NOME
I know I can use the following to replace any instance of the above example:
sed -i 's/GN/N/g' file.txt
sed -i 's/WR/R/g' file.txt
The issue is that I want this to happen only if the above patterns are the first 2 characters of a given line. Possibly an IF statement, although I'm not sure what the condition would look like. Any pointers in the right direction would be much appreciated, thanks.
Just add the caret anchor (^) and remove the g flag (unnecessary, since you want at most one replacement per line). You can also combine both substitutions into one script:
sed -i 's/^GN/N/;s/^WR/R/' file.txt
Use the start-of-string regexp anchor ^:
sed -i 's/^GN/N/' file.txt
sed -i 's/^WR/R/' file.txt
Since sed is line-oriented, start-of-string == start-of-line.
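A quick way to check the anchored version before editing file.txt in place is to run it without -i on the question's sample lines. This sketch (the function name fix_prefixes is hypothetical) adds one extra word to show that GN in the middle of a line is left alone.

```shell
#!/bin/sh
# ^ ties each pattern to the start of the line; no g flag is needed
# because an anchored pattern can match at most once per line.
fix_prefixes() {
  sed 's/^GN/N/;s/^WR/R/'
}

# Sample from the question, plus a word with GN in the middle.
printf '%s\n' WRONG RIGHT GNOME AGNOSTIC | fix_prefixes
# RONG
# RIGHT
# NOME
# AGNOSTIC
```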

A good way to use sed to find and replace characters with 2 delimiters

I'm trying to find and replace items using bash. I was able to use sed to grab some of the characters, but I think I might be using it the wrong way.
I am basically trying to remove the characters after ";" and before ",", including the "," itself.
sed -e 's/\(;\).*\(,\)/\1\2/'
That is what I used to replace it with nothing. However, it ends up replacing everything in the middle, so my output came out like this:
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;,reboot -f"
This is the original text of what I need to replace
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f"
Is there any way to make it look like this output?
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;reboot -f
If there is a way to do this other than in bash, I am fine with any language.
Non-greedy search can (mostly) be simulated in programs that don't support it by replacing match-any (dot .) with a negated character class.
Your original command is
sed -e 's/\(;\).*\(,\)/\1\2/'
You want to match everything between the semicolon and the comma, but not another comma (non-greedy). Replace .* with [^,]*:
sed -e 's/\(;\)[^,]*\(,\)/\1\2/'
You may also want to exclude semi-colons themselves, making the expression
sed -e 's/\(;\)[^,;]*\(,\)/\1\2/'
Note that the two expressions treat a string like "asdf;zxcv;1234,qwer" differently: the first matches ;zxcv;1234, while the second matches only ;1234,
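To see the difference concretely, this sketch runs both expressions on the example string, then applies the same negated-class idea with the g flag (and without keeping the comma) to the question's value. The value is assumed to be stored without the surrounding cmd2="..." assignment. The last command should produce the same line as the Perl answer, including the leftover 2 from DB,2.

```shell
#!/bin/sh
s='asdf;zxcv;1234,qwer'

# [^,]* may cross the inner ';', so the match is ;zxcv;1234,
printf '%s\n' "$s" | sed -e 's/\(;\)[^,]*\(,\)/\1\2/'    # asdf;,qwer
# [^,;]* cannot cross ';', so only ;1234, matches
printf '%s\n' "$s" | sed -e 's/\(;\)[^,;]*\(,\)/\1\2/'   # asdf;zxcv;,qwer

# Question's value: shrink every ";label," to ";" (g flag),
# then drop the leading "BMC," label.
cmd2='BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f'
printf '%s\n' "$cmd2" | sed -e 's/;[^,;]*,/;/g' -e 's/^[^,]*,//'
```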
In perl:
perl -pe 's/;.*?,/;/g;' -pe 's/^[^,]*,//' foo.txt
will output:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f
The .*? is a non-greedy match up to the next comma. The second expression removes everything from the start of the line up to the first comma.
Something like:
echo "$cmd2" | tr ';' '\n' | cut -d',' -f2- | tr '\n' ';' ; echo
The result is:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f;
However, I think your requirements are a bit more complex, because 'DB,2' seems to be a special case. After the first tr command, insert a grep or grep -v to include or exclude such cases.
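One way to act on that suggestion, sketched under the assumption that every entry labelled DB is the one to drop: filter it out with grep -v right after the first tr. This version also swaps the trailing tr '\n' ';' ; echo for paste -s -d ';' -, which joins the lines without leaving a trailing ";". The function name strip_labels is hypothetical.

```shell
#!/bin/sh
# Split on ';', drop "DB,..." entries (assumed to be the only ones to
# skip), keep what follows the first ',', and rejoin with ';'.
strip_labels() {
  tr ';' '\n' | grep -v '^DB,' | cut -d',' -f2- | paste -s -d ';' -
}

cmd2='BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f'
printf '%s\n' "$cmd2" | strip_labels
# ./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;reboot -f
```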

SED replacing with 'possible' newline

I have a sed command that is working fine, except when it comes across a newline right in the file somewhere. Here is my command:
sed -i 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g'
Now, it works perfectly, but I just ran across a file that has an a tag like this:
<a href="link">Click
here now</a>
Of course it didn't find this one. So I need to modify it somehow to allow for line breaks in the search. But I have no clue how to do that unless I go over the entire file first and remove all \n beforehand. The problem there is that I lose all formatting in the file.
You can do this by inserting a loop into your sed script:
sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile
As-is, that will leave an embedded newline in the output, and it wasn't clear whether you wanted that. If not, replace the newline with a space (deleting it outright would glue "Click" and "here" together):
sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s/\n/ /g;s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile
And maybe clean up extra spaces:
sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s/\n/ /g;s/\s\{2,\}/ /g;s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile
Explanation: The /<a href/{...} address lets us ignore lines we don't care about. Once we find one we like, we check whether it contains the end marker. If not (/<\/a>/!), we append a newline and the next line to the pattern space (N) and branch (b) back to :next to check again. Once we find it, we continue with the substitutions.
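A minimal sketch of the same loop written one command per line, which avoids relying on GNU sed accepting ; after labels and branch targets. The substitution is an assumption: it takes the original command to turn <a href="URL">text</a> into text - URL, and uses [^"]* for the URL so it cannot run past the closing quote. swap_links is a hypothetical name, and the newline is replaced with a space so that Click and here stay separate words.

```shell
#!/bin/sh
# For lines containing "<a href": keep appending the next line (N) until
# the pattern space contains "</a>", turn the newline(s) into spaces,
# then rewrite <a href="URL">text</a> as "text - URL".
swap_links() {
  sed -e '/<a href/{
:next
/<\/a>/!{
N
b next
}
s/\n/ /g
s,<a href="\([^"]*\)">\(.*\)</a>,\2 - \1,g
}'
}

printf '%s\n' '<a href="one">First link</a>' '<a href="two">Click' 'here now</a>' |
  swap_links
# First link - one
# Click here now - two
```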
Here is a quick and dirty solution that assumes there will be no more than one newline in a link (the -i '' form is BSD/macOS sed syntax; with GNU sed, use plain -i):
sed -i '' -e '/<a href=.*>/{/<\/a>/!{N;s|\n||;};}' -e 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g' yourfile
The first command (/<a href=.*>/{/<\/a>/!{N;s|\n||;};}) checks for the presence of <a href=...> without </a>, in which case it reads the next line into the pattern space and removes the newline. The second is yours.

sed to remove a user in svn access control list

Filename: stackgroup.acl
[groups]
stackoverflow=linus,steve,bill,adrian
stackexchange=charlie,darwin,carol,kelly
I need a sed command that can remove a user whether the name is at the start of the list or at the end of it.
Here's what I got so far:
sed 's/\(.*=*\)linus,\(.*\)/\1\2/g'
sed 's/\(.*=*\)steve,\(.*\)/\1\2/g'
sed 's/\(.*=*\),adrian\(.*\)/\1\2/g'
As you can see, the middle one is fine, but the first and last user will leave an extra comma.
I even tried using regex:
sed 's/\(.*=*\),\?linus,\?\(.*\)/\1\2/g'
or
sed 's/\(.*=*\),*linus,*\(.*\)/\1\2/g'
but it's not working.
Can anyone help?
Use two expressions; the second one handles the edge case where the name comes directly after the equals sign:
#!/bin/bash
user="linus"
sed "s/,\?$user//;s/=,/=/" stackgroup.acl
This might work for you:
v1=charlie;v2=darwin;v3=kelly
sed s'/'"$v1"',\?\|,'"$v1"'$//' <<<"stackexchange=charlie,darwin,carol,kelly"
stackexchange=darwin,carol,kelly
sed s'/'"$v2"',\?\|,'"$v2"'$//' <<<"stackexchange=charlie,darwin,carol,kelly"
stackexchange=charlie,carol,kelly
sed s'/'"$v3"',\?\|,'"$v3"'$//' <<<"stackexchange=charlie,darwin,carol,kelly"
stackexchange=charlie,darwin,carol
sed s'/'"$v3"',\?\|,'"$v3"'$//' <<<"stackexchange=kelly"
stackexchange=
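Both answers match the name as a plain substring, so removing bill would also cut into a user called billy. A sketch of a stricter variant, a different technique from the answers above that anchors on the surrounding delimiters instead, uses one expression per position. remove_user is a hypothetical helper, and it assumes user names contain no regex metacharacters or /.

```shell
#!/bin/sh
# Remove user "$1" from "group=a,b,c" lines read on stdin.
# Expressions, in order: first of several, only member, middle, last.
# Requiring '=' or ',' before the name and ',' or end of line after it
# means only whole names match.
remove_user() {
  sed -e "s/=$1,/=/" -e "s/=$1\$/=/" -e "s/,$1,/,/" -e "s/,$1\$//"
}

printf '%s\n' '[groups]' \
  'stackoverflow=linus,steve,bill,adrian' \
  'stackexchange=charlie,darwin,carol,kelly' | remove_user linus
# [groups]
# stackoverflow=steve,bill,adrian
# stackexchange=charlie,darwin,carol,kelly
```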

Removing Parts of String With Sed

I have lines of data that look like this:
sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta
How can I use sed to delete the part of each line after the 4th field (fields are separated by _)?
The result should be:
sp_A0A342_ATPB_COFAR
sp_A0A342_ATPB_COFAR
sp_A0A373_RK16_COFAR
sp_A0A373_RK16_COFAR
sp_A0A4W3_SPEA_GEOSL
cut is a better fit.
cut -d_ -f 1-4 old_file
This simply means use _ as delimiter, and keep fields 1-4.
If you insist on sed:
sed 's/\(_[^_]*\)\{4\}$//'
This left hand side matches exactly four repetitions of a group, consisting of an underscore followed by 0 or more non-underscores. After that, we must be at the end of the line. This is all replaced by nothing.
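Since that expression counts four fields from the right, it keeps the first four only because every sample line has exactly eight fields. This sketch compares it with cut on lines from the question and on a made-up six-field line, a_b_c_d_e_f, where the two give different results.

```shell
#!/bin/sh
sample='sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta'

# Eight fields per line: both print the first four fields.
printf '%s\n' "$sample" | cut -d_ -f 1-4
printf '%s\n' "$sample" | sed 's/\(_[^_]*\)\{4\}$//'

# Six fields: cut keeps fields 1-4, while sed removes fields 3-6.
echo a_b_c_d_e_f | cut -d_ -f 1-4                 # a_b_c_d
echo a_b_c_d_e_f | sed 's/\(_[^_]*\)\{4\}$//'     # a_b
```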
sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile
Match "any number of not '_'", saving what was matched between \( and \), followed by '_'. Do this 4 times, then match anything for the rest of the line (to be ignored). Substitute with each of the matches separated by '_'.
Here's another possibility:
sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'
where -E, like -r in GNU sed, turns on extended regular expressions for readability.
Just because you can do it in sed, though, doesn't mean you should. I like cut much much better for this.
AWK likes to play in the fields:
awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile
or, more generally:
awk -v count=4 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}' inputfile
sed -e 's/_[0-9][0-9]*_[+-]_contigs_full\.fasta$//'
Still, the cut answer is probably faster and just generally better.
Yes, cut is way better, and yes, matching the end of each line is easier.
I finally got a match using the beginning of each line:
sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile
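To confirm that the left-anchored sed variants agree with cut, this sketch runs each of them on the full sample from the question and compares the results. It uses sed -E, which GNU sed accepts as a synonym for -r and which BSD sed also supports, in place of the -r above; the scripts themselves are unchanged. The variable names are illustrative.

```shell
#!/bin/sh
sample='sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta'

expected=$(printf '%s\n' "$sample" | cut -d_ -f 1-4)

# Four capture groups, rebuilt with '_' (BRE).
v1=$(printf '%s\n' "$sample" | sed 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/')
# First field plus three "_field" repetitions (ERE).
v2=$(printf '%s\n' "$sample" | sed -E 's|^([^_]+(_[^_]+){3}).*$|\1|')
# Three "field_" repetitions plus one more field (ERE).
v3=$(printf '%s\n' "$sample" | sed -E 's/(([^_]*_){3}([^_]*)).*/\1/')

for v in "$v1" "$v2" "$v3"; do
  if [ "$v" = "$expected" ]; then echo match; else echo mismatch; fi
done
```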
