sed - search and replace between 2 marks - string

I want to search a string and modify it.
I build a string like this:
|Bor.Team1-FCTeam2|
but also:
|FCTeam2-Bor.Team1|
or, as a pattern:
|Text-Text|
I want to change the name of Team1, which can be spelled in different ways, e.g.:
Bor.Team1
B.Team1
BTeam091
...
I always want the same name: B.Team1.
The word Team appears in every variant!
I have played around with sed 's/\bBor.Team1\b/B.Team1/g', but then I have to find and know every variant in advance.

That should do what you need:
echo "$string" | sed -e 's/[a-zA-Z.]\+Team\([0-9]\+\)/B.Team\1/g'
It matches one or more letters or dots, followed by the word Team, followed by one or more digits, which are captured with \(...\). The whole match is replaced by whatever you want; in the example it is B.Team followed by the captured number (\1). Note that this rewrites every name of the form ...TeamN, so in |Bor.Team1-FCTeam2| the FCTeam2 becomes B.Team2 as well.
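If only the first team's name should be normalized, the pattern can be anchored on Team1 instead of any TeamN. The sketch below assumes every variant is a run of letters and dots ending in exactly Team1 (so BTeam091 from the question is not covered); the function name normalize_team1 is made up for illustration, and -E is accepted by both GNU and BSD sed.

```shell
# Sketch (assumption): each variant of the first team's name is a run of
# letters and dots ending in "Team1", e.g. Bor.Team1 or B.Team1.
# ([^0-9]|$) stops Team12 from counting as Team1; the character it
# consumes is put back with \1. FCTeam2 is left alone.
normalize_team1() {
  sed -E 's/[A-Za-z.]*Team1([^0-9]|$)/B.Team1\1/g'
}

printf '%s\n' '|Bor.Team1-FCTeam2|' '|FCTeam2-Bor.Team1|' | normalize_team1
```

For the two sample strings this prints |B.Team1-FCTeam2| and |FCTeam2-B.Team1|.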

Find any name that contains Team1, replace the whole name with B.Team1, and keep the counterpart unchanged in each |...| field:
sed -e 's/|[^-|]*Team1[^-|]*-\([^|]*\)|/|B.Team1-\1|/g
s/|\([^-|]*\)-[^|]*Team1[^|]*|/|\1-B.Team1|/g' YourFile
In basic regular expressions | is a literal character (in GNU sed, \| would mean alternation), and [^-|]* matches one team name.
Examples:
|bla.Team1.corp-other| -> |B.Team1-other|
|ot.her-foolTeam1| -> |ot.her-B.Team1|

Related

How to use awk command to remove word "a" not character 'a' in a text file?

I tried to use awk '{$0 = tolower($0);gsub(/a|an|is|the/, "", $0);}' words.txt
but it also removed the a in words like day. I only want to delete the word a.
For example:
input: The day is sunny the the the Sunny is is
expected output: day sunny
Using GNU awk and built-in variable RT:
$ echo this is a test and nothing more |
awk '
BEGIN {
RS="[ \n]+"
a["a"]
a["an"]
a["is"]
a["the"]
}
!(tolower($0) in a) {
printf "%s%s",$0, RT
}'
this test and nothing more
However, post some sample data with expected output for more specific answers.
You need to define word boundaries to eliminate partial matches; \y is GNU awk's word-boundary operator:
$ echo "This is a sunny day, that is it." |
awk '{$0=tolower($0); gsub(/\y(is|it|a|this)\y/,"")}1'
will print the following (the spaces that surrounded the removed words are left in place):
sunny day, that .
You can remove punctuation as well by adding it either to the field separators or to the gsub pattern.
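To illustrate the field-separator route in a portable way (no \y, so it should also run in mawk or BSD awk), the sketch below treats spaces and the punctuation marks . , ; : ! ? as separators and rebuilds each line from the words it keeps. The stop-word list and the punctuation set are assumptions to adapt; remove_stopwords is a made-up name.

```shell
# Sketch: spaces and the punctuation marks . , ; : ! ? are field separators,
# so every field is a whole word and punctuation never reaches the output.
# The stop-word list passed with -v is only an example.
remove_stopwords() {
  awk -v stop="a an is it the this" '
    BEGIN {
      FS = "[ .,;:!?]+"                        # words are separated by spaces/punctuation
      n = split(stop, w, " ")
      for (i = 1; i <= n; i++) drop[w[i]] = 1  # set of words to remove
    }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        word = tolower($i)
        if (word == "" || (word in drop)) continue   # empty field or stop word
        out = (out == "") ? word : out " " word       # rebuild with single spaces
      }
      print out
    }'
}

echo "This is a sunny day, that is it." | remove_stopwords
```

With the sentence above this prints sunny day that; the trailing dot produces an empty last field, which is simply skipped.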
The following awk may help you with this.
Case 1: If you only want to remove words like a, the and is (you can edit the code and add more words as needed):
awk '{
for(i=1;i<=NF;i++){
if(tolower($i)=="a" || tolower($i)=="the" || tolower($i)=="is"){
$i=""
}
};
}
1' Input_file
Case 2: If you want to remove words like a, the and is and also remove duplicate words, the following may help you (this is based on the expected output shown in your example). Note that the array a is not cleared between lines, so a word that appeared on an earlier line is removed too:
awk '{
for(i=1;i<=NF;i++){
if(tolower($i)=="a" || tolower($i)=="the" || tolower($i)=="is" || ++a[tolower($i)]>1){
$i=""
}
};
}
1' Input_file
NOTE: Since the matching fields are only set to empty strings, not removed, the output will contain some extra spaces between words.
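If the extra spaces are a problem, a variation is to build a new line from the words that are kept instead of blanking fields. This sketch also empties its seen array at the start of each line, assuming duplicates should only be removed within a line (in the Case 2 code, the array a keeps its contents across lines). The word list and the name clean_line are illustrative.

```shell
# Sketch: drop the listed words and any word already seen on the same line,
# building the output line from the kept words (so no extra spaces).
clean_line() {
  awk '
    BEGIN { drop["a"]; drop["an"]; drop["is"]; drop["the"] }   # words to remove
    {
      split("", seen)            # portable way to empty the array for each line
      out = ""
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        if ((w in drop) || (w in seen)) continue
        seen[w]                  # remember the word for the rest of this line
        out = (out == "") ? w : out " " w
      }
      print out
    }'
}

echo "The day is sunny the the the Sunny is is" | clean_line
```

For the question's input this prints day sunny, which matches the expected output.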
You need an expression in which the word is delimited by something (you need to decide what delimits your words: for example, do digits delimit a word, or are they part of it, as in a4?). So the expression could be, for example, /[^[:alnum:]](a|an|is|the)[^[:alnum:]]/.
Note, however, that such an expression matches the word AND the delimiters, so you have to put the delimiters back, e.g. with a capture group in GNU awk's gensub().
It looks like your "words.txt" contains just one word per line, so the expression should be anchored at the beginning and end of the line, like /^a$/.

Repeat a number in string twice using sed command

Let's say I have hell0 w0rld and I want it to become hell00 w0rld.
I tried sed s/0/00/, but that only doubles a 0; it wouldn't work for he1lo wor1d (which should become he11lo wor1d). What can I do so that it doubles the first digit, whatever it is, instead of just 0?
Since you don't want to match just 0 but any digit, you want to use [0-9]. This stands for "any one of the digits 0-9". You put this in parentheses (written \( \) in a basic sed regex) to "capture" it, and in the replacement string you can add backreferences:
$ sed 's/\([0-9]\)/\1\1/' <<< "he1lo wor1d"
he11lo wor1d
If you want to repeat the first number (as per the title) and not just a single digit, append \+ to your character class. This stands for "one or more of these" (\+ is a GNU extension; portable sed spells it \{1,\}):
$ sed 's/\([0-9]\+\)/\1\1/' <<< "he12o wor1d"
he1212o wor1d
An alternative to the backreference \1, which refers to the capture group \(...\), is &, which stands for the complete match, i.e.,
sed 's/[0-9]/&&/' <<< "he1lo wor1d"
and
sed 's/[0-9]\+/&&/' <<< "he12lo wor1d"
where the \(...\) are no longer needed.
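The same idea written with the POSIX interval \{1,\} instead of GNU's \+, so it should also work with BSD sed; the function names are only for illustration. Adding the g flag doubles every number on the line instead of just the first one.

```shell
# Portable sketch: \{1,\} means "one or more", like GNU's \+.
double_first_number() {
  sed 's/[0-9]\{1,\}/&&/'     # & is the whole match, so && repeats it
}
double_every_number() {
  sed 's/[0-9]\{1,\}/&&/g'    # g: every number on the line, not just the first
}

echo "he12lo wor1d" | double_first_number
echo "he12lo wor1d" | double_every_number
```

The first call prints he1212lo wor1d and the second prints he1212lo wor11d; hell0 w0rld becomes hell00 w0rld with double_first_number.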

Delete non-AGTC characters in a text file

I have a text file that should contain only the characters A, G, C and T. However, it sometimes has a few unknown characters, which I want to delete; if the character is N, I want to replace it with A instead. I also want to skip (leave unchanged) the lines that start with a > symbol.
So far I only know how to replace N with A, which I do like this:
sed "s/N/A/g" file1.fa >file2.fasta
But I don't know how to do the first task.
Example:
Initial file
first line
AGCCCMCCCN
Target file should be like this
first line
AGCCCCCCA
Any help will be appreciated. Thanks in advance!
You can add another substitution to your sed and restrict both substitutions to lines that do not start with >:
sed -e '/^>/!s/N/A/g' -e '/^>/!s/[^AGCT]//g' file1.fa > file2.fasta
Pattern 1
-e '/^>/!s/N/A/g'
Your pattern: on every line that does not start with > (the /^>/! address), replace all instances of N with A first of all.
Pattern 2
-e '/^>/!s/[^AGCT]//g'
Secondly, on the same lines, replace all characters that aren't A, G, C or T with nothing.
Lines starting with > are printed unchanged.
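Another way to write this is to group the N-to-A substitution and the deletion of other characters under a single negated address, so that lines starting with > are never touched. This sketch assumes header lines start with > (as in FASTA files) and should be kept verbatim; clean_fasta is a made-up name.

```shell
# Sketch: lines starting with ">" are printed as they are; on every other
# line N becomes A, then anything that is not A, C, G or T is deleted.
clean_fasta() {
  sed '/^>/!{
s/N/A/g
s/[^ACGT]//g
}'
}

printf '>first line\nAGCCCMCCCN\n' | clean_fasta
```

This prints >first line followed by AGCCCCCCA; the order matters, because deleting first would also remove the N.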

sed - pass match to external command

I have written a little script using sed to transform this:
kaefert#Ultrablech ~ $ cat /sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000
POWER_SUPPLY_VOLTAGE_NOW=8370000
POWER_SUPPLY_POWER_NOW=0
POWER_SUPPLY_ENERGY_FULL_DESIGN=45640000
POWER_SUPPLY_ENERGY_FULL=44541000
POWER_SUPPLY_ENERGY_NOW=44541000
POWER_SUPPLY_MODEL_NAME=UX32-65
POWER_SUPPLY_MANUFACTURER=ASUSTeK
POWER_SUPPLY_SERIAL_NUMBER=
into a csv file format like this:
kaefert#Ultrablech ~ $ Documents/Asus\ Zenbook\ UX32VD/power_to_csv.sh
"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"
2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:02;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:04;"Full";8369000;0;44541000;44541000
... (in a loop)
What I would like now is to divide each of those numbers by 1,000,000 so that they represent V instead of µV and W instead of µW, which makes them easy to interpret at a glance. Of course I could do this manually afterwards, once I've opened the CSV in LibreOffice Calc, but I would like to automate it.
What I found is that I can call external programs from within sed, like this:
...
s/\nPOWER_SUPPLY_PRESENT=1\nPOWER_SUPPLY_TECHNOLOGY=Li-ion\nPOWER_SUPPLY_CYCLE_COUNT=0\nPOWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000\nPOWER_SUPPLY_VOLTAGE_NOW=\([0-9]\{1,\}\)/";'`echo 0`'\1/
and that I could get the values I want with something like this:
echo "scale=6;3094030/1000000" | bc | sed 's/0\{1,\}$//'
But the problem now is, how do I pass my match "\1" into the external command?
If you are interested in looking at the full script, you'll find it here:
http://koega.no-ip.org/mediawiki/index.php/Battery_info_to_csv
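If your sed is GNU sed, you can use the e flag of the s command to pass matched groups to external commands within a sed command: after the substitution, the pattern space is run as a shell command and replaced by its output.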
An example might help make it clear. Say you have this problem:
you have the string "100+20foobar", and you want to replace the 100+20 part with the result of the calculation and replace "oo" with "xx" in the "foobar" part.
Note that this example does not solve the problem above; it only shows how the sed e flag is used.
You could put 100+20 in the first capture group and the rest in the second, then pass the two groups to different commands and get the results, like:
kent$ echo "100+20foobar"|sed -r 's#([0-9+]*)(.*)#echo \1 \|bc\;echo \2 \| sed "s/oo/xx/g"#ge'
120
fxxbar
In this way you could nest many seds inside one another, until you get lost. :D
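Applied to the battery question, a minimal sketch could look like this. It assumes GNU sed (the e flag is a GNU extension) and lets awk do the division instead of bc; the function name volts_from_uevent is hypothetical. Only a matching line is turned into a command, so the other lines pass through unchanged.

```shell
# GNU sed only: with the e flag, the pattern space produced by the
# substitution is executed as a shell command and replaced by its output.
# Here the matched number is handed to awk, which divides it by 1e6.
volts_from_uevent() {
  sed -E 's#^POWER_SUPPLY_VOLTAGE_NOW=([0-9]+)$#awk "BEGIN { print \1 / 1e6 }"#e'
}

printf 'POWER_SUPPLY_STATUS=Full\nPOWER_SUPPLY_VOLTAGE_NOW=8370000\n' | volts_from_uevent
```

The command's output replaces the whole pattern space, so the POWER_SUPPLY_VOLTAGE_NOW= prefix is gone and that line simply becomes 8.37.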
As sed doesn't do arithmetic on its own, I would recommend using awk for something like this, e.g. to divide the 3rd to 6th fields (voltage, power and the two energy values) by a million, do something like this:
awk -F';' -v OFS=';' '
NR == 1
NR != 1 {
$3 /= 1e6
$4 /= 1e6
$5 /= 1e6
$6 /= 1e6
print
}'
Explanation
-F';' and -v OFS=';' specify the input and output field separators.
NR == 1 passes the first line through unchanged.
NR != 1 if it is not the first line, divide and print.
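The same approach can be written as a loop over fields 3 to 6, with an optional extra step that drops the µ prefix from the header so the column names match the new units (that header change is an assumption about what is wanted). Wrapping it in a function, here given the made-up name to_base_units, lets the CSV script's output be piped straight into it.

```shell
# Sketch: fields 3-6 of the CSV (voltage, power, energy full, energy now)
# are divided by 1e6; on the header line the "µ" prefix is removed.
to_base_units() {
  awk -F';' -v OFS=';' '
    NR == 1 { gsub(/µ/, ""); print; next }   # header: µV -> V, µW -> W, µWh -> Wh
    {
      for (i = 3; i <= 6; i++) $i /= 1e6    # micro-units to base units
      print
    }'
}

printf '%s\n' '"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"' \
  '2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000' | to_base_units
```

The data line comes out as 2012-07-30 11:29:01;"Full";8.369;0;44.541;44.541.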
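To divide by 1,000,000 directly in sed, without any arithmetic, you can left-pad the number with zeros to at least 7 digits and then put a decimal point before the last 6 digits:
echo '3094030/1000000' | sed -e ':r' -e '/^[[:digit:]]\{7\}/{s#\([[:digit:]]*\)\([[:digit:]]\{6\}\)/1000000#\1.\2#;p;d;}' -e 's/^/0/;br'
This prints 3.094030.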
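When the values arrive as bare integers (for example, one column cut out of the CSV), the same padding trick works without the /1000000 suffix. This is a sketch; micro_to_unit is an illustrative name and the label pad is arbitrary.

```shell
# Sketch: left-pad with zeros until there are at least 7 digits, then put a
# decimal point before the last 6 digits (i.e. divide by 1,000,000).
micro_to_unit() {
  sed -e ':pad' \
      -e '/^[0-9]\{7\}/!{' \
      -e 's/^/0/' \
      -e 'b pad' \
      -e '}' \
      -e 's/\([0-9]\{6\}\)$/.\1/'
}

echo 3094030 | micro_to_unit
```

3094030 becomes 3.094030, 5 becomes 0.000005 and 44541000 becomes 44.541000; trailing zeros are kept because sed only moves the decimal point.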

sed split single line file and process resulting lines

I have an XML feed on a single line, so to extract the data I need I do something like this:
sed -r 's:<([^>]+)>([^<]+)</\1>:&\n: g' feed | sed -nr '
/<item>/, $ s:.*<(title|link|description)>([^<]+)</\1>.*:\2: p'
since I can't find a way to make the first sed call process its result as separate lines.
Any advice?
My goal is to get all the data I need in a single sed call.
sed -rn -e 's|>[[:space:]]*<|>\n<|g
/^<title>/ { bx }
/^<description>/ { b x }
/^<link>/ { bx }
D
:x
s|<([^>]*)>([^\n]*)</\1>|\1=\2|;
P
D' rss.xml
New answer to new question, now with branches and outputting all three chunks of information.
sed -rn -e 's|>[[:space:]]*<|>\n<|g # Insert newlines before each element
/^[^<]/ D # If not starting with <, delete until 1st \n and restart
/^<[^t]/ D # If not starting with <t, ""
/^<t[^i]/ D # If not starting with <ti, ""
/^<ti[^t]/ D
/^<tit[^l]/ D
/^<titl[^e]/ D
/^<title[^>]/ D # If not starting with <title>, delete until 1st \n and restart
s|^<title>|| # Delete <title>
s|</title>[^\n]*|| # Delete </title> and everything after it until the newline
P # Print everything up to the first newline
D' rss.xml # Delete everything up to the first newline and restart
By "restart" I mean go back to the top of the sed script and pretend we just read whatever is left.
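A tiny example of that P/D restart loop, assuming GNU sed (a \n in the replacement is a GNU extension): the commas become newlines, P prints the first line of the pattern space, and D removes it and reruns the script on what is left, so each piece is printed exactly once.

```shell
# GNU sed assumed. On each restart the s command runs again (finding no
# commas left), P prints up to the first newline, D deletes up to it.
echo 'x,y,z' | sed 's/,/\n/g; P; D'
```

It prints x, y and z on separate lines; once no newline is left, D ends the cycle like d and sed reads the next input line.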
I learned a lot about sed writing this. However, there is zero question that you really should be doing this in perl (or awk if you are old school).
In perl, this would be perl -pe 's%.*?<title>(.*?)</title>(?:.*?(?=<title>)|.*)%$1\n%g' rss.xml
This basically takes advantage of minimal matching (.*? is non-greedy: it matches the fewest characters possible). The positive lookahead at the end is just there so that I could do it in one s expression while still deleting everything at the end. There is more than one way…
If you needed multiple tags out of this xml file, it probably is still possible, but would probably involve branching and the like.
What about this:
sed -nr 's|>[[:space:]]*<|>\n<|g
h
/^<(title|link|description)>/ {
s:<([^>]+)>([^<]+)</\1>:\2:
P
}
g
D
' feed
