2g is not working in sed to skip the first occurrence of a match - linux

I am trying to replace some text in an XML file using sed. I am able to replace my text, but I want to skip the first occurrence. I am using 2g, but it is not working: no error is displayed, but the file does not change.
My XML file:
<file-min-size>10830</file-min-size>
<rotate-log>true</rotate-log>
<file-min-size>25600</file-min-size>
<rotate-log>true</rotate-log>
<file-min-size>32300</file-min-size>
<rotate-log>true</rotate-log>
<file-min-size>13456</file-min-size>
<rotate-log>true</rotate-log>
My expected output:
<file-min-size>10830</file-min-size>
<rotate-log>true</rotate-log>
<file-min-size>25600</file-min-size>
<rotate-log>true insertvalue</rotate-log>
<file-min-size>32300</file-min-size>
<rotate-log>true insertvalue</rotate-log>
<file-min-size>13456</file-min-size>
<rotate-log>true insertvalue</rotate-log>
I am using the following sed command:
sed -i 's#</rotate-log>#insertvalue</rotate-log>#2g' myfile.xml
The above command is not working. If I remove 2g, the text is replaced. I want to skip the first occurrence. Any help?
Also, when I run the command a second time, the value is inserted again. Is there a way to check and replace only if it is not already there?

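The 2g flag does not skip the first match in the file. It applies to the second and later matches within each line, because sed processes the input one line at a time. Since every </rotate-log> here is on its own line, there is never a second match on any line, so nothing is replaced. With GNU sed, you may use: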
sed -i '/<\/rotate-log>/{:A;n;s#</rotate-log># insertvalue</rotate-log>#;bA}' file
The command finds the first line containing </rotate-log> and then:
:A - sets a label A
n - prints the current pattern space (sed runs without -n) and replaces it with the next line of input
s#</rotate-log># insertvalue</rotate-log># - replaces </rotate-log> with " insertvalue</rotate-log>" (the # characters are only the delimiters of the s command)
bA - branches back to label A, so the loop repeats (read the next line, replace, and so on) until the input runs out.
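The loop above can be wrapped in a small shell function, sketched here under a few assumptions: GNU sed, a hypothetical function name add_insertvalue, and one extra guard that is not part of the answer. The guard /insertvalue/! runs the substitution only on lines that do not already contain insertvalue, which is one way to make a second run leave the file unchanged, as the question asks.

```shell
# Sketch, assuming GNU sed. add_insertvalue is a hypothetical name.
# Reads the named files (or stdin) and writes the result to stdout.
add_insertvalue() {
  # /<\/rotate-log>/ : enter the loop at the first line containing </rotate-log>
  # :A;n             : print the current line and read the next one
  # /insertvalue/!   : assumed guard, skip lines that were already edited
  # bA               : jump back to A until the input runs out
  sed '/<\/rotate-log>/{:A;n;/insertvalue/!s#</rotate-log># insertvalue</rotate-log>#;bA;}' "$@"
}
```

For example, add_insertvalue myfile.xml > myfile.new && mv myfile.new myfile.xml; alternatively, put -i back into the sed call to edit the file in place.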

Related

grep specific part out of a line of text

This is my first question here, so please bear with me.
I have a large text file from which I need only one specific part of one line. I can grep the line but I do not know how I can get that specific part out of that line.
Here is my text line (stored in output.txt):
><source src="https://download.foobar.com/content/mp4/web01/2017/05/08/24599/mp4_web01.mp4" type="video/mp4" data-label="Laag - 360p" /><source src="https://download.foobar.com/content/mp4/web02/2017/05/08/24599/mp4_web02.mp4" type="video/mp4" data-label="Hoog - 720p" /><source src="https://download.foobar.com/content/mp4/web03/2017/05/08/24599/mp4_web03.mp4" type="video/mp4" data-label="Normaal - 480p" /></video></div></div>
the part I need to extract from this line is:
https://download.foobar.com/content/mp4/web02/2017/05/08/24599/mp4_web02.mp4
I can do a grep like this, but it gives me back three lines:
grep -Po '><source src="\K[^"]+' output.txt
gives me:
https://download.foobar.com/content/mp4/web01/2017/05/08/24599/mp4_web01.mp4
https://download.foobar.com/content/mp4/web02/2017/05/08/24599/mp4_web02.mp4
https://download.foobar.com/content/mp4/web03/2017/05/08/24599/mp4_web03.mp4
I would like to get only the line I am looking for, without an extra sed command to remove the first and third lines of the results.
How can I grep the input line and get back only the intended line? I only need the link to the mp4_web02.mp4 file.
Can anyone help me get this into one grep command?
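One way to get only the second URL from a single grep is to make the pattern itself select it. This is a sketch, assuming GNU grep built with PCRE support (-P) and assuming the file name mp4_web02.mp4 is what distinguishes the wanted link; extract_web02 is a hypothetical helper name.

```shell
# Sketch, assuming GNU grep with -P. extract_web02 is a hypothetical name.
extract_web02() {
  # -o prints only the matched text; \K drops the '<source src="' prefix
  # from the output. [^"]* cannot cross a quote, so the match stays inside
  # one src="..." attribute and only the mp4_web02.mp4 link qualifies.
  grep -Po '<source src="\K[^"]*mp4_web02\.mp4' "$@"
}
```

Called as extract_web02 output.txt, it should print just the web02 link.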

vi keep only first 10 characters of a column

How do I do this in vi?
awk -F"," '{print substr($1,1,10)}'
I only want to keep the first 10 characters of my date column (example 2014-01-01) and not include the timestamp.
I tried to do it in awk but I got this error:
sed: RE error: illegal byte sequence
I believe it is a bash_profile setting error.
This is what i have in my bash_profile:
#export LANG=en_US.UTF-8
#export LOCALE=UTF-8
export LC_CTYPE=C
export LANG=C
In vim, do:
:%norm! 11|D
11| moves the cursor to column 11 and D deletes from there to the end of the line; with % this affects all lines in your buffer.
If you like, :s could do this job too.
:%s/.\{,10}\zs.*//
:%s/: apply the substitution to all the lines
.\{,10}: match any character up to 10 times (greedy)
\zs: indicates the beginning of the match
.*: match the rest of the line
/: end of the first part of :s
/: end of the second part of :s (since there's nothing between the two /, replace with nothing, i.e. delete)
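Outside the editor, the same truncation can be done as a stream with sed. This is a sketch that is not part of the answer: sed has no \zs, so it captures the first ten characters with \(...\) and writes back only those. Lines shorter than ten characters do not match and pass through unchanged. truncate10 is a hypothetical name.

```shell
# Sketch: stream equivalent of the vim :s above. truncate10 is hypothetical.
truncate10() {
  # ^\(.\{10\}\) captures exactly ten characters at the start of the line,
  # .* matches the rest, and \1 keeps only the captured part.
  sed 's/^\(.\{10\}\).*/\1/' "$@"
}
```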
For editing blocks of text there is a -- VISUAL BLOCK -- mode accessible via CTRL-V (on Windows usually CTRL-Q). Then you can press d to delete your selection.
Or with a simple substitute command
:%s/\%>10c.*//
\%>10c - matches after the tenth column
. - matches any single character but not an end-of-line
* - matches 0 or more of the preceding atom, as many as possible
Or you can use range
:1,3s/\%>10c.*//
This would substitute for the first three lines.
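The awk command in the question prints only the truncated first column. If the other columns should be kept, a sketch (assuming comma-separated input with the date in the first column, as in the question; truncate_date_column is a hypothetical name) is to assign to $1 and print the whole record:

```shell
# Sketch, assuming comma-separated input with the date in column 1.
truncate_date_column() {
  # FS = OFS = "," keeps commas as the separator when the record is rebuilt.
  # Assigning to $1 truncates only the first field; the trailing 1 prints
  # every (modified) line.
  awk 'BEGIN { FS = OFS = "," } { $1 = substr($1, 1, 10) } 1' "$@"
}
```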

Find entry in .netrc file via bash and delete it if exists

In bash, how do I search for the following string in a file ~/.netrc and delete that line plus the next two lines if found:
machine api.mydomain.com
Example:
machine api.mydomain.com
user foo
password bar
It should delete all three lines, but I can't match user and password since those are unknown. The only fixed value is machine api.mydomain.com.
Try:
sed -i '' '/^machine api.mydomain.com$/{N;N;d;}' ~/.netrc
When this finds the line machine api.mydomain.com, it reads in two more lines and then deletes them all. Other lines pass through unchanged.
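For GNU sed, the argument to -i is optional, and if given it must be attached (as in -i.bak), so with GNU sed drop the '' and write sed -i '...' ~/.netrc. For OS X (BSD) sed, the argument is required but is allowed to be empty, as shown above.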
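If the same command has to run under both GNU and BSD sed, one commonly used workaround (an assumption here, not part of the answer) is to give -i an attached suffix, which both accept, and delete the backup afterwards. The dots are also escaped so that . matches only a literal dot. remove_netrc_entry is a hypothetical helper.

```shell
# Sketch: in-place edit that works with GNU and BSD sed alike.
# remove_netrc_entry is hypothetical; $1 defaults to ~/.netrc.
remove_netrc_entry() {
  netrc=${1:-"$HOME/.netrc"}
  # -i.bak (suffix attached, no space) is accepted by both implementations.
  # On the machine line, N;N appends the next two lines and d deletes all three.
  sed -i.bak '/^machine api\.mydomain\.com$/{N;N;d;}' "$netrc" &&
    rm -f "$netrc.bak"
}
```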
Searching for "sed or awk: delete n lines following a pattern" turns up the answer:
sed -e '/machine api.mydomain.com/,+2d' ~/.netrc (the addr,+N address form is a GNU sed extension). Add the -i flag if the changes need to be made in place.
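Where the GNU-only addr,+N address is not available, the same deletion can be sketched in POSIX awk (an alternative, not part of either answer; drop_machine_block is a hypothetical name). It compares the whole line, so no regex escaping is needed, and it writes the result to stdout.

```shell
# Sketch: portable alternative to sed '/pattern/,+2d'.
drop_machine_block() {
  # On the exact machine line, arm a counter that skips it and the next two lines.
  awk '$0 == "machine api.mydomain.com" { skip = 3 }
       skip > 0 { skip--; next }
       { print }' "$@"
}
```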

How do I write a sed script to grep information from a text file

I'm trying to do my homework that is restricted to only using sed to filter an input file to a certain format of output. Here is the input file (named stocks):
Symbol;Name;Volume
================================================
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
================================================
And the output needs to be:
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
I did come up with a solution, but it's not efficient. Here is my sed script (named try.sed):
/.*;.*;[0-9].*/ { N
N
N
N
N
N
s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
}
The command that I run on shell is:
$ sed -nf try.sed stocks
My question is, is there a better way of using sed to get the same result? The script I wrote only works with 7 lines of data. If the data is longer, I need to modify my script. I'm not sure how I can make it any better, so I'm here asking for help!
Thanks for any recommendations.
One more way using sed:
sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks
Output:
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
Explanation:
-ne # Process each input line without printing and execute next commands...
/^====/,/^====/ # For all lines between these...
{
/;/ # If line has a semicolon...
{
s/;.*$// # Remove characters from first semicolon until end of line.
H # Append content to 'hold space'.
}
};
$ # In last input line...
{
g # Copy content of 'hold space' to 'pattern space' to work with it.
s/\n// # Remove first newline character.
s/\n/, /g # substitute the rest with output separator, comma in this case.
p # Print to output.
}
Edit: I've edited my algorithm, since I had neglected to consider the header and footer (I thought they were just for our benefit).
sed, by its design, accesses every line of an input file, and then performs expressions on ones that match some specification (or none). If you're tailoring your script to a certain number of lines, you're definitely doing something wrong! I won't write you a script since this is homework, but the general idea for one way to go about it is to write a script that does the following. Think of the ordering as the order things should be in a script.
1. Skip the first three lines using d, which deletes the pattern space and immediately moves on to the next line.
2. For each line that isn't a blank line, do the following steps. (This would all be in a single set of curly braces.)
   a. Replace everything after and including the first semicolon (;) with a comma-and-space (", ") using the s (substitute) command.
   b. Append the current pattern space into the hold buffer (look at H).
   c. Delete the pattern space and move on to the next line, like in step 1.
3. For each line that gets to this point in the script (should be the first blank line), retrieve the contents of the hold space into the pattern space. (This would be after the curly braces above.)
4. Substitute all newlines in the pattern space with nothing.
5. Next, substitute the last comma-and-space in the pattern space with nothing.
6. Finally, quit the program so you don't process any more lines. My script worked without this, but I'm not 100% sure why.
That being said, that's just one way to go about it. sed often offers varying ways of varying complexity to accomplish a task. A solution I wrote with this method is 10 lines long.
As a note, I don't bother suppressing printing (with -n) or manually printing (with p); each line is printed by default. My script runs like this:
$ sed -f companies.sed companies
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
This sed command extracts the required symbols, one per line:
sed -rn '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt
Or, on macOS (BSD sed):
sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt
This might work for you:
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stocks
We don't want the headings so let's delete them. 1d
All data items are delimited by ;'s so let's concentrate on those lines. /;/
Of the lines above, delete everything from the first ; to the end of the line and then stuff it away in the hold space (HS). {s/;.*//;H}
When you get to the last line, overwrite it with the HS using the g command, delete the first newline (generated by the H command), replace all subsequent newlines with a comma and a space, and quit with q, which prints out what's left. ${g;s/.//;s/\n/, /g;q}
Delete everything else d
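The same one-liner can be laid out as a commented multi-line script, which is easier to read and extend. This is a sketch assuming GNU sed (s/.// relies on . matching the newline that H adds, which GNU sed documents); extract_symbols is a hypothetical name.

```shell
# Sketch, assuming GNU sed: the one-liner above, one command per line.
extract_symbols() {
  sed '
    # Line 1 is the header: drop it.
    1d
    # Data lines contain a semicolon: keep the symbol, append it to hold space.
    /;/{
      s/;.*//
      H
    }
    # Last line: fetch the hold space, drop the leading newline,
    # join the symbols with a comma and a space, then quit (q prints).
    ${
      g
      s/.//
      s/\n/, /g
      q
    }
    # Everything else (separator lines) is deleted.
    d
  ' "$@"
}
```

Running extract_symbols stocks should print the same single line as the one-liner.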
Here's a terminal session showing the incremental refinement of building a sed command:
cat <<! >stock # paste the file into a here doc and pass it on to a file
> Symbol;Name;Volume
> ================================================
>
> BAC;Bank of America Corporation Com;238,059,612
> CSCO;Cisco Systems, Inc.;28,159,455
> INTC;Intel Corporation;22,501,784
> MSFT;Microsoft Corporation;23,363,118
> VZ;Verizon Communications Inc. Com;5,744,385
> KO;Coca-Cola Company (The) Common;3,752,569
> MMM;3M Company Common Stock;1,660,453
>
> ================================================
> !
sed '1d;/;/!d' stock # delete headings and everything but data lines
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
sed '1d;/;/{s/;.*//p};d' stock # delete all non essential data
BAC
CSCO
INTC
MSFT
VZ
KO
MMM
sed '1d;/;/{s/;.*//;H};${g;l};d' stock # use the l command to see what's really there!
\nBAC\nCSCO\nINTC\nMSFT\nVZ\nKO\nMMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;l};d' stock # refine refine
BAC, CSCO, INTC, MSFT, VZ, KO, MMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stock # all done!
BAC, CSCO, INTC, MSFT, VZ, KO, MMM

How can I replace a specific line by line number in a text file?

I have a 2GB text file on my Linux box that I'm trying to import into my database.
The problem I'm having is that the script that is processing this rdf file is choking on one line:
mismatched tag at line 25462599, column 2, byte 1455502679:
<link r:resource="http://www.epuron.de/"/>
<link r:resource="http://www.oekoworld.com/"/>
</Topic>
=^
I want to replace the </Topic> with </Line>. I can't do a search/replace on all lines, but I do have the line number, so I'm hoping there's some easy way to replace just that one line with the new text.
Any ideas/suggestions?
sed -i yourfile.xml -e '25462599s!</Topic>!</Line>!'
sed -i '25462599 s|</Topic>|</Line>|' nameoffile.txt
The standard tool for editing text files in Unix is ed (as opposed to sed, which, as the name implies, is a stream editor).
ed was originally intended as an interactive editor, but it can also easily be scripted. In ed, every command takes an address parameter. The way to address a specific line is just its line number, and the way to change the addressed line(s) is the s command, which takes the same regexp that sed would. So, to change the 42nd line, you would write something like 42s/old/new/.
Here's the entire command:
FILENAME=/path/to/whereever
LINENUMBER=25462599
ed -- "${FILENAME}" <<-HERE
${LINENUMBER}s!</Topic>!</Line>!
w
q
HERE
The advantage of this is that ed is standardized by POSIX, while the -i flag to sed is a non-standard extension: it is not part of POSIX sed, so it is missing on some systems, and GNU and BSD sed implement it with different syntax.
Use "head" to get the first 25462598 lines and "tail" to get the remaining lines (starting at line 25462600), and write the replacement line in between. For a 2GB file, though, this will likely take a while.
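The head/tail idea can be sketched as a small function; replace_line and its argument order are hypothetical. It prints lines 1 to N-1, then the new text, then everything from line N+1 on, so the output has to go to a new file.

```shell
# Sketch of the head/tail approach. replace_line is a hypothetical name.
# $1: line number to replace, $2: replacement text, $3: input file.
replace_line() {
  n=$1
  head -n "$((n - 1))" "$3"      # lines before the target
  printf '%s\n' "$2"             # the replacement line
  tail -n "+$((n + 1))" "$3"     # lines after the target
}
```

For example, replace_line 25462599 '</Line>' file.rdf > fixed.rdf.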
Also, are you sure the problem is just with that line and not somewhere earlier? The error looks like an XML parse error, which might mean the actual problem is someplace else.
My shell script:
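#!/bin/bash
# Usage: ./script.sh LINE_NUMBER NEW_TEXT FILE
# Arguments are quoted so values containing spaces are passed intact.
awk -v line="$1" -v new_content="$2" '{
if (NR == line) {
print new_content;
} else {
print $0;
}
}' "$3"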
Arguments:
first: line number you want to change
second: text to use instead of the original line contents
third: file name
This script prints its output to stdout, so you need to redirect it. Example:
./script.sh 5 "New fifth line text!" file.txt > new_file.txt
You can improve it, for example, by checking that all your arguments have the expected values.
