Remove 1 or multiple lines with pattern match? - text-editor

I'm trying to figure out how to edit a feeder txt file. Previously, I had been able to accomplish this by using Word's Replace function using Wildcards. But, the most recent feeder file seems to be too big to open in Word. So I'm having to find some other way to replace the text.
The file looks something like this:
VSTHDR|data|data|data|data
...
VSTPMTH|data|1|
CRDHLDR|data|data|data
ADDR|data|data|data
VSTPMTR|data|data
VSTPMTA|data|
VSTPMTA|data
VSTPMTH|data|2|
CRDHLDR|data|data|data
VSTPMTR|data|data
VSTPMTH|data|3|
VSTPMTR|data|data
VSTPMTA|data
...
VST...
...
ADDR|data|data|data
and repeat. For all but the last VSTPMTH, there is always a CRDHLDR line. Under CRDHLDR, there may or may not be an ADDR line. Then there is always a VSTPMTR. There may or may not be VSTPMTA lines. There will be more lines that start with VST before finally ending with another ADDR line before the next VSTHDR.
My goal is to remove all CRDHLDR lines, and any ADDR lines that immediately follow them. In Word, I was able to use replace all "CRDHLDR*VSTPMTR" with "VSTPMTR".
I thought I had it with
sed '/CRDHLDR/,/^[^V]/d'
but with that, if there wasn't an ADDR line immediately after, it would delete all of the VST lines following.
Another idea I had was to append every line that starts with ADDR to the line before it, then delete all CRDHLDR lines, and finally put a newline back before any remaining ADDR. However, all the scripts I've found for combining lines seem to be limited by the hold buffer size, which this file quickly exceeds. If you can think of a set of commands that reduces the buffer use, I'll happily try it.
The closest I've been able to come to a solution so far has been to run:
sed '/CRDHLDR/,/VSTPMTR/d'
but that removes VSTPMTR which I don't want to delete. If I could get that to delete all but the last line of that selection (instead of the whole selection), that would be PERFECT.
I haven't seen any grep or awk solutions that seem quite right, but I'm willing to try any suggestions.

I think I found a two step answer:
sed '/CRDHLDR/,/VSTPMTR/{/ADDR/d;}'
sed '/CRDHLDR/d'
The first line removes ADDR lines that are between CRDHLDR and VSTPMTR, the second line then removes all CRDHLDR lines.
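For comparison, the same filtering can be sketched in Python as a single streaming pass, so the file never has to fit in memory. This is only a sketch under the assumptions stated in the question (records are identified by their line prefix); the function name strip_cardholder is made up for illustration.

```python
def strip_cardholder(lines):
    """Yield lines, skipping each CRDHLDR line and any ADDR lines directly after it."""
    skipping = False  # True while we are directly below a CRDHLDR line
    for line in lines:
        if line.startswith("CRDHLDR"):
            skipping = True
            continue
        if skipping and line.startswith("ADDR"):
            continue  # an ADDR that immediately follows a CRDHLDR
        skipping = False  # any other line ends the "directly below" state
        yield line


# Small in-memory demonstration; a file object can be passed the same way.
sample = [
    "VSTPMTH|data|1|\n",
    "CRDHLDR|data|data|data\n",
    "ADDR|data|data|data\n",
    "VSTPMTR|data|data\n",
    "ADDR|trailing|address\n",
]
print("".join(strip_cardholder(sample)), end="")
```

Because a file object iterates line by line, calling strip_cardholder(open(path)) and writing the result to a second file keeps memory use flat. The trailing ADDR line survives because it does not directly follow a CRDHLDR line.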

Related

Simple way to remove multi-line string using sed

Using sed, is there a way to remove multiple lines from a text file based on some starting and ending expressions?
I have known markers in the file and want to remove everything between (markers inclusive). I have seen some really complicated solutions and I would like to do this without resorting to micro commands.
My file looks something like this:
cat /tmp/foobar.txt
this is line 1
this is line 3
tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
ant-junit*.jar,\
ant-launcher.jar,\
ant.jar,\
asm-*.jar,\
aspectj*.jar,\
bootstrap.jar,\
catalina-ant.jar,\
catalina-ha.jar,\
catalina-ssi.jar,\
catalina-storeconfig.jar
the end leave me
and me
I want to remove everything starting at tomcat.util all the way to the last .jar
tldr;
I think this is the simplest way, and there is no need for the assembly-like micro commands
sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
which produces
this is line 1
this is line 3
the end leave me
and me
If you want to remove the lines from the file itself rather than write the result to stdout, use the in-place flag (-i):
sed -i '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
So... how does this work?
sed commands, like vi commands, operate on an address. Normally we don't specify an address, and the command is simply applied to all lines of the file. For example, to replace "the" with "that" throughout a file we'd normally do
sed -i 's/the/that/g' /tmp/foobar.txt
ie applying the substitute or s command to all lines in the file.
In this case you want to delete some lines so we can use the delete or d command. But we need to tell it where to delete. So we need to give it an address.
The format of a sed command is
[addr][!]command[options]
(see the docs)
If no address is specified, the command is applied to all lines; if the ! is specified, it is applied to all lines that don't match the address. So far so good.
The trick here is that addr can be a single address or a range of addresses. An address can be a line number or a regex pattern. You use a , between two addresses to specify a range.
so to delete line 5 to 8 inclusive you could do
sed -i '5,8d' /tmp/foobar.txt
in this case rather than knowing the line number we know some "markers" and we can use Regex instead, so the first marker, a line starting with tomcat.util is found by the regex
/^tomcat\.util.*$/
The second marker is a bit more tricky but if we look we can see that the final line to remove is the first one that does not end with a \, so we can match a line that consists of "anything but does not end with \"
/^.*[^\]$/
The second regex on its own would match a whole bunch of lines, but inside a range the second address means the first line after the start of the range that matches it.
Putting that all together, we want to delete (d) all lines in the range from the address that is found by the regex matching a line starting with tomcat.util and ending with a line that does not end in \ ie
sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
hope that helps ;-)
Cheers
Karl
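The range behaviour described above (start at a line matching the first regex, stop at the first later line matching the second, both inclusive) can be sketched in Python. This is an illustration of the idea rather than of sed's implementation; delete_range is a hypothetical helper name, and the lines are assumed to have had their trailing newlines stripped.

```python
import re


def delete_range(lines, start_pat, end_pat):
    """Drop each block from a line matching start_pat through the first later
    line matching end_pat (both inclusive), similar to sed '/start/,/end/d'."""
    start, end = re.compile(start_pat), re.compile(end_pat)
    kept, in_range = [], False
    for line in lines:
        if in_range:
            if end.search(line):
                in_range = False  # the end marker itself is deleted too
            continue
        if start.search(line):
            in_range = True  # the end pattern is only tested on later lines
            continue
        kept.append(line)
    return kept


lines = [
    "this is line 1",
    "this is line 3",
    "tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\\",
    "annotations-api.jar,\\",
    "catalina-storeconfig.jar",
    "the end leave me",
    "and me",
]
for line in delete_range(lines, r"^tomcat\.util", r"[^\\]$"):
    print(line)
```

The end pattern here is the same idea as in the sed command: the first line whose last character is not a backslash closes the range.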
Awk is generally more useful than sed for anything spanning lines. Using any awk in any shell on every Unix box:
$ awk '!/\.jar/{f=0} /tomcat\.util/{f=1} !f' file
this is line 1
this is line 3
the end leave me
and me
This might work for you (GNU sed):
sed -n '/tomcat\.util/{:a;n;/\.jar/ba};p' file
Turn off implicit printing using the -n option.
Match on a line containing tomcat.util.
Keep fetching the next line for as long as it contains .jar.
Print all other lines.
Alternative:
sed -E '/tomcat\.util/{:a;$!N;/\.jar(,\\)?$/s/\n//;ta;D}' file
Gather up the lines starting at tomcat.util and ending in either ".jar,\" or ".jar", removing newlines until end-of-file or a mismatch, and then delete the collection.

How to add text before the first occurrence of a character in Vim?

I have the text
af_ZA_work_013_A;135.300;150.203;Spreker-A;;;[no-speech] #mm
af_ZA_work_013_A;135.300;150.207;Spreker-B;;;[no-speech] #something
I want to add .wav before the first ; in each line, so I would get
af_ZA_work_013_A.wav;135.300;150.203;Spreker-A;;;[no-speech] #mm
af_ZA_work_013_A.wav;135.300;150.207;Spreker-B;;;[no-speech] #something
How can I do this?
:s/search_regex/replacement/ will execute your find and replace linewise.
By default, this is done only on the current line, and only on the first match of search_regex on the current line.
Prepending % (%s/search/replace/) will execute your find and replace on all lines in the file, doing at most one replacement per line. You can give ranges (1,3s will execute on lines 1-3) or other line modifiers, but this isn't relevant here.
Appending g (s/search/replace/g) will do multiple replaces per line. Again, not relevant here, but useful for other scenarios.
You can search for ; and replace with .wav; (there are ways to keep the search term and add to it using capture groups but for one static character it's faster to just retype it).
TL;DR: :%s/;/.wav;/ does what you want.
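Outside Vim, the same "first match only, on every line" behaviour can be sketched in Python. This is a rough equivalent under the assumption that the lines are already in a list; add_wav is a name made up for the example.

```python
def add_wav(line):
    """Insert ".wav" before the first ";" only, like :s/;/.wav;/ without the g flag."""
    # The third argument limits str.replace to a single replacement.
    return line.replace(";", ".wav;", 1)


rows = [
    "af_ZA_work_013_A;135.300;150.203;Spreker-A;;;[no-speech] #mm",
    "af_ZA_work_013_A;135.300;150.207;Spreker-B;;;[no-speech] #something",
]
for row in rows:
    print(add_wav(row))
```

Leaving out the count argument would correspond to adding the g flag in Vim and would change every ; on the line.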

Vim - sort the contents of a register before/after pasting it?

As part of a project of mine I'm trying to move certain lines from a file to the top, sorted in a certain fashion. I'm not sure how to do the sort once those lines are up there - I don't want to disturb the other lines in the file.
I'm moving them by yanking them and putting them back down, like so:
:g/pattern/yank A
:g/pattern/d
0put A
This moves all the lines I specify up to the top of the file like I need, but now I need to sort them according to a pattern, like so:
[range]sort r /pattern2/
Is there a way to sort the contents of a register before pasting it? Or a way to sort only lines which match /pattern/? (because all the yanked lines will, of course).
I'm stymied and help would be appreciated.
edit - a possible workaround might be to count the number of lines before they're yanked, and then use that to select and sort those lines once they're placed again. I'm not sure how to count those lines - I can print the number of lines that match a pattern with the command :%s/pattern//n but I can't do anything with that number, or use that in a function.
The whole point of :g/pattern/cmd is to execute cmd on every line matching pattern. cmd can, of course, be :sort.
In the same way you did:
:g/pattern/yank A
to append every line matching pattern to register a and:
:g/pattern/d
to cut every line matching pattern, you can do:
:g/pattern/sort r /pattern2/
to sort every line matching pattern on pattern2.
Your example is wasteful anyway. Instead of abusing registers with three commands you could simply do:
:g/pattern/m0
to move every line matching pattern to the top of the buffer before sorting them with:
:g//sort r /pattern2/
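Keep in mind that :sort without an explicit range sorts every line in the buffer, so give it a range covering only the moved lines if the rest of the file must stay in order.
See :help :global, :help :sort, :help :move.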
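The intended end result (matching lines moved to the top and ordered by a second pattern, everything else untouched) can be sketched in Python. This is not what Vim does internally, only an illustration of the goal; move_and_sort and the sample patterns are assumptions for the example.

```python
import re


def move_and_sort(lines, pattern, sort_pattern):
    """Move lines matching pattern to the top, ordered by the text that
    sort_pattern matches in each of them; other lines keep their order."""
    pat, key_pat = re.compile(pattern), re.compile(sort_pattern)

    def sort_key(line):
        m = key_pat.search(line)
        return m.group(0) if m else ""  # lines without a key sort first

    matching = [line for line in lines if pat.search(line)]
    others = [line for line in lines if not pat.search(line)]
    return sorted(matching, key=sort_key) + others


buffer_lines = ["alpha", "todo 3 c", "beta", "todo 1 a", "todo 2 b"]
for line in move_and_sort(buffer_lines, r"^todo", r"\d+"):
    print(line)
```

The key is compared as text, so multi-digit numbers would need an int() conversion in sort_key to sort numerically.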
I know this is old, and may not be of any use to you anymore, but I just figured this one out today. It relies on the system's sort command (not vim's). Assuming you're saving to register A:
qaq
:g/pattern/yank A
<C-O>
:put=system('sort --stable --key=2,3',@A)
qaq: clears register A of anything
:g/pattern/yank A: searches the current buffer for pattern and appends every matching line to register A
<C-O>: pressing Ctrl+O in normal mode returns you to the last place your cursor was
:put=system('sort --stable --key=2,3',@A): sends the contents of register A to the sort command's STDIN and pastes the output below the current cursor line.
I mapped this whole thing to <F8>:
noremap <F8> qaq:g/pattern/yank A<CR><C-O>:put=system('sort --stable --key=2,3',@A)<CR>
I don't know how janky this is considered, cuz I'm a complete noob to vim. I spent hours today trying to figure this out. It works for me and I'm happy with it, hopefully it'll help someone else too.

searching elements of list in file

The list is named disks and it's below:
disks
['5000cca025884d5\n', '5000cca025a1ee6\n']
The file is named p and it's below:
c0t5000CCA025884D5Cd0 solaris
/scsi_vhci/disk@g5000cca025884d5c
c0t5000CCA025A1EE6Cd0
/scsi_vhci/disk@g5000cca025a1ee6c
c3t50060E8007DB981Ad1
/pci@400/pci@1/pci@0/pci@8/SUNW,emlxs@0/fp@0,0/ssd@w50060e8007db981a,1
c3t50060E8007DB981Ad2
/pci@400/pci@1/pci@0/pci@8/SUNW,emlxs@0/fp@0,0/ssd@w50060e8007db981a,2
c3t50060E8007DB981Ad3
/pci@400/pci@1/pci@0/pci@8/SUNW,emlxs@0/fp@0,0/ssd@w50060e8007db981a,3
c3t50060E8007DB981Ad4
I want to search for the elements of the list in the file.
There are a couple of things to look at here:
I haven't actually used re.match() before, but I can see the first issue: your list of disks has a newline character after every entry, which will break the matches. Also, re.match() only matches at the start of the string, and the disk IDs sit in the middle of your lines, so you need to use re.search() instead. Finally, the comparison should be case insensitive; one way to do this is to lowercase each line so that it matches your lowercase disks list.
Try adapting your loop like so:
import re
# .strip() gets rid of the newline and .lower() makes the string lowercase
for line in q:
    if re.search(disks[0].strip(), line.lower()):
        print(line)
If that doesn't fix it, I would try making it print out disks[0].strip() and line for every iteration of the loop (not just when it matches the if clause) to make sure it's reading in what you think it is.
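To check every entry of the list rather than only disks[0], the loop can be extended roughly as follows. This sketch swaps re.search() for a plain substring test, since the disk IDs are literal strings; find_disk_lines and sample_lines are names invented for the example, and the sample data is a shortened copy of the file shown in the question.

```python
disks = ['5000cca025884d5\n', '5000cca025a1ee6\n']


def find_disk_lines(lines, disks):
    """Return the lines that contain any of the disk IDs, ignoring case."""
    wanted = [d.strip().lower() for d in disks]  # drop newlines, normalise case
    return [
        line.rstrip("\n")
        for line in lines
        if any(d in line.lower() for d in wanted)
    ]


sample_lines = [
    "c0t5000CCA025884D5Cd0 solaris\n",
    "c0t5000CCA025A1EE6Cd0\n",
    "c3t50060E8007DB981Ad1\n",
]
for hit in find_disk_lines(sample_lines, disks):
    print(hit)
```

With a real file, the open file object can be passed in place of sample_lines, since it also yields one line at a time.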

Delete all newline characters not followed by ‘£’ on the next line in Vim

I have a very large file, and I want to remove the newline at the end of each line so that the lines are merged together, except where the next line starts with the character £.
So, if I have this:
data1
data2
£data3
data4
data5
I would like to end up with this:
data1data2
£data3data4data5
I was thinking of something like
:%s/\n(but not \n£)//g
Any ideas?
Just remove all the newlines, then add them back where they should be. You could also use a negative look-ahead, but this is simpler and easier for anyone to follow. Note that in Vim the replacement side needs \r rather than \n to insert a line break.
:%s/\n//
:%s/£/\r£/g
The solution offered by @pb2q will remove every newline together with the next character if that character is not a “£” or a newline (because a collection doesn’t match a newline by default), while in your question you asked to remove only the newline. This can be fixed by using either \ze or a negative look-ahead:
%s/\n\ze\_[^£]
%s/\n£\@!
Note a few things: first, you can omit the replacement string if you want to delete some text (unless you need substitution flags, which you don’t in this case). Second, \_ adds newline to a collection. It can also be written as [^£\n], but I would avoid that: anyone coming from a PCRE-capable language reads [^£\n] as “match anything except ‘£’ and newline”, while in Vim it really means “match anything (including newline) except ‘£’”.
I would use the following :global command:
:g/^[^£]/-j!
It goes through all the lines that start with any character but £,
going from top to bottom, and joins each of those lines with the
preceding one via the :join command.
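For a file too large to edit comfortably in Vim, the same joining rule can be sketched in Python. This assumes the lines have already had their newlines stripped; join_unless_pound is a name chosen for the example.

```python
def join_unless_pound(lines):
    """Append each line to the previous one unless it starts with '£'."""
    merged = []
    for line in lines:
        if merged and not line.startswith("£"):
            merged[-1] += line  # no separator, like :join! in Vim
        else:
            merged.append(line)  # first line, or a line starting with '£'
    return merged


for line in join_unless_pound(["data1", "data2", "£data3", "data4", "data5"]):
    print(line)
```

The first line always starts a new output line, which sidesteps the question of what to join it with.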

Resources