remove all empty lines from text files while keeping format - linux

I have multiple Notepad text files, each of which contains one empty line (the last line of the file). I want to delete that empty line from all the files. I tried different grep and awk commands, but they didn't work and they also messed up the file format: all the text ended up on one line instead of on separate lines. I also tried a Notepad++ regex, finding ^\s*$ and replacing it with nothing, but that didn't work either.
Current text file looks like this:
apples
oranges
peaches
[empty line]
The output should be
apples
oranges
peaches

Ctrl+H
Find what: \R^$
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
\R : any kind of linebreak
^ : beginning of line
$ : end of line
Result for given example:
apples
oranges
peaches

If you want to delete the last line of the file, use sed '$d'. If you want to do that only when the last line is empty, use sed '${/^$/d;}'. (This treats a line containing only whitespace as a non-blank line, so you might prefer sed '${/^ *$/d;}' or some variant.)
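As a rough sketch of applying the conditional delete to several files at once: the directory and file names below are made up for the demo, and the result is written through a temporary file rather than relying on any particular sed's -i behaviour.

```shell
# Hypothetical demo files in a scratch directory.
workdir=$(mktemp -d)
printf 'apples\noranges\npeaches\n\n' > "$workdir/a.txt"   # ends with an empty line
printf 'apples\noranges\npeaches\n'   > "$workdir/b.txt"   # no empty last line

for f in "$workdir"/*.txt; do
    # Delete the last line only when it is empty; every other line is kept,
    # each still on its own line.
    sed '${/^$/d;}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

# wc -l counts newline characters, so both files should now report 3.
lines_a=$(wc -l < "$workdir/a.txt" | tr -d ' ')
lines_b=$(wc -l < "$workdir/b.txt" | tr -d ' ')
echo "$lines_a $lines_b"

rm -rf "$workdir"
```

The file that had no empty last line passes through unchanged, so running the loop more than once should be harmless.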

The "empty last line" may be a matter of interpretation. From the Wikipedia "Newline" article:
Two ways to view newlines, both of which are self-consistent, are that newlines either separate lines or that they terminate lines. If a newline is considered a separator, there will be no newline after the last line of a file. Some programs have problems processing the last line of a file if it is not terminated by a newline. On the other hand, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line. Conversely, if a newline is considered a terminator, all text lines including the last are expected to be terminated by a newline. If the final character sequence in a text file is not a newline, the final line of the file may be considered to be an improper or incomplete text line, or the file may be considered to be improperly truncated.
In my little world, the Visual Studio Code editor takes the former view; vim the latter.
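One quick way to see the difference between the two views is to count newline characters, which is what wc -l reports. This is only an illustration; the fruit list is the sample from the question.

```shell
# Newline as terminator: three lines, three newline characters.
terminated=$(printf 'apples\noranges\npeaches\n' | wc -l | tr -d ' ')

# Newline as separator: three lines, but only two newline characters.
separated=$(printf 'apples\noranges\npeaches' | wc -l | tr -d ' ')

# A genuinely empty fourth line adds one more newline character.
extra=$(printf 'apples\noranges\npeaches\n\n' | wc -l | tr -d ' ')

echo "$terminated $separated $extra"   # prints: 3 2 4
```

If the files report 3 here, there is no empty line to delete; the editor is just displaying the position after the final newline.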

Related

Simple way to remove multi-line string using sed

Using sed, is there a way to remove multiple lines from a text file based on some starting and ending expressions?
I have known markers in the file and want to remove everything between (markers inclusive). I have seen some really complicated solutions and I would like to do this without resorting to micro commands.
My file looks something like this:
cat /tmp/foobar.txt
this is line 1
this is line 3
tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
ant-junit*.jar,\
ant-launcher.jar,\
ant.jar,\
asm-*.jar,\
aspectj*.jar,\
bootstrap.jar,\
catalina-ant.jar,\
catalina-ha.jar,\
catalina-ssi.jar,\
catalina-storeconfig.jar
the end leave me
and me
I want to remove everything starting at tomcat.util all the way to the last .jar
tldr;
I think this is the simplest way, and there is no need for the assembly-like micro commands
sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
which produces
this is line 1
this is line 3
the end leave me
and me
If you want to remove the lines from the file itself rather than write the result to stdout, use the in-place flag (-i):
sed -i '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
So... how does this work?
sed commands, like vi commands, operate on an address. Normally we don't specify an address, which simply applies the command to all lines of the file. For example, to replace "the" with "that" throughout a file we'd normally do
sed -i 's/the/that/g' /tmp/foobar.txt
i.e. applying the substitute (s) command to all lines in the file.
In this case you want to delete some lines so we can use the delete or d command. But we need to tell it where to delete. So we need to give it an address.
The format of a sed command is
[addr][!]command[options]
(see the docs )
If no address is specified, the command is applied to all lines; if the ! is specified, it is applied to all lines that don't match the address. So far so good.
The trick here is that addr can be a single address or a range of addresses. An address can be a line number or a regex pattern. You use a , between two addresses to specify a range.
so to delete line 5 to 8 inclusive you could do
sed -i '5,8d' /tmp/foobar.txt
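A small sketch of those address forms on throwaway input (the numbers 1 to 10, one per line, are just demo data), reading from stdin instead of editing a file in place:

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Line-number range: delete lines 5 through 8.
without_range=$(printf '%s\n' "$numbers" | sed '5,8d' | paste -sd ' ' -)

# Same range negated with !: delete everything except lines 5 through 8.
only_range=$(printf '%s\n' "$numbers" | sed '5,8!d' | paste -sd ' ' -)

# Regex range: from the line that is exactly "5" to the line that is exactly "8".
regex_range=$(printf '%s\n' "$numbers" | sed '/^5$/,/^8$/d' | paste -sd ' ' -)

echo "$without_range"   # 1 2 3 4 9 10
echo "$only_range"      # 5 6 7 8
echo "$regex_range"     # 1 2 3 4 9 10
```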
In this case, rather than knowing the line numbers, we know some "markers" and can use regexes instead. The first marker, a line starting with tomcat.util, is found by the regex
/^tomcat\.util.*$/
The second marker is a bit trickier, but if we look we can see that the final line to remove is the first one that does not end with a \, so we can match a line that consists of "anything, as long as it does not end with \"
/^.*[^\]$/
On its own the second regex would match a whole bunch of lines, but once we make a range out of the two regexes, the range ends at the first line after the first address that matches the second regex.
Putting that all together, we want to delete (d) all lines in the range that starts at a line beginning with tomcat.util and ends at the next line that does not end in \, i.e.
sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
hope that helps ;-)
Cheers
Karl
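To try the range delete without touching a real file, something like the following should do. The sample is a shortened copy of the file from the question, written to a temporary path.

```shell
sample=$(mktemp)
# Quoted 'EOF' keeps the trailing backslashes literal.
cat > "$sample" <<'EOF'
this is line 1
this is line 3
tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
ant.jar,\
catalina-storeconfig.jar
the end leave me
and me
EOF

# Delete from the tomcat.util line through the first following line
# that does not end in a backslash.
kept=$(sed '/^tomcat\.util.*$/,/^.*[^\]$/d' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```

Only the four lines outside the block should be printed.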
Awk is generally more useful than sed for anything spanning lines. Using any awk in any shell on every Unix box:
$ awk '!/\.jar/{f=0} /tomcat\.util/{f=1} !f' file
this is line 1
this is line 3
the end leave me
and me
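The same awk program spelled out with a named flag may be easier to follow. This is just a commented restatement; the variable name skipping and the shortened sample file are mine.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
this is line 1
tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
catalina-storeconfig.jar
the end leave me
EOF

kept=$(awk '
    !/\.jar/       { skipping = 0 }   # a line without ".jar" ends the block
    /tomcat\.util/ { skipping = 1 }   # the marker line starts the block
    !skipping                         # print only while not skipping
' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```

Note that the marker line itself happens to contain ".jar" (in ".jarsToSkip"), so the first rule leaves the flag alone and the second rule sets it.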
This might work for you (GNU sed):
sed -n '/tomcat\.util/{:a;n;/\.jar/ba};p' file
Turn off implicit printing using the -n option.
Match on a line containing tomcat.util.
Keep fetching the next line for as long as the fetched line contains .jar.
Print all other lines.
Alternative:
sed -E '/tomcat\.util/{:a;$!N;/\.jar(,\\)?$/s/\n//;ta;D}' file
Gather up the lines starting at tomcat.util and ending in either .jar,\ or .jar, removing newlines until end-of-file or a mismatch, and then delete the collection.

How to search for a word but replace characters in the line above in gvim

In the code below, I want to replace/remove the , from the line above .VSS(VSS).
It is at multiple places in the file. I have basic knowledge of gvim and I could not figure out how to just search and then pipe it with replace.
ANTENNABWP7THVT ANTENNABWP7THVT_spr_gate156 (
.I(LTIELO_NET),
.VSS(VSS),
.VDD(VDD));
Matching each line that contains .VSS(VSS) and doing something with it can be done with :global. You then want to address the line above it; that's a range (see :help :range): .-1 (or -1 for short). Removing a comma can be done with a plain :substitute (add the /g flag to remove all commas on that line). Taken together:
:global/\.VSS(VSS)/-1substitute/,//
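If the same edit has to be applied to files outside the editor, a comparable effect can be had with awk. To be clear, this is a different technique from the :global command above, not a translation of it, and the variable names are my own. It buffers one line and strips the first comma from it when the following line contains .VSS(VSS).

```shell
netlist=$(mktemp)
cat > "$netlist" <<'EOF'
ANTENNABWP7THVT ANTENNABWP7THVT_spr_gate156 (
.I(LTIELO_NET),
.VSS(VSS),
.VDD(VDD));
EOF

edited=$(awk '
    NR > 1 {
        # Current line has the marker: drop the first comma of the line above.
        if ($0 ~ /\.VSS\(VSS\)/) sub(/,/, "", prev)
        print prev
    }
    { prev = $0 }                  # remember this line for the next round
    END { if (NR > 0) print prev } # flush the final buffered line
' "$netlist")
printf '%s\n' "$edited"

rm -f "$netlist"
```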

How to echo/print actual file contents on a unix system

I would like to see the actual file contents without it being formatted to print. For example, to show:
\n0.032,170\n0.034,290
Instead of:
0.032,170
0.34,290
Is there a command to echo the file's actual data in bash? I've tried using head, cat, more, etc. but all those seem to echo the "print-formatted" text. For example:
$ cat example.csv
0.032,170
0.34,290
How can I print the actual characters within the file?
This reads as if you misunderstand what the "actual characters in the file" are. You will not find the characters \ and n in that file, only a line feed, which is a single specific character. So utilities like cat do in fact output exactly the characters in the file.
Putting it the other way around: if you really had those two characters literally in the file, then a utility like cat would actually output them. I just checked that, just to be sure.
You can easily check that yourself by opening the file in a hex editor. There you will see the byte 0A (decimal 10), which is the line feed character. You will not see the two-character pair \ and n anywhere in that file.
Many programming languages and also shell environments use escape sequences like \n in string definitions and identify those as control characters which would not be typable otherwise. So maybe that is where your impression comes from that your files should contain those two characters.
To display newlines as \n, you might try:
awk 1 ORS='\\n' input-file
This is not the "actual characters in the file", as \n is merely a conventional method of displaying a newline, but this does seem to be what you want.
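A short sketch that shows both points on a throwaway copy of the example data: the file holds newline bytes and no backslashes, and the awk command only changes how the newlines are displayed.

```shell
csv=$(mktemp)
printf '0.032,170\n0.34,290\n' > "$csv"

# Byte-level view: od -c shows each newline as \n, but that is od's
# notation for the single byte 0A, not two characters stored in the file.
od -c "$csv"

# Count the real newline bytes and the real backslash bytes.
newline_count=$(tr -cd '\n' < "$csv" | wc -c | tr -d ' ')
backslash_count=$(tr -cd '\\' < "$csv" | wc -c | tr -d ' ')
echo "newlines=$newline_count backslashes=$backslash_count"

# Display form: every record is followed by a literal backslash and n.
escaped=$(awk 1 ORS='\\n' "$csv")
printf '%s\n' "$escaped"

rm -f "$csv"
```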

Ignore spaces, tabs and new line in SED

I tried to replace a string in a file that contains tabs and line breaks.
the command in the shell file looked something like this:
FILE="/Somewhere"
STRING_OLD="line 1[ \t\r\n]*line 2"
sed -i 's/'"$STRING_OLD"'/'"$STRING_NEW"'/' $FILE
If I manually remove the line breaks and the tabs and leave only the spaces, the replacement succeeds. But if I leave the line breaks in, sed is unable to locate $STRING_OLD and so cannot replace it with the new string.
thanks in advance
Kobi
sed reads lines one at a time, and usually lines are also processed one at a time, as they are read. However, sed does have facilities for reading additional lines and operating on the combined result. There are several ways that could be applied to your problem, such as:
FILE="/Somewhere"
STRING_OLD="line 1[ \t\r\n]*line 2"
sed -n "1h;2,\$H;\${g;s/$STRING_OLD/$STRING_NEW/g;p}" "$FILE"
That does more or less what you describe doing manually: it concatenates all the lines of the file (but keeps the newlines), and then performs the substitution on the overall buffer, all at once. That does assume, however, either that the file is short (POSIX does not require it to work if the overall file length exceeds 8192 bytes) or that you are using a sed that does not have buffer-size limitations, such as GNU sed. Since you tagged Linux, I'm supposing that GNU sed can be assumed.
In detail:
the -n option turns off line echoing, because we save everything up and print the modified text in one chunk at the end.
there are multiple sed commands, separated by semicolons, and with literal $ characters escaped (for the shell):
1h: when processing the first line of input, replace the "hold space" with the contents of the pattern space (i.e. the first line, excluding newline)
2,\$H: when processing any line from the second through the last, append a newline to the hold space, then the contents of the pattern space
\${g;s/$STRING_OLD/$STRING_NEW/g;p}: when processing the last line, perform this group of commands: copy the hold space into the pattern space; perform the substitution, globally; print the resulting contents of the pattern space.
That's one of the simpler approaches, but if you need to accommodate seds that are not as capable as GNU's with regard to buffer capacity then there are other ways to go about it. Those start to get ugly, though.
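Here is a self-contained run of the hold-space approach, assuming GNU sed (for \t, \r and \n inside the bracket expression). The file contents and the replacement value "joined" are invented for the demo, since the question never shows $STRING_NEW.

```shell
file=$(mktemp)
# "line 1" and "line 2" are separated by a tab, a newline and two spaces.
printf 'header\nline 1\t\n  line 2\nfooter\n' > "$file"

STRING_OLD='line 1[ \t\r\n]*line 2'
STRING_NEW='joined'   # hypothetical replacement text

# 1h     : first line goes into the hold space
# 2,$H   : every later line is appended to the hold space after a newline
# ${...} : on the last line, fetch the whole buffer, substitute, print
result=$(sed -n "1h;2,\$H;\${g;s/$STRING_OLD/$STRING_NEW/g;p}" "$file")
printf '%s\n' "$result"

rm -f "$file"
```

With this input the two lines should collapse into the single line "joined", with header and footer untouched.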

Can you force Vim to show a blank line at the end of a file?

When I open a text file in Notepad, it shows a blank line if there is a carriage return at the end of the last line containing text. However, in Vim it does not show this blank line. Another thing I've noticed is that the Vim editor adds a carriage return to the last line by default (even though it doesn't show it). I can tell, because if I open a file in Notepad that was created in Vim, it shows a blank line at the end of the file.
Anyway, I can live with these two differences, but I'm wondering if there is an option in Vim that allows you to toggle this behaviour.
Thanks
PS - GVim 7.2
[Update]
Would this make sense to be on Server Fault instead?
[Update 2]
I'll rephrase this... I need to know when there is a carriage return at the end of single line file (Notepad shows an extra line with no text, with Vim I cannot tell). This is due to a Progress program that reads a text file (expects a single line, but with a carriage return) and parses the text for some purpose. If there is no carriage return, Progress treats the line as if it is null.
[Workaround Solution]
One way I've found to ensure there is a carriage return (but make sure I don't add a second one) is to make sure I have the end of line write option turned on (:set eol) and then just do a write/save. This will put an end of line in the file if it's not already there. Otherwise, it doesn't add a new one.
:help endofline
explains how you could stop vim from adding an extra newline.
It seems that vim treats newline as a line terminator, while notepad treats it as a line separator: from http://en.wikipedia.org/wiki/Newline
There is also some confusion whether newlines terminate or separate lines. If a newline is considered a separator, there will be no newline after the last line of a file. The general convention on most systems is to add a newline even after the last line, i.e., to treat newline as a line terminator. Some programs have problems processing the last line of a file if it isn't newline terminated. Conversely, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line. This can result in a different line count being reported for the file, but is otherwise generally harmless.
If I recall correctly, on unix-y systems a text file must be terminated with a newline.
One useful Vim option is
set list
It will help you see all end-of-line characters (and possibly other generally invisible characters). So you will be able to view this last end of line directly in Vim and not only in Notepad.
When you open the file in Vim, the status line should say [noeol] after the filename if the final end of line is missing. So that's one indication. As Manni said, you can change this by setting both the endofline option off and the binary option on. You can make these your default settings in a .vimrc file.
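If you would rather check from a shell than rely on an editor's display, a sketch like the following can tell whether the last byte of a file is a newline and add one only when it is missing. The function name has_final_newline is my own, and it only looks for a line feed, not a Windows-style CR LF pair.

```shell
has_final_newline() {
    # Command substitution strips a trailing newline, so an empty result
    # from tail -c 1 means the last byte is a newline. The -s test makes
    # an empty file count as "no final newline".
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

with_eol=$(mktemp); printf 'one line\n' > "$with_eol"
no_eol=$(mktemp);   printf 'one line'   > "$no_eol"

has_final_newline "$with_eol" && state_a=eol || state_a=noeol
has_final_newline "$no_eol"   && state_b=eol || state_b=noeol
echo "$state_a $state_b"   # eol noeol

# Append a newline only if it is missing, so a second one is never added.
has_final_newline "$no_eol" || printf '\n' >> "$no_eol"
has_final_newline "$no_eol" && state_c=eol || state_c=noeol
echo "$state_c"            # eol

rm -f "$with_eol" "$no_eol"
```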
