Remove blank lines with grep

I tried grep -v '^$' in Linux and that didn't work. This file came from a Windows file system.

Try the following:
grep -v -e '^$' foo.txt
The -e option tells grep that the next argument is the pattern to match.
The single quotes around ^$ make it work for csh. Other shells will be happy with either single or double quotes.
UPDATE: This works for me for a file with blank lines or all-whitespace lines (such as Windows lines with \r\n style line endings), whereas the command above only removes lines that are truly empty and have Unix-style line endings:
grep -v -e '^[[:space:]]*$' foo.txt
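A quick way to see the difference is to build a small test file with Windows-style line endings and run both commands (the contents here are just an example):
printf 'one\r\n\r\ntwo\r\n' > foo.txt    # simulate a file saved with CRLF endings
grep -v -e '^$' foo.txt                  # the "blank" line survives because it still contains \r
grep -v -e '^[[:space:]]*$' foo.txt      # the \r-only line is removed as well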

Keep it simple. The dot matches any single character, so this prints every line that contains at least one character:
grep . filename.txt

Use:
$ dos2unix file
$ grep -v "^$" file
Or just simply awk:
awk 'NF' file
If you don't have dos2unix, then you can use tools like tr:
tr -d '\r' < "$file" > t ; mv t "$file"

grep -v "^[[:space:]]*$"
The -v makes grep print the lines that do not match the pattern.
===Each part explained===
^ match start of line
[[:space:]] match whitespace: spaces, tabs, carriage returns, etc.
* previous match (whitespace) may exist from 0 to infinite times
$ match end of line
Running the code:
$ echo "
> hello
>
> ok" |
> grep -v "^[[:space:]]*$"
hello
ok
To understand more about how/why this works, I recommend reading up on regular expressions. http://www.regular-expressions.info/tutorial.html

If you have sequences of multiple blank lines in a row, and would like only one blank line per sequence, try
grep -v "unwantedThing" foo.txt | cat -s
cat -s suppresses repeated empty output lines.
Your output would go from
match1



match2
to
match1

match2
The three blank lines in the original output would be compressed or "squeezed" into one blank line.

The same as the previous answers:
grep -v -e '^$' foo.txt
Here, the -e option tells grep that the next argument is the pattern; it is not the same as -E, which enables extended regular expressions. '^$' means that there isn't any character between ^ (start of line) and $ (end of line); '^' and '$' are regex anchors.
So grep -v will print all the lines that do not match this pattern (no characters between ^ and $).
This way, blank lines are eliminated.

I prefer using egrep, though in my test with a genuine file containing blank lines your approach worked fine (though without quotation marks in my test). This worked too:
egrep -v "^(\r?\n)?$" filename.txt

Do lines in the file have whitespace characters?
If so then
grep "\S" file.txt
Otherwise
grep . file.txt
Answer obtained from:
https://serverfault.com/a/688789
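A small demonstration of the difference (the file contents are just an example, and \S is a GNU grep extension):
printf 'a\n \nb\n\n' > file.txt
grep . file.txt      # keeps the line that contains only a space
grep "\S" file.txt   # drops whitespace-only lines as well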

This removes blank (or whitespace-only) lines and lines that start with "#":
grep -v "^#" file.txt | grep -v "^[[:space:]]*$"

awk 'NF' file-with-blank-lines > file-with-no-blank-lines

It's true that using grep -v -e '^$' can work, however it does not remove blank lines that contain one or more spaces. I found the easiest and simplest answer for removing blank lines to be awk. The following is modified a bit from the awk answers above:
awk 'NF' foo.txt
But since this question is for using grep I'm going to answer the following:
grep -v '^ *$' foo.txt
Note the blank space between the ^ and the *.
Or you can use \s to represent whitespace, like this:
grep -v '^\s*$' foo.txt

I tried hard, but this seems to work (assuming \r is biting you here):
printf "\r" | egrep -xv "[[:space:]]*"

Using Perl:
perl -ne 'print if /\S/'
\S matches any non-whitespace character.
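For example, to write the cleaned output to a new file (the file names here are only placeholders):
perl -ne 'print if /\S/' file.txt > cleaned.txt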

egrep -v "^\s\s+"
egrep already do regex, and the \s is white space.
The + duplicates current pattern.
The ^ is for the start

Use:
grep pattern filename.txt | uniq
(uniq only collapses adjacent duplicate lines, so this only helps when the blank lines you want to drop are consecutive)

Here is another way of removing the blank lines and lines starting with the # sign. I think this is quite useful for reading configuration files.
[root@localhost ~]# cat /etc/sudoers | egrep -v '^(#|$)'
Defaults requiretty
Defaults !visiblepw
Defaults always_set_home
Defaults env_reset
Defaults env_keep = "COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS"
root ALL=(ALL) ALL
%wheel ALL=(ALL) ALL
stack ALL=(ALL) NOPASSWD: ALL

Read lines from a file, excluding empty lines:
grep -v '^$' folderlist.txt
Where folderlist.txt contains:
folder1/test

folder2

folder3

folder4/backup

folder5/backup
Results will be:
folder1/test
folder2
folder3
folder4/backup
folder5/backup

grep: Invalid regular expression

I have a text file which looks like this:
haha1,haha2,haha3,haha4
test1,test2,test3,test4,[offline],test5
letter1,letter2,letter3,letter4
output1,output2,[offline],output3,output4
check1,[core],check2
num1,num2,num3,num4
I need to exclude all the lines that have "[ ]" and output the remaining lines to another file.
I'm currently using this command:
grep ",[" loaded.txt | wc -l > newloaded.txt
But it's giving me an error:
grep: Invalid regular expression
Use grep -F to treat the search pattern as a fixed string. You could also replace wc -l with grep -c.
grep -cF ",[" loaded.txt > newloaded.txt
If you're curious, [ is a special character. If you don't use -F then you'll need to escape it with a backslash.
grep -c ",\[" loaded.txt > newloaded.txt
By the way, I'm not sure why you're using wc -l anyway. From your problem description, it sounds like grep -v might be more appropriate: -v inverts grep's normal output, printing lines that don't match.
grep -vF ",[" loaded.txt > newloaded.txt
An alternative to grep
It's unclear whether you want to remove lines that contain either bracket, or only the ones where the brackets surround characters. Regardless of which you intend, sed can easily remove lines that fit a well-defined pattern:
To delete only lines where the brackets surround characters ([...]):
sed '/\[.*\]/d' loaded.txt > newloaded.txt
Another approach might be to remove any line that contained either bracket:
sed '/\[/d;/\]/d' loaded.txt > newloaded.txt
(e.g. lines containing either [ or ] would be deleted)
Your grep command doesn't seem to be excluding anything. Also, why are you using wc? I thought you wanted the lines, not their count.
So if you just want the lines, as you say, that don't have [], then this should work:
grep -v "\[" loaded.txt > new.txt
You can also use awk for this. With [ as the field separator, any line containing a [ splits into more than one field, so NF==1 keeps only the lines without it:
awk -F\[ 'NF==1' file > newfile
cat newfile
haha1,haha2,haha3,haha4
letter1,letter2,letter3,letter4
num1,num2,num3,num4
Or this:
awk '!/\[/' file

Delete empty lines using sed

I am trying to delete empty lines using sed:
sed '/^$/d'
but I have no luck with it.
For example, I have these lines:
xxxxxx

yyyyyy

zzzzzz
and I want it to be like:
xxxxxx
yyyyyy
zzzzzz
What should be the code for this?
You may have spaces or tabs in your "empty" line. Use POSIX classes with sed to remove all lines containing only whitespace:
sed '/^[[:space:]]*$/d'
A shorter version that uses ERE, for example with gnu sed:
sed -r '/^\s*$/d'
(Note that sed does NOT support PCRE.)
I am missing the awk solution:
awk 'NF' file
Which would return:
xxxxxx
yyyyyy
zzzzzz
How does this work? Since NF stands for "number of fields", empty lines have 0 fields, so awk evaluates 0 as false and prints nothing; if there is at least one field, the evaluation is true and awk performs its default action: print the current line.
Summary of equivalent one-liners:
sed:
sed '/^[[:space:]]*$/d'
sed '/^\s*$/d'
sed '/^$/d'
sed -n '/^\s*$/!p'
grep:
grep .
grep -v '^$'
grep -v '^\s*$'
grep -v '^[[:space:]]*$'
awk:
awk '/./'
awk 'NF'
awk 'length'
awk '/^[ \t]*$/ {next;} {print}'
awk '!/^[ \t]*$/'
sed '/^$/d' should be fine. Are you expecting to modify the file in place? If so, you should use the -i flag.
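A minimal sketch of in-place editing; note that BSD/macOS sed requires a (possibly empty) backup suffix after -i, while GNU sed does not:
sed -i '/^$/d' file.txt       # GNU sed
sed -i '' '/^$/d' file.txt    # BSD/macOS sed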
Maybe those lines are not truly empty; if that's the case, look at the question Remove empty lines from txtfiles, remove spaces from start and end of line. I believe that's what you're trying to achieve.
I believe this is the easiest and fastest one:
cat file.txt | grep .
If you need to ignore all white-space lines as well then try this:
cat file.txt | grep '\S'
Example:
s="\
\
a\
b\
\
Below is TAB:\
\
Below is space:\
\
c\
\
"; echo "$s" | grep . | wc -l; echo "$s" | grep '\S' | wc -l
outputs
7
5
Another option without sed, awk, perl, etc
strings $file > $output
strings - print the strings of printable characters in files. (Note that strings only prints runs of at least four printable characters by default, so very short lines may be dropped.)
With help from the accepted answer here and the accepted answer above, I have used:
$ sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' file.txt > output.txt
`s/^ *//` => left trim
`s/ *$//` => right trim
`/^$/d` => remove empty line
`/^\s*$/d` => delete lines which may contain white space
This covers all the bases and works perfectly for my needs. Kudos to the original posters @Kent and @kev.
The command you are trying is correct; just use the -E flag with it.
sed -E '/^$/d'
The -E flag makes sed use extended regular expressions.
You can say:
sed -n '/ / p' filename #there is a space between '//'
You are most likely seeing the unexpected behavior because your text file was created on Windows, so the end of line sequence is \r\n. You can use dos2unix to convert it to a UNIX style text file before running sed or use
sed -r "/^\r?$/d"
to remove blank lines whether or not the carriage return is there.
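If you want to confirm that CRLF endings are the problem before converting, two common checks (the file name is a placeholder; cat -A is GNU coreutils):
file suspect.txt       # reports "... with CRLF line terminators" for Windows-style files
cat -A suspect.txt     # each CRLF line ends with ^M$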
This works in awk as well.
awk '!/^$/' file
xxxxxx
yyyyyy
zzzzzz
You can do something like that using "grep", too:
egrep -v "^$" file.txt
My bash-specific answer is to recommend using the Perl substitution operator with the global pattern flag g, as follows:
$ perl -pe s'/^\n|^[\ ]*\n//g' $file
xxxxxx
yyyyyy
zzzzzz
This answer illustrates accounting for whether or not the empty lines have spaces in them ([\ ]*), as well as using | to separate multiple search terms/fields. Tested on macOS High Sierra and CentOS 6/7.
FYI, the OP's original code sed '/^$/d' $file works just fine in a bash terminal on macOS High Sierra and on CentOS 6/7 Linux on a high-performance computing cluster.
If you want to use modern Rust tools, you can consider:
ripgrep:
cat datafile | rg '.'        # a line with spaces is considered non-empty
cat datafile | rg '\S'       # a line with spaces is considered empty
rg '\S' datafile             # same, reading the file directly (-N can be added to remove line numbers for on-screen display)
sd:
cat datafile | sd '^\n' ''      # a line with spaces is considered non-empty
cat datafile | sd '^\s*\n' ''   # a line with spaces is considered empty
sd '^\s*\n' '' datafile         # in-place edit
Using the vim editor to remove empty lines:
:%s/^$\n//g
For me, on FreeBSD 10.1, only this sed solution worked:
sed -e '/^[ 	]*$/d' "testfile"
Inside the brackets there are a space and a tab character.
test file contains:
fffffff next 1 tabline ffffffffffff
ffffffff next 1 Space line ffffffffffff
ffffffff empty 1 lines ffffffffffff
============ EOF =============
NF is an awk built-in variable (number of fields) that can be used as a pattern to delete empty lines in a file:
awk NF filename
and with sed:
sed -r "/^\r?$/d"

How to remove lines from text file not starting with certain characters (sed or grep)

How do I delete all lines in a text file which do not start with the characters #, & or *? I'm looking for a solution using sed or grep.
Deleting lines:
With grep
From http://lowfatlinux.com/linux-grep.html :
The grep command selects and prints lines from a file (or a bunch of files) that match a pattern.
I think you can do something like this:
grep -v '^[#&*]' yourFile.txt > output.txt
You can also use sed to do the same thing (check http://lowfatlinux.com/linux-sed.html ):
sed '/^[#&*]/d' yourFile.txt > output.txt
It's up to you to decide
Filtering lines:
My mistake, I understood you wanted to delete the lines. But if you want to "delete" all other lines (or filter the lines starting with the specified characters), then grep is the way to go:
grep '^[#&*]' yourFile.txt > output.txt
sed -n '/^[#&*].*/p' input.txt > output.txt
This should work.
sed -n -i '/^[#&*].*/p' input.txt
This one edits the input file directly, so be careful.
egrep '^(&|#|\*)' input.txt > output.txt

grep - print all lines containing 'cat' as the second word

OK, so considering I have a file containing the following text:
lknsglkn cat lknrhlkn lsrhkn
cat lknerylnk lknaselk cat
awiooiyt lkndrhlk dhlknl
blabla cat cat bla bla
I need to use grep to print only the lines containing 'cat' as the second word on the line, namely lines 1 and 4. I've tried multiple grep -e 'regex' <file> commands but can't seem to get the right one. I don't know how to match the N'th word of a line.
This may work for you:
grep -E '^\w+\s+cat\s' file
If the first "word" can contain non-word characters, e.g. "#", "(", "[", you could also try:
grep -E '^\S+\s+cat\s' file
with your example input:
kent$ echo "lknsglkn cat lknrhlkn lsrhkn
cat lknerylnk lknaselk cat
awiooiyt lkndrhlk dhlknl
blabla cat cat bla bla"|grep -E '^\S+\s+cat\s'
lknsglkn cat lknrhlkn lsrhkn
blabla cat cat bla bla
What constitutes a word?
grep '^[a-z][a-z]* *cat '
This will work if there is at least a blank after cat. If that's not guaranteed, then:
grep -E '^[a-z]+ +cat( |$)'
which looks for cat followed by a blank or end of line.
If you want a more extensive definition of 'first word' (upper case, digits, punctuation), change the character class. If you want to allow for blanks or tabs, there are changes that can be made. If you can have leading blanks, add ' *' after the caret. Variations as required.
These variations will work with any version of grep that supports the -E option. POSIX does not mandate notations such as \S to mean 'non-white-space', though GNU grep does support that as an extension. The grep -E version will work with regular egrep if grep -E does not work but egrep exists (don't use the -E option with egrep).
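As a quick check, rerunning the ERE variant above on the question's sample input gives:
printf 'lknsglkn cat lknrhlkn lsrhkn\ncat lknerylnk lknaselk cat\nawiooiyt lkndrhlk dhlknl\nblabla cat cat bla bla\n' | grep -E '^[a-z]+ +cat( |$)'
lknsglkn cat lknrhlkn lsrhkn
blabla cat cat bla bla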
The following should work:
grep -e '^\S\+\scat\s'
The line should start with a non-whitespace of length at least 1, followed by a whitespace and the word "cat" followed by a whitespace.
Will be slower, but perhaps more readable:
awk '$2 == "cat"' file

Linux, Print all lines in a file, NOT starting with

I would like to print the contents of a file, but I want to ignore all lines starting with #. I was trying some things with grep and awk, but it kept printing the whole file, or only the lines starting with #. Could you give me a push in the right direction, or a grep/awk command that would print any line in the file that does not start with #?
Use the -v option of grep to negate the condition:
grep -v '^#' file
You can use the ! operator:
awk '!/^ *#/ { print; }'
This negates the result of the match. I also included lines that start with spaces and then #, but you can tailor the regex how you like.
You could use grep to exclude all lines that begin with # using the -v option
grep -v '^#' filename
If you're a fan of sed:
sed '/^#/d' filename
This would also leave out lines with whitespace before the # :
awk '$1!~/^#/' file
or
grep -v '^[[:blank:]]*#' file
Here is the grep PCRE way,
grep -P '^(?!#)' file
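(?!#) is a negative lookahead: the start of the line must not be followed by #. A quick check (the sample input is hypothetical; -P requires GNU grep built with PCRE support):
printf '#comment\nkeep me\n' | grep -P '^(?!#)'
keep me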
