Bash - How to remove all white spaces from a given text file? - linux

I want to remove all the white spaces from a given text file.
Is there any shell command available for this ?
Or, how to use sed for this purpose?
I want something like below:
$ cat hello.txt | sed ....
I tried this : cat hello.txt | sed 's/ //g' .
But it removes only spaces, not tabs.
Thanks.

$ man tr
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard
input, writing to standard output.
In order to wipe all whitespace including newlines you can try:
cat file.txt | tr -d " \t\n\r"
You can also use the character classes defined by tr (credits to htompkins comment):
cat file.txt | tr -d "[:space:]"
For example, in order to wipe just horizontal white space:
cat file.txt | tr -d "[:blank:]"

Much simpler to my opinion:
sed -r 's/\s+//g' filename

I think you may use sed to wipe out the space while not losing some infomation like changing to another line.
cat hello.txt | sed '/^$/d;s/[[:blank:]]//g'
To apply into existing file, use following:
sed -i '/^$/d;s/[[:blank:]]//g' hello.txt

Try this:
sed -e 's/[\t ]//g;/^$/d'
(found here)
The first part removes all tabs (\t) and spaces, and the second part removes all empty lines

If you want to remove ALL whitespace, even newlines:
perl -pe 's/\s+//g' file

This answer is similar to other however as some people have been complaining that the output goes to STDOUT i am just going to suggest redirecting it to the original file and overwriting it. I would never normally suggest this but sometimes quick and dirty works.
cat file.txt | tr -d " \t\n\r" > file.txt

Easiest way for me:
echo "Hello my name is Donald" | sed s/\ //g

This is probably the simplest way of doing it:
sed -r 's/\s+//g' filename > output
mv ouput filename

Dude, Just python test.py in your terminal.
f = open('/home/hduser/Desktop/data.csv' , 'r')
x = f.read().split()
f.close()
y = ' '.join(x)
f = open('/home/hduser/Desktop/data.csv','w')
f.write(y)
f.close()

Try this:
tr -d " \t" <filename
See the manpage for tr(1) for more details.

hmm...seems like something on the order of sed -e "s/[ \t\n\r\v]//g" < hello.txt should be in the right ballpark (seems to work under cygwin in any case).

Related

unescaped newline inside substitute pattern in sed variable [duplicate]

Here are my attempts to replace a b character with a newline using sed while running bash
$> echo 'abc' | sed 's/b/\n/'
anc
no, that's not it
$> echo 'abc' | sed 's/b/\\n/'
a\nc
no, that's not it either. The output I want is
a
c
HELP!
Looks like you are on BSD or Solaris. Try this:
[jaypal:~/Temp] echo 'abc' | sed 's/b/\
> /'
a
c
Add a black slash and hit enter and complete your sed statement.
$ echo 'abc' | sed 's/b/\'$'\n''/'
a
c
In Bash, $'\n' expands to a single quoted newline character (see "QUOTING" section of man bash). The three strings are concatenated before being passed into sed as an argument. Sed requires that the newline character be escaped, hence the first backslash in the code I pasted.
You didn't say you want to globally replace all b. If yes, you want tr instead:
$ echo abcbd | tr b $'\n'
a
c
d
Works for me on Solaris 5.8 and bash 2.03
In a multiline file I had to pipe through tr on both sides of sed, like so:
echo "$FILE_CONTENTS" | \
tr '\n' ¥ | tr ' ' ∑ | mySedFunction $1 | tr ¥ '\n' | tr ∑ ' '
See unix likes to strip out newlines and extra leading spaces and all sorts of things, because I guess that seemed like the thing to do at the time when it was made back in the 1900s. Anyway, this method I show above solves the problem 100%. Wish I would have seen someone post this somewhere because it would have saved me about three hours of my life.
echo 'abc' | sed 's/b/\'\n'/'
you are missing '' around \n

Delete empty lines using sed

I am trying to delete empty lines using sed:
sed '/^$/d'
but I have no luck with it.
For example, I have these lines:
xxxxxx
yyyyyy
zzzzzz
and I want it to be like:
xxxxxx
yyyyyy
zzzzzz
What should be the code for this?
You may have spaces or tabs in your "empty" line. Use POSIX classes with sed to remove all lines containing only whitespace:
sed '/^[[:space:]]*$/d'
A shorter version that uses ERE, for example with gnu sed:
sed -r '/^\s*$/d'
(Note that sed does NOT support PCRE.)
I am missing the awk solution:
awk 'NF' file
Which would return:
xxxxxx
yyyyyy
zzzzzz
How does this work? Since NF stands for "number of fields", those lines being empty have 0 fields, so that awk evaluates 0 to False and no line is printed; however, if there is at least one field, the evaluation is True and makes awk perform its default action: print the current line.
sed
'/^[[:space:]]*$/d'
'/^\s*$/d'
'/^$/d'
-n '/^\s*$/!p'
grep
.
-v '^$'
-v '^\s*$'
-v '^[[:space:]]*$'
awk
/./
'NF'
'length'
'/^[ \t]*$/ {next;} {print}'
'!/^[ \t]*$/'
sed '/^$/d' should be fine, are you expecting to modify the file in place? If so you should use the -i flag.
Maybe those lines are not empty, so if that's the case, look at this question Remove empty lines from txtfiles, remove spaces from start and end of line I believe that's what you're trying to achieve.
I believe this is the easiest and fastest one:
cat file.txt | grep .
If you need to ignore all white-space lines as well then try this:
cat file.txt | grep '\S'
Example:
s="\
\
a\
b\
\
Below is TAB:\
\
Below is space:\
\
c\
\
"; echo "$s" | grep . | wc -l; echo "$s" | grep '\S' | wc -l
outputs
7
5
Another option without sed, awk, perl, etc
strings $file > $output
strings - print the strings of printable characters in files.
With help from the accepted answer here and the accepted answer above, I have used:
$ sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' file.txt > output.txt
`s/^ *//` => left trim
`s/ *$//` => right trim
`/^$/d` => remove empty line
`/^\s*$/d` => delete lines which may contain white space
This covers all the bases and works perfectly for my needs. Kudos to the original posters #Kent and #kev
The command you are trying is correct, just use -E flag with it.
sed -E '/^$/d'
-E flag makes sed catch extended regular expressions. More info here
You can say:
sed -n '/ / p' filename #there is a space between '//'
You are most likely seeing the unexpected behavior because your text file was created on Windows, so the end of line sequence is \r\n. You can use dos2unix to convert it to a UNIX style text file before running sed or use
sed -r "/^\r?$/d"
to remove blank lines whether or not the carriage return is there.
This works in awk as well.
awk '!/^$/' file
xxxxxx
yyyyyy
zzzzzz
You can do something like that using "grep", too:
egrep -v "^$" file.txt
My bash-specific answer is to recommend using perl substitution operator with the global pattern g flag for this, as follows:
$ perl -pe s'/^\n|^[\ ]*\n//g' $file
xxxxxx
yyyyyy
zzzzzz
This answer illustrates accounting for whether or not the empty lines have spaces in them ([\ ]*), as well as using | to separate multiple search terms/fields. Tested on macOS High Sierra and CentOS 6/7.
FYI, the OP's original code sed '/^$/d' $file works just fine in bash Terminal on macOS High Sierra and CentOS 6/7 Linux at a high-performance supercomputing cluster.
If you want to use modern Rust tools, you can consider:
ripgrep:
cat datafile | rg '.' line with spaces is considered non empty
cat datafile | rg '\S' line with spaces is considered empty
rg '\S' datafile line with spaces is considered empty (-N can be added to remove line numbers for on screen display)
sd
cat datafile | sd '^\n' '' line with spaces is considered non empty
cat datafile | sd '^\s*\n' '' line with spaces is considered empty
sd '^\s*\n' '' datafile inplace edit
Using vim editor to remove empty lines
:%s/^$\n//g
For me with FreeBSD 10.1 with sed worked only this solution:
sed -e '/^[ ]*$/d' "testfile"
inside [] there are space and tab symbols.
test file contains:
fffffff next 1 tabline ffffffffffff
ffffffff next 1 Space line ffffffffffff
ffffffff empty 1 lines ffffffffffff
============ EOF =============
NF is the command of awk you can use to delete empty lines in a file
awk NF filename
and by using sed
sed -r "/^\r?$/d"

Text formating - sed, awk, shell

I need some assistance trying to build up a variable using a list of exclusions in a file.
So I have a exclude file I am using for rsync that looks like this:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_
dcswebrdtEAR_weblogic_
I need to build up a string to be used as a variable to feed into egrep -v, so that I can use the same exclusion list for rsync as I do when egrep -v from a find -ls.
So I have created this so far to remove all "*" and "/" - and then when it sees certain special characters it escapes them:
cat exclude-list.supt | while read line
do
echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'
What I need the ouput too look like is this and then export that as a variable:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver\_weblogic\_|dcswebrdtEAR\_weblogic\_"
Can anyone help?
A few issues with the following:
cat exclude-list.supt | while read line
do
echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'
Sed reads files line by line so cat | while read line;do echo $line | sed is completely redundant also sed can do multiple substitutions by either passing them as a comma separated list or using the -e option so piping to sed three times is two too many. A problem with '[.-+_]' is the - is between . and + so it's interpreted as a range .-+ when using - inside a character class put it at the end beginning or end to lose this meaning like [._+-].
A much better way:
$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file
\.log
\.out
\.csv
logs
shared
tracing
jdk
8\.6\_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
lost\+found
dlxwhsr
regression
tmp
working
investigation
Investigation
dcsserver\_weblogic\_
dcswebrdtEAR\_weblogic\_
Now we can pipe through tr '\n' '|' to replace the newlines with pipes for the alternation ready for egrep:
$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|"
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...
$ EXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|")
$ echo $EXCLUDE
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...
Note: If your file ends with a newline character you will want to remove the final trailing |, try sed 's/\(.*\)|/\1/'.
This might work for you (GNU sed):
SEXCLUDE_supt=$(sed '1h;1!H;$!d;g;s/[*\/]//g;s/\([.-+_]\)/\\\1/g;s/\n/|/g' file)
This should work but I guess there are better solutions. First store everything in a bash array:
SEXCLUDE_supt=$( sed -e 's/\*//g' -e 's/\///g' -e 's/\([.-+_]\)/\\\1/g' exclude-list.supt)
and then process it again to substitute white space:
SEXCLUDE_supt=$(echo $SEXCLUDE_supt |sed 's/\s/|/g')

How to remove a special character in a string in a file using linux commands

I need to remove the character : from a file. Ex: I have numbers in the following format:
b3:07:4d
I want them to be like:
b3074d
I am using the following command:
grep ':' source.txt | sed -e 's/://' > des.txt
I am new to Linux. The file is quite big & I want to make sure I'm using the write command.
You can do without the grep:
sed -e 's/://g' source.txt > des.txt
The -i option edits the file in place.
sed -i 's/://' source.txt
the first part isn't right as it'll completely omit lines which don't contain :
below is untested but should be right. The g at end of the regex is for global, means it should get them all.
sed -e 's/://g' source.txt > out.txt
updated to better syntax from Jon Lin's answer but you still want the /g I would think

Delete whitespace in each begin of line of file, using bash

How i can delete whitespace in each line of file, using bash
For instance, file1.txt. Before:
gg g
gg g
t ttt
after:
gg g
gg g
t ttt
sed -i 's/ //g' your_file will do it, modifying the file inplace.
To delete only the whitespaces at the beginning of one single line, use sed -i 's/^ *//' your_file
In the first expression, we replace all spaces with nothing.
In the second one, we replace at the beginning using the ^ keyword
tr(delete all whitespaces):
$ tr -d ' ' <input.txt >output.txt
$ mv output.txt input.txt
sed(delete leading whitespaces)
$ sed -i 's/^ *//' input.txt
use can use perl -i for in place replacement.
perl -p -e 's/^ *//' file
To delete the white spaces before start of the line if the pattern matches. Use the following command.
For example your foo.in has pattern like this
This is a test
Lolll
blaahhh
This is a testtt
After issuing following command
sed -e '/This/s/ *//' < foo.in > foo.out
The foo.out will be
This is a test
Lolll
blaahhh
This is a testtt
"Whitespace" can include both spaces AND tabs. The solutions presented to date will only match and operate successfully on spaces; they will fail if the whitespace takes the form of a tab.
The below has been tested on the OP's specimen data set with both spaces AND tabs, matching successfully & operating on both:
sed 's/^[[:blank:]]*//g' yourFile
After testing, supply the -i switch to sed to make the changes persistent-

Resources