How to do str.strip() for every line in a text file? Unix - string

I could do the following in Python to clean and strip unwanted whitespace, but can it be done from the terminal alone, with something like sed or grep?
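with open('textfile.txt', encoding='utf8') as infile, \
     open('textstripped.txt', 'w', encoding='utf8') as outfile:
    for line in infile:
        # strip() removes leading and trailing whitespace, including the newline
        print(line.strip(), file=outfile)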

Using perl on the command line:
perl -lpe 's/^\s+//; s/\s+$//' file.txt > stripped.txt

This solution is based on sed man page:
sed 'y/\t/ /;s/^ *//;s/ *$//' input > output
http://www.gnu.org/software/sed/manual/sed.html#Centering-lines
Description:
y/\t/ / replaces tabs with spaces
s/^ *// removes leading spaces
s/ *$// removes trailing spaces
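A minimal sketch of the same idea wrapped in a shell function, assuming a sed that supports POSIX character classes. The name strip_lines is hypothetical, and [[:space:]] is swapped in for the tab-to-space translation so that tabs are trimmed without being rewritten inside the line:

```shell
# strip_lines is a hypothetical helper name, not taken from the answers above.
strip_lines() {
    # First expression trims leading whitespace, second trims trailing whitespace.
    # With no file arguments, sed reads standard input.
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$@"
}

# Example: leading/trailing spaces and tabs go away, inner spaces stay.
printf '  hello \t\n\tbig  world  \n' | strip_lines
```

It can be used as a filter (as above) or with file names, e.g. strip_lines input > output.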

$ cat input.txt | sed 's/^[ \t]*//;s/[ \t]*$//' > output.txt
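This gets rid of the leading and trailing whitespace.
EDIT: sed -E -i .bk -e 's/^[ \t]+//; s/[ \t]+$//' input.txt
This edits the file in place and saves a backup to input.txt.bk (and saves a process, as some suggested). The -E flag is needed so that + means "one or more"; the separate .bk argument is the BSD/macOS form, while GNU sed expects -i.bk with no space.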

sed -E "s/(^[ \t]+|[ \t]+$)//g" < input > output
The g flag matters here: without it only the first match (the leading whitespace) is removed.
Or, if you have GNU sed:
sed -E "s/^\s+|\s+$//g" < in > out
If you have a Mac, I recommend getting Homebrew and installing gnu-sed.
Then, alias sed=gsed.

Related

How do I replace single quotes with another character in sed?

I have a flat file where I have multiple occurrences of strings that contains single quote, e.g. hari's and leader's.
I want to replace all occurrences of the single quote with space, i.e.
all occurrences of hari's to hari s
all occurrences of leader's to leader s
I tried
sed -e 's/"'"/ /g' myfile.txt
and
sed -e 's/"'"/" "/g' myfile.txt
but they are not giving me the expected result.
Try to keep sed commands as simple as possible.
Otherwise you'll be confused by what you wrote when you read it later.
#!/bin/bash
sed "s/'/ /g" myfile.txt
This will do what you want:
echo "hari's"| sed 's/\x27/ /g'
It will replace single quotes anywhere in your file/text, even those used for quoting. If you only want to replace a quote inside a word (not at a word boundary), you can use the following:
echo "hari's"| sed -re 's/(\<.+)\x27(.+\>)/\1 \2/g'
HTH
Just close the single-quoted string, add an escaped single quote, and reopen the string:
sed 's/'\''/ /g' input
also possible with a variable:
quote=\'
sed "s/$quote/ /g" input
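To compare the quoting styles suggested so far side by side, here is a small sketch. The variable names input, a, b, c and quote are only illustrative; the sample text comes from the question:

```shell
input="hari's and leader's"

# 1. Double-quote the whole sed expression, so ' needs no escaping.
a=$(printf '%s\n' "$input" | sed "s/'/ /g")

# 2. Stay in single quotes: close the string, add \', reopen the string.
b=$(printf '%s\n' "$input" | sed 's/'\''/ /g')

# 3. Keep the quote character in a shell variable.
quote="'"
c=$(printf '%s\n' "$input" | sed "s/$quote/ /g")

# All three lines printed here should be identical.
printf '%s\n' "$a" "$b" "$c"
```

Each form should print hari s and leader s; which one to prefer is mostly a matter of readability.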
This is based on my own experience.
Note how I use ' versus " around the sed expression.
This won't work (no output; the shell keeps waiting for input because the quotes are unbalanced):
2521 #> echo 1'2'3'4'5 | sed 's/'/ /g'
>
>
>
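but this works:
2520 #> echo 1'2'3'4'5 | sed "s/'/ /g"
12345
(Here the shell removes the quotes from echo's argument before sed ever sees them, so the output only shows that the double-quoted command parses.)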
The -i should replace it in the file
sed -i 's/“/"/g' filename.txt
if you want backups you can do
sed -i.bak 's/“/"/g' filename.txt
I had to replace "0x" string with "32'h" and resolved with:
sed 's/ 0x/ 32\x27h/'

Delete empty lines using sed

I am trying to delete empty lines using sed:
sed '/^$/d'
but I have no luck with it.
For example, I have these lines:
xxxxxx

yyyyyy

zzzzzz
and I want it to be like:
xxxxxx
yyyyyy
zzzzzz
What should be the code for this?
You may have spaces or tabs in your "empty" line. Use POSIX classes with sed to remove all lines containing only whitespace:
sed '/^[[:space:]]*$/d'
A shorter version that uses ERE, for example with gnu sed:
sed -r '/^\s*$/d'
(Note that sed does NOT support PCRE.)
I am missing the awk solution:
awk 'NF' file
Which would return:
xxxxxx
yyyyyy
zzzzzz
How does this work? Since NF stands for "number of fields", those lines being empty have 0 fields, so that awk evaluates 0 to False and no line is printed; however, if there is at least one field, the evaluation is True and makes awk perform its default action: print the current line.
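The explanation above can be checked with a short sketch. The function name drop_blank and the variable counts are hypothetical; the sample lines are the ones from the question, with a spaces-only and a tab-only line added:

```shell
# drop_blank is a hypothetical helper name.
drop_blank() {
    # NF is 0 for empty or whitespace-only lines, so the pattern is false
    # and the line is skipped; otherwise awk's default action prints it.
    awk 'NF' "$@"
}

# Show the field count awk sees for a normal, an empty and a spaces-only line.
counts=$(printf 'a b\n\n   \n' | awk '{ print NF }')
printf '%s\n' "$counts"

# Only the three non-blank lines survive.
printf 'xxxxxx\n\n   \nyyyyyy\n\t\nzzzzzz\n' | drop_blank
```

Because awk splits fields on whitespace by default, lines containing only spaces or tabs are dropped too, not just truly empty ones.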
sed
    '/^[[:space:]]*$/d'
    '/^\s*$/d'
    '/^$/d'
    -n '/^\s*$/!p'
grep
    .
    -v '^$'
    -v '^\s*$'
    -v '^[[:space:]]*$'
awk
    '/./'
    'NF'
    'length'
    '/^[ \t]*$/ {next;} {print}'
    '!/^[ \t]*$/'
sed '/^$/d' should be fine. Are you expecting to modify the file in place? If so, you should use the -i flag.
Maybe those lines are not empty. If that's the case, look at the question "Remove empty lines from txtfiles, remove spaces from start and end of line"; I believe that's what you're trying to achieve.
I believe this is the easiest and fastest one:
cat file.txt | grep .
If you need to ignore all white-space lines as well then try this:
cat file.txt | grep '\S'
Example:
s="\
\
a\
b\
\
Below is TAB:\
\
Below is space:\
\
c\
\
"; echo "$s" | grep . | wc -l; echo "$s" | grep '\S' | wc -l
outputs
7
5
Another option without sed, awk, perl, etc
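strings "$file" > "$output"
strings - print the strings of printable characters in files. Note that by default it only prints sequences at least 4 characters long, so shorter lines are dropped as well.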
With help from the accepted answer here and the accepted answer above, I have used:
$ sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' file.txt > output.txt
`s/^ *//` => left trim
`s/ *$//` => right trim
`/^$/d` => remove empty line
`/^\s*$/d` => delete lines which may contain white space
This covers all the bases and works perfectly for my needs. Kudos to the original posters @Kent and @kev
The command you are trying is correct; just use the -E flag with it.
sed -E '/^$/d'
The -E flag makes sed use extended regular expressions.
You can say:
sed -n '/ / p' filename #there is a space between '//'
You are most likely seeing the unexpected behavior because your text file was created on Windows, so the end of line sequence is \r\n. You can use dos2unix to convert it to a UNIX style text file before running sed or use
sed -r "/^\r?$/d"
to remove blank lines whether or not the carriage return is there.
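A small sketch of the CRLF effect described above. The helper names count_lines and sample are hypothetical, and [[:space:]] is used here as an alternative to \r? (an assumption on my part, relying on the carriage return being a whitespace character):

```shell
# Hypothetical helpers for this illustration only.
count_lines() { awk 'END { print NR }'; }
sample() { printf 'xxxxxx\r\n\r\nyyyyyy\r\n'; }   # Windows-style line endings

# /^$/ does not match the "blank" line, because it still contains \r.
plain=$(sample | sed '/^$/d' | count_lines)

# [[:space:]] includes the carriage return, so the line is deleted.
robust=$(sample | sed '/^[[:space:]]*$/d' | count_lines)

echo "$plain $robust"
```

Note that the remaining lines still end in \r after this; dos2unix (or tr -d '\r') is the way to get rid of those.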
This works in awk as well.
awk '!/^$/' file
xxxxxx
yyyyyy
zzzzzz
You can do something like that using "grep", too:
egrep -v "^$" file.txt
My bash-specific answer is to recommend using perl substitution operator with the global pattern g flag for this, as follows:
$ perl -pe s'/^\n|^[\ ]*\n//g' $file
xxxxxx
yyyyyy
zzzzzz
This answer illustrates accounting for whether or not the empty lines have spaces in them ([\ ]*), as well as using | to separate multiple search terms/fields. Tested on macOS High Sierra and CentOS 6/7.
FYI, the OP's original code sed '/^$/d' $file works just fine in bash Terminal on macOS High Sierra and CentOS 6/7 Linux at a high-performance supercomputing cluster.
If you want to use modern Rust tools, you can consider:
ripgrep:
cat datafile | rg '.'    # a line with only spaces is considered non-empty
cat datafile | rg '\S'   # a line with only spaces is considered empty
rg '\S' datafile         # same; -N can be added to hide line numbers in on-screen output
sd:
cat datafile | sd '^\n' ''      # a line with only spaces is considered non-empty
cat datafile | sd '^\s*\n' ''   # a line with only spaces is considered empty
sd '^\s*\n' '' datafile         # in-place edit
Using vim editor to remove empty lines
:%s/^$\n//g
On FreeBSD 10.1, only this solution worked for me with sed:
sed -e '/^[ ]*$/d' "testfile"
Inside the [] there are a space and a tab character.
test file contains:
fffffff next 1 tabline ffffffffffff
ffffffff next 1 Space line ffffffffffff
ffffffff empty 1 lines ffffffffffff
============ EOF =============
NF is a built-in awk variable (the number of fields) that you can use to delete empty lines in a file:
awk NF filename
or, using sed:
sed -r "/^\r?$/d"

How to remove lines from text file not starting with certain characters (sed or grep)

How do I delete all lines in a text file which do not start with the characters #, & or *? I'm looking for a solution using sed or grep.
Deleting lines:
With grep
From http://lowfatlinux.com/linux-grep.html :
The grep command selects and prints lines from a file (or a bunch of files) that match a pattern.
I think you can do something like this:
grep -v '^[\#\&\*]' yourFile.txt > output.txt
You can also use sed to do the same thing (check http://lowfatlinux.com/linux-sed.html ):
sed '/^[\#\&\*]/d' yourFile.txt > output.txt
It's up to you to decide
Filtering lines:
My mistake, I understood you wanted to delete the lines. But if you want to "delete" all other lines (or filter the lines starting with the specified characters), then grep is the way to go:
grep '^[\#\&\*]' yourFile.txt > output.txt
sed -n '/^[#&*].*/p' input.txt > output.txt
this should work.
sed -ni '/^[#&*].*/p' input.txt
This one edits the input file directly, so be careful.
egrep '^(&|#|\*)' input.txt > output.txt
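As a hedged sketch of the filtering variant (keep only the lines that start with #, & or *), here are the grep and sed forms side by side. The function names are hypothetical. The backslashes are left out of the bracket expression because, in POSIX bracket expressions, a backslash is an ordinary character and none of these three characters needs escaping there:

```shell
# Hypothetical helper names for this illustration.
keep_marked_grep() { grep '^[#&*]' "$@"; }
keep_marked_sed()  { sed -n '/^[#&*]/p' "$@"; }

# Sample input: three marked lines, one plain line, one indented line.
sample() { printf '# a\nb\n& c\n* d\n e\n'; }

sample | keep_marked_grep
sample | keep_marked_sed
```

Both should print the same three lines; the line with a leading space is dropped because ^ anchors the match to the very first character.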

Delete whitespace at the beginning of each line of a file, using bash

How can I delete whitespace in each line of a file, using bash?
For instance, file1.txt. Before:
gg g
gg g
t ttt
after:
gg g
gg g
t ttt
sed -i 's/ //g' your_file will do it, modifying the file in place.
To delete only the whitespace at the beginning of each line, use sed -i 's/^ *//' your_file
In the first expression, we replace all spaces with nothing.
In the second one, we replace only at the beginning of the line, using the ^ anchor.
tr (delete all spaces):
$ tr -d ' ' <input.txt >output.txt
$ mv output.txt input.txt
sed (delete leading spaces):
$ sed -i 's/^ *//' input.txt
You can use perl -i for in-place replacement:
perl -pi -e 's/^ *//' file
To delete the whitespace at the start of a line only if the line matches a pattern, use the following command.
For example, suppose your foo.in looks like this:
This is a test
Lolll
blaahhh
This is a testtt
After issuing the following command
sed -e '/This/s/ *//' < foo.in > foo.out
The foo.out will be
This is a test
Lolll
blaahhh
This is a testtt
"Whitespace" can include both spaces AND tabs. The solutions presented to date will only match and operate successfully on spaces; they will fail if the whitespace takes the form of a tab.
The below has been tested on the OP's specimen data set with both spaces AND tabs, matching successfully & operating on both:
sed 's/^[[:blank:]]*//g' yourFile
After testing, supply the -i switch to sed to make the changes persistent.
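A minimal sketch of the [[:blank:]] approach as a reusable filter. The name ltrim is hypothetical, and the sample lines mimic the question's data with a mix of leading spaces and tabs:

```shell
# ltrim is a hypothetical helper name.
ltrim() {
    # [[:blank:]] matches both space and tab; ^ restricts it to the line start.
    sed 's/^[[:blank:]]*//' "$@"
}

# Leading spaces, a leading tab, and a mix of both are all removed,
# while the spaces inside each line are left alone.
printf '  gg g\n\tgg g\n \t t ttt\n' | ltrim
```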

Use sed to delete all leading/following blank spaces in a text file

File1:
hello
world
How would one delete the leading/trailing blank spaces within this file using sed - using one command (no intermediate files)?
I've currently got:
sed -e 's/^[ \t]*//' a > b
For leading spaces.
sed 's/ *$//' b > c
And this for trailing spaces.
You almost got it:
sed -e 's/^[ \t]*//;s/[ \t]*$//' a > c
Moreover, some flavours of sed also have an option for editing in place:
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' a
easier way, using awk
awk '{$1=$1}1' file
or
awk '{gsub(/^ +| +$/,"")}1' file
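One difference between the two awk forms is worth seeing in a sketch: assigning $1=$1 makes awk rebuild the line from its fields, which also collapses runs of whitespace inside the line, while gsub only touches the ends. The variable names are illustrative, and [ \t] is used instead of a bare space so that tabs are trimmed as well (my own variation):

```shell
line='  hello   world  '

# Rebuilding the record joins the fields with single spaces.
squeezed=$(printf '%s\n' "$line" | awk '{ $1 = $1 } 1')

# gsub removes only the leading and trailing whitespace.
trimmed=$(printf '%s\n' "$line" | awk '{ gsub(/^[ \t]+|[ \t]+$/, "") } 1')

# Brackets make the remaining whitespace visible.
printf '[%s]\n[%s]\n' "$squeezed" "$trimmed"
```

So the first form is fine when inner spacing does not matter, and the second is the safer choice when it does.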
perl -lape 's/^\s+|\s+$//g'
Honestly, I know perl regexps the best, so I find perl -lape much easier to use than sed -e.
Also, to answer the original question, you can have sed execute multiple operations like this:
sed -e 's/something/something else/' -e 's/another substitution/another replacement/'
Apparently you can also put the two substitutions in one string and separate them with a semicolon, as indicated in another answer.
Note that in the more general case of applying several filters in a row to an input file without using intermediate files, the solution is to use pipes:
sed -e 's/^[ \t]*//' a | sed -e 's/ *$//' > c
Obviously they are not required here because one invocation of sed is sufficient, but if the second sed command was something different, like uniq or sort, then this pattern is the right one.

Resources