blank lines in new file written by sed - text

I used sed '/pattern/d' file > newfile to remove some lines in a text file, but there are lots of blank lines left in the new file.
How can I modify the command to avoid the blank lines?

sed '/pattern/d; /^$/d' file > newfile
There is some good discussion about regular expressions for deleting empty lines in a file in this Stack Overflow post

sed '/^$/d' file >newfile
...will do it for you.

The command that you use will delete all the lines you want to delete, but it leaves the blank lines that are already there in the original file in place. To delete these too, simply apply a second filter to the input:
$ sed -e '/pattern/d' -e '/^[[:blank:]]*$/d' <file >newfile
The second filter will remove lines that are empty, or that contain only whitespace characters (i.e., that are "blank" in their appearance).
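As a quick check of the two-filter idea, here is a minimal sketch. The sample text, the temporary directory, and the word "pattern" are made up for illustration; the character class is written in its bracket-expression form, [[:blank:]].

```shell
# Hypothetical sample: one matching line, one empty line, one whitespace-only line.
tmpdir=$(mktemp -d)
printf 'keep 1\npattern here\n\n \t \nkeep 2\n' > "$tmpdir/file"

# First filter drops matching lines; second drops empty or whitespace-only lines.
sed -e '/pattern/d' -e '/^[[:blank:]]*$/d' < "$tmpdir/file" > "$tmpdir/newfile"

result=$(cat "$tmpdir/newfile")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Only the two "keep" lines should survive.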

Related

Copy lines from one file to another in Linux excluding comments

How do I copy lines from one file to another in Linux without opening the source and destination files? I also need to exclude the comments when copying the lines.
I do not want to copy the comments in the first file, and the files are in different locations.
Assuming lines are commented with # at the very beginning of each line, the following should work:
grep -v "^#" path/to/input/file >path/to/output/file
(Note: This will either create a new output file or irreversibly overwrite the output file if it already exists)
Assuming comment lines in your file contain # at the beginning of each line, the following sed command will delete these lines:
$ sed '/^#/d' path/to/input-file > path/to/output-file
If your file can also contain lines with whitespace before the #, the following sed command will delete lines beginning with zero or more spaces or tabs (in any order), followed by a hash (#) character ([[:blank:]] is portable, whereas \t inside brackets is a GNU extension):
$ sed '/^[[:blank:]]*#/d' path/to/input-file > path/to/output-file
If your file also contains lines containing code followed by a comment, the following sed command should work (note that it also cuts at a # that appears inside a quoted string):
$ sed -e '/^[[:blank:]]*#/d' -e 's/#.*$//' path/to/input-file > path/to/output-file
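The comment-stripping approach can be tried on a small sample. This is only a sketch: the input lines and the temporary path are invented, and the substitution is a slight variation that also removes the blanks in front of a trailing comment.

```shell
tmpdir=$(mktemp -d)
# Hypothetical input with full-line, indented, and trailing comments.
printf '# header\nname=value\n   # indented comment\nport=80 # trailing comment\n' > "$tmpdir/in"

# Drop comment-only lines, then cut trailing comments together with the blanks before them.
stripped=$(sed -e '/^[[:blank:]]*#/d' -e 's/[[:blank:]]*#.*$//' "$tmpdir/in")
printf '%s\n' "$stripped"
rm -rf "$tmpdir"
```

A # inside a quoted string would be cut as well, so this suits simple files rather than arbitrary scripts.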

Delete everything after pattern including pattern

I have a text file like
some
important
content
goes here
---from here--
some
unwanted content
I am trying to delete all lines after ---from here-- including ---from here--. That is, the desired output is
some
important
content
goes here
I tried sed '1,/---from here--/!d' input.txt but it's not removing the ---from here-- part. If I use sed '/---from here--.*/d' input.txt, it's only removing ---from here-- text.
How can I remove lines after a pattern including that pattern?
EDIT
I can achieve it by doing the first operation and piping its output to the second, like sed '1,/---from here--/!d' input.txt | sed '/---from here--.*/d' > output.txt.
Is there a single step solution?
Another approach with sed:
sed '/---from here--/,$d' file
The d (delete) command is applied to all lines from the first line containing ---from here-- up to the end of the file ($).
Another awk approach:
awk '/---from here--/{exit}1' file
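If you have GNU awk 4.1.0+, you can add -i inplace to change the file in-place.
Otherwise, redirect the output to a temporary file and move it over the original (for example, append > file.tmp && mv file.tmp file). Piping into tee file is not reliable, because tee can truncate the file before awk has finished reading it.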
I'm not positive, but I believe this will work:
sed -n '/---from here--/q; p' file
The q command tells sed to quit processing input lines after matching a given line.
You could also try the following, if you are OK with awk:
awk '/--from here--/{found_from=1} !found_from{print}' Input_file
You can try Perl
perl -ne ' $x++ if /---from here--/; print if !$x '
Using your input:
$ cat johnykutty.txt
some
important
content
goes here
---from here--
some
unwanted content
$ perl -ne ' $x++ if /---from here--/; print if !$x ' johnykutty.txt
some
important
content
goes here
$
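The sed and awk one-liners from the answers above can be checked side by side. This is only a sketch: it recreates the question's sample text in a temporary file (the path is made up) and captures each command's output in a variable.

```shell
tmpdir=$(mktemp -d)
printf 'some\nimportant\ncontent\ngoes here\n---from here--\nsome\nunwanted content\n' > "$tmpdir/input.txt"

# Range delete: from the first matching line through the last line ($).
by_range=$(sed '/---from here--/,$d' "$tmpdir/input.txt")
# Quit at the first match; -n plus p prints only the lines seen before it.
by_quit=$(sed -n '/---from here--/q; p' "$tmpdir/input.txt")
# awk: exit at the first match, otherwise print the line (the bare 1).
by_awk=$(awk '/---from here--/{exit}1' "$tmpdir/input.txt")

printf '%s\n' "$by_range"
rm -rf "$tmpdir"
```

The q and exit variants stop reading at the match, which can matter for very large files; the range form reads the whole input.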

Hidden line in file?

I have a UTF-8/no BOM file (converted from ISO-8859-1) that has 31214 lines. I have already run dos2unix on the file. When I open it in notepad++, I see a blank line underneath. When I remove this blank line, the line count reduces by one. I save it under a different name and when I tail the file, the prompt displays on the same line. From bash, how do I delete the blank line in the 1st file to produce the result displayed below in the 2nd file?
The goal is to do this from bash w/o manually deleting the line in notepad++
1st file:
[user@server]$ cat file1.txt | wc -l
31214
[user@server]$ tail file1.txt
T 31212 Data 20170517
[user@server]$
2nd file (edited with notepad++)
[user@server]$ cat file2.txt | wc -l
31213
[user@server]$ tail file2.txt
T 31212 Data 20170517[user@server]$
That's the trailing newline of the last line. Some editors allow you to go to the nonexisting "empty" line at the end, some don't show it. Again, some programs may allow you to remove the final newline, but note that e.g. POSIX in effect requires it to be there, and some standard utilities act oddly if it isn't present.
E.g. wc -l counts the number of newlines in the input file (printf "foo\nbar" | wc -l shows 1) so removing the final newline does decrease the line count.
Also, Bash prints the prompt wherever it was that the cursor was left on the screen, so if you print something that doesn't have the trailing newline, the prompt will be placed where the final incomplete line ended, as you saw.
There's no need to remove that final newline, just leave it there.
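The wc -l behaviour described above is easy to confirm. A small sketch with made-up two-line input, once with and once without the final newline:

```shell
# wc -l counts newline characters, not "lines" as an editor shows them.
with_newline=$(printf 'foo\nbar\n' | wc -l)
without_newline=$(printf 'foo\nbar' | wc -l)

echo "with final newline:    $with_newline"
echo "without final newline: $without_newline"
```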
To remove the last character of the file's last line it is possible, as explained here, to use
sed -i '$ s/.$//' your.file
which will substitute nothing for the last character in the last line of the file (if you want to delete something else from the end of the file you can replace the regex .$ with something-else$). Note that sed strips the newline before applying the command and writes it back afterwards, so this removes the last visible character of that line rather than the final newline itself. -i means 'edit in place' (on FreeBSD/macOS you need to add an empty string as an argument: sed -i "" '$ s/.$//' your.file).
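If the goal really is to drop the final newline itself, one possible sketch from a POSIX shell relies on command substitution, which strips trailing newlines. The file names and contents below are hypothetical, and the approach has limits: it removes all trailing newlines (not just one), reads the whole file into memory, and does not preserve NUL bytes.

```shell
tmpdir=$(mktemp -d)
printf 'first\nlast\n' > "$tmpdir/file1.txt"   # made-up two-line file

# $(cat ...) drops the trailing newline(s); printf '%s' writes the rest back unchanged.
printf '%s' "$(cat "$tmpdir/file1.txt")" > "$tmpdir/file2.txt"

before=$(wc -l < "$tmpdir/file1.txt")
after=$(wc -l < "$tmpdir/file2.txt")
last_byte=$(tail -c 1 "$tmpdir/file2.txt")
echo "newlines before: $before, after: $after, last byte: $last_byte"
rm -rf "$tmpdir"
```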
The file2.txt is missing a trailing newline.
Yes, a text file should end on a newline character.
Given that you do know that a trailing newline is missing, this command should be enough to correct the problem:
$ echo >> file2.txt
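When it is not known in advance whether the newline is missing, the append can be guarded. The following is a sketch under that assumption; fix_final_newline is a hypothetical helper name and the sample file is invented.

```shell
tmpdir=$(mktemp -d)
printf 'line 1\nline 2' > "$tmpdir/file2.txt"   # made-up file without a final newline

# tail -c 1 prints the last byte. Command substitution strips a trailing newline,
# so the result is non-empty only when the file does not end in a newline.
fix_final_newline() {
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo >> "$1"
    fi
}

fix_final_newline "$tmpdir/file2.txt"
fix_final_newline "$tmpdir/file2.txt"   # second call changes nothing
lines=$(wc -l < "$tmpdir/file2.txt")
echo "newline count: $lines"
rm -rf "$tmpdir"
```

Running it twice leaves exactly one final newline, so it is safe to apply to files that are already well-formed.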

How to use sed to delete multiple lines when the pattern is matched and stop until the first blank line?

I am very new to shell scripting.
How can I delete multiple lines starting where the pattern is matched and stop deleting at the first blank line?
You can do this:
sed '/STARTING_PATTERN/,/^$/d' filename
This will select all the lines from STARTING_PATTERN up to the next blank line (^$) and then delete those lines, including the blank line.
To edit files in place, use the -i option.
sed -i '/STARTING_PATTERN/,/^$/d' filename
Or using awk:
awk 'BEGIN{f=1} /STARTING_PATTERN/{f=0} f{print} /^$/{f=1}' filename
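A small sketch of the range delete on invented input (the file content and the temporary path are assumptions). The awk variant here tests for an empty line with /^$/ so that a line containing only 0 is not mistaken for a blank one.

```shell
tmpdir=$(mktemp -d)
printf 'keep A\nSTARTING_PATTERN x\ndrop 1\ndrop 2\n\nkeep B\n' > "$tmpdir/filename"

# sed: delete from the matching line through the next empty line.
with_sed=$(sed '/STARTING_PATTERN/,/^$/d' "$tmpdir/filename")

# awk: f=0 switches printing off at the pattern; an empty line switches it back on
# for the lines that follow (the empty line itself is not printed).
with_awk=$(awk 'BEGIN{f=1} /STARTING_PATTERN/{f=0} f{print} /^$/{f=1}' "$tmpdir/filename")

printf '%s\n' "$with_sed"
rm -rf "$tmpdir"
```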

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) using sed or awk functions (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
This substitutes each space with a comma. If needed, you can add the -s flag (squeeze repeats), which collapses each run of a repeated character into a single occurrence, so that consecutive spaces yield one comma.
To squeeze repeated tabs first and then substitute them:
tr -s '\t' <input | tr '\t' ',' >output
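To handle a mix of spaces and tabs with tr alone, one possible sketch is to normalise tabs to spaces first and then translate with squeezing. The sample line is made up.

```shell
# Tabs become spaces; then each space becomes a comma and -s squeezes repeated commas.
line=$(printf 'a  b\t\tc\n' | tr '\t' ' ' | tr -s ' ' ',')
echo "$line"
```

Because the squeeze acts on commas after translation, commas that were already adjacent in the input would be collapsed too.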
Try something like:
sed -E 's/[[:space:]]+/,/g' orig.txt > modified.txt
The character class [[:space:]] will match all whitespace (spaces, tabs, etc.); -E enables extended regular expressions, which the + quantifier needs. If you just want to replace a single character, e.g. just space, use that only.
EDIT: Actually [[:space:]] includes carriage return, so this may not do what you want. The following will replace tabs and spaces.
sed -E 's/[[:blank:]]+/,/g' orig.txt > modified.txt
as will (with GNU sed, which understands \t inside brackets)
sed -E 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
Without seeing your input file, this is only a guess:
awk '{$1=$1}1' OFS=","
Redirect the output to another file and rename it as needed.
What about something like this:
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, the cat and the pipe are unnecessary; < could be used to read from the file directly. I used cat first to look at the content of the file, and only afterwards added sed to the command line.)
EDIT: as @ghostdog74 pointed out in a comment, there's definitely no need for the cat/pipe; you can give the name of the file to sed:
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this :
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't just replace the old file with the new one (that could be done with sed -i, if I remember correctly; and as @ghostdog74 said, it can create a backup on the fly): keeping the original might be wise as a safety measure (even if it means renaming it to something like "texte-backup.txt").
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send the result to the console, while
sed -i 's/[\t ]/,/g' input.file
will edit the file in place.
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each original file is kept as a copy with a .bak suffix.
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following (-E or -r enables extended regular expressions, which the + quantifier needs):
sed -E 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such a case, use the following:
sed -E 's/^ +//' input_file | sed -E 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
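Pulling the suggestions above together, here is a minimal sketch that strips leading blanks and turns each remaining run of spaces or tabs into one comma. The sample rows and the temporary path are invented, and sed -E is assumed to be available (it is in GNU sed and BSD sed).

```shell
tmpdir=$(mktemp -d)
# Hypothetical sar-like rows: leading blanks, mixed spaces and tabs between fields.
printf '  12:00  1.5\t2.5\nthis is a text\n' > "$tmpdir/input.txt"

# sed: remove leading blanks, then replace each run of blanks with one comma.
with_sed=$(sed -E -e 's/^[[:blank:]]+//' -e 's/[[:blank:]]+/,/g' "$tmpdir/input.txt")

# awk: default field splitting already ignores leading blanks; assigning $1=$1
# rebuilds the record with OFS between fields.
with_awk=$(awk -v OFS=',' '{$1=$1} 1' "$tmpdir/input.txt")

printf '%s\n' "$with_sed"
rm -rf "$tmpdir"
```

Neither variant quotes fields, so values that themselves contain spaces or commas would need extra handling before the result is a valid CSV file.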

Resources