Remove everything before a blank line using sed - linux

Let's say I have a file that looks like this:
"Testing is important"
Nothing is impossible
The output should be:
Nothing is impossible
That is, sed should remove everything up to and including the first blank line. It also needs to work in bash on Windows.
Please help.

You can try this
sed '1,/^\s*$/d' file
\s matches any whitespace character (it is a GNU extension); a more portable near-equivalent, which matches spaces and tabs, is
sed '1,/^[[:blank:]]*$/d' file

Sed supports addressing lines both as numbers and as matching regex. In your case, you can delete all lines starting from 1, and ending with an empty line:
sed -e '1,/^$/d'
On Windows your files may contain carriage returns, in which case you can use:
sed -e '1,/^\r*$/d'
(assuming GNU sed)
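As a quick check of the range address described above, the following sketch builds two throwaway files and deletes everything through the first blank line. The file names and the scratch directory are made up for the demonstration. The second file has Windows line endings; for that case the sketch assumes the POSIX class [[:space:]] (which also matches a carriage return) as a stand-in for GNU sed's \r.

```shell
# Scratch directory and file names are illustrative only.
workdir=$(mktemp -d)

# Unix line endings: a header, a truly empty line, then the text to keep.
printf '%s\n' '"Testing is important"' '' 'Nothing is impossible' > "$workdir/unix.txt"

# Windows line endings: the "blank" line actually holds a carriage return.
printf '%s\r\n' '"Testing is important"' '' 'Nothing is impossible' > "$workdir/dos.txt"

# Delete from line 1 through the first empty line.
unix_result=$(sed '1,/^$/d' "$workdir/unix.txt")

# [[:space:]] matches \r as well as spaces and tabs, so the CRLF blank line
# ends the range; tr then strips the carriage returns left on the kept lines.
dos_result=$(sed '1,/^[[:space:]]*$/d' "$workdir/dos.txt" | tr -d '\r')

printf '%s\n' "$unix_result" "$dos_result"
rm -rf "$workdir"
```

Both commands should leave only the line after the blank one. Without the tr step, the kept lines of the second file would still end in a carriage return.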

To allow for more than one blank line and multiple lines after the final blank line, you could use something like this:
awk 'BEGIN{RS=ORS=""}{a=$0}END{print a}' file
This unsets the Record Separator RS, so that each block/paragraph is treated as a separate record. It assigns each record to the variable a, then prints the last value of a once the file has been processed. The Output Record Separator ORS is also unset, so that no newline is appended to the final block.
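To see the paragraph-mode behaviour on input with several blank lines, here is a small sketch. The sample text and the file name blocks.txt are invented for the demonstration; only the awk command itself comes from the answer above.

```shell
# Scratch directory and sample content are illustrative only.
workdir=$(mktemp -d)

# Three paragraphs; the last separator consists of two blank lines.
printf '%s\n' 'first block' '' 'second block' '' '' \
    'Nothing is impossible' 'and one more line' > "$workdir/blocks.txt"

# RS="" puts awk in paragraph mode; the END block prints the last record read.
last_block=$(awk 'BEGIN{RS=ORS=""}{a=$0}END{print a}' "$workdir/blocks.txt")

printf '%s\n' "$last_block"
rm -rf "$workdir"
```

Only the final two-line paragraph is kept, however many blank lines precede it.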

Related

How to truncate rest of the text in a file after finding a specific text pattern, in unix?

I have an HTML page that I downloaded on Unix with wget. I need to remove all of the text after the words "Check list" and then grep the remaining text for some data, but I can't think of a way to remove everything after a keyword. If I do
s/Check list.*//g
It only removes the rest of that line; I want everything below it to be gone as well. How do I do this?
The other solutions you have so far require tools that POSIX does not mandate (GNU sed, GNU awk, or perl), so YMMV with their availability, and they read the whole file into memory at once.
These will work in any awk in any shell on every Unix box and only read 1 line at a time into memory:
awk -F 'Check list' '{print $1} NF>1{exit}' file
or:
awk 'sub(/Check list.*/,""){f=1} {print} f{exit}' file
With GNU awk for multi-char RS you could do:
awk -v RS='Check list' '{print; exit}' file
but that would still read all of the text before Check list into memory at once.
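The two portable awk commands above can be compared on a tiny input. This is only a sketch: the three-line page.html is made up, and it assumes the keyword appears once.

```shell
# Scratch directory and sample page are illustrative only.
workdir=$(mktemp -d)
printf '%s\n' '<p>intro</p>' '<h2>Check list</h2>' '<p>must go</p>' > "$workdir/page.html"

# Variant 1: split each line on the keyword, print the part before it,
# and stop at the first line that actually contained the keyword.
by_field=$(awk -F 'Check list' '{print $1} NF>1{exit}' "$workdir/page.html")

# Variant 2: cut the keyword and the rest of its line, print, then stop.
by_sub=$(awk 'sub(/Check list.*/,""){f=1} {print} f{exit}' "$workdir/page.html")

printf '%s\n' "$by_field"
rm -rf "$workdir"
```

Both variants keep the first line and the part of the second line before the keyword, and never read the third line.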
Depending on which sed version you have, maybe
sed -z 's/Check list.*//'
The /g flag is useless as you only want to replace everything once.
If your sed does not have the -z option (which says to use the ASCII null character as line terminator instead of newline; this hinges on your file not containing any actual nulls, but that should trivially be true for any text file), try Perl:
perl -0777 -pe 's/Check list.*//s'
Unlike sed -z, this explicitly says to slurp the entire file into memory (the argument to -0 is the octal character code of a terminator character, but 777 is not a valid terminator character at all, so it always reads the entire file as a single "line") so this works even if there are spurious nulls in your file. The final s flag says to include newline in what . matches (otherwise s/.*// would still only substitute on the matching physical line).
I assume you are aware that removing everything will violate the integrity of the HTML file; it needs there to be a closing tag for every start tag near the beginning of the document (so if it starts with <html><body> you should keep </body></html> just before the end of the file, for example).
With GNU awk you could set RS to a regex that never matches non-empty input, so that the whole file is read as a single record, set the field separator to the keyword wrapped in word boundaries, and print the first field:
awk -v RS="^$" -v FS='\\<Check list\\>' '{print $1}' Input_file
You can use q to tell GNU sed to quit, which ends processing. As a simple example, let the content of file.txt be
123
456
789
and say you want to jettison everything beyond 5, then you could do
sed '/5/{s/5.*//;q}' file.txt
which gives output
123
4
Explanation: on the line containing 5, substitute 5 and everything after it with the empty string (i.e. delete it), then q. Note that lowercase q is used so that the altered line is printed before quitting.
(tested in GNU sed 4.7)

How to extract and replace columns with a multi-character delimiter?

I have a file with ^$ as the delimiter; the text looks like this:
tony^$36^$developer^$20210310^$CA
I want to replace the datetime.
I tried awk -F '\^\$' '{print $4}' file.txt | sed -i '/20210310/20221210/' , but it returns nothing. Then I tried the awk part on its own, and it also returns nothing. I guess it still treats the line as a whole and the delimiter doesn't work. Why is that, and how can I solve it?
A simple solution would be:
sed 's/\^\$/\n/g; s/20210310/20221210/g' -i file.txt
This modifies the file so that each field is placed on its own line, and it also replaces the date.
If you need a different delimiter, change the \n in the command to, say, a space or a comma.
To preview the changes without modifying the file, remove the -i from the command.
When I run your awk command, I get these warnings:
awk: warning: escape sequence `\^' treated as plain `^'
awk: warning: escape sequence `\$' treated as plain `$'
That explains why your output is blank: the field delimiter is interpreted as the regular expression '^$', which matches a completely blank line (only). As a result, each non-blank line of input is without any field separators, and therefore has only a single field. $4 can be non-empty only if there are at least four fields.
You can fix that by escaping the backslashes:
awk -F '\\^\\$' '{print $4}' file.txt
If all you want to do is print the modified datecodes by themselves, then that should get you going. However, the question ...
How to extract and replace columns with a multi-character delimiter?
... sounds like you may want actually to replace the datecode within each line, keeping the rest intact. In that case, it is a non-starter for the awk command to discard the other parts of the line. You have several options here, but two of the more likely would be
instead of sending field 4 out to sed for substitution, do the sub in the awk script, and then reconstitute the input line by printing all fields, with the expected delimiters. (This is left as an exercise.) OR
do the whole thing in sed:
sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/' file.txt
If you wanted to modify file.txt in-place then you could add the -i flag (which, on the other hand, is not useful in your original command, where sed's input is coming from a pipe rather than a file).
The -E option engages the POSIX extended regex dialect, which allows the given regex to be more readable (the alternative would require a bunch more \ characters).
Overall, presuming that there are five or more fields delimited by literal '^$' strings, and that the fourth contains exactly "20210310", the regex matches the first three fields, including their trailing delimiters, and captures them all as group 1; matches the leading delimiter of the fifth field and the remainder of the line and captures that as group 4; and replaces the whole line with group 1 followed by the new datecode followed by group 4.
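For the awk route mentioned above (do the substitution inside awk and print all fields with the original delimiter), one possible sketch follows. It assumes the date is always the fourth field; the second sample row and the scratch file are invented for the demonstration.

```shell
# Scratch directory and the second sample row are illustrative only.
workdir=$(mktemp -d)
printf '%s\n' 'tony^$36^$developer^$20210310^$CA' \
    'anna^$41^$designer^$20200101^$NY' > "$workdir/file.txt"

# FS is a regex, so ^ and $ are escaped (with doubled backslashes inside an
# awk string); OFS is a plain string and needs no escaping.
updated=$(awk 'BEGIN { FS = "\\^\\$"; OFS = "^$" }
     $4 == "20210310" { $4 = "20221210" }   # assigning a field rebuilds $0 with OFS
     { print }' "$workdir/file.txt")

printf '%s\n' "$updated"
rm -rf "$workdir"
```

Lines whose fourth field is not 20210310 are printed unchanged, because $0 is only rebuilt when a field is assigned.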

How to add character at the end of specific line in UNIX/LINUX?

Here is my input file. I want to add the character ":" to the end of each line that has ">" at the beginning. I tried sed -i 's|$|:|' input.txt, but ":" was added to the end of every line. Calling out specific line numbers is not practical either, because the lines starting with ">" are at different line numbers in each of my input files, and I want to run this in a loop over multiple files.
>Pas_pyrG_2
AAAGTCACAATGGTTAAAATGGATCCTTATATTAATGTCGATCCAGGGACAATGAGCCCA
TTCCAGCATGGTGAAGTTTTTGTTACCGAAGATGGTGCAGAAACAGATCTGGATCTGGGT
>Pas_rpoB_4
CAAACTCACTATGGTCGTGTTTGTCCAATTGAAACTCCTGAAGGTCCAAACATTGGTTTG
ATCAACTCGCTTTCTGTATACGCAAAAGCGAATGACTTCGGTTTCTTGGAAACTCCATAC
CGCAAAGTTGTAGATGGTCGTGTAACTGATGATGTTGAATATTTATCTGCAATTGAAGAA
>Pas_cpn60_2
ATGAACCCAATGGATTTAAAACGCGGTATCGACATTGCAGTAAAAACTGTAGTTGAAAAT
ATCCGTTCTATTGCTAAACCAGCTGATGATTTCAAAGCAATTGAACAAGTAGGTTCAATC
TCTGCTAACTCTGATACTACTGTTGGTAAACTTATTGCTCAAGCAATGGAAAAAGTAGGT
AAAGAAGGCGTAATCACTGTAGAAGAAGGCTCAGGCTTCGAAGACGCATTAGACGTTGTA
Here is the expected output file:
>Pas_pyrG_2:
AAAGTCACAATGGTTAAAATGGATCCTTATATTAATGTCGATCCAGGGACAATGAGCCCA
TTCCAGCATGGTGAAGTTTTTGTTACCGAAGATGGTGCAGAAACAGATCTGGATCTGGGT
>Pas_rpoB_4:
CAAACTCACTATGGTCGTGTTTGTCCAATTGAAACTCCTGAAGGTCCAAACATTGGTTTG
ATCAACTCGCTTTCTGTATACGCAAAAGCGAATGACTTCGGTTTCTTGGAAACTCCATAC
CGCAAAGTTGTAGATGGTCGTGTAACTGATGATGTTGAATATTTATCTGCAATTGAAGAA
>Pas_cpn60_2:
ATGAACCCAATGGATTTAAAACGCGGTATCGACATTGCAGTAAAAACTGTAGTTGAAAAT
ATCCGTTCTATTGCTAAACCAGCTGATGATTTCAAAGCAATTGAACAAGTAGGTTCAATC
TCTGCTAACTCTGATACTACTGTTGGTAAACTTATTGCTCAAGCAATGGAAAAAGTAGGT
AAAGAAGGCGTAATCACTGTAGAAGAAGGCTCAGGCTTCGAAGACGCATTAGACGTTGTA
Does sed have more options for this, or can other commands solve this problem?
sed -i '/^>/ s/$/:/' input.txt
Search the input for lines that match ^> (regex for "starts with the > character"). On those lines, substitute : for the end of line (you got this part right).
Slashes (/) are the conventional delimiter in sed. The s command accepts almost any other character as its delimiter, so s|$|:| also works when quoted; an address regex with a custom delimiter must be introduced with a backslash, as in \|^>|. Because / has no special meaning to the shell while an unquoted | does, slashes are the safer default unless the pattern itself contains slashes, in which case things get unwieldy.
Be careful with sed -i. Make a backup - make sure you know what's changing by using diff to compare the files.
On macOS (BSD sed), -i requires an argument, for example sed -i '' '/^>/ s/$/:/' input.txt.
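Since the question mentions looping over several files, here is a sketch that follows the backup advice above and avoids -i altogether by writing through a backup copy, which sidesteps the GNU/BSD difference. The directory, the .fasta file names and the shortened sequences are assumptions made for the demonstration.

```shell
# Scratch directory, file names and shortened sequences are illustrative only.
workdir=$(mktemp -d)
printf '%s\n' '>Pas_pyrG_2' 'AAAGTCACAATGG' '>Pas_rpoB_4' 'CAAACTCACTATG' > "$workdir/a.fasta"
printf '%s\n' 'ATGAACCCAATGG' '>Pas_cpn60_2' 'ATCCGTTCTATTG' > "$workdir/b.fasta"

for f in "$workdir"/*.fasta; do
    cp "$f" "$f.bak"                       # keep a backup of the original
    sed '/^>/ s/$/:/' "$f.bak" > "$f"      # append ":" only to header lines
    diff "$f.bak" "$f" || true             # show what changed (diff exits 1 on differences)
done

fasta_a=$(cat "$workdir/a.fasta")
fasta_b=$(cat "$workdir/b.fasta")
rm -rf "$workdir"
```

The header lines can sit at any line number in each file, because the address /^>/ selects them by content, not by position.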
Using ed to edit the file:
printf "%s\n" 'g/^>/s/$/:/' w | ed -s input.txt
For every line starting with >, add a colon to the end, and then write the changed file back to disk.

Sed command with exact variable change

I want to replace an exact string with sed, using shell variables. My file looks like this:
//module xyz
module xyz
Suppose I have the following shell variables defined:
var1='module xyz'
var2='module abc'
I want to change xyz to abc in the uncommented line only (module xyz).
So after executing command output should be
//module xyz
module abc
I do not want to change commented line (//module xyz)
currently I am using sed command as,
sed -i "s|$var1|$var2|g" file_name
But this command doesn't do what I want: it also replaces the commented line. How can I replace only the line that isn't commented?
Assuming that you know the pattern is at the start of the line, you can use this:
sed "s|^$var1|$var2|" file_name
That is, add an anchor ^, so that the match has to be at the start of the line.
I removed the -i switch so you can test it and also the g modifier, which isn't necessary as you only want to do one substitution per line.
It's worth mentioning that using shell variables in sed is actually quite tricky to do in a reliable way, so you should take this into account.
Your shell variable assignment should be quoted if the value contains a space, like:
var1="foo bar blah"
You can add the condition "the line does not start with //" to your sed command, so that the substitution is done only on those lines.
This line should work for your example:
sed -i '\#^//#!'"s/$var1/$var2/g" file
The address and the ! are in single quotes so that an interactive bash does not apply history expansion to the !; the s command is in double quotes so that the shell expands the variables.
Since the comment pattern contains slashes (/), I used a different character (#) as the regex delimiter for the address.
This command only does the substitution on lines that do not start with //. If there may be leading spaces, you may want to adjust the pattern ^//
You need to identify a pattern that marks the lines which should not be processed.
Assuming that // appears only in commented lines, you can use
sed -i '/\/\//!'"s/$var1/$var2/g" file_name
/\/\// makes sed select the lines that contain //, and the ! that follows applies the s command only to the lines that do not match. The s command is in double quotes so that the shell expands $var1 and $var2.
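The idea of skipping comment lines can be tried without -i on the two-line sample from the question. This is a sketch under the assumption that neither variable contains a slash or a regex metacharacter; the scratch file stands in for file_name.

```shell
# Scratch directory is illustrative only; the content matches the question.
workdir=$(mktemp -d)
printf '%s\n' '//module xyz' 'module xyz' > "$workdir/file_name"

var1='module xyz'
var2='module abc'

# \#^//# selects lines that start with //, and ! negates the selection.
# The address is single-quoted; the s command is double-quoted so that
# the shell expands the two variables before sed sees the script.
result=$(sed '\#^//#!'"s/$var1/$var2/" "$workdir/file_name")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Once the output looks right, -i can be added to edit the file in place.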

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, plus a lot of extra content after the "version" field:
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It displays every line in full, but I need only the specific fields shown below, ignoring the rest of the data on each line.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It removes everything up to the first comma (\K discards whatever has been matched so far from the reported match) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
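The awk and sed commands from this answer can be checked against each other on two of the sample lines from the question. The grep -P variant is left out of this sketch, since not every grep build supports -P; the scratch directory is an assumption of the demonstration.

```shell
# Scratch directory is illustrative only; the data comes from the question.
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
EOF

# awk: drop everything up to the first comma, then print the text before }.
via_awk=$(awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' "$workdir/test.txt")

# sed: capture the text between the first comma and the } and keep only that.
via_sed=$(sed 's/[^,]*,\([^}]*\).*/\1/' "$workdir/test.txt")

printf '%s\n' "$via_awk"
rm -rf "$workdir"
```

Both commands rely on metric_group being the only field before partition; if more fields were added in front, the "first comma" rule would no longer cut at the right place.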
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything (.*), followed by partition and any number of non-} characters, then } map and anything else, grab the part from partition up to but not including the brace, \(...\), as group 1." The replacement text is just group 1, \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understand, your file might have more lines that you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
The -n option together with the p flag prints only the lines on which a substitution was made. The substitution itself replaces the whole line with the captured part you need.
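A short sketch of the -n ... p combination on mixed input follows. The comment line in the sample is made up to represent the "extra" lines; the metrics line is taken from the question.

```shell
# Scratch directory and the first sample line are illustrative only.
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
# an unrelated line that should not appear in the output
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
EOF

# -n suppresses automatic printing; the p flag prints only substituted lines.
filtered=$(sed -n 's/.*\(partition.*\)}.*/\1/p' "$workdir/test.txt")

printf '%s\n' "$filtered"
rm -rf "$workdir"
```

Because partition.* is greedy, the capture runs up to the last } on the line, so this assumes no further } appears after the closing brace of the metric.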
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.

Resources