Get rid of unwanted lines from file - linux

In the example below, ^[ is the escape character used to color the terminal output (type it as Ctrl+V followed by Ctrl+[), and ^M is a carriage return.
1) My file:
-------- just to mark start of file ----------
^[[1;31mbla bla bla^[[0m
^[[0;36mTREE;01;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mapple tree:^[[0m^[[0m
^[[1;31m4 apples^M^M^[[0m
^[[1;31m6 leafs^M^[[0m
^[[0;36mTREE;02;^[[0m
^[[0;36mTREE;03;^[[0m
withered
^[[0;36mTREE;04;^[[0m
^[[0;36mTREE;05;^[[0m
^[[0;36mTREE;06;^[[0m
^[[0;36mTREE;07;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mcherry tree:^[[0m^[[0m
^[[1;31mbig branches^M^M^[[0m
^[[1;31mtchick roots^M^[[0m
^[[0;36mTREE;08;^[[0m
^[[0;36mMy tree ^[[0m I have tree house on it^[[0;31m:-)^[[0m
^[[0;36mTREE;09;^[[0m
-------- just to mark end of file ----------
2) I want to get rid of all "empty labels", i.e. all TREE labels that have no content under them.
So the result I want to achieve is:
-------- just to mark start of results ----------
^[[1;31mbla bla bla^[[0m
^[[0;36mTREE;01;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mapple tree:^[[0m^[[0m
^[[1;31m4 apples^M^M^[[0m
^[[1;31m6 leafs^M^[[0m
^[[0;36mTREE;03;^[[0m
withered
^[[0;36mTREE;07;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mcherry tree:^[[0m^[[0m
^[[1;31mbig branches^M^M^[[0m
^[[1;31mtchick roots^M^[[0m
^[[0;36mTREE;08;^[[0m
^[[0;36mMy tree ^[[0m I have tree house on it^[[0;31m:-)^[[0m
-------- just to mark end of results ----------
3) I run:
pcregrep -M 'TREE.*\n(\n|\s)+(?=.*TREE|\z)' my_file
and it works as I expect: it prints only the labels with no content.
-------- just to mark start of results ----------
^[[0;36mTREE;02;^[[0m
^[[0;36mTREE;04;^[[0m
^[[0;36mTREE;05;^[[0m
^[[0;36mTREE;06;^[[0m
^[[0;36mTREE;09;^[[0m
-------- just to mark end of results ----------
4) But the inverted command:
pcregrep -Mv 'TREE.*\n(\n|\s)+(?=.*TREE|\z)' my_file
produces "weird" results that I do not understand.
*) How can I get the result I want?
Any tool is fine: pcregrep, ag, ack, sed, awk, ...

The simplest, and probably the crudest, solution I have come up with:
[steelrat@archlinux ~]$ awk '/TREE/ {f=$0;p=1} !/^ *$/&&!/TREE/ {if (p==1) {print f; p=0} print $0}' my_file
-------- just to mark start of results ----------
^[[1;31mbla bla bla^[[0m
^[[0;36mTREE;01;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mapple tree:^[[0m^[[0m
^[[1;31m4 apples^M^M^[[0m
^[[1;31m6 leafs^M^[[0m
^[[0;36mTREE;03;^[[0m
withered
^[[0;36mTREE;07;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mcherry tree:^[[0m^[[0m
^[[1;31mbig branches^M^M^[[0m
^[[1;31mtchick roots^M^[[0m
^[[0;36mTREE;08;^[[0m
^[[0;36mMy tree ^[[0m I have tree house on it^[[0;31m:-)^[[0m
-------- just to mark end of results ----------
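The awk command above works by holding each label back (f, p) until a non-blank, non-label line shows that the label has content. Here is a minimal Python sketch of the same one-pass logic. Its assumptions mirror the awk tests: a label is any line containing "TREE", and a blank line contains only spaces.

```python
import re

# Assumptions mirroring the awk command: a label is any line containing
# "TREE" (awk's /TREE/), and a blank line contains only spaces (/^ *$/).
LABEL = re.compile(r"TREE")
BLANK = re.compile(r"^ *$")


def drop_empty_labels(lines):
    """Yield lines, holding each label back until a content line follows it."""
    pending = None  # last label not yet printed (the awk f and p variables)
    for line in lines:
        if LABEL.search(line):
            pending = line          # a newer label replaces an unprinted one
        elif not BLANK.match(line):
            if pending is not None:
                yield pending       # the label has content: print it first
                pending = None
            yield line
        # blank lines are dropped, as in the first awk command
    # a label still pending at the end had no content, so it is never printed
```

The escape codes do not affect the result. A line such as ^[[1;31m^[[0m counts as content because it is not made up only of spaces, which matches the awk behaviour.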
If you need to keep the blank lines (removing the blank lines that belong to empty sections takes some extra work):
$ awk '/^ *$/ {print $0} /TREE/ {f=$0;p=1} !/^ *$/&&!/TREE/ {if (p==1) {print f; p=0} print $0}' my_file
-------- just to mark start of results ----------
^[[1;31mbla bla bla^[[0m
^[[0;36mTREE;01;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mapple tree:^[[0m^[[0m
^[[1;31m4 apples^M^M^[[0m
^[[1;31m6 leafs^M^[[0m
^[[0;36mTREE;03;^[[0m
withered
^[[0;36mTREE;07;^[[0m
^[[1;31m^[[0m
^[[1;31m^[[1;31mcherry tree:^[[0m^[[0m
^[[1;31mbig branches^M^M^[[0m
^[[1;31mtchick roots^M^[[0m
^[[0;36mTREE;08;^[[0m
^[[0;36mMy tree ^[[0m I have tree house on it^[[0;31m:-)^[[0m
-------- just to mark end of results ----------

Well, I did it:
(1) sed 's/^M//g;
(2) s/$/#VAV#/' my_file | \
(3) paste -sd "" | \
(4) sed 's/^[\[0;36mTREE[[:print:]]\+^[\[0m\(\(#VAV#\)\|\([[:blank:]]\)\|\(^[\[0;36mTREE[[:print:]]\+^[\[0m\)\)*\(\(^[\[0;36mTREE[[:print:]]\+^[\[0m\)\|$\)/\6/g;
(5) s/#VAV#/\n/g'
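(1) Get rid of the ^M (carriage return) characters; they get in the way.
(2) Append a deliberate marker string (#VAV#) to the end of each line.
(3) Concatenate all lines into one string.
(4) Apply the actual regular-expression substitution: each run of TREE labels followed only by markers, blanks or further labels collapses into the last label of the run (the one with content under it), or disappears entirely at the end of the input.
(5) Turn the marker from step (2) back into newlines.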
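Steps (1) to (5) can be sketched in Python with the same join, substitute and split idea. Replacing every newline with the marker does the work of sed plus paste -sd "". This is only a sketch. It assumes that labels look exactly like ^[[0;36mTREE...^[[0m, as in the question's file, and it uses Python's re module in place of sed.

```python
import re

MARK = "#VAV#"  # same deliberate end-of-line marker as step (2)

# Assumption: a label looks like ESC[0;36mTREE...ESC[0m, as in the question.
LABEL = r"\x1b\[0;36mTREE[^\x1b]*\x1b\[0m"

# Step (4): a label, then any mix of markers, blanks and further labels,
# ending at a label that is kept (group 1) or at the end of the text.
EMPTY_RUN = re.compile(
    LABEL + r"(?:" + re.escape(MARK) + r"|[ \t]|" + LABEL + r")*(" + LABEL + r"|$)"
)


def drop_empty_runs(text):
    """Remove TREE labels that have no content under them."""
    text = text.replace("\r", "")          # step (1): drop ^M characters
    joined = text.replace("\n", MARK)      # steps (2) and (3): one long string
    joined = EMPTY_RUN.sub(r"\1", joined)  # step (4): keep only the last label
    return joined.replace(MARK, "\n")      # step (5): markers back to newlines
```

Like the sed version, this only works because [^\x1b]* cannot run past the next escape character, which keeps each label match on its own original line.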

Related

VIM - Reformat text to one line paragraphs

I have a text file like the following:
--------
FOX&DOGS
The quick brown. Fox
jumped.
Over the lazy dogs.
-------------------
I want to change it as follow:
--------
FOX&DOGS
The quick brown. Fox jumped.
Over the lazy dogs.
-------------------
So in general:
preserve empty lines
keep a newline only after a period that is followed by a newline (".\n", i.e. the end of a paragraph). In the example above I don’t want to break the line after "brown.": there is a period there, but it is not followed by a newline, so it isn’t the end of a paragraph and has to stay on the same line
My solution:
%s/\n\n/#\r#\r/ | %s/\.\n/\.#\r/ | %j | s/# /\r/g | $$d
The idea is a bit crude:
mark all paragraph ends and empty lines (I chose "#" as the marker)
join all lines into a single long line
replace the marker "# " (note the space after #, which :j inserts when joining) with "\r", which inserts a newline
delete the trailing empty line created by this procedure
It seemed to work, so I also created a user command in my vimrc:
command Par %s/\n\n/#\r#\r/ | %s/\.\n/\.#\r/ | %j | s/# /\r/g | $$d
The problem:
If there aren’t any empty lines, it fails with the error "pattern not found" and changes nothing. It seems some kind of conditional is needed (if the pattern is found, substitute it; otherwise don't stop, continue with the remaining commands).
Any idea how to solve this in a simple way?

I may have found a solution:
append a blank line after the last line, so that the pattern “\n\n” is always found, even if it isn’t present in the original file, and the error can no longer block the following commands.
In the end, we have to remove the 2 blank lines that the substitution “s/# /\r/g” creates at the bottom.
So the command I tried is:
$ | put _ | %s/\n\n/#\r#\r/ | %s/\.\n/\.#\r/ | %j | s/# /\r/g | $$d | $$d
$: go to the last line
append a blank line
mark the newlines around blank lines (including the blank line just added) with the # character
mark the newlines that follow a period (the last line can’t end with a period, because of the # marker added in the previous step)
join all lines into a single long one
replace the markers “# ” with a newline (this creates two extra blank lines at the bottom, which have to be removed)
remove the two trailing blank lines
Limitations:
if a paragraph ends with a punctuation mark other than a period, it doesn’t work at all.
Any ideas to improve my crude one-liner are welcome!
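Within Vim, adding the e flag to a substitution (for example :%s/\n\n/#\r#\r/e) suppresses the "pattern not found" error, so the rest of the chain should still run. Outside Vim, the same reflow can be done without markers by collecting lines until a paragraph ends. Here is a minimal Python sketch. The terminators parameter is an assumption added to address the period-only limitation: the question only treats "." as a paragraph end, and "!?" is an assumed extension. Leading and trailing spaces are stripped when lines are joined, roughly as :j does.

```python
def reflow(text, terminators=".!?"):
    """Join the lines of each paragraph into one line.

    A paragraph ends at a line whose last character is in `terminators`
    (assumed default ".!?"; the question uses only ".") or before an
    empty line. Empty lines are preserved.
    """
    out, current = [], []
    for line in text.split("\n"):
        if not line.strip():               # empty line: flush, then keep it
            if current:
                out.append(" ".join(current))
                current = []
            out.append("")
            continue
        current.append(line.strip())
        if line.rstrip()[-1] in terminators:   # end of paragraph
            out.append(" ".join(current))
            current = []
    if current:                            # last paragraph without a terminator
        out.append(" ".join(current))
    return "\n".join(out)
```

For example, reflow("FOX&DOGS\n\nThe quick brown. Fox\njumped.\nOver the lazy dogs.") keeps the title and the blank line, and joins "Fox" and "jumped." onto one line.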

Print between two patterns with filepath/filename in a directory

I need a command that prints the lines between two strings (Hello and End), prefixing each line with the file's path and name. Here are the input and the expected output. I appreciate your time and help.
Input
file1:
Hello
abc
xyz
End
file2:
Hello
123
456
End
file3:
Hello
Output:
/home/test/seq/file1 abc
/home/test/seq/file1 xyz
/home/test/seq/file2 123
/home/test/seq/file2 456
I tried awk and sed but was not able to print the file name with its path.
awk '/Hello/{flag=1;next}/End/{flag=0}flag' * 2>/dev/null

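With awk (FILENAME holds each file's name exactly as it was given on the command line, so passing full paths prints full paths; this version assumes the files contain nothing outside Hello...End):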
awk '!/Hello/ && !/End/ {print FILENAME,$0} ' /home/test/seq/file?
Output:
/home/test/seq/file1 abc
/home/test/seq/file1 xyz
/home/test/seq/file2 123
/home/test/seq/file2 456

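If your files contain lines above Hello and/or below End, you can use a flag to control printing, as you attempted in your question, e.g.
awk 'FNR == 1 {f = 0} /End/ {f = 0} f == 1 {print FILENAME, $0} /Hello/ {f = 1}' file1 file2 file..
This would handle the case where your input file contained, e.g.
$ cat file
some text
some more
Hello
abc
xyz
End
still more text
The flag f is a simple on/off switch that controls printing. Placing the End rule first and the print in the middle eliminates the need for a next command: the Hello line is not printed because f is set only after the print rule has run, and the End line is not printed because f is cleared before it. Resetting f when FNR == 1 (the first line of each file) keeps an unterminated Hello block, such as the one in file3, from leaking into the next file. Pass full paths on the command line to get them in the output.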
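The same flag logic can be sketched in Python, using the same rule order as the awk answer. The function name between and its defaults are hypothetical. Like awk's /Hello/ and /End/, the tests here are substring tests. os.path.abspath turns relative names into full paths, which awk's FILENAME does not do.

```python
import os


def between(paths, start="Hello", end="End"):
    """Return 'path line' for every line strictly between start and end."""
    result = []
    for path in paths:
        inside = False                 # reset per file, like FNR == 1 {f = 0}
        full = os.path.abspath(path)   # full path even for relative names
        with open(path) as fh:
            for line in fh:
                line = line.rstrip("\n")
                if end in line:        # substring tests, like awk's /End/
                    inside = False
                if inside:
                    result.append(f"{full} {line}")
                if start in line:
                    inside = True
    return result
```

Checking the end marker before printing and the start marker after printing is the same ordering that lets the awk version skip both marker lines without next.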

Splunk search bunch of Strings and display table of _raw

I want to search for a set of strings using OR (any better way is appreciated). Is there a way to assign names to the strings?
index=blah host=123 "ERROR" ("FILE1" OR "FILE2" OR "FILE3" ) | rex field=_raw ".errorDesc\":\"(?<RESP_JSON>.*)\",\"errorCode.*" | table _time RESP_JSON
Now I want to add the file name as another column in the table. If a file is not present, the remaining columns should be shown empty.
Note: the file name is not a field; it is just a string in the _raw field.
Sample events:
[12/12/2015:12:12:12.123] ERROR occured while processing FILE1. errorDesc":"{field:123,code:124}","errorCode
[12/12/2015:13:13:12.123] ERROR occured while processing FILE3. errorDesc":"{field:125,code:124}","errorCode
Example output:
File    _time                      RESP_JSON
FILE1   12/12/2015:12:12:12.123    {field:123,code:124}
FILE2
FILE3   12/12/2015:13:13:12.123    {field:125,code:124}
No log entry is present for FILE2, so an empty row with just the file name is displayed.

Have you tried the following to extract the file name?
index=blah host=123 "ERROR" ("FILE1" OR "FILE2" OR "FILE3" ) | rex field=_raw "(?<filename>). errorDesc" | table _time RESP_JSON filename
Regarding your first question about naming search terms: have you looked at macros, or at subsearches with lookups?

Give this a shot:
index=blah host=123 "ERROR" ("FILE1" OR "FILE2" OR "FILE3" ) | rex "processing\s+(?<filename>[^\.]+)\.\s+" | table _time RESP_JSON filename
It's the same search as above, just a different regex extraction.
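To check what the extraction should produce, here is a Python model of the two rex patterns and of the expected table. This is not Splunk syntax. A search cannot return rows for events that do not exist, so the empty FILE2 row needs the list of expected file names to come from outside the events. In this sketch that list is simply a function argument. TIME_RE, which reads the bracketed timestamp, is an assumption about where _time comes from.

```python
import re

# Python spells named groups (?P<name>...); otherwise these follow the rex
# patterns above. TIME_RE is an assumption about where _time comes from.
FILE_RE = re.compile(r"processing\s+(?P<filename>[^.]+)\.\s+")
RESP_RE = re.compile(r'errorDesc":"(?P<RESP_JSON>.*?)","errorCode')
TIME_RE = re.compile(r"^\[(?P<time>[^\]]+)\]")


def file_table(events, expected_files):
    """One row per expected file; files without an event get empty columns."""
    rows = {}
    for event in events:
        name = FILE_RE.search(event)
        if not name:
            continue
        resp = RESP_RE.search(event)
        when = TIME_RE.search(event)
        rows[name.group("filename")] = (
            when.group("time") if when else "",
            resp.group("RESP_JSON") if resp else "",
        )
    # Left join against the fixed list, like the FILE2 row in the question.
    return [(f, *rows.get(f, ("", ""))) for f in expected_files]
```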

Parsing in Linux

I want to parse the compute zones from the OpenStack command output below:
+-----------------------+----------------------------------------+
| Name | Status |
+-----------------------+----------------------------------------+
| internal | available |
| |- controller | |
| | |- nova-conductor | enabled :-) 2016-07-07T08:09:57.000000 |
| | |- nova-consoleauth | enabled :-) 2016-07-07T08:10:01.000000 |
| | |- nova-scheduler | enabled :-) 2016-07-07T08:10:00.000000 |
| | |- nova-cert | enabled :-) 2016-07-07T08:10:00.000000 |
| Compute01 | available |
| |- compute01 | |
| | |- nova-compute | enabled :-) 2016-07-07T08:09:53.000000 |
| Compute02 | available |
| |- compute02 | |
| | |- nova-compute | enabled :-) 2016-07-07T08:10:00.000000 |
| nova | not available |
+-----------------------+----------------------------------------+
I want to turn it into the result below, keeping only the nodes that run nova-compute:
Compute01;Compute02
I used the following command:
nova availability-zone-list | awk 'NR>2 {print $2}' | grep -v '|' | tr '\n' ';'
but it returns this output:
;internal;Compute01;Compute02;nova;;

In Perl (and written rather more verbosely than is really necessary):
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $node;          # Store current node name
my @compute_nodes; # Store known nova-compute nodes
while (<>) {       # Read from the named files, or from STDIN
    # If we find the start of a line, followed by a pipe, a space and
    # a series of word characters...
    if (/^\| (\w+)/) {
        # Store the series of word characters (i.e. the node name) in $node
        $node = $1;
    }
    # If we find a line that contains "nova-compute", add the current
    # node name to @compute_nodes
    push @compute_nodes, $node if /nova-compute/;
}
# Print out all of the values in @compute_nodes
say join ';', @compute_nodes;

I detest one-line programs except for the simplest of applications. They are unnecessarily cryptic, they have none of the usual programming support, and they are stored only in the terminal buffer. Want to do the same thing tomorrow? You must start coding again.
Here's a Perl solution. Run it as
$ perl nova-compute.pl command-output.txt
use strict;
use warnings 'all';
my ($node, @nodes);
while ( <> ) {
    $node = $1 if /^ \| \s* (\w+) /x;
    push @nodes, $node if /nova-compute/;
}
print join(';', @nodes), "\n";
Output:
Compute01;Compute02
Now all of that is saved on disk. It may be run again at any time, modified for similar results, or fixed if you got it wrong. It is also readable. No contest.

$ nova availability-zone-list | awk '/^[|] [^|]/{node=$2} node && /nova-compute/ {s=s ";" node} END{print substr(s,2)}'
Compute01;Compute02
How it works:
/^[|] [^|]/{node=$2}
Whenever a line begins with |, followed by a space and then a character other than |, save the second field as the node name.
node && /nova-compute/ {s=s ";" node}
If node is non-empty and the current line contains nova-compute, append ; and the node name to the string s.
END{print substr(s,2)}
After all lines have been read, print the string s without its first character, which is a superfluous ;.

Trying to replace a pattern with another one

This is my first question on this website (glad I found out about this community).
I am trying to replace a specific pattern in a file (multiple lines) that looks something like this:
Bla bla bla bla |SMTH AWESOME INSIDE >>> LOL| bla bla bla | let's do it again >>> AWESOME |
into a format that looks like this:
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
I tried writing code that parses the line word by word: when it finds a "|" character, it starts building a string with the first part; after it finds ">>>", it starts building the second string until it reaches the closing "|". But it didn't work.
Afterwards I also tried AWK, but since I am new to Linux I failed as well:
awk -F 'BEGIN { FS=OFS="|" } { sub(/.*<<</,"", $2); }1' $1 }'
and then parsing the output with sed (removing the ) and ( characters from both strings). But it didn't work.
Thank you for reading.

It looks like this is just a simple substitution within each line, so all you need is sed:
$ sed 's/| *\([^|]*\) >>> \([^|]*\) *|/( \2 | \1 )/g' file
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
You can do the same in GNU awk with gensub() or other awks with match() and substr().

With extended regexp in sed:
sed -r 's/\|([^|]+)[[:space:]]*>>>[[:space:]]*([^|]+)\|/( \2 | \1 )/g' File
Logic:
We look for a pattern that starts with |, followed by a sequence of non-| characters, then >>>, then another sequence of non-| characters, and finally a closing |. The two sequences are captured with ( and ). The replacement ( \2 | \1 ) writes them back in swapped order, where \1 and \2 refer to the first and second capture groups respectively.
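The same capture-and-swap substitution can be sketched with Python's re.sub. One small variation is made on purpose: the groups are lazy (*?) and surrounded by \s*, so the spaces next to the delimiters are trimmed instead of ending up inside the captures.

```python
import re

# | ... >>> ... | with both sides captured; the lazy *? groups plus the \s*
# around them trim the spaces next to the delimiters.
PAIR = re.compile(r"\|\s*([^|]*?)\s*>>>\s*([^|]*?)\s*\|")


def swap_pairs(line):
    """Rewrite every |A >>> B| as ( B | A )."""
    return PAIR.sub(r"( \2 | \1 )", line)
```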
With basic regexp in sed:
sed 's/|\([^|]*\)[[:space:]]*>>>[[:space:]]*\([^|]*\)|/( \2 | \1 )/g' File

Perl's regular expressions have a "non-greedy" matching feature that awk's do not:
perl -pe '
s/ \| # the first delimiter
(.*?) # capture up to ...
>>> # the middle delimiter
(.*?) # capture up to ...
\| # the last delimiter
/($2 | $1)/gx
' file
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )

Let's try with awk:
awk 'NR%2{ printf("%s", $0) } NR%2==0{ printf("( %s %s",$NF,RS); gsub(/>>>.*$/,")"); printf("%s",$0) }' RS='|' file
Bla bla bla bla ( LOL | SMTH AWESOME INSIDE ) bla bla bla ( AWESOME | let's do it again )
RS='|' makes | the record separator. When the record number NR is odd (NR%2 is 1), the record lies outside any |...| pair and is printed as is. When NR is even (NR%2==0), the record is the text between a pair of |: the code prints an opening parenthesis, the last field of the record and the record separator (printf("( %s %s",$NF,RS)), then replaces >>>.*$ with a closing parenthesis and prints the rest of the record (gsub(/>>>.*$/,")"); printf("%s",$0)).
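The record-separator trick corresponds to splitting the line on "|" and rewriting every second piece. Here is a minimal Python sketch. Like the awk answer, it assumes that | always comes in pairs. It also assumes that each pair contains >>>. One deliberate difference: it uses all of the text after >>> rather than the last field ($NF), so a multi-word right-hand side would be kept whole.

```python
def swap_by_split(line):
    """Rewrite every |A >>> B| as ( B | A ) by splitting on "|"."""
    parts = line.split("|")  # like RS='|': pieces alternate outside/inside
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(part)  # even index (awk's odd NR): outside a pair
        else:
            left, _, right = part.partition(">>>")
            out.append(f"( {right.strip()} | {left.strip()} )")
    return "".join(out)
```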
