Filter out only matched values from a text file in each line - linux

I have a file "test.txt" with the lines below and also lot bunch of extra stuff after the "version"
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It's displaying every line, but I need only specific mentioned fields like this below,ignoring rest of the data/stuff or lines
in a nut shell, i want to display only these fields from a line not the rest
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"

If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It removes everything up to the first comma (\K kills any previous output) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.

After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything .*, followed by partition and
any number of non-}, then } map and anything else, grab the part
from partition up to but not including the brace \(...\) as group 1.
The replacement text is just group 1 \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...

As far as i understood your file might have more lines you don't want to see, so i would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
we use -n p to show only lines where we made substitution. The substitution part just gets the part of the line you need substituting the whole line with the pattern.

This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.

Related

How to extract and replace columns with a multi-character delimiter?

I got a file with ^$ as delimiter, the text is like :
tony^$36^$developer^$20210310^$CA
I want to replace the datetime.
I tried awk -F '\^\$' '{print $4}' file.txt | sed -i '/20210310/20221210/' , but it returns nothing. Then I tried the awk part, it returns nothing, I guess it still treat the line as a whole and the delimiter doesn't work. Wondering why and how to solve it?
A simple solution would be:
sed 's/\^\$/\n/g; s/20210310/20221210/g' -i file.txt
which will modify the file to separate each section to a new line.
If you need a different delimiter, change the \n in the command to maybe space or , .. up to you.
And it will also replace the date in the file.
If you want to see the changes, and really modify the file, remove the -i from the command.
When I run your awk command, I get these warnings:
awk: warning: escape sequence `\^' treated as plain `^'
awk: warning: escape sequence `\$' treated as plain `$'
That explains why your output is blank: the field delimiter is interpreted as the regular expression '^$', which matches a completely blank line (only). As a result, each non-blank line of input is without any field separators, and therefore has only a single field. $4 can be non-empty only if there are at least four fields.
You can fix that by escaping the backslashes:
awk -F '\\^\\$' '{print $4}' file.txt
If all you want to do is print the modified datecodes py themselves, then that should get you going. However, the question ...
How to extract and replace columns with a multi-character delimiter?
... sounds like you may want actually to replace the datecode within each line, keeping the rest intact. In that case, it is a non-starter for the awk command to discard the other parts of the line. You have several options here, but two of the more likely would be
instead of sending field 4 out to sed for substitution, do the sub in the awk script, and then reconstitute the input line by printing all fields, with the expected delimiters. (This is left as an exercise.) OR
do the whole thing in sed:
sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/' file.txt
If you wanted to modify file.txt in-place then you could add the -i flag (which, on the other hand, is not useful in your original command, where sed's input is coming from a pipe rather than a file).
The -E option engages the POSIX extended regex dialect, which allows the given regex to be more readable (the alternative would require a bunch more \ characters).
Overall, presuming that there are five or more fields delimited by literal '^$' strings, and the fourth contains exactly "20210310", that matches the first three fields, including their trailing delimiters, and captures them all as group 1; matches the leading delimiter of the fifth field and all the remainder of the line and captures it as group 4; and substitutes replaces the whole line with group 1 followed by the new datecode followed by group 4.

How to delete 5 lines before and 6 lines after pattern match using Sed?

I want to search for a pattern "xxxx" in a file and delete 5 lines before this pattern and 6 lines after this match. How can i do this using Sed?
This might work for you (GNU sed):
sed ':a;N;s/\n/&/5;Ta;/xxxx/!{P;D};:b;N;s/\n/&/11;Tb;d' file
Keep a rolling window of 5 lines and on encountering the specified string add 6 more (11 in total) and delete.
N.B. This is a barebones solution and will most probably need tailoring to your specific needs. Questions such as: what if there are multiple string throughout the file? What if the string is within the first five lines or multiple strings are within five lines of each other etc etc etc.
Here's one way you could do it using awk. I assume that you also want to delete the line itself and that the file is small enough to fit into memory:
awk '{a[NR]=$0}/xxxx/{f=NR}END{for(i=1;i<=NR;++i)if(i<f-5||i>f+6)print a[i]}' file
Store every line into the array a. When the pattern /xxxx/ is matched, save the line number. After the whole file has been processed, loop through the array, only printing the lines you want to keep.
Alternatively, you can use grep to obtain the line number first:
grep -n 'xxxx' file | awk -F: 'NR==FNR{f=$1}NR<f-5||NR>f+6' - file
In both cases, the lines deleted will be surrounding the last line where the pattern is matched.
A third option would be to use grep to obtain the line number then use sed to delete the lines:
line=$(grep -nm1 'xxxx' file | cut -d: -f1)
sed "$((line-5)),$((line+6))d" file
In this case I've also added the -m switch so grep exits after finding the first match.
if you know, the line number (what is not difficult to obtain), you can use something like that:
filename="test"
start=`expr $curr_line - 5`
end=`expr $curr_line + 6`
sed "${start},${end}d" $filename (optionally sed -i)
of course, you have to remember about additional conditions like start shouldn't be less than 1 and end greater than number of lines in file.
Another - maybe more easy to follow - solution would be to use grep to find the keyword and the corresponding line:
grep -n 'KEYWORD' <file>
then use sed to get the line number only like this:
grep -n 'KEYWORD' <file> | sed 's/:.*//'
Now that you have the line number simply use sed like this:
sed -i "$(LINE_START),$(LINE_END) d" <file>
to remove lines before and/or after! With only the -i you will override the <file> (no backup).
A script example could be:
#!/bin/bash
KEYWORD=$1
LINES_BEFORE=$2
LINES_AFTER=$3
FILE=$4
LINE_NO=$(grep -n $KEYWORD $FILE | sed 's/:.*//' )
echo "Keyword found in line: $LINE_NO"
LINE_START=$(($LINE_NO-$LINES_BEFORE))
LINE_END=$(($LINE_NO+$LINES_AFTER))
echo "Deleting lines $LINE_START to $LINE_END!"
sed -i "$LINE_START,$LINE_END d" $FILE
Please note that this will work only if the keyword is found once! Adapt the script to your needs!

Removing string between two symbol in line

I am trying to remove a string between two symbol in line from a csv file. Here is my sample file :
1.1.1.1,A-B:,awef.C.D.E
1.1.1.2,A-B:,few.C.D.E
1.1.1.3,A-B:,dfs.C.D
1.1.1.4,A-B:,few.C.D
1.1.1.5,A-B:,fdsferger.C.D.E
1.1.1.6,A-B:,wef.C.D
1.1.1.7,A-B:,jty.C.D.E
The output would be like this :
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
Any way I can achieve it?
The following awk command can do this:
awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}'
It basically divides each line into the three comma-separated fields then removes the initial part of the third field, up to and including the first . character.
Then it simply outputs them again.
See the following transcript for a demonstration:
pax> echo '1.1.1.1,A-B:,awef.C.D.E
1.1.1.2,A-B:,few.C.D.E
1.1.1.3,A-B:,dfs.C.D
1.1.1.4,A-B:,few.C.D
1.1.1.5,A-B:,fdsferger.C.D.E
1.1.1.6,A-B:,wef.C.D
1.1.1.7,A-B:,jty.C.D.E' | awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}'
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
Here is an awk that should do:
awk '{sub(/:,[^.]*\./,":,")}1' file
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
You can use sed also
sed -r 's/(.*:,)([a-z]*.)(.*)/\1\3/g'
(or)
sed -r 's/:,[^.]+\./:,/' file
This might work for you (GNU sed):
sed 's/^\(.*,\)[^.]*\./\1/' file
Use greed to gather up all the columns but the last and then delete upto and including the first ..

find words in two quotes unix

I would like to display the last word in these lines I tried to look for example the word value but no answer, so I thought to look for the words between quotes but my file contains other words between quotes that I have I need not actually want to display the values ​​of the select tag knowing that my html file is.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would have
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
You need to set the field separator to > you do this with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?
Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.
sed 's/.*>\(.*\)/\1/g' your_file

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) using sed or awk functions (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
Substitutes each space with a comma, if you need you can make a pass with the -s flag (squeeze repeats), that replaces each input sequence of a repeated character that is listed in SET1 (the blank space) with a single occurrence of that character.
Use of squeeze repeats used to after substitute tabs:
tr -s '\t' <input | tr '\t' ',' >output
Try something like:
sed 's/[:space:]+/,/g' orig.txt > modified.txt
The character class [:space:] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, eg. just space, use that only.
EDIT: Actually [:space:] includes carriage return, so this may not do what you want. The following will replace tabs and spaces.
sed 's/[:blank:]+/,/g' orig.txt > modified.txt
as will
sed 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
without looking at your input file, only a guess
awk '{$1=$1}1' OFS=","
redirect to another file and rename as needed
What about something like this :
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping ; could also use < to read from the file directly, I suppose -- used cat first to output the content of the file, and only after, I added sed to my command-line)
EDIT : as #ghostdog74 pointed out in a comment, there's definitly no need for thet cat/pipe ; you can give the name of the file to sed :
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this :
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't go just replacing the old file by the new one (could be done with sed -i, if I remember correctly ; and as #ghostdog74 said, this one would accept creating the backup on the fly) : keeping might be wise, as a security measure (even if it means having to rename it to something like "texte-backup.txt")
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send to the console,
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file is moved to .bak
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such case, use the following:
sed 's/ +//' input_file | sed 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv

Resources