Removing a string between two symbols in a line - Linux

I am trying to remove a string between two symbols on each line of a CSV file. Here is my sample file:
1.1.1.1,A-B:,awef.C.D.E
1.1.1.2,A-B:,few.C.D.E
1.1.1.3,A-B:,dfs.C.D
1.1.1.4,A-B:,few.C.D
1.1.1.5,A-B:,fdsferger.C.D.E
1.1.1.6,A-B:,wef.C.D
1.1.1.7,A-B:,jty.C.D.E
The desired output would be like this:
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
Any way I can achieve it?

The following awk command can do this:
awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}'
It basically divides each line into the three comma-separated fields then removes the initial part of the third field, up to and including the first . character.
Then it simply outputs them again.
See the following transcript for a demonstration:
pax> echo '1.1.1.1,A-B:,awef.C.D.E
1.1.1.2,A-B:,few.C.D.E
1.1.1.3,A-B:,dfs.C.D
1.1.1.4,A-B:,few.C.D
1.1.1.5,A-B:,fdsferger.C.D.E
1.1.1.6,A-B:,wef.C.D
1.1.1.7,A-B:,jty.C.D.E' | awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}'
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
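If you want to update the CSV itself rather than just print the result, one simple approach (a sketch, assuming the data lives in a file called input.csv) is to write to a temporary file and move it back:
awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}' input.csv > input.csv.tmp && mv input.csv.tmp input.csv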

Here is an awk that should do it; it deletes the run of non-dot characters (and the following dot) that appears right after the :, separator:
awk '{sub(/:,[^.]*\./,":,")}1' file
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E

You can also use sed:
sed -r 's/(.*:,)([a-z]*\.)(.*)/\1\3/'
or
sed -r 's/:,[^.]+\./:,/' file

This might work for you (GNU sed):
sed 's/^\(.*,\)[^.]*\./\1/' file
Use greed to gather up all the columns but the last, then delete up to and including the first dot.

Related

Delete everything after pattern including pattern

I have a text file like
some
important
content
goes here
---from here--
some
unwanted content
I am trying to delete all lines after ---from here-- including ---from here--. That is, the desired output is
some
important
content
goes here
I tried sed '1,/---from here--/!d' input.txt but it's not removing the ---from here-- part. If I use sed '/---from here--.*/d' input.txt, it's only removing ---from here-- text.
How can I remove lines after a pattern including that pattern?
EDIT
I can achieve it by doing the first operation and piping its output to the second, like sed '1,/---from here--/!d' input.txt | sed '/---from here--.*/d' > output.txt.
Is there a single step solution?
Another approach with sed:
sed '/---from here--/,$d' file
The d (delete) command is applied to all lines from the first line containing ---from here-- up to the end of the file ($).
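With GNU sed, the same edit can be done in place (a sketch, assuming GNU sed's -i option):
sed -i '/---from here--/,$d' file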
Another awk approach:
awk '/---from here--/{exit}1' file
If you have GNU awk 4.1.0+, you can add -i inplace to change the file in-place.
Otherwise, redirect the output to a temporary file and move it over the original.
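A sketch of the in-place variant (it uses a flag instead of exit, so the whole file is still read while gawk rewrites it in place):
gawk -i inplace '/---from here--/{skip=1} !skip' file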
I'm not positive, but I believe this will work:
sed -n '/---from here--/q; p' file
The q command tells sed to quit processing input lines after matching a given line.
Could you please try the following (in case you are OK with awk):
awk '/--from here--/{found_from=1} !found_from{print}' Input_file
You can try Perl
perl -ne ' $x++ if /---from here--/; print if !$x '
Using your input:
$ cat johnykutty.txt
some
important
content
goes here
---from here--
some
unwanted content
$ perl -ne ' $x++ if /---from here--/; print if !$x ' johnykutty.txt
some
important
content
goes here
$

Using Sed or Awk to divide a file into two based on whether a line contains a numeric value

I have used sed and awk for a little while now, but I am having a challenge with the problem below. I am asking for an experienced sed/awk guru to help.
I have a file where some lines have numbers and some lines do not, like:
afjjdjfj.uihuihi
trfg.rtyhd
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
rtygfd.ijhniuh
etc.
I would like to have exactly two files out of this one, where every line is represented in one of the two files (none are deleted).
One containing all lines with any digits 0-9 in them, so given the above file the result would be:
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
and another file containing the rest of the lines, which do not have any digits 0-9 in them, so given the above file it would be:
afjjdjfj.uihuihi
trfg.rtyhd
rtygfd.ijhniuh
I've tried different strategies in both sed and awk and nothing is giving me exactly what I need.
What would be the best sed or awk one liner to solve this problem?
Thank you for your time,
Tom
Easily with Awk:
awk '/[0-9]/{print > "file1"; next} {print > "file2"}' inputfile
Note that the output file names must be quoted strings inside the awk program.
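The output file names can also be passed in from the shell (a sketch; the names with_digits.txt and no_digits.txt are just placeholders):
awk -v d=with_digits.txt -v n=no_digits.txt '/[0-9]/{print > d; next} {print > n}' inputfile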
With single GNU sed command:
sed -ne '/[0-9]/w with_digits.txt' -e '//!w no_digits.txt' input
Results:
> cat no_digits.txt
afjjdjfj.uihuihi
trfg.rtyhd
rtygfd.ijhniuh
> cat with_digits.txt
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
w filename Write the pattern space to filename.
If you don't mind running twice over the input, you can use just grep:
grep '[0-9]' input > with_digits
grep -v '[0-9]' input > without_digits
perl -MFile::Slurp -lpe '/\d/ ? append_file("digits.txt",$_) : append_file("no_digits.txt",$_)' input.txt

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, plus a bunch of extra stuff after the "version":
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It's displaying every line, but I need only the specific fields mentioned below, ignoring the rest of the data and lines.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It matches everything up to the first comma (\K then discards that part from the reported match) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything .*, followed by partition and
any number of non-}, then } map and anything else, grab the part
from partition up to but not including the brace \(...\) as group 1.
The replacement text is just group 1 \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understood, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to show only the lines where we made a substitution. The substitution captures the part of the line you need and replaces the whole line with it.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich": mark the fillings (the wanted key="value" pairs) by surrounding them with newlines, remove the bread (everything else around them), and tidy up the trailing comma.

Quickest way to remove 70+ strings from a file?

I have 70+ strings I need to find and delete in a file. I need to remove the entire line in the file that the string appears in.
I know I can use sed -i '/string to remove/d' fileA.txt to remove them one at a time. However, considering I have 70+, it will take some time doing it this way.
Is there a way I can put these 70+ strings in a file and have sed go through them one by one? Or if I create a file containing the strings, is there a way to compare the two files so it removes any line from fileA that contains one of the strings?
You could use grep:
grep -vf file_with_words.txt file.txt
where file_with_words.txt would be the file containing the list of words, each word on a separate line, and file.txt is the file that you want to remove the lines from.
If your list of words contains regex metacharacters, then tell grep to consider those as fixed strings (if that is what you want):
grep -F -vf file_with_words.txt file.txt
Using sed, you'd need to say:
sed '/word1\|word2\|word3/d' file.txt
or
sed -E '/word1|word2|word3/d' file.txt
You could use command substitution to construct the pattern too:
sed -E "/$(paste -sd'|' file_with_words.txt)/d" file.txt
but grep is clearly the tool to use in this case.
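If you would rather have sed itself read the patterns from a file, one option is to generate a sed script from the word list first (a sketch, assuming the words contain no regex metacharacters or / characters):
sed 's|.*|/&/d|' file_with_words.txt > delete.sed
sed -f delete.sed file.txt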
If you want to do the job in bash, here's how:
search=fileA.txt
queries=queries.txt
while read -r query
do
  sed -i "/$query/d" "$search"
done < "$queries"
where queries.txt looks like
I
want
to
delete
these
lines

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) using sed or awk functions (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
Substitutes each space with a comma. If you need to, you can make a pass with the -s flag (squeeze repeats), which replaces each input sequence of a repeated character listed in SET1 (the blank space) with a single occurrence of that character.
Squeeze repeats can also be used when dealing with tabs:
tr -s '\t' <input | tr '\t' ',' >output
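Both steps can be combined in one pass: when SET2 is shorter than SET1, tr pads SET2 with its last character, so spaces and tabs are both mapped to commas and -s squeezes the resulting repeats (a sketch):
tr -s ' \t' ',' <input >output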
Try something like:
sed -E 's/[[:space:]]+/,/g' orig.txt > modified.txt
The character class [[:space:]] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, e.g. just the space, use that only.
EDIT: Actually [[:space:]] includes carriage return, so this may not do what you want. The following will replace tabs and spaces.
sed -E 's/[[:blank:]]+/,/g' orig.txt > modified.txt
as will
sed -E 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, e.g. two words.
Without looking at your input file, this is only a guess:
awk '{$1=$1}1' OFS=","
Redirect the output to another file and rename it as needed.
What about something like this:
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping; I could also use < to read from the file directly, I suppose -- I used cat first to output the content of the file, and only afterwards added sed to my command line.)
EDIT: as @ghostdog74 pointed out in a comment, there's definitely no need for that cat/pipe; you can give the name of the file to sed directly:
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this:
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't just replace the old file with the new one (it could be done with sed -i, if I remember correctly; and as @ghostdog74 said, that option can create a backup on the fly): keeping the original might be wise, as a safety measure (even if it means renaming it to something like "texte-backup.txt").
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send the output to the console, while
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place.
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file's original is saved with a .bak extension.
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed 's/[\t ]\+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, first get rid of them, and then convert the remaining blank characters to commas. For such a case, use the following:
sed 's/^ \+//' input_file | sed 's/[\t ]\+/,/g' > output_file
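The two passes can also be combined into a single sed invocation (a sketch, using POSIX character classes):
sed -E -e 's/^[[:blank:]]+//' -e 's/[[:blank:]]+/,/g' input_file > output_file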
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
