How to replace same occurrences with different strings in linux

I have already seen a few similar questions, but my situation is different.
I have to replace the same occurrence in a file with different strings. I have used
sed -i 's/X/Y/g' file.txt
For repeated occurrences I have used line numbers, like
sed -i '3s/X/Y/g ; 4s/X/Z/g' file.txt
This is possible only if those strings are always on the same lines.
Ex : file.txt
This color is
...some more lines
This color is
...some more lines
This color is
...some more lines
This color is
...some more lines
I need to change them as
This color is blue
...some more lines
This color is red
...some more lines
This color is green
...some more lines
This color is yellow
...some more lines
I need to do this without using line numbers, as the line numbers for those strings can change at any time if more content is added.
Can anyone please help? Thank you.

awk to the rescue!
this will cycle through the colors if there are more lines than the colors
$ awk -v colors='blue,red,green,yellow' 'BEGIN {n=split(colors,v,",")}
/color/ {$0=$0 OFS v[i++%n+1]}1' file
to embed this into a quoted string, it will be easier to remove double quotes altogether. Simply change to
$ awk -v colors='blue red green yellow' 'BEGIN {n=split(colors,v)}
/color/ {$0=$0 OFS v[i++%n+1]}1' file
if your colors are not single words, you can't use the above, so go back to splitting on a comma (or any other delimiter); the double quotes only need escaping (as \",\") when the whole command is itself embedded in a double-quoted string
$ awk -v colors='true blue,scarlet red,pistachio green,canary yellow' '
BEGIN {n=split(colors,v,",")}
/color/ {$0=$0 OFS v[i++%n+1]}1' file
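To see the cycling behaviour, here is a minimal sketch run against a small made-up input (the sample lines and the two-color list are assumptions for illustration, not the asker's real file). With three matching lines and only two colors, the third match wraps around to the first color.

```shell
# Hypothetical sample input; the asker's real file.txt is not shown in full.
tmp=$(mktemp -d)
printf '%s\n' 'This color is' 'other line' 'This color is' 'other line' \
              'This color is' > "$tmp/file.txt"

# Two colors for three matching lines, so the list wraps around.
cycled=$(awk -v colors='blue,red' '
  BEGIN { n = split(colors, v, ",") }       # v[1]="blue", v[2]="red"
  /color/ { $0 = $0 OFS v[i++ % n + 1] }    # append the next color, cycling
  1                                         # print every line
' "$tmp/file.txt")

printf '%s\n' "$cycled"
rm -rf "$tmp"
```

The expression i++ % n + 1 maps the running match count 0, 1, 2, ... onto the indices 1..n, which is what makes the list repeat.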

Your question isn't clear but it SOUNDS like you're trying to do this:
awk 'NR==FNR{colors[NR]=$0;next} /This color is/{$0 = $0 OFS colors[++c]} 1' colors file
where colors is a file containing one color per line and file is the file you want the color values added to. If that's not what you want then edit your question to specify your requirements more clearly and come up with a better (and complete/testable) example.
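A minimal sketch of the two-file approach, assuming each color is stored as the value of the array element (colors[NR] = $0) so that it can be appended later. The file names and contents below are made up for the demonstration.

```shell
# Hypothetical inputs: one color per line, plus a file to annotate.
tmp=$(mktemp -d)
printf '%s\n' blue red green > "$tmp/colors"
printf '%s\n' 'This color is' 'x' 'This color is' 'y' 'This color is' > "$tmp/file"

paired=$(awk '
  NR == FNR { colors[NR] = $0; next }            # first file: remember each color
  /This color is/ { $0 = $0 OFS colors[++c] }    # second file: append the c-th color
  1                                              # print every line of the second file
' "$tmp/colors" "$tmp/file")

printf '%s\n' "$paired"
rm -rf "$tmp"
```

Unlike the cycling version, this one does not wrap around: if there are more matching lines than colors, the extra lines get nothing appended after the separator.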

Related

extracting text from one file and creating new file with that text using linux/bash

I have a sequence file that has a repeated pattern that looks like this:
$>g34 | effector probability: 0.6
GPCKPRTSASNTLTTTLTTAEPTPTTIATETTIATSDSSKTTTIDNITTTTSEAESNTKTESSTIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTSIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTS"
$>g104 | effector probability: 0.65
GIFSSLICATTAVTTGIICHGTVTLATGGTCALATLPAPTTSIAQTRTTTDTSEH
$>g115 | effector probability: 0.99
IAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTSIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTS
and so on.
I want to extract the text between and including each >g## and create a new file titled protein_g##.faa
In the above example it would create a file called "protein_g34.faa" and it would be:
$>g34 | effector probability: 0.6
GPCKPRTSASNTLTTTLTTAEPTPTTIATETTIATSDSSKTTTIDNITTTTSEAESNTKTESSTIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTSIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTS
I was trying to use sed but I am not very experienced using it. My guess was something like this:
$ sed -n '/^>g*/s///p; y/ /\n/' file > "g##"
but I can clearly tell that that is wrong... maybe the right thing is using awk?
Thanks!
Yeah, I would use awk for that. I don't think sed can write to more than one different output stream.
Here's how I would write that:
< input.txt awk '/^\$>/{fname = "protein_" substr($1, 3) ".faa"; print "sending to " fname} {print $0 > fname}'
Breaking it down into details:
< input.txt This part reads in the input file.
awk Runs awk.
/^\$>/ On lines which start with the literal string $>, run the piece of code in braces.
(If previous step matched) {fname = "protein_" substr($1, 3) ".faa"; print "sending to " fname} Take the first field of the current line. Remove the first two characters of that field. Surround that with protein_ .faa. Save it as the variable fname. Print a message about switching files.
This next block has no condition before it. Implicitly, that means that it matches every line.
{print $0 > fname} Take the entire line, and send it to the filename held by fname. If no file is selected, this will cause an error.
Hope that helps!
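The same idea can be sketched with two small safeguards added: closing each output file before moving on to the next one, and skipping any lines that appear before the first header. This is an illustration under assumptions, not the answer's exact command; the shortened sequences are made up, and it assumes each ID occurs only once (a repeated ID would truncate the earlier file when it is reopened).

```shell
tmp=$(mktemp -d)
# Hypothetical, shortened records in the question's format.
cat > "$tmp/input.txt" <<'EOF'
$>g34 | effector probability: 0.6
GPCKPRTSASNT
$>g104 | effector probability: 0.65
GIFSSLICATTA
EOF

(
  cd "$tmp" || exit 1
  awk '
    /^\$>/ {
      if (fname != "") close(fname)               # release the previous output file
      fname = "protein_" substr($1, 3) ".faa"     # "$>g34" -> "protein_g34.faa"
    }
    fname != "" { print > fname }                 # skip lines before the first header
  ' input.txt
)

g34=$(cat "$tmp/protein_g34.faa")
g104=$(cat "$tmp/protein_g104.faa")
printf '%s\n' "$g34" "$g104"
rm -rf "$tmp"
```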
If awk is an option:
awk '/\|/ {split($1,a,">"); fname="protein_"a[2]".faa"} {print $0 >> fname}' src.dat
awk is better than sed for this problem. You can implement it in sed with
sed -rz 's/(\$>)(g[^ ]*)([^\n]*\n[^\n]*)\n/echo '\''\1\2\3'\'' > protein_\2.faa/ge' file
This solution is nice for showing some sed tricks:
-z for parsing fragments that span several lines
(..) for remembering strings
\$ matching a literal $
[^\n]* matching until end of line
'\'' for a single quote
End single quoted string, escape single quote and start new single quoted string
\2 for recalling the second remembered string
Write a bash command in the replacement string
e execute result of replacement
awk procedure
awk allows records to be extracted between empty (or white space only) lines by setting the record separator to an empty string, RS="". This assumes the records in the input file are separated by blank lines.
Thus the records intended for each file can be obtained automatically.
The id to be used in the filename can be extracted from field 1 ($1) by splitting the (default white-space-separated) field at the ">" mark and using element 2 of the split array (named id in this example).
Each file is closed from within awk as soon as it has been written, to prevent errors from too many open files if you have many records to process.
The awk procedure
The example data was saved in a file named all.seq and the following procedure used to process it:
awk 'BEGIN{RS="";} {split($1,id,">"); fn="protein_"id[2]".faa"; print $0 > fn; close(fn)}' all.seq
Test results
(terminal listings/outputs)
$ ls
all.seq protein_g104.faa protein_g115.faa protein_g34.faa
$ cat protein_g104.faa
$>g104 | effector probability: 0.65
GIFSSLICATTAVTTGIICHGTVTLATGGTCALATLPAPTTSIAQTRTTTDTSEH
$ cat protein_g115.faa
$>g115 | effector probability: 0.99
IAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTSIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTS
$ cat protein_g34.faa
$>g34 | effector probability: 0.6
GPCKPRTSASNTLTTTLTTAEPTPTTIATETTIATSDSSKTTTIDNITTTTSEAESNTKTESSTIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTSIAQTRTTTDTSEHESTTASSVSSQPTTTEGITTTS"
Tested using GNU Awk 5.1.0
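Because RS="" splits only on blank lines, it is worth checking how many records awk actually sees before relying on it. The sketch below uses two made-up files (shortened headers and sequences are assumptions) that differ only in whether a blank line separates the entries.

```shell
tmp=$(mktemp -d)
# Same two hypothetical entries, without and with a separating blank line.
printf '%s\n' '$>g34 | p: 0.6' 'AAA' '$>g104 | p: 0.65' 'CCC' > "$tmp/packed.seq"
printf '%s\n' '$>g34 | p: 0.6' 'AAA' '' '$>g104 | p: 0.65' 'CCC' > "$tmp/spaced.seq"

# In paragraph mode NR counts blank-line-separated blocks, not lines.
packed_records=$(awk 'BEGIN { RS = "" } END { print NR }' "$tmp/packed.seq")
spaced_records=$(awk 'BEGIN { RS = "" } END { print NR }' "$tmp/spaced.seq")

echo "without blank lines: $packed_records record(s)"
echo "with blank lines:    $spaced_records record(s)"
rm -rf "$tmp"
```

If the count is 1 for a file with many entries, the whole file would end up in a single output file, and a header-matching approach is the safer choice.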

Print the lines between two lines that has the same pattern using bash

I have a text file containing lines with the string pattern 'OG' -
OG
ANNNFKSODJFHFJ
SSJSJKSKJSAJAS
SSSSSSSSSSSSFA
OG
FALJFNAFAFNAFJL
AFJLJSLJFLFSLFL
ASJFAJFAKFKAFKK
OG
AJSFLJASFLSFLFF
SJFLAFLAFLFLAFA
ASASFASFOFLJAJF
I want to print the lines between the first two lines that have 'OG' as string pattern, i.e. the result should be -
ANNNFKSODJFHFJ
SSJSJKSKJSAJAS
SSSSSSSSSSSSFA
Any suggestions using 'sed' or 'awk'? I don't want to print by line number, but rather by pattern search.
I tried using awk, but it is not working -
awk '/^OG/{flag=1;next}/^OG/{flag=0}flag' file.txt
You're almost there:
$ awk '/^OG/ {if(go) exit; go=1; next} go {print}' file.txt
ANNNFKSODJFHFJ
SSJSJKSKJSAJAS
SSSSSSSSSSSSFA
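Why the original attempt fails can be shown side by side with a working variant. In the attempt, the first rule matches every OG line and runs next, so the second rule that would clear the flag is never reached. The sketch below uses shortened, made-up data lines and a counter-based variant (the variable n is an assumption, equivalent in effect to the go flag above).

```shell
tmp=$(mktemp -d)
printf '%s\n' OG AAA BBB OG CCC OG DDD > "$tmp/file.txt"

# Original attempt: `next` in the first rule means flag=0 never runs,
# so every non-OG line after the first marker is printed.
attempt=$(awk '/^OG/{flag=1;next}/^OG/{flag=0}flag' "$tmp/file.txt")

# Counter variant: count the markers and stop reading at the second one.
first_block=$(awk '/^OG/ { if (++n == 2) exit; next } n == 1' "$tmp/file.txt")

printf 'attempt:\n%s\n' "$attempt"
printf 'first block:\n%s\n' "$first_block"
rm -rf "$tmp"
```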

Sed/awk: Aligning words in a file

I have a file with the following structure:
# #################################################################
# TEXT: MORE TEXT
# TEXT: MORE TEXT
# #################################################################
___________________________________________________________________
ITEM 1
___________________________________________________________________
PROPERTY1: VALUE1_1
PROPERTY222: VALUE2_1
PROPERTY33: VALUE3_1
PROPERTY4444: VALUE4_1
PROPERTY55: VALUE5_1
Description1: Some text goes here
Description2: Some text goes here
___________________________________________________________________
ITEM 2
___________________________________________________________________
PROPERTY1: VALUE1_2
PROPERTY222: VALUE2_2
PROPERTY33: VALUE3_2
PROPERTY4444: VALUE4_2
PROPERTY55: VALUE5_2
Description1: Some text goes here
Description2: Some text goes here
I want to add another item to the file, using sed or awk:
sed -i -r "\$a$PROPERTY1: VALUE1_3" file.txt
sed -i -r "\$a$PROPERTY2222: VALUE2_3" file.txt
etc. So my next item looks like this:
___________________________________________________________________
ITEM 3
___________________________________________________________________
PROPERTY1: VALUE1_3
PROPERTY222: VALUE2_3
PROPERTY33: VALUE3_3
PROPERTY4444: VALUE4_3
PROPERTY55: VALUE5_3
Description1: Some text goes here
Description2: Some text goes here
The column values are jagged. How do I align my values to the left like for previous items? I can see 2 solutions here:
To align the values while inserting them into the file.
To insert the values into the file the way I did it and align them next.
The command
sed -i -r "s|.*:.*|&|g" file.txt
catches the properties and values I want to align, but I haven't been able to align them properly, i.e.
awk '/^.*:.*$/{ printf "%-40s %-70s\n", $1, $2 }' file.txt
It prints out the file, but it includes the description values and tags, and it cuts the values if they include spaces or dashes. It's just a big mess.
I've tried more commands based on what I've found on Stack Overflow and some blogs, but nothing does what I need.
Note: Values of the description tags are not jagged; this is because I write them to the file in a separate way.
What is wrong with my commands? How do I achieve what I need?
When your file is without tabs, try this:
sed -r 's/: +/:\t/' file.txt | expand -20
When this works, redirect the output to a tmpfile and move the tmpfile to file.txt.
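The whole round trip, including the temporary file, might look like the sketch below. The sample lines are made up, and a literal tab is built with printf so the command does not depend on sed understanding \t in the replacement (an assumption about portability, since that escape is a GNU extension).

```shell
tmp=$(mktemp -d)
# Hypothetical jagged input lines.
printf '%s\n' 'PROPERTY1: VALUE1_3' 'PROPERTY4444:   VALUE4_3' > "$tmp/file.txt"

tab=$(printf '\t')    # literal tab character
# Collapse ": <spaces>" to ":<tab>", expand tabs to 20-column stops,
# and replace the original file only if the pipeline succeeded.
sed -E "s/: +/:${tab}/" "$tmp/file.txt" | expand -t 20 > "$tmp/file.tmp" &&
  mv "$tmp/file.tmp" "$tmp/file.txt"

aligned=$(cat "$tmp/file.txt")
printf '%s\n' "$aligned"
rm -rf "$tmp"
```

With a tab stop of 20, every value starts in column 21 as long as the property name plus colon is shorter than 20 characters; longer names would jump to the next stop.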
You can use gensub (a GNU awk function) and thoughtful field separators to take care of this:
for i in {1..5}; do
echo $(( 10 ** i )): $i;
done | awk -F ':::' '/^[^:]+:.+/{
$0 = gensub(/: +/, ":::", $0 );
key=( $1 ":" );
printf "%-40s %s\n", key, $2;
}'
The relevant part being where we swap out ": +" for just ":::" and then do a printf to bring it back together.
You could use \t to insert tabs (rather than spaces which is why you get 'jagged' values)
instead of
sed -i -r "\$a$PROPERTY1: VALUE1_3" file.txt
use
sed -i -r "\$a$PROPERTY1:\t\tVALUE1_3" file.txt
All you need to do is remember the existing indentation when inserting the new line, e.g.:
echo 'PROPERTY732: VALUE9_8_7' |
awk -v prop="PROPERTY1" -v val="VALUE1_3" '
match($0,/^PROPERTY[^[:space:]]+[[:space:]]+/) { wid=RLENGTH }
{ print }
END { printf "%-*s%s\n", wid, prop":", val }
'
PROPERTY732: VALUE9_8_7
PROPERTY1:   VALUE1_3
but it's not clear that adding 1 line at a time makes sense or where all of the other text you're adding is coming from.
The above will work with any awk on any UNIX system.
If your "properties" don't actually start with the word PROPERTY then you just need to edit your question to show more realistic sample input/output and tell/show us how to distinguish a PROPERTY line from a Description line and, again, the solution will be trivial with awk.

Shell script - new line after comma?

I have a shell script I wrote that grabs a list of names from a location, and each name is separated by a comma. I was wondering if there is anything I can write to make the list of names that gets stored in the text file break onto a new line after each comma?
For example the list of names that gets stored in the text file look like this:
"Red", "Blue", "Green"
And I want them to look like this:
Red
Blue
Green
The data gets pulled from html code off a website so they have quotations and commas around them, if it's possible to at least format them to a new line, that would be great. Thanks if you help.
Assuming the comma-separated data is in the variable $data, you can tell bash to split it by setting $IFS (the list separator variable) to ', ' and using a for loop:
TMPIFS=$IFS #Stores the original value to be reset later
IFS=', '
: > new_file #Ensures the new file is empty, omit if the file already has contents
for item in $data; do
item=${item//'"'/} #Remove double quotes from entries, use item=${item//"'"/} to remove single quotes
echo "$item" >> new_file #Appends each item to the new file, automatically starts on a new line
done
IFS=$TMPIFS #Reset $IFS in case other programs rely on the default value
This will give you the output in the desired format. Note that IFS=', ' splits on both commas and spaces, so entries that themselves contain spaces would be broken up.
Just use sed.
% echo '"Red", "Blue", "Green"' | sed -e 's/\"//g' -e 's/, /\n/g'
Red
Blue
Green
awk -F, '{for(i=1;i<=NF;i++){ print $i;}}'
see below command line:
kent$ echo '"Red", "Blue", "Green"'|sed 's/, /\n/g'
"Red"
"Blue"
"Green"
\n stands for a newline, so each ", " separator is replaced with a line break.
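For systems where sed does not accept \n in the replacement, a minimal sketch using tr does the same job. The variable name data and its sample value are assumptions taken from the example in the question.

```shell
data='"Red", "Blue", "Green"'    # sample value from the question

names=$(printf '%s\n' "$data" |
  tr -d '"' |          # drop the double quotes
  tr ',' '\n' |        # put each entry on its own line
  sed 's/^ *//')       # trim the space that followed each comma

printf '%s\n' "$names"
```

Redirecting the final printf to a file (for example with > names.txt, a hypothetical name) stores one name per line.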

Remove lines with duplicate cells

I need to remove lines with a duplicate value. For example I need to remove lines 2 and 4 in the block below because they contain "Value04" - I cannot remove all lines containing Value03 because there are lines with that data that are NOT duplicates and must be kept. I can use any editor; excel, vim, any other Linux command lines.
In the end there should be no duplicate "UserX" values. User1 should only appear 1 time. But if User1 exists twice, I need to remove the entire line containing "Value04" and keep the one with "Value03"
Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4
Your ideas and thoughts are greatly appreciated.
Edit: For clarity and leaving words out from the editing process.
The following Awk command removes all but the first occurrence of a value in the third column:
$ awk -F',' '{
if (!seen[$3]) {
seen[$3] = 1
print
}
}' textfile.txt
Output:
Value01,Value03,User1
Value01,Value03,User2
Value01,Value03,User3
Value01,Value03,User4
same thing in Perl:
perl -F, -nae 'print unless $c{$F[2]}++;' textfile.txt
this uses autosplit mode: "-F, -a" splits by comma and places the result into the @F array
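The awk version is often written in a shorter form, !seen[$3]++, which is true only the first time a given third-column value appears. The sketch below runs it on the sample data from the question. Like the longer form, it keeps whichever line comes first for each user, so it relies on the "Value03" line preceding the "Value04" line, as it does in the sample.

```shell
tmp=$(mktemp -d)
cat > "$tmp/textfile.txt" <<'EOF'
Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4
EOF

# seen[$3]++ is 0 (false) on first sight and positive afterwards,
# so negating it prints only the first line for each user.
deduped=$(awk -F',' '!seen[$3]++' "$tmp/textfile.txt")

printf '%s\n' "$deduped"
rm -rf "$tmp"
```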
