replace text between two tabs - sed - linux

I have the following input file:
text1 text2 text3 text4
abc1 abc2 abc3 abc4
and I am trying to find the second string between the two tabs (e.g. text2, abc2) and replace it with another word.
I have tried with
sed s'/\t*\t/sample/1'
but it only deletes the tab and does not replace the word.
I appreciate any help!

I would suggest using awk here:
awk 'BEGIN { FS = OFS = "\t" } { $2 = "sample" } 1' file
Set the input and output field separators to a tab and change the second field. The 1 at the end is always true, so awk does the default action, { print }.
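For example, on a tab-separated copy of the question's first line (the printf here just stands in for the real file):

```shell
# Real tabs between the fields, as in the question's file
printf 'text1\ttext2\ttext3\ttext4\n' |
awk 'BEGIN { FS = OFS = "\t" } { $2 = "sample" } 1'
# prints text1, sample, text3, text4 separated by tabs
```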

Use this sed:
sed 's/\t[^\t]*\t/\tsample\t/'
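Checked against the second sample line (assuming GNU sed, which understands \t in both the pattern and the replacement):

```shell
printf 'abc1\tabc2\tabc3\tabc4\n' |
sed 's/\t[^\t]*\t/\tsample\t/'
# prints abc1, sample, abc3, abc4 separated by tabs
```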

An alternative in gawk, since the question is tagged awk:
gawk -- 'BEGIN {FS="\t"; OFS="\t"} {$2="sample"; print}'
For example,
echo -e 'a\tb\tc\td' | gawk -- 'BEGIN {FS="\t"; OFS="\t"} {$2="sample"; print}'
prints
a sample c d
The FS breaks input at tabs, OFS separates output fields using tabs, and $2="sample" changes only the second field, leaving the rest of the fields unchanged.

Try this
sed -e 's/\([a-zA-Z0-9]*\)\t[a-zA-Z0-9]*\t\([a-zA-Z0-9]*\)\t\([a-zA-Z0-9]*\)/\1\tsample\t\2\t\3/'

In GNU sed v4.2.2 I had to use -r:
sed -r 's/^([^\t]*\t)[^\t]*/\1sample/'
The ^([^\t]*\t) is the first field plus the first tab, and the [^\t]* is the text of the second field. The \1 restores the first field, and sample is whatever you want :).
For example,
echo -e 'a\tb\tc\td' | sed -r 's/^([^\t]*\t)[^\t]*/\1sample/'
prints
a sample c d
This also works for other than four columns. For example
$ echo -e 'a\tb\tc' | sed -r 's/^([^\t]*\t)[^\t]*/\1sample/'
a sample c
$ echo -e 'a\tb\tc\td\te' | sed -r 's/^([^\t]*\t)[^\t]*/\1sample/'
a sample c d e


How to convert to title case a specific column

I have come up with this code:
cut -d';' -f4 columns.csv | sed 's/.*/\L&/; s/[a-z]*/\u&/g'
which actually does the job for the fourth column, but in the process I have lost the other columns.
I have unsuccessfully tried:
cut -d';' -f4 columns.csv | sed -i 's/.*/\L&/; s/[a-z]*/\u&/g'
So, how could I apply the change to that specific column in the file and keep other columns as they are?
Let's say that columns.csv content is:
TEXT;more text;SoMe MoRe TeXt;THE FOURTH COLUMN;something else
Then, expected output should be:
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
GNU sed:
sed -ri 's/;/&\r/3;:1;s/\r([^; ]+\s*)/\L\u\1\r/;t1;s/\r//' columns.csv
update:
sed -i 's/; */&\n/3;:1;s/\n\([^; ]\+ *\)/\L\u\1\n/;t1;s/\n//' columns.csv
Place an anchor \r (or \n) at the beginning of field 4. Each substitution title-cases one whole word and moves the anchor to the beginning of the next one; the jump t1 back to label :1 repeats as long as the substitution command finds a match. Finally the anchor is removed.
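Running the \n version on the sample line, without -i, makes the effect visible (GNU sed only, because of the \L and \u case conversions):

```shell
echo 'TEXT;more text;SoMe MoRe TeXt;THE FOURTH COLUMN;something else' |
sed 's/; */&\n/3;:1;s/\n\([^; ]\+ *\)/\L\u\1\n/;t1;s/\n//'
# → TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
```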
Not a short simple awk, but should work:
awk -F";" '{t=split($4,a," ");$4="";for(i=1;i<=t;i++) {a[i]=substr(a[i],1,1) tolower(substr(a[i],2));$4=$4 sprintf("%s ",a[i])}$4=substr($4,1,length($4)-1)}1' OFS=";" file
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
A shorter version:
awk -F";" '{t=split($4,a," ");$4="";for(i=1;i<=t;i++) {a[i]=substr(a[i],1,1) tolower(substr(a[i],2));$4=$4 a[i](t==i?"":" ")}}1' OFS=";" file
With perl:
$ perl -F';' -lane '$F[3] =~ s/[a-z]+/\L\u$&/gi; print join ";", @F' columns.csv
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
-F';' use ; to split the input line
$F[3] =~ s/[a-z]+/\L\u$&/gi change case only for the 4th column
print join ";", @F print the modified fields
Unicode version:
perl -Mopen=locale -Mutf8 -F';' -lane '$F[3]=~s/\p{L}+/\L\u$&/gi;
print join ";", @F'
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS=";" }
{
    title = ""
    numWords = split($4,words,/ /)
    for (wordNr=1; wordNr<=numWords; wordNr++) {
        word = words[wordNr]
        word = toupper(substr(word,1,1)) tolower(substr(word,2))
        title = (wordNr>1 ? title " " : "") word
    }
    $4 = title
    print
}
$ awk -f tst.awk file
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
True capitalization in a title is much more complicated than that though.
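The same script also works as a one-liner on stdin, shown here on the sample line:

```shell
echo 'TEXT;more text;SoMe MoRe TeXt;THE FOURTH COLUMN;something else' |
awk 'BEGIN { FS=OFS=";" }
{
    title = ""
    numWords = split($4, words, / /)
    for (wordNr = 1; wordNr <= numWords; wordNr++) {
        word = words[wordNr]
        # uppercase the first letter, lowercase the rest
        word = toupper(substr(word,1,1)) tolower(substr(word,2))
        title = (wordNr > 1 ? title " " : "") word
    }
    $4 = title
    print
}'
# → TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
```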
This might work for you (GNU sed):
sed -E 's/[^;]*/\n&\n/4;h;s/\S*/\L\u&/g;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Delimit the fourth field by newlines and make a copy.
Uppercase the first character of each word.
Append the amended line to the original.
Using pattern matching, replace the original fourth field by the amended one.

grep with two or more words, one line by file with many files

Hi everyone. I have file 1.log:
text1 value11 text
text text
text2 value12 text
file 2.log:
text1 value21 text
text text
text2 value22 text
I want:
value11;value12
value21;value22
For now I grep the values into separate files and paste them together afterwards, but I think this is not a very elegant solution because I need to read all the files more than once. So I tried to extract all the data with a single cat | grep pipeline, but the result is not what I expected.
I use:
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
or
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | xargs
but I get in each case:
value11;value12;value21;value22
value11 value12 value21 value22
Thank you so much.
Try:
$ awk -v RS='[[:space:]]+' '$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
For those who prefer their commands spread over multiple lines:
awk -v RS='[[:space:]]+' '
    $0=="text1" || $0=="text2" {
        getline
        printf "%s%s",sep,$0
        sep=";"
    }
    ENDFILE {
        if (sep) print ""
        sep=""
    }' *.log
How it works
-v RS='[[:space:]]+'
This tells awk to treat any sequence of whitespace (newlines, blanks, tabs, etc) as a record separator.
$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"}
This tells awk to look for records that match either text1 or text2. For those records, and those records only, the commands in curly braces are executed. Those commands are:
getline tells awk to read in the next record.
printf "%s%s",sep,$0 tells awk to print the variable sep followed by the word in the record.
After we print the first match, the command sep=";" is executed which tells awk to set the value of sep to a semicolon.
As we start each file, sep is empty. This means that the first match from any file is printed with no separator preceding it. All subsequent matches from the same file will have a ; to separate them.
ENDFILE{if(sep)print""; sep=""}
After the end of each file is reached, we print a newline if sep is not empty and then we set sep back to an empty string.
Alternative: Printing the second word if the first word ends with a number
In an alternative interpretation of the question (hat tip: David C. Rankin), we want to print the second word on any line for which the first word ends with a number. In that case, try:
$ awk '$1~/[0-9]$/{printf "%s%s",sep,$2; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
In the above, $1~/[0-9]$/ selects the lines for which the first word ends with a number and printf "%s%s",sep,$2 prints the second field on that line.
Discussion
The original command was:
$ cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
value11;value12;value21;value22;
Note that, with most Unix commands, cat is rarely needed. In this case, for example, grep accepts a list of files, so we can easily do without the extra cat process and get the same output:
$ grep -hoP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" *.log | tr '\n' '; '
value11;value12;value21;value22;
I agree with @John1024, and how you approach this problem will really depend on what the actual text you are looking for is. If, for instance, your lines of concern start with text{1,2,...} and the second field can be anything, then his approach is optimal. However, if the values in the first field can vary and what you are really interested in is records with valueXX in the second field, then an approach keying off the second field may be what you are looking for.
Taking your second field as an example: if the text you are interested in has the form valueXX (where XX is two or more digits at the end of the field), you can process only the records whose second field matches, then use a simple conditional test on FNR == 1 to control the ';' delimiter output and ENDFILE to control the newline, similar to:
awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
    printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
    print ""
}' file1.log file2.log
Example Use/Output
$ awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
    printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
    print ""
}' file1.log file2.log
value11;value12
value21;value22
Look things over and consider your actual input files and then either one of these two approaches should get you there.
If I understood you correctly, you want the values but search for text[12], i.e. you want the word after the matching search word, not the match itself:
$ awk -v s="^text[12]$" '              # set the search regex *
FNR==1 {                               # at the beginning of each file
    b=b (b==""?"":"\n")                # terminate the previous file buffer with a newline
}
{
    for(i=1;i<NF;i++)                  # iterate over all but the last word
        if($i~s)                       # if the current word matches the search pattern
            b=b (b~/^$|\n$/?"":";") $(i+1)  # append the following word to the buffer
}
END {                                  # after searching all files
    print b                            # output the buffer
}' *.log
Output:
value11;value12
value21;value22
* regex could be for example ^(text1|text2)$, too.

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst whose path I have stored in a variable. Its line contains several fields; I want to grep the second field and, from it, cut the part from expdp to .dmp and store the result in a variable.
example:-
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I Have tried below command :-
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
Following awk may help you on same.
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: Simply make both space and / the field separators (as per your shown Input_file) and then print the 6th field of the line, which is what the OP requires.
To see the field number and field's value you could use following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
Using sed with one of these regex
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst captures the non-space characters after the last /, printing only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst Same as above, but using a different delimiter, which avoids escaping the /. -r enables extended regex, so the capture parentheses need no escaping either.
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst in two steps: remove everything up to the last /, then remove from the first space to the end. (Probably the easiest to read and understand.)
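A quick check of the two-step variant against the sample line from the question (the line variable stands in for the file):

```shell
line='34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'
printf '%s\n' "$line" |
sed -e 's|.*/||' -e 's|[[:space:]].*||'
# → expdp_TEST_P119_*_18112017.dmp
```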
If you want to avoid an external command (sed), use shell parameter expansion instead:
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
(Quote "$myvar" when echoing: the value contains a *, which the shell would otherwise try to glob-expand.)

Replace string in a file from a file [duplicate]

This question already has answers here:
Difference between single and double quotes in Bash
(7 answers)
Closed 5 years ago.
I need help with replacing a string in a file where "from"-"to" strings coming from a given file.
fromto.txt:
"TRAVEL","TRAVEL_CHANNEL"
"TRAVEL HD","TRAVEL_HD_CHANNEL"
"FROM","TO"
First column is what to I'm searching for, which is to be replaced with the second column.
So far I wrote this small script:
while read p; do
    var1=`echo "$p" | awk -F',' '{print $1}'`
    var2=`echo "$p" | awk -F',' '{print $2}'`
    echo "$var1" "AND" "$var2"
    sed -i -e 's/$var1/$var2/g' test.txt
done <fromto.txt
Output looks good (x AND y), but for some reason it does not replace the first column ($var1) with the second ($var2).
test.txt:
"TRAVEL"
Output:
"TRAVEL" AND "TRAVEL_CHANNEL"
sed -i -e 's/"TRAVEL"/"TRAVEL_CHANNEL"/g' test.txt
"TRAVEL HD" AND "TRAVEL_HD_CHANNEL"
sed -i -e 's/"TRAVEL HD"/"TRAVEL_HD_CHANNEL"/g' test.txt
"FROM" AND "TO"
sed -i -e 's/"FROM"/"TO"/g' test.txt
$ cat test.txt
"TRAVEL"
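The linked duplicate explains the failure: variables are not expanded inside single quotes, so sed searched for the literal string $var1. A minimal sketch of the corrected loop, with double quotes and read doing the field splitting (the files below are throwaway copies of the question's data):

```shell
# Throwaway copies of the question's files
printf '%s\n' '"TRAVEL","TRAVEL_CHANNEL"' '"TRAVEL HD","TRAVEL_HD_CHANNEL"' '"FROM","TO"' > fromto.txt
printf '%s\n' '"TRAVEL"' > test.txt

# Double quotes let the shell expand $from and $to before sed sees them
while IFS=',' read -r from to; do
    sed -i -e "s/$from/$to/g" test.txt
done < fromto.txt

cat test.txt
# → "TRAVEL_CHANNEL"
```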
input:
➜ cat fromto
TRAVEL TRAVEL_CHANNEL
TRAVELHD TRAVEL_HD
➜ cat inputFile
TRAVEL
TRAVELHD
The work:
➜ awk 'BEGIN{while(getline < "fromto") {from[$1] = $2}} {for (key in from) {gsub(key,from[key])} print}' inputFile > output
and output:
➜ cat output
TRAVEL_CHANNEL
TRAVEL_CHANNEL_HD
➜
This first (BEGIN{}) block loads your mapping file into an associative array: from["TRAVEL"] = "TRAVEL_CHANNEL". It then rather inefficiently performs search and replace, line by line, for each array element in the input file, outputting the results, which I piped to a separate output file.
The caveat, you'll notice, is that the replacements can interfere with each other; the 2nd line of output is a perfect example: TRAVELHD first becomes TRAVEL_HD, and then the TRAVEL replacement fires again inside it, giving TRAVEL_CHANNEL_HD. You can try ordering your replacements differently, or use an anchored regex instead of a plain gsub. awk arrays are not guaranteed to iterate in any particular order, though. Something to get you started, anyway.
2nd caveat. There's a way to do the gsub for the whole file as the 2nd step of your BEGIN and probably make this much faster, but I'm not sure what it is.
You can't do this in one shot with variables inside single quotes; you have to let the shell expand them within a script. Alternatively, something like the sed command below generates a sed script from the mapping file and applies it in a single pass:
-bash-4.4$ cat > toto.txt
1
2
3
-bash-4.4$ cat > titi.txt
a
b
c
-bash-4.4$ sed 's|^\s*\(\S*\)\s*\(.*\)$|/^\2\\>/s//\1/|' toto.txt | sed -f - titi.txt > toto.txt
-bash-4.4$ cat toto.txt
a
b
c
-bash-4.4$

Adding String at specific postition of a delimiter for hundred thousand lines

One text file:
1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;#;
17;18;19;20;21;22;23;24;25;26;27;28;29;30;31;32;#;
etc..
Want to add '/c_h/' to the 15th column of every line
Want the results to be:
1;2;3;4;5;6;7;8;9;10;11;12;13;14;/c_h/15;16;#;
17;18;19;20;21;22;23;24;25;26;27;28;29;30;/c_h/31;32;#;
etc..
I was using awk but couldn't figure it out
Try this command:
awk -v FS=";" -v OFS=";" ' { $15 = "/c_h/"$15 ; print $0 } ' file.txt
First of all, you need to set your input field separator and output field separator to be ";" in this case. Then just modify the field you want.
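Applied to the question's first sample line:

```shell
echo '1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;#;' |
awk -v FS=";" -v OFS=";" ' { $15 = "/c_h/"$15 ; print $0 } '
# → 1;2;3;4;5;6;7;8;9;10;11;12;13;14;/c_h/15;16;#;
```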
sed -r -e 's/(([0-9]+;){14})/\1\/c_h\//' YourFileName
To do it in place (keeping a .bak backup):
sed -r -e 's/(([0-9]+;){14})/\1\/c_h\//' -i.bak YourFileName
