Substitute pattern with multiline text in bash script - linux

I have bash variable:
VAR="This is \n what i want"
and file with following content:
asdf
zxcv
qwer
I would like to substitute every occurrence of zxcv with the value of $VAR (multiline text).
How to do it with awk or sed?
I tried:
sed -i 's/zxcv/'$VAR'/g' filename
sed -i "s/zxcv/$VAR/g" filename
sed -i "s/zxcv/$(VAR)/g" filename
sed -i "s/zxcv/$(($VAR))/g" filename
sed -i "s/zxcv/`echo -n $VAR`/g" filename

Using gnu-awk, you can do this by passing variable's value on command line:
var=$'This is \n what i want'
awk -v var="$var" -v w='zxcv' 'n=index($0, w){
$0 = substr($0, 1, n-1) var substr($0, n + length(w))} 1' file
asdf
This is
what i want
qwer
If the search word appears on a line of its own (as shown in the question), the command can be simplified to this:
awk -v var="$var" -v w='zxcv' 'index($0, w)==1{$0 = var} 1' file
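The index() version above replaces only the first match on each line, and -v interprets backslash escapes in the value. A minimal sketch of one way around both, assuming the replacement is passed through the environment (the variable name repl and the sample input are made up for illustration):

```shell
# Sample data standing in for the file from the question (assumed content).
input=$(printf 'asdf\nzxcv and zxcv\nqwer')
var=$(printf 'This is \n what i want')

# Pass the replacement through the environment so awk takes it literally;
# -v would process backslash escapes in the value.
result=$(printf '%s\n' "$input" | repl="$var" awk -v w='zxcv' '
{
    out = ""; rest = $0
    # Replace every literal occurrence of w, left to right.
    while ((n = index(rest, w)) > 0) {
        out  = out substr(rest, 1, n - 1) ENVIRON["repl"]
        rest = substr(rest, n + length(w))
    }
    print out rest
}')
printf '%s\n' "$result"
```

Because index() does a plain string search, neither the search word nor the replacement is treated as a regex.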

You can use GNU sed as well:
$ var="This is \n what i want"
$ sed "s/^zxcv/$var/" file
asdf
This is
what i want
qwer
The key is using double quotes versus single quotes so the value of $var is inserted into the sed replacement string.
(btw, as convention, use lower case in Bash for user variables.)
From comments: Ed Morton's version is way better by using string concatenation and only exposing the variable component to shell interpretation:
$ sed 's/^zxcv/'"$var"'/' file
Use that
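Both sed commands rely on the value containing no /, & or backslash, and on GNU sed turning \n into a newline. If the value can contain arbitrary text or real newlines, it probably needs escaping first. A rough sketch, where the variable escaped and the sample value are assumptions for illustration:

```shell
# A value with characters that are special in a sed replacement,
# plus a real newline (assumed sample data).
var=$(printf 'a/b & c\\d\nsecond line')

# Escape backslash, slash and & for the replacement side of s///,
# then end every line except the last with a backslash so that sed
# accepts the embedded newline.
escaped=$(printf '%s\n' "$var" | sed -e 's|[\\/&]|\\&|g' -e '$!s|$|\\|')

result=$(printf 'asdf\nzxcv\nqwer\n' | sed "s/zxcv/$escaped/g")
printf '%s\n' "$result"
```

The search side (zxcv here) is still a regex, so a search word containing regex metacharacters would need its own escaping.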

Related

Replace pattern in one column bash

I have multiple *csv files that look like this when printed with cat:
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
And I am trying to wrangle the concatenated csv files into:
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
I did:
for i in $(ls *csv); do line=$(cat ${i} | grep -v "#" | cut -d'-' -f3); sed 's/*${line}*/${line}/g'; done
Yet no result showed up... Any advice of doing so? Thanks.
With awk and the logic of splitting each line by , then split their first field by -:
awk -v FS=',' -v OFS=',' 'NR > 1 { split($1,w,"-"); $1 = w[3] } 1' file.csv
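Since the question is about several csv files, the awk command can be wrapped in a loop. This is only a sketch: the scratch directory and the file a.csv are made up for the example, and each file is rewritten through a temporary file because plain awk has no in-place option.

```shell
# Work in a scratch directory with one sample file (assumed data).
dir=$(mktemp -d)
printf '%s\n' '#sample,time,N' \
    'SPH-01-HG00186-1_R1_001,8.33386,93' \
    'SPH-01-HG00266-1_R1_001,7.41229,93' > "$dir/a.csv"

for f in "$dir"/*.csv; do
    # Write to a temporary file first, then replace the original.
    awk -F, -v OFS=, 'NR > 1 { split($1, w, "-"); $1 = w[3] } 1' "$f" > "$f.tmp" &&
        mv "$f.tmp" "$f"
done

result=$(cat "$dir/a.csv")
printf '%s\n' "$result"
rm -r "$dir"
```

Globbing with "$dir"/*.csv avoids parsing the output of ls, which breaks on file names containing whitespace.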
With sed and a robust regex that cannot possibly modify the other fields:
sed -E 's/^([^,-]*-){2}([^,-]*)[^,]*/\2/' file.csv
# or
sed -E 's/^(([^,-]*)-){3}[^,]*/\2/' file.csv
Use this Perl one-liner:
perl -i -pe 's{.*?-.*?-(.*?)-.*?,}{$1,}' *.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i : Edit input files in-place (overwrite the input file). To save a backup copy of each original file first, use -i.bak instead, which appends the extension .bak to the original's name.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
You can use
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' file > newfile
Details:
-E - enabling the POSIX ERE regex flavor
^[^-]+-[0-9]+-([^-]+)[^,]+ - the regex pattern that searches for
^ - start of string
[^-]+ - one or more non-hyphen chars
- - a hyphen
[0-9]+ - one or more digits
- - a hyphen
([^-]+) - Group 1: one or more non-hyphens
[^,]+ - one or more non-comma chars
\1 - replace the match with Group 1 value.
See the online demo:
#!/bin/bash
s='SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93'
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' <<< "$s"
Output:
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
You can mangle text using bash parameter expansion, without resorting to external tools like awk and sed:
while IFS="," read -r -a line; do
x="${line[0]%-*}"
x="${x##*-}"
printf "%s,%s,%s\n" "$x" "${line[1]}" "${line[2]}"
done < input.txt
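To see what the two expansions in the loop do, here they are applied to a single first field taken from the sample data (the variable names field and y are just for this illustration):

```shell
field='SPH-01-HG00186-1_R1_001'

x=${field%-*}    # remove the shortest suffix matching "-*"
y=${x##*-}       # remove the longest prefix matching "*-"

printf '%s\n' "$x" "$y"
```

The first expansion leaves SPH-01-HG00186 and the second leaves HG00186. A field without any hyphen, such as #sample in the header, passes through unchanged.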
Or you could do it with simple awk, as others have done.
awk 'NR==1{print; next} {print $3,$5,$6}' FS='[-,]' OFS=, < input.txt
If you need to use cut AT ANY PRICE, then I suggest the following solution. Let the content of file.txt be
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
then
head -1 file.txt && tail -6 file.txt | tr '-' ',' | cut --delimiter=',' --fields=3,5,6
gives output
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
Explanation: output the first line as-is using head, then pipe the last 6 lines through tr to replace each - with , and finally use cut with , as the delimiter to select the desired fields.
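The tail -6 above is tied to this particular file having six data lines. A sketch of the same idea using tail -n +2, which takes everything from the second line on regardless of length (the shortened sample input is an assumption for the example):

```shell
# Shortened sample standing in for file.txt (assumed data).
input=$(printf '%s\n' '#sample,time,N' \
    'SPH-01-HG00186-1_R1_001,8.33386,93' \
    'SPH-01-HG00266-1_R1_001,7.41229,93')

result=$(
    # Header line as-is.
    printf '%s\n' "$input" | head -n 1
    # Every line after the header: hyphens become commas, then pick fields.
    printf '%s\n' "$input" | tail -n +2 | tr '-' ',' | cut -d ',' -f 3,5,6
)
printf '%s\n' "$result"
```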
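With mawk, nawk, or gawk, let the field separator match the parts to discard, then rebuild each line with an empty OFS:
awk 'NF++' FS='^[^-]+-[^-]+-|-[^,]+' OFS= file.csv
Output: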
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93

How to add single quotes in a shell script using sed

Need help in making a sed script to find and replace user input along with single quotes. Input file admins.py:
Script:
read adminsid
while [[ $adminsid == "" ]];
do
echo "You did not enter anything. Please re-enter AdminID"
read adminsid
done
## Please enter Admin's ID
9999999999,8888888888,1111111111
## Script To Replace ADMIN_IDS = [] to ADMIN_IDS = ['9999999999,8888888888,1111111111'] in file
sed -i "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = ['$adminsid']|g" $file
## Current results:
ADMIN_IDS = ['9999999999,8888888888,1111111111']
## Expected results:
ADMIN_IDS = ['9999999999','8888888888','1111111111']
Assign the variable to the data
adminsid=9999999999,8888888888,1111111111
Then use sed -e (script) option to add the quoting, and square brackets.
echo "$adminsid" | sed -e "s/,/\',\'/g" -e "s/^/[\'/" -e "s/$/\']/"
or to apply the same edits to every line of a file (filename in $file):
sed -i "$file" -e "s/,/\',\'/g" -e "s/^/[\'/" -e "s/$/\']/"
You can do this with awk too:
Suppose you have assigned the variable as :
adminsid=9999999999,8888888888,1111111111
Then the solution:
echo "$adminsid"| awk -F"," -v quote="'" -v OFS="','" '$1=$1 {print "["quote $0 quote"]"}'
-F"," -v OFS="','" :: Replacing separator (,) with (',')
print "["quote $0 quote"]" :: Add single quotes(') and ([) and (]) to the begin and end of line
This might work for you (GNU sed & bash):
<<<"$adminsid" sed 's/[^,]\+/'\''&'\''/g;s/.*/[&]/'
Surround each run of non-comma characters with single quotes, then surround the entire string with square brackets.
Replace the , with ',' in the variable and add characters at the beginning and at the end.
sed "s/.*/['&']/" <<< "${adminsid//,/','}"
echo "('${adminsid//,/\\',\\'}')"
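To tie this back to the file edit from the question, the quoted list can be built first and then dropped into the original substitution. This is a sketch only: it assumes the IDs contain nothing but digits and commas, uses a scratch file with made-up content in place of admins.py, and the variable name list is hypothetical.

```shell
adminsid=9999999999,8888888888,1111111111

# Scratch copy standing in for admins.py (assumed content).
file=$(mktemp)
printf '%s\n' '# admins.py' 'ADMIN_IDS = []' > "$file"

# Put quote-comma-quote between the IDs; the outer quotes are added below.
list=$(printf '%s\n' "$adminsid" | sed "s/,/','/g")

# Write to a new file and move it into place, so the sketch does not
# depend on sed -i.
sed "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = ['$list']|" "$file" > "$file.new" &&
    mv "$file.new" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

Only the ADMIN_IDS line is touched, so commas elsewhere in the file are left alone.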

sed only print substring in a string

I am trying to get a substring in a string that is in a large line of data.
The regex (INC............) matches the substring I am trying to get the value of at https://regexr.com/, but I am unable to get the value of the substring into a variable or print it out.
The part of the string around this value is
......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......
I am getting the error char 26: unknown option to `s' when I try this or the entire string is printed out.
cat /tmp/file1 | sed -n 's/\(INC............\)/\1/p'
cat /tmp/file1 | sed -n 's/./*\(INC............).*/\1/'
Using sed, you need to remove what precedes and follows the string:
sed 's/.*\(INC............\).*/\1/' file
But you can also use grep, if your implementation supports the -o option:
grep -o 'INC............' file
Perl can be used, too:
perl -lne 'print $1 if /(INC............)/' file
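To get the value into a variable, as the question asks, command substitution can wrap any of these. A small sketch with grep -o, assuming the incident number is always INC followed by digits as in the sample (the variable names line and incident are made up):

```shell
# The fragment from the question, used here as stand-in input.
line='......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......'

# Keep only the first match in case the data contains several.
incident=$(printf '%s\n' "$line" | grep -o 'INC[0-9][0-9]*' | head -n 1)
printf '%s\n' "$incident"
```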
That looks like JSON. If it's got {braces} around it which you cut out before posting (tsk tsk), you should definitely use jq if it's available. That said, this page needs some awk!
POSIX (works everywhere):
awk 'match($0, /INC[^"]+/) {print substr($0, RSTART, RLENGTH)}' /tmp/file1
GNU (works on GNU/Linux):
gawk 'match($0, /INC[^"]+/, a) {print a[0]}' /tmp/file1
If you have more than one match per line (GNU):
gawk '{while(match($0=substr($0, RSTART+RLENGTH), /INC[0-9]+/, a)) print a[0]}' /tmp/file1

search a line that contain a special character using sed or awk

I wonder if there is a command in Linux that can help me to find a line that begins with "*" and contains the special character "|"
for example
* Date | Auteurs
Simply use:
grep -e '^\*.*|' "${filename}"
Or if you want to use sed:
sed -n '/^\*.*|/p' "${filename}"
Or the (GNU) awk equivalent, where the pipe must be escaped with a backslash because an unescaped | means alternation in awk regexes:
awk '/^\*.*\|/' "${filename}"
Where:
^ : start of the line
\*: a literal *
.*: zero or more of any character (except newline)
| : a literal pipe
NB: "${filename}": I've assumed you're using the command in a script, with the target file passed in a double-quoted variable as "${filename}". In an interactive shell, simply use the actual name of the file (or the path to it).
UPDATE (line numbers)
The commands above can be modified to also print the line number of each matched line. With grep it is as simple as adding the -n switch:
grep -ne '^\*.*|' "${filename}"
We obtain an output like this:
81806:* Date | Auteurs
To obtain exactly the same output from sed and awk we have to complicate the commands a little bit:
awk '/^\*.*\|/{print NR ":" $0}' "${filename}"
# the = print the line number, p the actual match but it's on two different lines so the second sed call
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
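A quick way to check that the three line-number variants agree is to run them on the same input. The sample lines below are invented for the test; only the second one starts with * and contains a pipe.

```shell
# Invented sample input: only line 2 should match.
sample=$(printf '%s\n' 'plain line' '* Date | Auteurs' '* no pipe here' 'x | y')

with_grep=$(printf '%s\n' "$sample" | grep -n '^\*.*|')
with_awk=$(printf '%s\n' "$sample" | awk '/^\*.*\|/ { print NR ":" $0 }')
# = prints the line number on its own line; the second sed joins the pair.
with_sed=$(printf '%s\n' "$sample" | sed -n '/^\*.*|/{=;p;}' | sed 'N;s/\n/:/')

printf '%s\n' "$with_grep" "$with_awk" "$with_sed"
```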

bash script to strip out some characters

Bash scripting. How can I get a simple while loop to go through a file with the content below and strip out every character from the T onward (including the T) using sed?
"2012-05-04T10:16:04Z"
"2012-04-05T15:27:40Z"
"2012-03-05T14:58:27Z"
"2011-11-29T15:04:09Z"
"2011-11-16T12:12:00Z"
Thanks
A simple awk command to do this:
awk -F '["T]' '{print $2}' file
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
With sed:
sed 's/"\|T.*//g' file
Here " matches a double quote, \| means "or", and T.* matches from the first T through the end of the line. Replacing the matched characters with an empty string gives the desired output.
Example:
$ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g'
2012-05-04
With bash builtins:
while IFS='T"' read -r a a b; do echo "$a"; done < filename
Output:
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
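Since the question asks for a while loop, here is a sketch of one that uses parameter expansion instead of calling sed for every line. The two input lines are taken from the question; the variable names are made up.

```shell
# Two of the lines from the question, standing in for the file.
input=$(printf '%s\n' '"2012-05-04T10:16:04Z"' '"2012-04-05T15:27:40Z"')

result=$(printf '%s\n' "$input" | while IFS= read -r line; do
    line=${line%%T*}    # drop everything from the first T onward
    line=${line#\"}     # drop the leading double quote
    printf '%s\n' "$line"
done)
printf '%s\n' "$result"
```

For a real file, the loop would read from it with done < filename instead of the pipe.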
