awk gsub in perl - linux

I have a file called test.txt with the following content:
cat test.txt
MOD_12345_67890
I need to get rid of "MOD_" in the file
I used
my $file = `awk '{gsub("MOD_", "");print}' test.txt`;
in Perl. It removes MOD_ and prints 12345_67890 as expected, but there is extra whitespace after the output.
My output statement is
my $new_file = $file . ".dat";
print "$new_file";
My output should be as follows.
12345_67890.dat
but I am getting this output:
12345_67890
.dat
.dat is printed on the next line.
Can anyone help me remove the whitespace or garbage after the numbers, so that the output is 12345_67890.dat?

Perl does no processing of its input by default. That means the trailing newline (here, the one at the end of the awk output captured by the backticks) is still there, unlike in Awk. You need to chomp the newline off the input line, or do something like s/$/.dat/. Or run Perl with the -l option, which removes newlines from input and adds them back on output.
Your question about gsub is not very clear. To remove a leading "MOD_", s/^MOD_//. To do both substitutions in one expression, perl -pe 's/^MOD_(.*)$/$1.dat/' test.txt. The parentheses capture their match into $1; additional parentheses create additional backreferences $2, $3, etc.

Just do:
$file =~ s/^MOD_//;
That will remove "MOD_" from the beginning of the string. To emulate awk's gsub function as you have written it, do:
$file =~ s/MOD_//g;
This will remove all occurrences of "MOD_" from anywhere in the string. There is no reason to use backticks (there is never a reason to use backticks instead of qx!) in this case.

Related

How to extract and replace columns with a multi-character delimiter?

I have a file with ^$ as the delimiter; the text looks like this:
tony^$36^$developer^$20210310^$CA
I want to replace the datetime.
I tried awk -F '\^\$' '{print $4}' file.txt | sed -i '/20210310/20221210/' , but it returns nothing. Then I tried just the awk part, and it also returns nothing. I guess it still treats the line as a whole and the delimiter doesn't work. Why is that, and how can I solve it?
A simple solution would be:
sed 's/\^\$/\n/g; s/20210310/20221210/g' -i file.txt
which modifies the file in place, putting each field on its own line.
If you want a different separator, change the \n in the command to, for example, a space or a comma.
And it will also replace the date in the file.
If you want to see the changes without actually modifying the file, remove the -i from the command.
When I run your awk command, I get these warnings:
awk: warning: escape sequence `\^' treated as plain `^'
awk: warning: escape sequence `\$' treated as plain `$'
That explains why your output is blank: the field delimiter is interpreted as the regular expression '^$', which matches a completely blank line (only). As a result, each non-blank line of input is without any field separators, and therefore has only a single field. $4 can be non-empty only if there are at least four fields.
You can fix that by escaping the backslashes:
awk -F '\\^\\$' '{print $4}' file.txt
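A quick way to check the doubled-backslash separator on the sample line from the question; split_report is a hypothetical helper that prints the field count followed by the fourth field:

```shell
#!/usr/bin/env bash

# awk's escape processing turns '\\^\\$' into the regex \^\$,
# which matches the literal two-character string "^$".
split_report() {
  printf '%s\n' "$1" | awk -F '\\^\\$' '{ print NF, $4 }'
}
```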
If all you want to do is print the modified datecodes by themselves, then that should get you going. However, the question ...
How to extract and replace columns with a multi-character delimiter?
... sounds like you may actually want to replace the datecode within each line, keeping the rest intact. In that case, it is a non-starter for the awk command to discard the other parts of the line. You have several options here, but two of the more likely are:
instead of sending field 4 out to sed for substitution, do the sub in the awk script, and then reconstitute the input line by printing all fields, with the expected delimiters. (This is left as an exercise.) OR
do the whole thing in sed:
sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/' file.txt
If you wanted to modify file.txt in-place then you could add the -i flag (which, on the other hand, is not useful in your original command, where sed's input is coming from a pipe rather than a file).
The -E option engages the POSIX extended regex dialect, which allows the given regex to be more readable (the alternative would require a bunch more \ characters).
Overall, presuming that there are five or more fields delimited by literal '^$' strings and that the fourth contains exactly "20210310", the expression matches the first three fields, including their trailing delimiters, and captures them all as group 1; matches the leading delimiter of the fifth field and the rest of the line and captures it as group 4; and replaces the whole line with group 1, followed by the new datecode, followed by group 4.
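One possible take on the awk option that was left as an exercise, as a sketch: assigning to $4 makes awk rebuild the record with OFS, so OFS is set to the literal ^$. The function names replace_date_awk and replace_date_sed are hypothetical; the sed function is simply the command from this answer, wrapped for comparison:

```shell
#!/usr/bin/env bash

# usage: replace_date_awk OLD NEW FILE
replace_date_awk() {
  awk -F '\\^\\$' -v OFS='^$' -v olddate="$1" -v newdate="$2" \
    '$4 == olddate { $4 = newdate } { print }' "$3"   # untouched lines print as-is
}

# usage: replace_date_sed FILE
replace_date_sed() {
  sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/' "$1"
}
```

Lines whose fourth field does not match are never modified, so awk prints them exactly as read, delimiters included.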

Select lines between two patterns using variables inside SED command

I'm new to shell scripting. My requirement is to retrieve the lines between two patterns. It works fine if I run it from the terminal without using variables inside the sed command, but the problem arises when I put the commands below in a file and try to execute it.
#!/bin/sh
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"
fileC=`cat test.log`
output=`echo $fileC | sed -e "n/\$word/$upto/p"`
printf '%s\n' "$output"
If I use the below cmd in the terminal it works fine
sed -n '/ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34/,/2017-01-03 23:00/ p' test.log
Please suggest a workaround.
If we put aside for a moment the fact that you shouldn't cat a file into a variable and then echo it for sed filtering, the reason your command is not working is that you're not quoting the file content variable, fileC, when echoing it. This collapses runs of whitespace into a single space, so you lose the newlines from the file, as well as multiple spaces, tabs, etc.
To fix it, you can write:
fileC=$(cat test.log)
output=$(echo "$fileC" | sed -n "/$word/,/$upto/p")
Note the double quotes around $fileC (and a fixed sed expression, similar to your second example). Without the quotes (try echo $fileC), $fileC is expanded (with the default IFS) into a series of words, each being one argument to echo, and echo just prints those words separated by a single space. Additionally, if the file contains globbing characters (like *), those patterns are also expanded. This is a common bash pitfall.
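A small demonstration of that pitfall, using a made-up two-line string in place of the log file contents:

```shell
#!/usr/bin/env bash

# Stand-in for the file contents: a double space and a newline.
text=$'first  line\nsecond line'

# Unquoted: split into words on the default IFS, then echo joins the
# words with single spaces, so the newline and the double space vanish.
unquoted=$(echo $text)

# Quoted: the string reaches echo as one argument, unchanged.
quoted=$(echo "$text")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```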
Much better would be to write it like this:
output=$(sed -n "/$word/,/$upto/p" test.log)
And if your patterns include sed metacharacters, you should really escape them before using them with sed, like this (the <<< here-string is a bash feature, so the script needs #!/bin/bash rather than #!/bin/sh):
escape() {
sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1";
}
output=$(sed -n "/$(escape "$word")/,/$(escape "$upto")/ p" test.log)
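A sketch that combines the escape function with the range command and checks it against a made-up log; print_between is a hypothetical name, and bash is required for <<<. The decoy line replaces each dot with X, which an unescaped pattern would still match because . matches any character:

```shell
#!/usr/bin/env bash

# escape() from the answer: wrap every character except ^ in [...]
# and backslash-escape ^, so the text is matched literally.
escape() {
  sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"
}

# Hypothetical wrapper: print from the first line containing $1 through
# the next line containing $2, reading the file named in $3.
print_between() {
  sed -n "/$(escape "$1")/,/$(escape "$2")/p" "$3"
}
```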
The correct approach will be something like:
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"
awk -v beg="$word" -v end="$upto" '$0==beg{f=1} f{print; if ($0==end) exit}' file
but until we see your sample input and output we can't know for sure what it is you need to match on (full lines, partial lines, all text on one line, etc.) or what you want to print (include delimiters, exclude one, exclude both, etc.).

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, and also a lot of extra stuff after "version":
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It's displaying every line, but I need only the specific fields mentioned, as shown below, ignoring the rest of the data and any other lines.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It discards everything up to the first comma (\K drops everything matched so far from the reported match) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
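To compare the three commands on the same input, here they are wrapped as functions (the names are hypothetical). The grep variant assumes a grep built with PCRE support, such as GNU grep:

```shell
#!/usr/bin/env bash

extract_grep() { grep -oP '.*?,\K[^}]+' "$1"; }                       # needs PCRE (-P)
extract_awk()  { awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' "$1"; }  # first }-field, minus the prefix
extract_sed()  { sed 's/[^,]*,\([^}]*\).*/\1/' "$1"; }                 # capture between first , and }
```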
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything .*, followed by partition and any number of non-} characters, then } map and anything else, grab the part from partition up to but not including the brace \(...\) as group 1." The replacement text is just group 1, \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understand, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to print only the lines where a substitution was made. The substitution itself replaces the whole line with the captured part you need.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.
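The same GNU sed command split into its steps so the intermediate pattern space can be inspected, as a sketch; show_step1, show_step2 and sandwich are hypothetical helpers, and y/\n/|/ is used only to make the inserted newlines visible. The steps have to run inside one sed invocation, because piping step 1 into a second sed would split the pattern space at the inserted newlines:

```shell
#!/usr/bin/env bash

# The three s commands from the answer, kept separate (GNU sed, -r).
s1='s/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g'  # isolate the fillings
s2='s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g'                                    # drop the bread
s3='s/,$//'                                                              # tidy up

show_step1() { sed -r "$s1; y/\n/|/" "$1"; }  # inserted newlines shown as |
show_step2() { sed -r "$s1; $s2" "$1"; }      # fillings joined, trailing comma left
sandwich()   { sed -r "$s1; $s2; $s3" "$1"; } # the full command
```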

Match a string that contains a newline using sed

I have a string like this one:
#
pap
which basically translates to a \t#\n\tpap and I want to replace it with:
#
pap
python
which translates to \t#\n\tpap\n\tpython.
I tried this with sed in a lot of ways, but it's not working, maybe because sed handles newlines differently. I tried:
sed -i "s/\t#\n\tpap/\t#\tpython\n\tpap/" /etc/freeradius/sites-available/default
...and many other ways, with no result. Any idea how I can do this replacement?
Try this line with gawk:
awk -v RS="\0" -v ORS="" '{gsub(/\t#\n\tpap/,"yourNEwString")}7' file
If you want to let sed handle newlines, you have to read the whole file first:
sed ':a;N;$!ba;s/\t#\n\tpap/NewString/g' file
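The slurp idiom with the question's actual replacement filled in, as a sketch for GNU sed (slurp_replace is a hypothetical name). :a defines a label, N appends the next line to the pattern space, and $!ba branches back to :a unless the last line has been read, so the final s command sees the whole input, embedded newlines included:

```shell
#!/usr/bin/env bash

# :a      label for the loop
# N       append "\n" plus the next input line to the pattern space
# $!ba    unless this is the last line, branch back to :a
# s///g   runs once, on the whole input
slurp_replace() {
  sed ':a;N;$!ba;s/\t#\n\tpap/&\n\tpython/g' "$@"
}
```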
This might work for you (GNU sed):
sed '/^\t#$/{n;/^\tpap$/{p;s//\tpython/}}' file
If a line contains only \t# print it, then if the next line contains only \tpap print it too, then replace that line with \tpython and print that.
A GNU sed solution that doesn't require reading the entire file at once:
sed '/^\t#$/ {n;/^\tpap$/a\\tpython'$'\n''}' file
/^\t#$/ matches comment-only lines (matching \t# exactly), in which case (only) the entire {...} expression is executed:
n loads and prints the next line.
/^\tpap$/ matches that next line against \tpap exactly.
in case of a match, a\\tpython will then output \n\tpython before the following line is read - note that the spliced-in newline ($'\n') is required to signal the end of the text passed to the a command (you can alternatively use multiple -e options).
(As an aside: with BSD sed (OS X), it gets cumbersome, because
Control chars. such as \n and \t aren't directly supported and must be spliced in as ANSI C-quoted literals.
Leading whitespace is invariably stripped from the text argument to the a command, so a substitution approach must be used: s//&\'$'\n\t'python'/ replaces the pap line with itself plus the line to append:
sed '/^'$'\t''#$/ {n; /^'$'\t''pap$/ s//&\'$'\n\t'python'/;}' file
)
An awk solution (POSIX-compliant) that also doesn't require reading the entire file at once:
awk '{print} /^\t#$/ {f=1;next} f && /^\tpap$/ {print "\tpython"} {f=0}' file
{print}: prints every input line
/^\t#$/ {f=1;next}: sets flag f (for 'found') to 1 if a comment-only line (matching \t# exactly) is found and moves on to the next line.
f && /^\tpap$/ {print "\tpython"}: if a line is preceded by a comment line and matches \tpap exactly, outputs extra line \tpython.
{f=0}: resets the flag that indicates a comment-only line.
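The awk answer wrapped as a filter for a quick check (awk_add_python is a hypothetical name). The test input includes a \tpap line that is not preceded by \t#, which should pass through without an extra line:

```shell
#!/usr/bin/env bash

# Print every line; after a \tpap line that directly follows a
# comment-only \t# line, also print \tpython.
awk_add_python() {
  awk '{print} /^\t#$/ {f=1;next} f && /^\tpap$/ {print "\tpython"} {f=0}' "$@"
}
```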
A couple of pure bash solutions:
Concise, but somewhat fragile, using parameter expansion:
in=$'\t#\n\tpap\n' # input string
echo "${in/$'\t#\n\tpap\n'/$'\t#\n\tpap\n\tpython\n'}"
Parameter expansion only supports patterns (wildcard expressions) as search strings, which limits the matching abilities:
Here the assumption is made that pap is followed by \n, whereas no assumption is made about what precedes \t#, potentially resulting in false positives.
If the assumption could be made that \t#\n\tpap is always enclosed in \n, echo "${in/$'\n\t#\n\tpap\n'/$'\n\t#\n\tpap\n\tpython\n'}" would work robustly; otherwise, see below.
Robust, but verbose, using the =~ operator for regex matching:
The =~ operator supports extended regular expressions on the right-hand side and thus allows more flexible and robust matching:
in=$'\t#\n\tpap' # input string
# Search string and string to append after.
search=$'\t#\n\tpap'
append=$'\n\tpython'
out=$in # Initialize output string to input string.
if [[ $in =~ ^(.*$'\n')?("$search")($'\n'.*)?$ ]]; then # perform regex matching
out=${out/$search/$search$append} # replace match with match + appendage
fi
echo "$out"
You can translate each \n to another character, apply sed, and then apply the reverse translation. With tr, the substitute must be a single-byte character, for instance \v (vertical tab, which is rarely used nowadays).
cat FILE|tr '\n' '\v'|sed 's/\t#\v\tpap/&\v\tpython/'|tr '\v' '\n'|sponge FILE
or, without sponge (from the moreutils package):
cat FILE|tr '\n' '\v'|sed 's/\t#\v\tpap/&\v\tpython/'|tr '\v' '\n' >FILE.bak && mv FILE.bak FILE
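The tr round trip as a filter, as a sketch: it assumes the input contains no vertical tabs of its own and that GNU sed is used (for \t and \v in the s command). vt_replace is a hypothetical name:

```shell
#!/usr/bin/env bash

# Map every newline to \v so the whole input is one "line" for sed,
# insert the extra line (first match only, as in the answer), then map back.
vt_replace() {
  tr '\n' '\v' | sed 's/\t#\v\tpap/&\v\tpython/' | tr '\v' '\n'
}
```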

how to read each line from a .dat file in unix?

trade.dat is my file which consists of lines of data.
I have to concatenate each line of that file with a comma (,).
Please help.
If you mean just add a comma to the end of each line:
sed 's/$/,/' <oldfile >newfile
If you mean join all lines together into one line, separating each with a comma:
awk '{printf "%s,",$0}' <oldfile >newfile
Or the more correct one without a trailing comma (thanks, #hacker, for pointing out the error):
awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}' <oldfile >newfile
If you want the output of any of those in a shell variable, simply use the $() construct, such as:
str=$(awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}' <oldfile)
I find it preferable to use $() rather than backticks since it lets me nest commands easily, something backticks can only do with awkward escaping.
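A sketch comparing the three commands from this answer on a small sample; the function names are hypothetical, and the functions read standard input instead of oldfile:

```shell
#!/usr/bin/env bash

append_comma()     { sed 's/$/,/'; }                                   # every line gets a trailing comma
join_trailing()    { awk '{printf "%s,",$0}'; }                        # one line, trailing comma, no newline
join_no_trailing() { awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}'; }  # one line, no trailing comma

# Capture the result in a shell variable with $(), as in the answer.
str=$(printf 'a\nb\nc\n' | join_no_trailing)
```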
Two obligatory perl versions (credit goes to William Pursell for the second one):
perl -i -p -e 'chomp($_); $_ = "$_,\n"' trade.dat
perl -i -p -e 's/$/,/' trade.dat
Note that
this does not make backups of the original file by default (use -i.bak for that).
this answer appends a comma to every line. To join all lines together into a single line, separated by commas, look at William Pursell's answer.
Try:
fullline=""
while IFS= read -r line
do
fullline="${fullline:+$fullline,}$line"
done < trade.dat
Then use $fullline to show your file concatenated.
Hope this helps ;p
perl -pe 's/\n/,/ unless eof'
First thing that comes into my head:
gawk -- '{ if(a) { printf ",%s",$0; } else { printf "%s",$0; a=1 } }' trade.dat
if I correctly understand what you want.
Answering the question in the title, one way to get each line in a variable in a loop in BASH is to:
while IFS= read -r line; do echo -n "$line",; done < file.dat
That will leave a trailing comma, but shows how to read each line.
But sed, awk, or perl solutions are clearly better suited to the problem described in the body of your question.
