Running awk on a file with regular expressions - Linux

I would like to find all occurrences of INPUT in a file, JUST INPUT. I have the following, but it finds everything that starts with INPUT:
awk '{for(i=1;i<=NF;i++){if($i~/^INPUT/){print $i}}}'
I would still like to support patterns, though: if I put INPUT* or INPUT? or INPUT. (any regular expression) in place of INPUT above, it should work for that too.
Anyone know how to fix the above to do that? Thanks.
I'm trying to do the following in a Perl script using $INPUT:
`awk '{for(i=1;i<=NF;i++){if($i~/^$INPUT$/){print $i}}}' $file`
but I can't get it to work. Any ideas?

If you want to use backticks, then escape all dollar signs (assuming you have something, e.g. 'INPUT', in $INPUT):
`awk '{for(i=1;i<=NF;i++){if(\$i~/^$INPUT\$/){print \$i}}}' $file | wc -l`;
awk can count the number of matches for you too (this one counts once per matching line; \y is GNU awk's word-boundary anchor):
`awk '/\y$INPUT\y/{s++} END{print s}' $file`;
and using native Perl, which is recommended:
my $cnt = 0;    # start at 0 so we print 0 when there are no matches
open my $f, "<", "input" or die("$!");
while (<$f>) {
    $cnt++ while /\bINPUT\b/g;    # count every whole-word occurrence on the line
}
close $f;
print $cnt;

The regular expression you use is anchored at the beginning (^) but not at the end ($). Try:
awk '{for(i=1;i<=NF;i++){if($i~/^INPUT$/){print $i}}}'
If you want to match INPUT anywhere in the field, try:
awk '{for(i=1;i<=NF;i++){if($i~/INPUT/){print $i}}}'
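To see the difference between the two, here is a quick test (sample.txt is a hypothetical test file):
$ printf 'INPUT INPUTX\nREINPUT INPUT\n' > sample.txt
$ awk '{for(i=1;i<=NF;i++){if($i~/^INPUT$/){print $i}}}' sample.txt
INPUT
INPUT
$ awk '{for(i=1;i<=NF;i++){if($i~/INPUT/){print $i}}}' sample.txt
INPUT
INPUTX
REINPUT
INPUT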

How to get the rest of the pattern using any Linux command?

I am trying to update a file and do some transformation using any Linux tool.
For example, here I am trying with awk.
It would be great to know how to get the rest of the pattern.
awk -F '/' '{print $1"/raw"$2}' <<< "string1/string2/string3/string4/string5"
string1/rawstring2
Here I don't know how many "/" there are, and I want to get the output:
string1/rawstring2/string3/string4/string5
Something like
awk -F/ -v OFS=/ '{ $2 = "raw" $2 } 1' <<< "string1/string2/string3/string4/string5"
Just modify the desired field and print out the changed line. (You have to set OFS so awk uses a slash instead of a space to separate fields on output, and the pattern 1 triggers the default action of printing $0. It's an idiom you'll see a lot with awk.)
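Running it shows the expected result:
$ awk -F/ -v OFS=/ '{ $2 = "raw" $2 } 1' <<< "string1/string2/string3/string4/string5"
string1/rawstring2/string3/string4/string5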
Also possible with sed:
sed -E 's|([^/]*/)|\1raw|' <<< "string1/string2/string3/string4/string5"
The \1 in the replacement string reproduces the part matched inside the parentheses, and raw is appended to it.
Equivalent to
sed 's|\([^/]*/\)|\1raw|' <<< "string1/string2/string3/string4/string5"

Replacing characters in each line of a file in Linux

I have a file with a different word on each line.
My goal is to replace the first character with a capital letter and the 3rd character with "#".
For example: football will be changed to Foo#ball.
I tried thinking about using awk and sed. It didn't help me, since (to my knowledge) sed needs an exact character as input, and awk can print the desired character but not change it.
With GNU sed and two s commands:
echo 'football' | sed -E 's/(.)/\U\1/; s/(...)./\1#/'
Output:
Foo#ball
See: 3.3 The s Command, 5.7 Back-references and Subexpressions, and 5.9.2 Upper/Lower case conversion in the GNU sed manual
This might work for you (GNU sed):
sed 's/\(...\)./\u\1#/' file
With bash you can use parameter expansions alone to accomplish the task. For example, if you read each line into the variable line, you can do:
line="${line^}" # change football to Football (capitalize 1st char)
line="${line:0:3}#${line:4}" # make 4th character '#'
Example Input File
$ cat file
football
soccer
baseball
Example Use/Output
$ while read -r line; do line="${line^}"; echo "${line:0:3}#${line:4}"; done < file
Foo#ball
Soc#er
Bas#ball
While the shell is typically slower, when use is limited to builtins it doesn't fall too far behind.
(note: your question says 3rd character, but your example replaces the 4th character with '#')
With GNU awk for the 3rd arg to match():
$ echo 'football' | awk 'match($0,/(.)(..).(.*)/,a){$0=toupper(a[1]) a[2] "#" a[3]} 1'
Foo#ball
Cyrus' or Potong's answers are the preferred ones. (For Linux or systems with GNU sed because of \U or \u.)
This is just an additional solution with awk, because you mentioned it and also used the awk tag:
$ echo 'football'|awk '{a=substr($0,1,1);b=substr($0,2,2);c=substr($0,5);print toupper(a)b"#"c}'
Foo#ball
This is a very simple solution without regex. It will also work on non-GNU awk.
This should work with any version of awk:
awk '{
  for(i=1;i<=NF;i++){
    # Note that string indexes start at 1 in awk !
    $i=toupper(substr($i,1,1)) substr($i,2,2) "#" substr($i,5)
  }
  print
}' file
Note: if a word is shorter than 4 characters, like it, it will simply have # appended and be printed as It#
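For example (the expected output follows from the substr arithmetic above):
$ echo 'football it' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) substr($i,2,2) "#" substr($i,5); print}'
Foo#ball It#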
If your data is in file 'd', tried on GNU sed:
sed -E 's/^(\w)(\w\w)\w/\U\1\E\2#/' d

How to use grep and sed to replace a substring after searching for a specific string?

I want to know how to use the two utilities 'grep' and 'sed', or something else, in order to replace a substring. I will explain what I want to do below.
We have the file 'test.txt' with the following string:
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='AA5', A6='keyword_A'
After searching for 'keyword_A' using grep, I want to replace the value of A5 with another string, for example "NEW".
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='NEW', A6='keyword_A'
I tried to use two commands like
grep keyword_A test.txt | sed -e 's/blabla/blabla/'
After trying everything I know, I gave up.
Please let me know the right solution.
First, you never need both grep and sed: sed has a full regular-expression search engine, so it is a superset of grep. This command will read test.txt, change the lines that you've indicated, and print the entire result on standard output:
sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/g" < test.txt
If you want to store the results back into the file test.txt, use the -i (in-place editing) switch to sed:
sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/g" -i.bak test.txt
If you want to select only the indicated lines, modify those, and print only those lines to standard out, use a combination of the p (print) command and the -n (no output) switch.
sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/gp" -n test.txt
Using grep+sed is always the wrong approach. Here's one way to do it with GNU awk:
$ awk '/keyword_A/{ $0=gensub(/(A5({[^}]+})?=\047)[^\047]+/,"\\1NEW",1) } 1' file
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='NEW', A6='keyword_A'
Using a couple of variables, you could define the keyword and replacement (if they change at all):
q="keyword_A"
r="NEW"
Then with sed:
sed -r "s/^(.+\{.+\}=')(.+)('.+"${q}".+)$/\1"${r}"\3/" file
Result:
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='NEW', A6='keyword_A'
A5="NEW"
A6="keyword_A"
# with sed
sed "s/='[^']*\(',[[:blank:]]*A6='${A6}'\)/='${A5}\1/" YourFile
# with awk
awk -F "'" -v A5="${A5}" -v A6="${A6}" '
BEGIN { OFS="\047" }
$12 == A6 { $10 = A5; $0 = $0 }
7
' YourFile
The sed version anchors the change near the end of the string; the awk version uses ' as the field separator instead of the traditional whitespace.
The awk version assumes there is no ' inside the values (otherwise the escaping would need to be handled).
We can just directly replace the fifth column when the string keyword_A is found, as shown below:
awk -F, 'BEGIN{OFS=",";}/keyword_A/{$5=" A5{ATTR}='"'"NEW"'"'"}1' filename
A couple of slight alternatives:
sed -r "/keyword_A/s/(A5[^']*')[^']*/\1NEW/" file
awk -F"'" '/keyword_A/{$10 = "NEW"}1' OFS="'" file
Of course the downside with awk is that, unlike sed -i, it has no standard in-place editing, so afterwards you would have to redirect to a new file and rename it.
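A sketch of that redirect-and-rename step (file.new is just a scratch name):
awk -F"'" '/keyword_A/{$10 = "NEW"}1' OFS="'" file > file.new && mv file.new file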

Linux scripting: Search a specific column for a keyword

I have a large text file that contains multiple columns of data. I'm trying to write a script that accepts a column number and keyword from the command line and searches for any hits before displaying the entire row of any matches.
I've been trying something along the lines of:
grep $fileName | awk '{if ($'$columnNumber' == '$searchTerm') print $0;}'
But this doesn't work at all. Am I on the right lines? Thanks for any help!
The -v option can be used to pass shell variables to the awk command.
The following may be what you're looking for:
awk -v s=$SEARCH -v c=$COLUMN '$c == s { print $0 }' file.txt
EDIT:
I am always trying to write more elegant and tighter code. So here's what Dennis means:
awk -v s="$search" -v c="$column" '$c == s { print $0 }' file.txt
Looks reasonable enough. Try using set -x to look at exactly what's being passed to awk. You can also use different and/or more awk things, including getting rid of the separate grep:
awk -v colnum=$columnNumber -v require="$searchTerm" \
    "/$fileName/ { if (\$colnum == require) print }"
which works by setting awk variables (colnum and require, in this case) and then using the literal string \$colnum (which awk sees as $colnum, i.e. the field numbered colnum) to get the desired field, and the variable require for the required string.
Note that in all cases (with or without the grep command), any regular-expression metacharacters in $fileName remain active; e.g., this.that will match the file named this.that but also one named thisXthat.
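If that matters, a common workaround is to backslash-escape the common BRE metacharacters in the variable first; a sketch (safeName is a hypothetical variable name):
safeName=$(printf '%s\n' "$fileName" | sed 's/[][\.*^$]/\\&/g')   # backslash-escape BRE metacharacters
and then use $safeName in place of $fileName in the pattern.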

How to read each line from a .dat file in Unix?

trade.dat is my file, which consists of lines of data.
I have to concatenate each line of that file with a comma (,).
Help me please.
If you mean just add a comma to the end of each line:
sed 's/$/,/' <oldfile >newfile
If you mean join all lines together into one line, separating each with a comma:
awk '{printf "%s,",$0}' <oldfile >newfile
Or the more correct one without a trailing comma (thanks, #hacker, for pointing out the error):
awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}' <oldfile >newfile
If you want the output of any of those in a shell variable, simply use the $() construct, such as:
str=$(awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}' <oldfile)
I find it preferable to use $() rather than backticks since it allows me to nest commands, something backticks can't do.
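For example, nesting one command substitution inside another needs no special escaping (a contrived sketch):
str=$(echo "line count: $(wc -l <oldfile)")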
Two obligatory perl versions (credit goes to William Pursell for the second one):
perl -i -p -e 'chomp($_); $_ = "$_,\n"' trade.dat
perl -i -p -e 's/$/,/' trade.dat
Note that
this does not make backups of the original file by default (use -i.bak for that).
this answer appends a comma to every line. To join all lines together into a single line, separated by commas, look at William Pursell's answer.
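For example, keeping a backup:
perl -i.bak -p -e 's/$/,/' trade.dat    # original saved as trade.dat.bak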
Try
fullline=""
for line in $(cat trade.dat)
do
    fullline="$fullline,$line"
done
And then use $fullline to show your file concatenated.
Hope this helps ;p
perl -pe 's/\n/,/ unless eof'
First thing that comes into my head:
gawk -- '{ if(a) { printf ",%s",$0; } else { printf "%s",$0; a=1 } }' trade.dat
if I correctly understand what you want.
Answering the question in the title, one way to get each line in a variable in a loop in BASH is to:
cat file.dat | while read line; do echo -n "$line",; done
That will leave a trailing comma, but shows how to read each line.
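If the trailing comma matters, a parameter expansion strips it (a small sketch):
out=$(cat file.dat | while read line; do echo -n "$line",; done)
echo "${out%,}"    # ${out%,} drops the trailing comma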
But clearly the sed, awk, or perl solutions are best suited to the problem described in the body of your question.
