What to do instead of the regex exp in awk - linux

Right now I am using awk to search for a string I put into the regular expression; however, I am trying to use a "defined variable" instead to make my code more reusable. The code I want to replace is:
awk '
/lksdfjsalfjl/ { counter++}
END{}
' file
While researching online I found out that a variable can't be used inside the /.../. Do you have any suggestions for an alternative?

Use ~ for regex matching:
awk -v x='lksdfjsalfjl' '$0~x {counter++} END{print counter+0}' file
In more detail:
-v x='lksdfjsalfjl'
This defines an awk variable x.
$0~x {counter++}
This increments the counter if the current line, $0, matches the regular expression in x.
You can, if you like, use a shell variable to set the awk variable:
a='lksdfjsalfjl'
awk -v x="$a" '$0~x {counter++} END{print counter+0}' file
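If the string should be matched literally rather than as a regular expression, awk's index() function does a plain substring test with no regex interpretation; a sketch of the same counter, under that assumption:
awk -v x='lksdfjsalfjl' 'index($0, x) {counter++} END{print counter+0}' file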

Related

How to get the rest of the Pattern using any linux command?

I am trying to update a file and do some transformation using any linux tool.
For example, here I am trying with awk.
It would be great to know how to get the rest of the pattern.
awk -F '/' '{print $1"/raw"$2}' <<< "string1/string2/string3/string4/string5"
string1/rawstring2
Here I don't know how many "/" there are, and I want to get the output:
string1/rawstring2/string3/string4/string5
Something like
awk -F/ -v OFS=/ '{ $2 = "raw" $2 } 1' <<< "string1/string2/string3/string4/string5"
Just modify the desired field and print out the changed line. (You have to set OFS so it uses a slash instead of a space to separate fields on output, and a pattern of 1 uses the default action of printing $0. It's an idiom you'll see a lot with awk.)
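The same idiom works if the field number itself comes from a shell variable; n here is a hypothetical variable holding the field to prefix:
n=2   # hypothetical: which field to prefix with "raw"
awk -F/ -v OFS=/ -v n="$n" '{ $n = "raw" $n } 1' <<< "string1/string2/string3/string4/string5"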
Also possible with sed:
sed -E 's|([^/]*/)|\1raw|' <<< "string1/string2/string3/string4/string5"
The \1 in the replacement string reproduces the bit inside the parentheses, and raw is appended to it.
Equivalent to
sed 's|\([^/]*/\)|\1raw|' <<< "string1/string2/string3/string4/string5"
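Because the substitution only touches the first field, it works no matter how many slashes follow, for example:
sed -E 's|([^/]*/)|\1raw|' <<< "a/b/c"
# prints: a/rawb/c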

Using awk command in Bash

I'm trying to loop an awk command using bash script and I'm having a hard time including a variable within the single quotes for the awk command. I'm thinking I should be doing this completely in awk, but I feel more comfortable with bash right now.
#!/bin/bash
index="1"
while [ $index -le 13 ]
do
awk "'"/^$index/ {print}"'" text.txt
done
Use the standard approach: the -v option of awk to set/pass the variable:
awk -v idx="$index" '$0 ~ "^"idx' text.txt
Here I have set the variable idx to the value of the shell variable $index. Inside awk, I have simply used idx as an awk variable.
$0 ~ "^"idx matches if the record starts with (^) whatever the variable idx contains; if so, print the record.
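Putting that into the loop from the question, a sketch of the corrected script (note the original loop never incremented index, so it would run forever):
#!/bin/bash
index=1
while [ "$index" -le 13 ]
do
    awk -v idx="$index" '$0 ~ "^"idx' text.txt
    index=$((index + 1))   # the missing increment
done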
awk '/'"$index"'/' text.txt
# A little play with the script part, where you split the awk command
# and sandwich the bash variable in between using double quotes.
# Note awk prints matching lines by default, so idiomatic awk omits the '{print}' too.
should do; alternatively, use grep like
grep "^$index" text.txt # Mind the double quotes (and the ^ anchor, to match only at the start of the line)
Note: -le is used for comparing numbers, so you may change index="1" to index=1.

Find and replace string containing forward slash in ksh using a variable

In a file (file1.txt) I have /path1/|value1 (a path, followed by a value). I need to find the line containing that (unique) path and then change the value. So the line should end up as: /path1/|value2.
The challenge is that the /path1/, value1 and value2 parts are all contained within variables.
When I don't use a variable, I can use (thanks to this page):
sed '/path1/s/value1/value2/g' file1.txt > copyfile1.txt
(This creates a copy of the original file, which I can later use to overwrite the original file with mv.)
This is just searching for path1. To search for /path1/ I can use:
sed '/\/path1\//s/value1/value2/g' file1.txt > copyfile1.txt
Using the answers to this question about extracting a substring I can put the /path1/, value1 and value2 parts into variables.
So my current code is:
sed '/'"${PATH}"'/s/'"${PREVIOUS_VALUE}"'/'"${NEW_VALUE}"'/g' file1.txt > copyfile1.txt
But this does not work because the PATH variable contains forward slashes. Using information from here I have tried first doing a substitution like this:
FORMATTED_PATH=$(echo "${PATH}" | sed 's/\//\/\//g')
first, and then used FORMATTED_PATH instead of PATH, but then the find and replace does not work (no error messages, the new file is empty). And in the logging FORMATTED_PATH = //path1// (which I think is correct).
How can I do this find and replace using variables containing forward slashes?
(I found out via this answer that I needed to close the single quote, use double quotes around the variable and then open the single quote again. But this does not help with the forward slashes.)
The code was so nearly right. Instead of:
FORMATTED_PATH=$(echo "${PATH}" | sed 's/\//\/\//g')
I should have had:
FORMATTED_PATH=$(echo "${PATH}" | sed 's/\//\\\//g')
This then produces the correct logging of: FORMATTED_PATH = \/path1\/
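Putting the corrected escaping together, a minimal sketch of the whole find-and-replace (the variable names are illustrative, and lowercase to avoid clobbering PATH, as noted below):
file_path='/path1/'        # illustrative values
old_value='value1'
new_value='value2'
formatted_path=$(echo "${file_path}" | sed 's/\//\\\//g')   # each / becomes \/
sed "/${formatted_path}/s/${old_value}/${new_value}/g" file1.txt > copyfile1.txt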
awk will work too:
awk -F '|' -v path="$path" -v new="$new_value" '{
    if ($1 == path) {print path FS new}
    else {print}
}' file1.txt > copyfile1.txt
Also, don't use all-caps for your shell variables: you have wiped out your shell's PATH variable, which is used to find programs.
Usually sed's s command (as in s///) supports using separators other than /. For example:
$ echo '/path1/|value1' | sed 's,\(/path1/|\).*,\1value2,'
/path1/|value2
$
This is very convenient when dealing with file pathnames which include / chars.
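The same trick extends to the address part of the script: preceding the opening delimiter with a backslash (\,RE, instead of /RE/) means the slashes in the variable need no escaping at all. A sketch, assuming none of the values contain a comma:
file_path='/path1/'        # illustrative values containing slashes
old_value='value1'
new_value='value2'
sed "\,${file_path},s,${old_value},${new_value},g" file1.txt > copyfile1.txt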

Linux scripting: Search a specific column for a keyword

I have a large text file that contains multiple columns of data. I'm trying to write a script that accepts a column number and keyword from the command line and searches for any hits before displaying the entire row of any matches.
I've been trying something along the lines of:
grep $fileName | awk '{if ($'$columnNumber' == '$searchTerm') print $0;}'
But this doesn't work at all. Am I on the right lines? Thanks for any help!
The -v option can be used to pass shell variables to the awk command.
The following may be what you're looking for:
awk -v s=$SEARCH -v c=$COLUMN '$c == s { print $0 }' file.txt
EDIT:
I am always trying to write more elegant and tighter code. So here's what Dennis means:
awk -v s="$search" -v c="$column" '$c == s { print $0 }' file.txt
Looks reasonable enough. Try using set -x to look at exactly what's being passed to awk. You can also use different and/or more awk things, including getting rid of the separate grep:
awk -v colnum="$columnNumber" -v require="$searchTerm" \
    "/$fileName/ { if (\$colnum == require) print }"
which works by setting awk variables (colnum and require, in this case) and then using the literal string $colnum to get the desired field, and the variable require to get the required-string.
Note that in all cases (with or without the grep command), any regular expression meta-characters in $fileName will act as meta-characters, e.g., this.that will match the file named this.that but also the file named thisXthat.
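If that matters, index() does a literal substring test instead, so a dot in the file name is not treated as a meta-character; a sketch using the same variables (and, like the version above, reading standard input):
awk -v fname="$fileName" -v colnum="$columnNumber" -v require="$searchTerm" \
    'index($0, fname) && $colnum == require'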

Simple linux script help

I have a text file with the following structure:
text1;text2;text3;text4
...
I need to write a script that gets 2 arguments: the column we want to search in and the content we want to find.
So the script should output only the lines (WHOLE LINES!) that match content(arg2) found in column x(arg1).
I tried with egrep and sed, but I'm not experienced enough to finish it. I would appreciate some guidance...
Given your added information about needing to output the entire line, awk is easiest:
awk -F';' -v col="$col" -v pat="$val" '$col ~ pat' "$input"
Explaining the above: the -v options set awk variables without needing to worry about quoting issues in the body of the awk script. Pre-POSIX versions of awk won't understand the -v option, but will recognize the variable assignment without it. The -F option sets the field separator. In the body, we are using a pattern with the default action (which is print); the pattern uses the variables we set with -v for both the column (the $ there is awk's "field index" operator, not a shell variable) and the pattern (pat can indeed hold an awk-style regex).
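Wrapped into the two-argument script the question asks for, a minimal sketch (the positional layout is an assumption: $1 = column number, $2 = content, $3 = input file):
#!/bin/sh
# usage (assumed): ./search.sh column content file
col=$1
val=$2
input=$3
awk -F';' -v col="$col" -v pat="$val" '$col ~ pat' "$input"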
cat text_file.txt | cut -d';' -f column_num | grep pattern
It prints only the column that is matched and not the entire line. Let me think if there is a simple solution for that.
Python
#!/usr/bin/env python
import sys

column = 1  # the column to search (0-based index into the fields)
value = "the data you're looking for"
with open("your file", "r") as source:
    for line in source:
        fields = line.strip().split(';')
        if fields[column] == value:
            print line
There's also a solution with egrep. It's not a very beautiful one but it works:
egrep "^([^;]+;){`expr $col - 1`}$value;([^;]+;){`expr 3 - $col`}([^;]+){`expr 4 - $col`}$" filename
or even shorter:
egrep "^([^;]+;){`expr $col - 1`}$value(;|$)" filename
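For example, with col=2 and value=foo, the backquoted expr expands and the shorter form becomes:
col=2
value=foo
egrep "^([^;]+;){`expr $col - 1`}$value(;|$)" filename
# after expansion: egrep "^([^;]+;){1}foo(;|$)" filename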
grep -B1 -i "string from previous line" |grep -iv 'check string from previous line' |awk -F" " '{print $1}'
This will print your line.
