Linux scripting: Search a specific column for a keyword - linux

I have a large text file that contains multiple columns of data. I'm trying to write a script that accepts a column number and keyword from the command line and searches for any hits before displaying the entire row of any matches.
I've been trying something along the lines of:
grep $fileName | awk '{if ($'$columnNumber' == '$searchTerm') print $0;}'
But this doesn't work at all. Am I on the right lines? Thanks for any help!

The -v option can be used to pass shell variables to the awk command.
The following may be what you're looking for:
awk -v s=$SEARCH -v c=$COLUMN '$c == s { print $0 }' file.txt
EDIT:
I am always trying to write tighter, more elegant code, so here is what Dennis means (note the quoted shell variables):
awk -v s="$search" -v c="$column" '$c == s { print $0 }' file.txt
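Wrapped up as the script the question asks for, this could look like the sketch below. The function name search_column and the sample data are made up for illustration, and it assumes whitespace-separated columns and an exact (not regex) match.

```shell
#!/bin/sh
# search_column COLUMN KEYWORD FILE
# Print every row of FILE whose COLUMN-th whitespace-separated field
# equals KEYWORD exactly. (Hypothetical helper name, for illustration.)
search_column() {
    awk -v c="$1" -v s="$2" '$c == s' "$3"
}

# Sample data, invented for this example.
sample=$(mktemp)
printf '%s\n' 'alice 30 london' 'bob 25 paris' 'carol 30 paris' > "$sample"

# Show all rows whose third column is "paris".
search_column 3 paris "$sample"

rm -f "$sample"
```

A pattern with no action prints the whole row, so '$c == s' is equivalent to '$c == s { print $0 }'.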

Looks reasonable enough. Try using set -x to look at exactly what's being passed to awk. You can also use different and/or more awk things, including getting rid of the separate grep:
awk -v colnum="$columnNumber" -v require="$searchTerm" \
"/$fileName/ { if (\$colnum == require) print }"
which works by setting awk variables (colnum and require, in this case) and then using the literal string $colnum to get the desired field, and the variable require to get the required-string.
Note that in all cases (with or without the grep command), any regular expression meta-characters in $fileName will be meta-y, e.g., this.that will match the file named this.that but also the file named thisXthat.

Related

using grep lookup/cut function instead of source to load config file in bash

I have a script that I'm using now that loads all my config variables by means of the source command. It's simple, quick, and effective, but I understand that it's not a very secure option.
I've read that I can use the $include directive to achieve the same results. Is that any different or safer than source or are they essentially the same?
As a final alternative if the above two options are not safe ones, I'm trying to understand a lookup function I found in a shell scripting book. It basically used grep, a delimiter and cut to perform a variable name lookup from the file and retrieve the value. This seems safe and I can use it to modify my scripts.
It almost works as is. I think I just need to change the delimiter to "=" from $TAB but I'm not sure how it works or if it even will.
My config file format:
Param=value
Sample function (from notes)
lookup() {
grep "^$1$TAB" "$2" | cut -f2
}
Usage:
lookup [options] KEY FILE
-f sets field delimiter
-k sets the number of field which has key
-v specifies which field to return
I'm using Debian version of Raspbian Jessie Lite in case that matters on syntax.
Instead of grep and cut you should consider using awk that can do both search and cut operations based on a given delimiter easily:
lookup() {
key="$1"
filename="$2"
awk -F= -v key="$key" '$1 == key{print $2}' "$filename"
# use this awk if = can be present in value part as well
# awk -v key="^$key=" '$0 ~ key && sub(key, "")' "$filename"
}
This can be called as:
lookup search configfile.txt
-F= sets delimiter as = for awk command.
Also note that $1 and $2 inside single quotes are columns #1 and #2 and one shouldn't be confused with positional shell variables $1, $2 etc.
You should look into getopts to make it accept -f, -k etc type arguments.
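A minimal sketch of how getopts could be combined with the awk lookup, assuming the -f, -k and -v options from the usage text in the question. The lk_* variable names, the defaults (= as delimiter, key in field 1, value in field 2) and the sample files are assumptions made for this example.

```shell
#!/bin/sh
# lookup [-f DELIM] [-k KEYFIELD] [-v VALUEFIELD] KEY FILE
# Defaults are assumptions: "=" delimiter, key in field 1, value in field 2.
# Note: the lk_* variables are global in plain sh.
lookup() {
    lk_delim='=' lk_key=1 lk_val=2
    OPTIND=1    # reset so the function can be called more than once
    while getopts 'f:k:v:' lk_opt; do
        case $lk_opt in
            f) lk_delim=$OPTARG ;;
            k) lk_key=$OPTARG ;;
            v) lk_val=$OPTARG ;;
            *) echo 'usage: lookup [-f delim] [-k n] [-v n] KEY FILE' >&2
               return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # $kf and $vf below are awk fields, not shell variables.
    awk -F"$lk_delim" -v key="$1" -v kf="$lk_key" -v vf="$lk_val" \
        '$kf == key { print $vf }' "$2"
}

# Sample files, invented for this example.
cfg=$(mktemp)
users=$(mktemp)
printf '%s\n' 'Host=example.org' 'Port=8080' > "$cfg"
printf '%s\n' 'alice:x:1001' 'bob:x:1002' > "$users"

lookup Port "$cfg"               # value of Port in a Param=value file
lookup -f : -v 3 bob "$users"    # third field of the line keyed by "bob"

rm -f "$cfg" "$users"
```

Because the file is only read and never executed, a malicious line in the config file cannot run commands the way it could with source.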

awk output to variable [duplicate]

[Dd])
echo"What is the record ID?"
read rID
numA= awk -f "%" '{print $1'}< practice.txt
I cannot figure out how to set numA = to the output of the awk in order to compare rID and numA. numA is equal to the first field of a txt file which is separated by %. Any suggestions?
You can capture the output of any command in a variable via command substitution:
numA=$(awk -F '%' '{print $1}' < practice.txt)
Unless your file contains only one line, however, the awk command you presented (as corrected above) is unlikely to be what you want to use. If the practice.txt file contains, say, answers to multiple questions, one per line, then you probably want to structure the script altogether differently.
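If the intent is to check whether the entered ID appears as the first field of any line (an assumption, since the question does not say), one possible restructuring is to let awk do the comparison and report the result through its exit status. The function name record_exists and the sample data are hypothetical.

```shell
#!/bin/sh
# record_exists ID FILE
# Succeeds (exit status 0) if some line of FILE has ID as its first
# %-separated field. Hypothetical helper, for illustration only.
record_exists() {
    awk -F'%' -v r="$1" '
        $1 == r { found = 1; exit }   # stop reading at the first match
        END     { exit !found }       # status 0 if found, 1 otherwise
    ' "$2"
}

# Sample data, invented for this example.
data=$(mktemp)
printf '%s\n' '101%Alice' '102%Bob' > "$data"

rID=102
if record_exists "$rID" "$data"; then
    echo "record $rID found"
else
    echo "record $rID not found"
fi

rm -f "$data"
```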
You don't need to use awk, just use parameter expansion:
numA=${rID%%\%*}
This is the correct syntax:
numA=$(awk -F'%' '{print $1}' practice.txt)
However, it will be easier to do the comparison in awk by passing the bash variable in:
awk -F'%' -v r="$rID" '$1==r{... do something ...}' practice.txt
Since you didn't specify any details, it's difficult to suggest more.
To remove the line matching rID from the file, do this:
awk -F'%' -v r="$rID" '$1!=r' practice.txt > output
This prints the lines where the condition is met ($1 not equal to rID), which is equivalent to deleting the ones that are equal. You can mimic in-place replacement with:
awk ... practice.txt > temp && mv temp practice.txt
where you fill in ... from the line above.
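The filter and the temp-file step can be combined into one function. This is only a sketch: delete_record is a made-up name, and it uses mktemp instead of a fixed file called temp, which is a choice of this example rather than part of the answer.

```shell
#!/bin/sh
# delete_record ID FILE
# Rewrite FILE without the lines whose first %-separated field equals ID.
# Hypothetical helper; mktemp replaces the fixed "temp" name used above.
delete_record() {
    dr_tmp=$(mktemp) || return 1
    if awk -F'%' -v r="$1" '$1 != r' "$2" > "$dr_tmp"; then
        mv "$dr_tmp" "$2"      # replace the original only if awk succeeded
    else
        rm -f "$dr_tmp"
        return 1
    fi
}

# Sample data, invented for this example.
records=$(mktemp)
printf '%s\n' '101%Alice' '102%Bob' '103%Carol' > "$records"

delete_record 102 "$records"
cat "$records"                 # the 102 line is gone

rm -f "$records"
```

Writing to a temporary file first matters because redirecting awk's output straight back to its input file would truncate the file before awk reads it.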
Try using
$ numA=`awk -F'%' '{ if($1 != $0) { print $1; exit; }}' practice.txt`
From the question, "numA is equal to the first field of a txt file which is separated by %"
-F'%', meaning % is the only separator we care about
if($1 != $0), meaning ignore lines that don't have the separator
print $1; exit;, meaning exit after printing the first field that we encounter separated by %. Remove the exit if you don't want to stop after the first field.

What to do instead of the regex exp in awk

Right now I am using awk to search for a string I put into the regular expression, however I am trying to use a "defined variable" instead to make my code more reusable. My code I want to replace is:
awk '
/lksdfjsalfjl/ { counter++}
END{}
' file
While researching online I found out that a variable can't be used inside /.../. Do you have any suggestions for an alternative?
Use ~ for regex matching:
awk -v x='lksdfjsalfjl' '$0~x {counter++} END{print counter+0}' file
In more detail:
-v x='lksdfjsalfjl'
This defines a variable x
$0~x {counter++}
This increments the counter if the current line, $0, matches the regular expression in x.
You can, if you like, use a shell variable to set the awk variable:
a='lksdfjsalfjl'
awk -v x="$a" '$0~x {counter++} END{print counter+0}' file
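Because x is used as a regular expression here, characters such as . keep their special meaning. If the search string should be taken literally, awk's index() function is one alternative. The sketch below compares the two; the function names and sample lines are invented for illustration.

```shell
#!/bin/sh
# Count lines matching X as a regular expression.
count_regex() {
    awk -v x="$1" '$0 ~ x { n++ } END { print n + 0 }' "$2"
}

# Count lines containing X as a literal substring
# (index() returns 0 when X does not occur in the line).
count_literal() {
    awk -v x="$1" 'index($0, x) { n++ } END { print n + 0 }' "$2"
}

# Sample data, invented for this example.
lines=$(mktemp)
printf '%s\n' 'a.b' 'axb' 'a.b again' 'none' > "$lines"

count_regex   'a.b' "$lines"   # "." matches any character, so axb counts too
count_literal 'a.b' "$lines"   # only lines containing the three characters a.b

rm -f "$lines"
```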

Issues with the AWK function

Does Awk have a limit to the amount of data it can process?
for i in "052" "064" "060" "070" "074" "076" "178"
do
awk -v f="${i}" -F, 'match ($1,f) { print $2","$3 }' uls.csv > ul$i.csv
awk -v f="${i}" -F, 'match ($1,f) { print $2","$3 }' dls.csv > dl$i.csv
awk -v n="${i}" -F, 'match ($1,n) { print $2","$3 }' dlsur.csv >> dlu$i.csv
awk -v k="${i}" -F, 'match ($1,k) { print $2","$3 }' dailyd.csv >> dla$i.csv
awk -v m="${i}" -F, 'match ($1,m) { print $2","$3 }' dailyu.csv >> ula$i.csv
done
When I run that piece of code, it basically pulls data from csv files and creates new files.
That piece of code works perfectly.
But when I add an extra file (in the for loop), for example "180", it creates that file but also includes a few lines of data from other files. I went over the code many times. I even checked the raw data before it goes into this loop, and it is all correct. This seems like a glitch in awk.
Do I need to apply a wait function so that it can catch up?
Also something like
for file in uls dls dlsur dailyd dailyu; do
awk -F, -v OFS=, -v file="$file" '$1 ~ /052|064|060|070|074|076|178/ {print $2,$3 >> (file $1 ".csv")}' "$file.csv"
done
is probably better if it does what you want. Many fewer invocations of awk and loops through your files. (Slightly different output file names. That would be fixable but complicate the script a bit more than I thought was necessary for the purpose.)
No. What you say you think is happening cannot be happening - awk WILL NOT randomly pull data from un-specified files and put it in its output stream.
Note that in your 3rd and subsequent lines you are using '>>' instead of '>' for your output redirection - have you accounted for that?
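The difference between the two redirections can be reproduced in isolation. In this sketch (file names and data are invented) the same extraction is run twice, once with > and once with >>; whether leftover output from earlier runs is what the asker is seeing is only a guess.

```shell
#!/bin/sh
# extract_overwrite CODE INPUT OUTPUT: same awk as in the question, using ">"
extract_overwrite() {
    awk -v f="$1" -F, 'match($1, f) { print $2 "," $3 }' "$2" > "$3"
}

# extract_append CODE INPUT OUTPUT: same awk, using ">>"
extract_append() {
    awk -v f="$1" -F, 'match($1, f) { print $2 "," $3 }' "$2" >> "$3"
}

# Sample data, invented for this example.
work=$(mktemp -d)
printf '%s\n' '180,a,b' '052,c,d' > "$work/in.csv"

# Run each extraction twice, as happens when the script is re-run.
extract_overwrite 180 "$work/in.csv" "$work/over180.csv"
extract_overwrite 180 "$work/in.csv" "$work/over180.csv"
extract_append    180 "$work/in.csv" "$work/app180.csv"
extract_append    180 "$work/in.csv" "$work/app180.csv"

# Line counts: the ">" file holds one run, the ">>" file holds both.
awk 'END { print NR }' "$work/over180.csv"
awk 'END { print NR }' "$work/app180.csv"

rm -rf "$work"
```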
If you update your question (i.e. do NOT try to do it in a comment!) to tell us what you're trying to do with some representative sample input and expected output (just 2 input files, not 5, should be enough to explain your problem), we can help you write a correct script to do that.

Simple linux script help

I have a text file with the following structure:
text1;text2;text3;text4
...
I need to write a script that gets 2 arguments: the column we want to search in and the content we want to find.
So the script should output only the lines (WHOLE LINES!) that match content(arg2) found in column x(arg1).
I tried with egrep and sed, but I'm not experienced enough to finish it. I would appreciate some guidance...
Given your added information of needing to output the entire line, awk is easiest:
awk -F';' -v col="$col" -v pat="$val" '$col ~ pat' "$input"
Explaining the above, the -v options set awk variables without needing to worry about quoting issues in the body of the awk script. Pre-POSIX versions of awk won't understand the -v option, but will recognize the variable assignment without it. The -F option sets the field separator. In the body, we are using a pattern with the default action (which is print); the pattern uses the variables we set with -v for both the column ($ there is awk's "field index" operator, not a shell variable) and the pattern (and pat can indeed hold an awk-style regex).
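Since pat is treated as a regex, a value such as foo also matches foobar. The sketch below contrasts the regex form with an exact comparison using ==; the function names and sample lines are made up for this example.

```shell
#!/bin/sh
# match_column COLUMN PATTERN FILE: regex match, as in the command above.
match_column() {
    awk -F';' -v col="$1" -v pat="$2" '$col ~ pat' "$3"
}

# match_column_exact COLUMN VALUE FILE: the whole field must equal VALUE.
match_column_exact() {
    awk -F';' -v col="$1" -v val="$2" '$col == val' "$3"
}

# Sample data, invented for this example.
input=$(mktemp)
printf '%s\n' 'a;foo;1' 'b;foobar;2' 'c;bar;3' > "$input"

match_column       2 foo     "$input"   # lines a and b (foo is a substring of foobar)
match_column_exact 2 foo     "$input"   # line a only
match_column       2 '^foo$' "$input"   # anchoring the regex also gives line a only

rm -f "$input"
```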
cut -d';' -f column_num text_file.txt | grep pattern
It prints only the column that is matched, not the entire line. Let me think if there is a simple solution for that.
Python
#!/usr/bin/env python3
import sys
column = 1 # the column to search (0-based index)
value = "the data you're looking for"
with open("your file", "r") as source:
    for line in source:
        fields = line.strip().split(';')
        if fields[column] == value:
            print(line, end="")  # line already ends with a newline
There's also a solution with egrep. It's not a very beautiful one but it works:
egrep "^([^;]+;){`expr $col - 1`}$value;([^;]+;){`expr 3 - $col`}([^;]+){`expr 4 - $col`}$" filename
or even shorter:
egrep "^([^;]+;){`expr $col - 1`}$value(;|$)" filename
grep -B1 -i "string from previous line" |grep -iv 'check string from previous line' |awk -F" " '{print $1}'
This will print your line.

Resources