Using AWK and setting results to bash variables/arrays?

I have a file that replicates the results of the SHOW PROCESSLIST command in MySQL.
The file looks like this:
*************************** 1. row ***************************
Id: 1
User: system user
Host:
db: NULL
Command: Connect
Time: 1030455
State: Waiting for master to send event
Info: NULL
*************************** 2. row ***************************
Id: 2
User: system user
Host:
db: NULL
Command: Connect
Time: 1004
State: Has read all relay log; waiting for the slave
I/O thread to update it
Info: NULL
And it keeps going on for a few more times in the same structure.
I want to use AWK to get only these parameters: Time, Id, Command and State, and store each of them in a different variable or array so that I can later use or print them in my bash shell.
The problem is, I am pretty bad with AWK. I don't know how to both separate the parameters I want from the file and also set them as a bash variable or array.
Many thanks in advance for the help!
EDIT: Here is my code so far
echo "Enter age"
read age
cat data | awk 'BEGIN{ RS="row"
FS="\n"
OFS="\n"}
{ print $2,$7}
' | awk 'BEGIN{ RS="Id"}
{if ($4 > $age){print $2}}'
The file 'data' contains blocks like I have pasted above. The code should, if the 'age' entered is smaller than the Time parameter in the data file (which is $4 in my awk code), return the ID parameter, but it returns nothing.
If I remove the if statement and print $4 instead of $2 this is my output
Enter age
1
1030455
1004
2144
2086
0
So I was thinking maybe that blank line is somehow messing up my AWK print? Is there a simple way to ignore that blank line while keeping my other data?

This is how you'd use awk to produce the values you want as a set of tab-separated fields on each line per "row" block from the input:
$ cat tst.awk
BEGIN {
RS="[*]+ [[:digit:]]+[.] row [*]+\n"
FS="\n"
OFS="\t"
}
NR>1 {
sub(/\n$/,"") # remove the trailing newline
gsub(/\n\s+/," ") # compress all multi-line fields into single lines
gsub(OFS," ") # ensure the only OFS in the output IS between fields
delete n2v
for (i=1; i<=NF; i++) {
name = gensub(/:.*/,"","",$i)
value = gensub(/^[^:]+:\s+/,"","",$i)
n2v[name] = value
}
if (n2v["Time"]+0 > age) { # force a numeric comparison
print n2v["Time"], n2v["Id"], n2v["Command"], n2v["State"]
}
}
$ awk -v age=2000 -f tst.awk file
1030455 1 Connect Waiting for master to send event
If the target age is already stored in a shell variable just init the awk variable from the shell variable of the same name:
$ age="2000"
$ awk -v age="$age" -f tst.awk file
The above uses GNU awk for multi-char RS (which you already had), gensub(), \s, and delete array.
When you say "and store every one of these parameters into a different variable or array" it could mean one of several things so I'll leave that part up to you but you might be looking for something like:
arr=( $(awk '...') )
or
awk '...' |
while IFS=$'\t' read -r Time Id Command State
do
<do something with those 4 vars>
done
but by far the most likely situation is that you don't want to use shell at all but instead just stay inside awk.
Remember - every time you write a loop in shell just to manipulate text you have the wrong approach. UNIX shell is an environment from which to call UNIX tools and the UNIX tool for general text manipulation is awk.
Until you edit your question to tell us more about your problem though, we can't guess what the right solution is from this point on.
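If the values really do need to end up in bash, the read loop above could be extended to fill one array per field. This is only a sketch: emit_rows is a hypothetical stand-in for the awk command, and the second row is made-up sample data, not taken from the asker's file.

```shell
#!/usr/bin/env bash
# emit_rows is a hypothetical stand-in for: awk -v age="$age" -f tst.awk file
# It prints one tab-separated line (Time, Id, Command, State) per row.
emit_rows() {
    printf '1030455\t1\tConnect\tWaiting for master to send event\n'
    printf '2144\t3\tQuery\tSending data\n'   # made-up sample row
}

times=() ids=() commands=() states=()

# IFS=$'\t' splits on tabs only, so a State containing spaces stays intact.
# Process substitution keeps the loop in the current shell, so the arrays
# are still populated after the loop ends.
while IFS=$'\t' read -r time id command state; do
    times+=("$time")
    ids+=("$id")
    commands+=("$command")
    states+=("$state")
done < <(emit_rows)

for i in "${!ids[@]}"; do
    printf 'Id=%s Time=%s Command=%s State=%s\n' \
        "${ids[i]}" "${times[i]}" "${commands[i]}" "${states[i]}"
done
```

One caveat: tab counts as IFS whitespace, so consecutive tabs collapse and an empty field would shift the later ones. The awk script always prints four non-empty values here, so that should not bite in this case.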

At the first level you have your shell, which you use to run any other child process. It's impossible to modify the parent's environment from within a child process. When you run your bash script file (which has the +x permission), it's spawned as a new (child) process. It can set its own environment, but when it ends you're back in the original (parent) environment.
You can set variables in bash and export them to its environment. They'll be inherited by its children. However, it can't be done in the opposite direction (a parent can't inherit from its child).
If you wish to execute commands from the script file in the current bash context, you can source the script file: source ./your_script.sh or . ./your_script.sh will do that for you.
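If you need to run awk to filter some data and keep the results in bash, you can do:
read foo < <(awk ...)
This works because read is a shell builtin rather than an external process (check type read, help read, or man bash to verify it yourself), so it sets foo in the current shell. Note that the plain pipeline form awk ... | read foo does not keep the value in bash, because each part of a pipeline runs in a subshell.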
or:
foo=`awk ....`
There are many other constructions you can use. Whatever bash script you write, please check your code against the Bash Pitfalls web page.
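A minimal sketch of the two points above, with made-up variable names: a child process cannot change a variable in its parent, while command substitution brings awk's output back into the current shell.

```shell
#!/bin/sh
# The child gets a copy of the exported variable; its change is not visible here.
greeting="hello"
export greeting
sh -c 'greeting="changed in child"'
echo "$greeting"        # prints: hello

# Command substitution captures what awk prints and assigns it in this shell.
total=$(printf '1\n2\n3\n' | awk '{ sum += $1 } END { print sum }')
echo "$total"           # prints: 6
```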

Related

Writing to the same file with awk and truncate

My system is Arch Linux and my window manager is DWM. I use dash as my shell interpreter.
I have written this extension shell script for my timer.
xev -root |
awk -F'[ )]+' '/^KeyPress/ { a[NR+2] }
NR in a {
if ($8 == "Return") {
exit 0;
} else if ($8 == "BackSpace") {
system("truncate -s-1 timer.txt");
} else if (length($8) == 1) {
printf "%s", $8;
fflush(stdout);
}
system("pkill -RTMIN+3 dwmblocks");
}' | tee timer.txt
The timer itself sits in dwmblocks status bar. I want to name my timers first and then let it start. But I don't think that's that important.
The purpose of this script: I want to input characters into the root window of DWM and have them appear in my status bar instantly. So, xev produces the key press information, then awk takes that information, finds the exact key (from all the information that xev outputs) and checks it. If the key is "Return", awk exits (job done). If the key is "BackSpace", awk calls truncate via system(). If it's a regular character key, awk outputs it and tee writes it to timer.txt (I could use "> timer.txt" too, I think, but I want to see the output in my terminal for debugging).
After every relevant keypress (single character) I fflush stdout. After all of that I finally call pkill so that dwmblocks knows that it should update. (dwmblocks issues cat operation on the file)
Okay, "Return" and character input work fine. But there's a problem with "BackSpace". I've read about it a bit (I'd say I'm still a Unix newbie even though I've been using Linux for two years now) and I found out that writing to the same file from different processes is bad news. Still, could it be done somehow? The fact is that truncate only writes to the file when awk doesn't, so maybe it wouldn't be that big of a deal?
This exact script worked earlier yesterday but now it doesn't. At first, I tried using sed instead of truncate and truncate seemed to let me delete characters from timer.txt but now truncate seems to not work anymore too. Well, it kinda works. I can input my characters and then I can delete them. BUT. After pressing Backspace I can not enter any more characters. If I try to enter a character Backspace stops working too.
So yeah. I'd have several questions. First - what the hell is the problem? As I've said, it used to work and now it doesn't. Am I wandering into undefined behavior in this script?
Second - could this be done - meaning - could I somehow write and delete from the same file. Maybe with some other tool, not awk?
Thanks in advance.

This probably isn't an answer but it's too much to go in a comment. I don't know the details of most of the tools you mention, nor do I really understand what it is you're trying to do but:
A shell is a tool to manipulate files and processes and schedule calls to other tools. Awk is a tool to manipulate text. You're trying to use awk like a shell - you have it sequencing calls to truncate and pkill and calling system to spawn a subshell each time you want to execute either of them. What you should be doing, for example, is just:
shell { truncate }
but what you're actually doing is:
shell { awk { system { shell { truncate } } } }
Can you take that role away from awk and give it back to your shell? It should make your overall script simpler, conceptually at least, and probably more robust.
Maybe try something like this (untested):
#!/usr/bin/env bash
while IFS= read -r str; do
case $str in
Return ) exit 0 ;;
BackSpace ) truncate -s-1 timer.txt ;;
? ) printf "%s" "$str" | tee -a timer.txt ;;
esac
pkill -RTMIN+3 dwmblocks
done < <(
xev -root |
awk -F'[ )]+' '/^KeyPress/{a[NR+2]} NR in a{print $8; fflush()}'
)
I moved the write to timer.txt inside the loop to make sure tee is not trying to write to it while you're truncating it. That may not be necessary.
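A likely reason the append-mode write inside the loop matters can be sketched as follows. A descriptor that is opened once keeps its own write offset, so after an external truncate the next write lands past the new end of the file. The file names here are invented for the demonstration, and truncate is assumed to be the coreutils version used in the question.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Case 1: descriptor opened once with '>' (similar to `... | tee timer.txt`).
exec 3> "$dir/held.txt"
printf 'ab' >&3
truncate -s-1 "$dir/held.txt"    # file is now "a", but fd 3 is still at offset 2
printf 'c' >&3                   # written at offset 2, leaving a NUL byte gap
exec 3>&-
held_size=$(wc -c < "$dir/held.txt")

# Case 2: reopen in append mode for each write (similar to `tee -a` in the loop).
printf 'ab' >> "$dir/append.txt"
truncate -s-1 "$dir/append.txt"
printf 'c' >> "$dir/append.txt"  # append mode always writes at the current end
append_content=$(cat "$dir/append.txt")

echo "held-open file size: $held_size bytes"
echo "append-mode content: $append_content"
rm -rf "$dir"
```

The held-open file ends up three bytes long ("a", a NUL byte, "c"), while the append-mode file contains just "ac". That would match the symptom of characters no longer showing up after a Backspace.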

Unix: What does cat by itself do?

I saw the line data=$(cat) in a bash script (just declaring an empty variable) and am mystified as to what that could possibly do.
I read the man pages, but it doesn't have an example or explanation of this. Does this capture stdin or something? Any documentation on this?
EDIT: Specifically how the heck does doing data=$(cat) allow for it to run this hook script?
#!/bin/bash
# Runs all executable pre-commit-* hooks and exits after,
# if any of them was not successful.
#
# Based on
# http://osdir.com/ml/git/2009-01/msg00308.html
data=$(cat)
exitcodes=()
hookname=`basename $0`
# Run each hook, passing through STDIN and storing the exit code.
# We don't want to bail at the first failure, as the user might
# then bypass the hooks without knowing about additional issues.
for hook in $GIT_DIR/hooks/$hookname-*; do
test -x "$hook" || continue
echo "$data" | "$hook"
exitcodes+=($?)
done
https://github.com/henrik/dotfiles/blob/master/git_template/hooks/pre-commit

cat will catenate its input to its output.
In the context of the variable capture you posted, the effect is to assign the statement's (or containing script's) standard input to the variable.
The command substitution $(command) will return the command's output; the assignment will assign the substituted string to the variable; and in the absence of a file name argument, cat will read and print standard input.
The Git hook script you found this in captures the commit data from standard input so that it can be repeatedly piped to each hook script separately. You only get one copy of standard input, so if you need it multiple times, you need to capture it somehow. (I would use a temporary file, and quote all file name variables properly; but keeping the data in a variable is certainly okay, especially if you only expect fairly small amounts of input.)
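The capture-once, replay-many pattern can be sketched like this. The function name replay and the sample input are made up for the example.

```shell
#!/bin/sh
# Capture standard input once, then feed the same data to several commands.
replay() {
    data=$(cat)    # cat with no file argument copies stdin to stdout
    lines=$(printf '%s\n' "$data" | wc -l)
    first=$(printf '%s\n' "$data" | head -n 1)
    # $((lines)) strips any padding some wc implementations add.
    printf 'lines=%s first=%s\n' "$((lines))" "$first"
}

out=$(printf 'alpha\nbeta\n' | replay)
echo "$out"        # prints: lines=2 first=alpha
```

Note that command substitution strips trailing newlines, which is why the sketch adds one back with printf '%s\n' when replaying the data.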

Doing:
t@t:~# temp=$(cat)
hello how
are you?
t@t:~# echo $temp
hello how are you?
(A single Control-D on a line by itself following "are you?" terminates the input.)
As the manual says:
cat - concatenate files and print on the standard output
Also:
cat Copy standard input to standard output.
Here, cat copies your standard input to its output, and the command substitution assigns that output to the variable temp as a single string.

Say your bash script script.sh is:
#!/bin/bash
data=$(cat)
Then, the following commands will store the string STR in the variable data:
echo STR | bash script.sh
bash script.sh < <(echo STR)
bash script.sh <<< STR

Parsing oracle SQLPLUS error message in shell script for emailing

I'm trying to extract a substring from an Oracle error message using awk so I can email it to an administrator. This part of the code is trying to find where the important bit I want to extract starts. Here's what I have:
(The table name is incorrect to generate the error)
validate_iwpcount(){
DB_RETURN_VALUE=`sqlplus -s $DB_CRED <<END
SELECT count(COLUMN)
FROM INCORRECT_TABLE NAME;
exit
END`
a="$DB_RETURN_VALUE"
b="ERROR at line"
awk -v a="$a" -v b="$b" 'BEGIN{print index(a,b)}'
echo $DB_RETURN_VALUE
}
The strange thing is that no matter how big $DB_RETURN_VALUE is, the return value from awk is always 28. I'm assuming that somewhere in this error message there's something Linux treats as an implicit delimiter of some sort, and it's messing with the count, or something stranger. This works fine with regular strings, as opposed to what Oracle gives me.
Could anybody shine a light on this?
Many thanks

28 seems to be the right answer for the query you have (slightly amended to avoid an ORA-00936, and with tabs in the script). The message you're echoing includes a file expansion; the raw message is:
FROM IW_PRODUCTzS
*
ERROR at line 2:
ORA-00942: table or view does not exist
The * is expanded when you echo $DB_RETURN_VALUE, so the directory you're executing this from seems to have logs, mail_files and scripts in it, and they are being shown through expansion of the *. If you run it from different directories the echoed message length will vary, but the length of the actual message from Oracle stays the same. The length is changing (through the expansion) after the SQL*Plus call and after awk has done its thing. You can avoid that expansion with echo "$DB_RETURN_VALUE" instead, though I don't suppose you actually want to see that full message anyway in the end.
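The expansion can be reproduced in a scratch directory. This sketch assumes three files named after the ones seen in the echoed message; the variable names are made up.

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/logs" "$dir/mail_files" "$dir/scripts"

msg='*'
# Unquoted: the shell expands * to the file names in the current directory.
unquoted=$(cd "$dir" && echo $msg)
# Quoted: the * is passed through literally.
quoted=$(cd "$dir" && echo "$msg")

echo "unquoted: $unquoted"   # prints: unquoted: logs mail_files scripts
echo "quoted:   $quoted"     # prints: quoted:   *
rm -rf "$dir"
```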
The substring from character 28 gives you what you want though:
validate_iwpcount(){
DB_RETURN_VALUE=`sqlplus -s $CENSYS_ORACLE_UID <<END
SELECT count(COLUMN_NAME)
FROM IW_PRODUCTzS;
exit
END`
# To see the original message; note the double-quotes
# echo "$DB_RETURN_VALUE"
a="$DB_RETURN_VALUE"
b="ERROR at line"
p=`awk -v a="$a" -v b="$b" 'BEGIN{print index(a,b)}'`
if [ ${p} -gt 0 ]; then
awk -v a="$a" -v p="$p" 'BEGIN{print substr(a,p)}'
fi
}
validate_iwpcount
... displays just:
ERROR at line 2:
ORA-00942: table or view does not exist
I'm sure that can be simplified, maybe into a single awk call, but I'm not that familiar with it.
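One possible single-awk simplification is sketched below. It pipes the message into awk and prints every line from the first one matching "ERROR at line" onward. This works on whole lines rather than on the character position that index() returns, which gives the same result for this message. The variable here is filled with a hard-coded copy of the message rather than real SQL*Plus output.

```shell
#!/bin/sh
# Stand-in for the text returned by sqlplus (copied from the raw message above).
DB_RETURN_VALUE='FROM IW_PRODUCTzS
     *
ERROR at line 2:
ORA-00942: table or view does not exist'

# Set a flag on the first matching line; the bare pattern "found" then
# prints that line and every line after it.
err=$(printf '%s\n' "$DB_RETURN_VALUE" | awk '/ERROR at line/ { found = 1 } found')
printf '%s\n' "$err"
```

Piping the text in also avoids the backslash-escape processing that awk applies to values passed with -v.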

"bad interpreter" error message when trying to run awk executable

I'm trying to make an awk file executable. I've written the script, and did chmod +x filename. Here is the code:
#!/bin/awk -v
'TOPNUM = $1
## pick1 - pick one random number out of y
## main routine
BEGIN {
## set seed
srand ()
## get a random number
select = 1 +int(rand() * TOPNUM)
# print pick
print select
}'
When I try and run the program and put in a variable for the TOPNUM:
pick1 50
I get the response:
-bash: /home/petersone/bin/pick1: /bin/awk: bad interpreter: No such file or directory
I'm sure that there's something simple that I'm messing up, but I simply cannot figure out what it is. How can I fix this?

From a command line, run this command:
which awk
This will print the path of AWK, which is likely /usr/bin/awk. Correct the first line and your script should work.
Also, your script shouldn't have the single-quote characters at the beginning and end. You can run AWK from the command line and pass in a script as a quoted string, or you can write a script in a file and use the #!/usr/bin/awk first line, with the commands just in the file.
Also, the first line of your script isn't going to work right. In AWK, setup code needs to be inside the BEGIN block, and $1 is a reference to the first word in the input line. You need to use ARGV[1] to refer to the first argument.
http://www.gnu.org/software/gawk/manual/html_node/ARGC-and-ARGV.html
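A quick way to see what ARGV holds is to run a BEGIN-only program from the shell. The variable names in this sketch are made up, and the + 0 is there only to force a numeric value.

```shell
#!/bin/sh
# With only a BEGIN block, awk never tries to open "50" as an input file.
args=$(awk 'BEGIN { print ARGC, ARGV[1] }' 50)
echo "$args"       # prints: 2 50

# The same idea applied to the random pick: 1 means the result is in range.
in_range=$(awk 'BEGIN {
    top = ARGV[1] + 0
    srand()
    pick = 1 + int(rand() * top)
    print (pick >= 1 && pick <= top)
}' 50)
echo "$in_range"   # prints: 1
```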
As @TrueY pointed out, there should be a -f on the first line:
#!/usr/bin/awk -f
This is discussed here: Invoking a script, which has an awk shebang, with parameters (vars)
Working, tested version of the program:
#!/usr/bin/awk -f
## pick1 - pick one random number out of y
## main routine
BEGIN {
TOPNUM = ARGV[1]
## set seed
srand ()
## get a random number
select = 1 +int(rand() * TOPNUM)
# print pick
print select
}

Actually, this form is preferable:
#! /bin/awk -E
The man page says:
-E Similar to -f, however, this option is the last one processed and should be used with #! scripts, particularly for CGI applications, to avoid passing in options or source code (!) on the command line from a URL. This option disables command-line variable assignments

awk unix insert into file location directory

On Linux, I am trying to select a value from a specific column and row of a CSV file and then use that value as the end of a file path. When I type the following into a bash terminal window, it seems to work, printing the value in the correct row and column on screen.
awk -F "," 'FNR == 2 {print $8}' /sdata/images/projects/ASD_SSD/1/ruths_data/ruth/imaging\ study/imaging\ study\ working/delete2.csv
However, when I try the following substitution within a script, it fails:
r=2
c=8
s=awk -F "," 'FNR == $r {print $c}' /sdata/images/projects/ASD_SSD/1/ruths_data/ruth/imaging\ study/imaging\ study\ working/delete2.csv
I then try to use the s output as the end of a hierarchy file location. For example, /home/ork/js/s*
I keep getting the following error, so this looks like it's not creating the s variable and then not inserting it into the actual file location.
omitting directory `/home/ork/js/'
I have spent a few hours trying to figure out what is preventing this from working and am a new user (so I am sure it is something simple, sorry).
I hope I was clear enough, please let me know if this requires further clarification.

This is a common question here. The single quotes are protecting the variables from the shell, so they never get expanded. Also command substitution is needed when assigning to variable s. One way to do it would be:
s=$(awk -F, 'FNR==r{print $c}' r="$r" c="$c" file)
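As a self-contained sketch of the same idea, the following builds a small throwaway CSV (the contents are made up) and passes the row and column numbers to awk with -v, which is an alternative to the trailing assignments used above.

```shell
#!/bin/sh
csv=$(mktemp)
printf 'h1,h2,h3\na,b,c\nd,e,f\n' > "$csv"

r=2
c=3
# Inside awk, r and c are awk variables; $c means "the field numbered c".
s=$(awk -F, -v r="$r" -v c="$c" 'FNR == r { print $c }' "$csv")
echo "$s"          # prints: c
rm -f "$csv"
```

The captured value can then be used in a path with normal shell expansion, for example /home/ork/js/"$s".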
