How to create parallel connections and queries to db in bash script - linux

I have an oracle db on my Linux machine.
A single sql query (1 connection) via bash is as follows:
su - oracle
sqlplus <dbuser>/<dbpass>
select * from cat;
exit
I'm trying to run parallel queries via bash. The following script should open 10000 connections in parallel (correct me if I'm wrong):
for i in $(seq 1 10000); do echo "select * from <tableName>;" | sqlplus <dbuser>/<dbpass>&done
I would like to make this code more robust and flexible. For example, I want to add a sleep between each of the following steps:
Create a connection
Create a table (Unique to this connection, i as index for example)
Select data from the table
Close the connection
The following code is my attempt at doing so (not working):
for i in $(seq 1 10000);
do
echo "CREATE TABLE test+i (id NUMBER NOT NULL);"
sleep 2
echo "select * from test+i"
sleep 2
echo "DROP TABLE test+i" | sqlplus <dbuser>/<dbpass>&
done
1) Syntactically, how should I write it?
2) How can I know how many queries/connections succeeded and how many failed?
3) How can I know how many connections actually ran in parallel?

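1) You can send the whole sequence of statements to a single sqlplus session and put that pipeline in the background. Inside sqlplus on Unix, ! runs a host command, so !sleep 2 pauses between the statements: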
for i in $(seq 1 10000);
do
echo "CREATE TABLE test_$i (id NUMBER NOT NULL);
!sleep 2
select * from test_$i;
!sleep 2
DROP TABLE test_$i;" | sqlplus <dbuser>/<dbpass> &
done
2) You can add error handling after each sqlplus call (examine its output or exit value):
echo "CREATE TABLE test_$i (id NUMBER NOT NULL);" | sqlplus <dbuser>/<dbpass> 2>&1 | grep -i error
3) You can use the jobs command to see how many jobs are running in the background:
> sleep 100 &
[1] 31642
> jobs
[1]+ Running sleep 100 &
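Points 2 and 3 can be combined in a script. The following is a minimal bash sketch, not part of the original answer: run_query is a hypothetical stand-in for the sqlplus pipeline, and it only fails on odd numbers so that the counting is visible. Keep in mind that the exit status is only useful if sqlplus is told to exit non-zero on SQL errors (for example with WHENEVER SQLERROR EXIT).

```shell
#!/bin/bash
# run_query N: stand-in for a sqlplus pipeline (an assumption for this sketch).
# It "works" for one second, then fails for odd N and succeeds for even N.
run_query() {
  sleep 1
  (( $1 % 2 == 0 ))
}

# Start the jobs in the background and remember each PID.
pids=()
for i in 1 2 3 4 5 6; do
  run_query "$i" &
  pids+=("$!")
done

# Number of background jobs still running at this moment (question 3).
running=$(jobs -rp | wc -l)
echo "running in parallel: $running"

# Wait for each job and count successes and failures (question 2).
ok=0
failed=0
for pid in "${pids[@]}"; do
  if wait "$pid"; then
    ok=$((ok + 1))
  else
    failed=$((failed + 1))
  fi
done
echo "succeeded: $ok, failed: $failed"
```

Waiting on each PID separately is what makes the per-job exit status available; a bare wait would only block until everything has finished.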

Starting 10000 jobs in parallel will often overload the machine or the database. With 'WHENEVER SQLERROR EXIT SQL.SQLCODE' set, sqlplus exits with a non-zero status if the SQL fails, and GNU Parallel can then re-run the query.
my.log will show whether the query still failed after being retried 3 times.
doit() {
i=$1
# WHENEVER SQLERROR is a SQL*Plus command: it goes on a line of its own,
# and setting it once covers the rest of the session.
(echo "WHENEVER SQLERROR EXIT SQL.SQLCODE"
echo "CREATE TABLE test$i (id NUMBER NOT NULL);"
sleep 2
echo "select * from test$i;"
sleep 2
echo "DROP TABLE test$i;") |
sqlplus <dbuser>/<dbpass>
}
export -f doit
seq 1 10000 | parallel --joblog my.log -j0 --retries 3 doit
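To answer "how many failed" from the job log, one option is to count the rows whose Exitval column is non-zero. This is a small sketch rather than part of the original answer; count_failed is a hypothetical helper name, and it assumes the usual tab-separated joblog layout in which Exitval is the seventh column. How retried jobs are recorded may differ between GNU Parallel versions, so check the result against your own log.

```shell
#!/bin/bash
# count_failed JOBLOG: print the number of joblog rows with a non-zero Exitval.
# Assumes a tab-separated log with one header line and Exitval in column 7.
count_failed() {
  awk -F'\t' 'NR > 1 && $7 != 0 { n++ } END { print n + 0 }' "$1"
}

# Report on my.log if the parallel run above has produced it.
if [ -f my.log ]; then
  echo "failed jobs: $(count_failed my.log)"
fi
```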

Related

Running multiple parallel sqlplus connections for a certain period of time in bash

I have this code that creates 20 parallel sqlplus instances, each of which runs some queries and exits:
#!/bin/sh
for i in $(seq 1 20);
do
echo "CREATE TABLE table_$i (id NUMBER NOT NULL);
select * from table_$i;
! sleep 30
select * from table_$i;
! sleep 30
DROP TABLE table_$i;" | sqlplus system/password &
done
wait
I need to adjust this code, if possible, so that it runs for an hour under the following condition:
Always stay at 20 connections. If one sqlplus instance closes (finishes its process), another one should open; I need to maintain a certain number of connections for X amount of time.
Is there anything I can add to this code to achieve that?
For looping during an hour, see https://stackoverflow.com/a/22735757/3220113
runsql() {
i="$1"
# end is one hour from now; SECONDS must not be reset after this line
end=$((SECONDS+3600))
while (( SECONDS < end )); do
# Do what you want.
echo "CREATE TABLE table_$i (id NUMBER NOT NULL);
select * from table_$i;
! sleep 30
select * from table_$i;
! sleep 30
DROP TABLE table_$i;" | sqlplus system/password
sleep 1 # precaution when sqlplus fails, maybe wrong password
done
}
for i in $(seq 1 20); do
runsql $i &
done
wait
Explanation:
The main loop at the bottom starts the function runsql 20 times in the background.
The function runsql could use $1 everywhere, I copy it to i for code that looks like the original.
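SECONDS is a bash variable that counts the seconds since the shell started, so we do not need to call date.
3600 seconds is an hour.
Inside (( .. )) you can do math without $ in front of variables. Both SECONDS and (( .. )) are bash features, so run the script with bash rather than /bin/sh.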
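The same pattern can be written with the number of workers and the duration as parameters. This is only a sketch under stated assumptions: keep_busy and fake_session are hypothetical names, and fake_session stands in for the sqlplus pipeline so the script can be tried without a database.

```shell
#!/bin/bash
# keep_busy WORKERS DURATION CMD...  (hypothetical helper, not from the answer)
# Starts WORKERS background loops. Each loop re-runs CMD, with its worker
# number appended, until DURATION seconds have passed.
keep_busy() {
  local workers=$1 duration=$2 i
  shift 2
  for ((i = 1; i <= workers; i++)); do
    (
      SECONDS=0                      # per-worker timer, local to this subshell
      while (( SECONDS < duration )); do
        "$@" "$i" || sleep 1         # short pause if the command fails
      done
    ) &
  done
  wait                               # block until every worker loop has ended
}

# Stand-in for the sqlplus pipeline: note the worker number, then "work" for 1s.
log=$(mktemp)
fake_session() { echo "$1" >> "$log"; sleep 1; }

# Three workers for two seconds; the original case would be: keep_busy 20 3600 ...
keep_busy 3 2 fake_session
echo "workers that ran: $(sort -u "$log" | wc -l)"
```

Because each worker resets SECONDS inside its own subshell, the timers are independent and the parent shell's counter is left alone.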

shell script has been retrieving from edbplus sql results with echo outputs

I am trying to call edbplus from a Linux command-line shell script to count the rows in a table. However, edbplus returns the count together with other output in the same response, and I want to retrieve only the integer.
#!/bin/sh
COUNT=`./edbplus.sh -silent user/password@localhost:5444/mydb<<-EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT COUNT(ID) FROM MYTABLE
EXIT;
EOF`
echo $COUNT
Response:
$ echo $COUNT
6-------------------d always takes 2 parameters: variable_name value
Do you know how to get only the integer?
If the first value is always an integer, try the command below:
echo "$COUNT" | cut -d - -f 1
(or)
if only a single digit is required, try
echo "$COUNT" | cut -c 1
To solve it from the EDB side:
The issue above occurs when the options below are combined in a single SET command. Give each option its own SET command on its own line:
SET PAGESIZE 0
SET FEEDBACK OFF
SET VERIFY OFF
SET HEADING OFF
SET ECHO OFF
Update the script as above, with each SET command on its own line.
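As an alternative to cut, the leading digits can be picked out with a bash regular expression match, which also handles counts of more than one digit. This is a sketch that assumes the script runs under bash (the question uses #!/bin/sh, where [[ ]] is not available); the sample string is the response shown in the question.

```shell
#!/bin/bash
# Sample taken from the response shown in the question.
raw='6-------------------d always takes 2 parameters: variable_name value'

# Keep only the leading run of digits, if there is one.
if [[ $raw =~ ^[0-9]+ ]]; then
  count=${BASH_REMATCH[0]}
else
  count=
  echo "no leading integer in output" >&2
fi
echo "$count"
```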

Running crontab only on one line in a file each time

I'm trying to configure crontab to execute different lines of code inside a file at different times. I basically have a bash script file that starts some java -jar commands. The problem is that each line should be executed at a different time. I can configure crontab to run the whole script at different times, but not to run individual lines. It is important that the bash script stays a single file and is not broken down into several files.
Thanks!
One way of doing it (via command line arguments passed by cron)
some_script.sh:
if test "$1" = 1 ; then
# echo "1 was entered"
java -jar some_file.jar
elif test "$1" = 2 ; then
# echo "2 was entered"
java -jar another_file.jar
fi
crontab example (minute 0, so each job runs once instead of every minute of that hour):
0 1 * * * /bin/bash /home/username/some_script.sh 1
0 2 * * * /bin/bash /home/username/some_script.sh 2
Another approach (hour matching done in bash script)
some_script.sh:
hour=$(date +"%H") # zero-padded, e.g. 01
if test "$hour" = 01 ; then
# echo "the hour is 1";
java -jar some_file.jar
elif test "$hour" = 02 ; then
# echo "the hour is 2";
java -jar another_file.jar
fi
crontab example (minute 0, so each job runs once instead of every minute of that hour):
0 1 * * * /bin/bash /home/username/some_script.sh
0 2 * * * /bin/bash /home/username/some_script.sh
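With more than two entries, the if/elif chain of the first approach may be easier to maintain as a case statement. The following is a sketch only: pick_jar is a hypothetical helper name, and the jar names are the ones used above.

```shell
#!/bin/bash
# pick_jar ARG: print the jar that belongs to the argument passed by cron.
# Returns non-zero for an unknown argument so the caller can stop.
pick_jar() {
  case "$1" in
    1) echo "some_file.jar" ;;
    2) echo "another_file.jar" ;;
    *) echo "unknown task: '$1'" >&2; return 1 ;;
  esac
}

# Show the mapping without starting anything.
for task in 1 2; do
  echo "task $task -> $(pick_jar "$task")"
done
```

In the real script, the last lines would be replaced by something like jar=$(pick_jar "$1") || exit 1 followed by java -jar "$jar", so that a wrong crontab argument fails loudly instead of silently doing nothing.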

Stopping the shell script if any of the query gets failed

Below is my shell script, from which I invoke a few Hive SQL queries. It works fine.
#!/bin/bash
DATE_YEST_FORMAT1=`perl -e 'use POSIX qw(strftime); print strftime "%Y-%m-%d",localtime(time()- 3600*504);'`
echo $DATE_YEST_FORMAT1
hive -e "
SELECT t1 [0] AS buyer_id
,t1 [1] AS item_id
,created_time
FROM (
SELECT split(ckey, '\\\\|') AS t1
,created_time
FROM (
SELECT CONCAT (
buyer_id
,'|'
,item_id
) AS ckey
,created_time
FROM dw_checkout_trans
WHERE to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) AS BIGINT))) = '$DATE_YEST_FORMAT1' distribute BY ckey sort BY ckey
,created_time DESC
) a
WHERE rank(ckey) < 1
) X
ORDER BY buyer_id
,created_time DESC;"
sleep 120
QUERY1=`hive -e "
set mapred.job.queue.name=hdmi-technology;
SELECT SUM(total_items_purchased), SUM(total_items_missingormismatch) from lip_data_quality where dt='$DATE_YEST_FORMAT2';"`
Problem Statement:-
Look at my first hive -e block, after the echo $DATE_YEST_FORMAT1. Sometimes that query fails. Currently, if the first Hive SQL query fails, the script sleeps for 120 seconds and then runs the second Hive SQL query, which is what I don't want. Is there any way to make the script stop at that point if the first query fails for any reason, and then start again from the beginning after a few minutes (the delay should be configurable)?
Update:-
As suggested by Stephen.
I tried something like this-
#!/bin/bash
hive -e " blaah blaah;"
RET_VAL=$?
echo $RET_VAL
if [ $RET_VAL -ne 0]; then
echo "HiveQL failed due to certain reason" | mailx -s "LIP Query Failed" -r rj@host.com rj@host.com
exit(1)
I got the error below, and I didn't get any email either. Is anything wrong with my syntax or approach?
syntax error at line 152: `exit' unexpected
Note:-
Zero is success here if the Hive Query is executed successfully.
Another Update after putting the space:-
After making changes like below
#!/bin/bash
hive -e " blaah blaah;"
RET_VAL=$?
echo $RET_VAL
if [ $RET_VAL -ne 0 ]; then
echo "HiveQL failed due to certain reason for LIP" | mailx -s "LIP Query Failed" -r rj@host.com rj@host.com
fi
exit
hive -e 'Another SQL Query;'
I got something like below-
RET_VAL=0
+ echo 0
0
+ [ 0 -ne 0 ]
+ exit
The status code was zero because my first query was successful, but my program exited after that and didn't go on to execute my second query. Why? I am surely missing something here again.
You may also find it useful to set the exit-immediately option:
set -e Exit immediately if a simple command (see SHELL GRAMMAR
above) exits with a non-zero status. The shell does not
exit if the command that fails is part of the command
list immediately following a while or until keyword,
part of the test in an if statement, part of a && or ||
list, or if the command's return value is being inverted
via !. A trap on ERR, if set, is executed before the
shell exits.
as in this example
#!/bin/bash
set -e
false
echo "Never reached"
Unless I'm misunderstanding the situation, it's very simple:
#!/bin/bash
DATE_YEST_FORMAT1=`perl -e 'use POSIX qw(strftime); print strftime "%Y-%m-%d",localtime(time()- 3600*504);'`
echo $DATE_YEST_FORMAT1
QUERY0="
SELECT t1 [0] AS buyer_id
,t1 [1] AS item_id
,created_time
FROM (
SELECT split(ckey, '\\\\|') AS t1
,created_time
FROM (
SELECT CONCAT (
buyer_id
,'|'
,item_id
) AS ckey
,created_time
FROM dw_checkout_trans
WHERE to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) AS BIGINT))) = '$DATE_YEST_FORMAT1' distribute BY ckey sort BY ckey
,created_time DESC
) a
WHERE rank(ckey) < 1
) X
ORDER BY buyer_id
,created_time DESC;"
if hive -e "$QUERY0"
then
sleep 120
QUERY1=`hive -e "
set mapred.job.queue.name=hdmi-technology;
SELECT SUM(total_items_purchased), SUM(total_items_missingormismatch) from lip_data_quality where dt='$DATE_YEST_FORMAT2';"`
# ...and whatever you do with $QUERY1...
fi
The string $QUERY0 is for convenience, not necessity. The key point is that you can test whether a command succeeded (returned status 0) with the if statement. The test command (better known as [) is just a command that returns 0 when the tested condition is met, and 1 (non-zero) when it is not met.
So, the if statement runs the first hive query; if it passes (exit status 0), then (and only then) does it move on to the actions in the then clause.
I've resisted the temptation to reformat your SQL; suffice to say, it is not the layout I would use in my own code.
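The question also asks for the script to start again after a configurable delay when the first query fails. One way to get that is a small retry wrapper around the command. This is a sketch under stated assumptions: retry and flaky are hypothetical names, and flaky stands in for hive -e "$QUERY0" so the loop can be tried without Hive.

```shell
#!/bin/bash
# retry MAX DELAY CMD...  (hypothetical helper)
# Runs CMD until it succeeds, sleeping DELAY seconds between attempts,
# and gives up with status 1 after MAX failed attempts.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Stand-in for the first hive query: fails twice, then succeeds.
calls=0
flaky() { calls=$((calls + 1)); (( calls >= 3 )); }

if retry 5 0 flaky; then
  echo "succeeded after $calls calls"
fi
```

In the script above, the if line would become something like if retry 3 300 hive -e "$QUERY0", where 3 attempts and 300 seconds are example values to adjust.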

Run cron job in non-silent mode?

I created a simple linux script that essentially calls sqlplus and puts the results in variable X. I then analyze X and determine whether or not I need to send out a syslog message.
The script works perfectly when I run it from the command line as "oracle"; however when I use crontab as "oracle" and add it to my job, X isn't getting filled.
I could be wrong, but I believe the issue is since cron runs things in silent mode, X isn't actually getting filled, but when I run it manually it is.
Here's my crontab -l result (as oracle):
0,30 * * * * /scripts/isOracleUp.sh syslog
Here's my full script:
#Created by: hatguy
#Created date: May 8, 2012
#File Attributes: Must be executable by "oracle"
#Description: This script is used to determine if Oracle is up
# and running. It does a simple select on dual to check this.
DATE=`date`
USER=$(whoami)
if [ "$USER" != "oracle" ]; then
#note: $0 is the full path of whatever script is being run.
echo "You must run this as oracle. Try \"su - oracle -c $0\" instead"
exit;
fi
X=`sqlplus -s '/ as sysdba'<<eof
set serveroutput on;
set feedback off;
set linesize 1000;
select count(*) as count_col from dual;
EXIT;
eof`
#This COULD be more elegant. The issue I'm having is that I can't figure out
#which hidden characters are getting fed into X, so instead what I did was
#check the string length (26) and checked that COUNT_COL and 1 were where I
#expected.
if [ ${#X} -eq 26 ] && [ ${X:1:10} = "COUNT_COL" ] && [ ${X:24:3} = "1" ] ; then
echo "Connected"
#log to a text file that we checked and confirmed connection
if [ "$1" == "syslog" ]; then
echo "$DATE: Connected" >> /scripts/log/isOracleUp.log
fi
else
echo "Not Connected"
echo "Details: $X"
if [ "$1" == "syslog" ]; then
echo "Sending this to syslog"
echo "==========================================================" >> /scripts/log/isOracleUp.log
echo "$DATE: Disconnected" >> /scripts/log/isOracleUp.log
echo "Message from sqlplus: $X" >> /scripts/log/isOracleUp.log
/scripts/sendMessageToSyslog.sh "PROD Oracle is DOWN!!!"
/scripts/sendMessageToSyslog.sh "PROD Details: $X"
fi
fi
Here's output when run as oracle from terminal:
Wed May 9 10:03:07 MDT 2012: Disconnected
Message from sqlplus: select count(*) as count_col from dual
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0
Here's my log output when run through oracle's crontab job:
Wed May 9 11:00:04 MDT 2012: Disconnected
Message from sqlplus:
And to syslog:
PROD Details:
PROD Oracle is DOWN!!!
Any help would be appreciated as I'm a new linux user and this is my first linux script.
Thanks!
My Oracle DB skills are pretty limited, but don't you need to set ORACLE_SID and ORACLE_HOME?
Check the values of these variables on the command line, set the same values for the cron job, and retry.
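Since cron starts jobs with a much smaller environment than a login shell, a check near the top of the script can make a missing variable visible in the log. The following is a sketch; require_env is a hypothetical helper name, and the right values for ORACLE_HOME and ORACLE_SID have to come from your own installation.

```shell
#!/bin/bash
# require_env NAME...  (hypothetical helper)
# Reports every named variable that is unset or empty; returns 1 if any is.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if ! require_env ORACLE_HOME ORACLE_SID; then
  echo "set these in the script or at the top of the crontab" >&2
fi
```

Once the values are known, they can be exported at the top of isOracleUp.sh, before sqlplus is called.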
