Oh wise StackOverflow users, I have a question about parallel processing in SAS 9.4:
I'm aware that SAS typically executes procedures in a sequential, or linear, manner; however, I also know that SAS is capable of executing procedures in parallel. My question is: how do you set this up? I've checked several blogs and have not had any success. The general layout of my code is this:
MACRO VARIABLES;
%syslput _all_;
RSubmit;
Data step to slightly modify variables;
Run;
EndRSubmit;
PROC SQL 1;
Connect to server statement;
SQL code;
QUIT;
...
PROC SQL n;
Connect to server statement;
SQL code;
QUIT;
There are 8 pieces of PROC SQL code that I would ideally like to execute in parallel, rather than in linear fashion. Any help or advice would be appreciated.
Thanks!
Not quite threading, but if you're able to use RSUBMIT there's an option where the code is sent and then you have control of your computer again. You still have to wait for the output, but it doesn't hang up or hold up your computer.
RSUBMIT CONNECTWAIT=NO;
Not sure if you need a slash after the RSUBMIT, but nothing shows in the documentation.
https://documentation.sas.com/?docsetId=connref&docsetTarget=p1eyablk3vvdlkn1h5euyczvt585.htm&docsetVersion=9.4&locale=en
Provided your code pieces are independent, you can submit them to run in parallel in separate SAS sessions. Even though these sessions are called "remote", you can run them even on a single machine. Here is an example of one "remote" session:
/* Process 1 */
signon task1;
rsubmit task1 wait=no;
  PROC SQL n;
  Connect to server statement;
  SQL code;
  QUIT;
endrsubmit;
You can have multiple such RSUBMIT/ENDRSUBMIT sessions.
At the end you need to synchronize your results using
waitfor _all_;
signoff _all_;
You may find a detailed explanation of this method in the Running SAS programs in parallel using SAS/CONNECT® blog post.
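To make the whole pattern concrete, here is a minimal sketch for your eight pieces, assuming MP CONNECT on a single machine; the SASCMD value, the ODBC DSN, and the table names are assumptions, not your actual code:

options sascmd="!sascmd";           /* lets SIGNON spawn extra sessions locally */

signon task1;
%syslput _all_ / remote=task1;      /* pass macro variables to the session      */
rsubmit task1 wait=no;
  proc sql;
    connect to odbc (dsn=mydsn);    /* hypothetical server connection           */
    create table work.piece1 as
      select * from connection to odbc (select * from schema.table1);
    disconnect from odbc;
  quit;
endrsubmit;

/* ... repeat SIGNON/RSUBMIT for task2 through task8 ... */

waitfor _all_ task1 /* ... task8 */;   /* block until every task finishes       */
signoff _all_;

Note that each task writes to its own session's WORK library, so you would still need PROC DOWNLOAD or the INHERITLIB= option on RSUBMIT to get the results back into the parent session.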
I have written a Sybase stored procedure to move data from certain tables (~50) on the primary db, for a given id, to the archive db. Since it's taking a very long time to archive, I am thinking of executing the same stored procedure in parallel with a unique input id for each call.
I manually ran the stored proc twice at the same time with different inputs, and it seems to work. Now I want to use Perl threads (maximum 4 threads), with each thread executing the same procedure with a different input.
Please advise whether this is the recommended way, or if there is a more efficient way to achieve this. If the experts' choice is threads, any pointers or examples would be helpful.
What you do in Perl does not really matter here: what matters is what happens on the side of the Sybase server. Assuming each client task creates its own connection to the database, then it's all fine, and how the client achieved this makes no difference to the Sybase server. But do not use a model where the different client tasks try to share the same client-server connection, as those requests will never run in parallel.
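For illustration, a minimal Perl sketch of that model, assuming DBD::Sybase; the server name, credentials, procedure name, and ids are all hypothetical:

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use DBI;

# Each worker opens its OWN connection; DBI handles must never be
# shared across threads.
sub archive_id {
    my ($id) = @_;
    my $dbh = DBI->connect('dbi:Sybase:server=PRIMARY', 'user', 'secret',
                           { RaiseError => 1 });
    $dbh->do("exec archive_proc $id");   # one proc call per connection
    $dbh->disconnect;
}

# At most 4 concurrent workers, one per input id.
my @ids     = (101, 102, 103, 104);
my @workers = map { threads->create(\&archive_id, $_) } @ids;
$_->join for @workers;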
No 'answer' per se, but some questions/comments:
Can you quantify "taking a very long time to archive"? Assuming your archive process consists of a mix of insert/select and delete operations, do query plans and MDA data show fast, efficient operations? If you're seeing table scans, sort merges, deferred inserts/deletes, etc., then it may be worth the effort to address said performance issues.
Can you expand on the comment that running two stored proc invocations at the same time seems to work? Again, any sign of performance issues for the individual proc calls? Any sign of contention (eg, blocking) between the two proc calls? If the archival proc isn't designed properly for parallel/concurrent operations (eg, eliminate blocking), then you may not be gaining much by running multiple procs in parallel.
How many engines does your dataserver have, and are you planning on running your archive process during a period of moderate-to-heavy user activity? If the current archive process runs at/near 100% cpu utilization on a single dataserver engine, then spawning 4 copies of the same process could see your archive process tying up 4 dataserver engines with heavy cpu utilization ... and if your dataserver doesn't have many engines ... combined with moderate-to-heavy user activity at the same time ... you could end up invoking the wrath of your DBA(s) and users. Net result is that you may need to make sure your archive process doesn't hog the dataserver.
One other item to consider, and this may require input from the DBAs ... if you're replicating out of either database (source or archive), increasing the volume of transactions per a given time period could have a negative effect on replication throughput (ie, an increase in replication latency); if replication latency needs to be kept at a minimum, then you may want to rethink your entire archive process from the point of view of spreading out transactional activity enough so as to not have an effect on replication latency (eg, single-threaded archive process that does a few insert/select/delete operations, sleeps a bit, then does another batch, then sleeps, ...).
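One hedged sketch of that trickle pattern in Sybase T-SQL; the database, table, and column names, the batch size, and the input id are all assumptions (in a real proc the id would be a parameter):

-- Copy one id's rows across in a single set-oriented statement, then
-- trim the source in small batches, sleeping between batches so other
-- work (and replication) gets room to breathe.
declare @id int, @moved int
select @id = 42                       -- hypothetical input id

insert archive_db..orders
    select * from primary_db..orders where cust_id = @id

select @moved = 1
set rowcount 1000                     -- cap each delete at 1,000 rows
while @moved > 0
begin
    delete primary_db..orders where cust_id = @id
    select @moved = @@rowcount
    waitfor delay '00:00:05'          -- pause between batches
end
set rowcount 0                        -- restore the default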
It's been my experience that archive processes are not considered high-priority operations (assuming they're run on a regular basis, and before the source db fills up); this in turn means the archive process is usually designed so that it's efficient while at the same time putting a (relatively) light load on the dataserver (think: running as a trickle in the background) ... ymmv ...
A number of jobs run daily on the mainframe. I need to fetch their start and end times automatically via some other JCL or Rexx. Is that possible?
Yes, it's possible. As @SaggingRufus indicates, your job scheduler (CA JMR, Control-M, etc.) may provide this functionality - talk to your technical staff and ask.
It's possible to do this via the Rexx interface to SDSF and then scheduling a job to execute your Rexx code. The ID under which the Rexx code executes must have authority to look at the jobs for which you wish to retrieve information. There is also a Java interface to SDSF.
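For illustration, a minimal sketch of that Rexx-to-SDSF interface; the job-name prefix is hypothetical, and which columns (and time values) come back depends on the panel you query:

/* REXX - list jobs shown on the SDSF ST panel                     */
rc = isfcalls('ON')                /* enable the SDSF environment  */
isfprefix = 'PROD*'                /* hypothetical job-name filter */
Address SDSF "ISFEXEC ST"
Do i = 1 To JNAME.0                /* one stem entry per panel row */
  Say JNAME.i JOBID.i
End
rc = isfcalls('OFF')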
Another mechanism that may be available to you is SMF, but that's not going to be an easy road unless you've also got MXG.
Talk to your technical staff and explain what you want and why, they may have already solved this problem.
The standard way to do what you want is to use SMF 30 records. You can do this in REXX but it will be a little bit tricky if you don't understand the format of SMF records. Your site may have a tool like SAS which will make it trivial.
SMF 30 subtype 1 is written when a job (or any address space) starts.
SMF 30 subtype 5 is written when a job ends.
There are several other subtype records written such as job step termination deltas. The SMF 30s contain absolutely everything you could possibly want to know about a batch job. If you just wanted to know how much elapsed or CPU time a job has taken then just read the subtype 5 and look at the performance section.
If you really must use REXX then there are products that have REXX APIs that access SMF data such as IBM Transaction Analysis Workbench for z/OS. Disclaimer: I'm one of the developers of that product.
This solution will work if your site uses CA JMR
//SCANJMR JOB (11111),'JMRSCAN',
// CLASS=T,MSGCLASS=X,MSGLEVEL=(1,1)
//SCAN EXEC JMRSCAN
//JMRPRINT DD DSN=&&OUTDATASET,
// DISP=(NEW,CATLG,),
// UNIT=SYSDA,SPACE=(TRK,(20,20)),
// DCB=(LRECL=133,RECFM=FB,BLKSIZE=6118)
//JMRIN DD *
FUNCTION LIST=ALL JOBMASK=* SDATE=16/11/16
EDATE=16/11/16
/*
then all you need to do is get a count of how many records are in this file.
If not, other job schedulers may provide similar functionality.
Another way could be to add a simple step to the jobs that runs a Rexx program that stores the date.
These steps needn't even be in the actual production job(s); you could schedule them as separate jobs with the production job as a successor and then as a predecessor.
Rexx has built-in time and date functions; an example of their use is:
rc = audit('OACG22X Invoked by' userid() 'at' time() 'on' date()'.')
You could update the report data by either using a DISP of MOD or by reading it in and then rewriting it with the new record added. EXECIO being the rexx function that you'd use.
When you've run the report, this would then clear the data or perhaps cycle a GDG (create an empty +1).
The following Rexx is pretty close to what could be used (albeit quite inflated; you would basically be interested in the EXECIOs and the generation of out.1 using the current date and time - note this only maintains one record in the output):
/* REXX - CYCLE TAPES WITHIN A POOL FOR EMHA800W BATCH JOB           */
/*--------------------------------------------------------------------*/
/* read in data from tape cycle dataset                               */
/*--------------------------------------------------------------------*/
"EXECIO 1 DISKR CYCTAPE (stem in. FINIS"
LastTape = SUBSTR(in.1,1,6)
If LastTape = "XXXXXX" Then Do
  NewTape = "SAP001"
  RetCde = 1
End
Else Do
  TapeNum = SUBSTR(in.1,5,2)
  If DATATYPE(TapeNum,"N") Then Do
    NewNum = TapeNum + 1
    If NewNum > 4 Then NewNum = 1
    RetCde = NewNum
    NewNum = RIGHT(NewNum,2,"0")
    NewTape = "SAP0"||NewNum
  End
  Else RetCde = 100
End
out.1 = NewTape||" "||DATE("E")||" "||TIME("N")
"EXECIO 1 DISKW CYCTAPEO (stem out. FINIS"
Say "Return Code will be "||RetCde
Return RetCde
Running Rexx via batch is detailed here: How can I run my Rexx program as a batch job?
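For reference, a minimal sketch of such a batch job, assuming the exec above is saved as member CYCTAPE; all library and dataset names are hypothetical:

//RUNREXX  JOB (11111),'RUN REXX',CLASS=T,MSGCLASS=X
//TSOBATCH EXEC PGM=IKJEFT01
//SYSEXEC  DD DSN=MY.REXX.LIBRARY,DISP=SHR
//CYCTAPE  DD DSN=MY.TAPE.CYCLE,DISP=SHR
//CYCTAPEO DD DSN=MY.TAPE.CYCLE,DISP=SHR
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  %CYCTAPE
/*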
I haven't used Zeke but from a very brief search it appears that you may be able to inspect the EMR (Event Master Record).
I have a use case where I need a sequence to wait for a period of time before it continues. Basically it is a "Thread.Sleep(x)", but this would mean the Thread is not available for the Thread pool. This could have consequences for high load systems. So therefore I have two questions:
1) What would be the best way to implement this use case?
2) How much of a burden would using Thread.Sleep be for WSO?
Alternative solutions, for example using topics and such, are also welcome :)
Hope you guys can help!
Answering the questions in the responses:
We are sending requests to an external system and an offline data store (ODS; the DSS component of WSO2). The external system has precedence, but when it doesn't return within one second we want the ODS to answer the request.
Alternative paths:
- The ODS is offline; in this case the system has to wait longer for the external system;
- The external system returns after some time; although the ODS result has already been sent to the requester, we still want the response of the external system to update our ODS.
We are currently investigating clone and aggregator.
When you say Thread.sleep(), the first thing that comes to my mind is using a Class Mediator. This would be an easy way to write custom logic and add a sleep.
The sample for "Writing your own Custom Mediation in Java" will help you to learn the steps for writing a Class Mediator.
You need to copy the Jar containing the custom mediator class to repository/components/lib/
When you use thread sleep inside your mediation logic, the request will hang for the specified time period.
This may impact your performance. But you should be able to tune the parameters for your needs.
It all depends on your requirements.
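For illustration, a minimal sketch of such a Class Mediator, assuming the standard Synapse AbstractMediator API; the package name and the default delay are assumptions:

package com.example.mediators;  // hypothetical package

import org.apache.synapse.MessageContext;
import org.apache.synapse.mediators.AbstractMediator;

public class SleepMediator extends AbstractMediator {

    private long sleepMillis = 1000;  // hypothetical default delay

    public boolean mediate(MessageContext context) {
        try {
            // Blocks the current worker thread for the configured period.
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;  // continue with the rest of the sequence
    }

    // Lets the <class> element configure the delay via a <property>.
    public void setSleepMillis(long sleepMillis) {
        this.sleepMillis = sleepMillis;
    }

    public long getSleepMillis() {
        return sleepMillis;
    }
}

After copying the Jar to repository/components/lib/, the mediator can be referenced from a sequence with a <class name="com.example.mediators.SleepMediator"/> element.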
I'm using SQLite3 on an embedded system and on x86 in a Qt application. I'm experiencing the common error "Database is locked" when multiple threads try to read/write the database.
I read this article suggested in some other answers, but I'm creating a different connection for each thread.
By adjusting the QSQLITE_BUSY_TIMEOUT option a bit (to a very large value: 10000000), I solved this problem on the x86 system, and on the embedded system only when not using transactions. Unfortunately I need to use transactions for all the work of each thread.
My question is: doesn't SQLite3 support reading from/writing to the database concurrently when using transactions? Why doesn't it simply wait as long as necessary to acquire the lock? Maybe I haven't set it up correctly?
Read the documentation for SQLite's BEGIN TRANSACTION statement. It explicitly says that the default transaction behavior is deferred, which explains the error that you are seeing. Also read this link for another good explanation.
So you need to start your SQL as "BEGIN IMMEDIATE TRANSACTION" and everybody else must do the same.
You can find a source code example here. Pay attention to the
bool SqlEngine::beginTransaction()
method and do the same in your code.
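A minimal sketch of that approach in Qt, assuming each worker thread already has its own named connection; the connection name and table are hypothetical:

#include <QSqlDatabase>
#include <QSqlQuery>
#include <QSqlError>
#include <QDebug>

void writeReading(int value)
{
    QSqlDatabase db = QSqlDatabase::database("worker-connection");
    QSqlQuery query(db);

    // BEGIN IMMEDIATE takes the write lock up front (honouring the busy
    // timeout) instead of failing later when a deferred transaction
    // tries to upgrade from a read lock to a write lock.
    if (!query.exec("BEGIN IMMEDIATE TRANSACTION")) {
        qWarning() << "lock not acquired:" << query.lastError().text();
        return;
    }
    query.prepare("INSERT INTO readings (value) VALUES (?)");
    query.addBindValue(value);
    query.exec();
    query.exec("COMMIT");
}

Each thread must still use its own connection, as you are already doing.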
I'm creating a Windows console application that will read a text file line by line and extract fixed-length data from each string. The application is written as a Windows application for now, but it will be converted to a Windows console app later on. I've noticed that it takes a while for the application to run, from reading the text to inserting into the database and exporting out of the database.
Would it help speed up the process if I used multiple threads? I'm thinking one thread to read the data and another thread to insert the data into the database.
Any suggestions?
Edit: the application is going to be done in VB.NET
I will assume this is an SQL database.
Your problem is likely to be that you are doing one item at a time. SQL hates that. SQL and SQL databases operate on sets of items.
So, open a transaction, read and insert 1,000 items. Save those items in case the transaction commit fails for some reason so that you can retry.
I have managed to speed up some Perl scripts doing work that sounds similar to your description by over 20x with this technique.
I do not know the Microsoft library that you are using, but here is a sample in Perl using DBI. The parts that make it work are AutoCommit => 0 and $dbh->commit.
#!/usr/bin/perl
use strict;
use DBI;

my $dbname = 'urls';
my $user   = 'postgres';
my $pass   = '';
my $dbh    = DBI->connect(
    "DBI:Pg:dbname=$dbname",
    $user,
    $pass,
    { 'RaiseError' => 1, AutoCommit => 0 }
);
my $insert = $dbh->prepare('
    INSERT INTO todo (domain, path)
    VALUES (?, ?)
');
my $count = 0;
while (<>) {
    # commit in batches of 1,000 rows rather than once per insert
    if ($count++ % 1000 == 0) {
        $dbh->commit;
    }
    chomp;
    my ($one, $two) = split;
    $insert->execute($one, $two);
}
$dbh->commit;    # commit the final partial batch
$dbh->disconnect;
With multiple threads, you may be able to get some overlap - one thread is reading from disk while another thread is doing a database insert. I'm guessing that you probably won't see that much of an improvement - unless you're reading very large files, most of your time is probably spent inserting into the database, and the time in disk I/O is just noise.
It's impossible to say in general - the only way to find out is to build the app and test the performance. The bottleneck is likely to be the DB insert, but whether multi-threading will speed things up depends on a host of factors:
are your app and the db server running on the same machine?
do they use the same disk?
can one insert cause contention with another?
You get the idea. Having said that, I have written servers in the finance industry where multi-threading the DB access did make a huge difference. But these were talking to a gigantic Sun enterprise server which had database I/Os to spare, so flooding it with requests from a multi-threaded app made sense.
Committing data to the database is a time-intensive operation. Try collecting items in batches (say 1,000) and submitting these batches to the database rather than submitting the items one by one. This should improve your performance. Multithreading is overkill for this type of application.
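Since the question mentions VB.NET, here is a rough sketch of that batching idea using ADO.NET; the connection string, table, and column names are hypothetical:

Imports System.Collections.Generic
Imports System.Data
Imports System.Data.SqlClient

Module BatchInsert
    ' Commit inserts in batches of 1,000 inside explicit transactions.
    Sub InsertLines(lines As IEnumerable(Of String))
        Using conn As New SqlConnection("Server=.;Database=Staging;Integrated Security=true")
            conn.Open()
            Dim tx = conn.BeginTransaction()
            Dim cmd = conn.CreateCommand()
            cmd.Transaction = tx
            cmd.CommandText = "INSERT INTO Records (RawLine) VALUES (@line)"
            cmd.Parameters.Add("@line", SqlDbType.VarChar, 200)

            Dim count = 0
            For Each line In lines
                cmd.Parameters("@line").Value = line
                cmd.ExecuteNonQuery()
                count += 1
                If count Mod 1000 = 0 Then
                    tx.Commit()                      ' flush a full batch
                    tx = conn.BeginTransaction()     ' start the next one
                    cmd.Transaction = tx
                End If
            Next
            tx.Commit()                              ' final partial batch
        End Using
    End Sub
End Module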
You probably wouldn't gain much from that, as the task you're outlining here is pretty much sequential in nature.
You won't know if multithreading will help until you build the application, but it seems that you really just want better performance. Before doing anything you need to measure the performance of the application. Perhaps there is some code that is inefficient, so use a profiler to identify bottlenecks.
Multiple threads do not always improve performance; basic multithreading only helps if the activities can truly be executed in parallel. If lots of I/O operations are being done while reading data, then it's worth a try. The best way is to prototype and verify.
What are you using to build the Windows app? If you are using .NET, use the thread pool. There is a nice library called Power Threading developed by Jeff Richter (Download).
Also, understand how threads work in the Windows OS. Adding multiple threads sometimes may not help, and I often do not encourage it.