Parallel processing in ODI - multithreading

I want to run a scenario in ODI in parallel, based on a condition.
For example, if the year in the table is 2015, 2016, and so on, each year should run in parallel and be stored in the target table, without creating multiple scenarios.
How do I do that using the asynchronous type in ODI, without having to create multiple mappings?

Related

Run VoltDB stored procedures at regular interval from VoltDB

Is there any way to execute VoltDB stored procedures at a regular interval, or to schedule a stored procedure to run at a specific time?
I am exploring VoltDB to shift our product from an RDBMS to VoltDB. Our product is written in Java.
Most of the queries can be migrated into VoltDB stored procedures, but in our product we have a cron job in Oracle which executes at a regular interval, and I cannot find such a feature in VoltDB.
I know VoltDB stored procedures can be called from the application at a regular interval, but our product deploys in an active-active mode. In that case, every application instance would call the stored procedure at the interval, which is not a good solution; otherwise, we would have to develop some mechanism to run the procedure from one instance only.
So it would be good if VoltDB offered a cron-job-like feature.
I work at VoltDB. There currently isn't a feature like this in VoltDB, analogous to DBMS_JOB in Oracle.
You could certainly use a cron job on one of the servers in your cluster, or on some other server within your network, that invokes sqlcmd to run a script, echo individual SQL statements, or execute procedure commands against the database. Making cron jobs highly available is a general problem. You might find these other discussions helpful:
How to convert Linux cron jobs to "the Amazon way"?
https://www.reddit.com/r/linuxadmin/comments/3j3bz4/run_cronjob_only_on_one_node_in_cluster/
You could also look into something like rcron.
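As a minimal sketch of the cron approach, a crontab entry on one designated node could pipe a procedure call into sqlcmd (the schedule, host name, and procedure name below are placeholders):

# Every 5 minutes, call a cleanup procedure through sqlcmd on one node only.
*/5 * * * * echo "exec CleanupExpired;" | sqlcmd --servers=voltdb-node1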
One thing to be careful of when converting from an RDBMS to VoltDB is that VoltDB is optimized for processing many small transactions in parallel across many partitions. While the architecture of serialized execution per partition excels for many operational and streaming workloads, it is not designed to perform bulk operations on many rows at a time, especially transactions that need to perform writes on many rows that may be in different partitions within one transaction.
If you have a periodic job that does something like "process all the new rows that meet some criteria" you may find this transaction is slow and every time it runs it could delay other parts of the workload, especially if many rows have accumulated. It would be more the "VoltDB Way" to replace a simple INSERT statement that you may be using to ingest data (to be processed later by a scheduled job) with a procedure that inserts and immediately processes the row of data. You might even need a procedure that checks for other records and processes small sets of rows as a group, for example stitching together segments of data that go together but may have arrived out of order. By operating on fewer records at a time within one partition at a time, this type of procedure would be more scalable and would keep the data closer to your desired finished state in real time, rather than always having some data waiting to be processed.
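As an illustration of that pattern, here is a minimal sketch of a VoltDB procedure that ingests a row and folds it into a running rollup in the same transaction (the tables, columns, and rollup logic are made up; the procedure would be partitioned on device_id in the DDL):

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical single-partition procedure: insert a reading and immediately
// update a per-device rollup, instead of leaving the row for a batch job.
public class InsertAndProcess extends VoltProcedure {

    public final SQLStmt insertReading = new SQLStmt(
        "INSERT INTO readings (device_id, ts, value) VALUES (?, ?, ?);");

    public final SQLStmt updateRollup = new SQLStmt(
        "UPDATE device_rollup SET total = total + ?, last_ts = ? WHERE device_id = ?;");

    public VoltTable[] run(int deviceId, long ts, double value) {
        voltQueueSQL(insertReading, deviceId, ts, value);
        voltQueueSQL(updateRollup, value, ts, deviceId);
        return voltExecuteSQL(true);
    }
}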

Is it possible to run a table to table mapping scenario in parallel (multi threading)

We have a huge table, and we have already created a table-to-table mapping and a scenario on that mapping.
We are also executing it from a load plan.
But is there a way to run the scenario in multiple threads to make the data transfer faster?
I am using Groovy to script all these tasks, so it would be better if there were a way to script this in Groovy.
A load plan with parallel steps, or a package with scenarios started in asynchronous mode, will do for the parallelism part.
An issue you might run into, depending on which KMs are used, is that the same names will be used by the temporary tables in all mappings. To avoid that, select the "Use Unique Temporary Object Names" checkbox that appears in the Physical tab of your mapping. It will generate a different name for these objects for each execution.
It is possible on the ODI side, though you may need some modifications to the mapping so that it does not load any duplicate data. We have a similar flow where we use the modulo function on a numeric key to split the source data into partitions; each partition then gets loaded into the target.
To run this mapping in a multi-threaded way, we have a package with a loop that asynchronously executes the scenario of this mapping, passing a different MODULO_VALUE variable to each invocation, as sketched below.
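For reference, the loop step can invoke the OdiStartScen tool with SYNC_MODE=2 for asynchronous execution; the scenario, project, and variable names here are placeholders:

OdiStartScen "-SCEN_NAME=LOAD_TARGET_TABLE" "-SCEN_VERSION=001" "-SYNC_MODE=2" "-MYPROJECT.MODULO_VALUE=#MYPROJECT.LOOP_INDEX"

An OdiWaitForChildSession step after the loop can then wait for all the asynchronous sessions to finish before any post-processing.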
For loading the data we use the Oracle SQL*Loader utility, which is able to load data into one target table in parallel. I am not sure whether the Data Pump utility has this ability as well. But I do know that if you try to load the data with plain SQL in a multi-threaded approach, you can get an "ORA-00054: resource busy and acquire with NOWAIT specified" error.
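For example, direct-path parallel loads from several sessions into the same table look roughly like this (the connect string and control file names are placeholders):

sqlldr userid=scott/tiger control=load_part1.ctl direct=true parallel=true
sqlldr userid=scott/tiger control=load_part2.ctl direct=true parallel=true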
As you can see, there is no Groovy code involved in this flow; it is all handled by ODI mappings, packages and KMs. I hope this helps.

U-SQL - Execution related queries

I wrote multiple U-SQL scripts whose output is stored in ADLA. Based on this, I have a few questions.
How can we run dependent jobs in U-SQL?
How can we execute a statement based on some condition, like:
If RecordCount > 0 then
insert into table1
endif
How can we schedule U-SQL jobs?
Can we write multiple scripts and call them from a main script?
During script execution, the compiler prepares and compiles the code, which takes almost 30-40 seconds. How can we bundle the compiled code and create the ADF pipeline?
You can schedule and orchestrate U-SQL jobs with Azure Data Factory or by writing your own scheduler with one of the SDKs (PowerShell, C#, Java, Node.js, Python).
U-SQL supports two ways for conditional execution:
If your conditional can be evaluated at compile time, e.g., when you pass a parameter value or check for the existence of a file, you can use the IF statement.
If your conditional can only be determined during the execution of the script, then you can use the WHERE clause as wBob outlines in his comment.
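A minimal sketch combining both patterns (the file path, rowset schema, and target table are made up):

// Compile-time conditional: the branch is only compiled when the file exists.
IF FILE.EXISTS("/data/input.csv") THEN
    @rows =
        EXTRACT id int,
                value string
        FROM "/data/input.csv"
        USING Extractors.Csv();

    // Runtime conditional via WHERE: the INSERT simply writes zero rows
    // when no rows satisfy the predicate.
    INSERT INTO dbo.Table1
    SELECT id, value
    FROM @rows
    WHERE value != "";
END;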
As wBob mentions, you can encapsulate most of the U-SQL statements in procedures and then call them from other scripts/procedures, or you can write your own way of inclusion/orchestration if you need script file reuse.
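A rough sketch of that encapsulation (the database, procedure name, and body are made up): the procedure is created once and can then be called from any other script with MyDB.dbo.LoadTable1();

CREATE PROCEDURE IF NOT EXISTS MyDB.dbo.LoadTable1()
AS BEGIN
    @rows =
        EXTRACT id int,
                value string
        FROM "/data/input.csv"
        USING Extractors.Csv();

    INSERT INTO dbo.Table1
    SELECT id, value
    FROM @rows;
END;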
There is currently no ability to reuse and submit just the compiled code, since the compilation depends on exact information such as which files are present and the statistics of the accessed data.

Parallel loading of CSV files into one table by SQLLDR with SEQUENCE(MAX,1)

I have around 100 threads running in parallel, dumping data into a single table using a sqlldr control (ctl) file. The control file generates values for the ID column using the expression ID SEQUENCE(MAX,1).
The process fails to load the files properly due to the parallel execution: two or more threads may get the same ID. It works fine when I run it sequentially with a single thread.
Please suggest a workaround.
Each CSV file contains data associated with a test case, and the cases are supposed to run in parallel, so I cannot concatenate all the files in one go.
You could load the data first and then run a separate update in which you populate ID from a traditional Oracle sequence.
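A rough sketch of that approach, assuming the control file loads the rows with ID left NULL (the table and sequence names are made up):

-- Created once, outside the load.
CREATE SEQUENCE test_result_id_seq;

-- After each load completes, assign unique IDs from the sequence.
UPDATE test_results
SET    id = test_result_id_seq.NEXTVAL
WHERE  id IS NULL;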

Multi-threading access to SubmitChanges() (LINQ to SQL)

I am using Visual Studio 2010 Beta 2.
In a Parallel.For loop I execute the same method with different parameter values. After execution, the processed data must be stored in the database.
But I get an exception that says I cannot work with the same data context from different threads.
So the question is: how do I work with the data context and SubmitChanges() from multiple threads?
I would recommend creating a thread-safe structure for storing your results. Once your Parallel.For has completed, you can read the results out of the structure and push them into your LINQ data context.
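A minimal C# sketch of that pattern (MyDataContext, Result, ComputeResult, and the inputs array are hypothetical stand-ins for your own types):

// Requires System.Collections.Concurrent and System.Threading.Tasks.
var results = new ConcurrentBag<Result>();

// Each iteration touches only the thread-safe bag, never a DataContext.
Parallel.For(0, inputs.Length, i =>
{
    results.Add(ComputeResult(inputs[i]));
});

// A single DataContext, used from one thread only, submits everything at the end.
using (var db = new MyDataContext())
{
    db.Results.InsertAllOnSubmit(results);
    db.SubmitChanges();
}

This keeps all database access on one thread, which is what LINQ to SQL's DataContext requires.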
