Using TransactionScope to wrap Rhino ETL processes

I'm using Rhino ETL to handle both SQL Server and Excel file extraction/load.
I've put more than one ETL process (each of which inserts data into SQL Server tables) inside a C# TransactionScope so that I can roll back the whole operation in case of any errors or exceptions.
I'm not sure whether a C# TransactionScope can be used around ETL processes at all, but this is what I did:
private void ELTProcessesInTransaction(string a, string b, int check)
{
    using (var scope = new TransactionScope())
    {
        using (ETLProcess1 etlProcess1 = new ETLProcess1(a, b))
        {
            etlProcess1.Execute();
        }

        using (ETLProcess2 etlProcess2 = new ETLProcess2(a, b))
        {
            etlProcess2.Execute();
        }

        if (!_InSeverDB.HasError(check))
            scope.Complete(); // Commits based on a condition
    }
}
This does not roll back the transaction at all. I tried removing the scope.Complete() line, but the process still gets committed.
Please let me know where I need to correct this, or whether the whole approach is incorrect.
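For reference, TransactionScope only rolls back work done on connections that enlist in its ambient transaction; work done on a connection that never enlists is untouched regardless of whether Complete() is called. Below is a minimal sketch of that baseline behaviour, assuming a plain SqlConnection (which auto-enlists by default) and a hypothetical table; it is not the Rhino ETL code itself:
// Requires System.Transactions and System.Data.SqlClient.
// SomeTable and the connection string are placeholders.
using (var scope = new TransactionScope())
using (var connection = new SqlConnection("your connection string"))
{
    connection.Open(); // opened inside the scope, so it auto-enlists

    using (var command = new SqlCommand(
        "INSERT INTO SomeTable (SomeColumn) VALUES ('test')", connection))
    {
        command.ExecuteNonQuery();
    }

    // No scope.Complete() here: the insert above is rolled back
    // when the scope is disposed.
}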

Related

Durable Task Framework re-queue failed task

How to use "waiting for external" event functionality of durable task framework in the code. Following is a sample code.
context.ScheduleWithRetry<LicenseActivityResponse>(
    typeof(LicensesCreatorActivity),
    _retryOptions,
    input);
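For context, _retryOptions here is a DurableTask RetryOptions instance; a minimal sketch of how one might be constructed (the interval and attempt count below are placeholder values, not taken from the question):
// Placeholder values: wait 5 seconds before the first retry, allow at most 3 attempts.
var _retryOptions = new RetryOptions(TimeSpan.FromSeconds(5), 3);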
I am using the ScheduleWithRetry<> method of the context to schedule my task on DTF, but when an exception occurs in the code, the method above retries for the number of attempts given in _retryOptions.
After the retries are exhausted, the orchestration status is marked as Failed.
I need a way to resume my orchestration on DTF after correcting the cause of the exception.
I have looked into the GitHub code for the method concerned, but with no success.
I have come up with two possible solutions:
Call a framework method (if one exists) to re-queue the orchestration from the state where it failed.
Wrap the orchestration code in a try/catch and, in the catch block, call a method such as CreateOrchestrationInstanceWithRaisedEventAsync that puts the orchestration into a held state until an external event triggers it again. A user (via some front-end application) would raise that external event to resume the orchestration once they have made the corrections that were causing the exception.
These are my understandings. If one of the above is possible, kindly guide me with some technical suggestions; otherwise, point me to the correct approach for this task.
For the community's benefit, Salman resolved the issue by doing the following:
"I solved the problem by creating a sub orchestration in case of an exception occurs while performing an activity. The sub orchestration lock the event on azure as pending state and wait for an external event which raise the locked event so that the parent orchestration resumes the process on activity. This process helps if our orchestrations is about to fail on azure durable task framework"
I have figured out the solution for my problem by using "Signal Orchestrations", taken from code in the GitHub repository.
Following is the solution diagram for the problem.
In this diagram, before the solution was implemented, we only had "Process Activity", which actually executes the activity.
The Azure Storage Table stores the multiplier values for an InstanceId and ActivityName; why we introduced this will become clear later.
The Monitoring Website is the platform from which a user can re-queue/retry the orchestration activity.
Now we have a pre-step and a post-step.
1. Get Retry Option (Pre-Step)
This method sets up the RetryOptions instance.
private RetryOptions ModifyMaxRetires(OrchestrationContext context, string activityName)
{
    var failedInstance =
        _azureStorageFailedOrchestrationTasks.GetSingleEntity(context.OrchestrationInstance.InstanceId,
            activityName);
    var configuration = Container.GetInstance<IConfigurationManager>();

    if (failedInstance.Result == null)
    {
        return new RetryOptions(TimeSpan.FromSeconds(configuration.OrderTaskFailureWaitInSeconds),
            configuration.OrderTaskMaxRetries);
    }

    var multiplier = ((FailedOrchestrationEntity)failedInstance.Result).Multiplier;
    return new RetryOptions(TimeSpan.FromSeconds(configuration.OrderTaskFailureWaitInSeconds),
        configuration.OrderTaskMaxRetries * multiplier);
}
If there is an entry in our Azure storage table for the InstanceId and ActivityName, we take the multiplier value from the table and use it to scale the retry count when creating the RetryOptions instance; otherwise we use the default retry count coming from our config.
Then:
We process the activity with the scheduled retry count (in case the activity fails at any point); the sketch below shows how the pieces fit together.
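A minimal sketch of how the pre-step, the retried activity call, and the post-step fit together inside the parent orchestration's RunTask (the activity type ProcessOrderActivity and the input variable are illustrative placeholders; ModifyMaxRetires and HandleExceptionForSignal are the methods shown in this answer):
// Sketch only: ProcessOrderActivity and input are placeholders.
var retryOptions = ModifyMaxRetires(context, nameof(ProcessOrderActivity));
try
{
    // Schedule the activity with the retry count computed in the pre-step.
    await context.ScheduleWithRetry<LicenseActivityResponse>(
        typeof(ProcessOrderActivity), retryOptions, input);
}
catch (Exception exception)
{
    // Post-step: record the failure and hand off to the signal sub-orchestration.
    await HandleExceptionForSignal(context, exception, nameof(ProcessOrderActivity));
}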
2. Handle Exceptions (Post-Step)
This method handles the exception when the activity fails to complete even after the number of retries set for it in the RetryOptions instance.
private async Task HandleExceptionForSignal(OrchestrationContext context, Exception exception, string activityName)
{
    var failedInstance = _azureStorageFailedOrchestrationTasks.GetSingleEntity(context.OrchestrationInstance.InstanceId, activityName);
    if (failedInstance.Result != null)
    {
        _azureStorageFailedOrchestrationTasks.UpdateSingleEntity(context.OrchestrationInstance.InstanceId, activityName, ((FailedOrchestrationEntity)failedInstance.Result).Multiplier + 1);
    }
    else
    {
        // Constant multiplier used when an exception occurs for the first time.
        const int multiplier = 2;
        _azureStorageFailedOrchestrationTasks.InsertActivity(new FailedOrchestrationEntity(context.OrchestrationInstance.InstanceId, activityName)
        {
            Multiplier = multiplier
        });
    }

    var exceptionInput = new OrderExceptionContext
    {
        Exception = exception.ToString(),
        Message = exception.Message
    };

    await context.CreateSubOrchestrationInstance<string>(typeof(ProcessFailedOrderOrchestration), $"{context.OrchestrationInstance.InstanceId}_{Guid.NewGuid()}", exceptionInput);
}
The above code first tries to find the InstanceId and ActivityName in Azure storage. If an entry already exists, we increment its multiplier; if not, we simply add a new row for the InstanceId and ActivityName with the default multiplier value of 2.
We then create an exception context instance to pass the exception message and details to the sub-orchestration (which will be shown to a user on the monitoring website). The sub-orchestration waits for an external event raised by a user against the InstanceId of the sub-orchestration.
Whenever that event is raised from the monitoring website, the sub-orchestration completes and control goes back to the parent orchestration, which starts the activity once again. This time, when the pre-step is called, it finds the entry in the Azure storage table with a multiplier, which means the retry count is the default value multiplied by that multiplier.
In this way we can keep our orchestrations going and prevent them from failing.
Following is the sub-orchestration class.
internal class ProcessFailedOrderOrchestration : TaskOrchestration<string, OrderExceptionContext>
{
    private TaskCompletionSource<string> _resumeHandle;

    public override async Task<string> RunTask(OrchestrationContext context, OrderExceptionContext input)
    {
        await WaitForSignal();
        return "Completed";
    }

    private async Task<string> WaitForSignal()
    {
        _resumeHandle = new TaskCompletionSource<string>();
        var data = await _resumeHandle.Task;
        _resumeHandle = null;
        return data;
    }

    public override void OnEvent(OrchestrationContext context, string name, string input)
    {
        _resumeHandle?.SetResult(input);
    }
}
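For completeness, a minimal sketch of how the monitoring website side might raise that external event using a DurableTask TaskHubClient (the taskHubClient variable, the stored sub-orchestration instance id, and the event name "Resume" are assumptions, not code from the answer):
// Sketch only: taskHubClient is an already-configured DurableTask TaskHubClient,
// subOrchestrationInstanceId comes from the monitoring website's own bookkeeping,
// and "Resume" is an arbitrary event name that ends up in OnEvent above.
var instance = new OrchestrationInstance { InstanceId = subOrchestrationInstanceId };
await taskHubClient.RaiseEventAsync(instance, "Resume", "resume-payload");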

Flux - Isn't it a bad practice to include the dispatcher instance everywhere?

Note: My question is about the way of including/passing the dispatcher instance around, not about how the pattern is useful.
I am studying the Flux Architecture and I cannot get my head around the concept of the dispatcher (instance) potentially being included everywhere...
What if I want to trigger an Action from my Model Layer? It feels weird to me to include an instance of an object in my Model files... I feel like this is missing some injection pattern...
I have the impression that the exact PHP equivalent is something (that feels) horribly similar to:
<?php
$dispatcher = require '../dispatcher_instance.php';

class MyModel {
    ...
    public function someMethod() {
        ...
        $dispatcher->...
    }
}
I think my question is not only related to the Flux Architecture, but more to the NodeJS "way of doing things"/practices in general.
TLDR:
No, it is not bad practice to pass around the instance of the dispatcher in your stores
All data stores should have a reference to the dispatcher
The invoking/consuming code (in React, this is usually the view) should only have references to the action-creators, not the dispatcher
Your code doesn't quite align with React because you are creating a public mutable function on your data store.
The ONLY way to communicate with a store in Flux is via message passing which always flows through the dispatcher.
For example:
var Dispatcher = require('MyAppDispatcher');
var ExampleActions = require('ExampleActions');

var _data = 10;

var ExampleStore = assign({}, EventEmitter.prototype, {
  getData() {
    return _data;
  },

  emitChange() {
    this.emit('change');
  },

  dispatcherKey: Dispatcher.register(payload => {
    var {action} = payload;
    switch (action.type) {
      case ACTIONS.ADD_1:
        _data += 1;
        ExampleStore.emitChange();
        ExampleActions.doThatOtherThing();
        break;
    }
  })
});

module.exports = ExampleStore;
By closing over _data instead of having a data property directly on the store, you can enforce the message passing rule. It's a private member.
Also important to note, although you can call Dispatcher.emit() directly, it's not a good idea.
There are two main reasons to go through the action-creators:
Consistency - This is how your views and other consuming code interacts with the stores
Easier Refactoring - If you ever remove the ADD_1 action from your app, this code will throw an exception rather than silently failing by sending a message that doesn't match any of the switch statements in any of the stores
Main Advantages to this Approach
Loose coupling - Adding and removing features is a breeze. Stores can respond to any event in the system by adding one line of code.
Less complexity - One-way data flow makes wrapping your head around the data flow a lot easier. Fewer interdependencies.
Easier debugging - You can debug every change in your system with a few lines of code.
Debugging example:
var MyAppDispatcher = require('MyAppDispatcher');

MyAppDispatcher.register(payload => {
  console.debug(payload);
});

Transactions in a multi-threaded environment in SQLite on WP 8.1

I am facing an issue using SQLite in the following scenario.
There are two threads working on the database.
Both threads insert messages inside transactions.
So if, say, one thread commits after inserting 20k rows while the other thread has not committed yet,
I see in the output that all the data inserted by thread 2 up to the moment thread 1 committed has been committed as well.
Sample function:
/// <summary>
/// Inserts list of messages in message table
/// </summary>
/// <param name="listMessages"></param>
/// <returns></returns>
public bool InsertMessages(IList<MessageBase> listMessages, bool commitTransaction)
{
    bool success = false;
    if (listMessages == null || listMessages.Count == 0)
        return success;

    DatabaseHelper.BeginTransaction(_sqlLiteConnection);

    foreach (MessageBase message in listMessages)
    {
        using (var statement = _sqlLiteConnection.Prepare(_insertMessageQuery))
        {
            BindMessageData(message, statement);
            SQLiteResult result = statement.Step();
            success = result == SQLiteResult.DONE;
            if (success)
            {
                Debug.WriteLine("Message inserted successfully, messageId:{0}, message:{1}", message.Id, message.Message);
            }
            else
            {
                Debug.WriteLine("Message failed, Result:{0}, message:{1}", result, message.Message);
            }
        }
    }

    if (commitTransaction)
    {
        Debug.WriteLine("Data committed");
        DatabaseHelper.CommitTransaction(_sqlLiteConnection);
    }
    else
    {
        Debug.WriteLine("Data not committed");
    }

    return success;
}
Is there any way to prevent thread 1's commit from also committing thread 2's inserts?
In short, it's not possible on a single database.
A single SQLite database cannot have multiple simultaneous writers with separate transaction contexts. A single connection also does not give you separate contexts per thread; to get another context, you need to open another connection. However, as soon as you start the initial write with an INSERT (or UPDATE/DELETE), the transaction needs a RESERVED lock on the database (readers allowed, no other writers), which means parallel writes are impossible. I thought you might be able to fake it with SAVEPOINT and RELEASE, but these are also serialized on the connection and do not create a separate context.
With that in mind, you may be able to use separate databases connected using ATTACH DATABASE, as long as both threads are not writing to the same table. To do so, you would attach at runtime the additional database that contains the other tables. However, you still need a separate connection for each parallel writer, because a commit on a connection applies to all transactions open on that connection.
Otherwise, you can get separate transactions by opening an additional connection; the later connection and transaction will simply have to wait until the RESERVED lock is released. A minimal sketch of that per-connection approach follows.
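To make the per-connection point concrete, here is a sketch assuming the same SQLitePCL-style API used in the question (the database file name, table, and SQL are placeholders): each writer thread opens its own connection and runs its own BEGIN/COMMIT, so one thread's commit cannot sweep up the other thread's pending inserts; the second writer simply waits, or retries on a busy result, until the first releases its RESERVED lock.
// Sketch only: give each writer thread its own connection, and therefore
// its own transaction context.
private void InsertMessagesOnOwnConnection(IEnumerable<string> messages)
{
    using (var connection = new SQLiteConnection("messages.db"))
    {
        using (var begin = connection.Prepare("BEGIN IMMEDIATE TRANSACTION"))
            begin.Step();

        foreach (var message in messages)
        {
            using (var insert = connection.Prepare("INSERT INTO Message (Body) VALUES (?)"))
            {
                insert.Bind(1, message);
                insert.Step();
            }
        }

        using (var commit = connection.Prepare("COMMIT TRANSACTION"))
            commit.Step();
    }
}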
References:
SQLite Transactions
SQLite Locking
SQLite ATTACH DATABASE

Caching requests to reduce processing (TPL?)

I'm currently trying to reduce the number of similar requests being processed in a business layer by:
Caching the requests a method receives
Performing the slow processing task (once for all similar requests)
Returning the result to each requesting method call
Things to note are that:
The original method calls are not currently in an async BeginMethod()/EndMethod(IAsyncResult) pattern
The requests arrive faster than the time it takes to generate the output
I'm trying to use TPL where possible, as I am currently trying to learn more about this library
e.g. improving the following:
byte[] RequestSlowOperation(string operationParameter)
{
    // Perform slow task here...
}
Any thoughts?
Follow up:
class SomeClass
{
    private int _threadCount;

    public SomeClass(int threadCount)
    {
        _threadCount = threadCount;
        int parameter = 0;
        var taskFactory = Task<int>.Factory;
        for (int i = 0; i < threadCount; i++)
        {
            int i1 = i;
            taskFactory
                .StartNew(() => RequestSlowOperation(parameter))
                .ContinueWith(result => Console.WriteLine("Result {0} : {1}", result.Result, i1));
        }
    }

    private int RequestSlowOperation(int parameter)
    {
        Lazy<int> result2;
        var result = _cacheMap.GetOrAdd(parameter, new Lazy<int>(() => RequestSlowOperation2(parameter))).Value;
        //_cacheMap.TryRemove(parameter, out result2); <<<<< Thought I could remove immediately, but this causes blobby behaviour
        return result;
    }

    static ConcurrentDictionary<int, Lazy<int>> _cacheMap = new ConcurrentDictionary<int, Lazy<int>>();

    private int RequestSlowOperation2(int parameter)
    {
        Console.WriteLine("Evaluating");
        Thread.Sleep(100);
        return parameter;
    }
}
Here is a fast, safe and maintainable way to do this:
static readonly ConcurrentDictionary<string, Lazy<byte[]>> cacheMap =
    new ConcurrentDictionary<string, Lazy<byte[]>>();

byte[] RequestSlowOperation(string operationParameter)
{
    return cacheMap
        .GetOrAdd(operationParameter, key => new Lazy<byte[]>(() => RequestSlowOperation2(key)))
        .Value;
}

byte[] RequestSlowOperation2(string operationParameter)
{
    // Perform slow task here...
}
This will execute RequestSlowOperation2 at most once per key. Please be aware that the memory held by the dictionary will never be released.
The user delegate passed to ConcurrentDictionary.GetOrAdd is not executed under the dictionary's lock, meaning that it could execute multiple times for the same key! My solution allows multiple Lazy wrappers to be created, but only one of them will ever be published, and only the published one is materialized; see the sketch below.
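A minimal sketch of the race being described, with a hypothetical ExpensiveWork method standing in for the slow operation (none of these names are from the answer above):
static readonly ConcurrentDictionary<string, byte[]> plainMap = new ConcurrentDictionary<string, byte[]>();
static readonly ConcurrentDictionary<string, Lazy<byte[]>> lazyMap = new ConcurrentDictionary<string, Lazy<byte[]>>();

// Hypothetical expensive operation, used only for this illustration.
static byte[] ExpensiveWork(string key)
{
    Thread.Sleep(100); // simulate slow work
    return new byte[16];
}

static byte[] GetWithoutLazy(string key)
{
    // Two racing threads can both miss the key and both run ExpensiveWork;
    // only one result is published, but the expensive work may run twice.
    return plainMap.GetOrAdd(key, k => ExpensiveWork(k));
}

static byte[] GetWithLazy(string key)
{
    // Several Lazy wrappers may be created during the race, but every caller
    // gets the single published wrapper back, so ExpensiveWork runs at most
    // once per key when .Value is read.
    return lazyMap.GetOrAdd(key, k => new Lazy<byte[]>(() => ExpensiveWork(k))).Value;
}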
Regarding locking: this solution will take locks, but it does not matter because the work items are far more expensive than the (few) lock operations.
Honestly, the use of TPL as a technology here is not really important; this is just a straight-up concurrency problem. You're trying to protect access to a shared resource (the cached data) and, to do that, the only approach is to lock. Alternatively, if the cache entry does not already exist, you could allow all incoming threads to generate it and let subsequent requesters benefit from the cached value once it's stored, but there's little value in that if the resource is slow/expensive to generate and cache.
Perhaps some more details will make it clear exactly why you're trying to accomplish this without a lock. I'll happily revise my answer if more detail makes it clearer what you're trying to do.

Trouble getting transaction working with SubSonic

I'm having a little trouble getting a multi-delete transaction working using SubSonic in an ASP.NET/SQL Server 2005 environment. It seems it always makes the change(s) in the database even without a call to the Complete method on the TransactionScope object.
I've been reading through the posts regarding this and tried various alternatives (switching the order of my using statements, using DTC, not using DTC, etc.) but no joy so far.
I'm going to assume it's my code that's the problem, but I can't spot the issue; is anyone able to help? I'm using SubSonic 2.2. Code sample below:
using (TransactionScope ts = new TransactionScope())
{
    using (SharedDbConnectionScope sts = new SharedDbConnectionScope())
    {
        foreach (RelatedAsset raAsset in relAssets)
        {
            // grab the asset id:
            Guid assetId = new Select(RelatedAssetLink.AssetIdColumn)
                .From<RelatedAssetLink>()
                .Where(RelatedAssetLink.RelatedAssetIdColumn).IsEqualTo(raAsset.RelatedAssetId).ExecuteScalar<Guid>();

            // step 1 - delete the related asset:
            new Delete().From<RelatedAsset>().Where(RelatedAsset.RelatedAssetIdColumn).IsEqualTo(raAsset.RelatedAssetId).Execute();

            // more deletion steps...
        }

        // complete the transaction:
        ts.Complete();
    }
}
The order of your using statements is correct. (I remember the order with this trick: the connection needs to know about the transaction while it is created, and it does that by checking System.Transactions.Transaction.Current.)
One hint: you don't need the nested braces, and you don't need a variable for the SharedDbConnectionScope().
This looks far more readable:
using (var ts = new TransactionScope())
using (new SharedDbConnectionScope())
{
    // some db stuff
    ts.Complete();
}
Anyway, I don't see why this shouldn't work.
If the problem were related to MSDTC, an exception would occur.
I can only imagine that there is a problem in the SQL Server 2005 configuration, but I am not a SQL Server expert.
Maybe you should try some demo code to verify that plain ADO.NET transactions work:
using (var conn = new SqlConnection("your connection string"))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        using (var cmd = new SqlCommand("DELETE FROM table WHERE id = 1", conn, tx))
            cmd.ExecuteNonQuery();

        using (var cmd2 = new SqlCommand("DELETE FROM table WHERE id = 2", conn, tx))
            cmd2.ExecuteNonQuery();

        tx.Commit();
    }
}
And SubSonic supports native transactions without using TransactionScope: http://subsonicproject.com/docs/BatchQuery
