How can I write operational results to a remote database from IBM ODM rules? - ibm-odm

I know it is not a good practice, but I need to write operational decision results into a remote database at the end of execution of the ruleflow. Does IBM ODM 8.7 provide any mechanism to achieve that? Is there any workaround? Any help is appreciated.

ODM has a module called Decision Warehouse that does exactly this. It allows the input and output messages to be logged, as well as other information such as the rules that fired or the rule tasks that executed.
Here is the ODM documentation on DW.

There are two problems you should be aware of with this approach:
All information available in the Warehouse is stored in the open: if your payloads contain sensitive information (they probably will), it will appear as plain text in the Decision Warehouse console, which may not be acceptable for some companies.
You will probably take a performance hit on ruleset execution, which can delay your business responses in a critical production environment.
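If Decision Warehouse doesn't fit because of these concerns, one common workaround is to call a small Java XOM helper from the ruleflow's final action and have it write the results over JDBC. The sketch below is illustrative only: the ResultWriter class, the RULE_RESULTS table, and its columns are invented for the example; in a real deployment you would obtain the connection from an application-server datasource (JNDI) and write asynchronously so the decision itself is not slowed down.

```java
// Hypothetical helper in the Java XOM, invoked from the ruleflow's final action.
// All names (class, table, columns) are illustrative, not an ODM API.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class ResultWriter {

    private final String jdbcUrl;
    private final String user;
    private final String password;

    public ResultWriter(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
    }

    /** Persists one decision result; table and columns are assumptions. */
    public void write(String rulesetPath, String decisionId, String resultJson) {
        String sql = "INSERT INTO RULE_RESULTS (RULESET_PATH, DECISION_ID, RESULT_JSON, CREATED_AT) "
                   + "VALUES (?, ?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, rulesetPath);
            ps.setString(2, decisionId);
            ps.setString(3, resultJson);
            ps.setTimestamp(4, new Timestamp(System.currentTimeMillis()));
            ps.executeUpdate();
        } catch (Exception e) {
            // Never let result logging break the decision itself; log and continue.
            e.printStackTrace();
        }
    }
}
```

Keeping this call at the very end of the ruleflow, and making it fail-soft as above, limits the performance and reliability impact mentioned in the second point.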

Related

Data access layer patterns using Azure Functions

We are currently working on a design using Azure functions with Azure storage queue binding.
Each message in the queue represents a complete transaction. An Azure function will be bound to that queue so that the function will be triggered as soon as there is a new message in the queue.
The function will then commit the transaction in a SQL DB.
The first-cut implementation is complete and it's working fine. However, in retrospect, we are considering the following:
In a typical DAL there are well-established design patterns using Entity Framework, repository patterns, etc. However, we didn't find similar guidance/best practices for implementing a DAL within serverless code.
Therefore, my question is: should such patterns be implemented with Azure Functions (this would be challenging :) ), or should the serverless code be kept as light as possible, or is this not a use case for Azure Functions at all?
It doesn't take anything too special. We're using a routine set of library DLLs for all kinds of things: database access, interacting with other parts of Azure (like retrieving Key Vault secrets for connection strings), parsing file uploads, business rules, and so on. The libraries target netstandard2.0 so we can more easily migrate to Functions v2 when the right triggers become available.
Mainly just design your libraries so they're highly modularized, so you can minimize how much you load to get the job done (assuming reuse in other areas of the system is important, which it usually is).
It would be easier if dependency injection were available today. See this for a few ways some of us have hacked it together until we get official DI support. (DI is on the roadmap for Functions, I believe in the 3.0 release.)
At first I was a little worried about startup time with the library approach, but the underlying WebJobs stack itself is already pretty heavy, and Functions startup performance seems to vary wildly anyway (on the cheaper tiers, at least). During testing, one of our infrequently executed Functions has varied from ~300ms to a peak of about ~3800ms to parse the exact same test file, with all but ~55ms spent on startup.
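To illustrate the "keep the function thin, put the DAL in a shared library" approach, here is a minimal sketch in plain Java (the thread doesn't settle on a language, so this is illustration only); the interface, class, table, and connection-string handling are all assumptions rather than an official pattern.

```java
// The queue-triggered function entry point only parses the message and delegates
// to this library code, which other parts of the system can also reuse.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

interface TransactionRepository {
    void commit(String transactionId, String payloadJson) throws Exception;
}

// Lives in a shared library so it stays testable and reusable outside Functions.
class SqlTransactionRepository implements TransactionRepository {
    private final String connectionString; // e.g. resolved from Key Vault at startup

    SqlTransactionRepository(String connectionString) {
        this.connectionString = connectionString;
    }

    @Override
    public void commit(String transactionId, String payloadJson) throws Exception {
        String sql = "INSERT INTO Transactions (Id, Payload) VALUES (?, ?)";
        try (Connection c = DriverManager.getConnection(connectionString);
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, transactionId);
            ps.setString(2, payloadJson);
            ps.executeUpdate();
        }
    }
}

// The function body then reduces to roughly:
//   repository.commit(message.getId(), message.getBody());
```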
should such patterns be implemented with Azure functions (this would be challenging :) ), or should the server-less code be kept as light as possible or this is not a use-case for azure functions, at all?
My answer is NO.
There should be patterns to follow, but the traditional repository patterns and CRUD operations do not seem as valid in the cloud era.
Many strong concepts we were raised to adhere to have become invalid these days.
Denormalizing the database has become not only acceptable but often preferable.
Designing a pattern now depends on the database you selected for your solution, and also on the type of your application and the type of your data.
Here is a link with general guidelines for when you do Table storage design: Guidelines.
Is your application read-heavy or write-heavy? The design will vary accordingly.
Are you using Azure Tables or Mongo? There are design decisions based on that: indexing is important in Mongo, while in Azure Table storage there is no secondary indexing you can add (only PartitionKey and RowKey are indexed).
Sharding considerations.
Redundancy considerations.
In modern development/architecture many principles have changed; each microservice has its own database, which might be totally different from any other microservice's.
If you read through the guidelines I linked, you will see what I mean.
Designing your Table service solution to be read efficient:
Design for querying in read-heavy applications. When you are designing your tables, think about the queries (especially the latency sensitive ones) that you will execute before you think about how you will update your entities. This typically results in an efficient and performant solution.
Specify both PartitionKey and RowKey in your queries. Point queries such as these are the most efficient table service queries.
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
Consider denormalizing your data. Table storage is cheap so consider denormalizing your data. For example, store summary entities so that queries for aggregate data only need to access a single entity.
Use compound key values. The only keys you have are PartitionKey and RowKey. For example, use compound key values to enable alternate keyed access paths to entities (a small sketch after these guidelines illustrates this together with duplicate copies).
Use query projection. You can reduce the amount of data that you transfer over the network by using queries that select just the fields you need.
Designing your Table service solution to be write efficient:
Do not create hot partitions. Choose keys that enable you to spread your requests across multiple partitions at any point in time.
Avoid spikes in traffic. Smooth the traffic over a reasonable period of time and avoid spikes in traffic.
Don't necessarily create a separate table for each type of entity. When you require atomic transactions across entity types, you can store these multiple entity types in the same partition in the same table.
Consider the maximum throughput you must achieve. You must be aware of the scalability targets for the Table service and ensure that your design will not cause you to exceed them.
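As a rough illustration of the duplicate-copy and compound-key guidelines above, the sketch below (plain Java, no SDK calls, with invented entity and key names) stores the same entity twice in one partition so that lookups by employee id and by email alias are both point queries, and both rows can be written in a single atomic batch.

```java
// Illustrative only: how "duplicate copies" and "compound keys" translate into
// PartitionKey/RowKey choices. Entity, fields, and key prefixes are made up.
import java.util.ArrayList;
import java.util.List;

class TableRow {
    final String partitionKey;
    final String rowKey;
    final String payloadJson;

    TableRow(String partitionKey, String rowKey, String payloadJson) {
        this.partitionKey = partitionKey;
        this.rowKey = rowKey;
        this.payloadJson = payloadJson;
    }
}

class EmployeeRows {
    // Store the same employee twice so both access paths are cheap point queries:
    //   1) by department + employee id, 2) by department + email alias.
    // Keeping both copies in the same partition allows inserting them atomically.
    static List<TableRow> forEmployee(String department, String employeeId,
                                      String email, String payloadJson) {
        List<TableRow> rows = new ArrayList<>();
        rows.add(new TableRow(department, "empid_" + employeeId, payloadJson));
        rows.add(new TableRow(department, "email_" + email, payloadJson));
        return rows;
    }
}
```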
Another good source is this link:

Need architecture hint: Data replication into the cloud + data cleansing

I need to sync customer data from several on-premise databases into the cloud. In a second step, the customer data there needs some cleanup in order to remove duplicates (of different types). Based on that cleansed data I need to do some data analytics.
To achieve this goal, I'm searching for an open source framework or cloud solution I can use for this. I took a look at Apache Apex and Apache Kafka, but I'm not sure whether these are the right solutions.
Can you give me a hint as to which frameworks you would use for such a task?
From my quick read on Apex, it requires Hadoop underneath, coupling you to more dependencies than you probably want early on.
Kafka, on the other hand, is used for transmitting messages (it has other APIs, such as Streams and Connect, which I'm not as familiar with).
I'm currently using Kafka to stream log files in real time from a client system. Out of the box, Kafka really only provides fire-and-forget semantics; I have had to add a bit on top to get exactly-once delivery semantics (Kafka 0.11.0 should solve this).
Overall, think of Kafka as a lower-level solution built around logical message domains (topics) acting as queues, and, from what I skimmed, Apex as a more heavily packaged library with a lot more things to explore.
Kafka would allow you to switch out the underlying analytical system of your choosing via its consumer API.
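On the exactly-once point, here is a minimal sketch of a producer that switches on the idempotent delivery introduced in Kafka 0.11; the broker address, topic name, and message are placeholders.

```java
// Producer configured for Kafka >= 0.11 idempotent delivery.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CustomerUpdateProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // enable.idempotence implies acks=all and retries, so duplicates caused
        // by producer retries are de-duplicated broker-side per partition.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customer-updates", "customer-42",
                                               "{\"name\":\"Jane Doe\"}"));
            producer.flush();
        }
    }
}
```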
The question is very generic, but I'll try to outline a few different scenarios, as there are many parameters in play here. One of them is cost, which on the cloud can quickly build up. Of course, the size of the data is also important.
These are a few things you should consider:
batch vs. streaming: do the updates flow continuously, or is the process run on demand/periodically? (It sounds like the latter rather than the former.)
What latency is required? That is, what is the maximum time it should take an update to propagate through the system? The answer to this question influences question 1.
How much data are we talking about? Are you in the GByte, TByte, or PByte range? Different tools have a different 'maximum altitude'.
And in what format? Do you have text files, or are you pulling from relational DBs?
Cleaning and deduping can be tricky in plain SQL. What language/tools are you planning to use for that part? Depending on question 3 (data size), deduping usually requires a join by ID, which is done in constant time per key in a key-value store, but requires a sort (generally O(n log n)) in most other data systems (Spark, Hadoop, etc.); a small sketch of the key-value approach follows this list.
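As a rough sketch of the constant-time key-value dedup mentioned in the last point (types and fields are invented for the example), keeping only the newest record per customer ID:

```java
// De-duplicate customer records by ID, keeping the most recently updated one.
import java.util.HashMap;
import java.util.Map;

class Customer {
    final String id;
    final long updatedAtMillis;
    final String payload;

    Customer(String id, long updatedAtMillis, String payload) {
        this.id = id;
        this.updatedAtMillis = updatedAtMillis;
        this.payload = payload;
    }
}

class Deduplicator {
    private final Map<String, Customer> latestById = new HashMap<>();

    // O(1) expected per record, versus the O(n log n) sort a join-based
    // approach typically needs in batch systems.
    void accept(Customer c) {
        latestById.merge(c.id, c,
            (oldC, newC) -> newC.updatedAtMillis >= oldC.updatedAtMillis ? newC : oldC);
    }

    Map<String, Customer> result() {
        return latestById;
    }
}
```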
So, while you ponder all these questions, if you're not sure, I'd recommend you start your cloud work with an elastic solution, that is, pay as you go, rather than setting up entire clusters on the cloud, which could quickly become expensive.
One cloud solution that you could quickly fire up is Amazon Athena (https://aws.amazon.com/athena/). You can dump your data in S3, where it's read by Athena, and you just pay per query, so you don't pay when you're not using it. It is based on Apache Presto, so you could write the whole system using basically SQL.
Otherwise you could use Elastic MapReduce with Hive (http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive.html) or Spark (http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html). It depends on which language/technology you're most comfortable with. There are also similar products from Google (BigQuery, etc.) and Microsoft (Azure).
Yes, you can use Apache Apex for your use case. Apache Apex is complemented by Apache Malhar, which can help you build an application quickly to load data using the JDBC input operator and then either store it to your cloud storage (maybe S3) or do de-duplication before storing it to any sink. Malhar also provides a Dedup operator for this kind of operation. But as mentioned in the previous reply, Apex does need Hadoop underneath to function.

Separate application service for command / query in CQRS implementation in Domain Driven Design?

When implementing CQRS with Domain Driven Design, we separate our command interface from our query interface.
My understanding is that at the domain level this reduces complexity significantly (especially when using event sourcing), in the sense that your read model will be different from your write model. So that looks like a separate domain service for your read and write bounded contexts.
At the application level, do we need a separate application service for the read and write separations of our domain?
I've been playing devil's advocate on the matter. My thought is that it could be overkill, requiring clients to know the difference. But then I think about how a consuming web service might use it: generally, it will issue GET requests for reading and POST for writing, which means it already knows the difference.
I see the benefits being cleaner application services.
The real value is having properly separated read and domain models. They do fundamentally different things and often have very different shapes; it's entirely possible for the read model to contain an amalgam of data from differing domain objects, for example.
When you think about how they are used and the way they function within an application, you can start to appreciate the need for the separation. The classic example here is to consider the number of writes compared to the reads in a typical application: the reads massively outnumber the writes. By maintaining the separation you can optimise each side for its respective role.
Another aspect to bear in mind is that a POST will constitute a command, not a view model (which may contain a read model). If you use a CQRS approach you need to adapt the way you do queries and posts. In fact, you can achieve a much more descriptive language rather than simply reflecting a view model back and forth to a server.
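To make that separation concrete, here is a minimal sketch of what distinct command-side and query-side application services might look like; every name in it is invented for illustration and not taken from any particular framework.

```java
// Commands go through one interface, queries through another; the query side
// returns flat, denormalized read models rather than domain objects.
import java.math.BigDecimal;
import java.util.List;

// Command side: descriptive intent, no view model travelling back and forth.
final class PlaceOrder {
    final String orderId;
    final String customerId;
    final List<String> productIds;

    PlaceOrder(String orderId, String customerId, List<String> productIds) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.productIds = productIds;
    }
}

interface OrderCommandService {
    void handle(PlaceOrder command);   // typically bound to POST
}

// Read side: a DTO that may amalgamate data from several aggregates.
final class OrderSummary {
    final String orderId;
    final String customerName;
    final BigDecimal total;

    OrderSummary(String orderId, String customerName, BigDecimal total) {
        this.orderId = orderId;
        this.customerName = customerName;
        this.total = total;
    }
}

interface OrderQueryService {
    OrderSummary getOrderSummary(String orderId);   // typically bound to GET
}
```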
If you're interested, I have a blog post which outlines a high-level overview of a typical CQRS architecture. You can find it here: A step by step overview of a typical CQRS application. Hope you find it useful.
Final point: we are in the process of adding new functionality and have found the separation to be very helpful. Changes to one side don't impact the other in the way they otherwise might.

Relational Database User trying to understand Non-Relational and how to implement CRUD

I'm currently involved in an app project, and I'm in charge of setting up the backend.
What I'm used to is a MySQL database + PHP for cleaning and managing the data sent to and from the front end, which I have much more experience with. However, because of certain preferences of my bosses, on this project I've found myself looking at IBM's Bluemix and Cloudant. Cloudant is a NoSQL database (like CouchDB) and my experience with NoSQL is severely lacking. All I've managed to do so far is create a few JSON documents and some basic views.
What I need to figure out is how to perform the CRUD (create, read, update, delete) actions on a NoSQL database, or at least what they would look like.
In addition, I need to know whether there are ways to implement security measures (security and anti-hacking functions) on a NoSQL database without an external layer, or whether I will need to reroute the data through some sort of PHP function first, if I want it cleaned, before sending it to the Cloudant server where my database sits.
Let me know if my attempt to explain my problem is lacking in clarity. I'll try my best to state it a different way if need be.
Generally speaking, there is no equivalent of an ANSI standard for NoSQL databases. In other words, NoSQL databases are not as standardized as SQL databases; standards are only starting to appear. You can think of it as a technology still in the making.
What you have in general is an API with methods such as put_record or delete_record, or a REST interface that is logically equivalent. Also, in general you CRUD the whole record, not parts of it.
Take a look at the reference: Cloudant - Reading and Writing
Having said that, in your case I would recommend abstracting away from the specific implementation of the NoSQL database you want to use, if you care about avoiding vendor lock-in. So I would suggest you wrap the CRUD calls in PHP functions that can later be replaced if you want to change the NoSQL database flavor (a rough sketch of this wrapping idea follows below).
This approach has the additional advantage of giving you an abstraction in which to implement your own security. Some important NoSQL databases have no concept of multi-tenancy, or have only just implemented it. Again, it is a technology in the making.
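The wrapping idea might look roughly like the sketch below. The answer suggests PHP; it is shown here in Java purely for illustration, using Cloudant's CouchDB-style document endpoints (GET/PUT/DELETE on /{db}/{docid}). The interface name, database URL, and credential handling are placeholders. Note that Cloudant, like CouchDB, expects the current _rev inside the JSON body when you PUT an update to an existing document.

```java
// Wrap the CRUD calls behind one small interface so the underlying NoSQL
// store can be swapped later without touching the rest of the application.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

interface DocumentStore {
    String read(String id) throws Exception;
    void upsert(String id, String json) throws Exception;   // json must carry _rev when updating
    void delete(String id, String rev) throws Exception;
}

class CloudantDocumentStore implements DocumentStore {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;    // e.g. https://ACCOUNT.cloudant.com/mydb (placeholder)
    private final String authHeader; // e.g. "Basic ..." or a bearer token (placeholder)

    CloudantDocumentStore(String baseUrl, String authHeader) {
        this.baseUrl = baseUrl;
        this.authHeader = authHeader;
    }

    public String read(String id) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(baseUrl + "/" + id))
                .header("Authorization", authHeader).GET().build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public void upsert(String id, String json) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(baseUrl + "/" + id))
                .header("Authorization", authHeader)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json)).build();
        http.send(req, HttpResponse.BodyHandlers.ofString());
    }

    public void delete(String id, String rev) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(baseUrl + "/" + id + "?rev=" + rev))
                .header("Authorization", authHeader).DELETE().build();
        http.send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Any input validation or sanitization you want (the "cleaning" mentioned in the question) can then live in this wrapper layer, in whichever language your backend uses.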
When your mindset is the relational one, you tend to think of the database as something that will help you guarantee data consistency as much as possible. But NoSQL databases are not like that. Think of them as a simple repository of documents (in a JSON or XML structure, for instance), without cross references.
Then the obvious question is perhaps: why would anyone want such a thing? One possible answer is that NoSQL databases may hold an aggregate of consolidated data. You can then retrieve aggregates to save the time of reprocessing or re-retrieving data unnecessarily.
As for security, most (if not all) NoSQL databases have some pretty good authentication mechanisms.

Multi-Version Concurrency Control and CQRS and DDD

In order to support offline clients, I want to evaluate how to fit Multi-Version Concurrency Control with a CQRS-DDD system.
Learning from CouchDB, I felt tempted to provide each entity with a version field. However, there are other version concurrency algorithms, like vector clocks. This made me think that maybe I should just not expose this version concept for each entity and/or event.
Unfortunately, most of the implementations I have seen are based on the assumption that the software runs on a single server, where the timestamps for the events come from one reliable source. However, if some events are generated remotely AND offline, there is the problem of the local client's clock offset. In that case, a plain timestamp does not seem a reliable basis for ordering my events.
Does this force me to evaluate some form of MVCC solution not based on timestamps?
What implementation details must an offline-CQRS client evaluate to synchronize a delayed chain of events with a central server?
Is there any good opensource example?
Should my DDD Entities and/or CQRS Query DTOs provide a version parameter?
I manage a version number and it has worked out well for me. The nice thing about a version number is that you can make your code very explicit when dealing with concurrency conflicts. My approach is to ensure that my DTOs all carry the version number of the aggregate they are associated with. When I send in a command, it has the current version as seen on the client. This number may or may not be in sync with the actual version of the aggregate, i.e. the client may have been offline. Before the event is persisted, I check that the version number is the one I expected; if not, I check the intervening events to see whether any of them actually conflict. Only if they do, do I raise an exception. This is essentially a very fine-grained form of optimistic concurrency. If you're interested, I've written more detail, including some code samples, on my blog at: http://danielwhittaker.me/2014/09/29/handling-concurrency-issues-cqrs-event-sourced-system/
I hope that helps.
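A rough sketch of that version check might look like this; all types here are invented for the example, and the real logic would live in your command handler or repository.

```java
// The command carries the version the client last saw; if the aggregate has
// moved on, inspect only the unseen events before deciding it is a conflict.
import java.util.List;

final class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String msg) { super(msg); }
}

interface Event {
    // True if this already-stored event conflicts with the incoming command.
    boolean conflictsWith(Object command);
}

class Aggregate {
    private long version;
    private final List<Event> history; // assumed: index i holds the event that produced version i+1

    Aggregate(long version, List<Event> history) {
        this.version = version;
        this.history = history;
    }

    void execute(Object command, long expectedVersion) {
        if (expectedVersion != version) {
            // Client was offline/stale: check only the events it hasn't seen yet.
            List<Event> unseen = history.subList((int) expectedVersion, history.size());
            for (Event e : unseen) {
                if (e.conflictsWith(command)) {
                    throw new ConcurrencyException(
                        "Command conflicts with an event applied after version " + expectedVersion);
                }
            }
        }
        // ... apply the command, append the resulting event, and bump the version ...
        version++;
    }
}
```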
I suggest you have a look at Greg's presentation on the subject; it might have the answers you're looking for: https://skillsmatter.com/skillscasts/1980-cqrs-not-just-for-server-systems
I guess you should rethink your domain: separate the remote client logic into its own bounded context and integrate it with the other BCs using the known DDD principles for BC interop.
