Preventing duplicate entries in a multi-instance application environment - multithreading

I am writing an application that works with the Facebook APIs (share, like, etc.). I keep all the objects shared from my application in a database, and I do not want to share the same object again if it has already been shared.
Considering that I will deploy the application on different servers, there could be a case where both instances try to insert the same object into the table.
How can I manage this concurrency problem without blocking the applications entirely? I mean, two threads trying to insert the same object must synchronize with each other, but they should not block a third thread that is inserting a totally different object.

If there is a way to derive the primary key of a data entry from the data itself, the database will resolve this concurrency issue by itself: the second insert will fail with a primary key constraint violation. Perhaps the data supplied by the Facebook API already has some unique ID?
Alternatively, you can consider a distributed lock solution, for example one based on Hazelcast or a similar data grid. This would allow the record state to be shared by different JVMs, making it possible to avoid unneeded INSERTs.
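A minimal sketch of the unique-key approach in Java/JDBC, assuming a hypothetical shared_object table keyed on the Facebook object ID: whichever instance inserts first wins, the loser just detects the duplicate, and inserts for unrelated objects are never blocked.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class SharedObjectDao {

    // Tries to record the object as shared. Returns true if this call inserted the
    // row, false if another instance (or thread) got there first.
    // Assumed schema (hypothetical):
    //   CREATE TABLE shared_object (fb_object_id VARCHAR(64) PRIMARY KEY, shared_at TIMESTAMP);
    public boolean markShared(Connection conn, String fbObjectId) throws SQLException {
        String sql = "INSERT INTO shared_object (fb_object_id, shared_at) VALUES (?, CURRENT_TIMESTAMP)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, fbObjectId);
            ps.executeUpdate();
            return true;                              // we won the race; safe to call the share API
        } catch (SQLIntegrityConstraintViolationException e) {
            return false;                             // duplicate key: someone else already shared it
        } catch (SQLException e) {
            if ("23505".equals(e.getSQLState())) {    // some drivers (e.g. PostgreSQL's) report unique violations this way
                return false;
            }
            throw e;                                  // anything else is a real error
        }
    }
}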

Related

Hazelcast and the need for custom serializers; works when creating the server but not when connecting to existing

We are using Hazelcast to store stuff in distributed maps. We are having a problem with remote servers and I need some feedback on what we can do to resolve the issue.
We create the server - WORKS
We create a new server (Hazelcast.newHazelcastInstance) inside our application's JVM. The Hazelcast Config object we pass in has a bunch of custom serializers defined for all the types we are going to put in the maps. Our objects are a mixture of Protobufs, plain Java objects, and a combination of the two. The server starts, we can put objects in the map, and we can get objects back out later. We recently decided to start running Hazelcast in its own dedicated server, so we tried the scenario below.
Server already exists externally, we connect as a client - DOESN'T WORK
Rather than creating our own Hazelcast instance, we connect to a remote instance that is already running. We pass in a config with all the same serializers we used before. We successfully connect to Hazelcast and we can put stuff in the map (that works as far as I can tell), but we don't get anything back out. No events get fired letting our listeners know that objects were added to a map.
I want to be able to connect to a Hazelcast instance that is already running outside of our JVM. It is not working for our use case and I am not sure how it is supposed to work.
Does the JVM running Hazelcast externally need to have all of the class types we might put into the map in its class loader? It seems like that might be where the problem is, but wouldn't that make Hazelcast very limiting to use?
How do you typically manage those class loader issues?
Assuming the above is true, is there a way to tell Hazelcast that we will serialize the objects ourselves before even putting them in the map? Basically, we would give Hazelcast an ID and a byte array, and that is all we would expect back in return. If so, that would avoid the entire class loader issue I think we are running into. We do not need to be able to search for objects based on their fields; we just need to know when objects come and go and what their IDs are.
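Roughly, the "give Hazelcast an ID and a byte array" idea would look like the sketch below (the map name and member address are made up, and serializing the object to bytes is left entirely to the application); the cluster then only ever sees opaque bytes:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class PreSerializedMapSketch {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.getNetworkConfig().addAddress("hazelcast-host:5701"); // made-up member address

        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);

        // The map only ever holds bytes, so the member never needs our classes.
        IMap<String, byte[]> blobs = client.getMap("pre-serialized-objects");

        byte[] payload = "serialize your object yourself here".getBytes(); // placeholder for your own serializer
        blobs.put("some-object-id", payload);

        byte[] roundTripped = blobs.get("some-object-id");
        System.out.println("got back " + roundTripped.length + " bytes");

        client.shutdown();
    }
}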
@Jonathan, when using a client-server architecture, unless you use queries or other operations that require the data to be deserialized on the cluster, the members don't need to know anything about serialization. They just store the already-serialized data and serve it. If the listeners you mentioned are registered on the client app, they should work fine.
Hazelcast has a feature called User Code Deployment, https://docs.hazelcast.org/docs/3.11/manual/html-single/index.html#member-user-code-deployment-beta, but it is mainly for user classes. Serialization-related config should be present on the members, or you should add it later and do a rolling restart.
If you can share some of the exceptions/setup, etc., I can give more specific answers as well.
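To make that concrete, here is a rough client-side sketch against the Hazelcast 3.x API (the Note class, its serializer, the map name, and the member address are all made up): the custom serializer and the entry listener live in the client JVM, and the member only stores and forwards the serialized bytes.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.SerializerConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.listener.EntryAddedListener;
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.StreamSerializer;

import java.io.IOException;

public class ClientListenerSketch {

    // Made-up domain type and its custom serializer.
    static class Note {
        final String text;
        Note(String text) { this.text = text; }
    }

    static class NoteSerializer implements StreamSerializer<Note> {
        @Override public int getTypeId() { return 1001; } // any unique positive id
        @Override public void write(ObjectDataOutput out, Note note) throws IOException { out.writeUTF(note.text); }
        @Override public Note read(ObjectDataInput in) throws IOException { return new Note(in.readUTF()); }
        @Override public void destroy() { }
    }

    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.getNetworkConfig().addAddress("hazelcast-host:5701"); // made-up member address

        // Serializer registered on the CLIENT only; the member just stores the bytes.
        config.getSerializationConfig().addSerializerConfig(
                new SerializerConfig().setTypeClass(Note.class).setImplementation(new NoteSerializer()));

        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        IMap<String, Note> notes = client.getMap("notes");

        // The listener runs in this client JVM; includeValue=true means the value is
        // deserialized here with the serializer registered above, not on the member.
        notes.addEntryListener(
                (EntryAddedListener<String, Note>) event ->
                        System.out.println("added " + event.getKey() + ": " + event.getValue().text),
                true);

        notes.put("n1", new Note("hello"));
    }
}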

Caching posts using Redis

I have a forum that contains groups, and new groups are created by users all the time. Currently I'm using node-cache with a TTL to cache groups and their content (posts, likes and comments).
The server worked great at the beginning, but performance decreased as more people started using the app, so I decided to use the Node.js Cluster module as the next step to improve performance.
node-cache will cause a consistency problem: the same group could be cached in two workers, so if one of them changes, the other will not know about it (unless you do something about it).
The first solution that came to my mind is using Redis to store the whole group and its content with the help of Redis data types (sets and hash objects), but I don't know how efficient this would be.
The other solution is using Redis to map requests to the correct worker. In this case the cached data is distributed randomly across workers, so when a worker receives a request related to some group, it looks up the group's owner (the worker that holds this group instance in memory) in Redis, asks it for the wanted data using node-ipc, and then returns it to the user.
Is there any problem with the first solution?
The second solution does not provide fairness (all the popular groups could land in the same worker); is there a solution for this?
Any suggestions?
Thanks in advance

Using Sessions in Cassandra

When using the Cassandra DataStax Java driver, when would I use multiple sessions under the same cluster? I am not able to find any good use case for having one cluster and multiple sessions.
My application has multiple components/modules that access Cassandra. Based on the answer, I may decide whether to have one session per component/module or just one session shared across all the components of my application.
Update: Everywhere on the internet the recommendation is to use one session. I get it, but my question is: in what scenario would you create multiple sessions for one cluster? If there is no such scenario, why does the library allow creating multiple sessions at all, instead of just exposing a method that returns a singleton session object?
Use just one Session across all your components.
In Cassandra, a Session is a heavy object: it is thread-safe and maintains multiple connections, cached prepared statements, etc.
Here is the JavaDoc:
A session holds connections to a Cassandra cluster, allowing it to be queried. Each session maintains multiple connections to the cluster nodes, provides policies to choose which node to use for each query (round-robin on all nodes of the cluster by default), and handles retries for failed query (when it makes sense), etc...
Session instances are thread-safe and usually a single instance is enough per application. As a given session can only be "logged" into one keyspace at a time (where the "logged" keyspace is the one used by query if the query doesn't explicitely use a fully qualified table name), it can make sense to create one session per keyspace used. This is however not necessary to query multiple keyspaces since it is always possible to use a single session with fully qualified table name in queries.
Sources:
https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html
https://ahappyknockoutmouse.wordpress.com/2014/11/12/246/
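A bare-bones sketch of the "one Cluster, one Session per application" advice with the DataStax Java driver (the contact point and the holder class are made up): build both once at startup, hand the same Session to every component, and use fully qualified table names if several keyspaces are involved.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraConnector {

    // Built once at application startup and shared by every component/module.
    private static final Cluster CLUSTER = Cluster.builder()
            .addContactPoint("127.0.0.1")   // made-up contact point
            .build();
    private static final Session SESSION = CLUSTER.connect();

    public static Session session() {
        return SESSION;                     // thread-safe; reuse it everywhere
    }

    public static void main(String[] args) {
        // Fully qualified table names let one session span multiple keyspaces.
        ResultSet rs = session().execute("SELECT release_version FROM system.local");
        Row row = rs.one();
        System.out.println("Cassandra version: " + row.getString("release_version"));

        SESSION.close();
        CLUSTER.close();
    }
}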

How to share an object across multiple instances of Node.js?

I have a feature where users post data containing a few user IDs and some data related to those user IDs, and I am saving it into a PostgreSQL database. I want to save the returned user IDs in some object.
I just want to check whether a user ID is present in this object and only then call the database. This check happens very frequently, so I cannot hit the DB every time just to check whether there is any data present for that user ID.
The problem is that I have multiple Node.js instances running on different servers, so how can I have a common object?
I know I can use Redis/Riak for storing key-value data on a server, but I don't want to increase complexity/learning just for this single case (I have never used Redis/Riak before).
Any suggestion ?
If your data is in different node.js processes on different servers, then the ONLY option is to use networking to communicate across servers with some common server to get the value. There are lots of different ways to do that.
Put the value in a database and always read the value from the common database
Designate one of your node.js instances as the master and have all the other node.js instances ask the master for the value anytime they need it
Synchronize the value to each node.js process using networking so each node.js instance always has a current value in its own process
Use a shared file system (kind of like a poor man's database)
Since you already have a database, you probably want to just store the value in the database you already have and query it from there, rather than introducing another data store such as Redis just for this one use. If possible, you can have each process cache the value over some interval of time to improve performance for frequent requests.

Custom Logging mechanism: Master Operation with n-Operation Details or Child operations

I'm trying to implement a logging mechanism in a Service-Workflow-hybrid application. The requirement for logging is that, instead of independent log actions, each log entry must be treated as a detail operation and recorded against a parent/master operation. So it's a parent-child relationship and goes into database table(s). This is the primary reason NLog didn't work for us.
To help you understand better, I'll go into some generic detail. This is how the application flow goes:
Now, the main entry point of the application (normally called Program.cs) is Platform. It initializes an engine that is capable of listening for incoming calls from ISDN lines, VoIP, or web services. The interface is generic, so any call that reaches the Platform triggers OnConnecting(). OnConnecting() is a thread-safe event and can be triggered as many times as the system requires.
Within OnConnecting(), a new instance of our custom Workflow manager is launched and the context is a custom object called ProcessingInfo:
new WorkflowManager<ZeProcessingInfo>();
Where, ZeProcessingInfo:
var ZeProcessingInfo = new ProcessingInfo(this, new LogMaster());
As you can see, the ProcessingInfo is composed of Platform itself and a new instance of LogMaster. LogMaster is defined in an independent assembly.
Now this LogMaster is available throughout the WorkflowManager, all the Workflows it launches, all the activities within any running Workflow, and passed on to external code called from within any Activity. Now, when a new LogMaster is initialized, a Master Operation entry is created in the database and this LogMaster object now lives until this call is ended after a series of very serious roller coaster rides through different workflows. Upon every call of OnConnecting(), a new Master Operation is created and maintained.
The LogMaster allows for calling a AddDetail() method that adds new child detail under the internally stored Master Operation (distinguished through a Guid Primary Key). The LogMaster is built upon Entity Framework.
And I'm able to log under the same Master Operation as many times as I require. But the application requirements are changing, and there is now a need to log from other assemblies. There is a Platform Server assembly, which is a Windows Service that acts as a server listening for web-service-based calls; once a client calls a method, OnConnecting in Platform is triggered.
I need a mechanism to somehow retrieve the related LogMaster object so that I can add detail to the same Master Operation. But Platform Server is the one triggering OnConnecting() on the Platform and thus instantiating LogMaster. This creates a redundancy loop.
Failure scenarios are being considered as well. If LogMaster fails, we need to fall back from database logging to event logging. If event logging fails (or is not allowed through the unified configuration), we need to fall back to file-based (XML) logging.
I hope I have given a rough idea. I don't expect code, but I need some strategy for a very seamless, pluggable, configurable logging mechanism that supports Master-Child operations.
Thanks for reading. Any help would be much appreciated.
I've read this question a number of times and it was pretty hard to figure out what was going on. I don't think your diagram helps at all. If your question is about trying to retrieve the master log record when writing child log records, then I would forget about trying to create normalised data in the log tables. You will just slow down the transactional system by trying to do so. You want the log/audit records to write as fast as possible; you can aggregate them later when you want to read them.
Create a de-normalised table for the log entries and use a single Guid in that table to track the session/parent log master. Yes, this will be a big table, but it will write fast.
As for guaranteed delivery of log messages to a destination, I would try not to create multiple destinations, as combining them later will be a nightmare. Instead, use something like MSMQ to emit the audit logs as fast as possible and have another service pick them up and process them with guaranteed delivery. ETW (Event Logging) is not guaranteed under load, and you will not know when it has failed.

Resources