Ignite Persistent Store loading mechanism

I need the cache store to be loaded at startup according to its configuration, without any additional code like:
CacheStore.load()
But in https://apacheignite.readme.io/docs/persistent-store I could not come across any statement that it loads itself automatically at startup.
Am I missing something here, or is there really no way to do this at boot time without coding?
Thanks

I see the following way that should work in your case:
Implement the org.apache.ignite.lifecycle.LifecycleBean interface and process the org.apache.ignite.lifecycle.LifecycleEventType#AFTER_NODE_START event in the implementation.
When the event fires, call cache("cache_name").localLoadCache() in the bean's implementation. Entries for which the started node is either primary or backup will be stored on that node.
Register your LifecycleBean implementation with IgniteConfiguration.setLifecycleBeans(lifeCycleBean), or the same way in Spring XML.
As a result, when a node is started with such a configuration, the pre-loading starts automatically because of the registered LifecycleBean; see the sketch below.
Here you can find an example of how to work with LifecycleBean in Ignite.
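For illustration, a minimal sketch of such a LifecycleBean (assuming a cache named "myCache"; adjust names to your configuration):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.lifecycle.LifecycleBean;
    import org.apache.ignite.lifecycle.LifecycleEventType;
    import org.apache.ignite.resources.IgniteInstanceResource;

    public class CachePreloadBean implements LifecycleBean {
        // Ignite injects the local node instance into this field.
        @IgniteInstanceResource
        private Ignite ignite;

        @Override
        public void onLifecycleEvent(LifecycleEventType evt) {
            if (evt == LifecycleEventType.AFTER_NODE_START) {
                // Loads entries from the configured CacheStore; only entries for
                // which this node is primary or backup are stored locally.
                ignite.cache("myCache").localLoadCache(null);
            }
        }

        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setLifecycleBeans(new CachePreloadBean());
            Ignition.start(cfg); // pre-loading runs right after the node starts
        }
    }

Starting the node with this configuration is then the only "coding" required; the load happens as part of node startup.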

Related

Hazelcast and the need for custom serializers; works when creating the server but not when connecting to existing

We are using Hazelcast to store stuff in distributed maps. We are having a problem with remote servers and I need some feedback on what we can do to resolve the issue.
We create the server - WORKS
We create a new server (Hazelcast.newHazelcastInstance) inside our application's JVM. The Hazelcast Config object we pass in has a bunch of custom serializers defined for all the types we are going to put in the maps. Our objects are a mixture of Protobufs, plain Java objects, and a combination of the two. The server starts, we can put objects in the map and get them back out later. We recently decided to start running Hazelcast on its own dedicated server, so we tried the scenario below.
Server already exists externally, we connect as a client - DOESN'T WORK
Rather than creating our own Hazelcast instance, we connect to a remote instance that is already running. We pass in a config with all the same serializers we used before. We successfully connect to Hazelcast and we can put stuff in the map (it works as far as I can tell), but we don't get anything back out. No events get fired letting our listeners know that objects were added to the map.
I want to be able to connect to a Hazelcast instance that is already running outside of our JVM. It is not working for our use case and I am not sure how it is supposed to work.
Does the JVM running Hazelcast externally need to have all of the class types we might put into the map on its classpath? It seems like that might be where the problem is, but wouldn't that make Hazelcast very limiting to use?
How do you typically manage those class loader issues?
Assuming the above is true, is there a way to tell Hazelcast that we will serialize the objects before even putting them in the map? Basically, we would give Hazelcast an ID and a byte array, and that is all we would expect back in return. If so, that would avoid the entire class loader issue I think we are running into. We do not need to be able to search for objects based on their fields. We just need to know when objects come and go, and what their IDs are.
@Jonathan, when using a client-server architecture, unless you use queries or other operations that require data to be serialized on the cluster, members don't need to know anything about serialization. They just store the already-serialized data and serve it. If the listeners that you mentioned are on the client app, it should be working fine.
Hazelcast has a feature called User Code Deployment (https://docs.hazelcast.org/docs/3.11/manual/html-single/index.html#member-user-code-deployment-beta), but it's mainly for user classes. Serialization-related config should be present on the members, or you should add it later and do a rolling restart.
If you can share some of the exceptions/setup etc., I can give specific answers as well.
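To make that concrete, here is a rough client-side sketch for Hazelcast 3.x; MyObject and MyObjectSerializer are placeholders for your own type and its StreamSerializer implementation:

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.client.config.ClientConfig;
    import com.hazelcast.config.SerializerConfig;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.map.listener.EntryAddedListener;

    public class ClientExample {
        public static void main(String[] args) {
            ClientConfig config = new ClientConfig();
            // Register the same custom serializers the embedded server used;
            // only the client needs them, since members store raw bytes.
            // MyObject / MyObjectSerializer are placeholders for your own classes.
            config.getSerializationConfig().addSerializerConfig(
                new SerializerConfig()
                    .setTypeClass(MyObject.class)
                    .setImplementation(new MyObjectSerializer()));

            HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
            IMap<String, MyObject> map = client.getMap("myMap");

            // Listeners registered on the client are invoked locally.
            map.addEntryListener(
                (EntryAddedListener<String, MyObject>) event ->
                    System.out.println("Added: " + event.getKey()),
                true); // true = deliver the (deserialized) value with the event
        }
    }

If listeners still don't fire with a setup like this, comparing the client's serializer registrations against the old embedded-server config is the first thing to check.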

Spring Batch design advice for processing 50k files

We have more than 50k files coming in every day that need to be processed. For that, we have developed a POC app with a design like this:
A polling app continuously picks files from the FTP zone.
It validates each file and creates metadata in a DB table.
Another poller picks 10-20 files from the DB (only file ID and status) and delivers them to slave apps as messages.
Each slave app takes a message and launches a Spring Batch job, which reads the data, does business validation in processors, and writes the validated data to the DB/another file.
We used Spring Integration and Spring Batch for this POC.
Is it a good idea to launch the Spring Batch job in the slaves, or to implement the read, process and write logic directly as plain Java or Spring bean objects?
I need some insight on launching this job, where each slave can have 10-25 MDPs (Spring message-driven POJOs) and each of these MDPs launches a job.
Note: each file will have at most 30-40 thousand records.
Generally, using Spring Integration and Spring Batch for such tasks is a good idea. This is what they are intended for.
With regard to Spring Batch, you get retry, skip and restart handling out of the box. Moreover, you get all the readers and writers that are optimised for bulk operations. This works very well, and you only have to concentrate on writing the appropriate mappers and similar code.
If you use plain Java or Spring bean objects, you will probably end up developing such infrastructure code yourself... including all the effort needed for testing and so on.
Concerning your design:
Besides validating and creating the metadata entry, you could consider loading the entries directly into a database table. This would give you better "transactional" control if something fails. Your load job could look something like this:
step1:
    a tasklet creates an entry in the metadata table, with columns like
    FILE_TO_PROCESS: XY.txt
    STATE: START_LOADING
    DATE: ...
    ATTEMPT: ... (first attempt)
step2:
    read and validate each line of the file and store it in a data table, with columns like
    DATA: ...
    STATE: ...
    FK_META_TABLE: foreign key to the metadata table
step3:
    update the metadata entry with
    STATE: LOAD_COMPLETED
So, as soon as your metadata entry reaches the state LOAD_COMPLETED, you know that all entries of the file have been validated and are ready for further processing.
If something fails, you can just fix the file and reload it. A rough sketch of such a job definition is given below.
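As a rough sketch (not a drop-in implementation), such a load job could be wired with Spring Batch Java config along these lines; the tasklet, reader, processor and writer beans, and the FileLine type, are assumptions matching the tables outlined above:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    @EnableBatchProcessing
    public class LoadJobConfig {

        @Autowired private JobBuilderFactory jobs;
        @Autowired private StepBuilderFactory steps;

        @Bean
        public Job loadFileJob(Step createMetaEntry, Step loadLines, Step markCompleted) {
            return jobs.get("loadFileJob")
                .start(createMetaEntry) // step1: insert meta row with STATE = START_LOADING
                .next(loadLines)        // step2: validate lines, write them to the data table
                .next(markCompleted)    // step3: set STATE = LOAD_COMPLETED
                .build();
        }

        @Bean
        public Step createMetaEntry(Tasklet insertMetaRowTasklet) {
            return steps.get("createMetaEntry").tasklet(insertMetaRowTasklet).build();
        }

        @Bean
        public Step loadLines(FlatFileItemReader<FileLine> reader,
                              ItemProcessor<FileLine, FileLine> validator,
                              JdbcBatchItemWriter<FileLine> writer) {
            return steps.get("loadLines")
                .<FileLine, FileLine>chunk(1000) // commit interval; tune for 30-40k records
                .reader(reader)
                .processor(validator)
                .writer(writer)
                .build();
        }

        @Bean
        public Step markCompleted(Tasklet updateMetaStateTasklet) {
            return steps.get("markCompleted").tasklet(updateMetaStateTasklet).build();
        }
    }

With this wiring, a failed file simply shows up as a failed job execution that can be restarted once the file is fixed.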
Then, for further processing, you could simply have jobs which poll periodically and check whether there is new data in the database that should be processed. If more than one file was loaded during the last period, simply process all files that are ready.
You could even have several slave processes polling from time to time. Just do a SELECT ... FOR UPDATE on the state of the metadata table, or use an optimistic locking approach, to prevent several slaves from trying to process the same entries; a sketch of the optimistic variant follows below.
With this solution, you don't need a messaging infrastructure, and you can still scale the whole application without any problems.
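A minimal sketch of the optimistic-locking claim, assuming the metadata table has ID, STATE and VERSION columns (names are illustrative):

    import org.springframework.jdbc.core.JdbcTemplate;

    public class FileClaimer {
        private final JdbcTemplate jdbcTemplate;

        public FileClaimer(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // Returns true only for the one slave whose UPDATE wins the race.
        public boolean tryClaim(long fileId, int expectedVersion) {
            int updated = jdbcTemplate.update(
                "UPDATE META_TABLE SET STATE = 'PROCESSING', VERSION = VERSION + 1 "
              + "WHERE ID = ? AND STATE = 'LOAD_COMPLETED' AND VERSION = ?",
                fileId, expectedVersion);
            return updated == 1; // 0 means another slave already claimed the file
        }
    }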

How to connect Pinoccio to Apache CouchDB

Is anyone using the nice Pinoccio from www.pinocc.io?
I want to use it to post data into an Apache CouchDB using node.js. So I'm trying to poll data from the Pinoccio API, but I'm a little lost whether I should:
schedule the polls,
do long polls, or
take a completely different approach.
Any ideas are welcome.
Pitt
Sure. I wrote the Pinoccio API; here's how you do it:
https://gist.github.com/soldair/c11d6ae6f4bead140838
This example depends on the pinoccio npm module ~0.1.3, so make sure to npm install again to pick up the newest version.
You don't need to poll, because Pinoccio will send you changes as they happen if you have an open connection to either "stats" or "sync". If you want to poll you can, but it's not "real time".
sync gives you the current state plus streams changes as they happen, so it's perfect if you only need to save the changes to your troop while your script is running, or to show the current and last known state on a web page.
The solution that replicates every data point we store is stats; this is what the example uses. Stats lets you read everything that has happened to a scout. Digital pins, for example, are the "digital" report. You can ask for data from a specific point in time or just from the current time (the default). Changes to this "digital" report will continue streaming live as they happen, until the "end" time is reached, or if "tail" equals 0 in the options passed to stats.
Hope this helps. I tested the script on my local couch and it worked well. You would need to modify it to copy more stats from each scout. I hope that soon you will be able to request multiple reports from multiple scouts in the same stream; I just have some bugs to sort out ;)
You need to look into two dimensions:
node.js talking to CouchDB. This is well understood, and there are some questions you can find here.
Getting the data from the Pinoccio. The API suggests that as long as the connection is open, you get data. So use a short timeout and a loop. You might want to run your own node.js instance for that.
Interesting fact: the CouchDB team seems to be working on replacing their internal JS engine with node.js.

How to get which Hazelcast instance is running on the node

I am running many instances of Hazelcast with different group names (i.e. different clusters) on different nodes. Now I want to write a program which runs on a given node, needs to know which HazelcastInstance is running on this node, and accesses its config file. I don't want this program to create any new Hazelcast instance. How can this be done?
It depends.
You can always look up the HazelcastInstance(s) using Hazelcast.getHazelcastInstanceByName if you know the name, or get them all using Hazelcast.getAllHazelcastInstances.
In some cases you want to get the HazelcastInstance after deserialization (e.g. when you send a task to a Hazelcast instance using an IExecutorService). In this case you can implement the HazelcastInstanceAware interface to get the instance injected, as in the sketch below.
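As a small sketch of that case, assuming a Callable submitted through an IExecutorService:

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.HazelcastInstanceAware;
    import java.io.Serializable;
    import java.util.concurrent.Callable;

    public class WhereAmITask implements Callable<String>, HazelcastInstanceAware, Serializable {
        private transient HazelcastInstance hazelcastInstance;

        @Override
        public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
            // Called by Hazelcast on the member after the task is deserialized.
            this.hazelcastInstance = hazelcastInstance;
        }

        @Override
        public String call() {
            // Runs on the member; we hold the local instance without creating one.
            return hazelcastInstance.getName();
        }
    }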
So it depends a bit on your setup.
You can load the config object using HazelcastInstance.getConfig. The instance doesn't know whether the config was made from an XML file or programmatically. A short lookup sketch follows.
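And a short sketch of the lookup approach; note that these static lookups only see instances created in the same JVM, which matches the requirement of not creating a new one ("myInstance" is a placeholder name):

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class InstanceLookup {
        public static void main(String[] args) {
            // Enumerate every Hazelcast member instance running in this JVM.
            for (HazelcastInstance instance : Hazelcast.getAllHazelcastInstances()) {
                Config config = instance.getConfig();
                System.out.println("instance=" + instance.getName()
                    + " group=" + config.getGroupConfig().getName());
            }

            // Or look one up directly if you know its name.
            HazelcastInstance named = Hazelcast.getHazelcastInstanceByName("myInstance");
            System.out.println(named != null ? named.getName() : "not found");
        }
    }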

Custom logging mechanism: master operation with n operation details or child operations

I'm trying to implement a logging mechanism in a Service-Workflow-hybrid application. The requirement for logging is that instead of independent log actions, each log entry must be treated as a detail operation and placed under a parent/master operation. So it's a parent-child relationship, and it goes to database table(s). This is the primary reason NLog failed.
To help you understand better, I'm diving into generic detail. This is how the application flow goes:
Now, the main entry point of the application (normally called Program.cs) is the Platform. It initializes an engine that is capable of listening for incoming calls from ISDN lines, VoIP, or web services. The interface is generic, so any call that reaches the Platform triggers OnConnecting(). OnConnecting() is a thread-safe event and can be triggered as many times as the system requires.
Within OnConnecting(), a new instance of our custom workflow manager is launched, and the context is a custom object called ProcessingInfo:
new WorkflowManager<ZeProcessingInfo>();
where ZeProcessingInfo is:
var ZeProcessingInfo = new ProcessingInfo(this, new LogMaster());
As you can see, ProcessingInfo is composed of the Platform itself and a new instance of LogMaster. LogMaster is defined in an independent assembly.
Now this LogMaster is available throughout the WorkflowManager, all the workflows it launches, all the activities within any running workflow, and is passed on to external code called from within any activity. When a new LogMaster is initialized, a Master Operation entry is created in the database, and this LogMaster object lives until the call is ended, after a series of very serious roller-coaster rides through different workflows. Upon every call of OnConnecting(), a new Master Operation is created and maintained.
The LogMaster exposes an AddDetail() method that adds a new child detail under the internally stored Master Operation (distinguished through a Guid primary key). The LogMaster is built upon Entity Framework.
And I'm able to log under the same Master Operation as many times as I require. But the application requirements are changing, and there is now a need to log from other assemblies. There is a Platform Server assembly, which is a Windows Service that acts as a server listening for web-service-based calls; once a client calls a method, OnConnecting in the Platform is triggered.
I need a mechanism to somehow retrieve the related LogMaster object so that I can add detail to the same Master Operation. But Platform Server is the one triggering OnConnecting() on the Platform and thus instantiating LogMaster. This creates a redundancy loop.
Failure scenarios are being considered as well. If LogMaster fails, I need to revert from database logging to event logging. If event logging fails (or is not allowed through the unified configuration), I need to revert to file-based (XML) logging.
I hope I have given a rough idea. I don't expect code, but I need some strategy for a very seamless, pluggable, configurable logging mechanism that supports master-child operations.
Thanks for reading. Any help would be much appreciated.
I've read this question a number of times, and it was pretty hard to figure out what was going on. I don't think your diagram helps at all. If your question is about trying to retrieve the master log record when writing child log records, then I would forget about trying to create normalised data in the log tables. You will just slow down the transactional system in trying to do so. You want the log/audit records to be written as fast as possible, and you can aggregate them later when you want to read them.
Create a de-normalised table for the log entries and use a single Guid in that table to track the session/parent log master. Yes, this will be a big table, but it will write fast (a rough sketch follows below).
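The question is .NET-based, but the table layout itself is language-agnostic; as a rough sketch in plain JDBC, with table and column names as assumptions, every detail row simply carries the master's Guid, so writes never need to touch or join the master record:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.util.UUID;

    public class AuditLog {
        // One flat, de-normalised table: OPERATION_LOG(MASTER_ID, LOGGED_AT, LEVEL, MESSAGE).
        // MASTER_ID is the session/parent Guid; no foreign key, no joins at write time.
        public void addDetail(Connection conn, UUID masterId, String level, String message)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO OPERATION_LOG (MASTER_ID, LOGGED_AT, LEVEL, MESSAGE) "
                  + "VALUES (?, ?, ?, ?)")) {
                ps.setString(1, masterId.toString());
                ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
                ps.setString(3, level);
                ps.setString(4, message);
                ps.executeUpdate();
            }
        }
    }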
As for guaranteed delivery of log messages to a destination, I would try not to create multiple destinations, since combining them later will be a nightmare. Rather, use something like MSMQ to emit the audit logs as fast as possible and have another service pick them up and process them with guaranteed delivery. ETW (Event Tracing for Windows) is not guaranteed under load, and you will not know that it has failed.

Resources