Hazelcast Map reload on demand

Hazelcast 3.2-RC1 evaluation:
I am not able to find any Hazelcast API to reload, i.e. trigger the MapLoader (loadAllKeys(), loadAll()) on demand.
I see this autoload (of ALL entries) happens only when the server starts, but I need a way to reload on demand when required, to re-synchronize with the underlying database.
Map.clear() clears all the data, but I cannot find any way to reload it automatically, short of writing additional code to populate the data and push it into the cache.
Can someone advise whether there are any workarounds?
Thanks

The documentation says that the MapStore is called if a key is not in memory. So after you clear the map, it will be repopulated simply by calling get() on it. You will only have the data in memory that is really used.
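Illustrating that clear-then-read-through idea, here is a minimal sketch, assuming Hazelcast 3.x and a map named "products" that has a MapLoader configured (the map name and key are made up for the example):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ReloadOnRead {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // "products" is assumed to be configured with a MapLoader
        IMap<Long, String> map = hz.getMap("products");

        map.clear();                  // drop all in-memory entries
        String value = map.get(42L);  // cache miss -> MapLoader.load(42L) reads from the database
        System.out.println(value);
    }
}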
On the other hand, the MapLoader is called "when the map is first touched/used". Maybe you can create a new Hazelcast map and switch over to the new map.
See http://www.hazelcast.org/docs/latest/manual/html-single/hazelcast-documentation.html#persistence for more information.
Regards
Thorsten

Related

Hazelcast: forcefully call the loadAll() method when a Hazelcast member goes down

I have implemented read-through using a MapLoader with the loadAll-keys functionality, which loads a specific list of keys and values from the database into the Hazelcast map during map initialization. My question: if some of the members go down, I need to repopulate those keys and values accordingly. This is because we aggressively use Hazelcast predicates to search for the relevant data (not by key), and if one of the members goes down there is a chance we won't get the actual results from the cache.
As far as I know, Hazelcast will also not create any backup of data loaded through a MapLoader. The only option I can think of is to forcefully call the loadAll function of the MapLoader once any of the members goes down, but I am not really sure how to implement that. I am also open to other suggestions.
As far as I know, Hazelcast will also not create any backup of data loaded through a MapLoader.
I am not sure what made you believe that. Backups and the MapLoader are two separate configuration settings on a map, and you can configure them independently. So unless you configured zero backups, you will have a backup of your data, even when it is loaded by a MapLoader.
For more, see the Making Your Map Data Safe section in the documentation.
If you want to reload the data using the MapLoader, you can use com.hazelcast.map.IMap#loadAll(boolean) for all data, or com.hazelcast.map.IMap#loadAll(java.util.Set<K>, boolean) for a specific set of keys.
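A minimal sketch of an on-demand reload with those two methods; the map name "cities" and the key values are placeholders:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

import java.util.Arrays;
import java.util.HashSet;

public class ReloadAll {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<Long, String> map = hz.getMap("cities");

        // reload every key returned by MapLoader.loadAllKeys(),
        // replacing values already in memory
        map.loadAll(true);

        // reload only a specific set of keys
        map.loadAll(new HashSet<>(Arrays.asList(1L, 2L, 3L)), true);
    }
}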

Hazelcast load data from RDBMS in client-server topology

I am using a client-server topology for the Hazelcast cache. I have multiple maps which I load eagerly using MapLoaders. When there is a cache miss, the MapLoader's load(key) method is called. MapLoader.load(key) seems to be executed by the partition thread, which means that all other operations on the partition are blocked until loading is done. A very common use case for the MapLoader is to load data from a DB, which arguably can take some time. So what is the best approach to take so that other operations on the partition are not blocked while the load is taking place? Is there any other way to load missing data at runtime? (Hazelcast version: 4.0.3)
There's a good answer to this question that gives a few options.
MapLoader.load(key) only loads a single entry, but if the remote source is really slow, or there are lots of cache misses, the delays are going to mount up.
Another alternative to @mike-yawn's answer would be to have a Runnable that fetches needed items from the database and writes them directly into the map. You can still have MapLoader.load(key) as well, but the chance of a cache miss is reduced if your fetcher code is good at predicting which entries will be needed; see the sketch below.
If you don't cache 100% of the records, then cache misses are inevitable. If loading is punitively slow, you could always return an entry value containing some sort of flag marking it as a placeholder, and launch a thread to do the actual load. Your code then has to deal with that placeholder and try again later, noting that when it does, the eventual result of the database query could be that no record was found.
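A rough sketch of that prefetching idea, assuming Hazelcast 4.x, a map named "orders", and a hypothetical fetchHotEntries() helper that pulls likely-needed rows from the database in one bulk query (all of these are made up for the example):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MapWarmer {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        IMap<Long, String> orders = client.getMap("orders");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // periodically push likely-needed entries into the map so that
        // MapLoader.load(key) misses (and the partition-thread stall) stay rare
        scheduler.scheduleAtFixedRate(
                () -> orders.putAll(fetchHotEntries()),
                0, 5, TimeUnit.MINUTES);
    }

    // hypothetical: one bulk database query instead of per-key loads,
    // e.g. SELECT id, payload FROM orders WHERE updated_at > now() - interval '1 day'
    private static Map<Long, String> fetchHotEntries() {
        return Map.of(); // placeholder
    }
}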

Does get(replicated)map load the whole data or just the reference?

I would like to get the value of a key; however, the map is large, so I don't want it to be loaded into memory completely. So if I do something like:
hazelcast.getReplicatedMap(name).get(key)
will it load the whole map into memory then get the value?
If yes, is there a way to get the value of a key without loading everything into memory?
With a replicated map, the whole map is replicated to all members in the cluster, so it will always be fully in memory on those members.
On the client side, only the value is pulled into memory when you call replicatedMap.get(key).
EDIT: Please see @pveentjer's answer, since I assumed the question was asked about a client topology and answered accordingly.
It does not load the whole map but returns an instance of it. So when you call hazelcast.getReplicatedMap(name).get(key), only one entry, if it exists, will be fetched from the distributed map.
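For reference, a minimal client-side sketch, assuming Hazelcast 4.x-style imports and a placeholder map name "settings"; only the single value crosses the wire to the client, not the whole replicated map:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.replicatedmap.ReplicatedMap;

public class ReplicatedGet {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        ReplicatedMap<String, String> settings = client.getReplicatedMap("settings");

        // fetches just this one entry from a member;
        // the full map stays on the cluster members
        String value = settings.get("feature.flag");
        System.out.println(value);
    }
}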

Defining a Hazelcast MapStore for Key-Ranges

When we want to implement the MapStore interface, we only have the loadAll method for initializing the map, so you have to provide a set of keys to load into it. How do you handle the situation where you have a date/time as the primary key? Intuitively one would define a key range where tst is between a and b. But since we can only provide a Set, we have to pre-fetch all the possible date-time values (via SQL or whatever), and then the IMap will start to hammer the database, fetching every key one by one. Is this the best approach? Isn't there a more convenient way to do this?
My advice would be to stop thinking about the maps as if they were tables in a relational database. Try to think in terms that conform to the semantics of a Map (if you are using a Map, as there are other distributed collections in Hazelcast). For example, keep in mind that you can only query objects that are available in memory: query semantics apply only when Hazelcast is used as a data grid, not as a cache. If the semantics are those of a cache, you should limit your access to lookups by key, as you would with a traditional Java map.
When it comes to a data grid, you must assume that access to the database will typically occur only in disaster-recovery scenarios. The initial data load from disk to memory may hit the database hard, but since that happens only during recovery, it is not such a major handicap. In the case of caching, it is important to be very efficient when planning your persistence strategy, since access to the database will be much more frequent.
If you provide further information about your particular use case, especially regarding eviction policies, I may be able to help you more.
Hope this helps.
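To make the trade-off concrete, here is a rough sketch (not from the answer above) of a MapLoader whose loadAllKeys() offers only a bounded key range for eager loading; it assumes Hazelcast 4.x and a hypothetical readings table keyed by a numeric timestamp column tst:

import com.hazelcast.map.MapLoader;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RangeMapLoader implements MapLoader<Long, String> {

    private final DataSource ds;
    private final long from, to;   // bounds of the key range to load eagerly

    public RangeMapLoader(DataSource ds, long from, long to) {
        this.ds = ds;
        this.from = from;
        this.to = to;
    }

    @Override
    public Iterable<Long> loadAllKeys() {
        // offer Hazelcast only the keys inside the range,
        // instead of every key in the table
        List<Long> keys = new ArrayList<>();
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement(
                     "SELECT tst FROM readings WHERE tst BETWEEN ? AND ?")) {
            ps.setLong(1, from);
            ps.setLong(2, to);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    keys.add(rs.getLong(1));
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return keys;
    }

    @Override
    public Map<Long, String> loadAll(Collection<Long> keys) {
        // Hazelcast hands over the keys in per-partition batches;
        // a single IN (...) query per batch would avoid per-key round trips
        Map<Long, String> result = new HashMap<>();
        for (Long key : keys) {
            result.put(key, load(key));
        }
        return result;
    }

    @Override
    public String load(Long key) {
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement(
                     "SELECT payload FROM readings WHERE tst = ?")) {
            ps.setLong(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;  // null = no value for this key
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}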

Gremlin: SetProperty iteratively to existing graph database

I am trying to run JUNG's PageRank algorithm on my existing Neo4j graph database and save each node's score as a property for future reference.
So I created the following groovy file:
import edu.uci.ics.jung.algorithms.scoring.PageRank
g = new Neo4jGraph('/path/to/graph.db')
j = new GraphJung(g)
pr = new PageRank<Vertex,Edge>(j, 0.15d)
pr.evaluate()
g.V.sideEffect{it.pagerank=pr.getVertexScore(it)}
and run it through gremlin.
It runs smoothly, and if I check the property via g.v(2381).map() I get what I'd expect.
However, when I leave Gremlin and start up my Neo4j server, these modifications are nonexistent.
Can anyone explain why, and how to fix it?
My hunch is that it has something to do with my graph in Gremlin being embedded:
gremlin> g
==>neo4jgraph[EmbeddedGraphDatabase [/path/to/graph.db]]
Any ideas?
You will need a g.shutdown() at the end of your Groovy script. Without a g.shutdown(), all changes to the graph are most likely to stay in memory, and re-initializing the graph from disk (/path/to/graph.db in your case) will lose the changes that were still in memory. g.shutdown() flushes the current transaction from memory to disk. This makes sure your changes persist and will be retrieved when you access the database again.
Hope this helps.
Note: you are correct in your hunch about the embedded database. This issue will not occur if you use Neo4j's REST interface, because every REST API request is treated as a single transaction.
