Hazelcast absolute expiration of items

I am using Hazelcast 3.6.3 on Scala 2.11.8.
I have written this code:
val config = new Config("mycluster")
config.getNetworkConfig.getJoin.getMulticastConfig.setEnabled(false)
config.getNetworkConfig.getJoin.getAwsConfig.setEnabled(false)
config.getNetworkConfig.getJoin.getTcpIpConfig.setMembers(...)
config.getNetworkConfig.getJoin.getTcpIpConfig.setEnabled(true)
val hc = Hazelcast.newHazelcastInstance(config)
hc.getConfig.addMapConfig(new MapConfig()
  .setName("foo")
  .setBackupCount(1)
  .setTimeToLiveSeconds(3600)
  .setAsyncBackupCount(1)
  .setInMemoryFormat(InMemoryFormat.BINARY)
  .setMaxSizeConfig(new MaxSizeConfig(1, MaxSizePolicy.USED_HEAP_SIZE))
)
hc.putValue[(String, Int)]("foo", "1", ("foo", 10))
I notice that when the hour is over, Hazelcast does not remove the items from the cache. The items seem to live in the cache forever.
I don't want sliding expiration; I want absolute expiration, meaning that after 1 hour the item has to be kicked out no matter how many times it was accessed during that hour.
I have done the required googling and I think my code above is correct. But when I look at my server logs, I am pretty sure that nothing is removed from the cache.

Sorry, I am not a Scala guy, but can you explain what hc.addTimeToLiveMapConfig does?
Normally you need to add the TTL config to the Config object before starting Hazelcast.
I believe in your case you are starting Hazelcast first and only then updating the config with the TTL. Please try the reverse order.
If you don't want to add this to the configuration, there is an overloaded map.put method that takes a TTL as an input, so you can specify the TTL per entry (see the sketch below).
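A minimal sketch of both approaches, assuming the Hazelcast 3.x Java API called from Scala and the same "foo" map as above (the join/network settings from the question are omitted):
import java.util.concurrent.TimeUnit
import com.hazelcast.config.{Config, MapConfig}
import com.hazelcast.core.Hazelcast

// Register the MapConfig on the Config *before* the instance is started (per the answer above).
val config = new Config("mycluster")
config.addMapConfig(new MapConfig()
  .setName("foo")
  .setTimeToLiveSeconds(3600)) // entries expire 1 hour after they are written

val hc = Hazelcast.newHazelcastInstance(config)

// Alternatively, the overloaded IMap.put accepts a per-entry TTL.
val map = hc.getMap[String, (String, Int)]("foo")
map.put("1", ("foo", 10), 1, TimeUnit.HOURS) // this particular entry is removed after 1 hour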

Related

Right way to delete and then reindex ES documents

I have a python3 script that attempts to reindex certain documents in an existing ElasticSearch index. I can't update the documents because I'm changing from an autogenerated id to an explicitly assigned id.
I'm currently attempting to do this by deleting existing documents using delete_by_query and then indexing once the delete is complete:
self.elasticsearch.delete_by_query(
    index='%s_*' % base_index_name,
    doc_type='type_a',
    conflicts='proceed',
    wait_for_completion=True,
    refresh=True,
    body={}
)
However, the index is massive, and so the delete can take several hours to finish. I'm currently getting a ReadTimeoutError, which is causing the script to crash:
WARNING:elasticsearch:Connection <Urllib3HttpConnection: X> has failed for 2 times in a row, putting on 120 second timeout.
WARNING:elasticsearch:POST X:9200/base_index_name_*/type_a/_delete_by_query?conflicts=proceed&wait_for_completion=true&refresh=true [status:N/A request:140.117s]
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='X', port=9200): Read timed out. (read timeout=140)
Is my approach correct? If so, how can I make my script wait long enough for the delete_by_query to complete? There are two timeout parameters that can be passed to delete_by_query, search_timeout and timeout, but search_timeout defaults to no timeout (which I think is what I want), and timeout doesn't seem to do what I want. Is there some other parameter I can pass to delete_by_query to make it wait as long as it takes for the delete to finish? Or do I need to make my script wait some other way?
Or is there some better way to do this using the ElasticSearch API?
You should set wait_for_completion to False. In that case you'll get the task details back and will be able to track the task's progress using the corresponding API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api
Just to expand, in code form, on the answer by Random for ES/Python newbies like me:
ES = Elasticsearch(['http://localhost:9200'])
query = {'query': {'match_all': {}}}
response = ES.delete_by_query(index='index_name', doc_type='sample_doc', wait_for_completion=False, body=query, ignore=[400, 404])
task_id = response['task']  # with wait_for_completion=False the call returns {'task': '<node>:<id>'}
response_task = ES.tasks.get(task_id)  # check the status of the task
is_completed = response_task["completed"]  # True once the task has finished
One can write a custom function that checks at some interval, in a while loop, whether the task has completed (see the sketch below).
I have used Python 3.x and Elasticsearch 6.x.
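For example, a minimal polling sketch along those lines, assuming the ES client and the task_id from the snippet above and an arbitrary 30-second interval:
import time

def wait_for_task(es, task_id, poll_interval=30):
    # Poll the Tasks API until the delete-by-query task reports completed=True.
    while True:
        task = es.tasks.get(task_id)
        if task.get("completed"):
            return task
        time.sleep(poll_interval)

Calling wait_for_task(ES, task_id) after kicking off the delete blocks until the task has finished, without relying on the request-level timeout.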
You can use the request_timeout global param. This will override the connection's timeout setting, as mentioned here.
For example -
es.delete_by_query(index=<index_name>, body=<query>, request_timeout=300)
Or set it at connection level, for example
es = Elasticsearch(**(get_es_connection_parms()), timeout=60)

How to clean FoundationDB?

Is there any fast way to remove all data from the local database? Like SQL 'drop database'?
I was looking through the documentation but haven't found anything interesting yet.
The "CLI" way
Using the provided fdbcli interface, you can clear all the keys in the database using a single clearrange command, like this:
fdb> writemode on
fdb> clearrange "" \xFF
Committed (68666816293119)
Be warned that it executes instantly and that there is no undo possible!
Also, any application still connected to the database may continue reading/writing data using cached directory subspace prefixes, which may introduce data corruption! You should make sure to only use this method when nothing is actively using the cluster.
This method requires that your cluster be in a working state. It will not immediately reclaim the space used on disk, nor will it reset the cluster's read version.
The "hard" way
If you have a single-node cluster, you can stop the fdb service, remove all files in its data_dir folder, restart the service, and then using fdbcli, execute the configure new single ssd command.
This will reclaim the disk space used previously, and reset everything back to the post-install state.
You can do this by clearing the entire range of keys.
In Python, it looks like this (where db is an open fdb.Database):
db.clear_range('', '\xFF')
Where '' is the default slice begin, and '\xFF' is the default slice end, according to the clear_range documentation.
You can find the more information on clear_range for the API you're using in the documentation.
To do this programmatically in Java:
db.run(tx -> {
    final byte[] st = new Subspace(new byte[]{(byte) 0x00}).getKey();
    final byte[] en = new Subspace(new byte[]{(byte) 0xFF}).getKey();
    tx.clear(st, en);
    return null;
});

Can the TTL of a tuple be updated without updating the tuple itself in Cassandra?

I am sort of a noob at Cassandra. I was wondering if it is possible to add an expiry to a tuple without actually updating the tuple. I did not specify a TTL during the INSERT of the tuple; now I just want to update the TTL.
Is this possible?
As far as I can tell there's no way to set only the ttl. You could probably re-set one of the values to allow you to pass in the ttl:
UPDATE TABLE USING TTL 10 SET a_col = a_col WHERE key = key;
See the syntax: here
Note: keep in mind that this will set the TTL for the a_col column and will result in a write operation.
Update: this answer is also a valid option.

Shiro resets the session after 2 min

I am using Apache Shiro in my webapp.
I store some parameters in the session notably the primary key of an object stored in the database.
When the user logs in, I load the object from the database and save the primary key in the session. Then within the app the user can edit the object's data and either hit a cancel or a save button.
Both buttons trigger an RPC that sends the updated data to the server. The object is then updated in the database using the primary key stored in the session.
If the user remains active in the app (making some RPCs), everything works fine. But if he stays inactive for 3 min and subsequently makes an RPC, then Shiro's SecurityUtils.getSubject().getSession() returns null.
The session timeout is set to 1,200,000 ms (20 min) so I don't think this is the issue.
When I go through the sessions stored in the cache of my session manager, I can see the user's session (org.apache.shiro.session.mgt.SimpleSession,id=6de78f10-b58e-496c-b40a-e2a9a4ad069c). But when I try to get the session ID from the cookie and call SecurityUtils.getSecurityManager().getSession(key) to get the session (where key is a SessionKey implementation), I get an exception.
When I try building a new subject from the session ID I lose all the attributes saved in the session.
I am happy to post some code to help resolve the issue but I tried so many workarounds that I don't know where to start... So please let me know what you need.
Alternatively, if someone knows a better-documented framework than Shiro, I am all ears (Shiro's lack of documentation makes it really too time-consuming).
The issue was related to the session config in the ini file. As usual with Shiro, the order mattered and some of my lines were out of place.
Below is the config that worked for me:
sessionDAO = org.apache.shiro.session.mgt.eis.EnterpriseCacheSessionDAO
#sessionDAO.activeSessionsCacheName = dropship-activeSessionCache
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
sessionManager.sessionDAO = $sessionDAO
# cookie for single sign on
cookie = org.apache.shiro.web.servlet.SimpleCookie
cookie.name = www.foo.com.session
cookie.path = /
sessionManager.sessionIdCookie = $cookie
# 1,800,000 milliseconds = 30 mins
sessionManager.globalSessionTimeout = 1800000
sessionValidationScheduler = org.apache.shiro.session.mgt.ExecutorServiceSessionValidationScheduler
sessionValidationScheduler.interval = 1800000
sessionManager.sessionValidationScheduler = $sessionValidationScheduler
securityManager.sessionManager = $sessionManager
cacheManager = org.apache.shiro.cache.ehcache.EhCacheManager
securityManager.cacheManager = $cacheManager
It sounds as if you have sorted out your problem already. As you discovered, the main thing to keep in mind with the Shiro INI file is that order matters; the file is parsed in order, which can actually be useful for constructing objects used in the configuration.
Since you mentioned Shiro's lack of documentation, I wanted to go ahead and point out two tutorials that I found helpful when starting:
http://www.javacodegeeks.com/2012/05/apache-shiro-part-1-basics.html
and
http://www.ibm.com/developerworks/web/library/wa-apacheshiro/.
There are quite a few other blog posts that provide good information to supplement the official documentation if you look around.
Good luck!

How can I manually reindex Solr using Sunspot?

I have CouchDB. Sunspot was correctly indexing everything, but the Solr server crashed. I need to reindex the whole thing. rake sunspot:reindex won't work as it is tightly coupled with ActiveRecord. Sunspot.index(Model.all) didn't work either; the Solr core says 0 indexed docs even after doing that. Is there a way out?
Post.solr_reindex
There are a number of options that can be passed to solr_reindex, the same options as for index. From the documentation:
# index in batches of 50, commit after each
Post.index
# index all rows at once, then commit
Post.index(:batch_size => nil)
# index in batches of 50, commit when all batches complete
Post.index(:batch_commit => false)
# include the associated +author+ object when loading to index
Post.index(:include => :author)
What I was looking for was this:
Post.index!(Model.all)
Something bad was happening when I tried to index assuming that batch commits would happen automatically. Anyway, this worked totally fine for me.
I usually use the command below to index models. It works perfectly every time.
For a model, e.g. Post:
Sunspot.index Post.all
For a single row, e.g. Post.where(id: 5):
Sunspot.index Post.where(id: 5)
It will work.
Cheers!
