How to trigger the pre-load of Hazelcast NearCache? - hazelcast

I understand that the NearCache gets loaded only after first get operation is performed on that key on the IMap. But I am interested in knowing if there is any way to trigger the pre-load of the NearCache with all the entries from its cluster.
Use Case:
The key is a simple bean object and the value is a DAO object of type TIntHashMap containing lot of entries.
Size:
The size of value object ranges from 0.1MB to 24MB (and >90% of the entries have less than 5MB). The number of entries range from 150-250 in the IMap.
Benchmarks:
The first call to the get operation is taking 2-3 seconds and later calls are taking <10 ms.
Right now I have created below routine which reads the IMap and reads each entries to refresh the NearCache.
long startTime = System.currentTimeMillis();
IMap<Object, Object> map = client.getMap("utility-cache");
log.info("Connected to the Cache cluster. Starting the NearCache refresh.");
int i = 0;
for (Object key : map.keySet()) {
Object value = map.get(key);
if(log.isTraceEnabled()){
SizeOf sizeOfKey = new SizeOf(key);
SizeOf sizeOfValue = new SizeOf(value);
log.info(String.format("Size of %s Key(%s) Object = %s MB - Size of %s Value Object = %s MB", key.getClass().getSimpleName(), key.toString(),
sizeOfKey.sizeInMB(), value.getClass().getSimpleName(), sizeOfValue.sizeInMB()));
}
i++;
}
log.info("Refreshed NearCache with " + i + " Entries in " + (System.currentTimeMillis() - startTime) + " ms");

As you said, the Near Cache gets populated on get() calls on IMap or JCache data structures. At the moment there is no system to automatically preload any data.
For efficiency you can use getAll() which will get the data in batches. This should improve the performance of your own preloading functionality. You can vary your batch sizes until you find the optimum for your use case.
With Hazelcast 3.8 there will be a Near Cache preloader feature, which will store the keys in the Near Cache on disk. When the Hazelcast client is restarted the previous data set will be pre-fetched to re-populate the previous hot data set in the Near Cache as fast as possible (only the keys are stored, the data is fetched again from the cluster). So this won't help for the first deployment, but for all following restarts. Maybe this is already what you are looking for?
You can test the feature in the 3.8-EA or the recent 3.8-SNAPSHOT version. The documentation for the configuration can be found here: http://docs.hazelcast.org/docs/latest-dev/manual/html-single/index.html#configuring-near-cache
Please be aware that we changed the configuration parameter from file-name to filename between EA and the actual SNAPSHOT. I recommend the SNAPSHOT version, since we also made some other improvements in the preloader code.

Related

Throttling EF queries to save DTUs

We have an asp.Net application using EF 6 hosted in Azure. The database runs at about 20% DTU usage for most of the time except for certain rare actions.
These are almost like db dumps in Excel format, like having all orders of the last X years etc. which the (power) users can trigger and then get the result later by email.
The problem is that these queries use up all DTU and the whole application goes into a crawl. We would like to kind of throttle these non-critical queries as it doesn't matter if this takes 10-15min longer.
Googling I found the option to reduce the DEADLOCK_PRIORITY but this wont fix the issue of using up all resources.
Thanks for any pointers, ideas or solutions.
Optimizing is going to be hard as it is more or less a db dump.
Azure SQL Database doesn't have Resource Governor available, so you'll have to handle this in code.
Azure SQL Database runs in READ COMMITTED SNAPSHOT mode, so slowing down the session that dumps the data from a table (or any streaming query plan) should reduce its DTU consumption without adversely affecting other sessions.
To do this put waits in the loop that reads the query results, either an IEnumerable<TEntity> returned from a LINQ query or a SqlDataReader returned from an ADO.NET SqlCommand.
But you'll have to directly loop over the streaming results. You can't copy the query results into memory first using IQueryable<TEntity>.ToList() or DataTable.Load(), SqlDataAdapter.Fill(), etc as that would read as fast as possible.
eg
var results = new List<TEntity>();
int rc = 0;
using (var dr = cmd.ExecuteReader())
{
while (dr.Read())
{
rc++;
var e = new TEntity();
e.Id = dr.GetInt(0);
e.Name = dr.GetString(1);
// ...
results.Add(e);
if (rc%100==0)
Thread.CurrentThread.Sleep(100);
}
}
or
var results = new List<TEntity>();
int rc = 0;
foreach (var e in db.MyTable.AsEnumerable())
{
rc++;
var e = new TEntity();
e.Id = dr.GetInt(0);
e.Name = dr.GetString(1);
// ...
results.Add(e);
if (rc%100==0)
Thread.CurrentThread.Sleep(100);
}
For extra credit, use async waits and stream the results directly to the client without batching in memory.
Alternatively, or in addition, you can limit the number of sessions that can concurrently perform the dump to one, or one per table, etc using named Application Locks.

arangodb truncate fails on large a collection

I get a timeout in arangosh and the arangodb service gets unresponsive if I try to truncate a large collection of ~40 million docs. Message:
arangosh [database_xxx]> db.[collection_yyy].truncate() ; JavaScript exception in file '/usr/share/arangodb/js/client/modules/org/arangodb/arangosh.js' at 104,13: [ArangoError 2001: Error reading from: 'tcp://127.0.0.1:8529' 'timeout during read'] !
throw new ArangoError(requestResult); ! ^ stacktrace: Error
at Object.exports.checkRequestResult (/usr/share/arangodb/js/client/modules/org/arangodb/arangosh.js:104:13)
at ArangoCollection.truncate (/usr/share/arangodb/js/client/modules/org/arangodb/arango-collection.js:468:12)
at <shell command>:1:11
ArangoDB 2.6.9 on Debian Jessie, AWS ec2 m4.xlarge, 16G RAM, SSD.
The service gets unresponsive. I suspect it got stuck (not just busy), because it doesn't work until after I stop, delete database in /var/lib/arangodb/databases/, then start again.
I know I may be leaning towards the limits of performance due to the size, but I would guess that it is the intention not to fail regardless of size.
However on a non cloud Windows 10, 16GB RAM, SSD the same action succeeded well - after a while.
Is it a bug? I have some python code that loads dummy data into a collection if it helps. Please let me know if I shall provide more info.
Would it help to fiddle with --server.request-timeout ?
Increasing --server.request-timeout for the ArangoShell will only increase the timeout that the shell will use before it closes an idle connection.
The arangod server will also shut down lingering keep-alive connections, and that may happen earlier. This is controlled via the server's --server.keep-alive-timeout setting.
However, increasing both won't help much. The actual problem seems to be the truncate() operation itself. And yes, it may be very expensive. truncate() is a transactional operation, so it will write a deletion marker for each document it removes into the server's write-ahead log. It will also buffer each deletion in memory so the operation can be rolled back if it fails.
A much less intrusive operation than truncate() is to instead drop the collection and re-create it. This should be very fast.
However, indexes and special settings of the collection need to be recreated / restored manually if they existed before dropping it.
For a document collection, it can be achieved like this:
function dropAndRecreateCollection (collectionName) {
// save state
var c = db._collection(collectionName);
var properties = c.properties();
var type = c.type();
var indexes = c.getIndexes();
// drop existing collection
db._drop(collectionName);
// restore collection
var i;
if (type == 2) {
// document collection
c = db._create(collectionName, properties);
i = 1;
}
else {
// edge collection
c = db._createEdgeCollection(collectionName, properties);
i = 2;
}
// restore indexes
for (; i < indexes.length; ++i) {
c.ensureIndex(indexes[i]);
}
}

Time slowed down using `S.D.Stopwatch` on Azure

I just ran some code which reports its performance on an Azure Web Sites instance; the result seemed a little off. I re-ran the operation, and indeed it seems consistent: System.Diagnostics.Stopwatch sees an execution time of 12 seconds for an operation that actually took more than three minutes (at least 3m16s).
Debug.WriteLine("Loading dataset in database ...");
var stopwatch = new Stopwatch();
stopwatch.Start();
ProcessDataset(CurrentDataSource.Database.Connection as SqlConnection, parser);
stopwatch.Stop();
Debug.WriteLine("Dataset loaded in database ({0}s)", stopwatch.Elapsed.Seconds);
return (short)stopwatch.Elapsed.Seconds;
This process runs in the context of a WCF Data Service "action" and seeds test data in a SQL Database (this is not production code). Specifically, it:
Opens a connection to an Azure SQL Database,
Disables a null constraint,
Uses System.Data.SqlClient.SqlBulkCopy to lock an empty table and load it using a buffered stream that retrieves a dataset (2.4MB) from Azure Blob Storage via the filesystem, decompresses it (GZip, 4.9MB inflated) and parses it (CSV, 349996 records, parsed with a custom IDataReader using TextFieldParser),
Updates a column of the same table to set a common value,
Re-enables the null constraint.
No less, no more; there's nothing particularly intensive going on, I figure the operation is mostly network-bound.
Any idea why time is slowing down?
Notes:
Interestingly, timeouts for both the bulk insert and the update commands had to be increased (set to five minutes). I read that the default is 30 seconds, which is more than the reported 12 seconds; hence, I conclude that SqlClient measures time differently.
Reports from local execution seem perfectly correct, although it's consistently faster (4-6s using LocalDB) so it may just be that the effects are not noticeable.
You used stopwatch.Elapsed.Seconds to get total time but it is wrong. Elapsed.Seconds is the seconds component of the time interval represented by the TimeSpan structure. Please try stopwatch.Elapsed.TotalSeconds instead.

Why aren't the unused segment files being deleted?

I don't know what changed--things were working relatively well with our Lucene implementation. But now, the number of files in the index directory just keeps growing. It started with _0 files, then _1 files appeared, then _2 and _3 files. I am passing in false to the IndexWriter's constructor for the 'create' parameter, if there are existing files in that directory when it begins:
indexWriter = new IndexWriter(azureDirectory, analyzer, (azureDirectory.ListAll().Length == 0), IndexWriter.MaxFieldLength.UNLIMITED);
if (indexWriter != null)
{
// Set the number of segments to save in memory before writing to disk.
indexWriter.MergeFactor = 1000;
indexWriter.UseCompoundFile = false;
indexWriter.SetRAMBufferSizeMB(800);
...
indexWriter.Dispose(); indexWriter = null;
}
Maybe it's realated to the UseCompoundFile flag?
Every couple of minutes, I create a new IndexWriter, process 10,000 documents, then dispose the IndexWriter. The index works, but the growing number of files is very bad, because I'm using AzureDirectory which copies every file out of Azure into a cache directory before starting the Lucene write.
Thanks.
This is the normal behavior. If you want a single index segment you have some options:
Use compound files
Use a MergeFactor of 1 if you use LogMergePolicy, which is the default policy for lucene 3.0. Note that the method you use on the IndexWriter is just a convenience method that calls mergePolicy.MergeFactor as long as mergePolicy is an instance of LogMergePolicy.
Run an optimization after each updates to your index
Low merge factors and optimizations after each updates can have serious drawbacks on the performance of your app which will depend on the type of indexing you do.
See this link which documents a little bit the effects of MergeFactor :
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/index/LogMergePolicy.html#setMergeFactor%28%29

Hector to get the resulting counter value after doing incrementCounter

We are doing the following to update the value of a counter, now we wonder if there is a straightforward way to get back the updated counter value immediately.
mutator.incrementCounter(rowid1, "cf1", "counter1", value);
There's no single 'incrementAndGet' operation in Cassandra thrift API.
Counters in Cassandra are eventually consistent and non-atomic. Fragile ConsistencyLevel.ALL operation is required to get "guaranteed to be updated" counter value, i.e. perform consistent read. ConsistencyLevel.QUORUM is not sufficient (as specified in counters design document: https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf).
To implement incrementAndGet method that looks consistent, you might want at first read counter value, then issue increment mutation, and return (read value + inc).
For example, if previous counter value is 10 to 20 (on different replicas), and one add 50 to it, read-before-increment will return either 60 or 70. And read-after-increment might still return 10 or 20.
The only way to do it is query for it. There is no increment-then-read functionality available in Cassandra.

Resources