Hazelcast Aggregations API results in ClassCastException with Predicates - hazelcast

I'm using a Hazelcast IMap instance to hold objects like the following:
public class Report implements Portable, Comparable<Report>, Serializable
{
    private String id;
    private String name;
    private String sourceId;
    private Date timestamp;
    private Map<String,Object> payload;
    // ...
}
The IMap is keyed by the id, and I have also created an index on sourceId, as I need to query and aggregate based on that field.
IMap<String, Report> reportMap = hazelcast.getMap("reports");
reportMap.addIndex("sourceId", false);
I've been trying to use the Aggregations framework to count reports by sourceId. Attempt #1:
public static int reportCountforSource(String sourceId)
{
    EntryObject e = new PredicateBuilder().getEntryObject();
    Predicate<String, Report> predicate = e.get("sourceId").equal(sourceId);
    Supplier<String, Report, Object> supplier = Supplier.fromPredicate(predicate);
    Long count = reportMap.aggregate(supplier, Aggregations.count());
    return count.intValue();
}
This resulted in a ClassCastException being thrown by the Aggregations framework:
Caused by: java.lang.ClassCastException: com.hazelcast.mapreduce.aggregation.impl.SupplierConsumingMapper$SimpleEntry cannot be cast to com.hazelcast.query.impl.QueryableEntry
at com.hazelcast.query.Predicates$AbstractPredicate.readAttribute(Predicates.java:859)
at com.hazelcast.query.Predicates$EqualPredicate.apply(Predicates.java:779)
at com.hazelcast.mapreduce.aggregation.impl.PredicateSupplier.apply(PredicateSupplier.java:58)
at com.hazelcast.mapreduce.aggregation.impl.SupplierConsumingMapper.map(SupplierConsumingMapper.java:55)
at com.hazelcast.mapreduce.impl.task.KeyValueSourceMappingPhase.executeMappingPhase(KeyValueSourceMappingPhase.java:49)
I then switched to the Predicates factory methods instead of PredicateBuilder().getEntryObject() for Attempt #2:
public static int reportCountforSource(String sourceId)
{
    @SuppressWarnings("unchecked")
    Predicate<String, Report> predicate = Predicates.equal("sourceId", sourceId);
    Supplier<String, Report, Object> supplier = Supplier.fromPredicate(predicate);
    Long count = reportMap.aggregate(supplier, Aggregations.count());
    return count.intValue();
}
This resulted in the same ClassCastException.
Finally, I used a lambda to implement the Predicate interface in Attempt #3:
public static int reportCountforSource(String sourceId)
{
    Predicate<String, Report> predicate = (entry) -> entry.getValue().getSourceId().equals(sourceId);
    Supplier<String, Report, Object> supplier = Supplier.fromPredicate(predicate);
    Long count = reportMap.aggregate(supplier, Aggregations.count());
    return count.intValue();
}
This attempt finally works.
Question #1: Is this a bug in Hazelcast? It seems the Aggregations framework should support a Predicate constructed with either Predicates or PredicateBuilder. If not, a separate type (e.g., AggregationPredicate) should be created to avoid this kind of confusion.
Question #2 (related to #1): Using the lambda Predicate results in the index I created not being used. Instead, each entry within the map is being deserialized to determine if it matches the Predicate, which slows things down quite a bit. Is there any way to create a Supplier from a Predicate that will use the index? (EDIT: I verified that each entry is being deserialized by putting a counter in the readPortable method).
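For illustration, the instrumented readPortable looks roughly like this (a sketch; the field names match the class above, and the timestamp/payload handling is omitted):
private static final AtomicLong DESERIALIZATION_COUNT = new AtomicLong();

@Override
public void readPortable(PortableReader reader) throws IOException
{
    // counts every full deserialization performed during the aggregation
    DESERIALIZATION_COUNT.incrementAndGet();
    id = reader.readUTF("id");
    name = reader.readUTF("name");
    sourceId = reader.readUTF("sourceId");
    // timestamp and payload handling omitted, as in the original class
}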

This looks like a Hazelcast bug. I guess I never created a unit test for a Predicate created by PredicateBuilder. Can you please file an issue on GitHub?
Currently, indexes are not supported in map-reduce, whatever you try. The indexing system will be rewritten in the near future to also support all kinds of non-primitive indexes, such as partial indexes.
Another thing that is not yet available is an optimized reader for Portable objects, which would avoid full deserialization.
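In the meantime, if all you need is the count, one possible workaround (just a sketch, not a general replacement for the Aggregations API) is to run a plain indexed query and count the matching keys; this path goes through the query engine and can use the sourceId index:
public static int reportCountforSource(String sourceId)
{
    // Plain query instead of map-reduce; keySet(predicate) only ships the
    // matching keys back to the caller, not the full Report values.
    Predicate predicate = Predicates.equal("sourceId", sourceId);
    return reportMap.keySet(predicate).size();
}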

Related

Cucumber V5-V6 - passing complex object in feature file step

So I have recently migrated to v6, and I will try to simplify my question.
I have the following class:
@AllArgsConstructor
public class Songs {
    String title;
    List<String> genres;
}
In my scenario I want to have something like:
Then The results are as follows:
|title |genre |
|happy song |romance, happy|
And the implementation should be something like:
#Then("Then The results are as follows:")
public void theResultsAreAsFollows(Songs song) {
//Some code here
}
I have the default transformer
@DefaultParameterTransformer
@DefaultDataTableEntryTransformer(replaceWithEmptyString = "[blank]")
@DefaultDataTableCellTransformer
public Object transformer(Object fromValue, Type toValueType) {
    ObjectMapper objectMapper = new ObjectMapper();
    return objectMapper.convertValue(fromValue, objectMapper.constructType(toValueType));
}
My current issue is that I get the following error: Cannot construct instance of java.util.ArrayList (although at least one Creator exists)
How can I tell Cucumber to interpret specific cells as lists, while keeping everything in the same step and not splitting it apart? Or better, how can I pass an object to a step that contains different variable types such as List, HashSet, etc.?
If I replace the list with a String, everything works as expected.
@M.P.Korstanje thank you for your idea. If anyone is trying to find a solution for this, here is the way I did it, as per the suggestions received. I inspected the type of fromValue and updated the transform method into something like:
if (fromValue instanceof LinkedHashMap) {
    Map<String, Object> map = (LinkedHashMap<String, Object>) fromValue;
    Set<String> keys = map.keySet();
    for (String key : keys) {
        if (key.equals("genres")) {
            // split the comma-separated cell into a proper list
            List<String> genres = Arrays.asList(map.get(key).toString().split(",", -1));
            map.put("genres", genres);
        }
    }
    // convert only after all keys have been inspected, not inside the loop
    return objectMapper.convertValue(map, objectMapper.constructType(toValueType));
}
It is quite specific, but I could not find a better solution :)
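A hypothetical alternative (sketched here, untested) would be to register a dedicated @DataTableType for Songs, so the list column is converted explicitly instead of going through the default ObjectMapper:
// Hypothetical converter: each table row becomes a Songs instance,
// splitting the "genre" cell into a list by hand.
@DataTableType
public Songs songsEntry(Map<String, String> entry) {
    return new Songs(
            entry.get("title"),
            Arrays.asList(entry.get("genre").split(",\\s*")));
}

// The step can then take the converted rows directly:
@Then("The results are as follows:")
public void theResultsAreAsFollows(List<Songs> songs) {
    //Some code here
}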

Why am I getting poor performance with Hazelcast relative to database?

I have a cluster that I routinely run with several nodes, and I am interested in resolving some performance issues. It could be that what I am doing is correct, but I am not entirely sure and could use some expert guidance. The goal of this project was to offload database data into a Hazelcast map to provide more scalable and performant access.
Assume there are three nodes in the cluster and there are 30,000 entries in the container map, spread roughly evenly across the cluster. For the sake of the question, assume a simple structure like the following, with its usual getters, setters, constructors, and so on:
class Container {
    int id;
    Set<Integer> dataItems;
}

class Data {
    int id;
    String value;
}
The map config for the two maps looks like the following:
<map name="Container">
<in-memory-format>OBJECT</in-memory-format>
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<time-to-live-seconds>0</time-to-live-seconds>
<max-idle-seconds>259200</max-idle-seconds>
<eviction-policy>LRU</eviction-policy>
<max-size policy="PER_NODE">0</max-size>
<eviction-percentage>25</eviction-percentage>
<merge-policy>com.hazelcast.map.merge.PutIfAbsentMapMergePolicy</merge-policy>
</map>
As you can see, this map has a long eviction time but is used heavily. Since the data experiences heavy write traffic as well as even heavier read traffic, I thought a near cache might not be entirely helpful, as invalidations come quickly. A standard iteration strategy, if this were a local data set, would be something like the following:
public List<Map<String, Object>> jsonMap(final Set<Integer> keys) {
    final IMap<Integer, Container> cmap = hazelcast.getMap("Containers");
    final IMap<Integer, Data> dmap = hazelcast.getMap("Data");
    final List<Map<String, Object>> result = new ArrayList<>();
    cmap.getAll(keys).values().stream().forEach((c) -> {
        final Map<String, Object> cJson = new HashMap<>();
        result.add(cJson);
        cJson.put("containerId", c.id);
        final List<Map<String, Object>> dataList = new ArrayList<>();
        cJson.put("data", dataList);
        dmap.getAll(c.dataItems).values().stream().forEach(d -> {
            final Map<String, Object> dJson = new HashMap<>();
            dataList.add(dJson);
            dJson.put("id", d.id);
            dJson.put("value", d.value);
        });
    });
    return result;
}
As you can see, there is simple iteration here to create a JSON representation. However, since the data is scattered across the nodes, we have found this to be extremely slow: an order of magnitude slower than simply getting the data from the database directly. That has led some to question the strategy of using Hazelcast at all. As a solution, I proposed redesigning the system to use a CompletableFuture created with an ExecutionCallback.
public <K, R> CompletableFuture<R> submitToKeyOwner(final K key, final String executor, final Callable<R> callable) {
    final CompletableFuture<R> future = new CompletableFuture<>();
    hazelcast.getExecutorService(executor).submitToKeyOwner((Callable<R> & Serializable) callable, key, new FutureExecutionCallback<>(future));
    return future;
}
public class FutureExecutionCallback<R> implements ExecutionCallback<R> {
    private final CompletableFuture<R> future;

    public FutureExecutionCallback(final CompletableFuture<R> future) {
        this.future = future;
    }

    @Override
    public void onResponse(final R response) {
        future.complete(response);
    }

    @Override
    public void onFailure(final Throwable t) {
        future.completeExceptionally(t);
    }
}
public List<Map<String, Object>> jsonMap2(final Set<Integer> keys) {
    final List<Map<String, Object>> result = new ArrayList<>();
    keys.stream().forEach(k -> {
        result.add(submitToKeyOwner(k, (Callable<Map<String, Object>> & Serializable) () -> {
            final IMap<Integer, Container> cmap = hazelcast.getMap("Containers");
            final Container c = cmap.get(k);
            final Map<String, Object> cJson = new HashMap<>();
            cJson.put("containerId", c.id);
            final List<Map<String, Object>> dataList = new ArrayList<>();
            cJson.put("data", dataList);
            // forEach rather than map: without a terminal operation the inner
            // futures would never actually be submitted or joined
            c.dataItems.stream().forEach((dk) ->
                dataList.add(submitToKeyOwner(dk, (Callable<Map<String, Object>> & Serializable) () -> {
                    final IMap<Integer, Data> dmap = hazelcast.getMap("Data");
                    final Data d = dmap.get(dk);
                    final Map<String, Object> dJson = new HashMap<>();
                    dJson.put("id", d.id);
                    dJson.put("value", d.value);
                    return dJson;
                }).join()));
            return cJson;
        }).join());
    });
    return result;
}
Essentially I have delegated everything to submitToKeyOwner and used CompletableFutures to wrap it all up. The logic is that the fetch of each object will run only on the node where it is locally stored. Although this works, it is still slower than accessing the database directly: for the hundreds of records we are accessing, a single Hibernate call brings the records back in nanoseconds, while this approach is measured in tens of milliseconds. That seems counterintuitive in some ways; I would think access to the cache should be much quicker than it actually is. Perhaps I am doing something wrong, either in the implementation of the iteration or in the general paradigm. Entry processors are not an option because, although I have posted a trivial example, the real code uses other maps in its processing as well, and entry processors have serious limitations. Using map-reduce is not appropriate because the administrative overhead of the job has proven to be more costly than either of these two methods.
The question I have is whether either of these is the right paradigm, and whether I should be expecting tens or hundreds of milliseconds of latency. Is this just the cost of doing business in a clustered, fault-tolerant world, or is there something I can do to reduce the time? Finally, is there a better paradigm to use when accessing data in this manner?
Thanks a bunch for your time.
It won't solve your problem, but it's worth mentioning that <in-memory-format>BINARY</in-memory-format> usually yields better performance than <in-memory-format>OBJECT</in-memory-format> (using OBJECT adds a serialization step to map.get()).
From the docs:
Regular operations like get rely on the object instance. When the OBJECT format is used and a get is performed, the map does not return the stored instance, but creates a clone. Therefore, this whole get operation includes a serialization first on the node owning the instance, and then a deserialization on the node calling the instance. When the BINARY format is used, only a deserialization is required; this is faster.
Also, I read that you are using Hibernate; have you considered simply using Hazelcast as the Hibernate second-level cache (instead of implementing the cache logic yourself)? (It works for Hibernate 3 and Hibernate 4.)
(And one last thing: I believe setting eviction-percentage and eviction-policy does nothing unless you also set max-size.)
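To make those two points concrete, here is a rough sketch of the same map configured programmatically with the BINARY format and an explicit max-size (the 100000 figure is only a placeholder), assuming the Hazelcast 3.x config API:
import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MaxSizeConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ContainerMapConfig {
    public static HazelcastInstance start() {
        Config config = new Config();
        MapConfig containerMap = config.getMapConfig("Container");
        // BINARY avoids the extra serialization step on every get()
        containerMap.setInMemoryFormat(InMemoryFormat.BINARY);
        containerMap.setBackupCount(1);
        containerMap.setMaxIdleSeconds(259200);
        // eviction only kicks in once a max size is set (placeholder value)
        containerMap.setEvictionPolicy(EvictionPolicy.LRU);
        containerMap.setMaxSizeConfig(
                new MaxSizeConfig(100000, MaxSizeConfig.MaxSizePolicy.PER_NODE));
        return Hazelcast.newHazelcastInstance(config);
    }
}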

GXT Grid ValueProvider / PropertyAccess for a Map<K,V> Datastore?

Rather than using Bean model objects, my data model is built on Key-Value pairs in a HashMap container.
Does anyone have an example of GXT's Grid ValueProvider and PropertyAccess that will work with an underlying Map?
It doesn't have one built in, but it is easy to build your own. Check out this blog post for a similar way of thinking, especially the ValueProvider section: http://www.sencha.com/blog/building-gxt-charts
The purpose of a ValueProvider is to be a simple reflection-like mechanism to read and write values in some object. The purpose of PropertyAccess<T> is then to autogenerate some of these value/modelkey/label provider instances based on getters and setters as found on Java beans, a very common use case. It doesn't have much more complexity than that; it is just a way to ask the compiler to write some very easy boilerplate code for you.
As that blog post shows, you can very easily build a ValueProvider just by implementing the interface. Here's a quick example of how you could make one that reads a Map<String, Object>. When you create each instance, you tell it which key you are working off of, and the type of data it should find when it reads out that value:
public class MapValueProvider<T> implements
        ValueProvider<Map<String, Object>, T> {

    private final String key;

    public MapValueProvider(String key) {
        this.key = key;
    }

    public T getValue(Map<String, Object> object) {
        return (T) object.get(key);
    }

    public void setValue(Map<String, Object> object, T value) {
        object.put(key, value);
    }

    public String getPath() {
        return key;
    }
}
You then build one of these for each key you want to read out, and can pass it along to ColumnConfig instances or whatever else might be expecting them.
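For instance, a quick usage sketch (assuming GXT 3, with hypothetical "name" and "age" keys in the backing map) might look like:
// Hypothetical columns built from MapValueProvider instances; the key names
// and widths are just examples.
ColumnConfig<Map<String, Object>, String> nameColumn =
        new ColumnConfig<Map<String, Object>, String>(
                new MapValueProvider<String>("name"), 150, "Name");
ColumnConfig<Map<String, Object>, Integer> ageColumn =
        new ColumnConfig<Map<String, Object>, Integer>(
                new MapValueProvider<Integer>("age"), 80, "Age");

List<ColumnConfig<Map<String, Object>, ?>> columns =
        new ArrayList<ColumnConfig<Map<String, Object>, ?>>();
columns.add(nameColumn);
columns.add(ageColumn);
ColumnModel<Map<String, Object>> columnModel =
        new ColumnModel<Map<String, Object>>(columns);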
The main point though is that ValueProvider is just an interface, and can be implemented any way you like.

Is this Object Casting pattern acceptable in SharePoint?

I'm creating a SharePoint application, and am trying some new things to create what amounts to an API for Data Access to maintain consistency and conventions.
I haven't seen this before, and that makes me think it might be bad :)
I've overloaded the constructor for the Post class to take only an SPListItem as a parameter. I then have a static method on Post that returns a generic List of Post and takes an SPListItemCollection in its signature.
I loop through the items in a (supposedly more efficient) for statement, and this means that if I ever need to add to or modify how the Post object is cast, I can do it in the class definition, in a single place.
class Post
{
    public int ID { get; set; }
    public string Title { get; set; }

    public Post(SPListItem item)
    {
        ID = item.ID;
        Title = (string)item["Title"];
    }

    public static List<Post> Posts(SPListItemCollection _items)
    {
        var returnlist = new List<Post>();
        for (int i = 0; i < _items.Count; i++) { returnlist.Add(new Post(_items[i])); }
        return returnlist;
    }
}
This enables me to do the following:
static public List<Post> GetPostsByCommunity(string communityName)
{
    var targetList = CoreLists.SystemAccount.Posts(); //CAML omitted for brevity
    return Post.Posts(targetList.GetItems(query)); //Call the constructor
}
Is this a bad idea?
This approach might be suitable, but that FOR loop causes some concern. _items.Count will force the SPListItemCollection to retrieve ALL those items in the list from the database. With large lists, this could either a) cause a throttling exception, or b) use up a lot of resources. Why not use a FOREACH loop? With that, I think the SPListItems are retrieved and disposed one at a time.
If I were writing this, I would have a 'Posts' class as well as 'Post', and give it the constructor accepting the SPListItemCollection.
To be honest, though, the few times I've seen people try and wrap SharePoint SPListItems, it's always ended up seeming more effort than it's worth.
Also, if you're using SharePoint 2010, have you considered using SPMetal?

Abstract Azure TableServiceEntity

I want to abstract the implementation of my Azure TableServiceEntities so that I have one entity that will take an object of any type and use the properties of that object as the properties of the TableServiceEntity.
So my base object would be like:
public class SomeObject
{
    [EntityAttribute(PartitionKey=true)]
    public string OneProperty { get; set; }

    [EntityAttribute(RowKey=true)]
    public string TwoProperty { get; set; }

    public string SomeOtherProperty { get; set; }
}
public class SomeEntity<T> : TableServiceEntity
{
    public SomeEntity(T obj)
    {
        // reflect over the wrapped object's properties
        var properties = obj.GetType().GetProperties();
        foreach (var propertyInfo in properties)
        {
            object[] attributes = propertyInfo.GetCustomAttributes(typeof(DataObjectAttributes), false);
            foreach (var attribute in attributes)
            {
                DataObjectAttributes doa = (DataObjectAttributes)attribute;
                if (doa.PartitionKey)
                    PartitionKey = propertyInfo.Name;
            }
        }
    }
}
Then I could access the entity in the context like this
var objects =
    (from entity in context.CreateQuery<SomeEntity>("SomeEntities") select entity);
var entityList = objects.ToList();
foreach (var obj in entityList)
{
    var someObject = new SomeObject();
    someObject.OneProperty = obj.OneProperty;
    someObject.TwoProperty = obj.TwoProperty;
}
This doesn't seem like it should be that difficult, but I have a feeling I have been looking at too many possible solutions and have just managed to confuse myself.
Thanks for any pointers.
Take a look at the Lokad Cloud O/C mapper. I think the source code imitates what you're attempting, but it has insightful rationale about its different approach to Azure table storage.
http://lokadcloud.codeplex.com/
I have written an alternate Azure table storage client in F#, Lucifure Stash, which supports many abstractions including persisting a dictionary object. Lucifure Stash also supports large data columns > 64K, arrays & lists, enumerations, out of the box serialization, user defined morphing, public and private properties and fields and more.
It is available free for personal use at http://www.lucifure.com or via NuGet.com.
What you are attempting to achieve, a single generic class for any entity, can be implemented in Lucifure Stash by using the [StashPool] attribute on a dictionary type.
I have written a blog post about the table storage context and entities, specifying the entity type. Maybe it can help you: http://wblo.gs/a2G
It seems you still want to use concrete types. Thus, the SomeEntity is a bit redundant. Actually, TableServiceEntity is already an abstract class. You can derive SomeObject from TableServiceEntity. From my experience, this won’t introduce any issues to your scenario.
In addition, even with your custom SomeEntity, your last piece of code fails to remove the dependency on the concrete SomeObject class anyway.
Best Regards,
Ming Xu.
