Hazelcast Jet with an IMap source and OBJECT in-memory format

I have items in a Hazelcast IMap in OBJECT format, and I'm using a Jet aggregation operation with that IMap as a pipeline source. I was hoping, because of the OBJECT format, to avoid any serialisation/deserialisation of the items in my IMap during processing, the same way native Hazelcast entry processing and queries work. However, I can see that my items are in fact being serialised and then deserialised before being passed to my aggregator.
Is it possible to avoid the serialisation/deserialisation step when using Jet in this way? If so, how?

The local map reader will always serialize/deserialize the entries. The only way I can think of to work around this is to use a custom source which uses map.localKeySet() and then use mapUsingIMap to do a join on those keys. The source would look like the one below:
BatchSource<Object> localKeys = SourceBuilder
        .batch("localKeys", c -> c.jetInstance().getMap("map"))
        .fillBufferFn((map, buf) -> {
            // emit only the keys this member owns; predicate is the caller's filter, if any
            for (Object key : map.localKeySet(predicate)) {
                buf.add(key);
            }
            buf.close(); // signal that this batch source is done
        })
        .distributed(1)
        .build();
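To complete the picture, here is a minimal sketch of wiring that source into a pipeline with mapUsingIMap (assuming Jet 4.x APIs; the counting aggregation and logger sink are stand-ins for the real aggregate operation and sink):

Pipeline p = Pipeline.create();
p.readFrom(localKeys)
 // join each key back to its value; every member emitted only the keys it
 // owns, so the lookups are expected to be served locally
 .mapUsingIMap("map", key -> key, (key, value) -> value)
 .aggregate(AggregateOperations.counting()) // placeholder aggregation
 .writeTo(Sinks.logger());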

Related

Does `ResultQuery#fetchAsync()` work with r2dbc in jOOQ 3.15.1?

I want to use non-blocking R2DBC without 'using a third-party reactive stream API', and I currently have this working when I configure the DSLContext with JDBC (i.e. all the records are printed):
// appended to a jOOQ select query
.fetchAsync()
.thenApply { it.map(mapping(::Film)) }
.whenComplete { result, _ -> println(result) }
However, if I configure the DSLContext to use R2DBC (without any other changes), the println(result) prints null :-(
I:
- am using Kotlin, but not bridging to coroutines yet ... only the above calls are involved
- am using io.r2dbc:r2dbc-mssql:0.8.6.RELEASE
- don't know if r2dbc is 'working' in any sense ... I'm relying on jOOQ to hit me with an exception if not ... I've not seen a single piece of data relayed by r2dbc at this stage.
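For reference, the two configurations look roughly like this (a Java sketch; the connection URLs are placeholders):

// JDBC-backed DSLContext: fetchAsync() completes with the records
Connection jdbcConnection = DriverManager.getConnection("jdbc:sqlserver://localhost;databaseName=test");
DSLContext jdbcCtx = DSL.using(jdbcConnection);

// R2DBC-backed DSLContext (jOOQ 3.15+): the same fetchAsync() chain yields null
ConnectionFactory connectionFactory = ConnectionFactories.get("r2dbc:mssql://localhost/test");
DSLContext r2dbcCtx = DSL.using(connectionFactory);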
As of jOOQ 3.15.1, this isn't possible yet; see https://github.com/jOOQ/jOOQ/issues/11717. It's likely this will be fixed in a 3.15.x patch release.

How to store set of tuples into cassandra using datastax driver

I'm trying to run my service with Micronaut and Cassandra (currently version 3.11.10) and store a column that is a set of tuples in Cassandra.
Example code:
QueryBuilder
    .insertInto(table)
    .value("column", QueryBuilder.literal(items.map { it.toTuple() }.toSet()))
The toTuple() method is just an extension method that transfers the items into Term objects.
When I do so, I receive the following error:
Internal Server Error: Could not inline literal of type java.util.Collections$SingletonSet. This happens because the driver doesn't know how to map it to a CQL type. Try passing a TypeCodec or CodecRegistry to literal().
I've checked multiple sources online but couldn't find a simple way to store a set of tuples in the database without implementing a custom TypeCodec. Since I'm surely not the first person to have this issue, I'm probably doing something completely wrong; however, I couldn't find any documentation on the correct way of doing this.
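For what it's worth, the error message's own suggestion of passing a CodecRegistry to literal() might look like the sketch below (driver 4.x; the text/int tuple component types and the item accessors are assumptions):

// Sketch only: tuple component types and item accessors are assumptions
TupleType tupleType = DataTypes.tupleOf(DataTypes.TEXT, DataTypes.INT);
Set<TupleValue> tuples = items.stream()
        .map(item -> tupleType.newValue(item.getName(), item.getCount()))
        .collect(Collectors.toSet());

SimpleStatement insert = QueryBuilder.insertInto(table)
        .value("column", QueryBuilder.literal(tuples, session.getContext().getCodecRegistry()))
        .build();
session.execute(insert);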

Passing sets of properties and nodes as a POST statement with KOA-NEO4J or BOLT

I am building a REST API which connects to a Neo4j instance. I am using the koa-neo4j library as the basis (https://github.com/assister-ai/koa-neo4j-starter-kit). I am a beginner with all these technologies, but thanks to some help from this forum I have the basic functionality working. For example, the code below allows me to create a new node with the label "metric" and set the name and dateAdded properties.
URL:
/metric?metricName=Test&dateAdded=2/21/2017
index.js
app.defineAPI({
    method: 'POST',
    route: '/api/v1/imm/metric',
    cypherQueryFile: './src/api/v1/imm/metric/createMetric.cyp'
});
createMetric.cyp"
CREATE (n:metric {
    name: $metricName,
    dateAdded: $dateAdded
})
RETURN ID(n) AS id
However, I am struggling to see how to approach more complicated examples. How can I handle situations where I don't know beforehand how many properties will be set when creating a new node, or where I want to create multiple nodes in a single POST? Ideally I would like to pass something like JSON as part of the POST, containing all of the nodes, labels and properties that I want to create. Is something like this possible? I tried using the Cypher query below and passing a JSON string in the POST body, but it didn't work.
UNWIND $props AS properties
CREATE (n:metric)
SET n = properties
RETURN n
Would I be better off switching to the Neo4j REST API instead of the Bolt protocol and the koa-neo4j framework? From my research I thought it was better to use Bolt, but I want a REST API as the middle layer between my front and back end, so I am willing to change over if this will be easier in the long term.
Thanks for the help!
Your Cypher syntax is bad in a couple of ways.
UNWIND only accepts a collection as its argument, not a string.
SET n = properties is only legal if properties is a map, not a string.
This query should work for creating a single node (assuming that $props is a map containing all the properties you want to store with the newly created node):
CREATE (n:metric $props)
RETURN n
If you want to create multiple nodes, then this query (essentially the same as yours) should work, but only if $prop_collection is a collection of maps rather than a JSON string (e.g. [{name: 'Test', dateAdded: '2/21/2017'}, {name: 'Test2', dateAdded: '2/22/2017'}]):
UNWIND $prop_collection AS props
CREATE (n:metric)
SET n = props
RETURN n
I too have faced difficulties when trying to pass complex types as arguments to Neo4j. This has to do with type conversions between JS and Cypher over Bolt, and there is not much one can do except file an issue in the official Neo4j JavaScript driver repo. koa-neo4j uses the official driver under the hood.
One way to go about such scenarios in koa-neo4j is to use JavaScript to manipulate the arguments before they are sent to Cypher, via the preProcess lifecycle hook:
https://github.com/assister-ai/koa-neo4j#preprocess-lifecycle
It is also possible to further manipulate the results of a Cypher query using the postProcess lifecycle hook:
https://github.com/assister-ai/koa-neo4j#postprocess-lifecycle

Hazelcast/Coherence Grid Computing EntryProcessor with data for each key

I want to use a Hazelcast or Coherence EntryProcessor to process some logic in parallel on the different nodes where the keys are stored in a cache. I see I can use something like sendToEachKey(EntryProcessor process).
My problem comes when, along with the logic, I also need to send a piece of data to process that belongs to another system and that I receive (in an HTTP request, for example).
Sure, I can do something like sendToEachKey(EntryProcessor(data) process). But if the data is different for each key and I want to send each key only its own data to process, how can I do that? The reason I want this is that the data is too big and causes network overload.
Sure, opening a thread pool and sending each key its own data is possible, but it is inefficient because of the huge number of requests.
Thanks!
For Hazelcast you could retrieve all values and send each key its own EntryProcessor; however, this will create a lot of overhead.
The other option would be to use a combination of EntryProcessor and our distributed ExecutorService.
You send a Runnable to an ExecutorService. Inside the Runnable you retrieve the local keyset and the external values (all of which are then local to the node), and then you emit one EntryProcessor per local key. Since you're already local to the node, there's no more traffic flying around (apart from the backups, obviously :)). That said, you might want to implement a specific EntryProcessor that only transmits the changed value, not the full processor itself, to save even more traffic. A sketch of this pattern follows.
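A minimal sketch of that approach, assuming Hazelcast 3.x APIs; the map name "my-map", the String value type, and fetchExternalData() are placeholders:

public class LocalProcessingTask implements Runnable, Serializable, HazelcastInstanceAware {
    private transient HazelcastInstance hz;

    @Override
    public void setHazelcastInstance(HazelcastInstance hz) {
        this.hz = hz;
    }

    @Override
    public void run() {
        IMap<String, String> map = hz.getMap("my-map");
        // every member runs this task, so localKeySet() covers the whole map
        for (String key : map.localKeySet()) {
            String external = fetchExternalData(key); // placeholder for the external/HTTP lookup
            // invoked from the owning member: only the small per-key payload
            // travels with the processor, and only to the backup replicas
            map.executeOnKey(key, new MergeProcessor(external));
        }
    }

    private String fetchExternalData(String key) {
        return "external-" + key; // stand-in for the real data source
    }

    static class MergeProcessor extends AbstractEntryProcessor<String, String> {
        private final String external;

        MergeProcessor(String external) {
            this.external = external;
        }

        @Override
        public Object process(Map.Entry<String, String> entry) {
            entry.setValue(entry.getValue() + "|" + external); // example merge logic
            return null;
        }
    }
}

// submit the task to every member:
hz.getExecutorService("processing").executeOnAllMembers(new LocalProcessingTask());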
In Coherence you could use PartitionedService to find the association of cache keys to cluster members. Then you could invoke an entry processor with the data for each member, using PartitionedFilter to make sure that the data is sent only to that member. Something like this:
// keys in this map are also keys in cache
void processData(Map<String, Data> externalData) {
    PartitionedService partitionedService = (PartitionedService) cache.getCacheService();
    Map<Member, Map<String, Data>> dataForMembers = splitDataByMembers(partitionedService, externalData);
    for (Entry<Member, Map<String, Data>> dataForMember : dataForMembers.entrySet()) {
        Member member = dataForMember.getKey();
        Map<String, Data> data = dataForMember.getValue();
        PartitionSet partitions = partitionedService.getOwnedPartitions(member);
        PartitionedFilter filter = new PartitionedFilter<>(Filters.always(), partitions);
        EntryProcessor processor = new MyEntryProcessor(data);
        cache.async().invokeAll(filter, processor);
    }
}

Map<Member, Map<String, Data>> splitDataByMembers(
        PartitionedService partitionedService,
        Map<String, Data> externalData) {
    Map<Member, Map<String, Data>> dataForMembers = new HashMap<>();
    for (Object member : partitionedService.getInfo().getServiceMembers()) {
        dataForMembers.put((Member) member, new HashMap<>());
    }
    for (Entry<String, Data> dataForKey : externalData.entrySet()) {
        Member member = partitionedService.getKeyOwner(dataForKey.getKey());
        dataForMembers.get(member).put(dataForKey.getKey(), dataForKey.getValue());
    }
    return dataForMembers;
}
This way there will be only one entry-processor invocation for each member in the cluster, and each member will get only the data it is interested in.
I used String as the cache key and an arbitrary Data type for the data associated with each key, but you could of course use any other types (and you don't have to model the external data as a map at all).
In Hazelcast you would be doing executeOnKeys(keys, new EntryProcessor(data)), and this is too much because the data is too big. Why not
executeOnKey(key1, new EntryProcessor(data1));
executeOnKey(key2, new EntryProcessor(data2));
executeOnKey(key3, new EntryProcessor(data3));
to send each key only the data subset it needs?

Insert now() using Cassandra's Java Object-mapping API

What is the equivalent of:
INSERT INTO table (myColumn) VALUES (now())
using the Cassandra object-mapping API?
The @Computed annotation doesn't look like it would work, unfortunately.
You can also set the value of your object to a type 1 (time-based) UUID. The JRE doesn't have a standard function for generating one, but you can use the Java driver's utilities, JUG, cassandra-all, or even write one yourself. This is a little different because you're setting the time at object creation, as opposed to the coordinator setting it when it receives the request, but with an ORM's abstractions you tend to lose some control. A sketch of this is below.
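For example, using the driver's UUIDs utility (a sketch; the entity and mapper names are hypothetical):

UUID now = UUIDs.timeBased(); // com.datastax.driver.core.utils.UUIDs
entity.setMyColumn(now);      // hypothetical mapped entity and setter
mapper.save(entity);          // hypothetical Mapper<MyEntity>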
Alternatively, there is nothing preventing you from issuing CQL statements while still using the object-mapping API, maybe even adding a query to a method on your object to do it, i.e.:
@Query("UPDATE table SET myColumn = now() WHERE ....")
public ResultSet setNow();
