Spring Integration JDBC Poller: update with compound primary key

I am using a JDBC poller to process some DB records, and when the workflow is finished, I need to update those records. I could not find a way to make it work for tables with compound keys.
This is my example: table EVENTS, with primary key (DATETIME, EVENT_LOCATION, EVENT_TYPE). I cannot change the schema.
Rows are mapped into a POJO with the property names: dateTime, location, type.
<int-jdbc:inbound-channel-adapter
    query="select * from events where uploaded = 0"
    channel="fromdb" data-source="dataSource"
    max-rows="${app.maxrows}"
    row-mapper="eventRowMapper"
    update="update events set uploaded=1 where DATETIME = :dateTime AND EVENT_LOCATION = :location AND EVENT_TYPE = :type">
    <int:poller fixed-delay="${app.intervalmsecs}" />
</int-jdbc:inbound-channel-adapter>
But I get a syntax error response from the server when the poller tries to update those records.
After reading the docs, it seems that the poller uses (:id) in the update query, but that assumes a single-column PK. I could not find any information about updating rows whose primary key spans multiple columns.
Is there any way to update rows with a multi-column primary key? Or should I use an outbound JDBC adapter, or code my own update solution?

Show the complete stack trace, your Event object, and your row mapper. I just changed one of the tests from
JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(embeddedDatabase,
"select * from item where id not in (select id from copy)");
adapter.setUpdateSql("insert into copy values(:id,10)");
to
JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(embeddedDatabase,
"select * from item where id not in (select foo from copy)");
adapter.setUpdateSql("insert into copy values(:foo,:status)");
and it worked just fine.
As long as the column appears as a property on the result of the select query, it will work (the result being the object created by the row mapper). That is, dateTime, location, and type must be properties on Event.
Also, based on your update query, it looks like you should set update-per-row to true, since the statement updates only one row at a time.
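For reference, here is a minimal sketch of the same fix expressed as Java configuration (the DataSource bean and the EventRowMapper class are assumptions standing in for your own beans; the channel and poller settings mirror the XML above):
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;
// Minimal sketch, assuming a configured DataSource and an EventRowMapper
// that maps DATETIME/EVENT_LOCATION/EVENT_TYPE onto dateTime/location/type.
@Bean
@InboundChannelAdapter(channel = "fromdb", poller = @Poller(fixedDelay = "${app.intervalmsecs}"))
public JdbcPollingChannelAdapter eventsAdapter(DataSource dataSource) {
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(dataSource,
            "select * from events where uploaded = 0");
    adapter.setRowMapper(new EventRowMapper());
    // Each named parameter resolves against a property of the mapped Event row:
    adapter.setUpdateSql("update events set uploaded = 1"
            + " where DATETIME = :dateTime"
            + " AND EVENT_LOCATION = :location"
            + " AND EVENT_TYPE = :type");
    adapter.setUpdatePerRow(true); // one update statement per polled row
    return adapter;
}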

Related

Best way to retrieve an item from dynamoDB using attribute which is not partition key

I am new to DynamoDB and need some suggestions from experienced people here. There is a table created with the below model:
orderId - PartitionKey
stockId
orderDetails
and there is a new requirement to fetch all the orderIds that include a particular stockId. An item in the table looks like:
{
  "orderId": "ord_12234",
  "stockId": [
    123221,
    234556,
    123231
  ],
  "orderDetails": {
    "createdDate": "",
    "dateOfDel": ""
  }
}
Given that stockId can be an array of ids, it can't be made a GSI. Performing a scan would be heavy, as the table has a large number of records and keeps growing. What would be the best option here? How can the existing table be modified to achieve this efficiently?
You definitely want to avoid scanning the table. One option is to modify your schema to a Single Table Design where you have order items and order/stock items.
For example:
pk              | sk              | orderDetails                     | stockId | ...
----------------+-----------------+----------------------------------+---------+----
order#ord_12234 | order#ord_12234 | {createdDate:xxx, dateOfDel:yyy} |         | ...
order#ord_12234 | stock#123221    |                                  | 123221  | ...
order#ord_12234 | stock#234556    |                                  | 234556  | ...
order#ord_12234 | stock#123231    |                                  | 123231  | ...
You can then issue the following queries, as needed:
get the order details with a query on pk=order#ord_12234, sk=order#ord_12234
get the stocks for a given order with a query on pk=order#ord_12234, sk begins_with stock#
get everything associated with the order with a query on pk=order#ord_12234
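A minimal sketch of the stock lookup with the AWS SDK for Java v2 (the table name "orders" and the client setup are assumptions; pk/sk follow the design above):
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
// Minimal sketch, assuming a table named "orders" keyed on pk/sk as above.
DynamoDbClient dynamo = DynamoDbClient.create();
QueryRequest request = QueryRequest.builder()
        .tableName("orders")
        .keyConditionExpression("pk = :pk AND begins_with(sk, :skPrefix)")
        .expressionAttributeValues(Map.of(
                ":pk", AttributeValue.builder().s("order#ord_12234").build(),
                ":skPrefix", AttributeValue.builder().s("stock#").build()))
        .build();
QueryResponse response = dynamo.query(request);
// Each returned item is one order/stock row from the table above.
response.items().forEach(item -> System.out.println(item.get("stockId")));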

Hazelcast Jet - Group By Use Case

We have a requirement to group by multiple fields in a dynamic way on a huge data set. The data is stored in a Hazelcast Jet cluster. Example: the Person class contains 4 fields: age, name, city, and country. We first need to group by city, then by country, and then we may group by name, based on conditional parameters.
We already tried using a distributed collection, and it did not work. Even when we tried the Pipeline API, it threw an error.
Code:
IMap res = client.getMap("res"); // res is a distributed map
Pipeline p = Pipeline.create();
JobConfig jobConfig = new JobConfig();
p.drawFrom(Sources.<Person>list("inputList"))
.aggregate(AggregateOperations.groupingBy(Person::getCountry))
.drainTo(Sinks.map(res));
jobConfig = new JobConfig();
jobConfig.addClass(Person.class);
jobConfig.addClass(HzJetListClientPersonMultipleGroupBy.class);
Job job = client.newJob(p, jobConfig);
job.join();
Then we read from the map in the client and destroy it.
Error Message on the server:
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to java.util.Map$Entry
groupingBy aggregates all the input items into a HashMap where the key is extracted using the given function. In your case it aggregates a stream of Person items into a single HashMap<String, List<Person>> item.
You need to use this:
p.drawFrom(Sources.<Person>list("inputList"))
.groupingKey(Person::getCountry)
.aggregate(AggregateOperations.toList())
.drainTo(Sinks.map(res));
This will populate the res map with a list of persons for each country.
Remember, without groupingKey() the aggregation is always global. That is, all items in the input will be aggregated to one output item.
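Put together, a minimal end-to-end sketch of the corrected job (the Jet client setup is an assumption; Person and the map/list names come from the question):
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.Job;
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.config.JobConfig;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
// Minimal sketch, assuming a running Jet cluster and a serializable Person
// class with a getCountry() accessor, as in the question.
JetInstance client = Jet.newJetClient();
Pipeline p = Pipeline.create();
p.drawFrom(Sources.<Person>list("inputList"))
 .groupingKey(Person::getCountry)
 .aggregate(AggregateOperations.toList())
 .drainTo(Sinks.map("res")); // one map entry per country -> list of persons
JobConfig jobConfig = new JobConfig();
jobConfig.addClass(Person.class);
Job job = client.newJob(p, jobConfig);
job.join();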

How to avoid Cassandra ALLOW FILTERING?

I have the following data model:
campaigns {
    id int PRIMARY KEY,
    scheduletime text,
    SchduleStartdate text,
    SchduleEndDate text,
    enable boolean,
    actionFlag boolean,
    .... etc
}
Here I need to fetch the data based on start date and end date, without ALLOW FILTERING.
I got several suggestions to redesign the schema to fulfill the requirement, but I cannot filter the data based on id, since I need the data between the dates.
Can someone give me a good suggestion for this scenario, so I can execute the following query?
select * from campaigns WHERE startdate='XXX' AND endDate='XXX'; -- without ALLOW FILTERING
CREATE TABLE campaigns (
    SchduleStartdate text,
    SchduleEndDate text,
    id int,
    scheduletime text,
    enable boolean,
    PRIMARY KEY ((SchduleStartdate, SchduleEndDate), id)
);
You can run the below queries against the table:
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx'; -- to get the answer to the above question
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx' and id = 1; -- if you want to filter the data further for specific ids
Here SchduleStartdate and SchduleEndDate together form the partition key, and id is the clustering key, which makes sure the entries are unique.
This way, you can filter based on start date, end date, and then id if needed.
One downside: filtering by id alone won't be possible, as you must first restrict the partition key.
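For illustration, a minimal sketch of the first query with the DataStax Java driver 3.x (the contact point and the keyspace name "ks" are assumptions):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
// Minimal sketch, assuming a local node and a keyspace named "ks".
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("ks");
// Both partition key columns must be bound; no ALLOW FILTERING is needed.
ResultSet rs = session.execute(
        "select * from campaigns where SchduleStartdate = ? and SchduleEndDate = ?",
        "2018-01-01", "2018-01-31");
for (Row row : rs) {
    System.out.println(row.getInt("id"));
}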

Data Versioning in Cassandra with CQL3

I am quite a n00b in Cassandra (I'm mainly from an RDBMS background with some NoSQL here and there, like Google's BigTable and MongoDB), and I'm struggling with the data modelling for the use cases I'm trying to satisfy. I looked at this and this and even this, but they're not exactly what I needed.
I have this basic table:
CREATE TABLE documents (
    itemid_version text,
    xml_payload text,
    insert_time timestamp,
    PRIMARY KEY (itemid_version)
);
itemid is actually a UUID (and unique for all documents), and version is an int (version 0 is the "first" version). xml_payload is the full XML doc, and can get quite big. Yes, I'm essentially creating a versioned document store.
As you can see, I concatenated the two to create a primary key and I'll get to why I did this later as I explain the requirements and/or use cases:
user needs to get the single (1) doc he wants, he knows the item id and version (not necessarily the latest)
user needs to get the single (1) doc he wants, he knows the item id but does not know the latest version
user needs the version history of a single (1) doc.
user needs to get the list (1 or more) of docs he wants, he knows the item id AND version (not necessarily the latest)
I will be writing the client code that performs the use cases; please excuse the syntax, as I'm trying to be language-agnostic.
The first one's straightforward:
$itemid_version = concat($itemid, $version)
$doc = csql("select * from documents where itemid_version = {0};"
-f $itemid_version)
Now, to satisfy the 2nd and 3rd use cases, I am adding the following table:
CREATE TABLE document_versions (
    itemid uuid,
    version int,
    PRIMARY KEY (itemid, version)
) WITH CLUSTERING ORDER BY (version DESC);
New records will be added as new docs and new versions of existing docs are created.
Now we have this (use case #2):
$latest_itemid, $latest_version = csql("select itemid,
version from document_versions where itemid = {0}
order by version DESC limit 1;" -f $itemid)
$itemid_version = concat($latest_itemid, $latest_version)
$doc = csql("select * from documents where itemid_version = {0};"
-f $itemid_version)
and this (use case #3):
$versions = csql("select version from document_versions where itemid = {0}"
-f $itemid)
for the 3rd requirement, I am adding yet another table:
CREATE TABLE latest_documents (
    itemid uuid,
    version int,
    PRIMARY KEY (itemid, version)
);
records are inserted for new docs, records are updated for existing docs
and now we have this:
$latest_itemids, $latest_versions = csql("select itemid, version
from latest_documents where itemid in ({0})" -f $itemid_list.toCSV())
foreach ($one_itemid in $latest_itemids, $one_version in $latest_versions)
$itemid_version = concat($one_itemid, $one_version)
$latest_docs.append(
cql("select * from documents where itemid_version = {0};"
-f $itemid_version))
Now I hope it's clear why I concatenated itemid and version to create an index for documents, as opposed to creating a compound key: I cannot have OR in the WHERE clause of a SELECT.
You can assume that only one process will do the inserts/updates so you don't need to worry about consistency or isolation issues.
Am I on the right track here? There are quite a number of things that don't sit well with me, but mainly because I don't understand Cassandra yet:
I feel that the primary key for documents should be a composite of (itemid, version) but I can't satisfy use case #4 (return a list from a query)...I can't possibly use a separate SELECT statement for each document due to the performance hit (network overhead)...or can (should) I?
Two trips to get a document if the version is not known beforehand; probably a compromise I have to live with, or maybe there's a better way.
How would this work, Dexter?
It is actually very similar to your solution, except you can store all versions and fetch the 'latest' version from just one table (document_versions).
In most cases I think you can get what you want in a single SELECT, except use case #2 (fetching the most recent version of a document), where a preliminary SELECT on document_versions is needed first.
SECOND ATTEMPT
(I removed the code from the first attempt, apologies to anyone who was following in the comments).
CREATE TABLE documents (
    itemid_version text,
    xml_payload text,
    insert_time timestamp,
    PRIMARY KEY (itemid_version)
);
CREATE TABLE document_versions (
    itemid text,
    version int,
    PRIMARY KEY (itemid, version)
) WITH CLUSTERING ORDER BY (version DESC);
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc1-1', '<?xml>1st</xml>', '2014-05-21 18:00:00');
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc1-2', '<?xml>2nd</xml>', '2014-05-21 18:00:00');
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc2-1', '<?xml>1st</xml>', '2014-05-21 18:00:00');
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc2-2', '<?xml>2nd</xml>', '2014-05-21 18:00:00');
INSERT INTO document_versions (itemid, version) VALUES ('doc1', 1);
INSERT INTO document_versions (itemid, version) VALUES ('doc1', 2);
INSERT INTO document_versions (itemid, version) VALUES ('doc2', 1);
INSERT INTO document_versions (itemid, version) VALUES ('doc2', 2);
user needs to get the single (1) doc he wants, he knows the item id and version (not necessarily the latest)
SELECT * FROM documents WHERE itemid_version = 'doc1-2';
user needs to get the single (1) doc he wants, he knows the item id but does not know the latest version
(You would feed the concatenated itemid + version from the result of the first query into the second query.)
SELECT * FROM document_versions WHERE itemid = 'doc2' LIMIT 1;
SELECT * FROM documents WHERE itemid_version = 'doc2-2';
user needs the version history of a single (1) doc.
SELECT * FROM document_versions WHERE itemid = 'doc2';
user needs to get the list (1 or more) of docs he wants, he knows the item id AND version (not necessarily the latest)
SELECT * FROM documents WHERE itemid_version IN ('doc1-2', 'doc2-1');
Cheers,
Let's see if we can come up with a model in a top-down fashion, starting from your queries:
CREATE TABLE document_versions (
    itemid uuid,
    name text STATIC,
    version int,
    xml_payload text,
    insert_time timestamp,
    PRIMARY KEY ((itemid), version)
) WITH CLUSTERING ORDER BY (version DESC);
Use case 1: user needs to get the single (1) doc he wants, he knows the item id and version (not necessarily the latest)
SELECT * FROM document_versions
WHERE itemid = ? and version = ?;
Use case 2: user needs to get the single (1) doc he wants, he knows the item id but does not know the latest version
SELECT * FROM document_versions
WHERE itemid = ? limit 1;
Use case 3: user needs the version history of a single (1) doc.
SELECT * FROM document_versions
WHERE itemid = ?;
Use case 4: user needs to get the list (1 or more) of docs he wants, he knows the item id AND version (not necessarily the latest)
SELECT * FROM document_versions
WHERE itemid = ? AND version IN (?, ?);
One table for all these queries is the correct approach. I would suggest taking the Datastax free online course: DS220 Data Modeling
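As a quick sketch of use case 2 against this model with the DataStax Java driver (the contact point, the keyspace name "ks", and the example UUID are assumptions), the descending clustering order makes the latest version a single-row query:
import java.util.UUID;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
// Minimal sketch, assuming a local node and a keyspace named "ks".
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("ks");
UUID itemid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000"); // example id
// version is clustered DESC, so LIMIT 1 returns the latest version.
Row latest = session.execute(
        "SELECT version, xml_payload FROM document_versions WHERE itemid = ? LIMIT 1",
        itemid).one();
System.out.println(latest.getInt("version"));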

Cassandra count query failing due to AssertionError

I am trying out Cassandra for the first time, running it locally as a simple session-management DB. [Cassandra 2.0.4, CQL3, DataStax driver 2.0.0-rc2]
The following count query works fine when there is no data in the table:
select count(*) from session_data where app_name=? and account=? and last_access > ?
But after even a single row is inserted into the table, the query fails with the following error:
java.lang.AssertionError
at org.apache.cassandra.db.filter.ExtendedFilter$WithClauses.getExtraFilter(ExtendedFilter.java:258)
at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1719)
at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1674)
at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:111)
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1418)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Here is the schema I am using:
CREATE KEYSPACE session WITH replication= {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE session_data (
    username text,
    session_id text,
    app_name text,
    account text,
    last_access timestamp,
    created_on timestamp,
    PRIMARY KEY (username, session_id, app_name, account)
);
create index sessionIndex ON session_data (session_id);
create index sessionAppName ON session_data (app_name);
create index lastAccessIndex ON session_data (last_access);
I am wondering if there is something wrong in the table definition/indexes or the query itself. Any help/insight would be greatly appreciated.
It looks like you're tripping over a bug in Cassandra. Here is the assertion and related comments in the Cassandra sources:
/*
* This method assumes the IndexExpression names are valid column names, which is not the
* case with composites. This is ok for now however since:
* 1) CompositeSearcher doesn't use it.
* 2) We don't yet allow non-indexed range slice with filters in CQL3 (i.e. this will never be
* called by CFS.filter() for composites).
*/
assert !(cfs.getComparator() instanceof CompositeType);
This code was modified between cassandra-2.0.4 and trunk as part of ticket CASSANDRA-5417, but it's not clear to me that the author was aware of this issue. The assertion was removed, but the comment was not. I would recommend submitting a bug report to the Cassandra project.
