How to clean up the JdbcMetadataStore? - spring-integration

Initially, our flow of communicating with Google Pub/Sub was as follows:
1. The application accepts a message.
2. It checks whether the message already exists in the idempotency store:
3.1 If it doesn't exist, put it into the idempotency store (the key is the value of a unique header, the value is the current timestamp).
3.2 If it exists, just ignore the message.
4. When processing is finished, send an acknowledge.
5. In the acknowledge success callback, remove the message from the metadata store.
Step 5 is wrong because, theoretically, we can get a duplicate message even after the message has been processed. Moreover, we found out that sometimes a message might not be removed even though the successful callback was invoked (Message is received from Google Pub/Sub subscription again and again after acknowledge [Heisenbug]). So we decided to update the value after the message is processed and replace the timestamp with the string "FINISHED".
But sooner or later this table will become overcrowded, so we have to clean up entries in the MetadataStore. We can remove entries that are marked as processed and were processed more than one day ago.
As was mentioned in the comments of https://stackoverflow.com/a/51845202/2674303, I can add an additional column to the metadata store table where I could mark whether a message is processed. That is not a problem at all. But how can I use this flag in my cleaner? MetadataStore has only a key and a value.
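For reference, the accept/mark side of the flow above boils down to something like the following sketch (the "uniqueId" header name is illustrative, and metadataStore is assumed to be an injected ConcurrentMetadataStore such as JdbcMetadataStore):
// idempotency check used by the selector: insert the timestamp only if the key is new
public boolean accept(Message<?> message) {
    String key = (String) message.getHeaders().get("uniqueId"); // assumed header name
    return this.metadataStore.putIfAbsent(key, String.valueOf(System.currentTimeMillis())) == null;
}
// after successful processing, mark the entry instead of removing it
public void markFinished(Message<?> message) {
    String key = (String) message.getHeaders().get("uniqueId");
    this.metadataStore.put(key, "FINISHED");
}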

In the acknowledge success callback, remove the message from the metadata store
I don't see a reason for this step at all.
Since you say that you store a timestamp in the value, you can analyze this table from time to time and remove definitely old entries.
In some of my projects we have a daily DB job to archive a table for better main-process performance, simply because we don't need the old data any more. For that we check a timestamp in the row to determine whether it should go into the archive or not. I wouldn't remove data immediately after processing, just because there is a chance of redelivery from the external system.
On the other hand, for better performance I would add an extra indexed column of timestamp type to that metadata table and populate its value via a trigger on each update or insert. Well, the MetadataStore just inserts an entry from the MetadataStoreSelector:
return this.metadataStore.putIfAbsent(key, value) == null;
So, you need an on-insert trigger to populate that date column. This way you will know at the end of the day whether you need to remove an entry or not.
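With such a column in place, a daily cleaner could be as simple as the following sketch (assuming the default INT_METADATA_STORE table, a trigger-populated column named CREATED_AT, an injected JdbcTemplate, and an illustrative schedule):
// runs once a day and removes processed entries older than one day
@Scheduled(cron = "0 0 2 * * *") // illustrative schedule: every night at 02:00
public void purgeOldEntries() {
    // CREATED_AT is the extra trigger-populated column suggested above
    this.jdbcTemplate.update(
            "DELETE FROM INT_METADATA_STORE WHERE METADATA_VALUE = 'FINISHED' AND CREATED_AT < ?",
            Timestamp.from(Instant.now().minus(1, ChronoUnit.DAYS)));
}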

Related

Way to get messages from previous flows in a linear chain

I recently had a scenario like below:
Flow_A ------> Flow_B ------> Flow_C ------> Flow_D
Where
Flow_A is the initiator and should pass messageA.
Flow_B should pass messageA+messageB.
Flow_C should pass messageA+messageB+messageC
Flow_D should pass messageA+messageB+messageC+messageD.
So, I was thinking of enriching the headers with the old message and passing it on to the next flow. But that will become very bulky by the end.
Should I store the message somewhere and then pass the messageId in the header, so that the next flow can get the old message with the messageId?
What should be the best way to achieve this?
See Claim Check pattern: https://docs.spring.io/spring-integration/docs/current/reference/html/message-transformation.html#claim-check
You store a message using ClaimCheckInTransformer and get its id as an output payload.
You move this id into a header and produce the next message.
Repeat steps #1 and #2 for this second message so it is ready for the third one.
And so on, to prepare the environment for the fourth message.
To restore those messages you need to repeat the procedure in the opposite direction.
Get a header from the message into the payload, remove it, and call ClaimCheckOutTransformer to restore the stored message. I say "remove the header" to let the stack be restored properly: the ClaimCheckOutTransformer has logic like this:
AbstractIntegrationMessageBuilder<?> responseBuilder = getMessageBuilderFactory().fromMessage(retrievedMessage);
// headers on the 'current' message take precedence
responseBuilder.copyHeaders(message.getHeaders());
So, without removing that header, the same message id is going to be carried into the next step and you will end up in a loop - a StackOverflowError.
Another option is to store the messages manually somewhere, e.g. in a MetadataStore, and collect their ids in a list as the payload. This way you don't need extra logic to deal with headers. Everything is in the list in your payload. You can consult the store at any time for any id in that list!
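A minimal sketch of the check-in/check-out round trip, using the two transformers from org.springframework.integration.transformer with an in-memory SimpleMessageStore (messageA is assumed to be the message produced by Flow_A; the "claimCheckA" header name is just an illustration):
MessageStore store = new SimpleMessageStore();

// check in: the stored message's id becomes the new payload
ClaimCheckInTransformer checkIn = new ClaimCheckInTransformer(store);
Message<?> ticket = checkIn.transform(messageA);

// move the id into a header and build the next message for Flow_B
Message<String> messageB = MessageBuilder.withPayload("payload B")
        .setHeader("claimCheckA", ticket.getPayload()) // illustrative header name
        .build();

// later: pull the id back into a payload (without copying that header) and check out
ClaimCheckOutTransformer checkOut = new ClaimCheckOutTransformer(store);
Message<?> restoredA = checkOut.transform(
        MessageBuilder.withPayload(messageB.getHeaders().get("claimCheckA", UUID.class)).build());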

How does the fetchNext(int) method work in jOOQ?

From the documentation of fetchNext(int number) -
"This will conveniently close the Cursor, after the last Record was fetched."
Assuming number = 100 and there are 1000 records in total:
Will it close the cursor once the 100th record is fetched, or when the 1000th is fetched?
In other words, what is the "last record" referred to in the documentation?
Cursor<Record> records = dsl.select...fetchLazy();
while (records.hasNext()) {
    records.fetchNext(100).formatCSV(out);
}
out.close();
This convenience is a historic feature in jOOQ, which will be removed eventually: https://github.com/jOOQ/jOOQ/issues/8884. As with every Closeable resource in Java, you should never rely on this sort of auto closing. It is always better to eagerly close the resource when you know you're done using it. In your case, ideally, wrap the code in a try-with-resources statement.
What the Javadoc means is that the underlying JDBC ResultSet will be closed as soon as jOOQ's call to ResultSet.next() yields false, i.e. the database returns no more records. So, no. If there are 1000 records in total from your select, and you're only fetching 100, then the cursor will not be closed. If it were, this wouldn't be a "convenience feature", but break all sorts of other API, including the one you've called. It's totally possible to call fetchNext(100) twice, or in a loop, as you did.
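For example, a hedged rewrite of the loop above with explicit closing could look like this (Cursor implements AutoCloseable; the elided select is the same as in the question):
// same query as above, but the cursor is closed deterministically
try (Cursor<Record> records = dsl.select...fetchLazy()) {
    while (records.hasNext()) {
        records.fetchNext(100).formatCSV(out);
    }
}
out.close();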

Hazelcast EntryListener: Why is set() returning an oldValue different from null?

I joined a new project about a year ago and have started to do some minor tasks with Hazelcast, including the creation of MapStores and EntryListeners for our IMaps.
From the beginning I have been aware of the difference between using set() and put(), with the latter carrying the weight of deserializing and returning the old value. That is why I would use put() when we needed to access the oldValue in the EntryListeners and use set() otherwise.
However, over the past weeks my team started to report occurrences where map insertions done with set() would trigger the entryUpdated callback with a populated oldValue, which "breaks" some of our current logic.
Now I don't know whether this was some recent change released by Hazelcast (we are currently using version 3.12.1) or whether I've just been doing something wrong from the beginning. Shouldn't I expect that set() would always trigger the listener with an empty oldValue?
There is always an old value, but the writer and the listener are independently configurable as to whether they receive it.
On the map, the writer can use V Map.put(K, V) to receive the old value.
Or, the writer can use void Map.set(K, V) to not receive the old value.
On a listener, use include-value=true to receive the old and new values, and include-value=false not to. On an insert the old value will be null; on a delete the new value will be null.
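A minimal sketch of that combination (the "transactions" map name and the values are illustrative, and hazelcastInstance is assumed to exist):
IMap<String, String> map = hazelcastInstance.getMap("transactions");

// second argument is includeValue: with true, events carry the old and new values
map.addEntryListener((EntryUpdatedListener<String, String>) event ->
        System.out.println("old=" + event.getOldValue() + ", new=" + event.getValue()), true);

map.set("key", "v1"); // set() returns nothing to the writer...
map.set("key", "v2"); // ...but the update event above still sees oldValue "v1"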

Insert bulk data into BigQuery without keeping it in the streaming buffer

My motive here is as follows:
Insert bulk records into BigQuery every half an hour
Delete a record if it already exists
Those records are transactions that change their status between: pending, success, fail, and expire.
BigQuery does not allow me to delete rows that were inserted just half an hour ago, as they are still in the streaming buffer.
Can anyone suggest a workaround, as I am getting some duplicate rows in my table?
A better course of action would be to:
Perform periodic loads into a staging table (loading is a free operation)
After the load completes, execute a MERGE statement.
You would want something like this:
MERGE dataset.TransactionTable dt
USING dataset.StagingTransactionTable st
ON dt.tx_id = st.tx_id
WHEN MATCHED THEN
  UPDATE SET status = st.status
WHEN NOT MATCHED THEN
  INSERT (tx_id, status) VALUES (st.tx_id, st.status)
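If the half-hourly job is driven from Java, the MERGE can be submitted with the google-cloud-bigquery client roughly like this (dataset and column names are the ones assumed above):
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
String merge = "MERGE dataset.TransactionTable dt "
        + "USING dataset.StagingTransactionTable st ON dt.tx_id = st.tx_id "
        + "WHEN MATCHED THEN UPDATE SET status = st.status "
        + "WHEN NOT MATCHED THEN INSERT (tx_id, status) VALUES (st.tx_id, st.status)";
// MERGE requires standard SQL; query() blocks until the job completes
bigquery.query(QueryJobConfiguration.newBuilder(merge).setUseLegacySql(false).build());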

OrientDB ORecordHookAbstract is firing onRecordAfterRead twice if no index is present

I'm using the hook functionality in OrientDB to implement encryption of some document fields "transparently" to the caller.
Basically the process is:
1 : when the onRecordBeforeCreate or onRecordBeforeUpdate event fires, we apply encryption to some data fields and change the document before it is created or updated:
byte[] data = doc.field("data");
byte[] encrypted = encrypt(data);
doc.field("data", encrypted);
2 : when onRecordAfterRead fires, we get the encrypted data from the document fields, decrypt it, and set the document fields again with the decrypted data:
byte[] encrypted = doc.field("data");
byte[] decrypted = decrypt(encrypted);
doc.field("data", decrypted);
The problem is that the onRecordAfterRead event fires twice: the first time the data decrypts correctly (because it is still encrypted), but the second time the decryption fails because we have already decrypted it, and so the document load fails.
This happens if the query I execute to load the document uses some field of the document in the filter (WHERE clause).
So, for example, the following query does not trigger the problem:
select count from Data;
but the following query triggers the problem:
select from Data where status ="processed";
This is because I don't have an index on the status field. If I add an index, the event fires only once. So this is related to whether indexes are used or not.
Should the events be fired while OrientDB is "scanning" the documents during query execution? Shouldn't they only fire when the matching documents are actually loaded? Is there a way around this?
onRecordAfterRead should be called only once; if it is called multiple times, that would be a bug. You can report it at https://github.com/orientechnologies/orientdb/issues with a test case that reproduces the problem.
