Spring Integration - aggregator from SFTP inbound

What is the best way to aggregate a single message from an SFTP inbound message source that delivers multiple files?
We have three files on a remote machine that need to be received. We then combine the contents of those files into one JSON message and send it on.
public IntegrationFlow sftpIntegrationFlowBean() {
    final Map<String, Object> headers = new HashMap<>();
    headers.put("sftpFile", "sftpFile");
    final Consumer<AggregatorSpec> aggregator = t -> {
        t.sendPartialResultOnExpiry(true);
        t.expireGroupsUponCompletion(true);
        t.processor(new CustomMessageAggregator());
    };
    return IntegrationFlows
            .from(sftpInboundMessageSource(),
                    e -> e.id("sftpIntegrationFlow").poller(pollerMetadataSftp))
            .enrichHeaders(headers)
            .aggregate(aggregator)
            .handle(customMessageSender)
            .get();
}
The poller polls every 15 minutes.
When this code runs, the following happens:
The files are retrieved and one of them is processed.
After 15 minutes, the second file is processed.
After another 15 minutes, the third file is processed.
Finally, after 15 more minutes, the message is sent to the destination.
How can all of this be done in one operation, without the delays? I also tried FileReadingMessageSource, but got the same result.
Thank you in advance.

Increase maxMessagesPerPoll in the PollerMetadata.
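To see why this helps: a polled inbound channel adapter emits at most maxMessagesPerPoll messages per poll cycle, and for inbound adapters the default is 1, so three files need three cycles. The sketch below is a plain-Java simulation of that behaviour, not Spring Integration code; the names PollSimulation and cyclesToDrain are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class PollSimulation {

    /** Returns how many poll cycles are needed to emit every pending file. */
    static int cyclesToDrain(List<String> files, int maxMessagesPerPoll) {
        if (maxMessagesPerPoll < 1) {
            throw new IllegalArgumentException("this sketch only models positive limits");
        }
        Deque<String> pending = new ArrayDeque<>(files);
        int cycles = 0;
        while (!pending.isEmpty()) {
            cycles++; // one poller trigger, e.g. every 15 minutes
            for (int i = 0; i < maxMessagesPerPoll && !pending.isEmpty(); i++) {
                pending.poll(); // one message sent downstream to the aggregator
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.csv", "b.csv", "c.csv");
        System.out.println(cyclesToDrain(files, 1)); // 3
        System.out.println(cyclesToDrain(files, 3)); // 1
    }
}
```

In the flow from the question, this would correspond to calling setMaxMessagesPerPoll on pollerMetadataSftp with a value at least as large as the number of files expected per poll.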

Related

What instrumentation should be added to register each HTTP event in MeterRegistry with a specific tag and minute value? The events number in the millions

I need to analyse a value carried by each HTTP event: it should not be greater than 30 minutes, and 95% of events should fall into this bucket. If that is not the case, an alert should be sent.
My first concern is getting the right metrics in /actuator/prometheus.
Steps I took:
Every HTTP request event carries one integer value called eventMinute.
Using the Micrometer MeterRegistry, I tried the code below:
// MeterRegistry meterRegistry ...
meterRegistry.summary("MINUTES_ANALYSIS", tags);
where the tag is EVENT_MINUTE, which receives an integer value in each HTTP event.
But this floods the metrics, because there are millions of events.
Please point me to a way to do this; I am a beginner. Thanks!
The simplest solution (which I would recommend you start with) is to create two counters:
int theThing = getTheThing(); // however you obtain the value, e.g. eventMinute
if (theThing > 30) {
    meterRegistry.counter("my.request.counter.abovethreshold").increment();
}
meterRegistry.counter("my.request.counter.total").increment();
You increment one counter when the value is above your threshold and another that tracks all requests (or reuse an existing meter that does that for you).
Then it is simple to set up a chart or an alarm that fires when more than 5% of requests are above the threshold, that is, when fewer than 95% are within it:
my_request_counter_abovethreshold / my_request_counter_total > 0.05
(I didn't test the code. It might need a tiny bit of tweaking.)
You can do a similar thing with DistributionSummary by setting various SLOs (I'm not familiar enough with them to offer an example), but start with something simple first; if it is sufficient, you won't need the extra complexity.
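The two-counter idea can be checked without a metrics backend. The following is a minimal plain-Java sketch, with LongAdder standing in for the Micrometer counters; the class name ThresholdRatio and the 5% alert limit are assumptions for illustration, derived from the "95% within 30 minutes" requirement in the question.

```java
import java.util.concurrent.atomic.LongAdder;

class ThresholdRatio {
    private final LongAdder total = new LongAdder();
    private final LongAdder aboveThreshold = new LongAdder();
    private final int thresholdMinutes;

    ThresholdRatio(int thresholdMinutes) {
        this.thresholdMinutes = thresholdMinutes;
    }

    /** Counts every event, and separately counts the ones above the threshold. */
    void record(int eventMinute) {
        if (eventMinute > thresholdMinutes) {
            aboveThreshold.increment();
        }
        total.increment();
    }

    /** Fraction of recorded events above the threshold (0.0 when nothing was recorded). */
    double fractionAbove() {
        long n = total.sum();
        return n == 0 ? 0.0 : (double) aboveThreshold.sum() / n;
    }

    /** Alert when more than 5% of events are above the threshold. */
    boolean shouldAlert() {
        return fractionAbove() > 0.05;
    }
}
```

Because only two time series exist regardless of how many distinct eventMinute values arrive, this avoids the tag-cardinality flood described in the question.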
There are a few ways to solve this problem.
1. Here is a method that receives the metric name, the tags, and a value:
public void createOrUpdateHistogram(String metricName, Map<String, String> stringTags, double numericValue)
{
    // Convert the tag map to Micrometer tags.
    List<Tag> tags = stringTags.entrySet().stream()
            .map(e -> Tag.of(e.getKey(), e.getValue()))
            .collect(Collectors.toList());
    DistributionSummary.builder(metricName)
            .tags(tags)
            // an SLO can be enforced here if required
            .publishPercentileHistogram()
            .minimumExpectedValue(1.0D) // choose these bounds based on the distribution you want
            .maximumExpectedValue(30.0D)
            .register(this.meterRegistry)
            .record(numericValue);
}
It then produces metrics like:
delta_bucket{mode="CURRENT",le="30.0",} 11.0
delta_bucket{mode="CURRENT",le="+Inf",} 11.0
The buckets are cumulative, so le="+Inf" also includes everything counted under le="30.0". Subtract the le="30.0" count from the le="+Inf" count to get the number of events above 30.
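The bucket arithmetic can be written out as a small plain-Java sketch. The class and method names are hypothetical; the inputs are the two cumulative counts read from the le="30.0" and le="+Inf" series.

```java
class CumulativeBuckets {

    /** Events above the bound: count(le="+Inf") minus count(le=bound). */
    static double countAbove(double leBound, double leInf) {
        return leInf - leBound;
    }

    /** Fraction of events at or below the bound (1.0 when there are no events). */
    static double fractionWithin(double leBound, double leInf) {
        return leInf == 0 ? 1.0 : leBound / leInf;
    }

    public static void main(String[] args) {
        // Values from the sample output above: both buckets hold 11 events.
        System.out.println(countAbove(11.0, 11.0));     // 0.0
        System.out.println(fractionWithin(11.0, 11.0)); // 1.0
    }
}
```

With the sample values, no event exceeds 30, so the 95% target is met.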
Another way is to use a Timer:
public void createOrUpdateHistogram(String metricName, Map<String, String> stringTags, double numericValue)
{
    // Convert the tag map to Micrometer tags.
    List<Tag> tags = stringTags.entrySet().stream()
            .map(e -> Tag.of(e.getKey(), e.getValue()))
            .collect(Collectors.toList());
    Timer.builder(metricName)
            .tags(tags)
            .publishPercentiles(new double[]{0.5D, 0.95D})
            .publishPercentileHistogram()
            .serviceLevelObjectives(new Duration[]{Duration.ofMinutes(30L)})
            .minimumExpectedValue(Duration.ofMinutes(30L))
            .maximumExpectedValue(Duration.ofMinutes(30L))
            .register(this.meterRegistry)
            .record((long) numericValue, TimeUnit.MINUTES);
}
This produces only two le buckets: the given time and +Inf.
It can be changed to fit your requirements, and it also gives you quantiles.

Spring Batch multithreading

Problem statement: after a Spring Batch job completes successfully, I cannot access data from the ExecutionContext that was set inside the partitioner.
Partitioner code:
for (String files : fileNameListmatch) {
    ExecutionContext executionContext = new ExecutionContext();
    executionContext.putString("file", files);
    partitionData.put("partition: " + partitionNo, executionContext);
    partitionNo++;
}
Inside the partitioner code, I add the list of files to the ExecutionContext.
JobListener code:
@Value("#{stepExecutionContext['file']}")
String file;
@Override
public void afterJob(JobExecution jobExecution) {
    for (String file1 : file) {
        moveCSVFile = Files.move(Paths.get(inputFilePath + "/" + file1 + ".csv"),
                Paths.get(archiveFilePath + file1 + ".csv"));
        moveCTLFile = Files.move(Paths.get(inputFilePath + "/" + file1 + ".ctl"),
                Paths.get(archiveFilePath + file1 + ".ctl"));
    }
}
Inside afterJob, I try to access the list of files from the ExecutionContext after the job completes, but I get null.
After the job completes successfully, I have to move the input files from one folder to another, but I cannot access the file names (the value from the executionContext is null).
There are two different execution contexts: one at the step level and one at the job level. Make sure to use the job-scoped one, since you want to access the execution context from a job listener.
If you use the step-scoped one, you can always promote keys to the job execution context using an ExecutionContextPromotionListener. Please refer to the "Passing Data to Future Steps" section of the reference documentation for more details.

Spring Integration aggregator's release strategy based on last modified

I'm trying to implement the following scenario:
I get a bunch of files that share a common file name pattern, e.g. doc0001_page0001, doc0001_page0002, doc0001_page0003, doc0002_page0001 (doc0001 is one document consisting of 3 pages that I need to merge; doc0002 has only 1 page).
I want to aggregate them so that a group is released only once all the files for a specific document have been gathered (doc0001 after 3 files were picked up, doc0002 after 1 file).
My idea was to read the files in alphabetical order and release a group once 2 seconds have passed since it was last modified (g.getLastModified() is smaller than the current time minus 2 seconds).
I've tried the following without success:
return IntegrationFlows.from(Files.inboundAdapter(tmpDir.getRoot())
                .patternFilter("*.json")
                .useWatchService(true)
                .watchEvents(FileReadingMessageSource.WatchEventType.CREATE,
                        FileReadingMessageSource.WatchEventType.MODIFY),
        e -> e.poller(Pollers.fixedDelay(100)
                .errorChannel("filePollingErrorChannel")))
        .enrichHeaders(h -> h.headerExpression("CORRELATION_PATTERN",
                "headers[" + FileHeaders.FILENAME + "].substring(0,7)")) // docxxxx.length()
        .aggregate(a -> a.correlationExpression("headers['CORRELATION_PATTERN']")
                .releaseStrategy(g -> g.getLastModified() < System.currentTimeMillis() - 2000))
        .channel(MessageChannels.queue("fileReadingResultChannel"))
        .get();
Changing the release strategy to the following also didn't work:
.aggregate(a -> a.correlationExpression("headers['CORRELATION_PATTERN']")
        .releaseStrategy(g -> {
            Stream<Message<?>> stream = g.getMessages()
                    .stream();
            Long timestamp = (Long) stream.skip(stream.count() - 1)
                    .findFirst()
                    .get()
                    .getHeaders()
                    .get(MessageHeaders.TIMESTAMP);
            System.out.println("Timestamp: " + timestamp);
            return timestamp.longValue() < System.currentTimeMillis() - 2000;
        }))
Am I misunderstanding the release strategy concept?
Also, is it possible to print something out from the releaseStrategy block? I wanted to compare the timestamp (see System.out.println("Timestamp: " + timestamp);)
Right, since you don't know the whole sequence for the message group, you have no choice other than to use a groupTimeout. The regular releaseStrategy is evaluated only when a message arrives at the aggregator. Since at the point when a message arrives you don't have enough information to release the group, it will sit in the group store forever.
The groupTimeout option was introduced to the aggregator especially for this kind of use case, where we definitely want to release a group even though it does not have enough messages to be released normally.
Consider using a groupTimeoutExpression instead of a constant groupTimeout. The MessageGroup is the root evaluation context object for SpEL, so you will be able to access the mentioned lastModified on it.
The .sendPartialResultOnExpiry(true) is the right option here.
See more info in the docs: https://docs.spring.io/spring-integration/reference/html/#agg-and-group-to
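A separate detail may explain why the second attempt in the question failed outright: that snippet calls stream.count() and then stream.skip(...) on the same Stream. A java.util.stream.Stream can be operated on only once, and count() is a terminal operation, so the following skip() typically throws IllegalStateException. The plain-Java sketch below reproduces this and shows one way to read the last element without reusing a stream; the List<Long> of timestamps is an assumed stand-in for the group's messages, and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

class LastElement {

    /** Returns the last element in iteration order, or null for an empty collection. */
    static <T> T last(Collection<T> items) {
        T result = null;
        for (T item : items) {
            result = item;
        }
        return result;
    }

    /** Reproduces the reuse problem: count() consumes the stream, so skip() afterwards fails. */
    static boolean reuseFails(List<Long> timestamps) {
        Stream<Long> stream = timestamps.stream();
        try {
            stream.skip(stream.count() - 1).findFirst();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

Even with this fixed, the point made above still applies: the release strategy only runs when a message arrives, so a time-based condition needs a group timeout.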
I found a solution to that with a different approach. I still don't understand why the above one wasn't working.
I've also found a cleaner way of defining the correlation function.
IntegrationFlows.from(Files.inboundAdapter(tmpDir.getRoot())
                .patternFilter("*.json")
                .useWatchService(true)
                .watchEvents(FileReadingMessageSource.WatchEventType.CREATE,
                        FileReadingMessageSource.WatchEventType.MODIFY),
        e -> e.poller(Pollers.fixedDelay(100)))
        .enrichHeaders(h -> h.headerFunction(IntegrationMessageHeaderAccessor.CORRELATION_ID,
                m -> ((String) m.getHeaders().get(FileHeaders.FILENAME)).substring(0, 17)))
        .aggregate(a -> a.groupTimeout(2000)
                .sendPartialResultOnExpiry(true))
        .channel(MessageChannels.queue("fileReadingResultChannel"))
        .get();

Build spring integration release strategy using spring DSL

I am new to Spring Integration. I am trying to split the message from a file using the file splitter and then use .aggregate() to build a single message and send it to the output channel.
I have markers set to true, and hence apply-sequence is now false by default.
I have set the correlationId to a constant "1" using enrichHeaders. I have trouble setting the release strategy, as I do not have a hold on the sequence end. Here is how my code looks.
IntegrationFlows
        .from(s -> s.file(new File(fileDir))
                        .filter(getFileFilter(fileName)),
                e -> e.poller(poller))
        .split(Files.splitter(true, true)
                        .charset(StandardCharsets.US_ASCII),
                e -> e.id(beanName))
        .enrichHeaders(h -> h.header("correlationId", "1"));
IntegrationFlow integrationFlow = integrationFlowBuilder
        .<Object, Class<?>>route(Object::getClass, m -> m
                .channelMapping(FileSplitter.FileMarker.class, "markers.input")
                .channelMapping(String.class, "lines.input"))
        .get();
@Bean
public IntegrationFlow itemExcludes() {
    return flow -> flow
            .transform(new ItemExcludeRowMapper(itemExcludeRowUnmarshaller)) // maps each line to an ItemExclude object
            .aggregate(aggregator -> aggregator
                    .outputProcessor(group -> group.getMessages()
                            .stream()
                            .map(message -> ((ItemExclude) message.getPayload()).getPartNumber())
                            .collect(Collectors.joining(","))))
            .transform(Transformers.toJson())
            .channel(customSource.itemExclude());
}
@Bean
public IntegrationFlow itemExcludeMarkers() {
    return flow -> flow
            .log(LoggingHandler.Level.INFO)
            .<FileSplitter.FileMarker>filter(m -> m.getMark().equals(FileSplitter.FileMarker.Mark.END))
            .<FileHandler>handle(new FileHandler(configProps))
            .channel(NULL_CHANNEL);
}
Any help appreciated.
I would move your header enricher for the correlationId before the splitter and make it like this:
.enrichHeaders(h -> h
        .headerFunction(IntegrationMessageHeaderAccessor.CORRELATION_ID,
                m -> m.getHeaders().getId()))
A constant correlationId is definitely not good in a multi-threaded environment: different threads split different files and send different lines to the same aggregator. So, with "1" as the correlation key, you would always have a single group to aggregate and release. The default sequence behavior is to populate the correlationId with the original message id. Since you are not going to rely on applySequence from the FileSplitter, I suggest this simple solution to emulate that behavior.
As Gary pointed out in his answer, you need to think about a custom ReleaseStrategy and send the FileSplitter.FileMarker to the aggregator as well. The FileSplitter.FileMarker with the END mark has a lineCount property, which can be compared with the MessageGroup size to decide whether the group is ready to be released. The MessageGroupProcessor then has to filter out the FileSplitter.FileMarker messages while building the output result.
Use a custom release strategy that looks for the END marker in the last message and, perhaps, a custom output processor that removes the markers from the collection.
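The logic of such a release strategy and output processor can be sketched in plain Java, independent of Spring Integration. This is only a model under stated assumptions: the group is represented as a List<Object> of payloads, line payloads are Strings, and EndMarker is a hypothetical stand-in for a FileSplitter.FileMarker with the END mark and its lineCount.

```java
import java.util.List;
import java.util.stream.Collectors;

class MarkerAggregation {

    /** Hypothetical stand-in for an END file marker carrying the number of lines in the file. */
    static final class EndMarker {
        final long lineCount;

        EndMarker(long lineCount) {
            this.lineCount = lineCount;
        }
    }

    /** Release once an END marker is present and all the lines it announces have arrived. */
    static boolean canRelease(List<Object> payloads) {
        long lines = payloads.stream().filter(p -> p instanceof String).count();
        return payloads.stream()
                .filter(p -> p instanceof EndMarker)
                .map(p -> (EndMarker) p)
                .anyMatch(end -> end.lineCount == lines);
    }

    /** Output processing: drop the markers and join the remaining line payloads. */
    static String output(List<Object> payloads) {
        return payloads.stream()
                .filter(p -> p instanceof String)
                .map(p -> (String) p)
                .collect(Collectors.joining(","));
    }
}
```

Comparing the announced line count with the number of line payloads, rather than only checking that the last message is the END marker, also covers the case where lines from one file arrive out of order.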

How can we set up sequential execution of Azure Service Bus queues?

I have three queues in my project:
1. Verify email and number.
2. Register user.
3. Perform investment operations such as deposit, withdraw, invest, etc.
I want them to execute in order: first, then second; while the second is running, the first moves on to the next record; and when the second has completed, the third runs. This is because there are data dependencies between all of them.
How do I create this kind of sequence of queues?
queue 1
Trace.TraceInformation("verification is started");
BrokeredMessage verificationqueuedata = Client.Receive();
try
{
    if (verificationqueuedata != null)
    {
        UserModel userModel = verificationqueuedata.GetBody<UserModel>();
        if (userModel == null)
        {
            verificationqueuedata.Abandon();
        }
        else
        {
            // project code
            verificationqueuedata.Complete();
        }
    }
All three queues are created in the same manner.
Please help me create this sequence.
I have three queue in my project. 1.verify email and number.
2.register user. 3.perform investment operation like. deposit, withdrawn, invest etc.
If you mean three separate queues for separate tasks: pick up an item from Queue-1 and, once it is completed, put a message into Queue-2, and so on. There is no race condition here.
If you are using the same queue for all three types of messages: you need to attach a correlation id to each message and use some kind of state mechanism (a database) to determine whether the previous operation for that correlation id has completed.
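The correlation-id approach can be sketched as a small state tracker. This is a plain-Java illustration rather than Azure Service Bus code: the in-memory map stands in for the database mentioned above, and the StageTracker class and its Stage names are hypothetical labels for the three operations from the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StageTracker {

    enum Stage { VERIFIED, REGISTERED, INVESTED }

    // In-memory stand-in for the database that records progress per correlation id.
    private final Map<String, Stage> lastCompleted = new ConcurrentHashMap<>();

    /** A stage may run only if the stage just before it has completed for this correlation id. */
    synchronized boolean canRun(String correlationId, Stage stage) {
        Stage done = lastCompleted.get(correlationId);
        int doneOrdinal = done == null ? -1 : done.ordinal();
        return stage.ordinal() == doneOrdinal + 1;
    }

    /** Marks the stage complete; returns false and changes nothing if it is out of order. */
    synchronized boolean complete(String correlationId, Stage stage) {
        if (!canRun(correlationId, stage)) {
            return false;
        }
        lastCompleted.put(correlationId, stage);
        return true;
    }
}
```

A consumer that receives a message whose stage cannot run yet could abandon or defer it and retry later, so that each user's operations are processed in order while different users proceed independently.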
