Spring Integration aggregator's release strategy based on last modified

I'm trying to implement the following scenario:
I get a bunch of files that share a common file pattern, e.g. doc0001_page0001, doc0001_page0002, doc0001_page0003, doc0002_page0001 (where doc0001 is one document consisting of 3 pages that I need to merge, and doc0002 has only 1 page).
I want to aggregate them in such a way that a group is released only once all of the files for a specific document have been gathered (doc0001 after 3 files were picked up, doc0002 after 1 file).
My idea was to read the files in alphabetical order and release a group 2 seconds after it was last modified (g.getLastModified() is smaller than the current time minus 2 seconds).
I've tried the following without success:
return IntegrationFlows.from(Files.inboundAdapter(tmpDir.getRoot())
                .patternFilter("*.json")
                .useWatchService(true)
                .watchEvents(FileReadingMessageSource.WatchEventType.CREATE,
                        FileReadingMessageSource.WatchEventType.MODIFY),
            e -> e.poller(Pollers.fixedDelay(100)
                .errorChannel("filePollingErrorChannel")))
        .enrichHeaders(h -> h.headerExpression("CORRELATION_PATTERN",
            "headers[" + FileHeaders.FILENAME + "].substring(0, 7)")) // "docxxxx".length()
        .aggregate(a -> a.correlationExpression("headers['CORRELATION_PATTERN']")
            .releaseStrategy(g -> g.getLastModified() < System.currentTimeMillis() - 2000))
        .channel(MessageChannels.queue("fileReadingResultChannel"))
        .get();
Changing the release strategy to the following also didn't work:
.aggregate(a -> a.correlationExpression("headers['CORRELATION_PATTERN']")
    .releaseStrategy(g -> {
        Long timestamp = (Long) g.getMessages()
                .stream()
                .reduce((first, second) -> second) // take the last message in the group
                .get()
                .getHeaders()
                .get(MessageHeaders.TIMESTAMP);
        System.out.println("Timestamp: " + timestamp);
        return timestamp.longValue() < System.currentTimeMillis() - 2000;
    }))
Am I misunderstanding the release strategy concept?
Also, is it possible to print something out from the releaseStrategy block? I wanted to compare the timestamps (see the System.out.println("Timestamp: " + timestamp) above).

Right: since you don't know the whole sequence for a message group, you have no choice but to use a groupTimeout. The regular releaseStrategy works only when a message arrives at the aggregator. Since at the point of one message you don't have enough info to release the group, it is going to sit in the group store forever.
The groupTimeout option was introduced on the aggregator especially for this kind of use case, where we definitely would like to release a group that doesn't have enough messages to be released normally.
You may consider using a groupTimeoutExpression instead of the constant-based groupTimeout. The MessageGroup is the root evaluation context object for the SpEL expression, so you will be able to access the mentioned lastModified for it.
The .sendPartialResultOnExpiry(true) is the right option to deal with here.
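For illustration, a minimal sketch of such an aggregator (my own sketch, not from the original answer; it assumes lastModified resolves against the MessageGroup root object, and the expression yields the milliseconds remaining until the 2-second idle window elapses):
.aggregate(a -> a.correlationExpression("headers['CORRELATION_PATTERN']")
    // schedule the group to expire 2 seconds after it was last modified
    .groupTimeoutExpression("lastModified + 2000 - T(System).currentTimeMillis()")
    .sendPartialResultOnExpiry(true))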
See more info in the docs: https://docs.spring.io/spring-integration/reference/html/#agg-and-group-to

I found a solution with a different approach. I still don't understand why the one above wasn't working.
I've also found a cleaner way of defining the correlation function.
IntegrationFlows.from(Files.inboundAdapter(tmpDir.getRoot())
            .patternFilter("*.json")
            .useWatchService(true)
            .watchEvents(FileReadingMessageSource.WatchEventType.CREATE,
                    FileReadingMessageSource.WatchEventType.MODIFY),
        e -> e.poller(Pollers.fixedDelay(100)))
    .enrichHeaders(h -> h.headerFunction(IntegrationMessageHeaderAccessor.CORRELATION_ID,
        m -> ((String) m.getHeaders().get(FileHeaders.FILENAME)).substring(0, 17)))
    .aggregate(a -> a.groupTimeout(2000)
        .sendPartialResultOnExpiry(true))
    .channel(MessageChannels.queue("fileReadingResultChannel"))
    .get();

Related

Spring Integration Gateway Without Reply in DSL

My question is very similar to this Stack Overflow question in that I want to send to JMS and then carry on with my integration flow.
The response is totally asynchronous and is therefore handled on a separate Jms.messageDrivenChannelAdapter, so I basically want to "fire and forget".
My code is this (Spring 5.3.14):
.enrichHeaders(h -> h
        .headerFunction("JMSCorrelationID", m -> m.getHeaders().get(MessageHeaders.ID)))
.handle(Jms.outboundGateway(connectionFactory)
        .requestDestination(queueName))
.handle(p -> System.err.println("Do something else with ... " + p))
.get();
And I get this:
org.springframework.integration.MessageTimeoutException: failed to receive JMS response within timeout of: 5000ms,
The referenced answer implies, to me, that I need to listen on a dummy queue, which I don't want to have to do. So what do I need to fix in my code above?
Edit: final code using the solution below, tested with/without "queue PUT inhibit" in order to cause an exception.
.publishSubscribeChannel(s -> s
        .subscribe(f -> f.handle(Jms.outboundAdapter(connectionFactory)
                .destination(queueName)))
        .subscribe(f -> f.handle(p -> System.err.println("Do something else with ... " + p))))
You need to use Jms.outboundAdapter() instead. And together with the next handle(), wrap them into a publishSubscribeChannel() as two subscriber sub-flows. This way they are going to be called one after the other, but each as its own independent sub-flow.

What code instrumentation should be added to register each HTTP event in MeterRegistry with a specific tag and minute value? Event requests are in the millions

I need to analyse one HTTP event value, which should not be greater than 30 minutes, and 95% of events should belong to this bucket. If that fails, send an alert.
My first concern is to get the right metrics in /actuator/prometheus.
Steps I took:
In every HTTP request event, I am getting one integer value called eventMinute.
Using the Micrometer MeterRegistry, I tried the code below:
// MeterRegistry meterRegistry ...
meterRegistry.summary("MINUTES_ANALYSIS", tags);
where tag = EVENT_MINUTE, which receives some integer value in each HTTP event.
But this way it floods the metrics, due to the millions of events.
Please guide me; I am a beginner at this. Thanks!
The simplest solution (which I would recommend you start with) would be to just create 2 counters:
int theThing = getTheThing(); // however you obtain the value
if (theThing > 30) {
    meterRegistry.counter("my.request.counter.abovethreshold").increment();
}
meterRegistry.counter("my.request.counter.total").increment();
You would increment the counter that matches your threshold and another that tracks all requests (or reuse another meter that does that for you).
Then it is simple to set up a chart or alarm:
1 - (my_request_counter_abovethreshold / my_request_counter_total) < .95
i.e. alert when fewer than 95% of requests fall within the threshold.
(I didn't test the code. It might need a tiny bit of tweaking.)
You'll be able to do a similar thing with DistributionSummary by setting various SLOs (I'm not familiar enough with them to offer one), but start with something simple first; if it is sufficient, you won't need the extra complexity.
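For reference, a minimal sketch of that DistributionSummary variant (an editorial illustration only, since the answer above doesn't offer one; the metric name event.minutes and the eventMinute variable are assumptions carried over from the question):
// assumes the meterRegistry and eventMinute from the question above
DistributionSummary summary = DistributionSummary.builder("event.minutes")
        .baseUnit("minutes")
        .serviceLevelObjectives(30.0) // publishes a histogram bucket at the 30-minute boundary
        .register(meterRegistry);
summary.record(eventMinute);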
There are a couple of ways to solve this problem.
1. Here is a function which receives a metric name, tags, and a value:
public void createOrUpdateHistogram(String metricName, Map<String, String> stringTags, double numericValue) {
    DistributionSummary.builder(metricName)
            .tags(stringTags.entrySet()
                .stream()
                .map(e -> Tag.of(e.getKey(), e.getValue()))
                .collect(Collectors.toList()))
            // can enforce an SLO here if required
            .publishPercentileHistogram()
            .minimumExpectedValue(1.0D) // choose based on how you expect the values to be distributed
            .maximumExpectedValue(30.0D)
            .register(this.meterRegistry)
            .record(numericValue);
}
Then it produces metrics like:
delta_bucket{mode="CURRENT",le="30.0",} 11.0
delta_bucket{mode="CURRENT",le="+Inf",} 11.0
Since the +Inf bucket is cumulative and also holds the values at or below 30, subtract the le="30.0" count from the le="+Inf" count to get the number of events above the threshold.
2. Another way could be:
public void createOrUpdateHistogram(String metricName, Map<String, String> stringTags, double numericValue) {
    Timer.builder(metricName)
            .tags(stringTags.entrySet()
                .stream()
                .map(e -> Tag.of(e.getKey(), e.getValue()))
                .collect(Collectors.toList()))
            .publishPercentiles(0.5D, 0.95D)
            .publishPercentileHistogram()
            .serviceLevelObjectives(Duration.ofMinutes(30L))
            .minimumExpectedValue(Duration.ofMinutes(30L))
            .maximumExpectedValue(Duration.ofMinutes(30L))
            .register(this.meterRegistry)
            .record((long) numericValue, TimeUnit.MINUTES);
}
It will only have two le buckets: the given time and +Inf.
This can be changed based on your requirements, and it also gives you the quantiles.

Build Spring Integration release strategy using Spring DSL

I am new to Spring Integration. I am trying to split the message from a file using the file splitter and then use .aggregate() to build a single message and send it to the output channel.
I have markers set to true, and hence apply-sequence is false by default now.
I have set the correlationId to the constant "1" using enrichHeaders. I have trouble setting the release strategy, as I do not have a hold on the sequence end. Here is how my code looks:
IntegrationFlows
        .from(s -> s.file(new File(fileDir))
                        .filter(getFileFilter(fileName)),
                e -> e.poller(poller))
        .split(Files.splitter(true, true)
                        .charset(StandardCharsets.US_ASCII),
                e -> e.id(beanName))
        .enrichHeaders(h -> h.header("correlationId", "1"));

IntegrationFlow integrationFlow = integrationFlowBuilder
        .<Object, Class<?>>route(Object::getClass, m -> m
                .channelMapping(FileSplitter.FileMarker.class, "markers.input")
                .channelMapping(String.class, "lines.input"))
        .get();
@Bean
public IntegrationFlow itemExcludes() {
    return flow -> flow
            .transform(new ItemExcludeRowMapper(itemExcludeRowUnmarshaller)) // maps each line to an ItemExclude object
            .aggregate(aggregator -> aggregator
                .outputProcessor(group -> group.getMessages()
                    .stream()
                    .map(message -> ((ItemExclude) message.getPayload()).getPartNumber())
                    .collect(Collectors.joining(","))))
            .transform(Transformers.toJson())
            .channel(customSource.itemExclude());
}

@Bean
public IntegrationFlow itemExcludeMarkers() {
    return flow -> flow
            .log(LoggingHandler.Level.INFO)
            .<FileSplitter.FileMarker>filter(m -> m.getMark().equals(FileSplitter.FileMarker.Mark.END))
            .<FileHandler>handle(new FileHandler(configProps))
            .channel(NULL_CHANNEL);
}
Any help appreciated.
I would move your correlationId header enricher before the splitter and make it like this:
.enrichHeaders(h -> h
    .headerFunction(IntegrationMessageHeaderAccessor.CORRELATION_ID,
        m -> m.getHeaders().getId()))
A constant correlationId is absolutely no good in a multi-threaded environment: different threads split different files and send their lines to the same aggregator. So, with "1" as the correlation key you would always have just one group to aggregate and release. The default sequence behavior is to populate the original message's id into the correlationId header. Since you are not going to rely on the applySequence from the FileSplitter, I suggest this simple solution to emulate that behavior.
As Gary pointed out in his answer, you need to think about a custom ReleaseStrategy and send the FileSplitter.FileMarker messages to the aggregator as well. The FileSplitter.FileMarker.END has a lineCount property, which can be compared with the MessageGroup size to decide whether we are good to release the group. The MessageGroupProcessor indeed has to filter out the FileSplitter.FileMarker messages while building the result for output.
Use a custom release strategy that looks for the END marker in the last message and, perhaps, a custom output processor that removes the markers from the collection.
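A rough sketch of what that aggregator could look like (my own illustration, not from the answers; it assumes the marker messages are routed to the same aggregator as the transformed ItemExclude lines, so a complete group holds lineCount + 2 messages, the START and END markers included):
.aggregate(a -> a
    .releaseStrategy(g -> g.getMessages()
        .stream()
        .map(Message::getPayload)
        .filter(FileSplitter.FileMarker.class::isInstance)
        .map(FileSplitter.FileMarker.class::cast)
        // release once the END marker has arrived and all lines are present
        .anyMatch(m -> m.getMark() == FileSplitter.FileMarker.Mark.END
                && m.getLineCount() == g.size() - 2))
    .outputProcessor(g -> g.getMessages()
        .stream()
        .map(Message::getPayload)
        .filter(p -> !(p instanceof FileSplitter.FileMarker)) // drop the markers
        .map(p -> ((ItemExclude) p).getPartNumber())
        .collect(Collectors.joining(","))))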

Spring Integration aggregator based on content of next message

I have to read a file, split each line, and group the lines based on the first column; when the first column value changes, I have to release the previous group. Can this be done in the Spring Integration DSL?
Here is how the file looks (it's sorted):
x 1
x 2
x 3
y 4
y 5
y 6
The output should be two messages: x = 1, 2, 3 and y = 4, 5, 6.
Since there is no other relation determining when messages should be grouped, can I group messages as soon as I hit the next non-matching record? In this case, as soon as I hit "y" at line number 4, group the previous "x" messages and release them? Is it possible using a custom aggregator?
The simplest solution is to rely on the groupTimeout(), as long as you split and aggregate in a single thread and quickly enough. That way, all your records will be processed and distributed to their groups. But since we don't know when to release them, we rely on a scheduled timeout. So, the configuration for the aggregator would be like this:
.aggregate(a -> a
    .correlationExpression("payload.column1")
    .releaseStrategy(g -> false)
    .groupTimeout(1000)
    .sendPartialResultOnExpiry(true)
    .outputProcessor(g -> {
        Collection<Message<?>> messages = g.getMessages();
        // iterate and build your output payload, e.g. collect the payloads into one list
        return messages.stream()
                .map(Message::getPayload)
                .collect(Collectors.toList());
    }))

How to maintain counters with LinqToObjects?

I have the following C# code:
private XElement BuildXmlBlob(string id, Part part, out int counter)
{
// return some unique xml particular to the parameters passed
// remember to increment the counter also before returning.
}
Which is called by:
var counter = 0;
result.AddRange(from rec in listOfRecordings
                from par in rec.Parts
                let id = GetId("mods", rec.CKey + par.UniqueId)
                select BuildXmlBlob(id, par, counter));
The above code samples are symbolic of what I am trying to achieve.
According to Eric Lippert, the out keyword and LINQ do not mix. OK, fair enough, but can someone help me refactor the above so it does work? A colleague at work mentioned accumulator and aggregate functions, but I am a novice at LINQ and my Google searches weren't bearing any real fruit, so I thought I would ask here :).
To clarify:
I am counting the number of parts, and there could be any number of them each time the code is called. So every time the BuildXmlBlob() method is called, the resulting XML will have a unique element in it denoting the 'partNumber'.
So if the counter is currently on 7, that means we are processing the 7th part so far. The XML returned from BuildXmlBlob() will have the counter value embedded in it somewhere. That's why I need it to be passed and incremented every time BuildXmlBlob() is called per run-through.
If you want to keep this purely in LINQ and you need to maintain a running count for use within your queries, the cleanest way to do so would be to use the Select() overload that includes the element's index in the projection.
In this case, it would be cleaner to do a query which collects the inputs first, then use that overload to do the projection.
var inputs =
from recording in listOfRecordings
from part in recording.Parts
select new
{
Id = GetId("mods", recording.CKey + part.UniqueId),
Part = part,
};
result.AddRange(inputs.Select((x, i) => BuildXmlBlob(x.Id, x.Part, i)));
Then you wouldn't need to use the out/ref parameter.
XElement BuildXmlBlob(string id, Part part, int counter)
{
// implementation
}
Below is what I managed to figure out on my own:
result.AddRange(listOfRecordings.SelectMany(rec => rec.Parts, (rec, par) => new { rec, par })
    .Select(@t => new
    {
        @t,
        Id = GetStructMapItemId("mods", @t.rec.CKey + @t.par.UniqueId)
    })
    .Select((@t, i) => BuildPartsDmdSec(@t.Id, @t.@t.par, i)));
I used ReSharper to convert it into a method chain, which constructed the basics of what I needed, and then I simply tacked the Select statement on at the end.
