How to process files in separate threads?

There is a /local directory where files are placed by other means, and new files with the same name replace old ones.
I want to move files from /local to a /processing directory and activate some service. Later, at the end of the filter chain, a cleanup task will remove the files from /processing.
I got it working one file at a time, but processing takes minutes, so I'd like to:
1) Add multithreading, i.e. several files are moved and processed simultaneously.
2) Skip stale versions. If there is a file that has not yet been processed, say "File1.abc", and a new version of it has been put into /local, there is no need to process the old message with the old version of the file. I.e. messages should be sent only for the version of a file that exists at the moment it is moved from /local to /processing.
I am trying something like this:
<file:inbound-channel-adapter channel="processingChannel"
directory="#{localDir}"
prevent-duplicates="false" filter="acceptAllFileListFilter">
<int:poller fixed-rate="20" max-messages-per-poll="3" task-executor="executor"/>
</file:inbound-channel-adapter>
<task:executor id="executor" pool-size="3" queue-capacity="0" rejection-policy="ABORT"/>
<file:outbound-gateway request-channel="processingChannel" reply-channel="serviceChannel"
directory="#{processing}"
auto-create-directory="true"
filename-generator-expression="payload.name + '_' + { T(java.lang.System).currentTimeMillis()}"
delete-source-files="true"
mode="FAIL" />
<int:service-activator input-channel="serviceChannel" output-channel="furtherChannels"
ref="someService" method="process">
</int:service-activator>
<bean id="someService" class="com.dot.SomeService"/>
But it does not work and I cannot figure out how to fix it. I tried different approaches, but there are always errors, such as messages being generated for already deleted files. The task itself seems simple: how can I process files in, say, 3 threads and send messages only for the current versions of the files? Maybe the problem is with the polling consumer, but the inbound adapter can only be used with this consumer, right?

I don't see a solution for you yet, but maybe you just haven't explained the challenge properly. Try sharing just the business requirements.
Also, I don't see a reason for the <file:outbound-gateway>. You can just read files from the /local dir and process them. For the concurrency and the logic to discard in-flight processing, you could use a custom FileListFilter: in it you detect that a new file version has arrived and, by its key, cancel() the running process so that a new one starts at the end of the current poll().
There might be some other solution, but let's start with the business requirements!
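The "cancel the running process by its key" idea can be sketched with plain java.util.concurrent, independent of Spring Integration. This is only a minimal illustration under stated assumptions: `SupersedingProcessor` and its methods are hypothetical names, the key is the file name, and the work is assumed to respond to thread interruption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not a Spring Integration class): one live task per file name. */
class SupersedingProcessor {
    private final ExecutorService pool;
    private final Map<String, Future<?>> latest = new ConcurrentHashMap<>();

    SupersedingProcessor(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Submits work for fileName, cancelling any earlier task registered under that name. */
    Future<?> submit(String fileName, Runnable work) {
        // compute() is atomic per key, so cancel-and-replace cannot interleave for one file.
        return latest.compute(fileName, (name, previous) -> {
            if (previous != null) {
                previous.cancel(true); // no-op if already finished; otherwise interrupts the worker
            }
            return pool.submit(work);
        });
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        SupersedingProcessor processor = new SupersedingProcessor(3);
        Future<?> stale = processor.submit("File1.abc", () -> {
            try {
                Thread.sleep(60_000); // stands in for minutes of processing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // cancelled: stop quietly
            }
        });
        processor.submit("File1.abc", () -> System.out.println("processing new version"));
        System.out.println("stale cancelled: " + stale.isCancelled());
        processor.shutdown();
    }
}
```

In a real flow the `submit` call would be made from wherever the new file version is detected (for example the custom filter mentioned above). Entries for finished tasks stay in the map in this sketch, so a long-running application would want to remove them on completion.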

Related

Spring Integration JMS/IBM MQ: how to send different message to different queue in parallel

I am working on a project that uses a JMS listener to receive incoming messages and then routes them to different destinations. Currently the process picks only one of the 3 destinations below for each incoming message, so the XML configuration is written as follows:
<integration:router ref="jmsRouter" input-channel="jmsFilterOutput" default-output-channel="jmsRouterOutput" />
<integration:service-activator id="serviceActivator1" input-channel="input1"
ref="messageProcessService" method="callMsgProcessor1" />
<integration:service-activator id="serviceActivator2" input-channel="input2"
ref="messageProcessService" method="callMsgProcessor2" />
<integration:service-activator id="serviceActivator3" input-channel="sharedInput"
ref="messageProcessService" method="callMsgProcessor3" output-channel="reqChannel" />
Among the 3 service activators above, the output-channel of the last one is defined as an IBM MQ queue in another XML configuration file.
Now my job is to generate a different message from sharedInput and send it to a different queue in parallel,
so I added the following:
<integration:service-activator id="serviceActivator4" input-channel="sharedInput"
ref="messageProcessService" method="callMsgProcessorNew" output-channel="reqChannelNew" />
However, when running JMS, the message from sharedInput only goes to callMsgProcessor3, and the populated message is sent only to reqChannel; my new destination is ignored. If I comment out the third service activator, the message from sharedInput goes to callMsgProcessorNew and is routed to the new queue.
Can anyone advise how I should configure this so that a message on sharedInput goes to both processors (callMsgProcessor3 and callMsgProcessorNew) and is sent to their corresponding output MQ channels in parallel?
From searching online, it seems a router, a splitter, or a recipient list router could solve my problem, but I am still confused after reading the related docs and not sure how to configure it in my case. I would appreciate it if someone could provide a sample.
Please let me know if I need to provide more info to clarify the issue.

You can make sharedInput a PublishSubscribeChannel and have another service activator subscribed to it, so the same message will go to both of them. After that you can build entirely separate flows and do whatever logic you need in parallel. See the docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/core.html#channel-implementations-publishsubscribechannel.
See also the corresponding EIP definition: https://www.enterpriseintegrationpatterns.com/patterns/messaging/PublishSubscribeChannel.html
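To make the difference concrete, here is a toy model of the two delivery semantics. These are not Spring Integration classes; `ToyChannels`, `PointToPoint`, and `PubSub` are hypothetical names, and the round-robin choice in the point-to-point case is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model only: illustrates delivery semantics, not the framework's implementation. */
class ToyChannels {

    /** Point-to-point: each message is handed to exactly one subscriber (round-robin here). */
    static class PointToPoint<T> {
        private final List<Consumer<T>> subscribers = new ArrayList<>();
        private int next = 0;

        void subscribe(Consumer<T> subscriber) {
            subscribers.add(subscriber);
        }

        void send(T message) {
            subscribers.get(next).accept(message);
            next = (next + 1) % subscribers.size();
        }
    }

    /** Publish-subscribe: every subscriber receives every message. */
    static class PubSub<T> {
        private final List<Consumer<T>> subscribers = new ArrayList<>();

        void subscribe(Consumer<T> subscriber) {
            subscribers.add(subscriber);
        }

        void send(T message) {
            for (Consumer<T> subscriber : subscribers) {
                subscriber.accept(message);
            }
        }
    }

    public static void main(String[] args) {
        PubSub<String> sharedInput = new PubSub<>();
        sharedInput.subscribe(m -> System.out.println("callMsgProcessor3 got " + m));
        sharedInput.subscribe(m -> System.out.println("callMsgProcessorNew got " + m));
        sharedInput.send("order-1"); // both subscribers are invoked
    }
}
```

This toy delivers sequentially on the caller's thread; running the two flows truly in parallel additionally requires handing each delivery to its own thread.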

Thanks for your reply, @Artem! I realized one thing: sharedInput only goes to one destination because it is a single message. If I can duplicate the message, it will go to two destinations. So I added a recipient-list-router, changed the configuration as below, and it worked!
<integration:recipient-list-router id="duplicateMsgRouter" input-channel="sharedInput"
timeout="1234"
ignore-send-failures="true"
apply-sequence="true">
<integration:recipient channel="channel1"/>
<integration:recipient channel="channel2"/>
</integration:recipient-list-router>
<integration:service-activator id="serviceActivator3" input-channel="channel1"
ref="messageProcessService" method="callMsgProcessor3" output-channel="reqChannel" />
<integration:service-activator id="serviceActivator4" input-channel="channel2"
ref="messageProcessService" method="callMsgProcessorNew" output-channel="reqChannelNew" />

Please let me know working mechanism of poll(CRON JOB) in Apache Camel

file://D:/Users/schintha/temp/input?autoCreate=false&include=.*.csv|.*.CSV|.*.eof|.*.EOF
&maxMessagesPerPoll=1000&moveFailed=.error&scheduler=spring&scheduler.cron=0+*+*+*+*+?
&sendEmptyMessageWhenIdle=true&sortBy=file:modified;ignoreCase:file:name
I am using the above route with a poll (cron scheduler) that fires at second 0 of every minute (i.e. once a minute).
My question is: if the transfer of the file(s) (one or more) takes more than one minute, will sendEmptyMessageWhenIdle still work?
I ask because I am stopping the route when there is no file during the poll.
Please explain how the poll (cron scheduler) behaves if the file transfer takes longer than the poll interval (i.e. more than one minute in this case).
Structure of my route:
<route>
<from>
<when>
<simple>${headers.CamelBatchSize} >= 1 and ${body} != null
and ${headers.CamelFileName} != null</simple>
<to>
<otherwise> <toD uri="controlbus:route?routeId=${routeId}&action=stop"/>
</route>

I'm not entirely sure what the question is.
The sendEmptyMessageWhenIdle option will only send an empty message body if the current poll didn't find any files to process. If the poll finds a file and it takes more than one minute to process, all that happens is that a new poll will execute in parallel with the one that's already in progress.
I.e. you won't get an empty message if the current poll takes more than a minute to finish.
Also, if the only thing you want to do when there are no files to process is to stop the route, you might as well just remove sendEmptyMessageWhenIdle altogether. If that option is set to false (which it is by default), the route will stop automatically (until the next poll, that is).

Issues in polling a file using Spring Integration

My requirement is to poll a directory for a specified time interval, say 10 minutes. If a file with a particular extension, say *.xml, is found in the directory, it is simply consumed (i.e. picked up and deleted) and its name is printed; otherwise, after the specified interval (say 10 minutes), an email is sent saying that the file has not been picked up (i.e. consumed) or has not arrived.
There are 2 options: either I do it with Spring Integration or with the WatchService from core Java. The following is the Spring Integration code I have written so far:
<int:channel id="fileChannel" />
<int:channel id="processedFileChannel" />
<context:property-placeholder location="localProps.properties" />
<int:poller default="true" fixed-rate="10000" id="poller"></int:poller>
<int-file:inbound-channel-adapter
directory="file:${inbound.folder}" channel="fileChannel"
filename-pattern="*.xml" />
<int:service-activator input-channel="fileChannel"
ref="fileHandlerService" method="processFile" output-channel="processedFileChannel"/>
<bean id="fileHandlerService" class="com.practice.cmrs.springintegration.Poll" />
The above code successfully polls the folder for a particular file pattern. Now I have 2 things to do:
1) Stop polling after a particular (configurable) time interval, say 10 minutes.
2) Check whether a file with a particular extension is in the folder. If the file is there, it is consumed and then deleted; otherwise an email is sent to a group of people (the email part is done).
Please help me with the above 2 points.

You can use a Smart Poller to do things like that.
You can adjust the poller and/or take different actions if/when the poll results in a message.
Version 4.2 introduced the AbstractMessageSourceAdvice. Any Advice objects in the advice-chain that subclass this class, are applied to just the receive operation. Such classes implement the following methods:
beforeReceive(MessageSource<?> source)
This method is called before the MessageSource.receive() method. It enables you to examine and/or reconfigure the source at this time. Returning false cancels this poll (similar to the PollSkipAdvice mentioned above).
Message<?> afterReceive(Message<?> result, MessageSource<?> source)
This method is called after the receive() method; again, you can reconfigure the source, or take any action perhaps depending on the result (which can be null if there was no message created by the source). You can even return a different message!
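For comparison, the requirement from the question (poll until a deadline, consume a matching file if one appears, otherwise signal that nothing arrived) can be sketched in plain java.nio without any Spring classes. This is a minimal sketch, not the smart-poller approach itself: `DeadlinePoller` and `awaitFile` are hypothetical names, and an empty result is where the caller would send the email.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: polls a directory until a matching file shows up or time runs out. */
class DeadlinePoller {

    /**
     * Checks dir every intervalMillis for an entry matching glob (e.g. "*.xml").
     * The first match is deleted ("consumed") and its name returned;
     * an empty Optional means the deadline passed without a match.
     */
    static Optional<String> awaitFile(Path dir, String glob, long timeoutMillis, long intervalMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
                for (Path file : matches) {
                    String name = file.getFileName().toString();
                    Files.delete(file); // consume: pick up and delete
                    return Optional.of(name);
                }
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty(); // caller decides what to do, e.g. send the email
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("inbound");
        Files.createFile(dir.resolve("order.xml"));
        Optional<String> found = awaitFile(dir, "*.xml", 1_000, 100);
        System.out.println(found.map(n -> "consumed " + n).orElse("no file arrived, send mail"));
    }
}
```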

Spring Integration File Polling. If moving the file does a AcceptOnceFileListFilter need to be used?

I'm writing a file polling implementation and am trying to determine whether I need to use an AcceptOnceFileListFilter.
The first step the FileProcessor performs is to move the file to another directory.
Does the poller "batchFilePoller" use multiple threads when polling? Can a race condition occur where a file is read by multiple threads? In that case I assume I need to use the AcceptOnceFileListFilter.
However, if the poller is only using one thread from the pool, and the file is successfully moved before the next poll time, I assume there is no possibility of the file being processed twice?
<int-file:inbound-channel-adapter id="batchFileInAdapter" directory="/somefolder" auto-create-directory="true" auto-startup="false" channel="batchFileInChannel" >
<int:poller id="batchFilePoller" fixed-rate="6000" task-executor="batchTaskExecutor" max-messages-per-poll="1" error-channel="batchPollingErrorChannel" />
</int-file:inbound-channel-adapter>
<int:channel id="batchFileInChannel"/>
<int:service-activator input-channel="batchFileInChannel" >
<bean class="com.foo.FileProcessor" />
</int:service-activator>
<task:executor id="batchTaskExecutor" pool-size="5" queue-capacity="20"/>

The <int-file:inbound-channel-adapter> has a prevent-duplicates option, which is true by default. That applies in your case, since you don't set any other option that would stop prevent-duplicates from being true.
And yes: any polling adapter is multi-threaded if you use fixed-rate. In this case a new polling task can start before the previous one finishes.
Even if it were single-threaded (using fixed-delay), the AcceptOnceFileListFilter must be there, because a new polling task doesn't know whether a file has been processed or not, and it would read the same file again.
AcceptOnceFileListFilter is exactly for those cases when you don't want to read the same file one more time. You can do without it by using <int:transactional synchronization-factory=""/> for the <poller> of the <int-file:inbound-channel-adapter>:
<int:transaction-synchronization-factory id="txSyncFactory">
<int:after-commit expression="payload.delete()"/>
</int:transaction-synchronization-factory>
together with a PseudoTransactionManager.
You can find more info in the Spring Integration Reference Manual.
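To illustrate what an accept-once filter does across successive polls, here is a toy version that works on file names. It mirrors the idea only and is not Spring's AcceptOnceFileListFilter implementation; `ToyAcceptOnceFilter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy accept-once filter: a name passes only the first time it is seen. */
class ToyAcceptOnceFilter {
    private final Set<String> seen = new HashSet<>();

    /** synchronized so that overlapping polls cannot both accept the same name. */
    synchronized List<String> filter(List<String> listing) {
        List<String> accepted = new ArrayList<>();
        for (String name : listing) {
            if (seen.add(name)) { // add() returns false for names already seen
                accepted.add(name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        ToyAcceptOnceFilter filter = new ToyAcceptOnceFilter();
        // First poll: both files are new.
        System.out.println(filter.filter(Arrays.asList("a.xml", "b.xml")));
        // Second poll: a.xml and b.xml are still in the directory, only c.xml is new.
        System.out.println(filter.filter(Arrays.asList("a.xml", "b.xml", "c.xml")));
    }
}
```

Without such a filter, the second poll would hand a.xml and b.xml to the flow again unless they had already been moved or deleted.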

How do I support multiple server.pid files?

I am running Play on multiple machines in our datacenter. We load-balance the hell out of everything. On each Play node/VM I'm using Apache and an init.d/play script to start and stop the Play service.
The problem is that our Play websites are hosted on shared network storage. This makes deployment really nice: you deploy to one place and the website is updated on all 100 machines. Each machine has a mapped folder "/z/www/PlayApp1" where the Play app lives.
The issue is that when the service starts or stops, the server.pid file is written to that network location where the app's files live.
As I bring up 100 nodes, the 100th node overwrites the PID file with its own PID, so that file only represents the correct process ID for 1 out of 100 nodes.
So how do I get Play to store the PID file locally and not with the app files on the network share? I need each server's PID file to reflect that machine's actual process.
We are using CentOS (Linux).
Thanks in advance
Josh

According to https://github.com/playframework/play/pull/43, it looks like there is a --pid_file command line option. It might only work with paths under the application root, so you might have to make directories for each distinct host (which could possibly be symlinks).
I have zero experience with Play, so hopefully this is helpful information.

I don't even think it should run a second copy, based on the current source code. The main function is:
public static void main(String[] args) throws Exception {
    File root = new File(System.getProperty("application.path"));
    if (System.getProperty("precompiled", "false").equals("true")) {
        Play.usePrecompiled = true;
    }
    if (System.getProperty("writepid", "false").equals("true")) {
        writePID(root);
    }
    // ... rest of main omitted ...
}
and writePID is:
private static void writePID(File root) {
    // The runtime name has the form "pid@hostname"; keep only the pid part.
    String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
    File pidfile = new File(root, PID_FILE);
    if (pidfile.exists()) {
        throw new RuntimeException("The " + PID_FILE + " already exists. Is the server already running?");
    }
    IO.write(pid.getBytes(), pidfile);
}
meaning it should throw an exception when you try to run multiple copies using the same application.path.
So either you're not using the version I'm looking at or you're discussing something else.
It seems to me it would be a simple matter to change that one line above:
File root = new File(System.getProperty("application.path"));
to use a different property for the PID file storage, one that's not on the shared drive.
Although you'd need to be careful: root is also passed to Play.init, so you should investigate the impact of changing it.
This is, after all, one of the great advantages of open source software, inasmuch as you can fix the "bugs" yourself.
For what it's worth, I'm not a big fan of the method you've chosen for deployment. Yes, it simplifies deployment but upgrading your servers is an all-or-nothing thing which will cause you grief if you accidentally install some dodgy software.
I much prefer staged deployments so I can shut down non-performing nodes as needed.
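The change suggested above (resolving the PID file against a machine-local directory instead of application.path) might look roughly like the following standalone sketch. It is not Play's code: `LocalPidFile` and the `pidfile.dir` system property are hypothetical names, and parsing the PID out of the "pid@hostname" runtime name relies on a common but JVM-dependent format.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: writes the JVM's PID into a machine-local directory. */
class LocalPidFile {

    /** Extracts the pid from the usual "pid@hostname" runtime name (format is JVM-dependent). */
    static String currentPid() {
        return ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
    }

    /** Creates localDir if needed and writes the PID file; fails if the file already exists. */
    static Path write(Path localDir, String fileName) throws IOException {
        Files.createDirectories(localDir);
        Path pidFile = localDir.resolve(fileName);
        // CREATE_NEW throws FileAlreadyExistsException, mirroring the exists() check in writePID.
        Files.write(pidFile, currentPid().getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
        return pidFile;
    }

    public static void main(String[] args) throws IOException {
        // "pidfile.dir" is an assumed property name; it falls back to the local temp directory.
        String dir = System.getProperty("pidfile.dir", System.getProperty("java.io.tmpdir"));
        System.out.println("wrote " + write(Paths.get(dir), "server.pid"));
    }
}
```

Because the directory comes from a per-machine setting rather than the shared application path, each node would end up with its own PID file.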

Change your init script to write the pid to /tmp or somewhere else machine-local.
If that is hard, a symlink might work.
