Hi, I have a Grails web application where users can save data by importing it from an Excel or CSV file; my application parses the data and saves it in the database. But if there are many records, like 40,000, it takes a long time.
So what I wanted to do is run this task in the background and notify users through an email after the task is done, so that users don't have to sit there idly and can work on some other task.
Can you suggest a way to save the records in the database using a background thread?
You can launch the long running task in a separate thread, using the @Async annotation, see the documentation for it here:
25.5.2 The @Async Annotation
The @Async annotation can be provided on a method so that invocation
of that method will occur asynchronously. In other words, the caller
will return immediately upon invocation and the actual execution of
the method will occur in a task that has been submitted to a Spring
TaskExecutor. In the simplest case, the annotation may be applied to a
void-returning method.
This is a code example of how to use it:
    @Async
    void doSomething() {
        // this will be executed asynchronously
    }
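For the original question, a rough sketch of how this could look in a Spring-managed service is below. The service, repository, and mail-sending details (ImportService, parseAndSave, the email text) are assumptions, not from the original post; in a Grails app the service would normally be Groovy, but Groovy accepts Java-style syntax.

    import org.springframework.mail.SimpleMailMessage;
    import org.springframework.mail.javamail.JavaMailSender;
    import org.springframework.scheduling.annotation.Async;
    import org.springframework.stereotype.Service;

    import java.io.File;

    // Hypothetical service: names and the parsing/saving details are assumptions.
    @Service
    public class ImportService {

        private final JavaMailSender mailSender;

        public ImportService(JavaMailSender mailSender) {
            this.mailSender = mailSender;
        }

        @Async // requires @EnableAsync on a configuration class
        public void importFile(File file, String userEmail) {
            // parse the CSV/Excel file and save the records (application-specific)
            int saved = parseAndSave(file);

            // notify the user once the background work is done
            SimpleMailMessage message = new SimpleMailMessage();
            message.setTo(userEmail);
            message.setSubject("Import finished");
            message.setText("Imported " + saved + " records.");
            mailSender.send(message);
        }

        private int parseAndSave(File file) {
            // application-specific parsing and persistence would go here
            return 0;
        }
    }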
To give an ETL (Extract, Transform, Load)-like structure to the batch, have a look at Spring Batch. This is an example of how to read a CSV and upload it to the database using Spring Batch - CSV File Upload.
You can use the Quartz plugin and then create a Job that runs on demand and does the inserting for you without blocking the users. Link to the plugin: http://grails.org/plugin/quartz
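A minimal sketch of what such a job could look like, using the plain Quartz Job interface that the Grails plugin builds on; the class name and the job data key are assumptions, and in a Grails app the job would live under grails-app/jobs and be written in Groovy, but the shape is the same:

    import org.quartz.Job;
    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;

    // Hypothetical job class for the import work.
    public class ImportJob implements Job {

        @Override
        public void execute(JobExecutionContext context) throws JobExecutionException {
            // read the uploaded file path (or id) from the job data map
            String filePath = context.getMergedJobDataMap().getString("filePath");

            // do the parsing and database inserts here, then send the notification email
        }
    }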
[Error screenshots referenced in the question: JMeter error, JTL output]
My API calls take a token which is valid for 4 minutes. In the CSV Data Set Config file, we have records with tokens placed in each row for each user. I am running this test for 20 minutes. I am running JMeter in CLI mode and running another thread to update the file every 2 minutes. The thread uses a custom library to create tokens.
Now the issue is: just in some cases, JMeter reads the file while it is being updated by the separate thread, and this causes errors.
How I know this is caused by the thread:
The error appears right after the thread updates the file; before that, everything works fine.
My CSV has the parameters
server,portNumber,userId,username,password,teamspaceID,Token
and in the JMeter script I use a URL like "Http://${server}:${portNumber}",
but in the .jtl file a few of the records have "Http://<some part of the token string>:8082".
Is there any other, more efficient way to tackle this?
It's a classic race condition: JMeter's CSV Data Set Config doesn't expect the file to change at runtime. It's hard to come up with an exact solution without seeing your test plan, however you can consider the following alternatives:
Generating the token right before the request using a JSR223 PreProcessor (see the sketch after this list); by default the time taken by PreProcessors is not included in the sampler's elapsed time, so you will get only the HTTP request execution time in the results file
Putting the logic which is not thread safe under a Critical Section Controller
Using the Inter-Thread Communication Plugin instead of an interim CSV file for keeping/passing the tokens
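As an illustration of the first option, a JSR223 PreProcessor could generate the token right before each request and store it in a JMeter variable. JMeter's JSR223 elements usually run Groovy; the snippet below sticks to Java-style syntax that Groovy also accepts, and TokenLibrary is a stand-in for your custom token-creation library, not a real API:

    // JSR223 PreProcessor attached to the HTTP sampler.
    // 'vars' is the JMeterVariables object JMeter exposes to JSR223 scripts.
    String userId = vars.get("userId");

    // TokenLibrary is hypothetical -- replace with your own token-creation call
    String token = TokenLibrary.generateToken(userId);

    // make the fresh token available as ${Token} in the sampler's headers
    vars.put("Token", token);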
My team lead solved this by using a database. Now the thread writes the header parameters to a database table instead of a CSV file, and in JMeter we use a pre-processor to get the values from the database and update the header properties.
In addition, we write three sets of data. The thread updates the oldest set and JMeter uses the latest set by using an ORDER BY in the pre-processor query.
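For reference, a rough standalone Java sketch of the "latest row wins" lookup that such a pre-processor query performs; the table name, column names, and JDBC URL are made up, and in JMeter the same thing could be done with a JDBC PreProcessor or a JSR223 PreProcessor:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LatestTokenFetcher {
        // fetch the most recently refreshed token set (hypothetical schema)
        public static String fetchLatestToken(String jdbcUrl, String user, String password) throws Exception {
            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT token FROM auth_tokens ORDER BY updated_at DESC LIMIT 1")) {
                return rs.next() ? rs.getString("token") : null;
            }
        }
    }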
I have an application written for Spark using the Scala language. My application code is more or less ready and the job runs for around 10-15 minutes.
There is an additional requirement to provide the status of the application execution while the Spark job is running. I know that Spark runs in a lazy way and that it is not nice to retrieve data back to the driver program during Spark execution. Typically, I would be interested in providing status at regular intervals.
E.g. if there are 20 functional points configured in the Spark application, then I would like to provide the status of each of these functional points as and when they are executed, or as steps finish during Spark execution.
These incoming statuses of functional points will then be taken to some custom user interface to display the status of the job.
Can someone give me some pointers on how this can be achieved?
There are a few things you can do on this front that I can think of.
If your job contains multiple actions, you can write a script to poll for the expected output of those actions. For example, imagine your script has 4 different DataFrame save calls. You could have your status script poll HDFS/S3 to see if the data has shown up in the expected output location yet. As another example, I have used Spark to index to ElasticSearch, and I have written status logging to poll for how many records are in the index to print periodic progress.
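For this first approach, a status script could poll the output location with the Hadoop FileSystem API, for example watching for the _SUCCESS marker that a DataFrame save normally leaves behind; the output path here is just an example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OutputPoller {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // example output path -- adjust to your job's save location
            Path done = new Path("/data/output/step1/_SUCCESS");

            while (!fs.exists(done)) {
                System.out.println("step1 output not there yet...");
                Thread.sleep(30_000);
            }
            System.out.println("step1 finished");
        }
    }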
Another thing I have tried before is using Accumulators to keep a rough track of progress and how much data has been written. This works OK, but it is a little arbitrary when Spark updates the visible totals with information from the executors, so I haven't found it too helpful for this purpose in general.
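A rough sketch of the accumulator idea, in Java for illustration (the original job is Scala, but the API is the same); the input path and the per-row work are placeholders:

    import org.apache.spark.api.java.function.ForeachFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.util.LongAccumulator;

    public class ProgressExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("progress").getOrCreate();
            LongAccumulator processed = spark.sparkContext().longAccumulator("recordsProcessed");

            Dataset<Row> input = spark.read().parquet("/data/input"); // placeholder input

            input.foreach((ForeachFunction<Row>) row -> {
                // ... per-row work here ...
                processed.add(1);
            });

            // a separate driver-side thread could poll processed.value() while the
            // action runs; the visible total lags somewhat behind the executors
            System.out.println("records processed: " + processed.value());
        }
    }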
The other approach you could take is to poll Spark's status and metrics APIs directly. You will be able to pull all of the information backing the Spark UI into your code and do with it whatever you want. It won't necessarily tell you exactly where you are in your driver code, but if you manually figure out how your driver maps to stages, you could work that out. For reference, here is the documentation on polling the status API:
https://spark.apache.org/docs/latest/monitoring.html#rest-api
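A minimal sketch of polling that REST API from plain Java; the driver host and the application id are placeholders (the UI usually listens on port 4040 while the driver is running):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SparkStatusPoller {
        public static void main(String[] args) throws Exception {
            // placeholder driver host and application id
            URL url = new URL("http://driver-host:4040/api/v1/applications/app-20240101000000-0001/stages");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                StringBuilder json = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    json.append(line);
                }
                // parse the JSON and push the stage status to your custom UI
                System.out.println(json);
            }
        }
    }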
I am using Spring Batch remote chunking for distributed processing.
When a slave node is done with processing a chunk, I would like to return some additional data along with the ChunkResponse.
For example, if a chunk consists of 10 user IDs, I would like to return in the response how many user IDs were processed successfully.
The response could include some other data as well. I have spent considerable time trying to figure out ways to achieve this, but without any success.
For example, I have tried to extend the ChunkResponse class and add some additional fields to it, and then extend ChunkProcessorChunkHandler and return the customized ChunkResponse from it. But I am not sure if this is the proper approach.
I also need a way on the master node to read the ChunkResponse in some callback. I guess I can use the afterChunk(ChunkContext) method of ChunkListener, but I couldn't find a way to get the ChunkResponse from the ChunkContext in that method.
So to sum it up, I would like to know how I can pass data from slave to master per chunk, and how the master node can read this data.
Thanks a lot.
EDIT
In my case, the master node reads user records and the slave nodes process these records. At the end of the job, the master needs to take conditional action based on whether processing of a particular user failed or succeeded. The fail/success on the slave node is not based on any exception thrown there but on some business rules. And there is other data that the master needs to know about, for example how many emails were sent for each user. Now if I were using remote partitioning, I could use the jobContext to put and get this data, but in remote chunking the jobContext is not available. So I was wondering if, along with the ChunkResponse, I could send back some additional data from slave to master.
Is there a function in Spark just like MapReduce's cleanup() function in Hadoop? If there isn't, how can I know when the task ends?
There is a requirement: when the task processes the last row of data (the data is processed row by row, isn't it?), I need to execute some custom code or customized behavior.
You need to invoke SparkContext.stop() at the end of your job. But in case you want some customized behavior, like ensuring that connections are closed, you have to write custom code to achieve that.
Invoking SparkContext.stop() will clean up, destroy, and release all resources claimed by the specific Spark job.
There is also SparkContext.isStopped, which returns true in case the SparkContext is destroyed or in the process of being destroyed. Refer to the API here.
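A small sketch of the shape this usually takes; the "custom teardown" comment is just an illustration of the kind of custom code mentioned above, not a Spark API:

    import org.apache.spark.sql.SparkSession;

    public class JobWithCleanup {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("job").getOrCreate();
            try {
                // run the actual transformations and actions here
            } finally {
                // release everything the job claimed, even if an action failed
                spark.stop();

                // any application-specific teardown (closing connections, flushing
                // buffers, notifying other systems) has to be written by hand here
            }
        }
    }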
I am working with a Spring Batch application for the first time, and since the framework is very flexible, I have a few questions about performance and best practices for implementing jobs that I couldn't find clear answers to in the Spring docs.
My Goals:
read an ASCII file with fixed column length values sent by a third party with a previously specified layout (STEP 1 reader)
validate the read values and record the errors (custom messages) in a log file
apply some business logic in the processor to filter any undesirable lines (STEP 1 processor)
write the valid lines to an Oracle database (STEP 1 writer)
after the execution of the previous step, update a table in the database with the STEP 1 finish timestamp (STEP 2 tasklet)
send an email when the job is stopped with a summary of the quantities already processed, the errors and written lines, and the start and finish times (is this information available in the jobRepository meta-data?)
Assumptions:
The file is incremental, so the third party always sends the prior file's lines (possibly with some value changes) plus any new lines (~120 million lines in total). A new file is sent every 6 months.
We must validate the input file's lines while processing (are the required values present? can some be converted to numbers and dates?)
The job must be stoppable/restartable since it is intended to run within a time window.
What I am planning to do:
To achieve some performance on reading and writing, I am avoiding Spring's out-of-the-box reflection-based beans and using a JdbcBatchItemWriter to write the processed lines to the database.
The file reader reads the lines with a custom FieldSetMapper and transforms all the columns with the FieldSet.readString method (this implies no ParseException on reading). A bean injected into the processor performs parsing and validation, so this way we can avoid skip exceptions during reading, which seem to be an expensive operation, and can count the invalid lines to pass on to future steps, saving the info in the step/job execution context.
The processor bean should convert the read object and return a wrapper containing the original object, the parsed values (i.e., Dates and Longs), the first exception thrown by the parsing, if any, and a boolean that indicates whether the validation was successful or not (a rough sketch of this wrapper/processor follows below). After the parsing, another custom processor checks whether the record should be inserted into the database by querying for similar or identical records already inserted. This business rule could imply one query against the database per valid line in the worst-case scenario.
The JDBC item writer discards the null values returned by the processors (i.e., the filtered lines) and writes the valid records to the database.
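To make the wrapper idea concrete, a rough sketch with hypothetical class names (RawLine, LineWrapper, ParsingProcessor); the single Long field and the validation logic are stand-ins for the real parsing rules:

    import org.springframework.batch.item.ItemProcessor;

    // Hypothetical raw record produced by the reader.
    class RawLine {
        String amount;
    }

    // Hypothetical wrapper carrying the raw line plus the parsing outcome.
    class LineWrapper {
        RawLine original;
        Long parsedAmount;          // example parsed value
        Exception firstParseError;  // null if parsing succeeded
        boolean valid;
    }

    // Hypothetical processor: parses, validates, and lets invalid lines be skipped.
    public class ParsingProcessor implements ItemProcessor<RawLine, LineWrapper> {
        @Override
        public LineWrapper process(RawLine item) {
            LineWrapper wrapper = new LineWrapper();
            wrapper.original = item;
            try {
                wrapper.parsedAmount = Long.valueOf(item.amount);
                wrapper.valid = true;
            } catch (NumberFormatException e) {
                wrapper.firstParseError = e;
                wrapper.valid = false;
            }
            // returning null filters the item out before it reaches the writer
            return wrapper.valid ? wrapper : null;
        }
    }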
So the real questions regarding batch processing:
What are some performance tips that I could use to improve the batch performance? In a preliminary attempt, loading a perfectly valid mock input file into the database took 15 hours of processing, without even querying the database to verify whether the processed records should be inserted. What could be the simplest solution for local processing?
Have you looked at partitioning? http://docs.spring.io/spring-batch/reference/html/scalability.html This may also be helpful: remote chunking with control on the reader in Spring Batch.
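A rough sketch of what a locally partitioned (multi-threaded) master step could look like with Spring Batch's Java config; the worker step, the Partitioner implementation that splits the 120M-line file into ranges, and the grid size are all assumptions:

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.task.SimpleAsyncTaskExecutor;

    @Configuration
    public class PartitionedStepConfig {

        // workerStep would be the existing read/process/write chunk step;
        // partitioner would split the input file into ranges, one per partition
        @Bean
        public Step masterStep(StepBuilderFactory steps, Step workerStep, Partitioner partitioner) {
            return steps.get("masterStep")
                    .partitioner("workerStep", partitioner)
                    .step(workerStep)
                    .gridSize(8) // number of partitions; tune to the hardware
                    .taskExecutor(new SimpleAsyncTaskExecutor())
                    .build();
        }
    }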