Groovy/Grails GPars: How to execute 2 calculations in parallel? - groovy
I'm new to the GPars library and am currently introducing it into our software.
I have no trouble using it in place of the normal Groovy methods, for example replacing
[..].each{..}
->
[..].eachParallel{..}
But I'm wondering how to parallelize two tasks that each return a value.
Without GPars I would do it this way:
List<Thread> threads = []
def forecastData
def actualData
threads.add(Thread.start {
forecastData = cosmoSegmentationService.getForecastSegmentCharacteristics(dataset, planPeriod, thruPeriod)
})
threads.add(Thread.start {
actualData = cosmoSegmentationService.getMeasuredSegmentCharacteristics(dataset, fromPeriod, thruPeriodActual)
})
threads*.join()
// merge both datasets
def data = actualData + forecastData
Can this be done with GParsPool, and if so, how?
You could use Dataflow:
import groovyx.gpars.dataflow.*
import static groovyx.gpars.dataflow.Dataflow.task
def forecastData = new DataflowVariable()
def actualData = new DataflowVariable()
def result = new DataflowVariable()
task {
forecastData << cosmoSegmentationService.getForecastSegmentCharacteristics( dataset, planPeriod, thruPeriod )
}
task {
actualData << cosmoSegmentationService.getMeasuredSegmentCharacteristics( dataset, fromPeriod, thruPeriodActual )
}
task {
result << forecastData.val + actualData.val
}
println result.val
Alternative for GPars 0.9:
import static groovyx.gpars.GParsPool.withPool
def getForecast = {
cosmoSegmentationService.getForecastSegmentCharacteristics( dataset, planPeriod, thruPeriod )
}
def getActual = {
cosmoSegmentationService.getMeasuredSegmentCharacteristics( dataset, fromPeriod, thruPeriodActual )
}
def results = withPool {
[ getForecast.callAsync(), getActual.callAsync() ]
}
println results*.get().sum()
import groovyx.gpars.GParsPool
List todoList =[]
todoList.add {
for(int i1: 1..100){
println "task 1:" +i1
sleep(300)
}
}
todoList.add {
for(int i2: 101..200){
println "task 2:" +i2
sleep(300)
}
}
GParsPool.withPool(2) {
todoList.collectParallel { closure->closure() }
}
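For comparison, the same "two tasks, two return values" pattern can be sketched with nothing but java.util.concurrent, which may be useful where GPars is not on the classpath. This is only a minimal sketch: the two closures are hypothetical stand-ins for the cosmoSegmentationService calls, and the maps they return are made-up sample data.

```groovy
import java.util.concurrent.Executors

// Hypothetical stand-ins for the two service calls in the question.
def getForecast = { [forecast: 42] }
def getActual   = { [actual: 40] }

def pool = Executors.newFixedThreadPool(2)
def data
try {
    // Groovy closures implement Callable, so invokeAll accepts them directly.
    // invokeAll blocks until both tasks have completed.
    def futures = pool.invokeAll([getForecast, getActual])
    def (forecastData, actualData) = futures*.get()

    // merge both datasets
    data = actualData + forecastData
} finally {
    pool.shutdown()
}
println data
```

The pool size of 2 matches the number of tasks; get() rethrows any exception raised inside a task, wrapped in an ExecutionException.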
Related
Finding a String from list in a String is not efficient enough
def errorList = readFile WORKSPACE + "/list.txt" def knownErrorListbyLine = errorList.readLines() def build_log = new URL (Build_Log_URL).getText() def found_errors = null for(knownError in knownErrorListbyLine) { if (build_log.contains(knownError)) { found_errors = build_log.readLines().findAll{ it.contains(knownError) } for(error in found_errors) { println "FOUND ERROR: " + error } } } I wrote this code to find listed errors in a string, but it takes about 20 seconds. How can I improve the performance? I would love to learn from this. Thanks a lot! list.txt contains a string per line: Step ... was FAILED [ERROR] Pod-domainrouter call failed #type":"ErrorExtender [postDeploymentSteps] ... does not exist. etc... And build logs is where I need to find these errors.
Try this:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL)
def found_errors = null
for(knownError in knownErrorListbyLine) {
    build_log.eachLine{
        if ( it.contains(knownError) ) {
            println "FOUND ERROR: " + it
        }
    }
}
This might be even more performant, because the log is streamed only once instead of once per known error:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL)
def found_errors = null
build_log.eachLine{
    for(knownError in knownErrorListbyLine) {
        if ( it.contains(knownError) ) {
            println "FOUND ERROR: " + it
        }
    }
}
Or try a variant of the last one that downloads the log as text first and relies on String.eachLine instead:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL).getText()
def found_errors = null
build_log.eachLine{
    for(knownError in knownErrorListbyLine) {
        if ( it.contains(knownError) ) {
            println "FOUND ERROR: " + it
        }
    }
}
Try moving build_log.readLines() into a variable outside of the loop, so the log is split into lines only once:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL).getText()
def found_errors = null
def buildLogByLine = build_log.readLines()
for(knownError in knownErrorListbyLine) {
    if (build_log.contains(knownError)) {
        found_errors = buildLogByLine.findAll{ it.contains(knownError) }
        for(error in found_errors) {
            println "FOUND ERROR: " + error
        }
    }
}
Update: using multiple threads
Note: this may help if the error list is large enough and the matching errors are distributed evenly across it.
def sublists = knownErrorListbyLine.collate(x) // int x - the sublist size,
// depends on the knownErrorListbyLine size; set the value to get e.g. 4 sublists (threads).
// Also do not use more than 2 threads per CPU. Start from 1 thread per CPU.
def logsWithErrors = [] // list to store the results of each thread
def lock = new Object()
def threads = sublists.collect { errorSublist ->
    Thread.start {
        def logs = build_log.readLines()
        errorSublist.findAll { build_log.contains(it) }.each { error ->
            def results = logs.findAll { it.contains(error) }
            synchronized(lock) {
                logsWithErrors << results
            }
        }
    }
}
threads*.join() // wait for all threads to finish
logsWithErrors.flatten().each { println "FOUND ERROR: $it" }
Also, as another user suggested earlier, try measuring the log download time; it could be the bottleneck:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def start = Calendar.getInstance().timeInMillis
def build_log = new URL(Build_Log_URL).getText()
def end = Calendar.getInstance().timeInMillis
println "Logs download time: ${(end-start)/1000} s"
def found_errors = null
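Another option, sketched here under assumptions rather than measured, is to combine all known errors into one regular expression so that each log line is tested once instead of once per known error. The error strings and the log text below are made-up sample data standing in for list.txt and the downloaded build log; whether this is actually faster depends on the size of both.

```groovy
import java.util.regex.Pattern

// Hypothetical sample data standing in for list.txt and the build log.
def knownErrors = ['Step 3 was FAILED', '[ERROR] Pod-domainrouter call failed']
def buildLog = '''\
Step 1 was OK
Step 3 was FAILED
some unrelated line
[ERROR] Pod-domainrouter call failed: timeout
'''

// Quote every entry so characters such as '[' are matched literally,
// then join the entries into a single alternation.
def pattern = Pattern.compile(knownErrors.collect { Pattern.quote(it) }.join('|'))

// One pass over the log: keep each line that contains any known error.
def foundErrors = buildLog.readLines().findAll { pattern.matcher(it).find() }
foundErrors.each { println "FOUND ERROR: $it" }
```

Pattern.quote matters here because entries like "[ERROR]" or "[postDeploymentSteps]" would otherwise be read as character classes.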
Nested JSON with duplicate keys
I will have to process 10 billion nested JSON records per day using NiFi (version 1.9). As part of the job, I am trying to convert the nested JSON to CSV using a Groovy script. I referred to the following Stack Overflow questions on the same topic and came up with the code below:
Groovy collect from map and submap
how to convert json into key value pair completely using groovy
But I am not sure how to retrieve the values of duplicate keys. A sample JSON record is defined in the variable "json" in the code below. The key "Flag1" appears in multiple sections (i.e., "OF" and "SF"). I want to get the output as CSV. This is the output when I execute the Groovy code below:
2019-10-08 22:33:29.244000,v12,-,36178,0,0/0,10.65.5.56,sf,sf
(the Flag1 value is replaced by the value of that key's last occurrence)
I am not an expert in Groovy. Please also suggest any better approach, and I will give it a try.
import groovy.json.*
def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
def jsonRecord = new JsonSlurper().parseText(jsonReplace)
def columns = ["TS","V","PID","RS","SR","CnID","CIP","Flag1","Flag1"]
def flatten
flatten = { row ->
    def flattened = [:]
    row.each { k, v ->
        if (v instanceof Map) {
            flattened << flatten(v)
        } else if (v instanceof Collection && v.every {it instanceof Map}) {
            v.each { flattened << flatten(it) }
        } else {
            flattened[k] = v
        }
    }
    flattened
}
print "output: " + jsonRecord.transaction.collect {row -> columns.collect {colName -> flatten(row)[colName]}.join(',')}.join('\n')
Edit: Based on the replies from @cfrick and @stck, I tried the suggested option and have a follow-up question below.
@cfrick and @stck: thanks for your responses.
Original source JSON record will have more than 100 columns and I am using "InvokeScriptedProcessor" in NiFi to trigger the Groovy script. Below is the original Groovy script am using in "InvokeScriptedProcessor" in which I have used Streams(inputstream, outputstream). Is this what you are referring. Am I doing anything wrong? import groovy.json.JsonSlurper class customJSONtoCSV implements Processor { def REL_SUCCESS = new Relationship.Builder().name("success").description("FlowFiles that were successfully processed").build(); def log static def flatten(row, prefix="") { def flattened = new HashMap<String, String>() row.each { String k, Object v -> def key = prefix ? prefix + "_" + k : k; if (v instanceof Map) { flattened.putAll(flatten(v, k)) } else { flattened.put(key, v.toString()) } } return flattened } static def toCSVRow(HashMap row) { def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34"
,"GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"] return columns.collect { column -> return row.containsKey(column) ? row.get(column) : "" }.join(',') } #Override void initialize(ProcessorInitializationContext context) { log = context.getLogger() } #Override Set<Relationship> getRelationships() { return [REL_SUCCESS] as Set } #Override void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException { try { def session = sessionFactory.createSession() def flowFile = session.get() if (!flowFile) return flowFile = session.write(flowFile, { inputStream, outputStream -> def bufferedReader = new BufferedReader(new InputStreamReader(inputStream, 'UTF-8')) def jsonSlurper = new JsonSlurper() def line def header = 
"CIPG_CIP,CIPG_CP,CIPG_SLP,CIPG_SLEP,CIPG_CVID,SIPG_SIP,SIPG_SP,SIPG_InP,SIPG_SVID,TG_T,TG_R,TG_C,TG_SDL,DL,I_R,UAP,EDBL,Ca,A,RQM,RSM,FIT,CSR,OF_Flag1,OF_Flag2,OF_Flag3,OF_Flag4,OF_Flag5,OF_Flag6,OF_Flag7,OF_Flag8,OF_Flag9,OF_Flag10,OF_Flag11,OF_Flag12,OF_Flag13,OF_Flag14,OF_Flag15,OF_Flag16,OF_Flag17,OF_Flag18,OF_Flag19,OF_Flag20,OF_Flag21,OF_Flag22,OF_Flag23,SF_Flag1,SF_Flag2,SF_Flag3,SF_Flag4,SF_Flag5,SF_Flag6,SF_Flag7,SF_Flag8,SF_Flag9,SF_Flag10,SF_Flag11,SF_Flag12,SF_Flag13,SF_Flag14,SF_Flag15,SF_Flag16,SF_Flag17,SF_Flag18,SF_Flag19,SF_Flag20,SF_Flag21,SF_Flag22,SF_Flag23,SF_Flag24,GF_Flag1,GF_Flag2,GF_Flag3,GF_Flag4,GF_Flag5,GF_Flag6,GF_Flag7,GF_Flag8,GF_Flag9,GF_Flag10,GF_Flag11,GF_Flag12,GF_Flag13,GF_Flag14,GF_Flag15,GF_Flag16,GF_Flag17,GF_Flag18,GF_Flag19,GF_Flag20,GF_Flag21,GF_Flag22,GF_Flag23,GF_Flag24,GF_Flag25,GF_Flag26,GF_Flag27,GF_Flag28,GF_Flag29,GF_Flag30,GF_Flag31,GF_Flag32,GF_Flag33,GF_Flag34,GF_Flag35,VSL_VSID,VSL_TC,VSL_MTC,VSL_NRTC,VSL_ET,VSL_HRES,VSL_VRES,VSL_FS,VSL_FR,VSL_VSD,VSL_ACB,VSL_ASB,VSL_VPR,VSL_VSST,HRU_HM,HRU_HD,HRU_HP,HRU_HQ,URLF_CID,URLF_CGID,URLF_CR,URLF_RA,URLF_USM,URLF_USP,URLF_MUS,TCPSt_WS,TCPSt_SE,TCPSt_WSFNS,TCPSt_WSF,TCPSt_EM,TCPSt_RSTE,TCPSt_MSS,NS_OPID,NS_ODID,NS_EPID,NS_TrID,NS_VSN,NS_LSUT,NS_STTS,NS_TCPPR,CQA_NL,CQA_CL,CQA_CLC,CQA_SQ,CQA_SQC,TS,V,PID,RS,SR,CnID,A_S,OS,CPr,CVB,CS,HS,SUNR,SUNS,ML,MT,TCPSL,CT,MS,MSH,SID,SuID,UA,DID,UAG,CID,HR,CRG,CP1,CP2,AIDF,UCB,CLID,CLCL,OPTS,PUAG,SSLIL" outputStream.write("${header}\n".getBytes('UTF-8')) while (line = bufferedReader.readLine()) { def jsonReplace = line.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}') def jsonRecord = new JsonSlurper().parseText(jsonReplace) def a = jsonRecord.transaction.collect { row -> return flatten(row) }.collect { row -> return toCSVRow(row) } outputStream.write("${a}\n".getBytes('UTF-8')) } } as StreamCallback) session.transfer(flowFile, REL_SUCCESS) session.commit() } catch (e) { throw new ProcessException(e) } } #Override 
Collection<ValidationResult> validate(ValidationContext context) { return null }
@Override
PropertyDescriptor getPropertyDescriptor(String name) { return null }
@Override
void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }
@Override
List<PropertyDescriptor> getPropertyDescriptors() { return [] as List }
@Override
String getIdentifier() { return null }
}
processor = new customJSONtoCSV()
If I should not use "collect", what else should I use to create the rows? In the output flow file, each record comes out wrapped in []. I tried the line below, but it does not work, and I am not sure whether I am doing the right thing. I want CSV output without the [].
return toCSVRow(row).toString()
If you know exactly what you want to extract (and given that you want to generate a CSV from it), IMHO you are way better off just shaping the data the way you later want to consume it. E.g.
def data = new groovy.json.JsonSlurper().parseText('[{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}]')
extractors = [
    { it.TS },
    { it.V },
    { it.PID },
    { it.RS },
    { it.SR },
    { it.CIPG.CIP },
    { it.CIPG.CP },
    { it.OF.Flag1 },
    { it.SF.Flag1 },
]
def extract(row) {
    extractors.collect{ it(row) }
}
println(data.collect{extract it})
// ⇒ [[2019-10-08 22:33:29.244000, null, null, null, null, 10.65.5.56, 0, of, sf]]
As stated in the other answer, and given the sheer amount of data you are trying to convert:
Make sure to use a library to generate the CSV file, or else you will hit problems with the content you try to write (e.g. line breaks, or data containing the separator char).
Don't use collect (it is eager) to create the rows.
The idea is to modify "flatten" method - it should differentiate between same nested keys by providing parent key as a prefix. I've simplified code a bit: import groovy.json.* def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}' def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]') def jsonRecord = new JsonSlurper().parseText(jsonReplace) static def flatten(row, prefix="") { def flattened = new HashMap<String, String>() row.each { String k, Object v -> def key = prefix ? prefix + "." + k : k; if (v instanceof Map) { flattened.putAll(flatten(v, k)) } else { flattened.put(key, v.toString()) } } return flattened } static def toCSVRow(HashMap row) { def columns = ["TS","V","PID","RS","SR","CnID","CIP","OF.Flag1","SF.Flag1"] // Last 2 keys have changed! return columns.collect { column -> return row.containsKey(column) ? row.get(column) : "" }.join(', ') } def a = jsonRecord.transaction.collect { row -> return flatten(row) }.collect { row -> return toCSVRow(row) }.join('\n') println a Output would be: 2019-10-08 22:33:29.244000, , , , , , , of, sf
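The prefixing idea can be sketched a little more generally as below. This is an illustration only, not the answerer's code: it passes the full path (key) down on recursion instead of just k, so nesting deeper than one level keeps its whole prefix, and it adds a hypothetical csvField helper that applies minimal quoting. The column list is a shortened, assumed subset, and the sample JSON is the one from the question with its closing braces balanced.

```groovy
import groovy.json.JsonSlurper

// Recursively flattens nested maps; keys are joined with '.' so that
// OF.Flag1 and SF.Flag1 stay distinct at any nesting depth.
Map<String, String> flatten(Map row, String prefix = '') {
    Map<String, String> flattened = [:]
    row.each { k, v ->
        String key = prefix ? "${prefix}.${k}" : k
        if (v instanceof Map) {
            flattened.putAll(flatten(v, key))   // pass the full path, not just k
        } else {
            flattened[key] = v?.toString() ?: ''
        }
    }
    flattened
}

// Hypothetical helper: quotes a field when it contains a comma, a quote
// or a line break, and doubles any embedded quotes.
String csvField(String value) {
    (value =~ /[",\r\n]/) ? '"' + value.replace('"', '""') + '"' : value
}

def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}}'
def record = new JsonSlurper().parseText(json).transaction

// Assumed subset of columns; "V" is absent from the sample and yields an empty field.
def columns = ['TS', 'CIPG.CIP', 'OF.Flag1', 'SF.Flag1', 'V']
def flat = flatten(record)
def line = columns.collect { csvField(flat[it] ?: '') }.join(',')
println line
```

Because each record yields a plain String, it can be written to the output stream directly; a dedicated CSV library remains the safer choice for real data.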
How to use MockK to mock an observable
I have a data provider that has an Observable<Int> as part of its public API. My class under test maps this into an Observable<String>. How do I create a mock so that it can send out different values on the data provider's observable? I can do it using a fake object, but that is a lot of work that I don't think is necessary with MockK.
Simplified code:
interface DataProvider {
    val numberData:Observable<Int>
}
class FakeDataProvider():DataProvider {
    private val _numberData = BehaviorSubject.createDefault(0)
    override val numberData = _numberData.hide()
    // Note: the internals of this class cause the _numberData changes.
    // I can use this method to fake the changes for this fake object,
    // but the real class doesn't have this method.
    fun fakeNewNumber( newNumber:Int ) {
        _numberData.onNext( newNumber )
    }
}
interface ClassUnderTest {
    val stringData:Observable<String>
}
class MyClassUnderTest( dataProvider: DataProvider ):ClassUnderTest {
    override val stringData = dataProvider.numberData.map { "string = " + it.toString() }
}
class MockKTests {
    @Test
    fun testUsingFakeDataProvider() {
        val fakeDataProvider = FakeDataProvider()
        val classUnderTest = MyClassUnderTest( fakeDataProvider )
        val stringDataTestObserver = TestObserver<String>()
        classUnderTest.stringData.subscribe( stringDataTestObserver )
        fakeDataProvider.fakeNewNumber( 1 )
        fakeDataProvider.fakeNewNumber( 2 )
        fakeDataProvider.fakeNewNumber( 3 )
        // Note we are expecting the initial value of 0 to also come through
        stringDataTestObserver.assertValuesOnly( "string = 0", "string = 1","string = 2","string = 3" )
    }
    // How do you write the mock to trigger the dataProvider observable?
    @Test
    fun testUsingMockDataProvider() {
        val mockDataProvider = mockk<DataProvider>()
        // every { ... what goes here ... } just Runs
        val classUnderTest = MyClassUnderTest( mockDataProvider )
        val stringDataTestObserver = TestObserver<String>()
        classUnderTest.stringData.subscribe( stringDataTestObserver )
        // Note we are expecting the initial value of 0 to also come through
        stringDataTestObserver.assertValuesOnly( "string = 0", "string = 1","string = 2","string = 3" )
    }
}
Try the following:
every { mockDataProvider.numberData } answers { Observable.range(1, 3) }
You may also need to create the mock object in a different way, like this:
val mockDataProvider = spyk(DataProvider())
Do something like this, where we create a fake list and wrap it in an observable:
var fakeList :List<Quiz> = (listOf<Quiz>( Quiz("G1","fromtest","","",1) ))
var observableFakelist = Observable.fromArray(fakeList)
You can then return your observableFakelist from the mock.
groovy thread for urls
I wrote logic for testing URLs using threads. It works well for a small number of URLs but fails when there are more than 400 URLs to check.
class URL extends Thread{
    def valid
    def url
    URL( url ) {
        this.url = url
    }
    void run() {
        try {
            def connection = url.toURL().openConnection()
            connection.setConnectTimeout(10000)
            if(connection.responseCode == 200 ){
                valid = Boolean.TRUE
            }else{
                valid = Boolean.FALSE
            }
        } catch ( Exception e ) {
            valid = Boolean.FALSE
        }
    }
}
def threads = [];
urls.each { ur ->
    def reader = new URL(ur)
    reader.start()
    threads.add(reader);
}
while (threads.size() > 0) {
    for(int i =0; i < threads.size();i++) {
        def tr = threads.get(i);
        if (!tr.isAlive()) {
            if(tr.valid == true){
                threads.remove(i);
                i--;
            }else{
                threads.remove(i);
                i--;
            }
        }
    }
}
Could anyone please tell me how to optimize the logic and where I went wrong? Thanks in advance.
Have you considered using the java.util.concurrent helpers? They allow multithreaded programming at a higher level of abstraction. There's a simple interface for running parallel tasks in a thread pool, which is easier to manage and tune than creating n threads for n tasks and hoping for the best. Your code then ends up looking something like this, where you can tune nThreads until you get the best performance:
import java.util.concurrent.*
def nThreads = 1000
def pool = Executors.newFixedThreadPool(nThreads)
urls.each { url ->
    // submit a task per URL; the body of the check goes inside the closure
    pool.submit({ /* check url here */ } as Runnable)
}
def timeout = 60
pool.shutdown() // accept no new tasks; awaitTermination only returns early after shutdown
pool.awaitTermination(timeout, TimeUnit.SECONDS)
Using ataylor's suggestion, and your code, I got to this:
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
class MyURL implements Runnable {
    def valid
    def url
    void run() {
        try {
            url.toURL().openConnection().with {
                connectTimeout = 10000
                if( responseCode == 200 ) {
                    valid = true
                } else {
                    valid = false
                }
                disconnect()
            }
        } catch( e ) {
            valid = false
        }
    }
}
// A list of URLs to check
def urls = [ 'http://www.google.com',
             'http://stackoverflow.com/questions/2720325/groovy-thread-for-urls',
             'http://www.nonexistanturlfortesting.co.ch/whatever' ]
// How many threads to kick off
def nThreads = 3
def pool = Executors.newFixedThreadPool( nThreads )
// Construct a list of the URL objects we're running, submitted to the pool
def results = urls.inject( [] ) { list, url ->
    def u = new MyURL( url:url )
    pool.submit u
    list << u
}
// Shut the pool down and wait until all threads are completed (or the timeout expires)
def timeout = 10
pool.shutdown()
pool.awaitTermination( timeout, TimeUnit.SECONDS )
// Print our results
results.each {
    println "$it.url : $it.valid"
}
Which prints out this:
http://www.google.com : true
http://stackoverflow.com/questions/2720325/groovy-thread-for-urls : true
http://www.nonexistanturlfortesting.co.ch/whatever : false
I changed the class name to MyURL rather than URL as you had it, as this is more likely to avoid problems when you start using the java.net.URL class.
groovy multithreading
I'm a newbie to Groovy/Grails. How can I use threads for this code? I have 2500 URLs, and checking each one in turn takes hours, so I decided to make this multi-threaded. Here is my sample code:
def urls = [
    "http://www.wordpress.com",
    "http://67.192.103.225/QRA.Public/" ,
    "http://www.subaru.com",
    "http://baldwinfilter.com/products/start.html"
]
def up = urls.collect { ur ->
    try {
        def url = new URL(ur)
        def connection = url.openConnection()
        if (connection.responseCode == 200) {
            return true
        } else {
            return false
        }
    } catch (Exception e) {
        return false
    }
}
I need to make this code multi-threaded. Could anyone please suggest how? Thanks in advance, sri.
I would take a look at the Groovy Parallel Systems library. In particular, I think the Parallel collections section would be useful. Looking at the docs, I believe that collectParallel is a direct drop-in replacement for collect (bearing in mind the obvious caveats about side effects). The following works fine for me:
def urls = [
    "http://www.wordpress.com",
    "http://www.subaru.com",
    "http://baldwinfilter.com/products/start.html"
]
Parallelizer.doParallel {
    def up = urls.collectParallel { ur ->
        try {
            def url = new URL(ur)
            def connection = url.openConnection()
            if (connection.responseCode == 200) {
                return true
            } else {
                return false
            }
        } catch (Exception e) {
            return false
        }
    }
    println up
}
See the Groovy docs for an example how to use an ExecutorService to do what you want.
You can use this to check the URL in a separate thread.
class URLReader implements Runnable {
    def valid
    def url
    URLReader( url ) {
        this.url = url
    }
    void run() {
        try {
            def connection = url.toURL().openConnection()
            valid = ( connection.responseCode == 200 ) as Boolean
        } catch ( Exception e ) {
            println e.message
            valid = Boolean.FALSE
        }
    }
}
def reader = new URLReader( "http://www.google.com" )
new Thread( reader ).start()
while ( reader.valid == null ) {
    Thread.sleep( 500 )
}
println "valid: ${reader.valid}"
Notes:
The valid attribute will be either null, Boolean.TRUE or Boolean.FALSE.
You'll need to wait for a while to give all the threads a chance to open the connection.
Depending on the number of URLs you're checking, you will eventually hit a limit on the number of threads / connections you can realistically handle, so you should check URLs in batches of an appropriate size.
I think this is a very simple way to achieve it.
import java.util.concurrent.*
// Number of threads
THREADS = 100
pool = Executors.newFixedThreadPool(THREADS)
defer = { c -> pool.submit(c as Callable) }
def urls = [
    "http://www.wordpress.com",
    "http://www.subaru.com",
]
def getUrl = { url ->
    def connection = url.openConnection()
    if (connection.responseCode == 200) {
        return true
    } else {
        return false
    }
}
// Submit every check first, so that they run concurrently ...
def futures = urls.collect { ur ->
    defer {
        try {
            getUrl(new URL(ur))
        } catch (Exception e) {
            return false
        }
    }
}
// ... and only then wait for the results
def up = futures*.get()
println up
pool.shutdown()
This is how I implemented it:
class ValidateLinks extends Thread{
    def valid
    def url
    ValidateLinks( url ) {
        this.url = url
    }
    void run() {
        try {
            def connection = url.toURL().openConnection()
            connection.setConnectTimeout(5000)
            valid = ( connection.responseCode == 200 ) as Boolean
        } catch ( Exception e ) {
            println url + "-" + e.message
            valid = Boolean.FALSE
        }
    }
}
def threads = [];
urls.each { ur ->
    def reader = new ValidateLinks(ur.site_url)
    reader.start()
    threads.add(reader);
}
while (threads.size() > 0) {
    for(int i =0; i < threads.size();i++) {
        def tr = threads.get(i);
        if (!tr.isAlive()) {
            println "URL : " + tr.url + " Valid " + tr.valid
            threads.remove(i);
            i--;
        }
    }
}
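The thread-per-URL approach and the busy-wait loop can be avoided with a bounded pool, roughly as in the sketch below. It is an illustration under assumptions, not a drop-in replacement: validateAll and stubCheck are hypothetical names, the 5000 ms timeouts are arbitrary, and the stub is there only so the example runs without network access. Pass httpCheck instead of stubCheck to perform real requests.

```groovy
import java.util.concurrent.Executors

// Real check: true only for an HTTP 200 response; any failure counts as invalid.
def httpCheck = { String address ->
    try {
        def connection = address.toURL().openConnection()
        connection.connectTimeout = 5000
        connection.readTimeout = 5000
        connection.responseCode == 200
    } catch (Exception ignored) {
        false
    }
}

// Hypothetical helper: runs `check` for every URL on a pool of nThreads
// threads and returns a map of url -> Boolean.
def validateAll = { List<String> urls, int nThreads, Closure check ->
    def pool = Executors.newFixedThreadPool(nThreads)
    try {
        // One task per URL. Closures implement Callable, so invokeAll accepts
        // them directly; it blocks until every task has completed.
        def tasks = urls.collect { url -> return { check(url) } }
        def futures = pool.invokeAll(tasks)
        [urls, futures*.get()].transpose().collectEntries { pair -> [(pair[0]): pair[1]] }
    } finally {
        pool.shutdown()
    }
}

// Stubbed check so the example runs offline.
def stubCheck = { String address -> address.startsWith('http://ok') }
def result = validateAll(['http://ok.example', 'http://bad.example'], 2, stubCheck)
println result
```

With thousands of URLs only nThreads connections are open at any time, which sidesteps the thread and connection limits mentioned in the other answers.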