Groovy multithreading

I'm new to Groovy/Grails.
How can I run this code in multiple threads? I have 2,500 URLs, and checking each one sequentially takes hours.
Here is my sample code:
def urls = [
    "http://www.wordpress.com",
    "http://67.192.103.225/QRA.Public/",
    "http://www.subaru.com",
    "http://baldwinfilter.com/products/start.html"
]
def up = urls.collect { ur ->
    try {
        def url = new URL(ur)
        def connection = url.openConnection()
        if (connection.responseCode == 200) {
            return true
        } else {
            return false
        }
    } catch (Exception e) {
        return false
    }
}
For this code I need to implement multi-threading. Could anyone please suggest the code?
Thanks in advance,
sri.

I would take a look at the GPars (Groovy Parallel Systems) library. In particular, I think the Parallel collections section would be useful.
Looking at the docs, I believe collectParallel is a direct drop-in replacement for collect (bearing in mind the obvious caveats about side effects). The following works fine for me:
def urls = [
    "http://www.wordpress.com",
    "http://www.subaru.com",
    "http://baldwinfilter.com/products/start.html"
]
Parallelizer.doParallel {
    def up = urls.collectParallel { ur ->
        try {
            def url = new URL(ur)
            def connection = url.openConnection()
            if (connection.responseCode == 200) {
                return true
            } else {
                return false
            }
        } catch (Exception e) {
            return false
        }
    }
    println up
}
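(Note: in later GPars releases, Parallelizer was renamed, so the equivalent construct is GParsPool.withPool { ... } with the same collectParallel call inside.)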

See the Groovy docs for an example of how to use an ExecutorService to do what you want.
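For reference, here is a minimal sketch of that approach, assuming the urls list from the question (the pool size and timeout are arbitrary choices):
import java.util.concurrent.*

def pool = Executors.newFixedThreadPool(10)
// Submit one Callable per URL; each task returns true or false.
def futures = urls.collect { ur ->
    pool.submit({ ->
        try {
            def connection = new URL(ur).openConnection()
            connection.connectTimeout = 5000
            connection.responseCode == 200
        } catch (Exception e) {
            false
        }
    } as Callable)
}
// get() blocks, so this waits for every check to finish.
def up = futures.collect { it.get() }
pool.shutdown()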

You can use this to check the URL in a separate thread.
class URLReader implements Runnable
{
    def valid
    def url

    URLReader( url ) {
        this.url = url
    }

    void run() {
        try {
            def connection = url.toURL().openConnection()
            valid = ( connection.responseCode == 200 ) as Boolean
        } catch ( Exception e ) {
            println e.message
            valid = Boolean.FALSE
        }
    }
}
def reader = new URLReader( "http://www.google.com" )
new Thread( reader ).start()
while ( reader.valid == null )
{
    Thread.sleep( 500 )
}
println "valid: ${reader.valid}"
Notes: The valid attribute will be either null, Boolean.TRUE or Boolean.FALSE. You'll need to wait a while to give all the threads a chance to open their connections. Depending on the number of URLs you're checking, you will eventually hit a limit on the number of threads/connections you can realistically handle, so you should check the URLs in batches of an appropriate size.
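A minimal sketch of that batching idea, reusing the URLReader class above (the batch size of 50 is an arbitrary assumption):
urls.collate( 50 ).each { batch ->
    def readers = batch.collect { new URLReader( it ) }
    def threads = readers.collect { new Thread( it ) }
    threads*.start()
    threads*.join()   // wait for this batch to finish before starting the next
    readers.each { println "${it.url}: ${it.valid}" }
}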

I think this is a very simple way to achieve it.
import java.util.concurrent.*

// Thread count
THREADS = 100
pool = Executors.newFixedThreadPool(THREADS)
defer = { c -> pool.submit(c as Callable) }
def urls = [
    "http://www.wordpress.com",
    "http://www.subaru.com",
]
def getUrl = { url ->
    def connection = url.openConnection()
    if (connection.responseCode == 200) {
        return true
    } else {
        return false
    }
}
// Submit all the checks first, then read the results. Calling get()
// right after each submit would block on every task in turn and make
// the whole run sequential again.
def futures = urls.collect { ur ->
    defer { getUrl(new URL(ur)) }
}
def up = futures.collect { f ->
    try {
        f.get()
    } catch (Exception e) {
        false
    }
}
println up
pool.shutdown()
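The tasks are submitted in one pass and the results read with get() in a second pass; with get() inlined after each submit, every URL would wait for the previous one and the pool would bring no speedup.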

This is how I implemented it:
class ValidateLinks extends Thread {
    def valid
    def url

    ValidateLinks( url ) {
        this.url = url
    }

    void run() {
        try {
            def connection = url.toURL().openConnection()
            connection.setConnectTimeout( 5000 )
            valid = ( connection.responseCode == 200 ) as Boolean
        } catch ( Exception e ) {
            println url + " - " + e.message
            valid = Boolean.FALSE
        }
    }
}
def threads = []
urls.each { ur ->
    def reader = new ValidateLinks( ur.site_url )
    reader.start()
    threads.add( reader )
}
// Poll the threads, reporting and removing each one as it finishes.
while ( threads.size() > 0 ) {
    for ( int i = 0; i < threads.size(); i++ ) {
        def tr = threads.get( i )
        if ( !tr.isAlive() ) {
            println "URL: " + tr.url + " Valid: " + tr.valid
            threads.remove( i )
            i--
        }
    }
}
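As a side note, the busy-wait loop can be replaced by joining the threads, which blocks without spinning the CPU; a minimal sketch using the same ValidateLinks class:
def workers = urls.collect { ur ->
    def v = new ValidateLinks( ur.site_url )
    v.start()
    v
}
workers.each { it.join() }   // block until each thread has finished
workers.each { println "URL: ${it.url} Valid: ${it.valid}" }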

Related

Finding a String from list in a String is not efficient enough

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL( Build_Log_URL ).getText()
def found_errors = null
for ( knownError in knownErrorListbyLine ) {
    if ( build_log.contains( knownError ) ) {
        found_errors = build_log.readLines().findAll { it.contains( knownError ) }
        for ( error in found_errors ) {
            println "FOUND ERROR: " + error
        }
    }
}
I wrote this code to find the listed errors in a string, but it takes about 20 seconds.
How can I improve the performance? I would love to learn from this.
Thanks a lot!
list.txt contains one string per line:
Step ... was FAILED
[ERROR] Pod-domainrouter call failed
#type":"ErrorExtender
[postDeploymentSteps] ... does not exist.
etc...
And the build log is where I need to find these errors.
Try this:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL( Build_Log_URL )
def found_errors = null
for ( knownError in knownErrorListbyLine ) {
    build_log.eachLine { line ->
        if ( line.contains( knownError ) ) {
            println "FOUND ERROR: " + line
        }
    }
}
This might be even more performant:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL( Build_Log_URL )
def found_errors = null
build_log.eachLine { line ->
    for ( knownError in knownErrorListbyLine ) {
        if ( line.contains( knownError ) ) {
            println "FOUND ERROR: " + line
        }
    }
}
An attempt using the previous approach, but relying on String.eachLine instead:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL( Build_Log_URL ).getText()
def found_errors = null
build_log.eachLine { line ->
    for ( knownError in knownErrorListbyLine ) {
        if ( line.contains( knownError ) ) {
            println "FOUND ERROR: " + line
        }
    }
}
Try moving build_log.readLines() to a variable outside of the loop.
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL( Build_Log_URL ).getText()
def found_errors = null
def buildLogByLine = build_log.readLines()
for ( knownError in knownErrorListbyLine ) {
    if ( build_log.contains( knownError ) ) {
        found_errors = buildLogByLine.findAll { it.contains( knownError ) }
        for ( error in found_errors ) {
            println "FOUND ERROR: " + error
        }
    }
}
Update: using multiple threads
Note: this may help if the errorList is large enough, and if the matching errors are distributed evenly.
def sublists = knownErrorListbyLine.collate( x )
// int x - the sublist size. It depends on the knownErrorListbyLine size;
// set the value to get e.g. 4 sublists (threads).
// Also, do not use more than 2 threads per CPU; start with 1 thread per CPU.
def logsWithErrors = [] // list for storing results per thread
def lock = new Object()
def threads = sublists.collect { errorSublist ->
    Thread.start {
        def logs = build_log.readLines()
        errorSublist.findAll { build_log.contains( it ) }.each { error ->
            def results = logs.findAll { it.contains( error ) }
            synchronized ( lock ) {
                logsWithErrors << results
            }
        }
    }
}
threads*.join() // wait for all threads to finish
logsWithErrors.flatten().each {
    println "FOUND ERROR: $it"
}
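As a rough heuristic for choosing x, aim for about one sublist per core so every thread stays busy without oversubscription; a sketch:
def cores = Runtime.getRuntime().availableProcessors()
// Round up so the last partial chunk is not lost.
def x = Math.ceil( knownErrorListbyLine.size() / cores ) as int
def sublists = knownErrorListbyLine.collate( x )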
Also, as was suggested earlier by another user, try measuring the log download time; it could be the bottleneck:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def start = Calendar.getInstance().timeInMillis
def build_log = new URL( Build_Log_URL ).getText()
def end = Calendar.getInstance().timeInMillis
println "Logs download time: ${( end - start ) / 1000} s"
def found_errors = null

Do concurrent query and insert have any side effects in Android with ObjectBox?

In my Android project I use ObjectBox as the database. If I insert with a lock and query without a lock, are there any side effects, such as crashes?
fun query(uniqueId: String = ""): MutableList<T> {
    if (box.store.isClosed) return mutableListOf()
    val query = box.query()
    withQueryBuilder(query, uniqueId)
    // start the query
    return query.build().find()
}

private fun putInner(entity: T): Long {
    synchronized(box.store) {
        if (box.store.isClosed) return -1
        if (entity.unique.isBlank()) {
            entity.unique = entity.providerUnique()
        }
        entity.timestamp = System.currentTimeMillis()
        return try {
            box.put(entity).let { id -> entity.id = id }
            entity.id
        } catch (ex: Exception) {
            -1
        }
    }
}

Nested JSON with duplicate keys

I have to process 10 billion nested JSON records per day using NiFi (version 1.9). As part of the job, I am trying to convert the nested JSON to CSV using a Groovy script. I referred to the Stack Overflow questions below, which are related to the same topic, and came up with the code below.
Groovy collect from map and submap
how to convert json into key value pair completely using groovy
But I am not sure how to retrieve the values of duplicate keys. A sample JSON is defined in the variable "json" in the code below. The key "Flag1" occurs in multiple sections (i.e., "OF" and "SF"), and I want to get the output as CSV.
Below is the output I get if I execute the Groovy code: 2019-10-08 22:33:29.244000,v12,-,36178,0,0/0,10.65.5.56,sf,sf (the Flag1 value is replaced by that key's last occurrence).
I am not an expert in Groovy, so please also suggest if there is any better approach I could try.
import groovy.json.*

def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
def jsonRecord = new JsonSlurper().parseText(jsonReplace)
def columns = ["TS","V","PID","RS","SR","CnID","CIP","Flag1","Flag1"]
def flatten
flatten = { row ->
    def flattened = [:]
    row.each { k, v ->
        if (v instanceof Map) {
            flattened << flatten(v)
        } else if (v instanceof Collection && v.every { it instanceof Map }) {
            v.each { flattened << flatten(it) }
        } else {
            flattened[k] = v
        }
    }
    flattened
}
print "output: " + jsonRecord.transaction.collect { row -> columns.collect { colName -> flatten(row)[colName] }.join(',') }.join('\n')
Edit: Based on the replies from @cfrick and @stck, I have tried their options and have a follow-up question below.
@cfrick and @stck - thanks for your responses.
The original source JSON record will have more than 100 columns, and I am using "InvokeScriptedProcessor" in NiFi to trigger the Groovy script.
Below is the original Groovy script I am using in "InvokeScriptedProcessor", in which I have used streams (inputstream, outputstream). Is this what you are referring to?
Am I doing anything wrong?
import groovy.json.JsonSlurper
// NiFi API imports for the Processor interface and its collaborators
import org.apache.nifi.components.*
import org.apache.nifi.processor.*
import org.apache.nifi.processor.exception.ProcessException
import org.apache.nifi.processor.io.StreamCallback

class customJSONtoCSV implements Processor {

    def REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles that were successfully processed")
            .build()
    def log

    static def flatten(row, prefix = "") {
        def flattened = new HashMap<String, String>()
        row.each { String k, Object v ->
            def key = prefix ? prefix + "_" + k : k
            if (v instanceof Map) {
                flattened.putAll(flatten(v, k))
            } else {
                flattened.put(key, v.toString())
            }
        }
        return flattened
    }

    static def toCSVRow(HashMap row) {
def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"]
        return columns.collect { column ->
            return row.containsKey(column) ? row.get(column) : ""
        }.join(',')
    }
    @Override
    void initialize(ProcessorInitializationContext context) {
        log = context.getLogger()
    }

    @Override
    Set<Relationship> getRelationships() {
        return [REL_SUCCESS] as Set
    }

    @Override
    void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
        try {
            def session = sessionFactory.createSession()
            def flowFile = session.get()
            if (!flowFile) return
            flowFile = session.write(flowFile,
                { inputStream, outputStream ->
                    def bufferedReader = new BufferedReader(new InputStreamReader(inputStream, 'UTF-8'))
                    def jsonSlurper = new JsonSlurper()
                    def line
def header = "CIPG_CIP,CIPG_CP,CIPG_SLP,CIPG_SLEP,CIPG_CVID,SIPG_SIP,SIPG_SP,SIPG_InP,SIPG_SVID,TG_T,TG_R,TG_C,TG_SDL,DL,I_R,UAP,EDBL,Ca,A,RQM,RSM,FIT,CSR,OF_Flag1,OF_Flag2,OF_Flag3,OF_Flag4,OF_Flag5,OF_Flag6,OF_Flag7,OF_Flag8,OF_Flag9,OF_Flag10,OF_Flag11,OF_Flag12,OF_Flag13,OF_Flag14,OF_Flag15,OF_Flag16,OF_Flag17,OF_Flag18,OF_Flag19,OF_Flag20,OF_Flag21,OF_Flag22,OF_Flag23,SF_Flag1,SF_Flag2,SF_Flag3,SF_Flag4,SF_Flag5,SF_Flag6,SF_Flag7,SF_Flag8,SF_Flag9,SF_Flag10,SF_Flag11,SF_Flag12,SF_Flag13,SF_Flag14,SF_Flag15,SF_Flag16,SF_Flag17,SF_Flag18,SF_Flag19,SF_Flag20,SF_Flag21,SF_Flag22,SF_Flag23,SF_Flag24,GF_Flag1,GF_Flag2,GF_Flag3,GF_Flag4,GF_Flag5,GF_Flag6,GF_Flag7,GF_Flag8,GF_Flag9,GF_Flag10,GF_Flag11,GF_Flag12,GF_Flag13,GF_Flag14,GF_Flag15,GF_Flag16,GF_Flag17,GF_Flag18,GF_Flag19,GF_Flag20,GF_Flag21,GF_Flag22,GF_Flag23,GF_Flag24,GF_Flag25,GF_Flag26,GF_Flag27,GF_Flag28,GF_Flag29,GF_Flag30,GF_Flag31,GF_Flag32,GF_Flag33,GF_Flag34,GF_Flag35,VSL_VSID,VSL_TC,VSL_MTC,VSL_NRTC,VSL_ET,VSL_HRES,VSL_VRES,VSL_FS,VSL_FR,VSL_VSD,VSL_ACB,VSL_ASB,VSL_VPR,VSL_VSST,HRU_HM,HRU_HD,HRU_HP,HRU_HQ,URLF_CID,URLF_CGID,URLF_CR,URLF_RA,URLF_USM,URLF_USP,URLF_MUS,TCPSt_WS,TCPSt_SE,TCPSt_WSFNS,TCPSt_WSF,TCPSt_EM,TCPSt_RSTE,TCPSt_MSS,NS_OPID,NS_ODID,NS_EPID,NS_TrID,NS_VSN,NS_LSUT,NS_STTS,NS_TCPPR,CQA_NL,CQA_CL,CQA_CLC,CQA_SQ,CQA_SQC,TS,V,PID,RS,SR,CnID,A_S,OS,CPr,CVB,CS,HS,SUNR,SUNS,ML,MT,TCPSL,CT,MS,MSH,SID,SuID,UA,DID,UAG,CID,HR,CRG,CP1,CP2,AIDF,UCB,CLID,CLCL,OPTS,PUAG,SSLIL"
                    outputStream.write("${header}\n".getBytes('UTF-8'))
                    while (line = bufferedReader.readLine()) {
                        def jsonReplace = line.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
                        def jsonRecord = jsonSlurper.parseText(jsonReplace)
                        def a = jsonRecord.transaction.collect { row ->
                            return flatten(row)
                        }.collect { row ->
                            return toCSVRow(row)
                        }
                        outputStream.write("${a}\n".getBytes('UTF-8'))
                    }
                } as StreamCallback)
            session.transfer(flowFile, REL_SUCCESS)
            session.commit()
        }
        catch (e) {
            throw new ProcessException(e)
        }
    }

    @Override
    Collection<ValidationResult> validate(ValidationContext context) { return null }

    @Override
    PropertyDescriptor getPropertyDescriptor(String name) { return null }

    @Override
    void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }

    @Override
    List<PropertyDescriptor> getPropertyDescriptors() {
        return [] as List
    }

    @Override
    String getIdentifier() { return null }
}

processor = new customJSONtoCSV()
If I should not use "collect", what else should I use to create the rows?
In the output flow file, each record is wrapped in []. I tried the line below, but it is not working, and I am not sure whether I am doing the right thing. I want the CSV output without []:
return toCSVRow(row).toString()
If you know exactly what you want to extract (and given that you want to generate a CSV from it), IMHO you are way better off just shaping the data the way you later want to consume it. E.g.
def data = new groovy.json.JsonSlurper().parseText('[{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}]')

extractors = [
    { it.TS },
    { it.V },
    { it.PID },
    { it.RS },
    { it.SR },
    { it.CIPG.CIP },
    { it.CIPG.CP },
    { it.OF.Flag1 },
    { it.SF.Flag1 },
]

def extract(row) {
    extractors.collect { it(row) }
}

println(data.collect { extract it })
// ⇒ [[2019-10-08 22:33:29.244000, null, null, null, null, 10.65.5.56, 0, of, sf]]
As stated in the other answer, due to the sheer amount of data you are trying to convert:
Make sure to use a library to generate the CSV from that, or else you will hit problems with the content you try to write (e.g. line breaks or data containing the separator char).
Don't use collect (it is eager) to create the rows; see the sketch below.
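For illustration, a minimal sketch of the non-eager idea, assuming each input line holds one well-formed {"transaction":{...}} object and reusing the flatten and toCSVRow helpers from the question; each CSV row is written as soon as it is produced instead of being collected into a list first:
bufferedReader.eachLine { String jsonLine ->
    def record = new groovy.json.JsonSlurper().parseText(jsonLine)
    // Write each row immediately; nothing accumulates in memory.
    outputStream.write("${toCSVRow(flatten(record.transaction))}\n".getBytes('UTF-8'))
}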
The idea is to modify the "flatten" method: it should differentiate between identical nested keys by using the parent key as a prefix.
I've simplified the code a bit:
import groovy.json.*

def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]')
def jsonRecord = new JsonSlurper().parseText(jsonReplace)

static def flatten(row, prefix = "") {
    def flattened = new HashMap<String, String>()
    row.each { String k, Object v ->
        def key = prefix ? prefix + "." + k : k
        if (v instanceof Map) {
            flattened.putAll(flatten(v, k))
        } else {
            flattened.put(key, v.toString())
        }
    }
    return flattened
}

static def toCSVRow(HashMap row) {
    def columns = ["TS","V","PID","RS","SR","CnID","CIP","OF.Flag1","SF.Flag1"] // Last 2 keys have changed!
    return columns.collect { column ->
        return row.containsKey(column) ? row.get(column) : ""
    }.join(', ')
}

def a = jsonRecord.transaction.collect { row ->
    return flatten(row)
}.collect { row ->
    return toCSVRow(row)
}.join('\n')
println a
Output would be:
2019-10-08 22:33:29.244000, , , , , , , of, sf

Groovy/Grails GPARS: How to execute 2 calculations parallel?

I'm new to the GPars library and am integrating it into our software at the moment.
It's no problem for me to use it in place of the normal Groovy methods, e.g. replacing [..].each{..} with [..].eachParallel{..}.
But I'm wondering how to parallelize 2 tasks that each return a value.
Without GPars I would do it this way:
List<Thread> threads = []
def forecastData
def actualData
threads.add(Thread.start {
    forecastData = cosmoSegmentationService.getForecastSegmentCharacteristics(dataset, planPeriod, thruPeriod)
})
threads.add(Thread.start {
    actualData = cosmoSegmentationService.getMeasuredSegmentCharacteristics(dataset, fromPeriod, thruPeriodActual)
})
threads*.join()
// merge both datasets
def data = actualData + forecastData
But how can this be done with GParsPool?
You could use Dataflow:
import groovyx.gpars.dataflow.*
import static groovyx.gpars.dataflow.Dataflow.task

def forecastData = new DataflowVariable()
def actualData = new DataflowVariable()
def result = new DataflowVariable()

task {
    forecastData << cosmoSegmentationService.getForecastSegmentCharacteristics( dataset, planPeriod, thruPeriod )
}
task {
    actualData << cosmoSegmentationService.getMeasuredSegmentCharacteristics( dataset, fromPeriod, thruPeriodActual )
}
task {
    result << forecastData.val + actualData.val
}
println result.val
Alternative for GPars 0.9:
import static groovyx.gpars.GParsPool.withPool

def getForecast = {
    cosmoSegmentationService.getForecastSegmentCharacteristics( dataset, planPeriod, thruPeriod )
}
def getActual = {
    cosmoSegmentationService.getMeasuredSegmentCharacteristics( dataset, fromPeriod, thruPeriodActual )
}
def results = withPool {
    [ getForecast.callAsync(), getActual.callAsync() ]
}
println results*.get().sum()
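Here callAsync schedules each closure on the pool and returns a java.util.concurrent.Future, so results*.get() blocks until both calls have completed before summing the two datasets.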
import groovyx.gpars.GParsPool

List todoList = []
todoList.add {
    for ( int i1 : 1..100 ) {
        println "task 1: " + i1
        sleep( 300 )
    }
}
todoList.add {
    for ( int i2 : 101..200 ) {
        println "task 2: " + i2
        sleep( 300 )
    }
}
GParsPool.withPool( 2 ) {
    todoList.collectParallel { closure -> closure() }
}

groovy thread for urls

I wrote the following logic for testing URLs using threads.
It works well for a small number of URLs, but starts failing with more than 400 URLs to check.
class URL extends Thread {
    def valid
    def url

    URL( url ) {
        this.url = url
    }

    void run() {
        try {
            def connection = url.toURL().openConnection()
            connection.setConnectTimeout( 10000 )
            if ( connection.responseCode == 200 ) {
                valid = Boolean.TRUE
            } else {
                valid = Boolean.FALSE
            }
        } catch ( Exception e ) {
            valid = Boolean.FALSE
        }
    }
}
def threads = []
urls.each { ur ->
    def reader = new URL( ur )
    reader.start()
    threads.add( reader )
}
while ( threads.size() > 0 ) {
    for ( int i = 0; i < threads.size(); i++ ) {
        def tr = threads.get( i )
        if ( !tr.isAlive() ) {
            if ( tr.valid == true ) {
                threads.remove( i )
                i--
            } else {
                threads.remove( i )
                i--
            }
        }
    }
}
Could anyone please tell me how to optimize this logic and where I went wrong?
Thanks in advance.
Have you considered using the java.util.concurrent helpers? They allow multithreaded programming at a higher level of abstraction. There's a simple interface for running parallel tasks in a thread pool, which is easier to manage and tune than creating n threads for n tasks and hoping for the best.
Your code then ends up looking something like this, where you can tune nThreads until you get the best performance:
import java.util.concurrent.*

def nThreads = 1000
def pool = Executors.newFixedThreadPool( nThreads )
urls.each { url ->
    pool.submit( url )   // each url object is a Runnable (your Thread subclass above)
}
def timeout = 60
pool.shutdown()          // stop accepting tasks so awaitTermination can return as soon as they finish
pool.awaitTermination( timeout, TimeUnit.SECONDS )
Using ataylor's suggestion and your code, I got to this:
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

class MyURL implements Runnable {
    def valid
    def url

    void run() {
        try {
            url.toURL().openConnection().with {
                connectTimeout = 10000
                if ( responseCode == 200 ) {
                    valid = true
                }
                else {
                    valid = false
                }
                disconnect()
            }
        }
        catch ( e ) {
            valid = false
        }
    }
}

// A list of URLs to check
def urls = [ 'http://www.google.com',
             'http://stackoverflow.com/questions/2720325/groovy-thread-for-urls',
             'http://www.nonexistanturlfortesting.co.ch/whatever' ]

// How many threads to kick off
def nThreads = 3
def pool = Executors.newFixedThreadPool( nThreads )

// Construct a list of the MyURL objects we're running, submitted to the pool
def results = urls.inject( [] ) { list, url ->
    def u = new MyURL( url: url )
    pool.submit u
    list << u
}

// Wait for the pool to close down when all threads are completed
def timeout = 10
pool.shutdown()
pool.awaitTermination( timeout, TimeUnit.SECONDS )

// Print our results
results.each {
    println "$it.url : $it.valid"
}
Which prints out this:
http://www.google.com : true
http://stackoverflow.com/questions/2720325/groovy-thread-for-urls : true
http://www.nonexistanturlfortesting.co.ch/whatever : false
I changed the class name to MyURL rather than URL as you had it, as this will avoid problems when you start using the java.net.URL class.
