groovy thread for urls - groovy

I wrote logic for testing urls using threads.
This works good for less number of urls and failing with more than 400 urls to check .
class URL extends Thread{
def valid
def url
URL( url ) {
this.url = url
}
void run() {
try {
def connection = url.toURL().openConnection()
connection.setConnectTimeout(10000)
if(connection.responseCode == 200 ){
valid = Boolean.TRUE
}else{
valid = Boolean.FALSE
}
} catch ( Exception e ) {
valid = Boolean.FALSE
}
}
}
def threads = [];
urls.each { ur ->
def reader = new URL(ur)
reader.start()
threads.add(reader);
}
while (threads.size() > 0) {
for(int i =0; i < threads.size();i++) {
def tr = threads.get(i);
if (!tr.isAlive()) {
if(tr.valid == true){
threads.remove(i);
i--;
}else{
threads.remove(i);
i--;
}
}
}
Could any one please tell me how to optimize the logic and where i was going wrong .
thanks in advance.

Have you considered using the java.util.concurrent helpers? It allows multithreaded programming at a higher level of abstraction. There's a simple interface to run parallel tasks in a thread pool, which is easier to manage and tune than just creating n threads for n tasks and hoping for the best.
Your code then ends up looking something like this, where you can tune nThreads until you get the best performance:
import java.util.concurrent.*
def nThreads = 1000
def pool = Executors.newFixedThreadPool(nThreads)
urls.each { url ->
pool.submit(url)
}
def timeout = 60
pool.awaitTermination(timeout, TimeUnit.SECONDS)

Using ataylor's suggestion, and your code, I got to this:
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
class MyURL implements Runnable {
def valid
def url
void run() {
try {
url.toURL().openConnection().with {
connectTimeout = 10000
if( responseCode == 200 ) {
valid = true
}
else {
valid = false
}
disconnect()
}
}
catch( e ) {
valid = false
}
}
}
// A list of URLs to check
def urls = [ 'http://www.google.com',
'http://stackoverflow.com/questions/2720325/groovy-thread-for-urls',
'http://www.nonexistanturlfortesting.co.ch/whatever' ]
// How many threads to kick off
def nThreads = 3
def pool = Executors.newFixedThreadPool( nThreads )
// Construct a list of the URL objects we're running, submitted to the pool
def results = urls.inject( [] ) { list, url ->
def u = new MyURL( url:url )
pool.submit u
list << u
}
// Wait for the poolclose when all threads are completed
def timeout = 10
pool.shutdown()
pool.awaitTermination( timeout, TimeUnit.SECONDS )
// Print our results
results.each {
println "$it.url : $it.valid"
}
Which prints out this:
http://www.google.com : true
http://stackoverflow.com/questions/2720325/groovy-thread-for-urls : true
http://www.nonexistanturlfortesting.co.ch/whatever : false
I changed the classname to MyURL rather than URL as you had it, as it will more likely avoid problems when you start using the java.net.URL class

Related

Finding a String from list in a String is not efficient enough

def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL).getText()
def found_errors = null
for(knownError in knownErrorListbyLine) {
if (build_log.contains(knownError)) {
found_errors = build_log.readLines().findAll{ it.contains(knownError) }
for(error in found_errors) {
println "FOUND ERROR: " + error
}
}
}
I wrote this code to find listed errors in a string, but it takes about 20 seconds.
How can I improve the performance? I would love to learn from this.
Thanks a lot!
list.txt contains a string per line:
Step ... was FAILED
[ERROR] Pod-domainrouter call failed
#type":"ErrorExtender
[postDeploymentSteps] ... does not exist.
etc...
And build logs is where I need to find these errors.
Try this:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL)
def found_errors = null
for(knownError in knownErrorListbyLine) {
build_log.eachLine{
if ( it.contains(knownError) ) {
println "FOUND ERROR: " + error
}
}
}
This might be even more performant:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL)
def found_errors = null
build_log.eachLine{
for(knownError in knownErrorListbyLine) {
if ( it.contains(knownError) ) {
println "FOUND ERROR: " + error
}
}
}
Attempt using the last one relying on string eachLine instead.
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL).getText()
def found_errors = null
build_log.eachLine{
for(knownError in knownErrorListbyLine) {
if ( it.contains(knownError) ) {
println "FOUND ERROR: " + error
}
}
}
Try to move build_log.readLines() to the variable outside of the loop.
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def build_log = new URL (Build_Log_URL).getText()
def found_errors = null
def buildLogByLine = build_log.readLines()
for(knownError in knownErrorListbyLine) {
if (build_log.contains(knownError)) {
found_errors = buildLogByLine.findAll{ it.contains(knownError) }
for(error in found_errors) {
println "FOUND ERROR: " + error
}
}
}
Update: using multiple threads
Note: this may help in case errorList size is large enough. And also if the matching errors distributed evenly.
def sublists = knownErrorListbyLine.collate(x)
// int x - the sublist size,
// depends on the knownErrorListbyLine size, set the value to get e. g. 4 sublists (threads).
// Also do not use more than 2 threads per CPU. Start from 1 thread per CPU.
def logsWithErrors = []// list for store results per thread
def lock = new Object()
def threads = sublists.collect { errorSublist ->
Thread.start {
def logs = build_log.readLines()
errorSublist.findAll { build_log.contains(it) }.each { error ->
def results = logs.findAll { it.contains(error) }
synchronized(lock) {
logsWithErrors << results
}
}
}
}
threads*.join() // wait for all threads to finish
logsWithErrors.flatten().each {
println "FOUND ERROR: $it"
}
Also, as was suggested earlier by other user, try to measure the logs download time, it could be the bottleneck:
def errorList = readFile WORKSPACE + "/list.txt"
def knownErrorListbyLine = errorList.readLines()
def start = Calendar.getInstance().timeInMillis
def build_log = new URL(Build_Log_URL).getText()
def end = Calendar.getInstance().timeInMillis
println "Logs download time: ${(end-start)/1000} ms"
def found_errors = null

Scala synchronized consumer producer

I want to implement something like the producer-consumer problem (with only one information transmitted at a time), but I want the producer to wait for someone to take his message before leaving.
Here is an example that doesn't block the producer but works otherwise.
class Channel[T]
{
private var _msg : Option[T] = None
def put(msg : T) : Unit =
{
this.synchronized
{
waitFor(_msg == None)
_msg = Some(msg)
notifyAll
}
}
def get() : T =
{
this.synchronized
{
waitFor(_msg != None)
val ret = _msg.get
_msg = None
notifyAll
return ret
}
}
private def waitFor(b : => Boolean) =
while(!b) wait
}
How can I changed it so the producers gets blocked (as the consumer is) ?
I tried to add another waitFor at the end of but sometimes my producer doesn't get released.
For instance, if I have put ; get || get ; put, most of the time it works, but sometimes, the first put is not terminated and the left thread never even runs the get method (I print something once the put call is terminated, and in this case, it never gets printed).
This is why you should use a standard class, SynchronousQueue in this case.
If you really want to work through your problematic code, start by giving us a failing test case or a stack trace from when the put is blocking.
You can do this by means of a BlockingQueue descendant whose producer put () method creates a semaphore/event object that is queued up with the passed message and then the producer thread waits on it.
The consumer get() method extracts a message from the queue and signals its semaphore, so allowing its original producer to run on.
This allows a 'synchronous queue' with actual queueing functionality, should that be what you want?
I came up with something that appears to be working.
class Channel[T]
{
class Transfer[A]
{
protected var _msg : Option[A] = None
def msg_=(a : A) = _msg = Some(a)
def msg : A =
{
// Reading the message destroys it
val ret = _msg.get
_msg = None
return ret
}
def isEmpty = _msg == None
def notEmpty = !isEmpty
}
object Transfer {
def apply[A](msg : A) : Transfer[A] =
{
var t = new Transfer[A]()
t.msg = msg
return t
}
}
// Hacky but Transfer has to be invariant
object Idle extends Transfer[T]
protected var offer : Transfer[T] = Idle
protected var request : Transfer[T] = Idle
def put(msg : T) : Unit =
{
this.synchronized
{
// push an offer as soon as possible
waitFor(offer == Idle)
offer = Transfer(msg)
// request the transfer
requestTransfer
// wait for the transfer to go (ie the msg to be absorbed)
waitFor(offer isEmpty)
// delete the completed offer
offer = Idle
notifyAll
}
}
def get() : T =
{
this.synchronized
{
// push a request as soon as possible
waitFor(request == Idle)
request = new Transfer()
// request the transfer
requestTransfer
// wait for the transfer to go (ie the msg to be delivered)
waitFor(request notEmpty)
val ret = request.msg
// delete the completed request
request = Idle
notifyAll
return ret
}
}
protected def requestTransfer()
{
this.synchronized
{
if(offer != Idle && request != Idle)
{
request.msg = offer.msg
notifyAll
}
}
}
protected def waitFor(b : => Boolean) =
while(!b) wait
}
It has the advantage of respecting symmetry between producer and consumer but it is a bit longer than what I had before.
Thanks for your help.
Edit : It is better but still not safeā€¦

Groovy/Grails GPARS: How to execute 2 calculations parallel?

I'm new to the GPARS-library and implementing it in our software at the moment.
It's no problem for me to use it instead of the normal groovy-methods like
[..].each{..}
->
[..].eachParallel{..}
But I'm wondering how to parallelize 2 tasks which are returning a value.
Without GPARS I would do it this way:
List<Thread> threads = []
def forecastData
def actualData
threads.add(Thread.start {
forecastData = cosmoSegmentationService.getForecastSegmentCharacteristics(dataset, planPeriod, thruPeriod)
})
threads.add(Thread.start {
actualData = cosmoSegmentationService.getMeasuredSegmentCharacteristics(dataset, fromPeriod, thruPeriodActual)
})
threads*.join()
// merge both datasets
def data = actualData + forecastData
But (how) can this be done with the GparsPool?
You could use Dataflow:
import groovyx.gpars.dataflow.*
import static groovyx.gpars.dataflow.Dataflow.task
def forecastData = new DataflowVariable()
def actualData = new DataflowVariable()
def result = new DataflowVariable()
task {
forecastData << cosmoSegmentationService.getForecastSegmentCharacteristics( dataset, planPeriod, thruPeriod )
}
task {
actualData << cosmoSegmentationService.getMeasuredSegmentCharacteristics( dataset, fromPeriod, thruPeriodActual )
}
task {
result << forecastData.val + actualData.val
}
println result.val
Alternative for GPars 0.9:
import static groovyx.gpars.GParsPool.withPool
def getForecast = {
cosmoSegmentationService.getForecastSegmentCharacteristics( dataset, planPeriod, }
def getActual = {
cosmoSegmentationService.getMeasuredSegmentCharacteristics( dataset, fromPeriod, thruPeriodActual )
}
def results = withPool {
[ getForecast.callAsync(), getActual.callAsync() ]
}
println results*.get().sum()
import groovyx.gpars.GParsPool
List todoList =[]
todoList.add {
for(int i1: 1..100){
println "task 1:" +i1
sleep(300)
}
}
todoList.add {
for(int i2: 101..200){
println "task 2:" +i2
sleep(300)
}
}
GParsPool.withPool(2) {
todoList.collectParallel { closure->closure() }
}

Tail a file in Groovy

I am wondering is there is a simple way to tail a file in Groovy? I know how to read a file, but how do I read a file and then wait for more lines to be added, read them, wait, etc...
I have what I am sure is a really stupid solution:
def lNum = 0
def num= 0
def numLines = 0
def myFile = new File("foo.txt")
def origNumLines = myFile.eachLine { num++ }
def precIndex = origNumLines
while (true) {
num = 0
lNum = 0
numLines = myFile.eachLine { num++ }
if (numLines > origNumLines) {
myFile.eachLine({ line ->
if (lNum > precIndex) {
println line
}
lNum++
})
}
precIndex = numLines
Thread.sleep(5000)
}
Note that I am not really interested in invoking the Unix "tail" command. Unless it is the only solution.
I wrote a groovy class which resembles the basic tail functionality:
class TailReader {
boolean stop = false
public void stop () {
stop = true
}
public void tail (File file, Closure c) {
def runnable = {
def reader
try {
reader = file.newReader()
reader.skip(file.length())
def line
while (!stop) {
line = reader.readLine()
if (line) {
c.call(line)
}
else {
Thread.currentThread().sleep(1000)
}
}
}
finally {
reader?.close()
}
} as Runnable
def t = new Thread(runnable)
t.start()
}
}
The tail method taks a file and closure as parameters. It will run in a separate thread and will pass each new line that will be appended to the file to the given closure. The tail method will run until the stop method is called on the TailReader instance. Here's a short example of how to use the TailReader class:
def reader = new TailReader()
reader.tail(new File("/tmp/foo.log")) { println it }
// Do something else, e.g.
// Thread.currentThread().sleep(30 * 1000)
reader.stop()
In reply to Christoph :
For my use case, I've replaced the block
line = reader.readLine()
if (line) {
c.call(line)
}
else {
Thread.currentThread().sleep(1000)
}
with
while ((line = reader.readLine()) != null)
c.call(line)
Thread.currentThread().sleep(1000)`
... != null => to ensure empty lines are outputted as well
while (... => to ensure every line is read as fast as possible

groovy multithreading

I'm newbie to groovy/grails.
How to implement thread for this code . Had 2500 urls and this was taking hours of time for checking each url.
so i decided to implement multi-thread for this :
Here is my sample code :
def urls = [
"http://www.wordpress.com",
"http://67.192.103.225/QRA.Public/" ,
"http://www.subaru.com",
"http://baldwinfilter.com/products/start.html"
]
def up = urls.collect { ur ->
try {
def url = new URL(ur)
def connection = url.openConnection()
if (connection.responseCode == 200) {
return true
} else {
return false
}
} catch (Exception e) {
return false
}
}
For this code i need to implement multi-threading .
Could any one please suggest me the code.
thanks in advance,
sri.
I would take a look at the Groovy Parallel Systems library. In particular I think that the Parallel collections section would be useful.
Looking at the docs, I believe that collectParallel is a direct drop-in replacement for collect (bearing in mind the obvious caveats about side-effects). The following works fine for me:
def urls = [
"http://www.wordpress.com",
"http://www.subaru.com",
"http://baldwinfilter.com/products/start.html"
]
Parallelizer.doParallel {
def up = urls.collectParallel { ur ->
try {
def url = new URL(ur)
def connection = url.openConnection()
if (connection.responseCode == 200) {
return true
} else {
return false
}
} catch (Exception e) {
return false
}
}
println up
}
See the Groovy docs for an example how to use an ExecutorService to do what you want.
You can use this to check the URL in a separate thread.
class URLReader implements Runnable
{
def valid
def url
URLReader( url ) {
this.url = url
}
void run() {
try {
def connection = url.toURL().openConnection()
valid = ( connection.responseCode == 200 ) as Boolean
} catch ( Exception e ) {
println e.message
valid = Boolean.FALSE
}
}
}
def reader = new URLReader( "http://www.google.com" )
new Thread( reader ).start()
while ( reader.valid == null )
{
Thread.sleep( 500 )
}
println "valid: ${reader.valid}"
Notes: The valid attribute will be either null, Boolean.TRUE or Boolean.FALSE. You'll need to wait for a while to give all the threads a chance to open the connection. Depending on the number of URLs you're checking you will eventually hit a limit of the number of threads / connections you can realistically handle, so should check URLs in batches of the appropriate size.
I think this way is very simple to achieve.
import java.util.concurrent.*
//Thread number
THREADS = 100
pool = Executors.newFixedThreadPool(THREADS)
defer = { c -> pool.submit(c as Callable) }
def urls = [
"http://www.wordpress.com",
"http://www.subaru.com",
]
def getUrl = { url ->
def connection = url.openConnection()
if (connection.responseCode == 200) {
return true
} else {
return false
}
}
def up = urls.collect { ur ->
try {
def url = new URL(ur)
defer{ getUrl(url) }.get()
} catch (Exception e) {
return false
}
}
println up
pool.shutdown()
This is how I implemented:
class ValidateLinks extends Thread{
def valid
def url
ValidateLinks( url ) {
this.url = url
}
void run() {
try {
def connection = url.toURL().openConnection()
connection.setConnectTimeout(5000)
valid = ( connection.responseCode == 200 ) as Boolean
} catch ( Exception e ) {
println url + "-" + e.message
valid = Boolean.FALSE
}
}
}
def threads = [];
urls.each { ur ->
def reader = new ValidateLinks(ur.site_url)
reader.start()
threads.add(reader);
}
while (threads.size() > 0) {
for(int i =0; i < threads.size();i++) {
def tr = threads.get(i);
if (!tr.isAlive()) {
println "URL : " + tr.url + "Valid " + tr.valid
threads.remove(i);
i--;
}
}
}

Resources