How to make capturing output from an external process thread-safe?

How to make capturing output from an external process thread-safe? - multithreading

I've written a small method to execute the git command line tool and capture its output:
def git(String command) {
command = "git ${command}"
def outputStream = new StringBuilder()
def errorStream = new StringBuilder()
def process = command.execute()
process.waitForProcessOutput(outputStream, errorStream)
return [process.exitValue(), outputStream, errorStream, command]
}
I'm using it with GPars to clone multiple repositories simultaneously like
GParsPool.withPool(10) {
repos.eachParallel { cloneUrl, cloneDir->
(exit, out, err, cmd) = git("clone ${cloneUrl} ${cloneDir}")
if (exit != 0) {
println "Error: ${cmd} failed with '${errorStream}'."
}
}
}
However, I believe my git method it not thread-safe: For example, a second thread could modify command in the first line of the method before the first thread reached command.execute() in the fifth line of the method.
I could solve this by making the whole git method synchronized, but that would defeat the purpose of running it in different threads as I want clones to happen in parallel.
So I was thinking to do partial synchronization like
def git(String command) {
def outputStream
def errorStream
def process
synchronized {
command = "git ${command}"
outputStream = new StringBuilder()
errorStream = new StringBuilder()
process = command.execute()
}
process.waitForProcessOutput(outputStream, errorStream)
return [process.exitValue(), outputStream, errorStream, command]
}
But I guess that also is not safe as in thread two waitForProcessOutput() might return earlier than in thread one, screwing up the outputStream / errorStream variables.
What is the correct way to get this thread-safe?

Change the assignment statement inside the eachParallel closure argument as follows:
def (exit, out, err, cmd) = git("clone ${cloneUrl} ${cloneDir}")
This will make the variables local to the closure, which in turn will make them thread-safe. The git() method is fine as is.

Related

P4API.net: how to use P4Callbacks delegates

I am working on a small tool to schedule p4 sync daily at specific times.
In this tool, I want to display the outputs from the P4API while it is running commands.
I can see that the P4API.net has a P4Callbacks class, with several delegates: InfoResultsDelegate, TaggedOutputDelegate, LogMessageDelegate, ErrorDelegate.
My question is: How can I use those, I could not find a single example online of that. A short example code would be amazing !
Note: I am quite a beginner and have never used delegates before.

Answering my own questions by an example. I ended up figuring out by myself, it is a simple event.
Note that this only works with P4Server. My last attempt at getting TaggedOutput from a P4.Connection was unsuccessful, they were never triggered when running a command.
So, here is a code example:
P4Server p4Server = new P4Server(syncPath);
p4Server.TaggedOutputReceived += P4ServerTaggedOutputEvent;
p4Server.ErrorReceived += P4ServerErrorReceived;
bool syncSuccess = false;
try
{
P4Command syncCommand = new P4Command(p4Server, "sync", true, syncPath + "\\...");
P4CommandResult rslt = syncCommand.Run();
syncSuccess=true;
//Here you can read the content of the P4CommandResult
//But it will only be accessible when the command is finished.
}
catch (P4Exception ex) //Will be caught only when the command has completely failed
{
Console.WriteLine("P4Command failed: " + ex.Message);
}
And the two methods, those will be triggered while the sync command is being executed.
private void P4ServerErrorReceived(uint cmdId, int severity, int errorNumber, string data)
{
Console.WriteLine("P4ServerErrorReceived:" + data);
}
private void P4ServerTaggedOutputEvent(uint cmdId, int ObjId, TaggedObject Obj)
{
Console.WriteLine("P4ServerTaggedOutputEvent:" + Obj["clientFile"]);
}

Batch up requests in Groovy?

I'm new to Groovy and am a bit lost on how to batch up requests so they can be submitted to a server as a batch, instead of individually, as I currently have:
class Handler {
private String jobId
// [...]
void submit() {
// [...]
// client is a single instance of Client used by all Handlers
jobId = client.add(args)
}
}
class Client {
//...
String add(String args) {
response = postJson(args)
return parseIdFromJson(response)
}
}
As it is now, something calls Client.add(), which POSTs to a REST API and returns a parsed result.
The issue I have is that the add() method is called maybe thousands of times in quick succession, and it would be much more efficient to collect all the args passed in to add(), wait until there's a moment when the add() calls stop coming in, and then POST to the REST API a single time for that batch, sending all the args in one go.
Is this possible? Potentially, add() can return a fake id immediately, as long as the batching occurs, the submit happens, and Client can later know the lookup between fake id and the ID coming from the REST API (which will return IDs in the order corresponding to the args sent to it).

As mentioned in the comments, this might be a good case for gpars which is excellent at these kinds of scenarios.
This really is less about groovy and more about asynchronous programming in java and on the jvm in general.
If you want to stick with the java concurrent idioms I threw together a code snippet you could use as a potential starting point. This has not been tested and edge cases have not been considered. I wrote this up for fun and since this is asynchronous programming and I haven't spent the appropriate time thinking about it, I suspect there are holes in there big enough to drive a tank through.
That being said, here is some code which makes an attempt at batching up the requests:
import java.util.concurrent.*
import java.util.concurrent.locks.*
// test code
def client = new Client()
client.start()
def futureResponses = []
1000.times {
futureResponses << client.add(it as String)
}
client.stop()
futureResponses.each { futureResponse ->
// resolve future...will wait if the batch has not completed yet
def response = futureResponse.get()
println "received response with index ${response.responseIndex}"
}
// end of test code
class FutureResponse extends CompletableFuture<String> {
String args
}
class Client {
int minMillisLullToSubmitBatch = 100
int maxBatchSizeBeforeSubmit = 100
int millisBetweenChecks = 10
long lastAddTime = Long.MAX_VALUE
def batch = []
def lock = new ReentrantLock()
boolean running = true
def start() {
running = true
Thread.start {
while (running) {
checkForSubmission()
sleep millisBetweenChecks
}
}
}
def stop() {
running = false
checkForSubmission()
}
def withLock(Closure c) {
try {
lock.lock()
c.call()
} finally {
lock.unlock()
}
}
FutureResponse add(String args) {
def future = new FutureResponse(args: args)
withLock {
batch << future
lastAddTime = System.currentTimeMillis()
}
future
}
def checkForSubmission() {
withLock {
if (System.currentTimeMillis() - lastAddTime > minMillisLullToSubmitBatch ||
batch.size() > maxBatchSizeBeforeSubmit) {
submitBatch()
}
}
}
def submitBatch() {
// here you would need to put the combined args on a format
// suitable for the endpoint you are calling. In this
// example we are just creating a list containing the args
def combinedArgs = batch.collect { it.args }
// further there needs to be a way to map one specific set of
// args in the combined args to a specific response. If the
// endpoint responds with the same order as the args we submitted
// were in, then that can be used otherwise something else like
// an id in the response etc would need to be figured out. Here
// we just assume responses are returned in the order args were submitted
List<String> combinedResponses = postJson(combinedArgs)
combinedResponses.indexed().each { index, response ->
// here the FutureResponse gets a value, can be retrieved with
// futureResponse.get()
batch[index].complete(response)
}
// clear the batch
batch = []
}
// bogus method to fake post
def postJson(combinedArgs) {
println "posting json with batch size: ${combinedArgs.size()}"
combinedArgs.collect { [responseIndex: it] }
}
}
A few notes:
something needs to be able to react to the fact that there were no calls to add for a while. This implies a separate monitoring thread and is what the start and stop methods manage.
if we have an infinite sequence of adds without pauses, you might run out of resources. Therefore the code has a max batch size where it will submit the batch even if there is no lull in the calls to add.
the code uses a lock to make sure (or try to, as mentioned above, I have not considered all potential issues here) we stay thread safe during batch submissions etc
assuming the general idea here is sound, you are left with implementing the logic in submitBatch where the main problem is dealing with mapping specific args to specific responses
CompletableFuture is a java 8 class. This can be solved using other constructs in earlier releases, but I happened to be on java 8.
I more or less wrote this without executing or testing, I'm sure there are some mistakes in there.
as can be seen in the printout below, the "maxBatchSizeBeforeSubmit" setting is more a recommendation that an actual max. Since the monitoring thread sleeps for some time and then wakes up to check how we are doing, the threads calling the add method might have accumulated any number of requests in the batch. All we are guaranteed is that every millisBetweenChecks we will wake up and check how we are doing and if the criteria for submitting a batch has been reached, then the batch will be submitted.
If you are unfamiliar with java Futures and locks, I would recommend you read up on them.
If you save the above code in a groovy script code.groovy and run it:
~> groovy code.groovy
posting json with batch size: 153
posting json with batch size: 234
posting json with batch size: 243
posting json with batch size: 370
received response with index 0
received response with index 1
received response with index 2
...
received response with index 998
received response with index 999
~>
it should work and print out the "responses" received from our fake json submissions.

Terminate existing pool when all work is done

Alright, brand new to gpars so please forgive me if this has an obvious answer.
Here is my scenario. We currently have a piece of our code wrapped in a Thread.start {} block. It does this so it can send messages to an message queue in the background and not block the user request. An issue we have recently ran into with this is for large blocks of work, it is possible for the users to perform another action which would cause this block to execute again. As it is threaded, it is possible for the second batch of messages to get sent before the first causing corrupted data.
I would like to change this process to work as a queue flow with gpars. I've seen examples of creating pools such as
def pool = GParsPool.createPool()
or
def pool = new ForkJoinPool()
and then using the pool as
GParsPool.withExistingPool(pool) {
...
}
This seems like it would account for the case that if the user performs an action again, I could reuse the created pool and the actions would not be performed out of order, provided I have a pool size of one.
My question is, is this the best way to do this with gpars? And furthermore, how do I know when the pool is finished all of its work? Does it terminate when all the work is finished? If so, is there a method that can be used to check if the pool has finished/terminated to know I need a new one?
Any help would be appreciated.

No, explicitly created pools do not terminate by themselves. You have to call shutdown() on them explicitly.
Using withPool() {} command, however, will guarantee that the pool is destroyed once the code block is finished.

Here is the current solution we have to our issue. It should be noted that we followed this route due to our requirements
Work is grouped by some context
Work within a given context is ordered
Work within a given context is synchronous
Additional work for a context should execute after the preceding work
Work should not block the user request
Contexts are asynchronous between each other
Once work for a context is finished, the context should clean up after itself
Given the above, we've implemented the following:
class AsyncService {
def queueContexts
def AsyncService() {
queueContexts = new QueueContexts()
}
def queue(contextString, closure) {
queueContexts.retrieveContextWithWork(contextString, true).send(closure)
}
class QueueContexts {
def contextMap = [:]
def synchronized retrieveContextWithWork(contextString, incrementWork) {
def context = contextMap[contextString]
if (context) {
if (!context.hasWork(incrementWork)) {
contextMap.remove(contextString)
context.terminate()
}
} else {
def queueContexts = this
contextMap[contextString] = new QueueContext({->
queueContexts.retrieveContextWithWork(contextString, false)
})
}
contextMap[contextString]
}
class QueueContext {
def workCount
def actor
def QueueContext(finishClosure) {
workCount = 1
actor = Actors.actor {
loop {
react { closure ->
try {
closure()
} catch (Throwable th) {
log.error("Uncaught exception in async queue context", th)
}
finishClosure()
}
}
}
}
def send(closure) {
actor.send(closure)
}
def terminate(){
actor.terminate()
}
def hasWork(incrementWork) {
workCount += (incrementWork ? 1 : -1)
workCount > 0
}
}
}
}

Threading in unity

On button click I run method Initialize().
private IEnumerator Initialize()
{
Download download;
download = new Download();
StartCoroutine(download.LoadAsset("http://localhost/3dobjects?key=11","car13",(x)=>{j = x;}));
yield return j;
int k=download.GetRate(j)
}
Second one (GetRate) depends on result from first method (LoadAsset), so it should run after LoadAsset finishes working.
But they run synchronously like in different thread, how to solve it?

I think you want to
yield return StartCoroutine( ...
otherwise you won't be waiting for the coroutine to end.

Is there a way to reopen an input stream with withReader? - Groovy

I know that the input stream is automatically closed at the end of this kind of block in Groovy:
def exec = ""
System.in.withReader {
println "input: "
exec = it.readLine()
}
but is there a way to reopen the stream if I want to do something like that:
def exec = ""
while(!exec.equals("q")) {
System.in.withReader {
println "input: "
exec = it.readLine()
}
if(!exec.equals("q")) {
//do something
}
}
When I try this I get this error at the second execution of the while loop:
Exception in thread "main" java.io.IOException: Stream closed
So what would be the best way to achieve that?
Thanks.

You shouldn't try to reopen System.in as you shouldn't close it in the first place. You could try something like the following
def exec
def reader = System.in.newReader()
// create new version of readLine that accepts a prompt to remove duplication from the loop
reader.metaClass.readLine = { String prompt -> println prompt ; readLine() }
// process lines until finished
while ((exec = reader.readLine("input: ")) != 'q') {
// do something
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to make capturing output from an external process thread-safe? - multithreading

Change the assignment statement inside the eachParallel closure argument as follows: def (exit, out, err, cmd) = git("clone ${cloneUrl} ${cloneDir}") This will make the variables local to the closure, which in turn will make them thread-safe. The git() method is fine as is.

Related

P4API.net: how to use P4Callbacks delegates

Batch up requests in Groovy?

Terminate existing pool when all work is done

Threading in unity

Is there a way to reopen an input stream with withReader? - Groovy

Categories

Resources