I am trying to find a way to define a multi-process (not multi-threaded) build on multiple cores with Ant on Ubuntu.
I guess <parallel> is not suitable.
In the documentation (https://ant.apache.org/manual/Tasks/parallel.html) I found the following disclaimer:
"The primary use case for is to run external programs ...
Accordingly, while this task has uses, it should be considered an advanced task which should be used in certain batch-processing or testing situations, rather than an easy trick to speed up build times on a multiway CPU.
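Note that <parallel> runs its nested tasks as threads inside a single JVM, not as separate processes. One way to get a multi-process build anyway is to nest <exec> tasks, each of which forks an external process. A minimal sketch, assuming hypothetical module directories built by make:

<target name="build-all">
  <!-- each nested task gets its own thread; each <exec> forks a separate process -->
  <parallel threadCount="4">
    <exec executable="make" failonerror="true">
      <arg line="-C module-a"/>
    </exec>
    <exec executable="make" failonerror="true">
      <arg line="-C module-b"/>
    </exec>
  </parallel>
</target>

The threadCount attribute caps how many nested tasks run concurrently.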
I'm new to multithreading; here is my problem statement.
I have an XML file (TestCase.xml) in which each tag represents a test case, something like below:
TestCase.xml
In turn, each main tag has a child tag that links to another XML file (TestStep.xml), which dictates the steps of the test case; it's TS in the example above.
TestStep.xml
The execution always starts from TestCase.xml, based on the id provided. With this overview: I have 100 test cases in my suite and I want to execute them in parallel, i.e. run at least 5-6 test cases at the same time. I'm not able to use external plug-ins like TestNG, JUnit, BDD frameworks or Maven Surefire. After a lot of R&D we have ended up with multithreading, and I would need assistance on how to implement it.
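A minimal sketch of one common approach, using a fixed-size thread pool from java.util.concurrent; the executeTestCase method is a hypothetical placeholder for whatever logic reads TestCase.xml, follows its TS child tag to TestStep.xml, and runs the steps for one id:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelSuiteRunner {

    // Placeholder: replace with your own code that loads TestCase.xml,
    // resolves the linked TestStep.xml, and executes the steps for one id.
    static void executeTestCase(String testCaseId) {
        System.out.println(testCaseId + " on " + Thread.currentThread().getName());
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> ids = List.of("TC001", "TC002", "TC003"); // ... up to your 100 ids

        // A pool of 6 threads means at most 6 test cases run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(6);
        for (String id : ids) {
            pool.submit(() -> executeTestCase(id));
        }

        pool.shutdown();                           // no new work accepted
        pool.awaitTermination(1, TimeUnit.HOURS);  // wait for all test cases to finish
    }
}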
I'm doing embedded development on a resource-limited system, and need to run a number of separate Node.js tasks (call them task1.js, task2.js and task3.js). The obvious solution would be to run them separately, e.g.:
$ node task1.js &
[1] 1968
$ node task2.js &
[2] 1969
$ node task3.js &
[3] 1970
$
This works, but I end up with three independent Node stacks, each with its own multi-megabyte heap, interpreter, and so on, which is a waste that I'd like to avoid.
Another obvious solution would be to concatenate the source files:
$ cat task1.js task2.js task3.js | node -
This works, but it has problems. First, all three task sources would end up in the same module, so I'd risk name collisions. For example, if each task file included const crypto = require('crypto');, then when concatenated, Node would complain that the crypto variable was already declared.
This would also require all of the primary task source files to be in the same directory, otherwise any relative path references to dependent files would be calculated based on the default working directory, and would likely break.
So, I'm looking for a way to run multiple tasks in the same Node instance, sharing Node resources as much as possible.
It would be great if some or all the following were true:
For development/debugging convenience, the same taskX.js sources could be used individually (as at the top), or run at the same time in the same node instance
No special care would have to be made in each task's code to prevent namespace collisions
Relative path references in require() statements wouldn't all be resolved from the same working directory, so that I could have separate source trees for the separate tasks
Problems I don't need to be solved:
Multiprocessing or multithreading
Sharing data between the tasks
Inter-task events and communication services (if I need those, I'll write them myself)
Protecting each task from the others' bad behavior
Expected constraints for the task code:
No busy-waiting, so that none block the others from executing
No exclusive use of common system resources (e.g. no two will open a server socket on the same port)
Use of global Node resources will be restricted or forbidden
Is this system limited to one CPU? If so, then your best bet is making each task export an async function and processing them with something like async.parallel.
Especially if your subtasks are broken down into mostly async functions, this will let the tasks run as "parallel" as possible.
In a multi-CPU environment, you can boost performance using child processes (or the native Node cluster module). But, as others have stated, this incurs the memory overhead of V8 for each process.
If your tasks are mostly CPU-intensive, you will not see much gain from async.parallel, and it could even be slower than running all your tasks synchronously. But if there is network or disk access (IO), then using parallel should be faster.
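A minimal sketch of that approach, assuming the async library (npm install async) and that each taskN.js is restructured to export its work as a function; the directory layout is hypothetical. Since each task is its own module, require() gives it a private scope (no name collisions) and resolves relative paths against the module's own file:

// task1/task1.js -- exports its work, but stays runnable on its own
const crypto = require('crypto'); // module-scoped: no collision with other tasks

module.exports = function task1(done) {
  // ... do this task's (non-blocking) work, then signal completion ...
  done(null);
};

// still usable standalone for development/debugging: `node task1.js`
if (require.main === module) module.exports((err) => { if (err) throw err; });

// runner.js -- runs all tasks in a single Node instance
const async = require('async');

async.parallel([
  require('./task1/task1.js'),
  require('./task2/task2.js'),
  require('./task3/task3.js'),
], (err) => {
  if (err) console.error('a task failed:', err);
});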
Suppose I'm writing a program in node.js (or perhaps another typical back-end scripting language). Suppose further I have a C function f (or a python function, or what have you) that does some pure data transformation.
If I want to use f in my node program, there are two approaches:
Bind f via something like node-gyp that makes it callable from JavaScript land.
Make f into a binary (or, in the case of a language like Python, a single f.py interface) that sits on the file system, and then call it from Node as if it were any other system command (so that one can take the output from the system call as a string, convert it into Node.js data, and then use it).
Question: What are the performance implications of choosing (2) over (1)?
This is important because if you are using a language like C to make some aspect of your application run significantly faster, then using (2) would seem pointless if it slowed things down past some threshold.
The cost of 1 is the cost of loading the native code, transferring arguments (FFI), calling the native code, and transferring arguments back, with loading being done only once.
The cost of 2 is always going to be the cost of starting the process, running the process, and converting the results back from strings.
If the cost of f is high, you may never see a difference between 1 and 2. If the cost of f is low, then 2 will take longer because the process startup overhead will dominate.
However, depending on the complexity of f (it might be a very large data-processing application in C), it's almost always faster to create a native binding like 1. Avoiding process startup overhead is important, and it also reduces the total amount of memory needed to run your application.
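For illustration, a minimal sketch of option 2 using Node's built-in child_process module; the ./f binary and its JSON-on-stdout contract are hypothetical:

// Option 2 sketch: spawn a hypothetical native binary per call and parse
// its stdout. Every single call pays the full process-startup cost.
const { execFile } = require('child_process');

function callF(input, callback) {
  execFile('./f', [JSON.stringify(input)], (err, stdout) => {
    if (err) return callback(err);
    let result;
    try {
      result = JSON.parse(stdout); // string -> JS data conversion cost
    } catch (parseErr) {
      return callback(parseErr);
    }
    callback(null, result);
  });
}

callF({ n: 42 }, (err, result) => {
  if (err) throw err;
  console.log(result);
});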
Alternatively, you could use a third option:
Have the C code talk over a local network socket, accepting requests and responding with answers when the computation is done.
This has the benefit of scaling out to multiple nodes if you need it.
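A minimal sketch of the Node side of that option, assuming a hypothetical long-lived C worker listening on a Unix domain socket at /tmp/f.sock and answering each newline-terminated JSON request with one JSON line:

const net = require('net');

// Connect once; the worker process stays alive, so per-call cost is just
// serialization plus a round trip on the local socket.
const sock = net.connect('/tmp/f.sock');

sock.on('connect', () => {
  sock.write(JSON.stringify({ n: 42 }) + '\n'); // one request
});

let buffered = '';
sock.on('data', (chunk) => {
  buffered += chunk;
  const newline = buffered.indexOf('\n');
  if (newline !== -1) {
    const reply = JSON.parse(buffered.slice(0, newline)); // one response
    console.log(reply);
    sock.end();
  }
});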
Benchmarking both for your use case is the only way to be sure, but method 1 is likely to be faster.
The startup cost of calling a binary and starting an interpreter for python/perl/blah would likely kill any performance gain you might get using their Foreign Function Interface (FFI). Startup cost is one of the reasons why Apache has mod_python, mod_perl and why FastCGI exists.
Another thing to consider is that you're adding another language to the mix, and this might hurt the performance of the team, i.e. now everyone needs to know two languages, two FFI methods, etc. If your app is in Node, keep it in Node and use Node to call native methods.
This is perhaps a slightly unusual Jenkins query, but we've got a product that spans many Jenkins projects. All of them are Linux based, but they span multiple architectures (MIPS, SPARC, ARMv6, ARMv7).
For a specific component, let's call it 'video-encoder', we'll therefore have 4 projects: mips-video-encoder, sparc-video-encoder, etc.
Each project is built on one of 4 separate slaves, with a label that correlates to its architecture, i.e. the MIPS slave has the labels 'mips' and 'linux'.
My objectives are to:
Consolidate all of our separate jobs. This should make it easier for us to modify job properties, as well as easier to add more jobs without the duplicated effort of adding so many architecture-specific jobs.
To allow us to build only one architecture at a time if we so wish. If the MIPS job fails, we'd like to build just for MIPS and not for others.
I have looked at the 'Multi-configuration' job type; at the moment we are just using single-configuration jobs, which are simple. I am not sure whether the Multi-configuration type allows us to build only individual architectures at once. I had a play with the configuration matrix, but wasn't sure if this could be changed/adapted to build for just a single platform. It looks like I may be able to use a Groovy statement to do this? Something like:
(label=="mips".implies("slave"=="mips")
Maybe that could be simplified to something like slave == label, where label is the former name of the job when it was in its single-configuration state and is now a build parameter?
I am thinking that we don't need a multi-config job for this, if we can programmatically choose the slave instead.
I would greatly appreciate some advice on how we can consolidate the number of jobs we have and programmatically change the target slave based on the architecture of the project, which is a build parameter.
Many thanks in advance,
You can make a wrapper job with a system Groovy script; you need the Groovy plugin for this. Let's call the wrapper job video-encoder-wrapper. Here is how to configure it:
Define the parameter ARCH
Assign the label to the video-encoder job based on the ARCH parameter, using an Execute system Groovy script build step:
import hudson.model.*

// Look up the downstream job and point its assigned label at the
// node label matching this build's ARCH parameter.
def encoder = Hudson.instance.getItem('video-encoder')
def arch = build.buildVariableResolver.resolve('ARCH')
def label = Hudson.instance.getLabel(arch)
encoder.setAssignedLabel(label)
Invoke the downstream project video-encoder (non-blocking); don't forget to pass the ARCH parameter.
Check the option Set Build Name in the video-encoder job's configuration and set it to something like ${ENV,var="ARCH"} - #${BUILD_NUMBER}. It will let you track the build history easily.
Disable concurrent builds of the video-encoder-wrapper job. This prevents two different labels from being assigned to the video-encoder job at the same time.
Hope it helps
I have a non-conventional build script in Gradle that does cyclic compilation of projects.
It's gonna take weeks to change it to a standard Gradle build, so that's not gonna happen now.
The issue is that I want to stop using Ant in my script and move to Groovy + Gradle only.
The question is how to replace tasks like copy, replaceregexp, or unzip.
I assume the standard Gradle base plugin and Java plugin give me most of what I need, but they are all tasks. Now, if a method in my main task needs a copy operation, how do I go about it?
Is there a way to call a task's code? Is there a way to call the task itself from a Groovy script?
First of all, you should never call a task from another task - bad things will happen if you do. Instead, you should declare a relationship between the two tasks (dependsOn, mustRunAfter, finalizedBy).
In some cases (fewer than people tend to believe), chaining tasks may not be flexible enough; hence, for some tasks (e.g. Copy) an equivalent method (e.g. project.copy) is provided. However, these methods should be used with care. In many cases, tasks are the better choice, as they are the basic building blocks of Gradle and offer many advantages (e.g. automatic up-to-date checks).
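For illustration, a minimal sketch of those method-style equivalents inside an ad-hoc task; the paths and the @VERSION@ token are hypothetical. copy with a filter closure stands in for Ant's copy/replaceregexp, and copying from zipTree stands in for unzip:

task prepareConfig {
    doLast {
        // copy + regex replacement (replaces Ant's <copy>/<replaceregexp>)
        copy {
            from 'src/templates'
            into "$buildDir/config"
            filter { line -> line.replaceAll(/@VERSION@/, project.version.toString()) }
        }
        // unzip (replaces Ant's <unzip>): zipTree unpacks the archive lazily
        copy {
            from zipTree('libs/archive.zip')
            into "$buildDir/unzipped"
        }
    }
}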
Occasionally it also makes sense to use a GradleBuild task, which allows one build to be executed as part of another.