Can GraphicsMagick batch process on more than 2 threads?

If I create 6 GraphicsMagick batch files for converting 35k images, htop shows only two busy threads.
Why aren't more threads being used? I'm guessing that both of those threads are even on the same core (4-core Intel with hyperthreading). I can't find a GraphicsMagick configuration option about this online. Do I blame my OS for poor scheduling?
The only related option in the gm man page, -limit <type> <value>, is a resource limit per image, while I am looking for a way to increase the number of threads used for multiple images, not for a single image.
It is true that the only thing GraphicsMagick says about parallelism concerns OpenMP (which seems to be about multi-threaded processing of a single image). So maybe there is no support for what I am trying to do. My question might be more general, then: "if I launch multiple instances of gm, why do they all run on the same thread?" I'm not sure if this is an OS question or a gm question.
Each line in the batch files is:
convert in/file1.jpeg -fuzz 10% -trim -scale 112x112^ -gravity center -extent 112x112 -quality 100 out/file2.jpeg
I run the batch file with: gm batch -echo on -feedback on data/convert/simple_crop_batchfile_2.txt
I am on GraphicsMagick 1.3.18 2013-03-10 Q8 and Ubuntu 14.10, which, when I upgrade with apt-get, tells me: Calculating upgrade... graphicsmagick is already the newest version
My experience here does show the pointlessness of using multiple batch files (although running 2 batch files concurrently did give a 30% speedup in overall processing time over running 1).

It turns out I can blame this on the CPU: the core-utilization screenshot in the question comes from an Intel Xeon X5365 @ 3.00GHz, while the same test with just 4 concurrent processes was repeated on an Intel Xeon E5-2620 v2 @ 2.10GHz.
The OS and software versions are the same on the two machines (as is the exact same task with the exact same data); the only difference is the CPU. In this case the latter CPU is over 2x as fast (for the case of 4 batch files).
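For reference, one way to drive several concurrent gm batch processes from a single command list is to split the batch file and background one gm instance per chunk. This is only a sketch using GNU coreutils' split and the batch file path from the question; the chunk count and prefix are arbitrary:
split -n l/4 data/convert/simple_crop_batchfile_2.txt chunk_   # 4 chunks, split on line boundaries: chunk_aa .. chunk_ad
for f in chunk_a?; do gm batch -echo on -feedback on "$f" & done   # one gm process per chunk
wait   # block until all four have finished
Whether this actually helps still depends on the CPU, as the comparison above shows.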

Related


How do I isolate 3 cores of a quadcore from Linux and use them for Halcon, exclusively?
Here is what I've tried so far:
I configured Linux to use only core 0 of the quad-core CPU via the boot option isolcpus=1,2,3
I started my multi-threaded C++ program and let one thread configure HALCON with a few HSystem::SetSystem() calls. This is the HALCON main thread. By default, the "thread_pool" option is set to "true" (but I also tried "false"). And, importantly, at the start the run function of the HALCON main thread calls pthread_setaffinity(getpid(), sizeof(set), &set); on a cpu_set_t set to which I added cores 1, 2 and 3 with CPU_SET(index, &set).
Anyway, now reading a QR matrix code in "Maximum" mode should start several threads on cores 1, 2 and 3. But it doesn't work. It only runs on core 1 with almost 90% CPU load, while cores 2 and 3 stay at 0% CPU load (seen with top -H). This looks to me as if HALCON is missing some magic option to use all 3 cores.
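Before digging further into HALCON options, it may be worth verifying from outside the program that the isolation and the affinity mask really are what you expect. A quick sketch; my_halcon_app is a placeholder for the name of your C++ program, and it assumes only one matching process:
cat /proc/cmdline                         # confirm isolcpus=1,2,3 really is on the kernel command line
taskset -cp "$(pgrep -f my_halcon_app)"   # show which CPUs the running process is currently allowed to use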
Are you 100% sure this should run in parallel?
Could you try it with a different code type (ECC200)? According to https://www.mvtec.com/products/halcon/documentation/release-notes-1911-0/ (the Speedup section), we know for sure that the ECC200 reader is parallelized internally by HALCON. If that reader runs in parallel on your system and the QR code reader doesn't, I would assume the QR code reader simply isn't parallelized by HALCON.

How to make android faster on ubuntu 14.04?

I have a PC with Ubuntu 14.04; the configuration is:
CPU: Intel® Core™ i7-4790 CPU @ 3.60GHz × 8
Memory: 16GB
And a server with ubuntu-14.04-server; the configuration is:
CPU: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz × 4
Memory: 32GB
And I was trying to build Android with the following command:
make -j
Then the computer hangs...
So how should I choose the value for -j to make the build as fast as possible?
I suppose the value should be the number of CPU cores...
make -j with no number launches as many jobs in parallel as the build dependencies allow; in a build this size that is way too many for most consumer-class machines, so the system becomes barely responsive and eventually runs out of PIDs or memory.
You need to add a number to that -j. Which brings you to this Q&A:
https://unix.stackexchange.com/questions/208568/how-to-determine-the-maximum-number-to-pass-to-make-j-option
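A common starting point (assuming GNU make and coreutils, so nproc is available) is to cap the job count at the number of logical CPUs, optionally also capping the load average so the machine stays responsive:
make -j"$(nproc)"                 # one job per logical CPU
make -j"$(nproc)" -l"$(nproc)"    # additionally stop spawning new jobs while the load average exceeds the CPU count
On a RAM-limited machine you may still need fewer jobs than cores.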
If you want to go into fine tuning you might find the BuildIn tool useful (disclaimer - I'm the author): https://apartsw-buildin.appspot.com/

multithreading ghostscript with dNumRenderingThreads=4 NO SPEED IMPROVEMENT

I'm trying to render a .pdf to a .png file using multithreaded Ghostscript 9.07, installed from the .exe file.
For this I call the following command:
gswin64c.exe -dNumRenderingThreads=4 -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r300 -sOutputFile=Graphic1.png Graphic1.pdf
My system is Windows 8 x64 running on a quad-core AMD Phenom II processor, and my test graphic is a single-page 109 MB PDF.
The command takes the same time (about 32 s at 300 dpi) regardless of whether -dNumRenderingThreads is set or not. What's more, Windows Task Manager shows that the gs process uses only 2 threads (one for parsing and one for rendering, as far as I know).
What am I doing wrong that rendering is not spread across multiple threads?
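One thing worth checking, as a hedged suggestion rather than a confirmed fix: -dNumRenderingThreads only parallelizes the banded (clist) rendering path, so if Ghostscript decides the whole page fits within the raster memory allowed by -dMaxBitmap, it renders the page in one piece on a single thread. Lowering -dMaxBitmap should force banding, for example:
gswin64c.exe -dNumRenderingThreads=4 -dMaxBitmap=1000000 -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r300 -sOutputFile=Graphic1.png Graphic1.pdf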

Matlab 2011a Use all Cores Available on 64 bit Linux?

Hi, I've looked online but I can't seem to find an answer: do I need to do anything to make MATLAB use all cores? From what I understand, multi-threading has been supported since 2007. On my machine MATLAB only uses one core at 100% and the rest sit at ~2%. I'm using 64-bit Linux (Mint 12). On my other computer, which has only 2 cores and is 32-bit, MATLAB seems to utilize both cores at 100%, not all of the time but in a sufficient number of cases. On the 64-bit, 4-core PC this never happens.
Do I have to do anything on 64-bit to get MATLAB to use all the cores whenever possible? I had to do some custom linking after install, as MATLAB wasn't finding the libraries (e.g. libc.so.6) because it wasn't looking in the correct places.
As standard, since the latest release, you can use up to 12 cores using the Parallel Computing Toolbox. Without this toolbox, I guess you're out of luck. Any additional cores can be accessed via the MATLAB Distributed Computing Server, where you actually pay per number of worker threads.
To make MATLAB use your multiple cores you have to run:
matlabpool open
And of course it works better if you actually have parallel code (for example, using the spmd function or parfor loops).
More info is available on the MATLAB homepage.
MATLAB has only a single thread for computation.
That said, multiple threads are created for certain functions that use the multithreaded features of the BLAS libraries it uses underneath.
Thus, you only gain a 'multi-threaded' advantage if you are calling functions that use these multithreaded BLAS libraries.
This link has information on the list of functions which are multithreaded.
Now, as for the use of your cores, that depends on your OS; the OS has to load-balance your threads across all cores. One CANNOT set affinities for threads from within MATLAB. One can, however, give worker MATLAB processes affinities to cores from within the Parallel Computing Toolbox.
However, you could always try manually setting the affinity of the MATLAB process to all your processors, using the details available at the following link for Linux.
Windows users can simply right click on the process in the task manager and set affinity.
My understanding is that this is only a request to the OS and is not a hard binding rule that the OS must adhere to.
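For the Linux route mentioned above, the usual manual tool is taskset; a small sketch, where <matlab_pid> is a placeholder for the PID of the running MATLAB process:
taskset -acp 0-3 <matlab_pid>   # ask the OS to allow all of MATLAB's threads on cores 0-3
taskset -cp <matlab_pid>        # verify the resulting affinity mask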

FIFO/Pipe changes between Debian 5 and 6

We're currently building a chain of Linux tools to do some realtime encoding for video broadcast purposes. To achieve this, we created a program in C++ that spawns some ffmpeg decoder processes (for both audio and video), pipes their output to the encoders (ffmpeg & mpeg2enc) through FIFOs, and then pipes the encoded output to our muxer, which caches a few MB of data and then outputs the muxed stream through an ASI output card.
On Debian 5, this setup works flawlessly and generally doesn't even create a high CPU load. On Debian 6 and Ubuntu 10.04, however, the internal buffer of the muxer gradually decreases until it hits zero, after which frequent output hiccups start to occur.
Using nice and ionice doesn't seem to fix this issue. I've also tried various custom kernel compile options (increased frequency, preemption, etc.) but this also doesn't seem to work.
Although it is possible that there has been a serious regression in either ffmpeg or mpeg2enc, I'm guessing that the problem has to do with the way the new kernel/distro handles FIFOs.
Does anybody know what could be causing this problem? Or what recent changes in Debian, its kernel configuration (between versions 5 and 6), and Ubuntu could possibly have caused this undesired behaviour?
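Not an answer to the kernel question, but one way to narrow it down: inserting a pipe meter such as pv between a decoder and its FIFO shows which stage stops keeping up when the muxer's buffer starts draining. A rough sketch only; the command, input file and FIFO name are placeholders for your actual pipeline:
mkfifo /tmp/video.fifo
ffmpeg -i input.ts -f yuv4mpegpipe - | pv -ar > /tmp/video.fifo &   # -a average rate, -r current rate into the FIFO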
