Writing "fib" to run in parallel: -N2 is slower? - haskell

I'm learning Haskell and trying to write code that executes in parallel, but Haskell always runs it sequentially. And when I execute with the -N2 runtime flag, it takes more time than if I omit the flag.
Here is the code:
import Control.Parallel
import Control.Parallel.Strategies

fib :: Int -> Int
fib 1 = 1
fib 0 = 1
fib n = fib (n - 1) + fib (n - 2)

fib2 :: Int -> Int
fib2 n = a `par` (b `pseq` (a + b))
  where
    a = fib n
    b = fib n + 1

fib3 :: Int -> Int
fib3 n = runEval $ do
  a <- rpar (fib n)
  b <- rpar (fib n + 1)
  rseq a
  rseq b
  return (a + b)

main = putStrLn (show (fib3 40))
What did I do wrong? I tried this sample on Windows 7 on an Intel Core i5 and on Linux on an Atom.
Here is the log from my console session:
ghc -rtsopts -threaded -O2 test.hs
[1 of 1] Compiling Main ( test.hs, test.o )
test +RTS -s
331160283
64,496 bytes allocated in the heap
2,024 bytes copied during GC
42,888 bytes maximum residency (1 sample(s))
22,648 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 0 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
Parallel GC work balance: nan (0 / 0, ideal 1)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 6.59s) 0.00s ( 0.00s)
Task 1 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 2 (bound) : 6.33s ( 6.59s) 0.00s ( 0.00s)
SPARKS: 2 (0 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 6.33s ( 6.59s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 6.33s ( 6.59s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 10,191 bytes per MUT second
Productivity 100.0% of total user, 96.0% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0
test +RTS -N2 -s
331160283
72,688 bytes allocated in the heap
5,644 bytes copied during GC
28,300 bytes maximum residency (1 sample(s))
24,948 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 1 parallel, 0.00s, 0.01s elapsed
Parallel GC work balance: 1.51 (937 / 621, ideal 2)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 0.00s ( 9.29s) 0.00s ( 0.00s)
Task 1 (worker) : 4.53s ( 9.29s) 0.00s ( 0.00s)
Task 2 (bound) : 5.84s ( 9.29s) 0.00s ( 0.01s)
Task 3 (worker) : 0.00s ( 9.29s) 0.00s ( 0.00s)
SPARKS: 2 (1 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.38s ( 9.29s elapsed)
GC time 0.00s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 10.38s ( 9.30s elapsed)
%GC time 0.0% (0.1% elapsed)
Alloc rate 7,006 bytes per MUT second
Productivity 100.0% of total user, 111.6% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0

I think the answer is that "GHC will optimise the fib function so that it does no allocation, and
computations that do no allocation cause problems for the RTS because
the scheduler never gets to run and do load-balancing (which is
necessary for parallelism)", as Simon wrote in this discussion group. I also found a good tutorial.

Related

What is the difference between perf with events and without events?

I want to profile the run time and some other information about a program I created myself.
I'm using the Linux tool perf to measure the run time.
The command I used is like below:
$ perf stat -r 10000 ./program > /dev/null
Performance counter stats for './program' (10000 runs):
2.510453 task-clock (msec) # 0.946 CPUs utilized ( +- 0.04% )
0 context-switches # 0.002 K/sec ( +- 15.59% )
0 cpu-migrations # 0.000 K/sec
50 page-faults # 0.020 M/sec ( +- 0.02% )
9237800 cycles # 3.680 GHz ( +- 0.00% )
4011695 instructions # 0.43 insn per cycle ( +- 0.00% )
689371 branches # 274.600 M/sec ( +- 0.00% )
5144 branch-misses # 0.75% of all branches ( +- 0.03% )
0.002653910 seconds time elapsed ( +- 0.04% )
And then I used the -e flag to choose the events myself.
$ perf stat -r 10000 -e cache-misses ./program > /dev/null
Performance counter stats for './program' (10000 runs):
941 cache-misses ( +- 0.69% )
0.003506780 seconds time elapsed ( +- 0.25% )
You can see that the run times of the two commands are totally different. The run time of the command with the event flag is slower than the one without the flag.
Why does that happen? And which command's run time is the same as the run time a real user would see?
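One way to cross-check which figure matches what a user actually experiences (my own suggestion, not from the question above): measure wall-clock time yourself, independently of perf's instrumentation, and compare it against the "seconds time elapsed" lines. In this sketch, sleep 0.2 stands in for ./program, and GNU date's nanosecond format is assumed:

```shell
# Wall-clock measurement independent of perf; "sleep 0.2" stands
# in for ./program, and %N requires GNU date.
start=$(date +%s%N)
sleep 0.2
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "elapsed: ${elapsed_ms} ms"
```

Whichever perf figure agrees with an independent measurement like this is the better estimate of user-visible run time; differences between perf's own elapsed figures usually come from per-run setup cost and counter handling, so hedge accordingly rather than trusting either number alone.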

How to read a log file for the last n minutes in Linux

I have a file in the following format and I want to read it for the last n minutes.
2019-09-22T04:00:03.052+0000: 774093.613: [GC (Allocation Failure)
Desired survivor size 47710208 bytes, new threshold 15 (max 15)
[PSYoungGen: 629228K->22591K(650752K)] 1676693K->1075010K(2049024K), 0.0139764 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
I want to read the log file for the last x minutes based on user requirements, so that I can monitor the last 30 minutes or 120 minutes as required.
I have tried the option below to read the file, but it seems it's not working as expected:
awk -F - -vDT="$(date --date="60 minutes ago" +"%Y-%m-%dT%H:%M:%S")" ' DT > $NF,$0' gc-2019-09-13-04-58.log.0.current
Also, in the above command there is the "60 minutes ago" option, which I tried to pass as a variable, like v1=30; date --date="$v1 minutes ago", but that is not working either.
Please suggest how to read this file for the last x minutes.
Here is one for GNU awk (time functions and gensub()). First the test data: two lines of your data, with the year changed in the first one:
2018-09-22T04:00:03.052+0000: 774093.613: [GC (Allocation Failure)
Desired survivor size 47710208 bytes, new threshold 15 (max 15)
[PSYoungGen: 629228K->22591K(650752K)] 1676693K->1075010K(2049024K), 0.0139764 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
2019-09-22T04:00:03.052+0000: 774093.613: [GC (Allocation Failure)
Desired survivor size 47710208 bytes, new threshold 15 (max 15)
[PSYoungGen: 629228K->22591K(650752K)] 1676693K->1075010K(2049024K), 0.0139764 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
and the awk program, to which the data is fed backwards using tac:
$ tac file | gawk '
BEGIN {
    threshold = systime() - 10*60*60    # time threshold is set to 10 hrs
    # threshold = systime() - mins*60   # uncomment and replace the above
}                                       # for a command line switch
{
    if (match($1, /^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}/)) {
        if (mktime(gensub(/[-T:]/, " ", "g", substr($1, RSTART, RLENGTH))) < threshold)
            exit                        # exit once the first record beyond the threshold is found
        print $0 b                      # output current record and the buffer
        b = ""                          # reset the buffer
    } else                              # for records that do not start with a timestamp:
        b = ORS $0 b                    # buffer them
}'
You could write the program code between the 's to a file, say program.awk, and run it with tac file | gawk -f program.awk. Furthermore, you can add a command line switch by uncommenting the marked line in the BEGIN section and running with:
$ gawk -v mins=10 -f program.awk <(tac file)
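An alternative sketch (my own, not from the answer above): because these zero-padded, year-first ISO-8601 timestamps sort chronologically as plain strings, you can compute a cutoff string with GNU date and let an ordinary string comparison do the filtering, with no gawk time functions needed. The file sample.log and its two entries are made up for the demo:

```shell
# Build a tiny sample log: one stale entry and one recent one,
# using the same ISO-8601 timestamp layout as the GC log.
printf '%s: old entry\n'    "$(date -u -d '120 minutes ago' +%Y-%m-%dT%H:%M:%S.000+0000)"  > sample.log
printf '%s: recent entry\n' "$(date -u -d '5 minutes ago'   +%Y-%m-%dT%H:%M:%S.000+0000)" >> sample.log

# ISO-8601 strings compare lexicographically in timestamp order,
# so a plain string comparison keeps the last 30 minutes.
since=$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%S)
result=$(awk -v since="$since" '/^[0-9]/ && $1 >= since' sample.log)
echo "$result"
```

The trade-off: this keeps only lines that start with a digit, so the continuation lines ("Desired survivor size ...") are dropped rather than attached to their timestamped record, which the buffering gawk program above handles properly.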

What does "file system outputs" mean with time -v?

What is 'file system outputs' counting when using the Linux 'time' command with dd?
It doesn't equal dd's 'count' (presumably the number of calls to fwrite?), nor the size of the output in 4096-byte pages (which would be 1024000 in this example).
An example:
> /usr/bin/time -v dd if=/dev/zero of=/tmp/dd.test bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 4.94305 s, 849 MB/s
Command being timed: "dd if=/dev/zero of=/tmp/dd.test bs=4M count=1000"
User time (seconds): 0.00
System time (seconds): 4.72
Percent of CPU this job got: 95%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.94
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 5040
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1322
Voluntary context switches: 32
Involuntary context switches: 15
Swaps: 0
File system inputs: 240
File system outputs: 8192000
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The time command prints out values from the rusage struct (see getrusage(2)).
And according to the kernel source:
/*
 * We approximate number of blocks, because we account bytes only.
 * A 'block' is 512 bytes
 */
static inline unsigned long task_io_get_oublock(const struct task_struct *p)
{
        return p->ioac.write_bytes >> 9;
}
So (at least on Linux) "File system outputs" in the time output is the total number of bytes written divided by 512.
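A quick arithmetic check (mine, using the numbers from the run above) confirms the factor of 512:

```shell
# dd wrote 1000 records of 4 MiB each; time -v reported the total
# in 512-byte blocks, per task_io_get_oublock above.
bytes=$(( 1000 * 4 * 1024 * 1024 ))
blocks=$(( bytes / 512 ))
echo "$bytes bytes = $blocks blocks"
```

This prints 4194304000 bytes = 8192000 blocks, exactly matching the "File system outputs: 8192000" line.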

CPU and HDD information

I searched but found nothing for my problem.
I would like to get, on the Linux command line, information about CPU usage and the local HDDs, with text formatted exactly like the examples below, for my program.
These examples are command line outputs on MS Windows.
I hope it is possible on Linux, too.
Thank you
wmic logicaldisk where drivetype=3 get caption,freespace,size
Caption FreeSpace Size
C: 135314194432 255953203200
D: 126288519168 128033222656
E: 336546639872 1000194015232
F: 162184503296 1000194015232
wmic cpu get loadpercentage
LoadPercentage
4
You won't find anything exactly like the output you provided.
For disk space, the only real option is df:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 73216256 27988724 41485276 41% /
devtmpfs 8170164 0 8170164 0% /dev
tmpfs 8203680 544 8203136 1% /dev/shm
tmpfs 8203680 12004 8191676 1% /run
tmpfs 5120 4 5116 1% /run/lock
tmpfs 8203680 0 8203680 0% /sys/fs/cgroup
/dev/sdb1 482922 83939 374049 19% /boot
and for cpu you have many more options, e.g.
vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 11865304 149956 1474172 0 0 53 46 126 707 3 0 96 0 0
or top -b | head:
top - 21:48:43 up 54 min, 1 user, load average: 0.13, 0.17, 0.22
Tasks: 188 total, 1 running, 187 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.0 us, 0.4 sy, 0.1 ni, 96.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16407364 total, 11848936 free, 2888844 used, 1669584 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 13230972 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 40544 6440 3780 S 0.0 0.0 0:01.15 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
There is no command that gives you a load percentage of the CPU. It's actually impossible to get that with a single system call (neither in Linux nor in Windows). What you can get is the number of ticks executed so far (for each field: user, system, io, irq, idle...), then read it again a certain amount of time later and calculate the difference. That is how all the commands that report a CPU percentage work.
Here is a bash script that does that (just create a file named, for example, cpu.sh, paste this code, and execute it to see the results):
_estado()
{
    cat /proc/stat | grep "cpu " | sed -e 's/  */:/g' -e 's/^cpu://'
}
_ticksconcretos()
{
    echo $1 | cut -d ':' -f $2
}
while true ; do
    INICIAL=$(_estado)
    sleep 1
    FINAL=$(_estado)
    UsuarioI=$(_ticksconcretos $INICIAL 1)
    UsuarioF=$(_ticksconcretos $FINAL 1)
    NiceI=$(_ticksconcretos $INICIAL 2)
    NiceF=$(_ticksconcretos $FINAL 2)
    SistemaI=$(_ticksconcretos $INICIAL 3)
    SistemaF=$(_ticksconcretos $FINAL 3)
    idleI=$(_ticksconcretos $INICIAL 4)
    idleF=$(_ticksconcretos $FINAL 4)
    IOI=$(_ticksconcretos $INICIAL 5)
    IOF=$(_ticksconcretos $FINAL 5)
    IRQI=$(_ticksconcretos $INICIAL 6)
    IRQF=$(_ticksconcretos $FINAL 6)
    SOFTIRQI=$(_ticksconcretos $INICIAL 7)
    SOFTIRQF=$(_ticksconcretos $FINAL 7)
    STEALI=$(_ticksconcretos $INICIAL 8)
    STEALF=$(_ticksconcretos $FINAL 8)
    InactivoF=$(( $idleF + $IOF ))
    InactivoI=$(( $idleI + $IOI ))
    ActivoI=$(( $UsuarioI + $NiceI + $SistemaI + $IRQI + $SOFTIRQI + $STEALI ))
    ActivoF=$(( $UsuarioF + $NiceF + $SistemaF + $IRQF + $SOFTIRQF + $STEALF ))
    TOTALI=$(( $ActivoI + $InactivoI ))
    TOTALF=$(( $ActivoF + $InactivoF ))
    PORC=$(( ( ( $TOTALF - $TOTALI ) - ( $InactivoF - $InactivoI ) ) * 100 / ( $TOTALF - $TOTALI ) ))
    clear
    echo "CPU: $PORC %"
done
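The same two-sample idea can be condensed with the shell's read builtin (my own sketch, Linux-only since it parses /proc/stat; the variable names are mine, and the field order follows the kernel's aggregate "cpu" line):

```shell
# Sample the aggregate "cpu" line twice, one second apart.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
read -r _ u1 n1 s1 i1 w1 q1 sq1 st1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 q2 sq2 st2 _ < /proc/stat

idle=$(( (i2 + w2) - (i1 + w1) ))    # idle + iowait delta
total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2 + st2) \
        - (u1 + n1 + s1 + i1 + w1 + q1 + sq1 + st1) ))
pct=$(( (total - idle) * 100 / total ))
echo "CPU: $pct %"
```

Because all the counters are monotonic, (total - idle) can never exceed total, so the result always lands between 0 and 100.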
For the free space You could use something like this:
df -h -x tmpfs -x devtmpfs | awk -F " " '{print $1 " " $4 " " $2}'
which will give you this output:
Filesystem Free Size
/dev/sda1 16G 25G
/dev/sda5 46G 79G
/dev/sdb8 130G 423G
sda represents the first disk, sda1 the first partition, sda2 the second one, etc. You can add (or change) $6 inside the print to get the mount points instead of the partitions, change the order, or do even more.

for and start commands in a batch for parallel and sequential work

I have an 8-core CPU with 8 GB of RAM, and I'm creating a batch file to automate the 7-Zip CLI, exhausting most parameters and variables to compress the same set of files, with the ultimate goal of finding the strongest combination of parameters and variables that results in the smallest archive size possible.
This is very time consuming by nature, especially when the set of files to be processed runs into gigabytes. I need a way not only to automate but also to speed up this whole process.
7-Zip works with different compression algorithms: some are single-threaded only and some are multi-threaded, some do not require much memory, and some require huge amounts of it and could even surpass the 8 GB barrier. I've already successfully created an automated batch that works in sequence and excludes combinations requiring more than 8 GB of memory.
I've split the different compression algorithms into several batches to simplify the whole process. For example, compression with PPMd as a 7z archive uses 1 thread and up to 1024 MB. This is my current batch:
@echo off
echo mem=1m 2m 3m 4m 6m 8m 12m 16m 24m 32m 48m 64m 96m 128m 192m 256m 384m 512m 768m 1024m
echo o=2 3 4 5 6 7 8 10 12 14 16 20 24 28 32
echo s=off 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1g 2g 4g 8g 16g 32g 64g on
echo x=1 3 5 7 9
for %%x IN (9) DO for %%d IN (1024m 768m 512m 384m 256m 192m 128m 96m 64m 48m 32m 24m 16m 12m 8m 6m 4m 3m 2m 1m) DO for %%w IN (32 28 24 20 16 14 12 10 8 7 6 5 4 3 2) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.ppmd.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -m0=PPMd:mem=%%d:o=%%w -ms=%%s
exit
x, s, o and mem are parameters, and what's after each of them are the variables which 7z.exe will work with. x and s in this case are of no concern, they mean compression strength and solid block size for the archive.
That batch works fine, but it is limited to running only one instance of 7z.exe at a time, and now I'm looking for a way to run more 7z.exe instances in parallel, without exceeding 8 GB of RAM or 8 threads at once, whichever comes first, before proceeding to the next ones in the sequence.
How can I improve this? I have some ideas but I don't know how to make them work in a batch. I was thinking of 2 other variables that wouldn't interact with the 7z processes but would control when the next 7z instance starts: one variable would keep track of how many threads are currently in use, and another would track how much memory is in use. Could that work?
Edit:
Sorry, I need to add details; I'm new to this posting style. Following this answer - https://stackoverflow.com/a/19481253/2896127 - I mentioned that 8 batches were created and that the 7z.PPMd batch was one of them. Maybe listing all the batches and how 7z deals with the parameters will give a better insight into the whole issue. I'll start with the simple ones:
7z.PPMd - 1 fully utilized thread and dictionary dependent 32m-1055m memory usage per instance.
7z.BZip2 - 8 fully utilized threads and fixed 109m memory usage per instance.
zip.Bzip2 - 8 partially utilized threads and fixed 336m memory usage per instance.
zip.Deflate - 8 partially utilized threads and fixed 260m memory usage per instance.
zip.PPMd - 8 partially utilized threads and dictionary dependent 280m-2320m memory usage per instance.
What I mean by partially utilized threads is that, while I assign 8 threads to each 7z.exe instance, the algorithm's CPU usage varies randomly, out of my control and unpredictable, but the limit is set there: no more than 8 threads. In the case of 8 fully utilized threads, each instance is utilizing 100% of my 8-core CPU.
The most complex ones - 7z.LZMA, 7z.LZMA2, zip.LZMA - need to be explained in detail, but I am running short on time now. I'll come back to edit the LZMA part when I have more free time.
Thanks again.
EDIT: Adding in LZMA part.
7z.LZMA - each instance is n-threaded, ranging from 1 to 2:
1 fully utilized thread, dictionary dependent, 64k to 512m:
64k dictionary uses 32m memory
...
512m dictionary uses 5407m memory
excluded range: 768m to 1024m (above the limit of 8192m memory available)
2 partially utilized threads, dictionary dependent, 64k to 512m:
64k dictionary uses 38m memory
...
512m dictionary uses 5413m memory
excluded range: 768m to 1024m (above the limit of 8192m memory available)
7z.LZMA2 - each instance is n-threaded, ranging from 1 to 8:
1 fully utilized thread, dictionary dependent, 64k to 512m:
64k dictionary uses 32m memory
...
512m dictionary uses 5407m memory
excluded range: 768m to 1024m (above the limit of 8192m memory available)
2 or 3 partially utilized threads, dictionary dependent, 64k to 512m:
64k dictionary uses 38m memory
...
512m dictionary uses 5413m memory
excluded range: 768m to 1024m (above the limit of 8192m memory available)
4 or 5 partially utilized threads, dictionary dependent, 64k to 256m:
64k dictionary uses 51m memory
...
256m dictionary uses 5677m memory
excluded range: 384m to 1024m (above the limit of 8192m memory available)
6 or 7 partially utilized threads, dictionary dependent, 64k to 192m:
64k dictionary uses 62m memory
...
192m dictionary uses 6965m memory
excluded range: 256m to 1024m (above the limit of 8192m memory available)
8 partially utilized threads, dictionary dependent, 64k to 128m:
64k dictionary uses 72m memory
...
128m dictionary uses 6717m memory
excluded range: 192m to 1024m (above the limit of 8192m memory available)
zip.LZMA - each instance is n-threaded, ranging from 1 to 8:
1 fully utilized thread, dictionary dependent, 64k to 512m:
64k dictionary uses 3m memory
...
512m dictionary uses 5378m memory
excluded range: 768m to 1024m (above the limit of 8192m memory available)
2 or 3 partially utilized threads, dictionary dependent, 64k to 512m:
64k dictionary uses 9m memory
...
512m dictionary uses 5384m memory
excluded range: 768m to 1024m (above the limit of 8192m memory available)
4 or 5 partially utilized threads, dictionary dependent, 64k to 256m:
64k dictionary uses 82m memory
...
256m dictionary uses 5456m memory
excluded range: 384m to 1024m (above the limit of 8192m memory available)
6 or 7 partially utilized threads, dictionary dependent, 64k to 256m:
64k dictionary uses 123m memory
...
256m dictionary uses 8184m (very close to the limit though, I may consider excluding it)
excluded range: 384m to 1024m (above the limit of 8192m memory available)
8 partially utilized threads, dictionary dependent, 64k to 128m:
64k dictionary uses 164m memory
...
128m dictionary uses 5536m memory
excluded range: 192m to 1024m (above the limit of 8192m memory available)
I'm trying to understand the behaviour of the commands with nul in them. I don't quite understand what's happening during that part, or what the symbols ^ > ^&1 "" are meant to say.
2>nul del %lock%!nextProc!
%= Redirect the lock handle to the lock file. The CMD process will =%
%= maintain an exclusive lock on the lock file until the process ends. =%
start /b "" cmd /c %lockHandle%^>"%lock%!nextProc!" 2^>^&1 !cpu%%N! !cmd!
)
set "launch="
Then later on, at the :wait code:
) 9>>"%lock%%%N"
) 2>nul
if %endCount% lss %startCount% (
1>nul 2>nul ping /n 2 ::1
goto :wait
)
2>nul del %lock%*
EDIT 2 (29-10-2013): Adding the current state of the situation.
After trial-and-error research, complemented with step-by-step notes of what was happening, I was able to understand the behaviour above. I simplified the line with the start command to this:
start /b /low cmd /c !cmd!>"%lock%!nextProc!"
Though it works, I still don't understand the meaning of 1^>"filename" 2^>^&1 'command'. I know it is related to writing into the file text that would otherwise be displayed to me; in this case, all of 7z.exe's output is written to the file. Until the 7z.exe instance finishes its job, nothing is written to the file; the file already exists, yet at the same time it is incomplete. When 7z.exe actually finishes, the file is finalized, and this time it exists for the next part of the script.
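For what it's worth, cmd's redirection here mirrors the POSIX shell's: 1> sends stdout to the file, 2>&1 then points stderr at the same destination, and the ^ characters only stop cmd from applying the redirection to the start command itself instead of the command it launches. A quick shell sketch of the 1>file 2>&1 part (the file name both.txt is made up for the demo):

```shell
# Both streams end up in the file; the console shows nothing.
{ echo "this went to stdout"; echo "this went to stderr" >&2; } 1> both.txt 2>&1
cat both.txt
```

The lock trick in the batch script adds one more layer on top of this: handle 9 is redirected to the lock file, and the OS keeps that file locked until the launched process exits, which is how the :wait loop detects completion.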
Now I understand the processing behaviour of the suggested script, and I'm complementing it with something of my own: I am trying to merge all the batches into one "batch does it all" script. In the simplified version, this is it:
echo 8 threads - maxproc=1
for %%x IN (9) DO for %%t IN (8) DO for %%d IN (900k) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.bzip2.%%tt.%%dd.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=BZip2:d=%%d:mt=%%t
for %%x IN (9) DO for %%t IN (8) DO for %%d IN (900k) DO 7z.exe a teste.resultado\%%xx.bzip2.%%tt.%%dd.zip .\teste.original\* -mx=%%x -mm=BZip2:d=%%d -mmt=%%t
for %%x IN (9) DO for %%t IN (8) DO for %%w IN (257 256 192 128 96 64 48 32 24 16 12 8) DO 7z.exe a teste.resultado\%%xx.deflate64.%%tt.%%ww.zip .\teste.original\* -mx=%%x -mm=deflate64:fb=%%w -mmt=%%t
for %%x IN (9) DO for %%t IN (8) DO for %%w IN (258 256 192 128 96 64 48 32 24 16 12 8) DO 7z.exe a teste.resultado\%%xx.deflate.%%tt.%%ww.zip .\teste.original\* -mx=%%x -mm=deflate:fb=%%w -mmt=%%t
for %%x IN (9) DO for %%t IN (8) DO for %%d IN (256m 128m 64m 32m 16m 8m 4m 2m 1m) DO for %%w IN (16 15 14 13 12 11 10 9 8 7 6 5 4 3 2) DO 7z.exe a teste.resultado\%%xx.ppmd.%%tt.%%dd.%%ww.zip .\teste.original\* -mx=%%x -mm=PPMd:mem=%%d:o=%%w -mmt=%%t
echo 4 threads - maxproc=2
for %%x IN (9) DO for %%t IN (4) DO for %%d IN (256m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.lzma2.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=lzma2:d=%%d:fb=%%w -mmt=%%t
echo 2 threads - maxproc=4
for %%x IN (9) DO for %%t IN (2) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=LZMA:d=%%d:fb=%%w -mmt=%%t
for %%x IN (9) DO for %%t IN (2) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.lzma2.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=lzma2:d=%%d:fb=%%w -mmt=%%t
for %%x IN (9) DO for %%t IN (2) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO 7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.zip .\teste.original\* -mx=%%x -mm=lzma:d=%%d:fb=%%w -mmt=%%t
echo 1 threads - maxproc=8
for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=LZMA:d=%%d:fb=%%w -mmt=%%t
for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.lzma2.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=lzma2:d=%%d:fb=%%w -mmt=%%t
for %%x IN (9) DO for %%d IN (1024m 768m 512m 384m 256m 192m 128m 96m 64m 48m 32m 24m 16m 12m 8m 6m 4m 3m 2m 1m) DO for %%w IN (32 28 24 20 16 14 12 10 8 7 6 5 4 3 2) DO for %%s IN (on) DO 7z.exe a teste.resultado\%%xx.ppmd.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -m0=PPMd:mem=%%d:o=%%w -ms=%%s
for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO 7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.zip .\teste.original\* -mx=%%x -mm=lzma:d=%%d:fb=%%w -mmt=%%t
In short, I want to process all of that in the most efficient manner possible. Deciding how many processes can run at a time would be one way, but there's also the memory required by each process: the sum of the memory required by all running processes must not exceed 8192 MB. I got this part working.
@echo off
setlocal enableDelayedExpansion
set "maxMem=8192"
set "maxThreads=8"
:cycle1
set "cycleCount=4"
set "cycleThreads=1"
set "maxProc="
set /a "maxProc=maxThreads/cycleThreads"
set "cycleFor1=for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO ("
set "cycleFor2=for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO ("
set "cycleFor3=for %%x IN (9) DO for %%d IN (1024m 768m 512m 384m 256m 192m 128m 96m 64m 48m 32m 24m 16m 12m 8m 6m 4m 3m 2m 1m) DO for %%w IN (32 28 24 20 16 14 12 10 8 7 6 5 4 3 2) DO for %%s IN (on) DO ("
set "cycleFor4=for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO ("
set "cycleCmd1=7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=LZMA:d=%%d:fb=%%w -mmt=%%t"
set "cycleCmd2=7z.exe a teste.resultado\%%xx.lzma2.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=lzma2:d=%%d:fb=%%w -mmt=%%t"
set "cycleCmd3=7z.exe a teste.resultado\%%xx.ppmd.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -m0=PPMd:mem=%%d:o=%%w -ms=%%s"
set "cycleCmd4=7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.zip .\teste.original\* -mx=%%x -mm=lzma:d=%%d:fb=%%w -mmt=%%t"
set "tempMem1=5407"
set "tempMem2=5407"
set "tempMem3=1055"
set "tempMem4=5378"
rem set "tempMem1=5407"
rem set "tempMem2=5407"
rem set "tempMem3=1055 799 543 415 287 223 159 127 95 79 63 55 47 43 39 37 35 34 33 32"
rem set "tempMem4=5378"
set "memSum=0"
if not defined memRem set "memRem=!maxMem!"
for /l %%N in (1 1 %cycleCount%) DO (set "tempProc%%N=")
for /l %%N in (1 1 %cycleCount%) DO (
set memRem
set /a "tempProc%%N=%memRem%/tempMem%%N"
set /a "memSum+=tempMem%%N"
set /a "memRem-=tempMem%%N"
set /a "maxProc=!tempProc%%N!"
call :executeCycle
set /a "memRem+=tempMem%%N"
set /a "memSum-=tempMem%%N"
set /a "maxProc-=!tempProc%%N!"
)
goto :fim
:executeCycle
set "lock=lock_%random%_"
set /a "startCount=0, endCount=0"
for /l %%N in (1 1 %maxProc%) DO set "endProc%%N="
set launch=1
for %%x IN (9) DO for %%t IN (1) DO for %%d IN (512m) DO for %%w IN (273 256 192 128 96 64 48 32 24 16 12 8) DO for %%s IN (on) DO (
set "cmd=7z.exe a teste.resultado\%%xx.lzma.%%tt.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -ms=%%s -m0=LZMA:d=%%d:fb=%%w -mmt=%%t"
if !startCount! lss %maxProc% (
set /a "startCount+=1, nextProc=startCount"
) else (
call :wait
)
set cmd!nextProc!=!cmd!
echo !time! - proc!nextProc!: starting !cmd!
2>nul del %lock%!nextProc!
start /b /low cmd /c !cmd!>"%lock%!nextProc!"
)
set "launch="
:wait
for /l %%N in (1 1 %startCount%) do (
if not defined endProc%%N if exist "%lock%%%N" (
echo !time! - proc%%N: finished !cmd%%N!
if defined launch (
set nextProc=%%N
exit /b
)
set /a "endCount+=1, endProc%%N=1"
) 9>>"%lock%%%N"
) 2>nul
if %endCount% lss %startCount% (
1>nul 2>nul ping /n 2 ::1
goto :wait
)
2>nul del %lock%*
echo ===
echo Thats all folks!
exit /b
:fim
pause
I have trouble with cycleFor1 and cycleCmd1 located in the :cycle1 part - they should replace the for line and the first cmd variable inside :executeCycle to make it work as I intend. How do I do that?
The other issue I have is about tempMem3. I have logged all the memory required when the command cycleCmd3 runs; it is dictionary dependent. tempMem3 and cycleCmd3 are related like this:
for %%d IN (1024m 768m 512m 384m 256m 192m 128m 96m 64m 48m 32m 24m 16m 12m 8m 6m 4m 3m 2m 1m) DO
set "tempMem3=1055 799 543 415 287 223 159 127 95 79 63 55 47 43 39 37 35 34 33 32"
So 1024m would use 1055, 768m would use 799, and so on down to 1m using 32. I don't know how to translate that into the script.
Any help is appreciated.
I've already posted a robust batch solution that limits the number of parallel processes at Parallel execution of shell processes. That script uses a list of commands that is embedded within the script. Follow the link to see how it works.
I modified that script to generate the commands using FOR loops as per your question. I also set the limit to 8 simultaneous processes.
Your maximum memory is 1g, and you never have more than 8 processes, so I don't see how you could ever exceed 8g. If you increase the max memory per process, then you do have to worry about total memory: you will have to add additional logic to keep track of how much memory is being used and which cpu IDs are available. Note that batch numbers are limited to ~2g, so I recommend computing memory used in megabytes.
By default, the script hides the output of the commands. If you want to see the output, then run it with the /O option.
@echo off
setlocal enableDelayedExpansion
:: Display the output of each process if the /O option is used
:: else ignore the output of each process
if /i "%~1" equ "/O" (
set "lockHandle=1"
set "showOutput=1"
) else (
set "lockHandle=1^>nul 9"
set "showOutput="
)
:: Define the maximum number of parallel processes to run.
:: Each process number can optionally be assigned to a particular server
:: and/or cpu via psexec specs (untested).
set "maxProc=8"
:: Optional - Define CPU targets in terms of PSEXEC specs
:: (everything but the command)
::
:: If a cpu is not defined for a proc, then it will be run on the local machine.
:: I haven't tested this feature, but it seems like it should work.
::
:: set cpu1=psexec \\server1 ...
:: set cpu2=psexec \\server1 ...
:: set cpu3=psexec \\server2 ...
:: etc.
:: For this demo force all cpu specs to undefined (local machine)
for /l %%N in (1 1 %maxProc%) do set "cpu%%N="
:: Get a unique base lock name for this particular instantiation.
:: Incorporate a timestamp from WMIC if possible, but don't fail if
:: WMIC not available. Also incorporate a random number.
set "lock="
for /f "skip=1 delims=-+ " %%T in ('2^>nul wmic os get localdatetime') do (
set "lock=%%T"
goto :break
)
:break
set "lock=%temp%\lock%lock%_%random%_"
:: Initialize the counters
set /a "startCount=0, endCount=0"
:: Clear any existing end flags
for /l %%N in (1 1 %maxProc%) do set "endProc%%N="
:: Launch the commands in a loop
set launch=1
echo mem=1m 2m 3m 4m 6m 8m 12m 16m 24m 32m 48m 64m 96m 128m 192m 256m 384m 512m 768m 1024m
echo o=2 3 4 5 6 7 8 10 12 14 16 20 24 28 32
echo s=off 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1g 2g 4g 8g 16g 32g 64g on
echo x=1 3 5 7 9
for %%x IN (9) DO for %%d IN (1024m 768m 512m 384m 256m 192m 128m 96m 64m 48m 32m 24m 16m 12m 8m 6m 4m 3m 2m 1m) DO for %%w IN (32 28 24 20 16 14 12 10 8 7 6 5 4 3 2) DO for %%s IN (on) DO (
set "cmd=7z.exe a teste.resultado\%%xx.ppmd.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -m0=PPMd:mem=%%d:o=%%w -ms=%%s"
if !startCount! lss %maxProc% (
set /a "startCount+=1, nextProc=startCount"
) else (
call :wait
)
set cmd!nextProc!=!cmd!
if defined showOutput echo -------------------------------------------------------------------------------
echo !time! - proc!nextProc!: starting !cmd!
2>nul del %lock%!nextProc!
%= Redirect the lock handle to the lock file. The CMD process will =%
%= maintain an exclusive lock on the lock file until the process ends. =%
start /b "" cmd /c %lockHandle%^>"%lock%!nextProc!" 2^>^&1 !cpu%%N! !cmd!
)
set "launch="
:wait
:: Wait for procs to finish in a loop
:: If still launching then return as soon as a proc ends
:: else wait for all procs to finish
:: redirect stderr to null to suppress any error message if redirection
:: within the loop fails.
for /l %%N in (1 1 %startCount%) do (
%= Redirect an unused file handle to the lock file. If the process is =%
%= still running then redirection will fail and the IF body will not run =%
if not defined endProc%%N if exist "%lock%%%N" (
%= Made it inside the IF body so the process must have finished =%
if defined showOutput echo ===============================================================================
echo !time! - proc%%N: finished !cmd%%N!
if defined showOutput type "%lock%%%N"
if defined launch (
set nextProc=%%N
exit /b
)
set /a "endCount+=1, endProc%%N=1"
) 9>>"%lock%%%N"
) 2>nul
if %endCount% lss %startCount% (
1>nul 2>nul ping /n 2 ::1
goto :wait
)
2>nul del %lock%*
if defined showOutput echo ===============================================================================
echo Thats all folks!
To execute no more than 8 instances of the 7z.exe process at the same time, you could do this:
@Echo OFF & Setlocal EnableDelayedExpansion
Set /A "pCount=0" & Rem Process count
For
...
) DO (
Set /A "pCount+=1"
If !pCount! LEQ 8 (
Start /B 7z.exe a teste.resultado\%%xx.ppmd.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -m0=PPMd:mem=%%d:o=%%w -ms=%%s
)
)
...
If you want to run each process in a new parallel CMD window, then replace the Start /B line in my code with this instead:
CMD /C "Start /w 7z.exe a teste.resultado\%%xx.ppmd.%%dd.%%ww.%%ss.7z .\teste.original\* -mx=%%x -m0=PPMd:mem=%%d:o=%%w -ms=%%s"
