I have lots of files to compress (22K files, 31.5 GB) and then unpack (install) via Inno Setup. My current [Setup] config is:
[Setup]
LZMANumBlockThreads=6
; LZMADictionarySize doesn't affect installation performance
LZMADictionarySize=1048576
; LZMANumFastBytes doesn't affect installation performance
LZMANumFastBytes=273
CompressionThreads=auto
DisableReadyPage=True
InternalCompressLevel=max
; SolidCompression=yes causes an out-of-memory error by overloading 16 GB of RAM
SolidCompression=no
Compression=lzma2/ultra64
The problem is installation performance: only 3-4 of the 48 available CPU threads are fully loaded, and unpacking seems much slower than it should be on my machine with two Xeon CPUs.
The NVMe SSD is running at only 2-4% of its capacity.
My colleagues with different PC configurations noticed the same problem.
How can I get around this bottleneck and tell Inno Setup to use every available CPU thread to unpack my files faster?
P.S. I tried the different configurations described at https://github.com/teeks99/inno-test but didn't get any applicable result. As far as I can tell, I would need to do something with islzma64.exe, but it has no sources: even at https://github.com/jrsoftware/issrc/tree/main/Files it is just a prebuilt executable without any documentation.
Thank you.
Try adding:
[Setup]
LZMAUseSeparateProcess=yes
See the description of LZMAUseSeparateProcess in the Inno Setup documentation.
Also, when you use SolidCompression=no, each file becomes its own solid block (1 file = 1 solid block ~= dict size), so you have ~22K solid blocks. Decompressing from one or a few solid blocks should be faster than from thousands. I think SolidCompression=yes with LZMADictionarySize=1048576 is the best combination. LZMA2 also has limits on memory allocation (which is probably behind the 'out of memory error due to overloading 16 GB of RAM'); I'm not sure how much Inno Setup's modified version changes that.
I am writing mathematical modules for analysis problems. All files are compiled to .fasl.
These files keep growing, and new ones keep being added. Today I ran into a problem when loading modules: load("foo.mac") (~0.4 s) loads 100+ files, and another module loads 200+; these files declare functions and variables without precomputing anything.
Error: "Thread local storage exhausted fatal error encountered in SBCL pid %PRIMITIVE HALT called; the party is over. Welcome to LDB.." CPU and RAM usage are stable at that moment.
Running maxima -X '--dynamic-space-size 2048' doesn't help, and neither does 4096 (the default is 1024). Why doesn't it work?
With SBCL on Windows everything works without errors. With SBCL 1.4.5.debian on Linux (a server), this error is thrown. However, if I reduce the size of the files a little, the module loads.
I recompiled the files and checked all the .UNLISP files. I changed the order in which the files are loaded, but the error still occurs when loading the last ones in the list. The tests run without errors. Is there a way to increase the amount of thread-local storage through SBCL or Maxima? Which direction should I look in? Any ideas?
Update:
I significantly reduced the load by removing duplicate matchdeclare(..) code. Now the error no longer occurs.
From https://sourceforge.net/p/maxima/mailman/message/36659152/
maxima uses quite a few special variables which sometimes makes
sbcl run out of thread-local storage when running the testsuite.
They proposed to add an environment variable that allows to change
the thread-local storage size but added a command-line option
instead => if supported by sbcl we now generate an image with
a bigger default thread-local storage whose size can be
overridden by users passing the --tls-limit option.
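The NEWS file in SBCL's source code also indicates that the default value is 4096, and that the --tls-limit option first appeared in SBCL 1.5.2, so the SBCL 1.4.5 from the question predates it: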
changes in sbcl-1.5.2 relative to sbcl-1.5.1:
* enhancement: RISC-V support with the generational garbage collector.
* enhancement: command-line option "--tls-limit" can be used to alter the
maximum number of thread-local symbols from its default of 4096.
* enhancement: better muffling of redefinition and lambda-list warnings
* platform support:
** OS X: use Grand Central Dispatch semaphores, rather than Mach semaphores
** Windows: remove non-functional definition of make-listener-thread
* new feature: decimal reader syntax for rationals, using the R exponent
marker and/or *READ-DEFAULT-FLOAT-FORMAT* of RATIONAL.
* optimization: various Unicode tables have been packed more efficiently
When I run Inno Setup on a large set of files (>2GB), it takes a long time to run. I believe it is spending its time in the compression, which should be CPU bound, but it is only using a couple of CPUs. Is there a way to spread this across (many) more cores?
Specifically, I'm working with this boost-release repository, which has an Inno Setup script that includes:
[Setup]
....
Compression=lzma2/ultra64
....
[Files]
Source: "boost_1.69.0/*"; DestDir: "{app}"; Flags: ignoreversion recursesubdirs
....
Calling Compil32.exe boost_installer.iss takes approximately 25 minutes on a machine with 16 cores and 32GB of RAM (Azure F16s v2).
The set of files is approximately 2.5GB with 2 GB of that being a set of about 300 compiled libraries. The remaining 500MB is 60,000 source files.
So to get to the bottom of this, I created a test project that went through all sorts of permutations of various Inno Setup configuration options.
The ones I found useful (and gave me a 40% improvement in speed!) are:
SolidCompression=yes
LZMAUseSeparateProcess=yes
LZMANumBlockThreads=6
Without SolidCompression, LZMANumBlockThreads doesn't have much impact. But with both enabled, I saw a more typically parallelizable problem, where more threads gave faster results (up to a point).
If you find this interesting, I'd recommend the write-up I did on it; it has a lot of data to back it up.
Try setting LZMANumBlockThreads directive (the default value is 1):
When compressing a large amount of data, the LZMA2 compressor has the ability to divide the data into "blocks" and compress two or more of these blocks in parallel through the use of additional threads (provided sufficient processor power is available). This directive specifies the number of threads to use -- that is, the maximum number of blocks that the LZMA2 compressor may compress in parallel.
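To make this description concrete, here is a minimal C sketch of block-parallel compression using POSIX threads. It is not LZMA2 and not Inno Setup's implementation: the encoder is a hypothetical toy run-length encoder (`rle_compress`), and `parallel_block_compress`, `block_job`, and the 64-thread cap are illustrative assumptions. It shows the two properties that matter here: each block is encoded without looking at any other block, so blocks can be handed to separate threads, and for the same reason, cutting the same data into more blocks can cost some compression ratio.

```c
#include <pthread.h>
#include <stdlib.h>

/* One independent unit of work: a slice of the input and its encoded output. */
typedef struct {
    const unsigned char *in;
    size_t in_len;
    unsigned char *out;   /* worst case for this encoder: 2 * in_len bytes */
    size_t out_len;
} block_job;

/* Hypothetical toy encoder (run-length: count byte, value byte), NOT LZMA.
   It only sees its own block, which is what makes the blocks independent. */
static size_t rle_compress(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0, i = 0;
    while (i < n) {
        unsigned char value = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == value && run < 255) {
            run++;
        }
        out[o++] = (unsigned char)run;
        out[o++] = value;
        i += run;
    }
    return o;
}

/* Thread entry point: encode one block. */
static void *compress_block(void *arg)
{
    block_job *job = arg;
    job->out_len = rle_compress(job->in, job->in_len, job->out);
    return NULL;
}

#define MAX_BLOCK_THREADS 64   /* arbitrary cap for this sketch */

/* Split data into up to num_threads equal blocks, encode each block on its own thread,
   and return the total encoded size (0 on empty input or on allocation/thread failure). */
size_t parallel_block_compress(const unsigned char *data, size_t len, size_t num_threads)
{
    block_job jobs[MAX_BLOCK_THREADS];
    pthread_t tids[MAX_BLOCK_THREADS];
    size_t started = 0, total = 0;
    int ok = 1;

    if (num_threads == 0) num_threads = 1;
    if (num_threads > MAX_BLOCK_THREADS) num_threads = MAX_BLOCK_THREADS;
    size_t block = (len + num_threads - 1) / num_threads;   /* ceil(len / num_threads) */

    for (size_t t = 0; t < num_threads; t++) {
        size_t start = t * block;
        if (start >= len) break;                             /* fewer blocks than threads */
        size_t n = (len - start < block) ? len - start : block;
        jobs[t].in = data + start;
        jobs[t].in_len = n;
        jobs[t].out_len = 0;
        jobs[t].out = malloc(2 * n);
        if (jobs[t].out == NULL ||
            pthread_create(&tids[t], NULL, compress_block, &jobs[t]) != 0) {
            free(jobs[t].out);
            ok = 0;
            break;
        }
        started++;
    }
    /* Wait for every started thread, then add up the per-block sizes. */
    for (size_t t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += jobs[t].out_len;
        free(jobs[t].out);
    }
    return ok ? total : 0;
}
```

With 1,000 identical bytes, `parallel_block_compress(buf, 1000, 1)` returns 8 while `parallel_block_compress(buf, 1000, 8)` returns 16, because a run cannot continue across a block boundary. Build with `-pthread`.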
Compression=zip
SolidCompression=yes
LZMAUseSeparateProcess=yes
LZMANumBlockThreads=6
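Use zip compression for roughly 2x faster installations; I speed-tested it and it works.
Note that LZMAUseSeparateProcess and LZMANumBlockThreads only apply to LZMA compression, so they have no effect with Compression=zip.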
I am doing Monte Carlo simulations. I am trying to write my program's results to a huge file using fprintf, to avoid arrays, because they require too much memory.
The problem is that when the file reaches 2 GB, the program can't write to it anymore. I searched this and other sites but didn't find a helpful answer.
I am using Ubuntu 12.04 LTS with an ext4 filesystem, and the partition size is 88 GB. I am not good at computer science and don't even know what ext means, but I read that this filesystem supports individual files of at least 16 GB.
So can anyone tell me what to do?
The maximum file size for a 32-bit application is 2^31 bytes (2 GiB), but by using the LFS (Large File Support) interface on filesystems that support it, applications can handle files as large as 2^63 bytes.
Thank you for your answer, it was very helpful. I replaced fopen with fopen64 and compiled with -D_FILE_OFFSET_BITS=64, and everything worked fine :)
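A minimal sketch of the fix from this answer and comment, assuming glibc on Linux: defining `_FILE_OFFSET_BITS` as 64 before any header (or passing `-D_FILE_OFFSET_BITS=64` to the compiler) makes `off_t` 64-bit and maps plain `fopen` onto the large-file interface, so calling `fopen64` explicitly is not required. `append_samples` is a hypothetical helper, and writing one value per line is an assumption about the Monte Carlo output format.

```c
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t even in 32-bit builds; must precede every #include */
#define _POSIX_C_SOURCE 200809L  /* declares fseeko()/ftello() even under a strict -std=c99 */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: append n samples (one per line) to path.
   Returns the resulting file size in bytes, or -1 on any error. */
off_t append_samples(const char *path, const double *samples, size_t n)
{
    FILE *fp = fopen(path, "a");   /* plain fopen: the macro above selects the large-file variant */
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        /* Check every write; unchecked, a failure such as EFBIG at a size limit goes unnoticed. */
        if (fprintf(fp, "%.17g\n", samples[i]) < 0) {
            perror("fprintf");
            fclose(fp);
            return -1;
        }
    }
    /* fseeko/ftello use off_t, so offsets past 2^31 - 1 are representable. */
    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0) {
        size = ftello(fp);
    }
    if (fclose(fp) != 0) {   /* buffered data is flushed here, so this can fail too */
        perror("fclose");
        return -1;
    }
    return size;
}
```

Calling `append_samples("results.txt", samples, n)` (file name hypothetical) returns the file size as a 64-bit `off_t`, and because every return value is checked, hitting a limit shows up as an error message rather than as silently missing output.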
I am profiling some code on a Linux system (running on Intel Core i7 4500U) to obtain the time of ONLY the execution costs. The application is the demo mpeg2dec from libmpeg2. I am trying to obtain a probability distribution for the mpeg2 execution times. However we want to see the raw execution cost when cache is switched off.
Is there a way to disable the CPU cache on my system via a Linux command or a gcc flag? Or to set the CPU (L1/L2) cache size to 0 KB? Or to add some code that disables the cache? Of course, without modifying or rebuilding the kernel.
See this 2012 thread; someone posted the source of a tiny kernel module that disables the cache through asm.
http://www.linuxquestions.org/questions/linux-kernel-70/disabling-cpu-caches-936077/
If disabling the cache is really necessary, then so be it.
Otherwise, to find out how much time a process takes in terms of user or system "cycles", I would recommend the getrusage() function.
struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
You can call it before and after your loop/test and subtract the values to get a good idea of how much time your process took, even if many other processes run in parallel on the same machine. The main problem is if your process starts swapping; in that case your timings will be off.
double user_usage = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1000000.0;
double system_usage = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1000000.0;
In my experience this is quite precise. To increase precision, you could run your test as root and give it a negative priority (a nice value of -1 or -2 is enough). Then it is less likely to be switched out, except when you call a function that may block.
Of course, you still get the effect of the cache, unless you handle a very large amount of data with code that goes on and on (as opposed to a loop).
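The measurement pattern from this answer (sample before, sample after, subtract) can be wrapped up as below. This is a sketch, not the answerer's code: `process_cpu_seconds`, `workload`, and `measure_workload` are hypothetical names, and the squaring loop merely stands in for the code being profiled (for example, the mpeg2dec decode loop). It sums user and system time; keep them separate if you need the breakdown.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval (seconds + microseconds) to seconds. */
static double timeval_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}

/* User + system CPU time consumed so far by the calling process, or -1.0 on error. */
double process_cpu_seconds(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1.0;
    }
    return timeval_to_seconds(usage.ru_utime) + timeval_to_seconds(usage.ru_stime);
}

/* Hypothetical stand-in for the code being profiled. */
static volatile unsigned long sink;   /* volatile store keeps the result from being discarded */

static void workload(unsigned long iterations)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        acc += i * i;   /* unsigned overflow wraps, which is well defined */
    }
    sink = acc;
}

/* Sample CPU time before and after the workload and return the difference in seconds. */
double measure_workload(unsigned long iterations)
{
    double before = process_cpu_seconds();
    workload(iterations);
    double after = process_cpu_seconds();
    return after - before;
}
```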
I'm working with IntelliJ IDEA on Linux, and I recently got 16 GB of RAM. Is there any way to speed up my project's compilation using this memory?
First of all, to speed up IntelliJ IDEA itself, you may find this discussion very useful.
The easiest way to speed up compilation is to move the compilation output to a RAM disk.
RAM disk setup
Open fstab
$ sudo gedit /etc/fstab
(instead of gedit you can use vi or whatever you like)
Set up RAM disk mount point
I'm using RAM disks in several places in my system, and one of them is /tmp, so I'll just put my compile output there:
tmpfs /tmp tmpfs defaults 0 0
In this case the filesystem size is not set explicitly (tmpfs defaults to half of your RAM), which is fine; my /tmp currently uses 73MB. But if you are afraid the RAM disk will grow too big, you can limit its size, e.g.:
tmpfs /tmp tmpfs defaults,size=512M 0 0
Project setup
In IntelliJ IDEA, open Project Structure (Ctrl+Alt+Shift+S by default), then go to Project - 'Project compiler output' and move it to RAM disk mount point:
/tmp/projectName/out
(I added the projectName folder so I can find it easily if I need to get there, and so several projects can be worked on at the same time.)
Then go to Modules, and in each of your modules go to Paths and select 'Inherit project compile output path' or, if you want a custom compile output path, modify 'Output path' and 'Test output path' the same way you did for the project compiler output.
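To double-check that the compile output really lands on the RAM disk, a small Linux-only C sketch can ask the kernel which filesystem backs a path via `statfs(2)` and compare the result with the tmpfs magic number. `path_is_tmpfs` is a hypothetical helper; the fallback value of `TMPFS_MAGIC` is the one defined in `<linux/magic.h>`.

```c
#include <stddef.h>
#include <sys/vfs.h>   /* statfs(2); Linux-specific */

#ifndef TMPFS_MAGIC
#define TMPFS_MAGIC 0x01021994   /* same value as in <linux/magic.h> */
#endif

/* Hypothetical helper: report whether path is backed by tmpfs (RAM).
   Returns 1 for tmpfs, 0 for any other filesystem, -1 if statfs() fails.
   If free_bytes is non-NULL, it receives the space available to unprivileged users. */
int path_is_tmpfs(const char *path, unsigned long long *free_bytes)
{
    struct statfs st;
    if (statfs(path, &st) != 0) {
        return -1;   /* e.g. the directory does not exist yet */
    }
    if (free_bytes != NULL) {
        *free_bytes = (unsigned long long)st.f_bavail * (unsigned long long)st.f_bsize;
    }
    return (unsigned long)st.f_type == TMPFS_MAGIC ? 1 : 0;
}
```

Calling `path_is_tmpfs("/tmp/projectName/out", &free_bytes)` should return 1 once the fstab entry is active and the directory exists; 0 means the output still goes to a regular disk, and -1 means statfs failed (for example, because the directory has not been created yet).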
That's all, folks!
P.S. A few numbers: approximate compilation times for my current project:
HDD: 80s
SSD: 30s
SSD+RAM: 20s
P.P.S. If you use an SSD, besides speeding up compilation you also reduce write operations on the disk, so it will also help your SSD live happily ever after ;)
Yes, you can. There are several ways to do this. First, you can fine-tune the JVM for the amount of memory you have; take https://gist.github.com/zafarella/43bc260c3c0cdc34f109 as an example.
In addition, depending on which Linux distribution you use, you can create a RAM disk and rsync its contents to the HDD. Basically, you place all logs and tmp files (including indexes) in RAM, and IDEA will fly.
Use something like profile-sync-daemon to keep the files synced; IDEA can easily be added to it as an app. Alternatively, you can use anything-sync-daemon.
You need to change "idea.system.path" and "idea.log.path" (these are set in idea.properties).
More details on IDEA settings can be found in its documentation. The point is to move whatever changes often into RAM.
More RAM Disk alternatives https://wiki.debian.org/SSDOptimization#Persistent_RAMDISK
The downside of this solution is that when you run out of RAM, the OS starts paging, which slows everything down.
Hope that helps.
In addition to the RAM disk approach, you might speed up compilation by giving the compiler process more memory (but not too much) and by compiling independent modules in parallel. Both options are under Settings | Compiler.