What's a reasonable size for a Visual C++ PDB file?

I am working to reduce the build time of a large Visual C++ 2008 application. One of the worst bottlenecks appears to be the generation of the PDB file: during the linking stage, mspdbsrv.exe quickly consumes available RAM, and the build machine begins to page constantly.
My current theory is that our PDB files are simply too large. However, I've been unable to find any information on what the "normal" size of a PDB file is. I've taken some rough measurements of one of the DLLs in our application, as follows:
CPP files: 34.5 MB, 900k lines
Header files: 21 MB, 400k lines
Compiled DLL: 33 MB (compiled for debug, not release)
PDB: 187 MB
So, the PDB file is roughly 570% the size of the DLL. Can someone with experience with large Visual C++ applications tell me whether these ratios and sizes make sense? Or is there a sign here that we are doing something wrong?
(The largest PDB file in our application is currently 271 MB, for a 47.5 MB DLL. Source code size is harder to measure for that one, though.)
Thanks!

Yes, .pdb files can be very large, even at the sizes you mention. A .pdb file contains the data that maps source lines to machine code, so when you compile a lot of code there is simply a lot of data in the .pdb file, and you likely can't do much about that directly.
One thing you could try is to split your program into smaller parts (DLLs). Each DLL will then have its own independent .pdb. However, I seriously doubt that will decrease the build time.

Do you really need full debug information at all times? You can create a configuration with less debug info in it.
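For instance, here is a minimal sketch of what such a configuration could look like from the command line (the file and module names are hypothetical, and the exact switches should be checked against your project settings):
rem Files you rarely step through: compile with no debug info at all, so they add almost nothing to the PDB
cl /c /Od /MDd generated_parser.cpp
rem Files you actively debug: keep full debug info
cl /c /Zi /Od /MDd core_logic.cpp
rem Link with a full PDB for internal use plus a stripped one (/PDBSTRIPPED) to hand out
link /DLL /DEBUG /PDB:mymodule.pdb /PDBSTRIPPED:mymodule_public.pdb generated_parser.obj core_logic.obj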
But as sharptooth already said, it is time to refactor and split your program into smaller, more maintainable parts. That will do more than just reduce the build time.

Related

Why is a fresh install of Haskell-Stack and GHC so large/big?

When doing a fresh install of Haskell Stack through the install script from here:
wget -qO- https://get.haskellstack.org/ | sh
Followed by:
stack setup
you will end up with a $HOME/.stack/ directory about 1.5 GB in size (from just a 120+ MB download). Further, if you run:
stack update
the size increases to 2.5 GB.
I am used to Java, which is usually considered large (it covers pretty much everything and keeps deprecated alternatives around for backwards compatibility), but as a comparison: an IDE including a JDK, a standalone JDK, and the JDK sources together come to probably around 1.5 GB.
On the other hand, it seems strange to me that Haskell, which is a "small, beautiful" language (from what I have heard and read; this probably refers mostly to the syntax and semantics, but still), is this large.
1. Why is it so big (is it related to this question?)?
2. Is this size normal, or have I installed something extra?
3. If there are several (4?, 5?) flavors of everything, can I remove all but one?
4. Is some of the data a cache or temporary data that can be removed?
The largest directories are .stack/programs/x86_64-linux/ghc-tinfo6-nopie-8.2.2/lib/ghc-8.2.2 (1.3 GB) and .stack/indices/Hackage (980 MB). I assume the first one holds the installed packages (and is related to stack setup) and the latter is some index over the Hackage package archive (and is related to stack update)? Can these be reduced (as in question 3 above, or by grabbing the needed Hackage information online)?
As you can probably see by inspection, it is a combination of:
three flavors (static, dynamic, and profiled) of the GHC runtime (about 400 megs total) and the core GHC libraries (another 700 megs total), plus 100 megs of interface files, another 200 megs of documentation, and 120 megs of compressed source (1.5 gigs total, all under programs/x86_64-linux/ghc-8.2.2* or similar)
two identical copies of the uncompressed Hackage index 00-index.tar and 01-index.tar, each containing the .cabal file for every version of every package ever published in the Hackage database, each about 457 megs, plus a few other files to bring the total up to 1.0 gigs
The first of these is installed when you run stack setup; the second when you run stack update.
To answer your questions:
1. It's so big because clearly no one has made any effort to make it smaller, as evidenced by the whole 00-index.tar, 00-index.tar.gz, and 01-index.tar situation.
2. That's a normal size for a minimum install.
3. You can remove the profiled versions (the *_p.a files) if you never want to compile a program with profiling. I haven't tested this extensively, but it seems to work, and I guess it'll save you around 800 megs. You can also remove the static versions (all *.a files) if you only want to dynamically link programs (i.e., using ghc -dynamic). Again, I haven't tested this extensively, but it seems to work. Removing the dynamic versions would be very difficult: you'd have to find a way to remove only those *.so files that GHC itself doesn't need, and anything you did remove would no longer be loadable in the interpreter.
4. Several things are cached and can be removed (see the sketch below). For example, you can remove 00-index.tar and 00-index.tar.gz (saving about half a gigabyte), and Stack seems to run fine; it'll recreate them the next time you run stack update, though. I don't think this is documented anywhere, so it'll be a lot of trial and error determining what can be safely removed.
I think this question has already been covered above.
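A hedged sketch of the inspection and cleanup described in points 3 and 4, assuming the default ~/.stack layout from the question (paths can differ between GHC and Stack versions, and the deletions carry the same "seems to work, not tested extensively" caveat as above):
# See where the space actually goes
du -sh ~/.stack/programs/x86_64-linux/* ~/.stack/indices/Hackage/*
# Point 3: drop the profiled library variants if you never build with profiling
find ~/.stack/programs -name '*_p.a' -delete
# Point 4: drop the cached duplicate index; stack update will recreate it when needed
rm -f ~/.stack/indices/Hackage/00-index.tar ~/.stack/indices/Hackage/00-index.tar.gz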
Apropos of nothing: the other day I saw a good deal on some 3-terabyte drives, and in my excitement I ordered two before realizing I didn't really have anything to put on them. It kind of puts a few gigabytes in perspective, doesn't it?
I guess I wouldn't expend a lot of effort trying to trim down your .stack directory, at least on a beefy desktop machine. If you're working on a laptop with a relatively small SSD, think about maybe putting your .stack directory on a filesystem that supports transparent compression (e.g., Btrfs), if you think it's likely to get out of hand.

Slow linking in release with optimization disabled

I have a project (VC2005) which takes an unreasonable amount of time (over 40 minutes) to link in Release, while it links in less than 5 seconds in Debug.
Both builds have incremental linking disabled and all files are located on the same drive.
Disabling Linker optimization in Release does not help.
Task Manager never shows more than 150,000 K of memory used by the linker, which is nothing for a machine with 3 GB of RAM.
I build much bigger projects and have never noticed such a difference in build time.
Any ideas why this happens?
As remarked, the most probable reason is /LTCG (whole program optimization).
Other factors might be individual files compiled with /Gy (you should see some warnings in the output), or /OPT:REF, /OPT:ICF (check project properties/linker/optimization), or - very unlikely - you're unknowingly running some phase of PGO instrumentation.
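If /LTCG does turn out to be the culprit, here is a hedged sketch of a Release-style build with whole program optimization switched off (file names are illustrative; in the IDE this corresponds roughly to disabling Whole Program Optimization and Link Time Code Generation in the project properties):
rem Compile without /GL, so the objects contain machine code rather than the intermediate form used for link-time codegen
cl /c /O2 /Zi main.cpp util.cpp
rem Link without /LTCG; /OPT:REF and /OPT:ICF still trim the binary at a far smaller time cost
link /DEBUG /OPT:REF /OPT:ICF main.obj util.obj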

My J2ME application takes a long time to start running

I made a J2ME application that consists almost entirely of text files.
Size: 3 MB
The problem is that when I run it on my mobile, it takes about 10 seconds to start. I do nothing at startup. I have another app that is 7 MB, but it runs without any delay!
JAR file links:
mine: http://s1.picofile.com/file/7252355799/mine.jar.html
correct one: http://s1.picofile.com/file/7252346448/correctone.jar.html
Install both of them and run them: mine takes a few seconds to show up, but the other one shows up immediately.
You need to take into account that JAR is a compressed file format.
To use the JAR file's contents, the device first has to decompress it. How long decompression takes depends very much on the JAR's contents, and because of that the JAR file size may not be directly related to the startup delay.
You'd better use some zip tool (most if not all such tools can handle the JAR format) to inspect the contents of the JAR files you work with; this might give you a better indication of what to expect at startup.
For example, I can easily imagine your "7 MB" JAR file containing just a handful of JPEG images totalling, well, about the same 7 MB, and decompressing very quickly.
As for the "3 MB of text files": if these decompress to something like a few hundred files with a total size of 50 MB, then I would not be surprised if it takes a long time to unpack at startup.
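As a concrete (if hypothetical) way to do that inspection with an ordinary zip tool, assuming both JARs are in the current directory:
# Total uncompressed size and entry count for each JAR; a huge uncompressed total,
# or thousands of tiny entries, would explain a slow startup
unzip -l mine.jar | tail -n 1
unzip -l correctone.jar | tail -n 1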

Debugging memory leaks in Windows Explorer extensions

Greetings all,
I'm the developer of a rather large C# Windows Explorer extension. As you can imagine, there is a lot of P/Invoke involved, and unfortunately, I've confirmed that it's leaking unmanaged memory somewhere. However, I'm coming up empty as to how to find the leak. I tried following this helpful guide, which says to use WinDBG. But, when I try to use !heap, it won't let me because I don't have the .PDB files for explorer.exe (and the public symbol files aren't sufficient, apparently).
Help?
I've used UMDH many times with very good results. The guide you mentioned, describing WinDbg, uses the same method as UMDH, based on the ability of the debug heap to record stack traces for all allocations. The only difference is that UMDH automates it: you simply run umdh from the command line and it creates a snapshot of all current allocations. Normally you repeat the snapshot two or more times and then calculate the 'delta' between two snapshots (also using umdh.exe). The delta file lists all the new allocations that happened between your snapshots, sorted by allocation size.
UMDH also needs symbols. You will need at least the symbols for ntdll.dll (the heap implementation lives there). The public symbols from http://msdl.microsoft.com/download/symbols will work fine.
Make sure you are using the correct bitness of umdh.exe. Explorer.exe is 64-bit on a 64-bit OS, so if your OS is 64-bit you need to use the 64-bit umdh.exe, i.e. download the appropriate bitness of the Windows debugging tools.
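For reference, a hedged sketch of that UMDH workflow (the PID and file names are placeholders; check the exact syntax against the documentation shipped with your version of the Debugging Tools for Windows):
rem Enable user-mode stack trace collection for the target image (takes effect when explorer.exe restarts)
gflags /i explorer.exe +ust
rem Point the tools at the Microsoft symbol server so the ntdll.dll symbols resolve
set _NT_SYMBOL_PATH=srv*c:\symcache*http://msdl.microsoft.com/download/symbols
rem Take two snapshots of the running process (1234 stands in for the real PID), exercising the extension in between
umdh -p:1234 -f:snap1.txt
umdh -p:1234 -f:snap2.txt
rem Compare the snapshots; the delta lists the new allocations, sorted by size
umdh snap1.txt snap2.txt > delta.txt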

What is the fastest way to get just the preprocessed source code with MSVC?

I'm trying to find the fastest way to get the complete preprocessed source code (I don't need #line information or other comments, just the raw source code) for a C source file.
I have the following little test program which just includes the Windows header file (mini.c):
#include <windows.h>
Using Microsoft Visual Studio 2005, I then run this command:
cl /nologo /P mini.c
This takes 6 seconds to generate a 2.5MB mini.i file; changing this to
cl /nologo /EP mini.c > mini.i
(which skips comments and #line information) needs just 0.5 seconds to write 2.1MB of output.
Is anybody aware of good techniques for improving this even further, without using precompiled headers?
The reason I'm asking is that I wrote a variant of the popular ccache tool for MSVC. As part of the work, the program needs to compute a hash sum of the preprocessed source code (and a few other things). I'd like to make this as fast as possible.
Maybe there is a dedicated preprocessor binary available, or other command line switches which might help?
UPDATE: One idea that just came to my mind: define the WIN32_LEAN_AND_MEAN macro to strip out lots of rarely needed code. This speeds up the above preprocessor run by a factor of approximately 3x.
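For what it's worth, the macro can be defined straight on the command line via /D instead of editing the source, roughly like this:
cl /nologo /EP /DWIN32_LEAN_AND_MEAN mini.c > mini.i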
You're repeatedly processing the same source file (<windows.h>), which in turn pulls in a lot of other files. This <windows.h> is located in the default SDK directory.
Now, processing this unchanging file takes serious time. Yet you can rely on it not changing: it's part of the public interface, after all. Hence you could preprocess it yourself (stripping out comments, for instance) and pass that version to cl /EP.
Of course, this is typically an I/O-bound task, but with a significant CPU part mixed in. An approach that processes multiple sources in parallel will help the total throughput; measuring single-source preprocessing times isn't too relevant.
Finally, measure the time it takes to write the output to NUL. You shouldn't be including the time needed to write mini.i to disk, since you intend to pipe the output to md5sum anyway.
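To make that concrete, a small sketch, assuming md5sum (or any equivalent hashing tool) is on the PATH:
rem Preprocess and discard the output entirely, to measure the preprocessing time alone
cl /nologo /EP mini.c > NUL
rem ...or pipe it straight into the hash, as the ccache-style tool would
cl /nologo /EP mini.c | md5sum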
There's a preprocess to file project setting that you can use. In addition, some of the newer MSVC versions offer multithreaded compilation.
The /P option has been there for years; it creates the .i file and no object file.
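"Multithreaded compilation" presumably refers to the /MP compiler switch in newer toolsets, which farms translation units out to several cl.exe processes. A hedged sketch (note that /MP does not combine with the console preprocessor switches /E and /EP, so it mainly helps ordinary compiles):
rem Compile four translation units in parallel, using up to four cl.exe processes (file names are illustrative)
cl /MP4 /c a.cpp b.cpp c.cpp d.cpp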
