Why doesn't WD Velociraptor speed up my VC++-compilation significantly? - visual-c++

Several people around here recommended switching to the new WD VelociRaptor 10,000 rpm hard disk, and magazine articles praise its performance too.
I bought one and mirrored my old system to it. The resulting increase in compilation-speed is somewhat disappointing:
On my old Samsung drive (SATA, 7200), the compilation time was 16:02.
On the Velociraptor the build takes 15:23.
I have an E6600 with 1.5 GB of RAM. It's a C++ project with 1200 files, and the build is done in Visual Studio 2005. Acoustic management on the drive is switched off (no big difference anyway).
Did something go wrong, or is this modest speedup really all I can expect?
Edit:
Some people recommended increasing the RAM. I have now doubled it to 3 GB and got only a minimal gain (3-5%).

Are you using the /MP option (undocumented; you have to enter it manually in your compiler options) to enable source-level parallel builds? That will speed up your compile far more than a faster hard disk will; the gains from the disk alone are marginal.
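For reference, a rough sketch of where the switch goes (the file names below are placeholders): in the IDE you type /MP under Configuration Properties > C/C++ > Command Line > Additional Options, and on a raw command line it only pays off when several sources are handed to one cl invocation, e.g.:

    cl /MP /c /EHsc file1.cpp file2.cpp file3.cpp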

Visual Studio 2005 can build multiple projects in parallel, and will do so by default on a multi-core machine, but depending on how your projects depend on each other it may be unable to parallel build them.
If your 1200 cpp files are in a single project, you're probably not using all of your CPU. If I'm not mistaken a C6600 is a quad-core CPU.
Dave

I imagine that hard disk reading was not your bottleneck in compilation. Realistically, few things need to be read/written from/to the hard disk. You would likely see more performance increase from more ram or a faster processor.

I'd suggest from the results that either hard-disk latency wasn't the bottleneck you were looking for, or that your project is already building close to as fast as possible. Other items to consider would be:
hdd access time (although you may not be able to do much with this due to bus speed limitations)
RAM access speed and size
Processor speed
Reducing background processes

A ~6% increase in speed just from improving your hard drive, just like Howler said. Grab some faster RAM and a faster CPU.

As many have already pointed out, you probably didn't attack the real bottleneck. Randomly changing parts (or code for that matter) is as one could say "bass ackwards".
You first identify the performance bottleneck, and then you change something.
Perfmon can help you get a good overview of whether you are CPU- or I/O-bound: look at CPU utilization, disk queue length, and IO bytes to get a first glimpse of what's going on.
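For a quick first look, something like this samples those counters every five seconds from a command prompt (the counter paths assume an English Windows install):

    typeperf "\Processor(_Total)\% Processor Time" ^
             "\PhysicalDisk(_Total)\Current Disk Queue Length" ^
             "\PhysicalDisk(_Total)\Disk Bytes/sec" -si 5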

That is actually a pretty big bump in speed for just replacing a hard disk. You are probably memory or CPU bound at this point. 1.5GB is light these days, and RAM is very cheap. You might see some pretty big improvements with more memory.
Just as a recommendation, if you have more than one drive installed, you could try setting your build directory to be somewhere on a different disk than your source files.
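For example (assuming a second physical drive mounted as D:, which is purely a placeholder), under Configuration Properties > General you could point both build directories at the other disk:

    Output Directory:       D:\build\$(SolutionName)\$(ConfigurationName)
    Intermediate Directory: D:\build\obj\$(ProjectName)\$(ConfigurationName)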
As for this comment:
If your 1200 cpp files are in a single project, you're probably not using all of your CPU. If I'm not mistaken a C6600 is a quad-core CPU.
Actually, a C6600 isn't anything. There is a E6600 and a Q6600. The E6600 is a dual core and the Q6600 is a quad core. On my dev machine I use a quad core CPU, and although our project has more than 1200 files, it is still EASILY processor limited during compile time (although a faster hard drive would still help speed things up!).

1200 Source files is a lot, but none of them is likely to be more than a couple hundred K, so while they all need to be read into memory, it's not going to take long to do so.
Bumping your system memory to 4G (yes, yes I know about the 3.somethingorother limit that 32-bit OSes have), and maybe looking at your CPU are going to provide a lot more performance improvement than merely using a faster disk drive could.

VC 2005 does not compile more than one file at a time per project, so either move to VC 2008 to use both of your CPU cores, or break your solution into multiple library sub-projects to get multiple compilations going.

I halved my compilation time by putting all my source onto a ram drive.
I tried these guys http://www.superspeed.com/desktop/ramdisk.php, installed a 1GB ramdrive, then copied all my source onto it. If you build directly from RAM, the IO overhead is vastly reduced.
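As a rough sketch of the setup (the R: drive letter and paths are just placeholders for whatever your RAM disk software assigns):

    xcopy C:\work\myproject R:\myproject\ /E /I /Y
    rem then open the solution from R:\myproject and build from there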
To give you an idea of what I'm compiling, and on what:
WinXP 64-bit
4GB ram
2.? GHz dual-core processors
62 C# projects
approx 250kloc.
My build went from about 135s to 65s.
Downsides are that your source files are living in RAM, so you need to be more vigilant about source control. If your machine lost power, you'd lose all unversioned changes. This is mitigated slightly by the fact that some RAM drives save themselves to disk when you shut the machine down, but you'll still lose everything since either your last checkout or your last clean shutdown.
Also, you have to pay for the software. But since you're shelling out for hard drives, maybe this isn't that big a deal.
Upsides are the faster compilation, and the fact that the exes are already living in memory, so startup and debugging times are a bit better. The real benefit is the compilation time, though.

Related

How To implement swapfile cross operating system

There are use cases where I can't have a lot of RAM: docker-based services sometimes provide no more than 512 MB/1 GB of RAM, or I may run multiple Rust-based GUI apps that each take around 100 MB of RAM normally. How can I implement a swapfile / virtual RAM to exceed the allotted RAM? OS-level swapfiles also don't let users choose which app uses real RAM and which uses the swapfile, so that can become a problem too. I want to use the swapfile as much as possible, and not even real RAM if possible. Users and hosting services usually provide plenty of storage (more than 10 GB normally), so it would be a good way to use the available storage too!
If a swapfile or anything like that isn't possible, I would like to know whether there is any difference in speed and CPU consumption between apps that cache data in RAM and apps that cache data in a file and read it when required. If the latter is normally slow and not as efficient as a swapfile, I would like to know how the OS manages to make swapfiles so much more efficient than what apps can do themselves.
An application does not control whether the memory it allocates ends up in real RAM, on a swap partition, or elsewhere. You just ask for memory, and the OS is responsible for finding available memory to give to you.
Besides that, note that using swap (sometimes called swapping) is extremely bad performance-wise. How much depends a lot on your hardware, but it's about three orders of magnitude. This is even amplified if you are interacting with a user: a program that is fetching some resources will not be too bothered if it has to wait one minute to get them instead of a few milliseconds because the system is under heavy load, but a user will generally not be that patient.
Also note that, when swapping, the OS does not choose at random which applications get the faster RAM and which get pushed to swap. It will try to determine which applications should be prioritized, and by how much, based on how it was configured (at least for the Linux kernel), so in reality it's the user who, in the end, decides which applications get the most RAM (ahead of time, of course: they are not prompted with a little pop-up each time the kernel has to make that decision...).
Finally, modern OSes allow several applications to collectively allocate more memory than physically exists, as long as each application is not fully using the memory it asked for (which is quite common), allowing you to run applications that in theory require more RAM than you actually have.
That was the OS part; now to the application part. Usually, when you write a program (whose purpose is not specifically RAM-related), you should not worry too much about memory consumption (up to a certain point), especially in Rust. Not only is that usually handled by the OS in case you use a little too much memory, but when it's possible, most people prefer to trade a little more memory usage (even a lot more) for better CPU performance, because RAM is a lot cheaper than CPU.
There are exceptions, of course, where the memory consumption is so high that you can't afford not to pay attention. In those cases you have a few options. You can let the user deal with the problem (i.e. "this application is known to consume a lot of memory because there is no other way to do this, so if you want to use it, have a lot of memory"), as video games often do. You can rethink your application to reduce memory usage in exchange for some CPU efficiency, as is done, for example, when handling graphs so huge you couldn't even store them on all the hard disks in the world (in which case the application has to be smart enough to work on small parts of the graph at a time). Or, if you are working with a big resource that can be stored on the hard disk, you simply write it to a file and access it chunk by chunk, as some database managers do.
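As an illustration of that last pattern, here is a minimal sketch of chunk-by-chunk file access (the path and chunk size are invented; this is not code from the answer above):

    #include <cstddef>
    #include <fstream>
    #include <vector>

    // Process a large file in fixed-size chunks instead of loading it all into RAM.
    void process_in_chunks(const char* path) {
        const std::size_t chunk_size = 4 * 1024 * 1024;  // 4 MB working buffer
        std::vector<char> buffer(chunk_size);
        std::ifstream in(path, std::ios::binary);
        while (in) {
            in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
            std::streamsize got = in.gcount();           // bytes actually read
            if (got == 0) break;
            // ... work on buffer[0 .. got) here; it is overwritten on the next pass ...
        }
    }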

How do I make linux swap more eagerly?

I have a use-case where I have bursts of allocations in the range of 5-6gb, specifically when Visual Studio Code compiles my D project while I'm typing. (The compiler doesn't release memory at all, in order to be as fast as possible.)
DMD does memory allocation in a bit of a sneaky way. Since compilers are short-lived programs, and speed is of the essence, DMD just mallocs away, and never frees. This eliminates the scaffolding and complexity of figuring out who owns the memory and when it should be released. (It has the downside of consuming all the resources of your machine if the module being compiled is big enough.)
source
The machine is a Dell XPS 13 running Manjaro 64-bit, with 16gb of memory -- and I'm hitting that roof. The system seizes up completely, REISUB may or may not work, etc. I can leave it for an hour and it's still hung, not slowly resolving itself. The times I've been able to get to a tty, dmesg has had all kinds of jovial messages. So I thought to enable a big swap partition to alleviate the pressure, but it isn't helping.
I realise that swap won't be used until it's needed, but by then it's too late. Even with the swap, when I run out of memory everything segfaults; Qt, zsh, fuse-ntfs, Xorg. At that point it will report a typical 70mb of swap in use.
vm.swappiness is at 100. swapon reports the swap as being active, automatically enabled by systemd.
NAME TYPE SIZE USED PRIO
/dev/nvme0n1p8 partition 17.6G 0B -2
What can I do to make it swap more?
Try this. Remember to put this question in superuser or serverfault. Stackoverflow is only for programming stuff.
https://askubuntu.com/questions/371302/make-my-ubuntu-use-more-swap-than-ram
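For reference, this kind of tuning generally comes down to sysctl knobs. Swappiness is already at 100 here, so on a newer kernel (4.6+) the kswapd watermark is a more likely lever; the values below are illustrative only and are not taken from the linked answer:

    # swap anonymous pages more willingly instead of dropping file cache
    sudo sysctl vm.swappiness=100
    # make kswapd start reclaiming (and swapping) earlier, before a memory crunch
    sudo sysctl vm.watermark_scale_factor=200
    # persist across reboots
    echo "vm.watermark_scale_factor=200" | sudo tee /etc/sysctl.d/99-swap.conf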

Does increasing the heap/Ram size on Android Studio makes it faster?

I've boosted up the RAM size allocation for Android Studio to about 4000MB.
But what does it actually help with? I mean, I know it helps with handling executions that need more RAM to open/compile, but if my Android project (let's say) needs at most 1000 MB of RAM (i.e. a light project) and my current allocated RAM is 4000 MB, does it make Android Studio faster, or will the speed stay the same?
At some point it caps off. I use the default memory amount (700-something MB). Past a point it isn't about how much RAM you use, it's about how good the processor is. If the logic is too heavy for the CPU, it will take a long time no matter how much RAM you give it.
IMO, 4 gigabytes is too much. You just allocate a ton of RAM you may need somewhere else, which slows down other programs. Giving it 2 may be fine, but you don't need to give it 4 gigs unless you are running extremely heavy Gradle tasks that make 700 MB unreasonably low. RAM mostly covers memory allocation for fields; the rest is on the CPU (or GPU for applicable programs). If your CPU isn't good enough, adding more RAM isn't going to help.
"One topic you might hear people discussing when they're talking shop about computers is how much random access memory (RAM) they need to add to their computer. Up to a point, adding RAM will normally cause your computer to seem faster on certain types of operations. RAM is important because it eliminates the need to "swap" programs in and out." (source)
So it only works up to a certain point, which varies. You need a certain amount of RAM depending on what you do in Android Studio, but you don't need 4 gigs. The speedup as a result of giving a program more RAM gets lower the more you give it, and eventually there is no boost.
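If you do want to raise the heap moderately rather than to 4 GB, the usual place is the studio.vmoptions file (in recent versions reachable via Help > Edit Custom VM Options); the 2 GB figure just mirrors the suggestion above, not a universal recommendation:

    # studio.vmoptions
    -Xmx2g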

linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512 MB of RAM (we have other models with 1 GB, but they see this issue much less often), and have no disk swap files, to avoid wearing the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpin/s, pgnscand/s, pgsteal/s and majflt/s all increase steadily before snowballing to crazy levels. This pushes the CPU load up to correspondingly high levels (30-60 on dual-core Arm chips). At the same time, the frmpg/s values go very negative, whilst campg/s goes highly positive. The upshot is that the system is trying to allocate a large number of cache pages all at once. I don't understand why this would be.
The target then essentially locks up until it's rebooted or someone kills the main GUI process or it crashes and is restarted (We have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, telnet blocks forever, as do /proc filesystem queries and things that rely on it like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU using OpenGL ES 2.0. Most of the time it'll be accessing a single file (they are about 50Mb in size), but I guess it could be triggered by having to suddenly use a new file and trying to cache all 50Mb of it all in one go. I haven't done the test (putting more logging in) to correlate this event with these system effects yet.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen the oom-killer swoop in to kill the main application with >100 MB free and 40 MB of cache RAM). The main application's memory usage seems reasonably well-behaved, with a VmRSS value that seems pretty stable. Valgrind hasn't found any progressive leaks that would happen during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to question: linux high kernel cpu usage on memory initialization but that issue seemed driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various vm files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness would seem good candidates for some tuning, but I'd like some insight into good values to try here. vfs_cache_pressure seems ill-defined as to what the difference between setting it to 200 as opposed to 10000 would be quantitatively.
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that fact. A full cache purge might push this back out, though.
Here's a link to the log data with two files: sar1.log, which is the complete output of sar -A, and overview.log, an extract of free/cache mem, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute-force approach and put a script in place which echoes 3 to drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.
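For completeness, the workaround described above boils down to something like this running hourly as root (the cron.hourly path is just one way to schedule it on this kind of image):

    #!/bin/sh
    # /etc/cron.hourly/drop-caches
    sync
    echo 3 > /proc/sys/vm/drop_caches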

Store more than 3GB of video-frames in memory, on 32-bit OS

At work we have an application to play 2K (2048*1556 px) OpenEXR film sequences. It works well, apart from with sequences that are over 3 GB (quite common): then it has to unload old frames from memory, despite the fact that all machines have 8-16 GB of memory (which is addressable via the Linux BIGMEM stuff).
The frames have to be cached in memory to play back in real time. The OS is a several-year-old 32-bit Fedora distro (upgrading to 64-bit is not possible for the foreseeable future). The limitation is 3 GB of address space per process.
Basically, is it possible to cache more than 3GB of data in memory, somehow? My initial idea was to spread the data between multiple processes, but I've no idea if this is possible..
One possibility may be to use mmap. You would map/unmap different parts of your data into the same virtual memory region. You could only have one set mapped at a time, but as long as there was enough physical memory, the data should stay resident.
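A minimal sketch of that idea on Linux, assuming the frames live in one big cache file (the path and window size are invented, and error handling is reduced to the basics):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    // Map one window of a huge frame file into the 32-bit address space at a time,
    // relying on the kernel page cache to keep recently touched data in physical RAM.
    // Build with -D_FILE_OFFSET_BITS=64 so off_t can express offsets beyond 2 GB.
    void* map_window(int fd, off_t offset, std::size_t length) {
        void* p = mmap(nullptr, length, PROT_READ, MAP_SHARED, fd, offset);
        return (p == MAP_FAILED) ? nullptr : p;
    }

    int main() {
        int fd = open("/data/sequence.exr.cache", O_RDONLY);  // hypothetical path
        const std::size_t window = 512u * 1024 * 1024;        // 512 MB view
        void* view = map_window(fd, 0, window);
        // ... decode/play frames out of 'view' ...
        munmap(view, window);               // unmap before mapping the next window
        view = map_window(fd, static_cast<off_t>(window), window);
        // ... and so on through the file ...
        munmap(view, window);
        close(fd);
        return 0;
    }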
How about creating a RAM drive and loading the file into that ... assuming the RAM drive supports the BIGMEM stuff for you.
You could use multiple processes: each process loads a view of the file as a shared memory segment, and the player process then maps the segments in turn as needed.
My, what an interesting problem :)
(EDIT: Oh, I just read Rob's ram drive post...I got all excited by the problem...but have a bit more to suggest, so I won't delete)
Would it be possible to...
setup a multi-gigabyte ram disk, and then
modify the program to do all its reading from the "disk"?
I'd guess the RAM disk part is where all the problems would be, since the size of the RAM disk would be OS- and file-system-dependent. You might have to create multiple RAM disks and have your code jump between them. Or maybe you could set up a RAID-0 stripe set over multiple RAM disks. Or, if there are still OS limitations and you can afford to drop a couple of grand (4k?), set up a hardware RAID-0 stripe set with some of those new blazing fast solid state drives. Or...
Fun, fun, fun.
Be sure to follow up!
I assume you can modify the application. If so, the easiest thing would be to start the application several times (once for each 3GB chunk of video), have each one hold a chunk of video, and use another program to synchronize them so they each take control of the framebuffer (or other video output) in turn.
The synchronization is going to be a little messy, perhaps, but it can be simplified if each app has its own framebuffer and the sync program points the video controller to the correct framebuffer in between frames when switching to the next app.
#dbr said:
There is a review machine with an absurd fiber-channel-RAID-array that can play 2K files direct from the array easily. The issue is with the artist-workstations, so it wouldn't be one $4000 RAID array, it'd be hundreds..
Well, if you can accept a limit of ~30GB, then maybe a single 36GB SSD drive would be enough? Those go for ~US$1k each I think, and the data rates might be enough. That may very well be cheaper than a pure RAM approach. There are smaller sizes available, too. If ~60GB is enough, you could probably get away with a JBOD array of 2 for double the cost, and skip the RAID controller. Be sure only to look at the higher-end SSD options--the low end is filled with glorified memory sticks. :P
