Power Pivot bad performance (5 hours waiting for 50M rows) - excel

I have to do analysis on ~50M rows and it seems that Power Pivot can't handle it. I do some ETL in Power Query and that seems to work properly. It takes some time to execute, but in the end the data is loaded.
However, when I try to add metrics like average, median, standard deviation etc. to the data model, it gets completely stuck. I waited five hours for it to finish and in the end I had to restart the computer anyway.
Additionally, I've noticed that my PC behaves in an unexpected way. Normally when I do something demanding (like playing games) you can feel that all the resources are working hard: the computer gets warm, the fans get loud, etc. In this case it's complete silence. From time to time everything freezes for a while (like half an hour) so I can't even move the mouse. Then it starts to operate for a while and the whole process repeats.
I'm wondering: is this just too complex for Power Pivot/Power Query, or is my computer too slow (I have an i7, 8 GB of RAM and an HDD in my laptop)? I was also thinking about adding RAM or an SSD, but I'm not sure if it would help.
My theory is that this weird behavior of my computer is caused by some component being a bottleneck. I was thinking that maybe my HDD is too slow and the other resources can't operate at full capacity because the read speed is too slow. I'm not a computer scientist and I don't know if that is possible.
Thanks for your help!

For data analysis on this scale you should consider other packages; one that comes to mind is SAS - I used it to data-mine 10 MB of data into 200,000 individual files based on several criteria...
Warning: SAS has a steep learning curve, but it's very good... There may be other packages to consider.

Related

How can I determine if a Raspberry Pi is powerful enough to run my code?

Perhaps I posted this with the wrong tags, but hopefully someone can help me. I am an engineer finding myself deeper and deeper in automation. Recently I designed an automated system on a Raspberry Pi. I wrote a pretty simple script which was duplicated to read sensor values from different serial ports simultaneously. I did it this way so I could shut down one script without compromising the others if need be. It runs very well now, but I had problems overloading my CPU when I first started (I believe it was because I started all of the scripts at once rather than one at a time).
My question is:
How can I determine how much computing power is required by code I have written? How can I spec out a computer to run my code before I start building the robot?
The three resources you're likely to be bounded by on any computer are disk, RAM, and CPU (cores). MicroSD cards are cheap and easily swapped, so the bigger concerns are the latter two.
Depending on the language you're writing in, you'll have more or less control over memory usage. Python in particular "saves" the developer by "handling" memory automatically. There are a few good articles on memory management in Python, like this one. When running a simple script (e.g. activate these IO pins) on a machine with gigabytes of memory, this is rarely an issue. When running data intensive applications (e.g. do linear algebra on this gigantic array) then you have to worry about how much memory you need to do the computation and whether the interpreter actually frees it when you're done. This is not always easy to calculate but if you profile your software on another machine you may be able to estimate it.
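As a rough illustration (not from the original answer), you can estimate peak memory on a desktop before deploying to the Pi with the standard-library tracemalloc module; the workload function below is just a placeholder for your real processing:

    # Estimate a workload's peak memory before deploying to the Pi.
    # tracemalloc is in the standard library; workload() is a placeholder
    # for your real processing (e.g. buffering sensor readings).
    import tracemalloc

    def workload():
        samples = [float(i) * 0.001 for i in range(1_000_000)]
        return sum(samples) / len(samples)

    tracemalloc.start()
    result = workload()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"result={result:.3f}, peak Python allocations: {peak / 2**20:.1f} MiB")

Keep in mind that tracemalloc only counts Python-level allocations, so add the interpreter's own baseline (and any C-extension memory) on top of that figure when sizing the Pi.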
CPU utilization is comparatively easy to prepare for. Reserve one core for the OS and other functions; the rest are available to your software. If you write single-threaded code this should be plenty. If you use parallel processing, then either stick to N-1 workers or you'll need to get creative with the software design.
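For example, here is a minimal sketch of the N-1 worker rule using the standard multiprocessing module (the crunch function is only a stand-in for real work):

    # Leave one core for the OS and use the rest as workers (the N-1 rule).
    # crunch() is a stand-in for real per-chunk work.
    import os
    from multiprocessing import Pool

    def crunch(chunk):
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        workers = max(1, (os.cpu_count() or 2) - 1)
        chunks = [range(i, i + 10_000) for i in range(0, 100_000, 10_000)]
        with Pool(processes=workers) as pool:
            results = pool.map(crunch, chunks)
        print(f"{workers} workers, total = {sum(results)}")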
Edit: all of this is with the Raspberry Pi in mind. The Pi is a full computer in a tiny form factor: OS, BIOS, boot time, etc. Many embedded problems can be solved with an Arduino or some other controller, which has a different set of considerations.

Profiling resource usage - CPU, memory, hard-drive - of a long-running process on Linux?

We have a process that takes about 20 hours to run on our Linux box. We would like to make it faster, and as a first step need to identify bottlenecks. What is our best option to do so?
I am thinking of sampling the process's CPU, RAM, and disk usage every N seconds. So unless you have other suggestions, my specific questions would be:
How much should N be?
Which tool can provide accurate readings of these stats, with minimal interference or disruption from the fact that the tool itself is running?
Any other tips, nuggets of wisdom, or references to other helpful documents would be appreciated, since this seems to be one of these tasks where you can make a lot of time-consuming mistakes and false-starts as a newbie.
First of all, what you want and what you are asking for are two different things.
Monitoring is required when you are running the process for the first time, i.e. when you don't know its resource utilization (CPU, memory, disk, etc.).
You can follow the procedure below to drill down to the bottleneck:
Monitor system resources (generally a 10-20 second interval should be fine with Munin, Ganglia or another tool).
From this you should be able to identify whether your hardware is the bottleneck, i.e. whether you are running out of resources (e.g. 100% CPU utilization, very little free memory, high I/O, etc.).
If that is the case, think about upgrading the hardware or tuning what you already have.
Then tune your application/utility. Use profilers/loggers to find out which method or process is taking the time, and try to tune it. If you have single-threaded code, consider using parallelism. If a database is involved, try to tune your queries and DB parameters.
Then run the test again with monitoring to drill down further :)
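If you'd rather script the sampling yourself than set up Munin or Ganglia, here is a minimal sketch using the third-party psutil package (assumed to be installed); it samples one process's CPU, memory and disk I/O every 10 seconds and writes a CSV you can graph later:

    # Sample one process's CPU %, resident memory and cumulative disk I/O
    # every 10 seconds and append the readings to a CSV for graphing later.
    # Requires the third-party psutil package; pass the target PID as argv[1].
    import csv, sys, time
    import psutil

    INTERVAL = 10  # seconds between samples
    proc = psutil.Process(int(sys.argv[1]))

    with open("usage.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "cpu_percent", "rss_mb", "read_mb", "write_mb"])
        while proc.is_running():
            cpu = proc.cpu_percent(interval=INTERVAL)  # blocks for INTERVAL seconds
            rss = proc.memory_info().rss / 2**20
            io = proc.io_counters()                    # cumulative bytes on Linux
            writer.writerow([int(time.time()), cpu, round(rss, 1),
                             round(io.read_bytes / 2**20, 1),
                             round(io.write_bytes / 2**20, 1)])
            f.flush()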
I think a graphical representation would be helpful for solving your problem, and I'd suggest Munin.
It's a resource monitoring tool with a web interface. By default it monitors disk I/O, memory, CPU, load average, network usage... It's light and easy to install. It's also easy to develop your own plugins and set alert thresholds.
http://munin-monitoring.org/
Here is an example of what you can get from Munin: http://demo.munin-monitoring.org/munin-monitoring.org/demo.munin-monitoring.org/

Linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512 MB of RAM (we have other models with 1 GB, but they see this issue much less often), and have no disk swap files, to avoid wearing out the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpgin/s, pgscand/s, pgsteal/s and majflt/s all increase steadily before snowballing to crazy levels. This pushes the CPU load up to correspondingly high levels (30-60 on dual-core ARM chips). At the same time, the frmpg/s values go very negative, whilst campg/s goes highly positive. The upshot is that the system is trying to allocate a large amount of cache pages all at once. I don't understand why this would be.
The target then essentially locks up until it's rebooted, or someone kills the main GUI process, or it crashes and is restarted (we have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, telnet blocks forever, as do /proc filesystem queries and things that rely on them, like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU cache using OpenGL ES 2.0. Most of the time it will be accessing a single file (they are about 50 MB in size), but I guess it could be triggered by having to suddenly use a new file and trying to cache all 50 MB of it in one go. I haven't yet done the test (putting more logging in) to correlate this event with these system effects.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen the oom-killer swoop in and kill the main application with >100 MB free and 40 MB of cache RAM). The main application's memory usage seems reasonably well-behaved, with a VmRSS value that stays pretty stable. Valgrind hasn't found any progressive leaks that would occur during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to question: linux high kernel cpu usage on memory initialization but that issue seemed driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various vm files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness seem like good candidates for some tuning, but I'd like some insight into good values to try here. The documentation for vfs_cache_pressure is vague about what, quantitatively, the difference is between setting it to 200 and setting it to 10000.
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that fact. A full cache purge might push this back out, though.
Here's a link to the log data with two files: sar1.log, which is the complete output of sar -A, and overview.log, an extract of free/cached memory, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute-force approach and put a script in place which echoes 3 to drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.
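For reference, a minimal Python equivalent of that hourly script (assuming it runs as root; it's just what the update describes, not a recommended long-term fix):

    # Equivalent of `sync; echo 3 > /proc/sys/vm/drop_caches`, i.e. flush dirty
    # pages, then drop the page cache plus dentries and inodes. Needs root;
    # schedule hourly from cron to reproduce the behaviour described above.
    import os

    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")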

Multiple CPUs' usage is symmetrical

I have noticed this a number of times while doing computationally expensive tasks on my computer, anything from computing hashes to rendering videos.
In this specific situation I was rendering a video using all 4 of my cores under Linux, and when I opened my system monitor once again I noticed it.
Two or more of my cores were under symmetrical usage: when one went up, the other went down, completely symmetrically and in sync.
I have no idea why this is the case and would love to know!
System monitor picture

Why doesn't WD Velociraptor speed up my VC++-compilation significantly?

Several people around here recommended switching to the new WD Velociraptor 10,000 rpm hard disk. Magazine articles also praise its performance.
I bought one and mirrored my old system to it. The resulting increase in compilation-speed is somewhat disappointing:
On my old Samsung drive (SATA, 7200), the compilation time was 16:02.
On the Velociraptor the build takes 15:23.
I have an E6600 with 1.5 GB of RAM. It's a C++ project with 1200 files. The build is done in Visual Studio 2005. Acoustic management is switched off (no big difference anyway).
Did something go wrong, or is this modest acceleration really all I can expect?
Edit:
Some people recommended increasing the RAM. I have done so now and got a minimal gain (3-5%) by doubling my RAM to 3 GB.
Are you using the /MP option (undocumented; you have to enter it manually in your compiler options) to enable source-level parallel builds? That will speed up your compile much more than just a faster hard disk - gains from the disk alone are marginal.
Visual Studio 2005 can build multiple projects in parallel, and will do so by default on a multi-core machine, but depending on how your projects depend on each other it may be unable to build them in parallel.
If your 1200 cpp files are in a single project, you're probably not using all of your CPU. If I'm not mistaken a C6600 is a quad-core CPU.
Dave
I imagine that hard disk reading was not your bottleneck in compilation. Realistically, few things need to be read from or written to the hard disk. You would likely see more of a performance increase from more RAM or a faster processor.
I'd suggest from the results that either your HDD's latency wasn't the bottleneck you were looking for, or that your project is already building close to as fast as possible. Other items to consider would be:
HDD access time (although you may not be able to do much about this due to bus speed limitations)
RAM access speed and size
Processor speed
Reducing background processes
A ~6% increase in speed just from swapping your hard drive. Just like Howler said: grab some faster RAM and a faster CPU.
As many have already pointed out, you probably didn't attack the real bottleneck. Randomly changing parts (or code for that matter) is as one could say "bass ackwards".
You first identify the performance bottleneck and then you change something.
Perfmon can help you get a good overview of whether you're CPU- or I/O-bound; you want to look at CPU utilization, disk queue length and I/O bytes to get a first glimpse of what's going on.
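If you'd rather script a quick check than set up Perfmon counters, a rough sketch with the third-party psutil package (which also works on Windows) prints a CPU-versus-disk picture while the build runs:

    # Rough CPU-versus-disk picture while the build runs (third-party psutil,
    # works on Windows too). Sustained high CPU with little disk traffic
    # suggests you are compute-bound rather than I/O-bound.
    import psutil

    prev = psutil.disk_io_counters()
    for _ in range(30):                       # ~30 one-second samples
        cpu = psutil.cpu_percent(interval=1)  # averaged over the 1 s window
        cur = psutil.disk_io_counters()
        mb = (cur.read_bytes + cur.write_bytes
              - prev.read_bytes - prev.write_bytes) / 2**20
        prev = cur
        print(f"CPU {cpu:5.1f}%   disk {mb:6.2f} MB/s")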
That is actually a pretty big bump in speed for just replacing a hard disk. You are probably memory or CPU bound at this point. 1.5GB is light these days, and RAM is very cheap. You might see some pretty big improvements with more memory.
Just as a recommendation, if you have more than one drive installed, you could try setting your build directory to be somewhere on a different disk than your source files.
As for this comment:
If your 1200 cpp files are in a single project, you're probably not using all of your CPU. If I'm not mistaken a C6600 is a quad-core CPU.
Actually, a C6600 isn't anything. There is a E6600 and a Q6600. The E6600 is a dual core and the Q6600 is a quad core. On my dev machine I use a quad core CPU, and although our project has more than 1200 files, it is still EASILY processor limited during compile time (although a faster hard drive would still help speed things up!).
1200 source files is a lot, but none of them is likely to be more than a couple of hundred KB, so while they all need to be read into memory, it's not going to take long to do so.
Bumping your system memory to 4 GB (yes, yes, I know about the 3-point-something GB limit that 32-bit OSes have) and maybe looking at your CPU are going to provide a lot more performance improvement than merely using a faster disk drive could.
VC 2005 does not compile more than one file at a time per project, so either move to VC 2008 to use both of your CPU cores, or break your solution into multiple library sub-projects to get multiple compilations going.
I halved my compilation time by putting all my source onto a ram drive.
I tried these guys http://www.superspeed.com/desktop/ramdisk.php, installed a 1GB ramdrive, then copied all my source onto it. If you build directly from RAM, the IO overhead is vastly reduced.
To give you an idea of what I'm compiling, and on what;
WinXP 64-bit
4 GB RAM
2.? GHz dual-core processors
62 C# projects
approx 250kloc.
My build went from about 135s to 65s.
The downside is that your source files are living in RAM, so you need to be more vigilant about source control. If your machine lost power, you'd lose all unversioned changes. This is mitigated slightly by the fact that some RAM drives will save themselves to disk when you shut the machine down, but you'll still lose everything since either your last checkout or the last time you shut down.
Also, you have to pay for the software. But since you're shelling out for hard drives, maybe this isn't that big a deal.
The upsides are the reduced compilation time, and the fact that the exes are already living in memory, so startup time and debugging time are a bit better. The real benefit is the compilation time, though.