Using inline assembly "__asm", what concerns are there with portability?

Is it safer to use heap or stack memory? Heap memory may be too non-deterministic, yielding different results on each run. The big deal is in some math computations where Microsoft's code does a lot of validation that may be slowing things down, which I just want to pull out. I am using Windows CE as an RTOS because there are big latency concerns in the application.
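To illustrate the portability concern, here is a hedged sketch of the same trivial operation written three ways. Microsoft's "__asm" block syntax is only accepted when targeting x86, so ARM builds typical of Windows CE devices need intrinsics or separate assembly files, and GCC-family compilers use a completely different extended-asm syntax:

    // Sketch only: the same increment written for different compilers/targets.
    int add_one(int x) {
        int result = x;
    #if defined(_MSC_VER) && defined(_M_IX86)
        // MSVC x86-only inline assembly; rejected when targeting ARM.
        __asm {
            mov eax, result
            inc eax
            mov result, eax
        }
    #elif defined(__GNUC__) && defined(__i386__)
        // GCC/Clang extended asm: a different syntax for the same instruction.
        __asm__("incl %0" : "+r"(result));
    #else
        ++result;   // portable fallback: let the compiler generate the code
    #endif
        return result;
    }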

Related

Detect and remove Memory Leak in Linux Application

We have a very large project, basically a Linux application that runs on a PowerPC processor. This project was initially developed by another company. We acquired the project from that company and now we are maintaining it.
The application is reported to have a lot of memory leaks. Since this is a large project, it is not feasible to go through each source file and hunt for the leaks. We have used Valgrind, mpatrol and other memory leak detection tools. These tools did not help much, and the memory leakage has not decreased by a significant percentage.
In this situation, how should we go about reducing the memory leaks by a significant amount? Is there a general method people use in such cases, other than memory leak detection tools like the ones mentioned above?
Valgrind is usually among the best tools for this task. If it does not work well for you, there are only a couple of other things you can try.
First question: what language is the application written in? Valgrind is very good for C and C++, but will not help you with garbage-collected or scripting languages, so check the language first. There might be something similar for Java, but I have not used Java much, so you would have to ask someone else.
Play around a lot with Valgrind's settings. One example is using --leak-check=full or similar options. There are also Valgrind tools and plugins that can enhance its detection capabilities.
You say that the application was reported to have a memory leak. How was this detected? Did the application detect it by itself? If it was detected by the application on its own, without any external tools, this probably means someone has added their own memory tracker inside the application. Custom memory trackers, memory pools etc. confuse Valgrind and any other leak detection system very badly. So if any custom memory handling is present in the application, your only choice is to either deactivate it (if possible) or to hook into this custom mechanism. How this could be done depends entirely on your application.
Add your own memory tracker. For example, in C++ it is possible to hook into new/delete calls and have them track the memory (a minimal sketch follows at the end of this answer). There are a couple of libraries you can use for this, or you can write your own new/delete replacement in about 500 LOC. If you decide to use this method, be sure to read a lot of tutorials on replacing new/delete, since there are several unusual corners of C++ involved in this task.
What makes you so sure there is a memory leak in the application (i.e. how was this detected)? If a tool just reported huge amounts of allocated memory, this might not even mean there is an actual memory leak. A memory leak means that the handles to the memory are lost, so it becomes impossible to ever reach and free that memory again. If your application just acquires a lot of memory and keeps it reachable, you probably have a completely different problem. For example, you might simply be using an algorithm with bad space complexity at one point or another, leading to many allocations. In that case you do not need a leak detector but rather a memory profiler, which gives you a more detailed overview of the memory footprint of different parts of the code. However, I have never used a profiler for this kind of task before, so I cannot give you any more hints on this.
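Here is the minimal sketch of the new/delete hook mentioned above, assuming C++11 for the atomics. It only counts live allocations and bytes; real trackers also record call stacks, and the fixed-size header used to remember each block's size is a simplification:

    #include <atomic>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    static std::atomic<long> g_live_allocs{0};
    static std::atomic<long> g_live_bytes{0};

    // Reserve a full max_align_t-sized header so the returned pointer stays aligned.
    static const std::size_t kHeader = alignof(std::max_align_t);

    void* operator new(std::size_t size) {
        void* raw = std::malloc(size + kHeader);
        if (!raw) throw std::bad_alloc();
        *static_cast<std::size_t*>(raw) = size;                 // remember the size
        g_live_allocs.fetch_add(1, std::memory_order_relaxed);
        g_live_bytes.fetch_add(static_cast<long>(size), std::memory_order_relaxed);
        return static_cast<char*>(raw) + kHeader;
    }

    void operator delete(void* p) noexcept {
        if (!p) return;
        void* raw = static_cast<char*>(p) - kHeader;
        g_live_allocs.fetch_sub(1, std::memory_order_relaxed);
        g_live_bytes.fetch_sub(static_cast<long>(*static_cast<std::size_t*>(raw)),
                               std::memory_order_relaxed);
        std::free(raw);
    }

    void report_leaks() {                                        // call at shutdown
        std::printf("live allocations: %ld, live bytes: %ld\n",
                    g_live_allocs.load(), g_live_bytes.load());
    }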
You could replace all memory allocation calls with calls to your own allocation methods, which should call the original methods and at the same time count memory usage and record where it was allocated. This will allow you to find the leaks and eliminate them by hand.
There might also be automated tools that allow you to do this - not sure, haven't used any. But this method works.
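One hedged sketch of that approach, using a hypothetical my_malloc/my_free pair plus macros to capture the call site (in a real code base you could also redirect calls with something like GNU ld's --wrap option):

    #include <cstdio>
    #include <cstdlib>

    static long g_outstanding = 0;   // sketch only: not thread-safe

    void* my_malloc(std::size_t size, const char* file, int line) {
        void* p = std::malloc(size);                 // still uses the original allocator
        if (p) {
            ++g_outstanding;
            std::printf("alloc %zu bytes at %s:%d -> %p\n", size, file, line, p);
        }
        return p;
    }

    void my_free(void* p) {
        if (p) { --g_outstanding; std::free(p); }
    }

    // Callers use these instead of malloc()/free(), so every allocation is attributed.
    #define MY_MALLOC(sz) my_malloc((sz), __FILE__, __LINE__)
    #define MY_FREE(p)    my_free(p)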
Perhaps you might also consider using Boehm's garbage collector (i.e. using GC_malloc instead of malloc etc. and not bothering to free data).
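A minimal sketch of what dropping in the Boehm collector looks like, assuming libgc is installed and linked with -lgc (the header path can vary between gc.h and gc/gc.h):

    #include <gc.h>          // Boehm-Demers-Weiser collector
    #include <cstdio>
    #include <cstring>

    int main() {
        GC_INIT();                                        // initialise the collector
        for (int i = 0; i < 100000; ++i) {
            // Allocate from the GC heap; no explicit free, unreachable blocks are reclaimed.
            char* buf = static_cast<char*>(GC_MALLOC(64));
            std::strcpy(buf, "collected automatically");
        }
        std::printf("GC heap size: %lu bytes\n",
                    static_cast<unsigned long>(GC_get_heap_size()));
        return 0;
    }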

Windows CE (RTOS) class-libraries for latency of interrupts and threads and USB?

I am getting started with Windows CE, using it as an RTOS to reduce latency concerns with interrupts, threads and USB. What class libraries (Visual C++) can you point me to that would be worth learning well to speed up the learning curve?
Thanks
That's a really, really broad question. The most important piece of advice I'll give you is that if you're after determinism and speed (your reference to an RTOS leads me to think you consider these important) then you need to be aware that any memory allocation or deallocation in a piece of code makes it non-deterministic.
C++ classes often have allocations and deallocations buried in them, so whatever you choose (and whatever you write), use them wisely. Sometimes they'll allow you to provide custom allocators (e.g. Boost) which you can use to just pull memory from an already allocated heap you create somewhere.
Keep the real-time parts of the code as small and simple as possible.
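To make the "pull memory from an already allocated heap" idea concrete, here is a minimal sketch of a fixed-block pool reserved up front, so the time-critical path never touches the general-purpose heap. It assumes a C++11 compiler, handles only one block size, is not thread-safe, and the names are illustrative:

    #include <cstddef>

    template <std::size_t BlockSize, std::size_t BlockCount>
    class FixedPool {
    public:
        FixedPool() : free_list_(nullptr) {
            // Thread every block onto a free list once, before real-time work starts.
            for (std::size_t i = 0; i < BlockCount; ++i) {
                Node* n = reinterpret_cast<Node*>(storage_ + i * BlockSize);
                n->next = free_list_;
                free_list_ = n;
            }
        }
        void* allocate() {                 // O(1), no system call, no lock
            if (!free_list_) return nullptr;
            Node* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        void deallocate(void* p) {         // O(1): push the block back on the list
            Node* n = static_cast<Node*>(p);
            n->next = free_list_;
            free_list_ = n;
        }
    private:
        struct Node { Node* next; };
        static_assert(BlockSize >= sizeof(Node), "block too small for the free list");
        alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
        Node* free_list_;
    };

    // Reserve the pool before entering the latency-sensitive code, then draw from it.
    static FixedPool<64, 1024> g_message_pool;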

Is g_slice really faster than malloc

The GLib docs recommend use of the GLib Slice Allocator over malloc:
"For newly written code it is recommended to use the new g_slice API instead of g_malloc() and friends, as long as objects are not resized during their lifetime and the object size used at allocation time is still available when freeing."
-- http://developer.gnome.org/glib/unstable/glib-Memory-Slices.html
But in practice, is g_slice significantly faster than Windows/Linux malloc (enough faster to warrant the extra trouble of handling sizes and GLib's preprocessor hacks like g_slice_new)? I'm planning to use GLib in my C++ program to handle INI-ish configuration (GKeyFile) and to get access to data structures not available in C++, like GHashTable, so the GLib dependency doesn't matter anyway.
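For reference, a small sketch of the two calling conventions being compared; it should build against GLib 2.x (the Point struct is made up). The extra bookkeeping the question mentions is that the type, and hence the size, must be repeated when freeing a slice:

    #include <glib.h>

    struct Point { gdouble x, y, z; };

    int main() {
        // Plain allocator: the size is only needed when allocating.
        Point* a = static_cast<Point*>(g_malloc(sizeof(Point)));
        g_free(a);

        // Slice allocator: the type (and hence the size) is needed again when freeing.
        Point* b = g_slice_new(Point);       // macro around g_slice_alloc(sizeof(Point))
        g_slice_free(Point, b);              // macro around g_slice_free1(sizeof(Point), b)

        return 0;
    }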
Whether it's enough faster to be worth it sort of depends on your app, but it should be faster.
There is another issue besides speed, which is memory fragmentation and per-block overhead. GSlice leaves malloc to deal with large or variable-size allocations while handling small, known-size objects more space-efficiently.
The slice API borrows heavily from research conducted by Sun Microsystems in the 1980s, where it was called slab allocation. I could not find the original research paper, but there is a Wikipedia page about it, or you can just google "slab allocation".
Essentially it eliminates expensive allocation/deallocation operations by facilitating reuse of memory blocks. It also reduces or eliminates memory fragmentation. So it is not all about speed, even though it should improve it as well.
Whether you should use it or not - it depends... Look at Havoc's answer; he summarized it pretty well.
Update 1:
Note that modern Linux kernels include the SLAB allocator as one of the options, and it is often the default. So the difference between g_slice() and malloc() may be unnoticeable in that case. However, the purpose of GLib is cross-platform compatibility, so using the slice API may somewhat guarantee consistent performance across different platforms.
Update 2:
As a commenter pointed out, my first update is incorrect: SLAB allocation is used by the kernel to allocate memory to processes, but malloc() uses an unrelated mechanism, so the claim that malloc() is equivalent to g_slice() on Linux is invalid. Also see this answer for more details.

Can CLR Profiler be used to find memory leaks

My .NET application has a memory leak. A few people seem to recommend using CLR Profiler for this purpose, but I am a bit lost on the idea. To me, in order to find a memory leak, a tool should compare two memory states and give you statistics like the growth in objects between the two states. So in my mind, if a tool cannot compare two (or more) memory states, it cannot be used for detecting memory leaks. Obviously things like performance counters are a bit of a different concept, where you can trend the memory usage.
So my question is really if someone can explain how exactly CLR Profiler can be used to detect memory leaks?
Well it depends on what kind of memory leak you have.
We had a reproducible one, where we knew that a certain chain of events should always leave a clean table after the work was done - but it didn't.
So we simply set up a test where we ran it a couple of thousand times - then we looked at those objects (now much greater in number) in the heap graph and at the "root" object, the reason the objects were still alive. It helped to solve our problem...

Do you expect that future CPU generations are not cache coherent?

I'm designing a program and I found that assuming implicit cache coherency makes the design much, much easier. For example, my single-writer (always the same thread), multiple-reader (always other threads) scenarios are not using any mutexes.
It's not a problem for current Intel CPUs. But I want this program to generate income for at least the next ten years (a short time for software), so I wonder if you think this could be a problem for future CPU architectures.
I suspect that future CPU generations will still handle cache coherence for you. Without this, most mainstream programming methodologies would fail. I doubt any CPU architecture that will be used widely in the next ten years will invalidate the current programming model - it may extend it, but it's difficult to drop something so widely assumed.
That being said, programming with the assumption of implicit cache coherency is not always a good idea. There are many issues with false sharing that can easily be avoided if you purposefully try to isolate your data. Handling this properly can lead to huge performance boosts (rather, a lack of huge performance losses) on current generation CPUs. Granted, it's more work in the design, but it is often required.
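A minimal illustration of the false-sharing point, assuming a 64-byte cache line: each worker gets its own padded counter so neighbouring threads do not keep invalidating each other's line (the counts and names are arbitrary):

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct alignas(64) PaddedCounter {          // one cache line per counter
        std::atomic<long> value{0};
    };

    int main() {
        constexpr int kThreads = 4;
        PaddedCounter counters[kThreads];       // without alignas, these share lines

        std::vector<std::thread> workers;
        for (int t = 0; t < kThreads; ++t) {
            workers.emplace_back([&counters, t] {
                for (int i = 0; i < 1000000; ++i)
                    counters[t].value.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto& w : workers) w.join();

        long total = 0;
        for (const auto& c : counters) total += c.value.load();
        std::printf("total = %ld\n", total);
        return 0;
    }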
We are already there. Computers claim cache coherency, but at the same time they have a temporary store buffer for writes, reads can be satisfied from this buffer instead of the cache (i.e. the store buffer has effectively become an incoherent cache), and invalidate requests are also queued, allowing the processor to temporarily use cache lines it knows are stale.
x86 doesn't use many of these techniques, but it does use some. As long as memory stays significantly slower than the CPU, expect to see more of these techniques, and others yet to be devised, in use. Even Itanium, failed as it is, uses many of these ideas, so expect Intel to migrate them into x86 over time.
As for avoiding locks, etc.: it is always hard to gauge people's level of expertise over the Internet, so either you are misguided about what you think might work, or you are on the cutting edge of lock-free programming. Hard to tell.
Do you understand the MESI protocol, memory barriers and visibility? Have you read stuff from Paul McKenney, etc?
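As a concrete illustration of the single-writer, multiple-reader case from the question, here is a hedged sketch using C++11 atomics (which postdate this discussion): the release/acquire pair states the visibility assumption explicitly instead of relying on cache coherence plus the compiler happening not to reorder anything. Names and values are made up:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    struct Config { int threshold; int timeout_ms; };

    static Config g_config_storage[2];                 // double buffer owned by the writer
    static std::atomic<const Config*> g_current{&g_config_storage[0]};

    void writer() {
        Config* next = &g_config_storage[1];
        next->threshold = 42;                          // plain writes to the inactive buffer
        next->timeout_ms = 100;
        g_current.store(next, std::memory_order_release);   // publish: prior writes visible
    }

    void reader(int id) {
        const Config* c = g_current.load(std::memory_order_acquire); // pairs with release
        std::printf("reader %d sees threshold=%d timeout=%d\n",
                    id, c->threshold, c->timeout_ms);
    }

    int main() {
        std::thread w(writer), r1(reader, 1), r2(reader, 2);
        w.join(); r1.join(); r2.join();
        return 0;
    }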
I don't know per se. But I'd like to see a trend toward non-cache coherent modes.
The conceptual mind shift is significant (can't just pass data in a method call, must pass it through a queue to an async method), but it's required as we move more and more into a multicore world anyway. The closer we get to one processor per memory bank the better. Because then we're working in a world of network message routing, where data is just not available rather than having threads that can silently stomp on data.
However, as Reed Copsey points out, the whole x86 world of computing is built on the assumption of cache coherency (which is even bigger than Microsoft's market share!). So it won't go away any time soon!
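A minimal sketch of the queue-based hand-off described above, using a simple mutex/condition-variable channel (the Channel name is made up; lock-free or per-core queues would be the next step):

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>

    template <typename T>
    class Channel {
    public:
        void send(T value) {
            {
                std::lock_guard<std::mutex> lock(m_);
                q_.push(std::move(value));
            }
            cv_.notify_one();
        }
        T receive() {                         // blocks until a message arrives
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return !q_.empty(); });
            T value = std::move(q_.front());
            q_.pop();
            return value;
        }
    private:
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<T> q_;
    };

    int main() {
        Channel<int> ch;
        std::thread consumer([&ch] {
            for (int i = 0; i < 3; ++i)
                std::printf("got %d\n", ch.receive());
        });
        for (int i = 0; i < 3; ++i) ch.send(i);   // the producer never shares the data itself
        consumer.join();
        return 0;
    }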
Here is a paper from reputable authors in the computer architecture area which argues that cache coherence is here to stay:
http://acg.cis.upenn.edu/papers/cacm12_why_coherence.pdf
"Why On-Chip Cache Coherence Is Here to Stay" by Martin, Hill and Sorin
You are making a strange request. You are asking for our (the SO community's) assumptions about future CPU architectures - a very dangerous proposition. Are you willing to put your money where our mouth is? Because if we're wrong and your application fails, it will be you who's not making any money.
Anyway, I would suspect things are not going to change that dramatically because of all the legacy code that was written for single-threaded execution, but that's just my opinion.
The question seems misleading to me. The CPU architecture is not that important; what is important is the memory model of the platform you are targeting.
You are developing the application in some environment, with some defined memory model. E.g. if you are currently targeting x86, you can be pretty sure any future platform will implement the same memory model when it is running x86 code. The same is true for Java or .NET VMs and other execution platforms.
If you expect to port your current application to some other platform whose memory model is different, you will have to adjust for it, but in that case you are the one doing the port and you have complete control over how you do it. This is true even for current platforms; e.g. the PowerPC memory model allows many more reorderings than the x86 one.
