Garbage Collector in Real-Time System - garbage-collection

I'm new to C#/Java and plan to use one of them to prototype a soft real-time system.
If I wrote a C#/Java app the same way I write C++ in terms of memory management, that is, I explicitly "delete" the objects I no longer use, would the app still be affected by the garbage collector? If so, how would it affect my app?
Sorry if this question has an obvious answer, but being new, I want to be thorough.

Take a look at IBM's Metronome, their garbage collector for hard real-time systems.

Your premise is wrong: you cannot explicitly “delete” objects in either Java or C#, so your application will always be affected by the GC.
You may try to trigger a collection by calling GC.Collect (C#) with an appropriate parameter (e.g. GC.MaxGeneration), but this still doesn't guarantee that the GC won't be working at other moments during execution.
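For illustration, a rough sketch of what that looks like (GC.Collect, GC.MaxGeneration and GCCollectionMode are standard .NET APIs; even a forced collection is only a request, and the GC still runs on its own at other times):

    using System;

    class GcDemo
    {
        static void Main()
        {
            // Ask the runtime to collect every generation up to the oldest one.
            // This does not stop the GC from also running whenever it chooses.
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
    }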

By explicitly "delete" if you mean releasing the reference to the object then you are reliant on the garbage collector in C# managed code - see the System.GC class for ways of controlling it.
If you choose to write unmanaged C# code then you will have more control over memory, akin to C++, and will be responsible for deleting your instantiated objects, able to use pointers, etc. For more info see MSDN doc - Unsafe Code and Pointers (C# Programming Guide).
In unmanaged code you will not be at the mercy of the the Garbage Collector and its indeterminate cleanup algorithms.
I don't know if Java has an equivalent unmanaged mode, but this Microsoft info might help provide some direction on C#/.NET to use its available features for your requirement of dealing with the garbage collector.
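A minimal sketch of the unsafe route mentioned above, assuming you compile with the /unsafe switch; stackalloc memory lives on the stack rather than the GC heap, but ordinary objects are still collected as usual:

    class UnsafeDemo
    {
        // Requires compiling with /unsafe (or <AllowUnsafeBlocks>true</AllowUnsafeBlocks>).
        static unsafe void Main()
        {
            // stackalloc reserves memory on the stack, outside the GC heap;
            // it is reclaimed when the method returns, not by the collector.
            int* buffer = stackalloc int[16];
            for (int i = 0; i < 16; i++)
                buffer[i] = i * i;
            System.Console.WriteLine(buffer[5]); // prints 25
        }
    }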

In C# or Java you can't delete an object; all you can do is drop your references to it so it becomes eligible for collection. The actual freeing of memory is done by the garbage collector. It is possible that the collector never runs during the lifetime of your application, but it is likely to. The most likely time for GC routines to run is when the system is getting short of resources, and when resources are low the GC becomes the highest-priority thread, so your application does get affected. You can minimize the effect by working out the load and resources your application will need over its lifetime and making sure the hardware is sized for that, but even so you can't simply benchmark your performance and count on it.
Besides the GC, a managed application also carries a slight overhead compared to a traditional C++ application because of the extra layer of indirection involved, plus a small first-time startup penalty, since the runtime needs to be up and running before your application starts.
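As an aside, newer versions of .NET (4.5 and later) let you hint the collector about latency; this is only a rough sketch and only a hint, not a guarantee of deterministic behaviour:

    using System.Runtime;

    class LatencyDemo
    {
        static void Main()
        {
            // Trade throughput and memory for shorter GC pauses during
            // the time-sensitive part of the application.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

            // ... time-sensitive work here ...

            GCSettings.LatencyMode = GCLatencyMode.Interactive; // back to the default
        }
    }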

Here are some references for developing real-time systems with the .net compact framework:
IEEE - C# and the .NET Framework: Ready for Real Time?
MSDN - Real-Time Behavior of the .NET Compact Framework
They both talk about the memory requirements of using the .net framework.

C# and Java are not designed for hard real-time development; soft real-time is attainable, however, as you note.
For C#, the best you can do is implement the finalize/dispose pattern:
http://msdn.microsoft.com/en-us/library/b1yfkh5e(VS.71).aspx
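Roughly, the pattern from that link looks like this (a sketch only; Resource is a placeholder name):

    using System;

    class Resource : IDisposable
    {
        private bool _disposed;

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this); // deterministic cleanup done, finalizer not needed
        }

        protected virtual void Dispose(bool disposing)
        {
            if (_disposed) return;
            if (disposing)
            {
                // Release managed resources here.
            }
            // Release unmanaged resources (handles, native memory) here.
            _disposed = true;
        }

        ~Resource()
        {
            Dispose(false); // GC-driven fallback; timing is not deterministic
        }
    }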
You can request the GC to collect, but typically it's much better than you are at determining when and how to do this.
http://msdn.microsoft.com/en-us/library/system.gc(VS.71).aspx
For Java, there are many options to optimize it:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
Along with third party solutions like IBM Metronome as noted above.
This is a real science within CS itself.

Related

.Net Core 2.0 Web API - Servers crashing because of issue with Destructors

Ok, so I have an app with pretty heavy traffic (about 17 requests per second). The app is a REST API built with .Net Core 2.0 (just recently upgraded).
The app is hosted on Azure and we are having a problem that looked like a memory leak in that the servers would very slowly (over a week) eat up all the handlers and resources and eventually crash.
I have spoken a good bit to MS Support and they helped me narrow down the problem. Here is their last email:
"We are seeing a high amount of large objects (strings and arrays over
85000 bytes) can lead to GC Heap fragmentation and thus higher memory
usage in your application. We were investigating how to manage the
destructor and I can provide you the following documentation:
Why does the Finalize/Destructor example not work in .NET Core? Why does the Finalize/Destructor example not work in .NET Core?
(not a Microsoft official documentation but it can be use as reference )
ASP.NET Case Study: Bad perf, high memory usage and high
CPU in GC – Death By ViewState:
https://blogs.msdn.microsoft.com/tess/2006/11/24/asp-net-case-study-bad-perf-high-memory-usage-and-high-cpu-in-gc-death-by-viewstate/
Finalizers (C# Programming Guide):
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/destructors
I will continue looking for more documentation related with the
destructor in .NET Core."
After this they basically said that Azure was not to blame and I needed to open up a "code" support ticket that costs about $500...
So I am coming here instead. :)
While I have been a .Net developer for over 15 years, this was my first time using .Net Core. I found this great article and used it as the backbone to my API (https://chsakell.com/2016/06/23/rest-apis-using-asp-net-core-and-entity-framework-core/).
When I compared it to other .Net Core examples it seemed to fall in line with those so I am reasonably confident that I am following "best practices", but I could be wrong.
My fear is that there is a fundamental problem with .Net Core (which those articles that MS referred to kinda suggest), but I am not sure how to find the answer. I don't want to have to rewrite my code because of this, but aside from occasionally rebooting the servers I am not sure what my options are.
Thoughts?
Ok... for posterity... my eventual solution turned out to be a configuration setup issue... the destructor issue with Core wasn't a factor for me because we weren't sending strings large enough to trigger it.
You can see my approach and the eventual answer (using a singleton) in this question:
ASP.Net Core 2 configuration taking up a lot of memory. How do I get config information differently?
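For anyone hitting the same thing, a rough sketch of the singleton idea (names like AppSettings are placeholders; assumes ASP.NET Core 2.x dependency injection): bind the configuration once and register the result for the app's lifetime instead of rebuilding it per request.

    using Microsoft.Extensions.Configuration;
    using Microsoft.Extensions.DependencyInjection;

    // Hypothetical POCO mirroring a section of appsettings.json.
    public class AppSettings
    {
        public string ConnectionString { get; set; }
    }

    public class Startup
    {
        public Startup(IConfiguration configuration)
        {
            Configuration = configuration;
        }

        public IConfiguration Configuration { get; }

        public void ConfigureServices(IServiceCollection services)
        {
            var settings = new AppSettings();
            Configuration.GetSection("AppSettings").Bind(settings);

            // One instance for the app's lifetime instead of rebuilding
            // configuration objects on every request.
            services.AddSingleton(settings);

            services.AddMvc();
        }
    }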

What are Managed objects and unmanaged objects in C++/CLI?

What are managed objects and unmanaged objects in C++/CLI?
Managed objects are a feature of the .NET framework and its implementation of a C++-like language (C++/CLI), and have their memory managed for you by the .NET garbage collector. C++ itself has no such concept, and has a better (in general) way of managing all resources (not just memory), called RAII.
The concept Managed/Unmanaged is not typically C++. It is Microsoft .Net technology speak.
In normal, plain C++ applications, the application itself is responsible for deleting all the memory it has allocated. This requires the developer to be very careful about when to delete memory. If memory is deleted too soon, the application may crash if it still has a pointer to it. If memory is deleted too late, or not deleted at all, the application has a memory leak.
Environments like Java and .Net solve this problem by using garbage collectors. The developer no longer deletes memory; the garbage collector does it for them.
In the 'native' .Net languages (like C#), the whole language works with the garbage collector concept. To make the transition from normal, plain C++ applications to .Net easier, Microsoft added some extensions to its C++ compiler, so that C++ developers could already benefit from the advantages of .Net.
Whenever you use normal, plain C++, Microsoft talks about unmanaged, or native C++. If you use the .Net extensions in C++, Microsoft talks about managed C++. If your application contains both, you have a mixed-mode application.
Managed objects do not exist in C++.
They exist in Microsoft's .NET extensions to C++, and a complete explanation would be a bit long, sorry.

Managing multiple-processes: What are the common strategies?

While multithreading is faster in some cases, sometimes we just want to spawn multiple worker processes to do work. This has the benefits of not crashing the main app if one of the workers crashes, and of not having to worry as much about locking between threads.
COM+'s Application Pooling seems like a good way to achieve this on Windows. The downside is that we need to write a COM+ wrapper for the worker process.
However, when I search for Application Pooling on Google, it seems like most of its usages are related to IIS. Don't other applications (such as scientific/graphics) find it useful to spawn multiple worker processes?
So there are several questions:
Why isn't COM+ more popular in areas other than IIS? If I write a non-IIS application and want to use process management on Windows, should I go with COM+ or are there better alternatives out there?
What would be the cross platform way to do it? Are there libraries out there that give me a "process pool" (worker processes will intelligently pick up work, can be managed, etc.)
I can't offer any answers to the COM aspect of your question, but it's worth noting there's another world (besides HPC MPI) where multi-processing (rather than the more common multi-threading approach) is apparently alive, well and thriving: Python.
Why? Python's GIL ("global interpreter lock") cripples most attempts to multithread Python code so badly that multiprocessing is the generally recommended approach to parallelising Python on SMP. The standard library includes process pools; there are various other options too.
Python certainly ought to satisfy any multi-platform requirement!
You might want to investigate how the apache web server manages process pools. From version 2.0 it runs natively on windows and one of the multi-processing models it supports are process pools. A part of apache is also APR (apache portable runtime), which handles platform-specific issues.
No one can answer why something is not popular; maybe nobody else is looking for what you are looking for. After .NET came into the picture, people shifted from COM to the managed environment; before .NET, COM, ATL and related technologies were quite painful to implement, would crash, and were also quite difficult to debug.
That is the reason the managed environment came into existence.
However, from .NET 4 onwards, the parallel libraries give the user much more power for parallel programming, and you can also spawn and control other processes.
For multi-platform, you can look at zvrba's answer.
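For example, a rough sketch of spawning and controlling worker processes from .NET (Worker.exe is a hypothetical worker executable):

    using System.Diagnostics;
    using System.Linq;

    class Coordinator
    {
        static void Main()
        {
            // Spawn a few isolated worker processes; a crash in one of them
            // does not take down this coordinating process.
            var workers = Enumerable.Range(0, 4)
                .Select(i => Process.Start("Worker.exe", i.ToString()))
                .ToList();

            foreach (var worker in workers)
                worker.WaitForExit();
        }
    }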
Yes, other applications--especially science applications--find it useful to spawn multiple processes. Since few super-computers run Microsoft Windows, scientists generally avoid using anything that ties them to a Microsoft platform. Nothing related to COM will help scientists leverage their enormous existing code base written in Fortran.
People who choose to run IIS have generally already drunk the Microsoft Koolaid, so they have fewer inhibitions to tying themselves to Microsoft's proprietary platforms, which is why COM-specific terminology will get lots of hits related to IIS.
One of the open standards for doing what you want is the Message Passing Interface. Several implementations exist and some of them run on supercomputers using Fortran. Some of them run on cheaper computers using sexier languages.
See http://en.wikipedia.org/wiki/Message_Passing_Interface
There hasn't been a mob rushing through the doors of COM application pooling primarily because of two factors:
COM is a pain in the ass to deal with compared to just about anything else
Threading can be a headache, but it's a lot easier and more convenient to manage than inter-process communication
COM application pooling was essentially created for IIS. It has one very specific benefit over normal multithreading: the multiple processes are fully isolated from each other. This is important for data security and for app stability when dealing with third party plugins of questionable stability.
Scientific computing generally doesn't need strong data security isolation between operations, and I would venture to guess that scientific computing doesn't rely much on third party plugins of questionable stability. When doing big math operations, you're either using a sexy numerics library that had better be rock solid to be taken seriously, or you're using your own code, in which case crashes should be fixed and repeat offenders should be spanked.
Oh, and all crashes except stack overflow can be trapped and dealt with within a multithreaded app, especially if it's your own code.
In short, COM app pooling is overkill for just about anything other than IIS.
Google's web browser Chrome has a multi-process architecture. It is open source, so you can check out its code and see how it manages processes.

Should I use .NET 4.0 Tasks in a library?

I'm writing a .NET 4.0 library that should be efficient and simple to use.
The library is used by referencing it and using its different classes.
Should I use .NET 4.0 Tasks to make things more efficient internally? I fear that it might make use of the library more complex and limited, since users might want to decide for themselves when and where to use tasks and threads.
If your answer depends on the kind of library, here is more information:
The library is Pcap.Net, which is a wrapper for WinPcap and includes a packet interpretation framework.
It is only an issue when the user can 'see' the threading, i.e. you give out access to data that could be accessed (by you) on another thread. Probably not a good idea.
But when the parallel processing stays completely inside your application then there is very little chance your users would object.
Should? Dunno. How about giving people an option by providing extension methods that use Tasks against the library, and pushing that out in a separate DLL? If you want to use Tasks, reference the extension library and go crazy. Otherwise, stick with the core DLL.
I believe there are many projects that follow this pattern with Linq. They provide their core library and a separate .Linq.DLL which has extension methods...
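A rough sketch of that layering, with made-up names (the core assembly stays synchronous; the optional extension assembly adds Task-based overloads on top):

    using System.Threading.Tasks;

    // In the core library: a plain synchronous API that imposes no threading.
    public class PacketSender
    {
        public void Send(byte[] packet)
        {
            // ... synchronous work ...
        }
    }

    // In a separate, optional assembly (say, MyLib.Tasks.dll), so that only
    // callers who reference it see the Task-based surface.
    public static class PacketSenderTaskExtensions
    {
        public static Task SendAsync(this PacketSender sender, byte[] packet)
        {
            // Task.Factory.StartNew is available in .NET 4.0.
            return Task.Factory.StartNew(() => sender.Send(packet));
        }
    }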

Are there examples for programming-languages support automatic management of resources besides memory?

The idea of automatic memory management has gained broad support in new programming languages. I'm interested in whether concepts exist for automatic management of other resources like files, network sockets, etc.
For single-threaded applications, the pattern of a resource being available for the extent of a block of code, with clean-up at the end, exists in several languages. Examples are the use of RAII in C++, or with-open-file in Common Lisp (and its equivalents in newer Lisp-influenced languages: the same pattern exists in Dylan, C# and Python, and in Ruby you can pass a block to a file object).
I'm not aware of anything better suited for the multithreaded environments where modern garbage collection shines, short of combining RAII and reference counting or auto_ptr in C++, which isn't always a trivial combination.
One important distinction between automatic management of resources and automatic memory management is that memory management can often afford to be non-deterministic and only reclaimed when the process requires it, whereas often a resource is limited at an OS level, so should be reclaimed as soon as it is no longer used. Hence the choice of smart pointers rather than garbage collection as the management implementation. There's an intermediate level of resource - GDI objects, temporary file handles, threads - where an application wants to limit the total it uses, but doesn't care so much about releasing them to other processes - these are often pooled, which gets you some way towards automatic management.
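For instance, the C# version of that block-scoped pattern is the using statement: the handle is released deterministically when the block ends, even though the object's memory is reclaimed later by the GC ("data.txt" is just a placeholder file name):

    using System;
    using System.IO;

    class UsingDemo
    {
        static void Main()
        {
            // The file handle is closed when the block exits, regardless of
            // when the garbage collector later reclaims the reader object.
            using (var reader = new StreamReader("data.txt"))
            {
                Console.WriteLine(reader.ReadLine());
            }
        }
    }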
One of the reasons we can automatically manage memory allocation now is that we have so much of it.
Back in the days when memory was tight you had to squeeze the most out of every byte the system had.
Other resources such as file handles and sockets are far fewer, and still need to be handled by hand (pun intended).
Consider also the .NET Compact Framework: it's not uncommon for Windows Mobile devices to have 32MB or 64MB of volatile memory to play with, which - when you think about it - is still "lots".
I'm wondering what the footprint of the .NET Compact Framework is, and how it would perform on a Nokia phone with 4MB of volatile memory.
Anyone any ideas?
(This is a wiki answer, feel free to correct or add more detail)
So, IMHO we can afford to be slow reclaiming memory, because we're not going to run out of it in a hurry, which isn't the case with other resources.
Object persistence and caching subsystems can be considered an automatic allocation of files and resources. If you apply a caching subsystem to a network connection you don't have to care about opening files, deleting files, and so on.
A way to manage network connections automatically can be found in parallel computing environments (e.g. MPI): you can set up the shape of the processor interconnections programmatically, then just send a message from one process to another, almost ignoring the way it's implemented. Sometimes those messages are translated into sockets.
If you have a function that lets you get a page from its URL, would you consider that a sort of automatic socket management?
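For what it's worth, that is essentially what a call like this does in C# - the socket is opened, used and closed entirely inside the library, and the caller never touches it (the URL is just an example):

    using System;
    using System.Net;

    class PageFetch
    {
        static void Main()
        {
            // The underlying connection is managed entirely by the library;
            // disposing the client releases whatever it still holds.
            using (var client = new WebClient())
            {
                string html = client.DownloadString("http://example.com/");
                Console.WriteLine(html.Length);
            }
        }
    }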
