Releasing resources in Haxe (mainly for XML parser disposal)?

I'm new to Haxe, coming from ActionScript. I was looking for ways to dispose of resources when I can't reuse them. In particular, is there something like ActionScript's System.disposeXML for Haxe's Fast XML?

It all depends on the target platform, but in JavaScript/AS3, to have an object or graph of objects disposed of, simply make sure there are no references to it anywhere in your program. The garbage collector will take care of it.
What disposeXML does seems to be overkill. With modern garbage collectors, you don't need to break all references within the group, just the references from outside the group that point to any of its members.
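The reachability rule can be illustrated with a toy model of a tracing collector's mark phase. It is written in C++ rather than Haxe, and the graph encoding and the MarkReachable function are hypothetical; real collectors are far more elaborate, but a tracing collector decides liveness the same way: an object survives only if some root can reach it, no matter how its members point at each other.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy model of a tracing collector's mark phase (hypothetical encoding).
// Objects are indices; edges[i] lists the objects that object i references.
std::unordered_set<std::size_t> MarkReachable(
        const std::vector<std::vector<std::size_t>>& edges,
        const std::vector<std::size_t>& roots) {
    std::unordered_set<std::size_t> marked;
    std::vector<std::size_t> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        std::size_t obj = stack.back();
        stack.pop_back();
        if (!marked.insert(obj).second) continue;  // already visited
        for (std::size_t ref : edges[obj]) stack.push_back(ref);
    }
    return marked;  // everything not marked would be reclaimed
}
```

For example, a three-node cycle (like XML nodes pointing at parents and children) stays fully alive while the program holds one reference into it, and becomes entirely collectable once that single external reference is dropped, even though the internal references are untouched.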

Related

Is Office automation with #import harmful?

http://code.msdn.microsoft.com/office/CppAutomateOutlook-55251528 states:
[...] It is very powerful, but often not recommended because of
reference-counting problems that typically occur when used with the
Microsoft Office applications. [...]
Which reference-counting problems are meant here, specifically?
For example, do they apply to this particular example?
As in the example, I just want to open Outlook, create an appointment, and be done.
I wanted to use #import, but this statement makes me wary of it...
A circular reference happens when you have two or more objects holding references to each other, directly or indirectly. In COM, it means that circularly linked objects have called IUnknown::AddRef on each other.
In the case of Excel automation, that could happen if you connect your event handler (sink) to Excel objects that source events (via IConnectionPoint::Advise). This way, you may be keeping a reference to, e.g., the Application object, while the Application object keeps a reference to your sink.
This problem is not specific to the smart pointers generated by the VC++ #import directive. It's about how you handle the shutdown of the COM objects when you no longer need them. You should explicitly break all connections you've made (i.e., call IConnectionPoint::Unadvise) and call any explicit shutdown API the object may expose (e.g., Workbook::Close or Application::Quit). Then you should explicitly release your reference (e.g., call workbookPtr.Release() on a smart pointer).
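The cycle and the shutdown order can be simulated without COM at all. In this sketch, RefCounted, Source, Sink, Connect and the two shutdown functions are hypothetical stand-ins: Source plays the Excel Application object, Sink plays your event sink, and Advise/Unadvise only mimic what IConnectionPoint does to the reference counts (no cookies, no real interfaces). "Destruction" is recorded in a flag so the outcome can be checked.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: only the reference count is modeled.
struct RefCounted {
    int refs = 1;            // the creator owns the first reference
    bool destroyed = false;  // set instead of actually freeing the object
    virtual ~RefCounted() = default;
    void AddRef() { ++refs; }
    void Release() {
        if (--refs == 0) {
            destroyed = true;
            OnFinalRelease();  // a real object would also free itself here
        }
    }
    virtual void OnFinalRelease() {}
};

// Event source (think Application): the connection point holds the sink.
struct Source : RefCounted {
    RefCounted* advised = nullptr;
    void Advise(RefCounted* sink) { sink->AddRef(); advised = sink; }
    void Unadvise() {
        if (advised) { advised->Release(); advised = nullptr; }
    }
    void OnFinalRelease() override { Unadvise(); }
};

// Event sink: keeps a reference back to the source.
struct Sink : RefCounted {
    RefCounted* app = nullptr;
    void OnFinalRelease() override {
        if (app) { app->Release(); app = nullptr; }
    }
};

// Sink references source, source references sink: a cycle.
void Connect(Source& app, Sink& sink) {
    app.AddRef();
    sink.app = &app;
    app.Advise(&sink);
}

// Dropping only the client's own references leaves both objects alive.
void NaiveShutdown(Source& app, Sink& sink) {
    sink.Release();
    app.Release();
}

// Breaking the connection first lets both counts reach zero.
void OrderlyShutdown(Source& app, Sink& sink) {
    app.Unadvise();
    sink.Release();
    app.Release();
}
```

After NaiveShutdown both counts are stuck at 1, which in real life is the Excel process that never exits; after OrderlyShutdown both objects reach zero.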
That said, if you don't handle any COM events sourced by Excel, you shouldn't worry much; the chance of creating a circular reference is low. Besides, Excel is an out-of-process COM server, and COM has some garbage collection logic in place to manage the lifetime of out-of-process servers. However, while your application is still open, the Excel process will be kept alive until all references to its objects have been released or Application::Quit has been called.
That is pretty nonsensical. Programming Office interop with the aid of #import is the standard, recommended way. The smart pointer types it auto-generates are explicitly intended to get reference counting done automatically for you, so you can't forget to call Release(). There are a few sharp edges; you do have to understand what smart pointers can and cannot do.
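What "reference counting done automatically" means can be shown with a minimal intrusive smart pointer. RefPtr and Counted are hypothetical illustrations, not the real _com_ptr_t that #import builds on: copying calls AddRef, going out of scope calls Release, and an explicit Release() member allows an early release like workbookPtr.Release().

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal intrusive smart pointer (not _com_ptr_t itself).
template <typename T>
class RefPtr {
public:
    RefPtr() = default;
    explicit RefPtr(T* p) : p_(p) {}  // adopts an existing reference
    RefPtr(const RefPtr& other) : p_(other.p_) {
        if (p_) p_->AddRef();         // a copy is a new reference
    }
    RefPtr(RefPtr&& other) noexcept : p_(std::exchange(other.p_, nullptr)) {}
    RefPtr& operator=(RefPtr other) noexcept {  // copy-and-swap
        std::swap(p_, other.p_);
        return *this;
    }
    ~RefPtr() { Release(); }          // scope exit releases automatically

    // Explicit early release; safe to call more than once.
    void Release() {
        if (p_) {
            p_->Release();
            p_ = nullptr;
        }
    }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T* p_ = nullptr;
};

// Hypothetical object that only tracks its reference count.
struct Counted {
    int refs = 1;
    void AddRef() { ++refs; }
    void Release() { --refs; }
};
```

One sharp edge this makes visible: a smart pointer cannot break a cycle by itself, so the Unadvise step from the previous answer is still on you.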
This is otherwise par for the course for the team behind the All-In-One Code Framework. The samples are created by a support team in Shanghai, originally hired to help out in the MSDN forums. These guys don't have the kind of credentials you'd expect from a Microsoft programmer working in Redmond, and their snippets are not being reviewed. Some of them are outright ill-conceived. If you have ever asked questions on the MSDN forums and seen the answers they post, you know what I mean.
Their Solution2.cpp sample uses late binding through IDispatch. That's definitely the hard way to interop with Office: you get no help whatsoever when you write the code. IntelliSense is unable to give you any useful information when you write the method call, nor can the compiler tell you that you missed an argument or got the argument type wrong. Your program will fail at runtime with an opaque error code, like DISP_E_BADVARTYPE or DISP_E_BADPARAMCOUNT. And the Release() calls have to be made explicitly, which of course makes it easier to miss one. You don't have these problems when you use the smart pointers; they give you auto-completion and type checking. You can see for yourself how much smaller and more readable Solution1.cpp is.
Diagnosing a missed call to Release() is otherwise easy: your program completes, but you'll still see Outlook.exe running in Task Manager. That is something you'll get used to checking anyway, because it also happens when you debug your program, find a bug and stop the program to make a correction. That of course also prevents Release() from being called, so Outlook keeps running and you have to kill it yourself.
Do consider writing this kind of code in a managed language like C# or VB.NET. You'll get a lot more help if you have a problem and you'll find lots of sample code. And the garbage collector never forgets to make a Release call. It is just a bit slow at doing so.

Garbage collection with glib?

I would like to interface a garbage-collected language (specifically, one that uses the venerable Boehm libgc) to the glib family of APIs.
glib and gobject use reference counting internally to manage object lifetime. The normal way to wrap these is to use a garbage-collected peer object which holds a reference to the glib object and drops the reference when the peer gets finalised; this means that the glib object is kept alive while the application is using the peer. I've done this before, and it works, but it's pretty painful and has its own problems (such as producing two peers for the same underlying object).
Given that I've got all the overhead of a garbage collector anyway, ideally what I'd like to do is to simply turn off glib's reference counting and use the garbage collector for everything. This would simplify the interface no end and hopefully improve performance.
On the face of things this would seem fairly simple: hook up a garbage collector finaliser to the glib object finaliser, and override the ref and unref functions to be no-ops. But further investigation shows there's more to it than that. glib is very fond of keeping its own allocator pools, for example, and of course if I let it do that, the garbage collector will assume that everything in the pool is live, and it'll leak.
Is persuading glib to use libgc actually feasible? If so, what other gotchas am I likely to face? What sort of glib performance impact would forcing all allocations to go through libgc produce (as opposed to using the optimised allocators currently in glib)?
(The glib docs do say that it's supposed to interface cleanly to a garbage collector...)
http://mail.gnome.org/archives/gtk-devel-list/2001-February/msg00133.html is old but still relevant.
Learning how language bindings work (proxy objects, toggle references) would probably be helpful in thinking this through.
Update: oh, when I read "Boehm GC" I assumed you were trying to replace g_malloc etc. with GC, as in that old post.
If you're doing a language binding (not GC'ing C/C++), then yes, that's very achievable. A good, fairly manageable example to read over is the gjs (SpiderMonkey JavaScript) codebase.
The basic idea is that you're going to have a proxy object that "holds" a GObject and often has the only reference to the GObject. The one complication is toggle references: http://mail.gnome.org/archives/gtk-devel-list/2005-April/msg00095.html
You have to store the proxy object on the GObject so you can get it back (say someone does widget.get_parent(), then you need to return the same object that was previously set as the parent, by retrieving it from the C GObject). You also have to be able to go from the proxy object to the C object obviously.
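Both ideas (a back-pointer for proxy identity, and toggle references) can be simulated in a few lines. This is a sketch under stated assumptions, not glib code: CObject stands in for a GObject, its `proxy` field stands in for data attached to the GObject, ProxyHeap stands in for the collector's heap, and Ref/Unref fire the toggle notification when the count crosses between 1 and 2, which is the behaviour toggle references are described as having.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Proxy;

// Hypothetical stand-in for a GObject wrapped by a language binding.
struct CObject {
    int refcount = 1;        // 1 = only the proxy's toggle reference
    Proxy* proxy = nullptr;  // back-pointer, so the same proxy can be found again
    void Ref();
    void Unref();
};

struct Proxy {
    CObject* native;         // proxy -> C object direction
    bool rooted = false;     // true: the GC must keep this proxy alive
    explicit Proxy(CObject* obj) : native(obj) {}
    // Toggle notification: if the proxy holds the last reference, the pair may
    // be collected; if C code also holds references, the proxy must stay rooted.
    void OnToggle(bool is_last_ref) { rooted = !is_last_ref; }
};

void CObject::Ref() {
    if (++refcount == 2 && proxy) proxy->OnToggle(false);
}
void CObject::Unref() {
    if (--refcount == 1 && proxy) proxy->OnToggle(true);
}

// Hypothetical stand-in for the managed heap that owns the proxies.
struct ProxyHeap {
    std::vector<std::unique_ptr<Proxy>> proxies;

    // Same C object in, same proxy out: check the back-pointer first.
    Proxy* Wrap(CObject* obj) {
        if (obj->proxy) return obj->proxy;
        proxies.push_back(std::make_unique<Proxy>(obj));
        obj->proxy = proxies.back().get();
        obj->proxy->rooted = obj->refcount > 1;
        return obj->proxy;
    }
};
```

Wrapping the same widget twice (say, once on creation and once via get_parent()) yields one proxy, and the proxy is only pinned while C code such as a container also holds the widget.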
No.
Since asking this I have discovered that libgc does not search memory owned by third-party libraries for references. Which means that if glib has, in its own workspace, the only reference to an object allocated via libgc, libgc will collect it and then your program will crash.
libgc is only safe to use on objects owned by the main program.
For future visitors, you can refer to this article (not mine): http://d.hatena.ne.jp/bellbind/20090630/1246362401.
It's written in Japanese but the code is readable.
The compilation options mentioned in https://mail.gnome.org/archives/gtk-devel-list/2001-February/msg00133.html may also work; I haven't tested them myself.
Another relevant issue, concerning G_SLICE, in case you encounter it: http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2011-January/004289.html.

Parsing .ply files without memory leaks

I downloaded the .ply files in the Stanford 3D scanning repository, and am using Stanford's code from that page (ply.h, plyfile.c) to parse them. However, looking at this code, I see that it's rife with mallocs that are never freed. I could close my eyes and look the other way, but it makes my teeth itch.
I can think of two workarounds:
One is to use Hans Boehm's garbage collector, or something similar, which redefines malloc so that allocations are managed by a garbage collector. I've never used this library, but perhaps there's a way to have it operate just on the mallocs in the Stanford code and not anywhere else.
The other workaround is to use a different parser, preferably a C++ one with nicely RAII-ified memory management. I see a few alternative parsers and converters listed at the above link, but rather than kill a day or two trying them all, I was hoping to get a recommendation here.
Can anybody recommend a way to parse .ply files without memory leaks, either by containing the memory leaks in the Stanford parser, or using a different parser, or by some third method I haven't thought of?
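A middle ground between the two workarounds is to wrap whatever the C parser hands back in an owning C++ pointer as soon as you receive it. This is a sketch under assumptions: c_copy_name is a hypothetical stand-in for a C function that returns malloc'ed memory and leaves freeing it to the caller (it is not part of the Stanford API), and this only contains allocations that are returned to you; memory the library leaks internally still needs a patch or a different parser.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that hands C-allocated memory back to free().
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning pointer for a buffer obtained from malloc().
template <typename T>
using CBuffer = std::unique_ptr<T, FreeDeleter>;

// Hypothetical C-style function: returns malloc'ed memory the caller must free.
char* c_copy_name(const char* name) {
    char* out = static_cast<char*>(std::malloc(std::strlen(name) + 1));
    if (out) std::strcpy(out, name);
    return out;
}

// Wrap the raw result immediately, so it is freed on every path,
// including early returns and exceptions.
CBuffer<char> CopyName(const char* name) {
    return CBuffer<char>(c_copy_name(name));
}
```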
Also try RPly.
This library looks promising; until someone else answers this question, I'll mark this as the answer: http://assimp.sourceforge.net/
Another library is the one used by MeshLab
http://vcg.sourceforge.net/index.php/Tutorial

Achieving Thread-Safety

Question: How can I make sure my application is thread-safe? Are there any common practices, testing methods, things to avoid, things to look for?
Background: I'm currently developing a server application that performs a number of background tasks in different threads and communicates with clients using Indy (using another bunch of automatically generated threads for the communication). Since the application should be highly available, a program crash is a very bad thing, and I want to make sure that the application is thread-safe. No matter what, from time to time I discover a piece of code that throws an exception that never occurred before, and in most cases I realize that it is some kind of synchronization bug where I forgot to synchronize my objects properly. Hence my question concerning best practices, testing of thread safety and things like that.
mghie: Thanks for the answer! I should perhaps be a little bit more precise. Just to be clear, I know about the principles of multithreading, I use synchronization (monitors) throughout my program, and I know how to differentiate threading problems from other implementation problems. But nevertheless, I keep forgetting to add proper synchronization from time to time. Just to give an example, I used the RTL sort function in my code. It looked something like:
FKeyList.Sort (CompareKeysFunc);
It turns out that I had to synchronize FKeyList while sorting. It just didn't occur to me when I initially wrote that simple line of code. These are the things I want to talk about. What are the places where one easily forgets to add synchronization code? How do YOU make sure that you have added sync code in all the important places?
You can't really test for thread safety. All you can do is show that your code isn't thread-safe, but if you know how to do that, you already know what to do in your program to fix that particular bug. It's the bugs you don't know about that are the problem, and how would you write tests for those? Apart from that, threading problems are much harder to find than other problems, as the act of debugging can already alter the behaviour of the program. Things will differ from one program run to the next, and from one machine to another. The number of CPUs and CPU cores, the number and kind of programs running in parallel, the exact order and timing of events in the program: all of this and much more will influence the program's behaviour. [I actually wanted to add the phase of the moon and stuff like that to this list, but you get my meaning.]
My advice is to stop seeing this as an implementation problem, and start to look at this as a program design problem. You need to learn and read all that you can find about multi-threading, whether it is written for Delphi or not. In the end you need to understand the underlying principles and apply them properly in your programming. Primitives like critical sections, mutexes, conditions and threads are something the OS provides, and most languages only wrap them in their libraries (this ignores things like green threads as provided by for example Erlang, but it's a good point of view to start out from).
I'd say start with the Wikipedia article on threads and work your way through the linked articles. I started with the book "Win32 Multithreaded Programming" by Aaron Cohen and Mike Woodring; it is out of print, but maybe you can find something similar.
Edit: Let me briefly follow up on your edited question. All access to data that is not read-only needs to be properly synchronized to be thread-safe, and sorting a list is not a read-only operation. So obviously one would need to add synchronization around all accesses to the list.
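The same rule, sketched in C++ rather than Delphi: the hypothetical SyncedKeyList class routes every access, including Sort, through one mutex, and hands out copies instead of the internal container so callers cannot bypass the lock.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical analogue of FKeyList: one mutex guards every access.
class SyncedKeyList {
public:
    void Add(int key) {
        std::lock_guard<std::mutex> lock(mutex_);
        keys_.push_back(key);
    }
    // Sorting rearranges elements in place, so it is a write and needs the lock.
    void Sort() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::sort(keys_.begin(), keys_.end());
    }
    // Return a copy taken under the lock instead of exposing the vector.
    std::vector<int> Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return keys_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> keys_;
};
```

Returning a snapshot is a deliberate choice here: a method that returned a reference to keys_ would let callers read the vector while another thread sorts it, which is exactly the kind of forgotten synchronization the question describes.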
But with more and more cores in a system, constant locking will limit the amount of work that can be done, so it is a good idea to look for a different way to design your program. One idea is to introduce as much read-only data as possible into your program; locking is no longer necessary for data that is only ever read.
I have found interfaces to be a very valuable aid in designing multi-threaded programs. Interfaces can be implemented to have only methods for read-only access to the internal data, and if you stick to them you can be quite sure that a lot of the potential programming errors do not occur. You can freely share them between threads, and the thread-safe reference counting will make sure that the implementing objects are properly freed when the last reference to them goes out of scope or is assigned another value.
What you do is create objects that descend from TInterfacedObject. They implement one or more interfaces which all provide only read-only access to the internals of the object, but they can also provide public methods that mutate the object state. When you create the object, you keep both a variable of the object type and an interface pointer variable. That way lifetime management is easy, because the object will be deleted automatically when an exception occurs. You use the variable pointing to the object to call all the methods necessary to properly set up the object. This mutates the internal state, but since this happens only in the active thread, there is no potential for conflict. Once the object is properly set up, you return the interface pointer to the calling code, and since there is no way to access the object afterwards except through the interface pointer, you can be sure that only read-only access can be performed. By using this technique you can completely remove the locking inside the object.
What if you need to change the state of the object? You don't; you create a new one by copying the data from the interface, and mutate the internal state of the new object afterwards. Finally, you return the interface pointer to the new object.
By using this you will only need locking where you get or set such interfaces. It can even be done without locking, by using the atomic interchange functions. See this blog post by Primoz Gabrijelcic for a similar use case where an interface pointer is set.
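A rough C++ analogue of this pattern, with the caveat that the names are hypothetical and that shared_ptr<const Settings> plays the role of the read-only interface: a new object is built and mutated while still private, then published, and the only locking left is around getting and setting the pointer. A separate writer mutex serializes writers so that two concurrent updates cannot both copy the same old version and lose one change.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical data object; immutable once published.
struct Settings {
    std::map<std::string, int> values;
};

class SettingsHolder {
public:
    SettingsHolder() : current_(std::make_shared<Settings>()) {}

    // Readers lock only to copy the pointer, then read without any lock.
    std::shared_ptr<const Settings> Get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // "Changing" the object: copy, mutate the private copy, publish it.
    void Set(const std::string& key, int value) {
        std::lock_guard<std::mutex> writer(writer_mutex_);  // one writer at a time
        std::shared_ptr<const Settings> old = Get();
        auto fresh = std::make_shared<Settings>(*old);      // still private
        fresh->values[key] = value;                         // safe to mutate
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::move(fresh);                        // publish
    }

private:
    mutable std::mutex mutex_;   // guards only the pointer
    std::mutex writer_mutex_;    // serializes Set calls
    std::shared_ptr<const Settings> current_;
};
```

Readers that already hold an old snapshot keep seeing it unchanged, and the shared_ptr reference count frees each version when the last reader lets go, much like the interface reference counting described above.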
Simple: don't use shared data. Every time you access shared data, you risk running into a problem (if you forget to synchronize access). Even worse, each time you access shared data, you risk blocking other threads, which will hurt your parallelization.
I know this advice is not always applicable. Still, it doesn't hurt if you try to follow it as much as possible.
EDIT: Longer response to Smasher's comment. Would not fit in a comment :(
You are totally correct. That's why I like to keep a shadow copy of the main data in each reader thread. I add a version number to the structure (one 4-byte-aligned DWORD) and increment it in the (lock-protected) data writer. A reader compares the global version with its private one (which can be done without locking), and only if they differ does it lock the structure, copy it to local storage, update the local version and unlock. Then it accesses the local copy of the structure. This works great if reading is the primary way to access the structure.
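The versioned shadow copy might look like this in C++, as a sketch: SharedTable and ShadowReader are hypothetical names, and std::atomic<std::uint32_t> stands in for the aligned DWORD. The writer updates data and version under the lock; a reader checks the version without locking and takes the lock only to refresh its private copy.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Shared data guarded by a lock, plus a version readers can poll lock-free.
struct SharedTable {
    std::mutex mutex;
    std::vector<int> data;
    std::atomic<std::uint32_t> version{0};

    void Write(std::vector<int> fresh) {
        std::lock_guard<std::mutex> lock(mutex);
        data = std::move(fresh);
        version.fetch_add(1, std::memory_order_release);
    }
};

// Per-thread shadow copy; each reading thread owns one of these.
struct ShadowReader {
    std::vector<int> local;
    std::uint32_t localVersion = 0;

    // Refresh under the lock only when the global version has moved.
    const std::vector<int>& Read(SharedTable& table) {
        if (table.version.load(std::memory_order_acquire) != localVersion) {
            std::lock_guard<std::mutex> lock(table.mutex);
            local = table.data;
            // Read under the same lock, so it matches the data just copied.
            localVersion = table.version.load(std::memory_order_relaxed);
        }
        return local;
    }
};
```

The cost model matches the answer: an unchanged version costs one atomic load per read, and only a changed version pays for the lock and the copy.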
I'll second mghie's advice: thread safety is designed in. Read about it anywhere you can.
For a really low level look at how it is implemented, look for a book on the internals of a real time operating system kernel. A good example is MicroC/OS-II: The Real Time Kernel by Jean J. Labrosse, which contains the complete annotated source code to a working kernel along with discussions of why things are done the way they are.
Edit: In light of the improved question focusing on using an RTL function...
Any object that can be seen by more than one thread is a potential synchronization issue. A thread-safe object would follow a consistent pattern in every method's implementation of locking "enough" of the object's state for the duration of the method, or perhaps, narrowed to just "long enough". It is certainly the case that any read-modify-write sequence to any part of an object's state must be done atomically with respect to other threads.
The art lies in figuring out how to get useful work done without either deadlocking or creating an execution bottleneck.
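A small illustration of both points, as a sketch with hypothetical Account and Transfer names: the read-modify-write on two balances holds both locks for the whole operation, and C++17's std::scoped_lock acquires the two mutexes with a deadlock-avoidance algorithm, so transfers running in opposite directions cannot deadlock by locking in opposite order.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account: each has its own lock.
struct Account {
    std::mutex mutex;
    long balance = 0;
};

// Both balances are read and modified as one atomic step w.r.t. other threads.
bool Transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return true;  // locking one mutex twice would be undefined
    std::scoped_lock lock(from.mutex, to.mutex);  // deadlock-avoiding acquisition
    if (from.balance < amount) return false;
    from.balance -= amount;
    to.balance += amount;
    return true;
}
```

Locking each account separately, or locking them in argument order, would either break the invariant that money is neither created nor lost or risk exactly the deadlock the answer warns about.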
As for finding such problems, testing won't be any guarantee. A problem that shows up in testing can be fixed. But it is extremely difficult to write either unit tests or regression tests for thread safety... so faced with a body of existing code your likely recourse is constant code review until the practice of thread safety becomes second nature.
As folks have mentioned and I think you know, being certain, in general, that your code is thread safe is impossible (I believe provably impossible but I would have to track down the theorem). Naturally, you want to make things easier than that.
What I try to do is:
Use a known pattern of multithreaded design: a thread pool, the actor model, the command pattern or some such approach. This way, synchronization happens in the same, uniform way throughout the application.
Limit and concentrate the points of synchronization. Write your code so you need synchronization in as few places as possible, and keep the synchronization code in one or a few places in the code.
Write the synchronization code so that the logical relation between the values is clear both on entering and on exiting the guard. I use lots of asserts for this (your environment may limit this).
Don't ever access shared variables without guards/synchronization. Be very clear what your shared data is. (I've heard there are paradigms for guardless multithreaded programming but that would require even more research).
Write your code as cleanly, clearly and DRY-ly as possible.
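Points two and three can be combined in one place: a monitor class that owns all the locking for a piece of state and asserts its invariant on entering and on leaving the guard. SeatPool and its invariant (0 ≤ reserved ≤ capacity) are hypothetical; note that assert disappears when NDEBUG is defined, so these checks are a debugging aid rather than a runtime guarantee.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical monitor: all synchronization for this state lives here.
class SeatPool {
public:
    explicit SeatPool(int capacity) : capacity_(capacity) {}

    bool Reserve(int n) {
        std::lock_guard<std::mutex> lock(mutex_);
        CheckInvariant();                  // holds on entering the guard
        bool ok = n > 0 && reserved_ + n <= capacity_;
        if (ok) reserved_ += n;
        CheckInvariant();                  // and still holds on exiting
        return ok;
    }

    void Cancel(int n) {
        std::lock_guard<std::mutex> lock(mutex_);
        CheckInvariant();
        assert(n > 0 && n <= reserved_);   // precondition
        reserved_ -= n;
        CheckInvariant();
    }

    int Reserved() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return reserved_;
    }

private:
    // Caller must hold mutex_.
    void CheckInvariant() const {
        assert(0 <= reserved_ && reserved_ <= capacity_);
    }

    mutable std::mutex mutex_;
    const int capacity_;
    int reserved_ = 0;
};
```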
Combined with those answers, my simple answer is:
Design your application/program in a thread-safe manner.
Avoid public static variables everywhere.
A practice that makes this habit come naturally, though it takes some time to get used to:
Program your logic (not the UI) in a functional programming language such as F#, or even Scheme or Haskell. Functional programming promotes thread-safe practice, because it pushes you to always write code toward purity.
If you use F#, there's also a clear distinction between mutable and immutable values (let mutable versus plain let bindings).
Since functions are first-class citizens in F# and Haskell, the code you write will also be more disciplined toward less mutable state.
Also, in a lazily evaluated language such as Haskell, side effects have to be made explicit, so you can be sure that ordinary code is free of side effects, and if your code needs effects, you have to declare them clearly. If side effects are taken into consideration like this, your code will be ready to take advantage of composability between components and of multicore programming.
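The payoff of purity for multicore code can be sketched even outside a functional language. In this C++ example (function names hypothetical), SumOfSquares depends only on its arguments and touches no shared state, so chunks of the work can run on separate threads via std::async without any locks. Unlike Haskell, C++ does not enforce this purity; it holds here only by convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Pure: result depends only on the arguments; no shared state is touched.
long SumOfSquares(const std::vector<int>& xs, std::size_t begin, std::size_t end) {
    long total = 0;
    for (std::size_t i = begin; i < end; ++i) total += static_cast<long>(xs[i]) * xs[i];
    return total;
}

// Because the function is pure, chunks can be evaluated concurrently.
long ParallelSumOfSquares(const std::vector<int>& xs, std::size_t chunks) {
    assert(chunks > 0);
    std::vector<std::future<long>> parts;
    std::size_t step = (xs.size() + chunks - 1) / chunks;  // ceiling division
    for (std::size_t begin = 0; begin < xs.size(); begin += step) {
        std::size_t end = std::min(begin + step, xs.size());
        parts.push_back(std::async(std::launch::async, SumOfSquares,
                                   std::cref(xs), begin, end));
    }
    long total = 0;
    for (auto& part : parts) total += part.get();  // combine results
    return total;
}
```

The only coordination left is collecting the results, which is the composability the answer refers to: the same pure function works sequentially or split across cores.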

What are all the ways you try to make your code functional-like?

So that you can easily make your program concurrent in the future.
I focus on making items Immutable. Immutable objects allow you to reason about multi-threaded code a lot easier than "thread safe" objects. The object has one visible state that can be passed between threads without any synchronization. It takes the thought out of multi-threaded programming.
If you're interested, I've published a lot of my work with immutable objects, in particular immutable collections, on Code Gallery. The name of the project is RantPack. In the collection area I have:
ImmutableCollection<T>
ImmutableMap<TKey,TValue>
ImmutableAvlTree<T>
ImmutableLinkedList<T>
ImmutableArray<T>
ImmutableStack<T>
ImmutableQueue<T>
There is an additional shim layer (CollectionUtility) which produces wrapper objects that implement BCL interfaces such as IList<T> and ICollection<T>. They can't fully implement the interfaces since they are immutable, but all possible methods are implemented.
The source code (C#) including the unit testing is also available on the site.
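The core trick behind a collection like ImmutableStack<T> is structural sharing: "modifying" operations return a new collection that reuses the old nodes. The following is a minimal C++ sketch of that idea, not RantPack's API; the class shape and method names are assumptions for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Minimal persistent stack; hypothetical, not RantPack's ImmutableStack<T>.
template <typename T>
class ImmutableStack {
    struct Node {
        T value;
        std::shared_ptr<const Node> next;
    };
    std::shared_ptr<const Node> head_;
    explicit ImmutableStack(std::shared_ptr<const Node> head) : head_(std::move(head)) {}

public:
    ImmutableStack() = default;

    bool Empty() const { return head_ == nullptr; }

    // Returns a new stack; this one is untouched and shares all its nodes.
    ImmutableStack Push(T value) const {
        return ImmutableStack(std::make_shared<Node>(Node{std::move(value), head_}));
    }

    const T& Peek() const {
        assert(!Empty());
        return head_->value;
    }

    // The popped stack is just the tail; no copying is needed.
    ImmutableStack Pop() const {
        assert(!Empty());
        return ImmutableStack(head_->next);
    }
};
```

Because nodes are never modified after construction, any version of the stack can be read from several threads without locks; the shared_ptr reference counts that keep shared nodes alive are updated atomically.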
I program mainly in Java. I'm waiting patiently for the day when closures will be added to the language. But as I am still stuck on Java 1.4.2, even if they get added, that won't help me for a long time!
That said, my main "functional" way of programming is making a lot of use of the "final" keyword. I try to have as many classes as possible completely immutable, and for the rest to have a clear distinction between what's transient and what's immutable.
Don't use member variables or global variables. Use the local stack of functions/methods. When a method uses only internally scoped variables and call parameters and returns all information using out/inout/reference parameters or return values, it is functional.
Make everything asynchronous.
Use immutable objects, messages, etc.
Communicate via queues.
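The three points fit together in a small actor-style sketch. MessageQueue and RunSummingActor are hypothetical: the queue is the only place where locking happens, one thread owns the running total and is the only code that touches it, and other threads interact with it solely by sending immutable int messages (0 is an assumed stop signal).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue: the only locking in this sketch.
template <typename T>
class MessageQueue {
public:
    void Send(T msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(msg));
        }
        ready_.notify_one();
    }
    // Blocks until a message is available.
    T Receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T msg = std::move(items_.front());
        items_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Actor: owns `total` privately; nonzero messages are added, 0 means stop.
long RunSummingActor(MessageQueue<int>& inbox) {
    long total = 0;
    for (int msg = inbox.Receive(); msg != 0; msg = inbox.Receive()) total += msg;
    return total;
}
```

Senders never share the total with the actor, so there is nothing to forget to lock; the stop message is sent only after all senders have finished, so it is the last item in the FIFO.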
Here's a talk from RubyConf 2008 on the subject; it's mostly Ruby-centered, but several of the concepts apply elsewhere too.
http://rubyconf2008.confreaks.com/better-ruby-through-functional-programming-2.html

Resources