Optimization in handling kmalloc return - linux

I was wondering why, when checking the return value of kmalloc, we don't use "likely" hints, like so:
void *ptr = kmalloc(size, GFP_KERNEL);
if (unlikely(!ptr))
        return err;
My assumption is, of course, that kmalloc does not fail very often. I have a hard time remembering the last time that it failed. Based on this, would it not be a useful recommendation to the compiler?

It probably can't hurt to do this, but unless you are calling kmalloc in a tight loop, the hint will have a negligible impact on your runtime compared to the cost of the allocation itself.

First, a side note: kmalloc is just a macro which can expand to different function calls depending on its arguments. The functions it expands to are annotated with __attribute__((__malloc__)), which allows the compiler to assume malloc-like behaviour, chiefly that the returned pointer does not alias any other live pointer. I don't know the full set of optimisations this enables, but as far as I can tell it is about aliasing rather than any expectation about whether the call can fail.
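For reference, this is roughly how the kernel defines the hints themselves; a simplified sketch of include/linux/compiler.h (the real definitions add branch-profiling instrumentation under some configs):

#define likely(x)   __builtin_expect(!!(x), 1)   /* tell the compiler the branch is usually taken */
#define unlikely(x) __builtin_expect(!!(x), 0)   /* ...or usually not taken */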

Related

Memory barrier in the implementation of single producer single consumer

The following implementation from Wikipedia:
volatile unsigned int produceCount = 0, consumeCount = 0;
TokenType buffer[BUFFER_SIZE];

void producer(void) {
    while (1) {
        while (produceCount - consumeCount == BUFFER_SIZE)
            sched_yield(); // buffer is full
        buffer[produceCount % BUFFER_SIZE] = produceToken();
        // a memory_barrier should go here, see the explanation above
        ++produceCount;
    }
}

void consumer(void) {
    while (1) {
        while (produceCount - consumeCount == 0)
            sched_yield(); // buffer is empty
        consumeToken(buffer[consumeCount % BUFFER_SIZE]);
        // a memory_barrier should go here, the explanation above still applies
        ++consumeCount;
    }
}
says that a memory barrier must be used between the line that accesses the buffer and the line that updates the Count variable.
This is done to prevent the CPU from reordering the instructions above the fence with those below it. The Count variable shouldn't be incremented before it is used to index into the buffer.
If a fence is not used, won't this kind of reordering violate the correctness of code? The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?
Thanks
If a fence is not used, won't this kind of reordering violate the correctness of code? The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?
Good question.
In C++, unless some form of memory barrier is used (atomic, mutex, etc.), the compiler assumes that the code is single-threaded. In that case, the as-if rule says that the compiler may emit whatever code it likes, provided that the overall observable effect is 'as if' your code was executed sequentially.
As mentioned in the comments, volatile does not necessarily alter this, being merely an implementation-defined hint that the variable may change between accesses (this is not the same as being modified by another thread).
So if you write multi-threaded code without memory barriers, you get no guarantees that changes to a variable in one thread will even be observed by another thread, because as far as the compiler is concerned that other thread should not be touching the same memory, ever.
What you will actually observe is undefined behaviour.
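A minimal sketch of what that undefined behaviour can look like in practice ('done' and spin_wait are illustrative names): because the compiler assumes single-threaded code, it may hoist the load out of the loop, so the waiting thread never observes the other thread's write.

int done = 0;   /* plain int: no synchronization, so this is a data race */

void spin_wait(void) {
    /* The compiler may legally transform this into:
     *     if (!done) for (;;) ;
     * reading 'done' once and then spinning forever. */
    while (!done)
        ;
}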
It seems that your question is "can incrementing Count and the assignment to buffer be reordered without changing the code's behavior?".
Consider the following code transformation:
int count1 = produceCount++;
buffer[count1 % BUFFER_SIZE] = produceToken();
Notice that this code behaves exactly like the original: one read from a volatile variable, one write to a volatile, the read happens before the write, and the state of the program is the same. However, other threads will see a different picture regarding the order of the produceCount increment and the buffer modification.
Both the compiler and the CPU can perform that transformation without memory fences, so you need to force those two operations into the correct order.
If a fence is not used, won't this kind of reordering violate the correctness of code?
Nope. Can you construct any portable code that can tell the difference?
The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?
Why shouldn't it? What would the payoff be for the costs incurred? Things like write combining and speculative fetching are huge optimizations and disabling them is a non-starter.
If you're thinking that volatile alone should do it, that's simply not true. The volatile keyword has no defined thread synchronization semantics in C or C++. It might happen to work on some platforms and it might happen not to work on others. In Java, volatile does have defined thread synchronization semantics, but before the Java 5 memory model they did not include providing ordering for accesses to non-volatiles.
However, memory barriers do have well-defined thread synchronization semantics. We need to make sure that no thread can see that data is available before it sees that data. And we need to make sure that a thread's marking of data as able to be overwritten is not seen before the thread is finished with that data.
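For concreteness, here is a sketch of the same ring buffer using C11 atomics, where the required ordering is stated explicitly instead of relying on bare fences (TokenType, produceToken and consumeToken are placeholders carried over from the example; the int element type is an assumption):

#include <stdatomic.h>
#include <sched.h>

#define BUFFER_SIZE 64
typedef int TokenType;                          /* placeholder element type */
TokenType produceToken(void);                   /* assumed, as in the original */
void consumeToken(TokenType tok);

static _Atomic unsigned int produceCount = 0, consumeCount = 0;
static TokenType buffer[BUFFER_SIZE];

void producer(void) {
    for (;;) {
        unsigned int p = atomic_load_explicit(&produceCount, memory_order_relaxed);
        while (p - atomic_load_explicit(&consumeCount, memory_order_acquire) == BUFFER_SIZE)
            sched_yield();                      /* buffer is full */
        buffer[p % BUFFER_SIZE] = produceToken();
        /* release: the slot write must be visible before the count update */
        atomic_store_explicit(&produceCount, p + 1, memory_order_release);
    }
}

void consumer(void) {
    for (;;) {
        unsigned int c = atomic_load_explicit(&consumeCount, memory_order_relaxed);
        while (atomic_load_explicit(&produceCount, memory_order_acquire) - c == 0)
            sched_yield();                      /* buffer is empty */
        consumeToken(buffer[c % BUFFER_SIZE]);
        /* release: the slot read must finish before the slot is marked reusable */
        atomic_store_explicit(&consumeCount, c + 1, memory_order_release);
    }
}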

Does this optimisation reduce memory use?

Take, for a (contrived) example:
private bool CheckSomeThings()
{
    var importantThing = GetImportantThing();
    var generatedOtherThing = DoFunkyStuff(importantThing);
    bool firstCheck = CheckThing(importantThing);
    bool otherCheck = IsItStuff(generatedOtherThing);
    bool furtherStuff = FurtherChecking(importantThing, generatedOtherThing);
    return firstCheck && otherCheck && furtherStuff;
}
When writing/reviewing code like this I tend to suggest not creating a bool just to use it once as a check, so I prefer:
private bool CheckSomeThings()
{
    var importantThing = GetImportantThing();
    var generatedOtherThing = DoFunkyStuff(importantThing);
    return CheckThing(importantThing)
        && IsItStuff(generatedOtherThing)
        && FurtherChecking(importantThing, generatedOtherThing);
}
I find the second example more readable because I don't have to parse the created variables and check if they are used anywhere... but I recognise that's subjective.
However, I think that the first example (with the additional assignments) uses more memory than the second, so it makes an easy win.
While premature optimisation is akin to punching a puppy, the real code that led to this question is going to be running as part of a real-time processing doohickie, so optimising memory use is likely to be a real concern.
I wondered if I'm right about the memory use as a result of the assignments?
Also, yes, we will run the system through a memory profiler etc. to get real data about how it performs; this is a question about whether a simple code-style approach can protect against unnecessary memory use.
Assigning to the two extra variables will use a very small amount of additional memory, but for me that isn't relevant unless you have really, really, really strict memory-usage requirements, as it'll be released soon enough.
The more important aspect is that if you assign the variables you'll have to call IsItStuff and FurtherChecking regardless of whether CheckThing passes or fails, so you're doing unnecessary processing. That usually matters more, as it'll take longer for the function to complete.
If you use the second example, the && expression will be short-circuited: none of the later functions are called after the first one returns false, so it will be more efficient whenever a check fails.
DoctorMick already did a good job explaining why your 'optimization' is actually not an optimization.
Even if you wrote such code using only properties or variables (so no expensive method calls), the optimizer would inline that, so the gain would be nothing.
So this:
bool x = SomeVariable;
if (x)
Will be optimized by the compiler to:
if (SomeVariable)
Note that this is only possible if you don't reassign x later on in the code.

Are Data Races bad?

I'd like to settle a theoretical computing argument.
Assume everything is initially 0.
Thread0 Thread1
x=1 | y=x
Here we have a data race. As far as I understand (assuming that x fits in the architecture's word-size and is aligned on the word boundary, which it normally would be), the result is either x=1 ^ y=0 or x=1 ^ y=1.
Now my second example uses explicit locking (assume that lock() gets some global lock), and as far as I understand this is not a data race condition anymore.
Thread0 Thread1
lock() | lock()
x=1 | y=x
unlock() | unlock()
However, I would argue that both programs are identical: they produce identical output and have identical race issues. Somehow, however, people are trying to convince me that a data race is bad, and I don't see why my first program would be worse than my second.
Edit. The full quote from Wikipedia is:
C++11 introduced formal support for multithreading, and defined a data race strictly as a race condition between non-atomic variables. While race conditions in general will continue to exist, a "data race" must be avoided by the programmer, who must assure that only one thread at a time may access any variable if the access is for writing.
Now, assuming this is correct (it's Wikipedia, which tends to be reasonably good on programming but can often be very wrong indeed), it's defining "data race" in this context purely as one of the clearly bad cases: those which can cause shearing of values. Such cases obviously must be avoided, so clearly data-races, defined as they are here, must be avoided.
And by this definition, neither program in your question has a data race.
I leave my original answer on race conditions generally:
The second example has a data-race too. Indeed, it has the exact same data-race as the first one.
Is this bad? That depends. But note before any of the rest: not only are many cases bad, as I'll describe more below, but those cases that are bad tend to be particularly hard to find and fix, which in itself should lean one towards assuming the worst.
An obvious case where a data race is bad is where it corrupts data. Let's say we change your example so that x and y are larger than the architecture's word size and we're setting x = -1. We'll also assume two's complement. Now the possible values for y are not just -1 and 0, but also -4294967296 and 4294967295, from seeing one half of the write without the other.
In this case, the locking you suggest wouldn't remove the data-race completely, but would remove that part of it that could cause shearing: The only possible values of y would again be -1 and 0.
Another question is serialisation. It's often necessary to be able to consider a sequence of concurrent events as having been one of a limited set of sequential events.
For example, consider we start with x = 0 and then have:
Thread 0 Thread 1
++x x = -50
Now, there's still the risk of shearing here that could result in a possible bogus value.
But assuming that x is word-size or smaller, we still might have an issue. There are two possible values if the operations were not concurrent: either x could be equal to -50 (increment, then assign -50), or x could be equal to -49 (assign -50, then increment). However, concurrently it's possible for us to end up with x having a value of 1, because thread 0 reads 0, thread 1 assigns -50, and then thread 0 increments its stale value and stores 1.
Now, it's quite possible that this is perfectly okay. It's very likely though that it isn't.
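In C11 terms, a minimal sketch of the straightforward fix: making both operations atomic removes the lost-update outcome, leaving only the two serialisable results.

#include <stdatomic.h>

_Atomic int x = 0;

void thread0(void) { atomic_fetch_add(&x, 1); }  /* one indivisible load-add-store */
void thread1(void) { atomic_store(&x, -50); }

/* Afterwards x is -50 or -49; the bogus value 1 is no longer possible. */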
As programmers we've got four possibilities:
1. Identify the data-race. Determine that it is harmless (or relatively harmless*), and let it be.
2. Identify the data-race. Determine that it can cause problems, and fix it.
3. Identify the data-race. Just fix it, because that way we can't make a mistake in determining it is harmless when it actually isn't.
4. Identify the data-race. Determine that it can cause problems. Change the code so the race doesn't cause problems.
The importance of case number 2 is obvious - we turn code that has a bug into code that doesn't.
The importance of case number 3 comes down to time and provability. We might well be making code less efficient (many methods for stopping data-races have at least some overhead), but it often takes less developer time to remove a race than to prove it harmless, and the cost of being wrong that way is marginally slower code, whereas the cost of being wrong in the other direction is a hard-to-fix bug.
The importance of number 1 is more complicated: it can be important in some very low-level concurrent code to avoid locking, so there are cases where we want to tolerate races. Number 4 is a way to turn something from number 2 into number 1, and comes up when either the data-race is inherent to the problem (we can't remove it) or we're doing the sort of low-level concurrency that number 1 involves.
Here's an interesting example in C#:
public static SomeResource GetTheResource()
{
    if (_theResource == null)
        _theResource = CreateTheResource();
    return _theResource;
}
The data-race should be obvious; until _theResource is set and all CPUs' caches see the update, we might assign to it several times from different threads. Is this a bug? Many people would say it is, but actually it depends. It's possible that it's safe to have a brief period where different versions of _theResource are used, and all we really lose is some efficiency in the beginning from the multiple calls to CreateTheResource(). In code with a high requirement for performance we might decide to tolerate this initial lower efficiency for the long-term efficiency gain of no locking. Or it might be vital that we lock. Or we might just lock because we don't have that pressing a need to avoid it, and it's simpler just to assume that there might be a problem.
Important Point 1: If you do decide to tolerate a race like this, you should add a comment to that effect and why. Otherwise every time someone comes across this code they'll have to check again that it's safe, rather than at most check your stated reasoning.
Important Point 2: While the principle here is language-agnostic, the details in each case often are not. In this case tolerating the race depends not just on the temporary multiple copies being safe, but also on garbage collection cleaning those excess copies up. If we were instead assigning a pointer to the heap in C++ the above would at the very best be leaky, even if otherwise safe.
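To make that point concrete, here is a hedged sketch of the same lazy-initialisation pattern in C11, where the thread that loses the race must clean up its own copy (SomeResource and create_the_resource are hypothetical stand-ins for the C# names, and the resource is assumed to come from malloc):

#include <stdatomic.h>
#include <stdlib.h>

typedef struct SomeResource SomeResource;
SomeResource *create_the_resource(void);        /* assumed factory */

static _Atomic(SomeResource *) the_resource = NULL;

SomeResource *get_the_resource(void) {
    SomeResource *r = atomic_load_explicit(&the_resource, memory_order_acquire);
    if (r == NULL) {
        SomeResource *fresh = create_the_resource();
        SomeResource *expected = NULL;
        /* Publish with release; if another thread won the race, keep theirs. */
        if (atomic_compare_exchange_strong_explicit(&the_resource, &expected, fresh,
                memory_order_acq_rel, memory_order_acquire))
            r = fresh;
        else {
            free(fresh);    /* no GC here: the losing copy must be freed by hand */
            r = expected;
        }
    }
    return r;
}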
A more complicated case is something like this (again a C# example, but applicable to other languages):
internal sealed class LockFreeQueue<T>
{
    private sealed class Node
    {
        public readonly T Item;
        public Node Next;
        public Node(T item)
        {
            Item = item;
        }
    }

    private volatile Node _head;
    private volatile Node _tail;

    public LockFreeQueue()
    {
        _head = _tail = new Node(default(T));
    }

#pragma warning disable 420 // volatile semantics not lost as only by-ref calls are interlocked

    public void Enqueue(T item)
    {
        Node newNode = new Node(item);
        for(;;)
        {
            Node curTail = _tail;
            if (Interlocked.CompareExchange(ref curTail.Next, newNode, null) == null) // append to the tail if it is indeed the tail
            {
                Interlocked.CompareExchange(ref _tail, newNode, curTail); // CAS in case we were assisted by an obstructed thread
                return;
            }
            else
            {
                Interlocked.CompareExchange(ref _tail, curTail.Next, curTail); // assist obstructing thread
            }
        }
    }

    public bool TryDequeue(out T item)
    {
        for(;;)
        {
            Node curHead = _head;
            Node curTail = _tail;
            Node curHeadNext = curHead.Next;
            if (curHead == curTail)
            {
                if (curHeadNext == null)
                {
                    item = default(T);
                    return false;
                }
                else
                    Interlocked.CompareExchange(ref _tail, curHeadNext, curTail); // assist obstructing thread
            }
            else
            {
                item = curHeadNext.Item;
                if (Interlocked.CompareExchange(ref _head, curHeadNext, curHead) == curHead)
                {
                    return true;
                }
            }
        }
    }

#pragma warning restore 420
}
This code doesn't prevent data-races, but rather it reacts to them. If an operation is affected by another thread, then rather than error or return an incorrect result, the thread deals with the race and returns something else (and indeed even helps the other thread in some cases).
So in summary, data-races are not in and of themselves bad things. They are, though, complicating things, and those complications can cause problems. When you have a data-race you have a choice between proving it's not a problem, changing your code to tolerate the race so that it's no longer a problem, or changing your code to remove the race. Of these, just removing the race is often the easiest choice.
*I don't mean "relatively harmless" in a vague way here, but relative to the alternative. E.g. if we decide to leave the race in the C# example given, it's because we've decided that the cost of redundant object creation is less harmful than the relative cost of preventing it.
I thank everybody for their answers; although valuable, they did not actually answer the question I had hoped to ask. The answers did allow me to reason better about what I was actually asking, and in the end I found something of an answer online:
http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
So I guess my question should have been:
The C(++)11 standard defines my first example as a data race (if I don't use atomics), and the second one not. The first one therefore has undefined behaviour (even though there don't seem to be compiler implementations that would result in anything but x == 1 && (y == 0 || y == 1); according to the standard, any resulting value for x and y is correct compiler behaviour). I was wondering why this is. I think the Intel document answers that question pretty elaborately.
If x and y fit into a machine register then assignment is atomic by default so locks won't change the outcome. It's equally possible to get y = 0 or y = 1 in the second case as well.

D programming without the garbage collector

I've been looking at D today and on the surface it looks quite amazing. I like how it includes many higher-level constructs directly in the language so silly hacks or terse methods don't have to be used. One thing that really worries me is the GC. I know this is a big issue and have read many discussions about it.
My own simple test, sprouted from a question here, shows that the GC is extremely slow: over 10 times slower than straight C++ doing the same thing. (Obviously the test does not translate directly into the real world, but the performance hit is extreme and would slow down real-world apps that behave similarly, i.e. allocate many small objects quickly.)
I'm looking into writing a real-time, low-latency audio application, and it is possible that the GC will hurt its performance enough to make it nearly useless. In a sense, if it has any issues it will ruin the real-time audio aspect, which is much more crucial since, unlike graphics, audio runs at a much higher frame rate (44000+ vs 30-60). (Due to its low latency it is more crucial than a standard audio player, which can buffer significant amounts of data.)
Disabling the GC improved the results to within about 20% of the C++ code. This is significant. I'll give the code at the end for analysis.
My questions are:
How difficult is it to replace D's GC with a standard smart-pointer implementation so that libraries that rely on the GC can still be used? If I remove the GC completely I'll lose a lot of grunt work, as D already has limited libraries compared to C++.
Does GC.disable only halt garbage collection temporarily (preventing the GC thread from running), and does GC.enable pick back up where it left off? If so, I could potentially disable the GC in high-CPU-usage moments to prevent latency issues.
Is there any way to consistently enforce a no-GC pattern? (This is because I've not programmed in D yet, and when I start writing my classes that do not use the GC I would like to be sure I don't forget to implement their own clean-up.)
Is it possible to replace the GC in D easily? (Not that I want to, but it might be fun to play around with different methods of GC one day... this is similar to 1, I suppose.)
What I'd like to do is trade memory for speed. I do not need the GC to run every few seconds. In fact, if I can properly implement my own memory management for my data structures then chances are it will not need to run very often at all. I might need to run it only when memory becomes scarce. From what I've read, though, the longer you wait to call it the slower it will be. Since there generally will be times in my application where I can get away with calling it without issues this will help alleviate some of the pressure(but then again, there might be hours when I won't be able to call it).
I am not worried about memory constraints as much. I'd prefer to "waste" memory over speed(up to a point, of course). First and foremost is the latency issues.
From what I've read, I can, at the very least, go the route of C/C++ as long as I don't use any libraries or language constructs that rely on the GC. The problem is, I do not know which ones do. I've seen string, new, etc. mentioned, but does that mean I can't use the built-in strings if I don't enable the GC?
I've read in some bug reports that the GC might be really buggy and that could explain its performance problems?
Also, D uses a bit more memory; in fact, D runs out of memory before the C++ program does. I guess it is about 15% more or so in this case. I suppose that is for the GC.
I realize the following code is not representative of your average program, but what it says is that when programs are instantiating a lot of objects (say, at startup), they will be much slower (10 times is a large factor). If the GC could be "paused" at startup then it wouldn't necessarily be an issue.
What would really be nice is if I could somehow have the compiler automatically GC a local object if I do not specifically deallocate it. This would almost give the best of both worlds.
e.g.,
{
    Foo f = new Foo();
    ....
    dispose f; // Causes f to be disposed of immediately and treated outside the GC.
               // If left out, then f is passed to the GC.
               // I suppose this might actually end up creating two kinds of Foo
               // behind the scenes.
    Foo g = new manualGC!Foo(); // Maybe something like this will keep the GC's hands off g
                                // and allow it to be manually disposed of.
}
In fact, it might be nice to actually be able to associate different types of GC's with different types of data with each GC being completely self contained. This way I could tailor the performance of the GC to my types.
Code:
module main;
import std.stdio, std.conv, core.memory;
import core.stdc.time;

class Foo {
    int x;
    this(int _x) { x = _x; }
}

void main(string[] args)
{
    clock_t start, end;
    double cpu_time_used;
    //GC.disable();
    start = clock();
    //int n = to!int(args[1]);
    int n = 10000000;
    Foo[] m = new Foo[n];
    foreach (i; 0 .. n)
    //for (int i = 0; i < n; i++)
    {
        m[i] = new Foo(i);
    }
    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / 1000.0;
    writeln(cpu_time_used);
    getchar();
}
C++ code
#include <cstdlib>
#include <iostream>
#include <time.h>
#include <math.h>
#include <stdio.h>
using namespace std;

class Foo {
public:
    int x;
    Foo(int _x);
};

Foo::Foo(int _x) {
    x = _x;
}

int main(int argc, char** argv) {
    int n = 120000000;
    clock_t start, end;
    double cpu_time_used;
    start = clock();
    Foo** gx = new Foo*[n];
    for (int i = 0; i < n; i++) {
        gx[i] = new Foo(i);
    }
    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / 1000.0;
    cout << cpu_time_used;
    std::cin.get();
    return 0;
}
D can use pretty much any C library, just define the functions needed. D can also use C++ libraries, but D does not understand certain C++ constructs. So... D can use almost as many libraries as C++. They just aren't native D libs.
From D's Library reference.
Core.memory:
static nothrow void disable();
Disables automatic garbage collections performed to minimize the process footprint. Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition. This function is reentrant, but enable must be called once for each call to disable.
static pure nothrow void free(void* p);
Deallocates the memory referenced by p. If p is null, no action occurs. If p references memory not originally allocated by this garbage collector, or if it points to the interior of a memory block, no action will be taken. The block will not be finalized regardless of whether the FINALIZE attribute is set. If finalization is desired, use delete instead.
static pure nothrow void* malloc(size_t sz, uint ba = 0);
Requests an aligned block of managed memory from the garbage collector. This memory may be deleted at will with a call to free, or it may be discarded and cleaned up automatically during a collection run. If allocation fails, this function will call onOutOfMemory which is expected to throw an OutOfMemoryError.
So yes. Read more here: http://dlang.org/garbage.html
And here: http://dlang.org/memory.html
If you really need classes, look at this: http://dlang.org/memory.html#newdelete
delete has been deprecated, but I believe you can still free() it.
Don't use classes, use structs. Structs are stack allocated, classes are heap. Unless you need polymorphism or other things classes support, they are overhead for what you are doing. You can use malloc and free if you want to.
More or less... fill out the function definitions here: https://github.com/D-Programming-Language/druntime/blob/master/src/gcstub/gc.d . There's a GC proxy system set up to allow you to customize the GC. So it's not like it is something that the designers do not want you to do.
A little GC knowledge here:
The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified. This means that when the garbage collector calls a destructor for an object of a class that has members that are references to garbage-collected objects, those references may no longer be valid. This means that destructors cannot reference sub-objects. This rule does not apply to auto objects or objects deleted with the DeleteExpression, as the destructor is not being run by the garbage collector, meaning all references are valid.
import std.c.stdlib; that should have malloc and free.
import core.memory; this has GC.malloc, GC.free, GC.addroots, //add external memory to GC...
strings require the GC because they are dynamic arrays of immutable chars (immutable(char)[]). Dynamic arrays require the GC; static ones do not.
If you want manual management, go ahead.
import std.c.stdlib;
import core.memory;

char* one = cast(char*) GC.malloc(char.sizeof * 8); // allocate room for 8 chars from the GC heap
GC.free(one); // and release it manually rather than waiting for a collection
Why create a wrapper class for an int? You are doing nothing more than slowing things down and wasting memory.
class Foo { int n; this(int _n) { n = _n; } }
writeln(Foo.sizeof); // it's 8 bytes, btw
writeln(int.sizeof); // it's *half* the size of Foo: 4 bytes
Foo[] m; // = new Foo[n]; // 8 sec
m.length = n; // 7 sec; a minor optimization, at least on my machine
foreach (i; 0 .. n)
    m[i] = new Foo(i);

int[] m;
m.length = n; // default-initialized to 0
foreach (i; 0 .. n)
    m[i] = i; // .145 sec
If you really need to, then write the time-sensitive function in C and call it from D.
Heck, if time is really that big of a deal, use D's inline assembly to optimize everything.
I suggest you read this article: http://3d.benjamin-thaut.de/?p=20
There you will find a version of the standard library that does own memory management and completely avoids garbage collection.
D's GC simply isn't as sophisticated as others like Java's. It's open-source so anyone can try to improve it.
There is an experimental concurrent GC named CDGC and there is a current GSoC project to remove the global lock: http://www.google-melange.com/gsoc/project/google/gsoc2012/avtuunainen/17001
Make sure to use LDC or GDC for compilation to get better optimized code.
The XomB project also uses a custom runtime but it's D version 1 I think.
http://wiki.xomb.org/index.php?title=Main_Page
You can also just allocate all memory blocks you need then use a memory pool to get blocks without the GC.
And by the way, it's not as slow as you mentioned. And GC.disable() doesn't really disable it.
We might look at the problem from a slightly different view. Suboptimal performance when allocating many little objects, which you mention as a rationale for the question, has little to do with the GC alone. Rather, it's a matter of balance between general-purpose (but suboptimal) and highly performant (but task-specialised) memory-management tools. The idea is: the presence of a GC doesn't prevent you from writing a real-time app; you just have to use more specific tools (say, object pools) for special cases, as sketched below.
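To make that concrete, here is a minimal sketch of such an object pool in C (the names, the fixed capacity, and the lack of thread safety are all simplifying assumptions); the same pattern carries over to D using malloc/free or GC.malloc:

#include <stddef.h>
#include <stdlib.h>

/* A fixed-size object pool: one malloc up front, then O(1) alloc/free
 * with no collector involvement. Not thread-safe as written. */
typedef struct PoolNode { struct PoolNode *next; } PoolNode;

typedef struct {
    void     *arena;      /* single backing allocation */
    PoolNode *free_list;  /* intrusive list of free slots */
} Pool;

int pool_init(Pool *p, size_t obj_size, size_t count) {
    if (obj_size < sizeof(PoolNode)) obj_size = sizeof(PoolNode);
    p->arena = malloc(obj_size * count);
    if (!p->arena) return -1;
    p->free_list = NULL;
    for (size_t i = 0; i < count; i++) {          /* thread every slot onto the free list */
        PoolNode *n = (PoolNode *)((char *)p->arena + i * obj_size);
        n->next = p->free_list;
        p->free_list = n;
    }
    return 0;
}

void *pool_alloc(Pool *p) {                       /* O(1): pop the free list */
    PoolNode *n = p->free_list;
    if (n) p->free_list = n->next;
    return n;
}

void pool_free(Pool *p, void *obj) {              /* O(1): push back onto the free list */
    PoolNode *n = obj;
    n->next = p->free_list;
    p->free_list = n;
}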
Since this hasn't been closed yet, recent versions of D have the std.container library which contains an Array data structure that is significantly more efficient with respect to memory than the built-in arrays. I can't confirm that the other data structures in the library are also efficient, but it may be worth looking into if you need to be more memory conscious without having to resort to manually creating data structures that don't require garbage collection.
D is constantly evolving. Most of the answers here are 9+ years old, so I figured I'd answer these questions again for anyone curious what the current situation is.
(...) replace D's GC with a standard smart pointers implementation so that libraries that rely on the GC can still be used. (...)
Replacing the GC itself with smart pointers is not something I've looked into (i.e. where new creates a smart pointer). There are several D libraries that add smart pointers. You can interface with any C library. Interfacing with C++ and even Objective-C is also supported to some degree, so that should cover you pretty well.
Does GC.disable only halt the garbage collection temporarily (preventing the GC thread from running) and GC.enable pick back up where it left off. (...)
"Collections may continue to occur in instances where the implementation deems necessary for correct program behaviour, such as during an out of memory condition."
[source]
So mostly, yes. You can also manually invoke collection during down-time.
Is there any way to enforce a pattern to not use GC consistently. (...) when I start writing my classes that do not use the GC I would like to (...)
Classes are always allocated on the GC heap and are reference types. Structs should be used instead. However, keep in mind that structs are value types, so by default they're copied when being moved. You can @disable the copy constructor if you don't like this behaviour, but then your struct won't be POD.
What you're probably looking for is @nogc, which is a function attribute that stops a function from using the GC. You can't mark a struct type as @nogc, but you can mark each of its methods as @nogc. Just keep in mind that @nogc code can't call GC code. There's also nothrow.
If you intend to never use GC, you ought to look into Better C. It's a D language setting that removes all of D's runtime, standard library (Phobos), GC and all GC-reliant features (namely associative arrays and exceptions) in favour of using C's runtime and the C Standard Library.
Is it possible to replace the GC in D (...)
Yes it is: https://dlang.org/spec/garbage.html#gc_registry
And you can configure the pre-existing GC to better suit your needs if you don't want to make your own GC.

Is it ok to have multiple threads writing the same values to the same variables?

I understand about race conditions and how with multiple threads accessing the same variable, updates made by one can be ignored and overwritten by others, but what if each thread is writing the same value (not different values) to the same variable; can even this cause problems? Could this code:
GlobalVar.property = 11;
(assuming that property will never be assigned anything other than 11), cause problems if multiple threads execute it at the same time?
The problem comes when you read that state back, and do something about it. Writing is a red herring - it is true that as long as this is a single word most environments guarantee the write will be atomic, but that doesn't mean that a larger piece of code that includes this fragment is thread-safe. Firstly, presumably your global variable contained a different value to begin with - otherwise if you know it's always the same, why is it a variable? Second, presumably you eventually read this value back again?
The issue is that presumably, you are writing to this bit of shared state for a reason - to signal that something has occurred? This is where it falls down: when you have no locking constructs, there is no implied order of memory accesses at all. It's hard to point to what's wrong here because your example doesn't actually contain the use of the variable, so here's a trivialish example in neutral C-like syntax:
int x = 0, y = 0;

// thread A does:
x = 1;
y = 2;
if (y == 2)
    print(x);

// thread B does, at the same time:
if (y == 2)
    print(x);
Thread A will always print 1, but it's completely valid for thread B to print 0. The order of operations in thread A is only required to be observable from code executing in thread A - thread B is allowed to see any combination of the state. The writes to x and y may not actually happen in order.
This can happen even on single-processor systems, where most people do not expect this kind of reordering - your compiler may reorder it for you. On SMP even if the compiler doesn't reorder things, the memory writes may be reordered between the caches of the separate processors.
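For completeness, a sketch of how the example can be repaired with C11 atomics: giving y release/acquire ordering ensures that a thread that sees y == 2 also sees x == 1.

#include <stdatomic.h>
#include <stdio.h>

int x = 0;
_Atomic int y = 0;

void thread_A(void) {
    x = 1;
    atomic_store_explicit(&y, 2, memory_order_release); /* publishes the write to x */
}

void thread_B(void) {
    if (atomic_load_explicit(&y, memory_order_acquire) == 2)
        printf("%d\n", x); /* now guaranteed to print 1 */
}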
If that doesn't seem to answer it for you, include more detail of your example in the question. Without the use of the variable it's impossible to definitively say whether such a usage is safe or not.
It depends on the work actually done by that statement. There can still be some cases where Something Bad happens - for example, if a C++ class has overloaded the = operator, and does anything nontrivial within that statement.
I have accidentally written code that did something like this with POD types (builtin primitive types), and it worked fine -- however, it's definitely not good practice, and I'm not confident that it's dependable.
Why not just lock the memory around this variable when you use it? In fact, if you somehow "know" this is the only write statement that can occur at some point in your code, why not just use the value 11 directly, instead of writing it to a shared variable?
(edit: I guess it's better to use a constant name instead of the magic number 11 directly in the code, btw.)
If you're using this to figure out when at least one thread has reached this statement, you could use a semaphore that starts at 1, and is decremented by the first thread that hits it.
I would expect the result to be undetermined, as in it would vary from compiler to compiler, language to language, OS to OS, etc. So no, it is not safe.
Why would you want to do this, though? Adding a line to obtain a mutex lock is only one or two lines of code (in most languages) and would remove any possibility of a problem. If that is going to be too expensive, then you need to find an alternate way of solving the problem.
In general, this is not considered a safe thing to do unless your system provides for atomic operations (operations that are guaranteed to execute indivisibly).
The reason is that while the "C" statement looks simple, often there are a number of underlying assembly operations taking place.
Depending on your OS, there are a few things you could do:
Take a mutual exclusion semaphore (mutex) to protect access
In some OSes, you can temporarily disable preemption, which guarantees your thread will not be swapped out.
Some OSes provide a writer or reader semaphore which is more performant than a plain old mutex.
Here's my take on the question.
You have two or more threads running that write to a variable - say, a status flag - where you only want to know if one or more of them set it to true. Then in another part of the code (after the threads complete) you want to check and see if at least one thread set that status... for example:
bool flag = false
threadContainer tc
threadInputs inputs

check(input)
{
    ...do stuff to input
    if (success)
        flag = true
}

// start multiple threads
foreach (i in inputs)
    t = startthread(check, i)
    tc.add(t)  // keep track of all the threads started

foreach (t in tc)
    t.join()   // wait until each thread is done

if (flag)
    print "One of the threads was successful"
else
    print "None of the threads were successful"
I believe the above code would be OK, assuming you're fine with not knowing which thread set the status to true, and you can wait for all the multi-threaded stuff to finish before reading that flag. I could be wrong though.
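For what it's worth, a hedged sketch of that pattern in C11 (using the optional <threads.h>; the success condition and thread count are placeholders), which makes the flag write well-defined no matter how many threads perform it:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <threads.h>

static atomic_bool flag = false;

int check(void *input) {
    /* ...do stuff with input... */
    bool success = (*(int *)input > 0);     /* placeholder condition */
    if (success)
        atomic_store(&flag, true);          /* same value from every thread, now well-defined */
    return 0;
}

int main(void) {
    thrd_t tc[4];
    int inputs[4] = {1, 2, 3, 4};
    for (int i = 0; i < 4; i++)
        thrd_create(&tc[i], check, &inputs[i]);
    for (int i = 0; i < 4; i++)
        thrd_join(tc[i], NULL);             /* joining also synchronizes memory */
    puts(atomic_load(&flag) ? "One of the threads was successful"
                            : "None of the threads were successful");
}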
If the operation is atomic, you should be able to get by just fine. But I wouldn't do that in practice. It is better just to acquire a lock on the object and write the value.
Assuming that property will never be assigned anything other than 11, I don't see a reason for the assignment in the first place. Just make it a constant then.
Assignment only makes sense when you intend to change the value, unless the act of assignment itself has other side effects - like volatile writes have memory-visibility side effects in Java. And if you change state shared between multiple threads, then you need to synchronize or otherwise "handle" the problem of concurrency.
When you assign a value, without proper synchronization, to some state shared between multiple threads, then there are no guarantees for when the other threads will see that change. And no visibility guarantees means that it is possible that the other threads will never see the assignment.
Compilers, JITs, CPU caches: they're all trying to make your code run as fast as possible, and if you don't make any explicit requirements for memory visibility, then they will take advantage of that. If not on your machine, then on somebody else's.
