How to write decimal numbers as an atomic write in F#? - multithreading

The decimal type takes 128 bits, so a write to it is not naturally atomic.
I tried:
Interlocked.Exchange(ref myField, some new value)
but then I saw that decimal is not part of the supported types with Interlocked.Exchange.
I was thinking that doing a lock may be a little bit heavy for this write. Are there any other options?

As you said, Interlocked.Exchange has no overload for decimal: its overloads cover 32-bit and 64-bit primitive values and object references. Aside from using locks, one suggestion from a related C# StackOverflow post is to wrap the value in an object and then use Interlocked.Exchange to swap the object references. In F#, this would look like this:
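open System.Threading
// Immutable reference wrapper: swapping the reference replaces the whole value at once.
type Boxed<'T when 'T : struct>(v: 'T) =
    member x.Value = v
let mutable d1 = Boxed(1M)
let d2 = Boxed(2M)
// Atomically store d2 in d1; the previous wrapper is returned and ignored here.
Interlocked.Exchange(&d1, d2) |> ignore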
The question is whether the overhead of an additional instance is greater than the overhead of using lock - I think this will depend on your specific case. If you have just a few decimals that you're working with, the extra objects may not be such a big deal, but you'll probably need to run some tests to find out.


How do you approach creating a complete new datatype on the "bit-level"?

I would like to create a new data type in Rust on the "bit-level".
For example, a quadruple-precision float. I could create a structure that has two double-precision floats and arbitrarily increase the precision by splitting the quad into two doubles, but I don't want to do that (that's what I mean by on the "bit-level").
I thought about using a u8-array or a bool-array to hold one bit per element, but in both cases I waste 7 bits of memory per element (because a bool also occupies a full byte). I know there are several crates that implement something like bit-arrays or bit-vectors, but looking through their source code didn't help me to understand their implementation.
How would I create such a bit-array without wasting memory, and is this the way I would want to choose when implementing something like a quad-precision type?
I don't know how to implement new data types that don't use the basic types or are structures that combine the basic types, and I haven't been able to find a solution on the internet yet; maybe I'm not searching with the right keywords.
The question you are asking has no direct answer: Just like any other programming language, Rust has a basic set of rules for type layouts. This is due to the fact that (most) real-world CPUs can't address individual bits, need certain alignments when referencing memory, have rules regarding how pointer arithmetic works etc. etc.
For instance, if you create a type of just two bits, you'll still need an 8-bit byte to represent that type, because most CPUs' instructions simply cannot address two individual bits; there is also no way to take the address of such a type because addressing works at least on the byte-level. More useful information regarding this can be found here, section 2, The Anatomy of a Type. Be aware that the non-wasting bit-level type you are thinking about needs to fulfill all the rules mentioned there.
A perfectly reasonable approach is to represent what you want as a single, wrapped u128 and implement all arithmetic on top of that type. Another, more generic approach would be to use a Vec<u8>. Either way, you'll always do a relatively large amount of bit-masking, indirection and so on.
Having a look at rust_decimal or similar crates might also be a good idea.
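To make the bit-masking mentioned above concrete, here is a minimal sketch of both ideas: a packed bit array stored in a Vec<u8>, and a u128 wrapper with field accessors. The names BitArray and Quad are hypothetical and not taken from any crate; the field widths assume the IEEE 754 binary128 layout (1 sign bit, 15 exponent bits, 112 fraction bits), and no arithmetic is implemented.

```rust
/// Hypothetical packed bit array: eight bits per byte, so no per-bit waste
/// beyond the padding in the last byte.
struct BitArray {
    bytes: Vec<u8>,
    len: usize, // number of valid bits
}

impl BitArray {
    fn new(len: usize) -> Self {
        // Round up to whole bytes.
        BitArray { bytes: vec![0; (len + 7) / 8], len }
    }

    fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "bit index out of range");
        // Select the byte, shift the wanted bit down, mask it off.
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    fn set(&mut self, i: usize, value: bool) {
        assert!(i < self.len, "bit index out of range");
        let mask = 1u8 << (i % 8);
        if value {
            self.bytes[i / 8] |= mask;
        } else {
            self.bytes[i / 8] &= !mask;
        }
    }
}

/// Hypothetical quad-precision wrapper: the raw 128 bits live in one u128.
struct Quad(u128);

impl Quad {
    fn sign(&self) -> bool {
        (self.0 >> 127) == 1
    }

    fn biased_exponent(&self) -> u16 {
        ((self.0 >> 112) & 0x7FFF) as u16
    }

    fn fraction(&self) -> u128 {
        self.0 & ((1u128 << 112) - 1)
    }
}
```

For a fixed 128-bit type the u128 wrapper is probably the simpler choice, since the whole value fits in one integer and needs no heap allocation; the bit array is more useful when the number of bits is only known at runtime.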

How to implement efficient string interning in F#?

What is the best way to implement a custom string type in F# for interning strings? I have to read large CSV files into memory. Since most of the columns are categorical, values repeat, so it makes sense to create a new string the first time it is encountered and only refer to it on subsequent occurrences to save memory.
In C# I do this by creating a global intern pool (a concurrent dictionary) and, before setting a value, looking it up in the dictionary. If it exists, I just point to the string already in the dictionary. If not, I add it to the dictionary and set the value to the string just added.
I am new to F# and wondering what the best way to do this in F# is. I will be using the new string type in records, named tuples etc., and it will have to work with concurrent processes.
Edit:
String.Intern uses the Intern Pool. My understanding is that it is not very efficient for large pools and is not garbage collected, i.e. all interned strings remain in the intern pool for the lifetime of the app. Imagine an application where you read a file, perform some operations and write data. Using the Intern Pool solution will probably work. Now imagine you have to do the same 100 times and the strings in each file have little in common. If the memory is allocated on the heap, then after processing each file we can force the garbage collector to clear unnecessary strings.
I should have mentioned that I could not really figure out how to do the C# approach in F# (other than implementing a C# type and using it in F#).
Isn't the memoization pattern slightly different from what I am looking for? We are not caching calculated results; we are ensuring each string object is created no more than once and all subsequent creations of the same string are just references to the original. Using a dictionary is one way to do this and using String.Intern is another.
Sorry if I am missing something obvious here.
I have a few things to say, so I'll post them as an answer.
First, I guess String.Intern works just as well in F# as in C#.
open System
open System.Text
let x = "abc"
let y = StringBuilder("a").Append("bc").ToString()
printfn "1 : %A" (LanguagePrimitives.PhysicalEquality x y) // false
let y2 = String.Intern y
printfn "2 : %A" (LanguagePrimitives.PhysicalEquality x y2) // true
Second, are you using a dictionary in combination with String.Intern in your C# solution? If so, why not just do s = String.Intern(s); after the string is ready following input from file?
To create a type for use in your business domain to handle string deduplication in general is a very bad idea. You don't want your business domain polluted by that kind of low level stuff.
As for rolling your own: I did that some years ago, probably to avoid the problem you mentioned with the strings not being garbage collected, but I never tested whether that actually was a problem.
It might be a good idea to use a dictionary (or something) for each column (or type of column) where the same values are likely to repeat in great numbers. (This is pretty much what you said already.)
It makes sense to only keep these dictionaries live while you read the information from file, and stuff it into internal data structures. You might be thinking that you need the dictionaries for subsequent reads, but I am not so sure about that.
The important thing is to deduplicate the great majority of strings, and not necessarily every single duplicate. Because of this you can greatly simplify the solution as indicated. You most probably have nothing to gain by overcomplicating your solution to squeeze out the last fraction of memory savings.
Releasing the dictionaries after the file is read and structures filled, will have the advantage of not holding on to strings when they are no longer really needed. And of course you save memory by not holding onto the dictionaries.
I see no need to handle concurrency issues in the implementation here. String.Intern must necessarily be immune to concurrency issues. If you roll your own with the design suggested, you would not use it concurrently. Each file being read would have its own set of dictionaries for its columns.
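The per-column dictionary design described above does not depend on the language, so here is a minimal sketch of it in Rust rather than F#. ColumnPool and its methods are hypothetical names; Rc is used on the assumption, as suggested above, that each pool is used by a single reader and not shared between threads.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical deduplication pool for one column of one file.
/// Dropping the pool after the file is read releases its references.
struct ColumnPool {
    seen: HashSet<Rc<str>>,
}

impl ColumnPool {
    fn new() -> Self {
        ColumnPool { seen: HashSet::new() }
    }

    /// Returns a shared handle to the string, allocating only on first sight.
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.seen.get(s) {
            // Already stored: hand out another reference to the same allocation.
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.seen.insert(Rc::clone(&fresh));
        fresh
    }

    /// Number of distinct strings stored so far.
    fn len(&self) -> usize {
        self.seen.len()
    }
}
```

Once the pool is dropped, a string is freed as soon as the last record holding it goes away, which addresses the concern about interned strings living for the lifetime of the application. If the pool did have to be shared across threads, Arc plus a lock would be one option, at some cost in overhead.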

What does Int use three bits for? [duplicate]

Why is GHC's Int type not guaranteed to use exactly 32 bits of precision? This document claims it has at least 30-bit signed precision. Is it somehow related to fitting Maybe Int or similar into 32 bits?
It is to allow implementations of Haskell that use tagging. When using tagging you need a few bits as tags (at least one, two is better). I'm not sure there currently are any such implementations, but I seem to remember Yale Haskell used it.
Tagging can somewhat avoid the disadvantages of boxing, since you no longer have to box everything; instead the tag bit will tell you if it's evaluated etc.
The Haskell language definition states that the type Int covers at least the range [−2^29, 2^29−1].
There are other compilers/interpreters that use this property to boost the execution time of the resulting program.
All internal references to (aligned) Haskell data point to memory addresses that are multiples of 4 (8) on 32-bit (64-bit) systems. So references need only 30 bits (61 bits) and therefore leave 2 (3) bits free for "pointer tagging".
In the case of data, GHC uses those tags to store information about the referenced data, i.e. whether that value is already evaluated and, if so, which constructor it has.
In case of 30-bit Ints (so, not GHC), you could use one bit to decide if it is either a pointer to an unevaluated Int or that Int itself.
Pointer tagging could be used for one-bit reference counting, which can speed up the garbage collection process. That can be useful in cases where a direct one-to-one producer-consumer relationship was created at runtime: It would result directly in memory reuse instead of a garbage collector feeding.
So, using 2 bits for pointer tagging, there could be some wild combination of intense optimisation...
In case of Ints I could imagine these 4 tags:
a singular reference to an unevaluated Int
one of many references to the same possibly still unevaluated Int
30 bits of that Int itself
a reference (of possibly many references) to an evaluated 32-bit Int.
I think this is because of early ways to implement GC and all that stuff. If you have 32 bits available and you only need 30, you could use those two spare bits to implement interesting things, for instance using a zero in the least significant bit to denote a value and a one for a pointer.
Today the implementations don't use those bits so an Int has at least 32 bits on GHC. (That's not entirely true. IIRC one can set some flags to have 30 or 31 bit Ints)
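The tagging idea discussed in these answers can be sketched in a few lines. This is an illustration only, written in Rust, and it does not reproduce GHC's or any other compiler's actual scheme: the type names and the tag assignment (low bits 00 for an immediate 30-bit integer, 01 for a 4-byte-aligned address) are assumptions made for the example.

```rust
/// A 32-bit machine word holding either a small integer or an aligned address.
/// Assumed convention for this sketch: low two bits 00 = integer, 01 = address.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Word(u32);

#[derive(Debug, PartialEq)]
enum Decoded {
    Int(i32),  // 30-bit signed value
    Addr(u32), // 4-byte-aligned address
}

impl Word {
    /// Packs an integer if it fits in 30 bits: -2^29 ..= 2^29 - 1.
    fn from_int(v: i32) -> Option<Word> {
        if v < -(1 << 29) || v > (1 << 29) - 1 {
            return None;
        }
        // Shifting left by two leaves the tag bits as 00.
        Some(Word((v as u32) << 2))
    }

    /// Packs an address; alignment guarantees the low two bits are free.
    fn from_addr(a: u32) -> Option<Word> {
        if a % 4 != 0 {
            return None;
        }
        Some(Word(a | 0b01))
    }

    fn decode(self) -> Decoded {
        match self.0 & 0b11 {
            // Arithmetic right shift on i32 restores the sign.
            0 => Decoded::Int((self.0 as i32) >> 2),
            _ => Decoded::Addr(self.0 & !0b11),
        }
    }
}
```

Spending two bits on the tag is exactly what shrinks the integer range from 32 bits to the 30 bits that the language definition guarantees.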

Bad practice to initialize and calculate in a single variable? Visual C++

Is it bad practice to initialize a variable and do arithmetic in the same statement? For example, say I have multiple rooms of different dimensions whose areas I must find:
(example in feet)
double room_area1 = 9.5 * 6.8;
double room_area2 = 9.1 * 6.2;
double room_area3 = 10.0 * 7.1;
Or is it best to do:
double room_area1 = 9.5;
room_area1 = room_area1 * 6.8;
Is there any disparity between the two ways here or is it the same thing and just a matter of style?
The first involves only one operation: initialization.
The second involves two operations: initialization plus assignment.
For an intrinsic data type like double the overhead is negligible, but for user-defined data types the second can be detrimental to performance (how much? Profiling should tell you).
So in general it is better practice to use the first because:
It is guaranteed to be at least as fast as the second, if not faster.
It is more readable.
The first way is better. Reason: it's more readable.
While it is correct that these constructions are semantically different, for simple types the compiler will almost certainly optimize the second case to be like the first, and generate the same machine code, storing a constant (calculated at compile time) into room_area1.
There's nothing wrong with the first examples. In fact it's better that way because you can declare the variable const.

Creating strings in D without allocating memory?

Is there any typesafe way to create a string in D, using information only available at runtime, without allocating memory?
A simple example of what I might want to do:
void renderText(string text) { ... }
void renderScore(int score)
{
    char[16] text;
    int n = sprintf(text.ptr, "Score: %d", score);
    renderText(text[0..n]); // ERROR
}
Using this, you'd get an error because the slice of text is not immutable, and is therefore not a string (i.e. immutable(char)[]).
I can only think of three ways around this:
Cast the slice to a string. It works, but is ugly.
Allocate a new string using the slice. This works, but I'd rather not have to allocate memory.
Change renderText to take a const(char)[]. This works here, but (a) it's ugly, and (b) many functions in Phobos require string, so if I want to use those in the same manner then this doesn't work.
None of these are particularly nice. Am I missing something? How does everyone else get around this problem?
You have a static array of char. You want to pass it to a function that takes immutable(char)[]. The only way to do that without any allocation is to cast. Think about it. What you want is one type to act like it's another. That's what casting does. You could choose to use assumeUnique to do it, since that does exactly the cast that you're looking for, but whether that really gains you anything is debatable. Its main purpose is to document that what you're doing by the cast is to make the value being cast be treated as immutable and that there are no other references to it. Looking at your example, that's essentially true, since it's the last thing in the function, but whether you want to do that in general is up to you. Given that it's a static array which risks memory problems if you screw up and you pass it to a function that allows a reference to it to leak, I'm not sure that assumeUnique is the best choice. But again, it's up to you.
Regardless, if you're doing a cast (be it explicitly or with assumeUnique), you need to be certain that the function that you're passing it to is not going to leak references to the data that you're passing to it. If it does, then you're asking for trouble.
The other solution, of course, is to change the function so that it takes const(char)[], but that still runs the risk of leaking references to the data that you're passing in. So, you still need to be certain of what the function is actually going to do. If it's pure, doesn't return const(char)[] (or anything that could contain a const(char)[]), and there's no way that it could leak through any of the function's other arguments, then you're safe, but if any of those aren't true, then you're going to have to be careful. So, ultimately, I believe that all that using const(char)[] instead of casting to string really buys you is that you don't have to cast. That's still better, since it avoids the risk of screwing up the cast (and it's just better in general to avoid casting when you can), but you still have all of the same things to worry about with regards to escaping references.
Of course, that also requires that you be able to change the function to have the signature that you want. If you can't do that, then you're going to have to cast. I believe that at this point, most of Phobos' string-based functions have been changed so that they're templated on the string type. So, this should be less of a problem now with Phobos than it used to be. Some functions (in particular, those in std.file), still need to be templatized, but ultimately, functions in Phobos that require string specifically should be fairly rare and will have a good reason for requiring it.
Ultimately however, the problem is that you're trying to treat a static array as if it were a dynamic array, and while D definitely lets you do that, you're taking a definite risk in doing so, and you need to be certain that the functions that you're using don't leak any references to the local data that you're passing to them.
Check out assumeUnique from std.exception, as discussed in Jonathan's answer.
No, you cannot create a string without allocation. Did you mean access? To avoid allocation, you have to either use slice or pointer to access a previously created string. Not sure about cast though, it may or may not allocate new memory space for the new string.
One way to get around this would be to copy the mutable chars into a new immutable version then slice that:
void renderScore(int score)
{
    char[16] text;
    int n = sprintf(text.ptr, "Score: %d", score);
    immutable(char)[16] itext = text;
    renderText(itext[0..n]);
}
However:
DMD currently doesn't allow this due to a bug.
You're creating an unnecessary copy (better than a GC allocation, but still not great).
