Related
When to use which type of STRING in Eiffel? I saw code using READABLE_STRING_GENERAL and having to call l_readable_string.out to convert it to STRING.
READABLE_STRING_GENERAL is an ancestor of all variants of strings: mutable, immutable, 8-bit, 32-bit, so it can be used as a formal argument type when the feature can handle any string variant.
READABLE_STRING_32 is a good choice when the code handles Unicode, and can work with either mutable or immutable versions.
STRING_32 is a mutable Unicode variant. The code can change its value.
STRING is an alias for a string type that can either be STRING_8 or STRING_32. At the time of writing only a few libraries are adapted to handle the mapping of STRING to STRING_32. However, this mapping could become the default in the future to facilitate working with Unicode.
Regardless of the future, I would recommend using the ..._STRING_32 variants to process strings. This way the code directly supports Unicode. The libraries are also being updated in this direction. For example, io.put_string_32 can be used to print a Unicode string to the standard output (using the current locale).
Just as a follow-up (I know I am years late in posting anything).
Is there any typesafe way to create a string in D, using information only available at runtime, without allocating memory?
A simple example of what I might want to do:
void renderText(string text) { ... }
void renderScore(int score)
{
char[16] text;
int n = sprintf(text.ptr, "Score: %d", score);
renderText(text[0..n]); // ERROR
}
Using this, you'd get an error because the slice of text is not immutable, and is therefore not a string (i.e. immutable(char)[])
I can only think of three ways around this:
Cast the slice to a string. It works, but is ugly.
Allocate a new string using the slice. This works, but I'd rather not have to allocate memory.
Change renderText to take a const(char)[]. This works here, but (a) it's ugly, and (b) many functions in Phobos require string, so if I want to use those in the same manner then this doesn't work.
None of these are particularly nice. Am I missing something? How does everyone else get around this problem?
You have static array of char. You want to pass it to a function that takes immutable(char)[]. The only way to do that without any allocation is to cast. Think about it. What you want is one type to act like it's another. That's what casting does. You could choose to use assumeUnique to do it, since that does exactly the cast that you're looking for, but whether that really gains you anything is debatable. Its main purpose is to document that what you're doing by the cast is to make the value being cast be treated as immutable and that there are no other references to it. Looking at your example, that's essentially true, since it's the last thing in the function, but whether you want to do that in general is up to you. Given that it's a static array which risks memory problems if you screw up and you pass it to a function that allows a reference to it to leak, I'm not sure that assumeUnique is the best choice. But again, it's up to you.
Regardless, if you're doing a cast (be it explicitly or with assumeUnique), you need to be certain that the function that you're passing it to is not going to leak references to the data that you're passing to it. If it does, then you're asking for trouble.
The other solution, of course, is to change the function so that it takes const(char)[], but that still runs the risk of leaking references to the data that you're passing in. So, you still need to be certain of what the function is actually going to do. If it's pure, doesn't return const(char)[] (or anything that could contain a const(char)[]), and there's no way that it could leak through any of the function's other arguments, then you're safe, but if any of those aren't true, then you're going to have to be careful. So, ultimately, I believe that all that using const(char)[] instead of casting to string really buys you is that you don't have to cast. That's still better, since it avoids the risk of screwing up the cast (and it's just better in general to avoid casting when you can), but you still have all of the same things to worry about with regards to escaping references.
Of course, that also requires that you be able to change the function to have the signature that you want. If you can't do that, then you're going to have to cast. I believe that at this point, most of Phobos' string-based functions have been changed so that they're templated on the string type. So, this should be less of a problem now with Phobos than it used to be. Some functions (in particular, those in std.file), still need to be templatized, but ultimately, functions in Phobos that require string specifically should be fairly rare and will have a good reason for requiring it.
Ultimately however, the problem is that you're trying to treat a static array as if it were a dynamic array, and while D definitely lets you do that, you're taking a definite risk in doing so, and you need to be certain that the functions that you're using don't leak any references to the local data that you're passing to them.
Check out assumeUnique from std.exception, as mentioned in Jonathan's answer.
No, you cannot create a string without allocation. Did you mean access? To avoid allocation, you have to use either a slice or a pointer to access a previously created string. I'm not sure about cast though; it may or may not allocate new memory space for the new string.
One way to get around this would be to copy the mutable chars into a new immutable version then slice that:
void renderScore(int score)
{
char[16] text;
int n = sprintf(text.ptr, "Score: %d", score);
immutable(char)[16] itext = text;
renderText(itext[0..n]);
}
However:
DMD currently doesn't allow this due to a bug.
You're creating an unnecessary copy (better than a GC allocation, but still not great).
I once studied the advantage of a string being immutable, because of something that improves performance in memory.
Can anybody explain this to me? I can't find it on the Internet.
Immutability (for strings or other types) can have numerous advantages:
It makes it easier to reason about the code, since you can make assumptions about variables and arguments that you can't otherwise make.
It simplifies multithreaded programming since reading from a type that cannot change is always safe to do concurrently.
It allows for a reduction of memory usage by allowing identical values to be combined together and referenced from multiple locations. Both Java and C# perform string interning to reduce the memory cost of literal strings embedded in code.
It simplifies the design and implementation of certain algorithms (such as those employing backtracking or value-space partitioning) because previously computed state can be reused later.
Immutability is a foundational principle in many functional programming languages - it allows code to be viewed as a series of transformations from one representation to another, rather than a sequence of mutations.
Immutable strings also help avoid the temptation of using strings as buffers. Many defects in C/C++ programs relate to buffer overrun problems resulting from using naked character arrays to compose or modify string values. Treating strings as an immutable type encourages using types better suited for buffer manipulation (see StringBuilder in .NET or Java).
Consider the alternative. Java has no const qualifier. If String objects were mutable, then any method to which you pass a reference to a string could have the side-effect of modifying the string. Immutable strings eliminate the need for defensive copies, and reduce the risk of program error.
Immutable strings are cheap to copy, because you don't need to copy all the data - just copy a reference or pointer to the data.
Immutable classes of any kind are easier to work with in multiple threads, the only synchronization needed is for destruction.
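To make the points about interning and cheap copying concrete, here is a rough Java sketch (the class name is just for illustration): identical literals share one interned object, and "copying" an immutable string copies only a reference.
class SharingDemo {
    public static void main(String[] args) {
        String a = "score";            // the literal lives in the intern pool
        String b = "score";            // same literal, so it reuses the same pooled object
        System.out.println(a == b);    // true: both variables reference one object

        String copy = a;               // "copying" the string copies only the reference
        System.out.println(copy == a); // true: no character data was duplicated
    }
}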
Perhaps my answer is outdated, but maybe someone will find new information here.
Why Java String is immutable and why it is good:
you can share a string between threads and be sure that none of them will change it and confuse another thread
you don’t need a lock. Several threads can work with an immutable string without conflicts
if you have just received a string, you can be sure no one will change its value after that
you can have many duplicate strings – they all point to a single instance, to just one copy. This saves computer memory (RAM)
you can do a substring without copying – by creating a pointer into an existing string’s data. This is why the Java substring operation is so fast
immutable strings (objects) are much better suited for use as keys in hash tables
a) Imagine a string pool facility without making String immutable; it's not possible at all, because in the case of a string pool a single string object/literal, e.g. "Test", is referenced by many reference variables, so if any one of them changed the value the others would automatically be affected. Let's say
String A = "Test" and String B = "Test"
Now suppose B calls "Test".toUpperCase() and this changed the same object into "TEST"; then A would also be "TEST", which is not desirable.
b) Another reason why String is immutable in Java is to allow String to cache its hash code. Being immutable, a Java String caches its hash code and does not recalculate it every time we call String's hashCode method, which makes it very fast as a HashMap key.
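A rough sketch of the hash-caching idea in (b): a hypothetical immutable key class (Name is made up for illustration) that caches its hash code much like String does internally. This is only safe because the wrapped value can never change.
final class Name {
    private final String value;    // immutable state
    private int hash;              // cached hash code; 0 means "not computed yet"

    Name(String value) { this.value = value; }

    @Override
    public int hashCode() {
        if (hash == 0) {           // compute at most once; safe because 'value' never changes
            hash = value.hashCode();
        }
        return hash;               // later calls reuse the cached result
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Name && ((Name) o).value.equals(value);
    }
}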
Think of various strings sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. This efficiency of sharing outweighs the inefficiency of editing strings by extracting substrings and concatenating.
Fundamentally, if one object or method wishes to pass information to another, there are a few ways it can do it:
1. It may give a reference to a mutable object which contains the information, and which the recipient promises never to modify.
2. It may give a reference to an object which contains the data, but whose content it doesn't care about.
3. It may store the information into a mutable object the intended data recipient knows about (generally one supplied by that data recipient).
4. It may return a reference to an immutable object containing the information.
Of these methods, #4 is by far the easiest. In many cases, mutable objects are easier to work with than immutable ones, but there's no easy way to share with "untrusted" code the information that's in a mutable object without having to first copy the information to something else. By contrast, information held in an immutable object to which one holds a reference may easily be shared by simply sharing a copy of that reference.
I know that NULL isn't necessary in a programming language, and I recently made the decision not to include NULL in my programming language. Declaration is done by initialization, so it is impossible to have an uninitialized variable. My hope is that this will eliminate the NullPointerException in favor of more meaningful exceptions or simply not having certain kinds of bugs.
Of course, since the language is implemented in C, there will be NULLs used under the covers.
My question is, besides using NULL as an error flag (this is handled with exceptions) or as an endpoint for data structures such as linked lists and binary trees (this is handled with discriminated unions) are there any other use-cases for NULL for which I should have a solution? Are there any really important implications of not having NULL which could cause me problems?
There's a recent article referenced on LtU by Tony Hoare titled Null References: The Billion Dollar Mistake which describes a method to allow the presence of NULLs in a programming language, but also eliminates the risk of referencing such a NULL reference. It seems so simple yet it's such a powerful idea.
Update: here's a link to the actual paper that I read, which talks about the implementation in Eiffel: http://docs.eiffel.com/book/papers/void-safety-how-eiffel-removes-null-pointer-dereferencing
Borrowing a page from Haskell's Maybe monad, how will you handle the case of a return value that may or may not exist? For instance, if you tried to allocate memory but none was available. Or maybe you've created an array to hold 50 foos, but none of the foos have been instantiated yet -- you need some way to be able to check for these kinds of things.
I guess you can use exceptions to cover all these cases, but does that mean that a programmer will have to wrap all of those in a try-catch block? That would be annoying at best. Or everything would have to return its own value plus a boolean indicating whether the value was valid, which is certainly not better.
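For comparison, Java's Optional plays roughly the same role as Haskell's Maybe: the return type itself says the value may be absent, with no null and no try-catch. The UserLookup class and findUser method are made up for the example.
import java.util.Map;
import java.util.Optional;

class UserLookup {
    private final Map<Integer, String> users = Map.of(1, "alice", 2, "bob");

    // The signature itself advertises "there may be no result".
    Optional<String> findUser(int id) {
        return Optional.ofNullable(users.get(id));
    }

    String describe(int id) {
        return findUser(id)
                .map(name -> "found " + name)  // runs only if a value is present
                .orElse("no such user");       // explicit handling of the empty case
    }

    public static void main(String[] args) {
        UserLookup lookup = new UserLookup();
        System.out.println(lookup.describe(1)); // found alice
        System.out.println(lookup.describe(9)); // no such user
    }
}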
FWIW, I'm not aware of any language that doesn't have some sort of notion of NULL -- you've got null in all the C-style languages and Java; Python has None; Scheme, Lisp, Smalltalk, Lua, and Ruby all have nil; VB uses Nothing; and Haskell has a different kind of nothing.
That doesn't mean a language absolutely has to have some kind of null, but if all of the other big languages out there use it, surely there was some sound reasoning behind it.
On the other hand, if you're only making a lightweight DSL or some other non-general language, you could probably get by without null if none of your native data types require it.
The one that immediately comes to mind is pass-by-reference parameters. I'm primarily an Objective-C coder, so I'm used to seeing things kind of like this:
NSError *error;
[anObject doSomething:anArgumentObject error:&error];
// Error-handling code follows...
After this code executes, the error object has details about the error that was encountered, if any. But say I don't care if an error happens:
[anObject doSomething:anArgumentObject error:nil];
Since I don't pass in any actual value for the error object, I get no results back, and I don't really worry about parsing an error (since I don't care in the first place if it occurs).
You've already mentioned you're handling errors a different way, so this specific example doesn't really apply, but the point stands: what do you do when you pass something back by reference? Or does your language just not do that?
I think it's useful for a method to return NULL - for example, for a search method that is supposed to return some object, it can return the found object, or NULL if it wasn't found.
I'm starting to learn Ruby, and Ruby has a very interesting concept of NULL; maybe you could consider implementing something similar. In Ruby, NULL is called Nil, and it's an actual object just like any other object. It happens to be implemented as a global Singleton object. Also in Ruby, there is an object False, and both Nil and False evaluate to false in boolean expressions, while everything else evaluates to true (even 0, for example, evaluates to true).
In my mind there are two uses cases for which NULL is generally used:
The variable in question doesn't have a value (Nothing)
We don't know the value of the variable in question (Unknown)
Both are common occurrences and, honestly, using NULL for both can cause confusion.
Worth noting is that some languages that don't support NULL do support the notion of Nothing/Unknown. Haskell, for instance, supports "Maybe T", which can contain either a value of T or Nothing. Thus, functions can return (and accept) a type that they know will always have a value, or they can return/accept "Maybe T" to indicate that there may not be a value.
I prefer the concept of having non-nullable pointers be the default, with nullable pointers a possibility. You can almost do this with C++ through references (&) rather than pointers, but it can get quite gnarly and irksome in some cases.
A language can do without null in the Java/C sense, for instance Haskell (and most other functional languages) have a "Maybe" type which is effectively a construct that just provides the concept of an optional null pointer.
It's not clear to me why you would want to eliminate the concept of 'null' from a language. What would you do if your app requires you to do some initialization 'lazily' - that is, you don't perform the operation until the data is needed? Ex:
public class ImLazy {
public ImLazy() {
//I can't initialize resources in my constructor, because I'm lazy.
//Maybe I don't have a network connection available yet, or maybe I'm
//just not motivated enough.
}
private ResourceObject lazyObject;
public ResourceObject getLazyObject() { //initialize on first use, then return
if (lazyObject == null) {
lazyObject = new DatabaseNetworkResourceThatTakesForeverToLoad();
}
return lazyObject;
}
public boolean isObjectLoaded() { //just report whether the object has been loaded
return (lazyObject != null);
}
}
In a case like this, how could we return a value for getLazyObject()? We could come up with one of two things:
-require the user to initialize LazyObject in the declaration. The user would then have to fill in some dummy object (UselessResourceObject), which requires them to write all of the same error-checking code (if (lazyObject.equals(UselessResourceObject)...) or:
-come up with some other value, which works the same as null, but has a different name
For any complex/OO language you need this functionality, or something like it, as far as I can see. It may be valuable to have a non-null reference type (for example, in a method signature, so that you don't have to do a null check in the method code), but the null functionality should be available for cases where you do use it.
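For concreteness, a null-free variant of the getter above might look roughly like this, using java.util.Optional as the "other value, which works the same as null, but has a different name". It reuses the hypothetical resource types from the example, and whether this is actually nicer than null is exactly the question being raised:
import java.util.Optional;

public class ImLazyNoNull {
    private Optional<ResourceObject> lazyObject = Optional.empty();

    public ResourceObject getLazyObject() { // initialize on first use, then return
        if (!lazyObject.isPresent()) {
            lazyObject = Optional.of(new DatabaseNetworkResourceThatTakesForeverToLoad());
        }
        return lazyObject.get();
    }

    public boolean isObjectLoaded() { // just report whether the object has been loaded
        return lazyObject.isPresent();
    }
}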
Interesting discussion happening here.
If I was building a language, I really don't know if I would have the concept of null. I guess it depends on how I want the language to look. Case in point: I wrote a simple templating language whose main strength is nested tokens and ease of making a token a list of values. It doesn't have the concept of null, but then it doesn't really have the concept of any types other than string.
By comparison, the language it is built in, Icon, uses null extensively. Probably the best thing the language designers for Icon did with null is make it synonymous with an uninitialized variable (i.e. you can't tell the difference between a variable that doesn't exist and one that currently holds the value null). They then created two prefix operators to check for null and not-null.
In PHP, I sometimes use null as a 'third' boolean value. This is good in "black-box" type classes (e.g. ORM core) where a state can be True, False or I Don't Know. Null is used for the third value.
Of course, both of these languages do not have pointers in the same way C does, so null pointers do not exist.
We use nulls all the time in our application to represent the "nothing" case. For example, if you are asked to look up some data in the database given an id, and no record matches that id: return null. This is very handy because we can store nulls in our cache, which means we don't have to go back to the database if someone asks for that id again in a few seconds.
The cache itself has two different kinds of responses: null, meaning there was no such entry in the cache, or an entry object. The entry object might have a null value, which is the case when we cached a null db lookup.
Our app is written in Java, but even with unchecked exceptions doing this with exceptions would be incredibly annoying.
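A minimal Java sketch of that two-level scheme (the RecordCache and Entry classes here are illustrative, not the poster's actual code): the outer map answers "have we cached this id at all?", while the entry's value may itself be null, meaning "we looked and the database had nothing".
import java.util.HashMap;
import java.util.Map;

class RecordCache {
    static final class Entry {
        final Object value;                       // may be null: "we checked, nothing exists"
        Entry(Object value) { this.value = value; }
    }

    private final Map<Long, Entry> cache = new HashMap<>();

    Object lookup(long id) {
        Entry entry = cache.get(id);
        if (entry == null) {                      // cache miss: this id was never asked for
            Object fromDb = loadFromDatabase(id); // may legitimately return null
            cache.put(id, new Entry(fromDb));     // cache the null result too
            return fromDb;
        }
        return entry.value;                       // cache hit, possibly a cached null
    }

    private Object loadFromDatabase(long id) {
        return null; // placeholder for the real database lookup
    }
}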
Suppose one accepts that a powerful language should have some sort of pointer or reference type (i.e. something which can hold a reference to data which does not exist at compile time), and some form of array type (or other means of having a collection of storage slots which are addressable sequentially via integer index), and that slots of the latter should be able to hold the former. Suppose one also accepts the possibility that one may have to read some slots of an array of pointers/references before sensible values exist for all of them. Then there will be programs which, from a compiler's perspective, read an array slot before a sensible value has been written to it (trying to ascertain in the general case whether an array slot could be read before it is written would be equivalent to the Halting Problem).
While it would be possible for a language to require that all array slots be initialized with some non-null reference before any of them could be read, in many situations there isn't really anything that could be stored which would be better than null: if an attempt is made to read an as-yet-unwritten array slot and dereference the (non)item contained there, that represents an error, and it would be better to have the system trap that condition than to access some arbitrary object whose sole purpose for existence is to give the array slots some non-null thing they can reference.
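A tiny Java illustration of that point: the slots of a freshly created reference array hold no sensible value yet, and dereferencing an unwritten slot traps immediately (with a NullPointerException) rather than silently touching some arbitrary placeholder object. The class name is just for the example.
class SlotDemo {
    public static void main(String[] args) {
        String[] names = new String[3];          // every slot starts out as null
        names[0] = "alice";                      // only slot 0 has been given a real value

        System.out.println(names[0].length());   // fine: prints 5
        System.out.println(names[1].length());   // read before write: throws NullPointerException
    }
}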
When we talk about strings as being mutable, is this synonymous with using the word 'changeable' or 'modifiable' or is there some additional nuance to explain why this jargon is used instead of a simpler word like 'modifiable'?
I think the word "mutable" is a good option for this.
If you used "modifiable", it would be less clear. For example, if your string is a heap allocated type, when you say you are modifying the "string", it's not clear whether you're changing the data on the heap (the string's contents) or the string's heap pointer.
However, "mutable" suggests that the string's actual data is changing, to me. I think it's due to it being the same root as to mutate. If something is mutating, it's not changing from looking at A to looking at B (ie: changing a pointer), but rather mutating itself, or becoming (iteratively) something it was not originally.
A mutable object is an object that can be modified after it has been created. So in a way, it is synonymous with "modifiable." The word "mutable" means "liable or subject to change or alteration." I think it sounds better than "modifiable."
mutable
"liable or subject to change or alteration."
modifiable
"to change somewhat the form or qualities of; alter partially;"
conversely
immutable
"not mutable; unchangeable; changeless"
unmodifiable
"incapable of being modified in form or character or strength"
In programming you more often hear the terms mutable and immutable than you do modifiable and unmodifiable. However I think it is safe to say that either way has the same meaning.
But when in Rome... so you should use mutable and immutable as they are the more commonly used terms (at least in my experience).
As to why that choice? I asked my mother-in-law (she is up on words :-) and from a non-programming point of view "mutable" is shorter than "modifiable"... seems likely enough of a reason to me.
Yes, mutable is the term for an object that is capable of being changed after it is created, whereas immutable refers to an object that cannot change. Mutable literally means "ability to mutate" which I think fits perfectly with how mutable objects behave.
Specifically in relation to strings: in a mutable string, if you change the first character from 'A' to 'B', you're still dealing with the same bytes in memory; one of them has just taken a different value. With immutable strings, e.g. in Java, a new object would be created at a new location in memory, because the original one cannot be changed.
Mutable is a decent word to describe that, and it has the advantage of not being loaded with real-world meanings or interpretations that would make it difficult to use precisely.
And since you didn't point out the specific context you're speaking of (and so please forgive me if this is scope creep)...
Strings in .NET specifically are immutable, not mutable. Once they're created, that's it. They never change. If change attempts are made, new strings are created and the old ones get released for garbage collection.
Here is a pretty simple explanation of mutable/immutable using Java as an example:
Mutable
can be changed without reallocating memory
StringBuffer is a mutable version of the standard String class
to change the value of a standard String you need to allocate a new object (see the sketch below)
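A minimal sketch of that difference, assuming Java (StringBuilder would behave the same way as StringBuffer here):
class BufferDemo {
    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("Score: ");
        buffer.append(42);        // mutates the existing buffer in place; no new string object

        String text = "Score: ";
        text = text + 42;         // allocates a brand-new String; the original "Score: " is unchanged

        System.out.println(buffer + " / " + text);
    }
}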
I have never really heard of using the term modifiable.