How to define Empty Char in Delphi - string

Just out of curiosity: why, in Delphi, if we define an empty char with
a: Char;
a := '';
do we get the error "Incompatible types: 'Char' and 'string'"?
However, if we write
a := 'a';
it compiles fine. Is it necessary to define an empty char as a := #0?

A char is a single (that is, exactly one) character. So 'a', '∫', and '⌬' are all OK, but not 'ab' (a two-character string), 'Hello World!' (a twelve-character string), or '' (a zero-character string).
However, the NULL character (#0) is a character like any other.
In addition, the character datatype is implemented as a word (in modern versions of Delphi), that is, as two bytes. If all these values 0, 1, ..., 2^16 - 1 are used for real characters, how in the world would you represent your 'empty char'?

There is no such thing as an empty char. A char has to have a value. It is an ordinal type, a simple value type. Just as an integer, say, always has a value, so does a char.
The value #0 is not an empty char, it is the character with value 0, commonly known as NUL.
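To illustrate, a minimal sketch (the program name is mine): assigning #0 gives the Char a real value whose ordinal is 0; it does not make it "empty".
program NulCharDemo;
var
  a: Char;
begin
  a := #0;           { NUL: a real character with ordinal value 0 }
  WriteLn(Ord(a));   { prints 0 }
  { a := ''; would not compile: '' is an empty string, not a Char }
end.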

Related

Why does strcpy() return an unexpected result when its second argument has no null terminator?

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[5];
    char b[2] = "12";  /* exactly two bytes: no room for a terminating '\0' */
    strcpy(a, b);
    printf("%s\n", a);
    return 0;
}
There is no null character in string b; that is why the output is not as expected. Output: 12#
Why is the output coming out like this?
Your program has undefined behavior.
Your array b contains { '1', '2' }. As you say, there is no null character in the array -- which means it doesn't contain a string.
strcpy's second argument must be a pointer to a string. You gave it a char* value that is not a pointer to a string.
In practice, strcpy will probably continue copying characters from the memory following b. That memory contains arbitrary garbage -- and even the attempt to access it has undefined behavior.
In a sense, you're lucky that you got output that is visibly garbage. If there had happened to be a null character immediately following your array in memory, and if your program didn't blow up trying to access it, it could have just printed 12, and you might not have known that your program is buggy.
If you want to correct your program, you can change
char b[2] = "12";
to
char b[] = "12";
The compiler will figure out how big b needs to be to hold the string (including the required terminating null character).
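For completeness, the corrected program would then look like this (a sketch based on that change):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[5];
    char b[] = "12";   /* 3 bytes: '1', '2', and the terminating '\0' */
    strcpy(a, b);      /* b is now a proper string, so the copy is well defined */
    printf("%s\n", a); /* prints: 12 */
    return 0;
}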
strcpy keeps copying until it hits a null character (a byte with value 0x00). It copies whatever it encounters on the way. In your case, the memory after the array b happens to contain a byte with value 0x23 ('#') and then a byte with value 0x00.

String Index Error (Julia)

I'm a Julia newbie. When I was testing out the language, I got this error.
First of all, I define the String b as "he§y".
Julia seems to behave strangely when I have "special" characters in a String.
When I try to get the third character of b (it's supposed to be '§'), everything is OK.
However, when I try to get the fourth character of b (it's supposed to be 'y'), a StringIndexError is thrown.
I don't believe the compiler could throw that error. Do you mean a runtime error?
I know nothing about the Julia language, but the symptoms seem to be that string indexing is based not on code points but on the byte encoding.
The documentation for Julia seems to support my hypothesis:
https://docs.julialang.org/en/stable/manual/strings/
The built-in concrete type used for strings (and string literals) in Julia is String. This supports the full range of Unicode characters via the UTF-8 encoding. (A transcode function is provided to convert to/from other Unicode encodings.)
...
Conceptually, a string is a partial function from indices to characters: for some index values, no character value is returned, and instead an exception is thrown. This allows for efficient indexing into strings by the byte index of an encoded representation rather than by a character index, which cannot be implemented both efficiently and simply for variable-width encodings of Unicode strings.
Edit: quoted from the Julia documentation, an example demonstrating the exact "problem" you are facing.
julia> s = "\u2200 x \u2203 y"
"∀ x ∃ y"
Whether these Unicode characters are displayed as escapes or shown as special characters depends on your terminal's locale settings and its support for Unicode. String literals are encoded using the UTF-8 encoding. UTF-8 is a variable-width encoding, meaning that not all characters are encoded in the same number of bytes. In UTF-8, ASCII characters – i.e. those with code points less than 0x80 (128) – are encoded as they are in ASCII, using a single byte, while code points 0x80 and above are encoded using multiple bytes – up to four per character. This means that not every byte index into a UTF-8 string is necessarily a valid index for a character. If you index into a string at such an invalid byte index, an error is thrown:
julia> s[1]
'∀': Unicode U+2200 (category Sm: Symbol, math)
julia> s[2]
ERROR: StringIndexError("∀ x ∃ y", 2)
[...]
julia> s[3]
ERROR: StringIndexError("∀ x ∃ y", 3)
Stacktrace:
[...]
julia> s[4]
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
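Applied to the string from the question, a short sketch (nextind is the standard Base function for stepping to the next valid index; REPL output abbreviated):
julia> b = "he§y"   # '§' (U+00A7) occupies bytes 3 and 4 in UTF-8
"he§y"
julia> b[3]
'§': Unicode U+00A7 (category Po: Punctuation, other)
julia> b[nextind(b, 3)]
'y': ASCII/Unicode U+0079 (category Ll: Letter, lowercase)
julia> collect(b)[4]   # or index by character via a Vector{Char}
'y': ASCII/Unicode U+0079 (category Ll: Letter, lowercase)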

Convert a char to upper case

I have a variable which contains a single char. I want to convert this char to upper case. However, the to_uppercase function returns a rustc_unicode::char::ToUppercase struct instead of a char.
Explanation
ToUppercase is an Iterator that may yield more than one char. This is necessary because the uppercase version of a character may consist of multiple "Unicode scalar values" (which is what a Rust char represents).
A nice example is the so-called ligatures. Try this, for example (on the playground):
let fi_upper: Vec<_> = 'fi'.to_uppercase().collect();
println!("{:?}", fi_upper); // prints: ['F', 'I']
The 'fi' ligature is a single character whose uppercase version consists of two letters/characters.
Solution
There are multiple possibilities for dealing with this:
Work on &str: if your data is actually in string form, use str::to_uppercase, which returns a String that is easier to work with.
Use ASCII methods: if you are sure that your data is ASCII only and/or you don't care about Unicode symbols, you can use std::ascii::AsciiExt::to_ascii_uppercase, which returns just a char. But it only changes the letters 'a' to 'z' and ignores all other characters!
Deal with it manually: collect into a String or Vec as in the example above.
ToUppercase is an iterator because the uppercase version of the character may be composed of several code points, as delnan pointed out in the comments. You can convert that to a vector of characters:
c.to_uppercase().collect::<Vec<_>>();
Then you can collect those characters into a String, as ker pointed out.
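A self-contained sketch tying the options together (the 'ß' example is mine; its uppercase form is the two-character "SS"):
fn main() {
    // General case: to_uppercase() may yield more than one char.
    let c = 'ß';
    let upper: String = c.to_uppercase().collect();
    println!("{}", upper); // prints: SS

    // ASCII-only case: to_ascii_uppercase() returns a plain char,
    // but it only maps 'a'..='z' and leaves everything else unchanged.
    println!("{}", 'q'.to_ascii_uppercase()); // prints: Q
}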

Why does Char have an instance for Bounded?

Why is there a maxBound for Char? If Char is a character, why is it explained by numbers, and if it is not a number, what does it mean?
> maxBound :: Char
'\1114111'
All characters, like all things in a computer, are ultimately just numbers. Char represents Unicode characters, which are represented via numbers. You can convert between Char and Int values with ord and chr. E.g. the Unicode value for 'a' is 97, so ord 'a' is 97 and chr 97 is 'a'.
Char '\1114111' is the Char that represents the number 1114111, or 0x10FFFF, which is defined as a noncharacter. This is the largest value that is defined in Unicode, and is the largest that Haskell supports: '\1114112' will cause a compile error.
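A quick sketch of those conversions using Data.Char (expected output in comments):
import Data.Char (chr, ord)

main :: IO ()
main = do
  print (ord 'a')                 -- 97
  print (chr 97)                  -- 'a'
  print (maxBound :: Char)        -- '\1114111'
  print (ord (maxBound :: Char))  -- 1114111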
Character encodings are tricky. Behind the scenes, all characters are represented by numbers. The Unicode standard provides a set of "code points" which are simply numbers which map to a particular sequence of real characters. Unicode defines code points between 0 and 1114111 and so that's what you see when you try maxBound.
Char encodes Unicode code points as individual integers, which is somewhat inefficient. If you want an efficient encoding, use Text.
You're seeing \1114111 displayed because that's the code point that maxBound :: Char represents and there is no more efficient, meaningful way to display it. In particular, it's in the "Supplementary Private Use Area-B" of the Unicode standard which means that it's reserved for use outside of the scope of Unicode and thus has no standard meaning.
The Char data type represents Unicode values. These values are stored in the computer as numbers, and each number as a specific representation on the screen. For Char, the minimum value is 0 and the maximum value is 1114111.
An easier example is C, in which the char type holds (at least) an 8-bit byte. The classic ASCII table uses only 7 bits, so ASCII characters range in value from 0 to 127, while a full byte can store the values 0 through 255 (when treated as unsigned).
Remember, everything is a number to a computer. Some data types have representations that can be ordered and are finite, so they have a minimum value and a maximum value.
An example of a data type in Haskell that does not have a minimum or maximum value is Integer, since it can represent any integer value so long as you have enough RAM available.
It's helpful to look at the source of the Bounded Char instance itself. Characters are effectively numbers with a representation, and the bounds are the bounds of the Unicode code points:
instance Bounded Char where
    minBound = '\0'
    maxBound = '\x10FFFF'

Do (char)NULL, '\0' and 0 all mean the same thing in C's memset?

We are migrating a 32-bit application from RHEL 5.3 to RHEL 6.4.
We are getting a warning, "cast from pointer to integer of different size", on the new system's memset.
Do (char)NULL, '\0' and 0 all mean the same thing in C's memset?
The following code is giving the warning in new environment.
#define NULLC (char)NULL
#define MAX_LEN 11
…
memset(process_name, NULLC, MAX_LEN + 1);
strncpy(process_name, "oreo", MAX_LEN);
They do not all mean the same thing, though they're likely to yield the same result.
(char)NULL converts the value of NULL, which is an implementation-defined null pointer constant, to char. The type of NULL may be int, or void*, or some other integer type. If it's of an integer type, the conversion is well defined and yields 0. If it's void*, you're converting a null pointer value to char, which has an implementation-defined result (which is likely, but not guaranteed, to be 0).
The macro NULL is intended to refer to a null pointer value, not a null character, which is a very different thing.
Your macro NULLC is not particularly useful. If you want to refer to a null character, just use the literal constant '\0'. (And NULLC is IMHO too easily confused with NULL.)
The other two constants, '\0' and 0, have exactly the same type (int) and value (zero).
(It's admittedly counterintuitive that '\0' has type int rather than char. It's that way for historical reasons, and it rarely matters. In C++, character constants are of type char, but you asked about C.)
They all have the same value 0 but they don't mean the same thing.
(char)NULL - you are casting the value of the null pointer to a character with value 0
'\0' - the end-of-string character, with value 0 (NUL)
0 - a 32-bit integer with value 0.
You are getting a warning because somewhere in your code you're likely using something like:
short somevar = NULL;
or something similar.
0 and '\0' are both the integer 0 (the type of a character literal is int, not char), so they are exactly equivalent. The second argument to memset is an int, from which only the low-order 8 bits will be used.
NULL is a different beast. It is a pointer type that is guaranteed by the standard to be different from any pointer to a real object. The standard does NOT say that this is done by giving it the value 0, though it must compare equal to zero. Also, it may be of different width than int, so passing it as the second argument to memset() might not compile.
In defining NULLC, you are casting NULL from a native pointer (64 bits, probably defined as (void*)0) to char (8 bits). If you want to define NULLC, you should just use
#define NULLC 0
and do away with NULL and the (char). The formal argument to memset is int, not char.
0 = zero, of type int
'\0' = the null character (in C it is actually an int constant with value 0, though it is often thought of as (char)0)
NULL = (void*)0, the null pointer
See how they are related to each other. GCC often gives warnings for typecasts that are done implicitly by the compiler.
You are using
#define NULLC (char)NULL
.....
memset(process_name, NULLC, MAX_LEN + 1);
equivalent to:
memset(process_name, (char)NULL, MAX_LEN + 1);
equivalent to:
memset(process_name, '\0', MAX_LEN + 1);
You are passing (char)NULL as the second parameter, where an int is expected. NULL is a pointer, and on your new 64-bit system casting that pointer down to a char is what triggers the "cast from pointer to integer of different size" warning. You can avoid it entirely by changing the call to:
memset(process_name, 0, MAX_LEN + 1);
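Putting the advice together, a warning-free sketch of the fragment from the question (the declaration of process_name is assumed, since the original snippet elides it):
#include <stdio.h>
#include <string.h>

#define MAX_LEN 11

int main(void)
{
    char process_name[MAX_LEN + 1];
    memset(process_name, 0, MAX_LEN + 1);   /* 0 is the zero byte here, not a pointer */
    strncpy(process_name, "oreo", MAX_LEN); /* the zeroed buffer stays null-terminated */
    printf("%s\n", process_name);           /* prints: oreo */
    return 0;
}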
