Atari ST GFA Basic: what do variable suffixes correspond to?

I'm dusting off my Atari 520 ST, and am trying to understand some semantic details of GFA Basic. The TYPE(ptr) function is documented this way:
Determines the type of the variable at which a pointer is set.
'ptr' is an integer expression (usually *var).
TYPE(ptr) returns a code according to the type of variable to which 'ptr' is pointing.
0=var
1=var$
2=var%
3=var!
4=var()
5=var$()
6=var%()
7=var!()
The same documentation does not say what these suffixes mean (it must be considered obvious). I seem to recall that $ is a string/memory block, % an integer, and () an array of the same. What are ! and no suffix at all? ! seems to be used for 0/1 variables.

That's correct: $ is string, % is integer, ! is boolean, and no suffix (type 0) is a double-precision float.
http://www.atari-forum.com/wiki/index.php?title=GFAvariablestutorial

The final version of the manual states:
Type      Suffix   Size                       Range
Boolean   !        1 byte (1 bit in arrays)   0 or -1 (FALSE or TRUE)
Byte      |        1 byte                     0 to 255
Word      &        2 bytes                    -32768 to 32767
Long      %        4 bytes                    -2147483648 to 2147483647
Float     #        8 bytes                    2.225073858507E-308 to 3.595386269725E+308
String    $        0 to 32767 bytes           ASCII value 0 to 255 for each character
The default variable type doesn't display a suffix and can be changed.

Related

Binary Formatting Variables in TCL

I am trying to create a binary message to send over a socket, but I'm having trouble with the way TCL treats all variables as strings. I need to calculate the length of a string and know its value in binary.
set length [string length $message]
set binaryMessagePart [binary format s* { $length 0 }]
However, when I run this I get the error 'expected integer but got "$length"'. How do I get this to work and return the value for the integer 5 and not the char 5?
To calculate the length of a string, use string length. To calculate the length of a string in a particular encoding, convert the string to that encoding and use string length:
set enc "utf-8"; # Or whatever; you need to know this ahead of time for sanity's sake
set encoded [encoding convertto $enc $message]
set length [string length $encoded]
Note that the encoded length is in bytes, whereas the length prior to encoding is in characters. For some messages and some encodings, the difference can be substantial.
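That character-versus-byte distinction is language-independent; as a quick illustration in Python (the message value here is made up):
message = "héllo"                    # hypothetical message; é takes 2 bytes in UTF-8
encoded = message.encode("utf-8")
print(len(message))                  # 5 characters
print(len(encoded))                  # 6 bytes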
To compose a binary message with the length and the body of the message (a fairly common binary format), use binary format like this:
# Assumes the length is big-endian; for little-endian, use i instead of I
set binPart [binary format "Ia*" $length $encoded]
What you were doing wrong was using s*, which consumes a list of integers and produces a sequence of little-endian short integer binary values in the output string, while feeding it the literal list $length 0; braces prevent substitution, so the first element was the literal string $length, which is not an integer (integers don't start with $). We could have used [list $length 0] to produce the argument to s* and that would have worked, but that doesn't seem quite right for the context of the question.
In binary format, these are the common formats (there are many more):
a is for string data (mnemonically “ASCII”); this is binary string data, and you need to encode it first.
i and I are for 32-bit numbers (mnemonically “int” like in many programming languages, but especially C). Upper case is big-endian, lower case is little-endian.
s and S are for 16-bit numbers (mnemonically “short”).
c is for 8-bit numbers (mnemonically “char” from C).
w and W are for 64-bit numbers (mnemonically “wide integers”).
f and d are for IEEE binary floating point numbers (mnemonically “float” and “double” respectively, so 4 and 8 bytes).
All of these can be followed by an optional count, either a number or a *. For the numeric indicators, a count makes them insert a list of numbers instead of a single one (and so consume a list); a number gives a fixed count, and * means "the whole list". For the string indicator, a number uses a fixed number of bytes in the message (truncating or padding with zero bytes as necessary) and * means "the whole string" (never truncating or padding).
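For comparison, here is a rough Python equivalent of the "Ia*" packing above, using the struct module (the message body is made up):
import struct

encoded = "hello".encode("utf-8")    # hypothetical message body
length = len(encoded)
# ">I" is a big-endian 32-bit integer; "<I" would be little-endian.
# "%ds" appends the raw body bytes, much like Tcl's a*.
binPart = struct.pack(">I%ds" % length, length, encoded)
print(binPart)                       # b'\x00\x00\x00\x05hello'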

Strings, integers, data types

After several years of writing code for my own use, I'm trying to understand what it all really means.
a = "Foo"
b = ""
c = 5
d = True
a - string variable. "Foo" (with quotes) - string literal, i.e. an entity of the string data type.
b - string variable. "" - empty string.
c - integer variable. 5 - integer literal, i.e. an entity of the integral data type.
d - Boolean variable. True - Boolean value, i.e. an entity of the Boolean data type.
Questions:
Is my understanding correct?
It seems that 5 is an integer literal, which is an entity of the integral data type. "Integer" and "integral": why do we use different words here?
What are "string" and "integer"?
As I understand from Wikipedia, "string" and "integer" are not the same thing as string/integer literals or data types. In other words, there are 3 pairs or terms:
string literal, integer literal
string data type, integer data type
string, integer
Firstly, a literal value is any value which appears literally in code, e.g "hello" is a string literal, 123 is an integer literal, etc. In contrast for example:
int a = 5;
int b = 2;
int c = a + b;
a and b have literal values assigned to them, but c does not, it has a computed value assigned to it.
With any literal value we describe the literal value with its data type (as in the first sentence), e.g. "string literal" or "integer literal".
Now a data type refers to how the computer, or the software running on the computer, interprets the binary value of some data. For most kinds of data, the interpretation of the bytes is typically defined in a standard. UTF-8, for example, is one way to interpret the bytes of a string's internal (binary) value. Interestingly, the actual bytes of a string are treated as unsigned 8-bit integers. In UTF-8, the values of those integers are combined in various ways to determine which glyph, or character, should appear on the screen when those values are encountered in the data. UTF-8 is a variable-byte-length encoding which can use between 1 and 4 bytes per character (8 to 32 bits).
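For example, a quick Python sketch of that variable-length property:
# Each character occupies 1 to 4 bytes in UTF-8.
for ch in ["A", "é", "€", "𝄞"]:
    print(ch, len(ch.encode("utf-8")))   # prints 1, 2, 3 and 4 respectively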
For numbers, particularly integers, implementations can vary, but most representations use four bytes with the most significant byte first, and the first bit of the first byte as the sign for signed integers, or simply as the most significant bit for unsigned integers. This is referred to as big-endian ordering of the bytes in a multi-byte integer. There is also little-endian ordering, and integers can in principle use any number of bytes, but the most typically implemented are 1, 2, 4 and sometimes 8 bytes, giving 8, 16, 32 or 64 bits respectively. Integer sizes other than these typically require a custom implementation.
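Here is a small Python illustration of the two byte orders (illustrative values only):
n = 0x12345678
print(n.to_bytes(4, "big").hex())      # 12345678 (most significant byte first)
print(n.to_bytes(4, "little").hex())   # 78563412 (least significant byte first)
print((-1).to_bytes(4, "big", signed=True).hex())  # ffffffff (two's complement)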
For floating point numbers it gets a bit more tricky. There is a common standard for floating point numbers called IEEE-754 which describes how floats are encoded. Likewise for floats, there are different sizes and variations, but primarily we use 16, 32, 64 and sometimes 24 bits (the latter in some mobile-device graphics implementations). There are also extended-precision floats which use 40 or 80 bits.
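For instance, Python's struct module exposes the 32- and 64-bit IEEE-754 encodings directly (illustrative only):
import struct
print(struct.pack(">f", 1.5).hex())   # 3fc00000         (32-bit float)
print(struct.pack(">d", 1.5).hex())   # 3ff8000000000000 (64-bit double)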

Check whether a specific bit is set in Python 3

I have two bytes:
b'T'
and
b'\x40' (only bit #6 is set)
I need to perform a check on the first byte to see if bit #6 is set. For example, on [A-Za-z] it would be set, but on some other characters it would not be.
if (b'T' & b'\x40') != 0:
print("set");
does not work ...
Byte values, when indexed, give integer values. Use that to your advantage:
value = b'T'
if value[0] & 0x40:
print('set')
You cannot use the & operator on bytes, but it works just fine on integers.
See the documentation on the bytes type:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
…
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer[.]
Note that non-zero numbers always test as true in a boolean context, so there is no need to explicitly test for != 0 here either.
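For instance, the same test applied across a whole bytes object (example values):
data = b"T0a"
for b in data:               # iterating a bytes object yields integers
    print(chr(b), bool(b & 0x40))
# T True / 0 False / a True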
You are looking for the ord built-in function, which converts single-character strings (byte or unicode) to the corresponding numeric codepoint.
if ord(b'T') & 0x40:
print("set")

(char)NULL, '\0' and 0 all mean the same thing in C's memset?

We are migrating a 32-bit application from RHEL 5.3 to 6.4.
We are getting a warning "cast from pointer to integer of different size" on the new system's memset.
Do (char)NULL, '\0' and 0 all mean the same thing in C's memset?
The following code is giving the warning in new environment.
#define NULLC (char)NULL
#define MAX_LEN 11
…
memset(process_name, NULLC, MAX_LEN + 1);
strncpy(process_name, "oreo", MAX_LEN);
They do not all mean the same thing, though they're likely to yield the same result.
(char)NULL converts the value of NULL, which is an implementation-defined null pointer constant, to char. The type of NULL may be int, or void*, or some other integer type. If it's of an integer type, the conversion is well defined and yields 0. If it's void*, you're converting a null pointer value to char, which has an implementation-defined result (which is likely, but not guaranteed, to be 0).
The macro NULL is intended to refer to a null pointer value, not a null character, which is a very different thing.
Your macro NULLC is not particularly useful. If you want to refer to a null character, just use the literal constant '\0'. (And NULLC is IMHO too easily confused with NULL.)
The other two constants, '\0' and 0, have exactly the same type (int) and value (zero).
(It's admittedly counterintuitive that '\0' has type int rather than char. It's that way for historical reasons, and it rarely matters. In C++, character constants are of type char, but you asked about C.)
They all have the same value 0 but they don't mean the same thing.
(char)NULL - you are casting the value of the NULL pointer to a character with value 0
'\0' - the end-of-string character with value 0 (NUL)
0 - a 32-bit integer with value 0
You are getting a warning because somewhere in your code you're likely using something like:
short somevar = NULL;
or something similar.
0 and '\0' are both the integer 0 (the type of a character literal is int, not char), so they are exactly equivalent. The second argument to memset is an int, from which only the low-order 8 bits will be used.
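As a side demonstration (not part of your program), Python's ctypes module calls the C library's memset directly, and shows the int argument being reduced to its low-order byte:
import ctypes
buf = ctypes.create_string_buffer(4)
# 0x141 is reduced to its low byte 0x41 ('A'): memset converts its int
# argument to unsigned char before filling.
ctypes.memset(buf, 0x141, ctypes.sizeof(buf))
print(buf.raw)               # b'AAAA'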
NULL is a different beast. It is a null pointer constant that is guaranteed by the standard to compare unequal to any pointer to a real object. The standard does NOT say that this is done by giving it the value 0, though it must compare equal to zero. Also, it may be of a different width than int, so passing it as the second argument to memset() might not even compile.
In defining NULLC, you are casting NULL from a native pointer (64-bits, probably defined as (void*)0) to char (8-bits). If you wanted to declare NULLC, you should just do
#define NULLC 0
and do away with NULL and the (char). The formal argument to memset is int, not char.
0 = zero of int datatype
'\0' = (char)0 //null char
NULL = (void*)0 //null pointer
See how they are interlinked with each other. GCC often gives warnings for typecasts that are done implicitly by the compiler.
You are using
#define NULLC (char)NULL
.....
memset(process_name, NULLC, MAX_LEN + 1);
equivalent to:
memset(process_name, (char)NULL, MAX_LEN + 1);
equivalent to:
memset(process_name, '\0', MAX_LEN + 1);
You are passing char data (i.e. '\0') as the second parameter where an int is accepted, so the compiler converts it implicitly and issues the typecast warning. You can simply ignore it or change the call to:
memset(process_name, 0, MAX_LEN + 1);

Versatile string manipulation procedure in RPG

Another thing in RPG I'm never sure how to do right: writing string manipulation functions/procedures.
As most of the time strings in RPG have a fixed length (at least with our programs), and maybe more important they always have a finite length, I'm always kind of lost when I want to write a procedure for general string manipulation.
How do I write a procedure that takes care of a string of whatever length? Is there a problem if I do it function style (as in text = manip_str(text);)? Does it work at all with different lengths if I'm manipulating the argument directly (as in manip_str(text);)?
I'll post my own attempt as an answer, but there are some issues there that I'm not sure about. How do you do it? I'm sure many of you have had a task like this once or a thousand times. Different approaches welcome; it's best if you mention the issues with each approach.
Before you ask: I have this issue for (EBCDIC) byte strings as well as for (UTF-16) Unicode strings. But I can live with having a procedure twice, once for each.
Most character variables in RPG are indeed fixed length. Which means finite length. A character defined as 50a will always contain exactly 50 characters. Eval myChar = 'A'; will result in myChar containing 50 characters: the letter A followed by 49 blanks. This is boring but important.
The second boring but important bit is to understand that the caller allocates memory, not the callee. If the caller declares myChar 50a and the callee declares myParm 65535a, the caller has initialised only 50 bytes of storage. If the callee tries to work with myParm past the 50th byte, it is working with storage whose condition is unknown. As they say, unpredictable results may occur.
This then is the background to your question about a subprocedure handling character variables whose size is not known to the sub-procedure in advance. The classic way to handle this is to pass not only the character variable, but also its length. eval myProcedure(myChar: %len(myChar)); That's kind of ugly, and it forces every caller to calculate the length of myChar. It sure would be nice if the subprocedure could interrogate the incoming parameter to find how the caller defined it.
IBM have provided just such a facility through something they call Operational Descriptors. With operational descriptors, the caller passes metadata about the character parameter to the callee. One retrieves that via the CEEDOD API. There's an example of using CEEDOD here.
Basically, the subprocedure needs to declare that it wants operational descriptors:
dddeCheck pr n opdesc
d test 20a const options(*varsize)
The caller then makes a normal looking call out to the subprocedure:
if ddeCheck(mtel) = *on; // 10 bytes
...
endif;
if ddeCheck(mdate: *on) = *on; // 6 bytes
...
endif;
Note that the caller passes different sized fixed length variables to the subprocedure.
The subprocedure needs to use CEEDOD to interrogate the incoming parameter's length:
dddeCheck pi n opdesc
d test 20a const options(*varsize)
...
dCEEDOD pr
d parmNum 10i 0 const
d descType 10i 0
d dataType 10i 0
d descInfo1 10i 0
d descInfo2 10i 0
d parmLen 10i 0
d ec 12a options(*omit)
d parmNum s 10i 0
d descType s 10i 0
d dataType s 10i 0
d descInfo1 s 10i 0
d descInfo2 s 10i 0
d parmLen s 10i 0
d ec s 12a
...
CEEDOD (1: descType: dataType: descinfo1: descinfo2: parmlen: *omit);
At this point, parmlen contains the length that the caller has defined the incoming variable as being. Now it's up to us to do something with that information. If we're processing character by character, we need to do something like this:
for i = 1 to parmLen;
char_test = %subst(test: i: 1);
...
endfor;
If we're processing as a single string, we need to do something like this:
returnVar = %xlate(str_lc_letters_c: str_uc_letters_c: %subst(s: 1: parmLen));
The important thing is to never, ever refer to the input parameter unless that reference is somehow bounded by the actual variable length as defined by the caller. These precautions are only necessary for fixed length variables. The compiler already knows the length of variable length character variables.
On the subject of the way the compiler maps myFixed to myVarying via CONST, understand how that works. The compiler will copy all of the bytes from myFixed into myVarying - all of them. If myFixed is 10a, myVarying will become 10 bytes long. If myFixed is 50a, myVarying will become 50 bytes long. Trailing blanks are always included because they are part of every fixed-length character variable. Those blanks aren't really important for a translate procedure, one that ignores blanks, but they might be important for a procedure that centers a string. In this case, you'd need to resort to operational descriptors or do something like upperVary = str_uc(%trimr(myFixed));
The most flexible way of string passing in RPG that I've found works with 64k varying-length strings passed with options(*varsize). (It's supposed to actually send only the number of bytes in the string passed, so the 64k should not be a problem – I think I found that suggested somewhere by Scott Klement.) Here is how I would write an A-Z-only upcase function that way (as it is the most basic example):
* typedefs:
Dstr_string_t S 65535A VARYING TEMPLATE
* constants:
Dstr_uc_letters_c C 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
Dstr_lc_letters_c C 'abcdefghijklmnopqrstuvwxyz'
* prototype:
Dstr_uc PR like(str_string_t)
D s like(str_string_t)
D options(*varsize) const
* implementation:
Pstr_uc B export
D PI like(str_string_t)
D s like(str_string_t)
D options(*varsize) const
/free
return %xlate(str_lc_letters_c:str_uc_letters_c:s);
/end-free
Pstr_uc E
Now there are multiple things that concern me here:
Could there be problems with fixed-length strings that I pass to this?
Does this "only as many bytes as needed are passed" behaviour work for the return value as well? I would hate to have thousands of bytes reserved and passed around every time I want to upcase a 3-character string.
It's only flexible up to 64k bytes. But I think that's more of a theoretical issue with our programs – at least for now...
