PL/I & not printing on tab stop - mainframe

I have the following PL/I code:
declare 1 u union,
2 c character(1),
2 ci fixed binary(4) unsigned;
ci = data_mem(data_ptr);
put list (c);
What this does is take an integer and output it as if it were an ASCII/EBCDIC value, so it shows characters. So far this works.
The problem now is that the characters are printed at 24-column intervals, as if 3 TABs were inserted between them.
I tried converting c to a string first and then applying trim() but that did not help.
Any ideas?

This is the default PUT LIST behavior for a PRINT-attribute file. From the IBM Enterprise PL/I for z/OS Language Reference, under Stream-oriented Data Transmission -> LIST -> PUT list-directed (emphasis mine):
The values of the data-list items are converted to character representations (except for graphics) and transmitted to the data stream. A blank separates successive data values transmitted. For PRINT files, items are separated according to program tab settings (see “PRINT attribute”).
The next manual section discusses the PRINT attribute. Here we have
Data values transmitted by list- and data-directed data transmission are automatically aligned on the left margin and on implementation-defined preset tab positions.
Since you omitted FILE, your PUT is going to the default FILE(SYSPRINT). SYSPRINT is defined implicitly as FILE ENVIRONMENT(F RECSIZE(121)) OUTPUT PRINT STREAM (see Input and Output -> FILE Attribute -> File constant in the Language Reference, and Defining and Using Consecutive Data Sets -> Using PRINT files with stream I/O in the Programmer's Guide). IIRC, the default tab stops are every 24 columns, which gives 5 tabs per line, compatible with the old 120-byte printers common in the early days of PL/I F in the late 1960s. The tab stops can be changed by declaring a PLITABS structure (described in the previously mentioned manual section).
LIST- and DATA-directed I/O are intended to be quick & dirty I/O interfaces with little regard for formatting on output (but are very forgiving on input). EDIT is better for formatting output, but it does show a lot of its FORTRAN roots for input and output. Personally, for traditional reports using formatted output, and for record input, I would work with record I/O, which is analogous to standard COBOL I/O.

Related

Read substrings from a string containing multiplication [duplicate]

I am a scientist programming in Fortran, and I came across a strange behaviour. In one of my programs I have a string containing several "words", and I want to read all words as substrings. The first word starts with an integer and a wildcard, like "2*something".
When I perform an internal read on that string, I expect to read all words, but instead, the READ statement repeatedly reads the first substring. I do not understand why, nor how to avoid this behaviour.
Below is a minimalist sample program that reproduces this behaviour. I would expect it to read the three substrings and to print "3*a b c" on the screen. Instead, I get "a a a".
What am I doing wrong? Can you please help me and explain what is going on?
I am compiling my programs under GNU/Linux x64 with Gfortran 7.3 (7.3.0-27ubuntu1~18.04).
PROGRAM testread
IMPLICIT NONE
CHARACTER(LEN=1024):: string
CHARACTER(LEN=16):: v1, v2, v3
string="3*a b c"
READ(string,*) v1, v2, v3
PRINT*, v1, v2, v3
END PROGRAM testread
You are using list-directed input (the * format specifier). In list-directed input, a number (n) followed by an asterisk means "repeat this item n times", so it is processed as if the input was a a a b c. You would need to have as input '3*a' b c to get what you want.
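The repeat-count rule can be modelled in a few lines. This is only a simplified Python sketch of the r*c expansion, not gfortran's implementation; the function name expand_list_directed is made up for illustration, and the model ignores commas, slashes, and null values.

```python
import re

def expand_list_directed(record):
    """Simplified model of list-directed input: expand r*c repeat counts."""
    values = []
    for token in record.split():
        match = re.fullmatch(r"(\d+)\*(.*)", token)
        if match:
            # "3*a" stands for the constant "a" repeated 3 times
            values.extend([match.group(2)] * int(match.group(1)))
        else:
            # a delimited string such as '3*a' is taken literally
            values.append(token.strip("'"))
    return values

# The first three values are what v1, v2, v3 would receive.
print(expand_list_directed("3*a b c")[:3])
print(expand_list_directed("'3*a' b c")[:3])
```

In this model the unquoted record fills all three variables from the repeated first item, while the quoted form yields the three separate words.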
I will use this as another opportunity to point out that list-directed I/O is sometimes the wrong choice as its inherent flexibility may not be what you want. That it has rules for things like repeat counts, null values, and undelimited strings is often a surprise to programmers. I also often see programmers complaining that list-directed input did not give an error when expected, because the compiler had an extension or the programmer didn't understand just how liberal the feature can be.
I suggest you pick up a Fortran language reference and carefully read the section on list-directed I/O. You may find you need to use an explicit format or change your program's expectations.
Following the answer of @SteveLionel, here is the relevant part of the reference on list-directed sequential READ statements (in this case, for Intel Fortran, but you could find it for your specific compiler and it won't be much different).
A character string does not need delimiting apostrophes or quotation marks if the corresponding I/O list item is of type default character, and the following is true:
The character string does not contain a blank, comma (,), or slash ( / ).
The character string is not continued across a record boundary.
The first nonblank character in the string is not an apostrophe or a quotation mark.
The leading character is not a string of digits followed by an asterisk.
A nondelimited character string is terminated by the first blank, comma, slash, or end-of-record encountered. Apostrophes and quotation marks within nondelimited character strings are transferred as is.
In total, there are 4 forms of sequential read statements in Fortran, and you may choose the option that best fits your need:
Formatted Sequential Read:
To use this you change the * to an actual format specifier. If you know the length of the strings in advance, this would be as easy as '(a3,a2,a2)'. Or you could come up with a format specifier that matches your data, but this generally requires knowing the length or layout of the fields.
Formatted Sequential List-Directed:
You are currently using this option (the * format descriptor). As we already showed you, this kind of I/O comes with a lot of magic and surprising behavior. What is hitting you is the n*c form, which is interpreted as n repetitions of the constant c.
As said by Steve Lionel, you could put quotation marks around the problematic word, so it will be parsed as one piece. Or, as proposed by @evets, you could split your string using the intrinsics index or scan. Another option could be changing your wildcard from an asterisk to some other character.
Formatted Namelist:
Well, that could be an option if your data was (or could be) presented in the namelist format, but I really think it's not your case.
Unformatted:
This may not apply to your case because you are reading from a character variable, and an internal READ statement can only be formatted.
Otherwise, you could split your string by means of a function instead of an I/O operation. There is no intrinsic for this, but you could come up with one without much trouble (see this thread for reference). As you may have noted already, manipulating strings in Fortran is... awkward, at least. There are some libraries out there (like this) that may be useful if you are doing lots of string stuff in Fortran.

Reading from a binary file with a given data structure

I want to read the 'Last Traded Price' from the given binary file. How do I extract specific data from the file using format strings like 'hhl10s6sc'? I know I have to use the struct.unpack method, but where can I learn to write such format strings (with some illustrations) so that I can extract any data I want from such a binary file?
The thing that is troubling me is the unpacking that the writer of the code (that I'm trying to understand) has written - 'hlhcl6s10s11s10s2s1s10s12schc' . I understood what 6s...12s mean, but what's the significance of the 'hlhcl' (5 characters in the beginning) and 'chc' (3 characters in the last). The writer has tried to retrieve the 'Last traded price' from the data structure.
If you could give some examples and/or some sources for the same, it'd be very helpful. Attached the image which contains the data structure of the given file.
This image shows the data structure
struct format strings are fields described in order. Every letter is a format character, so hlhcl translates to "short, long, short, character, long". This doesn't resemble the image you linked (which is a tad impractical as it's off-site and another step to look up), which starts with a single long and otherwise holds only strings. It might apply to a protocol wrapping that packet.
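As a minimal sketch of how a format string maps onto fields, the layout below is hypothetical and chosen only for illustration; it is not the layout from the image or from the code in the question. The "<" prefix selects little-endian standard sizes with no padding, which is an assumption; the real file may use a different byte order or native alignment.

```python
import struct

# Hypothetical record: short, long, 6-byte string, single char.
fmt = "<hl6sc"

# Standard sizes with "<": h = 2 bytes, l = 4 bytes, 6s = 6 bytes, c = 1 byte.
size = struct.calcsize(fmt)

# Build a sample record, then unpack it again in the same field order.
record = struct.pack(fmt, 7, 123456, b"ABCDEF", b"Y")
fields = struct.unpack(fmt, record)

print(size)    # total bytes described by the format string
print(fields)  # one tuple element per format character (or per counted "s")
```

Each letter consumes a fixed number of bytes in order, so a longer string such as hlhcl6s10s... is read the same way, one field after another. The "Format Characters" table in the Python struct module documentation lists every letter and its size.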

SUM not working 'Invalid or missing field format'

I have an input file in this format: (length 20, 10 chars and 10 numerics)
jname1    0000500006
bname1    0000100002
wname1    0000400007
yname1    0000000006
jname1    0000100001
mname1    0000500012
mname2    0000700013
In my jcl I have defined my sysin data as such:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,FD)
DATAEND
*
It works fine as long as I don't add the SUM FIELDS statement, so I'm wondering if I'm using the wrong format for my numerics. Since I know they start at position 11 and have a length of 10, the format is the only thing that could be wrong.
As you might have already realised the point of this JCL is to just list the values but grouped by the first letter of the name (so for the example data and JCL I have given it would group the numeric for mname1 and mname2 together but leave the other records untouched).
I'm kind of new at this, so I was wondering which format I need if my numerics look like that in the input file.
If new to DFSORT, get hold of the DFSORT Getting Started guide for your version of DFSORT (http://www-01.ibm.com/support/docview.wss?uid=isg3T7000080).
This takes you through all the basic operations with many examples.
The DFSORT Application Programming Guide describes everything you need to know, in detail, again with examples. Appendix C of that document contains all the data types available (note that FD, which you tried to use, is not a valid data type, so it is probably a typo). There are tables throughout the document listing which data types are available where, in case there is a particular limit.
For advanced techniques, consult the DFSORT Smart Tricks publication here: http://www-01.ibm.com/support/docview.wss?uid=isg3T7000094
You need to understand a bit more the way data is stored on a Mainframe as well.
Decimals (which can be "packed-decimal" or "zoned-decimal") do not contain a decimal-point. The decimal-point is implied. In high-level languages you tell the compiler where the decimal-point is (in a fixed position) and the compiler does the alignments for you. In Assembler, you do everything yourself.
Decimals are 100% accurate, as there are machine-instructions which act directly on packed-decimal data giving packed-decimal results.
A field which actually contains a decimal-point, cannot be directly used in arithmetic.
An unsigned field is treated as positive when used in any arithmetic.
The SUM statement supports a limited number of numeric definitions, and you have chosen the correct one. It does not matter that your data is unsigned.
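To make the point about unsigned fields concrete, here is a minimal Python sketch, assuming the file is EBCDIC (code page 037) and the records are padded to 10 + 10 bytes as described in the question. The helper zoned_to_int and the grouping loop are illustrative only; they are not DFSORT code. EBCDIC digits are X'F0' to X'F9', so a plain string of digits is already a zoned-decimal number whose sign zone F counts as positive.

```python
from collections import defaultdict

def zoned_to_int(field: bytes) -> int:
    """Decode zoned decimal: low nibble = digit, zone of the last byte = sign."""
    value = 0
    for byte in field:
        value = value * 10 + (byte & 0x0F)
    sign = field[-1] >> 4          # 0xF (unsigned) and 0xC mean positive
    return -value if sign in (0x0B, 0x0D) else value

# Sample data from the question: name in bytes 1-10, number in bytes 11-20.
sample = [("jname1", "0000500006"), ("bname1", "0000100002"),
          ("wname1", "0000400007"), ("yname1", "0000000006"),
          ("jname1", "0000100001"), ("mname1", "0000500012"),
          ("mname2", "0000700013")]

totals = defaultdict(int)
for name, number in sample:
    raw = f"{name:<10}{number}".encode("cp037")   # one 20-byte EBCDIC record
    key = raw[:1].decode("cp037")                 # sort key: position 1, length 1
    totals[key] += zoned_to_int(raw[10:20])       # summed field: position 11, length 10

print(dict(totals))
```

The totals per first letter mirror what SORT FIELDS=(1,1,CH,A) with a zoned-decimal SUM field would accumulate for records with equal keys.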
If the format of the output from SUM is not what you want, look at OPTION ZDPRINT (or NOZDPRINT).
If you want further formatting, you can use OUTREC or OUTFIL.
As an option to using SUM, you can use OUTFIL reporting functions (especially, although not limited to, if you want a report). You can use SECTIONS and TRAILER3 with TOT/TOTAL.
Something to watch for with SUM (which is not a problem with the reporting features) is if any given one (or more) of your SUMmed fields exceed the field size. To continue to use SUM if that happens, you need to extend the field in INREC and then get SUM to use the new, sufficient, size.
After some trial and error I finally found it. Apparently the format I needed to use was the ZD format (zoned decimal, signed), so my sysin becomes this:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,ZD)
DATAEND
*
This works even though my records don't contain any decimals and they are unsigned. I don't really get it, so if someone knows why it's like that, please go ahead and explain it to me.
For now the way I'm going to remember it is this: Z = symbol for the integers (so no decimals)

How does punycode distinguish similar IRIs?

I've been looking into internationalised resource identifiers and there's one thing bugging me.
My understanding is that, for each label in a domain name (xyzzy.plugh.com has three labels, xyzzy, plugh and com), the following process is performed to translate it into ASCII representation so that it can be processed okay by all legacy software:
If it consists solely of ASCII characters, it's copied as is.
Otherwise:
First we output xn-- followed by all the ASCII characters (skipping non-ASCII).
Then, if the final character isn't -, we output - to separate the ASCII from non-ASCII.
Finally, we encode each of the non-ASCII characters using punycode so that they appear to be ASCII.
My question then is: how do we distinguish between the following two Unicode URIs?
http://aa☃.net/
http://☃aa.net/
It seems to me that both of these will encode to:
http://xn--aa-nfh.net/
simply because the sequencing information has been lost for the label as a whole.
Or am I missing something in the specification?
According to one punycode encoder, they are encoded differently:
aa☃.net -> xn--aa-gsx.net
☃aa.net -> xn--aa-esx.net
                  ^
see here
The relevant RFC 3492 details why this is the case. First, it provides clues in the introduction:
Uniqueness: There is at most one basic string that represents a given extended string.
Reversibility: Any extended string mapped to a basic string can be recovered from that basic string.
That means there must be a distinct one-to-one mapping for every single basic/extended string pair.
Understanding how it differentiates the two possibilities requires an understanding of how the decoder (the thing that turns the basic string back into an extended one, with all its Unicode glory) works.
The decoder begins by starting with just the basic string aa.net with a pointer to the first a, then applies a series of deltas, such as gsx or esx.
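The delta is a single integer that encodes two things: which code point to insert and where to insert it. In the RFC's terms, it counts the non-insertion states the decoder steps through before the next insertion, where a state is a (code point, position) pair and the position wraps around after each pass over the string.
So gsx (the delta in aa☃.net) decodes to 28811 = 9603 × 3 + 2: with three possible insertion positions in aa, that advances the code point from 128 to 9731 (☃, U+2603) and leaves the position at 2, after the aa. The esx delta (for ☃aa.net) decodes to 28809 = 9603 × 3 + 0, which is the same code point at position 0, before the aa.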
That is how position is encoded into the basic strings.
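The two encodings can be checked with Python's built-in punycode codec. The sketch below also decodes the first delta by hand to show where the position comes from; first_delta and explain are illustrative helpers that handle only the first delta of a label (initial bias 72, lowercase digits), not a full RFC 3492 decoder.

```python
SNOWMAN = "\u2603"  # the snowman character, code point 9731

def first_delta(digits, bias=72):
    """Decode the first variable-length integer of a punycode suffix."""
    base, tmin, tmax = 36, 1, 26
    value, weight, k = 0, 1, base
    for ch in digits:
        # digit values: a-z are 0-25, 0-9 are 26-35
        d = ord(ch) - ord("a") if ch.isalpha() else ord(ch) - ord("0") + 26
        value += d * weight
        t = min(max(k - bias, tmin), tmax)   # threshold for this digit position
        if d < t:                            # a digit below the threshold ends the integer
            break
        weight *= base - t
        k += base
    return value

def explain(label):
    encoded = label.encode("punycode").decode("ascii")
    basic, _, digits = encoded.rpartition("-")
    delta = first_delta(digits)
    # One pass over the string has len(basic) + 1 insertion positions.
    step, position = divmod(delta, len(basic) + 1)
    return encoded, delta, chr(128 + step), position

print(explain("aa" + SNOWMAN))   # position 2: inserted after "aa"
print(explain(SNOWMAN + "aa"))   # position 0: inserted before "aa"
```

The quotient of the delta selects the code point (counted from the initial value 128) and the remainder selects the insertion position, which is why the two labels differ only in the first delta digit.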

Fortran `write (*, '(3G24.16)')` error

I have a Fortran file that must write these complicated numbers, basically I can't change these numbers:
File name: complicatedNumbers.f
      implicit none
      write (*,'(3G24.16)') 0.4940656458412465-323, 8.651144521298990, 495.6336980600139
      end
It's then run with gfortran -o outa complicatedNumbers.f on my Ubuntu, but this error comes up:
Error: Expected expression in WRITE statement at (1)
I'm sure it has something to do with the complicated numbers because there are no errors if I change the three complicated numbers into simple numbers such as 11.11, 22.2, 33.3.
This is actually a stripped-down version of a complex Fortran file that contains many variables and links to other files. So ideally, the 3G24.16 should not be changed.
What does the 3G24.16 mean?
How can I fix it so that I can ultimately print out these numbers with ./outa?
There is nothing syntactically wrong in the snippet you've shown us. However, your use of a file name with the suffix .f makes me think that the compiler is assuming that your code is written in fixed form. That is the usual default behaviour of gfortran. If that is the case it probably truncates that line at about the last , which means that the compiler sees
write (*,'(3G24.16)') 0.4940656458412465-323, 8.651144521298990,
and raises the complaint you have shared with us. Either join us in the 21st Century and switch to free form source files, change .f to .f90 and see what fun ensues, or continue the line correctly with some character in column 6 of the next line.
As to what 3G24.16 means, refer to your favourite Fortran reference material under the heading of data edit descriptors, in particular the g data edit descriptor.
Oh, and if my bandying about of the terms fixed form source and free form source bamboozles you, read about them in your favourite Fortran reference material too.
Three errors in your program:
as you are clearly using fixed-form Fortran, each statement line must fit in columns 7 to 72 (free form allows 132 characters per line)
the number 0.4940656458412465-323 is probably not written correctly. The exponent letter is missing, so Fortran computes a subtraction and 0.4940656458412465-323 evaluates to about -322.505934354159. Try 0.4940656458412465D-323 instead. Notice that I propose the exponent D (double precision). Writing 0.4940656458412465E-323 would not work because single precision cannot represent magnitudes that small: its smallest normal value is about 1.2E-38.
the other numbers should have a D0 exponent too because single precision holds only about 7 significant decimal digits.
Possible correction, still in fixed form:
      implicit none
      write (*,'(3G24.16)') 0.4940656458412465D-323,
     &                      8.651144521298990d0,
     &                      495.6336980600139d0
      end
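The arithmetic behind the second and third points can be checked outside Fortran. This is a small Python sketch, assuming IEEE 754 arithmetic: Python floats stand in for double precision, and the helper to_single (a made-up name) rounds a value to single precision by packing it into 4 bytes.

```python
import struct

def to_single(x):
    """Round a double to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Without an exponent letter the text is parsed as a subtraction.
as_written = 0.4940656458412465 - 323

# With an exponent the value is a tiny positive double (a subnormal).
intended = 0.4940656458412465e-323

print(as_written)                    # roughly -322.5
print(intended > 0.0)                # representable in double precision
print(to_single(intended))           # underflows to zero in single precision
print(to_single(8.651144521298990))  # digits beyond about the 7th are lost
```

This is why the correction uses D exponents on all three constants: the first value does not survive in single precision at all, and the other two would lose most of the 16 digits that the G24.16 descriptor is asked to print.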
