How to concatenate multiple COBOL constants? - string

I was trying to write an ad-hoc source conversion tool that would correct EBCDIC->ASCII translation of source files that had binary character data embedded in their string literals (we are using a compiler that is Linux hosted, but targets the original EBCDIC runtime character set.) I was trying to deal with ascii sources that had messed up lines like:
10 FILLER PIC X(03) VALUE '^A^#<97>'.
whereas what was actually desired was
10 FILLER PIC X(03) VALUE X'010008'.
If that tool were fully general, I'd want it to be able to handle replacement of something like:
10 FILLER PIC X(010) VALUE 'Hi^A^#<97>There'.
With that in mind I was hoping that there was a "V6.2 Enterprise COBOL" compiler compatible syntax (in case back porting to the original system was required) to concatenate multiple string constants into a single VALUE clause, something like the following:
IDENTIFICATION DIVISION.
PROGRAM-ID. FOO.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
DATA DIVISION.
FILE SECTION.
WORKING-STORAGE SECTION.
01 my-var PIC X(10) VALUE 'Hi' X'01' X'00' X'08' 'There'.
PROCEDURE DIVISION.
DISPLAY my-var
GOBACK
.
HOWEVER: After cobbling together an initial version of my tool, I found that in the ~150/7000 sources in the project that had binary character literals in them, the literals were uniformly unprintable in all but one single case (which I handled manually). So, the fully general concatenation operation that I thought would be desired was not actually required for this project (although it could be for other future projects.)
I'll leave this question open in case somebody is aware of an Enterprise COBOL compatible way of doing this string literal concatenation.

To concatenate multiple alphanumeric or national literals, COBOL has provided the concatenation operator & since COBOL 2002 (support may be missing in your compiler):
01 my-var PIC X(120) VALUE X"F0" & "hi there" & X"F1".
Some dialects (and COBOL 2023) also provide an intrinsic function that concatenates arguments of any type:
DISPLAY FUNCTION CONCAT (X"F0" "hi there" X"F1" " FROM " your-name-var).
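For the conversion tool described in the question, the & operator suggests one way to emit mixed literals. The following Python is only a sketch under assumptions: the names is_plain and cobol_value are made up for illustration, the bytes are assumed to already hold the intended target values for the unprintable parts, and the target compiler is assumed to accept & in a VALUE clause (which, as noted, may not be the case).

```python
from itertools import groupby


def is_plain(byte):
    # Printable ASCII can stay in an ordinary quoted literal.
    return 0x20 <= byte <= 0x7E


def cobol_value(data):
    """Render bytes as quoted and hex COBOL literals joined with '&'."""
    pieces = []
    for plain, run in groupby(data, key=is_plain):
        chunk = bytes(run)
        if plain:
            # An apostrophe inside a '...' literal is written twice.
            text = chunk.decode("ascii").replace("'", "''")
            pieces.append("'" + text + "'")
        else:
            # Unprintable bytes become one hex literal per run.
            pieces.append("X'" + chunk.hex().upper() + "'")
    return " & ".join(pieces)


print(cobol_value(b"Hi\x01\x00\x08There"))
```

For the example from the question this prints 'Hi' & X'010008' & 'There', and a literal made only of unprintable bytes collapses to a single X'...' literal, which needs no concatenation at all.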

Related

How to represent a missing xsd:dateTime in RDF?

I have a dataset with data collected from a form that contains various date and value fields. Not all fields are mandatory so blanks are possible and in many cases expected, like a DeathDate field for a patient who is still alive.
How do I best represent these blanks in the data?
I represent DeathDate using xsd:dateTime. Blanks or empty spaces are not allowed. All of these are flagged as invalid when validating using Jena RIOT:
foo:DeathDate_1
a foo:Deathdate ;
time:inXSDDatetime " "^^xsd:dateTime .
foo:DeathDate_2
a foo:Deathdate ;
time:inXSDDatetime ""^^xsd:dateTime .
foo:DeathDate_3
a foo:Deathdate ;
time:inXSDDatetime "--"^^xsd:dateTime .
I prefer to not omit the triple because I need to know if it was blank on the source versus a conversion error during construction of my RDF.
What is the best way to code these missing values?
You should represent this by just omitting the triple. That's the meaning of a triple that's "not present": it's information that is (currently) unknown.
Alternatively, you can give it the value "unknown"^^xsd:string when there's no death date. The solution in this case is to not datatype it as an xsd:dateTime, but as a simple string. It doesn't have to be a string, of course; you could use any kind of "special" value, e.g. a boolean false, as long as it's a valid literal value that you can distinguish from actual death dates. This will solve the parsing problem, but IMHO if you do this, you are setting yourself up for headaches further down the line: every query over this data will have to take two different types of values into account, plus the possibility that the field is missing.
I prefer to not omit the triple because I need to know if it was blank
on the source versus a conversion error during construction of my RDF.
This sounds like an XY problem. If there are conversion errors, your application should signal that in another way, e.g. by logging an error. You shouldn't try to solve this by "corrupting" your data.
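As a rough illustration of that advice, the conversion step could keep the two cases apart like this. It is a sketch, not a recommendation of a particular toolchain: the function name death_date_turtle and the logger name are hypothetical, the input is assumed to be an ISO 8601 string, and the class and property names are copied unchanged from the question.

```python
import logging
from datetime import datetime

log = logging.getLogger("rdf_conversion")  # hypothetical logger name


def death_date_turtle(subject, raw):
    """Return Turtle text for a death date, or None when no triple is emitted."""
    value = (raw or "").strip()
    if not value:
        # Blank in the source form: the date is unknown, so omit the triple.
        return None
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        # A conversion problem is reported in the log, not encoded in the data.
        log.error("cannot convert death date %r for %s", raw, subject)
        return None
    return (
        f"{subject}\n"
        f"    a foo:Deathdate ;\n"
        f'    time:inXSDDatetime "{parsed.isoformat()}"^^xsd:dateTime .'
    )
```

A blank field and a malformed field both yield no triple, but only the malformed one leaves an entry in the log, so the two can still be told apart afterwards.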

Decoding Specification '(AA$)' For Write Statement

I'm confused as to what this write specification is trying to specify. N is an array of single characters. Could someone help me and explain the write format specification below? I saw someone post the exact same question a few days ago but the page is not there anymore.
WRITE(*,'(AA$)') N(I),","
The dollar sign in a format specifier suppresses a new line.
Therefore, the array N is written element-wise as a string (A), each element followed by a comma (the second string, A), on a single line.
Note that this syntax is not standard conforming. In modern Fortran you would write the format as
WRITE(*,'(2A)', advance='no') N(I),","

VC6 /r/n and Write works; Visual Studio 2013 does not work

The following code
if(!cfile.Open(fileName, CFile::modeCreate | CFile::modeReadWrite)){
return;
}
ggg.Format(_T("0 \r\n"));
cfile.Write(ggg, ggg.GetLength());
ggg.Format(_T("SECTION \r\n"));
cfile.Write(ggg, ggg.GetLength());
produces the following:
0 SECTI
clearly this is wrong: (a) \r\n is ignored, and (b) the word SECTION is cut off.
Can someone please tell me what I am doing wrong?
The same code without _T() in VC6 produces the correct results.
Thank you
a.
Apparently, you are building a Unicode build; CString (presumably that's what ggg is) holds a sequence of wchar_t characters, each two bytes large. ggg.GetLength() is the length of the string in characters.
However, CFile::Write takes the length in bytes, not in characters. You are passing half the number of bytes actually taken by the string, so only half the number of characters gets written.
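The arithmetic can be reproduced outside MFC. The Python below is only a model of the problem, assuming the string is held as UTF-16 little-endian (the layout of a wchar_t string on Windows); truncated_write is a made-up helper that keeps as many bytes as the string has characters, which is what the original calls ask CFile::Write to do.

```python
def truncated_write(text):
    """Bytes that reach the file when a character count is passed as a byte count."""
    encoded = text.encode("utf-16-le")  # two bytes per character for this text
    return encoded[:len(text)]          # only the first half survives


written = truncated_write("0 \r\n") + truncated_write("SECTION \r\n")

# A viewer that ignores the NUL bytes shows only the surviving characters.
visible = written.replace(b"\x00", b"").decode("ascii")
print(repr(visible))
```

This prints '0 SECTI', matching the output in the question: the line endings sit in the second half of each string and are never written.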
Have you considered changing lines like:
cfile.Write(ggg, ggg.GetLength());
to
cfile.Write(ggg, ggg.GetLength() * sizeof(TCHAR));
Write needs the number of bytes, not characters. In a Unicode build each character is a 2-byte wchar_t, so you need to account for that. sizeof(TCHAR) is the number of bytes each character takes in the current build: 1 for an ANSI build and 2 for a Unicode build. Multiply that by the string length and the byte count will be correct.
Information on TCHAR can be found on MSDN documentation here. In particular it is defined as:
The _TCHAR data type is defined conditionally in Tchar.h. If the symbol _UNICODE is defined for your build, _TCHAR is defined as wchar_t; otherwise, for single-byte and MBCS builds, it is defined as char. (wchar_t, the basic Unicode wide-character data type, is the 16-bit counterpart to an 8-bit signed char.)
TCHAR and _TCHAR in your usage should be synonymous. However I believe these days Microsoft recommends including <tchar.h> and using _TCHAR. What I can't tell you is if _TCHAR existed on VC 6.0.
With the method above, a Unicode build writes the text to the file as UTF-16, while an ANSI build writes it as 8-bit characters.
Want CFile.write to output Ascii no matter what? Read on...
If you want all text written to the file as 8-bit ASCII, you are going to have to use one of the conversion macros, in particular CT2A. More on the macros can be found in this MSDN article. Each macro name can be read piece by piece: CT2A converts a generic (T) character string, which is equivalent to W when _UNICODE is defined and to A otherwise, to ASCII (A), per the chart at the link. So whether you build for Unicode or ANSI, it would output ASCII. Your code would look something like:
ggg.Format(_T("0 \r\n"));
cfile.Write(CT2A(ggg), ggg.GetLength());
ggg.Format(_T("SECTION \r\n"));
cfile.Write(CT2A(ggg), ggg.GetLength());
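Since the macro converts everything to ASCII, CString's GetLength() will suffice, as long as every character converts to a single byte (which holds for plain ASCII text like this).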

How to store binary data in a Lua string

I needed to create a custom file format with embedded meta information. Instead of whipping up my own format I decided to just use Lua.
texture
{
format=GL_LUMINANCE_ALPHA;
type=GL_UNSIGNED_BYTE;
width=256;
height=128;
pixels=[[
<binary-data-here>]];
}
texture is a function that takes a table as its sole argument. It then looks up the various parameters by name in the table and forwards the call on to a C++ routine. Nothing out of the ordinary I hope.
Occasionally the files fail to parse with the following error:
my_file.lua:8: unexpected symbol near ']'
What's going on here?
Is there a better way to store binary data in Lua?
Update
It turns out that storing binary data in a Lua string is non-trivial. But it is possible if you take care with three kinds of sequences.
Long-format-string-literals cannot have an embedded closing-long-bracket (]], ]=], etc).
This one is pretty obvious.
Long-format-string-literals cannot end with something like ]== which would match the chosen closing-long-bracket.
This one is more subtle. Luckily the script will fail to compile if done wrong.
The data cannot embed \n or \r.
Lua's built-in line-end processing messes these up. This problem is much more subtle. The script will compile fine but it will yield the wrong data: a carriage return becomes a line feed (0x0D => 0x0A), a CR LF or LF CR pair collapses to a single line feed, etc.
To get around these limitations I split the binary data up on \r, \n, then pick a long-bracket that works, finally emit Lua that concats the various parts back together. I used a script that does this for me.
input: XXXX\nXX]]XX\r\nXX]]XX]=
texture
{
--other fields omitted
pixels= '' ..
[[XXXX]] ..
'\n' ..
[=[XX]]XX]=] ..
'\r\n' ..
[==[XX]]XX]=]==];
}
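The splitting approach described above can be sketched in Python. This is not the script the author used, only one possible reading of the description; the names lua_long_string, lua_pieces and lua_expression are invented here, and the sketch ignores other text-mode hazards that may exist on a given platform.

```python
import re


def lua_long_string(chunk):
    """Wrap chunk in the lowest-level long bracket that cannot close early."""
    level = 0
    while True:
        close = b"]" + b"=" * level + b"]"
        # The closer must first appear exactly at the end of the chunk. This
        # rejects both an embedded closer and a tail such as "]=" that would
        # combine with the closer to end the string too soon.
        if (chunk + close).find(close) == len(chunk):
            return b"[" + b"=" * level + b"[" + chunk + close
        level += 1


def lua_pieces(data):
    """Split data on runs of CR/LF and return the Lua literal for each part."""
    pieces = []
    for part in re.split(rb"([\r\n]+)", data):
        if not part:
            continue
        if part[0] in b"\r\n":
            # Line-end bytes go into a quoted string as escape sequences.
            escaped = part.replace(b"\r", b"\\r").replace(b"\n", b"\\n")
            pieces.append(b"'" + escaped + b"'")
        else:
            pieces.append(lua_long_string(part))
    return pieces


def lua_expression(data):
    """Join the pieces with Lua's concatenation operator."""
    return b" ..\n".join(lua_pieces(data))


print(lua_expression(b"XXXX\nXX]]XX\r\nXX]]XX]=").decode("latin-1"))
```

For the sample input above, the sketch picks the same bracket levels as the hand-written example: level 0 for the first part, level 1 for the second, and level 2 for the part ending in ]=.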
Lua is able to encode most characters in long bracket format including nulls. However, Lua opens the script file in text mode and this causes some problems. On my Windows system the following characters have problems:
Char code(s) Problem
-------------- -------------------------------
13 (CR) Is translated to 10 (LF)
13 10 (CR LF) Is translated to 10 (LF)
26 (EOF) Causes "unfinished long string near '<eof>'"
If you are not using Windows, then these may not cause problems, but there may be different text-mode based problems.
I was only able to produce the error you received by encoding multiple close brackets:
a=[[
]]] --> a.lua:2: unexpected symbol near ']'
But, this was easily fixed with the following:
a=[==[
]]==]
The binary data needs to be encoded into printable characters. The simplest method for decoding purposes would be to use C-like escape sequences for all bytes. For example, hex bytes 13 41 42 1E would be encoded as '\19\65\66\30'. Of course, then the encoded data is three to four times larger than the source binary.
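A generator for that escape form is short. The Python below is a sketch with a hypothetical function name, lua_escaped_string; it escapes every byte as a decimal \ddd sequence, so a digit in the data can never be mistaken for part of the preceding escape.

```python
def lua_escaped_string(data):
    """Return a single-quoted Lua string literal with every byte as \\ddd."""
    return "'" + "".join("\\%d" % byte for byte in data) + "'"


print(lua_escaped_string(bytes([0x13, 0x41, 0x42, 0x1E])))
```

This prints '\19\65\66\30', the encoding given above for hex bytes 13 41 42 1E.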
Alternatively, you could use something like Base64, but that would have to be decoded at runtime instead of relying on the Lua interpreter. Personally, I'd probably go the Base64 route. There are Lua examples of Base64 encoding and decoding.
Another alternative would be to have two files. Use a well-defined image format file (e.g. TGA) that is pointed to by a separate Lua script with the additional metadata. If you don't want two files to move around then they could be combined in an archive.

Lua string.format options

This may seem like a stupid question, but what are the symbols used for string replacement in string.format? can someone point me to a simple example of how to use it?
string.format in Lua follows the same patterns as printf in C:
https://cplusplus.com/reference/cstdio/printf/
There are some exceptions, for those see here:
http://pgl.yoyo.org/luai/i/string.format
Chapter 20 of PiL describes string.format near the end:
The function string.format is a
powerful tool when formatting strings,
typically for output. It returns a
formatted version of its variable
number of arguments following the
description given by its first
argument, the so-called format string.
The format string has rules similar to
those of the printf function of
standard C: It is composed of regular
text and directives, which control
where and how each argument must be
placed in the formatted string.
The Lua Reference says:
The format string follows the same
rules as the printf family of standard
C functions. The only differences are
that the options/modifiers *, l, L, n,
p, and h are not supported and that
there is an extra option, q.
The function is implemented by str_format() in lstrlib.c, which itself interprets the format string but defers to the C library's implementation of sprintf() to actually format each field after determining what type of value is expected (string or number, essentially) to correspond to each field.
There should be a "Lua Quick Reference" HTML file on your hard disk, if you used an installation package.
(for example: ../Lua/5.1/docs/luarefv51.html)
There you'll find, among other things,
string.format (s [, args ])
Formatting directives
Formatting field types
Formatting flags
Formatting examples
To add to the other answers: Lua does have a boolean data type, where C does not. C uses numbers for that, where 0 is false and everything else is true.
However, to format a boolean in a String in Lua,
local text = string.format("bool is %d", truth)
gets (at least in Hammerspoon):
bad argument #2 to 'format' (number expected, got boolean)
You can instead use %s for booleans (as for strings):
local text = string.format("bool is %s", truth)
