Sort on length of field

Sort on length of field - mainframe

I want to write sort JCL that sorts a file with variable-length records.
Input file:
Mark aaaaaaa
Amy bbbbbb
Paula ccccccccccc
Sort on the length of the field before the space, in ascending order. That is, sort on the length of the first column/word (Mark, Amy, etc.).
The second requirement is to sort on the field after the space in descending order, but any record whose field contains a vowel should always come first, followed by the rest of the data.
To explain the second part: the fields after the space (aaaaa, bbbbb and ccccc) need to be sorted in descending alphabetical order, but we also need to check whether the field contains a vowel. If it does, that record always goes to the top.
For the input file above, the expected output is:
Mark aaaaaaaa
Paula cccccc
Amy bbbbbb
Here the first record, which contains the vowel a, is at the top, and the rest of the data is sorted in descending order. I want to achieve this.

What you are asking is not at all a simple thing :-)
Whilst DFSORT has much intrinsic functionality, finding the length of a sequence of non-space characters is not available.
So you have to roll-your-own.
Although the task is also possible with fixed-length records (different technique) it is easier with variable-length records.
Because the fields are variable-length as well, you'll need PARSE to separate the fields. For variable-length or variably-located fields, PARSE is usually the answer.
PARSE creates fixed-length parsed fields, so you have to know the maximum lengths of your text. In this example 30 is chosen for each.
The solution will develop piece by piece, because you will need to be secure in your understanding of it. The pieces are presented as "stand alone" code which you can run and see what happens:
OPTION COPY
INREC IFTHEN=(WHEN=INIT,
PARSE=(%01=(ENDBEFR=C' ',
FIXLEN=30),
%02=(FIXLEN=30))),
IFTHEN=(WHEN=INIT,
BUILD=(1,4,%01,%02))
If you run that, you will get this output:
MARK AAAAAAA
AMY BBBBBB
PAULA CCCCCCCCCCC
INREC runs before a SORT, so to make any changes to the data before a SORT, you use INREC. OUTREC runs after SORT, and OUTFIL after OUTREC.
For now, the BUILD is just to show that the PARSEd fields contain the output you want (don't worry about the case, if you used mixed-case it will be like that).
WHEN=INIT means "do this for each record, before the following IFTHEN statements (if any)". You can use multiple WHEN=INIT, and you have to use multiple IFTHEN of some type to transform data in multiple stages.
The 1,4 in the BUILD is for the Record Descriptor Word (RDW), which each variable-length record has. It is always necessary when creating a variable-length current record in SORT, but we'll use it for another purpose here as well.
The next stage is to "extend" the records, because we need two fields to SORT on. For a variable-length record, you extend "at the front". In general:
BUILD=(1,4,extensionstuff,5)
This makes a new version of the current record, with first the RDW from the old current record, then "does some stuff" to create the extension, then copies from position 5 (the first data-byte on a variable-length record) to the end of the record.
Although the RDW is "copied", the value of the RDW at the time is irrelevant, as it will be calculated for the BUILD. It just must be an RDW to start with, you can't just put anything there except an actual RDW.
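As a rough illustration of the RDW behaviour described above, here is a small Python model (not DFSORT). It assumes the usual RDW layout of a two-byte big-endian record length, which includes the four RDW bytes, followed by two bytes of binary zeros. The helper names make_vrec and extend_at_front are hypothetical.

```python
import struct

def make_vrec(data: bytes) -> bytes:
    """Prefix data with a 4-byte RDW: 2-byte length (RDW included) + 2 zero bytes."""
    return struct.pack(">HH", len(data) + 4, 0) + data

def extend_at_front(rec: bytes, extension: bytes) -> bytes:
    """Model of BUILD=(1,4,extension,5): the old RDW value is not reused,
    the length is recalculated for the newly built record."""
    data = rec[4:]                      # position 5 to the end of the record
    return make_vrec(extension + data)

def rdw_length(rec: bytes) -> int:
    """Read the record length from the first two bytes of the RDW."""
    return struct.unpack(">H", rec[:2])[0]

rec = make_vrec(b"MARK AAAAAAA")
new = extend_at_front(rec, b" " * 5)    # like BUILD=(1,4,5X,5)
print(rdw_length(rec), rdw_length(new))
```

The point of the sketch is only that the length in the new RDW reflects the new record, whatever the old RDW held.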
Another component that will be needed is to extend the records for the SORT key. We need the length of the first field, and we need a "flag" for whether or not to "sort early" for the second field containing a vowel. For the length it will be convenient to have a two-byte binary value. For now, we are just reserving bytes for the things:
OPTION COPY
INREC BUILD=(1,4,2X,2X,X,5)
The 2X is two blanks, the X is one blank, so a total of five blanks. It could have been written as 5X, and in the final code is best that way, but for now it is clearer. Run that and you will see your records prefixed by five blanks.
There are two tasks. The length of the first field, and whether the second field contains a vowel.
The key to the first task is to replace blanks from the PARSEd field with "nothing". This will cause the record to be shortened by one for each blank replaced. Saving the length of the original current record, and calculating with the length of the current record and the fixed-length (30) reveals the length of the data.
The key to the second task applies a similar technique. This time, change the second PARSEd field such that a, e, i, o, u are replaced by "nothing". Then if the length is the same as the original, there were no vowels.
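The arithmetic behind both tasks can be modelled in a few lines of Python. This is a sketch of the idea, not DFSORT: it assumes a parsed field padded to 30 bytes and a record made of a 4-byte RDW plus a 5-byte extension, and the function names are hypothetical.

```python
FIXLEN = 30      # assumed maximum field length, as chosen for the PARSE
PREFIX = 4 + 5   # RDW plus the five-byte extension

def nonblank_length(field: str) -> int:
    """Length of the data in a parsed field, found via record-length changes."""
    padded = field[:FIXLEN].ljust(FIXLEN)     # what PARSE delivers
    saved_len = PREFIX + len(padded)          # length saved before the change
    squeezed = padded.replace(" ", "")        # blanks replaced by "nothing"
    new_len = PREFIX + len(squeezed)          # length after the record shrank
    return new_len - saved_len + FIXLEN

def vowel_count(field: str) -> int:
    """Number of vowels, found the same way (upper-case data assumed)."""
    padded = field[:FIXLEN].ljust(FIXLEN)
    saved_len = PREFIX + len(padded)
    stripped = padded.translate(str.maketrans("", "", "AEIOU"))
    new_len = PREFIX + len(stripped)
    return saved_len - new_len

print(nonblank_length("MARK"), vowel_count("AAAAAAA"), vowel_count("BBBBBB"))
```

A vowel count of zero means the record length did not change, which is the condition used to set the flag.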
The FINDREP will look something like this:
IFTHEN=(WHEN=INIT,
FINDREP=(IN=C' ',
OUT=C'',
STARTPOS=n1,
ENDPOS=n2)),
You'll need a variant for the vowels:
IFTHEN=(WHEN=INIT,
FINDREP=(IN=(C'A',C'E',C'I',C'O',C'U'),
OUT=C'',
STARTPOS=n1,
ENDPOS=n2)),
To run:
OPTION COPY
INREC IFTHEN=(WHEN=INIT,
PARSE=(%01=(ENDBEFR=C' ',
FIXLEN=30),
%02=(FIXLEN=30))),
IFTHEN=(WHEN=INIT,
BUILD=(1,4,2X,X,%02)),
IFTHEN=(WHEN=INIT,
OVERLAY=(5:1,2)),
IFTHEN=(WHEN=INIT,
FINDREP=(IN=(C'A',
C'E',
C'I',
C'O',
C'U'),
OUT=C'',
STARTPOS=8,
ENDPOS=38)),
IFTHEN=(WHEN=(1,2,BI,EQ,5,2,BI),
OVERLAY=(7:C'N'))
If you run that, you will see the flag (third data-position) is now space (for a vowel present) or "N". Don't worry that all the "A"s have disappeared, they are still tucked away in %02.
OVERLAY can make changes to the current record without creating a new, replacement record (which is what BUILD does). You'll see OVERLAY used below to get the new record-length after a new current record has been created (the BUILD would get the original record-length from the RDW).
A similar process for the other task.
I've included some additional test-data and made further assumptions about your SORT order. Here's full, annotated (the comments can remain, they do not affect the processing), code:
* PARSE CURRENT INPUT TO GET TWO FIELDS, HELD SEPARATELY FROM THE RECORD.
*
INREC IFTHEN=(WHEN=INIT,
PARSE=(%01=(ENDBEFR=C' ',
FIXLEN=30),
%02=(FIXLEN=30))),
* MAKE A NEW CURRENT RECORD, RDW FROM EXISTING RECORD, THREE EXTENSIONS, AND
* A COPY OF THE FIRST PARSED FIELD.
*
IFTHEN=(WHEN=INIT,
BUILD=(1,4,
2X,
2X,
X,
%01)),
* STORE THE LENGTH OF THE NEW CURRENT RECORD ON THE CURRENT RECORD.
*
IFTHEN=(WHEN=INIT,
OVERLAY=(5:
1,2)),
* REPLACE BLANKS WITH "NOTHING" WITHIN THE COPY OF THE PARSED FIELD. THIS WILL
* AUTOMATICALLY ADJUST THE RDW ON THE CURRENT RECORD.
*
IFTHEN=(WHEN=INIT,
FINDREP=(IN=C' ',
OUT=C'',
STARTPOS=10,
ENDPOS=40)),
* CALCULATE THE LENGTH OF THE NON-BLANKS IN THE FIELD, BY SUBTRACTING PREVIOUS
* STORED RECORD-LENGTH FROM CURRENT RECORD-LENGTH (FIRST TWO BYTES, BINARY, OF
* RDW) AND ADDING 30 (LENGTH OF PARSED FIELD).
*
IFTHEN=(WHEN=INIT,
OVERLAY=(5:
1,2,BI,
SUB,
5,2,BI,
ADD,
+30,
TO=BI,
LENGTH=2)),
* MAKE A NEW CURRENT RECORD, COPYING RDW AND THE VALUE CALCULATED ABOVE, BLANKS
* (COULD BE COPIED) AND THEN THE SECOND PARSED FIELD.
*
IFTHEN=(WHEN=INIT,
BUILD=(1,4,
5,2,
2X,
X,
%02)),
* AGAIN SAVE THE LENGTH OF THE NEW CURRENT RECORD.
*
IFTHEN=(WHEN=INIT,
OVERLAY=(7:
1,2)),
* CHANGE ALL VOWELS TO "NOTHING". THIS WILL AUTOMATICALLY ADJUST THE RDW. FOR
* MIXED-CASE JUST EXTEND THE IN TO INCLUDE LOWER-CASE VOWELS AS WELL.
*
IFTHEN=(WHEN=INIT,
FINDREP=(IN=(C'A',
C'E',
C'I',
C'O',
C'U'),
OUT=C'',
STARTPOS=10,
ENDPOS=40)),
* CALCULATE NUMBER OF VOWELS.
*
IFTHEN=(WHEN=INIT,
OVERLAY=(7:
7,2,BI,
SUB,
1,2,BI,
TO=BI,
LENGTH=2)),
* MAKE A NEW CURRENT RECORD TO BE SORTED, WITH BOTH PARSED FIELDS.
*
IFTHEN=(WHEN=INIT,
BUILD=(1,4,
5,2,
7,2,
9,1,
%01,
%02)),
* SET THE FLAG TO "OUTSORT" THOSE RECORDS WITH A VOWEL IN THE SECOND FIELD.
*
IFTHEN=(WHEN=(7,2,BI,EQ,0),
OVERLAY=(9:
C'N'))
* SORT ON "OUTSORT FLAG", LENGTH OF NAME (DESCENDING), NAME, 2ND FIELD.
SORT FIELDS=(9,1,CH,A,
5,2,CH,D,
10,30,CH,A,
40,30,CH,A)
* FIELDS NEEDED TO BE IN FIXED POSITION FOR SORT, AND EXTENSION FIELDS NO
* LONGER NEEDED. ALSO REMOVE BLANKS FROM THE TWO FIELDS, KEEPING A SEPARATOR
* BETWEEN THEM. THIS COULD INSTEAD BE DONE ON THE OUTFIL.
*
OUTREC BUILD=(1,4,
10,60,
SQZ=(SHIFT=LEFT,
MID=C' '))
* CURRENTLY THE VARIABLE-LENGTH RECORDS ARE ALL THE SAME LENGTH (64 BYTES) SO
* REMOVE TRAILING BLANKS.
*
OUTFIL VLTRIM=C' '
Extensive test-data:
MARK AAAAAAA
AMY BBBBBB
PAULA CCCCCCCCCCC
PAULA BDDDDDDDDDD
IK JJJJJJJJJJO
You can also see how the code works by "removing a line at a time" from the end of the code, so you can see how the transformation reaches that point, or by running the code increasing a line at a time from the start of the code.
It is important that you, and your colleagues, understand the code.
There are some opportunities for some rationalisation. If you can work those out, it means you understand the code. Probably.
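To check the expected order for the test data before running the job, the SORT key can be modelled in Python. This is only a sketch under assumptions: a single blank separates the two fields, the data is upper-case, and plain Python string comparison stands in for the EBCDIC collating sequence (the two agree for blanks and upper-case letters). The function name sort_key is hypothetical.

```python
FIXLEN = 30  # assumed maximum field length, as in the PARSE

def sort_key(line: str):
    """Model of the SORT key: outsort flag, name length descending, name, 2nd field."""
    name, _, rest = line.partition(" ")
    name = name[:FIXLEN].ljust(FIXLEN)
    rest = rest[:FIXLEN].ljust(FIXLEN)
    # A blank flag sorts before "N", so records with a vowel come first.
    flag = " " if any(v in rest for v in "AEIOU") else "N"
    length = len(name.rstrip())
    return (flag, -length, name, rest)

records = [
    "MARK AAAAAAA",
    "AMY BBBBBB",
    "PAULA CCCCCCCCCCC",
    "PAULA BDDDDDDDDDD",
    "IK JJJJJJJJJJO",
]

for line in sorted(records, key=sort_key):
    print(line)
```

With this key the two records containing a vowel (MARK, then IK) come first, followed by the two PAULA records in order of their second field, then AMY.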

Related

Understand the following control cards

I have the following control cards that I can't work out how to read. Could someone help me translate what this part of a JOB is doing?
OUTFIL FNAMES=(XSCB),BLKCCT1,INCLUDE=(67,7,CH,EQ,
C'XSCB ',OR,69,7,CH,EQ,
C'XSCB '),
HEADER2=(22:C'XSCB MVS USERID SYSTEM USAGE REPORT',/,
01:C'GENERATED ON ',&DATE=(MD4/),70:C'PAGE',&PAGE,/,
01:C' AT ',&TIME,/,X,/,
01:C'JULIAN',/,
01:C'DATE TIME SYSTEM JOB MESSAGE',/,
01:C'-------- -------- ------ -------- ---------------->'),
TRAILER1=(X,/,01:C'RECORDS FOUND =',COUNT,/,34:C'END OF REPORT'),
OUTREC=(20,07,ZD,EDIT=(TTTT.TTT),X, * JULIAN DATE
28,08,X, * TIME
11,06,X, * SYSTEM
40,08,X, * JOB OR REF
59,07,CHANGE=(50,C'IEF125I',C'LOGGED ON ', * MESSAGE
C'IEF126I',C'LOGGED OFF'),
NOMATCH=(79,50),
132:X)
I understand that it searches the ID 'XSCB' in the position 67 or 69. But once it finds it, I cannot interpret what it does next.

Those are SORT control cards. If you look at the SYSOUT for the step, and pay attention to the messages, you will be able to tell if it is DFSORT (messages prefixed by ICE) or SyncSORT (messages prefixed by WER).
Your step may be EXEC PGM=SORT or ICEMAN or something else, depends on your site.
The control cards are producing a report. You have at least one line missing from your control cards (OPTION COPY, or SORT FIELDS=COPY or a different SORT or MERGE statement). There could be any number of missing cards, and you possibly have another output from the step. Otherwise the OUTFIL INCLUDE= could perhaps be a plain INCLUDE COND=.
What does what you have shown actually do?
OUTFIL defines final processing for a particular output data set. With no name, it would be for the SORTOUT DD in your JCL.
With FNAMES=(XSCB) it is for a DD named XSCB in your JCL. For a single name specified in FNAMES, the brackets are redundant.
BLKCCT1 says "put a blank in column one to not get a page-eject from TRAILER1 output".
The INCLUDE= is as you suspect. Testing two different starting positions for the same value. If either test is true, the current record will be included in the OUTFIL group.
HEADER2 defines what appears at the top of each page.
The 01: is a column-number, and is redundant, as each line by default starts at column one.
HEADER2 can create multiple lines (as can any HEADERn or TRAILERn and BUILD (or OUTREC, but don't use it for new) on OUTFIL), each separated by "/". &DATE, &TIME and &PAGE are special, containing the obvious. &DATE can be formatted in various ways, MD4/ is MM, DD, YYYY separated by slashes.
The X is a blank, on a line of its own. You could equally see .../,/... or n/ to create n multiple blank lines.
The constants should be obvious.
TRAILER1 defines what is printed at the end of the report.
COUNT is the number of records in the OUTFIL group, here used with no formatting, but it can be formatted.
The 34: column-number means the items following will start from column 34.
The OUTREC is better spelled as BUILD. OUTREC exists elsewhere. BUILD has been around for more than 10 years, so no need to use OUTREC on OUTFIL in new code (maybe this is old anyway).
What the BUILD would do is format the current input record into what is desired for an output line on the report.
The numbers in pairs are start-position and length of fields. Where no field-type is defined, they are (treated as) character fields.
You have one field-type, ZD, which is zoned-decimal. Its length is seven, and an EDIT mask is used, four digits, full-stop (decimal-point) and then three digits.
The Xs, as previously, are blanks, used as separators on the report. The content of each field is described in a comment. A comment is any text after the end of a control card. A control card ends at the blank after the statement is complete, or where there is a blank after a possible continuation (a comma or a colon are possible continuations).
132:X puts a blank in column 132, and pads any intervening columns from the last field or constant with blanks.
That leaves the CHANGE=.
CHANGE= is a very useful test-and-replace.
59,07,CHANGE=(50,C'IEF125I',C'LOGGED ON ', * MESSAGE
C'IEF126I',C'LOGGED OFF'),
NOMATCH=(79,50)
This says "at the current column of the record being created, consider the content of the input from position 59 for a length of 7. The output length will be 50. If IEF125I, then use the constant LOGGED ON, if IEF126I use LOGGED OFF, and else (NOMATCH) use whatever is at position 79 for a length of 50 from the input".
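The test-and-replace can be pictured as a table lookup with a fallback. The Python below is a model only, not DFSORT. It takes the 59,07 input field and the 79,50 NOMATCH field from the control cards in the question, and the function names are hypothetical.

```python
# Lookup table taken from the CHANGE= operands in the question.
TABLE = {"IEF125I": "LOGGED ON", "IEF126I": "LOGGED OFF"}

def change(key: str, table: dict, nomatch: str, out_len: int) -> str:
    """Return the replacement for key, else the NOMATCH value, at a fixed length."""
    return table.get(key, nomatch)[:out_len].ljust(out_len)

def message_column(record: str) -> str:
    key = record[58:65]         # positions 59 to 65, i.e. 59,07
    fallback = record[78:128]   # positions 79 to 128, i.e. 79,50
    return change(key, TABLE, fallback, 50)

# A made-up input record, for illustration only.
rec = " " * 58 + "IEF125I" + " " * 13 + "RAW MESSAGE TEXT"
print(repr(message_column(rec)))
```

Whichever branch is taken, the output is always 50 bytes, so the columns that follow on the report line stay aligned.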
Basically, the report is using the system log, or an extract from it, to report activity related to the Userid/Logon XSCB.

FINDREP a short string with longer without overwriting next column

So I have a set of data such as this:
mxyzd1 0000015000
mxyzd2 0000016000
xyzmd5823 0000017000
I need to use dfsort to get this data:
123xyzd1 0000015000
123xyzd2 0000016000
xyz123d5820000017000
So what I mean is: replace all character 'm' by '123' without overwriting the second column, so truncate data before you get to the second column (which starts at pos 11).
So far I've been able to replace the data but can't prevent all of my data of getting shifted, this is my code so far:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
OUTREC FINDREP=(IN=C'm',OUT=C'123',STARTPOS=1,ENDPOS=10,
MAXLEN=20,OVERRUN=TRUNC,SHIFT=YES)
DATAEND
*

The problem you are facing is that all data on a record will be shifted to the right if the FINDREP change increases the length, and to the left if the FINDREP change decreases the length. Any change in the length of the changed data affects the entire record. You have discovered this yourself.
To put that another way, FINDREP does not know about fields (a better term than columns); it only knows about records. Even when it is looking at only a portion of the record, changes in length affect the rest of the record.
There is no way to write just a FINDREP to avoid this.
OPTION COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(21:1,10)),
IFTHEN=(WHEN=INIT,
FINDREP=(IN=C'm',
OUT=C'123',
STARTPOS=21)),
IFTHEN=(WHEN=INIT,
BUILD=(21,10,
11,10))
This will put the data from 1,10 into a temporary extension to the record. It will do the FINDREP on the temporary extension only. Then it will take the first 10 bytes of the extension and put them into position one for a length of 10.
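The same three steps, modelled in Python to show why the second field stays put. This is a sketch, not DFSORT: it assumes 20-byte records with two 10-byte fields, as in the question, and the function name is hypothetical.

```python
def replace_in_first_field(record: str, find: str, repl: str) -> str:
    """Model of OVERLAY to an extension, FINDREP on the extension, then BUILD."""
    first, second = record[:10], record[10:20]
    extension = first                          # OVERLAY=(21:1,10)
    extension = extension.replace(find, repl)  # FINDREP with STARTPOS=21
    # BUILD=(21,10,11,10): first 10 bytes of the extension, then the untouched field.
    return extension[:10].ljust(10) + second

for rec in ["mxyzd1    0000015000",
            "mxyzd2    0000016000",
            "xyzmd5823 0000017000"]:
    print(replace_in_first_field(rec, "m", "123"))
```

Because the replacement only ever touches the copy, any growth is cut off at 10 bytes and the original second field is copied from its original position.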

Just make one small change in your sort card - SHIFT=NO

Adding a new column in SORT

My input data is like this:
trainnumber name station price coach seats
16001 CHN-CENTRAL PALANI 400.00 AC 02
16002 PALANI CHN-CENTRAL 410.00 ORD 76
16003 CHN-CENTRAL NAGARKOIL 425.00 AC 30
16004 NAGARKOIL CHN-CENTRAL 439.00 SLP 37
16005 THANJAVUR CHN-EGMORE 395.00 ORD 60
16006 CHN-EGMORE THANJAVUR 375.00 SLP 10
I want to add a new column before train number containing a four-digit sequence number followed by a blank and add 1 to my train number.
How to do this?

You have:
SORT FIELDS=COPY
OUTREC FIELDS=(1:SEQNUM,4,ZD,X,6:1,5,ZD,ADD,+1,EDIT=(TTTTT),
X,12:7,69)
Simplified:
OPTION COPY
INREC BUILD=(SEQNUM,4,ZD,
X,
1,5,ZD,
ADD,+1,
EDIT=(TTTTT),
X,
7,69)
OUTREC runs after a SORT/MERGE. INREC runs before a SORT/MERGE. Since you're not doing a SORT or MERGE (you're doing a COPY) it doesn't matter, but INREC is the more logical choice.
FIELDS is overloaded (consult the documentation to confirm). Since the introduction of BUILD, FIELDS is not needed on INREC or OUTREC (and OUTREC is not needed on OUTFIL), because BUILD does the same job with no possible confusion. BUILD is a synonym for FIELDS on INREC and OUTREC, and for OUTREC on OUTFIL, which is already complicated without considering FIELDS on SUM, REFORMAT...
Don't specify column positions (like 1:) if the positions are simply the natural arrangement. You are just building in maintenance.
The default start-point for a BUILD (or even the ugly FIELDS) is 1:. The default for the next field is immediately after the current field. You've used X for the spacing of your columns, so all data abuts the previous data. Using columns just complicates it.
Note: you have X,7,69. You could consider changing that to just 6,69, because position six is blank on your input.
Note: you are "losing" six bytes of your 80-byte record. If your input has, guaranteed, twelve trailing blanks (or other data that you do not require, ie any program using the file doesn't care about that loss) then that's OK, but we can't tell from your description.
Try to make your SORT Control Cards easier to read (try to make everything easier to read). It will save time and reduce errors. Which means cheaper. Time is money.
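To see what the simplified BUILD produces, here is a small Python model of it (not DFSORT). It assumes 80-byte input records with a five-digit train number in positions 1 to 5, and it ignores overflow past five digits. The function name is hypothetical.

```python
def add_sequence(records):
    """Model of BUILD=(SEQNUM,4,ZD,X,1,5,ZD,ADD,+1,EDIT=(TTTTT),X,7,69)."""
    out = []
    for seq, rec in enumerate(records, start=1):
        rec = rec.ljust(80)             # treat input as fixed-length, 80 bytes
        train = int(rec[0:5]) + 1       # 1,5,ZD,ADD,+1
        # Four-digit sequence, blank, five-digit result, blank, then 7,69.
        out.append(f"{seq:04d} {train:05d} {rec[6:75]}")
    return out

data = [
    "16001 CHN-CENTRAL PALANI 400.00 AC 02",
    "16002 PALANI CHN-CENTRAL 410.00 ORD 76",
]
for line in add_sequence(data):
    print(line.rstrip())
```

The output record is still 80 bytes (4 + 1 + 5 + 1 + 69), which is why input positions 76 to 80 no longer fit.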
Assuming that you do mean with SORT, and your "column" isn't for DB2....
For fixed-length records:
OPTION COPY
INREC BUILD=(5X,1,your-lrecl)
The BUILD will cause a new current record, replacing the original, to be created. It will start with five blanks (the 5X) but you can put there whatever you like of whatever size (within the limits of the product, which are large). Change your-lrecl to the actual LRECL value.
For variable-length records:
OPTION COPY
INREC BUILD=(1,4,5X,5)
The 1,4 is the Record Descriptor Word, and it is always necessary to copy an RDW when creating a new current record. Once it is copied, SORT will ensure that the value contained in the first two bytes (the record-length) is correct. Then the new column, again five blanks in the example, then the rest of the variable-length record, which is specified simply by using a start-position (five here, to get the first byte of data) and implicitly this continues to the end of the record.
In your actual JCL (none of the above is JCL, it is SORT Control Cards), ensure that you do not specify any DCB info for SORTOUT. This means you can't use LIKE for that DD, remember that adding data makes the new LRECL different. Don't code the new LRECL in the JCL either. With it not specified, SORT will insert the correct value, and there is only one place to maintain it.

I tried like this and I did it.
SORT FIELDS=COPY
OUTREC FIELDS=(1:SEQNUM,4,ZD,X,6:1,5,ZD,ADD,+1,EDIT=(TTTTT),
X,12:7,69)
My file is 80 record length.

Sync sort, Unpaired records of File1 have spaces for no records in F2 file. Can we replace those specific column's spaces by ZEROS?

SORT:
JOINKEYS FILES=F1,FIELDS=(5,4,A,10,20,A)
JOINKEYS FILES=F2,FIELDS=(1,4,A,6,20,A)
REFORMAT FIELDS=(F1:10,20,9,1,5,4,30,1,31,10,F2:27,10)
JOIN UNPAIRED,F1
INREC BUILD=(1,36,C',',37,10,C',',27,10,SFF,SUB,37,10,SFF,
EDIT=(TTTTTT))
OUTPUT IS: *2nd row 4th column is spaces as unpaired from 2nd file, needs to be 0s automatically.
22680372 ,5102, 1, 1,000000
22222222 ,5105, 2, ,000002
OUTPUT should be: *2nd row 4th column is 0 or 0000s as unpaired from 2nd file, needs to be 0s automatically.
22680372 ,5102, 1, 1,000000
22222222 ,5105, 2, 0,000002

You need a condition, which means IFTHEN. You can't have IFTHEN and BUILD on the same INREC, but you can have multiple IFTHENs and BUILD can be part of an IFTHEN.
IFTHEN=(WHEN=INIT indicates something which should be done for every record (unconditional).
IFTHEN=(WHEN=(logical-expression will only be actioned if the condition is true.
Every BUILD statement makes a complete new intermediate record (intermediate between input and output). OVERLAY only affects the data at the position specified (assuming no extension of the record).
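Your condition will be that the last byte of the field from F2 is a space. That is the 46th byte of the REFORMAT record, which becomes the 47th byte once the BUILD has inserted the first comma. You have already used SFF (did you try the other suggestions, especially FS?), so there is no need to make the value zero before the BUILD.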
JOINKEYS FILES=F1,FIELDS=(5,4,A,10,20,A)
JOINKEYS FILES=F2,FIELDS=(1,4,A,6,20,A)
REFORMAT FIELDS=(F1:10,20,9,1,5,4,30,1,31,10,F2:27,10)
JOIN UNPAIRED,F1
INREC IFTHEN=(WHEN=INIT,
BUILD=(1,36,
C',',
37,10,
C',',
27,10,SFF,
SUB,
37,10,SFF,
EDIT=(TTTTTT))),
IFTHEN=(WHEN=(47,1,CH,EQ,C' '),
OVERLAY=(47:C'0'))
I don't format the statements like that just for fun, but to make them easier to understand and maintain.
OK, that solution was a little clunky. You can replace the INREC with this, which shows, for this type of data, an alternative to the EDIT:
INREC IFTHEN=(WHEN=INIT,
BUILD=(1,36,
C',',
37,10,FS,TO=FS,LENGTH=10,
C',',
27,10,FS,
SUB,
37,10,FS,
TO=FS,LENGTH=8))
This is much more natural, as the space gets turned into a zero with leading blanks with no conditions at all, and using references only to that field in its position on the REFORMAT record.
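The effect described here can be modelled in Python. It is a sketch that relies on the behaviour stated above, namely that an all-blank FS field is treated as zero, and on TO=FS output being right-justified with leading blanks. The function names are hypothetical.

```python
def fs(field: str) -> int:
    """Read a numeric field with leading blanks; all blanks counts as zero."""
    text = field.strip()
    return int(text) if text else 0

def to_fs(value: int, length: int) -> str:
    """Write a number right-justified with leading blanks (TO=FS,LENGTH=n)."""
    return str(value).rjust(length)

f1_value = "         2"   # 27,10 from the paired side (F1)
f2_value = " " * 10       # 37,10 is all blanks when F2 had no match

print(repr(to_fs(fs(f2_value), 10)))
print(repr(to_fs(fs(f1_value) - fs(f2_value), 8)))
```

No condition is needed: the blank field becomes a single zero preceded by blanks simply by being read and written as a number.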

Append record with binary value

I have a file with FB length=80. I want to append fixed value numeric 1 at position 81, if value at position 80='Y'
This appended value is supposed to be S9(9) BINARY when viewed from a copybook.
The appended field will be used in SUM FIELDS in a separate step.
How do I code the SORT SYSIN card ?

OPTION COPY
INREC IFTHEN=(WHEN=(80,1,CH,EQ,C'Y'),OVERLAY=(81:+1,TO=BI,LENGTH=4))
There is no need for this to be separate from your step with SUM in. Obviously you'd not use the OPTION COPY.
If you are SUMming records other than Y in Col 80, you'll need an IFTHEN=(WHEN=INIT to set everything to zero first.
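For reference, S9(9) BINARY occupies four bytes, so the appended field is a fullword. The Python below models the resulting record layout only. It is not DFSORT, it uses an ASCII "Y" for illustration where the real file would hold EBCDIC, and it sets the field to zero for every other record, which is the case the WHEN=INIT suggestion covers. The function name is hypothetical.

```python
import struct

def append_counter(record: bytes) -> bytes:
    """Append a 4-byte big-endian binary 1 if byte 80 is 'Y', else binary zero."""
    if len(record) != 80:
        raise ValueError("expected an 80-byte fixed-length record")
    value = 1 if record[79:80] == b"Y" else 0
    return record + struct.pack(">i", value)   # four bytes, like S9(9) BINARY

yes = b"X" * 79 + b"Y"
no = b"X" * 79 + b"N"
print(append_counter(yes)[80:].hex(), append_counter(no)[80:].hex())
```

Every output record is 84 bytes, so a later SUM can refer to the field at position 81 for a length of 4.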
Since this is a Mainframe task, you'd have got an earlier response if you'd used that Tag.
