How can we use SUM and MAX at the same time when grouping data? - mainframe

How can we sum the Amount and get the maximum Date at the same time when grouping by Id in JCL?
Input:
Id Amount Date
--------------------
123 200 20180516
123 300 20180520
456 100 20180616
456 700 20180420
Expected result:
Id Amount Date
--------------------
123 500 20180520
456 800 20180616
What I have tried already:
//SORTST5 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SORTIN DD DSN=&VNTMP..WRK.INWORK,DISP=SHR
//SORTOUT DD DSN=&VNTMP..WRK.OUTWORK.OUT.FRM,
// DISP=(,CATLG,DELETE),
// RECFM=FB,LRECL=84,SPACE=(CYL,(100,100),RLSE)
//SYSIN DD *
SORT FIELDS=(1,3,PD,A)
SUM FIELDS=(4,3,PD)

JCL itself isn't executable, and you can't manipulate data in JCL without utilities like SORT.
I've used ICETOOL (a DFSORT utility) in JCL to achieve your expected result.
The first control statement (CTL1) sorts the input by Id (ascending) and Date (descending), so within each Id the record with the latest date comes first. The second (CTL2) sorts by Id and sums the Amount field; with EQUALS in effect, the first record of each Id group is the one retained, so it keeps that maximum date.
//STEP1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//TOOLIN DD *
SORT FROM(INDD) TO(OUTDD1) USING(CTL1)
SORT FROM(OUTDD1) TO(OUTDD2) USING(CTL2)
//INDD DD *
123 200 20180516
123 300 20180520
456 700 20180420
456 100 20180616
//OUTDD1 DD DSN=XXX.ICETOOL.OUTDD1,
// DISP=(,CATLG,DELETE),
// SPACE=(CYL,(100,0),RLSE),
// DCB=(LRECL=80,RECFM=FB,BLKSIZE=0)
//OUTDD2 DD DSN=XXX.ICETOOL.OUTDD2,
// DISP=(,CATLG,DELETE),
// SPACE=(CYL,(100,0),RLSE),
// DCB=(LRECL=80,RECFM=FB,BLKSIZE=0)
//SSMSG DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//CTL1CNTL DD *
SORT FIELDS=(1,3,ZD,A,9,8,ZD,D)
/*
//CTL2CNTL DD *
OPTION EQUALS
SORT FIELDS=(1,3,ZD,A)
SUM FIELDS=(5,3,ZD)
/*
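Provided your DFSORT defaults print summed ZD values as digits (the ZDPRINT option), OUTDD2 should then hold one record per Id with the summed Amount and the latest Date, i.e. the expected result from the question:
123 500 20180520
456 800 20180616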

Related

Using SORT in JCL, keep only the maximum count after a group-by count of records

I have a problem in which I need the maximum count after performing a group-by count of records on positions 1 to 5, in a single step using SORT in JCL.
I tried the JCL below and was able to get the group-by count of records, but now only the record with the maximum count is required. I tried MAX, but the job abends on every attempt.
Sample Data:
AAAAA 234
AAAAA 124
BBBBB 324
BBBBB 546
AAAAA 325
CCCCC 754
BBBBB 546
BBBBB 346
CCCCC 765
SORT control statements used in the JCL step:
SORT FIELDS=(1,5,CH,A)
OUTFIL REMOVECC,NODETAIL,
SECTIONS=(1,5,TRAILER3=(1,5,X,COUNT=(M10,LENGTH=4)))
Current output:
AAAAA    3
BBBBB    4
CCCCC    2
Required output:
BBBBB    4
Could anyone please help and suggest how I can use MAX along with SECTIONS, TRAILER and COUNT in a single step?
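One workable approach (a sketch, not the single-step solution asked for): keep the counting step above as a first pass, then run its output through a second SORT step that orders the trailer records by the count field descending and keeps only the first one. The statements below assume the layout produced above, i.e. the key in columns 1-5 and the 4-byte count in columns 7-10; UFF is used so the leading blanks left by the M10 mask don't matter (if your DFSORT level lacks UFF, switch the first pass to a zero-filled M11 count and use ZD here):
SORT FIELDS=(7,4,UFF,D)
OUTFIL ENDREC=1
With the sample data this should leave just the BBBBB trailer record (count 4).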

linux/unix convert delimited file to fixed width

I have a requirement to convert a delimited file to a fixed-width file; the details are as follows.
Input file sample:
AAA|BBB|C|1234|56
AA1|BB2|DD|12345|890
Output file sample:
AAA  BBB   C   1234  56
AA1  BB2   DD  12345 890
Details of field positions
Field 1 Start at position 1 and length should be 5
Field 2 start at position 6 and length should be 6
Field 3 Start at position 12 and length should be 4
Field 4 Start at position 16 and length should be 6
Field 5 Start at position 22 and length should be 3
Another awk solution:
echo -e "AAA|BBB|C|1234|56\nAA1|BB2|DD|12345|890" |
awk -F '|' '{printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5}'
Note the - in the format specifiers (e.g. %-3s), which left-aligns the fields, as required in the question. Output:
AAA  BBB   C   1234  56
AA1  BB2   DD  12345 890
With the following awk command you can achieve your goal:
awk 'BEGIN { RS=" "; FS="|" } { printf "%5s%6s%4s%6s%3s\n",$1,$2,$3,$4,$5 }' your_input_file
Your record separator (RS) is a space and your field separator (FS) is a pipe (|) character. In order to parse your data correctly we set them in the BEGIN statement (before any data is read). Then using printf and the desired format characters we output the data in the desired format.
Output:
  AAA   BBB   C  1234 56
  AA1   BB2  DD 12345890
Update:
I just saw your edits on the input file format (previously it seemed different). If your input records are separated by newlines, then simply remove the RS=" "; part from the above one-liner and apply the - modifiers to the format characters to left-align your fields:
awk 'BEGIN { FS="|" } { printf "%-5s%-6s%-4s%-6s%-3s\n",$1,$2,$3,$4,$5 }' your_input_file
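One caveat worth noting (an aside, relevant only if a field can ever be longer than its slot): a plain %-5s pads but does not truncate, so an over-long field would shift everything after it. Adding a precision, e.g. %-5.5s, both pads and truncates to the fixed width:
awk -F '|' '{printf "%-5.5s%-6.6s%-4.4s%-6.6s%-3.3s\n",$1,$2,$3,$4,$5}' your_input_file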

Excel: search for a value and sum the quantities next to it in a column

So I need a bit of help. I have a huge Excel sheet with materials:
Material Quantity
AA 10
BB 11
CC 52
DD 60
AA 16
DD 10
FF 20
QQ 400
RR 25
TT 80
AA 10
AA 122
FF 11
FF 12
GG 1
TT 15
What I would like to be able to do is select a value from the 1st column, say AA, and automatically get the SUM of all the Quantity values for AA (in this example 10 + 16 + 10 + 122 = 158).
Although a SUMIF formula does work and might be placed wherever convenient (I have shown @Scott Craner's in D15), a PivotTable (as suggested by @Zahiro Mor) might be a lot more useful, not only because it can sum all Materials at one time but also for further analysis.
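For reference, a minimal SUMIF along those lines, assuming Material is in column A and Quantity in column B with the data in rows 2-17 (the references are only illustrative, not necessarily what was shown in D15):
=SUMIF($A$2:$A$17,"AA",$B$2:$B$17)
Pointing the second argument at a cell instead of "AA" (e.g. =SUMIF($A$2:$A$17,F1,$B$2:$B$17)) makes the looked-up material selectable.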

NodeJS Copying File over a stream is very slow

I am copying a file with Node on an SSD under VMware, but the performance is very low. The benchmark I ran to measure the actual disk speed is as follows:
$ hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 12004 MB in 1.99 seconds = 6025.64 MB/sec
Timing buffered disk reads: 1370 MB in 3.00 seconds = 456.29 MB/sec
However, the following Node code that copies the file is very slow; even subsequent runs do not make it faster:
var fs = require("fs");
fs.createReadStream("bigfile").pipe(fs.createWriteStream("tempbigfile"));
And I ran it as follows:
$ seq 1 10000000 > bigfile
$ ll bigfile -h
-rw-rw-r-- 1 mustafa mustafa 848M Jun 3 03:30 bigfile
$ time node test.js
real 0m4.973s
user 0m2.621s
sys 0m7.236s
$ time node test.js
real 0m5.370s
user 0m2.496s
sys 0m7.190s
What is the issue here and how can I speed it up? I believe I could write it faster in C just by adjusting the buffer size. The thing that confuses me is that when I wrote a simple, almost pv-equivalent program that pipes stdin to stdout, as below, it is very fast.
process.stdin.pipe(process.stdout);
And the runs were as follows:
$ dd if=/dev/zero bs=8M count=128 | pv | dd of=/dev/null
128+0 records in 174MB/s] [ <=> ]
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.78077 s, 186 MB/s
1GB 0:00:05 [ 177MB/s] [ <=> ]
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.78131 s, 186 MB/s
$ dd if=/dev/zero bs=8M count=128 | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.57005 s, 193 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.5704 s, 193 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.61734 s, 233 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.62766 s, 232 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.22107 s, 254 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.23231 s, 254 MB/s
$ dd if=/dev/zero bs=8M count=128 | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.70124 s, 188 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.70144 s, 188 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.51055 s, 238 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.52087 s, 238 MB/s
I don't know the answer to your question, but perhaps this helps in your investigation of the problem.
In the Node.js documentation about stream buffering, it says:
Both Writable and Readable streams will store data in an internal
buffer that can be retrieved using writable.writableBuffer or
readable.readableBuffer, respectively.
The amount of data potentially buffered depends on the highWaterMark
option passed into the stream's constructor. For normal streams, the
highWaterMark option specifies a total number of bytes. For streams
operating in object mode, the highWaterMark specifies a total number
of objects....
A key goal of the stream API, particularly the stream.pipe() method,
is to limit the buffering of data to acceptable levels such that
sources and destinations of differing speeds will not overwhelm the
available memory.
So, you can play with the buffer sizes to improve speed:
var fs = require('fs');
var path = require('path');
var from = path.normalize(process.argv[2]);
var to = path.normalize(process.argv[3]);
var readOpts = {highWaterMark: Math.pow(2,16)}; // 65536
var writeOpts = {highWaterMark: Math.pow(2,16)}; // 65536
var source = fs.createReadStream(from, readOpts);
var destiny = fs.createWriteStream(to, writeOpts)
source.pipe(destiny);
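The script reads the source and destination paths from the command line, so a run against the question's test file would look something like this (assuming the snippet is saved as copy.js; the file names are just the ones from the question):
$ node copy.js bigfile tempbigfile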

How to keep only those rows which are unique in a tab-delimited file in unix

Here, two rows are considered redundant if the second value is the same.
Is there any unix/linux command that can achieve the following?
1 aa
2 aa
1 ss
3 dd
4 dd
Result
1 aa
1 ss
3 dd
I generally use the following command, but it does not achieve what I want here.
sort -k2 /Users/fahim/Desktop/delnow2.csv | uniq
Edit:
My file had roughly 25 million lines:
Time when using the solution suggested by @Steve: 33 seconds.
$ date; awk -F '\t' '!a[$2]++' myfile.txt > outfile.txt; date
Wed Nov 27 18:00:16 EST 2013
Wed Nov 27 18:00:49 EST 2013
The sort and uniq approach was taking too much time; I quit after waiting for 5 minutes.
Perhaps this is what you're looking for:
awk -F "\t" '!a[$2]++' file
Results:
1 aa
1 ss
3 dd
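In case the condensed !a[$2]++ is unclear: a[] counts how many times each value of the second field has been seen, and the line is printed only on its first occurrence. The same thing written out longhand (same tab field separator):
awk -F "\t" '{ if (a[$2]++ == 0) print $0 }' file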
I understand that you want the file sorted by the second field with duplicates removed.
You need to add -u to sort to achieve this.
sort -u -k2 /Users/fahim/Desktop/delnow2.csv
