Add numbers from several input files into one file using JCL - mainframe

I have n files, each of which contains a particular count.
I now want to add all those count values together into another file.
It will be something like
//SORTIN  DD file1
//        DD file2
...
//        DD filen
//SORTOUT DD outputfile
I need a way to do the same using only JCL.

Regarding your "using only JCL" requirement, please read the explanation of what JCL is (and is not) in the next thread below.
If your files are GDGs, then in your SORTIN just specify the GDG base name without a generation number; the system will automatically concatenate the generations for you.
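For the summing itself, one SORT step over the concatenated inputs can do it. Here is a minimal sketch, assuming each input record carries its count as an 8-byte zoned-decimal field in columns 1-8; the dataset names, field position, and length are placeholders to adjust:
//SUMSTEP  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.FILE1,DISP=SHR
//         DD DSN=MY.FILE2,DISP=SHR
//         DD DSN=MY.FILEN,DISP=SHR
//SORTOUT  DD DSN=MY.TOTAL,DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(1,1)),RECFM=FB,LRECL=80
//SYSIN    DD *
  OPTION COPY
* SUPPRESS THE DATA RECORDS; WRITE ONE TRAILER WITH THE GRAND TOTAL
  OUTFIL REMOVECC,NODETAIL,
    TRAILER1=(TOT=(1,8,ZD,TO=ZD,LENGTH=8))
/*
With GDGs as above, the concatenation collapses to a single SORTIN DD naming the base.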

Related

Find the line number of a particular record using JCL

I have an input PS file like
aaa1111zzzz
bbb2222bbbb
ccc3333bbbb
ddd3333cccc
eee7777yyyy
I want to know the line numbers in this input PS file where the string '3333' is present. The output should be something like
3
4
in this scenario.
I am looking for JCL that can do this; I have searched the net, but no luck.
You seem to misunderstand what JCL is.
JCL is not executable; it does not look at data, and it does not manipulate data.
JCL is like a memo from you to the operating system, requesting the OS to run some programs.
When the JCL is read, the OS sets up whatever it requires to do the tasks defined in the JCL and then DISCARDS the JCL, i.e. writes it to the output spool.
The OS then runs the program(s) in accordance with the information it has extracted.
Now, for your task, cschneid has pointed you to one solution (SUPERC). You will have to manipulate the report from SUPERC to get it into the form you posted; a sketch follows.
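For reference, the SUPERC route runs the batch search-for utility, ISRSUPC. A minimal sketch (the dataset name is a placeholder; the DD names are the ones ISRSUPC expects):
//SRCH    EXEC PGM=ISRSUPC,PARM=(SRCHCMP,'ANYC')
//NEWDD   DD DSN=MY.INPUT.PS,DISP=SHR
//OUTDD   DD SYSOUT=*
//SYSIN   DD *
  SRCHFOR '3333'
/*
The report written to OUTDD shows the line number of each hit, which is why you would still have to reformat it into the bare list you posted.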
Or, you can use your sort product as follows:
When reading the records in, you ask sort to assign a sequence number to each record. When writing the records out, you ask sort to select only the records containing the value you are looking for, and to write out only the sequence number for those records.
Following NicC's answer, I've tried using SORT to attain the expected result:
//SORT EXEC PGM=SYNCSORT
//SORTIN DD *
aaa1111zzzz
bbb2222bbbb
ccc3333bbbb
ddd3333cccc
eee7777yyyy
//SORTOUT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSIN DD *
* PREFIX EACH RECORD WITH A 3-DIGIT SEQUENCE NUMBER IN COLUMNS 1-3
  INREC FIELDS=(1:SEQNUM,3,ZD,4:1,11)
  SORT FIELDS=COPY
* '3333' IS NOW IN COLUMNS 7-10; KEEP THOSE RECORDS, WRITE ONLY THE SEQNUM
  OUTFIL FNAMES=SORTOUT,INCLUDE=(7,4,ZD,EQ,3333),OUTREC=(1:1,3)
/*
It's a different approach from what most people take, but don't forget you can actually use z/OS UNIX System Services utilities for tasks like this. In your example, "grep -nr pattern file" would find all the lines matching "pattern" and show you their line numbers.
Of course, the trick is getting the "file" part right when your data is in a conventional dataset... sometimes the easiest is something like this:
cat "//'my.dataset.name'" | grep -nr pattern
To run this in JCL, you'd put the above command as input into BPXBATCH using JCL like this:
//jobname JOB ...
// EXEC PGM=BPXBATCH
//STDERR DD PATH='/tmp/mystd.err',
// PATHOPTS=(OWRONLY,OCREAT,OTRUNC),PATHMODE=SIRWXU
//STDOUT DD PATH='/tmp/mystd.out',
// PATHOPTS=(OWRONLY,OCREAT,OTRUNC),PATHMODE=SIRWXU
//STDPARM DD *
SH cat "//'my.dataset.name'" | grep -nr pattern
/*
If you want the STDOUT/STDERR somewhere else (say, SYSOUT), just change the STDERR/STDOUT DD statements.
Using UNIX System Services this way is a very cool thing if you're already familiar with UNIX/Linux shell commands; read the details here: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxa400/toc.htm

How to append data in a PS file?

I have the following lines in a PS file:
version
!cd "//''"
Another file contains the following line:
remove
I want to append the data from the other file in between the slashes of the above line. After appending the characters (remove) from the other file, the PS file should contain the following data:
version
!cd "/remove/''"
I tried using OUTREC, but it didn't work; the 'version' line also got changed:
SORT FIELDS=COPY
OUTREC FIELDS=(1:C'!cd "/',
6:1,6,
12:C''"')
Your FIELDS (which is better as BUILD) should be conditional to avoid processing every line.
You've not mentioned how you are getting the data from the other file.
Best would be two steps. Step 1 creates a "symbol file" (referenced via the SYMNAMES DD when it is used in the second step) that takes the data from your second file and gives it a name.
Then, in the second step, use conditional processing (IFTHEN=(WHEN=(logicalexpression))) to insert the value of the symbol, as sketched below.
I'm assuming your second file can sometimes contain different values? If not, why not just generate the whole thing? Or use your editor?
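A sketch of those two steps, assuming the replacement text starts in column 1 of the second file and is 6 characters long (as 'remove' is), and that the target line starts with !cd; the dataset names and lengths are placeholders:
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.SECOND.FILE,DISP=SHR
//SORTOUT  DD DSN=&&SYMS,DISP=(NEW,PASS),
//            SPACE=(TRK,(1,1)),RECFM=FB,LRECL=80
//SYSIN    DD *
  OPTION COPY
* TURN THE DATA INTO A SYMBOL DEFINITION: INSERT,'remove'
  INREC BUILD=(C'INSERT,''',1,6,C'''',80:X)
/*
//STEP2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SYMNAMES DD DSN=&&SYMS,DISP=(OLD,DELETE)
//SORTIN   DD DSN=MY.PS.FILE,DISP=SHR
//SORTOUT  DD DSN=MY.PS.NEW,DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(1,1)),RECFM=FB,LRECL=80
//SYSIN    DD *
  OPTION COPY
* REBUILD ONLY THE !cd LINE; EVERYTHING ELSE IS COPIED UNCHANGED
  INREC IFTHEN=(WHEN=(1,4,CH,EQ,C'!cd '),
        BUILD=(C'!cd "/',INSERT,C'/''''"',80:X))
/*
The doubled single quotes inside the constants are how a literal quote is written in sort control statements; the BUILD in STEP2 reproduces the expected output !cd "/remove/''" exactly.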

How to edit 300 GB text file (genomics data)?

I have a 300 GB text file that contains genomics data with over 250k records. There are some records with bad data and our genomics program 'Popoolution' allows us to comment out the "bad" records with an asterisk. Our problem is that we cannot find a text editor that will load the data so that we can comment out the bad records. Any suggestions? We have both Windows and Linux boxes.
UPDATE: More information
The program Popoolution (https://code.google.com/p/popoolation/) crashes when it reaches a "bad" record giving us the line number that we can then comment out. Specifically, we get a message from Perl that says "F#€%& Scaffolding". The manual suggests we can just use an asterisk to comment out the bad line. Sadly, we will have to repeat this process many times...
One more thought... Is there an approach that would allow us to add the asterisk to the line without opening the entire text file at once? This could be very useful given that we will have to repeat the process an unknown number of times.
Based on your update:
One more thought... Is there an approach that would allow us to add
the asterisk to the line without opening the entire text file at once.
This could be very useful given that we will have to repeat the
process an unknown number of times.
Here you have an approach: if you know the line number, you can add an asterisk at the beginning of that line by saying:
sed 'LINE_NUMBER s/^/*/' file
See an example:
$ cat file
aa
bb
cc
dd
ee
$ sed '3 s/^/*/' file
aa
bb
*cc
dd
ee
If you add -i, the file will be updated in place:
$ sed -i '3 s/^/*/' file
$ cat file
aa
bb
*cc
dd
ee
Even so, I always think it's better to redirect to another file:
sed '3 s/^/*/' file > new_file
so that you keep your original file intact and save the updated one in new_file.
If you are required to have a person mark these records manually with a text editor, for whatever reason, you should probably use split to split the file up into manageable pieces.
split -a4 -d -l100000 hugefile.txt part.
This will split the file up into pieces with 100000 lines each. The names of the files will be part.0000, part.0001, etc. Then, after all the files have been edited, you can combine them back together with cat:
cat part.* > new_hugefile.txt
The simplest solution is to use a stream-oriented editor such as sed. All you need is to be able to write one or more regular expression(s) that will identify all (and only) the bad records. Since you haven't provided any details on how to identify the bad records, this is the only possible answer.
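For instance, if the bad records were identifiable by a regular expression, the whole 300 GB could be marked in one streaming pass. The pattern below is a placeholder; substitute one that matches your bad records:
sed '/BAD_RECORD_PATTERN/ s/^/*/' hugefile.txt > marked.txt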
A basic pattern in R is to read the data in chunks, edit, and write out:
fin = file("fin.txt", "r")
fout = file("fout.txt", "w")
while (length(txt <- readLines(fin, n=1000000))) {
    ## txt is now up to 1000000 lines; add an asterisk to problem lines
    ## bad = <create logical vector indicating bad lines here>
    ## txt[bad] = paste0("*", txt[bad])
    writeLines(txt, fout)
}
close(fin); close(fout)
While not ideal, this works on Windows (implied by the mention of Notepad++) and in a language you are presumably familiar with (R). Using sed (definitely the appropriate tool in the long run) would require installing additional software and coming up to speed with sed.

How to extract the first x-megabyte from a large file in unix/linux?

I have a large file of which I am only interested in the first couple of megabytes at the head.
How do I extract the first x megabytes from a large file in unix/linux and put them into a separate file?
(I know the split command can split files into many pieces, and using bash scripts I can erase the pieces I don't want. I would prefer an easier way.)
Head works with binary files, and the syntax is neater than dd's. The 2M size suffix assumes GNU head:
head -c 2M input.file > output.file
Tail works the same way if you want the end of a file.
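For instance, with GNU coreutils, the last two megabytes would be:
tail -c 2M input.file > output.file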
For example, using dd:
dd if=largefile count=6 bs=1M > largefile.6megsonly
The 1M spelling assumes GNU dd. Otherwise, you could do
dd if=largefile count=$((6*1024)) bs=1024 > largefile.6megsonly
This again assumes bash-style arithmetic evaluation.
On a Mac (Catalina) the head and tail commands don't seem to take modifiers like m (mega) and g (giga) in upper or lower case, but will take a large integer byte count like this one for 50 MB
head -c50000000 inputfile.txt > outputfile.txt
Try the dd command; you can use "man dd" to get the main ideas of it.

replacing fixed amount of text in a large file

I'm trying to replace a small amount of text on a specific line of a large log file (totaling ~40 million lines):
sed -i '20000000s/.\{5\}$/zzzzz/' log_file
The purpose of this is to "mark" a line with an expected unique string, for later testing.
The above command works fine, but in-place editing of sed (and perl) creates a temp file, which is costly.
Is there a way to replace a fixed number of characters (i.e. 5 chars with 5 other chars) in a file without having to create a temp file or a very large buffer, which would wind up becoming a temp file itself?
You could use dd to replace some bytes in place:
dd if=/dev/zero of=path/to/file bs=1 count=10 conv=notrunc seek=1000
would write 10 zeros (0x00) starting right after the first 1000 bytes of the file. Note that it is seek, which positions within the output file, that you want here; skip only discards bytes from the input. You can put whatever you want to write into a file and name its path in the if parameter, then set count to the size of that replacement file so the whole of it gets read.
The conv=notrunc parameter tells dd to leave the end of the file untruncated.
This should work well for any 1-byte file encoding.
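Applied to the question's task, here is a hedged sketch that assumes single-byte characters and Unix line endings: one awk pass computes the byte offset of the last 5 characters of line 20000000, then dd overwrites exactly those bytes in place.
# offset of the 5 characters at the end of line 20000000
offset=$(awk 'NR==20000000 { print bytes + length($0) - 5; exit }
              { bytes += length($0) + 1 }' log_file)
# overwrite them without truncating the rest of the file
printf 'zzzzz' | dd of=log_file bs=1 seek="$offset" conv=notrunc
With bs=1, dd is slow in general, but here it only writes 5 bytes; the awk pass over the file is the expensive part.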
ex is a scriptable file editor, so it will work in-place:
ex log_file << 'END_OF_COMMANDS'
20000000s/.\{5\}$/zzzzz/
w
q
END_OF_COMMANDS
