Concatenate records in columnar format in JCL - mainframe

I have 4 files which have different records:
File 1
12345
34567
12300
21345
File 2
89090
98765
45674
23421
File 3
21356
34560
43121
11223
File 4
98765
12345
12214
I want the output to be:
12345 | 89090 | 21356 | 98765
....
and so on
I am trying to add a sequence number to each file (for example, starting at 00 and incrementing by 25),
but I am unable to build it.
Please help me achieve this with a JCL sort card.
I have tried INREC with a sequence number, but was not able to proceed from there.

With files that size, it's easier to just edit the files directly and use your emulator's copy/paste function to combine the files.
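If you do want it in a sort card, the sequence-number idea works with JOINKEYS: give each file an ascending sequence number (the start and increment don't matter, as long as both files get the same ones) and join on it. Below is only a sketch for the first two files, assuming FB/LRECL=80 input with the number in columns 1-5 and made-up dataset names; repeat the step, feeding the previous output back in as F1, to splice in file 3 and then file 4 (adjusting the positions to the new record length).
//JOIN12   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=MY.FILE1,DISP=SHR
//SORTJNF2 DD DSN=MY.FILE2,DISP=SHR
//SORTOUT  DD DSN=MY.FILE12,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE)
//SYSIN    DD *
  SORT FIELDS=COPY
  JOINKEYS FILE=F1,FIELDS=(81,8,A)
  JOINKEYS FILE=F2,FIELDS=(81,8,A)
  JOIN UNPAIRED,F1,F2
  REFORMAT FIELDS=(F1:1,5,F2:1,5)
  OUTREC BUILD=(1,5,C' | ',6,5)
//JNF1CNTL DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
//JNF2CNTL DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
/*
JNF1CNTL/JNF2CNTL add the sequence number to each side before the join, so the main task only has to pair records with equal sequence numbers; JOIN UNPAIRED keeps the rows where one file runs out of records (file 4 has only 3).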

Related

WER202A Sortout RECFM Incompatible

I have a requirement to create an output file in VB format. My input file has a record length of 1004 in VB format, and file 2 has a fixed 410-byte record length. I need to match the two files on a key and append the 4 bytes from the 2nd file (407,4).
I have used the sort JOINKEYS below:
SORT FIELDS=COPY
JOINKEYS FILES=F1,FIELDS=(19,9,A)   -- File 1 key is at position 15, but 19 is used because it is a VB file
JOINKEYS FILES=F2,FIELDS=(14,9,A)   -- VB file
JOIN UNPAIRED,F1
REFORMAT FIELDS=(F1:1004,F2:407,4)
While trying to run the job, I get an error stating that the RECFM is incompatible.
I have used RECFM=VB and LRECL=1012 for the output.
Can anyone please check and let me know the reason for the error? Thanks in advance!
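One thing to check: for the joined output to come out as VB, the REFORMAT record itself has to be variable-length, and (if I remember the rule correctly) that means the first REFORMAT item must start at position 1 of the VB input, with an explicit length, so that the RDW is carried along. A cleaned-up sketch, with the positions and lengths guessed from the numbers above:
  SORT FIELDS=COPY
  JOINKEYS FILES=F1,FIELDS=(19,9,A)
  JOINKEYS FILES=F2,FIELDS=(14,9,A)
  JOIN UNPAIRED,F1
  REFORMAT FIELDS=(F1:1,1004,F2:407,4)
With F1:1,1004 (RDW plus up to 1000 data bytes) and the 4 bytes from F2, the longest joined record is 1008 bytes including the RDW, so LRECL=1012 leaves a little slack; my guess is the RECFM message means the sort is building a fixed-length reformatted record while SORTOUT is defined as VB, which is what happens when the RDW is not the first thing copied from the VB file.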

Find identical Excel files (except file name and some attributes) in a folder

Students submit an assignment in Excel. Many students copy someone else's work and submit identical Excel files (the files are identical in every other way, except that the filename and date/time attributes might differ; the size may also differ slightly for some reason unknown to me).
All the files are in a single folder.
How may I check to see which files are identical (except for filename, some date/time attributes, and minor file size differences)?
Have a look at Duplicate Files Finder.
In general, you should do a pair-wise comparison: for n files there are n(n-1)/2 pairs to check. For example, for 6 students, you would compare 6 × 5 / 2 = 15 pairs:
compare 1 to 2
compare 1 to 3
compare 1 to 4
compare 1 to 5
compare 1 to 6
compare 2 to 3
compare 2 to 4
compare 2 to 5
compare 2 to 6
compare 3 to 4
compare 3 to 5
compare 3 to 6
compare 4 to 5
compare 4 to 6
compare 5 to 6
I do not know of any software that checks all pairs of spreadsheets in a folder and lists those that are identical. The tool suggested by @Serge does a byte-by-byte comparison, but that is too restrictive for your purposes. Two students may share a spreadsheet, and simply by saving it at different times or with different software versions, the files may differ at the byte level while having no real differences at the level of cell contents.
However, if you have a small number of files and can manually compare each pairing, then the following formula may help you. Suppose the spreadsheets are Student1 and Student2, they each have only one worksheet, and the meaningful content is restricted to the range A1:Z1000. Then this array formula will return TRUE if and only if every cell in the range is identical on the two sheets.
=AND([Student1]Sheet1!A1:Z1000=[Student2]Sheet1!A1:Z1000)
(Note that this is an array formula so must be entered using Ctrl-Shift-Enter.)
Once you get that to work for just two files, then you could set up a list of file pairs to compare, perhaps like this:
+---------------+---------------+-----------+
| Spreadsheet A | Spreadsheet B | Identical |
+---------------+---------------+-----------+
| Student1      | Student2      | FALSE     |
| Student1      | Student3      | TRUE      |
| Student2      | Student3      | FALSE     |
+---------------+---------------+-----------+
where the formula in C2 is {=AND(INDIRECT(CONCAT("'[",A2,".xlsx]Sheet1'!A1:Z1000"))=INDIRECT(CONCAT("'[",B2,".xlsx]Sheet1'!A1:Z1000")))}

Deleting columns of large CSV files

I have a large CSV file of around 2 GB, containing 7 columns. I want to delete its 4th column, which is text (a snippet). I used the cut command like this:
cut -d, -f 4 --complement file
But it does not remove the column correctly, because cut starts a new field at every comma it finds in a row, including the commas inside the quoted text, and then deletes whatever it takes to be the 4th field of that row. Following an answer here, I used csvquote like this:
csvquote file | cut -d "," -f 4 --complement | uniq -c | csvquote -u
It worked for a small file, but throws an error for large files:
errno: Value too large for defined data type
I want to know some solutions for deleting columns of the large data file. Thanks.
Edit: head output of the file:
funny,user_id,review_id,text,business_id,stars,date,useful,type,cool
0,WV5XKbgVHJXEgw7f-b6PVA,hhmpSM4LcHQv6noXlYYCgw,"Went out of our way to find this place because I read they had amazing poutine. Worth the traveling. It really was spot on amazing. Served out of a storage container this place is hip. $10 for two huge portions of poutine. The fries were crisp and held up to the creamy gravy well. Topped with a huge portion of squeaky white cheese curds this was a fantastic meal.
Have you tried telling cut to use the other fields instead?
Like this:
csvquote file | cut -d ',' -f 1-3,5- | uniq -c | csvquote -u
I tested it on my machine and it seems to work. But I didn't see a sample of your data, and you didn't note which program is throwing the
errno: Value too large for defined data type
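If it is cut (and not csvquote) that chokes on the 2 GB file, the middle stage can also be rewritten in awk. This is just a sketch, with reviews.csv standing in for your file name and csvquote assumed to be on the PATH; it streams the input and reprints each record without its 4th field:
csvquote reviews.csv |
awk -F',' -v OFS=',' -v drop=4 '
  {
    sep = ""
    for (i = 1; i <= NF; i++) {
      if (i == drop) continue   # skip the column to delete
      printf "%s%s", sep, $i
      sep = OFS
    }
    printf "\n"
  }' |
csvquote -u > reviews_notext.csv
I left out the uniq -c from your pipeline; put it back in front of the final csvquote -u if you really want the duplicate counts.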

Delete occurrences that occur less than X times

I have a file with numbers, one number per line:
1234
54332
54321
32452
1234
1234
54321
I want to delete every number that doesn't appear more than 3 times.
I was thinking about sorting, then joining lines, and then deleting the ones that don't have 3 words.
I think there is a better way, but I don't know enough vim to do it.
Any tips?
As I commented under your question, I would do it with awk. Of course, vim can do it too, with a custom function, for example.
You could try this line:
%!awk '{a[$0]++}END{for(x in a)if(a[x]>3)for(y=1;y<=a[x];y++)print x}'
Note that your example is not ideal, because no line appears more than 3 times. If you add another 1234 line, the result of the above command would be:
1234
1234
1234
1234
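Outside of vim, the same filter can be written as a two-pass awk command that keeps the surviving lines in their original order (the END loop above prints them grouped together instead); numbers.txt is just a placeholder name:
awk 'NR == FNR { count[$0]++; next } count[$0] > 3' numbers.txt numbers.txt
The first pass (NR == FNR) only counts each line; the second pass prints a line whenever its total count is greater than 3.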

How to delete duplicate lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How can I delete the lines that appear at least twice in the same file in bash, keeping only the first occurrence? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have the same line twice in a given text file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This compares everything from the 2nd character to the end of the line, ignoring the first character (which in your example is the single-digit line number). This is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. The first time a string is seen, the line is printed. In the first case, we use the entire input line contained in $0; in the second case we use only the substring consisting of everything from the 2nd character onward.
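Run against the sample in the question, the second command gives exactly the output you asked for:
$ awk '!x[ substr( $0, 2 ) ]++' file.txt
1 AA
2 ab
3 azd
6 aslmdkfj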
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first; it only compares consecutive lines.
Like this:
sort file.txt | uniq
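By the way, sort -u does the sorting and the de-duplication in one step:
sort -u file.txt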
