SUM not working 'Invalid or missing field format' - mainframe

I have an input file in this format: (length 20, 10 chars and 10 numerics)
jname1 0000500006
bname1 0000100002
wname1 0000400007
yname1 0000000006
jname1 0000100001
mname1 0000500012
mname2 0000700013
In my jcl I have defined my sysin data as such:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,FD)
DATAEND
*
It works fine as long as I don't add the sum fields so I'm wondering if I'm using the wrong format for my numerics seeing as I know they start at field 11 and have a length of 10 the format is the only thing that could be wrong.
As you might have already realised the point of this JCL is to just list the values but grouped by the first letter of the name (so for the example data and JCL I have given it would group the numeric for mname1 and mname2 together but leave the other records untouched).
I'm kind of new at this so I was wonder what I need for the format if my numerics are like that in the input file.

If new to DFSORT, get hold of the DFSORT Getting Started guide for your version of DFSORT (http://www-01.ibm.com/support/docview.wss?uid=isg3T7000080).
This takes your through all the basic operations with many examples.
The DFSORT Application Programming Guide describes everything you need to know, in detail. Again with examples. Appendix C of that document contains all the data-types available (note, when you tried to use FD, FD is not valid data-type, so probably a typo). There are Tables throughout the document listing what data-types are available where, if there is a particular limit.
For advanced techniques, consult the DFSORT Smart Tricks publication here: http://www-01.ibm.com/support/docview.wss?uid=isg3T7000094
You need to understand a bit more the way data is stored on a Mainframe as well.
Decimals (which can be "packed-decimal" or "zoned-decimal") do not contain a decimal-point. The decimal-point is implied. In high-level languages you tell the compiler where the decimal-point is (in a fixed position) and the compiler does the alignments for you. In Assembler, you do everything yourself.
Decimals are 100% accurate, as there are machine-instructions which act directly on packed-decimal data giving packed-decimal results.
A field which actually contains a decimal-point, cannot be directly used in arithmetic.
An unsigned field is treated as positive when used in any arithmetic.
The SUM statement supports a limited number of numeric definitions, and you have chosen the correct one. It does not matter that your data is unsigned.
If the format of the output from SUM is not what you want, look at OPTION ZDPRINT (or NOZDPRINT).
If you want further formatting, you can use OUTREC or OUTFIL.
As an option to using SUM, you can use OUTFIL reporting functions (especially, although not limited to, if you want a report). You can use SECTIONS and TRAILER3 with TOT/TOTAL.
Something to watch for with SUM (which is not a problem with the reporting features) is if any given one (or more) of your SUMmed fields exceed the field size. To continue to use SUM if that happens, you need to extend the field in INREC and then get SUM to use the new, sufficient, size.

After some trial and error I finally found it, appearantly the format I needed to use was the ZD format (zoned decimal, signed), so my sysin becomes this:
SYSIN DATA *
SORT FIELDS=(1,1,CH,A)
SUM FIELDS=(11,10,ZD)
DATAEND
*
even though my records don't contain any decimals and they are unsigned, I don't really get it so if someone knows why it's like that please go ahead and explain it to me.
For now the way I'm going to remember it is this: Z = symbol for real (meaning integers so no decimals)

Related

is there a way to calculate every possible order of operation for 1 operation in Python?

Let's say that I have a = '1+2*5/3', there's a specific order to which my machine will evaluate this statement (with eval(a))
I would like to know if there's a line of code (or a function? just an elegant way that could get the job done) that would calculate :
(1+2)*5/3
1+(2*5)/3
1+2*(5/3)
(1+2*5)/3
1+(2*5/3)
(1+2)*(5/3)
1+2*5/3
In this example, I used an operation with 4 factors, so I could just code 1 function for each possibility, but I need to do the same thing with 6 factors and that would just take way too much time and effort since the possibility of different operation order would increase exponentially
It would be also great that it returns everything in a dictionary in this form {operation:result} with the parentheses included, if not i'll find my way around it
edit: as requested, the main goal is to make a program that find the solution to the game " le compte est bon " brute force method, the rules can be found here : https://en.wikipedia.org/wiki/Des_chiffres_et_des_lettres#Le_compte_est_bon_.28.22the_total_is_right.22.29
This is going to be very hard. I recommend you follow these steps:
Create a list to check if the formula has already been calculated
Randomize the order (such as +-*/ and randomly place numbers
Check if rule number`s one is a valid formula. if not try number 1 again
Randomize the order (such as opening and closing parentheses and ^)
Check if the sentence above is a valid formula. if not try number 3 again.
Check the formula through the list and see if it has already been calculated. if it has been calculated then we don't use it and go back to number two. but...
If it is not in the list then we can use it.
Those are the basic steps for common known math symbols, but what about square root?
Another way to do this is by making python move the symbols over like you did with the parentheses, but for EVERYTHING (numbers and symbols(+-/*))
EDIT:
This was before the original question was changed.

Fortran `write (*, '(3G24.16)')` error

I have a Fortran file that must write these complicated numbers, basically I can't change these numbers:
File name: complicatedNumbers.f
implicit none
write (*,'(3G24.16)') 0.4940656458412465-323, 8.651144521298990, 495.6336980600139
end
It's then run with gfortran -o outa complicatedNumbers.f on my Ubuntu, but this error comes up:
Error: Expected expression in WRITE statement at (1)
I'm sure it has something to do with the complicated numbers because there are no errors if I change the three complicated numbers into simple numbers such as 11.11, 22.2, 33.3.
This is actually a stripped-down version of a complex Fortran file that contains many variables and links to other files. So ideally, the 3G24.16 should not be changed.
What does the 3G24.16 mean?
How can I fix it so that I can ultimately print out these numbers with ./outa?
There is nothing syntactically wrong in the snippet you've shown us. However, your use of a file name with the suffix .f makes me think that the compiler is assuming that your code is written in fixed form. That is the usual default behaviour of gfortran. If that is the case it probably truncates that line at about the last , which means that the compiler sees
write (*,'(3G24.16)') 0.4940656458412465-323, 8.651144521298990,
and raises the complaint you have shared with us. Either join us in the 21st Century and switch to free form source files, change .f to .f90 and see what fun ensues, or continue the line correctly with some character in column 6 of the next line.
As to what 3G24.16 means, refer to your favourite Fortran reference material under the heading of data edit descriptors, in particular the g data edit descriptor.
Oh, and if my bandying about of the terms fixed form source and free form source bamboozles you, read about them in your favourite Fortran reference material too.
Three errors in your program :
as you clearly use the Fortran fixed format, instructions are limited to 72 characters (132 in free format)
the number 0.4940656458412465-323 is probably not correctly written. The exponent character is missing. Try 0.4940656458412465D-323 instead. Here Fortran computes the substraction => 0.4940656458412465-323 is replaced by -322.505934354159. Notice that I propose the exponent D (double precision). Writing 0.4940656458412465E-323 is inaccurate because, for a single precision number, the minimum value for the exponent is -127.
other numbers should have an exponent D0 too because, in single precision, the number of significant digits do not exceed 6.
Possible correction, always in fixed format :
implicit none
write (*,'(3G24.16)') 0.4940656458412465D-323,
& 8.651144521298990d0,
& 495.6336980600139d0
end

Remove Column with DFSORT

I have the following input file with a variable length (RECFM=VB) for example :
AAAAABBBBBCCCCCDDDDDEEEEEFFFFF
AAAAABBBBBCCCCCDDDDDEEEEEFFFFF
AAAAABBBBBCCCCCDDDDDEEEEEFFFFF
I am trying to get the output file as below by skipping the A column. Is there a way that I could do this using DFSORT ? (outrec ?!)
BBBBBCCCCCDDDDDEEEEEFFFFF
BBBBBCCCCCDDDDDEEEEEFFFFF
Of course
OPTION COPY
INREC BUILD=(1,4,6)
The 1,4 is the RDW (Record Descriptor Word) and is always necessary in a BUILD for a variable-length record. The "6" says, "from start position 6 to the end of the variable record". DFSORT will adjust the record-length in the RDW accordingly, and your output on SORTOUT should be what you want.
It would the same with OUTREC instead of INREC, but unless OUTREC is needed (after a SORT and with processing dependent on that) I use INREC.
It would also be possible with OUTFIL, but same applies (for me).
EDIT:
For a comparison, here is removing the first five of a fixed-length record. I'll use an LRECL of 80:
OPTION COPY
INREC BUILD=(6,75,5X)
The 5X will put five blanks after the 75 bytes of data, if the LRECL is to stay the same, else leave it off.
DFSORT manuals are available online from IBM, including a good "Getting Started". There are many examples in the manuals. For more complex manipulations there is the "Smart DFSORT Tricks" publication from IBM.
EDIT:
From your comment, start reading from here:
http://publib.boulder.ibm.com/infocenter/zos/v1r13/index.jsp?topic=%2Fcom.ibm.zos.r13.iceg200%2Fice1cg6025.htm
And here:
The fields are "fixed" in the discussion, do not confuse this with fixed-length records. Fixed-length fields are where you have a start-position and a length. Variable fields are where you only have the start-position (on a variable-length record) or when defining PARSEd fields.
The document is also available as a PDF, ice1cg60.pdf is the current one, but it is worth locating the one which matches your version/level of DFSORT.

Check if values of two string-type items are equal in a Zabbix trigger

I am monitoring an application using Zabbix and have defined a custom item which returns a string value. Since my item's values are actually checksums, they will only contain the characters [0-9a-f]. Two mirror copies of my application are running on two servers for the sake of redundancy. I would like to create a trigger which would take the item values from both machines and fire if they are not the same.
For a moment, let's forget about the moment when values change (it's not an atomic operation, so the system may see inconsistent state, which is not a real error, for a short time), since I could work around it by looking at several previous values.
The crux is: how to write a Zabbix trigger expression which could compare for equality the string values of two items (the same item on two mirror hosts, actually)?
Both according to the fine manual and as I confirmed in praxis, the standard operators = and # only work on numeric values, so I can't just write the natural {host1:myitem[param].last(0)} # {host2:myitem[param].last(0)}. Functions such as change() or diff() can only compare values of the same item at different points in time. Functions such as regexp() can only compare the item's value with a constant string/regular expression, not with another item's value. This is very limiting.
I could move the comparison logic into the script which my custom item executes, but it's a bit messy and not elegant, so if at all possible, I would prefer to have this logic inside my Zabbix trigger.
Perhaps despite the limitations listed above, someone can come up with a workaround?
Workaround:
{host1:myitem[param].change(0)} # {host2:myitem[param].change(0)}
When only one of the servers sees a modification since the previously received value, an event is triggered.
From the Zabbix Manual,
change (float, int, str, text, log)
Returns difference between last and previous values.
For strings:
0 - values are equal
1 - values differ
I believe, and am struggling with this EXACT situation to this myself, that the correct way to do this is via calculated items.
You want to create a new ITEM, not trigger (yet!), that performs a calculated comparison on multiple item values (Strings Difference, Numbers within range, etc).
Once you have that item, have the calculation give you a value you can trigger off of. You can use ANY trigger functions in your calculation along with arrhythmic operations.
Now to the issue (which I've submitted a feature request for because this is extremely limiting), most trigger expressions evaluate to a number or a 0/1 bool.
I think I have a solution for my problem, which is that I am tracking a version number from a webpage: e.g. v2.0.1, I believe I can use string connotation and regex in calculated items in order to convert my string values into multiple number values. As these would be a breeze to compare.
But again, this is convoluted and painful.
If you want my advice, have yourself or a dev look at the code for trigger expressions and see if you can submit a patch add one trigger function for simple string comparison. (Difference, Length, Possible conversion to numerical values (using binary and/or hex combinations) etc.)
I'm trying to work on a patch myself, but I don't have time as I have so much monitoring to implement and while zabbix is powerful, it's got several huge flaws. I still believe it's the best monitoring system out there.
Simple answer: Create a UserParameter until someone writes a patch.
You could change your items to return numbers instead of strings. Because your items are checksums that are using only [0-9a-f] characters, they are numbers written in hexadecimal. So you would need to convert the checksum to decimal number.
Because the checksum is a big number, you would need to limit the hexadecimal number to 8 characters for Numeric (unsigned) type before conversion. Or if you would want higher precision, you could use float (but that would be more work):
Numeric (unsigned) - 64bit unsigned integer
Numeric (float) - floating point number
Negative values can be stored.
Allowed range (for MySQL): -999999999999.9999 to 999999999999.9999 (double(16,4)).
I wish Zabbix would have .hashedUnsigned() function that would compute hash of a string and return it as a number. Such a function should be easy to write.

Difference between text and varchar (character varying)

What's the difference between the text data type and the character varying (varchar) data types?
According to the documentation
If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
and
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.
So what's the difference?
There is no difference, under the hood it's all varlena (variable length array).
Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/
A couple of highlights:
To sum it all up:
char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing
spaces, plus it is problematic to change the limit
varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
varchar – just like text
text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name
The article does detailed testing to show that the performance of inserts and selects for all 4 data types are similar. It also takes a detailed look at alternate ways on constraining the length when needed. Function based constraints or domains provide the advantage of instant increase of the length constraint, and on the basis that decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.
As "Character Types" in the documentation points out, varchar(n), char(n), and text are all stored the same way. The only difference is extra cycles are needed to check the length, if one is given, and the extra space and time required if padding is needed for char(n).
However, when you only need to store a single character, there is a slight performance advantage to using the special type "char" (keep the double-quotes — they're part of the type name). You get faster access to the field, and there is no overhead to store the length.
I just made a table of 1,000,000 random "char" chosen from the lower-case alphabet. A query to get a frequency distribution (select count(*), field ... group by field) takes about 650 milliseconds, vs about 760 on the same data using a text field.
(this answer is a Wiki, you can edit - please correct and improve!)
UPDATING BENCHMARKS FOR 2016 (pg9.5+)
And using "pure SQL" benchmarks (without any external script)
use any string_generator with UTF8
main benchmarks:
2.1. INSERT
2.2. SELECT comparing and counting
CREATE FUNCTION string_generator(int DEFAULT 20,int DEFAULT 10) RETURNS text AS $f$
SELECT array_to_string( array_agg(
substring(md5(random()::text),1,$1)||chr( 9824 + (random()*10)::int )
), ' ' ) as s
FROM generate_series(1, $2) i(x);
$f$ LANGUAGE SQL IMMUTABLE;
Prepare specific test (examples)
DROP TABLE IF EXISTS test;
-- CREATE TABLE test ( f varchar(500));
-- CREATE TABLE test ( f text);
CREATE TABLE test ( f text CHECK(char_length(f)<=500) );
Perform a basic test:
INSERT INTO test
SELECT string_generator(20+(random()*(i%11))::int)
FROM generate_series(1, 99000) t(i);
And other tests,
CREATE INDEX q on test (f);
SELECT count(*) FROM (
SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;
... And use EXPLAIN ANALYZE.
UPDATED AGAIN 2018 (pg10)
little edit to add 2018's results and reinforce recommendations.
Results in 2016 and 2018
My results, after average, in many machines and many tests: all the same (statistically less than standard deviation).
Recommendation
Use text datatype, avoid old varchar(x) because sometimes it is not a standard, e.g. in CREATE FUNCTION clauses varchar(x)≠varchar(y).
express limits (with same varchar performance!) by with CHECK clause in the CREATE TABLE e.g. CHECK(char_length(x)<=10). With a negligible loss of performance in INSERT/UPDATE you can also to control ranges and string structure e.g. CHECK(char_length(x)>5 AND char_length(x)<=20 AND x LIKE 'Hello%')
On PostgreSQL manual
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
I usually use text
References: http://www.postgresql.org/docs/current/static/datatype-character.html
In my opinion, varchar(n) has it's own advantages. Yes, they all use the same underlying type and all that. But, it should be pointed out that indexes in PostgreSQL has its size limit of 2712 bytes per row.
TL;DR:
If you use text type without a constraint and have indexes on these columns, it is very possible that you hit this limit for some of your columns and get error when you try to insert data but with using varchar(n), you can prevent it.
Some more details: The problem here is that PostgreSQL doesn't give any exceptions when creating indexes for text type or varchar(n) where n is greater than 2712. However, it will give error when a record with compressed size of greater than 2712 is tried to be inserted. It means that you can insert 100.000 character of string which is composed by repetitive characters easily because it will be compressed far below 2712 but you may not be able to insert some string with 4000 characters because the compressed size is greater than 2712 bytes. Using varchar(n) where n is not too much greater than 2712, you're safe from these errors.
text and varchar have different implicit type conversions. The biggest impact that I've noticed is handling of trailing spaces. For example ...
select ' '::char = ' '::varchar, ' '::char = ' '::text, ' '::varchar = ' '::text
returns true, false, true and not true, true, true as you might expect.
Somewhat OT: If you're using Rails, the standard formatting of webpages may be different. For data entry forms text boxes are scrollable, but character varying (Rails string) boxes are one-line. Show views are as long as needed.
A good explanation from http://www.sqlines.com/postgresql/datatypes/text:
The only difference between TEXT and VARCHAR(n) is that you can limit
the maximum length of a VARCHAR column, for example, VARCHAR(255) does
not allow inserting a string more than 255 characters long.
Both TEXT and VARCHAR have the upper limit at 1 Gb, and there is no
performance difference among them (according to the PostgreSQL
documentation).
The difference is between tradition and modern.
Traditionally you were required to specify the width of each table column. If you specify too much width, expensive storage space is wasted, but if you specify too little width, some data will not fit. Then you would resize the column, and had to change a lot of connected software, fix introduced bugs, which is all very cumbersome.
Modern systems allow for unlimited string storage with dynamic storage allocation, so the incidental large string would be stored just fine without much waste of storage of small data items.
While a lot of programming languages have adopted a data type of 'string' with unlimited size, like C#, javascript, java, etc, a database like Oracle did not.
Now that PostgreSQL supports 'text', a lot of programmers are still used to VARCHAR(N), and reason like: yes, text is the same as VARCHAR, except that with VARCHAR you MAY add a limit N, so VARCHAR is more flexible.
You might as well reason, why should we bother using VARCHAR without N, now that we can simplify our life with TEXT?
In my recent years with Oracle, I have used CHAR(N) or VARCHAR(N) on very few occasions. Because Oracle does (did?) not have an unlimited string type, I used for most string columns VARCHAR(2000), where 2000 was at some time the maximum for VARCHAR, and in all practical purposes not much different from 'infinite'.
Now that I am working with PostgreSQL, I see TEXT as real progress. No more emphasis on the VAR feature of the CHAR type. No more emphasis on let's use VARCHAR without N. Besides, typing TEXT saves 3 keystrokes compared to VARCHAR.
Younger colleagues would now grow up without even knowing that in the old days there were no unlimited strings. Just like that in most projects they don't have to know about assembly programming.
I wasted way too much time because of using varchar instead of text for PostgreSQL arrays.
PostgreSQL Array operators do not work with string columns. Refer these links for more details: (https://github.com/rails/rails/issues/13127) and (http://adamsanderson.github.io/railsconf_2013/?full#10).
If you only use TEXT type you can run into issues when using AWS Database Migration Service:
Large objects (LOBs) are used but target LOB columns are not nullable
Due to their unknown and sometimes large size, large objects (LOBs) require more processing
and resources than standard objects. To help with tuning migrations of systems that contain
LOBs, AWS DMS offers the following options
If you are only sticking to PostgreSQL for everything probably you're fine. But if you are going to interact with your db via ODBC or external tools like DMS you should consider using not using TEXT for everything.
character varying(n), varchar(n) - (Both the same). value will be truncated to n characters without raising an error.
character(n), char(n) - (Both the same). fixed-length and will pad with blanks till the end of the length.
text - Unlimited length.
Example:
Table test:
a character(7)
b varchar(7)
insert "ok " to a
insert "ok " to b
We get the results:
a | (a)char_length | b | (b)char_length
----------+----------------+-------+----------------
"ok "| 7 | "ok" | 2

Resources