Extend Templated Data Structure (Inheritance) - rpgle

I have been reading about LIKEDS, TEMPLATE, and BASED trying to determine if there is a way to create data structure templates (prototypes) with inheritance. I have:
D costs DS QUALIFIED TEMPLATE
D material 6 0
D cutting 6 0
D ...etc...
D boxCosts DS LIKEDS(costs)
D folding 6 0
D ...etc...
D posterCosts DS LIKEDS(costs)
D laminating 6 0
D ...etc...
Where I want boxCosts to look like:
boxCosts:
material
cutting
folding
etc. (no laminating, this isn't a poster)
Is there any way to achieve this type of data structure template? I know I could do:
D boxCosts DS
D common LIKEDS(costs)
D folding 6 0
D ...etc...
But this creates a hierarchy when I want a flat structure.
I could maybe do this with a copybook, but I don't know which would be worse: a copybook in its own file containing just the data structure parts I want, or a potentially complicated conditional copybook for the whole application with a small area for copying this information. Templates come so close to what I want that I suspect I must just be missing something.
If you are wondering, the compile error I get from trying to create an inherited data structure as shown is "RNF3703: The subfield or parameter definition is not specified within a group." on the first D spec below the LIKEDS keyword.
Thanks for reading.

RPG data structures are memory maps. They define how variables are grouped and overlapped at specific locations in memory. That's why LIKEDS() gives you a hierarchy: the compiler copies the hierarchy from the template to your destination.
There's at least one way to flatten the structure:
d costs ds template
d t_material 6s 0
d t_cutting 6s 0
d box e ds extname(boxcosts) prefix(t_) template
d boxCosts ds qualified
d material like(t_material)
d cutting like(t_cutting)
d folding like(t_folding)
boxCosts.cutting = 1;
boxCosts.folding = 2;
The first structure is defined in the program; the second is based on a file. I did that only to show two different ways of getting the subfields defined.

You can accomplish your goal if you are willing to use SQL to solve the problem. While ILE RPG data structures do not support inheritance, SQL tables can simulate it.
CREATE TABLE costs
(material NUMERIC(6, 0)
,cutting NUMERIC(6, 0)
);
CREATE TABLE boxCosts
( LIKE costs
,folding NUMERIC(6, 0)
,sealing NUMERIC(6, 0)
);
CREATE TABLE postrCosts
( LIKE costs
,laminating NUMERIC(6, 0)
);
If all you care about is the field names and definitions, that method may be fine, and all you need to use those structures in RPG would be:
D boxCosts E DS EXTNAME(boxCosts)
D posterCosts E DS EXTNAME(postrCosts)
If field text or other attributes are important to you, then you may be better off with a slightly different strategy.
CREATE TABLE costs
(material NUMERIC(6, 0)
,cutting NUMERIC(6, 0)
);
LABEL ON COLUMN costs
(material text is 'Material Costs'
,cutting text is 'Cutting Costs'
);
CREATE TABLE boxCosts as
(SELECT *
FROM costs
) with no data
;
ALTER TABLE boxCosts
ADD COLUMN folding NUMERIC(6, 0)
ADD COLUMN sealing NUMERIC(6, 0)
;
LABEL ON COLUMN boxCosts
(folding text is 'Folding Costs'
,sealing text is 'Sealing Costs'
);

Related

How can I generate a list of linked data with SQL

I've got a table of linked pairs of values, and I would like a query that returns each group of linked values concatenated together.
Do you have any idea how to achieve this?
I know that I need to use XMLAGG somewhere to get the final concatenation, but I don't know how to group A, B, C and D (the rule is: because row 1 has A & B and row 2 has B & C, A is linked to C, etc.).
Thanks

How to produce a table of three inputs to reach a given output? (Excel model)

I have a very detailed excel model to calculate the profitability of a project, that we can call P.
The model has been simplified to compute from 3 unrelated variables. I would like to automatically create a table that shows how inputs A, B and C might vary in order to produce a pre-defined level of profitability, P. For instance, if A = 4 & B = 30, then C must = 2 in order for P to equal 20%. Likewise, if A = 5 & B = 25, then C must = 3 in order for P to equal 20%. A and B should be tested at sensible increments, perhaps 8 intervals each.
A laborious (not scalable) equivalent would be to manually define A and B, then goal-seek C to our pre-defined level of P - we'd then repeat for each combination of A and B at the given intervals and record in a two-way table.
I believe a conventional two-way data table would be practical if the model sitting behind the inputs were greatly simplified; unfortunately, this isn't possible.
Thanks to anyone that can lend a hand. Kind regards.
I think the best way to approach this will be with a VBA macro and the prebuilt GoalSeek function, something like this (P is in cell D1):
Range("D1").GoalSeek Goal:=20, _
    ChangingCell:=Range("C1")

Summing up a related table's values in PowerPivot/DAX

Say I have two tables. attrsTable:
file | attribute | value
------------------------
A | xdim | 5
A | ydim | 6
B | xdim | 7
B | ydim | 3
B | zdim | 2
C | xdim | 1
C | ydim | 7
sizeTable:
file | size
-----------
A | 17
B | 23
C | 34
I have these tables related via the 'file' field. I want a PowerPivot measure within attrsTable whose calculation uses size. For example, let's say I want (xdim + ydim)/size for each of A, B, C. The calculations would be:
A: (5+6)/17
B: (7+3)/23
C: (1+7)/34
I want the measure to be generic enough so I can use slicers later on to slice by file or attribute. How do I accomplish this?
I tried:
dimPerSize := CALCULATE([value]/SUM(sizeTable[size])) # Calculates 0
dimPerSize := CALCULATE([value]/SUM(RELATED(sizeTable[size]))) # Produces an error
Any idea what I'm doing wrong? I'm probably missing some fundamental concepts here of how to use DAX with relationships.
Hi Redstreet,
taking a step back from your solution and the one proposed by Jacob, I think it might be useful to create another table that would aggregate all the calculations (especially given you probably have more than 2 tables with file-specific attributes).
So I have created one more table that contains (only) unique file names, with both attrsTable and sizeTable related to it.
It's much simpler to add necessary measures (no need for calculated columns). I have actually tested 2 scenarios:
1) create simple SUM measures for both Attribute Value and File Size. Then divide those two measures and job done :-).
2) use SUMX functions to have a bit more universal solution. Then the final formula for DimPerSize calculation could look like this:
=DIVIDE(
SUMX(DISTINCT(fileTable[file]),[Sum of AttrValue]),
SUMX(DISTINCT(fileTable[file]),[Sum of FileSize]),
BLANK()
)
With [Sum of AttrValue] being:
=SUM(attrsTable[value])
And Sum of FileSize being:
=SUM(sizeTable[size])
This worked perfectly fine, even though SUMX in both cases goes over all instances of a given file name. So for file B it also calculates with zdim (if there is a need to filter this out, then use a simple CALCULATE/FILTER combination). In the case of file size, I am using SUMX as well, even though it's not really needed since the table contains only 1 record for each file name. If there were 2 instances, then use SUMX or AVERAGEX depending on the desired outcome.
Hope this helps.
You look to have the concept of relationships OK, but you aren't on the right track with CALCULATE(), either in terms of the structure or the fact that you can't simply use 'naked' numerical columns; they need to be packaged in some way.
Your desired approach is correct in that once you get a simple version of the thing running, you will be able to slice and dice it over any of your related dimensions.
Best practice is probably to build this up using several measures:
[xdim] = CALCULATE(SUM('attrstable'[value]), 'attrstable'[attribute] = "xdim")
[ydim] = CALCULATE(SUM('attrstable'[value]), 'attrstable'[attribute] = "ydim")
[dimPerSize] = ([xdim] + [ydim]) / VALUES('sizeTable'[size])
But depending on exactly how your pivot is set up, this is likely to also throw an error because it will try and use the whole 'size' column in your totals. There are two main strategies for dealing with this:
Use an 'iterative' formula such as SUMX() or AVERAGEX() to iterate individually over the 'file' field and then add up or average for the total, e.g.
[ItdimPerSize] = AVERAGEX(VALUES('sizeTable'[file]), [dimPerSize])
Depending on the maths you want to use, you might find that to produce a useful average you need to use SUMX but divide by the number of cases, i.e. COUNTROWS(VALUES('sizeTable'[file])).
You might decide that the totals are irrelevant and simply introduce an error handling element that will make them blank e.g.
[NtdimPerSize] = IF(HASONEVALUE('sizeTable'[file]),[dimPerSize],BLANK())
NB: all of this assumes that when you are creating your pivot you are 'dragging in' the file field from the 'sizeTable'.

script task in SSIS to import excel spreadsheet

I have reviewed the questions that may have had my answer and unfortunately they don't seem to apply. Here is my situation. I have to import worksheets from my client. In columns A, C, D, and AA the client has the information I need. The balance of the columns have what to me is worthless information. The column headers are consistent in the four columns I need, but are very inconsistent in the columns that don't matter. For example cell A1 contains Division. This is true across all of the spreadsheets. Cell B1 can contain anything from sleeve length to overall length to fit. What I need to do is to import only the columns I need and map them to an SQL 2008 R2 table. I have defined the table in a stored procedure which is currently calling an SSIS function.
The problem is that when I try to import a spreadsheet that has different column names, the SSIS package fails and I have to go back in and run it manually to get the fields set up right.
I cannot imagine that what I am trying to do has not been done before. Just so the magnitude is not lost, I have 170 users who have over 120 different spreadsheet templates.
I am desperate for a workable solution. I can do everything after getting the file into my table in SQL. I have even written the code to move the files back to the FTP server.
I put together a post describing how I've used a Script task to parse Excel. It's allowed me to import decidedly non-tabular data into a data flow.
The core concept is that you will use the JET or ACE provider and simply query the data out of an Excel worksheet/named range. Once you have that, you have a dataset you can walk through row by row and perform whatever logic you need. In your case, you can skip row 1 for the header and then only import columns A, C, D and AA.
That logic would go in the ExcelParser class. So, the Foreach loop on line 71 would probably be distilled down to something like (code approximate)
// This gets the value of column A
current = dr[0].ToString();
// this assigns the value of current into our output row at column 0
newRow[0] = current;
// This gets the value of column C
current = dr[2].ToString();
// this assigns the value of current into our output row at column 1
newRow[1] = current;
// This gets the value of column D
current = dr[3].ToString();
// this assigns the value of current into our output row at column 2
newRow[2] = current;
// This gets the value of column AA
current = dr[26].ToString();
// this assigns the value of current into our output row at column 3
newRow[3] = current;
You obviously might need to do type conversions and such here but that's core of the parsing logic.

Best way to match 4 million rows of data against each other and sort results by similarity?

We use libpuzzle ( http://www.pureftpd.org/project/libpuzzle/doc ) to compare 4 million images against each other for similarity.
It works quite well.
But rather than doing an image-vs-image compare using the libpuzzle functions, there is another method of comparing the images.
Here is some quick background:
Libpuzzle creates a rather small (544 bytes) hash of any given image. This hash can in turn be used to compare against other hashes using libpuzzles functions. There are a few APIs... PHP, C, etc etc... We are using the PHP API.
The other method of comparing the images is by creating vectors from the given hash; here is a paste from the docs:
Cut the vector in fixed-length words. For instance, let's consider the
following vector:
[ a b c d e f g h i j k l m n o p q r s t u v w x y z ]
With a word length (K) of 10, you can get the following words:
[ a b c d e f g h i j ] found at position 0
[ b c d e f g h i j k ] found at position 1
[ c d e f g h i j k l ] found at position 2
etc. until position N-1
Then, index your vector with a compound index of (word + position).
Even with millions of images, K = 10 and N = 100 should be enough to
have very little entries sharing the same index.
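For what it's worth, the word cutting described above is easy to express in a few lines of code. This is just an illustration of the quoted documentation, not part of the libpuzzle API (the function name and defaults here are made up):
```python
# Illustration only: cut a libpuzzle-style vector into fixed-length words
# and pair each word with its position, as the quoted docs describe.

def cut_into_words(vector, k=10, n=100):
    """Return (position, word) pairs: each word is k consecutive
    elements of the vector, taken at positions 0 .. n-1."""
    pairs = []
    last = min(n, len(vector) - k + 1)
    for position in range(last):
        word = vector[position:position + k]
        # Compound index of (word + position), e.g. "0abcdefghij"
        pairs.append((position, word))
    return pairs

vector = "abcdefghijklmnopqrstuvwxyz"
for position, word in cut_into_words(vector)[:3]:
    print(f"{position}{word}")   # 0abcdefghij, 1bcdefghijk, 2cdefghijkl
```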
So, we have the vector method working. It actually works a bit better than the image-vs-image compare, since when we do the image-vs-image compare we use other data to reduce our sample size. It's a bit irrelevant and application-specific what other data we use to reduce the sample size, but with the vector method we would not have to do so; we could do a real test of each of the 4 million hashes against each other.
The issue we have is as follows:
With 4 million images, 100 vectors per image, this becomes 400 million rows. We have found MySQL tends to choke after about 60000 images (60000 x 100 = 6 million rows).
The query we use is as follows:
SELECT isw.itemid, COUNT(isw.word) as strength
FROM vectors isw
JOIN vectors isw_search ON isw.word = isw_search.word
WHERE isw_search.itemid = {ITEM ID TO COMPARE AGAINST ALL OTHER ENTRIES}
GROUP BY isw.itemid;
As mentioned, even with proper indexes, the above is quite slow when it comes to 400 million rows.
So, can anyone suggest any other technologies / algos to test these for similarity?
We are willing to give anything a shot.
Some things worth mentioning:
Hashes are binary.
Hashes are always the same length, 544 bytes.
The best we have been able to come up with is:
Convert image hash from binary to ascii.
Create vectors.
Create a string as follows: VECTOR1 VECTOR2 VECTOR3 etc etc.
Search using sphinx.
We have not yet tried the above, but it should probably yield somewhat better results than the MySQL query.
Any ideas? As mentioned, we are willing to install any new service (PostgreSQL? Hadoop?).
Final note: an outline of exactly how this vector + compare method works can be found in the question Libpuzzle Indexing millions of pictures?. We are in essence using the exact method provided by Jason (currently the last answer, awarded 200+ SO points).
Don't do this in a database, just use a simple file. Below I have shown a file with some of the words from the two vectors [abcdefghijklmnopqrst] (image 1) and [xxcdefghijklxxxxxxxx] (image 2):
<index> <image>
0abcdefghij 1
1bcdefghijk 1
2cdefghijkl 1
3defghijklm 1
4efghijklmn 1
...
...
0xxcdefghij 2
1xcdefghijk 2
2cdefghijkl 2
3defghijklx 2
4efghijklxx 2
...
Now sort the file:
<index> <image>
0abcdefghij 1
0xxcdefghij 2
1bcdefghijk 1
1xcdefghijk 2
2cdefghijkl 1
2cdefghijkl 2 <= the index is repeated, thus we have a match
3defghijklm 1
3defghijklx 2
4efghijklmn 1
4efghijklxx 2
When the file has been sorted, it's easy to find the records that have the same index. Write a small program or something that can run through the sorted list and find the duplicates.
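A minimal sketch of that small program, assuming a plain-text file (the file name here is made up) already sorted on the index column, with lines like "2cdefghijkl 1":
```python
from collections import defaultdict
from itertools import groupby

# (image_a, image_b) -> number of shared (position + word) indexes
matches = defaultdict(int)

with open("sorted_words.txt") as f:            # assumed file name
    rows = (line.split() for line in f if line.strip())
    for index, group in groupby(rows, key=lambda row: row[0]):
        images = [row[1] for row in group]
        # Every pair of images sharing this index is a partial match.
        for i in range(len(images)):
            for j in range(i + 1, len(images)):
                matches[(images[i], images[j])] += 1

# The pairs with the highest counts are the strongest candidates.
for pair, strength in sorted(matches.items(), key=lambda kv: -kv[1])[:10]:
    print(pair, strength)
```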
I have opted to answer my own question, as we have found a solution that works quite well.
In the initial question, I mentioned we were thinking of doing this via Sphinx search.
Well, we went ahead and did it, and the results are MUCH better than doing this via MySQL.
So, in essence, the process looks like this:
a) Generate the hash from the image.
b) 'Vectorize' this hash into 100 parts.
c) Binhex (binary to hex) each of these vectors, since they are in binary format.
d) Store in Sphinx search like so:
itemid | 0_vector0 1_vector1 2_vec... etc
e) Search using Sphinx search.
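For reference, here is a rough sketch of steps b) to d) in Python (rather than the PHP we actually use), assuming a 544-byte binary hash. The 100-part split and the "position_hex" word format are our own conventions, not anything mandated by Sphinx or libpuzzle:
```python
import binascii

def hash_to_document(item_hash: bytes, parts: int = 100) -> str:
    """Split a binary hash into `parts` slices, hex-encode each slice,
    and prefix it with its position, giving one space-separated string
    that can be indexed as a Sphinx text field, e.g. "0_a1b2 1_c3d4 ...".
    Note: 544 bytes does not divide evenly by 100, so this simple
    version ignores the last few bytes of the hash."""
    step = max(1, len(item_hash) // parts)
    words = []
    for position in range(parts):
        chunk = item_hash[position * step:(position + 1) * step]
        words.append(f"{position}_{binascii.hexlify(chunk).decode()}")
    return " ".join(words)

# Example with a fake 544-byte hash:
fake_hash = bytes(range(256)) * 2 + bytes(32)
print(hash_to_document(fake_hash)[:60], "...")
```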
Initially, once we had this Sphinx base full of 4 million records, it would still take about 1 second per search.
We then enabled distributed indexing for this Sphinx base, on 8 cores, and can now run about 10+ searches per second. This is good enough for us.
One final step would be to further distribute this Sphinx base over the multiple servers we have, further utilizing the unused CPU cycles we have available.
But for the time being, good enough. We add about 1000-2000 'items' per day, so searching through 'just the new ones' will happen quite quickly... after we do the initial scan.
