I am trying to map a schema like this example, that has repeating records inside repeating records.
Sample File:
0000456Toy Industries56Palumbia DriveEnfield CT06082 98724
302369097 King Toy Store20 Cherry WayDedhamMA02026 TS2349
402369097 036436 Playstation 4 Entert03452449.99 20140826
402369097 036437 Msoft XBOX ONEEntert01234399.99 20140826
402369097 036438 Wooden Horse 07892 59.99 20140827
402369097 036439 Playstation 4 Entert03452449.99 20140827
402369097 036440 My First Brew Kit 99.99 20140828
602369097000100010005I
302369235 Make Believe 342 Brand DriveBridgeMA02324 TS5439
402369235 054324 Playstation 4 Entert53452449.99 20140827
402369235 054325 Steam Box Ultimate 54234699.99 20140827
602369235000100010002I
900033336133310001
Sequence:
<Processor Record> (one per file | 1)
<Store Record> (one per store | 1+)
<Order Record> (one for each order the store had | 1+)
<Store Batch Control Record> (one per store | 1+)
<File Batch Control Record> (one per file | 1)
Each record (line) will be broken into fields by positional specifications.
I have tried using the wizard a couple of times and manually adjusting the settings as well, but keep running into issues like this one:
The current definition being parsed is Root. The stream offset where the error occurred is 0. The line number where the error occurred is 1. The column where the error occurred is 0.
I am rather new to BizTalk and was hoping that someone may be able to give me some help on how to accomplish this task. Thanks!
Use the Tag Identifiers to indicate the type of record.
In your case, this would be:
00 Processor Record Header
30 Store Record Header
40 Order Record
60 Store Record Summary
90 Processor Record Summary
This is how I would do it:
You would have a root node with the following parameters:
Structure: delimited
Child Delimiter Type: hexadecimal
Child delimiter: 0x0D 0x0A (CR LF, or whichever newline sequence your file uses)
Child Order: Infix
The node below that would be the "Header" node.
Its properties:
Max Occurs: 1
Min Occurs: 1
Structure: positional
Tag Identifier: 00
The next node would be the "Store" node.
This is an abstract node, just to be able to capture a store header, a recurring order record and a store summary node.
Its properties:
Max Occurs: *
Min Occurs: 0 (depending what's possible on your flat file)
Structure: delimited
Child Delimiter Type: hexadecimal
Child delimiter: 0x0D 0x0A (CR LF, or whichever newline sequence your file uses)
Child Order: Infix
The node below that would be the "StoreHeader" node.
Max Occurs: 1
Min Occurs: 0 (depending what's possible on your flat file)
Structure: positional (only you know the exact sequence of fields)
Tag Identifier: 30
Next to that, the "Order" node:
Max Occurs: *
Min Occurs: 0 (depending what's possible on your flat file)
Structure: positional (only you know the exact sequence of fields)
Tag Identifier: 40
Next to that, the "StoreSummary" node:
Max Occurs: 1
Min Occurs: 0 (depending what's possible on your flat file)
Structure: positional (only you know the exact sequence of fields)
Tag Identifier: 60
Finally, at the same level as the "Header" and "Store" nodes, you would have the "Summary" node:
Max Occurs: 1
Min Occurs: 1
Structure: positional
Tag Identifier: 90
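To make the nesting easier to picture, here is a minimal Python sketch (not BizTalk code) that groups lines into the same Header / Store / Summary hierarchy using only the two-character tag identifier. The function name group_records and the dictionary layout are assumptions made for illustration, the sample lines are abbreviated from the file above, and positional field parsing is left out.

```python
def group_records(lines):
    """Group flat-file lines by their 2-character tag identifier."""
    doc = {"Header": None, "Stores": [], "Summary": None}
    store = None  # the store currently being filled
    for line in lines:
        tag = line[:2]
        if tag == "00":
            doc["Header"] = line
        elif tag == "30":
            # A store header opens a new repeating "Store" group.
            store = {"StoreHeader": line, "Orders": [], "StoreSummary": None}
            doc["Stores"].append(store)
        elif tag in ("40", "60"):
            if store is None:
                raise ValueError("order or store summary before any store header")
            if tag == "40":
                store["Orders"].append(line)
            else:
                store["StoreSummary"] = line
        elif tag == "90":
            doc["Summary"] = line
        else:
            raise ValueError(f"unknown tag identifier {tag!r}")
    return doc


# Lines abbreviated from the sample file; only the leading tag matters here.
sample = [
    "0000456Toy Industries",
    "302369097 King Toy Store",
    "402369097 036436 Playstation 4",
    "402369097 036437 Msoft XBOX ONE",
    "602369097000100010005I",
    "302369235 Make Believe",
    "402369235 054324 Playstation 4",
    "602369235000100010002I",
    "900033336133310001",
]
doc = group_records(sample)
print(len(doc["Stores"]), [len(s["Orders"]) for s in doc["Stores"]])
```

The "Store" entry plays the same role as the abstract Store node in the schema: it has no line of its own and exists only to hold one store header, its orders, and its summary.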
How do I print a conditional field using PPFA code? When a value is an 'X', I'd like to print it. However, if the 'X' is not present, I'd like to print an image instead. Here is my code:
LAYOUT C'mylayout' BODY
POSITION .25 in ABSOLUTE .25 in
FONT TIMES
OVERLAY MYTEMPOVER 8.5 in 11.0 in;
FIELD START 1 LENGTH 60
POSITION 2.0 in 1.6 in;
The FIELD START 1 LENGTH 60 statement prints the given text at that location. But based on the value, I want to print either the given text or an image. How would I do that?
Here is an answer from the AFP-L list:
I would create two PAGEFORMATs, one with a LAYOUT for the text and one with a LAYOUT for the image. With CONDITION you can jump between the PAGEFORMATs (where the Copygroup is always 'NULL').
If you work in a z/OS environment, be careful of 'JES Blank Truncation'.
In short:
if there is an X in the data, the condition is true
if there is nothing in the data, the condition is not triggered at all (nothing happens)
In this case you must create a condition that is always true. I call it a dummy condition.
PPFA sample syntax:
CONDITION TEST start 1 length 1
when eq 'X' NULL PAGEFORMAT PRTTXT
when ge x'00' NULL PAGEFORMAT PRTIMAGE;
You must copy this CONDITION into both PAGEFORMATs after the LAYOUT command.
Blank truncation is a difficult problem on z/OS.
In this sample, the PAGEFORMAT named PRTTXT contains all the formatting and printing directives when the condition is true, and the other called PRTIMAGE contains every directive needed to print the image.
HTH
I am working on question 4 of Advent of Code. I have a list of strings called "q4" with 3 lines (just simple data for now), and each line has keys and values, such as the passport ID being 662406624 or the birth year being 1947.
show q4
"eyr:2024 pid:662406624 hcl:#cfa07d byr:1947 iyr:2015 ecl:amb hgt:150cm"
"iyr:2013 byr:1997 hgt:182cm hcl:#ceb3a1 eyr:2027 ecl:gry cid:102 pid:018128"
"hgt:61in iyr:2014 pid:916315544 hcl:#733820 ecl:oth"
I created a function to grab the value for a given key
get_field_value: {[field; pp_str] pp_fields: " " vs pp_str; pid_field: pp_fields[where like[pp_fields; field,":*"]]; start_i: (pid_field[0] ss ":")[0] + 1; end_i: count pid_field[0]; indices: start_i + til (end_i - start_i); pid_field[0][indices]}
fields: ("eyr"; "pid"; "hcl"; "byr"; "iyr"; "ecl"; "hgt")
With the help from this other thread, I could get the values for a given list of keys: kdb/q: How to apply a string manipulation function to a vector of strings to output a vector of strings?
get_field_value[; q4[0]] each fields / Iterates through each field
"2024"
"662406624"
"#cfa07d"
"1947"
"2015"
"amb"
"150cm"
But now how do I do this for each line in my text file (each of the 3 strings in "q4")? In Python or C++ logic, basically I want to do a nested for-loop, the outer loop to iterate through each of the strings, and then within that, for each string, the inner loop grabs the value for each of the keys (fields):
/ Attempt 1 - Fail
get_field_value[each fields ; each q4]
/ Attempt 2 - Fail
each[get_field_value[; each q4]] fields
/ Attempt 3 - Fail
get_field_value[; each q4] each fields
How do I do this? Thanks!
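For reference, the nested loop described above could be sketched in Python as follows. This only illustrates the desired result, it is not a q solution. The Python get_field_value is a stand-in for the q function of the same name, and returning None for a missing key is an assumption.

```python
q4 = [
    "eyr:2024 pid:662406624 hcl:#cfa07d byr:1947 iyr:2015 ecl:amb hgt:150cm",
    "iyr:2013 byr:1997 hgt:182cm hcl:#ceb3a1 eyr:2027 ecl:gry cid:102 pid:018128",
    "hgt:61in iyr:2014 pid:916315544 hcl:#733820 ecl:oth",
]
fields = ["eyr", "pid", "hcl", "byr", "iyr", "ecl", "hgt"]


def get_field_value(field, pp_str):
    """Return the value stored under `field`, or None if the key is absent."""
    for item in pp_str.split(" "):
        key, _, value = item.partition(":")
        if key == field:
            return value
    return None


# Outer loop over the strings, inner loop over the keys.
result = [[get_field_value(field, line) for field in fields] for line in q4]

for row in result:
    print(row)
```

The result is one list per input string, each holding the values in the order given by fields.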
I have a BUNCH of fixed width text files that contain multiple transaction types with only 3 that I care about (121,122,124).
Sample File:
D103421612100188300000300000000012N000002000001000032021420170012260214201700122600000000059500000300001025798
D103421612200188300000300000000011000000000010000012053700028200004017000000010240000010000011NNYNY000001000003N0000000000 00
D1034216124001883000003000000000110000000000300000100000000000CS00000100000001200000033NN0 00000001200
So what I need to do is read these files line by line and look for the lines that have 121, 122, or 124 at startIndex = 9 with length = 3.
Each line needs to be parsed based on a data dictionary I have and the output needs to be grouped by transaction type into three different files.
I have a process that works but it's very inefficient, basically reading each line 3 times. The code I have is something like this:
@121 = EXTRACT
col1 string,
col2 string,
col3 string //etc...
FROM inputFile
USING new MyCustomExtractor(
new SQL.MAP<string, string> {
{"col1","2"},
{"col2","6"},
{"col3","3"} //etc...
}
);
OUTPUT @121
TO "121.csv"
USING Outputters.Csv();
And I have the same code for 122 and 124. My custom extractor takes the SQL MAP and returns the parsed line and skips all lines that don't contain the transaction type I'm looking for.
This approach also means I'm running through all the lines in a file 3 times. Obviously this isn't as efficient as it could be.
What I'm looking for is a high level concept of the most efficient way to read a line, determine if it is a transaction I care about, then output to the correct file.
Thanks in advance.
How about pulling out the transaction type early using the Substring method of the String datatype? Then you can do some work with it, filtering etc. A simple example:
// Test data
@input = SELECT *
FROM (
VALUES
( "D103421612100188300000300000000012N000002000001000032021420170012260214201700122600000000059500000300001025798" ),
( "D103421612200188300000300000000011000000000010000012053700028200004017000000010240000010000011NNYNY000001000003N0000000000 00" ),
( "D1034216124001883000003000000000110000000000300000100000000000CS00000100000001200000033NN0 00000001200" ),
( "D1034216999 0000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000" )
) AS x ( rawData );
// Pull out the transaction type
@working =
SELECT rawData.Substring(8,3) AS transactionType,
rawData
FROM @input;
// !!TODO do some other work here
@output =
SELECT *
FROM @working
WHERE transactionType IN ("121", "122", "124"); //NB Note the case-sensitive IN clause
OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv();
As of today, there is no specific U-SQL function that can define the output location of a tuple on the fly.
wBob presented an approach to a potential workaround. I'd extend the solution the following way to address your need:
Read the entire file, adding a new column that helps you identify the transaction type.
Create 3 rowsets (one for each file) using a WHERE statement with the specific transaction type (121, 122, 124) on the column created in the previous step.
Output each rowset created in the previous step to their individual file.
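Outside of U-SQL, the same idea (tag each line once, then route it by transaction type) can be sketched in a few lines of Python. This only illustrates the concept and is not a replacement for the U-SQL script. The function name split_by_transaction_type is hypothetical, and the sample lines are truncated versions of the data in the question.

```python
from collections import defaultdict

WANTED = {"121", "122", "124"}


def split_by_transaction_type(lines, start=8, length=3):
    """Read each line once and bucket it by its transaction type."""
    groups = defaultdict(list)
    for line in lines:
        tx_type = line[start:start + length]  # zero-based, like Substring(8,3)
        if tx_type in WANTED:
            groups[tx_type].append(line)
    return dict(groups)


# Truncated sample lines; only the first 11 characters matter here.
lines = [
    "D1034216121001883",
    "D1034216122001883",
    "D1034216124001883",
    "D1034216999 00000",
]
groups = split_by_transaction_type(lines)
for tx_type, rows in sorted(groups.items()):
    print(tx_type, len(rows))
```

Each bucket then corresponds to one of the three rowsets, and lines with any other transaction type are dropped in the same pass.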
If you have more feedback or needs, feel free to create an item (and vote for others) on our UserVoice site: https://feedback.azure.com/forums/327234-data-lake. Thanks!
We are trying to implement a file-based student record program. We want to sort the file that contains the student details by roll number, which is the first field of every line.
The file contains the following data:
1/rahul/cs
10/manish sharma/mba
5/jhon/ms
2/ram/bba
We want to sort the file's data by the first field, i.e. the roll number.
Any help would be great.
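You can use sorted:
dataSorted = sorted(f.readlines())
This compares whole lines as strings. To sort by the roll number only, pass a key function that converts the first field to an integer (otherwise 10 would sort before 2):
dataSorted = sorted(f.readlines(), key=lambda line: int(line.split('/')[0]))
f is the file object.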
Further information: help(sorted)
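A small self-contained sketch of the numeric sort, using io.StringIO in place of a real file so it can be run as is. The helper name sort_by_roll_number is an assumption for illustration.

```python
import io


def sort_by_roll_number(lines):
    """Sort 'roll/name/course' records numerically by roll number."""
    records = [line.strip() for line in lines if line.strip()]
    return sorted(records, key=lambda rec: int(rec.split("/", 1)[0]))


# Stand-in for an opened file containing the data from the question.
f = io.StringIO("1/rahul/cs\n10/manish sharma/mba\n5/jhon/ms\n2/ram/bba\n")
data_sorted = sort_by_roll_number(f)
print("\n".join(data_sorted))
```

Converting the roll number with int() is what places 10 after 5. A plain string comparison would put "10/..." before "2/...".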
I'm running a Fortran 90 program that has an array of alpha values with i = 1 to 40. I'm trying to output the array as 5 rows of 8 values using the code below:
write(4,*) "alpha "
write(4,*)alpha(1), alpha(2), alpha(3), alpha(4), alpha(5), alpha(6), alpha(7), alpha(8)
write(4,*)alpha(9), alpha(10), alpha(11), alpha(12), alpha(13), alpha(14), alpha(15), alpha(16)
write(4,*)alpha(17), alpha(18), alpha(19), alpha(20), alpha(21), alpha(22), alpha(23), alpha(24)
write(4,*)alpha(25), alpha(26), alpha(27), alpha(28), alpha(29), alpha(30), alpha(31), alpha(32)
write(4,*)alpha(33), alpha(34), alpha(35), alpha(36), alpha(37), alpha(38), alpha(39), alpha(40)
where 4 is the desired output file. But when I open the output, there are 10 rows instead of 5 each with 5 values then 3 values alternating. Any idea what I can do to avoid this?
Thanks.
Use formatted IO. List-directed IO (i.e., with "*") is designed to be easy but is not fully specified. Different compilers will produce different output. Try something such as:
write (4, '( 8(2X, ES14.6) )' ) alpha (1:8)
Or use a loop:
do i=1, 33, 8
write (4, '( 8(2X, ES14.6) )' ) alpha (i:i+7)
end do
write (4,"(8(1x,f0.4))") alpha
prints the 40 numbers over 5 lines because of Fortran's "format reversion": when the end of the format is reached and data items remain, the format is re-used and the remaining data is printed on a new line.
The site http://www.obliquity.com/computer/fortran/format.html says this about format reversion:
"If there are fewer items in the data transfer list than there are data descriptors, then all of the unused descriptors are simply ignored. However, if there are more items in the data transfer list than there are data descriptors, then forced reversion occurs. In this case, FORTRAN 77 advances to the next record and rescans the format, starting with the right-most left parenthesis, including any repeat-count indicators. It then re-uses this part of the format. If there are no inner parenthesis in the FORMAT statement, then the entire format is reused."
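As a rough analogy only (this is Python, not Fortran), the effect of format reversion with a format holding 8 descriptors can be mimicked by consuming the data 8 items at a time and starting a new record whenever the format runs out. The function name write_with_reversion and the alpha values are assumptions made for illustration, and the exact number formatting will differ from a Fortran compiler's f0.4 output.

```python
def write_with_reversion(values, per_record=8):
    """Emit one record per `per_record` values, re-using the same format."""
    records = []
    for start in range(0, len(values), per_record):
        chunk = values[start:start + per_record]
        # Rough stand-in for the (1x,f0.4) descriptor pair.
        records.append("".join(f" {v:.4f}" for v in chunk))
    return records


alpha = [i / 10 for i in range(1, 41)]  # 40 made-up values
rows = write_with_reversion(alpha)
for row in rows:
    print(row)
```

With 40 values and 8 descriptors per pass, the format is used five times, which is why a single write statement yields five lines.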