Skip element in BizTalk flat file assembly? - xsd

I've been tasked with mapping an input XML (actually an SAP IDoc XML) and generating a number of flat files. Each input XML may yield multiple output files (one output file per lot number), so I will be using xsl:key and the key() function in my mapping, keyed on the lot number.
The catch is that the lot number will not appear in the file content itself, but the output file name needs to contain that lot number value.
So the question really is: can I map the lot number into the XML and have the flat file assembler skip it when it produces the file? Or is there another way the lot number can be used as the file name by the assembler without having it inside the file itself?

In your orchestration you can set a context property for each output message:
msgOutput(FILE.ReceivedFileName) = "DynamicStuff";
msgOutput then goes to the send shape.
In your send port you set the output file like this:
FixedStuff_%SourceFileName%.xml
The result:
FixedStuff_DynamicStuff.xml

If the value is not required in the message content, don't map it. That's it.
To insert a value into the file name, the lot number in this case, you will need to promote that value to the FILE.ReceivedFileName context property. Then you can use the %SourceFileName% macro as part of the name setting in the Send Port. You can set FILE.ReceivedFileName by either property promotion or xpath() in an orchestration.
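For example, in a Message Assignment shape you could read the lot number straight out of the message with xpath() and promote it. This is only a sketch; the LotNumber element and its path are assumptions about your schema:
// Message Assignment shape: pull the lot number from the message and
// stash it in FILE.ReceivedFileName so the send port macro can use it.
// Adjust the XPath to match your IDoc schema.
msgOutput(FILE.ReceivedFileName) = xpath(msgOutput, "string(//*[local-name()='LotNumber'])");
With the send port file name set to something like Lot_%SourceFileName%.csv, each output file then picks up its lot number without the value appearing in the file content.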
Bonus: sorting and grouping in XSLT is rather unwieldy, which is why I don't do that anymore. Instead, you can use SQL: BizTalk: Sorting and Grouping Flat File Data In SQL Instead of XSL

Related

Copy CSV File with Multiline Attribute with Azure Synapse Pipeline

I have a CSV file in the following format which I want to copy from an external share to my data lake:
Test; Text
"1"; "This is a text
which goes on on a second line
and on on a third line"
"2"; "Another Test"
I now want to load it with a Copy Data task in an Azure Synapse pipeline. The result is the following:
Test; Text
"1";" \"This is a text"
"which goes on on a second line";
"and on on a third line\"";
"2";" \"Another Test\""
So, as you can see, it is not handling the multi-line text correctly. I also do not see an option to handle multiline text within a Copy Data task. Unfortunately I'm not able to use a Data Flow task, because it does not allow running on an external Azure runtime, which I'm forced to use due to security reasons.
Of course I'm not talking about this single test file only; I have many thousands of files.
My settings for the CSV file look as follows:
CSV Connector Settings
Can someone tell me how to handle this kind of multiline data correctly?
Do I have any other options within Synapse (apart from the Dataflows)?
Thanks a lot for your help
Well, it turns out this is not possible with a CSV file.
The pragmatic solution is to transfer the CSV files as binary instead, and only load and transform them later on with a Python notebook in Synapse.
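Once the files have been copied as binary, a notebook can parse the quoted multi-line fields without trouble. Here is a minimal PySpark sketch; the paths, container names and the semicolon delimiter are assumptions based on the sample above:
# Synapse notebook (PySpark): parse the raw CSV that was copied as binary.
# The multiLine option lets quoted fields span several lines.
df = (spark.read
    .option("header", True)
    .option("sep", ";")
    .option("quote", '"')
    .option("escape", '"')
    .option("multiLine", True)
    .csv("abfss://raw@yourlake.dfs.core.windows.net/input/"))

# Persist the parsed data, e.g. as Parquet, for downstream use.
df.write.mode("overwrite").parquet("abfss://curated@yourlake.dfs.core.windows.net/output/")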
You can achieve this in Azure Data Factory by iterating through all lines and checking for the delimiter in each line. Then use string manipulation functions with Set Variable activities to convert the multi-line data to a single line.
Look at the following example. I have a Set Variable activity with an empty value (taken from a parameter) for the req variable.
In a Lookup activity, create a dataset with the following configuration pointing to the multiline CSV:
In a ForEach activity, I iterate over each row by giving the items value as @range(0,sub(activity('Lookup1').output.count,1)). Inside the ForEach, I have an If Condition activity with the following condition:
@contains(activity('Lookup1').output.value[item()]['Prop_0'],';')
If this is true, then I concatenate the current row onto the req variable using two Set Variable activities.
temp: @if(contains(activity('Lookup1').output.value[add(item(),1)]['Prop_0'],';'),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],decodeUriComponent('%0D%0A')),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' '))
actual (req variable): @variables('val')
For false, I have handled the concatenation in the following way:
temp1: @concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' ')
actual1 (req variable): @variables('val2')
Now, I have used a final variable to handle the last line of the file, with the following dynamic content:
@if(contains(last(activity('Lookup1').output.value)['Prop_0'],';'),concat(variables('req'),decodeUriComponent('%0D%0A'),last(activity('Lookup1').output.value)['Prop_0']),concat(variables('req'),last(activity('Lookup1').output.value)['Prop_0']))
Finally, I have added a Copy Data activity with a sample source file of 1 column and 1 row (using this to copy our actual data).
Now, take the source file configuration as shown below:
Create an additional column whose value is the final variable value:
Create a sink with the following configuration and select mapping only for the column created above:
When I run the pipeline, I get the data as required.

Azure Data Factory removing spaces from column names of csv file

I'm a bit new to Azure Data Factory, so apologies if I'm missing anything obvious. I've done several searches and I can't find anything that quite fits.
The situation is that we have an existing pipeline that takes the path to a CSV file and passes this in as a delimited dataset. As a sink it uses a Parquet dataset. This is a generic process that we can pass any delimited file into, and it will output it as Parquet.
This has been working well, but now we have started receiving files with spaces and special characters in the header, which causes the output to Parquet to fail. Unfortunately we don't have control over the format of the files we receive, so I can't handle this at source.
What I would like to do is, on ingestion of the file, replace any spaces and other special characters in the header with an underscore. If I were doing this on premise I could quickly create a PowerShell script to do it. I had thought about creating a custom task in ADF to call a PowerShell script to do this in the blob storage, but that seems more complicated than it should be. Is there something else I can do to get this process working while keeping it generic?
As @Joel Cochran mentioned, you can use the expression below in a Select transformation to replace spaces and special characters in the header.
regexReplace($$,'[^a-zA-Z]','_')
Source:
In the Select transformation, remove the auto mappings and add a new rule-based mapping that uses this expression.
preview:
You can change the output filename, though not directly in the Copy activity (assuming you are using this activity).
The workaround is to use a parameter for the output filename that you can clean up.
You can use the Get Metadata activity to get all filenames from the source CSV files.
Then loop over these files with a ForEach activity.
Within the ForEach activity you can set the output filename to the new, cleaned value.
The function could look like this:
@replace(item().name, ' ', '_')
More information on the replace function

Create automated report from web data

I have a set of multiple APIs I need to source data from, and I need four different data categories. This data is then used for reporting purposes in Excel.
I initially created web queries in Excel, but my laptop just crashes because there are too many queries which have to be updated. Do you know a smart workaround?
This is an example of the API I will source data from (40 different ones in total):
https://api.similarweb.com/SimilarWebAddon/id.priceprice.com/all
The data points I need are:
EstimatedMonthlyVisits, TopOrganicKeywords, OrganicSearchShare, TrafficSources
Any ideas how I can create an automated report which queries the above data on request?
Thanks so much.
If Excel is crashing due to the demand, and that doesn't surprise me, you should consider using Python or R for this task. The walkthrough below, adapted from the InformIT article linked at the end, shows the general approach in R using an XML file as the example data source. First, install and load the required packages:
install.packages("XML")
install.packages("plyr")
install.packages("ggplot2")
install.packages("gridExtra")
require("XML")
require("plyr")
require("ggplot2")
require("gridExtra")
Next we need to set our working directory and parse the XML file as a matter of practice, so we're sure that R can access the data within the file. This is basically reading the file into R. Then, just to confirm that R knows our file is in XML, we check the class. Indeed, R is aware that it's XML.
setwd("C:/Users/Tobi/Documents/R/InformIT") #you will need to change the filepath on your machine
xmlfile=xmlParse("pubmed_sample.xml")
class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
Now we can begin to explore our XML. Perhaps we want to confirm that our HTTP query on Entrez pulled the correct results, just as when we query PubMed's website. We start by looking at the contents of the first node or root, PubmedArticleSet. We can also find out how many child nodes the root has and their names. This process corresponds to checking how many entries are in the XML file. The root's child nodes are all named PubmedArticle.
xmltop = xmlRoot(xmlfile) #gives content of root
class(xmltop)#"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
xmlName(xmltop) #give name of node, PubmedArticleSet
xmlSize(xmltop) #how many children in node, 19
xmlName(xmltop[[1]]) #name of root's children
To see the first two entries, we can do the following.
# have a look at the content of the first child entry
xmltop[[1]]
# have a look at the content of the 2nd child entry
xmltop[[2]]
Our exploration continues by looking at subnodes of the root. As with the root node, we can list the name and size of the subnodes as well as their attributes. In this case, the subnodes are MedlineCitation and PubmedData.
#Root Node's children
xmlSize(xmltop[[1]]) #number of nodes in each child
xmlSApply(xmltop[[1]], xmlName) #name(s)
xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
xmlSApply(xmltop[[1]], xmlSize) #size
We can also separate each of the 19 entries by these subnodes. Here we do so for the first and second entries:
#take a look at the MedlineCitation subnode of 1st child
xmltop[[1]][[1]]
#take a look at the PubmedData subnode of 1st child
xmltop[[1]][[2]]
#subnodes of 2nd child
xmltop[[2]][[1]]
xmltop[[2]][[2]]
The separation of entries is really just us indexing into the tree structure of the XML. We can continue to do this until we exhaust a path, or, in XML terminology, reach the end of the branch. We can do this via the numbers of the child nodes or their actual names:
#we can keep going till we reach the end of a branch
xmltop[[1]][[1]][[5]][[2]] #title of first article
xmltop[['PubmedArticle']][['MedlineCitation']][['Article']][['ArticleTitle']] #same command, but more readable
Finally, we can transform the XML into a more familiar structure—a dataframe. Our command completes with errors due to non-uniform formatting of data and nodes. So we must check that all the data from the XML is properly inputted into our dataframe. Indeed, there are duplicate rows, due to the creation of separate rows for tag attributes. For instance, the ELocationID node has two attributes, ValidYN and EIDType. Take the time to note how the duplicates arise from this separation.
#Turning XML into a dataframe
Madhu2012=ldply(xmlToList("pubmed_sample.xml"), data.frame) #completes with errors: "row names were found from a short variable and have been discarded"
View(Madhu2012) #for easy checking that the data is properly formatted
Madhu2012.Clean=Madhu2012[Madhu2012[25]=='Y',] #gets rid of duplicated rows
Here is a link that should help you get started.
http://www.informit.com/articles/article.aspx?p=2215520
If you have never used R before, it will take a little getting used to, but it's worth it. I've been using it for a few years now, and I have seen R perform anywhere from a couple of hundred percent to many thousands of percent faster than Excel. Good luck.
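If you would rather script the pull in Python, the same idea only takes a few lines. This is a rough sketch: it assumes the endpoints return JSON and that keys such as EstimatedMonthlyVisits exist in the response, which you would need to verify against the actual API output:
import requests
import pandas as pd

# List of the ~40 endpoints to query (only the example from above is shown).
API_URLS = [
    "https://api.similarweb.com/SimilarWebAddon/id.priceprice.com/all",
]

rows = []
for url in API_URLS:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = resp.json()  # assumes the endpoint returns JSON
    rows.append({
        "source": url,
        # The keys below mirror the data points listed in the question and
        # are assumptions about the response structure.
        "EstimatedMonthlyVisits": data.get("EstimatedMonthlyVisits"),
        "TrafficSources": data.get("TrafficSources"),
    })

# Write a file Excel can open, instead of refreshing dozens of web queries.
pd.DataFrame(rows).to_csv("similarweb_report.csv", index=False)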

Can I import SAP tables that were exported by SE16?

I have exported the contents of a table with transaction SE16, by selecting all the entries and choosing Download, unconverted.
I'd like to import these entries into another system (where the same table exists and is active).
Furthermore, when I import, there's a possibility that the specific key already exists for a number of entries (old entries).
Other entries won't have a field with the same key present in the table where they're to be imported (new entries).
Is there a way to easily update my table in the second system with the file provided from the first system? If needed, I can export the data in the 3 other format types (Spreadsheet, Rich text format and HTML format). It seems to me, though, that the spreadsheet and rich text formats sometimes corrupt the data, and the HTML is far too verbose.
[EDIT]
As per popular demand, the table I'm trying to export / import is a Z table whose fields are all numeric, character, date or time fields (flat data types).
I'm trying to do it like this because the clients don't have any Basis resource to help them with transports, and would like to more or less automate the process of updating one of the tables in one system.
At the moment it's a business request to do it like this, but I'm open to suggestions (and the clients are open too).
Edit
OK, I doubt that what you describe in your comment exists out of the box, but you can easily write something like this yourself:
Create a method (or function module, if that floats your boat) that accepts the following:
iv_table_name TYPE string and
iv_filename TYPE string
This would be the method:
method upload_table.

  data: lt_table type ref to data,
        lx_root  type ref to cx_root.

  field-symbols: <table> type standard table.

  try.
      create data lt_table type table of (iv_table_name).
      assign lt_table->* to <table>.

      call method cl_gui_frontend_services=>gui_upload
        exporting
          filename            = iv_filename
          has_field_separator = abap_true
        changing
          data_tab            = <table>
        exceptions
          others              = 4.
      if sy-subrc <> 0.
        "Some appropriate error handling, e.g.
        "message id sy-msgid type 'I'
        "  number sy-msgno
        "  with sy-msgv1 sy-msgv2
        "       sy-msgv3 sy-msgv4.
        return.
      endif.

      modify (iv_table_name) from table <table>.
      "write: / sy-tabix, ' entries updated'.

    catch cx_root into lx_root.
      "lv_text = lx_root->get_text( ).
      "Some appropriate error handling
      return.
  endtry.

endmethod.
This would still require you to make sure that the exported file matches the table that you want to import into. However, cl_gui_frontend_services=>gui_upload should return sy-subrc > 0 in that case, so you can bail out before you corrupt any data.
Original Answer:
I'll assume that you want to update a z-table and not a SAP standard table.
You will probably have to format your datafile a little bit to make it tab or comma delimited.
You can then upload the data file using cl_gui_frontend_services=>gui_upload
Then, if you want to overwrite the existing data in the table, you can use:
modify zmydbtab from table it_importeddata.
If you do not want to overwrite existing entries, you can use:
insert zmydbtab from table it_importeddata accepting duplicate keys.
You will get a return code of sy-subrc = 4 if any of the keys already exists, but any new entries will be inserted.
Note
There are many reasons why you would NOT do this for an SAP standard table. Most prominent is that there is almost always more to the data model than we are aware of. Also, when creating transactional data, there are often follow-on events or workflows that kick off, and that will not happen if you're updating the database directly. As a rule of thumb, it is usually a bad idea to update SAP standard tables directly.
In that case, try to find a BAdI, or if that's not available, record a BDC and do the updates that way.
If the system landscape were set up correctly, your client would not need any kind of Basis operations support whatsoever to perform the transports. So instead of re-inventing the wheel, I'd strongly suggest catching up on what the CTS and TMS can do once they're set up with sensible settings.

How to use Relational Stores with a position based data file?

I have different data files that are mapped onto relational stores. I have a formatter which contains the separators used by the different data files (most of them CSV). Here is an example of what one looks like:
DQKI 435741198746445 45879645422727JHUFHGLOBAL COLLATERAL SERVICES AGGREGATOR V9
The rule to read this file is as follows: from index 0 to 3 it's the code name, from index 8 to 11 it's the PID, from index 11 to 20 it's the account number, and so on...
How do you specify such a rule in the ActivePivot Relational Stores?
The Relational Store of ActivePivot ships with a high-performance, multithreaded CSV Source to parse files and load them into data stores. I suppose that's what you hope to use for your fixed-length-field file.
But fixed-length fields are not supported in the current version of the Relational Store (1.5.x).
You could pre-process your file with a small script to add a separator character at the end of each of the fields. Then the entire CSV Source can be reused immediately.
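As an illustration, a small pre-processing script could slice each line at the documented offsets and emit a delimited file. This is only a sketch: the offsets are taken from the example in the question and would need to be completed and adjusted for the real record layout:
# Convert the fixed-width file into a delimited file that the CSV Source can ingest.
# Field boundaries follow the description above (0-3 code name, 8-11 PID,
# 11-20 account number); extend and adjust the list for the remaining fields.
FIELDS = [(0, 4), (8, 11), (11, 20)]  # (start, end) slice offsets per field

with open("input.dat") as src, open("output.csv", "w") as dst:
    for line in src:
        values = [line[start:end].strip() for start, end in FIELDS]
        dst.write(";".join(values) + "\n")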
You could write your own data source that defines fields as offsets in the text line. If you do that, you can reuse all of the fast field parsers available in the CSV Source project (they work on any char sequences):
com.quartetfs.fwk.format.impl.DoubleParser
com.quartetfs.fwk.format.impl.FloatParser
com.quartetfs.fwk.format.impl.DoubleVectorParser
com.quartetfs.fwk.format.impl.FloatVectorParser
com.quartetfs.fwk.format.impl.IntegerParser
com.quartetfs.fwk.format.impl.IntegerVectorParser
com.quartetfs.fwk.format.impl.LongParser
com.quartetfs.fwk.format.impl.ShortParser
com.quartetfs.fwk.format.impl.StringParser
com.quartetfs.fwk.format.impl.DateParser
