Converting a bibliographic record from RUSMARC to MARC21 - bibliography

I have a bibliographic record in the RUSMARC (Russian UNIMARC) standard. For further processing, I need to convert this record into MARCXML (MARC21 in XML) format.
How can I accomplish such a transformation programmatically?
UPDATE
I have a routine to read and parse the ISO 2709 format. However, RUSMARC (and UNIMARC in general) differs from MARC21 in terms of field meanings.

UNIMARC records should be converted to MARC21 according to the specifications published by the Library of Congress (http://www.loc.gov/marc/unimarctomarc21.html).
First, you need to read your RUSMARC (UNIMARC) record into memory and construct XML according to the UNISlim schema (http://www.rusmarc.ru/shema/UNISlim.xsd).
Then you can use an XSL transformation that converts UNIMARC XML (in the UNISlim schema) into MARCXML.
You can get this XSL transformation here: https://github.com/edsd/biblio-metadata
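On Windows, the second step could be scripted in PowerShell with the .NET XslCompiledTransform class. A minimal sketch, assuming record-unislim.xml is the UNISlim XML you built from the RUSMARC record and UNIMARC2MARC21slim.xsl is the stylesheet taken from that repository (both file names are placeholders):
# Apply the UNIMARC(UNISlim)-to-MARCXML stylesheet to the record (file names are placeholders).
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load("C:\marc\UNIMARC2MARC21slim.xsl")
$xslt.Transform("C:\marc\record-unislim.xml", "C:\marc\record-marcxml.xml")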

Related

Skip element in BizTalk flat file assembly?

I've been tasked to map an input XML (actually an SAP IDoc XML) and to generate a number of flat files. Each input XML may yield multiple output files (one output file per lot number), so I will be using xsl:key and the key() function in my mapping, based on the lot number.
The thing is, the lot number will not be in the file itself, but the output file name needs to contain that lot number value.
So the question really is: can I map the lot number to the XML and have the flat file assembler skip it when it produces the file? Or is there another way the lot number can be applied as the file name by the assembler without having it inside the file itself?
In your orchestration you can set a context property for each output message:
msgOutput(FILE.ReceivedFileName) = "DynamicStuff";
msgOutput then goes to the send shape.
In your send port you set the output file like this:
FixedStuff_%SourceFileName%.xml
The result:
FixedStuff_DynamicStuff.xml
If the value is not required in the message content, don't map it. That's it.
To insert a value in the file name, the lot number in this case, you will need to promote that value to the FILE.ReceivedFileName context property. Then, you can use the %SourceFileName% macro as part of the name setting in the Send Port. You can set FILE.ReceivedFileName by either property promotion or xpath() in an orchestration.
Bonus: sorting and grouping in XSLT is rather unwieldy, which is why I don't do that anymore. Instead, you can use SQL: BizTalk: Sorting and Grouping Flat File Data In SQL Instead of XSL

Vertex error and encoding error in Azure Data Lake Analytics while loading data. What could be the reasons?

USE DATABASE retail;
@log =
    EXTRACT id int,
            item string
    FROM "/Retailstock/stock.txt"
    USING Extractors.Tsv();

INSERT INTO sales.stock
SELECT id, item FROM @log;
This is a question from an Azure Data Lake Analytics course. I need to load the table sales.stock in the sales schema.
It gives a vertex error and an encoding error.
I can't figure out the problem after two days of head-banging. Thanks.
It might be due to an encoding mismatch. Extractors have a default encoding of UTF-8, and if the encoding of your source file is different, a runtime error will occur during extraction.
You can change the encoding by providing the "encoding" parameter, e.g.:
USING Extractors.Text(encoding : Encoding.[ASCII]);
See more about supported encodings here: Extractor parameters - encoding
In my case, setting the encoding to ASCII in Extractors.Text() / Extractors.Tsv() did not work. I'm not sure why, since the file is clearly ASCII-encoded. I had to manually convert the file to UTF-8.
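If you end up doing the same conversion, a minimal PowerShell sketch for re-encoding the file before uploading it again might look like this (paths are placeholders):
# Re-write the source file as UTF-8 so it matches the extractor's default encoding.
Get-Content -Path "C:\data\stock.txt" -Encoding ASCII |
    Set-Content -Path "C:\data\stock-utf8.txt" -Encoding UTF8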

Can I import SAP tables that were exported by SE16?

I have exported the contents of a table with transaction SE16, by selecting all the entries and choosing Download, Unconverted.
I'd like to import these entries into another system (where the same table exists and is active).
Furthermore, when I import, there's a possibility that the specific key already exists for a number of entries (old entries).
Other entries won't have a field with the same key present in the table where they're to be imported (new entries).
Is there a way to easily update my table in the second system with the file provided from the first system? If needed, I can export the data in the 3 other format types (Spreadsheet, Rich text format and HTML format). It seems to me, though, that the spreadsheet and rich text formats sometimes corrupt the data, and the HTML is far too verbose.
[EDIT]
As per popular demand, the table I'm trying to export/import is a Z table whose fields are all numeric, character, date or time fields (flat data types).
I'm trying to do it like this because the clients don't have any Basis resource to help them with transports, and would like to "kinda" automate the process of updating one of the tables in one system.
At the moment it's a business request to do it like this, but I'm open to suggestions (and the clients are open too).
Edit
OK, I doubt that what you describe in your comment exists out of the box, but you can easily write something like it:
Create a method (or function module if that floats your boat) that accepts the following:
iv_table_name TYPE string and
iv_filename TYPE string
This would be the method:
method upload_table.
  data: lt_table type ref to data,
        lx_root  type ref to cx_root.
  field-symbols: <table> type standard table.

  try.
      create data lt_table type table of (iv_table_name).
      assign lt_table->* to <table>.

      " Upload the tab-delimited file from the frontend into the dynamically typed table.
      call method cl_gui_frontend_services=>gui_upload
        exporting
          filename            = iv_filename
          has_field_separator = abap_true
        changing
          data_tab            = <table>
        exceptions
          others              = 4.
      if sy-subrc <> 0.
        " Some appropriate error handling
        " message id sy-msgid type 'I'
        "   number sy-msgno
        "   with sy-msgv1 sy-msgv2
        "        sy-msgv3 sy-msgv4.
        return.
      endif.

      " Insert new rows and overwrite existing rows with the same key.
      modify (iv_table_name) from table <table>.
      " write: / sy-tabix, ' entries updated'.
    catch cx_root into lx_root.
      " lv_text = lx_root->get_text( ).
      " some appropriate error handling
      return.
  endtry.
endmethod.
This would still require you to make sure that the exported file matches the table you want to import. However, cl_gui_frontend_services=>gui_upload should set sy-subrc > 0 in that case, so you can bail out before you corrupt any data.
Original Answer:
I'll assume that you want to update a Z table and not an SAP standard table.
You will probably have to format your data file a little to make it tab- or comma-delimited.
You can then upload the data file using cl_gui_frontend_services=>gui_upload.
Then, if you want to overwrite the existing data in the table, you can use
modify zmydbtab from table it_importeddata.
If you do not want to overwrite existing entries, you can use
insert zmydbtab from table it_importeddata.
You will get a return code of sy-subrc = 4 if any of the keys already exists, but any new entries will be inserted.
Note
There are many reasons why you would NOT do this for an SAP standard table. The most prominent is that there is almost always more to the data model than we are aware of. Also, when creating transactional data, there are often follow-on events or workflows that kick off, which will not happen if you update the database directly. As a rule of thumb, it is usually a bad idea to update SAP standard tables directly.
In that case, try to find a BAdI or, if that's not available, record a BDC and do the updates that way.
If the system landscape was set up correctly, your client would not need any kind of Basis operations support whatsoever to perform the transports. So instead of re-inventing the wheel, I'd strongly suggest catching up on what the CTS and TMS can do once they're set up with sensible settings.

How to use Relational Stores with a position based data file?

I have different data files that are mapped to relational stores. I have a formatter which contains the separators used by the different data files (most of them CSV). Here is an example of what one of them looks like:
DQKI 435741198746445 45879645422727JHUFHGLOBAL COLLATERAL SERVICES AGGREGATOR V9
The rule for reading this file is as follows: from index 0 to 3 it's the code name, from index 8 to 11 it's the PID, from index 11 to 20 it's the account number, and so on.
How do you specify such a rule in the ActivePivot Relational Stores?
The relational store of ActivePivot ships with a high-performance, multithreaded CSV Source to parse files and load them into data stores. I suppose that's what you hope to use for your fixed-length-field file.
But this is not supported in the current version of the Relational Store (1.5.x).
You could pre-process your file with a small script to add a separator character at the end of each of the fields. Then the entire CSV Source can be reused immediately.
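As an illustration of that first option, here is a minimal PowerShell sketch; the offsets follow the example layout described above and will likely need adjusting, and the file names are placeholders:
# Cut each fixed-width line at the known offsets and emit a separator-delimited file
# that the standard CSV Source can load. Extend with the remaining fields as needed.
Get-Content -Path "input.dat" | ForEach-Object {
    $codeName = $_.Substring(0, 4).Trim()
    $pidField = $_.Substring(8, 3).Trim()
    $account  = $_.Substring(11, 9).Trim()
    "$codeName,$pidField,$account"
} | Set-Content -Path "output.csv"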
Alternatively, you could write your own data source that defines fields as offsets into the text line. If you do that, you can reuse all of the fast field parsers available in the CSV Source project (they work on any char sequences):
com.quartetfs.fwk.format.impl.DoubleParser
com.quartetfs.fwk.format.impl.FloatParser
com.quartetfs.fwk.format.impl.DoubleVectorParser
com.quartetfs.fwk.format.impl.FloatVectorParser
com.quartetfs.fwk.format.impl.IntegerParser
com.quartetfs.fwk.format.impl.IntegerVectorParser
com.quartetfs.fwk.format.impl.LongParser
com.quartetfs.fwk.format.impl.ShortParser
com.quartetfs.fwk.format.impl.StringParser
com.quartetfs.fwk.format.impl.DateParser

How can I search in PDF documents/PDX catalog in PowerShell?

I have a vendor that supplies their documentation library as a series of PDF files (and some CHM files) and includes a .PDX catalog as well.
I want to write a PowerShell script to front-end it (using either PowerShell forms, or hosting PowerShell in ASP.NET).
I'm in the early stages. I've worked out how to get document information from the PDF stream (the xmpmeta XML metadata block near the end of the PDF file, one of the few streams in the file that's in plaintext), which looks like this:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:08:04">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <pdf:Producer>GPL Ghostscript 8.64</pdf:Producer>
      <pdf:Keywords>86000056-413</pdf:Keywords>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:ModifyDate>2011-03-03T17:38:34-05:00</xmp:ModifyDate>
      <xmp:CreateDate>2011-01-28T23:12:07+05:30</xmp:CreateDate>
      <xmp:CreatorTool>PScript5.dll Version 5.2</xmp:CreatorTool>
      <xmp:MetadataDate>2011-03-03T17:38:34-05:00</xmp:MetadataDate>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
      <xmpMM:DocumentID>6cb2263d-2d61-11e0-0000-1390d57dcfcb</xmpMM:DocumentID>
      <xmpMM:InstanceID>uuid:1a0e68ba-14ad-4a03-b7a1-0a0e127b8753</xmpMM:InstanceID>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/pdf</dc:format>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">I/O Subsystem Programming Guide</rdf:li>
        </rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Unisys Information Development</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">ClearPath MCP 13.1,Application Development,Administration,ClearPath MCP</rdf:li>
        </rdf:Alt>
      </dc:description>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
using the following code (PowerShell v3; in v2 you need to select and expand the properties, i.e. [string]$title = ($rdf.GetElementsByTagName('dc:title') | Select -expand Alt | Select -expand li)."#text"):
$file = ".\Downloads\68698703-007\PDF\86000056-413.pdf"
#determine what line in file the xmpmeta string starts
[int]$startln = (select-string -pattern '^<x:' $file).ToString().Split(":")[2]
#determine what line in file the xmpmeta string ends
[int]$endln = (select-string -pattern '^</x:' $file).ToString().Split(":")[2]
$startln--
#grab the xmpmeta and cast as type xml
[xml]$xmp = (gc $file)["$startln".."$endln"]
[xml]$rdf = $xmp.xmpmeta.InnerXml
#get title/creator/description element text
[string]$title = $rdf.GetElementsByTagName('dc:title').Alt.li."#text"
[string]$creator = $rdf.GetElementsByTagName('dc:creator').Alt.li."#text"
[string]$description = $rdf.GetElementsByTagName('dc:description').Alt.li."#text"
That's crucial because the filenames are in the format 12345678-123.pdf; the actual title is in the metadata itself, as well as the document category, etc.
So, I can produce a list of documents (displaying their proper titles, not the real filenames) and allow them to be launched, but I also want to be able to search all the documents using the PDX file, and that is by no means plaintext!
I guess I could use one of a number of tools out there to convert each PDF to text, search it, repeat for each document, and then return the results per document.
But it strikes me that Adobe Reader already does that, so can I either start AcroRd32.exe with switches that will start the search, with search terms I've passed in to the AcroRd32 program, or can I use the Adobe Search.API from within PowerShell?
Any ideas, specifically on automating loading the .PDX in Adobe Reader and firing off the search, or on using Adobe's API in PowerShell?
EDIT:
I can now launch Acrobat from the command line and search (so I could mimic this in PowerShell), but the search only works when searching a PDF, not a PDX catalog. Both bring up the search pane, but only in a PDF document does the search field get populated and the search executed.
C:\Program Files (x86)\Adobe\Reader 10.0\Reader>AcroRd32.exe /A "search=trim" "P:\Doc Library\PDF\00_home.pdx"
Or
C:\Program Files (x86)\Adobe\Reader 10.0\Reader>AcroRd32.exe /A "search=trim" "P:\Doc Library\PDF\86000056-413.pdf"
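The same call can be issued from PowerShell with the call operator (a quick sketch; it inherits the same limitation that the search field is only populated for a PDF, not the PDX catalog):
# Launch Adobe Reader with the /A switch carrying the search term.
$reader = "C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe"
& $reader /A "search=trim" "P:\Doc Library\PDF\86000056-413.pdf"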
Regards,
Graham
This is an old post, but be aware that the searching you do is potentially dangerous and that there is a better way to find the XMP metadata in a PDF file. XMP was designed specifically to be "findable" by text search. For that purpose it has well-defined begin and end markers, put there specifically so that you can extract the XMP data without having to parse the PDF format (or any other format the XMP metadata blob might be embedded in).
You can download the XMP specification here: http://www.adobe.com/devnet/xmp.html. Part 1 is the part that explains how a text scanner can find the XMP packet reliably.
Finally, PDF has an additional quirk in that it allows incremental updates. This can cause multiple XMP packets to appear in the file (where the last packet is normally the correct one). Annoyingly, when the PDF is exported from applications like InDesign, images in the PDF (and other objects) may also have their own "object" XMP attached to them.
So consider where your files come from and how many strange things you might encounter and want to provision for. But reading the XMP specification is certainly not a bad idea.
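As an illustration of that approach, here is a minimal PowerShell sketch (no error handling, reusing the $file variable from the code above) that pulls the last XMP packet out of the file by scanning for the xpacket markers instead of the x:xmpmeta element:
# Read the whole file as text and locate the last XMP packet via its xpacket markers.
$raw   = [System.IO.File]::ReadAllText($file)
$begin = $raw.LastIndexOf('<?xpacket begin=')
$end   = $raw.IndexOf('?>', $raw.IndexOf('<?xpacket end=', $begin)) + 2
# The extracted packet (the two xpacket processing instructions around x:xmpmeta) is well-formed XML.
[xml]$xmp = $raw.Substring($begin, $end - $begin)
# $xmp.xmpmeta.InnerXml can then be fed into the same dc:title / dc:creator extraction as above.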

Resources