Concatenate long strings from multiple records into one string

I have a situation where I need to concatenate long strings from multiple records in an Oracle database into a single string. These long strings are portions of a larger XML string, and my ultimate goal is to be able to convert this XML into something resembling query results and pull out specific values.
The data would look something like this, with the MSG_LINE_TEXT field being VARCHAR2(4000). So if the total message is less than 4000 characters, then there'd only be one record. In theory, there could be an infinite number of records for each message, although the highest I've seen so far is 14 records, which means I need to be able to handle strings that are at least 56000 characters long.
MESSAGE_ID  MSG_LINE_NUMBER  MSG_LINE_TEXT
----------  ---------------  --------------------------------
17415414    1                Some XML snippet here
17415414    2                Some XML snippet here
17415414    3                Some XML snippet here
17415414    4                Some XML snippet here
The total XML for one MESSAGE_ID might look something like this. There could be many App_Advice_Error tags, although this specific example only contains one.
<tXML>
  <Header>
    <Source>MANH_prod_wmsweb</Source>
    <Action_Type />
    <Sequence_Number />
    <Company_ID>1</Company_ID>
    <Msg_Locale />
    <Version />
    <Internal_Reference_ID>17415414</Internal_Reference_ID>
    <Internal_Date_Time_Stamp>2021-02-09 13:45:22</Internal_Date_Time_Stamp>
    <External_Reference_ID />
    <External_Date_Time_Stamp />
    <User_ID>ESBUSER</User_ID>
    <Message_Type>RESPONSE</Message_Type>
  </Header>
  <Response>
    <Persistent_State>0</Persistent_State>
    <Error_Type>2</Error_Type>
    <Resp_Code>501</Resp_Code>
    <Response_Details>
      <Application_Advice>
        <Shipper_ID />
        <Imported_Object_Type>ASN</Imported_Object_Type>
        <Response_Type>Error</Response_Type>
        <Transaction_Date>2/9/21 13:45</Transaction_Date>
        <Application_Ackg_Code>TE</Application_Ackg_Code>
        <Business_Unit></Business_Unit>
        <Tran_Set_Identifier_Code></Tran_Set_Identifier_Code>
        <Transaction_Purpose_Code>11</Transaction_Purpose_Code>
        <Imported_Message_Id></Imported_Message_Id>
        <Imported_Object_Id>Reference Number Here</Imported_Object_Id>
        <Additional_References>
          <Additional_Reference_Info>
            <Reference_Type>BusinessPartner</Reference_Type>
            <Reference_ID></Reference_ID>
          </Additional_Reference_Info>
        </Additional_References>
        <App_Advice_Errors>
          <App_Advice_Error>
            <App_Error_Text>Some error text here</App_Error_Text>
            <Error_Message_Tokens>
              <Error_Message_Token>Object that errored out</Error_Message_Token>
            </Error_Message_Tokens>
            <App_Err_Cond_Code>6100234</App_Err_Cond_Code>
          </App_Advice_Error>
        </App_Advice_Errors>
        <Imported_Data></Imported_Data>
      </Application_Advice>
    </Response_Details>
  </Response>
</tXML>
The values that I'm most interested in pulling out are the App_Err_Cond_Code, Error_Message_Token, and App_Error_Text tags. I had tried using something like this:
extractvalue(xmltype(msg_line_text), '//XPath of Tag')
This works beautifully for stuff where the entire XML is less than 4000 characters, i.e. the entire XML is stored in a single record. The problem comes when there are multiple records, because each individual snippet of XML isn't a valid XML string on its own, and so XMLTYPE throws an error, hence the reason I'm trying to concatenate them all into a single string, which I can then use with the above method.
I've tried a variety of ways to do this - LISTAGG, XMLAGG, SYS_CONNECT_BY_PATH, as well as writing a custom function something like this:
with
  function get_messages(pTranLogID number) return string
  is
    xml varchar2;
  begin
    xml := '';
    for msg in (
      select r.msg_line_text
      from tran_log_response_message r, tran_log t
      where
        t.message_id = r.message_id
        and t.tran_log_id = pTranLogID
      order by r.msg_line_number
    )
    loop
      xml := xml || msg.msg_line_text;
    end loop;
    return 'test';
  end;
select
  tran_log_id, get_messages(tran_log_id)
from
  tran_log
where
  tran_log_id = '20633610';
/
The problem is that every one of these methods complained that the string was too long. Does anyone have any other ideas? Or maybe a better approach to this problem?
Thanks.

Related

Convert ITAB to XSTRING and back

I need to save an itab as an xstring or something similar and store it in a dbtab.
Later I need to read this xstring from the dbtab and convert it back into the itab with exactly the same content as before.
I tried a lot of function modules, such as
SCMS_STRING_TO_XSTRING or SCMS_XSTRING_TO_BINARY, but I didn't find anything to convert it back.
Has anybody tried something like this before and has some samples for me?
Unfortunately, I didn't find anything on other blogs or elsewhere.
An easy solution to convert into an xstring:
CALL TRANSFORMATION id SOURCE root = it_table RESULT XML DATA(lv_xstring).
Back would be like:
CALL TRANSFORMATION id SOURCE XML lv_xstring RESULT root = it_table.
For more information, see the ABAP documentation about data serialization and deserialization by using the XSL Identity Transformation.
Use
import ... from data buffer
and
export ... to data buffer
to (re)store any variable as xstring.
Or you can use
import|export ... from|to database ...
I wrote some methods to do this:
First I loop over the table and concatenate its lines into a string.
Then I convert the string into an xstring.
LOOP AT IT_TABLE ASSIGNING FIELD-SYMBOL(<LS_TABLE>).
CONCATENATE LV_STRING <LS_TABLE> INTO LV_STRING SEPARATED BY CL_ABAP_CHAR_UTILITIES=>NEWLINE.
ENDLOOP.
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
EXPORTING
TEXT = IV_STRING
IMPORTING
BUFFER = LV_XSTRING.
Back would be like:
Convert xstring back to string
String into table
TRY.
CL_BCS_CONVERT=>XSTRING_TO_STRING(
EXPORTING
IV_XSTR = IV_XSTRING
IV_CP = 1100 " SAP character set identification
RECEIVING
RV_STRING = LV_STRING
).
CATCH CX_BCS.
ENDTRY.
SPLIT IV_STRING AT CL_ABAP_CHAR_UTILITIES=>NEWLINE INTO: TABLE <LT_TABLE> .
READ TABLE <LT_TABLE> ASSIGNING FIELD-SYMBOL(<LS_TABLE>) INDEX 1.
IF <LS_TABLE> IS INITIAL.
DELETE TABLE <LT_TABLE> FROM <LS_TABLE>.
ENDIF.

NodeJS why is object[0] returning '{' instead of the first property from this json object?

So I have to go through a bunch of code to get some data from an iframe. The iframe has a lot of data, but in there is a property called '_name'. The first key of _name is 'extension_id' and its value is a big long string. The JSON object is enclosed in apostrophes. I have tried removing the apostrophes, but instead of the extension_id value I still get a single curly bracket. The JSON object looks something like this:
Frame {
...
...
_name: '{"extension_id":"a big huge string that I need"} "a bunch of other stuff":"this is a valid json object as confirmed by jsonlint", "globalOptions":{"crev":"1.2.50"}}}'
}
It's a whole big ugly paragraph, but I really just need the extension_id. This is the code I'm currently using after attempt 100 or whatever:
var frames = await page.frames();
// I'm using puppeteer for this part but I don't think that's relevant overall.
var thing = frames[1]._name;
console.log(frames[1])
// console.log(thing)
thing.replace(/'/g, '"')
// this is to remove the apostrophes from the outside of the object. I thought that would change things before. it does not. still outputs a single {
JSON.parse(thing)
console.log(thing[0])
Instead of getting "a big huge string that I need" or whatever is written in extension_id, I get a {. That's it. I think that is because the whole object starts with a curly bracket; this is confirmed to me because console.log(thing[2]) prints e. So what's going on? jsonlint says this is a valid JSON object, but maybe it's just a big string and I should be doing some kind of split to grab what's between the first : and the first ,. I'm really not sure.
For two reasons:
object[0] doesn't return the value of an object's "first property"; it returns the value of the property with the name "0", if any (there probably isn't one in your object); and
Because it's JSON, and when you're dealing with JSON in JavaScript code, you are by definition dealing with a string. (More here.) If you want to deal with the object that the JSON describes, parse it.
Here's an example of parsing it and getting the value of the extension_id property from it:
const parsed = JSON.parse(frames[1]._name);
console.log(parsed.extension_id); // The ID
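To make the difference between the string and the parsed object concrete, here is a minimal sketch. The `rawName` value is a made-up stand-in for `frames[1]._name` (the real value is much longer), so treat the names and data as assumptions rather than what Puppeteer actually returns.

```javascript
// Hypothetical stand-in for frames[1]._name: a JSON *string*, not an object.
const rawName = '{"extension_id":"abc123","globalOptions":{"crev":"1.2.50"}}';

// Indexing a string returns single characters, which is why thing[0] is "{"
// and thing[2] is "e" (the first letter of "extension_id").
const firstChar = rawName[0];
const thirdChar = rawName[2];

// Strings are immutable: replace() and JSON.parse() return new values and
// never change the original, so their results have to be assigned.
const parsed = JSON.parse(rawName);
const extensionId = parsed.extension_id;

// If the "first property" really is wanted, Object.keys gives the key names.
const firstKey = Object.keys(parsed)[0];

console.log(firstChar, thirdChar, extensionId, firstKey);
```

Note that the apostrophes shown in the console output are just how Node prints a string; they are not part of the value, so there is nothing to strip before parsing.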

How can I define multiple input file patterns in USQL?

I have a U-SQL script where I need to process some data. The data is stored in blob storage, with ~100 files per day in this folder structure: /{year}/{month}/{day}/{hour}/filenames.tsv
Getting one day of data is easy, just put a wildcard in the end and it will pick out all the files for all the hours for the day.
However, in my script I want to read out the current day and the last 2 hours of the previous day. The naive way is with 3 extract statements in this way:
DECLARE @input1 = @"/data/2017/10/08/22/{*}.tsv";
DECLARE @input2 = @"/data/2017/10/08/23/{*}.tsv";
DECLARE @input3 = @"/data/2017/10/09/{*}.tsv";
@x1 = EXTRACT .... FROM @input1 USING Extractors.Tsv();
@x2 = EXTRACT .... FROM @input2 USING Extractors.Tsv();
@x3 = EXTRACT .... FROM @input3 USING Extractors.Tsv();
But in my case each extract line is very long and complicated (~50 columns) using the AvroExtractor, so I would really prefer to specify the columns and extractor only once instead of 3 times. Also, with 3 fixed inputs it's not possible for the caller to decide how many hours of the previous day should be read.
My question is how can I define this in a convenient way, ideally using only one extract statement?
You could wrap your logic up into a U-SQL stored procedure so it is encapsulated. Then you need only make a few calls to the proc. A simple example:
CREATE PROCEDURE IF NOT EXISTS main.getContent(@inputPath string, @outputPath string)
AS
BEGIN;
    @output =
        EXTRACT
        ...
        FROM @inputPath
        USING Extractors.Tsv();
    OUTPUT @output
    TO @outputPath
    USING Outputters.Tsv();
END;
Then to call it (untested):
main.getContent (
    @"/data/2017/10/08/22/{*}.tsv",
    @"/output/output1.tsv"
);
main.getContent (
    @"/data/2017/10/08/23/{*}.tsv",
    @"/output/output2.tsv"
);
main.getContent (
    @"/data/2017/10/09/{*}.tsv",
    @"/output/output3.tsv"
);
That might be one way to go about it?
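If the caller should decide how many hours of the previous day to read, the list of input patterns can be generated outside the script and then passed in one call each. The following is only a sketch in JavaScript, not U-SQL: the function name `buildInputPatterns` is hypothetical, and the /data/{year}/{month}/{day}/{hour}/ layout is taken from the question.

```javascript
// Hypothetical helper: build the input patterns for one day plus the last
// `hoursBack` hours of the previous day (hoursBack is assumed to be 0..24).
function buildInputPatterns(year, month, day, hoursBack) {
  const pad = (n) => String(n).padStart(2, "0");

  // Date.UTC normalises day 0 to the last day of the previous month,
  // so month and year boundaries roll over correctly. Months are 0-based in Date.
  const prev = new Date(Date.UTC(year, month - 1, day - 1));
  const prevPrefix =
    `/data/${prev.getUTCFullYear()}/${pad(prev.getUTCMonth() + 1)}/${pad(prev.getUTCDate())}`;

  const patterns = [];
  // The last `hoursBack` hours of the previous day, one pattern per hour folder.
  for (let hour = 24 - hoursBack; hour < 24; hour++) {
    patterns.push(`${prevPrefix}/${pad(hour)}/{*}.tsv`);
  }
  // The whole current day, using the same day-level pattern as the question.
  patterns.push(`/data/${year}/${pad(month)}/${pad(day)}/{*}.tsv`);
  return patterns;
}

console.log(buildInputPatterns(2017, 10, 9, 2));
```

Each returned pattern could then be handed to one call of the procedure. This does not reduce the number of calls; it only moves the "how many hours" decision to the caller.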

Treat all cells as strings while using the Apache POI XSSF API

I'm using the Apache POI framework for parsing large Excel spreadsheets. I'm using this example code as a guide: XLSX2CSV.java
I'm finding that cells that contain just numbers are implicitly being treated as numeric fields, while I wanted them to be treated always as strings. So rather than getting 1.00E+13 (which I'm currently getting) I'll get the original string value: 10020300000000.
The example code uses a XSSFSheetXMLHandler which is passed an instance of DataFormatter. Is there a way to use that DataFormatter to treat all cells as strings?
Or as an alternative: in the implementation of the interface SheetContentsHandler.cell method there is string value that is the cellReference. Is there a way to convert a cellReference into an index so that I can use the SharedStringsTable.getEntryAt(int idx) method to read directly from the strings table?
To reproduce the issue, just run the sample code on an xlsx file of your choice with a number like the one in my example above.
UPDATE: It turns out that the string value I get seems to match what you would see in Excel. So I guess that's going to be "good enough" generally. I'd expect the data I'm sent to "look right" and therefore it'll get parsed correctly. However, I'm sure there will be mistakes and in those cases it'd be nice if I could get at the raw string value using the streaming API.
To resolve this issue, I created my own class based on XSSFSheetXMLHandler.
I copied that class, renamed it, and then in the endElement method I changed this part of the code, which formats the raw string:
case NUMBER:
    String n = value.toString();
    if (this.formatString != null && n.length() > 0)
        thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
    else
        thisStr = n;
    break;
I changed it so that it would not format the raw string:
case NUMBER:
    thisStr = value.toString();
    break;
Now every number in my spreadsheet has its raw value returned rather than a formatted version.

Deserialize XMLDocument with encoded characters in attribute names

I'm trying to deserialize XML data into an object with C#. I have always done this using the .NET deserialize method, and that has worked well for most of what I have needed.
Now, though, I have XML that is created by SharePoint, and the attribute names of the data I need to deserialize contain encoded characters, namely:
a space, º, ç, ã, : and a hyphen, encoded as
x0020, x00ba, x00e7, x00e3, x003a and x002d respectively.
I'm trying to figure out what I have to put in the AttributeName parameter of each property's XmlAttribute.
x0020 converts to a space well, so, for instance, I can use
[XmlAttribute(AttributeName = "ows_Nome Completo")]
to read
ows_Nome_x0020_Completo="MARIA..."
On the other hand, neither
[XmlAttribute(AttributeName = "ows_Motiva_x00e7__x00e3_o_x003a_")]
nor
[XmlAttribute(AttributeName = "ows_Motivação_x003a_")]
nor
[XmlAttribute(AttributeName = "ows_Motivação:")]
allow me to read
ows_Motiva_x00e7__x00e3_o_x003a_="text to read..."
With the first two I get no value returned, and the third gives me a runtime error for invalid characters (the colon).
Is there any way to get this working with the .NET deserializer, or do I have to build a specific deserializer for this?
Thanks!
What you are looking at (the "cryptic" data) is hex-escaped characters of the form _xHHHH_, rather than XML entities in the strict sense. SharePoint uses this encoding to keep attribute names and similar elements valid.
There are a few ways of dealing with this. The most elegant one is to extract the List schema and match the elements against the schema. The schema contains all metadata about your list data. A polished example of a schema can be seen below or here http://www.bendsoft.com/documentation/camelot-php-tools/1_5/packets/schema-and-content-packets/schemas/example-list-view-schema/
If you don't want to walk that path, you could start here http://msdn.microsoft.com/en-us/library/35577sxd.aspx
<Field Name="ContentType">
<ID>c042a256-787d-4a6f-8a8a-cf6ab767f12d</ID>
<DisplayName>Content Type</DisplayName>
<Type>Text</Type>
<Required>False</Required>
<ReadOnly>True</ReadOnly>
<PrimaryKey>False</PrimaryKey>
<Percentage>False</Percentage>
<RichText>False</RichText>
<VisibleInView>True</VisibleInView>
<AppendOnly>False</AppendOnly>
<FillInChoice>False</FillInChoice>
<HTMLEncode>False</HTMLEncode>
<Mult>False</Mult>
<Filterable>True</Filterable>
<Sortable>True</Sortable>
<Group>_Hidden</Group>
</Field>
<Field Name="Title">
<ID>fa564e0f-0c70-4ab9-b863-0177e6ddd247</ID>
<DisplayName>Title</DisplayName>
<Type>Text</Type>
<Required>True</Required>
<ReadOnly>False</ReadOnly>
<PrimaryKey>False</PrimaryKey>
<Percentage>False</Percentage>
<RichText>False</RichText>
<VisibleInView>True</VisibleInView>
<AppendOnly>False</AppendOnly>
<FillInChoice>False</FillInChoice>
<HTMLEncode>False</HTMLEncode>
<Mult>False</Mult>
<Filterable>True</Filterable>
<Sortable>True</Sortable>
</Field>
<Field>
...
</Field>
Well... I guess I kind of hacked a way around it, which works for now. I just replaced the _x****_ sequences with nothing and corrected the XmlAttributes accordingly. This replacement is done by first loading the XML as a string, then replacing, then loading the "clean" text as XML.
But I would still like to know whether it is possible to use some XmlAttribute name for a more direct approach...
Try using System.Xml; XmlConvert.EncodeName and XmlConvert.DecodeName
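To show what this kind of decoding does to the attribute names from the question, here is a minimal sketch in JavaScript rather than .NET. The `decodeName` function is a hypothetical helper that only handles four-digit `_xHHHH_` escapes; it is meant to illustrate the idea behind XmlConvert.DecodeName, not to reproduce every detail of it.

```javascript
// Hypothetical helper: turn each _xHHHH_ escape back into the character
// with that hexadecimal code unit. Anything else is left untouched.
function decodeName(encoded) {
  return encoded.replace(/_x([0-9A-Fa-f]{4})_/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Attribute names taken from the question.
console.log(decodeName("ows_Nome_x0020_Completo"));          // ows_Nome Completo
console.log(decodeName("ows_Motiva_x00e7__x00e3_o_x003a_")); // ows_Motivação:
```

The second result contains a colon, which matches the runtime error reported in the question: the decoded name is not usable as an attribute name as it stands.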
I use a simple function to get the column name:
private string getNameCol(string colName) {
if (colName.Length > 20) colName = colName.Substring(0, 20);
return System.Xml.XmlConvert.EncodeName(colName);
}
I'm still looking for a way to replace characters like á, é, í, ó, ú. EncodeName doesn't convert these characters.
You can use Replace:
.Replace("ó","_x00f3_").Replace("á","_x00e1_")

Resources