Handle unknown characters

I need to retrieve a substring from a text. The text is returned by a device, and the problem is that it contains unknown characters. I am trying to retrieve the value '1' at the end, but the XSLT statement fails because of the junk characters (shown as BS, and as ^H in the vi editor).
Is there a way to remove these keystroke (control) characters from the text so that I can use the regular XSLT string functions?
Any help would be much appreciated.
Thank you guys!
<xsl:value-of select="substring-before('show owp onu next-available port gpon_1/2$nu next-available port gpon_1/2 / 3 : 81.' , '.')"/>

If your data contains a Backspace control character then it isn't legal XML, and if it isn't legal XML then you can't process it using XSLT. You have to deal with the problem at the stage when you are turning the text returned by the device into XML.
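As a rough illustration of cleaning the text before it becomes XML, the sketch below strips the control characters that XML 1.0 forbids from a device response. The function name, the sample response, and the position of the backspace in it are assumptions made for the example, not details from the question.

```python
import re

# C0 control characters that XML 1.0 does not allow
# (everything below U+0020 except tab, line feed and carriage return).
_ILLEGAL_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_control_chars(text: str) -> str:
    """Return text with XML-illegal control characters removed."""
    return _ILLEGAL_CONTROL_CHARS.sub("", text)


# Hypothetical device output; "\b" is the backspace that vi shows as ^H.
raw = "gpon_1/2 / 3 : 8\b1."
clean = strip_illegal_control_chars(raw)

# Once the text is clean, ordinary string functions behave as expected.
before_dot = clean.partition(".")[0]
last_char = before_dot[-1]
```

Once the text has been cleaned this way, it can be embedded in an XML document and processed with substring-before() and the other XSLT string functions.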

Related

Antlr Lexer and Parser for catching expressions within another expression

I need to extract pieces of text from a larger text. It is a very simple example, but it is giving me quite some pain.
Here is the sample text, it is an email template:
{!Account.Name}
Hi hi there {!Account.Id + 'cool'}.
Very interesting stuff - {!Contact.Description}
Now we get {!Contact.Description + Contact.Email__c}
So I need all the occurrences of text like Account.Name, but only those that are within the opening "{!" and closing "}" tags.
What is the simplest starting approach to do this? Note that in the case of the last line, I need to get both occurrences, Contact.Description and Contact.Email__c.
Thanks a lot for any help!

I would just do a plain text search for {...} blocks and parse their content with a simple expression parser. Don't try to come up with a parser that consumes all the text and must be prepared to deal with any rubbish that can appear outside of the blocks (which could ultimately lead to security problems).
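A minimal sketch of that approach, assuming the blocks are never nested and that quoted literals never contain a closing brace. The regular expressions and the function name are illustrative choices, and the "expression parser" is reduced here to dropping string literals and collecting dotted names.

```python
import re

BLOCK = re.compile(r"\{!(.*?)\}")                        # one {! ... } block
STRING_LITERAL = re.compile(r"'[^']*'")                  # e.g. 'cool'
FIELD = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+")   # e.g. Account.Name


def merge_fields(template: str) -> list[str]:
    """Collect dotted field names that appear inside {! ... } blocks only."""
    fields = []
    for block in BLOCK.finditer(template):
        # Remove quoted literals so their content is never mistaken for a field.
        expression = STRING_LITERAL.sub("", block.group(1))
        fields.extend(FIELD.findall(expression))
    return fields


template = """{!Account.Name}
Hi hi there {!Account.Id + 'cool'}.
Very interesting stuff - {!Contact.Description}
Now we get {!Contact.Description + Contact.Email__c}"""

fields = merge_fields(template)
```

Text outside the blocks is never inspected, so whatever appears there cannot influence the result.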

ANTLR4 - How to parse content between same string values

I'm trying to write an antlr4 parser rule that can match the content between some arbitrary string values that are same. So far I couldn't find a method to do it.
For example, in the below input, I need a rule to extract Hello and Bye. I'm not interested in extracting xyz though.
TEXT Hello TEXT
TEXT1 Bye TEXT1
TEXT5 xyz TEXT8
As it is very much similar to an XML element grammar, I tried an example for XML Parser given in ANTLR4 XML Grammar, but it parses an input like <ABC> ... </XYZ> without error which is not what I wanted.
I also tried using semantic predicates without much success.
Could anyone please help with a hint on how to match content that is embedded between same strings?
Thank you!
Satheesh

I am not sure how this works out performance-wise, because of the many checks the parser has to do, but you could try something like:
token:
open = IDENTIFIER WORD* close = IDENTIFIER { $open.text.equals($close.text) }?   // assumes the Java target
;
The part between the curly braces is a validating semantic predicate. It compares the text of the two labelled tokens, since comparing the token objects themselves would never match. The lexer tokens are self-explanatory, I believe.
The more I think about it, the better it seems to just tokenize the input and write your own parser that processes the tokens and acts accordingly. That depends, of course, on the complexity of the syntax.
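The hand-written alternative could look roughly like the sketch below. It assumes one block per line and whitespace-separated tokens, neither of which is stated in the question, and the function name is made up for the example.

```python
def matched_content(lines: list[str]) -> list[str]:
    """Return the text enclosed by identical first and last tokens on each line."""
    results = []
    for line in lines:
        tokens = line.split()
        # Keep the line only when the opening and closing delimiters are equal.
        if len(tokens) >= 2 and tokens[0] == tokens[-1]:
            results.append(" ".join(tokens[1:-1]))
    return results


sample = ["TEXT Hello TEXT", "TEXT1 Bye TEXT1", "TEXT5 xyz TEXT8"]
extracted = matched_content(sample)
print(extracted)  # ['Hello', 'Bye']
```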

How to find non-ASCII symbols in a string. DB2

Please advise on my particular issue.
I have a table field of type VARCHAR. I need to validate that this field DOESN'T contain any non-ASCII symbols (like ╥ ї ╡ etc.). I haven't found any way to do this.
Please give me a hand with this. Thanks in advance!
Update:
The example linked in the comments doesn't resolve my issue. It shows a fixed set of Latin characters and digits, but my field accepts Japanese and Chinese symbols.

Time for another silly XML trick:
SELECT
  XMLQUERY('matches($X,"^[A-Za-z0-9]+$")'
    PASSING XMLTEXT('╥ї╡') AS "X"
  )
FROM SYSIBM.SYSDUMMY1
This returns:
1
-----
false
See https://stackoverflow.com/a/17467695/3434508 for details on using Regular Expressions for DB2
See https://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.xml.doc/doc/xqrregexp.html for advanced RegEx character classes.
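If the values can also be checked in application code after they are fetched, the same validation is short to express. The sketch below is an assumption about where the check runs, not part of the DB2 answer, and the helper name is illustrative.

```python
def non_ascii_chars(value: str) -> list[tuple[int, str]]:
    """Return (position, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]


# str.isascii() (Python 3.7+) gives the plain yes/no answer.
is_clean = "plain text 123".isascii()
offenders = non_ascii_chars("abc╥ї╡")
```

Listing the offending positions, rather than only returning true or false, makes it easier to report which characters caused the validation to fail.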

Decoding Specification '(AA$)' For Write Statement

I'm confused as to what this write specification is trying to specify. N is an array of single characters. Could someone explain the write format specification below? I saw someone post the exact same question a few days ago, but the page is not there anymore.
WRITE(*,'(AA$)') N(I),","

The dollar sign in a format specifier suppresses the trailing newline.
Therefore, each element of the array N is written as a string (the first A), followed by a comma (the second A), all on a single line.
Note that this syntax is not standard-conforming. In modern Fortran you would write the statement as
WRITE(*,'(2A)', advance='no') N(I),","
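To make the effect concrete, here is a small sketch of the same output behaviour in Python, assuming the WRITE statement sits inside a loop over I. The function name and the sample characters are made up for the example.

```python
import io


def write_elements(chars: list[str], out) -> None:
    """Write each element followed by a comma, without starting a new line."""
    for ch in chars:
        # Mirrors WRITE with two A edit descriptors and no advance.
        out.write(ch + ",")


buffer = io.StringIO()
write_elements(["a", "b", "c"], buffer)
line = buffer.getvalue()  # "a,b,c," with no newline anywhere
```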

ABAP startRFC.exe UTF-8 diacritics text transfer

I have a function module (FM) in SAP and I call it externally using startRFC. The only output of the FM is one internal table. This table has only one column of type char(100), and I need to get it into a text file. StartRFC works well, but if the text contains diacritics (for example, Czech: ěščřžýáíé), only hashes (#) appear instead of these characters.
Has anyone ever solved a similar issue?
If I call the same algorithm manually and write the strings to the screen in SAP, everything is OK. But startRFC somehow destroys them. The problem may be in the data transfer between SAP and startRFC, but I don't know how this transfer works.
I found a solution, but it is terribly slow. It converts the string to a hexadecimal string using "gcl_conv_to_x->write" and "gcl_conv_to_x->get_buffer", then calls "SCMS_XSTRING_TO_BINARY", and you need a binary table. All of this takes 5 minutes. Without this conversion my algorithm takes 15 seconds.

So finally a solution...
You need to create an XSTRING variable and fill it with your text. To convert a STRING to an XSTRING, use the FM SCMS_STRING_TO_XSTRING.
Then you will need an internal table with row type BAPICONTEN. It already contains a component (column) of type SDOK_SDATX (RAW 1022).
Then you just append a new line to this table like this:
DATA: my_table_row LIKE LINE OF my_table.
my_table_row-line = my_xstring.   " copy the converted bytes into the RAW component
APPEND my_table_row TO my_table.
This table (my_table) can be returned via RFC and will contain Cyrillic, German characters, etc.
I am just a beginner, so do not ask me how to create the table, please :)
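The idea behind this fix is that raw bytes pass through the transfer unchanged, whereas characters may be converted to another code page along the way. The sketch below shows that round trip in Python, assuming UTF-8 as the encoding and splitting longer texts into 1022-byte rows to match the RAW 1022 column. The function names and the splitting step are illustrative additions, not part of the ABAP answer.

```python
ROW_BYTES = 1022  # width of the RAW column mentioned above


def to_raw_rows(text: str, row_bytes: int = ROW_BYTES) -> list[bytes]:
    """Encode text as UTF-8 and cut the bytes into fixed-width rows."""
    data = text.encode("utf-8")
    return [data[i:i + row_bytes] for i in range(0, len(data), row_bytes)]


def from_raw_rows(rows: list[bytes]) -> str:
    """Join the rows back together before decoding.

    A multi-byte character may straddle two rows, so the rows must not be
    decoded one at a time.
    """
    return b"".join(rows).decode("utf-8")


original = "ěščřžýáíé" * 300
rows = to_raw_rows(original)
restored = from_raw_rows(rows)
```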
