ABAP startRFC.exe UTF-8 diacritics text transfer - text

I have a function module (FM) in SAP and I call it externally using startRFC. The only output of FM is one internal table. This table has only 1 column of type char(100) and I need to get it to text file. StartRFC works well, but if there is diacritics (for example Czech language: ěščřžýáíé) instead of these characters only hashes # appear.
Have someone ever solved similar issue?
If I call the same algorithm manually and write strings on screen in SAP, everything is ok. But startRFC somehow destroys it. The problem may be in the data transfer between SAP and startRFC. But I don't know how this transfer works.
I found a solution but it is terribly slow. It converts string to hexadecimal string using "gcl_conv_to_x->write" and "gcl_conv_to_x->get_buffer" than calls "SCMS_XSTRING_TO_BINARY" and you need a binary table. But it takes 5minutes to do all this stuff. Without this conversion my algorithm takes 15 seconds.

So finally a solution...
You need to create XSTRING variable and fill it with your text. To convert STRING to XSTRING use FM: SCMS_STRING_TO_XSTRING.
Then you will need an internal table with row type BAPICONTEN. It already contains component (column) of type SDOK_SDATX (RAW 1022).
And you just append a new line to this table like this:
data: my_table_row LIKE LINE OF my_table.
my_table_row-line = my_xstring.
APPEND my_table_row INTO my_table.
This table (my_table) can be returned via RFC and will contain Cyrillic, German characters etc..
I am just a beginner, so do not ask me how to create the table, please :)

Related

TextInputFormat vs HiveIgnoreKeyTextOutputFormat

I'm just starting out with Hive, and I have a question about Input/Output Format. I'm using the OpenCSVSerde serde, but I don't understand why for text files the Input format is org.apache.hadoop.mapred.TextInputFormat but the output format is org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.
I've read this question but it's still not clear to my why the Input/Output formats are different. Isn't that basically saying your going to store data added to this table differently the data that's read from the table??
Anyway, any help would be appreciated
In a TextInputFormat, Keys are the position in the file (long data type), and values are the line of text. When the program reads a file, It might use the keys for random read, where while writing the text data using HiveIgnoreKeyTextOutputFormat there is no value in maintaining position as it doesn't make sense.
Hence, using HiveIgnoreKeyTextOutputFormat passes keys as null to underlining RecordWriter. When the RecordWriter receives key as null, it ignores key and just write the value with line separator. Otherwise, RecordWriter will key, then delimiter, then value and finally a line separator.

How to represent a missing xsd:dateTime in RDF?

I have a dataset with data collected from a form that contains various date and value fields. Not all fields are mandatory so blanks are possible and
in many cases expected, like a DeathDate field for a patient who is still alive.
How do I best represent these blanks in the data?
I represent DeathDate using xsd:dateTime. Blanks or empty spaces are not allowed. All of these are flagged as invalid when validating using Jena RIOT:
foo:DeathDate_1
a foo:Deathdate ;
time:inXSDDatetime " "^^xsd:dateTime .
foo:DeathDate_2
a foo:Deathdate ;
time:inXSDDatetime ""^^xsd:dateTime .
foo:DeathDate_3
a foo:Deathdate ;
time:inXSDDatetime "--"^^xsd:dateTime .
I prefer to not omit the triple because I need to know if it was blank on the source versus a conversion error during construction of my RDF.
What is the best way to code these missing values?
You should represent this by just omitting the triple. That's the meaning of a triple that's "not present": it's information that is (currently) unknown.
Alternatively, you can choose to give it the value "unknown"^^xsd:string when there's no death date. The solution in this case is to not datatype it as an xsd:dateTime, but just as a simple string. It doesn't have to be a string of course, you could use any kind of "special" value for this, e.g. a boolean false - just as long as it's a valid literal value that you can distinguish from actual death dates. This will solve the parsing problem, but IMHO if you do this, you are setting yourself up for headaches in processing the data further down the line (because you will need to ask queries over this data, and they will have to take two different types of values into account, plus the possibility that the field is missing).
I prefer to not omit the triple because I need to know if it was blank
on the source versus a conversion error during construction of my RDF.
This sounds like an XY problem. If there are conversion errors, your application should signal that in another way, e.g. by logging an error. You shouldn't try to solve this by "corrupting" your data.

Separating fields out of a string in Hive

I have the following problem...
I work with Hive and want to add a file with several (different) rows of Strings. Those contain fields with a fixed size, like this:
A20130420bcd 34 fgh
where the fields have the length 1,8,6,4,3.
Separated it would look like this:
"A,20130420,bcd,fgh"
Is there any possibility to read the String and sort it into a field besides getting it as a substring for every field like
substring(col_value,1,1) Field1
etc?
I would imagine that cutting the already read part of the string would increase the performance, but i could think of any way to do this with the given functions here.
Secondly, as stated before, there are different types of strings, ordered and identified by the first character.right now just check those with the WHERE-Statement, but it's horrible, as it runs through the whole file just to find only the first String. Is there any way to read specific lines by their number? If i know, that the first string will be of a certain kind, can read it directly?
right it looks like this:
insert overwrite table TEST
SELECT
substring(col_value,1,1) field1,
...
substring(col_value,10,3) field 5
from temp_data WHERE substring(col_value,1,1) = 'A';
any ideas on this?
I would love to hear some ideas =)
You need to write yours generic-UDF parser that output the struct or map or whatever appropriate. you can refer to UDF that output multi-values.
then you can write
insert overwrite table output
select parsed.first, parsed.second
from (
select parse(taget)
from input
) parsed
where first='X';
About second question,you may need to check "explain" command of hive to see if hive do filter push-down for you.(just see how many map reduce it takes, theoretically it should be one map, depending on 1.hive version,
2.output table format
.)
In general sense, this is why database is popular -- take optimization into consideration for you .

What does "SEM1:3ENCE_B:NW:NG102:EECT300:120:0900:2" mean?

In my project I am developing teachers and their timetable. I was provided with a text file that contains the teacher timetable from my uni. They ware unable to tell me what is the syntax or code language so I would know how to read it and use it in my iPhone app. Can you help me identifying what sot of code is this and how can I read that?
Sample:
SEM1:3ENCE_B:NW:NG102:EECT300:120:0900:2
SEM1:3ENCE_B,3ENCE_C:TW:NLG107:EEEL300:120:0900:1
19:3ENCE_A,3ENCE_B,3ENCE_C:TW:CLG.01:EEEL305_L:120:1100:1
19:3ENCE_A,3ENCE_B,3ENCE_C:TW:NLG107:EEEL305:120:0900:1
SEM1:3ENCE_A,3ENCE_B:TW::EEEL300:120:1100:4
SEM1&2:3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:SK:CLG.06:EEEL315_L:120:1400:4
SEM1:3CS_A,3CS_B,3CS_C,3CS_D,3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:DHE:CLLT:EICG301_L:120:0900:5
SEM1:3CS_A,3CS_B:ABO,DHE:N5.114:EICG301:120:1100:5
SEM1:3CS_A,3CS_B,3CS_C,3CS_D,3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:NW:LTS205:EECT300_L:120:1600:2
27:3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_CS::NG100:EEEL320:120:1100:2
SEM1:3CS_A,3CS_B,3CS_C,3CS_D:NW:C2.14:ECSC302_L:120:0900:3
SEM1:3CS_A:NW:NG100:EECT300:120:1400:2
It's not code, it's data. And the best way of interpreting it is to compare this representation with another : Think Rosetta Stone.
Obviously, colon is used to separate the fields, and each line probably represents a single tinmetable item. Each line appears to have 8 fields on it.
One field looks like a course ID : EECT300
Another looks like a time : 0900
As for the rest, you'll have to work it out...
University of Westminster, maybe...?
It is not a code language.
It is just a plain text file which contains data using colons : as a separator
I guess you have to parse it and retrieve the information for each column. You have to be aware of the signification of each column (if no ask to your uni)

sas generate all possible miss spelling

Does any one know how to generate the possible misspelling ?
Example : unemployment
- uemployment
- onemploymnet
-- etc.
If you just want to generate a list of possible misspellings, you might try a tool like this one. Otherwise, in SAS you might be able to use a function like COMPGED to compute a measure of the similarity between the string someone entered, and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.
Here is an example that computes the Generalized Edit Distance between "unemployment" and a variety of plausible mispellings.
data misspell;
input misspell $16.;
length misspell string $16.;
retain string "unemployment";
GED=compged(misspell, string,'iL');
datalines;
nemployment
uemployment
unmployment
uneployment
unemloyment
unempoyment
unemplyment
unemploment
unemployent
unemploymnt
unemploymet
unemploymen
unemploymenyt
unemploymenty
unemploymenht
unemploymenth
unemploymengt
unemploymentg
unemploymenft
unemploymentf
blahblah
;
proc print data=misspell label;
label GED='Generalized Edit Distance';
var misspell string GED;
run;
Essentially you are trying to develop a list of text strings based on some rule of thumb, such as one letter is missing from the word, that a letter is misplaced into the wrong spot, that one letter was mistyped, etc. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement is reduced to this one-wrong-letter scenario then this might be managable; otherwise, the commenters are correct and you can easily create massive lists of incorrect spellings (after all, all combinations except "unemployment" constitute a misspelling of that word).
Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
If you are looking for a general spell checker, SAS does have proc spell.
It will take some tweaking to get it working for your situation; it's very old and clunky. It doesn't work well in this case, but you may have better results if you try and use another dictionary? A Google search will show other examples.
filename name temp lrecl=256;
options caps;
data _null_;
file name;
informat name $256.;
input name &;
put name;
cards;
uemployment
onemploymnet
;
proc spell in=name
dictionary=SASHELP.BASE.NAMES
suggest;
run;
options nocaps;

Resources