Encoding issues HPCC - hpcc-ecl

I have been sent a dataset that contains data for US, UK, France, and Germany product dictionaries. With the German data, I'm having trouble displaying accents, etc.
I've sprayed the data as ASCII and UTF8.
I've defined my record structure as
gbrec := RECORD
STRING5 CountryId;
INTEGER8 ProductId;
INTEGER8 ABV;
UTF8_de ProductDescription;
INTEGER8 ProductItemId;
INTEGER MultiBuys;
STRING UomDescription;
I define the dataset as
ProductDictionary := Project(DISTRIBUTE(DATASET('~cga::ml_fullproductextract_20220808_UTF.txt', gbrec ,CSV(SEPARATOR('\t'))))(std.uni.ToUpperCase(ProductDescription[1..4]) != 'ANY ' AND std.uni.ToUpperCase(CGA_GenealogyLvl3Desc) NOT IN ['NA_BRAND FAMILY']),
I have used the UTF and ASCII versions with no joy. The data is displayed below.
VS Code Image
Do you have any advice or suggestions? I've looked over posted on the original forum which is where I got these ideas from.
Any help would be appreciated.
Thanks
Problem Data

David,
I would start by going back to the spray. ASCII will never work, so UTF8 would be my first choice. But since that does not work, I would next go back to take a look at the raw data in a Hex editor to see exactly what I was dealing with. IOW, it is some form of Unicode, but which exactly? Perhaps you could ask the data supplier?
HTH,
Richard

Related

perl Excel::Writer::XLSX write_formula encoding

I am trying to use Excel::Writer::XLSX. Most things already successfully, but I struggle to get a formula into a cell.
use utf8;
is set
I am trying to set the formula into the cell with the following statement:
$av_obj_excel_worksheet_DATA->write_formula( 'a3', '=_xlfn._xlws.FILTER(gw_col_gwuPMBo,(MONTH(gw_col_DATUM)=1)*(gw_col_gwuPMBo<>0),"_empty")' );
I have the extracted the .xlsx-file (since it is a simple zip-file) and had a look at the relevant xml of the spreadsheet.
The result is:
_xlfn._xlws.FILTER(gw_col_gwuPMBo,(MONTH(gw_col_DATUM)=1)*(gw_col_gwuPMBo&lt;&gt;0),"_empty")
but the result should be, since I created an .xlsx-file manually and had again a look at the relevant xml-file of the relevant spreadsheet:
_xlfn._xlws.FILTER(gw_col_gwuPMBo,(MONTH(gw_col_DATUM)=2)*(gw_col_gwuPMBo<>0),"_empty")
I seems to me some unicode problem.
Unicode is difficult to understand and - I regret - I don't realy do!
Can someone help me what to do to get the correct form of the formula into the .xlsx file (or related .xml-file of the relevant spreadsheet?
Thanks
I already experimented with encode and decode and now found the solution:
$av_tmp_STRING = '=_xlfn._xlws.FILTER(gw_col_gwuPMBo,(MONTH(gw_col_DATUM)=1)*(gw_col_gwuPMBo<>0),"_empty")';
$av_tmp_STRING = decode( 'UTF-8', $av_tmp_STRING );
$av_obj_excel_worksheet_DATA->write_formula( 'a3', $av_tmp_STRING );
the hurdle was that I did not see the result durging debugging the script. But the correct string was written to the .xlsx-file.
Sometimes, by thinking, searching and trying it comes to a positive result.

Converting Unicode in Swift

I currently have a string as follows which I received through an API call:
\n\nIt\U2019s a great place to discover Berlin and a comfortable place
to come home to.
And I want to convert it into something like this which is more readable:
It's a great place to discover Berlin and a comfortable place to come
home to.
I've taken a look at this post, but that's manually writing down every conversion, and there may be more of these unicode scalar characters introduced.
What I understand is \u{2019} is unicode scalar, but the format for this is \U2019 and I'm quite confused. Are there any built in methods to do this conversion?
This answer suggests using the NSString method stringByFoldingWithOptions.
The Swift String class has a concept called a "view" which lets you operate on the string under different encodings. It's pretty neat, and there are some views that might help you.
If you're dealing with strings in Swift, read this excellent post by Mike Ash. He discusses the idea of what a string really is with great detail and has some helpful hints for Swift 2.
Assuming you are already splitting the string and can get the offending format separately:
func convertFormat(stringOrig: String) -> Character {
let subString = String(stringOrig.characters.split("U").map({$0})[1])
let scalarValue = Int(subString)
let scalar = UnicodeScalar(scalarValue!)
return Character(scalar)
}
This will convert the String "\U2019" to the Character represented by "\u{2019}".

What does "SEM1:3ENCE_B:NW:NG102:EECT300:120:0900:2" mean?

In my project I am developing teachers and their timetable. I was provided with a text file that contains the teacher timetable from my uni. They ware unable to tell me what is the syntax or code language so I would know how to read it and use it in my iPhone app. Can you help me identifying what sot of code is this and how can I read that?
Sample:
SEM1:3ENCE_B:NW:NG102:EECT300:120:0900:2
SEM1:3ENCE_B,3ENCE_C:TW:NLG107:EEEL300:120:0900:1
19:3ENCE_A,3ENCE_B,3ENCE_C:TW:CLG.01:EEEL305_L:120:1100:1
19:3ENCE_A,3ENCE_B,3ENCE_C:TW:NLG107:EEEL305:120:0900:1
SEM1:3ENCE_A,3ENCE_B:TW::EEEL300:120:1100:4
SEM1&2:3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:SK:CLG.06:EEEL315_L:120:1400:4
SEM1:3CS_A,3CS_B,3CS_C,3CS_D,3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:DHE:CLLT:EICG301_L:120:0900:5
SEM1:3CS_A,3CS_B:ABO,DHE:N5.114:EICG301:120:1100:5
SEM1:3CS_A,3CS_B,3CS_C,3CS_D,3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:NW:LTS205:EECT300_L:120:1600:2
27:3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_CS::NG100:EEEL320:120:1100:2
SEM1:3CS_A,3CS_B,3CS_C,3CS_D:NW:C2.14:ECSC302_L:120:0900:3
SEM1:3CS_A:NW:NG100:EECT300:120:1400:2
It's not code, it's data. And the best way of interpreting it is to compare this representation with another : Think Rosetta Stone.
Obviously, colon is used to separate the fields, and each line probably represents a single tinmetable item. Each line appears to have 8 fields on it.
One field looks like a course ID : EECT300
Another looks like a time : 0900
As for the rest, you'll have to work it out...
University of Westminster, maybe...?
It is not a code language.
It is just a plain text file which contains data using colons : as a separator
I guess you have to parse it and retrieve the information for each column. You have to be aware of the signification of each column (if no ask to your uni)

ABAP startRFC.exe UTF-8 diacritics text transfer

I have a function module (FM) in SAP and I call it externally using startRFC. The only output of FM is one internal table. This table has only 1 column of type char(100) and I need to get it to text file. StartRFC works well, but if there is diacritics (for example Czech language: ěščřžýáíé) instead of these characters only hashes # appear.
Have someone ever solved similar issue?
If I call the same algorithm manually and write strings on screen in SAP, everything is ok. But startRFC somehow destroys it. The problem may be in the data transfer between SAP and startRFC. But I don't know how this transfer works.
I found a solution but it is terribly slow. It converts string to hexadecimal string using "gcl_conv_to_x->write" and "gcl_conv_to_x->get_buffer" than calls "SCMS_XSTRING_TO_BINARY" and you need a binary table. But it takes 5minutes to do all this stuff. Without this conversion my algorithm takes 15 seconds.
So finally a solution...
You need to create XSTRING variable and fill it with your text. To convert STRING to XSTRING use FM: SCMS_STRING_TO_XSTRING.
Then you will need an internal table with row type BAPICONTEN. It already contains component (column) of type SDOK_SDATX (RAW 1022).
And you just append a new line to this table like this:
data: my_table_row LIKE LINE OF my_table.
my_table_row-line = my_xstring.
APPEND my_table_row INTO my_table.
This table (my_table) can be returned via RFC and will contain Cyrillic, German characters etc..
I am just a beginner, so do not ask me how to create the table, please :)

artist working with base 64

Hi Im fairly new to base64 code and could do with some info/help etc. I have recently used it while converting my photos into code as part of an experiment regarding my sculptures, I converted a photo into 152 pages of base 64 using an online converter, just to get started. I was wondering how I can reverse this, and also if I edited out some code would the image change or would it not convert at all!? Like I said I am interested in using this to help my experiments. thanks for any info .
Base64 decode functions will decode it. If you remove some of the base64 code it may not translate back into a valid image or translate into valid base64 code at all depending on how much you remove. Depending on how the image was originally compressed, it MAY give you something, but the results will be incredibly unpredictable... So you'll never know what the result of the change will be on the final product. Part of the base64 standard involves a padded width (which is why some strings end in up to 2 = signs).

Resources