Url command with utf-8 url - livecode

I was trying to download a URL from the following address:
http://data.riksdagen.se/personlista/?utformat=json&valkrets=Värmlands+Län
(Open Data from the Swedish government)
This works perfectly in the browser, but the url command in LiveCode fails because the Swedish character ä doesn't get encoded properly. I've tried to urlEncode the string, but it still doesn't work. Is there any way to download a URL that contains UTF-8 encoded characters?
If I call curl via shell I do get the correct values, but that isn't available on the mobile...

After some thinking and digging I realised that the answer is of course to translate the URL from the UTF-16 that LiveCode uses internally into the UTF-8 that the server expects. Browsers use UTF-8 by default, so that's why it works there. So
put url "http://data.riksdagen.se/personlista/?utformat=json&valkrets=" & textEncode("Värmlands+Län", "utf8")
did the trick!
The problem is that I can't use the urlEncode function, as that translates all Swedish characters, and the server expects them as UTF-8 (which is of course strange by itself!)
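
The mismatch described above can be illustrated outside LiveCode. The following is a minimal Python sketch, not LiveCode code. It assumes, for illustration only, that the failing request percent-encodes the text from a single-byte encoding (Latin-1 is used here) while the server expects bytes derived from UTF-8. The helper name `encode_query_value` is hypothetical.

```python
from urllib.parse import quote


def encode_query_value(value: str, encoding: str) -> str:
    """Percent-encode a query value from the given character encoding.

    The '+' is kept as-is because the original URL uses it as a space.
    """
    return quote(value, safe="+", encoding=encoding)


value = "Värmlands+Län"

# Same text, two different byte sequences behind the percent escapes.
utf8_form = encode_query_value(value, "utf-8")
latin1_form = encode_query_value(value, "latin-1")

print(utf8_form)
print(latin1_form)
```

The two results differ only in the escapes for ä, which is enough for a server that decodes the query as UTF-8 to reject or misread the Latin-1 form.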

Related

Excel VBA on Mac german special characters not encoded correctly (ÄÜÖ)

I have an Excel VBA Script that I originally wrote for Windows (where it works fine) and now had to port to Mac OS. I don't think that it matters but the script is calling cURL to get a JSON Response from a web API which is then parsed, edited and inserted into the spreadsheet.
Some of the fields in the parsed JSON contain special characters like Ä, Ü, Ö (German characters). The script can handle these just fine on Windows but on Mac instead of ÖÜÄ I get other symbols. This breaks the tool as it depends on some vlookup-functions where the values are written by hand (with the correct symbols).
I tried lots of googling but was not able to find anything.
One thing that might be interesting is that the code itself changes on Mac as well! I have some statements printed to the console and even the hardcoded strings that contain a special character are broken as soon as I open the script on a Mac.
The question is for Mac VBA. This is a pain. The only solution I have is to send the curl output to a file, then open that file with Workbooks.OpenText and Origin:=65001; the whole response then ends up in cell A1, correctly encoded.
I have asked my own question on that, to see if anyone has a more recent answer.
How to read UTF8 data output from cURL in popen/fread in VBA on Mac?
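
The file-based workaround above amounts to saving the raw bytes and then decoding them explicitly as UTF-8 instead of relying on a platform default. A rough Python sketch of that idea follows; it is not VBA. The sample JSON is made up, no network call is made, and Mac OS Roman is only an assumed example of a wrong default encoding.

```python
import os
import tempfile

# Stand-in for the bytes curl would write to disk (hypothetical sample data).
response_bytes = '{"name": "Müller", "city": "Köln"}'.encode("utf-8")

# Write the raw bytes to a temporary file, as the workaround does with curl output.
with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as handle:
    handle.write(response_bytes)
    path = handle.name

try:
    # Decoding with the encoding the data was actually written in.
    with open(path, encoding="utf-8") as f:
        decoded_ok = f.read()
    # Decoding the same bytes with a legacy Mac encoding (an assumption).
    with open(path, encoding="mac_roman") as f:
        decoded_wrong = f.read()
finally:
    os.remove(path)

print(decoded_ok)
print(decoded_wrong)
```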

Azure Cognitive Services speech to text accentuation on Spanish

I'm having issues with the C++ SDK of Azure Cognitive Services speech to text with the Spanish language, related to accentuation.
I'm seeing the following error:
'sÃ' instead of 'Si' or 'Sí', which would be the correct transcription.
I'm guessing this is due to the encoding of the API response. Is there any way to set headers so that the response comes back in UTF-8, or in any encoding with full Spanish support?
The return is UTF-8 encoded. If you redirect the output to a file and load it into a UTF-8-capable editor, you will see that the text is actually correct. The problem is UTF-8 output in the Windows cmd console.
There are several stackoverflow discussions about this. Perhaps something like this helps: how to convert utf-8 to ASCII in c++?
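
To see how correct UTF-8 can turn into output like the one reported, here is a small Python sketch (the question is about C++, so this is only an illustration). It assumes the display layer interprets the bytes as Windows-1252; the actual console code page may differ.

```python
expected = "Sí"

# The text as UTF-8 bytes: 'í' becomes the two bytes 0xC3 0xAD.
raw = expected.encode("utf-8")

# Reading those bytes one-per-character as Windows-1252 (an assumption)
# turns 0xC3 into 'Ã' and 0xAD into an invisible soft hyphen.
misread = raw.decode("cp1252")

# Decoding with the right encoding recovers the original text.
recovered = raw.decode("utf-8")

print(repr(misread))
print(recovered)
```

The data itself is intact in both cases; only the interpretation differs, which matches the observation that the text looks correct in a UTF-8-capable editor.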

Windows to UTF-8 Character Encoding Behaviour Query

A simple query about expected behaviour when compiling Windows-1252 characters under UTF-8. When building Java source code using an Ant task, it seems that some weird character encoding occurs.
For certain fields characters that are normally encoded as \u2013 on the windows machine for example, turn into \226 on Linux. What is the explanation for the \226? Will it still be rendered correctly on a browser, for example?
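
One plausible reading of \226 is that it is an octal escape for a single byte: U+2013 (the en dash) is stored as byte 0x96 in Windows-1252, and 0x96 written in octal is 226. The short Python sketch below checks that arithmetic. It is a hedged illustration and does not reproduce the Ant build itself.

```python
en_dash = "\u2013"

# Windows-1252 stores the en dash in one byte.
cp1252_bytes = en_dash.encode("cp1252")

# UTF-8 needs three bytes for the same character.
utf8_bytes = en_dash.encode("utf-8")

# The single Windows-1252 byte, written in octal.
octal = format(cp1252_bytes[0], "o")

print(cp1252_bytes, utf8_bytes, octal)
```

If that reading is right, the source file's Windows-1252 byte is passing through unconverted, and a browser that decodes the page as UTF-8 would not render it as an en dash.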

Section Sign when displaying .txt in Browser?

I have the following issue: We have a tool that saves some data to a .txt and this data is delimited by §. This data can be accessed through the webservers directory index and viewed in the browser.
But both FF and Chrome will not display the § character correctly, and when copy / pasting data it will not copy the correct char as well.
The whole thing works fine, though, when using Ctrl + S: the file is saved as UTF-8 and the character is displayed properly in any text editor.
The file is correctly encoded as UTF-8, the Content-Type header I get is text/plain; charset=UTF-8
Should I be mistaken and § not be printable in UTF-8 (why would the text editor show it correctly then), please make a suggestion for a proper charset to use, while keeping in mind that we might also have Chinese / Japanese characters etc. in our data.
Thanks for your help.
P.S. By not being displayed correctly, I mean the black diamond with the ? in it.
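
The black diamond with a question mark is the Unicode replacement character (U+FFFD), which is shown for bytes that are not valid UTF-8. One scenario that produces it, assumed here purely for illustration, is a § stored as the single Latin-1 byte 0xA7 in a file that is served as UTF-8. A minimal Python sketch:

```python
section = "§"

# The same character in two encodings.
latin1_bytes = section.encode("latin-1")  # one byte
utf8_bytes = section.encode("utf-8")      # two bytes

# A lone 0xA7 is not valid UTF-8, so a lenient decoder substitutes U+FFFD.
shown_when_mislabelled = latin1_bytes.decode("utf-8", errors="replace")

# Genuine UTF-8 bytes decode back to the section sign.
shown_when_correct = utf8_bytes.decode("utf-8")

print(latin1_bytes, utf8_bytes)
print(shown_when_mislabelled, shown_when_correct)
```

Checking the raw bytes of the served file (for example with a hex dump) would show which of the two byte sequences is actually on disk.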

Questions on Chinese Encoding

I'm trying to create a webpage in Chinese and I realized that while the text looks fine when I run it on browsers, once I change the Character Encoding, the text becomes gibberish. Here's what's happening:
I create my html file in Emacs, encoded in UTF-8.
I upload it to the server, and view it on my browsers (FF, IE, Chrome, Opera) - no problem.
I try to view the page in other encodings via FF > View > Character Encoding > All those different Chinese encoding systems, e.g. Chinese Simplified (HZ)
Apart from UTF-8, on every other encoding the text becomes gibberish.
I'm assuming this isn't a problem - i.e. browsers are smart enough to know which encoding the page is in, and parse the content accurately. What I'm wondering is why I can't read the Chinese text anymore once I change encoding - is it because I don't have Chinese fonts installed on my OS? Should I stick to UTF-8 if my audience are Chinese or should I choose among one of their many encoding systems?
Thanks in advance for your help/opinions.
UTF isn't a 'catch-all' encoding. It's designed to contain international language character symbols for ease of use, but it is still an encoding, just like the other encodings you've selected. You would have to retype the text in each encoding to make it appear correctly when viewed with that encoding.
Viewer encoding MUST match the file being read. Viewing UTF-8 as something else makes about as much sense as renaming .txt to .exe and trying to run it.
You should specify the correct encoding in the HTML. The option you're using in the web browser exists only for those rare occasions when a web developer screwed up his job and declared a different encoding than the one actually used, OR mixed up 2 different encodings on one page.
Of course changing the encoding in your browser will "break" the text! The browser is taking the stream of UTF-8 bytes and trying to force another encoding on the raw data. Needless to say, the result ain't pretty. Changing the encoding in the browser is NOT the equivalent of converting.
As you surmised correctly, modern browsers usually guess correctly, but not always. As Agent_L said, make sure to declare the encoding in the headers.
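
The difference between reinterpreting bytes and converting text can be shown in a few lines. This is a minimal Python sketch; GBK stands in for "one of the other Chinese encodings" and is an assumption, since the question mentions HZ among others.

```python
text = "中文"

# The page as saved: UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# What switching the browser's encoding does: same bytes, different decoder.
reinterpreted = utf8_bytes.decode("gbk", errors="replace")

# What converting does: the text is re-encoded into different bytes.
gbk_bytes = text.encode("gbk")
converted_back = gbk_bytes.decode("gbk")

print(utf8_bytes, gbk_bytes)
print(reinterpreted)
print(converted_back)
```

Reinterpreting leaves the bytes untouched and produces gibberish, while converting changes the bytes so that the other decoder yields the same text. Fonts play no part in either case.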
