Can Microsoft Translator API translate text with special characters? - azure

I am trying to use the Microsoft Translator API to translate text from Polish into other languages. Polish has several special characters, such as "ą", "ś" and "ż". When I send an HTTP request without special characters:
POST /translate?api-version=3.0&from=pl&to=en HTTP/1.1
Ocp-Apim-Subscription-Key: ********
Ocp-Apim-Subscription-Region: ******
Content-Length: 21
Host: api.cognitive.microsofttranslator.com
Connection: close
User-Agent: Apache-HttpClient/4.5.10 (Java/15.0.2)
Accept-Encoding: gzip, deflate
[{"Text": "Gramatyka"}]
I receive a correct translation:
[{"translations":[{"text":"grammar","to":"en"}]}]
However, Polish words and sentences often contain special characters:
POST /translate?api-version=3.0&from=pl&to=en HTTP/1.1
Ocp-Apim-Subscription-Key: ********
Ocp-Apim-Subscription-Region: ********
Content-Length: 21
Host: api.cognitive.microsofttranslator.com
Connection: close
User-Agent: Apache-HttpClient/4.5.10 (Java/15.0.2)
Accept-Encoding: gzip, deflate
[{"Text": "Roślina"}]
This request results in error code 400000:
{"error":{"code":400000,"message":"One of the request inputs is not valid."}}
If I replace the special characters with plain ones (e.g. "ś" with "s"), the API does not return a proper translation. For example:
[{"Text": "Roslina"}]
results in:
[{"translations":[{"text":"Roslina","to":"en"}]}]
Whereas "roślina" should translate to "plant".
This problem applies to other languages too. For example, in German:
[{"Text": "Wörterbuch"}]
results in a 400000 error as well.
Has anyone found a solution to this?

Have you tried checking the language detection score, just to see whether the text is being treated as Polish? Can you try without the "from" parameter? Also make sure you send all the headers, including Content-Type with the UTF-8 charset, as in:
curl -X POST "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=zh-Hans" -H "Ocp-Apim-Subscription-Key: " -H "Content-Type: application/json; charset=UTF-8" -d "[{'Text':'Hello, what is your name?'}]"
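The request headers in the question contain no Content-Type, so one plausible cause (an assumption, not confirmed in this thread) is that the body is not sent as UTF-8 with a matching charset declaration, which the curl call above does set. Note that the Content-Length of 21 shown for the failing request equals the character count of [{"Text": "Roślina"}], whereas its UTF-8 encoding is 22 bytes because "ś" takes two bytes. Below is a minimal sketch with java.net.http (Java 11+) that builds, but does not send, the request; the class and method names are hypothetical, and the key and region are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class TranslatorRequest {

    // Endpoint and query parameters copied from the question (from=pl, to=en).
    static final String ENDPOINT =
        "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=pl&to=en";

    // Minimal JSON string escaping. Non-ASCII letters are kept as-is; they only
    // become multi-byte sequences when the body is encoded as UTF-8 below.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Same body shape as in the question: [{"Text": "..."}]
    static String body(String text) {
        return "[{\"Text\": \"" + jsonEscape(text) + "\"}]";
    }

    // Builds (but does not send) the request; key and region are placeholders.
    static HttpRequest buildRequest(String text, String key, String region) {
        return HttpRequest.newBuilder()
            .uri(URI.create(ENDPOINT))
            .header("Ocp-Apim-Subscription-Key", key)
            .header("Ocp-Apim-Subscription-Region", region)
            // Declare the charset and encode the body with that same charset.
            .header("Content-Type", "application/json; charset=UTF-8")
            .POST(HttpRequest.BodyPublishers.ofString(body(text), StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        // The word from the question; U+015B is "s with acute".
        String json = body("Ro\u015Blina");
        // 21 characters, but 22 bytes in UTF-8.
        System.out.println(json.length() + " chars, "
            + json.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes");
        // ISO-8859-1 has no U+015B, so encoding with it silently substitutes '?'.
        System.out.println(new String(json.getBytes(StandardCharsets.ISO_8859_1),
            StandardCharsets.ISO_8859_1));
        HttpRequest req = buildRequest("Ro\u015Blina", "YOUR-KEY", "YOUR-REGION");
        // java.net.http computes Content-Length from the encoded bytes.
        System.out.println("Content-Length: " + req.bodyPublisher().get().contentLength());
    }
}
```

With Apache HttpClient 4.5, as in the question, the closest equivalent is, as far as I know, new StringEntity(json, ContentType.APPLICATION_JSON), which declares UTF-8; the single-argument StringEntity(String) constructor defaults to text/plain with ISO-8859-1.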

Related

Google Translate API: Translates symbols to Gibberish - Python

I am using the Google Translate API to translate an Excel column from Japanese to English. The Japanese column contains not only Japanese characters but also some numeric symbols like ①, ⑥, etc.
The Japanese characters translate without problems, but the symbols get converted into gibberish.
Example:
Japanese: #⑥その他
English: # â‘¥ Other
But the same text works fine in the Google Translate web interface.
How can I prevent the Google Translate API from translating the symbols?
The issue comes from mixing numeric symbols with text, which makes it harder for the Translation API to detect the source language.
I don't know which method you are using to call the Translation API, but in any case, specifying the source language solves the issue.
For example, with a REST call from the Command Line Interface:
curl -X POST -H "Authorization: Bearer "\
$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" --data "{
'q': '#⑥その他',
'source': 'ja',
'target': 'en'
}" "https://translation.googleapis.com/language/translate/v2"
This returns "# ⑥ Other" as the translation.
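The same call, sketched in Java with java.net.http (Java 11+). It builds but does not send the request; the class and method names are hypothetical, the token is a placeholder for the gcloud output, and the body uses standard double-quoted JSON with no escaping (so it assumes q contains no quotes or backslashes).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GoogleTranslateRequest {

    static final String ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

    // JSON body with an explicit source language, as in the curl call above.
    // No escaping is done: q must not contain quotes or backslashes.
    static String body(String q, String source, String target) {
        return "{\"q\": \"" + q + "\", \"source\": \"" + source
            + "\", \"target\": \"" + target + "\"}";
    }

    // token stands in for the output of: gcloud auth application-default print-access-token
    static HttpRequest buildRequest(String q, String token) {
        return HttpRequest.newBuilder()
            .uri(URI.create(ENDPOINT))
            .header("Authorization", "Bearer " + token)
            .header("Content-Type", "application/json; charset=utf-8")
            .POST(HttpRequest.BodyPublishers.ofString(body(q, "ja", "en"), StandardCharsets.UTF_8))
            .build();
    }

    // Decodes UTF-8 bytes as windows-1252, reproducing the kind of garbling in the question.
    static String misreadAsWindows1252(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // "#" + circled digit six (U+2465) + the Japanese text from the question
        String q = "#\u2465\u305D\u306E\u4ED6";
        HttpRequest req = buildRequest(q, "ACCESS-TOKEN");
        System.out.println(req.method() + " " + req.uri());
        // The 3 UTF-8 bytes of U+2465 turn into 3 separate windows-1252 characters.
        System.out.println(misreadAsWindows1252("\u2465").length());
    }
}
```

As a side note, the garbled "â‘¥" in the question is exactly what the UTF-8 bytes of "⑥" (E2 91 A5) look like when read as Windows-1252, so it may also be worth checking that the response is decoded as UTF-8 on the Python side.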

In Automic 12 bash, special characters in mailx body result in body being attached as a binary file

I am trying to send an email using mailx from bash jobs in Automic Workload Automation 12.0. The message needs to contain a special character, in this case the degree sign "°".
For this example, the message body should be: This is the ° sign.
I am using this code to send the message. printf '\xB0' prints the ° sign (0xB0 is ° in ISO-8859-1).
(
printf 'This is the '
printf '\xB0'
printf ' sign.'
) | mailx [etc]
If I copy and paste this directly into a bash terminal, the email sends fine with the special character printed in the message body.
However, when I use the same code in an Automic bash job, the email body is blank. Instead, there is an attached file named ATT00001.bin. If I open ATT00001.bin in notepad.exe, it contains the text that should have been in the body, "This is the ° sign.", with the characters exactly as they should appear.
The following, when used in Automic, sends a message with the correct body and no attachment. So it seems clear that the special character is what causes this issue in Automic.
(
printf 'This is the '
printf 'placeholder'
printf ' sign.'
) | mailx [etc]
Does anyone know why this happens, or how to resolve it?
mailx is a full-featured MUA (mail user agent). If you just need to send a message, you can use sendmail instead and build the mail headers yourself:
/usr/sbin/sendmail destuser@desthost <<eomail
To: destuser@desthost
Subject: Any interesting thing
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

This is the ° sign
eomail
Or you could use HTML entity encoding:
/usr/sbin/sendmail destuser@desthost <<eomail
To: destuser@desthost
Subject: Any interesting thing
MIME-Version: 1.0
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: 7bit

<html><body>This is the &deg; sign</body></html>
eomail
Take care to use only ASCII characters there (hence the &deg; entity)!
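The first heredoc above, translated into a Java sketch: compose the headers yourself, leave an empty line before the body, and write the message to sendmail as UTF-8 bytes so that the bytes actually match the declared charset. The /usr/sbin/sendmail path and recipient argument come from the answer; the class and method names are hypothetical, and it is not confirmed that this avoids the Automic attachment behaviour.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PlainMail {

    // Raw message in the same shape as the sendmail heredoc.
    // The empty line after the last header separates the headers from the body.
    static String compose(String to, String subject, String body) {
        return "To: " + to + "\n"
            + "Subject: " + subject + "\n"
            + "MIME-Version: 1.0\n"
            + "Content-Type: text/plain; charset=\"UTF-8\"\n"
            + "Content-Transfer-Encoding: 8bit\n"
            + "\n"
            + body + "\n";
    }

    // Pipes the message to sendmail as UTF-8, matching the declared charset.
    // Path and recipient argument mirror the answer; adjust for your system.
    static int send(String to, String message) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("/usr/sbin/sendmail", to)
            .redirectOutput(ProcessBuilder.Redirect.INHERIT)
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(message.getBytes(StandardCharsets.UTF_8));
        }
        return p.waitFor();
    }

    public static void main(String[] args) {
        // U+00B0 is the degree sign; in UTF-8 it is the two bytes C2 B0.
        String msg = compose("destuser@desthost", "Any interesting thing", "This is the \u00B0 sign");
        System.out.print(msg);
    }
}
```

Note that printf '\xB0' in the question emits the single byte B0, which is not valid UTF-8 on its own; if a mail client treats the body as UTF-8 (or cannot tell), that stray byte is one possible reason for the body being treated as binary.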

Danish date is not formatted correctly in Xpages

In our application, all dates and values are presented using the browser's locale.
However, when Danish is selected as the locale/language in any web browser, the date formatting is wrong.
English, Swedish and Norwegian formatting show no errors; only Danish does.
The dates are formatted as "20/08/15" but should be "20-08-2015".
The server is Domino 9.0.1 using Server Locale, and when testing the locale output I see that it serves "da". Switching the server to Browser Locale does not change the date formatting.
This issue has been reported on our servers in different countries.
I have tried to locate an explanation and/or answer to our problem but failed.
The application has no locale-specific formatting on any fields, view columns and so on, and we'd like to keep it that way. Our application runs in different countries, so not controlling the locale formatting is our preferred approach. However, we'd like the dates and numbers to be presented in the correct language-specific way.
We do not explicitly use any Dojo components, only plain date fields and view columns in a view panel. We do not have any International Options set.
I have tried to set the locale as in @Sven Hasselbach's answer to another question, but failed. I haven't tried his XSnippet yet…
An example of the request and response headers:
GET /demo/tradesec.nsf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, sdch
Accept-Language: da,sv;q=0.8,no;q=0.6,en-US;q=0.4,en;q=0.2,nl;q=0.2
DNT: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.37 Safari/537.36
X-Chrome-UMA-Enabled: 1
X-Client-Data: CJe2yQEIo7bJAQicksoBCOeUygEI/ZXKAQi8mMoB
HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Encoding: gzip
Content-Length: 8956
Content-Type: text/html;charset=UTF-8
Date: Mon, 17 Aug 2015 07:59:43 GMT
Expires: -1
Keep-Alive: timeout=10, max=100
Tradechannel: Work_and_fun_professionally_done
X-Pad: avoid browser bug
Please advise, thanks!
/M
XPages is using the ICU4J library for date formatting. That library is using '/' as the separator for the Danish short date format.
So code like this:
com.ibm.icu.text.DateFormat.getDateInstance(
com.ibm.icu.text.DateFormat.SHORT,
new java.util.Locale("da")).toPattern()
gives date patterns like:
en: M/d/yy
da: dd/MM/yy
sv: yyyy-MM-dd
nb: dd.MM.yy
You might try using the long date format instead, by setting dateStyle="long" on the converter:
da (long): d. MMM yyyy, output: 17. aug 2015
For comparison, the medium format still uses '/':
da (medium): dd/MM/yyyy, output: 17/08/2015
Or, if you do need to override the language-specific pattern for Danish, the code would be something like:
<xp:viewColumn columnName="_MainTopicsDate" id="viewColumn3">
<xp:viewColumnHeader value="Date" id="viewColumnHeader3"></xp:viewColumnHeader>
<xp:this.converter>
<xp:convertDateTime dateStyle="short"
pattern="${javascript: ('da' == context.getLocale().getLanguage())?
'd-MM-yyyy': null}">
</xp:convertDateTime>
</xp:this.converter>
</xp:viewColumn>
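To experiment outside XPages, the same decision can be sketched in plain Java. This uses the JDK's java.text classes rather than ICU4J (a substitution for portability), so the non-Danish patterns it prints may not match what XPages shows. The class name is hypothetical. It uses dd-MM-yyyy, which zero-pads the day like the expected "20-08-2015" in the question; the d-MM-yyyy pattern in the converter above would print 5-08-2015 for the 5th.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DanishDates {

    // Mirrors the converter logic: an explicit pattern for Danish,
    // otherwise the locale's own short date format (JDK data, not ICU4J).
    static DateFormat formatFor(Locale locale) {
        if ("da".equals(locale.getLanguage())) {
            return new SimpleDateFormat("dd-MM-yyyy", locale);
        }
        return DateFormat.getDateInstance(DateFormat.SHORT, locale);
    }

    public static void main(String[] args) {
        // 17 August 2015 (the date in the response header), in the default time zone,
        // which is also the zone the formatters use.
        Date d = new GregorianCalendar(2015, Calendar.AUGUST, 17).getTime();
        for (String tag : new String[] {"da", "en", "sv", "nb"}) {
            Locale loc = Locale.forLanguageTag(tag);
            System.out.println(tag + ": " + formatFor(loc).format(d));
        }
    }
}
```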
Just a quick thing to check - are there any settings in the browser that you are using that could make any trouble here? Can you confirm that it works correctly using a non-XPages webpage with the correct locale?
I know I have had issues with browsers trying to be "clever" in other situations - so I think it would be a good idea to try and establish if the browser or the XPage is the culprit ;-)
/John

How to set the subject line and add attachment while using sendmail utility on linux?

I am using the sendmail utility on CentOS to send mail. I am not able to set the subject line or add attachments for mails sent with this utility. The "-s" option for setting the subject line does not apply to sendmail. Which options should I use with sendmail to achieve these objectives?
sendmail is a low-level utility. You have to compose the extra message headers yourself.
That is, to add the subject line, you prepend this before the body of the message:
Subject: <your-subject>
followed by an empty line to separate the headers from the body.
Likewise, to add an attachment:
Subject: <your-subject>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-unique-str"

---unique-str
Content-Type: text/html
Content-Disposition: inline

<html-body here>
---unique-str
Content-Type: <attachment-mime-type>; name="<attachment-name>"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="<attachment-name>"

<your base64-encoded attachment here>
---unique-str--
Or something like this (I didn't test it).
You can see how real messages are formatted by looking at the "show original" or "show source" options available in most e-mail clients. Those options will show you the raw-message and you just need to build something similar.
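A Java sketch that assembles the template above into a complete message string, with the empty lines MIME requires after the top-level headers and after each part's headers. The boundary, file name and MIME types are illustrative placeholders, and the class name is hypothetical. The result would then be written to sendmail's standard input, for example with sendmail -t, which reads the recipients from the To: header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MultipartMail {

    // Builds a multipart/mixed message shaped like the template above.
    static String compose(String to, String subject, String html,
                          String attachmentName, String attachmentType, byte[] attachment) {
        String boundary = "-unique-str";   // value declared in the Content-Type header
        String delim = "--" + boundary;    // each part starts with "--" + boundary
        // MIME-style base64: 76-character lines, joined here with "\n" to match the headers.
        String b64 = Base64.getMimeEncoder(76, "\n".getBytes(StandardCharsets.US_ASCII))
            .encodeToString(attachment);
        return "To: " + to + "\n"
            + "Subject: " + subject + "\n"
            + "MIME-Version: 1.0\n"
            + "Content-Type: multipart/mixed; boundary=\"" + boundary + "\"\n"
            + "\n"
            + delim + "\n"
            + "Content-Type: text/html; charset=\"UTF-8\"\n"
            + "Content-Disposition: inline\n"
            + "\n"
            + html + "\n"
            + delim + "\n"
            + "Content-Type: " + attachmentType + "; name=\"" + attachmentName + "\"\n"
            + "Content-Transfer-Encoding: base64\n"
            + "Content-Disposition: attachment; filename=\"" + attachmentName + "\"\n"
            + "\n"
            + b64 + "\n"
            + delim + "--\n";              // closing delimiter ends the multipart body
    }

    public static void main(String[] args) {
        byte[] data = "hello, attachment".getBytes(StandardCharsets.UTF_8);
        System.out.print(compose("destuser@desthost", "Report", "<p>See attached.</p>",
            "notes.txt", "text/plain", data));
    }
}
```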

Apache Stanbol sentiment analysis and sentence detection not working

I am using Apache Stanbol. It works for enhancing the text, however when I tried sentiment analysis and sentence detection, it doesn't work.
I tried this code
curl -v -X POST -H "Accept: text/plain" -H "Content-type: text/plain; \
charset=UTF-8" --data "Some text for analysis" \
"http://localhost:8081/enhancer/engine/sentiment-wordclassifier"
But it returns an empty { } output. I tried changing the headers, but with no luck.
Am I missing something? Do I need to do some configuration first?
I even tried adding the analyzer to the enhancer chain, but got the same empty output. I also tried the REST API for opennlp-sentence, but it didn't work either.
I guess you are sending data to the wrong endpoint. Calls to the enhancer usually go either to the enhancer itself (the default chain):
http://host/stanbol/enhancer
or to a concrete chain:
http://host/stanbol/enhancer/chain/<name>
The enhancement results can't be serialized as plain text, only in one of the RDF serializations supported by Stanbol, so the Accept header needs to be one of those, text/turtle for instance.
Further details are in the documentation: http://stanbol.apache.org/docs/trunk/components/enhancer/#RESTful_API
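A minimal java.net.http sketch (Java 11+) of the kind of request described above: POST plain text to the enhancer, or to a named chain, and ask for Turtle back. The base URL is a placeholder (include any context path such as /stanbol), the chain name is illustrative, and the class name is hypothetical. It only builds the request; sending it would be HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class StanbolRequest {

    // chain == null posts to /enhancer; otherwise to /enhancer/chain/<name>.
    static HttpRequest buildRequest(String baseUrl, String chain, String text) {
        String path = (chain == null) ? "/enhancer" : "/enhancer/chain/" + chain;
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + path))
            .header("Content-Type", "text/plain; charset=UTF-8")
            .header("Accept", "text/turtle")   // an RDF serialization, not text/plain
            .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest r = buildRequest("http://localhost:8081", null, "Some text for analysis");
        System.out.println(r.method() + " " + r.uri());
        r.headers().map().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```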
