Convert a String to unicode - unicode-string

I want to convert the string below into Unicode escapes and save the result in a database. Can someone please tell me how to convert a string that contains non-ASCII characters to Unicode escapes?
For example:
String s = "how are you ¿cómo estás";
should be converted into
how are you \u??? (non-ASCII characters replaced by Unicode escapes)
Is there any Java API that does this?
Thanks,
Sai.
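A minimal sketch of the conversion described above, written in Python rather than Java only to illustrate the idea. The function name escape_non_ascii is hypothetical. It assumes the desired output is Java-style \uXXXX escapes, so characters outside the Basic Multilingual Plane are written as UTF-16 surrogate pairs.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes (hypothetical helper)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII is kept unchanged.
            out.append(ch)
        else:
            # Encode to UTF-16 (big-endian) so that characters beyond U+FFFF
            # become two 16-bit units, matching Java's escape convention.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


escaped = escape_non_ascii("how are you ¿cómo estás")
# escaped == r"how are you \u00bfc\u00f3mo est\u00e1s"
```

The result is pure ASCII, so it can be stored in a column that does not support Unicode. Whether escaping is needed at all depends on the database encoding, which the question does not state.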

Related

NumberFormatException when I call toDouble() on a String, even though the input String is a valid representation of a Double

I have an issue with my code. I'm using Android Studio with Kotlin.
I have an EditText field with inputType="numberDecimal".
When I try to convert its string to a Double:
val preis = etProduktGekauftPreis.text.toString().toDouble()
and try to create a new object with "preis":
val produkt = Produkt(name, anzahl.toInt(), preis)
I get an error, for example: NumberFormatException: For input string: "3.00"
Isn't the string a valid representation of a number? Why do I keep getting this error?
Thanks for the help :)
It is likely due to your locale: in German, "3,00" would be expected instead of "3.00". You would need to parse the string differently, for example by replacing the comma with a period for whichever locales you support, or by removing the comma, converting to a double, and then dividing by 100.
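The separator replacement suggested above can be sketched as follows, in Python purely for illustration (the question itself is about Kotlin). The name parse_decimal is hypothetical, and the sketch assumes the input has no thousands separators, which would need separate handling.

```python
def parse_decimal(text: str) -> float:
    """Parse a decimal that may use either ',' or '.' as its separator (hypothetical helper)."""
    # Normalize a decimal comma to a period before parsing.
    # Assumption: no thousands separators such as "1.234,56" are present.
    normalized = text.strip().replace(",", ".")
    return float(normalized)


price_german = parse_decimal("3,00")   # 3.0
price_english = parse_decimal("3.00")  # 3.0
```

Replacing the separator avoids the divide-by-100 variant, which only works when the input always has exactly two decimal places.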

Regular expression for splitting a string on commas

I have to split string data on commas.
This is the Excel data:
string strCurrentLine="\"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz\",SKU_27,\"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack\",SKU_27,My Shopify Store 1,Valid,NonInventory";
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string[] lstColumnValues = CSVParser.Split(strCurrentLine);
The problem is that I used the Regex to split the string on commas, but I need output like SKU_27: string[0] and string[2] contain the backslashes and quotes. I need them to look like string[1], with the backslashes and quotes removed.
The file seems to be a CSV file. A properly formatted CSV file uses double quotes "" to wrap strings that contain a comma, such as
id, name, date
1,"Some text, that includes comma", 2020/01/01
If you simply split the string on commas, the 2nd column will still carry its double quotes.
I'm not sure whether you are asking how to remove the double-quotes from lstColumnValues[0] and lstColumnValues[2], or add them to lstColumnValues[1].
To remove the double-quotes, just use Replace:
string myString = lstColumnValues[0].Replace("\"", "");
If you need to add them:
string myString = $"\"{lstColumnValues[1]}\"";
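As an alternative to a hand-written regex, a CSV parser handles the quoting rules described above. The following is only a sketch in Python (the question uses C#) using the standard csv module. The name split_csv_line is hypothetical.

```python
import csv


def split_csv_line(line: str) -> list:
    """Split one CSV line into fields, honoring double-quoted fields (hypothetical helper)."""
    # csv.reader accepts any iterable of lines; wrap the single line in a list.
    # Quoted fields keep their embedded commas and lose the surrounding quotes.
    return next(csv.reader([line]))


line = ('"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, '
        'All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz",SKU_27,'
        '"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack",SKU_27,'
        'My Shopify Store 1,Valid,NonInventory')
columns = split_csv_line(line)
# columns[1] == "SKU_27", and columns[0] no longer has surrounding quotes
```

With a parser, no separate Replace step is needed because the quotes are removed during parsing.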

Python3: convert apostrophe unicode string

I have a string value with an apostrophe like this:
"I\\xE2\\x80\\x99m going now."
How can I get correct apostrophe value?
"I`m going now."
As you know, \xE2\x80\x99 is the UTF-8 encoding of the Unicode character U+2019 RIGHT SINGLE QUOTATION MARK, but I have a string representation instead of bytes...
Perhaps this is what you want:
utf8_apostrophe = b'\xe2\x80\x99'.decode("utf8")
text = "I"+utf8_apostrophe+"m going now"
Aside:
I ran into this when converting a single quotation mark, within a UTF-8-encoded tweet, into a normal single quote.
import re
original_tweet = 'I’m going now'
string_apostrophe = "'"
print(re.sub(utf8_apostrophe, string_apostrophe, original_tweet))
which produces
I'm going now
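Since the string in the question holds literal backslash escapes rather than bytes, one possible approach is to unescape it first and then decode the resulting byte values as UTF-8. This is a sketch, not a definitive fix. The name decode_escaped_utf8 is hypothetical, and it assumes the text outside the escapes contains only Latin-1 characters.

```python
def decode_escaped_utf8(text: str) -> str:
    """Turn literal '\\xNN' escape sequences holding UTF-8 bytes into real characters."""
    # Step 1: interpret the literal "\xE2"-style escapes; each one becomes a
    # code point in the range U+0000..U+00FF.
    unescaped = text.encode("latin-1").decode("unicode_escape")
    # Step 2: map those code points back to raw bytes and decode them as UTF-8.
    return unescaped.encode("latin-1").decode("utf-8")


raw = "I\\xE2\\x80\\x99m going now."
fixed = decode_escaped_utf8(raw)
# fixed == "I\u2019m going now."
```

The round trip through Latin-1 works because that codec maps each byte value 0..255 to the code point with the same number.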

Extract numbers from String Array in Scala

I'm new to Scala and unsure of how to achieve the following
I have the String
val output = "6055039\n3000457596\n3000456748\n180013\n"
I want to extract the numbers separated by \n and store them in an Array
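output.split("\n").map(_.toLong)
Or only
output.split("\n")
if you want to keep the numbers in String format. Note that values such as 3000457596 do not fit in an Int, so .toInt would throw a NumberFormatException here. .toLong also throws on input that is not a valid number, so you might want to wrap it accordingly (for example in a Try).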

What encoding is this and how can I decode it?

I've got an old project file with translations to Portuguese where special characters are broken:
error.text.required=\u00C9 necess\u00E1rio o texto.
error.categoryid.required=\u00C9 necess\u00E1ria a categoria.
error.email.required=\u00C9 necess\u00E1rio o e-mail.
error.email.invalid=O e-mail \u00E9 inv\u00E1lido.
error.fuel.invalid=\u00C9 necess\u00E1rio o tipo de combust\u00EDvel.
error.regdate.invalid=\u00C9 necess\u00E1rio ano de fabrica\u00E7\u00E3o.
error.mileage.invalid=\u00C9 necess\u00E1ria escolher a quilometragem.
error.color.invalid=\u00C9 necess\u00E1ria a cor.
Can you tell me how to decode the file to use the common Portuguese letters?
Thanks
The "\u" prefix marks a Unicode escape. You can use the strings as they are, and the diacritics will show in the output. In Python, the code would be something like:
print("\u00C9 necess\u00E1rio o texto.")
which outputs:
É necessário o texto.
Otherwise, you need to convert them to their ASCII equivalents. You can do a simple find/replace. I ended up writing a function like that for converting Romanian diacritics a while ago, but I had dynamic strings coming in...
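When the escapes arrive as raw text read from the file, rather than as a literal in source code, they can be decoded explicitly. A minimal sketch, assuming each line is pure ASCII apart from the \uXXXX escapes. The name decode_unicode_escapes is hypothetical.

```python
def decode_unicode_escapes(line: str) -> str:
    """Decode literal \\uXXXX escapes in an ASCII-only line (hypothetical helper)."""
    # encode("ascii") raises UnicodeEncodeError if the line already contains
    # non-ASCII characters, which makes the assumption explicit.
    return line.encode("ascii").decode("unicode_escape")


raw_line = r"error.text.required=\u00C9 necess\u00E1rio o texto."
decoded = decode_unicode_escapes(raw_line)
# decoded == "error.text.required=É necessário o texto."
```

Note that unicode_escape also interprets other backslash sequences such as \n or \t, so lines containing those would need extra care.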
This looks like Unicode to me.
\u = prefix for a Unicode character
00E1 = hex code for the 2-byte number of the Unicode character.
I'm not sure what the file format is. I would ask the sender, but I would try this approach to decode it.
Found it ;)
http://www.fileformat.info/info/unicode/char/20/index.htm
Look at the tables with source code. This could be a C++ source file. This is the way you write Unicode characters in source code.

Resources