What encoding is this and how can I decode it?

I've got an old project file with Portuguese translations in which the special characters are broken:
error.text.required=\u00C9 necess\u00E1rio o texto.
error.categoryid.required=\u00C9 necess\u00E1ria a categoria.
error.email.required=\u00C9 necess\u00E1rio o e-mail.
error.email.invalid=O e-mail \u00E9 inv\u00E1lido.
error.fuel.invalid=\u00C9 necess\u00E1rio o tipo de combust\u00EDvel.
error.regdate.invalid=\u00C9 necess\u00E1rio ano de fabrica\u00E7\u00E3o.
error.mileage.invalid=\u00C9 necess\u00E1ria escolher a quilometragem.
error.color.invalid=\u00C9 necess\u00E1ria a cor.
Can you tell me how to decode the file to use the common Portuguese letters?
Thanks

The "\u" is the prefix of a Unicode escape. You can use the strings "as is", and the diacritics will show up in the output. In Python it would be something like:
print(u"\u00C9 necess\u00E1rio o texto.")
which outputs:
É necessário o texto.
Otherwise, you need to convert them to their ASCII equivalents, which you can do with a simple find/replace. I ended up writing a function like that for converting Romanian diacritics a while ago, but I had dynamic strings coming in...
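For the ASCII route, a general alternative to a hand-written find/replace table is Unicode decomposition. The Python sketch below (the name to_ascii is illustrative) uses NFKD normalization from the standard unicodedata module to split each accented letter into a base letter plus combining marks, then drops the marks. It assumes lossy ASCII is really what you want: letters with no decomposition, such as ß or ø, pass through unchanged.

```python
import unicodedata


def to_ascii(text: str) -> str:
    # NFKD splits e.g. "á" into "a" + U+0301 COMBINING ACUTE ACCENT;
    # dropping every combining mark leaves only the base letters.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(to_ascii("\u00C9 necess\u00E1rio o texto."))  # E necessario o texto.
    # Romanian letters decompose the same way (s/t with comma below, a with breve, ...)
    print(to_ascii("\u0219\u021B\u0103\u00E2\u00EE"))  # staai
```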

Smells to me like Unicode.
\u = prefix for a Unicode character
00E1 = the character's code point as a 4-digit hex number (a 2-byte value)
Not sure what the format is (I would ask the sender), but I would try this approach to decode it.
Found it ;)
http://www.fileformat.info/info/unicode/char/20/index.htm
Look at the tables showing how the character is written in source code. This could be a C++ source file: this is the way you write Unicode characters in source code.
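The key=value lines look like a Java .properties file, a format that conventionally writes non-ASCII characters as \uXXXX escapes (Java's Properties.load decodes them automatically). Outside Java, each escape can be decoded by reading its four hex digits as a code point: 00E1 is 225, which is á. A minimal Python sketch, assuming every escape is exactly four hex digits in the Basic Multilingual Plane and that the file contains no literal backslash-u text that should stay escaped; decode_escapes, decode_file and the file names in the usage comment are illustrative placeholders.

```python
import re
import sys

# A backslash, the letter u, then exactly four hex digits (e.g. \u00E1).
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_escapes(line: str) -> str:
    # Replace each escape with the character whose code point it spells out.
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), line)


def decode_file(src: str, dst: str) -> None:
    # The escaped file is plain ASCII; Latin-1 can read any byte safely.
    with open(src, encoding="latin-1") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(decode_escapes(line))


if __name__ == "__main__":
    # Usage (placeholder names): python decode_props.py messages_pt.properties decoded.txt
    if len(sys.argv) == 3:
        decode_file(sys.argv[1], sys.argv[2])
    else:
        print(decode_escapes(r"error.email.invalid=O e-mail \u00E9 inv\u00E1lido."))
```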

Related

Remove backslash from string in Lua

I'm working with a URL string and tried to remove the "\" characters from it so I can use the URL later.
But when I tried using string.gsub, it didn't work as it should; instead it gave me the wrong output.
The string is:
nas="\\192.168.1.220\STORAGE_1d1b7\a\b\c"
Code I have tried:
nas=string.gsub(nas,'\\',"")
print(nas)
Output:
192.168.1.220STORAGE_1d1b7??c
Output I need:
192.168.1.220STORAGE_1d1b7_a_b_c
It removes the "\", but it also turns some of the "\" sequences into "?".
I don't know where the "?" comes from.
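The character \ is used to escape special characters in a string: for example, \n represents a newline (ASCII code 10). Likewise, \a is the bell character (ASCII code 7) and \b is backspace (ASCII code 8), in Lua just as in C/C++, and these unprintable characters are what show up as "?" in your output.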
So, you'd need to define your string as:
nas = "\\\\192.168.1.220\\STORAGE_1d1b7\\a\\b\\c"
Alternatively, Lua provides another way to define raw strings, long brackets, inside which escape sequences are not interpreted:
nas = [[\\192.168.1.220\STORAGE_1d1b7\a\b\c]]
Anyway, I figured it out:
-- remove the leading "\\", turn the first remaining "\" into "_" and the rest into "/"
NASLocation = NASLocation:gsub('\\\\', ''):gsub('\\', '_', 1):gsub('\\', '/')
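The same escape pitfall exists in Python, so here is a Python sketch (not Lua) of both the problem and one way to produce the "Output I need" string. The rule in clean_path, a hypothetical helper, is inferred from that example: drop the leading \\, delete the separator after the host, and turn the remaining backslashes into underscores.

```python
# In an ordinary string literal, \a and \b are escapes, just as in Lua:
# they become the bell (ASCII 7) and backspace (ASCII 8) control characters.
print(repr("\a\b"))  # '\x07\x08'

# A raw string keeps every backslash, like Lua's [[...]] long strings.
nas = r"\\192.168.1.220\STORAGE_1d1b7\a\b\c"


def clean_path(path: str) -> str:
    # Strip the leading backslashes, delete the first remaining one,
    # then replace all others with underscores.
    return path.lstrip("\\").replace("\\", "", 1).replace("\\", "_")


print(clean_path(nas))  # 192.168.1.220STORAGE_1d1b7_a_b_c
```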

Tell if specific char in string is a long char or a short char

Be prepared, this is one of those hard questions.
In Farsi (Persian), the letter ی sounds like y or i and is written in 4 different shapes depending on its position in the word. For simplicity, I'll call ی YA from now on.
Take a look at this image (not reproduced here):
All YA characters are painted in red. In the first word, YA is attached to the preceding character (the one on its right, since Farsi is written from right to left) and is free at the end, whereas the last YA (3rd word, left-most red character) is free on both the left and the right.
With that background, I want to find out whether a part of a string ends with a long YA (YA without dots) or a short YA (YA with two dots beneath it).
For example, تحصیلداری (the 3rd word) ends with a long YA, but تحصیـ, which is part of the 3rd word, does not end with a short YA.
Question: how can I tell which Unicode character تحصیلداری ends with? I just have a plain string, "تحصیلداری"; how can I convert its characters to Unicode?
I tried printing the Unicode value of each character:
// Collect each character and its numeric (UTF-16) value, one per line
string unicodes = "";
foreach (char c in "تحصیلداری")
{
    unicodes += c + " " + ((int)c).ToString() + Environment.NewLine;
}
MessageBox.Show(unicodes);
The result (a screenshot, not included here) showed that, unfortunately, all YAs have the same Unicode value.
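The C# loop above prints decimal values; a Python sketch of the same check, printing each character as U+XXXX with its Unicode name, makes the point of the question visible: both YAs in the word are the same character. The word is spelled with escapes here (an assumption about how the original string was typed), since visually identical Arabic and Farsi letters can have different code points; code_points is an illustrative name.

```python
import unicodedata

# تحصیلداری spelled out with escapes; both YAs are U+06CC ARABIC LETTER FARSI YEH.
WORD = "\u062A\u062D\u0635\u06CC\u0644\u062F\u0627\u0631\u06CC"


def code_points(text: str) -> list:
    # One "U+XXXX NAME" entry per character, in logical (storage) order.
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in text]


if __name__ == "__main__":
    for entry in code_points(WORD):
        print(entry)
```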
Bad news: YA was just an example, though a real one. There are also a dozen other characters like YA that have different appearances.
Additional info:
Using this useful link about Unicode, I found the Unicode values of the different YAs.
We solved a similar problem as follows:
We had a core banking application whose customer subsystem needed full-text search on customers' first name, family name, father's name, etc.
Different encodings, migrated legacy data, keyboard layouts and Farsi fonts made the search inaccurate.
We overcame the problem by replacing problematic characters with standard ones and saving the standardized string for search purposes.
After several iterations, we arrived at the replacement below, which may come in handy:
Formula="UPPER(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(FirsName || LastName || FatherName,
chr(32),''),
chr(13),''),
chr(9),''),
chr(10),''),
'-',''),
'-',''),
'آ','ا'),
'أ', 'ا'),
'ئ', 'ي'),
'ي', 'ي'),
'ك', 'ک'),
'آإئؤةي','اايوهي'),
'ء',''),
'شأل','شاال'),
'ا.','اله'),
'.',''),
'الله','اله'),
'ؤ','و'),
'إ','ا'),
'ة','ه'),
' ا لله','اله'),
'ا لله','اله'),
' ا لله','اله'))"
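Outside the database, the single-character part of this chain can be expressed as one translation table. The Python sketch below uses str.maketrans and str.translate, a different technique from the nested REPLACE calls, and covers only a subset of them: the multi-character rules such as 'الله' to 'اله' and 'ا.' to 'اله' are left out, and would have to run before the '.' removal to keep the SQL's order. Folding Farsi YEH into Arabic YEH is an assumption, since the 'ي','ي' line above appears to have lost the distinction between the two letters; search_key is an illustrative name.

```python
# Character-level subset of the REPLACE chain, as one translation table.
FOLD = str.maketrans({
    " ": None, "\r": None, "\t": None, "\n": None,  # chr(32), chr(13), chr(9), chr(10)
    "-": None, ".": None,
    "\u0621": None,      # HAMZA removed
    "\u0622": "\u0627",  # ALEF WITH MADDA -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0624": "\u0648",  # WAW WITH HAMZA -> WAW
    "\u0626": "\u064A",  # YEH WITH HAMZA -> YEH
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
    "\u0643": "\u06A9",  # Arabic KAF -> KEHEH (Persian kaf)
    "\u06CC": "\u064A",  # assumption: Farsi YEH folded to Arabic YEH
})


def search_key(*parts: str) -> str:
    # Concatenate the name fields (like FirsName || LastName || FatherName),
    # fold the characters, then upper-case as the SQL UPPER() does.
    return "".join(parts).translate(FOLD).upper()


if __name__ == "__main__":
    print(search_key("\u0643\u0631\u06CC\u0645", " \u0622\u0631\u0634"))
```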
Although there are several different YEHs in Unicode, note that all presentation forms of the Farsi YEH are the same Unicode character, 0x06CC. You cannot determine the presentation form from the Unicode code point.
But you can reach your goal by checking which characters come before and after the YEH.
You can also use Fardis to see the Unicode code points of a string.
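Following that advice, the visible form of a YEH can be derived from its neighbours. A minimal Python sketch, assuming the letter at position i is dual-joining (like YEH) and using a simplified, hypothetical list of letters that never connect to the following letter. A Farsi YEH is drawn with two dots only in its initial and medial forms, so "long YA" corresponds to the final or isolated form. contextual_form and yeh_has_dots are illustrative names; real shaping engines handle more cases, for example combining marks (harakat) between letters, which this sketch does not skip.

```python
import unicodedata

# Letters that join to the preceding letter but never to the following one
# (simplified list: ALEF variants, WAW variants, DAL, THAL, REH, ZAIN, JEH, TEH MARBUTA).
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648\u0698")
NON_JOINING = {"\u0621"}  # HAMZA joins on neither side

WORD = "\u062A\u062D\u0635\u06CC\u0644\u062F\u0627\u0631\u06CC"  # تحصیلداری


def _is_arabic_letter(c: str) -> bool:
    # Letters (Lo) and TATWEEL (U+0640, category Lm) in the basic Arabic block.
    return "\u0620" <= c <= "\u06FF" and unicodedata.category(c) in ("Lo", "Lm")


def contextual_form(text: str, i: int) -> str:
    # Shape that the dual-joining letter at position i takes in this string.
    prev_c = text[i - 1] if i > 0 else ""
    next_c = text[i + 1] if i + 1 < len(text) else ""
    joins_prev = (_is_arabic_letter(prev_c) and prev_c not in RIGHT_JOINING
                  and prev_c not in NON_JOINING)
    joins_next = _is_arabic_letter(next_c) and next_c not in NON_JOINING
    if joins_prev and joins_next:
        return "medial"
    if joins_next:
        return "initial"
    if joins_prev:
        return "final"
    return "isolated"


def yeh_has_dots(text: str, i: int) -> bool:
    # Farsi YEH (U+06CC) shows two dots only in its initial and medial forms.
    return contextual_form(text, i) in ("initial", "medial")


if __name__ == "__main__":
    for i, ch in enumerate(WORD):
        if ch == "\u06CC":
            print(i, contextual_form(WORD, i), yeh_has_dots(WORD, i))
```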

VIM: how to change case of accented characters with 'gU'?

The following sentence contains all the accented characters (characters with diacritics) used in the Czech language:
příliš žluťoučký kůň úpěl ďábelské ódy
Now I convert this line to uppercase using gUU and I get:
PříLIš žLUťOUčKý Kůň úPěL ďáBELSKé óDY
instead of:
PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY
As you can see the characters with accents don't get converted. What do I have to set in my .vimrc to get it working right?

Convert a String to unicode

I want to convert the string below into Unicode and save it into a database. Can someone please tell me how to convert a string that has non-ASCII characters to Unicode?
For example:
String s = "how are you ¿cómo estás";
convert it into
how are you \u??? (non-ASCII characters converted to Unicode)
Is there any Java API that does this?
Thanks,
Sai.
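The question asks about Java, but to keep one language for the examples, here is a Python sketch of the conversion. Java's \uXXXX escapes are written in UTF-16 code units, so a character above U+FFFF needs two escapes (a surrogate pair); the sketch encodes each non-ASCII character to UTF-16 to get those units. to_unicode_escapes is an illustrative name, and lowercase hex is an arbitrary choice (Java accepts either case).

```python
def to_unicode_escapes(text: str) -> str:
    # Keep ASCII as-is; write every other character as \uXXXX escapes,
    # one per UTF-16 code unit (so surrogate pairs above U+FFFF).
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        utf16 = ch.encode("utf-16-be")
        for k in range(0, len(utf16), 2):
            unit = int.from_bytes(utf16[k:k + 2], "big")
            out.append(f"\\u{unit:04x}")
    return "".join(out)


if __name__ == "__main__":
    print(to_unicode_escapes("how are you \u00bfc\u00f3mo est\u00e1s"))
```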

Eggplant/Sensetalk parsing and separating a string with capitalized words

I need to parse and split a text string using SenseTalk (the scripting language the Eggplant GUI tester uses). What I'd like to do is give the code a text string:
Put "MyTextIsHere" into exampleString
And then have spaces inserted before every capital letter save for the first, so the following is then stored in exampleString:
"My Text Is Here"
I basically want to separate the string into the words it contains. After searching the documentation and the web, I'm no closer to finding a solution to this (I agree, it would be far easier in a different language - alas, not my choice).
Thank you in advance to anyone who can provide some insight!
See question at http://www.testplant.com/phpBB2/viewtopic.php?t=2192.
With credit to Pamela at TestPlant forums:
set startingString to "HereAreMyWords"
set myRange to 2 to the number of characters in startingString // The range to iterate over: every character except the first
Put the first character in startingString into endString // The first character isn't included in the repeat loop, so you have to put it in separately
repeat with each character myletter of characters myRange of startingString
if charToNum(myLetter) is between 65 and 90 // if the character's unicode number is between 65-90...
Put space after endString
end if
Put myLetter after endString
end repeat
put endString
or you could do it this way:
Put "MyTextIsHere" into exampleString
repeat with each char of chars 2 to last of exampleString by reference
if it is an uppercase then put space before it
end repeat
put exampleString
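For comparison, the same loop in Python (a sketch; split_capitalized is an illustrative name). It uses Python's str.isupper rather than the ASCII range 65 to 90 from the first answer, so accented capitals such as É also start a new word.

```python
def split_capitalized(text: str) -> str:
    # Keep the first character, then add a space before each later
    # uppercase letter, as both SenseTalk answers do.
    result = text[:1]
    for ch in text[1:]:
        if ch.isupper():
            result += " "
        result += ch
    return result


if __name__ == "__main__":
    print(split_capitalized("MyTextIsHere"))  # My Text Is Here
```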
