(Python3) string's (real) length/width - python-3.x

Let's say I want to "underline" a string with dots in Python 3.
Everything's fine with ASCII characters: I get the length with len(mystring) and write as many dots as needed. Here is an example with a string whose length is 8:
mystring
........
But with non-ASCII characters, len(mystring) doesn't return the result I need; e.g. len("列島") is 2, but I need 4 dots to underline the string:
列島
....
How can I get the correct result? Any help would be appreciated!

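unicodedata.east_asian_width() does the trick. It returns the East Asian Width category of a character; wide ('W') and fullwidth ('F') characters take up two columns in a monospaced display.
See, for example, this implementation.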
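A minimal sketch of how east_asian_width() could be used for the underline. The helper names display_width and underline are made up for illustration, and two choices here are assumptions rather than anything the answer states: ambiguous-width characters ('A') are counted as one column, and combining marks are counted as zero.

```python
import unicodedata


def display_width(text):
    """Estimate how many monospace columns `text` occupies."""
    width = 0
    for ch in text:
        # Assumption: combining marks are drawn on top of the previous
        # character and take no column of their own.
        if unicodedata.combining(ch):
            continue
        # 'W' (wide) and 'F' (fullwidth) characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def underline(text, fill="."):
    """Return `text` followed by a line of `fill` characters of matching width."""
    return text + "\n" + fill * display_width(text)


print(underline("mystring"))
print(underline("列島"))
```

The actual rendering still depends on the terminal and font, so this is an estimate and not a guarantee.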

How to find arabic character all occurrences with its dialect?

I am trying to count the occurrences of an Arabic character together with its harakat in a string, such as "رَّ" in "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ".
Arabic characters can take harakat: for example, "ر" is the base Arabic character, but with harakat it can look like "رَّ". I am using Python 3 to count the occurrences of the character with a specific harakat, but could not get it to work. I tried a for loop and tried converting the string to Unicode, but neither worked.
str = "مرة رجل حكيم قال بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
i=0
for s in str:
    if s == "رَّ":
        i = i + 1
print(i)
Expected output is 2 but 0 is what I get.
len("رَّ") returns 3, which means the glyph is represented by three characters. Your loop checks a single character at a time and so never finds a match.
You need to be looking for substrings, which is exactly what .count() is for.
i = str.count('رَّ')
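A short sketch of the substring approach. One addition goes beyond the answer and is an assumption on my part: both strings are passed through unicodedata.normalize("NFC", ...) first, so that the same harakat typed in a different order (shadda then fatha, or fatha then shadda) still match. The function name count_with_marks is hypothetical.

```python
import unicodedata


def count_with_marks(text, target):
    """Count occurrences of `target` (base letter plus harakat) in `text`."""
    # Normalizing puts combining marks into a canonical order, so the
    # comparison does not depend on the order in which they were typed.
    norm_text = unicodedata.normalize("NFC", text)
    norm_target = unicodedata.normalize("NFC", target)
    return norm_text.count(norm_target)


text = "مرة رجل حكيم قال بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"
target = "رَّ"

print(len(target))                     # number of code points in the target
print(count_with_marks(text, target))  # occurrences of the full sequence
```

Note that this avoids naming the variable str, which would shadow the built-in type.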

Remove non ASCII characters in octave

I'm trying to remove non-ASCII characters read from a data file using Octave but I can't make it work. I tried getting the ASCII codes of these "weird" characters and they seem to have random codes. An example string of characters is this:
asdqwФЕДЕРАЛЬ234НОЕ234 АГЕНТСqewwqedasТВО ПasdsadО ОБРАasdasdЗОВАНИЮ
Госудаsadasdsagwfрственная акадеasdмия профессиональной п
Do you guys have any suggestions on how I can remove the non-ASCII characters from this string? Or better yet, how can I determine whether a given string has non-ASCII characters?
Thanks in advance!
To remove all non-ASCII chars, i.e. everything outside the range 0..127 decimal, use
a = "asdqwФЕДЕРАЛЬ234НОЕ234 АГЕНТСqewwqedasТВО ПasdsadО ОБРАasdasdЗОВАНИЮ Госудаsadasdsagwfрственная акадеasdмия профессиональной п";
a(! isascii (a)) = []
which gives
a = asdqw234234 qewwqedas asdsad asdasd sadasdsagwf asd
and if you just want to check if there are non ASCII chars:
any (! isascii("foobar"))
ans = 0
any (! isascii("foobaröäüß"))
ans = 1
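For comparison, the same two operations sketched in Python (not Octave). The function names are hypothetical; str.isascii() requires Python 3.7 or later.

```python
def strip_non_ascii(text):
    """Keep only characters whose code point is in the ASCII range 0..127."""
    return "".join(ch for ch in text if ord(ch) < 128)


def has_non_ascii(text):
    """Return True if `text` contains at least one non-ASCII character."""
    return not text.isascii()


print(strip_non_ascii("asdqwФЕДЕРАЛЬ234НОЕ234"))
print(has_non_ascii("foobar"))
print(has_non_ascii("foobaröäüß"))
```

One difference worth keeping in mind: Octave indexes the string byte by byte, while Python iterates over code points, but the result is the same here because every byte of a multi-byte UTF-8 character is outside the ASCII range.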

Oracle, adding leading zeros to string (not number)

I am using Oracle (workspace is TOAD) and I need to pad my strings so that any string shorter than 10 characters gets leading zeros, making them all 10-character strings.
For example if I have a string like this:
'12H89' need to be '0000012H89'
or
'1234' to be '0000001234'
How can this be done? What's the best way?
Thanks in advance .
You can use the LPAD function for that, passing in the string, the length you want it to be, and the character to pad it with. For 10 digits with leading zeroes this would be:
LPAD('12H89', 10, '0')
The return value is the padded string.
See: http://www.techonthenet.com/oracle/functions/lpad.php
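To make the behaviour concrete outside the database, here is a rough Python model of what LPAD does with a single pad character. The function lpad below is an illustration, not Oracle code. It also mirrors one detail that is easy to miss: when the input is already longer than the requested length, LPAD truncates it to that length.

```python
def lpad(text, length, pad="0"):
    """Rough model of Oracle's LPAD for a single-character pad."""
    if len(text) >= length:
        # LPAD cuts the string down to `length` characters.
        return text[:length]
    # Otherwise fill on the left up to `length` characters.
    return text.rjust(length, pad)


print(lpad("12H89", 10))
print(lpad("1234", 10))
print(lpad("12345678901", 10))
```

If truncation is not wanted, the query would need to guard against strings longer than 10 characters separately.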

AS3 - "\u2605" NOT the same as "\\u"+"2605"?

I'm trying to make a text field where people type a Unicode escape without the backslash, and I want to add the backslash after they have typed it. So the user types u2605 and the code converts it to "\u2605"; I then convert this to a Unicode character and insert it in the text flow.
My code:
this works:
span.text = publicFunctions.htmlUnescape(he.encode("\u2605"))
this doesn't work:
span.text = publicFunctions.htmlUnescape(he.encode("\\u"+"2605"))
How can I make a string that acts as a Unicode escape?
I tried all sorts of things: escape(unescape()), converting to a number, "\\u", "\u" ... nothing helps.
trace("\u2605" == "\\u"+"2605") ... will return false. So will
trace("\u2605" == "\u"+"2605")
"\u2605" is a string with a single character, the character with the code point 2605, while "\\u" + "2605" is a string with 6 characters (the backslash, the u and the four digit number).
If you want to construct a Unicode character from just the four digits, you should be able to use String.fromCharCode. The escape sequence is written with a hexadecimal number, while the method takes a numeric character code. So if the user enters a hexadecimal string, you will have to parse it as base 16 first:
trace(String.fromCharCode(parseInt('2605', 16)) == '\u2605');
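The same idea sketched in Python rather than AS3, where chr() plays the role of String.fromCharCode and int(..., 16) the role of parseInt(..., 16). The helper name char_from_hex and the handling of an optional leading backslash or "u" are assumptions about what the user might type.

```python
def char_from_hex(user_input):
    """Turn input such as 'u2605' or '2605' into the matching character."""
    # Drop surrounding whitespace, an optional backslash, and the 'u' prefix.
    digits = user_input.strip().lstrip("\\").lstrip("uU")
    # Parse the remaining hex digits and convert the code point to a character.
    return chr(int(digits, 16))


# Concatenation only builds a 6-character string; it is not an escape sequence.
print(len("\\u" + "2605"))
print(char_from_hex("u2605") == "\u2605")
```

int() raises ValueError if the remaining text is not valid hexadecimal, which is a convenient place to reject bad input.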
That's an interesting issue! I don't think you can concatenate a string literal and achieve what you're trying to do. The relevant character escaping happens when the string literal is originally formed, which means that you need the whole sequence together in the first place.
But you should be able to take the user-supplied number and dynamically generate a Unicode string with String.fromCharCode(...).
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/String.html#fromCharCode()

Remove unwanted text string with strange hex character?

I have a program that reads in hex strings, and returns text based on the parameters of the string. A hex string goes as follows:
A : B : C
A - Length of B
B - Name of interface
C - Unimportant
so for example;
0465746830010000 =
04 : 65746830 : 010000 =
4 : eth0 : __________
Now, I want to process the hex strings so that if there is a character that isn't in the alphabet, 0-9, or '-' , it lets me know somehow.
Such as here :
0266010000
02 : 6601 : _______
2 : f[unreadable] : _______
Any ideas on how would I process this so that it lets me know if any of the characters outside of these parameters arise?
A quick regex can do that, for instance in Java:
myString.matches("[0-9a-fA-F]*")
will return true if the entire string consists of hex digits, and will return false if there are other characters.
Also, why would you want to accept the - character? These strings should (if properly designed) contain the unsigned representations of the numbers. If they are out of your control, then I guess you just gotta deal with it, but still.
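If the question is read as being about the decoded interface name (field B) rather than the hex digits themselves, the check could look like the Python sketch below. It assumes the layout described in the question: one length byte, then that many name bytes, then the rest. The function name parse_interface and the choice to decode bytes as Latin-1 (so that every byte maps to exactly one character) are assumptions.

```python
import re

# Allowed characters in the decoded name: letters, digits, and '-'.
NAME_OK = re.compile(r"[A-Za-z0-9-]*")


def parse_interface(hex_string):
    """Return (name, is_clean) for a hex string laid out as A : B : C."""
    raw = bytes.fromhex(hex_string)  # raises ValueError on non-hex input
    if not raw:
        raise ValueError("empty input")
    length = raw[0]                  # A: length of the name in bytes
    name = raw[1:1 + length].decode("latin-1")  # B: interface name
    is_clean = NAME_OK.fullmatch(name) is not None
    return name, is_clean


print(parse_interface("0465746830010000"))
print(parse_interface("0266010000"))
```

The first call decodes to eth0 and passes the check; the second decodes to f followed by the byte 0x01, which fails it.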
