I would like to convert the numbers in a string that I get from running OCR on Japanese text.
For example, when I extract a date:
③① 年 ⑫ 月 ①③ 日
I would like to convert it to:
31 年 12 月 13 日
What would be the best way to achieve this?
I would use unicodedata:
import unicodedata
print(unicodedata.normalize("NFKC","③① 年 ⑫ 月 ①③ 日"))
The result is:
31 年 12 月 13 日
This also converts other variants of Japanese digits, such as full-width digits.
import unicodedata
print(unicodedata.normalize("NFKC","123①②③123"))
gives
123123123
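One caveat with the approach above: NFKC also rewrites non-numeric compatibility characters (half-width katakana, for example), which may or may not be wanted for OCR output. A minimal sketch that applies NFKC only to characters Unicode classifies as numbers; the helper name normalize_numbers is hypothetical, not part of unicodedata:

```python
import unicodedata

def normalize_numbers(text):
    """Apply NFKC only to characters in a numeric Unicode category."""
    out = []
    for ch in text:
        # "Nd" = decimal digit (e.g. full-width digits),
        # "No" = other number (e.g. circled digits such as ① or ⑫)
        if unicodedata.category(ch) in ("Nd", "No"):
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)

print(normalize_numbers("③① 年 ⑫ 月 ①③ 日"))  # 31 年 12 月 13 日
```

Note that the "No" category also covers superscripts and vulgar fractions, so those would be normalized too.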
Assuming you already have the text OCR'd to the circled numbers in your question, a simple text replace will suffice. Here's how you'd do it in Python:
def uncircle(s):
    for i in range(1, 21):
        s = s.replace(chr(0x245f + i), str(i))
    return s.replace('\u24ea', '0')
The circled numbers 1 through 20 are the Unicode codepoints 0x2460 through 0x2473, and the circled number 0 is the Unicode codepoint 0x24ea.
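The same codepoint ranges can also be put into a translation table, so the string is processed in a single pass instead of 21 separate replace calls. This is only a sketch; the names CIRCLED and uncircle_translate are made up for illustration:

```python
# Map circled 1..20 (U+2460..U+2473) and circled 0 (U+24EA) to plain digits.
CIRCLED = {0x245F + i: str(i) for i in range(1, 21)}
CIRCLED[0x24EA] = "0"

def uncircle_translate(s):
    # str.translate accepts a dict mapping codepoints to replacement strings.
    return s.translate(CIRCLED)

print(uncircle_translate("③① 年 ⑫ 月 ①③ 日"))  # 31 年 12 月 13 日
```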
I have a list of strings with varying formats. Some of them start with a hyphen-separated date and time, followed by a 16-character string that is a mixture of digits and letters. I would like to keep only the strings that match this format. I've provided input and output examples below. I'm not a regex expert; could someone please suggest a slick way to do this with Python?
Input:
example_list=['2022-05-05-16-59-25-5840ZQ37F231D95W',
'wereD/22fdas/',
'mnkljlj/124kljf/oaahreljah',
'2022-09-11-16-59-25-5840XY37F231D95Z']
Output:
['2022-05-05-16-59-25-5840ZQ37F231D95W',
'2022-09-11-16-59-25-5840XY37F231D95Z']
Update:
Using the suggestion below with re.match and a list comprehension worked fine, thanks!
import re
[x for x in example_list if re.match(r"^\d{4}(-\d\d){5}-[A-Z\d]{16}$", x)]
Try this:
^\d{4}(-\d\d){5}-[A-Z\d]{16}$
Regex breakdown:
^              start of input
\d{4}          4 digits
(-\d\d){5}     5 lots of a dash then 2 digits
-[A-Z\d]{16}   a dash, then 16 capital letters or digits
$              end of input
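As a sketch of how the pattern might be applied to the list from the question: compiling it once and using fullmatch makes the ^ and $ anchors unnecessary. The names PATTERN and filter_ids are illustrative only.

```python
import re

# Raw string so the backslashes reach the regex engine unchanged.
PATTERN = re.compile(r"\d{4}(?:-\d\d){5}-[A-Z\d]{16}")

def filter_ids(items):
    # fullmatch succeeds only if the entire string matches the pattern.
    return [x for x in items if PATTERN.fullmatch(x)]

example_list = ['2022-05-05-16-59-25-5840ZQ37F231D95W',
                'wereD/22fdas/',
                'mnkljlj/124kljf/oaahreljah',
                '2022-09-11-16-59-25-5840XY37F231D95Z']

print(filter_ids(example_list))
```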
I'm trying to understand how Base64 works.
If you wanted to send !"# using Base64, what would it look like?
Here's my working out:
String: ! " #
Hex: 21 22 23
Binary: 00100001 00100010 00100011
Base64 conversion:
Hex: 8 12 8 23
Binary: 001000 010010 001000 100011
None of the final binary values can be represented using any of the ASCII characters in Base64.
I've obviously misunderstood something here; if anyone can point me in the right direction with an example, that would be great.
If I understand your question correctly, you are trying to reinterpret the Base64 values as characters using an ASCII table (i.e. 0x04 would be EOT). Instead, you have to use the Base64 index table to convert the resulting numbers back to characters (note that the index values in that table are in decimal, not in hex).
Here, your values will be (the 6-bit groups 001000 010010 001000 100011 are 8, 18, 8 and 35 in decimal):
Base64:
Hex: 8 12 8 23
String: I S I j
Does that make sense?
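To make the table lookup concrete, here is a minimal sketch that encodes exactly three bytes by hand and compares the result with Python's base64 module. It works from the 6-bit groups 001000 010010 001000 100011 given in the question; the helper name encode_three_bytes is made up for this example.

```python
import base64

# The standard Base64 index table: index 0 is 'A', index 63 is '/'.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_three_bytes(data):
    """Encode exactly 3 bytes into 4 Base64 characters (no padding needed)."""
    if len(data) != 3:
        raise ValueError("this sketch only handles 3-byte inputs")
    bits = int.from_bytes(data, "big")  # 24 bits as one integer
    # Cut the 24 bits into four 6-bit indices, most significant first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return indices, "".join(ALPHABET[i] for i in indices)

indices, text = encode_three_bytes(b'!"#')
print(indices)                    # [8, 18, 8, 35]
print(text)                       # ISIj
print(base64.b64encode(b'!"#'))   # b'ISIj'
```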
I have the following join in my code:
' '.join(s.split()[10:14])
But I also want to print word [16]. I have tried "and", "+", etc., but no luck.
Hope somebody can help me :-)
You can pass multiple slice objects to operator.itemgetter and flatten the resulting slices with itertools.chain.from_iterable, so that you neither split the same string twice nor store the result of the split in a temporary variable:
from operator import itemgetter
from itertools import chain
print(' '.join(chain.from_iterable(itemgetter(slice(10, 14), slice(16, 17))(s.split()))))
Here's one way to do that:
>>> s = 'But i also want to print word [16], i have tried "and", "+" etc. but no luck Hope somebody can help me :-)'
>>> a = s.split()
>>> print(' '.join([*a[10:14],a[16]]))
tried "and", "+" etc. luck
You can do it by joining both strings after extracting them individually.
Sample string:
s = '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19'
' '.join(s.split()[10:14]) + ' ' + s.split()[15]
Also, as pointed out by @blhsing, a second join is not needed here; the 16th element of the split can be appended directly as a string. I also added a space between the two parts.
Output:
Out[19]: '11 12 13 14 16'
I am looking for a formula that converts a 16-bit binary number into two separate decimal numbers:
0000000110010000 -> 0x0190
I want the decimal numbers to be 1 and 144.
I have 50 columns (say M1 to M50) of binary numbers, so I need a generic formula for this.
If M1 contains your binary number (as a text string), then use
=BIN2DEC(LEFT(M1, 8))
to extract the left part
and
=BIN2DEC(RIGHT(M1, 8))
to extract the right part.
If you want the result in the same cell then use something like
=BIN2DEC(LEFT(M1, 8)) & "|" & BIN2DEC(RIGHT(M1, 8))
where the | is an arbitrary separator, which you can change or omit to suit personal taste.
Are they all exactly 16 characters long? If so, you could do:
=BIN2DEC(RIGHT(M1,8))
=BIN2DEC(LEFT(M1,8))
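Splitting into two 8-character halves also keeps each piece within BIN2DEC's 10-bit input limit. To sanity-check the expected values outside the spreadsheet, a small Python sketch (the function name split_16bit is hypothetical) computes the same high and low bytes:

```python
def split_16bit(bits):
    """Return (high byte, low byte) of a 16-character binary string."""
    if len(bits) != 16:
        raise ValueError("expected exactly 16 binary digits")
    value = int(bits, 2)
    # Dividing by 256 separates the upper 8 bits from the lower 8 bits.
    return divmod(value, 256)

print(split_16bit("0000000110010000"))  # (1, 144)
```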
I want to do this but I don't know how; the only function that seems to be useful is "DEC.TO.HEX".
This is the problem: in one cell I have this text:
1234
And in the next cell I want the hexadecimal value of each character. The expected result would be:
31323334
Each character must be represented by two hexadecimal characters. I have no idea how to solve this in Excel without writing a program.
Regards!
Edit: Hexadecimal conversion
Text value   ASCII value (dec)   Hexadecimal value
1            49                  31
2            50                  32
3            51                  33
4            52                  34
Please try:
=DEC2HEX(CODE(MID(A1,1,1)))&DEC2HEX(CODE(MID(A1,2,1)))&DEC2HEX(CODE(MID(A1,3,1)))&DEC2HEX(CODE(MID(A1,4,1)))
In your localized version you might need the dots in the function name (as in the DEC.TO.HEX you mentioned) and perhaps semicolons rather than commas as argument separators.
DEC2HEX may be of assistance. Use, as follows:
=DEC2HEX(A3)
First split 1234 into 1 2 3 4 using MID(), then apply CODE() to each character, and then concatenate the results. Below is the formula; Y21 is the cell containing 1234.
=CONCATENATE(CODE(MID(Y21,1,1)),CODE(MID(Y21,2,1)),CODE(MID(Y21,3,1)),CODE(MID(Y21,4,1)))
1234 >> 49505152
Note that this yields the decimal character codes; wrapping each CODE() call in DEC2HEX() gives the hexadecimal form.
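For checking the expected hexadecimal result of an arbitrary string outside Excel, a short Python sketch follows. It assumes single-byte (ASCII) characters, and text_to_hex is an illustrative name. The 02X format pads each code to two hex digits, which Excel's DEC2HEX can also do through its optional places argument.

```python
def text_to_hex(text):
    # Two uppercase hex digits per character; assumes ASCII input.
    return "".join(f"{ord(ch):02X}" for ch in text)

print(text_to_hex("1234"))  # 31323334
```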