tkinter - Hebrew text changes order mid-text - python-3.x

So I got this weird issue where I simply want to display a Hebrew text (with special characters, accents, diacritics, etc.) on a Label.
from tkinter import Tk, Label

# Read the whole file and split it into lines
with open("hebrew.txt", encoding="utf-8") as f:
    lines = f.read().split("\n")
window = Tk()
lbl = Label(window, text=lines[11], font=("Narkisim", 14))
lbl.grid(column=0, row=0)
window.geometry('850x200')
window.mainloop()
Hebrew is an RTL language. The thing is, up to a specific line length the text displays correctly. But once the line is longer than that, the following words appear to the right of the text (in English, think of it as if words started appearing to the left of the text instead of each new word being the right-most). This only happens with lines over a certain length (e.g. if I do text=lines[11][:108] it's still good). The "good" length changes a bit from line to line.
This only happens with text that contains nikkud/diacritics/accents. If I use regular Hebrew text without these special characters, it's all good. Any ideas what could be the issue? It's driving me crazy.

Related

PIL Drawing text and breaking lines at \n

Hi, I'm having trouble drawing longer texts onto an image. They should break at specific points, but the \n is always printed literally instead of breaking the line. I can't find any information online on whether this is even possible, or whether PIL just draws the raw string onto the image without checking for line breaks. The text is just some random content from a CSV.
def place_text(self, text, x, y):
    # c and count are not shown in the question; presumably set elsewhere
    temp = self.csv_input[int(c)][count]
    font = ImageFont.truetype('arial.ttf', 35)  # font, e.g. arial.ttf
    w_txt, h_txt = font.getsize(text)
    print("Now we are in the second option")
    draw_text = ImageDraw.Draw(self.card[self.cardCount])
    draw_text.text((x, y), temp, fill="black", font=font, align="left")
Yeah, I know this code is kind of all over the place, but that shouldn't cause any issues for putting the text on the image, should it?
Writing text onto an image this way results in a single line of continuous text with the \n's still in it and no line breaks.
Found the answer: the string pulled from the CSV had to be decoded again before being placed.
text = bytes(text, 'utf-8').decode("unicode_escape")
did the trick.
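A minimal sketch of what that fix does, assuming the CSV field holds the two characters backslash and n rather than a real newline. The helper names `unescape` and `expand_newlines` are illustrative and not from the original code. One caveat: the `unicode_escape` round trip reads non-escape bytes as Latin-1, so non-ASCII text such as umlauts comes out garbled. A plain `replace` avoids that if `\n` is the only escape you need.

```python
def unescape(text: str) -> str:
    """The fix from the answer: interpret backslash escapes such as \\n."""
    return bytes(text, "utf-8").decode("unicode_escape")


def expand_newlines(text: str) -> str:
    """Alternative sketch: only turn a literal backslash-n into a newline."""
    return text.replace("\\n", "\n")


raw = "first line\\nsecond line"  # as it might arrive from the CSV
print(unescape(raw))
print(expand_newlines("Grüße\\nWelt"))  # non-ASCII text stays intact
```

Either result can then be passed to `draw_text.text(...)`, which draws a string containing real newlines as multiple lines.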

How to begin highlighting words

I am writing my own Python editor app. I am using a Text widget and want to highlight words as they are typed, like a Python editor does. As soon as the character # is typed, I want to highlight it and all the characters that follow it in red.
Below is part of the code. When the character # is typed into the Text widget, I add a tag 'CM' from the typed character to the end of the line (I thought this would do the job).
import tkinter as tk
def onModification(event=None):
    c=event.char
    if not c: return
    pos=hT0.index(tk.INSERT)
    if c=='#':
        # Tag from the insertion cursor to the end of the current line
        hT0.tag_add('CM',pos,f'{int(pos.split(".")[0])}.end')
        return
hW=tk.Tk()
hT0=tk.Text(hW,wrap='none',font=('Times New Roman',12))
hT0.insert('end','')
hT0.place(x=27, y=0, height=515,width=460)
hT0.bind('<Key>', onModification)
hT0.tag_config('CM', foreground='#DD0000')
hW.mainloop()
But this highlights only the characters that already existed on the line, not the just-typed # itself.
Any ideas on how to do what I want?
Thank you so much in advance.
I got an idea from the question "Get position in tkinter Text widget":
def onModification(event=None):
    ...
    pos=hT0.index(tk.INSERT)
    lineN, ColN=[int(c) for c in pos.split('.')]
    if c=='#':
        #hT0.tag_add('CM',pos,f'{int(pos.split(".")[0])}.end')
        # The handler now runs after the character is inserted, so start one column back
        hT0.tag_add('CM',f'{lineN}.{ColN-1}',f'{lineN}.end')
        return
    ...
#hT0.bind('<Key>', onModification) needs to be changed to...
hT0.bindtags(('Text','post-class-bindings','.','all'))
hT0.bind_class('post-class-bindings', '<KeyPress>', onModification)
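An alternative to working out the column from the cursor position is to recompute the tag range from the text of the current line each time the handler runs. The sketch below shows only the index arithmetic. `comment_tag_range` is a hypothetical helper, and it naively treats the first # on the line as the start of a comment, even inside a string literal.

```python
def comment_tag_range(line_no, line_text):
    """Return (start, end) Text indices covering a '#' comment, or None.

    line_no   -- 1-based line number, as used in tkinter Text indices
    line_text -- the current contents of that line
    """
    col = line_text.find("#")
    if col == -1:
        return None
    return f"{line_no}.{col}", f"{line_no}.end"


print(comment_tag_range(3, "x = 1  # note"))
print(comment_tag_range(4, "x = 1"))
```

In the handler, the line text could be fetched with `hT0.get(f'{lineN}.0', f'{lineN}.end')`, the old tag cleared with `hT0.tag_remove('CM', ...)`, and the returned range passed to `hT0.tag_add('CM', start, end)`.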

opening a large text file on Label or Text widget in tkinter

I'm working on a tkinter application and I want to display a read-only text file. Whenever I add the text file using a Label or Text widget, the text is not organised at all.
I want to open a text file in a Label or Text widget (whichever works), but what I keep getting is text that runs beyond my window frame. I added a scrollbar, but it still does the same thing. I want the text to be laid out neatly within a read-only Label/Text widget. Thank you in advance.
from tkinter import *
root = Tk()
text_file = open("C:\\Users\\stone's\\Desktop\\works.txt")
text1 = text_file.read()
for i in text1:
    if len(text1)==50:
        ## MOVE TO NEXT LINE
        Label(root, text="%s" % ('\n'),
              font=('Bradley Hand ITC', '25', 'bold'),
              bg='#c9e3c1').pack()
    else:
        ## DON'T MOVE OVER TO NEXT LINE
        Label(root, text="%s" % (i), font=('Bradley Hand ITC',
                                           '25', 'bold'
                                           ),
              bg='#c9e3c1').pack(side = LEFT)
## ALL I'M TRYING TO DO IS TO SHOW A TEXT ON A LABEL APPROPRIATELY
## WITHOUT THE TEXTS SKIDDING OUT OF THE WINDOW FRAME
root.mainloop()
As the answers to this question explain, you can make the Text widget read-only. That question is for Python 2, but the comment I linked shows how to do it in Python 3.
As a side note: labels are meant for displaying small pieces of text as labels for other elements, not for displaying a whole file read-only.
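A minimal sketch of that approach, under a few assumptions: the names `wrap_text` and `show_readonly` are illustrative, and the Text widget is made read-only by setting `state="disabled"` after inserting the content. With `wrap="word"` the Text widget wraps long lines by itself. `wrap_text` is only needed if you still want to feed pre-wrapped text to a Label.

```python
import textwrap


def wrap_text(text, width=50):
    """Wrap each line to at most `width` characters, keeping existing line breaks."""
    return "\n".join(textwrap.fill(line, width) for line in text.splitlines())


def show_readonly(content):
    """Show `content` in a scrollable, read-only Text widget."""
    # Imported here so wrap_text can be used without a display
    import tkinter as tk

    root = tk.Tk()
    text = tk.Text(root, wrap="word")
    scroll = tk.Scrollbar(root, command=text.yview)
    text.configure(yscrollcommand=scroll.set)
    scroll.pack(side="right", fill="y")
    text.pack(side="left", fill="both", expand=True)
    text.insert("1.0", content)
    text.configure(state="disabled")  # read-only from here on
    root.mainloop()


# Usage (opens a window):
# with open("works.txt") as f:
#     show_readonly(f.read())
```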

Unicode manipulation and garbage '[]' characters

I have a 4GB text file which I can't even load to view so I'm trying to separate it but I need to manipulate the data a bit at a time.
The problem is that I'm getting garbage characters that look like white vertical rectangles. I can't look them up in a search engine because they won't paste, and I can't get rid of them either.
They look like square brackets '[]' but without the small gap in the middle.
Their Unicode values differ so I can't just select one value and get rid of it.
I want to get rid of all of these rectangles.
Two more questions.
1) Why are there any Unicode characters here (in the image below) at all? I decoded them. What am I missing? Note: later on I get string output that looks like a normal string, such as 'code1234', but those Unicode exceptions appear there as well.
2) Can you see why larger end values would raise the exception "list index out of range"? It only happens towards the end of the range and the count isn't constant, i.e. if end is 100 then maybe the last 5 throw that exception, but if end is 1000 then only the last, let's say, 10 throw it.
Some code:
from itertools import islice
def read_from_file(file, start, end):
    with open(file,'rb') as f:
        for line in islice(f, start, end):
            data.append(line.strip().decode("utf-8"))
    for i in range(len(data)-1):
        try:
            if '#' in data[i]:
                a = data.pop(i)
                mail.append(a)
            else:
                print(data[i], data[i].encode())
        except Exception as e:
            print(str(e))
data = []
mail = []
read_from_file('breachcompilationuniq.txt', 0, 10)
Some Output:
Image link here as it won't let me format after pasting.
There's also this stuff later on, I don't know what these are either.
It appears that you have a text file which is not in the default encoding assumed by Python (UTF-8), but nevertheless uses byte values in the range 128-255. Try:
f = open(file, encoding='latin_1')
content = f.read()
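A sketch combining this suggestion with a fix for the second question. Decoding as Latin-1 never fails because every byte value maps to a character, though the result is only meaningful if the file really is Latin-1. The "list index out of range" errors come from calling `data.pop(i)` inside a loop over `range(len(data)-1)`: the list shrinks while the indices keep counting up to the original length. Building two new lists avoids that. The function names here are illustrative.

```python
from itertools import islice


def read_lines(path, start, end, encoding="latin_1"):
    """Return lines [start, end) of the file, decoded and stripped."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in islice(f, start, end)]


def split_mail(lines):
    """Partition lines into (mail, other) without mutating a list mid-loop."""
    mail = [line for line in lines if "#" in line]
    other = [line for line in lines if "#" not in line]
    return mail, other


# Usage, with the file name from the question:
# mail, data = split_mail(read_lines("breachcompilationuniq.txt", 0, 10))
```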

Sublime 3; How to accurately count characters when both CR and LF are present in line termination

When editing a text file, the default character-counting function in Sublime 3 counts a newline as one character, irrespective of whether the line ends in LF or CR,LF. I cannot find a setting to give me the correct count for a text file with CR,LF line endings.
I've tried installing the WordCount package, but it has the same issue. Setting the preference
char_ignore_whitespace : true
does not change the behaviour.
One could argue that Sublime's behaviour is incorrect since (for the files I am working with) the newline constitutes two characters, not one.
The reason I would like the count to include the CR in the character count (i.e. a newline is 2 characters) is so that I can use Sublime to help debug some code that is using ftell/fseek. As it stands, I have to keep adding the line count to the character count to get the correct byte position in the file (when selecting all text from the beginning of the file to the point of interest).
Is there a setting I am missing? Is there a different package that can be used?
EDIT: I noticed that Notepad++ correctly reports the character count for such files, but I prefer to use Sublime :)
EDIT2: I found some code here (Is it possible to show the exact position in Sublime Text 2?) that works in Sublime 3, but also counts two-character newlines incorrectly.
I modified the code in the link above to crudely compensate for the missing end-of-line counts. The code below simply adds the number of lines (zero-based) to the character position and displays this as an alternative position (in case a unix-style file is open, in which case the compensation is not required).
This is admittedly crude, and not intelligent enough to determine what type of line ending the file has. Maybe someone else has a better idea?
import sublime, sublime_plugin
class PositionListener(sublime_plugin.EventListener):
    def on_selection_modified(self,view):
        text = "Position: "
        sels = view.sel()
        for s in sels:
            if s.empty():
                row, col = view.rowcol(s.begin())
                text += str(s.begin())
                text += " [" + str(s.begin() + row) + "]"
            else:
                text += str(s.begin()) + "-" + str(s.end())
                row, col = view.rowcol(s.begin())
                text += " [" + str(s.begin() + row) + "-"
                row, col = view.rowcol(s.end())
                text += str(s.end() + row) + "]"
        view.set_status('exact_pos', text)
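The compensation can also be written as a small pure function that makes the line-ending check explicit. This is only a sketch: `file_offset` is a hypothetical helper, and it assumes every character occupies one byte in the file (true for ASCII, not for multi-byte UTF-8). In a plugin, the `line_endings` argument could presumably come from `view.line_endings()`, which I believe returns "Windows", "Unix" or "CR"; treat that as an assumption to check against the API documentation.

```python
def file_offset(char_pos: int, row: int, line_endings: str = "Windows") -> int:
    """Map an editor character position to a byte offset in the file.

    char_pos     -- position as reported by the editor (newline = 1 char)
    row          -- zero-based row of that position
    line_endings -- "Windows" for CR,LF files; anything else adds nothing
    """
    extra = row if line_endings == "Windows" else 0
    return char_pos + extra


# The editor buffer sees "ab\ncd\nef"; the file on disk holds CR,LF pairs.
on_disk = "ab\r\ncd\r\nef"
in_editor = on_disk.replace("\r\n", "\n")
pos = in_editor.index("e")            # position in the editor buffer
row = in_editor.count("\n", 0, pos)   # zero-based row of that position
print(file_offset(pos, row), on_disk.index("e"))
```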

Resources