Is there a way to instruct pylatex to not ignore extra spaces (whitespaces) on its own? - python-3.x

I started using pylatex four days ago to automate report generation (I have no earlier experience with LaTeX either). The text that I want to put in the report was generated by an online server and is stored as a text file. In the text file, there are places with more than one consecutive space for proper formatting and alignment (protein sequence alignment). When I simply use .append, pylatex seems to ignore all the extra spaces on its own. I searched the internet with various relevant terms but could not find any answer for pylatex. I did find some LaTeX answers, and based on them I tried pylatex's NoEscape together with explicitly replacing "  " (double spaces) with " \space ". This works as expected for some lines but not for others: maybe some of the places contain \ (or some other character that is special in LaTeX) and are interpreted as commands, but I could not work out what such a command means, since it does not even exist (for example \clo). Can someone please suggest a method that simply preserves the consecutive spaces, so that they are visible in the final document, without my having to think about possible special characters that need to be escaped? The following code snippet that I used might help to answer/understand my query.
from pylatex import NoEscape, Subsection

# doc and whatcheck_detail are defined earlier in the script
i = 0
with doc.create(Subsection('Details')):
    with open(whatcheck_detail, 'r') as fh:
        lines = fh.read().split("#")[1:]
    for line in lines:
        i += 1
        # raw strings keep the backslashes literal
        line = line.replace("  ", r" \space ").replace("#", r"\#") + "\n"
        if "Note:" in line:
            print(i)
            # doc.append(TextColor(line))
            doc.append(NoEscape(line))
        elif "Warning:" in line:
            print(i)
            # line = r"\color{blue} " + line
            doc.append(NoEscape(line))
            # doc.append(TextColor("blue", line))
        elif "Error:" in line:
            print(i)
            # line = r"\color{red} " + line
            doc.append(NoEscape(line))
            # doc.append(TextColor("red", line))
The following is a screenshot of the last part of the error report.
[Image: error description upon running the above script]
The following images show what happens with the simple pylatex code doc.append(TextColor("color", line)), and what is actually in the text file (which is how I want it to appear in the PDF generated by latex/pylatex).
[Image: what happens to the text in the output file]
[Image: what is in the text file, i.e. how I want it]
Thank you!
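One direction that may be worth trying is LaTeX's verbatim environment, which typesets spaces, backslashes, # and other special characters literally and in a monospaced font. The sketch below is only an assumption about how that could be wired up, not a verified pylatex recipe: as_verbatim is a hypothetical helper (it is not part of pylatex), and the sample line is made up.

```python
def as_verbatim(text):
    """Wrap raw text in a LaTeX verbatim environment (hypothetical helper)."""
    # Inside verbatim, LaTeX prints spaces, backslashes, #, % and so on as-is,
    # so nothing needs escaping. The only thing the text must not contain is
    # the terminator of the environment itself.
    if "\\end{verbatim}" in text:
        raise ValueError("text contains \\end{verbatim}")
    return "\\begin{verbatim}\n" + text.rstrip("\n") + "\n\\end{verbatim}\n"


# Made-up sample line with runs of spaces and LaTeX-special characters.
block = as_verbatim("Note: A  B   C \\clo #1\n")
print(block)
```

With the names from the question, the result would presumably be added with doc.append(NoEscape(block)). Note that plain verbatim does not allow colour commands inside it, so colouring the Warning/Error lines would need a different approach.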

Related

How to detect what kind of break line in a text file in python?

My problem is the following. I have a text file with a bunch of lines in it. The text might have been created on Windows, Unix, or Mac.
I want to open this text in Python (as a single string) and split it at the line breaks to get a list of all lines. I have only tested this with a file created on Windows, so I can split the string easily on \n. But if I understand correctly, other environments use \r, \r\n, etc.
I want a general solution that detects what kind of line break is used in a file before I start splitting, so that I split it correctly. Is that possible?
Thanks.
UNIX_NEWLINE = '\n'
WINDOWS_NEWLINE = '\r\n'
MAC_NEWLINE = '\r'
These are the line breaks that the different operating systems write to a file, and how Python sees them. ('\r' alone applies to classic Mac OS; modern macOS uses '\n'.)
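As a minimal sketch of how the detection could look: detect_newline below is a hypothetical helper (not a standard library function) that counts each kind of line ending in a string, and str.splitlines() then splits on any of them without needing to know which one is used. To see the original line endings, the file would have to be opened with newline='' so that Python does not translate them.

```python
def detect_newline(text):
    """Return the most common line ending in text, or None if there is none."""
    crlf = text.count("\r\n")
    counts = {
        "\r\n": crlf,
        "\n": text.count("\n") - crlf,  # lone LF, not part of CRLF
        "\r": text.count("\r") - crlf,  # lone CR, not part of CRLF
    }
    newline, count = max(counts.items(), key=lambda item: item[1])
    return newline if count else None


sample = "first\r\nsecond\r\nthird"
print(repr(detect_newline(sample)))  # '\r\n'
print(sample.splitlines())           # ['first', 'second', 'third']
```

If only the list of lines is needed, splitlines() alone is usually enough. It also splits on a few rarer separators such as form feed, which may or may not be wanted.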

How to allow argument containing spaces with python3 cmd library?

I'm using the cmd library to create a simple command-line interface with code completion. A problem occurs when a command argument contains special characters: code completion runs only on the last part separated by these special characters.
Here is some simple code to test it:
from cmd import Cmd

class Test(Cmd):
    def complete_test(self, text, line, b, e):
        print(text)
        print(line)
        print(b)
        print(e)
Type test and an argument containing, for example, a slash. Only the last part after / is included in text, and if you return something, only this last part gets replaced.
I used the comments under this answer to fix the problems with other special characters. But I can't just call readline.set_completer_delims(""), because then code completion does not work at all. I need to at least set space as a delimiter (readline.set_completer_delims(" ")) so that code completion can find where the argument starts. But now I can't pass paths containing spaces (see my completion code below):
def complete_export(self, text:str, line:str, begidx, endidx):
    return [x for x in glob(text + "*") if x.startswith(text)]
My export command requires only one argument, a path, so the ideal behavior would be to treat the first space as the beginning of the argument and to treat all other spaces as part of the path.
Note: I have realized that it's possible to use the line argument and extract the path manually, but code completion would still replace only the last part, so the returned path would have to be trimmed. I submitted this as an answer, but it's not a very elegant solution.
Here is a solution that manually separates the path from line, performs globbing, and returns only the parts of each path after the spaces that are already present. One problem is that if paths contain spaces and you press tab twice, you get suggestions only for the rest of the text after the last space. Depending on the use case, this might be a problem.
def complete_export(self, text:str, line:str, begidx, endidx):
    path = line[line.find(" ")+1:]  # get everything after the first space
    return [" ".join(x.split(" ")[(line.count(" ") - 1):]) for x in glob(path + "*")]  # completion suggestions after last space
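To make the slicing in that answer easier to follow, here is the same computation as a standalone function. This is only an illustrative sketch: completion_suffixes is a hypothetical name, and a fixed candidate list stands in for the glob() call so that it can be run without touching the file system.

```python
def completion_suffixes(line, candidates):
    """Return, for each matching candidate, the part after the spaces already typed."""
    path = line[line.find(" ") + 1:]  # everything after the command name
    skip = line.count(" ") - 1        # number of spaces already inside the path
    return [
        " ".join(candidate.split(" ")[skip:])
        for candidate in candidates
        if candidate.startswith(path)
    ]


# Made-up file names; in the real completer they would come from glob(path + "*").
files = ["my dir/file one.txt", "my dir/final.txt", "other.txt"]
print(completion_suffixes("export my dir/fi", files))
# ['dir/file one.txt', 'dir/final.txt']
```

With space as the only delimiter, readline passes "dir/fi" as text for this line, so the returned suggestions have to start at that same point, which is why the leading "my " is cut off.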

Python - How do I separate data into multiple lines

I have two strings that I want to put into a txt file, but when I try to write them, they both end up on the first line. I want the strings to be on separate lines. How do I do that?
Here is the writing part of my code btw:
saveFile = open('points.txt', 'w')
saveFile.write(str(jakesPoints))
saveFile.write(str(alexsPoints))
saveFile.close
If jakesPoints was 10 and alexsPoints was 12, then the text file would contain
1012
but I want it to be
10
12
You can use a newline character (\n) to move to a new line. For your example:
with open('points.txt', 'w') as saveFile:
    saveFile.write("{}\n".format(jakesPoints))
    saveFile.write("{}\n".format(alexsPoints))
Other things to note:
It is helpful to open files using with: it closes the file automatically when the block ends (which is typically preferred over trying to remember to call .close()).
The "{}\n".format() call converts your numbers to strings and adds the newline character. I found that https://pyformat.info/ explains the string formatters pretty well and highlights the main advantages.
with open('points.txt', 'w') as saveFile:
    saveFile.write(str(jakesPoints))
    saveFile.write("\n")
    saveFile.write(str(alexsPoints))
See the difference between the 'w' and 'a' modes used in open(). Also see join().
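As a small sketch of the join() idea mentioned above: the values are converted to strings, joined with "\n", and written in one call. The variable names are adapted from the question, and a temporary directory is used here (an assumption made for the example) so that running it does not overwrite a real points.txt.

```python
import os
import tempfile

jakes_points = 10
alexs_points = 12

# Write into a throwaway directory so the example has no side effects.
path = os.path.join(tempfile.mkdtemp(), "points.txt")

with open(path, "w") as save_file:
    # One value per line, with a trailing newline after the last one.
    save_file.write("\n".join(str(p) for p in (jakes_points, alexs_points)) + "\n")

with open(path) as save_file:
    contents = save_file.read()

print(contents)
```

Opening the file with 'a' instead of 'w' would append the new lines to whatever is already in the file rather than replacing it.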

Multiline input prompt indentation interfering with output indentation

I have a function that prints the number of pixels found in an image and then asks the user how they would like to proceed. As long as the interpreter hasn't moved on from the function I want all the output to be indented accordingly.
One such 'sub output' (the input prompt) needs to be multiple lines. So I kick off with the 3*quote (''') followed by two spaces to create the indentation. At the end of the question 'how would you like to proceed?' I use a hard return. An extra indentation is assumed by the text editor so I remove it causing the following list of suggestions to line up flush with the input variable command. Here's how it looks:
def returnColors():
    #
    # lots of code that does stuff...
    #
    print("The source image contains", lSize, "px.")
    print("")
    command=input('''  What would you like to do? You can say:
    get all
    get unique
    ''')
The problem with this is that the interpreter is acknowledging the indentation that separates the function body from the function statement as actual string contents, causing the output to look like this:
The source image contains 512 px.

  What would you like to do? You can say...
    get all
    get unique
    |
The only way to avoid this is by breaking indentation in the interpreter. Although I know it works, it doesn't look very good. So what options do I have?
One thing that you should keep in mind is that once you have started a multiline string, all the text until it is closed is taken as is, and syntax (i.e., indentation) is no longer considered.
You can start your multiline string with an explicit new line so that everything in the string can be indented together in the code.
I.e.,
command=input('''
  What would you like to do? You can say:
    get all
    get unique
''')
would print out the prompt with a new line on top, but the formatting of the text is more explicitly shown and should appear as seen.
Or you could write \n explicitly for each new line in the string and end each source line with a single \ (a line continuation), so that the layout of the source no longer determines the line breaks. E.g.
instead of:
'''  What would you like to do? You can say:
    get all
    get unique
    '''
try:
'  What would you like to do? You can say:\
\n\
\n    get all\
\n    get unique\
\n'
Indenting the continuation lines does not change how the prompt looks: any whitespace placed before a \n only ends up as invisible trailing spaces on the previous line. This gives a prompt that looks the same in input():
'  What would you like to do? You can say:\
    \n\
    \n    get all\
    \n    get unique\
    \n'
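Another option, not mentioned in the answers above, is textwrap.dedent from the standard library, which strips the indentation that the lines of a multiline string have in common. The sketch below is one way this could look: the prompt stays indented with the function body in the source, and the two-space output indent is added afterwards with textwrap.indent. build_prompt is a hypothetical name chosen for the example.

```python
import textwrap


def build_prompt():
    # The string is indented along with the function body; dedent() removes
    # the common leading whitespace, and the backslash after the opening
    # quotes avoids an empty first line.
    text = textwrap.dedent('''\
        What would you like to do? You can say:
          get all
          get unique
        ''')
    # Shift the whole prompt two spaces to the right for display.
    return textwrap.indent(text, "  ")


prompt = build_prompt()
print(prompt, end="")
# In the original function this would be used as: command = input(prompt)
```

This keeps the source readable while the printed prompt has exactly the indentation chosen in indent().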

How to make a dictionary that contains an Arabic diacritic as a key in python

I am trying to make a program that converts the Arabic diacritics and letters into the Latin script. The letters work well in the program, but the diacritics can not be converted as I get an error every time I run the program.
At first, I put the diacritics in on their own as keys, but that did not work for me. Please see the last key: it contains َ, which is a diacritic, but it does not work properly the way the letters do:
def convert(lit):
    ArEn = {'ا':'A', 'ل':'L', "و": "W", "َ":"a"}
    end_word=[]
    for i in range(len(lit)):
        end_word.append(ArEn[lit[i]])
    jon = ""
    print(jon.join(end_word))
convert("الوَ")
However, I tried to fix the problem by using letters attached with diacritics as keys, but the program resulted in the same error:
the dictionary:
ArEn = {'ا':'A', 'ل':'L', "وَ":"Wa"}
the error:
Traceback (most recent call last):
  File "C:\Users\Abdulaziz\Desktop\converter AR to EN SC.py", line 10, in <module>
    convert("الوَ")
  File "C:\Users\Abdulaziz\Desktop\converter AR to EN SC.py", line 5, in convert
    end_word.append(ArEn[lit[i]])
KeyError: 'و'
Chances are that the bug is in the code editor you are using to write Python rather than in Python itself.
Since you are using Python 3.x, a diacritic is, from the running program's point of view, just a single character like any other, and there should be no issues at all.
From the code editor's point of view, there are issues such as whether or not to advance one position when displaying certain special Unicode characters, and the " character itself may be shown out of place. When one tries to manually correct the position of the ", one can put it in the wrong order, leaving the special character actually outside the quoted string.
The fact that you could solve the issue by re-editing the file suggests that this is indeed what happened.
One way to avoid this is to escape certain special characters, especially ones that have different display rules, with the "\uxxxx" Unicode code point escape sequence. This will spare you and others trouble when editing your file again in the future: even if you get it working now, an editor may show the characters incorrectly when the file is opened, and by trying to fix it one might break the syntax again.
You can use a table on the web or Python 3's interactive prompt to get the Unicode code point of each character, ensuring that the code part of the program is displayed in a deterministic way in any editor. (If you add the diacritical character as a comment on the same line, it will actually enhance the readability of your code, enormously so if it is ever supposed to be edited by non-Arabic speakers.)
So, for your declaration above, I used this snippet to extract the code points:
>>> ArEn = {'ا':'A', 'ل':'L', "و": "W", "َ":"a"}
>>> [print (hex(ord(yy)), yy ) for yy in ArEn.keys()]
0x648 و
0x644 ل
0x64e َ
0x627 ا
Which allows me to declare the dictionary like this:
ArEn = {
    "\u0648": "W",  # و
    "\u0644": "L",  # ل
    "\u064e": "a",  # fatha ( َ )
    "\u0627": "A",  # ا
}
(And yes, I had trouble with displaying the characters on my terminal like I said you probably had on your editor while getting these - the fatha ("\u064e" - "a") character is tricky ! :-) )
As an alternative to using the code points in your code, you can use Python's unicodedata module to discover and then use the actual character names. This can enhance readability further, and by exploring unicodedata you may find that you don't even have to create this dictionary manually but can use that module instead:
In [16]: [print("\\u{:04x} - '{}' - {}".format(ord(yy), unicodedata.name(yy), yy) ) for yy in ArEn.keys()]
\u0648 - 'ARABIC LETTER WAW' - و
\u0644 - 'ARABIC LETTER LAM' - ل
\u064e - 'ARABIC FATHA' - َ
\u0627 - 'ARABIC LETTER ALEF' - ا
And from these full text names, you can get back to the character with the unicodedata.lookup function:
>>> unicodedata.lookup("ARABIC LETTER LAM")
'ل'
Notes:
1) This requires Python 3. For Python 2 one might try prefixing each string with u"", but anyone dealing with these characters is far better off using Python 3, since Unicode support is one of its big improvements.
2) This also requires a terminal with good support for Unicode characters using the "utf-8" encoding. I am on a Linux system with the "konsole" terminal. On Windows, the IDLE Python prompt might work, but not the cmd Python prompt.
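To illustrate the points above, here is a minimal sketch that declares the table with \u escapes and converts one code point at a time. It also shows that a letter followed by a diacritic is two separate code points in a Python 3 string, which is consistent with the KeyError: 'و' in the traceback: a loop over lit[i] looks up the letter alone, so a combined key such as "\u0648\u064e" is never matched. The names AR_EN and the return-based convert are adaptations made for this example.

```python
import unicodedata

# Same mapping as in the question, written with escapes so that no editor
# can misplace the quotes around the combining character.
AR_EN = {
    "\u0627": "A",  # ARABIC LETTER ALEF
    "\u0644": "L",  # ARABIC LETTER LAM
    "\u0648": "W",  # ARABIC LETTER WAW
    "\u064e": "a",  # ARABIC FATHA (a combining diacritic)
}


def convert(text):
    """Transliterate text one code point at a time."""
    return "".join(AR_EN[ch] for ch in text)


word = "\u0627\u0644\u0648\u064e"  # alef, lam, waw, fatha
print(len("\u0648\u064e"))                    # 2
print([unicodedata.name(ch) for ch in word])
print(convert(word))                          # ALWa
```

Using AR_EN.get(ch, ch) instead of AR_EN[ch] would pass unknown characters through unchanged instead of raising KeyError, if that behavior is preferred.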
You might need proper indentation in python:
def convert(lit):
    ArEn = {'ا':'A', 'ل':'L', "و":"W", "َ":"a", "ُ":"w", "":""}
    end_word=[]
    for i in range(len(lit)):
        end_word.append(ArEn[lit[i]])
    jon = ""
    print(jon.join(end_word))
convert("اُلوَ")
Update: I just noticed, years later, that the letters and diacritics were joined together in my first try. When I separated them, the program worked.
I just solved the problem!
I am not really sure whether it is a mistake in Python or something else, but as far as I know Python does not support Arabic very well. Or maybe I made a mistake in the program above.
I kept rewriting the same program and suddenly it worked very well.
I even added different diacritics and they worked properly.
def convert(lit):
    ArEn = {'ا':'A', 'ل':'L', "و":"W", "َ":"a", "ُ":"w", "":""}
    end_word=[]
    for i in range(len(lit)):
        end_word.append(ArEn[lit[i]])
    jon = ""
    print(jon.join(end_word))
convert("اُلوَ")
The result is
AwLWa
