wxPython pdfviewer unknown font preventing pdf loading - python-3.x

While trying out wxPython's wx.lib.pdfviewer, I ran the example code from the documentation. It ran, but most of the PDF did not load, and messages like these
Unknown font ghlcbg+arial
Unknown font ghlcdi+arial,bold
were printed. I surmise that it is because the fonts used in the pdf are not implemented by PyPDF2. How can I solve this problem? I can preprocess the pdf to replace the fonts if necessary, but this I do not know how to do either.
Is there a library which allows for replacement of fonts? If there is not, is there a replacement for PyPDF2 which permits this replacement (PyMuPDF does not work for me due to another error)? If such a replacement is not possible, is there an extension which permits the interpretation of such fonts?

wxPython cannot load a PDF that PyPDF2 does not support. I think there is talk about moving to pdfrw, but that has not happened yet, and I don't know whether it actually handles unknown fonts. The main reason to move is that PyPDF2 is no longer being maintained.
PyQt can do it since it integrates with poppler. See https://github.com/frescobaldi/python-poppler-qt4 or http://shallowsky.com/blog/programming/qt5-poppler-pdf.html
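If you go the preprocessing route, it can help to see what the reported names mean. In a PDF, an embedded font subset is named with a six-letter tag, a plus sign, and then the real font name, so "ghlcbg+arial" is a subset of Arial. The sketch below only parses such names and picks a stand-in from the standard PostScript base fonts. The function names and the Arial-to-Helvetica mapping are my own assumptions for illustration, not part of wxPython or PyPDF2, and actually rewriting the fonts in the file is not shown.

```python
import re

# Subset tag: six letters followed by "+". The PDF spec uses uppercase
# letters; the viewer's messages above show them lowercased, so accept both.
_SUBSET_TAG = re.compile(r"^[A-Za-z]{6}\+")


def split_font_name(name):
    """Split 'ghlcdi+arial,bold' into ('arial', 'bold')."""
    base = _SUBSET_TAG.sub("", name)
    family, _, style = base.partition(",")
    return family, style


def fallback_font(name, default="Helvetica"):
    """Pick a standard base font as a stand-in (hypothetical mapping)."""
    family, style = split_font_name(name.lower())
    families = {"arial": "Helvetica", "helvetica": "Helvetica", "courier": "Courier"}
    suffixes = {"": "", "bold": "-Bold"}
    return families.get(family, default) + suffixes.get(style, "")


if __name__ == "__main__":
    for reported in ("ghlcbg+arial", "ghlcdi+arial,bold"):
        print(reported, "->", fallback_font(reported))
```

Only the comma-separated style form seen in the messages above is handled; names such as "Arial-BoldMT" would need extra rules.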


Render small LaTeX-Snippets on GTK GUI using Rust

Problem
I am developing a Rust program which has a GTK3 GUI using the given rust-gtk-binding.
The program should be cross-platform (at least Linux and Windows).
The GUI should be able to show custom plain text and small LaTeX snippets to allow the use of math environments (small meaning the size of a single formula as one element to display).
Therefore, I need a way to convert LaTeX-code into something which can be displayed by the GUI.
Ideas and their problems
I can see two approaches displaying LaTeX:
Compile the LaTeX-source into pdf and then into some image type. It should be possible to use Ghostscript to get the image. But I do not know how to generate the pdf in a way which is lightweight (does not include rather large packages like miktex) and cross-platform. This option could also be overkill as there is no need to dynamically download special packages, a good math support would be sufficient. The positive side is that rendering an image in GTK should be easy.
Use KaTeX, which should be sufficient for math environments. I was able to install matching crates and generate HTML source from some formulas. But here it becomes difficult to render the result, as GTK has no native way of displaying HTML. As it would be difficult to integrate an HTML engine into the GUI, it would be optimal to find a tool that can render HTML to an image type which can then be displayed.
So I have two approaches, both with an intermediate step: for regular LaTeX the first step is difficult, and for KaTeX the second step poses a problem. I could not find a feasible solution for the difficult step of either approach.
Are there any libraries or similar I could not find or are there any different approaches?
It would be perfectly sufficient to be able to render a single formula, I just want to avoid such massive and difficult overkills like using a complete LaTeX compiler or half a browser to render HTML.
After searching and evaluating many more approaches I got a solution which is kind of good while also having some major drawbacks:
First of all, I use TinyTex as LaTeX environment. I did not restrict the use of LaTeX to e.g. math environments. TinyTex offers support for major platforms while being lightweight and portable. Additional LaTeX packages have to be installed manually which allows me to decide which ones are being shipped with my application.
The negative side is that while TinyTex is lightweight for a LaTeX environment it still is rather big for its purpose here (about 250MB).
I installed the required packages to use \documentclass[preview]{standalone} to get an already cropped pdf.
Afterwards I use Ghostscript to get a png-image for the generated pdf. I did not use language bindings and instead just went std::process::Command.
The following lines should be sufficient to convert test.tex into test.png under Windows, with the portable TinyTex installation and Ghostscript's gswin64c.exe present in subfolders of the project's directory. (As TinyTex and Ghostscript also exist for other operating systems, the example can easily be changed to work on other systems.)
use std::process::Command;

fn main() {
    // Step 1: compile test.tex to test.pdf with the portable TinyTex pdflatex.
    let output = Command::new("TinyTex\\bin\\win32\\pdflatex.exe")
        .args(&["test.tex"])
        .output()
        .expect("failed to run pdflatex");
    // Lossy conversion avoids a panic if the tool prints non-UTF-8 bytes.
    println!("{}", String::from_utf8_lossy(&output.stdout));

    // Step 2: rasterize test.pdf to test.png with Ghostscript.
    let output = Command::new("gs\\gswin64c.exe")
        .args(&[
            "-dNOPAUSE",
            "-dBATCH",
            "-sDEVICE=png16m",
            "-r1000",
            "-sOutputFile=test.png",
            "test.pdf",
        ])
        .output()
        .expect("failed to run Ghostscript");
    println!("{}", String::from_utf8_lossy(&output.stdout));
}
Of course this is not particularly good or useful code at this stage, but it shows how to proceed with the given problem, and I wanted to leave it here in case anyone with a similar problem finds this post.
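For anyone prototyping the same pipeline outside Rust, here is a rough Python sketch of the two steps. It wraps a formula in the \documentclass[preview]{standalone} document mentioned above and builds the same Ghostscript arguments. The function names, the default executable names ("pdflatex", "gs"), and the added -interaction=nonstopmode flag are my assumptions, not part of the original setup; pass the TinyTex and Ghostscript paths that match your installation.

```python
import subprocess
from pathlib import Path


def standalone_source(formula):
    """Wrap a single math formula in a cropped standalone document."""
    return (
        "\\documentclass[preview]{standalone}\n"
        "\\begin{document}\n"
        f"${formula}$\n"
        "\\end{document}\n"
    )


def build_commands(stem, pdflatex="pdflatex", gs="gs", dpi=1000):
    """Return the pdflatex and Ghostscript command lines for <stem>.tex."""
    latex_cmd = [pdflatex, "-interaction=nonstopmode", f"{stem}.tex"]
    gs_cmd = [
        gs,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=png16m",
        f"-r{dpi}",
        f"-sOutputFile={stem}.png",
        f"{stem}.pdf",
    ]
    return latex_cmd, gs_cmd


def render(formula, stem="test", **tools):
    """Write <stem>.tex, then run both tools; raises if either one fails."""
    Path(f"{stem}.tex").write_text(standalone_source(formula), encoding="utf-8")
    for cmd in build_commands(stem, **tools):
        subprocess.run(cmd, check=True)
    return Path(f"{stem}.png")
```

On the Windows layout described above, the call would look something like render(r"a^2 + b^2", pdflatex=r"TinyTex\bin\win32\pdflatex.exe", gs=r"gs\gswin64c.exe").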

ImageMagick issue on AppEngine Standard (PDFs and NodeJS)

I am using App Engine Standard. Since ImageMagick is available on it, I tried a few PDF manipulation libraries and basically, what I would like to do, is simply converting a PDF into an image.
The issue I am getting is this:
'convert-im6.q16: not authorized /tmp/ygM1sF-Txq00JkGbpal8YWBQ.pdf\' # error/constitute.c/ReadImage/412.\nconvert-im6.q16: no images defined /tmp/ygM1sF-Txq00JkGbpal8YWBQ-0.png\' # error/convert.c/ConvertImageCommand/3258.\n' }
After some research, I found this post: Fix for ImageMagick convert errors with pdf files. Here is what it says:
PDF files on Linux systems are usually handled by ghostscript (via the
terminal command gs). And, ImageMagick (done through the terminal
convert command) uses ghostscript for reading and writing PDF files.
Because the security problems are serious and numerous, ImageMagick’s
access to PDF files is then cut off.
Granted, through these security flaws in PDF someone could craft a
malicious image file that, when converted by ImageMagick into a PDF,
will then do very nasty things to your computer.
But, ghostscript has since been updated once and once again with
security fixes. How about a fix for ImageMagick to get PDF
functionality back? Or, at least an explanation of progress towards
fixing this issue?
I can't change the ImageMagick configuration on App Engine Standard, but I wonder if there is something else I can do. Or maybe the engineers at Google would be able to update ImageMagick instead and remove that limitation?
I really need to convert PDFs into images, so I wonder if it is worth waiting, or if I need to find another solution.
Thanks for your ideas.

How to suppress graphviz warnings in Python 3?

I'm using the graphviz module to render a network of nodes and links.
I use small circles as node's shape, so the labels are intentionally bigger than the nodes.
As a consequence, I get the following warning:
"Warning: node 'wave', graph 'N' size too small for label"
'Wave' is just an example of a node's label.
I get lots of these warnings because of the large number of nodes (screen capture).
So, my question is: How can I suppress warnings like those?
The graphviz command I'm using is:
n.view() # n is my digraph
I have already tried the suggestions from:
How to suppress a third-party warning using warnings.filterwarnings
How to redirect python warnings to a custom stream?
But nothing so far.
Thanks in advance.
Try Eli Bendersky's excellent page: Redirecting all kinds of stdout in Python
After replacing stdout with stderr, Eli's solution worked on graphviz for me.
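A minimal sketch of that idea, adapted to stderr: the warnings are written by the external Graphviz process, so replacing sys.stderr in Python is not enough; the underlying file descriptor 2 has to be redirected. The name redirect_stderr_fd is my own, and this is a simplified variant rather than the code from Eli's page.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_stderr_fd(target_path=os.devnull):
    """Temporarily point file descriptor 2 at target_path.

    Child processes inherit fd 2, so their stderr output is redirected too.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # remember where stderr pointed before
    target_fd = os.open(target_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.dup2(target_fd, 2)  # fd 2 now writes to target_path
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(target_fd)
        os.close(saved_fd)
```

Usage would be along the lines of "with redirect_stderr_fd(): n.view()", where n is the digraph from the question. Note that this hides every message written to stderr inside the block, not just the label warnings.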
If you happen to be using Evince, Ubuntu's builtin PDF viewer, see:
https://superuser.com/questions/980237/silence-evinces-warnings-in-ubuntu
For graphviz, as of v0.11, there is a quiet option for the .view() method (see graphviz docs: graphviz.view).
Apparently it does not work on Windows.
This applies to those using networkx (which uses pygraphviz) or pygraphviz.
In pygraphviz, the warnings are collected and then pushed using the warnings module (see pygraphviz/agraph.py:1390).
You can therefore silence the warnings specifically when drawing:
warnings docs:Temporarily Suppressing Warnings
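import warnings

# <create graph etc>

# Warnings raised inside this block are ignored; the previous filters
# are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    g.draw()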
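If ignoring every warning is too broad, a narrower variant of the catch_warnings approach is to filter by message text. This sketch assumes the label warnings reach Python's warnings module with the text quoted in the question; the helper name call_ignoring_label_warnings is made up for illustration. The (?s) flag lets the pattern match even when several Graphviz messages are joined into one multi-line warning.

```python
import warnings


def call_ignoring_label_warnings(func, *args, **kwargs):
    """Call func, suppressing only 'size too small for label' warnings."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the
        # warning text, hence the leading ".*".
        warnings.filterwarnings(
            "ignore", message=r"(?s).*size too small for label"
        )
        return func(*args, **kwargs)
```

With pygraphviz this would be called as call_ignoring_label_warnings(g.draw), plus whatever arguments draw() needs; other warnings still come through.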

Is importing a truetype font for perl module possible?

After a great deal of toiling I have managed to hack up a Perl script that uses PDF::Create to create PDF labels for a label printer. The script now works perfectly for my needs, but the choice of fonts is limited. I have installed a TrueType font with gnome-font-viewer (I am using Debian Stretch).
According to the docs for PDF::Create: A postscript named "BaseFont" (Courier, Courier-Bold, Courier-BoldOblique, Courier-Oblique, Helvetica, Helvetica-Bold, Helvetica-BoldOblique, Helvetica-Oblique, Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic or Symbol) can be used to create the pdf.
How can I add my newly installed true type font to this list of Postscript basefonts so that I can use it to create a PDF? Or is this not possible?
There is an old and unresolved ticket requesting that feature on that module, so I suspect for PDF::Create, the answer is no, you can't use TTF fonts (I'd be happy to be proven wrong on that point).
Other Perl modules can use True Type and Open Type fonts. PDF::API2 does support them. I'll confess, I found that module's API to be somewhat intimidating.
Another module I have used extensively is PDF::Reuse which can be used to create PDFs from scratch, or by combining existing pages as templates and overlaying text and other elements. PDF::Reuse also supports the use of True Type fonts.

Converting HTML to odt, doc, docx

Is there an easy way to convert HTML (with CSS styles and embedded images) to ODT, DOCX, or DOC from the command line on a Linux server? I searched a lot but have not found a good option.
I had the same problem with converting to PDF, which was solved by wkhtmltopdf. Perhaps there are ways to convert the resulting PDF documents to other formats?
To convert to odt it's pretty easy after installing pandoc.
Then comes the relatively hard part: from odt (or even html) you can script (Open|Libre)Office via e.g. unoconv.
Or you can do something like:
abiword --to=doc filename.odt
Also see this thread, and this blog post.
HTH
If you want to convert HTML into docx you may use a solution like PHPDocX. You need to get the PRO version though because the free one does not include the conversion functionality.
If you're on ruby there is a gem based on libreoffice headless (with pyod/jod converter) and pdf tools.
Post with your issues to the pandoc GoogleGroup, John is very responsive in every way.
You may even find the latest release v1.9 may fix your problem, or maybe you just need to get to know the toolset in more detail.
I found a solution: AbiWord in its console variant.
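To tie the suggestions above together, here is a small Python sketch that chains them from a script: pandoc for HTML to ODT, then AbiWord or unoconv for the Office formats. The function names and file names are placeholders of mine, and it assumes pandoc, abiword, and unoconv are installed and on the PATH; I have not checked how well each tool preserves CSS styles or embedded images.

```python
import subprocess


def conversion_commands(html_path, stem):
    """Build the command lines for each target format (nothing is run here)."""
    odt = f"{stem}.odt"
    return {
        # pandoc infers the output format from the .odt extension
        "odt": ["pandoc", html_path, "-o", odt],
        # same form as the abiword command shown above
        "doc": ["abiword", "--to=doc", odt],
        # unoconv drives (Open|Libre)Office; -f selects the output format
        "docx": ["unoconv", "-f", "docx", odt],
    }


def convert(html_path, stem, targets=("odt", "doc", "docx")):
    """Run the conversions in order; the ODT file is produced first."""
    commands = conversion_commands(html_path, stem)
    for target in ("odt", "doc", "docx"):
        if target == "odt" or target in targets:
            subprocess.run(commands[target], check=True)
```

For example, convert("page.html", "page", targets=("doc",)) would create page.odt with pandoc and then hand it to AbiWord.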
