Poor font quality in Graphviz on Linux

I have an extremely simple DOT file, test.gv:
digraph SampleGraph
{
sampleNode [ label="StackOverflow" ];
}
When I render it with dot using the -Tpdf flag, I get a really poor-quality, pixelated font.
This is the result of dot test.gv -Tpdf > test.pdf:
The PostScript flag -Tps works fine. This is the result of dot test.gv -Tps > test.ps:
Why?
Edit: I've found the Graphviz font FAQ, but it doesn't explain this.

Even the official documentation doesn't recommend using direct -Tpdf conversion:
pdf
Produces PDF output. (This option assumes Graphviz includes the Cairo renderer.) Alternatively, one can use the ps2 option to
produce PDF-compatible PostScript, and then use a ps-to-pdf converter.
Note: At present, this option does not support anchors, etc. To get these
included in your PDF output, use ps2.
It doesn't support several things, and they mark the rest with "etc.", so you can only guess what else -Tpdf doesn't support.
It's better to convert to PS first and then to PDF with ps2pdf from the ghostscript package.
ps2pdf myfile.ps myfile.pdf
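Assuming the test.gv file from the question, the full two-step route might look like this (a sketch using -Tps2, the option the documentation above recommends for PDF-compatible PostScript):
dot -Tps2 test.gv -o test.ps   # PDF-compatible PostScript, keeps anchors etc.
ps2pdf test.ps test.pdf        # Ghostscript's ps2pdf does the final conversion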
Also have a look at this part of the Graphviz font FAQ:
No freetype
When graphviz is built on systems without freetype, then only the gd
renderer will be available for bitmap outputs, and the only available
fonts are a small set of builtin bitmap fonts. The poor quality of
these fonts will be evident, also, "dot ... -v 2>&1 | grep font" will
say that the font is "". This may actually be desirable for
installing minimal graphviz programs on a server where fonts may not
even be installed.
You probably need to install FreeType and rebuild Graphviz if you built it manually.
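To check which renderer and fonts your dot build actually uses, the FAQ's hint can be run as something like this (the sample graph is just a placeholder):
echo 'digraph g { a -> b }' | dot -v -Tpng -o /dev/null 2>&1 | grep -i font
If it reports an empty font name, you are on the builtin bitmap fonts described above.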

Related

ps2pdf creates a very big PDF file from a paps-created PS file

On Linux, I use ps2pdf in a bash script to convert a text-file report to PDF.
To produce the PS file for ps2pdf, I use the paps command because of the UTF-8 encoding.
The problem is that the PDF file from ps2pdf is about 30 times bigger than the PS file created by paps.
Previously, I used a2ps to convert text to PS and then fed that to ps2pdf, and the PDF output from that was a normal size, not big.
Is there any way to reduce the pdf size from paps and ps2pdf? Or what am I doing wrong?
The command I used is as below.
paps --landscape --font="Freemono 10" textfile.txt > textfile.ps
ps2pdf textfile.ps textfile.pdf
Thank you very much.
As the author of paps, I agree with KenS's description of paps' inner workings. Indeed, I chose to create my own font mechanism in the PostScript language. That is history though, as I have just released a new version of paps that uses cairo for its PostScript, PDF, or SVG rendering. This is much more compact than the old paps output, especially w.r.t. the result after doing ps2pdf. Please check out http://github.com/dov/paps .
For ps2pdf, the easiest way to control the output size is by designating the paper size.
An example command is:
ps2pdf -sPAPERSIZE=a4 -dOptimize=true -dEmbedAllFonts=true YourPSFile.ps
ps2pdf is a wrapper around Ghostscript (ps2pdf is owned by the ghostscript package).
With -sPAPERSIZE=something you define the paper size. For the valid PAPERSIZE values, see http://ghostscript.com/doc/current/Use.htm#Known_paper_sizes
-dOptimize=true lets the created PDF be optimised for loading
-dEmbedAllFonts=true makes the fonts always look nice
All of this is from : https://wiki.archlinux.org/index.php/Ps2pdf
I think he means the size on disk, rather than the size of the output media. The 'most likely' scenario normally is that the source contains a large DCT-encoded image (JPEG) which is decoded and then compressed losslessly into the PDF file using something like Flate.
But that can't be the case here, as it's apparently only text. So the next most likely problem is that the text is being rasterised, which suggests some odd fonts in the PostScript. That's possible if you are using UTF-8 text; it's probably constructing something daft like a CIDFont with TrueType descendant fonts.
However, since the version of Ghostscript isn't given, and we don't have a file to look at, it's really impossible to tell. Older versions of the pdfwrite device did less well on creating optimal files, especially from CIDFonts.
Setting 'Optimize=true' won't actually do anything with the current version of pdfwrite; that's an Acrobat Distiller parameter we no longer implement. Older versions of Ghostscript did use it, but the output wasn't correctly linearised.
The correct parameter for newer versions is '-dFastWebView', which is supposed to be faster when loading from the web if the client can deal with this format. Given the crazy way it's specified, practically no viewer in the world does. However, the file is properly constructed in recent versions, so if you can find a viewer which supports it, you can use this (at the expense of making the PDF file slightly larger).
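For completeness, that parameter is passed like any other pdfwrite switch; a minimal sketch with example file names:
gs -sDEVICE=pdfwrite -dFastWebView=true -o linearized.pdf input.pdf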
If you would like to post a URL to a PostScript file exhibiting problems I can look at it, but without it there's really nothing much I can say.
Update
The problem is the paps file: it doesn't actually contain any text at all, in a PostScript sense.
Each character is stored as a procedure, where a path is drawn and then filled. This is NOT stored in a font, just in a dictionary. All the content on the page is stored in strings in a paps 'language'. In the case of text, this simply calls the procedure for the relevant glyph(s).
Now, because this isn't a font, the repeated procedures are simply seen by pdfwrite (and pretty much all other PostScript consumers) as a series of paths and fills, and that's exactly what gets written to the output in the PDF file.
Now normally a PDF file would contain text that looks like:
/Helvetica 20 Tf
(AAA) Tj
which is pretty compact; the font would contain the program to draw the 'A', so we only include it once.
The output from paps for the same text would look like (highly truncated):
418.98 7993.7 m
418.98 7981.84 l
415.406 7984.14 411.82 7985.88 408.219 7987.04 c
...
... 26 lines omitted
...
410.988 7996.3 414.887 7995.19 418.98 7993.7 c
f
418.98 7993.7 m
418.98 7981.84 l
415.406 7984.14 411.82 7985.88 408.219 7987.04 c
...
... 26 lines omitted
...
410.988 7996.3 414.887 7995.19 418.98 7993.7 c
f
418.98 7993.7 m
418.98 7981.84 l
415.406 7984.14 411.82 7985.88 408.219 7987.04 c
...
... 26 lines omitted
...
410.988 7996.3 414.887 7995.19 418.98 7993.7 c
f
which, as you can clearly see, is much larger. Whereas with a font we would include the instructions to draw the glyph only once, and then use only a few bytes to draw each occurrence, with the paps output we include the drawing instructions for the glyph each and every time it is drawn.
So the problem is the way paps emits PostScript, and there is nothing that pdfwrite can do about it.
That said, I see that you are using Ghostscript 8.71, which is now 4 years old; you should probably consider upgrading.

How do you programmatically - or from CLI - rotate a PDF?

What I need to be able to do is rotate a PDF clockwise or counter-clockwise, either programmatically or from the CLI.
I have spent considerable time researching and attempting to rotate PDFs with Ghostscript (as GS is already installed for other reasons) to no avail; nothing seems to result in any rotation.
I have looked into using pdftk, but it requires Java libraries, and I would rather not have Java installed on the system.
We've even looked at modifying the PDF file ourselves, and we have had some success with this, but we haven't come across a reliable method that works for all PDFs.
So my #1 preference would be to achieve this with existing resources on the system.
My #2 preference would be to make as little modification to the existing system as possible to achieve this.
The server in question is using Linux (OpenSuse 11) with PHP 5.2 and I have Ghostscript 9.0 installed.
Any advice would be much appreciated.
You can use the pdfpages LaTeX package:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[angle=45]{document.pdf}
\end{document}
The LaTeX document above, compiled via pdflatex, produces a document rotated 45 degrees.
There are also tools (wrappers around pdfpages) like pdfjam that can be used directly from the command line:
$ pdfjam --suffix rotated45 --angle 45 --fitpaper true document.pdf
There is a way to do this with ImageMagick, if that's available to you.
Example:
$ convert originalfile.pdf -rotate <cw_angle_degrees> newfile.pdf
Note, however, that since this is a raster-based rotation, there will (generally) be a noticeable loss of quality AND increase in filesize.
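If you do take the ImageMagick route, adding a -density setting before the input file raises the rasterisation resolution and mitigates (but does not remove) that quality loss; 300 dpi and 90 degrees below are just example values:
convert -density 300 originalfile.pdf -rotate 90 newfile.pdf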
You haven't said how you are trying to do this with Ghostscript, but if you want to use the pdfwrite device you will need to set /AutoRotatePages=/None or it will rotate the final page so that the majority of any real text is horizontal.
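For reference, that switch is passed like any other pdfwrite option (file names are examples). Note that it only stops Ghostscript from auto-rotating pages when rewriting a PDF; it does not itself perform a rotation:
gs -sDEVICE=pdfwrite -dAutoRotatePages=/None -o output.pdf input.pdf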

What are the best parameters for running ImageMagick to convert low-quality PDFs to images (for OCR)?

I have several low-quality PDFs. I would like to use OCR -- to be more precise, Ocropus -- to get text from them. To do so, I first use ImageMagick -- a command-line tool for converting PDFs to images -- to transform these PDFs into JPG or PNG.
However, ImageMagick produces very low-quality images, and Ocropus hardly recognizes anything. I would like to learn the best parameters for handling low-quality PDFs so that I can provide images of as good a quality as possible to the OCR.
I have found this page, but I do not know where to start.
You can learn about the detailed settings of ImageMagick's "delegates" (the external programs IM uses, such as Ghostscript) by typing
convert -list delegate
(On my system that's a list of 32 different commands.) Now to see which commands are used to convert to PNG, use this:
convert -list delegate | findstr /i png
Ok, this was for Windows. You didn't say which OS you use. [*] If you are on Linux, try this:
convert -list delegate | grep -i png
You'll discover that IM does produce PNG only from PS or EPS input. So how does IM get (E)PS from your PDF? Easy:
convert -list delegate | findstr /i PDF
convert -list delegate | grep -i PDF
Ah! It uses Ghostscript to make a PDF => PS conversion, then uses Ghostscript again to make a PS => PNG conversion. Works, but isn't the most efficient way if you know that Ghostscript can do PDF => PNG in one go. And faster. And in much better quality.
About IM's handling of PDF conversion to images via the Ghostscript delegate you should know two things first and foremost:
By default, if you don't give an extra parameter, Ghostscript will output images at a 72 dpi resolution. That's why Karl's answer suggested adding -density 600, which tells Ghostscript to use a 600 dpi resolution for its image output.
IM's detour of calling Ghostscript twice, first to convert PDF => PS and then PS => PNG, is a real blunder, because you never win and hardly even keep quality in the first step, but very often lose some. Reasons:
PDF can handle transparencies, which PostScript can not.
PDF can embed TrueType fonts, which PostScript can not; and so on.
(Conversion in the other direction, PS => PDF, is not that critical.)
That's why I'd suggest you convert your PDFs in one go to PNG (or JPEG) using Ghostscript directly. And use the most recent version 8.71 (soon to be released: 9.01) of Ghostscript! Here are example commands:
gswin32c.exe ^
-sDEVICE=pngalpha ^
-o output/page_%03d.png ^
-r600 ^
d:/path/to/your/input.pdf
(This is the commandline for Windows. On Linux, use gs instead of gswin32c.exe, and \ instead of ^.) This command expects to find an output subdirectory where it will store a separate file for each PDF page. To produce JPEGs of good quality, try
gs \
-sDEVICE=jpeg \
-o output/page_%03d.jpeg \
-r600 \
-dJPEGQ=95 \
/path/to/your/input.pdf
(Linux command version). This direct conversion avoids the intermediate PostScript format, which may lose the TrueType font and transparency information that was in the original PDF file.
[*] D'oh! I missed your "linux" tag at first...
-density 600 or so should give you what you need.
At least two other tools you may want to consider:
pdfimages, which comes with the package poppler-utils, makes it easy to extract the images from a PDF without degrading them (a one-line example follows after this list).
pdfsandwich, which can give you an OCR'd file by simply running pdfsandwich inputfile.pdf. You may need to tweak the options to get a decent result. See the official page for more info.
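As an illustration of the first tool, pdfimages can dump the embedded page images losslessly before you hand them to the OCR; the 'page' prefix below is just an example:
pdfimages -j input.pdf page
(-j keeps JPEG streams as .jpg files; everything else is written as .ppm/.pbm.)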

PDF on Linux: Combine font subsets and replace Type 3 with Type 1

I have a PDF file that I'd like to post-process on Linux. In particular I'd like to:
Replace Type 3 fonts with Type 1 fonts
Replace multiple subsets of the same font with a single subset (the subsets are the result of including figures in LaTeX, where each figure contains a subsetted font)
With Windows these two steps are possible with the Adobe Distiller (open the document file and print it into a new PDF document with the respective settings).
On Linux I'm able to subset fonts with Ghostscript [1], but it does not seem to be able to replace (all?) Type 3 fonts with Type 1 fonts or to combine multiple subsets of the same font.
Any hints on how I can achieve these two tasks with free tools?
(I am aware of the reply to How to convert Type 3 font to Type 1 font in PDF. However, I don't really care if I theoretically lose information about the font, as this conversion seems to work fine in Distiller.)
[1] With the arguments:
gs -dPDFA -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH \
-sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 \
-dMaxSubsetPct=100 -dSubsetFonts=true \
-dEmbedAllFonts=true -sOutputFile=/tmp/tmp.pdf -f "$1"
Somehow I doubt your statement "With Windows these two steps are possible with the Adobe Distiller". I'd need to see with my own eyes that this works before I can believe it. This is especially true for the "replace multiple subsets of the same font with a single subset" part. (But I'm not in a position to verify or falsify the statement myself right now... so I'll just take it for a fact for the time being.)
Type 3 fonts are described using the fully-fledged PostScript language. Type 1 fonts are described using only a restricted subset of it.
Replacements of embedded fonts are a non-trivial task when processing PDF files. I'm not familiar with any Ghostscript-related utility that could do that.
callassoftware.com has a very powerful command-line utility for sale called pdfToolbox CLI 4. It is available for Windows, Linux, Mac OS X and Solaris. pdfToolbox 4 is capable of achieving practically everything you can imagine in so-called PDF preflighting jobs. This includes un-embedding font subsets and re-embedding them again as full sets (done in two separate steps, which might produce the result you want).
That's about the only tool I can think of which could help you. (BTW, a part of callas' PDF preflighting technology is licensed by Adobe to pose in Acrobat 9 Pro as its own preflighting tool...)

Get started with Latex on Linux [closed]

Impressed by is-latex-worth-learning-today, and the many how-tos for Windows:
How do you get someone started with LaTeX on Linux?
How do you generate a PDF from it and give up the OOo word processor?
Update:
Thanks for all the suggestions given here. I was able to create an awesome presentation using the Beamer class: http://github.com/becomingGuru/gids-django-ppt. I found this approach far better than using PowerPoint and the like.
Those interested may check out the TeX file, with its many custom commands, and the corresponding presentation.
First you'll need to install it:
If you're using a distro which packages LaTeX (almost all will do) then look for texlive or tetex. TeX Live is the newer of the two, and is replacing tetex on most distributions now.
If you're using Debian or Ubuntu, something like:
apt-get install texlive
..will get it installed.
Red Hat or CentOS need:
yum install tetex
Note: This needs root permissions, so either use su to switch to the root user, or prefix the commands with sudo, if you aren't already logged in as root.
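A quick way to confirm the installation worked is to ask for the version banner (the exact output differs between TeX Live and teTeX):
latex --version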
Next you'll need to get a text editor. Any editor will do, so use whatever you are comfortable with. You'll find that advanced editors like Emacs (and vim) add a lot of functionality and so will help with ensuring that your syntax is correct before you try to build your document output.
Create a file called test.tex and put some content in it, say the example from the LaTeX primer:
\documentclass[a4paper,12pt]{article}
\begin{document}
The foundations of the rigorous study of \emph{analysis}
were laid in the nineteenth century, notably by the
mathematicians Cauchy and Weierstrass. Central to the
study of this subject are the formal definitions of
\emph{limits} and \emph{continuity}.
Let $D$ be a subset of $\bf R$ and let
$f \colon D \to \mathbf{R}$ be a real-valued function on
$D$. The function $f$ is said to be \emph{continuous} on
$D$ if, for all $\epsilon > 0$ and for all $x \in D$,
there exists some $\delta > 0$ (which may depend on $x$)
such that if $y \in D$ satisfies
\[ |y - x| < \delta \]
then
\[ |f(y) - f(x)| < \epsilon. \]
One may readily verify that if $f$ and $g$ are continuous
functions on $D$ then the functions $f+g$, $f-g$ and
$f.g$ are continuous. If in addition $g$ is everywhere
non-zero then $f/g$ is continuous.
\end{document}
Once you've got this file you'll need to run latex on it to produce some output (as a .dvi file to start with, which can be converted to many other formats):
latex test.tex
This will print a bunch of output, something like this:
=> latex test.tex
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
(./test.tex
LaTeX2e <2003/12/01>
Babel <v3.8d> and hyphenation patterns for american, french, german, ngerman, b
ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch, esperanto, e
stonian, finnish, greek, icelandic, irish, italian, latin, magyar, norsk, polis
h, portuges, romanian, russian, serbian, slovak, slovene, spanish, swedish, tur
kish, ukrainian, nohyphenation, loaded.
(/usr/share/texmf/tex/latex/base/article.cls
Document Class: article 2004/02/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size12.clo))
No file test.aux.
[1] (./test.aux) )
Output written on test.dvi (1 page, 1508 bytes).
Transcript written on test.log.
..don't worry about most of this output -- the important part is the Output written on test.dvi line, which says that it was successful.
Now you need to view the output file with xdvi:
xdvi test.dvi &
This will pop up a window with the beautifully formatted output in it. Hit `q' to quit this, or you can leave it open and it will automatically update when the test.dvi file is modified (so whenever you run latex to update the output).
To produce a PDF of this you simply run pdflatex instead of latex:
pdflatex test.tex
..and you'll have a test.pdf file created instead of the test.dvi file.
After this is all working fine, I would suggest going to the LaTeX primer page and running through the items on there as you need features for documents you want to write.
Future things to consider include:
Use tools such as xfig or dia to create diagrams. These can be easily inserted into your documents in a variety of formats. Note that if you are creating PDFs then you shouldn't use EPS (encapsulated postscript) for images -- use pdf exported from your diagram editor if possible, or you can use the epstopdf package to automatically convert from (e)ps to pdf for figures included with \includegraphics.
Start using version control on your documents. This seems excessive at first, but being able to go back and look at earlier versions when you are writing something large can be extremely useful.
Use make to run latex for you. When you start having bibliographies, images and other more complex uses of latex, you'll find that you need to run it over multiple files or multiple times (the first run updates the references, and the second puts the references into the document, so they can be out of date unless you run latex twice...). Abstracting this into a makefile can save a lot of time and effort; the bare multi-pass run such a makefile would wrap is sketched after this list.
Use a better editor. Something like Emacs + AUCTeX is highly competent. This is of course a highly subjective subject, so I'll leave it at that (that and that Emacs is clearly the best option :)
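As mentioned in the list above, here is the bare multi-pass run that a makefile would typically wrap, using the test.tex example from earlier (the bibtex step is only needed if the document actually has a bibliography):
latex test.tex    # first pass records \cite and \ref information in test.aux
bibtex test       # resolves the bibliography, if there is one
latex test.tex    # pulls the bibliography and references back into the document
latex test.tex    # settles any remaining cross-references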
To get started with LaTeX on Linux, you're going to need to install a couple of packages:
You're going to need a LaTeX distribution. This is the collection of programs that comprise the (La)TeX computer typesetting system. The standard LaTeX distribution on Unix systems used to be teTeX, but it has been superseded by TeX Live. Most Linux distributions have installation packages for TeX Live--see, for example, the package database entries for Ubuntu and Fedora.
You will probably want to install a LaTeX editor. Standard Linux text editors will work fine; in particular, Emacs has a nice package of (La)TeX editing macros called AUCTeX. Specialized LaTeX editors also exist; of those, Kile (KDE Integrated LaTeX Environment) is particularly nice.
You will probably want a LaTeX tutorial. The classic tutorial is "A (Not So) Short Introduction to LaTeX2e," but nowadays the LaTeX wikibook might be a better choice.
I would recommend starting with LyX; with it you can use LaTeX just as easily as OOo Writer.
It gives you the possibility to step deeper into LaTeX by manually adding LaTeX code to your document.
PDF is just one click away after installation. LyX is cross-platform.
It depends on your Linux distribution and your preference of editors etc., but I would recommend starting with Kile (a KDE app), as it is easy to learn and installing it should pull in most of the packages needed for LaTeX and PDF generation. Just have a look at the screenshots.
If you use Ubuntu or Debian, I made an easy-to-follow tutorial: Install LaTeX on Ubuntu or Debian. It explains how to install LaTeX and how to create your first PDF.
LaTeX comes with most Linux distributions in the form of the teTeX distribution. Find all packages with 'teTeX' in the name and install them.
Most editors such as vim or emacs come with TeX editing modes. You can also get WYSIWYG-ish front-ends (technically WYSIWYM), of which perhaps the best known is LyX.
The best quick intro to LaTeX is Oetiker's 'The not so short intro to LaTeX'
LaTeX works like a compiler. You compile the LaTeX document (which can include other files), which generates a file called a .dvi (device independent). This can be post-processed to various formats (including PDF) with various post-processors.
To produce PDF, use dvips with the flag -Ppdf (IIRC - I don't have a makefile to hand) to produce a PS file with font rendering set up for conversion to PDF. The PDF conversion can then be done with ps2pdf or Distiller (if you have it).
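Put together, the classic DVI route described above looks like this (file names are examples):
latex test.tex                     # produces test.dvi
dvips -Ppdf test.dvi -o test.ps    # PostScript with fonts set up for PDF conversion
ps2pdf test.ps test.pdf            # final PDF via Ghostscript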
The best format for including graphics in this environment is EPS (Encapsulated PostScript), although not all software produces well-behaved PostScript. Photographs in JPEG or other formats can be included using various mechanisms.
I would personally use a complete editing package such as:
TeXworks
TeXstudio
Then I would install "MiKTeX" as the compiling package, which allows you to generate a PDF from your document using the pdfLaTeX compiler.
yum -y install texlive
was not enough on my CentOS distro to get the latex command.
This site https://gist.github.com/melvincabatuan/350f86611bc012a5c1c6 contains additional packages. In particular:
yum -y install texlive texlive-latex texlive-xetex
was enough, but the author also points out these:
yum -y install texlive-collection-latex
yum -y install texlive-collection-latexrecommended
yum -y install texlive-xetex-def
yum -y install texlive-collection-xetex
Only if needed:
yum -y install texlive-collection-latexextra
